Article

Joint Maximum Likelihood Estimation for Diagnostic Classification Models


Abstract

Joint maximum likelihood estimation (JMLE) is developed for diagnostic classification models (DCMs). JMLE has rarely been used in psychometrics because JMLE parameter estimators typically lack statistical consistency. The JMLE procedure presented here resolves the consistency issue by incorporating an external, statistically consistent estimator of examinees’ proficiency class membership into the joint likelihood function, which in turn allows for the construction of item parameter estimators that are also consistent. Consistency of the JMLE parameter estimators is established within the framework of general DCMs: the JMLE parameter estimators are derived for the Loglinear Cognitive Diagnosis Model (LCDM), and two consistency theorems are proven for the LCDM. Working within the framework of general DCMs makes the results and proofs applicable to any DCM that can be expressed as a submodel of the LCDM. Simulation studies are reported that evaluate the performance of JMLE with tests of varying length and different numbers of attributes. As a practical application, JMLE is also applied to real-world educational data collected with a language proficiency test.
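The two-step idea in the abstract — obtain examinee class memberships first, then estimate item parameters from the joint likelihood — can be sketched for the DINA model, a submodel of the LCDM. A minimal sketch, assuming the DINA parameterization; the function names and the closed-form slip/guess updates below are illustrative, not the authors' exact procedure:

```python
import numpy as np

def ideal_responses(profiles, Q):
    """eta[i, j] = 1 iff examinee i masters every attribute item j requires (DINA)."""
    return (profiles @ Q.T == Q.sum(axis=1)).astype(int)

def estimate_item_params(X, eta):
    """Closed-form slip/guess estimates given fixed attribute profiles."""
    slip = ((eta == 1) & (X == 0)).sum(0) / np.maximum((eta == 1).sum(0), 1)
    guess = ((eta == 0) & (X == 1)).sum(0) / np.maximum((eta == 0).sum(0), 1)
    return slip, guess

def update_profiles(X, Q, slip, guess):
    """Assign each examinee the profile maximizing the joint log-likelihood."""
    K = Q.shape[1]
    all_profiles = np.array([[(m >> k) & 1 for k in range(K)]
                             for m in range(2 ** K)])
    eta = ideal_responses(all_profiles, Q)           # 2^K x J ideal responses
    p = eta * (1 - slip) + (1 - eta) * guess         # P(X_ij = 1 | profile)
    p = np.clip(p, 1e-6, 1 - 1e-6)
    loglik = X @ np.log(p).T + (1 - X) @ np.log(1 - p).T   # N x 2^K
    return all_profiles[np.argmax(loglik, axis=1)]
```

In the procedure developed in the article, the initial memberships would come from an external, statistically consistent classifier; here any initial profile matrix can be plugged in, and the two updates can be alternated.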


... In the cognitive diagnostic modeling literature, Chiu et al. (2016) recently considered the joint MLE approach and studied consistency of the so-called item parameters, that is, the parameters of test items in a diagnostic assessment. But the argument of Chiu et al. (2016) adopts a rather strong assumption: the a priori existence of some consistent estimator for the individual latent attribute profiles. A theoretically justified approach to directly obtaining consistent estimators for both the item parameters and the individual latent profiles is lacking in the literature. ...
... Establishing theory for statistical estimability and consistency for discrete latent variables in full generality requires different arguments from those in Chiu et al. (2016), Chen et al. (2019b), and Chen et al. (2019a). In addition, new computational methods need to be developed to address the unique challenge of estimation with a large number of discrete latent attributes. ...
Preprint
Structured Latent Attribute Models (SLAMs) are a family of discrete latent variable models widely used in education, psychology, and epidemiology. A SLAM postulates that multiple discrete latent attributes explain the dependence of observed variables in a highly structured fashion. Usually, the maximum marginal likelihood estimation approach is adopted for SLAMs, treating the latent attributes as random effects. The increasing scope of modern measurement data involves large numbers of observed variables and high-dimensional latent attributes. This poses challenges to classical estimation methods and requires new methodology and understanding of latent variable modeling. Motivated by this, we consider the joint maximum likelihood estimation (MLE) approach to SLAMs, treating latent attributes as fixed unknown parameters. We investigate estimability, consistency, and computation in the regime where sample size, number of variables, and number of latent attributes can all diverge. We establish consistency of the joint MLE and propose an efficient algorithm that scales well to large-scale data. Additionally, we provide theoretically valid and effective methods for misspecification scenarios when a more general SLAM is misspecified to a submodel. Simulations demonstrate the superior empirical performance of the proposed methods. An application to real data from an international educational assessment gives interpretable findings of cognitive diagnosis.
... Popular examples include the deterministic input, noisy "and" gate (DINA) model (Junker and Sijtsma, 2001), the deterministic input, noisy "or" gate (DINO) model (Templin and Henson, 2006), the reduced reparameterized unified model (Reduced RUM; Hartz, 2002), the general diagnostic model (GDM; von Davier, 2005), the log-linear CDM (LCDM; Henson et al., 2009), and the generalized DINA model (GDINA; de la Torre, 2011). To estimate these parametric models, estimators maximizing the marginal likelihood or joint likelihood functions have been employed (e.g., Chiu et al., 2016; de la Torre, 2009). ...
... To fit CDMs, popularly used parametric methods include marginal maximum likelihood estimation (MMLE) through EM algorithms (de la Torre, 2009, 2011) and MCMC techniques (DiBello et al., 2007; von Davier, 2005). Chiu, Köhn, Zheng, and Henson (2016) also proposed a joint maximum likelihood estimation (JMLE) method for fitting CDMs. These parametric estimation methods usually perform well when sufficiently large samples are available. ...
... As in the analysis of the joint maximum likelihood estimation in Chiu et al. (2016), we assume that there is a calibration dataset that would give a statistically consistent estimator Â_c of the calibration subjects' latent class membership, in the sense that P(Â_c ≠ A⁰_c) → 0 as J → ∞. We use N_c and A⁰_c to denote the sample size and the true membership matrix of the calibration dataset, respectively. ...
Preprint
Full-text available
A number of parametric and nonparametric methods for estimating cognitive diagnosis models (CDMs) have been developed and applied in a wide range of contexts. However, in the literature, a wide chasm exists between these two families of methods, and their relationship to each other is not well understood. In this paper, we propose a unified estimation framework to bridge the divide between parametric and nonparametric methods in cognitive diagnosis to better understand their relationship. We also develop iterative joint estimation algorithms and establish consistency properties within the proposed framework. Lastly, we present comprehensive simulation results to compare different methods, and provide practical recommendations on the appropriate use of the proposed framework in various CDM contexts.
... Our theoretical framework assumes a diverging number of items, which is suitable when analyzing large-scale data. To the best of our knowledge, such an asymptotic setting has not received enough attention, except in Haberman (1977, 2004) and Chiu et al. (2016). Our theoretical analysis applies to a general MIRT model that includes the multidimensional two-parameter logistic model (Reckase and McKinley, 1983; Reckase, 2009) as a special case, while the analyses in Haberman (1977, 2004) and Chiu et al. (2016) are limited to the unidimensional Rasch model (Rasch, 1960; Lord et al., 1968) and cognitive diagnostic models (Rupp et al., 2010), respectively. Our technical tools for studying the properties of the CJMLE include theoretical developments in matrix completion theory (e.g. ...
Article
Full-text available
Multidimensional item response theory is widely used in education and psychology for measuring multiple latent traits. However, exploratory analysis of large-scale item response data with many items, respondents, and latent traits is still a challenge. In this paper, we consider a high-dimensional setting in which both the number of items and the number of respondents grow to infinity. A constrained joint maximum likelihood estimator is proposed for estimating both item and person parameters, which yields good theoretical properties and computational advantages. Specifically, we derive error bounds for parameter estimation and develop an efficient algorithm that can scale to very large datasets. The proposed method is applied to a large-scale personality assessment dataset from the Synthetic Aperture Personality Assessment (SAPA) project. Simulation studies are conducted to evaluate the proposed method.
... This setting seems reasonable for analyzing large-scale item response data. Similar asymptotic settings have been considered in psychometric research, including the analysis of unidimensional IRT models (Haberman, 1977, 2004) and diagnostic classification models (Chiu et al., 2016). Under this asymptotic setting, we propose a constrained joint maximum likelihood estimator (CJMLE) that has a certain notion of statistical consistency in recovering factor loadings. ...
Preprint
Joint maximum likelihood (JML) estimation is one of the earliest approaches to fitting item response theory (IRT) models. This procedure treats both the item and person parameters as unknown but fixed model parameters and estimates them simultaneously by solving an optimization problem. However, the JML estimator is known to be asymptotically inconsistent for many IRT models when the sample size goes to infinity and the number of items stays fixed. Consequently, in the psychometrics literature, this estimator is less preferred than the marginal maximum likelihood (MML) estimator. In this paper, we re-investigate the JML estimator for high-dimensional exploratory item factor analysis, from both statistical and computational perspectives. In particular, we establish a notion of statistical consistency for a constrained JML estimator, under an asymptotic setting in which both the numbers of items and people grow to infinity and many responses may be missing. A parallel computing algorithm is proposed for this estimator that can scale to very large datasets. Via simulation studies, we show that when the dimensionality is high, the proposed estimator yields similar or even better results than those from the MML estimator, but can be obtained computationally much more efficiently. An illustrative real data example is provided based on the revised version of Eysenck's Personality Questionnaire (EPQ-R).
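The constrained JML idea summarized above can be illustrated with a small alternating projected-gradient sketch for a two-parameter-logistic-type model: maximize the joint Bernoulli likelihood over person parameters Theta and item parameters A, projecting each row onto a norm ball (the constraint). The step size, norm bound c, and update scheme here are assumptions for exposition, not the cited paper's actual algorithm:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def joint_loglik(X, Theta, A):
    """Bernoulli log-likelihood of responses X under P(X=1) = sigmoid(Theta A^T)."""
    p = np.clip(sigmoid(Theta @ A.T), 1e-9, 1 - 1e-9)
    return np.sum(X * np.log(p) + (1 - X) * np.log(1 - p))

def cjmle_step(X, Theta, A, step=0.01, c=5.0):
    """One alternating projected-gradient ascent update; rows clipped to norm c."""
    resid = X - sigmoid(Theta @ A.T)                 # N x J residual
    Theta = Theta + step * resid @ A                 # ascent step in Theta
    norms = np.maximum(np.linalg.norm(Theta, axis=1, keepdims=True), 1e-12)
    Theta *= np.minimum(1.0, c / norms)              # project rows onto norm ball
    resid = X - sigmoid(Theta @ A.T)
    A = A + step * resid.T @ Theta                   # ascent step in A
    norms = np.maximum(np.linalg.norm(A, axis=1, keepdims=True), 1e-12)
    A *= np.minimum(1.0, c / norms)
    return Theta, A
```

Iterating `cjmle_step` from a rough starting point increases the joint log-likelihood; the row-norm constraint is what gives the constrained estimator its notion of consistency in the double-asymptotic regime the abstract describes.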
... However, the number of parameters in the joint likelihood in Equation 18 grows with sample size N, and, as with continuous IRT models, joint maximum likelihood estimation leads to inconsistent parameter estimates (Baker & Kim 2004). Chiu et al. (2016) suggested initializing the examinees' attribute pattern estimates with nonparametric classifications of examinee mastery patterns to improve estimation stability. ...
Article
Diagnostic classification tests are designed to assess examinees’ discrete mastery status on a set of skills or attributes. Such tests have gained increasing attention in educational and psychological measurement. We review diagnostic classification models and their applications to testing and learning, discuss their statistical and machine learning connections and related challenges, and introduce some contemporary and future extensions. Expected final online publication date for the Annual Review of Statistics and Its Application, Volume 10 is March 2023. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.
... The Q-matrix contains three attributes: Morphosyntactic Form, Cohesive Form and Lexical Form. The data were also analyzed in Chiu et al. (2016) using a joint likelihood estimation approach. We again use DINA, GDINA and their partial mastery counterparts for fitting this dataset. ...
Preprint
Cognitive diagnosis models (CDMs) are a family of discrete latent attribute models that serve as the statistical basis of educational and psychological cognitive diagnosis assessments. CDMs aim to achieve fine-grained inference on individuals' latent attributes, based on their observed responses to a set of designed diagnostic items. In the literature, CDMs usually assume that items require mastery of specific latent attributes and that each attribute is either fully mastered or not mastered by a given subject. We propose a new class of models, partial mastery CDMs (PM-CDMs), that generalizes CDMs by allowing partial mastery levels for each attribute of interest. We demonstrate that PM-CDMs can be represented as restricted latent class models. Relying on the latent class representation, we propose a Bayesian approach for estimation. We present simulation studies to demonstrate parameter recovery, to investigate the impact of model misspecification with respect to partial mastery, and to develop diagnostic tools that practitioners could use to decide between CDMs and PM-CDMs. We use two examples of real test data -- the fraction subtraction and the English tests -- to demonstrate that employing PM-CDMs not only improves model fit compared to CDMs but also can make a substantial difference in conclusions about attribute mastery. We conclude that PM-CDMs can lead to more effective remediation programs by providing detailed individual-level information about skills learned and skills that still need to be learned.
... The regularity conditions and the consistency result are formally described in Theorem 1, with two special cases discussed in the sequel. Similar double asymptotic settings have been considered in psychometric research, including the analyses of unidimensional IRT models (Haberman, 1977, 2004) and diagnostic classification models (Chiu et al., 2016). The following regularity conditions are needed for our main result in Theorem 1. ...
Article
Full-text available
We revisit a singular value decomposition (SVD) algorithm given in Chen et al. (Psychometrika 84:124–146, 2019b) for exploratory item factor analysis (IFA). This algorithm estimates a multidimensional IFA model by SVD and was used to obtain a starting point for joint maximum likelihood estimation in Chen et al. (2019b). Thanks to the analytic and computational properties of SVD, this algorithm guarantees a unique solution and has computational advantage over other exploratory IFA methods. Its computational advantage becomes significant when the numbers of respondents, items, and factors are all large. This algorithm can be viewed as a generalization of principal component analysis to binary data. In this note, we provide the statistical underpinning of the algorithm. In particular, we show its statistical consistency under the same double asymptotic setting as in Chen et al. (2019b). We also demonstrate how this algorithm provides a scree plot for investigating the number of factors and provide its asymptotic theory. Further extensions of the algorithm are discussed. Finally, simulation studies suggest that the algorithm has good finite sample performance.
... The regularity conditions and the consistency result are formally described in Theorem 1, with two special cases discussed in the sequel. Similar double asymptotic settings have been considered in psychometric research, including the analysis of unidimensional IRT models (Haberman, 1977, 2004) and diagnostic classification models (Chiu et al., 2016). The following regularity conditions are needed for our main result in Theorem 1. ...
Preprint
In this note, we revisit a singular value decomposition (SVD) based algorithm that was given in Chen et al. (2019a) for obtaining an initial value for joint maximum likelihood estimation of exploratory Item Factor Analysis (IFA). This algorithm estimates a multidimensional item response theory model by SVD. Thanks to the computational efficiency and scalability of SVD, this algorithm has substantial computational advantage over other exploratory IFA algorithms, especially when the numbers of respondents, items, and latent dimensions are all large. Under the same double asymptotic setting and notion of consistency as in Chen et al. (2019a), we show that this simple algorithm provides a consistent estimator for the loading matrix up to a rotation. This result provides theoretical guarantee to the use of this simple algorithm for exploratory IFA.
Article
Items with the presence of differential item functioning (DIF) will compromise the validity and fairness of a test. Studies have investigated the DIF effect in the context of cognitive diagnostic assessment (CDA), and some DIF detection methods have been proposed. Most of these methods are mainly designed to perform the presence of DIF between two groups; however, empirical situations may contain more than two groups. To date, only a handful of studies have detected the DIF effect with multiple groups in the CDA context. This study uses the generalized logistic regression (GLR) method to detect DIF items by using the estimated attribute profile as matching criteria. A simulation study is conducted to examine the performance of the two GLR methods, GLR-based Wald test (GLR-Wald) and GLR-based likelihood ratio test (GLR-LRT), in detecting the DIF items, the results based on the ordinary Wald test are also reported. Results show that (1) both GLR-Wald and GLR-LRT have more reasonable performance in controlling Type I error rates than the ordinary Wald test in most conditions; (2) the GLR method also produces higher empirical rejection rates than the ordinary Wald test in most conditions; and (3) using the estimated attribute profile as the matching criteria can produce similar Type I error rates and empirical rejection rates for GLR-Wald and GLR-LRT. A real data example is also analyzed to illustrate the application of these DIF detection methods in multiple groups.
Article
A number of parametric and nonparametric methods for estimating cognitive diagnosis models (CDMs) have been developed and applied in a wide range of contexts. However, in the literature, a wide chasm exists between these two families of methods, and their relationship to each other is not well understood. In this paper, we propose a unified estimation framework to bridge the divide between parametric and nonparametric methods in cognitive diagnosis to better understand their relationship. We also develop iterative joint estimation algorithms and establish consistency properties within the proposed framework. Lastly, we present comprehensive simulation results to compare different methods and provide practical recommendations on the appropriate use of the proposed framework in various CDM contexts.
Article
In this study, a simulation-based method for computing joint maximum likelihood estimates of the reduced reparameterized unified model parameters is proposed. The central theme of the approach is to reduce the complexity of models to focus on their most critical elements. In particular, an approach analogous to joint maximum likelihood estimation is taken, and the latent attribute vectors are regarded as structural parameters rather than parameters to be removed by integration. With this approach, the joint distribution of the latent attributes does not have to be specified, which reduces the number of parameters in the model.
Article
Full-text available
This paper introduces the R package CDM for cognitive diagnosis models (CDMs). The package implements parameter estimation procedures for two general CDM frameworks, the generalized deterministic input noisy-and-gate (G-DINA) model and the general diagnostic model (GDM). It contains additional functions for analyzing data under these frameworks, like tools for simulating and plotting data, or for evaluating global model and item fit. The paper describes the theoretical aspects of the implemented CDM frameworks and illustrates the usage of the package with empirical data from the common fraction subtraction test of Tatsuoka (1984).
Article
Full-text available
Some usability and interpretability issues for single-strategy cognitive assessment models are considered. These models posit a stochastic conjunctive relationship between a set of cognitive attributes to be assessed and performance on particular items/tasks in the assessment. The models considered make few assumptions about the relationship between latent attributes and task performance beyond a simple conjunctive structure. An example shows that these models can be sensitive to cognitive attributes, even in data designed to well fit the Rasch model. Several stochastic ordering and monotonicity properties are considered that enhance the interpretability of the models. Simple data summaries are identified that inform about the presence or absence of cognitive attributes when the full computational power needed to estimate the models is not available.
Article
Full-text available
A definition of essential independence is proposed for sequences of polytomous items. For items satisfying the reasonable assumption that the expected amount of credit awarded increases with examinee ability, we develop a theory of essential unidimensionality which closely parallels that of Stout. Essentially unidimensional item sequences can be shown to have a unique (up to change of scale) dominant underlying trait, which can be consistently estimated by a monotone transformation of the sum of the item scores. In more general polytomous-response latent trait models (with or without ordered responses), an M-estimator based upon maximum likelihood may be shown to be consistent for the dominant trait under essentially unidimensional violations of local independence and a variety of monotonicity/identifiability conditions. A rigorous proof of this fact is given, and the standard error of the estimator is explored. These results suggest that ability estimation methods that rely on the summation form of the log-likelihood under local independence should generally be robust under essential independence, but standard errors may vary greatly from what is usually expected, depending on the degree of departure from local independence. An index of departure from local independence is also proposed.
Article
Full-text available
Cognitive diagnosis models are constrained (multiple classification) latent class models that characterize the relationship of questionnaire responses to a set of dichotomous latent variables. Having emanated from educational measurement, several aspects of such models seem well suited to use in psychological assessment and diagnosis. This article presents the development of a new cognitive diagnosis model for use in psychological assessment--the DINO (deterministic input; noisy "or" gate) model--which, as an illustrative example, is applied to evaluate and diagnose pathological gamblers. As part of this example, a demonstration of the estimates obtained by cognitive diagnosis models is provided. Such estimates include the probability an individual meets each of a set of dichotomous Diagnostic and Statistical Manual of Mental Disorders (text revision [DSM-IV-TR]; American Psychiatric Association, 2000) criteria, resulting in an estimate of the probability an individual meets the DSM-IV-TR definition for being a pathological gambler. Furthermore, a demonstration of how the hypothesized underlying factors contributing to pathological gambling can be measured with the DINO model is presented, through use of a covariance structure model for the tetrachoric correlation matrix of the dichotomous latent variables representing DSM-IV-TR criteria.
Book
The purpose of this book is to identify how educational tests, especially large-scale tests given to students in grades K-12, can be improved so that they produce better information about what students know and don't know. By consulting and integrating psychological research into the design of educational tests, it is now possible to create new test items that students understand better than old test items. Moreover, these new test items help identify where students may be experiencing difficulties in learning. © Cambridge University Press 2007 and Cambridge University Press, 2009.
Article
The usefulness of joint and conditional maximum likelihood is considered for the Rasch model under realistic testing conditions in which the number of examinees is very large and the number of items is relatively large. Conditions for consistency and asymptotic normality are explored, effects of model error are investigated, measures of prediction are estimated, and generalized residuals are developed.
Article
The Asymptotic Classification Theory of Cognitive Diagnosis (ACTCD) developed by Chiu, Douglas, and Li proved that for educational test data conforming to the Deterministic Input Noisy Output “AND” gate (DINA) model, the probability that hierarchical agglomerative cluster analysis (HACA) assigns examinees to their true proficiency classes approaches 1 as the number of test items increases. This article proves that the ACTCD also covers test data conforming to the Deterministic Input Noisy Output “OR” gate (DINO) model. It also demonstrates that an extension to the statistical framework of the ACTCD, originally developed for test data conforming to the Reduced Reparameterized Unified Model or the General Diagnostic Model (a) is valid also for both the DINA model and the DINO model and (b) substantially increases the accuracy of HACA in classifying examinees when the test data conform to either of these two models.
Article
In contrast to unidimensional item response models that postulate a single underlying proficiency, cognitive diagnosis models (CDMs) posit multiple, discrete skills or attributes, thus allowing CDMs to provide a finer-grained assessment of examinees' test performance. A common component of CDMs for specifying the attributes required for each item is the Q-matrix. Although construction of Q-matrix is typically performed by domain experts, it nonetheless, to a large extent, remains a subjective process, and misspecifications in the Q-matrix, if left unchecked, can have important practical implications. To address this concern, this paper proposes a discrimination index that can be used with a wide class of CDM subsumed by the generalized deterministic input, noisy "and" gate model to empirically validate the Q-matrix specifications by identifying and replacing misspecified entries in the Q-matrix. The rationale for using the index as the basis for a proposed validation method is provided in the form of mathematical proofs to several relevant lemmas and a theorem. The feasibility of the proposed method was examined using simulated data generated under various conditions. The proposed method is illustrated using fraction subtraction data.
Article
Diagnostic classification models (DCMs) are psychometric models widely discussed by researchers nowadays because of their promising feature of obtaining detailed information on students' mastery on specific attributes. Model estimation is essential for further implementation of these models, and estimation methods are often developed within some general framework, such as generalized diagnostic model (GDM) of von Davier, the log-linear diagnostic classification model (LDCM), and the generalized deterministic input, noisy-and-gate (G-DINA). Using a maximum likelihood estimation algorithm, this article addresses the estimation issue of a complex compensatory DCM, the reduced reparameterized unified model (rRUM), whose estimation under general frameworks could be lengthy due to the complexity of the model. The proposed estimation method is demonstrated on simulated data as well as a real data set, and is shown to provide accurate item parameter estimates for the rRUM.
Article
This paper presents the development of the fusion model skills diagnosis system (fusion model system), which can help integrate standardized testing into the learning process with both skills-level examinee parameters for modeling examinee skill mastery and skills-level item parameters, giving information about the diagnostic power of the test. The development of the fusion model system involves advancements in modeling, parameter estimation, model-fitting methods, and model-fit evaluation procedures, which are described in detail in the paper. To document the accuracy of the estimation procedure and the effectiveness of the model-fitting and model-fit evaluation procedures, this paper also presents a series of simulation studies. Special attention is given to evaluating the robustness of the fusion model system to violations of various modeling assumptions. The results demonstrate that the fusion model system is a promising tool for skills diagnosis that merits further research and development.
Article
A cognitive item response theory model called the attribute hierarchy method (AHM) is introduced and illustrated. This method represents a variation of Tatsuoka's rule-space approach. The AHM is designed explicitly to link cognitive theory and psychometric practice to facilitate the development and analyses of educational and psychological tests. The following are described: cognitive properties of the AHM; psychometric properties of the AHM, as well as a demonstration of how the AHM differs from Tatsuoka's rule-space approach; and application of the AHM to the domain of syllogistic reasoning to illustrate how this approach can be used to evaluate the cognitive competencies required in a higher-level thinking task. Future directions for research are also outlined.
Article
Diagnostic classification models (aka cognitive or skills diagnosis models) have shown great promise for evaluating mastery on a multidimensional profile of skills as assessed through examinee responses, but continued development and application of these models has been hindered by a lack of readily available software. In this article we demonstrate how diagnostic classification models may be estimated as confirmatory latent class models using Mplus, thus bridging the gap between the technical presentation of these models and their practical use for assessment in research and applied settings. Using a sample English test of three grammatical skills, we describe how diagnostic classification models can be phrased as latent class models within Mplus and how to obtain the syntax and output needed for estimation and interpretation of the model parameters. We also have written a freely available SAS program that can be used to automatically generate the Mplus syntax. We hope this work will ultimately result in greater access to diagnostic classification models throughout the testing community, from researchers to practitioners.
Book
A revision will be coming out in the next few months.
Article
Although latent attributes that follow a hierarchical structure are anticipated in many areas of educational and psychological assessment, current psychometric models are limited in their capacity to objectively evaluate the presence of such attribute hierarchies. This paper introduces the Hierarchical Diagnostic Classification Model (HDCM), which adapts the Log-linear Cognitive Diagnosis Model to cases where attribute hierarchies are present. The utility of the HDCM is demonstrated through simulation and by an empirical example. Simulation study results show the HDCM is efficiently estimated and can statistically test for the presence of an attribute hierarchy, a feature not possible with more commonly used DCMs. Empirically, the HDCM is used to test for the presence of a suspected attribute hierarchy in a test of English grammar, confirming that the data are more adequately represented by a hierarchical attribute structure than by a crossed, or nonhierarchical, structure.
Article
Latent class models for cognitive diagnosis have been developed to classify examinees into one of the 2^K attribute profiles arising from a K-dimensional vector of binary skill indicators. These models recognize that response patterns tend to deviate from the ideal responses that would arise if skills and items generated item responses through a purely deterministic conjunctive process. An alternative to employing these latent class models is to minimize the distance between observed item response patterns and ideal response patterns, in a nonparametric fashion that utilizes no stochastic terms for these deviations. Theorems are presented that show the consistency of this approach, when the true model is one of several common latent class models for cognitive diagnosis. Consistency of classification is independent of sample size, because no model parameters need to be estimated. Simultaneous consistency for a large group of subjects can also be shown given some conditions on how sample size and test length grow with one another.
Article
A trend in educational testing is to go beyond unidimensional scoring and provide a more complete profile of skills that have been mastered and those that have not. To achieve this, cognitive diagnosis models have been developed that can be viewed as restricted latent class models. Diagnosis of class membership is the statistical objective of these models. As an alternative to latent class modeling, a nonparametric procedure is introduced that only requires specification of an item-by-attribute association matrix, and classifies according to minimizing a distance measure between observed responses, and the ideal response for a given attribute profile that would be implied by the item-by-attribute association matrix. This procedure requires no statistical parameter estimation, and can be used on a sample size as small as 1. Heuristic arguments are given for why the nonparametric procedure should be effective under various possible cognitive diagnosis models for data generation. Simulation studies compare classification rates with parametric models, and consider a variety of distance measures, data generation models, and the effects of model misspecification. A real data example is provided with an analysis of agreement between the nonparametric method and parametric approaches.
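The distance-minimization classifier described above can be sketched in a few lines. The conjunctive (DINA-style) ideal response rule and all function names below are illustrative, and the paper also considers other distance measures:

```python
from itertools import product

def ideal_response(alpha, q_row):
    # Conjunctive (DINA-style) ideal response: 1 only if the examinee
    # masters every attribute the item requires.
    return int(all(a >= q for a, q in zip(alpha, q_row)))

def classify(responses, Q):
    # Return the attribute profile whose ideal response pattern has the
    # smallest Hamming distance to the observed responses.
    K = len(Q[0])
    best, best_dist = None, None
    for alpha in product([0, 1], repeat=K):
        eta = [ideal_response(alpha, q_row) for q_row in Q]
        dist = sum(r != e for r, e in zip(responses, eta))
        if best_dist is None or dist < best_dist:
            best, best_dist = alpha, dist
    return best

Q = [[1, 0], [0, 1], [1, 1]]        # 3 items, 2 attributes
print(classify([1, 0, 0], Q))       # -> (1, 0): masters attribute 1 only
```

Because no parameters are estimated, this works even for a single examinee, which is the sample-size-of-one property highlighted in the abstract.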
Article
This paper is divided into two main sections. The first half of the paper focuses on the intent and practice of diagnostic assessment, providing a general organizing scheme for a diagnostic assessment implementation process, from design to scoring. The discussion includes specific concrete examples throughout, as well as summaries of data studies as appropriate. The second half of the paper focuses on one critical component of the implementation process: the specification of an appropriate psychometric model. It includes the presentation of a general form for the models as an interaction of knowledge structure with item structure, a review of each of a variety of selected models, separate detailed summaries of knowledge structure modeling and item structure modeling, and lastly some summarizing and concluding remarks. To make the scope manageable, this part is restricted to models for dichotomously scored items. Throughout the paper, practical advice is given about how to apply and implement the ideas and principles discussed.
Article
This paper introduces a probabilistic approach to the classification and diagnosis of erroneous rules of operations that result from misconceptions ("bugs") in a procedural domain of arithmetic. The model is different from the usual deterministic strategies common in the field of artificial intelligence because variability of response errors is explicitly treated through item response theory. As a concrete example, we analyze a dataset that reflects the use of erroneous rules of operation in problems of signed-number subtraction. The same approach, however, is applicable to the classification of several different groups of response patterns caused by a variety of different underlying misconceptions, different backgrounds of knowledge, or treatment.
Article
Descriptions are presented of two related probabilistic models that can be used for making classification decisions with respect to mastery of specific concepts or skills. Included are the development of procedures for: (a) assessing the adequacy of "fit" provided by the models; (b) identifying optimal decision rules for mastery classification; and (c) identifying minimally sufficient numbers of items necessary to obtain acceptable levels of misclassification.
Article
In cognitive diagnosis, the test-taking behavior of some examinees may be idiosyncratic so that their test scores may not reflect their true cognitive abilities as much as that of more typical examinees. Statistical tests are developed to recognize the following: (a) nonmasters of the required attributes who correctly answer the item (spuriously high scores) and (b) masters of the attributes who fail to correctly answer the item (spuriously low scores). For a person, nonzero probability of aberrant behavior is tested as the alternative hypothesis, against normal behavior as the null hypothesis. The two generalized likelihood ratio test statistics used, with the null hypothesis parameter on the boundary of the parameter space in each, have asymptotic distributions of a 50:50 mixture of a chi-square distribution with one degree of freedom and a degenerate distribution that is a constant of 0 under the null hypothesis. Simulation results, primarily based on the DINA model (deterministic inputs, noisy "AND" gate), are used to investigate the following: (a) how accurately the statistical tests identify normal/aberrant behaviors, (b) how the power of the tests depends on the length of the cognitive exam and the degree of the inclination toward aberrance, and (c) how sensitive the tests are to inaccurate estimation of model parameters.
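The 50:50 boundary mixture described above gives a simple closed-form p-value: for a statistic T > 0, the p-value is half the chi-square(1) tail probability. A hypothetical helper, using only the standard library and the identity P(chi2_1 > t) = erfc(sqrt(t/2)):

```python
import math

def mixture_chi2_pvalue(lr_stat):
    # Null distribution: 50:50 mixture of a point mass at 0 and a
    # chi-square with 1 df, so for T > 0 the p-value is half the
    # chi2(1) tail; for T = 0 the p-value is 1.
    if lr_stat <= 0:
        return 1.0
    return 0.5 * math.erfc(math.sqrt(lr_stat / 2))

# The chi2(1) critical value 2.706 (p = 0.10) corresponds to p ~ 0.05
# under the mixture, i.e. the boundary test is less conservative.
print(round(mixture_chi2_pvalue(2.706), 3))
```

This halving is why using the plain chi-square(1) reference distribution on a boundary hypothesis would be unduly conservative.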
Article
Cognitive diagnostic assessment traditionally has been performed through the use of specialized tests designed for this purpose. These tests are traditionally analyzed by straightforward statistical methods. However, within psychometrics, recently a much different picture of cognitive diagnosis has emerged in which statistical models are employed to obtain cognitive diagnosis information from conventional assessments not designed for cognitive diagnosis purposes. This chapter examines whether use of these statistical models accomplishes its purpose. The challenge to cognitive diagnostic models arises when a single test designed to measure a single proficiency is analyzed in an attempt to extract more information. Such design leads to a test relatively well described by the use of conventional univariate item response theory. To be sure, such an item response model does not provide a perfect description of the data, but the room for improvement appears to be modest. In addition, some causes of model variation are much more prosaic than the causes considered by cognitive diagnostic models.
Article
This study uses the rule-space methodology to explore the cognitive and linguistic attributes that underlie performance on an open-ended, short-answer, listening comprehension test. This is the first application of the methodology to language testing. The rule-space methodology is an adaptation of statistical pattern recognition techniques applied to the problem of diagnosing the cognitive attributes (knowledge, skills, abilities, strategies, etc.) underlying test performance. The methodology provides diagnostic information about the individual test-takers on each of these attributes. Based on a literature search, attribute candidates were identified which seemed likely to explain performance on the listening test. Two rule-space analyses were carried out, and the final attribute list had 15 attributes, with 14 interactions. With this, 96% of the test-takers were successfully classified into their respective latent knowledge states, and given scores on each attribute. This result suggests that the rule-space methodology can be used to identify attributes underlying performance on language tests. The study also provided useful information about the listening construct.
Article
Upper bounds are derived for the probability that the sum S of n independent random variables exceeds its mean ES by a positive number nt. It is assumed that the range of each summand of S is bounded or bounded above. The bounds for Pr {S – ES ≥ nt} depend only on the endpoints of the ranges of the summands and the mean, or the mean and the variance of S. These results are then used to obtain analogous inequalities for certain sums of dependent random variables such as U statistics and the sum of a random sample without replacement from a finite population.
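The best-known special case of these bounds, for n independent summands each in [0, 1], is Pr{S - ES >= nt} <= exp(-2nt^2). The simulation below (fair coin flips, illustrative parameters) checks numerically that the empirical tail stays under the bound:

```python
import math
import random

def hoeffding_bound(n, t):
    # Pr{ S/n - E[S/n] >= t } <= exp(-2 n t^2) for summands in [0, 1].
    return math.exp(-2 * n * t * t)

random.seed(0)
n, t, trials = 50, 0.2, 10_000
p = 0.5  # fair coin flips; each summand lies in {0, 1}
exceed = sum(
    (sum(random.random() < p for _ in range(n)) / n - p) >= t
    for _ in range(trials)
)
# The empirical exceedance rate should fall below exp(-4) ~ 0.018.
print(exceed / trials, "<=", hoeffding_bound(n, t))
```

The bound is loose here (the exact binomial tail is much smaller), which is typical: its value lies in holding uniformly for any bounded summands, not in tightness.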
Article
This paper uses log-linear models with latent variables (Hagenaars, in Loglinear Models with Latent Variables, 1993) to define a family of cognitive diagnosis models. In doing so, the relationship between many common models is explicitly defined and discussed. In addition, because the log-linear model with latent variables is a general model for cognitive diagnosis, new alternatives to modeling the functional relationship between attribute mastery and the probability of a correct response are discussed. Keywords: cognitive diagnosis models, log-linear latent class models, latent class models
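For a single item requiring two attributes, the log-linear parameterization amounts to a logistic function of an intercept, attribute main effects, and an interaction. A minimal sketch, with illustrative (not estimated) parameter values:

```python
import math

def lcdm_prob_2att(a1, a2, lam0, lam1, lam2, lam12):
    # P(X = 1 | alpha) = logistic(lam0 + lam1*a1 + lam2*a2 + lam12*a1*a2)
    # for binary attribute indicators a1, a2 of a two-attribute item.
    logit = lam0 + lam1 * a1 + lam2 * a2 + lam12 * a1 * a2
    return 1 / (1 + math.exp(-logit))

# Nonmasters sit near the item's floor, full masters near its ceiling.
print(round(lcdm_prob_2att(0, 0, -2.0, 1.5, 1.5, 1.0), 3))  # -> 0.119
print(round(lcdm_prob_2att(1, 1, -2.0, 1.5, 1.5, 1.0), 3))  # -> 0.881
```

Constraining these lambda terms (e.g. zeroing the main effects) yields the familiar submodels that the paper places within the log-linear family.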
Article
The G-DINA (generalized deterministic inputs, noisy “and” gate) model is a generalization of the DINA model with more relaxed assumptions. In its saturated form, the G-DINA model is equivalent to other general models for cognitive diagnosis based on alternative link functions. When appropriate constraints are applied, several commonly used cognitive diagnosis models (CDMs) can be shown to be special cases of the general models. In addition to model formulation, the G-DINA model as a general CDM framework includes a component for item-by-item model estimation based on design and weight matrices, and a component for item-by-item model comparison based on the Wald test. The paper illustrates the estimation and application of the G-DINA model as a framework using real and simulated data. It concludes by discussing several potential implications of and relevant issues concerning the proposed framework. Keywords: cognitive diagnosis, DINA, MMLE, parameter estimation, Wald test, model comparison
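In its saturated two-attribute form with the identity link, the G-DINA item response function is a plain sum of effect terms, and the DINA constraint mentioned above corresponds to zeroing the main effects. A sketch with illustrative values:

```python
def gdina_prob_2att(a1, a2, d0, d1, d2, d12):
    # Saturated G-DINA, identity link, item requiring two attributes:
    # P(X = 1 | alpha) = d0 + d1*a1 + d2*a2 + d12*a1*a2
    return d0 + d1 * a1 + d2 * a2 + d12 * a1 * a2

# Constraining d1 = d2 = 0 recovers DINA: d0 plays the role of guessing,
# and d0 + d12 equals 1 minus the slip parameter for full masters.
g, s = 0.2, 0.1
print(gdina_prob_2att(1, 0, g, 0.0, 0.0, (1 - s) - g))  # missing attribute 2
print(gdina_prob_2att(1, 1, g, 0.0, 0.0, (1 - s) - g))  # masters both
```

Other link functions (logit, log) in the same framework yield the log-linear and log-odds general models the abstract refers to as equivalent saturated forms.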
Article
Latent class models for cognitive diagnosis often begin with specification of a matrix that indicates which attributes or skills are needed for each item. Then by imposing restrictions that take this into account, along with a theory governing how subjects interact with items, parametric formulations of item response functions are derived and fitted. Cluster analysis provides an alternative approach that does not require specifying an item response model, but does require an item-by-attribute matrix. After summarizing the data with a particular vector of sum-scores, K-means cluster analysis or hierarchical agglomerative cluster analysis can be applied with the purpose of clustering subjects who possess the same skills. Asymptotic classification accuracy results are given, along with simulations comparing effects of test length and method of clustering. An application to a language examination is provided to illustrate how the methods can be implemented in practice.
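The attribute-wise sum-score summary used before clustering is simply the response matrix multiplied by the Q-matrix. A minimal sketch with illustrative data (the clustering step itself, e.g. K-means on the W vectors, would follow):

```python
def sum_score_vectors(R, Q):
    # W[i][k] = number of items requiring attribute k that examinee i
    # answered correctly; these W vectors are what get clustered.
    n, J, K = len(R), len(Q), len(Q[0])
    return [
        [sum(R[i][j] * Q[j][k] for j in range(J)) for k in range(K)]
        for i in range(n)
    ]

Q = [[1, 0], [0, 1], [1, 1]]       # 3 items, 2 attributes
R = [[1, 0, 1], [0, 1, 0]]         # 2 examinees' item responses
print(sum_score_vectors(R, Q))     # -> [[2, 1], [0, 1]]
```

Examinees who possess the same skills tend to have similar W vectors, which is what makes clustering on this summary a model-free route to classification.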
Article
BUGS is a software package for Bayesian inference using Gibbs sampling. The software has been instrumental in raising awareness of Bayesian modelling among both academic and commercial communities internationally, and has enjoyed considerable success over its 20-year life span. Despite this, the software has a number of shortcomings and a principal aim of this paper is to provide a balanced critical appraisal, in particular highlighting how various ideas have led to unprecedented flexibility while at the same time producing negative side effects. We also present a historical overview of the BUGS project and some future perspectives.
Article
Probabilistic models with one or more latent variables are designed to report on a corresponding number of skills or cognitive attributes. Multidimensional skill profiles offer additional information beyond what a single test score can provide, if the reported skills can be identified and distinguished reliably. Many recent approaches to skill profile models are limited to dichotomous data and have made use of computationally intensive estimation methods such as Markov chain Monte Carlo, since standard maximum likelihood (ML) estimation techniques were deemed infeasible. This paper presents a general diagnostic model (GDM) that can be estimated with standard ML techniques and applies to polytomous response variables as well as to skills with two or more proficiency levels. The paper uses one member of a larger class of diagnostic models, a compensatory diagnostic model for dichotomous and partial credit data. Many well-known models, such as univariate and multivariate versions of the Rasch model and the two-parameter logistic item response theory model, the generalized partial credit model, as well as a variety of skill profile models, are special cases of this GDM. In addition to an introduction to this model, the paper presents a parameter recovery study using simulated data and an application to real data from the field test for TOEFL Internet-based testing.
NPCD: Nonparametric methods for cognitive diagnosis
  • Y Zheng
  • C.-Y Chiu
The attribute hierarchy model: An approach for integrating cognitive theory with assessment practice
  • J P Leighton
  • M J Gierl
  • S Hunka
A Bayesian framework for the Unified Model for assessing cognitive abilities: Blending theory with practicality. Doctoral dissertation
  • S M Hartz
Latent GOLD user’s guide
  • J K Vermunt
  • J Magidson
Large-scale language assessment using cognitive diagnosis models. Paper presented at the annual meeting of the National Council on Measurement in Education
  • R A Henson
  • J Templin
Diagnostic measurement: Theory, methods, and applications
  • A A Rupp
  • J L Templin
  • R A Henson
A general empirical method of Q-matrix validation. Paper presented at the annual meeting of the National Council on Measurement in Education
  • J De La Torre
  • C.-Y Chiu