Chapter

Cognitive Diagnosis Modeling Using the GDINA R Package

Abstract

The GDINA R package (Ma and de la Torre, GDINA: The generalized DINA model framework. R package version 2.3.2. Retrieved from https://CRAN.R-project.org/package=GDINA: 2019) provides psychometric tools for estimating a range of cognitive diagnosis models (CDMs) and conducting various CDM analyses. The package is developed in the R programming environment (R Core Team, R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna. https://www.R-project.org/: 2018). This chapter describes the main features of the package and presents an exemplary analysis of data to illustrate the use of the package.
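As a minimal sketch of the package's core usage (assuming the GDINA package has been installed from CRAN; `sim10GDINA` is a simulated dataset shipped with the package), a G-DINA calibration takes a response matrix and a Q-matrix:

```r
# Minimal G-DINA calibration sketch with the GDINA package.
# Assumes install.packages("GDINA") has been run; sim10GDINA is a
# simulated dataset distributed with the package.
library(GDINA)

dat <- sim10GDINA$simdat   # N x J dichotomous response matrix
Q   <- sim10GDINA$simQ     # J x K item-by-attribute Q-matrix

fit <- GDINA(dat = dat, Q = Q, model = "GDINA")

summary(fit)               # estimation summary
coef(fit, what = "delta")  # delta (item) parameters
head(personparm(fit))      # examinee attribute estimates (EAP)
```

The `model` argument also accepts reduced models such as "DINA", "DINO", and "ACDM", and can be a vector to specify a different model for each item.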

... To illustrate a CDM analysis, we selected the grammar section of the Examination for the Certificate of Proficiency in English (ECPE), which has been used in several previous studies [9,36,42]. The dataset contains dichotomous responses of 2922 students to 28 items, reflecting their mastery of three grammar rules (attributes): morphosyntactic rules (A1), cohesive rules (A2), and lexical rules (A3). ...
... The first item on the x-axis and the last on the y-axis were dropped for pairing items. The adjusted p-values of all item pairs are plotted in the lower-right shaded area, where adequately fitted item pairs are shown in grey (p > 0.05) and inadequately fitted item pairs in different tones of red (p < 0.05), depending on the p-value [42]. In our case, some item pairs (e.g., items 9 and 10, and items 13 and 22) demonstrated significant misfit and thus warrant further exploration by domain experts. ...
Article
Full-text available
Cognitive diagnosis models (CDMs) have increasingly been applied in education and other fields. This article provides an overview of a widely used CDM, namely, the G-DINA model, and demonstrates a hands-on example of using multiple R packages for a series of CDM analyses. This overview involves a step-by-step illustration and explanation of performing Q-matrix evaluation, CDM calibration, model fit evaluation, item diagnosticity investigation, classification reliability examination, and the result presentation and visualization. Some limitations of conducting CDM analysis in R are also discussed.
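The analysis steps enumerated in this overview can be sketched with the GDINA package alone; the function names below are from that package, and the simulated `sim10GDINA` data stand in for a real dataset:

```r
library(GDINA)

dat <- sim10GDINA$simdat
Q   <- sim10GDINA$simQ

# 1. Calibrate the saturated G-DINA model
fit <- GDINA(dat, Q, model = "GDINA")

# 2. Empirical Q-matrix evaluation
qv <- Qval(fit)
extract(qv, what = "sug.Q")   # suggested Q-matrix

# 3. Absolute model-data fit and item-level fit
modelfit(fit)                 # M2, RMSEA2, SRMSR
itemfit(fit)                  # residual-based item fit statistics

# 4. Examinee classification
head(personparm(fit, what = "MAP"))
```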
... Suppose this hypothetical item intended to measure only the "Prevention" attribute (identified as 000010), but a "hidden" cognitive task was necessary to solve the item: diagnosing hypertension. In this case, the Q-matrix validation procedure would have correctly identified the "hidden" cognitive process, suggesting the latent class 001010 for this item in the final Q-matrix with the inclusion of the attribute "Clinical diagnosis" (de la Torre & Akbay, 2019; de la Torre & Minchen, 2014; Ma, 2019). ...
Article
Full-text available
Criticisms about psychometric paradigms currently used in healthcare professions education include claims of reductionism, objectification, and poor compliance with assumptions. Nevertheless, perhaps the most crucial criticism comes from learners' difficulty in interpreting and making meaningful use of summative scores and the potentially detrimental impact these scores have on learners. The term "post-psychometric era" has become popular, despite persisting calls for the sensible use of modern psychometrics. In recent years, cognitive diagnostic modelling has emerged as a new psychometric paradigm capable of providing meaningful diagnostic feedback. Cognitive diagnostic modelling allows the classification of examinees in multiple cognitive attributes. This measurement is obtained by modelling these attributes as categorical, discrete latent variables. Furthermore, items can reflect more than one latent variable simultaneously. The interactions between latent variables can be modelled with flexibility, allowing a unique perspective on complex cognitive processes. These characteristic features of cognitive diagnostic modelling enable diagnostic classification over a large number of constructs of interest, preventing the necessity of providing numerical scores as feedback to test takers. This paper provides an overview of cognitive diagnostic modelling, including an introduction to its foundations and illustrating potential applications, to help teachers be involved in developing and evaluating assessment tools used in healthcare professions education. Cognitive diagnosis may represent a revolutionary new psychometric paradigm, overcoming the known limitations found in frequently used psychometric approaches, offering the possibility of robust qualitative feedback and better alignment with competency-based curricula and modern programmatic assessment frameworks.
Article
To diagnose the English as a Foreign Language (EFL) reading ability of Chinese high-school students, the study explored how an educational theory, the revised taxonomy of educational objectives, could be used to create the attribute list. Q-matrices were proposed and refined qualitatively and quantitatively. The final Q-matrix specified the relationship between 53 reading items and 9 cognitive attributes. Thereafter, 978 examinees’ responses were calibrated by cognitive diagnosis models (CDMs) to explore their strengths and weaknesses in EFL reading. Results showed strengths and weaknesses on the 9 attributes of the sampled population, examinees at three proficiency levels and individual learners. A diagnostic score report was also developed to communicate multi-layered information to various stakeholders. The goodness of fit of the selected CDM was evaluated from multiple measures. The results provide empirical evidence for the utility of educational theories in cognitive diagnosis, and the feasibility of retrofitting non-diagnostic tests for diagnostic purposes in language testing. In addition, the study also demonstrates procedures of model selection and a post-hoc approach of model verification in language diagnosis.
Article
Full-text available
Cognitive diagnosis models (CDMs) have attracted increasing attention in educational measurement because of their potential to provide diagnostic feedback about students' strengths and weaknesses. This article introduces the feature-rich R package GDINA for conducting a variety of CDM analyses. Built upon a general model framework, a number of CDMs can be calibrated using the GDINA package. Functions are also available for evaluating model-data fit, detecting differential item functioning, validating the item and attribute association, and examining classification accuracy. A graphical user interface is also provided for researchers who are less familiar with R. This paper contains both technical details about model estimation and illustrations of how to use the package for data analysis. The GDINA package is also used to replicate published results, showing that it can provide comparable model parameter estimates.
Article
Full-text available
As a core component of most cognitive diagnosis models, the Q‐matrix, or item and attribute association matrix, is typically developed by domain experts, and tends to be subjective. It is critical to validate the Q‐matrix empirically because a misspecified Q‐matrix could result in erroneous attribute estimation. Most existing Q‐matrix validation procedures are developed for dichotomous responses. However, in this paper, we propose a method to empirically detect and correct the misspecifications in the Q‐matrix for graded response data based on the sequential generalized deterministic inputs, noisy ‘and’ gate (G‐DINA) model. The proposed Q‐matrix validation procedure is implemented in a stepwise manner based on the Wald test and an effect size measure. The feasibility of the proposed method is examined using simulation studies. Also, a set of data from the Trends in International Mathematics and Science Study (TIMSS) 2011 mathematics assessment is analysed for illustration.
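In the GDINA package, this validation for graded responses is reached through the same `GDINA()`/`Qval()` pair with `sequential = TRUE`; `resp` and `Qc` below are hypothetical placeholders for a graded response matrix and a category-level Q-matrix:

```r
library(GDINA)

# resp: N x J graded response matrix (hypothetical placeholder)
# Qc:   category-level Q-matrix whose first two columns index item
#       and category, followed by the attribute columns
#       (hypothetical placeholder)
fit.seq <- GDINA(dat = resp, Q = Qc, model = "GDINA", sequential = TRUE)

# Empirical Q-matrix validation on the sequential model
qv.seq <- Qval(fit.seq)
qv.seq
```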
Article
Full-text available
Research related to the fit evaluation at the item level involving cognitive diagnosis models (CDMs) has been scarce. According to the parsimony principle, balancing goodness of fit against model complexity is necessary. General CDMs require a larger sample size to be estimated reliably, and can lead to worse attribute classification accuracy than the appropriate reduced models when the sample size is small and the item quality is poor, which is typically the case in many empirical applications. The main purpose of this study was to systematically examine the statistical properties of four inferential item-fit statistics: S-X2, the likelihood ratio (LR) test, the Wald (W) test, and the Lagrange multiplier (LM) test. To evaluate the performance of the statistics, a comprehensive set of factors, namely, sample size, correlational structure, test length, item quality, and generating model, is systematically manipulated using Monte Carlo methods. Results show that the S-X2 statistic has unacceptable power. Type I error and power comparisons favor LR and W tests over the LM test. However, all the statistics are highly affected by the item quality. With a few exceptions, their performance is only acceptable when the item quality is high. In some cases, this effect can be ameliorated by an increase in sample size and test length. This implies that using the above statistics to assess item fit in practical settings when the item quality is low remains a challenge.
Code
Full-text available
R package for a set of cognitive diagnosis models. See https://github.com/Wenchao-Ma/GDINA
Article
Full-text available
The article provides an overview of goodness-of-fit assessment methods for item response theory (IRT) models. It is now possible to obtain accurate p-values of the overall fit of the model if bivariate information statistics are used. Several alternative approaches are described. As the validity of inferences drawn on the fitted model depends on the magnitude of the misfit, if the model is rejected it is necessary to assess the goodness of approximation. With this aim in mind, a class of root mean squared error of approximation (RMSEA) is described, which makes it possible to test whether the model misfit is below a specific cutoff value. Also, regardless of the outcome of the overall goodness-of-fit assessment, a piecewise assessment of fit should be performed to detect parts of the model whose fit can be improved. A number of statistics for this purpose are described, including a z statistic for residual means, a mean-and-variance correction to Pearson's X2 statistic applied to each bivariate subtable separately, and the use of z statistics for residual cross-products. Item response theory (IRT) modeling involves fitting a latent variable model to discrete responses obtained from questionnaire/test items intended to measure educational achievement, personality, attitudes, and so on. As in any other modeling endeavor, after an IRT model has been fitted, it is necessary to quantify the discrepancy between the model and the data (i.e., the absolute goodness-of-fit of the model). A goodness-of-fit (GOF) index summarizes the discrepancy between the values observed in the data and the values expected under a statistical model. A goodness-of-fit statistic is a GOF index with a known sampling distribution. As such, a GOF statistic may be used to test the hypothesis of whether the fitted model could be the data-generating model.
Article
Full-text available
The R statistical environment and language has demonstrated particular strengths for interactive development of statistical algorithms, as well as data modelling and visualisation. Its current implementation has an interpreter at its core which may result in a performance penalty in comparison to directly executing user algorithms in the native machine code of the host CPU. In contrast, the C++ language has no built-in visualisation capabilities, handling of linear algebra or even basic statistical algorithms; however, user programs are converted to high-performance machine code, ahead of execution. A new method avoids possible speed penalties in R by using the Rcpp extension package in conjunction with the Armadillo C++ matrix library. In addition to the inherent performance advantages of compiled code, Armadillo provides an easy-to-use template-based meta-programming framework, allowing the automatic pooling of several linear algebra operations into one, which in turn can lead to further speedups. With the aid of Rcpp and Armadillo, conversion of linear algebra centred algorithms from R to C++ becomes straightforward. The algorithms retain the overall structure as well as readability, all while maintaining a bidirectional link with the host R environment. Empirical timing comparisons of R and C++ implementations of a Kalman filtering algorithm indicate a speedup of several orders of magnitude.
Article
Full-text available
This article used the Wald test to evaluate the item-level fit of a saturated cognitive diagnosis model (CDM) relative to the fits of the reduced models it subsumes. A simulation study was carried out to examine the Type I error and power of the Wald test in the context of the G-DINA model. Results show that when the sample size is small and a larger number of attributes are required, the Type I error rate of the Wald test for the DINA and DINO models can be higher than the nominal significance levels, while the Type I error rate of the A-CDM is closer to the nominal significance levels. However, with larger sample sizes, the Type I error rates for the three models are closer to the nominal significance levels. In addition, the Wald test has excellent statistical power to detect when the true underlying model is none of the reduced models examined even for relatively small sample sizes. The performance of the Wald test was also examined with real data. With an increasing number of CDMs from which to choose, this article provides an important contribution toward advancing the use of CDMs in practical educational settings.
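This item-level Wald comparison is implemented in the GDINA package as `modelcomp()`; a sketch on the package's simulated data:

```r
library(GDINA)

fit <- GDINA(sim10GDINA$simdat, sim10GDINA$simQ, model = "GDINA")

# Wald test of whether each item measuring two or more attributes can
# be reduced from the saturated G-DINA model to DINA, DINO, or A-CDM
# without a significant loss in fit
wald <- modelcomp(fit)
wald
```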
Article
Full-text available
In cognitive diagnosis, the test-taking behavior of some examinees may be idiosyncratic so that their test scores may not reflect their true cognitive abilities as much as that of more typical examinees. Statistical tests are developed to recognize the following: (a) nonmasters of the required attributes who correctly answer the item (spuriously high scores) and (b) masters of the attributes who fail to correctly answer the item (spuriously low scores). For a person, nonzero probability of aberrant behavior is tested as the alternative hypothesis, against normal behavior as the null hypothesis. The two generalized likelihood ratio test statistics used, with the null hypothesis parameter on the boundary of the parameter space in each, have asymptotic distributions of a 50:50 mixture of a chi-square distribution with one degree of freedom and a degenerate distribution that is a constant of 0 under the null hypothesis. Simulation results, primarily based on the DINA model (deterministic inputs, noisy ‘‘AND’’ gate), are used to investigate the following: (a) how accurately the statistical tests identify normal/aberrant behaviors, (b) how the power of the tests depends on the length of the cognitive exam and the degree of the inclination toward aberrance, and (c) how sensitive the tests are to inaccurate estimation of model parameters.
Article
Full-text available
The likelihood ratio test statistic G2(dif) is widely used for comparing the fit of nested models in categorical data analysis. In large samples, this statistic is distributed as a chi-square with degrees of freedom equal to the difference in degrees of freedom between the tested models, but only if the least restrictive model is correctly specified. Yet, this statistic is often used in applications without assessing the adequacy of the least restrictive model. This may result in incorrect substantive conclusions as the above large sample reference distribution for G2(dif) is no longer appropriate. Rather, its large sample distribution will depend on the degree of model misspecification of the least restrictive model. To illustrate this, a simulation study is performed where this statistic is used to compare nested item response theory models under various degrees of misspecification of the least restrictive model. G2(dif) was found to be robust only under small model misspecification of the least restrictive model. Consequently, we argue that some indication of the absolute goodness of fit of the least restrictive model is needed before employing G2(dif) to assess relative model fit.
Article
Full-text available
The Rcpp package simplifies integrating C++ code with R. It provides a consistent C++ class hierarchy that maps various types of R objects (vectors, matrices, functions, environments, . . . ) to dedicated C++ classes. Object interchange between R and C++ is managed by simple, flexible and extensible concepts which include broad support for C++ Standard Template Library idioms. C++ code can both be compiled, linked and loaded on the fly, or added via packages. Flexible error and exception code handling is provided. Rcpp substantially lowers the barrier for programmers wanting to combine C++ code with R.
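A toy illustration of the on-the-fly compilation described above, via `Rcpp::cppFunction()` (the vector-sum function is made up for the example):

```r
library(Rcpp)

# cppFunction() compiles, links, and loads a C++ function from a
# string; the result is callable like an ordinary R function.
cppFunction('
double vecSum(NumericVector x) {
  double total = 0.0;
  for (int i = 0; i < x.size(); ++i) total += x[i];
  return total;
}
')

vecSum(c(1, 2, 3.5))   # returns 6.5
```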
Article
Full-text available
Cognitive diagnosis models are constrained (multiple classification) latent class models that characterize the relationship of questionnaire responses to a set of dichotomous latent variables. Having emanated from educational measurement, several aspects of such models seem well suited to use in psychological assessment and diagnosis. This article presents the development of a new cognitive diagnosis model for use in psychological assessment--the DINO (deterministic input; noisy "or" gate) model--which, as an illustrative example, is applied to evaluate and diagnose pathological gamblers. As part of this example, a demonstration of the estimates obtained by cognitive diagnosis models is provided. Such estimates include the probability an individual meets each of a set of dichotomous Diagnostic and Statistical Manual of Mental Disorders (text revision [DSM-IV-TR]; American Psychiatric Association, 2000) criteria, resulting in an estimate of the probability an individual meets the DSM-IV-TR definition for being a pathological gambler. Furthermore, a demonstration of how the hypothesized underlying factors contributing to pathological gambling can be measured with the DINO model is presented, through use of a covariance structure model for the tetrachoric correlation matrix of the dichotomous latent variables representing DSM-IV-TR criteria.
Article
Constructed‐response items have been shown to be appropriate for cognitively diagnostic assessments because students’ problem‐solving procedures can be observed, providing direct evidence for making inferences about their proficiency. However, multiple strategies used by students make item scoring and psychometric analyses challenging. This study introduces the so‐called two‐digit scoring scheme into diagnostic assessments to record both students’ partial credits and their strategies. This study also proposes a diagnostic tree model (DTM) by integrating the cognitive diagnosis models with the tree model to analyse the items scored using the two‐digit rubrics. Both convergent and divergent tree structures are considered to accommodate various scoring rules. The MMLE/EM algorithm is used for item parameter estimation of the DTM, and has been shown to provide good parameter recovery under varied conditions in a simulation study. A set of data from TIMSS 2007 mathematics assessment is analysed to illustrate the use of the two‐digit scoring scheme and the DTM.
Article
Cognitive diagnosis models (CDMs) are an increasingly popular method to assess mastery or nonmastery of a set of fine-grained abilities in educational or psychological assessments. Several inference techniques are available to quantify the uncertainty of model parameter estimates, to compare different versions of CDMs, or to check model assumptions. However, they require a precise estimation of the standard errors (or the entire covariance matrix) of the model parameter estimates. In this article, it is shown analytically that the currently widely used form of calculation leads to underestimated standard errors because it only includes the item parameters but omits the parameters for the ability distribution. In a simulation study, we demonstrate that including those parameters in the computation of the covariance matrix consistently improves the quality of the standard errors. The practical importance of this finding is discussed and illustrated using a real data example.
Article
There has been an increase of interest in psychometric models referred to as cognitive diagnosis models. A critical concern is selecting the most appropriate model. Several tests for model comparison have been employed, which include the likelihood ratio (LR) and the Wald (W) tests. Although the LR test is relatively more robust than the W test, the current implementation of the LR test is very time consuming given that it requires calibrating many different models and comparing them to the general model. In this article, we introduce an approximation to the LR test based on a two-step estimation procedure under the generalized deterministic inputs, noisy "and" gate model framework: the two-step LR test (2LR). The 2LR test is shown to perform very similarly to the LR test. This approximation only requires calibration of the more general model, so this statistic may be easily applied in empirical research. Keywords: cognitive diagnosis models, model comparison, item fit, Type I error, power
Article
This paper proposes a general polytomous cognitive diagnosis model for a special type of graded responses, where item categories are attained in a sequential manner, and associated with some attributes explicitly. To relate categories to attributes, a category-level Q-matrix is used. When the attribute and category association is specified a priori, the proposed model has the flexibility to allow different cognitive processes (e.g., conjunctive, disjunctive) to be modelled at different categories within a single item. This model can be extended for items where categories cannot be explicitly linked to attributes, and for items with unordered categories. The feasibility of the proposed model is examined using simulated data. The proposed model is illustrated using the data from the Trends in International Mathematics and Science Study 2007 assessment.
Article
Traditionally, teachers evaluate students’ abilities via their total test scores. Recently, cognitive diagnostic models (CDMs) have begun to provide information about the presence or absence of students’ skills or misconceptions. Nevertheless, CDMs are typically applied to tests with multiple-choice (MC) items, which provide less diagnostic information than constructed-response (CR) items. This paper introduces new CDMs for tests with both MC and CR items, and illustrates how to use them to analyse MC and CR data, and thus, identify students’ skills and misconceptions in a mathematics domain. Analyses of real data, the responses of 497 sixth-grade students randomly selected from four Taiwanese primary schools to eight direct proportion items, were conducted to demonstrate the application of the new models. The results show that the new models can better determine students’ skills and misconceptions, in that they have higher inter-rater agreement rates than traditional CDMs.
Article
Selecting the most appropriate cognitive diagnosis model (CDM) for an item is a challenging process. Although general CDMs provide better model-data fit, specific CDMs have more straightforward interpretations, are more stable, and can provide more accurate classifications when used correctly. Recently, the Wald test has been proposed to determine at the item level whether a general CDM can be replaced by specific CDMs without a significant loss in model-data fit. The current study examines the practical consequence of the test by evaluating whether the attribute-vector classification based on CDMs selected by the Wald test is better than that based on general CDMs. Although the Wald test can detect the true underlying model for certain CDMs, it is yet unclear how effective it is at distinguishing among the wider range of CDMs found in the literature. This study investigates the relative similarity of the various CDMs through the use of the newly developed dissimilarity index, and explores the implications for the Wald test. Simulations show that the Wald test cannot distinguish among additive models due to their inherent similarity, but this does not impede the ability of the test to provide higher correct classification rates than general CDMs, particularly when the sample size is small and items are of low quality. An empirical example is included to demonstrate the viability of the procedure.
Article
The fit of cognitive diagnostic models (CDMs) to response data needs to be evaluated, since CDMs might yield misleading results when they do not fit the data well. The limited-information statistic M2 and the associated root mean square error of approximation (RMSEA2) in item factor analysis were extended to evaluate the fit of CDMs. The findings suggested that the M2 statistic has proper empirical Type I error rates and good statistical power, and it could be used as a general statistical tool. More importantly, we found that there was a strong linear relationship between mean marginal misclassification rates and RMSEA2 when there was model–data misfit. The evidence demonstrated that .030 and .045 could be reasonable thresholds for excellent and good fit, respectively, under the saturated log-linear cognitive diagnosis model.
Article
In contrast to unidimensional item response models that postulate a single underlying proficiency, cognitive diagnosis models (CDMs) posit multiple, discrete skills or attributes, thus allowing CDMs to provide a finer-grained assessment of examinees' test performance. A common component of CDMs for specifying the attributes required for each item is the Q-matrix. Although construction of Q-matrix is typically performed by domain experts, it nonetheless, to a large extent, remains a subjective process, and misspecifications in the Q-matrix, if left unchecked, can have important practical implications. To address this concern, this paper proposes a discrimination index that can be used with a wide class of CDM subsumed by the generalized deterministic input, noisy "and" gate model to empirically validate the Q-matrix specifications by identifying and replacing misspecified entries in the Q-matrix. The rationale for using the index as the basis for a proposed validation method is provided in the form of mathematical proofs to several relevant lemmas and a theorem. The feasibility of the proposed method was examined using simulated data generated under various conditions. The proposed method is illustrated using fraction subtraction data.
Article
Diagnostic models combine multiple binary latent variables in an attempt to produce a latent structure that provides more information about test takers' performance than do unidimensional latent variable models. Recent developments in diagnostic modeling emphasize the possibility that multiple skills may interact in a conjunctive way within the item function, while individual skills still may retain separable additive effects. This extension of either the conjunctive deterministic-input-noisy-and (DINA) model to the generalized version (G-DINA) or the compensatory/additive general diagnostic model (GDM) to the log-linear cognitive diagnostic model (LCDM) is aimed at integrating models with conjunctive skills and those that assume compensatory functioning of multiple skill variables. More recently, a result was proven mathematically that the fully conjunctive DINA model, which combines all required skills in a single binary function, may be recast as a compensatory special case of the GDM. This can be accomplished in more than one form such that the resulting transformed skill-space definitions and design (Q) matrices are different from each other but mathematically equivalent to the DINA model, producing identical model-based response probabilities. In this report, I extend this equivalency result to the LCDM and show that a mathematically equivalent, constrained GDM can be defined that yields identical parameter estimates based on a transformed set of compensatory skills.
Article
As with any psychometric models, the validity of inferences from cognitive diagnosis models (CDMs) determines the extent to which these models can be useful. For inferences from CDMs to be valid, it is crucial that the fit of the model to the data is ascertained. Based on a simulation study, this study investigated the sensitivity of various fit statistics for absolute or relative fit under different CDM settings. The investigation covered various types of model–data misfit that can occur with the misspecifications of the Q-matrix, the CDM, or both. Six fit statistics were considered: –2 log likelihood (–2LL), Akaike's information criterion (AIC), Bayesian information criterion (BIC), and residuals based on the proportion correct of individual items (p), the correlations (r), and the log-odds ratio of item pairs (l). An empirical example involving real data was used to illustrate how the different fit statistics can be employed in conjunction with each other to identify different types of misspecifications. With these statistics and the saturated model serving as the basis, relative and absolute fit evaluation can be integrated to detect misspecification efficiently.
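Several of these statistics are directly available for fitted GDINA objects; a hedged sketch comparing a saturated and a reduced model on the package's simulated data:

```r
library(GDINA)

dat <- sim10GDINA$simdat
Q   <- sim10GDINA$simQ

fit.sat  <- GDINA(dat, Q, model = "GDINA")  # saturated model
fit.dina <- GDINA(dat, Q, model = "DINA")   # reduced model

# Relative fit: information criteria (smaller is better)
c(AIC = AIC(fit.dina), BIC = BIC(fit.dina))
c(AIC = AIC(fit.sat),  BIC = BIC(fit.sat))

# Likelihood ratio comparison of the nested models
anova(fit.sat, fit.dina)

# Absolute fit of the saturated model
modelfit(fit.sat)
```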
Article
A cognitive item response theory model called the attribute hierarchy method (AHM) is introduced and illustrated. This method represents a variation of Tatsuoka's rule-space approach. The AHM is designed explicitly to link cognitive theory and psychometric practice to facilitate the development and analyses of educational and psychological tests. The following are described: cognitive properties of the AHM; psychometric properties of the AHM, as well as a demonstration of how the AHM differs from Tatsuoka's rule-space approach; and application of the AHM to the domain of syllogistic reasoning to illustrate how this approach can be used to evaluate the cognitive competencies required in a higher-level thinking task. Future directions for research are also outlined.
Article
Diagnostic classification models (aka cognitive or skills diagnosis models) have shown great promise for evaluating mastery on a multidimensional profile of skills as assessed through examinee responses, but continued development and application of these models has been hindered by a lack of readily available software. In this article we demonstrate how diagnostic classification models may be estimated as confirmatory latent class models using Mplus, thus bridging the gap between the technical presentation of these models and their practical use for assessment in research and applied settings. Using a sample English test of three grammatical skills, we describe how diagnostic classification models can be phrased as latent class models within Mplus and how to obtain the syntax and output needed for estimation and interpretation of the model parameters. We also have written a freely available SAS program that can be used to automatically generate the Mplus syntax. We hope this work will ultimately result in greater access to diagnostic classification models throughout the testing community, from researchers to practitioners.
Article
Analyzing examinees’ responses using cognitive diagnostic models (CDMs) has the advantage of providing diagnostic information. To ensure the validity of the results from these models, differential item functioning (DIF) in CDMs needs to be investigated. In this article, the Wald test is proposed to examine DIF in the context of CDMs. This study explored the effectiveness of the Wald test in detecting both uniform and nonuniform DIF in the DINA model through a simulation study. Results of this study suggest that for relatively discriminating items, the Wald test had Type I error rates close to the nominal level. Moreover, its viability was underscored by the medium to high power rates for most investigated DIF types when DIF size was large. Furthermore, the performance of the Wald test in detecting uniform DIF was compared to that of the traditional Mantel-Haenszel (MH) and SIBTEST procedures. The results of the comparison study showed that the Wald test was comparable to or outperformed the MH and SIBTEST procedures. Finally, the strengths and limitations of the proposed method and suggestions for future studies are discussed.
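The Wald-based DIF procedure described here is implemented in the GDINA package's `dif()` function. A hedged sketch, in which `dat`, `Q`, and the two-group membership vector `group` are placeholders rather than data from the study:

```r
library(GDINA)

# Illustrative sketch: Wald test for DIF under the DINA model.
dat   <- sim10GDINA$simdat
Q     <- sim10GDINA$simQ
group <- rep(c("Reference", "Focal"), length.out = nrow(dat))

dif.out <- dif(dat, Q, group = group, model = "DINA", method = "wald")
dif.out  # Wald statistics and p-values per item
```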
Article
Although latent attributes that follow a hierarchical structure are anticipated in many areas of educational and psychological assessment, current psychometric models are limited in their capacity to objectively evaluate the presence of such attribute hierarchies. This paper introduces the Hierarchical Diagnostic Classification Model (HDCM), which adapts the Log-linear Cognitive Diagnosis Model to cases where attribute hierarchies are present. The utility of the HDCM is demonstrated through simulation and by an empirical example. Simulation study results show the HDCM is efficiently estimated and can accurately test for the presence of an attribute hierarchy statistically, a feature not possible when using more commonly used DCMs. Empirically, the HDCM is used to test for the presence of a suspected attribute hierarchy in a test of English grammar, confirming that the data are more adequately represented by a hierarchical attribute structure than by a crossed, or nonhierarchical, structure.
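Attribute hierarchies of the kind the HDCM targets can also be imposed in the GDINA package by restricting the permissible latent classes. A sketch under the assumption of a linear hierarchy A1 → A2 → A3 (the hierarchy itself is hypothetical, chosen only for illustration):

```r
library(GDINA)

# Illustrative sketch: attribute 1 is prerequisite to attribute 2,
# and attribute 2 to attribute 3.
hier <- list(c(1, 2), c(2, 3))

# Inspect the permissible attribute profiles implied by the hierarchy
att.structure(hier, K = 3)

# Estimate a model over the structured attribute space
# (dat and Q are placeholders for a response matrix and a 3-attribute Q-matrix)
fit.h <- GDINA(dat, Q, model = "GDINA", att.str = hier, verbose = 0)
```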
Article
Polytomous attributes, particularly those defined as part of the test development process, can provide additional diagnostic information. The present research proposes the polytomous generalized deterministic inputs, noisy, “and” gate (pG-DINA) model to accommodate such attributes. The pG-DINA model allows input from substantive experts to specify attribute levels and is a general model that subsumes various reduced models. In addition to model formulation, the authors evaluate the viability of the proposed model by examining how well the model parameters can be estimated under various conditions, and compare its classification accuracy against that of the conventional G-DINA model with a modified classification rule. A real-data example is used to illustrate the application of the model in practice.
Article
Xu and von Davier (2006) demonstrated the feasibility of using the general diagnostic model (GDM) to analyze National Assessment of Educational Progress (NAEP) proficiency data. Their work showed that the GDM analysis not only led to conclusions for gender and race groups similar to those published in the NAEP Report Card, but also allowed flexibility in estimating multidimensional skills simultaneously. However, Xu and von Davier noticed that estimating the latent skill distributions will be much more challenging with this model when there is a large number of subgroups to estimate. To make the GDM more applicable to NAEP data analysis, which requires a fairly large subgroups analysis, this study developed a log-linear model to reduce the number of parameters in the latent skill distribution without sacrificing the accuracy of inferences. This paper describes such a model and applies the model in the analysis of NAEP reading assessments for 2003 and 2005. The comparisons between using this model and the unstructured model were made through the use of various results, such as the differences between item parameter estimates and the differences between estimated latent class distributions. The results in general show that using the log-linear model is efficient.
Article
Cognitive diagnosis models have received much attention in the recent psychometric literature because of their potential to provide examinees with information regarding multiple fine-grained discretely defined skills, or attributes. This article discusses the issue of methods of examinee classification for cognitive diagnosis models, which are special cases of restricted latent class models. Specifically, the maximum likelihood estimation and maximum a posteriori classification methods are compared with the expected a posteriori method. A simulation study using the Deterministic Input, Noisy-And model is used to assess the classification accuracy of the methods using various criteria.
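The three classification methods compared in that study (MLE, MAP, and EAP) are all exposed by the GDINA package's `personparm()` function. A minimal sketch using the package's simulated data:

```r
library(GDINA)

# Illustrative sketch: extract attribute classifications under three rules
fit <- GDINA(sim10GDINA$simdat, sim10GDINA$simQ, model = "DINA", verbose = 0)

head(personparm(fit, what = "EAP"))  # expected a posteriori
head(personparm(fit, what = "MAP"))  # maximum a posteriori
head(personparm(fit, what = "MLE"))  # maximum likelihood
```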
Article
This paper presents a new method for using certain restricted latent class models, referred to as binary skills models, to determine the skills required by a set of test items. The method is applied to reading achievement data from a nationally representative sample of fourth-grade students and offers useful perspectives on test structure and examinee ability, distinct from those provided by other methods of analysis. Models fitted to small, overlapping sets of items are integrated into a common skill map, and the nature of each skill is then inferred from the characteristics of the items for which it is required. The reading comprehension items examined conform closely to a unidimensional scale with six discrete skill levels that range from an inability to comprehend or match isolated words in a reading passage to the abilities required to integrate passage content with general knowledge and to recognize the main ideas of the most difficult passages on the test.
Article
This paper uses log-linear models with latent variables (Hagenaars, in Loglinear Models with Latent Variables, 1993) to define a family of cognitive diagnosis models. In doing so, the relationship between many common models is explicitly defined and discussed. In addition, because the log-linear model with latent variables is a general model for cognitive diagnosis, new alternatives to modeling the functional relationship between attribute mastery and the probability of a correct response are discussed. Keywords: cognitive diagnosis models, log-linear latent class models, latent class models
Article
Maximum likelihood estimation of item parameters in the marginal distribution, integrating over the distribution of ability, becomes practical when computing procedures based on an EM algorithm are used. By characterizing the ability distribution empirically, arbitrary assumptions about its form are avoided. The EM procedure is shown to apply to general item-response models lacking simple sufficient statistics for ability. This includes models with more than one latent dimension.
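The marginal ML / EM idea can be sketched in a few lines of base R for a simple latent class model with binary items. All names below are ours, and this is an illustration of the E- and M-steps only, not the paper's algorithm:

```r
# Minimal base-R sketch of marginal ML via EM for a C-class latent class
# model with binary items (X is an N-by-J 0/1 response matrix).
em_lca <- function(X, C = 2, iters = 200) {
  N <- nrow(X); J <- ncol(X)
  pi_c <- rep(1 / C, C)                     # class proportions
  p <- matrix(runif(C * J, .2, .8), C, J)   # item success probs per class
  for (it in seq_len(iters)) {
    # E-step: posterior class membership for each respondent
    loglik <- sapply(seq_len(C), function(c)
      X %*% log(p[c, ]) + (1 - X) %*% log(1 - p[c, ]) + log(pi_c[c]))
    post <- exp(loglik - apply(loglik, 1, max))
    post <- post / rowSums(post)
    # M-step: update proportions and item parameters from expected counts
    pi_c <- colMeans(post)
    p <- t(post) %*% X / colSums(post)
    p <- pmin(pmax(p, 1e-4), 1 - 1e-4)      # keep probabilities in (0, 1)
  }
  list(class.prob = pi_c, item.prob = p)
}
```

Replacing the unconstrained class-specific item probabilities with a CDM's constrained item response function turns this generic EM into the marginal ML estimation used by packages such as GDINA.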
Article
Higher-order latent traits are proposed for specifying the joint distribution of binary attributes in models for cognitive diagnosis. This approach results in a parsimonious model for the joint distribution of a high-dimensional attribute vector that is natural in many situations when specific cognitive information is sought but a less informative item response model would be a reasonable alternative. This approach stems from viewing the attributes as the specific knowledge required for examination performance, and modeling these attributes as arising from a broadly-defined latent trait resembling the θ of item response models. In this way a relatively simple model for the joint distribution of the attributes results, which is based on a plausible model for the relationship between general aptitude and specific knowledge. Markov chain Monte Carlo algorithms for parameter estimation are given for selected response distributions, and simulation results are presented to examine the performance of the algorithm as well as the sensitivity of classification to model misspecification. An analysis of fraction subtraction data is provided as an example.
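A higher-order attribute distribution of this kind is available in the GDINA package through the `att.dist` argument. A hedged sketch, assuming a Rasch-type higher-order model:

```r
library(GDINA)

# Illustrative sketch: DINA with a higher-order latent trait linking the
# attributes, rather than a saturated joint attribute distribution.
fit.ho <- GDINA(sim10GDINA$simdat, sim10GDINA$simQ, model = "DINA",
                att.dist = "higher.order",
                higher.order = list(model = "Rasch"), verbose = 0)

# Structural (higher-order) parameters of the fitted model
coef(fit.ho, what = "lambda")
```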
Article
This paper presents a new class of models for persons-by-items data. The essential new feature of this class is the representation of the persons: every person is represented by its membership to multiple latent classes, each of which belongs to one latent classification. The models can be considered as a formalization of the hypothesis that the responses come about in a process that involves the application of a number of mental operations. Two algorithms for maximum likelihood (ML) and maximum a posteriori (MAP) estimation are described. They both make use of the tractability of the complete data likelihood to maximize the observed data likelihood. Properties of the MAP estimators (i.e., uniqueness and goodness-of-recovery) and the existence of asymptotic standard errors were examined in a simulation study. Then, one of these models is applied to the responses to a set of fraction addition problems. Finally, the models are compared to some related models in the literature.
Article
Probabilistic models with one or more latent variables are designed to report on a corresponding number of skills or cognitive attributes. Multidimensional skill profiles offer additional information beyond what a single test score can provide, if the reported skills can be identified and distinguished reliably. Many recent approaches to skill profile models are limited to dichotomous data and have made use of computationally intensive estimation methods such as Markov chain Monte Carlo, since standard maximum likelihood (ML) estimation techniques were deemed infeasible. This paper presents a general diagnostic model (GDM) that can be estimated with standard ML techniques and applies to polytomous response variables as well as to skills with two or more proficiency levels. The paper uses one member of a larger class of diagnostic models, a compensatory diagnostic model for dichotomous and partial credit data. Many well-known models, such as univariate and multivariate versions of the Rasch model and the two-parameter logistic item response theory model, the generalized partial credit model, as well as a variety of skill profile models, are special cases of this GDM. In addition to an introduction to this model, the paper presents a parameter recovery study using simulated data and an application to real data from the field test for TOEFL Internet-based testing.
Chang, W., Cheng, J., Allaire, J., Xie, Y., & McPherson, J. (2017). Shiny: Web application framework for R. R package version 1.0.5, Retrieved from https://CRAN.R-project.org/ package=shiny.
de la Torre, J., & Ma, W. (2016, August). Cognitive diagnosis modeling: A general framework approach and its implementation in R. Short course presented at the Fourth Conference on Statistical Methods in Psychometrics, Columbia University, New York.
Hartz, S. M. (2002). A Bayesian framework for the Unified Model for assessing cognitive abilities: Blending theory with practicality. (Unpublished doctoral dissertation). University of Illinois, Urbana-Champaign.