Article

Full-information item factor analysis


Abstract

Describes a method of item factor analysis based on Thurstone's multiple-factor model and implemented by marginal maximum likelihood estimation and the EM algorithm. The statistical significance of successive factors added to the model is tested by the likelihood ratio criterion. Provisions for the effects of guessing on multiple-choice items, and for omitted and not-reached items, are included. Bayes constraints on the factor loadings were found to be necessary to suppress Heywood cases. Applications to simulated and real data are presented to substantiate the accuracy and practical utility of the method.


... As opposed to between-item multidimensional models, within-item multidimensional frameworks employ methods that take full advantage of the information in the data, rather than relying on limited summary information. These models are therefore called "full information" models (Bock, Gibbons & Muraki, 1988), since they are based on an individual's pattern of responses rather than on the correlational structure of the multivariate latent response distribution (Wirth & Edwards, 2007). Mokken (1971), as cited in Sijtsma and Molenaar (2002), and Van Abswoude, Van der Ark, and Sijtsma (2004) recommended that sub-factors be interpreted in terms of the items loading on them: sub-factors with three or more loading items are retained in a scale, while factors with fewer than three loading items are deleted. ...
... From the preceding, multiple methods have been suggested for testing unidimensionality in developed countries. They include tests of essential dimensionality (Nandakumar & Stout, 1993; Stout, 1987), the bootstrap modified parallel analysis test, full-information item factor analysis (Bock, Gibbons & Muraki, 1988), exploratory factor analysis (EFA) of tetrachoric correlations (Knol & Berger, 1991), confirmatory factor analysis of tetrachoric correlations with robust weighted least squares estimation (Muthén, 1993), non-linear factor analysis (McDonald, 1962; 1967), and many others. Similarly, Hattie, Krakowski, Rogers, and Swaminathan (1996) conducted a simulation study to evaluate the dependability of Stout's unidimensionality index as used in his DIMTEST procedure. ...
... MIRT is typically used in two different settings, namely exploratory and confirmatory. In the confirmatory setting, the relationships between the responses and the latent traits are usually pre-specified by prior knowledge (McKinley 1989; Janssen and De Boeck 1999), and item parameters can be estimated by various methods, including the marginal maximum likelihood method (Bock and Aitkin 1981; Bock et al. 1988) and Bayesian estimation (Béguin and Glas 2001). However, misspecification of the item-trait relationships in the confirmatory analysis may lead to serious model lack of fit and, consequently, erroneous assessment (da Silva et al. 2019). ...
... Note that the conditional expectations in Q_0, Q_j and E_j do not have closed-form solutions. They are usually approximated using Gauss-Hermite (GH) quadrature (Bock and Aitkin 1981; Bock et al. 1988), Monte Carlo integration (Meng and Schilling 1996), or adaptive GH quadrature (Schilling and Bock 2005). In this paper, the GH quadrature is applied, and the approximations of Q_0, Q_j and E_j can be expressed using the "artificial data" of the IRT literature. ...
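The Gauss-Hermite quadrature referenced above can be sketched in a few lines. This is a generic illustration of approximating an expectation under a standard normal latent density, not the cited authors' code; the 2PL item parameters `a` and `b` are illustrative assumptions.

```python
import numpy as np

# Gauss-Hermite nodes/weights approximate integrals of the form
#   ∫ e^{-x^2} f(x) dx ≈ Σ_k w_k f(x_k).
# For an expectation under a standard normal latent density, substitute
# theta = sqrt(2) * x, giving
#   E[g(theta)] = (1/sqrt(pi)) Σ_k w_k g(sqrt(2) * x_k).

def gh_expectation(g, n_nodes=21):
    """Approximate E[g(theta)] for theta ~ N(0, 1) by GH quadrature."""
    x, w = np.polynomial.hermite.hermgauss(n_nodes)
    return (w * g(np.sqrt(2.0) * x)).sum() / np.sqrt(np.pi)

# Example: marginal probability of a correct response under a 2PL item
# with (illustrative) discrimination a = 1.2 and difficulty b = 0.5.
a, b = 1.2, 0.5
p_bar = gh_expectation(lambda t: 1.0 / (1.0 + np.exp(-a * (t - b))))

# Sanity check: E[theta^2] = 1 for a standard normal.
second_moment = gh_expectation(lambda t: t ** 2)
```

With 21 nodes the rule is exact for polynomial integrands up to degree 41, which is why the second-moment check recovers 1 to machine precision.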
Article
Full-text available
In this paper, we propose a generalized expectation model selection (GEMS) algorithm for latent variable selection in multidimensional item response theory (MIRT) models, which are commonly used for identifying the relationships between latent traits and test items. Under some mild assumptions, we prove the numerical convergence of GEMS for model selection by minimizing the generalized information criteria of observed data in the presence of missing data. For latent variable selection in the multidimensional two-parameter logistic (M2PL) models, we present an efficient implementation of GEMS to minimize the Bayesian information criterion. To ensure parameter identifiability, the variances of all latent traits are assumed to be unity and each latent trait is required to have an item exclusively associated with it. The convergence of GEMS for the M2PL models is verified. Simulation studies show that GEMS is computationally more efficient than the expectation model selection (EMS) algorithm and the expectation-maximization-based L1-penalized method (EML1), and that it yields a higher correct rate of latent variable selection and lower mean squared error of parameter estimates than EMS and EML1. The GEMS algorithm is illustrated by analyzing a real dataset related to the Eysenck Personality Questionnaire.
... IRT models are particularly appropriate methodological tools in this context, as they allow us to study the relational structure of single red flags (i.e., their dimensionality) and can be applied when one has no prior information about the number and composition of the subdimensions of the phenomenon under scrutiny. In the IRT case, the dimensionality structure can be estimated empirically by comparing nested models and rotating the ascertained solutions to seek more interpretable structures (Bock & Aitkin, 1981; Bock et al., 1988; Chalmers, 2012). ...
... The model estimation procedure using maximum likelihood methods is detailed in Bock et al. (1988) and implemented in the R package "mirt" (Chalmers, 2012). ...
Article
Full-text available
The Agenda 2030 recognises corruption as a major obstacle to sustainable development and integrates its reduction among SDG targets, in view of developing peaceful, just and strong institutions. In this paper, we propose a method to assess the validity of corruption indicators within an Item Response Theory framework, which explicitly accounts for the latent and multidimensional facet of corruption. Towards this main aim, a set of fifteen red flag indicators of corruption risk in public procurement is computed on data included in the Italian National Database of Public Contracts. Results show a multidimensional structure composed of sub-groups of red flag indicators i. measuring distinct corruption risk categories, which differ in nature, type and entity, and are generally non-superimposable; ii. mirroring distinct dynamics related to specific SDG principles and targets.
... In most MIRT models, it is conventional to prespecify the relationships between the items and the latent traits using prior knowledge. Various methods for item parameter estimation, including marginal maximum likelihood estimation (Bock, Gibbons, & Muraki, 1988) and Bayesian estimation (Béguin & Glas, 2001), have been proposed. However, misspecification of the item-trait relationships in the confirmatory analysis may lead to serious model lack of fit and, consequently, erroneous parameter estimation (Jin & Wang, 2014; da Silva, Liu, Huggins-Manley, & Bazán, 2019). ...
... A conventional approach to this problem is exploratory item factor analysis (IFA; Bock et al., 1988), in which the misfit caused by erroneous item-trait prespecification can be avoided. IFA aims to estimate the entire set of item-trait relationships freely, yielding a factor loading matrix that is identified up to rotation only when appropriate constraints (e.g., fixing the covariance matrix of the latent traits to the identity matrix) are imposed (Browne, 2001; Clarkson & Jennrich, 1988). ...
Preprint
Full-text available
The aim of latent variable selection in multidimensional item response theory (MIRT) models is to identify the latent traits probed by the items of a multidimensional test. In this paper, the expectation model selection (EMS) algorithm proposed by Jiang et al. (2015) is applied to minimize the Bayesian information criterion (BIC) for latent variable selection in MIRT models with a known number of latent traits. Under mild assumptions, we prove the numerical convergence of the EMS algorithm for model selection by minimizing the BIC of observed data in the presence of missing data. For the identification of MIRT models, we assume that the variances of all latent traits are unity and that each latent trait has an item related only to it. Under this identifiability assumption, the convergence of the EMS algorithm for latent variable selection in the multidimensional two-parameter logistic (M2PL) models can be verified. We give an efficient implementation of the EMS for the M2PL models. Simulation studies show that the EMS outperforms EM-based L1 regularization in terms of correctly selected latent variables and computation time. The EMS algorithm is applied to a real data set related to the Eysenck Personality Questionnaire.
... The factorial analyses most often found in the literature are exploratory and confirmatory. However, for dichotomous responses these approaches present some mathematical limitations, resolved with the approach described by Bock and Aitkin (1981) and Bock, Gibbons and Muraki (1988), in which the treatment of dichotomous items and the estimation of factor weights are carried out through a technique called full-information factor analysis, based on item response theory. According to Reeve (2002), the assumption of unidimensionality can be examined by comparing the first and second eigenvalues of the tetrachoric correlation matrix. ...
... Full-information factor analysis (Bock et al., 1988), run in the R software (R Development Core Team, 2011), was used to verify the dimensionality of the construct. Reckase's (1979) threshold of 20% of the total variance explained by the first factor was used to assess the unidimensionality of the construct. ...
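The eigenvalue-based unidimensionality checks mentioned in the excerpts above (Reckase's 20%-of-variance threshold and the comparison of the first and second eigenvalues) can be sketched as follows. The correlation matrix here is a small illustrative stand-in for an estimated tetrachoric matrix, not data from any of the cited studies.

```python
import numpy as np

# Illustrative stand-in for an estimated tetrachoric correlation matrix.
R = np.array([
    [1.00, 0.55, 0.50, 0.45],
    [0.55, 1.00, 0.52, 0.48],
    [0.50, 0.52, 1.00, 0.46],
    [0.45, 0.48, 0.46, 1.00],
])

# Eigenvalues in descending order; their sum equals the number of items.
eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]

# Reckase (1979): the first factor should account for at least 20% of the
# total variance for the test to be treated as unidimensional.
prop_first = eigvals[0] / eigvals.sum()

# Reeve (2002): compare the first and second eigenvalues; a large ratio
# suggests a single dominant dimension.
ratio_first_second = eigvals[0] / eigvals[1]
```

For this strongly one-factor matrix, the first eigenvalue accounts for well over 20% of the total variance and dwarfs the second.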
Conference Paper
Full-text available
The aim of this paper is to measure the effectiveness of Information and Communication Technology (ICT) organizations, from the managers' point of view, using Item Response Theory (IRT). This kind of organization is normally associated with complex, dynamic, and competitive environments, and it is necessary to verify its effectiveness. In the academic literature, the concept of organizational effectiveness and its measurement is surrounded by disagreement. Based on the dimensions of effectiveness, a construct was elaborated: a questionnaire, which was submitted for evaluation by specialists. The results show, based on the degree of difficulty, that managers tend to agree more with concerns about innovation, items 11 (-2.653) and 14 (-3.149), than with items 6 (-1.222) and 15 (-0.324), relating to society and the environment. The construct proved feasible for measuring the organizational effectiveness of ICT companies, from the managers' point of view, using the Two-Parameter Logistic Model (2PLM) of IRT. This model permits evaluation of the quality and properties of each item, placing items and respondents on a single scale, which is not possible with other similar tools.
... Both the primary and hold-out samples were analysed the same way, starting with a smoothed Spearman correlation matrix of the 220 observed variables in each sample. Matrices were smoothed using eigenvalue decomposition (Bock et al., 1988;Wothke, 1993) and the smoothed correlation matrices were nearly identical to the unsmoothed correlation matrices (Pearson's rs > .999 and Spearman's rhos > .999). ...
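Eigenvalue-decomposition smoothing of the kind referenced above can be sketched as follows. This is a generic minimal version (clip negative eigenvalues, reconstruct, rescale to unit diagonal), not the specific procedure used in the cited study; the indefinite matrix below is an illustrative assumption.

```python
import numpy as np

def smooth_corr(R, floor=1e-6):
    """Smooth a non-positive-definite correlation matrix.

    Clip eigenvalues below `floor`, reconstruct the matrix, and rescale
    to restore a unit diagonal (one common variant; details differ
    across implementations, e.g., Wothke, 1993).
    """
    vals, vecs = np.linalg.eigh(R)
    vals = np.clip(vals, floor, None)        # enforce positive eigenvalues
    S = vecs @ np.diag(vals) @ vecs.T        # reconstruct
    d = np.sqrt(np.diag(S))
    return S / np.outer(d, d)                # restore unit diagonal

# An indefinite "pseudo-correlation" matrix (illustrative): its pattern of
# pairwise correlations is internally inconsistent, so one eigenvalue is
# negative, as often happens with sample tetrachoric matrices.
R_bad = np.array([
    [1.0, 0.9, 0.1],
    [0.9, 1.0, 0.9],
    [0.1, 0.9, 1.0],
])
R_ok = smooth_corr(R_bad)
```

The rescaling step is a congruence transformation, so positive definiteness gained by clipping is preserved while the diagonal returns to exactly 1.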
Preprint
Full-text available
In this study, we reduced the DSM-5 to its constituent symptoms and reorganized them based on patterns of covariation in individuals’ (n = 14,762) self-reported experiences of the symptoms to form an empirically derived hierarchical framework of clinical phenomena. Specifically, we used the points of agreement among hierarchical principal components analyses and hierarchical clustering, as well as between the randomly split primary (n = 11,762) and hold-out (n = 3,000) samples, to identify the robust constructs that emerged to form a hierarchy ranging from symptoms and syndromes up to very broad superspectra of psychopathology. The resulting model had noteworthy convergence with the upper levels of the Hierarchical Taxonomy of Psychopathology (HiTOP) framework and substantially expands on HiTOP’s current coverage of dissociative, elimination, sleep-wake, trauma-related, neurodevelopmental, and neurocognitive disorder symptoms. We also mapped some exemplar DSM-5 disorders onto our hierarchy; some formed coherent syndromes, whereas others were notably heterogeneous.
... The unidimensional two-parameter logistic IRT model, following Birnbaum (1968), is given by P(U_ij = 1 | θ_j) = 1 / (1 + exp(-D a_i (θ_j - b_i))). The dimensionality of a latent trait can be assessed through full-information factor analysis (Bock et al., 1988; Bock & Aitkin, 1981) and through principal component analysis of the tetrachoric correlation matrix (Mislevy, 1986); both techniques are suitable for dichotomous responses. ...
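The two-parameter logistic model referenced above can be written as a one-line function; the parameter values in the example are illustrative, not estimates from the cited study.

```python
import math

def p_2pl(theta, a, b, D=1.7):
    """Probability of a correct response under the 2PL model.

    theta: latent trait of the respondent
    a:     item discrimination
    b:     item difficulty
    D:     scaling constant (1.7 makes the logistic curve approximate
           the normal ogive)
    """
    return 1.0 / (1.0 + math.exp(-D * a * (theta - b)))

# At theta == b the response probability is exactly 0.5, which is what
# makes b interpretable as the item's difficulty on the trait scale.
p_mid = p_2pl(theta=0.0, a=1.0, b=0.0)
```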
Conference Paper
Full-text available
The objective of this article is to measure university students' predisposition toward sustainable behaviour using Item Response Theory. The instrument's items were constructed on the basis of the Theory of Planned Behaviour, Norm Activation Theory, and the Triple Bottom Line. The sample comprised 120 students from three courses at Brazilian Federal Institutes. With the construction of the scale and the application of IRT, six anchor levels were found, indicating that the higher the level, the more favourable the student's behaviour toward sustainability. The students' positions on the scale make it possible to identify the characteristics that dominate sustainable behaviour.
... The factorial analyses most frequently found in the relevant literature are exploratory and confirmatory. However, for dichotomous responses these approaches present some mathematical limitations, resolved using the approach described by Bock and Aitkin (1981) and Bock, Gibbons and Muraki (1988), in which dichotomous item treatment and factor loading estimation are performed through the technique called full information factor analysis, based on item response theory. According to Reeve (2002), the unidimensionality assumption may be examined by comparing the first and second eigenvalues of the tetrachoric correlation matrix. ...
Conference Paper
Full-text available
Growing challenges with respect to preserving the environment have forced changes in company operational structures. Thus, the objective of this article is to measure the evidence of Environmental Management using Item Response Theory, based on website analysis of Brazilian industrial companies from sectors defined within the scope of the research. This is a qualitative, exploratory, and descriptive study built around an information collection and analysis instrument. The general view of the research problem with respect to the phenomenon under study is based on multi-case studies, with the methodological outline based on the theoretical framework used. Primary data were gathered from 638 company websites across 7 Brazilian sectors, leading to the creation of 26 items approved by environmental specialists. The results were attained by measuring Environmental Management evidence via Item Response Theory, providing a clear ordering of the items based on each item's level of difficulty, quality, and propriety. This permitted the measurement of the quality and propriety of each item, as well as that of the respondents, placing both on the same analysis scale. Increasing the number of items is suggested for future research in order to permit broader sector analysis. It would also be interesting to build a Computerized Adaptive Test (CAT) so that respondents receive, immediately upon completing the questionnaire, the degree of environmental management evidence for their company. As such, a greater reach of the instrument should be considered, with the objective of contributing to business management.
... Building on such ideas, Woody and McConkey (2003) described a componential approach to hypnotizability, which Woody et al. (2005; see also Woody & Barnier, 2008) empirically investigated in one of the few componential studies in the literature. They pooled data from two scales (HGSHS:A and SHSS:C) administered to over 600 participants, calculated pass rates for each of the 23 suggestions, and subjected the data to Full Information Factor Analysis (Bock & Aitkin, 1981;Bock et al., 1988). They identified a General Hypnotizability factor and four additional factors: a Direct Motor factor, important for enacting suggestions involving motor responses (e.g., hands moving apart); a Motor Challenge factor, important for enacting suggestions that inhibit motor responses (e.g., eye catalepsy); a Perceptual-Cognitive factor, important for enacting suggestions for hallucinations (e.g., voice hallucination); and a Posthypnotic Amnesia factor, important for enacting suggestions that temporarily impair memory (e.g., forgetting the preceding suggestions). ...
Article
Full-text available
Although responsiveness to hypnotic suggestions (hypnotizability) typically is conceptualized and studied as a singular homogeneous capability, numerous lines of evidence suggest instead that it is a hierarchically structured cognitive capacity comprising a core superordinate ability and ancillary subordinate component abilities. After reviewing current approaches to the measurement of hypnotizability and componential approaches to other cognitive capabilities, we highlight outstanding questions in the field and argue for a componential approach to the study of hypnotizability. Such an approach assumes that hypnotizability is not a unitary construct but is rooted in multiple subabilities that interact to give rise to individual differences that are expressed within specific contexts. We revisit previous componential work on hypnotizability and propose a series of steps by which a componential model can be more rigorously interrogated and integrated with contemporary advances in our understanding of human cognition.
... The first analysis examined the factor structure of the Indonesian FAD using CFA. Following the recommendation to treat items with fewer than five response categories as ordered-categorical (Rhemtulla et al., 2012), and considering the non-normal, highly skewed Likert data in this study (Muthén & Kaplan, 1985), ordinal CFA, also known as item factor analysis (IFA), was used (Bock et al., 1988). IFA is a useful tool for exploring the theoretical dimensions of measurement instruments in psychological research with ordinal indicators (Hayat et al., 2021; Rahayu et al., 2021). ...
Article
Full-text available
As a foundational instrument in the measurement of family functioning, this study investigated the psychometric properties of scores on the 53-item Family Assessment Device (FAD) in multicultural Indonesian university student samples during the early phase of the COVID-19 pandemic. This study employed a quantitative cross-sectional research design involving 2740 respondents (74.4% women and 25.6% men; aged 17-29 years). It is unique in that it applies a multiple indicators multiple causes (MIMIC) model to the Indonesian FAD scores. Overall, the construct validity of the FAD item scores was confirmed, and correlations between factors consistent with findings from the original version were identified. Based on the MIMIC model, five covariates were found to have a significant direct effect on at least one factor, while two other covariates had no significant direct effect on any factor. This study will facilitate the development of future research and psychological knowledge regarding family functioning.
... A recent function to estimate tetrachoric correlations can be found in the polycor package (18) for R (2016). However, the sample tetrachoric correlation matrix estimated by the classical algorithms is often non-positive definite (19). ...
Article
Full-text available
The model proposed in this paper is intended for large-scale assessment tests explicitly designed to measure more than one latent trait. Such tests are usually split into subtests, each of which is designed to measure mainly a single unidimensional latent trait. University admission tests are typical examples of this type of test.
... Item factor analysis (IFA; Bock, Gibbons, & Muraki, 1988) is an invaluable method for investigating the latent structure underlying the discrete item response data that arises in many social science applications. In particular, IFA allows researchers to summarize a large number of item responses using a smaller number of continuous latent factors, thereby reducing the dimensionality of the data and potentially making the data easier to understand. ...
Preprint
Full-text available
We investigate novel parameter estimation and goodness-of-fit (GOF) assessment methods for large-scale confirmatory item factor analysis (IFA) with many respondents, items, and latent factors. For parameter estimation, we extend Urban and Bauer's (2021) deep learning algorithm for exploratory IFA to the confirmatory setting by showing how to handle user-defined constraints on loadings and factor correlations. For GOF assessment, we explore new simulation-based tests and indices. In particular, we consider extensions of the classifier two-sample test (C2ST), a method that tests whether a machine learning classifier can distinguish between observed data and synthetic data sampled from a fitted IFA model. The C2ST provides a flexible framework that integrates overall model fit, piece-wise fit, and person fit. Proposed extensions include a C2ST-based test of approximate fit in which the user specifies what percentage of observed data can be distinguished from synthetic data as well as a C2ST-based relative fit index that is similar in spirit to the relative fit indices used in structural equation modeling. Via simulation studies, we first show that the confirmatory extension of Urban and Bauer's (2021) algorithm produces more accurate parameter estimates as the sample size increases and obtains comparable estimates to a state-of-the-art confirmatory IFA estimation procedure in less time. We next show that the C2ST-based test of approximate fit controls the empirical type I error rate and detects when the number of latent factors is misspecified. Finally, we empirically investigate how the sampling distribution of the C2ST-based relative fit index depends on the sample size.
... Therefore, confirmatory factor analyses were conducted with the pretest data to assess whether these modeling approaches were psychometrically reasonable. Because all items were dichotomously scored, a full-information item factor analysis (Bock et al., 1988) was performed. ...
Article
Full-text available
Using multiple external representations is advocated for learning in STEM education. This learning approach assumes that multiple external representations promote richer mental representations and a deeper understanding of the concept. In mathematics, the concept of function is a prototypical content area in which multiple representations are used. However, there are hardly any experimental studies investigating the effect of learning functional thinking with multiple representations compared to learning with only one form of representation. Therefore, this article reports on a quasi-experimental intervention study with students from Grade 7, using three measurement time points. The study compared the multi-representational learning of functional thinking with both tables and graphs with mono-representational learning with either tables or graphs. The results show that multi-representational learning led to advantages in learning qualitative functional thinking. However, in quantitative functional thinking, learning with both graphs and tables did not result in higher learning gains than learning exclusively with graphs. Furthermore, students were better able to transfer their knowledge from graphs to tables than vice versa. The results also indicate that multi-representational learning requires more time than mono-representational learning but can lead to higher learning gains. In sum, the results show that the effect of learning with representations is a complex interaction process between learning content and the forms of representation.
... In particular, Haberman (1977) shows that in practical settings with a finite number of items, standard convergence theorems do not hold for JMLE as the number of people grows. To remedy this, ability is instead treated as a nuisance parameter and marginalized out (Bock & Aitkin, 1981; Bock et al., 1988). Dempster et al. (1977) introduced an Expectation-Maximization (EM) algorithm to iterate between (1) updating beliefs about item characteristics and (2) using the updated beliefs to define a marginal distribution p(r_ij | d_j), without ability, by numerical integration over a_i. ...
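The marginalization idea above (treat ability as a nuisance parameter and integrate it out over a quadrature grid rather than estimating it jointly) can be sketched as a single E-step. This is an illustrative toy version for a Rasch-type model, not the cited authors' implementation; the data are simulated.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: 500 people, 5 items with known "true" difficulties.
d_true = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
theta = rng.normal(size=(500, 1))
p = 1.0 / (1.0 + np.exp(-(theta - d_true)))
r = (rng.random(p.shape) < p).astype(float)   # 0/1 response matrix

# Quadrature grid over ability with N(0, 1) prior weights.
grid = np.linspace(-4, 4, 61)
prior = np.exp(-0.5 * grid**2)
prior /= prior.sum()

def e_step(d):
    """Posterior weights over the ability grid for each person.

    Instead of a point estimate of each person's ability (the JMLE
    route), keep a full posterior over grid points; item updates in the
    M-step would then average over these weights.
    """
    pq = 1.0 / (1.0 + np.exp(-(grid[:, None] - d)))          # (Q, J)
    loglik = r @ np.log(pq).T + (1 - r) @ np.log(1 - pq).T   # (N, Q)
    post = np.exp(loglik) * prior
    return post / post.sum(axis=1, keepdims=True)

w = e_step(d_true)   # each row is a per-person posterior over ability
```

In a full EM fit, the M-step would re-estimate the item difficulties from the "artificial data" implied by these weights, and the two steps would alternate until convergence.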
Preprint
Full-text available
Item Response Theory (IRT) is a ubiquitous model for understanding human behaviors and attitudes based on responses to questions. Large modern datasets offer opportunities to capture more nuance in human behavior, potentially improving psychometric modeling and thereby scientific understanding and public policy. However, while larger datasets allow for more flexible approaches, many contemporary algorithms for fitting IRT models have massive computational demands that forbid real-world application. To address this bottleneck, we introduce a variational Bayesian inference algorithm for IRT and show that it is fast and scalable without sacrificing accuracy. Applying this method to five large-scale item response datasets from cognitive science and education yields higher log likelihoods and higher accuracy in imputing missing data than alternative inference algorithms. Using this new inference approach, we then generalize IRT with expressive Bayesian models of responses, leveraging recent advances in deep learning to capture nonlinear item characteristic curves (ICCs) with neural networks. Using an eighth-grade mathematics test from TIMSS, we show that our nonlinear IRT models can capture interesting asymmetric ICCs. The algorithm implementation is open source and easy to use.
Article
Full-text available
Purpose Patient-Reported Outcomes (PROs) are widely used in clinical trials, epidemiological research, quality of life (QOL) studies, routine clinical care, and medical surveillance. The Patient-Reported Outcomes Measurement Information System (PROMIS) is a system of reliable and standardized PRO measures developed with Item Response Theory (IRT) using latent scores. Power estimation is critical to clinical trial and research design. However, in clinical trials with PROs as endpoints, observed scores are often used to calculate power rather than latent scores. Methods In this paper, we conducted a series of simulations to compare the power obtained with IRT latent scores, including Bayesian IRT and frequentist IRT, against observed scores, focusing on the small sample sizes common in pilot studies and Phase I/II trials. Taking the PROMIS depression measures as an example, we simulated data and estimated power for two-armed clinical trials while manipulating the following factors: sample size, effect size, and number of items. We also examined how misspecification of effect size affected power estimation. Results Our results showed that Bayesian IRT, which incorporates prior information into latent score estimation, yielded the highest power, especially when the sample size was small. The effect of misspecification diminished as sample size increased. Conclusion For power estimation in two-armed clinical trials with standardized PRO endpoints, if a medium or larger effect size is expected, we recommend BIRT simulation with well-grounded informative priors and a total sample size of at least 40.
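Simulation-based power estimation of the general kind described above can be sketched as follows. The normal observed-score model, the effect size, and the simple z-type test here are simplifying assumptions for illustration, not the paper's exact IRT-based procedure.

```python
import numpy as np

def simulated_power(n_per_arm, effect_size, n_reps=2000, seed=1):
    """Estimate two-sided power at alpha = 0.05 by Monte Carlo.

    Each replication simulates a two-armed trial with unit-variance
    normal observed scores and a standardized mean difference of
    `effect_size`, then applies an approximate two-sample z-test.
    """
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_reps):
        control = rng.normal(0.0, 1.0, n_per_arm)
        treated = rng.normal(effect_size, 1.0, n_per_arm)
        diff = treated.mean() - control.mean()
        se = np.sqrt(control.var(ddof=1) / n_per_arm
                     + treated.var(ddof=1) / n_per_arm)
        if abs(diff / se) > 1.96:        # approximate two-sided test
            rejections += 1
    return rejections / n_reps

# Medium effect (d = 0.5) with 40 per arm: analytic power is roughly 0.6.
power_medium = simulated_power(n_per_arm=40, effect_size=0.5)
```

The same loop structure carries over to latent-score endpoints: only the data-generating step (an IRT model) and the scoring step (latent score estimation) change.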
Article
Researchers simulating covariance structure models sometimes add model error to their data to produce model misfit. Presently, the most popular methods for generating error-perturbed data are those by Tucker, Koopman, and Linn (TKL), Cudeck and Browne (CB), and Wu and Browne (WB). Although all of these methods include parameters that control the degree of model misfit, none can generate data that reproduce multiple fit indices. To address this issue, we describe a multiple-target TKL method that can generate error-perturbed data that will reproduce target RMSEA and CFI values either individually or together. To evaluate this method, we simulated error-perturbed correlation matrices for an array of factor analysis models using the multiple-target TKL method, the CB method, and the WB method. Our results indicated that the multiple-target TKL method produced solutions with RMSEA and CFI values that were closer to their target values than those of the alternative methods. Thus, the multiple-target TKL method should be a useful tool for researchers who wish to generate error-perturbed correlation matrices with a known degree of model error. All functions that are described in this work are available in the fungible R library. Additional materials (e.g., R code, supplemental results) are available at https://osf.io/vxr8d/.
Article
In multidimensional tests, the identification of latent traits measured by each item is crucial. In addition to item–trait relationship, differential item functioning (DIF) is routinely evaluated to ensure valid comparison among different groups. The two problems are investigated separately in the literature. This paper uses a unified framework for detecting item–trait relationship and DIF in multidimensional item response theory (MIRT) models. By incorporating DIF effects in MIRT models, these problems can be considered as variable selection for latent/observed variables and their interactions. A Bayesian adaptive Lasso procedure is developed for variable selection, in which item–trait relationship and DIF effects can be obtained simultaneously. Simulation studies show the performance of our method for parameter estimation, the recovery of item–trait relationship and the detection of DIF effects. An application is presented using data from the Eysenck Personality Questionnaire.
Article
Full-text available
Understanding occupational preferences through Big Five personality traits offers a crucial insight into the socio-psychological profiles of working individuals, extending beyond mere occupational behaviors. Previous research, however, has not conclusively shown that the broad, situation-general Big Five traits can systematically account for occupational preferences as outlined by the existing RIASEC model. The RIASEC framework’s reliance on theory-driven, preselected occupational scenarios may hinder this explanation. In this study, we initially employed data-driven, exploratory methods to identify and validate occupational preference factors from thousands of participants’ responses to a wide array of occupational titles. Subsequently, we explored the connections between the Big Five traits and these newly identified preference factors. Our analysis revealed a coherent and systematic relationship between data-driven occupational preferences and the Big Five traits, formulating the Hexagonal Openness–Extraversion–Agreeableness model of occupational personality traits. This model facilitates a broader understanding of individuals’ work-related personalities from a comprehensive social-psychological viewpoint.
Article
Exploratory cognitive diagnosis models have been widely used in psychology, education and other fields. This paper focuses on determining the number of attributes in a widely used cognitive diagnosis model, the GDINA model. Under some conditions of cognitive diagnosis models, we prove that there exists a special structure for the covariance matrix of observed data. Due to the special structure of the covariance matrix, an estimator based on eigen‐decomposition is proposed for the number of attributes for the GDINA model. The performance of the proposed estimator is verified by simulation studies. Finally, the proposed estimator is applied to two real data sets Examination for the Certificate of Proficiency in English (ECPE) and Big Five Personality (BFP).
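The simplest version of an eigen-decomposition attribute counter can be sketched as follows. This is a hedged illustration only: the GDINA estimator in the paper uses a more refined criterion derived from the special covariance structure, whereas `count_attributes` here just counts dominant eigenvalues.

```python
import numpy as np

def count_attributes(cov, threshold=1.0):
    """Illustrative estimator: count eigenvalues of the observed
    covariance matrix that exceed a threshold, treating each dominant
    eigenvalue as evidence of one latent attribute."""
    eigvals = np.linalg.eigvalsh(cov)
    return int(np.sum(eigvals > threshold))
```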
Article
Full-text available
Many studies in fields such as psychology and educational sciences obtain information about attributes of subjects through observational studies, in which raters score subjects using multiple-item rating scales. Error variance due to measurement effects, such as items and raters, attenuate the regression coefficients and lower the power of (hierarchical) linear models. A modeling procedure is discussed to reduce the attenuation. The procedure consists of (1) an item response theory (IRT) model to map the discrete item responses to a continuous latent scale and (2) a generalizability theory (GT) model to separate the variance in the latent measurement into variance components of interest and nuisance variance components. It will be shown how measurements obtained from this mixture of IRT and GT models can be embedded in (hierarchical) linear models, both as predictor or criterion variables, such that error variance due to nuisance effects are partialled out. Using examples from the field of educational measurement, it is shown how general-purpose software can be used to implement the modeling procedure.
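The attenuation referred to above is the classical errors-in-variables result: if a predictor's observed score carries error variance $\sigma^2_E$ on top of true-score variance $\sigma^2_T$, the regression slope on the observed score shrinks by the reliability ratio:

```latex
\hat{\beta}_{\mathrm{obs}} = \lambda\,\beta_{\mathrm{true}},
\qquad
\lambda = \frac{\sigma^2_T}{\sigma^2_T + \sigma^2_E} < 1 .
```

Partialling nuisance variance components out of $\sigma^2_E$, as the IRT-GT procedure does, moves $\lambda$ toward 1 and restores power.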
Article
Multidimensional item response theory (MIRT) models have generated increasing interest in the psychometrics literature. Efficient approaches for estimating MIRT models with dichotomous responses have been developed, but constructing an equally efficient and robust algorithm for polytomous models has received limited attention. To address this gap, this paper presents a novel Gaussian variational estimation algorithm for the multidimensional generalized partial credit model. The proposed algorithm demonstrates both fast and accurate performance, as illustrated through a series of simulation studies and two real data analyses.
Article
Item-response theory (IRT) represents a key advance in measurement theory. Yet, it is largely absent from curricula, textbooks and popular statistical software, and often introduced through a subset of models. This Element, intended for creativity and innovation researchers, researchers-in-training, and anyone interested in how individual creativity might be measured, aims to provide 1) an overview of classical test theory (CTT) and its shortcomings in creativity measurement situations (e.g., fluency scores, consensual assessment technique, etc.); 2) an introduction to IRT and its core concepts, using a broad view of IRT that notably sees CTT models as particular cases of IRT; 3) a practical strategic approach to IRT modeling; 4) example applications of this strategy from creativity research and the associated advantages; and 5) ideas for future work that could advance how IRT could better benefit creativity research, as well as connections with other popular frameworks.
Chapter
This research measures the ANEEL Consumer Satisfaction Index (IASC) of the CELESC electricity distributor through the application of Item Response Theory. The IASC is an indicator that evaluates residential consumers' satisfaction with the services provided by the electricity distributor, with the intention of stimulating continuous improvement. The quantitative data set, from the satisfaction survey carried out by ANEEL in 2020, was obtained through secondary sources. It was possible to test the validity and internal consistency of the items, and through Item Response Theory (IRT) an interpretable scale was built to measure the level of consumer satisfaction. It was also possible to calculate and explain the information curves of the items and of the test, and to demonstrate the gain that can be obtained by applying them to measure consumer satisfaction. The satisfaction scale was created with five categories: very dissatisfied, dissatisfied, neither dissatisfied nor satisfied, satisfied, and very satisfied. The results showed that 1.05% of consumers are very dissatisfied, 15.85% are dissatisfied, 55.75% are neither dissatisfied nor satisfied, 26.31% are satisfied, and 1.05% are very satisfied with the energy distributor. The scale allows the energy distributor to monitor the measured indicators and evaluate the evolution of consumer satisfaction. In addition, managers can use the model to develop strategies that improve consumer satisfaction and achieve continual improvement in the distributor's service delivery.
Article
Full-text available
Creative thinking is an important skill in the modern world, and its assessment with modern digital tools is an increasingly complex methodological task. Including process data from task performance in the assessment model of creative thinking is a promising direction that becomes possible in computer-based testing. Such data allow the processes of creative thinking to be examined dynamically, making the assessment of students' creativity more accurate and multifaceted. The purpose of the study was to determine whether process data from task performance can be used in evaluating creative thinking with a tool in a digital environment. The paper analyzes the work of 823 4th-grade students who created images in a closed simulation environment designed to assess creative and critical thinking. Process data from task performance were analyzed using N-grams of various lengths. As a result, the action sequences of students with different levels of creative thinking were compared, and different test-taker strategies were identified in the creative-thinking task compared with the critical-thinking task. Together with information about the level of creativity based on analysis of the created product, process data improve understanding of how tasks function through the prism of the test-takers' execution process. This is also a step forward in detailing the feedback that can be provided as part of testing.
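The N-gram analysis of logged action sequences mentioned above can be sketched with a few lines of Python; `ngram_counts` is a generic helper, not the study's actual pipeline.

```python
from collections import Counter

def ngram_counts(actions, n):
    """Count n-grams (length-n subsequences) in a test-taker's logged
    action sequence, as one might do when mining process data."""
    return Counter(tuple(actions[i:i + n])
                   for i in range(len(actions) - n + 1))
```

Comparing the resulting n-gram frequency profiles across groups of test takers is one simple way to contrast behavioral strategies.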
Article
Full-text available
Categorical scales provide many advantages in educational evaluations, which contrast with the delicate statistical analysis that their measurements subsequently require. Their metric properties have not always been treated properly, sometimes because of ignorance, sometimes because of the absence of common criteria about which statistical techniques are most appropriate. In this work, we point out some of the reasons for this confusing panorama and provide little-known suggestions for analysis in educational research, with a threefold aim: improving the metric of the scales; searching for internal dimensions in the data; and studying the responses obtained from previously established dimensions. To achieve the first, Zipf's law is proposed, certainly a very interesting suggestion for transforming the metric of categorical frequency scales. For the second, some extensions of factor analysis are suggested. Finally, through an intuitive example, a conjoint analysis of a particular educational evaluation is presented. Keywords: Scales, Likert scales, factor analysis, conjoint analysis, Zipf's law.
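Zipf's law posits frequencies proportional to an inverse power of rank, $f(r) \propto r^{-s}$; whether a set of category frequencies follows it can be checked with a quick log-log fit. This is a generic sketch, and `zipf_slope` is a hypothetical helper, not a method from the article.

```python
import numpy as np

def zipf_slope(freqs):
    """Fit the exponent s of Zipf's law f(r) ~ C / r**s by least squares
    on the log-log rank-frequency plot. freqs: category frequencies."""
    f = np.sort(np.asarray(freqs, dtype=float))[::-1]
    ranks = np.arange(1, len(f) + 1)
    slope, _ = np.polyfit(np.log(ranks), np.log(f), 1)
    return -slope  # Zipf exponent s (close to 1 for classic Zipf)
```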
Chapter
Many multivariate statistical models have dimensional structures. Such models typically require judicious choice of dimensionality. Likelihood ratio tests are often used for dimensionality selection. However, to this day there is still a great deal of confusion about the asymptotic distributional properties of the log-likelihood ratio (LR) statistics in some areas of psychometrics. Although in many cases the asymptotic distribution of the LR statistic representing the difference between the correct model (of specific dimensionality) and the saturated model is guaranteed to be chi-square, that of the LR statistic representing the difference between the correct model and the one with one dimension higher than the correct model is not likely to be chi-square due to a violation of one of regularity conditions. In this paper, we attempt to clarify the misunderstanding that the latter is also assured to be asymptotically chi-square. This common misunderstanding has occurred repeatedly in various fields, although in some areas it has been corrected.KeywordsAsymptotic chi-square distributionRegularity conditionsCanonical correlation analysisModels of contingency tablesMultidimensional scalingFactor analysisNormal mixture models
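The two cases contrasted in this chapter can be written compactly. Under the regularity conditions, the LR statistic against the saturated model behaves as

```latex
\mathrm{LR} = 2\left(\ell_{\mathrm{sat}} - \ell_{m}\right)
\;\xrightarrow{d}\; \chi^2_{q_{\mathrm{sat}} - q_m},
```

where $q$ counts free parameters. But for the nested comparison of $m$ versus $m+1$ dimensions,

```latex
\mathrm{LR} = 2\left(\ell_{m+1} - \ell_{m}\right)
```

is not guaranteed to be asymptotically chi-square, because the $m$-dimensional model lies on the boundary of the larger model's parameter space (the extra dimension's parameters are pinned at zero), violating a regularity condition; the limiting distribution is then typically a mixture of chi-squares rather than a single $\chi^2$.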
Thesis
Full-text available
The search for alternatives for evaluating logistics performance motivated this research. The work sought to answer the following research problem: how can Item Response Theory contribute to the evaluation of logistics performance in customer service? Accordingly, the general objective was to verify how Item Response Theory (IRT) can contribute to the evaluation of logistics performance in customer service. In pursuit of the proposed objectives, the study reviews the literature on logistics and its measurement, customer service and its measurement, Item Response Theory (IRT), the construction and interpretation of measurement scales, the elaboration of a set of items, and the elaboration of measurement instruments. The theoretical framework, within an appropriate methodology, served as the basis for constructing a set of items related to logistics in customer service. From this set of items, a Systematic Approach for Evaluating Logistics Performance in Customer Service (SADLSC) based on Item Response Theory is proposed. To test and validate the study, the approach was applied in two industrial companies. The final result of the application was the construction of a Logistics Performance in Customer Service Scale (EDLSC) for each company. The study concluded that IRT offers the conditions, support, and mathematical models to assist in the analysis.
Article
Full-text available
The application of the Rasch measurement model in rehabilitation is now well established. Both its dichotomous and polytomous forms provide for transforming ordinal scales into interval-level measures, consistent with the requirements of fundamental measurement. The application of the model in rehabilitation has grown over 30 years, during which the protocol has steadily developed and several software packages that provide for analysis have emerged, together with the "R" language, which has a growing set of code for applying the model. This article reviews that development and highlights current practice requirements, including those for reporting the relevant information about the methods and what is expected of the analysis. In addition, it provides a worked example and looks at the remaining issues and current developments in its application.
Article
Full-text available
This article aims to record visitor satisfaction with the services provided in Greek thematic museums, the behavioral intention of visitors, and whether tourism marketing techniques are used in thematic museums. There was an attempt to record the opinion of the people working in thematic museums and their perception of tourism marketing, the perceptions of visitors about their experience, their satisfaction, the existing situation in the country's thematic museums, and the possible changes that may arise, with the main purpose of increasing tourist traffic. The ultimate goal of the research model investigated and analyzed in the article is to understand whether the visitor will revisit the museum or serve as word-of-mouth advertising for it. There is a need for tourism marketing techniques in museums as well.
Article
Full-text available
Measurement is at the heart of scientific research. As many—perhaps most—psychological constructs cannot be directly observed, there is a steady demand for reliable self-report scales to assess latent constructs. However, scale development is a tedious process that requires researchers to produce good items in large quantities. In this tutorial, we introduce, explain, and apply the Psychometric Item Generator (PIG), an open-source, free-to-use, self-sufficient natural language processing algorithm that produces large-scale, human-like, customized text output within a few mouse clicks. The PIG is based on the GPT-2, a powerful generative language model, and runs on Google Colaboratory—an interactive virtual notebook environment that executes code on state-of-the-art virtual machines at no cost. Across two demonstrations and a preregistered five-pronged empirical validation with two Canadian samples (NSample 1 = 501, NSample 2 = 773), we show that the PIG is equally well-suited to generate large pools of face-valid items for novel constructs (i.e., wanderlust) and create parsimonious short scales of existing constructs (i.e., Big Five personality traits) that yield strong performances when tested in the wild and benchmarked against current gold standards for assessment. The PIG does not require any prior coding skills or access to computational resources and can easily be tailored to any desired context by simply switching out short linguistic prompts in a single line of code. In short, we present an effective, novel machine learning solution to an old psychological challenge. As such, the PIG will not require you to learn a new language—but instead, speak yours.
Article
Full-text available
One of the main concerns in multidimensional item response theory (MIRT) is to detect the relationship between observed items and latent traits, which is typically addressed by the exploratory analysis and factor rotation techniques. Recently, an EM-based L1-penalized log-likelihood method (EML1) is proposed as a vital alternative to factor rotation. Based on the observed test response data, EML1 can yield a sparse and interpretable estimate of the loading matrix. However, EML1 suffers from high computational burden. In this paper, we consider the coordinate descent algorithm to optimize a new weighted log-likelihood, and consequently propose an improved EML1 (IEML1) which is more than 30 times faster than EML1. The performance of IEML1 is evaluated through simulation studies and an application on a real data set related to the Eysenck Personality Questionnaire is used to demonstrate our methodologies.
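The workhorse inside coordinate-descent methods like IEML1 is the per-coordinate soft-thresholding update. The sketch below shows that update on a plain L1-penalized least-squares surrogate, a simplification for illustration: IEML1 applies the same idea to a weighted log-likelihood for the MIRT loading matrix, not to a linear regression.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator: the closed-form solution of each
    one-dimensional L1-penalized subproblem."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1.
    Each coordinate is updated in closed form while the rest are held
    fixed, which is what makes the approach fast and yields exact zeros."""
    n, p = X.shape
    beta = np.zeros(p)
    col_ss = np.sum(X**2, axis=0)
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual excluding coordinate j
            r_j = y - X @ beta + X[:, j] * beta[j]
            beta[j] = soft_threshold(X[:, j] @ r_j, n * lam) / col_ss[j]
    return beta
```

The exact zeros produced by the threshold are what deliver the sparse, rotation-free loading matrix the abstract describes.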
Chapter
In this article, we illustrate widely-used multidimensional item response theory (MIRT) models, including models with correlated factors, bifactor and the two-tier extensions. We also discuss various multidimensional models for items with dichotomous, ordered, and nominal responses. We show their connections with classic unidimensional models and the categorical confirmatory factor analysis model. We review item and person parameter estimation approaches, and present major applications of MIRT modeling in educational and psychological measurement. Last but not least, we discuss future research directions of MIRT.
Article
Full-text available
Brain network analyses have exploded in recent years and hold great potential in helping us understand normal and abnormal brain function. Network science approaches have facilitated these analyses and our understanding of how the brain is structurally and functionally organized. However, the development of statistical methods that allow relating this organization to phenotypic traits has lagged behind. Our previous work developed a novel analytic framework to assess the relationship between brain network architecture and phenotypic differences while controlling for confounding variables. More specifically, this innovative regression framework related distances (or similarities) between brain network features from a single task to functions of absolute differences in continuous covariates and indicators of difference for categorical variables. Here we extend that work to the multitask and multisession context to allow for multiple brain networks per individual. We explore several similarity metrics for comparing distances between connection matrices and adapt several standard methods for estimation and inference within our framework: standard F test, F test with scan-level effects (SLE), and our proposed mixed model for multitask (and multisession) BrAin NeTwOrk Regression (3M_BANTOR). A novel strategy is implemented to simulate symmetric positive-definite (SPD) connection matrices, allowing for the testing of metrics on the Riemannian manifold. Via simulation studies, we assess all approaches for estimation and inference while comparing them with existing multivariate distance matrix regression (MDMR) methods. We then illustrate the utility of our framework by analyzing the relationship between fluid intelligence and brain network distances in Human Connectome Project (HCP) data.
Article
Item factor analysis (IFA), also known as Multidimensional Item Response Theory (MIRT), is a general framework for specifying the functional relationship between respondents' multiple latent traits and their responses to assessment items. The key element in MIRT is the relationship between the items and the latent traits, so-called item factor loading structure. The correct specification of this loading structure is crucial for accurate calibration of item parameters and recovery of individual latent traits. This paper proposes a regularized Gaussian Variational Expectation Maximization (GVEM) algorithm to efficiently infer item factor loading structure directly from data. The main idea is to impose an adaptive [Formula: see text]-type penalty to the variational lower bound of the likelihood to shrink certain loadings to 0. This new algorithm takes advantage of the computational efficiency of GVEM algorithm and is suitable for high-dimensional MIRT applications. Simulation studies show that the proposed method accurately recovers the loading structure and is computationally efficient. The new method is also illustrated using the National Education Longitudinal Study of 1988 (NELS:88) mathematics and science assessment data.
Conference Paper
Full-text available
With the advance of technology, Computerized Adaptive Tests (CAT) emerged. These tests present numerous advantages over tests applied in the traditional way, especially for programs that apply long tests, as is the case of the Brazilian National High School Exam (Enem). In this context, this article aims to identify how many items are needed via CAT to obtain a score close to that obtained via the traditional method of applying the Mathematics and its Technologies test of the 2012 Enem, composed of 45 items. To this end, the scores were divided into proficiency intervals for better analysis, and a post-hoc simulation was carried out with 5,000 respondents. The results indicated that at most 33 items would be sufficient to estimate the respondents' proficiency, a reduction of at least 26.6% in test length, bringing benefits such as reduced testing time and, consequently, less respondent fatigue, as well as savings in the item bank, among others.
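A single CAT item-selection step can be sketched under the 2PL model: pick the unadministered item with maximum Fisher information at the current ability estimate. This is the standard greedy rule, shown as a generic sketch rather than the exact algorithm used in the study.

```python
import numpy as np

def p2pl(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def next_item(theta, a, b, administered):
    """Select the unadministered item maximizing Fisher information
    I_j(theta) = a_j^2 * P_j * (1 - P_j) at the current theta estimate."""
    p = p2pl(theta, a, b)
    info = a**2 * p * (1.0 - p)
    info[list(administered)] = -np.inf  # exclude items already given
    return int(np.argmax(info))
```

After each response, theta is re-estimated (e.g., by maximum likelihood or EAP) and the rule is applied again until a stopping criterion is met.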
Article
The aim of latent variable selection in multidimensional item response theory (MIRT) models is to identify latent traits probed by test items of a multidimensional test. In this paper the expectation model selection (EMS) algorithm proposed by Jiang et al. (2015) is applied to minimize the Bayesian information criterion (BIC) for latent variable selection in MIRT models with a known number of latent traits. Under mild assumptions, we prove the numerical convergence of the EMS algorithm for model selection by minimizing the BIC of observed data in the presence of missing data. For the identification of MIRT models, we assume that the variances of all latent traits are unity and each latent trait has an item that is only related to it. Under this identifiability assumption, the convergence of the EMS algorithm for latent variable selection in the multidimensional two‐parameter logistic (M2PL) models can be verified. We give an efficient implementation of the EMS for the M2PL models. Simulation studies show that the EMS outperforms the EM‐based L1 regularization in terms of correctly selected latent variables and computation time. The EMS algorithm is applied to a real data set related to the Eysenck Personality Questionnaire.
Article
To understand text-based analytical writing quality, we examined (a) the dimensions of this genre, (b) relations between these dimensions, (c) how student demographics factors predict performance in the identified dimensions of writing quality, and (d) how the identified dimensions predict overall writing quality. Text-based analytical writing data from grades 7–12 students (N = 206) were analyzed using confirmatory factor analysis and structural equation modeling. Results showed that the dimensions of writing quality were best captured by a three factor model. Ideas/Structure, Evidence Use, and Language Use were related, but dissociable dimensions of writing. Key demographic variables predicted performance across dimensions in unique ways, which in turn, predicted holistic scores. Specifically, female students outperformed males in each dimension and English only students performed higher than English Learners in each dimension. We discuss the implications of a multidimensional view of writing quality in light of writing evaluation in research and practice.
Article
Full-text available
The Motivational-Developmental Assessment (MDA) measures a university student’s motivational and developmental attributes by utilizing overlapping constructs measured across four writing prompts. The MDA’s format may lead to the violation of the local item independence (LII) assumption for unidimensional item response theory (IRT) scoring models, or the uncorrelated errors assumption for scoring models in classical test theory (CTT) due to the measurement of overlapping constructs within a prompt. This assumption violation is known as a testlet effect, which can be viewed as a method effect. The application of a unidimensional IRT or CTT model to score the MDA can result in imprecise parameter estimates when this effect is ignored. To control for this effect in the MDA responses, we first examined the presence of local dependence via a restricted bifactor model and Yen’s Q3 statistic. Second, we applied bifactor models to account for the testlet effect in the responses, as this effect is modeled as an additional latent variable in a factor model. Results support the presence of local dependence in two of the four MDA prompts, and the use of the restricted bifactor model to account for the testlet effect in the responses. Modeling the testlet effect through the restricted bifactor model supports a scoring inference in a validation argument framework. Implications are discussed.
Article
New Zealand police have long been suspected of systematic bias against the indigenous Māori. One resource available to investigate this possibility is the annual counts of police apprehensions and prosecutions, by offence type. However, model specification and fitting are complicated, as these data are constrained counts, interdependent and multivariate. For example, there are limited options for factor models beyond continuous or binary data. This is a serious limitation for our dataset: while measurements are clustered, different individuals are measured at each variable. Focusing on principal component/factor analysis representations, we show that under the canonical logit link, latent variable models can be fitted via Gibbs sampling to multivariate binomial data of arbitrary trial size by applying Pólya-gamma augmentation to the binomial likelihood. We demonstrate that this modelling approach, by incorporating shrinkage, produces a fit with lower mean square error than techniques based on deviance minimization commonly employed for binary datasets. By exploring theoretical properties of the proposed models, we demonstrate that a larger range of latent structures can be estimated and that the presence of hidden replication improves prediction when data are multivariate binomial, giving greater flexibility for investigating associations between ethnicity and prosecution probability.
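The Pólya-gamma device relied on above is the integral identity of Polson, Scott and Windle, which represents a binomial likelihood in the log-odds $\psi$ as a Gaussian mixture:

```latex
\frac{\left(e^{\psi}\right)^{a}}{\left(1+e^{\psi}\right)^{b}}
= 2^{-b}\, e^{\kappa\psi}
\int_0^{\infty} e^{-\omega\psi^{2}/2}\; p_{\mathrm{PG}}(\omega \mid b, 0)\,\mathrm{d}\omega,
\qquad \kappa = a - \tfrac{b}{2},
```

where $p_{\mathrm{PG}}(\omega \mid b, 0)$ is the Pólya-gamma density. Conditional on the latent $\omega$, the likelihood is Gaussian in $\psi$, so the latent-variable model admits conjugate Gibbs updates without Metropolis-Hastings tuning.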
Article
Through the use of item response theory, adaptive tests are more efficient than fixed-length tests; they also present students with questions tailored to their proficiency level. Although the adaptive algorithm is straightforward, developing a multidimensional computer adaptive test (MCAT) measure is complex. Evidence-centered design (ECD) can provide a highly structured framework to guide the development of a MCAT. In this paper, the development of the adaptive reading motivation measure (ARMM) demonstrates the process of applying ECD to the development of a MCAT measure. This paper focuses on the conceptual assessment framework layer of the ECD that guides the technical aspects of MCAT development. The five models of the conceptual assessment framework layer are the student model, task model, evidence model, assembly model, and presentation model. How each model guided the development of the ARMM is described in detail. In this paper, we demonstrate how the conceptual assessment framework in ECD can provide a useful structure for developing MCAT measures.
Article
Full-text available
In this paper, a new two-parameter logistic testlet response theory model for dichotomous items is proposed by introducing testlet discrimination parameters to model the local dependence among items within a common testlet. In addition, a highly effective Bayesian sampling algorithm based on auxiliary variables is proposed to estimate the testlet effect models. The new algorithm not only avoids the tedious tuning of parameters that the Metropolis-Hastings algorithm requires to achieve an appropriate acceptance probability, but also overcomes the Gibbs sampler's dependence on conjugate prior distributions. Compared with traditional Bayesian estimation methods, the advantages of the new algorithm are analyzed across various types of prior distributions. Based on the Markov chain Monte Carlo (MCMC) output, two Bayesian model assessment methods are investigated concerning goodness of fit between models. Finally, three simulation studies and an empirical example are given to further illustrate the advantages of the new testlet effect model and Bayesian sampling algorithm.
Article
Full-text available
Importance Veterans from recent and past conflicts have high rates of posttraumatic stress disorder (PTSD). Adaptive testing strategies can increase accuracy of diagnostic screening and symptom severity measurement while decreasing patient and clinician burden. Objective To develop and validate a computerized adaptive diagnostic (CAD) screener and computerized adaptive test (CAT) for PTSD symptom severity. Design, Setting, and Participants A diagnostic study of measure development and validation was conducted at a Veterans Health Administration facility. A total of 713 US military veterans were included. The study was conducted from April 25, 2017, to November 10, 2019. Main Outcomes and Measures The participants completed a PTSD-symptom questionnaire from the item bank and provided responses on the PTSD Checklist for Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM-5) (PCL-5). A subsample of 304 participants were interviewed using the Clinician-Administered Scale for PTSD for DSM-5. Results Of the 713 participants, 585 were men; mean (SD) age was 52.8 (15.0) years. The CAD-PTSD reproduced the Clinician-Administered Scale for PTSD for DSM-5 PTSD diagnosis with high sensitivity and specificity as evidenced by an area under the curve of 0.91 (95% CI, 0.87-0.95). The CAT-PTSD demonstrated convergent validity with the PCL-5 (r = 0.88) and also tracked PTSD diagnosis (area under the curve = 0.85; 95% CI, 0.79-0.89). The CAT-PTSD reproduced the final 203-item bank score with a correlation of r = 0.95 with a mean of only 10 adaptively administered items, a 95% reduction in patient burden. Conclusions and Relevance Using a maximum of only 6 items, the CAD-PTSD developed in this study was shown to have excellent diagnostic screening accuracy. Similarly, using a mean of 10 items, the CAT-PTSD provided valid severity ratings with excellent convergent validity with an extant scale containing twice the number of items. 
The 10-item CAT-PTSD also outperformed the 20-item PCL-5 in terms of diagnostic accuracy. The results suggest that scalable, valid, and rapid PTSD diagnostic screening and severity measurement are possible.