Preprint

A Note on Exploratory Item Factor Analysis by Singular Value Decomposition


Abstract

In this note, we revisit a singular value decomposition (SVD) based algorithm that was proposed in Chen et al. (2019a) for obtaining an initial value for joint maximum likelihood estimation in exploratory item factor analysis (IFA). This algorithm estimates a multidimensional item response theory model by SVD. Thanks to the computational efficiency and scalability of SVD, the algorithm has a substantial computational advantage over other exploratory IFA algorithms, especially when the numbers of respondents, items, and latent dimensions are all large. Under the same double asymptotic setting and notion of consistency as in Chen et al. (2019a), we show that this simple algorithm provides a consistent estimator of the loading matrix up to a rotation. This result provides a theoretical guarantee for the use of this simple algorithm in exploratory IFA.
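The abstract describes the algorithm only at a high level. As a purely illustrative sketch of the general recipe (not the exact procedure of Chen et al. (2019a); the function name, the rank-(K+1) first step, and the truncation level `eps` are assumptions made here), linearizing binary responses by SVD, applying the inverse link, and extracting loadings from a second SVD might look like:

```python
import numpy as np

def svd_ifa_loadings(Y, K, eps=0.01):
    """Illustrative sketch of an SVD-based exploratory IFA estimator.

    Y   : (N, J) binary response matrix (respondents x items)
    K   : number of latent factors
    eps : truncation level keeping fitted probabilities inside (0, 1)
    """
    N, J = Y.shape
    # Step 1: a low-rank SVD of Y approximates the response probabilities;
    # rank K+1 leaves room for an item-intercept dimension (an assumption here)
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    P_hat = (U[:, :K + 1] * s[:K + 1]) @ Vt[:K + 1]
    # Step 2: truncate so the inverse link (logit) is well defined
    P_hat = np.clip(P_hat, eps, 1 - eps)
    # Step 3: transform to the natural-parameter scale
    M_hat = np.log(P_hat / (1 - P_hat))
    # Step 4: center columns (absorbing item intercepts), SVD again;
    # scaled right singular vectors estimate the loadings up to rotation
    M_c = M_hat - M_hat.mean(axis=0, keepdims=True)
    U2, s2, Vt2 = np.linalg.svd(M_c, full_matrices=False)
    A_hat = Vt2[:K].T * (s2[:K] / np.sqrt(N))
    return A_hat  # (J, K) loading estimate, identified only up to rotation
```

The returned loadings are identified only up to an orthogonal rotation, matching the notion of consistency in the note.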


References
Article
Latent factor models are widely used to measure unobserved latent traits in social and behavioral sciences, including psychology, education, and marketing. When used in a confirmatory manner, design information is incorporated, yielding structured (confirmatory) latent factor models. Motivated by the applications of latent factor models to large-scale measurements that consist of many manifest variables (e.g. test items) and a large sample size, we study the properties of structured latent factor models under an asymptotic setting where both the number of manifest variables and the sample size grow to infinity. Specifically, under such an asymptotic regime, we provide a definition of the structural identifiability of the latent factors and establish necessary and sufficient conditions on the measurement design that ensure structural identifiability under a general family of structured latent factor models. In addition, we propose an estimator that can consistently recover the latent factors under mild conditions. This estimator can be efficiently computed through parallel computing. Our results shed light on the design of large-scale measurement and have important implications for measurement validity. The properties of the proposed estimator are verified through simulation studies.
Article
Multidimensional item response theory is widely used in education and psychology for measuring multiple latent traits. However, exploratory analysis of large-scale item response data with many items, respondents, and latent traits is still a challenge. In this paper, we consider a high-dimensional setting in which both the number of items and the number of respondents grow to infinity. A constrained joint maximum likelihood estimator is proposed for estimating both item and person parameters, which yields good theoretical properties and computational advantages. Specifically, we derive error bounds for parameter estimation and develop an efficient algorithm that can scale to very large datasets. The proposed method is applied to a large-scale personality assessment dataset from the Synthetic Aperture Personality Assessment (SAPA) project. Simulation studies are conducted to evaluate the proposed method.
Article
Matrix perturbation inequalities, such as Weyl's theorem (concerning the singular values) and the Davis–Kahan theorem (concerning the singular vectors), play essential roles in quantitative science; in particular, these bounds have found application in data analysis as well as related areas of engineering and computer science. In many situations, the perturbation is assumed to be random, and the original matrix has certain structural properties (such as having low rank). We show that, in this scenario, classical perturbation results, such as Weyl and Davis–Kahan, can be improved significantly. We believe many of our new bounds are close to optimal and also discuss some applications.
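Weyl's theorem, mentioned above, is easy to check numerically: each singular value of a perturbed matrix moves by at most the spectral norm of the perturbation. A minimal NumPy demonstration (random matrices chosen purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((50, 40))        # original matrix
E = 0.1 * rng.standard_normal((50, 40))  # random perturbation

s_A = np.linalg.svd(A, compute_uv=False)       # singular values of A
s_AE = np.linalg.svd(A + E, compute_uv=False)  # singular values of A + E
spec_norm_E = np.linalg.svd(E, compute_uv=False)[0]  # spectral norm of E

# Weyl: every singular value moves by at most the spectral norm of E
assert np.max(np.abs(s_AE - s_A)) <= spec_norm_E + 1e-12
```

The cited paper's point is that when E is random and A is structured (e.g. low-rank), this classical worst-case bound can be improved substantially.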
Article
This chapter describes gene expression analysis by Singular Value Decomposition (SVD), emphasizing initial characterization of the data. We describe SVD methods for visualization of gene expression data, representation of the data using a smaller number of variables, and detection of patterns in noisy gene expression data. In addition, we describe the precise relation between SVD analysis and Principal Component Analysis (PCA) when PCA is calculated using the covariance matrix, enabling our descriptions to apply equally well to either method. Our aim is to provide definitions, interpretations, examples, and references that will serve as resources for understanding and extending the application of SVD and PCA to gene expression analysis.
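The SVD/PCA relation described here can be stated concretely: the eigenvalues of the sample covariance matrix equal the squared singular values of the centered data matrix divided by n − 1. A short check (random data, purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((100, 6))   # n = 100 samples, 6 variables
Xc = X - X.mean(axis=0)             # center each column

# PCA via eigendecomposition of the covariance matrix
cov = Xc.T @ Xc / (Xc.shape[0] - 1)
eigvals = np.linalg.eigvalsh(cov)[::-1]        # descending order

# PCA via SVD of the centered data matrix
s = np.linalg.svd(Xc, compute_uv=False)
eigvals_svd = s**2 / (Xc.shape[0] - 1)

# The two routes give identical principal component variances
assert np.allclose(eigvals, eigvals_svd)
```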
Article
Joint maximum likelihood estimation (JMLE) is developed for diagnostic classification models (DCMs). JMLE has rarely been used in psychometrics because JMLE parameter estimators typically lack statistical consistency. The JMLE procedure presented here resolves the consistency issue by incorporating an external, statistically consistent estimator of examinees' proficiency class membership into the joint likelihood function, which subsequently allows for the construction of item parameter estimators that also have the consistency property. Consistency of the JMLE parameter estimators is established within the framework of general DCMs: the JMLE parameter estimators are derived for the Loglinear Cognitive Diagnosis Model (LCDM), and two consistency theorems are proven for the LCDM. Using the framework of general DCMs makes the results and proofs also applicable to DCMs that can be expressed as submodels of the LCDM. Simulation studies are reported for evaluating the performance of JMLE when used with tests of varying length and different numbers of attributes. As a practical application, JMLE is also used with real-world educational data collected with a language proficiency test.
Article
Describes a method of item factor analysis based on Thurstone's multiple-factor model and implemented by marginal maximum likelihood estimation and the EM algorithm. Statistical significance of successive factors added to the model was tested by the likelihood ratio criterion. Provisions for the effects of guessing on multiple-choice items, and for omitted and not-reached items, are included. Bayes constraints on the factor loadings were found to be necessary to suppress Heywood cases. Applications to simulated and real data are presented to substantiate the accuracy and practical utility of the method.
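Marginal maximum likelihood integrates the latent trait out of the likelihood before maximizing over item parameters. As a hedged illustration for a unidimensional two-parameter logistic model (not the multiple-factor implementation described above; the function name and quadrature size are assumptions), the marginal log-likelihood can be approximated with Gauss-Hermite quadrature:

```python
import numpy as np

def marginal_loglik_2pl(Y, a, b, n_quad=21):
    """Marginal log-likelihood of a unidimensional 2PL model,
    integrating the latent trait out by Gauss-Hermite quadrature.

    Y : (N, J) binary responses;  a, b : (J,) discriminations, difficulties
    """
    # Probabilists' Gauss-Hermite nodes/weights, normalized for N(0, 1)
    x, w = np.polynomial.hermite_e.hermegauss(n_quad)
    w = w / w.sum()
    # P(Y_ij = 1 | theta = x_q) at each quadrature node: shape (Q, J)
    eta = a[None, :] * (x[:, None] - b[None, :])
    p = 1.0 / (1.0 + np.exp(-eta))
    # Log-likelihood of each response pattern at each node: shape (N, Q)
    logL_q = Y @ np.log(p).T + (1 - Y) @ np.log(1 - p).T
    # Average over the latent distribution, then sum over respondents
    return np.sum(np.log(np.exp(logL_q) @ w))
```

In a full MML-EM fit this quantity would be maximized over (a, b), with the E-step computing posterior weights at the same nodes.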
Article
The usefulness of joint and conditional maximum likelihood estimation is considered for the Rasch model under realistic testing conditions in which the number of examinees is very large and the number of items is relatively large. Conditions for consistency and asymptotic normality are explored, effects of model error are investigated, measures of prediction are estimated, and generalized residuals are developed.
Article
The use of analytic rotation in exploratory factor analysis will be examined. Particular attention will be given to situations where there is a complex factor pattern and standard methods yield poor solutions. Some little known but interesting rotation criteria will be discussed and methods for weighting variables will be examined. Illustrations will be provided using Thurstone's 26 variable box data and other examples.
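For concreteness, the most common analytic rotation criterion, varimax, can be computed with a standard SVD-based iteration. This is a generic sketch of that textbook algorithm, not the particular criteria discussed in the article (the function name and tolerances are assumptions):

```python
import numpy as np

def varimax(A, tol=1e-8, max_iter=100):
    """Minimal varimax rotation sketch via SVD-based updates.

    A : (J, K) unrotated loading matrix
    Returns the rotated loadings and the orthogonal rotation matrix R.
    """
    J, K = A.shape
    R = np.eye(K)
    d_old = 0.0
    for _ in range(max_iter):
        L = A @ R
        # Gradient of the varimax criterion at the current rotation
        G = A.T @ (L**3 - L @ np.diag(np.mean(L**2, axis=0)))
        U, s, Vt = np.linalg.svd(G)
        R = U @ Vt                     # nearest orthogonal matrix to G
        d = s.sum()
        if d < d_old * (1 + tol):      # criterion stopped improving
            break
        d_old = d
    return A @ R, R
```

Because R is orthogonal, the rotated solution fits the data exactly as well as the unrotated one; only interpretability changes.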
Article
Consider the problem of estimating the entries of a large matrix, when the observed entries are noisy versions of a small random fraction of the original entries. This problem has received widespread attention in recent times, especially after the pioneering works of Emmanuel Candes and collaborators. Typically, it is assumed that the underlying matrix has low rank. This paper introduces a simple estimation procedure, called Universal Singular Value Thresholding (USVT), that works for any matrix that has "a little bit of structure". In particular, the matrix need not be of low rank. The procedure is very simple and fast, works under minimal assumptions, and is applicable for very large matrices. Surprisingly, this simple estimator achieves the minimax error rate up to a constant factor. The method is applied to give simple solutions to difficult questions in low rank matrix estimation, blockmodels, distance matrix completion, latent space models, positive definite matrix completion, problems related to graph limits, and generalized Bradley-Terry models for pairwise comparison.
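The USVT procedure is short enough to sketch directly from the description above: fill unobserved entries with zero, keep only singular values above a universal threshold, rescale by the sampling fraction, and project back to the known entry range. A hedged NumPy sketch (the threshold constant and parameter names follow common presentations and are assumptions here):

```python
import numpy as np

def usvt(Y, mask, eta=0.01):
    """Sketch of Universal Singular Value Thresholding.

    Y    : (m, n) observed matrix with entries in [-1, 1]
    mask : boolean matrix, True where an entry was observed
    eta  : small slack parameter in the threshold
    """
    m, n = Y.shape
    p_hat = mask.mean()                   # estimated sampling fraction
    X = np.where(mask, Y, 0.0)            # zero-fill unobserved entries
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Universal threshold: of order sqrt(max(m, n) * p_hat)
    thresh = (2 + eta) * np.sqrt(max(m, n) * p_hat)
    keep = s >= thresh                    # discard small singular values
    W = (U[:, keep] * s[keep]) @ Vt[keep] / p_hat
    return np.clip(W, -1.0, 1.0)          # project back to [-1, 1]
```

Note that no rank is supplied: the threshold itself decides how many singular values survive, which is what makes the procedure "universal".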
Article
Drawing on the authors' varied experiences working and teaching in the field, Analysis of Multivariate Social Science Data, Second Edition enables a basic understanding of how to use key multivariate methods in the social sciences. With updates in every chapter, this edition expands its topics to include regression analysis, confirmatory factor analysis, structural equation models, and multilevel models. After emphasizing the summarization of data in the first several chapters, the authors focus on regression analysis. This chapter provides a link between the two halves of the book, signaling the move from descriptive to inferential methods and from interdependence to dependence. The remainder of the text deals with model-based methods that primarily make inferences about processes that generate data. Relying heavily on numerical examples, the authors provide insight into the purpose and working of the methods as well as the interpretation of data. Many of the same examples are used throughout to illustrate connections between the methods. In most chapters, the authors present suggestions for further work that go beyond conventional exercises, encouraging readers to explore new ground in social science research. Requiring minimal mathematical and statistical knowledge, this book shows how various multivariate methods reveal different aspects of data and thus help answer substantive research questions.
Article
Exponential response models are a generalization of logit models for quantal responses and of regression models for normal data. In an exponential response model, $\{F(\theta): \theta \in \Theta\}$ is an exponential family of distributions with natural parameter $\theta$ and natural parameter space $\Theta \subset V$, where $V$ is a finite-dimensional vector space. A finite number of independent observations $S_i$, $i \in I$, are given, where for $i \in I$, $S_i$ has distribution $F(\theta_i)$. It is assumed that $\theta = \{\theta_i: i \in I\}$ is contained in a linear subspace. Properties of maximum likelihood estimates $\hat\theta$ of $\theta$ are explored. Maximum likelihood equations and necessary and sufficient conditions for existence of $\hat\theta$ are provided. Asymptotic properties of $\hat\theta$ are considered for cases in which the number of elements in $I$ becomes large. Results are illustrated by use of the Rasch model for educational testing.
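For the Rasch model used as the illustration above, the natural parameter for respondent i and item j is θ_i − b_j, and the joint log-likelihood has a simple closed form. A minimal sketch (the function name is an assumption):

```python
import numpy as np

def rasch_loglik(Y, theta, b):
    """Joint log-likelihood of the Rasch model.

    Y     : (N, J) binary responses
    theta : (N,) person abilities
    b     : (J,) item difficulties
    """
    eta = theta[:, None] - b[None, :]     # natural parameter theta_i - b_j
    # log P(Y=1) = eta - log(1 + e^eta);  log P(Y=0) = -log(1 + e^eta)
    return np.sum(Y * eta - np.logaddexp(0.0, eta))
```

Joint maximum likelihood maximizes this over theta and b simultaneously; conditional maximum likelihood instead conditions on the row sums, which are sufficient statistics for theta in this exponential family.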