Yunxiao Chen’s research while affiliated with London School of Economics and Political Science and other places

Publications (67)


Sequential Change Point Detection with FDR Control in Reconfigurable Sensor Networks
  • Preprint · April 2025 · 2 Reads

Seungwon Lee · Yunxiao Chen ·

This paper investigates sequential change-point detection in reconfigurable sensor networks. In this problem, data from multiple sensors are observed sequentially. Each sensor can have its own change point, and the data distribution differs before and after the change. We aim to detect these changes as quickly as possible once they have occurred while controlling the false discovery rate at all times. Our setting is more realistic than traditional settings in that (1) the set of active sensors (i.e., those from which data can be collected) can change over time through the deactivation of existing sensors and the addition of new sensors, and (2) dependencies can occur both between sensors and across time points. We propose powerful e-value-based detection procedures that control the false discovery rate uniformly over time. Numerical experiments demonstrate that, with the same false discovery rate target, our procedures outperform existing methods, exhibiting lower false non-discovery rates and shorter detection delays.
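As a rough illustration of the e-value ingredient, the sketch below applies the standard, non-sequential e-BH procedure of Wang and Ramdas to the e-values available at a single time point. It is not the procedure proposed in the paper, which additionally handles sensors joining and leaving the network and controls the false discovery rate uniformly over time; the function name `e_bh` and the e-values are hypothetical.

```python
from typing import List, Sequence


def e_bh(e_values: Sequence[float], alpha: float) -> List[int]:
    """Return the (sorted) indices of hypotheses rejected by e-BH.

    e-BH rejects the k* hypotheses with the largest e-values, where k* is
    the largest k such that the k-th largest e-value is >= m / (alpha * k).
    """
    m = len(e_values)
    # Hypothesis indices ordered by e-value, largest first.
    order = sorted(range(m), key=lambda i: e_values[i], reverse=True)
    k_star = 0
    for k in range(1, m + 1):
        if e_values[order[k - 1]] >= m / (alpha * k):
            k_star = k
    return sorted(order[:k_star])


# Hypothetical e-values for four sensors at one time point.
print(e_bh([100.0, 50.0, 1.0, 0.5], alpha=0.1))  # → [0, 1]
```

Large e-values are evidence that a sensor has changed; the threshold m / (alpha * k) relaxes as more hypotheses are rejected, mirroring the step-up structure of Benjamini–Hochberg.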


Figure 1: ANES 2020: Empirical cumulative distribution functions (ECDFs). Highlighted variables: Feminists (solid line), Gay men and Lesbians (dashed line), BLM movement (dotted line), and Scientists (dash-dot line).
Figure 3: PISA 2018: Empirical and model-implied marginal distributions of response times (in log-minutes). The solid line is the SN model, and the dashed line the Normal model.
Generalized Latent Variable Models for Location, Scale, and Shape parameters
  • Article · Full-text available · March 2025 · 43 Reads

Psychometrika

We introduce a general framework for latent variable modeling, named Generalized Latent Variable Models for Location, Scale, and Shape parameters (GLVM-LSS). This framework extends the generalized linear latent variable model beyond the exponential family distributional assumption and enables the modeling of distributional parameters other than the mean (location parameter), such as scale and shape parameters, as functions of latent variables. Model parameters are estimated via maximum likelihood. We present two real-world applications on public opinion research and educational testing, and evaluate the model’s performance in terms of parameter recovery through extensive simulation studies. Our results suggest that the GLVM-LSS is a valuable tool in applications where modeling higher-order moments of the observed variables through latent variables is of substantive interest. The proposed model is implemented in the R package glvmlss, available online.
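A minimal sketch of the location-and-scale idea, assuming a normal response whose mean and log standard deviation are both linear in a single latent variable `z`. The names `alpha` and `beta` are hypothetical and this is not the interface of the `glvmlss` package; the paper also covers shape parameters and non-normal families, and it estimates parameters by maximum likelihood rather than treating `z` as known.

```python
import math
from typing import Sequence


def normal_lss_logpdf(y: float, z: float,
                      alpha: Sequence[float], beta: Sequence[float]) -> float:
    """Log-density of y given latent z when both the mean and the log
    standard deviation are linear in z (hypothetical parameterisation)."""
    mu = alpha[0] + alpha[1] * z              # location depends on z
    sigma = math.exp(beta[0] + beta[1] * z)   # scale depends on z via a log link
    resid = (y - mu) / sigma
    return -0.5 * math.log(2 * math.pi) - math.log(sigma) - 0.5 * resid ** 2


# With beta[1] != 0, respondents with higher z give more dispersed responses.
for z in (-1.0, 0.0, 1.0):
    lp = normal_lss_logpdf(0.0, z, alpha=(0.0, 1.0), beta=(0.0, 0.5))
    print(f"z={z:+.1f}  sd={math.exp(0.5 * z):.3f}  log f(y=0 | z)={lp:.4f}")
```

In a standard generalized linear latent variable model only `mu` would depend on `z`; letting `sigma` vary as well is what allows the latent variable to drive the spread of responses.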



When Composite Likelihood meets Stochastic Approximation

December 2024 · 15 Reads · 1 Citation


Bias and RMSE of IRT Parameter Estimates when c = 25 and β = −0.1.
A Latent Variable Model with Change Points and Its Application to Time Pressure Effects in Educational Assessment

October 2024 · 35 Reads

Educational assessments are valuable tools for measuring student knowledge and skills, but their validity can be compromised when test takers exhibit changes in response behavior due to factors such as time pressure. To address this issue, we introduce a novel latent factor model with change-points for item response data, designed to detect and account for individual-level shifts in response patterns during testing. This model extends traditional Item Response Theory (IRT) by incorporating person-specific change-points, which enables simultaneous estimation of item parameters, person latent traits, and the location of behavioral changes. We evaluate the proposed model through extensive simulation studies, which demonstrate its ability to accurately recover item parameters, change-point locations, and individual ability estimates under various conditions. Our findings show that accounting for change-points significantly reduces bias in ability estimates, particularly for respondents affected by time pressure. Application of the model to two real-world educational testing datasets reveals distinct patterns of change-point occurrence between high-stakes and lower-stakes tests, providing insights into how test-taking behavior evolves during the tests. This approach offers a more nuanced understanding of test-taking dynamics, with important implications for test design, scoring, and interpretation.
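To make the idea of a person-specific change point concrete, the sketch below profiles one respondent's log-likelihood over every candidate change location. The Rasch-type pre-change model, the post-change logit shift `delta`, and all parameter values are assumptions for illustration; the paper estimates item parameters, latent traits and change points jointly, and its post-change model may differ.

```python
import math
from typing import List, Sequence, Tuple


def logistic(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def change_point_loglik(y: Sequence[int], theta: float, b: Sequence[float],
                        delta: float, tau: int) -> float:
    """Log-likelihood of one person's 0/1 responses when items with index
    >= tau are answered after a (hypothetical) change that shifts the logit
    by delta. tau == len(y) means no change occurred."""
    ll = 0.0
    for j, (yj, bj) in enumerate(zip(y, b)):
        shift = delta if j >= tau else 0.0
        p = logistic(theta - bj + shift)  # Rasch-type success probability
        ll += math.log(p) if yj == 1 else math.log(1.0 - p)
    return ll


def most_likely_change_point(y: Sequence[int], theta: float, b: Sequence[float],
                             delta: float) -> Tuple[int, List[float]]:
    """Evaluate every candidate tau = 0, ..., J and return the best one."""
    lls = [change_point_loglik(y, theta, b, delta, tau) for tau in range(len(y) + 1)]
    best = max(range(len(lls)), key=lls.__getitem__)
    return best, lls


# Hypothetical respondent: first five items correct, last five wrong.
y = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
tau_hat, _ = most_likely_change_point(y, theta=0.0, b=[0.0] * 10, delta=-3.0)
print(tau_hat)  # → 5
```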


Unfolding the Network of Peer Grades: A Latent Variable Approach

October 2024 · 24 Reads

Peer grading is an educational system in which students assess each other's work. It is commonly applied under Massive Open Online Course (MOOC) and offline classroom settings. With this system, instructors receive a reduced grading workload, and students enhance their understanding of course materials by grading others' work. Peer grading data have a complex dependence structure, for which all the peer grades may be dependent. This complex dependence structure is due to a network structure of peer grading, where each student can be viewed as a vertex of the network, and each peer grade serves as an edge connecting one student as a grader to another student as an examinee. This paper introduces a latent variable model framework for analyzing peer grading data and develops a fully Bayesian procedure for its statistical inference. This framework has several advantages. First, when aggregating multiple peer grades, the average score and other simple summary statistics fail to account for grader effects and, thus, can be biased. The proposed approach produces more accurate model parameter estimates and, therefore, more accurate aggregated grades, by modeling the heterogeneous grading behavior with latent variables. Second, the proposed method provides a way to assess each student's performance as a grader, which may be used to identify a pool of reliable graders or generate feedback to help students improve their grading. Third, our model may further provide insights into the peer grading system by answering questions such as whether a student who performs better in coursework also tends to be a more reliable grader. Finally, thanks to the Bayesian approach, uncertainty quantification is straightforward when inferring the student-specific latent variables as well as the structural parameters of the model. The proposed method is applied to two real-world datasets.
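The grader-effect argument can be illustrated with a deliberately simplified model, assuming grade = examinee score + grader bias with no noise, fitted by least squares with alternating updates. This is not the paper's fully Bayesian latent variable model, which also captures grader reliability and quantifies uncertainty; the function name and the three-student data set are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (grader, examinee, grade): a hypothetical record layout.
Grade = Tuple[str, str, float]


def fit_additive_grader_model(
    grades: List[Grade], n_iter: int = 200
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Least-squares fit of grade = score[examinee] + bias[grader] by
    alternating updates; biases are centred to mean zero for identifiability."""
    bias = {g: 0.0 for g, _, _ in grades}
    score: Dict[str, float] = {}
    for _ in range(n_iter):
        # Update each examinee's score given the current grader biases.
        by_examinee = defaultdict(list)
        for g, e, y in grades:
            by_examinee[e].append(y - bias[g])
        score = {e: sum(v) / len(v) for e, v in by_examinee.items()}
        # Update each grader's bias given the current scores.
        by_grader = defaultdict(list)
        for g, e, y in grades:
            by_grader[g].append(y - score[e])
        bias = {g: sum(v) / len(v) for g, v in by_grader.items()}
        # Centre biases; shifting scores the other way keeps every fitted grade unchanged.
        shift = sum(bias.values()) / len(bias)
        bias = {g: b - shift for g, b in bias.items()}
        score = {e: s + shift for e, s in score.items()}
    return score, bias


grades = [("A", "B", 75.0), ("A", "C", 65.0), ("B", "A", 75.0),
          ("B", "C", 55.0), ("C", "A", 80.0), ("C", "B", 70.0)]
score, bias = fit_additive_grader_model(grades)
print({e: round(s, 2) for e, s in sorted(score.items())})  # → {'A': 80.0, 'B': 70.0, 'C': 60.0}
```

In this toy data the simple average for student A is 77.5 because A is graded by the harsh grader B, whereas the fitted model recovers A's underlying score of 80.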


Figure 1.
Figure 2.
Figure 3. Estimated network structure for MDD and GAD. a Complete-case analysis. b Proposed method.
MSEs and biases for edge parameters.
A Note on Ising Network Analysis with Missing Data

July 2024 · 30 Reads · 1 Citation

Psychometrika

The Ising model has become a popular psychometric model for analyzing item response data. The statistical inference of the Ising model is typically carried out via a pseudo-likelihood, as the standard likelihood approach suffers from a high computational cost when there are many variables (i.e., items). Unfortunately, the presence of missing values can hinder the use of pseudo-likelihood, and a listwise deletion approach for missing data treatment may introduce a substantial bias into the estimation and sometimes yield misleading interpretations. This paper proposes a conditional Bayesian framework for Ising network analysis with missing data, which integrates a pseudo-likelihood approach with iterative data imputation. An asymptotic theory is established for the method. Furthermore, a computationally efficient Pólya–Gamma data augmentation procedure is proposed to streamline the sampling of model parameters. The method’s performance is shown through simulations and a real-world application to data on major depressive and generalized anxiety disorders from the National Epidemiological Survey on Alcohol and Related Conditions (NESARC).
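The pseudo-likelihood that the method builds on replaces the intractable joint likelihood with a sum of node-wise logistic conditionals. A minimal sketch, assuming 0/1 coding, a symmetric interaction matrix `omega` with zero diagonal, and complete data; the iterative imputation step and the Pólya–Gamma sampler described in the abstract are not shown, and the function name is hypothetical.

```python
import math
from typing import Sequence


def softplus(eta: float) -> float:
    """Numerically stable log(1 + exp(eta))."""
    return max(eta, 0.0) + math.log1p(math.exp(-abs(eta)))


def ising_pseudo_loglik(data: Sequence[Sequence[int]],
                        alpha: Sequence[float],
                        omega: Sequence[Sequence[float]]) -> float:
    """Pseudo-log-likelihood of 0/1 response vectors under an Ising model with
    P(x_j = 1 | x_-j) = logistic(alpha_j + sum_{k != j} omega_jk * x_k)."""
    total = 0.0
    p = len(alpha)
    for x in data:
        for j in range(p):
            eta = alpha[j] + sum(omega[j][k] * x[k] for k in range(p) if k != j)
            # log P(x_j | rest) = x_j * eta - log(1 + exp(eta))
            total += x[j] * eta - softplus(eta)
    return total


omega = [[0.0, 0.5, 0.0], [0.5, 0.0, 0.5], [0.0, 0.5, 0.0]]
print(ising_pseudo_loglik([[1, 1, 0], [0, 1, 1]], alpha=[0.0, 0.0, 0.0], omega=omega))
```

Each term needs only a one-dimensional normaliser, which is why the cost grows gently with the number of items; a missing entry in `x`, however, breaks every conditional that involves it.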


Pairwise stochastic approximation for confirmatory factor analysis of categorical data

April 2024 · 8 Reads · 2 Citations

British Journal of Mathematical and Statistical Psychology

Pairwise likelihood is a limited‐information method widely used to estimate latent variable models, including factor analysis of categorical data. It can often avoid evaluating high‐dimensional integrals and, thus, is computationally more efficient than relying on the full likelihood. Despite its computational advantage, the pairwise likelihood approach can still be demanding for large‐scale problems that involve many observed variables. We tackle this challenge by employing an approximation of the pairwise likelihood estimator, which is derived from an optimization procedure relying on stochastic gradients. The stochastic gradients are constructed by subsampling the pairwise log‐likelihood contributions, for which the subsampling scheme controls the per‐iteration computational complexity. The stochastic estimator is shown to be asymptotically equivalent to the pairwise likelihood one. However, finite‐sample performance can be improved by compounding the sampling variability of the data with the uncertainty introduced by the subsampling scheme. We demonstrate the performance of the proposed method using simulation studies and two real data applications.
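The subsampling idea can be sketched on a toy problem in which each pair of items contributes a quadratic loss instead of a pairwise categorical log-likelihood, an assumption made to keep the example short. At each iteration only `batch_size` of the p(p - 1)/2 pairs are evaluated, and a Robbins–Monro step size of 1/n is used; both choices are illustrative rather than those of the paper, and `pairwise_sa` is a hypothetical name.

```python
import itertools
import random
from typing import Dict, Tuple

Pair = Tuple[int, int]


def pairwise_sa(targets: Dict[Pair, float], batch_size: int,
                n_iter: int, seed: int = 0) -> float:
    """Stochastic approximation for the toy pairwise objective
    L(theta) = mean over pairs of 0.5 * (theta - t_jk)^2,
    evaluating only `batch_size` randomly chosen pairs per iteration."""
    rng = random.Random(seed)
    pairs = list(targets)
    theta = 0.0
    for n in range(1, n_iter + 1):
        batch = rng.sample(pairs, batch_size)  # subsample pairwise contributions
        # Mini-batch gradient: unbiased for the full-data gradient L'(theta).
        grad = sum(theta - targets[pr] for pr in batch) / batch_size
        theta -= grad / n  # Robbins-Monro step size 1/n
    return theta


p = 5
targets = {(j, k): float(j + k) for j, k in itertools.combinations(range(p), 2)}
print(pairwise_sa(targets, batch_size=3, n_iter=2000))  # close to the full-data optimum 4.0
```

With step size 1/n and this quadratic loss, the iterate is the running average of the mini-batch means, so it settles near the full-data optimum while each step touches only 3 of the 10 pairs; the extra subsampling noise is the price paid for the cheaper iterations.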


Figure 1. Path diagram of the proposed model, where the dashed lines indicate the DIF effects.
Estimated item easiness and DIF effects for the detected DIF items.
DIF Analysis with Unknown Groups and Anchor Items

February 2024 · 98 Reads · 4 Citations

Psychometrika

Ensuring fairness in instruments like survey questionnaires or educational tests is crucial. One way to address this is by a Differential Item Functioning (DIF) analysis, which examines if different subgroups respond differently to a particular item, controlling for their overall latent construct level. DIF analysis is typically conducted to assess measurement invariance at the item level. Traditional DIF analysis methods require knowing the comparison groups (reference and focal groups) and anchor items (a subset of DIF-free items). Such prior knowledge may not always be available, and psychometric methods have been proposed for DIF analysis when one piece of information is unknown. More specifically, when the comparison groups are unknown while anchor items are known, latent DIF analysis methods have been proposed that estimate the unknown groups by latent classes. When anchor items are unknown while comparison groups are known, methods have also been proposed, typically under a sparsity assumption – the number of DIF items is not too large. However, DIF analysis when both pieces of information are unknown has not received much attention. This paper proposes a general statistical framework under this setting. In the proposed framework, we model the unknown groups by latent classes and introduce item-specific DIF parameters to capture the DIF effects. Assuming the number of DIF items is relatively small, an $L_1$-regularised estimator is proposed to simultaneously identify the latent classes and the DIF items. A computationally efficient Expectation-Maximisation (EM) algorithm is developed to solve the non-smooth optimisation problem for the regularised estimator. The performance of the proposed method is evaluated by simulation studies and an application to item response data from a real-world educational test.
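The role of the $L_1$ penalty can be seen in a stripped-down problem, assuming one estimated group difference per item that is explained by a common shift `mu` (unpenalised) plus sparse item-specific DIF effects `delta_j`. Without the penalty `mu` and `delta` are not identified; the penalty selects the solution in which most items behave as anchors, so no anchor set has to be specified in advance. This toy block coordinate descent is not the paper's EM algorithm, which also estimates the latent classes; all names and numbers are hypothetical.

```python
from typing import List, Tuple


def soft_threshold(x: float, lam: float) -> float:
    """Proximal operator of lam * |x|: shrinks x toward 0 and zeroes small values."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0


def sparse_shift_fit(d: List[float], lam: float,
                     n_iter: int = 200) -> Tuple[float, List[float]]:
    """Minimise 0.5 * sum_j (d_j - mu - delta_j)^2 + lam * sum_j |delta_j|
    by alternating exact updates of mu and of the delta_j."""
    delta = [0.0] * len(d)
    mu = 0.0
    for _ in range(n_iter):
        mu = sum(dj - de for dj, de in zip(d, delta)) / len(d)
        delta = [soft_threshold(dj - mu, lam) for dj in d]
    return mu, delta


# Hypothetical item-level group differences: four items share a common shift, one deviates.
mu, delta = sparse_shift_fit([0.5, 0.5, 0.5, 0.5, 2.0], lam=0.3)
print(round(mu, 3), [round(x, 3) for x in delta])  # → 0.575 [0.0, 0.0, 0.0, 0.0, 1.125]
```

Only the fifth item is flagged. The shrinkage also pulls `mu` slightly away from 0.5, a familiar side effect of $L_1$ penalties.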


Variable selection in latent variable models via knockoffs: an application to international large-scale assessment in education

December 2023 · 114 Reads · 1 Citation

Journal of the Royal Statistical Society Series A (Statistics in Society)

International large-scale assessments (ILSAs) play an important role in educational research and policy making. They collect valuable data on education quality and performance development across many education systems, giving countries the opportunity to share techniques, organisational structures, and policies that have proven efficient and successful. To gain insights from ILSA data, we identify non-cognitive variables associated with students’ academic performance. This problem has three analytical challenges: (a) academic performance is measured by cognitive items under a matrix sampling design; (b) there are many missing values in the non-cognitive variables; and (c) multiple comparisons due to a large number of non-cognitive variables. We consider an application to the Programme for International Student Assessment, aiming to identify non-cognitive variables associated with students’ performance in science. We formulate it as a variable selection problem under a general latent variable model framework and further propose a knockoff method that conducts variable selection with a controlled error rate for false selections.
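The final selection step of a knockoff procedure can be sketched independently of how the knockoffs are built. Assuming feature statistics `W_j` have already been computed (large positive values favour the original variable over its knockoff), the knockoff+ threshold of Barber and Candès is applied below at target level `q`. Constructing valid knockoffs under the latent variable model is the paper's contribution and is not shown; the statistics are made up.

```python
import math
from typing import List, Sequence


def knockoff_plus_select(w: Sequence[float], q: float) -> List[int]:
    """Knockoff+ selection: find the smallest t > 0 among the nonzero |W_j| with
    (1 + #{W_j <= -t}) / max(1, #{W_j >= t}) <= q, then select all W_j >= t."""
    candidates = sorted({abs(x) for x in w if x != 0})
    threshold = math.inf  # no feasible t means nothing is selected
    for t in candidates:
        neg = sum(1 for x in w if x <= -t)
        pos = sum(1 for x in w if x >= t)
        if (1 + neg) / max(1, pos) <= q:
            threshold = t
            break
    return [j for j, x in enumerate(w) if x >= threshold]


print(knockoff_plus_select([5, 4, 3, 2, -1, 0.5, -0.5], q=0.5))  # → [0, 1, 2, 3]
```

The count of negative statistics acts as an estimate of the number of false selections above the threshold, which is what allows the error rate for false selections to be controlled.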


Citations (34)


... $f_1, f_2, \ldots, f_p$ and the $m$ specific factors (indexed $j = 1, 2, \ldots, m$) are the new variables sought. The former are known as the common factors, which can be understood as the commonality, and the latter are called single factors, also known as individuality factors. The positive integer $p$ denotes the number of common factors, which is much smaller than the original number of variables $m$; that is, the original $m$ variables are reduced to a small number of factors. The coefficients $a_{kj}$ and the coefficients of the specific factors ($j = 1, 2, \ldots, m$; $k = 1, 2, \ldots, p$) are called factor loadings: the former are the common factor loadings and the latter the univariate factor loadings. Since we are concerned with the common factors only, the term "factor loadings" usually refers to the former [15]. ...

Reference:

An Exploration of the Role of Factor Analysis in Cultural Cognition on English Proficiency
Pairwise stochastic approximation for confirmatory factor analysis of categorical data
  • Citing Article
  • April 2024

British Journal of Mathematical and Statistical Psychology
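Written out, the factor model described in the first excerpt takes the form below. The excerpt's own symbols for the observed variables and the specific factors were lost in extraction, so $x_j$, $\varepsilon_j$ and the specific-factor coefficient $c_j$ are assumed notation.

```latex
x_j = a_{1j} f_1 + a_{2j} f_2 + \cdots + a_{pj} f_p + c_j \varepsilon_j,
\qquad j = 1, 2, \ldots, m, \qquad p \ll m .
```

Here $a_{kj}$ ($k = 1, \ldots, p$) are the common factor loadings of $x_j$ and $c_j$ is its specific (univariate) loading.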

... Detecting DIF through logistic regression and odds ratio analysis is essential to maintain test fairness and comparability (Carle & Mara, 2016; Jin et al., 2018). Studies have revealed that DIF analysis is necessary for high-stakes and low-stakes assessments like language proficiency tests for kids, where gender DIF was detected and addressed via statistical and expert judgment methods Wallin et al., 2024). Expunging and adjusting items showing DIF can enhance the accuracy of test scores and prevent biases that may affect parameter estimation and equating processes (Kabasakal & Kelecioglu, 2015; Maller & Pei, 2017). ...

DIF Analysis with Unknown Groups and Anchor Items

Psychometrika

... DIF detection methods are typically categorized as non-parametric (Holland and Thayer, 1986; Dorans and Kulick, 1986) and parametric (Rudner et al., 1980; Raju, 1988; Lord, 1980; Thissen et al., 2013; Swaminathan and Rogers, 1990). More recently, DIF detection methods that do not require predefined group memberships or anchor items have been proposed (Chen et al., 2023; Wallin et al., 2024; Halpin, 2024). While research has explored methods for handling items with DIF (Cho et al., 2016; Liu and Jane Rogers, 2022), items identified with significant DIF are often removed during item calibration, wasting the resources and effort invested in their development and administration. ...

DIF Statistical Inference Without Knowing Anchoring Items

Psychometrika

... Upon computation of the MLE, in exploratory settings, an orthogonal or oblique rotation can be applied to the estimated factor loading matrix, $\hat{A}$, to obtain a more interpretable and sparse solution (e.g., Jennrich, 2004; Liu et al., 2023). ...

Rotation to Sparse Loadings Using [Formula: see text] Losses and Related Inference Problems

Psychometrika
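The rotation step mentioned in the excerpt above can be illustrated with a two-factor loading matrix. The sketch applies an orthogonal rotation with a hand-picked angle (an assumption; the cited work chooses rotations by optimising a sparsity criterion) and checks that the communalities, i.e. the row sums of squared loadings, are unchanged while the rotated matrix becomes sparse. The function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def rotate(loadings: Matrix, angle: float) -> Matrix:
    """Post-multiply a p x 2 loading matrix by the orthogonal rotation
    T = [[cos a, -sin a], [sin a, cos a]]."""
    c, s = math.cos(angle), math.sin(angle)
    return [[r[0] * c + r[1] * s, -r[0] * s + r[1] * c] for r in loadings]


def communalities(loadings: Matrix) -> List[float]:
    """Row sums of squared loadings; orthogonal rotations leave these unchanged."""
    return [sum(v * v for v in row) for row in loadings]


A = [[0.6, 0.6], [0.6, -0.6]]      # hypothetical unrotated loadings
A_rot = rotate(A, math.pi / 4)     # angle chosen by hand for illustration
print([[round(v, 3) for v in row] for row in A_rot])
print(communalities(A), communalities(A_rot))
```

Both matrices fit the data equally well, since rotation leaves the implied covariance unchanged; the rotated one is simply easier to interpret, with each variable loading on a single factor.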

... Almost no general methods are available for latent DIF analysis when the anchor set is unavailable. Two notable exceptions are Chen et al. (2022) and Robitzsch (2022). In Chen et al. (2022), a Bayesian hierarchical model for latent DIF analysis is proposed and applied for the simultaneous detection of item leakage and preknowledge in educational tests. ...

Detection of two-way outliers in multivariate data and application to cheating detection in educational tests
  • Citing Article
  • September 2022

The Annals of Applied Statistics

... Unless the objective function is strictly concave, both the ... However, the GH approach runs into computational challenges when the dimension of the latent space is high. In such cases, alternatives like adaptive GH quadrature (Rabe-Hesketh et al., 2005; Schilling and Bock, 2005) or stochastic approximation methods (Cai, 2010; Zhang and Chen, 2022) should be considered. ...

Computation for Latent Variable Model Estimation: A Unified Stochastic Proximal Framework

Psychometrika
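Why Gauss–Hermite (GH) quadrature struggles in high dimensions can be seen from its tensor-product construction: a rule with q points per dimension needs q**d evaluations in d dimensions. The sketch uses the 3-point GH rule for a standard normal weight (nodes 0 and ±√3, weights 2/3 and 1/6); it is plain, non-adaptive quadrature, and the function name is hypothetical.

```python
import itertools
import math
from typing import Callable, Sequence

# 3-point Gauss-Hermite rule for a standard normal weight (exact up to degree 5).
NODES = (-math.sqrt(3.0), 0.0, math.sqrt(3.0))
WEIGHTS = (1.0 / 6.0, 2.0 / 3.0, 1.0 / 6.0)


def gh_expectation(f: Callable[[Sequence[float]], float], dim: int) -> float:
    """Approximate E[f(z)] for z ~ N(0, I_dim) on the tensor-product grid (3**dim points)."""
    total = 0.0
    for idx in itertools.product(range(3), repeat=dim):
        z = [NODES[i] for i in idx]
        w = math.prod(WEIGHTS[i] for i in idx)
        total += w * f(z)
    return total


print(gh_expectation(lambda z: z[0] ** 2 * z[1] ** 2, dim=2))  # E[z1^2 z2^2] = 1 (up to rounding)
for d in (1, 5, 10, 20):
    print(d, 3 ** d)  # grid size grows exponentially with the latent dimension
```

Already at 20 latent dimensions the grid has about 3.5 billion points, which is why adaptive quadrature or stochastic approximation becomes attractive.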

... For each given $K$, we obtain a model $\hat{M}$ with its estimated parameters $(\hat{A}, \hat{b}, \hat{\Sigma})$ using the EMS algorithm. Inspired by Cho, Wang, Zhang, and Xu (2020) and Chen and Li (2020), the approximated BIC of the model $\hat{M}$ with the estimated parameters $(\hat{A}, \hat{b}, \hat{\Sigma})$ can be computed as ...

Determining the number of factors in high-dimensional generalised latent factor models
  • Citing Article
  • August 2021
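The BIC formula in the excerpt above was truncated, so the sketch below uses the standard BIC, -2 log L + k log n, to choose the number of factors K. It is an assumption, not the approximation used in the cited work, and the log-likelihoods and parameter counts are hypothetical.

```python
import math
from typing import Dict


def bic(loglik: float, n_params: int, n_obs: int) -> float:
    """Standard BIC: -2 * log-likelihood + (number of parameters) * log(n)."""
    return -2.0 * loglik + n_params * math.log(n_obs)


def select_k(logliks: Dict[int, float], n_params: Dict[int, int], n_obs: int) -> int:
    """Return the number of factors K with the smallest BIC."""
    return min(logliks, key=lambda k: bic(logliks[k], n_params[k], n_obs))


# Hypothetical fitted log-likelihoods and parameter counts for K = 1, 2, 3.
print(select_k({1: -1000.0, 2: -950.0, 3: -945.0}, {1: 20, 2: 40, 3: 60}, n_obs=100))  # → 2
```

Moving from K = 2 to K = 3 improves the log-likelihood by only 5 while adding 20 parameters, so the log(n) penalty favours the two-factor model.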

... There are several families of models for parametric modeling of latent variables. Over the years, psychometricians have developed and routinely employed a variety of estimation methods, including Maximum Likelihood estimators (Chen & Zhang, 2021), Least Squares estimators (Savalei & Rosseel, 2022), Bayesian samplers (Levy & Mislevy, 2017; Wu et al., 2020), and regularization techniques (Robitzsch, 2023), as well as numerous modifications and combinations tailored to specific models. More recently, backpropagation has been proposed as a technique for estimating psychometric models Converse, 2021). ...

Estimation Methods for Item Factor Analysis: An Overview
  • Citing Chapter
  • October 2021