Ke-Hai Yuan

University of Notre Dame | ND · Department of Psychology

PhD

About

205
Publications
120,442
Reads
12,011
Citations
Additional affiliations
July 2008 - December 2015
University of Notre Dame
Position
  • Professor
August 1998 - June 2001
University of North Texas
Position
  • Professor (Assistant)

Publications (205)
Article
Full-text available
Data in social and behavioral sciences typically contain measurement errors and also do not have predefined metrics. Structural equation modeling (SEM) is commonly used to analyze such data. This article discusses issues in latent-variable modeling as compared to regression analysis with composite-scores. Via logical reasoning and analytical results...
Article
Most existing studies on the relationship between factor analysis (FA) and principal component analysis (PCA) focus on approximating the common factors by the first few components via the closeness between their loadings. Based on a setup in Bentler and de Leeuw (Psychometrika 76:461–470, 2011), this study examines the relationship between FA loadi...
Article
Full-text available
Mediation analysis plays an important role in understanding causal processes in social and behavioral sciences. While path analysis with composite scores has been criticized for yielding biased parameter estimates when variables contain measurement errors, recent literature has pointed out that the population values of parameters of latent-variable models a...
Article
Full-text available
This study applies equivalence testing methods on a representative sample of published empirical structural equation modeling studies in the psychological sciences and assesses the extent to which reported fit results are replicated when compared to those obtained via traditional methods. Results of 382 models from a sample of 242 articles publishe...
Article
Purpose: Although numerous studies have been conducted to explore the impact of various factors on employees' turnover intention and intention to remain with the organization, the relationship between these two constructs remains largely unexplored. Considering the significance of these constructs, particularly in the context of the COVID-19 pandemi...
Chapter
Structural equation modeling (SEM) is a widely used technique for studies involving latent constructs. While covariance-based SEM (CB-SEM) permits estimating the regression relationship among latent constructs, the parameters governing this relationship do not apply to that among the scored values of the constructs, which are needed for prediction,...
Chapter
Cronbach’s alpha remains very important as a measure of internal consistency in the social sciences. The Spearman-Brown formula indicates that as the number of items goes to infinity, the reliability of the composite eventually approaches one. Under proper conditions, as the lower bound of the reliability the coefficient alpha also keeps increasing...
Article
Full-text available
R-squared measures of explained variance are easy to understand, naturally interpretable, and widely used by substantive researchers. In mediation analysis, however, despite recent advances in measures of mediation effect, few effect sizes have good statistical properties. Also, most of these measures are only available for the simplest three-varia...
Article
Purpose: The purpose of this paper is to discuss the pros and cons of partial least squares approach to structural equation modeling (PLS-SEM). The topics include bias, consistency, maximization of R², reliability and model validation. Design/methodology/approach: The approach in this study is descriptive, and the method consists of logical argume...
Article
The impact of missing data on statistical inference varies depending on several factors such as the proportion of missingness, missing-data mechanism, and method employed to handle missing values. While these topics have been extensively studied, most recommendations have been made assuming that all missing values are from the same missing-data mec...
Article
Observational data typically contain measurement errors. Covariance‐based structural equation modelling (CB‐SEM) is capable of modelling measurement errors and yields consistent parameter estimates. In contrast, methods of regression analysis using weighted composites as well as a partial least squares approach to SEM facilitate the prediction and...
Chapter
The aim of this article is to discuss key features of missing data analysis and its related developments. Concepts covered by this article include missing-data mechanisms, statistical methods commonly used to handle missing values, and statistical methods to address uncertainty of the underlying missing-data mechanism which include tests of the mis...
Article
This is a follow-up reply to “Structural parameters under partial least squares and covariance-based structural equation modeling: A comment on Yuan and Deng (2021)” (Yuan, K.-H., & Deng, L. (2021). Equivalence of partial-least-squares SEM and the methods of factor-score regression. Structural Equation Modeling, 28, 557–571. https://doi.org/10.1080/10...
Presentation
Full-text available
This presentation gives an overview of recent developments in moderation and mediation analyses. The developments are in 6 articles (listed at the end of this document) that contain statistical models that honestly match the conceptual models for 1) moderation analysis, 2) moderated mediation, 3) mediated moderation, and 4) mediation. The conve...
Article
Full-text available
Mediation and moderation analyses are commonly used methods for studying the relationship between an independent variable (X) and a dependent variable (Y) in conducting empirical research. To better understand the relationships among variables, there is an increasing demand for a more general theoretical framework that combines moderation and media...
Article
Structural equation modeling (SEM) and path analysis using composite-scores are distinct classes of methods for modeling the relationship of theoretical constructs. The two classes of methods are integrated in the partial-least-squares approach to structural equation modeling (PLS-SEM), which systematically generates weighted composites and uses th...
Article
Structural equation modeling (SEM) is widely used in behavioral, social, and education research. Drawing publication-ready path diagrams for SEM is not a pleasant task with the existing software. The article introduces an open-source web-based graphical application, semdiag, for drawing WYSIWYG SEM path diagrams interactively. The application is an...
Article
Structural equation modeling (SEM) has been deemed as a proper method when variables contain measurement errors. In contrast, path analysis with composite scores is preferred for prediction and diagnosis of individuals. While path analysis with composite scores has been criticized for yielding biased parameter estimates, recent literature pointed o...
Article
This article proposes a two-level moderated mediation (2moME) model with single level data, and develops measures to quantify the moderated mediation (moME) effect sizes for both the conventional moME model and the 2moME model. A Bayesian approach is developed to estimate and test moME effects and the corresponding effect sizes (ES). Monte Carlo re...
Chapter
Cronbach’s coefficient alpha remains very important as a measure of internal consistency. The well-known Spearman-Brown formula indicates that as the number of items (i.e., the dimension) goes to infinity, the coefficient alpha eventually approaches one. In this work, we show that under the assumption of a one-factor model, not necessarily with par...
Article
Full-text available
Missing values that are missing not at random (MNAR) can result from a variety of missingness processes. However, two fundamental subtypes of MNAR values can be obtained from the definition of the MNAR mechanism itself. The distinction between them deserves consideration because they have characteristic differences in how they distort relationships...
Article
Partial-least-squares approach to structural equation modeling (PLS-SEM) uses proxies of latent variables to conduct regression analysis, which directly addresses the needs of prediction and classification. Regression analysis using factor-scores has the same capacity but different factor scores have been noted with different properties. This artic...
Article
Differential item functioning (DIF) analysis is an important step in establishing the validity of measurements. Most traditional methods for DIF analysis use an item-by-item strategy via anchor items that are assumed DIF-free. If anchor items are flawed, these methods will yield misleading results due to biased scales. In this article, based on the...
Article
Full-text available
The article On Averaging Variables in a Confirmatory Factor Analysis Model
Article
Data in social sciences are typically non‐normally distributed and characterized by heavy tails. However, most widely used methods in social sciences are still based on the analyses of sample means and sample covariances. While these conventional methods continue to be used to address new substantive issues, conclusions reached can be inaccurate or...
Article
Chi-square type test statistics are widely used in assessing the goodness-of-fit of a theoretical model. The exact distributions of such statistics can be quite different from the nominal chi-square distribution due to violation of conditions encountered with real data. In such instances, the bootstrap or Monte Carlo methodology might be used to ap...
Article
Full-text available
Measures of explained variance, ΔR² and f², are routinely used to evaluate the size of moderation effects. However, they suffer from several drawbacks: (a) Not all the variance components of the outcome variable Y are related to the effect of moderation, and so an effect size with the total variance of Y as the denominator cannot accurately charact...
Article
Multilevel structural equation models (MSEM) are typically evaluated on the basis of goodness of fit indices. A problem with these indices is that they pertain to the entire model, reflecting simultaneously the degree of fit for all levels in the model. Consequently, in cases that lack model fit, it is unclear which level model is misspecified. Alt...
Chapter
Many aspects of multivariate analysis involve obtaining the precision matrix, i.e., the inverse of the covariance matrix. When the dimension is larger than the sample size, the sample covariance matrix is no longer positive definite, and the inverse does not exist. Under the sparsity assumption on the elements of the precision matrix, the problem c...
Article
Data-driven model modification plays an important role for a statistical methodology to advance the understanding of subjective matters. However, when the sample size is not sufficiently large, model modification using the Lagrange multiplier (LM) test has been found to perform poorly due to capitalization on chance. With the recent development of...
Presentation
Full-text available
The purpose of this interdisciplinary paper is to open a new horizon for psychologists and mind scientists. This paper is interdisciplinary because it links psychology, applied computing, mathematics, neuroscience, methodology and data sciences. This paper assumes that the brain is a fuzzy inference engine and the mind is a colle...
Article
In exploratory factor analysis (EFA), cross-loadings frequently occur in empirical research, but their effects on determining the number of factors to retain are seldom known. In this paper, we analyzed whether and how cross-loadings affected the performance of the parallel analysis (PA), the empirical Kaiser criterion (EKC), the likelihood ratio tes...
Conference Paper
The squared multiple correlation ($R^2$) is commonly used to measure how well the outcome variable is linearly related to a set of predictors. Unfortunately, $R^2$ is biased for its population counterpart ($\rho^2$), and the bias increases as the number of variables ($p$) increases. Efforts have been made to modify $R^2$. The most notable result is...
Article
Although callous-unemotional traits have been shown to play an important role in cyberbullying perpetration, little is known about mediating and moderating mechanisms underlying this relationship. In the present study, we examined the mediating role of moral disengagement in the association between callous-unemotional traits and cyberbullying perpe...
Article
Although childhood psychological maltreatment has been shown to play an important role in moral disengagement, little is known about the mediating and moderating mechanisms underlying this relationship. This study examined whether callous-unemotional (CU) traits mediated the relationship between childhood psychological maltreatment and moral diseng...
Book
Full-text available
This conference proceeding represents presentations given at the Annual Meeting of the International Society for Data Science and Analytics (ISDSA) in Nanjing, China, during July 6–8, 2019. The Annual Meeting of ISDSA aims to provide a global forum for researchers and practitioners in the field of data science and data analytics to communicate and...
Article
With single-level data, Yuan, Cheng and Maxwell developed a two-level regression model for more accurate moderation analysis. This article extends the two-level regression model to a two-level moderated latent variable (2MLV) model, and uses a Bayesian approach to estimate and test the moderation effects. Monte Carlo results indicate that: 1) the n...
Article
Compared to the conventional covariance-based SEM (CB-SEM), partial-least-squares SEM (PLS-SEM) has an advantage in computation, which obtains parameter estimates by repeated least squares regression with a single dependent variable each time. Such an advantage becomes increasingly important with big data. However, the estimates of regression coeff...
Chapter
It is well-known that factor analysis and principal component analysis often yield similar estimated loading matrices. Guttman (Psychometrika 21:273–285, 1956) identified a condition under which the two matrices are close to each other at the population level. We discuss the matrix version of the Guttman condition for closeness between the two meth...
Article
Data in social and behavioral sciences are routinely collected using questionnaires, and each domain of interest is tapped by multiple indicators. Structural equation modeling (SEM) is one of the most widely used methods to analyze such data. However, conventional methods for SEM face difficulty when the number of variables (p) is large even when...
Article
Survey data often contain many variables. Structural equation modeling (SEM) is commonly used in analyzing such data. However, conventional SEM methods are not crafted to handle data with a large number of variables (p). A large p can cause Tml, the most widely used likelihood ratio statistic, to depart drastically from the assumed chi-square distr...
Article
Effect size is crucial for quantifying differences and a key concept behind Type I errors and power, but measures of effect size are seldom studied in structural equation modeling (SEM). While fit indices such as the root mean square error of approximation may address the severity of model misspecification, they are not a direct generalization of c...
Article
Full-text available
Motivated by the need to effectively evaluate the quality of the mean structure in growth curve modeling (GCM), this article proposes to separately evaluate the goodness of fit of the mean structure from that of the covariance structure. Several fit indices are defined, and rationales are discussed. Particular considerations are given for polynomia...
Article
Meta-analysis plays a key role in combining studies to obtain more reliable results. In social, behavioral, and health sciences, measurement units are typically not well defined. More meaningful results can be obtained by standardizing the variables and via the analysis of the correlation matrix. Structural equation modeling (SEM) with the combined...
Article
Full-text available
Ridge generalized least squares (RGLS) is a recently proposed estimation procedure for structural equation modeling. In the formulation of RGLS, there is a key element, ridge tuning parameter, whose value determines the efficiency of parameter estimates. This article aims to optimize RGLS by developing formulas for the ridge tuning parameter to yie...
Book
Full-text available
Conducting statistical power analysis for both simple and complex statistical models using online app or R package WebPower. Correlation and partial correlation; One-sample and two-sample proportions; One-sample and two-sample t-tests; One-way ANOVA, repeated-measures ANOVA, two-way; Linear, logistic, and Poisson regression; Cluster randomized trials an...
Chapter
Full-text available
Principal component analysis (PCA) is a multivariate statistical technique frequently employed in research in behavioral and social sciences, and the results of PCA are often used to approximate those of exploratory factor analysis (EFA) because the former is easier to implement. In practice, the needed number of components or factors is often dete...
Article
Unless data are missing completely at random (MCAR), proper methodology is crucial for the analysis of incomplete data. Consequently, methods for effectively testing the MCAR mechanism become important, and procedures were developed via testing the homogeneity of means and variances–covariances across the observed patterns (e.g., Kim & Bentler in P...
Article
Full-text available
Among test statistics for assessing overall model fit in structural equation modeling (SEM), the Satorra–Bentler rescaled statistic (Formula presented.) is most widely used when the normality assumption is violated. However, many researchers have found that (Formula presented.) tends to overreject correct models when the number of variables (p) is...
Article
Full-text available
Measurement invariance (MI) entails that measurements in different groups are comparable, and is a logical prerequisite when studying difference or change across groups. MI is commonly evaluated using multi-group structural equation modeling through a sequence of chi-square and chi-square-difference tests. However, under the conventional null hypot...
Article
Mean and mean-and-variance corrections are the 2 major principles to develop test statistics with violation of conditions. In structural equation modeling (SEM), mean-rescaled and mean-and-variance-adjusted test statistics have been recommended under different contexts. However, recent studies indicated that their Type I error rates vary from 0% to...
Article
Survey data often contain many variables. Structural equation modeling (SEM) is commonly used in analyzing such data. With typical nonnormally distributed data in practice, a rescaled statistic Trml proposed by Satorra and Bentler was recommended in the literature of SEM. However, Trml has been shown to be problematic when the sample size N is smal...
Conference Paper
Full-text available
A Monte Carlo-based power analysis is proposed for t-test to deal with non-normality and heterogeneity in real data. The step-by-step procedure of the proposed method is introduced in the paper. For comparing the performance of the Monte Carlo-based power analysis to that of conventional pooled-variance t-test, a simulation study was conducted. The...
Conference Paper
In research on approximating factor analysis (FA) by principal component analysis (PCA), FA loadings and PCA loadings are typically compared using some measure of closeness or distance. Previous studies have used the average squared canonical correlation (ASCC) between the two loading matrices as a measure of closeness. This measure has the advanta...
Article
Data in psychology are often collected using Likert-type scales, and it has been shown that factor analysis of Likert-type data is better performed on the polychoric correlation matrix than on the product-moment covariance matrix, especially when the distributions of the observed variables are skewed. In theory, factor analysis of the polychoric co...
Article
Full-text available
When the assumption of multivariate normality is violated and the sample sizes are relatively small, existing test statistics such as the likelihood ratio statistic and Satorra–Bentler’s rescaled and adjusted statistics often fail to provide reliable assessment of overall model fit. This article proposes four new corrected statistics, aiming for be...
Article
The normal-distribution-based likelihood ratio statistic is widely used for power analysis in structural equation modeling (SEM). In such an analysis, power and sample size are computed by assuming that the statistic follows a central chi-square distribution under the null hypothesis and a noncentral chi-square distribution under the alternative hypothesis. However, with either violation of normality or no...
Article
Moderation analysis has many applications in social sciences. Most widely used estimation methods for moderation analysis assume that errors are normally distributed and homoscedastic. When these assumptions are not met, the results from a classical moderation analysis can be misleading. For more reliable moderation analysis, this article proposes...
Article
Full-text available
Nonnormality of univariate data has been extensively examined previously (Blanca et al., Methodology: European Journal of Research Methods for the Behavioral and Social Sciences, 9(2), 78–84, 2013; Micceri, Psychological Bulletin, 105(1), 156, 1989). However, less is known of the potential nonnormality of multivariate data although multivariate ana...
Article
Structural equation models are typically evaluated on the basis of goodness-of-fit indexes. Despite their popularity, agreeing what value these indexes should attain to confidently decide between the acceptance and rejection of a model has been greatly debated. A recently proposed approach by means of equivalence testing has been recommended as a s...
Article
Full-text available
Multigroup structural equation modeling (SEM) plays a key role in studying measurement invariance and in group comparison. When population covariance matrices are deemed not equal across groups, the next step to substantiate measurement invariance is to see whether the sample covariance matrices in all the groups can be adequately fitted by the sam...
Article
Full-text available
Structural equation modeling (SEM) has become one of the most widely used multivariate methods in many disciplines. It is important to summarize the findings across studies and to obtain more general results in each discipline or cross fields. Meta analytical SEM (MASEM) plays a key role in such a direction. The five MASEM articles in this issue of...
Article
In structural equation modeling (SEM), parameter estimates are typically computed by the Fisher-scoring algorithm, which often has difficulty in obtaining converged solutions. Even for simulated data with a correctly specified model, non-converged replications have been repeatedly reported in the literature. In particular, in Monte Carlo studies it...
Chapter
Guttman (Psychometrika 21:273–286, 1956) showed that the loadings of factor analysis (FA) and those of principal component analysis (PCA) approach each other as the number of variables p goes to infinity. Because the computation for PCA is simpler than FA, PCA can be used as an approximation for FA when p is large. However, another side of the coin...
Article
Full-text available
The conventional setup for multi-group structural equation modeling requires a stringent condition of cross-group equality of intercepts before mean comparison with latent variables can be conducted. This article proposes a new setup that allows mean comparison without the need to estimate any mean structural model. By projecting the observed sampl...
Article
Full-text available
This article proposes 2 classes of ridge generalized least squares (GLS) procedures for structural equation modeling (SEM) with unknown population distributions. The weight matrix for the first class of ridge GLS is obtained by combining the sample fourth-order moment matrix with the identity matrix. The weight matrix for the second class is obtain...
Article
Full-text available
Conventional null hypothesis testing (NHT) is a very important tool if the ultimate goal is to find a difference or to reject a model. However, the purpose of structural equation modeling (SEM) is to identify a model and use it to account for the relationship among substantive variables. With the setup of NHT, a nonsignificant test statistic does n...
Article
Full-text available
Cronbach’s coefficient alpha is a widely used reliability measure in social, behavioral, and education sciences. It is reported in nearly every study that involves measuring a construct through multiple items. With non-tau-equivalent items, McDonald’s omega has been used as a popular alternative to alpha in the literature. Traditional estimation me...
Article
Full-text available
Multigroup structural equation modeling (SEM) plays a key role in studying measurement invariance and in group comparison. However, existing methods for multigroup SEM assume that different samples are independent. This article develops a method for multigroup SEM with correlated samples. Parallel to that for independent samples, the focus here is...
Chapter
Full-text available
This article studies the relationship between loadings from factor analysis (FA) and principal component analysis (PCA) when the number of variables p is large. Using the average squared canonical correlation between two matrices as a measure of closeness, results indicate that the average squared canonical correlation between the sample loading ma...
Article
Full-text available
Means and covariance/dispersion matrix are the building blocks for many statistical analyses. By naturally extending the score functions based on a multivariate \(t\)-distribution to estimating equations, this article defines a class of M-estimators of means and dispersion matrix for samples with missing data. An expectation-robust (ER) algorithm s...
Article
Full-text available
This article compares parameter estimates by 2-stage ML (TSML) and a recently developed 2-stage robust (TSR) method for structural equation modeling (SEM) with missing data. In the design, data are missing at random (MAR) after an auxiliary variable (AV) is included, and they are missing not at random (MNAR) otherwise. Results indicate that, when e...
Article
Full-text available
Certain diversity among team members is beneficial to the growth of an organization. Multiple measures have been proposed to quantify diversity, although little is known about their psychometric properties. This article proposes several methods to evaluate the unidimensionality and reliability of three measures of diversity. To approximate the inte...
Article
Full-text available
The paper clarifies the relationship among several information matrices for the maximum likelihood estimates (MLEs) of item parameters. It shows that the process of calculating the observed information matrix also generates a related matrix that is the middle piece of a sandwich-type covariance matrix. Monte Carlo results indicate that standard err...
Article
Normal-distribution-based maximum likelihood (NML) is most widely used for missing data analysis although real data seldom follow a normal distribution. When missing values are missing at random (MAR), recent results indicate that NML estimates (NMLEs) are still consistent for nonnormally distributed populations as long as the variables are linearl...
Article
This paper reviews various methods of identifying missing data mechanisms. The three well‐known mechanisms of missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR) are considered. A number of tests deem rejection of homogeneity of means and/or covariances (HMC) among observed data patterns as a means...
Article
Full-text available
Moderation analysis is widely used in social and behavioral research. The most commonly used model for moderation analysis is moderated multiple regression (MMR) in which the explanatory variables of the regression model include product terms, and the model is typically estimated by least squares (LS). This paper argues for a two-level regression m...
Article
Full-text available
Survey data typically contain many variables. Structural equation modeling (SEM) is commonly used in analyzing such data. The most widely used statistic for evaluating the adequacy of a SEM model is T_ML, a slight modification to the likelihood ratio statistic. Under normality assumption, T_ML approximately follows a chi-square distribution when th...
Article
Full-text available
When item parameter estimates are used to estimate the ability parameter in item response models, the standard error (SE) of the ability estimate must be corrected to reflect the error carried over from item calibration. For maximum likelihood (ML) ability estimates, a corrected asymptotic SE is available, but it requires a long test and the covari...
Article
Full-text available
In this paper, we define a class of cross-validatory model selection criteria as an estimator of the predictive risk function based on a discrepancy between a candidate model and the true model. For a vector of unknown parameters, $n$ estimators are required for the definition of the class, where $n$ is the sample size. The $i$th estimator $(i=1,\...
Article
Missing data are a common problem in almost all areas of empirical research. Ignoring the missing data mechanism, especially when data are missing not at random (MNAR), can result in biased and/or inefficient inference. Because MNAR mechanism is not verifiable based on the observed data, sensitivity analysis is often used to assess it. Current sens...
Article
Full-text available
Normal-distribution-based maximum likelihood (NML) is the most widely used method in structural equation modeling (SEM), although practical data tend to be nonnormally distributed. The effect of nonnormally distributed data or data contamination on the normal-distribution-based likelihood ratio (LR) statistic is well understood due to many analytic...
Article
Full-text available
Research today demands the application of sophisticated and powerful research tools. Fulfilling this need, this two-volume text provides the tool box to deliver the valid and generalizable answers to today's complex research questions. The Oxford Handbook of Quantitative Methods in Psychology aims to be a source for learning and reviewing current b...
Article
Variable-length computerized adaptive testing (VL-CAT) allows both items and test length to be “tailored” to examinees, thereby achieving the measurement goal (e.g., scoring precision or classification) with as few items as possible. Several popular test termination rules depend on the standard error of the ability estimate, which in turn depends o...
Chapter
This paper extends Bartlett’s formula for computing factor scores to general structural equation modeling. The derived formulas can handle modeling situations where existing formulas cannot apply, namely, when there are exogenous observed variables in the model. The derivation of general formulas for computing Bartlett factor scores leads to extend...
Article
Full-text available
Normal-distribution-based maximum likelihood (ML) and multiple imputation (MI) are the two major procedures for missing data analysis. This article compares the two procedures with respect to bias and efficiency of parameter estimates. It also compares formula-based standard errors (SEs) for each procedure against the corresponding empirical SEs....
Article
Full-text available
Yuan and Hayashi (2010; Fitting data to model: Structural equation modeling diagnosis using two scatter plots. Psychological Methods, 15: 335–351) introduced 2 scatter plots for model and data diagnostics in structural equation modeling (SEM). However, the ge...
Article
Full-text available
The paper develops a two-stage robust procedure for structural equation modeling (SEM) and an R package rsem to facilitate the use of the procedure by applied researchers. In the first stage, M-estimates of the saturated mean vector and covariance matrix of all variables are obtained. Those corresponding to the substantive variables are then fitted...