Discussion
Started 15th Jul, 2020

Theoretical assumptions for correlating errors in SEM

My question concerns the rather unclear issue of error correlation that many scholars encounter when conducting SEM analyses. Scholars quite often report correlating error terms to enhance the overall goodness of fit of their models. Hermida (2015), for instance, provided an in-depth analysis of this issue and pointed out that in many social-science studies researchers do not provide appropriate justification for correlating errors. I have read in Harrington (2008) that correlated measurement errors can result from similar or nearly identical meanings of the words and phrases in the statements that participants are asked to assess. Another option for justifying such a correlation is connected to longitudinal studies and to a priori specification of error covariances based on the nature of the study variables.
In my own case, I have two items with a modification index above 20:
      lhs op   rhs     mi   epc sepc.lv sepc.all sepc.nox
12  item1 ~~ item2 25.788 0.471   0.471    0.476    0.476
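For reference, this is output from lavaan's modindices(). A minimal sketch of how such output is obtained and how the suggested error covariance would then be freed; the factor name, item names, and data object below are simplified placeholders, not my actual model:

library(lavaan)

# Simplified one-factor specification (placeholder names)
model <- '
  support =~ item1 + item2 + item3 + item4
'
fit <- cfa(model, data = mydata)

# Show large modification indices, sorted
modindices(fit, sort. = TRUE, minimum.value = 10)

# Re-specification with the suggested error covariance freed
model_ec <- '
  support =~ item1 + item2 + item3 + item4
  item1 ~~ item2
'
fit_ec <- cfa(model_ec, data = mydata)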
After correlating the errors, the model fit appears just great (the model consists of 5 first-order latent factors and 2 second-order latent factors; n = 168; around 23 items). However, I am concerned with how to justify the error correlation. In my case, the wording of the two items is very similar: "With other students in English language class I feel supported" (item 1) and "With other students in English language class I feel supported" (item 2), both rated on a Likert scale from 1 to 7. According to Harrington (2008), this is enough to justify the correlation between the errors.
However, I would appreciate any comments on whether similar wording of the questions is sufficient justification for correlating their errors.
Any further real-life examples of item/question wording, or articles on this topic, are also much appreciated.


All replies (4)

13th Aug, 2020
Marcel Grieger
Georg-August-Universität Göttingen
You have, in a way, answered your own question. There are instances, especially when adapting or developing new scales, where correlated errors cannot be avoided. Given appropriate circumstances, I do not foresee grave issues, as long as your model fit improves considerably and you do not hide the fact that you left the "confirmatory realm" in favour of exploratory endeavours.
Since you asked for real-life examples: I have developed an instrument to measure self-efficacy beliefs of prospective and in-service teachers who teach an interdisciplinary subject.
Item 1 asks whether teachers feel confident enhancing three core competencies among their students.
Item 2 asks whether teachers think they can identify the three core competencies while observing lessons.
Both questions appear one after another, both refer to the competencies in general, and one of them lists them explicitly. The fit improved considerably after allowing the errors to correlate (e.g., RMSEA from .105 to .065; CFI from .962 to .987; TLI from .936 to .976; even the chi-square test came close to non-significance, moving from p < .001 to p = .043).
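In lavaan, such a before/after comparison can be formalised as a chi-square difference test; a minimal sketch, assuming two already fitted objects fit0 (without the error covariance) and fit1 (with it):

# Chi-square difference (likelihood-ratio) test between the nested models
anova(fit0, fit1)

# Fit indices side by side
fitMeasures(fit0, c("chisq", "df", "pvalue", "rmsea", "cfi", "tli"))
fitMeasures(fit1, c("chisq", "df", "pvalue", "rmsea", "cfi", "tli"))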
Hope that helps. May I ask you for the complete titles of Hermida and Harrington in return? I am sure I can use them to "justify" my correlations.
Best wishes from Germany
Marcel
13th Aug, 2020
Holger Steinmetz
Universität Trier
Dear Artem and Marcel,
There are two problems with post-hoc correlating of errors:
1) The error covariance is causally unspecific (as is any correlation). If one possibility is true, namely that both items additionally measure an omitted latent variable, then estimating the error covariance will make the model fit, but the omitted latent variable is still not explicitly contained in the model. This may be unproblematic if this latent is just the response reaction to a specific word contained in both items, but sometimes it may be a substantive latent variable missing from the model, whose omission will bias the effects of the other latent variables it contains (see the sketch after point 2 for the difference between freeing the error covariance and modeling such a shared cause explicitly).
2) While issue #1 still presumes that the factor model is correct (but the items *in addition* share a further cause), the need to estimate error covariances can also emerge as a sign of a fundamental misspecification of the factor model: if the factor model is too simple (e.g., you test a 1-factor model whereas the true structure contains more factors), then the only proposal the algorithm can make is to estimate error covariances. These can be interpreted as the valves in a technical system: opening the valves will reduce the pressure but not solve the problem. On the contrary, your model will fit, but it is worse than before.
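To illustrate point 1, here is a minimal lavaan sketch (with hypothetical item and data names) of the difference between merely freeing the error covariance and explicitly modeling the shared cause as a method factor:

# Option A: free the error covariance; the shared cause stays implicit
model_a <- '
  target =~ y1 + y2 + y3 + y4
  y1 ~~ y2
'
fit_a <- cfa(model_a, data = mydata)

# Option B: model the shared cause explicitly as an orthogonal method factor
model_b <- '
  target =~ y1 + y2 + y3 + y4
  method =~ 1*y1 + 1*y2   # equal unit loadings identify the 2-indicator factor
  target ~~ 0*method      # method factor kept uncorrelated with the target
'
fit_b <- cfa(model_b, data = mydata)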
One simple ad-hoc test is to estimate the error covariance and then include further variables in the model that correlate with the latent target variable (or receive effects from it, or emit effects on it). You will often see that the model which fitted a minute ago (due to the estimation of the error covariance) again shows substantial misfit, as the factor model is still wrong and cannot explain the new restrictions and correlations between the indicators and the newly added variables.
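A sketch of this check in lavaan, with hypothetical names ('grades' stands for whatever external correlate is available in the data):

# Add an external correlate of the target factor, then re-check the fit
model_check <- '
  target =~ y1 + y2 + y3 + y4
  y1 ~~ y2
  target ~ grades   # hypothetical external variable assumed in the data
'
fit_check <- sem(model_check, data = mydata)
fitMeasures(fit_check, c("chisq", "df", "pvalue", "rmsea", "cfi"))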
Please note that the goal in CFA/SEM is not to get a fitting model! The (mis)fit of the model is just a tool to evaluate its causal correctness. If data fit were the essential goal, then SEM would be easy: just saturate the model and you always get a perfect data fit.
One further aspect is the post-hoc justification of error covariances: I remember once reading MacCallum (I think it was him), who wrote that he knows no colleague who would not have enough fantasy to come up with an idea to explain a post-hoc need for an error covariance. :)
Hence, besides the causal issues noted above, there are statistical problems with regard to overfitting and capitalization on chance (as with any other post-hoc change of the model). That is: better look at your items before doing the model testing and think about whether there could be reasons that would lead to an error covariance.
One example is the longitudinal case, where error covariances between the same items across measurement occasions are expected and are included from the beginning.
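A minimal two-wave sketch of such an a-priori specification (hypothetical names):

# Two-wave model: correlated uniquenesses for the same item across waves
model_long <- '
  f_t1 =~ a1_t1 + a2_t1 + a3_t1
  f_t2 =~ a1_t2 + a2_t2 + a3_t2
  a1_t1 ~~ a1_t2
  a2_t1 ~~ a2_t2
  a3_t1 ~~ a3_t2
'
fit_long <- cfa(model_long, data = mydata)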
If you have to include the error covariances post hoc, carefully consider other potential reasons (mainly the more fundamental issues noted under #2) and replicate the study. Replication in a causal-inference context should always imply an enlargement of the model (i.e., including new variables).
Best,
Holger
Gerbing, D. W., & Anderson, J. C. (1984). On the meaning of within-factor correlated measurement errors. Journal of Consumer Research, 11, 572-580.
Landis, R. S., Edwards, B. D., & Cortina, J. M. (2009). On the practice of allowing correlated residuals among indicators in structural equation models. In C. E. Lance & R. J. Vandenberg (Eds.), Statistical and Methodological Myths and Urban Legends: Doctrine, Verity and Fable in Organizational and Social Sciences (pp. 193-215). New York: Routledge.
3rd Feb, 2022
Dorota Mierzejewska-Floreani
SWPS University of Social Sciences and Humanities
I was pleased to read your correspondence.
I want to ask: have you encountered, in CFA, the practice of specifying covariances among all the errors of reversed items? In my model these items sound very similar (50% of the items), and the fit improves significantly if I correlate their errors. However, if I correlate the errors of any non-reversed items, the fit does not change. Am I making a mistake in my thinking or inference? (Sorry, I am a beginner.)
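For concreteness, what I am doing looks roughly like this in lavaan (a simplified sketch with hypothetical names; i2r, i4r, and i6r stand for the reversed items), which seems close in spirit to the wording method factor sketched in Holger's reply above:

# All pairwise error covariances among the reversed items
model_rev <- '
  trait =~ i1 + i2r + i3 + i4r + i5 + i6r
  i2r ~~ i4r + i6r
  i4r ~~ i6r
'
fit_rev <- cfa(model_rev, data = mydata)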
8th Jun, 2022
Aline Philibert
Doctors without Borders, Geneva, Switzerland
Dear all,
I have a question about the correlation of errors in SEM. My team and I submitted a paper that included an SEM analysis, and we were asked: "A bias analysis / sensitivity analysis for measurement error is encouraged under different assumptions about the correlations between these measurement errors." Does this mean varying the error correlations? (I am working in R.)
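If it helps clarify my question, my tentative understanding of what such a sensitivity analysis could look like in lavaan (hypothetical model and variable names; the fixed values are arbitrary assumptions):

library(lavaan)

# Fix the item1-item2 error covariance to several assumed values and
# watch how the structural coefficient changes
vals <- c(0, 0.2, 0.4)
for (v in vals) {
  m <- paste0('
    support =~ item1 + item2 + item3 + item4
    outcome ~ support
    item1 ~~ ', v, '*item2
  ')
  f <- sem(m, data = mydata)
  cat("fixed error cov =", v, "-> beta =", coef(f)["outcome~support"], "\n")
}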
Thank you
Aline
