
# Latent Variable Modeling - Science topic

Questions related to Latent Variable Modeling
Question
I am trying to extract a linear equation for the regression estimates of a model fitted in the AMOS program. I have one endogenous latent variable, one latent mediator, and one exogenous latent variable. To extract a regression equation, I have relied on the following formula:
Z = Intercept + aX + bY + error
where X is the exogenous variable and Y is the mediator, and a and b are the standardized path coefficients.
Knowing that AMOS does not provide the intercept value of the latent endogenous variable, how can I calculate it?
Is it sufficient to consider the error variance calculated by AMOS as the error value?
Sami Mo: The default in many SEM programs is that latent variables are mean-centered. In other words, the means of the latent variables are assumed to be zero by default in many programs. This is because latent means are arbitrary in many situations and/or directly linked to observed variable means. Although I'm not an AMOS user, I suspect that this is probably true in AMOS as well (to be sure, you could check the AMOS user's manual--I'm pretty sure there must be a description of the default mean-structure settings in that document). This is why you do not get a latent intercept with the default settings.
There are ways though to identify the latent variable mean and intercept structure in cases where this is useful. For example, you can select one indicator per factor as reference indicator for which the loading is fixed to 1 and the observed variable intercept is fixed to zero. Then, the latent means and intercepts can be identified and estimated and would be provided in the output. (You would have to specifically request the estimation of these parameters though as they aren't given by default.)
Also, in the completely standardized solution, there would not be intercepts in any case because standardized variables are in deviation form (i.e., have means of zero) by definition of the z standardization process.
Error variance estimates are typically not added to equations (error variables are variables, not constants) but the error variances are sometimes depicted in path diagrams.
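If you do identify the mean structure (e.g., via the reference-indicator approach described above), the intercept follows from the usual regression identity: taking expectations of Z = Intercept + aX + bY + error (the error has mean zero) gives Intercept = mean(Z) - a*mean(X) - b*mean(Y). A minimal numeric sketch in Python; all values here are hypothetical, and note this only works with unstandardized coefficients:

```python
# Hypothetical latent means and UNstandardized path coefficients.
# (In the completely standardized solution all means are zero, so the
# intercept is zero by construction, as noted above.)
mean_Z, mean_X, mean_Y = 3.2, 2.5, 2.8   # assumed latent means
a, b = 0.40, 0.55                        # assumed unstandardized paths

# Taking expectations of Z = intercept + a*X + b*Y + error and
# solving for the intercept:
intercept = mean_Z - a * mean_X - b * mean_Y
print(round(intercept, 2))  # → 0.66
```

The error variance never enters this equation as a constant; as noted above, the error is a variable, and its variance is only a descriptive piece of the output.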
Question
I am conducting a confirmatory factor analysis examining a three-factor model using R's lavaan package. My factors are shifting, working memory, and response inhibition. I used the modindices function to see if there was any modification index that would reduce my chi-square by 10 or more. It told me that two of my response inhibition variables highly covaried, so I redid the model and accepted the suggestion. As expected, the model improved and my loading coefficients for response inhibition evened out, but all of a sudden the standardized factor loadings for shifting became "NA." What does this mean, and why did a tweak in the response inhibition factor affect the shifting factor? In the original model, I had values for the shifting factor (though it was my worst factor). The only change I made in this new model was drawing the covariance between the two suggested response inhibition variables.
Again, it is impossible to say without knowing more about your model and sample size, as Heywood cases can have many different causes (small sample, too few indicators per factor, model misspecification, overparameterization, etc.).
Question
I am working on an SEM model in MPLUS with survey data (n=300) that has ordinal/binary indicators, multiple imputations (10), and survey weights.
My variables are:
Y1: latent with 7 ordinal indicators
Y2: Observed, single item, ordinal
Y3: latent with 6 binary indicators
Y4: Latent, 7 ordinal indicators
X1: Binary, dummy
X2: Binary, dummy
Model is:
Y1 on Y2 Y3 Y4 X1 X2
Y2 on Y3 Y4 X1 X2
Y3 on Y4 X1 X2
Y4 on X1
X1 with X2
My latent variable model has an excellent fit: CFI = 0.99, TLI = 0.98, RMSEA = 0.03, SRMR = 0.08, and chi-square/df = 1.35. However, given the complexity of the model and the small sample size, I wanted to reduce the number of variables by calculating composite scores (sum of indicator value × factor loading) for all three latent variables and running a path model. The significance of the variables is almost the same as in the latent model, but the fit indices changed substantially: CFI = 0.94, TLI = 0.16, SRMR = 0.03, RMSEA = 0.12. I read in the literature that when a model has few degrees of freedom, the RMSEA has no interpretive value. But what about the TLI: why is it so low compared to the CFI? Any thoughts on what I can test to find the cause of this strange result?
Hi,
I am not 100% sure, but these are my thoughts on that issue.
The CFI and TLI are computed as follows (see http://www.davidakenny.net/cm/fit.htm)
CFI = ((Chi2_null – df_null) – (Chi2_model – df_model)) / (Chi2_null – df_null)
TLI = ((Chi2_null / df_null) – (Chi2_model / df_model)) / ((Chi2_null / df_null) – 1)
where "null" denotes the baseline (null) model and "model" denotes the target model. Thus, CFI and TLI penalize complexity differently.
Your model appears almost saturated (df = 1). I simulated a model similar to yours (see attached). Note that I guessed parameters blindly to reproduce the discrepancy between CFI and TLI.
In the first (misspecified) model, I fixed the path between Y4 and X2 to zero. The misfit was similar to what you observed (Chi2(df = 1) = 9.257, CFI = .960, TLI = .435). The model tells us that omitting the path between Y4 and X2 is problematic and that the association between Y4 and X2 is statistically significant.
In the second model, I additionally constrained the path between Y4 and X1 to zero. However, I simulated the data such that Y4 ON X1 is statistically insignificant. Thus, adding that constraint had almost no impact on the chi-square value (9.398, df = 2); however, the TLI appears more favorable, while the CFI remained almost unchanged (TLI = .747, CFI = .964).
Plugging the values into the formulas for CFI and TLI and computing the ratio of the numerators reveals that the extra degree of freedom has a much stronger influence on the TLI (considering that the denominator is the same for both models).
Ratio of CFI numerators = ((218.637 – 14) – (9.398 – 2)) / ((218.637 – 14) – (9.257 – 1)) ~ 1
Ratio of TLI numerators = ((218.637/14) – (9.398/2)) / ((218.637/14) – (9.257/1)) ~ 1.7
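These figures can be reproduced directly from the two formulas. A small Python sketch using the simulated values quoted above (baseline model: Chi2 = 218.637, df = 14):

```python
def cfi(chi_b, df_b, chi_m, df_m):
    """Comparative fit index from baseline (b) and target-model (m) chi-squares."""
    return ((chi_b - df_b) - (chi_m - df_m)) / (chi_b - df_b)

def tli(chi_b, df_b, chi_m, df_m):
    """Tucker-Lewis index; the chi2/df ratios make it more sensitive to df changes."""
    return ((chi_b / df_b) - (chi_m / df_m)) / ((chi_b / df_b) - 1)

# Model 1: Y4-X2 path fixed to zero (chi2 = 9.257, df = 1)
print(round(cfi(218.637, 14, 9.257, 1), 3), round(tli(218.637, 14, 9.257, 1), 3))
# → 0.96 0.435
# Model 2: Y4-X1 path additionally fixed to zero (chi2 = 9.398, df = 2)
print(round(cfi(218.637, 14, 9.398, 2), 3), round(tli(218.637, 14, 9.398, 2), 3))
# → 0.964 0.747
```

Adding one df barely moves the CFI but shifts the TLI by about .31, which is the discrepancy being discussed.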
This is of course just a description of what happens within the simulated dataset and not a general rule. However, this example illustrates that the discrepancy you observed is probably the result of a model with low df that inappropriately omits a statistically significant path between Y4 and X2.
However, this is just guessing because you did not show the full model.
Hope that helps, Manuel
Question
I almost asked this as a "technical question" but I think it's more of a discussion topic. Let me describe where I get lost in this discussion, and what I'm seeing in practice. For context, I'm a quantitative social scientist, not a "real statistician" nor a data scientist per se. I know how to run and interpret model results, and a little statistical theory, but I wouldn't call myself anything more than an applied statistician. So take that into account as you read. The differences between "prediction-based" modeling goals and "inference-based" modeling goals are just starting to crystallize for me, and I think my background is more from the "inference school" (though I wouldn't have thought to call it that until recently). By that I mean I'm used to doing theoretically derived regression models that include terms that can be substantively interpreted. We're interested in the regression coefficient or odds ratio more than the overall fit of the model. We want the results to make sense with respect to theory and hypotheses, and provide insight into the data-generating (i.e., social/psychological/operational) process. Maybe this is a false dichotomy for some folks, but it's one I've seen in data science intro materials.
The scenario: This has happened to me a few times in the last few years. We're planning a regression analysis project, and a younger, sharper, data-science-trained statistician or researcher suggests that we set aside 20% (or some fraction like that) of the full sample as a test set, develop the model on the remaining 80% (training), and then validate the model on the held-out portion.
Why I don't get this (or at least struggle with it): My first struggle point is a conceptual/theoretical one. If you use a random subset of your data, shouldn't you get the same results on that data as you would with the whole data (in expectation) for the same reason you would with a random sample from anything? By that I mean, you'd have larger variances and some "significant" results won't be significant due to sample size of course, but shouldn't any "point estimates" (e.g., regression coefficients) be the same since it's a random subset? In other words, shouldn't we see all the same relationships between variables (ignoring significance)? If the modeling is using significance as input to model steps (e.g., decision trees), that could certainly lead to a different final model. But if you're just running a basic regression, why would anyone do this?
There are also times when a test sample just isn't practical (e.g., a data set of 200 cases), and sometimes it's impractical because there just isn't time. Let's set those aside for the discussion.
Despite my struggles, there are some scenarios where the "test sample" approach makes sense to me. On a recent project we were developing relatively complex models, including machine learning models, and our goal was best prediction across methods. We wanted to choose which model predicted the outcome best. So we used the "test and validate" approach. But I've never used it on a theory/problem-driven study where we're interested in testing hypotheses and interpreting effect sizes (even when I've had tens of thousands of cases in my data file). It always just seems like a step that gets in the way. FWIW, I've been discussing this technique in terms of data science, but I first learned about it when learning factor analysis and latent variable models. The commonality is how "model-heavy" these methods are relative to other kinds of statistical analysis.
So...am I missing something? Being naive? Just old-fashioned?
If I could phrase this as a question, it's "Why should I use test and validation samples in my regression analyses? And please answer with more than 'you might get different results on the two samples' " :)
Thanks! Looking forward to your insights and perspective. Open to enlightenment! :)
As was mentioned above by Alexander Kolker, there are actually two scenarios for using your models. If you want to assess the reliability and operability of a model or algorithm that produces predictions, you need to compare predictions with actuals. The design of this comparison should be as close as possible to what you'd like to implement in practice. Different setups can be used, e.g., the rolling-origin evaluation, where you re-estimate coefficients each time a new observation appears. When it comes to multiple objects/time series, the task becomes a bit more difficult, as you need more sophisticated error metrics to evaluate your models. Let me recommend our recent work on this topic, where we consider the rolling-origin setup:
The key element in model validation is the choice of metrics or indicators to compare alternative models. For regression analysis with the test/validation approach, it is usually MAE or MSE, but see the notes in the article above on how to choose between them. In the article we also recommend metrics for bias, such as the mean error (ME), the median error (MdE), and the Overestimation Percentage corrected (OPc), which is the new metric we proposed. The idea is that if you obtain biased predictions, you can in principle improve them, so your model is not optimal.
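A minimal sketch of the accuracy and bias metrics mentioned above (MAE, MSE, and the mean error), with made-up numbers; the OPc metric is defined in the cited article and is not reproduced here:

```python
import numpy as np

# Hypothetical holdout values and predictions.
actuals = np.array([10.0, 12.0, 9.0, 15.0])
preds   = np.array([11.0, 11.5, 10.0, 14.0])

errors = actuals - preds
mae = np.mean(np.abs(errors))  # average magnitude of the errors
mse = np.mean(errors ** 2)     # penalizes large errors more heavily
me  = np.mean(errors)          # bias check: nonzero means systematic over/underprediction

print(mae, mse, me)
```

A clearly nonzero ME (here the model overpredicts on average) is the signal that the predictions could, in principle, be improved.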
But if you do not want to split your data, you can use information criteria, such as AICc or BIC. This corresponds to the second scenario described by Alexander Kolker.
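As an aside, the questioner's intuition about point estimates can be checked by simulation: with a plain linear regression, coefficients estimated on a random subset are unbiased for the same quantities as on the full sample, just noisier. A sketch with synthetic data and hypothetical coefficients:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
X = rng.normal(size=(n, 2))
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(size=n)

# Design matrix with an intercept column.
A = np.column_stack([np.ones(n), X])

# Fit on the full sample and on a random 20% subset.
full_coef, *_ = np.linalg.lstsq(A, y, rcond=None)
idx = rng.choice(n, size=n // 5, replace=False)
sub_coef, *_ = np.linalg.lstsq(A[idx], y[idx], rcond=None)

print(full_coef.round(2))  # close to [1.0, 2.0, -0.5]
print(sub_coef.round(2))   # same targets, larger sampling noise
```

The value of holdout validation therefore lies elsewhere: in detecting overfitting from data-driven model selection, not in changing what a fixed, pre-specified regression estimates.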
Question
I am performing a small study. The model has passed construct reliability, convergent validity, and construct validity checks; however, discriminant validity is not achieved. Is discriminant validity achievable in a two-latent-variable model? Are there any guidelines in the literature? Any advice?
Regards
When two latent variables are more highly correlated than expected based on your theory, they may lack discriminant validity. So evidence for discriminant validity would come from correlations that are "not too strong", whatever that means in your case/theory.
The same guidelines as for observed variables apply (see, e.g., Campbell and Fiske's 1959 guidelines based on their multitrait-multimethod matrix approach), except that latent variables tend to be more highly correlated than observed variables, due to the removal of random measurement error in latent variables. So in that sense, latent variable analysis offers a stronger test of discriminant validity.
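The last point can be quantified with the classic correction for attenuation (Spearman): dividing an observed correlation by the square root of the product of the two reliabilities approximates the correlation between the error-free (latent) variables. A small sketch with hypothetical values:

```python
def disattenuated_r(r_obs, rel_x, rel_y):
    """Spearman's correction: observed r divided by sqrt of the reliability product."""
    return r_obs / (rel_x * rel_y) ** 0.5

# With observed r = .50 and reliabilities of .80 each, the latent-level
# correlation is noticeably higher, which is why discriminant-validity
# thresholds can be harder to meet in a latent variable model.
print(round(disattenuated_r(0.50, 0.80, 0.80), 3))  # → 0.625
```

This is why a latent correlation of, say, .85 may correspond to an observed correlation that would have raised no red flags at all.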
Question
Hey,
For my research I have created 7 latent variables, each of which will consist of at least 3 items (measured by an online survey with statements on a 7-point Likert scale). My question is the following: is it possible to apply confirmatory factor analysis? I want to show in the results section that I grouped the right items under the correct latent variable before conducting multiple linear regression with my latent variables. Also, is there some literature available to back this up?
Kind Regards,
Bram ten Barge
To the best of my knowledge, all structural equation modeling software (or packages) takes raw data and converts it into matrix form prior to analysis. SPSS, for instance, uses Pearson correlations by default. I may be wrong, but I think SAS uses Pearson by default as well.
If you run CFA/SEM without selecting the correct correlation/covariance structure, your path coefficients may be off and produce misleading results. Unfortunately, some general statistical software (e.g., SPSS) does not give users an option to select the correct correlation matrix, so users are stuck with Pearson whether they like it or not. One way to bypass this potentially misleading data-processing step is to compute your own correlation and covariance matrices and use that information as the data input.
Moreover, simply knowing which item belongs to which latent factor does not guarantee that your factor loadings will be significant. I assume you have read many validation papers using CFA; the results almost always vary by context. A more modern approach to validating latent constructs is bifactor analysis, and as you adopt more modern approaches, small differences can make a big difference.
In your case, however, my gut feeling is that the results you get from Pearson (the default in many software packages) and polychoric matrices may not differ by a significant amount, because I view the 7-point Likert scale as approximately continuous. Of course, this assumption cannot be confirmed without item-level evaluation.
In my quick Google search, this is an article I found:
Hope this clarified some of your questions.
Question
I have questions regarding measurement invariance for longitudinal data when analyzing latent variables. I would like to analyze 4 cohorts (grades 3,4,5,6 at baseline) longitudinally over 2 years with four assessments.
1. If I consider the four cohorts as one sample, should I then show measurement invariance (configural, metric, scalar, and strict invariance) between the cohorts for each of the four assessments? Or would it be more appropriate to evaluate each cohort separately? As the analysis method, I would like to estimate random-intercept cross-lagged panel models.
2. Is it necessary to test for configural, metric, scalar, and strict invariance across the four assessments for the latent variables? Or is it sufficient under certain conditions if only configural invariance holds?
Question
I am writing my research paper and have noticed that the moderator variables in my model are not strongly supported theoretically. To clarify, there are articles explaining the phenomena, but only a few supporting them as moderator effects. I ran the model in SPSS and SmartPLS, and the results were statistically significant (with high validity and reliability). However, I also have to find theories to support them as moderator variables, since I am pursuing a deductive approach.
If you know any academic articles supporting the use of moderator variables with little (or no) theoretical support, kindly list them so I can go through them.
Interesting question. I would argue that (a) either you provide a potential theoretical background for the moderating variable, or (b) you don't incorporate the moderation effect at all if you are unable to provide the reader with a theoretical explanation for the found effects. Ultimately, it comes down to whether a researcher is bound to integrate as many significant variables as possible, or to select a few central variables with a strong theoretical background that can tell a theoretical story. Does the inclusion of the moderating variable change the relationship between the outcome variable and the other predictors? Is there a suppression effect?
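The two closing questions can be probed empirically by comparing the fitted coefficients with and without the interaction term. A minimal sketch with synthetic data (all names and coefficient values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5_000
x = rng.normal(size=n)   # focal predictor
m = rng.normal(size=n)   # candidate moderator
y = 0.5 * x + 0.3 * m + 0.4 * x * m + rng.normal(size=n)

# Model without the interaction term ...
A0 = np.column_stack([np.ones(n), x, m])
b0, *_ = np.linalg.lstsq(A0, y, rcond=None)

# ... and with it. Compare how the x and m coefficients shift.
A1 = np.column_stack([np.ones(n), x, m, x * m])
b1, *_ = np.linalg.lstsq(A1, y, rcond=None)

print(b0.round(2))  # intercept, x, m
print(b1.round(2))  # intercept, x, m, x*m (interaction near 0.4)
```

Large shifts in the main-effect coefficients between the two fits would be the kind of suppression pattern mentioned above, and would need a theoretical account of its own.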
Question
I wanted to perform a confirmatory analysis on part of the Schwartz PVQ questionnaire. The construction of the questionnaire assumes that individual items combine into dimensions (latent variables), those dimensions in turn into larger dimensions, and those into even larger dimensions. For example (a fragment is in the picture uploaded below): three questions/items each create the variables SDT, SDA, POD, and POR; then, according to the instructions of the questionnaire, SDT and SDA form the superior/master dimension SD (and POD and POR make PO). Then PO and SD merge into one variable again... and this is just a fragment of the questionnaire key.
I don't understand why such a construct is always under-identified in SPSS AMOS - what is missing in it? What structure? How can I estimate this model?
@Tomasz, you need to put regression weights on the paths between the first-order latent factors and the second-order latent factors, viz. C_SD and C_PO. Fix a regression weight (typically to 1) on one of these paths per second-order factor; AMOS won't run the model when no regression weight is fixed on the two paths linking a second-order factor to its first-order factors.
Question
Hi folks! We ran a large survey and planned to use 3 indicators to represent a latent construct (Sense of Place, SoP) in a latent regression model (structural equation modeling). However, one of the indicators did not work well (model fit was too low), and I was left with only two indicators. I fear that reviewers who are used to formative scales will think that two indicators yield a very weak scale.
As reflective items are interchangeable, the latent construct should not change whether you use few or many indicators. I can argue for the face and construct validity of the indicators/scale, but there are many other suggested indicators for SoP if one were to develop a formative scale. Are there some good references on scales with few indicators? By the way, the model fit was excellent (see the enclosed structural model; the SoP construct had only two indicators).
John-Kåre,
generally, people run into trouble with small numbers of indicators because their model is under-identified. Obviously, that is not the case with your model, and it also doesn't seem to be what you are concerned about.
From a measurement perspective, the question is: Do your indicators adequately capture the scope of your construct? Do they reliably and validly measure your construct? For a relatively simple construct, two indicators might be just fine. Of course, it always helps if you can provide independent evidence.
Why exactly did you omit one indicator? And was this a previously validated instrument?
Question
I have a three latent variable model where LV1 predicts LV2 and LV3. How do I test if the effect of LV1 is the same on both LV2 and LV3. I have set the paths to be equal, but how do I then check whether the paths are actually equal or not (i.e., if there is a statistically significant difference)?
Jost, use the chi-square difference test. First estimate the model with the equality constraint, then the model without this constraint. The difference between the two chi-square values is also chi-square distributed, with one df. If the model with the constraint fits significantly worse than the model without it, the path coefficients are not equal.
Just keep in mind that the variables included in this calculation must be on the same scale.
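The arithmetic of the chi-square difference test is straightforward once both models are estimated. A sketch with hypothetical fit statistics:

```python
from scipy.stats import chi2

# Hypothetical output: model with the equality constraint vs. without it.
chi_constrained, df_constrained = 112.4, 42
chi_free, df_free = 107.1, 41

delta_chi = chi_constrained - chi_free   # chi-square difference
delta_df = df_constrained - df_free      # df difference (here: 1)
p = chi2.sf(delta_chi, delta_df)         # upper-tail probability

# p < .05 would mean the constrained model fits significantly worse,
# i.e., the two paths should not be treated as equal.
print(round(delta_chi, 1), delta_df, round(p, 3))
```

With maximum likelihood and ordinal or non-normal data, a scaled (e.g., Satorra-Bentler) difference test would be needed instead of this naive subtraction.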
Generally, I wonder why you actually want to test the equality of the path coefficients. For example, suppose you have a model in which stress (LV1) has an effect on psychosomatic complaints (LV2) and on job dissatisfaction (LV3): I cannot see a reason why an equality constraint would be useful in such a case.
HTH, Karin
Question
I want to estimate the causal relationship between two latent variables over time using structural equation modelling. Both latent variables are measured by different (observed) indicators. The data are a panel with N = 21 and T = 16 years.
After going through SEM estimation techniques, cross-lagged panel models seem to be the best option. However, given the small sample size (N = 21), is a cross-lagged panel model applicable in this case? I noticed that cross-lagged models usually involve large N and small T (max 5 waves).
The obstacle I'm facing is the small sample size given the number of indicators.
Any ideas if cross-lagged models are applicable or are there other techniques? (please note that I am estimating two latent variables with many indicators)
Hi Tara,
given the large number of T's, a VAR model would probably be an alternative (which, AFAIK, resembles the CLPM). Further, I would approach the analysis by running individual time-series plots and methods.
Beard, E., Marsden, J., Brown, J., Tombor, I., Stapleton, J., Michie, S., & West, R. (2019). Understanding and using time series analyses in addiction research. Addiction, 114(10), 1866-1884.
Bringmann, L. F., Pe, M. L., Vissers, N., Ceulemans, E., Borsboom, D., Vanpaemel, W., . . . Kuppens, P. (2016). Assessing temporal emotion dynamics using networks. Assessment, 23(4), 425-435. doi:10.1177/1073191116645909
Hoffart, A., Langkaas, T. F., Øktedalen, T., & Johnson, S. U. (2019). The temporal dynamics of symptoms during exposure therapies of PTSD: A network approach. European Journal of Psychotraumatology, 10(1), 1-11.
Having said that, please note that longitudinal models do not offer as many advantages over cross-sectional models for the estimation of causal effects as one might think; see
Papies, D., Ebbes, P., & Van Heerde, H. J. (2017). Addressing endogeneity in marketing models. In Advanced Methods for Modeling Markets (pp. 581-627). Cham: Springer.
VanderWeele, T. J., Jackson, J. W., & Li, S. (2016). Causal inference and longitudinal data: a case study of religion and mental health. Social Psychiatry and Psychiatric Epidemiology, 51(11), 1457-1466. doi:10.1007/s00127-016-1281-9
That is, you should think about confounders and include them as controls. If you do this as a multilevel model (which you should, IMHO), then using person means across time as controls achieves control for person-level (trait) confounders, which is something at least; see
Antonakis, J., Bastardoz, N., & Rönkkö, M. (2019). On ignoring the random effects assumption in multilevel models: Review, critique, and recommendations. Organizational Research Methods. doi:10.1177/1094428119877457
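The person-mean approach described above can be sketched in a few lines of pandas, assuming a long-format panel with an id column; all names are illustrative:

```python
import pandas as pd

# Toy long-format panel: two persons, three time points each.
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 2],
    "x":  [2.0, 3.0, 4.0, 5.0, 6.0, 7.0],
})

# Between part: each person's mean across time (enters as a level-2 control).
df["x_between"] = df.groupby("id")["x"].transform("mean")

# Within part: person-mean-centered scores (the level-1 predictor).
df["x_within"] = df["x"] - df["x_between"]

print(df)
```

Including `x_between` alongside `x_within` is the within-between specification; the within coefficient is then purged of stable person-level confounding.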
Finally, the large number of T's will probably result in a nonlinear trend, which could be an interesting side result of the study, especially when it is explained with between-subjects variables, but it should also be part of the level-1 model. That is, a generalized additive mixed model (GAMM) would be an option. At present I don't know whether the GAMM can be integrated with the VAR model, but I guess it should be possible.
Andersen, R. (2009). Nonparametric methods for modeling nonlinearity in regression analysis. Annual Review of Sociology, 35(1), 67-85. doi:10.1146/annurev.soc.34.040507.134631
Jones, K., & Almond, S. (1992). Moving out of the linear rut: the possibilities of generalized additive models. Transactions of the Institute of British Geographers, 434-447.
HTH
--Holger
Question
Dear community,
I would like to combine two subscales of a questionnaire to form one predictor. It is an instrument on intrinsic vs. extrinsic goals which assesses, for several goals, their attainment and importance with two separate questions. Since using attainment and importance as two distinct predictors won't work, I was thinking of using the difference as a variable, subtracting importance from attainment. For example, a positive or zero value for the scale "personal growth" would mean that attainment is as large as or larger than its attributed importance, while a negative value indicates a lack of attainment of an important goal.
Has anybody modelled something similar before, e.g., using lavaan or Mplus? It seems to work nicely as a manifest variable in an MLR, but I would like to try it in an SEM.
Thank you very much in advance,
Kind regards
Matteo
Hello Matteo,
It wasn't clear why you would be unable to use both an attainment and an importance rating in your model. Maybe you could explain why that restriction exists for your research.
However, it is important to note that difference scores are always less reliable than either of the two constituent scores unless: (a) both measures are perfectly reliable (highly unlikely in your scenario); or (b) the correlation between the measures is zero (probably unlikely as well). If both measures have comparable variance, then the formula for reliability of a difference score is: (mean reliability - rxy) / (1 - rxy), where: mean reliability is the average reliability of the two measures, and rxy is the correlation between measure "x" and measure "y".
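That formula is easy to evaluate for plausible values, which makes the reliability penalty concrete. A small sketch (equal variances assumed, as stated above):

```python
def diff_score_reliability(rel_x, rel_y, r_xy):
    """Reliability of the difference X - Y when X and Y have equal variances."""
    mean_rel = (rel_x + rel_y) / 2
    return (mean_rel - r_xy) / (1 - r_xy)

# Two decent scales (reliability .80 each) that correlate .50 yield a
# difference score with reliability of only .60, notably worse than
# either scale on its own.
print(round(diff_score_reliability(0.80, 0.80, 0.50), 2))  # → 0.6
```

The higher the correlation between attainment and importance, the worse this gets, which is exactly the scenario one would expect with these two ratings.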
One final thought: Imagine two respondents, one with maximum rating on attainment and maximum rating on importance (yielding a zero difference, following your proposed approach); the other with a minimum rating on attainment and minimum rating on importance (again, zero difference). Numerically, these are "identical" cases. However, I'd be hard pressed to argue that they're comparable with respect to the goal in question.
Question
I know that they are different, but I am at a loss for how to explain the difference; I may be overthinking it. In simple terms, my understanding is that a latent mean (intercept) is the estimated mean of a theoretical construct, which is fixed to 0 by default for identification. You can rescale the latent mean by fixing the factor variance to 1 and setting the intercept of one of the indicators to 0; in this case, the latent mean takes on the scale of that indicator. An observed composite mean is simply the mean of the indicators of the factor. The main difference between the two is that the composite mean includes the measurement error of the observed variables, while the latent mean adjusts for this.
Am I missing anything else here? Also, I assume that it would be inappropriate to compare the latent mean from one study with a composite mean from another study?
Dear Lendi,
if you allow some self-promotion: here are two sources that focus on the difference:
Steinmetz, H. (2010). Estimation and comparison of latent means across cultures. In P. Schmidt, J. Billiet, & E. Davidov (Eds.), Cross-cultural analysis: Methods and applications (pp. 85-116). Taylor & Francis Group: Routledge.
Steinmetz, H. (2013). Analyzing observed composite differences across groups: Is partial measurement invariance enough? Methodology, 9(1), 1-12. doi:10.1027/1614-2241/a000049
With best regards
--Holger
Question
Is there any possibility to get the estimated values (listed for every participant) of a specific latent variable in R or Mplus?
Hi Robin and Roland,
I assume there is no real way to do that. Yes, you can estimate factor scores (or latent variable scores), but they are... well, estimates, and they introduce error into the scores (the "factor indeterminacy" problem). The only goal that factor scores fulfill is to reproduce the factor correlation matrix. But true values of individuals? I don't think there is a method for that.
Beauducel, A. (2005). How to describe the difference between factors and corresponding factor-score estimates. Methodology, 1(4), 143-158. doi:10.1027/1614-2241.1.4.143
Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4), 430-450.
DiStefano, C., Zhu, M., & Mindrila, D. (2009). Understanding and using factor scores: Considerations of the applied researcher. Practical Assessment, Research & Evaluation, 14(20), 1-11.
Hardt, K., Hecht, M., Oud, J. H., & Voelkle, M. C. (2019). Where Have the Persons Gone?–An Illustration of Individual Score Methods in Autoregressive Panel Models. Structural Equation Modeling: A Multidisciplinary Journal, 26(2), 310-323.
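For what it's worth, the mechanics of one common factor-score estimator (the regression/Thomson method) can be sketched with plain linear algebra. Everything below is illustrative: made-up loadings, standardized indicators, a single factor; and the resulting scores remain estimates subject to the indeterminacy problem discussed above:

```python
import numpy as np

# Hypothetical standardized loadings for a single factor, three indicators.
lam = np.array([0.8, 0.7, 0.6])
theta = 1 - lam ** 2                         # unique variances (standardized)
sigma = np.outer(lam, lam) + np.diag(theta)  # model-implied correlation matrix

# Regression (Thomson) score weights: Sigma^{-1} * lambda.
w = np.linalg.solve(sigma, lam)

# Score an (assumed z-standardized) response vector for one participant.
y = np.array([1.0, 0.5, -0.2])
f_hat = w @ y
print(w.round(3), round(float(f_hat), 3))
```

In practice you would request these from the software (e.g., `lavPredict()` in lavaan or `SAVEDATA: FSCORES` in Mplus) rather than compute them by hand, with the caveats in the references above.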
HTH
Holger
Question
I wish to use the self-monitoring scale developed by Richard D. Lennox and Raymond N. Wolfe ("Revision of the Self-Monitoring Scale"). It is a two-factor scale with 13 items in total:
(1) ability to modify self-presentation (7 items) and (2) sensitivity to the expressive behaviour of others (6 items).
For part of my study I am trying to show the impact of self-monitoring on consumption behaviour, so I feel only the first dimension (ability to modify self-presentation) is useful to me. Also, I have 6-7 more latent variables, so the survey is already touching 90 questions, and I want to minimise the number of items.
Can I use just the 7 items of the first dimension, without disturbing the psychometric properties, and still call the composite of these 7 items "self-monitoring"? Is this an acceptable practice in research? (I will be doing SEM eventually.)
You can use only that subscale, but you must then justify it and label it as the specific theoretical concept it measures, rather than as the general variable.
Question
We can say that these factor-analysis approaches are generally used for two main purposes:
1) a more purely psychometric approach in which the objectives tend to verify the plausibility of a specific measurement model; and,
2) with a more ambitious aim in speculative terms, applied to represent the functioning of a psychological construct or domain, which is supposed to be reflected in the measurement model.
What do you think of these general uses?
The opinion given by Wes Bonifay and colleagues can be useful for the present discussion:
Hi! Factor analysis helps to identify the underlying dimensions. Sometimes the measurement items vary based on context. Besides, if the measurement is not well established, conducting factor analysis can produce clear dimensions that can be used for the particular research model. Thanks
Question
I am running a PLS-PM to evaluate Satisfaction "S".
I have defined a latent variable (LV) "D", and I am wondering how to define its manifest variables (MVs).
I have:
• 3 variables regarding satisfaction with "D", expressed on a 7-point Likert scale
• 1 variable regarding how many x you consume
• 1 variable regarding how much you spend to buy x
My doubt:
I expect the satisfaction MVs to be positive correlated with LV and the last two MVs to be negative correlated. I have seen someone trasforming MVs into negativeMVs to have unidimensionality condition respected, but I would like to avoid this solution becouse it conceptually does not make much sense.
1. can I consider my MVs to be reflective of the LV, given that I have an MV about consumption and an MV about expenses?
2. alternatively, can I state my LV "D" to be explained by 3 reflective MVs and 2 formative MVs?
3. suggestions to define my LV?
Thanks a lot for the attention, any help will be highly appreciated.
I think it would be better if, before analyzing using PLS, the validity and reliability of the MVs against the LV were tested first. Invalid or unreliable MVs should then not be included in the PLS analysis. Usually a negatively correlated MV will turn out to be invalid.
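As a quick screen for the situation described in this thread, one can compute corrected item-total correlations before fitting the PLS model; an MV that correlates negatively with the rest of the block (like the consumption and expense items here) will usually fail the later reliability checks. Below is a minimal pure-Python sketch; the item names and data are hypothetical, and real analyses would use a statistics package.

```python
def pearson(x, y):
    """Pearson correlation of two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / (sx * sy)

def flag_items(items):
    """items: dict of item name -> list of scores (same respondents).
    Flags items whose corrected item-total correlation is negative."""
    flags = {}
    for name, scores in items.items():
        # total of all *other* items, respondent by respondent
        rest = [sum(vals) for vals in
                zip(*(v for k, v in items.items() if k != name))]
        flags[name] = pearson(scores, rest) < 0
    return flags
```

A flagged item is a candidate either for reverse-coding (if that is conceptually defensible) or for moving out of the reflective block.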
Question
I have a structural model in SEM where I try to analyze the relationship between latent variables. However, it is only supported by the chi-square test and not by the other indices (CFI, TLI and RMSEA).
I have tried to dissect the model by building many structural relationships between pairs of latent variables based on my hypotheses, to justify that they are valid. Has this been done before? If so, can you link me to some studies that have done it? Thank you.
chi-square
Question
Hi,
I am a serious novice at AMOS (still learning terminology etc. as I go); I have been reading and researching everything and am still struggling at this particular point.
After running my model, it says that the covariance matrix between my latent variables is not positive definite and that the solution is not admissible. I changed the variance on my errors to 1, as this is what I read was supposed to solve the problem, and it still hasn't helped.
Please see attached my model and the output.
The model is supposed to investigate, how the characteristics of victim, offender and offence (e.g. observed variables are sex, age etc.) influence the outcome of the case, which is an observed variable as well.
Any help would be greatly appreciated; I have been at this for weeks.
EDIT: There is also a previous model, with the disturbance terms on the latent variables; however, I removed these when AMOS wouldn't allow me to draw the covariance arrow from the latent variables, only from the disturbance terms. It did produce different output results, but with the same error message, in case this is any help (see attached).
Or for a variable that does not vary!
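For context on this thread: a covariance matrix is positive definite exactly when a Cholesky factorization of it succeeds, so a quick way to reproduce AMOS's "not positive definite" diagnosis is to run the factorization on the implied covariance matrix from the output. A small pure-Python sketch (the example matrices in the test are hypothetical):

```python
def is_positive_definite(m, tol=1e-10):
    """Attempt a Cholesky factorization of a symmetric matrix m
    (list of lists); it succeeds iff m is positive definite."""
    n = len(m)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                d = m[i][i] - s  # pivot must stay positive
                if d <= tol:
                    return False
                L[i][i] = d ** 0.5
            else:
                L[i][j] = (m[i][j] - s) / L[j][j]
    return True
```

A common cause is a latent correlation estimated at or beyond ±1 (two latent variables that are empirically indistinguishable), or, as the quip above suggests, a variable with essentially no variance.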
Question
Hey everyone,
currently, I am working with Gaussian process latent variable (GPLVM) based models. In the literature, the model likelihood is discussed for model selection.
Unfortunately, this does not work for my application. Currently I am using the log-likelihood and the reconstruction error. The model likelihood increases and the reconstruction error decreases with an increasing number of dimensions/inducing points. BIC doesn't make sense in this context (and behaves similarly).
Are there better parameters for model selection?
All the best,
Will
Thank you Karthika .P ,
I know these publications - I have already implemented an ARD-based model selection approach. But I have the following problems:
- I can choose the dimension of a given model using the ARD values
- The number of inducing points is still unsolved
- An analysis of reconstruction error and model likelihood shows that more dimensions and inducing points lead to better results (this is a good thing)
- For my application, I want to find the information content in the latent space. I could do this using supervised learning afterwards (e.g., Boruta), but I do not want to do that because I am interested in the information content of the latent space itself without prior knowledge (such as labels).
- I do not want to use approaches based on the fitted GPLVM model.
So ARD, log-likelihood and reconstruction error are not appropriate in my context.
Question
In value-added models, school selection is endogenous to school-level treatments as well as the school random effects. It is easy to justify that students in highly selective schools (e.g., middle schools) would go on to highly selective schools later (e.g., high schools). Therefore, I think it would be reasonable to use the school effects for high schools (random effects), in the study of middle school outcomes, as an instrumental variable to control for the potential endogenous selection problem. What do you think?
I checked your recommendations, but I could hardly find any use of further/future information as an instrumental variable.
Question
Hello,
I have 5 variables each measured using multiple items, taken from already published studies (that is there is evidence of construct reliability and validity).
I am taking one variable as an example here, let's say V1, which is measured with the help of 6 items [V1 (i1, i2, i3, i4, i5, i6)]. In the previous study all six items loaded significantly on V1.
I ran model for two different group of respondents.
In Study 1, only 4 items loaded significantly on V1:
V1 (i1, i2, i3, i4), with alpha, composite reliability and AVE within acceptable thresholds. [Case A]
In Study 2, where a different set of respondents was used, 5 items loaded significantly on V1 (i1, i3, i4, i5, i6), with alpha, composite reliability and AVE within acceptable thresholds. [Case B]
Now I want to compute variables to make a comparison between the two groups.
In such situations, should I compute the variable using the original study where V1 was measured with 6 items or should I compute variables as per the loading mentioned in Case A (4 items), and Case B (5 items), and then run t-test for comparison?
(N.B. Variables are reflective, so I think the difference in the number of items will not matter - or will it? *thinking*)
Ali
It seems to me:
1. If you have adapted the items in a scale to a different context, that means it is now intended to measure a different construct. (Similar to the original construct, and maybe even closely related, but not the same.) The items themselves have presumably been reworded, even if only slightly, to correspond to the new context. Therefore reliability, validity, etc. need to be re-established.
2. Your preliminary results seem to say the items do not relate (to each other, and to the construct) in exactly the same way in both groups (studies). Multi-group confirmatory factor analysis will quantify those differences in the measurement model. It can also help you "control for" those differences when comparing the relationships of the constructs between the groups, if you estimate the entire model (measurement and substantive segments) all at once. However, if the between-group differences in the measurement model are large, it does call into question whether the constructs represented by the latent variables in the substantive model are actually the same.
3. If you don't or can't estimate the entire model all at once with multi-group CFA, a practical work-around could be to combine the two groups and go through the usual steps for scale construction (examining EFA, alpha, item-to-total correlations, etc.) to determine which subset of the items produces the best scale for the combined groups. Use those and only those items to create the scale. Then divide the data back into your two original groups, and proceed with your comparison of the groups based on that scale.
4. Another work-around would be to form the scale from only those items that "work" in both groups: i1, i3, and i4 in your example. Or, as others have suggested, keep all 6 of the original items. Check to see if either of those approaches yields acceptable alpha values in both groups.
5. I would not under any circumstances use two different subsets of items to form the scale in the two groups. If you do that, any comparison you make is an "apples to oranges" comparison. If you find a difference between groups, you can't know whether the groups actually differ on the construct, or whether the two different subsets of items really measure two different constructs.
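The combined-groups work-around in point 3 can be checked numerically: pool the two samples, form the candidate scale, and compute coefficient alpha for it. A minimal pure-Python sketch of the alpha computation (the data layout - one list of scores per item - is an assumption for illustration):

```python
def cronbach_alpha(items):
    """Cronbach's alpha.  items: list of per-item score lists,
    all over the same respondents."""
    k = len(items)

    def var(x):
        # unbiased sample variance
        m = sum(x) / len(x)
        return sum((v - m) ** 2 for v in x) / (len(x) - 1)

    totals = [sum(vals) for vals in zip(*items)]  # scale total per respondent
    item_var_sum = sum(var(it) for it in items)
    return k / (k - 1) * (1 - item_var_sum / var(totals))
```

Running this on the pooled data for each candidate item subset, and then within each group separately, makes the "works in both groups" criterion of point 4 concrete.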
Question
I have some explanatory variables and I have designed a stated preference survey with those variables or attributes. Along with that, I have asked questions about safety perception (only three questions, measured on an ordinal scale), keeping in mind not to enlarge the questionnaire. If I do factor analysis, these three questions will merge into one factor. My question is whether I can estimate an ICLV model by directly incorporating the three perception-based questions (the three questions are related to safety but not necessarily correlated), measured on an ordinal scale, into the choice model estimation without doing factor analysis? Are there any other suggestions? Please help.
Considering that you are trying to develop an ICLV model through SEM, my suggestion would be to go for PCA, identify the single factor from all 3 related questions (as you predict) and integrate this factor into the model. This way, the maximum likelihood estimate will be higher, showing the strength of your model.
However, if you want to integrate all 3 questions into your model separately, there is a chance of a reduced maximum likelihood estimate, which is not desired for any model. Needless to mention, the model may also appear loaded with redundant variables, making the model presentation clumsy.
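The PCA step suggested above amounts to extracting the first principal component of the three safety items and using its scores as the single factor. A small dependency-free sketch using power iteration on the correlation matrix (the data in the test are hypothetical; real analyses would use a statistics package):

```python
def first_principal_component(data, iters=200):
    """data: list of respondent rows.  Standardizes each column, then
    power-iterates on the correlation matrix to get the loadings of the
    first principal component and the component scores."""
    n, p = len(data), len(data[0])
    cols = list(zip(*data))
    zs = []
    for col in cols:  # standardize each item
        m = sum(col) / n
        s = (sum((v - m) ** 2 for v in col) / (n - 1)) ** 0.5
        zs.append([(v - m) / s for v in col])
    z = list(zip(*zs))  # standardized rows
    # correlation matrix of the items
    r = [[sum(zs[i][t] * zs[j][t] for t in range(n)) / (n - 1)
          for j in range(p)] for i in range(p)]
    v = [1.0] * p
    for _ in range(iters):  # power iteration -> dominant eigenvector
        w = [sum(r[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    scores = [sum(row[i] * v[i] for i in range(p)) for row in z]
    return v, scores
```

The resulting scores can then be carried into the choice model as the latent safety factor, though a full ICLV treats the measurement and choice parts simultaneously rather than in two stages.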
Question
Hi there, can anyone provide me with suggestions on choosing software? I'm interested in knowing which software is best for the simultaneous estimation approach. Matlab, Mplus, Biogeme, etc. are all capable of this. What is your favorite software and why? What are their pros and cons? Thank you.
I, personally, am not a huge fan of Matlab because of the long code and the troublesome mistake-checking process. I'm curious whether Latent Gold can be used for simultaneous estimation of hybrid choice models?
EQS.
Hope this helps
Matt
Question
Can someone please explain how to do a path analysis within a structural equation model in AMOS when having only latent variables, and, within the research model, 3 moderators and 1 mediator? I am struggling to build it in AMOS to test the structural model. I would be grateful for any hint. Thank you.
If you have a large sample and non-normal data, you can use the ADF estimator in AMOS.
With dichotomous IVs, I would use a multi-group model. The AMOS manual explains this in Chapter 25.
Question
I have two latent variable models which I have identified in separate analyses (we'll call them Model A and Model B). In each of the analyses, all parameters were freely estimated and I achieved excellent model fit. The latent variables in each model are thought to represent trait abilities measured via different assessment tools. I want to see how the latent variables from Model A predict the latent variables in Model B. Given that each model is believed to represent trait-level (static) abilities, is it acceptable to constrain each of the models based on the parameters identified in the separate analyses, and ONLY freely estimate the prediction parameters from Model A to Model B?
Jeffrey M. DeVries Thanks for the great reference!
David L Morgan Thanks for the input. I know that sample size is a hotly debated topic, so it's always good to learn what others are using as benchmarks.
Question
What are the differences between SEM and path models? As far as I know, SEM overcomes two of the issues with path models: latent variables and non-recursive models. But is it necessary that an SEM should always have a latent variable in the model? Please clarify. Thank you.
Hi Denila,
my former colleagues are strictly right, but SEM is also often used as a general term for the specification and testing of a causal structure (for instance by Judea Pearl). From this perspective, SEM is a generic term and describes a theoretical model that expresses your causal assumptions (i.e. assumptions on existing and - more importantly - non-existing effects).
In order to estimate a non-recursive model, you need instruments for both relevant variables. These are variables having effects on only one of the variables. For further information see
Frone, M. R., Russell, M., & Cooper, M. L. (1994). Relationship between job and family satisfaction: Causal or noncausal covariation? Journal of Management, 20(3), 565-579.
Kline, R. B. (2006). Formative measurement and feedback loops. In G. R. Hancock & R. O. Mueller (Eds.), (pp. 43-68). Greenwich, CT: Information Age.
Paxton, P. M., Hipp, J. R., & Marquart-Pyatt, S. (2011). Nonrecursive models. Endogeneity, reciprocal relationships, and feedback loops. Thousand Oaks: Sage.
Wong, C.-S., & Law, K. S. (1999). Testing reciprocal relations by nonrecursive structural equation models using cross-sectional data. Organizational Research Methods, 2(1), 69-78.
Best,
Holger
Question
I'm using 4-6 parcels as indicators for each of the 4 latent factors in my structural model. I parcelled the items in 2 factors using the internal-consistency approach (each parcel containing items from 1 subscale), and the other 2 factors using the domain-representative approach (each parcel containing a representation of items from all subscales). I would like to know whether what I have done is appropriate?
Great discussion between Melissa and Dr. Steinmetz. It cleared up so many confusions.
Question
Is there a package in R (or Stata) to solve the Integrated Choice and Latent Variable Models ( hybrid choice model)?
Dear Jianron:
I am not aware of one in Stata, but you can certainly do it in R. ICLV involves nothing but maximizing the log of the product of the choice model and measurement equation probability terms. You can write the required likelihood expression and then maximize it using simulated maximum likelihood.
By the way, if you are looking for a quick solution, you can try BIOGEME (http://biogeme.epfl.ch/home.html), if you haven't already.
Below are some references in ICLV:
1. Daly, A., Hess, S., Patruni, B., Potoglou, D., & Rohr, C. (2012). Using ordered attitudinal indicators in a latent variable choice model: a study of the impact of security on rail travel behaviour. Transportation, 39(2), 267-297.
2. Ben-Akiva, M., Walker, J., Bernardino, A. T., Gopinath, D. A., Morikawa, T., & Polydoropoulou, A. (2002). Integration of choice and latent variable models. Perpetual motion: Travel behaviour research opportunities and application challenges, 431-470.
Thanks
Divyakant
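The likelihood construction Divyakant describes - maximize the log of the product of the choice probability and the measurement density, simulated over draws of the latent variable - can be sketched in a few lines. The toy model below (one standard-normal latent variable, one continuous indicator with loading `lam` and unit error variance, one binary logit choice with coefficient `gamma`) is a hypothetical illustration, not production code:

```python
import math
import random

def simulated_loglik(gamma, lam, choices, indicators, draws):
    """Simulated log-likelihood of a toy ICLV: for each respondent,
    average P(choice | LV) * f(indicator | LV) over latent draws."""
    ll = 0.0
    for y, ind, person_draws in zip(choices, indicators, draws):
        sims = []
        for lv in person_draws:
            p = 1.0 / (1.0 + math.exp(-gamma * lv))      # binary logit
            p_choice = p if y == 1 else 1.0 - p
            resid = ind - lam * lv                        # measurement equation
            f_meas = math.exp(-0.5 * resid * resid) / math.sqrt(2 * math.pi)
            sims.append(p_choice * f_meas)
        ll += math.log(sum(sims) / len(sims))
    return ll

def simulate_data(gamma, lam, n, r, rng):
    """Generate synthetic respondents plus r latent draws per person."""
    choices, indicators, draws = [], [], []
    for _ in range(n):
        lv = rng.gauss(0.0, 1.0)
        p = 1.0 / (1.0 + math.exp(-gamma * lv))
        choices.append(1 if rng.random() < p else 0)
        indicators.append(lam * lv + rng.gauss(0.0, 1.0))
        draws.append([rng.gauss(0.0, 1.0) for _ in range(r)])
    return choices, indicators, draws
```

In practice one would hand `simulated_loglik` to a quasi-Newton optimizer (or let BIOGEME handle the whole thing); here a coarse grid search over `gamma` already illustrates the recovery of the true parameter.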
Question
Can we design a model with 1st- through 5th-order variables, comprising reflective and formative variables, as follows: items to 1st-order variables are formative; 1st-order variables to the 2nd-order variable are reflective; the 2nd-order variable to 3rd-order variables is formative; 3rd- to 4th-order is also formative; and 4th- to 5th-order is formative?
Your model could have any number of levels, but interpretation of a model with more than three levels is hard and complicated. Theoretically, there is no problem.
Question
I'm planning to perform a SEM in order to investigate intention to innovate regarding to attitude toward innovation and entrepreneurial self-efficacy, as well as other variables.
Each of these factors is measured through 4 or 5 items on a five-point Likert scale. A "latent variable" is a variable that cannot be directly measured. Are these factors latent variables, because they are measured through other measures (the 4 items), or are these factors observed variables, where the observation consists of the 4 items?
To my understanding, each should be seen as an observed variable, because no item alone is the variable, but the mean of all of them is. Nevertheless, I am not sure about this definition, and would be grateful if someone could elucidate it for me, since this changes the whole structure of the SEM.
Best regards,
Pedro
In SEM, the objective is to define your model and test its goodness-of-fit against itself (estimated and observed covariance matrices).
When you define the model, you name which variables are latent. Latent variables are things that are not directly measurable (e.g. pain, quality of life, culture). Observed (manifest) variables are directly observed and measured (e.g. physiology, behavior, RT, pupil dilation).
Included is a small SEM literature review that describes the basics and the lingo used, like the difference between latent variables and observed variables and which should be used where.
Question
Hi,
I have a problem with SEM using Lisrel. I am using Lisrel for my SEM modelling. All my variables are ordinal. Hence the indicator variables for the independent latent variable (intention) and the observed dependent variable (behaviour) are all ordinal. Now, if I try to define an observed variable as a dependent variable of the latent variable, Lisrel assumes it to be another indicator variable of that latent variable. To solve this problem, I have tried using a single-indicator latent variable: I create a latent variable for which behaviour is the single indicator, and this latent variable then becomes the dependent variable, while the latent variable intention remains the independent variable. The model works, but reading the above argument I have doubts about its reliability. So I have two questions:
1) In Lisrel, how can I treat an observed variable - behaviour - as a dependent variable of a latent variable without it being mistaken for another indicator of the independent latent variable intention?
2) If this is not possible, can I use a single-indicator latent variable to define my dependent variable?
Sandeep
Hi,
I am re-igniting this thread with a new query. My previous issues with single-indicator latent variables (using ordinal data) were answered and the model has been working successfully. I am now looking to answer the question of how one should deal with correlated errors in Lisrel. Does one only need to correlate the errors flagged by the Lisrel modification indices in the output, or does one need to follow some other procedure/method?
Sandeep
Question
I am conducting latent measurement invariance analyses using MPlus. The data are best modeled by a single factor (A) and a reverse-coded method factor (RC). All six items load onto A; three of the six items load onto RC.
I would like to calculate composite reliability (omega) for A. It appears that the variance of the dual-loading items is affected (lowered) by this factor structure, which results in potentially liberal estimates of omega. Modeling the data using a correlated-error structure instead provides markedly higher variances for the cross-loading items and, thus, a lower estimate of omega.
What is the appropriate way to calculate omega for data with this structure? Thank you very much for your help,
Chris Napolitano
Christopher,
here are some other articles that may also be helpful:
Rodriguez, A., Reise, S. P., & Haviland, M. G. (2015, November 2). Evaluating Bifactor Models: Calculating and Interpreting Statistical Indices. Psychological Methods. Advance online publication. http://dx.doi.org/10.1037/met0000045
Abstract: Bifactor measurement models are increasingly being applied to personality and psychopathology measures (Reise, 2012). In this work, authors generally have emphasized model fit, and their typical conclusion is that a bifactor model provides a superior fit relative to alternative subordinate models. Often unexplored, however, are important statistical indices that can substantially improve the psychometric analysis of a measure. We provide a review of the particularly valuable statistical indices one can derive from bifactor models. They include omega reliability coefficients, factor determinacy, construct reliability, explained common variance, and percentage of uncontaminated correlations. We describe how these indices can be calculated and used to inform: (a) the quality of unit-weighted total and subscale score composites, as well as factor score estimates, and (b) the specification and quality of a measurement model in structural equation modeling.
Tenko Raykov & George A. Marcoulides (2016) Scale Reliability Evaluation
Under Multiple Assumption Violations, Structural Equation Modeling: A Multidisciplinary Journal, 23:2, 302-313, DOI: 10.1080/10705511.2014.938597
Abstract: A latent variable modeling approach to evaluate scale reliability under realistic conditions in empirical behavioral and social research is discussed. The method provides point and interval estimation of reliability of multicomponent measuring instruments when several assumptions are violated. These assumptions include missing data, correlated errors, nonnormality, lack of unidimensionality, and data not missing at random. The procedure can be readily used to aid scale construction and development efforts in applied settings, and is illustrated using data from an educational study.
HTH
Karin
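For reference, the omega coefficients discussed in the articles above have closed forms once the standardized loadings and error variances are in hand. A small sketch, including an omega-hierarchical variant that keeps only the general factor's variance in the numerator (so the reverse-coded method factor's variance is treated as nuisance); the numbers in the test are made up:

```python
def omega(loadings, error_variances):
    """McDonald's omega for a congeneric one-factor model:
    (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances)."""
    s = sum(loadings)
    return s * s / (s * s + sum(error_variances))

def omega_hierarchical(gen_loadings, method_loadings, error_variances):
    """Omega-hierarchical: only the general factor's variance enters the
    numerator; the method factor's variance stays in the denominator."""
    g = sum(gen_loadings)
    m = sum(method_loadings)
    return g * g / (g * g + m * m + sum(error_variances))
```

Since the method factor absorbs variance that plain omega would credit to the general factor, omega-hierarchical is necessarily the smaller, more conservative figure, which matches the pattern Chris observed.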
Question
I am trying to include some attitudinal variables in mode choice models. Does anybody know which software package can be used to estimate the model?
I don't know if you are familiar with programming, but I found that writing your own estimation code in MATLAB (or R) is the best option, as it gives you much flexibility to deal with any model and data specification. I've done some latent choice modelling in MATLAB. It has very good function libraries for MLE or even Bayesian estimation and is very easy to program.
Question
I would like to know about analysing SEM images related to types of wear.
Dear Nicolas,
Thanks and Regards
Question
I am doing a correlational study on the relationship between teachers' sense of efficacy and school culture score( IV) and student achievement(DV). The new NJ Teacher's Evaluation system moderates/mediates the relationship.
The moderator or mediator relationship should be based on theory, primarily on early studies done in that context. A teacher evaluation system is primarily aimed at quality management, so it may be affecting the teachers' perceived efficacy. Adding to the answers already provided by other scholars, the ultimate decision should be based on theoretical support in favor of mediation or moderation. The following references may be helpful for further reading:
• Husbands, C. T., & Fosh, P. (1993). Students' evaluation of teaching in higher education: experiences from four European countries and some implications of the practice . Assessment and evaluation in higher education, 18(2), 95-114.
• Kupermintz, H. (2003). Teacher effects and teacher effectiveness: A validity investigation of the Tennessee Value Added Assessment System. Educational evaluation and policy analysis, 25(3), 287-298.
• Sanders, W. L., Wright, S. P., & Horn, S. P. (1997). Teacher and classroom context effects on student achievement: Implications for teacher evaluation. Journal of personnel evaluation in education, 11(1), 57-67.
• Baron, R. M., & Kenny, D. A. (1986). The moderator–mediator variable distinction in social psychological research: Conceptual, strategic, and statistical considerations. Journal of personality and social psychology, 51(6), 1173.
Question
Hello
I'm running an SEM model with many latent variables, and after generating factor scores for each of them (based on the observed indicators), I'm struggling to normalize them in order to proceed with the analysis. The regular z-score is not working.
I have found some studies using Tukey's proportion estimation formula for normalization, but have had no success finding details of this transformation.
Does anyone have any experience in this area?
Perfect, have you checked for the presence of outliers ?
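If by Tukey's proportion estimation formula the studies mean the rank-based plotting position (r - 1/3)/(n + 1/3) - that reading is an assumption - then the factor scores can be mapped to normal quantiles with a rank-based inverse-normal transform using only the Python standard library:

```python
from statistics import NormalDist

def tukey_normalize(scores):
    """Rank-based inverse-normal transform using Tukey's plotting
    position (r - 1/3) / (n + 1/3), with average ranks for ties."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1                      # extend the block of tied values
        avg = (i + j) / 2 + 1           # 1-based average rank of the block
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    nd = NormalDist()
    return [nd.inv_cdf((r - 1 / 3) / (n + 1 / 3)) for r in ranks]
```

Unlike a z-score, this transform forces the marginal distribution toward normality regardless of skew or outliers, which is presumably why those studies preferred it.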
Question
My question is regarding SEM using AMOS. I used only three observed variables for each latent factor, and I have six latent factors. In the CFA I found that one of these observed variables is not at a satisfactory level (its factor loading is less than 0.5). In that case, can I remove that variable and do the analysis with two observed variables for that latent factor, or is there any other option?
Although there is a rule of thumb that each latent factor should have three indicators, you can still run the model with two indicators (observed variables) on one latent factor. As you mentioned, one of your indicators has a loading < 0.5. The issue of "how low is too low" is a relative one in keeping/dropping indicators. That is to say, you should not always drop an item just because of a low factor loading; rather, you should also consider other indices such as composite reliability, convergent validity, discriminant validity, etc. In some cases you can go ahead with an item having a low loading when the item is important in reflecting the corresponding factor (e.g., high content validity) and you have acceptable composite reliability, convergent validity, discriminant validity, etc.
There is no "one-size-fits-all" solution! For your decision, you may consult this book since you are using AMOS:
Byrne, B. M. (2013). Structural equation modeling with AMOS: Basic concepts, applications, and programming. Routledge.
Question
I have a data set that contains different states of a country. In every state there are different companies, and one company in every state is the manager of the other companies in that state (the other companies are branches of this leader company at different levels). I want to normalize (or standardize) this data set and after that use factor analysis (FA) to combine different input features into a single performance indicator.
• Is it possible to normalize the data in every state separately, using the leading company's feature values as the denominator for the other companies in that state?
• Can we compare a company from one state with a company in another state in this structure (compared with using one leading company for the whole data set)?
• Does this normalization method affect the factor analysis assumptions?
** The whole-data leading company is very big and has very high feature values, so I decided to use this normalization structure. The scales and measurement units of the features are different.
I suggest that you normalize data globally. This will allow you to compare companies across all states.
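The state-wise option in the first bullet can be implemented directly: divide each company's features by its state leader's values, so the leader itself becomes the all-ones reference point. A hypothetical sketch (the record layout and field names are assumptions):

```python
def normalize_by_state(records, leader_ids, features):
    """records: list of dicts with 'state', 'id' and numeric feature keys.
    Divides each listed feature by the value of the state's leading
    company, identified by membership in leader_ids."""
    leaders = {r["state"]: r for r in records if r["id"] in leader_ids}
    out = []
    for r in records:
        lead = leaders[r["state"]]
        nr = dict(r)                    # keep non-feature fields as-is
        for f in features:
            nr[f] = r[f] / lead[f]
        out.append(nr)
    return out
```

One caveat on the second bullet: the resulting ratios are relative to different leaders, so cross-state comparisons implicitly assume the leaders themselves are comparable; and because ratios change the correlation structure of the features, the FA assumptions should be rechecked on the transformed data, as the global normalization suggested above avoids both issues.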
Question
I know there are a bunch of validated 'peer friendship quality' scales out there, but the problem is that they all involve a lot of items. For my newest research I want to ask adolescents about the 'friendship quality' with at least 6 different peers. I also want to ask them a lot of other questions.
To avoid repetitive answering and questionnaire fatigue, I am now considering the use of a one- or two-item measure (e.g., How good do you consider the friendship with peer X?) with a range between 1 (very low quality) and 10 (very high quality).
I know latent constructs are better in terms of measurement error, but keeping respondents motivated is perhaps more difficult if you ask them to complete 20 items per peer.
Although the psychometric literature would suggest using multi-item measurement, there are some references that used one or a few items to measure peer relationships. Here are some references that might be of help to you:
1. Srivastava, Sanjay, Maya Tamir, and Kelly M. McGonigal (2009), "The Social Costs of Emotional Suppression: A Prospective Study of the Transition to College", Journal of Personality and Social Psychology, 96, 883-897. (this is measuring "closeness to peers" by using one item)
2. Stage, Frances K. (1989), "Motivation, Academic and Social Integration, and the Early Dropout", American Educational Research Journal, 26, 385-402. (This is measuring "social integration: peer group relations" by using two items)
3. Zhou, Zhongyun (Phil), Yulin Fang, and Douglas R. Vogel (2012), "Attracted to or Locked In? Predicting Continuance Intention in Social Virtual World Services", Journal of management Information Systems, 29, 273–305. (this is measuring "relational capital" by using one item)
It might be necessary to modify the measurement item(s) based on your relevance to the research topic. Thanks.
Question
I am working on my master's thesis and will be testing models that have facets of health as the outcome. Specifically, I am looking at:
• physical health (i.e., health problems, such as hypertension, pain, vision problems),
• functional health (i.e., how health problems impair or limit daily functioning, such as working, sleeping, seeing),
I'm thinking that these facets of health are formed by their indicators, rather than the indicators being reflective of the facet of health. But can an argument be made in favor of reflective?
Relatedly, if I do treat these as formative, what are the implications of treating these latent variables as endogenous outcomes? I've read Diamantopoulos et al. (2008) and I am not sure how, or if, any recommendations for formative latent variables change when the latent variable is the outcome.
If it helps in any way, most of my indicators are categorical, but I also have a few continuous. I was planning on using robust weighted least squares as my estimator and conducting my analyses in Mplus.
Thank you in advance, and let me know if you need more details.
Gretchen, I think we get into trouble when we try to turn "physical [or mental] health" into an outcome measure. There's no agreed-on measure for that, if we even agreed on what "that" is.
I'd try for more precise constructs like physical function, perceived health status, medical history and distinct biometric values like BP and BMI.
Question
Thank you for reading this question.
In my data, age was captured in categories (e.g., "under 20 years", "21-30 years"). Since those categories lead to some inaccuracy, I'm not sure whether "age" has to be modeled as a manifest variable with or without a disturbance term. The point is: does measuring in categories produce an error which is modeled in SEM, or not?
Moreover, if "age" is modeled as a manifest variable it belongs to the measurement model but since my hypotheses include "age" doesn't it also belong to the structural model?
Wouldn't it be a solution to run a multi-group analysis using the age groups for grouping and compare the effects across them, instead of including age as a variable in one model? (I'm not familiar with AMOS, but to me it looks like age is being treated as continuous even though it's just ordered categorical in this case.) Besides the advantage of testing invariance across the age groups, in this model you could test whether the direct and indirect effects of your latents are the same in the different groups; in addition, this would fix your problem with the error term.
Best
Question
I have three questions regarding SEM, as below:
1- There are observed and latent variables in the models and I know the definition of each, but I do not know whether we can treat the total score of a scale as an observed variable or not.
As I have around 150 items (for all scales used in my research), I cannot draw the measurement model and mediating models based on the items. Do you have any suggestions? Please!
2- Would you please let me know whether drawing a covariance between the mediators is wrong or not?
Mediators in the models are endogenous variables, and covariances should be drawn between exogenous ones only. But I saw some scholars who used a covariance between mediators! Is it right?
3- Is it acceptable if I draw separate models in my thesis based on my hypotheses?
Indeed, I used SPSS to cover 3 of my objectives, but for the last objective, which is about the mediating effects, I used SEM in AMOS. I drew 4 different models based on each hypothesis. Is this correct?
The attached file is a sample to show what I am trying to ask.
Thank you for your valuable time,
Dear Dr. Moslehpour
Best,
Question
I have multiple repeated risk factors and 6 time points of children's growth. I want to create a latent variable for growth change and see how this is affected by the repeated risk factors.
Do your 'repeated risk factors' change with time? Either way, you could model them as time-varying or time-invariant predictors.
You could model children growth (CG) using a growth curve model by specifying slope and intercept:
CG1 = fCG_alpha + 0 fCG_beta + D1,
CG2 = fCG_alpha + 1 fCG_beta + D2,
CG3 = fCG_alpha + 2 fCG_beta + D3,
CG4 = fCG_alpha + 3 fCG_beta + D4,
CG5 = fCG_alpha + 4 fCG_beta + D5,
CG6 = fCG_alpha + 5 fCG_beta + D6;
Of course, if you feel that growth is not linear, you could change the coefficients to reflect quadratic etc. relations.
Then you could relate your risk factors to fCG_beta, adjusting for fCG_alpha (your intercept, or baseline value).
i.e.,
fCG_beta = * fCG_alpha + *age + *BMI + *fTimeVaryingRiskFactor_beta;
where the asterisks denote freely estimated regression coefficients. Including fTimeVaryingRiskFactor_beta would make it a parallel-process growth curve model.
You should be careful with optimization problems and missing data though.
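The logic of the equations above can be sketched outside an SEM program as well. The following is not a full latent growth model, just a minimal two-stage numpy approximation: estimate each child's intercept (fCG_alpha) and slope (fCG_beta) across the 6 waves by least squares, then regress the slopes on a risk factor while adjusting for the intercept. The data and the risk-factor effect (-0.5) are simulated and hypothetical.

```python
# Minimal sketch of the linear growth idea: CG_t = alpha + t*beta + D_t.
import numpy as np

rng = np.random.default_rng(1)
n_children, n_waves = 200, 6
time = np.arange(n_waves)                       # coefficients 0, 1, ..., 5 as above

risk = rng.normal(size=n_children)              # hypothetical risk factor
true_alpha = 10 + rng.normal(scale=1.0, size=n_children)
true_beta = 2 - 0.5 * risk + rng.normal(scale=0.2, size=n_children)

# Simulated growth data following the equations above
growth = true_alpha[:, None] + np.outer(true_beta, time) \
         + rng.normal(scale=0.5, size=(n_children, n_waves))

# Stage 1: per-child intercept (fCG_alpha) and slope (fCG_beta)
X = np.column_stack([np.ones(n_waves), time])
coefs, *_ = np.linalg.lstsq(X, growth.T, rcond=None)
alpha_hat, beta_hat = coefs[0], coefs[1]

# Stage 2: regress slopes on the risk factor, adjusting for the intercept
Z = np.column_stack([np.ones(n_children), alpha_hat, risk])
b, *_ = np.linalg.lstsq(Z, beta_hat, rcond=None)
print("estimated effect of risk factor on growth slope:", round(b[2], 2))
```

A proper latent growth model estimates both stages simultaneously with measurement error taken into account, which is why the SEM specification above is preferable in practice; the sketch only conveys the structure.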
Question
I would ideally like to use a single threshold, e.g., category 0/1 vs. 2/3.
Thanks, Pat!
Once a mentor, always a mentor.
Question
In my two-segment latent class model, I set the maximum number of iterations to 50. When I ran my LIMDEP program, it showed the message "maximum 50 iterations. Exit iteration status = 1." I do not understand whether my result is right or wrong given this message.
I used SAS for latent class analysis. In some cases, especially when I ran the model with more than 3 classes, even 5000 iterations were not enough. As suggested above, increase the number of iterations.
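The general point holds across packages: latent class and mixture models are fit by EM, and hitting the iteration cap before convergence means the reported solution may not even be a local optimum, so estimates are not trustworthy. Most software exposes a convergence flag worth checking. A hedged illustration with scikit-learn (not LIMDEP or SAS, but the same EM behaviour) on simulated two-class data:

```python
# Demonstrate the effect of the iteration cap on EM convergence.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Two well-separated latent classes in 2 dimensions
data = np.concatenate([rng.normal(-2, 1, size=(300, 2)),
                       rng.normal(+2, 1, size=(300, 2))])

results = {}
for max_iter in (1, 500):
    gm = GaussianMixture(n_components=2, max_iter=max_iter,
                         random_state=0).fit(data)
    results[max_iter] = (gm.converged_, gm.n_iter_)
    print(f"max_iter={max_iter}: converged={gm.converged_}, "
          f"iterations used={gm.n_iter_}")
```

With the cap too low the fit stops before the log-likelihood stabilises (converged flag is False); with a generous cap EM stops early once the tolerance is met. The same logic applies to the "exit status" message in LIMDEP: rerun with a higher maximum and confirm the routine reports convergence.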
Question
What does it mean when the captured variance in X-block in PLSR is so low, for example, 20% but the captured variance of Y-block is 98%?
Is this model proper? Can this model be used for prediction?
The matrices X and Y are simultaneously decomposed into a sum of L components. The correlation between the two blocks is captured by a linear regression between their scores: for L latent variables, the best linear relationship between the X-block and Y-block scores is obtained through small rotations of the latent variables. The key step is identifying the optimal number of latent variables, which can be done by cross-validation, choosing the number with the lowest prediction error. The new variables are orthogonal to each other and therefore uncorrelated. Usually the first latent variables explain most of the total variance contained in the data, and the model should capture the maximum intercorrelation between the blocks. Adding factors may be a way to improve the model if a good intercorrelation between the matrices has not yet been obtained.
Question
Can we use two different Likert scales (e.g., 2-point and 4-point) within a single latent variable?
Within the same structural model, can I use continuous indicators for one latent variable and categorical indicators for another?
For one latent variable, I have two indicators on a 4-point Likert scale and two that are dichotomous. Is it correct to use two different scales to measure the same latent variable?