Science method
Exploratory Factor Analysis - Science method
Explore the latest questions and answers in Exploratory Factor Analysis, and find Exploratory Factor Analysis experts.
Questions related to Exploratory Factor Analysis
I conducted an Exploratory Factor Analysis using Principal Axis Factoring with Promax rotation, resulting in the identification of 8 factors. Now, I aim to examine potential associations between these extracted factor scores (treated as continuous variables) and other variables in my research. However, I am uncertain about the most appropriate statistical approach to compute these factor scores.
Should I compute a mean or sum score for each factor, use the regression method, or utilize the Bartlett method, the Anderson-Rubin method, or any others…?
Any insights or alternative approaches are highly appreciated.
Thank you!
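If it helps to see what the software is doing, the regression (Thurstone) method is simple enough to compute directly from the loadings. Below is a minimal numpy sketch; the function and variable names are mine, and for an oblique rotation such as Promax the structure loadings (pattern loadings post-multiplied by the factor correlation matrix) should be supplied rather than the raw pattern loadings:

```python
import numpy as np

def regression_factor_scores(X, loadings):
    """Thurstone regression-method factor scores.

    X        : (n, p) raw data matrix
    loadings : (p, k) structure loadings
    Returns  : (n, k) estimated factor scores (columns have mean 0).
    """
    # Standardize the observed variables.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    R = np.corrcoef(Z, rowvar=False)      # (p, p) correlation matrix
    W = np.linalg.solve(R, loadings)      # weights: R^{-1} @ Lambda
    return Z @ W                          # (n, k) factor scores
```

Bartlett scores use a different weight matrix (weighting by inverse uniquenesses), and Anderson-Rubin additionally constrains the scores to be uncorrelated with unit variance, so the choice matters once factors are allowed to correlate.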
Hello,
I have a data set with N = 369 individuals measured at a single time point. The goal of the study is to create an assessment of psychological safety (PS). The assessment is a self-report measure asking participants to indicate how psychologically safe they feel using a unipolar 5-point Likert scale ranging from 1 (not at all) to 5 (extremely).
In addition to the assessment I am creating, I also measured a number of demographic variables (e.g., age, salary) and a few additional measures of team environment for validation (e.g., an existing measure of PS, level of team interdependence).
My primary goal is to run an exploratory factor analysis (EFA). This is the first time anyone has conceptualized PS as multidimensional, so one of the primary aims is to uncover the potential factor structure of PS, and also to identify candidate items for deletion.
In order to prepare for the EFA analyses, I am cleaning the data by following the recommendations in (the excellent) Tabachnick & Fidell (2013, 6th ed.).
I am currently at the point where I am checking the data for multivariate outliers, starting with Mahalanobis distance. And I cannot find explicit guidelines regarding which variables I should be including as "IVs" in the analysis.
QUESTION: Which variables should I be including in my search for multivariate outliers? Do I include all variables, or only my target variables?
Specifically, do I include only the variables that represent the item pool for my forthcoming PS assessment? Or do I include all the PS items AND demographic variables, the existing PS assessment, interdependence measure, etc.??
I ran the Mahalanobis distance analyses 2 times using both approaches, and found substantial differences:
- TIME 1 - With just the PS assessment variables --> I identified n = 28 multivariate outliers.
- TIME 2 - With PS items + demographics, etc. --> I identified n = 10 multivariate outliers (all identified as outliers in the TIME 1 analysis).
Syntax I am using - the bolded variables are the ones I am questioning if I should include or not:
REGRESSION
/MISSING LISTWISE
/STATISTICS COEFF OUTS R ANOVA COLLIN TOL
/CRITERIA=PIN(.05) POUT(.10)
/NOORIGIN
/DEPENDENT Subjno
/METHOD=ENTER Age Salary Edu WorkStructure TeamSize Tenure_OnTeam JapaneseBizEnviron EdmondsonPS_TOT Interdep_TOT Valued_TOT PS_1 PS_48 PS_141 PS_163 PS_43 PS_53 PS_73 PS_133 PS_135 PS_19 PS_60_xl26 PS_93 PS_106_xl26 PS_143 PS_58 PS_86 PS_182 PS_56 PS_69 PS_103 PS_164 PS_22 PS_35 PS_91 PS_30 PS_59 PS_63 PS_90 PS_131 PS_140 (**Note, PS assessment var list is truncated b/c large number)
/RESIDUALS = OUTLIERS(MAHAL)
/SAVE MAHAL.
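For comparison outside SPSS, the same check can be written in a few lines. A hypothetical helper (names are mine) that flags cases whose squared Mahalanobis distance exceeds the chi-square cutoff at p < .001, the criterion Tabachnick & Fidell recommend:

```python
import numpy as np
from scipy import stats

def mahalanobis_outliers(X, alpha=0.001):
    """Flag multivariate outliers via squared Mahalanobis distance.

    X     : (n, p) data matrix of the variables under consideration
    alpha : significance level for the chi-square cutoff (df = p)
    Returns a boolean mask of flagged rows and the distances.
    """
    mu = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    diff = X - mu
    # Squared Mahalanobis distance of each row from the centroid.
    d2 = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(cov), diff)
    cutoff = stats.chi2.ppf(1 - alpha, df=X.shape[1])
    return d2 > cutoff, d2
```

Note that the distances are computed relative to whatever variable set you supply, so the two runs answer different questions: outliers within the PS item space versus outliers within the full multivariate space.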
I'm trying to run a polychoric correlation in Stata v13, but I'm confused.
Hey guys,
For my master's thesis, I have found two multi-item scales in previous research. I want to know whether I can compute an EFA for the dependent and for the independent variable?
I'm doing a validation study, and for construct validity I'll analyze my data through EFA and CFA. Should the same data be used in both analyses, or should we split the data? And if we split the data, what percentage should be included in the EFA and what percentage in the CFA?
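If a split is used, a random half-split (EFA on one half, CFA on the other) is the most common choice. A minimal sketch, assuming the data are rows of a numpy array (the function name is mine):

```python
import numpy as np

def split_half(X, seed=42):
    """Randomly split the rows of X into an EFA half and a CFA half."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))     # shuffle row indices
    half = len(X) // 2
    return X[idx[:half]], X[idx[half:]]
```

A 50/50 split is the usual default; the proportions can be adjusted if one half needs more power.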
Can you please help me with how to use exploratory factor analysis as a method in my research? My research is about the dimensions of the commission of illegal activities. Please help; I'm a new researcher.
What is meant by collecting data separately for newly developed questions to run EFA in PLS-SEM? Do I need to make two separate questionnaire sets, one consisting of questions adopted from prior literature and the other of the self-developed questions?
Is it possible to run EFA from the data collected by pilot study and for the empirical work only running the CFA?
Please suggest.
Thanks in advance.
Dear RG community,
I have been trying to help my PhD student with an EFA. We have ended up with a very nice stabilised solution (using principal axis factoring, varimax rotation and extracting a fixed number of factors) apart from the determinant of the correlation matrix being too low. We had initially checked for correlations > 0.8 in absolute value but did not find any so we went back and removed one item from each pair where the bivariate correlation was > 0.7 in absolute value. This improved the determinant but it was still considerably lower than the recommended threshold of 0.00001. What do you suggest we should do: continue to reduce the bivariate correlation threshold and remove more items, just live with the low determinant value as what we are really interested in is the scales themselves derived from the EFA, or is there another way to detect and remove multicollinearity?
Thanks very much for your expert advice.
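One thing worth noting: the determinant of a correlation matrix is the product of its eigenvalues, so a near-linear dependency among three or more items can push it toward zero even when no single bivariate correlation is large. The squared multiple correlation (SMC) of each item regressed on all the others exposes such dependencies; a minimal numpy sketch (names are mine):

```python
import numpy as np

def multicollinearity_screen(R):
    """Screen a correlation matrix R for near-linear dependencies.

    Returns the determinant and each item's squared multiple
    correlation (SMC) with the remaining items. Items with SMC
    close to 1 are nearly redundant even when no single pairwise
    correlation is high.
    """
    det = np.linalg.det(R)
    R_inv = np.linalg.inv(R)
    smc = 1.0 - 1.0 / np.diag(R_inv)   # SMC_i = 1 - 1/r^ii
    return det, smc
```

Items with SMCs close to 1 are the natural candidates for removal, rather than lowering the bivariate threshold further.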
Hello, all!
I have translated and heavily edited a validated survey for use in a new context. This involved changing language, context, and cutting items from 56 to 30.
I'm now trying to find the underlying factor structure of this data, which I assume will differ from the 8 factors in the original 56-item scale.
For the purposes of my research, I tried to have respondents compare their expectations with their preferences for a given item. So, an item would present a statement such as (hypothetically) "During class, students make connections between content from different subjects." This would be followed by two Likert-type questions: "I would like this to be true" and "I expect this will be true". Respondents answered each of these sub-questions for every item, on a 6-point scale (no neutral).
I'm a bit stuck, conceptually, on how to approach factor analysis with these paired questions. I'd assume that it shouldn't matter whether I included both "Q_a" and "Q_b", since those should load onto the same factor. But then, I think, the differing nature of the two questions might confound such an analysis.
Does anyone have any wisdom or literature on how this type of paired response factor analysis has been done?
Thank you!
Dear researchers, can both EFA and CFA be applied to the database obtained at the end of the scale development process at the same time?
I am currently engaged in a study that applies regression and ANOVA models to several latent variables, including entrepreneurial passion, risk attitude, and entrepreneurial self-efficacy.
In the context of this study, I am seeking a rigorous and straightforward method for determining the factor loadings and latent variable scores for each participant. I am particularly interested in going beyond the traditional methods of simply calculating average or sum scores for these latent variables.
I believe estimating these factors would be more precise using an approach similar to that employed in Covariance-Based Structural Equation Modeling (CB-SEM) models.
Could you provide guidance on how to implement this approach effectively? Would you recommend specific statistical techniques or software tools for this purpose?
How can I ensure the validity and reliability of the obtained factor loadings and latent variable scores? Any advice or resources you could share would be greatly appreciated.
1- I ran an exploratory factor analysis (eight dimensions and 37 questions were finalized).
2- Then I checked the measurement model. In the measurement model, only 13 questions had factor loadings greater than 0.5.
3- In this case, the dimensions taken from the exploratory factor analysis can no longer be used, because one dimension has been removed entirely and three dimensions are left with only 2 questions each.
4- In this new situation, it is not possible to define a model with latent variables, because the number of indicators remaining for each dimension is not enough: according to Hair (2010), at least three indicators are needed for each latent variable.
Question:
What do you recommend?
Any study from any field is welcome, as much as possible :)
Thank you in advance
The result of the exploratory factor analysis for the measurement tool I used is not compatible with the original scale. For example, in the original study, item 3 is in factor 1, but in the study I conducted, it is in factor 3. In general, all items in the scale show factor loadings in different sub-dimensions than the original ones. How should I proceed?
Good morning,
I am not an expert on Factor Analysis, so I hope the explanation of my problem makes sense.
My current task is to perform analyses on an older data set from experiments my lab conducted a couple of years ago. We have 212 individual items, consisting of a dozen or so demographic questions and items from a total of 24 different scales that measure separate constructs. Given the large number of items and constructs, I would like to reduce the number of dimensions to achieve a clearer starting point for theory development. Obviously, an exploratory factor analysis is a good choice for this.
My question is whether I have to input the 200 or so individual non-demographic items into the EFA, or whether I can instead just use the 24 composite variables/constructs and reduce the number of dimensions from there. My hesitation about using all the individual items is that the EFA would simply return something very similar to the composite variables, as the constructs generally have quite high internal consistency and are fairly distinct from each other based on theory. The obvious caveat with using the composite variables is that it is not something I have seen done much, and as I am not an expert on EFA, I am unsure whether there is a major methodological roadblock to using composite variables that I am unaware of.
Thank you for your help!
Best wishes,
Pascal
Hi
I'm going to do a cluster analysis, after conducting an exploratory factor analysis for a series of Likert scale questions.
The cluster analysis will of course include other variables such as demographics, together with the factor analysis products.
My question is: for those factors from the EFA, do I use factor scores (regression-based or Bartlett-based) or mean scores (the average of the Likert-scale questions that manifest each factor)?
I know factor scores have many advantages over mean scores, but factor scores have a mean of 0 and a standard deviation of 1, which makes the subsequent cluster analysis hard to interpret, whereas mean scores seem to make more sense to interpret.
So, which score should I use for the subsequent cluster analysis: mean scores, factor scores, or something else?
Thank you.
Hi all,
Kindly help me understand when we can use exploratory factor analysis (EFA) and when confirmatory factor analysis (CFA)?
Regards,
In the context of regression, methods for detecting collinearity are well described in the literature. In the context of exploratory factor analysis (EFA), however, I am facing a situation where I get highly unstable factor loadings and factor correlations across different bootstrapped samples (despite a sample size of more than 700), which I assume is due to multicollinearity. I am not sure how to explore its source. The common methods rely on the intercorrelation among explanatory variables (i.e., the latent factors in EFA), but the intercorrelation itself is highly unstable.
My search for references on this topic was not very successful. Any explanation and/or reference on multicollinearity in EFA would be appreciated.
Thank you in advance,
Ali
Hello,
I applied exploratory factor analysis with network analysis to data from healthy and diseased patients. The analysis shows different clusters of parameters; some are similar in both groups, and some groups cluster differently. For instance, the parameters IleValLeu are similarly clustered in both groups whereas the parameters Pro Ala are not (see figure).
How shall I interpret the data? For instance, in the Pro/Ala case, I expected to see some differences between the groups, but they look pretty much the same to me.
Are the differences about correlation? The scatterplot of the data shows slightly different regression models, but nothing compelling.
Is it the value itself? But again there is no real difference in the value distribution between the two groups.
So, what is the actual outcome of the network analysis?
Thank you
Hypothetically, if I would like to validate a scale and need to explore its latent factors first with EFA, followed by a CFA to validate the structure, do I need a new data set for the CFA? Some argue that randomly splitting one dataset into two is also acceptable. My question is: can I apply EFA to the full dataset and then randomly select half of the dataset to conduct a CFA?
Dear research scholars, I hope you are doing well!
I am a Ph.D. scholar in Education, now working on my thesis. Kindly guide me on when to perform the EFA: should it be run on pilot-study data or on the actual research data?
Regards
Muhammad Nadeem
Ph.D. In Education , faculty of Education,
University of Sindh, Jamshoro
I am conducting an exploratory factor analysis and to determine the number of factors I used a paired analysis.
How can I generate the number of factors correctly in stata? Or other tool?
When using parallel analysis in stata, for example, if you proceed with Principal Axis Factoring all my Eigenvalues from the Parallel Analysis using a Principal Axis Factoring lower than 1. (Suggests in this case to retain all factors)
when out of curiosity, I use principal component factors, or even principal component analysis (I know this is not EFA), it suggests retaining 3 factors (which satisfies me)
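Parallel analysis is simple enough to sketch directly if Stata's implementation is the sticking point. A minimal Python version on the unreduced correlation matrix (names are mine); note that when the eigenvalues come from a reduced matrix, as in a PAF-based run, comparing them against 1 is the wrong benchmark: the comparison is observed versus random eigenvalues, and both fall below 1 after reduction.

```python
import numpy as np

def parallel_analysis(X, n_iter=100, quantile=95, seed=0):
    """Horn's parallel analysis on the (unreduced) correlation matrix.

    Retain factors whose observed eigenvalue exceeds the chosen
    quantile of eigenvalues from random normal data of the same shape.
    """
    n, p = X.shape
    rng = np.random.default_rng(seed)
    obs = np.sort(np.linalg.eigvalsh(np.corrcoef(X, rowvar=False)))[::-1]
    rand = np.empty((n_iter, p))
    for i in range(n_iter):
        Z = rng.normal(size=(n, p))
        rand[i] = np.sort(np.linalg.eigvalsh(np.corrcoef(Z, rowvar=False)))[::-1]
    thresh = np.percentile(rand, quantile, axis=0)   # per-rank cutoffs
    return int(np.sum(obs > thresh)), obs, thresh
```

The decision rule is the observed-versus-random comparison itself, not any fixed eigenvalue cutoff, which is why "all eigenvalues below 1" does not by itself mean "retain everything".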
I am running a PCA in JASP and SPSS with the same settings; however, the PCA in SPSS shows some factors with negative loadings, while in JASP all of them are positive.
In addition, when running an EFA in JASP, it allows me to produce results with the Maximum Loading setting, while SPSS does not. JASP goes so far with the EFA that I can choose to extract 3 factors and somehow get the results one would have expected from previous research. However, SPSS does not run under the Maximum Loading setting, regardless of whether I set it to 3 factors or to an eigenvalue criterion.
Has anyone come across the same problem?
UPDATE: Screenshots were updated. The EFA also shows results in SPSS, just without cumulative values, because value(s) are over 1. But why the difference between positive and negative factor loadings in JASP and SPSS?
When designing the questionnaire for EFA, what do I need to keep in mind when it comes to the order of the questions?
More specifically, does the order of the questions need to be completely randomized or is it generally allowed to still ask questions in topic blocks according to potential factors/constructs I have in mind?
Thanks everyone!
For my research project I am adding new items to a previously validated scale. In the previous research, an exploratory factor analysis revealed a two-factor structure, but the internal consistency of one of the subscales was quite poor, so the aim of my study is to add new items to improve the internal consistency. Do I need to do another exploratory factor analysis, since the scale will now have new items, or can I do a confirmatory factor analysis because I'm still using the same scale?
I did a confirmatory factor analysis of the questionnaire. If, in each subscale, I select the questions that have the highest factor loadings, consider these questions a short version, and take further steps to verify the validity and reliability of the questionnaire, is this the correct way?
By doing exploratory factor analysis, I end up with fewer subscales than in the long-form questionnaire.
In exploratory factor analysis, when naming the dimensions, some items are heterogeneous relative to the other items of the same dimension, which makes it difficult to name each category. For example, in the 1st category, the item "personality type of the manager"; in the 3rd category, the two items "wife's level of analysis" and "wife's perception of the existence of social justice"; in the 5th category, the item "organizational level of the manager (senior, middle, operational manager)"; in category 6, the item "education level of the wife"; and in category 7, the item "job transfer of the wife".
Setting aside these heterogeneous items, each category has been named as follows. The question is: what should be done with the heterogeneous items? Should they be deleted?
Category 1. Family intimacy: spouse's relationship with family; years of married life; wife's compassionate analysis of the company events recounted by the manager; manager's personality type
Category 3. Manager's attitude towards his wife: manager's mental health; wife's level of analysis; manager's leadership style; manager's level of trust in his wife; wife's perception of the existence of social justice
Category 5. Spouse's power of influence on the manager: manager's organizational level (senior, middle, operational); wife's effort to show support to the manager; spouse's ability to regulate the manager's emotions; extent of the spouse's influence on the manager's decision-making; wife being a housewife
Category 6. Environment: level of stability of the environment; men (in the role of manager) accompanying their working wives; manager's income; spouse's level of education; degree of work-family balance achieved by the manager
Category 7. Spouse's physical health: spouse's job transfer; wife's physical health; manager's physical health
Here is the code I'm using:
fa(vars, nfactors = 4, rotate = "oblimin", fm = "pa")
Hi. I recently developed new survey items to measure extrinsic motivation. In the exploratory factor analysis, four items intended to measure extrinsic motivation loaded onto the same factor with negative loadings (-.822; -.813; -.808; -.553). However, in the subsequent confirmatory factor analysis, the same items produced positive loadings (.78; .80; .67; .56). I do not have a theory for why this could be, and I would appreciate any help explaining this finding.
Thank you.
I used confirmatory factor analysis without using the exploratory factor analysis first. The assumption that supported this was the clear presence of the factors in the literature. Is it correct not to go for the exploratory factor analysis and jump directly to CFA when you have clearly established factors in the literature and theory?
Can I split a factor that has been identified through EFA?
N = 102; 4 factors have been identified.
However, one of the 4 actually contains two different ideas that are obviously factoring together. I am working on explaining how they go together, but it is very easy to explain them as two separate factors.
When I conduct a confirmatory analysis, the model fit is better with them separate... but running a confirmatory analysis on the same sample of subjects on which I conducted the exploratory analysis appears to be frowned upon.
Exploratory factor analysis and confirmatory factor analysis are used in scale development studies. The Rasch analysis method can also be used in scale development, and some researchers consider Rasch analysis the more up-to-date approach. Frankly, I don't think so, but is there a feature that makes EFA, CFA, or Rasch superior to the others in Likert-type scale development?
I am conducting an EFA for a big sample and nearly 100 variables, but no matter what I do, the determinant keeps its ~0 value.
What should I do now?
I plan to develop a semi-structured assessment tool and then validate it on a relatively small clinical sample (below 50). I have been asked by the research committee to consider factor analysis.
So in this context, I wanted to know whether anyone has used regularized factor analysis, which is recommended for small sample sizes, for tool validation?
I am aware that a high degree of normality in the data is desirable when maximum likelihood (ML) is chosen as the extraction method in EFA and that the constraint of normality is less important if principal axis factoring (PAF) is used as the method of extraction.
However, we have a couple of items in which the data are highly skewed to the left (i.e., there are very few responses at the low end of the response continuum). Does that put the validity of our EFAs at risk even if we use PAF?
This is a salient issue in some current research I'm involved in because the two items are among a very small number of items that we would like, if possible, to load on one of our anticipated factors.
Dear all,
I am conducting research on the impact of blockchain traceability for charitable donations on donation intentions (experimental design with multiple conditions, i.e., no traceability vs. blockchain traceability).
One scale/factor measures “likelihood to donate” consisting of 3 items (dependent variable).
Another ”trust” factor, consisting of 4 items (potential mediator).
Furthermore, a “perception of quality” consisting of 2 items (control).
And a scale “prior blockchain knowledge” consisting of 4 items (control).
My question is: since all these scales are taken from prior research, is CFA sufficient? Or, since the factors are from different studies (and thus have never been used together in one survey/model) should I start out with an EFA?
For instance, I am concerned that one (or perhaps both) items of ”perception of charity quality” might also load on the “trust”-scale. e.g., the item “I am confident that this charity uses money wisely”
Curious to hear your opinions on this, thank you in advance!
Greetings,
I am a DBA student conducting a study about "Factors Impacting Employee Turnover in the Medical Device Industry in the UAE."
My research model consists of 7 variables, out of which:
- 5 variables are measured using multi-item scales adapted from the literature, e.g., Perceived External Prestige (6 items), Location (4 items), Flextime (4 items), etc.
- 2 are nominal variables
I want to conduct a reliability analysis using SPSS, and I thought I need to do the following:
- Conduct reliability test using SPSS Cronbach's alpha for each construct (except for nominal variables)
- Deal with low alpha coefficients (how to do so?)
- Conduct Exploratory Factor Analysis to test for discriminant validity
Am I thinking about this correctly? Attached are my results so far.
Thank you
Hi everyone,
I have longitudinal data for the same set of 300 subjects over seven years. Can I use 'year' as a control variable? Initially, I used a one-way ANOVA and found no significant difference across the seven years in each construct.
Which approach is more appropriate: pooling the time series after the ANOVA (if not significant), or using 'year' as a control variable?
Hello everyone,
As the title suggests, I am trying to figure out how to compute a reduced correlation matrix in R. I am running an Exploratory Factor Analysis using Maximum Likelihood as my extraction method, and am first creating a scree plot as one method to help me determine how many factors to extract. I read in Fabrigar and Wegener's (2012) Exploratory Factor Analysis, from their Understanding Statistics collection, that using a reduced correlation matrix when creating a scree plot for EFA is preferable compared to the unreduced correlation matrix. Any help is appreciated!
Thanks,
Alex
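For what it's worth, the reduced correlation matrix is just the correlation matrix with squared multiple correlations (SMCs) substituted on the diagonal as initial communality estimates. A minimal sketch in Python for illustration (names are mine); in R, the diagonal values can be obtained with smc() from the psych package:

```python
import numpy as np

def reduced_correlation_matrix(X):
    """Correlation matrix with communality estimates (SMCs) on the
    diagonal, as used for scree plots in common-factor EFA."""
    R = np.corrcoef(X, rowvar=False)
    smc = 1.0 - 1.0 / np.diag(np.linalg.inv(R))   # SMC of each item
    R_reduced = R.copy()
    np.fill_diagonal(R_reduced, smc)
    return R_reduced

# Scree values are the eigenvalues of the reduced matrix, sorted
# in descending order:
# eigs = np.sort(np.linalg.eigvalsh(reduced_correlation_matrix(X)))[::-1]
```

Because the diagonal entries are now below 1, some eigenvalues of the reduced matrix can be small or negative; that is expected and does not invalidate the scree plot.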
I'm conducting the translation of a very short 12-item scale to assess therapeutic alliance in children. I have 61 responses, and I wonder whether that number of subjects is acceptable for running an exploratory factor analysis. I know there is a suggestion of 5 participants per item for EFA and 10 participants per item for CFA. However, the number of participants here seems very small for these analyses. What is your opinion?
Exploratory factor analysis was conducted to determine the underlying constructs of the questionnaire. The results show the % variance explained by each factor. What does % variance mean in EFA? How to interpret the results? How can I explain the % variance to a non-technical person in simple non-statistical language?
I have a good result with a low explained variance (less than 50%) in an exploratory factor analysis, and I have read some discussions about the acceptability of a total explained variance below 50% in the social sciences. Please recommend papers that support this, or give me your suggestions.
Thanks in advance.
I am using the Environmental Motives Scale in a new population. My sample size is 263.
The results of my exploratory factor analysis showed 2 factors (Eigenvalue>1 and with loadings >0.3) - Biospheric Factor and Human Factor
Cronbach alpha was high for both factors (>0.8)
However, unexpectedly, confirmatory factor analysis showed that the model did not fit well:
RMSEA= 0.126, TLI =0.872 and SRMR = 0.063, AIC = 6786
After a long time on Youtube, I then checked the residual matrix and found that the standardized covariance residuals between two of the items in the Biospheric factor was 7.480. From what I understand if values are >3, it indicates that there may be additional factor/s that are accounting for correlation besides the named factor. I therefore tried covarying the error terms of those two items and rechecked the model fit using CFA.
Results of this model show much better model fit.
RMSEA = 0.083, TLI = 0.945, SRMR = 0.043, AIC = 6731 (not as much difference as I thought there would be)
The questions I am now left with (which google does not seem to have the answer to) are:
1. Is it acceptable to covary the error terms to improve model fit?
2. How does covarying error terms impact on the scoring of the individual scales? Can I still add up the items to measure biospheric vs human scales as I would have without the covarying terms?
I would be so grateful for any insight or assistance.
Thank you
Tabitha
I used exploratory factor analysis for 4 latent variables. The table called "Total Variance Explained" has a "% of Variance" column. Is this similar to the average variance extracted (AVE)?
What are the steps for assessing discriminant validity in SPSS? I run the factor analysis, then compute the latent variables into observed (composite) variables, and after that I run the correlations. Is that the correct process?
Thanks for your attention
Hi, I am working on a project about ethical dilemmas. This project requires development of a new questionnaire that should be valid and reliable. We started with collecting the items from the literature (n= 57), performed content validity where irrelevant items were removed (n=46), and piloted it to get the level of internal consistency. Results showed that the questionnaire has a high level of content validity and internal consistency. We were requested to perform exploratory factor analysis to confirm convergent and discriminant validity.
Extraction: PCA
Rotation: varimax
Results: the items' communalities were higher than 0.6.
KMO = 0.70
Bartlett's test is significant.
Number of extracted factors: 11, with a total explained variance of 60%.
My issue is that 6 factors contain only 2 items each. Should I remove all these items?
Note that the items are varied: each one describes a different situation, and they share only the fact that they are ethical dilemmas; deleting them would affect the questionnaire's overall ability to assess participants' level of difficulty with, and the frequency of, such situations.
EFA is a new concept for me; I am really confused by these data.
Do you know of any renowned article published in a Scopus-indexed journal discussing which extraction method is best for conducting exploratory factor analysis (EFA) in SPSS: principal components or principal axis factoring?
I collected 109 responses on 60 indicators to measure the status of urban sustainability as a pilot study. As far as I know, I cannot run EFA, as each indicator requires at least 5 responses, but I do not know whether I can run PCA with this limited number of responses. Would you please advise me on the applicability of PCA or any other possible analysis?
Query 1)
Can mirt exploratory factor analysis method be used for factor structure for marketing/management research studies because most of the studies that I have gone through are related to education test studies?
My objective is to extract factors to be used in subsequent analysis (Regression/SEM)
My data is comprised of questions like:
Data sample for Rasch Factors
Thinking about your general shopping habits, do you ever:
a. Buy something online
b. Use your cell phone to buy something online
c. Watch product review videos online
RESPONSE CATEGORIES:
Yes = 1
No = 0
Data sample for graded Factors
Thinking about ride-hailing services such as Uber or Lyft, do you think the following statements describe them well?
a. Are less expensive than taking a taxi
c. Use drivers who you would feel safe riding with
d. Save their users time and stress
e. Are more reliable than taking a taxi or public transportation
RESPONSE CATEGORIES:
Yes =3
Not sure = 2
No = 1
Query 2) If we use mirt exploratory factor analysis, with a Rasch model for dichotomous items and a graded model for polytomous items, do these models by default use tetrachoric correlations for the Rasch model and polychoric correlations for the graded models?
My objective is to extract factors to be used in subsequent analysis (Regression/SEM)
Note: I am using R for data analysis
Hi,
I used a self-efficacy tool for my sample. According to the original article, there is only one factor in the tool. However, in the Exploratory factor analysis for my sample, two factors were found. How can I interpret this result?
Thank you so much for your answer in advance!
Hi, I have run an exploratory factor analysis (principal axis factoring, oblique rotation) on 16 items using a 0.4 threshold. This yielded two factors, which I had anticipated as the survey was constructed to measure two constructs. Two items had factor loadings <0.4 (from Factor 1) so I removed them, leaving 14. However, upon closer inspection, one of the items from Factor 2 loaded on to Factor 1 (|B| = 0.460).
The distinction between the two constructs is very clear so there should not be any misunderstanding on the part of the participants (n = 104). I'm unsure of what to do. I checked the Cronbach's alpha for each factor: Factor 1 (a = .835 with the problematic item, a = .839 without). Factor 2 is a = .791).
Do I remove the item? Any advice would be very much appreciated. Thank you!
Can anyone help me with the sample size calculation for the exploratory factor analysis? Do you know how to calculate it and with which statistical program? Thank you.
I have extracted factor scores after EFA and then used k-means to identify clusters.
But I am confused about how to validate the number of clusters in SPSS, as it does not report any AIC or BIC values on the basis of which I could compare solutions and finalise the number of clusters.
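Since SPSS's k-means output omits AIC/BIC, one workaround is to compute a BIC-style criterion from the within-cluster sum of squares yourself. A rough numpy sketch using a spherical-Gaussian approximation (toy data; the exact BIC formula for k-means varies across sources, so corroborate the chosen k with a scree/elbow or silhouette check):

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """Plain Lloyd's algorithm; returns labels and cluster centers."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        new = np.array([X[labels == j].mean(0) if (labels == j).any() else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels, centers

def kmeans_bic(X, labels, centers):
    # Spherical-Gaussian approximation: n*log(WSS/n) + k*d*log(n)
    n, d = X.shape
    wss = ((X - centers[labels]) ** 2).sum()
    return n * np.log(wss / n) + len(centers) * d * np.log(n)

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (100, 2)),
               rng.normal(5, 1, (100, 2))])        # two well-separated toy clusters
bics = {k: kmeans_bic(X, *kmeans(X, k)) for k in range(1, 6)}
print(bics)    # lower BIC = better trade-off of fit vs. complexity
```

The same computation can be run on factor scores exported from SPSS.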
I have adopted my questionnaire from previous literature. I want to know if I still need to carry out EFA before carrying out PLS-SEM for my thesis?
Hi,
I am working on exploratory factor analysis in SPSS using promax rotation.
Upon checking the pattern matrix, one item has a factor loading greater than 1.0. Should I ignore loadings greater than 1.0?
Also, I noticed a few negative loadings in the pattern matrix. Can negative loadings still be used in the analysis?
Thanks much.
Hello everyone,
I have run a confirmatory factor analysis in R to assess the translated version of an existing questionnaire. It is unidimensional and consists of 16 adjective-based items rated on a 7-point Likert Scale.
Here are the results:
χ²(104) = 627.197, p < .001; RMSEA = 0.109, 90% CI [0.101, 0.117]; SRMR = 0.063; CFI = 0.839; TLI = 0.814
I am aware of all the cutoffs; the RMSEA result is the troublesome one. On the other hand, as I delve into similar topics, some of them just reported such results as satisfactory and did not conduct an exploratory factor analysis.
What I am wondering is whether my results are acceptable enough that I can limit myself to reporting them and run no EFA.
Or should I run an EFA and then gather data again based on the model the EFA results propose?
Thanks for your time,
Sara
The items used in my study have been adapted from the instruments that have been developed by some previous researchers. Most of the recently published articles didn't show the EFA results.
The project is incomplete; please open the attached file to see!
I have data containing three-level items (Yes/No/Not sure). Is it technically correct to convert the data to a numeric type and perform EFA (exploratory factor analysis) to extract factor scores for use in subsequent analysis?
I assessed the psychometric features of a construct, and after exploratory factor analysis more than half of the items were excluded. How is the construct/content validity affected?
Any ideas or suggestions for reading?
Dear researchers, I am a master's student, now writing my graduation thesis. I am studying how the six dimensions of post-purchase customer experience influence customer satisfaction and, in turn, repurchase intention. I have adapted the measurement scales of the six dimensions of post-purchase customer experience to make them more applicable to my study context. My question is: do I have to conduct exploratory factor analysis in SPSS? I have done so, but there are many cross-loadings; I tried different methods, but the results still do not look good. Two dimensions of post-purchase customer experience (customer support and product quality) load on the same new factor, which I feel is not acceptable because they are very different constructs. I understand there may be problems with my questionnaire, but I have no chance to improve it now.
I tried to use SmartPLS for my analysis, and the factor analysis in that software looks great, but I think the factor analysis there is CFA rather than EFA. So can I skip EFA and do CFA directly?
I will need to finish my thesis in 1 month, and I really need your help. Thank you!
Hello RG researchers,
I am a bit confused due to different questions and comments.
Well, I have a single factor containing 11 items (Likert rating). For the EFA I am using SPSS (maximum likelihood), and I use lavaan and Amos for the CFA. I've got three questions:
1. The KMO and Bartlett's test criteria are met, while the normality tests (Kolmogorov-Smirnov and Shapiro-Wilk) are not (both are significant). So, am I good to proceed with the EFA, or do I need to use Satorra-Bentler or Yuan-Bentler adjustments (and if so, what software do I need)?
2. Should I check normality for each item, or is checking the normality of the overall variable enough?
3. For divergent validity, I use two other variables alongside my main questionnaire. Do they also need to be normally distributed?
Thanks for your time,
Sara
I am examining results from an exploratory factor analysis (using Mplus) and it seems like the two-factor solution fits the data better than the one factor solution (per the RMSEA, chi-square LRT, CFI, TLI, and WRMR). Model fit for the one factor model was, in fact, poor (e.g., RMSEA = .10, CFI = .90). In the two factor model, the two latent factors were strongly correlated (.75) and model fit was satisfactory (e.g., RMSEA = .07, CFI = .94). The scree plot, a parallel analysis, and eigenvalue > 1, however, all seem to point to the one-factor model.
I am not sure whether I should retain the one- or two-factor model. I'm also not sure whether I should look at other parameters/model estimates to determine how many factors to retain. Theoretically, both models make sense. I intend to use these models to conduct an IRT analysis (a uni- or multidimensional graded response model, depending on the number of factors I retain).
Thank you in advance!
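As a cross-check on the retention decision, parallel analysis is easy to reproduce outside Mplus: retain factors whose observed eigenvalues exceed a chosen percentile of eigenvalues from random data of the same size. A compact numpy sketch on toy one-factor data (the 95th percentile and 100 simulations are conventional choices, not Mplus defaults):

```python
import numpy as np

def parallel_analysis(X, n_sims=100, pct=95, seed=0):
    """Count leading eigenvalues of the observed correlation matrix
    that exceed the given percentile of random-data eigenvalues."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    obs = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))[::-1]   # descending
    sims = np.empty((n_sims, p))
    for i in range(n_sims):
        R = np.corrcoef(rng.standard_normal((n, p)), rowvar=False)
        sims[i] = np.linalg.eigvalsh(R)[::-1]
    thresh = np.percentile(sims, pct, axis=0)
    keep = obs > thresh
    return p if keep.all() else int(np.argmax(~keep))   # stop at first failure

# Toy data: a single common factor behind 8 items
rng = np.random.default_rng(2)
factor = rng.standard_normal((300, 1))
X = 0.7 * factor @ np.ones((1, 8)) + rng.standard_normal((300, 8))
print(parallel_analysis(X))
```

Note that parallel analysis operates on Pearson correlations here; with ordinal items Mplus uses polychoric correlations, which can shift the result.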
Currently, I am performing a factor analysis on 6 items.
I read that residual plots can be used to assess the assumptions of normality, homoscedasticity, and linearity. However, I do not understand which residuals to use for this analysis. Do I need to examine 15 different plots, one for each pairwise combination of the 6 items?
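On the arithmetic: with 6 items there are C(6,2) = 15 unordered pairs, so 15 bivariate plots would cover every combination. A quick illustration of enumerating the pairs (item labels hypothetical):

```python
from itertools import combinations

items = [f"item{i}" for i in range(1, 7)]   # 6 hypothetical item labels
pairs = list(combinations(items, 2))        # every unordered pair of items

print(len(pairs))                           # 15
for a, b in pairs[:3]:
    print(f"residual plot: {a} vs {b}")     # one bivariate plot per pair
```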
Hello all,
This is my first time doing CFA in AMOS.
Initially, I developed a 17-item, 5-factor scale for a specific industry based on theory from other industries. This proposed scale was tested with two datasets: the first with n = 91 (year 1) and the second with n = 119 (year 2), both from a single institution. EFA identified 3 underlying factors in both datasets; no items were deleted.
During year 3, a nationwide sample of n = 690 was used to run CFA in SPSS AMOS. The output is as follows:
1. Based on EFA (3 factors, 17 items)
a) Chi-square = 1101.449, df = 116 [χ²/df = 9.495]
b) GFI = 0.805
c) NFI = 0.898
d) IFI = 0.908
e) TLI = 0.892
f) CFI = 0.908
g) RMSEA = 0.111 (PClose = 0.000)
h) Variance
Estimate S.E. C.R. P Label
F1 .573 .056 10.223 ***
F2 .668 .043 15.453 ***
F3 .627 .040 15.620 ***
i) Covariance
Estimate S.E. C.R. P Label
F1 <--> F2 .446 .036 12.502 ***
F1 <--> F3 .365 .032 11.428 ***
2) Based on theory (5 factors, 17 items)
a) Chi-square = 440.594, df = 109 [χ²/df = 4.042]
b) GFI = 0.926
c) NFI = 0.959
d) IFI = 0.969
e) TLI = 0.961
f) CFI = 0.969
g) RMSEA = 0.066 (PClose = 0.000)
h) Variance
Estimate S.E. C.R. P Label
F1 .677 .047 14.334 ***
F2 .670 .043 15.493 ***
F3 .648 .054 12.100 ***
F4 .741 .061 12.103 ***
F5 .627 .040 15.620 ***
i) Covariance
Estimate S.E. C.R. P Label
F1 <--> F2 .503 .036 14.057 ***
F1 <--> F3 .581 .041 14.262 ***
F1 <--> F4 .546 .041 13.388 ***
F1 <--> F5 .398 .032 12.321 ***
F2 <--> F3 .457 .036 12.848 ***
F2 <--> F4 .403 .035 11.405 ***
F2 <--> F5 .458 .033 13.899 ***
F3 <--> F4 .553 .042 13.036 ***
F3 <--> F5 .360 .032 11.275 ***
F4 <--> F5 .358 .033 10.754 ***
My questions:
1. Do I have to normalize the data before the CFA? (I am finding conflicting information, since my scale is a Likert scale and extreme values are not really outliers.)
2. Can I report that the theory-based model is a better fit than the EFA-based model? Would doing so be appropriate?
3. Is there anything else I need to do?
Any guidance will be greatly appreciated.
Thank you,
Sivarchana
What is the best method or criterion for choosing which item to keep when cross-loadings of items are evident in exploratory factor analysis? Thanks!
I am working on developing a new scale. On running the EFA, only one factor emerged clearly, while the other two factors were messy, with multiple items loading across different factors.
1. Is it possible to remove the cross-loading items one by one, re-running the analysis each time, to reach a better factor structure?
2. If multiple items still load on one factor, what criteria should I use to determine what this factor is?
Can the measured variables that remain ungrouped in exploratory factor analysis be included as separate variables during structural equation modeling (SEM) of the latent variables observed in the factor analysis? Please help me with some references.
I am developing a questionnaire and first performing an exploratory factor analysis. After I have the final factor structure, I plan on regressing the factor scores on some demographic covariates. Since I am anticipating missing item responses, I am thinking of imputing the item scores before combining them into factor scores (by average or sum).
I came across a paper that suggested using mice in Stata and specifying the factor scores as passive variables. I am wondering if this is the best approach, since I have read that passive variables may be problematic. Are there any alternative solutions? Thank you!
Here is a link to the paper; the Stata code is included in the Appendix.
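One alternative to passive variables is to impute at the item level first and only then combine the items into factor scores within each completed dataset ("impute, then transform"). A minimal sketch of that ordering, using simple mean imputation as a stand-in for a real multiple-imputation engine (toy data):

```python
import numpy as np

rng = np.random.default_rng(3)
items = rng.normal(size=(50, 5))              # 5 toy items per respondent
mask = rng.random(items.shape) < 0.1          # ~10% values missing at random
items[mask] = np.nan

# Step 1: impute at the item level (mean imputation stands in
# for mice/chained equations; in practice repeat per imputed dataset)
col_means = np.nanmean(items, axis=0)
completed = np.where(np.isnan(items), col_means, items)

# Step 2: only then combine items into a factor score (here: the row mean)
factor_score = completed.mean(axis=1)
print(factor_score.shape)
```

With proper multiple imputation you would compute the factor score in each of the m completed datasets, run the regression in each, and pool with Rubin's rules.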
When deleting items, where should we start: with items whose loadings fall below the threshold we have selected (0.32, 0.40, or 0.50)? With items that do not load on any factor, or with items that load on two, three, or more factors? Is there a rule regarding the order of these procedures? Which items should we exclude first?