# Correlation Analysis - Science topic

Questions related to Correlation Analysis
• asked a question related to Correlation Analysis
Question
I have a dataset of 500 participants and their total sleep time over a period of 10 days. Some participants do not have data for all the days; e.g., 10 of them have data for only 4 days instead of 10. Can I perform a correlation analysis between sleep time and temperature without excluding these participants? What other methods can I use for this analysis? Can I use linear regression to explain the effect of temperature on sleep?
Don't delete them! Excluding those participants would bias your sample.
You can use robust variance estimation to allow for clustering of observations within participants.
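To make the clustering point concrete, here is a minimal Python sketch (the variable names and simulated numbers are hypothetical) of a random-intercept mixed model, which uses every participant regardless of how many days they contributed:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated unbalanced panel: 50 participants, 4-10 days each
rng = np.random.default_rng(42)
rows = []
for pid in range(50):
    n_days = rng.integers(4, 11)          # some participants have fewer days
    intercept = rng.normal(0, 0.5)        # participant-level random effect
    for day in range(n_days):
        temp = rng.normal(22, 3)
        sleep = 7.5 - 0.05 * temp + intercept + rng.normal(0, 0.4)
        rows.append({"id": pid, "temp": temp, "sleep": sleep})
df = pd.DataFrame(rows)

# Random-intercept model: repeated days are clustered within participants,
# so unbalanced data (fewer days for some people) is handled without exclusion
model = smf.mixedlm("sleep ~ temp", df, groups=df["id"]).fit()
print(model.params["temp"])   # fixed-effect slope of temperature on sleep
```

The fixed-effect slope for `temp` answers the regression question directly, while the random intercept accounts for the repeated days within each participant.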
• asked a question related to Correlation Analysis
Question
I have to run a Pearson correlation analysis between HPA-axis markers (cortisol, ACTH) and the inflammatory marker IL-6 (analysed through ELISA) on the one hand, and depression and anxiety questionnaires recorded at baseline and post-intervention on the other. I also need to check the data for normality before running the analysis. I would appreciate an appropriate suggestion.
Thank you so much, Anna. When running a correlation analysis between hormones and post-intervention questionnaire data, do you correlate the hormones directly with the post scores, or first calculate the change score and then correlate it with the hormone concentrations?
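On the normality check, a small scipy sketch (simulated, hypothetical data) of the usual workflow: test each variable with Shapiro-Wilk and fall back to Spearman's rank correlation when normality is rejected:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
cortisol = rng.lognormal(mean=2.3, sigma=0.4, size=40)  # hormone data are often skewed
phq = rng.normal(12, 4, size=40)                        # hypothetical questionnaire score

# Shapiro-Wilk: p < .05 suggests a departure from normality
w, p = stats.shapiro(cortisol)
if p < 0.05:
    r, p_r = stats.spearmanr(cortisol, phq)   # rank-based fallback
else:
    r, p_r = stats.pearsonr(cortisol, phq)
print(w, p, r)
```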
• asked a question related to Correlation Analysis
Question
To calculate the sample size, I need to assume the smallest "r" with clinical relevance (significance level = 5% and power = 80%). My question is, what is that value? I searched for papers on how to interpret correlation strength properly, but found none. Could you please suggest any?
I am afraid that there won't be any proper interpretation of correlation strength, and there won't be any discussions of what the clinical relevance of a correlation would be.
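That said, once a smallest clinically relevant r is chosen, the required sample size at alpha = .05 and power = .80 can be approximated via the Fisher z transformation; a small Python sketch of the approximation (it does not tell you which r is clinically relevant):

```python
import math
from scipy.stats import norm

def n_for_correlation(r, alpha=0.05, power=0.80):
    """Approximate sample size to detect correlation r (two-sided test),
    via the Fisher z transformation."""
    z_a = norm.ppf(1 - alpha / 2)
    z_b = norm.ppf(power)
    return math.ceil(((z_a + z_b) / math.atanh(r)) ** 2 + 3)

# The smaller the clinically relevant r, the larger the required n
print(n_for_correlation(0.3))   # 85
print(n_for_correlation(0.1))   # 783
```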
• asked a question related to Correlation Analysis
Question
Why do some researchers use confirmatory factor analysis?
According to PLS-SEM (ADANCO), discriminant validity can be assessed with the Heterotrait-Monotrait Ratio of Correlations (HTMT) and the Fornell-Larcker criterion.
Construct validity refers to how well a test measures the concept for which it was designed. It is essential for establishing a method's overall validity.
Assessing construct validity is particularly important when studying intangible phenomena, such as intelligence, self-confidence, or happiness, which cannot be directly measured or observed. To evaluate these constructs, multiple observable or quantifiable indicators are required.
Construct validity is one of the four measurement validity types. The remaining three are:
Content Validity: Is the test completely representative of what it intends to measure?
Face validity: Does the test's content appear to correspond with its objectives?
Criterion validity: Do the results accurately measure the specific outcome that they are intended to measure?
Confirmatory factor analysis (CFA) is a common technique for assessing construct validity. Like EFA, CFA is a tool researchers can use to reduce a large number of observed variables to latent factors based on similarities in the data.
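As a concrete illustration of the HTMT idea mentioned above, here is a minimal numpy sketch (simulated data; a real analysis would use ADANCO, SmartPLS, or similar) that computes the ratio of between-construct to within-construct item correlations:

```python
import numpy as np

def htmt(items_a, items_b):
    """Heterotrait-Monotrait ratio for two constructs,
    each given as an (n_respondents, k_items) matrix."""
    def mean_offdiag(corr):
        k = corr.shape[0]
        return (corr.sum() - k) / (k * (k - 1))
    ra = mean_offdiag(np.corrcoef(items_a, rowvar=False))   # monotrait, construct A
    rb = mean_offdiag(np.corrcoef(items_b, rowvar=False))   # monotrait, construct B
    hetero = np.corrcoef(items_a.T, items_b.T)[:items_a.shape[1], items_a.shape[1]:]
    return hetero.mean() / np.sqrt(ra * rb)

rng = np.random.default_rng(1)
f1, f2 = rng.normal(size=(2, 200))                  # two independent latent factors
a = f1[:, None] + rng.normal(0, 0.5, (200, 3))      # 3 items loading on factor 1
b = f2[:, None] + rng.normal(0, 0.5, (200, 3))      # 3 items loading on factor 2
print(htmt(a, b))   # well below the common 0.85 cutoff: constructs are distinct
```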
• asked a question related to Correlation Analysis
Question
I am analysing the associations between social sustainability practices (62 observed indicators), grouped under 8 social categories, and sustainability enablers (10 observed indicators).
It is theoretically hypothesised that there is a positive correlation between the social sustainability practices and the enablers.
I am interested in the observed indicators and in drawing conclusions based on them; I do not want to draw conclusions based on constructs.
Can I use canonical correlation analysis (CCA) to analyse the strength of the associations between the 62 social indicators (dependent set) and the set of 10 enablers (independent set)?
Or is it better to use structural equation modelling (SEM) and analyse path coefficients from the independent variables (enablers) to the 8 social categories, each forming a latent dependent variable?
As for the interpretation of the results, you must look at the principal component scores (SOC and ENAB) most heavily loaded on the canonical variates. The original variables' loadings on their components will allow you to interpret the meaning of the principal components, and then of the canonical solution. For interpretation of components by their loadings (the correlation coefficients between variables and their components), see the attached file (chapter: "Catching the sense of experimental results: a case study on animal behavior").
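If you go the CCA route, the canonical correlations themselves are easy to compute; a minimal numpy sketch (QR + SVD implementation, with simulated stand-ins for the two indicator sets):

```python
import numpy as np

def canonical_correlations(X, Y):
    """Canonical correlations between X (n x p) and Y (n x q) via QR + SVD."""
    Xc = X - X.mean(0)
    Yc = Y - Y.mean(0)
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.clip(s, 0, 1)

rng = np.random.default_rng(7)
latent = rng.normal(size=(300, 1))            # one shared dimension
X = latent + rng.normal(0, 1, (300, 5))       # e.g. 5 practice indicators
Y = latent + rng.normal(0, 1, (300, 3))       # e.g. 3 enabler indicators
rho = canonical_correlations(X, Y)
print(rho)   # first canonical correlation clearly dominates
```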
• asked a question related to Correlation Analysis
Question
I have come across some theses which perform ANOVA, independent sample t-test, or multi-way ANOVA after performing factor analysis.
Most of the theses use Likert scales (5- to 7-point). I am wondering about the basis for analysing the data using the tests identified above, which I understand should be performed on data that follow a normal distribution; with Likert-scale data, the analysis arguably should not use the mean. What would justify the use of one-way ANOVA instead of the Kruskal-Wallis H test, or tests such as the Mann-Whitney test?
Also, before performing data analysis after factor analysis, how are the observed variables computed into new factors (sum, average, or another method)?
There are factor-analytic methods for ordinal data (i.e., binary or ordered categorical variables such as many Likert items; see references below). These techniques allow you to analyze/infer continuous latent variables ("factors") based on ordinal items. You could then extract continuous factor scores for further analyses or add external variables directly to the factor/structural equation model so that the factors can be used as outcome variables.
Under certain conditions, the simple sum or unweighted mean of the items can be used as a "proxy" for a continuous latent factor score, namely when all items that measure a given factor are unidimensional and have equal loadings. Other techniques that allow you to infer a continuous scale from binary or ordinal items are provided by item response theory (IRT), for example, the Rasch (1-parameter logistic) model. Many IRT models assume that a continuous latent "trait" factor underlies a set of binary/ordinal items.
Finney, S. J., & DiStefano, C. (2006). Non-normal and categorical data in structural equation modeling. In G. R. Hancock & R. O. Mueller (Eds.). Structural equation modeling: a second course. Greenwich, CT: Information Age Publishing.
Flora, D. B., & Curran, P. J. (2004). An Empirical Evaluation of Alternative Methods of Estimation for Confirmatory Factor Analysis With Ordinal Data. Psychological Methods, 9(4), 466–491. https://doi.org/10.1037/1082-989X.9.4.466
Jöreskog, K. and Moustaki, I. (2001). Factor analysis for ordinal variables: A comparison of three approaches. Multivariate Behavioral Research, 36, 347-387.
Takane, Y. & De Leeuw, J. (1987). On the relationship between item response theory and factor analysis of discretized variables. Psychometrika, 52, 393-408.
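A quick simulation of the sum-score point above (hypothetical, roughly equal-loading Likert items): when items are unidimensional with comparable loadings, the unweighted mean tracks the latent factor closely:

```python
import numpy as np

rng = np.random.default_rng(3)
trait = rng.normal(size=500)
# 5 ordinal Likert items generated from one underlying factor with equal loadings
items = np.clip(np.round(3 + trait[:, None] + rng.normal(0, 0.8, (500, 5))), 1, 5)

sum_score = items.mean(axis=1)                     # unweighted mean as factor-score proxy
proxy_r = np.corrcoef(sum_score, trait)[0, 1]
print(proxy_r)   # high when the equal-loading assumption roughly holds
```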
• asked a question related to Correlation Analysis
Question
Hi everyone,
I tried to run a point-biserial correlation with one continuous variable and several dummy-coded nominal variables; however, my continuous (dependent) variable violated the normality assumption.
Are there any alternatives for assessing the correlation between one continuous dependent variable and several dummy coded nominal variables?
Thank you!
If I understand correctly, you have several binary variables and one numeric variable, and you want to see whether (and by how much, measured perhaps by Cohen's d or a correlation measure) the values of the numeric variable differ between the two levels of each binary variable, BUT you do not feel a t-test is appropriate. Is this correct? If differences in means are all you want, then bootstrapping could be used. Otherwise, there are lots of transformations (including rank-based tests) that can be applied. But I am not sure I understand your problem.
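A small scipy sketch of both suggestions (simulated, skewed data; the variable names are hypothetical): a rank-based test plus a bootstrap confidence interval for the point-biserial correlation:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
group = rng.integers(0, 2, 120)               # one dummy-coded binary variable
y = rng.exponential(2, 120) + 0.8 * group     # skewed (non-normal) outcome

# Option 1: rank-based test, no normality assumption
u, p = stats.mannwhitneyu(y[group == 0], y[group == 1])

# Option 2: bootstrap CI for the point-biserial correlation
boots = []
for _ in range(2000):
    idx = rng.integers(0, len(y), len(y))
    boots.append(stats.pearsonr(group[idx], y[idx])[0])
lo, hi = np.percentile(boots, [2.5, 97.5])
print(p, lo, hi)
```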
• asked a question related to Correlation Analysis
Question
In my research, I used 7-point Likert scale for measuring situation and agreement. They are used and coded respectively as below:
Situation:
Far less = 1
Moderately less = 2
Slightly less = 3
Almost the same = 4
Slightly more = 5
Moderately more = 6
Far more = 7
Agreement
Strongly disagree = 1
Disagree = 2
Slightly disagree = 3
Neutral = 4
Slightly agree = 5
Agree = 6
Strongly agree = 7
My problem is that the questionnaires were distributed to two opposite groups (i.e., claimant and defendant).
When measuring the situation, I asked the respondents for the comparative degree that most accurately describes their position.
For example, I asked the claimants whether they had less / almost the same / more resources than the defendant, and vice versa. If the claimant chooses "less resources", that is equivalent to the defendant choosing "more resources". Therefore, I tried to code the data as follows:
Far less/ more = extremely asymmetric= 1
Moderately less/ more = moderately asymmetric = 3
Slightly less/ more = slightly asymmetric = 5
Almost the same = symmetrical= 7
May I ask if it is appropriate for me to convert and interpret the data like the above? Are there other ways that can help me to better analyse the data?
Thanks
Nan Cao Non-parametric tests for independence, such as Spearman's correlation or the chi-square test, should be used for ordinal data (individual Likert-scale items). Use parametric tests such as Pearson's r or t-tests for interval data (overall Likert scale scores).
A Likert scale is made up of four or more Likert-type items that reflect related questions and are combined into a single composite score/variable. Likert scale data may then be treated as interval data, in which case the mean is an appropriate indicator of central tendency; describe the scale using means and standard deviations.
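For the recoding itself, a tiny pandas sketch of the proposed folding (mapping "far less" and "far more" onto the same asymmetry score), so claimant and defendant responses land on one common scale:

```python
import pandas as pd

# Hypothetical raw responses on the 7-point comparative item
raw = pd.Series([1, 2, 3, 4, 5, 6, 7], name="resource")

# Fold around the midpoint: a claimant answering "far less" and a defendant
# answering "far more" both describe the same (extremely asymmetric) situation
fold = {1: 1, 7: 1, 2: 3, 6: 3, 3: 5, 5: 5, 4: 7}
asymmetry = raw.map(fold)
print(asymmetry.tolist())   # [1, 3, 5, 7, 5, 3, 1]
```

Note the folded variable is still ordinal, and the 1/3/5/7 spacing is a labelling choice rather than a measured interval.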
• asked a question related to Correlation Analysis
Question
Hey, everyone,
A is positively correlated with B;
A is also positively correlated with C;
BUT, B is negatively correlated with C.
How to explain this result?
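This pattern is possible because correlation is not transitive. A short numpy simulation (arbitrary numbers) shows A positively correlated with both B and C even though B and C are negatively correlated, e.g. when A is the sum of two negatively related parts:

```python
import numpy as np

rng = np.random.default_rng(5)
B = rng.normal(size=5000)
C = -0.5 * B + np.sqrt(1 - 0.25) * rng.normal(size=5000)   # corr(B, C) ~ -0.5
A = B + C                                                  # A shares variance with both

cm = np.corrcoef([A, B, C])
print(cm[0, 1], cm[0, 2], cm[1, 2])   # positive, positive, negative
```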
• asked a question related to Correlation Analysis
Question
From past correlation analyses, Vsw is estimated to correlate quite well with IMF-Bz during geomagnetic storm events; however, for some intense events a much weaker correlation between Vsw and IMF-Bz was observed. What could be the possible factors behind this?
There are two different physical mechanisms of generation of the magnetospheric convection — quasi-viscous interaction between the solar wind and the magnetosphere in the low-latitude boundary layer [Axford, Hines, 1961] and IMF reconnection with the geomagnetic field at the dayside magnetopause with subsequent formation of open magnetotail lobes [Dungey, 1961].
IMF-Bz is just one of them.
• asked a question related to Correlation Analysis
Question
In the file attached below, there is a line above the theta(1) coefficient and another one exactly below C(9). In addition, what is the number below C(9)? There is no description.
I asked the person who coded that package, and he said the C(9) coefficient does not have any meaning here, so just ignore it. It comes up because the package was written for an old version of EViews and has not been updated.
• asked a question related to Correlation Analysis
Question
Hi!
I have two variables, A and B. The Pearson correlation coefficients table showed no significant correlation between A and B. Is it then redundant to do a further regression study of A and B?
If there is no significant correlation, then the proportion of the variation in the dependent variable that is explained by a linear regression will also be small. So do not do a linear regression.
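The link is exact for simple linear regression: R-squared is just the squared Pearson correlation, as a quick scipy check illustrates (simulated data):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
a = rng.normal(size=100)
b = 0.3 * a + rng.normal(size=100)

r = stats.pearsonr(a, b)[0]
fit = stats.linregress(a, b)
# In simple (one-predictor) regression, R-squared equals r squared
print(r**2, fit.rvalue**2)   # equal up to floating point
```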
• asked a question related to Correlation Analysis
Question
I have a dataset composed of aphid and parasitoid abundances captured in Moericke traps on a monthly scale for 10 years. As I do not have data on parasitism, but on the occurrence of aphids and parasitoids, I cannot use common trophic networks. In this way, I think I could explore some community-level relationships through correlation-based networks. However, I would like to know if there is any impediment to using this approach or if anyone has already used it.
Grateful!
It is mainly studies in microbiology that have applied these methods, and I know of no study that has done this on host-parasitoid or prey-predator networks. I think the general principle is the same though.
You will probably have to think about applying a time-lag in your models, to take into account the development time of parasitoids in aphids. I enclose 4 publications that have studied these questions, with models that are relatively easy to implement in R.
Of course the main issue would be that you cannot directly link parasitoid abundances to the biological control service provided (correlation is not causation).
We can continue to discuss by mail if you want, and see what we can do together on this subject if you are interested.
Kevin
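On the time-lag point, a minimal pandas sketch (simulated monthly counts, hypothetical parameters) of screening lagged correlations before building the correlation network:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(9)
months = 120   # 10 years of monthly trap counts
aphid = pd.Series(rng.poisson(20, months).astype(float))
# Simulated parasitoids track aphids with a one-month development lag
parasitoid = 0.5 * aphid.shift(1) + rng.poisson(3, months)

# Compare correlations at several lags to pick the biologically plausible one
for lag in range(0, 4):
    r = aphid.corr(parasitoid.shift(-lag))
    print(lag, round(r, 2))
```

In this toy example the lag-1 correlation stands out, which is the kind of signal a lagged correlation network would build on.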
• asked a question related to Correlation Analysis
Question
Climate change mitigation
On the other hand, I am of the opinion that the issue of Land use – Land use changes may be applied in the research that I am dealing with.
Namely, I am kindly inviting you to have a look at the application of my IntErO model for the calculation of soil erosion intensity and runoff (link): www.geasci.org/IntErO.
- Simple installation;
- Examples of applications available on the same web page (link): www.geasci.org/IntErO;
- Examples of applications available on my profile on Research Gate;
- Examples of applications available on web page (link): www.geasci.org/Publications;
The impact of land-use changes on soil erosion intensity and runoff is our link for potential cooperation.
Sincerely,
Velibor Spalevic
Mob/Viber/WhatsApp: +382 67 201 222
• asked a question related to Correlation Analysis
Question
I am following the way a previous paper (PMID: 30948552) treated their spatial transcriptomic (ST) data. It seems they combined the expression matrices (not stating whether normalized or log-transformed) of different conditions, calculated a gene-gene similarity matrix (Pearson rather than Spearman), and finally obtained some gene modules (clustered by L1 norm and average linkage) with different expression between conditions.
So I have several combinations of methods with which to imitate their workflow.
For the expression matrix, I have two choices: the first is a merged count matrix from the different conditions; the second is a normalized data matrix (by default the NormalizeData function in Seurat, i.e. log((count / total count of spot) * 10000 + 1)). For the correlation, I have used Spearman or Pearson to calculate a correlation matrix.
But I got stuck.
When I use the count matrix, no matter which correlation method, I get a heatmap with a mostly positive pattern, which looks strange. And for the normalized data matrix (only Pearson calculated), I get a heatmap with a sparse pattern, which is indescribably strange too.
My questions:
1. Which combinations of data and method should I use?
2. Would this workflow weaken the correlation of the genes since some may have correlations only in specific condition?
3. What do you think of my workflow?
Install R and RStudio, then visit this website:
Heatmap in R: Static and Interactive Visualization - Datanovia
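Beyond plotting, the module-detection step itself (Pearson similarity on normalized expression, then L1/average-linkage clustering, as in the paper you cite) can be sketched in a few lines of scipy; the data here are simulated and the parameters illustrative:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

rng = np.random.default_rng(4)
spots = 200
m1, m2 = rng.normal(size=(2, spots))                     # two hypothetical spatial programs
expr = np.vstack([m1 + rng.normal(0, 0.6, (5, spots)),   # genes driven by program 1
                  m2 + rng.normal(0, 0.6, (5, spots))])  # genes driven by program 2

# Gene-gene Pearson correlation on the (normalized) expression matrix,
# then average linkage on L1 (cityblock) distances between correlation profiles
corr = np.corrcoef(expr)
Z = linkage(pdist(corr, metric="cityblock"), method="average")
modules = fcluster(Z, t=2, criterion="maxclust")
print(modules)   # genes 1-5 and genes 6-10 fall into separate modules
```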
• asked a question related to Correlation Analysis
Question
Is it possible to run a correlation test on a continuous DV and a categorical IV with 3 levels?
I'm investigating whether gender is associated with academic procrastination; however, my gender variable is coded as 0 = Male, 1 = Female, 2 = Non-Binary.
Initially I ran a Pearson product-moment correlation; however, I have now realised that this may not be the right procedure.
Any help would be greatly appreciated!
Run a one-way ANOVA with Tukey's HSD post-hoc test.
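A minimal scipy sketch of that suggestion (simulated scores; `tukey_hsd` requires scipy 1.8+):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
male = rng.normal(50, 10, 40)          # simulated procrastination scores per group
female = rng.normal(55, 10, 40)
nonbinary = rng.normal(52, 10, 40)

# Omnibus one-way ANOVA across the three gender groups
f, p = stats.f_oneway(male, female, nonbinary)
print(f, p)

# Tukey's HSD for pairwise comparisons
res = stats.tukey_hsd(male, female, nonbinary)
print(res.pvalue)   # 3x3 matrix of pairwise p-values
```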
• asked a question related to Correlation Analysis
Question
I have categorical variables, and I want to test the relationships between the categorical variables of two sets. I have SPSS version 22, and I can't find how to run a nonlinear canonical correlation analysis.
• asked a question related to Correlation Analysis
Question
Hello, I have a question regarding an investigation by Reuter et al. (2019), from which I have managed to extract the questions (5-point Likert scale) detailed below.
Attitude
- Fake news poses a threat
- Fake news can manipulate the population's opinion
- Fake news can manipulate the opinion of politicians, journalists and other influential players
- Fake news harms the democracy
- It's the state's task to prevent fake news
- Social bots pose a threat
- The state censorship poses a threat
- Fake news is just a pretext to be able to fight system critical actors
- Fake news is at most annoying, but does not pose a threat
Interaction
- I have perceived fake news
- I have deleted/reported fake news
- I have disliked fake news
- I have commented on fake news
- I have liked/disliked fake news
- I have shared fake news
- I have created fake news
Since this research has not named its questionnaire, and it does not yet have defined dimensions, can I take it as a reference in my research?
Of course, the issue later will be validation and reliability. Can you please guide me as to whether there is already a defined questionnaire for these variables (attitude, interaction)?
It sounds like they have published their items, so you can use them in your research. But be sure to reverse some of the items in the first scale. As for the second scale, it seems to be a real mix of content, from disliking fake news to creating it.
What do you mean by a "dimension less" Likert scale?
• asked a question related to Correlation Analysis
Question
Excuse me if this is a question with an obvious answer (I am an MSc student, not a professional researcher). I am exploring the correlations between a number of variables (different social skills with anxiety, age, and cognitive factors). I also wanted to see if there are gender differences in these variables/correlations.
Comparison of means between genders shows no significant difference between males and females for any variable.
If I enter gender as a variable and explore correlations with other variables there are also no significant correlations.
However, when I use the split-group function (by gender) in SPSS and run my correlation analysis for all other variables (not including gender), some correlations are significant only for male participants and some only for female. I just want to check that this makes sense (I think it does). I imagine that using the split-group function works a bit like using gender as a moderator, in that I can see how gender affects the relationship between certain variables (e.g., x correlates with y in group a but not in group b). So I believe I have found that, although there are no differences in mean scores between genders, gender does have an interaction effect on the correlation between some variables.
Any tips on how to report on this? and does this seem correct? would it be better to actually just run a moderation analysis?
Do regression, not correlation. Test the coefficient for gender differences. Choose a reasonable dependent variable.
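Concretely, the split-group finding can be tested formally with an interaction term, which is the regression version of a moderation analysis; a statsmodels sketch (simulated data in which the x-y relationship exists in only one group):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(8)
n = 300
gender = rng.integers(0, 2, n)        # 0/1 coding, illustrative two-group case
x = rng.normal(size=n)
y = 0.5 * x * gender + rng.normal(size=n)   # slope exists only when gender == 1
df = pd.DataFrame({"x": x, "y": y, "gender": gender})

# The x:gender interaction tests whether the slope differs by gender,
# i.e. the formal version of "significant in one split group but not the other"
fit = smf.ols("y ~ x * gender", df).fit()
print(fit.params["x:gender"], fit.pvalues["x:gender"])
```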
• asked a question related to Correlation Analysis
Question
When one conducts a canonical correlation analysis, several functions are extracted (as many as the number of variables in the smaller data set). The first function has a high correlation and contributes most of the variation, while the others contribute very little. Can the researcher select the functions that together account for 90 percent of the variance?
You can find many statistical applications in SPSS.
• asked a question related to Correlation Analysis
Question
Hi everyone,
I am using two scales. For the BFI-2-S, previous research calculated reliability as the average Cronbach's alpha of the subscales (e.g., (Extraversion + Agreeableness + ...)/5).
For another scale, previous research calculated Cronbach's alpha across all items only once, and reported the correlations between all subscales.
1. What is the difference between the two approaches?
2. How much correlation is appropriate? I didn't find relevant literature.
I would have reported each scale's Cronbach's alpha, which corresponds to approach (1), and hoped they are all reasonably high (e.g., above .80).
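For reference, Cronbach's alpha for a single subscale is straightforward to compute directly; a numpy sketch with simulated items (the loadings and sample size are arbitrary):

```python
import numpy as np

def cronbach_alpha(items):
    """items: (n_respondents, k_items) array."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars / total_var)

rng = np.random.default_rng(10)
trait = rng.normal(size=300)
# 6 simulated items measuring one coherent subscale
extraversion_items = trait[:, None] + rng.normal(0, 0.7, (300, 6))
alpha = cronbach_alpha(extraversion_items)
print(alpha)   # reasonably high for a coherent subscale
```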
• asked a question related to Correlation Analysis
Question
I would like to study correlation between four transcripts (fold changes of mRNA expression) at different time intervals (5 time points). How can I perform this analysis?
Try a correlation matrix in R, e.g., using the corrplot package.
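An equivalent in Python, if that is closer to your workflow (the fold-change numbers are made up; with only five time points, rank-based correlation and cautious interpretation of p-values are advisable):

```python
import pandas as pd

# Hypothetical fold-change values for four transcripts at five time points
fc = pd.DataFrame({
    "geneA": [1.0, 1.8, 2.6, 3.1, 3.9],
    "geneB": [1.1, 1.9, 2.4, 3.3, 4.1],
    "geneC": [4.0, 3.2, 2.5, 1.7, 1.1],
    "geneD": [1.5, 3.0, 1.2, 2.8, 1.4],
}, index=["0h", "6h", "12h", "24h", "48h"])

# Spearman is robust to the monotone-but-nonlinear shapes typical of time courses
print(fc.corr(method="spearman"))
```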
• asked a question related to Correlation Analysis
Question
We performed a correlation analysis of solar wind speed (Vsw) with the z-component of the interplanetary magnetic field (Bz) during geomagnetic storm events of different strengths. We found varying results: a somewhat larger correlation coefficient for some events, but only a moderate association for others. What sort of results can we expect?
Are you suggesting that there might be a tendency for Bz to have one sign or the other (+/-) in correlation with the velocity of the solar wind? I can imagine that there might be something related to the solar cycle, where by the polarity of the sun's magnetic field changes, but I'd think that would be very weakly manifest (say) in the IMF by the time it reaches Earth, what with the IMF being distorted by solar wind and CMEs, etc. I'd be very surprised if you find a strong correlation. And, then, if you did, what would you do with it? Just asking.
• asked a question related to Correlation Analysis
Question
If, for example, I have a large enough dataset with multiple input variables and one target, but it is unknown whether the input variable(s) are correlated with the target, is there any way to quantitatively determine, from the data points alone, whether the target depends on the input variable(s)?
For example, I have three independent variables x,y, and z. And dependent variable (target) is r. Here for the purpose of demonstration, the (x,y) is known to be (m*cos(m)/c,m*sin(m)/c). This is a function of a spiral in a 2D space, where the m is an array of points and c is a constant. (Figure is attached) The target variable r is the distance of the (x,y) points from the origin (0,0) in the 2D cartesian space.
The independent variable z is said to have uniform random values and has no relation with the target variable r.
The values of the Pearson's r for an independent variable and the target are found to be
r_x,r = 0.03250883308649153
r_y,r = -0.10980064148604964
r_z,r = -0.17896621141606622
Now, to be specific, my question becomes: is there any quantitative way to show that the x and y variables together have a relationship with the target variable, while the variable z has no relationship with the target r?
I am not sure I follow; do you mean a regression equation, y = b0 + b1*x + e?
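One model-free option is mutual information, which, unlike Pearson's r, picks up nonlinear dependence; a scikit-learn sketch on a spiral like the one described (a simulated stand-in for the attached figure, with the scaling constant omitted for simplicity):

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(13)
m = rng.uniform(0, 6 * np.pi, 1000)
x, y = m * np.cos(m), m * np.sin(m)    # spiral coordinates
z = rng.uniform(-1, 1, 1000)           # unrelated noise variable
r = np.hypot(x, y)                     # distance from the origin (the target)

# Mutual information is zero only under independence, so it detects the
# nonlinear x-r and y-r dependence that Pearson's r misses
X = np.column_stack([x, y, z])
mi = mutual_info_regression(X, r, random_state=0)
print(mi)   # x and y carry information about r; z carries almost none
```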
• asked a question related to Correlation Analysis
Question
I have to find the correlation and causality between participants' smartphone use patterns and loneliness. The dependent variable is loneliness (baseline: 0 to 80; daily: a 0 to 10 numeric scale). The independent variables are daily smartphone screen time; time spent in several app categories such as social media, entertainment, and communication; physical activity; sleep; and some social variables and moods, such as how bored, anxious, satisfied, or productive the person felt. I will collect these independent variables for 10 days, with the psychosocial variables collected three times a day. I also have to use a couple of prediction algorithms to predict loneliness from the above-mentioned independent variables. But first, I have to establish the correlation and causality between the dependent and independent variables. So far I only have programming experience with Python, and no statistics background. I have about 9 months remaining to complete my thesis. Please share your valuable guidance and resources on how I can proceed with this challenge and get the most meaningful results.
Abiodun Christian Ibiloye Thank you so much sir for your enormous suggestions.
• asked a question related to Correlation Analysis
Question
The aim of my research is to analyse the correlation between two delta values (change between two timepoints) via regression analysis.
Let the variables be X, Y, and Z, and t0 represent pre-intervention and t1 represent post-intervention. X is a psychometric value (Visual Analogue Scale ranging from 0 to 100), Y and Z are biological values.
For example, I want to calculate the correlation between delta (Xt1 - Xt0), delta (Yt1 - Yt0), and delta (Zt1 - Zt0).
I am aware that delta values are statistically inefficient; therefore, Pearson's or Spearman's correlation is out. I would appreciate any advice or model examples. Thanks!
One main issue with correlations of observed variable difference (change) scores is that these correlations may be strongly attenuated due to measurement error (unreliability). In terms of classical test theory, observed variable difference (change) scores often have low reliabilities because measurement error from both pretest and posttest affect the error variance component of the difference (change) score. To avoid the problem of low change score reliability, latent difference (change) scores can be used that are based on differences between true scores rather than observed scores. (True scores are by definition free of measurement error.) The correlations between "true difference scores" can be estimated using methods of structural equation modeling/longitudinal confirmatory factor analysis.
To mathematically identify latent difference score variables, you either need multiple (at least 2) observed variables ("indicators", measures) for each construct X, Y, and Z at each of the two time points or find a way to otherwise identify the error variance component of each observed variable X, Y, Z through appropriate constraints.
If you have appropriate estimates of the reliabilities of X, Y, and Z, you could derive (compute) the error variance components and specify them accordingly in a latent difference (change) score model with fixed error variances. Other options may be available for your design. You could check out the extensive literature on latent change score modeling to explore this option further and see if it works for your design, e.g.:
Another option that may be applicable in your case is a relatively simple computational correction for attenuation, see, e.g.:
Again, this also requires that the reliability of the change scores be known.
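For the attenuation-correction option, the computation itself is one line; a small sketch with made-up reliability values:

```python
def disattenuated_r(r_obs, rel_x, rel_y):
    """Spearman's correction for attenuation:
    r_true is approximately r_observed / sqrt(rel_x * rel_y)."""
    return r_obs / (rel_x * rel_y) ** 0.5

# e.g. an observed change-score correlation of .30, with change-score
# reliabilities of .60 and .70 (hypothetical values)
print(round(disattenuated_r(0.30, 0.60, 0.70), 3))   # 0.463
```

The corrected value is only as trustworthy as the reliability estimates that go into it, which is exactly the caveat above.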
• asked a question related to Correlation Analysis
Question
Dear colleagues.
Hello, I am currently studying brain connectivity in the disease group.
Recently, I constructed a connectivity matrix from each participant's neuroimaging data (diffusion tensor imaging), then ran an edge-wise correlational analysis against a neuropsychological score using a tool similar to Network-Based Statistics (NBS; Zalesky et al., 2010).
As a result, I got an edge-level network consisting of 10 edges with 19 nodes.
Those significantly denoted edges are not connected to each other but identified as single edges.
Conventionally, I have used graph-theoretical measures such as degree or betweenness centrality for defining hub nodes (i.e., hub region = betweenness centrality above mean + 1 SD within the nodes of the network).
However, in this case, I have an edge-level network that is hard to call clustered or connected, since it consists of multiple single edges.
From here, I want to emphasize the more significant edges or nodes within the identified network as a hub region (it is hard to call it a hub, but at least for easy comprehension), but I am struggling with what approach to take.
All discussion and suggestions are welcomed here.
Or if I am misunderstanding any, please give me feedback or comments.
Jean
For example, you could invent a measure based on clustering coefficients.
You could consider all vertices that lie within a certain maximum shortest-path distance, and count all the edges among them.
It can also be interesting to determine the number of cliques a vertex takes part in, or the distribution of the number of vertices in those cliques, or the overlap percentages between any pair of cliques a vertex is in.
Regards,
Joachim
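For the simpler degree-based variant of the hub definition, a plain-Python sketch on a made-up edge list shaped like the result described (10 edges, mostly isolated pairs plus one node collecting several edges):

```python
from collections import Counter
from statistics import mean, stdev

# Hypothetical NBS-style result: 10 significant edges, mostly isolated pairs
edges = [(1, 2), (1, 3), (1, 4), (1, 5), (2, 6), (7, 8),
         (9, 10), (11, 12), (13, 14), (15, 16)]

degree = Counter()
for u, v in edges:
    degree[u] += 1
    degree[v] += 1

# Within the identified subnetwork, flag nodes whose degree exceeds mean + 1 SD,
# mirroring the "centrality above mean + 1 SD" hub convention at edge level
values = list(degree.values())
cutoff = mean(values) + stdev(values)
hubs = [node for node, d in degree.items() if d > cutoff]
print(hubs)   # node 1 collects several otherwise-isolated edges
```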
• asked a question related to Correlation Analysis
Question
I conducted an ADF test and found that one of my four variables was stationary in level form. That variable was also stationary in first-difference form. Do I need to bring all the variables to the same form, I(1), before running Pearson's product-moment correlation analysis?
Hi Etuk,
Thanks for your response, but the issue is that if you do not bring all the variables to the same form, you may fall foul of one of the requirements of Pearson's correlation, namely an equal number of observations. Differencing, as a data transformation technique, loses one observation per difference; therefore, with a mix of level and I(1) variables, you will not have equal observations, as each I(1) variable will have lost one observation.
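The bookkeeping is easy to handle in pandas: difference the I(1) variables, then drop the resulting missing row so all series align (toy numbers):

```python
import pandas as pd

s = pd.Series([10.0, 12.0, 11.0, 15.0, 14.0], name="level")
d = s.diff()          # first difference: loses the first observation (NaN)

# When pairing a level variable with an I(1) variable's difference,
# drop the NaN row so both series have equal length before correlating
df = pd.DataFrame({"level": s, "diff": d}).dropna()
print(len(s), len(df))   # 5 observations shrink to 4 aligned pairs
```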
• asked a question related to Correlation Analysis
Question
I want to run a correlation matrix analysis between multiple response variables. The data were obtained from the operation of a continuous bioreactor over a 330-day period, but the operating conditions were changed every three months. I am not sure whether I have to run a correlation test for each operational strategy (i.e., the condition that was changed every three months; in this case, the concentration of a contaminant) or whether I can run a single test over the whole operation of the reactor.
The results make more sense if I run a single test, but I am not sure I can do that given that the conditions were changed. I do know that, for environmental analyses in general, it is quite common to run correlation analyses even when other factors change over time.
You can answer all of these questions at once by wisely applying regression analysis. Of course, this will involve dummy variables, ANOVA, ANCOVA, etc.
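A sketch of that suggestion with statsmodels (simulated reactor data, hypothetical numbers): one regression over the whole run, with dummy variables absorbing the three-monthly condition changes:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(12)
days = np.arange(330)
period = (days // 90).astype(str)                     # condition changed every ~3 months
setpoint = np.array([5.0, 10.0, 20.0, 40.0])[days // 90]
contaminant = setpoint + rng.normal(0, 1, 330)        # daily variation around each setpoint
removal = 90 - 0.3 * contaminant + rng.normal(0, 2, 330)
df = pd.DataFrame({"removal": removal, "contaminant": contaminant, "period": period})

# One model over the whole 330-day run; C(period) dummies absorb the regime
# shifts instead of splitting the data into four separate correlation tests
fit = smf.ols("removal ~ contaminant + C(period)", df).fit()
print(fit.params["contaminant"])
```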
• asked a question related to Correlation Analysis
Question
Why is the importance of analyzing and conducting research on the correlation between the stock market valuation of securities (stocks, bonds, etc.) and the economic and financial situation of business entities growing?
In recent years, the importance of examining the correlation between the stock market valuation of securities (stocks, bonds, etc.) and the economic and financial situation of business entities has been growing, because there are more and more anomalies and speculation on capital markets, which may weaken these correlations. Studying this issue matters particularly when, in certain listed markets, valuations of financial assets and instruments become less and less related to economic fundamentals and the significance of speculation grows. If such a lack of correlation between stock market valuations and the economic and financial situation of business entities keeps growing, a financial and/or economic crisis may occur. Such an increase in speculative factors on stock exchanges and commodity markets appeared in 2006-2008 and contributed to the global financial crisis of 2008.
Do you agree with me on the above matter?
In the context of the above issues, I am asking you the following question:
Why is the importance of analyzing and conducting research on the correlation between the stock market valuation of securities (stocks, bonds, etc.) and the economic and financial situation of business entities growing?
I invite you to the discussion
Thank you very much
Best wishes
Dear Nazir Ali,
Yes, you asked a key question in the context of the topic of this discussion. Typically, it is in the event of an exceptionally large overvaluation or undervaluation of market valuations of listed securities that these kinds of questions are asked.
Thank you very much,
Best wishes,
Dariusz Prokopowicz
• asked a question related to Correlation Analysis
Question
Hello,
I am doing my MSc dissertation research on mental health. The aim of my project is to study the association between children's mental health and their parents' mental health, before and during Covid-19.
I am using one of the UK Data Services datasets. The datasets are 4 waves (one before and three during Covid-19), each wave has three datasets (Adult dataset, children dataset and youth dataset).
The mental health of children has been assessed using the Strengths and Difficulties Questionnaire (SDQ) and the parents' mental health has been evaluated using the General Health Questionnaire (GHQ). Both the SDQ and the GHQ give me scores that indicate mental health.
When I merge the three datasets of each wave alone, I end up with SDQ just for children and GHQ just for adults and missing in both. Therefore, I couldn't run the analysis.
My questions are:
How do I properly merge these datasets from all four waves into one dataset?
What is the suitable statistical test (Pearson's correlation for each wave alone, or one-way ANOVA)?
Thank you so much
It sounds like your data is in a long format. You may want to merge your data in a wide format.
The long-format data look like the following.
id child sdq ghq
1 1 x
2 1 x
3 1 x
1 0 x
2 0 x
3 0 x
For the rows with the indicator child = 1, the parents' data (ghq) is missing. For the rows with the indicator child = 0, the child's data (sdq) is missing. Note that each id appears twice: once for a parent's data and once for a child's data.
The wide-format data should look like this.
id sdq-c ghq-p
1 x x
2 x x
3 x x
If that is the case, you will need to restructure your data.
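If the data are in pandas, the restructuring described above can be sketched like this (the column and id names are made up; `max` is simply a way to pick the single non-missing value per group):

```python
import pandas as pd

# Toy long-format data: one row per person, a child indicator, and the score
# that applies to them (SDQ for children, GHQ for parents); ids repeat.
long_df = pd.DataFrame({
    "hid":   [1, 2, 3, 1, 2, 3],
    "child": [1, 1, 1, 0, 0, 0],
    "sdq":   [12, 15, 9, None, None, None],
    "ghq":   [None, None, None, 20, 18, 25],
})

# Collapse to one row per household: the child's SDQ next to the parent's GHQ.
wide = (long_df.groupby("hid")
        .agg(sdq_c=("sdq", "max"), ghq_p=("ghq", "max"))
        .reset_index())
print(wide)
```

Once each row pairs one child score with one parent score, a correlation across rows becomes possible.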
• asked a question related to Correlation Analysis
Question
I've completed 3 Pearsons correlations for the following 2-tailed hypotheses:
(H1): A significant correlation between total NCS scores and total MMQ-Contentment scores exists
(H2): A significant correlation between total NCS scores and total MMQ-Ability scores exists
(H3): A significant correlation between total NCS scores and total MMQ-Strategy scores exists
And found r values of 0.2 (p = .004), 0.3 ( p <.001) and 0.13 (p = .069) respectively.
I've also collected data on participant age ranges (e.g. "30-35") and level of education ("Master's degree") and would like to test if either of these are covariates.
What is/are the most appropriate statistical test(s)?
Run regressions to determine whether the potential confounding variables have any effect on the DVs of interest. If they do, they may be confounders. Then see the attached screenshot for more detailed information. Best wishes, David Booth
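One way to follow this advice, sketched in Python with NumPy (simulated scores; the variable names are illustrative, not taken from the actual questionnaires): regress both NCS and MMQ on the candidate covariates and correlate the residuals, which gives the partial correlation controlling for age band and education.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200

# Hypothetical scores: age band and education coded as ordinal covariates.
age_band = rng.integers(0, 6, n).astype(float)
education = rng.integers(0, 4, n).astype(float)
ncs = 0.5 * age_band + rng.normal(0, 1, n)
mmq = 0.4 * ncs + 0.3 * age_band + rng.normal(0, 1, n)

def residualize(y, covs):
    """Residuals of y after regressing out the covariates (with intercept)."""
    X = np.column_stack([np.ones(len(y))] + covs)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

# Partial correlation of NCS and MMQ controlling for age band and education.
r_partial = np.corrcoef(residualize(ncs, [age_band, education]),
                        residualize(mmq, [age_band, education]))[0, 1]
r_raw = np.corrcoef(ncs, mmq)[0, 1]
print(round(r_raw, 2), round(r_partial, 2))  # raw r is inflated by age here
```

If the partial correlation is clearly smaller than the raw one, the covariate is doing some of the work.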
• asked a question related to Correlation Analysis
Question
I developed a Cox PH model with time-dependent covariates to predict probability of default. One of these covariates (unemployment) is non-stationary and its coefficient (in exponential form) is enormous, specifically 2.518e+32; I can't tell whether this value is due to non-stationarity. Can non-stationarity trigger such a large coefficient?
p.s. This variable was important according to several variable selection techniques, such as Information Value, Xgboost variable importance plot, correlation analysis.
• asked a question related to Correlation Analysis
Question
Hi!
I have run a canonical correlation analysis on my data and want to display it for a poster as a plot but am having trouble finding anywhere online describing how to do this in SPSS 27. Any advice would be appreciated!
Hello Laura,
There's no direct option for this in SPSS. However, do have a look at the candisc library in R (specifically, the plot.cancor function).
• asked a question related to Correlation Analysis
Question
Hi Fellows,
The matrix is at the bottom of this page: https://statweb.stanford.edu/~jtaylo/courses/stats202/visualization.html. A similar version appears in the book Introduction to Data Mining. It's clear that colours toward the red end indicate stronger correlation, but which attributes or variables are actually correlated here? For example, along the main diagonal, cases of the same species show mostly perfect correlation, with a few near-perfect occurrences. Normally, a correlation is calculated from two columns of values, not from two single cases.
Thanks
RP
• asked a question related to Correlation Analysis
Question
Hi,
I'm testing the relationship between dividends and share repurchases using panel regression. I'm trying to include a correlation analysis, and the results are much the same for all variables: they are not correlated very much. The correlation coefficient is between -0.1 and 0.1 for most of them, or the p-value is much higher than 0.05. Is this an expected outcome for variables used in a fixed-effects model? (I've done the Hausman test, which suggested the fixed-effects model.) I'm just wondering whether this is a bad sign that the model is poor.
Correlation can change completely when you use fixed effects. As you know, the results under FE and RE can be totally different.
In terms of correlation, you can compare the correlation between the variables in levels and the variables in changes (or demeaned). If you compute the correlation of the changes in the variables, this should match what the FE model actually uses.
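A toy illustration of that comparison in Python (pandas; the panel is simulated): a firm-level effect creates a strong correlation in levels that disappears once the variables are demeaned within firms, which is the variation the FE estimator actually uses.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)

# Toy panel: 50 firms x 10 years. Dividends and repurchases share a firm
# effect (cross-sectional correlation) but move independently over time.
firms = np.repeat(np.arange(50), 10)
firm_effect = rng.normal(0, 2, 50)[firms]
div = firm_effect + rng.normal(0, 1, 500)
rep = firm_effect + rng.normal(0, 1, 500)

df = pd.DataFrame({"firm": firms, "div": div, "rep": rep})
r_levels = df["div"].corr(df["rep"])

# Within-firm (demeaned) correlation: subtract each firm's own mean.
demeaned = df.groupby("firm")[["div", "rep"]].transform(lambda x: x - x.mean())
r_within = demeaned["div"].corr(demeaned["rep"])
print(round(r_levels, 2), round(r_within, 2))  # levels high, within near zero
```

A near-zero within correlation alongside a high levels correlation is not a sign of a bad model; it just says the relationship lives in the cross-section, not in the time variation.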
• asked a question related to Correlation Analysis
Question
I performed a growth-performance experiment on microalgae with four treatments, in which I measured cell dry weight (mg/L), cell density (×10^5 cells/ml), chlorophyll a (µg/ml) and beta-carotene (µg/ml) content of the microalgae. A reviewer suggests that I analyze the correlation of cell growth/size with the pigment content (chlorophyll a and beta-carotene). So, how can I measure this correlation using Microsoft Excel or other suitable data-analysis software? Thank you.
Thank you all
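For a software-agnostic starting point, the correlations can be computed in a few lines of Python with SciPy (the numbers below are invented placeholders for the measured values); in Excel the same coefficient comes from the CORREL (or PEARSON) worksheet function.

```python
import numpy as np
from scipy.stats import pearsonr

# Made-up measurements across treatments/replicates (units as in the
# question: dry weight in mg/L, pigments in ug/ml).
dry_weight = np.array([210, 250, 320, 400, 180, 260, 310, 390])
chl_a      = np.array([2.1, 2.6, 3.3, 4.1, 1.9, 2.7, 3.2, 4.0])
b_carotene = np.array([0.4, 0.5, 0.7, 0.9, 0.35, 0.55, 0.65, 0.85])

r_chl, p_chl = pearsonr(dry_weight, chl_a)
r_car, p_car = pearsonr(dry_weight, b_carotene)
print(f"dry weight vs chl a: r={r_chl:.2f}, p={p_chl:.4f}")
print(f"dry weight vs b-carotene: r={r_car:.2f}, p={p_car:.4f}")
```

If the scatterplots look curved rather than linear, substitute `spearmanr` for `pearsonr` with the same call signature.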
• asked a question related to Correlation Analysis
Question
I want to investigate the relationship between differences in coral physiological variables based on euclidean distances and seawater environmental variables using DISTLM and dbRDA in PRIMER, but I am not sure if this analysis is suitable given the lack of replication I have in my predictor variable (environmental) matrix.
I have attached an excel file illustrating the structure of my data set (the response and predictor variables). Briefly, I have a multivariate data set of measured physiological variables (e.g. lipid concentration, protein concentration, tissue biomass etc.) for corals collected from five different locations (A-E), where each site is very unique in its seawater physico-chemical parameters. I collected 12 corals per site (total of 60 samples). I have constructed a resemblance matrix of the physiological data in PRIMER based on Euclidean distances, and there is clear grouping of data points in the NMDS, which coincides with the different collection sites for each coral. I want to investigate the proportion of the observed variation in the multivariate data cloud that can be explained by the environmental characteristics of each collection site (e.g. mean annual sea surface temperature, seawater chlorophyll concentration, salinity etc.). However, the dataset of environmental variables does not have replication. i.e. for each site (A-E), I only have one value for mean annual sea surface temp, one value of salinity etc.
All of the case-study examples I have read about distance-based redundancy analysis in R or PRIMER have two resemblance matrices (predictor and response) both of which have replication. However, in my case, my response variables have replication (i.e. 12 samples per site), whereas my environmental variables do not have replication (i.e. one measurement per variable per site).
Can someone advise me whether or not dbRDA is suitable in this instance? If as I predict, it is not suitable, can you recommend a better approach? I am not an expert in multivariate statistics, but I want to make sure that the approach I take is sound.
Any and all advice is welcome. Thanks
Hi Rowan, I am in a similar situation. What I did was use an average of the response variables, but I do not know if that is the optimal solution. Did you solve this riddle in the end?
• asked a question related to Correlation Analysis
Question
Hello Everyone,
I am currently supporting a research project that calculates the correlation between two variables for an experimental group and compares the correlation coefficient with that of a control group whose correlation coefficient is known. The hypothesis is that the correlations of the two groups are the same.
How do I determine when sufficient data has been collected for the experimental group?
Note: when the study started, a power analysis had not been done, so the effect size was not known.
I want to use bootstrapping but am not sure if it is the right approach. Thanks for your response.
It would then be best to think about employing an optional stopping rule. This should be unproblematic as long as you use a Bayesian approach:
If you are looking for a free, easy-to-learn software that can perform Bayesian (and frequentist analysis) check out JASP.
Best of luck with your research,
Sven
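A sketch of the bootstrap idea mentioned in the question, in Python (all data simulated; the "known" control coefficient of 0.45 is an assumed placeholder): resample the experimental pairs, build a percentile confidence interval for r, and check whether the control value falls inside. One pragmatic stopping rule is to keep collecting data until the interval is narrower than a pre-chosen width.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical experimental group: n paired observations with true r ~ 0.5.
n = 80
x = rng.normal(size=n)
y = 0.5 * x + rng.normal(0, np.sqrt(1 - 0.25), n)

# Bootstrap the sample correlation and compare the control group's known
# coefficient (assumed here to be 0.45) against the 95% interval.
r_known = 0.45
boots = np.empty(2000)
for b in range(2000):
    idx = rng.integers(0, n, n)              # resample pairs with replacement
    boots[b] = np.corrcoef(x[idx], y[idx])[0, 1]
lo, hi = np.percentile(boots, [2.5, 97.5])
print(f"bootstrap 95% CI: ({lo:.2f}, {hi:.2f}); "
      f"contains r_known: {lo <= r_known <= hi}")
```

If you go the Bayesian route instead, the analogous object is the posterior credible interval for the difference in correlations, monitored as data accumulate.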
• asked a question related to Correlation Analysis
Question
Relationship between conducted scientific research and innovations
Innovations can be the result of conducted research and scientific research.
Research work may concern, for example, defining, developing and planning the implementation of innovative technologies in production processes, or determining potential industrial applications of new, innovative types of materials, e.g. organic, biodegradable materials replacing plastics that are difficult to degrade.
Research work is a process that should lead to the achievement of specific research, analytical and implementation goals: the development of new solutions, formulas, laws, dependencies, correlations, inventions, patents and sometimes also innovations. Research work requires financial outlays, knowledge, and the human resources of educated, experienced research staff. Innovations, on the other hand, are newly created added value, whose worth is determined by the possibilities of applying a particular innovation in manufacturing, production or service provision. Innovation is, in a sense, a product of previously conducted research.
I invite you to the discussion
Dear Osama Rahil Shaltami,
Yes, without research, many innovations would not arise.
Best regards,
Dariusz Prokopowicz
• asked a question related to Correlation Analysis
Question
Hello everyone,
I'm currently writing my bachelor's thesis about the impacts of COVID-19 on the hotel sector in Germany, with the goal of developing a training manual for hotel management and entrepreneurs on how to cope with future pandemics.
One of my sub-RQs is: What is the correlation between hotels' responses to the pandemic and their occupancy rates?
How can I answer that question if the hotels' responses are qualitative data (interviews about entrepreneurial behaviour; e.g., one hotel said that to cope with the pandemic they increased their social media presence and improved their online appearance) and the occupancy rates are quantitative data?
Basically, my goal is to support my recommendations - which will be a training manual for hotel management and entrepreneurs on how to cope with future pandemics - by saying Hotel A did this and their occupancy rate increased (I'm obviously aware that correlation doesn't mean causation, and this will also be one of the major limitations of my research, as I am only using one hotel KPI).
Best,
Felix
Suggestion: Code your qualitative data into a small number of categories that reflect the main ways hotels cope. Assuming there are several hotels that fall in each category, you could do a one-way analysis of variance to see if the mean occupancy rates differ among the category groups.
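The suggestion above can be sketched in Python with SciPy (the coping categories, sample sizes and occupancy figures are all invented):

```python
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(4)

# Hypothetical occupancy rates (%) for hotels grouped by the response
# strategy coded from the interviews (categories are illustrative).
digital_push    = rng.normal(55, 8, 10)   # e.g. expanded online presence
cost_cutting    = rng.normal(45, 8, 10)
no_major_change = rng.normal(40, 8, 10)

# One-way ANOVA: do mean occupancy rates differ across coded strategies?
stat, p = f_oneway(digital_push, cost_cutting, no_major_change)
print(f"F = {stat:.2f}, p = {p:.4f}")
```

With only a handful of hotels per category, also report the group means and spreads; the F-test alone will be underpowered.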
• asked a question related to Correlation Analysis
Question
I am trying to find the correlation/association between two categorical vectors. I tried using chi-squared, but it does not have a predefined upper or lower limit. That is when I found Tschuprow's T and Cramér's V. I now have several questions, and I would appreciate any help.
1. Are they good, informative indicators of categorical association?
2. Do they rely on p-values and degrees of freedom in the same way as chi-squared?
3. Are the scores skewed? I read that a Cramér's V above 0.25 indicates a very strong relation, which is not the case with the other metrics.
4. Do they have any preconditions for their application?
5. Can you recommend any other metrics that measure association/correlation/dependency for categorical/nominal values?
6. I understand that these metrics rely on contingency-table counts. Are there any metrics that use a different method?
I am looking for a metric that is as usable and informative as Pearson or Spearman correlation, with a score bounded between 0 and 1.
Cramer's V will range from 0 to 1. You can play with some toy data to see how it reacts in different cases. The interpretation for a "large" effect changes depending on how many categories there are (in the dimension with the smaller number of categories). These interpretations are addressed in Cohen, 1988, Statistical Power Analysis for the Behavioral Sciences, 2nd Edition. Some other effect size statistics to consider are Tschuprow's T and Goodman Kruskal lambda.
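A minimal Cramér's V implementation in Python (SciPy for the chi-squared statistic; the two toy tables are made up to show the two extremes):

```python
import numpy as np
from scipy.stats import chi2_contingency

def cramers_v(table):
    """Cramer's V for an r x c contingency table (0 = none, 1 = perfect)."""
    chi2 = chi2_contingency(table, correction=False)[0]
    n = table.sum()
    k = min(table.shape) - 1          # smaller dimension minus one
    return np.sqrt(chi2 / (n * k))

# Toy tables: a near-perfect association and a near-independent one.
strong = np.array([[40, 2], [3, 45]])
weak   = np.array([[22, 20], [19, 23]])
print(round(cramers_v(strong), 2), round(cramers_v(weak), 2))
```

The chi-squared statistic itself grows with sample size; dividing by n·k is exactly what bounds V between 0 and 1.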
• asked a question related to Correlation Analysis
Question
I have a data set of particulate concentration (A) and the corresponding emissions from cars (B), factories (C) and soil (D). I have 100 observations of A and the corresponding B, C and D. Let's say that no factor other than B, C and D contributes to the particulate concentration (A). Correlation analysis shows that A has a linear relationship with B, an exponential relationship with C and a logarithmic relationship with D. I want to know which factor contributes most to the concentration of A (the predominant factor). I also want to know whether a model such as the following can be built from the data set I have:
A = m*B + n*exp(C) + p*log(D), where m, n and p are constants.
Maybe you can consider the recursive least squares algorithm (RLS). RLS is the recursive application of the well-known least squares (LS) regression algorithm, so that each new data point is taken in account to modify (correct) a previous estimate of the parameters from some linear (or linearized) correlation thought to model the observed system. The method allows for the dynamical application of LS to time series acquired in real-time. As with LS, there may be several correlation equations with the corresponding set of dependent (observed) variables. For the recursive least squares algorithm with forgetting factor (RLS-FF), acquired data is weighted according to its age, with increased weight given to the most recent data.
Years ago, while investigating adaptive control and energetic optimization of aerobic fermenters, I applied the RLS-FF algorithm to estimate the parameters of the KLa correlation used to predict O2 gas-liquid mass transfer, giving increased weight to the most recent data. Estimates were improved by imposing sinusoidal disturbances on air flow and agitation speed (the manipulated variables). The power dissipated by agitation was measured with a torque meter (pilot plant). The proposed (adaptive) control algorithm compared favourably with PID. Simulations assessed the effect of numerically generated white Gaussian noise (2-sigma truncated) and of first-order delay. This investigation was reported at (MSc Thesis):
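Separately from the recursive approach above: the model written in the question is linear in its constants once the transforms are applied, so ordinary least squares fits it directly. A sketch in Python (all data simulated with known coefficients):

```python
import numpy as np

rng = np.random.default_rng(5)

# Simulated predictors standing in for the question's sources:
# B (car, linear), C (factory, exponential), D (soil, logarithmic).
n = 100
B = rng.uniform(0, 5, n)
C = rng.uniform(0, 2, n)
D = rng.uniform(1, 10, n)
A = 1.5 * B + 0.8 * np.exp(C) + 2.0 * np.log(D) + rng.normal(0, 0.2, n)

# A = m*B + n*exp(C) + p*log(D) is linear in (m, n, p) after transforming,
# so least squares on the transformed columns fits it directly.
X = np.column_stack([B, np.exp(C), np.log(D)])
coef, *_ = np.linalg.lstsq(X, A, rcond=None)
m_hat, n_hat, p_hat = coef
print(np.round(coef, 2))  # close to the true (1.5, 0.8, 2.0)

# Crude dominance check: coefficient times the spread of its transformed term.
contrib = np.abs(coef) * X.std(axis=0)
print("dominant source:", ["car", "factory", "soil"][int(np.argmax(contrib))])
```

Comparing each coefficient scaled by the spread of its transformed predictor gives a rough answer to "which factor contributes most"; a formal answer would use standardized coefficients or variance decomposition.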
• asked a question related to Correlation Analysis
Question
We have a phylogenetic tree built from the 16S rRNA sequences of many bacteria, and we also have a phylogenetic tree built from protein sequences of those bacteria. Now, how can we assess the correlation between these two trees?
Hi there, you can assess the Mantel correlation between the phylogenetic distance matrices.
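A Mantel test can be sketched in plain NumPy without specialist packages (the toy distance matrices below are simulated; in practice you would use the patristic distance matrices extracted from the two trees):

```python
import numpy as np

rng = np.random.default_rng(6)

def mantel(d1, d2, n_perm=999):
    """Permutation Mantel test: correlation between two distance matrices."""
    iu = np.triu_indices_from(d1, k=1)           # upper-triangle entries only
    r_obs = np.corrcoef(d1[iu], d2[iu])[0, 1]
    count = 0
    n = d1.shape[0]
    for _ in range(n_perm):
        p = rng.permutation(n)                   # relabel taxa in one matrix
        r_perm = np.corrcoef(d1[p][:, p][iu], d2[iu])[0, 1]
        if abs(r_perm) >= abs(r_obs):
            count += 1
    return r_obs, (count + 1) / (n_perm + 1)

# Toy matrices: d2 is a noisy copy of d1, mimicking trees that agree.
x = rng.normal(size=(12, 4))
d1 = np.linalg.norm(x[:, None] - x[None, :], axis=-1)
d2 = d1 + rng.normal(0, 0.3, d1.shape)
d2 = (d2 + d2.T) / 2
np.fill_diagonal(d2, 0)

r, p = mantel(d1, d2)
print(f"Mantel r = {r:.2f}, p = {p:.3f}")
```

The permutation (rather than a table lookup) is what makes the p-value valid despite the non-independence of distance-matrix entries.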
• asked a question related to Correlation Analysis
Question
I am looking to perform a cross-correlation between Pc5 intensity data and solar-wind parameters during Pc5 pulsations. Is it necessary to detrend the data before correlating them, or is detrending not strictly necessary? What is the difference between cross-correlation with and without detrending?
Generally you should detrend before trying to measure correlation. A classic paper on this subject is by Yule (1926):
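A toy illustration of why detrending matters, in Python with SciPy (both series are simulated; the trends and noise levels are invented):

```python
import numpy as np
from scipy.signal import detrend

rng = np.random.default_rng(7)

# Two series sharing an oscillation, each riding its own linear trend
# (a stand-in for Pc5 power and a solar-wind parameter).
t = np.arange(500, dtype=float)
common = np.sin(2 * np.pi * t / 50)
a = 0.02 * t + common + rng.normal(0, 0.3, 500)
b = -0.01 * t + common + rng.normal(0, 0.3, 500)

def corr(x, y):
    return np.corrcoef(x, y)[0, 1]

# Raw correlation is dragged toward the (opposing) trends; detrending
# isolates the genuine oscillatory co-variation.
print(round(corr(a, b), 2), round(corr(detrend(a), detrend(b)), 2))
```

Here the raw correlation comes out negative purely because of the trends, while the detrended series reveal the strong shared oscillation; that is the spurious-correlation trap Yule (1926) describes.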
• asked a question related to Correlation Analysis
Question
Hello,
I have conducted a survey experiment with two conditions. I want to investigate whether perceived physical fitness influences the investment decision of the investor. In the survey I had one 'normal' condition and one 'physically not fit' condition. In SPSS that is: 1 = Normal; 2 = NotFit. The investment decision is: 1 = Yes; 2 = No. Is it possible to run a correlation analysis on this to see what the effect of the IV on the DV is, or is this not possible?
So are the Normal and NotFit conditions something that you randomly allocated respondents to, or are they responses they gave to a question? If the former, then use a 2x2 chi-squared test or logistic regression; if the latter, then no, you cannot establish that it "influences" the decision, as there would be other (presumably unmeasured) variables related to both.
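For the randomized-allocation case, the 2x2 chi-squared test looks like this in Python (the counts are invented placeholders):

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical counts: rows = condition (normal / not fit),
# columns = investment decision (yes / no).
table = np.array([[60, 40],    # normal condition
                  [35, 65]])   # "not fit" condition

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2({dof}) = {chi2:.2f}, p = {p:.4f}")

# Effect direction as a simple odds ratio: odds of investing under the
# normal condition relative to the "not fit" condition.
odds_ratio = (table[0, 0] * table[1, 1]) / (table[0, 1] * table[1, 0])
print(f"odds ratio = {odds_ratio:.2f}")
```

Logistic regression gives the same comparison (the coefficient is the log of this odds ratio) and additionally allows covariates.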
• asked a question related to Correlation Analysis
Question
I have two scenarios, which would you prefer while analyzing your data?
a) Is it all right to enter into a multiple regression/logistic regression analysis all the predictors in your data set that are significant in univariate tests?
This is irrespective of whether the variables are demographic or related to obstetric health (i.e., seemingly unrelated variables in one model).
b) Or would you make separate regression models, grouping together similar set of variables (related), predicting a single dependent variable?
Which approach is better?
There are several methods for identifying the best model for your regression analysis. You could try multiple models and see which one has the best goodness of fit and explains the largest proportion of the variability of your outcome. Regards
• asked a question related to Correlation Analysis
Question
Dear All,
I have these data: two variables are continuous and one is categorical. I wonder, is there any method to perform a correlation analysis of the impact of the two continuous variables on the one categorical variable?
The example of the data is attached below.
Well, if the DV is really categorical, you might try some kind of logistic regression. I have attached some course notes that you may find helpful. D. Booth
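A sketch of that suggestion in Python with scikit-learn (simulated data with known coefficients; in practice the two continuous variables are your predictors and the categorical variable the outcome):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(8)

# Simulated data: two continuous predictors, one binary category.
n = 300
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
logit = 1.2 * x1 - 0.8 * x2
y = (rng.uniform(size=n) < 1 / (1 + np.exp(-logit))).astype(int)

# Logistic regression: models the probability of the category from x1, x2.
model = LogisticRegression().fit(np.column_stack([x1, x2]), y)
print("coefficients:", np.round(model.coef_[0], 2))
```

The fitted coefficients recover the signs (and roughly the sizes) of the simulated effects; with more than two categories, multinomial or ordinal logistic regression is the analogue.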
• asked a question related to Correlation Analysis
Question
I want to compare fossil insect densities with charcoal concentrations and other disturbance indexes.
My issue is that the insect samples were subsampled according to geochemical patterns in the soil and are of varying depth resolution (between 2 and 8 cm) and temporal resolution (20-120 years). The core from which the other disturbance indexes were counted is from the same site but was subsampled at 1-cm resolution. We have a hunch that bark beetles and charcoal densities are related, but how can we test this statistically? Is it even possible to correlate these insect time slices of varying resolution with the data points from the other core?
Example insect samples:
S1 0-4 cm 2017 - 2004
S2 4-6 cm 2004 - 1969
S3 6-8 cm 1969 - 1946
S4 8-11 cm 1946 - 1899
S5 11-15 cm 1899 -1841
S6 15-17 cm 1841 - 1806
Finally, my colleague ran an analysis in R with the libraries bincor (Polanco-Martinez, 2018), ggplot2 (Wickham, 2016) and trend (Pohlert, 2018). He used a binned correlation approach that allows for irregular time series (Mudelsee, 2012). In the calculation, he used the rule based on average spacing, as it performed better for him than Monte Carlo simulations. We found a significant correlation between charcoal volumes and the number of primary bark beetle fossils and are in the process of submitting our manuscript to a fitting journal. Thank you for your replies and interest in this question. :)
• asked a question related to Correlation Analysis
Question
In statistics, linear regression is a linear approach to modelling the relationship between a scalar response and one or more explanatory variables (also known as dependent and independent variables).
From the data analysis sheet, I produced the two figures. Which one would you select for your manuscript, and why choose one over the other?
Interesting.
• asked a question related to Correlation Analysis
Question
Hello everyone,
I have been trying to solve this riddle but I am not getting any closer. I am in desperate need of help.
I want to examine the relationship between two continuous (scaled) variables.
I cannot satisfy the assumptions for Pearson's, as my variables do not show a linear relationship. Therefore I considered Spearman's rho. However (as I am examining a relationship within a particular group), my sample size is small (n = 9), which makes Spearman's unreliable, and my variables do not show a monotonic relationship (although this could be because my sample size is small).
After some digging, I discovered Kendall's tau-b and read that it can be used with continuous and/or ordinal variables. However, Kendall's tau-b also assumes a monotonic relationship (my scatterplot shows non-monotonicity between the two variables), BUT it is a good association measure for small sample sizes.
I honestly do not know what to do. Should I use Kendall's tau-b? I am completely at a loss.
One year later, I have the same question...
Did you ever find the answer?
Thanks!
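For what it's worth, all three coefficients are cheap to compute side by side with SciPy, which at n = 9 is often the most honest thing to report alongside the scatterplot (the nine pairs below are invented):

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr, kendalltau

# Nine made-up paired observations (the sample size the question describes).
x = np.array([1.2, 2.5, 3.1, 4.8, 5.0, 6.3, 7.7, 8.1, 9.4])
y = np.array([2.1, 1.8, 3.9, 3.5, 5.2, 6.8, 6.1, 8.9, 8.4])

for name, (stat, p) in [("Pearson", pearsonr(x, y)),
                        ("Spearman", spearmanr(x, y)),
                        ("Kendall tau-b", kendalltau(x, y))]:
    print(f"{name}: {stat:.2f} (p = {p:.3f})")
```

With nine observations, no coefficient will be reliable; Kendall's tau-b is a reasonable choice mainly because its exact small-sample p-values are well behaved, not because it escapes the monotonicity assumption.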
• asked a question related to Correlation Analysis
Question
We are conducting a study on the correlation of COVID-19 pseudoscience theories with the proliferation of COVID-19 positive cases. Which statistical test should we use to correlate the data we are going to gather from the COVID-19 pseudoscience theories scale (4-point Likert scale) with the number of positive cases? (Data on the COVID-19 positive cases will come from the health departments' tally.) Thank you in advance to those who can help us. ☺️
• asked a question related to Correlation Analysis
Question
Hello,
My IV is personality with 5 levels. My DV is reaction time with 4 levels. I ran a Pearson correlation analysis and ended up with 20 results. Only 1 result is statistically significant. My question is: in my results section, do I report the significant result and say that the rest are not significant? Or do I have to report every single result in apa style (do it 20 times since there are 20 correlations)? I was thinking about creating a table to display the results as well. What would be the best way to report my results? Thank you!
Hello Agata,
If it made sense to evaluate each of the 20 combinations, then ideally, you should report each of the 20 correlations. One table should allow for both correlations, and summary statistics (e.g., mean & SD) to be reported easily. It's understandable that people generally tend to spend more time discussing the "significant" results than the non-significant results (however determined).
As for interpretation, I'd be cautious about over-emphasizing one significant relationship among 20 coefficients. If each was tested for statistical significance at the .05 level, then, by chance alone, you'd expect about 1 significant correlation out of every 20 obtained even if the population values were all zero.
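The chance-expectation argument above can be illustrated by simulation in Python (SciPy; the data are pure null, so any "significant" correlations are false positives):

```python
import numpy as np
from scipy.stats import t as t_dist

rng = np.random.default_rng(9)

# Simulate 20 correlation tests on independent (null) data, n = 200 each,
# mirroring the 20 coefficients reported in the question.
n, tests = 200, 20
p_values = np.empty(tests)
for i in range(tests):
    x, y = rng.normal(size=n), rng.normal(size=n)
    r = np.corrcoef(x, y)[0, 1]
    t_stat = r * np.sqrt((n - 2) / (1 - r ** 2))   # t transform of r
    p_values[i] = 2 * t_dist.sf(abs(t_stat), n - 2)

print("significant at .05:", int(np.sum(p_values < 0.05)))
print("significant after Bonferroni:", int(np.sum(p_values < 0.05 / tests)))
```

On average about one of the 20 null tests comes out "significant" at .05; a Bonferroni (or FDR) adjustment is the standard guard when all 20 are reported.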
• asked a question related to Correlation Analysis
Question
Hello everyone! I want to calculate the correlation between two variables, but their sample sizes are different: N1 = 45 and N2 = 20. Does anyone know how to solve this problem? Thanks!
It would help to know a bit more about the problem. I need to make assumptions in order to develop an answer.
Say you grew wheat. At grain filling stage you picked some heads and analyzed for three enzymes, and 4 nutrients. These tests are destructive, but on neighboring plants you let the heads mature and gathered yield data. Within each plot you could have taken subsamples, and then average the subsamples to remove some of the within-plot variability and pair the plot averages to run a correlation. Just document how this was done in the methods.
If the samples were measured on the same experimental units but there are missing values, the simplest approach is to remove the experimental units that have missing values. The problem is that if these missing experimental units have something in common then you have biased your results. You could try imputation if you have enough other information, but there are problems with imputing over half of your data. Here too there is a problem if all of your imputed values are from one subpopulation.
• asked a question related to Correlation Analysis
Question
Hello. I hope someone can help me with figuring out the latest approaches to doing a survey study. I have a couple of questions as following:
1. What would be a more advanced level of doing analysis for a survey study beyond descriptive statistical analysis and correlational analysis?
In particular, how would one decide between regression, factor analysis, decision tree analysis, and/or cluster analysis? Or something else?
2. What if the data set has many missing values? I have read threads with recommendations, but I lack the knowledge, and I hope to learn the easiest way possible without losing N. In fact, I'm tempted by mean substitution despite its bad reputation.
Ultimately, if the data set can't be managed properly, I guess I should just report the descriptive analysis and the correlational findings. Would this still be publishable if the topic is novel? Any book recommendations would be a great help; alternatively, a gold-standard survey study to use as an example would be useful too.
Any thoughts would be much appreciated.
Thank you so very much!
1. What is your research question?
2. A book that might help you is 'Using Multivariate Statistics' by Tabachnick and Fidell.
• asked a question related to Correlation Analysis
Question
In a correlation analysis between two variables the coefficient was -0.22, but in a multiple linear regression, due to the influence of other variables, it became +0.21. Conceptually it should be a negative correlation.
The change of sign is due to partial correlations between the regressors and the dependent variable; this happens when the regressors are not mutually independent. See: https://en.wikipedia.org/wiki/Partial_correlation.
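A small simulation of such a sign flip in Python (NumPy; the coefficients are invented): a third variable drives the two variables of interest in opposite directions, so the simple correlation is negative even though the regression coefficient, holding the third variable fixed, is positive.

```python
import numpy as np

rng = np.random.default_rng(10)

# z pushes x up and y down, while x's direct effect on y is positive.
n = 1000
z = rng.normal(size=n)
x = z + rng.normal(0, 0.5, n)
y = 0.2 * x - 1.0 * z + rng.normal(0, 0.5, n)

r_simple = np.corrcoef(x, y)[0, 1]               # negative: dominated by z
X = np.column_stack([np.ones(n), x, z])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)     # slope on x is positive
print(round(r_simple, 2), round(beta[1], 2))
```

Neither number is "wrong": the simple correlation describes the raw association, the regression coefficient the association net of the other regressors.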
• asked a question related to Correlation Analysis
Question
Dear colleagues, what do you think about applying Pearson's r or Spearman's rho correlation analysis to panel data? Is it possible to meaningfully interpret the results? Do you know of any study or research that would fit my question? I highly appreciate your help.
Best
Ibrahim
Zeynep Köylü Thank you!
Best
Ibrahim
• asked a question related to Correlation Analysis
Question
I need to identify the most influential input factor in an ANN. This can easily be done with simple correlation analysis, but can anyone suggest a method within the ANN model itself?
Hi;
You may find your answer by having a look at Table 2 of this paper: 10.1007/s10346-009-0183-2
Also, you need to read the relevant explanations in the body to figure it out.
Hope it helps.
• asked a question related to Correlation Analysis
Question
I am analysing the following data and am not sure how to proceed with the statistics, so any help will be highly appreciated.
The data consist of a decent number of patients who underwent a procedure. They were asked to fill out a questionnaire ranking their satisfaction on a Likert-like scale (1-5). A medical assistant also rated the usability of the device during the procedure for each patient (also on a Likert scale). Later, the product itself was rated on a Likert scale in terms of quality, performance and the like.
I have tested for normality, and the data are mostly not normally distributed. I have computed Spearman's rho and found that some variables were weakly to moderately correlated with each other and with variables like weight and age.
However, I am unsure how to report the data further in a statistically correct manner. Usually, a Kruskal-Wallis test or similar would be performed to show that groups are significantly different, and initially I tried to do that. But I am not sure this even makes sense with my data (I don't really have groups to begin with, but rather a number of separate cases that rated something; we did not compare the procedure to other procedures, nor form subgroups of patients).
However, it seems I am missing something.
What we want to show with the data is that 1) the patients are generally satisfied with the product, quality is high etc (which can be determined simply by descriptive statistics),
2) that satisfaction, usability and the like are not independent of each other, meaning, e.g., that procedures with high ratings for performance should also have high ratings for quality (is this correctly demonstrated with the Spearman correlation, and how can I plot the correlations on a graph? For parametric data a scatterplot between two variables is usually used, but can I do the same with Spearman's rho?)
and 3) that the quality of the procedure is linked to the other variables. Should I perform ordinal regression for this? Or is this already covered by the correlations?
Is the Kruskal Wallis test needed/ Can I even use it on my sample?
Thank you so much!
If you know a little R, this website looks very interesting.
It illustrates how to get tab plots, heat maps, fluctuation plots, spine plots, etc.
• asked a question related to Correlation Analysis
Question
I have conducted regularized canonical correlation analysis using the mixOmics package in RStudio, because the number of variables in my data exceeds the number of observations. I now want to conduct commonality analysis using the "yhat" package, but I have been getting an error, and I realized the rcc output is slightly different from the CCA output. Can anyone advise me on how to go about this, please?
Anytime n < p, potential problems arise; be very careful here. I assume "regularized" means your estimator is something like ridge or the lasso; if not, think about using something like them. Perhaps you could start with Computer Age Statistical Inference: Algorithms, Evidence, and Data Science by Bradley Efron and Trevor Hastie; these guys invented a lot of this stuff. Best, D. Booth
• asked a question related to Correlation Analysis
Question
A signal is split into two parts; one of them goes through a filter (say, with transfer function H(f)) and the other part stays unchanged. I want to know how to calculate their cross-correlation function. My guess is that, given the spectral density function S(f), it will be the ordinary Wiener-Khinchine theorem with the transfer function included: R(t) = Integral{ S(f) H(f) exp(i*2*pi*f*t) df }
Agreeing with Pascal Salart, but making it simple. One signal is x(t), the other is y(t); you study the covariance E[x(t+i) y(t+j)], which can be estimated by low-pass filtering the vectors v(t, i) = (x(t+i-N), ..., x(t+i))
and w(t, j) = (y(t+j-N), ..., y(t+j)), with N the length of the observation window;
the scalar product <v(t, i), w(t, j)> then estimates the expectation E[.] above.
The matrix obtained is a Gram matrix, hence it diagonalises; its eigenvectors and eigenvalues are obtained with the Gram-Schmidt algorithm. QED...
Ok?
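The guess in the question can be checked numerically: with y the filtered branch (Y(f) = X(f) H(f)), the cross-spectrum is S_xy(f) = S_xx(f) H(f), so the cross-correlation is the inverse transform of S(f) H(f), exactly as proposed. A NumPy sketch with a hypothetical 3-tap FIR filter (the filter taps and the signal are arbitrary examples; the filter is applied circularly so the frequency-domain identity is exact):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4096
x = rng.standard_normal(n)                     # the unfiltered branch

# hypothetical example filter, applied circularly
h = np.array([0.5, 0.3, 0.2])
H = np.fft.fft(h, n)                           # transfer function H(f)
y = np.fft.ifft(np.fft.fft(x) * H).real        # the filtered branch

# Route 1: cross-correlation computed directly from x and y
X = np.fft.fft(x)
Y = np.fft.fft(y)
r_direct = np.fft.ifft(np.conj(X) * Y).real / n

# Route 2: Wiener-Khinchine with the transfer function inserted:
# S_xy(f) = S_xx(f) H(f), hence R_xy(t) = Integral{ S_xx(f) H(f) exp(i 2 pi f t) df }
S_xx = np.conj(X) * X / n                      # spectral density of x
r_wk = np.fft.ifft(S_xx * H).real

print(np.max(np.abs(r_direct - r_wk)) < 1e-10)  # True: the two routes agree
```

With a linear (non-circular) filter the two routes agree up to edge effects of order the filter length.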
• asked a question related to Correlation Analysis
Question
Hi,
I am utilizing a questionnaire to measure the self-reported technology proficiency of individuals. This particular questionnaire has 34 items representing 6 subscales (containing 5, 5, 5, 5, 8, and 6 items respectively). I also have a single output variable. I am trying to make a correlation table of these 6 subscales against the output variable and check for significance. My question is: what is the correct approach to do this? There are two layers of variables (6 dimensions representing 34 items) and I am trying to map them against one output variable. How can an aggregate correlation be calculated?
Hi Daniel Wright, sorry if I wasn't clear. Attached is a screenshot of the reference study. I am trying to repeat this calculation, although in a different context. Notice the authors calculated correlations of 6 subscales from a questionnaire and checked whether they correlate with 3 different outcome variables. My question is: how can we do this practically, as each of these 6 subscales in turn has 5 to 8 questions under it? I hope it is clearer now.
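Assuming the study follows the usual approach, each subscale is first aggregated into a single score (typically the mean, or the sum, of its items), and that score is then correlated with the outcome variable. A minimal Python sketch with simulated Likert-style data (the sample size and item values are hypothetical; only the item counts 5, 5, 5, 5, 8, 6 come from the question):

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(1)
n = 120                                        # hypothetical sample size
sizes = [5, 5, 5, 5, 8, 6]                     # items per subscale, per the question
items = [rng.integers(1, 6, size=(n, k)).astype(float) for k in sizes]
outcome = rng.normal(size=n)                   # hypothetical output variable

for i, block in enumerate(items, start=1):
    score = block.mean(axis=1)                 # subscale score = mean of its items
    r, p = pearsonr(score, outcome)
    print(f"subscale {i}: r = {r:+.3f}, p = {p:.3f}")
```

The six r values (with their p-values) then fill one column of the correlation table, exactly as in the reference study's layout.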
• asked a question related to Correlation Analysis
Question
Hi, these are the outputs generated by a system. The green curve (nr) is the desired output, while the system generates the red curve (nf). I also have the dataset in a CSV file. For this waveform, what would be an appropriate statistical and quantitative way to assess the correlation and/or similarity between the datasets above? I want a single number that summarizes the similarity in terms of shape, or any other method that indicates these are similar.
It's a classic use case for ARIMA, a regression-type method that accounts for repeated measurements over time. You can find insights here:
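Aside from ARIMA, if a single shape-similarity number is all that is needed, a common minimal pair is Pearson's r between the two curves (invariant to offset and scale, so it isolates agreement in shape) plus a range-normalised RMSE (sensitive to amplitude error). A hedged Python sketch with synthetic stand-ins for nr and nf (in practice, load the two columns from the CSV instead):

```python
import numpy as np

# hypothetical stand-ins for the desired output nr and the system output nf
t = np.linspace(0, 1, 200)
nr = np.sin(2 * np.pi * 3 * t)
nf = (0.9 * np.sin(2 * np.pi * 3 * t + 0.05)
      + 0.02 * np.random.default_rng(2).standard_normal(200))

# shape similarity: Pearson r ignores offset and scale
r = np.corrcoef(nr, nf)[0, 1]

# amplitude-sensitive complement: RMSE normalised by the range of nr
nrmse = np.sqrt(np.mean((nr - nf) ** 2)) / (nr.max() - nr.min())

print(f"shape similarity r = {r:.3f}, normalised RMSE = {nrmse:.3f}")
```

Reporting both numbers guards against the case where two curves have the same shape but very different amplitudes (high r, high NRMSE).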
• asked a question related to Correlation Analysis
Question
Is an Excel sheet OK, or do I need econometrics tools such as SPSS or MATLAB?
Yes, I think Excel does a lot of useful things, including correlations and multiple regression. The attached link shows the process of using Excel (French version) to calculate the Pearson correlation coefficient.
Excel also handles simple and multiple regressions. For multiple linear regression:
Once XLSTAT has been launched, choose the XLSTAT / Modeling / Linear regression command. Once the button is clicked, the dialog box corresponding to the regression appears. In the General tab, you can then select the data on the Excel sheet. Here are the corresponding links (French versions) for both simple and multiple linear regression:
Of course, other software packages are also useful for correlations and multiple regressions, such as STATISTICA, Minitab and SAS.
• asked a question related to Correlation Analysis
Question
I have two time-series variables, X and Y, and I want to use cross-correlation analysis to examine the relationship between them. The software is SPSS 21.0. When running the cross-correlation analysis, which variable should be selected first? And how should the positive and negative lags be interpreted?
If the selected order of the variables is changed, the significant cross-correlations change too (from positive lags to negative lags, or from negative to positive).
Thank you very much
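SPSS aside, the lag-sign convention is easy to verify numerically: if the cross-correlation of the pair (X, Y) peaks at a positive lag, X leads Y, and swapping the input order mirrors the same peak to the corresponding negative lag. A minimal NumPy sketch with simulated data (y is a copy of x delayed by 3 samples):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.standard_normal(500)
y = np.roll(x, 3)                       # y(t) = x(t - 3): x leads y by 3 samples

def ccf(a, b, lag):
    """Correlation between a(t) and b(t + lag)."""
    if lag >= 0:
        return np.corrcoef(a[:len(a) - lag], b[lag:])[0, 1]
    return ccf(b, a, -lag)

# With the pair (x, y) the peak sits at lag +3 (x leads y);
# swapping the pair to (y, x) moves the same peak to lag -3.
print(round(ccf(x, y, 3), 3), round(ccf(y, x, -3), 3))
```

So the "first selected variable" only fixes which direction of lead counts as positive; the substantive conclusion (which series leads) is unchanged when the order is swapped.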
• asked a question related to Correlation Analysis
Question
I am writing a piece of code in the syntax of SPSS Statistics. The code is the following:
GET DATA /TYPE=XLSX
/FILE='file'
/SHEET=name 'Sheet2'
/CELLRANGE=full
/ASSUMEDSTRWIDTH=32767.
EXECUTE.
DATASET NAME DataSet1 WINDOW=FRONT.
RANK VARIABLES= WA CI (D) /RANK /PRINT=YES /TIES=MEAN.
SELECT IF( RWA EQ '1' AND RCI EQ '1').
The problem is at the last line of the above code. SPSS selects the right data and then creates two variables called RWA and RCI. So far, things are fine. Then, SPSS should select cases only when the two variables assume the value 1 simultaneously. Unfortunately, it doesn't work. Am I wrong somewhere?
Thanks, Francesco
Francesco Ciardiello, maybe you need to use SELECT IF( RWA EQ 1 AND RCI EQ 1) without the quotes: RANK creates numeric variables, so they should be compared with the number 1, not the string '1'.
• asked a question related to Correlation Analysis
Question
I'm working on the virulence gene profile (sefA, pefA and lpfA genes) and the antimicrobial resistance gene profile (bla-CTX-M, bla-SHV and bla-TEM genes) of 95 Salmonella enterica samples. So basically, for each sample I have scored the presence or absence of each gene. Could you suggest the best statistical analysis to correlate the presence of virulence and resistance genes? I'm thinking of using Fisher's exact test for pairwise comparison (3 virulence genes x 3 AMR genes). Thank you so much!
Yes, Eric has good points on your question, Jason, and that is a reasonable way to evaluate the association between the two gene groups.
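Fisher's exact test on each virulence-gene x resistance-gene pair is indeed a reasonable choice for presence/absence data. A sketch in Python with hypothetical counts (not the actual 95 isolates): cross-tabulate one virulence gene against one resistance gene and test the 2x2 table:

```python
from scipy.stats import fisher_exact

# hypothetical 2x2 table for one gene pair
# rows: sefA present / absent; columns: bla-TEM present / absent
table = [[30, 10],
         [15, 40]]
odds_ratio, p = fisher_exact(table)
print(f"odds ratio = {odds_ratio:.2f}, p = {p:.4g}")
```

With 3 x 3 = 9 pairwise tests, it is worth correcting the p-values for multiple comparisons (e.g. Bonferroni, multiplying each p by 9, or the Benjamini-Hochberg procedure).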
• asked a question related to Correlation Analysis
Question
I am working to improve a manuscript and I have been advised to provide a map showing where the correlation is significant. Any information or links to learn about this would be quite helpful. Thank you in anticipation!
What you are being asked to do is create a spot map showing the areas with significant correlation. You can use a specific colour for each area to show whether its r is significant or not: say, all areas with a significant r shown in red, while areas with a non-significant r shown in green. I presume you know how to calculate r and test it for significance?
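One common way to build such a map, sketched here in Python with simulated placeholder data (grid size, index and field are hypothetical): correlate the two time series at every grid cell, store r and its p-value, and derive a boolean significance mask that can then be overlaid on the r map (e.g. as stippling or a second colour):

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(4)
nt, ny, nx = 40, 5, 6                 # 40 time steps on a 5 x 6 grid
index = rng.standard_normal(nt)       # e.g. a climate index time series
field = rng.standard_normal((nt, ny, nx))
field[:, 0, 0] += 2 * index           # plant one strongly correlated cell

r_map = np.empty((ny, nx))
p_map = np.empty((ny, nx))
for j in range(ny):
    for i in range(nx):
        r_map[j, i], p_map[j, i] = pearsonr(index, field[:, j, i])

sig = p_map < 0.05                    # boolean mask of significant cells
print(sig[0, 0], int(sig.sum()))      # the planted cell is flagged
```

Plotting libraries such as matplotlib or Cartopy can then shade r_map and mark only the cells where sig is True, which is exactly the "map of significant correlation" a reviewer typically expects.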
• asked a question related to Correlation Analysis
Question
Dear experts,
I am going to analyze the correlation between two items. Some mutual variables affect these items.
With which statistical method can I determine which variable has the stronger impact on the correlation between the items?
Thank you so much,
Majid