# Correlation Analysis - Science topic

Questions related to Correlation Analysis
• asked a question related to Correlation Analysis
Question
I have a dataset of 500 participants and their total sleep time over a period of 10 days. Some participants do not have data for all the days; e.g., 10 of them have data for only 4 days instead of 10. Can I perform a correlation analysis between sleep time and temperature without excluding these participants? What other methods can I use for this analysis? Can I use linear regression to explain the effect of temperature on sleep?
Don't delete them! Excluding those participants would bias your sample.
You can use robust variance estimation to allow for clustering of observations within participants.
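To make the clustering point concrete, here is a minimal Python sketch (the variable names and simulated numbers are hypothetical) of a random-intercept mixed model, which uses every participant regardless of how many days they contributed:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated unbalanced panel: 50 participants, 4-10 days each
rng = np.random.default_rng(42)
rows = []
for pid in range(50):
    n_days = rng.integers(4, 11)          # some participants have fewer days
    intercept = rng.normal(0, 0.5)        # participant-level random effect
    for day in range(n_days):
        temp = rng.normal(22, 3)
        sleep = 7.5 - 0.05 * temp + intercept + rng.normal(0, 0.4)
        rows.append({"id": pid, "temp": temp, "sleep": sleep})
df = pd.DataFrame(rows)

# Random-intercept model: repeated days are clustered within participants,
# so unbalanced data (fewer days for some people) is handled without exclusion
model = smf.mixedlm("sleep ~ temp", df, groups=df["id"]).fit()
print(model.params["temp"])   # fixed-effect slope of temperature on sleep
```

The fixed-effect slope for `temp` answers the regression question directly, while the random intercept accounts for the repeated days within each participant.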
• asked a question related to Correlation Analysis
Question
I have to run a Pearson correlation analysis between HPA-axis markers (cortisol, ACTH) and the inflammatory marker IL-6 (analysed through ELISA) on the one hand, and depression and anxiety questionnaires recorded at baseline and post-intervention on the other. I also need to check the data for normality before running the analysis. I would appreciate an appropriate suggestion.
Thank you so much, Anna. When running a correlation analysis between hormones and post-intervention questionnaire data, do you correlate the hormones directly with the post scores, or first calculate the change score and then correlate it with the hormone concentrations?
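On the normality check, a small scipy sketch (simulated, hypothetical data) of the usual workflow: test each variable with Shapiro-Wilk and fall back to Spearman's rank correlation when normality is rejected:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
cortisol = rng.lognormal(mean=2.3, sigma=0.4, size=40)  # hormone data are often skewed
phq = rng.normal(12, 4, size=40)                        # hypothetical questionnaire score

# Shapiro-Wilk: p < .05 suggests a departure from normality
w, p = stats.shapiro(cortisol)
if p < 0.05:
    r, p_r = stats.spearmanr(cortisol, phq)   # rank-based fallback
else:
    r, p_r = stats.pearsonr(cortisol, phq)
print(w, p, r)
```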
• asked a question related to Correlation Analysis
Question
To calculate the sample size, I need to assume the smallest "r" with clinical relevance (significance level = 5% and power = 80%). My question is, what is that value? I searched for papers on how to interpret correlation strength properly, but found none. Could you please suggest any?
I am afraid that there won't be any proper interpretation of correlation strength, and there won't be any discussions of what the clinical relevance of a correlation would be.
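That said, once a smallest clinically relevant r is chosen, the required sample size at alpha = .05 and power = .80 can be approximated via the Fisher z transformation; a small Python sketch of the approximation (it does not tell you which r is clinically relevant):

```python
import math
from scipy.stats import norm

def n_for_correlation(r, alpha=0.05, power=0.80):
    """Approximate sample size to detect correlation r (two-sided test),
    via the Fisher z transformation."""
    z_a = norm.ppf(1 - alpha / 2)
    z_b = norm.ppf(power)
    return math.ceil(((z_a + z_b) / math.atanh(r)) ** 2 + 3)

# The smaller the clinically relevant r, the larger the required n
print(n_for_correlation(0.3))   # 85
print(n_for_correlation(0.1))   # 783
```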
• asked a question related to Correlation Analysis
Question
Why do some researchers use confirmatory factor analysis?
According to PLS-SEM (ADANCO), discriminant validity can be assessed with the Heterotrait-Monotrait Ratio of Correlations (HTMT) and the Fornell-Larcker criterion.
Construct validity refers to how well a test measures the concept for which it was designed. It is essential for establishing a method's overall validity.
Assessing construct validity is particularly important when studying intangible phenomena, such as intelligence, self-confidence, or happiness, which cannot be directly measured or observed. To evaluate these constructs, multiple observable or quantifiable indicators are required.
Construct validity is one of the four measurement validity types. The remaining three are:
Content Validity: Is the test completely representative of what it intends to measure?
Face validity: Does the test's content appear to correspond with its objectives?
Criterion validity: Do the results accurately measure the specific outcome that they are intended to measure?
Confirmatory factor analysis (CFA) is a common technique for assessing construct validity. Like EFA, CFA is a tool researchers can use to reduce a large number of observed variables to latent factors based on similarities in the data.
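As a concrete illustration of the HTMT idea mentioned above, here is a minimal numpy sketch (simulated data; a real analysis would use ADANCO, SmartPLS, or similar) that computes the ratio of between-construct to within-construct item correlations:

```python
import numpy as np

def htmt(items_a, items_b):
    """Heterotrait-Monotrait ratio for two constructs,
    each given as an (n_respondents, k_items) matrix."""
    def mean_offdiag(corr):
        k = corr.shape[0]
        return (corr.sum() - k) / (k * (k - 1))
    ra = mean_offdiag(np.corrcoef(items_a, rowvar=False))   # monotrait, construct A
    rb = mean_offdiag(np.corrcoef(items_b, rowvar=False))   # monotrait, construct B
    hetero = np.corrcoef(items_a.T, items_b.T)[:items_a.shape[1], items_a.shape[1]:]
    return hetero.mean() / np.sqrt(ra * rb)

rng = np.random.default_rng(1)
f1, f2 = rng.normal(size=(2, 200))                  # two independent latent factors
a = f1[:, None] + rng.normal(0, 0.5, (200, 3))      # 3 items loading on factor 1
b = f2[:, None] + rng.normal(0, 0.5, (200, 3))      # 3 items loading on factor 2
print(htmt(a, b))   # well below the common 0.85 cutoff: constructs are distinct
```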
• asked a question related to Correlation Analysis
Question
I am analysing the associations between social sustainability practices (62 observed indicators), grouped under 8 social categories, and sustainability enablers (10 observed indicators).
It is theoretically hypothesised that there is a positive correlation between the social sustainability practices and the enablers.
I am interested in the observed indicators and in drawing conclusions based on them; I do not want to draw conclusions based on constructs.
Can I use canonical correlation analysis (CCA) to analyse the strength of the associations between the 62 social indicators (dependent set) and the set of 10 enablers (independent set)?
Or is it better to use structural equation modelling (SEM) and analyse path coefficients from the independent variables (enablers) to the 8 social categories, each forming a latent dependent variable?
As for the interpretation of the results, you must look at the principal component scores (SOC and ENAB) most heavily loaded on the canonical variates. The original variables' loadings on their components will allow you to interpret the meaning of the principal components, and then of the canonical solution. For interpretation of components by their loadings (the correlation coefficients between variables and their components), see the attached file (chapter: "Catching the sense of experimental results: a case study on animal behavior").
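If you go the CCA route, the canonical correlations themselves are easy to compute; a minimal numpy sketch (QR + SVD implementation, with simulated stand-ins for the two indicator sets):

```python
import numpy as np

def canonical_correlations(X, Y):
    """Canonical correlations between X (n x p) and Y (n x q) via QR + SVD."""
    Xc = X - X.mean(0)
    Yc = Y - Y.mean(0)
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.clip(s, 0, 1)

rng = np.random.default_rng(7)
latent = rng.normal(size=(300, 1))            # one shared dimension
X = latent + rng.normal(0, 1, (300, 5))       # e.g. 5 practice indicators
Y = latent + rng.normal(0, 1, (300, 3))       # e.g. 3 enabler indicators
rho = canonical_correlations(X, Y)
print(rho)   # first canonical correlation clearly dominates
```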
• asked a question related to Correlation Analysis
Question
I have come across some theses which perform ANOVA, independent sample t-test, or multi-way ANOVA after performing factor analysis.
Most of the theses use Likert scales (5- to 7-point). I am wondering about the basis for analysing the data using the tests identified above, which I understand should be performed on data that follow a normal distribution; with Likert-scale data, the analysis arguably should not use the mean. What would justify the use of one-way ANOVA instead of the Kruskal-Wallis H test, or tests such as the Mann-Whitney test?
Also, before performing data analysis after factor analysis, how are the observed variables computed into new factors (sum, average, or another method)?
There are factor-analytic methods for ordinal data (i.e., binary or ordered categorical variables such as many Likert items; see references below). These techniques allow you to analyze/infer continuous latent variables ("factors") based on ordinal items. You could then extract continuous factor scores for further analyses or add external variables directly to the factor/structural equation model so that the factors can be used as outcome variables.
Under certain conditions, the simple sum or unweighted mean of the items can be used as a "proxy" for a continuous latent factor score, namely when all items that measure a given factor are unidimensional and have equal loadings. Other techniques that allow you to infer a continuous scale from binary or ordinal items are provided by item response theory (IRT), for example, the Rasch (1-parameter logistic) model. Many IRT models assume that a continuous latent "trait" factor underlies a set of binary/ordinal items.
Finney, S. J., & DiStefano, C. (2006). Non-normal and categorical data in structural equation modeling. In G. R. Hancock & R. O. Mueller (Eds.). Structural equation modeling: a second course. Greenwich, CT: Information Age Publishing.
Flora, D. B., & Curran, P. J. (2004). An Empirical Evaluation of Alternative Methods of Estimation for Confirmatory Factor Analysis With Ordinal Data. Psychological Methods, 9(4), 466–491. https://doi.org/10.1037/1082-989X.9.4.466
Jöreskog, K. and Moustaki, I. (2001). Factor analysis for ordinal variables: A comparison of three approaches. Multivariate Behavioral Research, 36, 347-387.
Takane, Y. & De Leeuw, J. (1987). On the relationship between item response theory and factor analysis of discretized variables. Psychometrika, 52, 393-408.
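A quick simulation of the sum-score point above (hypothetical, roughly equal-loading Likert items): when items are unidimensional with comparable loadings, the unweighted mean tracks the latent factor closely:

```python
import numpy as np

rng = np.random.default_rng(3)
trait = rng.normal(size=500)
# 5 ordinal Likert items generated from one underlying factor with equal loadings
items = np.clip(np.round(3 + trait[:, None] + rng.normal(0, 0.8, (500, 5))), 1, 5)

sum_score = items.mean(axis=1)                     # unweighted mean as factor-score proxy
proxy_r = np.corrcoef(sum_score, trait)[0, 1]
print(proxy_r)   # high when the equal-loading assumption roughly holds
```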
• asked a question related to Correlation Analysis
Question
Hi everyone,
I tried to run a point-biserial correlation with one continuous variable and several dummy-coded nominal variables; however, my continuous (dependent) variable violated the normality assumption.
Are there any alternatives for assessing the correlation between one continuous dependent variable and several dummy coded nominal variables?
Thank you!
If I understand correctly, you have several binary variables and one numeric variable, and you want to see whether (and by how much, measured perhaps by Cohen's d or a correlation measure) the values of the numeric variable differ between the two levels of each binary variable, BUT you do not feel a t-test is appropriate. Is this correct? If differences in means are all you want, then bootstrapping could be used. Otherwise, there are lots of transformations (including rank-based tests) that can be applied. But I am not sure I understand your problem.
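A small scipy sketch of both suggestions (simulated, skewed data; the variable names are hypothetical): a rank-based test plus a bootstrap confidence interval for the point-biserial correlation:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
group = rng.integers(0, 2, 120)               # one dummy-coded binary variable
y = rng.exponential(2, 120) + 0.8 * group     # skewed (non-normal) outcome

# Option 1: rank-based test, no normality assumption
u, p = stats.mannwhitneyu(y[group == 0], y[group == 1])

# Option 2: bootstrap CI for the point-biserial correlation
boots = []
for _ in range(2000):
    idx = rng.integers(0, len(y), len(y))
    boots.append(stats.pearsonr(group[idx], y[idx])[0])
lo, hi = np.percentile(boots, [2.5, 97.5])
print(p, lo, hi)
```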
• asked a question related to Correlation Analysis
Question
In my research, I used 7-point Likert scale for measuring situation and agreement. They are used and coded respectively as below:
Situation:
Far less = 1
Moderately less = 2
Slightly less = 3
Almost the same = 4
Slightly more = 5
Moderately more = 6
Far more = 7
Agreement
Strongly disagree = 1
Disagree = 2
Slightly disagree = 3
Neutral = 4
Slightly agree = 5
Agree = 6
Strongly agree = 7
My problem is that the questionnaires were distributed to two opposite groups (i.e., claimant and defendant).
When measuring the situation, I asked the respondents for the comparative degree that most accurately describes their position.
For example, I asked the claimants whether they had less / almost the same / more resources than the defendant, and vice versa. If the claimant chooses "less resources", that is equivalent to the defendant choosing "more resources". Therefore, I tried to code the data as follows:
Far less/ more = extremely asymmetric= 1
Moderately less/ more = moderately asymmetric = 3
Slightly less/ more = slightly asymmetric = 5
Almost the same = symmetrical= 7
May I ask if it is appropriate for me to convert and interpret the data like the above? Are there other ways that can help me to better analyse the data?
Thanks
Nan Cao Non-parametric tests for independence, such as Spearman's correlation or the chi-square test, should be used for ordinal data (individual Likert-scale items). Use parametric tests such as Pearson's r or t-tests for interval data (overall Likert scale scores).
A Likert scale is made up of four or more Likert-type items that reflect related questions and are combined into a single composite score/variable. Likert scale data may then be treated as interval data, in which case the mean is an appropriate indicator of central tendency; describe the scale using means and standard deviations.
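For the recoding itself, a tiny pandas sketch of the proposed folding (mapping "far less" and "far more" onto the same asymmetry score), so claimant and defendant responses land on one common scale:

```python
import pandas as pd

# Hypothetical raw responses on the 7-point comparative item
raw = pd.Series([1, 2, 3, 4, 5, 6, 7], name="resource")

# Fold around the midpoint: a claimant answering "far less" and a defendant
# answering "far more" both describe the same (extremely asymmetric) situation
fold = {1: 1, 7: 1, 2: 3, 6: 3, 3: 5, 5: 5, 4: 7}
asymmetry = raw.map(fold)
print(asymmetry.tolist())   # [1, 3, 5, 7, 5, 3, 1]
```

Note the folded variable is still ordinal, and the 1/3/5/7 spacing is a labelling choice rather than a measured interval.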
• asked a question related to Correlation Analysis
Question
Hey, everyone,
A is positively correlated with B;
A is also positively correlated with C;
BUT, B is negatively correlated with C.
How to explain this result?
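This pattern is possible because correlation is not transitive. A short numpy simulation (arbitrary numbers) shows A positively correlated with both B and C even though B and C are negatively correlated, e.g. when A is the sum of two negatively related parts:

```python
import numpy as np

rng = np.random.default_rng(5)
B = rng.normal(size=5000)
C = -0.5 * B + np.sqrt(1 - 0.25) * rng.normal(size=5000)   # corr(B, C) ~ -0.5
A = B + C                                                  # A shares variance with both

cm = np.corrcoef([A, B, C])
print(cm[0, 1], cm[0, 2], cm[1, 2])   # positive, positive, negative
```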
• asked a question related to Correlation Analysis
Question
From past correlation analyses, Vsw is estimated to correlate quite well with IMF-Bz during geomagnetic storm events; however, for some intense events a much weaker correlation between Vsw and IMF-Bz was observed. What could be the possible factors behind this?
There are two different physical mechanisms of generation of the magnetospheric convection — quasi-viscous interaction between the solar wind and the magnetosphere in the low-latitude boundary layer [Axford, Hines, 1961] and IMF reconnection with the geomagnetic field at the dayside magnetopause with subsequent formation of open magnetotail lobes [Dungey, 1961].
IMF-Bz is just one of them.
• asked a question related to Correlation Analysis
Question
In the file attached below, there is a line above the theta(1) coefficient and another one exactly below C(9). In addition, what is the number below C(9)? There is no description.
I asked the person who coded that package, and he said the C(9) coefficient does not have any meaning here, so just ignore it. It comes up because the package was written for an old version of EViews and has not been updated.
• asked a question related to Correlation Analysis
Question
Hi!
I have two variables, A and B. The Pearson correlation coefficients table showed no significant correlation between A and B. Is it then redundant to do a further regression study of A and B?
If there is no significant correlation, then the proportion of the variation in the dependent variable that is explained by a linear regression will also be small. So do not do a linear regression.
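The link is exact for simple linear regression: R-squared is just the squared Pearson correlation, as a quick scipy check illustrates (simulated data):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
a = rng.normal(size=100)
b = 0.3 * a + rng.normal(size=100)

r = stats.pearsonr(a, b)[0]
fit = stats.linregress(a, b)
# In simple (one-predictor) regression, R-squared equals r squared
print(r**2, fit.rvalue**2)   # equal up to floating point
```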
• asked a question related to Correlation Analysis
Question
I have a dataset composed of aphid and parasitoid abundances captured in Moericke traps on a monthly scale for 10 years. As I do not have data on parasitism, but on the occurrence of aphids and parasitoids, I cannot use common trophic networks. In this way, I think I could explore some community-level relationships through correlation-based networks. However, I would like to know if there is any impediment to using this approach or if anyone has already used it.
Grateful!
It is mainly studies in microbiology that have applied these methods, and I know of no study that has done this on host-parasitoid or prey-predator networks. I think the general principle is the same though.
You will probably have to think about applying a time-lag in your models, to take into account the development time of parasitoids in aphids. I enclose 4 publications that have studied these questions, with models that are relatively easy to implement in R.
Of course the main issue would be that you cannot directly link parasitoid abundances to the biological control service provided (correlation is not causation).
We can continue to discuss by mail if you want, and see what we can do together on this subject if you are interested.
Kevin
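On the time-lag point, a minimal pandas sketch (simulated monthly counts, hypothetical parameters) of screening lagged correlations before building the correlation network:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(9)
months = 120   # 10 years of monthly trap counts
aphid = pd.Series(rng.poisson(20, months).astype(float))
# Simulated parasitoids track aphids with a one-month development lag
parasitoid = 0.5 * aphid.shift(1) + rng.poisson(3, months)

# Compare correlations at several lags to pick the biologically plausible one
for lag in range(0, 4):
    r = aphid.corr(parasitoid.shift(-lag))
    print(lag, round(r, 2))
```

In this toy example the lag-1 correlation stands out, which is the kind of signal a lagged correlation network would build on.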
• asked a question related to Correlation Analysis
Question
Climate change mitigation
On the other hand, I am of the opinion that the issue of Land use – Land use changes may be applied in the research that I am dealing with.
Namely, I am kindly inviting you to have a look at the application of my IntErO model for the calculation of soil erosion intensity and runoff (link): www.geasci.org/IntErO.
- Simple installation;
- Examples of applications available on the same web page (link): www.geasci.org/IntErO;
- Examples of applications available on my profile on Research Gate;
- Examples of applications available on web page (link): www.geasci.org/Publications;
The impact of land-use changes on soil erosion intensity and runoff is our link for potential cooperation.
Sincerely,
Velibor Spalevic
Mob/Viber/WhatsApp: +382 67 201 222
• asked a question related to Correlation Analysis
Question
I am following the way a previous paper (PMID: 30948552) treated their spatial transcriptomic (ST) data. It seems they combined the expression matrices (not stating whether normalized or log-transformed) of different conditions, calculated a gene-gene similarity matrix (Pearson rather than Spearman), and finally obtained some gene modules (clustered by L1 norm and average linkage) with different expression between conditions.
So I have several combinations of methods with which to imitate their workflow.
For the expression matrix, I have two choices: the first is a merged count matrix from the different conditions; the second is a normalized data matrix (by default the NormalizeData function in Seurat, i.e. log((count / total count of spot) * 10000 + 1)). For the correlation, I have used Spearman or Pearson to calculate a correlation matrix.
But I got stuck.
When I use the count matrix, no matter which correlation method, I get a heatmap with a mostly positive pattern, which looks strange. And for the normalized data matrix (only Pearson calculated), I get a heatmap with a sparse pattern, which is indescribably strange too.
My questions:
1. Which combinations of data and method should I use?
2. Would this workflow weaken the correlation of the genes since some may have correlations only in specific condition?
3. What do you think of my workflow?
Install R and RStudio, then visit this website:
Heatmap in R: Static and Interactive Visualization - Datanovia
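Beyond plotting, the module-detection step itself (Pearson similarity on normalized expression, then L1/average-linkage clustering, as in the paper you cite) can be sketched in a few lines of scipy; the data here are simulated and the parameters illustrative:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

rng = np.random.default_rng(4)
spots = 200
m1, m2 = rng.normal(size=(2, spots))                     # two hypothetical spatial programs
expr = np.vstack([m1 + rng.normal(0, 0.6, (5, spots)),   # genes driven by program 1
                  m2 + rng.normal(0, 0.6, (5, spots))])  # genes driven by program 2

# Gene-gene Pearson correlation on the (normalized) expression matrix,
# then average linkage on L1 (cityblock) distances between correlation profiles
corr = np.corrcoef(expr)
Z = linkage(pdist(corr, metric="cityblock"), method="average")
modules = fcluster(Z, t=2, criterion="maxclust")
print(modules)   # genes 1-5 and genes 6-10 fall into separate modules
```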
• asked a question related to Correlation Analysis
Question
Is it possible to run a correlation test on a continuous DV and a categorical IV with 3 levels?
I'm investigating whether gender is associated with academic procrastination; however, my gender variable is coded as 0 = Male, 1 = Female, 2 = Non-Binary.
Initially I ran a Pearson product-moment correlation; however, I have now realised that this may not be the right procedure.
Any help would be greatly appreciated!
Run a one-way ANOVA with Tukey's HSD post-hoc test.
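A minimal scipy sketch of that suggestion (simulated scores; `tukey_hsd` requires scipy 1.8+):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
male = rng.normal(50, 10, 40)          # simulated procrastination scores per group
female = rng.normal(55, 10, 40)
nonbinary = rng.normal(52, 10, 40)

# Omnibus one-way ANOVA across the three gender groups
f, p = stats.f_oneway(male, female, nonbinary)
print(f, p)

# Tukey's HSD for pairwise comparisons
res = stats.tukey_hsd(male, female, nonbinary)
print(res.pvalue)   # 3x3 matrix of pairwise p-values
```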
• asked a question related to Correlation Analysis
Question
I have categorical variables, and I want to test the relationships between the categorical variables of two sets. I have SPSS version 22, and I can't find how to run a nonlinear canonical correlation analysis.
• asked a question related to Correlation Analysis
Question
Hello, I have a question regarding an investigation by Reuter et al. (2019), from which I have managed to extract the questions (5-point Likert scale) detailed below.
Attitude
- Fake news poses a threat
- Fake news can manipulate the population's opinion
- Fake news can manipulate the opinion of politicians, journalists and other influential players
- Fake news harms the democracy
- It's the state's task to prevent fake news
- Social bots pose a threat
- The state censorship poses a threat
- Fake news is just a pretext to be able to fight system critical actors
- Fake news is at most annoying, but does not pose a threat
Interaction
- I have perceived fake news
- I have deleted/reported fake news
- I have disliked fake news
- I have commented on fake news
- I have liked/disliked fake news
- I have shared fake news
- I have created fake news
Since this research has not named its questionnaire, and it does not yet have defined dimensions, can I take it as a reference in my research?
Of course, the issue later will be validation and reliability. Can you please guide me as to whether there is already a defined questionnaire for these variables (attitude, interaction)?
It sounds like they have published their items, so you can use them in your research. But be sure to reverse some of the items in the first scale. As for the second scale, it seems to be a real mix of content, from disliking fake news to creating it.
What do you mean by a "dimension less" Likert scale?
• asked a question related to Correlation Analysis
Question
Excuse me if this is a question with an obvious answer (I am an MSc student, not a professional researcher). I am exploring the correlations between a number of variables (different social skills with anxiety, age, and cognitive factors). I also wanted to see if there are gender differences in these variables/correlations.
Comparison of means between genders shows no significant difference between males and females for any variable.
If I enter gender as a variable and explore correlations with other variables there are also no significant correlations.
However, when I use the split-group function (by gender) in SPSS and run my correlation analysis for all other variables (not including gender), some correlations are significant only for male participants and some only for female. I just want to check that this makes sense (I think it does). I imagine that using the split-group function works a bit like using gender as a moderator, in that I can see how gender affects the relationship between certain variables (e.g., x correlates with y in group a but not in group b). So I believe I have found that, although there are no differences in mean scores between genders, gender does have an interaction effect on the correlation between some variables.
Any tips on how to report on this? and does this seem correct? would it be better to actually just run a moderation analysis?
Do regression, not correlation. Test the coefficient for gender differences. Choose a reasonable dependent variable.
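Concretely, the split-group finding can be tested formally with an interaction term, which is the regression version of a moderation analysis; a statsmodels sketch (simulated data in which the x-y relationship exists in only one group):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(8)
n = 300
gender = rng.integers(0, 2, n)        # 0/1 coding, illustrative two-group case
x = rng.normal(size=n)
y = 0.5 * x * gender + rng.normal(size=n)   # slope exists only when gender == 1
df = pd.DataFrame({"x": x, "y": y, "gender": gender})

# The x:gender interaction tests whether the slope differs by gender,
# i.e. the formal version of "significant in one split group but not the other"
fit = smf.ols("y ~ x * gender", df).fit()
print(fit.params["x:gender"], fit.pvalues["x:gender"])
```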
• asked a question related to Correlation Analysis
Question
When one conducts a canonical correlation analysis, several functions are extracted (as many as the number of variables in the smaller data set). The first function has a high correlation and contributes most of the variation, while the others contribute very little. Can the researcher select the functions that together account for 90 percent of the variance?
You can find many statistical applications in SPSS.
• asked a question related to Correlation Analysis
Question
Hi everyone,
I am using two scales. For the BFI-2-S, previous research calculated reliability as the average Cronbach's alpha of the subscales (e.g., (Extraversion + Agreeableness + ...)/5).
For another scale, previous research calculated Cronbach's alpha across all items only once, and reported the correlations between all subscales.
1. What is the difference between the two approaches?
2. How much correlation is appropriate? I didn't find relevant literature.
I would have reported each scale's Cronbach's alpha, which corresponds to approach (1), and hoped they are all reasonably high (e.g., above .80).
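For reference, Cronbach's alpha for a single subscale is straightforward to compute directly; a numpy sketch with simulated items (the loadings and sample size are arbitrary):

```python
import numpy as np

def cronbach_alpha(items):
    """items: (n_respondents, k_items) array."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars / total_var)

rng = np.random.default_rng(10)
trait = rng.normal(size=300)
# 6 simulated items measuring one coherent subscale
extraversion_items = trait[:, None] + rng.normal(0, 0.7, (300, 6))
alpha = cronbach_alpha(extraversion_items)
print(alpha)   # reasonably high for a coherent subscale
```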
• asked a question related to Correlation Analysis
Question
I would like to study correlation between four transcripts (fold changes of mRNA expression) at different time intervals (5 time points). How can I perform this analysis?
Try a correlation matrix in R, e.g., using the corrplot package.
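An equivalent in Python, if that is closer to your workflow (the fold-change numbers are made up; with only five time points, rank-based correlation and cautious interpretation of p-values are advisable):

```python
import pandas as pd

# Hypothetical fold-change values for four transcripts at five time points
fc = pd.DataFrame({
    "geneA": [1.0, 1.8, 2.6, 3.1, 3.9],
    "geneB": [1.1, 1.9, 2.4, 3.3, 4.1],
    "geneC": [4.0, 3.2, 2.5, 1.7, 1.1],
    "geneD": [1.5, 3.0, 1.2, 2.8, 1.4],
}, index=["0h", "6h", "12h", "24h", "48h"])

# Spearman is robust to the monotone-but-nonlinear shapes typical of time courses
print(fc.corr(method="spearman"))
```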
• asked a question related to Correlation Analysis
Question
We performed a correlation analysis of solar wind speed (Vsw) with the z-component of the interplanetary magnetic field (Bz) during geomagnetic storm events of different strengths. We found varying results: a somewhat larger correlation coefficient for some events, but only a moderate association for others. What sort of results can we expect?
Are you suggesting that there might be a tendency for Bz to have one sign or the other (+/-) in correlation with the velocity of the solar wind? I can imagine that there might be something related to the solar cycle, where by the polarity of the sun's magnetic field changes, but I'd think that would be very weakly manifest (say) in the IMF by the time it reaches Earth, what with the IMF being distorted by solar wind and CMEs, etc. I'd be very surprised if you find a strong correlation. And, then, if you did, what would you do with it? Just asking.
• asked a question related to Correlation Analysis
Question
If, for example, I have a large enough dataset with multiple input variables and one target, but it is unknown whether the input variable(s) are correlated with the target, is there any way to quantitatively determine, from the data points alone, whether the target depends on the input variable(s)?
For example, I have three independent variables x,y, and z. And dependent variable (target) is r. Here for the purpose of demonstration, the (x,y) is known to be (m*cos(m)/c,m*sin(m)/c). This is a function of a spiral in a 2D space, where the m is an array of points and c is a constant. (Figure is attached) The target variable r is the distance of the (x,y) points from the origin (0,0) in the 2D cartesian space.
The independent variable z is said to have uniform random values and has no relation with the target variable r.
The values of the Pearson's r for an independent variable and the target are found to be
r_x,r = 0.03250883308649153
r_y,r = -0.10980064148604964
r_z,r = -0.17896621141606622
Now, to be specific, my question becomes: is there any quantitative way to show that the x and y variables together have a relationship with the target variable, while the variable z has no relationship with the target r?
I am not sure I follow; do you mean a regression equation, y = b0 + b1*x + e?
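One model-free option is mutual information, which, unlike Pearson's r, picks up nonlinear dependence; a scikit-learn sketch on a spiral like the one described (a simulated stand-in for the attached figure, with the scaling constant omitted for simplicity):

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(13)
m = rng.uniform(0, 6 * np.pi, 1000)
x, y = m * np.cos(m), m * np.sin(m)    # spiral coordinates
z = rng.uniform(-1, 1, 1000)           # unrelated noise variable
r = np.hypot(x, y)                     # distance from the origin (the target)

# Mutual information is zero only under independence, so it detects the
# nonlinear x-r and y-r dependence that Pearson's r misses
X = np.column_stack([x, y, z])
mi = mutual_info_regression(X, r, random_state=0)
print(mi)   # x and y carry information about r; z carries almost none
```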
• asked a question related to Correlation Analysis
Question
I have to find the correlation and causality between participants' smartphone use patterns and loneliness. The dependent variable is loneliness (baseline: 0 to 80; daily: a 0 to 10 numeric scale). The independent variables are daily smartphone screen time; time spent in several app categories such as social media, entertainment, and communication; physical activity; sleep; and some social variables and moods, such as how bored, anxious, satisfied, or productive the person felt. I will collect these independent variables for 10 days, with the psychosocial variables collected three times a day. I also have to use a couple of prediction algorithms to predict loneliness from the above-mentioned independent variables. But first, I have to establish the correlation and causality between the dependent and independent variables. So far I only have programming experience with Python, and no statistics background. I have about 9 months remaining to complete my thesis. Please share your valuable guidance and resources on how I can proceed with this challenge and get the most meaningful results.
Abiodun Christian Ibiloye Thank you so much sir for your enormous suggestions.
• asked a question related to Correlation Analysis
Question
The aim of my research is to analyse the correlation between two delta values (change between two timepoints) via regression analysis.
Let the variables be X, Y, and Z, and t0 represent pre-intervention and t1 represent post-intervention. X is a psychometric value (Visual Analogue Scale ranging from 0 to 100), Y and Z are biological values.
For example, I want to calculate the correlation between delta (Xt1 - Xt0), delta (Yt1 - Yt0), and delta (Zt1 - Zt0).
I am aware that delta values are statistically inefficient; therefore, Pearson's or Spearman's correlation is out. I would appreciate any advice or model examples. Thanks!
One main issue with correlations of observed variable difference (change) scores is that these correlations may be strongly attenuated due to measurement error (unreliability). In terms of classical test theory, observed variable difference (change) scores often have low reliabilities because measurement error from both pretest and posttest affect the error variance component of the difference (change) score. To avoid the problem of low change score reliability, latent difference (change) scores can be used that are based on differences between true scores rather than observed scores. (True scores are by definition free of measurement error.) The correlations between "true difference scores" can be estimated using methods of structural equation modeling/longitudinal confirmatory factor analysis.
To mathematically identify latent difference score variables, you either need multiple (at least 2) observed variables ("indicators", measures) for each construct X, Y, and Z at each of the two time points or find a way to otherwise identify the error variance component of each observed variable X, Y, Z through appropriate constraints.
If you have appropriate estimates of the reliabilities of X, Y, and Z, you could derive (compute) the error variance components and specify them accordingly in a latent difference (change) score model with fixed error variances. Other options may be available for your design. You could check out the extensive literature on latent change score modeling to explore this option further and see if it works for your design, e.g.:
Another option that may be applicable in your case is a relatively simple computational correction for attenuation, see, e.g.:
Again, this also requires that the reliability of the change scores be known.
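For the attenuation-correction option, the computation itself is one line; a small sketch with made-up reliability values:

```python
def disattenuated_r(r_obs, rel_x, rel_y):
    """Spearman's correction for attenuation:
    r_true is approximately r_observed / sqrt(rel_x * rel_y)."""
    return r_obs / (rel_x * rel_y) ** 0.5

# e.g. an observed change-score correlation of .30, with change-score
# reliabilities of .60 and .70 (hypothetical values)
print(round(disattenuated_r(0.30, 0.60, 0.70), 3))   # 0.463
```

The corrected value is only as trustworthy as the reliability estimates that go into it, which is exactly the caveat above.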
• asked a question related to Correlation Analysis
Question
Dear colleagues.
Hello, I am currently studying brain connectivity in the disease group.
Recently, I constructed a connectivity matrix from each participant's neuroimaging data (diffusion tensor imaging), then ran an edge-wise correlational analysis against a neuropsychological score using a tool similar to Network-Based Statistics (NBS; Zalesky et al., 2010).
As a result, I got an edge-level network consisting of 10 edges with 19 nodes.
Those significantly denoted edges are not connected to each other but identified as single edges.
Conventionally, I have used graph-theoretical measures such as degree or betweenness centrality for defining hub nodes (i.e., hub region = betweenness centrality above mean + 1 SD within the nodes of the network).
However, in this case, I have an edge-level network that is hard to call clustered or connected, since it consists of multiple single edges.
From here, I want to emphasize the more significant edges or nodes within the identified network as a hub region (it is hard to call it a hub, but at least for easy comprehension), but I am struggling with what approach to take.
All discussion and suggestions are welcomed here.
Or if I am misunderstanding any, please give me feedback or comments.
Jean
For example, you could invent a measure based on clustering coefficients.
You could consider all vertices that lie within a certain maximum shortest-path distance, and count all the edges among them.
It can also be interesting to determine the number of cliques a vertex takes part in, or the distribution of the number of vertices in those cliques, or the overlap percentages between any pair of cliques a vertex is in.
Regards,
Joachim
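For the simpler degree-based variant of the hub definition, a plain-Python sketch on a made-up edge list shaped like the result described (10 edges, mostly isolated pairs plus one node collecting several edges):

```python
from collections import Counter
from statistics import mean, stdev

# Hypothetical NBS-style result: 10 significant edges, mostly isolated pairs
edges = [(1, 2), (1, 3), (1, 4), (1, 5), (2, 6), (7, 8),
         (9, 10), (11, 12), (13, 14), (15, 16)]

degree = Counter()
for u, v in edges:
    degree[u] += 1
    degree[v] += 1

# Within the identified subnetwork, flag nodes whose degree exceeds mean + 1 SD,
# mirroring the "centrality above mean + 1 SD" hub convention at edge level
values = list(degree.values())
cutoff = mean(values) + stdev(values)
hubs = [node for node, d in degree.items() if d > cutoff]
print(hubs)   # node 1 collects several otherwise-isolated edges
```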
• asked a question related to Correlation Analysis
Question
I conducted an ADF test and found that one of my four variables was stationary in level form. That variable was also stationary in first-difference form. Do I need to bring all the variables to the same form, I(1), before running Pearson's product-moment correlation analysis?
Hi Etuk,
Thanks for your response, but the issue is that if you do not bring all the variables to the same form, you may fall foul of one of the requirements of Pearson's correlation, namely an equal number of observations. Differencing, as a data transformation technique, loses one observation per difference; therefore, with a mix of level and I(1) variables, you will not have equal observations, as each I(1) variable will have lost one observation.
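The bookkeeping is easy to handle in pandas: difference the I(1) variables, then drop the resulting missing row so all series align (toy numbers):

```python
import pandas as pd

s = pd.Series([10.0, 12.0, 11.0, 15.0, 14.0], name="level")
d = s.diff()          # first difference: loses the first observation (NaN)

# When pairing a level variable with an I(1) variable's difference,
# drop the NaN row so both series have equal length before correlating
df = pd.DataFrame({"level": s, "diff": d}).dropna()
print(len(s), len(df))   # 5 observations shrink to 4 aligned pairs
```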
• asked a question related to Correlation Analysis
Question
I want to run a correlation matrix analysis between multiple response variables. The data were obtained from the operation of a continuous bioreactor over a 330-day period, but the operating conditions were changed every three months. I am not sure whether I have to run a correlation test for each operational strategy (i.e., the condition that was changed every three months; in this case, the concentration of a contaminant) or whether I can run a single test over the whole operation of the reactor.
The results make more sense if I run a single test, but I am not sure I can do that given that the conditions were changed. I do know that, for environmental analyses in general, it is quite common to run correlation analyses even when other factors change over time.
You can answer all of these questions at once by wisely applying regression analysis. Of course, this will involve dummy variables, ANOVA, ANCOVA, etc.
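A sketch of that suggestion with statsmodels (simulated reactor data, hypothetical numbers): one regression over the whole run, with dummy variables absorbing the three-monthly condition changes:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(12)
days = np.arange(330)
period = (days // 90).astype(str)                     # condition changed every ~3 months
setpoint = np.array([5.0, 10.0, 20.0, 40.0])[days // 90]
contaminant = setpoint + rng.normal(0, 1, 330)        # daily variation around each setpoint
removal = 90 - 0.3 * contaminant + rng.normal(0, 2, 330)
df = pd.DataFrame({"removal": removal, "contaminant": contaminant, "period": period})

# One model over the whole 330-day run; C(period) dummies absorb the regime
# shifts instead of splitting the data into four separate correlation tests
fit = smf.ols("removal ~ contaminant + C(period)", df).fit()
print(fit.params["contaminant"])
```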
• asked a question related to Correlation Analysis
Question
Why is the importance of analyzing and conducting research on the correlation between the stock market valuation of securities (stocks, bonds, etc.) and the economic and financial situation of business entities growing?
In recent years, the importance of examining the correlation between the stock market valuation of securities (stocks, bonds, etc.) and the economic and financial situation of business entities has been growing, because there are more and more anomalies and speculation on capital markets, which may weaken these correlations. Studying this issue matters particularly when, in certain listed markets, valuations of financial assets and instruments become less and less related to economic fundamentals and the significance of speculation grows. If such a lack of correlation between stock market valuations and the economic and financial situation of business entities keeps growing, a financial and/or economic crisis may occur. Such an increase in speculative factors on stock exchanges and commodity markets appeared in 2006-2008 and contributed to the global financial crisis of 2008.
Do you agree with me on the above matter?
In the context of the above issues, I am asking you the following question:
Why is the importance of analyzing and conducting research on the correlation between the stock market valuation of securities (stocks, bonds, etc.) and the economic and financial situation of business entities growing?
I invite you to the discussion
Thank you very much
Best wishes
Dear Nazir Ali,
Yes, you asked a key question in the context of the topic of this discussion. Typically, it is in the event of an exceptionally large overvaluation or undervaluation of market valuations of listed securities that these kinds of questions are asked.
Thank you very much,
Best wishes,
Dariusz Prokopowicz
• asked a question related to Correlation Analysis
Question
Hello,
I am doing my MSc dissertation research on mental health. The aim of my project is to study the association between children's mental health and their parents' mental health, before and during Covid-19.
I am using one of the UK Data Services datasets. The datasets are 4 waves (one before and three during Covid-19), each wave has three datasets (Adult dataset, children dataset and youth dataset).
The mental health of children has been assessed using the Strengths and Difficulties Questionnaire (SDQ) and the parents' mental health has been evaluated using the General Health Questionnaire (GHQ). Both the SDQ and the GHQ give me scores that indicate mental health.
When I merge the three datasets of each wave alone, I end up with SDQ just for children and GHQ just for adults and missing in both. Therefore, I couldn't run the analysis.
My questions are:
How do I properly merge these datasets from all four waves into one dataset?
What is the suitable statistical test (Pearson's correlation for each wave alone, or one-way ANOVA)?
Thank you so much
It sounds like your data is in a long format. You may want to merge your data in a wide format.
The long-format data look like the following.
id child sdq ghq
1 1 x
2 1 x
3 1 x
1 0 x
2 0 x
3 0 x
For the rows with the indicator child = 1, the parents' data (ghq) is missing. For the rows with the indicator child = 0, the child's data (sdq) is missing. Note that each id appears twice: once for a parent's data and once for a child's data.
The wide-format data should look like this.
id sdq-c ghq-p
1 x x
2 x x
3 x x
If that is the case, you will need to restructure your data.
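If the data are in pandas, the restructuring described above can be sketched like this (the column and id names are made up; `max` is simply a way to pick the single non-missing value per group):

```python
import pandas as pd

# Toy long-format data: one row per person, a child indicator, and the score
# that applies to them (SDQ for children, GHQ for parents); ids repeat.
long_df = pd.DataFrame({
    "hid":   [1, 2, 3, 1, 2, 3],
    "child": [1, 1, 1, 0, 0, 0],
    "sdq":   [12, 15, 9, None, None, None],
    "ghq":   [None, None, None, 20, 18, 25],
})

# Collapse to one row per household: the child's SDQ next to the parent's GHQ.
wide = (long_df.groupby("hid")
        .agg(sdq_c=("sdq", "max"), ghq_p=("ghq", "max"))
        .reset_index())
print(wide)
```

Once each row pairs one child score with one parent score, a correlation across rows becomes possible.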
• asked a question related to Correlation Analysis
Question
I've completed 3 Pearsons correlations for the following 2-tailed hypotheses:
(H1): A significant correlation between total NCS scores and total MMQ-Contentment scores exists
(H2): A significant correlation between total NCS scores and total MMQ-Ability scores exists
(H3): A significant correlation between total NCS scores and total MMQ-Strategy scores exists
And found r values of 0.2 (p = .004), 0.3 ( p <.001) and 0.13 (p = .069) respectively.
I've also collected data on participant age ranges (e.g. "30-35") and level of education ("Master's degree") and would like to test if either of these are covariates.
What is/are the most appropriate statistical test(s)?
Run regressions to determine whether the potential confounding variables have any effect on the DVs of interest. If they do, they may be confounders. Then see the attached screenshot for more detailed information. Best wishes, David Booth
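One way to follow this advice, sketched in Python with NumPy (simulated scores; the variable names are illustrative, not taken from the actual questionnaires): regress both NCS and MMQ on the candidate covariates and correlate the residuals, which gives the partial correlation controlling for age band and education.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200

# Hypothetical scores: age band and education coded as ordinal covariates.
age_band = rng.integers(0, 6, n).astype(float)
education = rng.integers(0, 4, n).astype(float)
ncs = 0.5 * age_band + rng.normal(0, 1, n)
mmq = 0.4 * ncs + 0.3 * age_band + rng.normal(0, 1, n)

def residualize(y, covs):
    """Residuals of y after regressing out the covariates (with intercept)."""
    X = np.column_stack([np.ones(len(y))] + covs)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

# Partial correlation of NCS and MMQ controlling for age band and education.
r_partial = np.corrcoef(residualize(ncs, [age_band, education]),
                        residualize(mmq, [age_band, education]))[0, 1]
r_raw = np.corrcoef(ncs, mmq)[0, 1]
print(round(r_raw, 2), round(r_partial, 2))  # raw r is inflated by age here
```

If the partial correlation is clearly smaller than the raw one, the covariate is doing some of the work.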
• asked a question related to Correlation Analysis
Question
I developed a Cox PH model with time-dependent covariates to predict probability of default. One of these covariates (unemployment) is non-stationary and its coefficient (in exponential form) is enormous, specifically 2.518e+32; I can't tell whether this value is due to non-stationarity. Can non-stationarity trigger such a large coefficient?
p.s. This variable was important according to several variable selection techniques, such as Information Value, Xgboost variable importance plot, correlation analysis.
• asked a question related to Correlation Analysis
Question
Hi!
I have run a canonical correlation analysis on my data and want to display it for a poster as a plot but am having trouble finding anywhere online describing how to do this in SPSS 27. Any advice would be appreciated!
Hello Laura,
There's no direct option for this in SPSS. However, do have a look at the candisc library in R (specifically, the plot.cancor function).
• asked a question related to Correlation Analysis
Question
Hi Fellows,
The matrix is at the bottom of this page: https://statweb.stanford.edu/~jtaylo/courses/stats202/visualization.html. A similar version appears in the book Introduction to Data Mining. It's clear that colours toward the red end indicate stronger correlation, but which attributes or variables are actually correlated here? For example, along the main diagonal, cases of the same species show mostly perfect correlation, with a few near-perfect occurrences. Normally, a correlation is calculated from two columns of values, not from two single cases.
Thanks
RP
• asked a question related to Correlation Analysis
Question
Hi,
I'm testing the relationship between dividends and share repurchases using panel regression. I'm trying to include a correlation analysis, and the results are much the same for all variables: they are not correlated very much. The correlation coefficient is between -0.1 and 0.1 for most of them, or the p-value is much higher than 0.05. Is this an expected outcome for variables used in a fixed-effects model? (I've done the Hausman test, which suggested the fixed-effects model.) I'm just wondering whether this is a bad sign that the model is poor.
Correlation can change completely when you use fixed effects. As you know, the results under FE and RE can be totally different.
In terms of correlation, you can compare the correlation between the variables in levels and the variables in changes (or demeaned). If you compute the correlation of the changes in the variables, this should match what the FE model actually uses.
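A toy illustration of that comparison in Python (pandas; the panel is simulated): a firm-level effect creates a strong correlation in levels that disappears once the variables are demeaned within firms, which is the variation the FE estimator actually uses.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)

# Toy panel: 50 firms x 10 years. Dividends and repurchases share a firm
# effect (cross-sectional correlation) but move independently over time.
firms = np.repeat(np.arange(50), 10)
firm_effect = rng.normal(0, 2, 50)[firms]
div = firm_effect + rng.normal(0, 1, 500)
rep = firm_effect + rng.normal(0, 1, 500)

df = pd.DataFrame({"firm": firms, "div": div, "rep": rep})
r_levels = df["div"].corr(df["rep"])

# Within-firm (demeaned) correlation: subtract each firm's own mean.
demeaned = df.groupby("firm")[["div", "rep"]].transform(lambda x: x - x.mean())
r_within = demeaned["div"].corr(demeaned["rep"])
print(round(r_levels, 2), round(r_within, 2))  # levels high, within near zero
```

A near-zero within correlation alongside a high levels correlation is not a sign of a bad model; it just says the relationship lives in the cross-section, not in the time variation.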
• asked a question related to Correlation Analysis
Question
I performed a growth-performance experiment on microalgae with four treatments, in which I measured cell dry weight (mg/L), cell density (×10^5 cells/ml), chlorophyll a (µg/ml) and beta-carotene (µg/ml) content of the microalgae. A reviewer suggests that I analyze the correlation of cell growth/size with the pigment content (chlorophyll a and beta-carotene). So, how can I measure this correlation using Microsoft Excel or other suitable data-analysis software? Thank you.
Thank you all
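For a software-agnostic starting point, the correlations can be computed in a few lines of Python with SciPy (the numbers below are invented placeholders for the measured values); in Excel the same coefficient comes from the CORREL (or PEARSON) worksheet function.

```python
import numpy as np
from scipy.stats import pearsonr

# Made-up measurements across treatments/replicates (units as in the
# question: dry weight in mg/L, pigments in ug/ml).
dry_weight = np.array([210, 250, 320, 400, 180, 260, 310, 390])
chl_a      = np.array([2.1, 2.6, 3.3, 4.1, 1.9, 2.7, 3.2, 4.0])
b_carotene = np.array([0.4, 0.5, 0.7, 0.9, 0.35, 0.55, 0.65, 0.85])

r_chl, p_chl = pearsonr(dry_weight, chl_a)
r_car, p_car = pearsonr(dry_weight, b_carotene)
print(f"dry weight vs chl a: r={r_chl:.2f}, p={p_chl:.4f}")
print(f"dry weight vs b-carotene: r={r_car:.2f}, p={p_car:.4f}")
```

If the scatterplots look curved rather than linear, substitute `spearmanr` for `pearsonr` with the same call signature.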
• asked a question related to Correlation Analysis
Question
I want to investigate the relationship between differences in coral physiological variables based on euclidean distances and seawater environmental variables using DISTLM and dbRDA in PRIMER, but I am not sure if this analysis is suitable given the lack of replication I have in my predictor variable (environmental) matrix.
I have attached an excel file illustrating the structure of my data set (the response and predictor variables). Briefly, I have a multivariate data set of measured physiological variables (e.g. lipid concentration, protein concentration, tissue biomass etc.) for corals collected from five different locations (A-E), where each site is very unique in its seawater physico-chemical parameters. I collected 12 corals per site (total of 60 samples). I have constructed a resemblance matrix of the physiological data in PRIMER based on Euclidean distances, and there is clear grouping of data points in the NMDS, which coincides with the different collection sites for each coral. I want to investigate the proportion of the observed variation in the multivariate data cloud that can be explained by the environmental characteristics of each collection site (e.g. mean annual sea surface temperature, seawater chlorophyll concentration, salinity etc.). However, the dataset of environmental variables does not have replication. i.e. for each site (A-E), I only have one value for mean annual sea surface temp, one value of salinity etc.
All of the case-study examples I have read about distance-based redundancy analysis in R or PRIMER have two resemblance matrices (predictor and response) both of which have replication. However, in my case, my response variables have replication (i.e. 12 samples per site), whereas my environmental variables do not have replication (i.e. one measurement per variable per site).
Can someone advise me whether or not dbRDA is suitable in this instance? If as I predict, it is not suitable, can you recommend a better approach? I am not an expert in multivariate statistics, but I want to make sure that the approach I take is sound.
Any and all advice is welcome. Thanks
Hi Rowan, I am in a similar situation. What I did was use an average of the response variables, but I do not know if that is the optimal solution. Did you solve this riddle in the end?
• asked a question related to Correlation Analysis
Question
Hello Everyone,
I am currently supporting a research project that calculates the correlation between two variables for an experimental group and compares the correlation coefficient with that of a control group whose correlation coefficient is known. The hypothesis is that the correlations of the two groups are the same.
How do I determine when sufficient data has been collected for the experimental group?
Note: when the study started, a power analysis had not been done, so the effect size was not known.
I want to use bootstrapping but am not sure if it is the right approach. Thanks for your response.
It would then be best to think about employing an optional stopping rule. This should be unproblematic as long as you use a Bayesian approach:
If you are looking for a free, easy-to-learn software that can perform Bayesian (and frequentist analysis) check out JASP.
Best of luck with your research,
Sven
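A sketch of the bootstrap idea mentioned in the question, in Python (all data simulated; the "known" control coefficient of 0.45 is an assumed placeholder): resample the experimental pairs, build a percentile confidence interval for r, and check whether the control value falls inside. One pragmatic stopping rule is to keep collecting data until the interval is narrower than a pre-chosen width.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical experimental group: n paired observations with true r ~ 0.5.
n = 80
x = rng.normal(size=n)
y = 0.5 * x + rng.normal(0, np.sqrt(1 - 0.25), n)

# Bootstrap the sample correlation and compare the control group's known
# coefficient (assumed here to be 0.45) against the 95% interval.
r_known = 0.45
boots = np.empty(2000)
for b in range(2000):
    idx = rng.integers(0, n, n)              # resample pairs with replacement
    boots[b] = np.corrcoef(x[idx], y[idx])[0, 1]
lo, hi = np.percentile(boots, [2.5, 97.5])
print(f"bootstrap 95% CI: ({lo:.2f}, {hi:.2f}); "
      f"contains r_known: {lo <= r_known <= hi}")
```

If you go the Bayesian route instead, the analogous object is the posterior credible interval for the difference in correlations, monitored as data accumulate.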
• asked a question related to Correlation Analysis
Question
Relationship between conducted scientific research and innovations
Innovations can be the result of conducted research and scientific research.
Research work may concern, for example, defining, developing and planning the implementation of innovative technologies in production processes, or determining potential industrial applications of new, innovative types of materials, e.g. organic, biodegradable materials replacing plastics that are difficult to degrade.
Research work is a process that should lead to the achievement of specific research, analytical and implementation goals: the development of new solutions, formulas, laws, dependencies, correlations, inventions, patents and sometimes also innovations. Research work requires financial outlays, knowledge, and the human resources of educated, experienced research staff. Innovations, on the other hand, are newly created added value, whose worth is determined by the possibilities of applying a particular innovation in manufacturing, production or service provision. Innovation is, in a sense, a product of previously conducted research.
I invite you to the discussion
Dear Osama Rahil Shaltami,
Yes, without research, many innovations would not arise.
Best regards,
Dariusz Prokopowicz
• asked a question related to Correlation Analysis
Question
Hello everyone,
I'm currently writing my bachelor's thesis about the impacts of COVID-19 on the hotel sector in Germany, with the goal of developing a training manual for hotel management and entrepreneurs on how to cope with future pandemics.
One of my sub-RQs is: What is the correlation between hotels' responses to the pandemic and their occupancy rates?
How can I answer that question if the hotels' responses are qualitative data (interviews about entrepreneurial behaviour; e.g., one hotel said that to cope with the pandemic they increased their social media presence and improved their online appearance) and the occupancy rates are quantitative data?
Basically, my goal is to support my recommendations - which will be a training manual for hotel management and entrepreneurs on how to cope with future pandemics - by saying Hotel A did this and their occupancy rate increased (I'm obviously aware that correlation doesn't mean causation, and this will also be one of the major limitations of my research, as I am only using one hotel KPI).
Best,
Felix
Suggestion: Code your qualitative data into a small number of categories that reflect the main ways hotels cope. Assuming there are several hotels that fall in each category, you could do a one-way analysis of variance to see if the mean occupancy rates differ among the category groups.
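The suggestion above can be sketched in Python with SciPy (the coping categories, sample sizes and occupancy figures are all invented):

```python
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(4)

# Hypothetical occupancy rates (%) for hotels grouped by the response
# strategy coded from the interviews (categories are illustrative).
digital_push    = rng.normal(55, 8, 10)   # e.g. expanded online presence
cost_cutting    = rng.normal(45, 8, 10)
no_major_change = rng.normal(40, 8, 10)

# One-way ANOVA: do mean occupancy rates differ across coded strategies?
stat, p = f_oneway(digital_push, cost_cutting, no_major_change)
print(f"F = {stat:.2f}, p = {p:.4f}")
```

With only a handful of hotels per category, also report the group means and spreads; the F-test alone will be underpowered.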
• asked a question related to Correlation Analysis
Question
I am trying to find the correlation/association between two categorical vectors. I tried using chi-squared, but it does not have a predefined upper or lower limit. That is when I found Tschuprow's T and Cramér's V. I now have several questions, and I would appreciate any help.
1. Are they good, informative indicators of categorical association?
2. Do they rely on p-values and degrees of freedom in the same way as chi-squared?
3. Are the scores skewed? I read that a Cramér's V above 0.25 indicates a very strong relation, which is not the case with the other metrics.
4. Do they have any preconditions for their application?
5. Can you recommend any other metrics that measure association/correlation/dependency for categorical/nominal values?
6. I understand that these metrics rely on contingency-table counts. Are there any metrics that use a different method?
I am looking for a metric that is as usable and informative as Pearson or Spearman correlation, with a score bounded between 0 and 1.
Cramer's V will range from 0 to 1. You can play with some toy data to see how it reacts in different cases. The interpretation for a "large" effect changes depending on how many categories there are (in the dimension with the smaller number of categories). These interpretations are addressed in Cohen, 1988, Statistical Power Analysis for the Behavioral Sciences, 2nd Edition. Some other effect size statistics to consider are Tschuprow's T and Goodman Kruskal lambda.
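A minimal Cramér's V implementation in Python (SciPy for the chi-squared statistic; the two toy tables are made up to show the two extremes):

```python
import numpy as np
from scipy.stats import chi2_contingency

def cramers_v(table):
    """Cramer's V for an r x c contingency table (0 = none, 1 = perfect)."""
    chi2 = chi2_contingency(table, correction=False)[0]
    n = table.sum()
    k = min(table.shape) - 1          # smaller dimension minus one
    return np.sqrt(chi2 / (n * k))

# Toy tables: a near-perfect association and a near-independent one.
strong = np.array([[40, 2], [3, 45]])
weak   = np.array([[22, 20], [19, 23]])
print(round(cramers_v(strong), 2), round(cramers_v(weak), 2))
```

The chi-squared statistic itself grows with sample size; dividing by n·k is exactly what bounds V between 0 and 1.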
• asked a question related to Correlation Analysis
Question
I have a data set of particulate concentration (A) and the corresponding emissions from cars (B), factories (C) and soil (D). I have 100 observations of A and the corresponding B, C and D. Let's say that no factor other than B, C and D contributes to the particulate concentration (A). Correlation analysis shows that A has a linear relationship with B, an exponential relationship with C and a logarithmic relationship with D. I want to know which factor contributes most to the concentration of A (the predominant factor). I also want to know whether a model such as the following can be built from the data set I have:
A = m*B + n*exp(C) + p*log(D), where m, n and p are constants.
Maybe you can consider the recursive least squares algorithm (RLS). RLS is the recursive application of the well-known least squares (LS) regression algorithm, so that each new data point is taken in account to modify (correct) a previous estimate of the parameters from some linear (or linearized) correlation thought to model the observed system. The method allows for the dynamical application of LS to time series acquired in real-time. As with LS, there may be several correlation equations with the corresponding set of dependent (observed) variables. For the recursive least squares algorithm with forgetting factor (RLS-FF), acquired data is weighted according to its age, with increased weight given to the most recent data.
Years ago, while investigating adaptive control and energetic optimization of aerobic fermenters, I applied the RLS-FF algorithm to estimate the parameters of the KLa correlation used to predict O2 gas-liquid mass transfer, giving increased weight to the most recent data. Estimates were improved by imposing sinusoidal disturbances on air flow and agitation speed (the manipulated variables). The power dissipated by agitation was measured with a torque meter (pilot plant). The proposed (adaptive) control algorithm compared favourably with PID. Simulations assessed the effect of numerically generated white Gaussian noise (2-sigma truncated) and of first-order delay. This investigation was reported at (MSc Thesis):
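Separately from the recursive approach above: the model written in the question is linear in its constants once the transforms are applied, so ordinary least squares fits it directly. A sketch in Python (all data simulated with known coefficients):

```python
import numpy as np

rng = np.random.default_rng(5)

# Simulated predictors standing in for the question's sources:
# B (car, linear), C (factory, exponential), D (soil, logarithmic).
n = 100
B = rng.uniform(0, 5, n)
C = rng.uniform(0, 2, n)
D = rng.uniform(1, 10, n)
A = 1.5 * B + 0.8 * np.exp(C) + 2.0 * np.log(D) + rng.normal(0, 0.2, n)

# A = m*B + n*exp(C) + p*log(D) is linear in (m, n, p) after transforming,
# so least squares on the transformed columns fits it directly.
X = np.column_stack([B, np.exp(C), np.log(D)])
coef, *_ = np.linalg.lstsq(X, A, rcond=None)
m_hat, n_hat, p_hat = coef
print(np.round(coef, 2))  # close to the true (1.5, 0.8, 2.0)

# Crude dominance check: coefficient times the spread of its transformed term.
contrib = np.abs(coef) * X.std(axis=0)
print("dominant source:", ["car", "factory", "soil"][int(np.argmax(contrib))])
```

Comparing each coefficient scaled by the spread of its transformed predictor gives a rough answer to "which factor contributes most"; a formal answer would use standardized coefficients or variance decomposition.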
• asked a question related to Correlation Analysis
Question
We have a phylogenetic tree built from the 16S rRNA sequences of many bacteria, and we also have a phylogenetic tree built from protein sequences of those bacteria. Now, how can we assess the correlation between these two trees?
Hi there, you can assess the Mantel correlation between the phylogenetic distance matrices.
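A Mantel test can be sketched in plain NumPy without specialist packages (the toy distance matrices below are simulated; in practice you would use the patristic distance matrices extracted from the two trees):

```python
import numpy as np

rng = np.random.default_rng(6)

def mantel(d1, d2, n_perm=999):
    """Permutation Mantel test: correlation between two distance matrices."""
    iu = np.triu_indices_from(d1, k=1)           # upper-triangle entries only
    r_obs = np.corrcoef(d1[iu], d2[iu])[0, 1]
    count = 0
    n = d1.shape[0]
    for _ in range(n_perm):
        p = rng.permutation(n)                   # relabel taxa in one matrix
        r_perm = np.corrcoef(d1[p][:, p][iu], d2[iu])[0, 1]
        if abs(r_perm) >= abs(r_obs):
            count += 1
    return r_obs, (count + 1) / (n_perm + 1)

# Toy matrices: d2 is a noisy copy of d1, mimicking trees that agree.
x = rng.normal(size=(12, 4))
d1 = np.linalg.norm(x[:, None] - x[None, :], axis=-1)
d2 = d1 + rng.normal(0, 0.3, d1.shape)
d2 = (d2 + d2.T) / 2
np.fill_diagonal(d2, 0)

r, p = mantel(d1, d2)
print(f"Mantel r = {r:.2f}, p = {p:.3f}")
```

The permutation (rather than a table lookup) is what makes the p-value valid despite the non-independence of distance-matrix entries.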
• asked a question related to Correlation Analysis
Question
I am looking to perform a cross-correlation between Pc5 intensity data and solar-wind parameters during Pc5 pulsations. Is it necessary to detrend the data before correlating them, or is detrending not strictly necessary? What is the difference between cross-correlation with and without detrending?
Generally you should detrend before trying to measure correlation. A classic paper on this subject is by Yule (1926):
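A toy illustration of why detrending matters, in Python with SciPy (both series are simulated; the trends and noise levels are invented):

```python
import numpy as np
from scipy.signal import detrend

rng = np.random.default_rng(7)

# Two series sharing an oscillation, each riding its own linear trend
# (a stand-in for Pc5 power and a solar-wind parameter).
t = np.arange(500, dtype=float)
common = np.sin(2 * np.pi * t / 50)
a = 0.02 * t + common + rng.normal(0, 0.3, 500)
b = -0.01 * t + common + rng.normal(0, 0.3, 500)

def corr(x, y):
    return np.corrcoef(x, y)[0, 1]

# Raw correlation is dragged toward the (opposing) trends; detrending
# isolates the genuine oscillatory co-variation.
print(round(corr(a, b), 2), round(corr(detrend(a), detrend(b)), 2))
```

Here the raw correlation comes out negative purely because of the trends, while the detrended series reveal the strong shared oscillation; that is the spurious-correlation trap Yule (1926) describes.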
• asked a question related to Correlation Analysis
Question
Hello,
I have conducted a survey experiment with two conditions. I want to investigate whether perceived physical fitness influences the investment decision of the investor. In the survey I had one 'normal' condition and one 'physically not fit' condition. In SPSS that is: 1 = Normal; 2 = NotFit. The investment decision is: 1 = Yes; 2 = No. Is it possible to run a correlation analysis on this to see what the effect of the IV on the DV is, or is this not possible?
So are the Normal and NotFit conditions something that you randomly allocated respondents to, or are they responses they gave to a question? If the former, then use a 2x2 chi-squared test or logistic regression; if the latter, then no, you cannot establish that it "influences" the decision, as there would be other (presumably unmeasured) variables related to both.
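For the randomized-allocation case, the 2x2 chi-squared test looks like this in Python (the counts are invented placeholders):

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical counts: rows = condition (normal / not fit),
# columns = investment decision (yes / no).
table = np.array([[60, 40],    # normal condition
                  [35, 65]])   # "not fit" condition

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2({dof}) = {chi2:.2f}, p = {p:.4f}")

# Effect direction as a simple odds ratio: odds of investing under the
# normal condition relative to the "not fit" condition.
odds_ratio = (table[0, 0] * table[1, 1]) / (table[0, 1] * table[1, 0])
print(f"odds ratio = {odds_ratio:.2f}")
```

Logistic regression gives the same comparison (the coefficient is the log of this odds ratio) and additionally allows covariates.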
• asked a question related to Correlation Analysis
Question
I have two scenarios, which would you prefer while analyzing your data?
a) Is it all right to enter into a multiple regression/logistic regression analysis all the predictors in your data set that are significant in univariate tests?
This is irrespective of whether the variables are demographic or related to obstetric health (i.e., seemingly unrelated variables in one model).
b) Or would you make separate regression models, grouping together similar set of variables (related), predicting a single dependent variable?
Which approach is better?
There are several methods for identifying the best model for your regression analysis. You could try multiple models and see which one has the best goodness of fit and explains the largest proportion of the variability of your outcome. Regards
• asked a question related to Correlation Analysis
Question
Dear All,
I have these data: two variables are continuous and one is categorical. I wonder, is there any method to perform a correlation analysis of the impact of the two continuous variables on the one categorical variable?
The example of the data is attached below.
Well, if the DV is really categorical, you might try some kind of logistic regression. I have attached some course notes that you may find helpful. D. Booth
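A sketch of that suggestion in Python with scikit-learn (simulated data with known coefficients; in practice the two continuous variables are your predictors and the categorical variable the outcome):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(8)

# Simulated data: two continuous predictors, one binary category.
n = 300
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
logit = 1.2 * x1 - 0.8 * x2
y = (rng.uniform(size=n) < 1 / (1 + np.exp(-logit))).astype(int)

# Logistic regression: models the probability of the category from x1, x2.
model = LogisticRegression().fit(np.column_stack([x1, x2]), y)
print("coefficients:", np.round(model.coef_[0], 2))
```

The fitted coefficients recover the signs (and roughly the sizes) of the simulated effects; with more than two categories, multinomial or ordinal logistic regression is the analogue.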
• asked a question related to Correlation Analysis
Question
I want to compare fossil insect densities with charcoal concentrations and other disturbance indexes.
My issue is that the insect samples were subsampled according to geochemical patterns in the soil and are of varying depth resolution (between 2 and 8 cm) and temporal resolution (20-120 years). The core from which the other disturbance indexes were counted is from the same site but was subsampled at 1-cm resolution. We have a hunch that bark beetles and charcoal densities are related, but how can we test this statistically? Is it even possible to correlate these insect time slices of varying resolution with the data points from the other core?
Example insect samples:
S1 0-4 cm 2017 - 2004
S2 4-6 cm 2004 - 1969
S3 6-8 cm 1969 - 1946
S4 8-11 cm 1946 - 1899
S5 11-15 cm 1899 -1841
S6 15-17 cm 1841 - 1806
Finally, my colleague ran an analysis in R with the libraries bincor (Polanco-Martinez, 2018), ggplot2 (Wickham, 2016) and trend (Pohlert, 2018). He used a binned correlation approach that allows for irregular time series (Mudelsee, 2012). In the calculation, he used the rule based on average spacing, as it performed better for him than Monte Carlo simulations. We found a significant correlation between charcoal volumes and the number of primary bark beetle fossils and are in the process of submitting our manuscript to a fitting journal. Thank you for your replies and interest in this question. :)
• asked a question related to Correlation Analysis
Question
In statistics, linear regression is a linear approach to modelling the relationship between a scalar response and one or more explanatory variables (also known as dependent and independent variables).
From the data analysis sheet, I produced the two figures. Which one would you select for your manuscript, and why choose one over the other?
Interesting.
• asked a question related to Correlation Analysis
Question
Hello everyone,
I have been trying to solve this riddle but I am not getting any closer. I am in desperate need of help.
I want to examine the relationship between two continuous (scaled) variables.
I cannot satisfy the assumptions for Pearson's, as my variables do not show a linear relationship. Therefore I considered Spearman's rho. However (as I am examining a relationship within a particular group), my sample size is small (n = 9), which makes Spearman's unreliable, and my variables do not show a monotonic relationship (although this could be because my sample size is small).
After some digging, I discovered Kendall's tau-b and read that it can be used with continuous and/or ordinal variables. However, Kendall's tau-b also assumes a monotonic relationship (my scatterplot shows non-monotonicity between the two variables), BUT it is a good association measure for small sample sizes.
I honestly do not know what to do. Should I use Kendall's tau-b? I am completely at a loss.
One year later, I have the same question...
Did you ever find the answer?
Thanks!
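For what it's worth, all three coefficients are cheap to compute side by side with SciPy, which at n = 9 is often the most honest thing to report alongside the scatterplot (the nine pairs below are invented):

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr, kendalltau

# Nine made-up paired observations (the sample size the question describes).
x = np.array([1.2, 2.5, 3.1, 4.8, 5.0, 6.3, 7.7, 8.1, 9.4])
y = np.array([2.1, 1.8, 3.9, 3.5, 5.2, 6.8, 6.1, 8.9, 8.4])

for name, (stat, p) in [("Pearson", pearsonr(x, y)),
                        ("Spearman", spearmanr(x, y)),
                        ("Kendall tau-b", kendalltau(x, y))]:
    print(f"{name}: {stat:.2f} (p = {p:.3f})")
```

With nine observations, no coefficient will be reliable; Kendall's tau-b is a reasonable choice mainly because its exact small-sample p-values are well behaved, not because it escapes the monotonicity assumption.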
• asked a question related to Correlation Analysis
Question
We are conducting a study on the correlation of COVID-19 pseudoscience theories with the proliferation of COVID-19 positive cases. Which statistical test should we use to correlate the data we are going to gather from the COVID-19 pseudoscience theories scale (4-point Likert scale) with the number of positive cases? (Data on the COVID-19 positive cases will come from the health departments' tally.) Thank you in advance to those who can help us. ☺️
• asked a question related to Correlation Analysis
Question
Hello,
My IV is personality with 5 levels. My DV is reaction time with 4 levels. I ran a Pearson correlation analysis and ended up with 20 results. Only 1 result is statistically significant. My question is: in my results section, do I report the significant result and say that the rest are not significant? Or do I have to report every single result in apa style (do it 20 times since there are 20 correlations)? I was thinking about creating a table to display the results as well. What would be the best way to report my results? Thank you!
Hello Agata,
If it made sense to evaluate each of the 20 combinations, then ideally, you should report each of the 20 correlations. One table should allow for both correlations, and summary statistics (e.g., mean & SD) to be reported easily. It's understandable that people generally tend to spend more time discussing the "significant" results than the non-significant results (however determined).
As for interpretation, I'd be cautious about over-emphasizing one significant relationship among 20 coefficients. If each was tested for statistical significance at the .05 level, then, by chance alone, you'd expect about 1 significant correlation out of every 20 obtained even if the population values were all zero.
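The chance-expectation argument above can be illustrated by simulation in Python (SciPy; the data are pure null, so any "significant" correlations are false positives):

```python
import numpy as np
from scipy.stats import t as t_dist

rng = np.random.default_rng(9)

# Simulate 20 correlation tests on independent (null) data, n = 200 each,
# mirroring the 20 coefficients reported in the question.
n, tests = 200, 20
p_values = np.empty(tests)
for i in range(tests):
    x, y = rng.normal(size=n), rng.normal(size=n)
    r = np.corrcoef(x, y)[0, 1]
    t_stat = r * np.sqrt((n - 2) / (1 - r ** 2))   # t transform of r
    p_values[i] = 2 * t_dist.sf(abs(t_stat), n - 2)

print("significant at .05:", int(np.sum(p_values < 0.05)))
print("significant after Bonferroni:", int(np.sum(p_values < 0.05 / tests)))
```

On average about one of the 20 null tests comes out "significant" at .05; a Bonferroni (or FDR) adjustment is the standard guard when all 20 are reported.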
• asked a question related to Correlation Analysis
Question
Hello everyone! I want to calculate the correlation between two variables, but their sample sizes are different: N1 = 45 and N2 = 20. Does anyone know how to solve this problem? Thanks!
It would help to know a bit more about the problem. I need to make assumptions in order to develop an answer.
Say you grew wheat. At grain filling stage you picked some heads and analyzed for three enzymes, and 4 nutrients. These tests are destructive, but on neighboring plants you let the heads mature and gathered yield data. Within each plot you could have taken subsamples, and then average the subsamples to remove some of the within-plot variability and pair the plot averages to run a correlation. Just document how this was done in the methods.
If the samples were measured on the same experimental units but there are missing values, the simplest approach is to remove the experimental units that have missing values. The problem is that if these missing experimental units have something in common then you have biased your results. You could try imputation if you have enough other information, but there are problems with imputing over half of your data. Here too there is a problem if all of your imputed values are from one subpopulation.
• asked a question related to Correlation Analysis
Question
Hello. I hope someone can help me with figuring out the latest approaches to doing a survey study. I have a couple of questions as following:
1. What would be a more advanced level of doing analysis for a survey study beyond descriptive statistical analysis and correlational analysis?
In particular, how would one decide between regression, factor analysis, decision tree analysis, and/or cluster analysis? Or something else?
2. What if the data set has many missing values? I have read threads with recommendations, but I lack the knowledge, and I hope to learn the easiest way possible without losing N. In fact, I'm tempted by mean substitution despite its bad reputation.
Ultimately, if the data set can't be managed properly, I guess I should just report the descriptive analysis and the correlational findings. Would this still be publishable if the topic is novel? Any book recommendations would be a great help; alternatively, a gold-standard survey study to use as an example would be useful too.
Any thoughts would be much appreciated.
Thank you so very much!
1. What is your research question?
2. A book that might help you is 'Using Multivariate Statistics' by Tabachnick and Fidell.
• asked a question related to Correlation Analysis
Question
In a correlation analysis between two variables the coefficient was -0.22, but in a multiple linear regression, due to the influence of other variables, it became +0.21. Conceptually it should be a negative correlation.
The change of sign is due to partial correlations between the regressors and the dependent variable; this happens when the regressors are not mutually independent. See: https://en.wikipedia.org/wiki/Partial_correlation.
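A small simulation of such a sign flip in Python (NumPy; the coefficients are invented): a third variable drives the two variables of interest in opposite directions, so the simple correlation is negative even though the regression coefficient, holding the third variable fixed, is positive.

```python
import numpy as np

rng = np.random.default_rng(10)

# z pushes x up and y down, while x's direct effect on y is positive.
n = 1000
z = rng.normal(size=n)
x = z + rng.normal(0, 0.5, n)
y = 0.2 * x - 1.0 * z + rng.normal(0, 0.5, n)

r_simple = np.corrcoef(x, y)[0, 1]               # negative: dominated by z
X = np.column_stack([np.ones(n), x, z])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)     # slope on x is positive
print(round(r_simple, 2), round(beta[1], 2))
```

Neither number is "wrong": the simple correlation describes the raw association, the regression coefficient the association net of the other regressors.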
• asked a question related to Correlation Analysis
Question
Dear colleagues, what do you think about applying Pearson's r or Spearman's rho correlation analysis to panel data? Is it possible to meaningfully interpret the results? Do you know of any study or research that would fit my question? I highly appreciate your help.
Best
Ibrahim
Zeynep Köylü Thank you!
Best
Ibrahim
• asked a question related to Correlation Analysis
Question
I need to identify the most influential input factor in an ANN. This can easily be done with simple correlation analysis, but can anyone suggest a method within the ANN model itself?
Hi;
You may find your answer by having a look at Table 2 of this paper: 10.1007/s10346-009-0183-2
Also, you need to read the relevant explanations in the body to figure it out.
Hope it helps.
• asked a question related to Correlation Analysis
Question
I am analysing the following data and am not sure how to proceed with the statistics, so any help will be highly appreciated.
The data consist of a decent number of patients who underwent a procedure. They were asked to fill out a questionnaire ranking their satisfaction on a Likert-like scale (1-5). A medical assistant also rated the usability of the device during the procedure for each patient (also on a Likert scale). Later, the product itself was rated on a Likert scale in terms of quality, performance and the like.
I have tested for normality, and the data are mostly not normally distributed. I have computed Spearman's rho and found that some variables were weakly to moderately correlated with each other and with variables like weight and age.
However, I am unsure how to report the data further in a statistically correct manner. Usually, a Kruskal-Wallis test or similar would be performed to show that groups are significantly different, and initially I tried to do that. But I am not sure this even makes sense with my data (I don't really have groups to begin with, but rather a number of separate cases that rated something; we did not compare the procedure to other procedures, nor form subgroups of patients).
However, it seems I am missing something.
What we want to show with the data is that 1) the patients are generally satisfied with the product, quality is high etc (which can be determined simply by descriptive statistics),
2) that satisfaction, usability and the like are not independent of each other, meaning, e.g., that procedures with high ratings for performance should also have high ratings for quality (is this correctly demonstrated with the Spearman correlation, and how can I plot the correlations on a graph? For parametric data a scatterplot between two variables is usually used, but can I do the same with Spearman's rho?)
and 3) that the quality of the procedure is linked to the other variables. Should I perform ordinal regression for this? Or is this already covered by the correlations?
Is the Kruskal Wallis test needed/ Can I even use it on my sample?
Thank you so much!
If you know a little R, this website looks very interesting.
It illustrates how to get tab plots, heat maps, fluctuation plots, spine plots, etc.
• asked a question related to Correlation Analysis
Question
I have conducted regularized canonical correlation analysis using the mixOmics package in RStudio, because the number of variables in my data exceeds the number of observations. I now want to conduct commonality analysis using the "yhat" package, but I have been getting an error, and I realized the rcc output is slightly different from the CCA output. Can anyone advise me on how to go about this, please?
Anytime n < p, potential problems arise; be very careful here. I assume "regularized" means your estimator is something like ridge or the lasso; if not, think about using something like them. Perhaps you could start with Computer Age Statistical Inference: Algorithms, Evidence, and Data Science by Bradley Efron and Trevor Hastie; these guys invented a lot of this stuff. Best, D. Booth
• asked a question related to Correlation Analysis
Question
A signal is split into two parts; one of them goes through a filter (say, with transfer function H(f)) and the other part stays unchanged. I want to know how to calculate their cross-correlation function. My guess is that, given the spectral density function S(f), it will be the ordinary Wiener-Khinchine theorem with the transfer function included: R(t) = Integral{ S(f) H(f) exp(i*2*pi*f*t) df }
Agreeing with Pascal Salart, but making it simple. One signal is x(t), the other is y(t); you study the covariance E[x(t+i) y(t+j)], which can be estimated by low-pass filtering the vectors v(t, i) = (x(t+i-N), ..., x(t+i))
and w(t, j) = (y(t+j-N), ..., y(t+j)), with N the length of the observation window;
the scalar product <v(t, i), w(t, j)> then estimates the expectation E[.] above.
The matrix obtained is a Gram matrix, hence it diagonalises; its eigenvectors and eigenvalues are obtained with the Gram-Schmidt algorithm. QED...
Ok?
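The guess in the question can be checked numerically: with y the filtered branch (Y(f) = X(f) H(f)), the cross-spectrum is S_xy(f) = S_xx(f) H(f), so the cross-correlation is the inverse transform of S(f) H(f), exactly as proposed. A NumPy sketch with a hypothetical 3-tap FIR filter (the filter taps and the signal are arbitrary examples; the filter is applied circularly so the frequency-domain identity is exact):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4096
x = rng.standard_normal(n)                     # the unfiltered branch

# hypothetical example filter, applied circularly
h = np.array([0.5, 0.3, 0.2])
H = np.fft.fft(h, n)                           # transfer function H(f)
y = np.fft.ifft(np.fft.fft(x) * H).real        # the filtered branch

# Route 1: cross-correlation computed directly from x and y
X = np.fft.fft(x)
Y = np.fft.fft(y)
r_direct = np.fft.ifft(np.conj(X) * Y).real / n

# Route 2: Wiener-Khinchine with the transfer function inserted:
# S_xy(f) = S_xx(f) H(f), hence R_xy(t) = Integral{ S_xx(f) H(f) exp(i 2 pi f t) df }
S_xx = np.conj(X) * X / n                      # spectral density of x
r_wk = np.fft.ifft(S_xx * H).real

print(np.max(np.abs(r_direct - r_wk)) < 1e-10)  # True: the two routes agree
```

With a linear (non-circular) filter the two routes agree up to edge effects of order the filter length.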
• asked a question related to Correlation Analysis
Question
Hi,
I am utilizing a questionnaire to measure the self-reported technology proficiency of individuals. This particular questionnaire has 34 items representing 6 subscales (containing 5, 5, 5, 5, 8, and 6 items respectively). I also have a single output variable. I am trying to make a correlation table of these 6 subscales against the output variable and check for significance. My question is: what is the correct approach to do this? There are two layers of variables (6 dimensions representing 34 items) and I am trying to map them against one output variable. How can an aggregate correlation be calculated?
Hi Daniel Wright, sorry if I wasn't clear. Attached is a screenshot of the reference study. I am trying to repeat this calculation, although in a different context. Notice the authors calculated correlations of 6 subscales from a questionnaire and checked whether they correlate with 3 different outcome variables. My question is: how can we do this practically, as each of these 6 subscales in turn has 5 to 8 questions under it? I hope it is clearer now.
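Assuming the study follows the usual approach, each subscale is first aggregated into a single score (typically the mean, or the sum, of its items), and that score is then correlated with the outcome variable. A minimal Python sketch with simulated Likert-style data (the sample size and item values are hypothetical; only the item counts 5, 5, 5, 5, 8, 6 come from the question):

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(1)
n = 120                                        # hypothetical sample size
sizes = [5, 5, 5, 5, 8, 6]                     # items per subscale, per the question
items = [rng.integers(1, 6, size=(n, k)).astype(float) for k in sizes]
outcome = rng.normal(size=n)                   # hypothetical output variable

for i, block in enumerate(items, start=1):
    score = block.mean(axis=1)                 # subscale score = mean of its items
    r, p = pearsonr(score, outcome)
    print(f"subscale {i}: r = {r:+.3f}, p = {p:.3f}")
```

The six r values (with their p-values) then fill one column of the correlation table, exactly as in the reference study's layout.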
• asked a question related to Correlation Analysis
Question
Hi, these are the outputs generated by a system. The green curve (nr) is the desired output, while the system generates the red curve (nf). I also have the dataset in a CSV file. For this waveform, what would be an appropriate statistical and quantitative way to assess the correlation and/or similarity between the datasets above? I want a single number that summarizes the similarity in terms of shape, or any other method that indicates these are similar.
It's a classic use case for ARIMA, a regression-type method that accounts for repeated measurements over time. You can find insights here:
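Aside from ARIMA, if a single shape-similarity number is all that is needed, a common minimal pair is Pearson's r between the two curves (invariant to offset and scale, so it isolates agreement in shape) plus a range-normalised RMSE (sensitive to amplitude error). A hedged Python sketch with synthetic stand-ins for nr and nf (in practice, load the two columns from the CSV instead):

```python
import numpy as np

# hypothetical stand-ins for the desired output nr and the system output nf
t = np.linspace(0, 1, 200)
nr = np.sin(2 * np.pi * 3 * t)
nf = (0.9 * np.sin(2 * np.pi * 3 * t + 0.05)
      + 0.02 * np.random.default_rng(2).standard_normal(200))

# shape similarity: Pearson r ignores offset and scale
r = np.corrcoef(nr, nf)[0, 1]

# amplitude-sensitive complement: RMSE normalised by the range of nr
nrmse = np.sqrt(np.mean((nr - nf) ** 2)) / (nr.max() - nr.min())

print(f"shape similarity r = {r:.3f}, normalised RMSE = {nrmse:.3f}")
```

Reporting both numbers guards against the case where two curves have the same shape but very different amplitudes (high r, high NRMSE).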
• asked a question related to Correlation Analysis
Question
Is an Excel sheet OK, or do I need econometrics tools such as SPSS or MATLAB?
Yes, I think Excel does a lot of useful things, including correlations and multiple regression. The attached link shows the process of using Excel (French version) to calculate the Pearson correlation coefficient.
Excel also handles simple and multiple regressions. For multiple linear regression:
Once XLSTAT has been launched, choose the XLSTAT / Modeling / Linear regression command. Once the button is clicked, the dialog box corresponding to the regression appears. In the General tab, you can then select the data on the Excel sheet. Here are the corresponding links (French versions) for both simple and multiple linear regression:
Of course, other software packages are also useful for correlations and multiple regressions, such as STATISTICA, Minitab and SAS.
• asked a question related to Correlation Analysis
Question
I have two time-series variables, X and Y, and I want to use cross-correlation analysis to examine the relationship between them. The software is SPSS 21.0. When running the cross-correlation analysis, which variable should be selected first? And how should the positive and negative lags be interpreted?
If the selected order of the variables is changed, the significant cross-correlations change too (from positive lags to negative lags, or from negative to positive).
Thank you very much
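SPSS aside, the lag-sign convention is easy to verify numerically: if the cross-correlation of the pair (X, Y) peaks at a positive lag, X leads Y, and swapping the input order mirrors the same peak to the corresponding negative lag. A minimal NumPy sketch with simulated data (y is a copy of x delayed by 3 samples):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.standard_normal(500)
y = np.roll(x, 3)                       # y(t) = x(t - 3): x leads y by 3 samples

def ccf(a, b, lag):
    """Correlation between a(t) and b(t + lag)."""
    if lag >= 0:
        return np.corrcoef(a[:len(a) - lag], b[lag:])[0, 1]
    return ccf(b, a, -lag)

# With the pair (x, y) the peak sits at lag +3 (x leads y);
# swapping the pair to (y, x) moves the same peak to lag -3.
print(round(ccf(x, y, 3), 3), round(ccf(y, x, -3), 3))
```

So the "first selected variable" only fixes which direction of lead counts as positive; the substantive conclusion (which series leads) is unchanged when the order is swapped.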
• asked a question related to Correlation Analysis
Question
I am writing a piece of code in the syntax of SPSS Statistics. The code is the following:
GET DATA /TYPE=XLSX
/FILE='file'
/SHEET=name 'Sheet2'
/CELLRANGE=full
/ASSUMEDSTRWIDTH=32767.
EXECUTE.
DATASET NAME DataSet1 WINDOW=FRONT.
RANK VARIABLES= WA CI (D) /RANK /PRINT=YES /TIES=MEAN.
SELECT IF( RWA EQ '1' AND RCI EQ '1').
The problem is at the last line of the above code. SPSS selects the right data and then creates two variables called RWA and RCI. So far, things are fine. Then, SPSS should select cases only when the two variables assume the value 1 simultaneously. Unfortunately, it doesn't work. Am I wrong somewhere?
Thanks, Francesco
Francesco Ciardiello, maybe you need to use SELECT IF( RWA EQ 1 AND RCI EQ 1) without the quotes: RANK creates numeric variables, so they should be compared with the number 1, not the string '1'.
• asked a question related to Correlation Analysis
Question
I'm working on the virulence gene profile (sefA, pefA and lpfA genes) and the antimicrobial resistance gene profile (bla-CTX-M, bla-SHV and bla-TEM genes) of 95 Salmonella enterica samples. So basically, for each sample I have scored the presence or absence of each gene. Could you suggest the best statistical analysis to correlate the presence of virulence and resistance genes? I'm thinking of using Fisher's exact test for pairwise comparison (3 virulence genes x 3 AMR genes). Thank you so much!
Yes, Eric has good points on your question, Jason, and that is a reasonable way to evaluate the association between the two gene groups.
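Fisher's exact test on each virulence-gene x resistance-gene pair is indeed a reasonable choice for presence/absence data. A sketch in Python with hypothetical counts (not the actual 95 isolates): cross-tabulate one virulence gene against one resistance gene and test the 2x2 table:

```python
from scipy.stats import fisher_exact

# hypothetical 2x2 table for one gene pair
# rows: sefA present / absent; columns: bla-TEM present / absent
table = [[30, 10],
         [15, 40]]
odds_ratio, p = fisher_exact(table)
print(f"odds ratio = {odds_ratio:.2f}, p = {p:.4g}")
```

With 3 x 3 = 9 pairwise tests, it is worth correcting the p-values for multiple comparisons (e.g. Bonferroni, multiplying each p by 9, or the Benjamini-Hochberg procedure).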
• asked a question related to Correlation Analysis
Question
I am working to improve a manuscript and I have been advised to provide a map showing where the correlation is significant. Any information or links to learn about this would be quite helpful. Thank you in anticipation!
What you are being asked to do is create a spot map showing the areas with significant correlation. You can use a specific colour for each area to show whether its r is significant or not: say, all areas with a significant r shown in red, while areas with a non-significant r shown in green. I presume you know how to calculate r and test it for significance?
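One common way to build such a map, sketched here in Python with simulated placeholder data (grid size, index and field are hypothetical): correlate the two time series at every grid cell, store r and its p-value, and derive a boolean significance mask that can then be overlaid on the r map (e.g. as stippling or a second colour):

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(4)
nt, ny, nx = 40, 5, 6                 # 40 time steps on a 5 x 6 grid
index = rng.standard_normal(nt)       # e.g. a climate index time series
field = rng.standard_normal((nt, ny, nx))
field[:, 0, 0] += 2 * index           # plant one strongly correlated cell

r_map = np.empty((ny, nx))
p_map = np.empty((ny, nx))
for j in range(ny):
    for i in range(nx):
        r_map[j, i], p_map[j, i] = pearsonr(index, field[:, j, i])

sig = p_map < 0.05                    # boolean mask of significant cells
print(sig[0, 0], int(sig.sum()))      # the planted cell is flagged
```

Plotting libraries such as matplotlib or Cartopy can then shade r_map and mark only the cells where sig is True, which is exactly the "map of significant correlation" a reviewer typically expects.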
• asked a question related to Correlation Analysis
Question
Dear experts,
I am going to analyze the correlation between two items. Some mutual variables affect these items.
With which statistical method can I determine which variable has the stronger impact on the correlation between the items?
Thank you so much,
Majid