Multivariate Analysis - Science topic
A set of techniques used when variation in several variables has to be studied simultaneously. In statistics, multivariate analysis is interpreted as any analytic method that allows simultaneous study of two or more dependent variables.
Questions related to Multivariate Analysis
I conducted a bivariate analysis between independent and outcome variables and got a crude odds ratio of less than one. On multivariate analysis, I got an adjusted odds ratio greater than one for the same independent variables. How can I interpret this? Can it happen, and if so, how?
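For what it's worth, this can happen when a confounder is associated with both the exposure and the outcome; adjustment can then reverse the direction of the crude association (negative confounding, a form of Simpson's paradox). A minimal numeric sketch with entirely made-up counts:

```python
# Hypothetical counts illustrating an odds-ratio sign flip: exposure is
# concentrated in the low-risk stratum, so the crude OR looks protective (<1)
# while the stratum-specific and adjusted ORs are > 1.
# Each stratum is (exposed events, exposed non-events,
#                  unexposed events, unexposed non-events).
strata = [
    (18, 162, 1, 19),   # low-risk stratum:  OR = (18*19)/(162*1) ~ 2.1
    (12, 8, 90, 90),    # high-risk stratum: OR = (12*90)/(8*90)  = 1.5
]

# Crude OR from the collapsed 2x2 table
a = sum(s[0] for s in strata); b = sum(s[1] for s in strata)
c = sum(s[2] for s in strata); d = sum(s[3] for s in strata)
crude_or = (a * d) / (b * c)

# Mantel-Haenszel adjusted OR across strata
num = sum(a_ * d_ / (a_ + b_ + c_ + d_) for a_, b_, c_, d_ in strata)
den = sum(b_ * c_ / (a_ + b_ + c_ + d_) for a_, b_, c_, d_ in strata)
mh_or = num / den

print(f"crude OR = {crude_or:.2f}, adjusted (MH) OR = {mh_or:.2f}")
```

Here the crude OR is about 0.21 while the Mantel-Haenszel adjusted OR is about 1.61, purely because exposure is concentrated in the low-risk stratum.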
There is a common understanding that, in many cases, the absence of extreme values (univariate outliers) in individual variables may reduce the likelihood of extreme values when considering multiple variables simultaneously (multivariate outliers).
May I get some good citations to support this claim?
I plan to do PCA and MANOVA/PERMANOVA as the multivariate tests for my data set. My problem is that the way results are interpreted in PCA plots and in MANOVA/PERMANOVA differs from research paper to research paper, and I need clarity. Would anyone suggest excellent sources to follow? I would appreciate research papers with good examples of PCA and MANOVA/PERMANOVA.
Most of the time, researchers perform a bivariable analysis of one dependent variable with several independent variables and set a criterion of p < 0.25 to retrieve candidate variables for the multivariable model. Is there a hard rule for setting this p value? If yes, what are the criteria?
Practically, I tried to model the variables in a multivariate analysis using the enter method, where I simply put every factor into one model. Oddly enough, the odds ratios do not seem to be correct (exhibit I).
There are some variables whose Exp(B) seems to be wrong. I use a complex sample, and all analyses are conducted in a weighted manner using complex samples analysis.
I am currently analyzing data from a study and am running into some issues. I have two independent variables (low vs high intensity & protein vs no protein intervention) and 5 dependent variables that I measured on two separate occasions (pre intervention and post intervention). So technically I have 4 groups a) low intensity, no protein b) low intensity, protein c) high intensity, no protein and d) high intensity, protein.
Originally I was going to do a two-way MANOVA, as I wanted to know the interaction between the two independent variables on multiple dependent variables; however, I forgot that I have two measurements of each dependent variable and want to include how they changed over time.
I can't seem to find a test that will incorporate all these factors; it seems like I would need to do a three-way MANOVA, but I can't find anything on that. So I am thinking of a) calculating the difference in the dependent variables between the two time points and using that measurement for the MANOVA, or b) using MANOVA on the post-test measurements and then doing a separate test to see how each dependent variable changed over time. Is this the right line of thinking, or am I missing something? When researching this I kept finding doubly multivariate analysis for multiple observations, but it seems to me that it only allows for time and one other independent variable, not two.
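Option (a), a two-way MANOVA on pre-to-post change scores, can be sketched as follows (simulated data with only two dependent variables for brevity, using statsmodels; all effect sizes are invented):

```python
# Sketch of option (a): two-way MANOVA on pre-to-post change scores.
# Simulated data; in practice replace with the five measured DVs.
import numpy as np
import pandas as pd
from statsmodels.multivariate.manova import MANOVA

rng = np.random.default_rng(0)
n_per_group = 10
rows = []
for intensity in ["low", "high"]:
    for protein in ["no", "yes"]:
        for _ in range(n_per_group):
            pre = rng.normal(size=2)                       # two DVs for brevity
            shift = 0.8 if intensity == "high" else 0.0    # invented effect
            post = pre + shift + rng.normal(scale=0.5, size=2)
            rows.append({"intensity": intensity, "protein": protein,
                         "d1": post[0] - pre[0], "d2": post[1] - pre[1]})
df = pd.DataFrame(rows)

# Two independent variables and their interaction on the change scores
res = MANOVA.from_formula("d1 + d2 ~ intensity * protein", data=df).mv_test()
print(res.summary())
```

Using change scores collapses the time factor, which is what makes the two-way MANOVA applicable again; the cost is that baseline differences are no longer modelled.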
Any guidance or feedback would be greatly appreciated :)
In many articles I have seen that the ARDL model can be used when there is only one cointegrating relationship. Therefore, to check the number of cointegrating relationships, I used the Johansen cointegration test and found there is only one. But in theory, the ARDL model uses the bounds F-test to check whether cointegrating relationships exist. How do we identify the number of cointegrating relationships using the bounds test? Is the bounds test unnecessary if I have already used the Johansen test?
As my question already indicates, I would like to do some multivariate analysis of my proteomics data, as I have multiple characteristics in my samples. I have successfully used MetaboAnalyst for multivariate analysis in metabolomics approaches. Should I expect drawbacks when using MetaboAnalyst for proteomics data, or is there an equally easy tool for proteomics?
Thank you for your help!
I am conducting a multivariate time series analysis, and my time series variables are measured on different scales (dollars, rupees, counts). I want to know whether standardization of the variables is required before I apply VAR/ARCH/GARCH models. In addition, I want to know whether the mentioned models can be used if I have dummy variables.
In statistics, multivariate analysis of variance (MANOVA) is a procedure for comparing multivariate sample means. As a multivariate procedure, it is used when there are two or more dependent variables, and is often followed by significance tests involving individual dependent variables separately.
Without relation to the image, the dependent variables may be k life satisfaction scores measured at sequential time points and p job satisfaction scores measured at sequential time points. In this case there are k+p dependent variables; MANOVA assumes that their linear combinations follow a multivariate normal distribution, that the variance-covariance matrices are homogeneous across groups, that the relationships are linear, that there is no multicollinearity, and that there are no outliers.
In my time series dataset, I have 1 dependent variable and 5 independent variables and I need to find the independent variable that affects the dependent variable the most (the independent variable that explains most variations in the dependent variable). Consider all 5 variables are economic factors.
For example, we have a multivariate time series comprising 8 univariate time series. I am aware some deep learning libraries can help to predict each of the time series in the multivariate series. I want to control what to forecast (for instance, forecast the first 4 series). Is it possible to use such deep learning libraries to accomplish that or there is a better way to do it?
This is in context of the objective function of a multivariate optimization problem say, f(a,b,c).
I am looking for a "measure" for the degree of bias of f(a,b,c) towards any of the input variables.
I have 15 treatments. My main interest is to find the best treatment. The response is measured every day up to 30 days. My model has an interaction effect between time and treatments. I use suitable effect size, but I need power of 80% with type I error 5%. How can I calculate sample size by simulation?
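The usual simulation recipe: pick a candidate sample size, generate many datasets under the assumed effect size, fit the planned model to each, and record the proportion of p-values below 0.05. A sketch for a simplified two-group contrast; the same loop applies with a mixed model for 15 treatments over 30 days, just with a heavier simulation and fitting step:

```python
# Simulation-based power: the fraction of simulated datasets in which the
# planned test rejects at alpha = 0.05. Effect sizes here are illustrative.
import numpy as np
from scipy import stats

rng = np.random.default_rng(123)

def power(n_per_group, effect_size, n_sims=1000, alpha=0.05):
    hits = 0
    for _ in range(n_sims):
        a = rng.normal(0.0, 1.0, n_per_group)
        b = rng.normal(effect_size, 1.0, n_per_group)
        if stats.ttest_ind(a, b).pvalue < alpha:
            hits += 1
    return hits / n_sims

# Scan candidate sample sizes until power reaches ~0.80
for n in (20, 26, 35):
    print(n, power(n, effect_size=0.8))
```

For the full design, replace the t-test inside the loop with the intended mixed model (treatment, time, and their interaction) and test the contrast of interest; the outer loop and the power calculation stay the same.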
Within a project about geographical traceability of horticultural products, we would like to apply classification models to our data set (e.g. LDA) to predict if it is possible to correctly classify samples according to their origin and based on the results of 20-25 different chemical variables.
We identified 5 cultivation areas and selected 41 orchards (experimental units) in total. In each orchard, 10 samples were collected (each sample from a different tree). The samples were analyzed separately. So, at the end, we have the results for 410 samples.
The question is: do the 10 samples per orchard have to be considered pseudoreplicates, since they belong to the same experimental unit (even if collected from independent trees)? Should the LDA be performed with 41 replicates (the 41 orchards, taking the average of the 10 samples), or should we run it on the whole dataset?
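If the orchards are treated as the experimental units, one common approach is to average the 10 samples per orchard and run the classifier on 41 rows, validating at the orchard level. A sketch with simulated data (all names, dimensions, and effect sizes invented):

```python
# Sketch: aggregate pseudoreplicates to the experimental unit (orchard),
# then cross-validate an LDA at the orchard level. Simulated data.
import numpy as np
import pandas as pd
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_orchards, n_samples, n_vars = 41, 10, 5        # 5 variables for brevity
rows = []
for orchard in range(n_orchards):
    area = orchard % 5                           # 5 cultivation areas
    centre = rng.normal(area, 1.0, n_vars)       # orchard-level signal
    for _ in range(n_samples):
        rows.append([orchard, area, *(centre + rng.normal(0, 0.5, n_vars))])
df = pd.DataFrame(rows, columns=["orchard", "area",
                                 *[f"chem{i}" for i in range(n_vars)]])

# Average the 10 samples per orchard to avoid pseudoreplication
unit = df.groupby(["orchard", "area"]).mean().reset_index()
X, y = unit.filter(like="chem"), unit["area"]
scores = cross_val_score(LinearDiscriminantAnalysis(), X, y, cv=5)
print(len(unit), scores.mean().round(2))
```

An alternative that keeps all 410 rows is to cross-validate with orchard as the grouping factor (e.g. leave-one-orchard-out), so no orchard appears in both training and test sets.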
Thank you for your help.
I have collected data to assess determinant factors for an outcome variable with two responses (YES or No). The independent variables are large in number and in five categories (socio-economic, behavioral, environmental, occupational exposure, and drinking water). I want to see the association between the independent variables and the outcome variable. Moreover, I need to identify the most influencing determinants among the variables. As many authors did, I did a bivariate analysis for each independent variable and then selected significant ones for the next multivariate analysis. But, the variables prepared for multivariate analysis are still large in number. I must minimize the number of variables before using multivariate analysis.
Can I use PCA after the bivariate analysis (binary logistic regression) for each qualified variable within its own category? If using PCA is possible, should I include the PCA output in a manuscript for publication?
I tried PCA before bivariate analysis, but some variables, which are associated significantly with the outcome variables when I use binary logistic regression are absent in components 1 and 2 of the PCA result.
Thank you for your help
I need suggestions for groundwater-assessment articles that used discriminant analysis in their study, as well as guidance on how to apply this analysis in R.
In my current study, I am identifying the association between some independent variables and a dependent variable, for which I am using bivariate analysis (cross-tabs with p values) and multivariate analysis (multiple regression with adjusted odds ratios). Some previous studies on my topic used different p-value cutoff points, e.g. p < 0.25 or 0.05, and others included some variables without such restriction.
What should I do? Should I include the same variables in both of the bivariate and multivariate analyses?
Thank you in advance.
Dear esteemed colleagues
Many times in empirical research, a researcher faces an issue of sample size. The researcher may need to work with a small sample. It would be of help if you share your views and some literature that suggests the acceptable size of a small sample.
I am a newbie with multivariate analyses and hoping to get some help here...
I am trying to conduct a Distance-based Linear Modelling (distLM) on PRIMER v7 to analyse the relationship of some environmental data with species abundance data.
I understand that the Marginal Test identifies the environmental variables that, when considered individually and ignoring all other environmental variables, have a significant relationship with the biological data.
But what is the purpose of the Marginal Test, when some of the outcomes are bound to change in a Sequential Test, where the overlapping effects of the environmental variables are considered, which is a more realistic scenario?
e.g. Factor X3 is not significant in the Marginal Test, but it becomes significant in the Sequential Test when Factors X1 and X2 are taken into account.
Thank you very much in advance for any input!
How can I conduct a correlation test between two nominal variables, gender and shelter type (5 categories), and other scale variables, for example income, travel time, and shelter distance?
My objective is to show the significant correlation among variables.
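Pearson correlation is not defined for nominal variables, so pairs of different types call for different association measures: chi-square with Cramér's V for nominal-nominal pairs, and one-way ANOVA (or its eta-squared effect size) for nominal-scale pairs. A sketch with invented numbers:

```python
# Association measures for mixed variable types (scipy). Counts are made up.
import numpy as np
from scipy import stats

# gender (2) x shelter type (5): chi-square on the contingency table
table = np.array([[30, 10, 5, 15, 20],
                  [10, 25, 15, 20, 10]])
chi2, p, dof, _ = stats.chi2_contingency(table)
n = table.sum()
cramers_v = np.sqrt(chi2 / (n * (min(table.shape) - 1)))   # 0 = none, 1 = perfect
print(f"chi2 p = {p:.4f}, Cramer's V = {cramers_v:.2f}")

# gender vs a scale variable (e.g. income): one-way ANOVA across the groups
income_m = np.array([300., 320., 280., 350., 310.])
income_f = np.array([290., 270., 300., 260., 285.])
f_stat, p_anova = stats.f_oneway(income_m, income_f)
print(f"ANOVA p = {p_anova:.4f}")
```

For shelter type (5 categories) against a scale variable, the same `f_oneway` call takes five group arrays instead of two.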
I did my thesis using a simple lattice design (2 replications) over 2 years. I want to do a combined analysis of variance. How can I do it? Which statistical software and which procedure should I use?
Actually, I have one independent variable and one dependent variable, and I am using a latent variable as a mediator, through which I want to estimate the effect on the dependent variable. The problem is that the data for the latent variable were collected through a semi-structured questionnaire and not on a Likert scale; does that still allow a structural equation model? Kindly explain and help. I am a rookie researcher, new to multivariate analysis and other methods.
I'm working on a clustering analysis and would be curious if anyone has ideas about how to deal with nested categorical variables.
Normally I would calculate a distance/dissimilarity matrix (Gower when some variables are categorical), and then feed this to a clustering algorithm of choice. Now what happens when some categorical variables are nested?
Suppose we measure characteristics of water samples such as turbidity, temperature, dissolved gases, and the presence/absence of 50 chemical compounds in the water.
* presence/absence of chemical compounds can be treated as 50 separate binary/categorical variables
* but say that these chemicals belong to 4 groups of compounds?
We could simply add an additional categorical variable "group", and for more complex nesting "subgroup", "subsubgroup", and so on. OK, but as far as I understand, Gower distance is a bit like Manhattan distance in that it calculates a distance for each variable and then adds weights. But part of the information will then be redundant, and even more so with more levels of nesting. I was wondering whether anyone has come up with something to deal with that specifically. Maybe some form of weighting of the variables?
Looking forward to your inputs!
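One pragmatic option along the weighting line: keep the 50 binary compound variables, but give each a weight inversely proportional to its group size, so every compound group contributes equally to the overall distance. A hand-rolled weighted Gower sketch (variable names and weights are made up):

```python
# Weighted Gower distance: numeric variables are range-scaled, binary
# variables contribute 0/1 mismatches, and nested binaries are downweighted
# by group size so each compound group counts once overall.
import numpy as np
import pandas as pd

def weighted_gower(df, num_cols, bin_cols, weights):
    """Pairwise weighted Gower distance matrix for a mixed-type DataFrame."""
    n = len(df)
    dist = np.zeros((n, n))
    wsum = sum(weights[c] for c in num_cols + bin_cols)
    for c in num_cols:
        rng_ = df[c].max() - df[c].min()
        d = np.abs(df[c].values[:, None] - df[c].values[None, :]) / rng_
        dist += weights[c] * d
    for c in bin_cols:
        d = (df[c].values[:, None] != df[c].values[None, :]).astype(float)
        dist += weights[c] * d
    return dist / wsum

df = pd.DataFrame({"turbidity": [10.0, 20.0, 15.0],
                   "chemA1": [1, 0, 1],     # compound group A has 2 members,
                   "chemA2": [0, 0, 1]})    # so each gets weight 1/2
w = {"turbidity": 1.0, "chemA1": 0.5, "chemA2": 0.5}
D = weighted_gower(df, ["turbidity"], ["chemA1", "chemA2"], w)
print(D.round(3))
```

The resulting matrix can be fed to any distance-based clustering algorithm; deeper nesting just means multiplying the per-variable weights down the hierarchy.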
I have data for 6 groups with sample sizes of n = 2, 10, 2, 9, 3 and 1, and I want to perform permutational multivariate analysis of variance (PERMANOVA) on these data.
My question is: is it correct to run PERMANOVA on these data with such small sample sizes? The results look strange to me, because the group with n = 1 shows no significant difference from the other groups, although the graphical representation of the groups clearly shows one.
I seem to be getting extremely high HRs in a multivariate Cox regression. I only have 70 or so patients and 5 poor outcomes. Any ideas why my results may have turned out this way?
If I use SmartPLS to test the structural model, how can I measure the Goodness of Fit Index (GFI)? Which indices do I need to observe to validate the research model?
I know that some literature said that the minimum estimate for the path coefficient should be around 0.2, but is there any discretion or other opinion regarding this matter? Thank you for the attention.
I am trying to understand how multivariate data preprocessing works but there are some questions in my mind.
For example, I can do data smoothing, transformation (Box-Cox, differencing), and noise removal on univariate data (for any machine learning problem, not only time series forecasting). But what if one variable is noisy and another is not? Or one is smooth and another is not (I would need a sliding-window average for one variable but not the other)? What should I do in such cases?
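Preprocessing in the multivariate case is typically decided per column: smooth or transform only the variables that need it and leave the clean ones untouched. A small pandas sketch with simulated series:

```python
# Per-column preprocessing: apply a sliding-window average only to the
# noisy variable, leaving the clean one untouched. Simulated data.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
t = np.linspace(0, 10, 200)
df = pd.DataFrame({
    "clean": np.sin(t),
    "noisy": np.sin(t) + rng.normal(0, 0.5, t.size),
})

# Smooth the noisy column only; the clean column keeps its raw values
df["noisy_smoothed"] = df["noisy"].rolling(window=9, center=True,
                                           min_periods=1).mean()
print(df[["noisy", "noisy_smoothed"]].std().round(2))
```

The one caveat is alignment: any operation that shifts or shortens a series (differencing, non-centred windows) must be applied consistently across rows so the variables stay matched in time.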
I am performing a meta-analysis of risk factors odds ratios including only adjusted OR. Some papers report only p-values of both univariate and multivariate analyses, and sufficient data to calculate crude, unadjusted odds ratio.
Do you know a valuable method to estimate the adjusted OR and standard error from these data?
Is it possible to do a multivariate analysis from a compositional statistical analysis perspective?
That is, without excluding measurements outside the compositional structure of a substance, such as temperature (°C), pH, EC, etc. Or are there conditions and restrictions on that?
Hello my friends. I have a set of independent variables measured on a Likert scale and one dependent variable, also measured on a Likert scale. I have run the analysis and want to be sure I'm doing it right: how can I use variables such as age, gender, work experience, and education level as control variables to measure their effect on the relationship between the independent variables and the dependent variable? Please give me one example. Thanks.
I would like to analyse the beta diversity of more than five sampling sites and check their similarity. Can I analyse whether any species are shared among sampling sites? I would like to use the Bray-Curtis, Jaccard, or Sorensen similarity index. Can I analyse it in PAST or OriginPro? Which method is best for interpreting the result?
I collected 109 responses for 60 indicators to measure the status of urban sustainability as a pilot study. As far as I know, I cannot run EFA, as each indicator requires at least 5 responses, but I do not know whether I can run PCA with this limited number of responses. Would you please advise on the applicability of PCA or any other possible analysis?
I am asking for your kind comments or recommendations on analyzing hierarchical and multiple responses (outcomes). I use hierarchical and multiple responses to express my outcome variable, because my outcome is quality of life (by RAND-36 or SF-36). By calculating the 36 items, I would have a continuous mean score for total quality of life. But, as you may know, under the SF-36 we can also calculate 8 domain scores (PF, RP, BP, GH, MH, RE, VT, SF) and 2 summary (dimension) scores (the PCS and MCS). Therefore, in a way, my outcomes are multiple responses and also hierarchical.
level 1: total mean score of quality of life
level 2: --- Physical component summary
level 3: ------ PF: physical functioning
level 3: ------ RP: role limitation due to physical problems
level 3: ------ BP: body pain
level 3: ------ GH: general health
level 2: --- Mental component summary
level 3: ------ MH: mental health
level 3: ------ RE: role limitation due to emotional problems
level 3: ------ VT: vitality (fatigue or energy)
level 3: ------ SF: social functioning
My purpose of study (cross-sectional design) is to understand associated factors to hemodialysis patients' quality of life. Therefore, I have a series of explanatory variables (Xs) to estimate the Ys. My original analysis was using "multiple regression" to each of the quality of life scores (Three hierarchical levels of scores: the total QoL mean score, each of the 8 domain scores, and each of the 2 dimension scores).
But, this brings me to the problem of "multiple comparisons" and also I treated each type of scores (no matter the total QoL mean score, or the domain score, or the summary score) as "independent to each other" which actually are correlated. However, from the QoL measurement instrument, there is inherent hierarchical and also correlations among the three levels of scores in the designed conceptual framework: SF-36.
Therefore, I would like to kindly ask for your comments or recommendations:
1). How can I analyze my Ys (outcomes) when they are multiple responses and hierarchical?
2). Will multilevel analysis (hierarchical linear regression) work for my Ys?
3). Are there other analysis methods I could try?
4). Could you please suggest some literature to explore this issue I am encountering?
I am planning to use PCA and OPLS-DA for my study in biochemometrics, but I am quite tight on budget. I am not sure how much the SIMCA software costs; although they have a trial version, I am worried whether I'll be able to make the most of the free version with my data. Are there cheaper or free alternatives that will give quality PCA and OPLS-DA analyses?
I have a colleague who wants to assess the effect of risk factors such as age, vitals, and electrolytes on surgical management outcome. The surgical outcome variable is categorized as favourable (0) and unfavourable (1). However, some variables, like the electrolytes, have more categories; potassium, for example, has three: low (1), normal (2), and high (3). We need your support and help. I will be grateful for your answers and for recommendations of papers for comparison.
I am looking for adaptable interpretation of my data to understand the relationship between species abundance in different communities with soil nutrients (N,P,K and OC). For this I have 5 communities with 3 replicates each. Kindly suggest which multivariate analysis will be most appropriate. Thanks.
Hello all,
I want the data sets for "Methods of Multivariate Analysis" by Alvin Rencher. Please share; the link given in the book is not working.
I once read that if I have a single independent variable and two or more dependent variables, I should use multivariate analysis.
But then I read somewhere that multivariate analysis = inferential statistics (where the analysis results generalize to the whole population).
Is it possible to use a statistical analysis that won't generalize the results?
I am using "mvprobit" in Stata; however, it is not clear to me how I can estimate marginal effects after this. Any help will be much appreciated.
TLDR: How many variables can I have in a VAR or VECM model?
I am writing my thesis and I am using a VECM (VAR model with error correction for cointegration) model for analyzing the relationship between the prices of an energy exchange and some other factors. So far I have 4 variables in my model and I am thinking of adding more.
My question is: after how many variables does the model become unusable or unstable, or can I add as many as I like?
Thank you for your answers in advance!
I did an MCA using FactoMineR. I know how to interpret cos2, contributions, and coordinates, but I don't know how the v.test values should be interpreted.
I'm using a two-way crossed ANOSIM with Time and Treatment to analyze similarities within and between groups. I chose 4999 permutations, but that was an arbitrary decision. I'd like to know how to choose the correct number of permutations, and even though I've read some of Clarke's papers (Clarke, 1993. Non-parametric multivariate analysis of changes in community structure) and the PRIMER v6 manual, I'm afraid I have not reached a clear answer, possibly because of a lack of understanding or the need for simpler language. I'd appreciate any help.
I'm trying to tabulate some concentrations of compounds to eventually test if tide (or depth for other samples) and distance along a sampling transect affect these compounds (the 5 variables on the right). However, the only way I can think to do this is in the attached image, where I'm forced to repeat the distance measurements which results in them getting treated as separate values. If I just split each dependent variable into two based on tide, then I no longer have that independent variable which slows things down quite a bit.
Also, I'm trying to make my Tide variable binary, but I don't see an option for that - perhaps that's also a problem here? Attached is an image of my table.
I want to investigate the relationship between differences in coral physiological variables based on euclidean distances and seawater environmental variables using DISTLM and dbRDA in PRIMER, but I am not sure if this analysis is suitable given the lack of replication I have in my predictor variable (environmental) matrix.
I have attached an excel file illustrating the structure of my data set (the response and predictor variables). Briefly, I have a multivariate data set of measured physiological variables (e.g. lipid concentration, protein concentration, tissue biomass etc.) for corals collected from five different locations (A-E), where each site is very unique in its seawater physico-chemical parameters. I collected 12 corals per site (total of 60 samples). I have constructed a resemblance matrix of the physiological data in PRIMER based on Euclidean distances, and there is clear grouping of data points in the NMDS, which coincides with the different collection sites for each coral. I want to investigate the proportion of the observed variation in the multivariate data cloud that can be explained by the environmental characteristics of each collection site (e.g. mean annual sea surface temperature, seawater chlorophyll concentration, salinity etc.). However, the dataset of environmental variables does not have replication. i.e. for each site (A-E), I only have one value for mean annual sea surface temp, one value of salinity etc.
All of the case-study examples I have read about distance-based redundancy analysis in R or PRIMER have two resemblance matrices (predictor and response) both of which have replication. However, in my case, my response variables have replication (i.e. 12 samples per site), whereas my environmental variables do not have replication (i.e. one measurement per variable per site).
Can someone advise me whether or not dbRDA is suitable in this instance? If as I predict, it is not suitable, can you recommend a better approach? I am not an expert in multivariate statistics, but I want to make sure that the approach I take is sound.
Any and all advice is welcome. Thanks
I have seen a plethora of scholars using a cutoff value for independent variables to enter the multivariable analysis. They usually use 0.2 or 0.25 as the cutoff point to take variables from bivariable to multivariable analysis. But I am personally sceptical, as the procedure is exposed to bias from confounders, which could ultimately give us a biased estimate. Any thoughts, please?
I have been helping analyse a sustainability project that compares % of biomass in composts.
As the design is 5x2 (4 replicates each; that's what I was given) with 15 predictors, I'm using PERMANOVA and will later analyse the power to see if the analysis is valid.
However, the variables (chemical compounds and physical characteristics) have different units and quite different ranges, so I need to standardize them (I'm using z-scores).
Have been looking for a while, but can't find an answer to the questions:
Should I apply the standardization by variables, meaning find each variable mean and standard deviation, or should I use the central point (the whole dataset, mean and standard deviation of all applied to each measurement)?
They give me different results and I would like to be able to support the choice I will make.
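For what it's worth, the two options can be compared directly: column-wise z-scores put every variable on unit variance (the usual choice when the goal is to stop any one variable's units from dominating the distances), while a single global mean and SD preserve the variables' relative magnitudes. A small numeric sketch with invented data:

```python
# Column-wise vs global z-scores for two variables on very different scales.
import numpy as np

rng = np.random.default_rng(0)
X = np.column_stack([rng.normal(50, 10, 40),      # e.g. a compound in mg/kg
                     rng.normal(0.5, 0.05, 40)])  # e.g. a unitless ratio

colwise = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # per-variable
global_z = (X - X.mean()) / X.std(ddof=1)                # one centre point

print(colwise.std(axis=0, ddof=1).round(2))   # every column on unit variance
print(global_z.std(axis=0, ddof=1).round(2))  # scale differences survive
```

With global standardization the large-scale variable still dominates any Euclidean-type distance, which usually defeats the purpose of standardizing mixed-unit data in the first place; that is the argument most often made for the per-variable version.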
Would love to hear some insights and references into that.
All the best,
I'm working on a dataset looking at the effect of fertiliser * time on floristic diversity. I have run mixed models on a few responses, including species richness, and found that richness is lower in one treatment level. I have also run PERMANOVA/PERMDISP to measure the effect of treatments on community composition, and SIMPER analyses to look at the species contributing to the similarities/dissimilarities within and between groups. I now would like to find out which species are not present in the group with lower species richness; however, I'm not sure how I can use SIMPER (or another method) to identify which species are absent in one group relative to another. Can anyone help, please?
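Since SIMPER decomposes dissimilarities rather than listing absences, this is arguably easier done directly from the community matrix: sum abundances per group and compare the zero columns. A sketch with an invented toy matrix (pandas):

```python
# List species that occur in one group but never in another, straight from
# the site-by-species matrix. Species and group names are made up.
import pandas as pd

comm = pd.DataFrame({"sp_a": [3, 2, 0, 0],
                     "sp_b": [1, 0, 2, 4],
                     "sp_c": [5, 1, 0, 0]},
                    index=["ctrl1", "ctrl2", "fert1", "fert2"])
group = pd.Series(["control", "control", "fertilised", "fertilised"],
                  index=comm.index)

totals = comm.groupby(group).sum()      # group-level abundance per species
present = totals > 0
missing_in_fert = present.columns[present.loc["control"]
                                  & ~present.loc["fertilised"]]
print(list(missing_in_fert))
```

The same two lines with the group labels swapped give the species unique to the other treatment level.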
I am performing a RDA analysis with vegan package in R.
I have a doubt regarding the 'decostand' and 'scale' functions: are they the same? Should I use only one of them?
I have many soil variables ('data.var' : pH, CIC, N content, C/N ratio, microbial biomass, pH, CE, etc), and I was using both functions in my script:
# If 'data.var.sdz' was already standardized with decostand(), then
# scale = TRUE standardizes a second time -- use one or the other, not both.
# Note also that rda() itself has no 'perm' argument; permutations belong in
# the significance test, e.g. anova(rda_indexes, permutations = 999).
rda_indexes <- rda(data.var.sdz ~ depth, data = data, scale = FALSE)
I just realized that maybe this is is wrong. Any ideas on this?
Thanks in advance!
I am using Structural Equation Modeling (SEM) to determine the relationship between job demands and job strain. Five job demands are measured using 3 items, and job strain is measured using 4 items. A competing measurement model strategy utilizing a Confirmatory Factor Analysis (CFA) approach revealed that a model where job demands is estimated by a first-order five factor model comprised out of the five measured job demands and where job strain is estimated as a one-factor model fits the data best.
The next step in my analysis was to add directionality, resulting in the structural model. Estimating the standardized regression weights of the five job demands showed that two job demands, work overload and emotional demands, have a beta higher than |1|. I already checked for multicollinearity using the VIF score and this revealed that the highest VIF score, a score of 2.1, was assigned to emotional demands. This does not clearly indicate multicollinearity.
The emotional demands variable has a significant correlation of .42** with job strain, work overload has a significant correlation of .20** with job strain, and emotional demands and work overload correlate at .56**. Interestingly, the beta of work overload equals -1.44, which is negative, whereas its correlation is positive. Further, the beta of emotional demands is 2.11. When all variables are included, the R^2 for job strain equals 0.73.
When removing the work overload variable, the beta of emotional demands decreases to .58 and the R^2 decreases to 0.39. Likewise, when removing the emotional demands, the beta of work overload increases to -.08 and the R^2 decreases to 0.28. Looking at this effect, it seems to me that work overload is a suppressor variable in my model. However, I am not sure if this is the case, nor if it is correct for my standardized regression weights to be larger than |1|.
Does anyone know what to do with this issue?
If you require any additional information or data, please let me know.
Thank you in advance!
The background: I do research on stomach contents and have a dataset with many stomachs as samples (rows of dataset) and abundance for several prey categories in the stomachs (columns of dataset). I can group my data for different factors (e.g. year, season, size-class etc.) for example to test for differences in diet composition between years. I am using the R 'vegan' package.
My question: when I run e.g. a PERMANOVA (in fact the adonis2 function from vegan) on the raw data, meaning several thousand stomachs as individual samples, I get high significance but also low R2 values, as the high number of residual degrees of freedom 'spoils' the model. When I summarise the data and THEN perform the multivariate statistics, I get lower significance but higher R2 values, which is desirable (as they quantify each term's contribution to the model). The problem here is that sometimes I have only 1 degree of freedom (e.g. when comparing only two years with each other), and then the statistic doesn't work at all.
What would be the right way to proceed with such data: one or the other way of structuring the data? Or something completely different, e.g. a Kruskal-Wallis ANOVA?
Many thanks for any suggestions.
In my work, I use a multivariate GARCH model (DCC-GARCH). I am testing for autocorrelation in the variance model. Ljung-Box tests (Q) on the standardized residuals and on the squared standardized residuals give different results.
Should I rely on the Ljung-Box test of the standardized residuals or of their squares?
I have listed attributes of each mammal species rescued during a dam inundation, such as "can swim/unable to swim", "arboreal/terrestrial", "cryptic/non-cryptic", frequency of capture ("common/rare"), etc. What analysis can be done to determine the contributing factors and the groupings of animal characteristics that influence their being rescued? I've been suggested Principal Component Analysis (PCA), but I'm also exploring other options.
If in a multivariate model we have several continuous variables and some categorical ones, we have to change the categorical variables to dummy variables containing either 0 or 1.
Now, to put all the variables together to calibrate a regression or classification model, we need to scale the variables.
Scaling a continuous variable is a meaningful process, but doing the same with columns containing only 0 or 1 does not seem ideal: the dummies will not have their "fair share" of influence on the calibrated model.
Is there a solution to this?
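One common compromise is to standardize only the continuous columns and pass the 0/1 dummies through unchanged, since they are already bounded and roughly comparable. A sketch with scikit-learn (column names invented):

```python
# Scale continuous columns, leave dummy columns untouched (scikit-learn).
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({"age": [23., 45., 31., 60.],
                   "income": [30e3, 80e3, 50e3, 90e3],
                   "smoker": [0, 1, 0, 1]})       # dummy-coded categorical

ct = ColumnTransformer(
    [("num", StandardScaler(), ["age", "income"])],
    remainder="passthrough")                      # dummies pass through as 0/1
Z = ct.fit_transform(df)

print(Z.round(2))   # first two columns standardized, last column still 0/1
```

For models where the scale of dummies genuinely matters (e.g. penalized regression), some practitioners scale the dummies too; the right choice depends on the estimator, so it is worth stating the decision explicitly in the methods.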
May I know how to obtain the adjusted OR (aOR) in a multivariate model when the crude (bivariate) analysis is not significant and the variable is therefore not meant to be included in the multivariate model?
I am using SPSS software.
Thank you in advance.
Using a sampling grid, we evaluated weed density at each node on an ordinal scale (0 = no weeds; 1 = fewer than 100 plants; 2 = between 200 and 300 plants, etc.). We sampled 50 fields and are seeking to convert the frequency distribution of densities calculated for each field into a single value that can serve as a response variable in multivariable analyses. Thanks!
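One common reduction is to assign each ordinal class a representative density (e.g. a midpoint) and take the frequency-weighted mean over a field's grid nodes. The class midpoints below are purely illustrative assumptions:

```python
# Frequency-weighted mean density index for one field.
# Midpoint values per ordinal class are invented placeholders.
midpoint = {0: 0, 1: 50, 2: 250}        # representative plants per node

# node counts observed in one field, per ordinal class
freq = {0: 10, 1: 5, 2: 5}

n_nodes = sum(freq.values())
field_index = sum(midpoint[k] * f for k, f in freq.items()) / n_nodes
print(field_index)   # single response value for this field
```

The choice of representative values drives the index, so it is worth reporting them and, if possible, checking that conclusions are robust to reasonable alternatives (e.g. class ranks instead of midpoints).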
For metabolite profiling of herbal drugs/medicinal plants using chromatography methods, we have to evaluate the data using multivariate analyses (such as PCA, PLS, PLS-DA, HCA, etc.). It would be very helpful for our students if you could recommend free (online) software that can do the multivariate analysis.
I am trying some multilevel SEMs with the lavaan package in R. I am wondering whether the current version can model and estimate cross-level interactions (e.g., something like the random-slope placeholder "s | y ON x" in Mplus, or similar).
Is that something lavaan can already do, or is Mplus the tool of choice for these cases?
Thanks in advance!
Participants go through a virtual scenario wherein they act as part of a squad of Marines, going through a market, getting fired upon, discovering other captured soldiers who are being held by insurgents, and ultimately freeing the one live soldier.
After going through the scenario, they first provided information about their "somatic anxiety" (a between-subjects variable). Then, for each element of the scenario, they stated the emotion they felt and rated how intensely they had felt it. They then recalled everything they could remember about the scenario and completed a recognition task.
I was able to match the intensity ratings, recall, and recognition measures for each element of the scenario. For simplicity's sake, let's say there are only three elements.
So I have as predictors:
Somatic anxiety score (between-subjects)
Element 1 Intensity
Element 2 Intensity
Element 3 Intensity
And for outcome measures:
Element 1 Recall
Element 2 Recall
Element 3 Recall
Element 1 Recognition
Element 2 Recognition
Element 3 Recognition
I was thinking of running the analysis with recall and recognition separately.
I need help with which analysis to run. Is this appropriate for canonical correlation?
If so, what about the between subjects variable? Can canonical correlation handle that?
Thanks for your help!
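Canonical correlation does fit this design, and the between-subjects somatic-anxiety score can simply join the predictor set alongside the three intensity ratings: CCA does not distinguish between- and within-subjects predictors, it just correlates two variable sets. Running recall and recognition as separate analyses is reasonable. A numpy sketch of the core computation, on simulated stand-in data (a shared latent factor plays the role of "emotional arousal"; all names are hypothetical):

```python
import numpy as np

def canonical_correlations(X, Y):
    """Canonical correlations between variable sets X (n x p) and Y (n x q)."""
    qx, _ = np.linalg.qr(X - X.mean(axis=0))
    qy, _ = np.linalg.qr(Y - Y.mean(axis=0))
    # the singular values of Qx'Qy are the canonical correlations
    return np.clip(np.linalg.svd(qx.T @ qy, compute_uv=False), 0.0, 1.0)

# simulated stand-in: 4 predictors (anxiety + 3 intensities) vs 3 recall scores
rng = np.random.default_rng(0)
latent = rng.normal(size=(100, 1))               # shared latent factor
predictors = latent + 0.5 * rng.normal(size=(100, 4))
outcomes = latent + 0.5 * rng.normal(size=(100, 3))
r = canonical_correlations(predictors, outcomes)  # descending, in [0, 1]
```

Significance of the canonical correlations is usually tested with Wilks' lambda; statistical software reports this alongside the correlations.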
I am attempting to analyze tissue homogenate to understand the pathogenesis of insulin resistance. If anyone is working on this, kindly help me in this regard.
We have a dataset consisting of six years of fish detection data from an open ocean acoustic tracking array that was deployed to record the presence of acoustically-tagged fishes. The array consists of 50 permanently moored and widely spaced tracking stations divided equally between a “deep” and “shallow” stratum. Our core question is “Does the ‘community’ of detected fish (16 species) differ across depth strata and seasons?” Secondarily, “What habitat covariates help explain differences in the community?” We are not especially interested in year or station effects.
I'm working in PRIMER v7, which allows PERMANOVA models with both fixed factors and continuous covariates. Without covariates, the design might be a repeated-measures approach (since stations never move): Season = fixed (4 levels), Stratum = fixed (2 levels), Station = random (50 levels) nested within Stratum.
Things get trickier when we consider adding covariates. Some covariates (e.g., distance from shore, seafloor slope, sediment type) are fixed properties of a station and will not change through time, while others, such as remotely sensed water temperature and chlorophyll, vary on much shorter timescales. One thought would be to include smaller time blocks as a random effect, perhaps in one-week or one-month increments (so 321 or 60 blocks, respectively, over the 6 years), and use mean temperature and chlorophyll values for each time block.
So my questions are:
1) Is it a PERMANOVA ‘felony’ to have some habitat covariate values that repeat many times while others do not?
2) Should Station even be a random effect when habitat covariates are tied to them?
We also considered using PERMANOVA only for the fixed-factor Season × Stratum (plus interaction) tests and the DISTLM routine for the continuous covariates, but the same problem of static covariates due to repeated 'sampling' of the same stations would seem to remain.
We welcome any insights or criticisms of this approach.
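For intuition about what the Stratum test is doing, here is a minimal one-way PERMANOVA sketch (pseudo-F per Anderson 2001, permutation p-value) on simulated station-by-species data. It uses Euclidean distance to keep the code short; PRIMER would typically use Bray-Curtis and the full nested design, so this is illustration only:

```python
import numpy as np

def pseudo_f(D, labels):
    """One-way PERMANOVA pseudo-F from a distance matrix (Anderson 2001)."""
    labels = np.asarray(labels)
    N = len(labels)
    groups = np.unique(labels)
    iu = np.triu_indices(N, k=1)
    ss_total = (D[iu] ** 2).sum() / N
    ss_within = 0.0
    for g in groups:
        idx = np.where(labels == g)[0]
        sub = D[np.ix_(idx, idx)]
        su = np.triu_indices(len(idx), k=1)
        ss_within += (sub[su] ** 2).sum() / len(idx)
    a = len(groups)
    return ((ss_total - ss_within) / (a - 1)) / (ss_within / (N - a))

def permanova_p(D, labels, n_perm=999, seed=0):
    """Permutation p-value for the one-way pseudo-F."""
    rng = np.random.default_rng(seed)
    f_obs = pseudo_f(D, labels)
    labels = np.asarray(labels)
    hits = 1  # the observed labelling counts as one permutation
    for _ in range(n_perm):
        if pseudo_f(D, rng.permutation(labels)) >= f_obs:
            hits += 1
    return f_obs, hits / (n_perm + 1)

# simulated detections: 10 "deep" and 10 "shallow" stations, 5 species
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, size=(10, 5)),    # deep
               rng.normal(1, 1, size=(10, 5))])   # shallow (shifted community)
D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))  # Euclidean distances
f, p = permanova_p(D, ["deep"] * 10 + ["shallow"] * 10)
```

With a built-in shift between strata, the permutation p-value comes out small; PRIMER's PERMANOVA extends this same logic to nested and covariate terms.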
How can I select variables for a PCA from a huge dataset? Apart from the biological significance of the variables, are there other criteria for objectively choosing the most relevant variables to include in a multivariate analysis (PCA in this case)?
Thank you in advance...
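Beyond domain knowledge, one objective screen is to run a preliminary PCA on all standardized variables and keep those with high communality, i.e. whose variance is well reproduced by the first few components, dropping variables that load on nothing; removing near-duplicate variables via pairwise correlations is another common filter. A numpy sketch of the communality screen on simulated data (three informative variables sharing a signal, two pure-noise variables):

```python
import numpy as np

def pc_communalities(X, k=2):
    """Share of each standardized variable's variance captured by the first k PCs."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    n = Z.shape[0]
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    loadings = Vt.T * s / np.sqrt(n - 1)   # correlations of variables with PCs
    return (loadings[:, :k] ** 2).sum(axis=1)

# simulated: variables 0-2 share a common signal, variables 3-4 are noise
rng = np.random.default_rng(0)
signal = rng.normal(size=(200, 1))
X = np.hstack([signal + 0.3 * rng.normal(size=(200, 3)),
               rng.normal(size=(200, 2))])
comm = pc_communalities(X, k=1)
```

Variables with communalities near zero contribute little structure and are candidates for exclusion.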
Is it possible to use linear discriminant analysis (LDA) to determine which of the analyzed variables best separates groups that are already known?
For example, I want to understand how three different land uses differ in terms of ecosystem-service provisioning. So I measure four variables for each ecosystem (soil carbon, dry matter, biodiversity, and GHG) and then run an LDA (in PAST 3.4 here).
I get the result in the attached picture. The grassland clearly seems more distinct than the other two croplands (it is displaced further along the x-axis).
Would it be correct to conclude that the grassland differs most from the other two crops, and that this seems to be driven by its level of biodiversity?
Thanks! (And of course, these data are not real; this is just an example.)
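Yes, that is exactly what LDA is for: with groups known in advance, the coefficients of the first discriminant axis on standardized variables (or the correlations of variables with LD1) show which variables drive the separation, and the reading of the plot above is the right kind of conclusion (ideally backed by a MANOVA test of the group difference). A minimal numpy sketch on simulated data mirroring the example, where only the fourth variable ("biodiversity", hypothetical here) separates the grassland:

```python
import numpy as np

def lda_directions(X, y):
    """Fisher discriminant directions via within/between-class scatter."""
    p = X.shape[1]
    mean_all = X.mean(axis=0)
    Sw = np.zeros((p, p))
    Sb = np.zeros((p, p))
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)          # within-class scatter
        d = (mc - mean_all)[:, None]
        Sb += len(Xc) * (d @ d.T)              # between-class scatter
    vals, vecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(vals.real)[::-1]        # sort by discriminating power
    return vals.real[order], vecs.real[:, order]

# simulated: three land uses, only variable 3 ("biodiversity") shifts the grassland
rng = np.random.default_rng(2)
groups = [rng.normal(0, 1, size=(30, 4)) for _ in range(3)]
groups[2][:, 3] += 2.0
X = np.vstack(groups)
y = np.array(["cropA"] * 30 + ["cropB"] * 30 + ["grass"] * 30)
Z = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize so coefficients compare
vals, vecs = lda_directions(Z, y)
ld1 = vecs[:, 0]                            # largest |coefficient| = key variable
```

Here LD1's largest absolute coefficient falls on the biodiversity variable, matching the interpretation proposed for the plot.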