Multivariate Analysis - Science topic

A set of techniques used when variation in several variables has to be studied simultaneously. In statistics, multivariate analysis is interpreted as any analytic method that allows simultaneous study of two or more dependent variables.
Questions related to Multivariate Analysis
  • asked a question related to Multivariate Analysis
Question
2 answers
I conducted a bivariate analysis between the independent and outcome variables and got a crude odds ratio of less than one. On multivariate analysis, I got an adjusted odds ratio greater than one for the same independent variable. How can I interpret this? Can it really happen, and if so, how?
Relevant answer
Hello Fisha Mehabaw Alemayoh. Readers will be better able to help you if you report the two ORs with their 95% CIs.
Meanwhile, are you aware of the distinction between positive and negative confounding?
In some disciplines, negative confounding is known as suppression.
HTH.
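A stratified 2×2 example shows how adjustment can reverse an odds ratio (negative confounding, or suppression). The counts below are invented for illustration and are not the asker's data. A binary confounder Z is strongly tied to both exposure and outcome, so the crude OR falls below 1 while the Mantel-Haenszel OR adjusted for Z is above 1. This sketch uses Mantel-Haenszel pooling rather than logistic regression, but because both stratum-specific ORs are about 2, a logistic model with Z as a covariate would show the same reversal.

```python
# Hypothetical counts for each stratum of a binary confounder Z, ordered as
# (exposed cases, exposed non-cases, unexposed cases, unexposed non-cases).
strata = {
    "Z=0": (8, 2, 60, 30),
    "Z=1": (18, 72, 1, 9),
}


def odds_ratio(a, b, c, d):
    """Odds ratio of a single 2x2 table."""
    return (a * d) / (b * c)


def mantel_haenszel_or(tables):
    """Mantel-Haenszel odds ratio pooled across strata."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den


tables = list(strata.values())
# Collapse over Z: sum each cell across strata, then compute the crude OR
crude = odds_ratio(*(sum(cell) for cell in zip(*tables)))
adjusted = mantel_haenszel_or(tables)

for name, table in strata.items():
    print(name, "stratum OR:", round(odds_ratio(*table), 2))
print("crude OR:", round(crude, 2))
print("Z-adjusted (Mantel-Haenszel) OR:", round(adjusted, 2))
```

Here the stratum ORs are 2.0 and 2.25, the crude OR is about 0.22, and the adjusted OR is about 2.09: exposure is concentrated in the low-risk stratum, which masks its harmful association until Z is accounted for.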
  • asked a question related to Multivariate Analysis
Question
10 answers
There is a common understanding that, in many cases, the absence of extreme values (univariate outliers) in individual variables may reduce the likelihood of extreme values when considering multiple variables simultaneously (multivariate outliers).
May I get some good citations to support the above claim?
Relevant answer
I agree with Sal Mangiafico's assessment of Dursa Hussein's response. It sure does look (and walk and quack) like AI generated text.
Hettiarachchi H.A.H., I don't recall ever seeing empirical data on the relationship between the presence of univariate outliers and multivariate outliers. But I am (presently) far from convinced that there is a strong relationship between the two. Why? Because a multivariate outlier can be thought of as an odd combination of values. When explaining this to students, I give an example of a dataset that has demographic variables on a sample of people ranging in age from early childhood to old age. In such a dataset, ages of 5 and 85 would not be univariate outliers, and body weights of 20 kg and 95 kg would not be univariate outliers. But an individual with age = 5 years and body weight = 95 kg would be a bivariate outlier. Imagine what such a data point would look like in a scatter-plot.
Again, I have not seen empirical data on this question, but I do know that multivariate outliers can occur when there are no univariate outliers. HTH.
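The age/weight example can be reproduced with the squared Mahalanobis distance, which measures how far each row lies from the centroid after accounting for the correlation between the variables. The data below are synthetic (ages 5 to 85 with weight rising roughly linearly, plus a small alternating perturbation) with one added odd case, a 5-year-old weighing 95 kg. The cut-off 13.82 is approximately the 0.999 quantile of a chi-square distribution with 2 df, a common but not universal flagging rule.

```python
import numpy as np

# Synthetic age/weight data: weight rises roughly linearly with age,
# plus an alternating +/-3 kg perturbation so the points are not collinear.
ages = np.arange(5, 86, dtype=float)              # 5, 6, ..., 85 years
noise = 3.0 * (-1.0) ** np.arange(ages.size)      # +3, -3, +3, ...
weights = 20 + (ages - 5) * 75 / 80 + noise       # roughly 20 kg to 95 kg

# Append the odd combination: a 5-year-old weighing 95 kg.
X = np.column_stack([np.append(ages, 5.0), np.append(weights, 95.0)])
odd = X.shape[0] - 1

# Univariate z-scores: neither value of the odd case is extreme on its own.
z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
print("max |z| for the odd case:", np.abs(z[odd]).max().round(2))
print("max |z| overall:", np.abs(z).max().round(2))

# Squared Mahalanobis distance of each row from the centroid.
diff = X - X.mean(axis=0)
S_inv = np.linalg.inv(np.cov(X, rowvar=False))
d2 = np.einsum("ij,jk,ik->i", diff, S_inv, diff)
print("largest d^2 at row", d2.argmax(), "value", d2.max().round(1))
# 13.82 is roughly the 0.999 quantile of chi-square with 2 df
print("odd case flagged:", d2[odd] > 13.82)
```

No univariate z-score reaches 3, yet the odd case has by far the largest Mahalanobis distance, because it sits far off the age/weight trend rather than far out on either axis.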
  • asked a question related to Multivariate Analysis
Question
3 answers
multivariate analysis
Relevant answer
In multivariate analysis, canonical correspondence analysis (CCA) is an ordination technique that determines axes from the response data as a linear combination of measured predictors. CCA is commonly used in ecology in order to extract gradients that drive the composition of ecological communities. CCA extends Correspondence Analysis (CA) with regression, in order to incorporate predictor variables.
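To make the "linear combination of measured predictors" idea concrete, here is a minimal NumPy sketch of one standard eigen-analysis formulation of CCA: the chi-square standardized residuals used in correspondence analysis are regressed on the site-weighted, centred predictors, and the fitted matrix is decomposed by SVD. The species and environment tables are hypothetical. It returns only the constrained eigenvalues and total inertia, not site or species scores; for real analyses, the `cca()` function in the R package vegan is the usual tool.

```python
import numpy as np


def cca_eigenvalues(Y, X):
    """Constrained eigenvalues of canonical correspondence analysis.

    Y: sites x species abundance matrix (every row and column sum > 0)
    X: sites x environmental predictors
    Returns (canonical eigenvalues, total inertia).
    """
    Y = np.asarray(Y, dtype=float)
    X = np.asarray(X, dtype=float)
    P = Y / Y.sum()
    r = P.sum(axis=1)                      # site weights
    c = P.sum(axis=0)                      # species weights
    expected = np.outer(r, c)
    # Chi-square standardized residuals, as in correspondence analysis
    Qbar = (P - expected) / np.sqrt(expected)
    # Centre the predictors with the site weights, then weight rows by sqrt(r)
    Xw = (X - r @ X) * np.sqrt(r)[:, None]
    # Weighted multiple regression of every species column on the predictors
    B, *_ = np.linalg.lstsq(Xw, Qbar, rcond=None)
    fitted = Xw @ B
    # Singular values of the fitted part define the canonical (constrained) axes
    sv = np.linalg.svd(fitted, compute_uv=False)
    return sv ** 2, (Qbar ** 2).sum()


# Hypothetical data: 6 sites, 4 species, 2 environmental variables
Y = np.array([[10, 2, 0, 1],
              [8, 3, 1, 0],
              [5, 5, 2, 1],
              [2, 6, 4, 3],
              [1, 4, 7, 5],
              [0, 2, 9, 8]])
X = np.array([[1.0, 0.2], [2.0, 0.1], [3.0, 0.4],
              [4.0, 0.3], [5.0, 0.8], [6.0, 0.5]])

eig, inertia = cca_eigenvalues(Y, X)
print("canonical eigenvalues:", np.round(eig, 4))
print("share of inertia explained by the predictors:", round(eig.sum() / inertia, 3))
```

With two predictors there can be at most two non-zero canonical eigenvalues, and their sum can never exceed the total inertia of the unconstrained CA.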
  • asked a question related to Multivariate Analysis
Question
8 answers
I plan to do PCA and MANOVA/PERMANOVA as the multivariate analysis tests for my data set. My problem is that the way PCA plots and MANOVA/PERMANOVA results are interpreted differs from research paper to research paper, and I need clarity on this. Could anyone suggest excellent sources to follow? I would appreciate good research papers with well-presented PCA and MANOVA/PERMANOVA results.
Relevant answer
As for PCA, I think this paper I wrote some years ago could be of interest regarding the interpretation of results (here based on component loadings).
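Loadings are often the easiest part of a PCA to interpret. As a rough sketch (synthetic data, NumPy only), PCA on standardized variables via the SVD gives the proportion of variance explained by each component and loadings expressed as correlations between each original variable and each component. The data-generating setup (three variables driven by one hypothetical latent factor, plus two noise variables) is an assumption for illustration.

```python
import numpy as np


def pca(X):
    """PCA on standardized data (i.e., the correlation matrix) via SVD."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    eigvals = s ** 2 / (Z.shape[0] - 1)        # variance of each component
    explained = eigvals / eigvals.sum()
    # Loadings = correlations between the original variables and the components
    loadings = Vt.T * np.sqrt(eigvals)
    scores = U * s                              # component scores (= Z @ V)
    return explained, loadings, scores


rng = np.random.default_rng(1)
latent = rng.normal(size=(50, 1))
# Three variables driven by one hypothetical latent factor, plus two noise variables
X = np.hstack([latent + 0.3 * rng.normal(size=(50, 1)) for _ in range(3)]
              + [rng.normal(size=(50, 2))])

explained, loadings, scores = pca(X)
print("proportion of variance:", np.round(explained, 2))
print("loadings on PC1 and PC2:\n", np.round(loadings[:, :2], 2))
```

Because these loadings are correlations, each variable's squared loadings sum to 1 across all components, and a large absolute loading means the component largely reflects that variable.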
  • asked a question related to Multivariate Analysis
Question
1 answer
Most of the time, researchers perform bivariate analyses of one dependent variable against each of several independent variables and use a p-value cut-off of 0.25 to select candidate variables for the multivariable model. Is there a hard rule for setting this p-value? If so, what are the criteria?
Relevant answer
Hello Berhanu Negese Kebede. The bivariate (or univariable) screening you describe is not recommended, because it is known to produce over-fitted models. See Mike Babyak's nice 2004 article on over-fitting, for example. And see the datamethods.org checklist section on multivariable models. HTH.
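The over-fitting problem with univariable p < 0.25 screening can be demonstrated by simulation. This sketch is an illustration, not Babyak's analysis: it uses ordinary linear regression rather than logistic regression and a normal approximation to the t-test p-values so that only NumPy is needed. Every candidate predictor is pure noise, yet the screened model's adjusted R² is, on average, clearly positive, because the same data were used both to pick the predictors and to fit them.

```python
import math

import numpy as np


def adjusted_r2(y, X):
    """Adjusted R^2 of an OLS fit of y on the columns of X plus an intercept."""
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    centred = y - y.mean()
    r2 = 1 - (resid @ resid) / (centred @ centred)
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


def approx_p(r, n):
    """Two-sided p-value for a Pearson correlation, normal approximation to t."""
    t = r * math.sqrt((n - 2) / (1 - r ** 2))
    return math.erfc(abs(t) / math.sqrt(2))


def mean_screened_adj_r2(n=100, n_candidates=20, cutoff=0.25, reps=500, seed=1):
    """Average adjusted R^2 after univariable screening of pure-noise predictors."""
    rng = np.random.default_rng(seed)
    values = []
    for _ in range(reps):
        X = rng.normal(size=(n, n_candidates))
        y = rng.normal(size=n)               # y is unrelated to every candidate
        keep = [j for j in range(n_candidates)
                if approx_p(np.corrcoef(X[:, j], y)[0, 1], n) < cutoff]
        if keep:                             # skip the rare run where nothing passes
            values.append(adjusted_r2(y, X[:, keep]))
    return float(np.mean(values))


print("screened at p < 0.25:", round(mean_screened_adj_r2(), 3))
print("no screening (all 20 kept):", round(mean_screened_adj_r2(cutoff=1.01), 3))
```

Without screening, the average adjusted R² of pure noise is close to 0, as it should be; after screening, it is inflated even though no predictor has any real effect.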
  • asked a question related to Multivariate Analysis
Question
5 answers
In practice, I tried to model all variables in a multivariate analysis using the enter method, where I simply put every factor into one model. Oddly enough, the odds ratios do not seem to be correct, as shown in exhibit I.
For some variables, Exp(B) seems to be wrong. I used a complex sample design, and all analyses were conducted in a weighted manner using complex samples analysis.
Relevant answer
Take a sample from your data and check the autocorrelation function (ACF) and partial autocorrelation function (PACF).
If the autocorrelation decreases with the lag, check for outliers.
Remove them from the data and refit the model; you should then get a correct result.
  • asked a question related to Multivariate Analysis
Question
7 answers
Hello,
I am currently analyzing data from a study and am running into some issues. I have two independent variables (low vs high intensity & protein vs no protein intervention) and 5 dependent variables that I measured on two separate occasions (pre intervention and post intervention). So technically I have 4 groups a) low intensity, no protein b) low intensity, protein c) high intensity, no protein and d) high intensity, protein.
Originally I was going to do a two-way MANOVA, as I wanted to know the interaction between the two independent variables on multiple dependent variables; however, I forgot that I have two measurements of each dependent variable and want to include how they changed over time.
I can't seem to find a test that will incorporate all these factors, it seems like I would need to do a three-way MANOVA but can't seem to find anything on that. So I am thinking of a) calculating the difference in the dependent variables between the two time stamps and using that measurement for the MANOVA or b) using MANOVA for the measurement of dependent from the post test and then doing a separate test to see how each of the dependent variables changed over time. Is this the right line of thinking or am I missing something? When researching this I kept finding doubly multivariate analysis for multiple observations but it seems to me that that only allows for time and one other independent variable, not two.
Any guidance or feedback would be greatly appreciated :)
Relevant answer
Hello Isabel,
The basic design is a two-between (intensity, 2 levels; protein, 2 levels), one-within (occasion, 2 levels) manova. Between-within manovas are sometimes called "doubly multivariate."
So yes, you have three factors.
However, you may wish to consider a couple of points:
1. Do you really intend to interpret the results across arbitrary linear composite/s of the five DVs (generated by the software to maximize the variance accounted for by the between-subjects IVs), or would you prefer to address how the study factors influence scores on each of the five DVs? The first calls for the multivariate test; the second calls for univariate tests.
2. Consider using the pre-intervention score for a given dependent variable as a covariate, then use the post-intervention score as the DV score. This eliminates the within-subjects factor, and offers the ability to "adjust" for randomly occurring differences at the outset among the four treatment combinations.
3. In general, tests of change over time (e.g., meanT2 - meanT1) force you to work with difference scores that are less reliable than either of the scores from which they are derived, unless all time scores are perfectly reliable (no measurement error), which is seldom if ever the case.
Good luck with your work.
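Point 3 can be quantified with the classical test theory formula for the reliability of a difference score D = Y − X, assuming uncorrelated measurement errors. A small sketch with the observed standard deviations of X and Y as parameters (equal by default): with pre and post reliabilities of 0.80, the difference score's reliability falls from 0.80 when the pre/post correlation is 0 to 0.50 at a correlation of 0.60 and 0.20 at 0.75.

```python
def difference_score_reliability(r_xx, r_yy, r_xy, sd_x=1.0, sd_y=1.0):
    """Reliability of D = Y - X under classical test theory.

    r_xx, r_yy: reliabilities of the pre (X) and post (Y) scores
    r_xy: observed correlation between X and Y
    sd_x, sd_y: observed standard deviations (equal by default)
    """
    # True-score variance of D (errors assumed uncorrelated across occasions)
    true_var = sd_x**2 * r_xx + sd_y**2 * r_yy - 2 * sd_x * sd_y * r_xy
    # Observed variance of D
    obs_var = sd_x**2 + sd_y**2 - 2 * sd_x * sd_y * r_xy
    return true_var / obs_var


for r_xy in (0.0, 0.3, 0.6, 0.75):
    print(r_xy, round(difference_score_reliability(0.8, 0.8, r_xy), 3))
```

The more strongly pre and post scores correlate, which is typical in pre/post designs, the more of the difference score's variance is measurement error.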
  • asked a question related to Multivariate Analysis
Question
2 answers
In many articles I saw that the ARDL model can be used when there is only one cointegrating relationship. Therefore, to check the number of cointegrating relationships, I used the Johansen cointegration test and found only one. But in theory, the ARDL model uses the bounds F-test to check whether cointegrating relationships exist. How do we identify the number of cointegrating relationships using the bounds test? Is the bounds test unnecessary if I have already used the Johansen cointegration test?
Relevant answer
The ARDL bounds test assumes at most one long run levels relationship. It does not test for multiple long run relations. The Johansen test can test for multiple cointegrating relationships. The difference is that the Johansen procedure assumes all levels variables are I(1) whereas the ARDL procedure assumes the dependent levels variable is I(1) and the independent variables can be I(1) or I(0). Using both tests may be useful if there is uncertainty over the orders of integration of the levels variables and the number of long run relations. If you are certain all levels variables are I(1) you only need to use Johansen. If you are certain there is one long run levels relationship and uncertain over the independent variables' orders of integration you only need to use the ARDL test.
  • asked a question related to Multivariate Analysis
Question
3 answers
All of our results in the univariable analysis were non-significant. Should we perform a multivariable analysis in this context?
Relevant answer
I don't really believe this is a good idea.
First of all, I don't get why you tested univariate before multivariate.
Second, what is the nature of your study and what does it aim to answer? Is it exploratory or predictive? And what kind of model was used: a simple linear regression or something completely different?
If it's exploratory, try another method to judge the variables. If it's prediction, re-evaluate your data set and how the variables are presented.
  • asked a question related to Multivariate Analysis
Question
3 answers
Hi,
As my question already indicates, I would like to do some multivariate analysis of my proteomics data, as I have multiple characteristics in my samples. I have successfully used MetaboAnalyst for multivariate analysis in metabolomics approaches. Should I expect some drawbacks when using MetaboAnalyst for proteomics data, or is there an easy tool such as MetaboAnalyst for proteomics data?
Thank you for your help!
BR,
Timo
Relevant answer
MetaboAnalyst is primarily designed for metabolomics data analysis, but it also supports proteomics data analysis. Therefore, it is possible to use MetaboAnalyst for multivariate analysis of proteomics data, but there may be some limitations.
One potential issue is that the data preprocessing methods for proteomics data might differ from those used for metabolomics data. For example, the normalization and scaling methods for proteomics data might not be the same as those used for metabolomics data. Additionally, the statistical tests available in MetaboAnalyst might not be optimized for proteomics data.
However, despite these limitations, MetaboAnalyst is still a powerful tool that can be used for proteomics data analysis. It provides a user-friendly interface for data uploading, preprocessing, and multivariate analysis. It also supports a wide range of multivariate analysis techniques, including PCA, PLS-DA, OPLS-DA, and clustering analysis, which can be used to identify patterns and relationships in the data.
If you are looking for a specialized tool for proteomics data analysis, there are other software packages available, such as Progenesis QI for Proteomics, MaxQuant, and Perseus. These tools are designed specifically for proteomics data analysis and offer a wide range of advanced features and analysis methods.
  • asked a question related to Multivariate Analysis
Question
1 answer
I am conducting a multivariate time series analysis and my time series variables are measured in different scales (Dollars, Rupees, counts). I want to know whether standardization of variables is required before I apply VAR/ARCH/GARCH models. In addition, I want to know whether the mentioned models can be used if I have dummy variables.
Relevant answer
Your level variables must be converted to log returns before you estimate these models. However, it is not clear how you define the “counts” variable. You should ensure that all variables exhibit a certain level of heteroscedasticity and clustering.
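Converting levels to log returns, r_t = ln(P_t) − ln(P_{t−1}), also removes the scale problem raised in the question, because log returns are unit-free: multiplying a series by a constant (e.g., converting rupees to dollars at a fixed rate) leaves them unchanged. A minimal NumPy sketch follows; it assumes strictly positive levels, so a count variable containing zeros would need a different treatment. The price series and conversion rate are hypothetical.

```python
import numpy as np


def log_returns(levels):
    """First differences of natural logs: r_t = ln(P_t) - ln(P_{t-1})."""
    levels = np.asarray(levels, dtype=float)
    if np.any(levels <= 0):
        raise ValueError("log returns require strictly positive levels")
    return np.diff(np.log(levels))


prices_usd = np.array([100.0, 110.0, 99.0, 104.0])
prices_inr = prices_usd * 83.0        # hypothetical fixed conversion rate

print(log_returns(prices_usd))
# Same returns regardless of the currency unit
print(np.allclose(log_returns(prices_usd), log_returns(prices_inr)))
```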
Our recent paper published in Energy Economics might help you to get some direction for your analysis:
You can download a brief version of the code written in RATS from the link below:
  • asked a question related to Multivariate Analysis
Question
6 answers
In statistics, multivariate analysis of variance (MANOVA) is a procedure for comparing multivariate sample means. As a multivariate procedure, it is used when there are two or more dependent variables, and is often followed by significance tests involving individual dependent variables separately.
Independently of the image, the dependent variables might be k life satisfaction scores measured at sequential time points and p job satisfaction scores measured at sequential time points. In this case there are k+p dependent variables, and MANOVA assumes that they jointly follow a multivariate normal distribution, that the variance-covariance matrices are homogeneous across groups, that the dependent variables are linearly related, that there is no multicollinearity, and that there are no outliers.
Relevant answer
Generally you would avoid programming something like MANOVA yourself. There are dozens of software packages already available, many of them free, that will run MANOVA. For example, you could run this fairly easily with existing packages in R (a free, open source software programming environment for statistical analysis).
That said, if you are interested in following up MANOVA with individual dependent variables, there's no real point in using MANOVA. Just run the individual analyses and correct for multiple testing (if desired). Personally I've never found MANOVA very useful.
There are other multivariate approaches that could be considered - but again not useful I think if you just end up interpreting individual DVs.
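To show what the multivariate test actually computes, here is a sketch of one-way MANOVA's Wilks' lambda, Λ = det(W) / det(W + B), where W and B are the within- and between-group sums-of-squares-and-cross-products matrices. It only illustrates the statistic (no p-value), using two tiny hypothetical datasets; in practice R's built-in `manova()` followed by `summary(fit, test = "Wilks")` does this and more.

```python
import numpy as np


def wilks_lambda(Y, groups):
    """One-way MANOVA Wilks' lambda = det(W) / det(W + B)."""
    Y = np.asarray(Y, dtype=float)
    groups = np.asarray(groups)
    grand_mean = Y.mean(axis=0)
    p = Y.shape[1]
    W = np.zeros((p, p))   # within-group SSCP matrix
    B = np.zeros((p, p))   # between-group SSCP matrix
    for g in np.unique(groups):
        Yg = Y[groups == g]
        dev = Yg - Yg.mean(axis=0)
        W += dev.T @ dev
        m = (Yg.mean(axis=0) - grand_mean)[:, None]
        B += len(Yg) * (m @ m.T)
    return np.linalg.det(W) / np.linalg.det(W + B)


# Two hypothetical groups with identical centroids, then two far-apart groups
same = wilks_lambda([[1, 2], [3, 4], [1, 4], [3, 2]], ["a", "a", "b", "b"])
apart = wilks_lambda([[1, 2], [3, 4], [11, 14], [13, 12]], ["a", "a", "b", "b"])
print(round(same, 3), round(apart, 3))
```

Λ equals 1 when the group centroids coincide and approaches 0 as between-group variation dominates the within-group variation.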
  • asked a question related to Multivariate Analysis
Question
17 answers
In my time series dataset, I have 1 dependent variable and 5 independent variables and I need to find the independent variable that affects the dependent variable the most (the independent variable that explains most variations in the dependent variable). Consider all 5 variables are economic factors.
Relevant answer
There are several statistical models that can be used to analyze the relationship between time series variables. Some common approaches include:
  1. Linear regression: This is a statistical model that is commonly used to understand the relationship between two or more variables. In your case, you can use multiple linear regression to model the relationship between your dependent variable and the independent variables, and then compare the standardized coefficients (raw coefficients depend on each variable's units) to judge which independent variable has the strongest effect on the dependent variable. With nonstationary economic series, check stationarity first, since regressions between trending series can be spurious.
  2. Autoregressive Integrated Moving Average (ARIMA) model: This is a type of statistical model that is specifically designed for analyzing time series data. It involves using past values of the time series to model and forecast future values.
  3. Vector Autoregression (VAR) model: This is a statistical model that is similar to ARIMA, but it allows for multiple time series variables to be analyzed simultaneously.
  4. Granger causality test: This is a statistical test that can be used to determine whether one time series is useful in forecasting another time series. In other words, it can be used to determine whether one time series "Granger causes" another time series.
  5. Transfer function model: This is a statistical model that is used to analyze the relationship between two or more time series variables by considering the transfer of information from one series to another.
It's worth noting that there is no one "best" model for analyzing the relationship between time series variables, and the appropriate model will depend on the specifics of your data and the research question you are trying to answer.
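Regarding the first option: raw regression coefficients depend on each variable's units, so they cannot be compared directly when predictors are measured on different scales. A minimal sketch with synthetic data (not economic series) shows raw and standardized coefficients ranking two predictors in opposite order. With real economic time series you would typically make the series stationary first, and strongly correlated predictors make any single ranking fragile.

```python
import numpy as np


def standardized_coefficients(X, y):
    """OLS slopes after z-scoring every column, so they are in SD units."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xz = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    yz = (y - y.mean()) / y.std(ddof=1)
    # No intercept needed: every column is already centred
    beta, *_ = np.linalg.lstsq(Xz, yz, rcond=None)
    return beta


rng = np.random.default_rng(42)
n = 200
x1 = rng.normal(0, 1000, n)      # hypothetical large-scale predictor
x2 = rng.normal(0, 0.01, n)      # hypothetical small-scale predictor
y = 0.003 * x1 + 50 * x2 + rng.normal(0, 1, n)
X = np.column_stack([x1, x2])

A = np.column_stack([np.ones(n), X])
raw, *_ = np.linalg.lstsq(A, y, rcond=None)
std_coef = standardized_coefficients(X, y)
print("raw slopes:", raw[1:])          # x2's slope looks far larger
print("standardized:", std_coef)       # x1 actually moves y far more
```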
  • asked a question related to Multivariate Analysis
Question
3 answers
For example, we have a multivariate time series comprising 8 univariate time series. I am aware that some deep learning libraries can predict each of the time series in the multivariate series. I want to control what to forecast (for instance, only the first 4 series). Is it possible to use such deep learning libraries to accomplish that, or is there a better way to do it?
Thanks
Relevant answer
Thank you, Kareem Omran, for your comprehensive response. I will give the models you mentioned a try. I have been using an LSTM encoder-decoder, with which achieving my goal seems impossible for me.
Once again, thank you massively!
  • asked a question related to Multivariate Analysis
Question
4 answers
This is in context of the objective function of a multivariate optimization problem say, f(a,b,c).
I am looking for a "measure" for the degree of bias of f(a,b,c) towards any of the input variables.
Relevant answer
Siddhartha Pany, then, perhaps, you may be willing to look for something like
"marginal contribution to risk ... in portfolio theory"
"... the rate of change in risk (objective function) ... with respect to a small percentage change in the size of a portfolio allocation weight",
however, this is a very specific example from portfolio theory.
Similarly, the notion of elasticity from economics (see attachment) might be another example of a measure of bias. The magnitude of the elasticity (a ratio) could be of any size, so a bound on the elasticity should be imposed as a constraint, I guess.
Good luck.
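The elasticity idea can be turned into a simple numerical measure of how strongly f(a, b, c) leans on each input: the point elasticity e_i = (∂f/∂x_i)·(x_i / f) gives the percentage change in f per 1% change in x_i. Below is a sketch using central finite differences. The Cobb-Douglas-style objective is hypothetical, chosen because its elasticities equal its exponents, which makes the output easy to check. Elasticities are local (they depend on the point x) and undefined where f or x_i is zero.

```python
import numpy as np


def elasticities(f, x, rel_step=1e-6):
    """Point elasticities e_i = (df/dx_i) * x_i / f(x) via central differences."""
    x = np.asarray(x, dtype=float)
    fx = f(x)
    out = np.empty_like(x)
    for i in range(x.size):
        h = rel_step * max(abs(x[i]), 1.0)
        up, down = x.copy(), x.copy()
        up[i] += h
        down[i] -= h
        out[i] = (f(up) - f(down)) / (2 * h) * x[i] / fx
    return out


def objective(v):
    """Hypothetical f(a, b, c) = a^0.2 * b^0.5 * c^0.3."""
    a, b, c = v
    return a ** 0.2 * b ** 0.5 * c ** 0.3


print(elasticities(objective, [2.0, 3.0, 4.0]))   # close to [0.2, 0.5, 0.3]
```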
  • asked a question related to Multivariate Analysis
Question
2 answers
I have 15 treatments, and my main interest is to find the best one. The response is measured every day for up to 30 days. My model has an interaction effect between time and treatment. I use a suitable effect size, and I need 80% power with a 5% type I error rate. How can I calculate the sample size by simulation?
Relevant answer
Did you eventually calculate it? I need guidance.
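The general recipe for sample size by simulation is: assume values for the effect and variance components, simulate many datasets of size n, analyse each one the way you plan to, and take the proportion of significant results as the power; then increase n until the power reaches 80%. The sketch below is deliberately simplified, and all of its settings are hypothetical: only two arms are compared (the putative best treatment against one competitor), the treatment × time interaction is a difference in linear time slopes, each subject's slope is estimated by OLS over 30 days, and a normal-approximation test is used. For the full 15-arm design you would replace the analysis step with the mixed model you actually plan to fit and account for multiple comparisons.

```python
import math

import numpy as np

Z_CRIT = 1.959963984540054   # two-sided 5% critical value of the standard normal


def simulated_power(n_per_arm, slope_diff, sd_slope, sd_resid,
                    days=30, reps=1000, seed=0):
    """Share of simulated trials in which the slope difference is significant."""
    rng = np.random.default_rng(seed)
    t = np.arange(1, days + 1, dtype=float)
    tc = t - t.mean()
    hits = 0
    for _ in range(reps):
        arm_slopes = []
        for mean_slope in (0.0, slope_diff):
            # Each subject gets its own true slope around the arm mean
            true_slopes = rng.normal(mean_slope, sd_slope, n_per_arm)
            y = true_slopes[:, None] * t + rng.normal(0, sd_resid, (n_per_arm, days))
            # OLS slope of each subject's trajectory (tc sums to zero)
            arm_slopes.append(y @ tc / (tc @ tc))
        a, b = arm_slopes
        se = math.sqrt(a.var(ddof=1) / a.size + b.var(ddof=1) / b.size)
        if abs(b.mean() - a.mean()) / se > Z_CRIT:
            hits += 1
    return hits / reps


def smallest_n(target=0.80, n_values=range(5, 201, 5), **kwargs):
    """First n in n_values whose simulated power reaches the target."""
    for n in n_values:
        if simulated_power(n, **kwargs) >= target:
            return n
    return None


n_needed = smallest_n(slope_diff=0.2, sd_slope=0.3, sd_resid=1.0, reps=500)
print("approximate n per arm:", n_needed)
```

The coarse grid (steps of 5) and modest number of replications keep the run fast; refining both near the answer gives a more precise estimate.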
  • asked a question related to Multivariate Analysis
Question
5 answers
Within a project about geographical traceability of horticultural products, we would like to apply classification models to our data set (e.g. LDA) to predict if it is possible to correctly classify samples according to their origin and based on the results of 20-25 different chemical variables.
We identified 5 cultivation areas and selected 41 orchards (experimental units) in total. In each orchard, 10 samples were collected (each sample from a different tree). The samples were analyzed separately. So, at the end, we have the results for 410 samples.
The question is: should the 10 samples per orchard be considered pseudoreplicates, since they belong to the same experimental unit (even if collected from independent trees)? Should the LDA be performed with 41 replicates (the 41 orchards, taking the average of the 10 samples), or should we run it on the whole dataset?
Thank you for your help.
Relevant answer
Nick VL Serão, thank you for this solution; I have been looking for this answer. But do you know how to accomplish this in R?
  • asked a question related to Multivariate Analysis
Question
6 answers
I have collected data to assess determinant factors for an outcome variable with two responses (YES or No). The independent variables are large in number and in five categories (socio-economic, behavioral, environmental, occupational exposure, and drinking water). I want to see the association between the independent variables and the outcome variable. Moreover, I need to identify the most influencing determinants among the variables. As many authors did, I did a bivariate analysis for each independent variable and then selected significant ones for the next multivariate analysis. But, the variables prepared for multivariate analysis are still large in number. I must minimize the number of variables before using multivariate analysis.
Can I use PCA after the bivariate analysis (binary logistic regression) on the qualified variables within each category? If using PCA is possible, should I include the PCA output in a manuscript for publication?
I tried PCA before bivariate analysis, but some variables, which are associated significantly with the outcome variables when I use binary logistic regression are absent in components 1 and 2 of the PCA result.
Thank you for your help
Relevant answer
Yes you are right Adane Sirage Ali.
As an alternative to MCA and PCA, there is a third option that combines them, known as Factor Analysis of Mixed Data (FAMD). MCA is used for categorical data, PCA for continuous data, and FAMD when categorical and continuous data are combined in a study.
  • asked a question related to Multivariate Analysis
Question
23 answers
I need suggestions for groundwater assessment articles that used discriminant analysis, as well as guidance on how to apply this analysis in R.
Reghais.A
Thanks
Relevant answer
  • asked a question related to Multivariate Analysis
Question
6 answers
In my current study, I am identifying the association between some independent variables and a dependent variable. For this I am using bivariate analysis (cross-tabulation with p-values) and multivariate analysis (multiple logistic regression with adjusted odds ratios). Some previous studies on my topic used different p-value cut-off points, e.g. p < 0.25 or 0.05, and others included some variables without such a restriction.
What should I do? Should I include the same variables in both of the bivariate and multivariate analyses?
Thank you in advance.
Relevant answer
You should not select variables by their p-value. You should build a model based on theory. If you consider a variable interesting or important enough to include it in the model, then include it. Otherwise don't.
  • asked a question related to Multivariate Analysis
Question
5 answers
Dear esteemed colleagues
Many times in empirical research, a researcher faces an issue of sample size. The researcher may need to work with a small sample. It would be of help if you share your views and some literature that suggests the acceptable size of a small sample.
Best regards
Relevant answer
In the field of next-generation sequencing you can find papers with n = 1. This is particularly widespread in, but not limited to, single-cell analyses, where each cell is considered an independent entity. Accordingly, these papers report incredible pseudoreplication (n = several thousands instead of 1).
What sample size is acceptable depends on the particular case: how well established is the theory? is there a quantitative understanding and are the precisions of the estimates good enough? is it well embedded into other findings? are there additional experiments confirming the results and discrediting possible alternatives? how relevant or surprising is the conclusion?
Often, bad experimental design and choice of the analysis strategy do more harm than having a "small" sample.
  • asked a question related to Multivariate Analysis
Question
2 answers
Dear all,
I am a newbie with multivariate analyses and hoping to get some help here...
I am trying to conduct a Distance-based Linear Modelling (distLM) on PRIMER v7 to analyse the relationship of some environmental data with species abundance data.
I understand that the Marginal Test identifies the environmental variables that, when considered individually and ignoring all other environmental variables, have a significant relationship with the biological data.
But what is the purpose of the Marginal Test, when some of the outcomes are bound to change in a Sequential Test, where the overlapping effects of the environmental variables are considered, which is a more realistic scenario?
E.g., factor X3 is non-significant in the Marginal Test, but it becomes significant in the Sequential Test when factors X1 and X2 are taken into account.
Thank you very much in advance for any input!
Relevant answer
You could think of this in terms of variances of model errors and the relative sizes of those variances.
For a single explanatory variable, it may have predictive power as measured by its ability to reduce the variance of what it is being used to predict. At the single-variable regression stage, that reduction in variance may be small compared to the overall variance of the dependent variable. But at the multiple-variable stage, you compare that reduction in variance with a much smaller error variance, namely the error variance of the model omitting your variable.
Thus a given reduction in variance is judged unimportant in the single-variable case, but important in the multiple-variable case. That is really all the "significance test" tells you.
However, you have to recall the effect of sample size here, since even a small error reduction will be judged significant with a large enough sample.
Thus the significance tests are saying that, in the single-variable case, you don't have enough evidence to show that there really is a reduction in error variance compared to a mean-value prediction, while in the multiple-variable case you do have enough evidence that the extra variable reduces the smaller error variance.
This may not help much. However, the single-variable results should be a reminder to look at the corresponding cross-plots of the variables to check for anomalous data-values.
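The same logic can be seen in an ordinary least-squares setting. This is an analogy only, not PRIMER's distLM, which works with pseudo-F on a resemblance matrix. In the synthetic example below, X3 has a real but small effect next to two large predictors: its marginal F compares the variance it removes with the large total variance of y, whereas its sequential F, after X1 and X2, compares the same kind of reduction with the much smaller residual variance.

```python
import numpy as np


def rss(y, cols):
    """Residual sum of squares of an OLS fit of y on an intercept plus cols."""
    A = np.column_stack([np.ones(len(y))] + list(cols))
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return resid @ resid


def partial_f(y, base_cols, new_col):
    """F statistic (1 numerator df) for adding new_col to a model with base_cols."""
    rss_reduced = rss(y, base_cols)
    rss_full = rss(y, list(base_cols) + [new_col])
    df_resid = len(y) - len(base_cols) - 2      # intercept + base columns + new column
    return (rss_reduced - rss_full) / (rss_full / df_resid)


rng = np.random.default_rng(7)
n = 100
x1, x2, x3 = rng.normal(size=(3, n))
y = 5 * x1 + 5 * x2 + 0.5 * x3 + rng.normal(size=n)

marginal = partial_f(y, [], x3)            # X3 alone vs intercept-only model
sequential = partial_f(y, [x1, x2], x3)    # X3 after X1 and X2
print(f"marginal F = {marginal:.2f}, sequential F = {sequential:.2f}")
```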
  • asked a question related to Multivariate Analysis
Question
11 answers
Hello everyone,
How can I conduct a correlation test between two nominal variables, gender and shelter type (5 categories), and scale variables such as income, travel time, and shelter distance?
My objective is to show the significant correlation among variables.
Thank you.
Relevant answer
Look for the eta coefficient (correlation ratio).
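For a nominal variable paired with a scale variable (e.g., shelter type with travel time), eta is the square root of the between-group share of the total sum of squares, so η² equals the R² of a one-way ANOVA. A short sketch with made-up values:

```python
import numpy as np


def correlation_ratio(categories, values):
    """Eta: sqrt(between-group sum of squares / total sum of squares)."""
    categories = np.asarray(categories)
    values = np.asarray(values, dtype=float)
    grand = values.mean()
    ss_total = ((values - grand) ** 2).sum()
    ss_between = sum(
        values[categories == c].size * (values[categories == c].mean() - grand) ** 2
        for c in np.unique(categories)
    )
    return np.sqrt(ss_between / ss_total)


shelter = ["A", "A", "B", "B", "C", "C"]     # hypothetical shelter types
travel_time = [10, 12, 30, 28, 55, 60]       # hypothetical minutes
print(round(correlation_ratio(shelter, travel_time), 3))
```

Eta runs from 0 (all group means equal) to 1 (no variation within groups) and has no sign, since the categories have no order.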
  • asked a question related to Multivariate Analysis
Question
2 answers
Hello friend,
I did my thesis using a simple lattice design (2 replications) over 2 years. I want to do a combined analysis of variance. How can I do it? Which statistical software and which program should I use?
Thanks
Relevant answer
Hello dear, you can use R software.
  • asked a question related to Multivariate Analysis
Question
10 answers
Actually, I have one independent variable and one dependent variable, and I am using one latent variable as a mediator, through which I want to know the effect on the dependent variable. The problem is that I do not know whether data for the latent variable collected through a semi-structured questionnaire, rather than on a Likert scale, make a structural equation model possible. Kindly explain and help. I am a rookie researcher and new to multivariate analysis and other methods.
Relevant answer
This would require the open ended questions to reflect something that can be meaningfully scored on a continuous (quantitative, i.e., at least ordinal) scale. If the answers to those open questions are purely qualitative in nature (nominal scale), it may be difficult or impossible to include those variables in an SEM as indicators of continuous latent variables.
  • asked a question related to Multivariate Analysis
Question
2 answers
Hello,
I'm working on a clustering analysis and would be curious whether anyone has ideas about how to deal with nested categorical variables.
Normally I would calculate a distance/dissimilarity matrix (Gower when some variables are categorical), and then feed this to a clustering algorithm of choice. Now what happens when some categorical variables are nested?
Fictitious example
Suppose we measure characteristics of water samples such as turbidity, temperature, dissolved gases, and the presence/absence of 50 chemical compounds in the water.
* presence/absence of chemical compounds can be treated as 50 separate binary/categorical variables
* but say that these chemicals belong to 4 groups of compounds?
Thoughts
We could simply add an additional categorical variable "group" and, for more complex nesting, "subgroup", "subsubgroup"... OK, but as far as I understand, Gower distance is a bit like Manhattan distance in that it calculates a distance for each variable and then combines them as a weighted average. But part of the information will be redundant, and even more so with more levels of nesting. I was wondering whether anyone has come up with something else to deal specifically with that. Maybe some form of weighting of the variables?
Looking forward to your inputs!
Mick
Relevant answer
Thank you for taking the time to reply Muhammad Ali .
I looked at the linked resources but do not see anything related to my question (*nested categorical* variables, not simple categorical variables). In case I missed it, could you please indicate the relevant section?
Kind regards,
Mick
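One way to act on the weighting idea is a weighted Gower dissimilarity in which each compound gets weight 1/(number of compounds in its group), so every group contributes the same total weight however many members it has. This is a hypothetical scheme, not an established method for nested variables. The sketch treats presence/absence as a simple match/mismatch and numeric variables as range-scaled absolute differences; R's `cluster::daisy(metric = "gower")` also accepts a `weights` argument if you prefer an existing implementation. The water-sample data and compound groups are invented.

```python
import numpy as np


def gower(X, is_numeric, weights):
    """Weighted Gower dissimilarity matrix.

    X: rows x variables, numeric values or integer-coded categories
    is_numeric: one bool per column (True = range-scaled absolute difference,
                False = simple match/mismatch)
    weights: one non-negative weight per column
    """
    X = np.asarray(X, dtype=float)
    w = np.asarray(weights, dtype=float)
    n, p = X.shape
    D = np.zeros((n, n))
    for k in range(p):
        col = X[:, k]
        if is_numeric[k]:
            span = col.max() - col.min()
            if span > 0:
                d = np.abs(col[:, None] - col[None, :]) / span
            else:
                d = np.zeros((n, n))
        else:
            d = (col[:, None] != col[None, :]).astype(float)
        D += w[k] * d
    return D / w.sum()


# Hypothetical samples: turbidity, then compounds c1-c3 (group G1) and c4 (group G2)
X = [[1.0, 1, 1, 1, 0],
     [3.0, 0, 0, 0, 0],
     [2.0, 1, 1, 1, 1]]
is_numeric = [True, False, False, False, False]
group_weights = [1.0, 1 / 3, 1 / 3, 1 / 3, 1.0]   # each compound group sums to 1

print(np.round(gower(X, is_numeric, [1.0] * 5), 3))      # plain Gower
print(np.round(gower(X, is_numeric, group_weights), 3))  # group-balanced
```

In this toy example, the plain Gower distances from the first sample to the other two are 0.8 and 0.3, largely driven by the three related G1 compounds; the group-balanced distances are about 0.667 and 0.5.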
  • asked a question related to Multivariate Analysis
Question
5 answers
I have data of 6 groups with sample size of n = 2, 10, 2, 9, 3, 1 and I want to perform Permutational multivariate analysis of variance (PERMANOVA) on these data.
My question is: Is it correct to run perMANOVA on these data with the small sample size? The results look strange for me because the group of n = 1 showed insignificant difference to other groups although the graphical representation of the groups clearly show a difference.
Thank you
Relevant answer
I personally wouldn't do that. I would suggest that you increase your sample sizes. BTW, your sample with n = 1 has an undefined variance (zero degrees of freedom), which is why you observe that result. Best wishes David Booth
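Two quick calculations help explain the n = 1 result. First, PERMANOVA's pseudo-F (Anderson's formulation based on sums of squared distances) can be computed directly from a distance matrix; with Euclidean distances it equals the ordinary one-way ANOVA F. Second, in a permutation test the smallest attainable p-value is 1 divided by the number of distinct ways of relabelling the observations. For a pairwise test of a group of 1 against a group of 2 that is 3 arrangements, so p can never fall below 1/3, however large F is. The toy data are hypothetical; the group sizes are those from the question.

```python
import math

import numpy as np


def pseudo_f(D, groups):
    """PERMANOVA pseudo-F from a distance matrix (sums of squared distances)."""
    D2 = np.asarray(D, dtype=float) ** 2
    groups = np.asarray(groups)
    N = len(groups)
    labels = np.unique(groups)
    a = len(labels)
    ss_total = D2[np.triu_indices(N, k=1)].sum() / N
    ss_within = 0.0
    for g in labels:
        idx = np.flatnonzero(groups == g)
        sub = D2[np.ix_(idx, idx)]
        # A group of size 1 contributes nothing here (it has no pairs)
        ss_within += sub[np.triu_indices(len(idx), k=1)].sum() / len(idx)
    ss_between = ss_total - ss_within
    return (ss_between / (a - 1)) / (ss_within / (N - a))


def min_permutation_p(group_sizes):
    """Smallest attainable permutation p-value when all distinct relabellings
    (a multinomial count, observed one included) are enumerated."""
    count = math.factorial(sum(group_sizes))
    for n in group_sizes:
        count //= math.factorial(n)
    return 1 / count


x = np.array([0.0, 0.1, 10.0])
D = np.abs(x[:, None] - x[None, :])
print("pseudo-F:", round(pseudo_f(D, ["a", "a", "b"]), 1))

for n_other in (2, 10, 2, 9, 3):
    print(f"n = 1 vs n = {n_other}: smallest possible p = {min_permutation_p([1, n_other]):.3f}")
```

With the group sizes in the question, no pairwise comparison involving the n = 1 group can reach p < 0.05 (the best case, 1 versus 10, gives 1/11), which matches the "insignificant" result despite the visible separation. With equal group sizes, swapping labels gives the same F, so the effective count is smaller still.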
  • asked a question related to Multivariate Analysis
Question
3 answers
I seem to be getting extremely high HRs in multivariate Cox regression analysis. I only have 70 or so patients and 5 poor outcomes. Any ideas why my results may have turned out this way?
Relevant answer
Interested
  • asked a question related to Multivariate Analysis
Question
5 answers
If I use SmartPLS to test the structural model then how I can measure the Goodness of Fit Index (GFI). What are the indices I need to observe for validating the research model?
  • asked a question related to Multivariate Analysis
Question
10 answers
I know that some literature said that the minimum estimate for the path coefficient should be around 0.2, but is there any discretion or other opinion regarding this matter? Thank you for the attention.
Relevant answer
Very useful to follow. Good inquiry and answers. Thank you, professors.
Kind Regards,
  • asked a question related to Multivariate Analysis
Question
3 answers
I am trying to understand how multivariate data preprocessing works but there are some questions in my mind.
For example, I can do data smoothing, transformation (Box-Cox, differencing), and noise removal on univariate data (for any machine learning problem, not only time series forecasting). But what if one variable is noisy and the other is not? Or one is smooth and another is not (I would need a sliding-window average for one variable but not the other)? What happens then, and what should I do?
Relevant answer
You are looking for a perfect answer that will cover any situation. Data analysis doesn't work like that. I would suggest you look at John Tukey work on Exploratory Data Analysis. That's the best answer that I know of today. Best wishes David Booth
  • asked a question related to Multivariate Analysis
Question
1 answer
I am performing a meta-analysis of risk factors odds ratios including only adjusted OR. Some papers report only p-values of both univariate and multivariate analyses, and sufficient data to calculate crude, unadjusted odds ratio.
Do you know a valuable method to estimate the adjusted OR and standard error from these data?
Relevant answer
Answer
Hello Emilio,
I can't see a dependable way to elicit an estimate of adjusted OR (with or without SE) in the absence of the data points, and given only p-values and the crude OR.
Among the reasons: suppose the p-values are "< .001" in each case (simple model with only one IV vs. multiple-predictor model). There's no information there from which to infer the degree of change due to inclusion/exclusion of control variables. As well, wouldn't you need exactly the same set of control variables (no more, no fewer) in each study in order to have a comparable set of inferred adjusted ORs?
If you had the difference in model chi-square for 1-IV vs. multiple-IV models (and the associated df), you could infer the difference in information, but that's not what you're asking for.
If only a few studies are lacking the information you require, you might try reaching out to the author/s and requesting either the data or if they can furnish the adjusted OR values you seek.
Good luck with your work.
  • asked a question related to Multivariate Analysis
Question
3 answers
Please suggest the method and tools.
Relevant answer
Answer
BTW, this attachment gives a useful way to compute Mahalanobis D². Best wishes, David Booth
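The attachment is not reproduced here. As a hedged illustration only, the sketch below computes the squared Mahalanobis distance D² = (x − x̄)ᵀ S⁻¹ (x − x̄) for two variables using the Python standard library, with a made-up age/weight sample (all values hypothetical). It shows how a point whose individual values both occur in the data can still lie far from the centre once the correlation between the variables is taken into account.

```python
from statistics import mean


def mahalanobis_sq_2d(points, query):
    """Squared Mahalanobis distance of `query` from the centroid of `points`.

    `points` is a list of (x, y) pairs; the sample covariance matrix
    (n - 1 denominator) is inverted in closed form for the 2x2 case.
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx, my = mean(xs), mean(ys)
    n = len(points)
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    det = sxx * syy - sxy ** 2  # must be > 0 (no perfect collinearity)
    # Elements of the inverse covariance matrix
    ixx, iyy, ixy = syy / det, sxx / det, -sxy / det
    dx, dy = query[0] - mx, query[1] - my
    return dx * dx * ixx + 2 * dx * dy * ixy + dy * dy * iyy


# Hypothetical (age in years, body weight in kg) sample
sample = [(5, 20), (10, 32), (15, 50), (25, 68), (40, 75), (60, 80), (85, 70)]
typical = mahalanobis_sq_2d(sample, (40, 75))
odd_combination = mahalanobis_sq_2d(sample, (5, 80))  # age 5 and weight 80 both occur
print(f"D2 typical = {typical:.2f}, D2 odd combination = {odd_combination:.2f}")
```

With p variables, D² is commonly compared with a chi-square quantile on p degrees of freedom to flag candidate multivariate outliers; beyond two variables a matrix library would normally be used to invert S.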
  • asked a question related to Multivariate Analysis
Question
10 answers
Is it possible to do a multivariate analysis from a compositional statistical analysis perspective,
without excluding measurements that lie outside the compositional structure of a substance, such as temperature (°C), pH, EC, etc.? Or are there conditions and restrictions for that?
Relevant answer
Answer
Thank you dear Rabah Kechiched again
  • asked a question related to Multivariate Analysis
Question
5 answers
Hello my friends - I have a set of independent variables measured on a Likert scale, and one dependent variable also measured on a Likert scale. I have run the analysis and want to be sure I'm doing it right: how can I use age, gender, work experience and education level as control variables to measure their effect on the relationship between the independent variables and the dependent variable? Please give me one example. Thanks
Relevant answer
Answer
You may include all of your control variables (age, gender, work experience and education level) in one block and begin the regression analysis with them. Next, add the independent variables that are supposed to affect your dependent variable. I would recommend entering them one after another so that you can observe the change each new variable causes in the coefficients of the variables already included and in the explained variance of the dependent variable. Once you have entered all variables, you will have controlled for the initial four variables, presumably rendering their influence on the dependent variable insignificant.
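A minimal sketch of the blockwise (hierarchical) entry described above, assuming the Likert-based scores are treated as continuous and using simulated, hypothetical data: fit the control-only model, then add the predictor, and report the change in R². It uses plain ordinary least squares via the normal equations so that it runs with the Python standard library only; in practice a statistics package would also report the F-test for the R² change.

```python
import random


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def r_squared(predictors, y):
    """R^2 of an OLS fit of y on an intercept plus the given predictor columns."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in predictors] for i in range(n)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve_linear(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    y_bar = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Hypothetical data: two controls (age, gender) and one predictor scale score
rng = random.Random(42)
n = 50
age = [rng.uniform(20, 60) for _ in range(n)]
gender = [rng.randint(0, 1) for _ in range(n)]
score = [rng.gauss(3.0, 1.0) for _ in range(n)]
outcome = [0.02 * a + 0.3 * g + 0.8 * s + rng.gauss(0.0, 1.0)
           for a, g, s in zip(age, gender, score)]

r2_controls = r_squared([age, gender], outcome)     # block 1: controls only
r2_full = r_squared([age, gender, score], outcome)  # block 2: add the predictor
delta_r2 = r2_full - r2_controls
print(f"R2 controls = {r2_controls:.3f}, R2 full = {r2_full:.3f}, change = {delta_r2:.3f}")
```

Adding further predictors one at a time repeats the same comparison, each step measured against the previous model.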
I do not have an article at hand, but perhaps someone else can help you out.
Best
Marcel
  • asked a question related to Multivariate Analysis
Question
5 answers
I would like to analyse the beta diversity of more than five sampling sites and check their similarity. Can I analyse whether any species are shared among sampling sites? I would like to use the Bray-Curtis, Jaccard or Sørensen similarity index. Can I do this in PAST or Origin Pro? Which method is best for interpreting the result?
Relevant answer
Answer
David Eugene Booth Dear sir, may I have an answer? I am very confused.
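The indices named in the question can be computed directly from a site-by-species abundance table. The sketch below (hypothetical counts for three sites; standard-library Python as an illustration, not PAST or Origin Pro output) computes Bray-Curtis dissimilarity on abundances and Jaccard and Sørensen similarities on presence/absence, and lists the species shared by each pair of sites.

```python
from itertools import combinations


def present(counts):
    """Indices of species with non-zero abundance."""
    return {i for i, c in enumerate(counts) if c > 0}


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity (0 = identical, 1 = no shared abundance)."""
    return sum(abs(x - y) for x, y in zip(a, b)) / sum(x + y for x, y in zip(a, b))


def jaccard(a, b):
    """Jaccard similarity: shared species / species found at either site."""
    sa, sb = present(a), present(b)
    return len(sa & sb) / len(sa | sb)


def sorensen(a, b):
    """Sørensen similarity: 2 x shared species / (richness A + richness B)."""
    sa, sb = present(a), present(b)
    return 2 * len(sa & sb) / (len(sa) + len(sb))


species = ["sp1", "sp2", "sp3", "sp4"]
sites = {"A": [10, 0, 5, 3], "B": [8, 2, 0, 3], "C": [0, 4, 1, 0]}  # hypothetical

for s1, s2 in combinations(sites, 2):
    a, b = sites[s1], sites[s2]
    shared = [species[i] for i in sorted(present(a) & present(b))]
    print(s1, s2, f"BC={bray_curtis(a, b):.3f}", f"J={jaccard(a, b):.3f}",
          f"S={sorensen(a, b):.3f}", "shared:", shared)
```

Bray-Curtis similarity is 1 minus the dissimilarity; the functions assume no site is completely empty.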
  • asked a question related to Multivariate Analysis
Question
4 answers
I collected 109 responses for 60 indicators to measure the status of urban sustainability as a pilot study. As far as I know, I cannot run EFA because each indicator requires at least 5 responses, but I do not know whether I can run PCA with so few responses. Could you please advise me on the applicability of PCA or any other possible analysis?
Relevant answer
Answer
I would recommend you read about the difference between EFA and PCA first. Whether or not you should run an EFA has nothing to do with the number of response options on the indicators, five or otherwise. In general, EFA is preferable to PCA as it is considered to be the 'real' factor analysis. There are many threads on RG on this issue.
Best
Marcel
  • asked a question related to Multivariate Analysis
Question
6 answers
Dear colleagues,
I would appreciate your comments or recommendations on analysing hierarchical and multiple responses (outcomes). My outcome variable has hierarchical, multiple responses because it is quality of life, measured with the RAND-36 (SF-36). From the 36 items I can calculate a continuous mean score for total quality of life. But, as you may know, the SF-36 also yields 8 domain scores (PF, RP, BP, GH, MH, RE, VT, SF) and 2 summary (dimension) scores (PCS and MCS). So my outcomes are both multiple and hierarchical.
level 1: total mean score of quality of life
level 2: --- Physical component summary
level 3: ------ PF: physical functioning
level 3: ------ RP: role limitation due to physical problems
level 3: ------ BP: body pain
level 3: ------ GH: general health
level 2: --- Mental component summary
level 3: ------ MH: mental health
level 3: ------ RE: role limitation due to emotional problems
level 3: ------ VT: vitality (fatigue or energy)
level 3: ------ SF: social functioning
The purpose of my (cross-sectional) study is to identify factors associated with hemodialysis patients' quality of life. I therefore have a series of explanatory variables (Xs) to estimate the Ys. My original analysis used multiple regression for each quality-of-life score (three hierarchical levels: the total QoL mean score, each of the 8 domain scores, and each of the 2 summary scores).
But this leads to a multiple-comparisons problem, and I also treated each type of score (total QoL mean, domain or summary score) as independent of the others, although they are actually correlated: the SF-36 conceptual framework builds in a hierarchy and correlations among the three levels of scores.
Therefore, I would like to ask for your comments or recommendations:
1) How can I analyse my Ys (outcomes) when they are multiple and hierarchical?
2) Will multilevel analysis (hierarchical linear regression) work for my Ys?
3) What other analysis methods could I try?
4) Could you please suggest some literature on the issue I am encountering?
Relevant answer
Answer
Ronán Michael Conroy, thank you so much for your kind advice. I will read up on the Dirichlet distribution.
  • asked a question related to Multivariate Analysis
Question
2 answers
I am planning to use PCA and OPLS-DA for my biochemometrics study, but I am on a tight budget. I am not sure how much SIMCA costs; there is a trial version, but I worry I won't be able to make full use of the free version with my data. Are there cheaper or free alternatives that provide quality PCA and OPLS-DA analyses?
Relevant answer
Answer
You can also use free online tools for multivariate statistics: MetaboAnalyst (https://www.metaboanalyst.ca/) or Biostatflow (http://biostatflow.org).
p.
  • asked a question related to Multivariate Analysis
Question
2 answers
I have a colleague who is doing an analysis to assess the effect of risk factors such as age, vital signs and electrolytes on surgical management outcome. The surgical outcome variable is coded as favourable (0) and unfavourable (1). However, some variables, such as the electrolytes (e.g., potassium), have three categories: low (1), normal (2) and high (3). We need your support and help. I would be grateful for your answers and for recommendations of papers for comparison.
Relevant answer
Answer
For reporting purposes, you may follow the write-up of Table 4 in the attached article.
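For the three-category potassium variable in the question, one common way to enter it into a logistic regression is indicator (dummy) coding with one category, for example "normal", as the reference, so that each exponentiated coefficient is an odds ratio versus normal. A minimal sketch with hypothetical patient values (the helper name and prefix are illustrative):

```python
def dummy_code(values, levels, reference, prefix):
    """Return 0/1 indicator columns for every level except `reference`.

    A k-level variable yields k - 1 columns; the omitted (reference) level
    is represented by all indicators being 0.
    """
    return {
        f"{prefix}_{level}": [1 if v == level else 0 for v in values]
        for level in levels
        if level != reference
    }


potassium = ["low", "normal", "high", "normal", "low"]  # hypothetical patients
columns = dummy_code(potassium, ["low", "normal", "high"], reference="normal", prefix="K")
print(columns)
```

Entering the original 1/2/3 codes as a single numeric predictor would instead impose a linear trend across the categories.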
  • asked a question related to Multivariate Analysis
Question
5 answers
Hello all,
I am looking for a suitable way to interpret my data to understand the relationship between species abundance in different communities and soil nutrients (N, P, K and OC). For this I have 5 communities with 3 replicates each. Kindly suggest which multivariate analysis would be most appropriate. Thanks.
Relevant answer
Answer
Try Pearson correlation alongside multiple regression, with soil nutrients as the dependent variable and species abundance as the independent variables.
Best!
  • asked a question related to Multivariate Analysis
Question
3 answers
Hello all,
I am looking for the data sets for "Methods of Multivariate Analysis" by Alvin Rencher. The link given in the book is not working; could someone please share them?
Relevant answer
Answer
  • asked a question related to Multivariate Analysis
Question
9 answers
I once read that if I have a single independent variable and two or more dependent variables, I should use multivariate analysis.
But then I read somewhere that multivariate analysis = inferential statistics (where the results are generalized to the whole population).
Is it possible to use a statistical analysis that won't generalize the results?
Relevant answer
Answer
creative researcher
If your research is about demographic variables, the matter is different: you should use a two-way analysis of variance with multiple comparisons, so that you can identify the differences between the independent and dependent variables.
  • asked a question related to Multivariate Analysis
Question
8 answers
I am using "mvprobit" in Stata, but it is not clear to me how I can estimate marginal effects after it. Any help will be much appreciated.
Relevant answer
Answer
Dear All,
I have worked on this and written the code below for estimating marginal effects after mvprobit. I hope it will be useful.
* Example data
use http://www.stata-press.com/data/r7/school.dta, clear
* Step 1: Run the multivariate probit model
mvprobit (private = years logptax loginc) (vote = years logptax loginc)
* Step 2: Post-estimation linear predictions from the multivariate probit model estimated by SML
* (pred_xb1 is the linear prediction for the first equation, pred_xb2 for the second)
mvppred pred_xb, xb
* Step 3: Store the coefficients for each binary outcome
gen Coef_years_private = -.0089447
gen Coef_logptax_private = -.1018381
gen Coef_loginc_private = .3787381
gen Coef_years_vote = -.0160871
gen Coef_logptax_vote = -1.260877
gen Coef_loginc_vote = .9744685
* Step 4: Marginal effect = normal density of the equation's own linear prediction x coefficient
gen ME_years_private = normalden(pred_xb1)*Coef_years_private
gen ME_logptax_private = normalden(pred_xb1)*Coef_logptax_private
gen ME_loginc_private = normalden(pred_xb1)*Coef_loginc_private
gen ME_years_vote = normalden(pred_xb2)*Coef_years_vote
gen ME_logptax_vote = normalden(pred_xb2)*Coef_logptax_vote
gen ME_loginc_vote = normalden(pred_xb2)*Coef_loginc_vote
* Step 5: After estimating the ME for each observation, get the mean using summarize or mean
summarize ME_*
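The reasoning behind Step 4 can be checked numerically. For a single probit equation, Pr(y = 1) = Φ(xβ), so the derivative with respect to a continuous regressor x_j is φ(xβ)·β_j, which is what normalden(xb)*coef computes observation by observation. The hedged Python sketch below (hypothetical linear predictions; it covers the marginal probability of one outcome, not the joint probabilities of the multivariate probit) compares that formula with a finite-difference derivative and averages it as in Step 5.

```python
import math


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_pdf(z):
    """Standard normal density (the quantity Stata's normalden returns)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def analytic_me(xb, coef):
    """Marginal effect of a continuous regressor on Pr(y = 1) = Phi(xb)."""
    return normal_pdf(xb) * coef


def numeric_me(xb, coef, h=1e-6):
    """Central finite difference of Phi(xb + coef * dx) with respect to dx."""
    return (normal_cdf(xb + coef * h) - normal_cdf(xb - coef * h)) / (2 * h)


# Hypothetical linear predictions for a few observations and one coefficient
xb_values = [-0.8, 0.1, 0.6, 1.3]
coef = 0.3787381
mes = [analytic_me(xb, coef) for xb in xb_values]
average_me = sum(mes) / len(mes)  # Step 5: average over observations
print(f"average marginal effect = {average_me:.5f}")
```

For a 0/1 regressor, the discrete change in Φ(xβ) between the two values is usually reported instead of the derivative.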
  • asked a question related to Multivariate Analysis
Question
4 answers
TLDR: How many variables can I have in a VAR or VECM model?
I am writing my thesis and using a VECM (a VAR model with an error-correction term for cointegration) to analyse the relationship between prices on an energy exchange and some other factors. So far I have 4 variables in my model and I am thinking of adding more.
My question is: beyond how many variables does the model become unusable or unstable, or can I add as many as I like?
Thank you for your answers in advance!
Relevant answer
Answer
If you are not constrained by degrees of freedom, then you should be guided by the theory. Include all the relevant variables suggested by theory, that should constitute the minimum set of variables that you should include. Should you add more variables after that? That depends. What is the justification for those additional variables? You should be able to provide a justification for those added variables. You should also avoid a "kitchen sink" approach.
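To make the degrees-of-freedom constraint concrete: in an unrestricted VAR(p) with k variables and an intercept, each equation estimates k·p + 1 coefficients, so the total grows with k². The sketch below is a rough bookkeeping aid under those assumptions; a VECM is parameterised differently (k·(p − 1) lagged-difference terms per equation plus the error-correction terms) but grows in the same way. The sample size and lag order used are hypothetical.

```python
def var_parameters(k, p, deterministic=1):
    """Coefficients per equation and in total for an unrestricted VAR(p)."""
    per_equation = k * p + deterministic
    return per_equation, k * per_equation


def residual_df(n_obs, k, p, deterministic=1):
    """Residual degrees of freedom per equation after losing p initial observations."""
    per_equation, _ = var_parameters(k, p, deterministic)
    return (n_obs - p) - per_equation


T = 120  # hypothetical number of time points
for k in (4, 6, 8, 10):
    per_eq, total = var_parameters(k, p=4)
    print(f"k={k}: {per_eq} per equation, {total} in total, "
          f"residual df per equation = {residual_df(T, k, p=4)}")
```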
  • asked a question related to Multivariate Analysis
Question
5 answers
I did an MCA using FactoMineR. I know how to interpret cos2, contributions and coordinates, but I don't know how the values of v.test should be interpreted.
Thank you
Relevant answer
Answer
(An absolute v.test value above 1.96 corresponds to a p-value below 0.05.)
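Assuming, as the answer does, that v.test is read on the standard normal scale, the correspondence can be checked directly: the two-sided p-value for a value v is 2·(1 − Φ(|v|)) = erfc(|v|/√2). A minimal standard-library sketch:

```python
import math


def v_test_p_value(v):
    """Two-sided p-value for a v.test value read as a standard normal deviate."""
    return math.erfc(abs(v) / math.sqrt(2.0))


for v in (-2.5, 1.0, 1.96, 3.0):
    print(v, round(v_test_p_value(v), 4))
```

Because many categories are examined at once, this threshold is best treated as a descriptive guide rather than a strict test.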
  • asked a question related to Multivariate Analysis
Question
4 answers
Please see the attached file. Is it possible to find key variables in a system where they are interlinked?
Relevant answer
Answer
In PCA we get only orthogonal components; what I need is the most correlated variables, so that a small change in them will affect the whole system.
  • asked a question related to Multivariate Analysis
Question
3 answers
I'm using a two-way crossed ANOSIM with Time and Treatment to analyze similarities within and between groups. I chose 4999 permutations, but that was an arbitrary decision. I'd like to know how to determine the appropriate number of permutations. Even though I've read some of Clarke's papers (Clarke, 1993. Non-parametric multivariate analysis of changes in community structure) and the PRIMER v.6 manual, I have not reached a clear answer, possibly because of a lack of understanding or a need for simpler language. I'd appreciate any help.
Relevant answer
Answer
As a general rule (valid for N > 30), the statistical error halves if the sample size is increased fourfold. I would say that in randomization tests you get a rough estimate with 250 permutations, a reasonable result with 1000, and you are on the safe side with 4000. More than 10000 is rarely necessary. As the default number of permutations for anosim in R is 999 (+1, the observed case), you are not doing anything wrong with 4999.
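The rule that the error halves when the number of permutations quadruples follows from the binomial standard error of a permutation p-value, sqrt(p(1 − p)/n). The sketch below assumes the usual convention of counting the observed statistic as one of the permutations, p = (b + 1)/(n + 1), where b is the number of permuted statistics at least as extreme as the observed one; the numbers printed are illustrative.

```python
import math


def permutation_p_value(n_as_extreme, n_perm):
    """p-value counting the observed statistic as one extra permutation."""
    return (n_as_extreme + 1) / (n_perm + 1)


def monte_carlo_se(p, n_perm):
    """Approximate standard error of an estimated permutation p-value."""
    return math.sqrt(p * (1.0 - p) / n_perm)


for n_perm in (249, 999, 3999, 4999, 9999):
    print(n_perm, f"smallest attainable p = {1 / (n_perm + 1):.4f}",
          f"SE near p = 0.05: {monte_carlo_se(0.05, n_perm):.4f}")
```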
  • asked a question related to Multivariate Analysis
Question
3 answers
I'm trying to tabulate concentrations of some compounds in order to test whether tide (or depth, for other samples) and distance along a sampling transect affect these compounds (the 5 variables on the right). However, the only way I can think of doing this is shown in the attached image, where I'm forced to repeat the distance measurements, which results in them being treated as separate values. If I just split each dependent variable into two based on tide, I no longer have tide as an independent variable, which slows things down quite a bit.
Also, I'm trying to make my Tide variable binary, but I don't see an option for that; perhaps that's also a problem here? Attached is an image of my table.
Relevant answer
Answer
SPSS, IMO, is extremely confusing in this area. Rather than confuse you more, I'll leave that to an SPSS person and mention a simpler solution outside SPSS: R.
If you download Jared Lander's R for Everyone, available in the z-library, you will notice that R has functions that read other data formats directly, so you can use your current table without changing anything. Examples abound in the Lander book, which also explains how to download the R software; it is all completely free and runs on essentially any device, including smartphones. Keep your data essentially as is, switch to R following Lander's advice, and move on to a world without such problems, all at zero cost. The Lander text has commonly used procedures already set up (e.g., regression) that can be used directly. My advice is to look it over and try it. I did and never looked back. Questions can always be asked here and in many other places found by a Google search. Try it; you're going to like it. Best wishes, David Booth
  • asked a question related to Multivariate Analysis
Question
21 answers
I need help: what is the best free software for performing moderator analysis and covariate or multivariate analysis? Please help.
Relevant answer
Answer
I recommend Stata; it is very helpful software for multivariate analysis.
  • asked a question related to Multivariate Analysis
Question
1 answer
I want to investigate the relationship between differences in coral physiological variables based on euclidean distances and seawater environmental variables using DISTLM and dbRDA in PRIMER, but I am not sure if this analysis is suitable given the lack of replication I have in my predictor variable (environmental) matrix.
I have attached an excel file illustrating the structure of my data set (the response and predictor variables). Briefly, I have a multivariate data set of measured physiological variables (e.g. lipid concentration, protein concentration, tissue biomass etc.) for corals collected from five different locations (A-E), where each site is very unique in its seawater physico-chemical parameters. I collected 12 corals per site (total of 60 samples). I have constructed a resemblance matrix of the physiological data in PRIMER based on Euclidean distances, and there is clear grouping of data points in the NMDS, which coincides with the different collection sites for each coral. I want to investigate the proportion of the observed variation in the multivariate data cloud that can be explained by the environmental characteristics of each collection site (e.g. mean annual sea surface temperature, seawater chlorophyll concentration, salinity etc.). However, the dataset of environmental variables does not have replication. i.e. for each site (A-E), I only have one value for mean annual sea surface temp, one value of salinity etc.
All of the case-study examples I have read about distance-based redundancy analysis in R or PRIMER have two resemblance matrices (predictor and response) both of which have replication. However, in my case, my response variables have replication (i.e. 12 samples per site), whereas my environmental variables do not have replication (i.e. one measurement per variable per site).
Can someone advise me whether or not dbRDA is suitable in this instance? If as I predict, it is not suitable, can you recommend a better approach? I am not an expert in multivariate statistics, but I want to make sure that the approach I take is sound.
Any and all advice is welcome. Thanks
Relevant answer
Answer
Hi Rowan, I am in a similar situation. What I did was use an average of the response variables, but I do not know if it is the optimal solution. Did you solve this riddle in the end?
  • asked a question related to Multivariate Analysis
Question
5 answers
I have seen a plethora of scholars using a cut-off p-value for independent variables to enter the multivariable analysis. They usually use 0.2 or 0.25 as the cut-off point for taking variables from bivariable to multivariable analysis. But I personally don't believe in this, as the procedure is highly exposed to bias and could ultimately give us a biased estimate because of confounders. Any thoughts, please?
Thank you!
Relevant answer
Answer
Hello Wubet,
The practice of requiring that a bivariate relationship exist between a possible IV and the chosen DV before using the IV in a multivariable model is a "quick and dirty" approach to pruning a large set of potential IVs to a more manageable size.
There are three serious problems with this "method."
1. You are not guaranteed to find the "optimal" ensemble of IVs for explaining differences in DV scores. There are many other methods available (often used in data mining applications) which fare far better towards this goal.
2. You are running a risk of mistakenly incorporating at least one IV into the model, due to the sheer number of bivariate relationships evaluated. This risk increases as the number of candidate IVs increases.
3. You'll be guaranteed to miss any instance of suppressor effects among IVs.
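Point 2 can be quantified under simplifying assumptions (m candidate IVs that are all truly unrelated to the DV, screened with independent tests at level α): the chance that at least one of them is carried forward is 1 − (1 − α)^m. With the lenient screening thresholds mentioned in the question, this grows quickly. Real candidate IVs are usually correlated, so treat the figures as approximate.

```python
def prob_any_false_entry(alpha, m):
    """P(at least one null IV passes screening), assuming independent tests."""
    return 1.0 - (1.0 - alpha) ** m


for alpha in (0.20, 0.25):
    for m in (1, 5, 10, 20):
        print(f"alpha={alpha}, m={m}: {prob_any_false_entry(alpha, m):.3f}")
```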
So, I too would be skeptical of this approach.
Good luck with your work.
  • asked a question related to Multivariate Analysis
Question
3 answers
Dear colleagues,
I have been helping analyse a sustainability project that compares % of biomass in composts.
As the design is 5x2 (4 replicates each, which is what I was given) with 15 predictors, I'm using PERMANOVA and will later analyse the power to see if the analysis is valid.
However, the variables (chemical compounds and physical characteristics) have different units and quite different value ranges, and I need to standardize them (I'm using z-scores).
I have been looking for a while but can't find an answer to this question:
Should I apply the standardization by variable, i.e., using each variable's own mean and standard deviation, or should I use the central point (the mean and standard deviation of the whole dataset, applied to each measurement)?
They give me different results and I would like to be able to support the choice I will make.
I would love to hear some insights and references on that.
All the best,
Erica
Relevant answer
Answer
Hello Erica,
Standardize within each variable, so that each has a mean of 0 and SD of 1. The consequence of not standardizing is that the variable(s) with largest SD(s) will exert more influence on measured distances between cases.
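The difference between the two options in the question shows up in a small sketch (hypothetical values for two variables in very different units): standardizing within each variable gives every column mean 0 and SD 1, whereas using one grand mean and SD for the whole table keeps the original ratio of spreads, so the variable with the largest numbers still dominates Euclidean distances.

```python
from statistics import mean, stdev


def zscore_by_variable(rows):
    """Standardize each column with its own mean and SD."""
    cols = list(zip(*rows))
    stats = [(mean(c), stdev(c)) for c in cols]
    return [[(x - m) / s for x, (m, s) in zip(row, stats)] for row in rows]


def zscore_global(rows):
    """Standardize every value with the grand mean and SD of the whole table."""
    values = [x for row in rows for x in row]
    m, s = mean(values), stdev(values)
    return [[(x - m) / s for x in row] for row in rows]


# Hypothetical compost samples: pH and a nutrient concentration in mg/kg
rows = [[7.1, 1200.0], [6.8, 950.0], [7.4, 1500.0], [6.9, 1100.0]]
by_var = zscore_by_variable(rows)
overall = zscore_global(rows)
print("column SDs, by variable:", [round(stdev(c), 3) for c in zip(*by_var)])
print("column SDs, global:     ", [round(stdev(c), 3) for c in zip(*overall)])
```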
Good luck with your work.
  • asked a question related to Multivariate Analysis
Question
4 answers
Hello,
I'm working on a dataset looking at the effect of fertiliser * time on floristic diversity. I have run mixed models on a few responses, including species richness, and found that richness is lower in one treatment level. I have also run PERMANOVA/PERMDISP to measure the effect of treatments on community composition, and SIMPER analyses to look at the species contributing to the similarities/dissimilarities within/between groups. I would now like to find out which species are absent from the group with lower species richness; however, I'm not sure how I can use SIMPER (or another method) to identify which species are absent from one group relative to another. Can anyone help, please?
Many thanks,
Erika
Relevant answer
Answer
Why not just make a list of observed species for each treatment and compare lists? You could use a program like R, and a join feature to have the program tell you which records are in one list and not another. If it is more a question of abundance rather than presence-absence then run a multiple comparison test for each species.
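The list comparison suggested here is a set difference. A minimal sketch in Python rather than R (hypothetical species records; a species counts as present if it has a non-zero abundance in at least one plot of the treatment):

```python
def species_present(records):
    """Set of species with a non-zero abundance in at least one plot."""
    return {species for species, count in records if count > 0}


# Hypothetical (species, abundance) records pooled over plots in each treatment
control = [("Poa annua", 12), ("Trifolium repens", 4), ("Plantago lanceolata", 2),
           ("Lolium perenne", 0)]
fertilised = [("Poa annua", 20), ("Lolium perenne", 9), ("Trifolium repens", 0)]

only_in_control = sorted(species_present(control) - species_present(fertilised))
only_in_fertilised = sorted(species_present(fertilised) - species_present(control))
print("absent from fertilised plots:", only_in_control)
print("absent from control plots:", only_in_fertilised)
```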
  • asked a question related to Multivariate Analysis
Question
5 answers
Hi everyone!
I am performing an RDA with the vegan package in R.
I have a question about the 'decostand' and 'scale' functions. Are they the same? Should I use only one of them?
I have many soil variables ('data.var' : pH, CIC, N content, C/N ratio, microbial biomass, pH, CE, etc), and I was using both functions in my script:
.
.
data.var.sdz<-decostand(data.var,"standardize")
rda_indexes <- rda(data.var.sdz ~ data$depth, data, perm=999,scale=TRUE)
.
.
I just realized that maybe this is wrong. Any ideas on this?
Thanks in advance!
Relevant answer
Answer
Hi,
Using decostand + standardize is the same as scale. For example:
library(vegan)
data(varespec)
sptrans <- decostand(varespec, "standardize")
ss <- scale(sptrans)
summary(ss[,1:3])
summary(sptrans[,1:3])
So, in your case, if you are using data.var.sdz, there is no need to use scale=TRUE in the rda function.
Using scale=TRUE will not do anything to data.var.sdz since it is already scaled.
  • asked a question related to Multivariate Analysis
Question
6 answers
I am using Structural Equation Modeling (SEM) to determine the relationship between job demands and job strain. Five job demands are measured using 3 items, and job strain is measured using 4 items. A competing measurement model strategy utilizing a Confirmatory Factor Analysis (CFA) approach revealed that a model where job demands is estimated by a first-order five factor model comprised out of the five measured job demands and where job strain is estimated as a one-factor model fits the data best.
The next step in my analysis was to add directionality, resulting in the structural model. Estimating the standardized regression weights of the five job demands showed that two job demands, work overload and emotional demands, have a beta higher than |1|. I already checked for multicollinearity using the VIF score and this revealed that the highest VIF score, a score of 2.1, was assigned to emotional demands. This does not clearly indicate multicollinearity.
The emotional demands variable has a significant correlation of .42** with job strain, work overload has a significant correlation of .20** with job strain, and emotional demands and work overload have a correlation of .56**. Interestingly, the beta of work overload equals -1.44, which is negative, whereas its correlation is positive. Further, the beta of emotional demands is 2.11. When all variables are included, the R^2 in job strain equals 0.73.
When removing the work overload variable, the beta of emotional demands decreases to .58 and the R^2 decreases to 0.39. Likewise, when removing the emotional demands, the beta of work overload increases to -.08 and the R^2 decreases to 0.28. Looking at this effect, it seems to me that work overload is a suppressor variable in my model. However, I am not sure if this is the case, nor if it is correct for my standardized regression weights to be larger than |1|.
Does anyone know what to do with this issue?
If you require any additional information or data, please let me know.
Thank you in advance!
Relevant answer
Answer
It looks like a typical suppressor situation due to highly correlated predictor variables, as indicated by the standardized path coefficients > |1| and the opposite signs of the path coefficients compared to the zero-order correlations. Since these path coefficients are partial regression coefficients (not zero-order correlations), they can definitely be > |1|. This does not necessarily point to an improper solution (unless you get a latent variance or residual variance estimate that is negative, or a latent R^2 > 1).
The two predictors emotional demand and work overload are probably highly correlated with each other, and so one suppresses variance in the other when predicting job strain. This is not necessarily problematic. I would check out the general literature on suppression effects in path models, e.g.,
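A worked two-predictor example shows how this pattern arises. With standardized variables, the regression weights are β₁ = (r_y1 − r_y2·r_12)/(1 − r_12²) and β₂ = (r_y2 − r_y1·r_12)/(1 − r_12²), and R² = β₁·r_y1 + β₂·r_y2. The sketch plugs in the observed correlations reported in the question, then a hypothetical, higher predictor correlation (0.90) as a stand-in for latent correlations, which are usually larger than observed-score correlations because they are corrected for measurement error. It is a simplified observed-variable illustration, not a re-analysis of the SEM with all five job demands.

```python
def standardized_betas(r_y1, r_y2, r_12):
    """Standardized weights and R^2 for a regression on two predictors."""
    denom = 1.0 - r_12 ** 2
    b1 = (r_y1 - r_y2 * r_12) / denom
    b2 = (r_y2 - r_y1 * r_12) / denom
    r2 = b1 * r_y1 + b2 * r_y2
    return b1, b2, r2


# Predictor 1 = emotional demands, predictor 2 = work overload, y = job strain
observed = standardized_betas(0.42, 0.20, 0.56)
high_overlap = standardized_betas(0.42, 0.20, 0.90)  # hypothetical r_12
for label, (b1, b2, r2) in (("observed r_12 = .56", observed),
                            ("hypothetical r_12 = .90", high_overlap)):
    print(f"{label}: beta_ED = {b1:.3f}, beta_WO = {b2:.3f}, R2 = {r2:.3f}")
```

Even with the observed correlations, the weight for work overload turns negative despite its positive zero-order correlation; as the predictor correlation rises, the weights move beyond |1|.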
  • asked a question related to Multivariate Analysis
Question
4 answers
The background: I do research on stomach contents and have a dataset with many stomachs as samples (rows of dataset) and abundance for several prey categories in the stomachs (columns of dataset). I can group my data for different factors (e.g. year, season, size-class etc.) for example to test for differences in diet composition between years. I am using the R 'vegan' package.
My question: When I run, e.g., a PERMANOVA (in fact the adonis2 function from vegan) on the raw data, i.e. several thousand stomachs as individual samples, I get high significance but also low R2 values, as the large number of residuals 'spoils' the model. When I summarise the data and THEN perform the multivariate test, I get lower significance but higher R2 values, which is desirable (as they express the contribution to the model). The problem is that sometimes I have only 1 degree of freedom (e.g. when comparing only two years with each other), and then the test doesn't work at all.
What is the right approach when dealing with such data? Should I structure the data one way or the other? Or should I go for something completely different, e.g. a Kruskal-Wallis ANOVA?
Many thanks for any suggestions.
Relevant answer
Answer
Kruskal ANOVA is univariate and would ignore the multivariate nature of the question. It is also true as Jaime Pinzon says that applying a test that assumes homogeneity of dispersions may not be appropriate if there is a presumption that different fish have different breadths of diet (specialists vs generalists, for example).
You are getting small differences that are highly significant because the numerous samples give you massive power, but you are in danger of missing the bigger picture. The underlying problem here is the sample grain, and this is well understood for such dietary studies. What is found in an individual fish's stomach only tells you what it has just eaten. The data are dominated by individual-individual noise. To improve the 'signal (diet) to noise (last meal)' ratio it is often appropriate to redefine your samples as the sum or average of a number of fish guts. Choose a number (say 4, or 5, or 10) and then randomly sample groups of fish of that size from your data within each of your combinations of factors. This will give you much more robust data relevant to differences in diet among levels of factors, rather than differences in last meals. Finally, the fish are 'doing the sampling' so you have no control over the sampling effort, so it would be sensible to standardize (convert to percentages) and maybe do a mild (square root) transformation prior to calculating appropriate resemblances (e.g. Bray-Curtis).
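The pooling and standardization steps described above could be sketched as follows, assuming each stomach is a vector of prey counts and that the function is applied separately within each combination of factors (year × season × size class, etc.). The group size, data and seed are hypothetical, and leftover stomachs that do not fill a group are simply dropped here. Bray-Curtis resemblances would then be computed on the returned samples.

```python
import math
import random


def pool_stomachs(stomachs, group_size, rng):
    """Randomly partition stomachs into groups of `group_size` and sum prey counts."""
    shuffled = list(stomachs)
    rng.shuffle(shuffled)
    pooled = []
    for start in range(0, len(shuffled) - group_size + 1, group_size):
        group = shuffled[start:start + group_size]
        pooled.append([sum(counts) for counts in zip(*group)])
    return pooled


def percent_sqrt(sample):
    """Convert counts to percentages, then apply a square-root transformation."""
    total = sum(sample)
    if total == 0:  # empty pooled sample: nothing to standardize
        return [0.0] * len(sample)
    return [math.sqrt(100.0 * c / total) for c in sample]


rng = random.Random(1)
# Hypothetical stomachs from one factor combination, 3 prey categories each
cell = [[rng.randint(0, 5) for _ in range(3)] for _ in range(10)]
samples = [percent_sqrt(s) for s in pool_stomachs(cell, group_size=4, rng=rng)]
print(samples)
```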
For hypothesis testing, if you aren't too bothered by differences in dispersion Permanova may be OK. Alternatively multiway ANOSIM (which gives measures of relative effect sizes) would be an appropriate method, although it's not in R yet as far as I know. For more see Lek E., Fairclough D. V., Platell M. E., Clarke K. R., Tweedley J. R. & Potter I. C. (2011) To what extent are the dietary compositions of three abundant, co-occurring labrid species different and related to latitude, habitat, body size and season? J. Fish Biol. 78, 1913–1943 for a fish example (they used a sample size of 4 fish I think). For comparison of Permanova and Anosim see doi:10.1111/aec.13059. 3-way ANOSIM (incorporating the Lek et al. data) coming soon in the same issue.
  • asked a question related to Multivariate Analysis
Question
5 answers
In my work, I use the multivariate GARCH model (DCC-GARCH). I am testing for autocorrelation in the variance model. Ljung-Box tests (Q) for the standardized residuals and the squared standardized residuals give different results.
Should I rely on the Ljung-Box test or the Ljung-Box test on squared residuals?
N=1500
Relevant answer
Answer
The Ljung-Box test is aimed at testing the independence of errors using residuals of an ARMA model estimated on the same data. But it makes use of autocorrelations, so it is not powerful when the errors are uncorrelated but not independent. When applied to squared residuals, it can reveal ARCH and GARCH effects. Note that the errors of an ARCH-GARCH model are uncorrelated but not independent. Have a look at the excellent book by Francq and Zakoian entitled "GARCH Models: Structure, Statistical Inference and Financial Applications", published by Wiley in 2010.
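To see what the two versions of the test measure, here is a minimal standard-library sketch of the Ljung-Box statistic Q = n(n + 2) Σ_{k=1}^{h} r_k²/(n − k), where r_k is the lag-k sample autocorrelation. Applied to the standardized residuals it checks for remaining linear dependence; applied to their squares it checks for remaining ARCH/GARCH effects. Q is compared with a chi-square distribution on h degrees of freedom (in practice the degrees of freedom are often adjusted for estimated parameters, which this sketch ignores). The simulated ARCH(1) series of length 1500 is a toy stand-in for real residuals.

```python
import math
import random


def autocorrelation(x, k):
    """Lag-k sample autocorrelation of the sequence x."""
    n = len(x)
    m = sum(x) / n
    denom = sum((v - m) ** 2 for v in x)
    if denom == 0:
        raise ValueError("a constant series has no defined autocorrelation")
    return sum((x[t] - m) * (x[t - k] - m) for t in range(k, n)) / denom


def ljung_box_q(x, lags):
    """Ljung-Box Q statistic over lags 1..lags (compare with chi-square, df = lags)."""
    n = len(x)
    return n * (n + 2) * sum(autocorrelation(x, k) ** 2 / (n - k)
                             for k in range(1, lags + 1))


# Toy ARCH(1) "standardized residuals": uncorrelated levels, dependent squares
rng = random.Random(2024)
resid, e_prev = [], 0.0
for _ in range(1500):
    sigma2 = 0.6 + 0.4 * e_prev ** 2  # conditional variance depends on last shock
    e_prev = rng.gauss(0.0, 1.0) * math.sqrt(sigma2)
    resid.append(e_prev)

q_levels = ljung_box_q(resid, lags=10)
q_squares = ljung_box_q([e * e for e in resid], lags=10)
print(f"Q(10) residuals = {q_levels:.1f}, Q(10) squared residuals = {q_squares:.1f}")
```

For data of this kind, the statistic on squared residuals is typically far larger than the one on the levels, which is why the two versions of the test can disagree.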
  • asked a question related to Multivariate Analysis
Question
6 answers
Greetings,
I have listed the characteristics of each mammal species rescued during a dam inundation, such as "can swim/unable to swim", "arboreal/terrestrial", "cryptic/non-cryptic", and frequency of capture ("common/rare"). What analysis can be done to determine the contributing factors and the groupings of characteristics that influence whether an animal is rescued? I've been advised to use Principal Component Analysis (PCA), but I'm also exploring other options.
Thank you.
Relevant answer
Answer
Hello Nur,
If your sample is only comprised of rescued animals, then you won't be able to pinpoint features/attributes that distinguish them from non-rescued animals (especially if the vital traits differ by species). At best, you could look for traits that appear in a high fraction of the rescued animals, and propose that these might be salient features for survival via rescue.
Good luck with your work.
  • asked a question related to Multivariate Analysis
Question
6 answers
If a multivariate model has several continuous variables and some categorical ones, we have to convert the categorical variables to dummy variables containing either 0 or 1.
Now to put all the variables together to calibrate a regression or classification model, we need to scale the variables.
Scaling a continuous variable is a meaningful process. But doing the same with columns containing 0 or 1 does not seem to be ideal. The dummies will not have their "fair share" of influencing the calibrated model.
Is there a solution to this?
Relevant answer
Answer
Monika Mrozek I think that, based on what Johannes Elfner shared, it makes sense NOT to scale the discrete variables.
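The suggestion of scaling only the continuous columns can be sketched as follows. This is a minimal NumPy illustration, assuming you already know which columns are 0/1 dummies (passed as a boolean mask); the function name is hypothetical. In a train/test workflow the means and standard deviations should be computed on the training data only and reused for new data.

```python
import numpy as np


def scale_continuous_only(X, is_dummy):
    """Z-score the continuous columns of X, leaving 0/1 dummy columns unchanged."""
    X = np.asarray(X, dtype=float).copy()
    cont = ~np.asarray(is_dummy, dtype=bool)     # mask of continuous columns
    mu = X[:, cont].mean(axis=0)
    sd = X[:, cont].std(axis=0, ddof=1)          # sample standard deviation
    X[:, cont] = (X[:, cont] - mu) / sd
    return X
```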
  • asked a question related to Multivariate Analysis
Question
3 answers
How do I perform a multivariable analysis adjusting for age and gender in SPSS?
Relevant answer
Answer
Depending on what you mean by "adjusting", your design (are you comparing quasi-experimental groups or real experimental groups, or something else), and your research question(s), you might consider a matching procedure. You will need to provide more information to get more informative answers, but I recommend consulting (https://www.amazon.com/Design-Observational-Studies-Springer-Statistics/dp/3030464040).
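To make the idea of a matching procedure concrete, here is a minimal sketch of greedy 1:1 matching without replacement: exact on gender, nearest age within a caliper. Everything here is an illustrative assumption (the data layout as `(age, gender)` tuples, the 2-year caliper, the greedy order); it is not a procedure from the recommended book, and real studies usually compare several matching designs and check covariate balance afterwards.

```python
def match_on_age_gender(treated, controls, caliper=2.0):
    """Greedy 1:1 matching without replacement.

    treated, controls: lists of (age, gender) tuples (hypothetical layout).
    Returns (treated_index, control_index) pairs; unmatched treated units are dropped.
    """
    used = set()
    pairs = []
    for t_id, (t_age, t_gender) in enumerate(treated):
        best, best_gap = None, None
        for c_id, (c_age, c_gender) in enumerate(controls):
            if c_id in used or c_gender != t_gender:   # exact match on gender
                continue
            gap = abs(c_age - t_age)
            if gap <= caliper and (best_gap is None or gap < best_gap):
                best, best_gap = c_id, gap             # nearest age so far
        if best is not None:
            used.add(best)
            pairs.append((t_id, best))
    return pairs
```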
  • asked a question related to Multivariate Analysis
Question
3 answers
Dear all,
May I know how to obtain the adjusted OR (aOR) in a multivariate model when the crude (bivariate) association is not significant and the variable is therefore not meant to be included in the multivariate model?
I am using SPSS software.
Kindly enlighten.
Thank you in advance.
Relevant answer
Answer
See the attached screenshot for how to find it in SPSS. If there is no relationship in the simple regression, there's very little chance of finding one in the multivariable case. Please note that multivariate means multiple dependent variables. With best wishes, David Booth
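As a sketch of where crude and adjusted ORs come from, the following fits a logistic regression by Newton-Raphson in NumPy and returns exp(coefficient) for each predictor. With the exposure as the only column, the result is the crude OR (for a binary exposure it equals the 2x2 cross-product ratio); adding covariate columns (e.g. age and gender) makes the exposure's exp(coefficient) an adjusted OR. The fixed iteration count and the absence of a convergence check are simplifying assumptions; a statistics package should be used for real analyses and for confidence intervals.

```python
import numpy as np


def logistic_odds_ratios(X, y, n_iter=25):
    """Fit logistic regression by Newton-Raphson; return exp(coef) for each column of X."""
    X = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])  # add intercept
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))      # predicted probabilities
        w = p * (1.0 - p)
        grad = X.T @ (y - p)                     # score vector
        hess = X.T @ (X * w[:, None])            # observed information
        beta += np.linalg.solve(hess, grad)
    return np.exp(beta[1:])                      # odds ratios, intercept dropped
```

For example, with 30/50 cases among exposed and 10/50 among unexposed, the crude OR is (30 × 40) / (20 × 10) = 6, which the fit reproduces.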
  • asked a question related to Multivariate Analysis
Question
11 answers
Using a sampling grid, we evaluated weed density at each node using an ordinal scale (0=no weeds; 1=less than 100 plants; 2=between 200 and 300 plants etc). We sampled 50 fields and are seeking to convert the frequency distribution of densities calculated for each field into a single value that can serve as a response variable in multivariable analyses. Thanks!
Relevant answer
Answer
This is a typical situation where a novel approach I have worked out recently (with the help of a researcher from France) might be applicable.
It's all about distributions of "points" or "units" over a discrete scale of ordered categories (also sometimes called points, confusingly).
Let's say we have 5 ordered categories from lowest to highest, e.g. 0, 1, 2, 3, 4 (a 5-point scale). Let's assume there is some procedure through which a fixed number of points, e.g. 20, are distributed over this scale. We want to know the RANK of this distribution among all possible distributions with a given number of categories (here: 5) and a given number of observations (here: 20).
There is a rather simple formula by which this RANK can be calculated. There is an even simpler (but similar) formula by which the total number of such distributions can be computed.
Given both the RANK and the MAX RANK (the total number of distributions), a percentage can be calculated which is a good representative for this profile of data.
See attachment for an example list of profiles.
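The answer does not give the ranking formula itself, so the sketch below only pins down the part that is standard: the total number of ways to distribute n points over k ordered categories is the stars-and-bars count C(n + k − 1, k − 1), which for 20 points on a 5-point scale is C(24, 4) = 10626. The ranking shown is a hypothetical stand-in (brute-force enumeration, ordered so that profiles with more points in higher categories rank higher), not the answerer's method.

```python
from itertools import combinations
from math import comb


def n_profiles(n_points, n_categories):
    """Number of distributions of n_points over n_categories (stars and bars)."""
    return comb(n_points + n_categories - 1, n_categories - 1)


def all_profiles(n_points, n_categories):
    """Yield every count vector (c_0, ..., c_{k-1}) summing to n_points."""
    slots = n_points + n_categories - 1
    for bars in combinations(range(slots), n_categories - 1):
        counts, prev = [], -1
        for b in bars:
            counts.append(b - prev - 1)          # stars between consecutive bars
            prev = b
        counts.append(slots - prev - 1)          # stars after the last bar
        yield tuple(counts)


def hypothetical_rank(profile):
    """1-based rank under an ASSUMED order: compare counts from the highest category down."""
    n, k = sum(profile), len(profile)
    ordered = sorted(all_profiles(n, k), key=lambda p: p[::-1])
    return ordered.index(tuple(profile)) + 1


def rank_percentage(profile):
    """Rank expressed as a percentage of the total number of profiles."""
    return 100.0 * hypothetical_rank(profile) / n_profiles(sum(profile), len(profile))
```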
  • asked a question related to Multivariate Analysis
Question
86 answers
For metabolite profiling of herbal drugs/medicinal plants using chromatographic methods, we have to evaluate the data using multivariate analysis (such as PCA, PLS, PLS-DA, HCA, etc.). It would be very helpful for our students if you could recommend free (online) software that can do the multivariate analysis.
Relevant answer
Answer
R is the best...
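Any free environment (R's `prcomp`, or Python with NumPy) can do PCA in a few lines. As a minimal sketch, here is PCA via the singular value decomposition of the mean-centred data matrix, with rows as samples (e.g. chromatograms) and columns as variables (e.g. peak areas). Only mean-centring is applied; whether to add unit-variance or other scaling first is a separate, data-dependent choice.

```python
import numpy as np


def pca(X, n_components=2):
    """PCA via SVD of the column-centred data matrix."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                        # mean-centre each variable
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]   # sample coordinates
    loadings = Vt[:n_components].T                     # variable weights
    explained = s ** 2 / np.sum(s ** 2)                # proportion of variance
    return scores, loadings, explained[:n_components]
```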
  • asked a question related to Multivariate Analysis
Question
6 answers
Which is the better and more versatile instrument for multivariate analysis, FT-NIR or FT-IR? Thanks.
Relevant answer
Answer
David Eugene Booth Many thanks for all your advices, I will. Best, Thomas.
  • asked a question related to Multivariate Analysis
Question
4 answers
Hi!
I am trying some multilevel SEMs with the lavaan package in R. I am wondering whether the current version is able to model and estimate cross-level interactions (e.g. as in Mplus with a placeholder such as "s | y ON x", or something similar).
Is that something lavaan can already do, or is Mplus the tool of choice for these cases?
Thanks in advance!
Tim
Relevant answer
Answer
Hi folks,
I had an email exchange with Yves Rosseel in which I asked about his future plans. He said that he plans to tackle random slopes at the end of the summer. I would assume, though I don't know, that cross-level interactions will come automatically with this.
Best,
---Holger
  • asked a question related to Multivariate Analysis
Question
3 answers
Participants go through a virtual scenario wherein they act as part of a squad of Marines, going through a market, getting fired upon, discovering other captured soldiers who are being held by insurgents, and ultimately freeing the one live soldier.
After going through the scenario, they first provided information about their "somatic anxiety" (a between subjects variable). Then, for each element of the scenario, they stated the emotion felt, and rated how intensely they had felt that emotion. They then recalled everything they could remember about the scenario, and completed a recognition task.
I was able to match the intensity ratings, recall, and recognition measures for each of the elements of the scenario. For simplicity's sake, let's say there are only three elements.
So I have as predictors:
Somatic anxiety score (between subjects)
Element 1 Intensity
Element 2 Intensity
Element 3 Intensity
And for outcome measures:
Element 1 Recall
Element 2 Recall
Element 3 Recall
Element 1 Recognition
Element 2 Recognition
Element 3 Recognition
I was thinking of running the analysis with recall and recognition separately.
I need help with which analysis to run. Is this appropriate for canonical correlation?
If so, what about the between subjects variable? Can canonical correlation handle that?
Thanks for your help!
Virginia Diehl
Relevant answer
Answer
Hello Virginia,
If your interest is in how the set of element intensities relate to the sets of elements recall and/or elements recognition, then canonical correlation or multivariate regression could be applied.
One feature of canonical correlation that might recommend it as a choice here is the opportunity to rotate solutions of either variable set; your second set could be evaluated to see whether the recognition/recall appear to generalize to two distinguishable factors/sets.
Good luck with your work.
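A minimal sketch of what canonical correlation computes may help here: with one row per participant, the predictor set could be the three intensity ratings plus the somatic anxiety score (including the between-subjects score as just another column is an assumption, one option among several), and the outcome set the three recall scores. The canonical correlations are the singular values of Qxᵀ Qy, where Qx and Qy are orthonormal bases (from QR decompositions) of the centred predictor and outcome matrices. Full-column-rank data are assumed, and the data in the test are simulated.

```python
import numpy as np


def canonical_correlations(X, Y):
    """Canonical correlations between column sets X (n x p) and Y (n x q)."""
    Xc = np.asarray(X, dtype=float) - np.mean(X, axis=0)
    Yc = np.asarray(Y, dtype=float) - np.mean(Y, axis=0)
    Qx, _ = np.linalg.qr(Xc)                      # orthonormal basis for predictors
    Qy, _ = np.linalg.qr(Yc)                      # orthonormal basis for outcomes
    # Singular values of Qx^T Qy are the canonical correlations, in descending order.
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.clip(s, 0.0, 1.0)
```

Canonical loadings and significance tests (e.g. Wilks' lambda) would still be needed to interpret the solution; this only shows the core computation.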
  • asked a question related to Multivariate Analysis
Question
3 answers
I am attempting to analyze the tissue homogenate to understand the pathogenesis of insulin resistance. Kindly help me in this regard if someone is working on this.
Thank you
Relevant answer
Answer
I will be glad if someone can provide a guideline on how it can be done using any statistical package. Which variable should be assigned as the quantitative X (i.e. the independent variable)? I believe it will be the wavelength. My concern is what the Y variable will be made up of. I need someone to shed more light on this.
Thanks
  • asked a question related to Multivariate Analysis
Question
4 answers
We have a dataset consisting of six years of fish detection data from an open ocean acoustic tracking array that was deployed to record the presence of acoustically-tagged fishes. The array consists of 50 permanently moored and widely spaced tracking stations divided equally between a “deep” and “shallow” stratum. Our core question is “Does the ‘community’ of detected fish (16 species) differ across depth strata and seasons?” Secondarily, “What habitat covariates help explain differences in the community?” We are not especially interested in year or station effects.
I’m working in PRIMER v7 that allows PERMANOVA models with both fixed factors and continuous covariates. If I considered a model without covariates, the design might be a repeated measures approach (since stations never move) with: Season = fixed (4 levels), Stratum = fixed (2 levels), Station = random (50 levels) nested within Stratum.
Things get trickier when we consider adding covariates. Some covariates (e.g., distance from shore, seafloor slope, sediment type) are always linked to station and will not change through time, while others, such as remotely-sensed water temperature and chlorophyll, vary on much shorter timescales. One thought would be to include smaller time blocks as a random effect, maybe in one-week or one-month increments (so 321 or 60 blocks, respectively, over 6 years), and use mean temperature and chlorophyll values for each time block.
So my questions are:
1) Is it a PERMANOVA ‘felony’ to have some habitat covariate values that repeat many times while others do not?
2) Should Station even be a random effect when habitat covariates are tied to them?
We also considered using PERMANOVA just for the fixed-factor tests (Season, Stratum, and the Season x Stratum interaction) and the DISTLM routine for the continuous covariates, but the same problem of static covariates due to repeated 'sampling' of the same stations would seem to remain.
We welcome any insights or criticisms on this approach
- Eric
Relevant answer
Answer
To me there appear to be two questions. One concerns differences in the detected community in space and time, which can simply be addressed using Permanova (or Anosim). The other concerns which of the measured variables 'best explain' the observed differences in the fish community. I wouldn't try to address this using Permanova at all, but rather with some of the other tools in the box, such as BEST, which looks for the subset of measured 'environmental variables' (I wouldn't call them covariates) that best explains the community pattern; Linktree, which looks for thresholds in individual variables explaining divisions in the community data; or even just bubble plots in MDS as a first mode of exploration. I'm never comfortable with trying to set up a single test that ticks all boxes, and prefer a piecemeal approach where one builds an explanation using a variety of tests and approaches.
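To illustrate the Permanova logic for the first question, here is a minimal one-way sketch: Bray-Curtis dissimilarities between samples (an assumed, though common, choice for abundance data), Anderson's (2001) pseudo-F from the sum-of-squares partitioning of squared distances, and a p-value from unrestricted label permutations. This is not PRIMER's implementation: it handles a single fixed factor only and does not restrict permutations for nested random factors such as Station or for repeated measures, which the design above would need. The deep/shallow data in the test are simulated.

```python
import numpy as np


def bray_curtis(X):
    """Pairwise Bray-Curtis dissimilarities between rows (samples x species).
    Assumes no row is all zeros."""
    X = np.asarray(X, dtype=float)
    num = np.abs(X[:, None, :] - X[None, :, :]).sum(axis=2)
    den = (X[:, None, :] + X[None, :, :]).sum(axis=2)
    return num / den


def pseudo_f(D, labels):
    """One-way pseudo-F from a distance matrix (sum-of-squares partitioning)."""
    labels = np.asarray(labels)
    n, groups = len(labels), np.unique(labels)
    a = len(groups)
    D2 = D ** 2
    ss_total = D2[np.triu_indices(n, k=1)].sum() / n
    ss_within = 0.0
    for g in groups:
        idx = np.flatnonzero(labels == g)
        sub = D2[np.ix_(idx, idx)]
        ss_within += sub[np.triu_indices(len(idx), k=1)].sum() / len(idx)
    ss_among = ss_total - ss_within
    return (ss_among / (a - 1)) / (ss_within / (n - a))


def permanova_oneway(D, labels, n_perm=999, seed=0):
    """Observed pseudo-F and permutation p-value (unrestricted label shuffling)."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    f_obs = pseudo_f(D, labels)
    hits = sum(pseudo_f(D, rng.permutation(labels)) >= f_obs for _ in range(n_perm))
    return f_obs, (hits + 1) / (n_perm + 1)
```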
  • asked a question related to Multivariate Analysis
Question
10 answers
How can I select variables for PCA from a huge dataset? Apart from the biological significance of the variables, are there other criteria for objectively choosing the most relevant variables to take into account in a multivariate analysis (PCA in this case)?
Thank you in advance...
  • asked a question related to Multivariate Analysis
Question
6 answers
Hello,
is it possible to use linear discriminant analysis (LDA) to determine which of the analyzed variables best separates the different groups (which are already known)?
For example, I want to understand how 3 different croplands are different in terms of ecosystem services provisioning. So, I decide to measure 4 variables for each ecosystem (Soil Carbon, Dry matter, Biodiversity, and GHG) and then I run an LDA analysis (on PAST 3.4 here)
I get this result (see the attached picture). Here the Grassland clearly seems to differ more than the other two croplands (because it is displaced further along the X-axis than they are).
Would it be correct to conclude that this grassland differs most from the other 2 crops and this seems to be determined by its level of biodiversity?
Thanks (and of course, these data are not real. That's just an example)
Relevant answer
Answer
Hello Matteo,
It would be correct to say that the centroid (mean on the linear composite of the variables forming the first discriminant root or function) for the grasslands group is further from the aggregate (all cases) centroid, or from the centroids of the other two groups. However, the display doesn't inform you as to what variable(s) are most influential in the function.
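One common way to judge which variables drive a discriminant function is to look at structure coefficients: the pooled within-group correlations between each original variable and the discriminant scores (standardized coefficients are another option, and the two can disagree when variables are correlated). The sketch below computes the first discriminant function from the generalized eigenproblem Sb v = λ Sw v (between- and within-group scatter) and its structure coefficients. It does not reproduce PAST's output; the variable order (soil carbon, dry matter, biodiversity, GHG) and the simulated data in the test are hypothetical. Note that the sign of the function is arbitrary.

```python
import numpy as np
from scipy.linalg import eigh


def lda_structure(X, labels):
    """First discriminant eigenvalue, coefficients, and pooled within-group structure coefficients."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    p = X.shape[1]
    grand_mean = X.mean(axis=0)
    Sw, Sb = np.zeros((p, p)), np.zeros((p, p))
    Xw = np.empty_like(X)                         # data centred on group means
    for g in np.unique(labels):
        idx = labels == g
        mu = X[idx].mean(axis=0)
        Xw[idx] = X[idx] - mu
        Sw += Xw[idx].T @ Xw[idx]                 # within-group scatter
        diff = (mu - grand_mean)[:, None]
        Sb += idx.sum() * (diff @ diff.T)         # between-group scatter
    evals, evecs = eigh(Sb, Sw)                   # eigenvalues in ascending order
    v = evecs[:, -1]                              # first discriminant function
    z_w = Xw @ v                                  # within-group-centred scores
    structure = np.array([np.corrcoef(Xw[:, j], z_w)[0, 1] for j in range(p)])
    return evals[-1], v, structure
```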