# Multivariate Statistics - Science topic

Multivariate statistics is a form of statistics encompassing the simultaneous observation and analysis of more than one outcome variable. The application of multivariate statistics is multivariate analysis.

## Questions related to Multivariate Statistics

Hello

I have been in constant discussions with friends and colleagues in recent years. In my experience, I generally use multivariate statistics because most data sets do not meet the assumptions of classical frequentist statistics. However, I know some people who use univariate and Bayesian methods to answer the same hypothesis questions. Given this, the question is: what would be the most appropriate way to answer our research questions?

For example, I want to explore the relationships between soil nematode communities and microbial communities under different treatments, including the relationships between their functional and taxonomic composition. Nematodes and microbes belong to different trophic levels, i.e., bacterivore nematodes feed on bacteria; fungivore nematodes feed on fungi; herbivore nematodes feed on plant roots; and omnivore nematodes prey on bacterivore, fungivore, and herbivore nematodes. In short, which statistical tools are suited to analysing relationships in a complex soil food web?

I did an MCA using FactoMineR. I know how to interpret cos2, contributions, and coordinates, but I don't know how the values of v.test should be interpreted.

Thank you

I want to investigate the relationship between differences in coral physiological variables based on Euclidean distances and seawater environmental variables using DISTLM and dbRDA in PRIMER, but I am not sure if this analysis is suitable given the lack of replication in my predictor-variable (environmental) matrix.

I have attached an Excel file illustrating the structure of my data set (the response and predictor variables). Briefly, I have a multivariate data set of measured physiological variables (e.g. lipid concentration, protein concentration, tissue biomass etc.) for corals collected from five different locations (A-E), where each site is unique in its seawater physico-chemical parameters. I collected 12 corals per site (a total of 60 samples). I have constructed a resemblance matrix of the physiological data in PRIMER based on Euclidean distances, and there is clear grouping of data points in the NMDS, which coincides with the different collection sites for each coral. I want to investigate the proportion of the observed variation in the multivariate data cloud that can be explained by the environmental characteristics of each collection site (e.g. mean annual sea surface temperature, seawater chlorophyll concentration, salinity etc.). However, the dataset of environmental variables does not have replication, i.e., for each site (A-E), I only have one value for mean annual sea surface temperature, one value of salinity, etc.

All of the case-study examples I have read about distance-based redundancy analysis in R or PRIMER have two resemblance matrices (predictor and response) both of which have replication. However, in my case, my response variables have replication (i.e. 12 samples per site), whereas my environmental variables do not have replication (i.e. one measurement per variable per site).

Can someone advise me whether or not dbRDA is suitable in this instance? If, as I predict, it is not suitable, can you recommend a better approach? I am not an expert in multivariate statistics, but I want to make sure that the approach I take is sound.

Any and all advice is welcome. Thanks

It is possible to learn a standard SVM in a kernel space. But is it possible to do the same with L1 regularization?
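One practical workaround, not an exact kernel-space L1-SVM but a common approximation: build an explicit (approximate) kernel feature map and fit a linear SVM with an L1 penalty on those features. A sketch with scikit-learn (the data, kernel, and parameter choices are illustrative assumptions):

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.kernel_approximation import Nystroem
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy non-linearly-separable data (illustrative only)
X, y = make_moons(n_samples=200, noise=0.1, random_state=0)

# Nystroem builds an explicit, approximate RBF kernel feature map;
# LinearSVC with penalty="l1" then yields sparse weights over those features
model = make_pipeline(
    Nystroem(kernel="rbf", gamma=1.0, n_components=50, random_state=0),
    LinearSVC(penalty="l1", loss="squared_hinge", dual=False, C=1.0, max_iter=10000),
)
model.fit(X, y)
acc = model.score(X, y)
```

Note that the sparsity then lives in the approximate feature space rather than over the training points; an exact L1-regularized kernel machine requires a different formulation, e.g. penalizing the coefficients of the kernel expansion directly.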

Discriminant analysis has the assumptions of normal distributions, homogeneity of variances, and correlations between means and variances. If those assumptions are not fulfilled, is there any non-parametric method that can be used as a "substitute" for discriminant analysis?

Many thanks in advance.

Hi everyone

I have a dependent categorical variable with three levels, corresponding to three sectors of activity in the agricultural field (A, B, and C), and within each of them there are sub-levels; for example, under A there are four sub-levels: a1, a2, a3, and a4.

Over time, for more profit, farmers change their activity; this change is subject to several factors like demand, financial support, etc. (we are talking here about independent quantitative and categorical variables).

For example, a farmer who practiced activity A changed his activity to B, i.e., he completely changed his activity sector; or he may change only to a subsector, for example moving from "a2" to "a1" (the same applies to other farmers).

Is there a statistical technique that can be used to model these changes?

I have a dataset of plant species presence/absence (1/0) found in 133 samples within an archaeological area. Every sample has a particular substrate and a position, and was taken in a monument (for substrate, position, and monument I have categorical data, not numbers, e.g. for substrate I have green rock or black rock). My aim is to see whether some species or groups of species are more associated with certain substrates, positions, or monuments, or whether there is any other pattern. What method would you recommend I try? Thank you very much in advance!

I'm trying to build up a model to use 'chemical data' to predict 'sensory evaluation data'.

I've got multiple sensory data blocks (Matrix a1, a2, a3, ..., an) and one chemical data block (Matrix b). Now I'm trying to build up a chemical parameters - sensory evaluation model.

Are there any multivariate statistical methods that I can use? As far as I know, L-PLS can only deal with 3 matrices. Or are there any machine-learning methods you'd like to recommend?

Many thanks!

Hi!

I have 3 IVs (metric):

1. affect sensitivity (subscales: positive affect, negative affect)

2. affect regulation (self-relaxation, self-motivation)

3. stress (threat, demand)

and I want to measure 3 DVs (metric):

a. self-access

b. well being

c. symptoms

Moreover, I want to measure a moderator effect (culture, nominal).

How can I measure both multiple regression (for all IVs and each subscales) and moderator analysis for my case?

Thanks in advance. :)

It is known that PCA maximizes the total variance explained. Let's say it hits a total variance of 51%. Yet, doing a factor analysis instead would reduce such a total variance to 30% (naturally, since FA looks at the common variance only).

In such a case, is calculating the AVE (convergent validity) based on the FA loadings irrelevant, since one is only going to use the PCA loadings/variance/scores in the first place and in subsequent analyses? If yes, can one conclude that the AVEs **based on PCA** are > 0.5? The goal is just data reduction for multiple regression, not (C)FA or any other SEM modeling.

I would like to use multivariate statistical methods to characterize the hydrogeochemical processes and controlling factors of groundwater quality in a semi-arid region of Algeria.

Can you suggest the best software to develop a regression equation using more than five independent variables?

I need to use multivariate statistical techniques to extract information about the similarities or dissimilarities between sampling sites, to identify the water quality variables responsible for spatial variations in groundwater quality, and to assess the influence of possible sources (natural and anthropogenic) on the water quality parameters.

Hello,

I am using multivariate multiple regression for my master's thesis, but I'm not sure whether I am doing the analysis and reporting it the right way. I have very limited time until the deadline to submit the thesis, so any help is very much appreciated.

I would be really glad if someone can recommend/send articles/dissertations using this analysis.

Thanks in advance,

Yağmur

**How can I do mean separation for treatments arranged in a factorial design in SAS? Or could you tell me about other software that can do this? Thank you**

I ran a repeated-measures ANOVA for an intervention study. I have 3 intervention groups and 3 time points. The output in SPSS showed that there is

*no significant main effect of time* and *no significant interaction effect for group × time* in the **Tests of Within-Subjects Effects** table. When I checked **Tests of Between-Subjects Effects**, I also did not find a significant result for group. However, in the **pairwise comparison tables with Bonferroni**, there is a significant difference between 2 time points in my experimental group (one of my intervention groups). Also, the **Multivariate Tests** table indicated *Wilks' lambda* to be significant for the experimental group.

I got confused by these findings and looked up what people report in articles. In some papers, people report Wilks' lambda, while in others people report main and interaction effects. What would you recommend I do? Is there any rule of thumb?

Say I am interested in examining individual differences in cognition and behavior, and am interested in how specific survey scores and parameters predict/covary with performance on a task. How would I analyze these data based on the literature?

Are there conventional methods for analyzing differences in psychological phenomena across individuals? Is that exactly what uni/multivariate statistics is for? Or are there alternative methods? Is that where advanced statistics comes in?

Is it more compelling and/or informative to analyze individual differences in a single subject design, an aggregate model/submodel (GLM), or as a dynamical system?

What does the basic and current literature say? What papers or books explicitly discuss this?

Thanks,

JD

In your experience, how do you know, or what do you need to check, in order to declare econometric data as bad (univariate and multivariate cases)?

Recently several measures for testing independence of multiple random variables (or vectors) have been developed. In particular, these allow the detection of dependencies also in the case of pairwise independent random variables, i.e., dependencies of higher order.

Thus, if you had a dataset which was considered uninteresting - because no pairwise dependence was detected - it might be worth retesting it.

If your data are provided in a matrix x where each column corresponds to a variable, then the following lines of R code perform such a test with a visualization:

```r
install.packages("multivariance")
library(multivariance)
dependence.structure(x)
```

If the plot output is just separated circles (these represent the variables), then no dependence is detected. If you get some lines connecting the variables into clusters, then dependence is detected, e.g.

```r
dependence.structure(dep_struct_several_26_100)
dependence.structure(dep_struct_iterated_13_100)
dependence.structure(dep_struct_ring_15_100)
dependence.structure(dep_struct_star_9_100)
```

Depending on the number of samples and the number of variables, the algorithm might take some time; the above examples, with up to 26 variables and 100 samples, run quickly.

Due to publication bias, datasets are usually only published if some (pairwise) dependence is present. Thus there should be plenty of cases where data were considered uninteresting but a test for higher-order dependence shows dependencies. If you have such datasets, it would be great if you could share them.

Comments and replies - public and private - are welcome.

For those interested in a bit more theoretic background: arXiv:1712.06532

I have data on the total dissolved solids of apples as the reference (y-variable).

I also have near-infrared spectral data as predictors (x-variables).

I have the *StatSoft Statistica* software for the analysis.

I am doing a study on classifying fruits in the visible region, taking spectra with Vis-NIR spectroscopy. I want to classify the fruit based on maturity in terms of skin color. I am trying to use SVMC, SIMCA, and KNN with PLS_Toolbox. I went to the Eigenvector wiki site, but the procedure described there is a bit unclear to me. It would be great if somebody could tell me the stepwise procedure for performing these analyses in PLS_Toolbox using Matlab.

I've searched Google, signed up to a specialised statistics website, and checked my textbooks (though not advanced ones), and I can't find a nonparametric analog of the one-way MANOVA. Any accurate advice, please?
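One widely used nonparametric analog is PERMANOVA (permutational MANOVA), which works on any distance matrix and gets its p-value by permuting group labels, so multivariate normality is not required; PRIMER and the R vegan package (adonis2) implement it. A minimal sketch of the idea in Python with invented data (my own illustration, not production code):

```python
import numpy as np

def permanova(D, labels, n_perm=999, seed=0):
    """Pseudo-F and permutation p-value from a square distance matrix D."""
    labels = np.asarray(labels)
    n = len(labels)
    groups = np.unique(labels)
    a = len(groups)

    def pseudo_f(lab):
        iu = np.triu_indices(n, 1)
        ss_total = (D[iu] ** 2).sum() / n          # total SS from pairwise distances
        ss_within = 0.0
        for g in groups:                           # within-group SS
            idx = np.where(lab == g)[0]
            sub = D[np.ix_(idx, idx)]
            iu_g = np.triu_indices(len(idx), 1)
            ss_within += (sub[iu_g] ** 2).sum() / len(idx)
        ss_between = ss_total - ss_within
        return (ss_between / (a - 1)) / (ss_within / (n - a))

    f_obs = pseudo_f(labels)
    rng = np.random.default_rng(seed)
    count = sum(pseudo_f(rng.permutation(labels)) >= f_obs for _ in range(n_perm))
    return f_obs, (count + 1) / (n_perm + 1)

# Toy example: two well-separated groups measured on 3 variables
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (10, 3)), rng.normal(5, 1, (10, 3))])
labels = [0] * 10 + [1] * 10
D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))  # Euclidean distances
f, p = permanova(D, labels)
```

With clearly separated groups the observed pseudo-F exceeds virtually all permuted values, giving a small p-value.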

The multivariate statistical framework MaAsLin detected that the abundance of the bacterial class TK17 is positively correlated with mean salinity (q=0.19) in the roots and rhizosphere of my study tree. I can't find much information on this class and I'm wondering if anyone out there is familiar with this class of bacteria.

Hi everyone, I am new to statistics and would like to consult you about some statistical issues.

I have a judgement task with multiple-choice questions. The design is a 5x2 factorial design, so I have 10 conditions; each condition has 10 questions, for a total of 100 questions. Each participant chooses one of 6 options, which correspond to 6 categories (i.e., Categories A-F).

In this case, I will collect frequency data, and I think I could use a chi-square test for it. I want to know (1) whether any two of the 6 categories in each condition differ significantly from each other and (2) whether option A of any 2 conditions differs between those conditions (and likewise for option B and so forth).

I have a few questions, they are:

1. Since I have 10 items per condition per participant, how should I manage the frequency data, e.g. in a tabulated table or in SPSS? Should I just add up the frequencies, so that with 50 participants I would have a total of 50 people x 10 items = 500 counts per condition? Would that be inappropriate because each count does not represent one person? I wonder whether there is a better way to handle this case.

2. I think the frequency data can in fact be turned into percentage data. In that case, we would have ratio-scale data, and we might be able to use ANOVA. But I wonder whether I should do it this way.

Thanks a lot for comments and advice.

I want to compare more than 50 groups, but SPSS could not perform the ANOVA. Which test shall I use to compare these groups? Kindly give your suggestions.

I have three dependent variables and 10 predictors, and I am analyzing the data with multivariate regression. However, I need to compare the model and the contribution of each predictor with those of other groups. Any ideas on how to proceed?

I need to analyse some biomarkers predicting the development of AKI using the NRI. I have no idea how to do it in SPSS.

I am carrying out a multinomial analysis (dependent variable with 5 categories). I have read multiple discussions on the minimum number of cases per independent variable. However, I cannot find whether there is a recommended number of cases per category of the dependent variable. Can anyone help me with this?

Many thanks in advance!

Dear all,

do you know if:

1 - Can I run an RDA with negative (taxa) values (computed as delta = Control - Treatment)?

2 - Do I have to use the decostand function on these delta values before performing the RDA?

3 - Shall I use the Bray-Curtis distance (dist = "bray") in the RDA function?

Best

Alessandro

What is the leave-one-out classification method used in discriminant analysis for classifying cases? How are cases classified under this method?

Looking for **recent** (*preferably meta-analytic*) findings that yielded estimates of the proportion of shared variance among common personnel selection methods such as structured and unstructured interviews, assessment centers, general cognitive ability tests, personality tests, etc. Thank you

I want to perform O2PLS-DA analysis of multi-omics data (from different metabolomics, lipidomics, and proteomics experiments) using SIMCA 130.2. I have data in matrix format (samples in rows with labels and variables in columns). I can perform PCA, PLS-DA, and OPLS-DA, but the O2PLS-DA tab is not active. I think I have a problem with the data arrangement; however, I am not sure if it is the only problem. Any help will be highly appreciated!

I am trying to analyse the degree of influence of a few environmental factors on benthic mollusc assemblage structure using DistLM and dbRDA plots. To select the best model, I have used a forward-stepping selection procedure based on Bray-Curtis distance measures, running both adj R^2 and AIC selection criteria. Two things are odd: the results of the marginal tests came out almost identical for both adj R^2 and AIC, and there are no factors/values listed at all for either of the sequential tests! What have I done, and what should I have done?

In many studies, it is observed that the geochemical and environmental data do not follow a normal distribution. This may be due to the samples from different populations or origins.

The *basic statistics* (mean, standard deviation, etc.) are sometimes computed on such data, which may lead to biased or wrong results, because classical statistical methods are based on the assumption that the data are normally distributed.

*For these types of data, can we compute the median (as a measure of location) and the median absolute deviation (as a measure of spread) instead of the mean and standard deviation?*

*Can we use non-parametric methods for multivariate statistics or statistical tests for such data?*

What are the suggestions of statisticians, environmentalists and geochemists?

How many people do I need to recruit if I conduct a randomized, between-subjects pilot study using 4 different conditions for 4 types of manipulations?

I read here that principal component scores are always in Euclidean distance and that the distance of PCA is Euclidean: https://www.mii.lt/zilinskas/uploads/visualization/lectures/lect4/lect4_pca/PCA1.ppt

Is this true? I have a list of 20 principal component scores and have never been told what distance measure they represent. I want to calculate Manhattan distance similarities and indices between my samples according to these 20 principal components, but it would be pointless if the principal components were already built on Euclidean distance and I then computed Manhattan distances from them. So do PC scores always represent the Euclidean distance measure, or some other one? Or are they not based on any distance measure? I hope they are not, so I can go ahead and obtain accurate Manhattan distances between the samples.
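For intuition: PC scores are not themselves distances; they are the coordinates of the samples in a rotated orthogonal basis. Because rotation preserves Euclidean distances, Euclidean distances computed from all PC scores equal those between the centred original observations, whereas Manhattan distances computed on the scores are well defined but depend on the rotation. A small numerical check (my own illustration, using NumPy's SVD as the PCA):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 5))          # 10 samples, 5 variables (invented data)
Xc = X - X.mean(axis=0)               # centre the data

# PCA via SVD: the scores U*S are the samples' coordinates
# in the rotated (principal-axes) basis
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = U * S

def pairwise_euclidean(A):
    diff = A[:, None, :] - A[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

# Rotation preserves Euclidean distances: with ALL components kept,
# score-space Euclidean distances equal those of the centred data
assert np.allclose(pairwise_euclidean(Xc), pairwise_euclidean(scores))

# Manhattan distances on the scores are computable too --
# they are just a different, rotation-dependent measure
manhattan = np.abs(scores[:, None, :] - scores[None, :, :]).sum(axis=-1)
```

So computing Manhattan distances from PC scores is legitimate; just be aware they are Manhattan distances in the rotated space, not in the original variable space.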

When conducting a one-way ANOVA, the F ratio is defined as the sum of squares between/sum of squares within.

However, when you actually do the math, the F ratio is the mean square between/mean square within.

For example:

(sum of squares between/degrees of freedom) = mean square (i.e., for variance explained)

And

(sum of squares within/degrees of freedom) = mean square (i.e., for variance not explained, or error).

My question is why do we need to adjust the sum of squares by the degrees of freedom in order to determine the F ratio?
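The short answer: dividing each sum of squares by its degrees of freedom turns it into a variance estimate (a mean square) on a per-degree-of-freedom scale, so the ratio does not grow merely because you add groups or observations; under the null hypothesis, both mean squares estimate the same error variance. A quick numeric check with toy numbers of my own, verified against SciPy's one-way ANOVA:

```python
import numpy as np
from scipy import stats

# Three example groups (made-up data for illustration)
groups = [np.array([4.0, 5.0, 6.0]),
          np.array([7.0, 8.0, 9.0]),
          np.array([1.0, 2.0, 3.0])]

all_vals = np.concatenate(groups)
grand_mean = all_vals.mean()
k = len(groups)                      # number of groups
N = all_vals.size                    # total number of observations

# Sums of squares between and within groups
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

# Dividing by degrees of freedom turns the sums of squares into
# variance estimates (mean squares) on a comparable scale
ms_between = ss_between / (k - 1)
ms_within = ss_within / (N - k)
F = ms_between / ms_within

# Matches SciPy's one-way ANOVA
F_ref, p = stats.f_oneway(*groups)
```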

I am looking for suggestions for analyses that can compare different taxa in terms of the relative difference in composition among sites.

I have 4 parallel datasets of species abundance data from 4 different taxa sampled in the same sites (n=12).

Each site was sampled between 4 - 10 times. Usually (not always) sampling was done at the same time for all taxa within a site, but not all sites were sampled at the same time so the data are unbalanced.

I can create balanced subsets if needed but this would severely truncate the data.

I've heard of co-correspondence analysis, co-inertia analysis, and possibly multiple factor analysis as potential candidates for doing this type of comparison, but I'm not sure about the differences or which is most appropriate.

Are there pros and cons/restrictions/assumptions for each of these?

Is there an alternative method that I have mentioned that would be better?

Also, what do these analyses allow me to test exactly? Is their intention to be able to say, for example, that taxa A and B had a high correlation in terms of variation in composition across sites, while taxon C showed low correlation with any other taxa, etc.?

Thanks

Tania

I'm really confused about familywise error. Here are my questions:

1. I know that multiple comparisons (e.g., Group A vs Group B, Group B vs Group C, Group A vs Group C) based on the same dependent variable will increase the Type I error rate, and that's the reason why we use ANOVA instead of multiple independent t-tests. But what if the multiple comparisons are based on multiple dependent variables (DV1, DV2, DV3) within the same two groups (e.g., Group A vs Group B)? Does this need correction, such as FDR?

2. What about multiple one-way repeated-measures ANOVAs? If there are several dependent variables, do I need to adjust the p-value? How can I adjust it (the p-values for the main effect and the interaction effect)?

3. I have learned a bit about robust statistics recently, and I'm not sure whether multiple comparisons / familywise error is also a problem in robust statistics. If so, how can I correct for it? I read some books and searched online but found no information concerning familywise error in robust statistics. Could you give me some ideas?

Any ideas are appreciated. Thanks a lot!

I want to run a before/after test probing the change of variability.

This is my setup: 7 operators, 6 samples, 2 measurements (1 before training and 1 after), with the two measurements made on the same sample. Roughly speaking, I have a 6x7x2 matrix of measurements of many different physical quantities.

I want to demonstrate that, for each physical quantity, the measurements made after the training are less scattered than those made before ("the training is useful and serves to standardize the operators' skills").

I cannot figure out how to demonstrate this. Running a two-way ANOVA on a single physical quantity, I get the results reported in the figure test_anova2.bmp.

It is clear that the variability between operators has largely decreased. But this proof is quite naive and not rigorous; moreover, this is only one physical property, and I have more than 20 features to take into account.

Finally my question:

Is there a rigorous method to prove what I see naively? Do I have to run the test on every single physical property, or is there a way to use all of them together?

Any help will be appreciated.

Thanks

Hi

I have data from a survey intended to measure the degree of conservative behaviours. First I will conduct a hierarchical cluster analysis and then k-means clustering to create my blocks. Since clustering algorithms have few pre-analysis requirements, I suppose outliers will not be a problem at the first stage.

However, I am planning to define my clusters using factor scores, which I am going to produce with factor analysis. Even if outliers were not a problem for the cluster analysis, at the factor analysis and discriminant analysis stages I will be more exposed to outlier effects.

Since all my variables have the same range of values, there are some significant outliers in the data. Because these are survey data collected from individuals, I believe I should only eliminate outliers stemming from data-entry errors.

Do you think this is a proper way?

Thanks a lot

I am trying to obtain the transfer curve (Vg, gate voltage vs. Id, drain current) of a graphene transistor using a Keithley 2636A source meter. The basic device structure is attached.

I have fixed the drain voltage at 30 mV and swept the gate voltage from -20 V to 20 V. When the gate voltage is negative, I get a perfect curve, as it should be theoretically. But when the gate voltage is positive, I get a negative current. I have attached the curve I obtained. Your kind suggestions for getting out of this predicament are requested.

As objects for my cluster analysis, I compare three different types of Sport Mega Events (Summer Olympics, Winter Olympics, and World Cups) and their corresponding impact factors (costs, surface area, new venues, etc.). As I don't want to compare just the absolute variation of the impact but rather the relative variation of the impact of the events, I want to cancel out the 'between variation' between the different types of events. If I compared just the absolute variation, it is quite plausible that bigger events would simply have bigger impacts. But that is obvious and not very surprising.

Therefore I apply the following homogenisation technique to the impact factors to cancel out the between variation:

x_hom(E,T) = x(E,T) − x̄(E) + x̄

where x̄ is the overall average, x̄(E) is the average per event type, and x(E,T) is the individual impact-factor value before homogenisation.
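In code, this homogenisation is a one-liner per event type: after applying it, every event type has the same mean (the overall mean), so only the within-type, i.e. relative, variation remains. A sketch with invented numbers (the event names and values are placeholders):

```python
import numpy as np

# Hypothetical impact-factor values (e.g. costs) per event type;
# the numbers are illustrative only
costs = {
    "summer_olympics": np.array([10.0, 12.0, 14.0]),
    "winter_olympics": np.array([4.0, 5.0, 6.0]),
    "world_cup":       np.array([7.0, 9.0, 11.0]),
}

overall_mean = np.concatenate(list(costs.values())).mean()

# x(E,T) - mean(E) + overall mean: removes the between-event-type
# variation while keeping the within-type (relative) variation
homogenised = {
    event: x - x.mean() + overall_mean
    for event, x in costs.items()
}
```

After this step, clustering on the homogenised values compares events by how unusual they are *within* their own type rather than by sheer size.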

Hello,

I would like to test a multigroup hypothesis with the help of RStudio/Stata/Mplus software, but first I would like to read some documents that used a multigroup analysis.

After running a multivariate model with 4 dependent variables, I am struggling to calculate the marginal effects of the explanatory variables on the dependent variables. Can anyone help?

How can I calculate the marginal effects of explanatory variables after running a multivariate probit model in Stata?

I have sampled 6 populations of lizards, and I presented each lizard with 4 odour treatments. I want to know if lizards flick their tongue less when presented with a control (no odour) than with a specific odour.

My mixed model (individual and trial as random effects, because of repeated measures) was very overdispersed with a Poisson distribution, so I used a negative binomial distribution (now a dispersion value of 3; is this OK?) in the glmmadmb function. I have a significant 3-way interaction between population (factor, 6 levels), treatment (factor, 4 levels), and walk (continuous, time spent walking) and get the output below with summary().

If I am correct, the reference/intercept for the 3-way interaction is popBru : every level of treatment : I(walk/10). This would mean that the estimate for the first comparison popDol:treatctrl:I(walk/10) is obtained by comparing this with popBru:treatctrl:I(walk/10). The estimate of popVis:treathie:I(walk/10) is then from the comparison with popBru:treathie:I(walk/10), etc. So the output is giving me the differences between populations for each level of I(walk/10):treatment, correct?

I am actually more interested in the comparison between treatments within each level of I(walk/10):population. I have tried the releveling code and reordering the variables in the model, but it keeps giving me the same output. Does anyone know how I can put the focus on comparing treatments rather than on comparing populations?

Many thanks,

Charlotte

My study has a control group and an experimental group and involves a pre-test and post-test for each group. The pre-test and post-test are identical and have a subjective measure (ratings of anxiety) and an objective measure (word production ability). I would like to know if the differences between pre- and post-tests are significant for each group and if there is a significant difference between the two groups. I would also like to know if the subjective and objective measures are correlated. Which statistical analysis or analyses should I use?

This question is for multivariate nonlinear regression analysis.

The model includes the following variables (N=367):

(i) X is an independent variable

(ii) Y is dependent variable

The result shows following significant relationships:

(i) c= X--->Y; Beta value=0.47 (p<0.005)

On including the mediator "M", it is found that "M" partially mediates the relationship between "X" and "Y" (Baron & Kenny, 1986):

a=X-->M; Beta value=0.80 (p<0.005)

b=M-->Y ; Beta value=0.24(p<0.005)

c*=X-->Y ; Beta value=0.34 (p<0.005)

However, on measuring the moderating effect of Gender (Male=262 and Female=105) in the mediating model following results are obtained:

No mediating effect in the case of males (Baron & Kenny, 1986):

c=X-->Y (without mediation) ; Beta value=0.49(p<0.005)

On including mediator "M"

a=X--->M ; Beta value=0.79 (p<0.005)

b=M-->Y; Beta value=0.16 (p=0.21) insignificant path ~absence of mediation

While for female; complete mediation occurs

c=X--->Y (Without mediator) ; Beta value=0.40 (p<0.005)

a=X-->M; Beta value=0.74(p<0.005)

b=M-->Y ; Beta value=0.56(p<0.005)

c*=X-->Y ; Beta value=-0.047 (p=0.796 i.e. insignificant and beta value is negative)

Though the stated relationship is a case of complete mediation (c* is insignificant and the beta value approaches zero), it is observed that the effect size is greater than 1, i.e., a·b (indirect effect)/c (total effect) = 0.74×0.56/0.40 = 1.04, which is considered inconsistent mediation.

Kindly advise me on how to report such findings, and is there any literature that supports inconsistent but full mediation moderated by gender?
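On the arithmetic: the indirect effect is a·b, and the mediated proportion is a·b/c; a ratio above 1, or a direct effect c* with the opposite sign, is what the mediation literature usually calls inconsistent mediation. A quick check of the reported numbers for the female subsample:

```python
# Beta values reported above for the female subsample
a = 0.74         # X -> M
b = 0.56         # M -> Y
c = 0.40         # X -> Y without the mediator (total effect)
c_star = -0.047  # X -> Y with the mediator (direct effect)

indirect = a * b             # indirect effect
ratio = indirect / c         # proportion of the total effect mediated
print(round(indirect, 3), round(ratio, 2))   # → 0.414 1.04
```

A ratio slightly above 1 together with a small negative c* simply means the indirect path overshoots the total effect, which is why the proportion-mediated measure breaks down here.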

Dear all,

I am using SmartPLS. In my measurement model, I noticed I have to delete quite a number of indicators (> 20%) that load below 0.4 (Hulland, 1999). The current construct has composite reliability (> 0.7) and AVE (> 0.5), and the estimated model fits (SRMR < 0.08). The measurement model is now fit for hypothesis testing. My question is: is it okay to delete that many indicators? If it is okay, do you have references to help me learn more? If not, why not? I would love to hear from you. Thanks.

I am looking for a way to incorporate observation weights into a partial least-squares regression.

More specifically, I want to extract the first pair of singular vectors *u* and *v* from the matrix X^{T}Y, where X is an n observations x k predictors matrix and Y is an n observations x p response variables matrix. When the observations are unweighted, these singular vectors maximize the covariance between projections onto *u* and projections onto *v*. I would like to emphasize maximizing the covariance between some observations more than others.

Any pointers are greatly appreciated.

Harry
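One way to realize this (a sketch under my own naming, not a standard library routine): fold per-observation weights w into the cross-product as X^T W Y with W = diag(w), after weighted centring, then take the SVD. The first singular-vector pair then maximizes the weighted covariance between the projections:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, p = 50, 6, 4
X = rng.normal(size=(n, k))            # predictors (invented data)
Y = rng.normal(size=(n, p))            # responses (invented data)
w = rng.uniform(0.5, 2.0, size=n)      # observation weights (illustrative)

# Weighted centring
wn = w / w.sum()
Xc = X - wn @ X
Yc = Y - wn @ Y

# Weighted cross-product matrix X^T W Y (W = diag(w))
M = Xc.T @ (w[:, None] * Yc)

U, s, Vt = np.linalg.svd(M, full_matrices=False)
u, v = U[:, 0], Vt[0]                  # first pair of singular vectors

# u' M v equals the top singular value: over all unit vectors,
# this pair attains the maximal weighted covariance (up to the
# normalisation of w) between the X-scores Xc@u and Y-scores Yc@v
top = u @ M @ v
```

For a full weighted PLS you would then deflate and repeat; setting all weights equal recovers the ordinary first PLS component.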

I have data from a business plan competition. There are a total of 127 judges and 201 business plans in which each of the judges rate each plan they are assigned on 6 items.

Judges are randomly assigned to plans such that each plan has anywhere from 4-10 judges rating them. Plans are also randomly assigned to judges such that no plan has the exact same panel of judges.

Can I calculate ICCs in this case? If so, how would I do that in SPSS?

Hi, I am testing for an association between gender and an ordinal variable. From what I have read online, when dealing with ordinal data we normally use the linear-by-linear association output in SPSS. However, over 20% of my cells have an expected count less than 5 (in my case, it's 30%). When both variables are nominal and this issue arises, we can use the G-test (likelihood ratio). Is there a similar alternative test for when this issue arises with ordinal data? What can I do at this point? Thanks!

How can I extract two (or more) samples out of the original sample? Are there any criteria for that?

Can I just arrange them in ascending order and divide them into two samples with 2500 elements in each?

How can I find the weight of the each sample?

Thank y'all ! :)

"Age and sex differences in the relation between frugality and self-efficacy" - Here there are 2 independent variables, age and sex, each having 2 levels (I intend to keep adolescents and young adults as the 2 age groups). There are 2 dependent variables: frugality and self-efficacy.

Objectives :

1. Are there any age differences in frugality and self efficacy?

2. Are there any sex differences?

3. What is the relation between frugality and self efficacy?

Sample :

N = 100 males (50 adolescents, 50 young adults); 100 females (50 adolescents, 50 young adults)

Data analysis : I would be greatly obliged if I could get the views of Fellow Researchers and Professors about the statistics that are best suited for this research.

I was wondering if MANOVA would be suitable.
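A two-way MANOVA (age x sex on the two correlated DVs) does fit this design, with the between-DV correlation addressed by objective 3. As a hedged sketch of what that looks like, here is a statsmodels version on simulated data (variable names are placeholders for the actual scale scores):

```python
import numpy as np
import pandas as pd
from statsmodels.multivariate.manova import MANOVA

rng = np.random.default_rng(1)
n = 200  # 50 per cell, mirroring the 2 x 2 design described above

df = pd.DataFrame({
    "age": np.repeat(["adolescent", "young_adult"], n // 2),
    "sex": np.tile(np.repeat(["male", "female"], n // 4), 2),
})
# Hypothetical scores with a small age effect on frugality built in
df["frugality"] = rng.normal(0, 1, n) + (df["age"] == "young_adult") * 0.5
df["self_efficacy"] = rng.normal(0, 1, n)

# Both DVs on the left, factors and their interaction on the right
mv = MANOVA.from_formula("frugality + self_efficacy ~ age * sex", data=df)
print(mv.mv_test())
```

Objective 3 (the frugality/self-efficacy relationship) would then be a simple correlation, possibly computed within each group.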

I have run my univariate normality test with the rules of -2 to +2 for skewness and kurtosis (George & Mallery, 2010) and a multivariate normality test with the rules of < 3 for skewness and between -2 and 2 for kurtosis (Chemingui & Lallouna, 2013). Thank God, the dataset passed all these rules, which means I have normally distributed data. My question is: do I still need to bother about outliers? Or shall I just report outliers as not applicable in my study? Even if there are outliers, my data are still normally distributed. What's your opinion?

*I'm using a Likert scale for my entire questionnaire.
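Passing skewness/kurtosis rules does not rule out multivariate outliers, since a case can be unremarkable on each variable yet extreme on the combination. A common check (a sketch, not the only method) is the squared Mahalanobis distance against a chi-square cutoff:

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(7)
X = rng.multivariate_normal(mean=[0, 0, 0], cov=np.eye(3), size=300)
X[0] = [6, 6, 6]                  # plant one clear multivariate outlier

mean = X.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(X, rowvar=False))
diff = X - mean
d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)  # squared Mahalanobis distances

# Common cutoff: chi-square quantile at p = .001, df = number of variables
cutoff = chi2.ppf(0.999, df=X.shape[1])
outliers = np.where(d2 > cutoff)[0]
print(outliers)
```

Flagged cases are then inspected rather than automatically deleted.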

I am using this method to interpret the proportion each predictor variable contributes to disease occurrence. My data contain 3 outliers, so I cannot use multiple linear regression. Now I want to know: what are the assumptions of the probit and logit models?

In PROC LIFETEST there cannot be missing data, because observations with missing values are dropped. Could some of the procedures below be implemented in survival analysis?

"

**CALIS** Procedure — Fits structural equation models

**GEE** Procedure — Generalized estimating equations approach to generalized linear models

**MCMC** Procedure — General purpose Markov chain Monte Carlo (MCMC) simulation procedure that is designed to fit Bayesian models with arbitrary priors and likelihood functions

**MI** Procedure — Performs multiple imputation of missing data

**MIANALYZE** Procedure — Combines the results of the analyses of imputations and generates valid statistical inferences

**SURVEYIMPUTE** Procedure — Imputes missing values of an item in a data set by replacing them with observed values from the same item and computes replicate weights (such as jackknife weights) that account for the imputation" (SAS documentation)

I have 14 treatments that I have analysed using ANOVA. There were no significant differences, but just by looking at the treatment means I am convinced there are differences. Which other statistical test can I use? I even used the Kruskal-Wallis test, since the data are not normal.

I am trying to find the association between risk and disease (education and reporting of NCDs) in populations of three different areas (i.e., in a spatial context). There are six population groups. Each pair of population groups belongs to a particular neighborhood, and each pair is made up of two population groups, one from a higher and one from a lower. In some cases the relative risk is greater than 1 and significant, indicating an association (though not necessarily causation). What can be the probable explanations if the value is not significant?
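A non-significant RR > 1 usually means the confidence interval includes 1, which can reflect a small sample, few events, or genuinely no association. The standard log-scale interval makes this concrete; a sketch with invented 2x2 counts:

```python
import numpy as np

# Hypothetical 2x2 counts: exposed/unexposed vs disease reported yes/no
a, b = 30, 70     # exposed:   30 cases among 100
c, d = 20, 80     # unexposed: 20 cases among 100

rr = (a / (a + b)) / (c / (c + d))
# Standard error of log(RR)
se_log = np.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
lo, hi = np.exp(np.log(rr) + np.array([-1.96, 1.96]) * se_log)
print(rr, (lo, hi))   # RR = 1.5, but the CI includes 1, so not significant at 5%
```

Widening event counts (larger a and c) shrinks the standard error, which is why the same RR can be significant in a bigger sample.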

I would like to analyse three breeding grounds: one active, one recently abandoned and one abandoned for several years. The predictors are continuous, rough percentages of the coverage of a particular habitat type. Their distribution is quite skewed, often exactly 100% or 0%. The units are patches, around 300 altogether, with 30 predictors. The dependent variable is therefore multinomial (exactly three classes). Which would be a good statistical technique for this situation? I thought of multinomial boosted regression trees but cannot find any good instructions for them. Thanks!
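Tree-based boosting does handle this setting: trees are invariant to monotone transformations (so the skewed 0/100 covers need no transformation) and multiclass outcomes are supported natively. One hedged sketch, using scikit-learn's `GradientBoostingClassifier` rather than any specific BRT package, on simulated data shaped like the problem above:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
n, p = 300, 5                   # stand-in for ~300 patches and 5 of the 30 predictors

# Skewed, zero-inflated percentage covers, like the habitat predictors described
X = np.where(rng.random((n, p)) < 0.4, 0.0, rng.random((n, p)) * 100)
y = rng.integers(0, 3, size=n)  # three ground classes: active / recent / old

clf = GradientBoostingClassifier(n_estimators=100, max_depth=3, random_state=0)
clf.fit(X, y)
proba = clf.predict_proba(X)    # one probability per class per patch
print(proba.shape)
```

With 30 predictors and ~300 patches, cross-validated tuning of tree depth and learning rate matters more than the choice of package.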

I plan to conduct a study that includes one continuous dependent variable (attitudes) and seven categorical independent variables: teaching position (general or special education teacher), gender, level of education, previous inclusive teaching experience, years of teaching, training in inclusive education, and the presence or absence of family members with disabilities.

I will use descriptive research to obtain information about the target population and describe the characteristics of the teachers in my study. The second method is correlational research to determine whether or not there is a relationship, without exploring cause-effect links, between the dependent and independent variables.

So, I developed three questions and one of them is the following:

R3: Are teachers’ attitudes toward the inclusion of hard of hearing students in general education classrooms in public schools differentiated by factors including current teaching position, training in inclusive education, the teacher’s gender and level of education, previous inclusive teaching experience, years of teaching, and the presence or absence of family members with disabilities?

And for analyzing this question

I was going to use t tests, and one-way ANOVAs to determine the relationship between the independent variables, and the dependent variable as following:

1. The differences in teachers’ attitudes toward the inclusion of students who are hard of hearing based on teaching position (independent t-test to compare differences in group means).

2. The differences in teachers’ attitudes toward the inclusion of students who are hard of hearing based on gender (independent t-test to compare differences in group means).

3. The differences in teachers’ attitudes toward the inclusion of students who are hard of hearing based on level of education (one-way ANOVA to compare differences between group means).

4. The differences in teachers’ attitudes toward the inclusion of students who are hard of hearing based on previous inclusive teaching experience (independent t-test to compare differences in group means).

5. The differences in teachers’ attitudes toward the inclusion of students who are hard of hearing based on the number of years of teaching (one-way ANOVA to compare differences in group means).

6. The differences in teachers’ attitudes toward the inclusion of students who are hard of hearing based on teachers’ training in inclusive education (independent t-test to compare differences in group means).

7. The differences in teachers’ attitude toward the inclusion of students who are hard of hearing based on having family members with disabilities (independent t-test to compare differences in group means).

But some of my colleagues suggested using another analysis model where I can control for the effects of all independent variables on the dependent variable simultaneously. In other words, they indicated that there may be interactions between the IVs that influence the relationship between any one independent variable and the dependent variable, so I have to control for that. I was thinking of using analysis of covariance (ANCOVA), but I am a little confused about how to control for the effects of all these independent variables. Any suggestions?
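What the colleagues describe is a single linear model with all seven IVs entered together (a multi-way ANOVA / multiple regression), where each coefficient is adjusted for the others and selected interactions can be added explicitly. A hedged sketch with statsmodels on invented data; the variable names and the chosen interaction are placeholders, not the study's actual design:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
n = 300

# Hypothetical survey data with a subset of the seven IVs
df = pd.DataFrame({
    "attitude": rng.normal(3.5, 0.8, n),
    "position": rng.choice(["general", "special"], n),
    "gender": rng.choice(["female", "male"], n),
    "education": rng.choice(["bachelor", "master", "doctorate"], n),
    "training": rng.choice(["yes", "no"], n),
})

# All predictors in one model: each effect is adjusted for the others.
# An interaction (here position x training, as an example) is added with ':'
model = smf.ols("attitude ~ C(position) + C(gender) + C(education) "
                "+ C(training) + C(position):C(training)", data=df).fit()
print(model.summary())
```

The separate t-tests and one-way ANOVAs then become special cases of this model, and a non-significant interaction term can simply be dropped.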

Dear Sir/ Madam,

I collected data from 107 respondents on their three cognitive styles and five types of behavioral characters. So, every respondent gave me their response on eight variables (three cognitive styles and five behavioral characters).

Now I want to examine the relations among the eight variables. Please suggest what types of statistical tests (in SPSS) are suitable for examining these relations.

If you have any material (with data analysis) on a problem like this, please share it with me.

With Regards,

Surajit Saha

(PhD Scholar, IME Department, IIT Kanpur, INDIA)
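A natural first step (one suggestion among several; canonical correlation between the two variable sets is another) is a full correlation matrix among the eight variables, using Spearman correlations if the scores are ordinal. A sketch on invented data with placeholder variable names:

```python
import numpy as np
import pandas as pd
from scipy.stats import spearmanr

rng = np.random.default_rng(11)
# Hypothetical scores: 107 respondents x 8 variables
# (3 cognitive styles + 5 behavioral characteristics)
cols = [f"cog{i}" for i in range(1, 4)] + [f"beh{i}" for i in range(1, 6)]
df = pd.DataFrame(rng.normal(size=(107, 8)), columns=cols)

rho, p = spearmanr(df)              # 8 x 8 rank correlations and p-values
corr = pd.DataFrame(rho, index=cols, columns=cols)
print(corr.round(2))
```

In SPSS the equivalent is Analyze > Correlate > Bivariate with all eight variables entered.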

I have an experiment with 6 factors. When I use a Box-Behnken design (BBD) I get 54 runs, but I want to increase the number of runs for more accuracy. If I use a central composite design (CCD) I get 86 runs, but 3 or 4 of them have negative factor values. Is it OK to delete these negative runs and carry on the experiment normally, i.e., does this have no effect on the model? And what is better to use in this case: BBD with replicates or CCD?

I want to run a regression analysis to study the effect of material porosity and PPI on velocity loss in a metal foam. Porosity and PPI are the predictor variables and velocity loss is the dependent variable. I have two separate metal foams, copper and aluminum. I am confused as to whether I should use multilevel regression or nested regression (typical multi-variable regression).

Hi all,

I want to do multiple imputation at the item level, since several studies have shown the superiority of item-level imputation compared to scale-level imputation.

I want to do a sensitivity analysis. Some authors suggest adding or subtracting a constant to/from the imputed values. However, many of the applications used continuous scale scores for such an analysis. Since I do imputation at the item level, I could add such a constant; however, I have never seen any comparable application. I know many NMAR models (selection models and so on), but since I am not interested in growth, they do not fit my application (at least I think so).

So, do you think this strategy is reasonable, and do you know of any papers that have used it? Or any other suggestions?

Many thanks in advance,

best wishes, Manuel
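The add-a-constant strategy described above is usually called delta adjustment, and mechanically it works the same at item level as at scale level: shift only the imputed values by delta and track how the pooled estimate moves. A deliberately simplified sketch (single mean imputation stands in for proper multiple imputation, purely to show the shifting step):

```python
import numpy as np

rng = np.random.default_rng(2)
item = rng.normal(3.0, 1.0, 200)           # one Likert-type item, hypothetical
missing = rng.random(200) < 0.2            # ~20% missing at random
observed = np.where(missing, np.nan, item)

# Stand-in single imputation (in practice: proper multiple imputation per item)
imputed = np.where(missing, np.nanmean(observed), observed)

# Delta adjustment: shift only the *imputed* values, then re-estimate
for delta in (-0.5, 0.0, 0.5):
    shifted = imputed + delta * missing    # adds delta only where values were imputed
    print(delta, shifted.mean())
```

Plotting the estimate against delta then shows how strong an MNAR departure would have to be to change the substantive conclusion. For item-level imputation, one open choice is whether delta is applied per item or spread across the items of a scale.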

I have a matrix with species as columns and sites as rows and it contains counts of individuals per site per species. It was suggested that due to the high variability in counts (they go from 0 to ~3000), I should transform my matrix. Basically, I am doing multivariate statistics to find differences in community compositions among bioregions.

My knowledge of stats is by no means great, but I do know transformations are used to normalize data for parametric statistical analyses, and you can check whether the applied transformation achieves this with a test (e.g., Shapiro-Wilk). But this is different from what I want to do, and I don't understand why and when transformations should be applied to community data, or how to decide that the chosen transformation is the correct or most suitable one for my data and research question. Is there a rule of thumb for applying transformations (e.g., if the SD is over 2, or the counts vary by more than two orders of magnitude, the data should be transformed)?

I have not been able to find information about this, so I would really appreciate your help.

Thanks!
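In community analyses the transformation is usually not about normality at all: it downweights the most abundant species so they do not dominate the dissimilarity matrix. The common ladder is square root (mild), fourth root (stronger), and log(x+1) (strongest); which rung you pick is a judgment about how much weight rare species should get. A small numeric illustration:

```python
import numpy as np

counts = np.array([0, 1, 5, 80, 3000], dtype=float)  # abundances spanning 4 orders

sqrt_t = np.sqrt(counts)      # mild downweighting of dominants
fourth = counts ** 0.25       # stronger; a common choice before Bray-Curtis
log1p = np.log1p(counts)      # log(x + 1), defined at zero counts

# The dominant species is compressed from 3000 to ~55 (sqrt), ~7.4 (fourth
# root) or ~8.0 (log), so it no longer swamps the dissimilarities.
print(np.column_stack([counts, sqrt_t, fourth, log1p]))
```

A practical check is to run the ordination or PERMANOVA with and without the transformation and see whether the conclusions about bioregions are driven by a handful of dominant species.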

I am not a statistician and would be very thankful if someone could clarify me this.

I am reading about the Generalized Additive Models that 'they don't handle interaction well. Rather than fitting multiple variables simultaneously, the algorithm fits a smooth curve to each variable and then combines the results additively, thus giving rise to the name Generalized Additive Models.' (in http://ecology.msu.montana.edu/labdsv/R/labs/lab5/lab5.html)

Could someone give an example of two interacting variables and how they are being handled by GAMs?
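An ecological example: the effect of temperature on abundance may depend on moisture, so the surface cannot be written as f(temperature) + g(moisture). A purely additive fit misses such a surface entirely, while adding a joint (tensor-product) term recovers it. The point can be shown without a GAM package, using polynomial smooths as stand-ins for the smooth terms:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 400
x1, x2 = rng.uniform(-1, 1, n), rng.uniform(-1, 1, n)
y = x1 * x2                                  # a pure interaction surface

# Additive basis: separate polynomials in x1 and x2, no cross terms,
# mimicking what a plain additive model fits: s(x1) + s(x2)
B_add = np.column_stack([np.ones(n), x1, x1**2, x1**3, x2, x2**2, x2**3])
coef, *_ = np.linalg.lstsq(B_add, y, rcond=None)
r2_add = 1 - np.sum((y - B_add @ coef) ** 2) / np.sum((y - y.mean()) ** 2)

# Adding the cross term x1*x2, as a joint smooth of both variables would
B_int = np.column_stack([B_add, x1 * x2])
coef2, *_ = np.linalg.lstsq(B_int, y, rcond=None)
r2_int = 1 - np.sum((y - B_int @ coef2) ** 2) / np.sum((y - y.mean()) ** 2)

print(r2_add, r2_int)  # additive fit explains almost nothing; joint fit is exact
```

In GAM software the fix is the same idea: specify a joint smooth of both variables (e.g. a tensor-product term) instead of two separate smooths.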

Hi, I have 3 groups of data. SGA: <5% birth weight, AGA: 5%≤birth weight≤95%, and LGA: >95% birth weight. 50 SGA and 50 LGA cases were selected from a cohort and matched to 50 AGA cases according to ethnicity. Hormone levels were measured. I can't find any significant associations within each group. Can I run a multiple regression with these 150 cases altogether, with hormones as independent variables, to predict birth weight? How can I test whether it is reasonable to do that?
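One way to test whether pooling is reasonable is a Chow-type comparison: fit the pooled model and a model that lets intercept and slopes differ by group, then F-test whether the group terms add anything. A hedged sketch with statsmodels on simulated data (one hormone for brevity; the selected-extremes sampling design is a separate caveat this test does not address):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(9)
n = 150
df = pd.DataFrame({
    "group": np.repeat(["SGA", "AGA", "LGA"], 50),
    "hormone": rng.normal(0, 1, n),
})
# Hypothetical birth weights with a common hormone slope across groups
df["birthweight"] = 3200 + 150 * df["hormone"] + rng.normal(0, 300, n)

pooled = smf.ols("birthweight ~ hormone", data=df).fit()
full = smf.ols("birthweight ~ hormone * C(group)", data=df).fit()

# Non-significant F => intercepts and slopes can be treated as common,
# and pooling the 150 cases into one regression is defensible
f, p, _ = full.compare_f_test(pooled)
print(f, p)
```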

I'm hoping the informed can kindly help me with some suggestions on the appropriate statistical techniques to use for panel data collected from 60 biofilter columns over 6 months.

1. **The data.** There are several different designs. Each design represents a unique combination of vegetation, filter media and saturation zone type. There are ~5 replicates for each design. In some months they were all dosed with greywater, and with stormwater in other months. We have collected outflow concentrations of several pollutants, including P, N, TSS and metals, each month.

2. **My plan A.** This was my original plan. The idea is to treat the data set as panel data and run a regression using the appropriate method. Since we've been tracking the performance of each biofilter through time, I surmise that the 'pooled cross-section' method probably isn't good, so I should use fixed/random effects. This plan has been put in doubt by further review of the literature.

3. **Plan B, your suggestion.** When I looked at how others did their analysis on measurements collected over time, I found that none of them used the methods stated above. They mostly used k-way ANOVA; at best, they included Kruskal-Wallis and Principal Component Analysis. What also seems strange to me is that, even though they collected multiple pollutant measurements at different times, they presented only one single value in their papers. Did they somehow take an average? If so, what's the averaging method?

I would appreciate any suggestions.
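Plan A's random-effects idea can be written as a linear mixed model: fixed effects for design, water type and time, plus a random intercept per biofilter column to capture the repeated measures. A hedged sketch with statsmodels on simulated data (all names and effect sizes are invented; log-transformed concentrations are assumed):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(6)
cols, months = 60, 6
df = pd.DataFrame({
    "column_id": np.repeat(np.arange(cols), months),
    "month": np.tile(np.arange(months), cols),
    "design": np.repeat(rng.integers(0, 4, cols), months),  # 4 hypothetical designs
    "water": np.tile(["grey", "storm", "grey", "storm", "grey", "storm"], cols),
})
col_effect = np.repeat(rng.normal(0, 0.3, cols), months)    # per-column intercept
df["log_TN"] = (1.0 + 0.2 * df["design"] + 0.3 * (df["water"] == "storm")
                + col_effect + rng.normal(0, 0.2, len(df)))

# Random intercept per biofilter column handles the repeated measurements
m = smf.mixedlm("log_TN ~ C(design) + C(water) + month",
                data=df, groups=df["column_id"]).fit()
print(m.summary())
```

This keeps all six monthly measurements per column in the model, so nothing has to be averaged away as in the k-way ANOVA papers described above; one model per pollutant is the simplest starting point.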