# Multivariate Data Analysis - Science topic

Questions related to Multivariate Data Analysis

Hello,

I am using multivariate multiple regression for my master's thesis, but I'm not sure whether I am doing the analysis and reporting it correctly. I have very limited time before the thesis submission deadline, so any help is very much appreciated.

I would be really glad if someone could recommend or send articles or dissertations that use this analysis.

Thanks in advance,

Yağmur

Hi! I'm looking for an open-source program for exploratory multivariate data analysis techniques, in particular those based on Euclidean vector spaces.

Ideally, it should be capable of handling databases as a set of k matrices.

There is software known as ACT-STATIS (or an older version named SPAD) that performs this task, but as far as I know it is not open source. Thanks!

I ran a PCA in SPSS using varimax rotation. There were 14 variables, and three components were extracted. I noticed that some of the variables loaded strongly on more than one component. What is wrong, please?

Multivariate data analysis was, and still is, an effective tool for solving problems in many fields. However, I wonder whether it can be used in the field of artificial intelligence. How can we do that, and what restrictions and aims should be kept in mind?

There are several independent variables and several dependent variables. I want to see how those independent variables affect the dependent variables. In other words, I want to analyze:

[y1, y2, y3] = [a1, a2, a3] + [b1, b2, b3]*x1 + [c1, c2, c3]*x2 + [e1, e2, e3]

The main problem is that y1, y2 and y3 are correlated: an increase in y1 may lead to a decrease in y2 and y3. In this situation, which multivariate multiple regression models can I use, and what are the assumptions of those models?
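For reference, the model in the question can be written compactly in matrix form. This is only a restatement of the equation above; $n$ (the number of observations) and $\Sigma$ (the error covariance matrix) are notation introduced here, not part of the original post:

$$
\underbrace{\begin{bmatrix} y_{11} & y_{12} & y_{13} \\ \vdots & \vdots & \vdots \\ y_{n1} & y_{n2} & y_{n3} \end{bmatrix}}_{Y}
=
\underbrace{\begin{bmatrix} 1 & x_{11} & x_{12} \\ \vdots & \vdots & \vdots \\ 1 & x_{n1} & x_{n2} \end{bmatrix}}_{X}
\underbrace{\begin{bmatrix} a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \\ c_1 & c_2 & c_3 \end{bmatrix}}_{B}
+ E,
\qquad
\hat{B} = (X^\top X)^{-1} X^\top Y
$$

Each row of $E$ holds $[e_1, e_2, e_3]$ for one observation, with covariance matrix $\Sigma$. Whatever correlation among y1, y2 and y3 remains after accounting for x1 and x2 sits in the off-diagonal entries of $\Sigma$. With the same predictors in every equation, the least-squares estimate $\hat{B}$ coincides column by column with three separate regressions, so the correlation matters for the joint (multivariate) tests rather than for the coefficient estimates.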

How can I reduce bias and improve RMSE in multivariate data analysis?

I did an MCA using FactoMineR. I know how to interpret cos2, contributions and coordinates, but I don't know how the v.test values should be interpreted.

Thank you

Hi all,

For my master's thesis, I conducted a multivariate multiple regression since my three dependent variables are correlated with each other. I used Stata 16 and the command "mvreg".

However, I can't find how to get the adjusted R-squared. I really want to report the model fit, but the only value I get is the R-squared. Is there a specific reason why I can't get the adjusted R-squared? Or does someone know how to obtain it correctly so that I can report it?

Furthermore, I can't find how to check for multicollinearity with VIF values after MMR. I don't think that pointing to my correlation table, in which my independent and control variables are not highly correlated with each other, is sufficient to exclude multicollinearity. Does someone know how to do this?

Thanks in advance!
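On the adjusted R-squared question, a minimal sketch of computing the value by hand for each equation, assuming the software reports an ordinary R-squared per dependent variable. The function name, the R-squared values, the sample size and the predictor count below are all hypothetical placeholders, not Stata output:

```python
def adjusted_r2(r2: float, n_obs: int, n_predictors: int) -> float:
    """Adjusted R-squared for a single regression equation.

    n_predictors counts the independent and control variables only,
    not the intercept.
    """
    if n_obs - n_predictors - 1 <= 0:
        raise ValueError("need more observations than predictors + 1")
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Hypothetical values for three dependent variables (not real thesis data).
r2_by_equation = {"dv1": 0.42, "dv2": 0.35, "dv3": 0.28}
n_obs, n_predictors = 150, 6

adjusted = {
    dv: adjusted_r2(r2, n_obs, n_predictors)
    for dv, r2 in r2_by_equation.items()
}
for dv, value in adjusted.items():
    print(f"{dv}: adjusted R2 = {value:.3f}")
```

Because each equation has its own R-squared, the adjustment is applied per dependent variable; there is no single adjusted value for the whole multivariate model in this sketch.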

Hi there,

For my hierarchical regression model, I have planned to report the VIF values to indicate collinearity. I worry that as I am including interaction terms, these values will be high. Would entering the interaction terms separately (i.e., without the proposed moderator) help with this?

Secondly, what is the recognised threshold for a critical VIF value? This stats resource states that a value lower than 10 is acceptable (Hair, Joseph F., et al. Multivariate Data Analysis: A Global Perspective. 7th ed. Upper Saddle River: Prentice Hall, 2009. Print.), but some studies have standardised or mean centred the variables even at a VIF of approximately 4.0. Lastly, any advice on choosing between standardisation and mean centring would be greatly appreciated.

Thanks a lot,

Esther
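To make the mean-centring point concrete, here is a small simulation sketch. It assumes two independent predictors with non-zero means (the variable names, ranges and seed are invented for illustration) and compares how strongly a predictor correlates with the product term before and after centring:

```python
import random
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / sqrt(va * vb)


def centre(values):
    """Subtract the mean from every value."""
    m = sum(values) / len(values)
    return [v - m for v in values]


rng = random.Random(42)
x = [rng.uniform(10, 20) for _ in range(500)]  # hypothetical predictor
z = [rng.uniform(5, 15) for _ in range(500)]   # hypothetical moderator

# Interaction term built from raw scores versus mean-centred scores.
raw_product = [a * b for a, b in zip(x, z)]
xc, zc = centre(x), centre(z)
centred_product = [a * b for a, b in zip(xc, zc)]

r_raw = pearson(x, raw_product)
r_centred = pearson(xc, centred_product)
print(f"corr(x, x*z) raw:     {r_raw:.2f}")
print(f"corr(x, x*z) centred: {r_centred:.2f}")
```

The raw product term is strongly correlated with its components simply because both have large means, which is what inflates the VIF. Centring removes that overlap but leaves the interaction coefficient and its test unchanged, so a high VIF caused only by an uncentred product term is not in itself evidence of a problem.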

Discriminant analysis has the assumption of normal distributions, homogeneity of variances, and correlations between means and variances. If those assumptions are not fulfilled, is there any non-parametric method that can be used as a "substitute" for discriminant analysis?

Many thanks in advance.

Hello,

is it possible to use the linear discriminant analysis (LDA) to determine which of the analyzed variables best separates the different groups (which are already known)?

For example, I want to understand how 3 different croplands are different in terms of ecosystem services provisioning. So, I decide to measure 4 variables for each ecosystem (Soil Carbon, Dry matter, Biodiversity, and GHG) and then I run an LDA analysis (on PAST 3.4 here)

I get this result (see the attached picture). Here clearly the Grassland seems to be more different than the other two croplands (because it is more displaced than the other two croplands on the X-axis).

Would it be correct to conclude that this grassland differs most from the other 2 crops and this seems to be determined by its level of biodiversity?

Thanks (and of course, these data are not real. That's just an example)

Dear colleagues,
I'm working on path analyses with the lavaan and MVN packages. Some of the results are confusing to me, and there are features of MVN that I do not know how to set.

My dataset is composed of 130 rows and 7 variables.
Using the MVN package to run the Mardia, Henze-Zirkler (hz) and Royston tests, the results indicate that my data are multivariate normal.
However, the Mahalanobis distance at different alpha levels gives me different results:
alpha 0.5 = 14
alpha 0.6 = 8
alpha 0.65 = 5
alpha 0.7 = 0

I have several questions:
1. What is the meaning of 'candidate' outliers? If the MVN tests and the Q-Q plot indicate a good fit to multivariate normality, should I ignore the outliers? Based on what?

2. If outliers are the main concern for multivariate normality, how can three different methods indicate multivariate normality while I have 0-14 candidate outliers?

3. Most importantly, where can I read which alpha and tolerance values I should use in my case? Is there a tutorial, or are there recommendations for alpha according to sample size and other data characteristics?

I was looking on the web and I have found no answer to this. So, any literature recommendation or advice will be welcome.

Thanks and sorry for taking your time.
Sincerely,

I'm currently working on my master's thesis, in which I have a model with two IVs and two DVs. I proposed a hypothesis that the two IVs are substitutes for each other in improving the DVs, but I cannot figure out how to test this in SPSS. Maybe I'm thinking too 'difficult'. In my research, the IVs are contracting and relational governance, and thus they might be complementary in influencing my DVs or they might function as substitutes.

I hope anyone can help me, thanks in advance!

For my research there are five independent variables (Risk, Usefulness, Awareness, Complexity and Ease of use) and a single dependent variable (Adoption of e-banking). I want to test the relationship between the IVs and the DV using Pearson correlation analysis and multiple regression analysis to test the hypotheses in SPSS. I know independent variables can be collected through a survey, but I don't know whether dependent variables can also be collected that way. So, my question is: can I collect the dependent variable through a survey using a 5-point Likert scale?

*Hello everyone*

I would like to compute the Hosking & Wallis discordancy test based on L-comoments for multivariate regional frequency analysis. Please help me by answering these questions.

**1- Is that the same transpose of the U matrix in the attached file?**

**2- Are there any R packages related to multivariate L-comoments homogeneity tests?**

I worked with “lmomco” and “lmomRFA” before this.

*Thanks for any help.*

I am preparing a research design for estimating the relation of three independent variables, namely organic, conventional and integrated agriculture, to the decrease of soil erosion. I need to decide what kind of analysis I need, that is, bivariate or multivariate data analysis. I feel that taking correlations between pairs of variables is more appropriate than looking for an interaction that leads to an aggregate measurement, that is, multiple correlation or multiple linear regression. Any comments that help to answer my question would be appreciated as feedback.

According to Hair et al., sample size calculation should take into account not only the parameters or observed items but also model complexity and the number of constructs and their communalities. They state that the minimum sample size should be at least 500 for models with a large number of constructs (>7), some with lower communalities and/or fewer than three measured items.

Hair JF, Black WC, Babin BJ, Anderson RE. Structural equations modeling overview. Multivariate Data Analysis, 7th edition. United Kingdom: Prentice Hall PTR; 2013.

My dataset has 5 variables. One of the variables is the group; there are 10 different groups. How can I check the relationships between the groups based on the other 4 variables?

I want to check which groups are most similar to each other and which are very different.

Also, how do I plot the hierarchical nature and the spatial nature of these relationships between groups?

Which multivariate technique should I choose? I am using R.

Thank you very much in advance for your help.

Hi,

I have analysed my data using multivariate multiple regression (8 IVs, 3 DVs), and significant composite results have been found.

Before reporting my findings, I want to discuss in my results chapter (briefly) how the composite variable is created.

I have done some reading, and in the sources I have found, authors simply state that a 'weighted linear composite' or a 'linear combination of DVs' is created by SPSS (the software I am using).

They do not explain *how* they are weighted, and as someone relatively new to multivariate statistics, I am still unclear. Are the composite DVs simply a mean score of the three DVs I am using, or is a more sophisticated method used in SPSS?

If the latter is true, could anyone either a) explain what this method is, or b) signpost some useful (and accessible) readings which explain the method of creating composite variables?

Many thanks,

Edward Noon

I am doing a temperature compensation study on fruit dry matter prediction using NIRS spectra. As I don't know much about MATLAB and mostly perform my multivariate regression using the Unscrambler software, I am looking for a simplified version of the external parameter orthogonalization algorithm.

For instance comparing satisfaction levels coded on a scale from 0 (completely dissatisfied) to 10 (completely satisfied) between 2010 and 2011.

The PCA results for a multi-layer aquifer (a carbonate karst layer and an alluvial aquifer are the main reservoirs) give three factors (eigenvalues above 1). PC1 shows positive weightings with electrical conductivity, Cl, Na and Ca and negative weightings with HCO3. PC2 shows moderate positive weightings with pH and SO4 and moderate negative weightings with Mg. PC3 shows moderate negative weightings with K and moderate positive weightings with NO3 and pH. What is the meaning of the last two factors (PC2 and PC3)? Thank you in advance.

I want to correlate meteorological data and particulate matter data. Can I use both the Principal Component Analysis (PCA) and Canonical Correlation Analysis (CCA)? Or is there any preliminary test to determine which one to use? Thanks!

Hello,

a seemingly simple design question: The aim is to visualize the dependence of A and B by connecting A and B by a straight line (possibly with a label). The design options are: line type, line strength, text or symbolic label.

How would you visualize the "significance" and/or "strength" of the dependence?

Details:

- A and B are either independent (no line) or dependent. They are considered dependent if the likelihood of being independent (the p-value / "significance") is small (which corresponds in each setting to a certain value of a test statistic).

- The "strength" of dependence of A and B might be given on a scale, e.g. [-1,1] if one considers classical correlation.

(The use of colour is a further design option, which breaks down in black and white print. Therefore it was excluded.)

### all below can be skipped, it provides only further details for the reader interested in the background of the question ###

The detection of dependence and its quantification are usually separate procedures, thus a mixture of both might be confusing...

Background:

Apart from many other new contributions the paper arXiv:1712.06532

introduces a visualization scheme for higher order dependencies (including consistent estimators for the dependence structure).

Based on feedback there seems to be a tendency to interpret the method/visualization by a wrong intuition (rather than by its description given in the paper)... so I wonder if this can be moderated by an improved visualization.

If you want to test your intuition use in R:

```r
install.packages("multivariance")
library(multivariance)

# Perform dependence structure detection on the package's sample datasets
dependence.structure(dep_struct_several_26_100, alpha = 0.001)
dependence.structure(dep_struct_star_9_100, alpha = 0.01)
dependence.structure(dep_struct_ring_15_100, alpha = 0.01)
```

The current visualization does NOT include the "strength" of dependence, but that's what some seem to believe to see.

The paper is concerned with dependencies of higher order, thus it is beyond the simple initial example of this question. But still, it depicts dependencies by lines and uses as a label usually the value of the test statistic. Redundancy is introduced by using colour, line type and in certain cases also the label to denote the order of dependence.

It seems that using the value of the test statistic as label causes irritation. The fastest detection method is based on conservative tests, in this setting there is a one-to-one correspondence (independent of sample sizes and marginal distributions) between the value of the test statistic and the p-value - thus it provides a very reasonable label (for the educated user). In general the value of the test statistic gives only a rough indication of the significance.

A further comment to the distinction between "significance" and "strength": In the paper also several variants of correlation-like measures are introduced, which are just scaled version of the test statistics. Thus (for a fixed sample size and fixed marginals) there is also a one-to-one correspondence between the "strength" and the conservative "significance". These measures also satisfy certain dependence measure axioms. But one should keep in mind that these axioms are not sufficient to provide a sensible interpretation of different (or identical) values of the "strength" in general (e.g., when varying the marginal distributions). ... that's why currently all methods are based on "significance".

Recently several measures for testing independence of multiple random variables (or vectors) have been developed. In particular, these allow the detection of dependencies also in the case of pairwise independent random variables, i.e., dependencies of higher order.

Thus, if you had a dataset which was considered uninteresting because no pairwise dependence was detected, it might be worth retesting it.

If your data is provided in a matrix x where each column corresponds to a variable, then the following lines of R code perform such a test with a visualization.

```r
install.packages("multivariance")
library(multivariance)

# x: data matrix with one column per variable
dependence.structure(x)
```

If the plot output is just separated circles (these represent the variables) then no dependence is detected. If you get some lines connecting the variables to clusters then dependence is detected, e.g.

```r
dependence.structure(dep_struct_several_26_100)
dependence.structure(dep_struct_iterated_13_100)
dependence.structure(dep_struct_ring_15_100)
dependence.structure(dep_struct_star_9_100)
```

Depending on the number of samples and the number of variables, the algorithm might take some time; the above examples with up to 26 variables and 100 samples run quickly.

Due to publication bias, datasets are usually only published if some (pairwise) dependence is present. Thus there should be plenty of cases where data was considered uninteresting, but a test for higher order dependence shows dependencies. If you have such datasets, it would be great if you shared them.

Comments and replies - public and private - are welcome.

For those interested in a bit more theoretic background: arXiv:1712.06532

I'm looking for examples and analyses of autocorrelation tests performed on cross-sectional data and the issues encountered (data prop for analysis, problems with estimations, etc). Not interested in spatial autocorrelation.

Many thanks in advance!

I have a dataset with 56 variables: 4 dependent and 52 independent. Of the independent variables, 45 are categorical, and 3 of the dependent variables are categorical; the rest are continuous. Each variable has 1500 observations. The categorical independent variables are nominal, and the categorical dependent variables are ordinal. I want to check whether the independent variables have any effect on each dependent variable.

Dear all,

My question is the following:

I have a large dataset: 100,000 observations, 20 numerical and 2 categorical variables (i.e. mixed variables).

I need to cluster these observations based on the 22 variables, I have no idea how many clusters/groups a priori I should expect.

Because the dataset is large, I use the clara() function in R (based on "pam").

Because of the large number of observations, there is no way to compare distance matrices (R does not allow such calculations, and it is not a RAM problem), so the common way of selecting clusters using treeClust() and pamk() and comparing "silhouette" values does not work.

My main question is: can I use quantities like total SS, within SS and between SS to get an idea of the best performing tree (in terms of number of clusters)? Do you have any other ideas on how I can select the right number of clusters?

Best regards

Alessandro

I'm trying to use Mettler Toledo's IC Quant software to generate calibration models for the reaction components. I run my reactions at high temperature and pressure, since I cannot collect the reaction standards needed for the calibration at the reaction temperature, I prepare the reaction standards first by running the reactions to different conversions of my limiting reagent (so that I have different concentrations for the components). I then collect the spectrum of these reaction standards at room temperature and use these spectra and the measured GC-FID concentrations for multivariate data analysis.

Now the problem is that the absorptions become less intense with increasing temperature. Hence, when I apply the calibration model (built using reaction standards collected at 25 C) to the real-time reaction spectra collected at the reaction temperature of 140 C, I see a significant offset of the predicted concentrations from their actual values (the predicted concentrations are negative). I also notice that the temperature dependence is linear in the range that I tested (25 - 140 C). I'd like to know if there is a standard procedure for applying a temperature correction to spectra collected at a different temperature in real time to get accurate concentration predictions.

A power analysis program such as G3 can determine the minimum required sample size for logistic regression, but I can't find software to determine the sample size for a multinomial logit regression.

I'm interested in forecasting stock market data. I tried to predict the close price and volume separately using **ARIMA**, but did not get a better result, so I tried an **LSTM** model. If I want to perform a multivariate analysis that considers the correlation of both variables, which are the best **multivariate analysis techniques**?

I have total dissolved solids data for apples as reference values (y-variable).

I also have near-infrared spectra as predictors (x-variables).

I have the *StatSoft Statistica* software for the analysis.

When we do EFA, do we need to include all variables' items together or each variable's items separately? (For example, imagine I have 3 latent variables and each latent variable has 10 items. When we do EFA, do we need to put the 30 items together or each set of 10 items separately, meaning doing the same procedure 3 times, once for each variable?)

I am doing a study on classifying fruits in the visible region, taking spectra with Vis-NIR spectroscopy. I want to classify the fruit by maturity in terms of skin color. I am trying to use SVMC, SIMCA and KNN with PLS toolbox. I went to the Eigenvector wiki site, but the procedure described there is a bit unclear to me. It would be great if somebody could explain the stepwise procedure for performing these in PLS toolbox using MATLAB.

Hi all,

I have a data frame of multivariate abundances, measured from sites under two different treatments. These sites have been sampled each summer for ~15 years, and thus are repeated measures.

I am modelling my data using the mvabund package in R, which fits Generalized Linear Models.

My model is of the form abundance~Year*Treatment

When I perform an ANOVA on my model, I need to account for the lack of independence between years as the samples are repeated measures. I want to do this using restricted permutations with the 'permute' package in R.

However, I am struggling to theoretically understand how I need to permute my data in order to account for this.

Any help or suggestions would be greatly appreciated. First time poster so sorry if this is at all unclear.
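As a conceptual illustration of a restricted permutation for this kind of design, the sketch below shuffles treatment labels among whole sites while leaving each site's sequence of years untouched. It assumes treatment is fixed per site, which the post does not state, and it only addresses the Treatment term. The data layout, field names and function are hypothetical, and this is not the mvabund or permute API:

```python
import random


def permute_treatment_by_site(rows, rng):
    """Return a copy of rows with treatment labels shuffled among whole sites.

    Every row of a site receives the same new label, so the repeated
    measures within a site stay together and keep their year order.
    """
    site_treatment = {}
    for row in rows:
        site_treatment.setdefault(row["site"], row["treatment"])
    sites = sorted(site_treatment)
    labels = [site_treatment[s] for s in sites]
    rng.shuffle(labels)
    new_label = dict(zip(sites, labels))
    return [{**row, "treatment": new_label[row["site"]]} for row in rows]


# Hypothetical design: four sites, two treatments, three survey years.
rows = [
    {"site": s, "year": y, "treatment": t}
    for s, t in [("A", "control"), ("B", "control"),
                 ("C", "treated"), ("D", "treated")]
    for y in (2001, 2002, 2003)
]

permuted = permute_treatment_by_site(rows, random.Random(1))
for row in permuted[:6]:
    print(row)
```

The idea is that sites, not individual site-year samples, are the exchangeable units for a between-site factor. Testing Year or the interaction would need a different restriction, for example shifting years within sites.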

For example, I run ADONIS (http://cc.oulu.fi/~jarioksa/softhelp/vegan/html/adonis.html) to investigate the effect of BMI (a continuous variable) on beta-diversity (weighted UniFrac) and I get a significant p-value. This means that BMI has an effect on beta-diversity. How was the p-value calculated?

H0 for ADONIS is "the centroids of the groups, as defined in the space of the chosen resemblance measure, are equivalent for all groups." It's pretty clear how this hypothesis is tested with two groups, but it is not clear how this is done for a continuous variable. Does anyone know?

One more question :) I often see that PERMANOVA and ADONIS are used interchangeably. Is this correct?
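The general idea for a continuous predictor can be sketched as follows: the squared distances are partitioned into a part explained by a linear regression on the predictor and a residual part, giving a pseudo-F statistic, and the p-value is the share of random permutations of the predictor whose pseudo-F is at least as large as the observed one. The code below is a simplified single-predictor illustration of that distance-based approach, not vegan's actual implementation. The function names, the toy profiles and the Euclidean distances standing in for weighted UniFrac are all assumptions:

```python
import math
import random


def pseudo_f(dist, x):
    """Pseudo-F for one continuous predictor x and a distance matrix dist."""
    n = len(x)
    mean_x = sum(x) / n
    xc = [v - mean_x for v in x]
    sxx = sum(v * v for v in xc)
    # Total sum of squares: sum of squared distances (i < j) divided by n.
    ss_total = sum(dist[i][j] ** 2
                   for i in range(n) for j in range(i + 1, n)) / n
    # Part explained by the linear effect of x (x is centred, so the
    # double-centring terms of the Gower matrix drop out).
    ss_model = -0.5 * sum(xc[i] * xc[j] * dist[i][j] ** 2
                          for i in range(n) for j in range(n)) / sxx
    ss_resid = ss_total - ss_model
    return ss_model / (ss_resid / (n - 2))


def permutation_p_value(dist, x, n_perm=999, seed=0):
    """Observed pseudo-F and its permutation p-value."""
    rng = random.Random(seed)
    f_obs = pseudo_f(dist, x)
    shuffled = list(x)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if pseudo_f(dist, shuffled) >= f_obs:
            hits += 1
    return f_obs, (hits + 1) / (n_perm + 1)


# Toy data: 12 samples whose 2-D "community profile" drifts with BMI.
rng = random.Random(7)
bmi = [18.0 + 1.5 * i for i in range(12)]
profiles = [(0.1 * b + rng.gauss(0, 0.1), -0.05 * b + rng.gauss(0, 0.1))
            for b in bmi]
dist = [[math.dist(p, q) for q in profiles] for p in profiles]

f_obs, p_value = permutation_p_value(dist, bmi)
print(f"pseudo-F = {f_obs:.1f}, permutation p = {p_value:.3f}")
```

With a continuous variable there are no group centroids; the null hypothesis in this sketch is simply that the distances are unrelated to the predictor, so shuffling the predictor among samples should produce pseudo-F values as large as the observed one reasonably often.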

Data Transform, Nominal Scale, Ordinal Scale, Interval Scale, SPSS

I am trying to complete a multiple imputation of some missing data in my dataset using SPSS. I have three continuous variables that are each missing data. However, when I attempt to conduct a multiple imputation for these variables I get the message:

Warning: There are no missing values to impute in the requested variables.

This is obviously incorrect, as I have several missing values in each of these columns, and when I ran the "analyze patterns" function it revealed missing data in these three categories. Any thoughts on how I might resolve this issue? Thanks in advance!

I'm running an exploratory study where for three weeks I get participants to report their perception of a phenomenon using a self-report Likert scale whilst wearing an array of sensors.

**Data types:**

- Ordinal Likert data (1-5), collected 15 times per day
- Skin temperature data collected once per second
- Room temperature data collected once every 3 minutes
- Room humidity data collected once every 3 minutes
- Heart rate data collected once per second

This data will be collated into a table for each participant, where a brief example of the data is shown in the attached image.

For each participant, I might expect between 50-120 complete entries like this. However, as participants are from a hard to recruit medical population, I can only realistically expect to get about 5-6 participants.

So, I'll have a reasonable number of data instances (roughly 250-720), but from few participants. I also expect the Likert data to be unbalanced and to centre around a neutral (3) rating, with the extremes of the scale being relatively rare.

**The aim** is to explore whether the Likert data (particularly the class) has any form of relationship with the sensor data, or whether it's possible to form a model of some sort. So, say, it would be great to be able to find a correlation between Likert ratings of 5 and SkinTemp. However, I would be astounded to find such a simple relationship. Instead, I imagine it will be far more likely that the Likert data will be influenced by each of the data points at any one time.

**My current thinking** is to try out PCA and maybe an LDA approach. However, as I'm relatively new to multivariate data analysis, I'm still unsure if this would be a good way to explore the data.

If I had a larger dataset, I'd be tempted to try some supervised machine learning classification algorithms (e.g. RF, AdaBoost or SVM). However, I'm cautious that such a small dataset might make this approach inappropriate.

**The question** - If you were faced with this type of data and this amount of data, what would be the predominant statistical approaches and methods you'd use to interrogate this data set?

**The cheeky extra question** - As I'm new to this, can anyone recommend any good books or tutorials which specifically deal with reasonably sized datasets from few participants? Most seem to cover medium-large datasets with 'high' N, so it's hard to know if it would be appropriate to employ the same analytic process.

For most real-life situations I have come across, the number of active effects is quite low. I need a situation where a large number (i.e. more than half) of the contrasts are active, to illustrate a method of analysing unreplicated factorial experiments.

Hello all,

I want to use Spearman's rank correlation to measure the association between two constructs, Culture and Ethics. I have coded the Likert scale data and aggregated the responses from the several questions in the questionnaires into two sets (Culture and Ethics) using the median values of the responses to each question. However, in one of the constructs, which has 7 questions, the median values are all the same number (2). Now SPSS will not perform Spearman's rank correlation for the two constructs because 'one of the variable is constant'. Was I right to use median values? What did I do wrong? How do I remedy this?

Many thanks!
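One possible reading of the problem, which is an assumption since the data are not shown, is that the medians were taken per question across respondents. That leaves only a handful of values per construct, not paired by respondent, and here they happen to be identical. The sketch below shows the alternative of computing one score per respondent for each construct and then correlating across respondents. The response data and function names are invented for illustration:

```python
from math import sqrt
from statistics import median


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(a, b):
    """Spearman's rho as the Pearson correlation of the ranks."""
    ra, rb = average_ranks(a), average_ranks(b)
    n = len(ra)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(ra, rb))
    va = sum((u - ma) ** 2 for u in ra)
    vb = sum((v - mb) ** 2 for v in rb)
    if va == 0 or vb == 0:
        raise ValueError("one variable is constant; rho is undefined")
    return cov / sqrt(va * vb)


# Hypothetical responses: one row per respondent, one column per question.
culture_items = [[1, 2, 2], [2, 3, 2], [4, 4, 5], [3, 3, 4], [5, 4, 5]]
ethics_items = [[2, 1, 2], [2, 2, 3], [3, 3, 2], [4, 4, 3], [5, 5, 4]]

# One construct score per respondent (median across that person's items).
culture_score = [median(row) for row in culture_items]
ethics_score = [median(row) for row in ethics_items]

rho = spearman(culture_score, ethics_score)
print(f"Spearman's rho across respondents = {rho:.3f}")
```

Aggregated this way, each construct varies across respondents, so the correlation is defined unless every respondent really does have the same score.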

How do I draw a dot plot for different groups and denote comparisons?

(See attached images)

If there is a procedure in SPSS or Excel, or any free, user-friendly online software, please guide me.

Thank you

I'd like to compare two software packages, SPSS and Statgraphics Centurion version 15, for principal component analysis and factor analysis. The factorability tests include the Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy and Bartlett's test of sphericity in Statgraphics Centurion version 15.

How do I get discriminant loadings in multiple discriminant analysis using SPSS?

SPSS reports canonical discriminant function coefficients. Do these refer to discriminant loadings?

What is the difference between a discriminant function coefficient and a loading?

thanks in advance

How can I calculate the Average Variance Extracted (AVE) with SPSS in SEM?

I know that it can be estimated with software such as AMOS, LISREL, ...

I want to know whether SPSS can be used to calculate the AVE.

It is essential for multivariate data analysis.
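For reference, the AVE of a construct is the mean of its squared standardized loadings, so it can be computed by hand from any loading table. A minimal sketch, with hypothetical loadings and construct names that do not come from a real model:

```python
def average_variance_extracted(loadings):
    """AVE for one construct from its standardized factor loadings."""
    if not loadings:
        raise ValueError("at least one loading is required")
    return sum(lam * lam for lam in loadings) / len(loadings)


# Hypothetical standardized loadings per construct (illustration only).
loadings_by_construct = {
    "construct_a": [0.72, 0.81, 0.68, 0.77],
    "construct_b": [0.55, 0.62, 0.70],
}

ave = {
    name: average_variance_extracted(values)
    for name, values in loadings_by_construct.items()
}
for name, value in ave.items():
    print(f"{name}: AVE = {value:.3f}")
```

The loadings need to be standardized and to come from the measurement model being reported; the same arithmetic can be done in a spreadsheet or with a computed variable.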

I want to perform an O2PLS-DA analysis of multi-omics data (from different metabolomics, lipidomics and proteomics experiments) using SIMCA 130.2. I have the data in matrix format (samples in rows with labels and variables in columns). I can perform PCA, PLS-DA and OPLS-DA, but the O2PLS-DA tab is not active. I think I have a problem with the data arrangement; however, I am not sure if it's the only problem. Any help will be highly appreciated!

Field, A. (2009). Discovering statistics using SPSS. Sage publications.

Hair, J. F., Black, W. C., Babin, B. J., Anderson, R. E., & Tatham, R. L. (1998). Multivariate data analysis (Vol. 5, No. 3, pp. 207-219). Upper Saddle River, NJ: Prentice hall.

I would like to use propensity score matching to measure the effect of treatment between the control and treated groups.

Doing it in SPSS 22 with the R plug-in is easy, but I would like to understand the output and measure the effect.

I tried the offset function, but it is quite complex. I have 60,000 rows of data from UV detection. I need to offset all of them downwards by 1 so that only the y axis of the graphed chromatogram(s) is affected. If there is a way of manipulating the figure to do this, please let me know.

The percentage deviation and multiplication did not work.

I am using Excel 2017. If you know Origin, that works too, as it is similar. I couldn't download SPSS.

End goal is attached below.

I have generated a new variable, "religiosity", from a factor analysis, the same as in the article in the attached picture. However, I do not know how to standardize the values of this variable to a 0-100 scale as in the article. I want to take the same steps, but I am lost here. Could anyone help?
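One common way to put factor scores on a 0-100 scale is min-max rescaling. Whether the article used this method is not known from the post, so the sketch below is only one candidate, with invented scores:

```python
def rescale_0_100(scores):
    """Linearly map scores so the minimum becomes 0 and the maximum 100."""
    lowest, highest = min(scores), max(scores)
    if highest == lowest:
        raise ValueError("all scores are identical; cannot rescale")
    return [100.0 * (s - lowest) / (highest - lowest) for s in scores]


# Hypothetical factor scores (e.g. regression-method scores saved from a factor analysis).
factor_scores = [-1.8, -0.4, 0.0, 0.9, 2.1]
religiosity_0_100 = rescale_0_100(factor_scores)
print([round(v, 1) for v in religiosity_0_100])
```

This transformation is linear, so it preserves the ordering and correlations of the original scores; only the units change. In SPSS the same arithmetic can be written as a computed variable once the minimum and maximum are known.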

I have interval-scale survey data. I have defined the socioeconomic status of the survey site with 5 parameters and wish to create a composite index. I do not have any prior weights for these parameters. Can I use PCA to create an index? My data satisfy the KMO and Bartlett sphericity tests, but for the results to be robust, is it important for the data to pass normality and "no non-linearity" tests as well? Some of my parameters are normal while some are not. Please guide.

Hi

When constructing a principal coordinate analysis (PCoA) to see (dis)similarity in a set of data, you have to choose between "Co-variance standardized" and "distance standardized" measures of variability. If you are using genetic distances between populations, which measure do you choose (between co-variance standardized and distance standardized) to construct your PCoA?

Thanks in advance for all answers

The project is based on following patients longitudinally at different age categories, with high variability in time of follow-up as well as missing follow-ups.

The data that we are analyzing is parametric and continuous.

We are thinking of plotting a mixed effect model, but is there a robust test to calculate a difference among the samples?

Thanks in advance for any suggestions!

Dear all,

I wonder if it is possible not only to combine different variables into one variable via "compute" or "transform" in SPSS but to add subcategories to that variable too.

For example, I have 24 items (answers either yes --> 1 or no --> 0), of which 12 items belong to subcategory 1, 8 items to subcategory 2 and again 8 items to subcategory 3. I would like to make one variable out of all 24 variables, but without "losing" my subcategory arrangement.

I hope my question and explanation are clear, and I appreciate any help!

Thanks a lot,

Anne

Hi, I am analyzing the data of this questionnaire (link below). Most of dependent variables are dichotomous/binary and independent variables are ordinal (educational level), scale (age) and binary (gender).

I know I could use a logistic regression, for example to predict the effect of independent variable on a binary response variable like "I use a tablet computer".

But I would like to create a model of latent variables (so first I need some factors), created from dependent variables, that would show something similar as a Structural equation model. For example that latent variables created from items like different internet usage, different skills and different options of internet availability (exogenous) explain about xx percent of the variance of the latent variable created from items describing e-government use (endogenous).

But maybe I am mixing apples and oranges :-)

Thank you for any suggestions

(I use SPSS and AMOS, also heard about the FACTOR software)

Hi, I conducted PCA in Stata on a set of 28 variables capturing various economy-related data. The eigenvalues are greater than 1 for 7 components. When I run the KMO test, the output just states "The correlation matrix is singular". Can I still go ahead with the results of the PCA? Is KMO a necessary condition for PCA? If yes, how can I fix the singular matrix problem?
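A singular correlation matrix typically means that at least one variable is an exact linear combination of the others (for instance a total entered together with its parts), or that there are fewer observations than variables. KMO needs the inverse of the correlation matrix, which does not exist in that case. The sketch below illustrates the first situation in plain Python with made-up numbers; the variable names `gdp`, `exports` and `total` are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equally long lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(columns):
    return [[pearson(ci, cj) for cj in columns] for ci in columns]

def determinant(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n = len(a)
    det = 1.0
    for k in range(n):
        pivot = max(range(k, n), key=lambda r: abs(a[r][k]))
        if abs(a[pivot][k]) < 1e-12:
            return 0.0          # numerically singular
        if pivot != k:
            a[k], a[pivot] = a[pivot], a[k]
            det = -det
        det *= a[k][k]
        for r in range(k + 1, n):
            f = a[r][k] / a[k][k]
            for c in range(k, n):
                a[r][c] -= f * a[k][c]
    return det

# Made-up data: 'total' is an exact linear combination of the other two.
gdp = [1.0, 2.0, 4.0, 3.0, 5.0]
exports = [2.0, 1.0, 3.0, 5.0, 4.0]
total = [g + e for g, e in zip(gdp, exports)]

det_full = determinant(correlation_matrix([gdp, exports, total]))
det_reduced = determinant(correlation_matrix([gdp, exports]))
print(det_full, det_reduced)
```

With the redundant variable included the determinant is (numerically) zero; dropping it gives an invertible matrix again. Looking for such redundant variables among the 28 is one plausible first step before rerunning KMO.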

Hi,

I have conducted a study to see if a training intervention has an impact on delegates' perceptions of the supportiveness of 3 different behavioural factors (i.e. I have 3 DVs, each measured at 3 different time points). I have assessed delegates before training (time 1), immediately after training (time 2) and 6 months later (time 3). Currently, I have run three separate one-way repeated measures ANOVAs to test this, one for each behavioural category DV, and have used the Bonferroni-corrected post-hoc test to see where there are significant differences between the three time points.

Is this correct, or is there some way of running a repeated measures MANOVA?

I also have data from a control group at time 1 and time 2 (but not time 3). Therefore, I have run three mixed ANOVAs (one for each behavioural category DV), where the group (intervention or control) is the between-subjects factor and the time (time 1 or time 2) is the within-subjects factor. As only 2 time points are assessed, no post-hoc tests are conducted. Is this approach correct, or should I be using a mixed MANOVA, as I again have multiple (related) DVs? Can you perform a mixed MANOVA when your intervention and control groups are very different in size (i.e. N=1283 vs. N=40)?

I know that MANOVA is good for multiple dependent variables, but I am struggling to find many tutorials that use a similar study design to mine?

Finally, when using the Bonferroni-corrected post-hoc test, I know that this accounts for multiple comparisons within one test, but as I have run several separate ANOVAs, should I be controlling for this too somehow? (In theory, couldn't all the separate ANOVAs I've run be considered part of an overall 'family' of tests and therefore increase the likelihood of a type 1 error?)

Thanks in advance
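On the last point: if the separate ANOVAs are treated as one family, one option is to adjust their omnibus p-values together, for example with the Holm step-down procedure, which controls the family-wise error rate and is never less powerful than a plain Bonferroni correction. A minimal sketch in plain Python; the three p-values are made up for illustration and are not taken from the study described above.

```python
def holm_adjust(p_values):
    """Return Holm-adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity so adjusted p-values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted

p_raw = [0.010, 0.040, 0.030]   # hypothetical p-values from three separate ANOVAs
p_holm = holm_adjust(p_raw)
print(p_holm)
```

Each adjusted value is then compared against the usual alpha (e.g. 0.05) instead of dividing alpha by the number of tests.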

I wrote an algorithm based on a GMM. After a few iterations the following problem arises: the density values for some observations are zero, and the whole algorithm breaks down.

For example, for observation i=67203, the density value in every component is exactly zero (not even a small number).

With this code, I calculated the density of all components for this observation, and all of them came out as zero.

    for l = 1:9
        % density of observation 67203 under component l
        pr(l) = mvnpdf(y(67203,:), mu(l,:), sigma(:,:,l));
    end

Hence, the posterior probabilities, and then the parameters (mu and the sigma matrices), become NaN.

Is there a way to prevent these zero values for the densities and NaN values for the posterior probabilities?
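A common remedy for this kind of underflow is to work with log-densities and normalise the posteriors with the log-sum-exp trick, so that the division 0/0 never happens. The sketch below is plain Python rather than MATLAB, written only to show the idea; the helper names (`log_mvn_pdf`, `responsibilities`) and the toy two-component mixture are assumptions, not part of the original code.

```python
import math

def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L

def log_mvn_pdf(x, mu, sigma):
    """Log of the multivariate normal density, computed without exponentiating."""
    d = len(x)
    L = cholesky(sigma)
    # Solve L z = (x - mu) by forward substitution; z'z is the squared Mahalanobis distance.
    z = []
    for i in range(d):
        s = sum(L[i][k] * z[k] for k in range(i))
        z.append((x[i] - mu[i] - s) / L[i][i])
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(d))
    maha = sum(v * v for v in z)
    return -0.5 * (d * math.log(2.0 * math.pi) + log_det + maha)

def responsibilities(x, weights, mus, sigmas):
    """Posterior component probabilities for x via log-sum-exp."""
    logs = [math.log(w) + log_mvn_pdf(x, m, s)
            for w, m, s in zip(weights, mus, sigmas)]
    top = max(logs)                      # subtract the maximum before exponentiating
    log_norm = top + math.log(sum(math.exp(v - top) for v in logs))
    return [math.exp(v - log_norm) for v in logs]

# Toy example: a point so far from both components that every density underflows to 0.
x = [100.0, 100.0]
weights = [0.5, 0.5]
mus = [[0.0, 0.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
sigmas = [identity, identity]

resp = responsibilities(x, weights, mus, sigmas)
print(resp)
```

Here the plain densities are exactly zero in double precision, yet the posteriors are finite and sum to one. If a covariance matrix itself collapses during the iterations, adding a small constant to its diagonal is another frequently used safeguard.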

I am looking for suggestions for analyses that can compare different taxa in terms of the relative difference in composition among sites.

I have 4 parallel datasets of species abundance data from 4 different taxa sampled in the same sites (n=12).

Each site was sampled between 4 - 10 times. Usually (not always) sampling was done at the same time for all taxa within a site, but not all sites were sampled at the same time so the data are unbalanced.

I can create balanced subsets if needed but this would severely truncate the data.

I've heard of co-correspondence analysis, co-inertia analysis, and possibly multiple factor analysis as potential candidates for doing this type of comparison, but I'm not sure about the differences or which is most appropriate.

Are there pros and cons/restrictions/assumptions for each of these?

Is there an alternative method that I have not mentioned that would be better?

Also, what do these analyses allow me to test exactly? Is their intention to be able to say, for example, that taxa A and B had high correlation in terms of variation in composition across sites, while taxon C showed low correlation with any other taxa, etc.?

Thanks

Tania

Hi folks,

I want to survey employees in three waves and need to link the three time points to each individual. Furthermore, I need to survey one leader of every employee at each time point for causal inferences.

Do you have any suggestions for software which is easy to use for this purpose?

Thank you

Is there any source for the acceptable size of smallest cluster and threshold of ratios of sizes in cluster analysis output?

Thank you for your help.

I want to show the distribution of species along a transect line. Can anyone guide me on how to draw it?

Thanks in advance for your time.

Suppose I want to compare two independent groups X1 and X2 with respect to one latent variable Z (comprising 3 indicators Z1, Z2, and Z3) in SPSS. Is it possible to use Z as the test variable instead of using Z1, Z2, and Z3 separately?

I measured ad liking with two repeated-measures samples: participants in each sample saw 7 different ads, 14 in total. I had to do two waves because otherwise the study would have been too long. Participants were different in the first and second wave. I would like to compare the ad liking scores of all 14 items, but I don't know which statistical test to use: if I compare only one wave, it would be a one-way repeated-measures ANOVA, but once I add the second wave, where the participants are different, it is no longer a repeated-measures design but a combination of repeated measures and independent samples. Does anyone have an idea? I use SPSS.

I have a fairly large dataset including data from different years. I would like to create one variable that combines the data from 4 different variables. Each of these variables takes the value 1 or 5 (or -2, -1 for missing). Some of the variables overlap and hold the same value, which is why I can't just use the function "compute variable - sum".
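One way to read this is that the new variable should hold the first valid value found across the four variables, skipping the missing codes. That rule is an assumption, since the question does not spell out how overlaps should be resolved. A minimal sketch in plain Python (not SPSS syntax) with made-up rows:

```python
MISSING_CODES = {-2, -1}   # missing-value codes mentioned in the question

def combine(values):
    """Return the first non-missing value across the variables, or None if all are missing."""
    for v in values:
        if v not in MISSING_CODES:
            return v
    return None

# Each tuple is one hypothetical case with its four yearly variables.
rows = [
    (1, -1, -2, 1),     # two variables overlap with the same valid value
    (-2, 5, 5, -1),
    (-1, -2, -1, -2),   # missing everywhere
]
combined = [combine(r) for r in rows]
print(combined)
```

If two variables can hold different valid values for the same case, a rule for resolving the conflict would have to be added; this sketch simply takes the first one.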

I have measured task performance (**Performance**) of 48 people under two different conditions (**Condition 1** and **Condition 2**). The time taken by participants to complete the task (**Time**) and their accuracy (**Score**) were measured in both conditions as two independent variables representing **Performance**. I have also measured participants' sensitivity to noise (**Sensitivity**). Can I use MANCOVA for this type of analysis, using **Time** and **Score** as two independent variables and **Sensitivity** as a covariate? And if so, what would a significant (p<.05) interaction between **Time*Score*Sensitivity** mean? Thanks

I am conducting an experiment with 1 dependent variable and 3 independent variables. Each IV has 2 or 3 levels, so I will have 12 treatments in total (12 experimental groups). The dependent variable is credibility, and I know that the most appropriate way of doing the analysis is a multifactorial ANOVA. For this reason I would like to know: which is the most appropriate "measurement scale" to assess the DV in terms of statistical applications (ANOVA)?

If the DV is ordinal, should I use multinomial or multi-linear logistic regression to perform the analysis?

Thanks.

I am calculating effect sizes for a literature review; however, several of the papers use neuropsychological tests, and group means are reported as standardised T-scores (i.e. mean = 50, sd = 10).

Can I calculate an effect size from these types of statistics?
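Because T-scores are a linear transformation of the raw scores, a standardised mean difference can in principle be computed from them in the usual way. The sketch below shows Cohen's d with a pooled SD when the papers report group SDs, and a fallback that divides by the normative SD of 10 when only means are available. The fallback is an assumption (it expresses the difference in normative-population units rather than sample units), and all numbers are made up.

```python
import math

def cohens_d(m1, sd1, n1, m2, sd2, n2):
    """Cohen's d using the pooled standard deviation of the two groups."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(pooled_var)

def d_from_normative_sd(m1, m2, normative_sd=10.0):
    """Fallback when group SDs are not reported: scale by the T-score normative SD."""
    return (m1 - m2) / normative_sd

# Hypothetical patient group vs. control group, both reported as T-scores.
d_pooled = cohens_d(45.0, 8.0, 30, 50.0, 8.0, 30)
d_norm = d_from_normative_sd(45.0, 50.0)
print(d_pooled, d_norm)
```

The two values differ whenever the sample SDs deviate from 10, so it is worth stating in the review which denominator was used for each paper.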