Science topic

Parametric Statistics - Science topic

Explore the latest questions and answers in Parametric Statistics, and find Parametric Statistics experts.
Questions related to Parametric Statistics
  • asked a question related to Parametric Statistics
Question
7 answers
I have six ecosystems in two substrate categories (essentially triplicates). I have determined Shannon-Wiener index values for each ecosystem and also for the two categories separately. I have done this for two separate sets of data sampled in two separate years. Is it possible to statistically compare the development of biodiversity between the categories, i.e., the development of biodiversity in ecosystem 1 between the two years, using the Shannon-Wiener values somehow? Are there any other tests that could work? I am aware of the Hutcheson t-test; however, some of my data are not normally distributed.
I would really appreciate some help!
Relevant answer
Answer
To statistically compare Shannon index values (a measure of diversity) between two years, you can use several methods depending on the data's nature and distribution. Here's a step-by-step guide:
1. Prepare Your Data
Ensure you have the Shannon index values for the two years. Your data might look something like this:
Year Shannon_Index
2022 2.3
2022 2.5
2022 2.1
2023 2.7
2023 2.8
2023 2.6
2. Check Normality
Determine if the Shannon index values follow a normal distribution. This can be done using tests such as the Shapiro-Wilk test.
3. Choose a Statistical Test
Based on the normality of the data, choose an appropriate test:
If data is normally distributed:
Use an independent t-test if the variances between the groups are equal (you can check for this using Levene's test).
Use Welch's t-test if the variances are not equal.
If data is not normally distributed:
Use a non-parametric test such as the Mann-Whitney U test (also known as the Wilcoxon rank-sum test).
4. Perform the Test
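A minimal sketch of this workflow in Python with SciPy (the arrays and the 0.05 cut-offs below are placeholders, not part of the original answer):

```python
# Hedged sketch: check normality, then pick the comparison test accordingly.
from scipy import stats

year_2022 = [2.3, 2.5, 2.1]   # hypothetical Shannon index values
year_2023 = [2.7, 2.8, 2.6]

# Shapiro-Wilk test of normality for each group
p_norm_1 = stats.shapiro(year_2022).pvalue
p_norm_2 = stats.shapiro(year_2023).pvalue

if p_norm_1 > 0.05 and p_norm_2 > 0.05:
    # Levene's test for equality of variances
    equal_var = stats.levene(year_2022, year_2023).pvalue > 0.05
    # Student's t-test if variances are equal, Welch's t-test otherwise
    result = stats.ttest_ind(year_2022, year_2023, equal_var=equal_var)
else:
    # Non-parametric fallback: Mann-Whitney U (Wilcoxon rank-sum) test
    result = stats.mannwhitneyu(year_2022, year_2023, alternative="two-sided")

print(result)
```

Note that with only three replicates per group the Shapiro-Wilk test has very little power, so the branch taken should also be justified by what you know about the data.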
  • asked a question related to Parametric Statistics
Question
5 answers
Is it very literally substituting Shannon-Wiener index values in place of species abundances?
Relevant answer
Answer
By the laws of statistics, no crime, it is allowed.
  • asked a question related to Parametric Statistics
Question
3 answers
In a causal model (such as multiple IVs and a single DV) that includes a mediator or moderator, do we have to consider that mediator or moderator when assessing the parametric assumptions, or do we ignore them and consider only the IV(s) and DV in the model?
Relevant answer
Answer
Since you are going to involve a third variable that will eventually impact your results, you need to take that third variable into account and check for normality and other assumptions before you carry out your final analysis. However, while analysing the IV and DV, if the data is not found to be normally distributed, then a mediator or moderator is less likely to help ensure normality. In such a scenario, you could simply opt for non-parametric tests.
  • asked a question related to Parametric Statistics
Question
7 answers
Hi, 
I have a general question regarding whether or not one can use Cronbach's alpha for measuring scale reliability. As far as I understand, alpha is only used for metric data, right?
Thanks 
Davit 
Relevant answer
Answer
Francois E Steffens sir, I have a question regarding reliability testing:
in my questionnaire I have interval-scale, closed-ended questions, so can I use the Cronbach's alpha test to check their reliability?
If possible, please attach an article so that I can read it.
Thank you.
  • asked a question related to Parametric Statistics
Question
3 answers
In determining the number of subjects for social research, many opinions hold that there must be more than 30 data points as a condition for applying parametric statistics. What theory underlies this?
Relevant answer
Answer
From your question, I infer that you need a theoretical underpinning / justification for a certain sample size. This cannot simply be substantiated with a threshold value (like 30), because it strongly depends on the (expected) effect size, chosen alpha, number of variables, etc. Assuming that your research is single-level, I recommend you to determine your sample size in advance, for example by using G*Power. You can find many demo-video's on YouTube that can help you to apply G*power. Good luck!
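If you prefer to script the same a-priori calculation instead of using the G*Power GUI, a sketch with Python's statsmodels follows; the effect size, alpha and power are placeholder inputs, not recommendations:

```python
from statsmodels.stats.power import TTestIndPower

# A-priori sample size for an independent-samples t-test:
# solve for n per group given an expected effect size, alpha and power.
analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=0.5,   # expected Cohen's d (assumption)
                                   alpha=0.05,
                                   power=0.80,
                                   alternative="two-sided")
print(round(n_per_group))  # ~64 per group for these illustrative inputs
```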
  • asked a question related to Parametric Statistics
Question
2 answers
I saw a comment to one of the members on ResearchGate re normalising data using normal quantile transformation, which was as follows: "… If you want to run a three-factor ANOVA model with interactions, I recommend using normal quantile transformation first, followed by a regular three-factor ANOVA on the normal scores. This is a robust ANOVA procedure…." I have a survival data set for my study in which I have 5 vitamin levels x 2 heat levels x 2 genders. The 2 dependent variables I am trying to analyse are: (1) the day I reached 50% survival, and (2) the day I reached 90% survival. I have used Kaplan-Meier analysis to generate survival curves (all good with that). I want to run a 3-way ANOVA to look for interactions between the independent variables (vitamin, gender and heat), but my data are not normally distributed even though I tried to normalise them using log10, square root, and the normal quantile transformation that was suggested. I was wondering if: (1) there is a 3-way ANOVA for nonparametric values? (2) there is another way to transform my data (other than the 3 types I tried) to normalise them so that I can use a 3-way ANOVA with parametric statistics?
Relevant answer
Answer
You may want to use a Cox Proportional Hazards model to test effects and interactions of your covariables on the hazard ratio.
You should also include "age", as this is generally a relevant confounder of survival.
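For illustration only, such a model could be sketched with the lifelines package in Python (the file and column names are assumptions, and a reasonably recent lifelines version is assumed for the formula argument):

```python
import pandas as pd
from lifelines import CoxPHFitter

# df is assumed to have one row per animal with columns:
# 'time' (days survived), 'event' (1 = died, 0 = censored),
# 'vitamin', 'heat', 'sex', 'age'
df = pd.read_csv("survival_data.csv")  # hypothetical file

cph = CoxPHFitter()
cph.fit(df, duration_col="time", event_col="event",
        formula="vitamin * heat * sex + age")  # main effects, interactions, age as confounder
cph.print_summary()
```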
  • asked a question related to Parametric Statistics
Question
8 answers
Majority of our questions were in Likert scale (from 5 very frequent to 1 never), and we use a pretest-posttest methodology. To compare the pretest and the posttest, we wanted to use paired sample t-test. However, this is a parametric test wherein the data should be normally distributed.
I have also read in the work of Norman, G. (2010) that parametric statistics can still be used with Likert data even with non-normal distributions.
What would be the best option here? Should we proceed in using the paired sample t-test, or go for Wilcoxon tests since the data is not normal? Thank you for answering in advance.
Norman, G. (2010). Likert scales, levels of measurement and the “laws” of statistics. Advances in health sciences education, 15(5), 625-632.
Relevant answer
Answer
Just go for normal. The means and standard errors will be close.
  • asked a question related to Parametric Statistics
Question
62 answers
I have large sample (just one sample with 1100 cases) and I want to test a hypothesis about comparing mean of my sample in the two groups (each group has 550 cases).
Some statisticians told me, "you can use the formal Student t-test because the data are normal, based on the Central Limit Theorem".
I'm confused: the Central Limit Theorem is about the "mean of sample means". For example, if we have data with 100,000 cases which are not normal, then we can take 100 samples; in this case, the average of the 100 sample means would be normal, and then I could use the t-test.
If my sample is large, can I use parametric statistics (or testing hypothesis test) with a non-normal distribution of the data?
Relevant answer
Answer
As a complement to this discussion and the valuable remarks made by Prof. David Eugene Booth, let me also add this quote from Wilcox [1]. It totally agrees with my experience in a field well known for generating "difficult" datasets (clinical trials -> clinical biochemistry).
1.4 The Central Limit Theorem
When working with means or least squares regression, certainly the best-known method for dealing with non-normality is to appeal to the central limit theorem. Put simply, under random sampling, if the sample size is sufficiently large, the distribution of the sample mean is approximately normal under fairly weak assumptions. A practical concern is the description sufficiently large. Just how large must n be to justify the assumption that X̄ has a normal distribution? Early studies suggested that n=40 is more than sufficient, and there was a time when even n=25 seemed to suffice. These claims were not based on wild speculations, but more recent studies have found that these early investigations overlooked two crucial aspects of the problem. The first is that early studies looking into how quickly the sampling distribution of X̄ approaches a normal distribution focused on very light-tailed distributions where the expected proportion of outliers is relatively low. In particular, a popular way of illustrating the central limit theorem was to consider the distribution of X̄ when sampling from a uniform or exponential distribution. These distributions look nothing like a normal curve, yet the distribution of X̄ based on n=40 is approximately normal, so a natural speculation is that this will continue to be the case when sampling from other non-normal distributions. But more recently it has become clear that as we move toward more heavy-tailed distributions, a larger sample size is required. The second aspect being overlooked is that when making inferences based on Student's t, the distribution of T can be influenced more by non-normality than the distribution of X̄. In particular, even if the distribution of X̄ is approximately normal based on a sample of n observations, the actual distribution of T can differ substantially from a Student's t-distribution with n−1 degrees of freedom. Even when sampling from a relatively light-tailed distribution, practical problems arise when using Student's t, as will be illustrated in Section 4.1. When sampling from heavy-tailed distributions, even n=300 might not suffice when computing a 0.95 confidence interval via Student's t.
[1] Wilcox, R. (2012). Introduction to Robust Estimation and Hypothesis Testing. doi:10.1016/C2010-0-67044-1.
  • asked a question related to Parametric Statistics
Question
4 answers
Would you please share your kind opinion regarding this issue?
Relevant answer
Answer
You can calculate the mean of any set of numbers. e.g., 1,2,3, (1+2+3)/3 = 2. Being non-normal is different than wanting to estimate a parameter. If you mean something different please specify this.
  • asked a question related to Parametric Statistics
Question
7 answers
Hi,
We received a statistical reviewer comments on our manuscript and one of the comments goes as follows: '... Note that common tests of normality are not powered to detect departures from normality when n is small (eg n<6) and in these cases normality should be support by external information (eg from larger samples sizes in the literature) or non-parametric tests should be used.'
This is basically the same as saying that 'parametric tests cannot be used when n<6', at least without the use of some matching external data which would permit accurate assumption of data distribution (of course in real life such datasets do not exist). And this just doesn't seem right. t-test and ANOVA can be used with small sample sizes as long as they satisfy test assumptions, which according to the reviewer cannot be accurately assumed and thus cannot be used...
I see two possible ways of addressing this:
  1. Argue that parametric tests are applicable and that normality can be assumed using residual plots, tests of homogeneity of variance, etc. This sounds like the more difficult, risky and laborious option.
  2. Redo all the comparisons with non-parametric tests based on this one comment, which just doesn't seem right and empirically would not yield a different result. It would apply to the 15-20 comparisons presented in the paper.
Maybe someone else would have other suggestions on the correct way to address this?
For every dataset in the paper, I assess the data distribution by identifying outliers (outliers: > Q3 + 1.5×IQR or < Q1 − 1.5×IQR; extreme outliers: > Q3 + 3×IQR or < Q1 − 3×IQR), testing the normality assumption with the Shapiro-Wilk test, and visually inspecting the distribution using frequency histograms, distribution density plots and Q-Q (quantile-quantile) plots. Homogeneity of variance was tested using Levene's test.
Datasets are usually n=6 and are exploratory gene expression (qPCR) pairwise comparisons or functional in vivo and in vitro (blood pressure, nerve activity, response magnitude compared to baseline data) repeated measures data between 2-4 experimental groups.
Relevant answer
Answer
This probably does not help you, but I thought that I would have a look at the original Student (Gosset) paper of 1908, as the test was specifically designed for (very) small samples:
"if our sample be small, we have two sources of uncertainty: (1) owing to the “error of random sampling” the mean of our series
of experiments deviates more or less widely from the mean of the population,
and (2) the sample is not sufficiently large to determine what is the law of
distribution of individuals. It is usual, however, to assume a normal distribution,
because, in a very large number of cases, this gives an approximation so close
that a small sample will give no real information as to the manner in which
the population deviates from normality: since some law of distribution must
he assumed it is better to work with a curve whose area and ordinates are
tabled, and whose properties are well known. This assumption is accordingly
made in the present paper, so that its conclusions are not strictly applicable to
populations known not to be normally distributed; yet it appears probable that
the deviation from normality must be very extreme to load to serious error. " My emphasis
" Section X. Conclusions
1. A curve has been found representing the frequency distribution of stan-
dard deviations of samples drawn from a normal population.
2. A curve has been found representing the frequency distribution of the
means of the such samples, when these values are measured from the mean of
the population in terms of the standard deviation of the sample.
3. It has been shown that the curve represents the facts fairly well even
when the distribution of the population is not strictly normal." Again my emphasis.
There are several examples with a sample size below 10 in the paper.
When I used to teach this stuff (1st year geography students), I would demonstrate the Fisher Randomization and permutation test for very small numbers as the students could do this by hand and thereby see the underlying logic of the test. I would show that you could permute the data of the two variables under the null hypothesis of no difference and see how extreme a result you could get 'by chance' and then compare the observed value to this; no normality assumptions were needed in coming to some sort of judgement.
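The randomization logic described above is easy to demonstrate in code; here is a minimal two-sample permutation test in Python with invented data:

```python
import numpy as np

rng = np.random.default_rng(0)
a = np.array([4.1, 3.8, 4.5, 4.2, 3.9])   # hypothetical group A (n = 5)
b = np.array([4.9, 5.1, 4.7, 4.8, 5.3])   # hypothetical group B (n = 5)

observed = a.mean() - b.mean()
pooled = np.concatenate([a, b])

n_perm = 10000
count = 0
for _ in range(n_perm):
    perm = rng.permutation(pooled)          # reshuffle labels under H0: no difference
    diff = perm[:len(a)].mean() - perm[len(a):].mean()
    if abs(diff) >= abs(observed):          # two-sided: as or more extreme than observed
        count += 1

p_value = count / n_perm
print(observed, p_value)
```

With n = 5 per group you could even enumerate all C(10, 5) = 252 possible label assignments exactly rather than sampling them.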
  • asked a question related to Parametric Statistics
Question
5 answers
The choice of parametric statistical methods for the analysis of non-parametric data is often criticized due to lack of fulfillment of assumptions, for example homogeneity of variance and normality of distribution. Some literature says that one can use parametric statistical procedures when the sample size is 'large enough'. What is this size which is 'large enough'? I will appreciate your input. Please provide relevant journal articles. Thank you for reading this question.
Relevant answer
Answer
In my understanding, 'large enough' usually means above 30. Then there will not be much difference in the results whether you do parametric or non-parametric tests. Before doing the analysis you can check the kurtosis value.
  • asked a question related to Parametric Statistics
Question
4 answers
I am going to examine the effectiveness of an intervention study. My advisor wants me to use parametric tests when testing the effectiveness of the intervention. Based on the central limit theorem, I would need to assign more than 30 people to the intervention and control groups to be able to use parametric tests. In these circumstances, I would have to lead at least three intervention groups. Can I take 10 people instead of 30 in the experimental group and bootstrap the data in order to analyze it with parametric tests?
Relevant answer
Answer
I'm not 100% clear, but it seems that you may have misunderstood the central limit theorem. There is no specific value of n which ensures the central limit theorem holds (in those cases where it applies). The rate at which the sampling distribution of the mean converges on the normal distribution depends on the skew, kurtosis and shape of the distribution being sampled. If it is unimodal, with mild skew and without heavy tails, it may be fine to use parametric tests even with relatively small n. Likewise, the sampling distribution of the mean can be very non-normal even with huge n if the skew is severe, the distribution multimodal and the tails heavy.
Bootstrapping can be helpful in these cases, but it isn't great with very small n. One useful check is to compare results from parametric approaches with more robust approaches such as rank-based tests (Mann-Whitney or Wilcoxon, for instance). If there is a big discrepancy, it suggests that the violation of assumptions is quite severe and the more conservative/robust procedure should be preferred.
Finally, the distributional assumptions apply to the residuals of your model, so it's usually best to look at the residual diagnostics rather than the raw data.
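To illustrate the bootstrap idea mentioned above (keeping in mind the caveat about very small n), a percentile bootstrap for a difference in group means might be sketched like this in Python; the data are invented:

```python
import numpy as np

rng = np.random.default_rng(42)
treatment = np.array([12.1, 9.8, 14.2, 11.5, 10.9, 13.0, 12.7, 9.5, 11.1, 10.4])  # hypothetical, n = 10
control   = np.array([ 9.9, 8.7, 10.2, 11.0,  9.1,  8.8, 10.5,  9.6,  9.3, 10.0])

n_boot = 5000
diffs = np.empty(n_boot)
for i in range(n_boot):
    # resample each group with replacement and recompute the mean difference
    t = rng.choice(treatment, size=len(treatment), replace=True)
    c = rng.choice(control, size=len(control), replace=True)
    diffs[i] = t.mean() - c.mean()

ci_low, ci_high = np.percentile(diffs, [2.5, 97.5])  # 95% percentile interval
print(treatment.mean() - control.mean(), (ci_low, ci_high))
```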
  • asked a question related to Parametric Statistics
Question
3 answers
I have two datasets of equal size and characteristics, I'm looking for metrics to compare these data. I'm using the Python language associated with Pandas and Sklearn.
  • asked a question related to Parametric Statistics
Question
13 answers
In general terms, Bayesian estimation provides better results than MLE. Is there any situation where Maximum Likelihood Estimation (MLE) methods give better results than Bayesian estimation methods?
Relevant answer
Answer
I think that your answer may vary depending on what you consider as better results. In your case, I will assume that you are referring to better results in terms of smaller bias and mean square error. As stated above, if you have poor knowledge and assume a prior that is very far from the true value, the MLE may return better results. In terms of Bias, if you work hard you can remove the Bias of the MLE using formal rules and you will get better results in terms of Bias and MSE. But if you would like to look at as point estimation, the MLE can be seen as the MAP when you assume a uniform distribution.
On the other hand, the question is much more profound in terms of treating your parameter as a random variable and including uncertainty in your inference. This kind of approach may assist you during the construction of the model, especially if you have a complex structure, for instance, hierarchical models (with many levels) are handled much easier under the Bayesian approach.
  • asked a question related to Parametric Statistics
Question
27 answers
Hello, I'm struggling to find out which non-parametric test I need to use to compare the VO2 Max scores between physically active women and physically inactive men.
Aka, gender and physical activity are two independent(?) variables. I need a non-parametric test because I ran the data through normality tests and it says it's not normally distributed. I've looked at a Mann-Whitney U test but I don't think it's appropriate as it only lets me select one grouping variable?
Sorry I'm still really new at all this, any help would be appreciated.
Relevant answer
Answer
Simeon Stoynov, if a new variable combining gender and activity level is created, then we will have four groups, namely active men, inactive men, active women and inactive women. Afterwards, the four groups can be compared with the Kruskal-Wallis test.
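A minimal sketch of that comparison in Python with SciPy (the VO2 max values are placeholders):

```python
from scipy.stats import kruskal

# Hypothetical VO2 max scores for the four gender-by-activity groups
active_men     = [48, 52, 50, 47, 55]
inactive_men   = [38, 35, 40, 36, 37]
active_women   = [44, 46, 43, 47, 45]
inactive_women = [33, 31, 35, 30, 34]

stat, p = kruskal(active_men, inactive_men, active_women, inactive_women)
print(stat, p)  # if p is small, follow up with pairwise Mann-Whitney tests (with a multiplicity correction)
```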
  • asked a question related to Parametric Statistics
Question
3 answers
Hello,
I have data (size of neuronal soma area for two different mouse genotypes) which are lognormally distributed.
In order to be able to use parametric statistical analyses (t-test, ANOVA, ...) I transformed my data with the log() function. Now, perfect: the log(data) follow a normal distribution, so I can run my stats.
My problem is now how do I make figure with it?
Should I represent my histogram with the mean of the data, or of the log(data)? How do I calculate the percentage change between my two groups (based on data or log(data))? etc
Because if I represent the log(data) on my graph, the scale should be a logarithmic one, and that is not the most intuitive way to grasp the difference.
But am I authorized to represent the mean and sem of my data, if the statistical test was done on their log()?
Thank you for your help, my coming figures (and my mental health) will gain from it!
Relevant answer
Answer
There is no strict rule or law that you have to present data exactly the way they are analyzed. For some more sophisticated models this is not even possible.
My advice is to show the data using a scatterplot with a logarithmically scaled Y-axis.
You can indicate the means or the medians by a horizontal line (just write in the legend what the line means), but in my opinion it's sufficient to just show the points.
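A sketch of such a figure with matplotlib, assuming two arrays of untransformed soma areas, one per genotype (the group names and simulated values are illustrative only):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
wt  = rng.lognormal(mean=4.0, sigma=0.4, size=80)   # hypothetical wild-type soma areas
mut = rng.lognormal(mean=4.3, sigma=0.4, size=80)   # hypothetical mutant soma areas

fig, ax = plt.subplots()
for i, (label, values) in enumerate([("WT", wt), ("Mutant", mut)]):
    x = np.full(values.size, float(i)) + rng.uniform(-0.1, 0.1, values.size)  # jitter for visibility
    ax.scatter(x, values, alpha=0.5, label=label)
    ax.hlines(np.median(values), i - 0.2, i + 0.2, color="black")             # median marker

ax.set_yscale("log")                      # raw data shown on a log-scaled axis
ax.set_xticks([0, 1])
ax.set_xticklabels(["WT", "Mutant"])
ax.set_ylabel("Soma area (µm², log scale)")
plt.show()
```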
  • asked a question related to Parametric Statistics
Question
5 answers
Hi all,
Just a quick question. Two of my groups, when compared, do not demonstrate normal distribution (ND), so in order to compare group differences I chose a sign test (they previously failed the assumptions of ND, so I cannot run a paired t-test or the non-parametric Wilcoxon signed-rank test). The other two groups I am looking to compare do show normal distribution (as well as fulfilling the other assumptions), warranting use of the paired t-test. My question is: to what extent can I comment on the results of both tests (as the t-test considers means, whereas the sign test considers medians) in relation to each other?
Thanks in advance!
Relevant answer
Answer
The distribution of the data is entirely irrelevant for the paired t-test. The *differences* should be roughly normally distributed.
The two tests are about different hypotheses, and note that, for instance, the very same data may demonstrate a positive median shift and at the same time a negative mean shift (or the other way around). How would you interpret this?
I would use the test addressing the hypothesis that I want to test. If there is no easy-peasy ready-made test, you might consider a bootstrap approach.
  • asked a question related to Parametric Statistics
Question
8 answers
Hello,
I'm currently analyzing my survey data for my master's thesis and my advisor suggested doing a one-sided t-test. However, I'm not quite sure if I can really do this or if I should use a test of proportion or a binomial probability test.
I had questions with three options (only one could be chosen). Let's say A, B, C. I generated a new variable in Stata, called x. For each subject, let x=0 if he/she answered C. Furthermore, let x=1 if the subject answered A or B. Thus, x is a binary variable, with the mean x_m telling us how many percentage of subjects in the sample did not choose C.
He then said: "Now, we would like to test whether the population mean of x (say, mu_x) is larger than 0.5 (50%). By construction of x, we can employ a one-sided t-test for this (Note that x satisfy the distributional assumptions; if we could keep drawing samples from the population, then x_m would follow a normal distribution with mean mu_x). Thus, you can simply employ a t-test to test whether the mean is significantly larger than 0.5."
Can I really use a one-sided t-test here? How is this data normally distributed?
I'm not sure about the non-parametric options that I could use instead...
What I have found is a one-sample test of proportion; in Stata I would then type "prtest x == .5". Another option would be the binomial probability test, "bitest x == .5".
I'm quite new to these statistical tests and have never done it. Thank you so much for any help!!
Relevant answer
Answer
Slight digression. The main reason not to use the binomial test is if you want an interval estimate. It turns out that the intervals created from the binomial test don't have particularly good properties in small samples. This creates the additional problem that the inference from the CI and the test don't match. In these cases it's possible that a confidence interval from a t-test might have better coverage, for example. A very simple modification of the normal approximation is known to work quite well in practice: add two successes and two failures before calculating the normal approximation (usually attributed to Agresti and Coull).
There is another reason not to use the binomial test, which is if you want to include prior information. It is very easy to conduct Bayesian inference with simple proportions like these. It turns out that the add-two-successes-and-two-failures CI I mentioned is more or less equivalent to a Bayesian posterior probability interval with a beta(2,2) prior. This is a reasonable prior if you expect a proportion to be somewhere in the middle of the range rather than close to 0 or 1.
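For reference, both the exact binomial test and the Agresti-Coull ("add two successes and two failures") interval are readily available in Python; a sketch with made-up counts:

```python
from scipy.stats import binomtest
from statsmodels.stats.proportion import proportion_confint

k, n = 62, 100            # hypothetical: 62 of 100 respondents did not choose C

# Exact one-sided binomial test of H0: p = 0.5 against H1: p > 0.5
test = binomtest(k, n, p=0.5, alternative="greater")
print(test.pvalue)

# Agresti-Coull interval (the "add two successes and two failures" adjustment)
low, high = proportion_confint(k, n, alpha=0.05, method="agresti_coull")
print(low, high)
```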
  • asked a question related to Parametric Statistics
Question
6 answers
Why is the t-test the only parametric statistical method that can be used with small samples (minimum 10 persons)?
Relevant answer
Answer
Parametric tests are based on distribution models that can be expressed as a parametric formula. In this sense, all tests based on the binomial, Poisson, exponential, gamma, beta, normal, Weibull, Cauchy, beta-binomial, negative binomial, gamma-Poisson, inverse Gaussian, geometric, Gompertz, Gumbel, and so on, are parametric.
For any given distribution model it is possible to derive the likelihood ratio statistic. The problem is usually to identify the probability distribution of that statistic (under H0). We know from Wilks' theorem that this is approximately Chi², but only in a few cases can we give the exact distribution. One of these rare cases is the normal distribution, where we know that the likelihood ratio statistic can be transformed into an F-statistic with an analytically tractable and known distribution (note that t² = F, so this applies to the t-tests).
Having an "exact distribution" of the test statistic is nice mathematically but usually is not that relevant in practice. In practice, we must make reasonable assumptions, and all our neat mathematical models are always approximations of the real world. Therefore, in practice, there is usually not much difference between using an "exact" test and an approximate test, because in any case they rely on necessarily approximate ideas of the real world (I hope I don't need to stress that our assumptions should certainly be reasonable and not in stark contrast to our observations and experiences). It is also better to use an approximate test on a reasonable distribution model (e.g. a Chi²-based test on a beta-distribution model) than an "exact" test like the F- or t-test in cases where the normality assumption is not reasonable.
  • asked a question related to Parametric Statistics
Question
17 answers
Hi. OK, we all know the well-used effect size criteria for Pearson correlation coefficients of .1 = small, .3 = medium and .5 = large. However, I've picked up over some time another set of criteria for correlations of small = .1, medium = .24 and large = .37. This is largely based on the argument that the commonly cited benchmarks for r were intended for use with the biserial correlation rather than the point-biserial, and that for a point-biserial correlation the .1, .24, .37 criteria correspond more closely to the values for d. Also, from my reading, the Pearson product-moment correlation is mathematically equivalent to the point-biserial. Therefore, could one use the .1, .24, .37 as effect size criteria for Pearson product-moment correlation coefficients? Or is there something different about the point-biserial correlation relative to Pearson's r (regardless of the mathematical equivalence) that doesn't allow those criteria to be used?
Relevant answer
Answer
And what is missing is the context. For an ecological field study, an r of 0.1 is rather large, and for a dose-response relationship in a cell-culture experiment an r of 0.9 is rather low. "One size fits all" never works.
  • asked a question related to Parametric Statistics
Question
4 answers
Hi all,
I've been reading up on parametric survival analysis, especially on accelerated failure time models, and I am having trouble wrapping my head around the family of distributions.
I am aware that there are the exponential/weibull/log-normal/log-logistic distributions. But what I can't seem to find a clear and consistent answer on is which of the following is the one that is actually assumed to follow one of those distributions? Is it the survival time T, the log survival time ln(T), the hazard function h(t), the survival function S(t), or the residuals ε?
Thanks in advance.
Relevant answer
Answer
Dear Shaun,
It would be T itself, as is, or transformed as per your interest. For example, ln T would be something that you model. The survival function would then be exponential/Weibull/etc. The hazard function has a shape derived from this survival function but is not the function itself. Residuals are something else entirely: they tell you how well you have empirically described the data using your assumptions about S(t). The distribution of the residuals can be discussed in its own right, but this is not part of the initial choice of S(t); it is part of the discussion of how your choice of model has described your data.
Hope this helps.
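As a concrete illustration (not taken from the question), an accelerated failure time model in which T is assumed to be Weibull can be fitted with the lifelines package; the file and column names are assumptions:

```python
import pandas as pd
from lifelines import WeibullAFTFitter

# df assumed to contain 'T' (time), 'E' (event indicator) and covariates
df = pd.read_csv("aft_data.csv")  # hypothetical file

aft = WeibullAFTFitter()
aft.fit(df, duration_col="T", event_col="E")
aft.print_summary()   # coefficients act multiplicatively on survival time (log-time scale)
```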
  • asked a question related to Parametric Statistics
Question
4 answers
I am writing a non-parametric/parametric statistical analysis paper on three Independent data sets. (Human Development Index, Gini Index, US Aid) for 10 countries, annually over the last 10 years. I want to find out whether the Gini index can be described as a predictor for the country's Human Development, and whether US Aid impacts this.
I want to know which tests I should conduct to find an inference for my data.
Relevant answer
Answer
Taking into account that you have only 10 observations for each variable, I suggest analyzing the variables in pairs, i.e. Human Development Index vs. Gini index, and Human Development Index vs. US Aid. You can try to use the exact test for the correlation coefficient (see for example https://documentation.statsoft.com/STATISTICAHelp.aspx?path=Power/PowerAnalysis/Examples/Example9ExactTestsandConfidenceIntervalsfortheCorrelationCoefficient).
Of course, using the exact tests requires some assumptions concerning the distribution of the observable random variables, but some of them do not need the assumption of normality.
  • asked a question related to Parametric Statistics
Question
10 answers
It's a weighted least-squares polynomial regression, so it's based on assuming normal errors, and the normal probability model is parametric. However, in some statistics books and online statistics resources lo(w)ess is often termed a non-parametric method, and in some software (e.g. in Stata), lo(w)ess is found under "non-parametric analyses". I think this is wrong, as "(non-)parametric" refers to the probability model, not to the functional model (describing the relationship between the expectation of the response and the predictors). What do the experts say?
  • asked a question related to Parametric Statistics
Question
4 answers
I'm wondering what test to use.
I have 9 IVs and 1 repeated DV (measured twice). The 9 IVs measure different aspects of identity, and include 2 composite scales and 7 subscales between them.
Initially, I had intended to median split each IV and conduct 2x2 mixed ANOVAs, i.e. High vs. Low IV on the DV. The issue is the inflation of Type 1 error (I hadn't factored that in when designing the experiment).
Is there another test that could include all IVs that wouldn't inflate the error rate?
Relevant answer
Answer
As David Morse and John Morris have noted, some kind of regression model will give you complete control over what interactions are included in the model. You'll need to use a multilevel model or generalized estimating equations (GEE) to account for the repeated measures. What software are you using?
You mentioned median splits. That is usually a bad idea. Here are some good articles and notes on that topic. HTH.
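A minimal sketch of the multilevel-model route in Python with statsmodels, assuming long-format data with one row per subject per measurement occasion (all column names are hypothetical):

```python
import pandas as pd
import statsmodels.formula.api as smf

# Long format: one row per subject per time point
df = pd.read_csv("long_data.csv")   # hypothetical; columns: subject, time, identity_score, dv

# A random intercept per subject accounts for the repeated measurements;
# identity_score is kept continuous instead of being median-split.
model = smf.mixedlm("dv ~ identity_score * time", data=df, groups=df["subject"])
result = model.fit()
print(result.summary())
```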
  • asked a question related to Parametric Statistics
Question
4 answers
Dear to whom it may concern,
I would like to ask people who are interested in univariate analysis in metabolomics. I am currently processing my metabolomics data using univariate analysis, namely p-values and FDR-adjusted p-values.
However, as far as I know, the calculation of a p-value for each feature depends on two factors: (a) the distribution of each feature and (b) the variance of each feature between the case and control groups. To be more specific, the first step is to apply a statistical tool (I do not know which tool can help me check this) to determine whether an examined feature is normally distributed in both groups or in only one of them, and of course there are two scenarios:
1. If the feature is normally distributed in both groups, we proceed to use an F-test as a parametric test to check whether the variance of this feature is equal or unequal between the groups. If it is equal, we can do a t-test assuming equal variances; otherwise, a t-test with unequal variances must be used.
2. If not, a non-parametric test will be applied to obtain a p-value for this feature. In this case, could you please show me which tests are considered non-parametric?
I am unsure whether what I describe above is right because I am a beginner in metabolomics. If this procedure is right, that means each feature will be processed step by step in this way to obtain a p-value, because the features differ in distribution and variance between the groups (case and control).
I hope that you may spend a little time correcting my idea and give me some suggestions in this promising field.
Thank you so much.
Pham Quynh Khoa.
Relevant answer
Answer
Hello. First of all, what is the sample size in each group? In general, I suggest testing non-parametrically if n < 6. For such small sample sizes it is not really defensible to assume parametric conditions, independently of test results. As a non-parametric alternative to the t-test, I suggest the U-test (Mann-Whitney test). Best regards, Max
  • asked a question related to Parametric Statistics
Question
3 answers
Hi there,
I am carrying out a study on factors affecting electric vehicle adoption. I am using a Likert scale format (from strongly disagree to strongly agree) for all of the predictor variables, such as price or range anxiety, and then a continuous dependent variable (the likelihood they believe they will adopt EVs).
I would assume to treat it strictly as ordinal and non-parametric but I have seen some sources say you can use parametric tests if your DV is continuous.
I have to bear in mind that I am doing a second regression analysis where I add demographic variables on top of the predictors. Could anyone advise me on what is preferred in statistics in my case?
Thank you in advance
Relevant answer
Answer
Hello Thomas,
Are you using individual item responses as your variables? If so, it is safer to treat them as ordinally-scaled values rather than interval-scaled values. If, however, you're combining multiple, related item responses into a (summated) total score, then many folks would treat this as a pseudo-continuous variable. Indeed, that is the approach that Rensis Likert described in his 1932 monograph.
You could certainly use an ordinal regression approach (with individual item responses as variables).
Good luck with your work.
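If you take the ordinal route in Python, statsmodels provides an ordered (proportional-odds) model; a sketch under the assumption that the outcome is a single ordinal item (the data file and column names are invented):

```python
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

df = pd.read_csv("ev_survey.csv")      # hypothetical survey data
y = df["adoption_likelihood"]          # ordinal outcome (e.g. 1-7)
X = df[["price_concern", "range_anxiety", "age", "income"]]

model = OrderedModel(y, X, distr="logit")   # proportional-odds logistic model
result = model.fit(method="bfgs")
print(result.summary())
```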
  • asked a question related to Parametric Statistics
Question
1 answer
Hello,
I have the challenging issue that many meta-analyses have, since my data are at advertisement level instead of user level. This means that my sample size is small (11), but each observation is based on over a million impressions. The data are averaged over all of these impressions. I have tried non-parametric analysis, like Spearman's rho, but lately I have been trying to weight my cases.
When the cases are weighted, the sample size suddenly increases to 30K. When running analyses, should I use non-parametric statistics since my data originally are not normally distributed, or can I use parametric statistics since my sample size is large enough?
I would like to hear your opinions.
Thanks in advance
Relevant answer
Answer
Syl -
I do not understand just what is your application, but two thoughts first occurred to me:
1) If some data points are clusters of others, and you only need to predict at the higher level, then that sounds like you would have a strong case for heteroscedasticity (and thus, WLS regression) as noted on page 111 in a book by Ken Brewer, as considered in a project of mine: https://www.researchgate.net/project/OLS-Regression-Should-Not-Be-a-Default-for-WLS-Regression
2) You asked about 'normality' regarding your data, but it is the estimated residuals (or the random factors of the estimated residuals in WLS) which one might want near enough to 'normal' with a sample size large enough ... not the y or x data. 
Regarding sample size, that is a consideration, but it also sounds like you have to consider what points may or may not be independent of other points, or how the data may have been collected.  You might want to look into cluster sampling, possibly single-stage cluster sampling, to see if that is what you have going on.
But then, I did not really follow just what you are doing.
Best wishes - Jim
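A sketch of the WLS idea Jim describes, in Python with statsmodels, under the assumption that each of the 11 rows is an ad-level average and that the impression count is a sensible weight (all file and column names are illustrative):

```python
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("ads.csv")                        # hypothetical: 11 rows, one per advertisement
X = sm.add_constant(df[["spend", "video_length"]])  # illustrative numeric predictors
y = df["click_rate"]

# An average based on more impressions has a smaller variance,
# so weight each observation by its number of impressions.
wls = sm.WLS(y, X, weights=df["impressions"]).fit()
print(wls.summary())
```

The design choice here is simply that the weights are proportional to the inverse of each observation's variance; for an average of many impressions that variance shrinks roughly in proportion to the impression count.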
  • asked a question related to Parametric Statistics
Question
3 answers
What are the main tests of normality to run before applying parametric statistics?
If the data size is more than 200, 500 or 1000 participants, can we skip these tests, or is it mandatory to apply a test of normality?
Relevant answer
Answer
You can run these tests just by putting the data into any of the statistical software package such as SPSS or Minitab etc. I recommend you to do this with Minitab because I feel that it is more user friendly for beginners (if you are a beginner in using statistical software(s)).
In general, the rejection of the null hypothesis (which in this case is: the data set under study follows the normal distribution) is based on the test statistic value. Many researchers use the p-value as well (although there is still a debate going on about reporting it in studies). A p-value less than or equal to the chosen level of significance refutes the claim of normality. The most commonly used levels of significance are 1%, 5% or 10% (this may vary subject to the nature of the study).
Best of luck!
  • asked a question related to Parametric Statistics
Question
3 answers
Any advice would be welcome regarding where I can look to formulate my argument regarding this point. I know there is debate around Likert scales being used in parametric testing; is it likely that this belief is the rationale for the use of parametric tests in this scenario? Thank you in advance.
Relevant answer
Answer
Yes- Absolutely.
  • asked a question related to Parametric Statistics
Question
18 answers
Suppose that my sample size is greater than 30. However, it does not follow a normal distribution. Should I use a parametric statistical test or a non-parametric statistical test?
Relevant answer
Answer
It depends on what you want to know. In general, if you know the sampling distribution of the test statistic you are in parametric territory. If you don't, it's nonparametric. See the attached file. Best regards, David Booth
  • asked a question related to Parametric Statistics
Question
2 answers
Definition and examples, please.
Relevant answer
Answer
Generally the purpose of parametric studies is to evaluate the sensitivity of the analysis to different parameters. This can be carried out by different methods.
  • asked a question related to Parametric Statistics
Question
3 answers
If I have metric and non-normal data (Shapiro-Wilk test p < .001), can I use parametric statistics with a relatively large dataset (N = 144)?
Relevant answer
Answer
Rajib Sarkar, you may want to assess the data graphically for normality. Notice that ANOVA and the t-test are resistant to moderate deviations from normality. If the data are still non-normal, try to explore for outlying and extreme values, remove them and re-assess normality. Transforming the data (log or other transformations) may also work.
  • asked a question related to Parametric Statistics
Question
9 answers
There are three groups from different professions. The data are not normal. Since ANOVA comes under the category of parametric statistics, it cannot be applied. What should I do now?
Relevant answer
Answer
The suggestion given by Debopam Ghosh may be attempted.
  • asked a question related to Parametric Statistics
Question
6 answers
In most research works, the term parametric analysis has been used for correlation between two variables, even when the variables have failed tests of normality and homogeneity. Can it still be called parametric analysis?
If the normality test and the associated parametric assumptions are not fulfilled, should a bivariate analysis be termed parametric?
Relevant answer
Answer
It sounds like your question is about when to use the terms "parametric" and "nonparametric".
The first thing I would say is that it's not necessary to use them at all. It is best to simply state which test is used (e.g. "Pearson correlation") and which assumptions of the analysis were confirmed (e.g. homoscedasticity). There can be some confusion about the terms "parametric" and "nonparametric". (See, for example: https://stats.stackexchange.com/questions/142101/why-is-pearson-parametric-and-spearman-non-parametric )
To your specific question, the terms "parametric" and "nonparametric" attach themselves to the test, not the data. So, a t-test is always parametric, no matter what data is fed in to it. It always assumes that the data follow a certain distribution, whether or not the data in fact follow this distribution.
  • asked a question related to Parametric Statistics
Question
5 answers
Some scholars, especially those with a psychology background, argue that Likert-type scales are interval scales, so that we can use parametric statistics. Other scholars argue that Likert-type scales are ordinal scales, so that we cannot use parametric statistics. I would like scholars to clarify this issue so that I have a clear understanding for application in my research endeavours. I thank you in advance for your suggestions!
Relevant answer
Answer
How to think about Likert scales - the perennial debate between Statistics and Psychology!  One perspective is that Likert scales are responses to words like “somewhat agree” that get translated into discrete numbers (e.g., 1,2,3,4,5,6,7).  But how can we know that the “distance” between 2 and 3 is the same as the “distance” between 6 and 7?  I do think they make a good point, especially for just a single Likert scale item with a small number of choices.
From the perspective of Psychology, I feel Likert scales are, first and foremost, an estimate of an underlying attitude.  Attitude is a theoretical construct about our inclination toward a target (e.g., chocolate ice cream, democracy) with a valence along a continuum (i.e., negative to positive).  A Likert scale item is a coarse estimation of the true value of the attitude. But we rarely use a single item.  Average many Likert scale items that have high reliability (i.e., alpha).  Then your estimate of the attitude is getting less coarse, and a more precise approximation of the true underlying attitude.  As such, it seems reasonable to treat the score as a continuous variable.  Hope this helps your thinking, Damtew! ~ Kevin
  • asked a question related to Parametric Statistics
Question
4 answers
The common classification of statistics divides it into parametric and nonparametric statistics. In the simplest terms, parametric statistics are used to test hypotheses about quantitative variables. Because quantitative variables are summarized by means and standard deviations, parametric tests carry assumptions that include a normal distribution in the population: in the absence of a normal distribution, the mean and standard deviation are not an adequate representation of the data.
Nonparametric statistics are used to test qualitative and rank (ordinal) variables. These tests, also referred to as "assumption-free" or distribution-free tests, do not require any particular distributional assumptions.
Regarding the conversion of variables, it should be recalled that quantitative variables can be converted into qualitative variables and evaluated with nonparametric tests, but the opposite is not possible.
It is worth noting that parametric statistical tests are more powerful than nonparametric ones, and it is usually suggested that nonparametric tests should not be used if parametric tests are possible. It should also be noted that many behavioral science variables are assessed with nonparametric tests.
As you know, a random variable may be measured on one of four measurement scales: nominal, ordinal, interval, and ratio. A statistical method is said to be nonparametric when at least one of the following conditions holds:
1. The data have a nominal scale.
2. The data have an ordinal scale.
3. The data have an interval or ratio scale, but the population distribution function of the random variables from which the data are obtained is not known.
Advantages of using non-parametric methods:
1. Calculation of nonparametric methods is usually easy.
2. Nonparametric methods can be used for data to which parametric methods cannot be applied, i.e. when the scale of measurement is nominal or ordinal.
3. In nonparametric methods, it is not necessary to assume that the random variable of the population has a particular probability distribution. These methods are based on the sampling distribution, but it is not necessary to assume a specific form for the population probability distribution.
4. If a non-parametric method can be applied to a weak measurement scale, it can also be used with stronger scales.
Relevant answer
Answer
Hi, thank you for the answer you gave
The statistical impedance for parametric and nonparametric methods can be found if can be found. Show ?
  • asked a question related to Parametric Statistics
Question
12 answers
I opted for a non-parametric test because the data are skewed and I have varying numbers of observations in each group (90-400).
From my reading, it seems that there is no hard rule for such a situation. Some statisticians advocate the use of parametric tests, stating that the effect of the skewness will be minimised by the large sample size. Others advise going with the Wilcoxon test anyway, and say that the maximum size of 20 is just a rule of thumb.
Can anyone please advise me? Thank you very much!
Relevant answer
Answer
I basically agree with answers above, just to make the ANOVA assumptions straight:
Besides a normal distribution (of the data in each group!), you need equality of variances across groups (homoscedasticity) and independence of measurements between groups.
You can test normality with the Shapiro-Wilk test; equality of variances could be tested, for example, with Bartlett's test.
If your data violate the normality or homoscedasticity assumption, go for the Kruskal-Wallis test. There is a special method for multiple comparisons after the K-W test, which can be easily performed in R with the "kruskalmc" function from the package "pgirmess".
  • asked a question related to Parametric Statistics
Question
6 answers
I have one variable (out of six) that doesn't adhere to the assumption of homogeneity of variance for ANOVA. I don't really want to perform data transformation. Is there any way around this? I am performing 2-way ANOVA and don't know of a good non-parametric equivalent.
Relevant answer
Answer
Hi, Tessa.
First, you may want to assess the homogeneity of the data by plotting the residuals (or standardized residuals, etc.) vs. the predicted values from the model.  This has two advantages:  1) it looks at the marginal homogeneity suggested by the model, not the original data;  2) you are using your eyes to make the determination, which are better and more appropriate tools than statistical tests for this purpose.
When assessing homoscedasticity, remember that anova has some robustness for heteroscedasticity.
In cases of heterogeneity, there are a couple of options.  A common approach is to use "weighted least squares anova".  Commonly, the weight used is the reciprocal of the group variance.  I'm not too sure how you would implement this practically on a two-way anova.
Another approach is to adjust the hypothesis tests for the heteroscedasticity.  Your options might vary depending on the software you use. In R, there is the white.adjust option.  Other software might implement a Welch-Satterthwaite adjustment. These will correct the p-values of the anova for the heteroscedasticity. 
However, check that any post-hoc test you are using is also robust to heteroscedasticity.
Beyond these approaches, there are other robust approaches that may be useful.  Whether or not you use R, the following may present a useful list: https://stats.stackexchange.com/questions/91872/alternatives-to-one-way-anova-for-heteroskedastic-data
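One way to sketch the "adjust the hypothesis tests" option in Python is a two-way ANOVA table computed with heteroscedasticity-consistent (HC3) standard errors via statsmodels; the robust argument of anova_lm and the column names are assumptions to check against your statsmodels version:

```python
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

df = pd.read_csv("experiment.csv")      # hypothetical: columns 'response', 'factorA', 'factorB'

model = smf.ols("response ~ C(factorA) * C(factorB)", data=df).fit()

# The HC3 adjustment makes the F tests more robust to unequal group variances
print(anova_lm(model, typ=2, robust="hc3"))
```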
  • asked a question related to Parametric Statistics
Question
8 answers
Hi everyone, was wondering if I could drop a query to all of the statisticians out there?
I'm comparing gene expression of biomarkers between control and treated samples. Most of my data were not normally distributed, so I log-transformed the values to induce a normal distribution. Despite this, there are a few instances where one of my groups (either control or treated) holds normally distributed data, but the other group remains non-normal despite the log transformation. I have also transformed using square root, squared and inverse values, but have found that logging the values usually gives the best results.
I was wondering what statistical test would normally be used in situations like this?
Many thanks in advance and I look forward to hearing from you.
Relevant answer
Answer
Non-parametric tests exist to cater for what you just described above. Like the first poster highlighted, just use the alternative tests suggested, since they are 'distribution-free tests'. If the two means you're comparing are dependent, instead of using paired t test, you should use Wilcoxon signed-ranks test.
At the end of the day, you can compare your results (both from parametric and non-parametric tests) but you have to highlight the difference, often using superscripts and defined under the result tables.
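For completeness, the two non-parametric alternatives mentioned are one-liners in SciPy (the expression values are placeholders):

```python
from scipy.stats import mannwhitneyu, wilcoxon

control = [1.00, 0.92, 1.10, 0.95, 1.05, 0.88]   # hypothetical relative expression
treated = [1.40, 1.25, 1.60, 1.35, 1.52, 1.28]

# Independent samples: Mann-Whitney U (rank-sum) test
print(mannwhitneyu(control, treated, alternative="two-sided"))

# Paired/dependent samples (e.g. same sample before and after treatment): Wilcoxon signed-rank test
print(wilcoxon(control, treated))
```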
  • asked a question related to Parametric Statistics
Question
16 answers
I found someone say:
"There are few consequences associated with a violation of the normality assumption, as it does not contribute to bias or inefficiency in regression models.  It is only important for the calculation of p values for significance testing, but this is only a consideration when the sample size is very small.  When the sample size is sufficiently large (>200), the normality assumption is not needed at all as the Central Limit Theorem ensures that the distribution of disturbance term will approximate normality."
Is this true? Could I say that if my sample size is larger than 200 then the normality assumption is not needed?
Relevant answer
Answer
Maybe one more comment :)
The t-test is based on the assumptions that the data are i.i.d. normal, so that the distribution of the sample mean is also normal and the distribution of the sample variance is chi-squared. These are the assumptions from which the math of the test is derived.
If the data were in fact normally distributed, the test would be an exact test. However, there is no such thing as normally distributed data in reality. There are only data whose distribution is a more or less good approximation to the normal. In this (realistic) case the t-test is an approximate test. The question is only whether this approximation is sufficiently good for the purpose.
  • asked a question related to Parametric Statistics
Question
3 answers
I am comparing 2 variables, both of which were answered using Likert-scale-type questions. I was considering doing a Mann-Whitney U test and a chi-square test. Is it worth doing them both?
  • asked a question related to Parametric Statistics
Question
7 answers
The plasma potassium concentration in the general healthy population is in the range 3.5-5.5 mmol/L. If it is assumed that plasma K fits a normal distribution, and given data presenting the mean and SD of plasma K from a sampling, what methods or formulas can I use to calculate the sample size at which the collected data are consistent with the normal distribution?
Relevant answer
Answer
Good morning Aoming,
Have you started collecting samples of serum K+?  If so, you can use a Shapiro-Wilk test to determine if your data are normally distributed.  Retaining the null (p > 0.05) means your data are consistent with a normal distribution.  If so, you can use a program such as G*Power to determine the sample size needed for a given effect size.  Failing that, Cohen's seminal text on power analysis demonstrates how to calculate this by hand by obtaining Cohen's d (your effect size) and then using his tables to determine the n needed to obtain an appropriate power.
If your normal values come from a text or an authoritative guide, nearly all physiologic measurements in such a guide are normally distributed.  If you do not wish to rely on their assertion that the values are normally distributed, simply use a non-parametric test on your data.
So, if your question is whether the K+ values of your sample deviate from a known mean, a simple t-test should be sufficient.
Does this help?
Rich
  • asked a question related to Parametric Statistics
Question
11 answers
I had a data sample and used the Shapiro-Wilk test for normality. But now I have a doubt: can one always use the Shapiro-Wilk test? Should I have used other tests? How do you decide which test to use? What are the advantages and disadvantages of each test for normality?
Relevant answer
Answer
Normality tests will always reveal non-normality as your sample size grows (real data are highly unlikely to be truly normal in the limit).  I recommend visual approaches like Q-Q plots or visual comparison to a normal density plot. Small deviations will have a small impact on your results.
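A quick way to produce the visual checks described above in Python (using simulated data):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(7)
x = rng.normal(loc=10, scale=2, size=60)   # hypothetical sample

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3.5))

# Q-Q plot against a normal distribution
stats.probplot(x, dist="norm", plot=ax1)

# Histogram with a fitted normal density overlaid
ax2.hist(x, bins=15, density=True, alpha=0.6)
grid = np.linspace(x.min(), x.max(), 200)
ax2.plot(grid, stats.norm.pdf(grid, loc=x.mean(), scale=x.std(ddof=1)))

plt.tight_layout()
plt.show()
```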
  • asked a question related to Parametric Statistics
Question
8 answers
The Drug Abuse Screening Test-20 items are answered No/Yes and coded 0/1 (dichotomous), but the total score has also been classified as an ordinal scale, i.e. low drug abuse 1-5, intermediate drug abuse 6-10, substantial drug abuse 11-15, severe drug abuse 16-20, and this in turn has been treated as a Likert-type scale. Can we consider the total score an interval scale and use it for parametric statistics?
Relevant answer
Answer
Even for small samples and yes or no questions, you can use a log linear model.
  • asked a question related to Parametric Statistics
Question
19 answers
- Should I interpret them whatever the loadings are? If yes, what will be the interpretation?
- Do I need to apply any treatment? If yes, please suggest what should be done.
Relevant answer
Answer
Items of the same scale may be written in a positive way (i.e. in the same direction as the scale's meaning) or in a negative way (in the opposite direction). For example, you may find the following item in a typical depression scale: "I am satisfied with my life", which might have a negative loading when factor-analyzed with other items that measure depression.
  • asked a question related to Parametric Statistics
Question
4 answers
Murphy et al. described in their article that they created two special regressors by convolving the temporal and dispersion derivatives with the time series of pupil size.
I know that we can get the canonical HRF using the SPM function "spm_hrf". But I don't know how to obtain the temporal and dispersion derivatives in order to convolve them with a certain time series, such as pupil size.
How do we get temporal and dispersion derivatives on SPM?
Relevant answer
Answer
If you are using scripting this is what you can use:
matlabbatch{i}.spm.stats.fmri_spec.bases.hrf.derivs = [1 1];
In this way you add both temporal and dispersion derivatives. You can turn  a 1 into a 0 to disable one derivative. If they are both 0 it's equivalent to a canonical HRF.
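Not SPM's code, but a rough numerical sketch of what those two extra basis functions represent: finite differences of a double-gamma HRF with respect to time and to its dispersion, convolved with a hypothetical pupil-size series. All parameter values here are illustrative assumptions, not SPM's exact defaults.
import numpy as np
from scipy.stats import gamma
dt = 0.1
t = np.arange(0, 32, dt)
def hrf(tt, disp=1.0):
    # double-gamma shape: response peak near 6 s, undershoot near 16 s
    return gamma.pdf(tt, 6.0 / disp, scale=disp) - gamma.pdf(tt, 16.0 / disp, scale=disp) / 6.0
canonical = hrf(t)
temporal_deriv = (hrf(t) - hrf(t - 1.0)) / 1.0            # finite difference: shift in time by 1 s
dispersion_deriv = (hrf(t) - hrf(t, disp=1.01)) / 0.01    # finite difference: perturb the dispersion
pupil = np.random.default_rng(0).standard_normal(3000)    # hypothetical pupil-size time series
regressors = [np.convolve(pupil, b)[:pupil.size] for b in (canonical, temporal_deriv, dispersion_deriv)]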
  • asked a question related to Parametric Statistics
Question
9 answers
Establishment of reference values is a very important aspect of diagnostic research. I want to know the minimum sample size required when the data are analysed with parametric or non-parametric methods.
In the Tietz Textbook of Clinical Chemistry, 120 is the minimum sample size given for the non-parametric approach, but statisticians do not agree and say it is very small. Unfortunately they do not give their own formulae, so our projects are delayed.
Prof Aamir Ijaz, FCPS, FRCP (edin)
HOD Chemical Pathology and Endocrinology, AFIP Rawalpindi Pakistan
Relevant answer
Answer
The magic number of 120 (actually 119) for reference interval studies comes from the numbers needed to determine the 90% confidence intervals of the 2.5th and 97.5th percentiles of a population using non-parametric statistics. A smaller number will not allow you to make this calculation. Thus this number is the smallest number to see how uncertain your estimate is of these percentiles. Whether the accuracy obtained is sufficient  for  your needs is not a statistic question. Some additional factors - with a skewed distribution the 90% CI of the top reference limit can be very wide (but very narrow for the lower reference limit), but will be much narrower for a distribution closer to Gaussian. A larger n is needed for higher centiles (eg for the 99th centile the number is about 600 from memory). Other key statistical issues are outlier removal and transformation if using parametric statistics. Of course this is just the statistical part - the selection of population, whether to partition by age or sex or other,  pre-analytical and analytical quality can also strongly influence the reference interval determined.
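A small sketch of the binomial argument behind that number: the probability that the r-th and s-th order statistics of n observations bracket the true 2.5th percentile. The particular r and s below are only illustrative choices.
from scipy.stats import binom
def coverage(n, r, s, p=0.025):
    # P( X_(r) <= true p-th percentile <= X_(s) ) for a sample of size n
    return binom.cdf(s - 1, n, p) - binom.cdf(r - 1, n, p)
print(coverage(119, 1, 7))   # roughly 0.92, i.e. above the 90% target
print(coverage(60, 1, 4))    # with smaller n the achievable coverage drops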
Regards,
Graham
  • asked a question related to Parametric Statistics
Question
3 answers
I am trying to conduct a meta-analysis using the difference in mean change between intervention and control groups. I have one study with data on the mean (SD) at baseline, after the intervention, and for the change, in both the intervention and control groups. Therefore, I want to calculate the mean change and its corresponding standard deviation for use in the meta-analysis. For this, I will need to use the paper that reported the mean and standard deviation for baseline, after, and change to calculate the correlation r, and then use it to calculate the mean (SD) of change for the other papers. Is it valid to calculate the correlation r from just one study and use it for all other studies included in the meta-analysis? If not, how many papers are reasonable?
Relevant answer
Answer
I wouldn't go as far as to say it wasn't valid. Any evidence of the likely correlation is better than no evidence. Perhaps you can try a sensitivity analysis, using the correlation from the paper but seeing what happens when you go (say) +/- 0.1 away from it.
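For reference, a minimal sketch of the algebra involved (the numbers are placeholders): recover r from the one study that reports baseline, final and change SDs, then impute the change SD for a study that does not report it.
import math
def corr_from_sds(sd_pre, sd_post, sd_change):
    return (sd_pre**2 + sd_post**2 - sd_change**2) / (2 * sd_pre * sd_post)
def imputed_sd_change(sd_pre, sd_post, r):
    return math.sqrt(sd_pre**2 + sd_post**2 - 2 * r * sd_pre * sd_post)
r = corr_from_sds(10.0, 11.0, 7.0)          # from the study reporting all three SDs
print(r, imputed_sd_change(9.0, 9.5, r))    # imputed change SD for another study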
  • asked a question related to Parametric Statistics
Question
18 answers
I have a matrix with species as columns and sites as rows and it contains counts of individuals per site per species. It was suggested that due to the high variability in counts (they go from 0 to ~3000), I should transform my matrix. Basically, I am doing multivariate statistics to find differences in community compositions among bioregions.
My knowledge of stats is by no means great, but I do know transformations are used to normalize your data so that parametric statistical analyses can be performed, and you can check whether the transformation achieves this with a test (e.g., Shapiro-Wilk). But this is different from what I want to do, and I don't understand why and when transformations should be applied to the data, or how to decide that the chosen transformation is the correct one, or the one most suited to my data and research question. Is there a rule of thumb for the application of transformations (e.g., if the SD is over 2, or the counts vary by more than two orders of magnitude, etc., the data should be transformed)?
I have not been able to find information about this, so I would really appreciate your help.
Thanks!
Relevant answer
Answer
The question stated by Denisse is a quite standard one but not easy to answer. The appropriate answer depends critically on the underlying assumptions for the data and the stated problem. For a problem in species composition, to use a "compositional data approach" is becoming standard. For a general review of the state of the art you can have a look at "Modeling and Analysis of Compositional Data" (Pawlowsky-Glahn, Egozcue, Tolosana-Delgado, 2015)
The main assumption for compositional data is that the relevant information is contained in the ratios between the abundances of species. Therefore, what matters is not the number of individuals measured but the proportions of the species. When this is so, one realizes that the scale of proportions is not absolute but relative. That is, observing 5 individuals compared with 10 individuals is just observing half the abundance; observing 1000 individuals compared with an abundance of 1005 is almost the same. In these circumstances, standard mean values and variances are no longer valid, and a suitable transformation is mandatory in order to move to a new scale which can be treated as (nearly) absolute. If you are interested in compositional information (I think this is your case) the immediate recipe is: "transform your data using an isometric log-ratio transformation (ilr)", so that you represent your proportions by ilr-coordinates. The interesting result is that standard statistics can be safely applied to those ilr-coordinates (see Mateu-Figueras, G., Pawlowsky-Glahn, V. and Egozcue, J. J.:
The principle of working on coordinates. In Pawlowsky-Glahn, V. and Buccianti A. (Eds.) Compositional Data Analysis: Theory and Applications,
ISBN-10: 0-470-71135-3, Wiley, Chichester UK, 2011.
However, your data are counts that may be very low, including zeros, and large counts. Large counts, divided by the number of observations can be identified with proportions or probabilities. However, low counts cannot. If you observe 1000 individuals and you get 0 of species A, it does not mean that the actual proportion for species A is 0 but a small proportion. In these cases with low counts, the counts by themselves are not considered compositional. What is effectively compositional are the probabilities (proportions of each species) in a multinomial sampling. This is what is done in multinomial logistic regression. For more information see
Josep-Antoni Martín-Fernández et al.
Bayesian-multiplicative treatment of count zeros in compositional data sets, Statistical Modelling 2015; 15(2): 134-158
With respect to other transformations: the logit transformation is a limiting case of the Box-Cox transformation, and the ilr-coordinates are a multivariate generalization of the univariate logit. A componentwise logarithmic transformation produces analyses that depend on the total number of counts. Applying no transformation at all confronts you with spurious correlation if the data are assumed compositional.
Compositional data packages are available in R: "compositions", "zCompositions","robCompositions". See also the free stand alone program CoDapack (teaching software) for exploratory analysis of compositional data.
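A bare-bones sketch of the ilr idea in Python (pivot coordinates); the dedicated R packages named above handle zeros and the rest of the workflow properly, and x here must already be strictly positive (e.g. after zero replacement):
import numpy as np
def ilr(x):
    x = np.asarray(x, dtype=float)                          # one composition, all parts > 0
    D = x.size
    z = np.empty(D - 1)
    for i in range(1, D):
        gm = np.exp(np.mean(np.log(x[:i])))                 # geometric mean of the first i parts
        z[i - 1] = np.sqrt(i / (i + 1.0)) * np.log(gm / x[i])
    return z
print(ilr([0.2, 0.3, 0.5]))   # two ilr-coordinates for a 3-part composition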
  • asked a question related to Parametric Statistics
Question
1 answer
This question is in the context of a parametric approach to analysing the data obtained in an experimental study.
Relevant answer
Answer
It does not make sense to say that there is a clear-cut dividing line - at any of the thresholds you suggest, or indeed at any other. It is true that 5%, 10% (and indeed four-sigma and five-sigma!) have been adopted as approximate guidelines in various contexts, but one could never say (for example) that p = 4.9% warrants a firm conclusion while p = 5.1% warrants none at all.
  • asked a question related to Parametric Statistics
Question
9 answers
- Sample Size = 134
- Shapiro Wilk Statistics all >.90, but also statistically significant.
- LOG10 transformation --> still not normally distributed
Relevant answer
Answer
Hi,
If the distribution is not normal or if your sample size is too small you have to use nonparametric tests.
  • asked a question related to Parametric Statistics
Question
6 answers
My PhD thesis contains cross-cultural research between two nationalities, which requires me to standardize my data to avoid the issue of response styles across cultures; for that I computed z-scores to standardize my data in SPSS. So my question is: is it right to run t-tests and correlations directly on the standardized data, or should these be run on the raw data only?
Relevant answer
Answer
Sorry, I erased my previous answer, because it was not thought out well. So Stephen refers to my first answer.
In principle, you could use the z-transformed variable for t-tests but it makes not always sense! You have to take care how you transform the variable. Just think about the variables: after transformation each variable has a mean of 1 and a standard deviation of 0. What will happen if you compare two independent samples: right the difference of 0-0 is 0, so the t-test makes no sense! So, if you use z-transformed dependent variables, the z-transformation has to be done across all groups.
It is very similar with the dependent t-test, especially for highly or not correlated variables. With a high correlation, both variables will look nearly the same. For example, take a random variable, create a second one by adding a fixed value. If you run a t-test on this raw data, you will probably find a difference. Now z-transform those two variables and run again the t-test, the difference will be zero.
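A small numerical illustration of that last point (hypothetical numbers):
import numpy as np
from scipy import stats
rng = np.random.default_rng(0)
x = rng.normal(50, 10, 30)
y = x + 5 + rng.normal(0, 2, 30)              # same quantity shifted by a constant (plus noise)
print(stats.ttest_rel(x, y))                  # raw scores: a clear difference
z = lambda v: (v - v.mean()) / v.std(ddof=1)
print(stats.ttest_rel(z(x), z(y)))            # separately z-scored: the mean difference is exactly 0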
  • asked a question related to Parametric Statistics
Question
4 answers
Under what circumstances may parametric tests be used for non-normal data?
Please provide article or texts I may review to substantiate your response.
Relevant answer
Answer
Thanks Bryan:
Much appreciated.
  • asked a question related to Parametric Statistics
Question
7 answers
I am doing a literature review on non-parametric regression techniques.
I would like to ask those familiar with the topic whether you know the advantages and disadvantages of ANNs compared to other non-parametric regression techniques such as:
- MARS (Multivariate Adaptive Regression Splines)
- Projection Pursuit Regression
- Gaussian Process Models (?)
- Additive Models
Does anyone have comparative literature on this?
Your Contribution will be of great help.
Relevant answer
Answer
Hi Yves,
Maybe this paper where ANN is compared among other models (MARS, PPR, SVR, RF) is useful to you.
Cheers,
Manuel
  • asked a question related to Parametric Statistics
Question
19 answers
I'm going to use Pearson's correlation coefficient in order to investigate some correlations in my study. I've tested my data and I'm pretty sure that the distribution of my data is non-normal. My data are the cumulative incidence cases of a particular disease in 50 wards. Can I use Pearson's coefficient or not? (Please point me to references if there are any.)
Relevant answer
Answer
Dear Mohsen Ahmadkhani,
Pearson's correlation is a measure of the linear relationship between two continuous random variables. It does not assume normality although it does assume finite variances and finite covariance. When the variables are bivariate normal, Pearson's correlation provides a complete description of the association.
Spearman's correlation applies to ranks and so provides a measure of a monotonic relationship between two continuous random variables. It is also useful with ordinal data and is robust to outliers (unlike Pearson's correlation).
The distribution of either correlation coefficient will depend on the underlying distribution, although both are asymptotically normal because of the central limit theorem.
Don't forget Kendall's tau! Roger Newson has argued for the superiority of Kendall's τ_a over Spearman's correlation r_S as a rank-based measure of correlation in a paper whose full text is now freely available online:
Newson R. Parameters behind "nonparametric" statistics: Kendall's tau, Somers' D and median differences. Stata Journal 2002; 2(1):45-64.
He references (on p47) Kendall & Gibbons (1990) as arguing that "...confidence intervals for Spearman's r_S are less reliable and less interpretable than confidence intervals for Kendall's τ-parameters, but the sample Spearman's r_S is much more easily calculated without a computer" (which is no longer of much importance of course). Unfortunately I don't have easy access to a copy of their book:
Kendall, M. G. and J. D. Gibbons. 1990. Rank Correlation Methods. 5th ed. London: Griffin.
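For completeness, all three coefficients are one line each in most packages; a Python sketch with made-up skewed data:
import numpy as np
from scipy import stats
rng = np.random.default_rng(42)
x = rng.gamma(2.0, size=50)                 # hypothetical skewed values (e.g. incidence-like)
y = x + rng.normal(0, 0.5, 50)
print(stats.pearsonr(x, y))                 # linear association
print(stats.spearmanr(x, y))                # monotonic (rank-based) association
print(stats.kendalltau(x, y))               # Kendall's tau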
  • asked a question related to Parametric Statistics
Question
3 answers
Kindly, how can the maximum likelihood method (or any other method) be used to parametrize a mathematical model for risk analysis?
Relevant answer
Answer
If your (probabilistic) model governing the data creation is well-specified, then the method of Maximum Likelihood Estimation is literally finding those parameters that optimize the likelihood function given the observed data.
Under standard assumptions (independent observations), the likelihood function is the product of the probability model evaluated at each of the n data observations. This function will have a specific form in the parameters.
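A minimal numerical sketch of that recipe, with a normal model standing in for whatever risk model you actually have (the data and starting values are placeholders):
import numpy as np
from scipy import optimize, stats
data = np.random.default_rng(0).normal(loc=3.0, scale=2.0, size=200)   # placeholder data
def neg_log_likelihood(params):
    mu, log_sigma = params                  # optimise log(sigma) so that sigma stays positive
    return -np.sum(stats.norm.logpdf(data, loc=mu, scale=np.exp(log_sigma)))
res = optimize.minimize(neg_log_likelihood, x0=[0.0, 0.0])
print(res.x[0], np.exp(res.x[1]))           # maximum likelihood estimates of mu and sigma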
Suggest the following to get you started:
  • asked a question related to Parametric Statistics
Question
3 answers
If you are given a parametric surface, possibly non-smooth, what is the best way of finding its global minimum with the related MATLAB commands?
Relevant answer
Answer
Another option would be to use genetic algorithms (the relevant command is ga in Matlab) as well as other related stochastic procedures, inspired from evolution or various other natural processes.
Such algorithms are able to find global optima much easier than the elementary optimization algorithms.
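Outside MATLAB, the same idea is available in SciPy; a sketch on a hypothetical non-smooth surface (the function and bounds are made up):
import numpy as np
from scipy.optimize import differential_evolution
def surface(p):
    x, y = p
    return np.abs(x - 1.0) + (y + 2.0) ** 2 + 0.5 * np.sin(5 * x)   # non-smooth in x
result = differential_evolution(surface, bounds=[(-5, 5), (-5, 5)], seed=0)
print(result.x, result.fun)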
  • asked a question related to Parametric Statistics
Question
11 answers
I have seen researchers use Pearson's correlation coefficient (r) or coefficient of determination to evaluate performance of developed models. So what is the difference between them? 
Relevant answer
Answer
Hi Arman, are you sure they refer to a "coefficient of regression" (R)? I think it would make more sense if they referred to Pearson's correlation coefficient (r). In that case the coefficient of determination R^2 would be equal to the square of r.
Pearson's r is usually used to express the correlation between two quantities. For example, let's say you want to demonstrate that the hours spent studying are correlated with the grade obtained in an exam. You could calculate Pearson's r to evaluate whether the two quantities are correlated.
R^2 is usually used to evaluate the quality of fit of a model on data. It expresses what fraction of the variability of your dependent variable (Y) is explained by your independent variable (X).  
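A quick numerical check of that relationship on made-up data: for a simple linear regression, R^2 equals the square of Pearson's r.
import numpy as np
from scipy import stats
rng = np.random.default_rng(1)
hours = rng.uniform(0, 10, 40)
grade = 50 + 4 * hours + rng.normal(0, 5, 40)     # hypothetical study-hours/grade data
r, _ = stats.pearsonr(hours, grade)
fit = stats.linregress(hours, grade)
print(r**2, fit.rvalue**2)                        # identical values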
  • asked a question related to Parametric Statistics
Question
7 answers
Can the Pain Intensity Numerical Rating Scale be treated as an interval scale to use a parametric statistical analysis such as ANOVA?
Relevant answer
Answer
Aleksander, you may be interested in Ronán Conroy's Stata Journal article in which he argues very convincingly that the Wilcoxon-Mann-Whitney test is not properly called a nonparametric test:  In fact, it does estimate a useful parameter, viz., the proportion of cases in which a randomly sampled score from population 0 is higher than a randomly sampled observation from population 1.  HTH.  ;-)
  • asked a question related to Parametric Statistics
Question
4 answers
I have carried out impedance spectroscopy of my sample in the temperature range 303-373 K. In the complex impedance spectra I try to fit my data using an equivalent R-C circuit and to evaluate the values of R and C. All the Nyquist plots consist of three depressed semicircles from higher to lower frequencies, so I assigned the first semicircle, at the highest frequencies, to the bulk resistance and bulk capacitance. Generally the bulk resistance decreases with increasing temperature, but in my case the bulk resistance takes seemingly random values at different temperatures. What are the reasons for such behaviour? My Nyquist plots also include a series resistance.
Relevant answer
Answer
The bulk resistance follows the Arrhenius law with temperature. The activation energy values at low temperature and at high temperature are different. In general, the bulk resistance decreases with increasing temperature. What is your system, and what were the experimental measurement conditions?
  • asked a question related to Parametric Statistics
Question
4 answers
If we have an experimental design with a control and an experimental group, and we run a normality test (e.g., Kolmogorov-Smirnov) for each group and it shows that one group has a normal distribution and the other does not, should we use a parametric test or a non-parametric test?
Thank you
Relevant answer
Answer
How do you conclude that the distribution of the one group is normal? Because the KS-test was not significant? - But this is surely interpreting a non-significant result as evidence for the null, which actually is a clear no-go.
Tests of normality don't help much, since they don't answer your question, and they were not intended to be used in such a way.
It is also not the question whether or not a distribution is normal. Real data are never normally distributed, and it is only a matter of sample size to make any normality test "detect" the deviation at any arbitrary level of significance. The question is rather: are the kind and the strength of the deviation of any relevance to the problem?
And finally, you should not select a test based on the distribution of the data. Each test tests a particular hypothesis (within a particular model). You should have some justifiable idea about the model and the hypothesis you are interested in. If you then find that there is no nice way to perform the test *you* want, there is always the option to bootstrap the sampling distribution.
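A minimal sketch of that bootstrap option, for a difference in group means (the exponential data are just a hypothetical skewed example):
import numpy as np
rng = np.random.default_rng(0)
control = rng.exponential(1.0, 40)
treated = rng.exponential(1.3, 40)
boot = np.array([rng.choice(treated, treated.size).mean() - rng.choice(control, control.size).mean()
                 for _ in range(10_000)])          # resample each group with replacement
print(np.percentile(boot, [2.5, 97.5]))            # bootstrap 95% interval for the mean difference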
  • asked a question related to Parametric Statistics
Question
13 answers
Which different methods are used for parametric optimization of EDM? Which is the best suited?
What are the advantages, limitations and differences between the different methods of optimization of EDM, such as Taguchi, regression analysis, genetic algorithms, Gauss elimination, ANOVA, etc.?
Relevant answer
Answer
The main problem is not the selection of the optimization method. The first and most serious mistake is made in the selection of the independent process parameters. For this I must know which process energy source (PES) is used for the EDM. After that I have to explain sufficiently well why I do not use all the process parameters, i.e., the excluded ones should have no major impact within the optimization region. The second problem is the selection of the optimization region. Many works simply adopt the setting parameters of an existing industrial machine; in that case the "optimization" is merely technology-finding. In scientific work I expect the best pulse parameters to be selected from fundamental knowledge; if necessary, new PES must be developed and built.
Then a larger parameter range must be optimized, for which two or three measurement points are not necessarily sufficient. If the optimization region can be limited by logical considerations, this must also be specified. In many works on optimization, however, very narrow parameter ranges are "optimized", for which no sophisticated mathematical methods are needed.
Ultimately, I have to obtain the result and evaluate it properly. For example, if the current is varied between 2 and 5 A and the result comes out as 3.789 A, then that is really nonsense, because I cannot set this value and the process control usually means that only such a mean value can be achieved.
So I have to consider whether the values can actually be set, whether the controller does not constantly change them, and whether the actual process flow does not itself alter the parameter.
We compared different optimization methods back in the 1980s and superimposed the resulting surfaces for material removal, wear and roughness. This yields an optimization region from which you can state what happens when certain changes are made to the process parameters.
  • asked a question related to Parametric Statistics
Question
12 answers
I have come across many articles where gender has been used as a moderating variable or as a control variable in a multiple regression model (standard/step-wise), with gender coded as 1, 2. One of the assumptions of parametric statistics such as Pearson correlation or regression analysis is having data on a continuous/interval scale. How can including gender coded as 1, 2 in the model be justified?
Relevant answer
Answer
Hi Eshrat,
You are mixing up two different approaches here. Correlation is not synonymous with regression in what you are trying to accomplish. 
In regression you can certainly regress the outcome on a binary variable such as gender (regardless if it is coded 0,1 or 1,2). You could also use a t-test in which you compare the difference in means between the two gender types. These are indeed similar approaches.
Where things differ here is that in multiple regression you will likely add additional covariates to the model, typically as control variables. T-tests do not allow for covariates.
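A small sketch of that equivalence on made-up data: the OLS slope for a 1/2-coded gender variable reproduces the two-sample t-test, and covariates can then be added to the same regression model.
import numpy as np
import statsmodels.api as sm
from scipy import stats
rng = np.random.default_rng(3)
gender = np.repeat([1, 2], 50)                    # coded 1/2, as in the question
y = 10 + 2 * (gender == 2) + rng.normal(0, 3, 100)
ols = sm.OLS(y, sm.add_constant(gender)).fit()
print(ols.tvalues[1], ols.pvalues[1])             # slope t and p
print(stats.ttest_ind(y[gender == 2], y[gender == 1]))   # same t and p (pooled variance)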
I hope this helps
Ariel
  • asked a question related to Parametric Statistics
Question
7 answers
I want to compare data from four independent groups (as part of a three-way ANOVA in SPSS). The data are not normally distributed, so I want to apply a transformation so that I can run parametric statistical tests. However, some groups are skewed positively and some negatively. As there are different transformations for different issues, I could apply the relevant transformation, but then I wouldn't be able to compare the groups.
How should I approach treating this data?
Should I transform it, or should I simply run the ANOVAs and report on the violations? Or should I not transform it and run non-parametric tests?
Relevant answer
Answer
Tyron, I think you have at least two options to proceed. One is that you may consider using the Kruskal-Wallis test (non-parametric) with an interaction factor coded as three digits, each one representing the level of your three factors. The other is to try to normalize your data using a logarithmic transformation and then recheck normality; if the number of violations falls below a previously determined threshold (e.g. one quarter of the variables) you might reconsider the ANOVA test on the log-transformed data.
  • asked a question related to Parametric Statistics
Question
9 answers
I have a sample (n=30) of surveys which were completed pre and post a training day. All people involved in the training day completed the questions pre training on a 5 point (strongly disagree to strongly agree) scale. They then completed the same questions post training. 
Can I use a parametric statistic (paired-samples t-test), or should I use a non-parametric alternative (Wilcoxon signed-rank test)? Or something completely different?
Thanks
Relevant answer
Answer
The assumption for applying the t-test is that the data must be normally distributed. If the data are not normally distributed, or if there are outliers in the data, then the Wilcoxon signed-rank test is more powerful. You can check normality with a boxplot, for example.
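Both options are a one-liner once the pre/post scores are paired up; a Python sketch with hypothetical ratings:
import numpy as np
from scipy import stats
rng = np.random.default_rng(7)
pre = rng.integers(1, 6, 30).astype(float)                       # hypothetical 5-point ratings
post = np.clip(pre + rng.integers(0, 2, 30), 1, 5).astype(float)
print(stats.ttest_rel(pre, post))                                # paired t-test
print(stats.wilcoxon(pre, post))                                 # Wilcoxon signed-rank test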
  • asked a question related to Parametric Statistics
Question
8 answers
Hi everyone,
I want to do survey research on the cosmetics companies in Malaysia and use parametric methods to analyze the data. Unfortunately, the number of cosmetics companies is less than 20, so my total target population is less than 20. How should I sample, and what is an appropriate sampling method to use? Do I have to change my research design from quantitative to qualitative?
Relevant answer
Answer
Rosita -
An answer to your question depends upon the kind of data you are collecting (continuous, yes/no, etc.), the purpose (inference to the remainder of the population, descriptive, ?), number and nature of questions, accuracy needed, inherent variances  (sigmas), resources available,  goals, etc. 
I'll speak to a quantitative survey.
Are you saying that the entire finite population size, N, is less than 20, and you want a sample size, n?  If you have sufficient resources that you can easily census this without strain which might otherwise promote nonsampling error, such as measurement error, then n = N would give you the result a smaller n only aspires to provide.  But if you cannot easily do that, then a random sample size would depend upon whether or not it may be best to stratify, which would leave very small samples in each stratum, and whether or not you have an idea as to what sigmas may be. If you have regressor data on the population, then you have other options. 
If you do a sample rather than census, you should consider the finite population correction  (fpc) factor.  (See, for example, https://www.researchgate.net/publication/262970761_FINITE_POPULATION_CORRECTION_%28fpc%29_FACTOR.)
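A minimal sketch of the fpc adjustment mentioned there (the n0 = 100 below is a placeholder, not a recommendation): an "infinite-population" sample size n0 shrinks considerably when the population is as small as N = 20.
def fpc_adjusted_n(n0, N):
    # finite population correction for a required sample size
    return n0 / (1 + (n0 - 1) / N)
print(fpc_adjusted_n(n0=100, N=20))   # about 17, i.e. close to a census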
Graphics are important to learn what you can from the data, for generally any sample or census.
Here are a couple of suggestions among many possible textbooks that may be of use to you:
Cochran, W.G.(1977), Sampling Techniques, 3rd ed., John Wiley & Sons.
Blair, E. and Blair, J.(2015), Applied Survey Sampling, Sage Publications.
Best wishes - Jim 
  • asked a question related to Parametric Statistics
Question
2 answers
We have two techniques/methods/algorithms. One produces results that follow a normal distribution; however, the distribution of the results of the other is not known. If we want to arrive at some conclusion on performance comparison, i.e. whether the two systems are the same, different, or one is better than the other, which statistical test or sequence of tests needs to be performed?
Relevant answer
Answer
sorry I do not have the expertise to answer this question.
Regards
  • asked a question related to Parametric Statistics
Question
4 answers
Are there any studies (references) about estimating the scale parameter of the Laplace distribution, assuming that the location parameter is unknown? Any good references/sources to look at, please? Many thanks in advance for any help.
 
Relevant answer
Answer
Have you gone through the following link?
There are very interesting brief results there; I think they were obtained by Padmini Uma.
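For what it is worth, the standard maximum-likelihood result for the Laplace family with both parameters unknown is simple to state: the location MLE is the sample median and the scale MLE is the mean absolute deviation about that median. A short sketch:
import numpy as np
x = np.random.default_rng(0).laplace(loc=2.0, scale=1.5, size=500)   # hypothetical sample
loc_hat = np.median(x)
scale_hat = np.mean(np.abs(x - loc_hat))
print(loc_hat, scale_hat)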
  • asked a question related to Parametric Statistics
Question
5 answers
Hello. I'm looking for a paper using this statistical test: canonical correlation analysis with optimal scaling. I performed this test in SAS and SPSS and I get different kinds of results.
And another question for SAS users, or in the sense of parametric statistics: are variables on a Likert scale useful for such an analysis, and can they be mixed with variables on a ratio scale?
I welcome any paper with those types of analysis... REGARDS
Relevant answer
Dear Emmanuel, 
Correlating multiple SNPs and multiple disease phenotypes: penalized non-linear canonical correlation analysis.
Waaijenborg S, Zwinderman AH. Bioinformatics. 2009 Nov 1;25(21):2764-71. 
Medium optimization for the production of a novel bioflocculant from Halomonas sp. V3a' using response surface methodology.
He J, Zhen Q, Qiu N, Liu Z, Wang B, Shao Z, Yu Z.
Bioresour Technol. 2009 Dec;100(23):5922-7.  
Constructing personality maps, mapping personality constructs: multidimensional scaling recovers the big five factors from internal and external structure.
Bimler D, Kirkland J.
Span J Psychol. 2007 May;10(1):68-81.
Calculation of phase coexistence properties and surface tensions of n-alkanes with grand-canonical transition-matrix monte carlo simulation and finite-size scaling.
Singh JK, Errington JR.
J Phys Chem B. 2006 Jan 26;110(3):1369-7
  • asked a question related to Parametric Statistics
Question
11 answers
In inferential statistics we consider the result significant if the statistical test value (e.g. the t statistic) is above the critical value. Why, in some non-parametric tests, is the value supposed to be below the critical value to be considered significant?
Relevant answer
Answer
The idea with "significance" is whether or not a statistic corresponds to a p-value below a given level.  The word "significance" is often misleading because the effect size may not be substantial. Setting one level for all cases, usually 0.05, is a bad idea, because the p-value is a function of sample size.  An isolated p-value is rather meaningless. A power/type II error analysis or other sensitivity analysis is needed. If you can use a confidence interval, even if you have to use Tschebychev's inequality, you are generally much better situated to make informed, practical decisions. 
Parametric tests generally have more "power" because they make fuller use of the available information, but they also make assumptions. A distribution-free test, say a rank test or, more infamously, a sign test, generally will not use the available information fully, but it also does not make distributional assumptions; such tests thus have lower "power."
Note that the consideration of power should not just be for a consideration when picking a test. It is important to know for a given outcome to go with a given 'achieved' p-value.   
  • asked a question related to Parametric Statistics
Question
11 answers
Many statistical textbooks state that parametric significance tests require a normal distribution of the samples' data points.
Norman and Streiner write: "It seems that most people who have taken a stats course forget [a] basic idea when they start to worry about when to use parametric statistics such as t tests. Although it is true that parametric statistics hang on the idea of a normal distribution, all we need is a normal distribution of the means, not of the original data." (My italics.) (Norman GR, Streiner DL (2003) PDQ Statistics, 3rd ed., BC Becker Inc., Hamilton, London.)
I am not enough of a mathematician to verify this statement. What are your thoughts?
Relevant answer
Answer
"Many statistical textbooks state that parametric significance tests require a normal distribution of the samples' data points."
That's either wrong or a wrong interpretation.
The parametric tests are about parameters in a model. The models are fitted to data based on assumptions about properties of the "errors" (differences between the "best" model predictions and the observed data). These assumptions may imply a normal distribution of the "errors", but they may also imply any other distribution. The key is that the whole maths behind model fitting and assessment of statistical significance becomes very simple when the distribution of "errors" is normal. In ancient days, when no computers were available, people were restricted to such models because everything else was next to impossible to calculate, and R.A. Fisher developed nice and simple-to-use "shortcuts" to get the result without applying any complicated math (like the calculation of the t-value as the mean difference divided by the standard error of the mean difference; the corresponding p-value could then be obtained from a provided look-up table; everything could be done with pencil and paper).
So there are two misconceptions:
- "parametric" is related to a model and not to the normal distribution
- the often required normal distribution is about the "errors" and not about the data.
Now to your citation of Norman and Steiner:
It is half-way correct.
Given that the "errors" are exactly normally distributed, statistical models based on this assumption are exact. When the error distribution is not exactly normal, these tests are not exact. In reality, nothing has a perfect, exact normal distribution. It is like with a "circle": a circle is the mathematical concept of a set of points that all have the same distance to a common centre point. Although this concept is very useful in practice, there is nothing in nature that is really exactly circular. So in reality there is no exact test based on such assumptions anyway. The key question is whether the model is good enough for the purpose.
All parametric models internally work on the likelihood function of the data. For instance, the t-test is nothing but a likelihood-ratio test that compares the likelihoods of the data for two values of the parameter (the parameter is the mean difference between groups; one value is the null-hypothesis value, the other one is the value under which the likelihood of the data is maximal). The shape of this likelihood function approximates the shape of the normal distribution, and the approximation gets better for increasing sample sizes (this follows from the central limit theorem). Therefore, a test that is assuming that the likelihood function has the shape of a normal distribution will give quite reasonable or useful results even when the distribution of the errors is clearly not normal, as long as the sample size is large enough, so that the shape of the likelihood function is a reasonable approximation of the shape of a normal distribution.
  • asked a question related to Parametric Statistics
Question
10 answers
I am trying to perform a mixed ANOVA to compare the effects of two teaching methods on a group of students at three different points in time. I have read that parametric tests must not be used on ordinal data. However, after testing my variables using Shapiro-Wilk test, all of them meet the criteria for a normal distribution.
Would I be making a methodological mistake if I performed a mixed ANOVA?
Relevant answer
Brilliant. Thank you to both of you for your help and recommendations. It is very much appreciated.
  • asked a question related to Parametric Statistics
Question
2 answers
The model is y_i = x_i'*beta + n_i'*f + epsilon_i, where beta is the vector of parametric regression coefficients of the variables X1, ..., Xp; f is the functional form of the nonparametric curve (a spline); and epsilon_i is the error term.
Relevant answer
Answer
Thanks, Emmanuel.
  • asked a question related to Parametric Statistics
Question
14 answers
Hi everyone. I've run the Wald test for zero interactions and found out that there is an interaction effect in my regression model. Then I tested my model for multicollinearity using the VIF and found that the variables with interaction terms have a high VIF > 10. So I'm wondering whether I should take out these variables (assuming that there is no interaction effect, which by the way would mean that there are omitted variables in my regression). Any suggestions on this will be greatly appreciated.
Thanks
Relevant answer
Answer
What I find very convenient is using PCA, as recommended also by Yuanzhang Li. A PCA on the 22 variables yields 22 PCs, ranging from very important (PC1) to insignificant (PC22). I would retain and continue calculations only with the most relevant PCs. In other words, use only those PCs that explain, say, >= 95% of the total variation and skip the rest. Usually this leaves you with far fewer PCs. A PC analysis on the correlation matrix effectively eliminates multicollinearity and results in PCs that are orthogonal, i.e. linearly independent of each other. Since each PC is a linear combination of the original variables, the result can be calculated back to the original variables after performing an MLR on the PCs. Thus,
PC1 = a1 x A + b1 x B + ... + v1 x V      (x is the multiplication sign)
PC2 = a2 x A + b2 x B + ... + v2 x V
etc.
Run the MLR on the PCs. This yields:
Response = coeff1 x PC1 + coeff2 x PC2 + ... + coeffk x PCk      (where k < 22)
Now calculate back to the original variables A through V. This yields:
Response = coeffA x A + coeffB x B + ... + coeffV x V
Now the effect or contribution of each variable can be evaluated. If it is small, discard the variable from the equation. Based on a residual analysis you can try to incorporate interaction terms, such as A x B, or even quadratic terms, but only if this is necessary and you have sufficient degrees of freedom.
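A compact sketch of that recipe with scikit-learn (placeholder data; the 95% threshold is the one mentioned above):
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 22))                    # placeholder for the 22 predictors
y = X[:, 0] - 2 * X[:, 1] + rng.normal(size=200)
Xs = StandardScaler().fit_transform(X)
pca = PCA(n_components=0.95).fit(Xs)              # keep PCs explaining >= 95% of the variance
reg = LinearRegression().fit(pca.transform(Xs), y)
beta_std = pca.components_.T @ reg.coef_          # coefficients back on the (standardized) original variables
print(pca.n_components_, beta_std.round(2))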
I hope this helps.
  • asked a question related to Parametric Statistics
Question
8 answers
What are the best methods to analyse nominal, ordinal, interval, and ratio data in Life Sciences / Agricultural Sciences with reference to  Non-parametric & parametric tests?
Do you have a taxonomic illustration of statistical analytic methods according to type of data, parametric/non-parametric, and univariate/multivariate tests?
Sample is attached.
Relevant answer
Answer
You can use such taxonomic charts as a rough guide but in principle your analysis depends on your actual research question and the structure and properties of your data. Consult an applied statistician.
  • asked a question related to Parametric Statistics
Question
2 answers
Need explanation with simple example.
Relevant answer
Answer
I use the following small data set with three groups:
DATA
Group one:   2, 4, 6, 8
Group two:   12, 13, 14, 15, 16
Group three: 17, 18, 19, 21, 22, 23

OVERALL DESCRIPTIVE STATISTICS
Group  N   Sum  Mean  Std. Deviation  Variance  SS*
ALL    15  210  14    6.536           42.714    598
*SS = Sum of Squares of deviations from the mean = (N-1)*Variance

GROUP STATISTICS
Group  N  Sum  Mean  Std. Deviation  Variance  SS
One    4  20   5     2.582           6.667     20
Two    5  70   14    1.581           2.5       10
Three  6  120  20    2.366           5.6       28

ANOVA (Response)
Source          Sum of Squares  df  Mean Square  F       Sig.
Between Groups  540             2   270          55.862  .000
Within Groups   58              12  4.833
Total           598             14
The total sum of squares (of deviations from the overall mean) = 598
The within group sum of squares = 20 + 10 + 28 = 58
This can be thought of as the noise or random part, because there is no other explanation why the 4 observations in group 1 are (2 4 6 8) rather than (5 5 5 5) etc. for the other groups.
The between groups sum of squares = 598 – 58 = 540
It can also be written as 4(5-14)**2 +5 (14-14)**2 + 6(20-14)**2=540
The reason why the 3 group means are (5 14 20) rather than (14 14 14) is the fact that they belong to different groups.
We say the total sum of squares is split into Between Group and Within Group sums of squares.
The degrees of freedom between groups = number of groups - 1 = 3 - 1 = 2
The degrees of freedom within groups = (4-1)+(5-1)+(6-1)=12
The total degrees of freedom is 15-1=14
Mean Square = Sum of squares/df
MS Between groups: 540/2=270
MS Within groups: 58/12 = 4.833
F=270/4.833 = 55.862
This F can be thought of as a Signal to Noise ratio
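The same analysis run in software confirms the hand calculation (these are exactly the three groups listed above):
from scipy import stats
group1 = [2, 4, 6, 8]
group2 = [12, 13, 14, 15, 16]
group3 = [17, 18, 19, 21, 22, 23]
print(stats.f_oneway(group1, group2, group3))   # F ≈ 55.86, p < .001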
  • asked a question related to Parametric Statistics
Question
12 answers
I am trying to find statistical differences between two lists of values, where the 1st list is smaller and generally shows a Gaussian distribution and the 2nd list is much larger and generally not normally distributed. Is it still appropriate to use the Mann-Whitney U test here?
Secondly, can you use the Mann-Whitney U test on normally distributed data? In some cases I have two lists which are both Gaussian, but as most of my data are not normally distributed, is it okay to use Mann-Whitney throughout (for consistency)?
Relevant answer
Answer
@ Dave and Alan, what many overlook is that MW requires at least interval-scaled data (so, for instance, *not* ordinal-scaled data, as often misunderstood), and it is a test for a location shift only if the distributions of the compared groups are similar except for the location (i.e. all other moments must be equal). I often see the MW test used as a test for a location difference when the data are heteroskedastic (different variances in the groups). But then the MW is not a test of location (it is hard to say what it actually tests in that case). Thus, "doesn't care what the underlying distribution (if any) is" is often severely misinterpreted! Although the shape of the distribution does not matter, it must be the SAME in both groups (except for the location).
  • asked a question related to Parametric Statistics
Question
10 answers
What is the difference between parametric and non parametric research?
Relevant answer
Answer
Parametric methods deal with the estimation of population parameters (like the mean)
while non-parametric methods are distribution-free; they rely on the ordering (ranking) of observations.
  • asked a question related to Parametric Statistics
Question
1 answer
Definition and examples, please.
Relevant answer
Answer
Muhammad - you put one of your topics as non-parametric research; can't you answer this question yourself? Just pick up any random basic research text and all should be revealed!!
  • asked a question related to Parametric Statistics
Question
1 answer
I'm looking for references which nicely explain something like "evolution of regression analysis from SLR (Simple Linear Regression) to NPR (Non-Parametric Regression)".
Please let me know your suggestions.
Relevant answer
Answer
A number of modern regression texts cover this material, such as John Fox's or Sandy Weisberg's.
  • asked a question related to Parametric Statistics
Question
16 answers
The distribution of values in the samples should provide a good estimate of the population distribution. If this is skewed we are often told to avoid using parametric statistics. However, doesn't the central limit theorem (CLT) contradict this? The central limit theorem states that, provided the samples are not tiny, the sampling distribution of the mean will be approximately normal even if the population distribution (estimated by the distribution within the study sample) is skewed. Hence it seems wrong not to use parametric statistics, which are, of course, carried out on the (approximately normal) sampling distribution.
Relevant answer
Answer
Hi Mark
Your sample has to be both around 30 odd values and a random sample from the population of interest. If so, feel free to use parametric statistics on what is going to be a normal distribution in the limit and close enough in practice.
You can have less than 30 odd values, randomly selected, from a population already known to be normally distributed and still use parametric statistics.
You can have arbitrarily many values, not randomly selected and you can't make inferences about the population at all.
Trust this helps
Best
Marco
  • asked a question related to Parametric Statistics
Question
2 answers
Normality may be easier to establish in ideal conditions, but in real-life working conditions, say in industrial-type organizations, such a test does face some challenges. Data collection for my study is based on stratified sampling, i.e. closeness of fit to the population. If the respondents do not have much time, nor every opportunity to be sampled, it could be more practical to pre-select the respondents based on their availability. Under such conditions, the p-value of a normality test could be less than 0.05 and the data may not be normal. Could this be accepted, or should countermeasures be carried out? Are there exceptions for not carrying out normality tests, or how could this problem be addressed to meet the 0.05 requirement? E.g. transform the data so a parametric test can still be used? What are your views?
Relevant answer
Answer
1) Normality tests are often too sensitive. How do you define a sensible/reasonable sensitivity (power)?
2) Deviations from normality often compromise the power, but not the type-I error rate. What is more important for you?
3) What is your explanation for the fact that your "availability sampling" seems to change reasonable assumptions about the errors (that is, seems to give "non-normal" data whereas the other selection method gives "normal" data)?
4) Transformations are a good option, especially when they can be interpreted in a sensible, natural, logical way. If you transform, then you need to transform all similar data, not just those for which a transformation seems to improve "normality".
5) Non-parametric tests (i.e. rank-based tests) are possible, but here you need to specify what you actually are testing. To test a location shift, the frequency distributions for the groups must be identical (except for the location). Otherwise, such tests will just indicate differences in the compound of symmetry and location of the distributions.