De Montfort University

Question

Asked 1st Jul, 2014

# If you had a data set which exhibited both non-normally distributed and normally distributed data, which statistical test would you use?

A parametric or non-parametric test?

## All Answers (36)

Dow University of Health Sciences

How can a single dataset show both a normal and a skewed distribution?

1 Recommendation

Newcastle University

Hi Ambrina, thanks for your reply.

Hmm, how should I put this... I have a few variables to compare, and the data for each variable showed a different distribution.

1 Recommendation

Instituto Nacional De Geriatría

Hello Gheeseon Lim!

The analysis must be designed according to the distribution of the dependent variable. Of course you must take into account the distribution of the independent variables; however, the main objective is to explain the distribution of the dependent variable.

I hope you find this useful.

Cheers

2 Recommendations

University of Malaya

Hi Lim,

Yes, this can happen. If you are comparing a number of groups, and the data of some groups are homogeneous and others are not, then you have to count which groups are homogeneous and which are not. If the number of homogeneous groups is greater than or equal to the number of non-homogeneous groups, use a parametric test; if not, use a non-parametric test. The same concept applies if you are comparing different groups on several variables (count, then choose the test).

Please note that this issue is normal, especially in cytotoxicity assays such as the MTT assay, but not acceptable in others such as real-time PCR, where the SD is very low.

I hope this helps!

Cheers,

HMA Ahmed.

15 Recommendations

Rio de Janeiro State University

Hi

It is really common. I asked my biostatistics professor this question some time ago. If you have to choose, it is better and more realistic to use non-parametric analysis, because it is better to treat parametric data as non-parametric than the opposite.

I hope this helps.

Tatiana

19 Recommendations

University of Turku

I'd agree with Tatiana on this: isn't it generally harder to find significance with non-parametric analyses (such as the U-test vs. the t-test)? If so, significant results obtained with such a method would be even more believable, especially if the Materials & Methods section states that the data were at times normally distributed.

1 Recommendation

Maulana Azad National Institute of Technology, Bhopal

This happened in my case: one variable's data were non-normally distributed and the other two were normally distributed. When I analysed only the non-normally distributed variable, I used a non-parametric test, whereas when I analysed only the normally distributed variables, I used parametric tests. I would like to know whether it is right or wrong to use both parametric and non-parametric tests in one study for different variables.

3 Recommendations

Instituto Nacional de Ciencias Médicas y Nutrición Salvador Zubirán

Adding to Namrata's question, I have a similar situation. In my experiment there are three time points (three different days of the experiment) and two independent groups (importantly, the subjects are different at each time point). When I use the Shapiro–Wilk test to evaluate normality, most groups pass the test but some do not. Should I therefore use a non-parametric test as recommended by Tatiana, a parametric test (because most groups pass the normality test) as recommended by Hany, or a mix of parametric and non-parametric tests as suggested by Namrata? I am sure that only one of these three options is the most appropriate. But which one is it? Thanks in advance!

1 Recommendation

University of Houston College of Optometry

I am in the same situation, Leon. I will use a mix of both, but I have also read that "true normality is actually a myth". Further, I have learned that when you have a large enough sample size (n > 30 or 40), a “violation” of the normality assumption should not cause major problems; this implies that we have some liberty to apply parametric procedures even when the data are not normally distributed. This is connected to the central limit theorem: with a large enough sample, the sampling distribution of the mean is approximately normal even when the data themselves are not. On this note, please check your sample size. Hope this helps. :)
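To make the central limit theorem point concrete (it concerns the sampling distribution of the mean, not the raw data), here is a small simulation sketch. The Exponential(1) population, the sample sizes 5 and 40, and the `skewness` and `sample_means` helpers are illustrative assumptions, not anything proposed in the thread.

```python
import random
import statistics


def skewness(values):
    """Moment-based sample skewness (no small-sample bias correction)."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum((v - m) ** 3 for v in values) / (len(values) * s ** 3)


def sample_means(rng, n, reps):
    """Means of `reps` independent samples of size `n` from an Exponential(1) population."""
    return [statistics.fmean([rng.expovariate(1.0) for _ in range(n)]) for _ in range(reps)]


rng = random.Random(2014)  # fixed seed so the run is reproducible
raw = [rng.expovariate(1.0) for _ in range(5000)]  # the skewed data themselves
means_n5 = sample_means(rng, 5, 5000)
means_n40 = sample_means(rng, 40, 5000)

# Theory for this population: skewness 2 for the raw data, 2/sqrt(n) for the means.
print(f"raw data:        {skewness(raw):.2f}")
print(f"means of n = 5:  {skewness(means_n5):.2f}")
print(f"means of n = 40: {skewness(means_n40):.2f}")
```

The raw data stay just as skewed however many values are collected; what becomes nearly symmetric is the distribution of the mean, which is the quantity a t-test works with. How large n must be depends on how skewed the population is, so 30 or 40 is a rule of thumb rather than a guarantee.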

1 Recommendation

Instituto Nacional de Ciencias Médicas y Nutrición Salvador Zubirán

Thanks for your response, Eugene. From what I have investigated so far, the best of the three decision criteria is to use a **combination of both** *parametric and non-parametric tests*, as suggested by Namrata and you. When you are dealing with small groups, as in my case, normality plays an important role in the statistical analysis. First, you run all your groups through a non-parametric test, as suggested by Tatiana. But you don't stop there. You can further test all your groups for normality and equality of variances, and the groups that pass both tests can then be subjected to a parametric test. This gives an additional opportunity to find a statistically significant difference in groups that did not show a statistically significant effect with the non-parametric test. Finally, it is very important to disclose in the Materials & Methods section that some groups (specify which) were analysed with both parametric and non-parametric tests, while others were analysed only with non-parametric tests because they did not meet the criteria for a parametric test.

6 Recommendations

The Central Limit Theorem states that the sampling distribution of the sample mean approaches a normal distribution as the sample size gets larger, *no matter what the shape of the population distribution* (provided the population has a finite variance). In practice this is often taken to hold for sample sizes over 30.

Islamic Azad University Falavarjan Branch

Dear Prof. Gheeseong Lim,

Thanks for your interesting question. Today I faced this problem, and the answers helped me a lot. I had two groups: one had normally distributed data and the other had non-normally distributed data. I work in the social sciences, and this is common in our field as well. My major is Applied Linguistics.

Thank you and best wishes,

Laya

4 Recommendations

University of New Mexico

Excellent discussion from our colleagues. All points are well taken, including the use of parametric vs. non-parametric tests, sample size, the central limit theorem, etc. May I add one more idea? When data are not normally distributed, one option is to transform them. A common transformation is taking the logarithm of the variable. This makes highly skewed distributions closer to normal, so they can then be analysed with parametric tests. What if you log-transform all the dependent-variable data and then uniformly apply parametric tests, provided all the data become normal? Do you think this is a feasible solution?
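The effect of a log transformation on a right-skewed variable can be checked directly. A minimal sketch, assuming a hypothetical log-normally distributed outcome (chosen because its logarithm is exactly normal) and a simple moment-based `skewness` helper:

```python
import math
import random
import statistics


def skewness(values):
    """Moment-based sample skewness (no small-sample bias correction)."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum((v - m) ** 3 for v in values) / (len(values) * s ** 3)


rng = random.Random(7)  # fixed seed for reproducibility
# Hypothetical right-skewed outcome: log-normal, so its logarithm is exactly normal.
raw = [rng.lognormvariate(0.0, 1.0) for _ in range(2000)]

# math.log needs strictly positive values; data containing zeros would need a
# shifted transform such as math.log(v + 1), which changes the interpretation.
logged = [math.log(v) for v in raw]

print(f"skewness before log: {skewness(raw):.2f}")
print(f"skewness after log:  {skewness(logged):.2f}")
```

Two caveats on the design choice: a parametric test on the logged values compares the means of the logs (geometric means on the original scale), which is worth stating when reporting; and the log only helps positive, right-skewed data, so it will not fix every kind of non-normality.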

4 Recommendations

Neurovision Language Teaching and Research Center

I hope that the following links can provide you with the answers you are looking for.

Conference Paper Clustering Heterogeneous Data Sets

Thank you all for this useful information. I was beginning to think something was wrong with my data. Ali, thank you for the readings!

1 Recommendation

University of Ioannina

I would apply a logarithmic transformation in order to normalize the sample, unless the sample size were large enough for the central limit theorem to apply, in which case I would treat the distribution as normal.

University of Louisville

I agree with checking the normality of the distribution and running parametric or non-parametric tests based on the type of distribution. My question is about reporting such data. For example, if I have four data pairs, of which two have a normal distribution and the others do not, should I plot the mean or the median in such a scenario?

Thanks!

UNSW Sydney

For Pawan Sharma's question, I think reporting the mean (SD) for normally distributed data and the median (IQR) for non-normally distributed data may be better.

Revathy Mani
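The mixed reporting style suggested above can be sketched as a small helper. The function name `describe` and the `normal` flag are hypothetical; the flag is whatever decision you reached for that group (from a plot or a normality test). The IQR is reported here as the pair of quartiles; some journals prefer the single width Q3 − Q1 instead.

```python
import statistics


def describe(values, normal):
    """Summarise a group as mean (SD) if treated as normal, else median (Q1 to Q3)."""
    if normal:
        return f"mean {statistics.mean(values):.1f} (SD {statistics.stdev(values):.1f})"
    # Quartile conventions differ between software packages; method="inclusive"
    # is one common choice, and the default "exclusive" can give slightly different values.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return f"median {statistics.median(values):.1f} (IQR {q1:.1f} to {q3:.1f})"


# Hypothetical measurements for two of the four groups.
group_a = [4.1, 4.5, 4.8, 5.0, 5.2, 5.5, 5.9]  # roughly symmetric
group_b = [1.0, 1.2, 1.3, 1.5, 2.0, 3.8, 9.6]  # right-skewed
print("A:", describe(group_a, normal=True))
print("B:", describe(group_b, normal=False))
```

Labelling each cell with the statistic actually used keeps a table that mixes the two formats readable.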

Università degli Studi di Milano-Bicocca

I agree with all the answers, but there are other considerations. With large samples (my last case was > 300), some tests of normality (e.g. Shapiro–Wilk) tend to reject normality even when the plot looks normal, and typically both parametric and non-parametric comparisons give the same result. A possible solution is to use a Bayesian approach.

However, it depends on your analysis and its structure. For ANOVA, what matters is NOT the normal distribution of the data but the normal distribution of the residuals.
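In a one-way ANOVA each residual is an observation minus its own group mean, and it is this pooled set, not each raw group, that the normality assumption concerns. A minimal sketch, assuming a simple between-subjects design (repeated-measures and factorial models have more terms, so their residuals differ); the helper name and the data are hypothetical:

```python
import statistics


def one_way_residuals(groups):
    """Residuals of a one-way (between-subjects) ANOVA model.

    Each residual is an observation minus the mean of its own group. `groups`
    maps a group label to its list of observations.
    """
    residuals = []
    for values in groups.values():
        group_mean = statistics.fmean(values)
        residuals.extend(v - group_mean for v in values)
    return residuals


# Hypothetical data: the pooled raw values form two clusters (around 2 and 12),
# but the residuals are all centred on zero.
groups = {"A": [1.0, 2.0, 3.0], "B": [11.0, 12.0, 13.0]}
print(one_way_residuals(groups))  # [-1.0, 0.0, 1.0, -1.0, 0.0, 1.0]
```

The residual list can then be inspected with a Q-Q plot or passed to a normality test such as `scipy.stats.shapiro`.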

University of Southern Denmark

Excellent discussion.

I am in almost the same situation.

I have tested 19 participants with a measurement using 11 different methods in two sessions (test and retest). To check the effects of method and session on the results, I wanted to use a repeated-measures ANOVA (parametric analysis). But the data from two of the testing methods in the retest session are not normally distributed. Can I still use a repeated-measures ANOVA?

Thanks in advance.

Università degli Studi di Milano-Bicocca

I think yes. With parametric methods you can build a proper factorial design.

In the case of ANOVA (which is regression-based), the most important factor is NOT the distribution of the data but the distribution of the residuals. This point is even more important in repeated-measures ANOVA (rmANOVA).

3 Recommendations

Microbial Treasure

A repeated-measures test is a longitudinal one. Was your experiment conducted at different time points?

University of Southern Denmark

Thank you Alessio Facchin

And yes, Mehrdad Rabiei, the test and retest were performed at different times (with some interval between them).

University of Manitoba

As a follow-up to the above discussion: I have a small sample size. I checked my data and most variables are not normally distributed, so I ran a non-parametric test and found significant differences for two variables. What should I do next?

1. Check the normality of these variables → if not normally distributed → log-transform the data → then run a parametric test again, only for those variables?

1 Recommendation

College of Medicine & Sagore Dutta Hospital

I have two sets of independent, continuous data. One group shows a normal distribution; the other doesn't. Which should I use: an independent t-test or the Mann–Whitney U test?
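For two independent groups, both candidate statistics can be computed on the same data and compared. A stdlib-only sketch; the helper names are hypothetical, and for p-values SciPy's `scipy.stats.mannwhitneyu` and `scipy.stats.ttest_ind` (with `equal_var=False` for Welch's version) are the usual tools.

```python
import math
import statistics


def average_ranks(values):
    """Rank values from 1..N, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """U for sample x: number of (x, y) pairs with x > y, ties counting 1/2."""
    ranks = average_ranks(list(x) + list(y))
    r1 = sum(ranks[: len(x)])
    return r1 - len(x) * (len(x) + 1) / 2


def welch_t(x, y):
    """Welch's t statistic (does not assume equal variances)."""
    se = math.sqrt(statistics.variance(x) / len(x) + statistics.variance(y) / len(y))
    return (statistics.fmean(x) - statistics.fmean(y)) / se


# Hypothetical data: group_x roughly symmetric, group_y right-skewed.
group_x = [5.1, 5.4, 5.6, 5.9, 6.0, 6.3, 6.8]
group_y = [4.0, 4.2, 4.3, 4.9, 5.0, 7.9, 11.2]
print("U =", mann_whitney_u(group_x, group_y))
print("Welch t =", round(welch_t(group_x, group_y), 3))
```

The two statistics answer slightly different questions: the t-test compares means, while U counts how often a value from one group exceeds a value from the other.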

1 Recommendation

All India Institute of Medical Sciences Deoghar

Dear Dr. Podder,

I think this blog would be helpful for you.

College of Medicine & Sagore Dutta Hospital

Thanks a lot, Dr. Himel Mondal; the blog is super useful and cleared up many of my doubts.

1 Recommendation

University of Oxford

Dear Hany Mohamed Aly Ahmed,

Could you please point me to a reference backing up the statement below?

*'If you are comparing between a number of groups, and the data of some groups are homogenous and others are not, then you have to count which groups are homogenous and which are not. If the number of homogenous groups is more than or equal to non-homogenous, then use parametric, if not, use non-parametic. The same concept is applied if you are comparing between different groups with some variables (count then choose the test).'*

Thank you very much for your help.

Best wishes,

Katerina

1 Recommendation

University of Technology Mauritius

Hello,

Did you get a reference backing that statement? If not, which test did you use when some groups were normally distributed and others were not?

Thanks

De Montfort University

Hi,

In Andy Field's book *Discovering Statistics Using SPSS* (section "Differences between several related groups", page 575), the Shapiro–Wilk test of normality showed that two groups were not normally distributed and one group was. The author proceeded with a non-parametric test, the Friedman test.
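The Friedman test ranks each subject's scores across the related conditions and asks whether the rank sums differ more than chance would allow. A minimal sketch of the statistic Q = 12 / (n·k·(k+1)) × ΣR_j² − 3n(k+1), where n is the number of subjects, k the number of conditions and R_j the rank sum of condition j. The function names and data layout are illustrative assumptions, and this version omits the tie correction.

```python
def average_ranks(values):
    """Rank values from 1..N, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of ranks i+1..j+1
        i = j + 1
    return ranks


def friedman_statistic(data):
    """Friedman chi-square statistic without tie correction.

    `data` has one row per subject; each row holds that subject's scores under
    the k related conditions.
    """
    n, k = len(data), len(data[0])
    rank_sums = [0.0] * k
    for row in data:
        # Rank each subject's scores across conditions (within-subject ranks).
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    return 12 * sum(r * r for r in rank_sums) / (n * k * (k + 1)) - 3 * n * (k + 1)


# Hypothetical scores: 5 subjects, each measured under 3 related conditions.
scores = [[7, 9, 8], [6, 5, 7], [9, 7, 6], [5, 8, 7], [6, 9, 8]]
print("Friedman Q =", friedman_statistic(scores))
```

Q is compared against a chi-square distribution with k − 1 degrees of freedom. `scipy.stats.friedmanchisquare` takes each condition as a separate sequence, applies a tie correction, and also returns the p-value.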

2 Recommendations
