Question
Asked 5th Feb, 2016

# Correct tests to run when Homogeneity of variance is violated in ANOVA?

I have been running some data in SPSS and the homogeneity of variance assumption has been violated. All three groups in the test have the same sample size. Am I correct in thinking the best tests to run are the Welch and Brown-Forsythe tests, followed by a Games-Howell post hoc test?
I am new to stats, so thank you in advance for the help.
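For readers working outside SPSS, here is a minimal Python sketch of the Welch one-way F-test mentioned in the question. The function name `welch_anova` and the simulated groups are illustrative assumptions, not part of the original post; it follows the standard textbook formula and is not a substitute for the SPSS ONEWAY output.

```python
import numpy as np
from scipy import stats


def welch_anova(*groups):
    """Welch's heteroscedastic one-way F-test (hypothetical helper).

    Returns (F, df1, df2, p).
    """
    k = len(groups)
    n = np.array([len(g) for g in groups], dtype=float)
    means = np.array([np.mean(g) for g in groups])
    variances = np.array([np.var(g, ddof=1) for g in groups])

    w = n / variances                      # precision weights n_i / s_i^2
    w_total = w.sum()
    grand_mean = np.sum(w * means) / w_total
    lam = np.sum((1 - w / w_total) ** 2 / (n - 1))

    numerator = np.sum(w * (means - grand_mean) ** 2) / (k - 1)
    denominator = 1 + 2 * (k - 2) * lam / (k ** 2 - 1)
    f_stat = numerator / denominator
    df1 = k - 1
    df2 = (k ** 2 - 1) / (3 * lam)
    p_value = stats.f.sf(f_stat, df1, df2)
    return f_stat, df1, df2, p_value


# Simulated example: three groups of equal size with different spreads.
rng = np.random.default_rng(0)
g1 = rng.normal(10.0, 1.0, 20)
g2 = rng.normal(10.5, 2.0, 20)
g3 = rng.normal(11.0, 4.0, 20)
f_stat, df1, df2, p_value = welch_anova(g1, g2, g3)
print(f"Welch F({df1}, {df2:.1f}) = {f_stat:.3f}, p = {p_value:.4f}")
```

With only two groups this reduces to Welch's t-test (F equals t squared), which is a handy sanity check.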

## Most recent answer

4th Apr, 2021
Savarimuthu vincent de paul
DIET, Kumulur, Tiruchirappalli (Constituent Unit of SCERT, Chennai)
Welch and Brown-Forsythe tests and then a post hoc test of Games-Howell.

## All Answers (24)

6th Feb, 2016
Mehmet Sinan Iyisoy
Necmettin Erbakan Üniversitesi
Yes you are right.
You can go with Welch and BF and GH. I think you are following Andy Field's book.
Actually, the important thing when doing ANOVA is the homoscedasticity and normality of the residuals. But Welch and BF ANOVA are robust enough.
Let us know if you have further questions.
2 Recommendations
6th Feb, 2016
Jochen Wilhelm
Justus-Liebig-Universität Gießen
Before searching for a procedure that will work under weird assumptions, you should first try to find out why your data behave so differently from what you would expect under your assumptions. This way you have the chance to better understand the whole data set and to better interpret the results. Some of the questions to ask are:
Why does the variance of the residuals depend on the predictors?
How does it depend on the predictors?
Can you identify a more reasonable model for the data?
If the concern is not the symmetry of the residuals but the uncertainty of the estimates, then you should consider a weighted regression. Using weights inversely proportional to the variance often works well.
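As a rough illustration of the weighted-regression idea, the sketch below fits a one-way model by weighted least squares with weights equal to the inverse of each group's sample variance. The function name `wls_group_fit`, the cell-means coding, and the simulated data are assumptions made for this example; treating the estimated variances as known is a simplification.

```python
import numpy as np


def wls_group_fit(y, group):
    """Weighted least squares for a one-way layout (hypothetical helper).

    Weights are 1 / (sample variance of the observation's group).
    Returns (levels, coefficients, standard errors).
    """
    y = np.asarray(y, dtype=float)
    group = np.asarray(group)
    levels = np.unique(group)                       # sorted group labels
    # Cell-means coding: one indicator column per group, no intercept.
    X = (group[:, None] == levels[None, :]).astype(float)
    group_var = np.array([y[group == g].var(ddof=1) for g in levels])
    w = 1.0 / group_var[np.searchsorted(levels, group)]

    xtwx = X.T @ (w[:, None] * X)
    beta = np.linalg.solve(xtwx, X.T @ (w * y))
    # Assumes the weights are the true inverse variances, so that
    # (X'WX)^-1 is the covariance matrix of the coefficients.
    se = np.sqrt(np.diag(np.linalg.inv(xtwx)))
    return levels, beta, se


rng = np.random.default_rng(1)
y = np.concatenate([rng.normal(5, 1, 15), rng.normal(6, 3, 15), rng.normal(7, 6, 15)])
group = np.repeat(["a", "b", "c"], 15)
for level, b, s in zip(*wls_group_fit(y, group)):
    print(f"group {level}: mean = {b:.2f}, SE = {s:.2f}")
```

In this simple layout the weighted estimates are just the group means; what changes is that each mean gets its own standard error (s / sqrt(n) for that group) instead of one based on a pooled variance.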
1 Recommendation
Assuming that the proposed data analysis is correct, you should analyze the quality of the data (presence of outliers, correctness of the measurements), since this can be a problem when carrying out the data analysis.
6th Feb, 2016
Jochen Wilhelm
Justus-Liebig-Universität Gießen
What is bad about the precondition to understand your data?
1 Recommendation
7th Feb, 2016
Bruce Weaver
Lakehead University Thunder Bay Campus
Josh, you appear to be concluding that you have problematic heterogeneity because some test of the variances was significant.  This raises several questions (Q) & comments (C).
• Q1. What are the SDs?
• Q2. What homogeneity of variance test did you use?
• C1. When the sample sizes are all the same (as in your case), or nearly the same, ANOVA is quite robust to heterogeneity of variance.  As Box (1953) said, "To make the preliminary test on variances is rather like putting to sea in a rowing boat to find out whether conditions are sufficiently calm for an ocean liner to leave port!"
• C2. You could try some simulations yourself to try to determine how much impact your observed level of variance heterogeneity is likely to have.  The link below shows how to do this using Stata, for example.
• C3. If you are talking about one-way ANOVA, the standard methods for dealing with heterogeneity of variance are the Welch or Brown-Forsythe F-tests.  (These are options for the ONEWAY procedure in SPSS, for example.)  Alternatively, if you estimate your model with a procedure intended for multilevel models (e.g., MIXED procedure in SPSS, proc MIXED in SAS, etc), you can allow the variances to be heterogeneous.  This approach can be used for factorial ANOVA models too.  Here are some examples using SPSS MIXED.
* One-way ANOVA via MIXED procedure.
* COVTYPE(DIAG) on the REPEATED sub-command allows
* the variances to be heterogeneous (by group).
MIXED Y BY Group
/FIXED=Group | SSTYPE(3)
/PRINT=SOLUTION R
/REPEATED=Group | SUBJECT(UniqueID) COVTYPE(DIAG)
/EMMEANS = tables(Group).
* Two-way ANOVA via MIXED, allowing for heterogeneity of variance.
MIXED Y BY A B
/FIXED= A B A*B | SSTYPE(3)
/PRINT=SOLUTION R
/EMMEANS=TABLES(A)
/EMMEANS=TABLES(B)
/EMMEANS=TABLES(A*B) compare(B)
/REPEATED= A*B | SUBJECT(UniqueID) COVTYPE(DIAG).
HTH.
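The simulation idea in C2 can also be tried in Python. The sketch below estimates how often the ordinary one-way ANOVA rejects a true null hypothesis when group sizes are equal but the SDs differ. The function name `type1_rate`, the SDs, the group size, and the number of simulations are all arbitrary choices for illustration; plug in your own observed SDs and n.

```python
import numpy as np
from scipy import stats


def type1_rate(sds, n_per_group, n_sims=2000, alpha=0.05, seed=1):
    """Share of simulated data sets in which the classic one-way ANOVA
    rejects H0, although all population means are equal (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_sims):
        # All groups have mean 0; only the SDs differ.
        groups = [rng.normal(0.0, sd, n_per_group) for sd in sds]
        _, p_value = stats.f_oneway(*groups)
        rejections += int(p_value < alpha)
    return rejections / n_sims


print("equal SDs:  ", type1_rate([1, 1, 1], n_per_group=20))
print("unequal SDs:", type1_rate([1, 2, 3], n_per_group=20))
```

If the rejection rate under your observed SDs stays close to the nominal alpha, the heterogeneity is unlikely to matter much for the overall F-test.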
1 Recommendation
Thanks for the help, guys, it is much appreciated. I'm new to learning about stats, so I only have a basic knowledge so far.
My data violated Levene's test of homogeneity of variance, with a result of .048. I also ran a Shapiro-Wilk test of normality on the data before I ran the one-way ANOVA.
1 Recommendation
29th May, 2016
James R Knaub
Home-based research, Retired US Fed Govt
Josh -
At first, reading your question and the various answers here, I thought you were looking at heteroscedasticity in a regression, which should be treated as a matter of degree, not as "yes, there is heteroscedasticity" or "no, there isn't." But then it became apparent that you are looking at different samples, not regression, and just comparing variance estimates for those samples, right? Well, if so, then the estimates of variance would differ somewhat from sample to sample, even if drawn from the same population. And even if they came from different populations with different variances, as Jochen indicated, you have to think about how this may or may not impact your study.
If you are looking at differences in means, you might just want to get an estimate of each mean, an estimate of each variance, and any covariance if applicable, and just estimate confidence intervals about differences in means, one pair at a time. This could be much more meaningfully interpreted than ANOVA for all your samples.
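A minimal sketch of the "one pair at a time" approach for independent samples (so no covariance term) is given below. It uses the Welch-Satterthwaite approximation so that each group keeps its own variance. The function name `welch_ci` and the simulated samples are assumptions for illustration, and the intervals are not adjusted for multiple comparisons.

```python
from itertools import combinations

import numpy as np
from scipy import stats


def welch_ci(a, b, level=0.95):
    """Confidence interval for mean(a) - mean(b) without assuming
    equal variances (hypothetical helper). Returns (diff, lower, upper)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    va = a.var(ddof=1) / len(a)            # estimated variance of mean(a)
    vb = b.var(ddof=1) / len(b)            # estimated variance of mean(b)
    diff = a.mean() - b.mean()
    se = np.sqrt(va + vb)
    # Welch-Satterthwaite degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    t_crit = stats.t.ppf(0.5 + level / 2, df)
    return diff, diff - t_crit * se, diff + t_crit * se


rng = np.random.default_rng(2)
samples = {
    "A": rng.normal(10, 1, 20),
    "B": rng.normal(11, 2, 20),
    "C": rng.normal(12, 4, 20),
}
for name1, name2 in combinations(samples, 2):
    diff, lower, upper = welch_ci(samples[name1], samples[name2])
    print(f"{name1} - {name2}: {diff:.2f}  95% CI [{lower:.2f}, {upper:.2f}]")
```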
Please note the following:
Press release:
Cheers - Jim
2 Recommendations
Dear Josh, I am experiencing the same problem as you did. Which approach did you finally take? I have a three-factor design with 2×2×3 levels, which gives me 12 treatments. I ran the test of homogeneity of variance and it is significant, so I cannot apply the factorial ANOVA. I was told to apply a Kruskal-Wallis test, but that is a one-way test. I am also new to statistics. If you have any input on my case I would appreciate it very much.
Karen, how many experimental units are there per treatment? A low number implies greater sensitivity to the significance.
1 Recommendation
Carlos, there are 15 units per treatment, which means 180 units in total.
Do you have outliers?
1 Recommendation
31st May, 2017
Jochen Wilhelm
Justus-Liebig-Universität Gießen
Adding questions to Carlos' point:
How large is the sample size?
Is the amount of heteroscedasticity relevant (to your problem)?
Is there a more sensible error-model than the normal distribution for your data (e.g. are you analyzing concentrations, rates, waiting-times, counts, proportions that ought to be handled by gamma-, Poisson-, negative-binomial, beta-, binomial or some other error-model)?
31st May, 2017
Mehmet Sinan Iyisoy
Necmettin Erbakan Üniversitesi
Karen, I don't think that non-homogeneity will be a problem in your case (look at Bruce's answer above). Also note that it is the homogeneity and normality of the residuals that are important (not of the data itself).
1 Recommendation
5th Feb, 2018
Wahyuni Amalia Idris
Bandung Institute of Technology
I also have a problem with homogeneity of variance. I ran a mixed-model ANOVA with 2 factors (one within and one between); each factor has 2 levels, and each level consists of 12 participants. When I run it in SPSS, Levene's test is violated for one of the treatments. Can I still run the F test? Or should I make my data homogeneous first?
Thanks for your response…
5th Feb, 2018
Jochen Wilhelm
Justus-Liebig-Universität Gießen
Does the assumption of normally distributed residuals make sense in the first place? If so, you can use weights to account for the difference in variance. If not, choose a more reasonable distributional model (you may then still check things like over- or under-dispersion, as the models typically assume some relationship between variance and expectation).
1 Recommendation
27th Nov, 2018
National Institute of Abiotic Stress Management
I am new to statistical tests. I have two-factor factorial data (4 levels under each factor) pooled from a two-year study. In my case, the data conform to the normality test but failed the homogeneity of variance test. What should I do now? If I go for a non-parametric test, is it possible to get information regarding the interaction between the factors?
27th Nov, 2018
Jochen Wilhelm
Justus-Liebig-Universität Gießen
If the variances in the different groups are not very different (10-fold, at least), I would stay with a usual linear model where you can analyze (and interpret) interactions.
1 Recommendation
Which test did you use to check homogeneity of variances? Levene's test? What was the p-value?
27th Nov, 2018
Jochen Wilhelm
Justus-Liebig-Universität Gießen
No test! Just check it. Plot the data. Look. Maybe calculate the variances and simply compare them. Is one at least ten times larger than another?
I don't know your sample size(s). I assume they are large enough to get a reliable impression of the variance. If not, there is not much to do anyway.
Another thing: do the variances increase with the means? If so, it may be that the normality assumption is not OK in the first place. You should then check the distribution of the logarithms (usually, the variable is log-normally distributed, so the logs will be normally distributed, and they should have homogeneous variances).
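To make the "just check it" advice concrete, here is a small Python sketch that compares the largest and smallest group variance on the raw scale and on the log scale. The helper name `variance_ratio` and the simulated log-normal groups are assumptions for illustration; with real data you would pass your own groups (the log step requires strictly positive values).

```python
import numpy as np


def variance_ratio(groups):
    """Largest sample variance divided by the smallest (hypothetical helper)."""
    variances = [np.var(g, ddof=1) for g in groups]
    return max(variances) / min(variances)


# Simulated log-normal groups: the spread grows with the mean on the raw
# scale, but the variance is the same for every group on the log scale.
rng = np.random.default_rng(42)
groups = [rng.lognormal(mean=mu, sigma=0.5, size=200) for mu in (0.0, 1.0, 2.0)]

raw_ratio = variance_ratio(groups)
log_ratio = variance_ratio([np.log(g) for g in groups])

print("group means:        ", [round(float(np.mean(g)), 2) for g in groups])
print("variance ratio, raw:", round(float(raw_ratio), 1))
print("variance ratio, log:", round(float(log_ratio), 2))
```

A raw-scale ratio well above ten that collapses towards one after taking logs is the pattern described above, where the variance rises with the mean.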
24th Mar, 2020
Carlos Vicente Rey
Instituto de Desarrollo Urbano
Based on the question and answers, I think it's pertinent to ask another question here. I want to use the Games-Howell post hoc test in Stata, but so far I haven't found a way to run this test. Does anybody have a solution for this?
24th Mar, 2020
Bruce Weaver
Lakehead University Thunder Bay Campus
Hello Carlos Vicente Rey. Type <search pwmc> in Stata's command window. From the help for -pwmc-:
Description
• pwmc performs pairwise comparisons of means. It computes pairwise differences of the means of varname over the levels of over(varname). Confidence intervals are derived using procedures allowing for unequal variances across groups.
• pwmc implements Dunnett's C, GH and T2 procedures, as proposed by Dunnett (1980), Games and Howell (1976) and Tamhane (1979).
• pwmci is the immediate form of pwmc. See immed.
HTH.
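For anyone without Stata, the Games-Howell procedure can be sketched in Python as follows. This is not the `pwmc` implementation; the function name `games_howell` and the simulated data are assumptions, and the code relies on `scipy.stats.studentized_range`, which is available in SciPy 1.7 and later.

```python
from itertools import combinations

import numpy as np
from scipy import stats


def games_howell(*groups):
    """Games-Howell pairwise comparisons (hypothetical helper).

    Returns a list of (i, j, mean difference, Welch df, adjusted p-value).
    """
    k = len(groups)
    n = np.array([len(g) for g in groups], dtype=float)
    means = np.array([np.mean(g) for g in groups])
    var_of_mean = np.array([np.var(g, ddof=1) for g in groups]) / n

    results = []
    for i, j in combinations(range(k), 2):
        diff = means[i] - means[j]
        se = np.sqrt(var_of_mean[i] + var_of_mean[j])
        # Welch-Satterthwaite degrees of freedom for this pair.
        df = se ** 4 / (var_of_mean[i] ** 2 / (n[i] - 1)
                        + var_of_mean[j] ** 2 / (n[j] - 1))
        # Studentized range statistic: sqrt(2) times the Welch t statistic.
        q = abs(diff) * np.sqrt(2) / se
        p_value = float(stats.studentized_range.sf(q, k, df))
        results.append((i, j, diff, df, p_value))
    return results


rng = np.random.default_rng(3)
data = [rng.normal(10, 1, 20), rng.normal(11, 2, 20), rng.normal(13, 4, 20)]
for i, j, diff, df, p_value in games_howell(*data):
    print(f"group {i} vs {j}: diff = {diff:.2f}, df = {df:.1f}, p = {p_value:.4f}")
```

With exactly two groups the adjusted p-value coincides with the Welch t-test p-value, which gives a quick way to check the code.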
24th Mar, 2020
Carlos Vicente Rey
Instituto de Desarrollo Urbano
Hello Bruce Weaver. It worked perfectly! Thanks a lot!
20th May, 2020
Douglas Flint
The University of Sheffield
Hi Josh Cobb and Carlos Vicente Rey, thanks so much for your questions. I can see Bruce Weaver has been able to help out again! I've got a question in a related post -- https://www.researchgate.net/post/How_do_you_report_a_three-way_ANOVA_with_robust_standard_errors_from_SPSS_output If you're able to help, and have time to post an answer, I'd be so grateful! :)

## Similar questions and discussions

How do you report a three-way ANOVA with robust standard errors from SPSS output?
Question
• Douglas Flint
Hi everyone,
I need some help--and I'm hoping that all of you who are better at statistics than me will be able to help the fog lift!
Background to my problem:
Just to clarify, I have used robust estimates, as my Levene's test was highly significant (p<.001), and I have quite unequal group sizes for one of the factors. If any statistical amateurs like me are reading this with the same problem, and want a paper that tells you why that might be a problem, I found Field and Wilcox, 2017 pretty non-technical on this. Check it out here: https://osf.io/v3nz4/
My sample size is 255.
So SPSS generates my output, and all is well and good.
As you might expect, SPSS dutifully outputs: the normal "Test of Between Subjects Effects" which contains my Type III Sum of Squares, F statistics, p value and partial eta squared.
It also produces a table of "Parameter Estimates with Robust Standard Errors", which contains the b coefficient, robust standard error, the t-statistic, the p value, and confidence intervals (which I assume are confidence intervals for the b coefficients). It produces such information for one level of each of the terms in the model (i.e. it gives estimates for level 1 of factor 1, and no estimate for level 0 of the same factor--as SPSS says "This parameter is set at 0 because it is redundant"--presumably because the b co-efficients are an estimate of the effect of one level of the factor relative to the other 'base' level of the factor, when also controlling for all other terms in the model).
The rub is that I now don't know what to do with my output in terms of how to report it in papers (I will publish in psychology).
Here's where I could do with your help:
If this were a non-robust three-way ANOVA, I would say something like the following in my paper in this case: "a three-way ANOVA revealed a main effect of factor 1 (F(1, 247)=42.34, p<.001, partial eta squared=.14)" and I might report means of the two groups too along with their standard deviations.
But as I understand it, I can't do that in this case, because the statistics provided by the "tests of between subjects effects" table are not calculated using robust standard error estimates. I assume this because the statistics provided are the same as if I had told SPSS to do a normal three way ANOVA without selecting the "Parameter estimates with robust standard errors". I also *believe* that it would defeat the point to report these statistics, and moreover would be misleading to do so, as the F statistic and p value can be affected by heteroskedasticity. That was the whole reason for wanting to use robust standard errors!
So what have I done instead? At the moment, I've written: "A three-way ANOVA was conducted, using heteroskedasticity-consistent standard errors as a Levene’s test indicated that homogeneity of variance in the DV could not be assumed and the number of participants in each group were not equal. There was a main effect of variable 1 (b=1.04, t(247)=2.26, p<.05, ηp2=.02): with participants in the... (imagine descriptions of differences in means between groups here)."
The problem:
My general question is:
(1) Is that daft? Is a reviewer going to scoff and reveal me for the fraud that I feel like I am when it comes to statistics?
Notice, I am not provided with a degree of freedom for the t statistic in "Parameter Estimates with Robust Standard Errors". As I understand it though, I can just read the "df" for this test statistic off the "Error" line from the "Tests of Between Subjects Effects" table (i.e. the one that contains output for a normal, i.e. non-standard-error-corrected, ANOVA), since the t statistic is for the b coefficient, and the b coefficient gives me an estimate of the effect of a one-unit change in the factor concerned on my outcome variable (which in this case, is the same as talking about the effect of an experimental condition relative to a control condition), *when controlling for all the other terms in the model*. Where it's the lattermost part of that sentence that tells me that I have the same amount of degrees of freedom as a normal ANOVA.
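To show where a table of b coefficients, robust standard errors and t statistics can come from, here is a minimal Python sketch using the HC3 heteroskedasticity-consistent estimator. HC3 is assumed here as one common variant; the post does not say which variant SPSS used. The function name `ols_hc3` and the simulated two-group data are illustrative, and referring t to the residual degrees of freedom (n minus the number of parameters) simply mirrors the choice described above rather than settling whether it is the right one.

```python
import numpy as np
from scipy import stats


def ols_hc3(X, y):
    """OLS coefficients with HC3 robust standard errors (hypothetical helper).

    Returns (beta, robust SE, t, p), with p based on residual df = n - p.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    # Leverages: diagonal of the hat matrix X (X'X)^-1 X'.
    leverage = np.einsum("ij,jk,ik->i", X, xtx_inv, X)
    omega = (resid / (1 - leverage)) ** 2
    # Sandwich estimator: (X'X)^-1 X' diag(omega) X (X'X)^-1.
    cov = xtx_inv @ (X.T * omega) @ X @ xtx_inv
    se = np.sqrt(np.diag(cov))
    df_resid = X.shape[0] - X.shape[1]
    t_stat = beta / se
    p_value = 2 * stats.t.sf(np.abs(t_stat), df_resid)
    return beta, se, t_stat, p_value


# Simulated example: control (0) vs. treatment (1), unequal n and spread.
rng = np.random.default_rng(4)
condition = np.concatenate([np.zeros(60), np.ones(25)])
outcome = np.where(condition == 1, rng.normal(1.0, 3.0, 85), rng.normal(0.0, 1.0, 85))
design = np.column_stack([np.ones_like(condition), condition])  # intercept + dummy
beta, se, t_stat, p_value = ols_hc3(design, outcome)
print(f"b = {beta[1]:.2f}, robust SE = {se[1]:.2f}, t = {t_stat[1]:.2f}, p = {p_value[1]:.4f}")
```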
My specific questions are:
(2) Is it fine to take my df for the t statistic from the traditional ANOVA output in SPSS?
(3) Is it fine to call what I've done a three-way ANOVA, since there is no F statistic reported here, but rather, horror of horrors, a t statistic?
(4) Does anyone know of any examples of papers in psychology that illustrate how three-way ANOVAs with robust standard errors tend to be reported (I can't find any)?
If you could reply in idiot-proof terms that would be so appreciated. To give you an idea of who you're dealing with here, I tried to learn R last year: after three days of watching tutorials online, I still couldn't even open my dataset in R, and nearly threw my computer out the window.
(Finally, if you're an amateur like me reading this and faced with a similar problem, you might want to check out these other threads for some great tips--I wouldn't have been able to get to this point without reading these first:
Thanks everyone for taking the time to read this in advance, and hope you're all safe at this troubling time.
