# Handling replicates and repeated measurements in ANOVA

Sorry for this relatively simple question, which needs a long explanation.

In an experiment,

--> to illustrate: I am counting the number of steps that mice need to walk across a 10 cm long beam

I have 4 different groups (group A, group B, group C and group D)

--> A and B are non-transgenic animals, with or without treatment; C and D are transgenic animals, with or without treatment

Each group has a different number of replicates (n)

--> n(A) = 15 ; n(B) = 10 ; n(C) = 14 ; n(D) = 9

Each replicate has a different number of measurements (m)

--> we do 20 experiments and keep the results only when the mouse does not stop while walking

We will consider an ideal case with normally distributed data and equal variances.

Question:

Can I perform an analysis of variance (ANOVA) with a different number of replicates in each group? Or does this affect the equality of variances so much that I can't use the data as they are?

May I use ANOVA with a different number of measurements in each replicate? And if not, how do you choose which replicates have to be "excluded"?

Thank you

## All Answers (10)

Jochen Wilhelm · Justus-Liebig-Universität Gießen

However, *if* there is for some reason a conviction that the expected variances are not equal (heteroscedastic), the first question should be: why? Obviously, there is then some considerable experimental aspect that is not modelled adequately. If there is no such conviction (or a more appropriate model is used in which heteroscedasticity is no longer a problem), there is no problem in using groups of different sizes.
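To illustrate the point about unequal group sizes, here is a minimal sketch of a one-way ANOVA F statistic computed by hand in plain Python. The group sizes (15, 10, 14, 9) follow the question; the step counts are simulated, and the true means, function name, and variable names are hypothetical. Nothing in the formula requires equal n: each group mean is simply weighted by its own sample size.

```python
import random
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples of any sizes."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Between-group sum of squares: each group mean is weighted by its own n.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares, pooled over all groups.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

random.seed(1)
sizes = {"A": 15, "B": 10, "C": 14, "D": 9}                # from the question
true_means = {"A": 12.0, "B": 12.0, "C": 13.5, "D": 12.5}  # hypothetical
data = {g: [random.gauss(true_means[g], 1.0) for _ in range(n)]
        for g, n in sizes.items()}

f_stat, df_b, df_w = one_way_anova_f(list(data.values()))
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

This sketch treats the four groups as one factor; it does not separate genotype, treatment, and their interaction.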

Another remark: if you count the number of steps, your response is a count variable, and the Poisson distribution should be used to model the errors, unless the number of steps is large enough for the normal approximation to be reasonable.

A last thing: I suppose the interesting scientific question is whether the transgenic and wild-type animals react differently to the treatment. This corresponds to the interaction of genotype and treatment. Take care to explicitly model and analyze this interaction (if this is your main goal).
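As a rough check of when the normal approximation to a Poisson count becomes reasonable, the sketch below compares exact Poisson probabilities with a normal density of the same mean and variance. The two means used (2 and 12) are illustrative assumptions, and the helper names are hypothetical.

```python
import math

def poisson_pmf(k, lam):
    """Exact Poisson probability P(X = k) for mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def max_abs_gap(lam, k_max=60):
    """Largest absolute gap between the Poisson pmf and the matching normal density."""
    sigma = math.sqrt(lam)  # for a Poisson variable, the variance equals the mean
    return max(abs(poisson_pmf(k, lam) - normal_pdf(k, lam, sigma))
               for k in range(k_max + 1))

for lam in (2.0, 12.0):
    print(f"mean {lam:4.1f}: sd = {math.sqrt(lam):.2f}, "
          f"max |Poisson - normal| = {max_abs_gap(lam):.4f}")
```

The gap shrinks as the mean grows, which is the sense in which a large enough count can be treated as approximately normal.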
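A minimal sketch of such an interaction test for a 2 x 2 design with unequal cell sizes, in plain Python. It estimates the "difference of differences" between cell means and its t statistic from the pooled within-cell variance. The cell labels, the simulated step counts, and the function name are assumptions for illustration only; the thread does not say which of the lettered groups received the treatment.

```python
import math
import random
from statistics import mean

def interaction_contrast(wt_ctrl, wt_trt, tg_ctrl, tg_trt):
    """Return (estimate, t, df) for (tg_trt - tg_ctrl) - (wt_trt - wt_ctrl)."""
    cells = [wt_ctrl, wt_trt, tg_ctrl, tg_trt]
    estimate = (mean(tg_trt) - mean(tg_ctrl)) - (mean(wt_trt) - mean(wt_ctrl))
    # Pooled within-cell variance (the ANOVA mean squared error).
    df = sum(len(c) for c in cells) - len(cells)
    mse = sum((x - mean(c)) ** 2 for c in cells for x in c) / df
    # Each cell mean has variance mse / n, so unequal n only changes the weights.
    se = math.sqrt(mse * sum(1 / len(c) for c in cells))
    return estimate, estimate / se, df

random.seed(2)
# Hypothetical data: cell sizes echo the unequal n of the question.
wt_ctrl = [random.gauss(12.0, 1.0) for _ in range(15)]
wt_trt = [random.gauss(12.0, 1.0) for _ in range(10)]
tg_ctrl = [random.gauss(14.0, 1.0) for _ in range(14)]
tg_trt = [random.gauss(12.5, 1.0) for _ in range(9)]

est, t_stat, df = interaction_contrast(wt_ctrl, wt_trt, tg_ctrl, tg_trt)
print(f"interaction estimate = {est:.2f}, t({df}) = {t_stat:.2f}")
```

An estimate near zero would mean the treatment shifts the step count by about the same amount in both genotypes.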

Nicolas Luigi Pascal Casadei · University of Tuebingen

I was thinking of using the mean of my replicates for 2 main reasons:

- It decreases the variability of the group

- I am not interested in how the performance evolves over the runs (I already excluded this effect in a previous study)

Is this mean still a count variable?

Best

Jochen Wilhelm · Justus-Liebig-Universität Gießen

(Btw: The average is the expected value of a Poisson variable)

Emmanuel Curis · Université René Descartes - Paris 5

And a clarification: a different number of replicates is not related to increased heteroscedasticity. You may have exactly the same number of replicates in each cell and strong heteroscedasticity, or very different numbers of replicates and "perfect" homoscedasticity. What will change is the standard error of the mean in each cell, but that is not a problem; what matters is the equality of the "population" variances, which do not depend on sample size.

But what will very probably also matter is that you have repeated measurements on each mouse: you may expect strong correlations, which violate the ANOVA assumptions. Taking means is one way to eliminate this problem. Mixed models or generalized least squares are other ways to account for it, and they take the mouse-to-mouse variability into account.
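A small sketch of that distinction, assuming a common population standard deviation of 1.5 steps (a hypothetical value) and the group sizes from the question: the population variance is the same for every group, while the standard error of each group mean, sd / sqrt(n), varies with n.

```python
import math

def standard_error(sd, n):
    """Standard error of a mean of n independent values with standard deviation sd."""
    return sd / math.sqrt(n)

population_sd = 1.5                              # hypothetical, shared by all groups
sizes = {"A": 15, "B": 10, "C": 14, "D": 9}      # from the question

for group, n in sizes.items():
    print(f"group {group}: n = {n:2d}, variance = {population_sd ** 2:.2f}, "
          f"SE of mean = {standard_error(population_sd, n):.3f}")
```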
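The averaging approach can be sketched as follows: collapse the valid runs of each mouse to a single mean, so that every animal contributes exactly one value to the ANOVA regardless of how many runs it completed. The run data, mouse identifiers, and function name below are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw data: one row per valid run -> (group, mouse id, steps).
runs = [
    ("A", "m1", 12), ("A", "m1", 11), ("A", "m1", 13),
    ("A", "m2", 10), ("A", "m2", 12),
    ("B", "m3", 14), ("B", "m3", 15), ("B", "m3", 13), ("B", "m3", 14),
    ("B", "m4", 12),
]

def per_mouse_means(rows):
    """Collapse repeated runs to one mean per mouse, listed by experimental group."""
    steps_by_mouse = defaultdict(list)
    for group, mouse, steps in rows:
        steps_by_mouse[(group, mouse)].append(steps)
    means_by_group = defaultdict(list)
    for (group, _mouse), values in sorted(steps_by_mouse.items()):
        means_by_group[group].append(mean(values))
    return dict(means_by_group)

print(per_mouse_means(runs))
```

Note that a mean based on one run is less precise than a mean based on several, which an unweighted ANOVA on the means does not take into account.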

Luiz Max Carvalho · The University of Edinburgh

I pretty much agree with Dr. Curis: heteroscedasticity is not related to whether the design is balanced; it is related to the data-generating process itself. Also, as Dr. Curis explained, if you have multiple measurements per mouse, then the autocorrelation structure violates the ANOVA assumption of iid errors. I suggest the article below to get a glimpse of mixed models. It is short and simple, and the authors provide simple code for an implementation in R.

http://www.plospathogens.org/article/info%3Adoi%2F10.1371%2Fjournal.ppat.1002590 # Plos pathogens article.

Cheers,

Nicolas Luigi Pascal Casadei · University of Tuebingen

To Jochen: the number of steps is between 10 and 15. The data set looked normally distributed when I was working with the mean of 3 to 5 replicates per animal.

The average number of stops was not considered; we have more robust methods to test why they stop during a run (anxiety, motor deficit, smell deficit...).

To Emmanuel: it is not impossible that the treatment changes the number of stops, or changes the number of steps in one direction or the other. Working at the animal level always provides surprises!

Thank you for your clarification of the unpronounceable word "homoscedasticity" (at least for the French guy I am).

Thanks, Luiz, for the link; if I have any questions, I will ask them here.

Best

Emmanuel Curis · Université René Descartes - Paris 5

Nicolas Luigi Pascal Casadei · University of Tuebingen

As Emmanuel said, and after reading the article that Luiz advised (thanks again for it), I am wondering what the advantage of using a mixed model is for this experiment.

I would also like to ask when it makes sense to test for autocorrelation. In the paper they recommend doing it when a "periodicity" is expected.

Best regards

Emmanuel Curis · Université René Descartes - Paris 5

However, this correlation structure may not capture all the correlation, hence the use of other kinds of correlation structures for the residuals.

As for your specific experiment... I think more details are needed to answer the question of whether a mixed model is needed. But it is worth trying them, at least to learn and understand them on a "simple" case...
