• Zoran Grkov added an answer:
    When should I perform ANCOVA?

    I was wondering when it is appropriate to use a covariate in the analysis of data. Some cases are straightforward, such as when you have pre- and post-intervention scores in homogeneous groups and use the pre-intervention scores as a covariate.

    But what about a case such as comparing, for example, motor skills in people with intellectual disability and those without it? Does it make any sense to include IQ as a covariate when we know that these two groups differ in IQ scores? Any thoughts on this?

    Zoran Grkov · Freelance Consultant and Senior Expert on Quality Related Issues

    Analysis of covariance (ANCOVA) is a general linear model which blends ANOVA and regression.

    The ANCOVA (analysis of covariance) can be thought of as an extension of the one-way ANOVA to incorporate a "covariate". Like the one-way ANOVA, the ANCOVA is used to determine whether there are any significant differences between the means of two or more independent (unrelated) groups (specifically, the adjusted means). However, the ANCOVA has the additional benefit of allowing you to "statistically control" for a third variable (sometimes known as a "confounding variable") that may be distorting your results. This third variable that could be confounding your results is the "covariate" that you include in an ANCOVA.

  • Michelle Soekoe added an answer:
    How can I determine population bottleneck in a genetic study using SSR?

    I studied population genetics using SSR markers in 5 different wildlife habitats. Three mutation models (I.A.M., T.P.M. and S.M.M.) were tested with the BOTTLENECK software. The results below relate to one of the populations mentioned. I would greatly appreciate it if you could kindly give me some opinions on the test methods and their interpretation:

    Assumptions: all loci fit I.A.M., mutation-drift equilibrium.
    Expected number of loci with heterozygosity excess: 3.33
    4 loci with heterozygosity deficiency and 1 locus with heterozygosity excess.
    Probability: 0.04330

    Assumptions: all loci fit T.P.M., mutation-drift equilibrium.
    Expected number of loci with heterozygosity excess: 3.14
    4 loci with heterozygosity deficiency and 1 locus with heterozygosity excess.
    Probability: 0.05649

    Assumptions: all loci fit S.M.M., mutation-drift equilibrium.
    Expected number of loci with heterozygosity excess: 2.72
    4 loci with heterozygosity deficiency and 1 locus with heterozygosity excess.
    Probability: 0.12497

    Caution: only 5 polymorphic loci (minimum 20).

    Assumptions: all loci fit I.A.M., mutation-drift equilibrium.
    T2: -4.229 Probability: 0.00001

    Assumptions: all loci fit T.P.M., mutation-drift equilibrium.
    T2: -5.744 Probability: 0.00000

    Assumptions: all loci fit S.M.M., mutation-drift equilibrium.
    T2: -6.130 Probability: 0.00000

    Assumptions: all loci fit I.A.M., mutation-drift equilibrium.
    Probability (one tail for H deficiency): 0.04688
    Probability (one tail for H excess): 0.96875
    Probability (two tails for H excess or deficiency): 0.09375

    Assumptions: all loci fit T.P.M., mutation-drift equilibrium.
    Probability (one tail for H deficiency): 0.04688
    Probability (one tail for H excess): 0.96875
    Probability (two tails for H excess or deficiency): 0.09375

    Assumptions: all loci fit S.M.M., mutation-drift equilibrium.
    Probability (one tail for H deficiency): 0.04688
    Probability (one tail for H excess): 0.96875
    Probability (two tails for H excess or deficiency): 0.09375



    Michelle Soekoe · Rhodes University

    When reading the output from Bottleneck, is the probability the significance (p) value?

  • Bin Jiang added an answer:
    Do we need a new definition of fractals for big data? Or must fractals be based on power laws?

    So far, definitions of fractals have come mainly from a mathematical point of view, for the purpose of generating fractal sets or patterns, either strictly or statistically; see the illustrations below (Figure 1 shows strict fractals, Figure 2 statistical fractals):

    Big data are likely to show fractal structure because of their underlying heterogeneity and diversity. I re-defined a fractal as a set or pattern in which the scaling pattern of far more small things than large ones recurs multiple times, at least twice, with an ht-index of 3. I show below how geographic forms or patterns generated from Twitter geolocation data bear the same scaling property as the generative fractal snowflake.

    Jiang B. and Yin J. (2014), Ht-index for quantifying the fractal or scaling structure of geographic features, Annals of the Association of American Geographers, 104(3), 530–541, Preprint:
    Jiang B. (2015), Head/tail breaks for visualization of city structure and dynamics, Cities, 43, 69-77, Preprint:

    The new definition of fractals enables us to see the fractals that emerge from big data. The answer to the question seems obvious: yes, we need the new definition. BUT some of my colleagues argued that the newly defined fractals are not fractals any more, because they do not follow power laws.

    Bin Jiang · Högskolan i Gävle

    Dear Oscar Sotolongo-Grau,

    Your comment makes perfect sense to me! The key point is that power laws are not absolute; what is claimed to be a power law may be a power law only for the head part, with the tail part following an exponential distribution. Fractality is also not absolute: there are different degrees of being fractal. For example, with respect to the following figure, the fractals have different degrees at different stages or iterations, and the different degrees are captured by the ht-index.

  • Mervyn Thomas added an answer:
    Is it necessary to include statisticians in the institutional research and ethical committee for reviewing a manuscript and research proposal?
    Several institutions in developing countries like Nepal don't have a single statistician, or anyone from community medicine, on the research and ethics committee of the medical institution. The end result is poorly conducted trials. Is that acceptable?
    Mervyn Thomas · Emphron Informatics

    @Susan,  God preserve me from gifted amateur statisticians. I'd rather not use a gifted amateur as a statistician or as a surgeon.

  • Zheng Feei Ma added an answer:
    Any suggestion about using ANCOVA with repeated measures?

    My consulting adviser said that we can't use the analysis-of-covariance method when there are more than 2 time points, but I'm not sure about that.

    What's your idea about that?

    Zheng Feei Ma · University of Otago

    Hi Roshanak, do you have it published, so that I can have a look at your paper?

  • Paul R. Yarnold added an answer:
    Is there is a formal test to compare whether two groups are significantly different in a Multiple Correspondence Analysis (MCA)?

    I have a number of group variables that I capture on MCA plots, and I obviously have the coordinates for these plots in the X, Y and Z dimensions. Most people plot the maps as two-dimensional and then descriptively outline what is going on in the analyses.
    You can also do MCA plots in a third (Z) dimension, which gives a 3D plot.
    Obviously MCA is a visual representation. I was wondering if there are any recognised tests for comparing group differences (e.g. male vs female, or three-way group splits) in the 2D and 3D cases.
    I asked Greenacre and have received no reply as yet. I cannot find literature answering this question.

    Paul R. Yarnold · Optimal Data Analysis LLC

    Dear Brad,

    I appreciate your interest and will be thrilled to assist you in any manner that I am able. Yes, a personal message is the way to go.

  • Ali Tarhini added an answer:
    Where to get some materials about "multiple-linear discriminant analysis" ?

    We are trying to use multiple linear discriminant analysis as a method to measure the degree to which seven independent variables contribute to a linear function that best discriminates between the three categories of one dependent variable. From my preliminary reading, I found that this method can accommodate mixed independent variables (Kohli and Devaraj, 2003) and is suitable for situations where the dependent variable is categorical (Lee, 2004). It can explain relationships between multiple independent variables and one categorical dependent variable (Kohli and Devaraj, 2003). With discriminant analysis, a linear combination of independent variables that best discriminates between predefined groups can be derived (Hair et al., 1992), which is exactly what we are trying to do.

    I would be thankful if anyone can give some hints on how to use this method.

    Many thanks in advance


    Ali Tarhini · Brunel University

    Thanks, Kelvyn, for enriching our knowledge.

    To be honest, I have never used either of the two methods to analyze my data before; I am more used to Structural Equation Modeling. But SEM is not appropriate for our situation. I attached the research-in-progress paper that we are currently extending, for your reference. As you will notice, we have only one dependent variable (categorical), which makes it impossible for us to use SEM. We are about to finish the data-collection stage. So, based on your recommendation, LDA is preferable when the data are normally distributed; I will definitely take that into consideration before making up my mind.

  • Maha Pervaz Iqbal added an answer:
    How can I statistically compare a pre and post test survey with categorical variables?

    There are 31 items in this survey, rated on a Likert scale. The same survey was used pre- and post-test.

    Maha Pervaz Iqbal · University of New South Wales

    Thank you all. I am reflecting on all the comments and will work on my analysis.

  • Mark Ghamsary added an answer:
    I am confused about different kinds of SS in ANOVA tables. Could anybody help me?
    I am confused about the different kinds of SS in ANOVA tables. Could anybody help me to understand the differences between the Type I, II, III and IV SS that appear in SAS and SPSS outputs?
    Mark Ghamsary · Loma Linda University

    The best way to understand the concept of SS Types 1 to 4 is to work through examples. :)

    Start with a simple regression with 3 predictors, or with a balanced two-way ANOVA.

  • Miranda Yeoh added an answer:
    Can I use a t-test that assumes that my data fit a normal distribution in this case? Or should I use a non-parametric test, Mann Whitney?
    I read several research papers on the motivation to study science. All of them used t-tests without first testing whether the data were normally distributed.
    This was one set of data that I got for one analysis: Levene's test, F=0.537, p=0.465; t=1.612 (equal variances assumed) or t=1.649 (equal variances not assumed); df=141 (equal variances assumed) or df=107.05 (equal variances not assumed). My sample size is 143: 50 in group 1 and 93 in group 2. The p values for the t-test exceeded 0.05 whether equal variances were assumed or not.
    I can still report means and SDs, but I should use a non-parametric test like Mann-Whitney, right? What is your advice? Is it ethical to use a t-test without stating/making sure that the data are normally distributed?
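    A minimal sketch of this decision in Python, with simulated data (group sizes 50 and 93 as in the question; the scores themselves are invented): test normality first, then choose the test accordingly.

```python
import numpy as np
from scipy import stats

# Hypothetical scores standing in for the two groups (sizes 50 and 93,
# as in the question; the means and SDs are invented).
rng = np.random.default_rng(1)
g1 = rng.normal(3.5, 0.6, 50)
g2 = rng.normal(3.3, 0.6, 93)

# 1. Check normality within each group (Shapiro-Wilk).
p_norm1 = stats.shapiro(g1).pvalue
p_norm2 = stats.shapiro(g2).pvalue

# 2. If both groups look normal, Welch's t-test is a safe default
#    (it does not assume equal variances); otherwise fall back to
#    the Mann-Whitney U test.
if min(p_norm1, p_norm2) > 0.05:
    stat, p = stats.ttest_ind(g1, g2, equal_var=False)
    test_used = "Welch t-test"
else:
    stat, p = stats.mannwhitneyu(g1, g2, alternative="two-sided")
    test_used = "Mann-Whitney U"
print(test_used, round(p, 4))
```

    Reporting the normality check alongside the chosen test addresses the ethical concern directly: the reader can see why the test was appropriate.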
    Miranda Yeoh · Kolej Matrikulasi Selangor, MALAYSIA (Selangor Matriculation College)

    Thanks to all of you for your responses.  I asked several related questions on this thread.  At first, it was for a study where the group sizes were unequal; that's because gender was a variable and, as in many places, I have 25% male and 75% female respondents.  After that I asked about another study where gender wasn't a variable and the 2 groups were equal in size.  I was able to confirm many things from the expertise of all of you, and I got 2 papers published in local journals.  For us on RG, WE CELEBRATE SCHOLARSHIP.

  • Jorge Ortiz Pinilla added an answer:
    Is paired t-test- enough?

    We have completed our pilot study and obtained pre- and post-intervention measures on each participant, e.g. BMI, food-habit scores, etc. Participants received a nine-week intervention (5 nutrition sessions and 4 parenting sessions). We did not have a control group. Would a paired t-test be enough? What is the best way to test the intervention's effect accounting for socio-demographic factors (age, gender, number of sessions attended, etc.)? Thanks

    Jorge Ortiz Pinilla · Saint Thomas University

    Erikson, Jay and Saad:

    I think all the suggestions of using the paired t-test overlook an important aspect of the question asked by André: "What is the best way to test the intervention's effect accounting for socio-demographic factors (age, gender, number of sessions attended, etc.)?" This is not answered with the paired t-test.

    Some of the earlier answers have considered this aspect.

    Jorge Ortiz
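    One way to take those socio-demographic factors into account, sketched here in Python with simulated data (the variable names and effect sizes are invented): regress the post-intervention score on the pre-intervention score plus the covariates, so the coefficient on attendance estimates a dose-response effect of the intervention.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated, hypothetical pilot data: pre/post BMI, age, sessions attended.
rng = np.random.default_rng(2)
n = 60
pre_bmi = rng.normal(28, 3, n)
age = rng.integers(25, 50, n)
sessions = rng.integers(4, 10, n)
post_bmi = pre_bmi - 0.1 * sessions + rng.normal(0, 1, n)
df = pd.DataFrame({"pre_bmi": pre_bmi, "post_bmi": post_bmi,
                   "age": age, "sessions": sessions})

# Post score modelled on pre score plus covariates; the coefficient on
# `sessions` estimates the effect of attending one more session.
fit = smf.ols("post_bmi ~ pre_bmi + age + sessions", data=df).fit()
print(fit.params)
```

    Without a control group this still cannot separate the intervention from time effects, but unlike the paired t-test it does adjust for the covariates André asked about.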

  • Azalea Othman added an answer:
    What is the relevance of assay CVs to our sample results?

    I'm trying to figure out exactly why we report CVs in relation to our sample results.

    Since a CV of less than 10% is considered ideal, how do we apply that to our results? Do we need to use this percentage to re-calculate our results?

    What does '10%' mean?

    Azalea Othman · University of Queensland

    Thanks Angel. This is really helpful! 

  • Nicholas E Rowe added an answer:
    Do we actually need (or understand) more than basic statistics for routine investigation?

    Most PhD work is littered with statistical markers (SD, ANOVA, p=, N=, two-tailed lesser-spotted thingamajig), but they often refer to simple relationships that might be better understood (& evaluated) if they were written in plain language. Anything above basic statistics has a specific use (a bit like the need to speak Latin, programme in Python, or perform extraordinary feats of mental maths). So why all this emphasis on being a statistical genius, & how many of us (beyond those with the job title 'Statistician') are genuinely conversant with the field?
    Is statistics used as a badge of 'cleverness'? I only use Latin for established phrases, I use a calculator for clever maths, I prefer percentages and plain words to explain how variables relate to each other, & I use SPSS to do the clever statistics thing (if I don't do percentages on a calculator). Despite what is claimed, not all fields are actually 'scientific', and I think that the output of our research should be designed for clarity and usability. In my experience, the overuse of statistics does not promote this outside of genuinely hard science.
    Am I 'worthy' of being in the Ivory Tower?

    Can anyone share articles which explore how the average person understands (or wants) heavy statistics and if plain language could demonstrate a point more clearly?

    Nicholas E Rowe · University of Lapland

    Aeron: sorry for the very late response (it is difficult to keep up with older threads).  I think the point about end-user comprehension is very important.  I would venture that a huge number of people have limited statistical knowledge.  Unfortunately, this has come to be interpreted as an intellectual deficit, so people rarely admit to it.  To assume your readers have an in-depth or working knowledge of unexplained statistics is just that ... an assumption.  Even when we assert that we write for a certain level / field, I think that the tests we run and report have to be clearly explained in words, and not just left as numerical statements.  Not only does this help explain the context and application of your work, but it also shows that you have correctly chosen, interpreted and related the results.  I think this is an important part of academic education & also needs to be stressed in statistical teaching.

  • Kandamuthan M added an answer:
    How can one statistically prove that a set of measurements are accurate?

    I'm currently doing my experiments in triplicate (three measurements per sample) and I'm using the mean of the three readings for further calculations. I want to statistically analyse each set of readings (the three readings for one sample) to see whether my measuring instrument is accurate enough to do future experiments in duplicate (two measurements per sample). How should I proceed?


    Sample => Reading 1: 0.123 / Reading 2: 0.110 / Reading 3: 0.134

    Should I trust my instrument enough to take two readings and use their mean for later calculations?

    Kandamuthan M · DM Wayanad Institute of Medical Sciences, Wayanad, Kerala, India

    Calculate the standard deviation of the values and check that the variability between the measurements is small; this confirms that the instrument is consistent (precise) in its measurements.
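    A quick check with the three readings from the question (note that the SD, and the coefficient of variation derived from it, measure the instrument's precision, i.e. repeatability; accuracy in the strict sense would require comparison against a reference standard):

```python
import statistics

# Spread of the triplicate readings given in the question.
readings = [0.123, 0.110, 0.134]
mean = statistics.mean(readings)
sd = statistics.stdev(readings)        # sample SD (n - 1 denominator)
cv_percent = 100 * sd / mean           # coefficient of variation in %
print(round(sd, 4), round(cv_percent, 1))
```

    Here the CV comes out just under 10%, right at the usual repeatability cut-off, which argues for keeping triplicates rather than dropping to duplicates.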

  • Federico Nave added an answer:
    How can I choose an appropriate data transformation method for a set of data?

    I would like to know whether there are any rules for choosing a data transformation depending on the distribution, skewness, etc.

    Federico Nave · University of San Carlos of Guatemala

    Hi Jobim,

    I think that Subhash Chandra's answer is quite appropriate: the statistical analysis should aim to find which model fits the data better, not try to fit the data to a given model. And Ariel Linden gives (for me) the best definition, and maybe the only reasonable application, of data transformation: to test for linearity.

    Best regards
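    As a concrete illustration of letting the data choose (a sketch with simulated data; the log transform is just one common option alongside square root, reciprocal, Box-Cox, etc.):

```python
import numpy as np
from scipy import stats

# One common rule of thumb, not a universal recipe: strictly positive,
# right-skewed data often become more symmetric after a log transform.
# Compare the skewness before and after (simulated data here).
rng = np.random.default_rng(3)
x = rng.lognormal(mean=0.0, sigma=0.8, size=200)

skew_raw = stats.skew(x)
skew_log = stats.skew(np.log(x))
print(round(skew_raw, 2), round(skew_log, 2))
```

    Comparing such diagnostics across candidate transformations lets the data, rather than habit, pick the transformation, in the spirit of the answer above.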

  • Pardis Td added an answer:
    Is it appropriate to compare ED50 values across drugs when the maximum effect magnitudes are different for each drug?
    Let's say drug X produces a maximum response of 50, so its ED50 is the dose that elicits a response of 25. Drug Y produces a maximum response of 30, so its ED50 is the dose that elicits a response of 15. Isn't comparing these two with statistics like comparing apples and oranges? Yes, you could normalize the maximum response of each to 100, but that seems misleading when comparing two drugs. What I really need is a good published reference about the do's and don'ts of ED50 values, maximum responses, etc. and their analyses, especially in behavioral pharmacology.
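    One commonly recommended alternative to normalizing both drugs to 100% is to fit each drug's own dose-response curve and report potency (ED50) and efficacy (Emax) as separate quantities. A sketch with invented data and a simple Emax model:

```python
import numpy as np
from scipy.optimize import curve_fit

# Simple Emax model: response = Emax * dose / (ED50 + dose).
def emax_model(dose, emax, ed50):
    return emax * dose / (ed50 + dose)

doses = np.array([1, 3, 10, 30, 100], dtype=float)
resp_x = np.array([5, 13, 27, 41, 50], dtype=float)  # drug X (invented)
resp_y = np.array([3, 8, 16, 25, 30], dtype=float)   # drug Y (invented)

(emax_x, ed50_x), _ = curve_fit(emax_model, doses, resp_x, p0=[50.0, 10.0])
(emax_y, ed50_y), _ = curve_fit(emax_model, doses, resp_y, p0=[30.0, 10.0])

# Report efficacy (Emax) and potency (ED50) separately for each drug
# instead of normalizing both curves to 100%.
print(round(emax_x, 1), round(ed50_x, 1), round(emax_y, 1), round(ed50_y, 1))
```

    On this invented data the two drugs differ in efficacy but have similar potency, which is exactly the distinction that gets hidden when both curves are normalized to 100%.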
  • M. A. Aghajani added an answer:
    Free Software for Curve fitting or best fit equation
    We are using TableCurve2D for fitting our data. The problem with this software is that it is Windows-based, commercial software. We need a free equivalent of TableCurve2D (with similar functions) which can be run in command mode.

    I would highly appreciate it if someone could suggest free software which takes my data and fits it against a large number of equations by regression or non-regression, and finally gives me the equation that fits my data best.
    M. A. Aghajani · Agriculture and Natural Resources Research Center of Golestan Province

    I can recommend TableCurve 2D and CurveExpert Professional
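    For a free, scriptable route, SciPy can reproduce the core of what TableCurve2D does: fit a list of candidate equations and rank them by goodness of fit. A minimal sketch (the three candidate models and the data are just examples; a real equation library would be much larger, and AIC/BIC could replace the raw residual sum):

```python
import numpy as np
from scipy.optimize import curve_fit

# A small library of candidate equations; extend as needed.
models = {
    "linear":      lambda x, a, b: a * x + b,
    "power":       lambda x, a, b: a * np.power(x, b),
    "exponential": lambda x, a, b: a * np.exp(b * x),
}

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1], dtype=float)  # example data

results = {}
for name, f in models.items():
    try:
        params, _ = curve_fit(f, x, y, p0=[1.0, 1.0], maxfev=5000)
        results[name] = np.sum((y - f(x, *params)) ** 2)  # SSR
    except RuntimeError:
        pass  # skip models that fail to converge

best = min(results, key=results.get)
print(best, round(results[best], 3))
```

    Because it is plain Python, this runs in command mode on any platform, which was the original requirement.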

  • Kayvan Agahi added an answer:
    Is it possible to compare means from a field trial without replication?
    Working with demonstration plots, when no replication of treatments is possible due to limited field area, variability among locations, or the associated costs, what is the best statistical method to compare treatment means? Is Student's t-test appropriate for comparing means among plots (treatments)?
    Kayvan Agahi · Shahed University

    see this page

  • Carlos Lara-Romero added an answer:
    How can I optimize the random effect structure in a GLMM?

    I am fitting generalized linear mixed models with the glmer() function of the “lme4” package. glmer() always uses maximum likelihood (ML) rather than restricted maximum likelihood (REML). How can I fit a GLMM with REML in order to optimize the random-effects structure? Does that make sense?

    Carlos Lara-Romero · King Juan Carlos University

    :D Don't worry, Francismeire,

    there is hope. Try this site; there is nice information on it.

  • Ariel Linden added an answer:
    Is there a software that calculates absolute disparity statistics?

    Or is the calculation done manually?

    Ariel Linden · University of Michigan

    Good luck, Tolu

  • Seyed Mahdi Amir Jahanshahi added an answer:
    Is there an alternative test to chi-square for frequency data?

    Is there any other statistical test which can be applied to a frequency data set for figuring out statistical significance?

    Seyed Mahdi Amir Jahanshahi · University of Sistan and Baluchestan

    Hi Manu,

    As far as I know, there exist discrete forms of the popular goodness-of-fit tests, which are useful for frequency data. To learn more about them, check out the links below:

    good luck.

  • Hume Francis Winzar added an answer:
    What type of statistical analyses should I use?

    I have data from 500 participants who responded to 61 questions using a Likert scale (SA-SWA-A-Maybe-DA-SWD-SD). I want to know the rate of agreement/disagreement with the 61 questions. Which statistical analysis would be most appropriate for making sense of this information?

    thank you for any help you can offer.

    Hume Francis Winzar · Macquarie University

    Expanding on the suggestion from Robert McClelland above, you probably have several different constructs that you expect to see measured by these 61 items. 

    A true Likert Scale is actually a summative scale - the sum of several 1-5 (Agree-Disagree) items. (Or 7-point agree-disagree items). Humans are fickle and inconsistent, and Agree-Disagree is a very blunt measurement, so we ask several agree-disagree questions that are all variants of the same idea. The logic is that an overstatement on one item will be compensated by an understatement on another item. Thus when the items are summed, or averaged, then we get a figure that is reasonably consistent over time for the one person, and differences are reasonably consistent between individuals. 

    It's a lot easier dealing with, say, ten constructs than with 61 items.

    That is, for example, items (questions) 1 through 6 may be designed to measure "Perceived work stress", items 7 through 13 may be designed to measure "Perceived workplace harmony", items 14 through 20 relate to "Job Satisfaction", and so on. If this is the case then you would combine the relevant items to make a summed rating scale - you add them up. For this to work properly you need to be sure that all of the items in a summative scale are highly positively correlated. The usual test for this preliminary task is a Confirmatory Factor Analysis. Or Cronbach's Alpha if you are happy giving all items equal importance.
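    Cronbach's alpha can be computed directly from the item matrix. A sketch with simulated respondents (the construct, item count and scale points are invented; the formula is the standard one, with items in columns):

```python
import numpy as np

# Cronbach's alpha for a summative scale:
# alpha = k/(k-1) * (1 - sum(item variances) / variance(summed scale)).
def cronbach_alpha(items):
    """items: 2-D array, rows = respondents, columns = scale items."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Simulated respondents answering 5 related 1-5 agree-disagree items
# driven by one underlying trait plus noise.
rng = np.random.default_rng(4)
trait = rng.normal(0, 1, 300)
items = np.clip(np.round(3 + trait[:, None] + rng.normal(0, 0.8, (300, 5))),
                1, 5)
alpha = cronbach_alpha(items)
print(round(alpha, 2))
```

    A high alpha supports summing (or averaging) the items into one construct score, exactly the reduction from 61 items to a handful of constructs described above.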

    If you have no theory about what constructs align with what items (What were you thinking when you wrote the questions?) then you would check if there are any "natural" groupings of question/items, using Exploratory Factor Analysis. Again, this will help you to reduce all of those 61 items into a smaller number of different constructs.

    Don't stop thinking when using any of these techniques though: If there is no logical reason for an item to be included in a scale then don't just blindly add it to the scale. With 61 items it is very likely that several items will appear to be correlated purely by chance.

    These techniques assume that your data are at least Interval-scaled, but of course an agree-disagree scale is, in truth, an ordinal scale. But the techniques work fine just the same. Many years ago my evil Stats Professor made us Doctoral students run hundreds of simulations, testing the results of violating various assumptions of data in statistical analyses, and comparing results using the "correct" techniques. Summing, averaging and correlating Likert Scales is fine.

    The following reference is aged, but it may serve as a primer before you move to more recent methods.

    Title: A Paradigm for Developing Better Measures of Marketing Constructs
    Author(s): Gilbert A. Churchill, Jr.
    Source: Journal of Marketing Research, Vol. 16, No. 1 (Feb., 1979), pp. 64-73
    Stable URL:
    Abstract: A critical element in the evolution of a fundamental body of knowledge in marketing, as well as for improved marketing practice, is the development of better measures of the variables with which marketers work. In this article an approach is outlined by which this goal can be achieved and portions of the approach are illustrated in terms of a job satisfaction measure.

  • G. Rathinasabapathy added an answer:
    Where can I find websites to get free scientific publications?

    Please, I need links like or; I'm from Bolivia and sometimes it is too expensive to buy scientific papers, and usually it is not just one or three. It would also help if students could have access without needing an account endorsed by an institution (as on ResearchGate). Also publications in other fields such as art, music, etc. Thank you for your answers.

    G. Rathinasabapathy · Tamil Nadu Veterinary and Animal Sciences University

    Dear Andrei Gonzales

    You can check the following Open Access Journal Aggregators:

    For all branches of knowledge,

    For medical sciences,


  • Mundher Abdullah added an answer:
    Where can I find 2013 US health expenditure data?

    I would be glad to be pointed in the direction of 2013 per capita health expenditures for all US metropolitan statistical areas. I found just about a dozen in published tables of the American Community Survey 2013. 

    Mundher Abdullah · Putra University, Malaysia

    Hi bro,

    you can check the link; it has a lot of data for different areas.


  • Monica Maria Cossu added an answer:
    Where can I find bankruptcy statistics?

    I am intending to predict default probabilities by using logit regression on financial ratios of several companies from the same sector. Any idea where I could find a data set of companies from the same sector that defaulted? I will be able to find all the financial ratios for them afterwards, as long as they are listed, but I need the basic data set of companies.
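    Once such a labelled sample is assembled, the intended analysis might look like this (a sketch on simulated data; the ratio names, coefficients and sample size are all invented for the illustration):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated stand-in for a labelled sample of defaulted/surviving firms.
rng = np.random.default_rng(5)
n = 300
leverage = rng.normal(0.5, 0.15, n)
roa = rng.normal(0.05, 0.08, n)
true_logit = -2 + 4 * leverage - 10 * roa
default = rng.binomial(1, 1 / (1 + np.exp(-true_logit)))
df = pd.DataFrame({"default": default, "leverage": leverage, "roa": roa})

# Logit regression of default on the financial ratios.
fit = smf.logit("default ~ leverage + roa", data=df).fit(disp=0)

# Predicted default probability for a new firm with given ratios.
p_new = fit.predict(pd.DataFrame({"leverage": [0.7], "roa": [-0.02]}))[0]
print(fit.params.round(2), round(p_new, 3))
```

    The same code applies unchanged once the simulated columns are replaced with real ratios from the defaulted and surviving firms.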

    Monica Maria Cossu · Università degli Studi di Sassari

    Useful information on companies and other enterprises in crisis (before bankruptcy) can be found in the Central Credit Register (normally present in all countries).

  • Jochen Wilhelm added an answer:
    Can anyone specify the value of interest of binary dependent variable in glmer() formula?

    Dear all:

    In R, the glmer() function takes whatever it finds as the first value in the column for the dependent variable and calculates its coefficients for that value. However, if you are interested in getting coefficients for the other value, you might want to change that. Is there a way to do this?

    For example:

    I have a binary response Tipo, which can be S or P. I get coefficients for S, but I'm interested in coefficients for P. How do I do that? 

    This is my formula:

    glmer(Tipo~ Action.Chain.Pos + Tense + Pr.2.Pr + Co.2.Pr + prestsoc + (1 | Muestra) + (1 |nombrelema), data=data, family=binomial, contrasts="sum")

    Jochen Wilhelm · Justus-Liebig-Universität Gießen

    Thank you for sharing your findings/solutions. Too few people do this.

  • Rianne Jacobs added an answer:
    Any suggestions on how to deal with these two data sets: one healthy group with only zero values, and a second, sick group with a broad distribution?

    statistical data interpretation

    Rianne Jacobs · Wageningen UR


    It really depends on your research question. If you want to explain the presence or absence of the biomarker as well as the concentration of the marker when it is present, a ZI model is appropriate. If you only want to model the concentration of the marker when it is present, as seems to be the case here, then of course a ZI model is not relevant.

    In the initial question, there was no mention of the type of response in the sick group, so I assumed it was discrete. Obviously, for a continuous response you need a continuous distribution, and I did mention the gamma as an example. It is true that a gamma distribution has a positive support and is not suitable for 0 values. However, if the assumption is that the marker is present in the tumor patient, then theoretically, a 0 concentration is not possible and all patients have a positive concentration. If the concentration is below the detection limit, I would assign those cases the value of the detection limit (or some other low/background value). Assigning 0 would be fundamentally wrong, as all tumor patients have the marker.
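    One common way to formalize this split is a two-part ("hurdle") analysis, sketched below with simulated data (group sizes and the gamma parameters are invented): first compare presence/absence between the groups, then analyse the positive concentrations separately.

```python
import numpy as np
from scipy import stats

# Healthy controls are all zero; the sick group has a broad, strictly
# positive distribution (simulated here with a gamma).
healthy = np.zeros(30)
rng = np.random.default_rng(7)
sick = rng.gamma(2.0, 3.0, 40)

# Part 1: compare presence/absence of the marker between the groups.
table = [[int(np.sum(healthy > 0)), int(np.sum(healthy == 0))],
         [int(np.sum(sick > 0)),    int(np.sum(sick == 0))]]
odds, p_presence = stats.fisher_exact(table)

# Part 2: model the concentrations among the positive values only
# (here, trivially, only the sick group has any).
positives = sick[sick > 0]
print(round(p_presence, 6), round(float(positives.mean()), 2))
```

    If some patients fall below the detection limit, the substitution strategy described above (assigning the detection limit rather than zero) keeps part 2 compatible with a positive-support distribution such as the gamma.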

About Statistics

Statistical theory and its application.

Topic Followers (52687)