Universität Osnabrück

Question

Asked 18 March 2015

# What is the difference between "mean±SD" and "mean±SE"?

This may be a general question, but it is confusing for me. It would be great if someone could explain where to use "mean±SD" and where to use "mean±SE".

## Most recent answer

Patrice Showers Corneli, I think you are mistaken here.

On the SAS page, they clearly give you both options you mention to calculate the variance, and what Jochen already described as: "*average* of the squared *residuals* (the sum is divided by n) or as an *estimate of* the expected squared *error* (the sum is divided by n-1)". You can choose whether "d" in the formula is N or df, where df is the default option. The standard deviation is described as the square root of the variance. Directly below the SD description you find the formula SAS uses to calculate "STDERR | STDMEAN": s/sqrt(sum(w_{i})), which corresponds to what others have already stated. The documentation for SPSS and Minitab shows the same formulas for the variance, standard deviation and/or standard error; no contradiction here.

Therefore, may it be the case that you are simply wrong and misread the documentation? For example, Minitab states that the SE is s/sqrt(N), which is clearly not the same as √(∑_{i}(observation_{i} - mean)^{2}/n), which is what you stated above.

1 Recommendation

## Popular answers (1)

Royal College of Surgeons in Ireland

The standard error is a rather useless statistic. As a measure of the precision of measurement of the mean, it is much less useful than the 95% confidence interval (it is, in fact, a sort of 67% confidence interval, which is about as useful as half a hat).

So rather than give standard errors, the recommendation is to give confidence intervals. Douglas Altman lists giving standard errors to describe data as a definite error in his classification of errors in statistical presentation (see page 2666 of http://www.medicine.mcgill.ca/epidemiology/moodie/AGLM-HW/Altman1998.pdf)

Also see a very good note by Altman here: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1255808/

40 Recommendations

## All Answers (72)

Universität Osnabrück

Firstly, it is mandatory to understand the difference between the SD (standard deviation) and the SE(M) (standard error of the mean).

The SD is a dispersion measure of the SAMPLE. For a normally distributed variable, roughly 68% of the sample data lies within ±1SD of the mean.

The mean estimated from your sample is usually an estimator of the population mean. But when you draw more than one sample from the same population, the mean values will vary slightly. The SE(M) gives you an estimator of this variation. The SE is calculated as SD/sqrt(N).

As you can see, with growing N the SE(M) becomes smaller. That's why the power of statistical tests increases with increasing N; e.g. the difference between two mean values is divided by the standard error of the mean -> one-sample t-test.
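These relations are easy to verify numerically. A minimal Python sketch (the data values are made up for illustration):

```python
import math

# hypothetical sample of N = 5 measurements
data = [4.0, 6.0, 5.0, 7.0, 3.0]
n = len(data)
mean = sum(data) / n

# sample SD (sum of squared deviations divided by n - 1)
sd = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))

# standard error of the mean: SE = SD / sqrt(N)
sem = sd / math.sqrt(n)
```

Doubling N (with the same SD) shrinks the SEM by a factor of √2, which is the effect on test power described above.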

9 Recommendations

Riga Stradins University

Thanks for this question! Actually this is a little bit confusing for me too. I see a lot of publications where "mean±SD" is used, but I'm not sure whether it is mathematically correct. In some discussions I've heard that using "mean±SD" makes no sense from a statistical point of view and that we need to calculate the SE value, because "mean±SE" is the one that represents the variability of the sampling distribution of a statistic.

Is there a good mathematician who could put things in order here?

Universität Osnabrück

If you measure a variable, e.g. a psychological trait, then the mean value of the sample is said to be an unbiased estimator of the population mean. But if you were to draw k samples with the same N, the measured mean values would differ. But how large is the variability? This variability of the mean values is represented by the SE. Luckily, it is not necessary to draw k samples of size N from the population to determine the variability. It can be derived from the sample SD --> SE = SD/sqrt(N).

With the SE at hand, you can for example calculate confidence intervals for the mean, showing in which range the "true" mean value of the population will lie. E.g. mean±(1.96*SE) for a 95% confidence interval, where 1.96 is the z score for alpha/2.
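As a quick numerical illustration of this formula (the summary numbers are hypothetical):

```python
import math

# hypothetical summary statistics of a sample
mean, sd, n = 100.0, 15.0, 36
sem = sd / math.sqrt(n)          # 15 / 6 = 2.5

# 95% confidence interval: mean ± 1.96 * SE
ci_low = mean - 1.96 * sem
ci_high = mean + 1.96 * sem
```

Here the interval works out to roughly (95.1, 104.9): a fairly tight statement about the population mean, even though the data themselves spread over ±15 around the mean.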

1 Recommendation


Justus-Liebig-Universität Gießen

Edgars,

"But I'm not sure if it is mathematically correct." - this is all

*mathematically*correct (unless you don't make some calculation error). This is neither the problem nor the right question.The right question is: Is is

*reasonable*to show SD or is is*reasonable*to show the SE. What reasonable is depends on several things. First of all: if the data is not approximately normal distributed, then SD it typically unreasonable, and SE may be reasonable only when the sample size is large. Then you have to ask yourself is you want to show the dispersion of the DATA or the precision of an ESTIMATE. Both have their own distinct value (their "right to exists", so to say).It may be important to report how dispersed the values of a variable is. Example: physiological measures like blood pressure, BMI etc, also IQ and alike. This helps to judge if any particular value (possibly your own BMI) is "normal" or "rather atypical".

In reaserch often parameters are estimates (like the mean BMI). It may be of interest to show how precisely this parameter was estimated based on the used data. However there is a very bad habit that presumably causes a lot of confusion:

A typical research question is of the tyle like "does my treatment have an effect on the BMI?" (e.g. "does a high omega-6-fatty acid diet reduce the BMI?"). The central question is *NOT* the BMI of the controls and *NOT* the BMI of the treated. The central question is about the DIFFERENCE of the BMI between these groups. Therefor it makes only very limited sense to present the precision of the mean BMI for the two groups. The really

*relevant*estimate is the*difference*between the groups, and for this estimate the SE could/should* be (but is typically not!) provided. This estimate like the difference, the treatment*effect*, does not have a SD; there is only a SE. Often, the estimates of the "group means" are rather irrelevant, and so are their SEs. In simpler designs such group means and their SEs can be easily calculated and so they are presented. But when the design gets more complicates (more-factorial designs, paired or longitudinal designs, designs wirth nested and/or random effects) the calculation of a simple group mean becomes cumbersome or even impossible, but yet the*effects*are estimated and the precision of*these*estimates can be given.*presenting the CI should be preferred over the SE

35 Recommendations

University of Guelph

SD measures the deviation of each observation from the mean, and SE measures how confident you are about your mean.

3 Recommendations

Iran University of Medical Sciences

I think some researchers mean mean±2SD but wrongly write mean±SD.

Am I right?

Do you think so?

Justus-Liebig-Universität Gießen

I don't think so. It would make some sense to plot mean±2SD, as it covers approximately 95% of the data. However, most researchers I know are primarily interested in showing as-small-as-possible error bars, and therefore they show the SE (simply because it usually gives the smallest error bars; they don't care much about the interpretation of what is shown). It may happen that they accidentally write "SD" instead of "SE"; such mistakes happen, but in this case the intervals shown (wrongly) indicate a higher precision or lower variance than was actually obtained (in your example it would be the other way around, which does not seem to be as bad).

2 Recommendations

Bangor University

"If what one wants to describe is the variability in the original sampled population then one should use the standard deviation (s.d.), whose expected value does not depend on the number of replicates. If one wants to compare treatment or group means, one should use con- fidence intervals (c.i.) or the standard error of the mean (s.e.). These last two statistics decrease in size with an increase in the number of replicates, reflecting increased reliability of the estimated means." - Aphalo et al., 2012

1 Recommendation

CNRS

You should read this article

Error bars

It helped me a lot concerning this topic.

2 Recommendations

Consultant Pathologist and Transfusion Medicine Specialist to Patankar Pathology, Laha Diagnostic , Madhur Pathology and Emergency Blood Bank , Gwalior, India

A useful conversation to read

Iran University of Medical Sciences

I understood my mistake!

SD is for the sample (not the population). When we divide it by **√n** it is called SE, which is an estimate for the population's SD (not the sample's SD)!!!

CI = 2SE (or 2SD of the population)

Therefore, 2SE < 1SD

In other words, 2SD (population) < 1SD (sample)

1 Recommendation

Justus-Liebig-Universität Gießen

Funnily, the article about statistical errors is itself not free of errors. For instance, a confidence interval is *not* "**a range of values that is expected to contain estimates of similar studies with a given probability**".

Cardiology Oncology Research Collaborative Group (CORCG)

You just have to watch these two amazing videos by Joshua Starmer (a geneticist) on his YouTube channel, where he explains statistics in a super easy and funny way...

StatQuickie: Standard Deviation vs Standard Error https://www.youtube.com/watch?v=A82brFpdr9g

StatQuest: The standard error https://www.youtube.com/watch?v=XNgt7F6FqDU

Hope this would help you ;)

4 Recommendations

Universität Osnabrück

@Jochen: "funny" may be not the right description. ;-)

How can it be that confidence intervals are most often described wrongly?

1 Recommendation

Justus-Liebig-Universität Gießen

@Rainer,

I don't know... maybe because most people don't realize how weird, indirect and counter-intuitive frequentist reasoning is? The mistake is obviously already made when interpreting p-values, and it is consequently transferred to the interpretation of CIs.

Also a nice one: "*Most studies report several P values, which increases the risk of making a type I error: such as saying that a treatment is effective when chance is a more likely explanation for the results*" - the common but wrong view that P values are related to the type-I error, and treating "chance" as an explanation. This is also a wrong concept I read very, very often.

Another very common "mistake" is that almost everything is reduced to t-tests. The text also focuses too much on t-tests. It gives the (wrong) impression that a "parametric test" is always a t-test. And the "*Wilcoxon rank-sum test (or another non-parametric test)*" is proposed as an alternative in cases where the t-test is not appropriate (but it then tests an entirely different hypothesis!). This is also written very frequently.

Less common is the bad example given in Fig. 5. This is just a case where modelling a linear relationship is perfectly fine (but it is used as a counter-example). One might check whether something went wrong with the single outlying point, but this point has not much impact (the CI for the slope is [3.1, 4.4] with this point and [3.0, 3.3] without), instead of considering some strange non-linear relationship that explains this point.

I also sometimes see bar charts with broken axes. I don't consider this any good practice (using barcharts is already often not a good practice in the first place!).

However, the manuscript also makes many good and valid points. It's worth reading.

Aswan University

Building on the answers of Prof. Jochen and Prof. Kenneth Carling:
As Jochen made clear, SD is about the variation in a variable, whereas the standard error is about a statistic (calculated on a sample of observations of a variable), and the SEM is about the specific statistic "mean". If you want to describe the variation of a (normally distributed) variable, use the SD; if you want to describe the uncertainty of the population mean relying on a sample mean (when the central limit theorem is applicable), use the SEM.

Shahjalal University of Science and Technology

Please, can anyone tell me how to calculate mean±SD for my given individual sample?

Harvard University

With respect to our statistician colleagues, here is my take on the difference between SD and SEM:

Standard deviation (SD) calculates the dispersion or the variability of the "population/dataset" around the mean of that particular "population/dataset". So SD is a measure of the variability within a "population/dataset".

Standard error of the mean (SEM) is a measure that quantifies how far your "sample" is likely to be from the "true" mean of the "population". SEM is simply the SD of the averages of repeated experiments. The lower the SEM is, the more likely it is that your calculated mean is close to the actual mean of the "population". In other words, SEM quantifies the precision of the mean.

In an ideal condition (if we had all the time, energy, and sanity) the SEM of a sample could be calculated as the SD of "all of the averages (means)" from a population/dataset. In other words, SEM is the result of measuring the variability of a "sample" from the true "population" by calculating the SD of all those averages; so, SEM is simply the SD of the averages of repeated experiments.

For example, you can estimate the mean and SD of pulse rates for an athlete in one day by serial measurements of pulse rate. The SEM of the pulse rate for the same athlete can be measured by a) serial calculation of pulse rates over several assigned days, followed by b) calculating the SD of the averages of those pulse rates over all measurement days!

Thankfully statisticians provided a formula to estimate the SEM without having to repeat all those experiments and compromising the integrity of what it means. We have a classic formula to measure SEM: SEM is calculated by dividing the SD by the square root of the sample size of the experiment.
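This "repeated experiments" definition can be checked by simulation. A small sketch (the population parameters and sample sizes are made up; with SD = 10 and n = 25, the SD of the sample means should come out near 10/√25 = 2):

```python
import random
import statistics

random.seed(1)
pop_mean, pop_sd, n, k = 70.0, 10.0, 25, 20000

# draw k samples of size n, record each sample's mean
means = [statistics.fmean(random.gauss(pop_mean, pop_sd) for _ in range(n))
         for _ in range(k)]

# the SD of those means should be close to pop_sd / sqrt(n) = 2.0
sd_of_means = statistics.stdev(means)
```

The simulated value lands very close to the formula SEM = SD/√n, which is exactly the shortcut the statisticians' formula provides.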

16 Recommendations

University of Utah

The standard deviation (SD) is a measure of the dispersion of the sample. It will generally be larger than the standard error of the mean, which is simply - as Maryam points out - SD/√n. The SE thus corrects the SD by the size of the sample: small samples from a population will have larger SEs than large samples from the same population. It standardizes the dispersion measure by sample size.

2 Recommendations

University of Utah

Again, the two most popular answers are those of self-appointed experts who do not know their statistics. Thank you Maryam for your fine, statistically accurate description free of opinionated bias and excessive hubris.

The statement "*presenting the CI should be preferred over the SE" is misleading. The SE is a simple normalization of the SD, and the CI is a simple function of the SD. The CI and the SD contain the very same information. They are mathematically confounded.

1 Recommendation

Universität Osnabrück

Dear Patrice,

could you please elaborate on why the statement "presenting the CI should be preferred over the SE" is misleading? Also, the CI, SE and SD do not contain the same information and have different purposes. You are right insofar as the sample size is implicitly part of the SD calculation, but with only the SD value at hand, you can't say anything about the width of the SE, and hence not about the CI. Both are needed for frequentist inference and each has its own purpose. Confounded, yes, because the SD is the upper limit of the SE, but there is nothing more you can say, since N is independent of the population standard deviation, for which the SD is an estimate.

National Institute of Occupational Health

Hey, you may also refer to the article of Barde and Barde, Perspect Clin Res. 2012.

Link for the article :

1 Recommendation

University of Utah

Rainer, the statement is misleading because it suggests that the CI has unique information independent of the SD and/or SE. If you know the SD, you can calculate the CI, which is a very convenient way to present the variability around the mean but is mathematically derived from the standard error. The CI is an easy and convenient function of the standard deviation, the critical value and the sample size (in other words, of the SE).

Both SD and CI add needed information, as different expressions of the same thing: the variance. The mean is really not useful without a measure of variability around the mean. The p-value doesn't give you the upper and lower limits of the CI but does tell you whether the CI covers the values expected under the null.

1 Recommendation

Mekelle University

If you are inferring about the population from the estimates of a sample, using the standard error of the mean (SEM) is preferable; but when you simply want to describe the data of a given sample, you can use the standard deviation (SD). By the way, the sample-size issue needs to be considered.

3 Recommendations

University of Al-Qadisiyah

SD and/or SE. If you know the SD, you can calculate the CI, which is a very convenient way to present the variability around the mean but is mathematically derived from the standard error.

1 Recommendation

University of Utah

Wisam and Leake. Thank you for quoting me verbatim in your answer. For the more detailed answer please see my post of a year ago where you will find these exact statements.

Furthermore, the two "most popular" answers from Jochen and Ronan still have misleading information. The CI and the SD contain absolutely the same variance (σ²) information around the mean. The CI merely gives the upper and lower bounds of the interval most likely to cover the mean. As a convenient function of the standard deviation, it is a very convenient way to display the standard error for a given level of confidence (say 95%):

*Upper limit CI = mean + 1.96 × (s/√n)*

*Lower limit CI = mean − 1.96 × (s/√n)*

or

*CI = (mean − 1.96 × SE, mean + 1.96 × SE)*

where s is the sample standard deviation and SE is the standard error of the mean, which corrects for sample size.

Indeed it can readily be seen that the CI is a function of the standard deviation (and the mean). In other words, the SE and the CI are completely confounded with respect to one another: the CI cannot be defined in the absence of the estimated standard deviation. The CI is simply far easier to interpret than mean±SE.
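The 95% claim behind these formulas can also be checked by simulation (a sketch with made-up population parameters; using the z value 1.96 with n = 30, the empirical coverage lands close to, and slightly below, 95%):

```python
import math
import random
import statistics

random.seed(42)
mu, sigma, n, trials = 50.0, 8.0, 30, 5000

covered = 0
for _ in range(trials):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    m = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    lo, hi = m - 1.96 * se, m + 1.96 * se
    covered += lo <= mu <= hi       # did the CI capture the true mean?

coverage = covered / trials
```

The small shortfall below 95% comes from using the z critical value instead of the t critical value at this sample size.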

3 Recommendations

Leibniz Institute of Agricultural Development in Transition Economies

This is not a trivial question. Many researchers, even statisticians, misuse SD and SEM. In most cases, SD should be reported, instead of SEM. I used to be confused too.

SD describes the variation of the values of a variable, while SEM is the standard deviation of the sample means. In other words, SD is about how spread out the data values in the sample/population are; SEM is about the uncertainty (or precision) of the sample mean (as means vary when taking a new sample from the same population). Mathematically, SEM = SD/sqrt(N), where N is the sample size. As N >= 1, SD >= SEM. Thus, researchers may tend to report SEM (smaller values) rather than SD.

@Ronán Michael Conroy was largely right: in the case that you need to report the SEM, you would rather report the CI instead, which is more intuitive, although mathematically they are confounded, as Patrice Showers Corneli pointed out.

Here are some useful literature:

Guidelines for reporting statistics in journals published by the American Physiological Society:

Inappropriate use of standard error of the mean when reporting variability of study samples: https://www.sciencedirect.com/science/article/pii/S1028455914000084

Mean (Standard Deviation) or Mean (Standard Error of Mean): Time to Ponder:

14 Recommendations

University of Florida

This is just a rhetorical difference. SD is the technically pure term; it doesn't have the pejorative sense of having "erred". But remember, it measures a "standard" departure from the mean, which is also an "error" if you use this value as a predictor of the mean. Keep in mind that if you sample a random variable, you will not observe the value of the mean but something different (even though the value you observe is an unbiased estimator of the mean). If you take the difference between that value and the mean, you can call this an "error" in estimation if you are using the sample value to predict the mean. If you take the average of these errors, you get a "standard" error, which is also a standard "deviation": the difference on average between sample values and the true mean. Stick to the actual meanings of the words and you will be fine.

1 Recommendation

University of Utah

In statistics, error does not mean mistake. It is simply variation or departure from the mean. Both measures are deviations from the mean (or measures of error). The difference is that the SEM takes into account the sample size, which is a more-than-rhetorical difference.

2 Recommendations

University of Science and Technology Houari Boumediene

Using **three SD** means that **all the numbers** are inside the data range of the expected values, as in engineering measurements, which use **mean±3SD**. We then have a data range of 6SD.

Using **one SD** means that **most of the numbers** are around the mean value, with a data range of 2SD.

1 Recommendation

University of Utah

Mr. Moussaoui, with due respect, your definition does not agree in any way with statistical theory.

2 Recommendations

Universiti Kebangsaan Malaysia

In biomedical journals, Standard Error of Mean (SEM) and Standard Deviation (SD) are used interchangeably to express the variability. However, they measure different parameters. SEM quantifies uncertainty in estimate of the mean whereas SD indicates dispersion of the data from mean.

In other words, SD characterizes the typical distance of an observation from the distribution center or middle value. If observations are more disperse, then there will be more variability. Thus, a low SD signifies less variability, while a high SD indicates more spread-out data.

On the other hand, SEM by itself does not convey much useful information. Its main function is to help construct confidence intervals (CI). CI is the range of values that is believed to encompass the actual (“true”) population value. This true population value usually is not known, but can be estimated from an appropriately selected sample. Wider CIs indicate lesser precision, while narrower ones indicate greater precision.

In conclusion, SD quantifies the variability, whereas SEM quantifies uncertainty in estimate of the mean. As readers are generally interested in knowing the variability within sample and not proximity of mean to the population mean, data should be precisely summarized with SD and not with SEM.

In general, the use of the SEM should be limited to inferential statistics where the author explicitly wants to inform the reader about the precision of the study, and how well the sample truly represents the entire population.

Kindly refer to these citations for additional information:

1. What to use to express the variability of data: Standard deviation or standard error of mean? https://doi.org/10.4103/2229-3485.100662

2. Empowering statistical methods for cellular and molecular biologists. https://doi.org/10.1091/mbc.E15-02-0076

3. Error bars in experimental biology. https://doi.org/10.1083/jcb.200611141

8 Recommendations

Teledyne Technologies

Under the normality assumption, the SEM is a function of the sample size n, and the SD is the special case where n = 1. So, in measurement science, the SD is sometimes known as the "single-measurement standard deviation".

1 Recommendation

University of Science and Technology Houari Boumediene

Mr Patrice Showers Corneli You are wrong.

What do you mean ?

I have used the same standards like ASME.

If you have not understood, see my paper:

Structural Probabilistic Health Monitoring of a Potentially Damaged Bridge,
Mohammed Lamine Moussaoui, Mohamed Chabaat and Abderrahmane Kibboua,

4th International Conference on Materials Design and Applications ICMDA 2021, Tokyo, Japan

University of Port Harcourt

I agree with Francis Tieng

Teledyne Technologies

I disagree with these two claims: "The standard error is a rather useless statistic" and "... SEM by itself does not convey much useful information" in the above discussions of Ronán Michael Conroy and Francis Tieng, respectively. In measurement uncertainty analysis (e.g. the GUM uncertainty framework), SEM is known as the Type A standard uncertainty (SU). SU is the fundamental quantity in the Law of Propagation of Uncertainty (LPU).

GUM: Guide to the Expression of Uncertainty in Measurement https://www.bipm.org/en/committees/jc/jcgm/publications

Justus-Liebig-Universität Gießen

For sure, the SEM indicates the uncertainty associated with a mean of independent measurements, as it is the square root of the variance of the sampling distribution of the mean.

But the *statistic* (the estimated value from the observed data) *can be extremely misleading*. To give an example: you have a Poisson process and you are counting the number of events in a given interval to estimate the expected value. To make an extreme case: you have only 2 observations: 12 and once again 12. So the best guess about the expectation is 12. With the SEM statistic calculated "as usual", the SEM is 0. So there is no uncertainty at all. This is obviously nonsense.

Ok, let's say you have 3 observations and they are not all identical. Say: 12, 10 and 14. The mean is 12 again, and the SEM is 1.15. Well, now you have some finite estimate of an uncertainty, but that's not really better. Given that the data are from a Poisson process, we know by theory that the variance is at least as large as the expectation, so if we guess that the expectation is 12, then for n=3 the SEM is at least 2.

The same applies for the normal distribution, where we usually don't know the variance, so we usually don't know the SEM either and we must estimate it from the observed data. And in small samples, these estimates can be very wrong, and this is not a very rare event.
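The numbers in the Poisson example are easy to reproduce (a small sketch of the calculation):

```python
import math
import statistics

# three counts from a Poisson process
obs = [12, 10, 14]
mean = statistics.fmean(obs)                          # 12.0

# the SEM calculated "as usual": sample SD / sqrt(n)
sem_usual = statistics.stdev(obs) / math.sqrt(len(obs))

# Poisson theory: variance >= expectation, so the SEM should be
# at least sqrt(mean / n) = sqrt(12 / 3) = 2
sem_floor = math.sqrt(mean / len(obs))
```

The "usual" SEM (about 1.15) is well below the theoretical floor of 2, which is exactly the point: the naive estimate understates the uncertainty here.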

3 Recommendations

King's College London

I think SE is more suitable for induction about the mean of X while SD is more appropriate if we are going to infer about X itself.

1 Recommendation

Teledyne Technologies

Jochen Wilhelm your viewpoints apply to any "statistic" calculated from a small sample. But the limitation, i.e. the large uncertainty, of a statistic does not rule out its usefulness. Besides, the probability of two identical observations randomly drawn from a Poisson distribution is zero, which won't happen in the real world. A nice property of a statistic is that, on average, it converges to the parameter.

Justus-Liebig-Universität Gießen

The probability to get twice the same number in two subsequent realizations of a Poisson variable with mean lambda is approximately 1/(2·sqrt(π·lambda)). For lambda = 8 this is about 10%. And this certainly happens in the real world.

Regarding the convergence on average: note that the variance is unbiased, but not the square root of the variance.
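The 10% figure for lambda = 8 can be checked directly by summing the squared Poisson probabilities (a sketch; the cutoff at k = 60 is safe because the Poisson(8) tail beyond it is numerically negligible):

```python
import math

lam = 8.0

def pois_pmf(k: int) -> float:
    """Poisson(lam) probability mass at k."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

# P(X1 == X2) = sum over k of P(X = k)^2
p_equal = sum(pois_pmf(k) ** 2 for k in range(61))
```

The sum comes out at roughly 0.10, so seeing the same count twice in a row is anything but impossible for a discrete distribution.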

Teledyne Technologies

Indeed, the sample variance is unbiased, but the sample SD or SEM is biased. I advocate the use of unbiased SU in measurement uncertainty analysis.

Simon Fraser University

Francis Tieng has a great answer. But many researchers still use the SEM because it is smaller, so it looks better.

Baskent University

For example, I want to understand whether a treatment is effective. I have a placebo group and a treatment group, and I am looking at how long the disease lasts after intervention, captured in the variables P and T.

What I am really interested in is the distributions of P and T and how they differ.

What captures the essence of these distributions are the mean and the standard deviation,

so I actually want to see the standard deviation instead of the SE.

----

What is answered by the SE is whether the two population means are different.

As far as I understand, it doesn't say anything about the shape of the distribution of the populations; maybe someone can correct me if I'm wrong.

Justus-Liebig-Universität Gießen

I don't see why mean and SD are "descriptive", at least not why they are generally so.

The mean equals the average, but it has a different meaning. I'd agree that the average is a descriptive statistic. But the mean is an estimate of the expected value of an assumed distribution model of which the data are considered a sample. To my understanding, this is very inferential. The problem is that for any given set of values, average and mean have the very same numerical value. The difference is only in the meaning, and this is rather abstract.

The SD is a complicated beast, as it is the square root of the variance, and this variance can be taken either as the *average* of the squared *residuals* (the sum is divided by n) or as an *estimate of* the expected squared *error* (the sum is divided by n-1). These two things are not numerically identical (although the difference becomes negligible for large n). I may agree on calling the average of the squared residuals a descriptive statistic, but not the estimate. The same holds for the square roots of these values, the SDs. Practically, however, reported SDs are typically calculated with division by n-1, so it really is the inferential statistic. Reporting this as descriptive is wrong in any case.

The SEM is always calculated based on the "inferential SD" (division by n-1) and therefore is always an inferential measure (here we agree).
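The two divisors can be shown side by side on a toy sample (the numbers are made up for illustration):

```python
# toy sample to contrast the two "variances"
data = [2.0, 4.0, 6.0]
n = len(data)
mean = sum(data) / n                      # 4.0
ss = sum((x - mean) ** 2 for x in data)   # sum of squared residuals: 8.0

var_descriptive = ss / n                  # average of squared residuals
var_estimate = ss / (n - 1)               # estimate of expected squared error
```

With n = 3 the two values (8/3 versus 4) differ noticeably; the gap shrinks as n grows, as the text notes.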

1 Recommendation

College of Saint Benedict and Saint John's University

I'm not sure I follow your statement, "But the mean is an estimate of the expected value of an assumed distribution model of which the data are considered a sample."

I don't think the mean is always interpreted as an estimate of an assumed distribution model. The mean is the mean regardless of the characteristics of a distribution of scores. Whether we consider the mean to be an accurate measure of central tendency for a collection of observations is a separate question, to my mind - and we need not assume this to be the case.

I consider a statistic "descriptive" if it is intended to apply exclusively to the set of *observed* data (i.e., the sample), and not intended to represent *unobserved* data (i.e., parameters). For example, the mean height of sample A is in no way taken as reflective of the mean height of sample B, or of any collection of cases outside of sample A.

I'm curious to hear your thoughts

Justus-Liebig-Universität Gießen

Dear Blaine Tomkins ,

This may be a language problem, and I possibly implied a wrong distinction between the English words "mean" and "average". My difficulty is to distinguish the meaning from the words used. So let's be more concrete and distinguish the arithmetic mean (of a set of values) and the expected value (of a random variable) (and avoid the word "average"):

The arithmetic mean is a sample statistic; the expected value is something that can be estimated from a sample. A BLUE and MLE for the expected value is the arithmetic mean of the sample (this is, btw., independent of the assumed distribution model - it applies to any distribution with a finite expectation).

After all, one calculates the sum of a set of values and divides this by the number of values. This statistic may be interpreted as a descriptive statistic but also as an inferential statistic. This is fully in line with what you say about the *intention* of the statistic.

My critique on Konstantinos' post was that it is not automatically just a descriptive statistic. It can have an inferential interpretation. And I think it is in fact more often used in its inferential meaning than in its descriptive meaning.

So I think we both say the same, but my English skills are in need of improvement ;)

College of Saint Benedict and Saint John's University

Jochen Wilhelm I think I see what you mean now. I agree. In practice, the mean is more often understood and used as an inferential statistic. The mean of a sample is, by itself, rather uninformative of anything. In nearly all cases we're interested in the mean in order to answer some question that goes beyond the observed data - even if it's as simple as comparing one mean to another mean.

I cannot fault you for your English when I only know how to read and write English and (to some extent) Spanish. English is a hard language to master, given how many irregular verbs it has and the large number of unusual spellings. All in all, your English skills are quite impressive.

University of Utah

The answers given in the 7 years since the original question was posed by Dr. Sailesh Palikhe have been all over the place, with wacky ideas.

The question is: what is the difference between "mean±SD" and "mean±SE"? These have rather precise mathematical definitions.

A Normal (Gaussian) population is defined completely by the population mean **𝝁** and the population variance **𝝈**^{2}. These are unknowable, but under the assumptions of the Normal distribution we can derive with calculus the Maximum Likelihood Estimators (MLE): mean = ∑_{i}(x_{i})/n and variance = ∑_{i}(x_{i}-mean)^{2}/n. In the limit, these estimators converge to **𝝁** and **𝝈**^{2} (and the standardized deviation to **𝝈**). These describe the central measure and the deviations from the central measure.

So, by definition, the MLE(**𝝈**) = √(∑_{i}(x_{i}-mean)^{2}/n). With a small sample size, the estimator SD is somewhat biased to be large.

To help correct for that bias and get a value closer to the true **𝝈**, we use the SE, the standard error, which goes to **𝝈** in the limit as n gets larger. By definition, the SE = √(∑_{i}(observation_{i}-mean)^{2}/(n-1)).

That is all there is to it.

Justus-Liebig-Universität Gießen

Patrice Showers Corneli , why do you stress the normal distribution? What you describe is independent of this particular distribution. There are numerous other distributions that are fully described by their first two moments. And even if the distribution is fully described by its first moment only (like, for instance, the Poisson), estimates for the variance (or SD) are just not informative but still correct.

1 Recommendation

Lakehead University Thunder Bay Campus

**EDIT**. I failed to notice (until now) that Patrick Gauthier's name was selected when I was aiming for **@Patrice Showers Corneli**! Sorry about that, Patrick!

Patrick T Gauthier, you wrote:

"By definition, the SE = √( ∑

_{i}(observation_{i}-mean)^{2}/(n-1))."That is the equation for the sample SD, not the SE. The SE of the mean = SD/sqrt(n). I assume it was a typo. HTH.
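A small numeric check of the distinction (a sketch in Python with made-up values; `statistics.stdev` uses the n-1 divisor):

```python
import math
import statistics

data = [10.0, 12.0, 9.0, 11.0, 13.0, 8.0]
n = len(data)

sd = statistics.stdev(data)  # sample SD, divisor n-1
se = sd / math.sqrt(n)       # standard error of the mean = SD/sqrt(n)

# The n-1 formula by itself gives the SD, not the SE:
mean = statistics.fmean(data)
sd_by_hand = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))

print(sd, se, sd_by_hand)
```

Here `sd_by_hand` reproduces `sd` exactly, while the SE is smaller by a factor of sqrt(n).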

1 Recommendation

Lakehead University Thunder Bay Campus

What statistical software do you use, Patrice Showers Corneli? I would like to check its documentation.

Meanwhile, if you trust that SAS, Minitab & SPSS calculate these basic statistics correctly, you can check these pages:

- https://v8doc.sas.com/sashtml/proc/zormulas.htm
- https://support.minitab.com/en-us/minitab/21/help-and-how-to/statistics/basic-statistics/how-to/display-descriptive-statistics/methods-and-formulas/methods-and-formulas/
- https://www.ibm.com/docs/en/SSLVMB_28.0.0/pdf/IBM_SPSS_Statistics_Algorithms.pdf -- look at the algorithms for the **EXAMINE** command

HTH.

1 Recommendation

University of Utah

I have used countless software packages. I started out with from-scratch statistical matrix analysis using MatLab and the old Bell Labs S, then S-Plus, and recently R, which is derived from S, and also the pioneering generalized linear modeling program GLIM (Generalized Linear Interactive Models) by John Nelder, along with the classic generalized linear modeling book by McCullagh and Nelder and the wonderful Annette Dobson book on the same topic. If you want to know how I think about statistics, read one of these books. Read also papers and books by the great R.A. Fisher and, even better, AWF Edwards' classic "Likelihood".

I use JMP (SAS) for full exploratory data analysis and multivariate statistical analysis (Principal Component Analysis, MANOVA, etc.). For my maximum likelihood model analysis of DNA sequences, I use PAUP, Phylip, RAxML, PhyML, Mesquite, HYPHY and many others. I can tell you all you might want to know about the relationships among the Normal, Chi-Sq, Binomial, Multinomial, Gamma, Beta, negative binomial, etc.

Why do you ask?

University of Utah

Some clarifications here. A perusal of the web sites shows a number of conflicting definitions of SD and SE. This clarifies why the topic came up in the first place, and it means that I cannot stand firmly by the definitions learned in my graduate school statistics course. I note, for example, the various software primers that Bruce Weaver suggested. The first one, from a SAS primer on basic statistics, is identical to mine; the other two (including Minitab, which I have used and liked very much) define them the other way around. Perhaps we all learned them differently.

I am presently away from my grad school mathematical statistics reference works, so I will have to double-check just what I learned in the next few days.

Universität Osnabrück

Patrice Showers Corneli can you say where you found conflicting definitions of SD and SE and describe them? It seems pretty clear in the links Bruce provided, imho.

1 Recommendation

University of Utah

I leave that to you. In short, the SAS page supports SE = √(∑_{i}(observation_{i}-mean)^{2}/(n-1)). The other two support SE = √(∑_{i}(observation_{i}-mean)^{2}/n).

College of Saint Benedict and Saint John's University

This is why I still require my intro stats students to conduct statistical tests by hand in addition to using software.

1 Recommendation

Universität Osnabrück

Patrice Showers Corneli I think you are mistaken here.

On the SAS page, they clearly give you both options you mention to calculate the variance, and what Jochen already described as "the *average* of the squared *residuals* (the sum is divided by n) or an *estimate of* the expected squared *error* (the sum is divided by n-1)". You can choose whether "d" in the formula is N or df, where df is the default option. The standard deviation is described as the square root of the variance. Directly below the SD description you find the formula SAS uses to calculate the "STDERR | STDMEAN": s/sqrt(sum(w_{i})), which corresponds to what others have already stated. The documentation for SPSS and Minitab shows the same formulas to calculate variance, standard deviation and/or standard error; no contradiction here.

Therefore, may it be the case that you are simply wrong and misread the documentation? For example, Minitab states that the SE is s/sqrt(N), which is clearly not the same as the √(∑_{i}(observation_{i}-mean)^{2}/n) that you stated above.

1 Recommendation
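The difference between the two quantities is easy to see numerically (a sketch in Python; the sample values are invented for the demonstration):

```python
import math

# Made-up sample to compare the quantities discussed above.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
n = len(data)
mean = sum(data) / n
ss = sum((x - mean) ** 2 for x in data)

s = math.sqrt(ss / (n - 1))   # sample SD (n-1 divisor), Minitab's "s"
se = s / math.sqrt(n)         # Minitab's SE of the mean: s/sqrt(N)
not_se = math.sqrt(ss / n)    # population-style SD (n divisor) -- NOT the SE

print(se, not_se)
```

For this sample, s/sqrt(N) is about 0.76 while √(ss/n) is 2.0, so the two formulas are clearly not interchangeable.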

## Similar questions and discussions

## Related Publications

The physicist Ernest Rutherford said, "If your experiment needs statistics, you ought to have done a better experiment." Although this aphorism remains true for much of today's research in cell biology, a basic understanding of statistics can be useful to cell biologists to help in monitoring the conduct of their experiments, in interpreting the re...

Statistical analyses involving multiple predictors are generalizations of simpler techniques developed for investigating associations between outcomes and single predictors. Although many of these should be familiar from basic statistics courses, we review some of the key ideas and methods here as background for the methods covered in the rest of t...