National Health Authority

Question

Asked 14th Apr, 2015

# How to handle Narrow Confidence Intervals?

I am running a survival analysis using SAS. I have a very large sample size (>19,000) and I am finding very narrow CIs, for example 1.066 (1.062-1.069). The model is also weighted because it is a complex survey with mortality follow-up. What would explain the narrow CIs?

## Most recent answer

If the confidence interval is relatively narrow (e.g. 0.70 to 0.80), the effect size is known precisely. If the interval is wider (e.g. 0.60 to 0.93) the uncertainty is greater. Intervals that are very wide (e.g. 0.50 to 1.10) indicate that we have little knowledge about the effect and that further information is needed.

1 Recommendation

## All Answers (13)

Linden Consulting Group, LLC

Your very large sample size may very well explain your narrow CIs. As a test, take a random sample from these data (say, 1,000 units) and rerun the analysis. I would bet the bank that your CIs will widen accordingly.

That said, there may be something going on with the weighting that may impact the estimates as well.
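The subsampling check suggested above can be sketched in Python (simulated data, since the real dataset isn't posted here; the 1.066 mean and 0.2 SD are made-up numbers): the CI for a mean from a 1,000-unit subsample comes out several times wider than from the full 19,000.

```python
import math
import random
import statistics

random.seed(42)

# Hypothetical stand-in for the real dataset (n = 19,000, arbitrary scale).
full = [random.gauss(1.066, 0.2) for _ in range(19_000)]
sub = random.sample(full, 1_000)  # the suggested random subsample

def ci_width(data, z=1.96):
    """Width of an approximate 95% CI for the mean."""
    return 2 * z * statistics.stdev(data) / math.sqrt(len(data))

print(f"full sample (n={len(full)}): CI width = {ci_width(full):.4f}")
print(f"subsample   (n={len(sub)}):  CI width = {ci_width(sub):.4f}")
```

With roughly equal sample SDs, the width ratio is about sqrt(19000/1000), i.e. a bit over 4.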

1 Recommendation

Bennett University

The standard error of the mean and the sample size (n) are inversely related by a factor of sqrt(n). Imagine it like this: if your sample size tends to infinity, it is no longer a sample but the universe. Hence whatever mean and standard deviation you have are the true statistics, and there is no need for a CI because you already know the mean. Your large sample size drives the standard error down and hence the narrow CI.

Justus-Liebig-Universität Gießen

As already said, the width of the CI is (also) a function of the sample size. Larger sample size -> smaller CI. Huge samples -> tiny CIs.

The width of the CI is (an estimate of) the *precision* obtained in estimating the parameter, *based on the sampling variation*. Note that this precision can be arbitrarily high with arbitrarily large sample sizes, and that **it does not indicate** any "correctness" of the estimate. The "precision" measured by the CI is a sample-statistical feature, and **it must not be mistaken** for the "likely range of the 'true' parameter value"! (As such it is often *wrongly* described in stats textbooks.) It is easily possible to get a very precise estimate that is grossly "wrong", simply because the underlying model was "wrong" (for instance, missing a non-linear relationship, a relevant predictor, or an interaction) or the experiment was inadequate (for instance, non-random sampling, introduced biases, "wrong" read-outs, etc.).

You first have to discuss the adequacy of the experiment; you then have to discuss different sensible models; then you may decide on a favoured model (for scientific [not statistical!] reasons), assuming it is adequate or the best available for your purpose (you may also actually fit different models and see how this impacts the estimation of your parameter of interest). Then you can take the widths of the CIs as an indication of the *statistical* precision of the parameter estimates and discuss their location (i.e. the effect sizes) in the context of your *scientific* question (e.g. is an OR of 1.066, or up to 1.069, somehow relevant?).

2 Recommendations

Eli Lilly

Thank you for the additional and insightful information. Dr Linden, I did run a random sample and the CIs were slightly wider, although I ended up losing significance for some of the variables.

What do you all think about normalizing the weights in SAS?

Royal College of Surgeons in Ireland

I think that the size of the effect is part of the explanation. If this is the hazard ratio for blood pressure or age, for example, the effect size is for a one-millimetre or one-year increase in the predictor, so the effect is very small.

In this example, using age in decades or 5-year increments gives a more interpretable estimate of the effect on risk. The CIs look fine; it's the effect size that looks odd.
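A worked version of this rescaling, assuming the 1.066 in the question is a hazard ratio per one-year increase in age: because the log-hazard is linear in the predictor, the per-decade HR is simply the per-year HR raised to the 10th power, and the CI endpoints rescale the same way.

```python
# Point estimate and CI from the question, assumed to be per one-year age step.
hr_year, lo_year, hi_year = 1.066, 1.062, 1.069

# The log-hazard is linear in age, so a k-unit step raises the HR to the k-th power.
hr_decade = hr_year ** 10
ci_decade = (lo_year ** 10, hi_year ** 10)

print(f"HR per year:   {hr_year:.3f} ({lo_year:.3f}-{hi_year:.3f})")
print(f"HR per decade: {hr_decade:.3f} "
      f"({ci_decade[0]:.3f}-{ci_decade[1]:.3f})")  # roughly 1.89 (1.82-1.95)
```

The same narrow-looking interval becomes a clearly interpretable effect once the predictor is on a meaningful scale.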

1 Recommendation

Oslo City CURATO

I am not sure that I understand Jochen Wilhelm correctly. To me, precision concerns measurement error: the variation between several "identically" performed measurements on one object, described by dispersion parameters. To describe this error you need to make such repeated measurements on at least 15-20 objects. This error does not depend on sample size. Sample variation, which is not an error, is also described by dispersion parameters, and does not change to any degree once you have measured approximately 20 cases. Accuracy is the difference between means and is described by location parameters. Accuracy is also independent of sample size.

Milin Padalkar's answer could be read as saying that a large sample size causes a low sample standard deviation (s) and therefore a narrow CI. I disagree with that reading: the sample standard deviation is independent of sample size, while the CI is, as stated, dependent on sample size.
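A small numerical check of this point (simulated normal data with an arbitrary mean of 50 and SD of 10): the sample SD stabilises as n grows, while the CI half-width keeps shrinking like 1/sqrt(n).

```python
import math
import random
import statistics

random.seed(1)
for n in (100, 1_000, 10_000):
    x = [random.gauss(50, 10) for _ in range(n)]   # SD fixed at 10
    s = statistics.stdev(x)                        # stays near 10 regardless of n
    half_width = 1.96 * s / math.sqrt(n)           # shrinks like 1/sqrt(n)
    print(f"n={n:>6}  s={s:5.2f}  95% CI half-width={half_width:.3f}")
```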

The CI shows you the interval within which the true mean is likely to be found, i.e. the mean when all objects in the universe are measured. There are, however, a number of reasons why measurements are always (!) more or less incorrect, and the CI only takes into account the error caused by your sampling. When all cases are measured this error is zero, and your CI has zero width.

What beats me is that some researchers recommend CIs rather than p-values by stating that CIs can be used for decision making and p-values cannot. I maintain that p-values and CIs carry exactly the same information: there is a one-to-one correspondence between them.
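The correspondence can be demonstrated for a simple z-test (illustrative numbers): the 95% CI excludes the null value exactly when the two-sided p-value is below 0.05.

```python
import math

def z_test(estimate, se, null=0.0):
    """Two-sided p-value and 95% CI for a normal (z) test."""
    z = abs(estimate - null) / se
    p = 2 * (1 - 0.5 * (1 + math.erf(z / math.sqrt(2))))  # 2 * (1 - Phi(|z|))
    return p, (estimate - 1.96 * se, estimate + 1.96 * se)

for estimate in (0.10, 0.30):   # illustrative effect estimates, SE = 0.1
    p, (lo, hi) = z_test(estimate, se=0.1)
    print(f"estimate={estimate}: p={p:.3f}, CI=({lo:.2f}, {hi:.2f}), "
          f"CI excludes 0: {not lo <= 0 <= hi}")
```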

It seems to me that the CI is sometimes treated as a dispersion parameter, which it is not!

In line with some of the above: when calculating p-values and CIs, the assumption is made that there is an infinitely large number of cases. Usually we make assessments based on few cases, and CIs and p-values help us to see the reliability of these assessments. When all cases are included we have a measured value, not an assessment, and CIs and p-values are meaningless. This is the case when we have small, well-defined groups.

U.S. Food and Drug Administration

Just a quick question, because you mentioned it's survey data: did you use PROC SURVEYPHREG (together with CLUSTER, DOMAIN, and STRATA statements)?

Justus-Liebig-Universität Gießen

Dear Arne,

I am not sure I understand you correctly. But we will probably be able to resolve our understanding problems...

There are different sources of variance, particularly the "biological" and the "technical" variance, both influencing the variability of the individual values we record/measure. Such recorded values are used to estimate means or effects. Such estimates are "uncertain", and the inverse of this uncertainty is called the "precision" (of the estimate).

Biological and technical variance are given "constants" (depending on the phenomenon measured and the way it is measured). They have nothing to do with sample size or how often we measure.

The precision of estimates does depend on the sample size. Having many repeated recordings of the same individual/entity/specimen can be used to estimate this individual's value with better precision: the "technical variance" can be eliminated by averaging many repeated measurements. Doing this with many individuals will show the inter-individual variability, i.e. the "biological" variance.

Having measurements from many individuals can be used to estimate the expected value of the measured response (some say: to estimate the population mean). This estimate has a precision that gets better the more individuals are sampled. If all existing individuals are actually sampled, the resulting mean is not an estimate anymore: this "estimate" has infinitely high precision, and there is no uncertainty left (some say: if the whole population is sampled, the sample mean *is* the population mean, so we know the true population mean and do not need to estimate it).
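This limiting case can be made concrete with the finite-population correction for sampling without replacement (illustrative N and s): the standard error reaches exactly zero when the whole population is sampled.

```python
import math

def se_fpc(s, n, N):
    """Standard error of a mean sampled without replacement from N units."""
    return (s / math.sqrt(n)) * math.sqrt((N - n) / (N - 1))

N, s = 10_000, 12.0                      # illustrative population size and SD
for n in (100, 5_000, 10_000):
    print(f"n={n:>6}: SE = {se_fpc(s, n, N):.4f}")   # 0 exactly when n == N
```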

The word "error" is meant in a statistical sense, not that something was "wrong".

Accuracy is not the same as precision. You can have a highly precise estimate that is not at all accurate. Accuracy means "being close to the correct value"; precision means "having little variability of estimates between replicate analyses". Bias is the enemy of accuracy, and there is no statistical weapon to fight it. Precision, on the other hand, can always be improved to any extent by increasing the sample size.
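A toy illustration of "precise but inaccurate" (entirely hypothetical numbers): a systematic +5 measurement bias survives any sample size, so a huge sample yields a very tight CI around the wrong value.

```python
import math
import random
import statistics

random.seed(7)
true_mean = 100.0

# Biased design: every observation carries a systematic +5 error, so no
# amount of data brings the estimate closer to the truth.
biased = [random.gauss(true_mean + 5, 10) for _ in range(20_000)]

m = statistics.mean(biased)
half = 1.96 * statistics.stdev(biased) / math.sqrt(len(biased))
print(f"estimate = {m:.2f}, 95% CI = ({m - half:.2f}, {m + half:.2f})")
print(f"true mean {true_mean} is far outside this narrow interval")
```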


In the following part of your response there are some misconceptions:

"

*CI shows you the interval within which a true mean is likely to be found when all objects in the universe is measured.*"No, this is NOT the meaning of a CI. A CI is a random interval. It is a property of the data, not of the "true mean". It may be likely that a sick person says in bed, but this does not mean that a person lying in a bed is sick.


"

*What beats me is that some researcher recommend CI rather than p-values by stating that CIs can be used for decision making, p-values not.*"You are right that p-values and CIs are two sided of the very same coin. And there is a habit to use CIs in the same way as p-values to "reject null hypotheses". This usage is silly and not intended (as you said one can use p-values for that). In contrast to p-values, CIs contain some information about the effect size ("how much", "how different", "how strong"), and this should be interpreted. And CIs are intervals. They do provide a "range of estimates (or hypotheses)" that is plausible in light of the current data. And this is related to "precision": a wider interval indicates a larger uncertainty about which estimates or hypotheses explain the data best.


"

*CIs and p-values help us to see the reliability of these assessments.*"No. Neither p-values nor CIs are measures of reliability. This is related to the misconception that a CI would give a range of "likely values of the true mean" above.

Oslo City CURATO

Dear Jochen

I am convinced that we agree. However, we are defining some concepts in somewhat different ways. Thanks for your response.

Arne

Oslo City CURATO

One more point. I hope we can agree that the CI gives us a range within which the true mean will lie in 95% of instances, assuming that the only source of error is the selection of cases. When the number of cases is large, this range is narrow and p-values (t-tests) are small. Usually such results are presented by means of bar charts with extra lines to show the CI. It seems that these extra lines may be misinterpreted as showing the variation of the values, which is an incorrect interpretation.

A more appropriate method for presenting such results may be box plots. In addition to showing the actual variation, the box plot is more appropriate for showing the location of results; the bar chart is more appropriate for showing the number of cases.

Justus-Liebig-Universität Gießen

I think we can :)

However, I do not like the word "likely", which is just another word for "probably". There is no probability for any given interval to contain the "true mean" (whatever "truth" is; this is more a religious question than a scientific one!). The probability is assigned to the procedure of constructing such intervals: the probability that the *procedure* of constructing 95% confidence intervals produces an interval containing the "true mean" is 0.95.

Regarding the graphical representation:
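This procedure-level statement is easy to verify by simulation (normal data with known variance, so plain z-intervals): close to 95% of the intervals constructed by the procedure cover the true mean.

```python
import math
import random

random.seed(0)
true_mean, sd, n, reps = 5.0, 2.0, 30, 2_000
half = 1.96 * sd / math.sqrt(n)          # known-variance z-interval half-width

covered = 0
for _ in range(reps):
    sample_mean = sum(random.gauss(true_mean, sd) for _ in range(n)) / n
    if sample_mean - half <= true_mean <= sample_mean + half:
        covered += 1

print(f"empirical coverage: {covered / reps:.3f}")   # close to 0.95
```

Any single interval either contains the true mean or it does not; the 0.95 belongs to the long-run behaviour shown here.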

There is a lot of confusion about what people show. A common example: one has two (or more) groups and wants to analyze the differences between the groups. I think it is very common in this case that people show the group means and spend some effort on calculating and showing the precision of these estimates (be it standard errors [SE] or confidence intervals [CI]), but they miss showing the actually important information: the estimated *differences* between the groups with their SEs or CIs.

The group means themselves may or may not be important. In my field they are often not, and sometimes they are completely irrelevant (like fluorescence measured in arbitrary units!). The differences are discussed, but the differences are not shown, and no precision measure for them is provided. This is bad but common practice.

So I think first one should ask whether the values of the response variable itself are of any scientific interest. If so (and only if so), these values should be shown. I agree that it would be most instructive to really show the values *as is*, in 1D scatterplots or "beeswarm plots" (see first link), showing the whole data: each datum is a dot, so you see the complete information, nothing is hidden; you see the sample size, the distribution, possible problems/outliers... everything. Only if the plot would be too busy, or the aim is to show a higher-level pattern across multiple groups, are summaries (like box plots) more appropriate.

But if the differences between the groups are the (main) concern, then these differences should be made explicit together with their precision measures (SEs or CIs; I surely very much prefer CIs). A convenient plot is Tukey's mean-difference plot (see second link).

Eli Lilly

Thank you all for your input. I have learnt much more than I anticipated. Great discussion.

1 Recommendation

