University of Utah
Question
Asked 27 March 2014
What number of points is required to fit Gaussian Distributions?
A Gaussian distribution has an exponential term and two parameters. Taking its logarithm turns it into a polynomial (a quadratic in x), which is linear in its coefficients. But I cannot figure out how many points suffice to fit a Gaussian distribution, and why. Some articles recommend 5 to 12 points, but they give no concrete reasons.
All Answers (8)
University of Utah
If there is no error in the points then three unique points will determine the fit. This is because the function being fitted has three degrees of freedom (amplitude, center, and width), so three points are needed to fit the data.
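As a rough illustration of the three-point statement, here is a minimal sketch. It assumes the points are exact, have positive values, and lie on an unnormalized Gaussian A·exp(-(x-μ)²/(2σ²)). The logarithm of such a curve is a parabola in x, so three points fix its three coefficients. The function names are hypothetical and chosen only for this example.

```python
import math


def gaussian(x, amplitude, mu, sigma):
    """Unnormalized Gaussian curve."""
    return amplitude * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))


def gaussian_from_three_points(p1, p2, p3):
    """Recover (amplitude, mu, sigma) from three exact points (x, y), y > 0."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    l1, l2, l3 = math.log(y1), math.log(y2), math.log(y3)

    # Newton divided differences give the parabola ln y = a*x^2 + b*x + c.
    d12 = (l2 - l1) / (x2 - x1)
    d23 = (l3 - l2) / (x3 - x2)
    a = (d23 - d12) / (x3 - x1)
    b = d12 - a * (x1 + x2)
    c = l1 - a * x1 ** 2 - b * x1

    if a >= 0:
        raise ValueError("log-curvature is not negative: not a Gaussian")

    sigma = math.sqrt(-1.0 / (2.0 * a))       # a = -1 / (2 sigma^2)
    mu = -b / (2.0 * a)                       # vertex of the parabola
    amplitude = math.exp(c - b ** 2 / (4.0 * a))  # value of ln y at the vertex
    return amplitude, mu, sigma


# Three exact points from a curve with made-up parameters.
true_params = (2.5, 1.0, 0.7)
points = [(x, gaussian(x, *true_params)) for x in (0.0, 1.5, 3.0)]
recovered = gaussian_from_three_points(*points)
```

With exact inputs, `recovered` matches `true_params` up to floating-point rounding. With noisy inputs the same three-point solution can be badly off, which is why the number of points matters once errors are present.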
If there is no error in the points and you somehow knew beforehand that the underlying distribution were normalized to unit probability then two points are enough to uniquely determine the distribution.
In making the comments above, several assumptions are made. One is that there is no error in the data. This assumption was already stated.
The most important underlying assumption is that the curve you are trying to fit actually is Gaussian.
Also, an assumption is made regarding the meaning of "points". In the discussion above the term "points" is taken to mean that the value of the underlying curve is known at two or three points. For example, if your friend calculated three points on a Gaussian curve and handed the results to you, that would be an example where you would know the value of three points on the curve.
Another case would be if you or someone else were generating a set of random data taken from a Gaussian distribution. This could be via a computer-generated Monte-Carlo sampling from a Gaussian distribution, or from a set of experimentally generated data, or some other statistically generated method. The "points" in this case have an entirely different meaning. They are now repeated draws from a Gaussian distribution. In this case the number of points required is much larger, and it depends on how close you need to be to the "true" values. For example, the statistical error in the mean of the distribution varies as one over the square root of the number of sample points.
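The 1/√N behavior for repeated draws can be checked with a small Monte Carlo experiment. The sketch below is only an illustration: the trial count, the seed, and the function name are arbitrary choices made for this example, and the draws come from Python's standard `random.gauss`.

```python
import math
import random
import statistics


def scatter_of_sample_mean(n_points, n_trials=4000, mu=0.0, sigma=1.0, seed=0):
    """Empirical standard deviation of the sample mean over many trials."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_trials):
        draws = [rng.gauss(mu, sigma) for _ in range(n_points)]
        means.append(sum(draws) / n_points)
    return statistics.stdev(means)


# Compare the observed scatter with the prediction sigma / sqrt(N).
for n in (4, 16, 64):
    observed = scatter_of_sample_mean(n)
    predicted = 1.0 / math.sqrt(n)
    print(f"N={n:3d}  observed={observed:.4f}  predicted={predicted:.4f}")
```

Each fourfold increase in the number of draws should roughly halve the scatter of the estimated mean.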
I have given an incomplete discussion, but it might be enough to get you started on the right track.
5 Recommendations
Gauhati University
Actually, there cannot be a definite answer to this question! Not just for the Gaussian probability law, but for every other probability law, it cannot be said that fitting the law requires a certain fixed number of observations. Indeed, the more, the better.
If the number of observations is small, then to fit the Gaussian law, or any other law for that matter, it is better to use the method of successive approximation based on the Taylor expansion.
1 Recommendation
SCHWIND eye-tech-solutions GmbH
In some other papers you can find, as a general rule, a minimum of 20 observations per degree of freedom (sometimes DoF + 1), which gives 60 (or 80) observations for a robust fit. Why 20 is not clear to me; I will check my secondary references.
University of Utah
Again, I think it is important to clarify what is meant by "point". If one is sampling off of a mathematically-defined curve (e.g. calculating the value of a function at several points) then two points are enough to completely define a normalized Gaussian distribution, and three points are enough to completely define an unnormalized Gaussian distribution.
If one is taking a series of experimental points (or doing a Monte Carlo sampling of a probability distribution on a computer) and attempting to determine the statistics of the data set generated experimentally, then one is dealing with a different problem, and the answer will depend on the accuracy required.
To determine whether a distribution is Gaussian there are a number of different statistical tests that can be applied. One is known as the Anderson-Darling test. There are others as well.
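For readers who want to see what the Anderson-Darling statistic looks like in practice, here is a minimal pure-Python sketch for the case where the mean and standard deviation are estimated from the sample itself. The small-sample correction factor and the 5% critical value of about 0.752 are the commonly tabulated ones for that case; treat both as assumptions to check against a reference table. The function name is hypothetical.

```python
import math
import random
import statistics


def anderson_darling_normal(sample):
    """Corrected Anderson-Darling statistic for normality (parameters estimated)."""
    n = len(sample)
    mean = statistics.fmean(sample)
    std = statistics.stdev(sample)
    z = sorted((x - mean) / std for x in sample)

    total = 0.0
    for i in range(n):
        # ln F(z_i) and ln(1 - F(z_{n-1-i})) via erfc to avoid cancellation.
        log_cdf = math.log(0.5 * math.erfc(-z[i] / math.sqrt(2.0)))
        log_sf = math.log(0.5 * math.erfc(z[n - 1 - i] / math.sqrt(2.0)))
        total += (2 * i + 1) * (log_cdf + log_sf)

    a2 = -n - total / n
    # Small-sample correction used when mean and std come from the data.
    return a2 * (1.0 + 0.75 / n + 2.25 / n ** 2)


rng = random.Random(1)
gaussian_draws = [rng.gauss(10.0, 2.0) for _ in range(500)]
exponential_draws = [rng.expovariate(1.0) for _ in range(500)]

stat_gaussian = anderson_darling_normal(gaussian_draws)
stat_exponential = anderson_darling_normal(exponential_draws)
# Assumed 5% critical value: values well above ~0.752 suggest non-normality.
```

A strongly skewed sample such as the exponential one gives a statistic far above the critical value, while Gaussian draws usually stay below it.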
2 Recommendations
NIN, Hyderabad
It could be 5 observations for each continuous class interval. At least 30 observations may be enough to fit a normal distribution. Just try a simulation study and you will see the difference.
Bhabha Atomic Research Centre
@Alan Rockwood I want to add that the points have been generated by an experiment (each point may have some inherent fluctuation, which varies from point to point but is fixed for any particular point), and it is known that the points will fall on a Gaussian curve. An example is a photopeak in gamma-ray spectrometry.
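For experimental points of this kind, one common approach (a sketch only, not necessarily what anyone in this thread uses) is a weighted least-squares fit of a parabola to the logarithm of the counts. The code below assumes Poisson counting statistics, so that the variance of ln(y) is roughly 1/y and each channel gets weight y, and it assumes no background under the peak. The function names are hypothetical.

```python
import math


def solve_3x3(matrix, rhs):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    rows = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(col + 1, 3):
            factor = rows[r][col] / rows[col][col]
            for k in range(col, 4):
                rows[r][k] -= factor * rows[col][k]
    solution = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        tail = sum(rows[r][k] * solution[k] for k in range(r + 1, 3))
        solution[r] = (rows[r][3] - tail) / rows[r][r]
    return solution


def fit_photopeak(channels, counts):
    """Fit ln(counts) = c0 + c1*u + c2*u^2 with weights equal to the counts."""
    data = [(x, y) for x, y in zip(channels, counts) if y > 0]
    if len(data) < 3:
        raise ValueError("need at least three channels with positive counts")

    # Center the channel axis to keep the normal equations well conditioned.
    center = sum(x * y for x, y in data) / sum(y for _, y in data)

    power_sums = [0.0] * 5   # sum of w * u^k for k = 0..4
    moment_sums = [0.0] * 3  # sum of w * ln(y) * u^k for k = 0..2
    for x, y in data:
        u = x - center
        log_y = math.log(y)
        for k in range(5):
            power_sums[k] += y * u ** k
        for k in range(3):
            moment_sums[k] += y * log_y * u ** k

    matrix = [[power_sums[i + j] for j in range(3)] for i in range(3)]
    c0, c1, c2 = solve_3x3(matrix, moment_sums)
    if c2 >= 0:
        raise ValueError("fitted log-curvature is not negative")

    sigma = math.sqrt(-1.0 / (2.0 * c2))
    mu = center - c1 / (2.0 * c2)
    amplitude = math.exp(c0 - c1 ** 2 / (4.0 * c2))
    return amplitude, mu, sigma


# Noise-free synthetic peak with made-up parameters, for a sanity check.
channels = list(range(90, 111))
counts = [1000.0 * math.exp(-((x - 100.3) ** 2) / (2.0 * 4.0 ** 2)) for x in channels]
amplitude, mu, sigma = fit_photopeak(channels, counts)
```

With real, noisy counts every channel beyond the third adds information, and the uncertainty in the fitted center and width shrinks as more channels (and more counts per channel) are included.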
University of Utah
@Deepak Kumar Akar,
Sorry for the delay in responding. As discussed earlier, if you already know that the curve is Gaussian then the problem is much easier. In any case, it depends on how accurately you would like to characterize the distribution. For example, if you are interested in the mean of the distribution then, statistically speaking, the certainty of the determination improves proportionally to the square root of the number of points. For example, if you acquire 16 points then the uncertainty in the mean is half as much as if you acquire 4 points. I don't recall what the relationship is for the standard deviation, i.e. how it improves as the number of sampling points increases.
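To put numbers on the scaling above: for N independent draws from a Gaussian with standard deviation σ, the standard error of the mean is σ/√N. For the sample standard deviation, a standard large-N approximation for Gaussian data is σ/√(2(N-1)), so it also improves roughly as the square root of the number of points. The helper names below are made up for this sketch, and the second formula is an approximation, not an exact result.

```python
import math


def standard_error_of_mean(sigma, n):
    """Standard error of the sample mean for n independent draws."""
    return sigma / math.sqrt(n)


def approx_standard_error_of_std(sigma, n):
    """Large-n approximation for the scatter of the sample standard deviation
    when the data are Gaussian."""
    return sigma / math.sqrt(2.0 * (n - 1))


# Ratio of uncertainties when going from 4 points to 16 points.
mean_ratio = standard_error_of_mean(1.0, 16) / standard_error_of_mean(1.0, 4)
std_ratio = approx_standard_error_of_std(1.0, 16) / approx_standard_error_of_std(1.0, 4)
```

Here `mean_ratio` is 0.5, matching the 16-versus-4 example, and `std_ratio` is a little smaller than 0.5 because of the N-1 in the approximation.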
1 Recommendation