Question
Asked 14 November 2016
How to determine an appropriate sample size in survey research?
I have identified only 160 companies in the composites manufacturing sector in the UK. If I want to administer a questionnaire survey, how can I determine the sample size?
All Answers (7)
Bindura University of Science Education
I have attached material that may help address your question. Read Chapter 1, pp. 11-50, of the attached text.
2 Recommendations

- Determining sample size is a very important issue because samples that are too large may waste time, resources and money, while samples that are too small may lead to inaccurate results.
- In many cases, we can easily determine the minimum sample size needed to estimate a process parameter, such as the population mean.
- Before you can calculate a sample size, you need to determine a few things about the target population and the sample you need (a short worked sketch follows this list):
- Population Size — How many total people fit your demographic? For instance, if you want to know about mothers living in the US, your population size would be the total number of mothers living in the US. Don’t worry if you are unsure about this number. It is common for the population to be unknown or approximated.
- Margin of Error (Confidence Interval) — No sample will be perfect, so you need to decide how much error to allow. The margin of error is how far above or below the true population value you are willing to let your sample estimate fall; it is the half-width of the confidence interval. If you’ve ever seen a political poll on the news, you’ve seen a margin of error. It will look something like this: “68% of voters said yes to Proposition Z, with a margin of error of +/- 5%.”
- Confidence Level — How confident do you want to be that the actual mean falls within your margin of error? The most common confidence levels are 90%, 95%, and 99%.
- Standard Deviation — How much variance do you expect in your responses? Since the survey has not yet been administered, the safe choice for a proportion is 0.5; this is the most conservative value and ensures that your sample will be large enough.
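Here is a minimal sketch (my addition, not part of the answer above) showing how these inputs combine in Cochran's sample-size formula for a proportion, with the finite population correction applied to a small population such as the 160 UK composites manufacturers in the question. The function name and example numbers are illustrative only.

import math

# Minimal sketch: Cochran's sample-size formula for a proportion, with the
# finite population correction for a small population (here N = 160).
def cochran_sample_size(population_size, margin_of_error=0.05,
                        confidence_z=1.96, proportion=0.5):
    # Infinite-population sample size: n0 = z^2 * p * (1 - p) / e^2
    n0 = (confidence_z ** 2) * proportion * (1 - proportion) / margin_of_error ** 2
    # Finite population correction: n = n0 / (1 + (n0 - 1) / N)
    n = n0 / (1 + (n0 - 1) / population_size)
    return math.ceil(n)

# 95% confidence (z = 1.96), +/- 5% margin of error, conservative p = 0.5
print(cochran_sample_size(160))  # about 114 of the 160 companies

With a 95% confidence level and a +/- 5% margin of error, roughly 114 of the 160 companies would need to respond; relaxing the margin of error shrinks this number quickly.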
1 Recommendation
University of Bahrain
The process for determining the best sample size to collect so that you can make a good decision isn’t as complicated as you might think (or remember from your stats classes). Because there is no magic bullet or single number, there are a few things you’ll want to have determined before you start figuring out what your sample size is:
1. Your goals and objectives.
2. How precise do you need or want to be?
3. How confident do you want to be in the results?
4. What kind of variability are you looking at?
5. Estimate your response rate.
After working through these steps you will arrive at the sample size you need; a small sketch of the step-5 response-rate adjustment follows.
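As a small, hypothetical illustration of step 5 (my addition, not part of the answer above): divide the target number of completed responses by the expected response rate to get the number of questionnaires to send out.

import math

# Minimal sketch: inflate the target sample size by the expected response
# rate so that enough completed questionnaires come back on average.
def required_invitations(target_sample, expected_response_rate):
    return math.ceil(target_sample / expected_response_rate)

# Hypothetical example: 114 completed responses needed, 40% response rate
print(required_invitations(114, 0.40))  # 285 questionnaires to send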
1 Recommendation
Retired US Fed Govt/Home Research
Al Amin bin Mohamed Sultan -
It depends on the kind of data you are collecting, your design, and your goals. You should always stratify, i.e., break the population into categories/groups that are more homogeneous, whenever you can. The standard deviation within each stratum needs to be "guessed" by means such as previous data or a pilot study. See the first book referenced below, for example, for more on this.
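As a concrete illustration of that allocation step, here is a minimal sketch (my own addition, not part of this answer) of Neyman allocation, which uses the guessed within-stratum standard deviations to split a total sample across strata. The stratum sizes and standard deviations below are hypothetical.

# Minimal sketch: Neyman allocation n_h = n * N_h * S_h / sum(N_k * S_k),
# where N_h is the stratum size and S_h the guessed within-stratum SD.
def neyman_allocation(total_sample, stratum_sizes, stratum_sds):
    weights = [N_h * S_h for N_h, S_h in zip(stratum_sizes, stratum_sds)]
    total_weight = sum(weights)
    # Cap each allocation at the stratum size: a without-replacement sample
    # cannot take more units than the stratum contains.
    return [min(N_h, round(total_sample * w / total_weight))
            for N_h, w in zip(stratum_sizes, weights)]

# Hypothetical example: 160 firms split into small/medium/large strata,
# with larger firms assumed more variable (SDs guessed from a pilot).
print(neyman_allocation(60, [100, 40, 20], [1.0, 2.5, 6.0]))  # [19, 19, 20]

Strata with more units or more variability receive more of the sample; here the large-firm stratum hits its cap of 20, which amounts to taking a census of that stratum.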
If you see an online calculator, beware: they usually assume yes/no data, simple random sampling, no finite population correction factor, and the worst case p = q = 0.5. They are not generally applicable.
If you are looking at quantitative data, say continuous or yes/no, there are a number of good texts such as the following:
Cochran, W.G. (1977), Sampling Techniques, 3rd ed., John Wiley & Sons.
Blair, E. and Blair, J. (2015), Applied Survey Sampling, Sage Publications.
Lohr, S.L. (2010), Sampling: Design and Analysis, 2nd ed., Brooks/Cole.
If you have auxiliary data, you can do better. You might also look at these, but they are fairly advanced:
Särndal, C.-E., Swensson, B. and Wretman, J. (1992), Model Assisted Survey Sampling, Springer-Verlag.
Brewer, K.R.W. (2002), Combined Survey Sampling Inference: Weighing Basu's Elephants, Arnold, London, and Oxford University Press.
Chambers, R. and Clark, R. (2012), An Introduction to Model-Based Survey Sampling with Applications, Oxford Statistical Science Series.
Valliant, R., Dorfman, A.H. and Royall, R.M. (2000), Finite Population Sampling and Inference: A Prediction Approach, Wiley Series in Probability and Statistics.
Chaudhuri, A. and Stenger, H. (1992), Survey Sampling: Theory and Methods, 1st ed., Marcel Dekker, Inc., New York, Basel, Hong Kong.
My thought, though, is that you may be looking at relatively small samples from small categories of highly skewed, continuous establishment-survey data. If so, you might look at the small example data set shown on the last page of the conference paper linked below, and see how it was used in that paper:
If you are looking at such continuous data from establishment surveys, and have regressor data available, then there are other papers on my RG pages you might also use.
Note that there is a difference between trying to obtain an optimal sample size at a more aggregate level, with stratification, and needing to publish information for each of those strata, in which case it is better to think of them as subpopulations or categories rather than strata. If you need to publish for each subpopulation, that generally requires a larger overall sample than if you only care about the most aggregate level and not any individual stratum. This is true no matter what kind of sampling you use, and I suppose the concept carries over to all kinds of data.
For an interesting historical account for continuous data, which also covers the different types of sampling and estimation at a nice, easier-than-usual-to-understand level, see the following (noting that the one model-only example uses the classical ratio estimator's level of heteroscedasticity, though that is just one possibility):
Ken Brewer's Waksberg Award article:
Brewer, K.R.W. (2014), “Three controversies in the history of survey sampling,” Survey Methodology (December 2013/January 2014), Vol. 39, No. 2, pp. 249-262, Statistics Canada, Catalogue No. 12-001-X.
Ken Brewer believed in using probability sampling and models together, and he explains the different approaches, with their pros and cons, in this rather 'fun' paper.
Cheers -
Jim Knaub
Conference Paper Projected Variance for the Model-based Classical Ratio Estim...
1 Recommendation