Symposium
Sampling: Why and How of it?
Anita S Acharya*, Anupam Prakash**, Pikee Saxena#, Aruna Nigam##
Departments of * Community Medicine, **Medicine and #Obs. & Gynae, Lady Hardinge Medical College & SSK Hospital, New Delhi-110001, India.
##Department of Obs. & Gynae., HIMS&R, New Delhi, India.
Corresponding author: Dr. Anita S Acharya, Associate Professor, Department of Community Medicine, Lady Hardinge Medical College & SSK
Hospital, New Delhi-110001, India. E-mail: anitaacharya29@gmail.com
Received: 17-06-2013 | Accepted: 29-06-2013 | Published Online: 07-07-2013
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (creativecommons.org/licenses/by/3.0)
Conflict of interest: None declared | Source of funding: Nil | DOI: http://dx.doi.org/10.7713/ijms.2013.0032
Abstract
A ‘sample’ is a subset of the population, selected so as to be representative of the larger population. Since
we cannot study the entire population, we need to take a sample. Sampling techniques are broadly classified
into ‘Probability’ and ‘Non-probability’ samples. Probability sampling allows the investigator to generalise
the findings of the sample to the target population. Probability sampling includes Simple random sampling,
Systematic random sampling, Stratified random sampling, Cluster sampling, etc. A sampling frame is crucial
in probability sampling, because if the sampling frame is not drawn appropriately from the population of
interest, random sampling from that frame cannot address the research problem. Generalisations can be
made ‘only’ to the actual population defined by the sampling frame. Non-probability sampling includes
Convenience/purposive sampling, Quota sampling, Snowball sampling, etc. Each method of sampling has
its own advantages and limitations; however, probability sampling is preferable, since its results can be
generalised.
Key words: Data collection; sampling studies; sampling bias; population; probability; random allocation.
Introduction
In any research study, the best strategy is to
investigate the problem in the whole population.
In practice, however, it is not always possible to
study the entire population. Instead, we
study a “sample” which is sufficiently large and
representative of the entire population. A sample
is a subset of the population, selected so as to be
representative of the larger population [1]. By
taking a representative sample, we can reduce the
costs incurred, the time taken to do the research
and also the manpower needed to conduct the
study. Sample representativeness depends on three
factors: 1) Sampling methodology 2) Sample size
and 3) Response rate. Sampling methods should be
systematic and defined so that valid inferences can
be drawn from the sample.
Classification of sampling methods
Broadly, sampling methods are classified as 1)
Probability samples and 2) Non-probability
samples. Probability samples are the gold standard
in sampling methodology and also for ensuring
generalisability of the study results to the target
population. By probability sampling, we mean that each
individual in the population has a known, non-zero
chance of being selected in the study (an equal chance
in the case of simple random sampling). Probability
sampling is further classified as:
1. Simple random sampling
2. Systematic random sampling
3. Stratified random sampling
4. Cluster sampling
5. Multiphase sampling
6. Multistage sampling
All the above sampling methods use a random
process.
1. Simple random sampling: In this method, every
individual has an equal chance of being selected
in the sample from the population. Subjects are chosen
using a random number table or a computer-generated
list of random numbers. Selection can also be done by
the lottery method, using currency notes, etc.
330 Indian Journal of Medical Specialities, Vol. 4, No. 2, July - Dec 2013
INDIAN JOURNAL OF MEDICAL SPECIALITIES 2013;4(2):330-333
In this method, a sampling frame is required. All
the individuals in the study population have to be
enumerated in either ascending or descending order.
The advantages of this method are that minimal
knowledge of the population is required, internal
as well as external validity is high, and the data are
easy to analyse. The limitations are that the
cost is high and a sampling frame is required. Simple
random samples also tend to have larger sampling errors
and less precision than stratified samples of the same size [2].
Example- Let us say there are 200 participants
in a conference and we would like to select 50
participants by simple random sampling. The list of
all 200 participants, which constitutes the sampling
frame, would be available. The 50 participants
can now be selected using either a random number
table or the lottery method. Once a participant has
been selected, that particular number is struck
off and cannot be drawn again. This method
is known as sampling without replacement. In this
way 50 participants are selected.
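The conference example can be sketched in a few lines of Python. This is only an illustration, not a prescribed procedure: the function name `simple_random_sample` and the seed value are assumptions, and the computer-generated random numbers stand in for the random number table or lottery.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n units from the sampling frame without replacement."""
    rng = random.Random(seed)  # seed is optional; fixed here only for repeatability
    return rng.sample(frame, n)

# Sampling frame: the enumerated list of all 200 conference participants.
participants = list(range(1, 201))

# Hypothetical seed; any value (or None) would do.
selected = simple_random_sample(participants, 50, seed=2013)
print(sorted(selected))
```

Because `random.sample` never returns the same element twice, it behaves like striking each drawn number off the list.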
2. Systematic random sampling: In systematic
sampling, the selection of the first subject is done
randomly and then the subsequent subjects are
selected by a periodic process. A systematic random
sample is one in which every kth item is selected; k
is determined by dividing the number of items in the
sampling frame by the desired sample size. An initial
starting point is selected by a random process, and
then every kth number on the list is selected. The
advantages of this method are that it has moderate
usage and moderate cost, internal and external validity
are high, and it is simple to draw and easy to verify. The
disadvantage is that technically only the selection
of the first subject is a probability selection, since
once the starting point is fixed, the remaining subjects
not on the chosen interval have zero chance of selection.
Example- If we take the same example as above,
N=200 and n=50, therefore, k = N/n, i.e. 4, which
becomes the sampling interval. Now we select a
random number between 1 and 4. Suppose it is “3”,
so participant number “3” is our first subject. Then
we go on adding “4” to this number. Our subsequent
subjects would be 7, 11, 15, 19, 23, 27, 31 and so
on, till we complete the requisite sample size of 50.
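The arithmetic of this example can be checked with a short Python sketch. The function name `systematic_sample` is hypothetical, and the sketch assumes that N is an exact multiple of n, as it is here.

```python
import random

def systematic_sample(frame_size, sample_size, start=None, seed=None):
    """Select every k-th position, beginning at a random start between 1 and k."""
    k = frame_size // sample_size  # sampling interval, k = N / n
    if start is None:
        start = random.Random(seed).randint(1, k)  # random starting point in 1..k
    return [start + i * k for i in range(sample_size)]

# Reproduce the worked example: N = 200, n = 50, random start assumed to be 3.
sample = systematic_sample(200, 50, start=3)
print(sample[:8])   # [3, 7, 11, 15, 19, 23, 27, 31]
print(sample[-1])   # 199
```

With a start of 3 and an interval of 4, the fiftieth and last subject is participant 199.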
3. Stratified random sampling: The population is divided
into various sub-groups (strata) sharing common
characteristics like age, sex, race, income,
education, and ethnicity. A random sample is taken
from each stratum. The advantages are that it assures
representation of all the groups needed in the
population, and that the characteristics of each stratum
can be estimated and comparisons can be made. It also
reduces variability from systematic sampling. The
limitations are that it requires accurate information
on the proportions of each stratum; also, stratified lists
are expensive to prepare.
Example- In studying the prevalence of diabetes in
an adult population, it would be possible to stratify
the population according to gender and then take an
equal number of subjects from males and
females. This would yield sex-wise prevalence of
diabetes. The sample could also be stratified by
place of residence such as urban, rural or peri-
urban, which would give us area-wise prevalence
of diabetes with equal representation from each
group.
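A minimal Python sketch of stratification by sex with equal allocation follows. The frame of 1,000 adults, the split into 450 males and 550 females, and the choice of 100 subjects per stratum are all invented for illustration; they are not figures from any study.

```python
import random
from collections import defaultdict

def stratified_sample(frame, stratum_of, n_per_stratum, seed=None):
    """Group the frame into strata, then draw a simple random sample in each."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in frame:
        strata[stratum_of(unit)].append(unit)
    return {name: rng.sample(units, n_per_stratum)
            for name, units in sorted(strata.items())}

# Hypothetical sampling frame: (id, sex) for 1,000 adults.
frame = [(i, "male" if i <= 450 else "female") for i in range(1, 1001)]

# Equal allocation: 100 subjects from each stratum.
by_sex = stratified_sample(frame, lambda unit: unit[1], 100, seed=1)
for sex, units in by_sex.items():
    print(sex, len(units))
```

The same function would serve for stratification by place of residence; only the `stratum_of` argument changes.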
4. Cluster sampling: Cluster random sampling is a
two-step process in which the entire population is
divided into clusters or groups, usually geographic
areas or units such as villages, schools, wards, blocks,
etc. It is more commonly used in epidemiologic
research than in clinical research, and is most
practical in large national surveys. The
clusters are chosen randomly, and all individuals in the
selected clusters are taken into the sample. It usually requires
a larger sample size. Cluster sampling is very useful
when the population is widely scattered and it is
impractical to draw a representative
sample from all the elements [3].
Example- A sample could be taken from first year
college students to measure their knowledge of
Human papilloma virus (HPV) and cervical cancer.
Suppose each college in Delhi is treated as a cluster.
We select 20 colleges either by simple random
or systematic random sampling. We may then interview
all the students, or a random selection of students, in each
selected cluster about their knowledge of HPV.
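The college example might be sketched as follows. The figure of 80 colleges with 100 first year students each is purely hypothetical (the text does not give these numbers), as are the function name `cluster_sample` and the identifiers used for colleges and students.

```python
import random

def cluster_sample(clusters, n_clusters, per_cluster=None, seed=None):
    """Randomly choose clusters; take everyone in them, or a random subset of each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = {}
    for name in chosen:
        members = clusters[name]
        if per_cluster is None:
            sample[name] = list(members)  # interview all students in the college
        else:
            sample[name] = rng.sample(members, min(per_cluster, len(members)))
    return sample

# Hypothetical frame: 80 colleges (clusters), 100 first year students in each.
colleges = {
    f"college_{c:02d}": [f"c{c:02d}_s{s:03d}" for s in range(1, 101)]
    for c in range(1, 81)
}

all_students = cluster_sample(colleges, 20, seed=3)
subsampled = cluster_sample(colleges, 20, per_cluster=30, seed=3)
print(len(all_students), sum(len(v) for v in all_students.values()))
print(len(subsampled), sum(len(v) for v in subsampled.values()))
```

Only a list of colleges is needed to start; student lists are required solely for the 20 colleges actually selected.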
5. Multiphase sampling: Multi-phase sampling
is a complex form of cluster sampling. Here the
population is organised into groups; subsequently
groups are randomly selected and then the
members are randomly selected in these groups
(an equal number selected per group). A part of the
information is collected from the whole sample and a part
from a sub-sample. This method of sampling is mostly
carried out to increase precision, reduce costs and
reduce non-response.
Example- In a tuberculosis survey, the Mantoux test is
done in all cases (Phase I); in the next phase, a chest
X-ray is done in all Mantoux-positive cases (Phase
II); in the last phase, sputum examination is done in
all X-ray-positive cases (Phase III).
A survey by this method of sampling is less costly, less
laborious and more purposeful.
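The phased logic of the tuberculosis example can be illustrated with a toy Python sketch. The six subject records and their test results are invented solely to show how each phase narrows the group; they are not survey data, and the function name `multiphase_survey` is hypothetical.

```python
def multiphase_survey(subjects, phases):
    """Apply each phase's test only to subjects positive in the previous phase."""
    tested = []
    current = list(subjects)
    for name, is_positive in phases:
        tested.append((name, len(current)))  # how many undergo this phase
        current = [s for s in current if is_positive(s)]
    return tested, current

# Invented records, for illustration only.
subjects = [
    {"id": 1, "mantoux": True,  "xray": True,  "sputum": True},
    {"id": 2, "mantoux": True,  "xray": True,  "sputum": False},
    {"id": 3, "mantoux": True,  "xray": False, "sputum": False},
    {"id": 4, "mantoux": False, "xray": False, "sputum": False},
    {"id": 5, "mantoux": False, "xray": False, "sputum": False},
    {"id": 6, "mantoux": True,  "xray": False, "sputum": False},
]

phases = [
    ("Mantoux test", lambda s: s["mantoux"]),        # Phase I
    ("Chest X-ray", lambda s: s["xray"]),            # Phase II
    ("Sputum examination", lambda s: s["sputum"]),   # Phase III
]

tested, confirmed = multiphase_survey(subjects, phases)
print(tested)                        # [('Mantoux test', 6), ('Chest X-ray', 4), ('Sputum examination', 2)]
print([s["id"] for s in confirmed])  # [1]
```

The shrinking counts at each phase show where the savings in cost and labour come from.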
6. Multistage sampling: It is a complex form of
cluster sampling in which two or more levels of
units are embedded one in the other. It involves
the repetition of two basic steps, i.e. listing and
sampling. Typically, at each stage the cluster gets
smaller in size and in the end, subject sampling is
done. Sometimes, special terminology is used for
the various stages of sampling. The unit sampled at the
first stage is called the “Primary Sampling Unit” (PSU), that at the
second stage the “Secondary Sampling Unit”
(SSU), that at the third stage the “Tertiary Sampling
Unit” (TSU) and so on till one gets to the “Final” or
“Ultimate” sampling units [2].
Example- In a national survey, districts are chosen
at random in all the states, followed by a
random selection of talukas and villages. In the third
stage, houses are selected. All the houses, which
are the final units of sampling, are surveyed.
It is not as robust as true random sampling but
helps to resolve some of the limitations inherent in
random sampling. It is useful because it involves
randomisation at multiple stages. Multi-stage
sampling is used frequently when a complete list
of all the members of the population does not exist
or is inappropriate to use. The costs are thereby reduced
as compared to traditional cluster sampling.
Note that in multi-stage sampling, the
sampling units for the different stages are different.
On the other hand, in multi-phase sampling the
same sampling unit is sampled multiple times.
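A simplified Python sketch of multi-stage selection is given here. The nested frame (3 states, 4 districts per state, 5 villages per district, 20 houses per village) and the numbers selected at each stage are assumptions made for illustration, and the taluka level is omitted for brevity.

```python
import random

def multistage_sample(frame, n_districts, n_villages, n_houses, seed=None):
    """List and sample at each stage: districts, then villages, then houses."""
    rng = random.Random(seed)
    selected = []
    for state, districts in sorted(frame.items()):
        # Stage 1: districts within every state.
        for district in rng.sample(sorted(districts), n_districts):
            villages = districts[district]
            # Stage 2: villages within each selected district.
            for village in rng.sample(sorted(villages), n_villages):
                # Stage 3: houses within each selected village.
                selected.extend(rng.sample(villages[village], n_houses))
    return selected

# Hypothetical nested frame: state -> district -> village -> list of houses.
frame = {
    f"S{s}": {
        f"D{d}": {
            f"V{v}": [f"S{s}-D{d}-V{v}-H{h}" for h in range(1, 21)]
            for v in range(1, 6)
        }
        for d in range(1, 5)
    }
    for s in range(1, 4)
}

houses = multistage_sample(frame, n_districts=2, n_villages=2, n_houses=5, seed=4)
print(len(houses))  # 3 states x 2 districts x 2 villages x 5 houses = 60
```

At each stage only the units selected at the previous stage need to be listed, which is why no complete list of houses is required in advance.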
Non-probability sampling: Non-probability samples
are those in which the probability that a subject is
selected is unknown, which may result in selection bias in
the study. They include the most commonly used
convenience/purposive sampling, quota sampling,
snowball sampling, etc.
1. Convenience/purposive sampling: This is the
most commonly used sampling method. The sample
is chosen on the basis of the convenience of the
investigator. Often the respondents are selected
because they are at the right place at the right
time. Convenience sampling is most commonly used
in clinical research, where patients who meet the
inclusion criteria are recruited into the study. The
advantages are that it is widely used and
less expensive, and there is no need for a list of all the
population elements. However, it is not without
limitations, the foremost being that variability and bias
cannot be measured or controlled. Secondly, results
from the data cannot be generalised beyond the
sample.
Example- Patients coming to the out-patient
department of a hospital and meeting the inclusion
criteria, school students, members of a social
organisation, etc.
2. Quota sampling: This sampling procedure
ensures that a certain characteristic of a population
sample is represented to the exact extent that
the investigator desires.
Example- In a sample of 100, the investigator wishes
to have 40% men and 60% women in the sample. He
would stop recruiting men once 40 men are recruited, i.e. the ‘quota’
for men is filled. The advantages are that the cost is
moderate, it is very extensively used and understood,
and there is no need for a list of population elements.
It also introduces some elements of stratification.
The limitations are similar to those of convenience sampling.
Stratified sampling and quota sampling are similar in
that, in both, the population is divided into categories/
strata and subjects are selected from each category.
The purpose is to select a representative sample
and/or to allow sub-group analyses.
However, there are certain differences between
“Stratification” and “Quota” sampling. In the
former, selection of subjects is by simple random
sampling once the categories have been created.
Call-backs are used to reach the particular subject selected.
Stratified sampling without call-backs may not, in
practice, be much different from quota sampling. In
‘quota’ sampling, the interviewer selects the first available
subject who meets the inclusion criteria; it is
therefore a form of convenience sampling. A sampling frame is
required for stratified sampling but not for quota
sampling. More importantly, stratified sampling uses
probability sampling, thus permitting the estimation
of sampling error, which is not possible with quota
samples.
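The quota example of 40 men and 60 women in a sample of 100 can be sketched in Python as below. The stream of 300 arriving subjects, alternating between men and women, is an artificial assumption; in practice the order of arrival is whatever the interviewer happens to encounter.

```python
def quota_sample(arrivals, quotas):
    """Accept subjects in order of arrival until every category's quota is filled."""
    remaining = dict(quotas)
    sample = []
    for subject_id, category in arrivals:
        if remaining.get(category, 0) > 0:
            sample.append((subject_id, category))
            remaining[category] -= 1
        if not any(remaining.values()):
            break  # all quotas filled; stop recruiting
    return sample

# Artificial arrival order: odd-numbered subjects are men, even-numbered are women.
arrivals = [(i, "man" if i % 2 else "woman") for i in range(1, 301)]

recruited = quota_sample(arrivals, {"man": 40, "woman": 60})
men = sum(1 for _, c in recruited if c == "man")
women = sum(1 for _, c in recruited if c == "woman")
print(men, women)  # 40 60
```

No random draw appears anywhere in the sketch: subjects are taken as they come, which is why sampling error cannot be estimated from a quota sample.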
3. Snow-ball sampling: In this sampling procedure,
the initial respondents are chosen by probability
or non-probability methods, and then, additional
respondents are obtained by information provided
by the initial respondents.
Example- In a study on a sample engaging in high-
risk behaviour or substance abuse, a person who is
engaging in a high-risk behaviour may name other
persons involved in similar high-risk behaviour
practices, and this continues till an adequate
number of respondents has been recruited. The
advantages are its low cost, usefulness in specific
circumstances and for locating rare populations.
The disadvantages include bias because sampling
units are not independent; projecting data
beyond the sample is not justified.
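The chain of referrals can be sketched in Python as a breadth-first walk over a referral network. The network of six people labelled A to F is entirely hypothetical, as is the function name `snowball_sample`.

```python
from collections import deque

def snowball_sample(referrals, seeds, target_size):
    """Recruit initial respondents, then the people they name, until the target is met."""
    sample = []
    seen = set(seeds)
    queue = deque(seeds)
    while queue and len(sample) < target_size:
        person = queue.popleft()
        sample.append(person)
        for contact in referrals.get(person, []):
            if contact not in seen:  # each person is recruited at most once
                seen.add(contact)
                queue.append(contact)
    return sample

# Hypothetical referral network: each respondent names other persons.
referrals = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D", "E"],
    "D": ["F"],
    "E": [],
    "F": ["A"],
}

respondents = snowball_sample(referrals, seeds=["A"], target_size=5)
print(respondents)  # ['A', 'B', 'C', 'D', 'E']
```

Every respondent after the first is reachable only through someone already in the sample, which illustrates why the sampling units are not independent.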
Conclusion
In order to have valid results from any research
study, it is important to choose a sound and scientific
sampling methodology. Ideally, probability sampling
methods should be used to ensure representativeness
of the sample and also generalisability of the
results to the target population. If they are not
used, caution must be exercised in interpreting the
study results.
Key Points
• The sampling method chosen depends on the
population of interest.
• Careful planning is the key for generating
reliable results.
• Probability samples are the gold standard in
sampling methodology.
• Probability sampling means one can
generalise to the population defined by the
sampling frame.
• Non-probability sampling means one cannot
generalise beyond the sample.
References
1. Probability & related topics for making
inferences about data. In: Dawson B, Trapp RG
(eds). Basic & clinical biostatistics. 4th edn. McGraw Hill,
USA 2004; 61-92.
2. Choosing the type of probability sampling.
Available from http://www.sagepub.com/upm-data/40803_5.pdf
Accessed on 16th June, 2013.
3. Baridalyne N. Sampling, sample size estimation
and randomisation. Indian J Med Spec
2012;3:195-7.