
# Revealing Statistical Principles.

Authors:
J.K. Lindsey
Contents
Preface
1 Planning a study
1.1 Why statistics?
1.2 Protocols
1.3 Types of observations
1.4 Study designs
1.5 Summary
2 Sample surveys
2.1 Sampling
2.2 Organization
2.3 Measuring instruments
2.4 Sampling error
2.5 Sample designs
2.6 Sample size
2.7 Summary
3 Experimental trials
3.1 Basic principles
3.2 Ethical issues
3.3 Designs
3.4 Organization
3.5 Summary
4 Data analysis
4.1 Data handling
4.2 Descriptive statistics
4.3 Role of statistical models
4.4 Model selection
4.5 Estimating precision
4.6 Summary
5 Reporting the results
5.1 Evaluation of the study
5.2 Interpreting the results
5.3 Writing the report
5.4 Publication and dissemination
5.5 Summary
Bibliography
Index
Preface
To ﬁnd out what happens to a system when you inter-
fere with it you have to interfere with it (not just pas-
sively observe it). (Box, 1966)
This little book is addressed to people who are called upon
to organize research studies involving human subjects or to
judge the value of such studies, but who have little or no sta-
tistical knowledge. Those responsible for research and devel-
opment in government and industry will be one main body
of readers. Beginning Ph.D. candidates who will be conduct-
ing empirical research, requiring statistical methods, in some
substantive area involving human beings but who have little
training in such methods will form a second group.
In this book, I attempt to provide the basic principles of
statistics in a non-mathematical way, accessible to a wide au-
dience. My intention is to avoid technical details that can be
obtained, as necessary, from a professional statistician or, for
the more advanced, from the statistical literature. In this way,
you should acquire sufﬁcient knowledge of what statisticians
do in order to be able to communicate with them, whether to
obtain advice or to criticize the work they have done.
The text begins at the point when a study is originally con-
ceived and moves in order through all stages to the ﬁnal re-
port writing, covering both observational and experimental
(intervention) studies. Due to the primordial importance of
the proper design of a study, much of the material concen-
trates on this aspect. I have spent considerably less time on
the analysis, which, in any case, is covered, more or less ad-
equately, in introductory statistics courses.
Even if you are primarily interested in only one of obser-
vational and intervention studies, you should preferably read
both of these chapters. Many principles are common to the
two and the contrasts can provide you with illuminating in-
sights, highlighted by the quote from Box given above con-
cerning the perpetual problem of studying causality among
human beings.
The ideas presented in this book have accumulated from
two types of experience: in educational planning and evalua-
tion beginning about 25 years ago, primarily in Third World
countries, especially India, Indonesia, Madagascar, and Mo-
rocco, and in clinical trials, beginning somewhat more re-
cently and restricted to Europe. Thus, many of the examples
in the text are related to these two ﬁelds, but I have tried to
keep the discussion general enough to be applicable to any
studies directly involving human beings and requiring statis-
tical procedures.
In order to make the ideas clear and easily accessible,
I have presented many as check-lists. My intention is not
to provide many details on modern statistical methods but
rather an overview. The bibliography will give you indications
as to where to find some of the information needed actually
to carry out the procedures that I have described.
Obviously, few of the ideas in this book are new. However,
to make the text more readable, I have not loaded it with
scholarly references, but have included in the bibliography
the works that I have found most useful.
Philippe Lambert, Patrick Lindsey and six referees pro-
vided many useful comments on an earlier draft of various
chapters.
I would like to acknowledge the support of UNESCO,
which ﬁnanced a course on this subject for the Ministry of
Education in Morocco, and especially Claude Tibi who or-
ganized the course and who himself participated as much as
I in presenting it.
J.K.L.
Diepenbeek and Liège
July, 1998
1 Planning a study
1.1 Why statistics?
1.1.1 Human variability
This book is about rigorous ways of collecting scientiﬁc in-
formation about human beings. In such circumstances, ran-
dom variation in observations makes statistical procedures
necessary. If all people reacted in exactly the same way in
all circumstances, it would be possible to demonstrate any
relationship of interest simply by observing one individual.
If the common cold always lasted exactly seven days and ad-
ministration of a new medication to one person reduced it
to ﬁve, we would know that the drug worked. If every stu-
dent received the same score on a test, administering it to one
child would tell us how difﬁcult it was. Because this is not
so, we must conduct studies involving groups of people. And
measures of variability will be as important as will averages.
However, although some speciﬁc group of people will be
of particular interest in a study, usually you cannot observe
all members of that group. You must select a representative
subgroup, or sample. Thus, the ﬁeld of statistics can provide
you with objective means of generalizing from the particularities
of observations only on some suitably chosen subgroup
to conclusions about the group as a whole.
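The idea of generalizing from a sample can be illustrated with a
small simulation. This is only a sketch: the population of cold
durations, its mean of seven days, and the sample size of 100 are
all invented for illustration and do not come from any real study.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the sketch is reproducible

# Hypothetical population: duration of a cold, in days, for 10,000 people.
# The mean (7) and spread (2) are assumptions chosen for illustration.
population = [max(1.0, rng.gauss(7.0, 2.0)) for _ in range(10_000)]

# Draw a simple random sample of 100 people without replacement.
sample = rng.sample(population, 100)

# The sample average estimates the population average; the sample
# standard deviation describes how much individuals vary.
sample_mean = statistics.mean(sample)
sample_sd = statistics.stdev(sample)
population_mean = statistics.mean(population)

print(f"population mean: {population_mean:.2f} days")
print(f"sample mean:     {sample_mean:.2f} days (SD {sample_sd:.2f})")
```

The sample mean will usually lie close to the population mean, but
not exactly on it, which is why a measure of variability must
accompany any average.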
1.1.2 Research projects
We can divide most research projects concerning human be-
ings into three main phases:
1. deciding on the question(s) to study, the procedures to use,
and which people to include (Chapters 1, 2, and 3);
2. collecting the required information from and/or about them
(Chapters 2 and 3);
3. processing, analysing, interpreting, and reporting this in-
formation (Chapters 4 and 5).
In a process of this complexity, detailed prior planning is es-
sential. A considerable body of theoretical statistical knowl-
edge is available to aid you in carrying out the ﬁrst and third
stages efficiently. Appropriate ways of performing the second
are still very much a matter of trial and error, often
depending on specific subject-matter questions rather than
general statistical principles.
Statistical methods have a number of advantages over other
methodological tools of the research worker. You must:
1. record information in as standardized a form as is possible
with human subjects;
2. choose subjects in an objectively representative fashion so
that you can make generalizations from speciﬁc observa-
tions;
3. state assumptions clearly, and usually check them empiri-
cally.
In planning a study, two aspects that will be of particular
pertinence to statistics are that the results of the study are
relevant to the questions being asked and that they are suf-
ﬁciently precise. Statistical design of a study is speciﬁcally
concerned with these objectives. General questions that you
will have to face include:
- how to select the particular individuals to be observed;
- how to fix the total number of such individuals;
- how to allocate these individuals among various pertinent
  groups.
Thus, it is essential that a statistician be involved from the
very initial stages of planning a study, and not simply be
called upon to analyse the ﬁnal results. In the latter case,
unless one is particularly gifted or lucky, the statistician will
generally only be able to provide a post mortem report on the
reasons why the study failed to attain its goals!
It is important that you clearly distinguish between two
types of investigation:
1. a planned intervention in the natural course of events to
determine its effect;
2. the passive observation of phenomena as they exist in so-
ciety.
The ﬁrst is called an experiment or a trial and the second a
survey. As we shall see, only the former can provide you
with direct objective information about the consequences of
the implementation of some innovation.
1.1.3 Ethics
Statistical methods can play an important role both in deci-
sion making and in scientiﬁc inference. However, they also
have the potential for misuse: everyone is familiar with some
phrase such as ‘lies, damned lies, and statistics’! It is best to
consider some of these problems immediately.
Misuse of statistical methods can occur from the design,
through the analysis, to the reporting stage of a study. Three
of the most important problems to avoid are:
1. bias;
2. sampling too few subjects to detect a difference;
3. lack of published results.
The whole point of almost any study is to further knowledge,
often with the view to using the information as a partial basis
for policy or decision making. If you do not report the result
of a study, it was a waste of time both for the investigators
and for the subjects involved.
On the other hand, if it becomes evident to you during a
study that there are unexpected difﬁculties, implying serious
inadequacies, you should stop the study. Prior ignorance of
the design and organizational requirements of a study is not
an excuse for inadequate preparation!
One may argue (Altman, 1991, pp. 477–478, 491–492)
that such misuse of statistics, and the accompanying substan-
dard research, is unethical. You are:
- misusing subjects by exposing them to inconvenience and,
  in some cases, to risks;
- wasting resources;
- publishing misleading results that can lead to inappropriate
  decisions, with the accompanying further risks and
  wasted resources.
If you publish a poorly conducted study, others may:
- find it impossible to obtain funding or permission to conduct
  further research on the subject;
- be led to follow false lines of investigation;
- use the same inferior research methods elsewhere;
- widely introduce an intervention although it has no effect,
  or even harmful effects.
Unfortunately, with enough effort, even the worst research
report can eventually be published somewhere.
Dishonesty and fraud are hopefully rare. Cases include:
- hoax – reporting a phenomenon that has never existed;
- forgery – inventing observations that were never made;
- cooking – selecting only those observations, or those
  statistical analyses, that agree with the desired conclusions.
Most cases of fraud are eventually uncovered, although some-
times only after signiﬁcant damage has been caused.
1.2 Protocols
When beginning a study, you must develop a protocol to de-
scribe the purpose of the study and the steps in obtaining and
analysing the data pertinent to this goal. However, there is no
point in starting to plan a study that does not have adequate
ﬁnancial support and sufﬁcient skilled staff available.
Proper design of the study is essential. The data from
a well-planned study can be analysed in many appropriate
ways, but no amount of clever manipulation at the analysis
stage can compensate for a badly conceived study. A per-
fect design that is impossible to implement in practice is of
no use; neither is a practically convenient plan that will not
support the desired scientiﬁc conclusions.
Your protocol should clearly specify the following.
1. The subject:
(a) the background and motivation;
(b) the question(s) you wish to investigate;
(c) the administrative responsibilities.
2. The material:
(a) the population and time frame you will consider, and
the unit of observation (person, family, town, . . . );
(b) how you will choose the sample, including the type
of study design, randomization, the sampling or experimental
unit(s), and the determination of sample
size;
(c) in experimental trials, the type of subject consent;
(d) what outcome(s), to become the response variables
in the statistical analysis, you will measure;
(e) what sources of explanation, to become the explana-
tory variables in the statistical analysis, you will mea-
sure.
3. The methods:
(a) if you will maintain certain variables under experi-
mental control, the randomization process by which
you will perform treatment assignment to individu-
als in the study;
(b) what instruments you will use to measure the vari-
ables and how you will train the investigators in-
volved to use them;
(c) the ways that you will conduct monitoring of the
progress of the study, including means of preventing
deviations from the protocol and any interim analy-
ses of the data.
4. The analysis:
(a) procedures for data transfer to electronic form and
for veriﬁcation against recording and transcription
errors;
(b) appropriate statistical models that you think will al-
low you to detect patterns of interest in the data to
be collected;
(c) selection strategies for choosing among the possible
models;
(d) criteria to distinguish random, or chance, variability
from that which is systematic.
5. The report: the form in which you will submit the ﬁnal
results.
These points form a uniﬁed whole; you should consider them
simultaneously. Your choices about any point will have an
impact on most of the others.
Prepare a draft protocol very early in your planning of a
study. This will reveal confusions, weak points, and possible
difﬁculties that you must face and resolve. You may require
several drafts before producing a protocol that is acceptable
on scientiﬁc, organizational, and ethical grounds.
Where possible, invite all people or organizations who
may make use of the information obtained to provide input
as to the structure of the protocol. In this way, they will be
aware of the nature of the study and can make suggestions
for modiﬁcations before the study begins.
It may often be desirable to construct the protocol in such
a way that it will make the study comparable with previous
existing studies, whether in the same or in other countries.
This will be especially true for deﬁning the population and
constructing the instruments.
Involve the statistician who will be responsible for the
analysis and presentation of the results from the ﬁrst plan-
ning stages. It may also be necessary to consult with a statis-
tical expert in study design.
To make the choices necessary to construct a protocol (ex-
cept for the ﬁrst point), prior knowledge of variability in the
population and of ways in which it is practical to collect the
information will be of help. For this, a pilot study may be
necessary.
The ﬁnal protocol will serve, among other things, as:
- a specification of the scientific design, including motivation
  and aims;
- an operations manual by which all investigators know what
  is expected of them;
- a prior record of assumptions and hypotheses so that you
  cannot be accused of drawing post hoc conclusions.
A protocol may vary from a few to 50 pages, depending on
the complexity of the study.
1.2.1 Study validity
The ﬁnal role of any study is to convince. Thus, in conduct-
ing a study, it is essential that people reading your ﬁnal report
are prepared to accept that your conclusions are valid. Internal
validity refers to the extent to which your conclusions
apply to the people actually studied, whereas external valid-
ity refers to the possibility of generalizing such conclusions
to a wider population, whether persons, settings, or times.
You must expect that the recipients of the ﬁnal report will
closely and critically question all of these.
Internal validity
Relationship validity. The first and most fundamental type
of questioning will be whether the relationships that you have
found between the outcome and the sources of explanation
are valid. Threats to this can come, for example, from claims
that the relationships simply arose by chance in the sample
examined or that the way in which you collected the data was
biased.
Major problems may arise from:
- biases in study design or implementation;
- too much random variation in measurements;
- the sample size being too small to detect a relationship,
  called lack of power;
- applying an inappropriate statistical analysis;
- a relationship not being stated in the protocol, but found
  by ‘data dredging’ after the data have been collected.
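Lack of power can be made concrete by simulation. The sketch below
is an illustration under invented assumptions: a true difference of
2 units between two groups, a standard deviation of 4, and a rough
decision rule (an approximate two-sided 5% test based on the normal
distribution). The function name `estimated_power` is hypothetical,
and a statistician would normally use a more exact calculation.

```python
import random
import statistics

def estimated_power(n_per_group, true_difference, sd,
                    n_simulations=2000, seed=0):
    """Fraction of simulated studies in which a real difference is detected."""
    rng = random.Random(seed)
    detections = 0
    for _ in range(n_simulations):
        control = [rng.gauss(0.0, sd) for _ in range(n_per_group)]
        treated = [rng.gauss(true_difference, sd) for _ in range(n_per_group)]
        difference = statistics.mean(treated) - statistics.mean(control)
        # Standard error of the difference between two independent means.
        standard_error = (statistics.variance(control) / n_per_group
                          + statistics.variance(treated) / n_per_group) ** 0.5
        # Rough normal-approximation rule; an assumption of this sketch.
        if abs(difference / standard_error) > 1.96:
            detections += 1
    return detections / n_simulations

power_small = estimated_power(n_per_group=10, true_difference=2.0, sd=4.0)
power_large = estimated_power(n_per_group=100, true_difference=2.0, sd=4.0)
print(f"power with  10 per group: {power_small:.2f}")
print(f"power with 100 per group: {power_large:.2f}")
```

With the small groups, most simulated studies miss a difference
that really exists; with the larger groups, most find it.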
Causal validity. If the relationships found can be accepted,
your second claim may be that they are causal. Causal re-
lationships are most easily studied in closely controlled cir-
cumstances, but this will limit generalization of conclusions.
Challenges to causal conclusions can arise, for example, if
some other source of explanation can be invoked as inﬂu-
encing the outcome or if it can be argued that causality could
be in the opposite direction. This can always occur if you
have not used an intervention in the study and if you have
not randomized the subjects involved to the treatments (Sec-
tions 1.4.1 and 3.1.3).
Instrument validity. Even if the critics can accept the
existence of a relationship, and, if applicable, the fact that it
is causal, they may claim that the empirical phenomena ob-
served do not correspond to the theoretical concepts pro-
posed. In other words, the causal relationship that you have
found is not what you claim it to be. Problems of instrument
validity can arise from the ways in which you have measured
either the outcome or the sources of explanation. It may be
as simple as a bias, but may be related to complex problems
of measuring attitudes and opinions (Sections 1.2.4 and 2.3).
External validity
If the relationships you have found within a sample of peo-
ple can be accepted, whether claimed to be causal or not,
you must then ask to what extent they are generalizable to
other people, in the same time and location, or elsewhere. If
there is an interaction between a source of explanation and
the type of subject, the setting of the study, or the time of the
study, generalization of the results will be questionable. The
essential technique here is the random selection of a sample
from an appropriately and widely enough deﬁned population
of interest (Sections 1.4.1 and 2.1.2).
Experimental trials, because they apply interventions by
using random allocation of treatments but involve no random
selection of subjects, will generally have high internal va-
lidity but questionable external validity. In contrast, sample
surveys, with their random sample selection but no treatment
allocation, will have high external validity but no causal va-
lidity. Thus, when human beings are involved, internal and
external validity are often in conﬂict. The use of strict con-
trol and homogeneity within a study will allow you to detect
relationships more easily but will restrict the breadth of ap-
plication of your conclusions.
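The two uses of randomness contrasted above, random selection of
subjects and random allocation of treatments, can be sketched in a
few lines. The population of 1000 identifiers and the group sizes
are invented for illustration.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Hypothetical sampling frame: one identifier per member of the population.
population = [f"person_{i:04d}" for i in range(1000)]

# Random selection: every member has the same chance of entering the
# study. This is what supports generalization (external validity).
sample = rng.sample(population, 40)

# Random allocation: chance alone decides who receives the treatment.
# This is what supports causal conclusions (internal validity).
shuffled = sample[:]
rng.shuffle(shuffled)
treatment_group = shuffled[:20]
control_group = shuffled[20:]

print("treated:", treatment_group[:3], "...")
print("control:", control_group[:3], "...")
```

A sample survey typically performs only the first step, and an
experimental trial only the second; the sketch performs both merely
to show that they are distinct operations.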
1.2.2 Question investigated
The ﬁrst step in preparing a protocol is to translate your
vague general objectives, that have made a study necessary,
into more detailed and speciﬁc objectives. This may entail
developing working hypotheses that you can empirically test
by the study. For all of this to become operational, you will
usually have to choose some speciﬁc observable outcome as
the principal object of study to be explained: being cured of
a disease or becoming enrolled at school, for example. Such
an outcome may be more or less ‘natural’, but you should al-
ways carefully construct and deﬁne it in an appropriate way.
Be wary of predeﬁned administrative or common-sense cat-
egories.
A number of steps are generally useful in developing the
central theme of the study:
- Search the literature to find other similar studies already
  available, whether in the same or other countries.
- Study the appropriate literature to discover the most suitable
  techniques of design and analysis for such a research
  project.
- Meet the people concerned to discuss the means of
  operationalizing all aspects of the study.
- Plan the budget carefully to ensure that the objectives can
  realistically be met.
- Where necessary and possible, consult outside experts.
1.2.3 Population and time frame
Always try to have in mind a clearly deﬁned population about
which you plan to obtain information and a time frame to
which it will be applicable. This is a complex technical ques-
tion that I shall discuss in detail below and in the following
chapters.
If, as is usually the case, you cannot study the whole pop-
ulation, you must also clearly specify the means of objec-
tively choosing a representative sample of the appropriate
size. This is at the centre of the design of the study. The
principal designs will be described below.
In experimental trials, you can decide on the treatments
or procedures to be compared. In observational studies, such
ﬂexibility is not possible. Having decided on the types of
comparisons to make, search for some environment in which
it is possible to collect data to provide such comparisons.
Often, you must make do with comparisons that are far from
ideal.
1.2.4 Instruments and measurements
Besides the principal outcome to be explained, you will want
to study the conditions under which it is produced (Section
1.3.3). This will require a careful operational deﬁnition of
the sources of exposure that could explain the observed dif-
ferences of outcome. If certain such conditions are to be un-
der the control of the investigators, clearly deﬁne the means
of assigning them to the subjects.
Instruments
The protocol must specify the instruments that will be used
to make the measurements, both of the outcome and of the
sources of exposure, as well as the investigators who will use
them. When necessary, it must also give the means of appro-
priate training of these investigators. Remember that a mea-
surement, whatever the instrument, involves many known
and unknown implicit theoretical variables, as well as un-
proven assumptions.
Three criteria are generally required for the evaluation of
any instruments to be used.
1. Validity: Several types of instrument validity are impor-
tant, although some are much more difﬁcult to judge than
others.
(a) Criterion validity involves assessing an instrument
against some accepted absolute standard.
(b) Construct validity refers to whether the empirical
phenomena being observed actually correspond to
the theoretical concepts you wish to study. You can
assess it by inspection of the pattern of relationships
between the instrument and other measures made in
completely different ways.
(c) Face or content validity involves checking if the in-
strument (usually a questionnaire) covers the range
of topics for which it is intended. A panel of experts
usually makes the judgements.
These are all connected to the internal validity, so that they
are also prerequisites for the external validity of the study
as a whole.
2. Reliability: An instrument is reliable if it is able to yield
the same results on repeated application. You may some-
times be able to accomplish this by looking at internal reli-
ability at a single administration. Thus, for example, split-
test reliability involves splitting your instrument (usually
a questionnaire) into equal halves and checking the de-
gree of agreement. The alternative is test-retest reliability,
but take care that subjects do not change in any important
ways between the two administrations. You should also
assess reliability of results among different investigators
using the instruments. Do this in normal operating condi-
tions, because intensive training or special expertise will
bias the results.
3. Sensitivity: The instrument should be able to detect scien-
tiﬁcally important differences, or changes over time. On
the other hand, you may waste money, and perhaps time,
if you use overly precise instruments.
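As an illustration of split-test reliability, the sketch below
simulates a questionnaire, splits its items into two halves, and
measures agreement between the half scores with a correlation
coefficient. Everything here is an assumption made for the example:
the 200 respondents, the 20 items, the way answers are generated,
and the choice of the Pearson correlation as the measure of
agreement.

```python
import random

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(7)
n_respondents, n_items = 200, 20

# Simulated answers: each item reflects the respondent's underlying
# trait plus item-specific noise (a deliberately simple assumption).
scores = []
for _ in range(n_respondents):
    trait = rng.gauss(0, 1)
    scores.append([trait + rng.gauss(0, 1) for _ in range(n_items)])

# Split the instrument into two equal halves: odd and even items.
first_half = [sum(row[0::2]) for row in scores]
second_half = [sum(row[1::2]) for row in scores]

split_half_r = pearson(first_half, second_half)
print(f"agreement between halves: r = {split_half_r:.2f}")
```

A value near 1 indicates that the two halves rank respondents in
much the same way; a low value suggests that the items are not
measuring one thing consistently.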
Data recording
Common problems in data recording include:
- unclear specification of the data to be recorded;
- values that must be calculated before entry instead of
  being entered as observed (for example, age from date of
  birth);
- too much data collected from each person;
- poor quality of recorded data;
- data recorded in a form unsuitable for transfer to a
  computer.
In collecting the data, it is better to anticipate problems than
simply to wait for them to occur. Record all departures from
protocol.
Together, this and the preceding subsection constitute what
are classically called the material and methods.
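The point about calculated values can be sketched as follows:
record the date of birth as observed, and let the computer derive
the age afterwards. The function name `age_in_years` and the dates
used are hypothetical.

```python
from datetime import date

def age_in_years(date_of_birth: date, on_date: date) -> int:
    """Completed years of age on a given date, derived from the raw record."""
    if on_date < date_of_birth:
        raise ValueError("reference date precedes date of birth")
    # Has the birthday already occurred in the reference year?
    had_birthday = (on_date.month, on_date.day) >= (
        date_of_birth.month, date_of_birth.day)
    return on_date.year - date_of_birth.year - (0 if had_birthday else 1)

# Hypothetical record: only the date of birth was written on the form.
print(age_in_years(date(1990, 7, 15), date(1998, 7, 14)))
print(age_in_years(date(1990, 7, 15), date(1998, 7, 15)))
```

Storing the observed date rather than a hand-calculated age removes
one source of arithmetic error and allows the age to be recomputed
for any reference date.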
1.2.5 Analysis and reporting
Arrange for all results to be directly produced in, or
transferred to, a form that is machine readable. At this stage,
verify all of the data to identify errors, cleaning and correcting
as necessary.
Here, we are principally concerned with studies that re-
quire statistical analysis. You need only set out the main
lines of such analyses in the protocol; these should be fairly
ﬂexible. Two main phases will be involved:
1. selecting among all possible statistical models those which
are most appropriate to describe the patterns of interest in
the data;
2. providing measures of precision of the unknown quanti-
ties in these models that are calculated, or estimated, em-
pirically from the data.
Your choices will depend primarily on the type of outcome
you have chosen for study.
Although the operations of statistical analysis are one of
the cheapest aspects of a complete study, the time required to
carry them through is often the most underestimated aspect
of a study. Masses of unanalysed, and hence wasted, data
that cost a great deal to collect lie stocked throughout the
world.
Finally, you must report the results obtained in a form that
is understandable by the audience to whom it is addressed.
For these results to be convincing, your report must cover a
clear description of all steps of the study to provide evidence
that you carried it out in an objective and complete manner.
1.2.6 Monitoring the study
Follow your study closely to ensure that all aspects of the
protocol are respected. Monitoring will be particularly im-
portant if:
- there is an intervention;
- there are several centres collecting data;
- the study extends over a considerable period of time.
Monitoring can serve a number of other functions as well,
including:
- finding errors in reporting, if data are being entered in the
  computer as they are recorded;
- sustaining motivation by providing preliminary general
  results, called interim analysis;
- if an intervention is involved,
  - detecting adverse side effects;
  - allowing the study to stop early if the intervention
    proves either ineffective or very effective.
Interim analysis is particularly delicate in any study. If you
disclose partial results, this may inﬂuence future responses
still to be recorded.
1.2.7 Administration
Carefully plan the project management. This will include:
- Who has overall responsibility for the project?
- Who is in charge of various areas of the work, possibly
  divided both geographically and by subject?
- How are the various activities to be coordinated?
- What is the timetable?
You will have to establish a detailed budget covering:
- staff salaries;
- travel and subsistence;
- consumables, including general running costs and materials;
- equipment;
- overheads.
Before going ahead, be sure that you will have adequate
funding available.
1.3 Types of observations
1.3.1 Choice of subject areas
A basic problem is to select the most relevant items of infor-
mation or types of observations from all those that it is prac-
tical to collect and that might conceivably have a bearing on
the subject you are investigating. You may take a number of
steps to resolve this problem:
1. Determine the details of the information required to deal
with the problem.
2. Consider whether there are any related problems of impor-
tance on which this information, possibly supplemented to
some extent, would throw light.
3. With the whole ﬁeld mapped out in this way, consider the
practicality of obtaining the necessary information cover-
ing any (sub)set of these problems.
4. Take ﬁnal decisions on the inclusion of each point in light
of the relative importance of the problems and the total
load possible to impose on the investigators and on the
subjects who will be involved.
The items of information that you will collect should form a
rounded whole, covering a coherent area of interest.
You will only be able to collect accurate information if
you obtain the full and willing cooperation of the investi-
gators and of the subjects. Your study should have a clear
purpose that you can explain to them, and the material that
you collect must be relevant to this purpose.
You can distinguish three main types of observations that
you will frequently need in a study:
1. objective facts;
2. opinions, attitudes, and motivations;
3. personal knowledge.
These will generally require different means of data collec-
tion and will be used for different purposes. If the observa-
tion unit is not human beings but, say, groups, only the ﬁrst
is usually pertinent.
1.3.2 Outcomes
Your primary observation on each subject will be that of the
phenomenon or outcome to be explained. The statistician
calls this the response variable.
We may distinguish the prevalence of the phenomenon
from its incidence. The former is concerned with the study
of all existing cases that have the characteristic in which we
are interested, as compared to those who do not. It refers to
the probability of a case in the population whereas the latter
is concerned only with new cases, referring to the rate, risk,
or intensity of its occurrence.
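A small worked example may help to separate the two ideas. The
counts below are invented, and incidence is computed here in one of
its simpler forms, as the proportion of initially unaffected people
who become new cases within one year.

```python
# Hypothetical counts for a population followed for one year.
population_size = 10_000
existing_cases_at_start = 300   # people who already have the condition
new_cases_during_year = 50      # people who develop it during the year

# Prevalence: proportion of the population that has the condition
# at a given moment (all existing cases).
prevalence = existing_cases_at_start / population_size

# Incidence (as a one-year risk): new cases among those who were
# free of the condition, and so at risk, at the start of the year.
at_risk = population_size - existing_cases_at_start
incidence = new_cases_during_year / at_risk

print(f"prevalence at start: {prevalence:.3f}")
print(f"one-year incidence:  {incidence:.4f}")
```

A long-lasting condition can thus have a high prevalence even when
few new cases arise, and a brief one a low prevalence despite many
new cases.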
Response variables can take a number of forms that will
determine how the data are to be analysed:
- In many fields, the most common outcome type is binary,
  taking only two values, such as yes or no. Then, the
  phenomenon studied is the proportion of units in the
  population in each of the two categories: the proportion of
  children attending school or of people who are cured.
- A second common type of phenomenon to explain involves
  a count of something: the number of times a child
  has failed at school or the number of infections.
- Finally, a quantitative measurement may be made. Two
  forms of such measurements can be distinguished:
  - The observations can take any value, positive or
    negative. This is the main type of observation treated in
    many classical statistics books where the bell-shaped
    normal curve is emphasized, but is rare in practice.
  - Only positive values are possible, for example, length
    or duration in time, such as survival or length of
    unemployment.
You may also record other types of responses, such as mem-
bership in one of a number of categories. These may be un-
ordered or ordered, called respectively nominal and ordinal.
Take special care in the selection and construction of the
response variable, because the success of your study depends
upon it.
1.3.3 Sources of explanation
The second type of observations on each unit will be the
characteristics or sources of exposure that hopefully will ex-
plain at least some of the differences in the observed val-
ues of the response variable. The statistician calls these the
explanatory variables. However, care must be taken with
this term, because research workers in many disciplines call
such observable quantities the parameters. As we shall see
(Section 4.3.1), this latter term has a very different sense for
the statistician, leading to problems of communication and
to misunderstandings.
Explanatory variables take two main forms:
- two or more qualitative categories that separate the
  population into subgroups, such as sex, marital status, and so
  on;
- measured quantities, such as income.
Such variables can be useful in three main ways:
- as descriptive categories among which the response varies,
  such as sex or geographical region (in most contexts);
- as explanatory, but unmodifiable, characteristics, such as,
  for adults, amount of formal education or childhood
  illnesses;
- as modifiable explanatory factors, such as accessibility to a
  public facility.
Of course, modifying an explanatory factor is only useful if
it is a causal factor, as discussed below.
The tendency is often to accumulate a vast number of ex-
planatory variables, but judicious choice of a relatively small
number is usually preferable for a number of reasons:
• The cost of data collection and analysis will otherwise be
  unduly increased.
• The time required for each respondent to provide the
  information should be limited so that you obtain reliable
  data.
• Large databases increase the risk of recording and
  management errors.
• If you collect a large number of explanatory variables, the
  data analyst will be overwhelmed, and probably will be
  obliged to ignore many of them.
• The number of explanatory variables showing relationships
  to the response variable just by chance will also be
  increased.
Among other things, statistical analysis serves to determine
which explanatory variables appear to have links to the re-
sponse variable of interest, and in what way, given the in-
herent uncertainty arising from the variability when only a
sample from the population of interest is observed.
1.3.4 Confounding
Many factors usually inﬂuence a response of interest, not all
of which can be investigated at any one time. Any factor
along with which the response varies is called a confound-
ing variable. If it is unequally distributed in the groups be-
ing compared, it will give rise to differences in the response
among the groups, distorting the comparison under study.
Consider, for example, alcohol consumption, smoking, and
lung cancer. Smoking and drinking tend to vary together.
Hence, one might be led to conclude that lung cancer is
caused by drinking.
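
The following minimal sketch illustrates how such a misleading conclusion can arise. The counts are invented purely for illustration: in them, lung cancer depends only on smoking, but smokers are more often drinkers. Comparing drinkers with non-drinkers directly then shows a difference that disappears once the comparison is made separately within smokers and within non-smokers.

```python
# Hypothetical counts, invented for illustration (not data from any study).
# Key: (smoker, drinker) -> (number of people, number with lung cancer)
counts = {
    (True, True): (800, 80),
    (True, False): (200, 20),
    (False, True): (200, 2),
    (False, False): (800, 8),
}


def rate(groups):
    """Cancer rate pooled over the given (people, cases) pairs."""
    people = sum(n for n, _ in groups)
    cases = sum(c for _, c in groups)
    return cases / people


# Crude comparison: drinkers versus non-drinkers, ignoring smoking.
crude_drinkers = rate([v for (_, d), v in counts.items() if d])
crude_abstainers = rate([v for (_, d), v in counts.items() if not d])

# Stratified comparison: the same contrast within each smoking group.
stratified = {
    smoker: (rate([counts[(smoker, True)]]), rate([counts[(smoker, False)]]))
    for smoker in (True, False)
}

print(f"crude: drinkers {crude_drinkers:.3f}, "
      f"non-drinkers {crude_abstainers:.3f}")
for smoker, (drink, no_drink) in stratified.items():
    label = "smokers" if smoker else "non-smokers"
    print(f"{label}: drinkers {drink:.3f}, non-drinkers {no_drink:.3f}")
```

With these invented counts, the crude rates differ (0.082 against 0.028), whereas within each smoking group drinkers and non-drinkers have the same rate.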
In an experimental trial, you have three weapons to handle
extraneous variables not under your direct control:
1. strict regulation of experimental conditions to reduce the
effects of such variables;
2. direct measurement of such covariates to allow for them
by matching, blocking, or stratiﬁcation (Section 3.1.3) in
the analysis;
3. randomization of treatment assignment to make the aver-
age effect of confounding variables the same in all treat-
ment groups.
In observational studies, only the second strategy (Section
2.5.2) is generally possible. Your choice of environment is
limited by the availability of the comparisons to be made.
By deﬁnition, treatments are not assigned, randomly or oth-
erwise, in an observational study.
Thus, in an observational study, confounding variables
can be controlled by:
• stratification;
• matching similar individuals;
• measurement of concomitant explanatory variables.
These will be further discussed below.
1.3.5 Accuracy and precision
You must design any study in such a way that you can actu-
ally attain your desired objectives. Here, the accuracy of the
results is their lack of bias, that is, you are actually measur-
ing what you want to study. You must distinguish this from
the precision of the results, the range of values within which
what you are studying is almost sure to lie, usually assuming
that the measurements are accurate.
Inaccuracies result from systematic biases in the methods
of collecting data, particularly from:
• the selection of the individuals to observe – coverage
  error;
• missing responses – non-response error;
• the mode of data collection, as for differences in response
  by mail, by telephone, and in person – design error;
• the validity of the instruments used to make the
  observations – instrument error;
• effects on response due to the way the instrument is
  administered – investigator error;
• the accuracy of the information provided – respondent
  error.
You can never improve accuracy once the observations are
made. In addition, to measure any study biases, you will
require data external to the study itself.
The precision will depend primarily on:
1. the intrinsic variability of whatever is being observed, this
generally being relatively large for human subjects;
2. the number of individuals upon whom observations are
made and, to a lesser extent, the number of observations
per individual;
3. the actual design of the study;
4. the precision of the instruments used;
5. to a minor extent, the type of analysis performed.
The ﬁrst three points determine the sampling precision.
The standard error is a crude measure of the precision
of an estimate obtained in a study. It is a function of the
variability of the population, as measured by the variance
or its square root, the standard deviation, and of the sample
size, decreasing as that size increases.
If the variance is $\sigma^2$, then the standard error is $\sigma/\sqrt{n}$,
where $n$ is the sample size. Approximately one-third of the
observable random variability in an estimate will be greater
than the standard error and one-twentieth greater than twice
the standard error.
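
As a rough check on these statements, the sketch below takes the standard error of a sample mean to be σ/√n and simulates many samples. It assumes, for illustration only, that the estimate is a mean of normally distributed observations; the values of σ, n, the number of replications, and the seed are arbitrary choices.

```python
import math
import random


def standard_error(sigma, n):
    """Standard error of a sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


def tail_fractions(sigma=10.0, n=25, replications=20000, seed=1):
    """Simulate sample means around a true mean of zero and return the
    fractions lying more than one and more than two standard errors away."""
    rng = random.Random(seed)
    se = standard_error(sigma, n)
    beyond_one = beyond_two = 0
    for _ in range(replications):
        mean = sum(rng.gauss(0.0, sigma) for _ in range(n)) / n
        beyond_one += abs(mean) > se
        beyond_two += abs(mean) > 2 * se
    return beyond_one / replications, beyond_two / replications


print(standard_error(10.0, 25))   # 2.0
print(standard_error(10.0, 100))  # 1.0: four times the sample, half the error
print(tail_fractions())           # close to one-third and one-twentieth
```

Note that quadrupling the sample size only halves the standard error, which is why precision becomes expensive to improve.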
An estimate will be biased with respect to the population
of interest if the study fails to include certain units because
of coverage or non-response errors. In the same way, the
standard error can only measure variability among samples
due to not including all of the population in the observed
sample; it does not take into account non-coverage and non-
response.
Although only providing a rough estimate of precision for
almost all types of response variables, the standard error will
be useful for calculating the size of a study.
Lack of accuracy immediately places in question the value
of any results and conclusions, whereas lack of precision
generally only increases the uncertainty surrounding the ex-
act values calculated.
1.3.6 Missing values
Non-respondents are almost always different from those who
agree to respond, although the amount of difference may
vary among questions to be answered. Thus, if non-response
is not restricted to a small proportion of the sample, no gen-
eral validity can be claimed for any conclusions drawn. Make
every effort to reduce the number of missing values. At the
same time, forcing people to participate or to reply to spe-
ciﬁc questions can bias the results because answers will not
be reliable, and perhaps not even relevant. Institute a rigor-
ous system of dealing with the non-response problem from
the outset of the study.
Non-response may involve all answers for a given indi-
vidual or only some of the answers. Many of the reasons
may be similar in the two cases. Keep in the study all
respondents who supply an answer to at least one question. They
can furnish information about reasons for non-response.
The bias of non-response is approximately proportional to
the rate of non-response ($R$) times the difference in parameter
value ($\phi_A - \phi_M$) between the group answering and the
missing group. Thus, increasing the response rate does not
necessarily reduce bias if the missing group becomes much
more extreme so that the difference between the two groups,
$\phi_A - \phi_M$, increases more rapidly than $R$ decreases. Note,
however, that the situation is usually even more complex than
this because both the response rate and the parameter value
will differ among types of non-response: for example, not
contacted, incapable of replying, and refusal.
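
A small numerical sketch may make this concrete. The function below simply multiplies the non-response rate by the difference between the two groups; the two scenarios and all of their numbers are hypothetical, chosen so that a better response rate goes with a larger bias.

```python
def nonresponse_bias(nonresponse_rate, phi_answering, phi_missing):
    """Approximate bias: R * (phi_A - phi_M)."""
    return nonresponse_rate * (phi_answering - phi_missing)


# Hypothetical scenario 1: 30% missing, the groups differ by 10 units.
before = nonresponse_bias(0.30, 50.0, 40.0)

# Hypothetical scenario 2: extra effort cuts non-response to 20%, but the
# remaining non-respondents are more extreme, differing by 20 units.
after = nonresponse_bias(0.20, 50.0, 30.0)

print(round(before, 2), round(after, 2))
```

Here the bias rises from about 3 to about 4 units even though fewer people are missing.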
Substituting other individuals for the non-respondents is
usually a mistake because the replacements will resemble the
respondents, not the missing ones. It is not sufﬁcient to plan
for a sample of 1000 when 800 are required and 20% are
expected to be missing. This is in no way equivalent to a
complete random sample of 800.
In repeated surveys, such as panels, and longitudinal ex-
periments, reduction of non-response may be especially im-
portant because it will tend to increase progressively. A con-
tinually larger number of missing respondents, called drop-
outs, can indicate that something is wrong, so that the study
should either be reorganized or abandoned. In surveys, col-
lection of information about friends and relatives of the par-
ticipants at the beginning of the study can be helpful in trac-
ing those who disappear. However, in other cases, study of
the dropping-out process may be important in its own right,
as when it involves drug side effects. Then, you should not
discourage it but allow it to proceed ‘naturally’.
1.4 Study designs
Optimization of study designs has (wrongly) been concerned
primarily with obtaining maximum sampling precision for
the least cost. It generally ignores questions of bias arising
from questionnaires, investigator training, and so on, taking
into account only the error that comes from not observing
those members of the population of interest who are not in
the sample. In choosing a design, you must decide how to
divide resources among maximizing response rates, improving
instruments, and so on, as well as increasing sampling
precision.
1.4.1 Population and sample
Any group of individuals that you wish to study will be called
the eligible population. You must clearly deﬁne it in such a
way that you know what individuals belong to it or, at least,
so that you know if any given individual belongs to it. It
may often be desirable to deﬁne the population in such a way
that it will make the study comparable with previous existing
studies, whether in the same or other countries.
Often the eligible population is a subset of some larger
source population. In practical situations, the latter will con-
tain four subgroups:
1. the eligible;
2. the adequately assessed ineligible;
3. the assessed but unclassiﬁable because of incomplete in-
formation; and
4. the unassessed, due to lack of resources, unavailability,
and so on.
Thus, for example, special difﬁculties will occur if the pop-
ulation contains ‘ﬂoating’ elements such as the homeless or
nomads. In certain cases, for reasons of expense, you may
have to exclude them. This may sometimes be justiﬁed by
their differing fundamentally from the rest of the population.
If they are important, a separately constructed study may be
necessary.
Once you have deﬁned, and enumerated, your population
of interest, your problems do not end. Certain members may
not be accessible, perhaps because you cannot locate them
or because they are incapable of or unwilling to participate
in the study.
To have practical value, the results of a study will gener-
ally need to be applicable to subjects other than those in the
eligible population, for example to those who will enter that
eligible population in future years. Thus, you will aim to ap-
ply the results to some target population. In contrast to the
other two populations, this one is usually not ﬁxed.
In most cases, the population will be so large that you
cannot possibly observe all of the individuals in it, whether
because of time constraints, expense, or other reasons. Then,
a sample is any subgroup of the population that you choose
to observe. Thus, you will have a ﬁve-level hierarchy from
the source population to the sample:
1. source population;
2. target population;
3. eligible population;
4. accessible population;
5. sample.
Your selection criteria for inclusion in the sample will deter-
mine the external validity of the results of a study completely,
in so far as they are actually fulﬁlled, and the internal validity
to a large extent.
Once you have clearly deﬁned the population, you will
have to make certain fundamental choices as to the appropri-
ate design of the study. Several basic principles are common
to all designs. Two of the most important are the following:
1. Randomize wherever possible to maintain objectivity.
2. Calculate the minimum necessary sample size so as not to
waste resources.
Let us look at these in turn.
Randomization
As we shall see, randomization is used in selecting a sample
from a population (Section 2.1.2) and, when some interven-
tion is involved, in assigning subjects to groups receiving the
different treatments (Section 3.1.3).
The term ‘randomness’ is an everyday common-sense no-
tion that does not generally agree with what statisticians mean
by the term. It is often associated with the idea of haphazard-
ness. This latter term rarely if ever corresponds to the truly
random in the sense deﬁned below but often simply means
that an event has no obvious explanation. In statistical ran-
domness, the probabilities of the various possible events un-
der consideration are (in principle) exactly known, whereas
they are not in the everyday usage.
A very long completely random sequence of digits has the
following characteristics:
• Each digit occurs equally frequently.
• Adjacent digits and sets of digits are independent of each
  other, so that you cannot predict the following digits from
  previous ones.
• Reasonably long sequences show regularity, such as about
  100 ones in a series of 1000 random decimal digits.
Randomness is thus a property of the whole sequence, or
more exactly of the process that generated it. You cannot
judge a shorter subsequence drawn from it in isolation as to
its randomness without knowing its source.
Traditionally, tables of random numbers were used. Now,
you will usually generate such numbers by statistical soft-
ware on a computer. As we shall see, these are used in vari-
ous aspects of the design of a study.
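
As an illustration of generating such numbers by computer, the sketch below uses Python's standard pseudo-random generator. The seed, the list of 50 unit labels, and the 10 subject names are all hypothetical; the example checks digit frequencies, draws a simple random sample, and allocates subjects at random to two treatment groups.

```python
import random
from collections import Counter

rng = random.Random(2024)  # arbitrary seed so that the run can be repeated

# 1000 random decimal digits: each digit should appear about 100 times.
digits = [rng.randrange(10) for _ in range(1000)]
frequencies = Counter(digits)
print(sorted(frequencies.items()))

# Simple random sample of 5 units from a hypothetical list of 50 labels.
population = list(range(1, 51))
sample = rng.sample(population, k=5)
print(sorted(sample))

# Random allocation of 10 hypothetical subjects to two equal groups.
subjects = [f"subject_{i}" for i in range(1, 11)]
shuffled = subjects[:]
rng.shuffle(shuffled)
groups = {"A": shuffled[:5], "B": shuffled[5:]}
print(groups)
```

Recording the seed in the protocol allows the selection or allocation to be reproduced and checked later.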
Sample size
Sample size will largely determine the precision of your re-
sults. Always calculate it before beginning a study. The tech-
nical details will be given in the following chapters; see, par-
ticularly, Sections 2.6 and 3.3.4.
• If it is impossible for you to finance a sufficiently large
  sample, so that the precision will be too low to draw useful
  conclusions, then you should probably abandon the idea
  of making such a study.
• If your planned sample size is too large for the precision
  required, you will unnecessarily waste resources.
In the ideal case, you should specify the precision and then
calculate the corresponding sample size, but this is often not
possible and you must use the reverse process, of calculating
the precision for a feasible sample size.
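
Both directions of this calculation can be sketched as follows, assuming that precision is expressed as the standard error of a mean, σ/√n, and that a value for the standard deviation σ is available from earlier work. The function names and the numbers are illustrative only; the detailed methods are those of the later chapters.

```python
import math


def sample_size_for(sigma, target_se):
    """Smallest n for which sigma / sqrt(n) does not exceed target_se."""
    return math.ceil((sigma / target_se) ** 2)


def precision_for(sigma, n):
    """Standard error obtained with a feasible sample size n."""
    return sigma / math.sqrt(n)


# Ideal case: specify the precision, then find the sample size.
print(sample_size_for(sigma=15.0, target_se=1.0))  # 225

# Reverse process: find the precision for a feasible sample size.
print(precision_for(sigma=15.0, n=100))            # 1.5
```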
However, sample size is not the only important determi-
nant of cost to take into account. Maximization of response
rates and improvements in instruments and investigators are
also both important and costly. It is usually difﬁcult to weigh
the relative beneﬁts of each.
Unfortunately, sample size, and the resulting precision,
are easily measured so that effort is often concentrated on
it, at the expense of biases from non-response and inaccurate
answers. Ignoring the latter in your calculations can lead you
to greatly overestimate accuracy and precision.
1.4.2 Types of designs
A number of different basic organizations of a study are pos-
sible.
Prospective designs
In a prospective design, you sample individuals from a pop-
ulation and then follow them over a certain period of time,
recording new events. In principle, the idea is to start with
groups having different values of some important explana-
tory variables, that is, different sources of exposure, and to
follow them to see if different distributions of the response,
the outcome, result.
We can distinguish three cases:
1. In an experimental trial, such as the clinical trial often
used in medical studies, you will randomly allocate the
subjects to one of a number of different treatments before
the following observations.
2. In a follow-up study, you will follow distinct groups with
different exposures, called prognostic or risk factors, to
determine if they ﬁnally give a different response.
3. In a panel design, sometimes misleadingly called a cohort
design, you simply observe all variables repeatedly as they
occur over time.
In a cohort design, people of given ages are followed (strictly
speaking, a cohort consists of all of the age group). This may
be either prospective or historical.
Follow-up and panel studies are important for their ease
of ensuring representativity, at least at the beginning, before
drop-outs occur. Panel studies also cover the time dimension
in a population. You may make observations more or less
continuously, for example, using diary cards, or at intervals
of time, such as once a year. However, such studies will only
provide you with information about evolution as is, without
any intervention, whether voluntary or external.
Experimental trials have the big advantage of allowing a
direct causal interpretation because you have applied an in-
tervention, but the major limitation of being impossible in
most human situations. Even when possible, ethical consid-
erations mean that you can usually only enroll unrepresenta-
tive volunteers.
Most trials look for a difference between two or more in-
terventions or treatments. A particularly difﬁcult type of trial
to conduct is the equivalence trial, where you wish to de-
termine if a new treatment is equivalent to the existing one.
Does a new medication or teaching strategy provide as good
results as that currently in use? If your study is too small, it
will be incapable of detecting a difference so that you may
draw the wrong conclusions from too little information.
Cross-sectional designs
In a cross-sectional design, you simply record all variables
on observed sampled individuals at one given ﬁxed point in
time. You can use them to study the state of a given popu-
lation, for example, the prevalence of some condition. They
are the easiest type of design to ensure a representative sam-
ple, by randomization, but have the major handicap of lack-
ing a time dimension.
Do not confuse a series of cross-sectional studies using the
same questionnaire with a panel study. In the former, differ-
ent people are involved each time, whereas, in the latter, the
same are used.
One special type of cross-sectional design is sometimes
used. Aggregate measurements of some characteristics are
compared across population groups, usually geographically
deﬁned, in an ecological design. Thus, you might want to re-
late the success rate in schools to the class size without taking
into account individual student and teacher characteristics.
Retrospective designs
In a retrospective design, you choose subjects according to
their response values, their outcomes, and then obtain the
values of the explanatory variables, the exposures. Thus, in
contrast to all of the previous designs, here the explanatory
variables are subject to variability, whereas the responses
may be ﬁxed and known. Many cross-sectional studies are,
in fact, retrospective, because many questions apply to past
history.
The major advantage is the speed with which you can ob-
tain results, whereas the major problem is distortion of in-
formation as you try to go further back into the past. For
example, you generally cannot use such a design to assess
prevalence: a currently representative sample is not
representative at previous points in time because of differential
mortality, and so on.
In certain circumstances, this may be the only design pos-
sible. In epidemiology, it is often only after the victims have
appeared that the origins of an epidemic can be studied!
A case–control design is a special type of design, usually
retrospective, where a number of cases having a given char-
acteristic are available. You then match these with similar
control subjects who do not have the characteristic in an at-
tempt to distinguish inﬂuential, hopefully causal, factors that
occurred in the past. This design is often used when one of
the response events is uncommon, as for a rare disease, be-
cause a prospective study would require an enormous sample
to obtain even a few individuals with the event. In such a sit-
uation, this design is highly efﬁcient in terms of the number
of subjects and the time required. But such studies are more
difﬁcult to design properly, especially because of the choice
of control group, than are standard prospective studies.
1.4.3 Causality
When you observe the individuals in a sample survey at a
given point in time, one kind of information that you should
obtain concerns what you might observe if you chose another
sample from the same population. However, to do this, you
must assume that no change is taking place in the population
between the two sets of observations. This is static informa-
tion. Even if you observe the same sample over several time
points, the information you obtain only refers to the evolu-
tion at those time points. You may extrapolate into the future,
but this will only yield valid results if all of the conditions of
change remain ﬁxed, as previously observed.
In contrast, operationally, causality, in a statistical con-
text, implies that changing one (explanatory) variable will
produce changes in the distribution of another (response)
variable. This differs fundamentally from the inferences you
can make from surveys where you must assume that the pop-
ulation remains the same or continues to evolve in the same
way. You cannot empirically study causality simply by tak-
ing static samples from a population, even by following them
over time. Notice that, with this deﬁnition, an explanatory
variable such as sex could not be a cause.
Causality, as so conceived, is a group or collective, not
an individual, effect: two interventions cannot generally be
compared on the same individuals (certainly not simultane-
ously), but only on two different groups (for an exception,
see Section 3.3.1). You then study differences in the distri-
butions of groups of responses.
The cause of some effect is often not unique. Both better
manuals and superior teaching can improve student perfor-
mance; several different drugs may cure the same illness.
Causality also implies a time sequence: an effect cannot
occur before its cause. Theory should specify some time in-
terval within which the effect will occur or last for appro-
priate measurement to be possible. However, in many cases,
this ordering may not be obvious. For example, many dis-
eases have a considerable latency period before the symp-
toms appear. It may not be possible to eliminate events oc-
curring during that period as potential causes because the
time of the true onset of the disease is not known. Even if
the effect of some cause is theoretically instantaneous, test-
ing the relationship will require a temporal precedence of
intervention before effect.
In pure science, one searches for the causes of a given
effect. In strictly applied work, one asks if a given cause
(treatment) will produce the desired outcome. However, the
latter is usually the required method for empirical study even
in pure research.
Thus, instead of taking a sample of individuals from the
population and observing the values of the variables that they
have, as in a survey, suppose that you can select the indi-
viduals and then control them by giving them values of the
variable(s) that you think are causes. This type of planned
intervention is called an experimental trial. As we have seen
above, the assignment of such values is usually done ran-
domly, for the same reasons as in choosing a sample from a
population, especially to eliminate biases.
To perform an experiment, you must have:
• at least an approximate theory predicting the size, or
  direction, of effect of an intervention;
• a suitable group of subjects prepared to give consent to the
  intervention;
• means of (unrealistically) isolating the phenomenon
  studied from external sources of influence;
• stable responses whose only reason for changing over time
  is the treatment variable;
• measuring instruments whose resolution (precision) is fine
  compared to the size of the predicted effect.
In studies involving human beings, causality is thus very dif-
ﬁcult to ascertain empirically, which is not to belittle its ex-
treme importance. Think of the relationship between smok-
ing and lung cancer. The debate lasted for many years, al-
though sampling from existing populations showed a strong
association whereby proportionally more smokers had lung
cancer. But an experiment could not be performed in which
some people were randomly chosen and told to smoke and
others not, after which cancer incidence would be observed
in the following years.
In such cases, where an intervention is not possible, the
best plan is to attempt to discover as many different con-
sequences as possible of the causal hypothesis under study.
Thus, for example, with smoking and lung cancer, we could
look at the death rate for:
• people smoking different amounts in the same time;
• those smoking the same amount but in different times;
• ex-smokers and current smokers of the same amount;
• ex-smokers smoking different amounts;
• ex-smokers smoking the same amount but stopping at
  different times in the past;
(see Cochran, 1965).
Take particular care when drawing conclusions from an
ecological design. Such studies may provide clues to rela-
tionships among individuals but can suffer from the ecologi-
cal fallacy. Suppose that you make a study in a set of groups
or clusters, say geographical regions, and that you have avail-
able global measures of some response and a corresponding
source of exposure for each cluster. Although the two mea-
sures vary together, this provides no direct evidence of any
links between the response and the exposure at the individual
level, one of the main reasons being that unavailable con-
founding factors could explain the relationship. Individual
success at school may not be linked to class size even though
the overall school success rate is; larger classes may
be found in more deprived neighbourhoods, with individual
success depending on social class.
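
The school example can be mimicked with a toy calculation. In the sketch below, every number is invented: the individual probability of success is assumed to depend only on social class, never on class size, yet the regional success rates fall steadily as average class size rises, because the regions with larger classes contain more pupils from deprived homes.

```python
# Hypothetical regions:
# (average class size, pupils from deprived homes, other pupils)
regions = [(20, 20, 80), (25, 40, 60), (30, 60, 40), (35, 80, 20)]

# Assumed individual success probabilities: they depend on social class
# only, never on class size.
P_DEPRIVED, P_OTHER = 0.5, 0.9


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


class_sizes = [size for size, _, _ in regions]
success_rates = [
    (deprived * P_DEPRIVED + other * P_OTHER) / (deprived + other)
    for _, deprived, other in regions
]

print([round(r, 2) for r in success_rates])
print(round(pearson(class_sizes, success_rates), 3))
```

The regional correlation is strongly negative although, by construction, class size has no effect on any individual pupil.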
Both in a static survey and in an experiment, you may
ﬁnd a relationship of dependence between two variables. The
statistical procedure to describe the relationship may be the
same in both cases. But your conclusions about the mean-
ing of the relationship must depend on the way in which you
collected the information. No mathematical manipulation of
the data afterwards can change this. You can only directly
study causality empirically if you can manipulate the appro-
priate explanatory variables. You can only draw causal con-
clusions from a survey, without experimentation, by making
empirically unveriﬁable assumptions.
1.4.4 Choosing a design
Many of the points already discussed in this chapter can be
summarized by considering issues surrounding the choice of
a study design.
The ﬁrst question that you must decide in selecting a de-
sign is whether an intervention will be involved or not. Ex-
perimental trials have the enormous advantage of allowing
causality to be empirically studied without untestable hy-
potheses. However, they often may require a very long study
duration between intervention and effect. When a trial is eth-
ically and logistically possible, it is usually preferable. How-
ever, most studies on human beings are not experimental.
It cannot be emphasized enough that not all kinds of de-
signs will allow you to draw the same types of conclusions,
in particular those about the causal effect of one variable on
another. Only an experimental trial can answer such ques-
tions clearly.
Observational studies have the signiﬁcant advantage of
generally providing no added risk to the people involved.
Among such studies, a descriptive survey is designed to esti-
mate some simple characteristics of a population, whereas an
analytical survey is to investigate associations among such
characteristics.
Retrospective and prospective studies are both longitudi-
nal designs. They can provide information about processes
over time. However, in a prospective study, current practice
may change in unforeseen ways over time, making the ﬁnd-
ings irrelevant.
With a prospective study, the sample can usually be clearly
deﬁned and chosen to be representative of the population of
interest. This is often much more difﬁcult for a retrospective
study. However, the prospective study can be more subject to
missing data, especially drop-outs. Retrospective studies are
generally completed much more quickly and cost less, but
are subject to increasing inaccuracy as you go back in time
(unless suitable written records are available).
An ecological design is particularly important when the
characteristics of interest are relatively homogeneous in each
area and measurement errors on individuals are relatively
large. Then, contrasts among regions, for example, among
cultures, may provide the evidence you require. Thus, for
example, in descriptive epidemiology, ecological evidence,
by comparison among countries, has indicated links between
diet and cancer.
1.5 Summary
Random variability in observations makes statistical proce-
dures necessary. Statistics can help you in all stages from
setting up a study to analysing and reporting it. If you expect
to call upon a statistician for help in the analysis, involve her
or him from the beginning of the design stage.
One of the most important distinctions is between making
passive observations of subjects and an intervention. Only
the latter will allow you to draw causal conclusions without
making empirically unveriﬁable assumptions, but, in most
situations, it is impossible to perform with human subjects
for ethical or other reasons.
Do not begin any study without preparing a detailed proto-
col outlining all steps of the procedures to be followed. The
two main types of observations that you will make are the
outcomes to be explained and the corresponding sources of
explanation. You must take into consideration many factors
that will control their accuracy and precision. You can most
easily manipulate sample size, but others, such as instrument
biases and missing data, will generally be much more impor-
tant.
In choosing the design of the study, deﬁnition of the pop-
ulation of interest is a ﬁrst important step, followed by ran-
domization where possible for all relevant aspects, especially
in choice of sample and assigning intervention treatments.
Designs may be retrospective, cross-sectional, or prospec-
tive, each with their particular advantages and disadvantages.
You must seriously consider all of these aspects of a study
before beginning the actual data collection. When you have
made the appropriate choices, you should state them clearly
in your protocol, a document that will help you to justify the
objectivity of your work when you make the ﬁnal report.
2
Sample surveys
2.1 Sampling
In observational studies, one of the ﬁrst choices to make is
whether to study the whole population or only a sample from
it.
2.1.1 Samples versus censuses
In extreme cases, where you require information on all indi-
vidual units, you must make a complete census of the popu-
lation. Generally, the cost in effort and expense required to
collect information is less, per unit, for a census than for a
sample. However, if the size of the sample needed to give
the required precision represents only a small fraction of the
total population, the total effort and expense required to col-
lect information by sampling methods will be much less than
that for a census.
A sample generally has a number of advantages over a
census:
• A full census may be impractical because of the cost, time,
  and effort involved.
• You can much more easily ensure the completeness and
  accuracy of the results if you only collect information from
  a small proportion of the population. Generally, forms are
  more completely and more accurately filled in. Furthermore,
  you can make more detailed checks of the quality.
• You can obtain more detailed information about each unit
  in a sample, even with a smaller total volume of data.
• You can generally obtain results much more quickly by
  means of sampling than by a complete census. This is
  especially true at the stages of collection and recording.
• Sampling using interviewers is necessary in a population
  where many people are illiterate and could not fill out a
  census form.
The amount of information that you will obtain from a sam-
ple depends on its absolute size, not on its size as a propor-
tion of the population, at least when the proportion is small.
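A small numerical sketch can make this concrete. It assumes simple random sampling without replacement and uses the usual standard error of a sample proportion with a finite population correction; the function name `se_proportion` and the population sizes are illustrative, not taken from the text.

```python
import math

def se_proportion(p, n, N):
    """Standard error of a sample proportion under simple random sampling
    without replacement, including the finite population correction."""
    fpc = (N - n) / (N - 1)          # close to 1 when n is a small fraction of N
    return math.sqrt(p * (1 - p) / n * fpc)

# The same absolute sample size (1000) drawn from very different populations.
for N in (10_000, 1_000_000, 100_000_000):
    print(N, se_proportion(0.5, 1000, N))
```

Once the sample is a small fraction of the population, the correction factor is nearly one and the precision is governed almost entirely by the absolute sample size.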
Your final choice between a sample survey and a census
will usually depend on which gives you the highest degree
of precision and accuracy for the least cost. The cost will
depend on a number of factors, including:
the amount of information required per individual;
the number of individuals to be covered;
the way in which individuals are distributed in the region
studied and their accessibility;
the size of the region studied and the quality of the trans-
portation and communication network;
the type of instruments used;
the qualiﬁcations and training of the investigators using
the instruments.
You must carefully weigh all of these factors.
2.1.2 Random sampling
One can imagine choosing a sample from a population in a
wide variety of ways:
a readily accessible group;
haphazard choice, most often used in experimental situ-
ations, where it is vaguely and implicitly assumed that
items selected are typical;
expert choice or judgement sampling of some representa-
tive members;
volunteers, where speciﬁc changes of behaviour must be
accepted, again most often in experimental contexts;
quota sampling, used in opinion polls and market surveys,
whereby the interviewers themselves build up a sample
roughly proportional to the population on a few demo-
graphic variables.
All of these procedures have at least two major disadvan-
tages:
1. they are always biased in unknown ways with respect to
the population;
2. they do not allow any statistical calculation of precision
of the estimates.
You will not be able reliably to generalize the results of a
study based on any of these choices of sample to any known
population.
Easily accessible groups are often unique. Experts rarely
agree. Volunteers, by deﬁnition, are exceptional. Interview-
ers tend to select subjects who are easy to ﬁnd, who are likely
to be cooperative, or who they think may beneﬁt from the
study. In all cases, the bias is constant with sample size,
never decreasing as more individuals are observed.
Usually, you will wish the sample to be ‘representative’ of
the population; you want the individuals to be exchangeable,
as far as possible, for all of their speciﬁc characteristics that
are not of interest. You can only accomplish this by choos-
ing a random sample: every member of the population, in-
dependently, has a known, non-zero probability of being se-
lected for the sample. Thus, usually, you will require that the
observations selected from the population be independent:
observing one tells you nothing about which others may be
selected. In contrast, with haphazard selection you do not
know the probabilities of selection.
With random selection, you have the best chance of a rea-
sonable and unbiased balance of the unknown characteris-
tics, although this cannot be guaranteed.
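As a minimal sketch of what a known selection probability means in practice, the following draws a simple random sample without replacement from a list of units. This is only one of several random sampling schemes; the function name and the seed argument are assumptions made for illustration.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct units from the frame, each with the same known
    inclusion probability n / len(frame)."""
    rng = random.Random(seed)        # a recorded seed makes the draw reproducible
    sample = rng.sample(frame, n)    # sampling without replacement
    inclusion_probability = n / len(frame)
    return sample, inclusion_probability

# Hypothetical frame of 100 numbered units.
units = list(range(1, 101))
chosen, prob = simple_random_sample(units, 10, seed=2024)
print(sorted(chosen), prob)
```

Unlike haphazard selection, the probability with which any unit enters the sample is known before the draw is made.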
2.1.3 Observational and sampling units
The observational unit is the entity about which you are di-
rectly collecting information. This does not mean that you
must obtain all information from that unit. If you are study-
ing children, you may require relevant information about their
family, the school, the village, and so on. Often, the only way
that you can obtain it accurately is directly from each such
group.
The unit of observation may not be the same as the sam-
pling unit, which is the entity chosen at random from the
population. Thus, the sampling unit might be the family,
whereas the observational unit might be the eldest child or
even all children in the family. Sampling units may not be of
the same size, but may contain differing numbers of obser-
vational units. Several different levels of sampling units may
be necessary in the same study, such as school, classroom,
and child.
Generally, for a given sample size of observational units,
the smaller the sampling unit employed, the more precise
and representative the results will be. This is because ob-
servational units within a sampling unit tend to be similar,
providing less information than independently chosen units.
This requirement often conﬂicts with costs because larger
sampling units are generally easier to observe.
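One common way to quantify this loss of information, not given in the text, is the design effect approximation 1 + (m - 1)ρ for sampling units that each contain m observational units with intraclass correlation ρ. The sketch below assumes equal-sized sampling units, and the value of ρ is invented purely for illustration.

```python
def effective_sample_size(n_units, units_per_cluster, intraclass_corr):
    """Approximate number of independently chosen observational units that
    a clustered sample of n_units is worth (equal cluster sizes assumed)."""
    design_effect = 1 + (units_per_cluster - 1) * intraclass_corr
    return n_units / design_effect

# 1000 children sampled one at a time, then in classrooms of 20
# (rho = 0.05 is a made-up value).
print(effective_sample_size(1000, 1, 0.05))
print(effective_sample_size(1000, 20, 0.05))
```

With the sampling unit equal to the observational unit nothing is lost; with larger sampling units, the same number of observations is worth noticeably fewer independent ones.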
If you are interested in the inter-relationships among mem-
bers of a group, then collect information on such groups as a
whole, or at least on pairs of units within such groups. Sim-
ilarly, if you are interested in inter-relations among the be-
haviours of the same individuals at different time points, de-
sign the study so as to provide information over an adequate
time period.
Thus, your choice of a sampling procedure will depend
not only on the relative precision of the competing tech-
niques but also on practical considerations. The most suit-
able method will depend on the type of information already
available about the population. For example, you should not
oblige the investigators to travel excessively and you should
subject them to proper supervision and control.
2.1.4 Sampling frame
A sampling frame is a list of all sampling units in the population.
This is necessary in order for you to be able to make
a random selection. If no such frame already exists, its con-
struction may constitute a sizeable part of the work of the
survey.
Notice that it is not necessarily required that a list of all
observational units be available: the population of sampling
units and the population of observational units may not be
identical. When they are not, a random sample of sampling
units does not yield a random sample of the observational
units.
A sampling frame may be defective in a number of ways:
The information about the units may be inaccurate. Some
units may not even exist at all.
The information may be incomplete if certain units in the
population are missing. These may be random individuals
or whole categories of the population. The latter is much
more serious.
There may be duplication whereby certain units are in-
cluded more than once.
The sampling frame may be out of date, in that it was
accurate, complete, and without duplications at the time
of construction, but the population has changed.
You will generally be able to discover inaccuracies, and to
correct them, as the study progresses, as, in many cases,
you will ﬁnd the duplications. On the other hand, you will
not usually ﬁnd the coverage errors, due to incompleteness;
these will often lead to some categories of the population
being under-represented. Thus, inaccuracies and duplications are
measurable, whereas coverage errors are not.
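Duplications are among the defects that you can search for directly in the frame. The following is a rough sketch, assuming the frame is a list of text entries; the `normalise` rule and the example entries are hypothetical, and real frames usually need more careful matching.

```python
from collections import Counter

def normalise(entry):
    """Lower-case an entry and collapse repeated spaces before comparing."""
    return " ".join(entry.lower().split())

def find_duplicates(frame, key=lambda unit: unit):
    """Return the keys that occur more than once, with their counts."""
    counts = Counter(key(unit) for unit in frame)
    return {k: c for k, c in counts.items() if c > 1}

# Hypothetical frame of dwellings.
frame = ["Smith, 12 High St", "smith, 12 high st ", "Jones, 3 Low Rd"]
print(find_duplicates(frame, key=normalise))
```

No comparable check exists for units that are missing from the frame, because there is nothing in the list to compare.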
Sources of frames
You can obtain information to construct a list of sampling
units in various ways.
Population census: A complete census will tend to be out
of date, especially because it can only be carried out in-
frequently and the results take a considerable time to be
released.
Administrative lists: Various administrative activities re-
quire lists of segments of the population. They will gen-
erally only be accurate, complete, and up to date if the ad-
ministration is very efﬁcient. Often, they are maintained
by local ofﬁces so that their accuracy may vary throughout
the country.
Lists of establishments: schools or hospitals (for cluster
sampling).
Lists of households or dwellings: Such lists, for example
for taxes or elections, have more permanence than lists of
individuals.
Town plans and maps: Considerable detail is required for
these to be useful. Otherwise, you may need to include all
dwellings in a block or other small region. Take care be-
cause equal-sized areas will not be equally densely popu-
lated. In rural regions, you may need to use natural bound-
aries, with unequal areas, because you cannot easily locate
rectangles marked on a map on the ground.
Lists of villages: If available maps are insufﬁcient for ru-
ral areas, you may use villages as the sampling unit. Take
special care if all individuals are not afﬁliated with vil-
lages.
Note that lists of individuals are not suitable if the sampling
unit is a larger entity such as the household.
2.2 Organization
A number of the ways in which a particular society is orga-
nized can play important roles in the success of a survey:
1. common language(s) so that communication is possible;
2. common assumptions and understandings;
3. freedom for interviewers to contact sampled people;
4. lack of fear of strangers;
5. trust that answers will be held conﬁdential;
6. belief that surveys are useful and informative.
Each can be a factor in non-response.
2.2.1 Types of surveys
We may distinguish three different types of studies:
1. collection of relatively simple facts covering the whole
population of a country and capable of giving separate re-
sults for small administrative regions, usually done by a
census;
2. surveys of the whole population of the country involving
more detailed information, but not providing details for
small regions;
3. local surveys covering a small region to obtain detailed
information by ﬁeld investigators.
The ﬁrst type of study presents relatively simple sampling
problems but may be administratively complex. The third
type is usually simple both in sampling and administration.
The second type is the most difﬁcult, usually beneﬁting from
a relatively complex sampling design.
2.2.2 Administration
Timing of the study will be inﬂuenced by:
seasonal factors;
availability of an appropriate sampling frame;
holidays;
deadlines for the results, and so on.
However, the principal requirement will be that you conduct
it at a time representative of that to which you will apply the
results. Once you establish the timing, you must set up a
schedule for the various stages.
Costs must be estimated. These may include:
expert consultation fees;
sample selection;
printing questionnaires;
travel and subsistence expenses;
data entry, veriﬁcation, and analysis, including computers
and software;
preparing the report;
general overheads, including salaries.
Reasonable estimates are necessary to ensure that the study
is feasible.
The amount of administrative work will depend on:
the scale of the study;
the sample design;
the area covered.
In the ﬁeld, the main task will be to supervise the invest-
igators, whereas centrally it will be to direct the data record-
ing and analysis. Different people will often be responsible
for each. If the area covered is large, you may require re-
gional centres for the ﬁrst task. Use existing administrative
and ofﬁce organizations where possible.
2.2.3 Ethical questions
A basic principle for any study should be that those involved
should be part of a population that is in a position to beneﬁt
from the results of the study.
You must use ethical means to obtain any list of people to
be used as the sampling frame. Many countries have data
protection acts that limit access to lists of names and ad-
dresses. You may need to recontact for permission any peo-
ple who have supplied information without being explicitly
told that it would be used for research. This can incur sub-
stantial non-response even before data collection begins.
Any collection of information from individuals involves
problems of conﬁdentiality. Such private information is pro-
tected to differing degrees in different countries. Generally,
you should safeguard the conﬁdentiality of respondents at all
costs. Longitudinal studies are particularly challenging; you
will be collecting such a volume of information on each per-
son that it may be easy to identify speciﬁc individuals.
Take special care that individuals cannot be identiﬁed if
you are to release data banks for research purposes outside
the organization collecting the data. You may have to remove
certain information, such as geographical details. You may
want to release ‘restricted use’ ﬁles to researchers who swear
to abide by speciﬁc procedures to safeguard the security of
the data under penalty of law.
A further ethical issue involves the amount of information
about the goals of the study that you supply to the respon-
dents. Hiding the true purposes of a study from the respon-
dents may seem necessary in order to obtain honest answers
in some circumstances, but raises difﬁcult ethical questions
that you must carefully weigh.
In many cases, you may have to call upon an ethics com-
mittee to make a decision about the various procedures used
in your study.
2.2.4 Pilot studies
When you know little about the population or are using new
and untested instruments, a small preliminary study will gen-
erally be necessary. A pretest is a piecemeal check of parts
of the instrument, whereas a pilot study is a small test version
of the full study.
A pilot study may aim to:
check the adequacy of the sampling frame;
develop the ﬁeld procedure by
testing the adequacy of the questionnaires;
training the investigators;
checking the efﬁciency of the brieﬁng and instruc-
tions for the investigators;
verifying communication between the ﬁeld and the
ofﬁce;
obtain information on the various components of variabil-
ity to which the population is subject;
determine the suitable size of sampling unit;
estimate the rate of non-response, whether refusals or non-
contacts;
provide estimates of the costs, such as interview and travel
times.
Where possible, try to use random sampling for the pilot
study, although this is rarely done in practice. Instead, some
typical sampling units, such as nearby villages, are generally
selected. Take care that the members of any pilot study are
not included in the ﬁnal sample to avoid any biases arising
from repeated interviewing.
Where possible, conduct the testing in two stages:
1. a trial of the questionnaire by professional investigators
who are thoroughly familiar with it and with the study as
a whole;
2. a subsequent trial using the revised questionnaire with in-
vestigators of the type who will actually be used in the
study.
You may want to try open questions in the pretest, or pilot
study, in order to determine the range of possible answers.
From this, you can then construct closed questions (which
does not mean that the ﬁnal instrument should only contain
closed questions). In certain cases, you may ﬁnd it necessary
to try several forms of the same question to check if different
answers are received.
2.3 Measuring instruments
If you lack research or administrative experience in the sub-
jects to be covered, it is fatally easy to omit some vital items
when designing the instruments.
Where possible, it is generally preferable to use instruments
that have already been used in other studies. This has
several advantages:
substantial time can be gained;
the instruments have already been tested in a similar con-
text;
results of the study will be more comparable with other
studies.
As already mentioned, thoroughly try out all instruments, in
the conditions of the new study, by means of a small pretest
of the survey procedures.
2.3.1 Types of instruments
The purpose of an interview is to ﬁnd out what is on some-
one’s mind, to discover things that cannot be observed di-
rectly. You may use a large number of different instruments
to collect information about the problem of interest. These
include:
direct observation schedules;
tests of personal knowledge;
questionnaires consisting of closed and/or open questions;
structured interviews;
recording undirected discourse (life histories);
participant observation.
Notice that certain of these instruments, especially later in
the list, often do not directly produce information appropri-
ate for subsequent statistical analysis.
The more detailed the information you obtain from each
observational unit, the smaller, and often the more unrep-
resentative, the sample will usually need to be. The more
structured the information you obtain, the more suitable it
will be for statistical analysis, as opposed to more subjective
means of summarization.
You can apply many of these instruments in a variety of
ways.
Direct administration:
observation;
direct interviews;
telephone interviews.
Indirect methods:
deposit the questionnaire to be collected later;
diary cards collected periodically;
postal surveys.
Among other things, your choice will depend on:
the budget available;
the number of observational units;
the ability of people to reply with and without guidance;
the interest generated by the study.
Postal surveys can be relatively cheap and can have high
return rates. However, they do have a number of disadvan-
tages over interviewing:
The questions must be simple and clear, not requiring a
lot of explanation.
All answers are ﬁnal, with no opportunity to attempt to
overcome hesitation or ambiguity.
Spontaneous replies, opinions uninﬂuenced by discussion
with others, and tests of personal knowledge are impossi-
ble.
Questions cannot be ordered such that early ones are an-
swered without knowledge of later ones.
There is no guarantee that the person randomly sampled
actually answers the questionnaire.
Supplementary observational data cannot be obtained.
Pretesting of the questionnaire is especially important with
postal surveys because the investigator will not be present
to gain cooperation or to clear up ambiguities. The cover-
ing letter and the sponsorship can be crucial in convincing
people to reply.
Generally, where possible, direct observation is preferable
to questions, and questions on facts and on past actions are
preferable to questions on generalities and on hypothetical
future actions. Physical measurements are more objective,
but qualitative observations are often more capable of sum-
marizing the important features of a complex situation. By
proper standardization and calibration among investigators,
you can make qualitative observations reasonably objective.
Multicultural studies
Special problems arise if the instrument is to be used in sev-
eral cultures. You can distinguish two cases:
1. The instrument may be adapted for use in more than one
language and cultural context, without any attempt to com-
pare the results cross-culturally.
2. The results of the study must be compared or aggregated
across cultures.
In the ﬁrst case, you should only use the original instrument
as a guide in producing a culturally appropriate procedure
for the new setting. The second case is much more complex,
raising conceptual, technical, and ethical problems. In all
cultures:
both the items in the instrument and the responses to them
must be conceptually and functionally equivalent;
the same phenomenon must be measured – the underlying
concept must exist and be pertinent;
the questions must be relevant and not too personal or of-
fensive;
issues of local importance must not be missed.
In most existing instruments, the majority of items are highly
culture-speciﬁc.
You should have your instrument translated into the target
language, preferably in several alternative versions, and then
back-translated to the original language for checking. Have
a panel of monolingual lay people comment on the transla-
tion in the new language; multilingual people are never rep-
resentative, especially if one of their languages is English.
Psychological and emotional states are the most difﬁcult to
translate between cultures. When working in several cul-
tures, you must use much more elaborate ﬁeld pretesting for
reliability and validity in each language.
2.3.2 Questionnaires
You may design a questionnaire for completion in three prin-
cipal ways:
1. by the investigator,
(a) from direct observation or
(b) with the aid of questions put to the respondents;
2. by the respondent with little or no assistance from the in-
vestigator.
You may require all of them in the same survey for different
types of information. In all cases, clearly specify the means
of distinguishing a non-response from a non-applicable ques-
tion.
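One possible way to keep the two cases apart at the recording stage is to give each its own explicit code, so that neither is ever left as a blank. The codes `"NA"` and `"NR"` and the function below are assumptions chosen for illustration, not a convention from the text.

```python
NOT_APPLICABLE = "NA"   # the question did not apply to this respondent
NO_RESPONSE = "NR"      # the question applied but was not answered

def item_response_rate(answers):
    """Share of applicable questions that were actually answered."""
    applicable = [a for a in answers if a != NOT_APPLICABLE]
    if not applicable:
        return None      # the question applied to nobody
    answered = [a for a in applicable if a != NO_RESPONSE]
    return len(answered) / len(applicable)

# Hypothetical answers to one question from four respondents.
print(item_response_rate(["yes", NOT_APPLICABLE, NO_RESPONSE, "no"]))
```

If both cases were recorded as blanks, non-applicable questions would be counted as non-response and the rate would be understated.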
The simplest instruments are those where the investigators
themselves record observations. Generally, there should be
a separate instruction booklet, so that the form itself remains
simple.
When the investigators are to ﬁll out the questionnaire
containing questions posed to respondents, you may train
them to use a given ﬁxed wording or ask them to elicit infor-
mation in any way that will provide an answer to the ques-
tions. Thus, you must instruct them as to whether they must
put the questions in the exact form given or can ask them in
a more general way. The former procedure is especially im-
portant in matters of opinion where the wording may affect
the answer.
In such a context, explanatory notes may either be on the
questionnaire or in a separate booklet. The latter results in
a more compact questionnaire and is more suitable for pro-
fessional investigators. The former is more likely to ensure
that the investigators remain aware of the purpose of each
question.
When the respondents will ﬁll out the questionnaire them-
selves, the only role of the investigators may be to explain
the purposes of the study and to persuade the respondents
to cooperate. Be especially attentive to the wording of the
questions and the explanatory notes. The latter will gener-
ally be on the questionnaire beside each question. However,
detailed and lengthy explanations should be avoided.
The principal types of questions are:
measurements, in clearly deﬁned units;
multiple-choice, or closed, questions, where all possible
answers are ﬁxed in advance, one and only one of which
must necessarily be chosen;
semi-open questions, with the main possibilities listed, but
alternative replies can also be supplied;
open questions, where any reply is possible.
Never record measurements in predeﬁned categories, except
in the rare cases of delicate subjects such as income, where
an accurate answer is otherwise unlikely.
The advantage of multiple-choice questions is that they
are usually easy to understand and to reply to and that they
are also easy to record and to analyse. However, they carry
the danger of telegraphing the answer and provide no op-
portunity for nuance. The most common error is to allow
respondents to choose several possibilities; this makes statis-
tical analysis difﬁcult or impossible.
Semi-open questions may suggest to the respondent that
the list provided contains more common or more acceptable
answers than the one not included that might be closer to the
truth.
Open questions, if well formulated, can provide informa-
tion on almost any subject. They are the only possibility
when you cannot predict most or all answers to a question.
However, they can be more difﬁcult to construct to obtain
objective answers and much more difﬁcult to record and to
analyse.
Thus, pay careful attention to the detailed wording of all
questions, even if these are only intended as a guide to the
investigator, and to the order in which they are presented.
In formulating questions, a number of basic principles are
important: they should each
be precise, simple and logical, with no technical terms un-
familiar to the population being addressed;
be short, concerning only one idea;
be unambiguous – for example, age can refer to the last or
to the nearest birthday;
be self-explanatory, where possible;
not ask for too many details or about events too far in the
past;
require some answer, so that it is possible to detect non-
response;
clearly specify any units of measurement and the precision
required;
provide enough different categories, without any overlap-
ping intervals;
be answerable – asking for the cause of death of parents
must be conditional on their being dead and not require
excessive medical details;
avoid hypothetical situations.
In summary, a good question will:
be relevant to the respondent;
be easily understood and unambiguous in meaning;
not be inﬂuenced in any untoward way by the context in
which it is used;
relate to the survey objectives;
mean the same thing to the respondent, the investigator,
and the decision makers for whom you are performing the
study.
For every question, ask yourself if it is really necessary. A
shorter questionnaire reduces non-response and increases ac-
curacy of answers.
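The requirement that categories do not overlap can be checked mechanically before the questionnaire is printed. This is a minimal sketch, assuming each category is a pair of inclusive whole-number bounds; the function name and the age bands are illustrative.

```python
def check_categories(intervals):
    """Report overlaps and gaps among (low, high) inclusive integer bounds."""
    problems = []
    ordered = sorted(intervals)
    for (lo1, hi1), (lo2, hi2) in zip(ordered, ordered[1:]):
        if lo2 <= hi1:
            problems.append(f"overlap between {lo1}-{hi1} and {lo2}-{hi2}")
        elif lo2 > hi1 + 1:
            problems.append(f"gap between {hi1} and {lo2}")
    return problems

# Hypothetical age bands: the second list repeats 18 in two categories.
print(check_categories([(0, 17), (18, 29), (30, 44)]))
print(check_categories([(0, 18), (18, 29), (30, 44)]))
```

A respondent aged 18 could tick either of the first two boxes in the second list, which is exactly the ambiguity that the principle warns against.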
The order of the questions is important. The logical order
is not necessarily the best. The ﬁrst questions should help to
place the respondent at ease. Simple demographic questions
often serve this purpose well. Then, the more delicate
subjects can be treated near the middle. Generally, group
questions by subject areas so that the respondent is not required to
change train of thought too often. An exception will be du-
plicate or redundant questions that you use to double-check
each other and that must appear at separate moments in the
sequence.
In complex cases, you may need several questionnaires for
each sampling unit: child, household, village, school. Take
great care in identiﬁcation so that you can link them all cor-
rectly together for analysis.
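A sketch of such linking, assuming every child record carries the identifier of its household: the field names and identifiers below are hypothetical. Records that cannot be matched are listed separately so that they can be followed up rather than silently dropped.

```python
def link_children(children, households):
    """Attach household information to each child record and list the
    children whose household identifier matches no household."""
    linked, unmatched = [], []
    for child in children:
        household = households.get(child["household_id"])
        if household is None:
            unmatched.append(child["child_id"])
        else:
            linked.append({**child, **household})
    return linked, unmatched

# Hypothetical records; "H99" is a mistyped household identifier.
households = {"H01": {"village": "V1"}}
children = [
    {"child_id": "C001", "household_id": "H01"},
    {"child_id": "C002", "household_id": "H99"},
]
print(link_children(children, households))
```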
The questionnaire forms must be convenient to use and set
out so that the information can easily be translated to elec-
tronic media. Only collect raw data; the investigator should
make no intermediate calculations. For example, dates are
more reliable than time intervals. You should usually al-
low space for the investigator or respondent to make general
comments. Although you cannot easily treat such observa-
tions by an exact analysis, they can be of considerable value
in drawing your attention to relevant facts not covered by the
questionnaire.
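For instance, if the form records the date of birth and the date of interview, any age or time interval can be derived later, at the analysis stage, under whichever definition is needed. The sketch below computes age at last birthday; the function name is an assumption for illustration.

```python
from datetime import date

def age_last_birthday(birth, interview):
    """Age in completed years on the interview date, computed from raw dates."""
    years = interview.year - birth.year
    # Subtract one year if the birthday has not yet occurred in the interview year.
    if (interview.month, interview.day) < (birth.month, birth.day):
        years -= 1
    return years

# Hypothetical respondent interviewed the day before a birthday, then on it.
print(age_last_birthday(date(1990, 6, 15), date(2020, 6, 14)))
print(age_last_birthday(date(1990, 6, 15), date(2020, 6, 15)))
```

Had the investigator written down only an age, it would be impossible to tell afterwards whether it referred to the last or to the nearest birthday.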
In all cases, arrange the information to be recorded on the
questionnaire in such a way as to be easily transferred to a
computer. Plan and clearly specify the manner in which to
do this, including the systematic handling of non-responses,
at the same time as you develop the questionnaire. No in-
termediate calculations should be necessary at this transfer
stage.
2.3.3 Field investigators
The main tasks of a ﬁeld investigator are:
locating the people sampled;
obtaining agreement to be interviewed;
asking the questions;
recording the answers.
In many cases, only about one-third of an investigator’s time
is actually spent interviewing. The rest involves:
studying materials;
travelling and locating respondents;
editing questionnaires;
general administrative work.
You must allow for this in scheduling work-loads.
Field work is very arduous, involving considerable men-
tal strain. Interviews are often intense experiences involv-
ing complete attention and frequent thinking on one’s feet.
Hours of work are generally very irregular, because evening
visits are often necessary to contact working people and re-
duce non-response. Supervisors should be required to un-
dertake some ﬁeld work themselves in order to be able to
appreciate the difﬁculties.
Payment by piece rates is generally unsatisfactory, be-
cause it leads to incomplete or hasty work and to irregulari-
ties such as replacing one respondent by another.
If you are recruiting new investigators, give all applicants
preliminary tests and arrange proper training courses. Re-
cruits should be honest, interested, accurate, adaptable, pleas-
ant, and able to follow complicated instructions. Carefully
watch and supervise the early work of new investigators. If
possible, build up the team of investigators by selecting suit-
able ones at the pilot stage; this provides a means of testing
and training.
Investigators should have background knowledge of the
subject under study. Unspecialized teams of investigators are
only suitable for carrying out routine studies requiring rela-
tively simple questionnaires. When a high degree of techni-
cal knowledge is needed, use staff in existing organizations.
Note also that the reactions of the respondents may de-
pend on the origin of the investigators. For example, gov-
ernment ofﬁcials may arouse suspicions that the information
collected could be used for purposes other than those stated.
2.3.4 Accuracy of recorded information
Once you have chosen a proper sample, the most common
sources of inaccuracies in the observations are variation in
the respondents’ reactions to the method of assessment and
variation in the investigators’ techniques. These measure-
ment errors may be called observation errors. Inadequate
responses may be missing, incomplete, irrelevant, or inaccu-
rate. In order to be able to check if these problems are evolv-
ing over time, record the sequence in which the respondents
are interviewed.
In addition to the principles listed above for questionnaire
construction, you should consider a number of further points
in judging if you will obtain accurate information from the
respondents:
Are the respondents sufﬁciently informed, or able to recall
past events, to be capable of providing accurate answers?
Can they translate into an unambiguous and understand-
able answer what they believe to be the truth, and are they
willing to do this?
If the answers require substantial work, will they be pre-
pared to do it?
Do they have motives for concealing the truth (perhaps
simply trying to impress or please) and, if so, will they
refuse to answer or give incorrect information?
A number of remedial actions are possible:
Anonymity of respondents is important, especially when
using incriminating or highly personal questions.
If you are using skilled investigators, working on a rel-
atively small sample, they may be able to elicit accurate
information in many circumstances.
In certain cases where you expect such problems to arise,
you can take substitute measures, such as approximating
revenue by the size of the dwelling.
In other cases, you may have to abandon the question.
Similar points apply to the role of the investigators:
Are they sufﬁciently informed about the subject and mo-
tivated to do the work required?
Are they in possession of standardized methods for elicit-
ing and recording qualitative information in an objective
manner?
Are they trained to approach all respondents in the same
way, independently of the opinions they hold and the dis-
position they happen to be in?
Do they know how to avoid indicating to the respondent
in any way what they believe to be the appropriate answer,
instead listening with the required patience?
Are they meticulous in recording the answers supplied or
in verifying that the respondent has properly ﬁlled out a
questionnaire?
Fieldwork will be most accurate if the investigators are well
trained, capable, conscientious, and keen. Personal charac-
teristics of the investigators can inﬂuence the answers given.
Make ﬁeld checks where possible. Carry these out on a ran-
dom subsample of units, in a way such that the investigators
know beforehand that they will be checked but not which
parts of their work will be examined.
Poorly constructed instruments, badly administered, can
also lead to biases and inaccuracies, some already mentioned.
Misinterpretation may occur because the question is not
speciﬁc and the respondent does not want to show igno-
rance.
Technical terms and academic jargon can easily lead to
incomprehension.
For multiple choices, an insufﬁcient number of alterna-
tives may force the respondent to choose an inappropriate
answer.
Providing the possibility of ‘don’t know’ may allow an
easy way out for lazy respondents but is necessary for peo-
ple who genuinely do not have experience of a subject.
Probing by the interviewer is usually acceptable for fac-
tual questions but not for opinions or tests of knowledge.
Leading questions will cause some subjects to provide
what they believe to be the correct or desired answer.
People tend to choose an item near the beginning or end
of a long list of possibilities.
In a series of questions involving ratings, the ﬁrst will of-
ten give more extreme results because the respondent has
not yet established a standard.
More recent information is usually more accurately re-
ported. However, this may not always be the case. When
asking for the number of events in the previous week, the
answers may be, consciously or unconsciously, telescoped
so that more are reported than actually happened. In such
cases, reports of events in a longer period, such as a year,
may be more accurate.
For binary variables, sensitivity is the proportion of those
who actually have the characteristic who are classiﬁed as
having it. Speciﬁcity is the proportion of those not having it
correctly classiﬁed. Obviously, it is usually difﬁcult to check
either of these because you do not generally know the correct
answer.
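When a validation subsample with known true status is available, the two proportions just defined can be computed directly. The following is a minimal sketch only; the function name `sensitivity_specificity` and the small data set are hypothetical and serve purely as an illustration.

```python
def sensitivity_specificity(truth, classified):
    """Return (sensitivity, specificity) for two paired binary sequences.

    truth      -- 1 if the unit really has the characteristic, else 0
    classified -- 1 if the instrument classified it as having it, else 0
    """
    pairs = list(zip(truth, classified))
    true_pos = sum(1 for t, c in pairs if t and c)
    false_neg = sum(1 for t, c in pairs if t and not c)
    true_neg = sum(1 for t, c in pairs if not t and not c)
    false_pos = sum(1 for t, c in pairs if not t and c)
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity


# Hypothetical validation data for ten units.
truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
classified = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]

sens, spec = sensitivity_specificity(truth, classified)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

Here three of the four units that have the characteristic are detected, and five of the six that do not have it are correctly classified.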
Make a preliminary examination of the returned forms as
quickly as possible so that you can have defective work cor-
rected while it is still possible.
2.4 Sampling error
If instruments could be constructed such that they could yield
accurately recorded data, that is, no observation errors, ob-
served differences in response among groups in a sample
could arise from four sources:
1. the (causal?) effect of belonging to the group;
2. other relevant factors not taken into account, called con-
founding (Section 1.3.4);
3. bias in choosing the sample;
4. random chance in choosing the sample.
Your study has internal validity if you can ascribe differences
to the ﬁrst source. Now, let us consider the last two.
Two types of sampling error can arise:
1. biases in selection of the sampled units;
2. chance differences between members of the population in-
cluded in the sample and those not included, called ran-
dom sampling error.
Bias forms a constant component of the error that does not
decrease, in a large population, as the number in the sample
increases. On the other hand, random sampling variation,
what statisticians have called random error, decreases on av-
erage as the sample size increases.
Bias is an important factor in determining the accuracy of
the results, along with observation error. On the other hand,
random sampling error determines the precision of any quan-
tities estimated (Section 1.3.5). There is an inverse relation-
ship between the latter two: precision increases as random
sampling error diminishes.
2.4.1 Causes of sampling bias
Random selection is not haphazard selection. You can only
obtain a true random sample by adhering to some strict ran-
dom process (Section 1.4.1). Sticking pins on a map or meet-
ing people on a street corner is not random. If at all possible,
perform the random selection centrally, and do not leave it to
the arbitrariness of the individual ﬁeld investigators.
Faulty selection of the sample can give rise to bias in a
number of ways. The main causes are:
• deliberate selection of a ‘representative’ sample (Section
  2.1.2);
• a selection procedure depending on some characteristic
  that is associated with the properties of the units that are
  of interest – many haphazard selection processes have this
  defect, as when, in a shopping survey, customers arriving
  at shops are interviewed;
• conscious or unconscious deviations from a proper random
  selection process, for example when a field investigator
  replaces a sampling unit for some reason, such as
  substitution of a convenient member of the population when
  difficulties are encountered in obtaining information from
  the individual randomly selected;
• failure to cover the whole chosen sample, leading to
  missing data.
Of course, bias will also arise from any systematic measure-
ment errors as discussed above, for example if the respon-
dents misunderstand a question.
If possibilities of bias exist, you will not be able to draw
any fully objective conclusions from a sample. The only uni-
versally acceptable way for you to avoid bias in the selection
process is to draw the sample at random and to avoid miss-
ingness wherever possible.
2.4.2 Missing values
Missing values can arise from coverage errors or from non-
response errors. The former are systematic for any sample
chosen, whereas the latter depend on many factors related to
the actual conduct of a given study. Only non-response can
generally be detected. Let us consider it in more detail.
There are many reasons for non-response.
• A respondent may be unsuitable for interview, because of
  an error in the sampling frame.
• There may be difficulty in contacting respondents, which
  can depend on various factors:
    - change of residence;
    - the nature of the respondent, where, for example,
      housewives are more often at home than those who
      go out to work;
    - the time of call, with employed people being away
      during the day or in vacation time;
    - the interview situation, for example, if there is
      advance notice of the visit.
• Refusals may depend on:
    - the disposition of the respondent, this varying from
      cheerful cooperation to hostility;
    - the techniques of the investigator;
    - the number, nature, and sequence of the questions;
    - sponsorship of the survey.
• Respondents may lack interest or concern – people with
  children in school will be more prepared to answer
  questionnaires about education and ill people more inclined
  to answer about health.
• They may have some incapacity or inability, such as
    - illness in the family;
    - language difficulties.
If non-response is not minimal, review both the question-
naire and the type of investigator to ﬁnd out whether it is
wise to continue. If you have not detected these problems
during the pilot stage, the required changes will often render
the earlier and later results incompatible.
You can fairly easily remedy two problems:
1. failure to make contact with the selected respondents –
persistent calling back may be the only solution, although
contacts with neighbours may provide useful information
to trace the missing individuals;
2. too long and complex a questionnaire – shorten and sim-
plify it (at the pilot stage!).
Making an appointment for an interview can have the per-
verse effect of allowing the respondent to be out at the time
of call.
In a well-designed questionnaire, you will include a sec-
tion to record information on non-respondents. For personal
refusals, you should have a considerable amount of approx-
imate information noted. Once the survey is ﬁnished, you
should check the missing answers to see if they are repre-
sentative at least for known groups, such as sex, age, and so
on.
Missing answers to single questions can occur for a num-
ber of reasons:
1. the interviewer forgets to ask the question;
2. the respondent cannot provide an answer;
3. the respondent refuses to answer;
4. the answer is not recorded.
The rate of missing answers is often linked to interviewer
experience.
Measures to reduce the non-contact and overall refusal
rates, such as calling back, are often more costly than those
to reduce individual missing items. The latter usually in-
volve improving interviewer training and questionnaire de-
sign. You must weigh the cost of reducing non-response bias
against increased sampling error due to the resulting smaller
sample size for ﬁxed total cost.
2.4.3 Random sampling error
Sampling error arises from non-observation: the whole eli-
gible population, the sampling frame, is not included in the
sample. Thus, the simplest way for you to reduce random
sampling error, and increase precision, is to increase the sam-
ple size. Other things being equal, this error is approximately
inversely proportional to the square root of the sample size.
However, the precision attained also depends on the variabil-
ity in the population. Techniques discussed below, that re-
strict selection as compared to full randomization without
introducing bias, can increase precision. The main one is
stratiﬁcation.
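The square-root relationship can be illustrated with a short calculation. This is only a sketch: the helper `standard_error` is a hypothetical name, and the population standard deviation of 10 is an arbitrary assumed value.

```python
import math


def standard_error(sigma, n):
    """Standard error of a mean: sigma divided by the square root of n."""
    return sigma / math.sqrt(n)


sigma = 10.0  # assumed population standard deviation (illustrative only)
for n in (100, 400, 1600):
    print(f"n = {n:5d}  standard error = {standard_error(sigma, n):.2f}")
```

Each fourfold increase in sample size only halves the random sampling error, which is why high precision quickly becomes expensive.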
If you only require overall results for the whole popula-
tion, you can attain a given degree of precision with a far
smaller sample than will be the case if you require the de-
tailed results for different parts of the populations (for exam-
ple, different regions, towns, and so on).
2.5 Sample designs
For any random sampling procedure to be possible, you must
subdivide the population under study into sampling units. As
we have seen, these may be the observational units, or some
aggregation of them. They may be natural (families), admin-
istrative (villages), or artiﬁcial (square regions of equal area
or population density).
2.5.1 Simple random samples
A simple random sample from the population of observa-
tional units is the simplest type of rigorous method of obtain-
ing a sample. It is also the basis of most other procedures.
In this method, you divide the population into observational
units, a numbered list of which is available, and select the
required number of units at random from this entire popula-
tion.
Usually, you will generate random numbers by statistical
software on a computer. The numbers you obtain indicate
which units on the list you are to include in the sample. In
simple random sampling, each member has the same proba-
bility of being selected. Although the simplest method, this
is not usually the most efﬁcient or the most cost-effective.
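As a rough illustration of how such a selection might be done in practice, the sketch below draws a simple random sample without replacement from a numbered list using Python's standard `random` module. The frame of 500 units, the sample size of 20, and the function name `simple_random_sample` are all assumptions made for the example.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n distinct units from the frame, each equally likely."""
    rng = random.Random(seed)      # a fixed seed makes the draw reproducible
    return rng.sample(frame, n)    # sampling without replacement


# Hypothetical sampling frame: units numbered 1 to 500.
frame = list(range(1, 501))
sample = simple_random_sample(frame, 20, seed=1)
print(sorted(sample))
```

Recording the seed lets the central selection be documented and repeated later if the choice of units is questioned.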
2.5.2 Stratiﬁcation
If you have available additional information about all of the
individuals in the population, you can obtain increased repre-
sentativity and precision by stratiﬁcation of the sample. This
involves randomly choosing ﬁxed proportions in each cate-
gory of some known explanatory variable such as regions,
age groups, or sex. In this way, the sample and population
proportions are guaranteed to be equal, or to have a known
relationship, at least for this variable.
Divide the population into blocks, or strata, of units, such
that the members of each stratum are as similar as possible
on some important criteria. These strata may or may not all
contain the same number of units. You then sample each
stratum at random.
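The procedure just described might be sketched as follows, assuming that the stratum of every unit is recorded in the sampling frame. The names `stratified_sample` and `stratum_of`, the two regions, and the 10% sampling fraction are hypothetical choices for illustration.

```python
import random
from collections import defaultdict


def stratified_sample(units, stratum_of, fraction, seed=None):
    """Sample the same fraction at random, separately within each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)   # classify every unit once
    chosen = {}
    for name, members in strata.items():
        k = round(fraction * len(members))      # number to take in this stratum
        chosen[name] = rng.sample(members, k)
    return chosen


# Hypothetical frame: 300 units in the north, 200 in the south.
units = [(i, "north" if i <= 300 else "south") for i in range(1, 501)]
chosen = stratified_sample(units, lambda unit: unit[1], 0.1, seed=1)
for name, members in chosen.items():
    print(name, len(members))
```

With a uniform fraction, the regions appear in the sample in exactly their population proportions (30 and 20 units here), rather than in proportions subject to chance.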
The main purpose of stratiﬁcation is to increase precision
of the overall population estimates and of the correspond-
ing estimates for each stratum. If there are large differences
in response among the units in the various strata, the accu-
racy and precision of the overall estimates will be increased.
This is because the strata will be represented in their correct
(or at least known) proportions, whereas, in simple random
sampling, these proportions are subject to random sampling
error. In a stratiﬁed sample, only variation within strata pro-
duces random sampling error.
For stratiﬁcation to be possible, you must be able to clas-
sify each sampling unit distinctly into one stratum. In other
words, you must have complete information about any stra-
tum variable available for all sampling units before begin-
ning the study. Typically, the formation of only a few strata
will yield the most gains.
In summary, stratiﬁcation is only possible if the informa-
tion about each sampling unit necessary to create the strata
is available in the sampling frame. Stratiﬁcation yields three
main advantages over simple random sampling:
1. If units within a stratum are more similar with respect to
the response than those between strata, then the precision
of any overall population estimate will be greater than that
from a simple random sample of the same size.
2. The corresponding estimates within strata should be more
accurate, which will be important if these subgroups are
of special interest.
3. Stratiﬁcation will make it possible to sample various sub-
groups in different ways, which may reduce costs.
Be careful about simplistic assumptions concerning the re-
lationship between sample size and costs. This relationship
may not be linear. For example, cost per unit may be a de-
creasing function of sample size in some strata.
Uniform and variable sampling fractions
If you select the same proportion of the members in the pop-
ulation from each stratum, the strata will all be represented
in the correct proportions in the complete sample. How-
ever, different proportions may be more useful if you al-
low the more important or more variable strata to be over-
represented. In this case, you may need to use appropriate
weightings in subsequent calculations.
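One common form of such weighting, shown here only as a sketch, is to combine the stratum means in proportion to the known stratum sizes in the population. The stratum names, sizes, and means below are invented for the example, and `weighted_mean` is a hypothetical helper.

```python
def weighted_mean(stratum_sizes, stratum_means):
    """Overall mean, weighting each stratum mean by its population size."""
    total = sum(stratum_sizes.values())
    return sum(stratum_sizes[s] * stratum_means[s] for s in stratum_sizes) / total


# Hypothetical population sizes and sample means (e.g. children per family).
stratum_sizes = {"urban": 8000, "rural": 2000}
stratum_means = {"urban": 2.0, "rural": 4.0}

overall = weighted_mean(stratum_sizes, stratum_means)
print(f"weighted estimate = {overall:.1f}")
```

If the rural stratum had been over-represented so that both strata contributed equally many observations, the unweighted sample mean would be 3.0, whereas the weighted estimate of 2.4 respects the population proportions.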
Note, however, that a known non-zero probability of se-
lection does not imply a similar probability of measurement.
Different groups of people may have different rates of non-
response introducing differential biases.
Often, the largest strata are most variable, but also have
the lowest sampling costs. If there is a ﬁxed cost in each
stratum, as well as a variable cost proportional to sample
size, you will minimize total cost by choosing strata samples
proportional to the product of strata size and strata variabil-
ity (measured by strata standard deviations) divided by the
square root of cost per unit.
In the opposite case, you may oversample small strata if it
is important to have precise information about them.
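The allocation rule stated above (stratum sample proportional to stratum size times stratum standard deviation, divided by the square root of cost per unit) can be sketched as below. All figures are hypothetical, and in practice the standard deviations would themselves have to be guessed or taken from a pilot study.

```python
import math


def optimal_allocation(n_total, sizes, sds, unit_costs):
    """Split n_total among strata in proportion to size * sd / sqrt(cost)."""
    weights = {s: sizes[s] * sds[s] / math.sqrt(unit_costs[s]) for s in sizes}
    total = sum(weights.values())
    return {s: n_total * w / total for s, w in weights.items()}


# Hypothetical strata: A is larger, more variable, and cheaper to sample.
sizes = {"A": 6000, "B": 4000}
sds = {"A": 10.0, "B": 5.0}
unit_costs = {"A": 1.0, "B": 4.0}

allocation = optimal_allocation(700, sizes, sds, unit_costs)
for name, n in allocation.items():
    print(name, round(n))
```

In this invented case, stratum A receives 600 of the 700 units, far more than its 60% share of the population, because it is both more variable and cheaper per unit.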
2.5.3 Clustered or multi-stage samples
In simple random and stratiﬁed samples, the sampling unit
and the observational unit are identical. These designs are
only possible when you have a complete sampling frame of
the observational units available. In many countries of the
world, they are not feasible for this reason, or because of high
travel costs. Then, you can only apply random sampling to
groups or clusters of observational units; these become the
sampling units.
Thus, you may sometimes save time and expense, at the
cost of reduced precision, by clustering, that is, by
choosing random groups of individuals found together. Then,
you may study all of the members of each chosen group or
select some randomly. For example, you may choose several
people from each of several villages or entire classrooms of
students, where you take the village or classroom at random.
When the clusters are geographical regions so that the
sampling frame is a map, this is called area sampling.
Individuals in such a cluster will generally be more similar
or homogeneous than if you had chosen each of them inde-
pendently randomly from throughout the whole population.
This means that you are actually collecting less information
for a given number of observations, resulting in lower pre-
cision per observation unit. However, in certain cases, you
may increase overall precision, because the smaller cost per
unit sampled allows you to take a larger overall sample for
the same total cost.
With cluster sampling, the variance, as compared to a simple
random sample of the same size, will be increased by a
factor of approximately $1 + (m - 1)\rho$, where $m$ is the
cluster size and $\rho$ is the correlation among responses in
a cluster. Thus, in contrast to stratification, where strata
should be as homogeneous as possible to increase precision,
here clusters should be small and heterogeneous (have small
correlation).
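A short calculation shows how quickly this factor can erode the information in a clustered sample. The sketch assumes equal cluster sizes; the function names and the numerical values are hypothetical, and the 'effective sample size' is simply the actual size divided by the variance inflation factor.

```python
def design_effect(m, rho):
    """Approximate variance inflation for clusters of size m, correlation rho."""
    return 1 + (m - 1) * rho


def effective_sample_size(n, m, rho):
    """Size of the simple random sample giving about the same precision."""
    return n / design_effect(m, rho)


# Hypothetical survey: 1000 individuals in clusters of 20, correlation 0.05.
print(round(effective_sample_size(1000, 20, 0.05)))

# Extreme case: clusters of 2 perfectly correlated members.
print(round(effective_sample_size(200, 2, 1.0)))
```

Even a small within-cluster correlation nearly halves the effective size in the first case, while in the second case 200 individuals carry no more information than 100 independent ones.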
In multi-stage sampling, you consider the population to be
made up of a number of ﬁrst-stage sampling units. You then
take those chosen to consist of second-stage sampling units,
and so on. (Simple clustering is two-stage.) At each stage,
you sample the units by a suitable method, usually simple
random sampling or stratiﬁed sampling; this may not be the
same at all stages. The important point is that you choose all
sampling units at all stages by a proper random process.
Choice of the ﬁrst-stage units is especially important. You
should take into account several criteria:
• The total number of primary units in the population should
  be relatively large.
• The units should have clear boundaries. Well-known
  administrative units are often preferable.
• The units should be fairly uniform in size.
• The units should remain stable over time (at least from
  when the information was obtained until the survey is
  carried out). Comparability to past and future data is even
  better.
Multi-stage sampling has several important advantages:
• It introduces flexibility that is absent from the simpler
  methods. You can use existing natural divisions of the
  population as sampling units.
• You will only need to carry out subdivision into
  second-stage units on those first-stage units actually
  selected.
However, multi-stage sampling generally yields less precise
results than a sample containing the same number of ﬁnal-
stage observational units selected by some suitable one-stage
process.
Construction of an appropriate pilot study is usually most
difﬁcult for multi-stage sampling. You will require a much
more extensive pilot study than for other designs if you want
to obtain any reliable preliminary estimate of variability.
If you are interested in the clusters themselves, as well as
their members, many of the optimizing criteria discussed in
this section are no longer relevant.
2.5.4 Systematic samples
Much practical sampling is not fully random in nature. A
frequent method of selecting a sample, when a list of all units
is available, is to take every kth entry on the list. The ﬁrst
entry should be determined by selecting a random number
between 1 and k.
Such a systematic sample would be a simple random sam-
ple if the list were arranged completely at random. You can
only estimate precision if you make this assumption. How-
ever, no list is completely random. If you are forced to use
this method, take great care to verify that no periodic fea-
tures or monotonic trend appear in the list, especially any
that might be associated with the sampling interval, k.
This method does have several advantages:
• It often involves less labour and technical expertise than a
  true random sample. Thus, it may be especially useful if
  selection must be made in the field by relatively untrained
  investigators.
• In some situations, you need not know the complete
  sampling frame in advance. You can select units sequentially
  in time.
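Taking every kth entry after a random start might be sketched as follows. The list of 100 units and the interval k = 10 are assumptions for the example, and `systematic_sample` is a hypothetical helper.

```python
import random


def systematic_sample(frame, k, seed=None):
    """Take every kth entry of the list, starting at a random position 1..k."""
    rng = random.Random(seed)
    start = rng.randint(1, k)     # random start between 1 and k inclusive
    return frame[start - 1::k]    # list positions are counted from 1 here


# Hypothetical list of 100 numbered units, sampling interval k = 10.
frame = list(range(1, 101))
sample = systematic_sample(frame, 10, seed=1)
print(sample)
```

Only the starting point is random, so any periodic pattern in the list with period related to k carries straight through into the sample.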
2.5.5 Case–control designs
When the response characteristic that you wish to study is
extremely rare in a population, you would require an impos-
sibly large sample to obtain even a few cases randomly. If
cases are available, but in a non-random way, you may need
to use a radically different type of study. You identify sub-
jects with the condition of interest, as well as a group of con-
trols without the condition, matched as closely as possible
to the cases. You then compare them as to their previous
exposure to any risk factors of interest.
The main advantages of such a case–control study are
practicality, simplicity, speed, and low cost. However, the
disadvantages are many:
• non-random selection of the cases;
• difficulty in locating appropriate controls, as well as their
  non-random selection;
• inaccuracy of retrospective memory;
• differential recall between cases and controls;
• detection bias, whereby one of the exposure factors
  studied facilitates observing the response condition.
You must study cases carefully for conclusions to be gener-
alizable. Estimation of exposure from a control group that
includes individuals either predisposed to such exposure or
not can be altered by changes in this mix. Identiﬁcation of
eligible cases and controls must not depend on their exposure
status.
You may locate cases as all those diagnosed
• in a community;
• in a random sample from the population;
• in all relevant institutions (schools, hospitals) in the
  community;
• in one or more such institutions.
The controls should be as similar as possible to the cases, ex-
cept that they do not have the condition being investigated.
Without the possibility of randomization, this can be difﬁ-
cult or impossible. Often, the best chance is by individually
matching one or more controls with each case on some vari-
ables that could confound the comparison. These should be
variables that are strongly related both to the condition and to
the exposure factors. But then you cannot use such variables
as possible risk factors for the condition.
Problems of memory with retrospective observations have
already been discussed. However, here there is a further
problem. Subjects with the condition may have thought about
the reasons for it and have noticed exposure factors. These
may also be present for the controls, who, however, have not
noticed them. Thus, exposure factors may be under-reported
in the controls. Subjects exposed to certain factors may more
often seek professional advice that, in turn, allows their con-
dition to be more often detected than the unexposed, creating
additional bias.
You must interpret any results from a case–control study
with extreme caution.
2.5.6 Repeated sampling
Most samples are carried out on a single occasion to deter-
mine the characteristics of the population at a given point in
time. If the population is subject to change, such a study
cannot provide you with information on the nature or rate of
change. One possibility is to work retrospectively, and ask
questions about the past. This generally involves problems
of memory, whereby information further back in time is less
trustworthy.
In other cases, you must make provision to redo the study
periodically, in what are called waves. This may take differ-
ent forms. You may:
• develop a completely new survey for each point in time
  (these independent random samples may overlap);
• repeat the survey in the same form but with new sampling
  units drawn at intervals in time (again, the random
  samples may overlap);
• repeat the survey with the same sample at each time point,
  a panel or cohort study;
• replace a part of the sample at each occasion;
• draw a subsample from the original sample, and only
  resurvey these units.
Your choice will generally depend on the exact type of change
to be studied, as well as on questions of cost and practicabil-
ity.
Panel and cohort studies will provide you with the most
information about how change is occurring. You may design
one so that you choose ﬁxed proportions of respondents in
categories of some key explanatory variable, the risk groups.
For example, to study lung cancer, you could select groups
of smokers and non-smokers.
These studies will not require your respondents to remem-
ber events over long periods, thus increasing accuracy. How-
ever, they carry several dangers:
1. Respondents must be tracked from interview to interview,
with the major risk of drop-outs; the people lost will not
be representative.
2. A sample that you randomly selected to be representative
of some population when the study began will usually no
longer be representative at successive waves, if only be-
cause everyone has grown older and the younger cohorts
are missing.
3. Repeated restudy of the same units may induce resistance
to providing information, but it may also lead to more ac-
curate answers, both creating spurious trends in the re-
sults.
4. It may result in modiﬁcation of the units involved, as com-
pared to the rest of the population, so that they become
less representative.
5. If initial risk groups were sampled, the high-risk group
may be followed more closely, leading to surveillance bias,
or subjects may change habits, and hence risk groups, for
example, by stopping smoking.
Recruiting people willing to be included in a study requir-
ing continued participation over a period of time will often
not be easy. One of the most difﬁcult and costly aspects of
longitudinal studies is keeping track of the respondents. To
facilitate this, it is usually a good idea to collect as much in-
formation as possible about the respondents’ families, close
relatives, and friends. If you lose contact with respondents,
you can contact these people to try to locate them.
The advantages of cluster designs, such as using hospi-
tals or schools as sampling units, may be lost in longitudinal
studies because the observational units may change clusters
between waves. If you administer questionnaires in groups,
this will also create an increasing problem as respondents
become split up over time.
The sample for a longitudinal study is representative at the
moment it is chosen. However, it may not remain so as time
goes by. For example, a sample of students chosen in the
sixth year of school will, two years later, no longer be rep-
resentative of students in the eighth year of school because
of failures and perhaps drop-outs. One possible approach
is to use freshening by sampling extra students at each new
time point in an attempt to bring the sample back to repre-
sentativity. In such cases, it may also be useful to obtain ret-
rospective background information on these supplementary
respondents.
In general, diary cards are better than repeated interviews
for registering new events or changes in circumstances. On
the other hand, interviews allow in-depth questioning about
chronic conditions.
2.6 Sample size
Sample size refers to the number of observational, not sam-
pling, units. A sample size can only be calculated for some
speciﬁc aspect of the population to be estimated, usually re-
lated to some important response variable. The size of sam-
ple that you will require in order to attain a given precision
for such an estimate depends on the variability of the popu-
lation and on the extent to which it is possible to reduce the
different components of this variability in the random sam-
pling error, primarily by stratiﬁcation.
The standard error, although usually a crude measure of
the precision of an estimate obtained from a sample, is accu-
rate enough to allow you to make sample size calculations,
these being, in any case, themselves rather rough. We have
seen that the standard error is a function of the sample size,
decreasing as that size increases (Section 1.3.5).
Suppose that, in simple random sampling, you want to
be relatively conﬁdent that the population value of interest
is within a small region around an estimate of it calculated
from the sample. Then, you can use the standard error to
calculate the approximate size of sample required to ensure
this. Fortunately, in the common cases, the standard error is
easy to calculate.
Let us represent the population value of interest by $\phi$,
and its estimate calculated from the sample by $\hat{\phi}$. If
the standard error is represented by $\sigma/\sqrt{n}$, where
$n$ is the sample size, you wish to have sufficient precision
from the sample to be confident that $\phi$ lies in a small
interval around $\hat{\phi}$. Applying the criteria mentioned
in Section 1.3.5, you can use the approximate interval,
$\hat{\phi} \pm 2\sigma/\sqrt{n}$.
2.6.1 Binary responses
For a binary response variable, in the simplest case, you will
be interested in the proportion of the population having a
given characteristic, that is, in the probability of that
characteristic appearing. Let us call this $\pi$, so that
$\phi = \pi$ is the value of interest. Then, you can obtain the
estimated standard error, $\hat{\sigma}/\sqrt{n}$, from
$$\hat{\sigma} = \sqrt{\hat{\pi}(1 - \hat{\pi})}$$
Thus, for the desired precision interval of two standard
errors, say $\hat{\phi} \pm \delta/2$, to be small enough, set
$\delta/2 = 2\hat{\sigma}/\sqrt{n}$ and solve for $n$. You will
require that
$$n = \frac{16\hat{\pi}(1 - \hat{\pi})}{\delta^2}$$
In order to calculate the sample size using this formula, you
need to have some idea of what value the population propor-
tion has — this is a major reason why sample size calculation
is always approximate.
Suppose that you think that the proportion is about 80%
and you want an interval of $\delta = 4\%$. Then, the sample
size calculation is
$$n = \frac{16 \times 0.8 \times (1 - 0.8)}{0.04^2} = 1600$$
The closer $\pi$ is to one-half, the larger the required sample
size will be.
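The calculation is easily repeated for other guesses at the proportion. The sketch below uses the formula given above with the same interval width of 0.04; the function name `sample_size_binary` and the additional trial values of 0.5 and 0.95 are assumptions added for illustration.

```python
def sample_size_binary(p, delta):
    """Approximate sample size for a proportion p and total interval width delta."""
    return 16 * p * (1 - p) / delta ** 2


for p in (0.5, 0.8, 0.95):
    n = sample_size_binary(p, 0.04)
    print(f"assumed proportion {p:.2f}: n = {round(n)}")
```

A guessed proportion of 0.8 reproduces the 1600 obtained above, while 0.5 gives the largest requirement and 0.95 a much smaller one.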
2.6.2 Counts
When the response variable is a count, a similar procedure is
followed. Here, you will be interested in the average count;
we can call this $\mu$ so that $\phi = \mu$. The estimated
standard error is now obtained from
$$\hat{\sigma} = \sqrt{\hat{\mu}}$$
Thus, for the desired interval to be small enough, you require
that
$$n = \frac{16\hat{\mu}}{\delta^2}$$
Here, you will need to have some idea of the value of the
population mean.
Suppose that you think that the mean number of children
in a family is about 3 and you want an interval of 0.2. Then,
the sample size calculation is
$$n = \frac{16 \times 3}{0.2^2} = 1200$$
Notice that, if you believe the mean to be 2, the sample size
is smaller:
$$n = \frac{16 \times 2}{0.2^2} = 800$$
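Because the required size is directly proportional to the assumed mean, it is worth trying a range of plausible values before fixing the sample size. A minimal sketch, in which the function name `sample_size_count` and the extra trial value of 4 are assumptions:

```python
def sample_size_count(mean, delta):
    """Approximate sample size for a mean count and total interval width delta."""
    return 16 * mean / delta ** 2


for mean in (2, 3, 4):
    n = sample_size_count(mean, 0.2)
    print(f"assumed mean {mean}: n = {round(n)}")
```

The values for means of 2 and 3 agree with the 800 and 1200 calculated above.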
2.6.3 Measurements
When the response variable is a quantitative measurement,
the mean will again be of interest, so that $\phi = \mu$. The
method is slightly different because the standard error is not
automatically given. However, the sample size formula is
similar:
$$n = \frac{16\sigma^2}{\delta^2}$$
where $\sigma^2$ is the variance of the measurements. Here,
you do not need to have an idea of the estimate of the mean
but only of the variability. Often, this is much more difficult
to obtain.
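One possible source for the variability is a pilot study. The sketch below estimates the standard deviation from a handful of pilot measurements and inserts it into the formula; the pilot values, the interval width of 2 units, and the function name `sample_size_measurement` are all hypothetical, and an estimate based on so few observations is itself very rough.

```python
import statistics


def sample_size_measurement(sigma, delta):
    """Approximate sample size for a mean, given sd sigma and interval width delta."""
    return 16 * sigma ** 2 / delta ** 2


# Hypothetical pilot measurements (for example, heights in cm).
pilot = [168.0, 172.5, 175.0, 180.5, 165.0, 170.0, 178.0, 174.0]
sigma_hat = statistics.stdev(pilot)   # sample standard deviation from the pilot

n = sample_size_measurement(sigma_hat, delta=2.0)
print(f"estimated sd = {sigma_hat:.2f}, required n is about {round(n)}")
```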
For simplicity in these examples, I have assumed that you
are only interested in estimating a single value for the whole
population. For the way to modify this to estimate differ-
ences among categories of an explanatory variable, see Sec-
tion 3.3.4.
2.6.4 Complex sample designs
If you are interested in more than one response, make the
sample size calculation for each and use the largest (if feasi-
ble).
If you are concerned with the precision of the estimate
within each stratum of a stratiﬁed random sample, you can
make the above calculations separately for each group, such
as the regions of a country. In certain cases, the variability
may be less within such strata than in the population as a
whole, but the number in the sample will also be smaller.
Generally, stratiﬁcation will reduce the variability of global
estimates for the whole population, especially if there are
marked differences among the strata.
When you use clustering in a design, sample size calcula-
tion is much more difﬁcult because it depends on how sim-
ilar are the units within each cluster. A measure of this is
usually impossible to estimate before the study begins. The
precision that you will obtain in multi-stage samples is often
closer to that for a sample size calculated from the sampling
units than from the observational units. In an extreme exam-
ple, suppose that the clusters are families and the observa-
tional units are identical twins. The amount of information
available is better indicated by the number of families than
by the number of children because of the similarity among
twins.
Essentially, a sample size calculation requires you to as-
sume, or to make a good guess at, what you are setting out to
discover. Furthermore, you must make vast simpliﬁcations,
for example ignoring all of the numerous explanatory vari-
ables that you are collecting except one crucial variable.
Sample size determination requires you to perform a del-
icate balancing of costs and precision. There is rarely any
point in collecting data on a sample that is so large that it
provides much greater precision than that actually needed.
But you may have to abandon any hope of high precision if
the cost of a sufﬁciently large sample is too great.
2.7 Summary
Samples are often to be preferred to censuses because they
can provide more accurate information at less cost. Random
sampling is necessary in order to avoid unknown biases and
to yield a measure of precision of the estimates.
The observational unit, about which you will be collect-
ing information, may differ from the sampling unit, chosen
at random in the population. The sampling units in the pop-
ulation are deﬁned by the sampling frame.
A pilot study is almost always necessary to test the ﬁeld
procedures and to obtain an idea of the variability in the pop-
ulation under study.
You may use many different types of instruments in sam-
ple surveys. Those most useful for subsequent statistical
analysis include observational schedules, tests of knowledge,
and questionnaires. You should take great care in their con-
struction so as to obtain accurate information. You may ad-
minister them directly, say by interviews, or indirectly, for
example through the post. Carefully choose the ﬁeld investi-
gators and adequately train them for the speciﬁc procedures
that you will use.
Sampling error arises from biases and from the random
variability in the sample. The biases may result from in-
adequacies in the sampling procedure or from missing re-
sponses, as well as from problems with the instruments or
investigators.
The main sampling designs are simple random samples,
stratiﬁcation by randomly choosing ﬁxed proportions of in-
dividuals within subgroups of the population, and clustering
by randomly choosing whole groups of individuals. On the
other hand, case–control designs are sometimes useful when
you are studying some rare event. For changes over time,
you will need to do repeated sampling, as in a panel study.
Sample size calculations will provide you with a rough
idea of how many individuals you must observe in order to
obtain a desired degree of precision or to detect some effect
of interest.
3
Experimental trials
3.1 Basic principles
The main feature that distinguishes an experimental trial from
a sample survey is that you perform an intervention or treat-
ment instead of simply observing things as they are. Funda-
mental advantages of the experimental method are that:
• causality can be empirically studied;
• a complex causal problem can be attacked by proceeding
  in a series of simple steps.
Thus, you can break a problem up into simple questions to be
explored by separate trials with simple causal assumptions.
However, as we have seen, the major drawback, especially
with human subjects, is that they cannot be chosen randomly,
making generalization to a larger population difﬁcult.
In a trial, the principal sources of exposure will be un-
der your control; you decide to whom each is applied. The
smallest entity that might have received a different (sequence
of) intervention(s), when they were allocated in the study, is
called the experimental unit. As with sampling units in a sur-
vey, there may occasionally be several levels of experimental
units, such as classrooms and children.
In certain scientiﬁc investigations, you will not precisely
know the causal factors so that a goal of the study is to deter-
mine which are relevant. In other situations, you may pos-
sibly apply several distinct types of intervention in various
combinations. In either case, it may often be desirable for
you to include several causal factors simultaneously in the
same trial.
You will often begin your investigation of any relatively
complicated phenomenon with a general survey of the effect
of a variety of changes on the system. Next, you may test
more closely your ideas about how parts of the system really
work. Most often, you will need a series of trials, with your
initial ideas being corrected at each step.
Even trials with direct practical political or commercial
aims may allow you to include special treatments yielding
fundamental knowledge about the process under study. A
good check on reliability of a trial is agreement with pre-
viously established results in the ﬁeld. Often, it is worth
including some speciﬁc ‘standardized’ treatment solely for
this objective.
The basic requirements for you to perform an acceptable
experimental trial include:
• as simple, but efficient, a design as possible;
• freedom from systematic error or bias;
• sufficiently precise and exact measurement of the response;
• a measure of precision of the results;
• wide validity of the results.
Various procedures are necessary to ensure these goals. Three
of the important steps are choice of:
1. treatments;
2. experimental units;
3. types of measurements observed.
Generally, the choice of treatments, except perhaps for the
form of control, is a technical question speciﬁc to the subject
under study. Once you have settled these three questions,
you can elaborate the overall design of the trial.
3.1.1 Controls and placebo
Experimental trials can only allow you to identify the effect
of an intervention by comparison with something else. Ob-
servation of past conditions, before intervention, called his-
torical controls, is not sufﬁcient because there can be exter-
nal evolution over time or unrecorded differences among the
subjects involved. Thus, you must include a simultaneous
control treatment for a trial to be worthwhile. In this way,
you construct experimentally the causal factor under study.
In many trials, you must randomly assign subjects either
to treatment or to control for valid comparisons to be feasi-
ble. Inform the participants that they will be (blindly) ran-
domized either to the control or to the new or active treat-
ment. That is, where possible, they will not know which
they receive. The protocol should be available for inspection
by all participants.
Cross-over trials (Section 3.3.1) are somewhat different in
that all subjects will receive the treatment(s) and control, but
in different orders.
A placebo control is an inert treatment that appears, in
all external aspects, to be identical to the active treatment.
It reduces the chance of subjects guessing which treatment
they are receiving. Thus, for example, in testing drugs, the
placebo would be an inert substance, identical in taste, ap-
pearance, smell, density, and so on, to the active drug.
Using a placebo will allow you to distinguish the true
side effects of the intervention. If your control involves no
intervention at all, this can create problems of interpreta-
tion. Those receiving the intervention may be reacting sim-
ply because they are getting special attention (the placebo or
Hawthorne effect), and not because of the speciﬁc nature of
the treatment. Thus, it is preferable if members of all groups
believe that they are receiving equivalent special treatment.
The role of a placebo is not to deceive the subjects into
thinking that they are receiving the active treatment, but to
leave them in doubt as to which treatment they are receiving.
Thus, ethically, a placebo is only possible when randomiza-
tion of treatments is used.
If several active treatments, dissimilar to each other, are
involved in a trial, several corresponding placebos may also
be necessary. This can pose the problem of overburdening
the subject. Sometimes, it may also be useful to include two
control groups, with and without placebo.
In many situations, it is ethically impossible not to supply
an active treatment. Then, the new treatment is generally
compared to the existing or standard treatment as the control,
instead of to a placebo.
Always remember that a treatment effect, between two ex-
perimental groups (of volunteers), that is perfectly genuine
under trial conditions may be quite different when introduced
on a large scale in realistic conditions.
3.1.2 Choice of subjects
As for sample surveys, carefully plan what is to be the el-
igible population. Specify eligibility criteria in advance, in
the protocol. You must not determine them after selecting
subjects and assigning them to an experimental group.
The fact that an intervention is involved means that it is
generally impossible to choose a random sample of subjects
to participate in a trial. You cannot impose the intervention
upon them; they must voluntarily agree. Thus, subjects in
such a trial are never representative of a larger population.
Even if there are no refusals among those in the eligible pop-
ulation whom you ask, this does not make the subjects rep-
resentative of some larger population.
Eligibility criteria may render the chosen subjects even
more unrepresentative. In a medical study, if subjects are
required not to be taking any other medication, they will be
healthier than average. The more restrictive are the exclusion
criteria, the less generalizable will be the results. Extrapola-
tion is always a risky process.
At the same time, the subjects should not be unrepresen-
tative. Your primary scientiﬁc goal in an experimental trial
is to investigate what effect some intervention can have by
showing what effect it had in a particular case. In other
words, the test intervention should not be applied to subjects
who would not receive it under normal conditions after test-
ing is completed. (An exception might be the initial testing
of a drug for tolerability on healthy volunteers.)
Often, your choice of investigators may conﬂict with your
choice of subjects. Centres of excellence, for example hos-
pitals or schools with highly trained research staff, will gen-
erally provide select and unrepresentative subjects.
Your next step will be to deﬁne what will be the experi-
mental units. Factors to take into account include:
• their size, for example, classroom or child;
• how representative they need be;
• how close to realistic are the conditions in which they will
  be studied;
• whether responses on the same unit can be observed several
  times, perhaps under different treatments.
In most trials, the size of the units will not be in question,
because it will be the individual subject.
3.1.3 Randomization of treatment
In a trial, you want to compare a set of two or more treat-
ments, including the control, with a group of subjects as-
signed to each. These groups should initially be as alike
as possible in all ways so that, in the subsequent analysis,
you will only need to compare them on the characteristics of
direct interest. Individuals need not be equivalent; you are
studying the group reaction.
The only objective way for you to ensure this similarity
is by randomization. If the choice of treatment is made by
the investigator, or by the subject, there is great freedom
for unconscious or intentional bias. Thus, even although
you generally cannot choose subjects at random from some
larger population, use randomization for the allocation of
treatments among the participating subjects.
The equivalence achieved by random assignment is prob-
abilistic. It is not inevitable that a correctly implemented ran-
domization procedure will result in similar groups, although
the larger the groups the more chance of their not differ-
ing much. A successfully implemented random assignment
procedure does not guarantee that the initial comparability
among groups will be maintained over the course of a trial.
The major risk is differential attrition related to treatments.
In many trials, this can be an important object of study in its
own right.
Thus, this randomization does not rule out all threats to
causal validity.
• Subjects in a control group may imitate those under a
  treatment.
• If a control group is felt to be unjustly treated, compensatory
  measures may be available to it from elsewhere.
• A group that feels it is receiving less desirable treatment
  may exert special effort in compensation or it may become
  demoralized.
You may prevent some of these by blinding where this is
possible, as discussed below.
If certain combinations of treatment allocation, that might
arise randomly, are undesirable, these can be speciﬁed in ad-
vance and eliminated. For example, when subjects are admit-
ted to a trial sequentially, the ﬁrst half of subjects might all
randomly be assigned to receive one treatment and the sec-
ond half the other. Such possibilities are rare and generally
only occur in very small trials.
Subjects meeting the selection criteria are said to enter the
trial. Any exclusion after randomization, for whatever rea-
son, even discovery that eligibility criteria have not actually
been met, may upset the randomization balance. Before en-
try, every subject must be regarded as suitable for any of the
treatments under study. If this is not true, equivalent groups
cannot be constructed and comparisons will be impossible.
Your treatment allocation system should be such that the
people entering subjects into the trial do not know in advance
what is the next treatment to be assigned. If they did know,
they might decide that the next treatment was inappropriate
for the subject and not enter that person into the trial. Thus,
randomization is important as a measure of concealment of
the order of assigning treatments, because any system in the
sequence might be detected.
If the same investigator cannot administer the treatments
to all subjects, design the study so that investigators are also
distributed so as to give all treatments in some random fash-
ion. In this way, you can avoid confounding investigator bias
with treatment.
In simple randomization, you generate a series of random
numbers by computer. If only two treatments are involved,
you can assign the treatment of each subject sequentially,
with odd digits in the series indicating one treatment and
even digits the other. With three treatments, 1, 2, and 3 might
indicate the ﬁrst, 4, 5, and 6 the second, and 7, 8, and 9 the
third, zeros being ignored. You can easily adapt such a pro-
cedure to weighted randomization, with different proportions
in the groups.
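The digit rules just described can be sketched as follows. The treatment labels and function names are hypothetical; zero is counted as an even digit with two treatments and is skipped with three, as in the description above.

```python
import random


def random_digits(n, seed=None):
    """Generate n random digits 0-9 by computer."""
    rng = random.Random(seed)
    return [rng.randrange(10) for _ in range(n)]


def assign_two(digits):
    """Odd digit -> treatment 'A', even digit (including 0) -> treatment 'B'."""
    return ["A" if d % 2 == 1 else "B" for d in digits]


def assign_three(digits):
    """1-3 -> 'A', 4-6 -> 'B', 7-9 -> 'C'; zeros are ignored."""
    labels = {1: "A", 2: "A", 3: "A",
              4: "B", 5: "B", 6: "B",
              7: "C", 8: "C", 9: "C"}
    return [labels[d] for d in digits if d != 0]


digits = [3, 0, 8, 5, 1, 6]          # a fixed series, for illustration
print(assign_two(digits))            # ['A', 'B', 'B', 'A', 'A', 'B']
print(assign_three(digits))          # ['A', 'C', 'B', 'A', 'B']
```

For weighted randomization you would only change the mapping, for example letting digits 1 to 6 indicate one treatment and 7 to 9 the other to obtain groups in a ratio of about two to one.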
Start treatment as soon as possible after entry and random-
ization of each subject, to avoid any intervening changes in
the state of the subjects. If at all possible, the waiting time
should be equal in all groups to avoid differential losses. In
many studies, entry must be staggered in time, as subjects
become available.
If it is necessary to ensure that exactly the same number
of subjects is assigned to each group, you will require some-
what more complex schemes. You can also use stratiﬁcation
and cluster methods, as in sample surveys (Sections 2.5.2
and 2.5.3). Here, the ﬁrst are often called blocks and classiﬁ-
cation factors and the second plots, both to be distinguished
from the causal factors of interest. As in multi-stage surveys,
there may be several levels of ‘plots’, and several sizes of ex-
perimental units, for example in a split plot design.
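One of the 'somewhat more complex schemes' for obtaining equal group sizes, not named in the text, is to randomize within small balanced sets of consecutive subjects (often called permuted-block randomization; these sets are not the same thing as the classification 'blocks' just mentioned). A minimal sketch, with hypothetical names and an arbitrary set size of four:

```python
import random


def permuted_blocks(n_subjects, treatments=("A", "B"), block_size=4, seed=None):
    """Allocate treatments so that every consecutive set of block_size
    subjects contains each treatment equally often."""
    if block_size % len(treatments) != 0:
        raise ValueError("block_size must be a multiple of the number of treatments")
    rng = random.Random(seed)
    per_block = block_size // len(treatments)
    allocation = []
    while len(allocation) < n_subjects:
        block = list(treatments) * per_block
        rng.shuffle(block)            # random order within the balanced set
        allocation.extend(block)
    return allocation[:n_subjects]


print(permuted_blocks(12, seed=1))
```

Because the group sizes are balanced after every fourth subject, the scheme also copes with staggered entry. Its weakness is that the last allocation in each set is predictable, so the set size should be concealed from those entering subjects.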
Cluster randomization
For intervention studies involving, say, new disease preven-
tion procedures or educational methods, randomization of
clusters is often more appropriate than that of individuals.
This may be necessary for various reasons.
• The intervention may have to be at the group level, as in
  community health services or an educational curriculum.
• If allocation is at the individual level, control subjects may
  benefit from observing or communicating with the treatment
  subjects.
• You can capture the mass effect of interaction among
  individuals in a more realistic setting within each cluster.
• In a context where transmission is possible, such as an
  infectious disease, you can study both susceptibility and
  infectiousness within each cluster.
• In a large study, it may be infeasible to allocate treatment
  and control to different individuals in the same group even
  although it would be possible within a small group.
The clusters might be villages, schools, medical practices,
factories, and so on. In many situations, cluster random-
ization provides experimental information that is closer to
that which will prevail if the intervention is subsequently ex-
tended to the whole population.
As with cluster sampling, there is a trade-off between a
few large clusters and many small ones. The former are
cheaper but may also reduce contamination between treat-
ment and control. On the other hand, the latter allow you to
control variability among clusters so that the averaging effect
of randomization can work. If you must use large clusters,
you may randomly sample individuals for further follow-up
within each cluster. If clusters are geographical, you may
concentrate sampling near the centre to avoid contamination
from neighbouring clusters.
3.1.4 Blinding
Even unconsciously, observers’ judgements can be affected
by knowing which treatment each subject is receiving, or by
knowledge of previous measurements on that subject. In the
same way, a subject knowing that he or she is receiving active
treatment may react differently than if the same subject knew
he or she had been assigned to the control group.
Blinding, or masking, refers to the fact that the people in-
volved in a trial are not aware of the treatment a subject is
receiving. In a single-blind trial, the subjects are not aware,
whereas, in a double-blind trial, neither the investigators di-
rectly involved nor the subjects know. The latter is preferable
but is obviously impossible in many contexts. If you cannot
blind the investigators directly involved, then you may be
able to use external assessment of responses.
A double-blind trial has the enormous advantage that the
investigators cannot, even unconsciously, bias the results in
one group and can objectively evaluate progress in all groups.
Indeed, in many cases, it will be advantageous to use a triple-
blind trial, whereby the statistician conducting the analysis
is also not aware of which group corresponds to which treat-
ment.
For blinding to be feasible, a placebo treatment must be
available. As with the use of a placebo, the subjects should
be informed of the protocol; blinding is only ethical under
randomization of treatments.
3.1.5 Primary end-point
Carefully deﬁne the main response variable that will be ob-
served, as well as when this will occur, called the primary
end-point. Describe in detail the way in which you will mea-
sure this and closely control it. However, this is usually a
technical question, speciﬁc to the subject matter being stud-
ied, and not a statistical question like the questionnaire con-
struction in sample surveys. Nevertheless, you should take
similar care about recording the results and ensuring that
they will be susceptible to statistical analysis. Thus, it will
be useful if you review Section 2.3.
Generally, you must follow the effects of intervention over
a certain period of time. In other words, you will need to
monitor subjects to detect some change. Take into account:
• the resources available;
• the frequency of visits required for normal monitoring of
  the process;
• the inconvenience to subjects of frequent evaluation;
• the number of measurements required to provide an adequate
  comparison among treatments.
Often, you may want to focus on evaluation of responses at
the beginning and end of some ﬁxed period.
When you are performing interventions, the observations
that you measure on subjects, in addition to the actual treat-
ment(s), generally can take ﬁve main forms:
1. the baseline response before treatment begins;
2. supplementary baseline explanatory variables and, as well,
sometimes time-varying ones;
3. the principal response end-point of interest;
4. secondary responses, such as side effects;
5. monitoring, including compliance.
In many situations, you may ﬁnd it useful to make prelim-
inary observations on the subjects before treatments begin,
called baseline response values. Several recordings may be
necessary to avoid random ﬂuctuations. These will provide
indications of the variability among the subjects, and the reli-
ability of measurement, that may be useful in controlling and
interpreting the results. Indeed, if you are measuring change
or improvement under treatment, recording the baseline re-
sponse is essential.
Especially if there is substantial variation among subjects,
it may prove useful to plan to collect concomitant observa-
tions on other variables besides the baseline response. The
main condition for these to be useful in the subsequent anal-
ysis is that they not be affected by the treatment. You can
easily ensure this by taking the measurements before treat-
ment begins, but this is not always possible. If you believe
that they inﬂuence the subject’s response to treatment, you
may use them as prognostic factors.
Take particular care with subject identiﬁcation: there will
usually be several forms to be ﬁlled out at different points
in time that must later be linked together. Investigators must
have the appropriate forms available at the right time. Send
out requests for them to be returned promptly.
Note and record all reactions of all groups equally. A
checklist of possible side effects and the reasons for non-
compliance may be useful, although this has the same dan-
gers as closed questions in surveys (Section 2.3.2).
If judgements or interpretations of the data recorded for
each subject are required, make them before disclosure of
which group had what treatment.
For many types of trials, it is important to conduct some
form of follow-up study to determine what are the long-term
effects of the intervention, perhaps years after the treatment
has ended.
3.1.6 Missing values and non-compliance
Because your subjects are volunteers, the problem of refusals
found in sample surveys generally does not occur here. On
the other hand, missing values will occur for certain obser-
vations. Where applicable, pay special attention to provision
for treatment during holidays and weekends.
In addition, because of the lengthy time period often in-
volved, the number of drop-outs can be considerable. Some-
times this will be unavoidable, as with deaths, people mov-
ing away, and so on. However, in other cases, treatment may
be stopped or changed because of side effects, and so on.
(Intentional drop-out might be considered to be a form of re-
fusal.) If drop-outs are not linked with the intervention, they
create no fundamental problem (except loss of information)
for internal validity, although problems may arise for exter-
nal validity.
You can make several types of checks on differential attri-
tion:
• Is the rate of drop-out the same in all groups?
• Are the reasons for dropping out the same in all groups?
• Are the pre-randomization covariates still comparable for
  those remaining in the groups?
• If baseline response measures are available, are they still
  comparable for those remaining in the groups?
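The first of these checks amounts to a simple tabulation. The sketch below uses a hypothetical record layout, one (group, dropped out) pair per subject, and made-up data purely for illustration.

```python
from collections import Counter


def dropout_rates(records):
    """records: iterable of (group, dropped_out) pairs, dropped_out a bool.
    Returns the proportion of drop-outs in each group."""
    totals, drops = Counter(), Counter()
    for group, dropped in records:
        totals[group] += 1
        drops[group] += int(dropped)
    return {group: drops[group] / totals[group] for group in totals}


# Invented example: 10 subjects per group.
records = ([("treatment", True)] * 3 + [("treatment", False)] * 7
           + [("control", True)] * 1 + [("control", False)] * 9)
print(dropout_rates(records))  # {'treatment': 0.3, 'control': 0.1}
```

A clear difference between the rates, as in this invented example, suggests that drop-out is linked to the intervention. The same tabulation can be repeated for the recorded reasons for dropping out and for the covariates of those who remain.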
A problem that does not arise with surveys is compli-
ance: do the subjects actually follow the treatment assigned
to them? This can be very difﬁcult to judge, because, for ex-
ample, patients can discard pills instead of consuming them
and teachers can go back to their old teaching methods. In
some cases, you can assess the probability of adherence to
the assigned treatment and use it as a screening criterion for
entry to a trial. Once the trial begins, use all realistic proce-
dures to ensure the maximum compliance possible. Record
measures of the extent of non-compliance. The need for
screening and compliance enforcement will usually make it
more difﬁcult for you to generalize the results to applications
in more realistic situations.
If your goal is to study the effect of intervention in realis-
tic conditions, compliance may not be a problem because the
same thing could be expected to occur under normal condi-
tions for the population as a whole. Then, statistical analysis
is carried out on intention to treat, that is, on the treatment
assigned to a subject, not necessarily that actually followed.
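The difference between analysing by treatment assigned and by treatment actually followed can be seen in a small invented data set. The field names and values below are hypothetical; only the grouping rule matters.

```python
from statistics import mean


def group_means(subjects, key):
    """Mean response per group, grouping on subjects[i][key]."""
    groups = {}
    for subject in subjects:
        groups.setdefault(subject[key], []).append(subject["response"])
    return {group: mean(values) for group, values in groups.items()}


subjects = [
    {"assigned": "new", "received": "new", "response": 8.0},
    {"assigned": "new", "received": "new", "response": 7.0},
    {"assigned": "new", "received": "control", "response": 3.0},  # non-complier
    {"assigned": "control", "received": "control", "response": 4.0},
    {"assigned": "control", "received": "control", "response": 5.0},
    {"assigned": "control", "received": "control", "response": 3.0},
]

print(group_means(subjects, "assigned"))  # intention to treat: new 6.0, control 4.0
print(group_means(subjects, "received"))  # as followed: new 7.5, control 3.75
```

The intention-to-treat comparison keeps the groups exactly as randomization formed them. Grouping by the treatment actually followed moves the non-complier across groups, so the comparison is no longer protected by the randomization.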
Related to compliance is the wider issue as to whether
subjects are reacting in a reasonably normal way in the ex-
perimental conditions so that fairly general conclusions can
be drawn. You can take certain steps in an attempt to ensure
this:
• Check the reactions on pilot subjects before beginning the
  study.
• Redesign the study to reduce suspicion, if necessary.
• Provide clear instructions.
• Emphasize personal anonymity.
• Minimize aspects that appear to test the subject.
• Use non-invasive instruments to obtain response measures.
• Separate the experimental manipulation and the response
  measurements as far as possible.
• Develop special instruments to detect if subjects are simply
  trying to please the investigator.
3.2 Ethical issues
Because you are imposing an intervention, ethical issues are
much more critical than in a sample survey. However, many
of the basic questions are the same; you should review
Sections 1.1.3 and 2.2.3.
3.2.1 Ethics committees
For experimental trials, the conﬂict between present individ-
ual and future collective ethics may be particularly evident.
Each subject should receive the most appropriate treatment
now, but, in the future, all subjects may collectively bene-
ﬁt from an intervention that has been shown to be superior.
Weighing the merits of the two for a given trial is often a
delicate task. Some judgement must be made as to whether
the impingement on the individual is such, as compared with
the possible beneﬁt to society, that the trial should not take
place.
For most areas of study, when interventions using human
subjects are involved, you must obtain permission from an
ethics committee. Generally, you must submit a full protocol
describing the study, with ample justiﬁcation that the objec-
tives are worthwhile, that the design is efﬁcient, and that the
rights of the subjects will be protected.
3.2.2 Informed consent
The main question at issue is whether it is ethical to withhold
a treatment that might perhaps give beneﬁt. On the one hand,
the value of the intervention is not proven or there would be
no need for the trial. On the other, there must be some ba-
sis for considering the new treatment, or a trial would not be
undertaken. In no case should the control group be disadvan-
taged by participation as compared to their non-involvement
in the study.
The decision often depends on the gravity of the condition
being treated, it being impossible, for example, to withhold
treatment in a life-and-death situation (which treatment, the
old or the new?). But it may be unethical to introduce a new
treatment into general use, if it has been poorly or inade-
quately tested. All risks do not lie on one side; what is new
is not always best.
On the research side, ideally no investigator should par-
ticipate who believes that one treatment is clearly superior.
An investigator should not enter a subject in a trial, if he or
she believes a particular treatment to be preferable for that
person. In other words, you are conducting the trial because
no one knows which treatment is better.
Unrandomized trials are almost invariably unethical because
they expose subjects to risks even though the results of the
study will be unreliable. The same might be said of any trial
that does not use an optimally efﬁcient design.
Questions that you must face include:
• Should you obtain the subjects’ informed consent?
• Is the new treatment safe and unlikely to bring harm?
• In a trial with inert control, can you ethically withhold
  treatment?
• Is it acceptable to use a placebo treatment?
• What type of subjects can you acceptably allocate randomly
  between different treatments?
• Is it all right to use double-blinding?
(See Hill and Hill, 1991, pp. 212–214.) For most questions
except the ﬁrst, there is rarely an unequivocal answer.
Inform potential subjects in detail about the conduct of
the trial, including the alternative treatments that will be in-
volved. You must guarantee conﬁdentiality. Potential sub-
jects can decline to participate. If they agree to take part,
they generally will have to sign a form stating that they un-
derstand the trial, called informed consent. The question may
arise as to whether they actually do understand or not. Legal
requirements as to such consent vary greatly among coun-
tries.
If the experimental unit is a group, then you must gener-
ally obtain special permission from the leaders of that group,
such as school directors or community ofﬁcials. Although
you should inform the members of the group, obtaining ac-
tive individual consent is not always possible. However, if
individuals are to be followed up and tested, you will require
their consent.
3.2.3 Interim analysis
One of the main reasons for monitoring a trial is the ethical
concern to reduce the chance of subjects receiving a treat-
ment known to be inferior. Thus, when subjects are to re-
ceive treatment over a long period of time or if their entry to
the trial is staggered, interim assessment of treatment differ-
ences, where possible, is essential to make a trial ethically
acceptable. Once you can reach a conclusion, you will enter
no new subjects, and you may perhaps change those in the
trial to a more appropriate treatment.
Generally, you will check the primary end-point response,
usually one main treatment comparison. Specify in the pro-
tocol the formal ‘stopping rule’ for the trial, that is, the cri-
terion indicating sufﬁcient superiority of a treatment so that
the trial can be stopped. For this to be possible:
• the time lag between subject entry and meaningful response
  measurements must not be excessive, especially
  in comparison to the total time during which individuals
  will be entered;
• data entry must be kept up to date, with all forms quickly
  returned, not just an unrepresentative set, such as the
  optimistic ones.
Interim analyses should be relatively simple, sufﬁcient only
to determine if the trial is to continue or not.
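As an illustration of how simple such a check can be, the sketch below compares the proportions of successes in two groups at an interim look and applies a fixed, deliberately extreme boundary. The text does not specify any particular stopping rule: the boundary of 3.0, the binary response, and the function names are all assumptions made for this example, and the actual rule must come from the protocol.

```python
import math


def two_proportion_z(successes_a, n_a, successes_b, n_b):
    """Standardized difference between two observed proportions
    (pooled normal approximation)."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b))
    if se == 0.0:
        return 0.0  # no variation observed yet, nothing to compare
    return (p_a - p_b) / se


def should_stop(z, boundary=3.0):
    """Illustrative rule: stop only when the difference is very extreme."""
    return abs(z) >= boundary


z = two_proportion_z(30, 50, 20, 50)   # 60% against 40% success so far
print(round(z, 2), should_stop(z))     # 2.0 False
```

A demanding boundary is used because looking at the data repeatedly increases the chance of being misled by a temporary difference. The result of such a calculation is only one input to the committee's decision.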
You will generally have interim analyses performed peri-
odically and not continuously, with the frequency speciﬁed
in the protocol. They should be performed by people not di-
rectly involved in the investigation, and the decision on con-
tinuing made by the committee responsible for the trial. The
decision to stop a trial will never be purely statistical.
The results, if the trial is to continue, should remain conﬁ-
dential so that subsequent recruitment and responses will not
be biased. They can inﬂuence both subjects and investiga-
tors.
3.3 Designs
In most trials, only a limited increase in precision can be ob-
tained by modifying the instruments used. Trials under very
controlled conditions usually cease to be representative of
practical conditions. Precision will depend much more on
intrinsic human variability and experimental design, includ-
ing sample size.
Take care not to use too many different treatments because
the ability to detect treatment differences depends primar-
ily on the number of subjects per treatment, not on the total
number in the trial.
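A rough calculation shows why. Assume, for illustration only, equal allocation, independent subjects, and a common response standard deviation sigma; the standard error of the difference between two treatment means is then sigma * sqrt(2 / n), where n is the number of subjects per treatment. The function name below is hypothetical.

```python
import math


def se_difference(total_subjects, n_treatments, sigma=1.0):
    """Standard error of the difference between two treatment means when
    total_subjects are shared equally among n_treatments groups."""
    per_treatment = total_subjects / n_treatments
    return sigma * math.sqrt(2.0 / per_treatment)


# The same 120 subjects spread over more and more treatments.
for k in (2, 3, 6):
    print(k, "treatments:", round(se_difference(120, k), 3))
```

With the total fixed at 120, moving from two to six treatments cuts each group from 60 to 20 subjects and increases the standard error of every pairwise comparison by a factor of about 1.7.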
The two principal designs used in experimental trials with
human beings are the parallel and cross-over designs. The
scientiﬁc ideal would always be to administer all treatments
to all subjects, because different subjects may react in differ-
ent ways to the various treatments. However, this is rarely
possible for various practical and ethical reasons.
3.3.1 Cross-over designs
In certain special circumstances, it may be possible for you to
apply several treatments sequentially to each subject. If you
can apply these in different orders, you require a cross-over
design: you randomly assign each subject to receive some
given sequence of different treatments in successive periods.
Often, you can use some form of Latin square design to bal-
ance the numbers in the different sequences: each treatment
appears once in each period and in each sequence.
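A minimal sketch of one way to build such sequences, assuming the simplest (cyclic) Latin square; the text does not say which square to use, and the function names are hypothetical. Rows are sequences and columns are periods.

```python
import random


def cyclic_latin_square(treatments):
    """Each row is a treatment sequence; each column is a period."""
    k = len(treatments)
    return [[treatments[(row + col) % k] for col in range(k)]
            for row in range(k)]


def assign_sequences(subject_ids, treatments, seed=None):
    """Randomly assign subjects to the sequences, using each sequence
    as nearly equally often as possible."""
    ids = list(subject_ids)
    square = cyclic_latin_square(treatments)
    sequences = [square[i % len(square)] for i in range(len(ids))]
    random.Random(seed).shuffle(sequences)
    return dict(zip(ids, sequences))


print(cyclic_latin_square(["A", "B", "C"]))
# [['A', 'B', 'C'], ['B', 'C', 'A'], ['C', 'A', 'B']]
print(assign_sequences(range(6), ["A", "B", "C"], seed=0))
```

A cyclic square balances treatments over periods and sequences, but each treatment is always preceded by the same other treatment, so it gives no protection against carry-over effects; other squares can be chosen when that matters.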
Note that this is quite different from applying all treat-
ments to all subjects, but always in the same order. Then,
treatment effects cannot be distinguished from a temporal
change.
Obviously, you can only apply such designs in situations
where:
• the condition under study is chronic, with no trend in time;
• the measured response is recurrent;
• the condition is not fundamentally modified by a treatment.
These designs are really the only possibility when subjects
may differ in their responses to the various treatments, that
is, when there is the possibility of a subject–treatment
interaction. They are more useful for one-shot treatments than
for long-term ones.
Cross-over designs have the major advantage that you can
compare the different treatments on the same subjects. This
generally means that the sample size can be smaller for the
same precision.
The major disadvantages are as follows:
• The effects of earlier treatments may carry over to modify
  responses under later treatments.
• Drop-out may occur between treatments so that no within-subject
  comparison is possible.
• Systematic differences may appear between periods, perhaps
  simply due to the learning effect of being treated in
  the first period.
To counter carryover, either plan an adequate wash-out pe-
riod between the end of one treatment and the beginning of
the next or allow a ‘burn-in’ period at the beginning of each
treatment, during which you do not measure (or, at least, do
not analyse) the response values. In certain contexts, the ﬁrst
possibility will present ethical problems, if you have to with-
hold treatment completely during that period.
If period effects are present, variability among individuals
has been replaced by variability over time within individuals.
Many different types of speciﬁc cross-over designs exist.
These depend on the number of different treatments and peri-
ods, on whether each subject receives each treatment in turn,
and so on.
3.3.2 Matching
If a cross-over trial is not possible, it may still be possible for
you to ﬁnd pairs of subjects with closely similar character-
istics and to allocate treatment and control randomly to the
two members of the matched pair. The major problem here
is to have a large enough pool of subjects from which you
can draw ones with the appropriate characteristics simulta-
neously. If matching is made too precise, it will be difﬁcult
to obtain two suitable subjects at the same time.
One advantage of this design is that it may permit you to
detect what kinds of subjects are most susceptible to ben-
eﬁt from each treatment. For example, for the majority of
subjects, response differences to treatment may be negligi-
ble but, for a minority, one treatment may be vital.
3.3.3 Parallel designs
In a parallel design, you randomize subjects to different treat-
ments; they then stay on them for the whole trial. Most such
trials involve a new treatment and either a control or stan-
dard treatment. Randomization to the treatments allows you
to make the assumption that the groups are comparable in all
relevant aspects. This is the standard and most common type
of trial.
Factorial designs
Treatments should differ qualitatively in single, speciﬁcally
identiﬁable, ways so that your interpretation of response dif-
ferences will be clear, with unique explanations. In other
words, more complex treatments should be split up into sev-
eral simple factors. Remember that an effect generally has
several causes. In addition, it may not act the same way in
all situations.
If you are to study several types of intervention, the fac-
tors, it is almost always preferable to use them simultane-
ously in a factorial design that includes all possible combi-
nations. The different treatments within a factor are often
called levels. Examples of pairs of treatments would include
new textbooks and new teaching practices, or two comple-
mentary types of medication. Thus, in the simplest case,
for example, with two such treatments, each at two levels
(control and active), you would have four groups: (1) con-
trol only (perhaps two placebos); (2) ﬁrst type of treatment
only, perhaps with a placebo; (3) second type of treatment
only, perhaps with a placebo; and (4) both types of treat-
ment. Factorial experiments may also include combinations
of treatment and classiﬁcation factors.
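To make the idea of 'all possible combinations' concrete, here is a small sketch that lists the treatment groups of a factorial design. The factor names and level labels are invented for illustration, loosely following the textbook and teaching example.

```python
from itertools import product

# Hypothetical factors, each with a control and an active level.
factors = {
    "textbook": ["control", "new"],
    "teaching": ["control", "new"],
}

# Every combination of one level from each factor defines one treatment group.
groups = [dict(zip(factors, levels)) for levels in product(*factors.values())]

for number, group in enumerate(groups, start=1):
    print(number, group)
```

Adding a third two-level factor to the dictionary would double the number of groups, which is one reason to keep the number of factors small.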
Factorial designs have several advantages: you can
• obtain more information from a smaller number of subjects,
  that is, higher precision at lower cost;
• study the interaction among types of treatment with respect
  to the response;
• perhaps extend the validity of your conclusions by the inclusion
  of a classification factor that increases the variability
  of the conditions of application;
• check on whether treatments might have different effects
  under different conditions, a causal factor interacting with
  a classification factor.
Two factors are said to interact, in the statistical sense, if the
effect of one factor on the response changes depending on
what value the other has. If there is no interaction, the differ-
ence in response between two levels of one factor is the same
no matter what level the other has. Thus, for the statistician,
the term ‘interaction’ does not have the usual common-sense
meaning.
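A small numerical sketch may help with this definition. For two factors, each at two levels, the interaction can be summarized as the difference between the effect of the first factor when the second is active and its effect when the second is at control. The function name and the mean responses below are invented for illustration.

```python
def interaction_contrast(means):
    """Difference between the effect of factor A at the two levels of factor B.

    means[(a, b)] is the mean response with factor A at level a and
    factor B at level b (0 = control, 1 = active).
    """
    effect_of_a_without_b = means[(1, 0)] - means[(0, 0)]
    effect_of_a_with_b = means[(1, 1)] - means[(0, 1)]
    return effect_of_a_with_b - effect_of_a_without_b

# Invented mean responses, for illustration only.
additive = {(0, 0): 10.0, (1, 0): 14.0, (0, 1): 13.0, (1, 1): 17.0}
interacting = {(0, 0): 10.0, (1, 0): 14.0, (0, 1): 13.0, (1, 1): 22.0}

print(interaction_contrast(additive))      # A adds 4 whatever the level of B
print(interaction_contrast(interacting))   # A adds 4 without B but 9 with B
```

In real data, the contrast will rarely be exactly zero, so it must be judged against its precision.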
These considerations for factorial designs do not imply
that you should plan the biggest and most complex trial pos-
sible.
• Take care, especially at the beginning of an investigation,
  not to commit yourself to one big trial; small preliminary
  trials may indicate the proper line of attack.
• For a real understanding of many problems, a series of
  small trials, each one designed in the light of previous
  results, is more appropriate than one large trial.
• Large and complex trials are difficult to organize.
In most cases, you should consider no more than two or three
factors.
Factor levels may not necessarily be qualitatively distinct,
especially in factorial designs. We may distinguish:
• fixed qualitative factors, each level being of intrinsic interest;
• ranked categories, such as slight, moderate, and severe;
• quantitative factors, with levels fixed at certain arbitrary
  values, such as doses of a drug;
• factor levels that are assumed to be ‘representative’ of
  some larger population of possible levels, although rarely
  chosen at random.
With quantitative factors, you can study changes as a re-
sponse curve (or surface if in several dimensions). Because
only a few levels are used, analysis will generally require you
to make the assumption of some smooth functional statistical
model connecting them together.
Sequential trials
When it is imperative not to continue an inferior treatment
(once this becomes known), you may use a (group) sequen-
tial trial. You continue the study only until it is clear that
one treatment is superior to another. To do this, you must
analyse the data after the results of each subject or group of
subjects become available. For this to be possible, subjects
must enter the trial sequentially and results must be available
relatively quickly after administering the intervention.
Such trials have the additional advantage that, on aver-
age, you will require a substantially smaller number of sub-
jects than in a ﬁxed sample size trial, particularly if there is
a big difference in response between the treatments. How-
ever, when designing such a trial, you should realize that
the conventional probability levels for signiﬁcance tests and
conﬁdence intervals are much more difﬁcult to calculate than
when the size is ﬁxed in advance.
Equivalence trials
Trials to determine if a new treatment is equivalent to the ex-
isting standard one are particularly difﬁcult to design because
of the danger of detecting no difference simply because the
study was too small. Those judging the results of such a trial
must be convinced that it was properly conducted.
3.3.4 Sample size
Sample size calculations are very similar to those for sample
surveys. You should ﬁrst (re)read Section 2.6. Maintaining
a trial at its minimum size necessary to provide the required
precision is here even more imperative than for surveys be-
cause of the risks involved in imposing interventions. The
sequential trials mentioned above are one means to this end.
If you cannot use one, you will need to calculate the minimal
sample size.
The power of a trial is its ability to detect a difference of
interest, due to an intervention, if it really exists. This is pri-
marily a function of sample size. Too small a trial is a waste
of resources and exposes subjects to useless risks. In equiv-
alence trials, too small a sample will ensure that treatments
appear to be equivalent!
Consider the simple case of two treatments, say active and
control, and equal numbers in each group. The sample size
is calculated for differences in response at the primary end-
point.
The ﬁrst step in a sample size calculation is to specify
the smallest difference in response among treatments, δ, that
would be of importance. This is not a statistical, but a sci-
entiﬁc question. Then, you can apply the formula of Section
2.6, but where you replace the 16 by 64 to obtain precision
equivalent to the two standard errors used there. In other
words, you multiply by four all values that you would have
calculated for the examples without covariates given above
for sample surveys.
As an example, for binary responses, the value of π will
be the expected average probability under the two treatments.
Suppose that you want to detect a difference in probability of
response of 10%, between say 75% under control and 85%
under active treatment. The average is 80% and the sample
size is calculated to be
$$ n = \frac{64 \times 0.8 \times (1 - 0.8)}{0.1^{2}} = 1024 $$
that is, 512 in each treatment group. You can use similar
procedures for counts and measurements, as in Section 2.6.
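The calculation for a binary response can be sketched as follows. The function name and arguments are assumptions for illustration; the multiplier of 64, the average probability, and the difference δ are those of the worked example. Exact fractions are used only so that rounding up to a whole subject is not disturbed by floating-point error.

```python
from fractions import Fraction
import math

def trial_sample_size(p_control, p_active, multiplier=64):
    """Total sample size for two equal groups with a binary response."""
    # Convert via str() so that 0.75 becomes exactly 3/4, and so on.
    p0, p1 = Fraction(str(p_control)), Fraction(str(p_active))
    pi = (p0 + p1) / 2          # expected average probability
    delta = p1 - p0             # smallest difference of importance
    return math.ceil(multiplier * pi * (1 - pi) / delta ** 2)

total = trial_sample_size(0.75, 0.85)
per_group = total // 2
print(total, per_group)
```

Halving δ multiplies the required size by four, which is why the choice of the smallest important difference deserves so much care.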
In more complex trials, one particular contrast between
two treatments may be of particular interest and you might
use this to calculate the required sample size. Otherwise, you
will require more complex techniques. In a sequential trial,
the standard error calculated at a ﬁrst stage will generally
give you a good idea of how many subjects are still required
to attain the desired precision.
It is particularly important, when you begin a trial, to en-
sure that you foresee a large enough sample size, that suf-
ﬁcient subjects will be available to fulﬁl that goal, and that
your financial support is adequate. Do not undertake a trial
that will result in too few subjects to meet scientiﬁc require-
ments of precision.
When subjects enter into a trial sequentially, as they become
available, you must estimate the accrual rate accurately in
order to ensure that enough people will be available in the
time planned for the trial. The estimated rate will often be
too high, mainly because of:
• over-enthusiasm of the research workers planning the trial;
• ineligibility of some people;
• refusals;
• loss of interest over time if the study is too long.
If you cannot meet the sample size in the allotted time, you
will need to:
• increase the accrual rate, for example by using more centres
  or changing eligibility criteria;
• reduce precision;
• increase the time period; or
• stop the trial, if it will not be possible to detect relevant
  differences among treatments.
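A rough sketch of the arithmetic behind accrual planning is given below. All names and figures are hypothetical; the point is only that allowing for ineligibility and refusals can lengthen the recruitment period considerably compared with an optimistic estimate.

```python
import math

def months_to_recruit(sample_size, screened_per_month,
                      eligible_fraction, consent_fraction):
    """Months needed to reach the sample size at a steady accrual rate."""
    accrual_per_month = screened_per_month * eligible_fraction * consent_fraction
    return math.ceil(sample_size / accrual_per_month)

# Invented figures: 100 people seen per month in total.
optimistic = months_to_recruit(1024, 100, 1.0, 1.0)   # everyone eligible and willing
realistic = months_to_recruit(1024, 100, 0.6, 0.5)    # 60% eligible, half consent

print(optimistic, realistic)
```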
Many trials that ﬁnd no evidence of treatment differences are
too small to reach a reliable conclusion.
3.4 Organization
3.4.1 Clinical trials
In medical research, experiments are usually clinical trials
to evaluate the effectiveness of some treatment(s), such as
drugs, surgery, physiotherapy, diet, health education, and so
on. These trials thus involve patients with a given medi-
cal condition and are designed to provide information about
the most appropriate treatment for future patients having the
same condition. The vast majority of clinical trials are con-
cerned with evaluating some speciﬁc drug, most often con-
ducted, or at least ﬁnanced, by a pharmaceutical company.
A complete study for a new therapy requires evaluation
of safety, efficacy, and quality of life. Especially for a drug
that is to be commercialized, preclinical research begins with
animal experiments testing for safety. Then, human experi-
mentation can generally be classiﬁed into four phases:
1. Phase I: initial study of pharmacology and toxicity, usu-
ally with healthy subjects, to determine safety at various
doses, including side effects;
2. Phase II: small-scale, often non-comparative, clinical in-
vestigation on patients, to screen out ineffective drugs and
to determine dose and other characteristics of the therapy;
3. Phase III: full-scale evaluation of an apparently effective
treatment in its ﬁnal form, as compared to a control or to
standard treatment;
4. Phase IV: surveillance after approval for commercializa-
tion, monitoring long-term adverse effects.
The ﬁrst two phases involve tightly controlled scientiﬁc in-
vestigation of medical aspects, whereas the third is closer to
realistic administration of the therapy once commercialized.
The ﬁrst two are exploratory, providing hypotheses that can
be tested with the results of the third. The fourth may often
take the form of a sample survey, rather than a trial.
Phase II trials were traditionally often uncontrolled and
not blinded; this could bias the results, for example by the
enthusiasm of the investigators. Many early studies of this
type have suggested that the new treatment is highly effec-
tive, only for this apparent beneﬁt to disappear when more
carefully tested in Phase III. Thus the use of randomization
is increasing in Phase II studies.
3.4.2 Multi-centre trials
An experimental trial may be organized in one centre, a hos-
pital or school for example, or, more often, in a number of
centres. The latter will create an effect of clustering, as in
sample surveys. The advantage of a multi-centre trial, over
using just one centre, is that it includes much more variability
among subjects so that there is more chance of the results being
generally applicable. This should be true even though
the centres recruited will not be representative of centres in
general. Centres are chosen for reasons of cost, efﬁciency,
convenience, the research group’s reputation, and so on, but
rarely because they are typical.
The centres involved will usually vary considerably in
size, geography, subject recruitment, and other respects.
It will often be preferable to allow these variations to enter
the study, particularly if they are commonly met in practice,
rather than to maintain too high a degree of standardization,
difﬁcult to accept either during the trial or subsequently. If
different centres provide different results, then generalizabil-
ity is in question. This is important information to obtain.
Often, the choice will be between a large number of sub-
jects rapidly recruited in a number of centres with relatively
short follow-up and fewer subjects with a long follow-up.
The former design can present major administrative prob-
lems, including ensuring the constant use of the same pro-
cedures in all centres. The latter may suffer from changing
investigators, loss of interest, and so on.
The difﬁculties that may arise with multi-centre trials in-
clude:
• complex planning and administration;
• excessive expense;
• demotivation of investigators;
• non-uniform eligibility requirements for entry;
• problems with monitoring;
• greater risks of missing data, for example from lost forms;
• deviations from the protocol, including problems of non-compliance;
• lack of uniform quality control of data.
Carefully weigh these factors before undertaking such a trial.
3.4.3 Longitudinal trials
Most often, you must follow subjects over a sufﬁciently long
period for the treatment to have an effect. In this sense, all
trials are longitudinal. However, in many trials, a speciﬁc
end-point is deﬁned: the ﬁnal response of interest. Because
this occurs at a given point in time, the study is fundamen-
tally cross-sectional.
In other cases, you will measure responses, at ﬁxed or
convenient times, over a considerable period. Thus, you will
have a sequence of response values for each subject. This is a
true longitudinal study. Notice, however, that randomization
to treatment only assures comparability among groups at the
moment of allocation. As the subjects evolve over time, their
characteristics will be modiﬁed, depending on their previous
history, and hence may no longer be comparable.
3.5 Summary
An experimental trial is an intervention involving human be-
ings. The major advantage of such a trial is that you can
empirically study causality. However, because you must use
consenting volunteers, the results will often not be easily
generalizable to more realistic situations outside of the con-
trolled context of the trial.
You can only identify the effect of a new treatment by
comparing it with something else, usually a placebo or the
standard treatment. The only objective way to ensure that the
groups receiving the different treatments (including placebo)
are similar is to assign subjects randomly to them. To avoid
biases in reactions of both subjects and observers, blind the
trial whenever possible, so that those directly involved do not
know who has what treatment.
The primary end-point is the response of main interest to
be observed at some well-deﬁned time. Baseline measure-
ments will also usually be useful.
Missing values, including drop-outs, will be of concern.
One problem speciﬁc to trials is compliance: are the subjects
actually taking the treatment assigned to them?
Because you are applying an intervention, ethical issues
are crucial. All subjects must give informed consent. You
must use procedures, such as interim analysis, to stop the
trial as soon as possible if it is clear that one treatment (in-
cluding the placebo) is superior so that no subjects continue
longer than necessary on an inferior one.
Cross-over and parallel designs are the principal ones used.
The former have the advantage that the various treatments
are compared on the same people, but this is not possible in
many cases.
In medical research, clinical trials have a central place in
the development of new procedures. Both safety and efﬁ-
cacy must be determined in a series of phases from initial
exploration to ﬁnal conﬁrmation and long-term surveillance.
Both multi-centre and longitudinal trials have their own
special complications.
4
Data analysis
4.1 Data handling
Efﬁcient data management is essential to minimize errors.
This usually means that you should employ specialized staff,
neither the investigators nor the statisticians, for this task.
Your basic goals will be to have data that are:
• complete;
• accurate;
• uniform or properly standardized;
• coherent.
In many ways, achieving these goals will provide you with a
check on the actual data collection process.
4.1.1 Data entry
Once you terminate data collection, your ﬁrst step will be to
check that all forms have been returned. Also visually check
them for any obvious errors and omissions, and seek the nec-
essary rectiﬁcations from the ﬁeld investigators. Then, have
the data transferred to magnetic form for analysis by com-
puter. The main steps are:
1. coding the results in a suitable form for computer treat-
ment (you may have incorporated this stage into the orig-
inal questionnaire);
2. typing the coded values into the computer (in certain cases,
this may also be possible electronically, as with bar codes
and light pens);
3. editing the resulting ﬁle(s) to remove errors.
This can be one of the most costly and time-consuming oper-
ations in a study. It is the one where errors are quite possible,
and where you can most easily avoid them, with proper care.
The main sources of error are:
• recording, as when an instrument is misread;
• deliberate, due to falsification either by the investigator or
  by the respondent;
• transcription, when observations are copied;
• typing (during data entry).
You should have the data entered using some database
management system; some of the major statistical packages
contain one or you can use a general-purpose one. If the
mass of data is of a reasonable size, you should have an entry
grid developed within the database system; this will show the
coder, at all times, what variable is currently being entered
and prevent entry of impossible values. Thus, if a question
has a binary response, such as yes/no, that is to be coded as
1 or 2 (plus a missing value code), no other values will be
accepted. This removes one of the most common sources of
fundamental errors in the database.
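The principle of an entry grid can be sketched in a few lines. This is not the interface of any particular database system; the variable names, the codes, and the choice of 9 as the missing value code are assumptions for illustration.

```python
MISSING = 9  # assumed missing value code for these one-digit variables

# Hypothetical entry grid: variable name -> set of codes that may be entered.
ENTRY_GRID = {
    "smoker": {1, 2, MISSING},   # 1 = yes, 2 = no
    "sex": {1, 2, MISSING},      # 1 = female, 2 = male
}

def accept(variable, value):
    """Return the value if the grid allows it; otherwise refuse the entry."""
    if value not in ENTRY_GRID[variable]:
        raise ValueError(f"{value!r} is not a valid code for {variable!r}")
    return value

record = {name: accept(name, code) for name, code in [("smoker", 2), ("sex", 9)]}
```

A value such as 3 for `smoker` is refused at the moment of typing, when the form is still in front of the coder.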
If you are using several forms for each observational unit,
whether from different levels of clustering or longitudinally
over time, such a database system will also provide a con-
venient means of linking them together. Complications may
arise, especially in experimental trials and longitudinal stud-
ies, because:
• subjects have unequal amounts of data recorded;
• data on each subject accumulate over time.
In trials with staggered entry, earlier subjects will have more
data available than later ones at any particular moment. Care-
fully choose ways of handling these problems.
You can reduce errors if you use the same code for miss-
ing values in all variables, although this may not always be
possible. Some computer programs have a special code symbol
for this, but, if you use it, other software may not be able
to read the ﬁle. Most often the code chosen is a set of nines,
the number of digits corresponding to the maximum number
of digits occupied by the variable. The use of blanks to in-
dicate missing values is especially dangerous because much
statistical software cannot distinguish these from zero values.
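The convention of a set of nines, and the danger of blanks, can be illustrated as follows. The function names are hypothetical, and the sketch assumes that the run of nines is never a legitimate value for the variable concerned.

```python
def missing_code(width):
    """A run of nines as wide as the widest value the variable can take."""
    return int("9" * width)

def decode(field, width):
    """Turn one fixed-width text field into a number, or None if missing."""
    text = field.strip()
    if text == "":
        # A blank is refused rather than silently read as zero.
        raise ValueError("blank field: use the missing value code instead")
    value = int(text)
    return None if value == missing_code(width) else value

print(decode("042", 3))   # an ordinary value
print(decode("999", 3))   # the missing value code for a three-digit variable
```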
As explained in previous chapters, plan the ways in which
each variable will be coded for entry into the computer at the
stage of preparing the instruments for data collection. You
should have planned data collection in such a way that no
intermediate calculations are necessary until after the data
are in the computer. The basic types of coding are:
• counts;
• measurements, for which the units should be clearly specified;
• discrete categories, for which special coding will be required.
Let us consider each of these in turn.
Counts. These take integral values that cannot be negative.
There may be some upper limit to the reasonable values that
they can take.
Measurements. Almost all measurements are positive values.
Common errors involve:
• a misplaced decimal point;
• digit preference, whereby observers tend to round to, say,
  zero or five for the last digit of each number;
• confusion about the type of units – in an international
  study, feet or yards for lengths in one country and metres
  in another;
• mistakes in the size of the units – some values in millimetres
  and others in centimetres, or some in days and others
  in weeks.
The last can be difﬁcult to detect, especially if the units are
not mentioned on the data form.
In international studies, extreme care must be taken with
numerical dates; not all countries use the day–month–year
order. If years are only recorded to two digits, errors can
easily occur at the turn of the century.
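Both problems are easy to demonstrate. In this sketch, the date string is invented; the same characters are read under two assumed orders, and two further dates show how one piece of software (here Python's `%y` format code) decides the century of a two-digit year.

```python
from datetime import datetime

raw = "03/04/99"  # invented entry: is this 3 April or 4 March?

# The same characters give two different dates depending on the assumed order.
day_first = datetime.strptime(raw, "%d/%m/%y").date()
month_first = datetime.strptime(raw, "%m/%d/%y").date()

# Two-digit years must be expanded by some rule; Python's %y maps
# 69-99 to 1969-1999 and 00-68 to 2000-2068.
year_from_68 = datetime.strptime("01/01/68", "%d/%m/%y").year
year_from_69 = datetime.strptime("01/01/69", "%d/%m/%y").year

print(day_first, month_first, year_from_68, year_from_69)
```

Recording the year with four digits and the month by name on the form avoids both ambiguities.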
Discrete categories. Many statistical packages allow you
to code discrete categories by alphabetic or numerical means,
but some require all values to be numeric. Use of alphabetic
codes may restrict future analysis to the former packages, but
may reduce the risk of errors at data entry.
More general errors include:
• transposition of digits;
• repetition of the same value in two successive rows or
  columns, perhaps with the effect of displacing all subsequent
  values in that row or column;
• hidden time effects, whereby the value observed depends,
  for example, on the time of day or the time of year.
The safest, but most costly, way for you to ensure that data
are entered correctly is to do it twice with different people.
Then, a computer program can check for differences between
the two ﬁles. Any found must obviously be corrected! This
process, however, will only detect errors in the transfer of
data from the forms to the computer. It will not ﬁnd errors in
originally ﬁlling out the forms.
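A comparison of two independent entries might look like the following sketch. It assumes that both entries are comma-separated text with the same layout; the function name and the small data sets are invented for illustration.

```python
import csv
import io

def entry_differences(first_text, second_text):
    """List (row, column, first value, second value) wherever two entries differ."""
    first = list(csv.reader(io.StringIO(first_text)))
    second = list(csv.reader(io.StringIO(second_text)))
    if len(first) != len(second):
        raise ValueError("the two files do not have the same number of rows")
    differences = []
    for r, (row_a, row_b) in enumerate(zip(first, second), start=1):
        if len(row_a) != len(row_b):
            raise ValueError(f"row {r} has a different number of columns")
        for c, (a, b) in enumerate(zip(row_a, row_b), start=1):
            if a != b:
                differences.append((r, c, a, b))
    return differences

# Invented data: the second typist transposed the digits of one age.
entry_one = "id,age,weight\n1,34,70.5\n2,43,82.0\n"
entry_two = "id,age,weight\n1,34,70.5\n2,34,82.0\n"

for difference in entry_differences(entry_one, entry_two):
    print(difference)
```

The program can only say that the two entries disagree; someone must still go back to the form to find out which, if either, is right.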
It is usually not possible to know what is correct, so re-
strict attention to ensuring that the recorded values are plau-
sible. Nor can you expect to spot all errors, although hope-
fully you will ﬁnd the major ones. However, if there is no
reason to suspect that seemingly strange, but possible, val-
ues are wrong, you should not modify them. Such checking
should be carried out as rapidly as possible; the longer you
wait, the less chance you will have of being able to obtain
a correction. Further data checking and screening will be
discussed in Section 4.2.
The database system will then ﬁnally produce one or more
computer data ﬁles containing all of the information col-
lected. In simple cases, these will be of rectangular form,
with each line referring to one observational unit and each
column to one variable, the standard form that most statisti-
cal software requires.
Once the data are entered, store the original forms in a
safe place. Also make copies of all computer data ﬁles and
store them in a separate building, in case of ﬁre or theft. It is
wise to do this sequentially at all stages of data entry as well.
4.1.2 Computer treatment
Computer manipulation of data has both advantages and dis-
advantages. Among the advantages are that you can:
• quickly handle vast amounts of data;
• obtain results to high precision;
• try many and varied statistical techniques, some very complex;
• easily present data graphically;
• rapidly repeat analyses after making small changes or corrections;
• calculate new variables from those available.
On the other hand, some of the disadvantages are as follows:
• software may contain errors or, more often, poor or inadequately
  documented statistical techniques;
• with a wide variety of statistical methods available, you
  may choose the wrong ones;
• you may use the software without any understanding of
  what it is actually doing in any given analysis, the black
  box approach;
• if the data or the questions are erroneous, the software
  will still produce a seemingly plausible answer – garbage
  in, garbage out.
Massive computer analysis will never be a substitute for clear
thought.
Spreadsheets may be suitable for data management, but
are not designed for statistical analysis, often even yielding
incorrect results.
The criteria for choosing among statistical software in-
clude whether they have the following features:
• clear documentation;
• clear, self-explanatory output;
• flexibility for reading data from files;
• data management, such as editing;
• a reasonable maximum amount of data accepted;
• accuracy, precision, and speed;
• a wide choice of appropriate statistical methods;
• treatment of missing values;
• a variety of high-resolution graphics;
• good reputation and reasonable cost;
• good error handling;
• user friendliness and interactivity
(Altman, 1991, p. 110). The relative importance of these fac-
tors will depend on the variety of uses to which you will put
the software and on the sophistication of the users. Gener-
ally, it is preferable to use the same software for all analyses,
but, for certain problems, such as clustering, you may require
specialized software.
4.1.3 Data editing
The goal of data editing is to detect and, where possible, to
correct errors in recorded data resulting from data collection
and entry. This should involve not only cleaning the data, but
also obtaining information on how the errors arose in the ﬁrst
place in order to improve the process during future studies.
In a sample survey, editing may account for 20–40% of
the total costs, including computer hardware and software,
salaries, and ﬁeld expenses. Other costs upon which you will
ﬁnd it more difﬁcult to place a monetary value include:
• ill will due to the additional burden on respondents;
• lack of confidence in data quality;
• loss of timeliness of the results due to delays.
Excessive editing can often be counter-productive; you cannot
guarantee high quality simply by increasing the number
of checks. This may hide problem areas instead of revealing
them.
Data errors that you can most easily correct by editing are
those that would be recognizable to a user of the individual
data records who has no supplementary knowledge about
each given unit. The suspicious items are more difficult.
You can judge the latter in relation to their influence
on the estimates (Section 4.4.1) to be calculated. Begin with
the most extreme suspicious values and stop verifying when
further corrections have little effect on the estimates.
The focus of recontacting respondents should be to ac-
quire knowledge of the respondents’ problems and causes of
errors rather than just to determine that a suspicious value is
wrong and to ﬁnd a more accurate value. This will lead to
improvements in your future study designs.
4.2 Descriptive statistics
4.2.1 Univariate statistics
The ﬁrst step in any data analysis should be to produce the
simple descriptive statistics for each variable in the database.
For categorical data, provide the frequencies and percentages
in each category. You may supplement these by histograms
that display this information graphically. If count data in-
volve only a few small values, you can use the same methods
for them.
For large counts and for measurements, calculate the mean,
maximum, minimum, and standard deviation. It may also be
useful to categorize the values and to produce percentages
and histograms, as above.
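These first summaries require very little computation, as the following sketch shows. The function names and the data are invented; any statistical package will produce the same quantities directly.

```python
from collections import Counter
import statistics

def categorical_summary(values):
    """Frequency and percentage in each category."""
    counts = Counter(values)
    total = len(values)
    return {category: (n, 100 * n / total) for category, n in counts.items()}

def numeric_summary(values):
    """Mean, extremes, and standard deviation of counts or measurements."""
    return {
        "mean": statistics.mean(values),
        "minimum": min(values),
        "maximum": max(values),
        "sd": statistics.stdev(values),   # sample standard deviation
    }

answers = ["yes", "no", "yes", "yes"]     # invented categorical responses
ages = [23, 31, 27, 35, 29]               # invented measurements

print(categorical_summary(answers))
print(numeric_summary(ages))
```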
Carefully scrutinize all of your descriptive statistics for
anomalies. This is the beginning of the second stage of data
checking. If you have used a proper entry grid in a database
system, no impossible values should be present. Then, any
anomalies might consist of unexpectedly large or small per-
centages in certain categories, unreasonable maxima or min-
ima, and so on, called outliers. Any problems that you detect
must be traced back to the original forms, or further if neces-
sary, in order to make the appropriate corrections. Again,
you should only change values that you are certain to be
wrong. If an outlier is correct, it may indicate an anomalous
individual, perhaps one who is not actually in the eligible
population or one who is of special scientiﬁc interest.
4.2.2 Cross-classiﬁcations
Once you are satisﬁed that the univariate statistics are rea-
sonable, you can consider the relationships between pairs of
variables. For even a small number of variables, this will
yield an enormous number of combinations; you will usu-
ally not be able to study all of them in detail. Then, select
the most crucial and informative ones, especially those re-
lated to the main response variable(s).
For categorical data, two-way frequency tables, or contin-
gency tables, are generally most useful. Most statistical soft-
ware will produce percentages for both rows and columns.
When numerical quantities are involved, whether counts or
measurements, scatter-plots are often most useful. Occa-
sionally, non-parametric methods may be helpful in drawing
smoothed lines through the data.
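For two categorical variables, a two-way table and its row percentages can be sketched as follows. The function names and the six invented observations are for illustration only.

```python
from collections import Counter

def two_way_table(rows, columns):
    """Count each (row category, column category) pair."""
    return Counter(zip(rows, columns))

def row_percentages(table):
    """Express each cell as a percentage of its row total."""
    row_totals = Counter()
    for (r, _), n in table.items():
        row_totals[r] += n
    return {(r, c): 100 * n / row_totals[r] for (r, c), n in table.items()}

# Invented observations: treatment group and response for six subjects.
treatment = ["active", "active", "control", "control", "active", "control"]
response = ["yes", "yes", "no", "yes", "no", "no"]

table = two_way_table(treatment, response)
percentages = row_percentages(table)

for cell in sorted(table):
    print(cell, table[cell], round(percentages[cell], 1))
```

Column percentages follow in the same way by totalling over the second member of each pair instead of the first.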
Inspection of these results will provide logical checks, re-
vealing impossible combinations of values, something that
you could not detect by the previous methods. Thus, for ex-
ample, if you consider age and year in school, three-year-
olds in grade ﬁve would be suspect. In the same way, the
number of previous pregnancies should be undeﬁned for all
men. Such inspection may also reveal pairs of values that are
individually plausible but impossible in combination.
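Logical checks of this kind are simple rules applied to each record. In the sketch below, the field names and the rule relating age to school grade (age at least grade plus four) are assumptions chosen only to illustrate the idea; appropriate rules depend on the study.

```python
def logical_errors(record):
    """Return descriptions of impossible combinations in one record."""
    problems = []
    # Assumed rule: a child in grade g should be at least g + 4 years old.
    if record["grade"] is not None and record["age"] < record["grade"] + 4:
        problems.append("age too low for school grade")
    # Previous pregnancies should be undefined (None) for all men.
    if record["sex"] == "male" and record["pregnancies"] is not None:
        problems.append("pregnancies recorded for a man")
    return problems

# Invented records.
records = [
    {"age": 3, "grade": 5, "sex": "male", "pregnancies": None},
    {"age": 30, "grade": None, "sex": "male", "pregnancies": 2},
    {"age": 10, "grade": 5, "sex": "female", "pregnancies": None},
]

for number, record in enumerate(records, start=1):
    print(number, logical_errors(record))
```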
If you have recorded a series of values, say responses over
time, on each individual, plot them, as proﬁles, to check that
they vary in an acceptable way. For example, height of chil-
dren should gradually increase, with no decreases.
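Besides plotting, you can screen such profiles automatically. This sketch flags any child whose recorded height falls between successive visits; the function name and the measurements are invented.

```python
def decreases(profile):
    """Return the positions at which a series falls below its previous value."""
    return [i for i in range(1, len(profile)) if profile[i] < profile[i - 1]]

# Invented heights in centimetres at three successive visits.
heights_cm = {
    "child 1": [110.0, 112.5, 115.0],
    "child 2": [121.0, 112.0, 123.5],   # second value looks like transposed digits
}

suspect = {child: decreases(series)
           for child, series in heights_cm.items() if decreases(series)}
print(suspect)
```

A flagged profile is only a candidate for checking against the original form, not proof of an error.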
In experimental trials, you should make checks on the de-
gree of equality among randomized groups, especially if they
are small. However, testing (Section 4.4) if departures from
equivalence are due to chance is irrelevant, unless you sus-
pect errors in the randomization process itself, because you
know that allocation was random, and random departures
will be present.
Study of these simple descriptive statistics also enables
you to gain a ﬁrst familiarity with the structure of the data.
They will provide a basis for much of the presentation of the
ﬁnal report; however, you will interpret them in the light of
the more sophisticated analyses that you have performed but
that many readers may not be able to understand.
4.3 Role of statistical models
The objectives of a study should indicate a few main analyses
of primary interest. This will be particularly so in experimen-
tal trials. Nevertheless, exploratory inspection of the data
will generally provide further important information that you
should not neglect. However, this is hypothesis generating,
not conﬁrmatory analysis; the latter will require new data.
Any data collected contain a mass of information. The
problem is for you to extract that part of it that is relevant to
the questions to be answered by your study, in the simplest
and most understandable way possible. This essentially in-
volves checking for pertinent patterns and anomalies in the
data. This is a basic role of statistical models: to simplify
reality in a reasonable and useful way, a way that you can
empirically check with the data. No model is ever ‘true’, but
some models are more useful than others for given data and
questions.
Models can serve many roles. They can provide:
• a parsimonious description or summary of results,
  highlighting the important features;
• a basis for prediction of future observations;
• biological or social insight into the processes under study;
• a test of a prior theoretical relationship;
• comparisons of results from different studies;
• measures of precision of quantities of interest.
You can think of models as smoothing the irregularities in
the data in a way that makes patterns clearer. The danger, of
course, is that a pattern that you isolate in this way is a ran-
dom artefact of the given data set, corresponding to nothing
reproducible if someone were to do a second such study, and
hence to nothing in the population under study. The role of
model selection, through empirical checking with the data,
and of measures of precision is to reduce this risk and to
quantify it; you can never eliminate it entirely.
The basic steps in exploratory model building will usually
include:
1. studying the pertinent descriptive statistics, as described
above, in order to become familiar with the data;
2. developing a reasonable model from the results of step 1
and from previous knowledge;
3. ﬁtting the model to the data;
4. checking the goodness of ﬁt of the model;
5. going back to step 2, if necessary;
6. using the model to draw appropriate conclusions.
The purpose of modelling is not to get the best ﬁt to the data,
but to construct a model that is not only supported by the
data but also consistent with previous knowledge (unless that
is being placed in question), including earlier empirical re-
search, and that also has a good chance of describing future
observations reasonably well.
Before looking at how you can actually ﬁt models to em-
pirical data, let us consider some principles of model con-
struction. In constructing a model, the response and the ex-
planatory variables play very different roles. Let us consider
them in turn in the next two subsections.
4.3.1 Choice of probability distribution
The main response variable that you will study should be that
speciﬁed in the protocol. In most cases it is directly observ-
able, but in some experimental trials it may be constructed –
for example, the difference between the response at baseline,
before the intervention began, and the ﬁnal response after a
certain length of treatment.
In statistical models, we consider the response variable
to arise at random in a certain sense: we cannot predict in
advance exactly what response each respondent will give so
that random ﬂuctuations are not reproducible. This variabil-
ity arises primarily from differences among human beings,
in contrast to studies in physics or chemistry where measure-
ment error is predominant.
You can then represent the frequencies of the different
possible responses by a histogram. This is so even if you
are making quantitative measurements, because you can only
record them to some ﬁnite precision; hence, the observed
values are actually categorical.
Unless you have made a very large number of observa-
tions, histograms will generally be rather irregular, with a
bumpy appearance. If the response consists of unordered cat-
egories, such as a list of career choices or of types of illness,
the shape of the histogram has little meaning: you can ar-
bitrarily modify it by changing the order of the categories.
Then, you have to construct models directly in terms of the
probabilities of respondents falling into the different cate-
gories. The most common case is a binary response, but a
number of models are also available for nominal and ordinal
response variables.
If the response is a count or a measurement, you can go
considerably further. The shape of the histogram now has a
meaning. A probability distribution is a mathematical func-
tion that smoothes the histogram in an informative way, while
retaining, and highlighting, the basic shape. For example,
everyone is familiar with the smooth bell-shaped form of the
Gaussian or normal distribution.
An added advantage of such smoothing is that different
distributions correspond to different ways in which the data
might have been generated. Thus, the normal distribution
arises when a large number of unknown small effects add
together to generate the response, as for example with genet-
ically inherited traits and multiple environmental inﬂuences.
Most probability distributions have one or two unknown
and unobservable parameters (not to be confused with the
observable parameters of the scientist, our explanatory vari-
ables). For example, the normal distribution has the mean
and the variance. These parameters allow ﬂexibility so that
you can adjust the distribution to ﬁt as closely as possible to
the empirical histogram. Most distributions have a parameter
that indicates the size of the responses, generally the mean.
Some have a second parameter related to the shape of the
histogram. For the normal distribution, which is symmet-
ric, this is the variance. Most distributions, however, are not
symmetric and will have a different second parameter, if they
have one at all.
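The idea of a distribution as a smoothed histogram can be illustrated with a short Python sketch. The data values and the bin width are invented for the example, and the normal distribution is used only because it is the familiar case, not because it is the right choice for your data.

```python
from collections import Counter
from statistics import NormalDist

# Hypothetical measurements (say, heights in cm); the values are invented.
data = [158, 162, 165, 165, 167, 168, 170, 170,
        171, 173, 175, 176, 178, 181, 185]

# Empirical histogram: count the observations in bins of width 5.
width = 5
observed = Counter((x // width) * width for x in data)

# Estimate the two parameters of the normal distribution from the sample
# (the mean and the standard deviation, the square root of the variance).
fitted = NormalDist.from_samples(data)

# Smoothed counterpart of each bar: expected count under the fitted curve.
expected = {
    lower: len(data) * (fitted.cdf(lower + width) - fitted.cdf(lower))
    for lower in sorted(observed)
}

for lower in expected:
    print(f"{lower}-{lower + width - 1}: observed {observed[lower]}, "
          f"fitted {expected[lower]:.1f}")
```

Comparing the two columns gives a rough impression of how closely the adjusted distribution follows the empirical histogram.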
In modern statistics, there is rarely any need for you to
transform the response variable to normality. (The major
exception would be if you have made quantitative measure-
ments on an inappropriate scale.) Transformations gener-
ally make the results extremely difﬁcult to understand: how
do you interpret the average of the square root of your re-
sponses? A multitude of distributions are available for non-
normal data, as is the software to perform analyses with them.
Only use transformations of response variables for scientiﬁ-
cally valid reasons.
Common probability distributions
Binomial distribution For binary responses, we require a
distribution that describes the only two possible events. Gen-
erally, the binomial distribution, with only one parameter, the
probability, say π, of the ﬁrst of the two events, is used.
Poisson distribution For counts, we require a distribution
to describe non-negative integers. Here, the Poisson distribution,
with one parameter, the mean number of events, say µ, is
commonly used.
Duration distributions A large number of asymmetric or
skewed distributions is available to describe durations until
some speciﬁed event, such as survival (before death). Often
a survival curve is fitted, based on the Kaplan–Meier estimates.
In many cases, it is more convenient to study directly
the intensity or rate of occurrence of the event of interest in-
stead of the time until the event occurs. The most famous ex-
ample of this is the Cox proportional hazards model, widely
used in medical studies of survival.
Normal distribution This distribution is well known al-
though it is rarely encountered in practice, except as a con-
venient approximation.
The distributions mentioned are only the commonest ones.
For example, frequencies of binary events and counts may
show a large amount of variability, called overdispersion,
that must be taken into account with special distributions.
This is almost bound to occur, for example, if clustering is
used in a design, as in multi-stage surveys and multi-centre
trials.
In contrast to models based on the normal distribution, for
all other common distributions the variance cannot remain
constant when the mean changes. For example, as we saw
when calculating the sample size for counts in Section 2.6.2,
for the Poisson distribution, the mean is equal to the variance.
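Because the Poisson mean equals its variance, the ratio of the sample variance to the sample mean gives a rough first indication of overdispersion. The sketch below assumes invented counts and a hypothetical helper, `dispersion_index`; it is an informal check, not a formal test.

```python
from statistics import mean, variance

def dispersion_index(counts):
    """Sample variance divided by sample mean; about 1 for Poisson data."""
    return variance(counts) / mean(counts)

# Hypothetical counts of events per subject; one subject has many events.
counts = [0, 1, 1, 2, 2, 2, 3, 3, 4, 12]

index = dispersion_index(counts)
print(f"mean = {mean(counts)}, variance = {variance(counts):.2f}, "
      f"index = {index:.2f}")
```

A value well above one suggests more variability than the Poisson distribution allows, so that a distribution accommodating overdispersion may be needed.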
4.3.2 Regression models
The probability distribution describes the random variabil-
ity in the response variable. However, in a study, you will
usually be primarily concerned with systematic changes in
response under certain conditions, the explanatory variables.
You can translate this into a statistical model by looking at
how the probability distribution of the response, or more ex-
actly the parameters in it, change under these conditions.
As usual, you generally must make simplifying assump-
tions. Thus, in many circumstances, you may reasonably as-
sume that only the mean of the distribution changes with the
conditions of interest. You can take the basic shape, for ex-
ample as indicated by the variance, to remain constant under
all conditions.
A second simplifying assumption is more peculiar to the
statistician, not being directly relevant to the scientiﬁc en-
deavour. The way in which the mean varies with the con-
ditions described by the explanatory variables is taken to
be linear in the unknown parameters. In contrast, scien-
tists are interested in the (non-)linearity of responses with
respect to the explanatory variables (their parameters). The
statisticians’ linearity is an old historical assumption that was
necessary to facilitate computation, and that is no longer re-
quired with modern computing power. Unfortunately, most
software packages do not meet such modern criteria.
When you combine these conditions, you will obtain a
standard (multiple) linear regression model, whereby some
function of the mean changes with the conditions:
$$g(\mu_i) = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots \qquad (4.1)$$
where $\mu_i$ is the mean for the $i$th subject, $x_{ij}$ is the
observation of the $j$th explanatory variable for that subject, and
$\beta_j$ is the corresponding unknown parameter, the regression
coefﬁcient, to be estimated. This model function, that com-
bines some probability distribution with a linear regression,
has come to be known as a generalized linear model.
Notice that the function, g(·), is a transformation of the
mean, not of the observations, so that it does not produce the
difﬁculties of interpretation discussed above. Your choice for
this function of the mean generally will depend on the type
of response that you have observed. Common possibilities
include:
• binomial distribution, log odds or logit:
  $g(\mu) = \log[\mu/(n - \mu)]$ where $\mu = n\pi$;
• Poisson distribution, logarithm: $g(\mu) = \log(\mu)$;
• duration distributions, logarithm: $g(\mu) = \log(\mu)$;
• normal distribution, identity: $g(\mu) = \mu$.
The logarithm is a particularly important transformation be-
cause it allows you to compare means as ratios instead of
as differences; effects are multiplicative instead of additive.
You can then study relative, instead of absolute, differences.
(Note that a logarithmic transformation of the observed re-
sponses does not have this simple interpretation.) In linear
models, the logarithm also ensures that the mean cannot be
negative, often an important requirement. However, a num-
ber of other functions are also possible and are fairly widely
used.
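The following sketch shows the logit and its inverse, and why a coefficient on the logarithmic scale is read as a ratio of means. The function names and the coefficient values `b0` and `b1` are hypothetical, chosen only for illustration.

```python
import math

def logit(mu, n=1):
    """Log odds: g(mu) = log[mu / (n - mu)], with mu = n * pi."""
    return math.log(mu / (n - mu))

def inv_logit(eta, n=1):
    """Inverse of the logit: recover the mean from the transformed scale."""
    return n / (1 + math.exp(-eta))

# Log link with one explanatory variable: log(mu) = b0 + b1 * x.
b0, b1 = 0.5, 0.3  # hypothetical regression coefficients

mu_x0 = math.exp(b0 + b1 * 0)  # mean when x = 0
mu_x1 = math.exp(b0 + b1 * 1)  # mean when x = 1

# A one-unit change in x multiplies the mean by exp(b1), whatever b0 is.
ratio = mu_x1 / mu_x0
print(f"ratio of means = {ratio:.3f}, exp(b1) = {math.exp(b1):.3f}")
```

Because the mean is obtained by exponentiating, it is positive for any values of the coefficients.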
For a binary response, the logit transformation yields lo-
gistic regression; this is probably the most widely used re-
gression model of all. It is noteworthy that this is the only
regression model that can provide the correct estimates of
regression coefﬁcients in a case–control study.
The Poisson distribution with a logarithmic transforma-
tion is the basis of log linear models for categorical data.
Duration data, such as survival times, present special dif-
ﬁculties that are now widely known. In particular, many ob-
servations may be censored, that is, incomplete in that the
event did not occur before observation had to stop so that the
duration is only known to be at least a certain length. These
are not missing values or drop-outs and provide essential in-
formation about the longest durations. All standard software
packages handle such data.
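To see how censored observations contribute information, here is a minimal sketch of the Kaplan–Meier estimate in Python. The function name, its arguments, and the durations are assumptions made for the example; in practice you would rely on established software.

```python
def kaplan_meier(times, observed):
    """Return [(time, estimated survival probability)] at each event time.

    times    -- duration recorded for each subject
    observed -- True if the event occurred, False if the duration is censored
    """
    survival = 1.0
    curve = []
    event_times = sorted({t for t, d in zip(times, observed) if d})
    for t in event_times:
        # Censored subjects still count as at risk up to their censoring time.
        at_risk = sum(1 for u in times if u >= t)
        events = sum(1 for u, d in zip(times, observed) if d and u == t)
        survival *= 1 - events / at_risk
        curve.append((t, survival))
    return curve

# Hypothetical durations; False marks subjects still event-free when
# observation stopped.
times = [2, 3, 3, 5, 6, 8]
observed = [True, True, False, True, False, True]

curve = kaplan_meier(times, observed)
for t, s in curve:
    print(f"time {t}: survival {s:.3f}")
```

Dropping the two censored subjects would shrink the numbers at risk and so change the estimates, which is why censored durations must not be treated as missing values.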
More complex constructions are required for nominal and
ordinal response variables because the deﬁnition of a mean is
not so obvious. These are well documented in many books.
One simple way to handle ordinal variables is to assign a
known scale to the categories. Care should be taken with
this approach, however, because it will be misleading if the
scale is poorly chosen.
The $x_{ij}$s in the model need not be simply the observed
explanatory variables. In contrast to response variables, here
transformations can often be useful. Furthermore, you can
handle interactions among variables by including their mul-
tiplicative products in the regression. These allow one con-
dition to inﬂuence the response in different ways depending
on other conditions.
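A small sketch may make the role of the product term concrete. The coefficient values and the names `design_row` and `linear_predictor` are hypothetical.

```python
def design_row(x1, x2):
    """Intercept, two explanatory variables, and their product."""
    return [1.0, x1, x2, x1 * x2]

# Hypothetical coefficients: intercept, x1, x2, and the interaction.
beta = [2.0, 0.5, 1.0, -0.4]

def linear_predictor(x1, x2):
    """Value of the right-hand side of the regression equation."""
    return sum(b * v for b, v in zip(beta, design_row(x1, x2)))

# Effect of changing x1 from 0 to 1 under each condition of x2.
effect_when_x2_is_0 = linear_predictor(1, 0) - linear_predictor(0, 0)
effect_when_x2_is_1 = linear_predictor(1, 1) - linear_predictor(0, 1)
print(effect_when_x2_is_0, effect_when_x2_is_1)
```

The two effects differ by exactly the interaction coefficient; without the product term they would be forced to be equal.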
The regression models commonly available in statistical
software are linear in the parameters. Skilful use of the
transformation function of the mean, $g(\mu_i)$, can yield a limited
selection of non-linear relationships. However, certain spe-
cialized software is available for more complex non-linear
models. Use these when they make scientiﬁc sense. One
common example is an unknown transformation of a variable,
such as $x_{ij}^{\alpha}$ where $\alpha$ is an unknown parameter.
It is often forgotten, when analysing the usual multiple re-
gression models, that the fact that no relationship is detected
in such a linear model does not exclude the possibility that
a non-linear relationship is present.
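An invented example makes the point: below, the response is completely determined by the explanatory variable, yet the fitted straight line is flat. The helper `slope` is a hypothetical name for the usual least-squares slope.

```python
def slope(x, y):
    """Least-squares slope of a straight line fitted to (x, y)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

# Invented data: y depends exactly on x, but through a curve, not a line.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [value ** 2 for value in x]

print(slope(x, y))
```

A scatter-plot of these data would show the relationship at once, which is one more reason to inspect the descriptive statistics before fitting models.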
Complex sample designs
Take great care when you have used a design involving clus-
tering, as with multi-stage surveys and multi-centre trials.
You can only apply standard regression models as a rather
poor approximation. The model selection procedures to be
described below will generally include too many explana-
tory variables in the ﬁnal model and the precision of the
parameters will be overestimated. Instead, you will require
special software that is not widely available, using random
effects models, to take into account the dependence among
responses within each cluster. This will allow you to make
more reasonable model selection and precision estimates.
You can handle more simply the case of stratiﬁcation with
sampling fractions differing from the population proportions.
Reweight the observations in the strata to bring them back to
the population values. Good software will handle this fairly
automatically, once you have calculated the proper weights.
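The reweighting can be sketched as follows, assuming two strata with known population shares. The stratum names, shares, and responses are all invented for the example.

```python
# Hypothetical population proportions for each stratum.
population_share = {"urban": 0.7, "rural": 0.3}

# Hypothetical sample in which the rural stratum was deliberately oversampled.
sample = {
    "urban": [4, 6, 5, 5],
    "rural": [9, 11, 10, 10, 8, 12],
}
n_total = sum(len(values) for values in sample.values())

# Weight = population proportion / sample proportion for the stratum.
weights = {
    stratum: population_share[stratum] / (len(values) / n_total)
    for stratum, values in sample.items()
}

unweighted_mean = sum(sum(v) for v in sample.values()) / n_total
weighted_mean = (
    sum(weights[s] * x for s, values in sample.items() for x in values)
    / sum(weights[s] * len(values) for s, values in sample.items())
)
print(f"unweighted {unweighted_mean:.2f}, weighted {weighted_mean:.2f}")
```

The unweighted mean is pulled towards the oversampled stratum; the weighted mean equals the stratum means combined in the population proportions.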
When you have found the appropriate model for depen-
dence within clusters and/or for stratiﬁcation, handling and
interpretation of the regression aspects is not fundamentally
different from the simpler cases.
Interpretation of explanatory variables
Binary variables It is usually most useful to code a binary
variable, $x_{ij}$ in Equation (4.1), using zeros and ones.
However, pay attention to which of the two possible values
is coded as one. The category coded as zero is often
called the baseline, because you will make comparisons to
it. Then, you can interpret the corresponding regression
coefficient, $\beta_j$, as a contrast between the categories, that is, the
difference in the transformed mean for the category coded as
one, as compared to that for the baseline coded zero.
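For a logistic regression with a single binary explanatory variable, this contrast can be computed directly from the two-way table, as the sketch below illustrates. The counts and the helper `log_odds` are invented for the example.

```python
import math

# Hypothetical counts of (events, non-events) in each category.
baseline = (20, 80)  # category coded 0
exposed = (40, 60)   # category coded 1

def log_odds(events, non_events):
    """Logit of the observed proportion of events."""
    return math.log(events / non_events)

# Intercept: transformed mean for the baseline category.
beta0 = log_odds(*baseline)
# Coefficient: difference in transformed mean from the baseline.
beta1 = log_odds(*exposed) - beta0

# On the original scale the contrast is an odds ratio.
odds_ratio = math.exp(beta1)
print(f"beta1 = {beta1:.3f}, odds ratio = {odds_ratio:.3f}")
```

Swapping which category is coded as one changes the sign of the coefficient, not its size, which is why the coding must be kept in mind when you interpret it.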
Several categories If the values of a variable correspond
to several categories, such as marital status or religion, your
interpretation in terms of contrasts will be similar to that for
binary variables, but somewhat more complex. Coding may
be as alphabetic names or as numbers. In the latter case, re-
member, and indicate to the software, that these numbers are
simply codes for categories and not measured magnitudes.
Otherwise, the software will generate completely erroneous
results. Generally, the software will produce an automatic
recoding to a series of dummy variables indicating in which
category each individual belongs.
Again, with most software, you must choose one category
as the baseline to which you will compare the responses in
all other categories. The choice will not alter the ﬁnal in-
terpretation. You should make it for convenience in inter-
preting the contrasts among categories. Here, there will be
a set of regression coefﬁcients, one less than the number of
categories, the missing one being the baseline category. You
can interpret each parameter in the same way as for binary
variables, as the difference in the transformed mean for that
category, as compared to that for the baseline.
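The automatic recoding performed by software can be imitated in a few lines. This is only a sketch: the function `dummy_code` and the category labels are hypothetical, and real packages differ in how they choose and label the baseline.

```python
def dummy_code(values, baseline):
    """Recode categories as 0/1 indicator variables, omitting the baseline."""
    levels = sorted(set(values) - {baseline})
    rows = [[1 if value == level else 0 for level in levels]
            for value in values]
    return levels, rows

# Hypothetical marital status for five respondents.
status = ["single", "married", "widowed", "married", "single"]

levels, rows = dummy_code(status, baseline="single")
print(levels)
for value, row in zip(status, rows):
    print(value, row)
```

Three categories yield two indicator variables, and hence two regression coefficients; individuals in the baseline category have zeros throughout.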
You can directly compare the sizes of the coefﬁcients,
with larger values indicating greater mean differences from
the baseline category. If some values are close to zero, the
corresponding categories may possibly be collapsed into the
baseline category. If the values for two or more categories
are similar, they may also be combined together. These steps
will simplify the model by reducing the number of parame-
ters.
Quantitative variables When an explanatory variable, $x_{ij}$,
is a count or measurement, the corresponding regression
coefficient will be the slope of a straight line describing how
the transformation of the mean changes per unit change of
that variable. For this reason, the size of the coefﬁcient will
depend on the unit of measurement. For example, if the
variable is measured in centimetres, the coefﬁcient