
*Published as: Senn, S. J. (2003). "Bayesian, likelihood and frequentist

approaches to statistics." Applied Clinical Trials 12(8): 35-38.

Bayesian, Likelihood and Frequentist Approaches to Statistics*

Stephen Senn

Dawn of a Bayesian era?

The Italian mathematician, actuary and Bayesian, Bruno de Finetti (1906-1985), once

estimated that it would take until the year 2020 for the Bayesian view of statistics to

completely prevail [1]. Whether or not his prediction comes true, there is no question that

Bayesian statistics is gaining ground. In drug regulation, however, the alternative frequentist

view continues to dominate, although even here, there are areas (for example the regulation

of medical devices) where the Bayesian approach is being applied. Many readers of Applied

Clinical Trials will have heard of Bayesian statistics and some will have wondered what it is. If

de Finetti is right, those who have not wondered yet may have reason to do so in future.

This article is an attempt to provide an explanation.

Statistics versus probability

Before explaining the difference between Bayesian and frequentist statistics (and a third

alternative, the likelihood approach, which has some features of both) it is useful to draw

another distinction: that between probabilists and statisticians. Probabilists are

mathematicians and like others of that breed are involved in a formal game. The game they

play is subjunctive, which is to say that it is a matter of if and then: if such and such are true

then such and such follow. 'If the die is fair, then there is one chance in six that I will roll a

one,' is a trivial example of the sort of question probabilists deal in. However, if you ask the

probabilist, 'is the die fair?', you will receive the reply, 'that's not my department'. Enquiry as to

whose department it is, leads to the statistician. The statistician cannot restrict life to

subjunctive matters: statisticians deal not just with if and then but also with whether and what.

Whether the die is fair or not and if not what exactly the bias is, are the sorts of questions that

statisticians are supposed to try and answer and their answer is supposed to rely on data.

In my book, Dicing with Death [2], I have described this difference between probability theory

and statistics as the difference between the divine and the human. Probability theory is a

divine theory because it works from known initial conditions to consequences using universal

laws. These initial conditions are declared by the probabilist in a fiat to begin with: 'let there be

theta'. (Theta is a popular choice of symbol to represent a probability.) Thus the probabilist

acts as creator of his or her own universe. Statistics on the other hand is a human science.

The state of nature has been declared and given, but we don't know what it is. All we can do

is observe the consequences and try and divine the mind of The Creator.

The distinction between probability and statistics is also sometimes made in terms of direct

and inverse probability. A question in direct probability might be, 'In 100 tosses of a fair coin,

what is the probability of having exactly 40 heads?'. A question in inverse probability might be,

'In 100 tosses of a coin, 40 showed heads. What is the probability that the coin is fair?'. The

former question is thus the province of probability theory and the latter the province of

statistics. Now it turns out that the second sort of question is much harder to answer than the

first; in fact, it is so hard that mathematicians, scientists, philosophers and, of course,

probabilists and statisticians too, cannot agree how it should be answered. The difficulty can

be illustrated with a simple example.

An example

Suppose that I have two urns each containing four balls: urn A and urn B. I am told that urn A

contains three white balls and one black ball and that urn B contains two black balls and two

white balls. I am informed that an urn has been chosen and that a ball has then been drawn

at random from the urn chosen. The ball is black. What is the probability that it came from urn

A? One simple answer might be as follows. Before the ball was chosen the urns contained

three black balls between them: one ball in urn A and two balls in urn B. If any ball in either of

the urns is equally likely to be chosen, it is twice as likely that the black ball chosen was from

urn B as that it came from urn A. Hence, the probability that it came from A is 1/3.
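The counting argument can be checked by brute-force enumeration. Here is a minimal Python sketch (assuming, as the counting argument implicitly does, that each of the eight balls is equally likely to be drawn):

```python
from fractions import Fraction

# Enumerate the eight equally likely (urn, ball) draws: urn A holds
# three white balls and one black, urn B two white and two black.
draws = [("A", "white")] * 3 + [("A", "black")] \
      + [("B", "white")] * 2 + [("B", "black")] * 2

# Condition on the observed event: the ball drawn is black.
black = [d for d in draws if d[1] == "black"]

# Of the equally likely black-ball draws, the fraction coming from urn A.
p_A_given_black = Fraction(sum(1 for urn, _ in black if urn == "A"), len(black))
print(p_A_given_black)  # 1/3
```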

This answer can be formally justified by a theorem in probability, Bayes theorem, named after

Thomas Bayes (1701-1761), an English Presbyterian minister, and which was communicated

posthumously by his friend Richard Price and read to the Royal Society of London in 1763. In

words, Bayes theorem states that the probability of an event E1 given another event E2 is the joint probability of both events divided by the probability of event E2. In symbols we would write this as

P(E1 | E2) = P(E1 ∩ E2) / P(E2).   (1)

Here P( ) means 'probability of', | means 'given' and ∩ means 'and'. Since the probability of the joint event E1 ∩ E2 is P(E1 ∩ E2) = P(E1) P(E2 | E1) (which, expressed in words, means that the probability of 'E1 and E2' is the probability of E1 multiplied by the probability of E2 given E1), an alternative representation of (1) is

P(E1 | E2) = P(E1) P(E2 | E1) / P(E2).   (2)

Suppose, in our example, that event E1 is 'choose urn A' and E2 is 'choose black ball'. Then if each urn is equally likely a priori to have been chosen we have P(E1) = 1/2. Furthermore, if each urn is equally likely to be chosen, since both urns contain the same number of balls, each ball is equally likely to be chosen. Out of the eight balls in total, one is the black ball in urn A, so that the probability of 'urn A and black ball' is P(E1 ∩ E2) = 1/8. On the other hand, three out of the eight balls in total are black. Hence we have P(E2) = 3/8. Applying Bayes theorem, we can now substitute these values in the right-hand side of the equation given by (1) to obtain

P(E1 | E2) = P(E1 ∩ E2) / P(E2) = (1/8) / (3/8) = 1/3,

which is the answer we had before.

Some difficulties

This is all very well and, indeed, may even seem trivial but there is a difficulty with this

answer. In formulating the question in the first place I did not say that the decision from which

urn to withdraw a ball was made at random, with each urn being given an equal chance of

being chosen. I did specify that the ball was chosen from the urn at random. The net result of

this is that although some of the probabilities for this problem are well defined, for example

the probability of choosing a black ball if urn A was chosen in the first place, one important

probability is not, that of choosing urn A.

It is the case that many of the problems we encounter in science have probability elements

that can be divided into two sorts. One sort can be fairly well defined. We assume that a given

theory is true and then calculate the probability of the consequences. For example, we might

assume that the probability, θ, that a patient will be cured if given a particular drug is 0.3. We

can then calculate very precisely, for example, given this assumed value what the probability

is that exactly 40 patients in a sample of 100 will be cured. In fact, given that we have a

sample of 100 patients 40 of whom have been cured we can calculate the probability of this

event as a function of the probability θ, substituting all sorts of values, not just 0.3. Probabilities of this type, where the event is fixed and the hypothesis changes, were called likelihoods by the great statistician, geneticist and evolutionary biologist RA Fisher (1890-1962), and they play a

central part in statistical inference. Suppose in our urn-sampling problem that we had drawn a

white ball. The probability of sampling a white ball is 3/4 if A is chosen and 1/2 if B is chosen.

These are the so-called likelihoods. Note that they do not add up to one and there is no

general requirement for likelihoods, unlike conventional probabilities, to do so. This is

because the event (white ball) is fixed and the hypothesis (urn A or B) is allowed to change.

For conventional probability we fix the hypothesis (for example urn A) and vary the outcome

(black or white ball).
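The distinction can be illustrated with a short sketch (Python, not from the article): fixing the event and varying the urn gives likelihoods that need not sum to one, while fixing the urn and varying the outcome gives probabilities that must.

```python
from fractions import Fraction

# Composition of each urn: counts of white and black balls.
urns = {"A": {"white": 3, "black": 1}, "B": {"white": 2, "black": 2}}

def prob(urn, colour):
    """Direct probability of drawing `colour` given the urn."""
    return Fraction(urns[urn][colour], sum(urns[urn].values()))

# Likelihoods: fix the event (white ball), vary the hypothesis (urn).
likelihoods = {urn: prob(urn, "white") for urn in urns}
print(likelihoods)                       # likelihood of a white ball under each urn

print(sum(likelihoods.values()))         # 5/4 -- likelihoods need not sum to one

# Conventional probabilities: fix the hypothesis (urn A), vary the outcome.
print(prob("A", "white") + prob("A", "black"))  # 1 -- probabilities must sum to one
```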

The second kind of probability element is not well defined. This is the probability of a given hypothesis being true in the first place: for example, in advance of running a trial in 100 persons, the probability that a given hypothesis about the cure rate is true. It turns out, however, that to issue inverse

probability statements, it is necessary to assume such prior probabilities. Since there is no

objective basis for them, this can only be done subjectively. In attempting to solve the urns

and ball problems you have to assume a prior probability that urn A was chosen, even though

this was not specified in the problem.

This brings us to the heart of the problem. In order to use Bayes theorem to allow us to say

something about the probability of a scientific hypothesis H being true given some evidence e, we would have to use (2) to write something like

P(H | e) = P(H) P(e | H) / P(e).   (3)

Here P(H | e) is sometimes referred to as the posterior probability of the hypothesis: the probability after seeing the evidence. The difficulty is that, of the three terms on the right-hand side of (3), we can usually only find objective values for P(e | H), the probability of the evidence given the hypothesis. However, the prior probability of the hypothesis, P(H), is needed for the solution, as is the probability of the evidence, P(e). The latter is particularly awkward to obtain, since many different hypotheses would give rise to e (albeit with differing probabilities); you thus need to know the prior probability of every single such hypothesis in order to calculate it.
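To see why P(e) demands a prior for every hypothesis, it can be written out by the law of total probability, P(e) = Σ P(H) P(e | H) over all hypotheses. A sketch for the urn problem, assuming the equal priors the problem itself never actually specified:

```python
from fractions import Fraction

# e = 'black ball drawn'; the hypotheses are 'urn A' and 'urn B'.
# The equal priors below are an assumption, not part of the problem.
priors = {"A": Fraction(1, 2), "B": Fraction(1, 2)}
likelihoods = {"A": Fraction(1, 4), "B": Fraction(2, 4)}  # P(black | urn)

# Law of total probability: P(e) needs a prior for every hypothesis.
p_e = sum(priors[h] * likelihoods[h] for h in priors)
print(p_e)   # 3/8, as in the worked example

# With P(e) in hand, equation (3) gives the posterior for urn A.
posterior_A = priors["A"] * likelihoods["A"] / p_e
print(posterior_A)   # 1/3
```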

Odds, Bayes and likelihood

However, by reformulating our objectives slightly, the difficulty of having to estimate P(e) can be finessed. Suppose that we wish to compare the posterior probabilities of two hypotheses, HA and HB, in terms of their ratios, or odds. We can use (3) to write

P(HA | e) = P(HA) P(e | HA) / P(e)   and also   P(HB | e) = P(HB) P(e | HB) / P(e).

The ratio of these two expressions gives us what we require and, fortunately, the awkward term P(e) cancels out so that we are left with

P(HA | e) / P(HB | e) = {P(HA) / P(HB)} {P(e | HA) / P(e | HB)}.   (4)

This is the odds ratio form of Bayes theorem, promoted by the British mathematician

and statistician George Barnard (1915-2002). It states that the posterior odds of one

hypothesis compared to another is the product of the prior odds (the first of the two terms in

curly brackets) and the ratio of likelihoods (the second of the two terms in curly brackets).

This still leaves us with the problem of estimating the prior odds. There are three common

"solutions".

The first is the Bayesian one of stating that there is nothing inherently problematic about

subjective probabilities, since probabilities anyway are nothing more or less than a statement

of belief. The difficulty with using Bayes theorem only arises because of the myth of objective

probabilities, which is part of the myth of objective knowledge. Indeed, De Finetti himself

referred contemptuously to the 'inveterate tendency of savages to objectivize and mythologize everything' [1]. What the Bayesian says is, 'abandon your pretensions of objectivity,

embrace subjectivity and recognise that you need to include personal belief as part of the

solution of any problem'. Thus introspection is the key to the solution. It is personal belief that

provides the final (otherwise missing) ingredient to the calculation of posterior probabilities.

The second solution is to go halfway only. This is to say that of the two terms on the right-

hand side of (4), one (the ratio of likelihoods) is well defined and may attract a fair degree of

assent as to its value but the other (the prior odds) is not. For example, in my urn and ball

problem, since I did not define the mechanism by which the urns were chosen, the prior odds term P(HA) / P(HB) is completely speculative and not worth including in the problem. However, the second term, P(e | HA) / P(e | HB), is defined by the problem. Indeed, it is equal to (1/4) / (2/4) = 1/2. The ratio of likelihoods

is thus one to two comparing urn A to urn B or two to one in favour of urn B. This quantity is

then perfectly objective. The Bayesian will counter that this may well be so but it still fails to

capture an important element of the problem, namely the prior odds. Furthermore, it turns out

that for more complex cases it is not always possible to calculate such simple ratios of

likelihoods.
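Equation (4) can be applied directly to the urn problem. The sketch below assumes prior odds of one to one, which, as noted, the problem itself does not justify:

```python
from fractions import Fraction

# Odds form of Bayes theorem (equation 4) for the urn problem with a
# black ball drawn. The 1/1 prior odds (each urn chosen with probability
# 1/2) are an assumption the problem leaves open.
prior_odds = Fraction(1, 1)

# Likelihood ratio for a black ball: P(black | A) / P(black | B).
likelihood_ratio = Fraction(1, 4) / Fraction(2, 4)

# Posterior odds = prior odds x likelihood ratio.
posterior_odds = prior_odds * likelihood_ratio
print(posterior_odds)       # 1/2: odds of urn A relative to urn B

# Converting odds back to a probability recovers the earlier answer.
posterior_prob_A = posterior_odds / (1 + posterior_odds)
print(posterior_prob_A)     # 1/3
```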

The frequentist approach

The third solution is the frequentist one. This is to abandon all pretence of saying anything

about hypotheses at all. Effectively, inverse probability is rejected altogether and one tries to

work with direct probabilities only. For example one could adopt the following rule of

behaviour. If a black ball is chosen I shall act as if it came from urn B. If a white ball is chosen

I shall act as if it came from urn A. We can then calculate the probabilities of making two

types of error. If urn A was the urn from which the ball is chosen, then there is a one in four

chance of choosing a black ball. Thus there is a one in four chance of being wrong. On the

other hand, if urn B is the urn from which the ball is chosen, then there are two chances out of

four of choosing a white ball; thus there is a fifty percent chance of being wrong. This is

referred to as 'hypothesis testing' and is an approach that was developed at University College London in the late 1920s and early 1930s by the Polish mathematician Jerzy Neyman (1894-1981) and the British statistician Egon Pearson (1895-1980). Neyman later emigrated to the

USA and founded an extremely influential and vigorous school of statistics at Berkeley. Note,

however, that these error rates are subjunctive. The probability statements are of the 'if/then'

form. They do not correspond to probabilities that the hypotheses are true and, indeed,

Neyman would deny that any such statement has meaning: a hypothesis either is or is not

true and hence does not have a probability of being true.
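The two error rates of this rule of behaviour can be checked by simulation; note that each rate is computed conditional on a hypothetical true urn, in keeping with the subjunctive character of the approach:

```python
import random

random.seed(0)

# Contents of the two urns.
balls = {"A": ["white"] * 3 + ["black"], "B": ["white"] * 2 + ["black"] * 2}

def decide(ball):
    """Rule of behaviour: black -> act as if urn B, white -> act as if urn A."""
    return "B" if ball == "black" else "A"

# Subjunctive error rates: condition on each urn in turn being the true one.
n = 100_000
rates = {}
for true_urn in ("A", "B"):
    errors = sum(decide(random.choice(balls[true_urn])) != true_urn
                 for _ in range(n))
    rates[true_urn] = errors / n

print(rates)   # close to 0.25 given urn A, 0.5 given urn B
```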

The Neyman-Pearson system appears to be the one used in drug regulation.

We refer to type I error rates, to null and alternative hypotheses, to power of tests and so

forth. All of these are concepts that play an important part in that system. Nevertheless, the

way in which the system is applied in practice reflects elements of a slightly older and similar

system, much developed by RA Fisher. A problem in applying the Neyman-Pearson system in

practice is that the probability of the evidence is often only well defined under a so-called null hypothesis. In a controlled clinical trial such a hypothesis might be, 'there is no

difference between the treatments'. Given such a null hypothesis the probability of observing

a result as extreme or more extreme than the observed difference between treatments, the

so-called p-value, may be calculated. This may be compared to a standard level of

significance, for example 5%. If the p-value is less than this standard, the hypothesis is

considered rejected. Such a procedure can be employed to guarantee a given type I error

rate, as in the Neyman-Pearson system but does not employ any specific reference to

alternative hypotheses, which can be difficult to characterise. For example, the logical

alternative to, 'the treatments are the same', is the, 'treatments are different' but since there

are infinitely many ways in which treatments can differ, this does not yield a unique way of

calculating probabilities.
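As an illustration (using the earlier coin example rather than a clinical trial), a p-value of this 'as extreme or more extreme' kind can be computed from the binomial distribution:

```python
from math import comb

# Under the null hypothesis that a coin is fair, how surprising are
# 40 heads in 100 tosses?
n, k = 100, 40

def binom_pmf(n, k, p=0.5):
    """Probability of exactly k successes in n trials with success probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Two-sided p-value: total probability of outcomes at least as far from
# the expected 50 heads as the one observed.
p_value = sum(binom_pmf(n, i) for i in range(n + 1)
              if abs(i - n / 2) >= abs(k - n / 2))
print(round(p_value, 4))   # about 0.057: not below the 5% standard
```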

It would be simplistic to conclude that the difference between frequentist and Bayesian

statistics is that the former is objective and the latter subjective. It would be more accurate to

say that the former is subjunctive (if the null hypothesis is true I shall make an error with this

probability) and the latter is subjective (my personal belief of the truth of this statement is such

and such). Bayesians would claim that frequentist methods give an illusion of objectivity.

Frequentists deny any place for subjective probability but, in fact, the interpretations that

result from any application of frequentist statistics depend very much on personal actions. For

example, the decision to inspect a trial during its running with the possibility of stopping the

trial early may impact on the reported p-value. Thus, it is not only the data that affect the

conclusion but the trialist's intentions also. Such behaviour has no direct impact on Bayesian

calculations, which are not affected by the number of times one looks at a trial and so from

this point of view can claim to be more objective.

Where does this leave us?

In my view, it is too early to say. It may be that de Finetti's prediction will come true and we

shall move to a consensus that Bayesian methods are those we should use. Perhaps, on the

other hand, drug regulation will continue much as before. Personally, I like the advice of

George Barnard. Starting with a key paper in 1949, Barnard produced many trenchant

criticisms of the then dominant frequentist school [3] but never accepted that Bayesian methods alone would be sufficient for the applied statistician. Towards the end of his life he suggested that every statistician ought to have basic familiarity with the four major systems of inference [4]:

de Finetti's fully subjective Bayesian approach, a less extreme version pioneered by the British geophysicist Harold Jeffreys (1891-1989) (which has not been discussed here), the

Neyman-Pearson system and Fisher's mix of significance tests and likelihood.

Of course this can be regarded as a rather unsatisfactory situation; we have to have four

systems rather than one. Is statistics not complicated enough as it is? As already explained,

however, statistics is a human subject not a divine one. The difficulties it attempts to

overcome are genuine and our human powers are limited. As in other areas of human

struggle, pragmatic compromise, although not perfect, may avoid the disasters to which

fanatic single-mindedness can tend.

References

1. de Finetti, B. D. Theory of Probability (Volume 1) (Wiley, Chichester, 1974).

2. Senn, S. J. Dicing with Death (Cambridge University Press, Cambridge, 2003).

3. Barnard, G. A. "Statistical Inference (with discussion)". Journal of the Royal Statistical Society, Series B 11, 115-149 (1949).

4. Barnard, G. A. "Fragments of a statistical autobiography". Student 1, 257-268 (1996).

Stephen Senn, PhD, CStat is Professor of Pharmaceutical and Health Statistics at University

College London and on the editorial board of Applied Clinical Trials. His book Dicing with

Death (2003), a popular account of medical statistics, is published by Cambridge University

Press.

Comparison of frequentist and Bayesian systems of inference as regards key issues.

Issue | Bayesian | Frequentist
Nature of probability | Subjective. A statement of belief. | A relative frequency. The long-run proportion.
Scope | Relevant to any situation. | Strictly speaking, only relevant where an infinite repetition of the set-up can be envisaged.
Relevance of events that did not occur | Not relevant, except to the extent that they influence the assessment of prior probability. | All possible events must be taken into account when performing the probability calculation.
Role of intentions (for example stopping rules) | Only affect the probability calculation to the extent that they are reflected in the prior. | Will affect the probability calculation.
Probability statements for hypotheses | Can be made, as they reflect personal belief in the truth of hypotheses. | Are not relevant, since hypotheses are either true or false.
Prior probabilities | Are needed. | Are not needed.