
*Published as: Senn, S. J. (2003). "Bayesian, likelihood and frequentist

approaches to statistics." Applied Clinical Trials 12(8): 35-38.

Bayesian, Likelihood and Frequentist Approaches to Statistics*

Stephen Senn

Dawn of a Bayesian era?

The Italian mathematician, actuary and Bayesian, Bruno de Finetti (1906-1985), once

estimated that it would take until the year 2020 for the Bayesian view of statistics to

completely prevail [1]. Whether or not his prediction comes true, there is no question that

Bayesian statistics is gaining ground. In drug regulation, however, the alternative frequentist

view continues to dominate, although even here, there are areas (for example the regulation

of medical devices) where the Bayesian approach is being applied. Many readers of Applied

Clinical Trials will have heard of Bayesian statistics and some will have wondered what it is. If

de Finetti is right, those who have not wondered yet may have reason to do so in future.

This article is an attempt to provide an explanation.

Statistics versus probability

Before explaining the difference between Bayesian and frequentist statistics (and a third

alternative, the likelihood approach, which has some features of both) it is useful to draw

another distinction: that between probabilists and statisticians. Probabilists are

mathematicians and like others of that breed are involved in a formal game. The game they

play is subjunctive, which is to say that it is a matter of if and then: if such and such are true

then such and such follow. 'If the die is fair, then there is one chance in six that I will roll a

one,' is a trivial example of the sort of question probabilists deal in. However, if you ask the

probabilist, 'is the die fair?', you will receive the reply, 'that's not my department'. Enquiry as to

whose department it is, leads to the statistician. The statistician cannot restrict life to

subjunctive matters: statisticians deal not just with if and then but also with whether and what.

Whether the die is fair or not and if not what exactly the bias is, are the sorts of questions that

statisticians are supposed to try and answer and their answer is supposed to rely on data.

In my book, Dicing with Death [2], I have described this difference between probability theory

and statistics as the difference between the divine and the human. Probability theory is a

divine theory because it works from known initial conditions to consequences using universal

laws. These initial conditions are declared by the probabilist in a fiat to begin with: 'let there be

theta'. (Theta is a popular choice of symbol to represent a probability.) Thus the probabilist

acts as creator of his or her own universe. Statistics on the other hand is a human science.

The state of nature has been declared and given, but we don't know what it is. All we can do

is observe the consequences and try and divine the mind of The Creator.

The distinction between probability and statistics is also sometimes made in terms of direct

and inverse probability. A question in direct probability might be, 'In 100 tosses of a fair coin,

what is the probability of having exactly 40 heads?'. A question in inverse probability might be,

'In 100 tosses of a coin, 40 showed heads. What is the probability that the coin is fair?'. The

former question is thus the province of probability theory and the latter the province of

statistics. Now it turns out that the second sort of question is much harder to answer than the

first; in fact, it is so hard that mathematicians, scientists, philosophers and, of course,

probabilists and statisticians too, cannot agree how it should be answered. The difficulty can

be illustrated with a simple example.

An example

Suppose that I have two urns each containing four balls: urn A and urn B. I am told that urn A

contains three white balls and one black ball and that urn B contains two black balls and two

white balls. I am informed that an urn has been chosen and that a ball has then been drawn

at random from the urn chosen. The ball is black. What is the probability that it came from urn

A? One simple answer might be as follows. Before the ball was chosen the urns contained

three black balls between them: one ball in urn A and two balls in urn B. If any ball in either of

the urns is equally likely to be chosen, it is twice as likely that the black ball chosen was from

urn B as that it came from urn A. Hence, the probability that it came from A is 1/3.
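The counting argument can be checked by brute-force enumeration. Here is a minimal Python sketch (assuming, as the counting argument implicitly does, that each of the eight balls is equally likely to be drawn):

```python
from fractions import Fraction

# Enumerate the eight equally likely (urn, ball) draws: urn A holds
# three white balls and one black, urn B two white and two black.
draws = [("A", "white")] * 3 + [("A", "black")] \
      + [("B", "white")] * 2 + [("B", "black")] * 2

# Condition on the observed event: the ball drawn is black.
black = [d for d in draws if d[1] == "black"]

# Of the equally likely black-ball draws, the fraction coming from urn A.
p_A_given_black = Fraction(sum(1 for urn, _ in black if urn == "A"), len(black))
print(p_A_given_black)  # 1/3
```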

This answer can be formally justified by a theorem in probability, Bayes theorem, named after

Thomas Bayes (1701-1761), an English Presbyterian minister, and which was communicated

posthumously by his friend Richard Price and read to the Royal Society of London in 1763. In

words, Bayes theorem states that the probability of an event E1 given another event E2 is the joint probability of both events divided by the probability of event E2. In symbols we would write this as

P(E1 | E2) = P(E1 ∩ E2) / P(E2).   (1)

Here P( ) means 'probability of', | means 'given' and ∩ means 'and'. Since the probability of the joint event E1 ∩ E2 is P(E1 ∩ E2) = P(E1) P(E2 | E1) (which, expressed in words, means that the probability of 'E1 and E2' is the probability of E1 multiplied by the probability of E2 given E1), an alternative representation of (1) is

P(E1 | E2) = P(E1) P(E2 | E1) / P(E2).   (2)

Suppose, in our example, that event E1 is 'choose urn A' and E2 is 'choose black ball'. Then if each urn is equally likely a priori to have been chosen we have P(E1) = 1/2. Furthermore, if each urn is equally likely to be chosen, since both urns contain the same number of balls, each ball is equally likely to be chosen. Out of the eight balls in total, one is the black ball in urn A, so that the probability of 'urn A and black ball' is P(E1 ∩ E2) = 1/8. On the other hand, three out of the eight balls in total are black. Hence we have P(E2) = 3/8. Applying Bayes theorem, we can now substitute these values in the right-hand side of the equation given by (1) to obtain

P(E1 | E2) = P(E1 ∩ E2) / P(E2) = (1/8) / (3/8) = 1/3,

which is the answer we had before.

Some difficulties

This is all very well and, indeed, may even seem trivial but there is a difficulty with this

answer. In formulating the question in the first place I did not say that the decision from which

urn to withdraw a ball was made at random, with each urn being given an equal chance of

being chosen. I did specify that the ball was chosen from the urn at random. The net result of

this is that although some of the probabilities for this problem are well defined, for example

the probability of choosing a black ball if urn A was chosen in the first place, one important

probability is not, that of choosing urn A.

It is the case that many of the problems we encounter in science have probability elements

that can be divided into two sorts. One sort can be fairly well defined. We assume that a given

theory is true and then calculate the probability of the consequences. For example, we might

assume that the probability, θ, that a patient will be cured if given a particular drug is 0.3. We

can then calculate very precisely, for example, given this assumed value what the probability

is that exactly 40 patients in a sample of 100 will be cured. In fact, given that we have a

sample of 100 patients 40 of whom have been cured we can calculate the probability of this

event as a function of the probability θ, substituting all sorts of values, not just 0.3. Probabilities of this type, where the event is fixed and the hypothesis changes, were called likelihoods by the great statistician, geneticist and evolutionary biologist RA Fisher (1890-1962), and they play a

central part in statistical inference. Suppose in our urn-sampling problem that we had drawn a

white ball. The probability of sampling a white ball is 3/4 if A is chosen and 1/2 if B is chosen.

These are the so-called likelihoods. Note that they do not add up to one and there is no

general requirement for likelihoods, unlike conventional probabilities, to do so. This is

because the event (white ball) is fixed and the hypothesis (urn A or B) is allowed to change.

For conventional probability we fix the hypothesis (for example urn A) and vary the outcome

(black or white ball).
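The distinction can be illustrated with a short sketch (Python, not from the article): fixing the event and varying the urn gives likelihoods that need not sum to one, while fixing the urn and varying the outcome gives probabilities that must.

```python
from fractions import Fraction

# Composition of each urn: counts of white and black balls.
urns = {"A": {"white": 3, "black": 1}, "B": {"white": 2, "black": 2}}

def prob(urn, colour):
    """Direct probability of drawing `colour` given the urn."""
    return Fraction(urns[urn][colour], sum(urns[urn].values()))

# Likelihoods: fix the event (white ball), vary the hypothesis (urn).
likelihoods = {urn: prob(urn, "white") for urn in urns}
print(likelihoods)                       # likelihood of a white ball under each urn

print(sum(likelihoods.values()))         # 5/4 -- likelihoods need not sum to one

# Conventional probabilities: fix the hypothesis (urn A), vary the outcome.
print(prob("A", "white") + prob("A", "black"))  # 1 -- probabilities must sum to one
```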

The second kind of probability element is not well defined. This is the probability of a given hypothesis being true in the first place: for example, in advance of running a trial in 100 persons, the probability that a given hypothesis about the cure rate is true. It turns out, however, that to issue inverse

probability statements, it is necessary to assume such prior probabilities. Since there is no

objective basis for them, this can only be done subjectively. In attempting to solve the urns

and ball problems you have to assume a prior probability that urn A was chosen, even though

this was not specified in the problem.

This brings us to the heart of the problem. In order to use Bayes theorem to allow us to say

something about the probability of a scientific hypothesis H being true given some evidence e, we would have to use (2) to write something like

P(H | e) = P(H) P(e | H) / P(e).   (3)

Here P(H | e) is sometimes referred to as the posterior probability of the hypothesis: the probability after seeing the evidence. The difficulty is that, of the three terms on the right-hand side of (3), we can usually only find objective values for P(e | H), the probability of the evidence given the hypothesis. However, the prior probability of the hypothesis, P(H), is needed for the solution, as is the probability of the evidence, P(e). The latter is particularly awkward to obtain, since many different hypotheses would give rise to e (albeit with differing probabilities); you thus need to know the prior probability of every single such hypothesis in order to calculate it.
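To see why P(e) demands a prior for every hypothesis, it can be written out by the law of total probability, P(e) = Σ P(H) P(e | H) over all hypotheses. A sketch for the urn problem, assuming the equal priors the problem itself never actually specified:

```python
from fractions import Fraction

# e = 'black ball drawn'; the hypotheses are 'urn A' and 'urn B'.
# The equal priors below are an assumption, not part of the problem.
priors = {"A": Fraction(1, 2), "B": Fraction(1, 2)}
likelihoods = {"A": Fraction(1, 4), "B": Fraction(2, 4)}  # P(black | urn)

# Law of total probability: P(e) needs a prior for every hypothesis.
p_e = sum(priors[h] * likelihoods[h] for h in priors)
print(p_e)   # 3/8, as in the worked example

# With P(e) in hand, equation (3) gives the posterior for urn A.
posterior_A = priors["A"] * likelihoods["A"] / p_e
print(posterior_A)   # 1/3
```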

Odds, Bayes and likelihood

However, by reformulating our objectives slightly, the difficulty of having to estimate P(e) can be finessed. Suppose that we wish to compare the posterior probabilities of two hypotheses, HA and HB, in terms of their ratios, or odds. We can use (3) to write

P(HA | e) = P(HA) P(e | HA) / P(e)   and also   P(HB | e) = P(HB) P(e | HB) / P(e).

The ratio of these two expressions gives us what we require and, fortunately, the awkward term P(e) cancels out so that we are left with

P(HA | e) / P(HB | e) = {P(HA) / P(HB)} {P(e | HA) / P(e | HB)}.   (4)

This is the odds ratio form of Bayes theorem, promoted by the British mathematician

and statistician George Barnard (1915-2002). It states that the posterior odds of one

hypothesis compared to another is the product of the prior odds (the first of the two terms in

curly brackets) and the ratio of likelihoods (the second of the two terms in curly brackets).

This still leaves us with the problem of estimating the prior odds. There are three common

"solutions".

The first is the Bayesian one of stating that there is nothing inherently problematic about

subjective probabilities, since probabilities anyway are nothing more or less than a statement

of belief. The difficulty with using Bayes theorem only arises because of the myth of objective

probabilities, which is part of the myth of objective knowledge. Indeed, De Finetti himself

referred contemptuously to the 'inveterate tendency of savages to objectivize and mythologize everything' [1]. What the Bayesian says is, 'abandon your pretensions of objectivity,

embrace subjectivity and recognise that you need to include personal belief as part of the

solution of any problem'. Thus introspection is the key to the solution. It is personal belief that

provides the final (otherwise missing) ingredient to the calculation of posterior probabilities.

The second solution is to go halfway only. This is to say that of the two terms on the right-

hand side of (4), one (the ratio of likelihoods) is well defined and may attract a fair degree of

assent as to its value but the other (the prior odds) is not. For example, in my urn and ball

problem, since I did not define the mechanism by which the urns were chosen, the prior odds term P(HA) / P(HB) is completely speculative and not worth including in the problem. However, the second term, P(e | HA) / P(e | HB), is defined by the problem. Indeed, it is equal to (1/4) / (2/4) = 1/2. The ratio of likelihoods

is thus one to two comparing urn A to urn B or two to one in favour of urn B. This quantity is

then perfectly objective. The Bayesian will counter that this may well be so but it still fails to

capture an important element of the problem, namely the prior odds. Furthermore, it turns out

that for more complex cases it is not always possible to calculate such simple ratios of

likelihoods.
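Equation (4) can be applied directly to the urn problem. The sketch below assumes prior odds of one to one, which, as noted, the problem itself does not justify:

```python
from fractions import Fraction

# Odds form of Bayes theorem (equation 4) for the urn problem with a
# black ball drawn. The 1/1 prior odds (each urn chosen with probability
# 1/2) are an assumption the problem leaves open.
prior_odds = Fraction(1, 1)

# Likelihood ratio for a black ball: P(black | A) / P(black | B).
likelihood_ratio = Fraction(1, 4) / Fraction(2, 4)

# Posterior odds = prior odds x likelihood ratio.
posterior_odds = prior_odds * likelihood_ratio
print(posterior_odds)       # 1/2: odds of urn A relative to urn B

# Converting odds back to a probability recovers the earlier answer.
posterior_prob_A = posterior_odds / (1 + posterior_odds)
print(posterior_prob_A)     # 1/3
```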

The frequentist approach

The third solution is the frequentist one. This is to abandon all pretence of saying anything

about hypotheses at all. Effectively, inverse probability is rejected altogether and one tries to

work with direct probabilities only. For example one could adopt the following rule of

behaviour. If a black ball is chosen I shall act as if it came from urn B. If a white ball is chosen

I shall act as if it came from urn A. We can then calculate the probabilities of making two

types of error. If urn A was the urn from which the ball is chosen, then there is a one in four

chance of choosing a black ball. Thus there is a one in four chance of being wrong. On the

other hand, if urn B is the urn from which the ball is chosen, then there are two chances out of

four of choosing a white ball; thus there is a fifty percent chance of being wrong. This is

referred to as 'hypothesis testing' and is an approach that was developed at University College London in the late 1920s and early 1930s by the Polish mathematician Jerzy Neyman (1894-1981) and the British statistician Egon Pearson (1895-1980). Neyman later emigrated to the

USA and founded an extremely influential and vigorous school of statistics at Berkeley. Note,

however, that these error rates are subjunctive. The probability statements are of the 'if/then'

form. They do not correspond to probabilities that the hypotheses are true and, indeed,

Neyman would deny that any such statement has meaning: a hypothesis either is or is not

true and hence does not have a probability of being true.
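The two error rates of this rule of behaviour can be checked by simulation; note that each rate is computed conditional on a hypothetical true urn, in keeping with the subjunctive character of the approach:

```python
import random

random.seed(0)

# Contents of the two urns.
balls = {"A": ["white"] * 3 + ["black"], "B": ["white"] * 2 + ["black"] * 2}

def decide(ball):
    """Rule of behaviour: black -> act as if urn B, white -> act as if urn A."""
    return "B" if ball == "black" else "A"

# Subjunctive error rates: condition on each urn in turn being the true one.
n = 100_000
rates = {}
for true_urn in ("A", "B"):
    errors = sum(decide(random.choice(balls[true_urn])) != true_urn
                 for _ in range(n))
    rates[true_urn] = errors / n

print(rates)   # close to 0.25 given urn A, 0.5 given urn B
```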

The Neyman-Pearson system appears to be the one used in drug regulation.

We refer to type I error rates, to null and alternative hypotheses, to power of tests and so

forth. All of these are concepts that play an important part in that system. Nevertheless, the

way in which the system is applied in practice reflects elements of a slightly older and similar

system, much developed by RA Fisher. A problem in applying the Neyman-Pearson system in

practice is that the probability of the evidence is often only well defined under a so-called null hypothesis. In a controlled clinical trial such a hypothesis might be, 'there is no

difference between the treatments'. Given such a null hypothesis the probability of observing

a result as extreme or more extreme than the observed difference between treatments, the

so-called p-value, may be calculated. This may be compared to a standard level of

significance, for example 5%. If the p-value is less than this standard, the hypothesis is

considered rejected. Such a procedure can be employed to guarantee a given type I error

rate, as in the Neyman-Pearson system but does not employ any specific reference to

alternative hypotheses, which can be difficult to characterise. For example, the logical

alternative to, 'the treatments are the same', is the, 'treatments are different' but since there

are infinitely many ways in which treatments can differ, this does not yield a unique way of

calculating probabilities.
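As an illustration (using the earlier coin example rather than a clinical trial), a p-value of this 'as extreme or more extreme' kind can be computed from the binomial distribution:

```python
from math import comb

# Under the null hypothesis that a coin is fair, how surprising are
# 40 heads in 100 tosses?
n, k = 100, 40

def binom_pmf(n, k, p=0.5):
    """Probability of exactly k successes in n trials with success probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Two-sided p-value: total probability of outcomes at least as far from
# the expected 50 heads as the one observed.
p_value = sum(binom_pmf(n, i) for i in range(n + 1)
              if abs(i - n / 2) >= abs(k - n / 2))
print(round(p_value, 4))   # about 0.057: not below the 5% standard
```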

It would be simplistic to conclude that the difference between frequentist and Bayesian

statistics is that the former is objective and the latter subjective. It would be more accurate to

say that the former is subjunctive (if the null hypothesis is true I shall make an error with this

probability) and the latter is subjective (my personal belief of the truth of this statement is such

and such). Bayesians would claim that frequentist methods give an illusion of objectivity.

Frequentists deny any place for subjective probability but, in fact, the interpretations that

result from any application of frequentist statistics depend very much on personal actions. For

example, the decision to inspect a trial during its running with the possibility of stopping the

trial early may impact on the reported p-value. Thus, it is not only the data that affect the

conclusion but the trialist's intentions also. Such behaviour has no direct impact on Bayesian

calculations, which are not affected by the number of times one looks at a trial and so from

this point of view can claim to be more objective.

Where does this leave us?

In my view, it is too early to say. It may be that de Finetti's prediction will come true and we

shall move to a consensus that Bayesian methods are those we should use. Perhaps, on the

other hand, drug regulation will continue much as before. Personally, I like the advice of

George Barnard. Starting with a key paper in 1949, Barnard produced many trenchant

criticisms of the then dominant frequentist school [3] but never accepted that Bayesian methods alone would be sufficient for the applied statistician. Towards the end of his life he suggested that every statistician ought to have basic familiarity with the four major systems of inference [4]:

de Finetti's fully subjective Bayesian approach, a less extreme version pioneered by the British geophysicist Harold Jeffreys (1891-1989) (which has not been discussed here), the

Neyman-Pearson system and Fisher's mix of significance tests and likelihood.

Of course this can be regarded as a rather unsatisfactory situation; we have to have four

systems rather than one. Is statistics not complicated enough as it is? As already explained,

however, statistics is a human subject not a divine one. The difficulties it attempts to

overcome are genuine and our human powers are limited. As in other areas of human

struggle, pragmatic compromise, although not perfect, may avoid the disasters to which

fanatic single-mindedness can tend.

References

1. de Finetti, B. D. Theory of Probability (Volume 1) (Wiley, Chichester, 1974).

2. Senn, S. J. Dicing with Death (Cambridge University Press, Cambridge, 2003).

3. Barnard, G. A. "Statistical Inference (with discussion)". Journal of the Royal Statistical Society, Series B 11, 115-149 (1949).

4. Barnard, G. A. "Fragments of a statistical autobiography". Student 1, 257-268 (1996).

Stephen Senn, PhD, CStat is Professor of Pharmaceutical and Health Statistics at University

College London and on the editorial board of Applied Clinical Trials. His book Dicing with

Death (2003), a popular account of medical statistics, is published by Cambridge University

Press.

Comparison of frequentist and Bayesian systems of inference as regards key issues.

Issue | Bayesian | Frequentist
Nature of probability | Subjective. A statement of belief. | A relative frequency. The long-run proportion.
Scope | Relevant to any situation. | Strictly speaking, only relevant where an infinite repetition of the set-up can be envisaged.
Relevance of events that did not occur | Not relevant, except to the extent that they influence the assessment of prior probability. | All possible events must be taken into account when performing the probability calculation.
Role of intentions (for example stopping rules) | Only affect the probability calculation to the extent that they are reflected in the prior. | Will affect the probability calculation.
Probability statements for hypotheses | Can be made, as they reflect personal belief in the truth of hypotheses. | Are not relevant, since hypotheses are either true or false.
Prior probabilities | Are needed. | Are not needed.