This is the submitted version of:
Fenton, N. and Neil, M. (2010). "Comparing risks of alternative medical
diagnosis using Bayesian arguments." Journal of Biomedical
Informatics, 43: 485-495, http://dx.doi.org/10.1016/j.jbi.2010.02.004
Comparing risks of alternative medical
diagnosis using Bayesian arguments
Norman Fenton and Martin Neil
Queen Mary University of London
RADAR (Risk Assessment and Decision Analysis Research)
School of Electronic Engineering and Computer Science
London E1 4NS
and
Agena Ltd
www.agena.co.uk
32-33 Hatton Garden
London EC1N 8DL
2 July 2009
For correspondence: Prof Norman Fenton, School of Electronic Engineering and
Computer Science, Queen Mary (University of London), London E1 4NS. Email:
norman@dcs.qmul.ac.uk, fax: +44 020-8980-6533
Abstract
This paper explains the role of Bayes theorem and Bayesian networks arising in a
medical negligence case brought by a patient who suffered a stroke as a result of an
invasive diagnostic test. The claim of negligence was based on the premise that an
alternative (non-invasive) test should have been used because it carried a lower risk.
The case raises a number of general and widely applicable concerns about the
decision-making process within the medical profession, including the ethics of
informed consent, patient care liabilities when errors are made, and the research
problem of focusing on ‘true positives’ while ignoring ‘false positives’. An immediate
concern is how best to present Bayesian arguments in such a way that they can be
understood by people who would normally balk at mathematical equations. We feel it
is possible to present purely visual representations of a non-trivial Bayesian argument
in such a way that no mathematical knowledge or understanding is needed. The
approach supports a wide range of alternative scenarios, makes all assumptions easily
understandable and offers significant potential benefits to many areas of medical
decision-making.
Keywords: Bayes theorem, Bayesian networks, event trees, catheter angiogram,
MRA scan, aneurysm, palsy
1 Introduction
In a classic and much referenced study [9] the following question was put to 60
students and staff at Harvard Medical School (we shall refer to this as the ‘Harvard
question’):
"One in a thousand people has a prevalence for a particular heart disease.
There is a test to detect this disease. The test is 100% accurate for people who
have the disease and is 95% accurate for those who don't (this means that 5%
of people who do not have the disease will be wrongly diagnosed as having it).
If a randomly selected person tests positive what is the probability that the
person actually has the disease?"
Almost half gave the response 95%.
The 'average' answer was 56%.
In fact the correct answer is just below 2% (as we shall explain in Section 2). There is
much debate about why intelligent people are so poor at answering questions that
require simple mathematical reasoning. There is also much debate about the best ways
to avoid such errors. For example, Cosmides and Tooby [11] demonstrated that
responses to the Harvard question were significantly improved by using language that
avoided abstract probabilities. This led them to challenge the widely believed claims
[29] that lay people were inherently incapable of accurate probabilistic reasoning.
Their view was that it was the Bayesian framework generally used to answer such
questions that was the cause of confusion.
It turns out that the issue of how best to present probabilistic Bayesian reasoning was
crucial in a recent medical negligence case that we describe in Section 3. We worked
as experts (on probabilistic risk assessment) to help the claimant’s legal team
represent and verify the key probabilistic arguments, and even more importantly, to
understand them sufficiently well to be able to present them in court. The High Court
accepted the claimant’s case and awarded significant damages.
The approach we used in the case to explain the argument (a decision tree
representation of Bayes) is described in detail in Section 4. With this approach no
mathematics is required, and it is still easy to consider a full range of assumptions that
incorporate both the claimant and the defence viewpoints.
The case raises a number of very general and widely applicable concerns about the
decision-making process within the medical profession and also about how the
Bayesian approach can improve this process. Our view is that, in many cases such as
this particular one, it is possible to present purely visual representations of a Bayesian
argument, in such a way that no mathematical knowledge or understanding is needed.
However, where the medical problem involves many variables and interactions, the
proposed approach becomes infeasible and an alternative approach (Bayesian
networks) is needed. Section 5 explains why we believe Bayesian networks can
provide a viable alternative, and also explains how we used them to fully verify the
whole argument in the case.
As we explain in the paper, there is nothing new about using either Bayesian
reasoning, decision tree representations, or Bayesian networks within the domain of
risk assessment in medical diagnosis. Our novel contribution in this paper is both a
real (and successful) case study in using all of these techniques and a new approach to
how these techniques can be made much more widely accepted.
2 Presenting Bayesian arguments visually
The easiest way to explain the correct result for the Harvard question is to use the
kind of visual argument presented in Figure 1 and Figure 2 (similar approaches have
been proposed in [4]). Here we imagine a large number of randomly selected people
(1,000 in this case but the argument works for any large number) being tested for the
particular disease.
Figure 1 In 1000 random people about 1 has the disease but about 50 more are diagnosed as
having the disease
Figure 2 Only 1 out of the 51 diagnosed with the disease actually has the disease
The argument is essentially as follows (a short computational check follows):
The disease is prevalent in one in a thousand people, so in a sample of, say,
1000 people we would expect about one to have the disease (in 100,000
people we would expect about 100 to have the disease etc.). This is
represented by the black figure in Figure 1.
But if you test everybody in the sample then, in addition to the people who do
have the disease, we would expect approximately 50 others (that is, 5% of the
remaining 999) to be wrongly diagnosed as having it (in 100,000 people this will be
approximately 4,995, that is 5% of the 99,900 who do not have the disease).
This is represented by the grey figures in Figure 1.
In other words fewer than 2% of the people who are diagnosed positive (i.e. 1
out of the 51 in the case of 1000 people and 100 out of 5005 in the case of
100,000 people) actually have the disease. This is represented in Figure 2.
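For readers who wish to check the arithmetic, the frequency argument can be reproduced in
a few lines of Python (our illustration; the variable names and the 100,000 sample size
are arbitrary choices):

    N = 100000              # hypothetical number of people tested
    prevalence = 1.0 / 1000 # 1 in 1000 has the disease
    sensitivity = 1.0       # the test always detects the disease
    false_positive_rate = 0.05

    diseased = N * prevalence                               # 100 people
    true_positives = diseased * sensitivity                 # 100
    false_positives = (N - diseased) * false_positive_rate  # 4,995
    all_positives = true_positives + false_positives        # 5,095

    # Probability of disease given a positive test: about 0.0196, i.e. under 2%
    print(true_positives / all_positives)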
When people give a high answer, like 95%, they are falling victim to a very common
error known as the 'base-rate neglect' fallacy [29]; people neglect to take into
consideration the very low probability (of having the disease) that forms the vital
starting point. In comparison, the probability of a false positive test is relatively high
(5% is the same as 50 in a thousand, whereas there is only a one in a thousand chance
of having the disease).
The above visual explanation can convince even the most sceptical and mathematically
illiterate observers of both the correct answer and how to calculate it themselves.
However, this cannot be said of the standard formal approach to solving such
problems. Specifically, the formal way to present the above argument is to use Bayes
Theorem, which is generally acknowledged as the standard approach for reasoning
under uncertainty. The Bayesian approach enables us to re-evaluate probabilities (in
this case the probability that a randomly tested patient has the disease) in the light of
new evidence (in this case a positive test result). Specifically, Bayes Theorem is a
formula for the revised (posterior) probability in terms of the original (prior)
probability and the probability of observing the evidence. Its use in medical
diagnostics is far from new as can be seen from publications dating back almost 50
years [35] [57]. However, although the formula is straightforward (see the Appendix
for both the formula and its application in this case) most people without a
statistical/mathematical background do not understand the argument if it is presented
in this way [11]. There are two reasons for this:
1. They simply refuse to either ‘examine’ or ‘believe’ a mathematical formula.
2. The results are presented as probabilities rather than frequencies and they find
this much harder to contextualise.
Hence, while mathematicians and statisticians assume that Bayesian arguments are
simple and self-explanatory, they are not normally understood by doctors [9] [17] [29]
[51], lawyers, judges and juries [16] [14] [21] [30]. Because of this the power of
Bayesian reasoning has been massively underused.
An approach to presenting the argument that can be viewed as a semi-formal (and
more easily repeatable) version of the above visual argument, is to use decision trees
(also called event trees). Indeed, this approach is recommended for precisely this type
of application in the excellent recent book on medical decision-making [55].
Figure 3 Event/decision tree explanation: of 100,000 people, 100 (1/1000) have the disease
and 99,900 (999/1000) do not; all 100 with the disease test positive (100%), while 5% of
the 99,900 who do not (4,995) also test positive and the other 95% (94,905) test negative.
So 100 out of the 5,095 (4,995 + 100) who test positive actually have the disease, i.e.
under 2%.
Figure 3 presents the event/decision tree in this case. Note that, as in the visual
argument (and in contrast to the formal Bayesian argument), we start with a
hypothetical large number of people to be tested. This simple enhancement yields a
profound improvement in comprehensibility [11].
Before moving to the particular medical negligence case, it is worth considering the
implications of failing to understand the true probabilities arising from medical
diagnostic test results. As the Harvard question demonstrated, medical experts tend to
believe that a positive test result implies a much greater probability that the patient
has the disease than is really the case. Most members of the public would believe the
same. But both the experts and the public are demonstrably wrong, as we have shown
in Figures 1-3 (the probability increases in this example but is still very small). In
practice such misunderstanding about the true probability has been known to lead not
only to unnecessary anguish by the patient but also to further unnecessary tests and
even unnecessary surgery (see, for example, [20] which gives a comprehensive
analysis of this phenomenon in the case of screening for breast cancer).
3 The background to the medical negligence case
The patient was an insulin-dependent diabetic admitted to hospital suffering from
headaches and vomiting. Initial scans were negative, but the patient then developed a
(pupil-sparing) 3rd Nerve Palsy. This condition is fairly common and, being pupil-
sparing, the cause is normally ischaemic, meaning that the patient makes a full
recovery without treatment. Indeed, this particular patient had previously suffered two
similar such ischaemic episodes but had fully recovered from them without treatment.
However, in a small percentage of cases, the cause of a pupil-sparing 3rd nerve Palsy
can be either an expanding aneurysm or a cavernous sinus pathology (CSP). In either
of these cases urgent diagnosis and treatment is required, otherwise the condition can
be fatal. The doctor in charge of the patient recommended an MRA (Magnetic
Resonance Angiography) scan be performed urgently, since this is a non-invasive test
that is reasonably accurate for detecting expanding aneurysms and CSP. However, it
being a Friday evening the hospital could not offer such a test until the following
Monday morning. Consequently, the patient was transferred to a specialist hospital
that had the equipment to carry out the test immediately.
Contrary to the recommendation of the first doctor, the doctors at the specialist
hospital decided to perform an alternative test, called a catheter angiogram (CA). This
test is recognised as being more accurate than the MRA scan for diagnosing
aneurysms. However, it cannot diagnose CSP at all and, as an invasive test, it carries a
known 1% risk of causing a permanent stroke in diabetic patients. The CA test
(which was performed early afternoon on the Saturday) indeed caused the patient to
suffer a permanent stroke. The cause of the 3rd nerve palsy was subsequently found to
be ischaemic (so it is assumed that the patient would have recovered without
treatment).
The claim of negligence brought by the patient was based primarily on the notion that
the patient was not adequately informed of the relative risks of the alternative
treatments; this would have indicated that the sensible pathway was to give the MRA
test because, although it carried a very small risk (5%) of non-detection (which could
be fatal), this risk was tiny in comparison with the risk (1%) of a stroke from the
invasive CA test.
It turned out that, despite the statistical data available, neither side could provide a
coherent argument for directly comparing the risks of the alternative pathways. One
surgeon advising the claimant’s legal team argued that, by using Bayes Theorem, he
could 'prove' the risk of the non-invasive MRA test was much less than the CA test.
His proof was essentially as follows:
1. The risk from the MRA test is of the patient dying as a result of failing to
detect an expanding aneurysm.
2. The prior probability of there being such an expanding aneurysm is very low.
3. Bayes theorem calculates the posterior probability that the palsy is caused by
an aneurysm given that the MRA test is negative. This probability is
exceptionally low – much lower than the 1% probability of stroke risk from
the CA test.
However, because the surgeon used the Bayes Theorem formula neither the lawyers
nor other doctors could understand his 'proof' and they sought some expert advice to
both validate the argument and present it in a user-friendly way.
The proof was essentially sound, but was missing some significant variables and
interactions between them (such as the different impact and detection rates of large
and small aneurysms and the role of CSP as a potential alternative cause of the
symptoms).
4 The formal argument in the medical negligence
case
Our approach to analysing this problem was to build a causal model (known as a
Bayesian network) that contained all of the relevant variables and dependencies (we
discuss this in Section 5). We could then run the model in a tool that automatically
performs the correct Bayesian calculations under a range of different assumptions and
scenarios (including both the claimant and defence figures and differing medical
opinions). This was, for us, a necessary validation to convince ourselves of the
argument and results, but we were fully aware that such a model and its calculations were
not appropriate for the lawyers and doctors to understand and present in court.
It turned out that, although the argument was significantly more complex than the
classic example presented in Section 2, it was still possible to use the event/decision
tree approach. What we had to do was to present the two alternative pathways (MRA
test or CA test) as two separate decision trees as shown in Figure 4 and Figure 5
respectively.
Figure 4 Decision/event tree for MRA Test Pathway: of 1,000,000 people, 10,000 (1%) have
an aneurysm (99% large, i.e. 9,900, and 1% small, i.e. 100), 10,000 (1%) have CSP and
980,000 (98%) have the ischaemic condition. The MRA test detects 95% of large aneurysms
(9,405 detected, 495 undetected), 50% of small aneurysms (50 detected, 50 undetected) and
90% of CSP cases (9,000 detected, 1,000 undetected). Undetected aneurysms burst and bleed
fatally in 2% of cases (10 deaths from large, 1 from small), while 20% of detected CSP
cases (1,800) and 50% of undetected CSP cases (500) die. The total of 2,311 deaths out of
1,000,000 gives a risk of 0.2311%.
Figure 5 Decision/event tree for Catheter Angiogram Pathway: the same 1,000,000 people
(9,900 large aneurysms, 100 small, 10,000 CSP, 980,000 ischaemic), but the CA test detects
99% of large aneurysms (9,900 detected, 100 undetected), 90% of small aneurysms (90
detected, 10 undetected) and no CSP cases at all (all 10,000 undetected), and every patient
tested carries a 1% risk of a permanent stroke. Undetected aneurysms again burst fatally in
2% of cases (2 deaths from large, 0 from small) and 50% of the undetected CSP cases die
(5,000), giving 5,002 deaths in total; in addition approximately 9,950 people (including
9,799 with the relatively harmless ischaemic condition) suffer a permanent stroke without
dying. The total of 14,952 harmed out of 1,000,000 gives a risk of 1.495%.
The decision trees show clearly the various assumptions made (in this case by the
claimant’s legal team). For example, it is assumed that, if an aneurysm really is the
cause of the palsy then if this aneurysm is undetected there is a 2% chance that it will
burst and bleed (causing death) within 48 hours. The relevance of the 48 hours is that
this was the approximate time before an informed discussion about the patient’s
condition could take place (on the Monday morning), i.e. this was the time during
which the ramifications of the alternative pathways were relevant.
The key thing about the decision trees is that, by starting with a hypothetical million
people similar to the patient, it is easy to conceptualise at each stage the number of
people who follow the separate treatment paths of the tree. For example, for the MRA
test there will be approximately 10 people who die within 48 hours as a result of an
undetected large aneurysm, whereas with the CA test there will be approximately
9799 people with the relatively harmless ischaemic condition who would suffer a
permanent stroke.
Other than an understanding of how to calculate percentages, the decision tree
approach is fully understandable without any mathematics. Moreover (using an Excel
spreadsheet version), it also allows us to consider a wide range of different
assumptions. For example, the defence argued that the 'prior' probability of an
aneurysm being the cause of the palsy could be as high as 20% rather than the 1%
assumed by the claimant’s lawyers, and that the accuracy of the MRA test in detecting
large aneurysms could be as low as 80% rather than the 95% argued by the claimant.
Such changes are simple to apply. It turned out that, even with these assumptions
(and, indeed, others that were most favourable to the defence case) the risk of the
invasive surgery (CA) was always significantly higher: from out of the million people
the number who would get a stroke/die as a result of the invasive test was much
higher than the number who would die as a result of the non-invasive test.
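To illustrate how easily such sensitivity analysis can be performed, the following Python
sketch reproduces the spreadsheet-style calculation behind Figures 4 and 5. The default
parameter values are the claimant's assumptions; the final two calls re-run the
calculation under the defence's figures. This is our reconstruction for illustration, not
the actual spreadsheet used in the case:

    def mra_harmed(n=1000000, p_aneurysm=0.01, frac_large=0.99,
                   p_csp=0.01, det_large=0.95, det_small=0.50, det_csp=0.90):
        """Expected deaths within 48 hours on the MRA pathway (Figure 4)."""
        large = n * p_aneurysm * frac_large
        small = n * p_aneurysm * (1 - frac_large)
        csp = n * p_csp
        return (large * (1 - det_large) * 0.02    # undetected large aneurysm bursts
                + small * (1 - det_small) * 0.02  # undetected small aneurysm bursts
                + csp * det_csp * 0.20            # CSP detected but still fatal
                + csp * (1 - det_csp) * 0.50)     # CSP undetected

    def ca_harmed(n=1000000, p_aneurysm=0.01, frac_large=0.99,
                  p_csp=0.01, det_large=0.99, det_small=0.90, p_stroke=0.01):
        """Expected deaths plus permanent strokes on the CA pathway (Figure 5).
        The CA test detects no CSP cases and carries a 1% stroke risk for all.
        (Unlike Figure 5, people who both stroke and die are counted once in
        each category, so the total is slightly above the figure's 14,952.)"""
        large = n * p_aneurysm * frac_large
        small = n * p_aneurysm * (1 - frac_large)
        csp = n * p_csp
        deaths = (large * (1 - det_large) * 0.02
                  + small * (1 - det_small) * 0.02
                  + csp * 0.50)                   # all CSP undetected, 50% fatal
        return deaths + n * p_stroke              # add the strokes

    print(mra_harmed())                                 # ~2,311 (claimant figures)
    print(ca_harmed())                                  # ~15,002 (claimant figures)
    print(mra_harmed(p_aneurysm=0.20, det_large=0.80))  # ~3,112 (defence figures)
    print(ca_harmed(p_aneurysm=0.20))                   # ~15,044: still far higher

Changing any single assumption is a one-line edit, which is precisely what made this style
of analysis usable in discussions between the experts and the lawyers.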
The numerical values resulting from the decision tree approach contrast with the
probability outputs that are obtained from the purely Bayesian mathematical version
of the same argument. For example, the latter results in a probability of 0.00001 of
death within 48 hours as a result of an undetected large aneurysm (under MRA test)
and a probability of 0.009799 that someone with the harmless ischaemic condition
would suffer a permanent stroke (under CA test). Both probabilities are clearly
‘small’ and it is difficult for lay people to appreciate the differences between these
probabilities [20]. Of course, it is possible to simply turn the argument round and
apply the probabilities to the same hypothetical one million patients (as we did in the
decision tree approach) but this is generally unsatisfactory because, as we have argued
above, lay people generally neither understand nor even believe the probabilities.
The effectiveness of the decision tree approach in the particular example was
demonstrated very clearly:
The only person working on the claimant’s legal team who understood the
Bayesian formulaic calculations was the expert witness surgeon who had
originally provided the calculations. The others who stated that they could not
understand the argument at all were: the barrister, the main lawyer working on
the case, and two other doctors involved in the case as expert witnesses.
Another lawyer supporting the main lawyer had a partial understanding, but
insufficient to explain it to his colleagues.
When presented with the decision tree approach every member of the above
legal team said that they now understood the argument. The QC grasped it
immediately and described how he would be able to use this explanation in
court without having to resort to mathematics.
5 Limitations of the Decision Tree Approach
The decision tree presentation of Bayes worked in this case because there were
sufficiently few 'linked variables'.
Generally, decision trees are suitable if the following requirements are satisfied:
The alternative pathways represented by the different decision trees are
independent (in the sense that they do not rely on some common test or action
that has not been modelled).
There are no more than a small number of variables, since even if each
variable had only two outcomes there are 2^n different paths for n variables. As
a rule of thumb 6 is a reasonable limit.
Each variable has only a small number of outcomes (as a rule of thumb, fewer
than 5). So, for example, if it makes sense to consider ‘size of aneurysm’
in terms of a set of outcomes like {‘small’, ‘large’}, or even {‘none’, ‘small’,
‘medium’, ‘large’} then this can be accommodated in a decision tree. But if
the outcome is on a continuous scale, say 0 to 1000 in millimetres, then it
would not be possible to use a decision tree.
There are no additional causes, effects and dependencies between the
variables.
If these requirements are not satisfied the use of decision trees can become
impractical or impossible. The same is, of course, also true of any attempt to explain
Bayes from first principles using the formulaic approach; the calculations are beyond
even the most experienced mathematicians.
Hence, in such circumstances we believe that the use of Bayesian networks (causal
probability models) is inevitable, but raises again the issue of how to present the
results in a way that is understandable to lay people. Bayesian networks (referred to
subsequently as simply BNs) are graphical models (such as the one in Figure 6) where
the nodes represent uncertain variables and the arcs represent causal or influential
relationships (an accessible introductory overview of BNs can be found in [18]). BNs
have been fairly widely used in the medical domain since algorithmic breakthroughs
in the late 1980s [34] [49] meant that large-scale BN models could be efficiently
calculated. Indeed, the first commercial BN software arose out of a project to
construct a BN model for a particular type of medical diagnosis [3]. Clinical decision
support systems based on BNs were first developed in the late 1980s [7] [24]. There
have since been many hundreds of BN papers published within the medical domain.
Examples include BN models for:
diagnosis of specific diseases [2] [13] [23] [27] [28] [40] [43] [57]
predicting risk of specific diseases [8] [32] [53] [54]
predicting specific medical outcomes [19] [25] [31] [52] [56]
analysing impact of treatment [12]
analysing test results [23] [39]
improved medical procedures [6] [22] [36] [37] [38]
cost-effectiveness analysis of different treatments [10] [45]
General guidelines on using BNs in medical applications can be found in
[41] [44] [48], while comparisons of BNs with alternative approaches in the medical
context can be found in [5] [15].
The BN model for the problem we have discussed is shown in Figure 6.
Figure 6 Bayesian network model with initial probabilities
Like all BNs, the model has two components.
1. The graphical component shown in Figure 6 that describes the causal
structure. Thus, for example, the graphical component tells us that ‘death
within 48 hours’ is caused/influenced by the combination of the cause (of the
palsy) and whether the test correctly identifies the cause.
2. A probability table associated with each node in the model. For nodes without
parents the probability table is simply the prior probabilities for each of the
node states. For nodes with parents the probability table is the prior
probability for each of the node states given each of the combinations of
parent states. For example, the probability table for the node “test correctly
identifies cause” is shown in Figure 7 (a programmatic sketch of such a model
follows the figure).
Figure 7 Probability table for the node "test correctly identifies cause”
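To make these two components concrete, here is a minimal sketch of a simplified version
of the model, written with the open-source pgmpy Python library. The library choice, the
reduced node set and the state names are our assumptions for illustration (the case
itself used the AgenaRisk tool discussed below, and the full model also covers stroke
outcomes); the probabilities are the claimant figures from Figures 4 and 5:

    from pgmpy.models import BayesianNetwork
    from pgmpy.factors.discrete import TabularCPD
    from pgmpy.inference import VariableElimination

    # Structure: the cause of the palsy and the test chosen jointly determine
    # whether the cause is detected; cause and detection determine death.
    model = BayesianNetwork([('Cause', 'Detected'), ('Test', 'Detected'),
                             ('Cause', 'Death'), ('Detected', 'Death')])

    causes = ['large_aneurysm', 'small_aneurysm', 'csp', 'ischaemic']
    cause_cpd = TabularCPD('Cause', 4, [[0.0099], [0.0001], [0.01], [0.98]],
                           state_names={'Cause': causes})
    test_cpd = TabularCPD('Test', 2, [[0.5], [0.5]],
                          state_names={'Test': ['MRA', 'CA']})

    # P(Detected | Cause, Test); columns ordered with Cause varying slowest.
    detect_cpd = TabularCPD(
        'Detected', 2,
        # Cause:  large        small        csp          ischaemic
        # Test:   MRA   CA     MRA   CA     MRA   CA     MRA   CA
        [[0.95, 0.99, 0.50, 0.90, 0.90, 0.00, 1.00, 1.00],   # yes
         [0.05, 0.01, 0.50, 0.10, 0.10, 1.00, 0.00, 0.00]],  # no
        evidence=['Cause', 'Test'], evidence_card=[4, 2],
        state_names={'Detected': ['yes', 'no'], 'Cause': causes,
                     'Test': ['MRA', 'CA']})

    # P(Death | Cause, Detected): undetected aneurysms burst with probability
    # 2%; CSP kills 20% of patients even if detected, 50% if not.
    death_cpd = TabularCPD(
        'Death', 2,
        [[0.00, 0.02, 0.00, 0.02, 0.20, 0.50, 0.00, 0.00],   # die
         [1.00, 0.98, 1.00, 0.98, 0.80, 0.50, 1.00, 1.00]],  # survive
        evidence=['Cause', 'Detected'], evidence_card=[4, 2],
        state_names={'Death': ['die', 'survive'], 'Cause': causes,
                     'Detected': ['yes', 'no']})

    model.add_cpds(cause_cpd, test_cpd, detect_cpd, death_cpd)
    assert model.check_model()

    infer = VariableElimination(model)
    # Entering the observation "MRA test performed" and propagating gives
    # P(die) = 0.2311%, matching Figure 4; the CA test gives just over 0.5%.
    print(infer.query(['Death'], evidence={'Test': 'MRA'}))
    print(infer.query(['Death'], evidence={'Test': 'CA'}))

Substituting the defence's disputed values is just a matter of editing the entries of the
relevant probability tables and re-running the queries.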
By building the BN in a tool (such as AgenaRisk [1]) we can enter observations in the
model, such as “MRA test is performed”, and run the model. What happens is that the
various Bayesian calculations are performed automatically and the probabilities for all
of the unknown variables are revised as shown in Figure 8. We can see, for example,
that the probability of “stroke and not death” is 0% compared with just under 0.5% in
the initial model, while the probability of “death within 48 hours” is about 0.231%
compared to 0.366% in the initial model. The figure 0.231% equates to
approximately 2311 people in one million, the same result as seen in the decision tree
of Figure 4.
Figure 8 Revised Probabilities with MRA test performed
Similarly, Figure 9 shows what happens in the case of the CA test. We can see, for
example, that the probability of “stroke and not death” is now just under 1%
compared with just under 0.5% in the initial model (and 0% in the case of the MRA
test), while the probability of “death within 48 hours” is just over 0.5% compared to
0.365621% in the initial model (and 0.231% in the case of the MRA test). Again these
results are essentially the same as in the decision tree of Figure 5.
Figure 9 Revised probabilities with catheter (CA) test performed
So what is needed to accept the BN argument? The Bayesian calculation algorithms in
tools like AgenaRisk have an established pedigree, so in principle we should need
only the following:
1. To agree on what the causal structure should be: There are actually two
stages in this process: (i) identifying a minimal necessary set of variables
(nodes); and (ii) agreeing on the relevant links between the nodes. There are
many books and papers that provide guidelines on both of these steps (see, for
example [26] [50]). Our experience suggests that they are best achieved with a
BN expert and a domain expert (medical in this case) working together. If
there is more than one stakeholder then genuine disagreements can be
accommodated by producing alternative models (in many cases, for example,
experts will not agree on the direction of the links but the resulting alternative
models may still produce exactly the same results; what differs are the
probabilities that need to be elicited). The BN expert can advise on which
structures capture appropriate assumptions of independence and dependence
between variables, and also on which structures are computationally infeasible
(recommending equivalent feasible structures where appropriate).
2. To agree on the values for the probability tables. Specifically, for each node
with parents we have to specify the probability of that node’s states given
every combination of the parent nodes’ states. For nodes without parents we
have to specify the prior probabilities for each state. In the example in this
paper all of the probability values were taken from empirical studies provided
by the medical experts (where no such data is available we have to rely on
expert judgement). Where different studies provide conflicting probabilities
(as in the case of a test correctly identifying a cause) we simply create
alternative models representing the different values. In this case we created
one version with probabilities representing the most favourable results from
the defence perspective and one representing the results of the empirical
studies cited by the claimant. Running both models provides results at the two
extremes (in both cases the final result supported the claimant’s main
argument). A single BN model can also be used to test a wide range of
different assumptions by creating different ‘what-if’ scenarios involving a
range of different state observations on particular nodes. The same approach
can also be used to perform sensitivity analysis. The overall impact of these
methods is to lessen the dependence on individual assumptions.
It is important to note that the above required assumptions are no different from what
is needed to produce the decision tree argument. However, the real challenge is that,
whereas you might convince mathematically competent people that the Bayesian
calculations (that they understood in the simple case) scale up and are calculated
correctly in the tool, medical and legal professionals are reluctant to accept this. Such
professionals normally expect some simple argument to lead them to the final result in
all cases, and they are not convinced until the whole calculation is clear to them. This
is, of course, impossible. It is also irrational, given the established and
(mathematically) universally agreed pedigree of Bayes Theorem; the same people
would surely not reject the use of calculators to perform long division on the basis
that it is too difficult to understand the underlying sequence of actions that take place
at the hardware circuit level. Nevertheless, the concern is real and has impeded the
adoption of Bayes in both the medical and legal professions. Three factors have
perpetuated the problem:
1. There is a misunderstanding of the nature of Bayes Theorem. Since the
theorem is a formal mechanism for revising subjective beliefs in the light of
evidence, members of both the legal and medical professions have perceived it
as infringing on the role of the jury and doctors respectively.
2. On the rare occasions where Bayes has been introduced into court, experts
have attempted to explain the calculations from first principles rather than
simply presenting the results of the calculations. Moreover, these first
principle arguments have attempted to use the formulaic approach rather than
the alternatives discussed here. In doing so they confused the jury, judge and
lawyers [16].
3. Whereas there have been many prominent campaigns by statisticians and
others to promote acceptance of Bayes Theorem, there have been none to our
knowledge to promote acceptance of the Bayesian calculation algorithms
necessary for all but the simplest problems.
In all but the simplest situations (where something like the decision tree works) it is
unreasonable to expect lay people to understand the Bayesian reasoning. But this
should not stop us from presenting the results of calculations from a BN model. And
such results should only be challenged on the basis of the prior assumptions (causal
structure and probability tables), not on the Bayesian calculations that follow from the
prior assumptions.
There is one important additional issue that needs to be considered when presenting
the results of a complex Bayesian calculation. Compare the following statements:
1. Out of one million people 1000 are likely to die from treatment A, but only 10
are likely to die from treatment B.
2. The probability of dying from treatment A is 0.001, but the probability of
dying from treatment B is 0.00001.
Although the statements are equivalent, numerous studies have shown that statement
1 is more easily understandable to most people than statement 2 (the reference [20],
for example, describes a number of such studies). Indeed, the failure of both doctors
and lawyers to fully understand a statement like 2 was the reason why the original
Bayesian presentation was considered unacceptable in the medical negligence case.
Although it is straightforward to ‘transform’ a statement like 2 into a statement like 1,
such a transformation should be done within the BN model rather than outside. This
means incorporating numeric variables like ‘number of deaths’ into the model. Until
very recently no BN tool was able to incorporate numeric nodes accurately (a fact
which, in itself, has been an impediment to more widespread use of BNs in practice).
However a recent breakthrough algorithm for dynamically discretising numeric nodes
[46] (described in overview form in Appendix 2) makes such accurate computations
possible and this algorithm is implemented in the latest version of AgenaRisk. As
shown in Figure 10 we can simply insert relevant numeric nodes with the appropriate
formulas for their probability tables.
Figure 10 Bayesian network with additional nodes for number of people (out of 1,000,000)
affected
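One simple way to think about the 'number of people affected' nodes in Figure 10 is as
Binomial counts derived from the model's probability outputs. The following Python sketch
is our illustration of that transformation (using a normal approximation rather than the
dynamic discretisation of Appendix 2); it turns statement-2-style probabilities into
statement-1-style frequencies, with an indication of their natural variability:

    import math

    def count_summary(n, p):
        """Mean and approximate 95% interval for a Binomial(n, p) 'number of
        people affected' node, using the normal approximation."""
        mean = n * p
        sd = math.sqrt(n * p * (1 - p))
        return mean, (mean - 1.96 * sd, mean + 1.96 * sd)

    # Model outputs in probability format (statement 2), converted to counts
    # out of one million people (statement 1).
    for label, p in [('MRA deaths', 0.002311), ('CA deaths', 0.005002)]:
        mean, (lo, hi) = count_summary(1000000, p)
        print(f"{label}: about {mean:.0f} out of 1,000,000 "
              f"(95% interval {lo:.0f} to {hi:.0f})")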
6 Summary and Conclusions
We have shown how simple decision trees were used effectively to distinguish the
different levels of risk of alternative diagnostic tests in a real medical negligence case.
The decision tree provides a simple and clear visual explanation of an application of
Bayes Theorem. Whereas lay people are known to have problems understanding the
Bayesian argument when presented in the mathematical way, their understanding is
radically improved by the visual representation. This method is widely generalisable.
For more complex situations involving more causal variables and dependences, the
decision tree approach is not feasible. However, another visual modelling approach,
namely Bayesian networks, provides an elegant solution in which all calculations are
done automatically. The BN approach offers a number of advantages:
The causal structure concretely represents legal/medical pathways that
otherwise get contorted by natural language.
The separate medical pathways are all captured in the same model (for
decision trees you have to create separate trees for each pathway).
The models can be built with different prior probability assumptions. Hence,
in this case we were able to run different scenarios using both the defence and
claimant assumptions. The model showed that, even using the defence’s own
assumptions, the risk of the CA test was greater than that of the MRA test.
The challenge is to convince medics and lawyers to accept the calculations that result
from such a BN model and to focus their attention purely on the initial probability
assumptions in such models.
The kind of modelling we have used also helps address a number of general and
widely applicable concerns about the decision-making process within the medical
profession. These concerns include:
the ethics of informed consent: the results of both the decision tree and BN
approaches could be used by both doctors and patients to help them make
more informed decisions
patient care liabilities when errors are made: the BN models can provide
proper quantification of the impact of such errors.
faulty research: both the decision tree and BN approaches expose the research
problem of focusing on ‘true positives’ while ignoring ‘false positives’.
We envisage a future where doctors will have immediate access to BN-based decision
support systems that automatically provide the quantified risks of choosing alternative
diagnostic test pathways for any type of condition based on 'live' data of: prior
probabilities for the condition (including patient-specific data), the accuracy and
sensitivity of the various tests, and test outcomes. Moreover, the decision-support systems
will be able to present the results in a form that is easily understandable to the patient
as well as the doctor. While the decision as to which pathway to take ultimately still
rests with the doctor and not with a computer, at least this way the doctor and patient
will be properly informed of the relative risks.
7 Acknowledgements
We would like to thank Professor Max Parmar, whose recommendation led to our
introduction to the case described in this paper. We also acknowledge the
contribution of the lawyers and doctors involved in the case who provided us with
many of the details and insights described. The work has also benefited from our
involvement in the EPSRC project DIADEM (EP/G001987/1) and in particular
colleagues William Marsh, George Hanna and Peter Hearty. Finally, we gratefully
acknowledge the anonymous referees for their incisive and helpful comments, which
have led to an improved paper.
8 References
[1] Agena Ltd (2008). AgenaRisk, http://www.agenarisk.com.
[2] Alvarez, S. M., Poelstra, B. A. and Burd, R. S. (2006). "Evaluation of a
Bayesian decision network for diagnosing pyloric stenosis." J Pediatr Surg
41(1): 155-61; discussion 155-61.
[3] Andreassen, Woldbye, M., Falck, B. and Andersen, S. K. (1987). MUNIN: a
causal probabilistic network for interpretation of electromyographic findings.
10th International Joint Conference on Artificial Intelligence. Milan, Italy:
366-372.
[4] Ancker, J. S., Y. Senathirajaha, et al. (2006). "Design Features of Graphs in
Health Risk Communication: A Systematic Review " J Am Med Inform
Assoc. 13: 608-618.
[5] Aronsky, D., Fiszman, M., Chapman, W. W. and Haug, P. J. (2001).
"Combining decision support methodologies to diagnose pneumonia." Proc
AMIA Symp: 12-6.
[6] Athanasiou, M. and Clark, J. Y. (2007). A Bayesian Network Model for the
Diagnosis of the Caring Procedure for Wheelchair Users with Spinal Injury.
Twentieth IEEE International Symposium on Computer-Based Medical
Systems (CBMS'07) 433-438.
[7] Beinlich, I., G. Suermondt, R. Chavez and G. Cooper (1989). The ALARM
Monitoring System: A Case Study with Two Probabilistic Inference
Techniques for Belief Networks. Second European Conf. Artificial
Intelligence in Medicine. 38: 247-256.
[8] Burnside, E. S., Rubin, D. L., Fine, J. P., Shachter, R. D., Sisney, G. A. and
Leung, W. K. (2006). "Bayesian network to predict breast cancer risk of
mammographic microcalcifications and reduce number of benign biopsy
results: initial experience." Radiology 240(3): 666-73.
[9] Casscells, W., Schoenberger, A. and Graboys, T. B. (1978). "Interpretation by
physicians of clinical laboratory results." New England Journal of Medicine
299 999-1001.
[10] Cooper, N., Sutton, A. and Abrams, K. (2002). "Decision analytical economic
modelling within a Bayesian framework: application to prophylactic
antibiotics use for caesarean section." Stat Methods Med Res 11(6): 491-512.
[11] Cosmides, L. and Tooby, J. (1996). "Are humans good intuitive statisticians
after all? Rethinking some conclusions from the literature on judgment under
uncertainty." Cognition 58 1-73
[12] Cowell, R. G., Dawid, A. P., Hutchinson, T. A. and Spiegelhalter, D. J.
(1991). "A Bayesian expert system for the analysis of an adverse drug
reaction." Artificial Intelligence in Medicine 3: 257-270.
[13] Cruz-Ramirez, N., Acosta-Mesa, H. G., Carrillo-Calvet, H., Alonso Nava-
Fernandez, L. and Barrientos-Martinez, R. E. (2007). "Diagnosis of breast
cancer using Bayesian networks: A case study." Comput Biol Med 37(11):
1553-64.
[14] Dawid, A. P. (2002). Bayes’s theorem and weighing evidence by juries. In
Bayes’s Theorem: Proceedings of the British Academy. R. Swinburne.
Oxford, Oxford University Press. 113: 71-90.
[15] Dexheimer, J. W., Brown, L. E., Leegon, J. and Aronsky, D. (2007).
"Comparing decision support methodologies for identifying asthma
exacerbations." Stud Health Technol Inform 129: 880-4.
[16] Donnelly, P. (2005). "Appealing Statistics." Significance 2(1): 46-48.
[17] Eddy, D. M. (1982). Probabilistic reasoning in clinical medicine; problems
and opportunities. Judgment under uncertainty: Heuristics and biases. D.
Kahneman, P. Slovic and A. Tversky, Cambridge University Press: 249-67.
[18] Fenton, N. E. and Neil, M. (2007). Managing Risk in the Modern World:
Bayesian Networks and the Applications, London Mathematical Society,
Knowledge Transfer Report. 1.
www.lms.ac.uk/activities/comp_sci_com/KTR/apps_bayesian_networks.pdf
[19] Gevaert, O., et al. (2006). "Predicting the outcome of pregnancies of unknown
location: Bayesian networks with expert prior information compared to
logistic regression." Human Reproduction 21(7): 1824-1831.
[20] Gigerenzer, G. (2002). Reckoning with Risk: Learning to Live with
Uncertainty. London, Penguin Books.
[21] Goodman, J. (1992). "Jurors’ Comprehension and Assessment of Probabilistic
Evidence, ." Am. J. Tr. Advoc’y 16 361.
[22] Haddawy, P., Kahn, C. E., Jr. and Butarbutar, M. (1994). "A Bayesian
network model for radiological diagnosis and procedure selection: work-up of
suspected gallbladder disease." Med Phys 21(7): 1185-92.
[23] Hamilton, P. W., Anderson, N., Bartels, P. H. and Thompson, D. (1994).
"Expert system support using Bayesian belief networks in the diagnosis of fine
needle aspiration biopsy specimens of the breast." J Clin Pathol 47(4): 329-36.
[24] Heckerman, D. E., E. J. Horvitz and B. N. Nathwani (1992). Towards
normative expert systems: part I - the Pathfinder project. Methods of
Information in Medicine 31, 90-105.
[25] Hoot, N. and Aronsky, D. (2005). "Using Bayesian networks to predict
survival of liver transplant patients." AMIA Annu Symp Proc: 345-9.
[26] Jensen, F. V. and Nielsen, T. (2007). Bayesian Networks and Decision
Graphs, Springer-Verlag New York Inc.
[27] Kahn, C. E., Jr., Laur, J. J. and Carrera, G. F. (2001). "A Bayesian network for
diagnosis of primary bone tumors." J Digit Imaging 14(2 Suppl 1): 56-7.
[28] Kahn, C. E., Jr., Roberts, L. M., Shaffer, K. A. and Haddawy, P. (1997).
"Construction of a Bayesian network for mammographic diagnosis of breast
cancer." Comput Biol Med 27(1): 19-29.
[29] Kahneman, D., Slovic, P. and Tversky, A. (1982). Judgement Under
Uncertainty: Heuristics and Biases, New York: Cambridge University Press.
[30] Kaye, D. H. and Koehler, J. J. (1991). "Can Jurors Understand Probabilistic
Evidence?" Journal of the Royal Statistical Society. Series A (Statistics in
Society) 154(1): 75-81.
[31] Kazi, J. I., Furness, P. N. and Nicholson, M. (1998). "Diagnosis of early acute
renal allograft rejection by evaluation of multiple histological features using a
Bayesian belief network." J Clin Pathol 51(2): 108-13.
[32] Kline, J. A., Novobilski, A. J., Kabrhel, C., Richman, P. B. and Courtney, D.
M. (2005). "Derivation and validation of a Bayesian network to predict pretest
probability of venous thromboembolism." Ann Emerg Med 45(3): 282-90.
[33] Kozlov, A.V. and D. Koller, Nonuniform dynamic discretization in hybrid
networks, in Proceedings of the 13th Annual Conference on Uncertainty in AI
(UAI). 1997: Providence, Rhode Island. p. 314-325.
[34] Lauritzen, S. L. and Spiegelhalter, D. J. (1988). Local Computations with
Probabilities on Graphical Structures and their Application to Expert Systems
(with discussion). J. R. Statis. Soc. B, 50, No 2, pp 157-224.
[35] Ledley, R. S. and Lusted, L. B. (1959). "Reasoning foundations of medical
diagnosis; symbolic logic, probability, and value theory aid our understanding
of how physicians reason." Science 130(3366): 9-21.
[36] Leegon, J., Jones, I., Lanaghan, K. and Aronsky, D. (2005). "Predicting
hospital admission for Emergency Department patients using a Bayesian
network." AMIA Annu Symp Proc: 1022.
[37] Lehmann, H. and Shortliffe, E. (1991). "Thomas: building Bayesian statistical
expert systems to aid in clinical decision making." Comput Methods Programs
Biomed 35(4): 251-60.
[38] Lin, J.-H. and Haug, P. J. (2008). "Exploiting missing clinical data in Bayesian
network modeling for predicting medical problems." J. of Biomedical
Informatics 41(1): 1-14.
[39] Lucas, P. J., van der Gaag, L. C. and Abu-Hanna, A. (2004). "Bayesian
networks in biomedicine and health-care." Artif Intell Med 30(3): 201-14.
[40] Luciani, D., et al. (2007). "Bayes pulmonary embolism assisted diagnosis: a
new expert system for clinical use." Emerg Med J 24(3): 157-64.
[41] Mani, S., Valtorta, M. and McDermott, S. (2005). "Building Bayesian
Network Models in Medicine: The MENTOR Experience." Applied
Intelligence 22(2): 93-108.
[42] Marshall, A. H., McClean, S. I., Shapcott, C. M., Hastie, I. R. and Millard, P.
H. (2001). "Developing a Bayesian belief network for the management of
geriatric hospital care." Health Care Manag Sci 4(1): 25-30.
[43] McKendrick, I. J., Gettinby, G., Gu, Y., Reid, S. W. and Revie, C. W. (2000).
"Using a Bayesian belief network to aid differential diagnosis of tropical
bovine diseases." Prev Vet Med 47(3): 141-56.
[44] Montironi, R., Whimster, W. F., Collan, Y., Hamilton, P. W., Thompson, D.
and Bartels, P. H. (1996). "How to develop and use a Bayesian Belief
Network." J Clin Pathol 49(3): 194-201.
[45] Negrín, M. and Vázquez-Polo, F. (2006). "Bayesian cost-effectiveness
analysis with two measures of effectiveness: the cost-effectiveness
acceptability plane." Health Econ 15(4): 363-72.
[46] Neil, M., Tailor, M. and Marquez, D. (2007). "Inference in hybrid Bayesian
networks using dynamic discretization." Statistics and Computing 17(3): 219-
233.
[47] Nikiforidis, G. and Sakellaropoulos, G. (1998). "Expert system support using
Bayesian belief networks in the prognosis of head-injured patients of the
ICU." Med Inform (Lond) 23(1): 1-18.
[48] Nikovski, D. (2000). "Constructing Bayesian Networks for Medical Diagnosis
from Incomplete and Partially Correct Statistics." IEEE Transactions on
Knowledge and Data Engineering 12(4): 509-518.
[49] Pearl, J. (1986). "Fusion, propagation, and structuring in belief networks."
Artificial Intelligence 29: 241-288.
[50] Pearl, J. (2000). Causality: Models Reasoning and Inference, Cambridge
University Press
[51] Piatelli-Palmarini, M. (1994). Inevitable Illusions: How Mistakes of Reason
Rule Our Minds, John Wiley & Sons, Inc.
[52] Sakellaropoulos, G. C. and Nikiforidis, G. C. (1999). "Development of a
Bayesian Network for the prognosis of head injuries using graphical model
selection techniques." Methods Inf Med 38(1): 37-42.
[53] Sanders, D. and Aronsky, D. (2006). "Prospective evaluation of a Bayesian
Network for detecting asthma exacerbations in a Pediatric Emergency
Department." AMIA Annu Symp Proc: 1085.
[54] Sanders, D. L. and Aronsky, D. (2006). "Detecting asthma exacerbations in a
pediatric emergency department using a Bayesian network." AMIA Annu
Symp Proc: 684-8.
[55] Sox, H. C., M. A. Blatt, et al. (2007). Medical Decision Making, ACP Press.
[56] Verduijn, M., Rosseel, P. M., Peek, N., de Jonge, E. and de Mol, B. A. (2007).
"Prognostic Bayesian networks II: An application in the domain of cardiac
surgery." J Biomed Inform.
[57] Warner, H. R., Toronto, A. F. and Veasy, L. G. (1964). "Experience with
Baye's Theorem for Computer Diagnosis of Congenital Heart Disease." Ann N
Y Acad Sci 115: 558-67.
9 Appendix 1: Bayes Theorem
Let A be the event 'person has the disease' and
let B be the event 'positive test'.
We wish to calculate the probability of A given B, which is written p(A|B).
By Bayes Theorem this is:
p(A|B) = p(B|A) p(A) / [ p(B|A) p(A) + p(B|not A) p(not A) ]
Now, in the example of Section 2, we know the following:
p(A) = 0.001
p(not A) = 0.999
p(B|not A) = 0.05
p(B|A) = 1
Hence:
p(A|B) = (1 × 0.001) / (1 × 0.001 + 0.05 × 0.999) = 0.001 / 0.05095
which is equal to 0.01963, i.e. just below 2% as stated in Section 2.
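The calculation can be checked mechanically; for example, the following short Python
fragment (our addition) reproduces it:

    p_a = 0.001             # p(A): prior probability of disease
    p_b_given_a = 1.0       # p(B|A): test sensitivity
    p_b_given_not_a = 0.05  # p(B|not A): false positive rate

    p_a_given_b = (p_b_given_a * p_a) / (
        p_b_given_a * p_a + p_b_given_not_a * (1 - p_a))
    print(p_a_given_b)      # 0.01963..., i.e. just below 2%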
10 Appendix 2: Handling Numeric nodes in Bayesian
networks using dynamic discretisation
Handling continuous numeric nodes in BNs is generally difficult because (except in
very special cases) there is no analytic method for computing the necessary Bayesian
calculations. Consequently, continuous nodes have to be ‘discretised’. For example, a
node representing the size of an aneurysm in millimetres cannot simply be declared to be
‘in the range 0 to 1000’; it must be defined in terms of finite discrete intervals such as
0 to 10, 10 to 20 etc. The standard approach to working with such continuous
numeric nodes in BN tools is to use static discretisation, whereby the set of
discretisation intervals is defined by the user in advance of any computations and does
not change regardless of the evidence entered into the model. But this process is both
complicated and inaccurate. You must guess the state ranges before running the
calculation, thus pre-supposing that you know the resulting probability distribution of
the results beforehand. In simple cases this may be quite easy, but in others it will be
difficult or even impossible. The dynamic discretisation algorithm [46] addresses the
problem in general by using entropy error [33] as the basis for approximation. In
outline, the algorithm follows these steps (a toy illustration follows the list):
Convert the BN to an intermediate structure called a Junction Tree (JT) (this is
a standard method used in BN algorithms and is described in, for example,
[34])
Choose an initial discretisation in the JT for all continuous variables.
Calculate the Node Probability Table (NPT) of each node given the current
discretisation.
Enter evidence and perform global propagation on the JT, using standard JT
algorithms
Query the BN to get posterior marginals for each node, compute the
approximate relative entropy error, and check if it satisfies the convergence
criteria.
If not, create a new discretisation for the node by splitting those intervals with
highest entropy error.
Repeat the process by recalculating the NPTs and propagating the BN, and
then querying to get the marginals and then split intervals with highest entropy
error.
Continue to iterate until the model converges to an acceptable level of
accuracy.
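The heart of the algorithm, splitting whichever interval currently contributes the
largest approximation error, can be illustrated with a toy Python example. The sketch
below discretises a single standard normal density, using total absolute deviation from
the interval average as a crude stand-in for the relative entropy error of [46]; it
illustrates the iteration scheme only, not the full junction-tree machinery:

    import math

    def pdf(x):
        """Standard normal density (stands in for any continuous node)."""
        return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

    def interval_error(a, b, k=20):
        """Crude proxy for the entropy error of an interval: total absolute
        deviation between the density and its average over [a, b]."""
        xs = [a + (b - a) * (i + 0.5) / k for i in range(k)]
        avg = sum(pdf(x) for x in xs) / k
        return sum(abs(pdf(x) - avg) for x in xs) * (b - a) / k

    intervals = [(-5.0, 5.0)]    # initial coarse discretisation
    for _ in range(30):          # fixed budget stands in for a convergence test
        errors = [interval_error(a, b) for a, b in intervals]
        worst = errors.index(max(errors))
        a, b = intervals.pop(worst)
        mid = (a + b) / 2        # split the worst interval in half
        intervals[worst:worst] = [(a, mid), (mid, b)]

    intervals.sort()
    widths = [b - a for a, b in intervals]
    # Refinement concentrates where the density changes fastest, while the
    # flat tails keep wide intervals.
    print(len(intervals), "intervals; narrowest width:", round(min(widths), 4),
          "widest width:", round(max(widths), 4))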
This dynamic discretisation approach allows more accuracy in the regions that matter
and requires less storage space than static discretisation. In the implementation [1] of
the algorithm, the user simply specifies the range of the variable without ever having
to worry about the discretisation intervals. Default settings are provided for the
number of iterations and convergence criteria. However, the user can also select the
number of iterations and convergence criteria, and hence can go for arbitrarily high
precision (at the expense of increased computation times).