Healthcare serial killer or coincidence?
Statistical issues in investigation of suspected
medical misconduct
P.J. Green, R.D. Gill, N. Mackenzie, J. Mortera, W.C. Thompson
27 September, 2022
Abstract
Justice systems are sometimes called upon to evaluate cases in which
healthcare professionals are suspected of killing their patients illegally.
These cases are difficult to evaluate because they involve at least two levels
of uncertainty. Commonly in a murder case it is clear that a homicide
has occurred, and investigators must resolve uncertainty about who is
responsible. In the cases we examine here there is also uncertainty about
whether homicide has occurred. Investigators need to consider whether
the deaths that prompted the investigation could plausibly have occurred
for reasons other than homicide, in addition to considering whether, if
homicide was indeed the cause, the person under suspicion is responsible.
In this report (commissioned by the Section on Forensic Statistics of the
Royal Statistical Society, London) we provide advice and guidance on
the investigation and evaluation of such cases. Our work was prompted
by concerns about the statistical challenges such cases pose for the legal
system.
P.J. Green: University of Bristol, UK
R.D. Gill: Leiden University, Netherlands; email: gill@math.leidenuniv.nl
N. Mackenzie: Arnot Manderson Advocates, Edinburgh
J. Mortera: Università Roma Tre
W.C. Thompson: University of California, Irvine
1 Table of Contents
2 Overview
3 “This could not have been a coincidence!”
3.a Seemingly unlikely coincidences can and do occur
3.b The importance of avoiding illogical inferences from p-values
4 Competing theories
5 Investigative bias
5.a Unconscious bias throughout society
5.b Anatomy of a biased investigation
5.c “Suspicious deaths”
5.d Access and opportunity
5.e Similar circumstances
5.f The role of chance
6 Advice on investigative procedures
6.a Identifying all potential causal factors
6.b Minimising bias
6.c The role for statistics in other specialist evidence
7 Advice for evidence evaluation and case presentation
7.a The lawyer's role
7.b Evaluating event clusters
7.c Recognising the consequences of investigative bias
7.d Avoiding fallacious interpretations of statistical findings
8 Conclusions and summary of recommendations
A Appendices
A.1 Probability and odds
A.2 Statistical significance, effect size and risk
A.3 Sensitivity and specificity
A.4 Bayes' rule
A.5 Cumulative effects of bias: two worked numerical examples
A.5.1 Example 1
A.5.2 Example 2
A.6 Patterns of occurrence of adverse events
A.7 Usual practice in medical statistics and epidemiology
A.8 Annotated code and output
A.9 Members of the working party drawing up this report
B Bibliography
2 Overview
Justice systems are sometimes called upon to evaluate cases in which health-
care professionals are suspected of killing their patients illegally. These cases
are difficult to evaluate because they involve at least two levels of uncertainty.
Commonly in a murder case it is clear that a homicide has occurred, and in-
vestigators must resolve uncertainty about who is responsible. In the cases we
examine here there is also uncertainty about whether homicide has occurred.
Investigators need to consider whether the deaths that prompted the investiga-
tion could plausibly have occurred for reasons other than homicide, in addition
to considering whether, if homicide was indeed the cause, the person under
suspicion is responsible.
In this report, the RSS provides advice and guidance on the investigation
and evaluation of such cases. This report was prompted by concerns about
the statistical challenges such cases pose for the legal system. The cases often
turn, in part, on statistical evidence that is difficult for lay people and even
legal professionals to evaluate. Furthermore, the statistical evidence may be
distorted by biases, hidden or apparent, in the investigative process that render
it misleading. In providing advice on how to conduct investigations in such
cases, this report particularly focuses on minimising the kinds of biases that
could distort statistical evidence arising from the investigation. This report
also provides guidance on how to recognise and take account of such biases
when evaluating statistical evidence and more broadly on how to understand
the strengths and limitations of such evidence and give it proper weight.
This report is designed specifically to help all professionals involved in in-
vestigating such cases and those who evaluate such cases in the legal system,
including expert witnesses. It will also be of interest to scholars and legal profes-
sionals who are interested in the role of statistics in evidentiary proof, and more
generally to anyone interested in improving criminal investigations. With such
a wide range of audiences, it is inevitable that for some readers certain sections
may seem more relevant, and some less so, but we believe it is important not
to aim particular sections at particular kinds of reader. We want, for example,
the barrister to see what advice we give to the expert statistical witness and,
we hope, to understand it, at least in broad terms, and vice versa; we believe
this is important in helping all parties to appreciate the contributions of
others in reaching just outcomes.
Because suspicions about medical murder often arise due to a surprising
or unexpected series of events, such as an unusual number of deaths among
patients under the care of a particular professional, this report will begin (in
Section 3) with a discussion of the statistical challenge of distinguishing event
clusters that arise from criminal acts from those that arise coincidentally from
other causes. This analysis will show that seemingly improbable patterns of
events (eg apparent clusters, rising trends, etc.) can often arise without criminal
behaviour and may therefore have less probative value than people assume for
distinguishing criminality from coincidence.
Section 4 of this report will focus on the competing theories that are often
advanced by the prosecution and defence when a medical professional faces
criminal charges for killing patients. The prosecution’s theory is typically that a
medical professional, previously trusted to perform critical life-saving functions,
has unexpectedly (and sometimes inexplicably) chosen to murder patients in
his or her care. While history has shown that humans are capable of such
behaviour, and there have indeed been cases in which, for example, physicians
have murdered multiple patients, proven instances are nevertheless, thankfully,
extraordinarily rare: a mere handful of documented cases, perhaps a dozen or
so per year, among the many millions of healthcare professionals worldwide. So
the prosecution’s theory in such cases is often one that appears, a priori, to
be improbable. Alternative theories (ie, that some unknown factors, or mere
chance, caused deaths to occur in apparently extraordinary numbers among
patients under the care of a particular professional) often also appear
improbable. So the assessment of the case invariably turns, at least in part, on
weighing or balancing the probabilities of seemingly extraordinary events. Such
assessments are challenging under the best of circumstances but become
especially difficult when the evidence adduced to distinguish between the
competing theories may be biased or presented in a misleading manner.
Section 5 of this report discusses the kinds of investigative biases that can
arise in these cases. Our focus is on ways that investigators’ desires and expec-
tations may unintentionally and even unconsciously influence what they
look for, how they characterise and classify what they find, what they deem to
be relevant and irrelevant, and what they choose to disclose. Examiner bias is a
well-known phenomenon in both scientific and forensic investigations. It arises
in large part from what are known as observer effects, a tendency for human
beings to look for data confirming their expectations (confirmation bias) and
to interpret data in ways that are subtly (and often unconsciously) influenced
by their expectations and desires. Statisticians have long studied the ways in
which examiner bias can distort statistical evidence emerging from scientific
and forensic investigations. In Section 5, we apply insights from this scientific
literature to an analysis of the investigative process in the types of cases dis-
cussed in this report. We also draw examples from investigations of actual cases
that illustrate what we believe to have been biased investigative processes and
discuss how such biases can generate misleading statistical findings. It bears
repeating that our focus in this section is on processes that can unintentionally
and unconsciously influence the investigative process. We are not questioning
the general honesty, integrity or good intentions of those involved in investigat-
ing such cases. We focus instead on investigative procedures that can distort
statistical findings in ways that, while entirely unintentional, may nevertheless
be important.
Section 6 of this report provides advice on how to improve investigative
procedures in order to minimise investigative biases. While it is impossible to
eliminate all human biases from a criminal investigation, there are a number of
procedures that can reduce bias and thereby improve the quality and objectivity
of the evidence emerging from the types of investigations we discuss here. We
focus particularly on the advantages of blinding and masking procedures, which
involve temporarily withholding potentially biasing facts from some of those
involved in the investigation. We go on to discuss ways to reduce “tunnel vision”
in which the investigation becomes a search for evidence confirming a particular
investigative theory while ignoring or dismissing evidence inconsistent with that
theory. We provide and explain advice on the correct analysis of data, and
discuss two worked examples.
Section 7 provides advice on evidence evaluation and fact-finding in these
cases. We expect this report to be relevant and useful anywhere such cases may
arise; hence we do not limit our discussion to the needs of a particular legal
system, and expect our advice to be useful both in inquisitorial and adversarial
legal processes. We believe the statistical issues in these cases pose challenges to
legal fact-finders in every jurisdiction, whether they are professional judges or lay
jurors, and are challenging for lawyers as well. Our advice focuses on identifying
and appreciating ways in which statistical evidence may be misleading, and
ensuring (to the extent possible) that presentations of evidence are balanced in
order to help triers-of-fact appreciate both the strengths and limitations of the
evidence, and give it only the weight it deserves. We will provide examples of
presentations and arguments that we consider to be misleading or inappropriate.
We will discuss cautionary instructions that may be helpful to lay fact-finders.
Ultimately, we hope our comments will help lawyers and judges, and statisticians
and other experts, refine their presentation and evaluation of evidence in these
difficult cases in order to better serve the interests of justice.
Finally, we draw together our main conclusions, and present a summary of
our most important recommendations in Section 8.
3 “This could not have been a coincidence!”
The challenge of drawing conclusions from suspicious clusters of deaths
(or other adverse outcomes)
In some cases, suspicions against medical professionals arise precisely because
an apparently unusual number of deaths occurs among their patients. In other
cases, suspicions arise for unrelated reasons, prompting an examination of the
periods when a particular medical professional was on duty, which in turn
reveals an apparently unusual number of deaths. Either way, investigators face
the statistical challenge of distinguishing event clusters that arise from
criminal acts from those that arise coincidentally from other factors. Seemingly
improbable clusters of events can often arise by chance, without criminal
behaviour, and may therefore have less probative value than people assume for
distinguishing criminality from coincidence.
Lucy and Aitken published an analysis of evidence used to prosecute medical
professionals accused of harming their patients [1]. They found (see p. 152) that
“evidence of attendance” was “by far the most frequently occurring” yet was also
“the most difficult type of evidence, both from a legal and epistemological point
of view”. While other types of evidence may be presented, such as evidence
of a criminal intent (mens rea) and the means to carry it out, or eyewitness
accounts, these tend to be less than definitive for a variety of reasons, such as
the difficulty of ascertaining retrospectively the exact cause of death and the
uncertainty inherent in assessing human motives and behaviour. Statistics on
the relative rate of deaths when a particular professional was “in attendance”
may, by contrast, seem more objective and scientific, making statistical evidence
the lynchpin of these cases.
Drawing causal conclusions from a statistically improbable cluster of events
is often challenging, however [2]. A criminal investigation is analogous to a
retrospective observational study. In such a study, it is possible to ascertain
correlations between variables. The study might establish, for example, that the
death rate was higher when a particular medical professional was present on a
hospital ward. However, one of the fundamental principles of logical inference
is that correlation does not prove causation. The increase in death rate cannot,
in itself, prove that the professional in question was engaging in misconduct
that caused the increase in deaths, because other factors, known as confounding
variables, might offer alternative explanations [3]. Competent investigators are
attentive to the possibility of confounding variables and may attempt to take
them into account. Even if all known confounding variables are taken into
account, however, there might be additional confounders, unknown, unmeasured,
unmeasurable, or otherwise inadequately dealt with, that affect mortality rates
when a given medical professional is on duty. For example, there may be changes
in the circumstances and characteristics of the hospital for reasons that are not
measured, or not even observable. So an association between a particular
professional and high mortality rates cannot, on its own, be given a causal
interpretation.
[1] Lucy and Aitken (2002).
[2] Wartenberg (2001).
[3] A confounder is a variable, not of prime concern in a study, that is associated with both
the ‘exposure’ (eg presence of a particular nurse) and the ‘outcome’ (eg unexpected death
of a patient). Neglect or inadequate attention to confounders typically leads to misleading
conclusions about the causal effect of the exposure.
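To make the role of a confounder concrete, here is a minimal Python sketch using entirely hypothetical counts (not from any real case). It assumes a single measured confounder, shift type: night shifts carry sicker patients, and the nurse under suspicion works mostly nights. Within each shift type her patients die at exactly the same rate as those of her colleagues, yet the crude (pooled) risk ratio is well above 1. The Mantel-Haenszel estimator used to pool the strata is one standard adjustment, chosen here purely for illustration; it is not necessarily the method this report recommends, and it can only adjust for confounders that were actually recorded.

```python
from dataclasses import dataclass


@dataclass
class Stratum:
    """Deaths and patients in one level of a hypothetical confounder (shift type)."""
    name: str
    deaths_exposed: int       # deaths among patients while the suspect was on duty
    patients_exposed: int
    deaths_unexposed: int     # deaths among patients while other staff were on duty
    patients_unexposed: int


def risk_ratio(d1: int, n1: int, d0: int, n0: int) -> float:
    """Ratio of the exposed risk d1/n1 to the unexposed risk d0/n0."""
    return (d1 / n1) / (d0 / n0)


def mantel_haenszel_rr(strata: list[Stratum]) -> float:
    """Mantel-Haenszel risk ratio: pools the within-stratum comparisons."""
    num = sum(s.deaths_exposed * s.patients_unexposed
              / (s.patients_exposed + s.patients_unexposed) for s in strata)
    den = sum(s.deaths_unexposed * s.patients_exposed
              / (s.patients_exposed + s.patients_unexposed) for s in strata)
    return num / den


# Hypothetical data: night shifts are riskier, and the suspect mostly works nights.
strata = [
    Stratum("night", deaths_exposed=40, patients_exposed=1000,
            deaths_unexposed=20, patients_unexposed=500),
    Stratum("day", deaths_exposed=2, patients_exposed=200,
            deaths_unexposed=30, patients_unexposed=3000),
]

# Crude comparison: ignore shift type and pool all patients together.
crude_rr = risk_ratio(
    sum(s.deaths_exposed for s in strata), sum(s.patients_exposed for s in strata),
    sum(s.deaths_unexposed for s in strata), sum(s.patients_unexposed for s in strata),
)
adjusted_rr = mantel_haenszel_rr(strata)

for s in strata:
    rr = risk_ratio(s.deaths_exposed, s.patients_exposed,
                    s.deaths_unexposed, s.patients_unexposed)
    print(f"{s.name:>5} shifts: risk ratio {rr:.2f}")
print(f"crude risk ratio:    {crude_rr:.2f}")
print(f"adjusted risk ratio: {adjusted_rr:.2f}")
```

With these numbers the crude ratio is 2.45 while the adjusted ratio is 1.00: the whole apparent excess is explained by the measured confounder. An unmeasured confounder could produce the same pattern without leaving any trace in the recorded data, which is the point made above.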
It is customary to compute the relative risk, which is the ratio of the death
rate per unit of time when the suspected individual is on duty to the death
rate per unit time when the suspect is not on duty. For example, an analysis
in a prosecution against US nurse Jane Bolding (see box below) found that
patients under her care were 47.5 times more likely to suffer cardiac arrest than
were those of other nurses, and that “an epidemic” of cardiac arrests ceased
when Bolding left the hospital unit where it occurred (Sacks et al., 1988).
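A minimal sketch of the calculation just described, with hypothetical counts. The unit of time is taken to be a shift (an assumption; hours or patient-days would serve equally well), so strictly the quantity computed is a rate ratio. Reporting the absolute difference alongside the ratio guards against the relative-versus-absolute confusion described in footnote 4.

```python
def death_rate(deaths: int, shifts: int) -> float:
    """Deaths per shift; using 'shift' as the unit of time is an assumption."""
    return deaths / shifts


# Hypothetical counts for illustration only, not taken from any real case.
deaths_on, shifts_on = 12, 400      # shifts with the suspect on duty
deaths_off, shifts_off = 15, 1500   # shifts without the suspect

rate_on = death_rate(deaths_on, shifts_on)
rate_off = death_rate(deaths_off, shifts_off)

relative_risk = rate_on / rate_off      # ratio of the two death rates
rate_difference = rate_on - rate_off    # absolute excess deaths per shift

print(f"death rate, suspect on duty:  {rate_on:.3f} per shift")
print(f"death rate, suspect off duty: {rate_off:.3f} per shift")
print(f"relative risk (rate ratio):   {relative_risk:.1f}")
print(f"absolute difference:          {rate_difference:.3f} deaths per shift")
```

Here the relative risk is 3.0, while the absolute excess is 0.02 deaths per shift; neither number says anything yet about why the rates differ.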
To help the court interpret such statistics, experts often report a p-value,
which is an estimate of the probability that the observed (or a greater) number
of deaths would occur by chance, if the risk to the patients in question was in
fact no higher than the risk to similar patients who were not under the care of
the accused.
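One common way to compute such a p-value is sketched below under simplifying assumptions: deaths on the suspect's shifts are modelled as Poisson with mean equal to the baseline death rate times the number of shifts, and the baseline rate is treated as known exactly. In practice the baseline is itself estimated, and a fuller analysis would also allow for confounders and for the biases discussed later. The figures are hypothetical: 12 deaths over 400 shifts with the suspect present, against 15 deaths in 1,500 shifts without.

```python
import math


def poisson_pmf(k: int, mean: float) -> float:
    """P(X = k) for X ~ Poisson(mean), computed in log space to avoid overflow."""
    return math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1))


def poisson_upper_tail(observed: int, mean: float, extra_terms: int = 500) -> float:
    """P(X >= observed) for X ~ Poisson(mean), with mean > 0.

    The upper tail is summed directly instead of computing 1 - P(X < observed),
    which would lose precision for very small p-values. The infinite sum is
    truncated after `extra_terms` terms, which is ample when the mean is modest.
    """
    return sum(poisson_pmf(k, mean) for k in range(observed, observed + extra_terms))


# Hypothetical figures, not from any real case.
baseline_rate = 15 / 1500          # deaths per shift when the suspect is absent
expected = baseline_rate * 400     # expected deaths on the suspect's 400 shifts
observed = 12

p_value = poisson_upper_tail(observed, expected)
print(f"expected {expected:.1f}, observed {observed}, p-value {p_value:.2g}")
```

The resulting p-value of about 0.0009 measures only how surprising 12 deaths would be if the suspect's patients faced the baseline risk; as explained below, it is not the probability that the deaths were a coincidence.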
The case of Jane Bolding
Statistical evidence often plays a prominent role in the investigation of sus-
pected healthcare serial killers. In 1988, Jane Bolding, an American nurse
who worked in the intensive care unit of Prince George’s Medical Center
in Maryland, was prosecuted for serial murder of patients, allegedly by ad-
ministering lethal doses of potassium chloride. The key evidence against
Bolding was the high incidence of cardiac arrest during periods when Bold-
ing was on duty. Evidence suggested that she had been the primary nurse
on duty when 57 heart attacks occurred, while the number during com-
parable periods when other nurses were on duty had never exceeded five.
An analysis by epidemiologists from the U.S. Centers for Disease Control
(CDC) concluded that Bolding’s patients were 47.5 times more likely to experience
cardiac arrest than those of other nurses and that “an epidemic”
of cardiac arrests ceased when Bolding left the hospital unit where it oc-
curred (Sacks et al., 1988; CDC, 1985). Sacks testified at Bolding’s trial
that “[t]he chances of [this large number of cardiac arrests] happening by
chance is about one in 100 trillion.”
Other than the statistical evidence, the key evidence against Bolding was an
alleged confession. During a 23-hour interrogation, Bolding reportedly con-
fessed to killing two patients and agreed to write letters of apology to their
families. She later retracted this confession, however, and it was excluded
from the trial after a judge found that it had been obtained through ille-
gally coercive methods that violated Bolding’s constitutional rights. Con-
sequently, prosecutors had little to rely upon during the trial other than the
statistical evidence. No one testified to seeing Bolding inject any patients
with potassium chloride, and although post-mortem examinations showed
that the patients had higher than normal potassium levels, it was impos-
sible to determine whether potassium chloride poisoning was the cause of
the deaths. Defence lawyers offered alternative theories for the elevated
rate of deaths during periods when Bolding was present.
A judge, who decided the case without a jury, found the prosecution’s sta-
tistical evidence insufficient to warrant a conviction, saying “the state at
most has placed [Bolding] at the scene of the offenses. . . but that is insuf-
ficient to sustain a conviction.” (Washington Post, June 21, 1988). Ac-
cording to the judge, the statistical evidence “failed to supply the missing
link that would connect the defendant with the alleged criminal act,” and
consequently “the state’s reach hopelessly exceeded its grasp” (AP News,
June 20, 1988).
Sacks testified at Bolding’s trial that “[the] chances of [this large number
of cardiac arrests] happening by chance is about one in 100 trillion”. Faced
with such evidence, it is understandable that authorities may declare: “This
could not have been a coincidence!” and conclude that the only reasonable
explanation is criminal misconduct. Yet a judge acquitted Bolding of charges
that she killed three gravely ill patients and attempted to murder two others by
injecting them with massive doses of potassium chloride, finding this statistical
evidence insufficient to sustain a conviction.
As we will explain, there are a number of potential problems with computations
and interpretations based on such evidence. Relative risk is difficult to
compute in a manner that takes appropriate account of all relevant variables,
and efforts to evaluate relative risk may be distorted by a variety of biases and
predictable errors that can make such statistics misleading [4].
Recommendations for appropriate statistical analysis are discussed in Section 6 [5].
We will leave those problems aside for the moment, however, in order
to discuss a more fundamental issue: what conclusions can reasonably be drawn
from a seemingly unlikely cluster of occurrences?
3.a Seemingly unlikely coincidences can and do occur
It is important to acknowledge that seemingly unlikely coincidences occur
regularly; in other words, rare events do happen. The individual chances of winning
a lottery, for example, are often extremely low, yet winners are declared regu-
larly and it is rare that anyone takes the low probability of winning as indicating
that the victory “could not have been a coincidence.” A California couple once
managed to win two separate lotteries in a single day. According to one esti-
mate, the probability of winning both lotteries by chance was approximately
one in 26 trillion [6]. Yet there was no general belief that this probability was low
enough to rule out the possibility it was merely coincidence; no serious claim
that it proved the lotteries had been “fixed”. It is therefore worth consider-
ing why the low p-value assigned to a relative risk analysis can be taken as
powerful evidence that a medical professional engaged in criminal misconduct,
while a similarly low probability of winning a lottery is not viewed as convincing
evidence that the lottery was corrupt.
The difference may arise, in part, from our understanding that many people
play the lottery. There may only be one chance in many millions of winning,
but if many millions of people play, it becomes quite plausible that someone
will, by chance, be a winner (and this holds even without the lottery operator
engineering wins by eg carrying over unclaimed prizes to later draws). When
suspicions arise against a medical professional, by contrast, our focus is on that
single individual. We are not thinking about the likelihood that an unusual
number of deaths will be observed somewhere among the patients of one of
the millions of medical professionals in the world; we are thinking about the
likelihood of so many deaths among patients of a particular individual. While
we expect someone to win a lottery, we do not expect any particular player to
win; by similar logic, we do not expect a rash of deaths among patients of
a particular medical professional.
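The arithmetic behind this contrast is short. The sketch below assumes a hypothetical one-in-ten-million chance of winning per ticket and treats tickets as independent, which real lotteries need not satisfy (for example, players may choose the same numbers).

```python
import math


def prob_at_least_one(p_single: float, n_trials: int) -> float:
    """P(at least one success) in n independent trials with success probability p_single.

    Computed as 1 - (1 - p)^n via log1p/expm1 so it stays accurate when p is tiny.
    """
    return -math.expm1(n_trials * math.log1p(-p_single))


p_win = 1e-7  # hypothetical chance that any one ticket wins
for players in (1, 1_000_000, 10_000_000, 50_000_000):
    print(f"{players:>10,} players: P(someone wins) = {prob_at_least_one(p_win, players):.3g}")
```

Any single named player still has only a one-in-ten-million chance, but with ten million independent tickets a win somewhere is more likely than not (about 0.63).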
[4] This is a separate issue from that of how to report changes in risk. It is important to
distinguish between relative risk and absolute risk to avoid misinterpretation. The absolute
risk is the number of events (good or bad) during a stated period in treated or control groups,
divided by the number of people in that group. The relative risk is the ratio of the absolute
risk in the treatment group to that in the control group. It is very common, both
in serious studies and in everyday life, for a large relative risk to be reported for what may
be a very small, even negligible, increase in absolute risk. See Appendix A.2, and Spiegelhalter
(2017). See also the BMJ blog article “Making a pig’s breakfast of research reporting”, which
includes the sentence “This implies that every single portion of bacon increases risk by 20%,
when in fact the study found that only increased consumption over time increases absolute
risk by 0.08%.”
[5] See also the Inns of Court College of Advocacy/Royal Statistical Society report on
Statistics and Probability for Advocates (2017).
[6] See ABC News, “California couple wins two lotteries in one day”.
This difference may be more apparent than real, however, particularly if an
apparently anomalous number of deaths is the very reason that suspicions arose
against that individual, or was the reason that such suspicions led to a criminal
prosecution. It is unlikely that a 1-in-10-million coincidence will incriminate
any particular medical professional, but given the very large number of medical
professionals in the world, it is likely, perhaps even inevitable, that such a
coincidence will affect the patients of some medical professional at some medical
facility somewhere in the world [7]. Consequently, if we take the 1-in-10-million
coincidence to be evidence of medical misconduct, it is inevitable that we will
falsely incriminate innocent people. Thus, the existence of a cluster of deaths
among patients of a medical professional should not, in itself, be taken as proof
of criminality. We are not suggesting that such evidence is worthless; indeed,
as we will explain, it may have substantial probative value, but its value needs
to be assessed carefully in light of the other evidence in the case.
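The calendar-day figures quoted in footnote 7 can be checked roughly with the sketch below. It assumes each of 1,000 events falls on one of 365 days uniformly and independently. The chance that one particular day receives 8 or more events is computed exactly from the binomial distribution; the chance that some day does is estimated by simulation, so it carries Monte Carlo error of roughly half a percentage point with the default 2,000 repetitions.

```python
import math
import random

N_EVENTS, N_DAYS, THRESHOLD = 1000, 365, 8


def one_day_tail(n: int = N_EVENTS, days: int = N_DAYS, k: int = THRESHOLD) -> float:
    """Exact P(a particular day receives >= k of n events): a Binomial(n, 1/days) tail."""
    p = 1 / days
    return 1 - sum(math.comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k))


def any_day_tail(reps: int = 2000, seed: int = 1) -> float:
    """Monte Carlo estimate of P(at least one day receives >= THRESHOLD events)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        counts = [0] * N_DAYS
        # Scatter the events uniformly at random over the days of the year.
        for day in rng.choices(range(N_DAYS), k=N_EVENTS):
            counts[day] += 1
        if max(counts) >= THRESHOLD:
            hits += 1
    return hits / reps


p_one_day = one_day_tail()
p_any_day = any_day_tail()
print(f"P(>= {THRESHOLD} events on one particular day): {p_one_day:.4f}")
print(f"P(>= {THRESHOLD} events on some day), simulated: {p_any_day:.3f}")
```

The first probability comes out below 1%, while the simulated chance that some day reaches 8 events is far larger, in line with the footnote.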
3.b The importance of avoiding illogical inferences from
p-values
Suppose that an unexpectedly large number of deaths occur among patients
of a particular medical professional. Suppose further that an expert concludes
that the probability of so many deaths occurring by chance is only 0.000001,
or one in a million. What can we conclude about the chances that the medical
professional engaged in misconduct?
As mentioned earlier, there are a number of potential problems surrounding
the computation of p-values of this type, but let us leave such issues
aside for the moment and assume that the expert’s probability is well supported
by available data. What can we conclude from it?
People often think that a p-value tells them the probability that a coincidence
occurred. So, it may seem reasonable to assume that the p-value means there
is only one chance in a million that so many deaths occurred by coincidence
and correspondingly 999,999 chances in a million that the deaths arose from
some other cause. If misconduct by the medical professional appears to be
the only plausible alternative explanation, then people might be tempted to
conclude that the probability of misconduct is overwhelming (999,999 chances
in 1 million). Thus, the p-value of 1 in 1 million might inadvertently be taken
as proof that there is only 1 chance in 1 million that the medical professional is
innocent of misconduct.
Research has shown that people are often tempted to think in this manner
and have difficulty seeing errors in this chain of logic. Yet, this line of thinking
is indeed erroneous and illogical, and may cause people to misinterpret the
value of statistical evidence in ways that are dangerous and unfair for medical
professionals accused of misconduct.
The problem arises from misinterpretation of the p-value. The p-value is not
the probability that a coincidence occurred. Rather, the p-value is the
probability of observing evidence at least as extreme as that actually observed,
if coincidence alone were responsible. In a case involving unexpected deaths,
the expert might ask, for example, “what is the probability, by chance, of
observing a given number of deaths (or more) among patients of a particular
professional, if we assume that the rate of deaths is no higher than for
patients of other similar professionals?” That is a different question from
asking “what is the probability that the explanation for these deaths is mere
coincidence?” Importantly, the answer to the former question does not tell us
the answer to the latter, although people may have difficulty seeing why not.
[7] For a different example making a similar point, it is a simple calculation that if 1000
events occur completely at random in a year, there is a more than 93% chance that at least
one of the 365 calendar days includes 8 or more events, yet the chance of 8 or more on one
particular day is less than 1%.
Misunderstanding of p-values
Question the expert attempts to answer:
“What is the probability of observing this many deaths (or more) by chance
or coincidence IF the risk of deaths to the patients in question was in fact
no higher than the risk to similar patients who were not under the care of
the accused?”
Question people may mistakenly think the expert is answering:
“What is the probability that the high number of deaths observed among
the patients in question is explained by chance or coincidence?”
A p-value is a statement about a conditional (cumulative) probability, that
is, a statement about the probability of one event (observing the data, or
something more extreme) in light of another event (there is no effect of
interest, only chance) [8]. People are generally familiar with the idea that
the occurrence of one event may change our uncertainty about another event.
For example, the probability of rain if the sky is cloudy will differ from the
probability of rain if the sky is clear. But psychological research has shown
that people often fall victim to a logical fallacy known as “transposing the
conditional”, confusing the probability of the evidence given a hypothesis
with the probability of the hypothesis given the evidence. In a legal context,
this error has become known as the “prosecutor’s fallacy,” the “source
probability error,” or the “fallacy of the transposed conditional” [9]. It is a
widespread mode of reasoning, affecting members of the general public, the
media, lawyers, jurors and judges alike [10].
The error is to confuse or equate the conditional probability of event A,
given that event B occurs, with the conditional probability of event B, given
that event A occurs; the fault in this logic is easy to illustrate by inserting
various propositions for A and B. Consider, for example, that the probability
an animal has four legs if it is a dog is not the same as the probability an animal
8The “another event” is a hypothesis; in some philosophical traditions, this would not be
considered an event, but the logic that follows holds regardless.
9Thompson & Schumann, 1987; Koehler, 1993; Balding & Donnelly, 1994; Evett, 1995.
10For example, in the trial of Sally Clark for double infanticide, an expert medical witness
testified that the probability that both her babies would have died from natural causes was
one in 73 million (this figure has itself been widely and properly criticised, but that is not
the issue here). If, as appears very natural, we refer to this figure as “the probability that the
babies died by innocent means”, it is all too easy to misinterpret this as the probability (on
the basis of the evidence of the deaths) that Sally is innocent, such a tiny figure seeming to
provide incontrovertible proof of her guilt. For other evidence of the widespread appearance
of this fallacy, and the difficulty people have in appreciating it, see Thompson & Newman,
2015; de Keijser & Elffers, 2012.
is a dog if it has four legs. The probability a person speaks Spanish if they grew
up in Peru is not the same as the probability they grew up in Peru if they
speak Spanish. With simple examples like these, the logical error of equating
the two conditional probabilities is easy to see. But this same error underlies
the assumption that the p-value tells us the probability a medical professional
is innocent of misconduct, and in that context the error of logic is not so easy
to see. A statistician testifies that the conditional probability of observing
so many deaths (or more) is only 0.000001, or 1 chance in a million, if the
deaths are occurring randomly at rates no higher than among similar patients
of other medical professionals. Here it is tempting to transpose the conditional
probabilities to assume that there is also only 1 chance in 1 million that the
deaths are occurring randomly given that so many deaths occurred.
Of course investigators and triers-of-fact in these cases need to evaluate
whether the deaths in question could have occurred by chance. While the p-
value does not answer that question directly, it may nevertheless be helpful by
casting light on the importance of the death rate when considered in combination
with other evidence in the case.
A final important caveat about use of p-values is that the assumptions
underlying their calculation include that the test in question is the only test to
be conducted. If large numbers of hypotheses are tested, then some will yield
statistically significant results just by chance (that chance is precisely what the
numerical p-value measures), so if several tests are conducted, some adjustment
for multiple testing should be used.
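To make the multiple-testing point concrete, the Python sketch below computes the chance that at least one of k independent tests, each run at significance level alpha, gives a “significant” result when every null hypothesis is true. It also shows the Bonferroni correction, which we use here only as one familiar example among several possible adjustments. The function names and the figures of 20 tests at alpha = 0.05 are hypothetical.

```python
def familywise_error_rate(k, alpha):
    """Probability that at least one of k independent tests, each run at
    level alpha, rejects its null hypothesis when every null is true."""
    return 1.0 - (1.0 - alpha) ** k


def bonferroni_threshold(k, alpha):
    """Per-test threshold that keeps the familywise error rate at or below
    alpha (Bonferroni correction)."""
    return alpha / k


# Hypothetical: 20 wards or staff members screened, each tested at 5%.
k, alpha = 20, 0.05
print(f"P(at least one false alarm) = {familywise_error_rate(k, alpha):.3f}")
print(f"Bonferroni per-test threshold = {bonferroni_threshold(k, alpha):.4f}")
```

With these numbers there is roughly a 64% chance that at least one test is “significant” purely by chance, which is why each individual test would need to meet the much stricter threshold of 0.0025.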
The correct way to “invert the conditional” and avoid the Prosecutor’s Fal-
lacy involves a simple logic that is codified in what is known as “Bayes’ rule”,
that combines the probabilities of the evidence given various possible explanations
for the data (“hypotheses”) with the prior probabilities of these hypotheses,
to deliver the probability of each hypothesis given the evidence. It is the same
logic as is used to report a diagnosis following a medical test. In Appendix 4,
Bayes’ rule is explained, using an example presented in non-technical language.
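Bayes’ rule can be written as a short function. The Python sketch below is our own illustration, not the example from Appendix 4: it takes prior probabilities and the probability of the evidence under each hypothesis, and returns the posterior probabilities. The three hypotheses and all numbers are hypothetical, chosen to show that a one-in-a-million probability of the evidence under “coincidence” does not make coincidence, or its complement misconduct, settled, and that a third explanation can absorb most of the posterior probability.

```python
def posterior_probabilities(priors, likelihoods):
    """Bayes' rule: P(H | E) = P(E | H) P(H) / sum over H' of P(E | H') P(H')."""
    joint = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(joint.values())  # P(E), the overall probability of the evidence
    return {h: joint[h] / total for h in joint}


# Hypothetical numbers for illustration only.
priors = {
    "coincidence": 0.989999,  # natural variation in deaths
    "other cause": 0.01,      # e.g. a change in patient mix or supplies
    "misconduct": 0.000001,   # one chance in a million
}
likelihoods = {               # P(evidence | hypothesis)
    "coincidence": 0.000001,
    "other cause": 0.001,
    "misconduct": 0.5,
}
for h, p in posterior_probabilities(priors, likelihoods).items():
    print(f"{h:12s} {p:.3f}")
```

With these invented inputs, “other cause” receives about 0.87 of the posterior probability, coincidence about 0.09 and misconduct about 0.04.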
As we will discuss in the following sections, the connection between a p-
value and the probability of misconduct by a medical professional becomes even
weaker and more problematic when there are other possible explanations for the
evidence (other than coincidence or misconduct by a particular individual), and
when a p-value is calculated in a biased and misleading manner.
4 Competing theories
There are documented cases in which medical professionals have intentionally
engaged in misconduct that put their patients at risk. A well-known example is
that of Harold Frederick Shipman, an English physician in general practice (see
box below). In 2000, Shipman was found guilty of the murder of 15 patients
under his care.11
While it is important to acknowledge that cases like that of Shipman exist,
it is also important to realise that they are extremely rare. Of the hundreds
of millions of medical professionals in the world, the number who are known
to have been serial killers of their patients is very small, a minuscule fraction
of the total number. Consequently, in the absence of any other evidence of
11See The Shipman Inquiry (2003).
misconduct, the prior odds of any particular medical practitioner engaging in
such conduct must be considered extremely low, of the order of one chance in
millions.12 As noted in the previous section, such low prior odds will often be
difficult to overcome on the basis of statistical evidence alone. Even if there is a
cluster of deaths among the practitioner’s patients that is a million times more
probable if the practitioner is a murderer than if the deaths occurred by chance,
a logical assessment of the posterior odds might still conclude that the theory
of coincidence is more probable.13
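The arithmetic behind this statement is Bayes’ rule in odds form: posterior odds = likelihood ratio × prior odds. In the Python sketch below, the likelihood ratio of one million comes from the text; the prior odds of one in five million are a hypothetical value within the “one chance in millions” range mentioned above.

```python
from fractions import Fraction


def posterior_odds(prior_odds, likelihood_ratio):
    """Bayes' rule in odds form: posterior odds = likelihood ratio * prior odds."""
    return likelihood_ratio * prior_odds


def odds_to_probability(odds):
    """Convert odds o in favour of a hypothesis to probability o / (1 + o)."""
    return odds / (1 + odds)


prior = Fraction(1, 5_000_000)  # hypothetical prior odds of misconduct
lr = Fraction(1_000_000)        # evidence a million times more probable under misconduct
post = posterior_odds(prior, lr)
print(f"posterior odds of misconduct: {post} (= {float(post):.2f})")
print(f"posterior probability of misconduct: {float(odds_to_probability(post)):.3f}")
```

Here the posterior odds are 1 to 5, so coincidence remains five times more probable than misconduct, and the posterior probability of misconduct is 1/6.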
The case of Harold Shipman
There are documented cases in which medical professionals have intention-
ally engaged in misconduct that put their patients at risk. A well-known
example is that of Harold Frederick Shipman, an English physician in general
practice. In 2000, Shipman was found guilty of the murder of 15
patients under his care. Investigators suspected he was responsible for the
deaths of many others, perhaps as many as 250, making him one of the
most prolific serial killers in modern history.
Concerns about Shipman were first raised by other medical practitioners,
who noted what appeared to be an unusually high rate of deaths among Ship-
man’s patients. An initial police investigation in 1998 found insufficient
evidence to bring charges, but police subsequently learned that the wills of
some of Shipman’s former patients had been altered under suspicious
circumstances to leave assets to Shipman, rather than family members of the
deceased. Further investigation found evidence that Shipman had administered
lethal doses of sedatives to healthy patients, and had then altered
medical records to indicate falsely that the patients had been in poor health.
Based on this evidence Shipman was prosecuted and convicted.
In light of this grim episode, there were calls for improved monitoring of
adverse medical outcomes, to allow dangerous medical misconduct to be
detected earlier. For example, statistician David Spiegelhalter and col-
leagues suggested that statistical monitoring of patient death rates would
have raised red flags about Shipman’s misconduct years earlier, thereby
saving lives (see Spiegelhalter, D. et al., 2003).
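Spiegelhalter and colleagues’ analysis is set out in the cited paper and is not reproduced here. As a generic illustration of how statistical monitoring can raise a “red flag”, the Python sketch below implements a simple one-sided CUSUM for Poisson counts: each period adds the log-likelihood ratio of the observed count under an elevated rate versus the expected rate, and an alarm is raised when the running total crosses a threshold. The baseline rate, the elevated rate, the threshold and the monthly counts are all hypothetical.

```python
import math


def poisson_cusum(counts, baseline_rate, alarm_rate, threshold):
    """One-sided CUSUM for Poisson counts.

    Each period adds x * ln(alarm_rate / baseline_rate) - (alarm_rate - baseline_rate),
    the log-likelihood ratio of count x under the elevated versus the expected
    rate. The statistic is reset to zero whenever it would go negative.
    Returns the CUSUM path and the index of the first alarm (or None).
    """
    step_weight = math.log(alarm_rate / baseline_rate)
    drift = alarm_rate - baseline_rate
    s, path, first_alarm = 0.0, [], None
    for t, x in enumerate(counts):
        s = max(0.0, s + x * step_weight - drift)
        path.append(s)
        if first_alarm is None and s > threshold:
            first_alarm = t
    return path, first_alarm


# Hypothetical monthly deaths for one practice: expected rate 2 per month,
# monitoring for a doubling to 4 per month, alarm threshold 5.
monthly_deaths = [2, 1, 3, 2, 2, 4, 5, 3, 6, 4, 5]
path, alarm = poisson_cusum(monthly_deaths, baseline_rate=2.0,
                            alarm_rate=4.0, threshold=5.0)
print("CUSUM:", [round(v, 2) for v in path])
print("first alarm at month index:", alarm)
```

An alarm of this kind only flags that the death rate appears elevated; as the surrounding text stresses, it says nothing by itself about why.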
An assessment of the posterior odds may change dramatically, however,
should other evidence emerge that supports the theory of misconduct, such as
evidence of the altered wills and altered medical records in Shipman’s case. Evi-
dence of this type, when considered in combination with the statistical evidence,
may well make an overwhelming case in favour of conviction. Consequently, it
is vital that suspicions raised by apparent statistical anomalies be followed by a
careful investigation that looks for other evidence, such as evidence of motive,
consciousness of guilt, or actual lethal medical interventions. Furthermore, as
discussed in later sections, it is vital that such an investigation be conducted
in a neutral, open-minded fashion to minimise bias. If such an investigation is
conducted and no supporting evidence is found, that may be a strong indicator
12There is an extensive peer-reviewed literature quantifying this risk: see for example Forrest
(1995).
13Posterior odds are defined in Appendix 4.
that the statistical anomaly indeed arose from coincidence, or that causal fac-
tors other than misconduct by the accused individual are responsible for what
happened.14
Investigators should always bear in mind that there may be innocent expla-
nations for apparent (and even striking) correlations between a particular pro-
fessional’s presence in a hospital on the one hand, and deaths, resuscitations, or
other incidents on the other hand. Correlations might be caused by many fac-
tors, some of which might in principle be known, but still hard to take account
of; some might be unknown altogether. Seriously ill patients on a medium care
ward are quite likely to die at any moment, but even the best medical professionals
will not be able to predict exactly when. In one such hospital situation, statistical
analysis of registered times of deaths shows that most deaths happen in
the morning.15 The physiological explanation is that after some sleep, bodily
functions resume, and organs close to breaking point can suddenly fail. In this
hospital, there are many nurses on duty during the morning shift, starting at 7
a.m. and lasting seven hours till 2 p.m. This is also the period when medical
specialists make their rounds. In the afternoon and evening, there are fewer
nurses on duty. Things are quietening down for the night. There are also fewer
deaths and emergencies, and fewer medical specialists present. During the night
shift, everything is very quiet; there are few nurses on duty. There are also few
events. Contrary to popular imagination, most people do not die in their sleep
at night; they die in the morning while waking up.16
All this means that most nurses (especially full-time, fully qualified nurses)
spend more hours on duty during the morning, and fewer during the night, than
during the afternoon and evening. Most deaths occur (or at least, are registered
to have occurred) in the morning. Therefore, most full-time, fully qualified
nurses do experience a higher rate of deaths while they are on duty than while
they are off duty!
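This effect can be reproduced with simple arithmetic. In the Python sketch below every number is hypothetical: the morning shift runs 7 hours as described above, while the evening (8 hours) and night (9 hours) shift lengths, the yearly death counts per shift type, and the share of each shift type worked by the nurse are our own assumptions. Deaths are modelled as independent of who is working, so there is no misconduct, yet the nurse’s death rate per hour on duty exceeds the rate per hour off duty.

```python
# Hypothetical ward: (shift length in hours, expected deaths per year in that
# shift type, share of those shifts worked by the nurse in question).
SHIFTS = {
    "morning": (7, 60, 0.40),
    "evening": (8, 25, 0.10),
    "night": (9, 15, 0.05),
}
DAYS = 365


def death_rates_on_and_off_duty(shifts, days):
    """Expected deaths per hour while the nurse is on duty and while she is
    off duty, assuming deaths do not depend on who is working."""
    deaths_on = deaths_off = hours_on = hours_off = 0.0
    for hours, deaths, share in shifts.values():
        total_hours = hours * days
        deaths_on += deaths * share
        deaths_off += deaths * (1 - share)
        hours_on += total_hours * share
        hours_off += total_hours * (1 - share)
    return deaths_on / hours_on, deaths_off / hours_off


on_rate, off_rate = death_rates_on_and_off_duty(SHIFTS, DAYS)
print(f"deaths per hour on duty:  {on_rate:.4f}")
print(f"deaths per hour off duty: {off_rate:.4f}")
print(f"rate ratio: {on_rate / off_rate:.2f}")
```

With these invented inputs the rate ratio is about 1.8, an apparent near-doubling of risk that arises entirely from the nurse’s shift pattern.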
A complicating factor is that time of death has to be registered by a doctor.
The rules on what a doctor is supposed to write on a death certificate vary
in different jurisdictions: is it the time that they guess that the patient had
died, or is it the time at which they sign off that they have determined that the
patient has died?17 In some countries, data show that “official” times of death
are often rounded to whole hours or whole half hours, sometimes even to whole
days. A patient found dead in the morning might be registered as having died
at five past midnight or at five past seven, for all kinds of innocent reasons.
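As an illustration of how rounding alone can move a death across a shift boundary, the Python sketch below rounds a registered time to the nearest whole hour and assigns it to a shift. The morning shift of 7 a.m. to 2 p.m. is taken from the text; the evening shift ending at 10 p.m. and the night shift from 10 p.m. to 7 a.m. are assumptions, and the time of death is hypothetical.

```python
from datetime import datetime, timedelta


def shift_of(t):
    """Assign a clock time to a shift. Assumed boundaries:
    morning 07:00-14:00, evening 14:00-22:00, night 22:00-07:00."""
    if 7 <= t.hour < 14:
        return "morning"
    if 14 <= t.hour < 22:
        return "evening"
    return "night"


def round_to_hour(t):
    """Round to the nearest whole hour (half past rounds up)."""
    floor = t.replace(minute=0, second=0, microsecond=0)
    return floor + timedelta(hours=1) if t.minute >= 30 else floor


actual = datetime(2022, 3, 1, 6, 40)  # hypothetical actual time of death
registered = round_to_hour(actual)    # registered as 07:00
print(shift_of(actual), "->", shift_of(registered))
```

A death that occurred during the night shift is thus counted in the morning shift, adding to the tally of whoever worked that morning.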
The activities of nurses can certainly influence these registered times. One
might imagine that a better nurse checks up on all their patients more often
and checks up more carefully and is more aware of how they are doing. A better
nurse will therefore notice and signal a death (or an emergency) earlier than a
worse nurse, thereby causing deaths to be registered during their own shifts rather than later.
A better nurse will clock-in for work well before their shift starts, and clock-
out well after it ends, in order to participate fully in the necessary hand-over
from one shift to the next. There is a lot of evidence that the apparent excess of
14Absence of evidence can sometimes be evidence of absence, see commentary of Aart de
Vos, translated in Gill (2021), and Thompson & Scurich, 2018.
15See Dotto, Gill & Mortera, 2022, especially Figs 2 and 3.
16See Mitler, et al, 1987.
17Currently in England and Wales there are no rules about this; in Scotland the matter is
clearly codified (Scottish Government, 2018).
deaths when Lucia de Berk (see box over page) was on duty was connected to the
fact that she was a better (more hard-working, more conscientious) nurse
than many of her colleagues.18 Indeed, she had been evaluated as
an excellent nurse and consequently rapidly gained the necessary qualifications
to be entrusted with harder tasks. The rapid increase in deaths on her ward
coincided with this “promotion”, and also with a management decision to
transfer babies with genetic birth defects from intensive care (where any deaths
occur in different circumstances and are likely to be accurately recorded) to
medium care, in order that they might then be rapidly transferred to home.
Notice that the nurses themselves, as well as their lawyers or the experts
they call as witnesses, may not think of these alternative hypotheses.
The same may be true of hospital authorities, who may be the first to sound
the alarm and perform their own investigations, followed by police investigators
and public prosecutors. Attention may then focus on the possibility that a
suspected individual engaged in nefarious actions; the investigators may lack the
knowledge or, in some instances the motivation, to identify possible alternative
explanations.
There are additional examples of cases in which a cluster of deaths, initially
attributed to individual misconduct, turned out to have another explanation.
A cluster of deaths in a neonatal ward in Toronto was initially associated with
a nurse, who was suspected of malevolent activity. Only later was it discovered
that new artificial latex products in feeding tubes and bottles could have been
responsible.19 An apparent increase in deaths on a neonatal ward in England
raised similar suspicions until a medical statistician identified the date at which
the death rate rose, and a neonatologist recognised it as the date when the
supplier of milk formula was changed. As these examples show, an increase in
deaths may be caused by factors that are not immediately apparent, even to
those involved. Such factors may require considerable expertise to discover and
could be missed entirely in some instances.
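Identifying “the date at which the death rate rose” is a change-point problem. One simple approach, sketched below in Python under our own assumptions (independent Poisson counts per period, a single change in rate, maximum-likelihood estimation), scans every possible split point and keeps the one that best explains the data with two separate rates. It is not necessarily the method used in the case described, and the weekly counts are invented.

```python
import math


def poisson_loglik(counts):
    """Maximised Poisson log-likelihood, up to an additive constant, for a
    segment whose rate is estimated by the segment mean."""
    n, total = len(counts), sum(counts)
    if total == 0:
        return 0.0
    rate = total / n
    return total * math.log(rate) - n * rate


def best_change_point(counts):
    """Return the index k (1 <= k < len(counts)) maximising the combined
    log-likelihood of counts[:k] and counts[k:], each with its own rate."""
    best_k, best_ll = None, -math.inf
    for k in range(1, len(counts)):
        ll = poisson_loglik(counts[:k]) + poisson_loglik(counts[k:])
        if ll > best_ll:
            best_k, best_ll = k, ll
    return best_k


weekly_deaths = [1, 0, 2, 1, 1, 0, 1, 1, 4, 3, 5, 4, 3, 4]  # hypothetical
k = best_change_point(weekly_deaths)
before = sum(weekly_deaths[:k]) / k
after = sum(weekly_deaths[k:]) / (len(weekly_deaths) - k)
print(f"estimated change after week {k}: mean {before:.2f} -> {after:.2f}")
```

An estimated change date is only a starting point: one would still need to ask whether the change exceeds what chance variation could produce, and then look, as the neonatologist did, for events dated near it.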
18See Gill, et al, 2018, Meester, et al., 2007 and Schneps & Colmez, 2013.
19See Hamilton, 2011.
The case of Lucia de Berk
In 2003, Lucia de Berk, a Dutch paediatric nurse, was convicted of four
murders and three attempted murders of children under her care. In 2004,
after an appeal, she was convicted of seven murders and three attempted
murders. Thereafter, several academic commentators questioned the qual-
ity of the evidence used to support the conviction, particularly statistical
testimony.
De Berk had been under suspicion in her hospital for some months as a
result of gossip about her tough, disturbed childhood and striking person-
ality. When a child in her care died suddenly, the death was immediately
announced to be completely unexpected and, by implication, suspicious.
Hospital officials identified eight further deaths or resuscitations that had
occurred while she had been on duty as medically suspicious. Additional
suspicious deaths were identified and linked to de Berk at two other hos-
pitals where she had worked. For two of the patients, investigators found
toxicological evidence supporting the claim that de Berk had poisoned them,
although the probative value of this evidence was weak. Statements in de
Berk’s diary about “a very great secret” and a “compulsion” on a day that
a patient had died were given a sinister interpretation.
During her original trial, a criminologist (who had years earlier gradu-
ated in mathematics) presented statistical evidence according to which the
probability of so many deaths occurring while de Berk was on duty was
only 1 in 342 million. This number was the product of three p-values,
one for each hospital. Prominent statisticians came forward to argue that
the incriminating statistic was based on an over-simplified and unrealistic
model, biased data collection, and a serious methodological error in com-
bining p-values from independent statistical tests. The probability of so
many deaths occurring by chance may have been as high as one in 25.
In light of these doubts, and further medical evidence that came to light in
post-conviction investigations, the case was re-tried in 2010 and de Berk
was acquitted. The original convictions are widely viewed as miscarriages
of justice that were prompted, in part, by an inadequate investigation and
misuse of statistical evidence. They led to various reforms in the Dutch
legal system.
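The box refers to a methodological error in combining p-values from several independent tests. Multiplying p-values does not give a valid combined p-value, because the product of several unremarkable p-values is itself small. One standard alternative is Fisher’s method, sketched below in Python: it refers -2 times the sum of the natural logarithms of the k p-values to a chi-squared distribution with 2k degrees of freedom. Fisher’s method is our illustrative choice, not necessarily the approach used by the statisticians who criticised the de Berk analysis, and the three p-values are hypothetical, not those from the trial.

```python
import math


def fisher_combined_p(p_values):
    """Fisher's method for k independent p-values.

    X = -2 * sum(ln p_i) follows a chi-squared distribution with 2k degrees
    of freedom under the joint null hypothesis. For even degrees of freedom
    the upper tail has the closed form
        P(X >= x) = exp(-x/2) * sum_{j=0}^{k-1} (x/2)**j / j!
    so no statistics library is needed.
    """
    k = len(p_values)
    half_x = -sum(math.log(p) for p in p_values)  # this is x / 2
    term, tail = 1.0, 1.0
    for j in range(1, k):
        term *= half_x / j
        tail += term
    return math.exp(-half_x) * tail


p_values = [0.05, 0.05, 0.05]  # hypothetical, one per hospital
print(f"naive product:     {math.prod(p_values):.6f}")
print(f"Fisher combined p: {fisher_combined_p(p_values):.4f}")
```

Here the naive product is 0.000125, whereas Fisher’s combined p-value is about 0.0063, some fifty times larger.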
5 Investigative bias
Because criminal investigations are carried out by human beings, investigative
findings may be influenced by common human tendencies (often called biases)
that can affect the way investigators search for and evaluate evidence, as well as
how they choose to report findings. This section will discuss ways that various
widely recognised human biases may affect the investigation of misconduct by
medical professionals, with a particular focus on how such biases may affect the
statistical findings. The following section (Section 6) will discuss ways to
minimise such biases to facilitate more objective and useful investigative outcomes.
5.a Unconscious bias throughout society
Although a variety of specific biases have been identified,20 they generally arise
from a common phenomenon: that people’s expectations and desires can in-
fluence what they look for and how they evaluate what they find when they
seek answers to important questions.21 The tendency of preconceptions and
motives to influence people’s interpretation of evidence has been called “one of
the most venerable ideas of . . . traditional epistemology . . .” as well as “. . . one
of the better demonstrated findings of twentieth-century psychology”.22 This
tendency, which is often labelled an observer effect,23 was mentioned both in
classic texts and in the writings of early natural philosophers, such as Francis
Bacon, who observed in 1620 that:
The human understanding, when any proposition has once been laid down . . .
forces everything else to add fresh support and confirmation; and although . . .
instances may exist to the contrary, yet . . . either does not observe or despises
them.24
The potential for observer effects to distort scientific investigations was
recognised by early astronomers, who discovered differences in reported find-
ings of the same astronomical phenomena by different observers.25 Historians
of science have noted numerous additional ways in which human expectations
or desires may explain incorrect reports of scientific observations, such as scien-
tists’ failure to notice (or at least to report) phenomena inconsistent with their
theory-based expectations; reported findings that support pet theories but can-
not be replicated; and the statistically improbable degree of correspondence that
has been observed between some reported findings and theoretical expectations
(the most famous example from the history of science being the improbable
degree of agreement between data and theory in Gregor Mendel’s experiments
with peas).26 Over the last 70 years, epidemiologists and statisticians have
developed methods to limit and assess the impact of many of these biases.27
The same scope for biased data collection has been noted in criminal in-
vestigations. Miscarriages of justice are often attributed to “tunnel vision” and
“confirmation bias,” processes that may lead investigators to “focus on a partic-
ular conclusion and then filter all evidence in a case through the lens provided
by that conclusion.”28 The comments of Findley and Scott on the underlying
process are reminiscent of what Francis Bacon (quoted above) wrote in 1620:
Through that filter, all information supporting the adopted conclusion is el-
evated in significance, viewed as consistent with the other evidence and deemed
relevant and probative. Evidence inconsistent with the chosen theory is easily
overlooked, or dismissed as irrelevant, incredible, or unreliable.29
Common investigative practices noted in the commentary by Findley and
Scott include tendencies for investigators:
20See Sackett, 1979, and Appendix D to The Law Commission, 2015.
21Kassin, Dror & Kukucka, 2013; Thompson, 2009; Nickerson, 1998.
22Nisbett & Ross, 1980.
23Risinger et al, 2003. For the relevance to epidemiological methods, see also Sackett, 1979.
24Bacon, 1620.
25Risinger et al, ibid.
26Pires & Branco, 2010; Jeng, 2006.
27Hill, 1965, is an early seminal reference. For approaches useful in forensic science, see
Stoel et al., 2015, Dror et al., 2015.
28Findley and Scott, 2006, p. 292.
29Findley and Scott, ibid.
• to settle too quickly on a preferred theory, without adequately considering
  alternatives;
• to look for evidence that confirms or supports the preferred theory rather
  than seeking evidence that might disconfirm it or support alternatives;
• to notice, remember and record evidence more readily and reliably when
  the evidence is consistent with the preferred theory than when it is not;
• to interpret ambiguous evidence in a manner consistent with the preferred
  theory;
• to view evidence and interpretations as more credible when they support
  the preferred theory, and vice versa;
• to report findings with a higher degree of confidence if they support rather
  than contradict the expected result;
• to fail to hand over or disclose all the countervailing evidence to the
  defence; and
• to have skewed incentives to boost their case.
A key part of any investigation is attempting to identify the full range of
hypotheses that need to be explored, though it is difficult to be confident this is
done successfully. Investigations can go awry, and produce misleading findings,
if investigators miss or ignore an underlying causal factor. Deadly misconduct
by a medical professional is one possible explanation for an unexpected surge in
the death rate at a medical facility. Coincidence is another possible explanation
(as discussed in Section 2(a)): clusters do occur by chance even in a completely
random pattern of events. If investigators assume misconduct and coincidence
are the only plausible explanations, then evidence indicating that coincidence
is unlikely will lead investigators inevitably to the conclusion that misconduct
is likely. This conclusion may be mistaken, however, if the elevated death rate
arose, even in part, from other confounding factors, such as changes in the un-
derlying population of patients; negligence or misconduct by other individuals;
administrative changes affecting such matters as staffing levels, training, or case
load; or medical policy changes affecting transfer of patients from one hospital
section to another. See also the discussion on competing theories in Section 4.
The difficulty of identifying possible causal factors is multi-faceted.30 It
may arise in part from basic human psychological tendencies, as well as from
self-interest of individuals involved. Psychological research suggests that people
have a general tendency to gravitate toward criminality as an explanation for
seemingly anomalous events, rather than looking at situational or institutional
factors; this is called the “fundamental attribution error”.31 It causes people
to look to the person (ie, to human agency) rather than the situation when
explaining events. There is a strong and well-demonstrated psychological tendency
for people to assume that bad things are caused by bad people rather than bad
circumstances (cf. the common public need to attribute blame to individuals,
30The methods developed in medicine to bring together evidence from laboratory science,
observational and experimental studies are important tools for investigation. In the UK, the
Forensic Science Regulator has published advice on the development of evaluative reporting.
31Nisbett & Ross, 1980; Ross, 1977.
for “heads to roll”, in cases of systemic failure in public services).32 Hence, peo-
ple may tend to look for scapegoats to blame for bad medical outcomes arising
from other causes, and this is often encouraged by sensationalist reporting in
the media.
The tendency toward scapegoating may be abetted by stereotyping and bias.
Individuals accused of medical misconduct have often been unusual in ways that
drew attention and ultimately suspicion, making the hypothesis of medical mur-
der appear more plausible than otherwise. While it is important for investiga-
tors to take account of suspicious behaviour when that behaviour is diagnostic
of misconduct, a focus on the odd-ball or iconoclast may unfairly distort inves-
tigators’ impressions of the matter if they mistakenly rely on stereotypes that
have little probative value. Stereotypes with no empirical support, such as the
notion that nurses whose fashion aesthetic tends toward the “gothic” are more
likely than other nurses to commit murder, may draw unwarranted suspicion to
certain individuals and make them investigative targets. People are not always
conscious of the effect of such stereotypes on their thinking,33 and through this
unconscious bias, statistical evidence offered against a Goth nurse may be taken
more seriously and examined less critically than the same evidence would be if
it were offered against a more conventional individual.
Cognitive biases can also affect the way that investigators interpret and clas-
sify data, and thereby distort the findings that emerge from an investigation.
Epidemiological and statistical methods used in investigations of disease out-
breaks or clusters of adverse events are applicable to investigating clusters of
deaths.
Whether a particular death should be deemed “suspicious”, for example,
might be influenced by a variety of factors, including factors that have little
or no diagnostic value. Cognitive psychologists have found that people often
have limited insight into the factors that influence such evaluations, so can be
influenced by their own expectations or motives without realising it.34 The
largely unconscious nature of these processes makes the resulting biases diffi-
cult to remedy. Teaching people about such biases is not sufficient to prevent
them, nor is exhorting people to be unbiased.35 The most reliable and effective
counter-measure is to prevent the biases from arising in the first place
by arranging, to the extent possible, that investigators do not form strong
expectations of, or desires for, a particular outcome.36 Additionally, one needs
proper checks and balances, including “red-teaming” and independent, stringent
review of potential evidence, as done, for example, by the Crown Prosecution
Service (CPS) in England and Wales.
5.b Anatomy of a biased investigation
The general points about investigative bias offered in Section 5.a above allow
us now to consider more specifically ways that bias may distort the investiga-
tion of medical professionals accused of harming patients. We will describe a
hypothetical (but not completely fanciful) case in which a medical professional
is accused of mass murder; we will then discuss how the cognitive tendencies
32Burger, 1981; Ross, 1977.
33Nisbett & Wilson, 1977.
34Nisbett & Wilson, ibid.
35Kassin et al., 2013; Risinger et al. 2003.
36Dror et al., 2015.
discussed above may lead that investigation awry.
Let us suppose that administrators at a hospital become aware of an alarm-
ing increase in the number of deaths among elderly patients. Suspicion falls on a
doctor who works in a unit where many deaths occurred. The doctor has drawn
attention by speaking publicly in favour of euthanasia, making comments sug-
gestive of relief rather than sadness after some patients died, and by appearing
at the hospital’s Halloween costume party dressed as the Angel of Death. When
a co-worker reports seeing a syringe in the doctor’s bag, hospital administrators
call for an investigation.
At this stage, investigators often seek to determine whether the surge in
deaths can be linked to the suspect. Were patients more likely to die when
this doctor was on duty than when other doctors were on duty? To address
this question, investigators often try to count the number of deaths for which
the suspect may bear responsibility and compare it to the number of deaths
that occurred under similar circumstances when the doctor could not have been
involved.
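A simple version of this comparison treats deaths as occurring at a rate that does not depend on who is on duty. Conditional on the total number of deaths, the number falling in the suspect’s hours is then binomial, with success probability equal to the suspect’s share of the hours. The Python sketch below computes the corresponding one-sided p-value; the function name and all counts and hours are hypothetical. As discussed earlier, such a p-value is not the probability of innocence, and it inherits every bias in how the deaths and hours were counted.

```python
from math import comb


def on_duty_excess_p_value(deaths_on, deaths_total, hours_on, hours_total):
    """One-sided p-value: probability that at least `deaths_on` of
    `deaths_total` deaths fall in the suspect's hours, if each death is
    equally likely to occur in any hour (binomial, p = hours_on / hours_total)."""
    p = hours_on / hours_total
    return sum(comb(deaths_total, k) * p**k * (1 - p) ** (deaths_total - k)
               for k in range(deaths_on, deaths_total + 1))


# Hypothetical: 30 deaths in a year; the doctor worked 20% of the hours
# (1752 of 8760) but was on duty at 12 of the deaths.
print(f"{on_duty_excess_p_value(12, 30, 1752, 8760):.4f}")
```

With these invented figures the p-value is roughly 0.01. The calculation assumes that every hour carries the same risk of death; as Section 4 explains, that assumption fails when deaths cluster in morning shifts that some staff work more than others.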
In order to perform such an analysis, investigators must make a number of
difficult judgments. They must determine, for example, whether each death
that occurred should be viewed as a possible homicide, and if so whether the
doctor in question could have been responsible for that homicide. Crucially,
they must also consider whether factors other than the presence or absence of
the doctor in question may have affected the rate of death.
Because there is a high degree of subjectivity in such judgments, they may be
subject to bias. Furthermore, as we will explain, if the investigators are focused
on a particular suspect, it is likely that these biases will slant the investigators’
findings in a direction that is unfairly incriminating to that suspect. The re-
mainder of this section will discuss ways bias may arise in an investigation; the
following section will discuss possible ways to mitigate such biases.
5.c “Suspicious deaths”
One important judgment is about which deaths to count. Investigators typically
try to rule out deaths that are readily attributable to known causes and instead
focus on deaths that are “suspicious” or “unexpected”, ie, deaths that might
possibly have been the result of homicide rather than disease or other “natural
causes.” Distinguishing the former from the latter is a matter that requires ex-
pert judgment by specialists, such as forensic pathologists and researchers who
study death certificate coding.37 Research indicates, however, that forensic
pathologists sometimes disagree about manner of death (eg, whether by homi-
cide or accident) and that such judgments can be influenced by non-medical
contextual information.38 Post-mortem studies also show a high error
rate in reported causes of death, even among good doctors.
For example, Dror et al. recently reported that forensic pathologists who were
asked to evaluate autopsy findings in a hypothetical case were more likely to
conclude that a child died due to homicide rather than accident if they were
37Ideally, forensic pathologists would be able to examine each corpse, but more realistically,
death certificates should at least be reviewed by two people independently.
38For an example of the need to understand coding of deaths see the article on violent child
deaths by Sidebotham et al, 2012.
told it was a black child under the care of the mother’s boyfriend than if told it
was a white child under care of the child’s grandmother.39 Whether this finding
reflects “bias” is controversial.40 Some commentators have argued that infor-
mation about the child and its caretaker is relevant to forensic pathologists’
determination of manner of death and hence that it was perfectly proper for
them to be influenced by it.41 Dror and his colleagues have argued in response
that the information is beyond what forensic pathologists should consider when
making such determinations and deserves little to no weight even if it is consid-
ered, and hence that their findings are indeed examples of “contextual bias”.42
For present purposes, the key finding of the Dror et al. study is that forensic
pathologists’ manner-of-death determinations can be influenced by contextual
information, such as information about who was caring for the decedent. Let us
consider how that might affect the fairness of the kinds of investigations we are
discussing here. Suppose, for example, that a forensic pathologist is more likely
to determine that a patient’s death was “suspicious” and hence possibly due
to homicide if aware that the patient was under the care of a suspected serial
killer. This might happen because the forensic pathologist thinks it is proper
and appropriate to consider such information when evaluating cause of death.
Even if the forensic pathologist tries to ignore such information, however, it
may still bias the evaluation by creating an expectation of homicide when the
pathologist reviews cases associated with the suspected serial killer, and it may
do so without the pathologist being aware of it.
Contextual information of this kind may also affect thresholds for reporting.
Concern about missing possible victims may cause forensic pathologists to lower
their threshold for reporting “possible homicide” when evaluating patients attended
by the suspect, while concern about casting suspicion on an innocent person may
cause them to raise the reporting threshold for patients attended by other nurses.
Consequently, when their evaluation of the medical evidence leaves them uncertain,
forensic pathologists may be more likely to report a case as a possible homicide
if they know the nurse on duty was a suspected serial killer, and less likely if
another nurse was on duty.
Regardless of how it occurs, this kind of bias would undermine the fairness
of the investigation by causing an increase in the count of “suspicious” deaths
associated with the nurse. The higher count would arise from the very suspicions
that the investigation is supposed to evaluate: an example of circular reasoning.
Potential remedies for such biases will be discussed in Section 6.
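The circularity can be illustrated with a small simulation. In the hypothetical Python sketch below, each death receives a “suspicion score” drawn from the same uniform distribution regardless of who was on duty, so by construction nobody is more dangerous than anyone else. The only difference is that the reviewer flags a death as a possible homicide above a lower threshold when the suspect was on duty. The score model, thresholds and death counts are all invented for illustration.

```python
import random


def expected_flagged(n_deaths, threshold):
    """Expected number of deaths flagged when suspicion scores are uniform on
    [0, 1] and a death is flagged if its score exceeds `threshold`."""
    return n_deaths * (1.0 - threshold)


def simulate_flagged(n_deaths, threshold, rng):
    """Simulated number of flagged deaths under the same assumptions."""
    return sum(rng.random() > threshold for _ in range(n_deaths))


# Hypothetical set-up: 50 deaths on the suspect's shifts and 50 on colleagues'
# shifts, identical score distributions (no misconduct by anyone). The reviewer
# flags at score > 0.6 for the suspect but > 0.9 for colleagues.
rng = random.Random(2022)  # fixed seed so the run is reproducible
print(f"expected:  {expected_flagged(50, 0.6):.1f} vs {expected_flagged(50, 0.9):.1f}")
print("simulated:", simulate_flagged(50, 0.6, rng), "vs", simulate_flagged(50, 0.9, rng))
```

On average the suspect is credited with four times as many “suspicious” deaths as colleagues, purely because of the difference in reporting thresholds.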
5.d Access and opportunity
Another important judgment that investigators must make is which suspicious deaths to count “against the suspect” (ie, as possible homicides committed by the suspect) and which to attribute to other causes. To make this determination, investigators need to evaluate, for each “suspicious death,” whether the suspect had, or may have had, sufficient access to be responsible. That evaluation requires consideration of a number of factors, such as how the death may have been caused, how long it would have taken to perform acts that would cause the death, who else might have been present and whether they would have observed and reported such misconduct, how soon thereafter the death would have occurred, how soon it would have been detected, and so on.
39 Dror et al., 2021a.
40 Arguably, the experts were asked the wrong question: if they had been asked not about cause of death but about likelihood ratios, then context could have been appropriately separated.
41 Peterson et al., 2021.
42 Dror et al., 2021b.
Like judgments about manner of death, judgments about access and opportunity to kill are complex subjective assessments on which different experts may have differing opinions (and on which party-appointed experts may show implicit “advocacy bias”).43 Hence, they are also the kinds of judgments that may be influenced by contextual bias. There is a risk that investigators will be influenced by their expectations and desires. It is possible, for example, that they will cast a wider net when looking for “suspicious deaths” that can be linked to a suspect, and a narrower net when counting suspicious deaths that occurred when the suspect was not present. As a consequence, the deaths counted against the suspected individual could increase (relative to deaths counted against others) for the very reason that the suspect has come under suspicion.
5.e Similar circumstances
To determine whether an unusual number of deaths occurred when a particular
healthcare worker was on duty, it is necessary to compare the death rate when
the worker was on duty with the death rate during otherwise comparable
periods when the worker was not. It is often difficult to make such comparisons
in a fair manner, however, because the presence or absence of the worker may be
correlated with other factors that also affect the rate of death. In other words,
the periods chosen for comparison may not afford a fair comparison with the
periods when the worker was on duty.
Suppose, for example, that the worker typically works the morning shift.
Investigators could compare the rate of deaths that occurred on mornings when
the worker worked with the rate of deaths during the afternoon or night shifts
on the same ward, but the result would be misleading if the rate of deaths is
generally higher during the morning shift than during the other shifts. Investigators could instead compare the death rates on those mornings when the worker did and did not work, but that comparison may also be confounded by other factors. If the worker tended to work weekdays, for example, and not weekends, it would be important to consider whether that factor (weekday vs. weekend) might also make a difference. Past investigations have sometimes compared rates of death during the period when a particular individual was on a medical staff with rates before or after. This kind of comparison confounds the presence and absence of the individual in question with the period of time, which could be misleading if time-related changes in procedure, staffing levels, patient population and the like might also have influenced death rates. Media reporting of poor outcomes in a hospital may deter future patients from seeking treatment there, changing the case mix, influencing future outcomes, and further confounding such comparisons.
Other factors affecting simplistic comparisons are the possibility of seasonal effects on disease prevalence and severity, and purely administrative matters such as the effects of hospital practice on recording of times of death, presence of doctors, shift changes, etc. In some hospitals, deaths occurring during a night shift are officially recorded only in the presence of a doctor at the beginning of the morning shift.
43 Murrie et al., 2013.
Investigators must carefully consider such factors in order to make a fair
comparison. That will typically require considerable knowledge about factors
that influence death rates, such as all those identified above. That suggests
that investigators will either need to be knowledgeable medical professionals
themselves or will need guidance from such professionals.
However, this guidance may itself introduce further biases when it is obtained from administrators and staff of the institution being investigated. The officials who guide the investigation may then have an interest in supporting particular outcomes, which could hinder the ability of investigators to identify the full range of possible causal factors. Suppose, for example, that the increase in deaths that prompted the investigation arose after administrative changes that affected staff levels, training, or supervision. To avoid any implication of responsibility for a surge in deaths, the administrators may well prefer that the investigation focus on a single bad apple on the staff rather than on these background factors, and hence may de-emphasise or ignore them. This self-interested guidance may prevent investigators from recognising causal factors that confound their assessment of the rate of deaths attributable to a particular individual. It would be far better if the investigators had access to sophisticated guidance from experts on medical issues and hospital procedure who are independent of the staff and administration of the institution being investigated.
As in science, we should only compare like with like. When we cannot
guarantee this by the design of the study, it is important to control for differences
in other plausible causal factors.
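One standard way to compare like with like when the design cannot guarantee it is to stratify by the suspected confounding factor. The sketch below uses invented counts (not from any real case) in which a worker mostly covers morning shifts and mornings have a higher death rate for everyone; it contrasts the crude rate ratio with the Mantel-Haenszel rate ratio, which compares on-duty and off-duty periods within each shift type. The names `Stratum`, `crude_rate_ratio` and `mantel_haenszel_rate_ratio` are ours, for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stratum:
    """Deaths and person-time in one stratum (here, one shift type)."""
    name: str
    deaths_on: int       # deaths while the worker was on duty
    shifts_on: float     # exposure while the worker was on duty (number of shifts)
    deaths_off: int      # deaths while the worker was off duty
    shifts_off: float    # exposure while the worker was off duty


def crude_rate_ratio(strata: list[Stratum]) -> float:
    """Rate ratio after pooling all strata, ignoring the stratifying factor."""
    d_on = sum(s.deaths_on for s in strata)
    t_on = sum(s.shifts_on for s in strata)
    d_off = sum(s.deaths_off for s in strata)
    t_off = sum(s.shifts_off for s in strata)
    return (d_on / t_on) / (d_off / t_off)


def mantel_haenszel_rate_ratio(strata: list[Stratum]) -> float:
    """Mantel-Haenszel estimate of a common rate ratio across strata."""
    numerator = sum(s.deaths_on * s.shifts_off / (s.shifts_on + s.shifts_off) for s in strata)
    denominator = sum(s.deaths_off * s.shifts_on / (s.shifts_on + s.shifts_off) for s in strata)
    return numerator / denominator


# Invented counts: mornings have a death rate of 0.10 per shift and nights 0.02,
# whether or not the worker is present, but the worker mostly works mornings.
ward = [
    Stratum("morning", deaths_on=20, shifts_on=200, deaths_off=10, shifts_off=100),
    Stratum("night", deaths_on=1, shifts_on=50, deaths_off=5, shifts_off=250),
]

if __name__ == "__main__":
    print(f"crude rate ratio:           {crude_rate_ratio(ward):.2f}")
    print(f"Mantel-Haenszel rate ratio: {mantel_haenszel_rate_ratio(ward):.2f}")
```

With these invented numbers the crude comparison suggests nearly double the death rate when the worker is on duty (1.96), while the stratified estimate is 1.00: the apparent excess is produced entirely by the shift pattern. Stratification can only adjust for factors that are measured and included in the analysis.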
In Appendix 5 we give two hypothetical examples that illustrate ways in which investigative bias may distort statistical evidence emerging from investigations of medical misconduct, emphasising that even very small biases can completely transform the strength of the evidence from weak to compelling. The second example also demonstrates that failure to control for differences in a causal factor can lead to huge biases.
5.f The role of chance
Throughout this report, we have recognised that the number of deaths observed while a particular medical professional is on duty is influenced by chance, in the form of natural sampling variation: comparing one period to another, the numbers of deaths will differ purely due to coincidence. Relative fluctuations around the expected value are larger in smaller samples, where the law of large numbers has less effect, than in larger ones. Hence there is a greater chance of observing an unusually high or low number of deaths over shorter intervals than over longer intervals, and in smaller patient populations than in larger ones. Calculations of statistical significance answer precisely the question of whether observed differences between periods are greater than can reasonably be accommodated by these chance effects.
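The effect of sample size can be checked directly under an assumption we make for this sketch: that the number of deaths in a period follows a Poisson distribution with a known expected value. The helper below computes the probability of seeing at least 50% more deaths than expected, purely by chance, for a small and a larger expected count.

```python
import math


def poisson_upper_tail(mu: float, k: int) -> float:
    """P(X >= k) for X ~ Poisson(mu), computed as 1 - P(X <= k - 1).

    Suitable for moderate mu; for very large mu, exp(-mu) underflows and a
    library routine such as scipy.stats.poisson.sf would be preferable.
    """
    pmf = math.exp(-mu)  # P(X = 0)
    cdf = 0.0
    for j in range(k):
        cdf += pmf
        pmf *= mu / (j + 1)  # P(X = j + 1) from P(X = j)
    return max(0.0, 1.0 - cdf)


def chance_of_excess(expected: float, excess: float = 0.5) -> float:
    """Chance of at least (1 + excess) * expected deaths under pure chance."""
    threshold = math.ceil((1.0 + excess) * expected)
    return poisson_upper_tail(expected, threshold)


if __name__ == "__main__":
    for expected in (4, 40):
        print(f"expected {expected:>2}: P(at least 50% excess) = {chance_of_excess(expected):.4f}")
```

With an expected count of 4, a 50% excess occurs by chance about one time in five; with an expected count of 40, it occurs well under 1% of the time.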
However, there are many other reasons why even in the absence of a causal
effect of a medical professional’s actions, numbers of deaths in different periods
will differ. The behavioural aspects of these reasons have been grouped together
by Kahneman into what he calls “noise”.44 This is really an umbrella term for
quite different kinds of effect that we prefer to distinguish, as they require
different treatments.
We have already noted that other measurable factors that differ between periods, such as season and time of day, must be included in the analysis before effects can be attributed to particular causes, and we have illustrated how to do this. In addition, however, we have to accept the possibility of unmeasured confounders, the “unknown unknowns” we mentioned earlier. Statisticians often accommodate these factors probabilistically, for example using random-effects models, which bring a second role of chance into the situation, one for which the law of large numbers does not assist us (unless we are dealing with a large number of periods). A simpler approach is to allow for overdispersion in standard models, assuming, for example, “extra-Poisson variation”.
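Extra-Poisson variation can be illustrated with a gamma-Poisson (negative binomial) model, which is one common choice but not necessarily the right one for a given case. The sketch assumes an expected count `MU` of 10 and a hypothetical dispersion parameter `SIZE`, so that the variance is `mu + mu**2 / size`; it compares the chance of at least 20 deaths under the plain Poisson model and under the overdispersed one.

```python
import math


def poisson_upper_tail(mu: float, k: int) -> float:
    """P(X >= k) for X ~ Poisson(mu)."""
    pmf = math.exp(-mu)  # P(X = 0)
    cdf = 0.0
    for j in range(k):
        cdf += pmf
        pmf *= mu / (j + 1)  # P(X = j + 1) from P(X = j)
    return max(0.0, 1.0 - cdf)


def negbin_upper_tail(mu: float, k: int, size: float) -> float:
    """P(X >= k) for a negative binomial with mean mu and variance mu + mu**2 / size.

    This is a Poisson count whose rate is itself gamma-distributed: the extra
    variation stands in for unmeasured factors that shift the death rate.
    """
    q = mu / (size + mu)
    pmf = math.exp(-size * math.log1p(mu / size))  # P(X = 0) = (size / (size + mu)) ** size
    cdf = 0.0
    for j in range(k):
        cdf += pmf
        pmf *= (j + size) / (j + 1) * q  # P(X = j + 1) from P(X = j)
    return max(0.0, 1.0 - cdf)


MU, K = 10.0, 20
SIZE = 10.0  # hypothetical: variance 20 rather than the Poisson value of 10

if __name__ == "__main__":
    print(f"Poisson:           P(X >= {K}) = {poisson_upper_tail(MU, K):.4f}")
    print(f"negative binomial: P(X >= {K}) = {negbin_upper_tail(MU, K, SIZE):.4f}")
```

The tail probability is about 0.0035 under the Poisson model and about 0.031 with the extra variation, close to a ninefold difference in how surprising the same count appears. As `SIZE` grows, the negative binomial approaches the Poisson model.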
While the p-values presented in these examples take into account variability
due to the size of the samples (ie, the number of cases examined), they do
not, and cannot, take into account additional variability that may arise due to
unmeasured variables. As discussed earlier, such factors may well increase or
decrease the rate of deaths observed in actual cases. That means that bad luck
(due to investigators’ failure to appreciate such factors) may increase the number
of “suspicious deaths” that are counted against an innocent suspect in such an
investigation, quite independently of the factors discussed in the hypothetical
illustrations; and good luck (if that is the proper term) may decrease the number
of deaths that are counted against a guilty suspect (if, for example, a killer
happened to commit murders during periods when an unappreciated variable
caused the death rate to be low).
6 Advice on investigative procedures
The foregoing analysis leads to three pieces of advice for professionals who
are asked to investigate alleged misconduct by medical professionals. First, do
not be oblivious to other factors that may affect the negative outcomes under
investigation. Try to understand fully the factors that may have affected the
rates of death or other negative outcomes that are at issue, and try to take all of
those factors into account when assessing the likelihood that negative outcomes
arose from misconduct by a particular individual.
Second, do not be biased; indeed, take active steps to avoid bias. Be familiar
with the potential for cognitive bias and the subtle and often unconscious ways
it can influence expert judgments, and take steps to minimise such biases. As
explained below, that will typically require the lead investigators to establish
context management procedures,45 in order to control the flow of investigative
information to other members of the investigative team. Failure to take such
steps may permanently impair the investigators’ efforts to produce findings that
will be helpful, rather than misleading, to police, prosecutors and triers-of-fact.
Third, be cautious about drawing conclusions from limited samples, such as death rates over short periods of time. When examining rates of unexpected deaths (or other negative outcomes), seek the advice of statisticians on the appropriateness of the samples selected for evaluation, on ways to reduce sampling error and various forms of bias, and on the meaning and proper interpretation of statistical findings (in terms of both statistical significance and effect size, eg, relative risk).46
44 Kahneman, Sibony and Sunstein, 2021.
45 See Dror et al., 2015; Stoel et al., 2015.
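The third piece of advice distinguishes statistical significance from effect size. The sketch below illustrates the difference with a rate ratio and an approximate 95% confidence interval, using the common large-sample (Wald) interval on the log scale; that interval is only a rough guide for small counts, where exact methods are preferable. Both sets of counts are invented.

```python
import math


def rate_ratio_ci(d1: int, t1: float, d0: int, t0: float, z: float = 1.96):
    """Rate ratio (d1/t1)/(d0/t0) with an approximate Wald interval on the log scale."""
    rr = (d1 / t1) / (d0 / t0)
    se_log = math.sqrt(1.0 / d1 + 1.0 / d0)
    return rr, rr * math.exp(-z * se_log), rr * math.exp(z * se_log)


# Invented example 1: small ward, short period; large apparent effect, very uncertain.
small = rate_ratio_ci(d1=12, t1=100, d0=4, t0=100)
# Invented example 2: large system, long period; small effect, precisely estimated.
large = rate_ratio_ci(d1=1100, t1=10_000, d0=1000, t0=10_000)

if __name__ == "__main__":
    for label, (rr, lo, hi) in (("small", small), ("large", large)):
        print(f"{label}: rate ratio {rr:.2f}, 95% CI {lo:.2f} to {hi:.2f}")
```

The first comparison shows a threefold rate ratio yet an interval (about 0.97 to 9.30) that includes 1; the second shows only a 10% excess, yet its interval (about 1.01 to 1.20) excludes 1. Neither figure alone says whether the excess matters or what caused it.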
6.a Identifying all potential causal factors
A key part of any investigation is identifying the full range of hypotheses that need to be explored. As discussed in the previous section, investigations can go awry and produce misleading findings if investigators miss or ignore an underlying causal factor. If an investigation is premised on the assumption that an unexpected set of patient deaths resulted either from intentional misconduct by a given individual or from coincidence, for example, then evidence that coincidence is unlikely will appear strongly incriminating. This conclusion may be misleading, however, if the elevated death rate arose, even in part, from other factors, such as:
• changes in the underlying population of patients,
• negligence or misconduct by other individuals,
• administrative changes affecting such matters as staffing levels, training, or case load, or
• changes in policy on moving patients between sections of the hospital.
Police are often called into such investigations by medical authorities who become suspicious of a given individual. The police may in turn rely upon those authorities to familiarise them with the situation and help them identify possible hypotheses in need of investigation. A danger inherent in this process is that medical authorities may have an interest in the outcome of the investigation that influences what they tell the police about possible causal factors. For example, faced with an upsurge in patient deaths, hospital administrators may find it easier to imagine that it was caused by the individual misconduct of a “bad apple” on the staff than to acknowledge that it may have arisen from administrative decisions related to staffing and service levels; this may happen unconsciously. Identifying possible causal factors is a complex task requiring considerable expertise in both medicine and medical administration; it should not be left to amateurs, nor to parties involved in the matter.
Consequently, it is essential that investigators in such cases have guidance from experts who are independent of the institution being investigated. Because several types of expertise are relevant, it will often be necessary to have an advisory team or panel. To investigate a surge in deaths in the geriatric wing of a hospital, for example, investigators may need to consult experts in geriatric medicine, experts in forensic pathology and forensic toxicology, and experts familiar with hospital procedures, staffing practices and similar issues. The advisory panel should have sufficient expertise to identify every factor that might plausibly have contributed to the surge in deaths.47 The advisory panel should also have the expertise needed to guide investigators on where to look for evidence that might support or undermine all plausible causal theories.
If the investigators attempt a statistical analysis, it is essential that they have guidance from a competent statistician. We discuss this further in paragraph (c) below.
46 Refer to Appendix 2.
47 Notice there may even remain “unknown unknowns”. For example, with hospital baby deaths, nobody could have imagined that a change from rubber to plastic could have put digoxin-related substances into babies’ bodies (see Hamilton, 2011). Of course, you can hardly take account of unknown unknowns quantitatively. But you should be aware that they are possible, and some statistical methods allow approximate adjustment for “unmeasured covariates”.
6.b Minimising bias
In 2009, the United States’ National Academy of Sciences observed that “forensic science experts are vulnerable to cognitive and contextual bias” that “renders experts vulnerable to making erroneous identifications” (p.4 note 8). In the United Kingdom, the Forensic Science Regulator reached similar conclusions.
Support for these conclusions can be found in a growing body of research showing that forensic examiners can be influenced by contextual factors that are irrelevant to the scientific assessments they are supposed to be performing. For example, latent print examiners who were told that a suspect had a solid alibi were less likely to conclude that a latent print found at the crime scene had come from the suspect.48 Similar findings have been reported in other forensic disciplines, including document examination,49 bite mark analysis,50 bloodstain pattern analysis,51 forensic anthropology52 and DNA analysis.53
To minimise such biases, academic commentators and some advisory bodies have recommended that bench-level forensic scientists adopt context management procedures that shield them from exposure to potentially biasing contextual information that is not needed for their scientific analyses.54 Generally, this requires dividing duties between a case manager and forensic examiners. The case manager communicates with criminal investigators, determines what evidence needs to be collected and what examinations are necessary, and then passes the evidence on to forensic examiners, who evaluate the evidence and draw conclusions. The division of duties makes it possible for the case manager to be fully informed about the underlying case and familiar with its context, while the examiner receives only the information needed to perform the analysis requested. In this manner the examiners can be “blind” to potentially biasing contextual information until after they have drawn conclusions. The fingerprint examiner does not learn, for example, whether the suspect has a solid alibi (or not) until after comparing the prints and drawing a conclusion. This procedure ensures that the examiner’s conclusion is based solely on the analysis of the physical evidence submitted for examination and is not biased by other contextual information.
48 Dror, 2006; Dror & Charton, 2006; Dror & Rosenthal, 2008.
49 Miller, 1984; Stoel, Dror & Miller, 2014.
50 Osborne, Woods, Kieser & Zajac, 2014.
51 Taylor, Laber, Kish, Owens & Osborne, 2016.
52 Nakhaeizadeh, Dror & Morgan, 2013.
53 Dror & Hampikian, 2011; Thompson, 2009.
54 Dror et al., 2015; Thompson, 2015; Thompson, 2011; Risinger et al., 2002.
We recommend that a similar procedure be employed when investigating allegations of misconduct by medical professionals. A lead investigator could play a role similar to that of the case manager by communicating with other investigators and relevant authorities and identifying evidence in need of further examination. The lead investigator would be fully informed of the facts surrounding the case, and would be supported by other individuals with specialised expertise who would conduct ancillary investigations. These ancillary investigators would not be fully informed about the underlying case, but would deliberately be kept blind to information that is unnecessary to a fair scientific assessment, including the prior opinions and conclusions of other parties in the case.
Consider cases where a medical professional is suspected of killing patients
by poisoning. It will likely be necessary to have toxicologists examine medical
specimens to assess whether deaths that occurred during relevant periods could
have been caused by poison. The toxicologists will need access to the specimens
and will likely need information about the circumstances under which they were
collected, but they need not know whether the specimens were collected from
patients to whom the alleged killer had access, or from patients to whom the
alleged killer did not have access. Information about access, although potentially
biasing for reasons discussed in Section 5, is clearly irrelevant to the scientific
analysis of the specimen. By keeping the toxicologists blind to this information,
the lead investigator can greatly reduce the risk that contextual bias will distort
the toxicological findings.
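The information flow just described can be sketched as a simple data-handling rule: the lead investigator holds the full case record, passes each specialist only an explicitly listed subset of fields, and releases the remaining context only once the specialist's conclusion has been recorded. The record structure, field names and helper functions below are hypothetical, chosen for illustration; this is a sketch of the principle, not a description of any existing system.

```python
from dataclasses import asdict, dataclass
from typing import Optional


class BlindingError(RuntimeError):
    """Raised when context is requested before a conclusion has been recorded."""


@dataclass(frozen=True)
class SpecimenRecord:
    """Hypothetical case record held by the lead investigator."""
    specimen_id: str
    specimen_type: str
    collection_time: str
    storage_conditions: str
    suspect_had_access: bool   # context: potentially biasing, irrelevant to the assay
    prior_opinions: str        # context: other parties' conclusions


# Fields a toxicologist plausibly needs (an illustrative choice, not a standard).
TOXICOLOGY_FIELDS = ("specimen_id", "specimen_type", "collection_time", "storage_conditions")


def brief_for_examiner(record: SpecimenRecord, fields: tuple[str, ...]) -> dict:
    """Return only the listed fields; everything else stays with the lead investigator."""
    full = asdict(record)
    return {name: full[name] for name in fields}


def release_context(record: SpecimenRecord, recorded_conclusion: Optional[str]) -> dict:
    """Reveal the full record only after the examiner's conclusion has been recorded."""
    if not recorded_conclusion:
        raise BlindingError("record the conclusion before unblinding")
    return asdict(record)


record = SpecimenRecord(
    specimen_id="S-001",
    specimen_type="blood",
    collection_time="post-mortem, day 1",
    storage_conditions="frozen",
    suspect_had_access=True,
    prior_opinions="(withheld)",
)
brief = brief_for_examiner(record, TOXICOLOGY_FIELDS)
```

The design choice is an allow-list rather than a block-list: a field reaches the examiner only if someone has decided it is needed for the analysis, so new context added to the record later is withheld by default.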
Lead investigators will typically need the assistance of forensic pathologists
when assessing whether a particular death should be regarded as “suspicious,”
and hence as a potential homicide. As with toxicologists, however, the forensic
pathologists need not be fully informed of all details of the investigation. When
assessing whether a particular death was “suspicious” they should remain blind
to whether the suspected medical professional had access to the deceased at
least until after they have recorded their conclusions on cause and manner of
death. Blinding will prevent the kinds of biases discussed in Section 5.
It is possible that some forensic pathologists will resist the use of blinding procedures that deprive them, even temporarily, of contextual information. Forensic pathologists in the United States have vociferously opposed the suggestion that they employ context management procedures in routine practice on
grounds that the case context is always potentially relevant to their assessment
of the medical history of the deceased and that no one other than a forensic
pathologist has the competence to assess which parts of the case context may
be medically relevant.55 Additionally, forensic pathologists are accustomed to
receiving all case information when they assess cause and manner of death for
the purpose of issuing death certificates. For that purpose, they are expected to
consider all the evidence surrounding the case that may bear on how the death
occurred, including circumstantial evidence. Forensic pathologists might even
consider it perfectly proper to take account of the fact that the deceased was
under the care of a suspected serial killer when deciding whether to report the
death as a homicide on the death certificate.
For reasons discussed in Section 5, however, in giving evidence to a court
of law on the specific evidence in question in this kind of investigation, it is
clearly improper for a forensic pathologist to be influenced by such information.
As illustrated in the hypothetical examples, forensic pathologists may create
an unfair bias against suspects if they allow such information to influence their judgments, and this undermines the fairness of the legal process. It is for the court to consider contextual information, such as that the deceased was under the care of a suspect, and the expert witness’s assessment should be limited to the specific scientific evidence concerned; if this principle is not adhered to, there is a risk of “double-counting” of the contextual information when the court weighs all of the evidence, with prejudicial consequences. Biases of this kind are difficult to control because they can occur without conscious awareness. Simply instructing a forensic pathologist to ignore contextual information may well be insufficient; better practice would be to avoid exposing them to potentially biasing information until after they have made their assessment.
55 See Simon, 2019.
In summary, blinding is central to proper and defensible conclusions from analysis of numerical evidence. To promote its increased and widespread use in legal practice, we recommend that it be adopted wherever practicable; that, to encourage good practice, its adoption be disclosed and indeed emphasised in court; and that, where it is not practicable, the reasons for this be explained.56
In such investigations a little statistical knowledge may be a dangerous thing.
Medical professionals or social scientists who have taken a few statistics courses
may know how to compute statistics such as relative risk ratios and p-values,
but may lack the sophistication and experience needed to do so in a manner
that takes into account all relevant variables. The statisticians who advise such
investigations should have three qualifications: (1) doctoral level training in
statistics, ideally with an appropriate professional qualification (eg CStat in the
UK); (2) experience with statistical analysis of medical data; and (3) familiarity with the academic literature on the investigation of misconduct by medical professionals, such as the literature cited herein, and this report itself. It is particularly important that statisticians advising such investigations be familiar with critical commentary that has identified flaws and limitations in previous investigations; the reference list contains a few examples. To improve the quality of future investigations, it is vital that investigators, and their advisors, treat past mistakes as opportunities for learning, rather than responding defensively or seeking to assign blame.57
Some annotated examples of correct analyses of illustrative data on patterns
of occurrence of adverse events are explained at length in Appendix 6, with
explicit computer code to reproduce these in Appendix 8. These analyses make
use of the standard statistical methodology of log-linear models (taught in the
UK in many undergraduate courses).
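The appendices' code is not reproduced here, and we do not assume its details. As a minimal stand-in, the sketch below fits a Poisson log-linear model by iteratively reweighted least squares with NumPy. It uses invented counts for four cells (shift type by whether a particular nurse was on duty), with the number of shifts entering as exposure through an offset; `fit_poisson_loglinear` is our own helper name. In practice an established routine, such as R's `glm` with `family = poisson`, would normally be used.

```python
import numpy as np


def fit_poisson_loglinear(X, y, offset, max_iter=50, tol=1e-10):
    """Fit log E[y] = offset + X @ beta by iteratively reweighted least squares.

    Returns the coefficient estimates and their approximate standard errors.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    offset = np.asarray(offset, dtype=float)
    mu = y + 0.1                           # starting values, as in R's poisson family
    eta = np.log(mu)
    for _ in range(max_iter):
        z = eta - offset + (y - mu) / mu   # working response (offset removed)
        XtW = X.T * mu                     # X^T W with weights W = diag(mu)
        beta = np.linalg.solve(XtW @ X, XtW @ z)
        eta = X @ beta + offset
        mu_new = np.exp(eta)
        converged = np.max(np.abs(mu_new - mu)) < tol
        mu = mu_new
        if converged:
            break
    cov = np.linalg.inv((X.T * mu) @ X)    # inverse Fisher information at the fit
    return beta, np.sqrt(np.diag(cov))


# Invented data; rows are (morning, nurse on), (morning, off), (night, on), (night, off).
deaths = np.array([16, 8, 3, 9])
shifts = np.array([200, 100, 100, 300])
night = np.array([0, 0, 1, 1])
nurse_on = np.array([1, 0, 1, 0])
X = np.column_stack([np.ones(4), night, nurse_on])

beta, se = fit_poisson_loglinear(X, deaths, offset=np.log(shifts))

if __name__ == "__main__":
    lo, hi = np.exp(beta[2] - 1.96 * se[2]), np.exp(beta[2] + 1.96 * se[2])
    print(f"adjusted rate ratio, nurse on duty: {np.exp(beta[2]):.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

Pooling these invented counts over shifts gives a crude rate ratio of about 1.49 for the nurse, but once shift type is in the model the estimated rate ratio is 1.00, because within each shift type the death rate is the same whether or not the nurse is on duty.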
6.c The role for statistics in other specialist evidence
In this report we concentrate on cases where the data underlying the evidence in
question consists of numbers of events, typically deaths. However, there is often
a need for statistical interpretation of other kinds of data, typically presented
by specialists of other disciplines, for example sciences such as toxicology or
pathology, or social sciences such as criminology and forensic psychology.
In an investigation where conclusions drawn by specialists also involve data subject to any variation or uncertainty, a statistical analysis is necessary; it is important to have the opinion of a statistician about the soundness of that analysis. Scientific “intuition”, rules of thumb, “standard lab practice”, etc., are no substitute for an analysis that is correct. As the example in footnote 10 illustrates, intuition can be a poor guide in evaluating probabilities, especially regarding apparent “patterns” in irregular data.
56 There is some evidence that the use of blinding procedures enhances the credibility of forensic evidence, particularly when the trier-of-fact appreciates that the evidence may depend, in part, on an expert’s subjective judgments; Thompson & Scurich, 2019.
57 See Matthew Syed, 2015.
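One way intuition goes wrong is by overlooking how many places a “pattern” could have appeared: if many nurses, wards or periods are each examined, some will look unusual by chance alone. The sketch below assumes, for simplicity, independent comparisons each judged at the same significance level; the numbers of comparisons are illustrative.

```python
def chance_at_least_one_flag(n_comparisons: int, alpha: float) -> float:
    """P(at least one of n independent null comparisons has p < alpha)."""
    return 1.0 - (1.0 - alpha) ** n_comparisons


if __name__ == "__main__":
    for n in (1, 10, 50, 200):
        print(f"{n:>3} comparisons at alpha = 0.01: {chance_at_least_one_flag(n, 0.01):.3f}")
```

With 50 comparisons, the chance that at least one looks “significant at 1%” without any wrongdoing is close to 40%. Real comparisons are rarely independent, so this is a rough guide rather than an exact figure.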
As an example of scientific analysis in a criminal case: when there is an unexpected number of deaths on a hospital ward, a pathologist is commonly called to assess the cause of death. In the case of nurse Bolding, among others, the prosecution claimed that the suspect had used potassium chloride to kill a patient, on the basis of evidence from an expert forensic pathologist that relied on earlier scientific studies relating the potassium ion (K+) concentration in the vitreous fluid of the eye to the post-mortem interval (PMI). In a recent case the relationship between K+ concentration and PMI was used to predict what the “standard” amount of K+ should be at a certain (observed) PMI.58 The patient had a higher concentration than predicted by the pathologist, and the nurse was then accused of having injected the patient with potassium chloride, causing her death. In a first trial, this evidence was sufficient to convince the court to convict the nurse and sentence her to life imprisonment, but the analysis used to predict what the post-mortem potassium level should have been was flawed. Not only was the sample used to make the prediction not representative of the case under examination, but no measure of uncertainty was attached to the prediction. If a correct statistical analysis had been carried out, the “incriminating” result would have been seen to lie within the “margin of error”, which means that the post-mortem analysis did not support the hypothesis that the dead patients had been injected with potassium chloride. The nurse was acquitted in a second trial in which a more complete statistical analysis was presented.
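The missing ingredient, a measure of uncertainty attached to the prediction, can be illustrated with the prediction interval from a simple linear regression. This sketch is not the analysis used in the case or by Dotto, Gill and Mortera: the ten (PMI, K+) pairs are invented, a straight-line model is assumed, and 2.306 is the two-sided 95% t quantile for 8 degrees of freedom. It also cannot address the separate problem of an unrepresentative reference sample.

```python
import math

# Invented reference data: post-mortem interval (hours) and vitreous K+ (mmol/L).
PMI = [5, 10, 15, 20, 25, 30, 35, 40, 45, 50]
K = [7.35, 6.70, 8.55, 9.90, 9.05, 10.30, 12.25, 11.10, 13.15, 12.90]
T_975_DF8 = 2.306  # two-sided 95% t quantile for n - 2 = 8 degrees of freedom


def prediction_interval(x, y, x0, t_crit):
    """Least-squares line and a prediction interval for a NEW observation at x0."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residual_ss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(residual_ss / (n - 2))  # residual standard deviation
    y_hat = intercept + slope * x0
    # The leading "1" accounts for the scatter of a single new case about the line.
    half_width = t_crit * s * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / sxx)
    return y_hat, y_hat - half_width, y_hat + half_width


y_hat, lower, upper = prediction_interval(PMI, K, x0=30, t_crit=T_975_DF8)
observed = 12.0  # hypothetical measurement, higher than the point prediction

if __name__ == "__main__":
    print(f"predicted K+ at PMI 30 h: {y_hat:.2f} (95% PI {lower:.2f} to {upper:.2f})")
    print(f"observed {observed} inside interval: {lower <= observed <= upper}")
```

Here the hypothetical measurement of 12.0 is about 1.5 mmol/L above the point prediction of about 10.48, yet inside the interval (about 8.64 to 12.31), so on its own it is consistent with the reference relationship.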
Likewise, any statistical analysis of data generated using other scientific
disciplines should be accompanied by expert opinion in those disciplines.
7 Advice for evidence evaluation and case presentation
7.a The lawyer’s role
Put starkly, a lawyer’s role in an adversarial system is to advise and represent those who instruct them, to the best of their ability. Lawyers work (or ought to work) from facts to a solution. The advisory role requires an objective, independent evaluation of the evidence. Clients are ill-served if they are told only what they want to hear; others (including those accused of crimes) and the administration of justice itself may also be harmed by biased reasoning.59 The representative role, however, is in essence the attempt to persuade the decision-maker (whether judge or jury) to make a finding favourable to the client on the evidence and argument, and within professional and ethical boundaries. The dividing line between persuasive and biased reasoning may be a fine one, and not always easy to identify; the potential harms of biased reasoning in a representative capacity are, however, self-evident and ought to be eliminated insofar as possible.
58 See Dotto, Gill and Mortera, 2022.
59 In The Scout Mindset, Julia Galef describes biased, or directionally motivated, reasoning as the “soldier mindset”. By contrast, the reasoning underlying the “scout mindset” is motivated by accuracy.
Both advisory and representative roles require an intimate knowledge of the facts and, where statistics are involved, of the uses to be made of such evidence and its limits. We suggest that the starting point ought to be a full, complete chronology (or thematic organisation) of the facts; this is common practice in some areas of law. Once that has been done, elementary strategies such as asking “what do we know?”, “what don’t we know?”, and “what do we need to know?” ought to help to identify gaps, guide investigations, frame issues, and assist the building of argument.60 If, once investigations are complete, gaps remain, then care ought to be taken not to fill them with speculation or (which may be the same thing) over-determined statistical analysis.
Even once the analysis is complete, there is still a place for sense-checking. Critical thinking techniques such as the “double standard test”, the “outsider test”, the “conformity test”, and the “selective sceptic test” are likely to be useful in testing inference and argument.61
Courts in the UK have warned, at the highest level, that “there is a danger that so-called ‘epidemiological evidence’ will carry a false air of authority.”62 The courts also recognise, however, that epidemiological evidence used with proper caution can be admissible and relevant in conjunction with specific evidence related to individual circumstances and parties to a case. The significance a court may attach to such evidence must depend on the nature of the epidemiological evidence, and of the particular factual issues before the court. The point was pithily made by Lord Rodger of Earlsferry that “Where there is epidemiological evidence of association, the court should not proceed to find a causal relationship without further, non-statistical evidence”,63 and by Baroness Hale of Richmond:
“The fact that there are twice as many blue as yellow taxis about on the roads may double the risk that, if I am run over by a taxi, it will be by a blue rather than a yellow one. It may make it easier to predict that, if I am run over by a taxi, it will be by a blue rather than a yellow one. But when I am actually run over it does not prove that it was a blue taxi rather than a yellow taxi which was responsible. Likewise, if I actually develop breast cancer, the fact that there is a statistically significant relationship between, say, age at first child-bearing and developing the disease does not mean that that is what caused me to do so.”64
60 Amy E Herman, Visual Intelligence, pp153-164.
61 Julia Galef in The Scout Mindset, p71, describes these tests as follows:
The Double Standard test is “Are you judging one person (or group) by a different standard than you would use for another person (or group)?”
The Outsider test is “How would you evaluate this situation if it wasn’t your situation?”
The Conformity test is “If other people no longer held this view, would you still hold it?”
The Selective Sceptic test is “If this evidence supported the other side, how credible would you judge it to be?”
Other strategies are to be found in Tom Chatfield’s Critical Thinking.
62 Sienkiewicz v Greif (UK) Ltd [2011] 2 AC 229 at p299, paragraph 206, per Lord Kerr of Tonaghmore JSC.
63 Also in Sienkiewicz, at p287, paragraph 163.
The criteria set out by Bradford Hill for inferring causality in epidemiological
studies are also relevant here.65
To properly advise their clients, frame the issues for the court, and present evidence and argument, lawyers need to (that is, should) understand and effectively deploy the statistical and epidemiological evidence and the inferences arising from it.
Evidence presented in a case at law can be regarded as data, and the issue
to be decided by the court as a hypothesis under test. The relationship between
these may be immediate, or else indirect, involving a long chain of intermediate
propositions. The outcome of a criminal trial cannot be known in advance, as it
requires sifting, evaluating, and deciding one or more issues. Because the
relationship between the evidence and the issue is uncertain, there is a burden
of proof. Such uncertainty can, in principle at least, be described
probabilistically. We do not suggest that judges and juries are likely to have
(or should be expected to acquire) a sophisticated understanding of probability
or facility in manipulating probabilities, nor that explicit probability arguments
should become routine in courts of law.66 There are, however, increasing numbers
of cases where evidence about probabilities is clearly relevant, and the court
would stand to benefit from advice about how to handle them.
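One standard way of making such a probabilistic description concrete, offered here only as an illustration and not as a procedure courts should adopt, is the odds form of Bayes’ theorem. The notation is introduced here for illustration: E is the evidence, H_p the prosecution hypothesis (misconduct) and H_d the defence hypothesis (an innocent explanation, such as coincidence).

```latex
\frac{P(H_p \mid E)}{P(H_d \mid E)}
  \;=\;
  \underbrace{\frac{P(E \mid H_p)}{P(E \mid H_d)}}_{\text{likelihood ratio}}
  \times
  \underbrace{\frac{P(H_p)}{P(H_d)}}_{\text{prior odds}}
```

The likelihood ratio measures how well the evidence discriminates between the two hypotheses; the posterior odds on the left also depend on the prior odds, which is why a small P(E | H_d) does not by itself make P(H_d | E) small.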
This section provides advice for lawyers and judges on how to deal with
such cases when they reach the courts. As the previous sections have discussed,
these cases are complex and can be extraordinarily difficult to investigate. If
the investigations lead to prosecution, the evidence produced in court can be
difficult to evaluate. In recent years, serious concerns have been raised about
the fairness of the legal process in a number of these cases. Here are some
suggestions about steps to take and pitfalls to avoid in order that these cases
are tried fairly and justice is served.
7.b Evaluating event clusters
The first recommendation is to avoid giving undue weight to seemingly unlikely
clusters of events. As discussed in Section 2, seemingly improbable clusters of
events can arise by chance without criminal behaviour. Consequently, evidence
involving event clusters may be less probative than people assume for distin-
guishing criminality from coincidence. Even if it is highly improbable that such
a cluster would occur by coincidence, the best explanation might nevertheless
be coincidence in the absence of convincing evidence to the contrary. Lawyers
and judges should keep this point in mind. In jurisdictions that rely on lay
juries, judicial instructions on this point may be helpful.
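A minimal sketch of why improbable-looking clusters turn up by chance: if a cluster as extreme as the one observed has probability p for any one innocent professional, and n professionals (or wards, or time windows) are in effect being scanned, the chance that at least one of them shows such a cluster is 1 - (1 - p)^n. The figures are hypothetical, and the calculation assumes the n units are independent, which real staff rotas will only approximately satisfy.

```python
def prob_at_least_one_cluster(p_single: float, n_units: int) -> float:
    """Chance that at least one of n_units independent units shows a cluster
    whose probability for any single unit is p_single."""
    return 1.0 - (1.0 - p_single) ** n_units


# Hypothetical figures: a "one in a thousand" cluster for an individual nurse,
# with various numbers of nurses (or nurse-years) in effect being scanned.
p_single = 0.001
for n in (1, 100, 1_000, 10_000):
    prob = prob_at_least_one_cluster(p_single, n)
    print(f"n = {n:>6}: P(at least one such cluster) = {prob:.3f}")
```

Under these assumed figures, an event that is “one in a thousand” for a given nurse becomes more likely than not (about 0.63) to occur somewhere among 1,000 nurses.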
In the absence of other evidence, an unlikely cluster of events is difficult
to evaluate and may be meaningless. When combined with other evidence,
however, such events can be highly probative. Consequently, it is extremely
important for triers of fact to consider whether evidence of an event cluster
(eg, a surprising number of deaths among the patients of a particular medical
professional) is supported by other more traditional evidence suggesting that
the suspect had the motive, means and opportunity to kill patients. Failure
to find such supportive evidence, when it would be expected, may constitute
strong evidence against the theory that the suspected individual engaged in
misconduct.67 Judicial instructions on this point might also be helpful in some
cases.
64Sienkiewicz, p290, paragraph 171.
65Hill, A. B., 1965.
66Although in particular jurisdictions there may be relevant requirements for admissibility,
eg in the USA, the Daubert standard for evaluating scientific evidence.
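The point that missing corroboration can cut against guilt can be sketched with likelihood ratios, a standard way of weighing evidence that is used here purely for illustration. A likelihood ratio above 1 favours misconduct, one below 1 favours an innocent explanation, and, if the items of evidence are conditionally independent given each hypothesis (an assumption), their likelihood ratios multiply. All probabilities below are hypothetical.

```python
def likelihood_ratio(p_if_misconduct: float, p_if_innocent: float) -> float:
    """Ratio P(evidence | misconduct) / P(evidence | innocent explanation)."""
    return p_if_misconduct / p_if_innocent


# Hypothetical: the observed cluster is 20 times more probable under misconduct.
lr_cluster = likelihood_ratio(0.20, 0.01)

# Hypothetical: a thorough search found no corroborating evidence (for example,
# no supporting toxicology or witness accounts). Had there been misconduct, such
# a search would usually have found something: P(nothing found | misconduct) = 0.1,
# whereas P(nothing found | innocent explanation) = 0.9.
lr_no_corroboration = likelihood_ratio(0.10, 0.90)

# Assuming conditional independence given each hypothesis, the ratios multiply.
combined = lr_cluster * lr_no_corroboration
print(f"cluster alone:    {lr_cluster:.1f}")
print(f"no corroboration: {lr_no_corroboration:.3f}")
print(f"combined:         {combined:.2f}")
```

With these assumed numbers, the absence of expected corroboration reduces the evidential weight of the cluster from a factor of 20 to a factor of about 2.2.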
7.c Recognising the consequences of investigative bias
A second recommendation is to be mindful of the dangers of investigative bias
and the ways that it can unfairly bolster the apparent strength of the evidence.
As discussed in Section 4, bias may arise from investigators’ failure to consider
all possible explanations for the deaths or other negative outcomes under in-
vestigation. Consequently, it is vitally important for lawyers, judges and jurors
to consider carefully whether all potential causal factors have been considered.
That will typically require expert assessments by individuals broadly knowl-
edgeable about clinical medicine and clinical practice, and often other fields as
well, such as epidemiology and statistics.
If it appears that the initial investigation may have been incomplete because
investigators had insufficient access to independent expert advice or focused
prematurely on an incomplete set of hypotheses, it is vital that independent
experts be called upon to examine the evidence before trial. Courts should
call such experts in jurisdictions where experts report to the courts, whether
independent court appointed experts or single joint experts. In jurisdictions in
which the parties typically provide courtroom experts, steps should be taken
to assure that such experts are allowed and that parties can afford to provide
them.68
Courts should of course consider any alternative explanations offered by the
individual under suspicion. It would be a mistake, however, to assume that
the person under suspicion has sufficient knowledge or insight to identify all
possible causal factors. During the trial of Dutch nurse Lucia de Berk (discussed
in Section 3), who was convicted of murdering patients but later exonerated,
the judges asked de Berk whether she could explain why there had been so
many deaths among her patients. They specifically asked her to comment on
such matters as whether she lacked competence, whether her case load was
more difficult, whether she had more night shifts. She could offer nothing to
help her own case. Yet independent experts who examined her case after she
was convicted identified a large number of potential causal factors that cast
the case in an entirely different light, and ultimately contributed to de Berk’s
release from prison. The initial investigation had missed or ignored some of
these factors, perhaps because they cast a negative light on individuals who
were involved in the initial investigation. The Lucia de Berk case is thus an
important cautionary tale about the need to involve independent experts before
trial to avoid subsequent miscarriages of justice. Ideally this will occur during
the initial investigation, but if not, then it needs to be done before the case
comes to trial.
67See commentary of Aart de Vos, translated in Gill, 2021; Thompson & Scurich, 2018.
68We recognize, of course, that public resources are limited and that a balance must be
struck between the needs of the parties for expert assistance and other priorities. Our goal is
to explain the importance of expert assistance on this matter, not to comment on the priority
it should receive relative to other needs.
The fairness of the investigation may also be undermined by the failure of
investigators to take adequate steps to mitigate contextual bias. As illustrated
in Section 4, predictable biases may arise when experts assess such matters as
whether a death was “suspicious” and whether the suspect had access to the
decedent. Even if the effect of such biases on the number of deaths counted
against the suspect is relatively small, the cumulative effect on statistics used
to assess the significance of event clusters, such as p-values, can be dramatic.
Lawyers and judges need to understand that investigative bias can create seemingly
powerful statistical evidence against someone who is entirely innocent. In
light of that insight, they must consider whether adequate procedures were followed
to control bias in the instant investigation and, if not, how that failure affects
the probative value of the statistical evidence generated by the investigation.
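A minimal sketch of how a small bias in counting can move a p-value. It assumes, as a simplification, that deaths on the suspect’s shifts follow a Poisson distribution with a known expected count under the coincidence hypothesis. The expected count of 3 and the observed counts of 7 and 9 are hypothetical: the second count imagines two extra deaths being classed as “suspicious” or attributed to the suspect’s shifts.

```python
import math


def poisson_upper_tail(k: int, mu: float) -> float:
    """P(X >= k) for X ~ Poisson(mu): the one-sided p-value for observing
    k or more events when mu are expected by chance."""
    pmf = math.exp(-mu)  # P(X = 0)
    cdf = 0.0
    for i in range(k):  # accumulate P(X <= k - 1)
        cdf += pmf
        pmf *= mu / (i + 1)  # P(X = i + 1) from P(X = i)
    return 1.0 - cdf


expected = 3.0  # hypothetical expected deaths under coincidence
for observed in (7, 9):  # count without bias vs count with two extra deaths
    p = poisson_upper_tail(observed, expected)
    print(f"observed {observed}: p = {p:.4f}")
```

Under these assumptions the p-value falls from about 0.034 to about 0.004, roughly a ninefold change caused by just two deaths.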
A poorly conducted investigation may yield statistical findings that are so
problematic that they do not warrant consideration. Jurisdictions that rely on
lay juries as triers of fact often require judges to screen scientific testimony to
assure it is sufficiently trustworthy to be admitted into evidence. In the United
States, for example, Rule 702 of the Federal Rules of Evidence requires the trial
judge to determine that proffered expert testimony “will help the trier of fact
to understand the evidence,” that it is “based on sufficient facts or data,” that
it is “the product of reliable principles and methods,” and that the expert “has
reliably applied the principles and methods to the facts of the case.”69 If the
underlying investigation failed to consider relevant causal factors that might
provide an alternative explanation for a cluster of deaths, and carried out the
assessment linking the defendant to those deaths in a biased manner, it might
well be appropriate for a judge to find that the resulting evidence does not meet
the requirements of Rule 702 and should be excluded. Such evidence might also
be subject to challenge under provisions such as Rule 403 of the Federal Rules
of Evidence, which allows a trial judge to exclude evidence from consideration
by a jury “if its probative value is substantially outweighed by a danger of . . .
unfair prejudice, confusing the issues, [or] misleading the jury . . . .”
Similar rules apply in England and Wales.70 To assist the court, skilled (or
expert) witnesses, unlike other witnesses, can give evidence of their opinions,
but only on matters within their domain of proven and relevant expertise.71
If on the proven facts a judge or jury can form their own conclusions without
help, then the opinion of an expert is unnecessary (and thus inadmissible).72
As with judicial or other opinions, what should carry weight is the quality of
the expert’s reasoning, not whether the expert’s conclusions accord with other
evidence.
69The United States Supreme Court has addressed the admissibility of expert evidence under
the Federal Rules in a series of cases that began with Daubert v. Merrell Dow Pharmaceuticals
(1993) and included General Electric v. Joiner (1997) and Kumho Tire v. Carmichael (1999).
All three cases emphasized the judge’s role as a “gatekeeper” with the authority to exclude
from the trial expert evidence that is insufficiently “reliable” to be trustworthy.
70Kennedy v Cordia Services LLP 2016 SC (UKSC) 59, paragraphs [38] et seq.; see also
Forensic Science Regulator (2015, 2021a, 2021b, 2022).
71The Professor Meadow and shaken-baby syndrome cases were cautionary examples of
an expert straying outside their domain of expertise (medicine) and being regarded by
the court as also expert in an area outside it (statistics).
72In the case of Ben Geen (see Gill RD, Fenton N, & Lagnado D, 2022), the judge ruled
that written opinions on the bias of the investigation, submitted by two experts in the
fields of statistics and medicine, were merely common sense. The experts were not allowed
to present their opinions to the jury.
The expert should be careful to recognise, however, the need to avoid
supplanting the court’s role as the ultimate decision-maker on matters that are
central to the outcome of the case. The expert’s role is to provide information
and analysis that is helpful to the trier-of-fact, not to comment directly on how
the trier-of-fact should decide the case. The duties of an expert, including the
duty of impartiality, include the following:
Expert evidence presented to the court should be, and should be seen to be,
the independent product of the expert uninfluenced as to form or content by
the exigencies of litigation.
An expert witness should provide independent assistance to the court by
way of objective unbiased opinion in relation to matters within his or her ex-
pertise. An expert witness in the High Court should never assume the role of
an advocate.
An expert witness should state the facts and assumptions on which the
opinion is based, and should not omit to consider material facts which could
detract from the concluded opinion.
An expert witness should make it clear when a particular question or issue
falls outside his or her expertise.
If an expert’s opinion is not properly researched because insufficient data
is available, then this must be stated with an indication that the opinion is no
more than a provisional one. In cases where an expert witness who has prepared
a report could not assert that the report contained the truth, the whole truth
and nothing but the truth without some qualification, that qualification should
be stated in the report.
If, after exchange of reports, an expert witness changes his or her view on
a material matter having read the other side’s expert’s report or for any other
reason, such change of view should be communicated (through legal represen-
tatives) to the other side without delay and when appropriate to the court.
Where expert evidence refers to photographs, plans, calculations, analyses,
measurements, survey reports or other similar documents, these must be pro-
vided to the opposite party at the same time as the exchange of reports.73 This
applies also to software.74
In many jurisdictions, particularly those that use professional judges as triers
of fact, there are no rules for exclusion of untrustworthy or unreliable scientific
evidence. In those jurisdictions statistics generated in a poorly conducted in-
vestigation would need to be considered but could, of course, be dismissed or
ignored if the judges found them unpersuasive.
While some investigations may be sufficiently problematic to justify exclud-
ing or ignoring the statistical findings entirely, courts are likely, in most cases,
to treat investigative flaws, methodological limitations and potential biases as
issues going to the weight of the evidence, that is, as issues for the trier of
fact to consider when weighing the value of the evidence. In light of the issues
discussed in this report, it should be clear that advice, reports and expert
testimony from independent statisticians may be extremely important. If in-
vestigative bias is a significant concern, lawyers and courts should also consider
seeking evaluations from experts of cognitive bias and factors associated with
the accuracy of expert judgment.
73Kennedy, p74, para [52], quoting from well-established case law.
74Similar best practice rules for experts are included in both the UK Civil Procedure
Rules (CPR) and various International Arbitration rules.
7.d Avoiding fallacious interpretations of statistical findings
A third recommendation is to be mindful of the danger of drawing illogical con-
clusions from statistical findings, such as p-values, and to take steps to assure
that misinterpretation of statistical findings does not undermine the fairness of
the trial. As discussed in Section 2, people often transpose conditional prob-
abilities, which can cause them to draw illogical and unwarranted conclusions
from statistics like p-values, conclusions that may be quite unfair to an accused
individual.
The first step in avoiding unfairness is for lawyers and judges to educate
themselves about the proper interpretation of such statistics, so that they can
avoid inadvertently incorporating such errors into arguments they make to the
triers of fact or summations of evidence. It is also important that lawyers and
judges avoid eliciting from experts testimony that incorporates or is conducive
to such errors.
Avoiding error in the presentation of evidence is only the first step. Be-
cause people often jump to illogical conclusions on their own, it is not enough
to present the evidence correctly. The trier of fact is likely to need guidance
on correct interpretation of such statistics. If the triers of fact are professional
judges, that guidance could be part of their professional training. Some jurisdic-
tions are exploring the possibility of special education in probability for judges
who will handle cases involving statistical evidence; such training would surely
be appropriate for judges handling this class of cases.
With lay triers of fact, the guidance can take two forms. It could be incorpo-
rated into expert testimony. For example, experts could be asked to comment on
the meaning of a p-value or similar statistic, which might allow them to identify
incorrect interpretations. An expert might say, for example, that a low p-value
does not necessarily imply a low probability that the findings are coincidental.
It means that the evidence observed is unlikely if coincidence is the underlying
explanation, but coincidence may still be more likely than other explanations in
the absence of convincing evidence supporting another explanation. In jurisdictions
where judges instruct the jury on applying the law to the facts, guidance
to this effect might also be incorporated into the judge’s instructions.
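The distinction an expert would be drawing here can be illustrated numerically with Bayes’ theorem. The inputs below are entirely hypothetical: a prior probability of misconduct for a professional under scrutiny, the probability of the observed cluster if there was misconduct, and its probability under coincidence (roughly what a p-value describes, although strictly a p-value also counts more extreme outcomes). The sketch also simplifies by treating misconduct and coincidence as the only two explanations.

```python
def posterior_coincidence(prior_misconduct: float,
                          p_data_if_misconduct: float,
                          p_data_if_coincidence: float) -> float:
    """P(coincidence | data) by Bayes' theorem, treating misconduct and
    coincidence as the only two (exhaustive) hypotheses."""
    prior_coincidence = 1.0 - prior_misconduct
    joint_coincidence = p_data_if_coincidence * prior_coincidence
    joint_misconduct = p_data_if_misconduct * prior_misconduct
    return joint_coincidence / (joint_coincidence + joint_misconduct)


# Hypothetical inputs: the cluster has probability 0.001 under coincidence and
# 0.5 under misconduct; the prior probability of misconduct is 1 in 100,000.
post = posterior_coincidence(prior_misconduct=1e-5,
                             p_data_if_misconduct=0.5,
                             p_data_if_coincidence=0.001)
print(f"P(data | coincidence) = 0.001, P(coincidence | data) = {post:.3f}")
```

Under these assumed inputs the data are very unlikely under coincidence, yet coincidence remains by far the more probable explanation (about 0.995), because the prior probability of misconduct is so small.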
8 Conclusions and summary of recommendations
In this final section, we draw together our main recommendations. We reiterate
that the scope of this report is the use of evidence based on statistical analysis in
cases of suspected medical malpractice. Some of our recommendations may be
appropriate in other contexts, but that is for others to say. We also recall that
our scope is not limited to any particular jurisdiction; in some jurisdictions some
of our recommendations may be redundant as they advocate what is already
accepted practice.
It should be clear now that in our view, the statistical aspects of these cases
are often nontrivial, fraught with difficulties, challenging to laypeople (jurors,
media reporters, the public) and to lawyers. They are not entirely straightforward
even for specialists!
Recommendation 1: It is therefore important that all parties involved in
investigation and prosecution in such cases consult with professional statisti-
cians, and use only such appropriately qualified individuals as expert witnesses.
[Section 5(c)]
There are two kinds of error in drawing inferences about effects from data:
inferring an effect that is not real, or missing one that is. Both have grave effects
in the judicial setting. It has been argued that if one decreases the error rate
of one of the two kinds, the error rate of the other kind will go up; thus any
change in practice shifts the balance between prosecutor and defence, shifting
the errors from Type 1 to Type 2 or vice versa. That is only the case if nothing
is changed in statistical methodology, apart from merely shifting a decision
threshold. But one can reduce both error rates by increasing the amount of
information extracted from the already available data, using superior statistical
methods, and of course by acquiring more and different kinds of data.
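The claim that both error rates can be reduced together, rather than merely traded against each other, can be checked in a toy model. The sketch assumes, hypothetically, that death counts are Poisson with a rate of 3 per period under coincidence and 6 per period under misconduct, and uses a simple rule that flags misconduct when the count reaches a threshold. Moving the threshold trades one error for the other; observing four periods instead of one (more data) lowers both.

```python
import math


def poisson_cdf(k: int, mu: float) -> float:
    """P(X <= k) for X ~ Poisson(mu); returns 0.0 for k < 0."""
    pmf, cdf = math.exp(-mu), 0.0
    for i in range(k + 1):
        cdf += pmf
        pmf *= mu / (i + 1)
    return cdf


def error_rates(mu_coincidence: float, mu_misconduct: float, threshold: int):
    """Type 1 and Type 2 error rates for the rule 'flag if count >= threshold'."""
    type1 = 1.0 - poisson_cdf(threshold - 1, mu_coincidence)  # false alarm
    type2 = poisson_cdf(threshold - 1, mu_misconduct)         # missed effect
    return type1, type2


# One period of data: shifting the threshold only trades one error for the other.
for c in (5, 6):
    t1, t2 = error_rates(3.0, 6.0, c)
    print(f"1 period,  threshold {c:>2}: Type 1 = {t1:.3f}, Type 2 = {t2:.3f}")

# Four periods of data (expected counts scale by 4): both error rates fall.
t1, t2 = error_rates(12.0, 24.0, 18)
print(f"4 periods, threshold 18: Type 1 = {t1:.3f}, Type 2 = {t2:.3f}")
```

With these assumed rates, the four-period rule has a Type 1 error of about 0.06 and a Type 2 error of about 0.09, both lower than either one-period option.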
Recommendation 2: In presenting the results of statistical tests, both the
level of statistical significance (p-value) and the estimated effect size should
be stated. The first addresses the question of whether an effect is truly detected;
the second quantifies the size of that effect, if it exists. These are different
concepts and both are important; neither should be confused with subjective
judgements about the credibility of the expert witness. [Section 4(c), Section 5,
and Appendix 2]
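To illustrate why both numbers are needed, the sketch below reports the ratio of observed to expected deaths as an effect size alongside a one-sided p-value. The counts are hypothetical, and deaths are assumed, as a simplification, to be Poisson with a known expected count under coincidence. A large unit with a modest excess and a small unit with a large excess can both give p-values below 0.01 while their effect sizes differ almost threefold.

```python
import math


def poisson_upper_tail(k: int, mu: float) -> float:
    """One-sided p-value P(X >= k) for X ~ Poisson(mu)."""
    pmf, cdf = math.exp(-mu), 0.0
    for i in range(k):  # accumulate P(X <= k - 1)
        cdf += pmf
        pmf *= mu / (i + 1)
    return 1.0 - cdf


def summarise(observed: int, expected: float) -> tuple[float, float]:
    """Return (effect size as observed/expected ratio, one-sided p-value)."""
    return observed / expected, poisson_upper_tail(observed, expected)


# Hypothetical: a large hospital with a small excess vs a small ward with a large one.
for label, obs, exp_ in (("large hospital", 125, 100.0), ("small ward", 7, 2.0)):
    ratio, p = summarise(obs, exp_)
    print(f"{label}: observed/expected = {ratio:.2f}, p = {p:.4f}")
```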
Special care is needed to assure that p-values, when presented in reports and
testimony, are understood and used properly. While p-values are an important
statistical and scientific tool, they are difficult for people to understand and
are frequently misinterpreted. They may, for example, be misunderstood as
statements about the probability that a coincidence occurred, rather than the
probability of observing a given number of deaths (or more) by chance, and this
kind of misinterpretation can be extremely unfair to individuals suspected of
misconduct.
Recommendation 3: In reports and testimony, experts should take care to
explain the proper interpretation of p-values and should avoi