All content in this area was uploaded by Scott McLachlan on May 13, 2020
The fundamental limitations of COVID-19
contact tracing methods and how to resolve
them with a Bayesian network approach
Scott McLachlan1,2, Peter Lucas3, Kudakwashe Dube2,4, Graham A Hitman5, Magda
Osman6, Evangelia Kyrimi1, Martin Neil1, Norman E Fenton1
1 Risk and Information Management, Queen Mary University of London, United Kingdom
2 Health informatics and Knowledge Engineering Research (HiKER) Group
3 Faculty of EEMCS, University of Twente, Netherlands
4 School of Fundamental Sciences, Massey University, New Zealand
5 Centre for Genomics and Child Health, Blizard Institute, Queen Mary University of London, United Kingdom
6 Biological and Experimental Psychology Group, Queen Mary University of London, United Kingdom
Abstract
Many digital solutions mainly involving Bluetooth technology are being proposed for Contact Tracing
Apps (CTA) to reduce the spread of COVID-19. Concerns have been raised regarding privacy, consent, uptake
required in a given population, and the degree to which use of CTAs can impact individual behaviours. The
introduction of a new CTA alone will not contain COVID-19. The best-case scenario for uptake requires between
90 and 95% of the entire population for containment. This does not factor in any loss due to people dropping out
or device incompatibility or that only 79% of the population own a smartphone, with less than 40% in the over-
65 age group. Hence, the best-case scenario is beyond that which could conceivably be achieved. We propose
to build on some of the digital solutions already under development, with the addition of a Bayesian network
model that predicts likelihood for infection supplemented by traditional symptom and contact tracing. When
combined with freely available COVID-19 testing with results in 24 hours or less, an effective communication
strategy and social distancing, this solution can have a very beneficial effect on containing the spread of this
pandemic.
1. Introduction
At the time of writing, many of us are several weeks into social distancing and lockdown, an
effort that, we were told, would flatten the curve and curtail the spread of COVID-19. As
considerations move from dealing with the worst of the disease to containment of any remaining
pockets of infection, much noise is being made in the media concerning the need to implement
contact tracing apps (CTA) before the world can return ostensibly to normal (Mathews, 2020;
Scott, 2020; Whittaker, 2020; Drew, 2020). While the claimed benefits for CTA of being able to
leave our homes, reopen workplaces and revive crippled economies are significant, CTA are not
without some controversy (Lomas, 2020; Volk, 2020) including whether they work on every
generation or type of smartphone (Duell, 2020). Questions regarding transmission dynamics and
optimal intervention strategies for the disease, and the risk CTA pose to individual privacy and
efficacy are repeatedly raised, and many feel these have not been adequately answered (Crocker
et al, 2020; Sun & Viboud, 2020). Some describe CTA as a Trojan horse, reminding us that many
governments and corporations already operate population-wide electronic surveillance and, once
they also get access to our CTA data, will not act in good faith (Lomas, 2020). However, few ask
whether this personal information is being provided in support of the most effective, or even an
effective, method, and at what uptake rate in the general population we can be sure that it will be
worthwhile.
Is a Bluetooth radio beacon paired to a smartphone app the most effective method for digital
contact tracing? In this paper we address these key questions for smartphone-based contact
tracing solutions. In Section 2 we provide a general overview of contact tracing including the
different types, current supporting technology, data/privacy issues and limitations to its
effectiveness. In Section 3 we review the current and proposed contact tracing systems for
COVID-19. Our own proposed solution, BayesCOVID, is presented in Section 4. Section 5
contains a discussion and conclusions.
2. Contact Tracing Overview
Proposed more than 80 years ago for the control of syphilis (Paran, 1937), contact tracing
is a surveillance and containment strategy for infectious disease (Vazquez-Prokopec et al, 2017).
Rather than managing only isolated cases as they seek medical attention, contact tracing follows
the path of infection from diagnosed patients to those with whom they have been in close physical
contact (Armbruster & Brandeau, 2007; Eames, 2007; Vazquez-Prokopec et al, 2017). Several
approaches (summarised in Figure 1) for contact tracing have been described in the literature
(Eames, 2007; Klinkenberg et al, 2006).
Figure 1: Contact Tracing approaches
(a) First-order: Identifies only immediate contacts of the infected.
(b) Single-step: Identifies immediate contacts, their contacts and so on. Only deals with those
who are symptomatic. Asymptomatic infecteds can spread disease until detected and
isolated.
(c) Iterative: Tracks and re-applies diagnostic test to contacts iteratively. Process continues
until no further infecteds are identified.
(d) Retrospective: Same as single-step or iterative but also seeks to work back, identifying who
infected the patient, who infected that person and so on.
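The difference between the first-order and iterative approaches in Figure 1 can be sketched on a toy contact graph. The graph, the names and the infection statuses below are hypothetical and serve only to illustrate the two definitions; they do not come from any dataset used in this paper.

```python
from collections import deque

# Hypothetical contact graph: person -> people they were in close contact with.
CONTACTS = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A"],
    "D": ["B", "E"],
    "E": ["D"],
}
# Hypothetical test results (True = infected), unknown to the tracer until tested.
INFECTED = {"A": True, "B": True, "C": False, "D": True, "E": False}


def first_order(index_case):
    """Approach (a): identify only the immediate contacts of the index case."""
    return set(CONTACTS[index_case])


def iterative(index_case):
    """Approach (c): test each contact and keep tracing from every contact who
    tests positive, until no further infecteds are identified."""
    traced, found = set(), {index_case}
    queue = deque([index_case])
    while queue:
        case = queue.popleft()
        for contact in CONTACTS[case]:
            if contact == index_case or contact in traced:
                continue
            traced.add(contact)
            if INFECTED[contact]:
                found.add(contact)
                queue.append(contact)
    return traced, found
```

In this toy graph, first-order tracing from A reaches only B and C, whereas iterative tracing also reaches D (infected via B) and D's contact E.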
Contact tracing has traditionally been conducted manually when a patient is diagnosed
with an infection that is usually also subject to notification rules that require the clinician to inform
the health authority (HA). Any likely contacts of the infected patient are determined, identified,
advised of their exposure status and encouraged to seek medical advice (Armbruster & Brandeau,
2007; Eames, 2007). Generally, contact tracing has only been used for diseases with low
prevalence, such as tuberculosis, HIV/AIDS, Ebola and sexually transmitted diseases
(Armbruster & Brandeau, 2007; Danquah et al, 2019; Eames, 2007; Yasaka et al, 2020). While
some believed contact tracing was effective during the SARS outbreaks of the early 2000s (Kiss
et al, 2005), Singapore’s reliance on contact tracing during that period was found on review to
have failed (Fidler, 2004; Huat, 2006). Other examples where contact tracing failed, in some
cases even with the use of smartphone technology and apps, include an audit of contact tracing
use for tuberculosis (Hussain et al, 1992; Mwongela, 2018); the Foot and Mouth outbreak in the
UK in 2001 (Kiss et al, 2005; Kao, 2003); and the 2014-16 Ebola epidemic in West Africa
(Danquah et al, 2019).
2.1 Modern contact tracing using wireless beacons
With our vastly increased global population, international airline travel, megacities and
mass transit, it is unlikely that traditional contact tracing alone could contain even a minimally
contagious disease (Niehus et al., 2020). Traditional contact tracing was used early-on during the
SARS epidemic (Fidler, 2004; Huat, 2006). However, it failed to contain the infection and global
HAs realised that new approaches were now required (Fidler, 2004; Huat, 2006). Modern contact
tracing approaches have been proposed using ubiquitous and pervasive smartphones and the
wireless technologies they contain to record and report when we have come into close physical
contact with others. It is believed this automated contact tracing will overcome situations when
we either are not aware of, or don’t recall, every contact incident (Maghdid & Ghafoor, 2020). The
proposed approaches shown in Figure 2 incorporate these technologies to more efficiently and
effectively provide: (a) movement-focused mobile-assisted automatic contact recording; (b)
contact identification; (c) contact notification; and, (d) narrowcast messaging (Maghdid & Ghafoor,
2020; Vazquez-Prokopec et al, 2017; Yasaka et al, 2020). Proponents of CTA claim that installing
the app will significantly reduce the chance of passing on the infection to family and friends
(COVIDSafe App, 2020), and that it is essential to keeping your family safe from COVID-19
(Hamilton, 2020).
Figure 2: Modern applications of contact tracing using smart devices
(a) CTA automatically records anonymous IDs of other devices that come within the broadcast distance of the
wireless technology being used (Bluetooth or wifi).
(b) Central server operated by health authority or a technology supplier maintains the linking table that can
identify all users of CTA.
(c) When an infected person (in RED) notifies the CTA of their positive diagnosis, the central server advises
all CTA users who have been in close physical contact with the infected that they should seek medical
advice.
(d) The central server can also be used to send narrowcast messages, for example: alerting people who CTA
location tracing identified near a particular infection hotspot during a defined period (in GREEN) that they
may have been exposed and to seek medical advice.
While solutions using WiFi MAC address sniffing (Lu et al, 2020), GPS (Finazzi, 2020;
Klopfenstein et al, 2020; Maghdid & Ghafoor, 2020) and cellular network geolocating (DP3T,
2020; PEPP-PT, 2020) have all been proposed, many believe Bluetooth tracing to be the most
suitable for use in CTA (Berke et al, 2020; Brack et al, 2020) and that it has already been
demonstrated effective for proximity detection (Berke et al, 2020; Brack et al, 2020). It is also
claimed that, while Bluetooth has an effective range of around 25-30 metres, signal strength can
be used to effectively identify whether another device is within the 2-metre rule promoted as a
component of social distancing (Berke et al, 2020; Brack et al, 2020; Xia & Lee, 2020).
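To make the signal-strength claim concrete, the sketch below converts a received signal strength (RSSI) reading into a distance estimate. It assumes the standard log-distance path-loss model, which is our illustrative choice and not necessarily the method used by the cited authors. The calibration values (an RSSI of -59 dBm at 1 metre and a path-loss exponent of 2.0) are hypothetical.

```python
def estimate_distance_m(rssi_dbm, rssi_at_1m_dbm=-59.0, path_loss_exponent=2.0):
    """Estimate distance in metres from an RSSI reading using the log-distance
    path-loss model. Both default calibration values are hypothetical."""
    return 10 ** ((rssi_at_1m_dbm - rssi_dbm) / (10 * path_loss_exponent))


def within_two_metres(rssi_dbm, **calibration):
    """True if the estimated distance falls inside the 2-metre rule."""
    return estimate_distance_m(rssi_dbm, **calibration) <= 2.0
```

Under these assumed values a reading of -65 dBm maps to just under 2 metres and -79 dBm to about 10 metres. In practice RSSI is attenuated by bodies, pockets and walls, so any such estimate is noisy and sensitive to the calibration chosen.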
2.2 CTA Data Points and Privacy
Most attention to privacy in the literature focuses on the interactions and data passing
between users of the CTA when they come into close physical contact and their devices
handshake. A smaller focus is given to interactions between the CTA and HA server, whose
privacy exposure is mitigated, it is claimed, by decentralised solutions: that is, solutions where
most data remains on the user’s device and only small push or pull transactions occur to the HA
server to either advise the system of the user’s COVID-19 diagnosis, or verify that the user has
not already been in contact with another who has since been diagnosed. What is clear is that,
while labelling their solutions as privacy-preserving, most authors seek to mitigate one form of
data or privacy loss while ignoring, intentionally or not, every other possible disclosure vector
(Kuhn et al, 2020). To the best of our knowledge, no author even considered the issue of metadata
and its effect in nullifying their often complicated and expensive privacy solutions.
Metadata is the most common and easily accessible form of personal information being
collected (McLachlan, 2016). Metadata is defined as information about a communication: the who,
when, where, and how but not the what. Metadata contains sufficient information to know when
you made a call, texted, emailed or accessed a web page, who your communication or web
request was made to, how and whether the person or system at the other end received the
communication. The only thing metadata does not contain is the actual content of the message
(Maurushat et al, 2015). Metadata has been used by law enforcement and others to draw
inferences about our state of mind, intentions, previous travel, personal associations and
interactions (Maurushat et al, 2015; McLachlan, 2016). In many countries metadata may be
accessed without a warrant by authorised organisations and agents, and laws exist requiring
telecommunications, internet service provider companies and web hosts to maintain large stores
of metadata collected as a result of the activities of individual subscribers (Maurushat et al, 2015;
McLachlan, 2016; Shamsi et al, 2018).
Metadata are generated at every step of the typical CTA scenario. Every communication
or request sent to cellular, internet service provider or web host organisations results in metadata
that must be stored in logs in their network that identify you from your subscriber identity module
(SIM) record matched to the details of your device, with a record of what you requested or sent,
to or from whom, and when (De Carli et al, 2020; McLachlan, 2016; Shamsi et al, 2018). All digital
traffic passing from your provider’s network via the internet to the HA results in metadata being
captured in the systems of every network provider between the two, but more importantly, in the
HA’s network systems and servers. Believed by many to be non-sensitive, metadata often
remains overlooked in smartphone and internet-facing solutions even though it can be a trivial
matter to re-identify an individual and their actions and interactions with others from the metadata,
or digital breadcrumbs, they create (Ho et al, 2018; Maurushat et al, 2015; Perez et al, 2018;
Shamsi et al, 2018).
2.3 Efficacy of CTA
Many issues limit contact tracing efficacy, the most significant being the need to
understand transmission, susceptibility, prevalence, and latency for the target disease (Kiss et al,
2005). Before deciding on an effective control strategy, it is essential to understand the course of
the disease. In epidemiology, many compartmental models have been developed for modelling
infectious diseases (Roddam, 2001; Hethcote, 2000). One commonly used model, which computes the
theoretical number of people infected with a contagious disease in a closed population over time,
is the Susceptible-Infected-Recovered (SIR) model (Anderson, 1991; Rodrigues, 2016). These
mathematical models are considered an important source of knowledge for life-or-death decisions
regarding management of COVID-19. The Susceptible-Exposed-Infected-Recovered (SEIR)
model has been used to focus on transmission of COVID-19 in Wuhan, China (Lin et al, 2020),
and to compare outcomes for different containment policies (Casella, 2020). The Susceptible-
Infectious-Recovered-Dead (SIRD) model has been used to provide estimations of the basic
reproduction number (R0), per day infection mortality and recovery rates, and attempts to forecast
the evolution of an outbreak at the epicentre three weeks in advance (Anastassopoulou et al.,
2020). Susceptible-Infected-Diagnosed-Ailing-Recognized-Threatened-Healed-Extinct
(SIDARTHE) was proposed as an extension to SIR in an effort to model the COVID-19 epidemic
in Italy (Giordano et al., 2020). Their model showed that enforced lockdowns could be mitigated
in the presence of widespread testing (Peto, 2020) and contact tracing, strongly contributing to
rapid resolution of the epidemic. Similar findings were reported by Hellewell et al. (2020).
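For readers unfamiliar with compartmental models, the basic SIR model can be sketched in a few lines. This is a minimal forward-Euler simulation of the standard SIR equations in a closed population. The parameter names (beta for the transmission rate, gamma for the recovery rate) follow common convention, and any values passed in are the reader's own assumptions; nothing here is fitted to COVID-19 data or taken from the cited studies.

```python
def simulate_sir(population, initial_infected, beta, gamma, days, dt=0.1):
    """Forward-Euler integration of the SIR model in a closed population.

    beta  : transmission rate per day (assumed, not fitted)
    gamma : recovery rate per day (assumed, not fitted)
    Returns a list of (time, susceptible, infected, recovered) tuples.
    """
    s = float(population - initial_infected)
    i = float(initial_infected)
    r = 0.0
    history = [(0.0, s, i, r)]
    steps = int(round(days / dt))
    for step in range(1, steps + 1):
        new_infections = beta * s * i / population * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((step * dt, s, i, r))
    return history
```

The ratio beta/gamma plays the role of the basic reproduction number (R0) in this simple model. Extensions such as SEIR, SIRD and SIDARTHE add further compartments to the same structure.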
While many claim suitability, viability and effectiveness for CTA, in most cases the CTA
solution they proposed has yet to be prototyped, and of those that were, none was tested in
anything approaching a real-world situation (Brack et al, 2020; De Carli et al, 2020; Hekmati et al,
2020; Klopfenstein et al, 2020; Mwongela, 2018; Yasaka et al, 2020). We sought to understand
how effective CTA might be as a containment approach for COVID-19 in highly populous locations
like London or Birmingham in the UK, or Sydney and Melbourne in Australia. With respect to how
many people an infected person may come into contact with, we rely on the calculations provided
in the UK that have come to be known as the Oxford figures and have been used by those
developing and promoting the need for an NHS-specific app, and in the media, to support efficacy,
funding and deployment of the NHS app (Merrick, 2020). Keeling et al (2020) used an SEIR model
to suggest that in a 14-day period post-lockdown the average person comes into contact with 217
people, of whom 59 are considered to be close contacts sufficient for disease transmission, and
of those 36 would be individuals in a CTA scenario who are potentially traceable (Keeling et al,
2020). We could have used the larger number from the Keeling et al (2020) paper, 59, for
simulating close contacts per 14-day period. This would have made our numbers significantly
larger, significantly increasing the opportunity for CTA containment failure. However, in order to
demonstrate the weakness of claims made in support of CTA even as a component in disease
containment for COVID-19 we chose again to work from a best-case position, using the lower
figure of 36 as the base for simulating contact transmissions.
The Oxford figures also provide that the average latent period, usually defined as the
period between when a person is exposed to the virus and when they begin exhibiting symptoms,
is 4 days (Keeling et al, 2020). Other authors using larger datasets reported that this incubation period
was 5 days, with 97% of patients showing symptoms by day 12 (Lauer et al, 2020; Qi et al, 2020).
Younger infected patients tend to be asymptomatic, and for longer periods, and the mean serial
interval (the time between when symptoms appear in infector and infectee) varies between 4 and
7.5 days (Du et al, 2020; Qi et al, 2020). Our best-case assumptions are similar to those in
(Kucharski et al, 2020) except that our mean delay from symptoms to isolation was reduced to 1
day: the effect of this would be to reduce the number of secondary infecteds created by each
primary in our scenarios. In spite of this, our results were statistically similar to those of Kucharski
et al (2020).
In our calculations we used the following (best case) assumptions:
a) The infection clock starts from exposure for patient zero (in red);
b) From day 5 the infected begins to shed the virus;
c) Patients may become symptomatic between days 5.5 and 11.5;
d) At day 12 every infected is considered to be symptomatic;
e) Each infected comes into close contact with 36 people in a 14-day period, pro rata
for the period between day 5 and when they become symptomatic;
f) Every infected has self-isolated from day 13.
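Assumption (e) can be illustrated with a short calculation. The sketch below is our reading of the pro rata rule: contacts accrue at 36 per 14 days from the day shedding begins (day 5) until the day the person becomes symptomatic. The function name and signature are illustrative only.

```python
def pro_rata_contacts(symptomatic_day, shedding_day=5.0, contacts_per_14_days=36):
    """Close contacts made between the start of shedding and symptom onset,
    pro rata from a rate of 36 contacts per 14-day period (assumption e)."""
    infectious_days = max(0.0, symptomatic_day - shedding_day)
    return contacts_per_14_days * infectious_days / 14
```

For a person who only becomes symptomatic at day 12, this gives 36 × 7 / 14 = 18 close contacts, which coincides with the day-12 value reported in Table 1. A person who is symptomatic at day 5.5 would have made about 1.3 contacts.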
For the 6 o’clock path shown in Figure 3, we present the absolute best-case scenario
where 100% of the population have smartphones, install the CTA, are tested, immediately self-
report and self-isolate. This scenario, whilst being quite impossible, would actually contain the
disease in only two cycles, or 14 days.
Figure 3: COVID-19 CTA Infection Scenarios
The 3 and 9 o’clock paths present the UK and Australian scenarios for the claimed 60%
(Merrick, 2020) and 40% (Woodley, 2020) adoption that we are told would deliver CTA success
in their respective populations. In each scenario every infected spreads COVID-19 to only a small
number of infecteds, and while a percentage of secondary infecteds are alerted through the CTA
and self-isolated, the remaining percentage, those without the app, persist to spread the infection
to a significantly large number of people. Appendix 4 provides a visual representation of the
progress at each stage for the 60% adoption NHS CTA scenario.
Unfortunately, smartphone penetration for adults in the UK has only achieved 79%,
reducing to 40% in the key COVID-19 demographic, the over-65s. Australian figures are similar.
To get 60% penetration in the overall UK population, 76% of all smartphone owners must install,
register and use the app. This assumes no loss to follow-up, which occurs where a user either
stops using or removes the app from their device for any reason. Since the average loss to follow-
up in a clinical trial is 6% (Akl et al, 2012), to ensure 60% of the population use a CTA app to
completion would actually require more than 82% of the smartphone-owning population to initially
install and register the app.
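The uptake arithmetic above can be checked with a short calculation. The function below is an illustrative sketch: it treats loss to follow-up multiplicatively, that is, as a fraction of installers who later drop out.

```python
def required_install_rate(target_population_uptake, smartphone_ownership,
                          loss_to_follow_up=0.0):
    """Fraction of smartphone owners who must initially install the app so that
    the target fraction of the whole population is still using it after
    drop-outs. A result above 1.0 means the target cannot be reached."""
    retained_owners = smartphone_ownership * (1.0 - loss_to_follow_up)
    return target_population_uptake / retained_owners
```

With 79% ownership, a 60% population target requires about 76% of owners to install. Allowing for 6% loss, the multiplicative treatment gives about 81%, while simply adding the 6% loss to the 76% gives the 82% quoted above; either way the requirement is very high. For the over-65s, with 40% ownership, the result exceeds 1.0: 60% uptake in that group is unreachable even if every smartphone owner installs the app.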
For the 40% (Australian) and 60% (UK) scenarios we begin from the position that 40%
and 60% of the population respectively have installed the app and immediately self-report and/or
self-isolate when alerted. As these scenarios played out, we calculated under an absolute best-
case wherein people who were alerted by the app or who reached day 13 all immediately self-
isolated. Unfortunately, we know some people’s symptoms will not be severe enough at first for
them to believe they have the disease and seek medical advice. Studies report that around 18%
of all exposed people remain asymptomatic but recover from the virus in a timeframe similar to
that of people who do become symptomatic (Mizumoto et al, 2020; Day, 2020). Further, 1-2% of
patients will be asymptomatic but remain contagious and continue to shed the virus from 1-3
months after their initial exposure, with or without a symptomatic period (Bengali, 2020). In
keeping with our best-case model we have not incorporated additional potential exposures that
would arise from these groups of people in our calculations.
A final set of calculations was performed seeking the sweet spot: that number below
absolute for CTA adoption in the overall population where the number of secondary cases was
manageable by manual contact tracing and other containment methods, and the NHS generally.
Table 1 presents the results of those calculations and, similar to figures proposed by other groups
who have evaluated this issue (Bulchandani et al, 2020), we find the sweet spot for CTA uptake
in order to control COVID-19 lies somewhere between 90 and 95%. As discussed, such high
uptake is simply not credible, nor possible.
Table 1: Number of additional infecteds created per one infected, based on % of the population installing
and immediately complying with the CTA
Uptake    Day 12    Day 14    Day 16    Day 18    Day 20
95%       18.0      4.5       4.7       5.1       6.2
90%       18.0      9.0       9.9       11.3      16.0
80%       18.0      18.0      21.6      27.0      46.8
*NB - While estimates suggest 94% of UK adults own a mobile telephone
(https://www.tigermobiles.com/blog/mobile-phone-usage-statistics/), only 79% of those over 18
have a smartphone (source: https://www.finder.com/uk/mobile-internet-statistics) and only 40%
of those over 65, the key demographic for infection and death from COVID-19.
3. Proposed and current COVID-19 CTA Solutions
At time of writing Singapore and Australia’s Health Departments have already commenced
rollout of CTA solutions for COVID-19, and the United Kingdom (UK), North America and most of
Europe will commence their trial deployments shortly (Hern & Sabbagh, 2020). We describe UK
apps in Section 3.1 and 3.2. The Australian app is described in Appendix 1. Taiwan, South Korea
and Israel were even more proactive, with increased testing, quarantines and mandated CTA of
recent travellers and the infected that has resulted in lower rates of secondary infections and
significantly fewer deaths, with alarms being raised, similar to home detention systems for
criminals, informing police if those in quarantine left the building in which they were being housed
(Lee, 2020; Lomas, 2020b).
Before considering the actual apps in use (or about to be used) in the UK, it is important
to note that there are also many other theoretical solutions in hurriedly prepared preprints that are
yet to undergo rigorous testing or peer review. Examples include: (Berke et al, 2020; Brack et al,
2020; De Carli et al, 2020; Hekmati et al, 2020; Klopfenstein et al, 2020; Maghdid & Ghafoor,
2020; Reichert et al, 2020; Xia et al, 2020). While acknowledging that privacy is not a design goal
for any CTA, many propose solutions that they claim are privacy-preserving: both between app
users generally, and between individuals and the health authority and technology suppliers who
maintain the central servers (Brack et al, 2020; Reichert et al, 2020). We identified only one paper
that acknowledged no privacy could exist where there was a central authority, and that users
should only expect solutions to keep them blinded from each other (Berke et al, 2020).
Some solutions present as a confusing array of seemingly random technology, thrust
together (Reichert et al, 2020). Apps proposing ID hashing or public/private key encryption
between central server and end-user claim these additions ensure complete user privacy: and
while authors acknowledge that the central server will have recorded your current and all previous
hashIDs and will be used to distribute alerts to other users, they also disingenuously claim that
the health authority are entirely unable to learn anything at all about users, the infected, or their
contact history from this vast collection of data (Brack et al, 2020). Many proclaim CTA ineffective
because it relies on willing individuals who must provide identifying information about themselves
and those they come into contact with, and to self-report their infected status via the app for
storage on a central server (Brack et al, 2020; Hekmati et al, 2020; Yasaka et al, 2020). These
assertions are often made to provide a basis for proposing an ostensibly privacy-protecting or
decentralised solution that we note still requires some form of user ID and other exposing
information, that in some cases included giving access to the CTA user’s location or contact lists
for direct use by the CTA (Brack et al, 2020; Hekmati et al, 2020; Reichert et al, 2020; Yasaka et
al, 2020).
3.1 The Oxford/NHS App
Appendix 2 describes the app being rolled out by the UK government and NHS. We refer
to it as the Oxford/NHS app since its development was led by academics at Oxford. The
government are pinning their hopes on this app being a key enabler for relaxing the current
lockdown policy. Appendix 3 discusses the common properties and data being collected by CTA
reviewed during this research. We believe the statistics and overall proposal to support
development of the app and promote its uptake in the community are based on best-case
scenarios. However, we do perceive that the strength of Government and NHS support comes
from the perception of trust they seek to engender. The openness and degree of transparency
that the NHS and Oxford teams have been upselling in the media, if delivered, far exceed that of
any other. We found no other State-developed or operated solution that suggested a willingness
to allow the media, technologists and general public access to the source code. However, early
non-published results of a pilot trial on the Isle of Wight are less encouraging with a major limiting
factor being the variation in smartphone operating systems, especially those of older phones
(Duell, 2020). The level of transparency underpinning the NHS solution needs to also be adopted
in any use of the APIs provided by the Apple/Google collaboration.
3.2 Chan/Spector Symptom Tracker App merged with CTA
In late March 2020 a COVID-19 Symptom Tracker app (resulting from collaboration
between Massachusetts General Hospital, Harvard Medical School, Guys and St Thomas’ NHS
Foundation Trust and King’s College London) was released (Drew et al, 2020). There are
currently 2.8 million users, with symptoms data gathered from around 1.6 million, of whom only a
tiny fraction of 1,176 (0.07%) had undergone some form of PCR-based diagnostic test (Drew et
al, 2020). While the take-up of the app is impressive, there are concerns about its subjective
nature and the bias of its user base. The app was initially promoted to and installed by clinical
staff and their families, and many in the wider community who voluntarily install such apps are
the worried well who, when prompted with questions suggesting the symptoms that go with a
condition, are more likely to identify as having some of them. Unless carefully managed,
suggestibility unintentionally induces conditioned associations between symptoms, leading
patients to report more intense or additional flu-like symptoms (Skelton et al, 1993).
Leaving these issues aside we believe a good solution might have been to incorporate
CTA into this Symptom Tracker app, and allow the existing user-base to either consent or decline
providing that additional information. That a high number of existing users would consent to the
addition is far more likely than believing that almost 3 million people will install a second COVID-
19 related app. We also believe that any proposed CTA solution should contemplate capture of
many of the same symptom-based data-points, whether used directly in contact tracing or not.
We suggest this in order to enable future anonymous aggregation and data mining/knowledge
engineering on COVID-19 from what could be a considerably larger and richer dataset.
4. Bayesian network based COVID-19 CTA
Our proposed solution focuses on enabling users to diagnose the possible presence of
Covid-19 themselves. This is done through a causal probabilistic model (a Bayesian network, that
we describe in Section 4.1) that is made available in a smartphone app based on the architectural
framework (that we describe in Section 4.2). The app provides the user with information about
how likely it is that they have no, mild or severe Covid-19. When this probabilistic information
is combined with data about the GPS-location of the smartphone, together with information about
the age group of the person, the triple (Prob. user has Covid-19, GPS-location, Age-group) can be
used to provide information about the distribution of mild and severe Covid-19. For example, using
colour shades on the map of a country, the data can be used to present a dynamic visualization
of the probability distribution on the location where that information was collected (Hay et al,
2013). This solution option involves providing diagnostic-oriented feedback to citizens with real
time Covid-19 surveillance and minimal privacy infringement as quickly as possible in the face of
all the limitations of the current constantly changing situation. Response measures from the
information collected from this option operate mainly at the population location level, such as
intensified lockdown, social/physical distancing and self-isolation campaigns rather than more
granular contact tracing and individual isolation measures requiring massive resource
deployment. This option is dramatically different from the many trace and contact app solutions
provided elsewhere.
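One way such triples could be turned into a population-level map is sketched below. This is an illustration under our own assumptions, not the architecture of the proposed app: the grid cell size, the age-group labels and the use of a simple mean per cell are all hypothetical choices.

```python
import math
from collections import defaultdict


def aggregate_by_cell(reports, cell_size_deg=0.1):
    """Average the reported Covid-19 probability per (grid cell, age group).

    reports: iterable of (probability, (latitude, longitude), age_group) triples.
    Locations are coarsened to grid cells of cell_size_deg degrees (a
    hypothetical resolution) before any aggregation takes place.
    """
    sums = defaultdict(lambda: [0.0, 0])
    for probability, (lat, lon), age_group in reports:
        cell = (math.floor(lat / cell_size_deg), math.floor(lon / cell_size_deg))
        entry = sums[(cell, age_group)]
        entry[0] += probability
        entry[1] += 1
    return {key: total / count for key, (total, count) in sums.items()}
```

Each resulting value could drive the colour shade of one map cell. Coarsening the location before aggregation is one simple way to limit how precisely any individual report can be placed.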
4.1 The Bayesian network (BN) for providing feedback on user symptoms
A Bayesian network (BN) (Cowell et al, 1999; Fenton & Neil, 2018; Koller & Friedman,
2009; Pearl, 1988) is a graphical model consisting of nodes and arcs as shown in Figure 4 (this
is the draft model we propose for our app). Some of the variables (such as those representing
symptom nodes) may be directly observable while others (such as the COVID-19 node) are not.
There is an arc between two nodes if the corresponding variables are causally linked in a
probabilistic sense. The strength of the link, as well as the uncertainty associated with these, is
captured using probabilities and statistical distributions. When data are entered into the model for
specific variables that are observed, all of the probabilities for as yet unknown variables are
updated using an AI algorithm called Bayesian inference. Hence, in the model here, the BN
algorithm computes the probability of having no, mild, or severe COVID-19, based on present
signs and symptoms and other relevant background information entered by the user.
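The updating step can be illustrated with a deliberately tiny example. The sketch below is not the model of Figure 4: it uses a single disease node with two symptom nodes that are assumed conditionally independent given disease status, and every probability in it is a hypothetical placeholder rather than a value derived from Huang et al. (2020).

```python
# Hypothetical prior over disease status (placeholder values only).
PRIOR = {"none": 0.90, "mild": 0.08, "severe": 0.02}

# Hypothetical P(symptom present | status) (placeholder values only).
P_SYMPTOM = {
    "fever": {"none": 0.05, "mild": 0.60, "severe": 0.90},
    "cough": {"none": 0.10, "mild": 0.50, "severe": 0.80},
}


def posterior(observations):
    """Apply Bayes' theorem: P(status | observations) is proportional to
    P(status) * product of P(observation | status).

    observations maps a symptom name to True (present) or False (absent);
    symptoms that are not mentioned are treated as unobserved.
    """
    unnormalised = {}
    for status, prior in PRIOR.items():
        likelihood = 1.0
        for symptom, present in observations.items():
            p = P_SYMPTOM[symptom][status]
            likelihood *= p if present else 1.0 - p
        unnormalised[status] = prior * likelihood
    total = sum(unnormalised.values())
    return {status: value / total for status, value in unnormalised.items()}
```

With no observations the function returns the prior. Entering fever and cough as present shifts most of the probability mass from "none" to "mild" and "severe". A full BN generalises this by also propagating evidence to and from background and risk-factor nodes.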
The model makes a number of simplifying, but rational assumptions. For example, it
assumes: that a person can only become infected if they have been in recent contact with an
infected person (or some biological matter from an infected person); that a positive test result
from a perfectly accurate COVID19 test procedure would mean that the person has COVID19
(even if they were asymptomatic); that there may be other conditions such as COPD or flu that
have some symptoms in common with COVID-19.
The probability distributions in the model for the symptoms given the disease status (i.e.
the status of the COVID-19 variable) are based on the statistics provided in the paper by Huang
et al. (2020). All the assumptions are described in Appendix 5.
Figure 4: Covid-19 Bayesian network model structure. The probabilities shown for the
COVID19 status node represent the prior probabilities when no observations are entered.
Figure 5 shows the updated predicted probabilities with some user-entered observations;
in this example a user has many of the COVID-19 symptoms and has had multiple recent
interactions with other people. Although this user has not entered their background or risk factors,
the model infers there is a 76% probability the person has COVID-19 (66% probability severe and
11% probability mild). Note that the model also updates the probabilities for the unknown risk
factors and background nodes. For example, this person is more likely to be male than female
(56%) and is likely to be over 65 (54% probability). The probability of obesity is 12% (up from a
prior of 10%). These backward inferences are simply the application of Bayes' theorem. Appendix 5
illustrates the power of the model through other scenarios.
Figure 5: Covid-19 BN model for a user with most COVID-19 symptoms and multiple recent
interactions with other people (nodes with observations are denoted with a scenario label).
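The backward inference for a risk factor can be written out explicitly. As a sketch, let O denote the event that the user is obese and e the evidence entered by the user (both symbols are introduced here for illustration only). Bayes' theorem, combined with the prior of 10% and posterior of 12% reported for this scenario, gives:

```latex
P(O \mid e) = \frac{P(e \mid O)\, P(O)}{P(e)}
\quad\Longrightarrow\quad
\frac{P(e \mid O)}{P(e)} = \frac{P(O \mid e)}{P(O)} = \frac{0.12}{0.10} = 1.2
```

In words, the entered evidence is about 1.2 times as probable for an obese user as it is across the whole population described by the prior, and that ratio is exactly what lifts the obesity probability from 10% to 12%.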
Depending on the value of the ‘alert threshold’ that is set, the model will trigger an alert (it
will trigger a separate hospitalization alert depending on the length of time the symptoms have
been present and whether or not they are improving). People with the app who have
come into contact with this person will then be alerted that they have been in contact with someone
who is most likely COVID-19 positive, while the person themselves could be given appropriate
instructions for contacting the health authorities.
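The alerting rule described above can be sketched as follows. This is only an illustration under stated assumptions: the paper does not give a numeric alert threshold or a symptom-duration cut-off, so ALERT_THRESHOLD, HOSPITAL_DAYS, the SymptomReport fields and the alert names are all hypothetical, as is the choice to evaluate the hospitalization alert independently of the probability threshold.

```python
from dataclasses import dataclass


@dataclass
class SymptomReport:
    p_covid: float          # BN posterior that the user has mild or severe COVID-19
    days_symptomatic: int   # how long the symptoms have been present
    improving: bool         # whether the user reports that symptoms are improving


# Hypothetical settings; the paper does not specify numeric values.
ALERT_THRESHOLD = 0.5
HOSPITAL_DAYS = 7


def alerts(report, alert_threshold=ALERT_THRESHOLD, hospital_days=HOSPITAL_DAYS):
    """Return the set of alerts raised for a single report."""
    raised = set()
    # Main alert: the posterior probability reaches the configured threshold.
    if report.p_covid >= alert_threshold:
        raised.add("notify_contacts")
        raised.add("advise_contact_health_authority")
    # Separate alert: long-lasting symptoms that are not improving.
    if report.days_symptomatic >= hospital_days and not report.improving:
        raised.add("hospitalisation")
    return raised


if __name__ == "__main__":
    print(alerts(SymptomReport(p_covid=0.76, days_symptomatic=3, improving=True)))
```

Keeping the threshold as a parameter reflects the point that the alerting behaviour depends on a value that is set by whoever deploys the app, rather than being fixed by the model.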
This model is still an incomplete attempt at developing a BN for the prediction of the
presence of Covid-19. We are in the process of gathering the data required to complete
all of the probability tables; currently, those tables for which we do not have relevant data, and
which are not logically determined, are simply estimated. Other signs and symptoms could be added
(for example, dizziness seems useful), along with comorbidities and immunodeficiency,
as the literature provides the relevant information.
The advantage of a BN is that it can still generate predictions with incomplete information.
Thus, if certain evidence is not entered by the user, the model is able to use prior probabilistic
information rather than make particular assumptions. So, although body temperature and oxygen
saturation are key measurements, the user decides whether or not these measurements are
actually done. Using the BN it is also possible to predict which feature will be the most informative
one in contributing to the diagnosis, and this feature can be used to request additional information
from the app’s user after some initial input.
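One common way to rank candidate questions is by expected information gain: the expected reduction in the entropy of the COVID-19 status node once the answer is known. The paper does not state which criterion would be used, so the sketch below is an assumption, as are the toy model, the names (P_PRESENT, expected_information_gain, most_informative) and all probabilities.

```python
import math

STATES = ("none", "mild", "severe")
# Illustrative numbers only; they are not taken from the paper.
PRIOR = {"none": 0.90, "mild": 0.07, "severe": 0.03}
P_PRESENT = {  # P(feature present | status)
    "fever": {"none": 0.05, "mild": 0.60, "severe": 0.90},
    "cough": {"none": 0.10, "mild": 0.50, "severe": 0.80},
    "headache": {"none": 0.20, "mild": 0.25, "severe": 0.30},
}


def entropy(dist):
    """Shannon entropy in bits of a distribution given as a dict."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def expected_information_gain(belief, feature):
    """Expected drop in entropy of the status node after asking about `feature`."""
    h_after = 0.0
    for present in (True, False):
        joint = {
            s: belief[s] * (P_PRESENT[feature][s] if present else 1.0 - P_PRESENT[feature][s])
            for s in STATES
        }
        p_answer = sum(joint.values())
        if p_answer == 0.0:
            continue  # this answer cannot occur, so it contributes nothing
        post = {s: joint[s] / p_answer for s in STATES}
        h_after += p_answer * entropy(post)
    return entropy(belief) - h_after


def most_informative(belief, candidates):
    """Pick the unanswered feature with the largest expected information gain."""
    return max(candidates, key=lambda f: expected_information_gain(belief, f))


if __name__ == "__main__":
    print(most_informative(PRIOR, ["fever", "cough", "headache"]))
```

In this toy model the headache node barely distinguishes the three states, so it ranks last; an app following this criterion would ask about the higher-ranked features first.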
4.2 Design of the BayesCOVID Surveillance Framework
The envisioned use of such a probabilistic BN model is as a foundation of population
surveillance of the geographical outbreak and spread of Covid-19. The proposed infrastructure
for personalised Covid-19 status feedback and collecting geographical data is shown in Figure 5,
and is inspired by related research of the authors’ research groups (van der Heijden et al, 2013;
Velikova et al, 2014).
Figure 5: Infrastructure for personalised Covid-19 feedback and collecting geographical Covid-19
data.
As Figure 5 illustrates, the BN is embedded or integrated into an app meant to run on a
person’s smartphone. The presentation of the feedback is expected to be attractive and easily
understood by the smartphone user, and to include advice on whether it is wise to contact a
GP.
This solution operates within the CardiPro environment using the Web/PWA front-end and
Agena CloudAPI (McLachlan et al, 2020). Our research group now has the means to demonstrate
both the individual elements and the entire solution presented in Figure 5. The minimalist data transmitted
to the server, even if coupled with collecting a similar anonymous symptom set as used for the
Chan/Spector app, might be more palatable to people who may be concerned about privacy in
both the UK and Netherlands.
In summary, the proposed solution assumes that a citizen obtains
feedback about the likelihood of the presence of mild or severe Covid-19 from a smartphone app,
but the main purpose of embedding the BN in an app is to monitor the population in order to
detect new outbreaks, and the locations at which they occur, as early as possible. For this
purpose, only the minimal data triple (Prob. user has Covid-19, GPS-location,
Age-group) needs to be collected centrally. The age information might help to identify
particular groups that require protection. In addition, it might be useful to add an app-specific
unique identifier so that it is possible to follow the progress of Covid-19 in the individual (possibly
until hospital admission). However, collecting only the above-mentioned data triple has the
advantage of minimal infringement of privacy.
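As an illustration of how small the centrally collected record could be, the sketch below models the data triple with the optional app-specific identifier. The class name, field names, age-band format and JSON encoding are assumptions made for this example and are not part of the proposed system's specification.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class SurveillanceRecord:
    p_covid: float                 # BN posterior that the user has Covid-19
    latitude: float                # GPS location of the report
    longitude: float
    age_group: str                 # e.g. "65+"; the banding is an assumption
    app_id: Optional[str] = None   # optional app-specific unique identifier

    def __post_init__(self):
        # Reject values that cannot be probabilities before anything is sent.
        if not 0.0 <= self.p_covid <= 1.0:
            raise ValueError("p_covid must lie between 0 and 1")

    def to_payload(self) -> str:
        """Serialise the record, omitting the identifier when it is not set."""
        data = asdict(self)
        if data["app_id"] is None:
            del data["app_id"]
        return json.dumps(data, sort_keys=True)


if __name__ == "__main__":
    record = SurveillanceRecord(p_covid=0.76, latitude=51.52, longitude=-0.04, age_group="65+")
    print(record.to_payload())
```

Leaving the identifier out by default keeps the payload to the triple alone, which is the option the text identifies as least intrusive.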
5. Discussion and conclusions
The UK and several other countries, including Australia, Singapore and Germany, propose
a centralised approach whereby data will be collected on smartphones and some component of
that data is forwarded to a central server, enabling contact alerting and tracing of the epidemic.
Some countries favour use of the solution presented by the Apple and Google Partnership, which
is claimed to be a ‘local’ solution under development that will not breach data security and will not
lead to any centralisation of data. Their proposal for privacy-safe contact tracing using Bluetooth
would, they say, require explicit user consent, which is another issue that needs greater
consideration. At first blush, the Apple/Google solution APIs do not appear to collect personally
identifiable information or user location data, and suggest that the list of people you have been in
contact with (detected via Bluetooth LE) never leaves your phone. We are also told that people who test
positive are not identified to other users, Google or Apple, and that the information would only be
used for contact tracing by public health authorities for Covid-19 pandemic management. Like
every other proposed decentralised system, this requires communication with and storage of data
in some form of central server. However, with regard to metadata and privacy, it is possible that
Apple/Google (or the various HAs using their APIs) may collect at least part of the data being
generated for secondary use purposes.
It should also be noted that the Apple/Google APIs are simply an interface for HAs to
expedite development of CTA solutions: they are not a CTA. APIs act as a standardised
intermediary, in this case between the user interface and a data backend, both of which will still
require HAs to engage software architects and developers to create. Without serious
reconsideration and the engagement of far more experienced technologists, there is no guarantee
that any app the NHS develops using the Apple/Google APIs will fare any better than the NHSX CTA
did in its first 24 hours of real-world testing on the Isle of Wight (Duell, 2020). If we are to use these APIs, a better
solution might be a Progressive Web App (PWA). A single PWA could be developed to be
compatible with both Android and Apple architectures, and engineered to avoid the main issue
seen with the NHS trial app: incompatibility with variants of the smartphone’s operating system.
CardiPro (McLachlan et al, 2020), which we proposed in our solution in Section 4, is an example
of this approach which we have already developed and demonstrated.
It can be inferred from the literature, mass media and download pages of those developing
and promoting CTA, that to at least some degree they seek to create the belief that
implementation of contact tracing makes containment of COVID-19 a fait accompli. Each presents
a solution couched in words suggesting that, for successful eradication of COVID-19, we need
only to install the CTA, and in doing so we will have identified everyone who, symptomatic or
asymptomatic, might have the disease. However, this assumes the data collected by the CTA will
be clean, accurate and sufficiently complete, and will fully support their containment efforts which,
despite best intentions, is extremely unlikely (Senga et al, 2017). We accept that solutions
operating at the front end of contact tracing, like the CTA, will produce more data. More contact
information will require time-consuming and labour-intensive follow-up, and consumption of
considerable resources in order to identify and weed out the true cases from the spurious chatter
(Senga et al, 2017). But it should be noted that previous work has failed to consider:
a) the effect of people simply leaving their smartphone at home, or in the car;
b) how to effectively deal with people who might have two or more devices;
c) how to identify the owners of prepaid devices that in some countries can be registered
without identification, or anonymously; and
d) the effect of a CTA user coming into close physical contact with others who eschew, or
cannot afford, smartphones (all previous work assumed that the adjectives pervasive and
ubiquitous meant complete coverage).
We believe that care should be taken when deploying CTA in any community, not just
because of privacy or consent issues but, rather, to ensure that even the most suggestible
member of our community does not become complacent and assume that CTA operates, as
claims like those provided with the Australian Government COVIDSafe app would seem to
suggest, as an invisible shield making us and our families impervious or immune to the disease.
For many proposing CTA, the idea of using an app instead of just network tracing via the
cellular network or other means is not so much about Bluetooth being more accurate as it is about
being able to claim informed consent: that by having users download the app and click through
a privacy agreement they have received ‘informed consent’ to access and monitor an individual
through their device. Studies evaluating the impact and effect of privacy policies and user
agreements have found that 54% are written in language unapproachable by most people
(Jensen & Potts, 2004), 40% of participants do not even recall seeing the agreement while clicking
through to install the app (Good et al, 2005), and only 0.24% of more than 55,000 actually clicked
or scrolled to view the policy (Jensen & Potts, 2004). Most users have no idea what they have agreed
to or, given that organisations change their policies and agreements regularly, whether the
current version of the agreement is consistent with that which the media may have discussed
when the CTA was being rolled out. Given these findings, is it ethical to consider that when users
install and register the CTA, the inclusion of a long privacy policy and user agreement that
potentially more than half of the population will be unable to comprehend constitutes informed
consent?
Even if all potential privacy and consent issues were resolved, the decision to install and
register the CTA in most western countries would remain voluntary. This raises the question: How
can high-level uptake of the CTA be assured? To answer this question we propose that at least
three related matters must be considered: (i) public compliance with existing social distancing
measures; (ii) the media narrative of CTA; and (iii) ongoing changes in people’s subjective estimates
of severity and susceptibility to the virus.
Recent opinion polls suggest that the majority in each country are in favour of existing
social distancing measures, irrespective of how strictly they are maintained and for how long they
remain (Ipsos MORI, 2020a). When compared to other countries, people in the UK are displaying
a higher degree of support for continued social distancing. However, public opinion elsewhere is
somewhat mixed. In other countries the trade-off is not the same: the protection of privacy
outweighs relaxed social distancing through use of a CTA. For example, 53% of respondents in
France are opposed to the CTA (Hughes Hubbard, 2020), as are 50% of
respondents in the US (Kirzinger et al, 2020). The US poll also showed that opinion
changes somewhat when benefits such as going back to work are more prominently presented,
in which case 66% would agree to download the CTA. However, 64% of the total US sample saw no
safety benefit: 17% indicated that a CTA would make them feel less safe, while 47% said the CTA would make
no difference to their feelings of safety at all. The current media narrative and an individual’s
subjective estimates of severity and susceptibility are two broad factors that, whilst not
independent of each other, account for the observed differences in opinion and behaviour both
between countries and over time (Abeysinghe & White, 2011; Leppin & Aro, 2009; Slovic, 2000;
Wagner-Egger et al, 2011; Wheaton et al 2011).
While the sustained focus on data privacy concerns remains strong in the mainstream
media, this negative issue will dominate public understanding of CTAs and significantly restrain
uptake. If the narrative can be drawn towards the potential benefits for everyone that come from
a general loosening of restrictions, then the success we have seen in compliance with the current
lockdown may allow people to accept the trade-off and come out in favour of the CTA.
Many proposed solutions, even the Google/Apple collaboration, focus very heavily on
privacy and app distribution and make almost no mention of accuracy. Despite best
intentions, the levels of inaccuracy that arise in any data recording mean that any contact tracing,
manual or digital, will always be incomplete (Senga et al, 2017). Even when we have a significant
proportion that do comply with contact tracing, we often still have poor data arising out of the
methods employed to collect the data. The normal inaccuracies that occur in data recording and
data entry are amplified with contact tracing because some people simply do not want to be traced,
while others have limited socio-cultural understanding of why we want to trace them
(Senga et al, 2017).
We are sceptical that any standalone contact tracing approach, manual or automated,
could contain a high-prevalence, highly contagious disease like COVID-19. This is primarily
because the CTA acts retrospectively. It advises the user that they were previously in close contact
with an infected person and, in the case of COVID-19, this advice often comes only after they have
already begun shedding the virus asymptomatically. The solution we propose integrates the
retrospective CTA with symptom tracking and a BN, providing the user with a prospective view of
the probability that they may have contracted COVID-19. In this way we increase CTA utility for
users. We believe that increased utility may improve uptake, and with it the opportunity to
collect useful data and identify actionable clinical knowledge to improve the response in future
disease outbreaks. Solutions like the one proposed here can have a very beneficial effect on
containing the spread of infection, but only if combined with open-access COVID-19 testing.
Acknowledgement
The authors acknowledge support from the EPSRC under project EP/P009964/1:
PAMBAYESIAN: Patient Managed decision-support using Bayesian Networks and The Alan
Turing Institute under the EPSRC grant EP/N510129/1.
References
Abeysinghe, S., & White, K. (2011). The avian influenza pandemic: Discourses of risk, contagion and preparation in
Australia. Health, Risk & Society, 13(4), 311-326.
Akl, E. A., Briel, M., You, J. J., Sun, X., Johnston, B. C., Busse, J. W., ... & Alshurafa, M. (2012). Potential impact on
estimated treatment effects of information lost to follow-up in randomised controlled trials (LOST-IT):
systematic review. Bmj, 344, e2809.
Anastassopoulou, C., Russo, L., Tsakris, A., & Siettos, C. (2020). Data-based analysis, modelling and forecasting of
the COVID-19 outbreak. PLoS ONE, 15(3), 1–21. https://doi.org/10.1371/journal.pone.0230405
Anderson, R. M. (1991). Discussion: The Kermack-McKendrick epidemic threshold theorem. Bulletin of Mathematical
Biology, 53, 1. https://doi.org/10.1007/BF02464422
Armbruster, B., & Brandeau, M. (2007). Contact tracing to control infectious disease: When enough is enough. Health
Care Management Science, 10, pp 341-355.
Bengali, S. (2020). He was symptom-free. But the coronavirus stayed in his body for 40 days. LA Times. Last accessed:
04th May, 2020. Sourced from: https://www.latimes.com/world-nation/story/2020-04-30/why-some-patients-
keep-testing-positive-for-the-coronavirus
Berke, A., Bakker, M., Vepakomma, P., Raskar, R., Larson, K., & Pentland, A. (2020). Assessing disease exposure
risk with location histories and protecting privacy: A cryptographic approach in response to a global pandemic.
arXiv preprint arXiv:2003.14412.
Brack, S., Reichert, L., & Scheuermann, B. (2020). Decentralized Contact Tracing Using a DHT and Blind Signatures.
Last accessed: 01st May, 2020. Sourced from: https://eprint.iacr.org/2020/398.pdf
Bulchandani, V. B., Shivam, S., Moudgalya, S., & Sondhi, S. L. (2020). Digital herd immunity and COVID-19. arXiv
preprint arXiv:2004.07237.
Casella, F. (2020). Can the COVID-19 epidemic be controlled on the basis of daily test reports? ArXiv.
http://arxiv.org/abs/2003.06967
COVIDSafe App, (2020). Australian Government Department of Health: COVIDSafe App. Last accessed: 29th April,
2020. Sourced From: https://www.health.gov.au/resources/apps-and-tools/covidsafe-app
Cowell, R., Dawid, A., Lauritzen, S., & Spiegelhalter, D. (1999). Probabilistic Networks and Expert Systems. New York:
Springer.
Crocker, A., Opsahl, K., & Cyphers, B. (2020). The challenge of Proximity Apps for COVID-19 Contact Tracing.
Electronic Frontier Foundation. Last accessed: 29th April, 2020. Sourced from:
https://www.eff.org/deeplinks/2020/04/challenge-proximity-apps-covid-19-contact-tracing
Danquah, L. O., Hasham, N., MacFarlane, M., Conteh, F. E., Momoh, F., Tedesco, A. A., ... & Weiss, H. A. (2019). Use
of a mobile application for Ebola contact tracing and monitoring in northern Sierra Leone: a proof-of-concept
study. BMC infectious diseases, 19(1), 810.
Day, M. (2020). Covid-19: identifying and isolating asymptomatic people helped eliminate virus in Italian village. BMJ
(Clinical Research Ed.), 368(March), m1165. https://doi.org/10.1136/bmj.m1165
De Carli, A., Franco, M., Gassmann, A., Killer, C., Rodrigues, B., Scheid, E., Schonbachler, D., & Stiller, B. (2020).
WeTrace: A privacy preserving mobile COVID-19 tracing approach and application. ArXiv preprint:
2004.08812v1
DP3T, (2020). Decentralised privacy-preserving proximity tracing. Last accessed: 21st April, 2020. Sourced from:
https://github.com/DP-3T/
Drew, D. A., Nguyen, L. H., Steves, C. J., Wolf, J., Spector, T. D., Chan, A. T., & COPE Consortium. (2020). Rapid
implementation of mobile technology for real-time epidemiology of COVID-19. medRxiv.
Du, Z., Xu, X., Wu, Y., Wang, L., Cowling, B. J., & Meyers, L. A. (2020). The serial interval of COVID-19 from publicly
reported confirmed cases. medRxiv.
Duell, M. (2020). New NHSX Covid-19 contact tracing app doesn't work on two-year-old phones say Isle of Wight
residents using it in trial. Mail Online, Last accessed: 08th May, 2020. Sourced from:
https://www.dailymail.co.uk/news/article-8297475/NHSX-Covid-19-contact-tracing-app-doesnt-work-two-
year-old-phones.html
Eames, K. (2007). Contact tracing strategies in heterogeneous populations. Epidemiology and Infection, 135, pp 443-
454.
Farrell, P. (2016). Lamb chop weight enforcers want access to Australians’ metadata. The Guardian, Last accessed:
02nd May, 2020. Sourced from: https://www.theguardian.com/world/2016/jan/19/lamb-chop-weight-
enforcers-want-warrantless-access-to-australians-metadata
Fenton, N. E., & Neil, M. (2018). Risk Assessment and Decision Analysis with Bayesian Networks (2nd ed.).
Chapman and Hall/CRC Press. ISBN: 9781138035119
Fidler, D. (2004). SARS, Governance and the Globalization of Disease. Springer.
Finazzi, F. (2020). Earthquake network - Pilot investigation COVID-19 in Val Seriana. Last accessed 7th April, 2020.
Sourced from: https://sismo.app/covid/
Giordano, G., Blanchini, F., Bruno, R., Colaneri, P., Di Filippo, A., Di Matteo, A., & Colaneri, M. (2020). Modelling the
COVID-19 epidemic and implementation of population-wide interventions in Italy. In Nature Medicine.
https://doi.org/10.1038/s41591-020-0883-7
Good, N., Dhamija, R., Grossklags, J., Thaw, D., Aronowitz, S., Mulligan, D., & Konstan, J. (2005, July). Stopping
spyware at the gate: a user study of privacy, notice and spyware. In Proceedings of the 2005 symposium on
Usable privacy and security (pp. 43-52).
Gould, M., & Lewis, G. (2020). Digital contact tracing: Protecting the NHS and saving lives. Healthtech Blog, Last
accessed: 02nd May, 2020. Sourced from: https://healthtech.blog.gov.uk/2020/04/24/digital-contact-tracing-
protecting-the-nhs-and-saving-lives/
Guan, W. J., Ni, Z. Y., Hu, Y., Liang, W. H., Ou, C. Q., He, J. X., ... & Du, B. (2020). Clinical characteristics of
coronavirus disease 2019 in China. New England journal of medicine, 382(18), 1708-1720
Guy, G. (2015). Requests for Access to Telecommunications Metadata under 176A of the TIA. Right to Know. Last
accessed 02nd May, 2020. Sourced from: https://goo.gl/jQHysu
Hamilton, I. (2020). The UK won’t use Apple and Google’s coronavirus contact-tracing technology for its app,
sparking privacy worries about how people’s data will be used. Business Insider, Last accessed: 02nd May,
2020. Sourced from: https://www.businessinsider.com/uk-nhsx-rejects-apple-google-coronavirus-app-model-
2020-4?amp;IR=T&r=US&IR=T
Hay, S. I., Battle, K. E., Pigott, D. M., Smith, D. L., Moyes, C. L., Bhatt, S., ... & Gething, P. W. (2013). Global
mapping of infectious disease. Philosophical Transactions of the Royal Society B: Biological Sciences,
368(1614), 20120250.
Hethcote, H. W. (2000). Mathematics of infectious diseases. SIAM Review, 42(4), 599–653.
https://doi.org/10.1137/S0036144500371907
Hekmati, A., Ramachandran, G., & Krishnamachari, B. (2020). CONTAIN: Privacy-oriented Contact Tracing Protocols
for Epidemics. arXiv preprint arXiv:2004.05251.
Hellewell, J., Abbott, S., Gimma, A., Bosse, N. I., Jarvis, C. I., Russell, T. W., Munday, J. D., Kucharski, A. J., Edmunds,
W. J., Sun, F., Flasche, S., Quilty, B. J., Davies, N., Liu, Y., Clifford, S., Klepac, P., Jit, M., Diamond, C., Gibbs,
H., … Eggo, R. M. (2020). Feasibility of controlling COVID-19 outbreaks by isolation of cases and contacts.
The Lancet Global Health, 8(4), e488–e496. https://doi.org/10.1016/S2214-109X(20)30074-7
Hern, A., & Sabbagh, D. (2020). Critical mass of Android users crucial for NHS contact-tracing app. The Guardian,
Last accessed: 07th May, 2020. Sourced from: https://www.theguardian.com/world/2020/may/06/critical-
mass-of-android-users-needed-for-success-of-nhs-coronavirus-contact-tracing-app
Ho, S. M., Kao, D., & Wu, W. Y. (2018). Following the breadcrumbs: Timestamp pattern identification for cloud forensics.
Digital Investigation, 24, 79-94.
Hu, Z., Song, C., Xu, C., Jin, G., Chen, Y., Xu, X., ... & Wang, J. (2020). Clinical characteristics of 24 asymptomatic
infections with COVID-19 screened among close contacts in Nanjing, China. Science China Life Sciences, 1-
6.
Huang, C., Wang, Y., Li, X., Ren, L., Zhao, J., Hu, Y., ... & Cheng, Z. (2020). Clinical features of patients infected with
2019 novel coronavirus in Wuhan, China. The lancet, 395(10223), 497-506.
Huat, C. B. (2006). SARS epidemic and the disclosure of Singapore nation. Cultural Politics, 2(1), 77-96.
Hughes Hubbard (2020). Guidance from the EDPB and the CNIL for GDPR-Compliant Covid-19 contact tracing. Last
accessed: 06th May, 2020. Sourced from: https://www.hugheshubbard.com/news/guidance-from-the-edpb-
and-the-cnil-for-gdpr-compliant-covid-19-contact-tracing
Hussain, S. F., Watura, R., Cashman, B., Campbell, I. A., & Evans, M. R. (1992). Audit of a tuberculosis contact tracing
clinic. British Medical Journal, 304(6836), 1213-1215.
Ipsos MORI (2020a). One month in: British public opinion on Covid-19. https://www.ipsos.com/ipsos-mori/en-uk/one-
month-british-public-opinion-covid-19-coronavirus.
Ipsos MORI (2020b). Majority of Britons support government using mobile data for surveillance to tackle coronavirus
crisis. https://www.ipsos.com/ipsos-mori/en-uk/majority-britons-support-government-using-mobile-data-
surveillance-tackle-coronavirus-crisis
Jensen, C., & Potts, C. (2004, April). Privacy policies as decision-making tools: an evaluation of online privacy notices.
In Proceedings of the SIGCHI conference on Human Factors in Computing Systems (pp. 471-478).
Johns Hopkins University, (2020). COVID-19 Dashboard, Centre for Systems Science and Engineering (CSSE), USA,
Last accessed: 07th May, 2020. Sourced from: https://coronavirus.jhu.edu/map.html.
Koller, D., & Friedman, N. (2009). Probabilistic graphical models: principles and techniques. MIT press.
Kao, R. R. (2003). The impact of local heterogeneity on alternative control strategies for foot-and-mouth disease.
Proceedings of the Royal Society of London. Series B: Biological Sciences, 270(1533), 2557-2564.
Kiss, I. Z., Green, D. M., & Kao, R. R. (2008). The effect of network mixing patterns on epidemic dynamics and the
efficacy of disease contact tracing. Journal of the Royal Society Interface, 5(24), 791-799.
Klinkenberg, D., Fraser, C., & Heesterbeek, H. (2006). The effectiveness of contact tracing in emerging epidemics.
PLoS ONE. 1(1). e12.
Klopfenstein, L., Delpriori, S., Di Francesco, G., Maldini, R., Paolini, B., & Bogliolo, A. (2020). Digital Ariadne: Citizen
empowerment for epidemic control. ArXiv preprint: 2004.07717v1
Kucharski, A. J., Klepac, P., Conlan, A., Kissler, S. M., Tang, M., Fry, H., ... & CMMID COVID-19 Working Group.
(2020). Effectiveness of isolation, testing, contact tracing and physical distancing on reducing transmission of
SARS-CoV-2 in different settings. medRxiv.
Kuhn, C., Beck, M., & Strufe, T. (2020). Covid Notions: Towards formal definitions - and documented understanding -
of privacy goals and claimed protection in proximity-tracing services. ArXiv preprint: 2004.07723v1
Kirzinger, A., Hamel, L., Muñana, C., Kearney, A., & Brodie, M., (2020). KFF Health Tracking Poll - Late April 2020:
Coronavirus, Social Distancing, and Contact Tracing. Last accessed: 06th May, 2020. Sourced from:
https://www.kff.org/global-health-policy/issue-brief/kff-health-tracking-poll-late-april-2020/
Lee, Y. (2020). Taiwan’s carrot-and-stick approach to virus fight wins praise, but strains showing. Reuters, Last
accessed: 07th May, 2020. Sourced from: https://www.reuters.com/article/us-health-coronavirus-taiwan-
quarantine/taiwans-carrot-and-stick-approach-to-virus-fight-wins-praise-but-strains-showing-
idUSKBN21E0EE
Leppin, A., & Aro, A. R. (2009). Risk perceptions related to SARS and avian influenza: theoretical foundations of current
empirical research. International journal of behavioral medicine, 16(1), 7-29.
Lin, Q., Zhao, S., Gao, D., Lou, Y., Yang, S., Musa, S. S., Wang, M. H., Cai, Y., Wang, W., Yang, L., & He, D. (2020).
A conceptual model for the coronavirus disease 2019 (COVID-19) outbreak in Wuhan, China with individual
reaction and governmental action. International Journal of Infectious Diseases, 93, 211–216.
https://doi.org/10.1016/j.ijid.2020.02.058
Lomas, N. (2020). Europe’s PEPP-PT COVID-19 contacts tracing standard push could be squaring up for a fight with
Apple and Google. Tech Crunch. Last accessed: 29th April, 2020. Sourced from:
https://techcrunch.com/2020/04/17/europes-pepp-pt-covid-19-contacts-tracing-standard-push-could-be-
squaring-up-for-a-fight-with-apple-and-google/
Lomas, N. (2020b). Israel passes emergency law to use mobile data for COVID-19 contact tracing. Tech Crunch. Last
accessed: 06th May, 2020. Sourced from: https://techcrunch.com/2020/03/18/israel-passes-emergency-law-
to-use-mobile-data-for-covid-19-contact-tracing/
Maddocks, (2020). Department of Health: The COVIDSafe Application Privacy Impact Assessment. Last Accessed:
01st May, 2020.
Maghdid, H., & Ghafoor, K. (2020). A smartphone enabled approach to manage COVID-19 lockdown and economic
crisis. arXiv preprint arXiv:2004.12240.
Mathews, S. (2020). Private contact-tracing apps could ease coronavirus lockdown and get major businesses back to
work by monitoring COVID-19 spread in offices and alerting staff if they have been in contact with an infected
colleague. Daily Mail. Last accessed: 29th April, 2020. Sourced from: https://www.dailymail.co.uk/news/article-
8260301/Private-contact-tracing-apps-help-ease-coronavirus-lockdown.html
Maurushat, A., Bennett-Moses, L., & Vaile, D. (2015). Using 'big' metadata for criminal intelligence: understanding
limitations and appropriate safeguards. In Proceedings of the 15th International Conference on Artificial
Intelligence and Law (pp. 196-200).
McLachlan, S. (2016). Predicted by Orwell: A discourse on the gradual shift in electronic surveillance law.
https://arxiv.org/abs/2004.11594
McLachlan, S., Paterson, H., Dube, K., Kyrimi, E., Dementiev, E., Neil, M., Daley, B., Hitman, G.A., & Fenton, N. (2020).
Real-time Online Probabilistic Medical Computation using Bayesian Networks (No. 2744). EasyChair Preprint:
https://easychair.org/publications/preprint/9Jks
Merrick, R. (2020). Coronavirus: NHS contact tracing app needs 60% take-up to be successful, expert warns. The
Independent. Last accessed: 04th May, 2020. Sourced from:
https://www.independent.co.uk/news/uk/politics/coronavirus-app-uk-nhs-contact-tracing-phone-smartphone-
a9484551.html
Mizumoto, K., Kagaya, K., Zarebski, A., & Chowell, G. (2020). Estimating the asymptomatic proportion of coronavirus
disease 2019 (COVID-19) cases on board the Diamond Princess cruise ship, Yokohama, Japan, 2020.
Eurosurveillance, 25(10), 2000180.
Mwongela, S. W. (2018). A Mobile based Tuberculosis contact tracing and screening system (Doctoral dissertation,
Strathmore University).
Niehus, R., De Salazar, P. M., Taylor, A. R., & Lipsitch, M. (2020). Using observational data to quantify bias of traveller-
derived COVID-19 prevalence estimates in Wuhan, China. The Lancet. Infectious Diseases, 3099(20), 1–6.
https://doi.org/10.1016/S1473-3099(20)30229-2
Oxford University (2020), COVID-19 Evidence Service, Centre for Evidence-Based Medicine (CEBM), United Kingdom.
Last accessed 06th May, 2020. Sourced from: https://www.cebm.net/covid-19.
Parran, T. (1937) Shadow on the Land: Syphilis. New York, Reynal & Hitchcock.
Pearl, J. (2014). Probabilistic reasoning in intelligent systems: networks of plausible inference. Elsevier.
PEPP-PT, (2020). Pan-European Privacy-Preserving Proximity Tracing. Last accessed: 20th April, 2020. Sourced from:
https://www.pepp-pt.org/
Perez, B., Musolesi, M., & Stringhini, G. (2018). You are your metadata: Identification and obfuscation of social media
users using metadata information. In Twelfth International AAAI Conference on Web and Social Media.
Peto, J. (2020). Covid-19 mass testing facilities could end the epidemic rapidly. The BMJ, 368(March), m1163.
https://doi.org/10.1136/bmj.m1163
PHCIA, (2020). Biosecurity (Human Biosecurity Emergency)(Human Coronavirus with Pandemic Potential)(Emergency
Requirements - Public Health Contact Information) Determination 2020 Act. Last accessed: 02nd May, 2020.
Sourced from: https://www.legislation.gov.au/Details/F2020L00480
Li, Q., Guan, X., Wu, P., Wang, X., Zhou, L., Tong, Y., ... & Xing, X. (2020). Early transmission dynamics in Wuhan,
China, of novel coronavirus–infected pneumonia. New England Journal of Medicine.
Reichert, L., Brack, S., & Scheuermann, B. (2020). Privacy-preserving contact tracing of covid-19 patients. Last accessed:
01st May, 2020. Sourced from: https://eprint.iacr.org/2020/375.pdf
Roddam, A. W. (2001). Mathematical Epidemiology of Infectious Diseases: Model Building, Analysis and Interpretation.
International Journal of Epidemiology, 30(1), 186–186. https://doi.org/10.1093/ije/30.1.186
Rodrigues, H. S. (2016). Application of SIR epidemiological model: new trends. International Journal of Applied
Mathematics and Informatics, 10. http://arxiv.org/abs/1611.02565
Scott, D. (2020). What good digital contact tracing might look like. Vox. Last accessed: 29th April, 2020. Sourced from:
https://www.vox.com/2020/4/22/21231443/coronavirus-contact-tracing-
app-states
Senga, M., Koi, A., Moses, L., Wauquier, N., Barboza, P., Fernandez-Garcia, M. D., ... & Kargbo, D. (2017). Contact
tracing performance during the Ebola virus disease outbreak in Kenema district, Sierra Leone. Philosophical
Transactions of the Royal Society B: Biological Sciences, 372(1721), 20160300.
Shamsi, J. A., & Khojaye, M. A. (2018). Understanding privacy violations in big data systems. IT Professional, 20(3),
73-81.
Skelton, J. A., Loveland, J. E., & Yeagley, J. L. (1996). Recalling symptom episodes affects reports of immediately-
experienced symptoms: Inducing symptom suggestibility. Psychology and Health, 11(2), 183-201.
Slovic P. (2000). The perception of risk. London: Earthscan.
Sun, K., & Viboud, C. (2020). Impact of contact tracing on SARS-CoV-2 transmission. Lancet, DOI: 10.1016/S1473-
3099(20)30357-1
TIAA, (2018). Telecommunications (Interception and Access) Act 1979. Last accessed: 02nd May, 2020. Sourced from:
https://www.legislation.gov.au/Details/C2019C00010
van der Heijden, M., Lucas, P. J., Lijnse, B., Heijdra, Y. F., & Schermer, T. R. (2013). An autonomous mobile system
for the management of COPD. Journal of biomedical informatics, 46(3), 458-469.
Vazquez-Prokopec, G., Montgomery, B., Horne, P., Clennon, J., & Ritchie, S. (2017). Combining contact tracing with
targeted indoor residual spraying significantly reduces dengue transmission. Science Advances, 3, e1602024.
Velikova, M. V., Terwisscha van Scheltinga, J. A., Lucas, P. J., & Spaanderman, M. (2014). Exploiting causal functional
relationships in Bayesian network modelling for personalised healthcare. International Journal of Approximate
Reasoning. 55. pp 59-73.
Volk, S. (2020). Coronavirus contact-tracing apps: Most of us won’t cooperate unless everyone does. The
Conversation. Last accessed: 29th April, 2020. Sourced from: https://theconversation.com/coronavirus-
contact-tracing-apps-most-of-us-wont-cooperate-unless-everyone-does-135959
Wagner-Egger, P., Bangerter, A., Gilles, I., Green, E., Rigaud, D., Krings, F., ... & Clémence, A. (2011). Lay perceptions
of collectives at the outbreak of the H1N1 epidemic: heroes, villains and victims. Public Understanding of
Science, 20(4), 461-476.
Wheaton, M. G., Abramowitz, J. S., Berman, N. C., Fabricant, L. E., & Olatunji, B. O. (2012). Psychological predictors
of anxiety in response to the H1N1 (swine flu) pandemic. Cognitive Therapy and Research, 36(3), 210-218.
Whitaker, S. (2020). Hundreds of academics back privacy-friendly coronavirus contact tracing apps. Techcrunch. Last
accessed: 29th April, 2020. Sourced from: https://techcrunch.com/2020/04/20/academics-contact-tracing/
Woodley, M. (2020). RACGP releases COVIDSafe factsheet. Royal Australian College of General Practitioners, Last
accessed: 05th May, 2020. Sourced from: https://www1.racgp.org.au/newsgp/professional/racgp-releases-
covidsafe-fact-sheet?feed=RACGPnewsGPArticles
Xia, Y., & Lee, G. (2020). How to return to normalcy: Fast and comprehensive contact tracing of COVID-19 through
proximity sensing using mobile devices. ArXiv preprint: 2004.12576v1
Yasaka, T., Lehrich, B., & Sahyouni, R. (2020). Peer-to-peer contact tracing: Development of a privacy-preserving
smartphone app. JMIR MHEALTH and UHEALTH, 8(4), e18936.
Appendix 1: Australia
Rather than develop their own app, the Australian Government licensed rights to rebrand the TraceTogether app
developed by the Singaporean Government, and deploy it as COVIDSafe. As is common, emergency legislation was
hurriedly drafted and enacted under the catchy title: Biosecurity (Human Biosecurity Emergency)(Human
Coronavirus with Pandemic Potential)(Emergency Requirements - Public Health Contact Information) Determination
2020 Act (PHCIA, 2020). While making it an offence for a person outside those employed by a state or federal health
authority to collect, use or disclose COVID app data except for the purposes of contact tracing (Section 6(1) & (2)),
this determination explicitly limits the same provision to data generated within the app or by the Commonwealth and
stored on the user’s mobile device. PHCIA also excludes from all provisions, privacy or otherwise, information arising
from any source other than the National COVIDSafe data store (Section 6(3)). The effect of these provisions is to
make it unlawful for an app user or member of the general public to decrypt, view or disseminate any data from their
device, or even knowledge about data that the app collects or stores, while leaving Government organisations able
to interact with this data more freely.
The PHCIA contracts itself out of any provisions of the Privacy Act 1988 that may be found inconsistent, under the
power of Section 477(5) of the Biosecurity Act 2015, but does not exclude itself from the operation of other
legislation, including the Telecommunications (Interception and Access) Act 1979 (TIAA, 2018). The TIAA imposes data
retention obligations on telecommunications providers, including telephony and internet service providers and Amazon
Web Services, which will host the central server, requiring them to store records of all forms of electronic
communication for at least two years. The TIAA also makes metadata available without warrant to a broad range of
organisations that include law enforcement, local, state and federal government bodies, the RSPCA, the Australian
Navy and Border Protection Services, the Thoroughbred horse and greyhound racing associations, Workplace Safety
investigators, the Clean Energy Regulator, National Measurement Institute, Building and Construction Commission,
Taxi Services Commission and, as has been demonstrated in some cases, private investigators (Farrell, 2016; Guy, 2015).
Appendix 2:
Appendix 3: United Kingdom
While drawing significant criticism, the UK National Health Service (NHS) has rejected the Apple/Google APIs and
decentralised model, expressly favouring a centralised approach that they say will allow for collection of more
granular data and broader analysis to study and track the pandemic (Hamilton, 2020). The key difference to be noted
between the NHS approach and all others is upfront acknowledgement of the intention to maintain this central
collection of data while also making substantial claims regarding the privacy strength of the userland app and ethics
of their approach. Unlike descriptions of all other claimed privacy-preserving apps seen in the COVID-19 literature,
and in stark contrast to the Australian approach of denying the public any real knowledge of the data being collected
and transmitted by their device (PHCIA, 2020), the NHS are making encouraging noises about allowing
researchers, security analysts and the general public access to the source code, to see behind the curtain and verify
what data the app is collecting and transmitting (Gould & Lewis, 2020). If taken at face value, and unlike any other
approach, this could allow UK citizens to consider that data’s existence and potential uses when deciding whether to
download and activate the app on their personal devices.
The data being collected
Based on many of the papers cited in this work, most apps will collect and transmit some subset of the following data
fields:
● The MAC address of your device’s Bluetooth or Wi-Fi chip
● Your phone number (or IMEI number if the device does not easily report the subscriber phone number)
● The MAC addresses of other people’s devices that your phone sees (Bluetooth handshakes with every Bluetooth device
it sees, even when it doesn’t know the device and has never been paired with it)
● The time, date and in some cases, location data from your GPS for each new interaction with another in-range
device (accurate to about 15 meters). A new interaction is when your device sees another device move into its
broadcast area. Note that in a corporate office the app might see the device of someone in the next room move
into and out of range tens or hundreds of times over the course of a working day.
● The Bluetooth or device name of the smartphone that is running the app, and every other Bluetooth device that
crosses into its broadcast range. This last point can more easily enable re-identification as people often name
their smartphone ‘Tim’s iPhone’ or similar.
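The fields listed above can be pictured as a single logged record per encounter. The following is a minimal sketch only: the class name, field names and the `count_interactions` helper are hypothetical and are not taken from any specific app. The helper illustrates why a device in the next room can generate many records in one day, since every transition from out of range to in range counts as a new interaction.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class EncounterRecord:
    """One hypothetical log entry; all field names are illustrative."""
    own_mac: str                    # MAC address of this device's Bluetooth/Wi-Fi chip
    own_identifier: str             # phone number, or IMEI if the number is unavailable
    seen_mac: str                   # MAC address of the device that came into range
    seen_name: Optional[str]        # Bluetooth device name, e.g. a personal name
    timestamp: datetime             # time and date of the new interaction
    location: Optional[Tuple[float, float]] = None  # (lat, lon) if GPS is collected


def count_interactions(in_range_samples: Iterable[bool]) -> int:
    """Count new interactions in a sequence of periodic range checks.

    A new interaction is any transition from 'out of range' to 'in range'.
    """
    count = 0
    previously_in_range = False
    for in_range in in_range_samples:
        if in_range and not previously_in_range:
            count += 1
        previously_in_range = in_range
    return count
```

Under this sketch, a colleague whose phone drifts in and out of range all day produces one `EncounterRecord` per re-entry, not one per day.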
Appendix 4:
Figure 6: Visual representation for the 60% CTA user NHS scenario
Appendix 5: The BN model
The full BN model is available at
http://www.eecs.qmul.ac.uk/~norman/Models/covid19_for_contact_tracing_paper.cmpx.
It can be opened and run in the free trial version of AgenaRisk (https://www.agenarisk.com/).
All the model prior and conditional probability tables can be inspected once the model is
opened.
Assumptions include:
● There is currently a hidden node called ‘risk factors’ whose parents are the background
risk factor nodes and which is a parent of both the COVID19 node and the ‘other condition
with similar symptoms node’. This explains why there are dotted lines.
● It is possible to have both COVID19 and a different condition (such as COPD or flu) with
several symptoms in common. However, an alert is only triggered if there is a sufficiently
high probability of the person being COVID19 positive.
● The COVID alert node is defined to be true if any of severe, mild or asymptomatic is true
in the COVID node.
● Only the most important signs and symptoms S were included in the current version of the
BN presented in Figure 4: those occurring in at least 25% of patients with either
mild or severe Covid-19. In addition, each selected symptom S was expected to have a
likelihood ratio $\lambda(S \mid \text{Covid-19}) = P(S = \text{yes} \mid \text{Covid-19} = \text{severe}) / P(S = \text{yes} \mid \text{Covid-19} = \text{mild}) > 1.08$
(the likelihood ratios of the selected symptoms and signs vary from 1.085 for fever
to 7.75 for dyspnoea).
● Some of the signs and symptoms, such as nasal congestion, have a likelihood ratio lower
than 1 ($\lambda(\text{nasal congestion} \mid \text{Covid-19}) = 0.549$), and could be added to the model to reverse
the distribution of probability mass between severe and mild Covid-19. However, these
symptoms are relatively rare.
● It is assumed that all signs and symptoms are conditionally independent given the
presence or absence of Covid-19, with the exception of the variables Age and body
temperature (BodyTemp). Note, however, that all signs and symptoms are dependent on
each other through the Covid-19 variable.
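The likelihood ratio criterion and the conditional independence assumption above can be illustrated with a short sketch. This is a deliberately simplified stand-in, not the AgenaRisk model: it ignores the hidden ‘risk factors’ node and the Age and BodyTemp dependency, the function names are hypothetical, and the probabilities in `PRIOR` and `LIKELIHOODS` are invented for illustration and are not the paper's probability tables.

```python
def likelihood_ratio(p_given_severe: float, p_given_mild: float) -> float:
    """lambda(S | Covid-19) = P(S = yes | severe) / P(S = yes | mild)."""
    return p_given_severe / p_given_mild


def posterior(prior, likelihoods, observed):
    """Posterior over Covid-19 states given observed symptoms.

    Assumes symptoms are conditionally independent given the state, so the
    joint likelihood is a product of per-symptom terms.

    prior:       dict state -> P(state)
    likelihoods: dict symptom -> dict state -> P(symptom = yes | state)
    observed:    dict symptom -> bool (True if the symptom is present)
    """
    unnormalised = {}
    for state, p in prior.items():
        for symptom, present in observed.items():
            p_yes = likelihoods[symptom][state]
            p *= p_yes if present else (1.0 - p_yes)
        unnormalised[state] = p
    total = sum(unnormalised.values())
    return {state: p / total for state, p in unnormalised.items()}


# Hypothetical numbers for illustration only; NOT the paper's probability tables.
PRIOR = {"none": 0.90, "mild": 0.08, "severe": 0.02}
LIKELIHOODS = {
    "fever": {"none": 0.05, "mild": 0.40, "severe": 0.80},
    "dyspnoea": {"none": 0.02, "mild": 0.10, "severe": 0.60},
}

if __name__ == "__main__":
    for name, table in LIKELIHOODS.items():
        print(name, likelihood_ratio(table["severe"], table["mild"]))
    print(posterior(PRIOR, LIKELIHOODS, {"fever": True, "dyspnoea": True}))
```

A symptom with a likelihood ratio above 1 shifts probability mass from mild towards severe when observed, while one below 1 (as reported for nasal congestion) shifts it the other way.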
Figures 7-9 provide some typical scenarios running the model that illustrate its power and
flexibility. Figure 10 provides the sensitivity analysis (tornado graph) output from AgenaRisk
that shows which user-observable variables have the most impact on the COVID alert
being True.
Figure 7: A person with no symptoms at all tests positive (on a perfectly accurate test). Note that this means
the “recent contact with infected person” node probability becomes 100% and there is close to a
100% probability that the person has asymptomatic COVID (with a very small probability that the
symptoms are so mild that the user has simply failed to report them). The model infers this person is
likely to be young with no underlying medical conditions. The COVID alert is triggered, but there is no
need for hospitalisation.
Figure 8: A user enters the observations that he is a 70-year-old obese male with underlying medical
conditions and that he believes he has been in contact with somebody who had COVID19
symptoms. Without entering any symptoms, the model predicts there is a 13% chance this person
has COVID19.
Figure 9: The user in Figure 8 now enters the fact that he has a high temperature, a cough, and chills (but
no dyspnoea). The probability of COVID is 82% (67% severe and 15% mild). The COVID
alert is triggered.
Figure 10: Sensitivity analysis illustrating the user-observable variables that most impact the
ALERT being true