An attention model to analyse the risk of agitation
and urinary tract infections in people with dementia
Honglin Li §, Roonak Rezvani §, Magdalena Anita Kolanko §, David J. Sharp, Maitreyee Wairagkar, Ravi
Vaidyanathan, Ramin Nilforooshan, Payam Barnaghi
Abstract—Behavioural symptoms and urinary tract infections
(UTI) are among the most common problems faced by people
with dementia. One of the key challenges in the management
of these conditions is early detection and timely intervention in
order to reduce distress and avoid unplanned hospital admissions.
Using in-home sensing technologies and machine learning models
for sensor data integration and analysis provides opportunities
to detect and predict clinically significant events and changes
in health status. We have developed an integrated platform to
collect in-home sensor data and performed an observational study
to apply machine learning models for agitation and UTI risk
analysis. We collected a large dataset from 88 participants with
a mean age of 82 and a standard deviation of 6.5 (47 females
and 41 males) to evaluate a new deep learning model that utilises
attention and rational mechanisms. The proposed solution can
process a large volume of data over a period of time and extract
significant patterns in time-series data (i.e. attention) and use
the extracted features and patterns to train risk analysis models
(i.e. rational). The proposed model can explain the predictions
by indicating which time-steps and features are used in a long
series of time-series data. The model provides a recall of 91%
and precision of 83% in detecting the risk of agitation and
UTIs. This model can be used for early detection of conditions
such as UTIs and the management of neuropsychiatric symptoms such
as agitation in association with initial treatment and early
intervention approaches. In our study, we have developed a
set of clinical pathways for early interventions using the alerts
generated by the proposed model and a clinical monitoring team
has been set up to use the platform and respond to the alerts
according to the created intervention plans.
I. INTRODUCTION
DEMENTIA affects 850,000 people in the UK and over
50 million globally, and is set to become the developed
world’s largest socioeconomic healthcare burden over coming
decades [1], [2]. In the absence of any current treatment,
there is an urgent need to focus on reducing the effects of
symptoms and help to improve the quality of life and well-
being of those already affected [3]. The 2020 report of the
Lancet Commission on dementia prevention, treatment, and
care stresses the importance of individualised interventions
H. Li, M. A. Kolanko, D. J. Sharp, and P. Barnaghi are with the Department of Brain
Sciences, Imperial College London, W12 0NN, United Kingdom.
R. Rezvani is with Centre for Vision, Speech and Signal Processing,
University of Surrey, Guildford, GU2 7XH, United Kingdom.
M. Wairagkar and R. Vaidyanathan are with Department of Mechanical
Engineering, Imperial College London, SW7 1AL, United Kingdom.
R. Nilforooshan is with Surrey and Borders NHS Foundation Trust,
Leatherhead, KT22 7AD, United Kingdom.
All authors are also with the Care Research and Technology Centre, The
UK Dementia Research Institute (UK DRI).
§these authors contributed equally to this work.
Corresponding author: p.barnaghi@imperial.ac.uk
to address complex medical problems, multimorbidity and
neuropsychiatric symptoms in dementia, which lead to un-
necessary hospital admissions, faster functional decline, and
worse quality of life [4].
People with dementia have complex problems with symp-
toms in many domains. It is estimated that up to 90%
will develop behavioural and psychological symptoms of dementia
(BPSD) over the course of their illness, with agitation being
one of the most common symptoms [5], and a frequent reason
for nursing home placement [6]. Furthermore, patients with
dementia often suffer from a number of co-morbid conditions
and have a higher frequency of medical problems such as falls,
incontinence, dehydration or urinary tract infection (UTI) - the
commonest bacterial infection in the older patient population,
and the commonest cause of sepsis in older adults [7] with
an associated in-hospital mortality of 33% in this age group
[8]. If not detected and treated early, both BPSD and medical
comorbidities frequently lead to emergency hospital admis-
sions in dementia patients. Alzheimer’s Research UK estimates
that 20% of hospital admissions in dementia patients are for
preventable conditions, such as urinary tract infections. Be-
sides significant costs, hospitalisation places dementia patients
at risk of serious complications, with longer hospital stays,
higher risk of iatrogenic complications, delayed discharge
and functional decline during admission, which contributes
to higher rates of transfer to residential care and in-patient
mortality [9]. Therefore, increased medical supervision, early
recognition of deterioration in health status and rapid treatment
are key to preventing unnecessary hospitalization for ’ambulatory’
conditions that could be treated outside of hospital,
such as UTIs. Furthermore, ongoing monitoring of people
with dementia allows immediate detection of behavioural
disturbances, enabling earlier psychosocial and environmental
interventions to reduce patients’ distress and prevent further
escalation and hospitalization.
However, monitoring and supporting individuals in an on-
going manner is a resource and cost-intensive task, often not
scalable to larger populations. Utilising remote monitoring
technologies with the help of caregivers can allow creating
practical and generalisable solutions. As part of the research
in the Care Research and Technology Centre at the UK De-
mentia Research Institute (UK DRI), we have been developing
and deploying in-home monitoring technologies to help and
support people affected by dementia. Our research has led to
the development of a digital platform that allows collecting
and integrating in-home observation and measurement data
using network-connected sensory devices [10]. In this paper,
we discuss how our in-home monitoring data and machine
learning algorithms are used to detect early symptoms of
agitation and UTI in people with dementia living in their own
homes.
Sensing technologies have been increasingly used to moni-
tor activities and movements of elderly patients living in their
own homes [11], [12], [13]. Interpreting this information, however,
demands considerable human effort, which is not always
feasible. The use of analytical algorithms allows integration
and analysis of rich environmental and physiological data at
scale, enabling rapid detection of clinically significant events
and development of personalized, predictive and preventative
healthcare.
Deep learning models have been applied in a variety of
healthcare scenarios to identify the risk of various clinical con-
ditions or predict outcomes of treatment [14], [15]. Recently,
there have been several implementations of Recurrent Neural
Networks (RNNs) to create learning models for time-series
healthcare data analysis [16], [17], [18]. The behavioural and
physiological symptoms and patterns in long-term conditions
such as dementia appear in the data over a long period
of time and can fluctuate and change over the course of
disease. Machine learning models such as RNNs, however,
are not suitable for analysing long sequences of time-points.
To address the long sequence analysis issue in RNNs, other
methods such as Bidirectional RNN, LSTM and GRU have
been used [19], [20]. There also have been attempts to apply
attention mechanisms to clinical datasets [21], [22], [23], [24],
[25] to improve the performance of analysing imbalanced
and long-tail time-series data. A fundamental limitation of
these models is their adaptivity and generalisability. When
long-distance symptoms and patterns are related to a specific
condition, the generalisability and performance of the existing
models are limited. The long sequences of data points and
the changes in the ongoing conditions vary across patients, and
often there are no large labelled training samples to train the
models for all the variations. Deep learning models offer a
new opportunity to train models that can pay attention to
correlations and long-distance relations between the patterns
and sequences. However, off-the-shelf and existing deep
learning models require large training samples.
While applying neural networks to clinical data, there are
two main challenges: 1) selecting the important timesteps and
features from long sequences of data to create generalisable
models; and 2) imbalance in datasets. Neural networks are
very effective in finding a trend in datasets. Models such
as Recurrent Networks use the positions of the input and
output sequences to generate a sequence of hidden states. This
is computationally expensive and limits the computation of
global dependencies [26]. In these models, the computational
complexity of relating input and output positions also grows as the
distance between positions increases. The latter makes it very
challenging to learn dependencies and correlations between
long-distance patterns and time points [27].
Additionally, clinical datasets are often imbalanced, with
content spanning ensembles of heterogeneous data. Most of
the clinical datasets contain more normal cases (i.e. true
negatives) than abnormal data points (i.e. true positives). In
our dataset, which includes a large set of in-home environ-
mental and physiological data from people with dementia, the
number of positive cases for infections is much smaller than
the true negative cases. In large parts of the data, the true
status of the infection is unknown (i.e. the data is partially
labelled due to the limitations in accessing the patients’ clinical
records or knowing the presence of any infections without
a test). This issue causes the learning models to exhibit a
bias towards the majority class. It may ignore the minority
class or make a decision based on a partial set which is not
a broad representation of the cases [28]. There have been
several works on implementing attention mechanisms [26] to
improve the generalisability of learning models in analysing
time-series data. However, Jain and Wallace [29] found that there are
limitations in the weights generated by attention-based models
which can lead to wrong predictions. Hence, we need to be
more cautious in using the attention mechanisms and their
explanations in designing deep learning models. While the
attention-based models are promising in healthcare time-series
data analysis, considering the time and feature dependencies
of the predictions poses a challenge for this type of model.
Over-sampling, which augments the data by generating synthetic
samples [30], and down-sampling, which prunes the samples
in the majority classes, are among the typical methods
used to deal with the imbalance issue in datasets [31]. How-
ever, samples in clinical data and variations in the real-data are
important aspects of the observations and measurements that
may not be present in augmented data generated by sampling
methods. It is crucial to find an efficient way to address the
imbalance issue without modifying or reducing the original
data in pre-processing steps [32].
Our goal is to propose a model to address the challenges
mentioned above. To support the clinical treatment and adapt
to the real-world sensory data readings, the model should
filter the redundant and less informative data. Furthermore,
the model should explain the predictions by indicating which time
periods and sensors contribute to the predictions. Last
but not least, the model should adapt to the imbalanced data.
II. DESIGN, SETTING AND PARTICIPANTS
Real-time, continuous measurement methodologies enabled
by the recent advances in pervasive computing and ‘smart-
home’ technologies provide opportunities to monitor the be-
haviour and health status of elderly people using wearable
technology or environmental sensors [11], [12], [13].
Computer-derived algorithms have been developed to anal-
yse sensor data and identify patterns of activity over time.
These can be applied to detect changes in activities of daily
living in order to predict disease progression and cognitive
decline. For instance, the ORCATECH group used a continuous in-
home monitoring system and pervasive computing technolo-
gies to track activities and behaviours such as sleep, computer
use and medication adherence to capture changes in cognitive
status [33]. They also demonstrated the ability of machine
learning algorithms to autonomously detect mild cognitive
impairment in older adults [34]. Machine learning models
have also been used to detect clinically significant events
Fig. 1: An overview of the proposed solution for healthcare data analysis. The data is encoded by positional encoding before being passed to the
model. The proposed rationalising block extracts important information and passes it to the higher layers. The rationalising block contains a
rational layer to extract important time steps. A Long-Short Term Memory (LSTM) model processes the extracted data. The attention layer
pays attention to suitable features. The representation of the data changes as it passes through the rationalising block: the block
extracts the important time steps first and then pays attention to different features of the pruned data. The resulting data is then used to
make a prediction. All the layers are trained simultaneously.
and changes in health status. Much of the previous work
focused on detection and prediction of falls using wearable
accelerometers or other motion detectors [35], as well as
tracking behavioural symptoms such as sleep disturbances
[36], agitation [37], and wandering [38] in elderly patients.
However, there is limited research on the use of machine
learning models for detection of health changes such as infec-
tion in the context of smart-homes. An early supervised UTI
detection model has been described using in-home PIR sensors
[39]; however, it relied on the activity labels and annotations in
the training dataset, which are extremely time-consuming to produce and
not generalisable to real-world situations with large amounts
of unlabelled data collected from uncontrolled environments.
We have previously proposed an unsupervised technique that
could learn individual’s movement patterns directly from the
unlabelled PIR sensor data [40].
Furthermore, the existing research and the data-driven solu-
tions are applied to small-scale pilot studies and do not
provide evidence for scalability and generalisability. They are
also limited in analysing long-term patterns and correlations
that appear in the data. Attention-based models which can
overcome these problems have never been applied to sensor
data for detecting clinically significant events or changes in
health status in dementia patients.
This is the first study to use deep learning and attention-based
methods to perform risk analysis for behavioural symptoms
and health conditions such as UTIs in people living with
dementia. The proposed model improves the accuracy and
generalisability of machine learning models that use imbal-
anced and noisy in-home sensory data for the risk analysis.
An analysis of the suitability of the digital markers and the
use of in-home sensory data is explored in an ablation study.
The proposed model is compared with several baseline models
and state-of-the-art methods. The proposed approach has been
evaluated in an observational clinical study. Participants (n=88,
age=81 +/- 6.5) were recruited for a six months trial period.
The proposed solution provides a recall of 91% and precision
of 83% in detecting the risk of agitation and UTIs. We have
also set up a framework and a clinical response team that use
the risk alerts generated by the models for ongoing support and
management of the conditions in people living with dementia.
Using high-resolution in-home observation and measure-
ment data in association with advanced machine learning
methods leads to early and timely interventions and has a
significant impact on reducing preventable and unplanned
hospital admissions in people affected by dementia. A key
challenge in using analytical and predictive models for risk
analysis is identifying and collecting digital markers data using
in-home sensory devices. The capacity of the proposed model
to address time-series feature identification and data imbalance
enables use in a very wide range of healthcare and risk analysis
applications using in-home digital markers.
III. METHOD
We introduce a model that can identify the important
time steps and features and utilise long-distance dependencies
to make better predictions. The proposed model provides a
prediction based on the selected time points and the selected
features from the raw observation and measurement data.
Figure 1 shows how the data changes during the processing.
The model selects important time steps through a pruning
process. After pruning the data, it pays attention to different
features and uses them to make the predictions. Different
from methods such as cluster sampling [41], we select the
Fig. 2: Visualisation of the sensor readings. The x-axis represents the time of the day for activation of the sensors. The y-axis represents
the days over a period of 8 months for a patient. Each colour represents a type of environmental activity sensor. Similar colours along the
y-axis represent similar patterns of activities around the same time on consecutive days. More colour distortion/merging of colours along
the y-axis represents more changes in the pattern of activity over time.
important time steps of each sample instead of selecting a
portion of samples for training. In contrast to statistical feature
selection methods such as sequential feature selection [42], the
proposed model selects important time steps based on different
data. We use focal loss [43] to assign priority to the minority
class without generating synthetic samples.
Fig. 3: A heat-map of the aggregation of the raw data. The readings
are aggregated per hour within each day.
Data sources and pre-processing
We have collected the data as part of an observational
clinical study in people living with dementia from December
2018 to April 2020. Each of the participants has had a
confirmed diagnosis of dementia (mild to severe) for at
least three months prior to recruitment and has been
stable on dementia medication. The collected data contains
continuous environmental sensor data from houses of patients
with dementia who live in the UK. The sensors include Passive
Infra-Red (PIR), smart power plug, motion and door sensors produced
by Develco in Aarhus, Denmark. The sensors were installed
in the bathroom, hallway, bedroom, living room (or lounge)
and kitchen in the homes and also on the fridge door, kettle
and microwave (or toaster). The sensors also include network-
connected physiological monitoring devices that are used for
submitting daily measurements of vital signs, weight and
hydration. The data is integrated into a digital platform that
we have developed in our past research [10], which is designed in
collaboration with clinicians and a user group to
support people with dementia. A clinical monitoring team that is set
up as part of our observational study has used the platform to
annotate the data daily and verify the risk analysis alerts. Based
on the annotations, we select four incidents including agitation,
Urinary Tract Infection (UTI), abnormal blood pressure and
abnormal body temperature to create binary labels for our data. More
specifically, a label is set to true when the abnormal incident
is verified by the monitoring team and vice versa. We then
use the environmental data to infer whether any incident
happens within one day. Fig 2 shows an example of collected
data. To pre-process the data, we aggregate the readings of the
sensors within each hour of the day, shown in Fig 3. Appendix
1 shows a list of potential digital markers and sensory data that
can be used in dementia care. In the appendix, we also show a
screenshot of the platform that is used for collecting the data.
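As an illustrative sketch of this pre-processing step (not the exact pipeline used in the study), the hourly aggregation can be implemented in Python with pandas by counting sensor activations per hour and stacking one 24-hour-by-sensor matrix per day; the file name and column names (timestamp, location) are hypothetical placeholders.

import numpy as np
import pandas as pd

# Hypothetical event log: one row per sensor activation, with a timestamp and a sensor location.
events = pd.read_csv("sensor_events.csv", parse_dates=["timestamp"])
events["date"] = events["timestamp"].dt.date
events["hour"] = events["timestamp"].dt.hour

# Count activations per (day, hour, sensor location).
counts = (events.groupby(["date", "hour", "location"])
                .size()
                .unstack("location", fill_value=0))   # index: (date, hour), columns: sensors

sensors = sorted(events["location"].unique())
daily_matrices = []
for _, day_counts in counts.groupby(level="date"):
    mat = (day_counts.droplevel("date")
                     .reindex(index=range(24), columns=sensors, fill_value=0))
    daily_matrices.append(mat.to_numpy())

X = np.stack(daily_matrices)   # shape: (number of days, 24 hours, number of sensors)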
Machine learning model
We aim to use the environmental sensors to predict possible
incidents and avoid delayed treatment. Furthermore, the model
should provide the reason, i.e. which periods of time and which
sensors are important for the predictions, to explain the
inference. In other words, the model can remove the redundant
or less informative information and use the rest of the data to
give the prediction, as shown in Fig 4.
Fig. 4: Selected time steps from the raw data. These time steps are
selected by the model. The model learns to identify time steps that
are more important in predicting the outcome.
As discussed earlier, in healthcare data analysis, often, the
predictions are based on a long sequence of data measured
and collected at different time-points. Accessing and feeding
more data helps to train more accurate models. However, more
information can also mean more noise in the data, and the
imbalance in the samples that are given to the model can also
lead to decision bias. An efficient model should be able to
process and utilise as much data as available. However, the
model should also avoid the common pitfalls of noise and bias.
To address these issues, we have studied the use of attention-
based models. This group of models will utilise all the avail-
able information and, in each sequence, will identify the time-
points that provide the most information to the training and
prediction. This attention and selection process is an embedded
step in the model. It will allow the model to be flexible and
generalisable for different sequences with variable lengths and
for a different combination of features and values that are
represented in the data. Before explaining our proposed model
and its contributions to creating a generalisable solution for
time-series healthcare data analysis, we provide an overview
of the related work. We discuss the use of attention-based
models in other domains and explain how the ideas presented
in the existing work have led to the design of our current model.
Fig. 5: After selecting the important time steps, the model learns
which sensors should be attended to. In this case, the model considers the
bathroom sensor to have the largest contribution to the prediction.
Attention mechanisms were introduced in Natural
Language Processing (NLP) by Bahdanau et al. [44]. The
attention-based models are widely used in NLP due to their
capability of detecting important parts of a sequence and
efficiently interpreting it. The attention-based models have also
been used in continuous healthcare and clinical data analysis
[45]. Continuous clinical data are multivariate time-series data
with temporal and sequential relationships. For each patient,
the data is a set of time steps, and each time step contains
medical features ($X \in \mathbb{R}^{t \times d}$). The REverse Time AttentIoN model
(RETAIN) is one of the first systems that used an
attention mechanism for medical data [21]. In this model,
there are two separate RNNs, one to generate the visit-level
attention weights (α) and the other one for variable-level (β)
attention weights. In this model, the most relevant time step
is the one associated with the largest value in α. Choi et al.
provided a method to find the most influential medical feature
[21]. However, RETAIN cannot handle long-distance depen-
dencies. To deal with this issue, Ma et al. proposed Dipole,
a predictive model for clinical data using Bidirectional RNNs
[22]. They have implemented the model using two different
attention mechanisms: General attention and Concatenation-
based attention. The results show that Concatenation-based
attention outperforms because it incorporates long-
distance dependencies.
In the above models, the input layer is simple, and the
data follows the same pipeline, but in the Timeline model, Bai
et al. adapted the data pipeline [23]. They use an attention
layer to aggregate the medical features, and by modelling
each disease progression pattern, they find the most important
timesteps. To deal with long-distance dependencies, Timeline
implements Bidirectional LSTMs. One of the recent studies in
this area is AdaCare [24], which uses Gated Recurrent Units
(GRU). AdaCare utilises convolutional structure to extract
all the dependencies in the clinical data. AdaCare showed
promising results in the explainability of the model. The
models mentioned above have been developed based on re-
current networks. However, the sequential aspect of recurrent
models is computationally inefficient. The SAnD model was
developed solely based on multi-head attention mechanism
[25]. Song et al. implemented a positional encoding to include
the sequential order in the model.
The models mentioned above show significant improve-
ments in the accuracy and performance of predictive models
in the clinical field. However, incorporating both long-distance
dependencies and feature associations is a challenging task. In
the existing models, the analysis is either on time step-level or
feature-level. In this paper, we propose a model to detect and
predict the risk of healthcare conditions by analysing long-
distance dependencies in the patterns and sequences of the
data. This information can be useful for clinical experts in
ongoing management of the conditions. The work also helps
to use an automated process to alert the risk of adverse health
conditions and explore the symptoms related to the detected
conditions.
Our proposed model consists of two main components, a
rationalising block and the classification block, as shown in
Figure 1. In a high-level overview, the rational layers select the
important time steps and pass them to an LSTM layer. The LSTM
layer will ignore the trivial time steps and process the data
for the attention block. The classifier uses these time points
for predictions. After processing by the attention block, the
model will give a prediction. The details of these blocks are
explained in the following sections.
Positional Encoding
To use the order of sequence in the analysis, we add
positional encoding (PE) before passing the data into the
model. We use the sine and cosine positional encoding [26].
Shown in Equation 1, where $pos$ is the position of the time
step, $i$ is the position of the sensor, and $d$ is the dimension of each
time step:
$$PE(pos, 2i) = \sin\left(pos/10000^{2i/d}\right)$$
$$PE(pos, 2i+1) = \cos\left(pos/10000^{2i/d}\right) \quad (1)$$
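As a minimal sketch of Equation 1 (a NumPy implementation is assumed here; the paper does not specify how the encoding is computed in practice):

import numpy as np

def positional_encoding(num_steps: int, d: int) -> np.ndarray:
    # Sine/cosine positional encoding of Equation 1 (following Vaswani et al. [26]).
    pe = np.zeros((num_steps, d))
    pos = np.arange(num_steps)[:, None]          # time-step positions
    two_i = np.arange(0, d, 2)[None, :]          # even feature indices (2i)
    angle = pos / np.power(10000.0, two_i / d)
    pe[:, 0::2] = np.sin(angle)                  # PE(pos, 2i)
    pe[:, 1::2] = np.cos(angle)                  # PE(pos, 2i + 1)
    return pe

# The encoding is added to the hourly input before it enters the model,
# e.g. x + positional_encoding(24, 8) for 24 time steps and 8 sensor features.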
Rationalising Prediction
To add more focus on the time steps in the data that are
more relevant to the predictions, the generator produces a
binary mask to select or ignore specific time points. For
example, if $x \in \mathbb{R}^{k \times f}$ contains $k$ time points and $f$ features for
each time point, the generator will produce a binary vector
$z = \{z_1, z_2, \ldots, z_k\}$. The $i$th variable $z_i \in \{0, 1\}$ indicates
whether the $i$th time point in $x$ is selected or not.
Whether the $i$th time point is selected or not is a conditional
probability given the input $x$. We assume that the selection
of each time point is independent. The generator uses a
probability distribution over $z$, which could be a joint
probability of the selections. The joint probability is given
by:
$$p(z|x) = \prod_{i=1}^{k} p(z_i|x) \quad (2)$$
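The paper does not detail the generator architecture, so the following PyTorch sketch is only illustrative: a small hypothetical network scores each time step, and a Bernoulli sample with a straight-through estimator produces the binary mask z used to prune the input.

import torch
import torch.nn as nn

class Generator(nn.Module):
    # Produces a binary mask z over the k time steps of an input x with f features per step.
    def __init__(self, num_features: int, hidden: int = 64):
        super().__init__()
        self.score = nn.Sequential(
            nn.Linear(num_features, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))

    def forward(self, x):                              # x: (batch, k, f)
        p = torch.sigmoid(self.score(x)).squeeze(-1)   # p(z_i = 1 | x), shape (batch, k)
        z_hard = torch.bernoulli(p)                    # sample the binary selection mask
        z = z_hard + p - p.detach()                    # straight-through estimator for gradients
        return z, p

# Usage: mask the input before the LSTM / attention classifier.
# z, p = generator(x); x_selected = x * z.unsqueeze(-1)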
Classifier
After exploring and selecting the most relevant time points,
we train a classifier to provide the predictions. The trained
classifier contains attention blocks and residual blocks.
The attention block is an application of the self-attention mecha-
nism to detect the important features. The attention mecha-
nism detects important parts of a sequence. It has three key
components: the input structure, the compatibility function
and the distribution function [46].
There are three inputs in the structure: Keys ($K \in \mathbb{R}^{n_k \times d_k}$),
Values ($V \in \mathbb{R}^{n_v \times d_v}$) and a Query ($q \in \mathbb{R}^{n_q}$), where $n$ is
the dimension of the inputs and $k, v, q$ index the dimensions of the
outputs. They can have different or the same sources. If $K$ and
$q$ come from the same source, it is self-attention [26]. $K$ and
$V$ represent the input sequence, which could be either annotated or
raw data. $q$ represents the reference sequence for computing
attention weights. A compatibility function is used to combine and
compare the $q$ and $K$ values. A distribution
function computes the attention weights ($a \in \mathbb{R}^{d_k}$) using the
output of the compatibility function ($c \in \mathbb{R}^{d_k}$).
We obtain the attention by Equation 3. $Q$, $K$ and $V$ are ma-
trices formed by the query, key and value vectors, respectively.
Since we use self-attention, $Q$, $K$ and $V$ are calculated from
the inputs with different weight matrices.
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V \quad (3)$$
The architecture of the attention block is the same as described
in [26]. We employ a residual connection [47] followed by a
normalisation layer [48] inside the attention block. Residual
blocks and the output layer process the output of the attention
block.
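A minimal PyTorch sketch of such an attention block is shown below; the single-head formulation and layer sizes are assumptions, since the paper only states that the block follows [26] with a residual connection and layer normalisation.

import math
import torch
import torch.nn as nn

class AttentionBlock(nn.Module):
    # Single-head self-attention (Equation 3) with a residual connection and layer normalisation.
    def __init__(self, d_model: int):
        super().__init__()
        self.wq = nn.Linear(d_model, d_model)
        self.wk = nn.Linear(d_model, d_model)
        self.wv = nn.Linear(d_model, d_model)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x):                            # x: (batch, time steps, d_model)
        q, k, v = self.wq(x), self.wk(x), self.wv(x)
        scores = q @ k.transpose(-2, -1) / math.sqrt(k.size(-1))
        attn = torch.softmax(scores, dim=-1)         # attention weights over time steps
        out = attn @ v
        return self.norm(x + out)                    # residual connection followed by layer norm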
Objective function
The training samples in healthcare datasets are often imbal-
anced due to the low prevalence and sporadic occurrences. In
other words, some of the classes contain more samples than
others. For example, only 25% of the data we collected are
labelled as positive. More details of the dataset will be clarified
in the following section. To deal with the imbalance issue, we
use focal loss [43] as the objective function of the classifier,
shown in Equation 4:
$$L_c = -\alpha (1 - p)^{\beta} \log(p) \quad (4)$$
where $\alpha$ and $\beta$ are hyper-parameters to balance the variant of
the focal loss, $p = f(x, z)\,y + (1 - f(x, z))(1 - y)$, $f(x, z)$
is the probability estimated by the classifier and $y \in \{0, 1\}$ is
the label of $x$.
In addition to the loss function used in the classifier, the
generator produces a short rational selection and calculates
the loss, shown in Equation 5, where $\lambda$ is the parameter
that weights the selection:
$$L_g = \lambda \|z\| \quad (5)$$
We then combine the focal loss and the loss from the generator
to construct the overall loss function, as shown in Equation 6:
$$L = \sum_{(x,y) \in D} \mathbb{E}\left[L_c + L_g\right] \quad (6)$$
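A sketch of the combined objective in Equations 4-6 is given below, assuming PyTorch; the α and β values are the defaults from the focal loss paper [43] rather than values reported here, and λ corresponds to the sparsity weight of 0.001 used in the experiments.

import torch

def focal_loss(prob, y, alpha=0.25, beta=2.0):
    # Focal loss (Equation 4): prob is the classifier estimate f(x, z), y is a {0, 1} label tensor.
    p = prob * y + (1 - prob) * (1 - y)              # probability assigned to the true class
    return -alpha * (1 - p) ** beta * torch.log(p.clamp(min=1e-8))

def total_loss(prob, y, z, lam=0.001):
    # Combined objective (Equation 6): focal loss plus the sparsity penalty on z (Equation 5).
    l_c = focal_loss(prob, y)
    l_g = lam * z.abs().sum(dim=-1)                  # ||z|| for each sample
    return (l_c + l_g).mean()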
IV. RESULTS
Evaluation Metrics: To evaluate our proposed method and
compare it with the baseline models, we calculated different
metrics. One of the primary metrics to assess the model is
accuracy, which measures how close the predicted
class is to the actual class. However, accuracy alone cannot be
a good measure to evaluate the performance of a classifier.
As a result, we also calculated the Area Under the Curve
of the Receiver Operating Characteristic (ROC) and Precision-
Recall (PR) curves. The precision of class A is the proportion of samples
predicted as class A which are correct, and recall is the proportion
of true class A samples which have been detected. The ROC
curve is the measure of model capability in differentiating
between classes. We do not report the results in terms of
specificity and sensitivity. The reason is that in this study, we
do not have access to the full electronic healthcare records
(a) PR (b) ROC (c) Loss
Fig. 6: Evaluation of the proposed methods using the in-home sensory dataset. (a) shows the precision-recall curve; (b) shows the Receiver Operating
Characteristic (ROC) curve and (c) shows the changes to the loss during the training. In (a) and (b) the results of the proposed model are
also compared with a set of baseline models.
(a) PR (b) ROC (c) Selection Rate changes
Fig. 7: An ablation study to evaluate the model; (a) shows the precision-recall curve; (b) shows the Receiver Operating Characteristic (ROC) curve
and (c) shows the selection rate changes. In (a) and (b) the results are obtained by eliminating different components from the model.
and hospital admission data of all the participants. Reporting
the specificity and sensitivity only based on the detected and
evaluated labels in our dataset, which can only be a sub-set
of true and false cases for the cohort, can be misleading in
terms of an actual and generalisable clinical finding. Instead,
we have opted to evaluate the precision and generalisability
of the prediction algorithm based on the existing labelled
data and the known cases for which we could evaluate and verify
the performance of the model.
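For reference, the AUC-ROC and AUC-PR used here can be computed with scikit-learn; the labels and scores below are placeholder values purely for illustration.

import numpy as np
from sklearn.metrics import (average_precision_score, precision_score,
                             recall_score, roc_auc_score)

y_true = np.array([0, 0, 1, 1, 0, 1])                # placeholder binary labels
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9])  # placeholder predicted probabilities

auc_roc = roc_auc_score(y_true, y_score)             # area under the ROC curve
auc_pr = average_precision_score(y_true, y_score)    # area under the precision-recall curve
precision = precision_score(y_true, y_score > 0.5)   # precision at a 0.5 decision threshold
recall = recall_score(y_true, y_score > 0.5)         # recall at a 0.5 decision threshold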
Baseline Models: We compare our model with Logistic
Regression (LR) [49], Long-Short Term Memory (LSTM)
neural networks [50] and a fully connected Neural Network
(NN) model [51].
LR is a discriminative model which can avoid the confound-
ing effects by analysing the association of all variables together
[49]. It is also a commonly used baseline model to evaluate
the performance of the proposed models [20].
NN has the ability to learn complex relationships. Unlike
LR, NN does not need to assume the variables are linearly
separable. It has also been applied to a variety of clinical data sets
[52], [53]. In the experiment, we used a Neural Network with
one hidden layer containing 200 neurons, a softmax output layer
containing two neurons, cross-entropy loss and the Adam optimiser.
LSTM is a powerful neural network for analysing sequen-
tial data, including time-series clinical datasets [18], [19].
It can associate the relevant inputs even if they are widely
separated. Since our dataset consists of time-series sequences,
we take the LSTM as another baseline model. In the experi-
ment, we used a model that contains one residual block, one
LSTM layer containing 128 neurons, and a softmax output layer
containing two neurons, cross-entropy loss and the Adam optimiser.
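As a hedged sketch of the two neural baselines described above (the framework is an assumption and the residual block of the LSTM baseline is omitted for brevity):

import torch.nn as nn

# Fully connected baseline: one hidden layer with 200 neurons and a two-class output,
# trained with cross-entropy (softmax) loss and the Adam optimiser.
nn_baseline = nn.Sequential(
    nn.Flatten(),                       # 24 time steps x 8 features -> 192 inputs
    nn.Linear(24 * 8, 200), nn.ReLU(),
    nn.Linear(200, 2))

# LSTM baseline: one LSTM layer with 128 units followed by a two-class output layer.
class LSTMBaseline(nn.Module):
    def __init__(self):
        super().__init__()
        self.lstm = nn.LSTM(input_size=8, hidden_size=128, batch_first=True)
        self.out = nn.Linear(128, 2)

    def forward(self, x):               # x: (batch, 24, 8)
        h, _ = self.lstm(x)
        return self.out(h[:, -1])       # classify from the last time step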
In the experiments, we aggregate the readings of each
sensor per hour. Hence each data point contains 24 time
points and eight features. We set the batch size to 32,
learning rate to 0.0001, sparsity to 0.001. We divide the
data into a train set and a test set. The numbers of training
and testing samples in the datasets are 209 and 103 cases
with their associated time-series data, respectively. The
data is anonymous, and only the anonymous data without
any personally identifiable information is used in this research.
Experiments: The ROC and PR changes during training
are shown in the first two graphs in Figure 6. Overall, the
proposed model outperforms other baseline methods. The
LSTM performs well in dealing with the time-series data.
Compared to the other methods, the neural network converges
much faster. However, the performance of the model fluctuates
around 30 epochs. The convergence and the fluctuation are
due to the rational process. The model has to learn how to
extract important time steps and pay attention to the features.
This process is also reflected in Figure 6c, where the loss fluctuates
during that period. However, the model adjusts this fluctuation
automatically and improves the performance. The overall
results are also summarised in Table I.
V. DISCUSSION
Ablation Study: We begin the discussion with an ablation
study. Our model contains five important components:
Rational layers, Attention layers, Residual layers, focal loss
TABLE I: The evaluation results in comparison with a set of
baseline models: Logistic Regression (LR), Long-Short Term Memory
(LSTM) neural networks and a fully connected Neural Network (NN)
model. Since the dataset is imbalanced, we calculated the Area Under
the Curve (AUC) of the Receiver Operating Characteristic (ROC) and
Precision-Recall (PR) curves to evaluate the performance.

            LR      LSTM    NN      Proposed method
AUC - PR    0.3472  0.6901  0.5814  0.8313
AUC - ROC   0.5919  0.7644  0.7601  0.9131
and positional encoding. We omit each component one at
a time and explore how removing one of the components
will impact the performance of the model. The experiments
are shown in the first two graphs of Figure 7. The orange
line represents the model without the rational layer. Although
the performance of the model without the rational layer keeps
increasing, it significantly underperforms the others. In other
words, the rational layer plays an important role in the model.
Removing the positional encoding, attention layer, residual
layer, or the focal loss decreases the performance as well.
The performance change caused by omitting each of these
four components is quite similar. As shown in Figure 7,
the positional encoding helps the model to identify relevant
patterns of the data over time and plays an important role in
the performance of the model. The rate of change of the selected
timesteps is shown in Figure 7c.
Rationalising prediction: The rational component helps to
increase the accuracy of the model. Generally, the proposed
rationalising method shows that the model knows which time
steps and features to use for the prediction. These patterns and
time steps can also be explored to identify and observe
data and symptoms relevant to a condition in each patient.
Using this component, a personalised set of patterns and
symptoms can be explored for each patient. The last graph
in Figure 7 shows the selection rate changes during the
training phase. The model learns to extract the time steps,
and the accuracy increases after the changes become stable.
As mentioned in the ablation study, after learning to extract
the important time steps, the proposed model outperforms the
baseline models without rational mechanisms. In other words,
the model extracts a sub-set of the time steps (e.g. part of
the time steps are extracted from Figure 3 to Figure 4) to
obtain a better prediction. As the learning process continues,
the model tries different selections and finds the optimised
selection rate. Compared to other models, the performance
of the proposed model does not decrease during the training.
The model learns to pay attention to the most relevant
segments of the data and consider long-distance dependencies
in the time-series data. In summary, the proposed model
can not only explain the prediction but also abandon the
redundant information in the data automatically. According to
our experiments, the proposed model on average selects 61%
of the time points in the datasets to estimate the predictions.
Paired analysis: We then analyse the rational block pro-
cessing on the positive and negative samples. As shown in
Figure 8, the rational block assigns weights to the positive
and negative samples differently. More specifically, the model
has learnt to extract different amounts and series of time steps
based on the inputs. In this case, the model extracts more time
steps for the positive case than the negative case. Furthermore,
the model pays attention differently based on the input data.
In the example above, the model assumes the bathroom is
the most important sensor in the positive samples. However,
the model takes the bathroom and kettle as almost equally
important sensors for predicting the negative case. After the
model pays attention to the sensors of the selected time steps, the
classifier gives the predictions correctly.
Translating machine learning research into clinical practice
Improving the quality of life by preventing illness-related
symptoms and negative consequences of dementia has been
set out as a major goal to advance dementia care. Agitation
and infections have been highlighted as areas for priority
development [6]. Our proposed model directly addresses these
priorities in dementia care and intervention by enabling early
detection of agitation and urinary tract infections in a re-
mote healthcare monitoring scenario, providing an opportunity
for delivering more personalised, predictive and preventative
healthcare. When applied to a real-world clinical dataset in the
context of the current clinical study, our proposed algorithm
provided a recall of 91% and precision of 83% in detecting
early signs of agitation and UTI from physiological and
environmental sensor data. A clinical monitoring team verified
the predictions by contacting the patient or carer when an
agitation or UTI alert was generated. A set of clinical pathways
for early interventions has also been developed for the clinical
monitoring team to use when responding to the alerts.
Relevance to patient outcomes: We would like to high-
light an important aspect of using this type of analysis to
evaluate healthcare and patient outcomes. Focusing only on
accuracy as a metric for assessment of the solution within
a specific cohort goes only so far [54]. Large studies and
further experiments with different cohorts and various in-
home deployment settings are required to assess how such
algorithms will perform in the noisy and dynamic real-world
environments. There are several examples of AI and machine
learning algorithms that perform very well in controlled and
laboratory settings, but the real-world experience is different
[54]. In this study, the sensors and data collection happen in
an uncontrolled, real-world environment. We have done several
cross-validations, comparison and ablation studies to avoid
overfitting the model and make sure the results are robust
and reproducible. However, further independent trials and
validation studies with larger cohorts are required to transform
the current work into a product that can be used in real-world
clinical and care settings. Another important point is that only
focusing on the accuracy of the algorithm will not give a
complete picture of the real effectiveness and impact of the
solution on patient outcomes.
Our agitation intervention protocol follows all current guide-
lines, which agree that individualised and person-centred non-
Fig. 8: Visualisation of the outputs within the rational block. The top figure visualises a sample which is validated with a True incident. The
bottom figure is a sample which is validated with a False incident.
pharmacological therapies are the first-line treatment for agita-
tion in people with dementia [55], [56]. In line with the current
guidelines, the initial assessment explores possible reasons
for patients’ distress and addresses clinical or environmental
causes first. The clinical monitoring team asks a set of stan-
dardised questions to evaluate the symptoms and to help the
carer to identify potential causes of agitation such as pain,
illness, discomfort, hunger, loneliness, boredom or environ-
mental factors (temperature, light, noise level). The recognition
and treatment of possible organic causes or triggering factors
remains the mainstay of the intervention. In particular, detec-
tion of delirium and a possible underlying infection is of great
importance and the clinical monitoring team facilitates early
diagnosis and treatment by liaising with the study’s clinical
team and patient’s GP. Finally, the clinical monitoring team
provides psychological support for the caregivers in order to
reduce the caregiver distress. In the future, we are planning
to use multimodal sensor data to improve the classification
of agitation state which will include measuring sound levels
along with activity detected by environmental sensors.
Similarly to the agitation protocol, in case of a UTI alert the
clinical monitoring team first responds by contacting the pa-
tient/carer to evaluate the symptoms. However, the diagnosis of
UTI in dementia patients can be problematic, as these patients
are less likely to present with a typical clinical history and
localised urinary symptoms compared with younger patients
[57]. The team, therefore, arranges a home visit to perform a
dipstick urine test. If the urine dipstick test is suggestive of
infection (positive nitrites or leukocytes), the clinical monitoring
team advises the person with dementia/carer to visit the GP the
same day to obtain a prescription for antibiotics. The monitoring
team also informs the GP of the test results and requests
antibiotics to be prescribed.
One potential criticism of our UTI intervention algorithm
could be the possibility of antibiotic over-prescribing con-
tributing to the spread of antibiotic resistance. However, recent
evidence demonstrates that in elderly patients with a diagnosis
of UTI in primary care, no antibiotics and delayed antibiotics
are associated with a significant increase in bloodstream
infection and all-cause mortality compared with immediate
treatment [58]. Therefore, early prescription of antibiotics for
this vulnerable group of older adults is advised in view of
their increased susceptibility to sepsis after UTI and despite a
growing pressure to reduce inappropriate antibiotic use.
The impact of our in-home monitoring technologies and the
embedded machine learning models on clinical outcomes in-
cluding hospitalisation, institutionalisation and mortality rates
is part of an ongoing study. Nevertheless, the current work
demonstrates the effectiveness of the proposed algorithm and
its translation into real-life clinical interventions. Fig 8 illus-
trates individual cases of agitation and UTI correctly identified
by the algorithm, with the digital markers demonstrating a
behavioural anomaly.
VI. CONCLUSION
To avoid unplanned hospital admissions and provide early
clues to detect the risk of agitation and infections, we
collected daily activity data and vital signs using in-home
sensory devices. The noise and redundant information in the
data lead to inaccurate predictions from traditional machine
learning algorithms. Furthermore, traditional machine
learning models cannot explain their predictions.
To address these issues, we proposed a model that can not
only outperform the traditional machine learning methods but
also provide an explanation of the predictions. The proposed
rationalising block, which is based on the rational and
attention mechanisms, can process healthcare time-series data
by filtering the redundant and less informative information.
Furthermore, the filtered data can be regarded as the important
information to support clinical treatment. We also demonstrate
that focal loss can help to improve the performance on the
imbalanced clinical dataset and attention-based models can
be used effectively in healthcare data analysis. The evaluation
shows the effectiveness of the model in a real-world clinical
dataset and describes how it is used to support people with
dementia.
ACKNOWLEDGMENT
This research is funded by the UK Medical Research Coun-
cil (MRC), Alzheimer’s Society and Alzheimer’s Research UK
and supported by the UK Dementia Research Institute.
REFERENCES
[1] “World health organization. dementia: a public health priority,” https:
//www.who.int/mental_health/publications/dementia_report_2012/en/,
2012.
[2] “Alzheimer’s society: dementia support and research charity,” https://
www.alzheimers.org.uk/.
[3] G. Livingston, J. Huntley, A. Sommerlad, D. Ames, C. Ballard, S. Baner-
jee, C. Brayne, A. Burns, J. Cohen-Mansfield, C. Cooper et al.,
“Dementia prevention, intervention, and care: 2020 report of the lancet
commission,” The Lancet, vol. 396, no. 10248, pp. 413–446, 2020.
[4] J. Pickett, C. Bird, C. Ballard, S. Banerjee, C. Brayne, K. Cowan,
L. Clare, A. Comas-Herrera, L. Corner, S. Daley et al., “A roadmap
to advance dementia research in prevention, diagnosis, intervention, and
care by 2025,” International journal of geriatric psychiatry, vol. 33,
no. 7, pp. 900–906, 2018.
[5] A. Feast, M. Orrell, G. Charlesworth, N. Melunsky, F. Poland, and
E. Moniz-Cook, “Behavioural and psychological symptoms in dementia
and the challenges for family carers: systematic review,” The British
Journal of Psychiatry, vol. 208, no. 5, pp. 429–434, 2016.
[6] G. T. Buhr, M. Kuchibhatla, and E. C. Clipp, “Caregivers’ reasons for
nursing home placement: clues for improving discussions with families
prior to the transition,” The Gerontologist, vol. 46, no. 1, pp. 52–61,
2006.
[7] B. C. Peach, G. J. Garvan, C. S. Garvan, and J. P. Cimiotti, “Risk
factors for urosepsis in older adults: a systematic review,” Gerontology
and geriatric medicine, vol. 2, p. 2333721416638980, 2016.
[8] S. Tal, V. Guller, S. Levi, R. Bardenstein, D. Berger, I. Gurevich, and
A. Gurevich, “Profile and prognosis of febrile elderly patients with
bacteremic urinary tract infection,” Journal of Infection, vol. 50, no. 4,
pp. 296–305, 2005.
[9] C. Fogg, P. Griffiths, P. Meredith, and J. Bridges, “Hospital outcomes
of older people with cognitive impairment: An integrative review,”
International journal of geriatric psychiatry, vol. 33, no. 9, pp. 1177–
1197, 2018.
[10] S. Enshaeifar, P. Barnaghi, S. Skillman, D. Sharp, R. Nilforooshan, and
H. Rostill, “A digital platform for remote healthcare monitoring,” in
Companion Proceedings of the Web Conference, 2020.
[11] S. Majumder, E. Aghayi, M. Noferesti, H. Memarzadeh-Tehran, T. Mon-
dal, Z. Pang, and M. J. Deen, “Smart homes for elderly health-
care—recent advances and research challenges,” Sensors, vol. 17, no. 11,
p. 2496, 2017.
[12] R. Turjamaa, A. Pehkonen, and M. Kangasniemi, “How smart homes
are used to support older people: An integrative review,” International
Journal of Older People Nursing, vol. 14, no. 4, p. e12260, 2019.
[13] K. K. Peetoom, M. A. Lexis, M. Joore, C. D. Dirksen, and L. P. De Witte,
“Literature review on monitoring technologies and their outcomes in
independently living elderly people,” Disability and Rehabilitation:
Assistive Technology, vol. 10, no. 4, pp. 271–294, 2015.
[14] R. Miotto, L. Li, B. A. Kidd, and J. T. Dudley, “Deep patient: an
unsupervised representation to predict the future of patients from the
electronic health records,” Scientific reports, vol. 6, no. 1, pp. 1–10,
2016.
[15] C. S. Ross-Innes, H. Chettouh, A. Achilleos, N. Galeano-Dalmau,
I. Debiram-Beecham, S. MacRae, P. Fessas, E. Walker, S. Varghese,
T. Evan et al., “Risk stratification of barrett’s oesophagus using a non-
endoscopic sampling method coupled with a biomarker panel: a cohort
study,The lancet Gastroenterology & hepatology, vol. 2, no. 1, pp.
23–31, 2017.
[16] Z. C. Lipton, D. C. Kale, C. Elkan, and R. Wetzel, “Learning to diagnose
with lstm recurrent neural networks,” arXiv preprint arXiv:1511.03677,
2015.
[17] C. Esteban, O. Staeck, S. Baier, Y. Yang, and V. Tresp, “Predicting
clinical events by combining static and dynamic information using
recurrent neural networks,” in 2016 IEEE International Conference on
Healthcare Informatics (ICHI). IEEE, 2016, pp. 93–101.
[18] E. Choi, M. T. Bahadori, A. Schuetz, W. F. Stewart, and J. Sun, “Doctor
ai: Predicting clinical events via recurrent neural networks,” in Machine
Learning for Healthcare Conference, 2016, pp. 301–318.
[19] I. M. Baytas, C. Xiao, X. Zhang, F. Wang, A. K. Jain, and J. Zhou,
“Patient subtyping via time-aware lstm networks,” in Proceedings of the
23rd ACM SIGKDD international conference on knowledge discovery
and data mining, 2017, pp. 65–74.
[20] H. Harutyunyan, H. Khachatrian, D. C. Kale, G. Ver Steeg, and
A. Galstyan, “Multitask learning and benchmarking with clinical time
series data,” Scientific data, vol. 6, no. 1, pp. 1–18, 2019.
[21] E. Choi, M. T. Bahadori, J. Sun, J. Kulas, A. Schuetz, and W. Stewart,
“Retain: An interpretable predictive model for healthcare using reverse
time attention mechanism,” in Advances in Neural Information Process-
ing Systems, 2016, pp. 3504–3512.
[22] F. Ma, R. Chitta, J. Zhou, Q. You, T. Sun, and J. Gao, “Dipole: Diagnosis
prediction in healthcare via attention-based bidirectional recurrent neural
networks,” in Proceedings of the 23rd ACM SIGKDD international
conference on knowledge discovery and data mining. ACM, 2017,
pp. 1903–1911.
[23] T. Bai, S. Zhang, B. L. Egleston, and S. Vucetic, “Interpretable rep-
resentation learning for healthcare via capturing disease progression
through time,” in Proceedings of the 24th ACM SIGKDD International
Conference on Knowledge Discovery & Data Mining. ACM, 2018, pp.
43–51.
[24] L. Ma, J. Gao, Y. Wang, C. Zhang, J. Wang, W. Ruan, W. Tang, X. Gao,
and X. Ma, “Adacare: Explainable clinical health status representation
learning via scale-adaptive feature extraction and recalibration,” arXiv
preprint arXiv:1911.12205, 2019.
[25] H. Song, D. Rajan, J. J. Thiagarajan, and A. Spanias, “Attend and
diagnose: Clinical time series analysis using attention models,” in Thirty-
second AAAI conference on artificial intelligence, 2018.
[26] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez,
Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” in Advances
in neural information processing systems, 2017, pp. 5998–6008.
[27] S. Hochreiter, Y. Bengio, P. Frasconi, J. Schmidhuber et al., “Gradient
flow in recurrent nets: the difficulty of learning long-term dependencies,”
2001.
[28] J. M. Johnson and T. M. Khoshgoftaar, “Survey on deep learning with
class imbalance,” Journal of Big Data, vol. 6, no. 1, p. 27, 2019.
[29] S. Jain and B. C. Wallace, “Attention is not explanation,” arXiv preprint
arXiv:1902.10186, 2019.
[30] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, “Smote:
synthetic minority over-sampling technique,” Journal of artificial intel-
ligence research, vol. 16, pp. 321–357, 2002.
[31] X.-Y. Liu, J. Wu, and Z.-H. Zhou, “Exploratory undersampling for
class-imbalance learning,” IEEE Transactions on Systems, Man, and
Cybernetics, Part B (Cybernetics), vol. 39, no. 2, pp. 539–550, 2008.
[32] B. Krawczyk, “Learning from imbalanced data: open challenges and
future directions,” Progress in Artificial Intelligence, vol. 5, no. 4, pp.
221–232, 2016.
[33] B. E. Lyons, D. Austin, A. Seelye, J. Petersen, J. Yeargers, T. Riley,
N. Sharma, N. Mattek, H. Dodge, K. Wild et al., “Corrigendum: Perva-
sive computing technologies to continuously assess alzheimer’s disease
progression and intervention efficacy,” Frontiers in aging neuroscience,
vol. 7, p. 232, 2015.
[34] A. Akl, B. Taati, and A. Mihailidis, “Autonomous unobtrusive detection
of mild cognitive impairment in older adults,” IEEE transactions on
biomedical engineering, vol. 62, no. 5, pp. 1383–1394, 2015.
[35] L. Schwickert, C. Becker, U. Lindemann, C. Maréchal, A. Bourke,
L. Chiari, J. Helbostad, W. Zijlstra, K. Aminian, C. Todd et al., “Fall
detection with body-worn sensors,” Zeitschrift für Gerontologie und
Geriatrie, vol. 46, no. 8, pp. 706–719, 2013.
[36] I. Lazarou, A. Karakostas, T. G. Stavropoulos, T. Tsompanidis, G. Med-
itskos, I. Kompatsiaris, and M. Tsolaki, “A novel and intelligent home
monitoring system for care support of elders with cognitive impairment,”
Journal of Alzheimer’s Disease, vol. 54, no. 4, pp. 1561–1591, 2016.
[37] A. Bankole, M. Anderson, T. Smith-Jackson, A. Knight, K. Oh, J. Brant-
ley, A. Barth, and J. Lach, “Validation of noninvasive body sensor
network technology in the detection of agitation in dementia,” American
Journal of Alzheimer’s Disease & Other Dementias®, vol. 27, no. 5, pp.
346–354, 2012.
[38] T. Fleiner, P. Haussermann, S. Mellone, and W. Zijlstra, “Sensor-based
assessment of mobility-related behavior in dementia: feasibility and
relevance in a hospital context,” International Psychogeriatrics, vol. 28,
no. 10, p. 1687, 2016.
[39] M. J. Rantz, M. Skubic, R. J. Koopman, L. Phillips, G. L. Alexander,
S. J. Miller, and R. D. Guevara, “Using sensor networks to detect
urinary tract infections in older adults,” in 2011 IEEE 13th International
Conference on e-Health Networking, Applications and Services. IEEE,
2011, pp. 142–149.
[40] S. Enshaeifar, A. Zoha, S. Skillman, A. Markides, S. T. Acton, T. El-
saleh, M. Kenny, H. Rostill, R. Nilforooshan, and P. Barnaghi, “Machine
learning methods for detecting urinary tract infection and analysing daily
living activities in people with dementia,” PloS one, vol. 14, no. 1, p.
e0209909, 2019.
[41] C. Wu and M. E. Thompson, “Stratified sampling and cluster sampling,”
in Sampling Theory and Practice. Springer, 2020, pp. 33–56.
[42] D. W. Aha and R. L. Bankert, “A comparative evaluation of sequential
feature selection algorithms,” in Learning from data. Springer, 1996,
pp. 199–206.
[43] T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár, “Focal loss
for dense object detection,” in Proceedings of the IEEE international
conference on computer vision, 2017, pp. 2980–2988.
[44] D. Bahdanau, K. Cho, and Y. Bengio, “Neural machine translation by
jointly learning to align and translate,” arXiv preprint arXiv:1409.0473,
2014.
[45] M. Usama, B. Ahmad, W. Xiao, M. S. Hossain, and G. Muhammad,
“Self-attention based recurrent convolutional neural network for disease
prediction using healthcare data,” Computer methods and programs in
biomedicine, vol. 190, p. 105191, 2020.
[46] A. Galassi, M. Lippi, and P. Torroni, “Attention, please! A critical
review of neural attention models in natural language processing,” arXiv
preprint arXiv:1902.02181, 2019.
[47] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image
recognition,” in Proceedings of the IEEE conference on computer vision
and pattern recognition, 2016, pp. 770–778.
[48] J. L. Ba, J. R. Kiros, and G. E. Hinton, “Layer normalization,” arXiv
preprint arXiv:1607.06450, 2016.
[49] S. Sperandei, “Understanding logistic regression analysis,” Biochemia
Medica, vol. 24, no. 1, pp. 12–18, 2014.
[50] F. A. Gers, J. Schmidhuber, and F. Cummins, “Learning to forget:
Continual prediction with LSTM,” 1999.
[51] M. H. Hassoun et al., Fundamentals of Artificial Neural Networks. MIT
Press, 1995.
[52] T. A. Lasko, J. C. Denny, and M. A. Levy, “Computational phenotype
discovery using unsupervised feature learning over noisy, sparse, and
irregular clinical data,” PloS one, vol. 8, no. 6, 2013.
[53] Z. Che, D. Kale, W. Li, M. T. Bahadori, and Y. Liu, “Deep computational
phenotyping,” in Proceedings of the 21th ACM SIGKDD International
Conference on Knowledge Discovery and Data Mining, 2015, pp. 507–
516.
[54] W. D. Heaven, “Google’s medical AI was super accurate in a lab. Real
life was a different story,” https://www.technologyreview.com/2020/04/27/1000658/google-medical-ai-accurate-lab-real-life-clinic-covid-diabetes-retina-disease/, April 2020.
[55] C. Duff et al., “Dementia: assessment, management and support for
people living with dementia and their carers,” 2018.
[56] E. Ijaopo, “Dementia-related agitation: a review of non-pharmacological
interventions and analysis of risks and benefits of pharmacotherapy,”
Translational psychiatry, vol. 7, no. 10, pp. e1250–e1250, 2017.
[57] M. Lutters and N. B. Vogt-Ferrier, “Antibiotic duration for treating
uncomplicated, symptomatic lower urinary tract infections in elderly
women,” Cochrane Database of Systematic Reviews, no. 3, 2008.
[58] M. Gharbi, J. H. Drysdale, H. Lishman, R. Goudie, M. Molokhia, A. P.
Johnson, A. H. Holmes, and P. Aylin, “Antibiotic management of urinary
tract infection in elderly patients in primary care and its association with
bloodstream infections and all cause mortality: population based cohort
study,” BMJ, vol. 364, 2019.