An attention model to analyse the risk of agitation
and urinary tract infections in people with dementia
Honglin Li §, Roonak Rezvani §, Magdalena Anita Kolanko §, David J. Sharp, Maitreyee Wairagkar, Ravi
Vaidyanathan, Ramin Nilforooshan, Payam Barnaghi
Abstract—Behavioural symptoms and urinary tract infections
(UTI) are among the most common problems faced by people
with dementia. One of the key challenges in the management
of these conditions is early detection and timely intervention in
order to reduce distress and avoid unplanned hospital admissions.
Using in-home sensing technologies and machine learning models
for sensor data integration and analysis provides opportunities
to detect and predict clinically significant events and changes
in health status. We have developed an integrated platform to
collect in-home sensor data and performed an observational study
to apply machine learning models for agitation and UTI risk
analysis. We collected a large dataset from 88 participants with
a mean age of 82 and a standard deviation of 6.5 (47 females
and 41 males) to evaluate a new deep learning model that utilises
attention and rational mechanisms. The proposed solution can
process a large volume of data over a period of time and extract
significant patterns in time-series data (i.e. attention) and use
the extracted features and patterns to train risk analysis models
(i.e. rational). The proposed model can explain the predictions
by indicating which time-steps and features are used in a long
series of time-series data. The model provides a recall of 91%
and precision of 83% in detecting the risk of agitation and
UTIs. This model can be used for early detection of conditions
as UTIs and the management of neuropsychiatric symptoms such
as agitation in association with initial treatment and early
intervention approaches. In our study we have developed a
set of clinical pathways for early interventions using the alerts
generated by the proposed model and a clinical monitoring team
has been set up to use the platform and respond to the alerts
according to the created intervention plans.
I. INTRODUCTION
DEMENTIA affects 850,000 people in the UK and over
50 million globally, and is set to become the developed
world’s largest socioeconomic healthcare burden over coming
decades [1], [2]. In the absence of any current treatment,
there is an urgent need to focus on reducing the effects of
symptoms and help to improve the quality of life and well-
being of those already affected [3]. The 2020 report of the
Lancet Commission on dementia prevention, treatment, and
care stresses the importance of individualised interventions
H. Li, M. A. Kolanko, D. J. Sharp and P. Barnaghi are with the Department of Brain
Sciences, Imperial College London, W12 0NN, United Kingdom.
R. Rezvani is with Centre for Vision, Speech and Signal Processing,
University of Surrey, Guildford, GU2 7XH, United Kingdom.
M. Wairagkar and R. Vaidyanathan are with Department of Mechanical
Engineering, Imperial College London, SW7 1AL, United Kingdom.
R. Nilforooshan is with Surrey and Borders NHS Foundation Trust,
Leatherhead, KT22 7AD, United Kingdom.
All authors are also with the Care Research and Technology Centre, The
UK Dementia Research Institute (UK DRI).
§ These authors contributed equally to this work.
Corresponding author: p.barnaghi@imperial.ac.uk
to address complex medical problems, multimorbidity and
neuropsychiatric symptoms in dementia, which lead to un-
necessary hospital admissions, faster functional decline, and
worse quality of life [4].
People with dementia have complex problems with symp-
toms in many domains. It is estimated that up to 90%
will develop behavioural and physical symptoms of dementia
(BPSD) over the course of their illness, with agitation being
one of the most common symptoms [5], and a frequent reason
for nursing home placement [6]. Furthermore, patients with
dementia often suffer from a number of co-morbid conditions
and have a higher frequency of medical problems such as falls,
incontinence, dehydration or urinary tract infection (UTI) - the
commonest bacterial infection in the older patient population,
and the commonest cause of sepsis in older adults [7] with
an associated in-hospital mortality of 33% in this age group
[8]. If not detected and treated early, both BPSD and medical
comorbidities frequently lead to emergency hospital admis-
sions in dementia patients. Alzheimer’s Research UK estimates
that 20% of hospital admissions in dementia patients are for
preventable conditions, such as urinary tract infections. Be-
sides significant costs, hospitalisation places dementia patients
at risk of serious complications, with longer hospital stays,
higher risk of iatrogenic complications, delayed discharge
and functional decline during admission, which contributes
to higher rates of transfer to residential care and in-patient
mortality [9]. Therefore, increased medical supervision, early
recognition of deterioration in health status and rapid treatment
are key to preventing unnecessary hospitalization for ’ambu-
latory’ conditions that could be treated outside of hospital,
such as UTIs. Furthermore, ongoing monitoring of people
with dementia allows immediate detection of behavioural
disturbances, enabling earlier psychosocial and environmental
interventions to reduce patients’ distress and prevent further
escalation and hospitalization.
However, monitoring and supporting individuals in an on-
going manner is a resource and cost-intensive task, often not
scalable to larger populations. Utilising remote monitoring
technologies with the help of caregivers can allow creating
practical and generalisable solutions. As part of the research
in the Care Research and Technology Centre at the UK De-
mentia Research Institute (UK DRI), we have been developing
and deploying in-home monitoring technologies to help and
support people affected by dementia. Our research has led to
the development of a digital platform that allows collecting
and integrating in-home observation and measurement data
using network-connected sensory devices [10]. In this paper,
arXiv:2101.07007v1 [cs.AI] 18 Jan 2021
we discuss how our in-home monitoring data and machine
learning algorithms are used to detect early symptoms of
agitation and UTI in people with dementia living in their own
homes.
Sensing technologies have been increasingly used to moni-
tor activities and movements of elderly patients living in their
own homes [11], [12], [13]. Interpreting this information, how-
ever, demands considerable human effort, which is not always
feasible. The use of analytical algorithms allows integration
and analysis of rich environmental and physiological data at
scale, enabling rapid detection of clinically significant events
and development of personalized, predictive and preventative
healthcare.
Deep learning models have been applied in a variety of
healthcare scenarios to identify the risk of various clinical con-
ditions or predict outcomes of treatment [14], [15]. Recently,
there have been several implementations of Recurrent Neural
Networks (RNNs) to create learning models for time-series
healthcare data analysis [16], [17], [18]. The behavioural and
physiological symptoms and patterns in long-term conditions
such as dementia appear in the data over a long period
of time and can fluctuate and change over the course of
disease. Machine learning models such as RNNs, however,
are not suitable for analysing long sequences of time-points.
To address the long sequence analysis issue in RNNs, other
methods such as Bidirectional RNN, LSTM and GRU have
been used [19], [20]. There have also been attempts to apply
attention mechanisms to clinical datasets [21], [22], [23], [24],
[25] to improve the performance of analysing imbalanced
and long-tail time-series data. A fundamental limitation in
these models is the adaptivity and generalisability. When
long-distance symptoms and patterns are related to a specific
condition, the generalisability and performance of the existing
models are limited. The long sequences of data points and
the changes in the ongoing conditions vary in patients, and
often there are no large labelled training samples to train the
models for all the variations. Deep learning models offer a
new opportunity to train models that can pay attention to
correlations and long-distance relations between the patterns
and sequences. However, existing off-the-shelf deep
learning models require large training samples.
While applying neural networks to clinical data, there are
two main challenges: 1) selecting the important timesteps and
features from long sequences of data to create generalisable
models; and 2) imbalance in datasets. Neural networks are
very effective in finding a trend in datasets. Models such
as Recurrent Networks use the positions of the input and
output sequences to generate a sequence of hidden states. This
is computationally expensive and limits the computation of the
global dependencies [26]. In these models, the computational
complexity to relate input or output positions also grows as the
distance between positions increases. The latter makes it very
challenging to learn dependencies and correlations between
long-distance patterns and time points [27].
Additionally, clinical datasets are often imbalanced, with
content spanning ensembles of heterogeneous data. Most of
the clinical datasets contain more normal cases (i.e. negative
samples) than abnormal data points (i.e. positive samples). In
our dataset, which includes a large set of in-home environ-
mental and physiological data from people with dementia, the
number of positive cases for infections is much smaller than
the number of negative cases. In large parts of the data, the true
status of the infection is unknown (i.e. the data is partially
labelled due to the limitations in accessing the patients’ clinical
records or knowing the presence of any infections without
a test). This issue causes the learning models to exhibit a
bias towards the majority class. It may ignore the minority
class or make a decision based on a partial set which is not
a broad representation of the cases [28]. There have been
several works on implementing attention mechanisms [26] to
improve the generalisability of learning models in analysing
time-series data. However, Jian et al. [29] found that there are
limitations in the weights generated by attention-based models
which can lead to wrong predictions. Hence, we need to be
more cautious in using the attention mechanisms and their
explanations in designing deep learning models. While the
attention-based models are promising in healthcare time-series
data analysis, considering the time and feature dependencies
of the predictions poses a challenge for this type of model.
Over-sampling, which augments the data by generating syn-
thetic samples [30], and down-sampling, which prunes the samples
in the majority classes, are among the typical methods used
to deal with the imbalance issues in datasets [31]. How-
ever, samples in clinical data and variations in the real-data are
important aspects of the observations and measurements that
may not be present in augmented data generated by sampling
methods. It is crucial to find an efficient way to address the
imbalance issue without modifying or reducing the original
data in pre-processing steps [32].
Our goal is to propose a model to address the challenges
mentioned above. To support the clinical treatment and adapt
to the real-world sensory data readings, the model should
filter the redundant and less informative data. Furthermore,
the model should explain its predictions by indicating which time
periods and sensors are important in producing them. Last
but not least, the model should adapt to the imbalanced data.
II. DESIGN, SETTING AND PARTICIPANTS
Real-time, continuous measurement methodologies enabled
by the recent advances in pervasive computing and ‘smart-
home’ technologies provide opportunities to monitor the be-
haviour and health status of elderly people using wearable
technology or environmental sensors [11], [12], [13].
Computer-derived algorithms have been developed to anal-
yse sensor data and identify patterns of activity over time.
These can be applied to detect changes in activities of daily
living in order to predict disease progression and cognitive
decline. For instance, the ORCATECH group used a continuous in-
home monitoring system and pervasive computing technolo-
gies to track activities and behaviours such as sleep, computer
use and medication adherence to capture changes in cognitive
status [33]. They also demonstrated the ability of machine
learning algorithms to autonomously detect mild cognitive
impairment in older adults [34]. Machine learning models
have also been used to detect clinically significant events
Fig. 1: An overview of the proposed solution for healthcare data analysis. The data is encoded by positional encoding before passing to the
model. The proposed rationalising block extracts important information and passes it to the higher layers. The rationalising block contains a
rational layer to extract important time steps. A Long-Short Term Memory (LSTM) model processes the extracted data. The attention layer
then attends to the suitable features. The rationalising block first extracts the important time steps and then places different emphasis on the
features of the pruned data. The resulting representation is then used to make a prediction. All the layers are trained simultaneously.
and changes in health status. Much of the previous work
focused on detection and prediction of falls using wearable
accelerometers or other motion detectors [35], as well as
tracking behavioural symptoms such as sleep disturbances
[36], agitation [37], and wandering [38] in elderly patients.
However, there is limited research on the use of machine
learning models for detection of health changes such as infec-
tion in the context of smart-homes. An early supervised UTI
detection model has been described using in-home PIR sensors
[39]; however, it relied on the activity labels and annotations in
the training dataset, which are extremely time-consuming to
produce, and it does not generalise to real-world situations with
large amounts of unlabelled data collected from uncontrolled environments.
We have previously proposed an unsupervised technique that
could learn an individual’s movement patterns directly from the
unlabelled PIR sensor data [40].
Furthermore, the existing research and the data-driven solu-
tions are applied to small-scale pilot studies and do not
provide evidence for scalability and generalisability. They are
also limited in analysing long-term patterns and correlations
that appear in the data. Attention-based models which can
overcome these problems have never been applied to sensor
data for detecting clinically significant events or changes in
health status in dementia patients.
This is the first study to use deep learning and attention-based
methods to perform risk analysis for behavioural symptoms
and health conditions such as UTIs in people living with
dementia. The proposed model improves the accuracy and
generalisability of machine learning models that use imbal-
anced and noisy in-home sensory data for the risk analysis.
An analysis of the suitability of the digital markers and the
use of in-home sensory data is explored in an ablation study.
The proposed model is compared with several baseline models
and state-of-the-art methods. The proposed approach has been
evaluated in an observational clinical study. Participants (n=88,
age=81 +/- 6.5) were recruited for a six-month trial period.
The proposed solution provides a recall of 91% and precision
of 83% in detecting the risk of agitation and UTIs. We have
also set up a framework and a clinical response team that use
the risk alerts generated by the models for ongoing support and
management of the conditions in people living with dementia.
Using high-resolution in-home observation and measure-
ment data in association with advanced machine learning
methods leads to early and timely interventions and has a
significant impact on reducing preventable and unplanned
hospital admissions in people affected by dementia. A key
challenge in using analytical and predictive models for risk
analysis is identifying and collecting digital markers data using
in-home sensory devices. The capacity of the proposed model
to address time-series feature identification and data imbalance
enables its use in a wide range of healthcare and risk analysis
applications using in-home digital markers.
III. METHOD
We introduce a model that can identify the important
time steps and features and utilise long-distance dependencies
to make better predictions. The proposed model provides a
prediction based on the selected time points and the selected
features from the raw observation and measurement data.
Figure 1 shows how the data changes during the processing.
The model selects important time steps through a pruning
process. After pruning the data, it pays attention to different
features and uses them to make the predictions. Different
from methods such as clustering sampling [41], we select the
Fig. 2: Visualisation of the sensor readings. The x-axis represents the time of the day for activation of the sensors. The y-axis represents
the days over a period of 8 months for a patient. Each colour represents a type of environmental activity sensor. Similar colours along the
y-axis represent similar patterns of activities around the same time on consecutive days. Greater distortion/merging of colours along
the y-axis represents more changes in the pattern of activity over time.
important time steps of each sample instead of selecting a
portion of samples for training. In contrast to statistical feature
selection methods such as sequential feature selection [42], the
proposed model selects important time steps adaptively for each
input sample. We use focal loss [43] to assign priority to the minority
class without generating synthetic samples.
Fig. 3: A heat-map of the aggregation of the raw data. The readings
are aggregated per hour within each day.
Data sources and pre-processing
We have collected the data as part of an observational
clinical study in people living with dementia from December
2018 to April 2020. Each of the participants had a
confirmed diagnosis of dementia (mild to severe) for at
least three months prior to recruitment and had been
stable on dementia medication. The collected data contains
continuous environmental sensor data from houses of patients
with dementia who live in the UK. The sensors include Passive
Infra-Red (PIR) motion sensors, smart power plugs and door sensors,
produced by Develco in Aarhus, Denmark. The sensors were installed
in the bathroom, hallway, bedroom, living room (or lounge)
and kitchen in the homes and also on the fridge door, kettle
and microwave (or toaster). The sensors also include network-
connected physiological monitoring devices that are used for
submitting daily measurements of vital signs, weight and
hydration. The data is integrated into a digital platform that we
have developed in our past research [10], designed in collaboration
with clinicians and user groups to support people with dementia.
A clinical monitoring team, set up as part of our observational
study, has used the platform to annotate the data daily and
verify the risk analysis alerts. Based
on the annotations, we select four incidents including agitation,
Urinary Tract Infection (UTI), abnormal blood pressure and
abnormal body temperature to assign binary labels to our data. More
specifically, a label is set to true when the abnormal incident
is verified by the monitoring team, and false otherwise. We then
use the environmental data to infer whether any incident
occurred within one day. Fig 2 shows an example of collected
data. To pre-process the data, we aggregate the readings of the
sensors within each hour of the day, shown in Fig 3. Appendix
1 shows a list of potential digital markers and sensory data that
can be used in dementia care. In the appendix, we also show a
screenshot of the platform that is used for collecting the data.
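The hourly aggregation step described above can be sketched with pandas; the event-log schema here (a `time` stamp and a `sensor` label per activation) is illustrative only, not the study's actual data format.

```python
import pandas as pd

# Hypothetical event log: one row per sensor activation.
events = pd.DataFrame({
    "time": pd.to_datetime([
        "2019-01-01 08:05", "2019-01-01 08:40",
        "2019-01-01 09:10", "2019-01-01 23:55",
    ]),
    "sensor": ["kitchen", "kitchen", "bathroom", "bedroom"],
})

# Count activations of each sensor within each hour, yielding the
# hour-by-sensor matrix described in the text (cf. Fig 3).
hourly = (
    events
    .assign(count=1)
    .pivot_table(index=events["time"].dt.floor("h"),
                 columns="sensor", values="count",
                 aggfunc="sum", fill_value=0)
)
print(hourly)
```

Missing hours can then be re-indexed to a full 24-slot grid so that every day yields a fixed-size input.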
Machine learning model
We aim to use the environmental sensors to predict possible
incidents and avoid delayed treatment. Furthermore, the model
should provide the reason, i.e. which time periods and
sensors are important in giving the predictions, to explain the
inference. In other words, the model can remove the redundant
or less informative information and use the rest of the data to
give the prediction, as shown in Fig 4.
Fig. 4: Selected time steps from the raw data. These time steps are
selected by the model. The model learns to identify time steps that
are more important in predicting the outcome.
As discussed earlier, in healthcare data analysis, often, the
predictions are based on a long sequence of data measured
and collected at different time-points. Accessing and feeding
more data helps to train more accurate models. However, more
information can also mean more noise in the data, and the
imbalance in the samples that are given to the model can also
lead to decision bias. An efficient model should be able to
process and utilise as much data as available. However, the
model should also avoid the common pitfalls of noise and bias.
To address these issues, we have studied the use of attention-
based models. This group of models will utilise all the avail-
able information and, in each sequence, will identify the time-
points that provide the most information to the training and
prediction. This attention and selection process is an embedded
step in the model. It will allow the model to be flexible and
generalisable for different sequences with variable lengths and
for a different combination of features and values that are
represented in the data. Before explaining our proposed model
and its contributions to creating a generalisable solution for
time-series healthcare data analysis, we provide an overview
of the related work. We discuss the use of attention-based
models in other domains and explain how the ideas presented
in the existing work have led to the design of our current model.
Fig. 5: After selecting the important time steps, the model learns
which sensors should be attended to. In this case, the model considers
the bathroom sensor to make the largest contribution to the prediction.
The attention mechanisms were introduced in Natural
Language Processing (NLP) by Bahdanau et al. [44]. The
attention-based models are widely used in NLP due to their
capability of detecting important parts of a sequence and
efficiently interpreting it. The attention-based models have also
been used in continuous healthcare and clinical data analysis
[45]. Continuous clinical data are multivariate time-series data
with temporal and sequential relationships. For each patient,
the data is a set of time steps, and each time step contains
medical features (X ∈ R^{t×d}). The REverse Time AttentIoN
model (RETAIN) is one of the first systems that used an
attention mechanism for medical data [21]. In this model,
there are two separate RNNs, one to generate the visit-level
attention weights (α) and the other one for variable-level (β)
attention weights. In this model, the most relevant time step
is the one associated with the largest value in α. Choi et al.
provided a method to find the most influential medical feature
[21]. However, RETAIN cannot handle long-distance depen-
dencies. To deal with this issue, Ma et al. proposed Dipole,
a predictive model for clinical data using Bidirectional RNNs
[22]. They have implemented the model using two different
attention mechanisms: General attention and Concatenation-
based attention. The results show that Concatenation-based
attention outperforms because of incorporating all the long-
distance dependencies.
In the above models, the input layer is simple, and the
data has the same pipeline, but in the Timeline model, Bai
et al. adapted the data pipeline [23]. They use an attention
layer to aggregate the medical features, and by modelling
each disease progression pattern, they find the most important
timesteps. To deal with long-distance dependencies, Timeline
implements Bidirectional LSTMs. One of the recent studies in
this area is AdaCare [24], which uses Gated Recurrent Units
(GRU). AdaCare utilises convolutional structure to extract
all the dependencies in the clinical data. AdaCare showed
promising results in the explainability of the model. The
models mentioned above have been developed based on re-
current networks. However, the sequential aspect of recurrent
models is computationally inefficient. The SAnD model was
developed solely based on the multi-head attention mechanism
[25]. Song et al. implemented positional encoding to include
the sequential order in the model.
The models mentioned above show significant improve-
ments in the accuracy and performance of predictive models
in the clinical field. However, incorporating both long-distance
dependencies and feature associations is a challenging task. In
the existing models, the analysis is either at the time-step level or
the feature level. In this paper, we propose a model to detect and
predict the risk of healthcare conditions by analysing long-
distance dependencies in the patterns and sequences of the
data. This information can be useful for clinical experts in
ongoing management of the conditions. The work also helps
to use an automated process to alert the risk of adverse health
conditions and explore the symptoms related to the detected
conditions.
Our proposed model consists of two main components, a
rationalising block and the classification block, as shown in
Figure 1. In a high-level overview, the rational layers select the
important time steps and pass them to an LSTM layer. The LSTM
layer ignores the trivial time steps and processes the data
for the attention block. The classifier uses these time points
for predictions. After processing by the attention block, the
model gives a prediction. The details of these blocks are
explained in the following sections.
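As a shape-level illustration only, the pipeline described above (prune time steps, summarise the sequence, attend, classify) can be sketched as follows; the weights are random placeholders, a running mean stands in for the LSTM, and the single-logit read-out is an assumption, not the trained model.

```python
import numpy as np

rng = np.random.default_rng(0)
k, f = 24, 8             # 24 hourly time steps, 8 sensor features (as in the study)
x = rng.random((k, f))   # one day of aggregated sensor counts (toy values)

# 1) Rationalising block: a binary mask z selects informative time steps.
z = (rng.random(k) > 0.5).astype(float)
x_pruned = x * z[:, None]          # unselected time steps become zero

# 2) Sequence model (stand-in for the LSTM): a simple running summary.
h = np.cumsum(x_pruned, axis=0) / np.arange(1, k + 1)[:, None]

# 3) Attention over the summaries, then a linear read-out "classifier".
scores = h @ h[-1]                 # similarity of each step to the last
weights = np.exp(scores - scores.max())
weights /= weights.sum()           # softmax attention weights
context = weights @ h              # (f,) attended feature vector
prob = 1.0 / (1.0 + np.exp(-context.sum()))  # placeholder risk score
```

In the actual model the mask, the LSTM and the attention block are all trained jointly, as noted in the Fig. 1 caption.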
Positional Encoding
To use the order of sequence in the analysis, we add
positional encoding (PE) before passing the data into the
model. We use the sine and cosine positional encoding [26].
Shown in Equation 1, where pos is the position of the time
step, i is the index of the sensor, and d is the dimension of each
time step:

PE(pos, 2i)   = sin(pos / 10000^(2i/d))
PE(pos, 2i+1) = cos(pos / 10000^(2i/d))        (1)
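Equation 1 can be implemented directly; the sketch below assumes an even feature dimension d, with sines in the even columns and cosines in the odd ones.

```python
import numpy as np

def positional_encoding(k, d):
    """Sine/cosine positional encoding of Equation (1): pos indexes the
    time step and i the (even/odd paired) feature dimension."""
    pos = np.arange(k)[:, None]           # (k, 1) time-step positions
    i = np.arange(d // 2)[None, :]        # (1, d/2) dimension pairs
    angle = pos / np.power(10000.0, 2 * i / d)
    pe = np.zeros((k, d))
    pe[:, 0::2] = np.sin(angle)           # even dimensions
    pe[:, 1::2] = np.cos(angle)           # odd dimensions
    return pe

pe = positional_encoding(24, 8)           # 24 hourly steps, 8 sensor features
```

The encoding is added to the input before the rationalising block so that pruning time steps does not discard their original temporal order.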
Rationalising Prediction
To add more focus on the time steps in the data that are
more relevant to the predictions, the generator produces a
binary mask to select or ignore specific time points. For
example, x ∈ R^{k×f} contains k time points and f features for
each time point; the generator will produce a binary vector
z = {z_1, z_2, ..., z_k}. The i-th variable z_i ∈ {0, 1} indicates
whether the i-th time point in x is selected or not.
Whether the i-th time point is selected or not is a conditional
probability given the input x. We assume that the selection
of each time point is independent. The generator uses a
probability distribution over z, which factorises as a joint
probability of the selections. The joint probability is given
by:

p(z|x) = ∏_{i=1}^{k} p(z_i|x)        (2)
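Under the independence assumption of Equation 2, sampling a mask and evaluating the factorised joint can be sketched as below; the per-time-step probabilities are toy values, not generator outputs.

```python
import numpy as np

rng = np.random.default_rng(42)

def sample_mask(probs, rng):
    """Sample z ~ p(z|x): each z_i is an independent Bernoulli draw
    with selection probability probs[i]."""
    return (rng.random(probs.shape) < probs).astype(int)

def log_joint(z, probs):
    """log p(z|x) = sum_i log p(z_i|x), the factorised joint of Eq. (2)."""
    p = np.where(z == 1, probs, 1.0 - probs)
    return np.log(p).sum()

probs = np.array([0.9, 0.1, 0.8, 0.5])   # toy per-time-step probabilities
z = sample_mask(probs, rng)              # e.g. selects mostly steps 0 and 2
```

The log-joint is what a REINFORCE-style estimator would weight when training the generator through the discrete mask; the exact training procedure is described with the objective function below.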
Classifier
After exploring and selecting the most relevant time points,
we train a classifier to provide the predictions. The trained
classifier contains attention blocks and residual blocks.
Attention block is an application of self-attention mecha-
nism to detect the important features. The attention mecha-
nism detects important parts of a sequence. It has three key
components: the inputs structure, the compatibility function
and the distribution function [46].
There are three inputs in the structure: Keys (K ∈ R^{n_k×d_k}),
Values (V ∈ R^{n_v×d_v}) and Query (q ∈ R^{n_q}), where n is
the dimension of the inputs and k, v, q index the dimensions of
the outputs. They could have different or the same sources. If K
and q come from the same source, it is self-attention [26]. K and
V represent the input sequence, which could be either annotated or
raw data. q represents the reference sequence for computing the
attention weights. For combining and comparing the q and
K values, a compatibility function is used. A distribution
function computes the attention weights (a ∈ R^{d_k}) using the
output of the compatibility function (c ∈ R^{d_k}).
We obtain the attention by Equation 3. The Q, K, V are ma-
trices formed by queries, keys and values vectors, respectively.
Since we use self-attention, the Q, K, V are calculated from
the inputs with different weight matrices.

Attention(Q, K, V) = softmax(QK^T / √d_k) V        (3)
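A minimal NumPy sketch of the scaled dot-product attention in Equation 3 (toy shapes; in the self-attention case Q, K and V would all be projections of the same input):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention of Equation (3)."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # query-key compatibility
    return softmax(scores, axis=-1) @ V  # weighted sum of values

rng = np.random.default_rng(0)
Q = rng.random((4, 8))   # 4 queries of dimension 8
K = rng.random((6, 8))   # 6 keys of the same dimension
V = rng.random((6, 5))   # one 5-dimensional value per key
out = attention(Q, K, V)
```

Each output row is a convex combination of the value rows, which is what makes the attention weights directly interpretable as feature importances.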
The architecture of the attention block is the same as described
in [26]. We employ a residual connection [47] followed by a
normalisation layer [48] inside the attention block. Residual
blocks and the output layer process the output of the attention
block.
Objective function
The training samples in healthcare datasets are often imbal-
anced due to the low prevalence and sporadic occurrences. In
other words, some of the classes contain more samples than
others. For example, only 25% of the data we collected are
labelled as positive. More details of the dataset will be clarified
in the following section. To deal with the imbalance issue, we
use focal loss [43] as the objective function of the classifier,
shown in Equation 4:

L_c = −α (1 − p)^β log(p)        (4)

where α and β are hyper-parameters to balance the variant of
the focal loss, and p = f(x, z)·y + (1 − f(x, z))·(1 − y). Here f(x, z)
is the probability estimated by the classifier and y ∈ {0, 1} is
the label of x.
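The focal loss of Equation 4 can be sketched as below; the α and β defaults are illustrative, not the study's tuned hyper-parameters.

```python
import numpy as np

def focal_loss(f_xz, y, alpha=0.25, beta=2.0):
    """Focal loss of Equation (4): p is the probability the classifier
    assigns to the true class; (1 - p)^beta down-weights easy examples
    so that the minority class dominates the gradient."""
    p = f_xz * y + (1.0 - f_xz) * (1.0 - y)
    return -alpha * (1.0 - p) ** beta * np.log(p)

# A confidently correct prediction incurs almost no loss ...
easy = focal_loss(np.array(0.95), np.array(1))
# ... while a confident mistake is penalised heavily.
hard = focal_loss(np.array(0.05), np.array(1))
```

With β = 0 the expression reduces to α-weighted cross-entropy, which is the sense in which β controls the focus on hard, minority-class samples.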
In addition to the loss function used in the classifier, the
generator produces a short rational selection and calculates
the loss, shown in Equation 5, where λ is the parameter
to weight the selection:

L_g = λ ||z||        (5)
We then combine the focal loss and the loss from the generator
to construct the overall loss function, as shown in Equation 6:

L = Σ_{(x,y)∈D} E[L_c + L_g]        (6)
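Equations 5 and 6 combine per sample as sketched below; λ matches the sparsity value of 0.001 used in the experiments, and approximating the expectation with a single sampled mask is a simplification for illustration.

```python
import numpy as np

def generator_loss(z, lam=0.001):
    """Sparsity penalty of Equation (5): lam * ||z|| encourages short
    rationales, i.e. few selected time steps."""
    return lam * np.abs(z).sum()

def total_loss(f_xz, y, z, alpha=0.25, beta=2.0, lam=0.001):
    """Single-sample term of Equation (6): focal classification loss
    plus the generator's sparsity penalty."""
    p = f_xz * y + (1.0 - f_xz) * (1.0 - y)
    l_c = -alpha * (1.0 - p) ** beta * np.log(p)
    return l_c + generator_loss(z, lam)
```

Because z is binary, ||z|| simply counts the selected time steps, so the penalty trades rationale length against classification accuracy.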
IV. RESULTS
Evaluation Metrics: To evaluate our proposed method and
compare it with the baseline models, we calculated different
metrics. One of the primary metrics to assess the model is
accuracy, which measures how close the predicted
class is to the actual class. However, accuracy alone cannot be
a good measure to evaluate the performance of a classifier.
As a result, we also calculated the Area Under the Curve
of the Receiver Operating Characteristic (ROC) and Precision-
Recall (PR) curves. The precision of class A is the proportion of
samples predicted as class A that are correct, and the recall is the
proportion of samples truly in class A that have been detected. The ROC
curve is the measure of model capability in differentiating
between classes. We do not report the results in terms of
specificity and sensitivity. The reason is that in this study, we
do not have access to the full electronic healthcare records
(a) PR (b) ROC (c) Loss
Fig. 6: Evaluation of the proposed methods using the in-home sensory dataset. (a) shows the precision-recall curve; (b) shows the Receiver Operating
Characteristics (ROC) curve and (c) shows the changes to the loss during training. In (a) and (b) the results of the proposed model are
also compared with a set of baseline models.
(a) PR (b) ROC (c) Selection Rate changes
Fig. 7: An ablation study to evaluate the model; (a) shows the precision-recall curve; (b) shows the Receiver Operating Characteristics (ROC) curve
and (c) shows the changes to the selection rate. In (a) and (b) the results are obtained by eliminating different components from the model.
and hospital admission data of all the participants. Reporting
the specificity and sensitivity based only on the detected and
evaluated labels in our dataset, which are only a sub-set
of the true and false cases for the cohort, could be misleading in
terms of an actual and generalisable clinical finding. Instead,
we have opted to evaluate the precision and generalisability
of the prediction algorithm based on the existing labelled
data and the known cases that we could evaluate and verify
the performance of the model.
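The precision and recall definitions above can be stated concretely as a small helper (a generic NumPy sketch, not the evaluation code used in the study):

```python
import numpy as np

def precision_recall(y_true, y_pred, positive=1):
    # Precision: of the samples predicted as the positive class, the
    # fraction that are correct. Recall: of the true positive-class
    # samples, the fraction that were detected.
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == positive) & (y_true == positive))
    fp = np.sum((y_pred == positive) & (y_true != positive))
    fn = np.sum((y_pred != positive) & (y_true == positive))
    return tp / (tp + fp), tp / (tp + fn)
```

Sweeping a decision threshold and recomputing these quantities traces out the PR curve; the ROC curve is obtained analogously from the true-positive and false-positive rates.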
Baseline Models: We compare our model with Logistic Regression (LR) [49], Long Short-Term Memory (LSTM) neural networks [50] and a fully connected Neural Network (NN) model [51].
LR is a discriminative model which can avoid confounding effects by analysing the association of all variables together [49]. It is also a commonly used baseline for evaluating the performance of proposed models [20].
NN models have the ability to learn complex relationships. Unlike LR, an NN does not need to assume that the variables are linearly separable. NNs have also been applied to a variety of clinical datasets [52], [53]. In the experiment, we used a Neural Network with one hidden layer containing 200 neurons, a softmax output layer containing two neurons, cross-entropy loss and the Adam optimiser.
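A minimal NumPy sketch of this baseline, assuming the 24 x 8 input is flattened into a 192-dimensional vector before the fully connected layers (the initialisation and ReLU activation are our assumptions; training with cross-entropy and Adam is omitted):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

class BaselineNN:
    """Fully connected baseline: one hidden layer of 200 neurons and a
    softmax output layer with two neurons, as described in the text."""
    def __init__(self, n_features, hidden=200, n_classes=2, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.1, (n_features, hidden))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(0.0, 0.1, (hidden, n_classes))
        self.b2 = np.zeros(n_classes)

    def forward(self, x):
        h = np.maximum(0.0, x @ self.W1 + self.b1)   # ReLU hidden layer
        return softmax(h @ self.W2 + self.b2)         # class probabilities
```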
LSTM is a powerful neural network for analysing sequential data, including time-series clinical datasets [18], [19]. It can associate relevant inputs even if they are widely separated. Since our dataset consists of time-series sequences, we take the LSTM as another baseline model. In the experiment, we used a model containing one residual block, one LSTM layer with 128 neurons, a softmax output layer with two neurons, cross-entropy loss and the Adam optimiser.
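For illustration, the recurrence of a single 128-unit LSTM cell over one 24-step, eight-feature sequence can be written as follows (a minimal NumPy forward pass; the weight layout and initialisation are our assumptions, and the residual block, softmax head and training loop are omitted):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class LSTMCell:
    """Minimal LSTM cell with 128 hidden units, matching the baseline size."""
    def __init__(self, n_in, n_hidden=128, seed=0):
        rng = np.random.default_rng(seed)
        # Stacked weights for the input, forget, cell and output gates.
        self.W = rng.normal(0.0, 0.1, (n_in + n_hidden, 4 * n_hidden))
        self.b = np.zeros(4 * n_hidden)
        self.n_hidden = n_hidden

    def run(self, x_seq):
        h = np.zeros(self.n_hidden)
        c = np.zeros(self.n_hidden)
        for x_t in x_seq:                      # one step per hourly time point
            z = np.concatenate([x_t, h]) @ self.W + self.b
            i, f, g, o = np.split(z, 4)
            c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
            h = sigmoid(o) * np.tanh(c)
        return h                               # final state fed to the classifier
```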
In the experiments, we aggregate the readings of each sensor per hour. Hence each data point contains 24 time points and eight features. We set the batch size to 32, the learning rate to 0.0001 and the sparsity to 0.001. We divide the data into a train set and a test set, containing 209 and 103 cases respectively, together with their associated time-series data. The data is anonymised, and only anonymised data without any personally identifiable information is used in this research.
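The hourly aggregation can be illustrated with a toy example (the minute-level Poisson counts and the number of sensors are placeholders rather than the study's data):

```python
import numpy as np

# One day of minute-level counts from eight (hypothetical) sensors.
rng = np.random.default_rng(0)
minute_counts = rng.poisson(0.2, size=(24 * 60, 8))

# Sum the 1440 minute rows within 24 hourly bins: each data point then
# contains 24 time points and eight features, as described above.
day = minute_counts.reshape(24, 60, 8).sum(axis=1)
```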
Experiments: The ROC and PR changes during training are shown in the first two graphs of Figure 6. Overall, the proposed model outperforms the baseline methods. The LSTM performs well in dealing with the time-series data. Compared to the other methods, the neural network converges much faster. However, the performance of the model fluctuates around 30 epochs. The convergence and the fluctuation are due to the rational process: the model has to learn how to extract important time steps and pay attention to the features. This process is also reflected in Figure 6c, where the loss fluctuates during that period. However, the model adjusts for this fluctuation automatically and improves its performance. The overall results are also summarised in Table I.
V. DISCUSSION
Ablation Study: We begin the discussion with an ablation study. Our model contains five important components: rational layers, attention layers, residual layers, focal loss and positional encoding.
TABLE I: The evaluation results in comparison with a set of baseline models: Logistic Regression (LR), Long Short-Term Memory (LSTM) neural networks and a fully connected Neural Network (NN) model. Since the dataset is imbalanced, we calculated the Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC) and Precision-Recall (PR) curves to evaluate the performance.
            LR      LSTM    NN      Proposed method
AUC - PR    0.3472  0.6901  0.5814  0.8313
AUC - ROC   0.5919  0.7644  0.7601  0.9131
We omit each component one at
a time and explore how removing each component impacts the performance of the model. The experiments are shown in the first two graphs of Figure 7. The orange line represents the model without the rational layer. Although the performance of the model without the rational layer keeps increasing, it significantly underperforms the others. In other words, the rational layer plays an important role in the model. Removing the positional encoding, attention layer, residual layer or the focal loss decreases the performance as well. The performance changes caused by omitting each of these four components are quite similar. As shown in Figure 7, the positional encoding helps the model to identify relevant patterns in the data over time and plays an important role in the performance of the model. The change in the rate of selected time steps is shown in Figure 7c.
Rationalising prediction: The rational component helps to increase the accuracy of the model. More generally, the proposed rationalising method shows which time steps and features the model uses to make its predictions. These patterns and time steps can also be explored to identify and observe data and symptoms relevant to a condition in each patient. Using this component, a personalised set of patterns and symptoms can be explored for each patient. The last graph in Figure 7 shows how the selection rate changes during the training phase. The model learns to extract the time steps, and the accuracy increases after the changes become stable. As mentioned in the ablation study, after learning to extract the important time steps, the proposed model outperforms the baseline models without rational mechanisms. In other words, the model extracts a subset of the time steps (e.g. part of the time steps are extracted from Figure 3 to Figure 4) to obtain a better prediction. As the learning process continues, the model tries different selections and finds the optimal selection rate. Compared to the other models, the performance of the proposed model does not decrease during training. The model learns to pay attention to the most relevant segments of the data and to consider long-distance dependencies in the time-series data. In summary, the proposed model can not only explain its predictions but also automatically discard redundant information in the data. According to our experiments, the proposed model on average selects 61% of the time points in the datasets to make its predictions.
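The reported selection rate is simply the average fraction of time points kept by the rational layer's binary mask; a toy sketch (the masks below are random placeholders, not model outputs):

```python
import numpy as np

def selection_rate(z):
    # z: binary selection masks of shape (n_samples, n_timesteps),
    # where 1 means the time step is kept by the rational layer.
    return float(np.mean(z))

# Toy masks over 24 hourly time steps for a batch of 100 samples.
masks = (np.random.default_rng(1).random((100, 24)) < 0.6).astype(int)
rate = selection_rate(masks)   # fraction of time points retained
```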
Paired analysis: We then analyse how the rational block processes positive and negative samples. As shown in Figure 8, the rational block assigns weights to the positive and negative samples differently. More specifically, the model has learnt to extract a different number and sequence of time steps based on the inputs. In this case, the model extracts more time steps for the positive case than for the negative case. Furthermore, the model pays attention differently based on the input data. In the example above, the model treats the bathroom as the most important sensor in the positive sample. However, the model treats the bathroom and kettle as almost equally important sensors when predicting the negative case. After the model pays attention to the sensors at the selected time steps, the classifier makes the correct predictions.
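This behaviour can be illustrated with toy attention weights over the eight sensors (the sensor names and the weights below are invented for illustration and are not taken from the study data):

```python
import numpy as np

sensors = ["bathroom", "bedroom", "kitchen", "kettle",
           "fridge", "hallway", "lounge", "front door"]

# Illustrative per-sensor attention weights for one positive and one
# negative sample (each row sums to 1).
attn_pos = np.array([0.40, 0.05, 0.10, 0.15, 0.05, 0.10, 0.10, 0.05])
attn_neg = np.array([0.25, 0.05, 0.10, 0.24, 0.06, 0.10, 0.10, 0.10])

top_pos = sensors[int(np.argmax(attn_pos))]              # dominant sensor
top_neg = [s for _, s in sorted(zip(attn_neg, sensors), reverse=True)[:2]]
```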
Translating machine learning research into clinical practice: Improving the quality of life by preventing illness-related symptoms and the negative consequences of dementia has been set out as a major goal for advancing dementia care. Agitation and infections have been highlighted as priority areas for development [6]. Our proposed model directly addresses these priorities in dementia care and intervention by enabling early detection of agitation and urinary tract infections in remote healthcare monitoring scenarios, providing an opportunity to deliver more personalised, predictive and preventative healthcare. When applied to a real-world clinical dataset in the context of the current clinical study, our proposed algorithm provided a recall of 91% and a precision of 83% in detecting early signs of agitation and UTI from physiological and environmental sensor data. A clinical monitoring team verified the predictions by contacting the patient or carer when an agitation or UTI alert was generated. A set of clinical pathways for early interventions has also been developed for the clinical monitoring team to use when responding to the alerts.
Relevance to patient outcomes: We would like to highlight an important aspect of using this type of analysis to evaluate healthcare and patient outcomes. Focusing only on accuracy as a metric for assessing the solution within a specific cohort goes only so far [54]. Large studies and further experiments with different cohorts and various in-home deployment settings are required to assess how such algorithms perform in noisy and dynamic real-world environments. There are several examples of AI and machine learning algorithms that perform very well in controlled and laboratory settings but behave differently in the real world [54]. In this study, the sensors and data collection operate in an uncontrolled, real-world environment. We have carried out several cross-validation, comparison and ablation studies to avoid overfitting the model and to make sure the results are robust and reproducible. However, further independent trials and validation studies with larger cohorts are required to transform the current work into a product that can be used in real-world clinical and care settings. Another important consideration is that focusing only on the accuracy of the algorithm will not give a complete picture of the real effectiveness and impact of the solution on patient outcomes.
Fig. 8: Visualisation of the outputs within the rational block. The top figure visualises a sample which is validated as a true incident. The bottom figure visualises a sample which is validated as a false incident.
Our agitation intervention protocol follows all current guidelines, which agree that individualised and person-centred non-pharmacological therapies are the first-line treatment for agitation in people with dementia [55], [56]. In line with the current
guidelines, the initial assessment explores possible reasons for patients’ distress and addresses clinical or environmental causes first. The clinical monitoring team asks a set of standardised questions to evaluate the symptoms and to help the carer identify potential causes of agitation such as pain, illness, discomfort, hunger, loneliness, boredom or environmental factors (temperature, light, noise level). The recognition and treatment of possible organic causes or triggering factors remains the mainstay of the intervention. In particular, detection of delirium and a possible underlying infection is of great importance, and the clinical monitoring team facilitates early diagnosis and treatment by liaising with the study’s clinical team and the patient’s GP. Finally, the clinical monitoring team provides psychological support for the caregivers in order to reduce caregiver distress. In the future, we plan to use multimodal sensor data to improve the classification of agitation state, which will include measuring sound levels along with activity detected by environmental sensors.
Similarly to the agitation protocol, in the case of a UTI alert the clinical monitoring team first responds by contacting the patient/carer to evaluate the symptoms. However, the diagnosis of UTI in dementia patients can be problematic, as these patients are less likely to present with a typical clinical history and localised urinary symptoms compared with younger patients [57]. The team, therefore, arranges a home visit to perform a dipstick urine test. If the urine dipstick test is suggestive of infection (positive nitrites or leukocytes), the clinical monitoring team advises the person with dementia/carer to visit the GP the same day to obtain a prescription for antibiotics. The monitoring team also informs the GP of the test results and requests that antibiotics be prescribed.
One potential criticism of our UTI intervention algorithm could be that it risks antibiotic over-prescribing, contributing to the spread of antibiotic resistance. However, recent evidence demonstrates that in elderly patients with a diagnosis of UTI in primary care, no antibiotics and delayed antibiotics are associated with a significant increase in bloodstream infection and all-cause mortality compared with immediate treatment [58]. Therefore, early prescription of antibiotics for this vulnerable group of older adults is advised in view of their increased susceptibility to sepsis after UTI, despite growing pressure to reduce inappropriate antibiotic use.
The impact of our in-home monitoring technologies and the embedded machine learning models on clinical outcomes, including hospitalisation, institutionalisation and mortality rates, is part of an ongoing study. Nevertheless, the current work demonstrates the effectiveness of the proposed algorithm and its translation into real-life clinical interventions. Fig. 8 illustrates individual cases of agitation and UTI correctly identified by the algorithm, with the digital markers demonstrating a behavioural anomaly.
VI. CONCLUSION
To avoid unplanned hospital admissions and provide early clues for detecting the risk of agitation and infections, we collected daily activity data and vital signs using in-home sensing devices. The noise and redundant information in the data lead to inaccurate predictions from traditional machine learning algorithms. Furthermore, traditional machine learning models cannot explain their predictions. To address these issues, we proposed a model that not only outperforms the traditional machine learning methods but also provides an explanation of its predictions. The proposed rationalising block, which is based on the rational and attention mechanisms, can process healthcare time-series data by filtering out redundant and less informative information. Furthermore, the filtered data can be regarded as important information to support clinical treatment. We also demonstrate that the focal loss can help to improve performance on imbalanced clinical datasets and that attention-based models can be used effectively in healthcare data analysis. The evaluation shows the effectiveness of the model on a real-world clinical dataset and describes how it is used to support people with dementia.
ACKNOWLEDGMENT
This research is funded by the UK Medical Research Coun-
cil (MRC), Alzheimer’s Society and Alzheimer’s Research UK
and supported by the UK Dementia Research Institute.
REFERENCES
[1] “World health organization. dementia: a public health priority,” https://www.who.int/mental_health/publications/dementia_report_2012/en/, 2012.
[2] “Alzheimer’s society: dementia support and research charity,” https://
www.alzheimers.org.uk/.
[3] G. Livingston, J. Huntley, A. Sommerlad, D. Ames, C. Ballard, S. Baner-
jee, C. Brayne, A. Burns, J. Cohen-Mansfield, C. Cooper et al.,
“Dementia prevention, intervention, and care: 2020 report of the lancet
commission,” The Lancet, vol. 396, no. 10248, pp. 413–446, 2020.
[4] J. Pickett, C. Bird, C. Ballard, S. Banerjee, C. Brayne, K. Cowan,
L. Clare, A. Comas-Herrera, L. Corner, S. Daley et al., “A roadmap
to advance dementia research in prevention, diagnosis, intervention, and
care by 2025,” International journal of geriatric psychiatry, vol. 33,
no. 7, pp. 900–906, 2018.
[5] A. Feast, M. Orrell, G. Charlesworth, N. Melunsky, F. Poland, and
E. Moniz-Cook, “Behavioural and psychological symptoms in dementia
and the challenges for family carers: systematic review,” The British
Journal of Psychiatry, vol. 208, no. 5, pp. 429–434, 2016.
[6] G. T. Buhr, M. Kuchibhatla, and E. C. Clipp, “Caregivers’ reasons for
nursing home placement: clues for improving discussions with families
prior to the transition,” The Gerontologist, vol. 46, no. 1, pp. 52–61,
2006.
[7] B. C. Peach, G. J. Garvan, C. S. Garvan, and J. P. Cimiotti, “Risk
factors for urosepsis in older adults: a systematic review,” Gerontology
and geriatric medicine, vol. 2, p. 2333721416638980, 2016.
[8] S. Tal, V. Guller, S. Levi, R. Bardenstein, D. Berger, I. Gurevich, and
A. Gurevich, “Profile and prognosis of febrile elderly patients with
bacteremic urinary tract infection,” Journal of Infection, vol. 50, no. 4,
pp. 296–305, 2005.
[9] C. Fogg, P. Griffiths, P. Meredith, and J. Bridges, “Hospital outcomes
of older people with cognitive impairment: An integrative review,”
International journal of geriatric psychiatry, vol. 33, no. 9, pp. 1177–
1197, 2018.
[10] S. Enshaeifar, P. Barnaghi, S. Skillman, D. Sharp, R. Nilforooshan, and
H. Rostill, “A digital platform for remote healthcare monitoring,” in
Companion Proceedings of the Web Conference, 2020.
[11] S. Majumder, E. Aghayi, M. Noferesti, H. Memarzadeh-Tehran, T. Mon-
dal, Z. Pang, and M. J. Deen, “Smart homes for elderly health-
care—recent advances and research challenges,” Sensors, vol. 17, no. 11,
p. 2496, 2017.
[12] R. Turjamaa, A. Pehkonen, and M. Kangasniemi, “How smart homes
are used to support older people: An integrative review,” International
Journal of Older People Nursing, vol. 14, no. 4, p. e12260, 2019.
[13] K. K. Peetoom, M. A. Lexis, M. Joore, C. D. Dirksen, and L. P. De Witte,
“Literature review on monitoring technologies and their outcomes in
independently living elderly people,” Disability and Rehabilitation:
Assistive Technology, vol. 10, no. 4, pp. 271–294, 2015.
[14] R. Miotto, L. Li, B. A. Kidd, and J. T. Dudley, “Deep patient: an
unsupervised representation to predict the future of patients from the
electronic health records,” Scientific reports, vol. 6, no. 1, pp. 1–10,
2016.
[15] C. S. Ross-Innes, H. Chettouh, A. Achilleos, N. Galeano-Dalmau,
I. Debiram-Beecham, S. MacRae, P. Fessas, E. Walker, S. Varghese,
T. Evan et al., “Risk stratification of barrett’s oesophagus using a non-
endoscopic sampling method coupled with a biomarker panel: a cohort
study,” The lancet Gastroenterology & hepatology, vol. 2, no. 1, pp.
23–31, 2017.
[16] Z. C. Lipton, D. C. Kale, C. Elkan, and R. Wetzel, “Learning to diagnose
with lstm recurrent neural networks,” arXiv preprint arXiv:1511.03677,
2015.
[17] C. Esteban, O. Staeck, S. Baier, Y. Yang, and V. Tresp, “Predicting
clinical events by combining static and dynamic information using
recurrent neural networks,” in 2016 IEEE International Conference on
Healthcare Informatics (ICHI). IEEE, 2016, pp. 93–101.
[18] E. Choi, M. T. Bahadori, A. Schuetz, W. F. Stewart, and J. Sun, “Doctor
ai: Predicting clinical events via recurrent neural networks,” in Machine
Learning for Healthcare Conference, 2016, pp. 301–318.
[19] I. M. Baytas, C. Xiao, X. Zhang, F. Wang, A. K. Jain, and J. Zhou,
“Patient subtyping via time-aware lstm networks,” in Proceedings of the
23rd ACM SIGKDD international conference on knowledge discovery
and data mining, 2017, pp. 65–74.
[20] H. Harutyunyan, H. Khachatrian, D. C. Kale, G. Ver Steeg, and
A. Galstyan, “Multitask learning and benchmarking with clinical time
series data,” Scientific data, vol. 6, no. 1, pp. 1–18, 2019.
[21] E. Choi, M. T. Bahadori, J. Sun, J. Kulas, A. Schuetz, and W. Stewart,
“Retain: An interpretable predictive model for healthcare using reverse
time attention mechanism,” in Advances in Neural Information Process-
ing Systems, 2016, pp. 3504–3512.
[22] F. Ma, R. Chitta, J. Zhou, Q. You, T. Sun, and J. Gao, “Dipole: Diagnosis
prediction in healthcare via attention-based bidirectional recurrent neural
networks,” in Proceedings of the 23rd ACM SIGKDD international
conference on knowledge discovery and data mining. ACM, 2017,
pp. 1903–1911.
[23] T. Bai, S. Zhang, B. L. Egleston, and S. Vucetic, “Interpretable rep-
resentation learning for healthcare via capturing disease progression
through time,” in Proceedings of the 24th ACM SIGKDD International
Conference on Knowledge Discovery & Data Mining. ACM, 2018, pp.
43–51.
[24] L. Ma, J. Gao, Y. Wang, C. Zhang, J. Wang, W. Ruan, W. Tang, X. Gao,
and X. Ma, “Adacare: Explainable clinical health status representation
learning via scale-adaptive feature extraction and recalibration,” arXiv
preprint arXiv:1911.12205, 2019.
[25] H. Song, D. Rajan, J. J. Thiagarajan, and A. Spanias, “Attend and
diagnose: Clinical time series analysis using attention models,” in Thirty-
second AAAI conference on artificial intelligence, 2018.
[26] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez,
Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” in Advances
in neural information processing systems, 2017, pp. 5998–6008.
[27] S. Hochreiter, Y. Bengio, P. Frasconi, J. Schmidhuber et al., “Gradient
flow in recurrent nets: the difficulty of learning long-term dependencies,”
2001.
[28] J. M. Johnson and T. M. Khoshgoftaar, “Survey on deep learning with
class imbalance,” Journal of Big Data, vol. 6, no. 1, p. 27, 2019.
[29] S. Jain and B. C. Wallace, “Attention is not explanation,” arXiv preprint
arXiv:1902.10186, 2019.
[30] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, “Smote:
synthetic minority over-sampling technique,” Journal of artificial intel-
ligence research, vol. 16, pp. 321–357, 2002.
[31] X.-Y. Liu, J. Wu, and Z.-H. Zhou, “Exploratory undersampling for
class-imbalance learning,” IEEE Transactions on Systems, Man, and
Cybernetics, Part B (Cybernetics), vol. 39, no. 2, pp. 539–550, 2008.
[32] B. Krawczyk, “Learning from imbalanced data: open challenges and
future directions,” Progress in Artificial Intelligence, vol. 5, no. 4, pp.
221–232, 2016.
[33] B. E. Lyons, D. Austin, A. Seelye, J. Petersen, J. Yeargers, T. Riley,
N. Sharma, N. Mattek, H. Dodge, K. Wild et al., “Corrigendum: Perva-
sive computing technologies to continuously assess alzheimer’s disease
progression and intervention efficacy,” Frontiers in aging neuroscience,
vol. 7, p. 232, 2015.
[34] A. Akl, B. Taati, and A. Mihailidis, “Autonomous unobtrusive detection
of mild cognitive impairment in older adults,” IEEE transactions on
biomedical engineering, vol. 62, no. 5, pp. 1383–1394, 2015.
[35] L. Schwickert, C. Becker, U. Lindemann, C. Maréchal, A. Bourke,
L. Chiari, J. Helbostad, W. Zijlstra, K. Aminian, C. Todd et al., “Fall
detection with body-worn sensors,” Zeitschrift für Gerontologie und
Geriatrie, vol. 46, no. 8, pp. 706–719, 2013.
[36] I. Lazarou, A. Karakostas, T. G. Stavropoulos, T. Tsompanidis, G. Med-
itskos, I. Kompatsiaris, and M. Tsolaki, “A novel and intelligent home
monitoring system for care support of elders with cognitive impairment,”
Journal of Alzheimer’s Disease, vol. 54, no. 4, pp. 1561–1591, 2016.
[37] A. Bankole, M. Anderson, T. Smith-Jackson, A. Knight, K. Oh, J. Brant-
ley, A. Barth, and J. Lach, “Validation of noninvasive body sensor
network technology in the detection of agitation in dementia,” American
Journal of Alzheimer’s Disease & Other Dementias®, vol. 27, no. 5, pp.
346–354, 2012.
[38] T. Fleiner, P. Haussermann, S. Mellone, and W. Zijlstra, “Sensor-based
assessment of mobility-related behavior in dementia: feasibility and
relevance in a hospital context,” International Psychogeriatrics, vol. 28,
no. 10, p. 1687, 2016.
[39] M. J. Rantz, M. Skubic, R. J. Koopman, L. Phillips, G. L. Alexander,
S. J. Miller, and R. D. Guevara, “Using sensor networks to detect
urinary tract infections in older adults,” in 2011 IEEE 13th International
Conference on e-Health Networking, Applications and Services. IEEE,
2011, pp. 142–149.
[40] S. Enshaeifar, A. Zoha, S. Skillman, A. Markides, S. T. Acton, T. El-
saleh, M. Kenny, H. Rostill, R. Nilforooshan, and P. Barnaghi, “Machine
learning methods for detecting urinary tract infection and analysing daily
living activities in people with dementia,” PloS one, vol. 14, no. 1, p.
e0209909, 2019.
[41] C. Wu and M. E. Thompson, “Stratified sampling and cluster sampling,”
in Sampling Theory and Practice. Springer, 2020, pp. 33–56.
[42] D. W. Aha and R. L. Bankert, “A comparative evaluation of sequential
feature selection algorithms,” in Learning from data. Springer, 1996,
pp. 199–206.
[43] T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár, “Focal loss
for dense object detection,” in Proceedings of the IEEE international
conference on computer vision, 2017, pp. 2980–2988.
[44] D. Bahdanau, K. Cho, and Y. Bengio, “Neural machine translation by
jointly learning to align and translate,” arXiv preprint arXiv:1409.0473,
2014.
[45] M. Usama, B. Ahmad, W. Xiao, M. S. Hossain, and G. Muhammad,
“Self-attention based recurrent convolutional neural network for disease
prediction using healthcare data,” Computer methods and programs in
biomedicine, vol. 190, p. 105191, 2020.
[46] A. Galassi, M. Lippi, and P. Torroni, “Attention, please! a critical
review of neural attention models in natural language processing,” arXiv
preprint arXiv:1902.02181, 2019.
[47] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image
recognition,” in Proceedings of the IEEE conference on computer vision
and pattern recognition, 2016, pp. 770–778.
[48] J. L. Ba, J. R. Kiros, and G. E. Hinton, “Layer normalization,” arXiv
preprint arXiv:1607.06450, 2016.
[49] S. Sperandei, “Understanding logistic regression analysis,” Biochemia
medica: Biochemia medica, vol. 24, no. 1, pp. 12–18, 2014.
[50] F. A. Gers, J. Schmidhuber, and F. Cummins, “Learning to forget:
Continual prediction with lstm,” 1999.
[51] M. H. Hassoun et al., Fundamentals of artificial neural networks. MIT Press, 1995.
[52] T. A. Lasko, J. C. Denny, and M. A. Levy, “Computational phenotype
discovery using unsupervised feature learning over noisy, sparse, and
irregular clinical data,” PloS one, vol. 8, no. 6, 2013.
[53] Z. Che, D. Kale, W. Li, M. T. Bahadori, and Y. Liu, “Deep computational
phenotyping,” in Proceedings of the 21th ACM SIGKDD International
Conference on Knowledge Discovery and Data Mining, 2015, pp. 507–
516.
[54] W. D. Heaven, “Google’s medical ai was super accurate in a lab. real life was a different story,” https://www.technologyreview.com/2020/04/27/1000658/google-medical-ai-accurate-lab-real-life-clinic-covid-diabetes-retina-disease/, April 2020.
[55] C. Duff et al., “Dementia: assessment, management and support for
people living with dementia and their carers,” 2018.
[56] E. Ijaopo, “Dementia-related agitation: a review of non-pharmacological
interventions and analysis of risks and benefits of pharmacotherapy,”
Translational psychiatry, vol. 7, no. 10, pp. e1250–e1250, 2017.
[57] M. Lutters and N. B. Vogt-Ferrier, “Antibiotic duration for treating
uncomplicated, symptomatic lower urinary tract infections in elderly
women,” Cochrane Database of Systematic Reviews, no. 3, 2008.
[58] M. Gharbi, J. H. Drysdale, H. Lishman, R. Goudie, M. Molokhia, A. P.
Johnson, A. H. Holmes, and P. Aylin, “Antibiotic management of urinary
tract infection in elderly patients in primary care and its association with
bloodstream infections and all cause mortality: population based cohort
study,” bmj, vol. 364, 2019.