Ruan et al. BMC Medical Informatics and Decision Making 2019, 19(Suppl 8):259
https://doi.org/10.1186/s12911-019-0985-7
RESEARCH Open Access
Representation learning for clinical time
series prediction tasks in electronic health
records
Tong Ruan1†, Liqi Lei1†, Yangming Zhou1*†, Jie Zhai1, Le Zhang1, Ping He2 and Ju Gao3
From IEEE International Conference on Bioinformatics and Biomedicine 2018
Madrid, Spain. 3-6 December 2018
Abstract
Background: Electronic health records (EHRs) provide possibilities to improve patient care and facilitate clinical
research. However, applications of EHRs face many challenges, such as temporality, high
dimensionality, sparseness, noise, random error and systematic bias. In particular, although the sequential information
in EHRs is very useful, traditional machine learning methods have difficulty using temporal information effectively.
Method: In this paper, we propose a general-purpose patient representation learning approach to summarize
sequential EHRs. Specifically, a recurrent neural network based denoising autoencoder (RNN-DAE) is employed to
encode the in-hospital records of each patient into a low-dimensional dense vector.
Results: Based on EHR data collected from Shuguang Hospital affiliated to Shanghai University of Traditional Chinese
Medicine, we experimentally evaluate our proposed RNN-DAE method on both a mortality prediction task and a
comorbidity prediction task. Extensive experimental results show that our proposed RNN-DAE method outperforms
existing methods. In addition, we apply the “Deep Feature” produced by our proposed RNN-DAE method to track
similar patients with t-SNE, which also yields some interesting observations.
Conclusion: We propose an effective unsupervised RNN-DAE method to summarize patient sequential information in
EHR data. Our proposed RNN-DAE method is useful on both the mortality prediction task and the comorbidity prediction task.
Keywords: Electronic health records, Mortality prediction, Representation learning, Recurrent neural network
Background
The past decade has witnessed an explosion in the
amount of digital information recorded in electronic
health records (EHRs). The EHR data is an essential
resource for clinical researchers to design quantitative
models, and it is crucial to understand the information
contained in EHRs. In this case, machine learning models
have been widely used to analyze patients' EHR data,
especially for predicting health status and helping
diagnose diseases, such as disease risk prediction [1],
*Correspondence: ymzhou@ecust.edu.cn
Tong Ruan, Liqi Lei and Yangming Zhou contributed equally to this work.
1School of Information Science and Engineering, East China University of
Science and Technology, 130 Meilong Road, 200237 Shanghai, China
Full list of author information is available at the end of the article
mortality prediction [2] and similarity analysis [3]. How-
ever, it is a great challenge to directly deal with raw EHR
data due to its temporality, high dimensionality, noise,
systematic bias, sparseness and random error [4]. Taking
temporality as an example, the information about the
impending patient disease status is closely related to the
sequence of medical events. Moreover, the same clini-
cal phenotype may have many descriptions in EHRs [5].
Therefore, the success of predictive models relies heavily
on the representation of data. In other words, extract-
ing useful features from patient EHRs is one key aspect
leading to the success of prediction models.
Representation learning methods have been used exten-
sively within and outside the clinical domain to learn
the semantics of words, phrases, and documents. For
© The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0
International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and
reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the
Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver
(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
Content courtesy of Springer Nature, terms of use apply. Rights reserved.
instance, Mikolov et al. [6] applied neural language models
to learn a distributed representation for each word, called
a word embedding. They further proposed an unsupervised
algorithm [7] to learn fixed-length feature representations
from variable-length pieces of text, such as sentences,
paragraphs, and documents. Peters et al. [8] used
a bidirectional long short-term memory network trained
on a specific task to derive word embeddings. They came
up with the contextualized embedding (i.e., each word
has multiple embeddings depending on the context it is
used in) by grouping together the hidden states of
their model. Devlin et al. [9] proposed a language
representation model called bidirectional encoder
representations from transformers to generate word embeddings.
Those representations achieve effective results
on multiple natural language processing tasks, such as
question answering and language inference. Traditional
representation methods such as one-hot encoding and
multi-hot encoding treat every dimension independently.
Compared to the vectors generated by these methods,
those derived by representation learning models are low-
dimensional and dense, and they capture the semantics in
context.
In the clinical domain, considerable efforts have also
been made to convert medical information in EHRs to
vectors. For example, Choi et al. [10] learned word embeddings
of medical concepts. Nguyen et al. [11] extracted
features from medical records with a convolutional neural
network model. Zhou et al. [12] applied stacked denoising
autoencoders [13] to learn deep representations for
predictive diagnoses. These works are all based on deep
learning methods. To some degree, deep learning methods
can overcome the difficulties in representation learning
caused by the complexity of EHRs. However, the deep learning
models in these works are trained for a
specific task rather than for general use. A new representation
has to be re-learned or re-tuned whenever a new
predictive task is given.
Learning a general-purpose patient representation
is necessary to make it available for various medical
prediction tasks. The main challenge is to encode the
sequential information of EHRs into a vector. Considering
the temporality of EHRs, each patient typically has multi-
ple inpatient records. Since previous medical events may
have an impact on future medical events, these continu-
ous medical records are critical for clinical diagnosis and
treatment.
In this paper, we propose an effective patient represen-
tation learning method for time-series prediction tasks
based on real-world EHR data, which greatly improves
and extends our previous work [14]. We develop a recur-
rent neural network based denoising autoencoder (RNN-
DAE) to summarize inpatient records of each patient into
a dense vector. In detail, a sub-repository for heart failure
disease is first constructed from the clinical data repos-
itory of the Shuguang Hospital. After that, we represent
clinical event information of a patient with a tensor, i.e.,
a series of multi-hot vectors. Finally, we generate patient
representation vector by using our RNN-DAE model.
With the help of our RNN-DAE model, time-series infor-
mation in EHR data is well integrated in our patient
representation. The main contributions of this paper are
summarized as follows:
• We propose an effective patient representation
learning method for time-series prediction tasks
on EHR data. Our proposed patient representation
learning method uses a recurrent neural network based
denoising autoencoder (RNN-DAE) to encode
time-series information. Unlike existing patient
representation learning methods, our proposed
RNN-DAE method considers the time-series
information in the patient representation.
• Based on the heart failure EHR data collected from
the Shuguang Hospital, we experimentally evaluate
our proposed RNN-DAE method on two clinical time
series prediction tasks. Computational studies show
that our proposed RNN-DAE method is highly
competitive compared to existing methods, achieving
an AUC of 78.31% in the mortality prediction task and
the best result in the comorbidity prediction task. In
addition, we apply the “Deep Feature” produced by
our proposed RNN-DAE method to track similar
patients with t-SNE, which also yields some
interesting results.
Related work
In this section, we first briefly introduce state-of-the-art
models for the mortality prediction and disease risk prediction
tasks of heart failure. Then, we report the progress
of representation learning methods in the medical
field.
Mortality prediction and disease risk prediction for heart
failure
Mortality prediction and disease risk prediction are
two essential health applications. It has been
found that many factors are able to increase mortal-
ity for heart failure, such as demographic factors (e.g.,
gender), clinical factors (e.g., renal dysfunction), comor-
bidities (e.g., diabetes), cardiac imaging markers (e.g.,
cardio-thoracic ratio and ejection fraction) and serum
biomarkers (e.g., brain natriuretic peptide and C-reactive
protein). In recent years, a lot of studies have shown that
machine learning methods play an important role in med-
ical research, including support vector machine, Bayesian
network, decision tree, nearest neighbors method, and
ensemble learning method [15]. For instance, Lee et al.
[16] proposed a mortality prediction model with a patient
similarity metric. Three types of classification models
were used in their work, namely logistic regression, simple
statistics and decision tree. Panahiazar et al. [17] designed
a risk prediction model by using support vector machine,
logistic regression, random forest, AdaBoost and decision
tree. Furthermore, some researchers [15, 18] experimentally
compared and analyzed multiple mortality prediction
models. The results of these works vary because their
data and experimental settings are totally different, but they
do demonstrate that machine learning methods
have limitations to some degree.
Recently, deep learning methods have played an important role
in medical research. For example, Choi et al. [19] and
Lipton et al. [20] integrated time-series information into
medical applications using recurrent neural networks. Nevertheless,
their models focus on event-level time-series information
(e.g., a series of blood pressure tests). Besides, their
models are not universal and can only handle specific tasks.
Cheng et al. [4] applied a deep learning model to extract
phenotypes from EHR data. Although the representations
of phenotypes could be used in further applications,
the convolutional neural network they developed
might ignore the sequentiality of events. Compared
with traditional machine learning models, deep learning
models require less human effort on feature engineering,
but their results are more difficult to interpret.
Representation learning in medical field
Since effective feature representation is a basic step before
further applications, a large amount of studies are devoted
to exploring representation learning methods in the med-
ical field.
Inspired by the work on word embedding in natural
language processing, many studies in recent years have
focused on representing medical concepts. For example,
Minarro-Giménez et al. [21] applied skip-gram to
get the representations of medical terms. Their medical
texts were collected from Wikipedia, PubMed, Medscape
and Merck Manuals. Choi et al. [22] learned low-dimensional
vector representations of medical codes in
longitudinal EHRs with a skip-gram-based model. Medical
codes include disease, medication and procedure codes.
In their studies, the patient representation for one record
is generated by aggregating all the vectors of its medical
codes. Another study [10] proposed an approach named
“Med2Vec” to learn the representations of medical codes
at the code level and the visit level. Cui et al. [23] proposed
a supervised model guided by specific prediction tasks
to facilitate representations of medical codes, which is
effective on small EHR datasets. Deepika and
Geetha [24] used a semi-supervised learning framework
that includes representation learning of drugs to predict
drug interactions. However, these studies all work at the
concept level, which means that the representations are
learned for medical codes rather than for
patients.
Meanwhile, patient representations are widely used in
several applications to assist clinical staff. Considerable
efforts have been made to learn dense vector representations
at the patient level. For example, Zhou et al. [12] developed
an unsupervised feature selection scheme relying on
stacked denoising autoencoders (SDAs). However, their
model aims to summarize time-series features within an inpatient
record, rather than the temporality between multiple
inpatient records. Miotto et al. [25] adopted SDAs
to generate patient representations. Furthermore, Sushil
et al. [26] derived task-independent patient representations
directly from clinical notes by applying SDAs and
a paragraph vector model. The above two methods only
consider the frequency of medical events. The main difference
between our work and theirs is that they ignore the
temporality of EHRs. In addition, Zhang et al. [27] applied
a Bi-LSTM network to derive patient vectors based on
a specific prediction task. Although they take time series into
consideration, this method is task-driven and supervised.
Methods
The overview of our proposed patient representation
learning framework and its potential applications is
shown in Fig. 1. Specifically, a sub-repository focusing
on heart failure is built from clinical data repository
Fig. 1 An overview of the proposed representation learning approach to generate patient vectors and further applications
(CDR) first. EHR data stored in the sub-repository is
then normalized and processed into tensors. Afterwards,
we derive the patient representations (called “Deep Features”)
by using our proposed RNN-DAE method. Finally,
the obtained “Deep Features” are applied to time series
prediction tasks, such as mortality prediction and comorbidity
prediction. We also use the “Deep Features” to conduct
patient similarity analysis.
Dataset generation: heart failure selection
The EHR data used in this paper is collected from the
Shuguang Hospital, which is one of the first-class general
hospitals in Shanghai. The CDR of the Shuguang Hospital
between January 2005 and April 2016 contains approximately
350,000 hospital records.
In this paper, a sub-repository focusing on heart failure
is constructed from the above CDR. We select patients
who satisfy the following criteria: the patient has at least
two hospital records, and an ICD-10 code associated
with heart failure appears in the diagnosis or medical order
of these two hospital records. Specifically, clinical experts
defined a list of 63 ICD-10 codes related to heart failure.
Our dataset consists of 4682 patients with 10,898 inpa-
tient records, where 568 patients (about 12.1%) died in
the hospital and the remaining patients are difficult to
track. To enrich our dataset, we split the patients’ hos-
pital records and obtain 10,898 samples. For instance,
if a patient has three inpatient records, we then con-
struct three samples by respectively selecting only the first
record, both the first and second records, and all three
records.
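The splitting step above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `build_prefix_samples` and the dictionary layout are assumptions, and records are assumed to be already sorted by admission time.

```python
from typing import Dict, List, Tuple


def build_prefix_samples(
    records_by_patient: Dict[str, List[str]]
) -> List[Tuple[str, List[str]]]:
    """Turn each patient's ordered records into one sample per prefix.

    A patient with records [r1, r2, r3] yields [r1], [r1, r2], [r1, r2, r3].
    """
    samples = []
    for patient_id, records in records_by_patient.items():
        for end in range(1, len(records) + 1):
            samples.append((patient_id, list(records[:end])))
    return samples


# Hypothetical toy data: record identifiers stand in for full inpatient records.
toy_records = {"p1": ["r1", "r2", "r3"], "p2": ["r4", "r5"]}
toy_samples = build_prefix_samples(toy_records)
```

Because every record closes exactly one prefix, the number of samples equals the number of inpatient records, which is consistent with the 10,898 records and 10,898 samples reported above.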
Data preprocessing
For each patient in the sub-repository, auxiliary information,
general demographic details (i.e., age and gender),
and clinical events are retained. Auxiliary information
contains the EMPI (i.e., patient unique identifier), hospital ID
(i.e., inpatient record unique identifier), admission time
and death time. We use the auxiliary information to organize
and preprocess the EHR data. The general demographic
details (i.e., age and gender) need only two dimensions,
and the value of age is first normalized
without breaking sparsity. Clinical events
include diagnoses, medications and lab tests. To convert
clinical events to computable sequences, the normalization
process varies by event
type. In particular, we convert the clinical event information
of one record to a multi-hot vector. In the end, a multi-hot
vector with 1309 dimensions is obtained according to the
following principles:
• Diagnoses: The patient records of the heart failure
repository include 1232 ICD-10 codes in total. As a
result, we represent the ICD-10 codes with 1232
dimensions.
• Medications: According to the universality of
medication for heart failure in China, 61 kinds of
medications are manually chosen by clinical
specialists. Clinical specialists classified these
medications into 11 groups, such as ACE-I, ARA, and
ARB. Similarly, we represent the medications with 11
dimensions.
• Lab tests: Clinical experts chose 22 laboratory tests
related to heart failure in this research. According to
the reference value of each lab test, a flag of
high, low or normal is used to denote the result.
Therefore, three dimensions are required to convert
the result of one lab test into binary features.
In total, we represent the lab tests with 66
dimensions.
In summary, the raw features include clinical events and
demographic details, and one record of raw features is described
with 1311 dimensions in total (1309 for clinical events plus 2 for age and gender).
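The encoding principles above can be illustrated with a small sketch. The block sizes (1232, 11, 22 x 3) come from the text; everything else is an assumption made for illustration, including the order of the three blocks, the order of the low/normal/high flags, the integer indices standing in for real ICD-10 codes, and the function names.

```python
N_DIAG = 1232      # ICD-10 codes in the heart failure sub-repository
N_MED = 11         # medication groups
N_LAB = 22         # selected lab tests
N_FLAGS = 3        # low / normal / high
LAB_FLAGS = {"low": 0, "normal": 1, "high": 2}   # assumed flag order
N_EVENTS = N_DIAG + N_MED + N_LAB * N_FLAGS      # 1309


def encode_record(diag_idx, med_group_idx, lab_results):
    """Build the multi-hot clinical event vector of one inpatient record.

    diag_idx: iterable of diagnosis indices in [0, 1232)
    med_group_idx: iterable of medication group indices in [0, 11)
    lab_results: dict mapping lab test index in [0, 22) to a flag name
    """
    vec = [0] * N_EVENTS
    for i in diag_idx:
        vec[i] = 1
    for g in med_group_idx:
        vec[N_DIAG + g] = 1
    for lab, flag in lab_results.items():
        vec[N_DIAG + N_MED + lab * N_FLAGS + LAB_FLAGS[flag]] = 1
    return vec


def add_demographics(vec, age_normalized, gender):
    """Append the two demographic dimensions to obtain the raw feature."""
    return vec + [age_normalized, gender]


# Hypothetical record: two diagnoses, one medication group, two lab results.
event_vec = encode_record([0, 5], [2], {0: "high", 21: "low"})
raw_feature = add_demographics(event_vec, 0.71, 1)
```

The age value is passed in already normalized, since the text does not specify the normalization used.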
Patient representation learning
Figure 2 describes a straightforward motivation for using
distributed representation for patients. The size of the tensor
representations is variable because different patients
may have different numbers of inpatient stays (i.e., x, y or z times).
As shown in Fig. 2a, it is challenging to use tensors
with variable length as the input of prediction models.
To solve this issue, the representation method in Fig. 2b
computes statistics over all the inpatient records of each
patient, such as the sum, average, or maximum. For
example, the value on each dimension of the patient vector
can be the sum of the corresponding medical event over
all inpatient records. Therefore, the dimension of the patient
vector is equal to the number of distinct medical events
appearing in the raw data. However, this kind of representation
is still high-dimensional and sparse. Moreover,
it does not take the time series information in EHRs
into consideration. A better way to represent patients is
shown in Fig. 2c. By using the RNN-DAE model, we use
distributed representation to represent patients as
multi-dimensional real-valued vectors that capture
the time series information between records.
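To make the limitation of the statistic-based representation concrete, the following sketch collapses a variable number of multi-hot records into one fixed-length vector. The function name `summarize_records` and the toy three-dimensional vectors are assumptions for illustration only.

```python
def summarize_records(records, how="sum"):
    """Collapse a list of equal-length multi-hot vectors into one vector."""
    columns = list(zip(*records))
    if how == "sum":
        return [sum(col) for col in columns]
    if how == "mean":
        return [sum(col) / len(col) for col in columns]
    if how == "max":
        return [max(col) for col in columns]
    raise ValueError(f"unknown statistic: {how}")


# Two hypothetical patients with different numbers of inpatient records.
patient_a = [[1, 0, 1], [1, 1, 0]]
patient_b = [[0, 0, 1]]

vec_a = summarize_records(patient_a, "sum")
vec_b = summarize_records(patient_b, "sum")
# Reversing the visit order gives exactly the same vector.
vec_a_reversed = summarize_records(patient_a[::-1], "sum")
```

Both patients end up with vectors of the same length, but `vec_a` and `vec_a_reversed` are identical, which shows that the order of records is lost.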
Given a sequence of inpatient records $X = (x_1, x_2, \ldots, x_n)$,
where $x_t$ ($t = 1, \ldots, n$) is a multi-dimensional
multi-hot vector that represents an inpatient clinical
event record at time step $t$, our goal is to summarize a
feature vector representation $c$ from this sequence of
clinical events. Finally, $c$ will be concatenated with the
demographic details to get our “Deep Feature”.
RNN is widely used to cope with time-series prediction
problems [28,29]. RNN can remember historical infor-
mation because the value of current hidden layer depends
Fig. 2 Three different forms of the representation of patients. Here, patients may have various numbers of inpatient stays (e.g., x, y, z). The tensor representation
of each patient consists of multiple multi-hot vectors of N dimensions (i.e., N = 1309). The statistic-based representation is derived by computing
summary statistics, and it yields a vector with N dimensions. Typically, distributed representation is a better representation with D dimensions (i.e.,
D = 300), where D is much lower than N. a Tensor representation of patients. b Statistic-based representation of patients. c Distributed
representation of patients
on the input of the current layer and the output of the previous
layer. Based on the standard RNN, Hochreiter et al. [30]
proposed the long short-term memory (LSTM) model to cope
with the gradient exploding and vanishing problems [31, 32].
To simplify the structure of LSTM, the gated recurrent unit
(GRU) model [33], one of its most popular variants,
was developed. The GRU model keeps the advantages of both
RNN and LSTM, that is, it supports longer sequences while
consuming less training time [34]. Therefore, we replace
the standard RNN unit with GRU in our research.
We develop a recurrent neural network based denoising
autoencoder (RNN-DAE) model in this paper, which
combines the ideas of SDAs [13] and sequence autoencoders
[35]. In detail, our model trains a GRU encoder to
convert input features to a vector, and then a GRU decoder
is used to predict the input features sequentially. Specifically,
the decoder reconstructs the initial inputs from a
noisy version of the input features. Figure 3 illustrates the
architecture of our RNN-DAE model.
In order to avoid over-fitting when training our model,
the input vectors $X$ are first corrupted through a stochastic
mapping $\tilde{X} \sim q_D(\tilde{X} \mid X)$. Specifically, we adopt Gaussian
noise as the stochastic mapping to get $\tilde{X}$. Gaussian noise
is a series of random numbers drawn from a Gaussian
distribution. The GRU encoder reads $\tilde{X}$ and turns it into
a vector $c$, where $c$ is the last hidden state of the
GRU encoder, which summarizes the whole input sequence.
The GRU encoder predicts the next state $h_t$ at time step $t$
given the input $x_t$ and the previous hidden state $h_{t-1}$ as
follows:
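$$z_t = \delta(W_z \cdot [h_{t-1}, x_t]) \tag{1}$$
$$r_t = \delta(W_r \cdot [h_{t-1}, x_t]) \tag{2}$$
$$\tilde{h}_t = \tanh(W \cdot [r_t \odot h_{t-1}, x_t]) \tag{3}$$
$$h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t \tag{4}$$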
where $r_t$ is the reset gate, $z_t$ is the update gate, $\delta(\cdot)$
indicates a sigmoid activation function, and $\tanh(\cdot)$ represents
a hyperbolic tangent activation function. The reset gate reads
the values of $h_{t-1}$ and $x_t$ and, through Eq. (2), outputs values
between 0 and 1 that are applied to the state $h_{t-1}$ of each cell.
The update gate updates the hidden state to the new
state $h_t$.
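Equations (1) to (4) can be traced with a small pure-Python sketch. It follows the equations as written (no bias terms) and assumes that the products between gates and states are element-wise; the helper names, the tiny dimensions, and the random weights are illustrative assumptions, not the trained model.

```python
import math
import random


def sigmoid(values):
    return [1.0 / (1.0 + math.exp(-v)) for v in values]


def matvec(matrix, vector):
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def gru_step(h_prev, x, Wz, Wr, W):
    """One GRU update following Eqs. (1)-(4)."""
    concat = h_prev + x                                   # [h_{t-1}, x_t]
    z = sigmoid(matvec(Wz, concat))                       # Eq. (1), update gate
    r = sigmoid(matvec(Wr, concat))                       # Eq. (2), reset gate
    gated = [ri * hi for ri, hi in zip(r, h_prev)] + x    # [r_t * h_{t-1}, x_t]
    h_tilde = [math.tanh(v) for v in matvec(W, gated)]    # Eq. (3)
    return [(1 - zi) * hi + zi * ti                       # Eq. (4)
            for zi, hi, ti in zip(z, h_prev, h_tilde)]


def encode(sequence, hidden_size, Wz, Wr, W):
    """Run the encoder over a sequence; the last hidden state is c."""
    h = [0.0] * hidden_size
    for x in sequence:
        h = gru_step(h, x, Wz, Wr, W)
    return h


rng = random.Random(0)
HIDDEN, INPUT = 4, 6   # toy sizes; the paper uses 300 and 1309


def random_matrix(rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


Wz, Wr, W = (random_matrix(HIDDEN, HIDDEN + INPUT) for _ in range(3))
one_visit = [[1, 0, 0, 1, 0, 0]]
three_visits = [[1, 0, 0, 1, 0, 0], [0, 1, 0, 0, 0, 1], [0, 0, 1, 0, 1, 0]]
c_one = encode(one_visit, HIDDEN, Wz, Wr, W)
c_three = encode(three_visits, HIDDEN, Wz, Wr, W)
```

Whatever the number of visits, the encoder returns a vector of the same length, which is what allows patients with different histories to share one downstream classifier.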
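After encoding, the GRU decoder is used to predict the next
state $y_t$ at time step $t$ based on the global patient vector $c$
and the previous hidden state $s_{t-1}$ as follows:
$$z_t = \delta(W_z \cdot [s_{t-1}, c]) \tag{5}$$
$$r_t = \delta(W_r \cdot [s_{t-1}, c]) \tag{6}$$
$$\tilde{s}_t = \tanh(W \cdot [r_t \odot s_{t-1}, c]) \tag{7}$$
$$s_t = (1 - z_t) \odot s_{t-1} + z_t \odot \tilde{s}_t \tag{8}$$
$$y_t = s_t \tag{9}$$
where $s_t$ is the hidden state of the decoder at time $t$.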
Fig. 3 The architecture of our proposed RNN-DAE model. Gaussian noise is added to the time-ordered multi-hot vectors (x_t), which are then encoded by
a GRU encoder into the patient vector (c). Given the patient vector, a GRU decoder is used to decode it so that the input
(x_t) and the output (y_t) are as consistent as possible
The reconstruction error $L(X, Y)$ is defined as the loss function,
and the model optimizes its parameters by minimizing
the reconstruction error. We use the cross-entropy
function to calculate the reconstruction error as follows:
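$$L(X, Y) = -\sum_{i=1}^{n} \sum_{j=1}^{d} \left[ x_i^{(j)} \log y_i^{(j)} + \left(1 - x_i^{(j)}\right) \log\left(1 - y_i^{(j)}\right) \right] \tag{10}$$
where $x_i^{(j)}$ is the $j$-th element of $x_i$, $y_i^{(j)}$ is the $j$-th
element of $y_i$, and $d$ is the dimension of $x_i$ and $y_i$.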
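As a quick check of Eq. (10), the loss can be computed directly from its definition. The clipping constant `eps` is an assumption added here for numerical safety and is not part of the formula; the function name is illustrative.

```python
import math


def reconstruction_loss(X, Y, eps=1e-12):
    """Cross-entropy reconstruction error of Eq. (10).

    X: list of n binary input vectors, Y: list of n output vectors in (0, 1).
    """
    total = 0.0
    for x_i, y_i in zip(X, Y):
        for x, y in zip(x_i, y_i):
            y = min(max(y, eps), 1.0 - eps)   # assumed clipping, avoids log(0)
            total -= x * math.log(y) + (1 - x) * math.log(1 - y)
    return total


X_toy = [[1, 0]]
loss_uninformative = reconstruction_loss(X_toy, [[0.5, 0.5]])
loss_good = reconstruction_loss(X_toy, [[0.9, 0.1]])
```

An uninformative output of 0.5 on both dimensions gives a loss of 2 ln 2, and the loss shrinks as the output moves toward the input.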
The Gaussian noise is set with a mean of 0 and a variance
of 0.1. The output dimensions of the GRU encoder and
GRU decoder are both 300; therefore, $c$ is a 300-dimensional
vector. When training the network, the loss is minimized
by gradient-based optimization with mini-batches of
size 100.
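The noise and batching settings above can be sketched as follows. Note that a variance of 0.1 corresponds to a standard deviation of about 0.316, which is what Python's `random.gauss` expects. The function names and the toy inputs are assumptions for illustration.

```python
import random


def corrupt(sequence, variance=0.1, rng=None):
    """Add zero-mean Gaussian noise to every element of every record."""
    rng = rng or random.Random()
    std = variance ** 0.5          # gauss() takes a standard deviation
    return [[v + rng.gauss(0.0, std) for v in x] for x in sequence]


def mini_batches(items, batch_size=100):
    """Yield consecutive mini-batches; the last one may be smaller."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


clean = [[1, 0, 1], [0, 1, 0]]
noisy = corrupt(clean, rng=random.Random(0))
batch_sizes = [len(b) for b in mini_batches(list(range(250)))]
```

Only the encoder input is corrupted; the reconstruction target remains the clean sequence, as described for the denoising setup.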
Finally, each patient vector consists of 302 dimensions
and is named the “Deep Feature”. Among them, 2 dimensions
are demographic details (i.e., age and gender), and
the other 300 dimensions are the output of our representation
model (i.e., RNN-DAE). We do not input the
demographic details into our model because they have a
significant effect on clinical tasks. The vector $c$ is
derived by encoding clinical events only.
Results
We compared our RNN-DAE model with other well-known
feature learning methods on mortality prediction
and comorbidity prediction tasks. Traditional methods
such as k-means clustering (i.e., k-means), principal
component analysis (PCA), and Gaussian mixture model
(GMM) [36] perform only one transformation of the
original data, while the deep learning method (i.e., SDAs)
performs three transformations. The details of how the
traditional models perform representation learning are
as follows.
• PCA uses an orthogonal transformation to convert a
set of observations of possibly correlated variables
(entities each of which takes on various numerical
values) into a set of values of linearly uncorrelated
variables called principal components. This
transformation is defined in such a way that the first
principal component has the largest possible variance
(that is, accounts for as much of the variability in the
data as possible), and each succeeding component in
turn has the highest variance possible under the
constraint that it is orthogonal to the preceding
components. The resulting vectors are an
uncorrelated orthogonal basis set, and their dimension
is less than or equal to that of the original data.
Here, we set the PCA with 512 principal components.
• k-means clustering aims to partition unlabeled data
into k clusters in which each observation belongs to
the cluster with the nearest mean. In feature
learning, the centroids of the clusters are used to
produce features. Specifically, we used the centroid
vector of a cluster to represent the data within
this cluster in our experiments, and we set k-means
with 16 clusters.
• GMM is a probabilistic model that assumes all the
data points are generated from a mixture of a finite
number of Gaussian distributions with unknown
parameters. One can think of mixture models as
generalizing k-means clustering to incorporate
information about the covariance structure of the
data as well as the centers of the latent Gaussians.
Each component (i.e., Gaussian distribution) of
GMM is a clustering center and has its own diagonal
covariance matrix. In the GMM model, the number
of components needs to be defined manually, just
like the number of clusters in k-means. Specifically, we
used the covariance vector of each component
to represent the data within this cluster in our
experiments, and we set GMM with 512 components.
In this section, we experimentally investigate
the effectiveness of our proposed RNN-DAE method.
Besides RNN-DAE, we also evaluate a variant of the RNN-DAE
method, namely an RNN-based autoencoder model
without Gaussian noise (RNN-AE). In the
following experiments, we applied our proposed methods
to the mortality prediction task, the comorbidity prediction
task, and patient similarity analysis. Experimental results
are reported in terms of accuracy, F1-score and the area
under the curve (AUC), which are widely used performance
measures [37, 38].
Mortality prediction
Our proposed model is compared with four state-of-the-art
methods. Three of them (GMM, PCA and k-means) are based on
traditional machine learning models,
and the remaining one, “SDAs” [12], is based on a deep learning
model. Furthermore, we add an ablation
experiment to investigate the effect of the proposed
denoising part. In other words, we also develop an RNN-AE
model without Gaussian noise. Using the patient
vectors derived from the above representation learning models,
a downstream classifier is used to predict mortality.
The comparison of different downstream classifiers is
presented in the “Discussion” section.
Because traditional machine learning models cannot deal
with sequential data directly, observation windows are
required to extract features. In order to investigate the impact
of window size, we conduct experiments to compare
the performance of representation learning models under
various window sizes. Specifically, the comparison is made
on the mortality prediction task. Following the studies
[19, 25], the window sizes are set to 30, 60, 90 and 180
days. Table 1 shows the experimental results. The first column
lists the representation learning methods, where
“Hand” indicates that the raw features of each patient are
only averaged. Since our proposed models RNN-DAE and
RNN-AE do not rely on window size, they achieve 0.783
Table 1 Comparative results of methods with different window sizes
Method     30 Days   60 Days   90 Days   180 Days
RNN-DAE    0.783     0.783     0.783     0.783
RNN-AE     0.755     0.755     0.755     0.755
SDAs       0.488     0.738     0.741     0.755
Hand       0.525     0.584     0.586     0.608
PCA        0.504     0.555     0.555     0.602
GMM        0.536     0.595     0.594     0.607
k-means    0.569     0.568     0.568     0.628
The performance is measured by AUC (i.e., the area under the ROC curve).
and 0.755 respectively on AUCs in all cases. As the size
of the window grows, the performance of the representation
learning models based on traditional machine learning
methods generally increases as well. The reason is that a
larger window covers more records and provides more
useful information. Consequently, we set the
window size to 180 days in later experiments. The AUC
of the comparison methods grows steadily, but at the 180-day
window our RNN-DAE model is still at least 0.155 (15.5
percentage points) better than the traditional machine learning
models and 0.028 (2.8 percentage points) better than
the deep learning method “SDAs”. Comparative results
of different representation learning models for the mortality
prediction task are summarized in Table 2. For the mortality
prediction task, we set the threshold value to 0.8 for
classification. The results show that our RNN-DAE model
with Gaussian noise outperforms the other models,
achieving 0.783 on AUC, 0.779 on accuracy and 0.449
on F1-score.
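As an illustration of how the three reported metrics relate to a classification threshold, the following minimal sketch computes AUC from raw scores and then accuracy and F1-score after binarizing the scores at 0.8. The function names, the toy data, and the rule "score >= threshold means positive" are assumptions made for this example; the paper does not state how the threshold is applied.

```python
def auc_score(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy_and_f1(y_true, scores, threshold=0.8):
    """Binarize scores (assumed rule: score >= threshold -> positive), then compute accuracy and F1."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom > 0 else 0.0
    return accuracy, f1


# Toy labels and scores (illustrative only, not data from the study).
y_true = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.3, 0.7, 0.85, 0.1, 0.95]
auc = auc_score(y_true, scores)
acc, f1 = accuracy_and_f1(y_true, scores, threshold=0.8)
```

AUC depends only on the ordering of the scores, whereas accuracy and F1-score change with the chosen threshold, which is why the threshold is reported alongside the latter two.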
Comorbidity prediction
The comorbidity prediction task is a typical disease risk prediction
task. In this experiment, we consider ten comorbidities
related to heart failure, and further validate the
effectiveness of our RNN-DAE method on the comorbidity
prediction task. The statistics of the comorbidities are
shown in Table 3. Several comorbidities are rare in the
dataset, so undersampling is needed when training classifiers.
For example, valvular heart disease occurs in only 80 patients.
The column “Count” gives the number
of occurrences of each comorbidity and the column “Percent”
gives the proportion of each comorbidity in our
dataset. For comorbidities whose proportion is below
0.30 (30%), we apply the NearMiss undersampling algorithm
before classification [39]. The last column, “Sample?”,
indicates whether the sampling algorithm is used.
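The undersampling step can be sketched as follows. The text names NearMiss [39] but not the variant, so this example assumes NearMiss-1, which keeps the majority-class samples whose mean distance to their k nearest minority-class samples is smallest. The function name `nearmiss1`, the value of `k`, and the toy points are hypothetical.

```python
import math


def nearmiss1(majority, minority, n_keep, k=3):
    """Keep the n_keep majority points with the smallest mean distance to their k nearest minority points."""
    def mean_dist_to_nearest_minority(point):
        dists = sorted(math.dist(point, m) for m in minority)
        nearest = dists[:k]
        return sum(nearest) / len(nearest)

    # sorted() is stable, so points with equal distance keep their input order.
    ranked = sorted(majority, key=mean_dist_to_nearest_minority)
    return ranked[:n_keep]


# Toy 2-D points (illustrative only): 2 minority vs. 5 majority samples.
minority = [(0.0, 0.0), (1.0, 0.0)]
majority = [(0.5, 0.5), (5.0, 5.0), (0.0, 1.0), (9.0, 9.0), (2.0, 0.0)]
# Assumed target: as many majority samples as minority samples (a 1:1 ratio).
balanced_majority = nearmiss1(majority, minority, n_keep=len(minority), k=2)
```

The retained majority samples are those lying closest to the minority class, so the downstream classifier is trained on the region where the two classes are hardest to separate.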
In the experiments, we train a separate downstream classifier for
each comorbidity prediction task, based on
patient vectors derived from the various representation learning
models. The comparison of different downstream
classifiers is detailed in the “Discussion” section.

Table 2 Comparative results of different representation learning methods for mortality prediction
Method     AUC      Accuracy   F1-score
RNN-DAE    0.783    0.779      0.449
RNN-AE     0.755    0.760      0.444
SDAs       0.755    0.738      0.439
Hand       0.608    0.693      0.420
PCA        0.602    0.715      0.427
GMM        0.607    0.693      0.420
k-means    0.628    0.722      0.430
Accuracy and F1-score are computed with the threshold set to 0.8.

Table 3 Statistics of 10 selected comorbidities in heart failure
Comorbidity                              Count   Percent   Sample?
Hypertension disease                     7,097   0.694     n
Diabetes mellitus                        3,674   0.359     n
Coronary artery disease                  5,072   0.496     n
Atrial fibrillation                      3,053   0.299     y
Chronic renal disease                    896     0.088     y
Valvular heart disease                   80      0.008     y
Dilated cardiomyopathy                   321     0.031     y
Hypertrophic cardiomyopathy              146     0.014     y
Chronic obstructive pulmonary disease    818     0.080     y
Cerebral infarction disease              2,579   0.252     y

Table 4
shows the comparative results of the patient
vectors learned by the seven representation models, with
complete ranking information. The results show that no single
model achieves the best performance across all 10 tasks.
Our RNN-DAE model achieves the most competitive
performance overall, and the RNN-AE model the second
most competitive. Moreover, the RNN-DAE model
achieves the highest score on 4 out of 10 comorbidity prediction
tasks, and obtains the smallest average rank of 2.000
(2.500, 5.600, 5.800, 5.400, 3.250 and 3.450 are respectively
obtained by the reference algorithms RNN-AE, SDAs,
PCA, k-means, GMM and Hand). Unlike the RNN-DAE
model, the traditional machine learning models and the unsupervised
deep learning model “SDAs” are constrained by
window size. To sum up, our proposed RNN-DAE model
is a better choice for the comorbidity prediction task because
of its better overall performance.
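The average ranks quoted above can be reproduced from the scores in Table 4. The sketch below ranks the seven methods within each task (rank 1 for the highest score) and averages the ranks over the ten tasks. Giving tied methods the mean of the ranks they occupy is an assumption inferred from the 4.5 entries in Table 4, and the function name `average_ranks` is hypothetical.

```python
def average_ranks(scores_per_task):
    """Rank methods within each task (1 = highest score, ties share the mean rank), then average over tasks."""
    n_methods = len(scores_per_task[0])
    totals = [0.0] * n_methods
    for scores in scores_per_task:
        for i, s in enumerate(scores):
            higher = sum(1 for t in scores if t > s)
            tied = sum(1 for t in scores if t == s)  # includes the method itself
            # Tied methods occupy ranks higher+1 .. higher+tied; each gets their mean.
            totals[i] += higher + (tied + 1) / 2.0
    return [total / len(scores_per_task) for total in totals]


methods = ["RNN-DAE", "RNN-AE", "SDAs", "PCA", "k-means", "GMM", "Hand"]
# Scores copied from Table 4 (rows: HD, DM, CAD, AF, CRD, VHD, DCM, HCM, COPD, CID).
table4 = [
    [0.745, 0.771, 0.736, 0.545, 0.537, 0.652, 0.654],
    [0.671, 0.660, 0.422, 0.619, 0.631, 0.627, 0.627],
    [0.744, 0.746, 0.609, 0.601, 0.617, 0.741, 0.740],
    [0.743, 0.522, 0.537, 0.535, 0.404, 0.645, 0.644],
    [0.700, 0.727, 0.554, 0.373, 0.566, 0.699, 0.698],
    [0.586, 0.842, 0.601, 0.258, 0.500, 0.882, 0.902],
    [0.785, 0.777, 0.406, 0.416, 0.440, 0.675, 0.674],
    [0.718, 0.814, 0.201, 0.222, 0.396, 0.438, 0.437],
    [0.747, 0.547, 0.522, 0.577, 0.457, 0.522, 0.522],
    [0.790, 0.739, 0.474, 0.697, 0.762, 0.872, 0.873],
]
avg_rank = dict(zip(methods, average_ranks(table4)))
```

Run on the Table 4 scores, this yields the averages in the last row of the table, for example 2.000 for RNN-DAE and 3.250 for GMM.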
Furthermore, we apply the patient vectors derived
from our proposed model to predict the top k comorbidities
that a patient may suffer from. We evaluate the accuracy
of top-k comorbidity prediction (with k = 1, 2, 3). The
accuracy of the downstream classifier is the average of the
top-k accuracy over all patients. Specifically, the downstream
classifier assigns the top k comorbidities to a patient by
selecting the k greatest comorbidity scores, and the top-k
accuracy of a patient is the proportion of correct comorbidities
among the predicted top k. In this experiment, we also report the
theoretical upper bound of the classifier for each comparison,
that is, the accuracy when the classifier assigns all
the correct comorbidities to each patient. For example, the
upper bound of top-3 comorbidity prediction is less than
1 as soon as a single patient in our dataset has only one
comorbidity. As shown in Table 5, our RNN-DAE model
performs slightly worse than our original RNN-AE model
for top-1, but outperforms it in the top-2 and top-3 prediction
tasks.
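The top-k accuracy and its upper bound can be sketched as follows. This is one reading of the definition above: a patient's accuracy is taken as the number of correct comorbidities among the k highest-scored ones, divided by k, and the upper bound assumes a perfect ranker that can place at most min(k, number of true comorbidities) hits. The function names and the toy data are hypothetical.

```python
def top_k_accuracy(true_sets, score_rows, k):
    """Mean over patients of (correct labels among the k highest-scored labels) / k."""
    total = 0.0
    for true_labels, scores in zip(true_sets, score_rows):
        ranked = sorted(scores, key=scores.get, reverse=True)
        hits = sum(1 for label in ranked[:k] if label in true_labels)
        total += hits / k
    return total / len(true_sets)


def top_k_upper_bound(true_sets, k):
    """Best achievable top-k accuracy: a perfect ranker scores min(k, |true labels|) / k per patient."""
    return sum(min(k, len(t)) / k for t in true_sets) / len(true_sets)


# Toy example (illustrative only): three patients, four candidate comorbidities.
true_sets = [{"HD", "DM"}, {"AF"}, {"HD", "CAD", "DM"}]
score_rows = [
    {"HD": 0.9, "DM": 0.2, "CAD": 0.6, "AF": 0.1},
    {"HD": 0.7, "DM": 0.1, "CAD": 0.2, "AF": 0.8},
    {"HD": 0.5, "DM": 0.4, "CAD": 0.9, "AF": 0.3},
]
acc_at_2 = top_k_accuracy(true_sets, score_rows, k=2)
bound_at_2 = top_k_upper_bound(true_sets, k=2)
```

In the toy data the second patient has a single comorbidity, so even a perfect classifier scores only 1/2 for that patient at k = 2, which pulls the upper bound below 1.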
Table 4 Comparative results of different methods for comorbidity prediction
Task       RNN-DAE    RNN-AE     SDAs       PCA        k-means    GMM          Hand
HD         0.745(2)   0.771(1)   0.736(3)   0.545(6)   0.537(7)   0.652(5)     0.654(4)
DM         0.671(1)   0.660(2)   0.422(7)   0.619(6)   0.631(3)   0.627(4.5)   0.627(4.5)
CAD        0.744(2)   0.746(1)   0.609(6)   0.601(7)   0.617(5)   0.741(3)     0.740(4)
AF         0.743(1)   0.522(6)   0.537(4)   0.535(5)   0.404(7)   0.645(2)     0.644(3)
CRD        0.700(2)   0.727(1)   0.554(6)   0.373(7)   0.566(5)   0.699(3)     0.698(4)
VHD        0.586(5)   0.842(3)   0.601(4)   0.258(7)   0.500(6)   0.882(2)     0.902(1)
DCM        0.785(1)   0.777(2)   0.406(7)   0.416(6)   0.440(5)   0.675(3)     0.674(4)
HCM        0.718(2)   0.814(1)   0.201(7)   0.222(6)   0.396(5)   0.438(3)     0.437(4)
COPD       0.747(1)   0.547(3)   0.522(5)   0.577(2)   0.457(7)   0.522(5)     0.522(5)
CID        0.790(3)   0.739(5)   0.474(7)   0.697(6)   0.762(4)   0.872(2)     0.873(1)
Avg. rank  2.000      2.500      5.600      5.800      5.400      3.250        3.450
The rank of each method within a task is given in parentheses. The ten selected comorbidities of heart failure are hypertension disease (HD), diabetes mellitus (DM), coronary artery disease (CAD), atrial fibrillation (AF), chronic renal disease
(CRD), valvular heart disease (VHD), dilated cardiomyopathy (DCM), hypertrophic cardiomyopathy (HCM), chronic obstructive pulmonary disease (COPD) and cerebral
infarction disease (CID).
Patient similarity analysis
Because diagnosis and treatment rely heavily on previous
experience, it is important to find patients whose
physical status is similar, which helps clinicians give accurate
treatments. Researchers have made considerable
efforts [40–42] to identify patients with similar status. We
make one assumption before conducting the experiment:
the patients who died in our dataset are supposed
to be similar. Based on this assumption, we try to
find patients with similar outcomes (i.e., death) using the
“Deep Feature” learned by our RNN-DAE model.
We first use the t-SNE [43] method to project the “Deep Feature”
of 10,898 patient records to a 2-dimensional space.
The t-SNE method is good at capturing much of the local
structure of the high-dimensional data, while also revealing
global structure. As shown in Fig. 4, the red points
indicate the patients who finally die and the blue ones represent
those who do not. By using t-SNE, we
convert the “Deep Features” from the $\mathbb{R}^D$ vector space into the $\mathbb{R}^2$
vector space. It captures the similarity of those “Deep
Features” so that the patients who die and those who do not are
clustered separately. In detail, we split the 2-dimensional space
into 30 × 30 = 900 blocks. For each block at location
(i, j), the mortality rate is calculated as
follows.
$$H_{ij} = \frac{K_{ij}}{N_{ij}} \tag{11}$$
$$H_{ij} = \frac{K_{ij}}{N_{ij} + F} \tag{12}$$
where $K_{ij}$ indicates the number of dead records and $N_{ij}$
represents the number of inpatient records in block (i, j). When
the mortality rate is calculated by Eq. 11, it
will be 1.0 if a block has only one inpatient
record and that record is a dead one. To avoid this problem,
we add F as a smoothing factor as shown in Eq. 12, and
we set it to 5 as an empirical value. Once we have the mortality
rate of all blocks, we can construct a heatmap (see
Fig. 5). The higher the mortality rate of a block, the
darker its color. As shown in this
figure, the dead records are clustered into a few blocks,
and some of them have mortality rates over 60%. These
observations show that our “Deep Feature” is
useful for calculating and visualizing the similarities between
patients.
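The block-wise mortality rate of Eq. 12 can be sketched as follows. The example bins 2-dimensional points (standing in for t-SNE outputs) into a square grid and computes K/(N + F) per block. Using equal-width bins spanning the minimum and maximum of each coordinate is an assumption, as are the function name and the toy points; the paper does not describe how block boundaries are chosen.

```python
def block_mortality_rates(points, died, grid=30, smooth=5.0):
    """Bin 2-D points into a grid x grid lattice and return H[i][j] = K_ij / (N_ij + F), as in Eq. 12."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)

    def cell(value, lo, hi):
        # Map a value in [lo, hi] to an index in 0..grid-1; the maximum falls in the last cell.
        if hi == lo:
            return 0
        return min(int((value - lo) / (hi - lo) * grid), grid - 1)

    deaths = [[0] * grid for _ in range(grid)]
    counts = [[0] * grid for _ in range(grid)]
    for (x, y), d in zip(points, died):
        i, j = cell(x, x_min, x_max), cell(y, y_min, y_max)
        counts[i][j] += 1
        deaths[i][j] += int(d)
    # With smooth > 0, empty blocks yield 0.0 instead of a division by zero.
    return [[deaths[i][j] / (counts[i][j] + smooth) for j in range(grid)] for i in range(grid)]


# Toy embedding (illustrative only); a 2 x 2 grid keeps the numbers easy to follow.
points = [(0.0, 0.0), (0.1, 0.1), (1.0, 1.0), (0.9, 0.9), (0.95, 1.0)]
died = [1, 0, 1, 1, 1]
rates = block_mortality_rates(points, died, grid=2, smooth=5.0)
```

With F = 5, a block holding three records that are all deaths gets a rate of 3/8 rather than 1.0, so sparsely populated blocks no longer dominate the heatmap.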
Discussion
In this section, we conduct four groups of experiments.
In the first three, we analyze different sampling
strategies, different binary classifiers, and patient representation
vectors with different dimensions, respectively.
Finally, we experimentally analyze the effect of different
training data sizes.

Table 5 Prediction accuracy of top-k comorbidity
Top-k   Upper Bound   RNN-DAE   RNN-AE   SDAs    PCA     k-means   GMM     Hand
k=1     0.962         0.604     0.617    0.212   0.449   0.181     0.607   0.605
k=2     0.878         0.534     0.514    0.195   0.384   0.208     0.503   0.503
k=3     0.769         0.452     0.419    0.177   0.305   0.144     0.416   0.417
The columns RNN-DAE to Hand are the patient representation learning methods.
“Top-k” represents the average accuracy of all the patients, where accuracy for one patient is the average number of correct results included in its top k predicted
comorbidities. The top-k comorbidities are sorted by predicted probabilities, with k = 1, 2, 3.
“Upper Bound” shows the best results achievable (i.e., all the correct comorbidities assigned to all the patients).

Fig. 4 A diagram of the t-SNE technique for dimensionality reduction. With the help of t-SNE, some D-dimensional data points are projected into
2-dimensional space. Specifically, the red points indicate the patients who finally die and the blue ones represent those patients who do not die
Analysis of different undersampling algorithms
The death information in EHR data is usually incomplete
because only patients who died in hospital are
recorded. Our dataset is imbalanced because it
contains 4296 patients, of whom 583 died. The same
problem also exists between common diseases and
rare diseases in the comorbidity prediction tasks. Thus,
it is necessary to undersample the dataset before the
prediction tasks. Various well-known undersampling
algorithms are evaluated in this experiment. Experimental
results are displayed in Fig. 6, where the x-axis represents
the different undersampling algorithms and the y-axis indicates
the performance in terms of AUC. “Raw”
indicates that the raw dataset was used without undersampling.
We observe that the NearMiss algorithm outperforms
the other undersampling strategies. As a result,
we adopt the NearMiss algorithm when undersampling
in the mortality prediction and comorbidity prediction
tasks.
Fig. 5 Results of patient similarity analysis based on “Deep Feature”
Fig. 6 Comparative results of different sampling strategies
Analysis of different binary classifiers
We conduct two experiments to analyze different binary
classifiers. One is to determine a good classifier for the downstream
prediction tasks. The other is to verify the general-purpose
nature of our RNN-DAE model.
We compare six well-known binary classifiers on the
mortality prediction task: support vector machine (SVM),
random forest (RF), gradient boosting decision tree (GBDT),
k-nearest neighbor
(KNN), logistic regression (LR) and naive Bayes (NB).
Figure 7 shows the results, where the x-axis represents the
classifiers and the y-axis indicates the performance in
terms of AUC. As shown in this figure, the SVM classifier
achieves the best performance. Therefore, the SVM classifier
is used in both the mortality prediction and comorbidity
prediction tasks.
Fig. 7 Comparative results of different binary classifiers

To verify that our RNN-DAE model is insensitive to
the choice of classifier, we compare the
results of mortality prediction between various representation
learning methods using different classifiers. The
results are summarized in Table 6. From this table,
we observe that our proposed RNN-DAE method outperforms
the traditional representation learning methods in
terms of AUC with 4 of the 6 classifiers (SVM, KNN, LR and NB).
That is, our proposed RNN-DAE model is
able to achieve competitive results even without the best
classifier for downstream tasks.

Table 6 Comparative results of various representation learning methods using different classifiers for mortality prediction
Method     SVM     RF      GBDT    KNN     LR      NB
PCA        0.602   0.538   0.654   0.566   0.561   0.481
k-means    0.628   0.565   0.570   0.527   0.642   0.500
GMM        0.602   0.637   0.735   0.561   0.649   0.502
RNN-DAE    0.783   0.611   0.697   0.598   0.694   0.516
Hand       0.608   0.637   0.737   0.561   0.649   0.502
The results are measured by AUC.
Analysis of patient representation vectors with different
dimensions
To investigate the sensitivity of our proposed model to the
size of the representation, we compare patient representation vectors
with different dimensions on the mortality prediction task. As
shown in Fig. 8, the x-axis indicates the dimension
of the patient representation vector, from 100 to 400, and the
y-axis denotes the performance of our proposed model
in terms of AUC, accuracy and F1-score. We observe that
the performance of our proposed model is basically stable,
with only slight fluctuations. In other words, however
we vary the dimension of the patient representation
vector within this range, the AUC, accuracy, and F1-score remain
above 0.75, 0.71, and 0.42, respectively.
Analysis of different training data sizes
To find an empirical training data size for our proposed
RNN-DAE model, we experimentally investigate
the effect of different training data sizes on the mortality
prediction task. There are 10,898 samples in total in the
training data. In the experiment, we randomly select 10%,
20%, ..., 100% of the 10,898 samples to train our model.
Comparative results are shown in Fig. 9. From this figure, we
observe that the performance of our RNN-DAE method
in terms of AUC, accuracy, and F1-score increases
markedly when the training data increases from 10% to
30%. When the training data size continues to increase,
the AUC reaches a plateau, but the
accuracy and F1-score continue to rise until the
training data size reaches 60%. These observations
suggest that our proposed RNN-DAE
method is robust to the training data size. That is, our RNN-DAE model is able to achieve
comparable results even when only part of the training data is used
for training.
Fig. 8 Comparative results of patient representation vectors with different dimensions
Fig. 9 The effect of different training data sizes

Conclusions
We present an effective patient representation learning
method for time-series prediction tasks in real-world EHR
data. With the help of our patient representation learning
method, predictive descriptors called “Deep Features”
can be derived from the EHR data. Our proposed
patient representation learning method uses a recurrent
neural network based denoising autoencoder (RNN-DAE)
to encode time-series information. Our proposed RNN-DAE
method is able to capture hierarchical regularities,
dependencies, and time series information in the data
to create a compact, general-purpose set of patient features
that can be effectively used in predictive clinical
time series tasks. Based on the real-world heart failure
EHR data collected from the Shuguang Hospital, we
experimentally evaluate the effectiveness of our proposed
RNN-DAE method on both the mortality prediction task and
the comorbidity prediction task. In addition, we apply our
proposed RNN-DAE method to conduct patient similarity
analysis. Experimental results show that “Deep Features”
derived by our RNN-DAE method are consistently better
than those obtained by other feature learning methods
based on EHR data.
In future work, we plan to investigate possible
applications of our proposed RNN-DAE method to
other specific diseases, such as diabetes and colorectal
cancer, and to other clinical tasks, such as
personalized prescriptions and therapy recommendation.
Since the patients’ inpatient records in our dataset rarely
exceed 180 days, we did not consider window sizes
of more than 180 days in this paper. We plan to consider
window sizes over 180 days in the future.
Abbreviations
AF: Atrial fibrillation; AUC: The area under the curve; Bi-LSTM: Bidirectional long
short-term memory; CAD: Coronary artery disease; CDR: Clinical data repository;
CID: Cerebral infarction disease; COPD: Chronic obstructive pulmonary disease;
CRD: Chronic renal disease; DCM: Dilated cardiomyopathy; DM: Diabetes
mellitus; EHRs: Electronic health records; GBDT: Gradient boosting decision
tree; GMM: Gaussian mixture model; GRU: Gated recurrent unit; HCM:
Hypertrophic cardiomyopathy; HD: Hypertension disease; ICD-10: International
classification of disease version 10; k-means: k-means clustering; KNN: k-
nearest neighbor; LR: Logistic regression; LSTM: Long short-term memory; NB:
Naive Bayes; PCA: Principal component analysis; RF: Random forest; RNN-AE:
RNN based autoencoder model without Gaussian noise; RNN-DAE: Recurrent
neural network based denoising autoencoder; SDAs: Stacked denoising
autoencoders; SVM: Support vector machine; VHD: Valvular heart disease
Acknowledgements
We express deepest gratitude to all friends enrolled in our research for their
invaluable efforts and contribution relating to the experiment. We also would
like to thank the reviewers for their useful comments and suggestions.
About this supplement
This article has been published as part of BMC Medical informatics and Decision
Making Volume 19 Supplement 8, 2019: Selected articles from the IEEE BIBM
International Conference on Bioinformatics & Biomedicine (BIBM) 2018: medical
informatics and decision making (part 2). The full contents of the supplement
are available online at https://bmcmedinformdecismak.biomedcentral.com/
articles/supplements/volume-19-supplement-8.
Authors’ contributions
LL, LZ and YZ designed the research and performed the experimental analysis,
LL and LZ wrote the manuscript; JG and PH provided the EHR data. YZ, TR and
JZ supervised and supported the research; YZ and LL substantively revised the
manuscript. All authors have read and approved the final manuscript.
Funding
Publication costs were funded by the National Natural Science Foundation of
China under Grant 61772201, the National Key R&D Program of China for
“Precision Medical Research” under Grant 2018YFC0910500, the National
Major Scientific and Technological Special Project for “Significant New Drugs
Development” under Grant 2018ZX09201008, the Shanghai Sailing Program
under Grant 19YF1412400, the Special Fund Project for “Shanghai
Informatization Development in Big Data” under Grant 201901043, and the
Network Teaching and Educational Research Project under Grant WJY2016012.
Availability of data and materials
Our datasets are not publicly available due to a concern to protect individual
patient confidentiality but they are available from the corresponding author
on reasonable request.
Ethics approval and consent to participate
Not applicable.
Consent for publication
We obtained the consent to publish their clinical data from the patients in this
study.
Competing interests
The authors declare that they have no competing interests.
Author details
1School of Information Science and Engineering, East China University of
Science and Technology, 130 Meilong Road, 200237 Shanghai, China.
2Shanghai Hospital Development Center, 2 Kangding Road, 200000 Shanghai,
China. 3Shuguang Hospital Affiliated to Shanghai University of Traditional
Chinese Medicine, 528 Zhangheng Road, 201203 Shanghai, China.
Published: 17 December 2019
References
1. Wang Q, Qiu J, Zhou Y, Ruan T, Gao D, Gao J. Automatic severity
classification of coronary artery disease via recurrent capsule network. In:
2018 IEEE International Conference on Bioinformatics and Biomedicine
(BIBM). IEEE; 2018. p. 1587–94. https://doi.org/10.1109/bibm.2018.
8621136.
2. Allyn J, Allou N, Augustin P, Philip I, Martinet O, Belghiti M,
Provenchere S, Montravers P, Ferdynus C. A comparison of a machine
learning model with euroscore II in predicting mortality after elective
cardiac surgery: a decision curve analysis. PLoS ONE. 2017;12(1):0169772.
3. Sharafoddini A, Dubin JA, Lee J. Patient similarity in prediction models
based on health data: a scoping review. JMIR Med Inform. 2017;5(1):.
https://doi.org/10.2196/medinform.6730.
4. Cheng Y, Wang F, Zhang P, Hu J. Risk prediction with electronic health
records: A deep learning approach. In: Proceedings of the 2016 SIAM
International Conference on Data Mining. SIAM; 2016. p. 432–40. https://
doi.org/10.1137/1.9781611974348.49.
5. Zhang J, Wang Q, Zhang Z, Zhou Y, Ye Q, Zhang H, Qiu J, He P. An
effective standardization method for the lab indicators in regional
medical health platform using n-grams and stacking. In: 2018 IEEE
International Conference on Bioinformatics and Biomedicine (BIBM). IEEE;
2018. p. 1602–9. https://doi.org/10.1109/bibm.2018.8621274.
6. Mikolov T, Sutskever I, Chen K, Corrado GS, Dean J. Distributed
representations of words and phrases and their compositionality. In:
Advances in Neural Information Processing Systems. Curran Associates,
Inc.; 2013. p. 3111–9. http://papers.nips.cc/paper/5021-
distributedrepresentations-of-words-and-phrases-andtheir-
compositionality.
7. Le Q, Mikolov T. Distributed representations of sentences and
documents. In: International Conference on Machine Learning. JMLR.org;
2014. p. 1188–96. http://proceedings.mlr.press/v32/le14.html.
8. Peters M, Neumann M, Iyyer M, Gardner M, Clark C, Lee K, Zettlemoyer
L. Deep contextualized word representations. In: Proceedings of the 2018
Conference of the North American Chapter of the Association for
Computational Linguistics: Human Language Technologies, Volume 1
(Long Papers). Association for Computational Linguistics; 2018. p.
2227–37. https://www.aclweb.org/anthology/N18-1202/.
9. Devlin J, Chang M.-W., Lee K, Toutanova K. Bert: Pre-training of deep
bidirectional transformers for language understanding. 2018. arXiv
preprint arXiv:1810.04805.
10. Choi E, Bahadori MT, Searles E, Coffey C, Thompson M, Bost J,
Tejedor-Sojo J, Sun J. Multi-layer representation learning for medical
concepts. In: Proceedings of the 22nd ACM SIGKDD International
Conference on Knowledge Discovery and Data Mining. ACM; 2016. p.
1495–504. https://doi.org/10.1145/2939672.2939823.
11. Nguyen P, Tran T, Wickramasinghe N, Venkatesh S. Deepr: A
convolutional net for medical records. IEEE J Biomed Health Inform.
2017;21(1):22–30.
12. Zhou C, Jia Y, Motani M, Chew J. Learning deep representations from
heterogeneous patient data for predictive diagnosis. In: Proceedings of
the 8th ACM International Conference on Bioinformatics, Computational
Biology, and Health Informatics. ACM; 2017. p. 115–23. https://doi.org/10.
1145/3107411.3107433.
13. Vincent P, Larochelle H, Bengio Y, Manzagol P-A. Extracting and
composing robust features with denoising autoencoders. In: Proceedings
of the 25th International Conference on Machine Learning. ACM; 2008. p.
1096–103. https://doi.org/10.1145/1390156.1390294.
14. Lei L, Zhou Y, Zhai J, Zhang L, Fang Z, He P, Gao J. An effective patient
representation learning for time-series prediction tasks based on EHRs. In:
2018 IEEE International Conference on Bioinformatics and Biomedicine
(BIBM). IEEE; 2018. p. 885–92. https://doi.org/10.1109/bibm.2018.8621542.
15. Purusothaman G, Krishnakumari P. A survey of data mining techniques
on risk prediction: Heart disease. Indian J Sci Technol. 2015;8(12):. https://
doi.org/10.17485/ijst/2015/v8i12/58385.
16. Lee J, Maslove DM, Dubin JA. Personalized mortality prediction driven by
electronic medical data and a patient similarity metric. PLoS ONE.
2015;10(5):0127428.
17. Panahiazar M, Taslimitehrani V, Pereira N, Pathak J. Using ehrs and
machine learning for heart failure survival analysis. Stud Health Technol
Inform. 2015;216:40–44.
18. Wu J, Roy J, Stewart WF. Prediction modeling using ehr data: challenges,
strategies, and a comparison of machine learning approaches. Med Care.
2010106–13. https://doi.org/10.1097/mlr.0b013e3181de9e17.
19. Choi E, Schuetz A, Stewart WF, Sun J. Using recurrent neural network
models for early detection of heart failure onset. J Am Med Inform Assoc.
2016;24(2):361–70.
20. Lipton ZC, Kale DC, Elkan C, Wetzel R. Learning to diagnose with lstm
recurrent neural networks. 2015. arXiv preprint arXiv:1511.03677.
21. Minarro-Giménez JA, Marin-Alonso O, Samwald M. Exploring the
application of deep learning techniques on medical text corpora. Stud
Health Technol Inform. 2014;205:584–8.
22. Choi E, Schuetz A, Stewart WF, Sun J. Medical concept representation
learning from electronic health records and its application on heart failure
prediction. 2016. arXiv preprint arXiv:1602.03686.
23. Cui L, Xie X, Shen Z. Prediction task guided representation learning of
medical codes in ehr. J Biomed Inform. 2018;84:1–10.
24. Deepika S, Geetha T. A meta-learning framework using representation
learning to predict drug-drug interaction. J Biomed Inform. 2018;84:
136–47.
25. Miotto R, Li L, Kidd BA, Dudley JT. Deep patient: an unsupervised
representation to predict the future of patients from the electronic health
records. Sci Rep. 2016;6:26094.
26. Sushil M, Šuster S, Luyckx K, Daelemans W. Patient representation
learning and interpretable evaluation using clinical notes. J Biomed
Inform. 2018;84:103–13.
27. Zhang J, Kowsari K, Harrison JH, Lobo JM, Barnes LE. Patient2vec: A
personalized interpretable deep representation of the longitudinal
electronic health record. IEEE Access. 2018;6:65333–46.
28. Werbos PJ. Backpropagation through time: what it does and how to do it.
Proc IEEE. 1990;78(10):1550–60.
29. Rumelhart DE, Hinton GE, Williams RJ. Learning representations by
back-propagating errors. Nature. 1986;323(6088):533–6.
30. Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput.
1997;9(8):1735–80.
31. Bengio Y, Simard P, Frasconi P. Learning long-term dependencies with
gradient descent is difficult. IEEE Trans Neural Netw. 1994;5(2):157–66.
32. Pascanu R, Mikolov T, Bengio Y. On the difficulty of training recurrent
neural networks. In: International Conference on Machine Learning.
JMLR.org; 2013. p. 1310–8. http://proceedings.mlr.press/v28/pascanu13.
html.
33. Chung J, Gulcehre C, Cho K, Bengio Y. Empirical evaluation of gated
recurrent neural networks on sequence modeling. In: NIPS 2014
Workshop on Deep Learning, December 2014; 2014. CoRR abs/1412.3555.
http://arxiv.org/abs/1412.3555.
34. Jin B, Che C, Liu Z, Zhang S, Yin X, Wei X. Predicting the risk of heart
failure with EHR sequential data modeling. IEEE Access. 2018;6:9256–61.
35. Dai AM, Le QV. Semi-supervised sequence learning. In: Advances in
Neural Information Processing Systems. Curran Associates, Inc.; 2015. p.
3079–3087. http://papers.nips.cc/paper/5949-semisupervised-sequence-
learning.
36. Zhou Y, Liu Y, Gao X-Z, Qiu G. A label ranking method based on
gaussian mixture model. Knowl-Based Syst. 2014;72:108–13.
37. Liu Y, Zhou Y, Wen S, Tang C. A strategy on selecting performance
metrics for classifier evaluation. International Journal of Mobile
Computing and Multimedia Communications (IJMCMC). 2014;6(4):20–35.
38. Zhou Y, Liu Y. Correlation analysis of performance metrics for classifier. In:
Decision Making and Soft Computing: Proceedings of the 11th
International FLINS Conference; 2014. p. 487–92. World Scientific. https://
doi.org/10.1142/9789814619998_0081.
39. More A. Survey of resampling techniques for improving classification
performance in unbalanced datasets. 2016. arXiv preprint
arXiv:1608.06048.
40. Sun J, Wang F, Hu J, Ebadollahi S. Supervised patient similarity measure
of heterogeneous patient records. ACM SIGKDD Explor Newsl. 2012;14(1):
16–24.
41. Chan L, Chan T, Cheng L, Mak W. Machine learning of patient similarity:
A case study on predicting survival in cancer patient after locoregional
chemotherapy. In: 2010 IEEE International Conference on Bioinformatics
and Biomedicine Workshops (BIBMW). IEEE; 2010. p. 467–70. https://doi.
org/10.1109/bibmw.2010.5703846.
42. Zhang P, Wang F, Hu J, Sorrentino R. Towards personalized medicine:
leveraging patient similarity and drug similarity analytics. AMIA Summits
Transl Sci Proc. 2014;2014:132–6.
43. Maaten Lvd, Hinton G. Visualizing data using t-sne. J Mach Learn Res.
2008;9(Nov):2579–605.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in
published maps and institutional affiliations.
Content courtesy of Springer Nature, terms of use apply. Rights reserved.
1.
2.
3.
4.
5.
6.
Terms and Conditions
Springer Nature journal content, brought to you courtesy of Springer Nature Customer Service Center GmbH (“Springer Nature”).
Springer Nature supports a reasonable amount of sharing of research papers by authors, subscribers and authorised users (“Users”), for small-
scale personal, non-commercial use provided that all copyright, trade and service marks and other proprietary notices are maintained. By
accessing, sharing, receiving or otherwise using the Springer Nature journal content you agree to these terms of use (“Terms”). For these
purposes, Springer Nature considers academic use (by researchers and students) to be non-commercial.
These Terms are supplementary and will apply in addition to any applicable website terms and conditions, a relevant site licence or a personal
subscription. These Terms will prevail over any conflict or ambiguity with regards to the relevant terms, a site licence or a personal subscription
(to the extent of the conflict or ambiguity only). For Creative Commons-licensed articles, the terms of the Creative Commons license used will
apply.
... The identified data preprocessing techniques address various aspects such as numerical data, categorical data, data cleaning, and data shuffling. Some studies (n = 7, 15%) performed categorization by converting exact ages into intervals and clinical measurements into categories like high, normal, and low, based on clinical evaluation standards 33,36,40,46,55,56,62. When maintaining the numerical nature of data, missing value imputation 30,52,59 and value normalization 33,38,44,59 have also been employed. ...
... These models are predominantly characterized by simple architectures which are easy to train. Some studies employ shallow neural networks such as linear layer 23,59,61,63 , logistic regression (LR) (n = 8, 17%), and support vector machines (SVM) 31,33 . Models that can capture more complex data patterns such as feedforward neural networks (n = 12, 26%) and RNN 13,37-39,43,53 (n = 6, 13%), are also applied. ...
Preprint
The widespread adoption of Electronic Health Records (EHRs) and deep learning, particularly through Self-Supervised Representation Learning (SSRL) for categorical data, has transformed clinical decision-making. This scoping review, following PRISMA-ScR guidelines, examines 46 studies published from January 2019 to April 2024 across databases including PubMed, MEDLINE, Embase, ACM, and Web of Science, focusing on SSRL for unlabeled categorical EHR data. The review systematically assesses research trends in building efficient representations for medical tasks, identifying major trends in model families: Transformer-based (43%), Autoencoder-based (28%), and Graph Neural Network-based (17%) models. The analysis highlights scenarios where healthcare institutions can leverage or develop SSRL technologies. It also addresses current limitations in assessing the impact of these technologies and identifies research opportunities to enhance their influence on clinical practice.
... In the age of big data, electronic health records (EHR) have become paramount in both clinical practice and research [1]. One subset of EHR that has been of interest to both clinicians and researchers is time series data, often measured in laboratory values and vital signs, that can be applied to track disease trajectory from every few minutes to the span of years [2]. Compared to cross-sectional (static) data, which remains constant, time series offer more information on patient development over time. ...
... $QL(y, \hat{y}, q) = q \max(0, y - \hat{y}) + (1 - q) \max(0, \hat{y} - y)$ (2), where $V$ is the set of target variables, $\Omega$ is the training dataset, and $Q$ is the set of quantiles $\{0.1, 0.5, 0.9\}$ across future time points $t$ to $t_{\max}$. We use the notation $I(x)$ as the indicator function with value 1 when $x$ is true and 0 otherwise. ...
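As a rough illustration of the quantile loss quoted in the excerpt above, the per-sample term can be sketched in a few lines of Python. The function name `quantile_loss` and the input check are assumptions made for this sketch and do not come from the cited work. The excerpt elides how the loss is aggregated over the target variables, quantiles and time points, so that step is not shown.

```python
def quantile_loss(y: float, y_hat: float, q: float) -> float:
    """Per-sample quantile (pinball) loss for target y, prediction y_hat, quantile q.

    Implements q * max(0, y - y_hat) + (1 - q) * max(0, y_hat - y).
    The name and the range check on q are illustrative assumptions.
    """
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    under = max(0.0, y - y_hat)  # prediction fell below the target
    over = max(0.0, y_hat - y)   # prediction exceeded the target
    return q * under + (1.0 - q) * over


# For a high quantile such as q = 0.9, under-prediction costs more than
# over-prediction of the same size.
loss_under = quantile_loss(10.0, 8.0, 0.9)
loss_over = quantile_loss(10.0, 12.0, 0.9)
```

With `q = 0.5` the two branches are weighted equally, so the loss reduces to half the absolute error.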
Preprint
Trajectory forecasting in healthcare data has been an important area of research in precision care and clinical integration for computational methods. In recent years, generative AI models have demonstrated promising results in capturing short and long range dependencies in time series data. While these models have also been applied in healthcare, most of them only predict one value at a time, which is unrealistic in a clinical setting where multiple measures are taken at once. In this work, we extend the framework temporal fusion transformer (TFT), a multi-horizon time series prediction tool, and propose TFT-multi, an end-to-end framework that can predict multiple vital trajectories simultaneously. We apply TFT-multi to forecast 5 vital signs recorded in the intensive care unit: blood pressure, pulse, SpO2, temperature and respiratory rate. We hypothesize that by jointly predicting these measures, which are often correlated with one another, we can make more accurate predictions, especially in variables with large missingness. We validate our model on the public MIMIC dataset and an independent institutional dataset, and demonstrate that this approach outperforms state-of-the-art univariate prediction tools including the original TFT and Prophet, as well as vector regression modeling for multivariate prediction. Furthermore, we perform a study case analysis by applying our pipeline to forecast blood pressure changes in response to actual and hypothetical pressor administration.
... Many machine learning methods applied to clinical data-sets have shown success by utilising recurrent neural network model structures to model time-series data. [23][24][25] However, most of our data-sets were derived from annual censuses that were not sufficiently granular to merit a time-series analysis. Prior research has demonstrated the labels of a psychiatry diagnosis are insufficient for modelling since psychiatric diseases are often heterogeneous, multifactorial and highly comorbid. ...
Article
Background: Rates of childhood mental health problems are increasing in the UK. Early identification of childhood mental health problems is challenging but critical to children’s future psychosocial development. This is particularly important for children with social care contact because earlier identification can facilitate earlier intervention. Clinical prediction tools could improve these early intervention efforts. Aims: Characterise a novel cohort consisting of children in social care and develop effective machine learning models for prediction of childhood mental health problems. Method: We used linked, de-identified data from the Secure Anonymised Information Linkage Databank to create a cohort of 26 820 children in Wales, UK, receiving social care services. Integrating health, social care and education data, we developed several machine learning models aimed at predicting childhood mental health problems. We assessed the performance, interpretability and fairness of these models. Results: Risk factors strongly associated with childhood mental health problems included age, substance misuse and being a looked after child. The best-performing model, a gradient boosting classifier, achieved an area under the receiver operating characteristic curve of 0.75 (95% CI 0.73–0.78). Assessments of algorithmic fairness showed potential biases within these models. Conclusions: Machine learning performance on this prediction task was promising. Predictive performance in social care settings can be bolstered by linking diverse routinely collected data-sets, making available a range of heterogenous risk factors relating to clinical, social and environmental exposures.
... In finance, for example, Xie and Sheng [23] employed PCA and autoencoders to reduce the dimensionality of stock market data, which led to enhanced forecasting performance in deep learning models. In healthcare, dimensionality reduction methods have been utilized to analyze multivariate physiological signals, resulting in improved predictions of patient outcomes [27], [28]. Similarly, in climate science, these techniques have been instrumental in simplifying complex, high-dimensional data, enabling more accurate weather pattern forecasts [29]. ...
Article
Time series analysis is a critical task across various scientific and industrial domains, enabling the extraction of valuable insights from temporal data. High dimensionality of time series data can lead to computational inefficiencies and increased complexity. Dimensionality reduction techniques play a crucial role in handling the high-dimensional nature of time series data, preserving essential information while reducing computational complexity. This study explores the impact of various dimensionality reduction techniques and machine learning models on enhancing the accuracy and efficiency of time series datasets. The effectiveness of different dimensionality reduction methods is evaluated based on their ability to simplify data while preserving essential features. Subsequently, several machine learning models, such as Autoregressive Integrated Moving Average, k-Nearest Neighbors, Random Forest, Least Absolute Shrinkage and Selection Operator, and Extreme Gradient Boosting, are applied to the transformed data. The findings reveal that the choice of dimensionality reduction technique significantly influences the performance of these models. Certain methods excel at uncovering underlying patterns and improving predictive accuracy, while others offer computational advantages. These results highlight the importance of selecting an appropriate combination of dimensionality reduction and machine learning techniques based on the specific characteristics of the time series data. This contributes to a deeper understanding of how these methods can be effectively integrated, thereby enhancing decision-making in areas such as finance, meteorology, and operational planning.
... Many ML methods applied to clinical datasets have shown success by utilising Recurrent Neural Network (RNN) model structures to model time-series data (18)(19)(20). However, the majority of our datasets are derived from annual censuses that are not sufficiently granular to merit a time-series analysis. ...
Preprint
[Update: This paper is now published as a peer reviewed version here: https://doi.org/10.1192/bjo.2025.32. The manuscript changed significantly following peer-review so please see this updated version rather than the pre-print].
Article
Individual health trajectory forecasting is a major opportunity for computational methods to integrate with precision healthcare. Recently developed generative AI models have demonstrated promising results in capturing short and long range dependencies in time series data. While these models have also been applied in healthcare, most state-of-the-art are local models, i.e. one model per feature, which is unrealistic in a clinical setting where multiple measures are taken at once. In this work, we extend the framework temporal fusion transformer (TFT), a multi-horizon time series prediction tool, and propose TFT-multi, a global model that can predict multiple vital trajectories simultaneously. We apply TFT-multi to forecast 5 vital signs recorded in the intensive care unit: blood pressure, pulse, SpO2, temperature and respiratory rate. We hypothesize that by jointly predicting these measures, which are often correlated with one another, we can make more accurate predictions, especially in variables with large missingness. We validate our model on the public MIMIC dataset and an independent institutional dataset, and demonstrate our model’s competitive performance and computational efficiency compared to state-of-the-art prediction tools. Furthermore, we perform a study case analysis by applying our pipeline to forecast blood pressure changes in response to actual and hypothetical pressor administration.
Article
Background The use of structured electronic health records in health care systems has grown rapidly. These systems collect huge amounts of patient information, including diagnosis codes representing temporal medical history. Sequential diagnostic information has proven valuable for predicting patient outcomes. However, the extent to which these types of data have been incorporated into deep learning (DL) models has not been examined. Objective This systematic review aims to describe the use of sequential diagnostic data in DL models, specifically to understand how these data are integrated, whether sample size improves performance, and whether the identified models are generalizable. Methods Relevant studies published up to May 15, 2023, were identified using 4 databases: PubMed, Embase, IEEE Xplore, and Web of Science. We included all studies using DL algorithms trained on sequential diagnosis codes to predict patient outcomes. We excluded review articles and non–peer-reviewed papers. We evaluated the following aspects in the included papers: DL techniques, characteristics of the dataset, prediction tasks, performance evaluation, generalizability, and explainability. We also assessed the risk of bias and applicability of the studies using the Prediction Model Study Risk of Bias Assessment Tool (PROBAST). We used the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) checklist to report our findings. Results Of the 740 identified papers, 84 (11.4%) met the eligibility criteria. Publications in this area increased yearly. Recurrent neural networks (and their derivatives; 47/84, 56%) and transformers (22/84, 26%) were the most commonly used architectures in DL-based models. Most studies (45/84, 54%) presented their input features as sequences of visit embeddings. Medications (38/84, 45%) were the most common additional feature. 
Of the 128 predictive outcome tasks, the most frequent was next-visit diagnosis (n=30, 23%), followed by heart failure (n=18, 14%) and mortality (n=17, 13%). Only 7 (8%) of the 84 studies evaluated their models in terms of generalizability. A positive correlation was observed between training sample size and model performance (area under the receiver operating characteristic curve; P=.02). However, 59 (70%) of the 84 studies had a high risk of bias. Conclusions The application of DL for advanced modeling of sequential medical codes has demonstrated remarkable promise in predicting patient outcomes. The main limitation of this study was the heterogeneity of methods and outcomes. However, our analysis found that using multiple types of features, integrating time intervals, and including larger sample sizes were generally related to an improved predictive performance. This review also highlights that very few studies (7/84, 8%) reported on challenges related to generalizability and less than half (38/84, 45%) of the studies reported on challenges related to explainability. Addressing these shortcomings will be instrumental in unlocking the full potential of DL for enhancing health care outcomes and patient care. Trial Registration PROSPERO CRD42018112161; https://tinyurl.com/yc6h9rwu
Conference Paper
Deep learning has increasingly been used to model electronic health records (EHR) data for a wide range of medical analysis, such as clinical risk prediction. Existing methods have focused on patient representation learning from a single graph view. In real clinical reasoning scenarios, it is a common practice to integrate information from different patient-level features (e.g., demographics, vital signs, diagnoses, procedures, and lab tests) to derive a patient health context, which can naturally be mapped to deep learning with multiple graphs generated from the patient-level features. Confronting the challenge of learning patient representations in clinical risk prediction, we present a new Multi-Graph Fusion Framework for patient representation learning, which learns multiple graph structures from input patient-level features and, in turn, generates an optimal graph structure that incorporates the characteristics of these graphs with attention mechanisms. Our method further aggregates the information from similar patients to offer a rich representation of the patient, which allows extraction of patient health context for missing data imputation and clinical risk prediction. Evaluation using two real-world EHR databases demonstrates the effectiveness and superiority of our method over competitive baselines.
Preprint
BACKGROUND There has been a rapid growth in the application of structured Electronic Health Records (EHRs) to healthcare systems, where huge amounts of diagnosis codes presenting the temporal event of the patient are collected. In the era of artificial intelligence, many models, especially Deep Learning (DL), are applied for patient outcome prediction. This systematic review aimed to identify DL models developed for sequential diagnosis codes for patient outcome prediction. OBJECTIVE The main objective of this systematic review is to identify and summarise existing DL studies predicting patient outcome using sequences of diagnosis codes, as a key part of their predictors. Additionally, this study also investigates the challenge of generalisability and explainability of the predictive models. METHODS In this review, we identified all relevant studies by using the following four databases: PubMed, Embase, IEEE Xplore, and Web of Science. After that, we evaluated the included papers in various aspects: Deep learning techniques, characteristics of the dataset, prediction tasks, performance evaluation, generalizability, and explainability. We also assessed the risk of bias (PROBAST) and the concern of applicability. RESULTS In this review, 84 papers met the eligibility criteria and were selected, which showed the growing trend in this research area. Recurrent neural networks (RNN) (and their derivatives) (n = 47; 57.3%) and Transformers (n = 22; 26.8%) were the most popular architectures in DL-based models. Most studies present their input feature as sequence of visit embedding (n = 45; 53.6%). For the prediction tasks, the most common one is next visit diagnosis (n = 30; 23.4%), followed by heart failure (18; 14.1%), and mortality (n = 17; 13.3%). Only 7 studies evaluated their models in terms of generalisability. A positive correlation was observed between training sample size and model performance (AUROC) (p-value < 0.05). 
However, about 70% of included studies were found to have high risk of bias. CONCLUSIONS The application of deep learning in sequence of diagnosis has demonstrated remarkable promise in predicting patient outcomes. Using multiple types of features and integration of time intervals was found to improve the predictive performance. Addressing challenges related to generalisation and explainability will be instrumental in unlocking the full potential of DL for enhancing healthcare outcomes and patient care. CLINICALTRIAL This review was registered on PROSPERO (CRD42023434032).
Conference Paper
Coronary artery disease (CAD) is one of the leading causes of cardiovascular disease deaths. CAD progresses rapidly and, if not diagnosed and treated at an early stage, may eventually lead to irreversible death of the heart muscle. Invasive coronary arteriography is the gold standard technique for CAD diagnosis. Coronary arteriography texts describe in detail which part has stenosis and how severe the stenosis is, which is crucial for classifying the severity of CAD. In this paper, we employ a recurrent capsule network (RCN) to extract semantic relations between clinical named entities in Chinese coronary arteriography texts, through which we can automatically find the maximal stenosis for each lumen and infer how severe CAD is according to the improved Gensini method. Experimental results on the corpus collected from Shanghai Shuguang Hospital show that our proposed method achieves an accuracy of 97.0% in the severity classification of CAD.
Conference Paper
Since 2008, a regional medical health platform has been built for managing electronic health records of top public hospitals in Shanghai. However, public hospitals often use different names for the same laboratory examination item (or lab indicator) in this regional platform, which seriously hinders the interconnection and sharing of medical information among hospitals. In this paper, we propose an effective method to standardize the lab indicators using n-gram features and a Stacking mechanism. Our proposed method sequentially combines a clustering model and a binary classification model. More specifically, we first cluster the lab indicators based on character uni-gram similarity distances to reduce the alignment scale, and then leverage a binary classification algorithm through the Stacking mechanism, based on character n-gram similarity features, to generate candidate-standard indicator pairs iteratively. Experimental studies on the clinical data collected from eight top public hospitals in Shanghai show that our proposed method achieves an F1-score of 88.43% in the final binary classification, which is highly competitive with baseline methods.
Conference Paper
Electronic Health Records (EHRs) provide possibilities to improve patient care and facilitate clinical research. However, there are many challenges faced by the applications of EHRs, such as temporality, high dimensionality, sparseness, noise, random error, and systematic bias. In particular, temporal patient information is difficult for traditional machine learning methods to use effectively, even though the sequential information of EHRs is very useful. In this paper, we propose a general-purpose patient representation learning approach to summarize sequential EHRs. Specifically, a recurrent neural network based denoising autoencoder is employed to encode the in-hospital records of each patient into a low-dimensional dense vector. Based on EHR data collected from Shanghai Shuguang Hospital, we experimentally evaluate our proposed method on both mortality prediction and comorbidity prediction tasks. Experimental studies show that our proposed method outperforms other reference methods based on raw EHR data. We also apply the “Deep Feature” represented by our method to track similar patients with t-SNE, which also achieves interesting results.
Article
The wide implementation of electronic health record (EHR) systems facilitates the collection of large-scale health data from real clinical settings. Despite the significant increase in adoption of EHR systems, this data remains largely unexplored, but presents a rich data source for knowledge discovery from patient health histories in tasks such as understanding disease correlations and predicting health outcomes. However, the heterogeneity, sparsity, noise, and bias in this data present many complex challenges. This complexity makes it difficult to translate potentially relevant information into machine learning algorithms. In this paper, we propose a computational framework, Patient2Vec, to learn an interpretable deep representation of longitudinal EHR data which is personalized for each patient. To evaluate this approach, we apply it to the prediction of future hospitalizations using real EHR data and compare its predictive performance with baseline methods. Patient2Vec produces a vector space with meaningful structure and it achieves an AUC around 0.799 outperforming baseline methods. In the end, the learned feature importance can be visualized and interpreted at both the individual and population levels to bring clinical insights.
Article
We have three contributions in this work: 1. We explore the utility of a stacked denoising autoencoder and a paragraph vector model to learn task-independent dense patient representations directly from clinical notes. To analyze if these representations are transferable across tasks, we evaluate them in multiple supervised setups to predict patient mortality, primary diagnostic and procedural category, and gender. We compare their performance with sparse representations obtained from a bag-of-words model. We observe that the learned generalized representations significantly outperform the sparse representations when we have few positive instances to learn from, and there is an absence of strong lexical features. 2. We compare the model performance of the feature set constructed from a bag of words to that obtained from medical concepts. In the latter case, concepts represent problems, treatments, and tests. We find that concept identification does not improve the classification performance. 3. We propose novel techniques to facilitate model interpretability. To understand and interpret the representations, we explore the best encoded features within the patient representations obtained from the autoencoder model. Further, we calculate feature sensitivity across two networks to identify the most significant input features for different classification tasks when we use these pretrained representations as the supervised input. We successfully extract the most influential features for the pipeline using this technique.
Article
We introduce a new type of deep contextualized word representation that models both (1) complex characteristics of word use (e.g., syntax and semantics), and (2) how these uses vary across linguistic contexts (i.e., to model polysemy). Our word vectors are learned functions of the internal states of a deep bidirectional language model (biLM), which is pre-trained on a large text corpus. We show that these representations can be easily added to existing models and significantly improve the state of the art across six challenging NLP problems, including question answering, textual entailment and sentiment analysis. We also present an analysis showing that exposing the deep internals of the pre-trained network is crucial, allowing downstream models to mix different types of semi-supervision signals.
Article
Electronic health records (EHRs) contain patient diagnostic records, physician records, and records of hospital departments. For heart failure, we can obtain mass unstructured data from EHR time series. By analyzing and mining these time-based EHRs, we can identify the links between diagnostic events and ultimately predict when a patient will be diagnosed. However, it is difficult to use the existing EHR data directly because they are sparse and non-standardized. Thus, this paper proposes an effective and robust architecture for heart failure prediction. The main contribution of this paper is to predict heart failure using a neural network (i.e., to predict the possibility of cardiac illness based on a patient’s electronic medical data). Specifically, we employed one-hot encoding and word vectors to model the diagnosis events and predicted heart failure events using the basic principles of a long short-term memory network (LSTM) model. Evaluations based on a real-world dataset demonstrate the promising utility and efficacy of the proposed architecture in the prediction of the risk of heart failure.
Article
Motivation: Predicting Drug-Drug Interaction (DDI) has become a crucial step in the drug discovery and development process, owing to the rise in the number of drugs co-administered with other drugs. Consequently, the usage of computational methods for DDI prediction can greatly help in reducing the costs of in vitro experiments done during the drug development process. With lots of emergent data sources that describe the properties and relationships between drugs and drug-related entities (gene, protein, disease, and side effects), an integrated approach that uses multiple data sources would be most effective. Method: We propose a semi-supervised learning framework which utilizes representation learning, positive-unlabeled (PU) learning and meta-learning efficiently to predict the drug interactions. Information from multiple data sources is used to create feature networks, which are used to learn the meta-knowledge about the DDIs. Given that DDIs have only positive labeled data, a PU learning-based classifier is used to generate meta-knowledge from feature networks. Finally, a meta-classifier that combines the predicted probability of interaction from the meta-knowledge learnt is designed. Results: Node2vec, a network representation learning method, and bagging SVM, a PU learning algorithm, are used in this work. Representation learning and PU learning improve the performance of the system by 22% and 12.7% respectively. The meta-classifier performs better and predicts more reliable DDIs than the base classifiers.
Article
There have been rapidly growing applications using machine learning models for predictive analytics in Electronic Health Records (EHR) to improve the quality of hospital services and the efficiency of healthcare resource utilization. A fundamental and crucial step in developing such models is to convert medical codes in EHR to feature vectors. These medical codes are used to represent diagnoses or procedures. Their vector representations have a tremendous impact on the performance of machine learning models. Recently, some researchers have utilized representation learning methods from Natural Language Processing (NLP) to learn vector representations of medical codes. However, most previous approaches are unsupervised, i.e. the generation of medical code vectors is independent from prediction tasks. Thus, the obtained feature vectors may be inappropriate for a specific prediction task. Moreover, unsupervised methods often require a lot of samples to obtain reliable results, but most practical problems have very limited patient samples. In this paper, we develop a new method called Prediction Task Guided Health Record Aggregation (PTGHRA), which aggregates health records guided by prediction tasks, to construct training corpus for various representation learning models. Compared with unsupervised approaches, representation learning models integrated with PTGHRA yield a significant improvement in predictive capability of generated medical code vectors, especially for limited training samples.