Multivariate Sequential Analytics for
Cardiovascular Disease Event Prediction
William Hsu1, Jim Warren1, Patricia Riddle1
1School of Computer Science, University of Auckland, Auckland,
New Zealand
Methods Inf Med 2022;61:e149–e171.
Address for correspondence William Hsu, PhD, School of Computer
Science, University of Auckland, Private Bag 92019, Auckland 1142,
New Zealand (e-mail: whsu014@aucklanduni.ac.nz).
Abstract
Background Automated clinical decision support for risk assessment is a powerful tool in combating cardiovascular disease (CVD), enabling targeted early intervention that could avoid issues of overtreatment or undertreatment. However, current CVD risk prediction models use observations at baseline without explicitly representing patient history as a time series.
Objective The aim of this study is to examine whether event prediction may be improved by explicitly modelling the temporal dimension of patient history.
Methods This study investigates methods for multivariate sequential modelling with a particular emphasis on long short-term memory (LSTM) recurrent neural networks. Data from a CVD decision support tool is linked to routinely collected national datasets including pharmaceutical dispensing, hospitalization, laboratory test results, and deaths. The study uses a 2-year observation and a 5-year prediction window. Selected methods are applied to the linked dataset. The experiments performed focus on CVD event prediction. CVD death or hospitalization in a 5-year interval was predicted for patients with a history of lipid-lowering therapy.
Results The results of the experiments showed temporal models are valuable for CVD event prediction over a 5-year interval. This is especially the case for LSTM, which produced the best predictive performance among all models compared, achieving an AUROC of 0.801 and average precision of 0.425. The non-temporal model comparator, a ridge classifier (RC) trained using all quarterly data or by aggregating quarterly data (averaging time-varying features), was highly competitive, achieving an AUROC of 0.799 and average precision of 0.420, and an AUROC of 0.800 and average precision of 0.421, respectively.
Conclusion This study provides evidence that the use of deep temporal models, particularly LSTM, in clinical decision support for chronic disease would be advantageous, with LSTM significantly improving on commonly used regression models such as logistic regression and Cox proportional hazards on the task of CVD event prediction.

Keywords: cardiovascular disease, event prediction, machine learning, deep learning

Introduction
A powerful tool in combating cardiovascular disease (CVD) is automated clinical decision support for risk assessment. This is particularly valuable in identifying at-risk patients for initiating risk communication and management. Numerous efforts have sought to advance CVD risk prediction to better identify and manage populations at risk. These include the Systematic
received
March 2, 2022
accepted after revision
August 25, 2022
DOI https://doi.org/
10.1055/s-0042-1758687.
ISSN 0026-1270.
© 2022. The Author(s).
This is an open access article published by Thieme under the terms of the
Creative Commons Attribution-NonCommercial-NoDerivatives License,
permitting copying and reproduction so long as the original work is given
appropriate credit. Contents may not be used for commercial purposes, or
adapted, remixed, transformed or built upon. (https://creativecommons.org/
licenses/by-nc-nd/4.0/)
Georg Thieme Verlag KG, Rüdigerstraße 14, 70469 Stuttgart,
Germany
THIEME
Original Article e149
Article published online: 2022-12-23
COronary Risk Evaluation (SCORE),1,2 the Pooled cohort equations,3 and in New Zealand the PREDICT equations.4 Recently, research has also advanced the prediction of long-term risk of recurrent CVD events, as improvements in disease management have contributed to a growing number of patients with established CVD in the community.5 Modern risk assessment tools use statistical methods to identify vulnerable patients and quantify their level of risk.6 For patients who are identified as high risk, an array of interventions is available to reduce the level of risk as well as to prevent an acute CVD event. These include adopting lifestyle changes (e.g., smoking cessation, regular exercise), pharmacological therapy, and closer monitoring (e.g., more frequent risk assessments).6 A CVD event is the prediction outcome of paramount clinical interest due to its high cost to health care systems (hospitalizations and rehabilitation), associated disability-adjusted life years burden, and patient mortality.7 The ability to accurately predict CVD events within a population enables targeted early intervention that could avoid issues of overtreatment or undertreatment in the population.4
All current CVD prediction models use predictors at baseline. The central question that the current study seeks to investigate is whether CVD risk prediction may be improved by including an observation window leading up to the baseline, thus accounting for patient history. Additionally, in this study, we focus on lipid management. TC/HDL (the ratio of total cholesterol to high-density lipoprotein) is a known important CVD risk factor.8–10 In New Zealand, clinical guidelines recommend that patients assessed to have a 5-year CVD risk of 15% or more use lipid-lowering pharmacotherapy to reduce the risk of a CVD event or death.6 Further, despite the strong evidence of the benefits of preventive medicine, non-adherence to medication is a long-standing challenge in health care delivery and presents a significant obstacle to patients benefiting from treatment.11 Both international and New Zealand studies have found long-term adherence to statins (a class of lipid-lowering drugs) to be low.12,13 In New Zealand, adherence to statins in secondary prevention has been found to be 69 and 76% in the first year, dropping to 66% in the third year. For primary prevention, adherence to statins was found to be 63% in the first year.14,15 A U.S. study found non-adherence to statins to be as high as 56.0% for secondary prevention patients and 56.4% for primary prevention patients.16 Similarly, a United Kingdom-based study found patterns of discontinuation of treatment for 41% of patients using statins as secondary prevention and 47% of patients using statins as primary prevention, although many of these patients restarted their treatment following discontinuation (75 and 72%, respectively).17 The current study hypothesizes that by integrating the temporal dynamics of TC/HDL levels and adherence to lipid-lowering therapy, the prediction of CVD risk can be improved. This hypothesis informs our cohort selection criteria, which are detailed in Section Cohort Selection.
In the domain of health care, over a period of years, aided by government efforts, there has been growing uptake of electronic health record (EHR) systems. In New Zealand, government initiatives in the 1990s supported development of health IT infrastructure, including creation of a national health index (NHI), providing the sector with a unique individual patient identifier; implementing a health information privacy code; and actively encouraging the private sector to develop and sell electronic services.18 In the United States, in the wake of the Global Financial Crisis, massive growth in EHR uptake was driven by the HITECH Act.19 Of particular interest to this study are EHRs that are routinely collected. These data are often the byproduct of health care services and, in socialized health care systems such as New Zealand's, tend to have whole-of-population coverage. When linked across various datasets, they have a longitudinal structure, allowing treatment and disease trajectories (e.g., patients' physiological changes) to be examined over time.20
The present resurgence of deep learning in the machine learning community is chiefly facilitated by advances in computational power, specifically graphics processing units (GPUs), and the increasing availability of enormous datasets. Many of the notable breakthroughs in the application of deep learning are in the areas of computer vision and natural language processing: image classification, object detection, machine translation, and natural language generation.21,22 A shared feature of these tasks is the use of unstructured data (images or plain text), where deep learning models' capacity for representation learning is exploited. In the domain of health care, computer vision has achieved some of the most significant successes in the application of deep learning. Here, medical image analysis, often using convolutional neural networks, has achieved levels of performance on par with or exceeding human experts on a range of complex diagnosis tasks.23–27 However, the performance gain of deep learning methods against conventional machine learning methods on structured/tabulated data, the type of data that is ubiquitous in EHRs, is less certain.28–30
Deep learning/neural networks (NNs) overcome some of the limitations of regression-based models. Deep learning models can jointly exploit feature interactions and hierarchy.31 Of specific interest to this study is the class of artificial NNs called recurrent neural networks (RNNs), which are temporal models that are explicitly multivariate and sequential. In the context of risk prediction in public health, RNNs afford the opportunity for patient history to be modeled in a temporal manner, in contrast to conventional risk modelling where risk assessment is based on patient data at a specific point in time. Here, the temporal dynamic relationships between risk factors are integrated into the risk assessment. A variant of RNNs called LSTM includes an internal cell state and gated units that regulate what is inputted, retained, and outputted from the cell. LSTM was developed to overcome the problem of long-range dependencies (remembering significant events from the distant past)32 and has the capacity to reset its internal state33 (forget unimportant events in the past). Since its development, LSTM-based methods have proven remarkably competitive on a range of tasks34–39 and have been successfully applied to a range of sequential tasks in the biomedical domain.40–42
Given the long-term nature of CVD progression and CVD management, this study hypothesizes that LSTM will be well suited for CVD event prediction, where an observation window of patient history is integrated into the prediction task.
Methods of Information in Medicine Vol. 61 No. S2/2022 © 2022. The Author(s).
Cardiovascular Disease Event Prediction Hsu et al. e150
Vascular informatics using epidemiology and the web (VIEW) is a vascular health research program based at the University of Auckland; the program includes a research stream named PREDICT.43 For the current study, the PREDICT dataset is linked to other routinely collected national datasets including pharmaceutical dispensing, hospitalization, laboratory test results, and deaths, to investigate methods for multivariate sequential modelling in the context of CVD risk prediction. From the data linkage, features that have clinical feasibility are derived. The study focuses on a cohort with lipid management.
Objective
This study is motivated to investigate whether risk prediction performance in CVD can be improved if temporal deep learning methods are utilized, specifically in a context where structured/tabulated data are used. The long short-term memory (LSTM) model appears to be an excellent fit to the problem of chronic disease risk prediction and thus is central to our investigation. LSTM allows patient history to be explicitly modeled in a multivariate and sequential fashion, where internal mechanisms of the unit control the content of its memory. As such, LSTM should be well suited for prediction tasks where the progression and management of a disease are prolonged and long term. Of particular interest to the current study is the relevance of the temporal dynamics of lipid management. We hypothesize that patient history over time in the 2-year run-up to PREDICT assessment will be informative for CVD risk prediction.
Our study compares LSTM against several model comparators. The models are selected to assess: the consequence of explicitly modelling time through the use of sequential data; the usefulness of learning long-term dependencies and "forgetting" facilitated by the LSTM units; the advantages of modelling non-linear relationships in the predictor variables; and the benefits of overcoming the problem of multicollinearity for the task of CVD event prediction, against traditional risk models used in clinical decision support.
Methods
Data Sources
PREDICT is a web-based CVD risk assessment and management decision support system developed for primary care in New Zealand. The system is integrated with general practice EHRs and since its deployment in 2002 has produced a constantly growing cohort of CVD risk profiles. Through the use of encrypted NHIs, the de-identified cohort is annually linked to other routinely collected databases to produce a research cohort. The PREDICT cohort and its use in improving CVD risk assessment have been described in detail previously.4,44
The current study links the PREDICT cohort to TestSafe (Auckland regional laboratory test results45) and national collections by the Ministry of Health: the Pharmaceutical Collection, the National Minimum Dataset (hospital events), and the Mortality Collection.46 TestSafe is used to obtain laboratory test results of clinically relevant measures (see next section). The Pharmaceutical Collection is used to obtain the dispensing history of medication relevant to the management of CVD, including lipid-lowering, blood pressure lowering, antiplatelet, and anticoagulant drugs, as well as dispensings of drugs used in the management of important comorbidities, e.g., insulin. The National Minimum Dataset (NMDS) is used to identify hospitalizations with their dates of admission and discharge and diagnoses. The Mortality Collection enables the identification of patients who died during the study period and their cause of death. From these sources, history of CVD, treatment trajectories, and important comorbidities, as well as CVD events, can be derived.
A lookup table constructed by the VIEW research team is used to identify relevant chemical names from the Pharmaceutical Collection. Identified chemical names using this lookup table are grouped into three broad categories: lipid-lowering, CVD, and other. Similarly, a lookup table constructed by the VIEW research team is used to identify ICD-10 codes in the hospitalization collection that are related to CVD conditions; more specifically, the International Statistical Classification of Diseases and Related Health Problems, Tenth Revision, Australian Modification (ICD-10-AM), which was used in New Zealand from 1999 to 2019.47 The conditions are broadly in two categories, history and outcome, with the addition of mortality. For the list of the CVD conditions and their respective categories see Appendix Table 1 in the Appendix. For the definitions of listed conditions see https://wiki.auckland.ac.nz/display/VIEW/Complete+Variable+Names+Index.
Laboratory Tests
Through TestSafe, records of high-density lipoproteins (HDL), low-density lipoproteins (LDL), triglycerides (TRI), total cholesterol (TCL), cholesterol ratio (TC/HDL), serum creatinine (sCr), and glycated hemoglobin (HbA1c) are obtained.45 TC/HDL is the ratio of TCL divided by HDL. TCL is calculated by48:

TCL = LDL + HDL + TRI/2.2 (concentrations in mmol/L)

sCr is a measure used to determine the health of a patient's kidneys. However, an individual's sCr level can vary depending on one's sex, age, ethnicity, and body size. A more precise measure for determining an individual's kidney health is the estimated glomerular filtration rate (eGFR),49 which is estimated for every sCr laboratory test in the TestSafe record. HbA1c measures the glucose level in an individual's blood; it is used for diabetes diagnoses and to assess long-term glucose control for patients diagnosed with diabetes.50 Patients with kidney disease or diabetes have significantly increased CVD risk.6
TestSafe Feature Construction
The measures from TestSafe are irregularly sampled. For TC/HDL, some patients might have one test over the period of 2 years, while others might have three tests in one quarter. To construct time series from TestSafe, values from tests are linearly interpolated and extrapolated over the study period. The method connects a straight line between any two adjacent data points within the study window. If no feature value exists before the first and/or after the last feature value, the first/last feature values are linearly extrapolated. Linear extrapolation uses the first/last value of a feature and sets all values of that feature before/after to that value. Laboratory tests generally occur intermittently within a patient's history; however, for intervals without a measure for lipids, HbA1c, or eGFR, it does not mean these biometric measures cease to exist (drop to zero) in these intervals. Experiments were conducted exploring spline interpolation as a potential method for interpolating between feature values within a study window. However, the variability in when measures are taken meant spline interpolation could potentially introduce extreme values that are biologically implausible. It was decided that interpolating and extrapolating linearly offers the most parsimonious explanation of a patient's biometric trajectory without introducing extreme values. In addition, auxiliary features TEST, TESTED, and DIED are constructed; these are binary time series indicating whether the patient had a cholesterol test in this quarter (encompassing HDL, LDL, TRI, TCL, and TC/HDL), whether the patient has ever had a cholesterol test, and whether the patient has died, respectively. Using TC/HDL as an example, the rules used in constructing the cholesterol time series are illustrated in Figs. 1 and 2.
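The interpolation/extrapolation rule above can be sketched briefly. This is an illustrative sketch, not the study's code: the helper name and the integer quarter grid are assumptions, and NumPy's `np.interp` happens to implement exactly the stated rule (linear between observed results, with the first/last value held flat outside the observed range).

```python
import numpy as np

def quarterly_series(test_times, test_values, n_quarters=8):
    """Resample irregularly timed lab results onto a quarterly grid:
    linear interpolation between adjacent results, with the first/last
    observed value held flat before/after the observed range."""
    grid = np.arange(n_quarters, dtype=float)  # quarter indices t_0..t_7
    # np.interp is linear inside [min(t), max(t)] and flat outside it,
    # matching the interpolation/extrapolation rule described above
    return np.interp(grid, test_times, test_values)

# e.g. TC/HDL results of 4.0 at quarter 2 and 3.0 at quarter 6
series = quarterly_series([2.0, 6.0], [4.0, 3.0])
```

Spline interpolation could be swapped in by replacing `np.interp`, but, as noted above, it risks biologically implausible extremes between sparse observations.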
Figs. 1 and 2 show examples of TC/HDL, TEST, TESTED, and DIED time series. TC/HDL laboratory test results and their interpolated and extrapolated values are represented by orange dots and orange lines, respectively. TC/HDL values are point estimates, representing where the TC/HDL line intersects with the blue dotted line at t_i. TEST, TESTED, and DIED are binary indicators. TEST values are evaluated over an interval, between t_{i-1} (exclusive) and t_i (inclusive). If the patient has had any cholesterol test within this interval, the value of TEST would be 1, otherwise 0. For simplicity, the above examples comprise only TC/HDL tests in the study window. TESTED indicates whether the patient has ever had a cholesterol test and DIED indicates whether the patient has died.
Fig. 1 If TC/HDL test results outside the study window exist (before t_0 and after t_11), they are used in the interpolation. HDL, high-density lipoprotein; TC, total cholesterol.
Fig. 2 If TC/HDL laboratory test results outside the study window do not exist, the TC/HDL values are extrapolated from the first test result leftward and from the last test result rightward. HDL, high-density lipoprotein; TC, total cholesterol.
Laboratory test results of eGFR and HbA1c are treated
similarly in the construction of their respective time series.
Additional auxiliary time series of TEST_GFR, TEST_HBA1C,
TESTED_GFR, and TESTED_HBA1C are also included as time-
series features.
Pharmaceutical Dispense
A lookup table constructed by the VIEW research team is used to identify relevant categories of medications. The categories constructed by VIEW are lipid_lowering, statins, bp_lowering, antiplatelets, anticoagulants, antianginals, loop diuretics, anti-diabetes, insulin, metformin, other_oralhypos, metolazone, ppi_h2a, corticosteroid, and nonasp_nsaids. Identified chemical names using this lookup table are grouped into three broad categories: lipid-lowering (comprising lipid_lowering and statins medications), CVD (comprising bp_lowering, antiplatelets, anticoagulants, antianginals, loop diuretics, and metolazone medications), and other (comprising anti-diabetes, insulin, metformin, other_oralhypos, ppi_h2a, corticosteroid, and nonasp_nsaids medications).
Data Cleansing
We calculate the proportion of days covered (PDC) as a percentage for pharmaceutical features. To do so, the field DAYS_SUPPLY is used to infer the number of days covered by a specific medication. However, anomalous values need to be addressed before PDC can be calculated. For each drug, if DAYS_SUPPLY is 0 or outside a range specific to that drug (there were no missing values in this field), the value of DAYS_SUPPLY is inferred using the value of QUANTITY_DISPENSED divided by the value of DAILY_DOSE, if these values are available. Otherwise, the most frequently occurring QUANTITY_DISPENSED and/or DAILY_DOSE for that drug is used in the calculation. Following this inference, if the value of DAYS_SUPPLY is still outside the range for this drug, we assign the most frequently occurring nonzero DAYS_SUPPLY value for this drug to DAYS_SUPPLY. With a few exceptions, all medications used a minimum of 7 and a maximum of 90 as the range for DAYS_SUPPLY.
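The repair cascade can be sketched as follows. This is a simplified, hypothetical rendering (it omits the intermediate fallback to the modal QUANTITY_DISPENSED and/or DAILY_DOSE); function and parameter names are illustrative:

```python
def clean_days_supply(days_supply, quantity, daily_dose,
                      modal_days_supply, lo=7, hi=90):
    """Simplified sketch of the DAYS_SUPPLY repair cascade. lo/hi is
    the per-drug valid range (7-90 for most drugs); modal_days_supply
    is the drug's most frequent nonzero DAYS_SUPPLY."""
    def valid(v):
        return v is not None and lo <= v <= hi

    if valid(days_supply):
        return days_supply                 # already plausible, keep it
    if quantity and daily_dose:            # infer days covered from dispensing
        inferred = quantity / daily_dose
        if valid(inferred):
            return inferred
    return modal_days_supply               # last resort: modal nonzero value

days = clean_days_supply(0, 180, 2, 30)   # anomalous 0 is repaired
```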
Insulin treatment and usage patterns are not ones where medication adherence can be reliably calculated from dispensing records through the variables available. In the vast majority of cases DAYS_SUPPLY is 0, and no sensible value could be derived from dividing QUANTITY_DISPENSED by DAILY_DOSE, as DAILY_DOSE is not measured in pill counts but in volume, e.g., mL. Additionally, insulins are covariates in our analysis, indicating the patient is managing the comorbidity of diabetes and an overall more complex health state. Therefore, it is important for the signal of insulin dispensing to be kept in the data, but it is not required to be of a value from which patient adherence to insulin can be measured. All DAYS_SUPPLY values for insulins are set to the most frequent nonzero QUANTITY_DISPENSED.
Pharmaceutical Collection Feature Construction
A PDC time series for each chemical name is constructed. It is common for patients to switch treatments in the lipid-lowering category. To address this, an extra PDC time series, bounded at 100 and representing PDC for all lipid-lowering medication, is added to the features.
Chemical names in the categories of CVD and other are treated as covariates. For these chemical names, we constructed a PDC time series for each name, where in the case of combined treatment we split the chemical name on the word "with" and construct a time series for each of the elements in the combined treatment.
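A day-level sketch of the quarterly PDC computation, assuming dispensings are given as (start day, days supplied) pairs on a day grid; this is illustrative, not the study's code. Counting each day at most once naturally bounds each quarter at 100%, mirroring the capped all-lipid-lowering series:

```python
def quarterly_pdc(dispensings, n_quarters=8, qdays=90):
    """PDC (%) per 90-day quarter from (start_day, days_supply) pairs.
    A day counts as covered at most once, even under overlapping
    dispensings, so each quarter's PDC is bounded at 100."""
    horizon = n_quarters * qdays
    covered = [0] * horizon
    for start, supply in dispensings:
        for d in range(max(0, start), min(start + supply, horizon)):
            covered[d] = 1
    return [100.0 * sum(covered[q * qdays:(q + 1) * qdays]) / qdays
            for q in range(n_quarters)]

# one 90-day dispensing at day 0, then a 45-day dispensing at day 90
pdc = quarterly_pdc([(0, 90), (90, 45)])
```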
Hospitalization Discharge
The NMDS contains hospitalization records including variables DIAG_TYP (diagnosis type), ADM_TYP (admission type), EVSTDATE (event start date), EVENDATE (event end date), and CLIN_CD_10 (ICD-10 code). There are four relevant DIAG_TYPs in the record51:

A. Principal diagnosis.
B. Other relevant diagnosis.
O. Operation/procedure.
E. External cause of injury.

Each admission can have up to 99 diagnosis/procedure codes, of which exactly one is of DIAG_TYP A, the principal diagnosis, with the remaining codes categorized by the other DIAG_TYPs. A list of the retired and current ADM_TYPs exists in the dataset51:

CURRENT
AA Arranged admission
AC Acute admission
AP Elective admission of a privately funded patient
RL Psychiatric patient returned from leave of more than 10 days
WN Admitted from DHB booking system (used to be known as waiting list)

RETIRED
ZA Arranged admission, ACC covered (retired June 30, 2004)
ZC Acute, ACC covered (retired June 30, 2004)
ZP Private, ACC covered (retired June 30, 2004)
ZW Waiting list, ACC covered (retired June 30, 2004)
WU Waiting list urgent (code not used from August 20, 1993)
A lookup table constructed by the VIEW research team is used to identify ICD-10 codes in the NMDS that are related to CVD conditions of interest. The conditions are broadly divided into two categories: history and outcome.
Hospitalization Discharge Feature Construction
Binary time series are constructed for all CVD conditions defined by the VIEW research team, including 21 CVD history, two CVD mortality, and 18 CVD outcome categories. Patients' NMDS records prior to the observation window/study period are searched for evidence of CVD history. If there exists a clinical code mapping to any of the CVD history categories, the corresponding time series will contain 1s, otherwise 0s.
All hospitalization records that fall within the study period are parsed. Any hospitalization record with a clinical code mapping to any CVD history category will switch the time series for that category from 0s to 1s from the time step in which the hospitalization event occurs and onward. Only clinical codes with DIAG_TYP A, O, and E are used to identify CVD mortalities and outcomes. If there exists a clinical code with DIAG_TYP A, O, or E mapping to one of the CVD mortality and/or outcome categories, the corresponding categories will be 1 in the time step(s) in which the record of the event falls.
In addition to the features constructed based on CVD conditions defined by VIEW, two time series, NUMBER_OF_DAYS and ACUTE_ADM, are constructed. NUMBER_OF_DAYS is the number of days within this time step (quarter) the patient was in hospital. The equation used to derive the value for the variable accounts for day patients by counting a same-day admission and discharge as one day (i.e., EVENDATE − EVSTDATE + 1, clipped to the quarter). ACUTE_ADM is a binary vector that has the value 1 if the event is an acute admission (holding the value of AC or ZC in ADM_TYP), otherwise 0.
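Since the exact equation did not survive extraction, the following is only a plausible sketch of the day-count rule, in which a same-day admission and discharge (a day patient) counts as one day; variable names follow the NMDS fields above:

```python
from datetime import date

def days_in_quarter(evstdate, evendate, q_start, q_end):
    """Days of a hospital stay (EVSTDATE..EVENDATE) falling within a
    quarter [q_start, q_end], counting a same-day admission and
    discharge as one day. A plausible reading, not the study's code."""
    start = max(evstdate, q_start)   # clip the stay to the quarter
    end = min(evendate, q_end)
    if end < start:
        return 0                     # stay does not touch this quarter
    return (end - start).days + 1    # +1 makes same-day stays count as 1

q_start, q_end = date(2010, 1, 1), date(2010, 3, 31)
n = days_in_quarter(date(2010, 3, 25), date(2010, 4, 10), q_start, q_end)
```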
Study Design
To investigate whether patients' CVD event prediction may be improved by the inclusion of patient history, a study design is formulated using each patient's PREDICT assessment as the index date, with approximately 2 years (8 × 90-day quarters) prior to the index date as the observation window and approximately 5 years (20 × 90-day quarters) after the index date as the prediction window (Fig. 3). An approximately 5-year interval for the prediction window is chosen because it aligns with Ministry of Health guidelines for CVD risk assessment and is underpinned by the facts that patients' CVD risk and risk management can change considerably over a longer period (e.g., 10 years), that most randomized controlled trials of CVD medications are based on a period of 5 years or less, and that practitioners are accustomed to this approach.6 An approximately 2-year interval for the observation window is chosen in the interest of retaining enough samples in the dataset.
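The window arithmetic can be sketched directly from the design; the index date used here is illustrative:

```python
from datetime import date, timedelta

QUARTER = timedelta(days=90)

def study_windows(index_date, obs_quarters=8, pred_quarters=20):
    """Window boundaries around the PREDICT assessment (index) date:
    8 x 90-day quarters of observation before it and 20 x 90-day
    quarters of prediction after it, per the design in Fig. 3."""
    return (index_date - obs_quarters * QUARTER,
            index_date + pred_quarters * QUARTER)

obs_start, pred_end = study_windows(date(2010, 6, 1))
```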
Cohort Selection
The study cohort was selected through several exclusion criteria. First, patients having their first PREDICT assessment prior to January 01, 2007 or after December 30, 2013 are excluded, as their pharmaceutical records are censored in the observation or prediction windows. Second, informed by our interest in integrating the temporal pattern of disease states, patients without all components of the lipid profile (HDL, LDL, TRI, TCL, and TC/HDL) in either the observation or prediction windows are excluded. Third, informed by our interest in integrating the temporal pattern of the disease management process, patients without lipid-lowering medication dispensed in the observation window, with a 2-week look-ahead post PREDICT assessment (to account for patients prescribed lipid-lowering medication around the time of PREDICT assessment), are excluded. Patients with infeasible data values and patients under the age of 18 are excluded. See Fig. 4 for the study cohort selection flowchart.
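The exclusion criteria can be sketched as a sequence of filters; the boolean columns here are illustrative stand-ins, not fields of the actual linked dataset:

```python
import pandas as pd

def select_cohort(df):
    """Apply the exclusion criteria in order (sketch)."""
    # first PREDICT assessment must fall inside the uncensored range
    df = df[(df["first_predict_date"] >= "2007-01-01")
            & (df["first_predict_date"] <= "2013-12-30")]
    df = df[df["has_full_lipid_profile"]]    # HDL, LDL, TRI, TCL, TC/HDL present
    df = df[df["lipid_lowering_dispensed"]]  # incl. the 2-week look-ahead
    df = df[df["age"] >= 18]                 # adults only
    return df
```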
Preprocessing
This subsection outlines the actions taken during preprocessing to address categorical variables, missing values, and data imbalance, and to remove erroneous data. During preprocessing, four samples were removed from the data because the value of the variable PT_DIABETES_YR was <0. If a sample's PT_DBP2 value is missing, the PT_DBP value is assigned to the PT_DBP2 variable (seven samples). The PREDICT variables PT_RENAL, which is ordinal, and PT_ATRIAL_FIBRILLATION, which is binary, have 0 assigned to missing values and all other values changed to +1. Missing PT_DIABETES_YR is assigned 0 (65,084 samples). Missing PT_EN_TCHDL is assigned the last TC/HDL result before PREDICT assessment from TestSafe (889 samples). SEX is encoded as a binary variable and ETHNICITY is one-hot encoded. Ethnicities MELAA (Middle Eastern, Latin American, and African; comprising only 1.5% of the New Zealand population52) and Other are excluded due to small sample size. Ethnicities Chinese and Other Asian are combined. This resulted in five ethnicity groups: European, Māori, Pacific, Chinese/Other Asian, and Indian. Samples missing PT_SMOKING (two samples) and PT_GEN_LIPID (one sample) are removed.
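The rules above can be sketched with pandas; this is an illustrative rendering of the described steps, not the study's code:

```python
import pandas as pd

def preprocess(df):
    """Sketch of the preprocessing rules for the PREDICT variables."""
    df = df.copy()
    # missing second diastolic BP falls back to the first reading
    df["PT_DBP2"] = df["PT_DBP2"].fillna(df["PT_DBP"])
    # missing diabetes duration is treated as 0
    df["PT_DIABETES_YR"] = df["PT_DIABETES_YR"].fillna(0)
    # ordinal/binary flags: missing -> 0, any recorded value -> +1
    for col in ("PT_RENAL", "PT_ATRIAL_FIBRILLATION"):
        df[col] = df[col].notna().astype(int)
    # one-hot encode ethnicity into the retained groups
    return pd.get_dummies(df, columns=["ETHNICITY"])
```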
Fig. 3 Study design showing date range from index date for the observation window (shaded in green) and the prediction window (shaded in red).
The above steps leaves 100,096 samples in the data. These
samplesare randomly shufed, then a test setof the last 10,096
samples is set aside. Using data not in the test set, linear
regression models were developed to impute missing HBA1C
and eGFR values in the entire dataset using AGE, SEX, NZDEP,
and ETHNICITY as predictor variables. See Appendix Table 2
in Appendix for the list of PREDICT variables and their descriptions.
See Appendix Table 3 in Appendix for the affected
variables, their conditions that require addressing, the actions
taken, and the number of affected cases.
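The demographic-based imputation of HBA1C and eGFR can be sketched as below. This is a minimal illustration, assuming a numeric design matrix built from AGE, SEX, NZDEP, and one-hot ETHNICITY; the function names are ours, not the study's. The key point is that the imputer is fitted only on non-test rows with observed lab values, then applied to the whole dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def fit_imputer(X_train, y_train):
    """Fit a linear model on rows with observed lab values (HBA1C or
    eGFR), using demographic predictors, on non-test data only."""
    mask = ~np.isnan(y_train)
    return LinearRegression().fit(X_train[mask], y_train[mask])

def impute(model, X, y):
    """Replace missing entries of y with the model's predictions."""
    y = y.copy()
    missing = np.isnan(y)
    if missing.any():
        y[missing] = model.predict(X[missing])
    return y
```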
Descriptive Statistics
Based on the study design outlined in the Study Design section
and the result of the cohort selection outlined in the Cohort
Selection section, quarterly time series based on 90-day quarters
are constructed for each patient in the cohort using the
linked data outlined in the Data Sources section. The features of
the data fall into eight categories: demographic, lipid profile,
lipid-lowering drugs, CVD drugs, other drugs, hospitalization,
HbA1c and eGFR, and PREDICT (i.e., other clinical variables
such as systolic blood pressure, diastolic blood pressure, and
smoking status collected at the same time as CVD risk assessment).
See Appendix Tables 4 to 8 in Appendix for the
features' descriptive statistics. Due to commercial sensitivity
of pharmaceutical data, the descriptive statistics of lipid-
lowering drugs, CVD drugs, and other drugs are not shown.
Test Data
An attribute of time series constructed through interpolation is
that the interpolated gradients afford the chance for data in the
observation window to peek ahead into data in the prediction
window. Obviously, this is strictly illegal in the task of forecasting
or prediction, because what the experiments are seeking to
quantify is how well the models can perform on these tasks
using only data up to the index date, hence peeking ahead
constitutes cheating. To avoid this problem, separate test data
are created that extrapolate from the last test value in the
observation window to the end of the observation window for
all the interpolated features (TestSafe tests: HDL, LDL, TRI, TCL,
TC/HDL, HbA1c, and eGFR). See Fig. 5 for an illustration of this
treatment. In all experiments, the TestSafe features used for
training are the unaltered interpolated time series, while the
separate extrapolated test data are used for testing to ensure no
peeking ahead occurs during testing.
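The leakage-free test-series construction can be sketched with NumPy's linear interpolation. This is our illustrative reconstruction, not the study's code; it assumes at least one measurement inside the observation window, and relies on `np.interp` holding boundary values constant outside the range of the supplied points.

```python
import numpy as np

def quarterly_series(times, values, n_q=8, flatten=False):
    """Build a quarterly lab-test series. `times` are measurement
    dates in quarters (they may extend past the observation window),
    `values` the results. With flatten=False the series is linearly
    interpolated through all points, so a post-window measurement can
    tilt the in-window tail (training data). With flatten=True,
    measurements beyond the window are dropped and the last in-window
    value is carried forward flat (test data), so no gradient can
    peek past the index date."""
    t = np.asarray(times, dtype=float)
    v = np.asarray(values, dtype=float)
    if flatten:
        keep = t < n_q          # drop measurements beyond the window
        t, v = t[keep], v[keep]
    # np.interp interpolates linearly between points and holds the
    # boundary values constant outside their range.
    return np.interp(np.arange(n_q, dtype=float), t, v)
```

With measurements at quarters 1, 6, and 10, the training series tilts upward between quarters 6 and 8 toward the future value, while the flattened test series holds the quarter-6 value.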
Prediction Outcome
The problem of CVD event prediction is formulated as a binary
classification task: predicting event or no event. In the
Fig. 4 Flowchart of study cohort selection.
Fig. 5 Test data are flattened beyond the last laboratory test result in the observation window to prevent looking ahead (laboratory test results
beyond the observation window influencing the gradient within the observation window). Here, the dots are the laboratory test measures, the
solid line is the constructed time series, and the dashed line represents the test data.
context of this study, the outcome of a CVD event (fatal or non-fatal)
is defined as having an acute hospital admission with the
ICD-10-AM code of the principal diagnosis matching one of the
CVD mortality or outcome categories defined by VIEW
(excluding atrial fibrillation, the feature OUT_ATRIAL_FIBRILLATION),
or a CVD-related death without hospitalization.
See Appendix Table 1 in Appendix for the set of CVD
categories. A PREDICT variable (PT_IMP_FATAL_CVD) is used
to identify all patients who died due to CVD. This feature
captures those who have CVD as a cause of death on their death
certificate, with or without hospitalization, as well as those
without CVD recorded on their death certificate but who had a
CVD hospital admission up to 28 days before their date of
death. The VIEW research group refers to this as the "28-day
rule" for reclassifying non-CVD death as CVD death.53
Of the 100,096 patients, 25,419 patients have prior history
of CVD, defined as having a hospital admission prior to their
PREDICT assessment date with an ICD-10-AM code matching
the "broad CVD history" category (HX_BROAD_CVD) defined
by VIEW. The remaining 74,677 patients are patients without
prior CVD. The proportions of each subcohort (with or without
prior CVD) having a CVD event and a fatal CVD event in their
prediction window are shown in Table 1.
Prediction Models
This study investigates the performance of LSTM against five
model comparators on the task of CVD event (fatal or non-
fatal) prediction. These model comparators are: simple
recurrent neural network (Simple RNN), multilayer percep-
tron (MLP), ridge classier (RC), logistic regression (LR), and
Cox proportional hazards model (Cox).
Conventionally, with the exception of the output layer,
MLP layers incorporate a non-linear activation, common
among which are sigmoid, tanh, and the more recently developed
rectified linear unit. It is the non-linear activation that
provides the expressive power of MLP. Even with only a single
hidden layer, an MLP can be universal (represent arbitrary
functions) under certain technical conditions.54 Increasing
the depth of the network allows the network to represent
complex functions more compactly. The hidden layer(s) of
MLP can be thought of as learning nonlinear feature map-
ping, transforming a nonlinearly separable representation of
the features to one that is linearly separable.54,55
Ridge regression and its classification variant RC are linear
models that address the problem of multicollinearity in the
predictor variables.56 The models are part of a family of
penalized regression models, including Lasso57 and Elastic
Net,58 that add a penalty to the loss. This penalty constrains
and shrinks the size of the model coefficients, which has a
regularization effect and prevents overfitting. For classification
problems, RC first modifies the binary response to −1 and 1
and then treats the task as a regression task, minimizing the
penalized residual sum of squares. The sign of the regressor's
prediction then represents the predicted class.59 Ridge
regression/classification has been shown to be a promising modelling
technique in the domain of epidemiology, particularly in
high dimensional settings where the number of features is
large, such as in genomic data analysis.60,61 As a comparatively
more interpretable model, it has been shown to be competitive
against black-box models such as support vector
machines and NN.62
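The RC mechanics just described can be verified in a few lines: scikit-learn's RidgeClassifier is reproduced by recoding the labels to ±1, solving the penalized least-squares normal equations on centered data, and taking the sign of the fit. The data and alpha here are arbitrary illustrations.

```python
import numpy as np
from sklearn.linear_model import RidgeClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = (X[:, 0] - 0.5 * X[:, 1] > 0).astype(int)   # binary 0/1 labels

alpha = 1.0
clf = RidgeClassifier(alpha=alpha).fit(X, y)

# By hand: recode labels to -1/+1, solve the penalized normal
# equations (X'X + alpha*I) w = X't on centered data, then take the
# sign of the fitted values as the predicted class.
t = np.where(y == 1, 1.0, -1.0)
Xc = X - X.mean(axis=0)            # RidgeClassifier fits an intercept
w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ t)
manual = (Xc @ w + t.mean() > 0).astype(int)
```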
LR is a statistical method for modelling the relationship
between one or more predictor variables and a dichotomous
response variable of the values 1 or 0. It is a function of the
odds ratio, and it models the proportion of new incidents
developed within a given period of time. Cox is a statistical
method for modelling the relationship between one or more
predictor variables and the amount of time to pass before an
occurrence of an event. It differs from LR by assessing a rate
instead of a proportion. Cox regression is a function of the
relative risk and it models the hazard rate, the number of new
incidents per population per unit time. Although penalized
LR and regularized Cox variations exist, here we are interested
in the utility of LR and Cox as widely used in traditional
clinical risk models,4,63,64 i.e., without regularization, in the
context of CVD event prediction. Their inclusion in the
investigation provides baselines for the prediction task.
The performance benefits of adding a penalty to linear
models are represented in our investigation of RC.
The input datasets for LSTM and Simple RNN are explicitly
sequential. The input datasets for MLP, RC, and LR are
flattened across the time step dimension and concatenated.
To examine the effect of multicollinearity as well as the effect
of using history on RC and LR, two other input datasets are
constructed. First, instead of concatenating the features
across multiple time steps, an input dataset is constructed
that uses the values of the last time step in the observation
window (quarter 8) for features that are invariable across
time (i.e., SEX, ETHNICITY, NZDEP) and the mean value of
features that are variable across time (i.e., TC/HDL, LL_SIM-
VASTATIN, HX_BROAD_CVD). Here, an exception is AGE
where the value at the 8th quarter is used. This dataset is
from here on referred to as aggregated. Second, an input
dataset is constructed using only the values of the last
quarter in the observation window. This dataset is from
here on referred to as last quarter. Due to the effect of
multicollinearity, only the aggregated and last quarter
datasets are used to evaluate Cox.
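The three non-sequential input constructions can be sketched from a (samples × quarters × features) array. This is our illustration, not the study's code; AGE is treated here as a "static" column pinned to its quarter-8 value, per the exception noted above.

```python
import numpy as np

def make_inputs(X_seq, static_idx, n_obs_q=8):
    """X_seq has shape (n_samples, n_quarters, n_features);
    static_idx marks time-invariant columns (SEX, ETHNICITY, NZDEP,
    plus AGE pinned to quarter 8)."""
    flat = X_seq.reshape(len(X_seq), -1)       # quarters concatenated
    last = X_seq[:, n_obs_q - 1, :]            # last quarter only
    agg = X_seq.mean(axis=1)                   # mean over quarters
    agg[:, static_idx] = last[:, static_idx]   # statics kept at Q8 value
    return flat, agg, last
```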
Table 1 Number of patients in the cohort with and without prior CVD and proportions of each respective subcohort that had a CVD
event and a fatal CVD event in their prediction window
                               n        CVD event                    Fatal CVD
Patients with prior CVD      25,419     7,242 (approximately 28%)    2,116 (approximately 8%)
Patients with no prior CVD   74,677     4,989 (approximately 7%)       882 (approximately 1%)
Abbreviation: CVD, cardiovascular disease.
All NN models used a two-unit densely connected layer
with softmax activation as the output layer. The unrolled
view across the time step dimension of the RNN models is
shown in Fig. 6.
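The unrolled view of Fig. 6 corresponds to the following forward pass, sketched in plain NumPy for clarity. The weights here are random placeholders; an LSTM cell would replace the tanh update with gated memory, but shares the same unrolled structure and two-unit softmax head.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def rnn_forward(X, Wx, Wh, b, Wo, bo):
    """Unrolled simple-RNN forward pass: the hidden state is carried
    across the n time steps and only the final state feeds the
    two-unit softmax output layer."""
    n_samples, n_steps, _ = X.shape
    h = np.zeros((n_samples, Wh.shape[0]))
    for t in range(n_steps):                  # unroll over quarters
        h = np.tanh(X[:, t, :] @ Wx + h @ Wh + b)
    return softmax(h @ Wo + bo)               # P(no event), P(event)
```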
Software Setup
Experiments are performed using Python 3.6.8,65 with NN
models using library Keras 2.2.466 with TensorFlow 1.13.167
backend and linear models RC and LR using library Scikit-learn
0.21.2.59 Experiments also used R version 3.6.0,
package pROC 1.16.268 for conducting DeLong's test and
packages survival 3.2.769 and pec 2019.11.370 for Cox
regression analysis. The package Autorank 1.1.1 is used
for comparing models' performance as measured by average
precision.71
Procedures for Hyperparameter Search
This section outlines the procedures performed to search for
the optimal set of hyperparameters for the LSTM, Simple
RNN, and MLP models. From the entire dataset, 10,096
samples are set aside as the test set and removed from the
search process. The remaining data (90,000 samples) are
used in the search process. For each combination of
hyperparameters, a five-fold cross validation is performed:
the proportions of data used for the train and validation
sets are kept consistent, with 90% train (81,000 samples) and 10%
validation (9,000 samples), while a different train/validation
split is used in each fold. See Fig. 7 for
a visual illustration of how the data are split into train and
validation sets across the five folds. In these experiments we
use categorical cross-entropy as loss, where the validation
loss is monitored and the lowest mean validation loss is used
to determine the best set of hyperparameters.
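The search procedure can be sketched generically as below. The `evaluate` callable, standing in for training an NN and returning its validation categorical cross-entropy, is our placeholder; the rotating contiguous validation blocks mirror the splits illustrated in Fig. 7.

```python
import numpy as np

def fold_indices(n, k=5, val_frac=0.1):
    """Rotating train/validation splits: each fold holds out a
    different contiguous val_frac block for validation."""
    idx = np.arange(n)
    v = int(n * val_frac)
    for fold in range(k):
        val = idx[fold * v:(fold + 1) * v]
        train = np.concatenate([idx[:fold * v], idx[(fold + 1) * v:]])
        yield train, val

def grid_search(grid, evaluate, n):
    """Pick the hyperparameter combination with the lowest mean
    validation loss across the folds."""
    best, best_loss = None, np.inf
    for params in grid:
        losses = [evaluate(params, tr, va) for tr, va in fold_indices(n)]
        if np.mean(losses) < best_loss:
            best, best_loss = params, np.mean(losses)
    return best, best_loss
```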
For all experiments, the optimizer ADAM72 is used due
to its capacity to adaptively adjust the learning rate during
the training process and because its default hyperparameters
have been shown to work on a range of problems.
The ADAM optimizer is used with the default hyperparameter
values outlined in the original paper.72 These
hyperparameter values are: learning rate α = 0.001, the
exponential decay rate for the first moment estimate β1 = 0.9,
the exponential decay rate for the second moment
estimate β2 = 0.999, and the small constant for numerical
stability ε = 1e-7.66
See Table 2 for the found optimal hyperparameters of
the NN models.
For RC, hyperparameter search for the L2 regularization
parameter and assessment of model performance on the
validation set is done at the same time using the data split
shown in Fold 1 in Fig. 7. Here, the values 1e-6, 1e-5, 1e-4,
1e-3, 1e-2, 0.1, 1, and 10.0 are searched. The found optimal L2
values and their respective accuracy on the validation set are
shown in Table 3, where the value of L2 is estimated using
the training samples, and the accuracies reported are calculated
using the validation set.
Multicollinearity and Cox
When fitting the Cox model, several features returned a
coefficient of "NA: unknown". These features were removed
from the analysis to ensure predictions from the model
could be made. For Cox (aggregated), seven features were
removed. For Cox (last quarter), nine features were removed.
See Appendix Table 9 in Appendix for the removed
features.
Fig. 6 An unrolled view of RNN across the time-step dimension. Here,
RNN can be a layer of Simple RNN or LSTM. NN is a layer of densely
connected NN with softmax activation. X_n are the inputs across n
timesteps. ŷ is the output. LSTM, long short-term memory; NN, neural
networks; RNN, recurrent neural network.
Fig. 7 Illustration of the procedure used in splitting data into test,
train, and validation sets across different folds.
Assess Model Performance
Once the optimal hyperparameters for each NN model have
been found, the models are trained using the found hyperparameters
with the data split shown in Fold 1 in Fig. 7. The
test set that is held aside is then used to assess model
performance. To ensure fairness, all linear models RC, LR,
and Cox are trained using the same training samples in Fold 1
and use the same test samples to measure model performance.
For LR and Cox, the samples from the validation set
are simply set aside in the process of model fitting and
assessing model performance.
Taking into consideration the skewness of the classes (i.e.,
having a CVD event in the prediction window is much less
frequent than not having one), a further set of experiments
is conducted to address class imbalance. For the NN models,
sample weighting is utilized that balances the two classes by
weighting each sample inversely proportionally to its class
frequency in the training set. Sample weighting scales the
loss function during training; here the less frequent class's
samples are given more weight, thus contributing greater
loss.73,74 The same weighting is applied to the classes for the
RC and LR models.
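The inverse-class-frequency weighting can be sketched as below. The exact normalization, n_samples / (n_classes × class count), is our assumption: it is the common "balanced" heuristic, under which the weights of each class sum to the same total.

```python
import numpy as np

def balanced_weights(y):
    """Per-sample weights inversely proportional to class frequency,
    so the rarer event class contributes more to the training loss."""
    classes, counts = np.unique(y, return_counts=True)
    w = len(y) / (len(classes) * counts)     # one weight per class
    return w[np.searchsorted(classes, y)]    # broadcast to samples
```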
The analysis uses AUROC and average precision as metrics
for assessing model performance. Average precision is a
summary statistic for a precision-recall (PR) curve. PR curves
can provide more discerning information when the dataset is
highly imbalanced.75,76 Recall (the x-axis of the PR curve) is
defined as

Recall = TP / (TP + FN)

and precision (the y-axis of the PR curve) is defined as

Precision = TP / (TP + FP)

where TP, FP, and FN are the numbers of true positives, false
positives, and false negatives. In PR space, the position (1, 1)
represents perfect discrimination, as opposed to (0, 1) in ROC
space; the closer the curve is to this point, the better the
discriminatory power of the model. A horizontal line at
P / (P + N), where P and N are the positive class and negative
class frequencies, represents a no-skill classifier. The no-skill
classifier is equivalent to the classifier always predicting the
minority class.77 Average precision is a more conservative
measure than calculating the AUC with the trapezoidal rule.
The average precision is formally defined as

AP = Σ_n (R_n - R_{n-1}) P_n

where P_n and R_n are precision and recall at the nth
threshold.59 Our experiments use a large number of thresholds
(equal to the size of the test set), so the difference is likely to
be small.
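The step-wise AP sum, as implemented in scikit-learn (which the experiments use), can be reproduced by hand from the PR operating points. The toy scores below are ours, chosen so the result is easy to verify: the positives rank 1st and 3rd, giving AP = 0.5·1 + 0.5·(2/3) = 5/6.

```python
import numpy as np
from sklearn.metrics import average_precision_score, precision_recall_curve

# Toy predictions: two positives, two negatives (illustrative data).
y_true = np.array([1, 0, 1, 0])
y_score = np.array([0.9, 0.8, 0.7, 0.6])

# Library value of AP.
ap = average_precision_score(y_true, y_score)

# Same sum by hand: AP = sum_n (R_n - R_{n-1}) * P_n, with P_n taken
# at the higher-recall end of each recall step (no interpolation).
prec, rec, _ = precision_recall_curve(y_true, y_score)
# precision_recall_curve returns points from high to low recall,
# ending at the (P=1, R=0) anchor, so reverse before differencing.
ap_manual = float(np.sum(np.diff(rec[::-1]) * prec[::-1][1:]))
```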
DeLong's test is used to statistically compare the resulting
AUROC of each model's predictions. Currently, there is no
known significance test for comparing two PR curves.78,79 To
compare the performance of models in PR space, the evaluation
utilizes bootstrapping to sample 100 × 10,000 dependent
samples of models' predictions. From 100 equal
splits of the sampled predictions, 100 average precision
scores are calculated for each model. The resulting average
precision scores are evaluated using the Autorank package.71
The Autorank package is built for conducting statistical
comparison between (multiple) paired populations. The
package uses the guidelines described in Demšar80 to first
assess data normality and homoscedasticity before selecting
the appropriate statistical test for comparison.
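The bootstrap step can be sketched as follows; the Autorank comparison itself is omitted, and the resample size and positive-count guard are our simplifications. The key property is that the resamples are paired: every model is scored on the same bootstrap indices, yielding paired average precision distributions for the downstream rank test.

```python
import numpy as np
from sklearn.metrics import average_precision_score

def bootstrap_ap(y_true, scores, n_boot=100, seed=0):
    """Resample the test set with replacement and recompute each
    model's average precision, yielding paired score distributions
    suitable for a repeated-measures comparison (e.g., via Autorank)."""
    rng = np.random.default_rng(seed)
    n = len(y_true)
    out = {name: [] for name in scores}
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)       # paired resample across models
        while y_true[idx].sum() == 0:     # AP needs at least one positive
            idx = rng.integers(0, n, n)
        for name, s in scores.items():
            out[name].append(average_precision_score(y_true[idx], s[idx]))
    return {k: np.asarray(v) for k, v in out.items()}
```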
Finally, to ascertain that the improvement in predictive
performance is the result of integrating patient history, an
ablation study using one-quarter and four-quarter observation
windows is conducted using LSTM. The resulting two
models' predictive performance is then compared with that of
the LSTM trained on an eight-quarter observation window.
Results
The results of the models' performance on the test set are
shown in Table 4. The best performing models' ROC curves
and PR curves (with or without sample/class weighting) are
shown in Figs. 8 and 9. In Fig. 10, details of the PR curves
are shown (with the same mapping of line colors to
classifiers as in Figs. 8 and 9).
Table 2 NN model hyperparameters for the CVD event prediction experiment

Models        Hyperparameters
LSTM          Layers: 1 LSTM and 1 Dense
              Units: 32 (LSTM) and 2 (Dense)
              Batch size: 16,384
              L2: 6.422e-2
              Loss: categorical cross-entropy
              Epochs: 200
Simple RNN    Layers: 1 Simple RNN and 1 Dense
              Units: 4 (Simple RNN) and 2 (Dense)
              Batch size: 8,192
              L2: 1.318e-1
              Loss: categorical cross-entropy
              Epochs: 200
MLP           Layers: 3 Dense and 2 Dropout
              Units: 32, 32, 2
              Batch size: 64
              Dropout rate: 2.500e-1 (Layer 1), 2.500e-1 (Layer 2)
              Loss: categorical cross-entropy
              Epochs: 50

Abbreviations: CVD, cardiovascular disease; LSTM, long short-term
memory; MLP, multilayer perceptron; RNN, recurrent neural network.
Table 3 Optimal L2 values found for ridge classifiers for CVD event
prediction and their respective accuracy on the validation set

                    L2     Accuracy
RC                  1.0    0.886
RC (aggregated)     0.1    0.889
RC (last quarter)   0.1    0.887

Abbreviations: CVD, cardiovascular disease; RC, ridge classifier.
The significance level of 0.05 is used for the comparison of
models' AUROC. See Table 5 for the results of DeLong's tests.
The same significance level is used for the comparison of
bootstrapped average precision scores using Autorank. The
internal evaluation using the Shapiro-Wilk test and Bartlett's
test showed the data from all models are normal and
homoscedastic. For that reason, repeated measures ANOVA
and Tukey's HSD test are used to determine if a significant
difference of the mean exists between the models' average
precision scores and which differences are of statistical
significance. See Fig. 11 for the mean and 95.0% confidence
interval of the models' average precision scores. The result of
the analysis shows that no significant differences were found
within the groups: RC (aggregated), RC, LR (aggregated), and
Simple RNN; RC, LR (aggregated), Simple RNN, and Cox
(aggregated); LR (aggregated), Simple RNN, Cox (aggregated),
and RC (last quarter); Simple RNN, Cox (aggregated), RC (last
quarter), and MLP; Cox (aggregated), RC (last quarter), MLP,
and LR (last quarter); LR (last quarter), LR, and Cox (last
quarter). However, all other differences are of statistical
significance.
Lastly, the results of the ablation study are shown
in Fig. 12.
Discussion
The results of the CVD event prediction experiment show
that, using average precision, LSTM is the overall leader (0.425) in
this prediction task, with RC (aggregated) and LR (aggregated)
Table 4 Model performance on CVD event prediction

                     Without weighting             With weighting
Model                AUROC   Average precision     AUROC   Average precision
LSTM                 0.801   0.425a                0.800   0.423
Simple RNN           0.798   0.402                 0.795   0.418a
MLP                  0.797   0.415a                0.798   0.414
RC                   0.799   0.420a                0.798   0.409
RC (aggregated)      0.800   0.421a                0.798   0.410
RC (last quarter)    0.794   0.417a                0.794   0.400
LR                   0.798   0.411a                0.798   0.409
LR (aggregated)      0.801   0.421                 0.802   0.421a
LR (last quarter)    0.797   0.414a                0.798   0.413
Cox (aggregated)     0.798   0.417                 –       –
Cox (last quarter)   0.793   0.411                 –       –

Abbreviations: AUROC, area under the receiver operating characteristic; CVD, cardiovascular disease; LR, logistic regression; LSTM, long short-term
memory; MLP, multilayer perceptron; RC, ridge classifier; RNN, recurrent neural network.
a The best performing average precision of the model.
Fig. 8 ROC curves of CVD event prediction. CVD, cardiovascular
disease; ROC, receiver operating characteristic.
Fig. 9 PR curves of CVD event prediction. CVD, cardiovascular
disease; PR, precision-recall.
Fig. 10 Detail plots of CVD event prediction PR curves, with the same
mapping of line colors to classifiers as in Figs. 8 and 9. CVD,
cardiovascular disease; PR, precision-recall.
Table 5 p-Values of pairwise comparison of AUROC using DeLong's test. The results are based on the best performing results of the
models, where the models Simple RNN and LR (aggregated) are trained with sample/class weighting. Using a significance level of
0.05, values under the Bonferroni adjusted significance level of 9.091e-4 are highlighted

             Simple RNN  MLP       RC        RC (aggr)  RC (last)  LR        LR (aggr)  LR (last)  Cox (aggr)  Cox (last)
LSTM         1.429e-3    8.107e-2  0.2171    0.4420     3.262e-3   0.1638    0.5561     4.218e-2   5.551e-2    5.901e-4
Simple RNN               0.4844    0.1420    6.848e-2   0.6908     0.2948    1.365e-3   0.5711     0.2678      0.4641
MLP                                0.3744    0.2219     0.2235     0.7310    2.972e-2   0.8878     0.7782      0.1666
RC                                           0.3885     1.602e-3   0.6324    8.400e-2   0.2690     0.6262      1.222e-2
RC (aggr)                                               3.631e-3   0.4285    0.1238     0.1698     0.2954      9.491e-3
RC (last)                                                          0.1166    1.054e-3   5.921e-2   0.1466      0.6304
LR                                                                           4.035e-2   0.5481     0.9589      4.179e-2
LR (aggr)                                                                               4.848e-3   1.738e-4    9.985e-5
LR (last)                                                                                          0.5733      1.269e-3
Cox (aggr)                                                                                                     2.094e-2

Abbreviations: CVD, cardiovascular disease; LR, logistic regression; LSTM, long short-term memory; MLP, multilayer perceptron; PR, precision-recall;
RC, ridge classifier; RNN, recurrent neural network.
Fig. 11 Statistical comparison of models' performances on the CVD
event prediction task. The plot shows the average precision mean and
95.0% confidence intervals of the mean. Tukey's HSD test determined
no significant differences exist within the groups: RC (aggregated),
RC, LR (aggregated), and Simple RNN; RC, LR (aggregated), Simple
RNN, and Cox (aggregated); LR (aggregated), Simple RNN, Cox
(aggregated), and RC (last quarter); Simple RNN, Cox (aggregated), RC
(last quarter), and MLP; Cox (aggregated), RC (last quarter), MLP, and
LR (last quarter); LR (last quarter), LR, and Cox (last quarter). All other
differences are found to be statistically significant. CVD, cardiovascular
disease; HSD, honestly significant difference; LR, logistic regression;
MLP, multilayer perceptron; PR, precision-recall; RC, ridge
classifier; RNN, recurrent neural network.
ranked second equal (0.421). Our results confirm that PR
curves provide further valuable information when the
data are highly imbalanced. As an example, LR and Cox
(aggregated) both achieved AUROC of 0.798. However, the
same predictions achieved average precisions of 0.411 and
0.417, respectively (Table 4), a substantial difference in PR
space without any noticeable difference in ROC space. This
discrepancy is further confirmed when visually assessing the
ROC curve and PR curve plots (Figs. 8 and 9). In ROC space,
the curves of the models are densely packed together,
virtually indistinguishable from one another, whereas there
is a region in PR space where the curves are noticeably
more variable and spread out. The detail plots of the PR
curves for recall in the interval [0.4, 0.8] show there are
regions where LSTM clearly dominates the other models.
However, at the other end of the PR space, where recall is in
the interval [0.8, 1.0], the results are much more mixed.
The statistical analysis using ANOVA and Tukey's HSD test
comparing average precision scores of bootstrapped samples
shows significant differences exist between groups, and
the LSTM model is determined to be significantly better than
all other models at this prediction task. It appears that the
capacity to retain and discard significant and unimportant
events in the patient's past, in addition to modelling patient
history sequentially, provides LSTM the predictive advantage,
making it the best performing model, by a small margin,
overall for this task.
The results also show that for this problem RC, RC (aggregated),
and LR (aggregated) are highly competitive against
the NN models. These models performed equally well as
Simple RNN. Here, it can be observed that RC (aggregated),
the best performing regression-based model, achieved an
average precision mean of 0.421 and 95.0% confidence
interval of (0.418, 0.425), and Simple RNN achieved an
average precision mean of 0.418 and 95.0% confidence
interval of (0.415, 0.422). From the statistical analysis,
both models are found to belong to the same group (along
with RC and LR [aggregated]) in which no group
differences are determined to be significant. The statistical
analysis also found no significant differences in the
group containing MLP, Cox (aggregated), RC (last quarter),
and LR (last quarter).
With the exception of LR, the worst performing linear
models are the models using only features from the last
quarter of the observation window. This indicates that for
this task, patient history is important irrespective of whether
it is explicitly sequential or in another representation. The
method of aggregating data by taking the mean of features
that vary across time in the observation window is the most
effective treatment of data for the linear models, with RC
(aggregated), LR (aggregated), and Cox (aggregated) achieving
the best results of their respective model types. LR's relatively
poor performance (i.e., of the model using 8 quarters of
history) can be seen as the result of its incapacity to handle
multicollinearity. The findings of this experiment suggest
there are no or limited non-linear interactions between the
features that the NN models could exploit.
In addition to the predictive advantage of LSTM, a
surprising finding is the competitiveness of RC and RC
(aggregated) in integrating patient history in a risk prediction
task when using structured data. These models are by
comparison much smaller than the NN models and require
far less hyperparameter tuning. This result shows that the
traditional regression-based approaches for risk modelling
can be improved by combining: (a) integrating patient history by
capturing more factors across more time steps instead of
only using features from the last quarter before the index
date; and (b) fitting a model with regularization, such as
using RC, so the fitted coefficients are apt to deal with
multicollinearity.
Given the complexity of the LSTM architecture, a question
regarding the results might be whether LSTM's predictive
advantage is entirely due to model capacity rather than it
being a temporal model that is explicitly sequential. The
LSTM model used in our experiment contained 27,714
trainable parameters. In contrast, the MLP had 47,522
trainable parameters. This shows that model capacity alone
does not explain LSTM's performance. Additionally, the
results of the ablation study show that by including patient
history beyond just using patient data at the index date the
model performance improved, while the slight dip in AUROC
between using observation windows of 4 and 8 quarters
(from 0.802 to 0.801) is unlikely to be significant. Further, the
metric better suited for imbalanced classification, average
precision, shows a monotonic increase in performance as
the observation window lengthened.
Recent results in clinical risk prediction using sequential
modelling typically focus on a short prediction horizon, e.g.,
the next visit or 6 months.81–83 In contrast, the current study
adopted the 5-year prediction horizon used in an established
clinical decision support system,4 and leveraged routinely
collected EHR from a diverse population-level dataset to
facilitate comparison. If LSTM is adopted as a model for
assessing CVD risk, it will be applied at a large scale. PREDICT
has been used >500k times in New Zealand. If a performance
difference is statistically significant, then even if it is only
moderately better, it is a meaningful difference because, at
this scale, there would be many more cases where the
clinician gets the right answer, instead of the wrong answer,
from the model.
Two decades ago, there was a paradigm shift in CVD risk
management in clinical practice from prevention based on
managing individual risk factors (e.g., high blood pressure
and high cholesterol) to one based on the combination
of risk factors; a shift from focusing on relative risk to one
that focuses on absolute risk.84 Since then, many guidelines
on CVD risk assessment have moved from using paper charts
to computerized clinical decision support systems as the
number of predictor variables has grown over the intervening
years.1–3,6,85–88 This trend is likely to continue as
non-classical CVD risk factors such as socio-economic deprivation
are found to be strongly associated with CVD risk.1,4
Conventionally, Cox proportional hazards models are used for
these clinical decision support systems. Recently, studies
have focused on machine learning techniques to improve
predictive performance.89,90
Like many other non-communicable diseases, the development,
progression, and management of CVD are prolonged
and long-term. This characteristic of the disease
makes the ability to include patient history in the analytics
of CVD risk, in a multivariate and explicitly sequential
manner, a desideratum, so that the dynamic temporal
interactions between the risk factors can be modeled.
Until recently, sequentially modelling long-range dependency
has remained computationally infeasible, as shown in
the case of the widely studied and used Hidden Markov
Models.91 This study demonstrates the suitability of using
LSTM for sequentially modelling patient history on
structured/tabulated data and a proof of concept that gains can
be made using LSTM for predicting CVD events over a 5-year
interval.
There are several limitations of the current study. "Long-term"
in the context of CVD can mean decades. Researchers
of CVD therapy have pointed to the knowledge gap that exists
between the evidence from randomized clinical trials, typically
only lasting a few years, and the effect of long-term
medication treatment (it is common for therapy to continue
for decades) in secondary prevention.92 The study design was
unable to capture the long-term (defined on the scale of
decades) effect of disease progression and treatment trajectory.
While preserving a useful number of cases, the data
construction used in this study was only able to achieve a
7-year window to divide between observation and prediction.
In the future, however, this will change as routinely
collected EHRs lengthen year on year. Another limitation of
the study is that LSTM, like other NN models, is a class of
black-box models where the influence of and interactions between
predictor variables cannot be readily explained. Considerable
research has been performed investigating methods to
interpret and explain neural models,93,94 and some specifically
for RNNs.95,96 These methods are clearly worthy directions
of future work as they hold the potential for aiding risk
communication. Another possible future direction is to
incorporate time information, such as by using a decay
function, temporal encoding, or a vector representation for
time combined with the model architecture in sequential
modelling83,97,98; or to utilize an attention mechanism to
boost model performance.81–83,95 Lastly, the current study
focused on event prediction, not time-to-event estimation
nor risk level prediction, which Cox proportional hazards
models facilitate. Determining whether the results of the present
study extend from event prediction to risk level and time-to-event
estimation would be a valuable next step in making the
case for widespread use of explicitly-temporal models in
chronic disease decision support.
Conclusion
The investigations performed in this study found that routinely collected health data can be leveraged to predict patients' risk of a CVD event (fatal or non-fatal). Moreover, it is observed that the LSTM model outperformed linear additive models. For CVD event prediction, LSTM provided the best average precision, significantly outperforming all other models compared. The additive models RC (aggregated), RC, and LR (aggregated) were found to be highly competitive, outperforming MLP and matching the performance of Simple RNN as measured by average precision. These results suggest that, for this prediction task, apart from LSTM, classical statistical models perform on par with non-linear models. In our experiments, various inputs were examined for the linear models to quantify the potential for patient history to improve their performance. These include using the full sets of features across the eight quarters of the observation window, using aggregated features, and using only the last quarter of the observation window. For all linear models, using aggregated data provided the best performance, and RC (aggregated) was found to be the best performing linear model for the prediction task.
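The three input constructions examined for the linear models can be sketched as follows; the array shapes here are hypothetical and for illustration only, not the study's actual feature set.

```python
import numpy as np

# Hypothetical data: 2 patients, 8 quarterly observations, 5 features each
X = np.arange(2 * 8 * 5, dtype=float).reshape(2, 8, 5)

X_full = X.reshape(X.shape[0], -1)  # all 8 quarters concatenated -> (2, 40)
X_agg = X.mean(axis=1)              # time-varying features averaged -> (2, 5)
X_last = X[:, -1, :]                # last quarter only -> (2, 5)
```

A non-temporal classifier such as a ridge classifier can then be fitted to any of the three matrices; only the first preserves the full quarterly history, at the cost of an eight-fold wider input.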
Alongside the strength of LSTM, these findings regarding the
Methods of Information in Medicine Vol. 61 No. S2/2022 © 2022. The Author(s).
Cardiovascular Disease Event Prediction Hsu et al.e162
inputs of linear models further corroborate that history matters in the context of CVD event prediction. As routinely collected EHRs continue to grow, alleviating one of the primary obstacles in applying deep learning methods, this study provides incentive for LSTM to be further explored as an event prediction model in the management of CVD, where even a marginal gain can have substantial economic and social benefits.
Conflict of Interest
None declared.
Acknowledgment
This study is supported by the University of Auckland
Doctoral Scholarship and in part by New Zealand Health
Research Council program grant HRC16609. The authors
thank Kylie Chen for code checking the time series construction code and Mike Merry for facilitating network
connection to the GPU machine during COVID lockdowns.
Thanks to the members of the VIEW research team for
their feedback on earlier drafts of the manuscript.
References
1 Jørstad HT, Colkesen EB, Minneboo M, et al. The Systematic COronary Risk Evaluation (SCORE) in a large UK population: 10-year follow-up in the EPIC-Norfolk prospective population study. Eur J Prev Cardiol 2015;22(01):119–126
2 Conroy RM, Pyörälä K, Fitzgerald AP, et al; SCORE project group. Estimation of ten-year risk of fatal cardiovascular disease in Europe: the SCORE project. Eur Heart J 2003;24(11):987–1003
3 Goff DC Jr, Lloyd-Jones DM, Bennett G, et al; American College of Cardiology/American Heart Association Task Force on Practice Guidelines. 2013 ACC/AHA guideline on the assessment of cardiovascular risk: a report of the American College of Cardiology/American Heart Association Task Force on Practice Guidelines. Circulation 2014;129(25, suppl 2):S49–S73
4 Pylypchuk R, Wells S, Kerr A, et al. Cardiovascular disease risk prediction equations in 400 000 primary care patients in New Zealand: a derivation and validation study. Lancet 2018;391(10133):1897–1907
5 Poppe KK, Doughty RN, Wells S, et al. Developing and validating a cardiovascular risk score for patients in the community with prior cardiovascular disease. Heart 2020;106(07):506–511
6 Ministry of Health. Cardiovascular Disease Risk Assessment and Management for Primary Care; 2018. Accessed July 3, 2020, at: https://www.health.govt.nz/system/files/documents/publications/cardiovascular-disease-risk-assessment-management-primary-care-feb18-v4_0.pdf
7 National Health Committee. Strategic Overview: Cardiovascular Disease in New Zealand; 2013. Accessed December 3, 2020, at: https://www.moh.govt.nz/NoteBook/nbbooks.nsf/0/FAC55041FD6DBDADCC257F7F006CDC16/$file/strategic-overview-cardiovascular-disease-in-nz.pdf
8 Hero C, Svensson AM, Gidlund P, Gudbjörnsdottir S, Eliasson B, Eeg-Olofsson K. LDL cholesterol is not a good marker of cardiovascular risk in type 1 diabetes. Diabet Med 2016;33(03):316–323
9 Lemieux I, Lamarche B, Couillard C, et al. Total cholesterol/HDL cholesterol ratio vs LDL cholesterol/HDL cholesterol ratio as indices of ischemic heart disease risk in men: the Quebec Cardiovascular Study. Arch Intern Med 2001;161(22):2685–2692
10 Millán J, Pintó X, Muñoz A, et al. Lipoprotein ratios: physiological significance and clinical usefulness in cardiovascular prevention. Vasc Health Risk Manag 2009;5:757–765
11 Stewart RA, Kerr A. Non-adherence to medication and cardiovascular risk. N Z Med J 2011;124(1343):6–10
12 Brown MT, Bussell JK. Medication adherence: WHO cares? Mayo Clin Proc 2011;86(04):304–314
13 Mabotuwana T, Warren J, Harrison J, Kenealy T. What can primary care prescribing data tell us about individual adherence to long-term medication? Comparison to pharmacy dispensing data. Pharmacoepidemiol Drug Saf 2009;18(10):956–964
14 Grey C, Jackson R, Wells S, et al. Maintenance of statin use over 3 years following acute coronary syndromes: a national data linkage study (ANZACS-QI-2). Heart 2014;100(10):770–774
15 Sigglekow F, Horsburgh S, Parkin L. Statin adherence is lower in primary than secondary prevention: a national follow-up study of new users. PLoS One 2020;15(11):e0242424
16 Ellis JJ, Erickson SR, Stevenson JG, Bernstein SJ, Stiles RA, Fendrick AM. Suboptimal statin adherence and discontinuation in primary and secondary prevention populations. J Gen Intern Med 2004;19(06):638–645
17 Vinogradova Y, Coupland C, Brindle P, Hippisley-Cox J. Discontinuation and restarting in patients on statin treatment: prospective open cohort study using a primary care database. BMJ 2016;353:i3305
18 Protti D, Bowden T. Electronic medical record adoption in New Zealand primary care physician offices. Commonw Fund 2010;96(1434):1–14
19 What is the HITECH Act? HIPAA Journal. 2021. Accessed January 27, 2021, at: https://www.hipaajournal.com/what-is-the-hitech-act/
20 Jorm L. Routinely collected data as a strategic resource for research: priorities for methods and workforce. Public Health Res Pract 2015;25(04):e2541540
21 Esteva A, Robicquet A, Ramsundar B, et al. A guide to deep learning in healthcare. Nat Med 2019;25(01):24–29
22 Lundervold AS, Lundervold A. An overview of deep learning in medical imaging focusing on MRI. Z Med Phys 2019;29(02):102–127
23 Esteva A, Kuprel B, Novoa RA, et al. Dermatologist-level classification of skin cancer with deep neural networks. Nature 2017;542(7639):115–118
24 Haenssle HA, Fink C, Schneiderbauer R, et al; Reader study level-I and level-II Groups. Man against machine: diagnostic performance of a deep learning convolutional neural network for dermoscopic melanoma recognition in comparison to 58 dermatologists. Ann Oncol 2018;29(08):1836–1842
25 Kooi T, Litjens G, van Ginneken B, et al. Large scale deep learning for computer aided detection of mammographic lesions. Med Image Anal 2017;35:303–312
26 De Fauw J, Ledsam JR, Romera-Paredes B, et al. Clinically applicable deep learning for diagnosis and referral in retinal disease. Nat Med 2018;24(09):1342–1350
27 Liu Y, Gadepalli K, Norouzi M, et al. Detecting cancer metastases on gigapixel pathology images. 2017:1–13. Accessed February 22, 2022, at: http://arxiv.org/abs/1703.02442
28 Neural nets vs. regression models. Statistical Modeling, Causal Inference, and Social Science. 2019. Accessed January 10, 2021, at: https://statmodeling.stat.columbia.edu/2019/05/21/neural-nets-vs-statistical-models/
29 Rajkomar A, Oren E, Chen K, et al. Scalable and accurate deep learning with electronic health records. NPJ Digit Med 2018;1(March):1–8
30 Benedetto U, Sinha S, Lyon M, et al. Can machine learning improve mortality prediction following cardiac surgery? Eur J Cardiothorac Surg 2020;58(06):1130–1136
31 Cheng JZ, Ni D, Chou YH, et al. Computer-aided diagnosis with deep learning architecture: applications to breast lesions in US images and pulmonary nodules in CT scans. Sci Rep 2016;6:1–13
32 Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput 1997;9(08):1735–1780
33 Gers FA, Schmidhuber J, Cummins F. Learning to forget: continual prediction with LSTM. Neural Comput 2000;12(10):2451–2471
34 Graves A, Jaitly N. Towards end-to-end speech recognition with recurrent neural networks. Paper presented at: 31st Int Conf Mach Learn ICML 2014. 2014;5:3771–3779
35 Sutskever I, Vinyals O, Le QV. Sequence to sequence learning with neural networks. Adv Neural Inf Process Syst 2014;4(January):3104–3112
36 Graves A. Generating sequences with recurrent neural networks. 2013:1–43. Accessed February 22, 2022, at: http://arxiv.org/abs/1308.0850
37 Vinyals O, Toshev A, Bengio S, Erhan D. Show and tell: a neural image caption generator. Paper presented at: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2015:3156–3164
38 Venugopalan S, Rohrbach M, Darrell T, Donahue J, Saenko K, Mooney R. Sequence to Sequence Video to Text. 2015. Accessed February 22, 2022, at: http://arxiv.org/abs/1505.00487
39 Ren M, Kiros R, Zemel RS. Image question answering: a visual semantic embedding model and a new dataset. 2015. Accessed February 22, 2022, at: http://arxiv.org/abs/1505.02074v1
40 Lipton ZC, Kale DC, Elkan C, Wetzel R. Learning to diagnose with LSTM recurrent neural networks. 2015:1–18. Accessed February 22, 2022, at: http://arxiv.org/abs/1511.03677
41 Xu Y, Biswal S, Deshpande SR, Maher KO, Sun J. RAIM: recurrent attentive and intensive model of multimodal patient monitoring data. Paper presented at: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. ACM 2018;18:2565–2573
42 Pham T, Tran T, Phung D, Venkatesh S. DeepCare: A Deep Dynamic Memory Model for Predictive Medicine. Lect Notes Comput Sci (including Subser Lect Notes Artif Intell Lect Notes Bioinformatics). 2016;9652 LNAI(i):30–41
43 VIEW research. The University of Auckland, Medical and Health Sciences. Accessed May 23, 2021, at: https://www.fmhs.auckland.ac.nz/en/soph/about/our-departments/epidemiology-and-biostatistics/research/view-study/research.html
44 Wells S, Riddell T, Kerr A, et al. Cohort Profile: the PREDICT cardiovascular disease cohort in New Zealand primary care (PREDICT-CVD 19). Int J Epidemiol 2017;46(01):22
45 Welcome to TestSafe. CareConnect. 2022. Accessed February 22, 2022, at: https://www.careconnect.co.nz/testsafe/
46 Collections. Ministry of Health, Manatū Hauora. 2019. Accessed May 30, 2021, at: https://www.health.govt.nz/nz-health-statistics/national-collections-and-surveys/collections
47 ICD-10-AM/ACHI/ACS Development. Ministry of Health, Manatū Hauora. 2021. Accessed August 29, 2021, at: https://www.health.govt.nz/nz-health-statistics/national-collections-and-surveys/collections
48 What your cholesterol levels mean. American Heart Association. 2017. Accessed March 3, 2020, at: https://www.heart.org/en/health-topics/cholesterol/about-cholesterol/what-your-cholesterol-levels-mean
49 Creatinine: what is it? National Kidney Foundation. 2019. Accessed March 3, 2020, at: https://www.kidney.org/atoz/content/what-creatinine
50 What is the HbA1c test? Health Navigator New Zealand. 2019. Accessed March 3, 2020, at: https://www.healthnavigator.org.nz/health-a-z/h/hba1c-testing
51 National Minimum Dataset (Hospital Events) Data Dictionary version 7.9. 2018. Accessed November 23, 2022, at: https://www.health.govt.nz/system/files/documents/publications/nmds_data_dictionary_v7.9.pdf
52 Ethnic group summaries reveal New Zealand's multicultural make-up. Stats NZ. 2020. Accessed May 22, 2021, at: https://www.stats.govt.nz/news/ethnic-group-summaries-reveal-new-zealand-multicultural-make-up
53 Fatal and Non-fatal Events. VIEW Data Wikipage. 2018. Accessed December 3, 2020, at: https://wiki.auckland.ac.nz/display/VIEW/Fatal+and+Non-fatal+Events
54 Goodfellow I, Bengio Y, Courville A. Deep Learning. MIT Press; 2016. Accessed February 22, 2022, at: http://www.deeplearningbook.org
55 Grosse R. Lecture 5: Multilayer Perceptrons. Intro to Neural Networks and Machine Learning. 2018. Accessed November 8, 2020, at: http://www.cs.toronto.edu/~rgrosse/courses/csc321_2018/readings/L05%20Multilayer%20Perceptrons.pdf
56 Hoerl AE, Kennard RW. Ridge regression: applications to nonorthogonal problems. Technometrics 1970;12(01):69–82
57 Tibshirani R. Regression shrinkage and selection via the Lasso. J R Stat Soc B 1996;58(01):267–288
58 Zou H, Hastie T. Regularization and Variable Selection Via the Elastic Net. JSTOR; 2005
59 Pedregosa F, Varoquaux G, Gramfort A, et al. Scikit-learn: machine learning in Python. J Mach Learn Res 2011;12:2825–2830
60 Cule E, De Iorio M. Ridge regression in prediction problems: automatic choice of the ridge parameter. Genet Epidemiol 2013;37(07):704–714
61 De Vlaming R, Groenen PJF. The Current and Future Use of Ridge Regression for Prediction in Quantitative Genetics. BioMed Research International. 2015. Accessed February 22, 2022, at: https://www.hindawi.com/journals/bmri/2015/143712/
62 Niemann U, Boecking B, Brueggemann P, Mebus W, Mazurek B, Spiliopoulou M. Tinnitus-related distress after multimodal treatment can be characterized using a key subset of baseline variables. PLoS One 2020;15(01):e0228037
63 Smolin B, Levy Y, Sabbach-Cohen E, Levi L, Mashiach T. Predicting mortality of elderly patients acutely admitted to the Department of Internal Medicine. Int J Clin Pract 2015;69(04):501–508
64 Lanièce I, Couturier P, Dramé M, et al. Incidence and main factors associated with early unplanned hospital readmission among French medical inpatients aged 75 and over admitted through emergency units. Age Ageing 2008;37(04):416–422
65 Python language reference. Python Software Foundation. 2017.
Accessed November 10, 2020, at: https://www.python.org/
66 Chollet F, et al. Keras: the Python deep learning API. 2015
Accessed November 23, 2022, at: https://keras.io
67 Abadi M, Agarwal A, Barham P, et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems. 2015. Accessed November 23, 2022, at: https://tensorflow.org
68 Robin X, Turck N, Hainard A, et al. pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics 2011;12(77):77
69 Therneau TM. A package for survival analysis in R. 2020. R package version 3.2-7. Accessed November 23, 2022, at: https://CRAN.R-project.org/package=survival
70 Mogensen UB, Ishwaran H, Gerds TA. Evaluating random forests for survival analysis using prediction error curves. J Stat Softw 2012;50(11):1–23
71 Herbold S. Autorank: a Python package for automated ranking of classifiers. J Open Source Softw 2020;5:2173
72 Kingma DP, Ba J. Adam: a method for stochastic optimization. 2014:1–15. Accessed November 23, 2022, at: https://arxiv.org/abs/1412.6980
73 The Sequential model API. Keras 2.0.6 Documentation. Accessed January 16, 2021, at: https://faroit.com/keras-docs/2.0.6/
74 sklearn.utils.class_weight.compute_class_weight. scikit-learn. 2020. Accessed January 16, 2021, at: https://scikit-learn.org/stable/modules/generated/sklearn.utils.class_weight.compute_class_weight.html
75 Johnson JM, Khoshgoftaar TM. Survey on deep learning with class imbalance. J Big Data 2019;6(27):1–54
76 Davis J, Goadrich M. The relationship between precision-recall and ROC curves. Proc. 23rd International Conference on Machine Learning. 2006. Accessed February 22, 2022, at: https://www.biostat.wisc.edu/~page/rocpr.pdf
77 On ROC and precision-recall curves. Towards Data Science. Accessed December 20, 2020, at: https://towardsdatascience.com/on-roc-and-precision-recall-curves-c23e9b63820c
78 Baldwin B. Comparing precision-recall curves the bayesian way? LingPipe Blog. 2010. Accessed December 21, 2020, at: https://lingpipe-blog.com/2010/01/29/comparing-precision-recall-curves-bayesian-way/
79 Statistical test for comparing precision-recall curves. Cross Validated. 2020. Accessed December 21, 2020, at: https://stats.stackexchange.com/questions/499672/statistical-test-for-comparing-precision-recall-curves
80 Demšar J. Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 2006;7:1–30
81 Ma F, Chitta R, Zhou J, You Q, Sun T, Gao J. Dipole: Diagnosis Prediction in Healthcare via Attention-based Bidirectional Recurrent Neural Networks. Paper presented at: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. August 13–17, 2017; Halifax, NS
82 Luo J, Ye M, Xiao C, Ma F. HiTANet: Hierarchical Time-Aware Attention Networks for Risk Prediction on Electronic Health Records. Paper presented at: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. August 23–27, 2020; Virtual Conference
83 Pham TH, Yin C, Mehta L, Zhang X, Zhang P. Cardiac Complication Risk Profiling for Cancer Survivors via Multi-View Multi-Task Learning. Paper presented at: Proceedings of the 2021 IEEE International Conference on Data Mining (ICDM). December 07–11, 2021; Auckland, NZ
84 Jackson R. Guidelines on preventing cardiovascular disease in clinical practice. BMJ 2000;320(7236):659–661
85 ASCVD risk estimator. American College of Cardiology. 2018. Accessed January 22, 2021, at: https://tools.acc.org/ldl/ascvd_risk_estimator/index.html#!/calulate/estimator/estimator
86 Welcome to the QRISK3-2018 risk calculator. ClinRisk. 2018. Accessed January 23, 2021, at: https://qrisk.org/three/index.php
87 Tunstall-Pedoe H. Cardiovascular risk and risk scores: ASSIGN, Framingham, QRISK and others: how to choose. Heart 2011;97(06):442–444
88 de la Iglesia B, Potter JF, Poulter NR, Robins MM, Skinner J. Performance of the ASSIGN cardiovascular disease risk score on a UK cohort of patients from general practice. Heart 2011;97(06):491–499
89 Alaa AM, Bolton T, Angelantonio ED, Rudd JHF, Van Der Schaar M. Cardiovascular Disease Risk Prediction Using Automated Machine Learning: A Prospective Study of 423,604 UK Biobank Participants. PLoS One 2019:1–17
90 Chun M, Clarke R, Cairns BJ, et al; China Kadoorie Biobank Collaborative Group. Stroke risk prediction using machine learning: a prospective cohort study of 0.5 million Chinese adults. J Am Med Inform Assoc 2021;28(08):1719–1727
91 Lipton ZC, Berkowitz J, Elkan C. A critical review of recurrent neural networks for sequence learning. 2015. Accessed February 22, 2022, at: http://arxiv.org/abs/1506.00019
92 Rossello X, Pocock SJ, Julian DG. Long-term use of cardiovascular drugs: challenges for research and for patient care. J Am Coll Cardiol 2015;66(11):1273–1285
93 Holzinger A, Biemann C, Pattichis CS, Kell DB. What do we need to build explainable AI systems for the medical domain? 2017. Accessed February 22, 2022, at: http://arxiv.org/abs/1712.09923
94 Goebel R, Chander A, Holzinger K, et al. Explainable AI: the new 42? Machine Learn Knowledge Extraction 2018;11015:295–303
95 Choi E, Bahadori MT, Kulas JA, Schuetz A, Stewart WF, Sun J. RETAIN: an interpretable predictive model for healthcare using reverse time attention mechanism. 2016. Accessed February 22, 2022, at: http://arxiv.org/abs/1608.05745
96 Ho LV, Aczon M, Ledbetter D, Wetzel R. Interpreting a recurrent neural network's predictions of ICU mortality risk. J Biomed Inform 2021;114:103672
97 Baytas IM, Xiao C, Zhang X, Wang F, Jain AK, Zhou J. Patient Subtyping via Time-Aware LSTM Networks. Paper presented at: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. August 13–17, 2017; Halifax, NS
98 Kazemi SM, Goel R, Eghbali S, et al. Time2Vec: Learning a Vector Representation of Time. 2019. Accessed August 8, 2022, at: https://arxiv.org/abs/1907.05321
Appendix A

Appendix Table 1 VIEW CVD categories: CVD history, CVD mortality and CVD outcome, feature names under the categories and feature descriptions. Feature names prefixed with MORTALITY or OUT are used to identify outcome events (with the exception of OUT_ATRIAL_FIBRILLATION)

Category: History
HX_BROAD_CVD: History of broad CVD
HX_ATHERO_CVD: History of atherosclerotic CVD
HX_CHD_DIAG: History of coronary heart disease (diagnoses)
HX_ACS: History of acute coronary syndrome
HX_MI: History of myocardial infarction
HX_UNST_ANGINA: History of unstable angina
HIST_ANGINA: History of angina
HX_OTHER_CHD: History of other coronary disease
HX_CHD_PROCS: History of coronary heart disease (procedures)
HX_PCI: History of percutaneous coronary intervention
HX_CABG: History of coronary artery bypass graft
HX_OTHER_CHD_PROCS: History of other coronary procedure
HX_PVD_DIAGS: History of peripheral vascular disease
HX_PVD_PROCS: History of peripheral vascular procedure
HX_HAEMORRHAGIC_STROKE: History of hemorrhagic stroke
HX_CEVD: History of cerebral vascular disease
HX_ISCHAEMIC_STROKE: History of ischemic stroke
HX_TIA: History of transient ischemic attack
HX_OTHER_CEVD: History of other cerebral vascular disease
HX_HEART_FAILURE: History of heart failure
HX_ATRIAL_FIBRILLATION: History of atrial fibrillation

Category: Mortality
MORTALITY_BROAD_CVD_WITH_OTHER: Death involving broad CVD
MORTALITY_OTHER_RELATED_CVD_DEATHS: Death involving other related CVD

Category: Outcome
OUT_BROAD_CVD: Outcome of broad CVD
OUT_ATHERO_CVD: Outcome of atherosclerotic CVD
OUT_CHD: Outcome of coronary heart disease
OUT_MI: Outcome of myocardial infarction
OUT_ACS: Outcome of acute coronary syndrome
OUT_UNST_ANGINA: Outcome of unstable angina
OUT_ANGINA: Outcome of angina
OUT_OTHER_CHD: Outcome of other coronary disease
OUT_PVD_DIAGS: Outcome of peripheral vascular disease
OUT_PVD_PROCS: Outcome of peripheral vascular procedure
OUT_PCI_CABG: Outcome of percutaneous coronary intervention or coronary artery bypass graft
OUT_HAEMORRHAGIC_STROKE: Outcome of hemorrhagic stroke
OUT_CEVD: Outcome of cerebral vascular disease
OUT_ISCHAEMIC_STROKE: Outcome of ischemic stroke
OUT_TIA: Outcome of transient ischemic attack
OUT_OTHER_CEVD: Outcome of other cerebral vascular disease
OUT_HEART_FAILURE: Outcome of heart failure
OUT_ATRIAL_FIBRILLATION: Outcome of atrial fibrillation
Appendix Table 2 PREDICT variables and their descriptions

PT_SBP: Current systolic blood pressure (sitting)
PT_SBP2: Previous systolic blood pressure (sitting)
PT_DBP: Current diastolic blood pressure (sitting)
PT_DBP2: Previous diastolic blood pressure (sitting)
PT_SMOKING: Smoking history or current status
PT_EN_TCHDL: TC/HDL cholesterol result
PT_DIABETES: Diabetes status
PT_FAMILY_HISTORY: Family history of premature CVD
PT_GEN_LIPID: Diagnosed genetic lipid disorder
PT_RENAL: Renal disease status
PT_DIABETES_YR: Number of years since diabetes diagnosis
PT_ATRIAL_FIBRILLATION: ECG confirmed atrial fibrillation
PT_IMP_FATAL_CVD (a): Improved fatal CVD using mortality record and 28 day rule

Abbreviation: CVD, cardiovascular disease.
(a) This feature captures all patients with CVD as cause of death on their death certificate, with or without hospitalization. In addition, those without CVD recorded on their death certificate but who had a CVD hospital admission up to 28 days before their date of death are included. The VIEW research group refers to this as the "28 day rule" for reclassifying non-CVD death as CVD death.
Appendix Table 3 Affected variables, their conditions that require addressing, the action taken, and the number of affected cases

Variable | Condition | Action | Number of cases
PT_DIABETES_YR | <0 | Remove samples | 4
PT_DBP2 | Missing | Assign PT_DBP value | 7
PT_RENAL | Missing | Assign 0 to missing values and change all other values to value + 1 | 65,086
PT_ATRIAL_FIBRILLATION | Missing | Assign 0 to missing values and change all other values to value + 1 | 22
PT_DIABETES_YR | Missing | Assign 0 to missing values | 65,084
PT_EN_TCHDL | Missing | Assign last TC/HDL result from TestSafe | 889
SEX | String values | Encode as a binary variable | 100,096
ETHNICITY (MELAA and Other) | Small sample size | Remove samples | MELAA (1,568), Other (8)
ETHNICITY (Chinese and Other Asian) | Small sample size | Combined | Chinese (5,317), Other Asian (3,655)
PT_SMOKING | Missing | Remove samples | 2
PT_GEN_LIPID | Missing | Remove sample | 1
ETHNICITY | String values | One-hot encoded | 100,096
HBA1C | Missing | Impute using a linear model with AGE, SEX, NZDEP and ETHNICITY as predictor variables | 983
EGFR | Missing | Impute using a linear model with AGE, SEX, NZDEP and ETHNICITY as predictor variables | 56
Appendix Table 4 Descriptive statistics: demographic variables. Number of patients in each category

ID: 100,096
Sex: Male 56,557 (56.5%); Female 43,539 (43.5%)
Age (at index date): Mean (SD) 61.82 (11.29)
Age bands: 18–24: 48; 25–34: 691; 35–44: 5,690; 45–54: 20,380; 55–64: 32,885; 65–74: 28,261; 75–84: 10,379; 85+: 1,762
NZDEP: 1: 21,167; 2: 19,074; 3: 17,141; 4: 18,903; 5: 23,811
Ethnicity: European 56,641; Māori 9,977; Pacific 14,878; Chinese/Other Asian 8,971; Indian 9,629
DIED (%): 6,634 (6.6%)
Appendix Table 5 Descriptive statistics: cholesterols. TEST and TESTED are binary features and the statistics are the number of
quarters in the entire dataset where the features contained a 1 and its relative percentage
Test (%) 885,936 (31.6%)
HDL mean (SD) 1.28 (0.37)
LDL mean (SD) 2.26 (0.96)
TRI mean (SD) 1.74 (1.04)
TCL mean (SD) 4.69 (1.13)
TC/HDL mean (SD) 3.85 (1.15)
Tested (%) 2,698,599 (96.3%)
Appendix Table 6 Descriptive statistics: hospitalization. Number of patients who had acute hospital admission within their time-series and number of patients who had hospitalizations with clinical code mapping to the specified category in their time-series

NUMBER_OF_DAYS>0 mean (SD): 6.37 (11.87)
ACUTE_ADM: 54,448
HX_BROAD_CVD: 32,542
HX_ATHERO_CVD: 30,259
HX_CHD_DIAGS: 23,207
HX_ACS: 16,777
HX_MI: 13,799
HX_UNST_ANGINA: 6,596
HX_ANGINA: 8,489
HX_OTHER_CHD: 20,416
HX_CHD_PROCS: 12,771
HX_PCI: 8,646
HX_CABG: 5,659
HX_OTHER_CHD_PROCS: 335
HX_PVD_DIAGS: 5,301
HX_PVD_PROCS: 3,551
HX_HAEMORRHAGIC_STROKE: 1,204
HX_CEVD: 8,403
HX_ISCHAEMIC_STROKE: 5,878
HX_TIA: 3,159
HX_OTHER_CEVD: 772
HX_HEART_FAILURE: 8,079
HX_ATRIAL_FIBRILLATION: 10,902
MORTALITY_BROAD_CVD_WITH_OTHER: 17,463
MORTALITY_OTHER_RELATED_CVD_DEATHS: 2,416
OUT_BROAD_CVD: 16,421
OUT_ATHERO_CVD: 14,308
OUT_CHD: 9,689
OUT_MI: 5,944
OUT_ACS: 7,445
OUT_UNST_ANGINA: 2,104
OUT_ANGINA: 3,300
OUT_OTHER_CHD: 3,539
OUT_PVD_DIAGS: 1,537
OUT_PVD_PROCS: 1,922
OUT_PCI_CABG: 5,758
OUT_HAEMORRHAGIC_STROKE: 521
OUT_CEVD: 4,364
OUT_ISCHAEMIC_STROKE: 3,011
OUT_TIA: 1,598
OUT_OTHER_CEVD: 50
OUT_HEART_FAILURE: 3,096
OUT_ATRIAL_FIBRILLATION: 3,288
Appendix Table 7 Descriptive statistics: HbA1c and eGFR. TEST_HBA1C, TESTED_HBA1C, TEST_EGFR and TESTED_EGFR are binary features and the statistics are the number of quarters in the entire dataset where the feature contained a 1 and its relative percentage

HBA1C mean (SD): 47.98 (15.20)
TEST_HBA1C: 819,747 (28.9%)
TESTED_HBA1C: 2,268,295 (80.9%)
EGFR mean (SD): 77.85 (20.11)
TEST_EGFR: 1,041,487 (37.2%)
TESTED_EGFR: 2,694,767 (96.1%)
Appendix Table 8 Descriptive statistics: PREDICT. PT_SMOKING, PT_DIABETES, PT_FAMILY_HISTORY, PT_GEN_LIPID, PT_RENAL, PT_ATRIAL_FIBRILLATION and PT_IMP_FATAL_CVD show number of patients in each category

PT_SBP mean (SD): 132.25 (16.99)
PT_SBP2 mean (SD): 132.57 (17.24)
PT_DBP mean (SD): 78.70 (10.25)
PT_DBP2 mean (SD): 79.07 (10.30)
PT_SMOKING: 0 (Never) 66,896; 1 (Quit >12 mo) 20,162; 2 (Quit ≤12 mo) 1,901; 3 (Up to 10/d) 6,249; 4 (11–19/d) 3,046; 5 (20+/d) 1,842
PT_EN_TCHDL mean (SD): 3.90 (1.22)
PT_DIABETES: 0 (No diabetes) 64,125; 1 (Type 1) 1,267; 2 (Type 2) 32,754; 3 (Type unknown) 1,950
PT_FAMILY_HISTORY: 20,162
PT_DIABETES_YR mean (SD): 8.19 (7.30)
PT_ATRIAL_FIBRILLATION: 0 (Missing value) 21; 1 (None) 95,292; 2 (Confirmed atrial fibrillation) 4,783
PT_GEN_LIPID: 0 (None) 92,492; 1 (Familial hypercholesterolemia) 5,569; 2 (Familial defective apoB) 20; 3 (Familial combined dyslipidemia) 499; 4 (Other genetic lipid disorder) 1,516
PT_RENAL: 0 (Missing value) 64,131; 1 (No nephropathy) 27,585; 2 (Confirmed microalbuminuria) 5,996; 3 (Overt diabetic nephropathy) 1,975; 4 (Non-diabetic nephropathy) 409
PT_IMP_FATAL_CVD: 2,998
Appendix Table 9 Removed features for the Cox regression analysis

Cox (aggregated): ETHN_5, DIED, CVD_METOLAZONE, OTHER_PREDNISOLONE, OTHER_CLARITHROMYCIN, OTHER_VILDAGLIPTIN, PT_IMP_FATAL_CVD
Cox (last quarter): ETHN_5, TESTED, DIED, CVD_METOLAZONE, CVD_HYDRALAZINE_HYDROCHLORIDE, OTHER_INSULIN_ZINC_SUSPENSION, OTHER_PREDNISOLONE, OTHER_CLARITHROMYCIN, OTHER_VILDAGLIPTIN, PT_IMP_FATAL_CVD
Methods of Information in Medicine Vol. 61 No. S2/2022 © 2022. The Author(s).
Cardiovascular Disease Event Prediction Hsu et al. e171
... Hsu et al. [11] proposed a methodology for cardiovascular disease-related event detection using recurrent neural networks. The research work was carried out using a 2-year observation and 5-year prediction window. ...
Article
Full-text available
Cardiovascular disease (CVDs) is a rapidly rising global concern due to unhealthy diets, lack of physical activity, and other factors. According to the World Health Organization (WHO), primary risk factors include elevated blood pressure, glucose, blood lipids, and obesity. Recent research has focused on accurate and timely disease prediction to reduce risk and fatalities, often relying on predictive models trained on large datasets, which require intensive training. An intelligent system for CVDs patients could greatly assist in making informed decisions by effectively analyzing health parameters. CEP has emerged as a valuable method for solving real-time challenges by aggregating patterns of interest and their causes and effects on end users. In this work, a fuzzy rule-based system is proposed for monitoring clinical data to provide real-time decision support. A fuzzy rule based on clinical and WHO standards ensures accurate predictions. The integrated approach uses Apache Kafka and Spark for data streaming, and the Siddhi CEP Engine for event processing. Additionally, numerous cardiovascular disease-related parameters are passed through CEP Engine to ensure fast and reliable prediction decisions. To validate the effectiveness of the approach, simulation is done with real-time, unseen data to predict cardiovascular disease. Using synthetic data (1000 samples), and categorized it into "Very Low Risk, Low Risk, Medium Risk, High Risk, and Very High Risk." Validation results showed that 20% of samples were categorized as very low risk, 15–45% as low risk, 35–65% as medium risk, 55–85% as high risk, and 75% as very high risk.
... This problem of dealing with long-range dependencies was overcome with the development of RNNs including a long short-term memory (LSTM) hidden unit that remembers the activation patterns of hidden layers. This allows significant events from the distant past to be recalled and unimportant events to be forgotten when making current predictions [62]. Within the context of healthcare, LSTM networks retain the sequential information from patient histories making them especially suitable for longterm forecasting using EHR data. ...
Article
Full-text available
The increasing access to health data worldwide is driving a resurgence in machine learning research, including data-hungry deep learning algorithms. More computationally efficient algorithms now offer unique opportunities to enhance diagnosis, risk stratification, and individualised approaches to patient management. Such opportunities are particularly relevant for the management of older patients, a group that is characterised by complex multimorbidity patterns and significant interindividual variability in homeostatic capacity, organ function, and response to treatment. Clinical tools that utilise machine learning algorithms to determine the optimal choice of treatment are slowly gaining the necessary approval from governing bodies and being implemented into healthcare, with significant implications for virtually all medical disciplines during the next phase of digital medicine. Beyond obtaining regulatory approval, a crucial element in implementing these tools is the trust and support of the people that use them. In this context, an increased understanding by clinicians of artificial intelligence and machine learning algorithms provides an appreciation of the possible benefits, risks, and uncertainties, and improves the chances for successful adoption. This review provides a broad taxonomy of machine learning algorithms, followed by a more detailed description of each algorithm class, their purpose and capabilities, and examples of their applications, particularly in geriatric medicine. Additional focus is given on the clinical implications and challenges involved in relying on devices with reduced interpretability and the progress made in counteracting the latter via the development of explainable machine learning.
Article
Full-text available
Objective To compare Cox models, machine learning (ML), and ensemble models combining both approaches, for prediction of stroke risk in a prospective study of Chinese adults. Materials and Methods We evaluated models for stroke risk at varying intervals of follow-up (<9 years, 0–3 years, 3–6 years, 6–9 years) in 503 842 adults without prior history of stroke recruited from 10 areas in China in 2004–2008. Inputs included sociodemographic factors, diet, medical history, physical activity, and physical measurements. We compared discrimination and calibration of Cox regression, logistic regression, support vector machines, random survival forests, gradient boosted trees (GBT), and multilayer perceptrons, benchmarking performance against the 2017 Framingham Stroke Risk Profile. We then developed an ensemble approach to identify individuals at high risk of stroke (>10% predicted 9-yr stroke risk) by selectively applying either a GBT or Cox model based on individual-level characteristics. Results For 9-yr stroke risk prediction, GBT provided the best discrimination (AUROC: 0.833 in men, 0.836 in women) and calibration, with consistent results in each interval of follow-up. The ensemble approach yielded incrementally higher accuracy (men: 76%, women: 80%), specificity (men: 76%, women: 81%), and positive predictive value (men: 26%, women: 24%) compared to any of the single-model approaches. Discussion and Conclusion Among several approaches, an ensemble model combining both GBT and Cox models achieved the best performance for identifying individuals at high risk of stroke in a contemporary study of Chinese adults. The results highlight the potential value of expanding the use of ML in clinical practice.
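The ensemble in the stroke-risk abstract above routes each individual to one of two fitted models. The sketch below shows that routing pattern only; the risk values and the age-based rule are hypothetical stand-ins, since the abstract does not state the actual individual-level characteristics used.

```python
def ensemble_risk(patient, gbt_risk, cox_risk, prefer_gbt):
    """Per-individual model selection: apply whichever of two fitted
    risk models the routing rule picks for this patient."""
    model = gbt_risk if prefer_gbt(patient) else cox_risk
    return model(patient)

# Illustrative stand-ins for fitted models and a routing rule.
gbt_risk = lambda p: 0.15              # hypothetical GBT-predicted risk
cox_risk = lambda p: 0.08              # hypothetical Cox-predicted risk
prefer_gbt = lambda p: p["age"] >= 60  # hypothetical routing rule

risk = ensemble_risk({"age": 67}, gbt_risk, cox_risk, prefer_gbt)
print(risk > 0.10)  # True: flagged as high risk under a >10% threshold
```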
Article
Full-text available
Background Maintaining adherence to statins reduces the risk of an initial cardiovascular disease (CVD) event in high-risk individuals (primary prevention) and additional CVD events following the first event (secondary prevention). The effectiveness of statin therapy is limited by the level of adherence maintained by the patient. We undertook a nationwide study to compare adherence and discontinuation in primary and secondary prevention patients. Methods Dispensing data from New Zealand community pharmacies were used to identify patients who received their first statin dispensing between 2006 and 2011. The Medication Possession Ratio (MPR) and proportion who discontinued statin medication was calculated for the year following first statin dispensing for patients with a minimum of two dispensings. Adherence was defined as an MPR ≥ 0.8. Previous CVD was identified using hospital discharge records. Multivariable logistic regression was used to control for demographic and statin characteristics. Results Between 2006 and 2011, 289,666 new statin users were identified, with 238,855 (82.5%) receiving the statin for primary prevention compared to 50,811 (17.5%) who received it for secondary prevention. The secondary prevention group was 1.55 (95% CI 1.51–1.59) times as likely to be adherent and 0.67 (95% CI 0.65–0.69) times as likely to discontinue statin treatment as the primary prevention group. An early gap in statin coverage increased the odds of discontinuing statin treatment. Conclusion Adherence to statin medication is higher in secondary prevention than primary prevention. Within each group, a range of demographic and treatment factors further influences adherence.
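The MPR ≥ 0.8 adherence rule from the abstract above is a simple ratio, sketched below. This deliberately simplifies the study's definition: it caps the ratio at 1 and does not handle overlapping supplies or early refills.

```python
def medication_possession_ratio(days_supplied, window_days=365):
    """MPR: total days of medication supplied in the observation window
    divided by the length of the window, capped at 1.0."""
    return min(sum(days_supplied) / window_days, 1.0)

fills = [90, 90, 90, 90]          # four quarterly statin dispensings
mpr = medication_possession_ratio(fills)
print(round(mpr, 3), mpr >= 0.8)  # 360/365 ≈ 0.986 -> adherent (MPR ≥ 0.8)
```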
Article
Full-text available
Background Chronic tinnitus is a complex condition that can be associated with considerable distress. Whilst cognitive-behavioral treatment (CBT) approaches have been shown to be effective, not all patients benefit from psychological or psychologically anchored multimodal therapies. Determinants of tinnitus-related distress thus provide valuable information about tinnitus characterization and therapy planning. Objective The study aimed to develop machine learning models that use variables (or “features”) obtained before treatment to characterize patients’ tinnitus-related distress status after treatment. Whilst initially all available variables were considered for model training, the final model was required to achieve highest predictive performance using only a small number of features. Methods 1,416 tinnitus patients (decompensated tinnitus: 32%) who completed a 7-day multimodal treatment encompassing tinnitus-specific components, CBT, physiotherapy and informational counseling were included in the analysis. At baseline, patients were assessed using 205 features from 10 questionnaires comprising sociodemographic and clinical information. A data-driven workflow was developed consisting of (a) an initial exploratory correlation analysis, (b) supervised machine learning to predict tinnitus-related distress after treatment (T1) using baseline data only (T0), and (c) post-hoc analysis of the best model to facilitate model inspection and understanding. Classification methods were embedded in a feature elimination wrapper that iteratively learned on features found to be important for the model in the preceding iteration, in order to keep the performance stable while successively reducing the model complexity. 10-fold cross-validation with area under the curve (AUC) as performance measure was implemented for model generalization error estimation. 
Results The best machine learning classifier (gradient boosted trees) can predict tinnitus-related distress in T1 with AUC = 0.890 using 26 features. Subjectively perceived tinnitus-related impairment, depressivity, sleep problems, physical health-related impairments in quality of life, time spent to complete questionnaires and educational level exhibited a high attribution towards model prediction. Conclusions Machine learning can reliably identify baseline features recorded prior to treatment commencement that characterize tinnitus-related distress after treatment. The identification of key features can contribute to an improved understanding of multifactorial contributors to tinnitus-related distress and thereon based multimodal treatment strategies.
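The feature-elimination wrapper described in the tinnitus abstract above iteratively refits a model on the features found important in the previous round. The sketch below captures that loop in miniature; the importance score (absolute correlation with the label) is a stand-in for the model-derived importances the study would use, and all data here are synthetic.

```python
import numpy as np

def eliminate_features(X, y, keep=4):
    """Iteratively drop the least informative feature until `keep`
    features remain, mimicking a feature-elimination wrapper."""
    active = list(range(X.shape[1]))
    while len(active) > keep:
        # Stand-in importance: |correlation| between feature and label.
        scores = [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in active]
        active.pop(int(np.argmin(scores)))  # remove the weakest feature
    return active

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 12))
# Label depends on columns 0 and 3 only; the other 10 are noise.
y = (X[:, 0] + 0.5 * X[:, 3] + rng.normal(scale=0.1, size=200) > 0).astype(int)
kept = eliminate_features(X, y, keep=4)
print(sorted(kept))  # the informative columns 0 and 3 should survive
```

In the study itself the wrapper also monitored cross-validated AUC at each iteration so that complexity was reduced only while performance stayed stable.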
Article
Full-text available
Background Identifying people at risk of cardiovascular diseases (CVD) is a cornerstone of preventative cardiology. Risk prediction models currently recommended by clinical guidelines are typically based on a limited number of predictors with sub-optimal performance across all patient groups. Data-driven techniques based on machine learning (ML) might improve the performance of risk predictions by agnostically discovering novel risk predictors and learning the complex interactions between them. We tested (1) whether ML techniques based on a state-of-the-art automated ML framework (AutoPrognosis) could improve CVD risk prediction compared to traditional approaches, and (2) whether considering non-traditional variables could increase the accuracy of CVD risk predictions. Methods and findings Using data on 423,604 participants without CVD at baseline in UK Biobank, we developed a ML-based model for predicting CVD risk based on 473 available variables. Our ML-based model was derived using AutoPrognosis, an algorithmic tool that automatically selects and tunes ensembles of ML modeling pipelines (comprising data imputation, feature processing, classification and calibration algorithms). We compared our model with a well-established risk prediction algorithm based on conventional CVD risk factors (Framingham score), a Cox proportional hazards (PH) model based on familiar risk factors (i.e., age, gender, smoking status, systolic blood pressure, history of diabetes, reception of treatments for hypertension and body mass index), and a Cox PH model based on all of the 473 available variables. Predictive performances were assessed using area under the receiver operating characteristic curve (AUC-ROC).
Overall, our AutoPrognosis model improved risk prediction (AUC-ROC: 0.774, 95% CI: 0.768-0.780) compared to Framingham score (AUC-ROC: 0.724, 95% CI: 0.720-0.728, p < 0.001), Cox PH model with conventional risk factors (AUC-ROC: 0.734, 95% CI: 0.729-0.739, p < 0.001), and Cox PH model with all UK Biobank variables (AUC-ROC: 0.758, 95% CI: 0.753-0.763, p < 0.001). Out of 4,801 CVD cases recorded within 5 years of baseline, AutoPrognosis was able to correctly predict 368 more cases compared to the Framingham score. Our AutoPrognosis model included predictors that are not usually considered in existing risk prediction models, such as the individuals’ usual walking pace and their self-reported overall health rating. Furthermore, our model improved risk prediction in potentially relevant sub-populations, such as in individuals with history of diabetes. We also highlight the relative benefits accrued from including more information into a predictive model (information gain) as compared to the benefits of using more complex models (modeling gain). Conclusions Our AutoPrognosis model improves the accuracy of CVD risk prediction in the UK Biobank population. This approach performs well in traditionally poorly served patient subgroups. Additionally, AutoPrognosis uncovered novel predictors for CVD disease that may now be tested in prospective studies. We found that the “information gain” achieved by considering more risk factors in the predictive model was significantly higher than the “modeling gain” achieved by adopting complex predictive models.
Article
Full-text available
The purpose of this study is to examine existing deep learning techniques for addressing class imbalanced data. Effective classification with imbalanced data is an important area of research, as high class imbalance is naturally inherent in many real-world applications, e.g., fraud detection and cancer detection. Moreover, highly imbalanced data poses added difficulty, as most learners will exhibit bias towards the majority class, and in extreme cases, may ignore the minority class altogether. Class imbalance has been studied thoroughly over the last two decades using traditional machine learning models, i.e. non-deep learning. Despite recent advances in deep learning, along with its increasing popularity, very little empirical work in the area of deep learning with class imbalance exists. Having achieved record-breaking performance results in several complex domains, investigating the use of deep neural networks for problems containing high levels of class imbalance is of great interest. Available studies regarding class imbalance and deep learning are surveyed in order to better understand the efficacy of deep learning when applied to class imbalanced data. This survey discusses the implementation details and experimental results for each study, and offers additional insight into their strengths and weaknesses. Several areas of focus include: data complexity, architectures tested, performance interpretation, ease of use, big data application, and generalization to other domains. We have found that research in this area is very limited, that most existing work focuses on computer vision tasks with convolutional neural networks, and that the effects of big data are rarely considered. Several traditional methods for class imbalance, e.g. data sampling and cost-sensitive learning, prove to be applicable in deep learning, while more advanced methods that exploit neural network feature learning abilities show promising results. 
The survey concludes with a discussion that highlights various gaps in deep learning from class imbalanced data for the purpose of guiding future research.
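Among the traditional remedies the survey above finds transferable to deep learning is data sampling. The sketch below shows the simplest variant, random oversampling of the minority class, on synthetic data; class-weighted losses and more advanced feature-learning methods are the alternatives the survey discusses.

```python
import numpy as np

def oversample_minority(X, y, rng):
    """Random oversampling: duplicate randomly chosen minority-class
    examples until both classes have equal counts."""
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    deficit = counts.max() - counts.min()
    idx = np.flatnonzero(y == minority)
    extra = rng.choice(idx, size=deficit, replace=True)
    return np.concatenate([X, X[extra]]), np.concatenate([y, y[extra]])

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = np.array([0] * 90 + [1] * 10)   # 9:1 class imbalance
Xb, yb = oversample_minority(X, y, rng)
print(np.bincount(yb))  # balanced: [90 90]
```

Duplicating minority examples risks overfitting to them, which is one reason the survey also covers cost-sensitive losses as an alternative.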
Article
Deep learning has demonstrated success in many applications; however, its use in healthcare has been limited due to the lack of transparency into how predictions are generated. Algorithms such as Recurrent Neural Networks (RNNs), when applied to Electronic Medical Records (EMR), introduce additional barriers to transparency because of the sequential processing of the RNN and the multi-modal nature of EMR data. This work seeks to improve transparency by: 1) introducing Learned Binary Masks (LBM) as a method for identifying which EMR variables contributed to an RNN model’s risk of mortality (ROM) predictions for critically ill children; and 2) applying KernelSHAP for the same purpose. Given an individual patient, LBM and KernelSHAP both generate an attribution matrix that shows the contribution of each input feature to the RNN’s sequence of predictions for that patient. Attribution matrices can be aggregated in many ways to facilitate different levels of analysis of the RNN model and its predictions. Presented are three methods of aggregation and analysis: 1) over volatile time periods within individual patient predictions, 2) over populations of ICU patients sharing specific diagnoses, and 3) across the general population of critically ill children.
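The attribution matrix described in the abstract above (time steps × input features, one per patient) lends itself to simple aggregations. The sketch below uses a random matrix as a stand-in for real LBM or KernelSHAP output and shows two of the aggregation directions mentioned: per-feature importance and per-time-step volatility.

```python
import numpy as np

# Toy attribution matrix for one patient: rows = time steps of the RNN's
# prediction sequence, columns = EMR input features. Values stand in for
# the signed contributions an LBM or KernelSHAP analysis would produce.
rng = np.random.default_rng(0)
T, F = 12, 6
attr = rng.normal(size=(T, F))

# Aggregation over time: total |contribution| of each feature.
feature_importance = np.abs(attr).sum(axis=0)

# Aggregation over features: flag time steps with unusually large
# total attribution, i.e. volatile periods in the prediction sequence.
step_magnitude = np.abs(attr).sum(axis=1)
volatile_steps = np.flatnonzero(step_magnitude > step_magnitude.mean())

print(feature_importance.shape, volatile_steps.size)
```

Population-level analyses stack these per-patient matrices and aggregate across patients sharing a diagnosis, the third level described in the abstract.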
Article
Objectives: Interest in the clinical usefulness of machine learning for risk prediction has bloomed recently. Cardiac surgery patients are at high risk of complications and therefore presurgical risk assessment is of crucial relevance. We aimed to compare the performance of machine learning algorithms over the traditional logistic regression (LR) model to predict in-hospital mortality following cardiac surgery. Methods: A single-centre data set of prospectively collected information from patients undergoing adult cardiac surgery from 1996 to 2017 was split into a 70% training set and a 30% testing set. Prediction models were developed using neural network, random forest, naive Bayes and retrained LR based on features included in the EuroSCORE. Discrimination was assessed using area under the receiver operating characteristic curve, and calibration analysis was undertaken using the calibration belt method. Model calibration drift was assessed by comparing goodness-of-fit χ² statistics observed in two equal bins from the testing sample ordered by procedure date. Results: A total of 28 761 cardiac procedures were performed during the study period. The in-hospital mortality rate was 2.7%. Retrained LR [area under the receiver operating characteristic curve 0.80; 95% confidence interval (CI) 0.77-0.83] and the random forest model (0.80; 95% CI 0.76-0.83) showed the best discrimination. All models showed significant miscalibration. Retrained LR proved to have the weakest calibration drift. Conclusions: Our findings do not support the hypothesis that machine learning methods provide an advantage over the LR model in predicting operative mortality after cardiac surgery.
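The calibration-drift check in the abstract above compares goodness-of-fit χ² statistics in two date-ordered halves of the test set. The sketch below uses a Hosmer-Lemeshow-style χ² over probability deciles as a stand-in for the statistic (the paper's exact formulation and the calibration belt method are not reproduced), with simulated, well-calibrated predictions.

```python
import numpy as np

def calibration_chi2(y, p, bins=10):
    """Hosmer-Lemeshow-style goodness of fit: per probability decile,
    (observed - expected)^2 / binomial variance, summed over deciles."""
    edges = np.quantile(p, np.linspace(0, 1, bins + 1))
    idx = np.clip(np.digitize(p, edges[1:-1]), 0, bins - 1)
    chi2 = 0.0
    for b in range(bins):
        m = idx == b
        n = m.sum()
        if n == 0:
            continue
        obs, exp = y[m].sum(), p[m].sum()
        chi2 += (obs - exp) ** 2 / max(exp * (1 - exp / n), 1e-9)
    return chi2

rng = np.random.default_rng(0)
p = rng.uniform(0.01, 0.30, size=2000)        # predicted mortality risks
y = (rng.uniform(size=2000) < p).astype(int)  # outcomes simulated from p
# Drift check: compare the fit in two equal bins ordered by procedure date.
early = calibration_chi2(y[:1000], p[:1000])
late = calibration_chi2(y[1000:], p[1000:])
print(early >= 0.0 and late >= 0.0)
```

A model whose χ² worsens in the later bin is drifting out of calibration over time, which is how the study identified retrained LR as having the weakest drift.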