Stroke. 2022;53:2299–2306. DOI: 10.1161/STROKEAHA.121.036557
Correspondence to: Asmir Vodencarevic, PhD, Novartis Pharma GmbH, Roonstr. 25, 90429 Nuremberg, Germany. Email asmir.vodencarevic@novartis.com
Supplemental Material is available at https://www.ahajournals.org/doi/suppl/10.1161/STROKEAHA.121.036557.
For Sources of Funding and Disclosures, see page 2305.
© 2022 American Heart Association, Inc.
CLINICAL AND POPULATION SCIENCES
Prediction of Recurrent Ischemic Stroke Using
Registry Data and Machine Learning Methods:
The Erlangen Stroke Registry
Asmir Vodencarevic , PhD; Michael Weingärtner; J. Jaime Caro , MDCM; Dubravka Ukalovic , MSc;
Marcus Zimmermann-Rittereiser, Dipl-Ing, MBM; Stefan Schwab , MD, PhD; Peter Kolominsky-Rabas , MD, PhD, MBA
BACKGROUND: There have been multiple efforts toward individual prediction of recurrent strokes based on structured clinical
and imaging data using machine learning algorithms. Some of these efforts resulted in relatively accurate prediction models.
However, acquiring clinical and imaging data is typically possible at provider sites only and is associated with additional costs.
Therefore, we developed recurrent stroke prediction models based solely on data easily obtained from the patient at home.
METHODS: Data from 384 patients with ischemic stroke were obtained from the Erlangen Stroke Registry. Patients were
followed at 3 and 12 months after the first stroke and then annually, for about 2 years on average. Multiple machine learning algorithms were applied to train predictive models for estimating the individual risk of recurrent stroke within 1 year. Double nested cross-validation was utilized for conservative performance estimation, and the models' learning capabilities were assessed by learning curves. Predicted probabilities were calibrated, and relative variable importance was assessed using explainable
artificial intelligence techniques.
RESULTS: The best model achieved an area under the curve of 0.70 (95% CI, 0.64–0.76) and relatively good probability calibration. The most predictive factors included the patient's family and housing circumstances, rehabilitative measures, age, high calorie diet, systolic and diastolic blood pressures, percutaneous endoscopic gastrostomy, number of family doctor's home visits, and the patient's mental state.
CONCLUSIONS: Developing fairly accurate models for individual risk prediction of recurrent ischemic stroke within 1 year solely
based on registry data is feasible. Such models could be applied in a home setting to provide an initial risk assessment and
identify high-risk patients early.
GRAPHIC ABSTRACT: A graphic abstract is available for this article.
Key Words: ischemic stroke ◼ machine learning ◼ probability ◼ recurrence ◼ registries
In 2019, stroke was the third most common cause of disability over all ages globally, being responsible for 5.7% (5.1–6.2) of all-cause disability-adjusted life-years (DALYs).1 This represents an increase of 32.4% (22.0–42.2) in disability-adjusted life-years compared with 1990.1 The disease burden is even more striking for the population above 50 years of age, in which stroke is globally the second most common cause of disability.1
After a stroke, the risk of recurrence is increased,1 emphasizing the need for closer risk monitoring and timely therapy adjustments in patients who have already had a stroke. Early identification of patients at increased risk of recurrence increases the opportunity for stroke prevention, as special attention and available resources can be devoted to such patients. To achieve this, accurate risk prediction models are necessary. Some clinical risk scores have been developed,2–4 2 of which
were applied to assess the risk of recurring stroke within
1 year.5,6 Their predictive performance varied significantly
depending on evaluation methodologies, derivation popu-
lations, outcomes, and prediction time horizons, impeding
their direct comparison.7 For instance, the Stroke Prog-
nosis Instrument II (SPI-II) was developed for patients
with nondisabling ischemic stroke or transient ischemic
attack to predict the combined outcome of stroke or
death within 2 years,2 while the California Risk Score was
derived to predict the stroke risk within 90 days following
transient ischemic attack.3
The most prominent score developed for estimating
the short-term (at 90 days) risk of recurrent ischemic
stroke is the Recurrence Risk Estimator (RRE-90).4 It
reached area under the receiver operating characteristic
curve (AUROC) values of 0.70 to 0.82.7 Custom models for short-term recurrent stroke prediction were trained for specific subpopulations of patients with large artery disease8 and atrial fibrillation9 using Cox and logistic regression, respectively, showing moderate AUROC values of 0.62 to 0.70.
The best-known scores to estimate the long-term (1
year) risk of ischemic stroke recurrence are the Essen
Stroke Risk Score (ESRS)5 and the modified ESRS.6
ESRS is based on patient age, several comorbidities (including hypertension, diabetes, etc), previous myocardial infarction, and smoking status. The modified ESRS was created by additionally including sex, stroke subtype by etiology, and waist circumference. Although not developed for short-term predictions,10 ESRS did not show good performance at 1 year either (AUROC, 0.56 [95% CI, 0.40–0.64]).11 These poor-to-moderate results of the state-of-the-art scores, as well as the estimated pooled cumulative risk of ischemic stroke recurrence within 1 year of 11.1% (95% CI, 9.0–13.3),12 underline the importance of developing an accurate tool for long-term predictions. Such a tool would enable patient risk stratification and targeted assignment of scarce health care resources to high-risk patients.
Attempts have been made to identify predictors of
recurrent stroke using Cox regression without develop-
ing a prediction model.13,14 Logistic regression was used with only clinical and imaging variables (AUROC, 0.71), only retinal characteristics (AUROC, 0.65), and both (AUROC, 0.74); however, performance was measured on the same data used for model development (no separate test data).15 Nonlinear machine learning algorithms reached an accuracy of 0.65 (details about the variables and algorithms used, as well as AUROC, were not reported).16 Neural networks achieved an AUROC of 0.77 with an interquartile range of 0.68 to 0.84 in 10 runs of 5-fold cross-validation, relying on clinical and imaging data and focusing on transient ischemic attacks and minor strokes.17 The majority class (no recurrent stroke within 1 year) was randomly undersampled before, rather than within, cross-validation, which introduced bias. We investigated the
feasibility of predicting individual risk of recurrent isch-
emic stroke within 1 year using long-term data from a
population-based registry.
METHODS
Study Design and Participants
We analyzed anonymized data of 384 patients from the
Erlangen Stroke Registry (ESPro).18 The study was approved
by the Ethics Committee of the Medical Faculty of Friedrich-
Alexander University Erlangen-Nürnberg (Reference number:
249_15 Bc). Written informed consent to participate was given
by patients or their legal representatives. ESPro is an ongo-
ing, population-based, prospective, longitudinal regional study
focusing on stroke and vascular dementia. The study was
started in 1994 and currently covers the population of 112 385
inhabitants of Erlangen in Northern Bavaria, Germany. ESPro
comprises data on 10 000 cerebrovascular events (status as of September 27, 2021; both hospitalized and outpatient), with
about 1500 annual follow-ups, making it the largest stroke
registry in Germany. Patients are followed at 3 and 12 months
after the initial stroke event and then annually. Data on 250
variables, including demographics, comorbidities, interactions with the health care system, and limited clinical parameters, are collected during interviews with patients or their representatives.
The methodology and characteristics of the ESPro population
are described elsewhere.18 Because of the sensitive nature
of the data collected for this study, requests to access the
dataset from qualified researchers trained in human subject
confidentiality protocols may be sent to the Interdisciplinary
Center for Health Technology Assessment (HTA) and Public
Health, Friedrich-Alexander University Erlangen-Nürnberg at
peter.kolominsky-rabas@fau.de. Because of local data privacy
requirements, only data of patients who had expired before the
start of this study were included in the analysis. Data prepa-
ration, model development, and evaluation were performed in
the Python 3.7.4 programming language and the correspond-
ing packages including pandas 1.0.1, scikit-learn 0.22.1, and
imblearn 0.6.2. Code ownership remains with the industrial project partner.
Data Preparation
The unit of analysis in this modeling study was a patient fol-
low-up. The outcome to be predicted was binary: occurrence
of recurrent ischemic stroke within 1 year from the follow-up.
This time horizon was selected based on the available data
(predominantly annual follow-ups) but also due to the lack of accurate long-term risk predictors.
Nonstandard Abbreviations and Acronyms
AUROC  area under the receiver operating characteristic curve
ESPro  Erlangen Stroke Registry
ESRS  Essen Stroke Risk Score
SMOTE  synthetic minority oversampling technique
The data were collected at
1189 follow-ups of 384 patients (mean number of follow-ups
per patient 3.09 [95% CI, 2.97–3.21]), with mean age 78.8
[95% CI, 77.8–79.8] years and 201 (52.3%) were female
(Table 1). Three additional potential predictors were computed:
dynamic patient age (for each follow-up, age at baseline plus
the relative time between baseline date and follow-up date),
time since the last stroke, and the number of previous strokes. After absolute dates and 113 variables with >60% missing values were removed,19 141 variables remained and were included in
the expert variable selection (Table S1). Two stroke experts
selected 93 potential predictors on the basis of the quality indi-
cators for Stroke Care of the German Stroke Registers Study.20
Categorical variables were dummy encoded (one binary vari-
able for each category) and numerical variables with >40% of
missing values were replaced with missing value indicators,
which specify whether a value was missing (1) or not (0). The
remaining numerical missing values were imputed iteratively as
described in the model development and evaluation section.
After removing redundant variables and categorizing the dos-
age of acetylsalicylic acid and Barthel index, the final dataset
included 119 predictors (10 numerical and 109 binary). All fea-
ture selection steps up to this point were done in an unsuper-
vised manner (ie, target variable was not used) and therefore
performed within data preparation before modeling. Because the last step of feature selection was supervised (χ2 test), it was included in the machine learning pipeline to avoid data leakage, as described in the next section. Details of the
data preparation workflow are given in Figure S1.
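The preparation steps above can be illustrated with a minimal pandas sketch. The column names (followup_date, baseline_date, age_at_baseline) are placeholders, since the actual ESPro variable names are not public, and the thresholds follow the description in the text.

import pandas as pd

def prepare_features(followups: pd.DataFrame) -> pd.DataFrame:
    """Illustrative preparation of one follow-up table (hypothetical column names)."""
    df = followups.copy()

    # Dynamic patient age: age at baseline plus elapsed time to the follow-up.
    elapsed_years = (df["followup_date"] - df["baseline_date"]).dt.days / 365.25
    df["dynamic_age"] = df["age_at_baseline"] + elapsed_years

    # Remove absolute dates and variables with more than 60% missing values.
    df = df.drop(columns=["followup_date", "baseline_date"])
    df = df.loc[:, df.isna().mean() <= 0.60]

    # Dummy encode categorical variables (one binary column per category).
    categorical = df.select_dtypes(include="object").columns.tolist()
    df = pd.get_dummies(df, columns=categorical)

    # Numerical variables with >40% missing values become missing-value indicators.
    for col in df.select_dtypes(include="number").columns:
        if df[col].isna().mean() > 0.40:
            df[col + "_missing"] = df[col].isna().astype(int)
            df = df.drop(columns=col)
    return df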
Model Development and Evaluation
Since the outcome was binary, the prediction task was treated as a binary classification problem with the following classes: (1) recurrence and (2) no recurrence within 1 year from the follow-up. For this reason, and also to increase comparability with related work, the selected algorithm performance metric was the AUROC. Three major challenges were addressed in the analysis: class imbalance, missing data, and the curse of dimensionality. The classes were highly skewed, with only 89 recurrent strokes recorded within 1 year (7.49%). Five machine
learning approaches were deployed to handle this: random
undersampling of the majority class,21 synthetic oversampling
of the minority class using the Synthetic Minority Oversampling
Technique algorithm,21 cost-sensitive learning21 (assigning
higher penalty to misclassification of the minority class, COST),
anomaly detection algorithms (treating minority class as an
anomaly to be detected),22 and balanced learning algorithms,23
where class balancing is embedded in the learning algorithm.
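As a minimal sketch (not the study's exact configuration), the three strategies that were combined with the learners could be set up with imblearn and scikit-learn as follows; hyperparameter values are illustrative and the linear support vector machine stands in for any cost-sensitive learner.

from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import RandomUnderSampler
from sklearn.svm import LinearSVC

smote = SMOTE(random_state=0)                  # synthetic oversampling of the minority class
rus = RandomUnderSampler(random_state=0)       # random undersampling of the majority class
cost_svm = LinearSVC(class_weight="balanced")  # cost-sensitive learning via class weights

# Resampling is fitted on training folds only, so no information leaks into test folds:
# X_res, y_res = smote.fit_resample(X_train, y_train)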
The percentage of missing values in numerical variables
varied from 0% in patient age to 59% in high blood pressure
treatment duration. Those numerical variables with >40% of
missing values were replaced with missing value indicators as
described in the data preparation step. Other numerical vari-
ables, containing 40% or less missing values, were imputed
iteratively within double nested cross-validation procedure to
avoid any data leakage. The imputation threshold of 40% was chosen to avoid imputing a majority of (missing) observations based on a minority of observed values.
Iterative imputation was performed by modeling each vari-
able with missing values as a function of other variables using
regularized linear regression and applying those models to
estimate missing values.24 Like the random undersampling and SMOTE resampling models, the imputation models were trained only on the training data subsets to avoid data leakage. Imputed numerical variables were standardized to zero mean and unit variance.
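A sketch of this step with scikit-learn, assuming the numerical training columns are available as X_train_numeric (a placeholder name); IterativeImputer's default BayesianRidge estimator plays the role of the regularized linear regression.

from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (enables IterativeImputer)
from sklearn.impute import IterativeImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

numeric_preprocessing = Pipeline(steps=[
    ("impute", IterativeImputer(max_iter=10, random_state=0)),  # each variable modeled from the others
    ("scale", StandardScaler()),                                # zero mean, unit variance after imputation
])
# Fit on the training folds only, then apply to the corresponding test fold:
# X_train_numeric = numeric_preprocessing.fit_transform(X_train_numeric)
# X_test_numeric = numeric_preprocessing.transform(X_test_numeric)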
The curse of dimensionality relates to the problem of having
a high number of variables for a given number of data points
(follow-ups). In this case, the data are sparse, and learning
algorithms can discover incorrect patterns. This problem was
addressed by dimensionality reduction (ie, the χ2 test was used
to select the 10 most relevant binary variables, matching their
number to the 10 already available numerical variables). By allowing no more than 20 variables in the final model, at least 50 observations per variable were available to the learning algorithm, additionally reducing the chance of overfitting.
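A sketch of this supervised selection step, which must run inside the cross-validation pipeline so that the target is only seen on training folds; X_train_binary and y_train are placeholder names.

from sklearn.feature_selection import SelectKBest, chi2

# Select the 10 binary variables most associated with the outcome (chi-squared test).
select_binary = SelectKBest(score_func=chi2, k=10)
# X_train_selected = select_binary.fit_transform(X_train_binary, y_train)
# X_test_selected = select_binary.transform(X_test_binary)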
After addressing these challenges, the machine learning algo-
rithm was chosen. According to the no free lunch theorem, any
2 algorithms are equivalent when their performance is averaged
across all possible problems.25 An extensive set of 25 learning
algorithms was evaluated, many combined with random undersampling, SMOTE, and COST, resulting in 52 tested approaches (Table 2). Most algorithms have specific hyperparameters, such as the regularization parameter of logistic regression, which were tuned during model development (Table S4). The developed models are dynamic in that they can re-estimate the risk whenever new data become available.
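As an illustration of how such hyperparameters can be tuned (the actual grids are given in Table S4, so the values below are placeholders), an inner cross-validation search could look like this:

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.svm import LinearSVC

candidates = {
    "logistic_regression": (LogisticRegression(max_iter=1000), {"C": [0.01, 0.1, 1, 10]}),
    "linear_svm": (LinearSVC(max_iter=5000), {"C": [0.01, 0.1, 1, 10]}),
}
searches = {
    name: GridSearchCV(estimator, grid, scoring="roc_auc", cv=5)
    for name, (estimator, grid) in candidates.items()
}
# Each search is fitted on the inner cross-validation folds only:
# searches["linear_svm"].fit(X_inner_train, y_inner_train)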
The models employed predict not only the binary class label
but also its probability, that is, the risk of recurrent stroke. An
important concept for assessing the quality of predicted probabilities, especially in the presence of class imbalance, is the calibration plot. It shows how well the predicted stroke probabilities
match the observed frequency of strokes. Calibration plots are
created by grouping the predicted probabilities into a fixed
number of groups and plotting the mean prediction for each
group (x-axis) against the observed stroke frequency in that
group (y-axis). The line x = y indicates a perfectly calibrated
model. To improve probability calibration of the employed mod-
els, isotonic regression was applied before performance evalu-
ation.26 To provide insight into the importance of variables for the prediction, the SHapley Additive exPlanations (SHAP) framework was applied.27
Table 1. Baseline Characteristics
Characteristic* Included patients with stroke
Age, y 78.8 (10.0)
Sex
Male 183 (47.7)
Female 201 (52.3)
Body mass index, kg/m²  25.1 (4.3)
TOAST classification
Large artery atherosclerosis 22 (5.7)
Cardioembolism 106 (27.6)
Small artery occlusion 85 (22.1)
Other determined 2 (0.6)
Undetermined 169 (44.0)
Barthel index 11.8 (7.7)
*Data are presented as number (%) or mean (SD). TOAST indicates Trial of
ORG 10172 in Acute Stroke Treatment.
Finally, learning curves were used to evaluate
how the size of training data affects the model performance.
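A sketch of the calibration step and the corresponding diagnostic curve with scikit-learn; the variable names are placeholders, and the uncalibrated learner is shown as a linear support vector machine for concreteness.

from sklearn.calibration import CalibratedClassifierCV, calibration_curve
from sklearn.svm import LinearSVC

# Map the classifier's scores to probabilities with isotonic regression.
calibrated_model = CalibratedClassifierCV(LinearSVC(max_iter=5000), method="isotonic", cv=5)
# calibrated_model.fit(X_train, y_train)
# prob_recurrence = calibrated_model.predict_proba(X_test)[:, 1]

# Calibration plot data: mean predicted probability vs observed stroke frequency per bin.
# frac_observed, mean_predicted = calibration_curve(y_test, prob_recurrence, n_bins=10)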
The machine learning pipeline (ie, the sequence of data processing steps), consisting of missing value imputation for numerical variables, χ2 selection of binary variables, resampling (not applied with the cost-sensitive, balanced learning, and anomaly detection algorithms), model training with hyperparameter optimization, and probability calibration, was validated using a double nested cross-validation protocol (Figure S2).28 First, the whole dataset was randomly divided into a development set (1083 follow-ups of 345 patients with 81 recurrent strokes) and a hold-out set (106 follow-ups of 39 different patients with 8 recurrent strokes), which was used for calibration, variable importance, and additional performance evaluation. The development set was used in double nested cross-validation, consisting of 5 k-fold cross-validation loops (k=5 in each loop). These loops were used for the independent tasks of hyperparameter optimization, probability calibration, and an unbiased, conservative performance estimate.28 To avoid data leakage, it was ensured that all follow-ups from the same patient were either in the training set or the test set in each fold of the double nested cross-validation.
decision threshold was determined using Youden’s J statistic.29
More technical details and the whole machine learning pipeline
are given in Figure 1. To improve transparent reporting of the
machine learning modeling approach, a completed MI-CLAIM
checklist30 is provided as a separate supplement file.
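The grouping of follow-ups by patient, the placement of resampling inside the pipeline, and the threshold choice can be sketched as follows: a simplified, single-loop version of the double nested protocol with synthetic placeholder data standing in for X, y, and patient_ids (imputation and χ2 selection would also sit inside this pipeline, as in Figure 1).

import numpy as np
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline as ImbPipeline
from sklearn.calibration import CalibratedClassifierCV
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import GroupKFold
from sklearn.svm import LinearSVC

# Placeholder data mimicking the dimensions described in the text.
rng = np.random.default_rng(0)
X = rng.normal(size=(1189, 20))                 # 20 predictors in the final model
y = rng.binomial(1, 0.075, size=1189)           # ~7.5% recurrence rate
patient_ids = rng.integers(0, 384, size=1189)   # follow-ups grouped by patient

pipeline = ImbPipeline(steps=[
    ("smote", SMOTE(random_state=0)),  # resampling happens only when the pipeline is fitted
    ("model", CalibratedClassifierCV(LinearSVC(max_iter=5000), method="isotonic", cv=3)),
])

outer_cv = GroupKFold(n_splits=5)  # keeps all follow-ups of a patient in the same fold
aucs, thresholds_j = [], []
for train_idx, test_idx in outer_cv.split(X, y, groups=patient_ids):
    pipeline.fit(X[train_idx], y[train_idx])
    prob = pipeline.predict_proba(X[test_idx])[:, 1]
    aucs.append(roc_auc_score(y[test_idx], prob))

    # Youden's J: threshold maximizing sensitivity + specificity - 1.
    fpr, tpr, thr = roc_curve(y[test_idx], prob)
    thresholds_j.append(thr[np.argmax(tpr - fpr)])

print(f"AUROC: {np.mean(aucs):.2f} (SD {np.std(aucs):.2f})")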
RESULTS
The best prediction performance was achieved by the linear support vector machine algorithm19 in combination with the Synthetic Minority Oversampling Technique. The
AUROC was 0.70 (95% CI, 0.64–0.76), as measured by
double nested cross-validation (Figure 2A). The pooled
confusion matrix over 5 test cross-validation folds showed
specificity of 0.78 and sensitivity of 0.63 (Figure 2B).
These 2 metrics were selected because they are commonly used to evaluate the utility of binary classification models in medical applications. An additional evaluation on the hold-out set showed comparable results (AUROC of 0.72). The performances of the other approaches were significantly lower (Table 2). The learning curve reveals a steadily
growing AUROC when more data are provided to the
learning algorithm (Figure 2C). The training AUROC
depicted by the upper line finally comes close to the test
AUROC, indicating that the model does not suffer from
significant overfitting. The learning curve also shows that
model test performance is likely to grow further with more
follow-ups but likely not above the upper bound defined
by the training AUROC (0.74). The top 10 variables influ-
encing the predictions, according to the SHAP framework,
were widow(er) marital status, received rehabilitation, liv-
ing situation, age, high calorie food, mean diastolic blood
pressure, percutaneous endoscopic gastrostomy, GP home
visits, Mini Mental Status Test, and mean systolic blood
pressure (Figure 2D). The direction of influence of these
variables on the risk of recurrent stroke is illustrated in
Figure S3 and discussed in the next section. See Table
S2 for list of all 20 variables in the final model. Model
calibration showed significant improvement after applying
isotonic regression compared with the uncalibrated support vector machine model (Figure 2E and 2F). Nonetheless, the calibrated model still considerably overestimates the risk of recurrent stroke, especially in the mid and high ranges of the stroke probability.
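As a sketch of how the SHAP-based ranking could be produced on the hold-out set (a model-agnostic KernelExplainer is shown here; the data, the model, and the sample sizes below are synthetic placeholders standing in for the fitted pipeline and the actual hold-out set):

import numpy as np
import shap
from sklearn.calibration import CalibratedClassifierCV
from sklearn.svm import LinearSVC

# Placeholder data and model in place of the fitted pipeline and the 106-follow-up hold-out set.
rng = np.random.default_rng(0)
X_dev, y_dev = rng.normal(size=(500, 20)), rng.binomial(1, 0.075, size=500)
X_holdout = rng.normal(size=(106, 20))
calibrated_model = CalibratedClassifierCV(LinearSVC(max_iter=5000), method="isotonic", cv=3)
calibrated_model.fit(X_dev, y_dev)

# Model-agnostic KernelExplainer with a small background sample for tractability.
explainer = shap.KernelExplainer(lambda d: calibrated_model.predict_proba(d)[:, 1],
                                 shap.sample(X_dev, 50))
shap_values = explainer.shap_values(X_holdout, nsamples=100)
shap.summary_plot(shap_values, X_holdout, plot_type="bar")  # mean |SHAP| per variable (Figure 2D style)
shap.summary_plot(shap_values, X_holdout)                   # signed effects (Figure S3 style)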
DISCUSSION
In this study, we developed a fairly accurate machine learning model for estimating the individual risk of recurrent ischemic stroke within 1 year solely based on easily obtained patient data.
Table 2. Results of Applied Machine Learning Algorithms in Combination With Different Strategies for Treating Class Imbalance

Learning algorithm*            RUS          SMOTE        COST         ADBLA
Most-frequent dummy            0.50 (0.00)  0.50 (0.00)  …            …
Logistic regression            0.58 (0.15)  0.66 (0.09)  0.56 (0.11)  …
Naïve Bayes                    0.56 (0.09)  0.63 (0.08)  …            …
Linear SVM                     0.59 (0.13)  0.70 (0.07)  0.53 (0.09)  …
Ridge classifier               0.59 (0.14)  0.68 (0.08)  0.64 (0.12)  …
Linear discriminant analysis   0.60 (0.11)  0.67 (0.09)  …            …
Decision tree                  0.52 (0.05)  0.59 (0.05)  0.58 (0.05)  …
k-nearest neighbors            0.55 (0.09)  0.58 (0.06)  …            …
Nonlinear SVM                  0.57 (0.12)  0.56 (0.06)  0.59 (0.11)  …
Multi-layer perceptron         0.56 (0.08)  0.61 (0.03)  …            …
Gaussian process classifier    0.51 (0.04)  0.52 (0.10)  …            …
Random forest                  0.57 (0.08)  0.65 (0.09)  0.64 (0.08)  …
Extra trees                    0.60 (0.11)  0.64 (0.09)  0.64 (0.07)  …
AdaBoost                       0.54 (0.04)  0.65 (0.07)  …            …
XGBoost                        0.50 (0.06)  0.61 (0.05)  0.59 (0.07)  …
Stacking meta-classifier       0.58 (0.13)  0.61 (0.05)  …            …
Voting classifier              0.58 (0.09)  0.64 (0.06)  …            …
SGD classifier                 0.55 (0.11)  0.67 (0.04)  0.54 (0.10)  …
Elliptic envelope              …            …            …            0.55 (0.06)
One-class SVM                  …            …            …            0.49 (0.11)
Isolation forest               …            …            …            0.57 (0.05)
Balanced bagging               …            …            …            0.57 (0.05)
Balanced random forest         …            …            …            0.62 (0.07)
Easy ensemble                  …            …            …            0.64 (0.08)
RUSBoost                       …            …            …            0.58 (0.07)

ADBLA indicates Anomaly Detection and Balanced Learning Algorithms; AUROC, area under the receiver operating characteristic curve; COST, cost-sensitive learning; RUS, random undersampling; SGD, stochastic gradient descent; SMOTE, synthetic minority oversampling technique; and SVM, support vector machine.
*Data are presented as mean (SD) of AUROC values over 5 folds of the outer double nested cross-validation loop.
To the best of our knowledge, this
model is the first to reach an objectively measured AUROC of 0.70 without depending on imaging data. No comprehensive clinical data were available: medication information was very limited, and no laboratory values or images were available. Information on all 20 variables necessary for
the risk prediction can be collected via interviews with
patients in the home care setting. The dynamic nature of
our model makes it possible to recalculate the individual
risk whenever new data become available.
To compare our approach with other prediction instru-
ments targeting the same long-term prediction, we
reviewed published statistical and machine learning stud-
ies and the established clinical risk scores. A meta-analysis
estimated an AUROC range of 0.55 to 0.65 and 0.58 to
0.68 for ESRS and modified ESRS, respectively,7 which
are the scores developed for the time horizon of 1 year.
Our model showed better performance, although it requires 20 variables to compute a prediction versus 8 and 11 for ESRS and modified ESRS, respectively. At first glance, our best model might seem suboptimal compared with the published performance of the logistic regression (AUROC, 0.74)15 and the artificial neural network (AUROC, 0.77).17 The performance of both of those approaches, however, was not well established: in the logistic regression approach, it was measured on the same data used for training, and in the artificial neural network approach, bias was introduced by manually balancing classes in the data before modeling and evaluation. Moreover, both approaches require
more comprehensive features, which cannot be easily
obtained from the patient at home. The logistic regression
approach is dependent on both the clinical values and reti-
nal imaging features. The artificial neural network approach
requires demographic, clinical and medication data as well
as features extracted manually from Doppler, CT, MR, or
digital subtraction angiography.
We tried to avoid any algorithm selection bias and per-
sonal preferences by evaluating a comprehensive set of
25 different algorithms. Where applicable, we combined
them with 3 techniques for treating the class imbal-
ance problem, which resulted in 52 machine learning
approaches tested. This is one of the major strengths
of this study. The prediction performance obtained by
the rigorous double-nested cross-validation method
was additionally confirmed on the separate hold-out test
set of patients. It is important to note that our validation method validated not only the prediction model itself but also all other modeling steps included in the machine learning pipeline (Figure 1). By making sure that all follow-ups from a single patient were either in the training or the test set of each double nested cross-validation fold, we prevented potential data leakage and made the performance estimation additionally conservative.
The learning curve for the best model confirmed that
no serious overfitting took place. Moreover, it showed that
the performance of the linear support vector machine
model would likely improve slightly if more data were pro-
vided to the learning algorithm. After performance evalua-
tion, the final prediction model was trained on all available data, which was 28% larger than the original training sets.
Therefore, it is reasonable to expect even slightly better
performance on further unseen, future data.
By applying isotonic regression, the calibration curve of the prediction model was significantly improved. Nevertheless, it indicated a considerable deficit; namely, the model overestimates the risk in the mid and high probability range. However, from a clinical point of view, risk overestimation is preferable to risk underestimation because false negatives are more critical than false positives.
Figure 1. Machine learning pipeline.
The 119 variables stated here are the predictors (input variables). The whole
machine learning pipeline is provided
to the double nested cross-validation
protocol for model training, hyperparameter
optimization, probability calibration, and
performance evaluation. The hold-out
set was used for an additional check of
the model performance on unseen data,
as well as for estimating the importance
of single variables for the prediction.
COST indicates cost-sensitive learning;
FS, feature selection; RUS, random
undersampling; and SMOTE, synthetic
minority oversampling technique.
The SHAP explainable AI framework revealed which variables were most important for the risk assessment. Although this is solely a model-specific indicator of relative variable importance as measured on the hold-out test set, it is worth noting that several widely considered stroke and recurrent stroke risk factors showed up among the top 10 predictors, such as patient age and systolic and diastolic blood pressure.
Figure 2. Model diagnostics for the linear support vector machine algorithm combined with synthetic minority oversampling
technique (SMOTE) resampling technique.
A, Double nested-cross validation receiver operating characteristic curve. B, Pooled normalized confusion matrix. C, Algorithm learning
curve. D, SHAP variable importance based on magnitude of variable attributions. E, Original calibration plot without probability calibration.
F, Calibration plot after probability calibration using isotonic regression. BP indicates blood pressure; GP, general practitioner; and PEG,
percutaneous endoscopic gastrostomy.
The usefulness of these variables for recurrent stroke risk prediction was also validated in several clinical scores, such as
the ESRS and the California risk scores. This gives con-
fidence that the predictions computed by the prediction
model are to a large extent based on known clinically
relevant variables. Another informative plot of the SHAP
framework is the summary plot, which shows estimated
feature effects on the model output (Figure S3). The
effects of several variables are intuitive; for example, the lack of poststroke rehabilitative measures, higher age, higher mean diastolic blood pressure, and the need for more general practitioner (GP) visits are all recognized as risk-increasing factors. Several effects might seem counterintuitive at first: widowed marital status, percutaneous endoscopic gastrostomy (PEG), living in a house, and high mean systolic blood pressure are estimated as risk-decreasing factors. However, these could be interpreted as follows: widowed patients and patients with PEG might receive more support from their surroundings (eg, from family, friends, or a care nurse), a potential confounding factor, while living in a house might be a sign of higher socioeconomic status, which may correlate with better physical health in general. High mean systolic blood pressure as a risk-decreasing factor might be an artifact of certain drugs; for example, it is not uncommon that NSAIDs (a potential further confounder) used for the prevention of recurrent stroke raise blood pressure. Other estimated risk-decreasing factors, including a high calorie diet and lower Mini Mental State Test scores, are not easy to explain and might be an artifact of
the imperfect model. The SHAP summary plot describes
the behavior of the (imperfect) predictive model and not
necessarily the causal relationships between variables.
Several important limitations of this study must be
underlined. All data came from the same source (ESPro
registry), and the algorithm was not validated externally.
Despite significant calibration improvement using iso-
tonic regression, the final calibration curve was still sub-
optimal due to risk overestimation. Variable importance and effect estimation were based on one method only, while ideally several methods would have been applied and compared. These technical points remain the subject of potential future work. Moreover, the number of available follow-ups was relatively small and the number of initially collected variables high (250). To focus on potentially clinically relevant variables, 2 stroke experts made a preselection of the variables to be included in the analysis; in this process, a personal bias could not be excluded. While 4 of the top 5 binary variables in the final model (Table S2) were also selected in all 5 models of the outer double nested cross-validation loop (Table S3), there is still significant variability in the χ2 variable selection. This can be attributed to the relatively small sample size. Finally, further bias might have been created by the
inclusion of only expired patients in the analysis, which
was one of the data privacy requirements in this study.
CONCLUSIONS
Our modeling study showed that a reasonably accurate
individual prediction of recurrent stroke within 1 year
from the patient interview is feasible using ESPro reg-
istry data. The developed model is dynamic and appli-
cable at any time point when the necessary patient data
become available, and not just at the time point of stroke onset. While related work indicates that possibly more accurate models could be developed using laboratory and imaging data, our prediction model can be used for regular and more frequent initial risk assessments in the patient's home care setting, potentially as part of a web-based or mobile telehealth solution. Identified high-risk
patients could be monitored more closely and potentially
advised to consult their treating physician about neces-
sary therapy adjustments. Such software tools could also
be beneficial for providers who monitor their patients
via structured telemonitoring programs. Risk prediction
could enable patient risk stratification, empowering pro-
viders to focus on high-risk patients.
ARTICLE INFORMATION
Received July 6, 2021; final revision received January 12, 2022; accepted Febru-
ary 17, 2022.
Affiliations
Digital Health, Siemens Healthcare GmbH, Erlangen, Germany (A.V.). Interdis-
ciplinary Center for Health Technology Assessment (HTA) and Public Health,
Friedrich-Alexander University Erlangen-Nürnberg, Germany (M.W.). Department
of Epidemiology and Biostatistics, McGill University, Montreal, Quebec, Canada
(J.J.C.). Health Policy, London School of Economics, United Kingdom (J.J.C.).
Computed Tomography, Siemens Healthcare GmbH, Forchheim, Germany (D.U.).
Digital Health, Siemens Healthcare GmbH, Erlangen, Germany (M.Z.-R.). Depart-
ment of Neurology, University Hospital Erlangen, Germany (S.S.). Interdisciplinary
Center for Health Technology Assessment and Public Health, Friedrich-Alexan-
der University Erlangen-Nürnberg, Germany (P.K.-R.).
Sources of Funding
The data collection in the Erlangen Stroke Registry is supported (Grant number
ZMV I 1-2520KEU3 05) by the German Federal Ministry of Health (BMG) as part
of the National Information System of the Federal Health Monitoring (Gesund-
heitsberichterstattung des Bundes—GBE). Siemens Healthcare GmbH funded
this modeling study and contributed to its design, literature search, data analysis,
and writing of the report.
Disclosures
M. Zimmermann-Rittereiser is an employee and a shareholder of Siemens
Healthcare GmbH. D. Ukalovic is an employee of Siemens Healthcare GmbH and
a shareholder of Mind Medicine and BioNTech. Dr Vodencarevic is an employee
of Novartis Pharma GmbH. J.J. Caro is an employee of Evidera. The other authors
report no conflicts.
Supplemental Material
Tables S1–S4
Figures S1–S3
Completed MI-CLAIM (2020) Checklist
REFERENCES
1. Vos T, Lim SS, Abbafati C, Abbas KM, Abbasi M, Abbasifard M,
Abbasi-Kangevari M, Abbastabar H, Abd-Allah F, Abdelalim A, et al.
Global burden of 369 diseases and injuries in 204 countries and terri-
tories, 1990–2019: a systematic analysis for the Global Burden of Dis-
ease Study 2019. Lancet. 2020;396:1204–1222. doi: 10.1016/
S0140-6736(20)30925-9
2. Kernan WN, Viscoli CM, Brass LM, Makuch RW, Sarrel PM, Roberts RS,
Gent M, Rothwell P, Sacco RL, Liu RC, et al. The stroke prognosis instru-
ment II (SPI-II): a clinical prediction instrument for patients with transient
ischemia and nondisabling ischemic stroke. Stroke. 2000;31:456–462. doi:
10.1161/01.str.31.2.456
3. Johnston SC, Gress DR, Browner WS, Sidney S. Short-term progno-
sis after emergency department diagnosis of TIA. JAMA. 2000;284:
2901–2906. doi: 10.1001/jama.284.22.2901
4. Ay H, Gungor L, Arsava EM, Rosand J, Vangel M, Benner T, Schwamm
LH, Furie KL, Koroshetz WJ, Sorensen AG. A score to predict early risk
of recurrence after ischemic stroke. Neurology. 2010;74:128–135. doi:
10.1212/WNL.0b013e3181ca9cff
5. Diener HC, Ringleb PA, Savi P. Clopidogrel for the secondary prevention
of stroke. Expert Opin Pharmacother. 2005;6:755–764. doi: 10.1517/
14656566.6.5.755
6. Sumi S, Origasa H, Houkin K, Terayama Y, Uchiyama S, Daida H,
Shigematsu H, Goto S, Tanaka K, Miyamoto S, et al. A modified Essen
stroke risk score for predicting recurrent cardiovascular events:
development and validation. Int J Stroke. 2013;8:251–257. doi:
10.1111/j.1747-4949.2012.00841.x
7. Chaudhary D, Abedi V, Li J, Schirmer CM, Griessenauer CJ, Zand R. Clini-
cal risk score for predicting recurrence following a cerebral Ischemic event.
Front Neurol. 2019;10:1106. doi: 10.3389/fneur.2019.01106
8. Cho EB, Bang OY, Chung CS, Lee KH, Kim GM. Prediction of early ischemic
stroke recurrence with multiparametric perfusion markers in symptomatic
large artery disease. Cerebrovascular Diseases. 2013;35(suppl 3):582.
Abstract. doi: 10.1159/000353129
9. Paciaroni M, Agnelli G, Caso V, Tsivgoulis G, Furie KL, Tadi P, Becattini C,
Falocci N, Zedde M, Abdul-Rahim AH, et al. Prediction of early recurrent
thromboembolic event and major bleeding in patients with acute stroke and
atrial fibrillation by a risk stratification schema: the ALESSA score study.
Stroke. 2017;48:726–732. doi: 10.1161/STROKEAHA.116.015770
10. Chandratheva A, Geraghty OC, Rothwell PM. Poor performance of current
prognostic scores for early risk of recurrence after minor stroke. Stroke.
2011;42:632–637. doi: 10.1161/STROKEAHA.110.593301
11. Thompson DD, Murray GD, Dennis M, Sudlow CL, Whiteley WN. Formal
and informal prediction of recurrent stroke and myocardial infarction after
stroke: a systematic review and evaluation of clinical prediction models in
a new cohort. BMC Med. 2014;12:58. doi: 10.1186/1741-7015-12-58
12. Mohan KM, Wolfe CD, Rudd AG, Heuschmann PU, Kolominsky-Rabas
PL, Grieve AP. Risk and cumulative risk of stroke recurrence: a sys-
tematic review and meta-analysis. Stroke. 2011;42:1489–1494. doi:
10.1161/STROKEAHA.110.602615
13. Zhang C, Zhao X, Wang C, Liu L, Ding Y, Akbary F, Pu Y, Zou X, Du W, Jing J,
et al; Chinese IntraCranial AtheroSclerosis (CICAS) Study Group. Prediction
factors of recurrent ischemic events in one year after minor stroke. PLoS
One. 2015;10:e0120105. doi: 10.1371/journal.pone.0120105
14. Zhang C, Wang Y, Zhao X, Liu L, Wang C, Pu Y, Zou X, Pan Y, Wong KS, Wang
Y; Chinese IntraCranial AtheroSclerosis (CICAS) Study Group. Prediction
of Recurrent Stroke or Transient Ischemic Attack After Noncardiogenic
Posterior Circulation Ischemic Stroke. Stroke. 2017;48:1835–1841. doi:
10.1161/STROKEAHA.116.016285
15. Yuanyuan Z, Jiaman W, Yimin Q, Haibo Y, Weiqu Y, Zhuoxin Y. Com-
parison of Prediction Models based on Risk Factors and Retinal Char-
acteristics Associated with Recurrence One Year after Ischemic
Stroke. J Stroke Cerebrovasc Dis. 2020;29:104581. doi: 10.1016/j.
jstrokecerebrovasdis.2019.104581
16. Park MH, Kwon DY, Jung JM. A machine learning approach in pre-
diction of recurrent stroke. Stroke. 2019; 50:AWP530. Abstract. doi:
10.1161/str.50.suppl_1.WP530
17. Chan KL, Leng X, Zhang W, Dong W, Qiu Q, Yang J, Soo Y, Wong
KS, Leung TW, Liu J. Early identification of high-risk TIA or minor
stroke using artificial neural network. Front Neurol. 2019;10:171. doi:
10.3389/fneur.2019.00171
18. Rücker V, Heuschmann PU, O’Flaherty M, Weingärtner M, Hess M, Sedlak
C, Schwab S, Kolominsky-Rabas PL. Twenty-year time trends in long-term
case-fatality and recurrence rates after ischemic stroke stratified by etiol-
ogy. Stroke. 2020;51:2778–2785. doi: 10.1161/STROKEAHA.120.029972
19. Kelleher JD, MacNamee B, Darcy A. Fundamentals of machine learning for
predictive data analytics - algorithms, worked examples, and case studies. The
MIT Press. Cambridge, Massachusetts; 2015.
20. Heuschmann PU, Biegler MK, Busse O, Elsner S, Grau A, Hasenbein U,
Hermanek P, Janzen RW, Kolominsky-Rabas PL, Kraywinkel K, et al. Devel-
opment and implementation of evidence-based indicators for measuring
quality of acute stroke care: the Quality Indicator Board of the German
Stroke Registers Study Group (ADSR). Stroke. 2006;37:2573–2578. doi:
10.1161/01.STR.0000241086.92084.c0
21. Leevy JL, Khoshgoftaar TM, Bauder RA, Seliya N. A survey on address-
ing high-class imbalance in big data. J Big Data. 2018;5:42. doi:
10.1186/s40537-018-0151-6
22. Smolyakov D, Sviridenko N, Ishimtsev V, Burikov E, Burnaev E. Learning
ensembles of anomaly detectors on synthetic data. In: Lu H, Tang H, Wang Z, eds. Advances in Neural Networks – ISNN 2019. Lecture Notes in Computer Science. 2019;292–306. doi: 10.48550/arXiv.1905.07892
23. Holt JM, Wilk B, Birch CL, Brown DM, Gajapathy M, Moss AC, Sosonkina
N, Wilk MA, Anderson JA, Harris JM, et al; Undiagnosed Diseases Net-
work. VarSight: prioritizing clinically reported variants with binary clas-
sification algorithms. BMC Bioinformatics. 2019;20:496. doi: 10.1186/
s12859-019-3026-8
24. van Buuren S, Groothuis-Oudshoorn K. mice: Multivariate imputation
by chained equations in R. J Stat Softw. 2011;45:1–67. doi: 10.18637/
jss.v045.i03
25. Wolpert DH, Macready WG. No free lunch theorems for optimization. IEEE
Trans Evol Comput. 1997;1:67–82. doi: 10.1109/4235.585893
26. Nilotpal C. Isotonic median regression: a linear programming approach.
Math Oper Res. 1989;14:303–308. doi: 10.1287/moor.14.2.303
27. Lundberg SM, Lee SI. A unified approach to interpreting model pre-
dictions. Adv Neural Inf Process Syst. 2017;30:4768–4777. doi:
10.5555/3295222.3295230
28. Cawley GC, Talbot NLC. On over-fitting in model selection and sub-
sequent selection bias in performance evaluation. J Mach Learn Res.
2010;11:2079–2107.
29. Youden WJ. Index for rating diagnostic tests. Cancer. 1950;3:32–35. doi:
10.1002/1097-0142(1950)3:1<32::aid-cncr2820030106>3.0.co;2-3
30. Norgeot B, Quer G, Beaulieu-Jones BK, Torkamani A, Dias R, Gianfrancesco
M, Arnaout R, Kohane IS, Saria S, Topol E, et al. Minimum information about
clinical artificial intelligence modeling: the MI-CLAIM checklist. Nat Med.
2020;26:1320–1324. doi: 10.1038/s41591-020-1041-y