
Prediction of ineffectiveness of biological drugs using machine learning and explainable AI methods: data from the Austrian Biological Registry BioReg


Ukalovicetal. Arthritis Research & Therapy (2024) 26:44
https://doi.org/10.1186/s13075-024-03277-x
RESEARCH Open Access
© The Author(s) 2024. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
Arthritis Research & Therapy
Dubravka Ukalovic1*, Burkhard F. Leeb2, Bernhard Rintelen3, Gabriela Eichbauer-Sturm4, Peter Spellitz5, Rudolf Puchner6, Manfred Herold7, Miriam Stetter8, Vera Ferincz9, Johannes Resch-Passini5, Jochen Zwerina10,11, Marcus Zimmermann-Rittereiser12 and Ruth Fritsch-Stork13,14,15
Abstract
Objectives Machine learning models can support an individualized approach in the choice of bDMARDs. We developed prediction models for 5 different bDMARDs using machine learning methods based on patient data derived from the Austrian Biologics Registry (BioReg).
Methods Data from 1397 patients and 19 variables, covering bDMARDs with at least 100 treat-to-target (t2t) courses each, were derived from the BioReg biologics registry. Different machine learning algorithms were trained to predict the risk of ineffectiveness of each bDMARD within the first 26 weeks. Cross-validation and hyperparameter optimization were applied to generate the best models. Model quality was assessed by the area under the receiver operating characteristic curve (AUROC). Using explainable AI (XAI), risk-reducing and risk-increasing factors were extracted.
Results The best models per drug achieved the following AUROC scores: abatacept, 0.66 (95% CI, 0.54–0.78); adalimumab, 0.70 (95% CI, 0.68–0.74); certolizumab, 0.84 (95% CI, 0.79–0.89); etanercept, 0.68 (95% CI, 0.55–0.87); tocilizumab, 0.72 (95% CI, 0.69–0.77).
The most risk-increasing variables were visual analogue scale (VAS) scores for abatacept and etanercept and co-therapy with glucocorticoids for adalimumab. Dosage was the most important variable for certolizumab, with a higher dosage associated with a lower risk of non-response. Some variables, such as gender and rheumatoid factor (RF), showed opposite impacts depending on the bDMARD.
Conclusion Ineffectiveness of biological drugs could be predicted with promising accuracy. Interestingly, individual parameters were associated with drug responses in different directions, indicating highly complex interactions. Machine learning can support the decision-making process by disentangling these relations.
Keywords Rheumatoid arthritis, bDMARD, Machine learning, Routinely collected data, DMARDs
*Correspondence:
Dubravka Ukalovic
dubravka.ukalovic@siemens-healthineers.com
Full list of author information is available at the end of the article
Content courtesy of Springer Nature, terms of use apply. Rights reserved.
Introduction
Rheumatoid arthritis (RA) is an autoimmune inflammatory joint disease affecting 0.5–1% of the population worldwide [1]. The last decades have seen great advances in our knowledge of its pathogenesis, which has led to an expanded armamentarium of therapeutic options (and vice versa) [2, 3]. Today’s therapeutic management of RA is governed by several concepts. The paradigm of treating early, using a window of opportunity to prevent joint destruction, has become commonly accepted policy [4]. Likewise, a treatment strategy with a clearly defined clinical target is advocated internationally in guidelines under the catchphrase “treat to target” (t2t) [5]. In addition, a patient-tailored approach is pursued in order to forestall unwanted side effects, and the corresponding research is undertaken under the notion of “precision medicine” [6].
Precision medicine is a multilayered approach in which characteristics drawn from a range of sources, from medical history and serological or imaging markers to genomic and other -omics data, are used to build models that predict the clinical response to specific treatments. In this respect, clinical practice favors easily attainable items; gender, disease activity, and duration of symptoms have been identified as parameters distinguishing refractory from treatment-amenable rheumatoid arthritis in general [7].
In several reports focusing on the prediction of the response to specific disease-modifying antirheumatic drugs (DMARDs), genetic biomarkers have surfaced, e.g., the PDE3A–SLCO1C1 locus rs3794271 as a marker of a positive response to anti-tumor necrosis factor (aTNF) therapy [8, 9]. A platform combining the molecular signature of RA patients with clinical data to predict the response to aTNF was introduced in 2021 [10]. Its validity and practicability in academic centers as well as private practices were reported recently, demonstrating superiority over the recommendation-guided clinical standard [11]. However, for many practices, this approach may not be feasible for financial and organizational reasons. Among readily available patient data, for instance, a predictive role of sex has been suggested for RA patients on aTNF, favoring male patients in early RA [12].
Machine learning techniques have been used sporadically to predict treatment responses. In this respect, data from the Korean College of Rheumatology Biologics and Targeted Therapy Registry (KOBIO) were investigated in two studies applying different predictive models for several bDMARDs to predict remission at 1-year follow-up [13, 14] in RA patients as well as patients with spondylarthritis. Lee et al. found the random forest model to have the best overall prediction performance, with an AUROC of 0.638 (95% CI, 0.576–0.658) [13]. An earlier study [14] found AUROC values between 0.511 and 0.694, with a Ridge classifier performing best for one drug (golimumab).
The goal of our study was to develop models that predict the risk of non-response to specific bDMARDs within a 6-month prediction window, using solely routine clinical data, and to explain the impact of each clinical feature on the model outcomes.
Methods
A high-level overview of the data collection and processing chain is illustrated in Fig. 1 and explained in the following section.
Patient‑derived data
Patient data were obtained from the Austrian Registry for Biologicals, Biosimilars, and targeted synthetic DMARDs in the treatment of inflammatory rheumatic disease (BioReg), which was established in 2010 to monitor the safety and efficacy of these drugs. The registry includes patients with rheumatoid arthritis, psoriatic arthritis, and spondylarthritis [15].
At the time of the study, BioReg was a nationwide registry comprising 8 private rheumatology practices and 12 hospitals throughout Austria. Patients with the above-mentioned inflammatory rheumatic diseases are included at the start of a new biological treatment. The inclusion criteria of the registry are thus the presence of RA, psoriatic arthritis, or spondylarthritis, age above 18, and the start of a new bDMARD. Exclusion criteria are the presence of other rheumatic diseases and age under 18.
For the present study, data from 1397 patients with RA who were treated with bDMARDs between 2010 and 2021 were retrieved. A patient can appear multiple times in the data, as the same patient can be enrolled in multiple treat-to-target courses. The baseline characteristics of the t2t courses are presented in Table 1. To obtain markers predicting the response, only the baseline visits were considered.
Exclusion criteria
The originally available raw dataset contained 62 variables for feature generation. We applied several measures to reduce dimensionality, since the datasets per medication were relatively small, and to avoid the “curse of dimensionality,” which refers to the problem that much more data are often required to represent the variability of a dataset in high-dimensional space. A list of these variables, their missing rates, and the reasons for exclusion (e.g., missing rate, clinical relevance, correlation higher than 0.8 with other variables, or weak association with the outcome label) is presented in Supplementary Table S1.
After applying the exclusion criteria to the raw dataset, the pairwise correlation between all variables was assessed, and variables were excluded if their correlation exceeded the threshold of 0.8.
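The correlation-based exclusion can be sketched as a greedy filter. This is a minimal standard-library illustration, not the authors’ code: the names `pearson` and `drop_correlated`, the keep-whichever-comes-first rule, and the toy values are all assumptions (in the study, SDAI and CDAI were the variables dropped in favor of the joint counts).

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def drop_correlated(columns, threshold=0.8):
    """Walk the variables in insertion order and keep a variable only if its
    absolute correlation with every already-kept variable is <= threshold."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Toy values (hypothetical); CDAI is built from the joint counts, so it is redundant.
example = {
    "TJC": [2, 5, 9, 1, 7],
    "SJC": [1, 3, 2, 0, 4],
    "CDAI": [6, 11, 14, 4, 14],
    "Age": [50, 62, 45, 70, 58],
}
print(drop_correlated(example))  # ['TJC', 'SJC', 'Age']
```

Because the filter is greedy, the order of the input variables decides which member of a correlated pair survives; listing the clinically preferred variables first reproduces a choice like keeping TJC/SJC over CDAI.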
Because SDAI (Simplified Disease Activity Index) and CDAI (Clinical Disease Activity Index) were highly correlated with tender and swollen joint counts, SDAI and CDAI were excluded to avoid redundancy. The variable encoding smoking status was excluded because of its very weak association with treatment ineffectiveness (Table 1).
bDMARDs with fewer than 100 treat-to-target (t2t) courses were excluded from the analysis. For the selected bDMARDs, variables were kept if they reached a completeness rate of at least 67%. This resulted in a slightly different set of variables for each bDMARD. After machine learning modeling (see below), models with an AUC < 0.65 were excluded from further evaluation, since lower AUCs are often considered poor, weak, or low by medical researchers [16]. Applying these exclusion criteria resulted in a cohort that underwent treatment with abatacept, adalimumab, certolizumab, etanercept, or tocilizumab.
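Both thresholds in this paragraph are simple filters. In the following sketch (helper names and the per-drug toy data are hypothetical), variables are kept per drug if at least 67% of their values are present, which yields different variable sets for different drugs, and drugs are kept if their best model reaches an AUC of 0.65; applied to the mean AUROCs reported in Table 3, all five drugs pass.

```python
def complete_enough(columns, min_rate=0.67):
    """Return the variables whose share of non-missing (non-None) values
    is at least min_rate (67% in the text, i.e., a missing rate <= 33%)."""
    kept = []
    for name, values in columns.items():
        rate = sum(v is not None for v in values) / len(values)
        if rate >= min_rate:
            kept.append(name)
    return kept


def passes_auc_threshold(mean_auc_by_drug, min_auc=0.65):
    """Keep drugs whose best model reaches the AUC threshold."""
    return [drug for drug, auc in mean_auc_by_drug.items() if auc >= min_auc]


# Hypothetical per-drug baseline data (None = missing value).
per_drug = {
    "abatacept": {"CRP": [1.0, 2.5, None, 0.8],
                  "Anti-CCP": [1, None, None, 0],
                  "HAQ": [0.5, 1.25, 0.75, 2.0]},
    "tocilizumab": {"CRP": [3.1, 0.4, 1.2, None],
                    "Anti-CCP": [1, 1, 0, None],
                    "HAQ": [None, None, 1.0, 0.5]},
}
variables = {drug: complete_enough(cols) for drug, cols in per_drug.items()}
print(variables)
# {'abatacept': ['CRP', 'HAQ'], 'tocilizumab': ['CRP', 'Anti-CCP']}

# Mean AUROCs of the best models as reported in Table 3.
best_auc = {"abatacept": 0.66, "adalimumab": 0.70, "certolizumab": 0.84,
            "etanercept": 0.68, "tocilizumab": 0.72}
print(passes_auc_threshold(best_auc))  # all five drugs pass
```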
Statistical analysis
After obtaining the cleaned dataset, patient characteristics for the whole cohort were evaluated: a two-sample t-test was conducted for numerical variables and a chi-squared test for categorical variables to assess whether the variables were significantly associated with the outcome of therapy. In addition, the same analysis was applied per medication to evaluate whether similar patterns could be observed in the machine learning analysis.
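The per-variable tests can be illustrated with a standard-library chi-squared test for a 2×2 table. This is a sketch, not the tableone code path: it assumes Yates’ continuity correction (the default of `scipy.stats.chi2_contingency` for 2×2 tables). With the Table 1 counts for other DMARD co-therapy it gives p ≈ 0.0335, which rounds to the reported 0.033, and for GC co-therapy it gives p < 0.001. The two-sample t-test for numerical variables is not reproduced, since it needs the raw per-course values.

```python
from math import erfc, sqrt


def chi2_2x2_yates(a, b, c, d):
    """Chi-squared test with Yates' continuity correction for the 2x2 table
        [[a, b],    rows: factor absent / present
         [c, d]]    columns: effective / ineffective
    Returns (statistic, p_value). With 1 degree of freedom the chi-squared
    upper-tail probability equals erfc(sqrt(statistic / 2))."""
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            diff = max(abs(observed[i][j] - expected) - 0.5, 0.0)
            stat += diff * diff / expected
    return stat, erfc(sqrt(stat / 2))


# Counts from Table 1: "Other DMARD co-therapy" (No / Yes) x (effective / ineffective).
stat, p = chi2_2x2_yates(1454, 91, 270, 28)
print(f"chi2 = {stat:.2f}, p = {p:.4f}")  # chi2 = 4.52, p = 0.0335

# "GC co-therapy" from Table 1.
_, p_gc = chi2_2x2_yates(1105, 51, 619, 68)
print(p_gc < 0.001)  # True
```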
Machine learning modeling
Predicting non-responders within a t2t course can be framed as a binary classification problem; ineffectiveness was chosen as the dependent (target) variable to be predicted, where ineffectiveness was defined by the experience and assessment of the treating rheumatologist. Since treatment success of bDMARD therapy is assessed within the first 6 months according to EULAR (European Alliance of Associations for Rheumatology) recommendations [17], 6 months was selected as the prediction time horizon. The baseline visits of the t2t courses were categorized according to whether the treatment was found to be effective or ineffective within the first 6 months.
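A minimal sketch of the labeling step, assuming a hypothetical record layout in which each t2t course stores its baseline-visit variables and the week at which the rheumatologist judged the treatment ineffective (None if never). Courses judged ineffective only after the 26-week horizon are labeled 0 here, which is our assumption; the text does not specify how such courses were handled.

```python
from dataclasses import dataclass
from typing import Optional

HORIZON_WEEKS = 26  # ~6 months, the prediction horizon used in the study


@dataclass
class T2TCourse:
    """Hypothetical record for one treat-to-target course."""
    course_id: str
    patient_id: str
    baseline: dict                          # baseline-visit variables (model inputs)
    weeks_to_ineffective: Optional[float]   # None if never judged ineffective


def label(course: T2TCourse, horizon: float = HORIZON_WEEKS) -> int:
    """1 = judged ineffective within the horizon (positive class), else 0."""
    w = course.weeks_to_ineffective
    return int(w is not None and w <= horizon)


courses = [
    T2TCourse("c1", "p1", {"VAS-Pat": 62}, weeks_to_ineffective=14),
    T2TCourse("c2", "p1", {"VAS-Pat": 35}, weeks_to_ineffective=None),
    T2TCourse("c3", "p2", {"VAS-Pat": 48}, weeks_to_ineffective=40),
]
print([label(c) for c in courses])  # [1, 0, 0]
```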
Data were split into a training set (90% of the original dataset) and a test set (10% of the original dataset). To avoid data leakage between the two datasets, it was ensured that each patient was included in either the test set or the training set, but not in both. In addition, it was ensured that the distributions of therapy outcomes (ineffective or not) were similar in the training and test sets (stratified split). Iterative imputation, a method that predicts each missing value as a function of the other variables, was applied to the input variables to handle missing data points. The hyperparameters, i.e., the parameters that are set before training rather than learned from the data, were optimized by grid search over a model grid with fixed hyperparameter values.

Fig. 1 A Data preparation. Data were selected based on the number of t2t courses. Variables were selected if the missing rate did not exceed 33%. B Machine learning pipeline: Data were labeled depending on the outcome of the therapy course. Iterative imputation was applied on the hold-out set (test set) and on the training set. Sampling strategies were applied, and the AUC (area under the curve) was collected for each model configuration. The final, re-trained model was explained by applying SHAP (SHapley Additive exPlanations).
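The patient-wise, outcome-stratified 90/10 split can be sketched with the standard library alone. This is an illustrative assumption rather than the authors’ implementation: the function name, the rule of stratifying patients by whether they have any ineffective course, and the synthetic data are hypothetical. In a scikit-learn pipeline one would more likely reach for `StratifiedGroupKFold` for grouped, stratified splits and `IterativeImputer` (enabled via `from sklearn.experimental import enable_iterative_imputer`); imputation is not reproduced here.

```python
import random
from collections import defaultdict


def grouped_stratified_split(patient_ids, labels, test_fraction=0.1, seed=0):
    """Return (train_idx, test_idx) over course indices such that all courses
    of a patient land on the same side. Patients are stratified by whether
    they have at least one ineffective course, and roughly test_fraction of
    each stratum goes to the test set."""
    courses_by_patient = defaultdict(list)
    for idx, pid in enumerate(patient_ids):
        courses_by_patient[pid].append(idx)

    strata = defaultdict(list)
    for pid, idxs in courses_by_patient.items():
        strata[max(labels[i] for i in idxs)].append(pid)

    rng = random.Random(seed)
    test_patients = set()
    for stratum in sorted(strata):
        pids = sorted(strata[stratum])
        rng.shuffle(pids)
        n_test = max(1, round(test_fraction * len(pids)))
        test_patients.update(pids[:n_test])

    train_idx = [i for i, p in enumerate(patient_ids) if p not in test_patients]
    test_idx = [i for i, p in enumerate(patient_ids) if p in test_patients]
    return train_idx, test_idx


# Synthetic example: 40 patients with one or two courses each (~10% ineffective).
gen = random.Random(42)
patient_ids, labels = [], []
for p in range(40):
    for _ in range(gen.choice([1, 2])):
        patient_ids.append(f"P{p:02d}")
        labels.append(int(gen.random() < 0.1))

train_idx, test_idx = grouped_stratified_split(patient_ids, labels)
train_patients = {patient_ids[i] for i in train_idx}
test_patients = {patient_ids[i] for i in test_idx}
print(train_patients.isdisjoint(test_patients))  # True
```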
The model grid contained 17 base models with different configurations, described in the supplemental material in Table S3. We applied nested five-fold cross-validation on the training set, iterating over an outer loop for model evaluation and, within each outer iteration, over an inner loop for hyperparameter tuning, in order to avoid overfitting. During cross-validation, the splits were also performed group-wise, i.e., per patient.
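To make the nested, patient-grouped five-fold cross-validation concrete, the sketch below runs a grid search in an inner loop and scores the selected configuration in an outer loop. Everything model-specific is hypothetical: `fit`/`predict` implement a toy linear score with a single regularization-like hyperparameter `lam`, standing in for the 17 base models of Table S3, and the data are synthetic. Only the loop structure (inner tuning on the outer-training part, grouped splits, AUROC as metric) mirrors the text.

```python
import random
from statistics import mean, pstdev


def auc(scores, labels):
    """AUROC = probability that a random positive scores above a random
    negative (ties count 1/2). Returns None if a class is missing."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return None
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def group_kfold(indices, groups, k):
    """Split indices into k folds; all rows of one patient share a fold."""
    unique = sorted({groups[i] for i in indices})
    fold_of = {g: j % k for j, g in enumerate(unique)}
    return [[i for i in indices if fold_of[groups[i]] == f] for f in range(k)]


def fit(X, y, lam):
    """Hypothetical toy classifier standing in for the real base models:
    weight_j = (mean_j | ineffective - mean_j | effective) / (sd_j + lam)."""
    pos = [x for x, t in zip(X, y) if t == 1]
    neg = [x for x, t in zip(X, y) if t == 0]
    if not pos or not neg:
        return [0.0] * len(X[0])
    return [
        (mean(x[j] for x in pos) - mean(x[j] for x in neg))
        / (pstdev([x[j] for x in X]) + lam)
        for j in range(len(X[0]))
    ]


def predict(weights, X):
    return [sum(w * v for w, v in zip(weights, x)) for x in X]


def cv_auc(idx, X, y, groups, lam, k):
    """Mean validation AUROC of hyperparameter lam over grouped folds of idx."""
    scores = []
    for val in group_kfold(idx, groups, k):
        val_set = set(val)
        tr = [i for i in idx if i not in val_set]
        w = fit([X[i] for i in tr], [y[i] for i in tr], lam)
        a = auc(predict(w, [X[i] for i in val]), [y[i] for i in val])
        if a is not None:
            scores.append(a)
    return mean(scores) if scores else 0.0


def nested_cv(X, y, groups, grid, k_outer=5, k_inner=5):
    """Outer loop estimates performance; the inner loop, run only on the
    outer-training part, picks the hyperparameter (grid search)."""
    all_idx = list(range(len(y)))
    outer_scores = []
    for test in group_kfold(all_idx, groups, k_outer):
        test_set = set(test)
        train = [i for i in all_idx if i not in test_set]
        best = max(grid, key=lambda lam: cv_auc(train, X, y, groups, lam, k_inner))
        w = fit([X[i] for i in train], [y[i] for i in train], best)
        outer_scores.append(auc(predict(w, [X[i] for i in test]), [y[i] for i in test]))
    return outer_scores


# Synthetic data: 60 patients, 1-2 courses each, two numeric features.
gen = random.Random(1)
X, y, groups = [], [], []
for p in range(60):
    for _ in range(gen.choice([1, 2])):
        risk = gen.gauss(0, 1)
        X.append([risk + gen.gauss(0, 1), gen.gauss(0, 1)])
        y.append(int(risk > 0.5))
        groups.append(f"P{p:02d}")

print(nested_cv(X, y, groups, grid=[0.1, 1.0, 10.0]))  # five outer-fold AUROCs
```

Because a patient never straddles a fold boundary, courses of the same patient cannot leak between the tuning, training, and evaluation data.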
Since the outcome distribution was highly imbalanced, we also incorporated different sampling strategies into the machine learning (ML) pipeline: synthetic minority over-sampling for numerical and categorical features (SMOTE-NC) of the minority class (“ineffective”) and random undersampling of the majority class (“effective”).
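Random undersampling, the simpler of the two sampling strategies, can be sketched as follows. The helper name and toy labels are hypothetical, and SMOTE-NC is not reproduced here; the imbalanced-learn package provides both strategies as `RandomUnderSampler` and `SMOTENC`, although the text does not state which implementation was used. Resampling only the training rows, as assumed in the docstring, keeps the evaluation folds at their original class ratio.

```python
import random


def random_undersample(indices, labels, seed=0):
    """Randomly drop majority-class rows until both classes are equally
    frequent. Only the rows passed in are touched, so in a CV pipeline this
    would be called on the training indices of each fold."""
    indices = list(indices)
    pos = [i for i in indices if labels[i] == 1]
    neg = [i for i in indices if labels[i] == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    rng = random.Random(seed)
    return sorted(minority + rng.sample(majority, len(minority)))


labels = [0] * 90 + [1] * 10          # 10% "ineffective", as a toy example
balanced = random_undersample(range(100), labels)
print(len(balanced), sum(labels[i] for i in balanced))  # 20 10
```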
Table 1 Characteristics of t2t courses

Variable                             Level     Overall       Ineffective: No   Ineffective: Yes   P-value
T2T courses, n                                 1843          1724              119
BMI, mean (SD), kg/m2                          26.4 (4.8)    26.5 (4.8)        26.4 (4.4)         0.830
Age, mean (SD), years                          56.1 (13.6)   56.0 (13.6)       58.5 (13.5)        0.054
Gender, n (%)                        M         407 (22.1)    380 (22.0)        27 (22.7)          0.960
                                     F         1436 (77.9)   1344 (78.0)       92 (77.3)
Disease duration, mean (SD), years             10.4 (8.7)    10.4 (8.7)        11.4 (9.6)         0.257
IV administration, n (%)             No        1610 (87.4)   1507 (87.4)       103 (86.6)         0.897
                                     Yes       233 (12.6)    217 (12.6)        16 (13.4)
MTX co-therapy, n (%)                No        824 (44.7)    766 (44.4)        58 (48.7)          0.413
                                     Yes       1019 (55.3)   958 (55.6)        61 (51.3)
Other DMARD co-therapy, n (%)        No        1545 (83.8)   1454 (84.3)       91 (76.5)          0.033
                                     Yes       298 (16.2)    270 (15.7)        28 (23.5)
GC co-therapy, n (%)                 No        1156 (62.7)   1105 (64.1)       51 (42.9)          < 0.001
                                     Yes       687 (37.3)    619 (35.9)        68 (57.1)
Previous aTNF therapy, n (%)         No        1233 (66.9)   1168 (67.7)       65 (54.6)          0.004
                                     Yes       610 (33.1)    556 (32.3)        54 (45.4)
HAQ, mean (SD)                                 1.0 (0.7)     1.0 (0.7)         1.2 (0.7)          0.015
Rheumatoid factor positivity, n (%)  No        480 (30.0)    443 (29.6)        37 (35.6)          0.241
                                     Yes       1120 (70.0)   1053 (70.4)       67 (64.4)
VAS-Pat., mean (SD), mm                        39.6 (24.4)   38.7 (24.2)       51.1 (24.6)        < 0.001
VAS-Ph., mean (SD), mm                         28.7 (20.1)   28.4 (20.1)       32.4 (20.4)        0.047
Anti-CCP, n (%)                      No        409 (33.2)    380 (32.8)        29 (39.2)          0.317
                                     Yes       823 (66.8)    778 (67.2)        45 (60.8)
TJC, mean (SD)                                 4.5 (4.7)     4.4 (4.6)         6.0 (5.8)          0.006
SJC, mean (SD)                                 3.0 (2.9)     2.9 (2.9)         3.6 (2.8)          0.020
CRP, mean (SD), mg/dL                          8.9 (15.4)    8.6 (15.0)        11.9 (19.8)        0.102
ESR, mean (SD), mm/h                           19.1 (17.9)   18.9 (17.7)       22.5 (20.8)        0.094
DAS28-ESR, mean (SD)                           3.8 (1.5)     3.8 (1.5)         4.1 (1.5)          0.101
Smoker, n (%)                        Current   161 (8.7)     151 (8.8)         10 (8.4)           0.978
                                     Past      87 (4.7)      81 (4.7)          6 (5.0)
                                     Never     1595 (86.5)   1492 (86.5)       103 (86.6)

aTNF anti-tumor necrosis factor, CRP C-reactive protein, DAS28 Disease Activity Score 28, ESR erythrocyte sedimentation rate, TJC tender joint count, HAQ Health Assessment Questionnaire, SJC swollen joint count, VAS-Pat. visual analogue scale patient, VAS-Ph. visual analogue scale physician, Anti-CCP anti-cyclic citrullinated peptide, MTX methotrexate, IV administration intravenous administration, GC glucocorticoid

As the selection metric for the best model during nested cross-validation, we collected the area under the receiver operating characteristic curve (AUC) for each medication, cross-validation fold, test set, and sampling strategy, since the AUC provides a generic metric of overall model performance. The collected model performance metrics per medication and model configuration can be found in the supplemental material in Table S3. The overall accuracy, i.e., the number of correctly predicted instances divided by all instances, was not evaluated because of the imbalance of the dataset: given a non-responder rate of < 10%, a model that always predicted therapy response would still achieve a high (> 90%) accuracy, which would be misleading when evaluating model performance.
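The accuracy argument can be checked with the outcome counts from Table 1. In this standard-library sketch (helper names are ours), a constant “always effective” prediction reaches about 93.5% accuracy while detecting no ineffective course at all, and its AUROC is exactly 0.5, i.e., no discrimination.

```python
def accuracy(predictions, labels):
    return sum(p == t for p, t in zip(predictions, labels)) / len(labels)


def auc(scores, labels):
    """AUROC via pairwise comparison of positives and negatives (ties = 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Outcome counts from Table 1: 1724 effective (0) and 119 ineffective (1) courses.
labels = [0] * 1724 + [1] * 119
always_responder = [0] * len(labels)   # constant prediction "effective"

print(round(accuracy(always_responder, labels), 3))  # 0.935
print(auc(always_responder, labels))                 # 0.5
```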
Explainability
To evaluate the impact of the individual parameters on the outcome, we used the Python library SHAP (“SHapley Additive exPlanations”), a game-theoretic approach to feature importance evaluation. In its original field, game theory, these numbers (“Shapley values”) reflect the contribution of a player in a coalition of players to the outcome of the game. In machine learning, they reflect the contribution of a variable to the model’s prediction [18]. Moreover, SHAP accounts for interactions between variables and can reveal patterns via global explanations, obtained by aggregating the local explanations of the individual predictions per instance.
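The Shapley value of feature i averages its marginal contribution over all coalitions S of the other features, weighted by |S|!(n - |S| - 1)!/n!. The sketch below computes this exactly for a hypothetical three-feature risk model, replacing absent features by a single baseline point. The SHAP library instead uses approximate explainers (e.g., `TreeExplainer`, `KernelExplainer`) and a background dataset, so this illustrates the concept, not the library’s internals. Note how the interaction term’s contribution is split equally between features 0 and 2, and how the values add up to the difference between the prediction and the baseline prediction.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x relative to a single baseline point:
    features outside a coalition are set to their baseline values.
    Enumerates all coalitions, so the cost grows as 2**n_features."""
    n = len(x)

    def value(coalition):
        return f([x[j] if j in coalition else baseline[j] for j in range(n)])

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi += weight * (value(s | {i}) - value(s))
        phis.append(phi)
    return phis


# Hypothetical risk model with an interaction between features 0 and 2.
def model(z):
    return 0.5 * z[0] + 0.2 * z[1] + 0.3 * z[0] * z[2]


x, base = [2.0, 1.0, 1.0], [0.0, 0.0, 0.0]
phi = shapley_values(model, x, base)
print([round(v, 6) for v in phi])                            # [1.3, 0.2, 0.3]
print(round(sum(phi), 6), round(model(x) - model(base), 6))  # 1.8 1.8
```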
All statistical analyses were conducted in Python 3.9, using the Python packages scikit-learn for machine learning, SHAP for feature importance analysis, and tableone for descriptive statistics [19].
Results
Data from 1397 patients with rheumatoid arthritis at the beginning of a treatment course with a new bDMARD were extracted from BioReg. After applying the exclusion criteria, the number of treatment courses amounted to 1843.
Treat‑to‑target (T2T) course characteristics
Table 1 summarizes the characteristics of the first visit of each t2t course (the instance to be predicted), grouped by the target variable “Ineffective.” Overall, co-therapy with DMARDs other than methotrexate (MTX), glucocorticoid (GC) co-therapy, and previous aTNF therapy were significantly more frequent in ineffective t2t courses, and scores on the visual analogue scale (VAS), namely VAS patient (VAS-Pat) and VAS physician (VAS-Ph), as well as disease activity (reflected by tender joint count, TJC, and swollen joint count, SJC), were significantly higher.
Assessing the p-values per medication revealed a more differentiated picture, as presented in Table 2. The following variables were associated with a significantly higher risk of non-response, depending on the medication: GC co-therapy for adalimumab (ADA) and etanercept (ETA); VAS-Pat for all drugs except ADA; VAS-Ph for abatacept (ABA) and tocilizumab (TOC); previous aTNF therapy for certolizumab (CERT) and TOC; SJC for TOC; and DAS28-ESR for TOC.
A higher dosage of CERT was associated with a lower risk of ineffectiveness.
Model quality metrics
The area under the receiver operating characteristic curve for cross-validation could be calculated for ADA, ABA, CERT, ETA, and TOC (Fig. 2), ranging from 0.66 to 0.84. The model with the highest prognostic quality was obtained for CERT, with an AUC of 0.84 (95% CI, 0.79–0.89). The most stable models, with the lowest standard deviation (SD) over the 5 folds, were obtained for CERT, with an AUC of 0.84 (SD: 0.05), and TOC, with an AUC of 0.72 (SD: 0.05).
Table 3 lists the models with the highest predictive
quality and the associated strategy. Except for TOC,
maximum AUC was achieved by addressing class imbal-
ance: random undersampling combined with a Ridge
classifier model achieved highest AUC for ABA, while
the highest AUC for CERT was achieved by a combina-
tion of oversampling and a support vector classifier. For
ADA, the best model performance was achieved by over-
sampling and XGBoost (extreme gradient boosting). For
ETA, oversampling and random forest outperformed the
other model and sampling combinations.
Variable importance
The best-performing models per bDMARD weighted the considered variables differently, as shown in the SHAP summary plots in Fig. 3. The list of the most impactful variables comprised different items, or the same items in a different order, for each bDMARD.
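SHAP summary plots such as those in Fig. 3 are built from a matrix of per-instance SHAP values. A common way to obtain the ordering and the direction of effect is sketched below: rank features by mean absolute SHAP value, and read the direction from the sign of the covariance between a feature’s values and its SHAP values. The helper names and all numbers are hypothetical; this is an assumption about how such a ranking can be computed, not the study’s code.

```python
def global_importance(shap_matrix, feature_names):
    """Rank features by mean absolute SHAP value over all instances, the
    ordering commonly used for SHAP summary plots."""
    n = len(shap_matrix)
    scores = {
        name: sum(abs(row[j]) for row in shap_matrix) / n
        for j, name in enumerate(feature_names)
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def direction(feature_values, shap_values):
    """+1 if higher feature values go with higher SHAP values (toward
    'ineffective'), -1 if the reverse, 0 if there is no linear trend."""
    mf = sum(feature_values) / len(feature_values)
    ms = sum(shap_values) / len(shap_values)
    cov = sum((a - mf) * (b - ms) for a, b in zip(feature_values, shap_values))
    return (cov > 0) - (cov < 0)


# Hypothetical SHAP values: 4 instances x 3 features (x1, x2, x3).
names = ["x1", "x2", "x3"]
sv = [[0.30, -0.05, 0.01],
      [-0.20, 0.10, -0.02],
      [0.25, -0.08, 0.00],
      [-0.15, 0.12, 0.03]]
values = {"x1": [80, 20, 70, 30], "x2": [1, 0, 1, 0]}

print([name for name, _ in global_importance(sv, names)])  # ['x1', 'x2', 'x3']
print(direction(values["x1"], [r[0] for r in sv]))         # 1
print(direction(values["x2"], [r[1] for r in sv]))         # -1
```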
VAS scores were the most predictive factor for both abatacept (VAS-Ph) and etanercept (VAS-Pat). According to the SHAP explainer, co-therapy with GC had the highest impact on the ineffectiveness of adalimumab, and VAS-Pat on that of certolizumab. The direction of the effect of VAS-Pat was identical for all bDMARDs, with higher values linked to a higher risk of ineffectiveness. In the case of CERT, a lower dosage was linked to more probable ineffectiveness. Previous aTNF therapy was the most predictive factor for ineffectiveness in the case of TOC.
An interesting observation concerns the consistency, or lack thereof, of the direction of effect of individual parameters across the bDMARDs. Whereas GC co-therapy showed the same direction of effect for all bDMARDs except ETA, with a higher GC dosage increasing the probability of ineffectiveness, male gender was predictive of ineffectiveness with ABA but of effectiveness with ADA. Likewise, a higher rheumatoid factor predicted ineffectiveness with ABA, whereas with CERT a lower RF was linked to a worse clinical response (Fig. 3), although these observations were not statistically significant (Table 2).
Discussion
In our study, we demonstrated the feasibility of developing machine learning models that predict, with moderate to good prognostic quality, the non-response of RA patients to individual bDMARDs after 6 months in a real-world setting. Furthermore, we provided a quantification of each variable’s impact on the respective model per bDMARD using the explainable AI (XAI) framework SHAP.
The models in our study yielded AUROC scores from 0.66 to 0.84, considerably higher than those of the methodologically most similar studies [13, 14], in which several machine learning models were applied to a Korean registry, yielding AUROC scores from 0.561 to 0.638 [13] and from 0.511 to 0.694 [14] for the prediction of clinical response to bDMARDs in general. In our study, we used similar modeling techniques and additionally addressed class imbalance by combining under- and oversampling techniques with different prediction models, which improved model performance. Moreover, selecting drugs with at least 100 t2t courses and imputing missing data points from the other features improved the training base and helped to build a robust model pipeline.
An important facet of our study is the characterization of feature importance, including the direction of the effect of each feature on drug responsiveness. Although XAI methods are controversial regarding individual predictions (local explainability) [20], they can be used to explain how machine learning models work globally. Such global explanations can be combined with descriptive analysis to obtain insights into the importance of specific variables. In this respect, we found GC co-therapy, VAS scores, and disease activity to be associated with a higher risk of ineffectiveness in the whole cohort, regardless of the individual drug. Our findings are in line with the literature and add more detail, e.g., the significance of patient-reported features such as VAS patient (depicted in the SHAP plots in Fig. 3) as an important feature in all investigated bDMARDs, as described in the study conducted by Lee et al. [13].

Table 2 P-values grouped by drug. Factors with p < 0.05 and red color-coding were significantly associated with a higher risk of non-response. Only one factor (dosage) with p < 0.05 was significantly associated with a lower risk of non-response. Dosage was normalized to mg/kg/day or mg/day depending on the medication.
The importance of global assessments by patient and physician is reflected by the incorporation of these items into the different remission definitions based on the disease activity indices DAS28, SDAI, and CDAI. The central role of the patient’s global assessment (PGA) was underscored in a report comparing CDAI and SDAI remission with the (most stringent) Boolean remission using data from 3 large clinical trials of adalimumab: the difference between CDAI and SDAI remission vs. Boolean remission was caused by higher patient VAS scores, which led to a redefined Boolean remission allowing a higher VAS score [21, 22].
Fig. 2 Area under the receiver operating characteristic curve for fivefold cross-validation
In a recent paper by Capelusnik and Aletaha, the authors investigated predictors of response in three large RCTs of aTNF including > 1300 patients after 30 weeks of treatment, confirming the earlier notion of an inverse relationship between baseline disease activity and the chance of achieving state targets (i.e., remission or low disease activity). In a more detailed analysis, PGA, among other variables, was found to be significantly associated with a lower chance of response. Likewise, in our study, a higher PGA was predictive of a higher risk of bDMARD failure, which was significant for aTNF as well as for abatacept and tocilizumab. Two recent reports applying machine learning to predict the response to DMARDs in RA also established PGA as an important predictor of remission [13, 23]. Duong et al. investigated predictors of response to methotrexate therapy and described a high PGA as one of the top 3 individual components predicting a poor response. As mentioned above, in the Korean registry as well, the patient-reported outcome, i.e., the PGA in RA, was revealed as the most important feature in both the random forest and the XGBoost model [13].
Remarkably, opposite effects of variables could also be observed, e.g., for gender and rheumatoid factor, although these effects did not reach statistical significance, as demonstrated in Table 2.
The possible influence of gender/sex on drug responsiveness has come into focus in recent years. Besides proposed measures to adequately address this matter in future drug development [2], differences in drug retention rates and clinical effects have also been investigated in rheumatoid arthritis. This has led to the understanding that women overall show a diminished response to drugs in rheumatoid arthritis [24]. Registry-derived data have demonstrated better responses or retention rates for male patients with rheumatoid arthritis to DMARDs in general and to aTNF specifically [12, 25–28]. This is in line with our findings, in which gender was an important feature for all aTNF agents, with a smaller risk of non-response for male patients, especially with CERT and ADA. However, this was not statistically significant, with only a statistical trend for CERT (p = 0.068).
Another feature of interest in the SHAP calculations was the presence of rheumatoid factor (RF), which was associated with differential drug responsiveness depending on the specific bDMARD. Whereas a lower RF showed a trend toward a smaller risk of ineffectiveness with ABA and TOC, the opposite was seen with CERT, and the other aTNF agents did not show a distinct direction of effect. The literature does not report consistent associations between responsiveness to bDMARDs and RF. In a Taiwanese registry, RF positivity was associated with drug survival overall, which was statistically significant for ABA but not for aTNF and TOC, suggesting RF positivity as a biomarker of better responsiveness to abatacept [29]. An earlier systematic review and meta-analysis could not find such an association [30]. Conflicting data have also been published on the relationship between RF and aTNF treatment, although, to our knowledge, differences between certolizumab and other aTNF agents have not been reported [31–34]. The different observation periods, 6 months in our study as opposed to one to several years in others, may have contributed to the partly discrepant findings compared with previous reports, especially since the effects seen in those reports appeared after 6 months [35, 36].
is study has some limitations. First, the models were
developed using a single data source, the BioReg regis-
try. Although BioReg includes data from hospital set-
tings as well as private practices, a risk of systematic bias
remains. As the prescription of a biological or targeted
synthetic DMARD in Austria is mainly left to the discre-
tion of the treating physician without the need to com-
ply with objective outcome parameters used in clinical
trials, our data might harbor known as well as unknown
confounding variables, including confounding by indi-
cation. Moreover, the target variable “ineffectiveness” in
the registry was set solely based on the opinion of the
treating rheumatologist, which limits generalizability
Table 3 Best models according to highest mean AUROC score per medication

| Medication | Ineffective: No | Ineffective: Yes | Best model | Sampling strategy | Mean AUROC (95% CI) |
|---|---|---|---|---|---|
| Abatacept | 212 | 20 | Ridge classifier | RUS | 0.66 (0.54–0.78) |
| Adalimumab | 493 | 36 | XGBoost | OVS | 0.70 (0.68–0.74) |
| Certolizumab | 150 | 11 | SVC | OVS | 0.84 (0.79–0.89) |
| Etanercept | 530 | 23 | RF Classifier | OVS | 0.68 (0.55–0.87) |
| Tocilizumab | 339 | 29 | XGBoost | None | 0.72 (0.69–0.77) |

XGBoost, extreme gradient boosting; SVC, support vector classifier; RF Classifier, random forest classifier; RUS, random undersampling; OVS, oversampling
Content courtesy of Springer Nature, terms of use apply. Rights reserved.
Page 9 of 12
Ukalovicetal. Arthritis Research & Therapy (2024) 26:44
Fig. 3 SHAP summary plots: impact of variables on model outcome. Variables are sorted in descending order of impact. Positive SHAP values indicate an effect in the direction of a higher risk of ineffectiveness; correspondingly, negative values indicate an effect in the direction of a lower risk of ineffectiveness. High values of the variables (features) are encoded in red, and low values in blue.
and comparability with studies in which specific thresholds for DAS28-ESR or other objective measures were used to create a binary outcome variable. However, this approach often mirrors clinical practice. Furthermore, our study sample was small and reflects a rather homogeneous Central European population. The small overall sample size may also explain why smoking status showed only a weak association with ineffectiveness: only 16 patients were past or current smokers who showed no treatment response. This finding is not in line with the literature, which consistently reports a strong association between smoking and treatment outcome.
Our described methodology should therefore be evalu-
ated using independent datasets.
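An external evaluation of this kind could report discrimination as an AUROC with a confidence interval, analogous to the intervals in Table 3. This section does not state how those intervals were derived. The sketch below therefore assumes a percentile bootstrap over individual cases, which is one common option. The function names and toy scores are hypothetical and are not BioReg data.

```python
import random
from typing import List, Tuple


def auroc(scores: List[float], labels: List[int]) -> float:
    """AUROC as the probability that a randomly chosen ineffective course (label 1)
    receives a higher risk score than a randomly chosen effective one (label 0).
    Ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auroc_ci(scores: List[float], labels: List[int], n_boot: int = 2000,
                       alpha: float = 0.05, seed: int = 0) -> Tuple[float, float, float]:
    """Point estimate plus a percentile bootstrap (1 - alpha) confidence interval.
    Assumption: cases are resampled individually, with replacement."""
    rng = random.Random(seed)
    n = len(scores)
    stats: List[float] = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        # A resample containing only one class has no defined AUROC; draw again.
        if 0 < sum(ys) < n:
            stats.append(auroc([scores[i] for i in idx], ys))
    stats.sort()
    lower = stats[int(n_boot * alpha / 2)]
    upper = stats[int(n_boot * (1 - alpha / 2)) - 1]
    return auroc(scores, labels), lower, upper


# Hypothetical toy data: predicted risk of ineffectiveness and observed outcome.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 0, 1, 0, 0, 1, 0, 0]
estimate, lower, upper = bootstrap_auroc_ci(scores, labels)
print(f"AUROC {estimate:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

The pairwise loop is adequate for registry-sized validation sets, and a rank-based formula would be faster for large ones. With minority classes as small as those in Table 3 (for example, 20 ineffective abatacept courses), such intervals would be expected to be wide.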
Embedding such models in a clinical setting to support treatment decisions raises the question of how an individual prediction should be presented to rheumatologists. A purely binary prediction (non-responder vs. responder) would carry a high risk of misclassification. As Fig. 2 shows, 100% sensitivity is not achieved for the data examined, except for CERT and TOC, and even there only at the cost of a high false-positive rate. Presenting the continuous risk together with the AUROC of each drug and model would be preferable to a purely binary statement and should be the subject of future studies. It is also important to emphasize that this study did not exclusively include bDMARD-naïve patients. However, this may be beneficial in a real-world scenario if such models were embedded in a software assistant that supports rheumatologists in their day-to-day work.
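The trade-off described above can be made concrete by walking along the ROC curve of a risk score. The sketch below uses hypothetical toy scores rather than BioReg data. It lists the (false-positive rate, sensitivity) pairs obtained as the decision threshold is lowered and reports the smallest false-positive rate at which every ineffective course is flagged. The helper names are illustrative assumptions.

```python
from typing import List, Tuple


def roc_points(scores: List[float], labels: List[int]) -> List[Tuple[float, float]]:
    """(FPR, sensitivity) pairs for each distinct threshold, strictest first.
    A course is flagged as high risk when its score >= threshold; label 1 = ineffective."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing is flagged
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def fpr_at_full_sensitivity(scores: List[float], labels: List[int]) -> float:
    """Smallest false-positive rate at which sensitivity reaches 100%."""
    return min(fpr for fpr, tpr in roc_points(scores, labels) if tpr == 1.0)


scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]  # hypothetical predicted risks
labels = [1, 0, 1, 0, 0, 1, 0, 0]                  # 1 = ineffective
print(roc_points(scores, labels))
print(fpr_at_full_sensitivity(scores, labels))  # 0.6
```

In this toy example, catching all three ineffective courses requires flagging three of the five effective ones. Reporting the continuous score instead of a single thresholded label keeps this trade-off visible to the treating rheumatologist.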
Conclusions
In conclusion, it is feasible to develop accurate machine learning models that identify patients at high risk of non-response before therapy with bDMARDs. The algorithms used in our study should be applied to additional data sources, including larger registries, to refine our models and to evaluate feature importance in support of treatment decisions in a clinical setting.
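The SHAP values in Fig. 3 attribute each prediction to the input variables using Shapley values [18]. With only a handful of features, these values can be computed exactly by enumerating feature coalitions. The sketch below assumes a single reference patient as the baseline and a hypothetical linear risk score whose variable names and coefficients are invented. It is not the authors' model. The shap package typically uses faster algorithms and a background dataset rather than one reference point.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, FrozenSet


def exact_shapley(model: Callable[[Dict[str, float]], float],
                  x: Dict[str, float],
                  baseline: Dict[str, float]) -> Dict[str, float]:
    """Exact Shapley values of model(x) relative to model(baseline).

    Positive values push the prediction toward higher risk (the sign convention of Fig. 3).
    Cost grows as 2**n_features, so this is practical only for a few variables.
    """
    features = list(x)
    n = len(features)

    def value(coalition: FrozenSet[str]) -> float:
        # Features in the coalition take the patient's values; the rest stay at baseline.
        return model({f: x[f] if f in coalition else baseline[f] for f in features})

    phi: Dict[str, float] = {}
    for i in features:
        others = [f for f in features if f != i]
        total = 0.0
        for k in range(n):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value(s | {i}) - value(s))
        phi[i] = total
    return phi


def toy_risk(z: Dict[str, float]) -> float:
    # Hypothetical risk score; names and coefficients are invented for illustration.
    return 0.04 * z["vas"] + 0.5 * z["glucocorticoid"] - 0.3 * z["dose"]


patient = {"vas": 70, "glucocorticoid": 1, "dose": 2}
reference = {"vas": 50, "glucocorticoid": 0, "dose": 1}
phi = exact_shapley(toy_risk, patient, reference)
print({k: round(v, 3) for k, v in phi.items()})  # {'vas': 0.8, 'glucocorticoid': 0.5, 'dose': -0.3}
```

By construction, the values sum to the difference between the patient's prediction and the reference prediction (here 1.0). For a linear model, each value reduces to the coefficient multiplied by the deviation from the reference. Nonlinear models such as XGBoost can split credit between interacting features.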
Abbreviations
BioReg Austrian Biologics Registry
ABA Abatacept
ADA Adalimumab
ADA‑Boost Adaptive Boosting
Anti‑CCP Anti‑cyclic citrullinated peptide
aTNF Anti‑tumor necrosis factor
AUC Area under the curve
AUROC Area under the receiver operating characteristic
bDMARDs Biologic disease‑modifying antirheumatic drugs
CDAI Clinical Disease Activity Index
CERT Certolizumab
CI Confidence interval
CRP C‑reactive protein
DAS28 Disease Activity Score 28
DMARDs Disease‑modifying antirheumatic drugs
ESR Erythrocyte sedimentation rate
ETA Etanercept
EULAR European Alliance of Associations for Rheumatology
ExtraTrees Extra Trees Classifier
GaussProc Gaussian Process Classifier
GC Glucocorticoid
HAQ Health assessment questionnaire
IV administration Intravenous administration
KOBIO Korean College of Rheumatology Biologics and Targeted Therapy Registry
LDA Linear discriminant analysis
MTX Methotrexate
OVS Oversampling
RA Rheumatoid arthritis
RF Rheumatoid factor
RF classifier Random forest classifier
ROC Receiver operating characteristic
RUS Random undersampling
SD Standard deviation
SDAI Simplified disease activity index
SHAP SHapley Additive exPlanations
SJC Swollen joint count
SMOTE-NC Synthetic minority over-sampling for numerical and categorical features
SVC Support vector classifier
t2t Treat‑to‑target
Supplementary Information
The online version contains supplementary material available at https://doi.org/10.1186/s13075-024-03277-x.
Additional file 1: Table S1. Originally available variables from the raw dataset potentially affecting treatment outcome, categorized by inclusion/exclusion criteria. Table S2. Correlation heatmap for the cleaned input dataset; SDAI and CDAI were excluded due to a correlation > 0.8 with TJC, SJC and DAS28-ESR. Table S3. Model outcome depending on the class-imbalance technique. To ensure robust and stable models, the models with the highest AUCs were selected for final model evaluation from among those whose mean training AUC differed from the held-out-set AUC by at most 0.1.
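Table S3 compares class-imbalance techniques. The selection rule above keeps only candidates whose training and held-out AUCs differ by at most 0.1. The sketch below illustrates random undersampling (RUS), simple random oversampling by duplication, and that selection rule. It is a hedged reading, not the authors' pipeline, for three reasons. First, the article's OVS may instead refer to SMOTE-NC (see Abbreviations). Second, ranking eligible candidates by mean training AUC is our interpretation of "highest AUCs". Third, all names and numbers are hypothetical. Resampling is normally applied only to training folds, so held-out AUCs reflect the original class distribution.

```python
import random
from typing import Dict, List, Optional, Sequence, Tuple


def _split_by_class(y: Sequence[int]) -> Tuple[List[int], List[int]]:
    """Return (minority_indices, majority_indices) for a binary label vector."""
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    return (pos, neg) if len(pos) <= len(neg) else (neg, pos)


def random_undersample(X: Sequence, y: Sequence[int], seed: int = 0) -> Tuple[list, list]:
    """RUS: keep all minority rows plus an equally sized random subset of majority rows."""
    rng = random.Random(seed)
    minority, majority = _split_by_class(y)
    keep = minority + rng.sample(majority, len(minority))
    return [X[i] for i in keep], [y[i] for i in keep]


def random_oversample(X: Sequence, y: Sequence[int], seed: int = 0) -> Tuple[list, list]:
    """Oversampling by duplicating random minority rows until both classes are equal in size."""
    rng = random.Random(seed)
    minority, majority = _split_by_class(y)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    keep = majority + minority + extra
    return [X[i] for i in keep], [y[i] for i in keep]


def select_model(results: Dict[str, Tuple[float, float]], max_gap: float = 0.1) -> Optional[str]:
    """results maps a candidate to (mean training AUC, held-out AUC).
    Keep candidates whose gap is at most max_gap, then return the highest mean training AUC."""
    # A small tolerance keeps a gap of exactly 0.1 eligible despite floating-point rounding.
    eligible = {name: train for name, (train, held_out) in results.items()
                if abs(train - held_out) <= max_gap + 1e-9}
    if not eligible:
        return None
    return max(eligible, key=eligible.get)


# Hypothetical toy inputs: 8 effective (0) and 2 ineffective (1) courses.
X = [[float(i)] for i in range(10)]
y = [0] * 8 + [1] * 2
X_rus, y_rus = random_undersample(X, y)
X_ovs, y_ovs = random_oversample(X, y)
print(len(y_rus), sum(y_rus), len(y_ovs), sum(y_ovs))  # 4 2 16 8

candidates = {
    "xgboost_ovs": (0.78, 0.64),  # gap 0.14: rejected as unstable
    "ridge_rus": (0.70, 0.66),
    "svc_none": (0.72, 0.69),
}
print(select_model(candidates))  # svc_none
```

Undersampling discards majority-class information, whereas duplication can encourage overfitting to repeated rows. Which strategy works better is data-dependent, which is consistent with the different best strategies per drug in Table 3.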
Acknowledgements
We thank all patients and participating centers for their support of our
research.
Authors’ contributions
All authors helped in the drafting of the manuscript and in critically revising it for important intellectual content, and all authors approved the final article to be submitted for publication. D.U. and R.F.S. wrote the main manuscript. D.U. conducted the data analysis and prepared all figures and tables. B.L., B.R., G.E.S., P.S., R.P., M.H., M.S., V.F., J.R.P., J.Z. and R.F.S. are responsible for data curation and data acquisition. D.U. and M.Z.R. are responsible for study conception and design.
Funding
This research was funded by Siemens Healthineers.
Availability of data and materials
The data underlying this article cannot be shared publicly in order to protect the privacy of individuals. The data will be shared on reasonable request to the corresponding author and the registry.
Declarations
Ethics approval and consent to participate
The ethical committee of Lower Austria has approved the study design of
BioReg (Reference number GS4‑EK‑085–2009), which is renewed annually
(latest renewal January 2023). All patients gave their informed consent before
inclusion into BioReg.
Consent for publication
Not applicable.
Competing interests
Disclosure of Interests: Dubravka Ukalovic: Employee of: Siemens Healthineers; Burkhard Leeb: Member of speakers’ bureau: AbbVie, Roche, MSD, Pfizer, Actiopharm, Boehringer-Ingelheim, Kwizda, Celgene, Sandoz, Grünenthal, Eli-Lilly; Consultant of: AbbVie, Amgen, Roche, MSD, Pfizer, Celgene, Grünenthal, Kwizda, Eli-Lilly, Novartis, Sandoz; Received honoraria from: AbbVie, Biogen, Celgene, Eli Lilly, MSD, Pfizer, Roche, Novartis and Sandoz; Bernhard Rintelen: Member of speakers’ bureau: BMS, Eli-Lilly, Pfizer, TRB-Chemedica, UCB, Wyeth; Consultant of: Abbott, AbbVie, Amgen, Gilead, Novartis, Pfizer, Roche, TRB-Chemedica, UCB, Wyeth; Grant/research support from: Abbott, Aesca, Amgen, Centocor, Eli-Lilly, Servier, UCB; Gabriela Eichbauer-Sturm: Member of speakers’ bureau: AbbVie, Astro-Pharma, Grünenthal, Janssen, Eli-Lilly, Menarini, MSD, Novartis, Pfizer, Roche, TRB, UCB, Fresenius Kabi; Peter Spellitz: None declared; Rudolf Puchner: Member of speakers’ bureau: AbbVie, BMS, Janssen, Kwizda, MSD, Pfizer, Celgene, Grünenthal, Eli-Lilly; Consultant of: AbbVie, Amgen, Pfizer, Celgene, Grünenthal, Eli-Lilly; Received honoraria from: AbbVie, BMS, Gilead, Janssen, Kwizda, Lilly, MSD, Novartis and Pfizer; Manfred Herold: None declared; Miriam Stetter: None declared; Vera Ferincz: None declared; Johannes Resch-Passini: None declared; Jochen Zwerina: None declared; Marcus Zimmermann-Rittereiser: Employee of: Siemens Healthineers; Shareholder of: Siemens Healthineers; Ruth Fritsch-Stork: Member of speakers’ bureau: AbbVie, Astra Zeneca, Astropharm, Novartis.
Author details
1 Siemens Healthcare GmbH, Computed Tomography, Forchheim, Germany.
2 Rheumatological Practice, Private Office, Hollabrunn, Austria.
3 Lower Austrian State Hospital Stockerau, 2nd Department of Medicine, Lower Austrian Competence Center for Rheumatology, Karl Landsteiner Institute for Clinical Rheumatology, Stockerau, Austria.
4 Rheumatological Practice, Private Office, Linz, Austria.
5 Rheuma-Center Wien-Oberlaa, Department of Rheumatology, Vienna, Austria.
6 Rheumatological Practice, Private Office, Wels, Austria.
7 Department of Internal Medicine II, Medical University of Innsbruck, Innsbruck, Austria.
8 Rheumatological Practice, Private Office, Amstetten, Austria.
9 Department of Internal Medicine, University Hospital St. Pölten, St. Pölten, Austria.
10 Hanusch Krankenhaus, Vienna, Austria.
11 Ludwig Boltzmann Institute of Osteology, Vienna, Austria.
12 Siemens Healthcare GmbH, Digital & Automation, Erlangen, Germany.
13 Health Care Center Mariahilf of ÖGK, Vienna, Austria.
14 Biologica Registry BioReg, Stockerau, Austria.
15 Medical Faculty, Sigmund Freud Private University Vienna, Vienna, Austria.
Received: 20 October 2023 Accepted: 25 January 2024
References
1. Doran MF, Pond GR, Crowson CS, O’Fallon WM, Gabriel SE. Trends in
incidence and mortality in rheumatoid arthritis in Rochester, Minnesota,
over a forty‑year period. Arthritis Rheum. 2002;46(3):625–31.
2. McInnes IB, Schett G. Pathogenetic insights from the treatment of
rheumatoid arthritis. Lancet. 2017;389(10086):2328–37.
3. Alivernini S, Firestein GS, McInnes IB. The pathogenesis of rheumatoid
arthritis. Immunity. 2022;55(12):2255–70.
4. Burgers LE, Raza K, van der Helm‑van Mil AH. Window of opportunity in
rheumatoid arthritis ‑ definitions and supporting evidence: from old to
new perspectives. RMD Open. 2019;5(1):e000870.
5. Duarte C, Ferreira RJO, Santos EJF, da Silva JAP. Treating‑to‑target in
rheumatology: theory and practice. Best Pract Res Clin Rheumatol.
2022;36(1):101735.
6. Aletaha D. Precision medicine and management of rheumatoid arthritis. J
Autoimmun. 2020;110:102405.
7. Bécède M, Alasti F, Gessl I, Haupt L, Kerschbaumer A, Landesmann U,
et al. Risk profiling for a refractory course of rheumatoid arthritis. Semin
Arthritis Rheum. 2019;49(2):211–7.
8. Acosta-Colman I, Palau N, Tornero J, Fernández-Nebro A, Blanco F,
González-Alvaro I, et al. GWAS replication study confirms the association
of PDE3A–SLCO1C1 with anti-TNF therapy response in rheumatoid
arthritis. Pharmacogenomics. 2013;14(7):727–34.
9. Wei K, Jiang P, Zhao J, Jin Y, Zhang R, Chang C, et al. Biomarkers to predict
DMARDs efficacy and adverse effect in rheumatoid arthritis. Front Immunol.
2022;13:865267.
10. Cohen S, Wells AF, Curtis JR, Dhar R, Mellors T, Zhang L, et al. A molecular
signature response classifier to predict inadequate response to tumor
necrosis factor‑α inhibitors: the NETWORK‑004 prospective observational
study. Rheumatol Ther. 2021;8(3):1159–76.
11. Curtis JR, Strand V, Golombek S, Zhang L, Wong A, Zielinski MC, et al.
Patient outcomes improve when a molecular signature test guides
treatment decision‑making in rheumatoid arthritis. Expert Rev Mol Diagn.
2022;22(10):973–82.
12. Jawaheer D, Olsen J, Hetland ML. Sex differences in response to anti-tumor
necrosis factor therapy in early and established rheumatoid arthritis –
results from the DANBIO registry. J Rheumatol. 2012;39(1):46–53.
13. Lee S, Kang S, Eun Y, Won HH, Kim H, Lee J, et al. Machine learning‑based
prediction model for responses of bDMARDs in patients with rheumatoid
arthritis and ankylosing spondylitis. Arthritis Res Ther. 2021;23(1):254.
14. Koo BS, Eun S, Shin K, Yoon H, Hong C, Kim DH, Hong S, Kim YG, Lee CK,
Yoo B, Oh JS. Machine learning model for identifying important clinical
features for predicting remission in patients with rheumatoid arthritis
treated with biologics. Arthritis Res Ther. 2021;23(1):178.
https://doi.org/10.1186/s13075-021-02567-y.
15. Rintelen B, Zwerina J, Herold M, Singer F, Hitzelhammer J, Halder W,
Eichbauer-Sturm G, Puchner R, Stetter M, Leeb BF, BIOREG investigator group.
Validity of data collected in BIOREG, the Austrian register for biological
treatment in rheumatology: current practice of bDMARD therapy in
rheumatoid arthritis in Austria. BMC Musculoskelet Disord. 2016;17(1):358.
https://doi.org/10.1186/s12891-016-1207-4.
16. de Hond AAH, Steyerberg EW, van Calster B. Interpreting area under
the receiver operating characteristic curve. Lancet Digit Health.
2022;4(12):e853–5.
17. Smolen JS, Landewé RBM, Bergstra SA, Kerschbaumer A, Sepriano A,
Aletaha D, et al. EULAR recommendations for the management of
rheumatoid arthritis with synthetic and biological disease‑modifying
antirheumatic drugs: 2022 update. Ann Rheum Dis. 2023;82(1):3–18.
18. Lundberg SM, Lee S-I. A unified approach to interpreting model
predictions. Adv Neural Inf Process Syst. 2017;30:4768–77.
19. Pollard TJ, et al. tableone: an open source python package for producing
summary statistics for research papers. JAMIA Open. 2018;1(1):26–31.
20. Ghassemi M, Oakden-Rayner L, Beam A. The false hope of current
approaches to explainable artificial intelligence in health care. Lancet
Digit Health. 2021;3(11):e745–50.
21. Aletaha D, Wang X, Zhong S, Florentinus S, Monastiriakos K, Smolen JS.
Differences in disease activity measures in patients with rheumatoid
arthritis who achieved DAS, SDAI, or CDAI remission but not Boolean
remission. Semin Arthritis Rheum. 2020;50(2):276–84.
22. Studenic P, Aletaha D, de Wit M, Stamm TA, Alasti F, Lacaille D, et al.
American College of Rheumatology/EULAR remission criteria for rheumatoid
arthritis: 2022 revision. Ann Rheum Dis. 2023;82(1):74–80.
23. Duong SQ, Crowson CS, Athreya A, Atkinson EJ, Davis JM 3rd, Warrington
KJ, et al. Clinical predictors of response to methotrexate in patients with
rheumatoid arthritis: a machine learning approach using clinical trial
data. Arthritis Res Ther. 2022;24(1):162.
24. Lend K, van Vollenhoven RF, Lampa J, Lund Hetland M, Haavardsholm
EA, Nordström D, et al. Sex differences in remission rates over 24 weeks
among three different biological treatments compared to conventional
therapy in patients with early rheumatoid arthritis (NORD-STAR): a
post-hoc analysis of a randomised controlled trial. Lancet Rheumatol.
2022;4(10):e688–98.
25. Bergstra SA, Allaart CF, Ramiro S, Chopra A, Govind N, Silva C, et al.
Sex-associated treatment differences and their outcomes in
rheumatoid arthritis: results from the METEOR register. J Rheumatol.
2018;45(10):1361–6.
26. Hyrich KL, Watson KD, Silman AJ, Symmons DP. Predictors of response to
anti-TNF-alpha therapy among patients with rheumatoid arthritis: results
from the British Society for Rheumatology biologics register.
Rheumatology (Oxford). 2006;45(12):1558–65.
27. Markenson JA, Gibofsky A, Palmer WR, Keystone EC, Schiff MH, Feng J,
et al. Persistence with anti-tumor necrosis factor therapies in patients
with rheumatoid arthritis: observations from the RADIUS registry.
J Rheumatol. 2011;38(7):1273–81.
28. Neovius M, Arkema EV, Olsson H, Eriksson JK, Kristensen LE, Simard JF,
et al. Drug survival on TNF inhibitors in patients with rheumatoid arthritis
comparison of adalimumab, etanercept and infliximab. Ann Rheum Dis.
2015;74(2):354–60.
29. Lin CT, Huang WN, Tsai WC, Chen JP, Hung WT, Hsieh TY, et al. Predictors
of drug survival for biologic and targeted synthetic DMARDs in
rheumatoid arthritis: analysis from the TRA clinical electronic registry. PLoS ONE.
2021;16(4):e0250877.
30. Maneiro RJ, Salgado E, Carmona L, Gomez‑Reino JJ. Rheumatoid factor
as predictor of response to abatacept, rituximab and tocilizumab in
rheumatoid arthritis: systematic review and meta‑analysis. Semin Arthritis
Rheum. 2013;43(1):9–17.
31. Bobbio‑Pallavicini F, Caporali R, Alpini C, Avalle S, Epis OM, Klersy C,
et al. High IgA rheumatoid factor levels are associated with poor clinical
response to tumour necrosis factor α inhibitors in rheumatoid arthritis.
Ann Rheum Dis. 2007;66(3):302–7.
32. De Rycke L, Verhelst X, Kruithof E, Van den Bosch F, Hoffman IE, Veys EM,
et al. Rheumatoid factor, but not anti-cyclic citrullinated peptide
antibodies, is modulated by infliximab treatment in rheumatoid arthritis. Ann
Rheum Dis. 2005;64(2):299–302.
33. Lv Q, Yin Y, Li X, Shan G, Wu X, Liang D, et al. The status of rheumatoid
factor and anti‑cyclic citrullinated peptide antibody are not associated
with the effect of anti‑TNFα agent treatment in patients with rheumatoid
arthritis: a meta‑analysis. PLoS ONE. 2014;9(2):e89442.
34. Salgado E, Maneiro JR, Carmona L, Gómez‑Reino J. Rheumatoid factor
and response to TNF antagonists in rheumatoid arthritis: systematic
review and meta‑analysis of observational studies. Joint Bone Spine.
2014;81(1):41–50.
35. Alten R, Mariette X, Lorenz HM, Nüßlein H, Galeazzi M, Navarro F, et al.
Predictors of abatacept retention over 2 years in patients with
rheumatoid arthritis: results from the real-world ACTION study. Clin Rheumatol.
2019;38(5):1413–24.
36. Gottenberg JE, Courvoisier DS, Hernandez MV, Iannone F, Lie E, Canhão H,
et al. Brief report: association of rheumatoid factor and anti‑citrullinated
protein antibody positivity with better effectiveness of abatacept:
results from the Pan‑European registry analysis. Arthritis Rheumatol.
2016;68(6):1346–52.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Content courtesy of Springer Nature, terms of use apply. Rights reserved.
1.
2.
3.
4.
5.
6.
Terms and Conditions
Springer Nature journal content, brought to you courtesy of Springer Nature Customer Service Center GmbH (“Springer Nature”).
Springer Nature supports a reasonable amount of sharing of research papers by authors, subscribers and authorised users (“Users”), for small-
scale personal, non-commercial use provided that all copyright, trade and service marks and other proprietary notices are maintained. By
accessing, sharing, receiving or otherwise using the Springer Nature journal content you agree to these terms of use (“Terms”). For these
purposes, Springer Nature considers academic use (by researchers and students) to be non-commercial.
These Terms are supplementary and will apply in addition to any applicable website terms and conditions, a relevant site licence or a personal
subscription. These Terms will prevail over any conflict or ambiguity with regards to the relevant terms, a site licence or a personal subscription
(to the extent of the conflict or ambiguity only). For Creative Commons-licensed articles, the terms of the Creative Commons license used will
apply.
We collect and use personal data to provide access to the Springer Nature journal content. We may also use these personal data internally within
ResearchGate and Springer Nature and as agreed share it, in an anonymised way, for purposes of tracking, analysis and reporting. We will not
otherwise disclose your personal data outside the ResearchGate or the Springer Nature group of companies unless we have your permission as
detailed in the Privacy Policy.
While Users may use the Springer Nature journal content for small scale, personal non-commercial use, it is important to note that Users may
not:
use such content for the purpose of providing other users with access on a regular or large scale basis or as a means to circumvent access
control;
use such content where to do so would be considered a criminal or statutory offence in any jurisdiction, or gives rise to civil liability, or is
otherwise unlawful;
falsely or misleadingly imply or suggest endorsement, approval , sponsorship, or association unless explicitly agreed to by Springer Nature in
writing;
use bots or other automated methods to access the content or redirect messages
override any security feature or exclusionary protocol; or
share the content in order to create substitute for Springer Nature products or services or a systematic database of Springer Nature journal
content.
In line with the restriction against commercial use, Springer Nature does not permit the creation of a product or service that creates revenue,
royalties, rent or income from our content or its inclusion as part of a paid for service or for other commercial gain. Springer Nature journal
content cannot be used for inter-library loans and librarians may not upload Springer Nature journal content on a large scale into their, or any
other, institutional repository.
These terms of use are reviewed regularly and may be amended at any time. Springer Nature is not obligated to publish any information or
content on this website and may remove it or features or functionality at our sole discretion, at any time with or without notice. Springer Nature
may revoke this licence to you at any time and remove access to any copies of the Springer Nature journal content which have been saved.
To the fullest extent permitted by law, Springer Nature makes no warranties, representations or guarantees to Users, either express or implied
with respect to the Springer nature journal content and all parties disclaim and waive any implied warranties or warranties imposed by law,
including merchantability or fitness for any particular purpose.
Please note that these rights do not automatically extend to content, data or other material published by Springer Nature that may be licensed
from third parties.
If you would like to use or distribute our Springer Nature journal content to a wider audience or on a regular basis or in any other manner not
expressly permitted by these Terms, please contact Springer Nature at
onlineservice@springernature.com
... This can lead to improved model performance, especially in the context of highly imbalanced datasets, where traditional classifiers may bias towards the majority class. The effectiveness of SMOTE in addressing class imbalance has been welldocumented in previous studies [39,41]. ...
... AUC, F1 score, and G-means are the three comprehensive metrics we prioritize in model evaluation. Specifically, we considered a value above 0.7 for these metrics as indicative of good model performance [41]. In summary, the combination of these metrics provides a comprehensive understanding of model performance, helping to assess its accuracy and applicability. ...
Article
Full-text available
Objective To use routine demographic and clinical data to develop an interpretable individual-level machine learning (ML) model to diagnose knee osteoarthritis (KOA) and to identify highly ranked features. Methods In this retrospective, population-based cohort study, anonymized questionnaire data was retrieved from the Wu Chuan KOA Study, Inner Mongolia, China. After feature selections, participants were divided in a 7:3 ratio into training and test sets. Class balancing was applied to the training set for data augmentation. Four ML classifiers were compared by cross-validation within the training set and their performance was further analyzed with an unseen test set. Classifications were evaluated using sensitivity, specificity, positive predictive value, negative predictive value, accuracy, area under the curve(AUC), G-means, and F1 scores. The best model was explained using Shapley values to extract highly ranked features. Results A total of 1188 participants were investigated in this study, among whom 26.3% were diagnosed with KOA. Comparatively, XGBoost with Boruta exhibited the highest classification performance among the four models, with an AUC of 0.758, G-means of 0.800, and F1 scores of 0.703. The SHAP method reveals the top 17 features of KOA according to the importance ranking, and the average of the experience of joint pain was recognized as the most important features. Conclusions Our study highlights the usefulness of machine learning in unveiling important factors that influence the diagnosis of KOA to guide new prevention strategies. Further work is needed to validate this approach.
Article
The integration of artificial intelligence (AI) into rheumatology has revolutionized research and clinical practice, offering transformative advancements in diagnostics, biomarker discovery, genomics, digital health technologies, and personalized medicine. This review provides a comprehensive analysis of cutting-edge AI applications in rheumatology, highlighting deep learning models for imaging diagnostics, AI-powered genomic analysis, and wearable health technologies for continuous disease monitoring. The findings demonstrate that AI enhances diagnostic precision, facilitates early disease detection, and enables personalized therapeutic strategies. However, significant challenges remain, including limited clinician adoption, ethical concerns, data privacy issues, and the need for robust model validation. A recent survey revealed that 73% of rheumatologists have never used AI in clinical practice, emphasizing the urgent need for targeted training and interdisciplinary collaboration. Additionally, AI is reshaping rheumatology research by optimizing drug discovery, clinical trial designs, and predictive analytics. Overcoming current barriers requires a multidisciplinary approach involving rheumatologists, AI specialists, and regulatory bodies to ensure the ethical, scalable, and effective implementation of AI-driven solutions in rheumatology.
Article
Full-text available
Background: The management of rheumatoid arthritis with biologic disease-modifying anti-rheumatic drugs (bDMARDs) requires careful consideration due to the considerable variability among patients. Despite the effectiveness of rituximab as a frequently used bDMARD, achieving treatment goals can be challenging because of the physiological and pathological differences among patients. Utilizing patient characteristics to predict disease activity post-rituximab treatment holds promise for improving treatment management and providing precision therapy. The purpose of this study was to develop machine learning (ML) models that can predict the disease activity, defined by the Disease Activity Score in 28 Joints (DAS28) 6 months, and 1 year after rituximab treatment initiation in patients with rheumatoid arthritis. This prediction was based on analyzing patient clinical, biochemical, and genetic data and understanding how these characteristics impact rituximab treatment outcomes. Methods: This is a retrospective cohort study involving a study population of 100 patients with rheumatoid arthritis who were treated with rituximab. Patient datasets, including demographic information, clinical features, laboratory characteristics, and FCGR3A genotyping were analyzed and used as input variables in five different ML prediction models [Linear Regression, Support Vector Regression (SVR), Cat Boost Regressor, Elastic Net, and Extreme Gradient Boosting (XGB) Regressor]. The performance of the models was evaluated using different input variable combinations in a 5-fold cross-validation setting, with training and testing data being divided into 20-80% ratio per cross-validation. Results: Results demonstrated the superiority of the Cat Boost Regressor model for predicting the disease activity 6 months and 1 year after rituximab treatment, as evidenced by the averaged error between predicted and actual DAS28 values. 
Furthermore, incorporating laboratory analyses from the 6th month enhances the models' predictive performance for DAS28 after 1 year of treatment, leading to increased precision. Conclusions: This approach highlights the ability of ML in enhancing the treatment management of chronic diseases like rheumatoid arthritis
Article
Full-text available
Objectives To provide an update of the EULAR rheumatoid arthritis (RA) management recommendations addressing the most recent developments in the field. Methods An international task force was formed and solicited three systematic literature research activities on safety and efficacy of disease-modifying antirheumatic drugs (DMARDs) and glucocorticoids (GCs). The new evidence was discussed in light of the last update from 2019. A predefined voting process was applied to each overarching principle and recommendation. Levels of evidence and strengths of recommendation were assigned to and participants finally voted on the level of agreement with each item. Results The task force agreed on 5 overarching principles and 11 recommendations concerning use of conventional synthetic (cs) DMARDs (methotrexate (MTX), leflunomide, sulfasalazine); GCs; biological (b) DMARDs (tumour necrosis factor inhibitors (adalimumab, certolizumab pegol, etanercept, golimumab, infliximab including biosimilars), abatacept, rituximab, tocilizumab, sarilumab and targeted synthetic (ts) DMARDs, namely the Janus kinase inhibitors tofacitinib, baricitinib, filgotinib, upadacitinib. Guidance on monotherapy, combination therapy, treatment strategies (treat-to-target) and tapering in sustained clinical remission is provided. Safety aspects, including risk of major cardiovascular events (MACEs) and malignancies, costs and sequencing of b/tsDMARDs were all considered. Initially, MTX plus GCs is recommended and on insufficient response to this therapy within 3–6 months, treatment should be based on stratification according to risk factors; With poor prognostic factors (presence of autoantibodies, high disease activity, early erosions or failure of two csDMARDs), any bDMARD should be added to the csDMARD; after careful consideration of risks of MACEs, malignancies and/or thromboembolic events tsDMARDs may also be considered in this phase. 
If the first bDMARD (or tsDMARD) fails, any other bDMARD (from another or the same class) or tsDMARD (considering risks) is recommended. With sustained remission, DMARDs may be tapered but should not be stopped. Levels of evidence and levels of agreement were high for most recommendations. Conclusions These updated EULAR recommendations provide consensus on RA management including safety, effectiveness and cost.
Article
Full-text available
Background: : The molecular signature response classifier (MSRC) predicts tumor necrosis factor-ɑ inhibitor (TNFi) non-response in rheumatoid arthritis. This study evaluates decision-making, validity, and utility of MSRC testing. Methods: : This comparative cohort study compared an MSRC-tested arm (N=627) from the Study to Accelerate Information of Molecular Signatures (AIMS) with an external control arm (N=2721) from US electronic health records. Propensity score matching was applied to balance baseline characteristics. Patients initiated a biologic/targeted synthetic disease-modifying antirheumatic drug, or continued TNFi therapy. Odds ratios (ORs) for six-month response were calculated based on clinical disease activity index (CDAI) scores for low disease activity/remission (CDAI-LDA/REM), remission (CDAI-REM), and minimally important differences (CDAI-MID) for change. Results: : In MSRC-tested patients, 59% had a non-response signature and 70% received therapy aligned with test results. In TNFi-treated patients, the MSRC had an 88% PPV and 54% sensitivity. MSRC-guided patients were significantly (p<0.0001) more likely to respond to b/tsDMARDs compared with those treated according to standard care (CDAI-LDA/REM: 36.0% vs 21.9%, OR 2.01[1.55-2.60]; CDAI-REM: 10.4% vs 3.6%, OR 3.14 [1.94-5.08]; CDAI-MID: 49.5% vs 32.8%, OR 2.01[1.58-2.55]). Conclusion: : MSRC clinical validity supports high clinical utility: guided treatment selection resulted in significantly superior outcomes relative to standard care; nearly three times more patients reached CDAI remission.
Article
Full-text available
Objective: In 2011, the American College of Rheumatology (ACR) and EULAR endorsed provisional criteria for remission in rheumatoid arthritis (RA), both Boolean-based and index-based. Based on recent studies indicating that a higher threshold for the patient global assessment (PtGA) may improve agreement between the two sets of criteria, our goals were to externally validate a revision of the Boolean remission criteria using a higher PtGA threshold and to validate the provisionally endorsed index-based criteria. Methods: We used data from four randomised trials comparing biological disease-modifying antirheumatic drugs to methotrexate or placebo. We tested the higher proposed PtGA threshold of 2 cm (Boolean2.0) (range 0-10 cm) compared with the original threshold of 1 cm (Boolean1.0). We analysed agreement between the Boolean-based and index-based criteria (Simplified Disease Activity Index (SDAI) and Clinical Disease Activity Index (CDAI)) for remission and examined how well each remission definition predicted later good physical function (Health Assessment Questionnaire (HAQ) score≤0.5) and radiographic non-progression. Results: Data from 2048 trial participants, 1101 with early RA and 947 with established RA, were included. The proportion of patients with disease in remission at 6 months after treatment initiation increased when using Boolean2.0 compared with Boolean1.0, from 14.8% to 20.6% in early RA and 4.2% to 6.0% in established RA. Agreement between Boolean2.0 and the SDAI or CDAI remission criteria was better than for Boolean1.0, particularly in early disease. Boolean2.0, SDAI, and CDAI remission criteria had similar positive likelihood ratios (LRs) to predict radiographic nonprogression and a HAQ score of ≤0.5 (positive LR 3.8-4.3). The omission of PtGA (BooleanX) worsened the prediction of good functional outcomes. 
Conclusion: Using the Boolean2.0 criteria classifies more patients as being in remission and improves agreement with the index-based remission criteria without jeopardising predictive value for radiographic or functional outcomes. This revised Boolean definition and the previously provisionally endorsed index-based criteria were endorsed by ACR and EULAR.
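The three Boolean variants compared in this study differ only in how PtGA is handled. A minimal sketch of the classification, assuming the remaining components of the ACR/EULAR Boolean definition (28-joint tender and swollen joint counts ≤ 1 and CRP ≤ 1 mg/dL), which this abstract does not restate; the class and function names are hypothetical. A helper for the positive likelihood ratio, the metric used to compare the definitions, is included.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Assessment:
    tender_28: int      # tender joint count, 28-joint assessment
    swollen_28: int     # swollen joint count, 28-joint assessment
    crp_mg_dl: float    # C-reactive protein in mg/dL
    ptga_cm: float      # patient global assessment on a 0-10 cm scale


def boolean_remission(a: Assessment, ptga_max: Optional[float]) -> bool:
    """Boolean remission: ptga_max=1.0 -> Boolean1.0, 2.0 -> Boolean2.0,
    None -> BooleanX (PtGA omitted, three-variable definition)."""
    core = a.tender_28 <= 1 and a.swollen_28 <= 1 and a.crp_mg_dl <= 1.0
    if ptga_max is None:
        return core
    return core and a.ptga_cm <= ptga_max


def positive_likelihood_ratio(sensitivity: float, specificity: float) -> float:
    """LR+ = sensitivity / (1 - specificity)."""
    if specificity >= 1.0:
        raise ValueError("LR+ is undefined when specificity is 1")
    return sensitivity / (1.0 - specificity)


if __name__ == "__main__":
    # Hypothetical visit: one tender joint, CRP 0.4 mg/dL, PtGA 1.5 cm.
    visit = Assessment(tender_28=1, swollen_28=0, crp_mg_dl=0.4, ptga_cm=1.5)
    for label, threshold in [("Boolean1.0", 1.0), ("Boolean2.0", 2.0), ("BooleanX", None)]:
        print(label, boolean_remission(visit, threshold))
```

For this hypothetical visit, Boolean1.0 is not met while Boolean2.0 and BooleanX are, which illustrates how raising the PtGA threshold reclassifies patients.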
Background Methotrexate is the preferred initial disease-modifying antirheumatic drug (DMARD) for rheumatoid arthritis (RA). However, clinically useful tools for individualized prediction of response to methotrexate in patients with RA are lacking. We aimed to identify clinical predictors of response to methotrexate in patients with RA using machine learning methods. Methods Randomized clinical trials (RCTs) of DMARD-naïve patients with RA randomized to placebo plus methotrexate were identified and accessed through the Clinical Study Data Request Consortium and Vivli Center for Global Clinical Research Data. Studies with the Disease Activity Score in 28 joints with erythrocyte sedimentation rate (DAS28-ESR) available at baseline and at 12 and 24 weeks were included. Latent class modeling of methotrexate response was performed. The least absolute shrinkage and selection operator (LASSO) and random forest methods were used to identify predictors of response. Results A total of 775 patients from 4 RCTs were included (mean age 50 years, 80% female). Two distinct classes of patients were identified based on DAS28-ESR change over 24 weeks: “good responders” and “poor responders.” Baseline DAS28-ESR, anti-citrullinated protein antibody (ACPA) status, and Health Assessment Questionnaire (HAQ) score were the top predictors of good response using both LASSO (area under the curve [AUC] 0.79) and random forests (AUC 0.68) in the external validation set. DAS28-ESR ≤ 7.4, ACPA positivity, and HAQ ≤ 2 were associated with the highest likelihood of response. Among patients with a 12-week DAS28-ESR > 3.2, an improvement of ≥ 1 point in DAS28-ESR from baseline to week 12 was predictive of achieving DAS28-ESR ≤ 3.2 at 24 weeks. Conclusions We developed and externally validated a prediction model for response to methotrexate within 24 weeks in DMARD-naïve patients with RA, providing variably weighted clinical features and defined cutoffs for clinical decision-making.
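The outcome measure and the reported cutoffs can be expressed directly in code. A minimal sketch, assuming the standard four-variable DAS28-ESR formula (not given in the abstract); the helper names are hypothetical, and the functions encode only the cutoffs stated above, not the fitted LASSO or random forest models.

```python
import math
from typing import Optional


def das28_esr(tjc28: int, sjc28: int, esr_mm_h: float, gh_mm: float) -> float:
    """DAS28-ESR from 28-joint tender/swollen counts, ESR (mm/h) and
    patient global health on a 0-100 mm visual analogue scale."""
    if esr_mm_h <= 0:
        raise ValueError("ESR must be positive (its natural log is taken)")
    return (0.56 * math.sqrt(tjc28) + 0.28 * math.sqrt(sjc28)
            + 0.70 * math.log(esr_mm_h) + 0.014 * gh_mm)


def favourable_baseline(das28: float, acpa_positive: bool, haq: float) -> bool:
    """Hypothetical helper: baseline profile reported as most likely to respond."""
    return das28 <= 7.4 and acpa_positive and haq <= 2.0


def early_improvement(das28_baseline: float, das28_week12: float) -> Optional[bool]:
    """For patients still above 3.2 at week 12: did DAS28-ESR fall by >= 1 point?
    Returns None when the week-12 score is already <= 3.2 (rule not applicable)."""
    if das28_week12 <= 3.2:
        return None
    return das28_baseline - das28_week12 >= 1.0
```

For example, 10 tender joints, 6 swollen joints, an ESR of 40 mm/h, and a global health score of 60 mm give a DAS28-ESR of about 5.88.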
Rheumatoid arthritis (RA), one of the most common immune-mediated diseases, mainly affects middle-aged and elderly individuals and seriously impairs patients' quality of life. Pain and disability are major symptoms of RA that negatively affect patients, particularly when treatment is inappropriate. Effective therapeutic strategies have evolved over the past few decades, and many new disease-modifying antirheumatic drugs (DMARDs) are now used in the clinic. Owing to these breakthroughs, symptoms have been relieved in patients who previously could not be treated effectively. However, some patients still report symptoms that have not previously been described, implying that the RA treatment and evaluation system still has limitations. In recent years, biomarkers, an effective means of diagnosing RA and evaluating patients' condition, have gradually been introduced into clinical practice to evaluate treatment response in RA, and their use is continually being refined to allow more accurate treatment of patients with RA. In this article, we summarize a series of biomarkers that may help evaluate therapeutic effect and improve the efficiency of clinical treatment for RA. These efforts may also encourage researchers to devote more time and resources to the study and application of biomarkers, potentially resulting in a new evaluation system that reduces the inappropriate use of DMARDs as well as patients' physical pain and financial burden.
The black-box nature of current artificial intelligence (AI) has caused some to question whether AI must be explainable to be used in high-stakes scenarios such as medicine. It has been argued that explainable AI will engender trust with the health-care workforce, provide transparency into the AI decision making process, and potentially mitigate various kinds of bias. In this Viewpoint, we argue that this argument represents a false hope for explainable AI and that current explainability methods are unlikely to achieve these goals for patient-level decision support. We provide an overview of current explainability techniques and highlight how various failure cases can cause problems for decision making for individual patients. In the absence of suitable explainability methods, we advocate for rigorous internal and external validation of AI models as a more direct means of achieving the goals often associated with explainability, and we caution against having explainability be a requirement for clinically deployed models.
Significant recent progress in understanding rheumatoid arthritis (RA) pathogenesis has led to improved treatment and quality of life. The introduction of targeted biologic and targeted synthetic disease-modifying antirheumatic drugs (DMARDs) has also transformed clinical outcomes. Despite this, RA remains a lifelong disease without a cure. Unmet needs include partial response or non-response to treatment in many patients, failure to achieve immune homeostasis or drug-free remission, and the inability to repair damaged tissues. RA is now recognized as a disease in which a multi-year prodromal phase of systemic immune dysregulation, likely beginning at mucosal surfaces, is followed by a symptomatic clinical phase. Inflammation and immune reactivity are primarily localized to the synovium, leading to pain and articular damage, but are also associated with a broader range of comorbidities. Here, we review recently described immunologic mechanisms that drive breach of tolerance, chronic synovitis, and remission.
Background Rheumatoid arthritis is a chronic inflammatory disease with a well-recognised female preponderance. In this post-hoc analysis of the NORD-STAR trial, we aimed to examine sex differences in remission rates with three different biological treatments combined with methotrexate versus active conventional treatment over 24 weeks, in patients with early rheumatoid arthritis. Methods NORD-STAR was a multicentre, investigator-initiated, assessor-blinded, phase 4, randomised, controlled trial of early rheumatoid arthritis, done in Denmark, Finland, Iceland, Norway, Sweden, and the Netherlands. Newly diagnosed patients, naive to disease-modifying antirheumatic drugs, aged 18 years or older with early rheumatoid arthritis and a symptom duration of less than 24 months were randomly assigned (1:1:1:1) to receive active conventional treatment, certolizumab-pegol, abatacept, or tocilizumab. Sex was reported in case report forms by study physicians or by study nurses. Data on gender were not collected. Remission outcomes were analysed with logistic generalised estimating equations (GEE), using a logit link and an exchangeable correlation matrix. The model included treatment, time, sex, and the relevant interactions. For this post-hoc analysis, the co-primary outcomes were differences in Clinical Disease Activity Index (CDAI) remission (CDAI score ≤2·8) between sexes over time and at week 24, assessed with interaction terms (men vs women within each treatment comparison) and using active conventional treatment as the reference. We present adjusted average marginal differences in remission rates (risk differences) with 95% CIs. Findings Between Dec 14, 2012, and Dec 11, 2018, 812 patients were enrolled and randomly assigned; 217 received active conventional treatment, 203 received certolizumab-pegol, 204 received abatacept, and 188 received tocilizumab. All 812 patients were included in this analysis; 561 (69%) were women and 251 (31%) were men.
Observed CDAI remission rates at 24 weeks were numerically higher among men than among women despite comparable disease activity at baseline (55% vs 50% with active conventional treatment, 57% vs 52% with certolizumab-pegol, 65% vs 51% with abatacept, and 61% vs 40% with tocilizumab). In the adjusted analysis, with active conventional treatment as the reference, the only significant difference between men and women was in the tocilizumab group (pinteraction=0·015); men in the tocilizumab group had a higher probability of CDAI remission, on average over time, than did men in the active conventional treatment group (0·12; 95% CI 0·00 to 0·23), whereas women in the tocilizumab group had a lower probability of remission than did women in the active conventional treatment group (–0·05, 95% CI –0·13 to 0·02). Interpretation Numerically higher remission rates were observed in men than in women in all four treatment groups at week 24, suggesting that this generalised sex difference is not related to the treatment. The difference between men and women was significantly greater with tocilizumab, an interleukin (IL)-6 inhibitor, than with active conventional treatment, suggesting a possible additional sex-based effect specific for IL-6 blockade. Funding None.
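The co-primary outcome in this analysis is CDAI remission (CDAI ≤ 2.8). A minimal sketch of the index, assuming the standard CDAI definition (28-joint tender and swollen counts plus patient and evaluator global assessments, each on a 0–10 scale) and the conventional low/moderate/high cutoffs of 10 and 22, none of which are restated in this abstract; function names are illustrative, and the sketch does not reproduce the GEE analysis.

```python
def cdai(tjc28: int, sjc28: int, ptga: float, ega: float) -> float:
    """Clinical Disease Activity Index: 28-joint tender and swollen counts plus
    patient (ptga) and evaluator (ega) global assessments, each 0-10."""
    for name, value in (("ptga", ptga), ("ega", ega)):
        if not 0.0 <= value <= 10.0:
            raise ValueError(f"{name} must be on a 0-10 scale")
    return tjc28 + sjc28 + ptga + ega


def cdai_category(score: float) -> str:
    """Map a CDAI score to a disease-activity category."""
    if score <= 2.8:
        return "remission"
    if score <= 10.0:
        return "low"
    if score <= 22.0:
        return "moderate"
    return "high"


if __name__ == "__main__":
    # Hypothetical visit: one tender joint, PtGA 1.0, evaluator global 0.5.
    score = cdai(tjc28=1, sjc28=0, ptga=1.0, ega=0.5)
    print(score, cdai_category(score))  # 2.5 remission
```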
Despite its inclusion in current treatment recommendations, adherence to the treat-to-target (T2T) strategy is still poor. Among the issues is the definition of the target, especially the caveats of the patient global assessment (PGA), which is included in all recommended definitions of remission. The PGA is poorly related to inflammation, especially at low levels of disease activity, and is instead a measure of disease impact. Up to 60% of patients who are otherwise in remission still score their PGA above 1, sometimes as high as 10. These patients (PGA-near-remission) are exposed to overtreatment if current recommendations are strictly followed, and will continue to endure significant disease impact unless adjuvant measures are implemented. One proposed way to overcome both risks is the dual-target strategy: systematically pursuing two targets, one focused on the disease process (the biological target) and another focused on symptoms and impact (the impact target). Candidate instruments to define each of these targets are discussed.
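The PGA-near-remission group can be identified mechanically from routine visit data. A minimal sketch, assuming the original ACR/EULAR Boolean thresholds (28-joint tender and swollen counts ≤ 1, CRP ≤ 1 mg/dL, PGA ≤ 1 on a 0–10 scale), which the abstract does not spell out; the names are hypothetical. In dual-target terms, the non-PGA criteria play the role of the biological target, while the impact target would need a separate instrument that this sketch does not attempt to choose.

```python
from dataclasses import dataclass


@dataclass
class Assessment:
    tender_28: int     # tender joint count (28 joints)
    swollen_28: int    # swollen joint count (28 joints)
    crp_mg_dl: float   # C-reactive protein, mg/dL
    pga: float         # patient global assessment, 0-10


def remission_status(a: Assessment, pga_max: float = 1.0) -> str:
    """Classify a visit as Boolean remission, PGA-near-remission, or neither."""
    # Criteria reflecting the disease process (the biological target).
    biological_ok = a.tender_28 <= 1 and a.swollen_28 <= 1 and a.crp_mg_dl <= 1.0
    if not biological_ok:
        return "not in remission"
    if a.pga <= pga_max:
        return "remission"
    # All other criteria met, but PGA above the threshold.
    return "PGA-near-remission"
```

A patient with no tender joints, one swollen joint, CRP 0.3 mg/dL, and a PGA of 6 would be classed as PGA-near-remission: treating them to the full Boolean target would mean escalating therapy for a high PGA rather than for inflammation.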