Figure - available from: Arthritis Research & Therapy
Area under the receiver operating characteristic (ROC) curve for fivefold cross-validation

Source publication
Article
Objectives: Machine learning models can support an individualized approach in the choice of bDMARDs. We developed prediction models for 5 different bDMARDs using machine learning methods based on patient data derived from the Austrian Biologics Registry (BioReg). Methods: Data from 1397 patients and 19 variables with at least 100 treat-to-target (t2...

Citations

... This can lead to improved model performance, especially in the context of highly imbalanced datasets, where traditional classifiers may be biased towards the majority class. The effectiveness of SMOTE in addressing class imbalance has been well documented in previous studies [39,41]. ...
... AUC, F1 score, and G-means are the three comprehensive metrics we prioritize in model evaluation. Specifically, we considered a value above 0.7 for these metrics as indicative of good model performance [41]. In summary, the combination of these metrics provides a comprehensive understanding of model performance, helping to assess its accuracy and applicability. ...
Article
Objective: To use routine demographic and clinical data to develop an interpretable individual-level machine learning (ML) model to diagnose knee osteoarthritis (KOA) and to identify highly ranked features. Methods: In this retrospective, population-based cohort study, anonymized questionnaire data were retrieved from the Wu Chuan KOA Study, Inner Mongolia, China. After feature selection, participants were divided in a 7:3 ratio into training and test sets. Class balancing was applied to the training set for data augmentation. Four ML classifiers were compared by cross-validation within the training set, and their performance was further analyzed with an unseen test set. Classifications were evaluated using sensitivity, specificity, positive predictive value, negative predictive value, accuracy, area under the curve (AUC), G-means, and F1 scores. The best model was explained using Shapley values to extract highly ranked features. Results: A total of 1188 participants were investigated in this study, among whom 26.3% were diagnosed with KOA. XGBoost with Boruta exhibited the highest classification performance among the four models, with an AUC of 0.758, a G-mean of 0.800, and an F1 score of 0.703. The SHAP method revealed the top 17 features of KOA according to the importance ranking, and the average experience of joint pain was recognized as the most important feature. Conclusions: Our study highlights the usefulness of machine learning in unveiling important factors that influence the diagnosis of KOA to guide new prevention strategies. Further work is needed to validate this approach.
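The evaluation metrics listed in this abstract can all be derived from the four cells of a binary confusion matrix (AUC is the exception, since it needs ranked scores rather than counts). The following is a minimal sketch of those standard definitions, not the authors' code; the function name `classification_metrics` and the example counts are hypothetical and are not taken from the study.

```python
import math

def classification_metrics(tp, fp, tn, fn):
    """Standard confusion-matrix metrics for a binary classifier.

    Assumes every denominator is non-zero (each class and each
    predicted label occurs at least once).
    """
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    ppv = tp / (tp + fp)           # positive predictive value (precision)
    npv = tn / (tn + fn)           # negative predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    # G-mean: geometric mean of sensitivity and specificity
    g_mean = math.sqrt(sensitivity * specificity)
    # F1: harmonic mean of precision and recall
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "accuracy": accuracy,
        "g_mean": g_mean,
        "f1": f1,
    }

# Hypothetical counts for illustration only (not results from the study).
m = classification_metrics(tp=80, fp=30, tn=70, fn=20)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

Because the G-mean multiplies sensitivity by specificity, it drops sharply when a classifier neglects the minority class, which is one reason it is often reported alongside F1 on imbalanced data.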
Article
Background: The management of rheumatoid arthritis with biologic disease-modifying anti-rheumatic drugs (bDMARDs) requires careful consideration due to the considerable variability among patients. Despite the effectiveness of rituximab as a frequently used bDMARD, achieving treatment goals can be challenging because of the physiological and pathological differences among patients. Utilizing patient characteristics to predict disease activity post-rituximab treatment holds promise for improving treatment management and providing precision therapy. The purpose of this study was to develop machine learning (ML) models that can predict the disease activity, defined by the Disease Activity Score in 28 Joints (DAS28), 6 months and 1 year after rituximab treatment initiation in patients with rheumatoid arthritis. This prediction was based on analyzing patient clinical, biochemical, and genetic data and understanding how these characteristics impact rituximab treatment outcomes. Methods: This is a retrospective cohort study involving a study population of 100 patients with rheumatoid arthritis who were treated with rituximab. Patient datasets, including demographic information, clinical features, laboratory characteristics, and FCGR3A genotyping, were analyzed and used as input variables in five different ML prediction models [Linear Regression, Support Vector Regression (SVR), CatBoost Regressor, Elastic Net, and Extreme Gradient Boosting (XGB) Regressor]. The performance of the models was evaluated using different input variable combinations in a 5-fold cross-validation setting, with training and testing data being divided in a 20-80% ratio per cross-validation. Results: The results demonstrated the superiority of the CatBoost Regressor model for predicting the disease activity 6 months and 1 year after rituximab treatment, as evidenced by the averaged error between predicted and actual DAS28 values. Furthermore, incorporating laboratory analyses from the 6th month enhanced the models' predictive performance for DAS28 after 1 year of treatment, leading to increased precision. Conclusions: This approach highlights the ability of ML to enhance the treatment management of chronic diseases like rheumatoid arthritis.
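The evaluation scheme described in this abstract, 5-fold cross-validation scored by the averaged error between predicted and actual DAS28, can be sketched as follows. This is an illustration under stated assumptions, not the study's pipeline: the data are synthetic, the "model" is a trivial mean-value baseline standing in for the regressors named above, mean absolute error is assumed as the error measure, and the helper names `k_fold_indices` and `cross_validated_mae` are hypothetical. Each fold here holds out 20% of the patients and trains on the remaining 80%, which is one reading of the ratio quoted in the abstract.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k near-equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validated_mae(y, k=5, seed=0):
    """Mean absolute error, averaged over k held-out folds.

    The stand-in model simply predicts the mean of the training targets;
    a real regressor would be fitted on the training fold instead.
    """
    folds = k_fold_indices(len(y), k, seed)
    fold_errors = []
    for held_out in folds:
        held = set(held_out)
        train = [y[i] for i in range(len(y)) if i not in held]
        prediction = sum(train) / len(train)
        errors = [abs(y[i] - prediction) for i in held_out]
        fold_errors.append(sum(errors) / len(errors))
    return sum(fold_errors) / k

# Synthetic DAS28-like scores for 100 hypothetical patients.
rng = random.Random(42)
das28 = [rng.uniform(2.0, 6.0) for _ in range(100)]
mae = cross_validated_mae(das28, k=5)
print(f"5-fold mean absolute error (baseline): {mae:.3f}")
```

Comparing each candidate model's cross-validated error against a baseline like this one gives a quick check that a more complex regressor is actually learning from the input variables.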