The variables used for modeling and their importance ranking (ordered by median importance). The left part of the figure shows the variables used in the pre-administration variables model. The right part of the figure shows the mixture of variables collected before and after administration (the latter within 3 months after starting MTX). The shorter the horizontal bar (i.e., the smaller the value), the higher the median importance ranking of the variable (see the top and left part).

Source publication
Article
Full-text available
Background and Aims: Accurately predicting the response to methotrexate (MTX) in juvenile idiopathic arthritis (JIA) patients before administration is the key point to improve the treatment outcome. However, no simple and reliable prediction model has been identified. Here, we aimed to develop and validate predictive models for the MTX response to...

Contexts in source publication

Context 1
... some variables had too many missing values, so they were not used for modeling. All variables used for modeling are shown in Figure 1. ...
Context 2
... two groups of models were established based on variables before the onset of MTX and mix-variables within 3 months after starting MTX respectively, for finding the best model. In the first group of models, referred to as pre-administration variables models (MTX-A), we included 46 variables (see the left part of Figure 1). In the second group of models, referred to as mix-variables models (MTX-B), we extended MTX-A by adding 32 new variables (see the right part of Figure 1). ...
Context 3
... the first group of models, referred to as pre-administration variables models (MTX-A), we included 46 variables (see the left part of Figure 1). In the second group of models, referred to as mix-variables models (MTX-B), we extended MTX-A by adding 32 new variables (see the right part of Figure 1). The main process can be divided into three steps: (1) data processing, (2) feature selection, (3) model generation and validation. ...
Context 4
... importance rankings of all variables before using MTX and of the mix-variables before and after MTX administration are shown in Figure 1 (left part and right part). The XGBoost algorithm was applied to select the smallest feature subset with optimum accuracy, and the results of this process are shown in Figure 3. ...
Context 5
... the MTX-A predictors, the three measurements reached the optimum when a subset of 10 features was selected (see Figure 3A). The 10 selected significant variables are listed above the dotted red line in the left part of Figure 1. In the MTX-B predictors, the three measurements achieved maximum performance when a subset of 6 features was selected (see Figure 3B). ...
Context 6
... the MTX-B predictors, the three measurements achieved maximum performance when a subset of 6 features was selected (see Figure 3B). The variables above the dotted red line in the right part of Figure 1 are these 6 features. The degree of contribution of all the selected variables to the response and the formulas behind the modeling are described in detail in the Supplementary. ...
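The selection step described in Contexts 4 to 6 (rank the variables by importance, then keep the smallest top-k subset that reaches the best score) can be sketched as follows. This is a minimal illustration, not the authors' code: the variable names and the `toy_score` function are hypothetical stand-ins, whereas in the source publication the ranking and the performance measurements come from XGBoost.

```python
from typing import Callable, List, Sequence, Tuple


def smallest_best_subset(
    ranked: Sequence[str],
    evaluate: Callable[[Sequence[str]], float],
) -> Tuple[List[str], float]:
    """Return the shortest top-k prefix of `ranked` that attains the highest score."""
    best_k, best_score = 0, float("-inf")
    for k in range(1, len(ranked) + 1):
        score = evaluate(ranked[:k])
        # Strict '>' keeps the smaller subset when two sizes tie.
        if score > best_score:
            best_k, best_score = k, score
    return list(ranked[:best_k]), best_score


# Hypothetical variables, already sorted from most to least important.
ranked_vars = ["var_a", "var_b", "var_c", "var_d", "var_e"]


def toy_score(subset: Sequence[str]) -> float:
    # Assumed toy behaviour: the score peaks at three variables, then degrades slightly.
    return min(len(subset), 3) / 3.0 - 0.01 * max(0, len(subset) - 3)


selected, score = smallest_best_subset(ranked_vars, toy_score)
print(selected, score)
```

In practice `evaluate` would wrap a cross-validated model fit on the given columns; the strict comparison is one possible way to prefer the minimum subset size.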

Citations

... Over the past decade, ML has been applied to various aspects of healthcare, and the field of rheumatology has been no exception. ML models have shown promise in various issues, ranging from automated detection of disease flares to predicting response to therapy [7][8][9][10][11][12][13][14][15][16][17]. One can note how ML has used diverse data sources to predict disease classification and genetic patterns and answer research questions accurately in rheumatology. ...
Article
Full-text available
Objective The aim of this study is to develop a machine learning (ML) model to accurately predict liver enzyme elevation in rheumatoid arthritis (RA) patients on treatment with methotrexate (MTX) using electronic health record (EHR) data from a real-world RA cohort. Methods Demographic, clinical, biochemical, and prescription information from 569 RA patients initiated on MTX were collected retrospectively. The primary outcome was the liver transaminase elevation above the upper limit of normal (40 IU/mL), following the initiation of MTX. The total dataset was randomly split into a training (80%) and test set (20%) and used to develop a random forest classifier model. The best model was selected after hyper-parameter tuning and fivefold cross-validation. Results A total of 104 (18.2%) patients developed elevated transaminase while on MTX therapy. The best-performing predictive model had an accuracy/F1 score of 0.87. The top 10 predictive features were then used to create a limited feature model that retained most of the predictive accuracy, with an accuracy/F1 score of 0.86. Baseline high-normal transaminase levels, and higher lymphocyte and neutrophil blood count proportions were the highest predictors of elevated transaminase levels after MTX therapy. Conclusion Our proof-of-concept study suggests the possibility of building a well-performing ML model to predict liver transaminase elevation in RA patients being treated with MTX. Similar ML models could be used to identify “high-risk” patients and target them for early stratification.
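The workflow in this abstract (80/20 split, random forest, hyper-parameter tuning with fivefold cross-validation, then a limited model on the top 10 features) could look roughly like the sketch below. It assumes scikit-learn and uses synthetic data from `make_classification`; the parameter grid, the stratified split, and the F1 scoring choice are assumptions for illustration, not details reported in the abstract.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the cohort: 569 rows, roughly 18% positive class.
X, y = make_classification(
    n_samples=569, n_features=20, weights=[0.82, 0.18], random_state=0
)

# 80% training / 20% test split (stratification is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Hyper-parameter tuning with fivefold cross-validation (illustrative grid).
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [3, None]},
    cv=5,
    scoring="f1",
)
search.fit(X_train, y_train)
best_model = search.best_estimator_

y_pred = best_model.predict(X_test)
acc = accuracy_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)

# Limited-feature model: refit on the 10 most important features.
top10 = np.argsort(best_model.feature_importances_)[::-1][:10]
limited_model = RandomForestClassifier(random_state=0, **search.best_params_)
limited_model.fit(X_train[:, top10], y_train)
limited_acc = accuracy_score(y_test, limited_model.predict(X_test[:, top10]))

print(f"full: acc={acc:.2f} f1={f1:.2f}; top-10: acc={limited_acc:.2f}")
```

The printed numbers depend on the synthetic data and have no relation to the results reported in the abstract.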
... Compared with linear regression models, machine learning and deep learning models can handle real-world evidence with ease. This is because machine learning and deep learning techniques can process complex, high-dimensional, and interacting variable relationships, and can establish models with strong generalization and good accuracy (Kruppa et al., 2012; Lee et al., 2018; Mo et al., 2019). In recent years, some algorithms with more sophisticated principles have been developed, such as Gradient Boosting Decision Tree (GBDT), eXtreme Gradient Boosting (XGBoost), Categorical Boosting (CatBoost), Light Gradient Boosting Machine (LightGBM), and TabNet, which have been highly recognized in algorithm competitions (Ke et al., 2017; Prokhorenkova et al., 2017; Arik and Pfister, 2019; Janßen et al., 2019). ...
Article
Full-text available
Objective: Shengmai injection is a common treatment for coronary heart disease. The accurate dose regimen is important to maximize effectiveness and minimize adverse reactions. We aim to explore the effect of Shengmai injection in patients with coronary heart disease based on real-world data and establish a personalized medicine model using machine learning and deep learning techniques. Methods: 211 patients were enrolled. The length of hospital stay was used to explore the effect of Shengmai injection in a case-control study. We applied propensity score matching to reduce bias and Wilcoxon rank sum test to compare results between the experimental group and the control group. Important variables influencing the dose regimen of Shengmai injection were screened by XGBoost. A personalized medicine model of Shengmai injection was established by XGBoost selected from nine algorithm models. SHapley Additive exPlanations and confusion matrix were used to interpret the results clinically. Results: Patients using Shengmai injection had shorter length of hospital stay than those not using Shengmai injection (median 10.00 days vs. 11.00 days, p = 0.006). The personalized medicine model established via XGBoost shows accuracy = 0.81 and AUC = 0.87 in test cohort and accuracy = 0.84 and AUC = 0.84 in external verification. The important variables influencing the dose regimen of Shengmai injection include lipid-lowering drugs, platelet-lowering drugs, levels of GGT, hemoglobin, prealbumin, and cholesterol at admission. Finally, the personalized model shows precision = 75%, recall rate = 83% and F1-score = 79% for predicting 40 mg of Shengmai injection; and precision = 86%, recall rate = 79% and F1-score = 83% for predicting 60 mg of Shengmai injection. Conclusion: This study provides evidence supporting the clinical effectiveness of Shengmai injection, and established its personalized medicine model, which may help clinicians make better decisions.
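The F1-scores quoted in this abstract are the harmonic mean of precision and recall. A small sketch of that arithmetic, using the rounded percentages reported for the 40 mg class (the function name is ours, not the authors'):

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded values reported for predicting 40 mg: precision 75%, recall 83%.
f1_40mg = f1_from_precision_recall(0.75, 0.83)
print(f"{f1_40mg:.2f}")
```

Because the inputs are already rounded, a recomputed F1 can differ from a published value by about one percentage point.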
... Therefore, employing high-throughput technologies to develop computational methods that process patient -omic information and reach new and precise conclusions is ever more important (45). For example, implementing machine learning algorithms in high-dimensional data analysis is a well-established approach to enhancing patient classification (46)(47)(48) or predicting disease activity (49) in RA. Herein, we employed five machine learning approaches to analyze DNA methylation patterns, along with patient clinical information, to fine-tune the estimation of previously available stratifiers in an independent testing dataset. ...
Article
Full-text available
Objective Although Leflunomide (LEF) is effective in treating rheumatoid arthritis (RA), there are still a considerable number of patients who respond poorly to LEF treatment. Till date, few LEF efficacy-predicting biomarkers have been identified. Herein, we explored and developed a DNA methylation-based predictive model for LEF-treated RA patient prognosis. Methods Two hundred forty-five RA patients were prospectively enrolled from four participating study centers. A whole-genome DNA methylation profiling was conducted to identify LEF-related response signatures via comparison of 40 samples using Illumina 850k methylation arrays. Furthermore, differentially methylated positions (DMPs) were validated in the 245 RA patients using a targeted bisulfite sequencing assay. Lastly, prognostic models were developed, which included clinical characteristics and DMPs scores, for the prediction of LEF treatment response using machine learning algorithms. Results We recognized a seven-DMP signature consisting of cg17330251, cg19814518, cg20124410, cg21109666, cg22572476, cg23403192, and cg24432675, which was effective in predicting RA patient’s LEF response status. In the five machine learning algorithms, the support vector machine (SVM) algorithm provided the best predictive model, with the largest discriminative ability, accuracy, and stability. Lastly, the AUC of the complex model(the 7-DMP scores with the lymphocyte and the diagnostic age) was higher than the simple model (the seven-DMP signature, AUC:0.74 vs 0.73 in the test set). Conclusion In conclusion, we constructed a prognostic model integrating a 7-DMP scores with the clinical patient profile to predict responses to LEF treatment. Our model will be able to effectively guide clinicians in determining whether a patient is LEF treatment sensitive or not.
... ML is a vast field that has gained much interest in medicine in recent years. For example, Mo et al. [30] developed two predictive methotrexate response models for juvenile idiopathic arthritis by extreme gradient boosting, SVM, RF, and logistic regression, and Huang et al. [22] developed a breast cancer prediction model using an SVM. Owing to the practical advantages of ML, high prediction accuracy was achieved in the above models. ...
Article
Full-text available
In the first trimester of pregnancy, accurately predicting the occurrence of pregnancy-induced hypertension (PIH) is important for both identifying high-risk women and adopting early intervention. In this study, we used four machine-learning models (LASSO logistic regression, random forest, backpropagation neural network, and support vector machines) to predict the occurrence of PIH in a prospective cohort. Candidate features for predicting the occurrence of middle and late PIH were acquired using a LASSO algorithm. The performance of predictive models was assessed using receiver operating characteristic analysis. Finally, a nomogram was established with the model scores, age, and nulliparity. Calibration, clinical usefulness, and internal validation were used to assess the performance of the nomogram. In the training set (2258 pregnant women), eleven candidate factors in the first trimester were significantly associated with the occurrence of PIH (P < 0.001 in the training set). Four models showed AUCs from 0.780 to 0.816 in the training set. For the validation set (939 pregnant women), AUCs varied from 0.516 to 0.795. The nomogram showed good discrimination, with an AUC of 0.847 (95% CI: 0.805-0.889) in the training set and 0.753 (95% CI: 0.653-0.853) in the validation set. Decision curve analysis suggested that the model was clinically useful. The model developed using LASSO logistic regression achieved the best performance in predicting the occurrence of PIH. The derived nomogram, which incorporates the model score and maternal risk factors, can be used to predict PIH in clinical practice. We develop a model with good performance for clinical prediction of PIH in the first trimester.
... Then the data were initialized and modified to a uniform format. To obtain a higher-quality data set, missing values were filled with the mean value based on age (20). The main prediction module used the Scikit-Learn machine learning library to train the model and predict the results. ...
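One way to read "filled with the mean value based on age" is to replace each missing value with the mean of the observed values among records of the same age. The sketch below assumes that reading; the field names (`age`, `sbp`), the exact-age grouping, and the fallback to the overall mean are all hypothetical choices, since the excerpt does not specify them.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Optional

Row = Dict[str, Optional[float]]


def impute_by_age(rows: List[Row], field: str) -> List[Row]:
    """Fill missing `field` values with the mean of that field at the same age."""
    observed_by_age = defaultdict(list)
    for row in rows:
        if row[field] is not None:
            observed_by_age[row["age"]].append(row[field])

    # Fallback when no record of the same age has an observed value.
    # Raises statistics.StatisticsError if the field is missing everywhere.
    overall_mean = mean(v for values in observed_by_age.values() for v in values)

    filled = []
    for row in rows:
        new_row = dict(row)  # leave the input rows untouched
        if new_row[field] is None:
            same_age = observed_by_age.get(new_row["age"])
            new_row[field] = mean(same_age) if same_age else overall_mean
        filled.append(new_row)
    return filled


rows = [
    {"age": 30, "sbp": 110.0},
    {"age": 30, "sbp": None},
    {"age": 30, "sbp": 120.0},
    {"age": 40, "sbp": 130.0},
    {"age": 25, "sbp": None},
]
filled = impute_by_age(rows, "sbp")
print([r["sbp"] for r in filled])
```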
Article
Full-text available
Objective We aimed to construct and validate machine learning models for endotracheal tube (ETT) size prediction in pediatric patients. Methods Data of 990 pediatric patients who underwent endotracheal intubation were retrospectively collected between November 2019 and October 2021, and separated into cuffed and uncuffed endotracheal tube subgroups. Six machine learning algorithms, including support vector regression (SVR), logistic regression (LR), random forest (RF), gradient boosting tree (GBR), decision tree (DTR) and extreme gradient boosting tree (XGBR), were selected to construct and validate models using ten-fold cross validation in the training set. The optimal models were selected, and their performance was compared with traditional predictive formulas and clinicians. Furthermore, additional data of 71 pediatric patients were collected to perform external validation. Results The optimal 7 uncuffed and 5 cuffed variables were screened out by feature selection. The RF models had the best performance, with the lowest prediction error for both uncuffed ETT size (MAE = 0.275 mm and RMSE = 0.349 mm) and cuffed ETT size (MAE = 0.243 mm and RMSE = 0.310 mm). The RF models also had better predictive power than the formulas in both uncuffed and cuffed ETT size prediction. In addition, the RF models performed slightly better than senior clinicians, while they significantly outperformed junior clinicians. Based on SVR models, we proposed 3 novel linear formulas for uncuffed and cuffed ETT size respectively. Conclusion We have developed machine learning models with excellent performance in predicting optimal ETT size in both cuffed and uncuffed endotracheal intubation in pediatric patients, which provides powerful decision support for clinicians to select proper ETT size. Novel formulas proposed based on machine learning models also have relatively better predictive performance. These models and formulas can serve as important clinical references for clinicians, especially for practitioners with limited experience or in remote areas.
... Compared with conventional modeling methods, machine learning and deep learning techniques have indubitable advantages in dealing with real-world data, such as 1) machine learning and deep learning techniques can deal with more complex, high-dimensional, and interactive variables, which is lacking in traditional models; 2) machine learning and deep learning models have stronger generalization and better accuracy than conventional models (Kruppa et al., 2012;Lee et al., 2018;Mo et al., 2019). Recently, some algorithms with more sophisticated principles have been developed, such as eXtreme Gradient Boosting (XGBoost), light gradient boosting machine (LightGBM), categorical boosting (CatBoost), and gradient boosting decision tree (GBDT), which have been highly recognized in algorithm competitions (Chen and Guestrin, 2016;Ke et al., 2017;Prokhorenkova et al., 2017;Zhang et al., 2019). ...
Article
Full-text available
Valproic acid/sodium valproate (VPA) is a widely used anticonvulsant drug for maintenance treatment of bipolar disorders. In order to balance the efficacy and adverse events of VPA treatment, an individualized dose regimen is necessary. This study aimed to establish an individualized medication model of VPA for patients with bipolar disorder based on machine learning and deep learning techniques. The sequential forward selection (SFS) algorithm was applied for selecting a feature subset, and random forest was used for interpolating missing values. Then, we compared nine models using XGBoost, LightGBM, CatBoost, random forest, GBDT, SVM, logistic regression, ANN, and TabNet, and CatBoost was chosen to establish the individualized medication model with the best performance (accuracy = 0.85, AUC = 0.91, sensitivity = 0.85, and specificity = 0.83). Three important variables that correlated with VPA daily dose included VPA TDM value, antipsychotics, and indirect bilirubin. SHapley Additive exPlanations was applied to visually interpret their impacts on VPA daily dose. Last, the confusion matrix presented that predicting a daily dose of 0.5 g VPA had a precision of 55.56% and recall rate of 83.33%, and predicting a daily dose of 1 g VPA had a precision of 95.83% and a recall rate of 85.19%. In conclusion, the individualized medication model of VPA for patients with bipolar disorder based on CatBoost had a good prediction ability, which provides guidance for clinicians to propose the optimal medication regimen.
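Sequential forward selection (SFS), named in this abstract, greedily adds the single feature that most improves a score and stops when no candidate helps. The sketch below is a generic illustration under stated assumptions: the feature names, the toy `weights`, and the `evaluate` function are hypothetical, and a real pipeline would score each candidate subset with a cross-validated model instead.

```python
from typing import Callable, List, Sequence, Tuple


def sequential_forward_selection(
    features: Sequence[str],
    evaluate: Callable[[List[str]], float],
) -> Tuple[List[str], float]:
    """Greedy SFS: add the best remaining feature until the score stops improving."""
    selected: List[str] = []
    remaining = list(features)
    best_score = float("-inf")
    while remaining:
        # Score every one-feature extension of the current subset.
        cand_score, cand = max((evaluate(selected + [f]), f) for f in remaining)
        if cand_score <= best_score:
            break  # no candidate improves the score
        selected.append(cand)
        remaining.remove(cand)
        best_score = cand_score
    return selected, best_score


# Toy additive scoring: each hypothetical feature contributes a fixed amount.
weights = {"feat_1": 0.5, "feat_2": 0.3, "feat_3": 0.1, "feat_4": -0.05}


def evaluate(subset: List[str]) -> float:
    return sum(weights[f] for f in subset)


chosen, chosen_score = sequential_forward_selection(list(weights), evaluate)
print(chosen, round(chosen_score, 2))
```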
... Compared with conventional modeling methods, machine learning and deep learning techniques have indubitable advantages in dealing with real-world evidence, such as the following: (1) machine learning and deep learning can deal with more complex, high-dimensional, and interactive variables, which is lacking in traditional models, and (2) machine learning and deep learning models have stronger generalization and better accuracy than traditional models (14)(15)(16). Recently, some algorithms with more sophisticated principles have been developed, such as eXtreme Gradient Boosting (XGBoost), light gradient boosting machine (LightGBM), Categorical Boosting (CatBoost), Gradient Boosting Decision Tree (GBDT), and TabNet, which have been highly recognized in algorithm competitions (17)(18)(19)(20)(21). Recently, the application of machine learning and deep learning techniques based on real-world study has been a trend, such as a novel prognostic scoring system of intrahepatic cholangiocarcinoma with ensemble machine learning algorithms (XGBoost, random forest, and GBDT), a prediction model of tacrolimus blood concentration in patients with autoimmune diseases using XGBoost, a novel vancomycin dose prediction model through XGBoost, and warfarin maintenance dose prediction through LightGBM (22)(23)(24)(25). ...
Article
Full-text available
Lapatinib is used for the treatment of metastatic HER2(+) breast cancer. We aim to establish a prediction model for lapatinib dose using machine learning and deep learning techniques based on a real-world study. There were 149 breast cancer patients enrolled from July 2016 to June 2017 at Fudan University Shanghai Cancer Center. The sequential forward selection algorithm based on random forest was applied for variable selection. Twelve machine learning and deep learning algorithms were compared in terms of their predictive abilities (logistic regression, SVM, random forest, Adaboost, XGBoost, GBDT, LightGBM, CatBoost, TabNet, ANN, Super TML, and Wide&Deep). As a result, TabNet was chosen to construct the prediction model with the best performance (accuracy = 0.82 and AUC = 0.83). Afterward, four variables that strongly correlated with lapatinib dose were ranked via importance score as follows: treatment protocols, weight, number of chemotherapy treatments, and number of metastases. Finally, the confusion matrix was used to validate the model for a dose regimen of 1,250 mg lapatinib (precision = 81% and recall = 95%), and for a dose regimen of 1,000 mg lapatinib (precision = 87% and recall = 64%). To conclude, we established a deep learning model to predict lapatinib dose based on important influencing variables selected from real-world evidence, to achieve an optimal individualized dose regimen with good predictive performance.
... Furthermore, compared with conventional modeling methods, machine learning and deep learning techniques have indubitable advantages in dealing with real-world data, such as (i) machine learning and deep learning can deal with more complex, high-dimensional, and interactive variables from the clinical environment, which is lacking in conventional models; (ii) machine learning and deep learning models have a stronger generalization and better accuracy than conventional models (37)(38)(39). Recently, the application of machine learning and deep learning techniques on individualized dose prediction models has been approbatory, such as a novel vancomycin dose prediction model through XGBoost and warfarin maintenance dose prediction through LightGBM (40,41). ...
Article
Full-text available
Tacrolimus is a major immunosuppressor against post-transplant rejection in kidney transplant recipients. However, the narrow therapeutic index of tacrolimus and considerable variability among individuals are challenges for therapeutic outcomes. The aim of this study was to compare different machine learning and deep learning algorithms and establish individualized dose prediction models by using the best performing algorithm. Therefore, among the 10 commonly used algorithms we compared, the TabNet algorithm outperformed other algorithms with the highest R² (0.824), the lowest prediction error [mean absolute error (MAE) 0.468, mean square error (MSE) 0.558, and root mean square error (RMSE) 0.745], and good performance of overestimated (5.29%) or underestimated dose percentage (8.52%). In the final prediction model, the last tacrolimus daily dose, the last tacrolimus therapeutic drug monitoring value, time after transplantation, hematocrit, serum creatinine, aspartate aminotransferase, weight, CYP3A5, body mass index, and uric acid were the most influential variables on tacrolimus daily dose. Our study provides a reference for the application of deep learning technique in tacrolimus dose estimation, and the TabNet model with desirable predictive performance is expected to be expanded and applied in future clinical practice.
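The regression metrics named in this abstract (R², MAE, MSE, RMSE) follow standard definitions, sketched here in plain Python on made-up numbers. The toy values are purely illustrative and are not taken from the study.

```python
import math
from typing import Dict, Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """Compute MAE, MSE, RMSE and R^2 for paired true and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)  # RMSE is the square root of MSE
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    ss_res = sum(e * e for e in errors)
    r2 = 1.0 - ss_res / ss_tot  # undefined if all true values are identical
    return {"mae": mae, "mse": mse, "rmse": rmse, "r2": r2}


# Made-up doses for illustration only.
metrics = regression_metrics([2.0, 3.0, 4.0, 5.0], [2.5, 3.0, 3.5, 5.0])
print(metrics)
```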
... XGBoost (XGB) [57] stands for Extreme Gradient Boosting; it is based on gradient boosting method [58] which uses more accurate approximations to find the best tree model. Similar to gradient boosting, XGBoost builds an additive expansion of the objective function by minimizing a loss function. ...
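For reference, the additive model and regularized objective usually associated with XGBoost can be written as below. The notation is ours rather than the excerpt's: $f_k$ are regression trees, $l$ is a differentiable loss, $T$ is the number of leaves in a tree, $w$ its leaf weights, and $g_i$, $h_i$ are the first and second derivatives of the loss with respect to the previous prediction.

```latex
\hat{y}_i = \sum_{k=1}^{K} f_k(x_i), \qquad
\mathcal{L} = \sum_{i} l\bigl(y_i, \hat{y}_i\bigr) + \sum_{k=1}^{K} \Omega(f_k), \qquad
\Omega(f) = \gamma T + \tfrac{1}{2}\lambda \lVert w \rVert^{2}

% Second-order approximation used when adding the t-th tree:
\mathcal{L}^{(t)} \approx \sum_{i} \Bigl[ g_i f_t(x_i) + \tfrac{1}{2} h_i f_t(x_i)^{2} \Bigr] + \Omega(f_t)
```

The second line is the "more accurate approximation" in the sense that it uses second-order as well as first-order information about the loss.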
Article
Full-text available
Chronic Obstructive Pulmonary Disease (COPD) is a progressive, obstructive lung disease that restricts airflow from the lungs. COPD patients are at risk of sudden and acute worsening of symptoms called exacerbations. Early identification and classification of COPD exacerbation can reduce COPD risks and improve patient’s healthcare and management. Pulse oximetry is a non-invasive technique used to assess patients with acutely worsening symptoms. As part of manual diagnosis based on pulse oximetry, clinicians examine three warning signs to classify COPD patients. This may lack high sensitivity and specificity which requires a blood test. However, laboratory tests require time, further delayed treatment and additional costs. This research proposes a prediction method for COPD patients’ classification based on pulse oximetry three manual warning signs and the resulting derived few key features that can be obtained in a short time. The model was developed on a robust physician labeled dataset with clinically diverse patient cases. Five classification algorithms were applied on the mentioned dataset and the results showed that the best algorithm is XGBoost with the accuracy of 91.04%, precision of 99.86%, recall of 82.19%, F1 measure value of 90.05% with an AUC value of 95.8%. Age, current and baseline heart rate, current and baseline pulse ox. (SPO2) were found the top most important predictors. These findings suggest the strength of XGBoost model together with the availability and the simplicity of input variables in classifying COPD daily living using a (wearable) pulse oximeter.
... In recent years, the prediction models based on machine learning have excellent performance in the diagnosis, treatment, and prognosis of diseases. [15][16][17] Tang et al have developed machine-learning models to predict TAC stable dose in renal transplant recipients, and their performance is better than that of traditional statistical methods. 18 However, there has been no research on the prediction of the TAC concentration in NS patients using machine learning. ...
Article
Full-text available
Purpose: Tacrolimus (TAC) is a first-line immunosuppressant for patients with refractory nephrotic syndrome (NS). However, there is a high inter-patient variability of TAC pharmacokinetics, thus therapeutic drug monitoring (TDM) is required. In this study, we aimed to employ machine learning algorithms to investigate the impact of clinical and genetic variables on the TAC dose/weight-adjusted trough concentration (C0/D) in Chinese children with refractory NS, and then develop and validate the TAC C0/D prediction models. Patients and methods: The association of 82 clinical variables and 244 single nucleotide polymorphisms (SNPs) with TAC C0/D in the third month since TAC treatment was examined in 171 children with refractory NS. Extremely randomized trees (ET), gradient boosting decision tree (GBDT), random forest (RF), extreme gradient boosting (XGBoost), and Lasso regression were carried out to establish and validate prediction models, respectively. The best prediction models were validated on a cohort of 30 refractory NS patients. Results: GBDT algorithm performed best in the whole group (R2=0.444, MSE=591.032, MAE=20.782, MedAE=18.980) and CYP3A5 nonexpresser group (R2=0.264, MSE=477.948, MAE=18.119, MedAE=18.771), while ET algorithm performed best in the CYP3A5 expresser group (R2=0.380, MSE=1839.459, MAE=31.257, MedAE=19.399). These prediction models included 3 clinical variables (ALB0, AGE0, and gender) and 10 SNPs (ACTN4 rs3745859, ACTN4 rs56113315, ACTN4 rs62121818, CTLA4 rs4553808, CYP3A5 rs776746, IL2RA rs12722489, INF2 rs1128880, MAP3K11 rs7946115, MYH9 rs2239781, and MYH9 rs4821478). Conclusion: The association between the clinical and genetic variables and TAC C0/D was described, and three TAC C0/D prediction models integrating clinical and genetic variables were developed and validated using machine learning, which may support individualized TAC dosing.