Fig 5 - available via license: CC BY
Content may be subject to copyright.
Trade-off between predictive accuracy and explainability. https://doi.org/10.1371/journal.pmed.1002709.g005

Source publication
Article
Background Resuscitated cardiac arrest is associated with high mortality; however, the ability to estimate risk of adverse outcomes using existing illness severity scores is limited. Using in-hospital data available within the first 24 hours of admission, we aimed to develop more accurate models of risk prediction using both logistic regression (LR...

Similar publications

Article
Machine Learning (ML) can potentially enhance predictions in real-life domains. This study presents an evaluation and comparison of different ML methods applied to a thyroid cancer dataset from the Prostate, Lung, Colorectal and Ovarian (PLCO) study, comprising approximately 155,000 participants, with data on thyroid cancer occurrence and mortality incidence. T...

Citations

... Most studies validated the algorithms internally, either by splitting the data into training and validation sets and re-sampling the sets 5-10 times for cross-validation (30, 135-137) or by the leave-one-out method (76, 138, 139). The data used for training and validating the algorithms came from a variety of sources, such as clinical trial databases (30, 138), electronic health records (34, 140-142), or internal institutional databases (143, 144). ...
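The two internal-validation schemes mentioned in this excerpt can be sketched as index generators. This is a minimal illustration in plain Python, assuming that "re-sampling the sets 5-10 times" refers to k-fold cross-validation with k between 5 and 10; the helper names `k_fold_indices` and `leave_one_out_indices` are hypothetical and not taken from the cited studies.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation.

    Indices 0..n-1 are shuffled once and dealt into k folds of nearly equal
    size; each fold serves as the validation set exactly once.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        val = sorted(folds[i])
        train = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        yield train, val


def leave_one_out_indices(n):
    """Leave-one-out: k-fold with k = n, so every sample is validated on its own."""
    for i in range(n):
        yield [j for j in range(n) if j != i], [i]


if __name__ == "__main__":
    # 10 samples, 5 folds: each split trains on 8 samples and validates on 2.
    for train, val in k_fold_indices(10, 5):
        print(len(train), len(val))
```

Leave-one-out fits one model per sample, which is why it tends to be reserved for small datasets, while 5- or 10-fold schemes trade a little bias for far fewer model fits.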
Article
The widespread adoption of mobile technologies offers an opportunity for a new approach to post-discharge care for patients with heart failure (HF). By enabling non-invasive remote monitoring and two-way, real-time communication between the clinic and home-based patients, as well as a host of other capabilities, mobile technologies have the potential to significantly improve remote patient care. This literature review summarizes clinical evidence related to virtual healthcare (VHC), defined as a care team + connected devices + a digital solution in the post-discharge care of patients with HF. Searches were conducted on Embase (06/12/2020). A total of 171 studies were included for data extraction and evidence synthesis: 96 studies related to VHC efficacy, and 75 studies related to AI in HF. In addition, 15 publications were included from the search on studies scaling up VHC solutions in HF within the real-world setting. The most successful VHC interventions, as measured by the number of reported significant results, were those targeting reduction in rehospitalization rates. In terms of relative success rate, the two most effective interventions targeted patient self-care and all-cause hospital visits in their primary endpoints. Among the three categories of VHC identified in this review (telemonitoring, remote patient management, and patient self-empowerment), the integrated approach in remote patient management solutions performs the best in decreasing HF patients' re-admission rates and overall hospital visits. Given the increased amount of data generated by VHC technologies, artificial intelligence (AI) is being investigated as a tool to aid decision making in the context of primary diagnostics, identifying disease phenotypes, and predicting treatment outcomes. Currently, most AI algorithms are developed using data gathered in clinic, and only a few studies deploy AI in the context of VHC. Most successes have been reported in predicting HF outcomes.
Since the field of VHC in HF is relatively new and still in flux, this is not a typical systematic review capturing all published studies within this domain. Although the standard methodology for this type of review was followed, the nature of this review is qualitative. The main objective was to summarize the most promising results and identify potential research directions.
... To identify the characteristics associated with medical disputes, the significant features identified through multivariate analysis were used as input features for the ML model. We used 6 techniques (LR, DT, RF, SVM, GBDT, and DNN) in this study [22-26], as these are the most commonly used techniques for binary classification prediction. These ML techniques were carefully selected to ensure a comprehensive evaluation. ...
... After the optimal model was identified through predictive evaluation, model explainability was assessed using local interpretable model-agnostic explanations (LIME) to enhance clinical utility and transparency [24,27]. LIME presented the risk probability of encountering medical disputes for individual cases and provided insights into the factors contributing to the predicted risk. ...
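The LIME idea described above (explain one prediction by fitting a simple, proximity-weighted linear model around it) can be sketched without the `lime` package. This is a simplified illustration, not that library's algorithm or API: it assumes continuous features, Gaussian perturbations with a hypothetical `scale`, an exponential proximity kernel with a hypothetical `width`, and a small ridge penalty, and `predict_proba` stands for any black-box model that returns a risk probability.

```python
import math
import random


def solve(a, b):
    """Solve a x = b with Gauss-Jordan elimination and partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def lime_explain(predict_proba, x, n_samples=500, scale=0.5, width=0.75,
                 ridge=1e-3, seed=0):
    """Return (intercept, local weights) of a proximity-weighted linear surrogate.

    Each weight approximates how the black-box probability changes per unit
    change of that feature in the neighbourhood of x.
    """
    rng = random.Random(seed)
    rows, targets, weights = [], [], []
    for _ in range(n_samples):
        z = [xi + rng.gauss(0.0, scale) for xi in x]             # perturb around x
        dist2 = sum((zi - xi) ** 2 for zi, xi in zip(z, x))
        weights.append(math.exp(-dist2 / width ** 2))            # closer = heavier
        rows.append([1.0] + [zi - xi for zi, xi in zip(z, x)])   # intercept + offsets
        targets.append(predict_proba(z))
    p = len(x) + 1
    # Weighted ridge normal equations: (A^T W A + ridge * I) beta = A^T W y
    ata = [[sum(w * r[i] * r[j] for w, r in zip(weights, rows)) + (ridge if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    aty = [sum(w * r[i] * t for w, r, t in zip(weights, rows, targets)) for i in range(p)]
    beta = solve(ata, aty)
    return beta[0], beta[1:]


if __name__ == "__main__":
    # Hypothetical black-box model: risk rises with feature 0 and falls with feature 1.
    def model(z):
        return 1.0 / (1.0 + math.exp(-(2.0 * z[0] - 1.0 * z[1])))

    intercept, local_weights = lime_explain(model, [0.0, 0.0])
    print(round(intercept, 2), [round(w, 2) for w in local_weights])
```

The intercept approximates the model's risk for the explained case, and the sign and size of each local weight indicate which features push that individual prediction up or down.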
... Therefore, this study used the SHAP and LIME explainers to analyze the model's output and determine feature importance. Notably, LIME had the advantage of displaying specific risk probabilities of medical disputes and elucidating the underlying mechanisms [24]. Deploying the model online would further promote its utility because it would be convenient and simple for medical workers to use; we therefore developed a web-based app to present the optimal model. ...
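SHAP attributes a prediction to features using Shapley values from cooperative game theory. The sketch below computes them exactly by enumerating feature coalitions, which is only feasible for a handful of features. It assumes a simple "replace absent features with a baseline value" convention, whereas SHAP library explainers approximate the same quantity using background data and model-specific shortcuts; the model and inputs in the example are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(value, n_features):
    """Exact Shapley value of every feature for a coalition value function.

    phi_i = sum over coalitions S without i of
            |S|! * (n - |S| - 1)! / n! * (value(S + {i}) - value(S)).
    The loop visits 2^(n-1) coalitions per feature, so keep n small.
    """
    n = n_features
    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                s = frozenset(coalition)
                phi += weight * (value(s | {i}) - value(s))
        phis.append(phi)
    return phis


def baseline_value(model, x, baseline):
    """Coalition value: model output when features outside S take baseline values."""
    def value(s):
        return model([x[j] if j in s else baseline[j] for j in range(len(x))])
    return value


if __name__ == "__main__":
    # Hypothetical model with one main effect and one two-way interaction.
    def model(z):
        return 3.0 * z[0] + 2.0 * z[1] * z[2]

    v = baseline_value(model, x=[1.0, 1.0, 1.0], baseline=[0.0, 0.0, 0.0])
    print(shapley_values(v, 3))
```

In this example feature 0 receives 3, and the interaction's contribution of 2 is split equally between features 1 and 2; the three values add up to model(x) - model(baseline) = 5, the additivity property that makes SHAP values usable as importance coefficients.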
Article
Background: Medical disputes are a global public health issue that is receiving increasing attention. However, studies investigating the relationship between hospital legal construction and medical disputes are scarce. The development of a multicenter model incorporating machine learning (ML) techniques for the individualized prediction of medical disputes would be beneficial for medical workers. Objective: This study aimed to identify predictors related to medical disputes from the perspective of hospital legal construction and to use ML techniques to build models for predicting the risk of medical disputes. Methods: This study enrolled 38,053 medical workers from 130 tertiary hospitals in Hunan province, China. The participants were randomly divided into a training cohort (34,286/38,053, 90.1%) and an internal validation cohort (3767/38,053, 9.9%). Medical workers from 87 tertiary hospitals in Beijing were included in an external validation cohort (26,285/26,285, 100%). This study used logistic regression and 5 ML techniques: decision tree, random forest, support vector machine, gradient boosting decision tree (GBDT), and deep neural network. In total, 12 metrics, including discrimination and calibration, were used for performance evaluation. A scoring system was developed to select the optimal model. Shapley additive explanations (SHAP) were used to generate the importance coefficients for characteristics. To promote the clinical use of our proposed optimal model, reclassification of patients was performed, and a web-based app for medical dispute prediction was created, which can be easily accessed by the public. Results: Medical disputes occurred among 46.06% (17,527/38,053) of the medical workers in Hunan province, China. Among the 26 clinical characteristics, multivariate analysis demonstrated that 18 characteristics were significantly associated with medical disputes, and these characteristics were used for ML model development.
Among the ML techniques, GBDT was identified as the optimal model, demonstrating the lowest Brier score (0.205), highest area under the receiver operating characteristic curve (0.738, 95% CI 0.722-0.754), and the largest discrimination slope (0.172) and Youden index (1.355). In addition, it achieved the highest metrics score (63 points), followed by deep neural network (46 points) and random forest (45 points), in the internal validation set. In the external validation set, GBDT still performed comparably, achieving the second highest metrics score (52 points). The high-risk group had more than twice the odds of experiencing medical disputes compared with the low-risk group. Conclusions: We established a prediction model to stratify medical workers into different risk groups for encountering medical disputes. Among the 5 ML models, GBDT demonstrated the optimal comprehensive performance and was used to construct the web-based app. Our proposed model can serve as a useful tool for identifying medical workers at high risk of medical disputes. We believe that preventive strategies should be implemented for the high-risk group.
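The discrimination and calibration metrics named in this abstract can all be computed from predicted probabilities and observed 0/1 outcomes. A minimal sketch in plain Python, assuming the usual definitions: the discrimination slope as the difference in mean predicted probability between events and non-events, and the Youden index as J = sensitivity + specificity - 1 maximised over thresholds (the paper's own convention may differ). The example outcomes and probabilities are made up.

```python
def _split(y, p):
    """Predicted probabilities for events (y == 1) and non-events (y == 0)."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    return pos, neg


def brier_score(y, p):
    """Mean squared error between predicted probability and 0/1 outcome (lower is better)."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)


def auroc(y, p):
    """Chance that a random event is scored above a random non-event (ties count 1/2)."""
    pos, neg = _split(y, p)
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def discrimination_slope(y, p):
    """Mean predicted probability among events minus mean among non-events."""
    pos, neg = _split(y, p)
    return sum(pos) / len(pos) - sum(neg) / len(neg)


def max_youden(y, p):
    """Maximum of sensitivity + specificity - 1 over thresholds 'predict event if p >= t'."""
    pos, neg = _split(y, p)
    best = float("-inf")
    for t in sorted(set(p)):
        sens = sum(a >= t for a in pos) / len(pos)
        spec = sum(b < t for b in neg) / len(neg)
        best = max(best, sens + spec - 1)
    return best


if __name__ == "__main__":
    y = [0, 0, 1, 1]               # made-up outcomes
    p = [0.10, 0.40, 0.35, 0.80]   # made-up predicted probabilities
    print(brier_score(y, p), auroc(y, p), discrimination_slope(y, p), max_youden(y, p))
```

The Brier score mixes calibration and discrimination, whereas AUROC depends only on the ranking of predictions, which is one reason studies like this one report several metrics side by side.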
... 9,10 One of the primary uses of ML with tabular data in resuscitation research is predictive modeling to estimate the likelihood of outcomes such as return of spontaneous circulation (ROSC), survival, or neurological recovery after cardiac arrest. 7,8,11 In another example, tabular data were also used to develop early warning systems (EWS) that predict the risk of cardiac arrest or other serious adverse events among patients admitted to hospital. 12-14 These systems use ML models to analyze data such as heart rate, blood pressure, respiratory rate, oxygen saturation, temperature, and laboratory values to identify patterns suggesting that a patient's condition is deteriorating. ...
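As a minimal illustration of a predictive model on tabular data, the sketch below fits a logistic regression, the usual baseline for such outcome models, by batch gradient descent. The two features and their values are synthetic stand-ins (think of standardized vital-sign scores) and are not taken from any cited study; real work would typically use a tested library and proper validation.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit bias b and weights w by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    b, w = 0.0, [0.0] * d
    for _ in range(epochs):
        # Residuals (predicted probability - observed outcome) drive the gradient.
        errs = [sigmoid(b + sum(wj * xj for wj, xj in zip(w, x))) - yi for x, yi in zip(X, y)]
        b -= lr * sum(errs) / n
        w = [wj - lr * sum(e * x[j] for e, x in zip(errs, X)) / n for j, wj in enumerate(w)]
    return b, w


def predict_proba(b, w, x):
    """Predicted probability of the outcome for one row of features."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


if __name__ == "__main__":
    # Synthetic, standardized features (two hypothetical vital-sign scores) and 0/1 outcomes.
    X = [[-1.0, -0.5], [-0.5, -1.0], [0.0, 0.2], [0.3, -0.2], [0.8, 1.0],
         [1.2, 0.6], [-0.2, 0.9], [0.5, -0.8], [0.6, 0.7]]
    y = [0, 0, 0, 1, 1, 1, 1, 0, 0]
    b, w = fit_logistic(X, y)
    print(round(predict_proba(b, w, [1.5, 1.5]), 3))
```

Each fitted weight is a log-odds change per unit of its feature, which is part of why logistic regression remains the interpretable reference point against which tree ensembles and neural networks are compared.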
Article
Aim: Artificial intelligence (AI) and machine learning (ML) are important areas of computer science that have recently attracted attention for their application to medicine. However, as techniques continue to advance and become more complex, it is increasingly challenging for clinicians to stay abreast of the latest research. This overview aims to translate research concepts and potential concerns to healthcare professionals interested in applying AI and ML to resuscitation research but who are not experts in the field. Main text: We present various research including prediction models using structured and unstructured data, exploring treatment heterogeneity, reinforcement learning, language processing, and large language models. These studies potentially offer valuable insights for optimizing treatment strategies and clinical workflows. However, implementing AI and ML in clinical settings presents its own set of challenges. The availability of high-quality and reliable data is crucial for developing accurate ML models. A rigorous validation process and the integration of ML into clinical practice are essential for practical implementation. We furthermore highlight the potential risks associated with self-fulfilling prophecies and feedback loops, emphasizing the importance of transparency, interpretability, and trustworthiness in AI and ML models. These issues need to be addressed in order to establish reliable and trustworthy AI and ML models. Conclusion: In this article, we overview concepts and examples of AI and ML research in the resuscitation field. Moving forward, appropriate understanding of ML and collaboration with relevant experts will be essential for researchers and clinicians to overcome the challenges and harness the full potential of AI and ML in resuscitation.
... Machine learning is a branch of artificial intelligence that has proven increasingly useful in medicine [1-3]. The annual incidence of out-of-hospital cardiac arrest (OHCA) is 34.7-156.0 per 100,000 ...
Preprint
Background: In this study, we develop a machine learning algorithm to predict return of spontaneous circulation (ROSC) in patients who undergo cardiopulmonary resuscitation (CPR) after out-of-hospital cardiac arrest (OHCA), providing data support for improving CPR success rates in OHCA patients. Methods and Results: We identified 463 patients who had undergone CPR following OHCA between September 2018 and April 2022 from the emergency digital intelligence platform (EDIP). The study endpoint was ROSC, defined as the restoration of a palpable pulse and an autonomous cardiac rhythm lasting for at least 20 minutes after the completion or cessation of CPR. The data were preprocessed with pandas in Python, and the 14 variables yielding the highest accuracy were selected in combination with clinical characteristics. Of the samples, 75% formed the training set used to build the models, which were trained and tested with four machine learning algorithms: logistic regression, XGBClassifier, gradient boosting trees, and random forest. The remaining 25% formed the test set used for verification. Model performance was evaluated by accuracy, precision, recall, and the receiver operating characteristic (ROC) curve, and the most appropriate model was selected for impact factor analysis. The area under the curve (AUC) values of the logistic regression, random forest, XGBClassifier, and gradient boosting trees models were 0.73, 0.87, 0.90, and 0.86, respectively. The random forest model (accuracy 0.89, precision 0.90, recall 0.89, AUC 0.87) was selected to calculate the importance of each feature, and we concluded that the main predictors of ROSC in OHCA patients are age, speed of CPR initiation, history of cardiopulmonary conditions, the presence of another person when cardiac arrest occurs, chest compressions, and defibrillation.
Conclusions: Machine learning has the potential to predict the recovery of spontaneous circulation in OHCA patients treated with CPR. A Random Forest model was found to provide the most accurate predictions for this purpose. This can be used to provide data support and as a reference source to improve the success rate of CPR.
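The 75%/25% train/test division described in this abstract can be sketched as follows. The abstract does not say whether the split was stratified; this version assumes stratification by outcome so that the ROSC rate is similar in both sets, and `stratified_split` is a hypothetical helper rather than the authors' code.

```python
import random


def stratified_split(y, test_fraction=0.25, seed=0):
    """Return sorted (train_idx, test_idx), drawing the test share within each outcome class."""
    rng = random.Random(seed)
    train, test = [], []
    for label in sorted(set(y)):
        idx = [i for i, yi in enumerate(y) if yi == label]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test += idx[:n_test]
        train += idx[n_test:]
    return sorted(train), sorted(test)


if __name__ == "__main__":
    # Hypothetical outcome labels: 1 = ROSC achieved, 0 = not achieved.
    y = [1] * 40 + [0] * 60
    train, test = stratified_split(y)
    print(len(train), len(test), sum(y[i] for i in test))  # 75 25 10
```

Stratifying matters most when one outcome is rare, because a purely random split of a few hundred patients can leave the test set with noticeably fewer events than the training set.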
... The Support Vector Classifier (SVC) analyzes linear and non-linear data for classification and regression. SVC separates categories by constructing a decision hyperplane in a higher-dimensional feature space, which corresponds to a non-linear decision boundary in the original input space [57]. SVC is described as robust to bias and variance in the data and produces accurate predictions for binary or multiclass classification. ...
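The "hyperplane in a higher feature space" idea can be illustrated with a degree-2 polynomial kernel. This sketch does not train an SVC: it shows that an explicit feature map `phi` reproduces the kernel value, and that a hand-picked (not learned) hyperplane in that space separates XOR-style points no straight line in the input plane can separate. The function names and points are illustrative assumptions.

```python
import math


def phi(x):
    """Explicit feature map R^2 -> R^3 for the quadratic kernel k(x, z) = (x . z)^2."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


def quad_kernel(x, z):
    """Same inner product computed directly in the 2-D input space (the 'kernel trick')."""
    return (x[0] * z[0] + x[1] * z[1]) ** 2


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


# XOR-style labels: no straight line in the input plane separates the two classes.
points = [(1.0, 1.0), (-1.0, -1.0), (1.0, -1.0), (-1.0, 1.0)]
labels = [1, 1, -1, -1]

# Hand-picked hyperplane w . phi(x) + b = 0 in feature space.
# It reduces to x1 * x2 = 0 in input space, i.e. a non-linear boundary there.
w, b = (0.0, 1.0, 0.0), 0.0
predicted = [1 if dot(w, phi(p)) + b > 0 else -1 for p in points]

if __name__ == "__main__":
    print(predicted == labels)
    print(quad_kernel((1.0, 2.0), (3.0, -1.0)), dot(phi((1.0, 2.0)), phi((3.0, -1.0))))
```

An SVC only ever needs kernel values such as `quad_kernel`, so it can work with feature spaces that are far larger (or, for the RBF kernel, infinite-dimensional) without computing `phi` explicitly.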
Article
An ICU is a critical care unit that provides advanced medical support and continuous monitoring for patients with severe illnesses or injuries. Predicting the mortality rate of ICU patients can not only improve patient outcomes but also optimize resource allocation. Many studies have attempted to create scoring systems and models that predict the mortality of ICU patients using large amounts of structured clinical data. However, unstructured clinical data recorded during patient admission, such as notes made by physicians, is often overlooked. This study used the MIMIC-III database to predict mortality in ICU patients. In the first part of the study, only eight structured variables were used: the six basic vital signs, the Glasgow Coma Scale (GCS) score, and the patient's age at admission. In the second part, unstructured predictor variables were extracted from the initial diagnosis made by physicians when the patients were admitted to the hospital and analyzed using Latent Dirichlet Allocation (LDA) techniques. The structured and unstructured data were combined using machine learning methods to create a mortality risk prediction model for ICU patients. The results showed that combining structured and unstructured data improved the accuracy of the prediction of clinical outcomes in ICU patients over time. The model achieved an AUROC of 0.88, indicating accurate prediction of patient vital status. Additionally, the model was able to predict patient clinical outcomes over time, successfully identifying important variables. This study demonstrated that a small number of easily collectible structured variables, combined with unstructured data and analyzed using LDA topic modeling, can significantly improve the predictive performance of a mortality risk prediction model for ICU patients. These results suggest that initial clinical observations and diagnoses of ICU patients contain valuable information that can aid ICU medical and nursing staff in making important clinical decisions.
... Apart from sepsis, machine learning has also been applied to cardiac arrest prognosis, which carries a higher mortality rate (Nanayakkara et al. [56]). That investigation used logistic regression, random forest, support vector machine, ensemble classification, and neural network models on multiple vital signs of cardiac arrest patients. ...
Article
The prime purpose of the proposed study is to construct a novel predictive scheme to assist in the prognosis of criticality using the MIMIC-III dataset. With the adoption of various analytics and advanced computing in healthcare, there is an increasing trend toward developing effective prognostication mechanisms, and predictive modeling is considered the best approach in this direction. Using a desk research methodology, this paper discusses various scientific contributions based on the Medical Information Mart for Intensive Care (MIMIC-III), an open-access dataset intended to help predict patient trajectories for purposes ranging from mortality forecasting to treatment planning. Because machine learning dominates this area, the effectiveness of existing predictive methods needs to be assessed. The paper offers a comprehensive discussion of the available predictive schemes and clinical diagnoses using MIMIC-III, including their strengths and weaknesses, and thereby provides a clear overview of existing schemes for clinical diagnosis using a systematic review approach.
... By combining the outputs of an artificial neural network, a gradient boosting decision tree, an extreme gradient boosting machine (XGBoost), a decision tree, and a support vector machine, ensemble machine learning uses models created by multiple machine learning techniques to make predictions. In particular, ensemble models frequently achieve better predictive performance than individual machine learning models (20,21). Grid and random hyperparameter searches with broad upper and lower bounds were used to explore the optimal hyperparameters; once these were determined, the area under the receiver operating characteristic curve (AUROC) served as the primary metric for evaluating prediction performance, which helped largely avoid underfitting and overfitting. ...
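The grid and random hyperparameter searches mentioned above can be sketched generically: both evaluate a scoring function over candidate settings and keep the best. In this sketch `max_depth` and `learning_rate` are illustrative names, and `toy_validation_auroc` is a made-up smooth function standing in for "train with these settings and return validation AUROC", so its numbers mean nothing clinically. Random search here treats `max_depth` as continuous for brevity; a real search would round integer hyperparameters.

```python
import itertools
import random


def grid_search(score, grid):
    """Try every combination in `grid` (name -> list of values); return (best_params, best_score)."""
    names = list(grid)
    best = None
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        s = score(params)
        if best is None or s > best[1]:
            best = (params, s)
    return best


def random_search(score, bounds, n_iter=50, seed=0):
    """Sample each hyperparameter uniformly from its (low, high) bounds; keep the best."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_iter):
        params = {name: rng.uniform(lo, hi) for name, (lo, hi) in bounds.items()}
        s = score(params)
        if best is None or s > best[1]:
            best = (params, s)
    return best


def toy_validation_auroc(params):
    """Hypothetical stand-in for 'train with params, return validation AUROC'."""
    return (0.80 - 0.01 * (params["max_depth"] - 4) ** 2
            - (params["learning_rate"] - 0.1) ** 2)


if __name__ == "__main__":
    grid = {"max_depth": [2, 3, 4, 5, 6], "learning_rate": [0.01, 0.1, 0.3]}
    print(grid_search(toy_validation_auroc, grid))
    bounds = {"max_depth": (1, 8), "learning_rate": (0.001, 0.5)}
    print(random_search(toy_validation_auroc, bounds))
```

Grid search is exhaustive but grows multiplicatively with each added hyperparameter, while random search spends a fixed budget and often finds comparably good settings when only a few hyperparameters matter.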
Article
Purpose Using an ensemble machine learning technique that incorporates the results of multiple machine learning algorithms, the study's objective is to build a reliable model to predict early mortality among hepatocellular carcinoma (HCC) patients with bone metastases. Methods We extracted a cohort of 124,770 patients with a diagnosis of hepatocellular carcinoma from the Surveillance, Epidemiology, and End Results (SEER) program and enrolled a cohort of 1897 patients who were diagnosed as having bone metastases. Patients with a survival time of 3 months or less were considered to have had early death. Subgroup analysis was used to compare patients with and without early mortality. Patients were randomly divided into two groups: a training cohort (n = 1509, 80%) and an internal testing cohort (n = 388, 20%). In the training cohort, five machine learning techniques were employed to train and optimize models for predicting early mortality, and an ensemble machine learning technique generated the risk probability by soft voting, combining the results from the multiple machine learning algorithms. The study employed both internal and external validation, and the key performance indicators included the area under the receiver operating characteristic curve (AUROC), Brier score, and calibration curve. Patients from two tertiary hospitals were chosen as the external testing cohort (n = 98). Feature importance and reclassification were both performed in the study. Results The early mortality rate was 55.5% (1052/1897). Eleven clinical characteristics were included as input features of machine learning models: sex (p = 0.019), marital status (p = 0.004), tumor stage (p = 0.025), node stage (p = 0.001), fibrosis score (p = 0.040), AFP level (p = 0.032), tumor size (p = 0.001), lung metastases (p < 0.001), cancer-directed surgery (p < 0.001), radiation (p < 0.001), and chemotherapy (p < 0.001).
Application of the ensemble model in the internal testing population yielded an AUROC of 0.779 (95% confidence interval [CI]: 0.727–0.820), which was the largest AUROC among all models. Additionally, the ensemble model (0.191) outperformed the other five machine learning models in terms of Brier score. In terms of decision curves, the ensemble model also showed favorable clinical usefulness. External validation showed similar results; with an AUROC of 0.764 and Brier score of 0.195, the prediction performance was further improved after revision of the model. Feature importance demonstrated that the top three most crucial features were chemotherapy, radiation, and lung metastases based on the ensemble model. Reclassification of patients revealed a substantial difference in the two risk groups’ actual probabilities of early mortality (74.38% vs. 31.35%, p < 0.001). Patients in the high-risk group had significantly shorter survival time than patients in the low-risk group (p < 0.001), according to the Kaplan–Meier survival curve. Conclusions The ensemble machine learning model exhibits promising prediction performance for early mortality among HCC patients with bone metastases. With the aid of routinely accessible clinical characteristics, this model can be a trustworthy prognostic tool to predict the early death of those patients and facilitate clinical decision-making.
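Soft voting, as used by the ensemble in this study, averages the predicted probabilities of the member models instead of counting their 0/1 votes. A minimal sketch, assuming equal weights by default; the three sets of model outputs below are hypothetical, not the study's five models.

```python
def soft_vote(prob_lists, weights=None):
    """Average (optionally weighted) predicted probabilities across models, per patient.

    prob_lists[m][i] is model m's predicted risk for patient i.
    """
    if weights is None:
        weights = [1.0] * len(prob_lists)
    total = sum(weights)
    n_patients = len(prob_lists[0])
    return [sum(w * probs[i] for w, probs in zip(weights, prob_lists)) / total
            for i in range(n_patients)]


def hard_vote(prob_lists, threshold=0.5):
    """Majority vote over each model's thresholded class label, for comparison."""
    n_models = len(prob_lists)
    return [int(sum(probs[i] >= threshold for probs in prob_lists) * 2 > n_models)
            for i in range(len(prob_lists[0]))]


if __name__ == "__main__":
    # Hypothetical outputs of three models for two patients.
    probs = [[0.45, 0.20], [0.45, 0.30], [0.95, 0.10]]
    print(soft_vote(probs))
    print(hard_vote(probs))
```

For the first patient two models sit just below 0.5 and one is confident, so the hard majority is 0 while the soft average (about 0.62) crosses 0.5; soft voting lets confident members carry more influence.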
... Patient triage is another area in which AI has been introduced, through wearable devices designed to remotely monitor and analyse vital signs (e.g., consciousness). In these AI systems, algorithms are trained to classify disease conditions based on severity, which helps predict survival in the pre-hospital environment (Kim et al. 2018; Ellahham et al. 2020), as well as in emergency departments through electronic triage (e-triage) (Levin et al. 2018; Ellahham et al. 2020), and in Intensive Care Units (ICUs) (Che et al. 2016; Nanayakkara et al. 2018). ...
Article
The increasing application of artificial intelligence (AI) to healthcare raises both hope and ethical concerns. Some advanced machine learning methods provide accurate clinical predictions at the expense of a significant lack of explainability. Alex John London has argued that accuracy is a more important value than explainability in AI medicine. In this article, we locate the trade-off between accurate performance and explainable algorithms in the context of distributive justice. We acknowledge that accuracy is cardinal from the standpoint of outcome-oriented justice because it helps to maximize patients' benefits and optimizes limited resources. However, we claim that the opaqueness of the algorithmic black box and its absence of explainability threaten core commitments of procedural fairness such as accountability, avoidance of bias, and transparency. To illustrate this, we discuss liver transplantation as a case of critical medical resources in which the lack of explainability in AI-based allocation algorithms is procedurally unfair. Finally, we provide a number of ethical recommendations for considering the use of unexplainable algorithms in the distribution of health-related resources.
... The key difference between the two approaches is that the former adopts a non-parametric approach and is free from a priori assumptions. The main advantage of ML models is their ability to efficiently integrate a diverse array of variables and to learn automatically without being explicitly programmed [53]. In contrast, traditional statistical models tend not to work well on high-dimensional datasets [54]. ...
Article
Background: The major mechanisms of dementia and cognitive impairment are vascular and neurodegenerative processes. Early diagnosis of cognitive impairment can facilitate timely interventions to mitigate progression. Objective: This study aims to develop a reliable machine learning (ML) model using socio-demographics, vascular risk factors, and structural neuroimaging markers for early diagnosis of cognitive impairment in a multi-ethnic Asian population. Methods: The study consisted of 911 participants from the Epidemiology of Dementia in Singapore study (aged 60-88 years, 49.6% male). Three ML classifiers (logistic regression, support vector machine, and gradient boosting machine) were developed. Prediction results of the independent classifiers were combined in a final ensemble model. Model performance was evaluated on test data using the F1 score and the area under the receiver operating characteristic curve (AUC). After modelling, SHapley Additive exPlanations (SHAP) was applied to the prediction results to identify the predictors that contribute most to cognitive impairment prediction. Findings: The final ensemble model achieved an F1 score of 0.87 and an AUC of 0.80. Accuracy (0.83), sensitivity (0.86), specificity (0.74), and predictive values (positive 0.88, negative 0.72) of the ensemble model were higher than those of the independent classifiers. Age, ethnicity, highest educational attainment, and neuroimaging markers were identified as important predictors of cognitive impairment. Conclusion: This study demonstrates the feasibility of using ML tools to integrate multiple domains of data for reliable diagnosis of early cognitive impairment. The ML model uses easy-to-obtain variables and is scalable for screening individuals with a high risk of developing dementia in a population-based setting.
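The threshold-based metrics reported in this abstract (accuracy, sensitivity, specificity, positive and negative predictive values, F1 score) all derive from the four cells of a 2x2 confusion matrix. A minimal sketch using the standard definitions; `confusion_metrics` is a hypothetical helper, it assumes every denominator is non-zero, and the labels in the test are made up.

```python
def confusion_metrics(y_true, y_pred):
    """Standard 2x2 confusion-matrix metrics for binary labels coded 0/1.

    Assumes both classes occur in y_true and y_pred, so no denominator is zero.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sensitivity = tp / (tp + fn)  # recall, true-positive rate
    specificity = tn / (tn + fp)  # true-negative rate
    ppv = tp / (tp + fp)          # positive predictive value (precision)
    npv = tn / (tn + fn)          # negative predictive value
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        # F1 is the harmonic mean of PPV (precision) and sensitivity (recall).
        "f1": 2 * ppv * sensitivity / (ppv + sensitivity),
    }


if __name__ == "__main__":
    print(confusion_metrics([1, 1, 1, 1, 1, 0, 0, 0], [1, 1, 1, 0, 0, 0, 0, 1]))
```

Unlike AUC, every value here depends on the chosen probability threshold, and PPV and NPV also depend on how common the condition is in the test population.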
... • Support Vector Classifier (SVC) performs classification and regression analysis on linear and non-linear data. SVC identifies classes by constructing a decision hyperplane in a higher-dimensional feature space, which corresponds to a non-linear decision boundary in the original input space [55]. SVC is a robust tool for addressing bias and variance in the data and leads to accurate prediction in binary or multiclass classification. ...
Article
Cardiovascular diseases have been identified as one of the top three causes of death worldwide, with onset and deaths mostly due to heart failure (HF). In the ICU, where patients with HF are at increased risk of death and consume significant medical resources, early and accurate prediction of the time of death for patients at high risk of death would enable them to receive appropriate and timely medical care. The data for this study were obtained from the MIMIC-III database, from which we collected vital signs and tests for 6699 HF patients during the first 24 h of their first ICU admission. In order to predict the mortality of HF patients in ICUs more precisely, an integrated stacking model is proposed and applied in this paper. In the first stage of dataset classification, the datasets were passed to first-level classifiers: RF, SVC, KNN, LGBM, Bagging, and AdaBoost. Then, the decisions of these six classifiers were fused to construct and optimize the stacked set of second-level classifiers. The results indicate that our model obtained an accuracy of 95.25% and an AUROC of 82.55% in predicting the mortality rate of HF patients, which demonstrates the outstanding capability and efficiency of our method. In addition, the results of this study also revealed that platelets, glucose, and blood urea nitrogen were the clinical features that had the greatest impact on model prediction. The results of this analysis not only improve healthcare professionals' understanding of patients' conditions but also allow for a more optimal use of healthcare resources.
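Stacking trains a second-level classifier on the predictions of first-level classifiers. To keep the meta-classifier from learning on predictions that base models made for their own training rows, this sketch uses out-of-fold predictions; that is an assumption, since the abstract does not say how the meta-features were produced. The two threshold "stump" base learners and the logistic-regression meta-learner are hypothetical, dependency-free stand-ins for the six classifiers named above, and folds are assigned by row index modulo k.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.5, epochs=3000):
    """Second-level (meta) classifier: logistic regression by batch gradient descent."""
    d, n = len(X[0]), len(X)
    b, w = 0.0, [0.0] * d
    for _ in range(epochs):
        errs = [sigmoid(b + sum(wj * xj for wj, xj in zip(w, x))) - yi for x, yi in zip(X, y)]
        b -= lr * sum(errs) / n
        w = [wj - lr * sum(e * x[j] for e, x in zip(errs, X)) / n for j, wj in enumerate(w)]
    return b, w


def stump_learner(feature):
    """Hypothetical first-level learner: split one feature at its training mean."""
    def fit(X, y):
        thr = sum(x[feature] for x in X) / len(X)
        hi = [yi for x, yi in zip(X, y) if x[feature] >= thr]
        lo = [yi for x, yi in zip(X, y) if x[feature] < thr]
        p_hi = (sum(hi) + 1) / (len(hi) + 2)  # Laplace-smoothed event rate above threshold
        p_lo = (sum(lo) + 1) / (len(lo) + 2)  # ... and below it
        return lambda x: p_hi if x[feature] >= thr else p_lo
    return fit


def out_of_fold_predictions(learners, X, y, k=3):
    """Meta-features: each row is scored only by base models not trained on it."""
    n = len(X)
    meta = [[0.0] * len(learners) for _ in range(n)]
    for f in range(k):
        train = [i for i in range(n) if i % k != f]
        for m, fit in enumerate(learners):
            model = fit([X[i] for i in train], [y[i] for i in train])
            for i in range(f, n, k):
                meta[i][m] = model(X[i])
    return meta


def fit_stack(learners, X, y, k=3):
    """Train the meta-classifier on out-of-fold predictions, then refit base models on all data."""
    b, w = fit_logistic(out_of_fold_predictions(learners, X, y, k), y)
    bases = [fit(X, y) for fit in learners]

    def predict(x):
        z = [model(x) for model in bases]
        return sigmoid(b + sum(wj * zj for wj, zj in zip(w, z)))
    return predict
```

In use, one passes a list such as `[stump_learner(0), stump_learner(1)]` with feature rows `X` and outcomes `y` to `fit_stack` and calls the returned `predict` on a new row; the meta-classifier's weights then express how much each first-level model's output is trusted.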