Conference Paper

Predicting respiratory failure in patients with COVID-19 pneumonia: a case study from Northern Italy


Abstract

The COVID-19 crisis caught health care services around the world by surprise, putting unprecedented pressure on Intensive Care Units (ICU). To help clinical staff manage the limited ICU capacity, we have developed a Machine Learning model to estimate the probability that a patient admitted to hospital with COVID-19 symptoms would develop severe respiratory failure and require Intensive Care within 48 hours of admission. The model was trained on an initial cohort of 198 patients admitted to the Infectious Disease ward of Modena University Hospital, in Italy, at the peak of the epidemic, and subsequently refined as more patients were admitted. Using the LightGBM Decision Tree ensemble approach, we were able to achieve good accuracy (AUC = 0.84) despite a high rate of missing values. Furthermore, we were able to provide clinicians with an explanation for each prediction in the form of a personalised ranked list of features, using only 20 out of more than 90 variables and using Shapley values to describe the importance of each feature.
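The idea of a personalised ranked list of features can be illustrated with a minimal sketch. It is not the authors' implementation: it computes exact Shapley values by brute force for a hypothetical toy risk score, whereas the paper applies Shapley values to a LightGBM model. The function `toy_risk`, the feature names, and the patient and baseline values are all assumptions made up for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for a single prediction.

    Features outside a coalition take their baseline value.
    Cost grows as 2^n, so this only suits a handful of features.
    """
    names = list(x)
    n = len(names)

    def value(coalition):
        # Evaluate the model with coalition features taken from the patient
        # and all remaining features taken from the baseline.
        mixed = {f: (x[f] if f in coalition else baseline[f]) for f in names}
        return predict(mixed)

    phi = {}
    for f in names:
        others = [g for g in names if g != f]
        total = 0.0
        for k in range(n):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


def toy_risk(p):
    """Hypothetical risk score (illustration only, not a clinical model)."""
    score = 0.02 * (p["respiratory_rate"] - 16) + 0.03 * (95 - p["spo2"])
    if p["age"] > 65 and p["spo2"] < 92:
        score += 0.2  # interaction between age and low oxygen saturation
    return score


# Made-up patient and reference values.
patient = {"age": 72, "spo2": 88, "respiratory_rate": 28}
baseline = {"age": 50, "spo2": 96, "respiratory_rate": 16}

phi = shapley_values(toy_risk, patient, baseline)

# Personalised ranking: features ordered by absolute contribution.
ranked = sorted(phi.items(), key=lambda kv: abs(kv[1]), reverse=True)
for name, contribution in ranked:
    print(f"{name}: {contribution:+.3f}")
```

The contributions sum to the difference between the patient's score and the baseline score, which is what makes such a ranked list readable as a breakdown of one specific prediction.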


... For example, in Ferrari et al. (2020b) the objective was predicting the occurrence of respiratory failure in COVID-19 patients. Tuning the loss function reduced the impact of false negatives (FN), i.e., patients predicted to be safe whose condition eventually became more severe. ...
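The excerpt does not say how the loss function was tuned. One common way to make false negatives more costly, shown here purely as an assumed illustration, is to up-weight the positive class in the binary log loss. The function name `weighted_log_loss` and the parameter `fn_weight` are hypothetical.

```python
from math import log


def weighted_log_loss(labels, probs, fn_weight=3.0, eps=1e-12):
    """Mean binary log loss with errors on positive cases scaled by fn_weight.

    labels: 1 = patient deteriorated, 0 = patient stayed stable.
    probs:  predicted probability of deterioration.
    """
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        if y == 1:
            total += -fn_weight * log(p)  # low p on a true positive: costly
        else:
            total += -log(1.0 - p)
    return total / len(labels)


# A missed deterioration (y=1, p=0.1) versus an equally confident false alarm.
missed = weighted_log_loss([1], [0.1], fn_weight=3.0)
false_alarm = weighted_log_loss([0], [0.9], fn_weight=3.0)
print(missed > false_alarm)
```

With `fn_weight` above 1, a model trained on this loss is pushed towards higher predicted risk for borderline patients, trading more false alarms for fewer missed cases.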
Article
As Big Data Analysis meets healthcare applications, domain-specific challenges and opportunities materialize in all aspects of data science. Advanced statistical methods and Artificial Intelligence (AI) on Electronic Health Records (EHRs) are used both for knowledge discovery purposes and clinical decision support. Such techniques enable the emerging Predictive, Preventative, Personalized, and Participatory Medicine (P4M) paradigm. Working with the Infectious Disease Clinic of the University Hospital of Modena, Italy, we have developed a range of Data–Driven (DD) approaches to solve critical clinical applications using statistics, Machine Learning (ML) and Big Data Analytics on real-world EHR. Here, we describe our perspective on the challenges we encountered. Some are connected to medical data and their sparse, scarce, and unbalanced nature. Others are bound to the application environment, as medical AI tools can affect people's health and life. For each of these problems, we report some available techniques to tackle them, present examples drawn from our experience, and propose which approaches, in our opinion, could lead to successful real-world, end-to-end implementations. DESY report number DESY-22-153.
Article
Background: Identifying people at risk of cardiovascular diseases (CVD) is a cornerstone of preventative cardiology. Risk prediction models currently recommended by clinical guidelines are typically based on a limited number of predictors with sub-optimal performance across all patient groups. Data-driven techniques based on machine learning (ML) might improve the performance of risk predictions by agnostically discovering novel risk predictors and learning the complex interactions between them. We tested (1) whether ML techniques based on a state-of-the-art automated ML framework (AutoPrognosis) could improve CVD risk prediction compared to traditional approaches, and (2) whether considering non-traditional variables could increase the accuracy of CVD risk predictions. Methods and findings: Using data on 423,604 participants without CVD at baseline in UK Biobank, we developed an ML-based model for predicting CVD risk based on 473 available variables. Our ML-based model was derived using AutoPrognosis, an algorithmic tool that automatically selects and tunes ensembles of ML modeling pipelines (comprising data imputation, feature processing, classification and calibration algorithms). We compared our model with a well-established risk prediction algorithm based on conventional CVD risk factors (Framingham score), a Cox proportional hazards (PH) model based on familiar risk factors (i.e., age, gender, smoking status, systolic blood pressure, history of diabetes, reception of treatments for hypertension and body mass index), and a Cox PH model based on all of the 473 available variables. Predictive performances were assessed using area under the receiver operating characteristic curve (AUC-ROC).
Overall, our AutoPrognosis model improved risk prediction (AUC-ROC: 0.774, 95% CI: 0.768-0.780) compared to Framingham score (AUC-ROC: 0.724, 95% CI: 0.720-0.728, p < 0.001), Cox PH model with conventional risk factors (AUC-ROC: 0.734, 95% CI: 0.729-0.739, p < 0.001), and Cox PH model with all UK Biobank variables (AUC-ROC: 0.758, 95% CI: 0.753-0.763, p < 0.001). Out of 4,801 CVD cases recorded within 5 years of baseline, AutoPrognosis was able to correctly predict 368 more cases compared to the Framingham score. Our AutoPrognosis model included predictors that are not usually considered in existing risk prediction models, such as the individuals’ usual walking pace and their self-reported overall health rating. Furthermore, our model improved risk prediction in potentially relevant sub-populations, such as in individuals with history of diabetes. We also highlight the relative benefits accrued from including more information into a predictive model (information gain) as compared to the benefits of using more complex models (modeling gain). Conclusions: Our AutoPrognosis model improves the accuracy of CVD risk prediction in the UK Biobank population. This approach performs well in traditionally poorly served patient subgroups. Additionally, AutoPrognosis uncovered novel predictors for CVD that may now be tested in prospective studies. We found that the “information gain” achieved by considering more risk factors in the predictive model was significantly higher than the “modeling gain” achieved by adopting complex predictive models.
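The AUC-ROC figures quoted above can be read as the probability that a randomly chosen case receives a higher risk score than a randomly chosen non-case. A minimal sketch of that pairwise definition follows; the labels and scores are made-up toy values, not data from the study, and the quadratic loop is for clarity rather than efficiency.

```python
def auc_roc(labels, scores):
    """AUC-ROC as the fraction of (case, non-case) pairs ranked correctly.

    Ties in score count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")

    correct = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                correct += 1.0
            elif p == q:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Toy example: two non-cases followed by two cases.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.10, 0.40, 0.35, 0.80]
toy_auc = auc_roc(toy_labels, toy_scores)
print(toy_auc)
```

Three of the four case/non-case pairs in the toy data are ordered correctly, so the function returns 0.75. A value of 0.5 corresponds to random ranking and 1.0 to perfect separation.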
Article
Accurate prediction of survival for cystic fibrosis (CF) patients is instrumental in establishing the optimal timing for referring patients with terminal respiratory failure for lung transplantation (LT). Current practice considers referring patients for LT evaluation once the forced expiratory volume (FEV1) drops below 30% of its predicted nominal value. While FEV1 is indeed a strong predictor of CF-related mortality, we hypothesized that the survival behavior of CF patients exhibits a lot more heterogeneity. To this end, we developed an algorithmic framework, which we call AutoPrognosis, that leverages the power of machine learning to automate the process of constructing clinical prognostic models, and used it to build a prognostic model for CF using data from a contemporary cohort that involved 99% of the CF population in the UK. AutoPrognosis uses Bayesian optimization techniques to automate the process of configuring ensembles of machine learning pipelines, which involve imputation, feature processing, classification and calibration algorithms. Because it is automated, it can be used by clinical researchers to build prognostic models without the need for in-depth knowledge of machine learning. Our experiments revealed that the accuracy of the model learned by AutoPrognosis is superior to that of existing guidelines and other competing models.
Article
Clinical prognostic models derived from large-scale healthcare data can inform critical diagnostic and therapeutic decisions. To enable off-the-shelf usage of machine learning (ML) in prognostic research, we developed AUTOPROGNOSIS: a system for automating the design of predictive modeling pipelines tailored for clinical prognosis. AUTOPROGNOSIS optimizes ensembles of pipeline configurations efficiently using a novel batched Bayesian optimization (BO) algorithm that learns a low-dimensional decomposition of the pipelines' high-dimensional hyperparameter space in concurrence with the BO procedure. This is achieved by modeling the pipelines' performances as a black-box function with a Gaussian process prior, and modeling the similarities between the pipelines' baseline algorithms via a sparse additive kernel with a Dirichlet prior. Meta-learning is used to warm-start BO with external data from similar patient cohorts by calibrating the priors using an algorithm that mimics the empirical Bayes method. The system automatically explains its predictions by presenting the clinicians with logical association rules that link patients' features to predicted risk strata. We demonstrate the utility of AUTOPROGNOSIS using 10 major patient cohorts representing various aspects of cardiovascular patient care.
Conference Paper
Tree boosting is a highly effective and widely used machine learning method. In this paper, we describe a scalable end-to-end tree boosting system called XGBoost, which is used widely by data scientists to achieve state-of-the-art results on many machine learning challenges. We propose a novel sparsity-aware algorithm for sparse data and weighted quantile sketch for approximate tree learning. More importantly, we provide insights on cache access patterns, data compression and sharding to build a scalable tree boosting system. By combining these insights, XGBoost scales beyond billions of examples using far fewer resources than existing systems.
Rich Caruana, Yin Lou, Johannes Gehrke, Paul Koch, Marc Sturm, and Noemie Elhadad, 'Intelligible models for healthcare: Predicting pneumonia risk and hospital 30-day readmission', in Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '15, pp. 1721-1730, New York, NY, USA, (2015). Association for Computing Machinery.
Yuanfang Chen, Liu Ouyang, Sheng Bao, Qian Li, Lei Han, Hengdong Zhang, Baoli Zhu, Ming Xu, Jie Liu, Yaorong Ge, and Shi Chen, 'An interpretable machine learning framework for accurate severe vs nonsevere covid-19 clinical type classification', medRxiv, (2020).
Frank Stefan Heldt, Marcela P Vizcaychipi, Sophie Peacock, Mattia Cinelli, Lachlan McLachlan, Fernando Andreotti, Stojan Jovanovic, Robert Durichen, Nadezda Lipunova, Robert A Fletcher, Anne Hancock, et al., 'Early risk assessment for covid-19 patients from emergency department data using machine learning', medRxiv, (2020).
Scott M Lundberg, Gabriel Erion, Hugh Chen, Alex DeGrave, Jordan M Prutkin, Bala Nair, Ronit Katz, Jonathan Himmelfarb, Nisha Bansal, and Su-In Lee, 'Explainable ai for trees: From local explanations to global understanding', arXiv preprint arXiv:1905.04610, (2019).
Figure 5. Local Interpretation for the 6th and most critical day for the patient of the case study in section 1.2.
[...] COVID-19 Outbreak: Wuhan's Experience', Anesthesiology, 132(6), 1317-1332, (Jun 2020).
Akhil Vaid, Sulaiman Somani, Adam J Russak, Jessica K De Freitas, Fayzan F Chaudhry, et al., 'Machine learning to predict mortality and critical events in covid-19 positive new york city patients', medRxiv, (2020).