Fig 1 - uploaded by Zhongkai Hu
Study design for modeling the risk of an inpatient hospital readmission within 30 days of discharge. Model development had three steps: 1) two independent cohorts were constructed, one for retrospective modeling and one for prospective validation; 2) the retrospective cohort was split into two subgroups drawn from non-overlapping care facilities, with the first subgroup further split into model training and calibration sub-cohorts and the second used as the blind-test cohort; and 3) the model was validated using the prospective cohort. Unsupervised clustering pattern analysis that included demographic and clinical data was also performed. The prospectively validated model was then deployed in production to support healthcare quality monitoring and improvement efforts.
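
The facility-level split described in the caption can be sketched as follows. This is a minimal illustration in plain Python, not the authors' code: the function name `split_by_facility`, the `"facility"` record key, and the split fractions are all assumptions made for the example.

```python
import random
from collections import defaultdict


def split_by_facility(records, blind_fraction=0.3, calib_fraction=0.2, seed=0):
    """Split records into training, calibration and blind-test sets.

    Whole facilities are assigned to either the modeling subgroup or the
    blind-test subgroup, so no facility contributes to both.
    All names and fractions here are hypothetical, for illustration only.
    """
    rng = random.Random(seed)

    # Group records by facility so each facility stays intact.
    by_facility = defaultdict(list)
    for rec in records:
        by_facility[rec["facility"]].append(rec)

    # Randomly choose which facilities form the blind-test subgroup.
    facilities = sorted(by_facility)
    rng.shuffle(facilities)
    n_blind = max(1, round(len(facilities) * blind_fraction))
    blind_facilities = facilities[:n_blind]
    modeling_facilities = facilities[n_blind:]

    blind_test = [r for f in blind_facilities for r in by_facility[f]]
    modeling = [r for f in modeling_facilities for r in by_facility[f]]

    # Within the modeling subgroup, split at the record level.
    rng.shuffle(modeling)
    n_calib = round(len(modeling) * calib_fraction)
    calibration, training = modeling[:n_calib], modeling[n_calib:]
    return training, calibration, blind_test
```

Splitting by facility rather than by record means the blind-test cohort checks whether the model transfers to sites it has never seen, which a purely random record-level split would not show.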

Source publication
Article
Objectives Identifying patients at risk of a 30-day readmission can help providers design interventions, and provide targeted care to improve clinical effectiveness. This study developed a risk model to predict a 30-day inpatient hospital readmission for patients in Maine, across all payers, all diseases and all demographic groups. Methods Our obj...

Similar publications

Article
Objective To characterize rates and trends over time of emergency department treatment-and-discharge stays, repeat observation stays, inpatient stays, any hospital revisit, and death within 30 days of discharge from observation stays. Design Retrospective cohort study. Setting 4750 hospitals in the USA. Participants Nationally representative sam...

Citations

... Moreover, the performance of the published models of 28-day (or sometimes 30-day) rehospitalisation is generally modest, with only a few notable exceptions [7]. No statistical difference has been observed between the performance of regression-based models and applied machine learning (ML, mean c-statistics of 0.74 vs. 0.71) [7] even though ML generally outperforms traditional statistical models [8][9][10][11][12][13][14]. ...
Article
Background Accurately estimating elderly patients’ rehospitalisation risk benefits clinical decisions and service planning. However, research in rehospitalisation and repeated hospitalisation yielded only models with modest performance, and the model performance deteriorates rapidly as the prediction timeframe expands beyond 28 days and for older participants. Methods A temporal zero-inflated Poisson (tZIP) regression model was developed and validated retrospectively and prospectively. The data of the electronic health records (EHRs) contain cohorts (aged 60+) in a major public hospital in Hong Kong. Two temporal offset functions accounted for the associations between exposure time and parameters corresponding to the zero-inflated logistic component and the Poisson distribution’s expected count. tZIP was externally validated with a retrospective cohort’s rehospitalisation events up to 12 months after the discharge date. Subsequently, tZIP was validated prospectively after piloting its implementation at the study hospital. Patients discharged within the pilot period were tagged, and the proposed model’s prediction of their rehospitalisation was verified monthly. Using a hybrid machine learning (ML) approach, the tZIP-based risk estimator’s marginal effect on 28-day rehospitalisation was further validated, competing with other factors representing different post-acute and clinical statuses. Results The tZIP prediction of rehospitalisation from 28 days to 365 days was achieved at above 80% discrimination accuracy retrospectively and prospectively in two out-of-sample cohorts. With a large margin, it outperformed the Cox proportional and linear models built with the same predictors. The hybrid ML revealed that the risk estimator’s contribution to 28-day rehospitalisation outweighed other features relevant to service utilisation and clinical status. 
Conclusions A novel rehospitalisation risk model was introduced, and its risk estimators, whose importance outweighed all other factors of diverse post-acute care and clinical conditions, were derived. The proposed approach relies on four variables that are easily extracted from EHRs. Thus, clinicians could visualise patients’ rehospitalisation risk from 28 days to 365 days after discharge and screen high-risk older patients for follow-up care at the proper time.
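
For readers unfamiliar with the zero-inflated Poisson distribution that tZIP builds on, here is a minimal sketch of the standard (non-temporal) form. The abstract does not specify the temporal offset functions, so they are not modelled; the names `zip_pmf`, `zip_mean`, `pi` and `lam` are chosen for illustration.

```python
import math


def zip_pmf(k, pi, lam):
    """Probability of count k under a standard zero-inflated Poisson.

    pi  : probability of a structural zero (the zero-inflation component)
    lam : expected count of the Poisson component
    This is the textbook distribution, not the paper's temporal variant.
    """
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    if k == 0:
        # Zeros arise either structurally or from the Poisson component.
        return pi + (1.0 - pi) * poisson
    return (1.0 - pi) * poisson


def zip_mean(pi, lam):
    """Expected count: structural zeros scale the Poisson mean down."""
    return (1.0 - pi) * lam
```

With `pi = 0.3` and `lam = 2.0`, for example, the expected number of rehospitalisations is `(1 - 0.3) * 2.0`, lower than the Poisson mean alone because some patients are modelled as never being readmitted.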
... Unfortunately, conventional models do not accurately predict readmissions; model c-statistics are rarely seen above 0.8 [8,9]. Additionally, most of the existing prediction models rely heavily on manual feature engineering [5,[10][11][12][13][14][15][16][17][18][19][20][21][22][23][24], which is based on domain knowledge and experience. Those features are often dataset-dependent, thus limiting generalizability between datasets or jurisdictions. ...
... Although the performances of different studies cannot be compared directly due to different methods and samples, these results validate the potential of the proposed automatic feature generation. There have been some attempts to define a large number of features manually from longitudinal data and apply feature selection methods [20][21][22][23][24]42]. However, it is unclear how to represent temporal aspects as features (for example, one has to determine whether to distinguish the same diagnosis code issued one week ago vs. three months ago and how). ...
Article
Background Hospital readmissions are one of the costliest challenges facing healthcare systems, but conventional models fail to predict readmissions well. Many existing models use exclusively manually-engineered features, which are labor intensive and dataset-specific. Our objective was to develop and evaluate models to predict hospital readmissions using derived features that are automatically generated from longitudinal data using machine learning techniques. Methods We studied patients discharged from acute care facilities in 2015 and 2016 in Alberta, Canada, excluding those who were hospitalized to give birth or for a psychiatric condition. We used population-level linked administrative hospital data from 2011 to 2017 to train prediction models using both manually derived features and features generated automatically from observational data. The target value of interest was 30-day all-cause hospital readmissions, with the success of prediction measured using the area under the curve (AUC) statistic. Results Data from 428,669 patients (62% female, 38% male, 27% 65 years or older) were used for training and evaluating models: 24,974 (5.83%) were readmitted within 30 days of discharge for any reason. Patients were more likely to be readmitted if they utilized hospital care more, had more physician office visits, had more prescriptions, had a chronic condition, or were 65 years old or older. The LACE readmission prediction model had an AUC of 0.66 ± 0.0064 while the machine learning model’s test set AUC was 0.83 ± 0.0045, based on learning a gradient boosting machine on a combination of machine-learned and manually-derived features. Conclusion Applying a machine learning model to the computer-generated and manual features improved prediction accuracy over the LACE model and a model that used only manually-derived features. Our model can be used to identify high-risk patients, for whom targeted interventions may potentially prevent readmissions.
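
Several abstracts on this page report discrimination as an AUC or c-statistic. For a binary outcome the two are the same quantity: the probability that a randomly chosen readmitted patient receives a higher predicted risk than a randomly chosen non-readmitted patient. A minimal sketch in plain Python follows; the function name and toy inputs are illustrative and not taken from any of the cited studies.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (the c-statistic).

    labels : sequence of 0/1 outcomes (1 = readmitted)
    scores : predicted risks, higher meaning more likely to be readmitted
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")

    # Count positive/negative pairs ranked correctly; ties count as half.
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

The pairwise loop is quadratic in cohort size, so it is only suitable for small examples; a rank-based computation gives the same value on large cohorts.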
... In order to facilitate and improve the discharge process and to prevent undesirable outcomes, different interventions involving ICT solutions have been developed over the years. Different decision-supportive tools [9,10], tailor-made discharge models and structured discharge summaries [11][12][13] have been tested with various results. A review of statistics published between 1999 and 2016 shows that 7-89% of the ICT projects in different sectors failed because they did not meet one or more of three basic criteria: finishing on time, finishing within budget, and achieving satisfactory results [14]. ...
Article
Background Agile projects are statistically more likely to succeed than waterfall projects. The overall aim of this study was to explore the nursing staff's experiences with an agile development process, from its initial requirements to the deployment of its outcome of ICT solutions aimed at supporting discharge planning. Methods An explorative design with quantitative and qualitative methods was used. Qualitative data was collected through seven focus group interviews. Quantitative data was collected via an ICT-system, and with an evaluation form submitted by fourteen registered nurses and nine district nurses. Results The qualitative analysis of experiences with the agile development process and its outcome yielded one theme, four categories, and ten subcategories. The theme was found to be about time and timing, namely the amount of time for the different activities and the timing of activities within and between organisations. The agile development process increased the participants’ readiness for change by offering time to learn, practice, engage and reflect, and then adopt the ICT as a support to daily practice. Quantitative results showed a varied adoption of the ICT. Conclusion There is a need for time to prepare, understand and adopt new tools, services and procedures and a need for additional time to prepare, understand and adopt the new among individuals, collectives, organizations, and sometimes even between different collectives or organizations. The agile development process offered the end-users involvement through the development process, which gave them time to change it both individually and collectively. However, there is a need for close collaboration between the development project team and management to reach an organizational change that is timely for both the individual and the collective change.
When time or timing fails in the development or implementation process, there is a huge risk of non-adoption of new tools, services, or procedures among the end-users.
... An alternative approach to universal screening is to utilize patient risk scores or risk prediction models to identify and prioritize patients who are most likely to have HRSNs. Risk scores are already widely used in healthcare settings to predict a range of outcomes from specific disease conditions (e.g., cardiovascular disease) to hospital readmissions, healthcare cost, and ED utilization [34][35][36][37][38]. Recently, there has been increasing interest in using SDOH and social needs data to improve risk prediction models. ...
Article
Providers currently rely on universal screening to identify health-related social needs (HRSNs). Predicting HRSNs using EHR and community-level data could be more efficient and less resource intensive. Using machine learning models, we evaluated the predictive performance of HRSN status from EHR and community-level social determinants of health (SDOH) data for Medicare and Medicaid beneficiaries participating in the Accountable Health Communities Model. We hypothesized that Medicaid insurance coverage would predict HRSN status. All models significantly outperformed the baseline Medicaid hypothesis. AUCs ranged from 0.59 to 0.68. The top performance (AUC = 0.68 CI 0.66–0.70) was achieved by the “any HRSNs” outcome, which is the most useful for screening prioritization. Community-level SDOH features had lower predictive performance than EHR features. Machine learning models can be used to prioritize patients for screening. However, screening only patients identified by our current model(s) would miss many patients. Future studies are warranted to optimize prediction of HRSNs.
... The results from their study suggested that, through widespread adoption of EMR systems within hospitals, accurate real-time, automated prediction models have the potential to improve patient care during admission and after discharge. Similarly, Hao et al. (2015) concluded that the results from their prospective validation of readmission risk prediction demonstrated robust reproducibility of the methods used to derive a reliable risk assessment. From the same study, it is also important to note that the retrospective modelling was shown to provide better discriminative ability than the prospective analysis. ...
... Prior hospital utilisation and LOS were also variables which were commonly used in the model to determine readmission risk. Thirteen out of 35 studies (Abdelrahman et al., 2014; Cotter et al., 2012; Hao et al., 2015; Hatipoglu et al., 2015; Jamei et al., 2017; Lenzi et al., 2016; Low et al., 2015, 2016, 2017a; Mahmoudi et al., 2020; Tan et al., 2013; Van Walraven et al., 2013; Wang et al., 2014) included prior hospital utilisation as an independent variable. These studies have all identified the importance and association of prior hospital utilisation with readmission risk, in which those who had one or more readmissions (particularly in the previous 6 months) have an increased risk of being readmitted. ...
... Readmission rates ranged between 3.1 and 74.1% depending on the risk category. The highest and lowest rates of readmission were identified in the study by Hao et al. (2015). These readmission rates tend to be higher in the older population, in which 14% of 75-year-olds were readmitted within 30 days of being discharged (Cotter et al., 2012). ...
Purpose The purpose of this paper is to identify and analyse the readmission risk prediction tools reported in the literature and their benefits when it comes to healthcare organisations and management. Design/methodology/approach Readmission risk prediction is a growing topic of interest with the aim of identifying patients in particular those suffering from chronic diseases such as congestive heart failure, chronic obstructive pulmonary disease and diabetes, who are at risk of readmission. Several models have been developed with different levels of predictive ability. A structured and extensive literature search of several databases was conducted using the Preferred Reporting Items for Systematic Reviews and Meta-analysis strategy, and this yielded a total of 48,984 records. Findings Forty-three articles were selected for full-text and extensive review after following the screening process and according to the eligibility criteria. About 34 unique readmission risk prediction models were identified, in which their predictive ability ranged from poor to good (c statistic 0.5–0.86). Readmission rates ranged between 3.1 and 74.1% depending on the risk category. This review shows that readmission risk prediction is a complex process and is still relatively new as a concept and poorly understood. It confirms that readmission prediction models hold significant accuracy at identifying patients at higher risk for such an event within specific context. Research limitations/implications Since most prediction models were developed for specific populations, conditions or hospital settings, the generalisability and transferability of the predictions across wider or other contexts may be difficult to achieve. Therefore, the value of prediction models remains limited to hospital management. Future research is indicated in this regard. 
Originality/value This review is the first to cover readmission risk prediction tools that have been published in the literature since 2011, thereby providing an assessment of the relevance of this crucial KPI to health organisations and managers.
... The widespread adoption of the electronic medical record (EMR) and the linking of these records in health information exchanges (HIEs) allows for widespread collection of administrative and clinical data across multiple settings of clinical care, including the clinic, emergency room, hospital, pharmacy, and laboratory settings. These repositories represent a rich source of data with the potential to apply "big-data" machine learning techniques to aid in the risk stratification of individual patients in an automated fashion that may be implemented in the EMR system itself [7]. The objective of this study was to develop and validate a model to predict the individual one-year risk of developing a first-time diagnosis of HF in the adult population by applying machine learning methodology to a large, statewide HIE database that captures 97% of all EMR encounters in the state of Maine [8]. ...
Article
Background New-onset heart failure (HF) is associated with poor prognosis and high healthcare utilization. Early identification of patients at increased risk of incident-HF may allow for focused allocation of preventative care resources. Health information exchange (HIE) data span the entire spectrum of clinical care, but there are no HIE-based clinical decision support tools for diagnosis of incident-HF. We applied machine-learning methods to model the one-year risk of incident-HF from the Maine statewide-HIE. Methods and results We included subjects aged ≥ 40 years without prior HF ICD9/10 codes during a three-year period from 2015 to 2018, and incident-HF defined as assignment of two outpatient or one inpatient code in a year. A tree-boosting algorithm was used to model the probability of incident-HF in year two from data collected in year one, and then validated in year three. 5,668 of 521,347 patients (1.09%) developed incident-HF in the validation cohort. In the validation cohort, the model c-statistic was 0.824 and at a clinically predetermined risk threshold, 10% of patients identified by the model developed incident-HF and 29% of all incident-HF cases in the state of Maine were identified. Conclusions Utilizing machine learning modeling techniques on passively collected clinical HIE data, we developed and validated an incident-HF prediction tool that performs on par with other models that require proactively collected clinical data. Our algorithm could be integrated into other HIEs to leverage the EMR resources to provide individuals, systems, and payors with a risk stratification tool to allow for targeted resource allocation to reduce incident-HF disease burden on individuals and health care systems.
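
The two threshold figures in this abstract correspond to standard measures: the share of flagged patients who go on to develop the outcome is the positive predictive value (10% here), and the share of all cases that were flagged is the sensitivity (29% here). A minimal sketch of computing both at a chosen risk threshold follows; the function name and toy data are illustrative, not from the study.

```python
def ppv_and_sensitivity(labels, scores, threshold):
    """Positive predictive value and sensitivity at a risk threshold.

    labels : sequence of 0/1 outcomes (1 = developed the outcome)
    scores : predicted risks; patients with score >= threshold are flagged
    """
    flagged = [y for y, s in zip(labels, scores) if s >= threshold]
    true_pos = sum(flagged)
    total_pos = sum(labels)
    # PPV: of those flagged, how many had the outcome.
    ppv = true_pos / len(flagged) if flagged else float("nan")
    # Sensitivity: of those with the outcome, how many were flagged.
    sensitivity = true_pos / total_pos if total_pos else float("nan")
    return ppv, sensitivity
```

Raising the threshold usually trades sensitivity for PPV, which is why a threshold fixed in advance on clinical grounds, as in this abstract, matters when interpreting the two numbers.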
... Social risk factors include the array of nonclinical, contextual, and socioeconomic characteristics that negatively affect health and increase utilization and costs [1,2]. Patients' social risk factor information can drive referrals to community partners [3] or be applied to population health management activities, like risk stratification [4]. In response to the potential value of these data and increased interest from payers and policymakers, the collection of patients' social risk factor information has become more common [5,6]. ...
Article
Health care organizations are increasingly documenting patients for social risk factors in structured data. Two main approaches to documentation, ICD-10 Z codes and screening questions, face limited adoption and conceptual challenges. This study compared estimates of social risk factors obtained via screening questions and ICD-10 Z diagnosis coding, as used in clinical practice, to estimates from validated survey instruments in a sample of adult primary care and emergency department patients at an urban safety-net health system. Financial strain, transportation barriers, food insecurity, and housing instability were independently assessed using instruments with published reliability and validity. These four social factors were also being collected by the health system in screening questions or could be mapped to ICD-10 Z diagnosis code concepts. Neither the screening questions nor ICD-10 Z codes performed particularly well in terms of accuracy. For the screening questions, the Area Under the Curve (AUC) scores were 0.609 for financial strain, 0.703 for transportation, 0.698 for food insecurity, and 0.714 for housing instability. For the ICD-10 Z codes, AUC scores tended to be lower, in the range of 0.523 to 0.535. For both screening questions and ICD-10 Z codes, the measures were much more specific than sensitive. Under real world conditions, ICD-10 Z codes and screening questions are at, or below, the minimal threshold for being diagnostically useful approaches to identifying patients’ social risk factors. Data collection support through information technology or novel approaches combining data sources may be necessary to improve the usefulness of these data.
... The Healthcare Cost and Utilization Project (HCUP) estimates that unplanned 30-day readmission costs the United States $41.3 billion. 1 Approximately 18% of patients on Medicare were readmitted within 30 days of discharge, a number that remained relatively unchanged between 2007 and 2010. 2 These unplanned hospital readmissions are both a burden on the US healthcare system as well as a strong indicator of sub-par quality of care. 2 In a bid to reduce healthcare costs, the ACA established the Hospital Readmission Reduction Program (HRRP) in 2012. This program aimed to financially penalize healthcare organizations for higher-than-expected readmission rates for certain health conditions. ...
... THE BIGGER PICTURE Unplanned readmission currently costs the United States millions of dollars. Predicting whether an incoming patient is at a high risk of readmission can help target healthcare efforts better to reduce this risk. ...
... Development/Pre-production: Data science output has been rolled out/validated across multiple domains/problems ...
... Consistent factors that lead to unplanned readmissions include premature discharge, length of stay in the hospital, and lack of post-discharge treatments, and might include other factors. 2 These other factors include advanced age; use of high-risk medications; specific disease diagnoses; presence of comorbidities; demographics, including socioeconomic status and race; and insurance/healthcare utilization. ...
Article
Healthcare costs due to unplanned readmissions are high and negatively affect health and wellness of patients. Hospital readmission is an undesirable outcome for elderly patients. Here, we present readmission risk prediction using five machine learning approaches for predicting 30-day unplanned readmission for elderly patients (age ≥ 50 years). We use a comprehensive and curated set of variables that include frailty, comorbidities, high-risk medications, demographics, hospital, and insurance utilization to build these models. We conduct a large-scale study with electronic health record (EHR) data with over 145,000 observations from 76,000 patients. Findings indicate that the category boost (CatBoost) model outperforms other models with a mean area under the curve (AUC) of 0.79. We find that prior readmissions, discharge to a rehabilitation facility, length of stay, comorbidities, and frailty indicators were all strong predictors of 30-day readmission. We present in-depth insights using Shapley additive explanations (SHAP), the state of the art in machine learning explainability.
... With the advancing adoption of commercial EHR systems and their instantaneous potential to provide clinically granular data from the entire course of hospitalization, factors such as living situation and nursing scores need further investigation. Fourth, a potential path to developing more comprehensive patient-risk models is machine learning, which has proven to be able to process extremely large numbers of input features and to be typically more predictive than standard logistic regression methods [62][63][64][65]. Last, interventional research is now needed to better understand the effects of risk prediction scores followed by available transitional care measures. ...
Article
Introduction: Readmissions after an acute care hospitalization are relatively common, costly to the health care system, and are associated with significant burden for patients. As one way to reduce costs and simultaneously improve quality of care, hospital readmissions receive increasing interest from policy makers. It is only relatively recently that strategies were developed with the specific aim of reducing unplanned readmissions using prediction models to identify patients at risk. EPIC's Risk of Unplanned Readmission model promises superior performance. However, it has only been validated for the US setting. Therefore, the main objective of this study is to externally validate EPIC's Risk of Unplanned Readmission model and to compare it to the internationally widely used LACE+ index and the SQLape® tool, a Swiss national quality of care indicator. Methods: A monocentric, retrospective, diagnostic cohort study was conducted. The study included inpatients who were discharged between the 1st of January 2018 and the 31st of December 2019 from the Lucerne Cantonal Hospital, a tertiary-care provider in Central Switzerland. The study endpoint was an unplanned 30-day readmission. Models were replicated using the original intercept and beta coefficients as reported. Otherwise, score generators provided by the developers were used. For external validation, discrimination of the scores under investigation was assessed by calculating the area under the receiver operating characteristic curve (AUC). Calibration was assessed with the Hosmer-Lemeshow χ² goodness-of-fit test. This report adheres to the TRIPOD statement for reporting of prediction models. Results: At least 23,116 records were included. For discrimination, EPIC's prediction model, the LACE+ index and the SQLape® had AUCs of 0.692 (95% CI 0.676-0.708), 0.703 (95% CI 0.687-0.719) and 0.705 (95% CI 0.690-0.720). The Hosmer-Lemeshow χ² tests had values of p<0.001.
Conclusion: In summary, EPIC's model showed less favorable performance than its comparators. It may be assumed with caution that the complexity of EPIC's model has hampered its wide generalizability; model updating is warranted.
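
The Hosmer-Lemeshow test mentioned in this abstract sorts patients by predicted risk, bins them into groups, and compares observed with expected event counts in each group. A minimal sketch of the test statistic in plain Python follows. It assumes equal-size groups (ten by default); the abstract does not state the grouping used in the study, and the function name is illustrative.

```python
def hosmer_lemeshow(labels, probs, groups=10):
    """Hosmer-Lemeshow chi-square statistic over risk-ordered groups.

    labels : sequence of 0/1 outcomes
    probs  : predicted probabilities of the outcome
    groups : number of roughly equal-size risk groups (an assumption here)
    """
    # Order patients by predicted risk; the sort is stable for tied risks.
    pairs = sorted(zip(probs, labels), key=lambda t: t[0])
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        size = len(chunk)
        observed = sum(y for _, y in chunk)
        expected = sum(p for p, _ in chunk)
        mean_p = expected / size
        # Skip degenerate groups where the variance term would be zero.
        if 0.0 < mean_p < 1.0:
            stat += (observed - expected) ** 2 / (size * mean_p * (1.0 - mean_p))
    return stat
```

The statistic is conventionally compared against a chi-square distribution with `groups - 2` degrees of freedom to obtain a p-value, a step omitted here. A small p-value, such as the p<0.001 reported above, indicates that predicted and observed risks disagree.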
... Singular features are ideal for training logistic regression classifiers because features are rather independent and
[11], [13]-[22], [29]-[32], [34]-[41], [43]-[62], [65], [66], [68]-[77], [79], [80], [82], [87]-[91], [94], [95], [100], [101], [109], [113], [137]-[144]
  Basic: Smoking, Alcohol, Living situation, Employment ✗
Admission and discharge information [8]-[22], [29]-[32], [34]-[47], [49]-[62], [65], [66], [68]-[73], [75]-[77], [79], [80], [82], [87]-[91], [94], [95], [99]-[101], [109], [113], [137]-[144]
  Admission information: Admission date, First hospital visit, Elective ✓
  Number of admissions in a past time period, Cost-weight of previous admission, Diagnosis of last admission ✓ ...
... Total charge, Discharge date ✓
  Transfer, Discharge disposition ✓
Clinical information [9]-[11], [13], [14], [16]-[20], [22], [29]-[31], [33]-[38], [40]-[62], [65], [66], [68]-[80], [82], [87]-[91], [94], [95], [95], [101], [101], [106], [109], [113], [139]-[145]
  Payment code information: ICD-10 codes, ICD-9 codes ✓
  APR-DRG codes, DRG codes ✓
  In-hospital symptom: Vitals and lab values ✓
  Rhythmic features: Mean 10 most active hours, Total sleep time, Sedentary time ✓
  Medical images: Ultrasound exam ✗
Hospital information [34], [53], [65], [71], [77], [87], [106]
  Hospital statistics: Total number of admissions, Number of patients ✓
  Percent readmission within a time period (30 days etc.) ✓
  Hospital characteristics ...
Preprint
Hospital readmission prediction is the task of learning models from historical medical data to predict the probability of a patient returning to hospital within a certain period, such as 30 or 90 days, after discharge. The motivation is to help health providers deliver better treatment and post-discharge strategies, lower the hospital readmission rate, and eventually reduce medical costs. Due to the inherent complexity of diseases and healthcare ecosystems, modeling hospital readmission faces many challenges. A variety of methods have been developed so far, but the existing literature fails to deliver a complete picture that answers some fundamental questions, such as what the main challenges and solutions in modeling hospital readmission are; what typical features and models are used for readmission prediction; how to achieve meaningful and transparent predictions for decision making; and what conflicts may arise when deploying predictive approaches for real-world use. In this paper, we systematically review computational models for hospital readmission prediction, and propose a taxonomy of challenges featuring four main categories: (1) data variety and complexity; (2) data imbalance, locality and privacy; (3) model interpretability; and (4) model implementation. The review summarizes methods in each category, and highlights technical solutions proposed to address the challenges. In addition, a review of datasets and resources available for hospital readmission modeling provides firsthand material to help researchers and practitioners design new approaches for effective and efficient hospital readmission prediction.