Background: Self-insured employers, which are the majority in the US, face an increasing financial burden as health care costs have increased relative to savings. By applying machine learning (ML) techniques, we may decrease unnecessary hospitalizations by identifying low-risk patients who are on the path to becoming high risk and, eventually, high cost. In this complex simulation, we applied and compared several ML techniques on data gathered from medical insurance claims.
Methods: The analysis was limited to employees and their spouses. We identified about 8,000 employees and spouses who were covered by an employer-sponsored health insurance plan between 2011 and 2016. De-identified data were used to predict high-cost claimants. High cost was defined as annual spending of over $10,000 in medical claims. We compared the following methods: bootstrapped random forest, gradient boosted tree, neural networks, Naïve Bayes classifier, and logistic regression with LASSO. Variables in the model included yearly inpatient and ER visits, CCS categories, CPT codes, and demographic information.
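The outcome definition above (annual medical claims spending of over $10,000) can be illustrated with a minimal sketch. The record layout `(member_id, year, paid_amount)`, the function name `label_high_cost`, and the sample values are assumptions made for illustration only; they are not taken from the study's data or code.

```python
from collections import defaultdict

# Threshold from the Methods text: annual spending of over $10,000.
HIGH_COST_THRESHOLD = 10_000


def label_high_cost(claims, threshold=HIGH_COST_THRESHOLD):
    """Sum paid amounts per (member, year) and flag totals strictly above the threshold.

    `claims` is an iterable of (member_id, year, paid_amount) tuples
    (a hypothetical layout, not the study's actual schema).
    """
    totals = defaultdict(float)
    for member_id, year, paid_amount in claims:
        totals[(member_id, year)] += paid_amount
    return {key: total > threshold for key, total in totals.items()}


# Made-up example records, not real claims data.
claims = [
    ("A", 2015, 6_500.0),
    ("A", 2015, 4_200.0),   # A's 2015 total is 10,700: high cost
    ("B", 2015, 900.0),     # B's 2015 total is 900: not high cost
    ("A", 2016, 10_000.0),  # exactly 10,000 is not "over" the threshold
]
labels = label_high_cost(claims)
```

The resulting member-year labels would serve as the binary target for the classifiers being compared.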
Results: The mean (SD) age of the eligible population was 49.7 (11.0) years. The employee-to-spouse ratio was approximately 3.5:1 each year. Bootstrapped random forest performed better than the other techniques (AUROC: 80.33%). All other algorithms also performed extremely well. Future work on these models will focus on incorporating employee health fair data and on strategies to avoid overfitting.
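For readers unfamiliar with the metric reported above, AUROC is the probability that a randomly chosen high-cost claimant receives a higher predicted score than a randomly chosen non-high-cost claimant, with ties counted as one half. The sketch below computes it by direct pairwise comparison; the function name `auroc` and the toy labels and scores are illustrative assumptions, and this is not the evaluation code used in the study.

```python
def auroc(y_true, scores):
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve."""
    pos = [s for y, s in zip(y_true, scores) if y]
    neg = [s for y, s in zip(y_true, scores) if not y]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Toy labels (1 = high cost) and model scores, made up for illustration.
y_true = [1, 0, 1, 0, 0]
scores = [0.90, 0.40, 0.35, 0.20, 0.80]
toy_auc = auroc(y_true, scores)  # 4 of the 6 positive/negative pairs are ordered correctly
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives context for comparing models on this scale.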
Conclusion: Healthcare lags well behind other industries in successfully using ML to improve processes, even though it is an industry admittedly in need of positive disruption. Models that enable successful preventive action will both increase revenue and lower costs. Application of modern ML techniques can identify high-cost claimants earlier. Our approach is one of the first applications of significant recent advances in ML algorithms and computing technologies, opening new opportunities for the future analysis of high healthcare costs.