Chapter

Biases in Machine Learning in Healthcare

Abstract

Machine learning in healthcare (MLHC) has the potential to revolutionize healthcare and health systems research. However, these benefits must be weighed against the risks of MLHC in perpetuating or even magnifying existing health disparities. This chapter discusses existing and historical biases in clinical medicine, examines the potential hazards associated with MLHC implementation, and considers possible solutions to mitigate these concerns.

Article
Over the past year, medical centers across the US have removed race adjustment from estimated glomerular filtration rate from serum creatinine (eGFRcr), with many now reporting the “White/other” value for all patients. These changes follow calls to reconsider the use of race in estimating kidney function¹ and in medicine broadly.² We analyzed potential changes in recommended care using eGFRcr with and without race among Black individuals in the US (individuals who are not Black would not be affected).
Article
Glomerular filtration rate (GFR) is critically important for determining drug dosing as well as prognosis and treatment in patients with kidney disease. Despite its importance, we rarely measure it directly. Instead, we use serum creatinine level to estimate GFR (eGFRcr). Because serum creatinine is determined by diet and muscle mass as well as GFR, we use age, sex, race (African American vs non–African American), height, or weight to adjust the estimation of GFR.
Article
Clinicians estimate kidney function to guide important medical decisions across a wide range of settings, including assessing the safety of radiology studies, choosing chemotherapy, and reviewing the use of common nonprescription medications such as nonsteroidal anti-inflammatory drugs. Because direct measurement of kidney function is infeasible at the bedside, the usual approach involves using estimating equations that rely on serum creatinine. These equations assign a higher estimated glomerular filtration rate (eGFR) to patients who are identified as black. Yet in some medical and social science disciplines, a consensus has emerged that race is a social construct rather than a biological one.¹ In this Viewpoint, we argue that the use of kidney function estimating equations that include race as a variable causes problems for transparency and unduly restricts access to care in some cases, yet offers only modest benefits to precision.
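To illustrate how a race term enters such an estimating equation, here is a minimal sketch of the 2009 CKD-EPI creatinine equation. The coefficients are those of the published 2009 equation and do not appear in the abstract above; the function name and argument names are hypothetical and chosen only for this example.

```python
def ckd_epi_2009(scr_mg_dl: float, age: float, female: bool, black: bool) -> float:
    """Estimate GFR (mL/min per 1.73 m^2) from serum creatinine in mg/dL.

    Sketch of the 2009 CKD-EPI creatinine equation; the coefficients are
    quoted from the published equation, not from the abstract above.
    """
    kappa = 0.7 if female else 0.9          # sex-specific creatinine knot
    alpha = -0.329 if female else -0.411    # exponent applied below the knot
    ratio = scr_mg_dl / kappa
    egfr = (141.0
            * min(ratio, 1.0) ** alpha
            * max(ratio, 1.0) ** -1.209
            * 0.993 ** age)
    if female:
        egfr *= 1.018
    if black:
        egfr *= 1.159                       # the race coefficient under debate
    return egfr


# Same laboratory value and demographics, with and without the race term.
with_race = ckd_epi_2009(1.4, 60, female=False, black=True)
without_race = ckd_epi_2009(1.4, 60, female=False, black=False)
print(f"with race term: {with_race:.1f}, without: {without_race:.1f}")
```

Because the race term is a constant multiplier, the estimate with it is always 1.159 times the estimate without it, so identical creatinine results can land on different sides of a fixed eGFR threshold.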
Chapter
Machine learning applied to electronic health records (EHRs) can generate actionable insights, from improving upon patient risk score systems, to predicting the onset of disease, to streamlining hospital operations. Statistical models that leverage the variety and richness of EHR-derived data are still relatively rare and offer an exciting avenue for further research. In this chapter, we present an overview of how machine learning has been applied in clinical settings and summarize the advantages it offers over traditional analysis methods. We describe the methodological and operational challenges of using machine learning in research and practice. Lastly, we offer our perspective on future application areas for machine learning that will significantly impact health and healthcare delivery.
Article
Background: Equations to estimate glomerular filtration rate (GFR) are routinely used to assess kidney function. Current equations have limited precision and systematically underestimate measured GFR at higher levels.
Article
Background: Equations to estimate glomerular filtration rate (GFR) are routinely used to assess kidney function. Current equations have limited precision and systematically underestimate measured GFR at higher values. Objective: To develop a new estimating equation for GFR: the Chronic Kidney Disease Epidemiology Collaboration (CKD-EPI) equation. Design: Cross-sectional analysis with separate pooled data sets for equation development and validation and a representative sample of the U.S. population for prevalence estimates. Setting: Research studies and clinical populations ("studies") with measured GFR and NHANES (National Health and Nutrition Examination Survey), 1999 to 2006. Participants: 8254 participants in 10 studies (equation development data set) and 3896 participants in 16 studies (validation data set). Prevalence estimates were based on 16,032 participants in NHANES. Measurements: GFR, measured as the clearance of exogenous filtration markers (iothalamate in the development data set; iothalamate and other markers in the validation data set), and linear regression to estimate the logarithm of measured GFR from standardized creatinine levels, sex, race, and age. Results: In the validation data set, the CKD-EPI equation performed better than the Modification of Diet in Renal Disease Study equation, especially at higher GFR (P < 0.001 for all subsequent comparisons), with less bias (median difference between measured and estimated GFR, 2.5 vs. 5.5 mL/min per 1.73 m²), improved precision (interquartile range [IQR] of the differences, 16.6 vs. 18.3 mL/min per 1.73 m²), and greater accuracy (percentage of estimated GFR within 30% of measured GFR, 84.1% vs. 80.6%). In NHANES, the median estimated GFR was 94.5 mL/min per 1.73 m² (IQR, 79.7 to 108.1) vs. 85.0 (IQR, 72.9 to 98.5) mL/min per 1.73 m², and the prevalence of chronic kidney disease was 11.5% (95% CI, 10.6% to 12.4%) vs. 13.1% (CI, 12.1% to 14.0%). Limitation: The sample contained a limited number of elderly people and racial and ethnic minorities with measured GFR.
Conclusion: The CKD-EPI creatinine equation is more accurate than the Modification of Diet in Renal Disease Study equation and could replace it for routine clinical use. Primary funding source: National Institute of Diabetes and Digestive and Kidney Diseases.
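The three agreement measures reported in this abstract (bias as the median difference, precision as the IQR of the differences, and accuracy as the percentage of estimates within 30% of measured GFR) can be computed from paired values. The following is a minimal sketch under stated assumptions: the function name is hypothetical, and the inclusive quantile method is a choice made here, since the abstract does not say how the IQR was calculated.

```python
from statistics import median, quantiles


def gfr_agreement(measured, estimated):
    """Summarize agreement between measured and estimated GFR.

    Differences are taken as measured minus estimated, following the
    wording "difference between measured and estimated GFR".
    """
    diffs = [m - e for m, e in zip(measured, estimated)]
    # Quartiles of the differences; the quantile method is an assumption.
    q1, _, q3 = quantiles(diffs, n=4, method="inclusive")
    within_30 = sum(abs(e - m) <= 0.30 * m for m, e in zip(measured, estimated))
    return {
        "bias": median(diffs),                      # median difference
        "iqr": q3 - q1,                             # spread of the differences
        "p30": 100.0 * within_30 / len(measured),   # % within 30% of measured
    }


# Made-up illustrative values, not data from any study.
measured = [100, 80, 60, 40, 120]
estimated = [95, 85, 30, 38, 118]
print(gfr_agreement(measured, estimated))
```

A smaller absolute bias, a narrower IQR, and a higher P30 each indicate closer agreement, which is the sense in which the abstract compares the two equations.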
Article
Investigations describing the utilization pattern and documenting the value of intensive care are limited by the lack of a reliable and valid classification system. In this paper, the authors describe the development and initial validation of acute physiology and chronic health evaluation (APACHE), a physiologically based classification system for measuring severity of illness in groups of critically ill patients. APACHE uses information available in the medical record. In studies on 582 admissions to a university hospital ICU and 223 admissions to a community hospital ICU, APACHE was reliable in classifying ICU admissions. In validation studies involving these 805 admissions, the acute physiology score of APACHE demonstrated consistent agreement with subsequent therapeutic effort and mortality. This was true for a broad range of patient groups using a variety of sensitivity analyses. After successful completion of multi-institutional validation studies, the APACHE classification system could be used to control for case mix, compare outcomes, evaluate new therapies, and study the utilization of ICUs.
Article
In this article, Apgar described her innovative method for assessing the viability of newborn infants by assigning "scores" of 0, 1, or 2 to the infant's respiration, pulse, color, muscle tone, and response to stimuli one minute after birth. This procedure was later known as "Apgar scoring" and is now routinely used in hospitals worldwide.
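The scoring scheme described above can be sketched in a few lines. This is an illustrative sketch only: the abstract names the five signs and the 0, 1, or 2 values, while the summing of the five values into a total from 0 to 10, the function name, and the dictionary keys are assumptions introduced here.

```python
APGAR_SIGNS = ("respiration", "pulse", "color", "muscle_tone", "response_to_stimuli")


def apgar_total(scores: dict) -> int:
    """Sum the five sign scores (each 0, 1, or 2) into a total from 0 to 10."""
    missing = set(APGAR_SIGNS) - set(scores)
    if missing:
        raise ValueError(f"missing signs: {sorted(missing)}")
    for sign in APGAR_SIGNS:
        if scores[sign] not in (0, 1, 2):
            raise ValueError(f"{sign} must be scored 0, 1, or 2")
    return sum(scores[sign] for sign in APGAR_SIGNS)


# Hypothetical assessment one minute after birth.
example = {
    "respiration": 2,
    "pulse": 2,
    "color": 1,
    "muscle_tone": 2,
    "response_to_stimuli": 1,
}
print(apgar_total(example))
```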