Scott Zeger

Johns Hopkins Bloomberg School of Public Health | JHSPH · Department of Biostatistics

About

449
Publications
61,639
Reads
88,042
Citations

Publications (449)
Article
Full-text available
Objectives In low-income countries, birth weights for home deliveries are often measured at the nadir when babies may lose up to 10% of their birth weight, biasing estimates of small-for-gestational age (SGA) and low birth weight (LBW). We aimed to develop an imputation model that predicts the ‘true’ birth weight at time of delivery. Design We dev...
Article
Age-related deficits in pattern separation have been postulated to bias the output of hippocampal memory processing toward pattern completion, which can cause deficits in accurate memory retrieval. Although the CA3 region of the hippocampus is often conceptualized as a homogeneous network involved in pattern completion, growing evidence demonstrate...
Preprint
Background In South Asia, a third of babies are born small-for-gestational age (SGA), accounting for a quarter of all neonatal deaths. The risk factors are well described in the literature, but many studies are in high- and middle-income countries or measure SGA on facility births only. There are fewer studies that describe the prevalence of risk fac...
Article
Full-text available
Background: COVID-19 is a global pandemic caused by the novel coronavirus SARS-CoV-2. Some clinical features of severe COVID-19 represent blood vessel damage induced by activation of host immune responses, initiated by the virus. We hypothesized that autoantibodies against angiotensin converting enzyme-2 (ACE2), the SARS-CoV-2 receptor expressed o...
Article
Purpose: To assess factors associated with gender disparities in cataract surgery volume and evaluate how these differences have changed over time. Setting: Cataract surgeons in the 2012-2018 Medicare database. Design: Retrospective study. Methods: The association of provider gender with the number of cataract surgeries per office visit bill...
Preprint
Full-text available
This paper describes and illustrates the functionality of the baker R package. The package estimates a suite of nested partially-latent class models (NPLCM) for multivariate binary responses that are observed under a case-control design. The baker package allows researchers to flexibly estimate population-level class prevalences and posterior proba...
Article
Background: Prior observation has shown differences in COVID-19 hospitalization rates between SARS-CoV-2 variants, but limited information describes differences in hospitalization outcomes. Methods: Patients admitted to 5 hospitals with COVID-19 were included if they had hypoxia, tachypnea, tachycardia, or fever, and data to describe SARS-CoV-2...
Article
Background Little is known about the relationship between social determinants of health (SDH) and medication adherence among Medicaid beneficiaries with hypertension. Methods We conducted a post hoc subgroup analysis of 3044 adult Medicaid beneficiaries who enrolled in a parent prospective cohort study and had a diagnosis of hypertension based on t...
Preprint
Full-text available
Introduction Sudden cardiac death (SCD) is a devastating consequence often without antecedent expectation. Current risk stratification methods derived from baseline independently modeled risk factors are insufficient. A novel random forest machine learning (ML) approach incorporating time-dependent variables and complex interactions may improve SCD r...
Preprint
Full-text available
As clinicians are faced with a deluge of new information, data science can play a key role in highlighting key features towards developing new clinical hypotheses. Indeed, insights derived from machine learning can serve as a clinical support tool by connecting care providers with results from big data analysis to identify latent patterns that may...
Article
Full-text available
Background Scleroderma is a serious chronic autoimmune disease in which a patient’s disease state manifests in several irregularly spaced longitudinal measures of lung, heart, skin, and other organ systems. Threshold crossings of pulmonary and cardiac measures indicate potentially life-threatening key clinical events including interstitial lung dis...
Preprint
COVID-19 has challenged health systems to learn how to learn. This paper describes the context and methods for learning at one academic health center. Longitudinal regression models are used to represent the joint distribution of major clinical events including discharge, ventilation and death as well as multivariate biomarker processes that descri...
Preprint
Age-related deficits in pattern separation have been postulated to bias the output of hippocampal memory processing toward pattern completion, which can cause deficits in accurate memory retrieval. While the CA3 region of the hippocampus is often conceptualized as a homogeneous network involved in pattern completion, growing evidence demonstrates a...
Article
Full-text available
Introduction This study evaluates the association of multidimensional social determinants of health (SDoH) with non-adherence to diabetic retinopathy examinations. Research design and methods This was a post-hoc subgroup analysis of adults with diabetes in a prospective cohort study of enrollees in the Washington, DC Medicaid program. At study enr...
Article
Full-text available
Background Males experience increased severity of illness and mortality from SARS-CoV-2 compared to females but the mechanisms of male susceptibility are unclear. Methods We performed a retrospective cohort analysis of SARS-CoV-2 testing and admission data at 5 hospitals in the Maryland/Washington DC area. Using age-stratified logistic regression...
Article
Introduction: Sudden cardiac death (SCD) is the leading cause of death in the US and has significant public health impact. However, effective risk stratification for SCD remains lacking as current prediction models do not address the dynamic impact of time-varying risk factors including interim clinical events on SCD risk. Hypothesis: A recently de...
Article
Longitudinal trajectories of vital signs and biomarkers during admission remain poorly characterized for COVID-19 patients despite their potential to provide critical insights about disease progression. We studied 1884 patients with SARS-CoV-2 infection from 3/4/2020-6/25/2020 within one Maryland hospital system and used a retrospective longitudinal...
Article
Full-text available
The plasma proteomic changes that precede the onset of dementia could yield insights into disease biology and highlight new biomarkers and avenues for intervention. We quantified 4,877 plasma proteins in nondemented older adults in the Atherosclerosis Risk in Communities cohort and performed a proteome-wide association study of dementia risk over f...
Preprint
Background. Rates of severe illness and mortality from SARS-CoV-2 are greater for males, but the mechanisms for this difference are unclear. Understanding the differences in outcomes between males and females across the age spectrum will guide both public health and biomedical interventions. Methods. Retrospective cohort analysis of SARS-CoV-2 test...
Article
Full-text available
Quantification learning is the task of prevalence estimation for a test population using predictions from a classifier trained on a different population. Quantification methods assume that the sensitivities and specificities of the classifier are either perfect or transportable from the training to the test population. These assumptions are inappro...
Article
Full-text available
Pneumococcal conjugate vaccine (PCV) introduction has reduced pneumococcal meningitis incidence. The Pneumococcal Serotype Replacement and Distribution Estimation (PSERENADE) project described the serotype distribution of remaining pneumococcal meningitis in countries using PCV10/13 for at least 5–7 years with primary series uptake above 70%. The dist...
Article
Full-text available
Streptococcus pneumoniae serotype 1 (ST1) was an important cause of invasive pneumococcal disease (IPD) globally before the introduction of pneumococcal conjugate vaccines (PCVs) containing ST1 antigen. The Pneumococcal Serotype Replacement and Distribution Estimation (PSERENADE) project gathered ST1 IPD surveillance data from sites globally and ai...
Article
Full-text available
Importance Clinical effectiveness data on remdesivir are urgently needed, especially among diverse populations and in combination with other therapies. Objective To examine whether remdesivir administered with or without corticosteroids for treatment of coronavirus disease 2019 (COVID-19) is associated with more rapid clinical improvement in a rac...
Article
Background: Predicting the clinical trajectory of individual patients hospitalized with coronavirus disease 2019 (COVID-19) is challenging but necessary to inform clinical care. The majority of COVID-19 prognostic tools use only data present upon admission and do not incorporate changes occurring after admission. Objective: To develop the Severe...
Article
Study objective We evaluate the relationship between social determinants of health and emergency department (ED) visits in the Medicaid Cohort of the District of Columbia. Methods We conducted a retrospective cohort analysis of 8,943 adult Medicaid beneficiaries who completed a social determinants of health survey at study enrollment. We merged th...
Article
PURPOSE The Bone Metastases Ensemble Trees for Survival (BMETS) model uses a machine learning algorithm to estimate survival time following consultation for palliative radiation therapy for symptomatic bone metastases (SBM). BMETS was developed at a tertiary-care, academic medical center, but its validity and stability when applied to external data...
Article
Full-text available
Compositional data are common in many fields, both as outcomes and predictor variables. The inventory of models for the case when both the outcome and predictor variables are compositional is limited and the existing models are often difficult to interpret in the compositional space, due to their use of complex log‐ratio transformations. We develop...
Article
Objective: To develop distinct social risk profiles based on social determinants of health (SDH) information and to determine whether these social risk groups varied in terms of health, health care utilization, and costs. Methods: We prospectively enrolled 8943 beneficiaries insured by the District of Columbia Medicaid program between September...
Article
While plasma levels of several etiologically relevant molecules have been associated with Alzheimer’s disease and dementia more broadly, information about the full range of changes in the plasma proteome that precede dementia is lacking. Accordingly, this study used modified aptamer technology (SOMAscan) to examine the relationship between the plas...
Article
Learning health systems use data to generate knowledge that informs clinical care, but few studies have evaluated how to leverage patient-reported mental health symptoms and substance use data to make patient-specific predictions. We developed a general Bayesian prediction algorithm that uses self-reported psychiatric symptoms and substance use wit...
Article
Age-related memory deficits are correlated with neural hyperactivity in the CA3 region of the hippocampus. Abnormal CA3 hyperactivity in aged rats has been proposed to contribute to an imbalance between pattern separation and pattern completion, resulting in overly rigid representations. Recent evidence of functional heterogeneity along the CA3 tra...
Preprint
Rationale Remdesivir and dexamethasone reduced the severity of COVID-19 in clinical trials. However, their individual or combined effectiveness in clinical practice remains unknown. Objectives To examine the effectiveness of remdesivir with or without dexamethasone. Methods We conducted a multicenter, retrospective cohort study between March 4 an...
Article
Adverse health effects of household air pollution, including acute lower respiratory infections (ALRIs), pose a major health burden around the world, particularly in settings where indoor combustion stoves are used for cooking. Individual studies have limited exposure ranges and sample sizes, while pooling studies together can improve statistical p...
Article
Background Current approaches fail to separate patients at high versus low risk for ventricular arrhythmias owing to overreliance on a snapshot left ventricular ejection fraction measure. We used statistical machine learning to identify important cardiac imaging and time‐varying risk predictors. Methods and Results Three hundred eighty‐two cardiom...
Preprint
Full-text available
SARS-CoV-2 infection induces severe disease in a subpopulation of patients, but the underlying mechanisms remain unclear. We demonstrate robust IgM autoantibodies that recognize angiotensin converting enzyme-2 (ACE2) in 18/66 (27%) patients with severe COVID-19, which are rare (2/52; 3.8%) in hospitalized patients who are not ventilated. The antibo...
Article
This paper presents a model-based method for clustering multivariate binary observations that incorporates constraints consistent with the scientific context. The approach is motivated by the precision medicine problem of identifying autoimmune disease patient subsets or classes who may require different treatments. We start with a family of restri...
Article
Background: Risk factors for progression of coronavirus disease 2019 (COVID-19) to severe disease or death are underexplored in U.S. cohorts. Objective: To determine the factors on hospital admission that are predictive of severe disease or death from COVID-19. Design: Retrospective cohort analysis. Setting: Five hospitals in the Maryland and Washin...
Preprint
Age-related memory deficits are correlated with neural hyperactivity in the CA1 and CA3 regions of the hippocampus. Abnormal CA3 hyperactivity in aged rats has been proposed to contribute to an imbalance in the normal tradeoff between pattern separation and pattern completion, resulting in overly rigid representations. Recent evidence of funct...
Article
Full-text available
Background: Few randomized trials have assessed the impact of reducing household air pollution from biomass stoves on adverse birth outcomes in low-income countries. Methods: Two sequential trials were conducted in rural low-lying Nepal. Trial 1 was a cluster-randomized step-wedge trial comparing traditional biomass stoves and improved biomass s...
Preprint
Full-text available
Background: Risk factors for poor outcomes from COVID-19 are emerging among US cohorts, but patient trajectories during hospitalization, ranging from mild-moderate disease through severe disease to death, and the factors associated with these outcomes have been underexplored. Methods: We performed a cohort analysis of consecutive COVID-19 hospital admissions at 5 Johns H...
Article
Background To determine if a machine learning approach optimizes survival estimation for patients with symptomatic bone metastases (SBM), we developed the Bone Metastases Ensemble Trees for Survival (BMETS) to predict survival using 27 prognostic covariates. To establish relative clinical utility, we compared BMETS to two simpler Cox regression mod...
Preprint
Compositional data are common in many fields, both as outcomes and predictor variables. The inventory of models for the case when both the outcome and predictor variables are compositional is limited and the existing models are difficult to interpret, due to their use of complex log-ratio transformations. We develop a transformation-free linear reg...
Article
Full-text available
Computer-coded verbal autopsy (CCVA) algorithms predict cause of death from high-dimensional family questionnaire data (verbal autopsy) of a deceased individual, which are then aggregated to generate national and regional estimates of cause-specific mortality fractions. These estimates may be inaccurate if CCVA is trained on non-local training data...
Preprint
Quantification Learning is the task of prevalence estimation for a test population using predictions from a classifier trained on a different population. Commonly used quantification methods either assume perfect sensitivity and specificity of the classifier, or use the training data to both train the classifier and also estimate its misclassificat...
Article
Full-text available
Background: Clinical research and medical practice can be advanced through the prediction of an individual's health state, trajectory, and responses to treatments. However, the majority of current clinical risk prediction models are based on regression approaches or machine learning algorithms that are static, rather than dynamic. To benefit from...
Article
Introduction Although there is abundant evidence of vascular perturbation from studies of peripheral blood in systemic sclerosis (SSc), there are few data about the ability to use biomarkers of vascular injury and growth factors to predict vascular outcomes and response to therapy. We sought to explore the association between candidate vascular bio...
Article
Full-text available
Background: Low-income and middle-income countries (LMICs) seek to better utilize household and health facility survey data for monitoring and evaluation, as well as for health program planning. However, analysis of these complex survey data is complicated. In Tanzania, the National Evaluation Platform project sought to analyze Demographic and Hea...
Article
Background: Numerous randomized trials have demonstrated non-inferiority of single- versus multiple-fraction palliative radiotherapy (RT) in the management of "uncomplicated" bone metastases. Yet there is neither a clear definition of what constitutes a "complicated" lesion nor substantial data regarding the prevalence of such "complicating" featu...
Article
Full-text available
Introduction Measuring quality of care in low-income and middle-income countries is complicated by the lack of a standard, universally accepted definition for ‘quality’ for any particular service, as well as limited guidance on which indicators to include in measures of quality of care, and how to incorporate those indicators into summary indices....
Article
Full-text available
Wearable sweat sensors have enabled real-time monitoring of sweat profiles (sweat concentration versus time) and could enable monitoring of electrolyte loss during exercise or for individuals working in extreme environments. To assess the feasibility of using a wearable sweat chloride sensor for real-time monitoring of individuals during exercise,...
Article
Objective: To estimate the contributions of health-related quality of life domains to the patient global assessment of health (PGA) in RA. Methods: Data are drawn from baseline visits of two observational RA cohorts. Participants completed patient-reported outcome measures (PROMs) including PGA and Patient-Reported Outcomes Measurement Informati...
Preprint
Cookstove replacement trials have found mixed results on their impact on respiratory health. The limited range of concentrations and small sample sizes of individual studies are important factors that may be limiting their statistical power. We present a hierarchical approach to modeling exposure concentrations and pooling data from multiple studie...
Preprint
BACKGROUND Despite the promise of machine learning (ML) to inform individualized medical care, the clinical utility of ML in medicine has been limited by the minimal interpretability and “black box” nature of these algorithms. OBJECTIVE To demonstrate a general framework for generating clinically-relevant and interpretable visualizations of “black...
Article
Background: Despite the promise of machine learning (ML) to inform individualized medical care, the clinical utility of ML in medicine has been limited by the minimal interpretability and black box nature of these algorithms. Objective: The study aimed to demonstrate a general and simple framework for generating clinically relevant and interpret...
Article
Background: Our objective was to evaluate the relationship between the "Make The Call, Don't Miss a Beat" national mass media campaign and emergency medical services (EMS) use among women with possible heart attack symptoms. Methods: We linked campaign TV public service advertisement data with national EMS activation data for 2010 to 2014. We id...
Article
Full-text available
Background: Pneumonia is the leading cause of death among children younger than 5 years. In this study, we estimated causes of pneumonia in young African and Asian children, using novel analytical methods applied to clinical and microbiological findings. Methods: We did a multi-site, international case-control study in nine study sites in seven...
Article
The rising burden of healthcare costs suggests that the healthcare system could benefit from novel methods that allow for continuous learning to provide more data-driven, individualised care at lower costs and with improved outcomes. Here, we present our synergistic Learning approach for Prediction, Interpretation/Inference and Communication (Learn...
Article
Chronic disease now affects approximately half of the US population, causes 7 in 10 deaths, and accounts for roughly 80% of US health care expenditure. Because the root causes of chronic diseases are largely behavioral, effective therapies require frequent, individualized interventions that extend beyond the hospital and clinic to reach patients in...
Article
Background: We sought to determine whether gender disparities exist in the prehospital management of chest pain (CP) or out-of-hospital cardiac arrest (OHCA) among patients who accessed the emergency medical services (EMS) system. Methods: We obtained 2010-2013 data from the National Emergency Medical Services Information System and identified a...
Preprint
Computer-coded-verbal-autopsy (CCVA) algorithms used to generate burden-of-disease estimates rely on non-local training data and yield inaccurate estimates in local context. We present a general calibration framework to improve estimates of cause-specific-mortality-fractions from CCVA when limited local training data is available. We formulate a Ba...
Preprint
Although the global burden of chronic diseases is rapidly growing, there is currently a mismatch between therapies that the healthcare system is equipped to provide and the interventions demonstrated to address the root causes of these chronic diseases. Since the underlying causes of chronic diseases are substantially lifestyle related, effective t...