Nima Hejazi
Weill Cornell Medicine | Cornell · Division of Biostatistics and Epidemiology
Doctor of Philosophy
About
68 Publications
8,250 Reads
2,066 Citations
Introduction
I am an applied statistician interested in solving complex, data-driven, real-world problems while developing tools to empower discovery across diverse systems. My methodological interests include causal inference and causal machine learning, non/semi-parametric estimation, experimentation science, statistical machine learning and computational statistics, high-dimensional data analysis, and statistical computing and research software engineering.
Additional affiliations
March 2016 - October 2020
Publications (68)
Modified treatment policies are a widely applicable class of interventions used to study the causal effects of continuous exposures. Approaches to evaluating their causal effects assume no interference, meaning that such effects cannot be learned from data in settings where the exposure of one unit affects the outcome of others, as is common in spa...
Assessment of immune correlates of severe COVID-19 has been hampered by the low numbers of severe cases in COVID-19 vaccine efficacy (VE) trials. We assess neutralizing and binding antibody levels at 4 weeks post-Ad26.COV2.S vaccination as correlates of risk and of protection against severe-critical COVID-19 through 220 days post-vaccination in the...
Myalgic Encephalomyelitis (ME; sometimes referred to as chronic fatigue syndrome) is a relatively common and female-biased disease of unknown pathogenesis that profoundly decreases patients' health-related quality-of-life. ME diagnosis is hindered by the absence of robustly-defined and specific biomarkers that are easily measured from available sou...
Background
The first licensed malaria vaccine, RTS,S/AS01E, confers moderate protection against symptomatic disease. Because many malaria infections are asymptomatic, we conducted a large-scale longitudinal parasite genotyping study of samples from a clinical trial exploring how vaccine dosing regimen affects vaccine efficacy.
Methods
Between Sept...
Background
Stochastic interventional vaccine efficacy (SVE) analysis is a new approach to correlate of protection (CoP) analysis of a phase III trial that estimates how vaccine efficacy (VE) would change under hypothetical shifts of an immune marker.
Methods
We applied nonparametric SVE methodology to the COVE trial of messenger RNA-1273 vs placeb...
The COVE trial randomized participants to receive two doses of mRNA-1273 vaccine or placebo on Days 1 and 29 (D1, D29). Anti-SARS-CoV-2 Spike IgG binding antibodies (bAbs), anti-receptor binding domain IgG bAbs, 50% inhibitory dilution neutralizing antibody (nAb) titers, and 80% inhibitory dilution nAb titers were measured at D29 and D57. We assess...
Globally, 149 million children under 5 years of age are estimated to be stunted (length more than 2 standard deviations below international growth standards)¹,². Stunting, a form of linear growth faltering, increases the risk of illness, impaired cognitive development and mortality. Global stunting estimates rely on cross-sectional surveys, which...
Growth faltering in children (low length for age or low weight for length) during the first 1,000 days of life (from conception to 2 years of age) influences short-term and long-term health and survival¹,². Interventions such as nutritional supplementation during pregnancy and the postnatal period could help prevent growth faltering, but programm...
Sustainable Development Goal 2.2, to end malnutrition by 2030, includes the elimination of child wasting, defined as a weight-for-length z-score that is more than two standard deviations below the median of the World Health Organization standards for child growth¹. Prevailing methods to measure wasting rely on cross-sectional surveys that cannot m...
Longitudinal modified treatment policies (LMTP) have been recently developed as a novel method to define and estimate causal parameters that depend on the natural value of treatment. LMTPs represent an important advancement in causal inference for longitudinal studies as they allow the non-parametric definition and estimation of the joint effect of...
The best assay or marker to define mRNA-1273 vaccine-induced antibodies as a correlate of protection (CoP) is unclear. In the COVE trial, participants received two doses of the mRNA-1273 COVID-19 vaccine or placebo. We previously assessed IgG binding antibodies to the spike protein (spike IgG) or receptor binding domain (RBD IgG) and pseudovirus ne...
Heterogeneous treatment effects are driven by treatment effect modifiers, pre-treatment covariates that modify the effect of a treatment on an outcome. Current approaches for uncovering these variables are limited to low-dimensional data, data with weakly correlated covariates, or data generated according to parametric processes. We resolve these i...
In the phase 3 trial of the AZD1222 (ChAdOx1 nCoV-19) vaccine conducted in the U.S., Chile, and Peru, anti-spike binding IgG concentration (spike IgG) and pseudovirus 50% neutralizing antibody titer (nAb ID50) measured four weeks after two doses were assessed as correlates of risk and protection against PCR-confirmed symptomatic SARS-CoV-2 infectio...
Targeted Learning is a subfield of statistics that unifies advances in causal inference, machine learning, and statistical theory to help answer scientifically impactful questions with statistical confidence. Targeted Learning is driven by complex problems in data science and has been implemented in a diversity of real‐world scenarios: observationa...
In the PREVENT-19 phase 3 trial of the NVX-CoV2373 vaccine (NCT04611802), anti-spike binding IgG concentration (spike IgG), anti-RBD binding IgG concentration (RBD IgG), and pseudovirus 50% neutralizing antibody titer (nAb ID50) measured two weeks post-dose two are assessed as correlates of risk and as correlates of protection against COVID-19. Ana...
Measuring immune correlates of disease acquisition and protection in the context of a clinical trial is a prerequisite for improved vaccine design. We analysed binding and neutralizing antibody measurements 4 weeks post vaccination as correlates of risk of moderate to severe-critical COVID-19 through 83 d post vaccination in the phase 3, double-bli...
About forty years ago, in a now-seminal contribution, Rosenbaum & Rubin (1983) introduced a critical characterization of the propensity score as a central quantity for drawing causal inferences in observational study settings. In the decades since, much progress has been made across several research fronts in causal inference, notably including th...
The covariance matrix plays a fundamental role in many modern exploratory and inferential statistical procedures, including dimensionality reduction, hypothesis testing, and regression. In low-dimensional regimes, where the number of observations far exceeds the number of variables, the optimality of the sample covariance matrix as an estimator of...
Inverse probability weighted estimators are the oldest and potentially most commonly used class of procedures for the estimation of causal effects. By adjusting for selection biases via a weighting mechanism, these procedures estimate an effect of interest by constructing a pseudo‐population in which selection biases are eliminated. Despite their e...
In the randomized, placebo-controlled PREVENT-19 phase 3 trial conducted in the U.S. and Mexico of the NVX-CoV2373 adjuvanted, recombinant spike protein nanoparticle vaccine, anti-spike binding IgG concentration (spike IgG) and pseudovirus 50% neutralizing antibody titer (nAb ID50) measured two weeks after two doses were assessed as correlates of r...
Continuous treatments have posed a significant challenge for causal inference, both in the formulation and identification of scientifically meaningful effects and in their robust estimation. Traditionally, focus has been placed on techniques applicable to binary or categorical treatments with few levels, allowing for the application of propensity s...
Anti-spike IgG binding antibody, anti-receptor binding domain IgG antibody, and pseudovirus neutralizing antibody measurements four weeks post-vaccination were assessed as correlates of risk of moderate to severe-critical COVID-19 outcomes through 83 days post-vaccination and as correlates of protection following a single dose of Ad26.COV2.S COVID-...
Longitudinal modified treatment policies (LMTP) have been recently developed as a novel method to define and estimate causal parameters that depend on the natural value of treatment. LMTPs represent an important advancement in causal inference for longitudinal studies as they allow the non-parametric definition and estimation of the joint effect of...
Causal mediation analysis has historically been limited in two important ways: (i) a focus has traditionally been placed on binary exposures and static interventions and (ii) direct and indirect effect decompositions have been pursued that are only identifiable in the absence of intermediate confounders affected by exposure. We present a theoretica...
Several recently developed methods have the potential to harness machine learning in the pursuit of target quantities inspired by causal inference, including inverse weighting, doubly robust estimating equations and substitution estimators like targeted maximum likelihood estimation. There are even more recent augmentations of these procedures that...
Background:
Hundreds of thousands of biodigesters have been constructed in Nepal. These household-level systems use human and animal waste to produce clean-burning biogas used for cooking, which can reduce household air pollution from woodburning cookstoves and prevent respiratory illnesses. The biodigesters, typically operated by female caregiver...
Antibody levels predict vaccine efficacy
Symptomatic COVID-19 infection can be prevented by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) vaccines. A “correlate of protection” is a molecular biomarker to measure how much immunity is needed to fight infection and is key for successful global immunization programs. Gilbert et al. dete...
We present an end-to-end methodological framework for causal segment discovery that aims to uncover differential impacts of treatments across subgroups of users in large-scale digital experiments. Building on recent developments in causal inference and non/semi-parametric statistics, our approach unifies two objectives: (1) the discovery of user se...
Several recently developed methods have the potential to harness machine learning in the pursuit of target quantities inspired by causal inference, including inverse weighting, doubly robust estimating equations and substitution estimators like targeted maximum likelihood estimation. There are even more recent augmentations of these procedures that...
Background: In the Coronavirus Efficacy (COVE) trial, estimated mRNA-1273 vaccine efficacy against coronavirus disease-19 (COVID-19) was 94%. SARS-CoV-2 antibody measurements were assessed as correlates of COVID-19 risk and as correlates of protection.
Methods: Through case-cohort sampling, participants were selected for measurement of four serum a...
Background
A safe, effective vaccine is essential to eradicating human immunodeficiency virus (HIV) infection. A canarypox–protein HIV vaccine regimen (ALVAC-HIV plus AIDSVAX B/E) showed modest efficacy in reducing infection in Thailand. An analogous regimen using HIV-1 subtype C virus showed potent humoral and cellular responses in a phase 1–2a tr...
The covariance matrix plays a fundamental role in many modern exploratory and inferential statistical procedures, including dimensionality reduction, hypothesis testing, and regression. In low-dimensional regimes, where the number of observations far exceeds the number of variables, the optimality of the sample covariance matrix as an estimator of...
Background and Aims
A recent study found that homeless individuals with opioid use disorder (OUD) had a lower risk of relapse on extended‐release naltrexone (XR‐NTX) versus buprenorphine‐naloxone (BUP‐NX), whereas non‐homeless individuals had a lower risk of relapse on BUP‐NX. This secondary study examined differences in mediation pathways to medic...
Interventional effects for mediation analysis were proposed as a solution to the lack of identifiability of natural (in)direct effects in the presence of a mediator-outcome confounder affected by exposure. We present a theoretical and computational study of the properties of the interventional (in)direct effect estimands based on the efficient infl...
The advent and subsequent widespread availability of preventive vaccines has altered the course of public health over the past century. Despite this success, effective vaccines to prevent many high-burden diseases, including HIV, have been slow to develop. Vaccine development can be aided by the identification of immune response markers that serve...
Causal mediation analysis has historically been limited in two important ways: (i) a focus has traditionally been placed on binary treatments and static interventions, and (ii) direct and indirect effect decompositions have been pursued that are only identifiable in the absence of intermediate confounders affected by treatment. We present a theoret...
Targeted Learning is a subfield of statistics that unifies advances in causal inference, machine learning and statistical theory to help answer scientifically impactful questions with statistical confidence. Targeted Learning is driven by complex problems in data science and has been implemented in a diversity of real-world scenarios: observational...
Child growth failure is associated with a higher risk of illness and mortality, which contributed to the United Nations Sustainable Development Goal 2.2 to end malnutrition by 2030. Current prenatal and postnatal interventions, such as nutritional supplementation, have been insufficient to eliminate growth failure in low resource settings -motivati...
Globally 149 million children under five are estimated to be stunted (length more than 2 standard deviations below international growth standards). Stunting, a form of linear growth failure, increases risk of illness, impaired cognitive development, and mortality. Global stunting estimates rely on cross-sectional surveys, which cannot provide direc...
Acute malnutrition accounts for an immense disease burden and is implicated as a key, underlying cause of child mortality in low resource settings. Child wasting, defined as weight-for-length more than 2 standard deviations below international standards, is a leading indicator to measure the Sustainable Development Goal target to end malnutrition b...
The widespread availability of high-dimensional biological data has made the simultaneous screening of numerous biological characteristics a central statistical problem in computational biology. While the dimensionality of such datasets continues to increase, the problem of teasing out the effects of biomarkers in studies measuring baseline confoun...
Inverse probability weighted estimators are the oldest and potentially most commonly used class of procedures for the estimation of causal effects. By adjusting for selection biases via a weighting mechanism, these procedures estimate an effect of interest by constructing a pseudo-population in which selection biases are eliminated. Despite their e...
The advent and subsequent widespread availability of preventive vaccines has altered the course of public health over the past century. Despite this success, effective vaccines to prevent many high-burden diseases, including HIV, have been slow to develop. Vaccine development can be aided by the identification of immune response markers that serve...
Motivation:
Statistical analyses of high-throughput sequencing data have re-shaped the biological sciences. In spite of myriad advances, recovering interpretable biological signal from data corrupted by technical noise remains a prevalent open problem. Several classes of procedures, among them classical dimensionality reduction techniques and othe...
Mediation analysis in causal inference has traditionally focused on binary exposures and deterministic interventions, and a decomposition of the average treatment effect in terms of direct and indirect effects. We present an analogous decomposition of the population intervention effect, defined through stochastic interventions on the exposure. Popu...
Interventional effects for mediation analysis were proposed as a solution to the lack of identifiability of natural (in)direct effects in the presence of a mediator-outcome confounder affected by exposure. We present a theoretical and computational study of the properties of the interventional (in)direct effect estimands based on the efficient infl...
Motivation
Statistical analyses of high-throughput sequencing data have re-shaped the biological sciences. In spite of myriad advances, recovering interpretable biological signal from data corrupted by technical noise remains a prevalent open problem. Several classes of procedures, among them classical dimensionality reduction techniques and others...
Arsenic exposure is a worldwide health concern associated with an increased risk of skin, lung and bladder cancer but arsenic trioxide (ATO; AsIII) is also an effective chemotherapeutic agent. The current use of AsIII in chemotherapy is limited to acute promyelocytic leukemia (APL). However, AsIII was suggested as a potential therapy for other canc...
Mediation analysis in causal inference has traditionally focused on binary treatment regimes and deterministic interventions, and a decomposition of the average treatment effect in terms of direct and indirect effects. In this paper we present an analogous decomposition of the population intervention effect, defined through stochastic inte...
We present methyvim, an R package implementing an algorithm for the nonparametric estimation of the effects of exposures on DNA methylation at CpG sites throughout the genome, complete with straightforward statistical inference for such estimates. The approach leverages variable importance measures derived from statistical parameters arising in ca...
The widespread availability of high-dimensional biological data has made the simultaneous screening of many biological characteristics a central problem in computational biology and allied sciences. While the dimensionality of such datasets continues to grow, so too does the complexity of biomarker identification from exposure patterns in health st...
Current statistical inference problems in areas like astronomy, genomics, and marketing routinely involve the simultaneous testing of thousands, even millions, of null hypotheses. For high-dimensional multivariate distributions, these hypotheses may concern a wide range of parameters, with complex and unknown dependence structures among variabl...