Nima Hejazi
Weill Cornell Medicine | Cornell · Division of Biostatistics and Epidemiology

Doctor of Philosophy

About

68 Publications
8,250 Reads
2,066 Citations
Introduction
I am an applied statistician interested in solving complex, data-driven, real-world problems while developing tools to empower discovery across diverse systems. My methodological interests include causal inference and causal machine learning, non/semi-parametric estimation, experimentation science, statistical machine learning and computational statistics, high-dimensional data analysis, and statistical computing and research software engineering.
Additional affiliations
March 2016 - October 2020
University of California, Berkeley
Position
  • PhD Student
Education
August 2017 - May 2021
University of California, Berkeley
Field of study
  • Biostatistics
August 2016 - May 2017
University of California, Berkeley
Field of study
  • Biostatistics
August 2011 - December 2015
University of California, Berkeley
Field of study
  • Molecular and Cell Biology (Neurobiology), Psychology, Public Health

Publications (68)
Preprint
Modified treatment policies are a widely applicable class of interventions used to study the causal effects of continuous exposures. Approaches to evaluating their causal effects assume no interference, meaning that such effects cannot be learned from data in settings where the exposure of one unit affects the outcome of others, as is common in spa...
Article
Assessment of immune correlates of severe COVID-19 has been hampered by the low numbers of severe cases in COVID-19 vaccine efficacy (VE) trials. We assess neutralizing and binding antibody levels at 4 weeks post-Ad26.COV2.S vaccination as correlates of risk and of protection against severe-critical COVID-19 through 220 days post-vaccination in the...
Preprint
Myalgic Encephalomyelitis (ME; sometimes referred to as chronic fatigue syndrome) is a relatively common and female-biased disease of unknown pathogenesis that profoundly decreases patients' health-related quality-of-life. ME diagnosis is hindered by the absence of robustly-defined and specific biomarkers that are easily measured from available sou...
Article
Background: The first licensed malaria vaccine, RTS,S/AS01E, confers moderate protection against symptomatic disease. Because many malaria infections are asymptomatic, we conducted a large-scale longitudinal parasite genotyping study of samples from a clinical trial exploring how vaccine dosing regimen affects vaccine efficacy. Methods: Between Sept...
Article
Background: Stochastic interventional vaccine efficacy (SVE) analysis is a new approach to correlate of protection (CoP) analysis of a phase III trial that estimates how vaccine efficacy (VE) would change under hypothetical shifts of an immune marker. Methods: We applied nonparametric SVE methodology to the COVE trial of messenger RNA-1273 vs placeb...
Article
The COVE trial randomized participants to receive two doses of mRNA-1273 vaccine or placebo on Days 1 and 29 (D1, D29). Anti-SARS-CoV-2 Spike IgG binding antibodies (bAbs), anti-receptor binding domain IgG bAbs, 50% inhibitory dilution neutralizing antibody (nAb) titers, and 80% inhibitory dilution nAb titers were measured at D29 and D57. We assess...
Article
Globally, 149 million children under 5 years of age are estimated to be stunted (length more than 2 standard deviations below international growth standards)¹,². Stunting, a form of linear growth faltering, increases the risk of illness, impaired cognitive development and mortality. Global stunting estimates rely on cross-sectional surveys, which...
Article
Growth faltering in children (low length for age or low weight for length) during the first 1,000 days of life (from conception to 2 years of age) influences short-term and long-term health and survival¹,². Interventions such as nutritional supplementation during pregnancy and the postnatal period could help prevent growth faltering, but programm...
Article
Sustainable Development Goal 2.2, to end malnutrition by 2030, includes the elimination of child wasting, defined as a weight-for-length z-score that is more than two standard deviations below the median of the World Health Organization standards for child growth¹. Prevailing methods to measure wasting rely on cross-sectional surveys that cannot m...
Article
Longitudinal modified treatment policies (LMTP) have been recently developed as a novel method to define and estimate causal parameters that depend on the natural value of treatment. LMTPs represent an important advancement in causal inference for longitudinal studies as they allow the non-parametric definition and estimation of the joint effect of...
Article
The best assay or marker to define mRNA-1273 vaccine-induced antibodies as a correlate of protection (CoP) is unclear. In the COVE trial, participants received two doses of the mRNA-1273 COVID-19 vaccine or placebo. We previously assessed IgG binding antibodies to the spike protein (spike IgG) or receptor binding domain (RBD IgG) and pseudovirus ne...
Preprint
Heterogeneous treatment effects are driven by treatment effect modifiers, pre-treatment covariates that modify the effect of a treatment on an outcome. Current approaches for uncovering these variables are limited to low-dimensional data, data with weakly correlated covariates, or data generated according to parametric processes. We resolve these i...
Article
In the phase 3 trial of the AZD1222 (ChAdOx1 nCoV-19) vaccine conducted in the U.S., Chile, and Peru, anti-spike binding IgG concentration (spike IgG) and pseudovirus 50% neutralizing antibody titer (nAb ID50) measured four weeks after two doses were assessed as correlates of risk and protection against PCR-confirmed symptomatic SARS-CoV-2 infectio...
Chapter
Targeted Learning is a subfield of statistics that unifies advances in causal inference, machine learning, and statistical theory to help answer scientifically impactful questions with statistical confidence. Targeted Learning is driven by complex problems in data science and has been implemented in a diversity of real‐world scenarios: observationa...
Article
In the PREVENT-19 phase 3 trial of the NVX-CoV2373 vaccine (NCT04611802), anti-spike binding IgG concentration (spike IgG), anti-RBD binding IgG concentration (RBD IgG), and pseudovirus 50% neutralizing antibody titer (nAb ID50) measured two weeks post-dose two are assessed as correlates of risk and as correlates of protection against COVID-19. Ana...
Article
Measuring immune correlates of disease acquisition and protection in the context of a clinical trial is a prerequisite for improved vaccine design. We analysed binding and neutralizing antibody measurements 4 weeks post vaccination as correlates of risk of moderate to severe-critical COVID-19 through 83 d post vaccination in the phase 3, double-bli...
Preprint
About forty years ago, in a now-seminal contribution, Rosenbaum & Rubin (1983) introduced a critical characterization of the propensity score as a central quantity for drawing causal inferences in observational study settings. In the decades since, much progress has been made across several research fronts in causal inference, notably including th...
Article
The covariance matrix plays a fundamental role in many modern exploratory and inferential statistical procedures, including dimensionality reduction, hypothesis testing, and regression. In low-dimensional regimes, where the number of observations far exceeds the number of variables, the optimality of the sample covariance matrix as an estimator of...
Article
Inverse probability weighted estimators are the oldest and potentially most commonly used class of procedures for the estimation of causal effects. By adjusting for selection biases via a weighting mechanism, these procedures estimate an effect of interest by constructing a pseudo‐population in which selection biases are eliminated. Despite their e...
Preprint
In the randomized, placebo-controlled PREVENT-19 phase 3 trial conducted in the U.S. and Mexico of the NVX-CoV2373 adjuvanted, recombinant spike protein nanoparticle vaccine, anti-spike binding IgG concentration (spike IgG) and pseudovirus 50% neutralizing antibody titer (nAb ID50) measured two weeks after two doses were assessed as correlates of r...
Preprint
Continuous treatments have posed a significant challenge for causal inference, both in the formulation and identification of scientifically meaningful effects and in their robust estimation. Traditionally, focus has been placed on techniques applicable to binary or categorical treatments with few levels, allowing for the application of propensity s...
Preprint
Anti-spike IgG binding antibody, anti-receptor binding domain IgG antibody, and pseudovirus neutralizing antibody measurements four weeks post-vaccination were assessed as correlates of risk of moderate to severe-critical COVID-19 outcomes through 83 days post-vaccination and as correlates of protection following a single dose of Ad26.COV2.S COVID-...
Preprint
Longitudinal modified treatment policies (LMTP) have been recently developed as a novel method to define and estimate causal parameters that depend on the natural value of treatment. LMTPs represent an important advancement in causal inference for longitudinal studies as they allow the non-parametric definition and estimation of the joint effect of...
Article
Causal mediation analysis has historically been limited in two important ways: (i) a focus has traditionally been placed on binary exposures and static interventions and (ii) direct and indirect effect decompositions have been pursued that are only identifiable in the absence of intermediate confounders affected by exposure. We present a theoretica...
Article
Several recently developed methods have the potential to harness machine learning in the pursuit of target quantities inspired by causal inference, including inverse weighting, doubly robust estimating equations and substitution estimators like targeted maximum likelihood estimation. There are even more recent augmentations of these procedures that...
Article
Background: Hundreds of thousands of biodigesters have been constructed in Nepal. These household-level systems use human and animal waste to produce clean-burning biogas used for cooking, which can reduce household air pollution from woodburning cookstoves and prevent respiratory illnesses. The biodigesters, typically operated by female caregiver...
Article
Antibody levels predict vaccine efficacy. Symptomatic COVID-19 infection can be prevented by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) vaccines. A “correlate of protection” is a molecular biomarker to measure how much immunity is needed to fight infection and is key for successful global immunization programs. Gilbert et al. dete...
Preprint
We present an end-to-end methodological framework for causal segment discovery that aims to uncover differential impacts of treatments across subgroups of users in large-scale digital experiments. Building on recent developments in causal inference and non/semi-parametric statistics, our approach unifies two objectives: (1) the discovery of user se...
Preprint
Several recently developed methods have the potential to harness machine learning in the pursuit of target quantities inspired by causal inference, including inverse weighting, doubly robust estimating equations and substitution estimators like targeted maximum likelihood estimation. There are even more recent augmentations of these procedures that...
Preprint
Background: In the Coronavirus Efficacy (COVE) trial, estimated mRNA-1273 vaccine efficacy against coronavirus disease-19 (COVID-19) was 94%. SARS-CoV-2 antibody measurements were assessed as correlates of COVID-19 risk and as correlates of protection. Methods: Through case-cohort sampling, participants were selected for measurement of four serum a...
Article
Background: A safe, effective vaccine is essential to eradicating human immunodeficiency virus (HIV) infection. A canarypox–protein HIV vaccine regimen (ALVAC-HIV plus AIDSVAX B/E) showed modest efficacy in reducing infection in Thailand. An analogous regimen using HIV-1 subtype C virus showed potent humoral and cellular responses in a phase 1–2a tr...
Preprint
The covariance matrix plays a fundamental role in many modern exploratory and inferential statistical procedures, including dimensionality reduction, hypothesis testing, and regression. In low-dimensional regimes, where the number of observations far exceeds the number of variables, the optimality of the sample covariance matrix as an estimator of...
Article
Background and Aims: A recent study found that homeless individuals with opioid use disorder (OUD) had a lower risk of relapse on extended‐release naltrexone (XR‐NTX) versus buprenorphine‐naloxone (BUP‐NX), whereas non‐homeless individuals had a lower risk of relapse on BUP‐NX. This secondary study examined differences in mediation pathways to medic...
Article
Interventional effects for mediation analysis were proposed as a solution to the lack of identifiability of natural (in)direct effects in the presence of a mediator-outcome confounder affected by exposure. We present a theoretical and computational study of the properties of the interventional (in)direct effect estimands based on the efficient infl...
Article
The advent and subsequent widespread availability of preventive vaccines has altered the course of public health over the past century. Despite this success, effective vaccines to prevent many high-burden diseases, including HIV, have been slow to develop. Vaccine development can be aided by the identification of immune response markers that serve...
Preprint
Causal mediation analysis has historically been limited in two important ways: (i) a focus has traditionally been placed on binary treatments and static interventions, and (ii) direct and indirect effect decompositions have been pursued that are only identifiable in the absence of intermediate confounders affected by treatment. We present a theoret...
Preprint
Targeted Learning is a subfield of statistics that unifies advances in causal inference, machine learning and statistical theory to help answer scientifically impactful questions with statistical confidence. Targeted Learning is driven by complex problems in data science and has been implemented in a diversity of real-world scenarios: observational...
Preprint
Child growth failure is associated with a higher risk of illness and mortality, which contributed to the United Nations Sustainable Development Goal 2.2 to end malnutrition by 2030. Current prenatal and postnatal interventions, such as nutritional supplementation, have been insufficient to eliminate growth failure in low-resource settings, motivati...
Preprint
Globally 149 million children under five are estimated to be stunted (length more than 2 standard deviations below international growth standards). Stunting, a form of linear growth failure, increases risk of illness, impaired cognitive development, and mortality. Global stunting estimates rely on cross-sectional surveys, which cannot provide direc...
Preprint
Acute malnutrition accounts for an immense disease burden and is implicated as a key, underlying cause of child mortality in low resource settings. Child wasting, defined as weight-for-length more than 2 standard deviations below international standards, is a leading indicator to measure the Sustainable Development Goal target to end malnutrition b...
Article
The widespread availability of high-dimensional biological data has made the simultaneous screening of numerous biological characteristics a central statistical problem in computational biology. While the dimensionality of such datasets continues to increase, the problem of teasing out the effects of biomarkers in studies measuring baseline confoun...
Preprint
Inverse probability weighted estimators are the oldest and potentially most commonly used class of procedures for the estimation of causal effects. By adjusting for selection biases via a weighting mechanism, these procedures estimate an effect of interest by constructing a pseudo-population in which selection biases are eliminated. Despite their e...
Preprint
The advent and subsequent widespread availability of preventive vaccines has altered the course of public health over the past century. Despite this success, effective vaccines to prevent many high-burden diseases, including HIV, have been slow to develop. Vaccine development can be aided by the identification of immune response markers that serve...
Article
Motivation: Statistical analyses of high-throughput sequencing data have re-shaped the biological sciences. In spite of myriad advances, recovering interpretable biological signal from data corrupted by technical noise remains a prevalent open problem. Several classes of procedures, among them classical dimensionality reduction techniques and othe...
Article
Mediation analysis in causal inference has traditionally focused on binary exposures and deterministic interventions, and a decomposition of the average treatment effect in terms of direct and indirect effects. We present an analogous decomposition of the population intervention effect, defined through stochastic interventions on the exposure. Popu...
Preprint
Interventional effects for mediation analysis were proposed as a solution to the lack of identifiability of natural (in)direct effects in the presence of a mediator-outcome confounder affected by exposure. We present a theoretical and computational study of the properties of the interventional (in)direct effect estimands based on the efficient infl...
Preprint
Motivation: Statistical analyses of high-throughput sequencing data have re-shaped the biological sciences. In spite of myriad advances, recovering interpretable biological signal from data corrupted by technical noise remains a prevalent open problem. Several classes of procedures, among them classical dimensionality reduction techniques and others...
Article
Arsenic exposure is a worldwide health concern associated with an increased risk of skin, lung and bladder cancer, but arsenic trioxide (ATO; As(III)) is also an effective chemotherapeutic agent. The current use of As(III) in chemotherapy is limited to acute promyelocytic leukemia (APL). However, As(III) was suggested as a potential therapy for other canc...
Preprint
Mediation analysis in causal inference has traditionally focused on binary treatment regimes and deterministic interventions, and a decomposition of the average treatment effect in terms of direct and indirect effects. In this paper we present an analogous decomposition of the population intervention effect, defined through stochastic inte...
Article
We present methyvim, an R package implementing an algorithm for the nonparametric estimation of the effects of exposures on DNA methylation at CpG sites throughout the genome, complete with straightforward statistical inference for such estimates. The approach leverages variable importance measures derived from statistical parameters arising in ca...
Preprint
The widespread availability of high-dimensional biological data has made the simultaneous screening of many biological characteristics a central problem in computational biology and allied sciences. While the dimensionality of such datasets continues to grow, so too does the complexity of biomarker identification from exposure patterns in health st...
Article
Current statistical inference problems in areas like astronomy, genomics, and marketing routinely involve the simultaneous testing of thousands, even millions, of null hypotheses. For high-dimensional multivariate distributions, these hypotheses may concern a wide range of parameters, with complex and unknown dependence structures among variabl...