Eric J. Tchetgen Tchetgen’s research while affiliated with University of Pennsylvania and other places


Publications (348)


Figure A.1: Empirical coverage of all methods in setting 1. The boxplots represent results obtained from 100 i.i.d. generated datasets, with the red line indicating the target coverage level of 0.9.
Figure A.2: Empirical coverage of all methods in setting 2. The boxplots represent results obtained from 100 i.i.d. generated datasets, with the red line indicating the target coverage level of 0.9.
Figure A.3: Empirical coverage of all methods in setting 3. The boxplots represent results obtained from 100 i.i.d. generated datasets, with the red line indicating the target coverage level of 0.9.
Doubly Robust and Efficient Calibration of Prediction Sets for Censored Time-to-Event Outcomes
  • Preprint
  • File available

January 2025

·

1 Read

Rebecca Farina

·

Arun Kumar Kuchibhotla

·

Eric J. Tchetgen Tchetgen

Our objective is to construct well-calibrated prediction sets for a time-to-event outcome subject to right-censoring with guaranteed coverage. Our approach is inspired by modern conformal inference literature, in that, unlike classical frameworks, we obviate the need for a well-specified parametric or semi-parametric survival model to accomplish our goal. In contrast to existing conformal prediction methods for survival data, which restrict censoring to be of Type I, whereby potential censoring times are assumed to be fully observed on all units in both training and validation samples, we consider the more common right-censoring setting in which either only the censoring time or only the event time of primary interest is directly observed, whichever comes first. Under a standard conditional independence assumption between the potential survival and censoring times given covariates, we propose and analyze two methods to construct valid and efficient lower predictive bounds for the survival time of a future observation. The proposed methods build upon modern semiparametric efficiency theory for censored data, in that the first approach incorporates inverse-probability-of-censoring weighting (IPCW), while the second approach is based on augmented-inverse-probability-of-censoring weighting (AIPCW). For both methods, we formally establish asymptotic coverage guarantees, and demonstrate both via theory and empirical experiments that AIPCW substantially improves efficiency over IPCW in the sense that its coverage error bound is of second-order mixed bias type, that is, doubly robust, and therefore guaranteed to be asymptotically negligible relative to the coverage error of IPCW.
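As a rough illustration of the inverse-probability-of-censoring weighting idea named in the abstract above, the sketch below estimates a survival probability P(T > t) from right-censored data. It is not the authors' prediction-set procedure: it ignores covariates, assumes the censoring survival function is known rather than estimated, and the function name `ipcw_survival` and the exponential simulation are hypothetical choices made only for this example.

```python
import math
import random

def ipcw_survival(times, events, t, censor_surv):
    """IPCW estimate of P(T > t) from right-censored data.

    times:       observed follow-up, min(T, C)
    events:      1 if the event time was observed (T <= C), else 0
    censor_surv: function s -> P(C >= s), assumed known in this sketch
    """
    total = 0.0
    for obs, delta in zip(times, events):
        if delta and obs > t:
            # Up-weight each uncensored unit by 1 / P(still uncensored at obs).
            total += 1.0 / censor_surv(obs)
    return total / len(times)

# Simulated data (assumption): T ~ Exp(rate 1), C ~ Exp(rate 0.5), independent.
rng = random.Random(0)
n = 20000
T = [rng.expovariate(1.0) for _ in range(n)]
C = [rng.expovariate(0.5) for _ in range(n)]
times = [min(a, b) for a, b in zip(T, C)]
events = [1 if a <= b else 0 for a, b in zip(T, C)]

# Naive estimate treats every observed time as an event time.
naive = sum(x > 1.0 for x in times) / n
weighted = ipcw_survival(times, events, 1.0, lambda s: math.exp(-0.5 * s))
truth = math.exp(-1.0)  # P(T > 1) under the simulated model
```

In this simulation the naive estimate targets P(min(T, C) > 1) and so understates survival, while the weighted estimate should land close to `truth`. In practice the censoring distribution would have to be estimated, possibly conditional on covariates.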


Fig. 2: Hypothetical graph of causal mediation pathways.
Real-world effectiveness and causal mediation study of BNT162b2 on long COVID risks in children and adolescents

December 2024

·

8 Reads

·

1 Citation

EClinicalMedicine

Qiong Wu

·

Bingyu Zhang

·

[...]

·

Yong Chen

Background The impact of pre-infection vaccination on the risk of long COVID remains unclear in the pediatric population. We aim to assess the effectiveness of BNT162b2 on long COVID risks with various strains of the SARS-CoV-2 virus in children and adolescents, using comparative effectiveness methods. We further explore if such pre-infection vaccination can mitigate the risk of long COVID beyond its established protective benefits against SARS-CoV-2 infection using causal mediation analysis. Methods We conducted a real-world vaccine effectiveness study and mediation analysis using data from twenty health systems in the RECOVER PCORnet electronic health record (EHR) Program. Three independent cohorts were constructed including adolescents (12–20 years) during the Delta phase (July 1–November 30, 2021), children (5–11 years) and adolescents (12–20 years) during the Omicron phase (January 1–November 30, 2022). The intervention is the first dose of the BNT162b2 vaccine in comparison with no receipt of COVID-19 vaccine. The outcomes of interest include conclusive or probable diagnosis of long COVID following a documented SARS-CoV-2 infection, and body-system-specific condition clusters of post-acute sequelae of SARS-CoV-2 infection (PASC), such as cardiac, gastrointestinal, musculoskeletal, respiratory, and syndromic categories. The effectiveness was reported as (1 − relative risk) × 100 and mediating effects were reported as relative risks. Findings 112,590 adolescents (88,811 vaccinated) were included in the cohort for the analysis against the Delta variant, and 188,894 children (101,277 vaccinated) and 84,735 adolescents (37,724 vaccinated) were included for the analysis against the Omicron variant. During the Delta period, the estimated effectiveness of the BNT162b2 vaccine against long COVID among adolescents was 95.4% (95% CI: 90.9%–97.7%). During the Omicron phase, the estimated effectiveness against long COVID among children was 60.2% (95% CI: 40.3%–73.5%) and 75.1% (95% CI: 50.4%–87.5%) among adolescents. The direct effect of vaccination, defined as the effect beyond its impact on SARS-CoV-2 infections, was found to be statistically non-significant in all three study cohorts, with estimated relative risk of 1.08 (95% CI: 0.75–1.55) in the Delta study among adolescents, 1.24 (95% CI: 0.92–1.66) among children and 0.91 (95% CI: 0.69–1.19) among adolescents in the Omicron studies. Meanwhile, the indirect effects, which are effects through protection against SARS-CoV-2 infection, were estimated as 0.04 (95% CI: 0.03–0.05) among adolescents during the Delta phase, 0.31 (95% CI: 0.23–0.42) among children and 0.21 (95% CI: 0.16–0.27) among adolescents during the Omicron period. Interpretation Our study suggests that BNT162b2 was effective in reducing risk of long COVID outcomes in children and adolescents during the Delta and Omicron periods. The mediation analysis indicates the vaccine’s effectiveness is primarily derived from its role in reducing the risk of SARS-CoV-2 infection. Funding National Institutes of Health.
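The abstract above reports effectiveness as one minus the relative risk, expressed in percent. The sketch below shows that conversion and its inverse, using the Delta-phase adolescent estimate as the worked number. The function names are hypothetical, and the relative risks are back-calculated from the reported percentages rather than taken from the paper.

```python
def vaccine_effectiveness(rr):
    """Effectiveness in percent from a relative risk: (1 - RR) * 100."""
    return (1.0 - rr) * 100.0

def rr_from_ve(ve_percent):
    """Inverse transform: relative risk implied by an effectiveness in percent."""
    return 1.0 - ve_percent / 100.0

def ve_interval(rr_lower, rr_upper):
    """Map a relative-risk interval to an effectiveness interval.

    The transform is decreasing, so the lower and upper bounds swap.
    """
    return vaccine_effectiveness(rr_upper), vaccine_effectiveness(rr_lower)

# Relative risks implied by the reported 95.4% (95% CI: 90.9%-97.7%).
rr_point = rr_from_ve(95.4)                        # about 0.046
rr_low, rr_high = rr_from_ve(97.7), rr_from_ve(90.9)
ve_low, ve_high = ve_interval(rr_low, rr_high)     # recovers about (90.9, 97.7)
```

Because the mapping is decreasing, the upper confidence bound of the relative risk becomes the lower bound of the effectiveness, which is easy to get backwards when converting by hand.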


Using negative controls to identify causal effects with invalid instrumental variables

November 2024

·

3 Reads

·

1 Citation

Biometrika

Many proposals for the identification of causal effects require an instrumental variable that satisfies strong, untestable unconfoundedness and exclusion restriction assumptions. In this paper, we show how one can potentially identify causal effects under violations of these assumptions by harnessing a negative control population or outcome. This strategy allows one to leverage subpopulations for whom the exposure is degenerate, and requires that the instrument-outcome association satisfies a certain parallel trend condition. We develop the semiparametric efficiency theory for a general instrumental variable model, and obtain a multiply robust, locally efficient estimator of the average treatment effect in the treated. The utility of the estimators is demonstrated in simulation studies and an analysis of the Life Span Study.


Predictors of HIV seroconversion in Botswana: machine learning analysis in a representative, population-based HIV incidence cohort

November 2024

·

17 Reads

AIDS (London, England)

Objective To identify predictors of HIV acquisition in Botswana. Design We applied machine learning approaches to identify HIV risk predictors using existing data from a large, well-characterized HIV incidence cohort. Methods We applied machine learning (randomForestSRC) to analyze data from a large population-based HIV incidence cohort enrolled in a cluster-randomized HIV prevention trial in 30 communities across Botswana. We sought to identify the most important risk factors for HIV acquisition, starting with 110 potential predictors. Results During a median 29-month follow-up of 8,551 HIV-negative adults, 147 (1.7%) acquired HIV. Our machine learning analysis found that for females, the most important variables for predicting HIV acquisition were the use of injectable hormonal contraception, frequency of sex in the prior 3 months with the most recent partner and residing in a community with HIV prevalence of 29% or higher. For the small proportion (0.3%) of females who had all three risk factors, their estimated probability of acquiring HIV during 29 months of follow-up was 34% (approximate annual incidence of 14%). For males, non-long-term relationships with the most recent partner and community HIV prevalence of 34% or higher were the most important HIV risk predictors. The 6% of males who had both risk factors had a 5.1% probability of acquiring HIV during the follow-up period (approximate annual incidence of 2.1%). Conclusions Machine learning approaches allowed us to analyze a large number of variables to efficiently identify key factors strongly predictive of HIV risk. These factors could help target HIV prevention interventions in Botswana. Clinical Trials Registration NCT01965470
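The abstract above pairs each 29-month acquisition probability with an approximate annual incidence but does not say how the annual figures were derived. The sketch below assumes simple proportional scaling to 12 months, which happens to reproduce both reported values. The function name is hypothetical.

```python
def approx_annual_incidence(cumulative_risk, follow_up_months):
    """Scale a cumulative risk to a 12-month figure by simple proportion."""
    return cumulative_risk * 12.0 / follow_up_months

# Reported 29-month probabilities: 34% (females with all three risk factors)
# and 5.1% (males with both risk factors).
females = approx_annual_incidence(0.34, 29)    # about 0.14
males = approx_annual_incidence(0.051, 29)     # about 0.021
```

Proportional scaling is only a rough approximation when cumulative risk is high. A constant-hazard conversion would give a somewhat larger annual figure for the 34% group, so the match here should be read as a plausibility check, not as the authors' stated method.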


The Nudge Average Treatment Effect

October 2024

·

8 Reads

The instrumental variable method is a prominent approach to recover, under certain conditions, valid inference about a treatment causal effect even when unmeasured confounding might be present. In a groundbreaking paper, Imbens and Angrist (1994) established that a valid instrument nonparametrically identifies the average causal effect among compliers, also known as the local average treatment effect, under a certain monotonicity assumption which rules out the existence of so-called defiers. An often-cited attractive property of monotonicity is that it facilitates a causal interpretation of the instrumental variable estimand without restricting the degree of heterogeneity of the treatment causal effect. In this paper, we introduce an alternative, equally straightforward and interpretable condition for identification, which accommodates both the presence of defiers and heterogeneous treatment effects. Mainly, we show that under our new conditions, the instrumental variable estimand recovers the average causal effect for the subgroup of units for whom the treatment is manipulable by the instrument, a subgroup which may consist of both defiers and compliers, therefore recovering an effect estimand we aptly call the Nudge Average Treatment Effect.


Regression-Based Proximal Causal Inference

September 2024

·

7 Reads

·

1 Citation

American Journal of Epidemiology

Negative controls are increasingly used to evaluate the presence of potential unmeasured confounding in observational studies. Beyond the use of negative controls to detect the presence of residual confounding, proximal causal inference (PCI) was recently proposed to de-bias confounded causal effect estimates, by leveraging a pair of treatment and outcome negative control or confounding proxy variables. While formal methods for statistical inference have been developed for PCI, these methods can be challenging to implement as they involve solving complex integral equations that are typically ill-posed. We develop a regression-based PCI approach, employing two-stage generalized linear regression models (GLMs) to implement PCI, which obviates the need to solve difficult integral equations. The proposed approach has merit in that (i) it is applicable to continuous, count, and binary outcomes cases, making it relevant to a wide range of real-world applications, and (ii) it is easy to implement using off-the-shelf software for GLMs. We establish the statistical properties of regression-based PCI and illustrate their performance in both synthetic and real-world empirical applications.
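To make the two-stage idea in the abstract above concrete, here is a minimal sketch of the linear special case (identity link, continuous outcome) on simulated data. It is an illustration under stated assumptions, not the authors' software: the data-generating model, the variable names (`U` unmeasured confounder, `A` treatment, `Z` treatment proxy, `W` outcome proxy, `Y` outcome) and the helper `ols` are all hypothetical. Stage 1 regresses the outcome proxy on treatment and the treatment proxy. Stage 2 regresses the outcome on treatment and the stage-1 fitted values.

```python
import random

def ols(columns, y):
    """Least-squares coefficients of y on the columns (intercept added first)."""
    n = len(y)
    X = [[1.0] * n] + [list(c) for c in columns]
    k = len(X)
    # Normal equations (X'X) b = X'y as an augmented k x (k + 1) matrix.
    M = [[sum(a * b for a, b in zip(X[i], X[j])) for j in range(k)]
         + [sum(a * b for a, b in zip(X[i], y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(k):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [v - f * p for v, p in zip(M[r], M[col])]
    return [M[i][k] / M[i][i] for i in range(k)]

# Simulated data (assumption): U confounds A and Y; Z and W are noisy proxies
# of U; Z does not affect Y and A does not affect W. True effect of A is beta.
rng = random.Random(0)
n, beta = 20000, 2.0
U = [rng.gauss(0, 1) for _ in range(n)]
A = [u + rng.gauss(0, 1) for u in U]
Z = [u + rng.gauss(0, 1) for u in U]
W = [u + rng.gauss(0, 1) for u in U]
Y = [beta * a + u + rng.gauss(0, 1) for a, u in zip(A, U)]

naive = ols([A], Y)[1]                       # confounded; near 2.5 here

b0, bA, bZ = ols([A, Z], W)                  # stage 1: W on A and Z
W_hat = [b0 + bA * a + bZ * z for a, z in zip(A, Z)]
proximal = ols([A, W_hat], Y)[1]             # stage 2: Y on A and fitted W
```

Under this simulated model the naive coefficient is biased upward by about 0.5, while the two-stage coefficient on `A` should be close to the true value of 2. Standard errors from the second-stage fit alone would not account for the first stage, so inference needs a separate variance calculation.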


Batch Predictive Inference

September 2024

·

2 Reads

Constructing prediction sets with coverage guarantees for unobserved outcomes is a core problem in modern statistics. Methods for predictive inference have been developed for a wide range of settings, but usually only consider test data points one at a time. Here we study the problem of distribution-free predictive inference for a batch of multiple test points, aiming to construct prediction sets for functions -- such as the mean or median -- of any number of unobserved test datapoints. This setting includes constructing simultaneous prediction sets with a high probability of coverage, and selecting datapoints satisfying a specified condition while controlling the number of false claims. For the general task of predictive inference on a function of a batch of test points, we introduce a methodology called batch predictive inference (batch PI), and provide a distribution-free coverage guarantee under exchangeability of the calibration and test data. Batch PI requires the quantiles of a rank ordering function defined on certain subsets of ranks. While computing these quantiles is NP-hard in general, we show that it can be done efficiently in many cases of interest, most notably for batch score functions with a compositional structure -- which includes examples of interest such as the mean -- via a dynamic programming algorithm that we develop. Batch PI has advantages over naive approaches (such as partitioning the calibration data or directly extending conformal prediction) in many settings, as it can deliver informative prediction sets even using small calibration sample sizes. We illustrate that our procedures provide informative inference across the use cases mentioned above, through experiments on both simulated data and a drug-target interaction dataset.


Figure 1: A possible directed acyclic graph (DAG) of the causal relationship for variables that satisfy the required negative control independence assumptions. Dashed arrows indicate effects that are not required.
Figure 2: Bias (left) and coverage of 95% confidence intervals (right) of three methods for β_A, with c_U = 1 (top), 0.2 (middle) or 0 (bottom).
Regression-based proximal causal inference for right-censored time-to-event data

September 2024

·

24 Reads

Unmeasured confounding is one of the major concerns in causal inference from observational data. Proximal causal inference (PCI) is an emerging methodological framework to detect and potentially account for confounding bias by carefully leveraging a pair of negative control exposure (NCE) and outcome (NCO) variables, also known as treatment and outcome confounding proxies. Although regression-based PCI is well developed for binary and continuous outcomes, analogous PCI regression methods for right-censored time-to-event outcomes are currently lacking. In this paper, we propose a novel two-stage regression PCI approach for right-censored survival data under an additive hazard structural model. We provide theoretical justification for the proposed approach tailored to different types of NCOs, including continuous, count, and right-censored time-to-event variables. We illustrate the approach with an evaluation of the effectiveness of right heart catheterization among critically ill patients using data from the SUPPORT study. Our method is implemented in the open-access R package 'pci2s'.


Figure 3. Mean estimates of HIV-1 transmissions that occurred to recipients in intervention communities and control communities in the BCPP trial from different sources of infection. The barplots show, in increasing shades of blue, the estimated proportions of HIV-1 transmissions to recipients in intervention communities and control communities from individuals in the same community (intervention: 2.9% [95% CI: 0.8–10.4], control: 8.2% [7.6–24.9]), communities in the same trial arm (intervention: 1.9% [0.6–4.4], control: 4.0% [3.1–4.3]), communities in the opposite trial arm (intervention: 5.0% [4.5–5.2], control: 2.2% [0.7–5.2]), and from non-trial communities (intervention: 90.1% [81.1–93.1], control: 85.6% [73.6–90.5]). The mean estimate of the proportion of HIV-1 transmissions to recipients in intervention communities from intervention sources, that is, from individuals in the same intervention community and from individuals in other intervention communities, was 4.9% [95% CI: 1.7–14.4].
Figure 4. Counterfactual estimates of HIV-1 transmissions into BCPP trial communities showing the impact of a nationwide intervention. The grouped barplot shows the estimated number of transmissions to recipients in trial communities in the presence and absence of a nationwide intervention. Among the BCPP trial communities shown, intervention communities are distinguished from control communities with an asterisk. The BCPP trial matched communities into 15 pairs based on geographical proximity to major urban areas, population size and age structure, and access to health services. On average, a nationwide intervention could have reduced transmissions to recipients in trial communities by 59% [95% CI: 3–87]. The "Digawana" intervention community was excluded because there were no successfully sequenced post-baseline samples in the community that met inclusion criteria for phylogenetic analysis.
Unpacking sources of transmission in HIV prevention trials with deep–sequence pathogen data — BCPP/ Ya Tsie study

August 2024

·

38 Reads

To develop effective HIV prevention strategies that can guide public health policy, it is important to identify the main sources of infection in HIV prevention studies. Accordingly, we devised a statistical approach that leverages deep-sequenced (or next-generation sequenced) pathogen data to estimate the relative contribution of different sources of infection in community-randomized trials of infectious disease prevention. We applied this approach to the Botswana Combination Prevention Project (BCPP) and estimated that 90% [95% Confidence Interval (CI): 81–93] of new infections that occurred in individuals in communities that received combination prevention (including universal HIV test-and-treat) originated from individuals residing in communities outside of the trial area. We estimate that the relative impact of the intervention was greater in rural, geographically isolated communities with limited opportunity for imported infections compared to communities neighboring major urban centers. Treating people with HIV limits the spread of infection to uninfected individuals; accordingly, counterfactual modeling scenarios estimated that a nationwide application of the intervention could have reduced transmissions to recipients in trial communities by 59% [3–87], much higher than the observed 30% reduction. Our results suggest that the impact of the BCPP trial intervention was substantially limited by sources of transmission outside the trial area, and that the impact of the intervention could be considerably larger if applied nationally. We recommend that studies of infectious disease prevention consider the impact of sources of transmission beyond the reach of the intervention when designing and evaluating interventions to inform public health programs.



Citations (46)


... In the construction of our COVID-19 positive cohort, we began by identifying individuals who received their first positive COVID-19 PCR, antigen, or serology test and a diagnosis of COVID-19/PASC within the study period from March 1st, 2020, to December 3rd, 2022 (N = 1,017,542). From this initial group, we subsequently filtered for those with at least one medical visit occurring between 28 and 179 days after the index date (follow-up interval) [24][25][26][27] (N = 787,370) and at least one visit within the 7 days to 24 months leading up to the index date (baseline interval) (N = 676,582). We included only the patients with complete variable records (n = 488,606), and we refined the positive cohort with age constraints between five and twenty when the study period starts and complete records (N = 326,074). ...

Reference:

Does SARS-CoV-2 Infection Increase Risk of Neuropsychiatric and Related Conditions? Findings from Difference-in-Differences Analyses
Real-world effectiveness and causal mediation study of BNT162b2 on long COVID risks in children and adolescents

EClinicalMedicine

... In fact, a recent, large population study from the United States has also found a significant role of vaccination in reducing the risk of Long Covid in children. 66 Our study is not without limitations. First, this is a single-center study, although all children with a positive Sars-CoV-2 infection were referred to participate from outpatient family pediatricians and not only from our Institution. ...

Real-world Effectiveness and Causal Mediation Study of BNT162b2 on Long COVID Risks in Children and Adolescents

... In the case of continuous outcomes, under linear models for NCO W and primary outcome Y , Tchetgen Tchetgen et al. 15 establish that the corresponding proximal g-computation algorithm can be implemented by following a two-stage least-squares procedure. Recently, Liu et al. 17 extended the two-stage regression approach to handle binary, polytomous, or count outcomes under a familiar generalized linear model formulation in a point exposure setting. As they demonstrate, the regression-based approach is appealing for routine application of PCI as it circumvents the need to solve certain complicated integral equations typically involved in nonparametric PCI estimation 18 . ...

Regression-Based Proximal Causal Inference
  • Citing Article
  • September 2024

American Journal of Epidemiology

... In addition, under recent developments in the proximal causal inference framework, negative controls are viewed as a version of proxy variables for sources of bias (197). The proximal causal inference framework expands the potential utility of proxy variables in bias correction by acknowledging the presence of different sources of bias with imperfect information (e.g., unmeasured common cause or descendent, unmeasured mediator, censored data) using proxies to make inferences on these latent variables of interest (197)(198)(199). ...

Causal inference with hidden mediators
  • Citing Article
  • July 2024

Biometrika

... Weak IVs can pose problems, and the binary nature of the IV used in the method proposed by Cui and Tchetgen Tchetgen [5] may limit its application [24]. Recently, a new method for solving the optimal ITRs estimation problem under endogeneity has been developed [24], based on the idea of proximal causal inference [16,32]. Proximal causal inference allows causal effect identification based on the observed data through either treatment-inducing confounding proxies or outcome-inducing confounding proxies when the assumption of no unmeasured confounding is violated. ...

A confounding bridge approach for double negative control inference on causal effects
  • Citing Article
  • August 2024

Statistical Theory and Related Fields

... Latent mixture models can still be used if they are deemed reasonable for the specific problem, and alternative identification strategies such as those using proxies of the unmeasured common causes remain useful (see e.g. Tchetgen Tchetgen et al. 2024). ...

An Introduction to Proximal Causal Inference
  • Citing Article
  • August 2024

Statistical Science

... to name a few, Li (2020) and Viviano and Bradic (2023) assume that the observed outcomes {Y_t}_{t≤T_0} and {Y_t}_{t>T_0} are both stationary and Qiu et al. (2024) assume that the unobserved confounders are stationary in pre-intervention and post-intervention periods separately. We relax the mean stationary condition of {F_t}_{t>T_0} in Section 5. ...

Doubly robust proximal synthetic controls

Biometrics

... The proxy literature (Ghassami et al., 2022) extends surrogate models to deal with unobserved confounding, but has the same limitation; it may have arbitrary bias when the RSV is a post-outcome variable. It appears that only one strategy within the proxy literature uses only one auxiliary variable (Park, Richardson and Tchetgen Tchetgen, 2024). However, that strategy provides no guidance on how to deal with incomplete observations in program evaluation. ...

Single proxy control
  • Citing Article
  • March 2024

Biometrics

... There has been exciting progress in the development of methods for MNAR data in recent years. One notable thread of works utilizes auxiliary variables such as instrumental variables [25][26][27] and shadow variables [28,29]. Matrix completion with MNAR data has also received significant attention [30][31][32][33][34]. ...

Identification and Semiparametric Efficiency Theory of Nonignorable Missing Data with a Shadow Variable
  • Citing Article
  • April 2024

... Meanwhile, approaches like the weighted generalized linear regression (WLR), PCA-IVW, and 2ScML (Xue et al., 2023) can handle correlated SNPs without relying on prior distributions, but face challenges with weak IVs. GENIUS-MAWII (Ye et al., 2024) can incorporate correlated weak and invalid IVs. However, it requires access to individual-level data and requires the conditional distribution of the exposure given the genotype to be heteroscedastic to identify the causal effect. ...

GENIUS-MAWII: for robust Mendelian randomization with many weak invalid instruments
  • Citing Article
  • March 2024

Journal of the Royal Statistical Society Series B (Statistical Methodology)