Figure 4 - uploaded by Nicole Bohme Carnegie

# Heat maps of the bias in the estimated treatment effect for all four sensitivity analysis techniques in the case where the treatment and response models are nonlinear. Each cell is the average of 500 simulations with the level of unmeasured confounding given by the x and y axes, expressed in units of the standard deviation of the response variable. Reported biases are averages across all grid cells. 'Abs. bias' is calculated by taking absolute values first, so that overestimation in one region is not offset by underestimation in another.
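The caption's two aggregation rules can be sketched numerically (a minimal illustration with synthetic numbers, not the authors' code): plain bias averages the signed per-cell values, so opposite-signed regions can cancel, while 'abs. bias' takes absolute values per cell first.

```python
import numpy as np

# Hypothetical 5 x 5 grid of per-cell bias estimates: rows and columns index
# the two sensitivity parameters; each cell stands in for the average of
# 500 simulations.
rng = np.random.default_rng(0)
bias_grid = rng.normal(loc=0.0, scale=0.2, size=(5, 5))

# Plain bias: signed average, so overestimation in one region can offset
# underestimation in another.
avg_bias = float(bias_grid.mean())

# 'Abs. bias': absolute value per cell first, so no cancellation occurs.
avg_abs_bias = float(np.abs(bias_grid).mean())
```

By the triangle inequality, `avg_abs_bias` is always at least `|avg_bias|`, which is why reporting both distinguishes systematic from offsetting error.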

## Source publication
Article
Full-text available
When estimating causal effects, unmeasured confounding and model misspecification are both potential sources of bias. We propose a method to simultaneously address both issues in the form of a semi-parametric sensitivity analysis. In particular, our approach incorporates Bayesian Additive Regression Trees into a two-parameter sensitivity analysis s...

## Context in source publication

Context 1
... 3 aggregates simulation results across levels of the sensitivity parameters. In contrast, Figure 4 disaggregates the results by combinations of sensitivity parameters and displays them in the form of a heat map. The closer the color is to blue in a given grid square, the larger the treatment effect estimate; the closer to red, the smaller (greater in negative value) the estimate. ...

## Citations

... The use of synthetically generated datasets, where treatment-outcome associations are known by design and simulated patterns of confounding approximate the observed data structure, has become increasingly popular to help tailor analytic choices for causal inference. 27,28,35,36,39,47-52 Frameworks for generating synthetic datasets have largely been based on approaches that combine real data from the given study with simulated features. The basic idea is to take the observed data structure and use modeled relationships from the original data either to simulate outcome status while leaving both treatment assignment and baseline covariates unchanged, or to simulate both treatment and outcome while leaving only baseline covariates unchanged. ...
Article
Full-text available
The propensity score has become a standard tool to control for large numbers of variables in healthcare database studies. However, little has been written on the challenge of comparing large-scale propensity score analyses that use different methods for confounder selection and adjustment. In these settings, balance diagnostics are useful but do not inform researchers on which variables balance should be assessed or quantify the impact of residual covariate imbalance on bias. Here, we propose a framework to supplement balance diagnostics when comparing large-scale propensity score analyses. Instead of focusing on results from any single analysis, we suggest conducting and reporting results for many analytic choices and using both balance diagnostics and synthetically generated control studies to screen analyses that show signals of bias caused by measured confounding. To generate synthetic datasets, the framework does not require simulating the outcome-generating process. In healthcare database studies, outcome events are often rare, making it difficult to identify and model all predictors of the outcome to simulate a confounding structure closely resembling the given study. Therefore, the framework uses a model for treatment assignment to divide the comparator population into pseudo-treatment groups where covariate differences resemble those in the study cohort. The partially simulated datasets have a confounding structure approximating the study population under the null (synthetic negative control studies). The framework is used to screen analyses that likely violate partial exchangeability due to lack of control for measured confounding. We illustrate the framework using simulations and an empirical example.
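The synthetic negative-control construction described in the abstract can be sketched as follows. The single covariate, the logistic propensity model, and all names below are illustrative assumptions rather than the authors' implementation; the point is that pseudo-treatment groups carved out of the comparator population inherit covariate imbalance while the true effect is null by design.

```python
import numpy as np

rng = np.random.default_rng(1)

# Comparator (untreated) population with one baseline covariate.
n = 10_000
x = rng.normal(size=n)

# Treatment-assignment model (coefficients assumed known here for
# illustration); in practice it would be fitted on the real study cohort
# and used only to score the comparator group.
def propensity(x):
    return 1.0 / (1.0 + np.exp(-(-0.5 + 0.8 * x)))

# Divide the comparator population into pseudo-treatment groups whose
# covariate differences resemble those in the study cohort.
pseudo_t = rng.binomial(1, propensity(x))

# Outcomes come from the comparator data (here simulated) and depend on x
# only, never on pseudo_t: the true pseudo-treatment effect is zero by
# construction (a synthetic negative control).
y = 0.3 * x + rng.normal(size=n)

# Any nonzero difference that survives a candidate adjustment method
# signals residual bias from measured confounding.
naive_diff = float(y[pseudo_t == 1].mean() - y[pseudo_t == 0].mean())
```

The unadjusted contrast `naive_diff` is nonzero only because `x` differs between the pseudo-groups, which is exactly the signal the screening framework looks for.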
... Specific procedures for sensitivity analyses have long been an area of interest in methodological causal inference research, often in the context of violations of the assumption of ignorability/no unmeasured confounding (Lin, Psaty and Kronmal, 1998; Imai, Keele and Yamamoto, 2010; Jo and Vinokur, 2011; Stuart and Jo, 2015; Dorie et al., 2016). Prior work has investigated the sensitivity of non-IV-based causal inference approaches when the exclusion restriction is not satisfied (Millimet and Tchernis, 2013). ...
Article
Estimation of local average treatment effects in randomized trials typically relies upon the exclusion restriction assumption in cases where we are unwilling to rule out the possibility of unmeasured confounding. Under this assumption, treatment effects are mediated through the post-randomization variable being conditioned upon, and directly attributable to neither the randomization itself nor its latent descendants. Recently, there has been interest in mobile health interventions to provide healthcare support. Mobile health interventions such as the Rapid Encouragement/Education and Communications for Health (REACH), designed to support self-management for adults with type 2 diabetes, often involve both one-way and interactive messages. In practice, it is highly likely that any benefit from the intervention is achieved both through receipt of the intervention content and through engagement with/response to it. Application of an instrumental variable analysis in order to understand the role of engagement with REACH (or a similar intervention) requires the traditional exclusion restriction assumption to be relaxed. We propose a conceptually intuitive sensitivity analysis procedure for the REACH randomized trial that places bounds on local average treatment effects. Simulation studies reveal this approach to have desirable finite-sample behavior and to recover local average treatment effects under correct specification of sensitivity parameters.
... When the assumption of no unobserved confounders is called into question, researchers are advised to perform sensitivity analyses, consisting of a formal and systematic assessment of the robustness of their findings against plausible violations of unconfoundedness. The problem of sensitivity analysis has been studied across several disciplines, dating back to, at least, the classical work of Cornfield et al. [1959], with more recent contributions from Rosenbaum and Rubin [1983b], Robins [1999], Frank [2000], Rosenbaum [2002], Imbens [2003], Brumback et al. [2004], Altonji et al. [2005], Hosman et al. [2010], Imai et al. [2010], Vanderweele and Arah [2011], Blackwell [2013], Frank et al. [2013], Dorie et al. [2016], Middleton et al. [2016], Oster [2017], VanderWeele and Ding [2017], Kallus and Zhou [2018], Kallus et al. [2019], Cinelli et al. [2019], Franks et al. [2020], Cinelli and Hazlett [2020a,b], Scharfstein et al. [2021], Jesson et al. [2021], among others. Most of this work, however, either focuses on a specific target estimand of interest (e.g., a causal risk ratio or a causal risk difference), or imposes parametric assumptions on the observed data or on the nature of unobserved confounding (or both). ...
Preprint
Full-text available
We derive general, yet simple, sharp bounds on the size of the omitted variable bias for a broad class of causal parameters that can be identified as linear functionals of the conditional expectation function of the outcome. Such functionals encompass many of the traditional targets of investigation in causal inference studies, such as, for example, (weighted) average of potential outcomes, average treatment effects (including subgroup effects, such as the effect on the treated), (weighted) average derivatives, and policy effects from shifts in covariate distribution -- all for general, nonparametric causal models. Our construction relies on the Riesz-Frechet representation of the target functional. Specifically, we show how the bound on the bias depends only on the additional variation that the latent variables create both in the outcome and in the Riesz representer for the parameter of interest. Moreover, in many important cases (e.g, average treatment effects in partially linear models, or in nonseparable models with a binary treatment) the bound is shown to depend on two easily interpretable quantities: the nonparametric partial $R^2$ (Pearson's "correlation ratio") of the unobserved variables with the treatment and with the outcome. Therefore, simple plausibility judgments on the maximum explanatory power of omitted variables (in explaining treatment and outcome variation) are sufficient to place overall bounds on the size of the bias. Finally, leveraging debiased machine learning, we provide flexible and efficient statistical inference methods to estimate the components of the bounds that are identifiable from the observed distribution.
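To convey the flavor of such partial-R² bounds, here is the classical linear special case (the Cinelli-Hazlett omitted-variable-bias bound, used as a stand-in; the preprint's Riesz-representer construction generalizes this nonparametrically):

```python
import math

def ovb_bound(r2_yu, r2_du, sd_y_resid, sd_d_resid):
    """Bound on the omitted-variable bias of a linear treatment coefficient.

    r2_yu: partial R^2 of the unobserved confounder U with the outcome
           (after partialling out treatment and observed covariates).
    r2_du: partial R^2 of U with the treatment (given observed covariates).
    sd_y_resid, sd_d_resid: residual standard deviations of outcome and
           treatment after partialling out observed covariates.
    Linear special case only; illustrative, not the preprint's estimator.
    """
    if not (0 <= r2_yu < 1 and 0 <= r2_du < 1):
        raise ValueError("partial R^2 values must lie in [0, 1)")
    return (sd_y_resid / sd_d_resid) * math.sqrt(r2_yu * r2_du / (1 - r2_du))

# Example: a confounder explaining 10% of residual outcome variation and
# 5% of residual treatment variation.
b = ovb_bound(0.10, 0.05, sd_y_resid=2.0, sd_d_resid=1.0)
```

As the abstract notes, this is why plausibility judgments about the confounder's maximum explanatory power (the two R² inputs) suffice to bound the bias.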
... Finally, consequences of specific violations of non-testable causal assumptions can be gauged via sensitivity analyses and robustness checks (Ding & VanderWeele, 2016; Dorie, Harada, Carnegie, & Hill, 2016; Franks, D'Amour, & Feller, 2020; Rosenbaum, 2002). ...
Article
Full-text available
Graph-based causal models are a flexible tool for causal inference from observational data. In this paper, we develop a comprehensive framework to define, identify, and estimate a broad class of causal quantities in linearly parametrized graph-based models. The proposed method extends the literature, which mainly focuses on causal effects on the mean level and the variance of an outcome variable. For example, we show how to compute the probability that an outcome variable realizes within a target range of values given an intervention, a causal quantity we refer to as the probability of treatment success. We link graph-based causal quantities defined via the do -operator to parameters of the model implied distribution of the observed variables using so-called causal effect functions. Based on these causal effect functions, we propose estimators for causal quantities and show that these estimators are consistent and converge at a rate of $$N^{-1/2}$$ N - 1 / 2 under standard assumptions. Thus, causal quantities can be estimated based on sample sizes that are typically available in the social and behavioral sciences. In case of maximum likelihood estimation, the estimators are asymptotically efficient. We illustrate the proposed method with an example based on empirical data, placing special emphasis on the difference between the interventional and conditional distribution.
... We performed a sensitivity analysis to evaluate the impact of the choice of hyperparameter in the end-node prior for BART, that is, the choice of k in the bart() function (Dorie et al. 2016). Five-fold cross-validation was used to select the optimal hyperparameter k that minimized the misclassification error. ...
Article
Full-text available
The preponderance of large-scale healthcare databases provides abundant opportunities for comparative effectiveness research. Evidence necessary to making informed treatment decisions often relies on comparing effectiveness of multiple treatment options on outcomes of interest observed in a small number of individuals. Causal inference with multiple treatments and rare outcomes is a subject that has been treated sparingly in the literature. This paper designs three sets of simulations, representative of the structure of our healthcare database study, and proposes causal analysis strategies for such settings. We investigate and compare the operating characteristics of three types of methods and their variants: Bayesian Additive Regression Trees (BART), regression adjustment on multivariate spline of generalized propensity scores (RAMS) and inverse probability of treatment weighting (IPTW) with multinomial logistic regression or generalized boosted models. Our results suggest that BART and RAMS provide lower bias and mean squared error, and the widely used IPTW methods deliver unfavorable operating characteristics. We illustrate the methods using a case study evaluating the comparative effectiveness of robotic-assisted surgery, video-assisted thoracoscopic surgery and open thoracotomy for treating non-small cell lung cancer.
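The cross-validated hyperparameter choice described in the citing snippet above (selecting bart()'s end-node prior scale k by five-fold cross-validation on misclassification error) can be scaffolded generically. BART itself is replaced here by a k-nearest-neighbour stand-in so the sketch stays self-contained; only the cross-validation loop mirrors the described procedure.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy binary classification data (stand-in for the BART outcome model).
n = 300
X = rng.normal(size=(n, 2))
y = (X[:, 0] + 0.5 * rng.normal(size=n) > 0).astype(int)

def knn_predict(X_train, y_train, X_test, k):
    # Simple k-nearest-neighbour majority vote (illustrative model only).
    d = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    idx = np.argsort(d, axis=1)[:, :k]
    return (y_train[idx].mean(axis=1) > 0.5).astype(int)

def cv_misclassification(X, y, k, n_folds=5):
    # n-fold cross-validated misclassification error for hyperparameter k.
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    errs = []
    for fold in folds:
        mask = np.ones(len(y), bool)
        mask[fold] = False
        pred = knn_predict(X[mask], y[mask], X[fold], k)
        errs.append((pred != y[fold]).mean())
    return float(np.mean(errs))

# Pick the hyperparameter minimizing cross-validated misclassification,
# exactly as the cited analysis does for BART's k.
ks = [1, 3, 5, 9, 15]
best_k = min(ks, key=lambda k: cv_misclassification(X, y, k))
```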
... The improvement is also observed for RR-BART, but not for BI-XGB and BI-LASSO. Finally, it has been shown in the literature that the BART model for binary outcomes may be sensitive to the choice of prior for the hyperparameter k (Dorie et al., 2016). We conducted a sensitivity analysis in Web Section 2 of the Supplementary Materials to assess whether the specification for k impacts the performance of BART based methods, BI-BART and RR-BART. ...
Preprint
Full-text available
Prior work has shown that combining bootstrap imputation with tree-based machine learning variable selection methods can recover the good performance achievable on fully observed data when covariate and outcome data are missing at random (MAR). This approach however is computationally expensive, especially on large-scale datasets. We propose an inference-based method RR-BART, that leverages the likelihood-based Bayesian machine learning technique, Bayesian Additive Regression Trees, and uses Rubin's rule to combine the estimates and variances of the variable importance measures on multiply imputed datasets for variable selection in the presence of missing data. A representative simulation study suggests that RR-BART performs at least as well as combining bootstrap with BART, BI-BART, but offers substantial computational savings, even in complex conditions of nonlinearity and nonadditivity with a large percentage of overall missingness under the MAR mechanism. RR-BART is also less sensitive to the end-node prior via the hyperparameter $k$ than BI-BART, and does not depend on the selection threshold value $\pi$ as required by BI-BART. Our simulation studies also suggest that encoding the missing values of a binary predictor as a separate category significantly improves the power of selecting the binary predictor for BI-BART. We further demonstrate the methods via a case study of risk factors for 3-year incidence of metabolic syndrome with data from the Study of Women's Health Across the Nation.
... Evaluation metrics include:
- F1 score: combines both precision and recall; a good F1 score means both false positives and false negatives are low.
- Absolute bias estimate (specific to causal inference): sensitivity to unmeasured confounding; in treatSens, the estimate of the unmeasured confounder required to render the effect of the putative cause zero ("Coeff. on U" in Dorie et al. 2016).
- Point estimate, difference in proportions: effect estimate comparing two treatments, e.g. in BART and other algorithms used for causal inference (see e.g. Keele & Small, 2021); reported with 95% confidence intervals.
- Adjusted risk difference: evaluation of the effect of a candidate cause; linked to the average treatment effect in TMLE (e.g. in Bodnar et al. 2020); reported with 95% confidence intervals. ...
... In the absence of ignorability (no unmeasured confounders), sensitivity to unmeasured confounding may severely limit the generalizability of the study findings. The treatSens package estimates the magnitude of an unmeasured confounder that would be necessary to nullify the association between a treatment and the outcome; however, domain knowledge is needed in this analysis (Dorie et al., 2016). In contexts with limited (or improperly realized) randomization, unbalanced distributions of covariates may be biasing the findings. ...
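The quantity treatSens reports can be mimicked in a toy linear setting (a sketch of the idea only, not the package's semiparametric algorithm): taking the sensitivity parameters to be the coefficients of a standardized unmeasured confounder U on treatment and outcome, the omitted-variable offset has a closed form, so one can solve for the confounding strength that drives the adjusted estimate to zero.

```python
import numpy as np

rng = np.random.default_rng(3)

# Data generated with a hidden confounder U; the analyst observes only t, y.
n = 50_000
u = rng.normal(size=n)
t = 0.6 * u + rng.normal(size=n)             # treatment loads on U
y = 0.0 * t + 0.5 * u + rng.normal(size=n)   # true treatment effect is zero

# Naive regression slope of y on t, biased away from zero by U.
naive = float(np.cov(t, y)[0, 1] / t.var())

# For assumed sensitivity parameters (zeta_t, zeta_y), the coefficients of
# a standardized U on treatment and outcome, the omitted-variable offset in
# this linear setting is zeta_t * zeta_y / var(t). Setting zeta_t == zeta_y
# == zeta and solving naive == offset gives the nullifying strength:
zeta_null = float(np.sqrt(naive * t.var()))

# Adjusting by that offset drives the estimate to zero by construction.
adjusted = naive - zeta_null ** 2 / t.var()
```

Here `zeta_null` plays the role of treatSens's "confounding needed to nullify the effect"; in the package this is found under far weaker modeling assumptions, which is where the domain knowledge comes in.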
Preprint
Full-text available
The uptake of machine learning (ML) approaches in the social and health sciences has been rather slow, and research using ML for social and health research questions remains fragmented. This may be due to the separate development of research in the computational/data versus social and health sciences as well as a lack of accessible overviews and adequate training in ML techniques for non-data-science researchers. This paper provides a meta-mapping of research questions in the social and health sciences to appropriate ML approaches, by incorporating the necessary requirements to statistical analysis in these disciplines. We map the established classification into description, prediction, and causal inference to common research goals, such as estimating prevalence of adverse health or social outcomes, predicting the risk of an event, and identifying risk factors or causes of adverse outcomes. This meta-mapping aims at overcoming disciplinary barriers and starting a fluid dialogue between researchers from the social and health sciences and methodologically trained researchers. Such mapping may also help to fully exploit the benefits of ML while considering domain-specific aspects relevant to the social and health sciences, and hopefully contribute to the acceleration of the uptake of ML applications to advance both basic and applied social and health sciences research.
... In that setting, sensitivity analysis strives to assess how far confounding would affect the conclusion of a study (for example, whether the ATE would have a different sign given such a hidden confounder). Such approaches date back to a study on the effect of smoking on lung cancer (Cornfield et al., 1959), and have been further developed for both parametric (Imbens, 2003; Rosenbaum, 2005; Dorie et al., 2016; Ichino et al., 2008; Cinelli and Hazlett, 2020) and semi-parametric settings (Franks et al., 2019; Veitch and Zaveri, 2020). Typically, the analysis translates expert judgment into a mathematical expression of how much the confounding affects treatment assignment and the outcome, and finally how much the estimated treatment effect is biased (giving, for example, bounds). ...
Preprint
Full-text available
While a randomized controlled trial (RCT) readily measures the average treatment effect (ATE), this measure may need to be shifted to generalize to a different population. Standard estimators of the target population treatment effect are based on the distributional shift in covariates, using inverse propensity sampling weighting (IPSW) or modeling response with the g-formula. However, these need covariates that are available both in the RCT and in an observational sample, which often qualifies very few of them. Here we analyze how the classic estimators behave when covariates are missing in at least one of the two datasets - RCT or observational. In line with general identifiability conditions, these estimators are consistent when including only treatment effect modifiers that are shifted in the target population. We compute the expected bias induced by a missing covariate, assuming Gaussian covariates and a linear model for the conditional ATE function. This enables sensitivity analysis for each missing covariate pattern. In addition, this method is particularly useful as it gives the sign of the expected bias. We also show that there is no gain in imputing a partially unobserved covariate. Finally we study the replacement of a missing covariate by a proxy, and the impact of imputation. We illustrate all these results on simulations, as well as semi-synthetic benchmarks using data from the Tennessee Student/Teacher Achievement Ratio (STAR), and with a real-world example from the critical care medical domain.
... Rosenbaum and Rubin [1983a] developed a methodology that handles low-dimensional measured covariates, binary treatment, binary outcome, and a binary unmeasured confounder. This approach has been extended to accommodate normally distributed outcomes [Imbens, 2003], continuous treatments and a normally distributed unmeasured confounder [Carnegie et al., 2016], and a semiparametric Bayesian approach when the treatment and unmeasured confounder are binary [Dorie et al., 2016]. ...
Preprint
Full-text available
Establishing cause-effect relationships from observational data often relies on untestable assumptions. It is crucial to know whether, and to what extent, the conclusions drawn from non-experimental studies are robust to potential unmeasured confounding. In this paper, we focus on the average causal effect (ACE) as our target of inference. We build on the work of Franks et al. (2019) and Robins (2000) by specifying non-identified sensitivity parameters that govern a contrast between the conditional (on measured covariates) distributions of the outcome under treatment (control) between treated and untreated individuals. We use semiparametric theory to derive the non-parametric efficient influence function of the ACE, for fixed sensitivity parameters. We use this influence function to construct a one-step bias-corrected estimator of the ACE. Our estimator depends on semiparametric models for the distribution of the observed data; importantly, these models do not impose any restrictions on the values of sensitivity analysis parameters. We establish sufficient conditions ensuring that our estimator has root-n asymptotics. We use our methodology to evaluate the causal effect of smoking during pregnancy on birth weight. We also evaluate the performance of the estimation procedure in a simulation study.
... The sensitivity parameters encode the relationship between both the treatment and unobserved confounders and the outcome and unobserved confounders (e.g. see Imbens, 2003; Dorie et al., 2016). Latent confounder models are usually parameterized so that specific values of the sensitivity parameters ψ indicate the "no unobserved confounding" case. ...
Preprint
Full-text available
Recent work has focused on the potential and pitfalls of causal identification in observational studies with multiple simultaneous treatments. On the one hand, a latent variable model fit to the observed treatments can identify essential aspects of the distribution of unobserved confounders. On the other hand, it has been shown that even when the latent confounder distribution is known exactly, causal effects are still not point identifiable. Thus, the practical benefits of latent variable modeling in multi-treatment settings remain unclear. We clarify these issues with a sensitivity analysis method that can be used to characterize the range of causal effects that are compatible with the observed data. Our method is based on a copula factorization of the joint distribution of outcomes, treatments, and confounders, and can be layered on top of arbitrary observed data models. We propose a practical implementation of this approach making use of the Gaussian copula, and establish conditions under which causal effects can be bounded. We also describe approaches for reasoning about effects, including calibrating sensitivity parameters, quantifying robustness of effect estimates, and selecting models which are most consistent with prior hypotheses.