
Variable selection and estimation in causal inference using Bayesian spike and slab priors

SAGE Publications Inc
Statistical Methods in Medical Research

Abstract

Unbiased estimation of causal effects with observational data requires adjustment for confounding variables that are related to both the outcome and treatment assignment. Standard variable selection techniques aim to maximize predictive ability of the outcome model, but they ignore covariate associations with treatment and may not adjust for important confounders that are only weakly associated with the outcome. We propose a novel method for estimating causal effects that simultaneously considers models for both outcome and treatment, which we call the bilevel spike and slab causal estimator (BSSCE). By using a Bayesian formulation, BSSCE estimates the posterior distribution of all model parameters and provides straightforward and reliable inference. Spike and slab priors are placed on each covariate coefficient and are chosen to minimize the mean squared error of the treatment effect estimator. Theoretical properties of the treatment effect estimator are derived, justifying the prior used in BSSCE. Simulations show that BSSCE can substantially reduce mean squared error relative to numerous methods and performs especially well with large numbers of covariates, including situations where the number of covariates is greater than the sample size. We illustrate BSSCE by estimating the causal effect of vasoactive therapy vs. fluid resuscitation on hypotensive episode length for patients in the Multiparameter Intelligent Monitoring in Intensive Care III critical care database.
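The central device in the abstract, a spike and slab prior on each covariate coefficient, can be illustrated with a minimal one-covariate sketch. This is not the authors' BSSCE prior; the spike/slab widths, the 50/50 mixture weight, and the toy data below are all invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: one covariate with a genuinely nonzero effect on the outcome.
n = 50
x = rng.normal(size=n)
sigma = 1.0
y = 1.0 * x + rng.normal(scale=sigma, size=n)

def log_marginal(y, x, tau, sigma):
    """log p(y) when beta ~ N(0, tau^2), computed by numerical integration."""
    grid = np.linspace(-5.0, 5.0, 4001)               # fine grid over beta
    resid = y[:, None] - x[:, None] * grid[None, :]
    loglik = (-0.5 * (resid ** 2).sum(axis=0) / sigma ** 2
              - 0.5 * len(y) * np.log(2 * np.pi * sigma ** 2))
    logprior = -0.5 * grid ** 2 / tau ** 2 - 0.5 * np.log(2 * np.pi * tau ** 2)
    w = loglik + logprior
    m = w.max()                                        # log-sum-exp for stability
    return m + np.log(np.exp(w - m).sum() * (grid[1] - grid[0]))

log_spike = log_marginal(y, x, tau=0.01, sigma=sigma)  # "spike": mass near zero
log_slab = log_marginal(y, x, tau=1.0, sigma=sigma)    # "slab": diffuse
# Posterior probability of the slab component under a 50/50 prior mixture.
pip = 1.0 / (1.0 + np.exp(log_spike - log_slab))
print(f"posterior inclusion probability: {pip:.3f}")
```

With a genuinely nonzero coefficient, the marginal likelihood of the diffuse slab dominates the narrow spike, so the posterior inclusion probability is close to 1; for a null coefficient the spike would win and the coefficient would be shrunk toward zero.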
Article
Brandon Koch,1 David M Vock,2 Julian Wolfson2 and Laura Boehm Vock3
Keywords
Bayesian methods, causal inference, high-dimensional data, spike and slab, variable selection
1 Introduction
Inferring the causal effect of a treatment, exposure, or intervention (hereafter referred to as “treatment”) on some outcome or response is often the primary goal of a study. Randomizing treatment assignment is the gold standard for estimating causal treatment effects but is unethical, infeasible, or not cost-effective in many situations. When treatment is not randomized, confounding variables, i.e. those associated with both treatment and outcome, can induce bias if unaccounted for in the treatment effect estimator.1 There are many ways to adjust for confounding variables. G-computation treatment effect estimation2 uses a model for the outcome as a function of treatment and covariates to adjust for confounding. Alternatively, many approaches use only a model for the treatment as a function of covariates, including inverse-probability weighting3 and propensity score matching.4 Moreover, some methods postulate models for both the outcome and treatment and are doubly robust (e.g. augmented inverse-probability weighting,5 targeted maximum likelihood estimation,6 and model averaged
1 School of Community Health Sciences, University of Nevada, Reno, USA
2 Division of Biostatistics, University of Minnesota, Minneapolis, USA
3 Department of Mathematics, Computer Science, and Statistics, Gustavus Adolphus College, St. Peter, USA
Corresponding author: Brandon Koch, School of Community Health Sciences, University of Nevada, Reno, NV 89557, USA. Email: bkoch@unr.edu
Statistical Methods in Medical Research
2020, Vol. 29(9) 2445–2469
© The Author(s) 2020
Article reuse guidelines: sagepub.com/journals-permissions
DOI: 10.1177/0962280219898497
journals.sagepub.com/home/smm
... Still, it is highly desirable to try to adjust for confounding in our statistical models to the best of our ability. This is termed (perhaps somewhat unfortunately) causal inference in the literature [1,2,3,4,5,6,7,8,9,10]. This is also the approach that we will follow here, under the disclaimer that whether actual causality can be inferred remains a subject of interpretation and conjecture specific to the situation being studied. ...
... In this paper, we take inspiration from the approach of Koch et al. [10], who proposed a bi-level spike and slab prior for causal effect estimation in high dimensional problems (i.e. when the number of predictors is larger than the number of observations). They considered a data-driven adaptive approach to propose their prior, which reduces the variance of the causal estimate. ...
... We use these studies to compare our method with three other approaches. From now on, for the sake of illustration, we use the following acronyms: RBCE for robust Bayesian causal estimation (our method); SSCE for spike and slab causal estimation [10]; BSSCE for bi-level spike and slab causal estimation [10]; and BSSL for Bayesian spike and slab LASSO [20]. ...
Preprint
Full-text available
Causal effect estimation is a critical task in statistical learning that aims to find the causal effect on subjects by identifying causal links between a number of predictor (or explanatory) variables and the outcome of a treatment. In a regression framework, we assign a treatment and outcome model to estimate the average causal effect. Additionally, for high dimensional regression problems, variable selection methods are used to find a subset of predictor variables that maximises the predictive performance of the underlying model for better estimation of the causal effect. In this paper, we propose a different approach. We focus on the variable selection aspects of the high dimensional causal estimation problem. We suggest a cautious Bayesian group LASSO (least absolute shrinkage and selection operator) framework for variable selection using prior sensitivity analysis. We argue that in some cases abstaining from selecting (or rejecting) a predictor is beneficial, and that we should gather more information to obtain a more decisive result. We also show that for problems with very limited information, expert-elicited variable selection can give us a more stable causal effect estimation, as it avoids overfitting. Lastly, we carry out a comparative study with a synthetic dataset and show the applicability of our method in real-life situations.
... In this paper we take inspiration from the approach of Koch et al. [11], who proposed a bi-level spike and slab prior for causal effect estimation. They considered a data-driven adaptive approach to propose their prior which reduces the variance of the causal estimate. ...
... We present our analyses in Table 1 and Table 2. For the sake of clarity we use the following acronyms: RBCE for robust Bayesian causal estimation (our method); SSCE for spike and slab causal estimation [11]; BSSCE for bi-level spike and slab causal estimation [11]; and BSSL for Bayesian spike and slab lasso [15]. As can be seen from both tables, SSCE and BSSCE are formulated for problems where p ≤ n, and therefore we do not have any results for n < 50. ...
Chapter
Causal inference concerns finding the treatment effect on subjects along with causal links between the variables and the outcome. However, the underlying heterogeneity between subjects makes the problem practically unsolvable. Additionally, we often need to find a subset of explanatory variables to understand the treatment effect. Currently, variable selection methods tend to maximise the predictive performance of the underlying model, and unfortunately, under limited data, the predictive performance is hard to assess, leading to harmful consequences. To address these issues, in this paper, we consider a robust Bayesian analysis which accounts for abstention in selecting explanatory variables in the high dimensional regression model. To achieve that, we consider a set of spike and slab priors through prior elicitation to obtain a set of posteriors for both the treatment and outcome model. We are specifically interested in the sensitivity of the treatment effect in high dimensional causal inference as well as identifying confounder variables. However, confounder selection can be deceptive in this setting, especially when a predictor is strongly associated with either the treatment or the outcome. To avoid that we apply a post-hoc selection scheme, attaining a smaller set of confounders as well as separate sets of variables which are only related to treatment or outcome model. Finally, we illustrate our method to show its applicability.
... In this paper we take inspiration from the approach of Koch et al. [8], who proposed a bi-level spike and slab prior for causal effect estimation. They considered a data-driven adaptive approach to propose their prior which reduces the variance of the causal estimate. ...
... We present our analyses in Table 1 and Table 2. For the sake of clarity we use the following acronyms: RBCE for robust Bayesian causal estimation (our method); SSCE for spike and slab causal estimation [8]; BSSCE for bi-level spike and slab causal estimation [8]; and BSSL for Bayesian spike and slab lasso [17]. As can be seen from both tables, SSCE and BSSCE are formulated for problems where p ≤ n, and therefore we do not have any results for n < 50. ...
Preprint
Full-text available
Causal inference using observational data is an important aspect of many fields such as epidemiology, social science, and economics. In particular, our goal is to find the treatment effect on the subjects along with the causal links between the variables and the outcome. However, estimation for such problems is extremely difficult, as the treatment effects may vary from subject to subject, and modelling the underlying heterogeneity explicitly makes the problem practically unsolvable. Another issue we often face is the dimensionality of the problem: we need to find a subset of explanatory variables to initiate the treatment. However, current variable selection methods tend to maximise the predictive performance of the outcome model only. This can be problematic in the case of limited information, as the consequence of mistreatment can be harmful. So, in this paper, we suggest a general framework with robust Bayesian analysis which accounts for abstention in deciding on an explanatory variable in the high dimensional regression model. To achieve that, we consider a set of spike and slab priors through prior elicitation to obtain robust estimates for both the treatment and outcome model. We are specifically interested in the sensitivity of the treatment effect in high dimensional causal inference, as well as in identifying the confounder variables by means of variable selection. However, indicator-based confounder selection can be deceptive in some cases, especially when a predictor is strongly associated with either the treatment or the outcome, which increases the posterior expectation of the selection indicators. To avoid that, we apply a post-hoc selection scheme which successfully removes negligible non-zero effects from the model, attaining a smaller set of confounders. Finally, we illustrate our result using a synthetic dataset.
... Ertefaie et al. 7 developed a penalized objective function that employs both the outcome and treatment models to perform variable selection. Assuming spike and slab priors for the covariate coefficients, Koch et al. 8 explored a Bayesian method to estimate causal effects with the outcome and treatment models employed simultaneously. Ghosh et al. 9 proposed the "multiply impute, then select" approach employing the lasso method. ...
... Let X*_{Ii}(k, ψ) denote the subvector of X*_i(k, ψ) corresponding to X*_{Ii}, generated from Step 1. Using the selected treatment model (8) with X_{Ii} replaced by X*_{Ii}(k, ψ), we calculate the fitted value π_i(k, ψ) ≜ g(X*_{Ii}(k, ψ), Z_{Ii}; γ_I). Then we obtain an estimate, say τ(k, ψ), of τ_0 using (2) with π_i replaced by π_i(k, ψ), and calculate ...
... When V is equal to Σ^{-1}, the covariance matrix of γ_I is no greater, in the Loewner order, than that of the subvector of the SIMEX estimator γ corresponding to γ_I. Theorem 3.1(a) establishes the asymptotic distribution of the estimators of the effects corresponding to important pre-treatment variables in model (3), or equivalently, of the parameters of the selected treatment model (8). Theorem 3.1(b) ensures the oracle property in the sense of Fan and Li 23 for the variable selection procedure used to build the final treatment model (8). ...
Article
Full-text available
In the framework of causal inference, the inverse probability weighting estimation method and its variants have been commonly employed to estimate the average treatment effect. Such methods, however, are challenged by the presence of irrelevant pre-treatment variables and measurement error. Ignoring these features and naively applying the usual inverse probability weighting estimation procedures may typically yield biased inference results. In this article, we develop an inference method for estimating the average treatment effect with those features taken into account. We establish theoretical properties for the resulting estimator and carry out numerical studies to assess the finite sample performance of the proposed estimator.
... In recent years, several Bayesian linear models have been proposed for genetic studies using the SNP data, including Bayesian sparse linear mixed models, Bayesian spike-and-slab regression models, and Bayesian variable selection models [7]. These models have been used to identify genetic variants associated with complex traits, predict the traits using the SNP data, and identify genetic pathways involved in disease pathogenesis. ...
... In the literature, variable selection for causal inference (e.g., [14][15][16]) or measurement error correction (e.g., [17][18][19][20][21][22]) are discussed under various settings. However, in the concurrent presence of both features, limited work has been carried out to estimate ATE except for [23]. ...
Article
Full-text available
In causal inference, the estimation of the average treatment effect is often of interest. For example, in cancer research, an interesting question is to assess the effects of the chemotherapy treatment on cancer, with the information of gene expressions taken into account. Two crucial challenges in this analysis involve addressing measurement error in gene expressions and handling noninformative gene expressions. While analytical methods have been developed to address those challenges, no user-friendly computational software packages seem to be available to implement those methods. To close this gap, we develop an R package, called AteMeVs, to estimate the average treatment effect using the inverse-probability-weighting estimation method to handle data with both measurement error and spurious variables. This developed package accommodates the method proposed by Yi and Chen (2023) as a special case, and further extends its application to a broader scope. The usage of the developed R package is illustrated by applying it to analyze a cancer dataset with information of gene expressions.
Article
In this paper, we study causal inference with complex and noisy data. We propose a new framework called CHEMIST, which refers to Causal inference with High-dimensional Error-prone covariates and MISclassified Treatments. To suitably tackle those challenges when estimating the average treatment effect (ATE), we develop the FATE method, which reflects Feature screening, Adaptive lasso, Treatment adjustment, and Error elimination in covariates, to handle variable selection and measurement error correction. With informative and error-eliminated data, we can estimate the ATE. To make our strategy available for public use, we develop a new R package, CHEMIST, which provides functions for users to estimate the ATE. With its flexible arguments, one can examine different scenarios using our package. In this paper, we introduce the FATE method and its implementation in the R package CHEMIST. Moreover, we demonstrate applications to two real data sets.
Article
Purpose: Nonlinear system identification heavily relies on the accuracy of nonlinear unit model selection. To improve identification accuracy, the Sparse Bayesian Learning method is incorporated into the nonlinear subspace, and an enhanced nonlinear subspace identification is proposed.
Methods: The nonlinear term in the system is treated as an internal excitation. By applying low-level excitation, the response of the structure can be approximated as linear, allowing for the determination of the linear frequency response function of the structure. High-level excitation is then applied to separate the response caused by intrinsic nonlinear force excitation. The type of nonlinearity is evaluated using spike-and-slab priors for Sparse Bayesian Learning. Finally, the screened nonlinear elements are substituted into subspace identification to determine nonlinear parameters.
Results: The effectiveness of this method in dealing with nonlinear stiffness and damping is verified through a simulation example, and its robustness is further discussed. Experiments on negative stiffness systems also demonstrate the method's good applicability when dealing with complex damping.
Conclusion: Incorporating the Sparse Bayesian Learning method into the nonlinear subspace significantly improves the accuracy of nonlinear system identification. The proposed approach effectively deals with nonlinear stiffness and damping, as validated by simulation results. The method's robustness is further demonstrated through extensive discussions, while experiments on negative stiffness systems showcase its applicability in complex damping scenarios.
Article
Full-text available
Methodological advancements, including propensity score methods, have resulted in improved unbiased estimation of treatment effects from observational data. Traditionally, a "throw in the kitchen sink" approach has been used to select covariates for inclusion into the propensity score, but recent work shows that including unnecessary covariates can impact both the bias and statistical efficiency of propensity score estimators. In particular, the inclusion of covariates that impact exposure but not the outcome can inflate standard errors without improving bias, while the inclusion of covariates associated with the outcome but unrelated to exposure can improve precision. We propose the outcome-adaptive lasso for selecting appropriate covariates for inclusion in propensity score models to account for confounding bias while maintaining statistical efficiency. This proposed approach can perform variable selection in the presence of a large number of spurious covariates, that is, covariates unrelated to outcome or exposure. We present theoretical and simulation results indicating that the outcome-adaptive lasso selects the propensity score model that includes all true confounders and predictors of outcome, while excluding other covariates. We illustrate covariate selection using the outcome-adaptive lasso, including comparison to alternative approaches, using simulated data and in a survey of patients using opioid therapy to manage chronic pain.
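The two-step logic of outcome-adaptive penalization can be sketched with a numpy-only weighted lasso, using linear models as stand-ins for the paper's logistic propensity model; the weighted_lasso helper, the weight exponent, and all constants below are illustrative rather than the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(2)

def weighted_lasso(X, y, lam, w, n_iter=200):
    """Coordinate descent for 0.5*||y - X b||^2 + lam * sum_j w_j |b_j|."""
    p = X.shape[1]
    b = np.zeros(p)
    col_ss = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(p):
            r = y - X @ b + X[:, j] * b[j]          # partial residual
            z = X[:, j] @ r
            b[j] = np.sign(z) * max(abs(z) - lam * w[j], 0.0) / col_ss[j]
    return b

# x1 is a confounder (drives treatment and outcome); x2 predicts treatment only.
n = 2000
x1, x2 = rng.normal(size=n), rng.normal(size=n)
a = (0.8 * x1 + 0.8 * x2 + rng.normal(size=n) > 0).astype(float)
y = 1.0 * a + 2.0 * x1 + rng.normal(size=n)

# Step 1: outcome regression supplies penalty weights w_j = 1/|coef_j|^2,
# so covariates unrelated to the outcome (x2) are penalized heavily.
D = np.column_stack([np.ones(n), a, x1, x2])
coef = np.linalg.lstsq(D, y, rcond=None)[0]
w = 1.0 / np.abs(coef[2:]) ** 2

# Step 2: the penalized treatment model keeps the confounder and drops x2.
X = np.column_stack([x1, x2])
b = weighted_lasso(X, a, lam=20.0, w=w)
print(b)  # x1 coefficient nonzero, x2 coefficient exactly 0
```

Because x2 predicts treatment but not the outcome, its outcome-model coefficient is near zero, its penalty weight is enormous, and it is excluded from the treatment model, the behaviour the abstract argues improves efficiency.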
Article
Full-text available
This paper investigates the use of regularization priors in the context of treatment effect estimation using observational data where the number of control variables is large relative to the number of observations. First, the phenomenon of "regularization-induced confounding" is introduced, which refers to the tendency of regularization priors to adversely bias treatment effect estimates by over-shrinking control variable regression coefficients. Then, a simultaneous regression model is presented which permits regularization priors to be specified in a way that avoids this unintentional "re-confounding". The new model is illustrated on synthetic and empirical data.
Article
Full-text available
MIMIC-III (‘Medical Information Mart for Intensive Care’) is a large, single-center database comprising information relating to patients admitted to critical care units at a large tertiary care hospital. Data includes vital signs, medications, laboratory measurements, observations and notes charted by care providers, fluid balance, procedure codes, diagnostic codes, imaging reports, hospital length of stay, survival data, and more. The database supports applications including academic and industrial research, quality improvement initiatives, and higher education coursework.
Article
Full-text available
In the causal adjustment setting, variable selection techniques based on one of either the outcome or treatment allocation model can result in the omission of confounders, which leads to bias, or the inclusion of spurious variables, which leads to variance inflation, in the propensity score. We propose a variable selection method based on a penalized objective function which considers the outcome and treatment assignment models simultaneously. The proposed method facilitates confounder selection in high-dimensional settings. We show that under regularity conditions our method attains the oracle property. The selected variables are used to form a doubly robust regression estimator of the treatment effect. Simulation results are presented and economic growth data are analyzed. Specifically, we study the effect of life expectancy as a measure of population health on the average growth rate of gross domestic product per capita.
Article
We propose a new method for estimation in linear models. The ‘lasso’ minimizes the residual sum of squares subject to the sum of the absolute value of the coefficients being less than a constant. Because of the nature of this constraint it tends to produce some coefficients that are exactly 0 and hence gives interpretable models. Our simulation studies suggest that the lasso enjoys some of the favourable properties of both subset selection and ridge regression. It produces interpretable models like subset selection and exhibits the stability of ridge regression. There is also an interesting relationship with recent work in adaptive function estimation by Donoho and Johnstone. The lasso idea is quite general and can be applied in a variety of statistical models: extensions to generalized regression models and tree‐based models are briefly described.
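The mechanism behind the exact zeros described above can be seen in closed form: under an orthonormal design, the lasso solution is the soft-thresholded least-squares estimate (the numbers below are illustrative):

```python
import numpy as np

def soft_threshold(z, lam):
    """Closed-form lasso solution per coefficient under an orthonormal design."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

ols = np.array([3.0, 0.4, -1.2, 0.05])   # hypothetical least-squares estimates
print(soft_threshold(ols, lam=0.5))      # small coefficients become exactly 0
```

Coefficients smaller in magnitude than the penalty are set exactly to zero while large ones are shrunk by a constant, which is why the lasso yields interpretable sparse models where ridge regression does not.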
Article
The efficiency of doubly robust estimators of the average causal effect (ACE) of a treatment can be improved by including in the treatment and outcome models only those covariates which are related to both treatment and outcome (i.e., confounders) or related only to the outcome. However, it is often challenging to identify such covariates among the large number that may be measured in a given study. In this article, we propose GLiDeR (Group Lasso and Doubly Robust Estimation), a novel variable selection technique for identifying confounders and predictors of outcome using an adaptive group lasso approach that simultaneously performs coefficient selection, regularization, and estimation across the treatment and outcome models. The selected variables and corresponding coefficient estimates are used in a standard doubly robust ACE estimator. We provide asymptotic results showing that, for a broad class of data generating mechanisms, GLiDeR yields a consistent estimator of the ACE when either the outcome or treatment model is correctly specified. A comprehensive simulation study shows that GLiDeR is more efficient than doubly robust methods using standard variable selection techniques and has substantial computational advantages over a recently proposed doubly robust Bayesian model averaging method. We illustrate our method by estimating the causal treatment effect of bilateral versus single-lung transplant on forced expiratory volume in one year after transplant using an observational registry.