Content available from Statistical Methods in Medical Research
Article
Variable selection and estimation in causal inference using Bayesian spike and slab priors
Brandon Koch¹, David M Vock², Julian Wolfson² and Laura Boehm Vock³
Abstract
Unbiased estimation of causal effects with observational data requires adjustment for confounding variables that are
related to both the outcome and treatment assignment. Standard variable selection techniques aim to maximize the
predictive ability of the outcome model, but they ignore covariate associations with treatment and may not adjust for
important confounders that are only weakly associated with the outcome. We propose a novel method for estimating causal effects that
simultaneously considers models for both outcome and treatment, which we call the bilevel spike and slab causal
estimator (BSSCE). By using a Bayesian formulation, BSSCE estimates the posterior distribution of all model parameters
and provides straightforward and reliable inference. Each covariate coefficient is given a spike and slab prior chosen
with the aim of minimizing the mean squared error of the treatment effect estimator. We derive theoretical properties of the
treatment effect estimator that justify the prior used in BSSCE. Simulations show that BSSCE can substantially reduce
mean squared error relative to numerous other methods and performs especially well with large numbers of covariates, including
situations where the number of covariates is greater than the sample size. We illustrate BSSCE by estimating the causal
effect of vasoactive therapy vs. fluid resuscitation on hypotensive episode length for patients in the Multiparameter
Intelligent Monitoring in Intensive Care III critical care database.
Keywords
Bayesian methods, causal inference, high-dimensional data, spike and slab, variable selection
1 Introduction
Inferring the causal effect of a treatment, exposure, or intervention (hereafter referred to as “treatment”) on some
outcome or response is often the primary goal of a study. Randomizing treatment assignment is the gold
standard for estimating causal treatment effects but is unethical, infeasible, or not cost-effective in many situations.
When treatment is not randomized, confounding variables – those associated with both treatment and
outcome – can induce bias if they are not accounted for in the treatment effect estimator.¹ There are many ways to adjust for
confounding variables. G-computation treatment effect estimation² uses a model for the outcome as a function of
treatment and covariates to adjust for confounding. Alternatively, many approaches use only a model for the
treatment as a function of covariates, including inverse-probability weighting³ and propensity score matching.⁴
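As a toy illustration of the two families of estimators just described (and not of BSSCE itself), the following sketch simulates data with a single binary confounder and compares an unadjusted difference in means with G-computation and inverse-probability weighting. Everything here is an assumption made for illustration: the data-generating values, the use of saturated (stratum-mean) models in place of regression models, and the function names `simulate`, `naive_difference`, `g_computation`, and `ipw`.

```python
import random


def simulate(n, seed=0, effect=2.0):
    """Simulate (x, a, y) rows with one binary confounder x (hypothetical setup)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0
        p_treat = 0.8 if x == 1 else 0.2          # x affects treatment assignment
        a = 1 if rng.random() < p_treat else 0
        y = effect * a + 3.0 * x + rng.gauss(0.0, 1.0)  # x also affects the outcome
        rows.append((x, a, y))
    return rows


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def naive_difference(rows):
    """Unadjusted difference in mean outcome between treated and untreated."""
    treated = mean(y for _, a, y in rows if a == 1)
    control = mean(y for _, a, y in rows if a == 0)
    return treated - control


def g_computation(rows):
    """Outcome-model approach: average stratum-specific outcome contrasts
    over the empirical distribution of the confounder."""
    n = len(rows)
    estimate = 0.0
    for level in (0, 1):
        stratum = [r for r in rows if r[0] == level]
        mu1 = mean(y for _, a, y in stratum if a == 1)
        mu0 = mean(y for _, a, y in stratum if a == 0)
        estimate += (len(stratum) / n) * (mu1 - mu0)
    return estimate


def ipw(rows):
    """Treatment-model approach: weight each outcome by the inverse of the
    estimated probability of the treatment actually received."""
    n = len(rows)
    propensity = {
        level: mean(a for x, a, _ in rows if x == level) for level in (0, 1)
    }
    treated = sum(a * y / propensity[x] for x, a, y in rows) / n
    control = sum((1 - a) * y / (1 - propensity[x]) for x, a, y in rows) / n
    return treated - control


rows = simulate(20000, seed=1)
print("naive:", naive_difference(rows))
print("g-computation:", g_computation(rows))
print("IPW:", ipw(rows))
```

In this simulated setting the unadjusted difference is pulled away from the assumed true effect of 2 because the confounder raises both the chance of treatment and the outcome, while both adjusted estimators should land near 2. With a single binary covariate and saturated models the two adjusted estimates coincide up to rounding; with regression models and many covariates they generally differ.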
Moreover, some methods postulate models for both the outcome and treatment and are doubly robust (e.g.
augmented inverse-probability weighting,⁵ targeted maximum likelihood estimation,⁶ and model averaged
¹School of Community Health Sciences, University of Nevada, Reno, USA
²Division of Biostatistics, University of Minnesota, Minneapolis, USA
³Department of Mathematics, Computer Science, and Statistics, Gustavus Adolphus College, St. Peter, USA
Corresponding author:
Brandon Koch, School of Community Health Sciences, University of Nevada, Reno, NV 89557, USA.
Email: bkoch@unr.edu
Statistical Methods in Medical Research
2020, Vol. 29(9) 2445–2469
© The Author(s) 2020
Article reuse guidelines:
sagepub.com/journals-permissions
DOI: 10.1177/0962280219898497
journals.sagepub.com/home/smm