Instruments for causal inference - An epidemiologist's dream?

Department of Epidemiology, Harvard School of Public Health, Boston, Massachusetts 02115, USA.
Epidemiology (Impact Factor: 6.18). 08/2006; 17(4):360-72. DOI: 10.1097/01.ede.0000222409.00878.37
Source: PubMed

ABSTRACT: The use of instrumental variable (IV) methods is attractive because, even in the presence of unmeasured confounding, such methods may consistently estimate the average causal effect of an exposure on an outcome. However, for this consistent estimation to be achieved, several strong conditions must hold. We review the definition of an instrumental variable, describe the conditions required to obtain consistent estimates of causal effects, and explore their implications in the context of a recent application of the instrumental variables approach. We also present (1) a description of the connection between 4 causal models (counterfactuals, causal directed acyclic graphs, nonparametric structural equation models, and linear structural equation models) that have been used to describe instrumental variables methods; (2) a unified presentation of IV methods for the average causal effect in the study population through structural mean models; and (3) a discussion and new extensions of instrumental variables methods based on assumptions of monotonicity.
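As a rough illustration of the central claim (consistent estimation despite unmeasured confounding), the following sketch simulates a linear model with a constant causal effect and compares a naive regression slope with the standard IV ratio estimator cov(Z, Y) / cov(Z, X). The data-generating model, the coefficients, and all function names are assumptions made for this example; they are not taken from the paper.

```python
import random


def covariance(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b)) / (n - 1)


def simulate(n=50_000, true_effect=2.0, seed=0):
    """Simulate instrument Z, exposure X, and outcome Y (hypothetical model).

    U is an unmeasured confounder that affects both X and Y.
    Z affects Y only through X and is independent of U.
    """
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        zi = rng.randint(0, 1)                    # binary instrument
        ui = rng.gauss(0, 1)                      # unmeasured confounder
        xi = 0.8 * zi + ui + rng.gauss(0, 1)      # exposure
        yi = true_effect * xi + 3.0 * ui + rng.gauss(0, 1)  # outcome
        z.append(zi)
        x.append(xi)
        y.append(yi)
    return z, x, y


z, x, y = simulate()

# Naive slope of Y on X: biased because U drives both X and Y.
naive_estimate = covariance(x, y) / covariance(x, x)

# IV ratio estimator: uses only the variation in X induced by Z.
iv_estimate = covariance(z, y) / covariance(z, x)

print(f"true effect: 2.0, naive: {naive_estimate:.2f}, IV: {iv_estimate:.2f}")
```

In this simulated setting the IV estimate should land near the true effect while the naive slope overshoots it. The sketch relies on the simulated instrument satisfying the IV conditions by construction, which is exactly what cannot be taken for granted in real data.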

  • ABSTRACT: Unobserved confounding is a well-known threat to causal inference in non-experimental studies. The instrumental variable design can, under certain conditions, be used to recover an unbiased estimator of a treatment effect even if unobserved confounding cannot be ruled out with certainty. For continuous outcomes, two-stage least squares is the most common instrumental variable estimator used in epidemiologic applications. For a rare binary outcome, an analogous linear-logistic two-stage procedure can be used. Alternatively, a control function approach is sometimes used, which entails entering the residual from the first-stage linear model as a covariate in a second-stage logistic regression of the outcome on the treatment. Both strategies for binary response have previously been formally justified only for continuous exposure, which has impeded widespread use of the approach outside of this setting. In this note, we consider the important setting of binary exposure in the context of a binary outcome. We provide an alternative motivation for the control function approach which is appropriate for binary exposure, thus establishing simple conditions under which the approach may be used for instrumental variable estimation when the outcome is rare. In the proposed approach, the first-stage regression involves a logistic model of the exposure conditional on the instrumental variable, and the second-stage regression is a logistic regression of the outcome on the exposure adjusting for the first-stage residual. In the event of a non-rare outcome, we recommend replacing the second-stage logistic model with a risk ratio regression.
    12/2014; 3(1):107-112. DOI:10.1515/em-2014-0009
  • ABSTRACT: Inadequate parenting is an important public health problem with possible severe and long-term consequences related to child development. We have solid theoretical and political arguments in favor of efforts enhancing the quality of the early family environment in the population at large. However, little is known about the effect of universal approaches to parenting support during the transition to parenthood. This protocol describes an experimental evaluation of group-based parenting support, the Family Startup Program (FSP), currently implemented on a large scale in Denmark. Participants will be approximately 2500 pregnant women and partners. Inclusion criteria are parental age above 18 and the mother expecting her first child. Families are recruited when attending routine pregnancy scans provided as a part of the publicly available prenatal care program at Aarhus University Hospital, Skejby. Families are randomized within four geographically defined strata to one of two conditions: a) participation in FSP or b) Treatment As Usual (TAU). FSP aims to prepare new families for their roles as parents and enhance parental access to informal sources of support, i.e. social network and community resources. The program consists of twelve group sessions, with nine families in each group, continuing from pregnancy until the child is 15 months old. TAU is the publicly available pre- and postnatal care available to families in both conditions. Analyses will employ survey data, administrative data from health visitors, and administrative register-based data from Statistics Denmark. All data sources will be linked via the unique Danish Civil Registration Register (CPR) identifier. Data will be obtained at four time points: during pregnancy and when the child is nine months, 18 months, and seven years old. The primary study outcome is measured by the Parenting Sense of Competence scale (PSOC; J Clin Child Psychol 18:167-75, 1989). Other outcomes include parenting and couple relationship quality, utility of primary sector service and child physical health, socio-emotional and cognitive development. The protocol describes an ambitious experimental evaluation of a universal group-based parenting support program; an evaluation that has not yet been made either in Denmark or internationally. ID: NCT02294968. Registered November 14, 2014.
    BMC Public Health 04/2015; 15(1):409. DOI:10.1186/s12889-015-1732-3 · 2.32 Impact Factor
  • ABSTRACT: Finding individual-level data for adequately powered Mendelian randomization analyses may be problematic. As publicly available summarized data on genetic associations with disease outcomes from large consortia are becoming more abundant, use of published data is an attractive analysis strategy for obtaining precise estimates of the causal effects of risk factors on outcomes. We detail the necessary steps for conducting Mendelian randomization investigations using published data, and present novel statistical methods for combining data on the associations of multiple (correlated or uncorrelated) genetic variants with the risk factor and outcome into a single causal effect estimate. A two-sample analysis strategy may be employed, in which evidence on the gene-risk factor and gene-outcome associations is taken from different data sources. These approaches allow the efficient identification of risk factors that are suitable targets for clinical intervention from published data, although the ability to assess the assumptions necessary for causal inference is diminished. Methods and guidance are illustrated using the example of the causal effect of serum calcium levels on fasting glucose concentrations. The estimated causal effect of a 1 standard deviation (0.13 mmol/L) increase in calcium levels on fasting glucose (mM) using a single lead variant from the CASR gene region is 0.044 (95% credible interval -0.002, 0.100). In contrast, using our method to account for the correlation between variants, the corresponding estimate using 17 genetic variants is 0.022 (95% credible interval 0.009, 0.035), a more clearly positive causal effect.
    European Journal of Epidemiology 03/2015; DOI:10.1007/s10654-015-0011-z · 5.15 Impact Factor
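The idea in the last abstract above, combining per-variant summary associations into a single causal estimate, can be sketched with the widely used fixed-effect inverse-variance weighted (IVW) formula. This is a simplified frequentist sketch that assumes uncorrelated variants; it is not the correlation-aware method that abstract describes, and the function name and the summary statistics below are invented for illustration.

```python
import math


def ivw_estimate(beta_x, beta_y, se_y):
    """Fixed-effect IVW estimate from summarized data (uncorrelated variants).

    beta_x: per-variant associations with the risk factor
    beta_y: per-variant associations with the outcome
    se_y:   standard errors of the outcome associations
    Returns (estimate, standard_error).
    """
    # Each variant's ratio estimate beta_y / beta_x is weighted by
    # beta_x^2 / se_y^2, the inverse of its first-order variance.
    weight_sum = sum(bx ** 2 / sy ** 2 for bx, sy in zip(beta_x, se_y))
    numerator = sum(bx * by / sy ** 2 for bx, by, sy in zip(beta_x, beta_y, se_y))
    estimate = numerator / weight_sum
    standard_error = 1.0 / math.sqrt(weight_sum)
    return estimate, standard_error


# Hypothetical summary statistics for three variants (not real data).
beta_x = [0.10, 0.20, 0.15]
beta_y = [0.04, 0.11, 0.075]
se_y = [0.01, 0.02, 0.015]

estimate, standard_error = ivw_estimate(beta_x, beta_y, se_y)
print(f"IVW estimate: {estimate:.3f} (SE {standard_error:.3f})")
```

In a two-sample analysis, `beta_x` and `beta_y` would come from different data sources. With correlated variants these weights overstate precision, which is the situation the abstract's method is designed to handle.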

