Preference-based instrumental variable methods for the estimation of treatment effects: assessing validity and interpreting results

M. Alan Brookhart, Ph.D. and Sebastian Schneeweiss, M.D., Sc.D.

Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham

and Women’s Hospital & Harvard Medical School, Boston, MA

Abstract

Observational studies of drugs and medical procedures based on administrative data are increasingly

used to inform regulatory and clinical decisions. However, the validity of such studies is often

questioned because available data may not contain measurements of many important prognostic

variables that guide treatment decisions. Recently, approaches to this problem have been proposed

that use instrumental variables (IV) defined at the level of an individual health care provider or

aggregation of providers. Implicitly, these approaches attempt to estimate causal effects by using

differences in medical practice patterns as a quasi-experiment. Although preference-based IV

methods may usefully complement standard statistical approaches, they make assumptions that are

unfamiliar to most biomedical researchers and therefore the validity of such analyses can be hard to

evaluate. Here, we propose a simple framework based on a single unobserved dichotomous variable

that can be used to explore how violations of IV assumptions and treatment effect heterogeneity may

bias the standard IV estimator with respect to the average treatment effect in the population. This

framework suggests various ways to anticipate the likely direction of bias using both empirical data

and commonly available subject matter knowledge, such as whether medications or medical

procedures tend to be overused, underused, or often misused. This approach is described in the context

of a study comparing the gastrointestinal bleeding risk attributable to different non-steroidal

anti-inflammatory drugs.

Keywords

pharmacoepidemiology; health services research; causal inference; outcomes research; unmeasured

confounding; instrumental variables

1 Introduction

Observational studies of prescription medications and other medical interventions based on

secondary administrative data are increasingly used to inform regulatory and clinical decisions.

However, the validity of such studies is often questioned because the available data may not

contain measurements of many important prognostic variables that guide treatment decisions

such as lab values (e.g., serum cholesterol levels), clinical variables (e.g., weight, blood

pressure), aspects of lifestyle (e.g., smoking status, eating habits), and measures of cognitive

and physical functioning (Walker, 1996). This problem is believed to be particularly acute in

studies of intended effects because of the very strong correlation between treatment choice and

disease risk.

Address correspondence to: M. Alan Brookhart, Ph.D., Division of Pharmacoepidemiology and Pharmacoeconomics, Brigham and

Women’s Hospital, 1620 Tremont Street, Suite 3030, Boston, MA 02120.

NIH Public Access

Author Manuscript

Int J Biostat. Author manuscript; available in PMC 2009 August 3.

Published in final edited form as:

Int J Biostat. 2007; 3(1): 14.

The method of instrumental variables (IV) provides one potential approach to this problem.1

Instrumental variables often arise in the context of a natural or quasi-experiment and permit

the bounding and estimation of causal effects even when important confounding variables are

unrecorded. Informally, an IV is a variable that is predictive of the treatment under study but

completely unrelated to the study outcome other than through its effect on treatment. An IV

can be thought of as a factor that induces random variation in the treatment under study. Despite

their potential to address a fundamental and pervasive problem in observational studies of

treatment effects, applications of IV methods in medical research are rare, presumably because

plausible IVs have been difficult to find.

In recent work, instrumental variables defined at the level of the geographic region (Wen and

Kramer, 1999; Brooks et al, 2003; Stukel et al, 2007), hospital or clinic (Johnston, 2000;

Brookhart, 2007), and individual physician (Korn and Baumrind, 1998; Brookhart et al,

2006; Wang et al, 2005) have been proposed or applied in medical outcomes research.

Implicitly, these studies have attempted to estimate causal effects by assuming that a) providers

(or groups of providers) differ in their use of the treatment under study; b) patients select or

are assigned to providers independently of the provider’s patterns of use of the treatment, and

c) a provider’s use of the treatment is unrelated to use of other medical interventions that might

influence the outcome. We call such IVs “preference-based instruments” since they are derived

from the assumption that different providers or groups of providers have different preferences

or treatment algorithms dictating how medications or medical procedures are used. Although

these approaches may reduce confounding in certain circumstances, they depend on strong

assumptions that are unfamiliar to most clinical researchers and are therefore hard to evaluate.

Furthermore, in some circumstances the treatment effects identified by such instruments can

be difficult to interpret.

We attempt to illuminate these important practical issues by describing a theoretical framework

that can be used to explore the sensitivity of the standard IV estimator to different types of

violations of IV assumptions. This framework assumes the existence of a single unmeasured

dichotomous variable that can be both a confounder and a source of treatment effect

heterogeneity. We consider how empirical data and subject matter knowledge can be used

within this framework to anticipate the direction and magnitude of bias of the standard IV

estimator that results from violations of IV assumptions. We also consider how general

knowledge about medical practice, such as whether medications tend to be overused,

underused, or potentially misused, can help interpret the target of estimation (i.e., the IV

estimand). We illustrate these ideas in the setting of an IV analysis of the effect of non-steroidal

anti-inflammatory drugs (NSAIDs) on upper gastrointestinal (GI) bleeding risk. Finally, we

consider how IV studies of other prescription medications and medical interventions may be

more or less sensitive to IV assumptions in studies using preference-based instruments.

2 Motivating Example: Short-term effects of non-steroidal anti-inflammatory treatment assignment on risk of GI complication

We illustrate the ideas described in this paper in the context of a study that we conducted that

employed an instrumental variable defined at the level of the prescribing physician (Brookhart

et al, 2006). Our study attempted to assess the risk of GI toxicity among new users of

non-selective NSAIDs compared with new users of the COX-2 selective NSAIDs (coxibs). This

example illustrates both the use of a preference-based instrumental variable and the difficulty

of estimating intended treatment effects using administrative data.

1 See Angrist et al, 1996; Greenland, 2000; Martens et al, 2006; and Hernán and Robins, 2006 for overviews of IV methods.

As background, coxibs are generally thought to have greater GI tolerability than non-selective

NSAIDs; therefore, confounding arises in comparative studies of NSAIDs as a result of the

selective prescribing of coxibs to patients who are at elevated risk of GI complications, such

as patients with a history of smoking, alcoholism, obesity, or peptic ulcer disease. Because

many of these variables are poorly measured or completely unrecorded in typical

pharmacoepidemiologic databases, studies comparing the GI risks of different NSAIDs would

be expected to understate any protective effect of coxibs. Indeed, several observational studies

have been unable to attribute any GI-protective effect to the coxibs (Laporte et al, 2003).

Although a physician’s choice of NSAID relies strongly on an assessment of a patient’s

underlying GI risk, NSAID prescribing is also thought to depend on individual physician

preference (Solomon et al, 2003; Schneeweiss et al, 2005). The possibility that physicians

strongly differ in their preference for different NSAIDs suggests that an instrumental variable

defined at the level of the prescribing physician could be used to estimate NSAID treatment

effects.

2.1 Study Population and Data

Our study was based on 37,842 new NSAID users drawn from a large population-based cohort

of Medicare beneficiaries who were eligible for a state-run pharmaceutical benefit plan. State

medical license numbers from the pharmacy claims were used to identify the prescribing

physician (Brookhart et al, 2007). From the Medicare and pharmacy claims we extracted a

treatment assignment X (X=1 if a patient was placed on a coxib, X=0 otherwise), a set of

measured covariates C, and an outcome Y indicating an upper GI bleed within 60 days of

initiating an NSAID.

We have proposed to use an instrumental variable defined at the level of the prescribing

physician. One approach to implementing this would be to use individual physician indicator

variables as instrumental variables. This approach would essentially use the proportion of

NSAID prescriptions for coxibs as a measure of a physician’s preference for coxibs. Such an

approach was implicitly used in studies that have used hospitals (Johnston, 2000) and

geographic regions (Brooks et al, 2003) as instrumental variables. In our study of NSAIDs,

however, the study period was an era of aggressive marketing and active debate about the safety

and effectiveness of coxibs and non-selective NSAIDs. Therefore, we sought an instrumental

variable that would allow preference to change. We opted to use the type of the most recent

NSAID prescription initiated by each physician as an instrument, i.e., we defined the

instrumental variable Z to be equal to 1 if the physician’s most recent new NSAID prescription

was for a coxib and zero otherwise. Pharmacy claims that occurred on the same day were

randomly ordered.
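The construction of Z described above can be sketched in a few lines. This is only an illustration under stated assumptions: the `Claim` record and its field names are hypothetical and do not reflect the layout of the study data, and the handling of a physician's first prescription (for which no earlier prescription exists, so Z is left undefined here) is an assumption rather than a rule stated in the text.

```python
import random
from collections import namedtuple

# Hypothetical record layout; field names are illustrative only.
Claim = namedtuple("Claim", "physician_id patient_id day is_coxib")

def assign_instrument(claims, seed=0):
    """Return (claim, z) pairs in prescribing order.

    z is 1 if the same physician's previous new NSAID prescription was a
    coxib, 0 if it was a non-selective NSAID, and None when the physician
    has no earlier prescription (an assumption made for this sketch).
    """
    rng = random.Random(seed)
    # A random tie-breaker puts same-day claims in random order.
    keyed = [(c.day, rng.random(), c) for c in claims]
    keyed.sort(key=lambda t: (t[0], t[1]))
    last_choice = {}  # physician_id -> is_coxib of most recent prescription
    result = []
    for _, _, claim in keyed:
        result.append((claim, last_choice.get(claim.physician_id)))
        last_choice[claim.physician_id] = claim.is_coxib
    return result

claims = [
    Claim("dr_a", "p1", 1, 1),
    Claim("dr_a", "p2", 5, 0),
    Claim("dr_a", "p3", 9, 1),
    Claim("dr_b", "p4", 2, 0),
    Claim("dr_b", "p5", 3, 0),
]
z_by_patient = {c.patient_id: z for c, z in assign_instrument(claims)}
print(z_by_patient)
```

Because Z is updated after every prescription, a physician whose prescribing shifts during the study period contributes patients with both Z = 1 and Z = 0, which is what allows the measured preference to change over time.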

We justified the use of this variable by assuming that Z was effectively randomly assigned to

patients, so that patient characteristics were unrelated to Z, and also that Z was related to Y only

through its relationship with X, the choice of NSAID type. We also assumed that E[X|Z] ≠ E[X],

so that Z predicts X. In the following section we formalize these assumptions in terms of

moment assumptions in a structural (counterfactual) model.

3 The Method of Instrumental Variables

We describe our instrumental variable approach using the potential (counterfactual) outcome

framework of Rubin (1974). This approach requires that for each subject there exist two

counterfactual (potential) outcomes, Y1 and Y0, that correspond to the outcomes we would

observe if a patient were treated with coxibs or non-selective NSAIDs, respectively. For these

outcomes we assume the following model (called a structural model):

$$Y_x = \alpha_0 + \alpha_1 x + \epsilon_x \tag{1}$$

where x is an assigned rather than an observed treatment, (x = 1 if the assigned treatment is a

coxib, x = 0 otherwise), ϵx is an error specific to the assigned treatment, and E[ϵx] = 0 for x ∈

{0,1}. The average treatment effect (ATE) in the population is expressed as E[Y1 – Y0] = α1.

Under the consistency assumption, which states that the observed outcome is indeed a

counterfactual outcome, the observed data are linked to the potential outcome through the

relation

$$Y = X Y_1 + (1 - X) Y_0 \tag{2}$$

Substituting the terms from the structural model (1) into the relation (2) allows us to write the

observed outcome as a function of the structural model parameters and error terms:

$$Y = \alpha_0 + \alpha_1 X + \epsilon_0 + X(\epsilon_1 - \epsilon_0) \tag{3}$$

The term α0 + ϵ0 reflects an individual patient’s outcome if treatment were withheld, but

everything else were to remain the same about the patient and concomitant treatments. The

term ϵ1 – ϵ0 represents the added benefit or harm beyond α1 that an individual patient receives

from treatment. This term captures a patient’s unique response to treatment and allows for

treatment effect heterogeneity.

In our setting, the term ϵ0 represents both patient characteristics that are related to baseline

prognosis as well as other concomitant treatments that a patient might receive from the

physician that could affect the outcome. If Z has an independent relation with Y, either through

its association with patient characteristics or concomitant treatments, then E[ϵ0|Z] ≠ 0. For the

remainder of the paper, we equate the assumption E[ϵ0|Z] = 0 with the exclusion restriction of

Angrist et al (1996).

Traditional instrumental variables approaches in econometrics assume that treatment effects

are constant, so ϵ1 = ϵ0 for all patients. When this assumption and the exclusion restriction

hold, it is straightforward to see that

$$\alpha_1 = \frac{E[Y \mid Z = 1] - E[Y \mid Z = 0]}{E[X \mid Z = 1] - E[X \mid Z = 0]}$$

Thus, the parameter α1 can be estimated by plugging in sample quantities for the conditional

expectations

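$$\hat{\alpha}_1 = \frac{\hat{E}[Y \mid Z = 1] - \hat{E}[Y \mid Z = 0]}{\hat{E}[X \mid Z = 1] - \hat{E}[X \mid Z = 0]} \tag{4}$$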

This is the standard instrumental variable estimator or Wald estimator. For this to be a

consistent estimator of α1, we need to assume that one patient’s counterfactual outcomes are

not affected by the treatment assignment of other patients. This along with the consistency

assumption compose the so-called stable unit treatment value assumption (SUTVA) of Rubin

(1986). We also need to assume that the instrument is associated with receiving treatment, so

that E[X|Z = 1] – E[X|Z = 0] ≠ 0.
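As a rough illustration of the standard IV (Wald) estimator, the sketch below simulates a cohort in which an unmeasured binary risk factor drives both treatment choice and outcome, then compares the crude treated-versus-untreated risk difference with the ratio of differences in means across levels of Z. All numerical values (prevalences, risks, the constant effect of -0.02) are invented for the example and are not estimates from the NSAID study. The simulation builds in a constant treatment effect and a valid instrument by assumption.

```python
import random

def mean(values):
    return sum(values) / len(values)

def wald_estimate(y, x, z):
    """Standard IV (Wald) estimator for a binary instrument z."""
    y1 = [yi for yi, zi in zip(y, z) if zi == 1]
    y0 = [yi for yi, zi in zip(y, z) if zi == 0]
    x1 = [xi for xi, zi in zip(x, z) if zi == 1]
    x0 = [xi for xi, zi in zip(x, z) if zi == 0]
    denom = mean(x1) - mean(x0)
    if denom == 0:
        raise ValueError("instrument is unrelated to treatment in this sample")
    return (mean(y1) - mean(y0)) / denom

def simulate(n=200_000, alpha1=-0.02, seed=1):
    """Simulated cohort; every parameter value here is hypothetical."""
    rng = random.Random(seed)
    y, x, z = [], [], []
    for _ in range(n):
        zi = int(rng.random() < 0.5)   # instrument, independent of u
        ui = int(rng.random() < 0.3)   # unmeasured risk factor
        # High-risk patients, and patients whose physician last prescribed
        # a coxib, are more likely to receive a coxib.
        xi = int(rng.random() < 0.2 + 0.3 * zi + 0.4 * ui)
        # Linear risk model with a constant treatment effect alpha1.
        yi = int(rng.random() < 0.05 + alpha1 * xi + 0.10 * ui)
        y.append(yi)
        x.append(xi)
        z.append(zi)
    return y, x, z

y, x, z = simulate()
treated = [yi for yi, xi in zip(y, x) if xi == 1]
untreated = [yi for yi, xi in zip(y, x) if xi == 0]
naive = mean(treated) - mean(untreated)
iv = wald_estimate(y, x, z)
print(f"crude risk difference: {naive:+.4f}")
print(f"Wald IV estimate:      {iv:+.4f}")
```

In this setup the crude comparison is pushed toward harm because high-risk patients are channelled to treatment, whereas the Wald estimate should fall near the simulated effect, although with considerably more sampling variability than the crude contrast.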

If treatment effects are heterogeneous, an additional assumption is required to meaningfully

interpret the standard IV estimator. One such assumption states that the correlation between

the received treatment and an individual’s response to it (as measured on a linear scale) is the

same across levels of Z, i.e., E[X(ϵ1 – ϵ0)|Z] = E[X(ϵ1 – ϵ0)]. If this holds, the standard IV

estimator will be consistent for the average treatment effect in the population.
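The reasoning behind this condition can be written out as a short derivation. It is a sketch that assumes the structural model has the linear form Y_x = α0 + α1 x + ϵx, so that the observed outcome is Y = α0 + α1 X + ϵ0 + X(ϵ1 – ϵ0), and that the exclusion restriction E[ϵ0|Z] = 0 holds.

```latex
% Assumes Y = \alpha_0 + \alpha_1 X + \epsilon_0 + X(\epsilon_1 - \epsilon_0)
% and the exclusion restriction E[\epsilon_0 \mid Z] = 0.
\begin{align*}
E[Y \mid Z = z]
  &= \alpha_0 + \alpha_1 E[X \mid Z = z] + E[\epsilon_0 \mid Z = z]
     + E[X(\epsilon_1 - \epsilon_0) \mid Z = z] \\
  &= \alpha_0 + \alpha_1 E[X \mid Z = z]
     + E[X(\epsilon_1 - \epsilon_0) \mid Z = z] \\[4pt]
\frac{E[Y \mid Z = 1] - E[Y \mid Z = 0]}{E[X \mid Z = 1] - E[X \mid Z = 0]}
  &= \alpha_1
   + \frac{E[X(\epsilon_1 - \epsilon_0) \mid Z = 1]
         - E[X(\epsilon_1 - \epsilon_0) \mid Z = 0]}
          {E[X \mid Z = 1] - E[X \mid Z = 0]}
\end{align*}
```

When E[X(ϵ1 – ϵ0)|Z] does not depend on Z, the numerator of the second term is zero and the ratio reduces to α1, the average treatment effect. Otherwise the second term is the bias relative to the average treatment effect that arises from treatment effect heterogeneity.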

Alternatively, Imbens and Angrist (1994) and Angrist et al (1996) established that under a

monotonicity assumption, the standard IV estimator is consistent for the average effect of

treatment among the sub-population of patients termed the “compliers” (Angrist et al, 1996)

or “marginal patients” (Harris and Remler, 1998). These are patients whose treatment status

depends on the level of the instrument. This parameter is termed the local average treatment

effect (Angrist et al, 1996). In the setting of a placebo-controlled RCT with non-compliance

in which the instrumental variable is the treatment arm assignment, the compliers are the

patients who would take whichever treatment they were assigned. Monotonicity requires that the IV

deterministically affect treatment in one direction. In the RCT example, ‘monotonicity’ asserts

that there are no patients who would do the opposite of what they were assigned.

In the setting of preference-based instrumental variables, the utility of IV approaches based on

a monotonicity assumption is unclear, in part because the concept of a marginal patient type

is problematic. For example, a certain type of patient may be treated 95% of the time by

physicians with Z = 1 and 5% of the time by physicians with Z = 0, whereas another patient

type may be treated 52% of the time by physicians with Z = 1 and 48% of the time by physicians

with Z = 0. Both patients are technically “marginal,” as their treatment status is affected by the

instrument; however, clearly patients of the second type are less likely to have their treatment

status influenced by the physician that they see and are therefore likely to be down-weighted

in an instrumental variable estimate. See Hernán and Robins (2006) for a discussion of a

deterministic monotonicity assumption for preference-based instruments and other theoretical

issues concerning preference-based IVs. See Wooldridge (1997) and Heckman, Urzua, and Vytlacil

(2006) for additional discussion of treatment effect heterogeneity.
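The down-weighting in the two-patient-type example can be made concrete with a small calculation. The sketch below rests on assumptions beyond the text: the two types are taken to be equally common, Z is taken to be independent of patient type and to satisfy the exclusion restriction, and within a type the receipt of treatment is taken to be unrelated to individual response. Under those assumptions each type's weight in the IV estimand is proportional to its population share times the difference in its treatment probability between Z = 1 and Z = 0. The treatment effects assigned to each type are invented for illustration.

```python
def iv_weights(types):
    """Weight of each patient type in the IV estimand.

    Each weight is proportional to share * (p_treat_z1 - p_treat_z0),
    under the assumptions stated in the accompanying text.
    """
    raw = [t["share"] * (t["p_treat_z1"] - t["p_treat_z0"]) for t in types]
    total = sum(raw)
    return [r / total for r in raw]

def iv_estimand(types):
    """Weighted average of type-specific effects using the IV weights."""
    return sum(w * t["effect"] for w, t in zip(iv_weights(types), types))

# Treatment probabilities follow the example in the text; shares and
# effects are hypothetical.
types = [
    {"share": 0.5, "p_treat_z1": 0.95, "p_treat_z0": 0.05, "effect": -0.04},
    {"share": 0.5, "p_treat_z1": 0.52, "p_treat_z0": 0.48, "effect": 0.00},
]
weights = iv_weights(types)
ate = sum(t["share"] * t["effect"] for t in types)
print("IV weights:", [round(w, 3) for w in weights])
print("IV estimand:", round(iv_estimand(types), 4))
print("ATE:", round(ate, 4))
```

With these inputs the first type receives nearly all of the weight, so the IV estimand sits close to that type's effect rather than the population average.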

In the following sections, we propose a structural model that represents both treatment effect

heterogeneity and violations of the structural instrumental variable assumptions. This model

is used to explore how violations of assumptions and treatment effect heterogeneity may bias

the traditional IV estimator relative to the average effect of treatment in the population.

3.1 A structural model for sensitivity analysis

We extend the structural model (1) by introducing a single unobserved dichotomous variable

U that could represent a pre-treatment risk factor for the outcome, a concomitant treatment

assigned by the physician, or a treatment effect modifier on the risk difference scale. Our new

model for the counterfactual Yx is given by

$$Y_x = \alpha_0 + \alpha_1 x + \alpha_2 U + \alpha_3 U x + \epsilon_x \tag{5}$$

with E[ϵx|U] = 0 for x ∈ {0,1}. The average treatment effect for those with U = 0 is given by

E[Y1 – Y0|U = 0] = α1 and the average effect of treatment among those with U = 1 is E[Y1 –

Y0|U = 1] = α1 + α3. The average treatment effect in the whole population is given by E[Y1 –

Y0] = α1 + α3E[U].
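A small numerical check of these three quantities may help. The sketch assumes the model is linear in x, U, and their product, with α2 denoting the coefficient on U (a label assumed here) and α3 the coefficient on the product. The parameter values are hypothetical, and the errors are drawn as small mean-zero Gaussian noise purely for illustration.

```python
import random

def subgroup_effects(alpha1, alpha3, prevalence_u):
    """Effects implied by a linear model with a binary U and a
    treatment-by-U interaction coefficient alpha3."""
    return {
        "effect_u0": alpha1,
        "effect_u1": alpha1 + alpha3,
        "ate": alpha1 + alpha3 * prevalence_u,
    }

def simulated_ate(alpha0, alpha1, alpha2, alpha3, prevalence_u,
                  n=100_000, seed=7):
    """Monte Carlo average of Y1 - Y0 under the assumed linear model."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        u = int(rng.random() < prevalence_u)
        # Mean-zero errors given U, drawn separately for each treatment.
        e0 = rng.gauss(0.0, 0.01)
        e1 = rng.gauss(0.0, 0.01)
        y0 = alpha0 + alpha2 * u + e0
        y1 = alpha0 + alpha1 + alpha2 * u + alpha3 * u + e1
        total += y1 - y0
    return total / n

# Hypothetical parameter values.
effects = subgroup_effects(alpha1=-0.01, alpha3=-0.03, prevalence_u=0.25)
sim = simulated_ate(0.03, -0.01, 0.08, -0.03, 0.25)
print(effects)
print(round(sim, 4))
```

Note that α2 shifts the outcome under both treatments equally and therefore drops out of every treatment effect. It matters for bias only through the association of U with treatment or with the instrument.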
