Page 1

ORIGINAL ARTICLE

Instruments for Causal Inference

An Epidemiologist’s Dream?

Miguel A. Herna ´n* and James M. Robins*†

Abstract: The use of instrumental variable (IV) methods is attrac-

tive because, even in the presence of unmeasured confounding, such

methods may consistently estimate the average causal effect of an

exposure on an outcome. However, for this consistent estimation to

be achieved, several strong conditions must hold. We review the

definition of an instrumental variable, describe the conditions re-

quired to obtain consistent estimates of causal effects, and explore

their implications in the context of a recent application of the

instrumental variables approach. We also present (1) a description of

the connection between 4 causal models—counterfactuals, causal

directed acyclic graphs, nonparametric structural equation models,

and linear structural equation models—that have been used to

describe instrumental variables methods; (2) a unified presentation

of IV methods for the average causal effect in the study population

through structural mean models; and (3) a discussion and new

extensions of instrumental variables methods based on assumptions

of monotonicity.

(Epidemiology 2006;17: 360–372)

C

only answer an epidemiologist can provide is “no.” Regard-

less of how immaculate the study design and how perfect the

measurements, the unverifiable assumption of no unmeasured

confounding of the exposure effect is necessary for causal

inference from observational data, whether confounding ad-

justment is based on matching, stratification, regression, in-

verse probability weighting, or g-estimation.

Now, imagine for a moment the existence of an

alternative method that allows one to make causal infer-

ences from observational studies even if the confounders

remain unmeasured. That method would be an epidemiol-

ogist’s dream. Instrumental variable (IV) estimators, as

reviewed by Martens et al1and applied by Brookhart et al2

an you guarantee that the results from your observational

study are unaffected by unmeasured confounding? The

in the previous issue of EPIDEMIOLOGY, were developed to

fulfill such a dream.

Instrumental variables have been defined using 4 dif-

ferent representations of causal effects:

1. Linear structural equations models developed in econo-

metrics and sociology3,4and used by Martens et al1

2. Nonparametric structural equations models4

3. Causal directed acyclic graphs4–6

4. Counterfactual causal models7–9

Much of the confusion associated with IV estimators

stems from the fact that it is not obvious how these various

representations of the same concept are related. Because the

precise connections are mathematical, we will relegate them

to an Appendix. In the main text, we will describe the

connections informally.

Let us introduce IVs, or instruments, in randomized

experiments before we turn our attention to observational

studies. The causal diagram in Figure 1 depicts the structure

of a double-blind randomized trial. In this trial, Z is the

randomization assignment indicator (eg, 1 ? treatment, 0 ?

placebo), X is the actual treatment received (1 ? treatment,

0 ? placebo), Y is the outcome, and U represents all factors

(some unmeasured) that affect both the outcome and the

decision to adhere to the assigned treatment. The variable Z is

referred to as an instrument because it meets 3 conditions: (i)

Z has a causal effect on X, (ii) Z affects the outcome Y only

through X (ie, no direct effect of Z on Y, also known as the

exclusion restriction), and (iii) Z does not share common

causes with the outcome Y (ie, no confounding for the effect

of Z on Y). Mathematically precise statements of these con-

ditions are provided in the Appendix.

A double-blind randomized trial satisfies these condi-

tions in the following ways. Condition (i) is met because trial

participants are more likely to receive treatment if they were

assigned to treatment, condition (ii) is ensured by effective

double-blindness, and condition (iii) is ensured by the ran-

dom assignment of Z. The intention-to-treat effect (the aver-

age causal effect of Z on Y) differs from the average treatment

effect of X on Y when some individuals do not comply with

the assigned treatment. The greater the rate of noncompliance

(eg, the smaller the effect of Z on X on the risk-difference

scale), the more the intention-to-treat effect and the average

treatment effect will tend to differ. Because the average

treatment effect reflects the effect of X under optimal condi-

tions (full compliance) and does not depend on local condi-

tions, it is often of intrinsic public health or scientific interest.

Submitted 30 January 2006; accepted 6 February 2006.

From the *Department of Epidemiology, Harvard School of Public Health

and †Department of Biostatistics, Harvard School of Public Health,

Boston, Massachusetts.

Editors’ note: A related article appears on page 373.

Correspondence: Miguel A. Herna ´n, Department of Epidemiology. Harvard

School of Public Health. 677 Huntington Ave. 02115 Boston, MA.

E-mail: Miguel_hernan@post.harvard.edu.

Copyright © 2006 by Lippincott Williams & Wilkins

ISSN: 1044-3983/06/1704-0360

DOI: 10.1097/01.ede.0000222409.00878.37

Epidemiology • Volume 17, Number 4, July 2006

360

Page 2

Unfortunately, the average effect of X on Y may be affected

by unmeasured confounding.

Instrumental variables methods promise that if you

collect data on the instrument Z and are willing to make some

additional assumptions (see below), then you can estimate the

average effect of X on Y, regardless of whether you measured

the covariates normally required to adjust for the confounding

caused by U. IV estimators bypass the need to adjust for the

confounders by estimating the average effect of X on Y in the

study population from 2 effects of Z: the average effect of Z

on Y and the average effect of Z on X. These 2 effects can be

consistently estimated without adjustment because Z is ran-

domly assigned. For example, consider this well-known IV

estimator: The estimated effect of X on Y is equal to an

estimate of the ratio

E?Y?Z ? 1? ? E?Y?Z ? 0?

E?X?Z ? 1? ? E?X?Z ? 0?

of the effect of Z on Y divided by the effect of Z on X, all

measured in the scale of difference of risks or means, where

E?X?Z? ? Pr?X ? 1?Z? for the dichotomous variable X.

(Martens et al1showed the derivation and a geometrical

explanation of this IV estimator in the context of linear

models, and Brookhart et al2applied it to pharmacoepide-

miologic data.) To obtain the average treatment effect, one

inflates the intention-to-treat effect in the numerator of the

estimator by dividing by a denominator, which decreases as

noncompliance increases. That is, the effect of X on Y will

equal the effect of Z on Y when X is perfectly determined by

Z (risk difference E?X?Z ? 1? ? E?X?Z ? 0? ? 1). The weaker

the association between Z and X (the closer the Z-X risk

difference is to zero), the more the intention-to-treat effect

will be inflated because of the shrinking denominator.

This instrumental variables estimator can also be used

in observational settings. Investigators can estimate the aver-

age effect of an exposure X by identifying and measuring a

Z-like variable that meets conditions (ii) and (iii) as well as a

more general modified version of condition (i), which we

designate as condition (i*). Under condition (i*), the instru-

ment Z and exposure X are associated either because Z has a

causal effect on X, or because X and Z share a common

cause.4,10Martens et al1cite several articles that describe

some instruments used in observational studies. As these

examples show, the challenge of identifying and measuring

an instrument in an observational study is not trivial. The goal

of Brookhart et al2is to compare the effect of prescribing 2

classes of drugs (cyclooxygenase 2-?COX-2? selective and

nonselective nonsteroidal antiinflammatory drugs ?NSAIDs?)

on gastrointestinal bleeding. The authors propose the “phy-

sician’s prescribing preference” for drug class as the instru-

ment, arguing that it meets conditions (i), (ii), and (iii).

Because the proposed instrument is unmeasured, the authors

replace it in their main analysis by the (measured) surrogate

instrument “last prescription issued by the physician before

current prescription.”

Figure 2 shows a causal structure in which the instru-

ment Z (here, “last prescription issued by the physician before

current prescription”) is a surrogate for another unmeasured

instrument U* (here, “physician’s prescribing preference”).

Both Z and U* meet conditions (i*), (ii) and (iii) but, in

contrast to U*, Z does not satisfy the original condition (i).

The original condition (i) is equivalent to the second assump-

tion of Martens and colleagues1for the validity of an instru-

ment. It follows that Martens et al’s assumptions are too

restrictive and do not recognize that Z can be used as an

instrument. That is, under the Martens et al assumption that

the equations are structural (as defined in the Appendix), their

instrumental variables estimator is consistent for the effect of

X on Y provided the instrument Z is uncorrelated with the

error term E in the structural equation for the outcome Y

(which implies no confounding for the causal effect of Z on

Y), even when the instrument is correlated with the error term

F in the structural equation for the treatment X (which implies

confounding for the causal effect of Z on X).

The IV estimator described previously looks like an

epidemiologist’s dream come true: we can estimate the effect

of the X on Y, even if there is unmeasured confounding for the

effect of X on Y! Many sober readers, however, will suspect

any claim that an analytic method solves one of the major

problems in epidemiologic research. Indeed there are good

reasons for skepticism—as Martens et al1explain, and as the

example of Brookhart et al2illustrates. First, the IV effect

FIGURE 1. A double-blind randomized experiment with as-

signment Z, treatment X, outcome Y, and unmeasured factors

U. Z is an instrument.

FIGURE 2. An observational study with unmeasured instru-

ment U*, exposure X, outcome Y, and unmeasured factors U.

Z is a surrogate instrument.

Epidemiology • Volume 17, Number 4, July 2006Instruments for Causal Inference

© 2006 Lippincott Williams & Wilkins

361

Page 3

estimate will be biased unless the proposed instrument meets

conditions (ii) and (iii), but these conditions are not empiri-

cally verifiable. Second, any biases arising from violations of

conditions (ii) and (iii), or from sampling variability, will be

amplified if the association between instrument and exposure

?condition (i*)? is weak. Third, our discussion so far may

have appeared to suggest that conditions (i*), (ii), and (iii) are

sufficient to guarantee that the IV estimate consistently esti-

mates the average effect of X on Y. In fact, additional

unverifiable assumptions are required, regardless of whether

the data were generated from a randomized experiment or an

observational study. Finally, most epidemiologic exposures

are time-varying, which standard IV methods are poorly

equipped to address.

We now briefly review these 4 reasons for skepticism

(see also Greenland11). To illustrate these ideas, we will take

the study by Brookhart et al2as an example because one can

indirectly validate their observational estimates by comparing

them with the estimates from a previous randomized trial that

addressed the same question. We will focus on the effect of

prescribing selective versus nonselective NSAIDs on gastro-

intestinal bleeding over a period of 60 days in patients with

arthritis. This effect was estimated to be ?0.47 (in the scale

of risk difference multiplied by 100) in the randomized trial.

Violation of the Unverifiable Conditions (ii)

and (iii) Introduces Bias

Condition (ii), the absence of a direct effect of the

instrument on the outcome, will not hold if, as discussed by

Brookhart et al,2doctors tend to prescribe selective NSAIDs

together with gastroprotective medications (eg, omeprazol).

This direct effect of the instrument would introduce a down-

ward bias in the estimate, that is, the effect of prescribing

selective NSAIDs would look more protective than it really

is. However, the assumption cannot be verified from the data:

the unexpectedly strong inverse association between Z and Y

(?0.35, Table 3 in Brookhart et al) is consistent with a

violation of condition (ii) but also with a very strong protec-

tive effect of selective NSAIDs without a violation of con-

dition (ii).

Brookhart and colleagues2also discuss the possibility

that physicians who prescribe selective NSAIDs frequently

see higher-risk patients. This potential violation of condition

(iii) is the result of unmeasured confounding for the instru-

ment and would introduce an upward bias in the estimate. To

deal with this potential problem—consistent with the associ-

ation between Z and the measured covariates (Table 2 in

Brookhart et al)—the authors made the unverifiable assump-

tion that, within levels of the measured covariates, there were

no other common causes of the instrument and the outcome.

These violations of conditions (ii) and (iii) can be

represented by including arrows from U* to Y and from U to

Z, respectively (Fig. 3).

A Weak Condition (i*) Amplifies The Bias

An instrument weakly associated with exposure leads

to a small denominator of the IV estimator. Therefore, biases

that affect the numerator of the IV estimator (eg, unmeasured

confounding for the instrument, a direct effect of the instru-

ment) or small sample bias in the denominator will be greatly

exaggerated, and may result in an IV estimate that is more

biased than the unadjusted estimate. The exaggeration of the

effect by IV estimators may occur even in large samples and

in the absence of model misspecification. In the study by

Brookhart et al,2the overall Z ? X risk difference was 0.228

(the corresponding number in patients with arthritis was not

reported). Therefore, any bias affecting the numerator of the

IV estimator would be multiplied by approximately 4.4 (1/

0.228), which might explain why the IV effect estimate

?1.81 was farther from the randomized estimate ?0.47 than

the unadjusted estimate 0.10. The IV method might have

exaggerated the effect if the proposed instrument had a direct

effect due to, say, concomitant prescription of gastroprotec-

tive drugs. Alternatively, the instrument Z may satisfy con-

ditions (i*), (ii), and (iii). In that case, the difference between

the IV and the randomized estimates might not be due to bias

in the instrumental variable estimator but rather to sampling

variability or (as suggested by Brookhart et al) to the different

age distributions in the observational study and the random-

ized trial, along with strong effect-measure modification by

age. The latter hypothesis could be assessed by conducting an

analysis stratified by age.

In the context of linear models, Martens et al1demon-

strate that instruments are guaranteed to be weakly correlated

with exposure in the presence of strong confounding because

a strong association between X and U leaves little residual

variation for X to be strongly correlated with the instrument

U* in Figure 2. This problem may be compounded by the use

of surrogate instruments Z.

When Treatment Effects Are Heterogenous,

Conditions (i*) Through (iii) Are Insufficient to

Obtain Effect Estimates

Even when an instrument is available, additional assump-

tionsarerequiredtoestimatetheaveragecausaleffectofXinthe

population. Examples of such assumptions are discussed in the

following paragraphs as well as in the Appendix. Conditions

FIGURE 3. An observational study with exposure X, outcome

Y, and unmeasured factors U in which the variables U* and Z

do not qualify as instruments.

Herna ´n and Robins

Epidemiology • Volume 17, Number 4, July 2006

© 2006 Lippincott Williams & Wilkins

362

Page 4

(i*), (ii), and (iii) allow one to compute upper and lower bounds,

but not a point estimate, for the average causal effect. In a 1989

article, Robins8derived the bounds that can be computed under

conditions (i*) and (ii) plus a weak version of condition (iii), as

well as under different sets of other unverifiable assumptions.

Subsequently, Manski12derived related results, and Balke and

Pearl13derived narrower bounds under a stronger version of

condition (iii) given in the Appendix; this holds, for example,

when the instrument is a randomized assignment indicator. In a

double-blind randomized trial, confidence intervals for the in-

tention-to-treat effect of Z on Y that exceed zero by a wide

margin show that a positive treatment effect is occurring in a

subset of the population. However, if noncompliance is large

(say, 50%), bounds for the average treatment effect may include

the null hypothesis of zero. This would happen if, for example,

the (unobserved) effect of treatment in the noncompliers were

larger in magnitude and opposite in sign to that in the compliers.

However, Martens et al1and Brookhart et al2do present

point estimates—not bounds—for the causal effect of X on Y.

What other assumptions did the authors make either explicitly

or implicitly? The linear structural equation model used by

Martens et al assume that the effect of X on Y on the

mean-difference scale is the same for all subjects. This

assumption of no between-subject heterogeneity in the treat-

ment effect combined with conditions (i*), (ii), and (iii) is

sufficient to identify the effect of X on Y. (A causal effect is

said to be identified if there exists an estimator based on the

observed data ?Z, X, Y? that converges to ?is consistent for?

the effect in large samples). This assumption will hold under

the sharp null hypothesis that the exposure X has no effect on

any subject’s outcome (in contrast with the “nonsharp” null

hypothesis in which the net effect is still zero but includes

positive effects for some and negative for others). It follows

that, when conditions (i*), (ii), and (iii) hold, the usual IV

estimator will correctly estimate the average treatment effect

of 0 whenever the sharp null hypothesis is true. However,

when the sharp null is false, the assumption of no treatment

effect heterogeneity is biologically implausible for continu-

ous outcomes and logically impossible for dichotomous

outcomes.

There is a weaker, more plausible assumption that,

combined with conditions (i*), (ii) and (iii), still implies the

effect of X on Y is the ratio of the effect of Z on Y to the effect

of Z on X. This is the assumption that the X–Y causal risk

difference is the same among treated subjects with Z ? 1 as

among treated subjects with Z ? 0, and similarly among

untreated subjects.8,14In other words, this assumes that there

is no effect modification, on the additive scale, by Z of the

effect of X on Y in the subpopulations of treated and untreated

subjects (strictly speaking, any effect modification would be

due to the causal instrument U*). The identifying assumption

of no effect modification will not generally hold if the

unmeasured factors U on Figure 2 interact with X on an

additive scale to cause Y. Such effect modification would be

expected in many studies, including that by Brookhart et al.2

There might be effect modification, for example, if the risk

difference for the effect of selective NSAIDs (X) on gastro-

intestinal bleeding (Y) was modified by past history of gas-

tritis (U).

Another assumption that is commonly combined with

conditions (i*), (ii), and (iii) to identify the average effect of

X on Y is the monotonicity assumption. In the context of the

research by Brookhart et al,2with dichotomous Z and U*,

monotonicity means that no doctor who prefers nonselective

NSAIDs would prescribe selective NSAIDs to any patient

unless all doctors who prefer selective NSAIDs would do so.

Clearly, in the substantive setting of the study by Brookhart

et al, monotonicity is unlikely to hold. In other settings,

monotonicity may be more likely. The monotonicity assump-

tion does not affect the bounds for the average effect of X on

Y in the population (our target parameter so far).8,13However,

in the Appendix, we extend a result by Imbens and Angrist15

to show that, if the assumptions encoded by the DAG in

Figure 2 and the assumption of monotonicity all hold, a

particular causal effect is identified and the usual IV estimator

based on Z consistently estimates this effect. The identified

causal effect is the average effect of X on Y in the subset of

the study population who would be treated (1) with selective

NSAIDs by all doctors whose “prescribing preference” is for

selective NSAIDs and (2) with nonselective NSAIDs by all

doctors whose preference is for nonselective NSAIDs.15This

subset of the study population can be labeled as the “com-

pliers” because it is analogous to the subset of the population

in randomized experiments (in which the instrument is treat-

ment assignment) who would comply with whichever treat-

ment is assigned to them. A problem with this causal effect is

that we cannot identify the subset of the population (the

“compliers”) the effect estimate refers to. Further, this result

requires that a doctor’s unobserved “prescribing preference”

U* can be assumed to be dichotomous. In the Appendix we

argue that assumptions encoded by the DAG in Figure 2 are

more substantively plausible if U* is a continuous rather than

a dichotomous measure, although in that case a “complier” is

no longer well defined and the interpretation of the IV

estimator based on Z is different (see Appendix).

The assumptions of monotonicity and no effect modi-

fication by Z on an additive (risk difference) scale by no

means exhaust the list of assumptions that serve to identify

causal effects. Alternative identifying assumptions can result

in estimators of the average effect of X that differ from the

usual IV estimator. For example, in the Appendix, we show

that the assumption of no effect modification by Z on an

multiplicative (risk ratio) scale within both levels of X iden-

tifies the average causal effect.8,10However, under this as-

sumption, the estimated ratio of the average effect of Z on Y

to the average effect of Z on X is now biased (inconsistent) for

the average causal effect of X on Y; in the Appendix we

provide a consistent (asymptotically normal) estimator for the

treatment effect.8,10

Because all identifying assumptions are unverifiable,

Robins and Greenland16argued that it is useful to estimate

upper and lower bounds for the effect, instead of (or in

addition to) point estimates and confidence intervals obtained

under various explicit unverifiable assumptions. Such esti-

mates help to make clear “the degree to which public health

Epidemiology • Volume 17, Number 4, July 2006 Instruments for Causal Inference

© 2006 Lippincott Williams & Wilkins

363

Page 5

decisions are dependent on merging the data with strong prior

beliefs.” As noted above, the problem with bounds is that the

resulting interval may be too wide and therefore not very

informative. (Further, there will be 95% confidence intervals

around the upper and the lower bound attributable to sam-

pling variation.)

In addition, when it is necessary to condition on con-

tinuous (or many discrete) preinstrument covariates to try to

insure that the effect of Z on Y is unconfounded, the validity

of IV estimates based on parametric linear models for a

binary response Y also requires as usual both a correctly

specified functional form for the covariates effects and esti-

mated conditional probabilities that lie between zero and one.

The Standard IV Methodology Deals Poorly

With Time-Varying Exposures

Most epidemiologic exposures are time-varying. For

example, Brookhart et al2compared the risks after prescrip-

tion of either selective or nonselective NSAIDs, regardless of

whether patients stayed on the assigned drug during the

follow-up. In other words, the treatment variable was consid-

ered to not be time-varying, and the authors estimated an

observational analog of the intention-to-treat effect com-

monly estimated from randomized experiments. However, in

reality, patients may discontinue or switch their assigned

treatment over time. When this lack of adherence to the initial

treatment is not due to serious side effects, one could be more

interested in comparing the risks had the patients followed

their assigned treatment continuously during the follow-up.

In the presence of time-varying instruments, exposures,

and confounders, Robins’s g-estimation of nested structural

models10,17–19can be used to estimate causal effects. Nested

structural models achieve identification by assuming a non-

saturated model for the treatment effect at each time t (mea-

sured on either an additive or multiplicative scale) as a

function of a subject’s treatment, instrument, and covariate

history through t. These models naturally allow the analyst

(1) to obtain asymptotically unbiased point estimates of the

treatment effect in the treated study population, (2) to char-

acterize the effect on one’s inference to violations of the

model assumptions through sensitivity analysis, (3) to adjust

for baseline and time-varying continuous and discrete con-

founders of the instrument-outcome association, (4) to in-

clude continuous and multivariate instruments and treat-

ments, and (5) to use doubly-robust estimators. In the

Appendix we show that the linear structural equations of

Martens et al1are a simple case of a nested structural mean

model. Robins’s methods apply to continuous, count, failure

time, and rare dichotomous responses but not to nonrare

dichotomous responses.20For nonrare dichotomous re-

sponses, a new extension due to Van der Laan et al21can be

used. For treatments and instruments that are not time-

varying, Tan22has shown how to achieve many of properties

(a) through (e) under a model that achieves identification of

causal effects by assuming monotonicity.

CONCLUSION

We have reviewed how, in observational research, the

use of instrumental variables methods replaces the unverifi-

able assumption of no unmeasured confounding for the treat-

ment effect with other unverifiable assumptions such as “no

unmeasured confounding for the effect of the instrument” and

“no direct effect of the instrument.” Hence, the fundamental

problem of causal inference from observational data–the

reliance on assumptions that cannot be empirically veri-

fied—is not solved but simply shifted to another realm. As

always, investigators must apply their subject-matter knowl-

edge to study design and analysis to enhance the plausibility

of the unverifiable assumptions.

Further, when conditions (i*), (ii), and (iii) do not hold,

the direction of bias of IV estimates may be counterintuitive

for epidemiologists accustomed to conventional approaches

for confounding adjustment. For example, Brookhart et al2

found a much bigger effect estimate using IV methods

(?1.81) than the effect estimated by the randomized trial

(?0.47), whereas conventional methods were unable to de-

tect a beneficial effect of selective NSAIDs. The conventional

unadjusted and adjusted estimates were quite close (0.10 and

0.07, respectively), despite careful adjustment for most of the

known indications and risk factors for the outcome. If the

assumptions required for the validity of the usual IV estima-

tor held and these differences were not the result of sampling

variability, the aforementioned estimates would imply that

the magnitude of unmeasured confounding (from 0.07 to

?1.81) is much greater than the magnitude of the measured

confounding (from 0.10 to 0.07). An alternative explanation

is that the IV assumptions do not hold and the IV estimate is

biased in the apparently counterintuitive direction of exag-

gerating the protective effect.

In summary, Martens et al1are right: IV methods are

not an epidemiologist’s dream come true. Nonetheless, they

certainly deserve greater attention in epidemiology, as shown

by the interesting application presented by Brookhart et al2

But users of IV methods need to be aware of the limitations

of these methods. Otherwise, we risk transforming the meth-

odologic dream of avoiding unmeasured confounding into a

nightmare of conflicting biased estimates.

APPENDIX

This appendix is organized in 5 sections. The first

section describes 4 mathematical representations of causal

effects—counterfactuals, causal directed acyclic graphs, non-

parametric structural equation models, linear structural equa-

tions models—and their relations. The second section de-

scribes IV estimators that identify the average causal effect of

X on Y in the population by using no interaction assumptions.

We show that these estimators can be represented by param-

eters of particular structural mean models. The third section

describes IV estimators that identify the average causal effect

of X on Y in certain subpopulations by using monotonicity

assumptions. The fourth section contains important exten-

sions. The last section contains the proofs of the theorems

presented in the first 3 sections.

1. Representations of Causal Effects

As mentioned in the main text, IV estimators have been

defined using 4 different mathematical representations of

Herna ´n and Robins

Epidemiology • Volume 17, Number 4, July 2006

© 2006 Lippincott Williams & Wilkins

364

Page 6

causal effects. We now briefly describe each of these repre-

sentations:

1.1 Counterfactuals

A counterfactual random variable Y(x, z) encodes the

value that the variable Y would have if, possibly contrary to

fact, the variable X were set to the value x and the variable Z

set to z. The counterfactual variable Y(x, z) is assumed to be

well defined23in the sense that there is reasonable agreement

as to the hypothetical intervention (ie, closest possible world)

which sets X to x and Z to z.

Counterfactuals allow us to give precise mathematical

definitions for conditions (ii) and (iii) in the definition of an

instrument. Condition (ii), the exclusion restriction, is for-

malized under the counterfactual model by the assumption

that for all subjects,

Y?x, z ? 1) ? Y(x, z ? 0) ? Y(x)

where Y(x) is the counterfactual value of Y when X is set to x,

but each subject’s Z takes the same value as in the observed

data.7The condition (iii) that there is no confounding for the

effect of Z on Y is formalized by the 2 assumptions

Y(x ? 1)

Z

and

Y(x ? 0)

Z

where A ? B is read as “A is independent of B”. The average

causal effect of X on Y is defined to be E?Y(x ? 1)? ? E?Y(x ?

0)? when X is dichotomous, which we also write as E?Y(1)? ?

E?Y(0)? when no ambiguity will arise.

1.2 Causal Directed Acyclic Graphs (DAG)4,5

We define a DAG G to be a graph whose nodes

(vertices) are M random variables V ? (V1, . . ., VM) with

directed edges (arrows) and no directed cycles. We use PAm

to denote the parents of Vm, ie, the set of nodes from which

there is a direct arrow into Vm. The variable Vjis a descendant

of Vmif there is a sequence of nodes connected by edges

between Vmand Vjsuch that, following the direction indi-

cated by the arrows, one can reach Vjby starting at Vm. For

example, consider the causal DAG in Figure 2 that represents

the causal structure of an observational study with a surrogate

instrument Z. In this DAG, M ? 5 and we can choose V1? U,

V2? U*, V3? Z, V4? X, V5? Y; the parents PA4of

V4? X are (U, U*) and the nondescendants of X are (U, U*, Z).

A causal DAG is a DAG in which (1) the lack of an

arrow from node Vjto Vmcan be interpreted as the absence of

a direct causal effect of Vjon Vm(relative to the other

variables on the graph) and (2) all common causes, even if

unmeasured, of any pair of variables on the graph are them-

selves on the graph. In Figure 2, the lack of a direct arrow

between Z and Y indicates that treatment prescribed to the

previous patient Z does not have a direct causal effect

(causative or preventive) on the next patient’s outcome Y.

Also, the inclusion of the measured variables (Z, X, Y) implies

that the causal DAG must also include their unmeasured

common causes (U, U*). Note a causal DAG model makes no

reference to and is agnostic as to the existence of counter-

factuals.

Our causal DAGs are of no practical use unless we

make some assumption linking the causal structure repre-

sented by the DAG to the statistical data obtained in an

epidemiologic study. This assumption, referred to as the

causal Markov assumption (CMA), states that the nondescen-

dants of a given variable Vjare independent of Vjconditional

on the parents (ie, direct causes) of Vj. The CMA is mathe-

matically equivalent to the statement that the density f (V) of

the variables V in DAG G satisfies the Markov factorization

f (v) ??

j?1

M

f(vj?paj).

1.3 Nonparametric Structural Equation Models

(NPSEMs)4

An NPSEM is a causal model that both assumes the

existence of counterfactual random variables and can be

represented by a DAG. To provide a formal definition of an

NPSEM represented by a DAG G, we shall use the following

notation. For any random variable W, let ? denote the

support (ie, the set of possible values w) of W. For any

w1, . . ., wm, define w ?m? (w1, . . ., wm). Let R denote any

subset of variables in V and let r be a value of R. Then Vm(r)

denotes the counterfactual value of Vmwhen R is set to r. We

number the variables V so that for j ? i Vjis not a descendant

of Vi.

An NPSEM represented by a DAG G with vertex set V

assumes the existence of mutually independent unobserved

random variables (errors) ?mand deterministic unknown

functions fm(pam, ?m) such that V1? f1(?1) and the one-step

ahead counterfactual Vm?v ?m ? 1? ? Vm?pam? is given by

fm(pam,?m), and both Vmand the counterfactuals Vm(r) for any

R ? V

areobtainedrecursively

Vm?v ?m ? 1?, m ? 1 . For example, V3(v1) ? V3{v1, V2(v1)} and

V3? V3{V1,V2(V1)}. In Figure 2, Y(z,x) ? V5(v3,v4) ?

f5(V1,v4,?5) ? fy(U,x,?Y) does not depend on z since Z is not a

parent of Y or U, where we define fY? f5, ?Y? ?5since Y ?

V5. In summary, only the parents of Vmhave a direct effect on

Vmrelative to the other variables on G. A DAG G represented

by an NPSEM is a causal DAG for which the CMA holds

because the independence of the error terms ?mboth implies

the CMA and is essentially equivalent to the requirement that

all common causes of any variables on the graph are them-

selves on the causal DAG.

from

V1

andthe

1.4 Linear Structural Equation Models (LSEMs)

A (causal) LSEM for the observed variables is the

special case of an NPSEM in which for each observed Vmthe

deterministic functions fm(pam,?m) are linear in all the ob-

served parents of Vm. For example, in Figure 2, an LSEM for

Y assumes Y ? fY(X,U,?y) ? ?X ? ?Yis linear in X, where

?Y? ?Y(U,?y) is an unknown function of the unobservables

(U,?y). Note this LSEM for Y implies that the treatment effect

Y(x ? 1) ? Y(x ? 0) ? ? is the same constant ? for all

subjects, since according to the model Y(x ? 1) ? ? ? ?Y

and Y(x ? 0) ? ?Y. Linear structural equation modelers

Epidemiology • Volume 17, Number 4, July 2006Instruments for Causal Inference

© 2006 Lippincott Williams & Wilkins

365

Page 7

would redraw the DAG in Figure 2 as the DAG in Figure 4,

since they replace unmeasured common causes of 2 measured

variables by bidirectional edges.

These 4 causal models are connected as follows. An

LSEM is a special case of an NPSEM. An NPSEM is both a

causal DAG model and a counterfactual model. For example,

the NPSEM represented by the DAG in Figure 2 implies the

counterfactual versions of conditions (ii) and (iii) previously.

In fact an NPSEM implies a stronger version of condition

(iii): joint independence of the counterfactuals and Z, repre-

sented as

{Y(x ? 1), Y(x ? 0)} Z

Although an NPSEM is a causal DAG, not all causal DAG

models are NPSEMs. Indeed as mentioned above, a causal

DAG model makes no reference to and is agnostic about the

existence of counterfactuals. In this appendix we shall use

counterfactuals freely to derive results. In Section 4, we

briefly consider which of our results would remain true under

a causal DAG agnostic about counterfactuals. All the results

for NPSEMs described in this appendix actually hold under

the slightly weaker assumptions encoded in a fully random-

ized causally interpreted structured tree graph (FRCISTG)

model of Robins.24,25All NPSEMs are FRCISTGs but not all

FRCISTGs are NPSEMs.26

2. IV Estimators and Effect Modification

In this section we show that the usual IV estimator

estimates the parameter of a particular additive structural

mean model: a counterfactual model for the effect of treat-

ment on the treated. We then describe additional assumptions

necessary for this estimator to also identify the average causal

effect E?Y(1)? ? E?Y(0)? of X on Y in the entire study

population. We end by contrasting these results with those

obtained under a multiplicative structural mean model.

2.1 Additive Structural Mean Models (SMMs)

Additive and multiplicative SMMs were introduced by

Robins8in 1989 and were treated more fully in his later

work.10We first consider the special case in which X and Z

are time-independent and dichotomous and there are no

covariates (eg, measured confounders of the effect of Z on Y).

The general time-independent case is treated in Section 4. See

Robins10,18for time-varying treatments instruments and con-

founders. A nonparametric (saturated) additive SMM is

E?Y(1)?X ? 1, Z? ? E?Y(0)?X ? 1, Z? ? ?{1, Z, ?*}

where ?{1, Z, ?*} ? ?0* ? ?1*Z

or, equivalently,

E?Y?X, Z? ? E?Y(0)?X, Z? ? ?{X, Z, ?*} ? X(?0* ? ?1*Z)

where Y(1) and Y(0) are shorthand for Y(x ? 1) and Y(x ? 0),

respectively, and ?0

parameter ?0

the treated subjects with Z ? 0. Similarly ?0

average causal effect of treatment among the treated subjects

(X ? 1) with Z ? 1. Thus, for the treated subjects, the

parameter ?0

effect modification by Z on an additive scale. It immediately

follows that an LSEM Y ? ?X ? ?Yis an additive SMM

without effect modification by Z with ?0

We turn next to identification and estimation of the

parameters of this additive SMM under the conditional mean

independence assumption

?and ?1

?are unknown parameters. The

?is the average causal effect of treatment among

?? ?1

?is the

?is the main effect of treatment and ?1

?quantifies

?? ? and ?1

?? 0.

E?Y(0)?Z ? 1? ? E?Y(0)?Z ? 0?

(1)

which, by condition (iii), is satisfied by the NPSEM repre-

sented by the DAG in Figure 2 (but not by the DAG in Fig.

3). This assumption can be conveniently rewritten in the

mathematically equivalent form

E?Y ? X(?0

*? ?1

*)?Z ? 1? ? E?Y ? X ?0

*?Z ? 0?

(2)

Let us first consider the case where we assume ?1

a priori so there is no effect modification by Z among the

treated. Then ?0

aforementioned equation for ?0

?? 0

?is the only unknown parameter. Solving the

?with ?1

?? 0, we have

?0

*?

E?Y?Z ? 1? ? E?Y?Z ? 0?

E?X?Z ? 1? ? E?X?Z ? 0?

(3)

That is, ?0

average effect of Z on Y to the average effect of Z on X.10We

conclude that the usual IV estimator is estimating the param-

eter ?0

However if, as in most of the main text, our interest is

in the average causal effect E?Y(1)? ? E?Y(0)? of X on Y in the

study population, we are not yet finished because ?0

?is exactly the usual IV estimand–the ratio of the

?of our additive SMM.

?does not

FIGURE 4. The observational study represented by Figure 2

with all unmeasured common causes replaced by bidirectional

arrows.

Herna ´n and Robins

Epidemiology • Volume 17, Number 4, July 2006

© 2006 Lippincott Williams & Wilkins

366

Page 8

generally equal E?Y(1)? ? E?Y(0)?. Rather, by definition of

the model and the assumption of no effect modification by Z

among the treated,

?0

*? E?Y(1) ? Y(0)?X ? 1, Z ? 1?

? E?Y(1) ? Y(0)?X ? 1, Z ? 0?

and thus ?0

of treatment on the treated (X ? 1).10,14To conclude that

?0

usual IV estimand, we must assume or derive that the average

effect of treatment on the treated and on the untreated are

identical:

?? E?Y(1)?X ? 1? ? E?Y(0)?X ? 1? is the effect

??E?Y(1)? ? E?Y(0)? and thus that E?Y(1)? ? E?Y(0)? is the

E?Y(1)?X ? 1? ? E?Y(0)?X ? 1?

? E?Y(1)?X ? 0? ? E?Y(0)?X ? 0?

(4)

Equation 4 obviously holds when we assume an LSEM for Y

since an LSEM implies the same treatment effect for all

subjects regardless of their X. We can therefore conclude, as

stated in the text, that assuming an LSEM for Y identifies the

average causal effect as the ratio (3) irrespective of whether

the denominator E?X?Z ? 1? ? E?X?Z ? 0? equals the causal

effect of Z on X (as on the DAG in Fig. 1) or simply reflects

the noncausal association between Z and X due to the pres-

ence of their common cause U* (as on the DAG in Fig. 2).

We now provide weaker, somewhat more plausible,

assumptions than those imposed by an LSEM for Y under

which (4) holds and thus (3) equals E?Y(1)? ? E?Y(0)?. These

weaker assumptions are mean independence of Y(1) and Z as

E?Y(1)?Z ? 1? ? E?Y(1)?Z ? 0?

(5)

andtheassumption(6a)ofnoeffectmodificationbyZwithinthe

untreated (X ? 0). Consider the assumptions

E?Y(1) ? Y(0)?X ? 0, Z ? 1?

? E?Y(1) ? Y(0)?X ? 0, Z ? 0?, (6a)

E?Y(1) ? Y(0)?X ? 1, Z ? 1?

? E?Y(1) ? Y(0)?X ? 1, Z ? 0?. (6b)

Assumption

no current treatment interaction with respect to Z in

Robins (1994), and (6b) is a restatement of our assumption

?1

assumptions of no effect modification plus the counterfactual

mean independence assumptions (1) and (5) implies

(6a)was calledthe assumption of

*? 0. Robins8,10noted that the conjunction of these 2

E?Y(1)? ? E?Y(0)? ?

E?Y?Z ? 1? ? E?Y?Z ? 0?

E?X?Z ? 1? ? E?X?Z ? 0?

(7)

Heckman14later derived the same identifying formula under

the joint assumption that average treatment effect in those

with Z ? 1 did not further depend on X, and similarly for

Z ? 0. That is,

E?Y(1) ? Y(0)?X ? 1, Z ? 1?

? E?Y(1) ? Y(0)?X ? 0, Z ? 1?

(8a)

E?Y(1) ? Y(0)?X ? 1, Z ? 0?

? E?Y(1) ? Y(0)?X ? 0, Z ? 0?

(8b)

In fact, given Z and X are correlated and our 2 counterfactual

mean independence assumptions, the assumptions (8) are

equivalent to (6) as proved in Theorem 1 in the last section.

Results closely related to Theorem 1 were discussed by

Heckman.14

Furthermore, under the NPSEM represented by the

DAG in Figure 2, we show in Theorem 2 of section 5 that

sufficient conditions for equation (6b) ?i.e., ?1

with probability one, either Y (x ? 1, U) ? Y (x ? 0, U) does

Pr ?X (z ? 1,U) ? x?

*? 0? are that,

not depend on U or

Pr ?X (z ? 0,U) ? x?

does not depend on

U for x ? 1. Sufficient conditions for (6a) are identical except

that we replace “for x ? 1” by “for x ? 0”. However, under

our NPSEM it is impossible for the above ratio not to depend

on U for x ? 1 and x ? 0 simultaneously (see Theorem 3 in

section 5). Thus whenever U and X interact on the additive

scale to cause Y, it would not be reasonable to assume the

usual IV estimand exactly equals the average effect E ?Y(1) ?

Y(0)? . Finally note that the condition “Y (x ? 1, U) ? Y (x ?

0, U) does not depend on U” does not imply the treatment

effect is the same for each individual as Y (x ? 1, U) ? Y (x ?

0, U) ? fy(x ? 1, U,?y) ? fy(x ? 0, U,?y) may depend on ?y,

although not on U.

2.2 Multiplicative SMM

We now show that if we assume a multiplicative (ie,

log-linear) SMM without interaction and no multiplicative

effect modification by Z given X ? 0, E?Y(1)? ? E?Y(0)?

remains identified (ie, depends on the distribution of the

observed data), but no longer equals (7). The new identifying

estimand is given in the next theorem.

Again we consider the special case of dichotomous X

and Z and no covariates. The saturated multiplicative SMM is

E?Y(1)?X ? 1, Z? ? E?Y(0)?X ? 1, Z? ?{1, Z, ?*}

where ?{1, Z, ?*} ? exp {?0

*? ?1

*Z}(9)

or, equivalently,

E?Y?X, Z? ? E?Y0?X, Z? exp {X{?0

*? ?1

*Z}}

For a dichotomous Y, exp {?0

treated subjects with Z ? 0 and exp {?0

risk ratio in the treated with Z ? 1.

Theorem 4 in Section 5 shows that, when Equation (1)

holds and ?1

by Z in the treated, then

?} is the causal risk ratio in the

*? ?1

*} is the causal

?? 0 , ie, no multiplicative effect modification

exp (??0

*) ? 1 ?

E?Y?Z ? 1? ? E?Y?Z ? 0?

? E?Y?X ? 1, Z ? 0? E?X?Z ? 0?}{

E?Y?X ? 1, Z ? 1? E?X?Z ? 1?

Epidemiology • Volume 17, Number 4, July 2006 Instruments for Causal Inference

© 2006 Lippincott Williams & Wilkins

367

Page 9

If, in addition, Equation 5 holds and there is no multiplicative

effect modification by Z in the untreated, i.e.,

E?Y(1)?X ? 0, Z ? 1?

E?Y(0)?X ? 0, Z ? 1??

E?Y(1)?X ? 0, Z ? 0?

E?Y(1)?X ? 0, Z ? 0?

then E?Y(1)?/E?Y(0)? ? exp(?0

is

*), and the average causal effect

E?Y(1)? ? E?Y(0)? ? E?Y?X ? 0?

?{1 ? E?X?} ?exp (?0

*) ? 1? ? E?X? E?Y?X ? 1?

(10)

Because whenever E?Y(1)? ? E?Y(0)?, the expression for

E?Y(1)? ? E?Y(0)? in Equation 10 differs from that in Equa-

tion 7, our estimate of E?Y(1)? ? E?Y(0)? will depend on

whether we assume no effect modification by Z on an additive

versus a multiplicative scale. Unfortunately, as shown by

Robins,10it will not be possible to determine which, if either,

assumption is true. The reason for this impossibility is that,

even if we had an infinite sample size and Equations 1 and 5

hold, the only equality restriction on the joint distribution of

the observed data is given by Equation 2 or by the mathe-

matically equivalent expression

E?Y exp {?X {?0

*? ?1

*}}?Z ? 1?

? E?Y exp {?X?0

*}?Z ? 0?

(11)

Thus we have only one restriction (ie, one equation) satisfied

by the distribution of the observed data. This single restric-

tion can be written in either of the 2 different but mathemat-

ically equivalent forms Equation 2 or Equation 11. Because

with one equation it is not possible to solve for 2 parameters,

one cannot test whether ?1

additive model of Eq. (2) or in the saturated multiplicative

model of Eq. (11). Further, one cannot solve for ?0

model or estimate the average treatment in the total popula-

tion or any subpopulation. Thus only bounds on E?Y(1)? ?

E?Y(0)? are available.8Under an NPSEM’s stronger version

of condition (iii), which implies (but is not implied by)

Equations 1 and 5, Balke and Pearl13showed that E?Y(1)? ?

E?Y(0)? is identified in certain exceptional circumstances that

are so unusual as to be curiosities.

Summarizing, we cannot identify causal effects us-

ing additive or multiplicative SMMs when we leave the

functions ?{X, Z, ?} completely unspecified (saturated)

as we then have more unknown parameters to estimate

than equations to estimate them with. Thus, for identifica-

tion, we must reduce the dimension of ? through modeling

assumptions, such as assuming certain interactions and/or

main effects are absent.

An additional point is that although the assumptions

encoded in the DAG in Figure 2 (ie, conditions (ii) and (iii)

in the main text) are not empirically verifiable, they can, for

certain data distributions, be empirically rejected.13More

precisely, there exist empirical ?-level tests of the composite

assumptions encoded in the DAG in Figure 2 that, when they

reject, the rejection can be taken as evidence against those

*? 0 either in the saturated

*in either

assumptions. But, for most data distributions under which the

assumptions encoded in the DAG are false, these tests will

fail to reject at greater than level ? even with an infinite

sample size. That is, the tests are not consistent against all

alternatives.

Additive and multiplicative SMM models were devel-

oped to provide a rigorous framework for identification and

estimation via instrumental variables of the effects of a

time-varying treatment or exposure. SMMs explicitly use

counterfactuals (ie, potential outcomes) to characterize the

consequences of between-subject heterogeneity in the treat-

ment effect for instrumental variable estimation. For time-

independent (but not for time-varying treatments) treatments,

additive SMMs are somewhat related to the random coeffi-

cients model discussed by Heckman and Robb.3,27However,

Heckman and Robb did not fully appreciate the usefulness of

instrumental variable methods in these models. In particular,

Heckman and Robb3,27and Heckman28–in contrast to Rob-

ins10–failed to recognize the value of instrumental variables

for estimating average effect of treatment on the treated in the

presence of heterogenous treatment effects (ie, random coef-

ficients), as pointed out by Angrist, Imbens, and Rubin.29

3. IV Estimators Based on Monotonicity

Assumptions

As discussed in the text, monotonicity assumptions are

an alternative to the assumption of a nonsaturated model for

?{X, Z, ?} for obtaining identification.

When the causal instrument U* is binary, we can

define the compliers to be subjects for whom X(u* ? 0) ? 0,

X(u* ? 1) ? 1. Imbens and Angrist15proved that the average

causal effect in the compliers

E?Y(x ? 1) ? Y(x ? 0)?X(u* ? 0) ? 0, X(u* ? 1) ? 1?

equals

E?Y?U* ? 1? ? E?Y?U* ? 0?

E?X?U* ? 1? ? E?X?U* ? 0?under the monotonicity

assumption X(u* ? 1) ? X(u* ? 0) for all subjects. However,

they considered a setting in which, in contrast to ours, data on

the causal instrument U* was available. In Theorem 5 of

Section 5, we show that the average effect in the compliers is

identified by the ratio (7) even when we only have data on a

surrogate Z for the causal instrument U*. This result depends

critically on 2 assumptions: that Z is independent of X and Y

given the causal instrument U*, and that U* is binary.

However, we now argue that the independence assumption

has little substantive plausibility unless U* is continuous. To

do so we need to provide a more precise operational defini-

tion of a physician’s prescribing preference. We consider 2

possible definitions—one binary and one continuous.

Definition 1

Dichotomous prescribing preference: Let U* be a di-

chotomous (0,1) variable that takes the value 1 for a subject

i if and only if at the time the physician treats subject i, he

would treat more than 50% of all study subjects with selective

NSAIDs.

Herna ´n and Robins

Epidemiology • Volume 17, Number 4, July 2006

© 2006 Lippincott Williams & Wilkins

368

Page 10

Definition 2

Continuous prescribing preference: Let U* be a continu-

ous variable whose value for subject i is the proportion of the

study population that the subject’s physician would treat with

selective NSAIDs at the time the physician treats the subject i.

Consider 2 physicians both with the continuous U* ?

0.5 (say, one with continuous U* equal to 0.51 and the other

equal to 0.95) and thus with discrete U* ? 1. Then if the last

patient treated by subject i’s physician received selective

NSAIDs (Z ? 1), it is more likely that the patient’s physician

had the higher continuous U* and thus it is more likely that

subject i will receive selective NSAIDs (X ? 1). That is, X

and Z will be correlated given the discrete U* and the DAG

in Figure 2 will not represent the data. However, Figure 2

remains plausible if we use the continuous definition of U*.

In that case, neither Theorem 5 nor its monotonicity assump-

tion are relevant. Rather, for continuous U* we define mono-

tonicity as follows:

Definition of Monotonicity for Continuous U*

If a physician with U* ? u would treat patient i with

selective NSAIDs, then all physicians with U* greater than or

equal to u would treat the patient with selective NSAIDs.

Formally, X(u*) is a nondecreasing function of u* on the

support of U*.

Note that, under the DAG in Figure 2, U* satisfies

Pr (X ? 1?U*) ? U*, ie, among those patients whose

physician would treat a fraction U* of all patients, the

fraction of patients who receive treatment is exactly U*.

That is, the continuous instrument U* is the propensity

score for treatment.

Let MTP(u*) be the average treatment effect among

those who would be treated by a physician who treats a

fraction u* of the study population but by no physician who

treats less, ie, MTP(u*) ? E?Y(1) ? Y(0)?X(U* ? u*) ? 1,

{X(U* ? v) ? 0; v ? u*}?. Heckman and Vytlacil30

Angrist et al31show that under the assumptions encoded in

DAG 2 and continuous monotonicity, MTP (u*) equals the

derivative ?{E?Y?U* ?u*?}/?u*. Thus, were data on U*

available, MTP (u*) would be identified. In Theorem 6 (see

section 5) we show that, regardless of whether data on U* are

available, the estimand (7) based on Z is a particular weighted

average of ?{E?Y?U* ? u*?}/?u* and thus of MTP (u*).

Specifically,

and

E?Y?Z ? 1? ? E?Y?Z ? 0?

E?X?Z ? 1? ? E?X?Z ? 0???{

?

?U*E?Y?U*?}w (U*) dU*,

S(U*?Z ? 1) ? S(U*?Z ? 0)

w (U*) ?

?

Ilow

Iup

{S(U*?Z ? 1) ? S(U*?Z ? 0)}dU*

?

S(U*?Z ? 1) ? S(U?Z ? 0)

E?U*?Z ? 1? ? E?U*?Z ? 0?

where S(·) is the survival function.

4. Extensions

4.1 SMM With Covariates

WenowpresentamoregeneraladditiveSMMthatallows

for continuous or multivariate exposures X, instruments Z, and

preinstrument covariates C. A general additive SMM assumes

E?Y?X, Z, C? ? E?Y0?X, Z, C? ? ?{X, Z, C, ?*}

where ?{X, Z, C, ?} is a known function, ?* is an unknown

parameter vector and ?{0, Z, C, ?} ? ?{X, Z, C, 0} ? 0.

That is, an additive SMM is a model for the average causal

effect of treatment level X compared with baseline level 0

among the subset of subjects at level Z of the instrument and

level C of the confounders whose observed treatment is

precisely X.

We turn next to identification and estimation of the

parameters of a general additive SMM under the conditional

counterfactual mean independence assumption

E?Y(0)?Z ? 1, C? ? E?Y(0)?Z ? 0, C?

(12)

Now according to the model E?Y(0)?X, Z, C? ? E?Y ? ?

{X, Z, C, ?*}?X, Z, C?. Hence, averaging over X within levels

of (Z, C), we have E?Y(0)?Z, C? ? E?Y ? ?{X, Z, C, ?*}?

Z, C?. Thus, by the assumed counterfactual mean indepen-

dence assumption (12),

E?Y ? ?{X, Z, C, ?*}?Z, C? ? E?Y ? ?{X, Z, C, ?*}?C?

This implies that ¥i?1

n

Ui(?) has mean zero when ? ? ?* with

U(?) ? ?Y ? ?{X, Z, C, ?}? b(C) (Z ? E?Z?C?)

where b(C) is a user supplied vector function of C of the

dimension of ?* (as one needs one equation per unknown

parameter). Thus we would expect that the solution

?ˆto ¥i?1

Ui(?) ? 0 will be consistent and asymptotically

normal for ?* provided the square matrix E??U(?)/??T? of

expected partial derivatives is invertible, which can only

happen when ?* is identified. Conditions for identification

are discussed by Robins10and in Section 2 for the special

case of X and Z dichotomous and C absent. Note in a

randomized trial E?Z?C? will be a known function of the

randomization probabilities. In most trials, E?Z?C? ? 1/2 for

all C. In observational studies E?Z?C? will have to be esti-

mated from the data, often by regression. The estimator given

here is neither efficient nor doubly robust. Chamberlain32and

Robins10discuss efficient estimators. Robins33discusses dou-

bly robust estimators. G-estimation of nested additive and

multiplicative SMMs extend the aforementioned IV methods

for time-independent treatments to time-dependent treatments

with time-varying confounders.10

Analogously, a more general multiplicative SMM as-

sumes E?Y?X, Z, C??E?Y0?X, Z, C?exp(?{X, Z, C,?*}) where

?{X,Z,C,?} is a known function and ?{0,Z,C,?}) ?

?{X,Z,C,0} ? 0. Estimation proceeds as for an additive SMM

n

Epidemiology • Volume 17, Number 4, July 2006 Instruments for Causal Inference

© 2006 Lippincott Williams & Wilkins

369

Page 11

except U(?) is redefined to be Y exp???{X,Z,C,?}? ?

b(C)(Z ? E?Z?C?).

4.2 Causal DAGs Without Counterfactuals

Dawid6,34hasstrenuouslyarguedthatanyresultsobtained

using counterfactual causal models that cannot also be obtained

using causal DAG models without counterfactuals are suspect.

He particularly criticized instrumental variable methods that

obtain identification of the effect of treatment in the compliers

by assuming monotonicity. He argued that joint counterfactuals

such as X(u* ? 0) and X(u* ? 1) are not well defined. He

therefore concluded that compliers are not a well-defined subset

of the population and thus it is meaningless to speak of the

causal effect among compliers. However he claimed that impor-

tant instrumental variables results could still be obtained without

counterfactuals and he backed up this claim by rederiving

without counterfactuals the bounds for the average treatment

effect that Balke and Pearl13had previously derived under a

counterfactual model. This leaves unanswered the question of

whethertheidentifyingassumptionsofnoeffectmodificationby

Z within all levels of treatment X can be meaningfully expressed

in a causal DAG model without counterfactuals. Elsewhere we

show that it can be.

5. Theorems and Proofs

Theorem 1

Given Z and X dependent and Equations 5 and 1, a)

Equations 8 hold N Equations 6 hold, and b) both Equations

8 and 6 imply Equation 4 and thus 7

Proof.

a)fLetY(1)?Y(0)??. E???X?1,Z??E???X?0,Z?

fE???X?1,Z??E???X?0,Z??E???Z??E??? where the

last equality uses Equations 5 and 1

a) dConversely define ?(Z) ? E?X?Z?.

Then E????E???Z?1?

?E???X?1,Z?0??(0)?E???X?0,Z?0?{1??(0)}

?E???X?1,Z?1??(1)?E???X?0,Z?1?{1??(1)}

?E???X?1,Z?0??(1)?E???X?0,Z?0?{1??(1)}

where the last equality is by the premise Eqs (6).

Thus{E???X?1,Z?0??E???X?0,Z?0?}?(0)?E???X?0,Z?0?

?{E???X?1,Z?0??E???X?0,Z?0?}?(1)?E???X?0,Z?0?.

Thus 0?{E???X?1,Z?0??E???X?0,Z?0?}{?(1)??(0)}.

Since {?(1) ? ?(0)} ? 0 by assumption, we conclude

E???X?1,Z?1??E???X?0,Z?1?. A symmetric argument

shows E???X?1,Z?0??E???X?0,Z?0?

b) From the proof of a) f above, E???X?1,Z??E???X?0,Z?

?E???. Hence E??? ? E???X?1??E???X?0? ■

Theorem 2

Consider an NPSEM represented by the DAG in Figure

2. E?Y(1) ? Y(0)?X ? x, Z ? z ? does not depend on Z if,

with probability 1, either (i) Y(x ? 1, U) ? Y(x ? 0, U) does

Pr?X?z ? 1,U? ? x?

not depend on U or (ii)

Pr?X?z ? 0,U? ? x?

does not

depend on U.

Proof.

E?Y?1??Y?0??X,Z???E?Y?1? ? Y?0??X,Z,U?dF?U?X,Z?.

But E?Y(x)?X,Z,U? ? E?fy(x,U,?y)?X,Z,U?

? E?fy(x,U,?y)?U? ? E?Y(x,U)?U?. Hence if (i) holds,

E?Y(1) ? Y(0)?X ? x, Z ? z? ? E?Y(1) ? Y(0)] since

?dF(U?X,Z)?1.

If (ii) holds and X ? x,

f?U?X,Z? ?

f?X?U,Z? f?U?Z?

?f?X?U,Z?dF?U?Z??

f?X?U,Z?

f?X?U,Z ? 0?f?X?U,Z ? 0? f?U?

f?X?U,Z? f?U?

?f?X?U,Z?dF?U?

?

?

f?X?U,Z?

f?X?U,Z ? 0?f?X?U,Z ? 0?dF?U?

?

f?X?U,Z ? 0? f?U?

?f?X?U,Z ? 0?dF?U?

which does not depend on Z since, under the NPSEM, f(x?U,Z ?

z) ? Pr?X(z, U) ? x?. But E ?Y(1) ? Y(0)?X,Z ? ? E?Y(1, U) ?

Y(1, U)?U? also does not depend on Z. ■

Theorem 3

On the NPSEM represented by the DAG in Figure 2,

suppose

U

and

X

are

Pr?X?z ? 1,U? ? x?

Pr?X?z ? 0,U? ? x?depends on U for either x?1 or x?0.

dependentgiven

Z.

Then

Proof.

By contradiction. Assume the lemma is false. Let

Pr?X?z ? 1,U? ? x?

? r?x?.

Pr?X?z ? 0,U? ? x?

Then r(x) ?

1?r?1?x?Pr?X?z?0,U??1?x?

1?Pr?X?z ? 0,U? ? 1?x?

Hence Pr?X?z ? 0,U? ? 1 ? x? ?

1 ? r?x?

?r?1 ? x? ? 1?. So

Pr?X ? 1?Z ? 0,U? ? Pr?X ? 1?Z ? 0?.

By symmetry

1

r?x?

?

1 ? Pr?X?z ? 0,U? ? 1 ? x?

1 ? Pr?X?z ? 1,U? ? 1 ? x?

?

1 ?

1

r?1 ? x?Pr?X?z ? 1,U? ? 1 ? x?

1 ? Pr?X?z ? 1,U? ? 1 ? x?

U? ? Pr?X ? 1?Z ? 1?. Hence U and X are independent

given Z, which is a contradiction. ■

so Pr?X?1?Z?1,

Theorem 4 (Robins 1989)8

Assume Z and X are dichotomous and dependent, and

Equation 1 holds. Further assume model (9) holds with

?1

treated. Then ?0

*? 0, ie, no multiplicative effect modification by Z in the

*is identified and

Herna ´n and Robins

Epidemiology • Volume 17, Number 4, July 2006

© 2006 Lippincott Williams & Wilkins

370

Page 12

exp (??0

*) ? 1 ?

E?Y?Z ? 1? ? E?Y?Z ? 0?

E?Y?X ? 1, Z ? 1? E?X?Z ? 1?

? E?Y?X ? 1, Z ? 0? E?X?Z ? 0?

(13)

If, in addition, Equation 5 holds and there is no multi-

plicative effect modification by Z in the untreated, ie,

E?Y(1)?X ? 0, Z ? 1?

E?Y(0)?X ? 0, Z ? 1??

E?Y(1)?X ? 0, Z ? 0?

E?Y(0)?X ? 0, Z ? 0?

(14)

then E?Y(0)?,E?Y(1)?,E?Y(1)?/E?Y(0)? and E?Y(1)??E?Y(0)?

are identified.

E?Y(1)?/E?Y(0)??exp(?0

E?Y(0)??E?Y?X?0?{1?E?X?}?E?X?E?Y?X?1?exp(? ?1

E?Y(1)??E?Y?X?0?{1?E?X?}exp(?0

and E?Y(1)??E?Y(0)?

?E?Y?X?0?{1?E?X?}?exp(?0

Proof.

From ?9?, E?Y exp {?X{?0

and thus E?Y exp {?X{?0

Therefore E?Yexp{?X{?0

?1

E?Yexp{?X{?0

?E?Yexp{?X?0

E?Y exp {?X?0

So ?exp(??0

??exp(??0

from which (13) follows.

Now by ?1

Hence E?Y(0)??E?Y?X?0?{1?E?X?}?E?X?E?Y?X

?1?exp(??0

allowing us to calculate E?Y(1)? and thus E?Y(1)??E?Y(0)? ■

*),

*),

*)?E?X?E?Y?X?1?,

*)?1??E?X?E?Y?X?1?

*? ?1

*? ?1

*??1

*Z}}?X, Z? ? E?Y(0)?X, Z?

*Z}}?Z??E?Y(0)?Z??E?Y(0)?.

*}}?Z?1??E?Yexp{?X{?0

*?

*Z}}?Z?0? and hence

*??1

*}}?Z?1?

*}?Z?0?. Putting ?1

*}?Z ? 1? ? E?Yexp{?X?0

*)? 1?E?YX?Z?1??E?Y?Z?1?

*)?1?E?YX?Z?0??E?Y?Z?0?

*?0 we obtain

*}?Z?0?.

*? 0, E?Y(0)?X?1??E?Y?X?1?exp(??0

*).

*). By ?1

*?0 and ?14?,E?Y (1)?/E?Y(0)??exp(?0

*),

Theorem 5

Suppose we have an NPSEM represented by the DAG

in Figure 2. Further assume the causal instrument U* is

binary and that the following monotonicity assumption holds

X(u* ? 0) ? 1 implies X?u* ? 1) ? 1

Define the compliers to be subjects for whom X(u* ? 0) ? 0,

X(u* ? 1) ? 1. Then the average causal effect in the compliers

E ?Y(x ? 1) ? Y (x ? 0 )? X (u* ? 0) ? 0, X(u* ? 1) ? 1?

identified from the data (X, Z, Y) and equals the ratio (7).

is

Proof.

E?Y?Z?1??E?Y?Z?0?

?E?Y?U*?1,Z?1?E?U*?1?Z?1??E?Y?U*?0,Z?1?{1?

E?U*?1?Z?1?}?E?Y?U*?1,Z?0?E?U*?1?Z?0??

E?Y?U*?0,Z?0?{1?E?U*?1?Z?0?}?

E?Y?U*?1?E?U*?1?Z?1??E?Y?U*?0?{1?E?U*?1?Z?1?}?

E?Y?U*?1?E?U*?1?Z?0??E?Y?U*?0?{1?E?U*?1?Z?0?}

?{E?Y?U*?1??E?Y?U*?0?}{E?U*?1?Z?1??E?U*?1?Z?0?}.

Similarly,

E?X?Z?1??E?X?Z?0?

??E?X?U*?1??E?X?U*?0??{E?U*?1?Z?1??E?U*?1?Z?0?}.

Thus

E?Y?Z ? 1? ? E?Y?Z ? 0?

E?X?Z ? 1? ? E?X?Z ? 0??

The theorem then follows from Imbens and Angrist.15■

E?Y?U* ? 1? ? E?Y?U* ? 0?

E?X?U* ? 1? ? E?X?U* ? 0?.

Theorem 6

Suppose the NPSEM represented by the DAG in Figure

2 and the monotonicity assumption for continuous U* hold,

that Pr(X ? 1?U*) ? U*, and that E?Y?U*? is differentiable on

the support ?Ilow,Iup? ? ?0,1? of U*. Then

E?Y?Z ? 1? ? E?Y?Z ? 0?

E?X?Z ? 1? ? E?X?Z ? 0?

??{

?

?U*E?Y?U*?}

w (U*) dU*,

w(U*) ?

S(U*?Z ? 1) ? S(U*?Z ? 0)

?

Ilow

Iup

{S(U*?Z ? 1) ? S(U*?Z ? 0)} dU*

?

S(U*?Z ? 1) ? S(U*?Z ? 0)

E?U*?Z ? 1? ? E?U*?Z ? 0?

Proof.30,31

E?Y?Z ? 1? ? E?Y?Z ? 0? ?

?

Ilow

E?Y?U*?{F(U*?Z ? 1) ? F(U*?Z ? 0)}?Ilow

?U*E?Y?U*?}

?

Ilow

Iup

E?Y?U*?{f(U*?Z ? 1) ? f(U*?Z ? 0)} ?

Iup?

?

Ilow

Iup

{

?

{F(U*?Z ? 1) ? F (U*?Z ? 0)} dU* ?

Iup

{

?

?U*E?Y?U*?}

{S(U*?Z ? 1) ? S(U*?Z ? 0)} dU*

Similarly,

E?X?Z ? 1? ? E?X?Z ? 0? ?

?

Ilow

?

Ilow

E?U*?Z ? 1? ? E?U*?Z ? 0? ?

Iup

E?X?U*??f (U*?Z ? 1) ? f(U*?Z ? 0)? ?

Iup

U*?f (U*?Z ? 1) ? f(U*?Z ? 0)? ?

Epidemiology • Volume 17, Number 4, July 2006Instruments for Causal Inference

© 2006 Lippincott Williams & Wilkins

371

Page 13

?

Ilow

Iup

{S(U*?Z ? 1) ? S(U*?Z ? 0)} dU* ■

REFERENCES

1. Martens E, Pestman W, de Boer A, et al. Instrumental variables:

applications and limitations. Epidemiology. 2006;17:260–267.

2. Brookhart MA, Wang P, Solomon DH, et al. Evaluating short-term drug

effects using a physician-specific prescribing preference as an instru-

mental variable. Epidemiology. 2006;17:268–275.

3. Heckman J, Robb R. Alternative methods for estimating the impact of

interventions. In: Heckman J, Singer B, eds. Longitudinal Analysis of

Labor Market Data. New York: Cambridge University Press; 1985:

156–245.

4. Pearl J. Causality: Models, Reasoning, and Inference. New York:

Cambridge University Press; 2000.

5. Spirtes P, Glymour C, Scheines R. Causation, Prediction and Search.

2nd ed. Cambridge, MA: MIT Press; 2000.

6. Dawid AP. Causal inference using influence diagrams: the problem of

partial compliance. In: Green PJ, Hjort NL, Richardson S, eds. Highly

Structured Stochastic Systems. New York: Oxford University Press;

2003.

7. Holland PW. Causal inference, path analysis, and recursive structural

equation models. In: Clogg C (ed). Sociological Methodology. Wash-

ington, DC: American Sociological Association; 1988:449–484.

8. Robins JM. The analysis of randomized and non-randomized AIDS

treatment trials using a new approach to causal inference in longitudinal

studies. In: Sechrest L, Freeman H, Mulley A, eds. Health Services

Research Methodology: A Focus on AIDS. NCHRS, U.S. Public Health

Service; 1989:113–59.

9. Angrist J, Imbens GW, Rubin DB. Identification of causal effects using

instrumental variables. J Am Stat Assoc. 1996;91:444–455.

10. Robins JM. Correcting for non-compliance in randomized trials using

structural nested mean models. Commun Stat. 1994;23:2379–412.

11. Greenland S. An introduction to instrumental variables for epidemiolo-

gists. Int J Epidemiol. 2000;29:722–729.

12. Manski C. Nonparametric bounds on treatment effects. Am Econ Rev.

1990;80:319–323.

13. Balke A, Pearl J. Bounds on treatment effects from studies with imper-

fect compliance. J Am Stat Assoc. 1997;92:1171–1176.

14. Heckman J. Instrumental variables: A study of implicit behavioral

assumptions used in making program evaluations. J Human Resources.

1997;32:441–462.

15. Imbens GW, Angrist J. Identification and estimation of local average

treatment effects. Econometrica. 1994;62:467–475.

16. Robins J, Greenland S. Comment on “Identification of causal effects

using instrumental variables” by Angrist, Imbens and Rubin. J Am Stat

Assoc. 1996;91:456–8.

17. Robins JM. Analytic methods for estimating HIV treatment and

cofactor effects. In: Ostrow DG, Kessler R, eds. Methodological

Issues of AIDS Mental Health Research. New York: Plenum Pub-

lishing; 1993:213–290.

18. Robins JM. Optimal structural nested models for optimal sequential

decisions. In: Lin DY, Heagerty P, eds. Proceedings of the Second

Seattle Symposium on Biostatistics. New York: Springer; 2003.

19. Robins JM. Comment on “Covariance adjustment in randomized exper-

iments and observational studies” by Paul Rosenbaum. Stat Sci. 2002;

17:286–327.

20. Robins JM, Rotnitzky A. Estimation of treatment effects in randomised

trials with non-compliance and a dichotomous outcome using structural

mean models. Biometrika. 2004;91:763–783.

21. Van der Laan MJ, Hubbard A, Jewell N. Estimation of treatment effects

in randomized trials with noncompliance and a dichotomous outcome.

UC Berkeley Division of Biostatistics Working Paper Series 2004;

Working Paper 157.

22. Tan Z. Estimation of causal effects using instrumental variables. J Am

Stat Assoc. in press.

23. Robins JM, Greenland S. Comment on “Causal inference without

counterfactuals” by A.P. Dawid. J Am Stat Assoc. 2000;95:431–5.

24. Robins JM. A new approach to causal inference in mortality studies

with sustained exposure periods–Application to control of the healthy

worker survivor effect. Mathematical Modelling. 1986;7:1393–1512

(errata in Computers and Mathematics with Applications. 1987;14:

917–921.

25. Robins JM. Addendum to “A new approach to causal inference in

mortality studies with sustained exposure periods.” Computers and

Mathematics with Applications. 1987;14:923–945 (errata in Computers

and Mathematics with Applications. 1987;18:477.

26. Robins JM. Semantics of causal DAG models and the identification of

direct and indirect effects. In: Green P, Hjort NL, Richardson S, eds.

Highly Structured Stochastic Systems. New York: Oxford University

Press; 2003:70–81.

27. Heckman J, Robb R. Alternative methods for solving the problem of

selection bias in evaluating the impact of treatments on outcomes. In:

Wainer H, ed. Drawing Inferences from Self-Selected Samples. Berlin:

Springer Verlag; 1986.

28. Heckman J. Randomization and Social Policy Evaluation. Technical

Working Paper 107. National Bureau of Economic Research; 1991.

29. Angrist J, Imbens GW, Rubin DB. Rejoinder to comments on “Identi-

fication of causal effects using instrumental variables.” J Am Stat Assoc.

1996;91:468–472.

30. Heckman JJ, Vytlacil EJ. Local instrumental variables and latent vari-

able models for identifying and bounding treatment effects. Proc Natl

Acad Sci USA. 1999;96:4730–4734.

31. Angrist JD, Graddy K, Imbens GW. The interpretation of instrumental

variable estimators in simultaneous equations models with an applica-

tion to the demand for fish. Rev Econ Stud. 2000;67:499–527.

32. Chamberlain G. Asymptotic efficiency in estimation with conditional

moment estrictions. J Econometrics. 1987;34:305–334.

33. Robins JM. Robust estimation in sequentially ignorable missing data and

causal inference models. In: 1999 Proceedings of the Section on Bayes-

ian Statistical Science. Alexandria, VA: American Statistical Associa-

tion; 2000:6:–10.

34. Dawid AP. Causal inference without counterfactuals. J Am Stat Assoc.

2000;95:407–424.

Herna ´n and Robins

Epidemiology • Volume 17, Number 4, July 2006

© 2006 Lippincott Williams & Wilkins

372