American Journal of Epidemiology

© The Author 2011. Published by Oxford University Press on behalf of the Johns Hopkins Bloomberg School of

Public Health. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

Vol. 174, No. 11

DOI: 10.1093/aje/kwr364

Advance Access publication:

October 24, 2011

Practice of Epidemiology

Effects of Adjusting for Instrumental Variables on Bias and Precision of Effect

Estimates

Jessica A. Myers*, Jeremy A. Rassen, Joshua J. Gagne, Krista F. Huybrechts,

Sebastian Schneeweiss, Kenneth J. Rothman, Marshall M. Joffe, and Robert J. Glynn

* Correspondence to Dr. Jessica A. Myers, Division of Pharmacoepidemiology and Pharmacoeconomics, Department of

Medicine, Brigham and Women’s Hospital and Harvard Medical School, 1620 Tremont Street, Suite 3030, Boston, MA 02120

(e-mail: jmyers6@partners.org).

Initially submitted February 28, 2011; accepted for publication June 8, 2011.

Recent theoretical studies have shown that conditioning on an instrumental variable (IV), a variable that is

associated with exposure but not associated with outcome except through exposure, can increase both bias

and variance of exposure effect estimates. Although these findings have obvious implications in cases of known

IVs, their meaning remains unclear in the more common scenario where investigators are uncertain whether a

measured covariate meets the criteria for an IV or rather a confounder. The authors present results from two

simulation studies designed to provide insight into the problem of conditioning on potential IVs in routine epide-

miologic practice. The simulations explored the effects of conditioning on IVs, near-IVs (predictors of exposure that

are weakly associated with outcome), and confounders on the bias and variance of a binary exposure effect

estimate. The results indicate that effect estimates which are conditional on a perfect IV or near-IV may have

larger bias and variance than the unconditional estimate. However, in most scenarios considered, the increases in

error due to conditioning were small compared with the total estimation error. In these cases, minimizing unmeasured

confounding should be the priority when selecting variables for adjustment, even at the risk of conditioning on IVs.

bias (epidemiology); confounding factors (epidemiology); epidemiologic methods; instrumental variable; precision;

simulation; variable selection

Abbreviations: IV, instrumental variable; RD, risk difference; RR, risk ratio.

Editor’s note: An invited commentary on this article ap-

pears on page 1223, and the authors’ response appears on

page 1228.

In studies of exposure effect, measured and unmeasured

factors that are associated with both exposure and outcome

may confound the targeted causal effect. Estimating the ex-

posure effect conditional on all confounding factors yields

consistent estimates (1–4), so choosing which variables to

use for adjustment in studies with many measured covariates

is an important step for ensuring the validity of effect esti-

mates. In an attempt to mimic a randomized trial, some

authors have argued that all measured preexposure covariates

should be balanced between exposure groups (5–8). This

strategy is equivalent to selecting all predictors of exposure,

a common practice when confounder adjustment is carried out

via the propensity score (9). Other authors have argued against

this practice on the grounds that adjusting for some types of

covariates may increase rather than decrease bias (10–12).

In particular, recent literature has questioned whether in-

strumental variables (IVs) (or instruments) should be con-

ditioned upon in effect estimation. IVs are variables that are

associated with exposure but are not associated with out-

come, except through their effect on exposure. IVs may be

used to obtain an unbiased estimate of exposure effect in the

presence of unmeasured confounding via the class of IV

methods. (See recent reviews (13–16) for precise definitions

of IVs and IV methods.) Because IVs are, by definition,

predictors of exposure, any confounder selection strategy

1213 Am J Epidemiol. 2011;174(11):1213–1222


that is based on selecting the predictors of exposure will be

likely to include IVs.

Rubin (17) suggested that including a variable that is un-

related to outcome in the propensity score may reduce estima-

tion efficiency, and that result was confirmed in simulation

studies (18, 19). Theoretical results presented by Hahn (20)

and White and Lu (21) show that selecting confounders to

maximize independent variation in the exposure will result

in more efficient estimators. In addition, theoretical analyses

have established that including IVs in the set of conditioning

variables can increase unmeasured confounding bias (22–24),

and empirical examples presented by Bhattacharya and Vogt

(22) and Patrick et al. (25) found that including a ‘‘known’’

instrument in the propensity score model resulted in an esti-

mate which was farther from the assumed truth than that

obtained from the model that did not include the IV.

Despite the evident drawbacks of conditioning on an IV,

the implication of these results for epidemiologic practice

remains unclear. True instruments are difficult to identify and

cannot be verified empirically (15). For example, in a series

of commentaries on a paper by Stukel et al. (26), authors

debated whether or not the assumed IV, regional cardiac

catheterization rate, was more likely to be a confounder of

the association between invasive cardiac management and

survival of acute myocardial infarction and should therefore

be adjusted for (27–30). Moreover, in the presence of unmea-

sured confounding, an IV may look mistakenly like a con-

founder, since it may be associated with exposure and

associated with outcome conditional on exposure. Finally,

the available theoretical studies of this issue provide results

under linear models and do not indicate the magnitude of the

increases in bias and variance for other models. Any increase

in bias due to unnecessary conditioning must be weighed

against the danger of excluding real confounders from the

conditioning set—an issue that is particularly troubling in sec-

ondary analyses of electronic health-care data that often rely

on adjusting for hundreds of confounding covariates (31, 32).

Our objective in the current analysis was to explore the

magnitude of the effects on bias and variance of conditioning

on an IV in a range of common epidemiologic studies of a

binary exposure. Here we expand on the theoretical analyses

by providing quantitative results under a range of common

linear models and by further providing results under multiplicative

models. We focus on the case where an IV may exist

in the set of measured variables but its status is unknown to

investigators. We present results from a Monte Carlo simulation

study that considers true instruments, variables with no direct

effect on unobserved confounding factors or outcome, and

“near-instruments,” variables that are weakly associated with

the unmeasured confounder. We also explore effects under varying

assumptions about the strength of the IV association with

exposure and the magnitude of the unmeasured confounding.

MATERIALS AND METHODS

Review of the theory

We refer to X as the exposure of interest and Y as an

outcome that may be caused by X. We assume that there

exists an unobserved factor, U, that confounds the association

between X and Y and a measured covariate, Z. If Z satisfies

the criteria for an IV for the exposure-outcome pair (X, Y),

then there is no association between Z and Y, except through

X, as shown in Figure 1. We may think of this graph as repre-

senting residual associations after controlling for a vector of

measured confounders. In addition, U may represent a constellation

of many unobserved confounders, and Z may represent

the combined effect of multiple instruments. The true expo-

sure effect and target of estimation is b2. The parameter a2

controls the strength of the IV association with exposure. The

magnitude of confounding is dependent on both a1 and b1.

We want to compare the bias of the crude, unadjusted esti-

mator of exposure effect (given by the coefficient of the

regression of Y on X) with the bias of the estimator for exposure

effect that conditions on Z (given by the regression

coefficient on X in the regression of Y on X and Z). We follow

the example of Pearl (24) and assume a linear structural equa-

tion framework among zero-mean, unit-variance variables.

Under these assumptions, the crude association between

X and Y is given by

$$E(Y \mid X = x + 1) - E(Y \mid X = x) = b_2 + a_1 b_1.$$

This quantity is biased for estimation of b2 owing to confounding from U, and the bias is equal to a1b1. The association between X and Y conditional on Z is given by

$$E(Y \mid X = x + 1, Z = z) - E(Y \mid X = x, Z = z) = b_2 + \frac{a_1 b_1}{1 - a_2^2}.$$

The bias of this estimator is $a_1 b_1 / (1 - a_2^2)$, which is greater in absolute magnitude than a1b1 when b1, a1, and a2 are all nonzero. If a1 or b1 is zero, then both estimators are unbiased. If a2 = 0, then these biases are equal.

Therefore, in this scenario, conditioning on an IV increases the bias of the exposure effect estimator compared with the unadjusted estimator. This phenomenon can be explained intuitively if we think of partitioning the variation in the exposure variable, X, into 3 components: the variation explained by Z, the variation explained by U, and the unexplained variation. The proportion of the variation explained by U, along with the association between U and Y, determines the magnitude of the unobserved confounding. When we condition on Z, we effectively remove one source of variation, thereby making the variation explained by U a larger proportion of the remaining variation in X. Thus, the residual confounding bias from U is amplified as a result of conditioning on Z. This intuition holds whenever there is unobserved confounding and an IV, regardless of the specific assumptions made above. In Appendix 1, we provide an example data set that exhibits bias amplification.

Figure 1. Causal diagram showing an unmeasured confounder, U, and an instrumental variable, Z, of the exposure-outcome pair (X, Y). [Diagram not reproduced: arrows Z → X (a2), U → X (a1), U → Y (b1), and X → Y (b2).]
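The two bias expressions above can be checked with a few lines of covariance algebra. The following Python sketch is an illustration only, not part of the original analysis: it assumes the linear structural model of Figure 1 with zero-mean, unit-variance variables and Z independent of U, and the function name linear_biases is hypothetical. It computes the crude slope and the partial regression slope for X, then subtracts the true effect b2.

```python
from typing import Tuple


def linear_biases(a1: float, a2: float, b1: float, b2: float) -> Tuple[float, float]:
    """Return (crude bias, bias conditional on Z) under the linear model.

    Assumed model (illustrative): X = a1*U + a2*Z + noise and
    Y = b1*U + b2*X + noise, with Z independent of U and all of
    Z, U, X scaled to unit variance.
    """
    cov_xz = a2              # Z affects X directly only
    cov_xy = b2 + a1 * b1    # direct effect plus the confounding path via U
    cov_zy = a2 * b2         # Z reaches Y only through X
    crude_slope = cov_xy     # Var(X) = 1, so the slope equals the covariance
    # Coefficient on X in the regression of Y on X and Z
    cond_slope = (cov_xy - cov_xz * cov_zy) / (1.0 - cov_xz ** 2)
    return crude_slope - b2, cond_slope - b2


crude, cond = linear_biases(a1=0.5, a2=0.6, b1=0.5, b2=0.3)
print(crude, cond)  # crude equals a1*b1; cond equals a1*b1 / (1 - a2**2)
```

With a2 = 0 the two values coincide, and with a1 = 0 or b1 = 0 both are zero, matching the special cases stated in the text.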

Empirical example

The example of Patrick et al. (25) provides context for the

simulations that follow. In that study, rates of mortality and

hip fracture among elderly initiators of statin therapy and

glaucoma medications were compared. (The source popula-

tion and cohort are described in Appendix 2.) Information on

demographic characteristics, pretreatment diagnoses, and

pretreatment use of health-system services was extracted to

define 202 potential confounders. The investigators compared

methods of selecting confounders for inclusion in the pro-

pensity score model for exposure to statins versus glaucoma

drugs. The inclusion of one covariate, prior glaucoma diag-

nosis, resulted in effect estimates that consistently moved

away from the expected effect based on the evidence from

randomized controlled trials (see Figure 2).

Glaucoma diagnosis is strongly negatively associated with

exposure to statins versus glaucoma drugs (odds ratio = 0.07),

but it does not independently predict mortality or hip frac-

ture. Therefore, glaucoma diagnosis appears to be acting

as an IV in this example, since its association with exposure

is much stronger than its association with outcome, and the

observed changes in effect estimates may be a manifestation

of bias amplification. Although the analysis of hip fracture

is one of the most extreme examples of bias amplification

documented in the literature (an increase of 21% in the fully

adjusted analysis), so much residual confounding remains

that including the IV in the propensity score model does not

alter study conclusions. In addition, the strength of the IV-exposure

relation in this example makes the IV easy for investigators

to identify and remove.

Figure 2. Estimated hazard ratios (HRs) for mortality (top) and hip fracture (bottom) in initiators of statin medication versus initiators of glaucoma medication (details in Appendix 2). The adjustment factors used for each estimate, including the potential instrumental variable (IV) glaucoma diagnosis, are shown on the left. The x-axis is presented on the log scale with tick marks unlogged. The approximate expected effects for mortality and hip fracture were hazard ratios of 0.85 and 1.01, respectively (25). Horizontal bars, 95% confidence interval.

Monte Carlo simulation studies

Pearl (24) and White and Lu (21) provide formulas for the increases in bias and variance associated with conditioning on an IV or near-IV, but the rescaling of these results to a given scenario requires considerable computation. Therefore,

we performed 2 Monte Carlo simulation studies to obtain

quantitative results under a range of epidemiologic scenarios.

In the first experiment, we simulated data under an additive

model and assumed that the goal of estimation was the risk

difference in outcome between levels of exposure. In the

second experiment, we simulated data under a multiplicative

model and considered the goal of estimation to be the risk

ratio for the outcome according to level of exposure. For

simplicity and to reflect a common study framework, all vari-

ables are binary.

Both simulation studies assumed the same basic causal

structure, shown in Figure 3. The true exposure effect and

target of estimation is b2. Note that Z is not a perfect in-

strument in Figure 3 as it was in Figure 1 because it is

associated with the unmeasured confounder U through c1.

However, by varying the value of c1, we can explore the

impacts of conditioning on Z when it is a perfect instrument

and when it is a near-instrument or confounder. As shown by

Pearl (24), bias amplification may result even when the con-

ditioning variable is not a perfect instrument. In addition, we

consider relatively large values of c1to compare the risks of

adjusting for an IV with the benefits of adjusting for a real

confounder. The code used to produce and analyze the sim-

ulations is available in Web Appendix 1, which appears on

the Journal’s Web site (http://aje.oxfordjournals.org/).

Simulation under additive risk

In each data set, we simulated a binary variable, Z, with

Pr(Z ¼ 1) ¼ 0.5 and binary variables U, X, and Y, such that

$$\Pr(U = 1 \mid Z) = c_0 + c_1 Z,$$

$$\Pr(X = 1 \mid U, Z) = a_0 + a_1 U + a_2 Z,$$

$$\Pr(Y = 1 \mid U, X) = b_0 + b_1 U + b_2 X.$$

Variables were simulated in the above order so that the risk

of outcome would depend directly on U and X and indirectly

on Z. The parameters c0, a0, and b0 define the baseline

prevalence of each variable, and each effect parameter

may be interpreted as a risk difference. The values considered

for each parameter are listed in Table 1. These values were

chosen to provide the widest possible range of scenarios

within the (0, 1) probability bounds for each variable.
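As a minimal sketch of the additive data-generating process described above, the following Python function draws one data set in the stated order (Z, then U, then X, then Y). It is not the authors' code, which is provided in Web Appendix 1; the function name simulate_additive, its argument order, and the use of Python's standard random module are assumptions made for illustration.

```python
import random
from typing import List, Tuple


def simulate_additive(n: int, c0: float, c1: float, a0: float, a1: float,
                      a2: float, b0: float, b1: float, b2: float,
                      seed: int = 0) -> List[Tuple[int, int, int, int]]:
    """Draw n rows (z, u, x, y) under the additive risk model."""
    rng = random.Random(seed)  # local generator so results are reproducible
    rows = []
    for _ in range(n):
        z = int(rng.random() < 0.5)                      # Pr(Z = 1) = 0.5
        u = int(rng.random() < c0 + c1 * z)              # Pr(U = 1 | Z)
        x = int(rng.random() < a0 + a1 * u + a2 * z)     # Pr(X = 1 | U, Z)
        y = int(rng.random() < b0 + b1 * u + b2 * x)     # Pr(Y = 1 | U, X)
        rows.append((z, u, x, y))
    return rows


# One data set of the size used in the paper, with Z a perfect instrument (c1 = 0)
data = simulate_additive(10_000, c0=0.3, c1=0.0, a0=0.3, a1=0.33,
                         a2=0.33, b0=0.2, b1=0.5, b2=0.0, seed=1)
```

In an analysis, U would be dropped before estimation because it is unmeasured; it is returned here only so that the generated data can be inspected.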

We considered 2 values for the baseline risk of outcome,

b0 in {0.01, 0.2}, corresponding to rare and relatively common

outcomes, respectively. Based on the value of b0, we

constructed a range of possible values for b1. Within this

restriction, we considered all possible combinations of pa-

rameter values, resulting in 1,280 unique simulation scenar-

ios. We included only 2 values for the exposure effect, b2,

because bias is invariant to the value of this parameter. We

included only positive parameter values to make the illus-

tration of concepts as clear as possible and to avoid repeating

scenarios that are symmetric and yield identical results.
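The scenario count can be reproduced by enumerating the parameter grid. The sketch below transcribes the values listed in Table 1 and checks that the largest conditional probability in each combination stays within the probability bounds; the constant and function names are hypothetical and chosen for illustration.

```python
from itertools import product

# Parameter values transcribed from Table 1 (risk-difference scale).
C0, A0 = 0.3, 0.3
C1 = [0, 0.006, 0.06, 0.24, 0.6]
A1 = [0, 0.06, 0.18, 0.33]
A2 = [0, 0.06, 0.18, 0.33]
B2 = [0, 0.2]
# The admissible values of b1 depend on the baseline risk b0.
B1_BY_B0 = {0.2: [0, 0.08, 0.36, 0.5], 0.01: [0, 0.004, 0.018, 0.5]}


def additive_scenarios():
    """Enumerate every parameter combination, checking probability bounds."""
    scenarios = []
    for b0, b1_values in B1_BY_B0.items():
        for c1, a1, a2, b1, b2 in product(C1, A1, A2, b1_values, B2):
            # Largest value each conditional probability can take
            largest = max(C0 + c1, A0 + a1 + a2, b0 + b1 + b2)
            if largest > 1.0:
                raise ValueError("probability above 1 for this combination")
            scenarios.append({"c0": C0, "c1": c1, "a0": A0, "a1": a1,
                              "a2": a2, "b0": b0, "b1": b1, "b2": b2})
    return scenarios


print(len(additive_scenarios()))  # 1280
```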

For each simulation scenario, we simulated 2,500 data

sets of size n ¼ 10,000. In each data set, we calculated

• the crude risk difference (RD) between X and Y, RDcrude, and

• the Mantel-Haenszel risk difference (33) between X and Y conditional on Z, RDcond.

Both RDcrude and RDcond are estimators of the exposure effect, and we compared the performance of these two estimators.
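The two estimators can be sketched as follows. This is an illustrative implementation, not the authors' code: it assumes each record is a tuple (z, x, y) of binary values, uses the standard Mantel-Haenszel weights n1k·n0k/nk for stratum k, and omits the variance estimator of reference 33. The function names are hypothetical.

```python
from typing import Iterable, List, Tuple

Row = Tuple[int, int, int]  # (z, x, y), all binary


def crude_rd(rows: List[Row]) -> float:
    """Risk in the exposed minus risk in the unexposed, ignoring Z."""
    n1 = sum(1 for _, x, _ in rows if x == 1)
    n0 = len(rows) - n1
    cases1 = sum(y for _, x, y in rows if x == 1)
    cases0 = sum(y for _, x, y in rows if x == 0)
    return cases1 / n1 - cases0 / n0


def mantel_haenszel_rd(rows: Iterable[Row]) -> float:
    """Mantel-Haenszel risk difference, stratified on Z."""
    rows = list(rows)
    num = den = 0.0
    for stratum in sorted({z for z, _, _ in rows}):
        sub = [(x, y) for z, x, y in rows if z == stratum]
        n = len(sub)
        n1 = sum(x for x, _ in sub)
        n0 = n - n1
        if n1 == 0 or n0 == 0:
            continue  # a stratum without both exposure groups adds nothing
        cases1 = sum(y for x, y in sub if x == 1)
        cases0 = sum(y for x, y in sub if x == 0)
        num += (cases1 * n0 - cases0 * n1) / n
        den += n1 * n0 / n
    return num / den
```

Each stratum's contribution to the numerator equals its weight times its stratum-specific risk difference, so the result is a weighted average of the stratum-specific differences.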

Simulation under multiplicative risk

Using the same binary variable Z as in the additive study,

we simulated binary variables U, X, and Y such that

$$\Pr(U = 1 \mid Z) = c_0 c_1^{Z},$$

$$\Pr(X = 1 \mid U, Z) = a_0 a_1^{U} a_2^{Z},$$

$$\Pr(Y = 1 \mid U, X) = b_0 b_1^{U} b_2^{X}.$$

Simulating variables in the above order creates data with the causal structure depicted in Figure 3 with associations parameterized as risk ratios. The values considered for each parameter are listed in Table 2. We again considered all possible combinations of parameter values, which resulted in 1,440 unique simulation scenarios. We used multiple values of the true exposure effect, b2, since bias was no longer invariant to its value.

In each scenario, we simulated 2,500 data sets of size n = 10,000 and calculated

• the crude risk ratio (RR) between X and Y, RRcrude, and

• the Mantel-Haenszel risk ratio (33) between X and Y conditional on Z, RRcond.

As in the additive simulations, we compared the two estimators of exposure effect, RRcrude and RRcond.
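For the risk ratio estimators, a compact sketch working from stratified counts is shown below. The Stratum type and function names are hypothetical, the counts in the usage example are invented for illustration, and the formula is the standard Mantel-Haenszel risk ratio rather than code taken from the authors.

```python
from typing import List, NamedTuple


class Stratum(NamedTuple):
    """Counts within one level of Z (hypothetical container)."""
    exposed_cases: int
    exposed_total: int
    unexposed_cases: int
    unexposed_total: int


def crude_rr(strata: List[Stratum]) -> float:
    """Risk ratio after collapsing the strata into one 2 x 2 table."""
    risk1 = sum(s.exposed_cases for s in strata) / sum(s.exposed_total for s in strata)
    risk0 = sum(s.unexposed_cases for s in strata) / sum(s.unexposed_total for s in strata)
    return risk1 / risk0


def mantel_haenszel_rr(strata: List[Stratum]) -> float:
    """Mantel-Haenszel risk ratio across the strata of Z."""
    num = den = 0.0
    for s in strata:
        n = s.exposed_total + s.unexposed_total
        num += s.exposed_cases * s.unexposed_total / n
        den += s.unexposed_cases * s.exposed_total / n
    return num / den


# Invented counts for Z = 0 and Z = 1, for illustration only
tables = [Stratum(10, 40, 6, 60), Stratum(35, 70, 9, 30)]
print(crude_rr(tables), mantel_haenszel_rr(tables))
```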

Figure 3. Causal diagram showing the structure of the simulation studies. Depending on parameter values, the measured covariate Z may act as a confounder or as an instrumental variable for the exposure-outcome pair (X, Y). [Diagram not reproduced: arrows Z → U (c1), Z → X (a2), U → X (a1), U → Y (b1), and X → Y (b2), following the simulation equations.]

Evaluation of estimator performance

These simulation studies were designed to compare the performance of estimators for b2 with and without conditioning on Z. For an estimator of exposure effect, $\hat{b}_2$, we estimated the bias with the equation

$$\text{Bias} = \frac{1}{S} \sum_{s=1}^{S} \hat{b}_2(s) - b_2,$$

where $\hat{b}_2(s)$ is the value of $\hat{b}_2$ in the sth data set and S = 2,500 is the number of simulated data sets. We estimated the standard error of $\hat{b}_2$ using the square root of the sample variance of $\hat{b}_2(s)$ across simulated data sets. We calculated the bias and variance of the exposure effect estimators separately in each simulation scenario.
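The bias and standard error calculations described above amount to a mean and a sample standard deviation over the per-data-set estimates. A minimal sketch, with a hypothetical function name and made-up estimates in the usage line:

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def bias_and_se(estimates: Sequence[float], truth: float) -> Tuple[float, float]:
    """Return (bias, standard error) from estimates across simulated data sets.

    Bias is the mean estimate minus the true effect; the standard error is
    the square root of the sample variance (n - 1 denominator).
    """
    return mean(estimates) - truth, stdev(estimates)


# Three made-up estimates of a true effect of 0.10
bias, se = bias_and_se([0.10, 0.12, 0.14], truth=0.10)
```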

RESULTS

Additive simulation

Figure 4 shows the performance of RDcrude on the x-axis versus that of RDcond on the y-axis. The left panel displays the biases of both estimators, and the right panel shows the standard errors. Results are shown for all simulation scenarios with b0 = 0.2, b2 = 0, and 3 values of c1. All values of b1, a1, and a2 are shown; the values of a1 and b1 are not differentiated (but they may be inferred from the amount of crude bias for a given scenario). Results for other values of b0, b2, and c1 are similar to the results shown here and are available

in Web Appendix 2 (Web Figures 1–4). In each plot, the solid

diagonal marks equality. A point on the line indicates a sim-

ulation scenario where the bias or standard error is invariant

to conditioning on Z; scenarios where the bias or standard

error is increased or decreased by conditioning on Z are

represented by points above or below the line, respectively.

In the top row of plots in Figure 4, c1 equals 0, indicating that Z is simulated to be a perfect instrument for the exposure-outcome pair (X, Y). Therefore, the bias in RDcrude is due to unobserved confounding from U. In general, conditioning on the instrument, Z, results in an estimator of exposure effect that is more biased than the crude estimator. In addition, the standard error of RDcond is often larger than the standard error of RDcrude. The magnitude of these increases depends on the value of a2. When Z is a strong instrument (a2 = 0.33), the increases in bias and standard error due to conditioning on Z are largest; when Z is a weak instrument (a2 in {0.06, 0.18}), the increases are negligible; when Z has no association with exposure (a2 = 0), there is no increase in either bias or standard error.

In the center row of Figure 4, c1 equals 0.06, indicating

that Z is not a perfect instrument because Z is associated

with Y through the unobserved confounder, U. However, we

may consider Z to be a near-instrument (or near-confounder),

since its association with U is relatively weak. In these

scenarios, conditioning on Z tended to result in increased

bias in simulation scenarios with the largest crude bias and

decreased bias in simulation scenarios with smaller crude

bias. In the former case, the unobserved confounding due to

U overwhelms the relatively small amount of confounding due

to Z. In the latter case, the confounding due to U is smaller, and

Z accounts for more of the overall confounding bias of

exposure effect. The effect on standard error was similar

to that observed in the top row (where Z is a perfect IV).

Table 1. Parameter Values Used in the Additive Simulations^a

| Variable | Baseline Risk | Risk Difference | Corresponding Risk Ratio |
|---|---|---|---|
| U | c0 = 0.3 | c1: 0, 0.006, 0.06, 0.24, 0.6 | c1: 1.0, 1.02, 1.2, 1.8, 3.0 |
| X | a0 = 0.3 | a1: 0, 0.06, 0.18, 0.33 | a1: 1.0, 1.2, 1.6, 2.1 |
| | | a2: 0, 0.06, 0.18, 0.33 | a2: 1.0, 1.2, 1.6, 2.1 |
| Y | b0 = 0.2 | b1: 0, 0.08, 0.36, 0.5 | b1: 1.0, 1.4, 2.8, 3.5 |
| | | b2: 0, 0.2 | b2: 1.0, 2.0 |
| | b0 = 0.01 | b1: 0, 0.004, 0.018, 0.5 | b1: 1.0, 1.4, 2.8, 51 |
| | | b2: 0, 0.2 | b2: 1.0, 21.0 |

^a The value of b0 determines the set of potential values for b1 and b2. Within that restriction, all possible combinations of parameter values were considered. The corresponding risk ratios are calculated on the basis of the baseline prevalence of each variable and will vary depending on the values of other variables.

Table 2. Parameter Values Used in the Multiplicative Simulations^a

| Variable | Baseline Risk | Risk Ratio | Corresponding Risk Difference |
|---|---|---|---|
| U | c0 = 0.3 | c1: 1, 1.02, 1.2, 1.8, 3 | c1: 0, 0.006, 0.06, 0.24, 0.6 |
| X | a0 = 0.3 | a1: 1, 1.1, 1.3, 1.8 | a1: 0, 0.03, 0.09, 0.24 |
| | | a2: 1, 1.1, 1.3, 1.8 | a2: 0, 0.03, 0.09, 0.24 |
| Y | b0 = 0.2 | b1: 1, 1.2, 2.2 | b1: 0, 0.04, 0.24 |
| | | b2: 1, 1.2, 2.2 | b2: 0, 0.04, 0.24 |
| | b0 = 0.01 | b1: 1, 2.2, 8.0 | b1: 0, 0.012, 0.07 |
| | | b2: 1, 2.2, 8.0 | b2: 0, 0.012, 0.07 |

^a The value of b0 determines the set of potential values for b1 and b2. Within that restriction, all possible combinations of parameter values were considered. The corresponding risk differences are calculated on the basis of the baseline risk of each variable and will vary depending on the values of other variables.


In the bottom row of Figure 4, c1 equals 0.24, indicating that Z is a confounder in these scenarios. When we condition on the confounder, the bias is always equivalent or decreased, but the standard error may increase or decrease. As before, the magnitude of the increase in standard error is determined by the value of a2, with the largest increases occurring when a2 equals 0.33. Furthermore, when a2 equals 0, Z has no direct association with exposure, but conditioning on Z still reduces the bias.

Figure 4. Bias (left panels) and standard error (right panels) of risk difference (RD) estimators with and without conditioning on Z. Each point represents one simulation scenario in the additive simulations with c1 = 0 (upper sections), c1 = 0.06 (middle sections), or c1 = 0.24 (lower sections). The symbols identify values of a2 (○, zero; △, 0.06; +, 0.18; ×, 0.33). The solid diagonal line marks equality. Dashed lines mark the threshold for a 10% increase or decrease, and dotted lines mark a 20% increase or decrease.

Across all of the additive simulation scenarios defined in

Table 1, the largest absolute increase in bias due to condi-

tioning on Z was an increase of 0.018 on a crude bias of

0.141. This scenario had the highest value considered for

each of a1, b1, and a2 and c1 = 0. (Equal biases were found

across values of b0 and b2.) The largest observed increase in

standard error due to conditioning on Z was an increase of

0.003 on a crude standard error of 0.009. This scenario had

the highest value considered for all parameters.
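The bias figures for this scenario can be approximated without simulation by enumerating the joint distribution of (Z, U, X) under the additive model and computing the large-sample limits of the crude and Mantel-Haenszel risk differences. The sketch below is an assumption-laden cross-check, not the authors' Monte Carlo procedure; the function name is hypothetical, and the limits need not match simulation averages exactly.

```python
from itertools import product
from typing import Tuple


def large_sample_rds(c0: float, c1: float, a0: float, a1: float, a2: float,
                     b0: float, b1: float, b2: float,
                     pz: float = 0.5) -> Tuple[float, float]:
    """Return (crude RD, Mantel-Haenszel RD) in the large-sample limit."""
    # mass[(z, x)] = [P(Z=z, X=x), P(Z=z, X=x, Y=1)], summed over U
    mass = {(z, x): [0.0, 0.0] for z in (0, 1) for x in (0, 1)}
    for z, u, x in product((0, 1), repeat=3):
        p_z = pz if z else 1.0 - pz
        p_u1 = c0 + c1 * z
        p_u = p_u1 if u else 1.0 - p_u1
        p_x1 = a0 + a1 * u + a2 * z
        p_x = p_x1 if x else 1.0 - p_x1
        joint = p_z * p_u * p_x
        mass[(z, x)][0] += joint
        mass[(z, x)][1] += joint * (b0 + b1 * u + b2 * x)

    # Crude contrast: collapse over Z.
    exposed = [mass[(0, 1)][k] + mass[(1, 1)][k] for k in (0, 1)]
    unexposed = [mass[(0, 0)][k] + mass[(1, 0)][k] for k in (0, 1)]
    crude = exposed[1] / exposed[0] - unexposed[1] / unexposed[0]

    # Mantel-Haenszel contrast: weight each Z stratum by n1 * n0 / n.
    num = den = 0.0
    for z in (0, 1):
        n1, cases1 = mass[(z, 1)]
        n0, cases0 = mass[(z, 0)]
        n = n1 + n0
        num += (cases1 * n0 - cases0 * n1) / n
        den += n1 * n0 / n
    return crude, num / den


# Highest Table 1 values of a1, b1, and a2 with c1 = 0; b2 = 0, so each RD is a bias
crude, cond = large_sample_rds(c0=0.3, c1=0.0, a0=0.3, a1=0.33, a2=0.33,
                               b0=0.2, b1=0.5, b2=0.0)
print(round(crude, 3), round(cond - crude, 3))
```

Under these inputs the limits land close to the reported crude bias of 0.141 and increase of 0.018, which suggests that the simulation averages are dominated by the structural bias rather than by sampling noise.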

Because a2 is shown to be the most important parameter in determining the magnitude of the increases in bias and variance when conditioning on Z, we further considered a scenario with a larger value for a2. In the case of a binary exposure, the value of a2 is constrained by the (0, 1) bounds on probability of exposure. Therefore, in order to increase a2, we reduced the baseline prevalence of exposure (a0 = 0.1) and chose the other parameter values as follows: c0 = 0.3, c1 = 0, a2 = 0.6, b0 = 0.2, b1 = 0.5, and b2 = 0. Simulating with these values yielded biases of 0.101 and 0.158 for RDcrude and RDcond, respectively, representing a 56% increase in bias. The standard errors of RDcrude and RDcond were 0.01 and 0.012, respectively, representing a 20% increase in standard error.

We further repeated one simulation scenario under varying

study sizes to explore the bias-variance trade-off as study size

is reduced. In particular, we use the scenario reported above

with the largest absolute increase in bias due to conditioning

on Z from Table 1. Figure 5 displays the standard error of

RDcrude and RDcond under a range of study sizes. The standard

error increases rapidly as the study size decreases, and

the increase in standard error attributable to conditioning on

Z is negligible compared with the impact of study size. In

addition, even at the smallest study size considered (n = 100),

the standard errors of both RDcrude and RDcond are smaller

than the bias in this scenario.

Multiplicative simulation

Figure 6 shows the bias (left) and standard error (right) of

RRcrude on the x-axis versus that of RRcond on the y-axis. As

in Figure 4, the y = x line is provided. Results are displayed

for all simulation scenarios with b0 = 0.2, b2 = 1, and 3

values of c1. Results for other values of b0, b2, and c1 are

similar to the results shown here and are available in Web

Appendix 2 (Web Figures 5–10).

In the multiplicative simulations, associations are parameterized as risk ratios, so the 3 values of c1 shown in Figure 6 indicate that the variable Z is simulated to be a perfect instrument, a near-instrument (or near-confounder), and a confounder, respectively, for the exposure-outcome pair (X, Y). Results are similar to the results from the additive simulations. In the presence of unobserved confounding, conditioning on a true instrument increases the bias and standard error in exposure effect estimation, and this increase tends to be larger when the instrument is strong (a2 = 1.8) and when the crude bias or standard error is large. In the scenarios with no confounding bias from U (a1 = 1 or b1 = 1), conditioning

on Z does not create bias. Conditioning on a near-instrument

tends to result in increased bias when the crude bias is large

and decreased bias when the crude bias is relatively small.

When Z is a confounder, bias generally decreases as a result

of conditioning on Z, but standard error may increase or

decrease.

The largest absolute bias increase for any scenario was an

increase of 1.636 on a crude bias of 7.773, achieved when

c1 = 1, b0 = 0.01, and the parameters a1, a2, b1, and b2 are

maximized. The same scenario results in the largest increase

in standard error across all multiplicative simulation scenarios:

an increase of 0.25 on a crude standard error of 1.676. The

scale of both bias and standard error is larger in the multipli-

cative simulations than in the additive simulations, but bias

remains the primary source of error.

DISCUSSION

Our simulation studies showed that estimating an exposure

effect conditional on a perfect instrument can increase the

bias and standard error of the exposure effect estimate, but

these increases were generally small. In particular, when re-

sidual confounding was small, the increase in bias and vari-

ance due to conditioning on an IV was essentially negligible.

When the residual confounding bias was large, the increase

in estimation error due to conditioning on an IV represented

only a small fraction of the overall error in most scenarios.

In addition, increases in bias and standard error were ob-

served when conditioning on a variable that was strongly

associated with exposure and weakly associated with outcome.

These increases were always smaller than the increases observed

when adjusting for a perfect IV with equivalent association

with exposure. As expected, the effects of conditioning

on an IV or near-IV were reduced with diminishing strength

of the unmeasured confounding and diminishing strength

of the IV association with exposure. These results are

consistent with past theoretical and simulation findings

(18, 20–24, 34).

Figure 5. Standard error of exposure effect estimators obtained with and without conditioning on Z under a range of study sizes. [Plot not reproduced: x-axis, sample size (100 to 10,000); y-axis, standard error (0 to 0.10); series, crude and conditional on Z.]

These results have clear implications for epidemiologic

practice. First, variables that are known to be instruments

should not be conditioned upon. The belief that balancing

all preexposure covariates, as in randomized studies, can do

no harm does not hold in nonexperimental studies because

there may exist unobserved factors that cannot be balanced.

Figure 6. Bias (left panels) and standard error (right panels) of risk ratio (RR) estimators with and without conditioning on Z. Each point represents one simulation scenario in the multiplicative simulations with c1 = 1 (upper sections), c1 = 1.2 (middle sections), or c1 = 1.8 (lower sections). The symbols identify values of a2 (○, 1.0; △, 1.1; +, 1.3; ×, 1.8). The solid diagonal line marks equality. Dashed lines mark the threshold for a 10% increase or decrease, and dotted lines mark a 20% increase or decrease.

Contrary to the current practice of selecting the best predic-

tors of exposure, a very strong association with exposure may

be indicative of an IV or near-IV that should be excluded, as

shown in the example study. Second, ordering variables based

on the magnitude of their association with outcome could

provide a reasonable approach to selecting covariates for con-

ditioning, as recommended by Hill (35) and implemented in

a high-dimensional propensity score algorithm (31) and in

Bayesian propensity scores (36). Although IVs may be as-

sociated with outcome in the presence of unmeasured con-

founding, covariates with relatively strong associations with

outcome are unlikely to be IVs. Finally, within the context

of scenarios considered in the simulation studies, inadvertently

including an IV in the set of conditioning variables does not

appear to pose a major threat to the validity of exposure

effect estimates. In most scenarios, the need to control re-

sidual confounding greatly outweighed bias amplification

caused by adjusting for an IV. This threat can be further

reduced if strong predictors of exposure are carefully con-

sidered before being used in adjustment.
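
The two screening ideas above (ordering covariates by their association with outcome, and giving a second look to covariates that strongly predict exposure but barely predict outcome) can be sketched as follows. This is a minimal illustration only: the covariate names, the risk ratios, the cutoffs, and the function names are all hypothetical and are not taken from the example study.

```python
import math

# HYPOTHETICAL covariate summaries: name -> (RR with exposure, RR with outcome).
covariates = {
    "age_over_75": (1.2, 2.0),
    "prior_mi": (1.4, 1.6),
    "prescriber_preference": (3.0, 1.02),
    "sex": (1.05, 1.1),
}


def strength(rr):
    """Magnitude of an association on the log risk ratio scale."""
    return abs(math.log(rr))


def rank_by_outcome(covs):
    """Order covariates from strongest to weakest association with outcome."""
    return sorted(covs, key=lambda name: strength(covs[name][1]), reverse=True)


def flag_possible_ivs(covs, exposure_cut=math.log(2.0), outcome_cut=math.log(1.1)):
    """Flag strong predictors of exposure with little association with outcome.

    The cutoffs are arbitrary illustrative values, not recommendations.
    """
    return [
        name
        for name, (rr_exposure, rr_outcome) in covs.items()
        if strength(rr_exposure) >= exposure_cut and strength(rr_outcome) < outcome_cut
    ]


print(rank_by_outcome(covariates))
print(flag_possible_ivs(covariates))
```

A flagged covariate is a candidate for closer subject-matter review, not an automatic exclusion, since an IV may still be associated with outcome when unmeasured confounding is present.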

Although we were able to deduce consistent trends across

simulation scenarios, specific findings are dependent on the

specification of the data-generating process and the param-

eter values considered. In particular, it is clear that the mag-

nitude of the increase in bias is limited only by the extent to

which the IV determines exposure. In the case of a binary

exposure, this parameter is constrained by the baseline prev-

alence of exposure and the effects of other factors that de-

termine exposure. When analyzing a continuous exposure,

no such constraints exist, and the IV association with exposure

may be larger. In addition, in cases of a known IV (e.g.,

randomized assignment to exposure), the association between

the IV and exposure may be stronger. In our simulation studies,

the parameter values were chosen to represent the range

of associations most likely to be encountered in epidemio-

logic studies with a binary exposure and a binary covariate

that is not known to be an IV. Within this range, the maximum

increase in bias (over the crude bias) observed in any sce-

nario was approximately 20%. When we further considered

a scenario with a stronger association between the IV and

exposure, we observed a 56% increase in bias. However,

achieving this magnitude of bias increase required both ex-

tremely large unmeasured confounding and a very strong

instrument. On the other hand, when conditioning on a confounder,

a 50% or greater decrease in bias was relatively easy

to achieve and did not require such an extreme scenario.
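
To make the dependence on instrument strength concrete, the sketch below computes large-sample (expected) crude and Z-conditional risk ratios for several values of a2, using the parameter values listed in Appendix 1. Everything about the model form is an assumption made for this illustration and is not quoted from the text: binary U and Z are taken to be independent (c1 = 1) with Pr(U = 1) = 0.3 and Pr(Z = 1) = 0.5, exposure and outcome probabilities are taken to be multiplicative in the parameters, and the conditional estimate is taken to be a Mantel-Haenszel-type summary. The function names are hypothetical.

```python
from itertools import product


def expected_risk_ratios(a2, a0=0.3, a1=1.8, b0=0.01, b1=8.0, b2=8.0,
                         p_u=0.3, p_z=0.5):
    """Large-sample crude and Z-conditional risk ratios under an ASSUMED model.

    Assumptions (not quoted from the paper): binary U and Z are independent,
    Pr(X=1 | U, Z) = a0 * a1**U * a2**Z and Pr(Y=1 | X, U) = b0 * b1**U * b2**X.
    """
    # cell[(z, x)] = [Pr(Y=1, X=x | Z=z), Pr(X=x | Z=z)]
    cell = {(z, x): [0.0, 0.0] for z, x in product((0, 1), repeat=2)}
    for z, u, x in product((0, 1), repeat=3):
        p_exposed = a0 * a1**u * a2**z
        p_x = p_exposed if x == 1 else 1.0 - p_exposed
        weight = (p_u if u == 1 else 1.0 - p_u) * p_x
        cell[(z, x)][0] += weight * b0 * b1**u * b2**x
        cell[(z, x)][1] += weight
    pr_z = {0: 1.0 - p_z, 1: p_z}

    # Crude risk ratio: collapse over Z before comparing exposed and unexposed.
    risk = {}
    for x in (0, 1):
        events = sum(pr_z[z] * cell[(z, x)][0] for z in (0, 1))
        size = sum(pr_z[z] * cell[(z, x)][1] for z in (0, 1))
        risk[x] = events / size
    rr_crude = risk[1] / risk[0]

    # Mantel-Haenszel-type summary across strata of Z, written in proportions.
    num = sum(pr_z[z] * cell[(z, 1)][0] * cell[(z, 0)][1] for z in (0, 1))
    den = sum(pr_z[z] * cell[(z, 0)][0] * cell[(z, 1)][1] for z in (0, 1))
    return rr_crude, num / den


def relative_bias_increase(a2, true_rr=8.0):
    """Bias of the conditional estimate relative to the crude bias, minus 1."""
    rr_crude, rr_cond = expected_risk_ratios(a2)
    return (rr_cond - true_rr) / (rr_crude - true_rr) - 1.0


for a2 in (1.0, 1.1, 1.3, 1.8):
    print(f"a2 = {a2}: relative increase in bias = {relative_bias_increase(a2):.1%}")
```

Under these assumptions, conditioning on Z changes nothing when a2 = 1, and the relative increase in bias grows with a2, reaching roughly 20% at a2 = 1.8.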

ACKNOWLEDGMENTS

Author affiliations: Division of Pharmacoepidemiology

and Pharmacoeconomics, Department of Medicine, Brigham

and Women’s Hospital and Harvard Medical School, Boston,

Massachusetts (Jessica A. Myers, Jeremy A. Rassen,

Joshua J. Gagne, Krista F. Huybrechts, Sebastian Schneeweiss,

Kenneth J. Rothman, Robert J. Glynn); RTI Health Solu-

tions, Research Triangle Park, North Carolina (Kenneth J.

Rothman); and Department of Biostatistics and Epidemiology,

School of Medicine, University of Pennsylvania, Philadelphia,

Pennsylvania (Marshall M. Joffe).

This project was funded by the Food and Drug Adminis-

tration through the Mini-Sentinel Coordinating Center and by

grant RO1-LM010213 from the National Library of Medicine.

The authors acknowledge the members of the Food and

Drug Administration Mini-Sentinel Signal Evaluation Work-

ing Group for their contributions to this paper.

Conflict of interest: none declared.

REFERENCES

1. Cochran WG. The effectiveness of adjustment by subclassifi-

cation in removing bias in observational studies. Biometrics.

1968;24(2):295–313.

2. Billewicz WZ. The efficiency of matched samples: an empir-

ical investigation. Biometrics. 1965;21(3):623–644.

3. Rosenbaum PR, Rubin DB. The central role of the propensity

score in observational studies for causal effects. Biometrika.

1983;70(1):41–55.

4. Rosenbaum PR, Rubin DB. Reducing bias in observational

studies using subclassification on the propensity score. J Am

Stat Assoc. 1984;79(387):516–524.

5. D’Agostino RB Jr. Propensity score methods for bias reduc-

tion in the comparison of a treatment to a non-randomized

control group. Stat Med. 1998;17(19):2265–2281.

6. Rosenbaum PR. Observational Studies. 2nd ed. New York,

NY: Springer Verlag, Publishers; 2002.

7. Rubin DB. The design versus the analysis of observational

studies for causal effects: parallels with the design of ran-

domized trials. Stat Med. 2007;26(1):20–36.

8. Rubin DB. Should observational studies be designed to allow

lack of balance in covariate distributions across treatment

groups? Stat Med. 2009;28(9):1420–1423.

9. Weitzen S, Lapane KL, Toledano AY, et al. Principles for

modeling propensity scores in medical research: a systematic

literature review. Pharmacoepidemiol Drug Saf. 2004;13(12):

841–853.

10. Shrier I. Re: The design versus the analysis of observational

studies for causal effects: parallels with the design of ran-

domized trials [letter]. Stat Med. 2008;27(14):2740–2741.

11. Pearl J. Causality: Models, Reasoning, and Inference. 2nd ed.

New York, NY: Cambridge University Press; 2009.

12. Sjölander A. Propensity scores and M-structures. Stat Med.

2009;28(9):1416–1420.

13. Martens EP, Pestman WR, de Boer A, et al. Instrumental vari-

ables: application and limitations. Epidemiology. 2006;17(3):

260–267.

14. Glymour MM. Natural experiments and instrumental variable

analyses in social epidemiology. In: Oakes JM, Kaufman JS,

eds. Methods in Social Epidemiology. San Francisco, CA: John

Wiley & Sons, Inc; 2006:429–468.

15. Hernán MA, Robins JM. Instruments for causal inference:

an epidemiologist’s dream? Epidemiology. 2006;17(4):

360–372.

16. Grootendorst P. A review of instrumental variables estimation

of treatment effects in the applied health sciences. Health Serv

Outcomes Res Methodol. 2007;7(3):159–179.

17. Rubin DB. Estimating causal effects from large data sets using

propensity scores. Ann Intern Med. 1997;127(8):757–763.

18. Brookhart MA, Schneeweiss S, Rothman KJ, et al. Variable

selection for propensity score models. Am J Epidemiol. 2006;

163(12):1149–1156.

19. Austin PC, Grootendorst P, Anderson GM. A comparison of

the ability of different propensity score models to balance

measured variables between treated and untreated subjects:

a Monte Carlo study. Stat Med. 2007;26(4):734–753.

20. Hahn J. Functional restriction and efficiency in causal inference.

Rev Econ Stat. 2004;86(1):73–76.

21. White H, Lu X. Causal diagrams for treatment effect estima-

tion with application to efficient covariate selection. Rev Econ

Stat. In press.

22. Bhattacharya J, Vogt WB. Do Instrumental Variables Belong

in Propensity Scores? (NBER Technical Working Paper no.

343). Cambridge, MA: National Bureau of Economic Re-

search; 2007.

23. Wooldridge J. Should Instrumental Variables Be Used As

Matching Variables? East Lansing, MI: Michigan State Univer-

sity; 2009. (https://www.msu.edu/~ec/faculty/wooldridge/

current%20research/treat1r6.pdf). (Accessed May 1, 2011).

24. Pearl J. On a class of bias-amplifying variables that endanger

effect estimates. In: Grünwald P, Spirtes P, eds. Proceedings of

the Twenty-Sixth Conference on Uncertainty in Artificial

Intelligence (UAI 2010). Corvallis, OR: Association for Uncer-

tainty in Artificial Intelligence; 2010:425–432.

25. Patrick AR, Schneeweiss S, Brookhart MA, et al. The impli-

cations of propensity score variable selection strategies in

pharmacoepidemiology: an empirical illustration. Pharmacoe-

pidemiol Drug Saf. 2011;20(6):551–559.

26. Stukel TA, Fisher ES, Wennberg DE, et al. Analysis of ob-

servational studies in the presence of treatment selection bias:

effects of invasive cardiac management on AMI survival using

propensity score and instrumental variable methods. JAMA.

2007;297(3):278–285.

27. Novikov I, Kalter-Leibovici O. Analytic approaches to obser-

vational studies with treatment selection bias [letter]. JAMA.

2007;297(19):2077.

28. D’Agostino RB Jr, D’Agostino RB Sr. Estimating treatment

effects using observational data. JAMA. 2007;297(3):314–316.

29. Stukel TA, Fisher ES, Wennberg DE. Analytic approaches to

observational studies with treatment selection bias—reply

[letter]. JAMA. 2007;297(19):2078.

30. Stukel TA, Fisher ES, Wennberg DE. Using observational data

to estimate treatment effects. JAMA. 2007;297(19):2078–2079.

31. Schneeweiss S, Rassen JA, Glynn RJ, et al. High-dimensional

propensity score adjustment in studies of treatment effects using

health care claims data. Epidemiology. 2009;20(4):512–522.

32. Brookhart MA, Stürmer T, Glynn RJ, et al. Confounding

control in healthcare database research: challenges and po-

tential approaches. Med Care. 2010;48(6 suppl):S114–S120.

33. Rothman KJ, Greenland S, Lash TL, eds. Modern Epidemiology.

3rd ed. Philadelphia, PA: Lippincott Williams & Wilkins; 2008.

34. Rubin DB, Thomas N. Matching using estimated propensity

scores: relating theory to practice. Biometrics. 1996;52(1):

249–264.

35. Hill J. Discussion of research using propensity-score match-

ing: comments on ‘A critical appraisal of propensity-score

matching in the medical literature between 1996 and 2003’ by

Peter Austin. Statistics in Medicine. Stat Med. 2008;27(12):

2055–2061.

36. McCandless LC, Gustafson P, Austin PC. Bayesian propensity

score analysis for observational data. Stat Med. 2009;28(1):

94–112.

APPENDIX 1

Example Data Set

We present an example data set from the multiplicative

simulation study, where amplification of bias and standard

error were relatively large. These data were simulated under

the following true parameter values: b0 = 0.01, b1 = 8,

b2 = 8 (the true exposure effect), a0 = 0.3, a1 = 1.8, a2 = 1.8,

c0 = 0.3, and c1 = 1. The simulated data for one set of

10,000 patients are given in Appendix Table 1. The crude

estimate of exposure effect from these data is RRcrude =

15.52. Thus, the bias of RRcrude is 7.52 (15.52 − 8 = 7.52).

The estimate of exposure effect conditional on Z is RRcond =

17.07, and the bias of RRcond is 9.07 (17.07 − 8 = 9.07).

Note that the same mechanism that results in bias amplification

also results in estimates of exposure effect that are heterogeneous

across strata of Z (risk ratios of 12.7 when Z = 0 and

26.8 when Z = 1).
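
The arithmetic in this appendix can be checked directly from the cell counts in Appendix Table 1. The Python sketch below is illustrative only: the function names are hypothetical, and treating RRcond as a Mantel-Haenszel summary risk ratio across strata of Z is an assumption made here (it reproduces the reported value of 17.07).

```python
# Counts from Appendix Table 1, keyed by (z, x): (number with Y = 1, number with Y = 0).
counts = {
    (0, 1): (603, 1258),
    (0, 0): (80, 3059),
    (1, 1): (1084, 2263),
    (1, 0): (20, 1633),
}
TRUE_RR = 8.0  # b2, the true exposure effect


def risk(events, non_events):
    return events / (events + non_events)


def crude_rr(table):
    """Risk ratio after collapsing the table over Z."""
    exposed = [sum(table[(z, 1)][i] for z in (0, 1)) for i in (0, 1)]
    unexposed = [sum(table[(z, 0)][i] for z in (0, 1)) for i in (0, 1)]
    return risk(*exposed) / risk(*unexposed)


def stratum_rr(table, z):
    """Risk ratio within a single stratum of Z."""
    return risk(*table[(z, 1)]) / risk(*table[(z, 0)])


def mantel_haenszel_rr(table):
    """Mantel-Haenszel summary risk ratio across strata of Z (assumed estimator)."""
    num = den = 0.0
    for z in (0, 1):
        a, b = table[(z, 1)]  # exposed: events, non-events
        c, d = table[(z, 0)]  # unexposed: events, non-events
        n1, n0 = a + b, c + d
        num += a * n0 / (n1 + n0)
        den += c * n1 / (n1 + n0)
    return num / den


rr_crude = crude_rr(counts)
rr_cond = mantel_haenszel_rr(counts)
print(f"RRcrude = {rr_crude:.2f}, bias = {rr_crude - TRUE_RR:.2f}")
print(f"RRcond  = {rr_cond:.2f}, bias = {rr_cond - TRUE_RR:.2f}")
print(f"RR when Z = 0: {stratum_rr(counts, 0):.1f}; RR when Z = 1: {stratum_rr(counts, 1):.1f}")
```

The computed values round to 15.52 and 17.07 for the crude and conditional estimates, and to 12.7 and 26.8 for the stratum-specific risk ratios, in agreement with the figures quoted above.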

APPENDIX 2

Empirical Example

The data in the empirical example come from the inves-

tigation described by Patrick et al. (25). That cohort study

included patients initiating the use of statins and glaucoma

medications among Medicare beneficiaries aged 65 years or

older who were enrolled in the Pharmaceutical Assistance

Contract for the Elderly (PACE) program provided by the

state of Pennsylvania. Enrollees in PACE were eligible for

inclusion in the study population if they had filled a prescription

for any statin or glaucoma drug between January 1,

1996, and December 31, 2002, and demonstrated continu-

ous use of the health-care system.

Initiation of drug therapy was defined as an eligible ben-

eficiary’s filling at least 1 prescription for a medication of

interest between January 1, 1996, and December 31, 2002,

but not using one during the 18 months prior to the index

date. The index date was the first date on which a prescrip-

tion for a statin or glaucoma drug was filled. Follow-up was

continued for 1 year after the initiation of therapy. Covariates

were defined on the basis of enrollment information (age, sex,

race) and claims made during the year before the index date.

Appendix Table 1. One Simulated Data Set From the Multiplicative Simulations

              Z = 0               Z = 1
         Y = 1    Y = 0      Y = 1    Y = 0
X = 1      603    1,258      1,084    2,263
X = 0       80    3,059         20    1,633
