
American Journal of Epidemiology

© The Author 2010. Published by Oxford University Press on behalf of the Johns Hopkins Bloomberg School of

Public Health. All rights reserved. For permissions, please e-mail: journals.permissions@oxfordjournals.org.

Vol. 172, No. 12

DOI: 10.1093/aje/kwq332

Advance Access publication:

October 29, 2010

Original Contribution

Odds Ratios for Mediation Analysis for a Dichotomous Outcome

Tyler J. VanderWeele* and Stijn Vansteelandt

* Correspondence to Dr. Tyler J. VanderWeele, Departments of Epidemiology and Biostatistics, Harvard School of Public Health,

677 Huntington Avenue, Boston, MA 02115 (e-mail: tvanderw@hsph.harvard.edu).

Initially submitted November 23, 2009; accepted for publication August 26, 2010.

For dichotomous outcomes, the authors discuss when the standard approaches to mediation analysis used in

epidemiology and the social sciences are valid, and they provide alternative mediation analysis techniques when the

standard approaches will not work. They extend definitions of controlled direct effects and natural direct and indirect

effects from the risk difference scale to the odds ratio scale. A simple technique to estimate direct and indirect

effect odds ratios by combining logistic and linear regressions is described that applies when the outcome is rare and

the mediator continuous. Further discussion is given as to how this mediation analysis technique can be extended

to settings in which data come from a case-control study design. For the standard mediation analysis techniques

used in the epidemiologic and social science literatures to be valid, an assumption of no interaction between the

effects of the exposure and the mediator on the outcome is needed. The approach presented here, however, will

apply even when there are interactions between the effect of the exposure and the mediator on the outcome.

case-control studies; causal inference; decomposition; dichotomous response; epidemiologic methods; interac-

tion; logistic regression; odds ratio

Abbreviations: CDE, controlled direct effect; CI, confidence interval; ESE, empirical standard error; NDE, natural direct effect; NIE,

natural indirect effect; OR, odds ratio; SSE, estimated standard error; TE, total effect.

Editor’s note: Invited commentaries on this article ap-

pear on pages 1349 and 1352, and the authors’ response

is published on page 1355.

The causal inference literature has made a considerable

contribution to mediation analysis by providing definitions

for direct and indirect effects that allow for the effect de-

composition of a total effect into a direct and an indirect

effect even in settings involving nonlinearities and interac-

tions (1, 2), thereby circumventing an important limitation

to the concepts and methods for mediation that have been

used in the social sciences (2). The causal inference litera-

ture on mediation has focused on the risk difference scale.

Many analyses in epidemiology, however, use the odds ratio

scale because the outcome is dichotomous and the data arise

from a case-control study design.

In this paper, we consider the use of the odds ratio scale

for mediation analysis. The use of this scale has the advan-

tage that, when the outcome is rare and the mediator con-

tinuous, direct and indirect effects can be estimated through

very simple regressions, even with data arising from a case-

control study design. Under certain no-interaction assump-

tions, this technique reduces to the approach often used in

the epidemiologic literature of including an intermediate

variable in a logistic regression to assess mediation. How-

ever, when the no-interaction assumption does not hold, the

approach described in the present paper can still be used.

DIRECT AND INDIRECT EFFECTS ODDS RATIOS

We will let A denote an exposure of interest, Y a dichotomous outcome, and M a potential mediator. We let C denote

a set of baseline covariates not affected by the exposure. The

relations among these variables are depicted in Figure 1. For

example, A may denote estrogen therapy, M serum lipid

concentrations, and Y cardiovascular disease. A question

of interest may then be the extent to which the effect of

1339 Am J Epidemiol 2010;172:1339–1348


estrogen therapy A on cardiovascular disease Y is mediated

through serum lipid concentrations M and the extent to

which it is through other pathways (3, 4). For simplicity

in the example, we suppose treatment is binary and let

A = 1 denote estrogen therapy and A = 0 otherwise.

To address this and similar questions concerning media-

tion, we use the counterfactual framework (5, 6). We will let

$Y_a$ and $M_a$ denote, respectively, the values of the outcome and mediator that would have been observed had the exposure A been set, possibly contrary to fact, to level a. We will let $Y_{am}$ denote the value of the outcome that would have been

observed had the exposure, A, and the mediator, M, been set,

possibly contrary to fact, to levels a and m, respectively. We

also assume the technical assumptions called ‘‘consistency’’

and ‘‘composition’’ generally presupposed in the causal in-

ference literature and described elsewhere (7–9).

We extend the definitions of direct and indirect effects

(1, 2) in causal inference from the risk difference to the

odds ratio scale. On the risk difference scale, the total

effect, conditional on C = c, comparing exposure level a with a*, is defined by $E[Y_a - Y_{a^*} \mid c]$ and compares the average outcome in stratum C = c if A had been set to a with the average outcome in stratum C = c if A had been set to a*. On the odds ratio (OR) scale, the total effect (TE), conditional on C = c, comparing exposure level a with a*, is defined by

$$\mathrm{OR}^{\mathrm{TE}}_{a,a^*\mid c} = \frac{P(Y_a = 1 \mid c)/\{1 - P(Y_a = 1 \mid c)\}}{P(Y_{a^*} = 1 \mid c)/\{1 - P(Y_{a^*} = 1 \mid c)\}}$$

and compares the odds of outcome Y = 1 in stratum C = c if A had been a with the odds of outcome Y = 1 in stratum C = c if A had been a*. In the context of the cardiovascular example, if we let a = 1 denote the estrogen therapy and a* = 0 denote no therapy, then $\mathrm{OR}^{\mathrm{TE}}_{1,0\mid c}$ would be the odds ratio for cardiovascular disease comparing estrogen therapy with no therapy for individuals with covariate values c.

As with the total causal effect, we can also define direct and indirect effects on either the risk difference or the odds ratio scale. We will adopt the definitions and nomenclature of Pearl (2) for the risk difference scale and extend these concepts to the odds ratio scale. On the risk difference scale, the controlled direct effect, conditional on C = c, comparing exposure level a with a* and fixing the mediator to level m, is defined by $E[Y_{am} - Y_{a^*m} \mid c]$ and captures the effect of exposure A on outcome Y, intervening to fix M to m. On the odds ratio scale, one could define the conditional controlled direct effect (CDE) as

$$\mathrm{OR}^{\mathrm{CDE}}_{a,a^*\mid c}(m) = \frac{P(Y_{am} = 1 \mid c)/\{1 - P(Y_{am} = 1 \mid c)\}}{P(Y_{a^*m} = 1 \mid c)/\{1 - P(Y_{a^*m} = 1 \mid c)\}}.$$

If A is binary, this is

$$\frac{P(Y_{1m} = 1 \mid c)/\{1 - P(Y_{1m} = 1 \mid c)\}}{P(Y_{0m} = 1 \mid c)/\{1 - P(Y_{0m} = 1 \mid c)\}}.$$

Note that these conditional controlled direct effects may

vary with m when there is interaction between the effects

of A and M on the odds ratio scale. In the cardiovascular

example, $\mathrm{OR}^{\mathrm{CDE}}_{1,0\mid c}(m)$ would denote the odds ratio for cardiovascular disease comparing therapy and no therapy with serum lipid concentrations fixed at level m.

The so-called ‘‘natural direct effect’’ (2) or ‘‘pure direct

effect’’ (1) differs from the controlled direct effect in that

the intermediate M is set to the level $M_{a^*}$, the level it would have naturally been under some reference condition for the exposure, A = a*; the natural direct effect, conditional on C = c, on the risk difference scale thus takes the form $E[Y_{aM_{a^*}} - Y_{a^*M_{a^*}} \mid c]$. The natural direct effect thus captures the effect of the exposure, estrogen therapy, on the outcome, cardiovascular disease, intervening to set the mediator, serum lipid concentration, to the level it would have been under the reference exposure level (e.g., no estrogen therapy). The conditional natural direct effect (NDE) odds ratio can be defined analogously and takes the form

$$\mathrm{OR}^{\mathrm{NDE}}_{a,a^*\mid c}(a^*) = \frac{P(Y_{aM_{a^*}} = 1 \mid c)/\{1 - P(Y_{aM_{a^*}} = 1 \mid c)\}}{P(Y_{a^*M_{a^*}} = 1 \mid c)/\{1 - P(Y_{a^*M_{a^*}} = 1 \mid c)\}}.$$

On the odds ratio scale, the conditional natural direct effect

can be interpreted as comparing the odds, conditional on

C = c, of the outcome Y if exposure had been a, but if the mediator had been fixed to $M_{a^*}$ (i.e., to what it would have been if exposure had been a*) to the odds, conditional on C = c, of the outcome Y if exposure had been a* but if the mediator had been fixed at the same level $M_{a^*}$. This would

capture the odds ratio for cardiovascular disease comparing

therapy with no therapy intervening to set the serum lipid

concentration to the level it would have been for each sub-

ject had they not had estrogen therapy.

One can similarly define a natural indirect effect. On the

risk difference scale, the conditional natural indirect effect

can be defined as $E[Y_{aM_a} - Y_{aM_{a^*}} \mid c]$, which compares, conditional on C = c, the effect of the mediator at levels $M_a$ and $M_{a^*}$ on the outcome when exposure A is set to a. The conditional natural indirect effect (NIE) can be defined analogously on the odds ratio scale as

$$\mathrm{OR}^{\mathrm{NIE}}_{a,a^*\mid c}(a) = \frac{P(Y_{aM_a} = 1 \mid c)/\{1 - P(Y_{aM_a} = 1 \mid c)\}}{P(Y_{aM_{a^*}} = 1 \mid c)/\{1 - P(Y_{aM_{a^*}} = 1 \mid c)\}}.$$

Figure 1. Example of mediation with exposure A, mediator M, outcome Y, and covariates C.

On the odds ratio scale, the conditional natural indirect effect can be interpreted as comparing the odds, conditional on C = c, of the outcome Y if exposure had been a but if the mediator had been fixed to $M_a$ (i.e., to what it would have been if exposure had been a) to the odds, conditional on C = c, of the outcome Y if exposure had been a but if the mediator had been fixed to $M_{a^*}$ (i.e., to what it would have been if exposure had been a*). The natural indirect effect

odds ratio thus captures the odds ratio for cardiovascular

disease comparing serum lipid concentration under therapy

and no therapy if the subject had in fact had estrogen ther-

apy. As discussed elsewhere, controlled direct effects are

often of greater interest in policy evaluation (2, 10), whereas

natural direct and indirect effects are often of greater interest

in evaluating the action of various mechanisms (10, 11).

Note that throughout this paper we will consider all effects

conditional on the covariates C, and we will thus use ex-

pressions such as ‘‘natural direct effect’’ and ‘‘conditional

natural direct effect’’ interchangeably.
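To make the definitions above concrete, the following minimal Python sketch evaluates the natural direct and indirect effect odds ratios from three counterfactual risks within a single stratum C = c. The numerical risks and the helper names (`odds`, `odds_ratio`) are hypothetical and chosen only for illustration; they are not taken from the estrogen therapy example or from any data set.

```python
def odds(p: float) -> float:
    """Return the odds p / (1 - p) for a probability strictly between 0 and 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("probability must lie strictly between 0 and 1")
    return p / (1.0 - p)


def odds_ratio(p_numerator: float, p_denominator: float) -> float:
    """Return the ratio of the odds of two probabilities."""
    return odds(p_numerator) / odds(p_denominator)


# Hypothetical counterfactual risks in one stratum C = c (illustrative values only).
p_a_Ma = 0.030          # P(Y_{a M_a} = 1 | c): exposed, mediator as under exposure
p_a_Mastar = 0.024      # P(Y_{a M_a*} = 1 | c): exposed, mediator as under no exposure
p_astar_Mastar = 0.020  # P(Y_{a* M_a*} = 1 | c): unexposed, mediator as under no exposure

# Natural direct effect: change the exposure, hold the mediator at M_{a*}.
or_nde = odds_ratio(p_a_Mastar, p_astar_Mastar)
# Natural indirect effect: hold the exposure at a, change the mediator from M_{a*} to M_a.
or_nie = odds_ratio(p_a_Ma, p_a_Mastar)

print(f"OR_NDE = {or_nde:.3f}, OR_NIE = {or_nie:.3f}")
```

In practice the counterfactual risks are not observed; the sketch only shows how the two odds ratios are assembled once such risks are available.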

On the risk difference scale, natural direct and indirect

effects have the property that the total effect $E[Y_a - Y_{a^*} \mid c]$ decomposes into a natural direct and indirect effect:

$$E[Y_a - Y_{a^*} \mid c] = E[Y_{aM_a} - Y_{a^*M_{a^*}} \mid c] = E[Y_{aM_a} - Y_{aM_{a^*}} \mid c] + E[Y_{aM_{a^*}} - Y_{a^*M_{a^*}} \mid c].$$

The decomposition holds even when there are nonlinearities

and interactions. On the odds ratio scale, the natural direct

and indirect effects also have a decomposition property. On

the odds ratio scale, the odds ratio for the total effect de-

composes into a product of odds ratios for the natural direct

and indirect effect:

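$$\begin{aligned}
\mathrm{OR}^{\mathrm{TE}}_{a,a^*\mid c}
&= \frac{P(Y_a = 1 \mid c)/\{1 - P(Y_a = 1 \mid c)\}}{P(Y_{a^*} = 1 \mid c)/\{1 - P(Y_{a^*} = 1 \mid c)\}} \\
&= \frac{P(Y_{aM_a} = 1 \mid c)/\{1 - P(Y_{aM_a} = 1 \mid c)\}}{P(Y_{a^*M_{a^*}} = 1 \mid c)/\{1 - P(Y_{a^*M_{a^*}} = 1 \mid c)\}} \\
&= \frac{P(Y_{aM_a} = 1 \mid c)/\{1 - P(Y_{aM_a} = 1 \mid c)\}}{P(Y_{aM_{a^*}} = 1 \mid c)/\{1 - P(Y_{aM_{a^*}} = 1 \mid c)\}}
\times \frac{P(Y_{aM_{a^*}} = 1 \mid c)/\{1 - P(Y_{aM_{a^*}} = 1 \mid c)\}}{P(Y_{a^*M_{a^*}} = 1 \mid c)/\{1 - P(Y_{a^*M_{a^*}} = 1 \mid c)\}},
\end{aligned}$$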

where the first expression in the product is the natural indirect effect odds ratio, $\mathrm{OR}^{\mathrm{NIE}}_{a,a^*\mid c}(a)$, and the second expression is the natural direct effect odds ratio, $\mathrm{OR}^{\mathrm{NDE}}_{a,a^*\mid c}(a^*)$. On the log scale, this is $\log(\mathrm{OR}^{\mathrm{TE}}_{a,a^*\mid c}) = \log(\mathrm{OR}^{\mathrm{NIE}}_{a,a^*\mid c}(a)) + \log(\mathrm{OR}^{\mathrm{NDE}}_{a,a^*\mid c}(a^*))$. The ratio, $\log(\mathrm{OR}^{\mathrm{NIE}}_{a,a^*\mid c}(a)) / \log(\mathrm{OR}^{\mathrm{TE}}_{a,a^*\mid c})$, thus constitutes a measure of the proportion of the effect of the exposure mediated by the intermediate on the log odds scale. If the outcome is rare, one can use

$$\frac{\mathrm{OR}^{\mathrm{NDE}}_{a,a^*\mid c}(a^*) \times \{\mathrm{OR}^{\mathrm{NIE}}_{a,a^*\mid c}(a) - 1\}}{\mathrm{OR}^{\mathrm{NDE}}_{a,a^*\mid c}(a^*) \times \mathrm{OR}^{\mathrm{NIE}}_{a,a^*\mid c}(a) - 1}$$

as a measure of the proportion mediated on the risk difference scale. We have given formulas for the ‘‘pure natural direct effect’’ and the ‘‘total natural indirect effect’’ (1); refer to the Web Appendix, which is posted on the Journal’s Web site (http://aje.oxfordjournals.org/) for further discussion of these measures and for analogous formulas for the ‘‘total natural direct effect’’ and the ‘‘pure natural indirect effect’’ (1).
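The two proportion-mediated measures described above can be sketched in a few lines of Python. The function names and the input odds ratios are hypothetical, chosen only to illustrate the arithmetic; the risk difference version relies on the rare outcome approximation stated in the text.

```python
import math


def proportion_mediated_log_odds(or_nde: float, or_nie: float) -> float:
    """log(OR_NIE) / log(OR_TE), where OR_TE = OR_NDE * OR_NIE."""
    log_te = math.log(or_nde) + math.log(or_nie)
    if log_te == 0.0:
        raise ValueError("total effect odds ratio is 1; proportion is undefined")
    return math.log(or_nie) / log_te


def proportion_mediated_risk_difference(or_nde: float, or_nie: float) -> float:
    """OR_NDE * (OR_NIE - 1) / (OR_NDE * OR_NIE - 1); assumes a rare outcome."""
    denominator = or_nde * or_nie - 1.0
    if denominator == 0.0:
        raise ValueError("total effect odds ratio is 1; proportion is undefined")
    return or_nde * (or_nie - 1.0) / denominator


# Hypothetical effect odds ratios (illustrative values only).
or_nde, or_nie = 1.2, 1.25
or_te = or_nde * or_nie  # decomposition on the odds ratio scale

pm_log = proportion_mediated_log_odds(or_nde, or_nie)
pm_rd = proportion_mediated_risk_difference(or_nde, or_nie)
print(f"OR_TE = {or_te:.3f}, log-odds scale: {pm_log:.3f}, risk difference scale: {pm_rd:.3f}")
```

The two measures generally differ because they express the same decomposition on different scales, so a report should state which one is being used.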

Under certain assumptions that the set of covariates C

contains all relevant confounding variables, the direct and

indirect effects can be identified with observed data. We will

follow the exposition of VanderWeele (12) and VanderWeele

and Vansteelandt (9) on the identification assumptions pro-

posed by Pearl (2). These identification assumptions were

presented to identify direct and indirect effects on the risk

difference scale but they apply also to the odds ratio scale.

To identify total effects, it is generally assumed that, con-

ditional on some set of measured covariates C, the effect of

exposure A on outcome Y is unconfounded; in counterfactual notation, this is $Y_a \perp\!\!\!\perp A \mid C$, where we use the independence symbol $\perp\!\!\!\perp$ to denote that $Y_a$ is independent of A conditional on C. In practice, to make this assumption more plausible, a researcher will attempt to collect data on a sufficiently rich set of covariates C to try to control for confounding of the exposure-outcome relation. If this assumption holds, then the odds ratio for the total causal effect, $\mathrm{OR}^{\mathrm{TE}}_{a,a^*\mid c}$, is identified and can be estimated from the data using

$$\frac{P(Y_a = 1 \mid c)/\{1 - P(Y_a = 1 \mid c)\}}{P(Y_{a^*} = 1 \mid c)/\{1 - P(Y_{a^*} = 1 \mid c)\}} = \frac{P(Y = 1 \mid a, c)/\{1 - P(Y = 1 \mid a, c)\}}{P(Y = 1 \mid a^*, c)/\{1 - P(Y = 1 \mid a^*, c)\}}.$$

The left-hand side is the odds ratio for the total causal effect, $\mathrm{OR}^{\mathrm{TE}}_{a,a^*\mid c}$; the right-hand side is an expression that can be

estimated from the data.
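As a rough illustration of the right-hand side of this identification formula, the sketch below computes the observed conditional odds ratio in one covariate stratum from simple counts. The counts and the function name are hypothetical, and the result has a causal interpretation only if the no-unmeasured-confounding assumption stated above holds.

```python
def conditional_odds_ratio(cases_exposed: int, n_exposed: int,
                           cases_unexposed: int, n_unexposed: int) -> float:
    """Odds ratio comparing P(Y=1 | a, c) with P(Y=1 | a*, c) within one stratum c."""
    if not (0 < cases_exposed < n_exposed and 0 < cases_unexposed < n_unexposed):
        raise ValueError("each group needs at least one case and one non-case")
    p_exposed = cases_exposed / n_exposed        # estimate of P(Y = 1 | a, c)
    p_unexposed = cases_unexposed / n_unexposed  # estimate of P(Y = 1 | a*, c)
    return (p_exposed / (1.0 - p_exposed)) / (p_unexposed / (1.0 - p_unexposed))


# Hypothetical counts within a single stratum C = c (illustrative values only).
or_total = conditional_odds_ratio(cases_exposed=30, n_exposed=1000,
                                  cases_unexposed=20, n_unexposed=1000)
print(f"Observed odds ratio in stratum c: {or_total:.3f}")
```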

Controlled direct effects on the risk difference or

risk ratio scale are identified if conditioning on the set of

covariates C suffices to control for confounding of both the

exposure-outcome and the mediator-outcome relations. In

counterfactual notation, these 2 assumptions can, respec-

tively, be written as that for all a and m,

$$Y_{am} \perp\!\!\!\perp A \mid C \tag{1}$$

$$Y_{am} \perp\!\!\!\perp M \mid \{A, C\}. \tag{2}$$

Assumption 1 is similar to the no-unmeasured-confounding assumption for total effects. Assumption 2 requires that, conditional on {A, C}, there is no unmeasured

confounding for the mediator-outcome relation. If assump-

tion 1 is satisfied but assumption 2 fails (i.e., if there is me-

diator-outcome confounding), then estimators for the direct

and indirect effect will in general be biased (1, 2, 13, 14).

Thus, in the cardiovascular example, if U denoted some as-

pect of diet that was associated with serum lipid levels and

was also associated with cardiovascular disease, then it would

be necessary to control for U in estimating the direct effect of

estrogen therapy on cardiovascular disease controlling for

serum lipid levels. If estrogen therapy were randomized, then

its effect on serum lipid concentrations only or on cardiovas-

cular disease only could be estimated without control for

U but, when the direct effect of estrogen therapy on


cardiovascular disease controlling for serum lipid concentra-

tions is of interest, data on U would be needed.

Unfortunately, in many studies using mediation analysis,

little attention is given to data collection for variables con-

founding the mediator-outcome relation. Effort is often

made to collect data on some set of covariates C that suffice

to control for confounding of the exposure-outcome relation

so that assumption 1 is satisfied, but this will not ensure that

assumption 2 is satisfied. As noted above, when there are

mediator-outcome confounding variables that are unmea-

sured or for which control has not been made, estimates

of direct and indirect effects will generally be biased. In

epidemiologic research for which questions of mediation

are of interest, greater effort should be made to collect data

on potential mediator-outcome confounders. When assumptions 1 and 2 do not hold, sensitivity analysis for violations of the no-unmeasured-confounding assumptions should be used (15, 16). If assumptions 1 and 2 hold, then the controlled direct effect on the risk difference scale and on the odds ratio scale is identified, and $\mathrm{OR}^{\mathrm{CDE}}_{a,a^*\mid c}(m)$ is then given by

$$\frac{P(Y_{am} = 1 \mid c)/\{1 - P(Y_{am} = 1 \mid c)\}}{P(Y_{a^*m} = 1 \mid c)/\{1 - P(Y_{a^*m} = 1 \mid c)\}} = \frac{P(Y = 1 \mid a, m, c)/\{1 - P(Y = 1 \mid a, m, c)\}}{P(Y = 1 \mid a^*, m, c)/\{1 - P(Y = 1 \mid a^*, m, c)\}}.$$

For the identification of natural direct and indirect effects,

additional assumptions are needed. Natural direct and indirect effects will be identified if, in addition to assumptions 1 and 2, the following 2 assumptions hold, that for all a, a*, and m,

$$M_a \perp\!\!\!\perp A \mid C \tag{3}$$

$$Y_{am} \perp\!\!\!\perp M_{a^*} \mid C. \tag{4}$$

Assumption 3 can be interpreted as that, conditional on C,

there is no unmeasured confounding for the exposure-

mediator relation. Assumption 4 will hold if confounding

for the mediator-outcome relation can be controlled for by

some set of baseline covariates C, so that there is no effect of

exposure A that confounds the mediator-outcome relation

(i.e., no effect L of exposure A that itself affects both

M and Y). Thus, assumption 4 would be violated in the case

of Figure 2. In some settings, assumption 4 may be plausible

if the mediator M occurs shortly after the exposure A (9). If,

however, there is a variable L that is an effect of A and affects

both M and Y, then assumption 4 is violated and natural direct

and indirect effects will not in general be identified (17),

irrespective of whether data are available on L. In such set-

tings, it may still be possible to identify controlled direct

effect odds ratios, but alternative statistical approaches such

as marginal structural models (12, 18, 19) or structural nested

models (20–24) will generally be needed. Note that none of

assumptions 1–4 can be tested by using data; a researcher will

have to rely on subject matter knowledge in evaluating them.

In the next section, we will show how natural direct and in-

direct effects can be estimated in a relatively straightforward

manner using regression.

REGRESSION ANALYSIS FOR DIRECT AND INDIRECT

EFFECT ODDS RATIOS

In this section, we describe a simple regression technique

that can be used to estimate controlled direct effect and nat-

ural direct and indirect effect odds ratios when the assump-

tions above hold. The estimation technique for controlled

direct effect odds ratios will require only assumptions 1 and

2 and will make use of a single logistic regression. The esti-

mation technique for natural direct and indirect effect odds

ratios will require assumptions 1–4 above and will combine

the results of a linear and logistic regression to obtain the

effects of interest; the estimation technique for natural direct

and indirect effects will also require that the outcome Y is rare

so that odds ratios approximate risk ratios, which allows one

to obtain particularly simple formulae. We consider a setting

in which the mediator M is continuous and the outcome Y is

dichotomous. We have described a similar approach for con-

tinuous outcomes elsewhere (9). Derivations for the results

below are given in the Web Appendix.

Consider the use of the following 2 models, a logistic

regression for the outcome Y (with no A × M product term) and a linear regression for the mediator M:

$$\operatorname{logit}\{P(Y = 1 \mid a, m, c)\} = \theta_0 + \theta_1 a + \theta_2 m + \theta_4' c \tag{5}$$

and

$$E[M \mid a, c] = \beta_0 + \beta_1 a + \beta_2' c, \tag{6}$$

where the error term for the linear regression for M is normally distributed with constant variance. If assumptions 1–4 hold and if regression models 5 and 6 are correctly specified, then the controlled and natural direct effect and natural indirect effect odds ratios are given by

$$\mathrm{OR}^{\mathrm{NDE}}_{a,a^*\mid c}(a^*) \approx \mathrm{OR}^{\mathrm{CDE}}_{a,a^*\mid c}(m) = \exp\{\theta_1 (a - a^*)\}$$

$$\mathrm{OR}^{\mathrm{NIE}}_{a,a^*\mid c}(a) \approx \exp\{\theta_2 \beta_1 (a - a^*)\},$$

where the approximation holds to the extent the rare outcome assumption holds. These expressions essentially use $\theta_1$ for the direct effect and $\theta_2\beta_1$ for the indirect effect, and these expressions are also often used in the social science literature for mediation analysis with a dichotomous outcome (25, 26).

Figure 2. Example of mediation with exposure A, mediator M, outcome Y, covariates C, and a mediator-outcome confounder L that is itself affected by the exposure.

The use of models 5 and 6 along with the

expressions above is often referred to as the ‘‘Baron-

Kenny’’ approach to mediation (26). A related approach,

common in both the epidemiologic literature and the social

science literature, consists of regressing Y on A, M, C as in

model 5 and then examining whether the coefficient for A is

different from that obtained when Y is regressed on A and C alone, such as the following:

$$\operatorname{logit}\{P(Y = 1 \mid a, c)\} = \phi_0 + \phi_1 a + \phi_2' c.$$

The difference between coefficients for A, $\phi_1 - \theta_1$, is sometimes interpreted as an indirect effect. The traditional ‘‘proportion explained’’ methods (27–30) are closely related and use $(\phi_1 - \theta_1)/\phi_1$ as the measure of interest, again effectively relying on the difference between the 2 coefficients. In the

included Appendix, we in fact show that, under assumptions

1–4, correct specification of models 5 and 6, and a rare outcome, these 2 approaches to mediation analysis with a dichotomous outcome are essentially equivalent with $\phi_1 - \theta_1 \approx \theta_2\beta_1$. The results above provide a formal counterfactual

interpretation of these various effect measures. An alterna-

tive measure of the ‘‘proportion explained’’ proposed by

Wang et al. (31) is, under certain exchangeability assump-

tions, similar to a natural indirect effect (32).
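The no-interaction expressions based on models 5 and 6, and their relation to the difference-in-coefficients approach, can be sketched as follows. All coefficient values are hypothetical stand-ins for fitted regression estimates, and the function names are illustrative only; the approximations assume a rare outcome and correctly specified models, as stated in the text.

```python
import math


def direct_effect_or(theta1: float, a: float, a_star: float) -> float:
    """exp{theta1 (a - a*)}: direct effect odds ratio under models 5 and 6."""
    return math.exp(theta1 * (a - a_star))


def indirect_effect_or(theta2: float, beta1: float, a: float, a_star: float) -> float:
    """exp{theta2 beta1 (a - a*)}: natural indirect effect odds ratio (rare outcome)."""
    return math.exp(theta2 * beta1 * (a - a_star))


# Hypothetical fitted coefficients (illustrative values only).
theta1, theta2 = 0.40, 0.50  # logistic model 5: coefficients of a and m
beta1 = 0.60                 # linear model 6: coefficient of a
phi1 = 0.69                  # logistic model without the mediator: coefficient of a

or_direct = direct_effect_or(theta1, a=1, a_star=0)
or_indirect = indirect_effect_or(theta2, beta1, a=1, a_star=0)

# Product method versus difference method on the log odds scale.
product_method = theta2 * beta1
difference_method = phi1 - theta1
proportion_explained = difference_method / phi1

print(f"OR direct = {or_direct:.3f}, OR indirect = {or_indirect:.3f}")
print(f"theta2*beta1 = {product_method:.3f}, phi1 - theta1 = {difference_method:.3f}")
print(f"proportion explained = {proportion_explained:.3f}")
```

The hypothetical values were picked so that the two methods roughly agree; with real estimates the agreement depends on how well the rare outcome and model specification conditions hold.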

However, a limitation of all of the standard approaches is

that they presuppose that there is no statistical interaction on

the odds ratio scale between A and M in the logistic model

for Y. When such A × M interactions are present and are ignored, the logistic regression model 5 will not be correctly specified, and the difference $\phi_1 - \theta_1$ does not carry

a straightforward interpretation as an indirect causal effect;

the definition of an indirect effect essentially breaks down

within the standard Baron-Kenny approach when such in-

teractions are present (33). Hafeman (34) has also recently

documented the biases that can arise with the traditional

‘‘proportion explained’’ methods when used in multiplica-

tive models for a dichotomous outcome in which interaction

terms are omitted. Here, we show how the regression ap-

proach can be extended to allow for interaction. Specifically,

suppose that, instead of model 5, the following model,

which includes an A × M product term, is used:

$$\operatorname{logit}\{P(Y = 1 \mid a, m, c)\} = \theta_0 + \theta_1 a + \theta_2 m + \theta_3 a m + \theta_4' c. \tag{7}$$

If assumptions 1–4 hold and if the regression models 6 and 7

are correctly specified and the outcome is rare, then the

controlled direct effect and natural indirect effect odds ratios

are given, respectively, by

$$\mathrm{OR}^{\mathrm{CDE}}_{a,a^*\mid c}(m) = \exp\{(\theta_1 + \theta_3 m)(a - a^*)\} \tag{8}$$

$$\mathrm{OR}^{\mathrm{NIE}}_{a,a^*\mid c}(a) \approx \exp\{(\theta_2\beta_1 + \theta_3\beta_1 a)(a - a^*)\}. \tag{9}$$

The formula for the controlled direct effect odds ratio requires that assumptions 1 and 2 hold and that model 7 is correctly specified; no rare outcome assumption is required.

The formula for the natural indirect effect odds ratio re-

quires that assumptions 1–4 hold, that models 6 and 7 are

correctly specified, and that the outcome Y is rare. An esti-

mator can also be given for the natural direct effect odds

ratio (refer to the Web Appendix material) but is more com-

plicated because, when there is interaction between A and M

in the logistic model for Y, the natural direct effect will be

different for subjects with different covariate values C.

Model 7 and expressions 8 and 9 essentially generalize

the Baron-Kenny approach to allow for exposure-mediator

interactions.
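Expressions 8 and 9 translate directly into code. The sketch below uses hypothetical coefficient values and illustrative function names; it also shows that setting the interaction coefficient to 0 recovers the no-interaction expressions, and that the controlled direct effect odds ratio changes with the mediator level m when the interaction is nonzero.

```python
import math


def cde_odds_ratio(theta1: float, theta3: float, m: float,
                   a: float, a_star: float) -> float:
    """Expression 8: exp{(theta1 + theta3 m)(a - a*)}."""
    return math.exp((theta1 + theta3 * m) * (a - a_star))


def nie_odds_ratio(theta2: float, theta3: float, beta1: float,
                   a: float, a_star: float) -> float:
    """Expression 9: exp{(theta2 beta1 + theta3 beta1 a)(a - a*)}; rare outcome."""
    return math.exp((theta2 * beta1 + theta3 * beta1 * a) * (a - a_star))


# Hypothetical fitted coefficients from models 6 and 7 (illustrative values only).
theta1, theta2, theta3 = 0.40, 0.50, 0.10
beta1 = 0.60

# The controlled direct effect depends on the level m at which the mediator is fixed.
for m in (0.0, 1.0, 2.0):
    print(f"m = {m}: OR_CDE = {cde_odds_ratio(theta1, theta3, m, a=1, a_star=0):.3f}")

print(f"OR_NIE = {nie_odds_ratio(theta2, theta3, beta1, a=1, a_star=0):.3f}")

# With theta3 = 0 the expressions reduce to exp{theta1 (a - a*)} and exp{theta2 beta1 (a - a*)}.
cde_no_interaction = cde_odds_ratio(theta1, 0.0, m=2.0, a=1, a_star=0)
nie_no_interaction = nie_odds_ratio(theta2, 0.0, beta1, a=1, a_star=0)
```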

Ninety-five percent confidence intervals for the controlled direct effect odds ratio in expression 8 and the natural indirect effect odds ratio in expression 9 can be computed by using standard regression output and are given, respectively, by

$$\exp\left\{\log \mathrm{OR}^{\mathrm{CDE}}_{a,a^*\mid c}(m) \pm 1.96\,(a - a^*)\sqrt{\sigma^{\theta}_{11} + 2\sigma^{\theta}_{13} m + \sigma^{\theta}_{33} m^2}\right\}$$

and

$$\exp\left\{\log \mathrm{OR}^{\mathrm{NIE}}_{a,a^*\mid c}(a) \pm 1.96\,(a - a^*)\sqrt{(\theta_2 + \theta_3 a)^2 \sigma^{\beta}_{11} + \beta_1^2\left(\sigma^{\theta}_{22} + 2\sigma^{\theta}_{23} a + \sigma^{\theta}_{33} a^2\right)}\right\},$$

where $\sigma^{\beta}_{ij}$ is the covariance between $\hat{\beta}_i$ and $\hat{\beta}_j$ in model 6, and $\sigma^{\theta}_{ij}$ is the covariance between $\hat{\theta}_i$ and $\hat{\theta}_j$ in model 7; these covariances are given in the regression output of standard statistical software. Alternatively, standard errors for expressions 8 and 9 could be obtained by bootstrapping.

Expressions 8 and 9 generalize mediation analysis with a dichotomous outcome to settings in which there may be interactions on the odds ratio scale between the exposure and mediator of interest. The standard approach of omitting the $\theta_3 am$ product term in assessing mediation is highly problematic when correct specification of a logistic regression model for Y requires the product term. When there is in fact such interaction between A and M, ignoring this (as is often done) can result in highly misleading inferences concerning mediation. If, for example, the direction of the association between A and Y differs for different levels of m and if the $\theta_3 am$ term in model 7 is omitted, the resulting estimate of the exposure coefficient $\theta_1$ may be close to 0 because of averaging. This might result in a researcher’s concluding that the effect of A on Y is largely mediated by M, when in fact all that is the case is that there is an interaction between the effects of A and M on Y. At the very least, epidemiologists, before applying the standard approach, should test whether $\theta_3 = 0$ in the regression model 7 and should consider whether the no-unmeasured-confounding assumptions described above are satisfied. If there is evidence that $\theta_3 \neq 0$, then this standard approach of merely including the mediator in a regression for the outcome Y to obtain direct and indirect effects should not be used. The approach described above, however, of using both models 6 and 7 could still be used when there is an interaction between A and M in model 7.
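The confidence interval formulas for expressions 8 and 9 can be sketched in Python as below. The coefficient estimates and the variance and covariance entries are hypothetical placeholders for values that would be read from regression output, and the function names are illustrative. The sketch uses the absolute value of a − a* for the standard error, which yields the same interval as the ± form while keeping the lower limit below the upper limit.

```python
import math


def cde_or_ci(theta1, theta3, m, a, a_star,
              var_theta1, cov_theta13, var_theta3, z=1.96):
    """Confidence interval for the controlled direct effect odds ratio (expression 8)."""
    log_or = (theta1 + theta3 * m) * (a - a_star)
    variance = var_theta1 + 2.0 * cov_theta13 * m + var_theta3 * m ** 2
    if variance < 0.0:
        raise ValueError("variance term is negative; check the covariance inputs")
    se = abs(a - a_star) * math.sqrt(variance)
    return math.exp(log_or - z * se), math.exp(log_or + z * se)


def nie_or_ci(theta2, theta3, beta1, a, a_star,
              var_beta1, var_theta2, cov_theta23, var_theta3, z=1.96):
    """Confidence interval for the natural indirect effect odds ratio (expression 9)."""
    log_or = (theta2 + theta3 * a) * beta1 * (a - a_star)
    variance = ((theta2 + theta3 * a) ** 2 * var_beta1
                + beta1 ** 2 * (var_theta2 + 2.0 * cov_theta23 * a + var_theta3 * a ** 2))
    if variance < 0.0:
        raise ValueError("variance term is negative; check the covariance inputs")
    se = abs(a - a_star) * math.sqrt(variance)
    return math.exp(log_or - z * se), math.exp(log_or + z * se)


# Hypothetical estimates and (co)variances (illustrative values only).
cde_low, cde_high = cde_or_ci(theta1=0.40, theta3=0.10, m=2.0, a=1, a_star=0,
                              var_theta1=0.04, cov_theta13=-0.005, var_theta3=0.0025)
nie_low, nie_high = nie_or_ci(theta2=0.50, theta3=0.10, beta1=0.60, a=1, a_star=0,
                              var_beta1=0.01, var_theta2=0.02,
                              cov_theta23=-0.002, var_theta3=0.0025)
print(f"OR_CDE 95% CI: ({cde_low:.3f}, {cde_high:.3f})")
print(f"OR_NIE 95% CI: ({nie_low:.3f}, {nie_high:.3f})")
```

As the text notes, bootstrapping is an alternative way to obtain standard errors for these quantities.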