Biostatistics (2010), 11, 1, pp. 34–47

doi:10.1093/biostatistics/kxp034

Advance Access publication on October 8, 2009

Semiparametric estimation of the average causal effect of

treatment on an outcome measured after a

postrandomization event, with missing outcome data

PETER B. GILBERT∗

Department of Biostatistics, University of Washington, Seattle, WA 98105, USA and Fred Hutchinson

Cancer Research Center, 1100 Fairview Avenue North, Seattle, WA 98109, USA

pgilbert@scharp.org

YUYING JIN

Department of Biostatistics, University of Washington, Seattle, WA 98105, USA

SUMMARY

In the past decade, several principal stratification–based statistical methods have been developed for test-

ing and estimation of a treatment effect on an outcome measured after a postrandomization event. Two

examples are the evaluation of the effect of a cancer treatment on quality of life in subjects who remain

alive and the evaluation of the effect of an HIV vaccine on viral load in subjects who acquire HIV infec-

tion. However, in general the developed methods have not addressed the issue of missing outcome data,

and hence their validity relies on a missing completely at random (MCAR) assumption. Because in many

applications the MCAR assumption is untenable, while a missing at random (MAR) assumption is defen-

sible, we extend the semiparametric likelihood sensitivity analysis approach of Gilbert and others (2003)

and Jemiai and Rotnitzky (2005) to allow the outcome to be MAR. We combine these methods with the

robust likelihood–based method of Little and An (2004) for handling MAR data to provide semiparamet-

ric estimation of the average causal effect of treatment on the outcome. The new method, which does not

require a monotonicity assumption, is evaluated in a simulation study and is applied to data from the first

HIV vaccine efficacy trial.

Keywords: Causal inference; HIV vaccine trial; Missing at random; Posttreatment selection bias; Principal stratifica-

tion; Sensitivity analysis.

1. INTRODUCTION

For randomized placebo-controlled efficacy trials of an HIV vaccine evaluated in HIV-uninfected vol-

unteers, a primary objective is to evaluate the effect of vaccination on the incidence of HIV infection.

Another objective, which was secondary in 2 trials of antibody-based vaccines (Flynn and others, 2005;

Pitisittithum and others, 2006) and co-primary in 2 trials of T cell–based vaccines (Buchbinder and others,

∗To whom correspondence should be addressed.

© The Author 2009. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oxfordjournals.org.

2008), is to evaluate the effect of vaccination on HIV viral load measured after HIV infection. Two causal

approaches of interest for assessing the latter objective are intent-to-treat (ITT) methods that assess the

burden of illness (a composite end point that combines the infection end point and the postinfection end

point) (Chang and others, 1994) and conditional methods that assess the postinfection end point in the

principal stratum of subjects who would be HIV-infected under either treatment assignment (the so-called

“always infected” stratum). As discussed by Gilbert and others (2003) [henceforth GBH] and others, the

2 approaches address different substantive questions. For example, in registrational/licensure trials, the

ITT approach may be most appropriate for the primary analysis and the conditional approach would

be used in secondary analyses of the mechanistic vaccine effect on the postinfection outcome, whereas

in preregistrational test-of-concept trials, the conditional approach might be used for the primary analysis

(Mehrotra and others, 2006).

Here, we develop methodology for the conditional approach to evaluate the causal vaccine effect on

the postinfection end point. For concreteness, we refer to the first postrandomization event as infection

and the outcome of interest measured after this event as viral load. However, the method has general

application to evaluating causal treatment effects on outcomes measured after a postrandomization event,

including studies of quality of life (Rubin, 2000), prostate cancer severity (Shepherd and others, 2008),

kidney disease, and cancer screening (Joffe and others, 2007).

Because the set of trial participants who are in the always infected stratum is unknown, causal treat-

ment effects for this group are not identified from randomized trial data. Two approaches for addressing

the nonidentifiability have been to derive sharp bounds for causal effects (Jemiai and Rotnitzky, 2003;

Hudgens and others, 2003; Zhang and Rubin, 2003) and to estimate causal effects under an additional set

of identifiability assumptions that include models describing the nature and degree of possible selection

bias, with a sensitivity analysis to explore how the inferences vary over a range of the selection models

(GBH; Hayden and others, 2005; Hudgens and Halloran, 2006; Jemiai and Rotnitzky, 2005 (henceforth

JR); Jemiai and others, 2007; Shepherd, Gilbert, Jemiai and others, 2006; Shepherd and others, 2007,

2008). All these methods assume that viral load is observed for all infected subjects. Therefore, the va-

lidity of their inferences depends on a missing completely at random assumption (MCAR). While the

MCAR assumption is often untenable, trials may collect sufficient data on participant characteristics to

make a missing at random (MAR) assumption plausible. This article extends the approach of GBH and

JR to accommodate MAR missingness of the viral load end point.

To illustrate the new method, we focus on the first efficacy trial (Flynn and others, 2005) and define the end point Yp to be the pre-antiretroviral therapy (pre-ART) log10 viral load at the month 12 visit post HIV infection diagnosis. Only subjects who have not started ART by the month 12 visit contribute a value Yp. We exclude viral load values measured after ART initiation because ART strongly affects viral load levels (Gilbert and others, 2003). Inferences about Yp apply to a population where ART is not prescribed during the first 12 months after infection diagnosis.

Of the 368 subjects who became HIV-infected during the trial, 121 had Yp observed, 138 had missing data because they initiated ART prior to the month 12 study visit, and 109 had missing data because they dropped out prior to initiating ART and prior to the visit. Figure 1 shows the VaxGen data on Yp = Y3 in relationship to 2 key covariates: Y1 is early pre-ART square root CD4 cell count, and Y2 is early pre-ART log10 viral load; these variables average all pre-ART values measured within 2 months of infection diagnosis. Early and month 12 viral loads were positively correlated (Figures 1(e) and (f)), with Spearman rank correlation 0.68 (0.55) for the vaccine (placebo) group. There is evidence of a negative correlation between early CD4 cell count and viral load at 12 months (Figures 1(c) and (d)) for the vaccine group (Spearman correlation = −0.33) but not for the placebo group (Spearman correlation = 0.16).

In addition, lower levels of Y1 and higher levels of Y2 were highly predictive of ART initiation

(P values < 0.001 in a multivariate Cox model; Gilbert and others, 2005), showing that MCAR was badly

violated. MAR may be plausible, however, because physicians base decisions to prescribe ART on the

Fig. 1. VaxGen trial data: jittered pre-antiretroviral (pre-ART) viral loads at the month 12 postinfection diagnosis visit (Y3) ((a) and (b)); pre-ART viral loads at the month 12 postinfection diagnosis visit versus square root CD4 cell counts early after infection diagnosis (Y3 versus Y1; (c) and (d)); pre-ART viral loads at the month 12 postinfection diagnosis visit versus early after infection diagnosis (Y3 versus Y2; (e) and (f)).

monitoring of CD4 cell counts, viral loads, HIV-related clinical events, and comorbidities (Hammer and

others, 2008). In fact, MAR missingness due to ART initiation may approximately hold “systematically”

in settings where ART is offered to all infected participants when their biomarkers are observed to cross

prespecified thresholds or they present with prespecified symptomatic illnesses.

Two popular approaches to making valid inferences under MAR are inverse probability weighted

(IPW)–based methods (e.g. Robins and others, 1995) and likelihood-based methods (e.g. Little and

Rubin, 2002). For some HIV vaccine trials, the former methods are expected to be relatively inefficient and

possibly unstable because the estimated weights for some subjects are expected to be near zero. Specifically, for a subject with Yp observed, the weight in the denominator of the estimating equation equals

the estimated probability that the subject did not drop out by the month 12 visit multiplied by the

estimated probability that the subject did not start ART by the visit conditional on not dropping out.

These estimates can be computed as fitted values in regression models based on the subject’s covariates.

The latter conditional probability may be near zero for subjects whose CD4 cell counts drop below the

prespecified threshold at which ART initiation is recommended (currently the recommended threshold

is between 200 and 500 cells/mm³ depending on the country). In fact, for ethical reasons any efficacy

trial will offer ART to all infected participants who meet treatment criteria, such that the more suc-

cessful the treatment coverage the closer some estimated weights in the denominator may be to zero.

Therefore, for some HIV vaccine trials, IPW methods are expected to provide relatively imprecise

inferences.
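To make the instability concrete, the following minimal sketch computes the inverse probability weight for a subject with Yp observed as the reciprocal of the product of the two estimated probabilities described above. The function name `ipw_weight` and the probability values are hypothetical and chosen only for illustration; they are not taken from the trial data.

```python
def ipw_weight(p_no_dropout, p_no_art_given_no_dropout):
    """Inverse probability weight for a subject whose Yp is observed.

    The denominator is the product described in the text:
    Pr(no dropout by month 12) * Pr(no ART by month 12 | no dropout).
    """
    denom = p_no_dropout * p_no_art_given_no_dropout
    if not 0.0 < denom <= 1.0:
        raise ValueError("estimated observation probability must lie in (0, 1]")
    return 1.0 / denom


# Hypothetical fitted probabilities (illustrative assumptions only).
typical = ipw_weight(0.85, 0.60)         # subject unlikely to need ART soon
near_threshold = ipw_weight(0.85, 0.02)  # CD4 count close to the ART threshold

print(round(typical, 2), round(near_threshold, 2))
```

A subject whose conditional probability of remaining ART-free is close to zero receives a weight many times larger than a typical subject, so a handful of such subjects can dominate a weighted estimate.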

While likelihood methods are less subject to the instability problem, they are susceptible to mis-

specification of the model relating Yp to covariates. To partially ameliorate this problem, we use the

robust likelihood–based method of Little and An (2004) (henceforth LA) that is based on penalized

splines of the propensity score. Our approach only handles monotone missing data (i.e. dropout); an

alternative approach would handle nonmonotone missingness using parametric multiple imputation and

Markov chain Monte Carlo, for example, by extending the method of Mogg and Mehrotra (2007) for

analyzing viral load data to address postrandomization selection bias. The article is organized as fol-

lows. Section 2 describes the causal estimand of interest and identifiability assumptions. Section 3 shows

how to combine the methodologies of GBH/JR and LA into a procedure for consistently estimating the

causal estimand under an MAR assumption. Section 4 evaluates the new method in a simulation study.

Section 5 applies the method to the first HIV vaccine efficacy trial, and Section 6 offers concluding

remarks.

2. AVERAGE CAUSAL EFFECT ESTIMAND AND IDENTIFIABILITY ASSUMPTIONS

2.1 Notation and estimand

Let Z be treatment assignment, and let X be a q-vector of baseline covariates fully observed for everyone. Let S be the indicator of the postrandomization event. Subjects experiencing S = 1 are subsequently evaluated at V visits, where variables $Y_1,\ldots,Y_{n_1}$ are collected at visit 1, variables $Y_{n_1+1},\ldots,Y_{n_1+n_2}$ are collected at visit 2, and so on, with variables $Y_{\sum_{i=1}^{V-1} n_i+1},\ldots,Y_p$ collected at visit V, where $p = \sum_{i=1}^{V} n_i$. The entire collection of p variables measured after S = 1 is $Y \equiv (Y_1,\ldots,Y_p)'$, where $Y_p$ is the outcome variable of interest. For j = 1,...,p, let $M_j$ be the indicator of whether $Y_j$ is missing and set $M = (M_1,\ldots,M_p)'$. The variables Y are only meaningful if S = 1; thus, Y and M are undefined if S = 0 and we denote this by Y = M = ∗. For HIV vaccine trials, Z is vaccination assignment (Z = 1, vaccine; Z = 0, placebo) and S is HIV infection diagnosis during the trial.

Each participant has potential infection outcome S(1) if assigned vaccine and S(0) if assigned placebo. For Z = 0,1, the potential outcomes $Y(Z) \equiv (Y_1(Z),Y_2(Z),\ldots,Y_p(Z))'$ and $M(Z) \equiv (M_1(Z),M_2(Z),\ldots,M_p(Z))'$ are defined if S(Z) = 1; otherwise Y(Z) ≡ ∗ and M(Z) ≡ ∗. With $\mu_z \equiv E(Y_p(z) \mid S(0) = S(1) = 1)$ for z = 0,1, the "average causal effect (ACE)" estimand of interest is $\mathrm{ACE} \equiv \mu_1 - \mu_0$. Our goal is to estimate the ACE based on assumptions and the observed i.i.d. data $(Z_i, X_i, S_i, M_i, Y_i)$, i = 1,...,N.

2.2 MAR assumption

For subjects with S = 1, let $Y_{\mathrm{obs}}$ denote the components of Y that are observed and $Y_{\mathrm{mis}}$ denote the components of Y that are missing. Let f be the conditional cumulative distribution function (cdf) of M given Y and S = 1, $f(M \mid Y, S = 1, \nu)$, where ν denotes unknown parameters. MAR states that missingness depends only on the observed values $Y_{\mathrm{obs}}$, that is,

$$f(M \mid Y, S = 1, \nu) = f(M \mid Y_{\mathrm{obs}}, S = 1, \nu) \quad \text{for all } Y_{\mathrm{mis}}, \nu.$$

For simplicity, we develop the methods for a setting where $Y_1,\ldots,Y_{p-1}$ are fully observed in infected subjects (with S = 1) and only $Y_p$ has missing values. In Section 6, we describe how to extend the method to the case of monotone missing data.
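The practical consequence of MAR can be illustrated with a small simulation. The sketch below assumes a single fully observed covariate (a stand-in for early viral load), an outcome that is linear in it, and a missingness probability that depends on the covariate only. All numerical values and function names are hypothetical. The complete-case mean of the outcome is compared with a regression-adjusted mean that uses the covariate.

```python
import math
import random
from statistics import fmean


def simulate_mar(n=20000, seed=1):
    """Simulate (y2, y3, missing) where missingness of y3 depends only on y2."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        y2 = rng.gauss(4.5, 0.8)                           # always observed
        y3 = 4.5 + 0.8 * (y2 - 4.5) + rng.gauss(0.0, 0.5)  # outcome of interest
        # MAR: the missingness probability is a function of observed y2 only.
        p_missing = 1.0 / (1.0 + math.exp(-(-0.2 + 1.5 * (y2 - 4.5))))
        rows.append((y2, y3, rng.random() < p_missing))
    return rows


def complete_case_mean(rows):
    return fmean([y3 for _, y3, missing in rows if not missing])


def regression_adjusted_mean(rows):
    """Fit y3 ~ y2 on complete cases, then average predictions over everyone."""
    obs = [(y2, y3) for y2, y3, missing in rows if not missing]
    mx = fmean([x for x, _ in obs])
    my = fmean([y for _, y in obs])
    sxx = sum((x - mx) ** 2 for x, _ in obs)
    sxy = sum((x - mx) * (y - my) for x, y in obs)
    slope = sxy / sxx
    intercept = my - slope * mx
    return fmean([intercept + slope * y2 for y2, _, _ in rows])


rows = simulate_mar()
full_data_mean = fmean([y3 for _, y3, _ in rows])  # unavailable in practice
cc_mean = complete_case_mean(rows)
adjusted_mean = regression_adjusted_mean(rows)
print(round(full_data_mean, 3), round(cc_mean, 3), round(adjusted_mean, 3))
```

Because subjects with a high covariate value are more likely to be missing, the complete-case mean is pulled downward, whereas the adjusted mean stays close to the full-data mean. The adjustment here is a plain linear regression, not the penalized spline method used in this article.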

2.3 Identifiability assumptions

Throughout, we make the following 3 assumptions.

A1: Stable unit treatment values assumption (Rubin, 1978).

A2: The treatment assignment Z is independent of (X, S(0), S(1), M(0), M(1),Y(0),Y(1)).

A3: For infected subjects (with S = 1), the missing data mechanism for Y is MAR.

All the papers for studying the causal vaccine effect in the always infected stratum cited in Section 1

assume A1 and A2; we refer the reader to these articles for discussion about their justification. As dis-

cussed in Section 1, the MAR assumption A3 may be quite plausible when missingness is mainly due to

ART initiation and ART guidelines are used. If the missing data mechanism is believed to be non-MAR,

then the methodology here may mislead and should be used with caution. An advantageous feature of the

MAR assumption is that investigators can design clinical trials to collect the covariate data that make it

plausible.

We next postulate additional assumptions that identify the ACE and are indexed by fixed sensitivity

parameters. Following JR, we suppose 3 models, which we refer to collectively as A4:

$$g_0(\Pr(S(1) = 1 \mid S(0) = 1, Y_p(0) = y)) = \alpha_0 + \beta_0 y, \tag{2.1}$$

$$g_1(\Pr(S(0) = 1 \mid S(1) = 1, Y_p(1) = y)) = \alpha_1 + \beta_1 y, \tag{2.2}$$

$$\Pr(S(0) = 1 \mid S(1) = 1) = \phi, \tag{2.3}$$

where $g_0$ and $g_1$ are known invertible link functions whose inverses are continuous in $\alpha_0$ and $\alpha_1$, $\alpha_0$ and $\alpha_1$ are unknown parameters to be estimated, and $\beta_0$, $\beta_1$, and φ are known sensitivity parameters that are varied over plausible ranges, where subject matter experts can help define the plausible ranges. With logit links $g_0$ and $g_1$, $\beta_0$ is interpreted as the difference in the log odds of infection in the vaccine group given infection in the placebo group with y versus y − 1 viral load, and $\beta_1$ is interpreted similarly reversing

the role of vaccine and placebo. The parameter φ is interpreted as the probability that a subject infected

in the vaccine group would also be infected in the placebo group. Except for the method of JR, all of the

previously developed methods cited above assume φ = 1 (i.e. monotonicity, that the vaccine does not

increase the risk of infection for any subject), in which case the selection model (2.2) is superfluous and

only model (2.1) is used, and the only sensitivity parameter is β0. In this case, the methods of GBH and

JR are equivalent. For greater applicability of the method, here, we allow for nonmonotonic settings by

considering the trio of models (2.1–2.3).
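The log odds interpretation of the sensitivity parameter under a logit link can be checked numerically. The sketch below evaluates model (2.1) at two viral load values one unit apart and confirms that the difference in log odds equals the slope. The function names and the values of alpha0, beta0, and y are hypothetical and serve only as an illustration.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def prob_infected_under_vaccine(y, alpha0, beta0):
    """Model (2.1) with a logit link: Pr(S(1)=1 | S(0)=1, Yp(0)=y)."""
    return expit(alpha0 + beta0 * y)


alpha0, beta0 = -2.0, 0.7   # hypothetical values for illustration only
y = 4.5                     # hypothetical log10 viral load

log_odds_difference = (logit(prob_infected_under_vaccine(y, alpha0, beta0))
                       - logit(prob_infected_under_vaccine(y - 1.0, alpha0, beta0)))
odds_ratio = math.exp(log_odds_difference)
print(round(log_odds_difference, 6), round(odds_ratio, 3))
```

Setting beta0 = 0 makes the probability constant in y, which corresponds to no selection bias; larger absolute values of beta0 describe stronger dependence of infection under vaccine on the placebo-arm viral load.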

3. ESTIMATION OF THE ACE

3.1 Estimation of the ACE for complete data

For the case that $Y_p$ is measured from all subjects with S = 1, JR proved that A1, A2, and A4 identify the ACE and developed unbiased estimating equations that can be solved to obtain a consistent asymptotically normal estimator of the ACE. Shepherd and others (2008) summarized this result together with a computational procedure for solving the equation and for obtaining the consistent sandwich variance estimator. With $\theta \equiv (p_0, p_1, \alpha_0, \alpha_1, \mu_0, \mu_1)'$, JR's estimating equation $\sum_{i=1}^{N} U_i(\theta) = 0$ is detailed using our notation in Appendix A ([A.1]–[A.6]) in the supplementary material available at Biostatistics online. JR's estimating equation was derived based on the following relationships, which form the basis for integrating the LA and JR methods:

$$E[g_0^{-1}(Y_p; \alpha_0, \beta_0) \mid S = 1, Z = 0] = \phi \frac{p_1}{p_0}, \tag{3.1}$$

$$E[g_1^{-1}(Y_p; \alpha_1, \beta_1) \mid S = 1, Z = 1] = \phi, \tag{3.2}$$

$$E[Y_p\, g_0^{-1}(Y_p; \alpha_0, \beta_0) \mid S = 1, Z = 0] = \phi \frac{p_1}{p_0} \mu_0, \tag{3.3}$$

$$E[Y_p\, g_1^{-1}(Y_p; \alpha_1, \beta_1) \mid S = 1, Z = 1] = \phi \mu_1. \tag{3.4}$$
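Dividing relationship (3.3) by (3.1) gives the mean of the outcome among the always infected under placebo, whatever the value of the right-hand side of (3.1). The sketch below checks this by simulation for the placebo arm under an inverse probit link. It assumes a hypothetical normal distribution for the placebo-arm outcome and hypothetical values of alpha0 and beta0, and it generates the potential infection status under vaccine directly from model (2.1), which is possible only in a simulation.

```python
import random
from statistics import NormalDist, fmean

PHI = NormalDist().cdf  # standard normal cdf, used as the inverse link


def simulate_placebo_infected(n=50000, alpha0=-3.5, beta0=1.0, seed=7):
    """Draw (Yp(0), S(1)) for subjects with S(0) = 1 under model (2.1)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        y = rng.gauss(4.5, 0.7)                      # hypothetical Yp(0)
        s1 = rng.random() < PHI(alpha0 + beta0 * y)  # infected under vaccine too
        data.append((y, s1))
    return data


alpha0, beta0 = -3.5, 1.0
data = simulate_placebo_infected(alpha0=alpha0, beta0=beta0)
weights = [PHI(alpha0 + beta0 * y) for y, _ in data]

lhs_31 = fmean(weights)                                      # left side of (3.1)
lhs_33 = fmean([y * w for (y, _), w in zip(data, weights)])  # left side of (3.3)
mu0_from_relations = lhs_33 / lhs_31

# Direct mean among the always infected (unobservable in a real trial).
always_infected = [y for y, s1 in data if s1]
mu0_direct = fmean(always_infected)
share_always_infected = len(always_infected) / len(data)
print(round(mu0_from_relations, 3), round(mu0_direct, 3))
```

The two estimates of the placebo-arm mean agree up to Monte Carlo error, and the average weight approximates the share of placebo-infected subjects who would also be infected under vaccine.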

3.2 Estimation of the ACE for MAR data

We show how to augment JR's estimating equations to handle missing data via LA's method. Following Section 5 of LA, we suppose $q + p \geq 3$ so that there are at least 2 covariates. For infected subjects, separately for groups Z = 0 and Z = 1, define the logit of the propensity score for $Y_p$ to be observed, given the covariates $X, Y_1,\ldots,Y_{p-1}$:

$$Y^*_1 \equiv \mathrm{logit}(\Pr(M_p = 0 \mid X, Y_1,\ldots,Y_{p-1}, S = 1, Z = z)). \tag{3.5}$$

LA observed that, conditional on the propensity score and assuming MAR, the missingness of $Y_p$ does not depend on $Z, X, Y_1,\ldots,Y_{p-1}$ (Rosenbaum and Rubin, 1983). Consequently, each of the expectations in (3.1–3.4) can be written as

$$
\begin{aligned}
E[h(Y_p) \mid S = 1, Z = z] ={}& E[(1 - M_p)\,h(Y_p) \mid S = 1, Z = z] \\
&+ E[M_p \times E(h(Y_p) \mid Y^*_1, X_2,\ldots,X_q, Y_1,\ldots,Y_{p-1}, S = 1, Z = z) \mid S = 1, Z = z],
\end{aligned}
\tag{3.6}
$$

where $h(Y_p)$ denotes a function of $Y_p$. Therefore, JR's estimating function $U_i(\theta)$ can be modified to accommodate MAR missingness, where the modified function $U^M_i(\theta)$ uses (3.6) instead of (3.1)–(3.4). The estimating function $U^M_i(\theta)$ is detailed in Appendix A ([A.7]–[A.12]) in the supplementary material available at Biostatistics online. The function $U^M_i(\theta)$ includes predicted values $\hat{E}_i[h(Y_p)]$ for $h(Y_p) = g_z^{-1}(Y_p; \alpha_z, \beta_z)$ or $Y_p\, g_z^{-1}(Y_p; \alpha_z, \beta_z)$, for z = 0,1, and $U^M_i(\theta)$ becomes fully specified once LA's technique is applied to obtain the predicted values $\hat{E}_i[h(Y_p)]$.

3.3 Calculation of the fitted values

We use LA's (Section 5) penalized spline propensity prediction method to predict $Y_p$ for infected subjects with missing data. First, a logistic regression model is fit relating $M_p$ to $(X_1,\ldots,X_q,Y_1,\ldots,Y_{p-1})'$, which yields estimated propensities $\hat{Y}^*_1$ for all subjects with S = 1. Second, a spline regression model of $Y_p$ on $\hat{Y}^*_1$ is fit using subjects for whom $Y_p$ is observed. Beyond $\hat{Y}^*_1$, additional covariates that predict $Y_p$ may be entered into the regression model parametrically. From this fitted model, the value $E[Y_p \mid Y^*_1, X_2,\ldots,X_q,Y_1,\ldots,Y_{p-1}, S = 1, Z = z]$ is predicted for each subject with $S M_p = 1$ using his or her estimated propensity $\hat{Y}^*_1$ and other covariates. Third, as described below, fitted values $\hat{E}_i[g_z^{-1}(Y_p; \alpha_z, \beta_z)]$ and $\hat{E}_i[Y_p\, g_z^{-1}(Y_p; \alpha_z, \beta_z)]$ are computed using these regression fits and either numerical integration (for a general link function) or a closed-form equation (for an inverse probit link function). These steps may be done in the vaccine and placebo groups separately.

Applied to our setting, LA's specific propensity spline prediction model replaces one of the predictor variables, say $X_1$, by $Y^*_1$ and supposes

$$
\begin{aligned}
(X_2,\ldots,X_q,Y_1,\ldots,Y_{p-1} \mid Y^*_1, S = 1, Z = z) &\sim N\big((s^X_{z2}(Y^*_1),\ldots,s^X_{zq}(Y^*_1), s^Y_{z1}(Y^*_1),\ldots,s^Y_{z,p-1}(Y^*_1)), \Sigma_z\big), \\
(Y_p \mid Y^*_1, X_2,\ldots,X_q,Y_1,\ldots,Y_{p-1}, S = 1, Z = z, \gamma_z) &\sim N\big(s^Y_{zp}(Y^*_1) + r_z(X^*_1,\ldots,X^*_q,Y^*_1,\ldots,Y^*_{p-1},\gamma_z), \sigma^2_z\big),
\end{aligned}
\tag{3.7}
$$

for z = 0,1, where for j = 2,...,q, $s^X_{zj}(Y^*_1) = E(X_j \mid Y^*_1, S = 1, Z = z)$ is a spline for the regression of $X_j$ on $Y^*_1$, where $X^*_j = X_j - s^X_{zj}(Y^*_1)$; and for j = 1,...,p − 1, $s^Y_{zj}(Y^*_1) = E(Y_j \mid Y^*_1, S = 1, Z = z)$ is a spline for the regression of $Y_j$ on $Y^*_1$, where $Y^*_j = Y_j - s^Y_{zj}(Y^*_1)$. Furthermore, for z = 0,1, $r_z$ is a parametric function with unknown parameter vector $\gamma_z$ that satisfies $r_z(Y^*_1,0,\ldots,0,\gamma_z) = 0$ for all $\gamma_z$. For subject i in group z with $S_i M_{pi} = 1$, the predicted value of $Y_p$ is obtained as

$$\hat{E}_i[Y_p] = \hat{s}^Y_{zp}(\hat{y}^*_{i1}) + r_z(\hat{x}^*_{i1},\ldots,\hat{x}^*_{iq},\hat{y}^*_{i1},\ldots,\hat{y}^*_{i(p-1)}; \hat{\gamma}_z),$$

where $\hat{x}^*_{ij} = x_{ij} - \hat{s}^X_{zj}(\hat{x}^*_{i1})$, $\hat{y}^*_{ij} = y_{ij} - \hat{s}^Y_{zj}(\hat{y}^*_{i1})$, $\hat{s}^X_{zj}$ denotes the sample estimate of the spline $s^X_{zj}$, $\hat{s}^Y_{zj}$ denotes the sample estimate of the spline $s^Y_{zj}$, $x_{ij}$ is the realization of $X_{ij}$, and $y_{ij}$ is the realization of $Y_{ij}$. The model (3.7) enters $X^*_j$ and $Y^*_j$ as covariates; in follow-up work, Zhang and Little (2005) showed that the LA method, and hence our sensitivity analysis method, remains valid if the $X^*_j, Y^*_j$ are replaced with the $X_j, Y_j$.

Next, using (3.7), for a general link function, we compute the fitted values $\hat{E}_i[g_z^{-1}(Y_p; \alpha_z, \beta_z)]$ as

$$\int g_z^{-1}(y; \alpha_z, \beta_z)\, \frac{1}{\hat{\sigma}_z}\, d\Phi\big([y - \hat{E}_i[Y_p]]/\hat{\sigma}_z\big),$$

where Φ is the cdf of the standard normal distribution. The predicted value $\hat{E}_i[Y_p\, g_z^{-1}(Y_p; \alpha_z, \beta_z)]$ is also computed using numerical integration. For the case that $g_z$ is the inverse probit function, numerical integration is not needed and the fitted values $\hat{E}_i[g_z^{-1}(Y_p; \alpha_z, \beta_z)]$ and $\hat{E}_i[Y_p\, g_z^{-1}(Y_p; \alpha_z, \beta_z)]$ can instead be computed as simple functions of $\hat{E}_i[Y_p]$. This procedure is described in Appendix A in the supplementary material available at Biostatistics online.
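The two routes to the fitted values can be compared in a short sketch. It reads the fitted values as expectations of the weight function under a normal distribution centered at the predicted mean with the residual standard deviation, which is our interpretation of the integral above. The closed forms used for the standard normal cdf are standard normal-integral identities and may differ in form from expressions (A.14) and (A.15) of the supplementary appendix, which is not reproduced here. All function names and numerical inputs are hypothetical.

```python
import math
from statistics import NormalDist

STD = NormalDist()


def normal_expectation(h, mean, sd, half_width=8.0, n=4000):
    """Trapezoid-rule approximation of E[h(Y)] for Y ~ N(mean, sd^2)."""
    dist = NormalDist(mean, sd)
    lo, hi = mean - half_width * sd, mean + half_width * sd
    step = (hi - lo) / n
    total = 0.0
    for k in range(n + 1):
        y = lo + k * step
        weight = 0.5 if k in (0, n) else 1.0  # end points get half weight
        total += weight * h(y) * dist.pdf(y)
    return total * step


def probit_closed_form(alpha, beta, mean, sd):
    """E[Phi(alpha + beta*Y)] and E[Y*Phi(alpha + beta*Y)] for Y ~ N(mean, sd^2)."""
    scale = math.sqrt(1.0 + (beta * sd) ** 2)
    c = (alpha + beta * mean) / scale
    e_w = STD.cdf(c)
    e_yw = mean * e_w + beta * sd ** 2 / scale * STD.pdf(c)
    return e_w, e_yw


alpha, beta = -3.5, 1.0          # hypothetical alpha_z, beta_z
pred_mean, resid_sd = 4.2, 0.6   # hypothetical fitted mean and residual sd

num_w = normal_expectation(lambda y: STD.cdf(alpha + beta * y), pred_mean, resid_sd)
num_yw = normal_expectation(lambda y: y * STD.cdf(alpha + beta * y), pred_mean, resid_sd)
closed_w, closed_yw = probit_closed_form(alpha, beta, pred_mean, resid_sd)

# A logit inverse link has no such closed form, so only the numerical route applies.
logit_w = normal_expectation(lambda y: 1.0 / (1.0 + math.exp(-(alpha + beta * y))),
                             pred_mean, resid_sd)
print(round(num_w, 6), round(closed_w, 6), round(logit_w, 6))
```

The numerical and closed-form values agree to several decimal places when the inverse link is the standard normal cdf, which is why that choice avoids integration inside the estimating equations.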

3.4 Computational algorithm for estimating the ACE

The estimate $\widehat{\mathrm{ACE}} = \hat{\mu}_1 - \hat{\mu}_0$ is computed by solving $\sum_{i=1}^{N} U^M_i(\theta) = 0$ ($U^M_i(\theta)$ is defined in Appendix A [A.7]–[A.12] in the supplementary material available at Biostatistics online) with the following steps.

Step 1: Estimate $p_0 \equiv \Pr(S(0) = 1)$ by solving $\sum_{i=1}^{N} U^M_{1i}(p_0) = 0$, and estimate $p_1 \equiv \Pr(S(1) = 1)$ by solving $\sum_{i=1}^{N} U^M_{2i}(p_1) = 0$.

Step 2: Plug $\hat{p}_0$ and the fitted values $\hat{E}_i[g_0^{-1}(Y_p; \alpha_0, \beta_0)]$ into $U^M_{3i}(\theta)$ and solve for $\alpha_0$ in $\sum_{i=1}^{N} U^M_{3i}(\alpha_0) = 0$ with a 1D line search. Similarly, plug $\hat{p}_1$ and the fitted values $\hat{E}_i[g_1^{-1}(Y_p; \alpha_1, \beta_1)]$ into $U^M_{4i}(\theta)$ and solve for $\alpha_1$.

Step 3: Plug the estimates of $p_0$ and $\alpha_0$ and the fitted values $\hat{E}_i[Y_p\, g_0^{-1}(Y_p; \alpha_0, \beta_0)]$ into $U^M_{5i}(\theta)$ and solve $\sum_{i=1}^{N} U^M_{5i}(\mu_0) = 0$ for $\mu_0$. Similarly, solve $\sum_{i=1}^{N} U^M_{6i}(\mu_1) = 0$ for $\mu_1$.

Under the monotonicity assumption (i.e. φ = 1), the computational algorithm is the same except that the second part of Step 2 is omitted ($\alpha_1$ is no longer relevant), and $U^M_{6i}(\theta)$ simplifies (given in expression (A.13) of Appendix A in the supplementary material available at Biostatistics online). If $g_z$ is inverse probit, then Steps 2 and 3 use closed-form formulas (A.14) and (A.15) in Appendix A in the supplementary material available at Biostatistics online; otherwise they use numerical integration as described in Section 3.3.
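The three steps can be sketched for the simplest case: monotonicity (φ = 1) and no missing outcomes, so that observed outcomes stand in for the fitted values. This is a simplified illustration built from relationships (3.1) and (3.3), not the estimating functions of the supplementary appendix. The function names, the bisection bounds, the restriction to an estimated infection ratio below 1, and the toy data are all assumptions made for the example.

```python
from statistics import NormalDist, fmean

PHI = NormalDist().cdf


def solve_alpha0(y0, beta0, target, lo=-60.0, hi=60.0, tol=1e-10):
    """Bisection search for alpha0 with mean(Phi(alpha0 + beta0*y)) = target."""
    if not 0.0 < target < 1.0:
        raise ValueError("this sketch needs 0 < p1/p0 < 1")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if fmean([PHI(mid + beta0 * y) for y in y0]) > target:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def ace_monotone_complete(n0, n1, y0, y1, beta0):
    """Steps 1-3 with phi = 1 and no missing Yp (a simplification of the text).

    n0, n1: numbers randomized to placebo and vaccine;
    y0, y1: outcomes Yp of the infected subjects in each arm.
    """
    p0_hat = len(y0) / n0                                     # Step 1
    p1_hat = len(y1) / n1
    alpha0_hat = solve_alpha0(y0, beta0, p1_hat / p0_hat)     # Step 2
    w = [PHI(alpha0_hat + beta0 * y) for y in y0]
    mu0_hat = sum(wi * yi for wi, yi in zip(w, y0)) / sum(w)  # Step 3
    mu1_hat = fmean(y1)  # with phi = 1 every infected vaccinee is always infected
    return mu1_hat - mu0_hat


# Hypothetical toy data, not from any trial.
y0 = [3.1, 3.8, 4.0, 4.4, 4.6, 4.9, 5.2, 5.5]
y1 = [3.9, 4.1, 4.3, 4.8, 5.0, 5.1]
ace_no_selection = ace_monotone_complete(100, 100, y0, y1, beta0=0.0)
ace_with_selection = ace_monotone_complete(100, 100, y0, y1, beta0=1.0)
print(round(ace_no_selection, 4), round(ace_with_selection, 4))
```

With beta0 = 0 the weights are constant and the estimate reduces to the difference in complete-case means; a positive beta0 shifts weight toward placebo subjects with higher outcomes and lowers the estimated ACE. Repeating the call over a grid of beta0 values gives a basic sensitivity analysis.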

3.5 Standard errors and CIs

Deriving asymptotic-based standard error estimators for $\widehat{\mathrm{ACE}}$ and confidence intervals (CIs) for ACE is difficult, given the smoothing and tuning parameter selection involved with the penalized splines. We develop a bootstrap approach. As LA developed neither CIs nor standard error estimators, the performance of this approach is of interest even for the setting without a postrandomization event.

Within each treatment group Z = z separately, B bootstrap data sets are constructed by sampling with replacement $N_z$ realizations of $(X_i, S_i, M_i, Y_i) \mid Z_i = z$, z = 0,1. The estimation procedure described in Section 3.4 is carried out for each bootstrap data set. Then, standard errors for $\hat{\mu}_0$, $\hat{\mu}_1$, and $\widehat{\mathrm{ACE}}$ are estimated by the sample standard deviations of the bootstrap estimates, and (1 − α) × 100% CIs are obtained as the α/2 and 1 − α/2 percentiles of the bootstrap estimates.
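The resampling scheme can be sketched as follows. Each treatment arm is resampled separately with replacement, an estimator is applied to every bootstrap data set, and the standard error and percentile limits are read off the resulting estimates. The function names are hypothetical, a difference in means stands in for the estimation procedure of Section 3.4, the percentile index rule is one of several common conventions, and the data are invented.

```python
import math
import random
from statistics import fmean, stdev


def stratified_bootstrap(group0, group1, estimator, B=1000, alpha=0.05, seed=0):
    """Resample each arm separately; return (standard error, (lower, upper))."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(B):
        boot0 = rng.choices(group0, k=len(group0))  # N_0 draws with replacement
        boot1 = rng.choices(group1, k=len(group1))  # N_1 draws with replacement
        estimates.append(estimator(boot0, boot1))
    estimates.sort()
    lower = estimates[math.floor((alpha / 2) * (B - 1))]
    upper = estimates[math.ceil((1 - alpha / 2) * (B - 1))]
    return stdev(estimates), (lower, upper)


def difference_in_means(group0, group1):
    # Stand-in estimator; the procedure of Section 3.4 would be plugged in here.
    return fmean(group1) - fmean(group0)


placebo = [3.1, 3.8, 4.0, 4.4, 4.6, 4.9, 5.2, 5.5]  # hypothetical values
vaccine = [3.9, 4.1, 4.3, 4.8, 5.0, 5.1]            # hypothetical values
se, (ci_low, ci_high) = stratified_bootstrap(placebo, vaccine, difference_in_means)
print(round(se, 3), round(ci_low, 3), round(ci_high, 3))
```

In an application each list element would be a full subject record, including uninfected subjects, so that the infection probabilities are re-estimated within every bootstrap data set.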

3.6 Robustness of the ACE estimator

LA's Theorem 1 states a double robustness–type property of the propensity spline prediction procedure; see Kang and Schafer (2007) and the comments within for further discussion. In our setting, this robustness

property can be stated as follows.

THEOREM 3.1 Assume A1–A4, and let $\hat{\mu}_0$ and $\hat{\mu}_1$ be the estimators of $\mu_0$ and $\mu_1$, respectively, obtained by solving $\sum_{i=1}^{N} U^M_i(\theta) = 0$. Then, for z = 0,1, $\hat{\mu}_z$ is a consistent estimator of $\mu_z$ if either (i) the mean of $Y_p$ conditional on $(Y^*_1, X_2,\ldots,X_q,Y_1,\ldots,Y_{p-1}, S = 1, Z)$ in (3.7) is correctly specified or (ii) the propensity $Y^*_1$ is correctly specified, and (iii) $E[X^*_j \mid Y^*_1, S = 1, Z = z] = s^X_{zj}(Y^*_1)$ for j = 2,...,q and $E[Y^*_j \mid Y^*_1, S = 1, Z = z] = s^Y_{zj}(Y^*_1)$ for j = 1,...,p.

LA comment that the robustness property of (iii) is that $r_z$ does not need to be correctly specified and suggest that (iii) is a weak assumption because of the flexibility of the spline regression models.

4. SIMULATION STUDY

We briefly describe the simulation study of an HIV vaccine trial, with expanded details in Appendix B in

the supplementary material available at Biostatistics online.

4.1 Design of simulation study

Four simulation experiments were carried out such that A1–A4 hold with φ = 1 (i.e. monotonicity holds) and $g_0^{-1}(y; \alpha_0, \beta_0) = \Phi(\alpha_0 + \beta_0 y)$. We created moderate or extreme selection bias specified by β0 = 1 or β0 = 3. Based on the VaxGen trial, we consider the 3 variables Y1, Y2, and Y3 defined in Section 1 (the outcome Y3 is month 12 pre-ART log10 viral load; Y1 is early pre-ART square root CD4 cell count; Y2 is early pre-ART log10 viral load).

Each simulation generated data with an even chance of assignment to vaccine or placebo, an overall placebo infection rate of 25%, and Y(0) = (Y1(0), Y2(0), Y3(0))′ multivariate normal, with means and variances equal to the sample estimates obtained in the VaxGen trial (Gilbert and others, 2005). The correlations were set to cor(Y1(0), Y2(0)) = cor(Y1(0), Y3(0)) = −0.5 and cor(Y2(0), Y3(0)) = 0.8. For infected vaccine group subjects, Y(1) was set to Y(0) + (0, 0, −0.5)′, which specifies the true ACE = −0.5. We assumed complete data for Y1 and Y2 in infected subjects and created missing data for Y3 that was either MCAR or MAR. In either case, Y3 is missing in about 50% of infected subjects. The MAR scenario generated missingness indicators from a logistic regression model of M3 on Y1 and Y2 with coefficients chosen from a fit to the VaxGen data.
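The correlation structure of this design can be reproduced with a short sketch that draws trivariate normal vectors through a Cholesky factor of the target correlation matrix and then applies the additive shift for the vaccine arm. The means and standard deviations below are hypothetical placeholders, since the VaxGen sample estimates are not listed here, and the function names are illustrative.

```python
import math
import random

# Target correlations from the simulation design: (Y1, Y2), (Y1, Y3), (Y2, Y3).
CORR = [[1.0, -0.5, -0.5],
        [-0.5, 1.0, 0.8],
        [-0.5, 0.8, 1.0]]


def cholesky(a):
    """Lower-triangular L with L L' = a (a must be positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def draw_y0(n, means, sds, seed=11):
    """Draw n vectors Y(0) with the target correlations and given means/sds."""
    rng = random.Random(seed)
    low = cholesky(CORR)
    out = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(3)]
        std = [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(3)]
        out.append([m + s * v for m, s, v in zip(means, sds, std)])
    return out


def column(data, j):
    return [row[j] for row in data]


def sample_corr(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical means and sds; the paper uses VaxGen sample estimates instead.
y0 = draw_y0(20000, means=[22.0, 4.5, 4.3], sds=[4.0, 0.8, 0.8])
y1 = [[a, b, c - 0.5] for a, b, c in y0]  # Y(1) = Y(0) + (0, 0, -0.5)'
r12 = sample_corr(column(y0, 0), column(y0, 1))
r23 = sample_corr(column(y0, 1), column(y0, 2))
print(round(r12, 3), round(r23, 3))
```

Missingness indicators for the third component could then be drawn from a logistic model in the first two components to obtain the MAR scenario, or from a constant probability to obtain MCAR.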

Four methods were used to estimate the ACE with a 95% bootstrap percentile CI. The first was ordinary least squares complete case (OLSCC), which simply compares sample averages of Y3 between groups in all infected subjects that have Y3 measured. The next 3 methods are causal: before deletion (BD), which applies GBH/JR to Y3 using the complete data (an unknowable gold standard for comparison); CC, a complete case analysis that applies GBH/JR; and SPPL, the new method with penalized spline propensity prediction based on a regression of Y3 on the spline of $Y^*_1$, where $Y^*_1$ is the linear predictor of the estimated propensity to observe Y3 from a linear logistic regression of M3 on Y1 and Y2. This SPPL method uses 15 equally spaced fixed knots and a truncated linear basis and uses model (3.7) with a linear additive model for $Y^*_2$: $r_z(Y^*_1, Y^*_2, \gamma_z) = \gamma_{z1} Y^*_1 + \gamma_{z2} Y^*_2$, for z = 0,1. Note that the OLSCC method is equivalent to the GBH/JR method for the special case that φ = 1 and β0 = β1 = 0.
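A truncated linear basis with equally spaced fixed knots can be written down in a few lines. The sketch below builds one row of the spline design matrix for a given value of the estimated propensity logit. Placing the 15 knots strictly inside the range, and the range itself, are assumptions made for the example; the penalized fit, which shrinks the knot coefficients, is not shown.

```python
def equally_spaced_knots(lo, hi, n_knots=15):
    """Interior knots that split [lo, hi] into n_knots + 1 equal pieces."""
    step = (hi - lo) / (n_knots + 1)
    return [lo + step * (k + 1) for k in range(n_knots)]


def truncated_linear_basis(x, knots):
    """Row of the design matrix: 1, x, (x - k_1)_+, ..., (x - k_K)_+."""
    return [1.0, x] + [max(0.0, x - k) for k in knots]


# Hypothetical range for the estimated propensity logit.
knots = equally_spaced_knots(-4.0, 4.0)
row = truncated_linear_basis(0.3, knots)
print(len(knots), len(row))
```

Each knot term is zero to the left of its knot and linear to the right, so the fitted curve is piecewise linear with a possible change of slope at every knot.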

4.2 Simulation results

Table 1 shows bias and root mean squared error (RMSE) of the estimated ACE and the coverage probabil-

ity of the nonparametric bootstrap CIs. In all cases, the OLSCC method is badly biased because it ignores

Table 1. Bias, RMSE and 95% coverage probability for the OLSCC, BD, CC, and SPPL methods (true ACE = −0.5)

                        OLSCC               BD                CC               SPPL
                    Bias     RMSE      Bias     RMSE      Bias     RMSE      Bias     RMSE
MCAR data
  β0 = 1
    ACE            0.426    0.464    −0.002    0.147    −0.020    0.214    −0.002    0.169
    ACE coverage       27.6%             91.6%             92.8%             93.4%
  β0 = 3
    ACE            0.608    0.627     0.001    0.152    −0.037    0.213     0.003    0.168
    ACE coverage        2.2%             91.8%             93.4%             94.6%
MAR data
  β0 = 1
    ACE            0.343    0.372    −0.005    0.131    −0.025    0.177     0.002    0.154
    ACE coverage       37.6%             94.8%             97.4%             96.8%
  β0 = 3
    ACE            0.464    0.482    −0.004    0.135    −0.050    0.194    −0.001    0.156
    ACE coverage        6.8%             95.6%             94%               95.6%

NOTE: BD = GBH/JR applied to the complete data before deletion (unattainable gold standard); CC = GBH/JR applied to complete cases; SPPL = spline propensity prediction with a linear predictor.

Fig. 2. Simulation study: boxplots of nonparametric bootstrap percentile 95% CI lengths for the ACE.

the postrandomization selection bias. As expected, the CC method is unbiased under MCAR missing-

ness but biased under MAR missingness, whereas the new SPPL method is unbiased under both MCAR

and MAR data. Corresponding to this, the CIs about the ACE for the CC method have too-low coverage

probability under MAR data, whereas the CIs for the SPPL method are always near the correct level. The

RMSEs for the SPPL method are 21.1% lower on average than those of the CC method, showing that the

new method provides substantially better estimators than the existing methods.
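The summary measures reported for the simulation study can be computed from the Monte Carlo replicates as sketched below. The function name and the three replicates are hypothetical and serve only to show the arithmetic.

```python
import math
from statistics import fmean


def performance_summary(estimates, intervals, truth):
    """Bias, RMSE and CI coverage over Monte Carlo replicates."""
    bias = fmean(estimates) - truth
    rmse = math.sqrt(fmean([(e - truth) ** 2 for e in estimates]))
    coverage = fmean([1.0 if lo <= truth <= hi else 0.0 for lo, hi in intervals])
    return bias, rmse, coverage


# Three hypothetical replicates with true ACE = -0.5.
bias, rmse, coverage = performance_summary(
    estimates=[-0.5, -0.4, -0.6],
    intervals=[(-0.7, -0.3), (-0.45, -0.1), (-0.9, -0.5)],
    truth=-0.5,
)
print(round(bias, 4), round(rmse, 4), round(coverage, 4))
```

RMSE combines bias and variability, so an estimator can have negligible bias and still have a larger RMSE than a competitor if it is less precise.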

Figure 2 shows distributions of the CI lengths, demonstrating that the SPPL method provides much more precise estimates than the CC method. Furthermore, under MAR missingness the bootstrap variance estimates of $\widehat{\mathrm{ACE}}$ had a similar distribution as the Monte Carlo sample variances of the $\widehat{\mathrm{ACE}}$'s for the SPPL method but not for the CC method, indicating that the bootstrap variance estimator for the SPPL method is unbiased (not shown). Additional simulations were conducted under the above design except that Y(1) was set to Y(0) + (0, −0.5, −0.5)′. The results were similar, demonstrating that the new method performs well when there is an average causal effect on a postinfection covariate used to predict the outcome of interest.

5. EXAMPLE

We now apply the newly developed SPPL method to the VaxGen data. The same method as evaluated

in the simulations was applied, except with 3000 bootstrap replications. Our outcome of interest Y3 and covariates Y1 and Y2 are the same as considered above. Of the 368 subjects who became HIV-infected, there is nearly complete data for Y1 (n = 334) and Y2 (n = 341), and, as discussed above, one-third of the infected subjects have Y3 observed (n = 121) (Figure 1). The R² value of a linear regression of Y3 on Y1 and Y2 is 0.40, suggesting that accounting for the predictive information in Y1 and Y2 with the new

method is expected to provide greater efficiency than the CC method that does not leverage this positive

correlation.

Because the estimated incidence of HIV infection was slightly lower in the vaccine group than the

placebo group (Flynn and others, 2005), the data support (but of course do not prove) the monotonicity

assumption that φ = 1. Accordingly, we first conduct a sensitivity analysis assuming φ = 1. For the sensitivity parameter β0 ranging from −2 to 2, which in our opinion covers the plausible range for potential

selection bias, Figure 3 shows point and 95% bootstrap CI estimates of the ACE by the CC method and

by the new method. Both analyses support no causal vaccine effect on viral load, and the new method

provides slightly narrower CIs.

Acknowledging that monotonicity may not hold, we next performed a sensitivity analysis setting

φ = 0.8. This value was chosen based on the fact that the upper 95% confidence limit for the hazard ratio

(vaccine/placebo) was 1.17. A value φ = 0.8 assumes that a vaccinated subject who becomes infected

would have a 20% chance of avoiding infection had he/she been assigned placebo, which approximately

specifies a 17% plausible elevation of infection risk in the vaccine group. The shaded regions in Figure 4

show values of (β0,β1) under which a 95% CI for the ACE excludes 0. Figure 4 shows that only under

conditions of large selection bias in certain directions are the data consistent with a beneficial or harmful

effect. We conclude that the data are consistent with the null hypothesis of ACE = 0.
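The link between φ = 0.8 and the hazard ratio limit of 1.17 can be read through a simple bound. The sketch below assumes that φ denotes Pr(S(0) = 1 | S(1) = 1), as the 20% interpretation above suggests, and it works with cumulative infection risks rather than hazards, which is an approximation.

```latex
\begin{align*}
\Pr(S(0)=1) &\ge \Pr(S(0)=1,\ S(1)=1) = \phi\,\Pr(S(1)=1), \\
\frac{\Pr(S(1)=1)}{\Pr(S(0)=1)} &\le \frac{1}{\phi} = \frac{1}{0.8} = 1.25 .
\end{align*}
```

Under these assumptions, φ = 0.8 permits a vaccine-to-placebo risk ratio of up to 1.25, which covers the upper confidence limit of 1.17. The bound is attained only if no placebo-arm infection would have been prevented by vaccination.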

Fig. 3. Estimated ACE and 95% CIs with φ = 1. Solid lines are for the SPPL method and dashed lines are for the CC method.

Fig. 4. (a) Estimated ACE, (b) lower 95% confidence limits, and (c) upper 95% confidence limits with φ = 0.8. Solid lines are for the SPPL method and dashed lines are for the CC method.

We conducted the example using an inverse probit link function $g_z$. With this link, interpreting the sensitivity parameters $\beta_z$ requires thinking in the “Z-metric” (normal quantile metric), where a one-unit increase in $Y_p$ leads to an increase in the probit score $\Phi^{-1}(\Pr(S(1) = 1 \mid S(0) = 1, Y_p(0) = y))$ by $\beta_z$ standard deviations. For those uncomfortable thinking in the Z-metric, it may be preferable to use an alternative link function that they find more interpretable, for example, logit. Furthermore, it is good practice in sensitivity analysis to repeat the analysis for multiple link functions.
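The difference in interpretation between the two links can be checked numerically. The sketch below assumes the selection probability has the form g(α + βy); the values of `alpha` and `beta` are hypothetical and chosen only for illustration.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def inverse_probit(x):
    """Standard normal CDF evaluated at x."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def inverse_logit(x):
    """Logistic CDF evaluated at x."""
    return 1.0 / (1.0 + math.exp(-x))


def logit(p):
    """Log odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


# Hypothetical intercept and sensitivity parameter (illustration only).
alpha, beta = -0.5, 1.0

rows = []
for y in (0.0, 1.0, 2.0):
    linear_predictor = alpha + beta * y
    rows.append((y, inverse_probit(linear_predictor), inverse_logit(linear_predictor)))

# Probit link: a one-unit increase in y shifts the normal quantile by beta.
probit_step = STD_NORMAL.inv_cdf(rows[1][1]) - STD_NORMAL.inv_cdf(rows[0][1])

# Logit link: a one-unit increase in y shifts the log odds by beta,
# so the odds are multiplied by exp(beta).
logit_step = logit(rows[1][2]) - logit(rows[0][2])
odds_ratio = math.exp(logit_step)

for y, p_probit, p_logit in rows:
    print(f"y={y:.0f}  probit link: {p_probit:.3f}  logit link: {p_logit:.3f}")
print(f"probit step={probit_step:.6f}  logit step={logit_step:.6f}  odds ratio={odds_ratio:.4f}")
```

With the logit link, β is a log odds ratio per unit of the outcome, which some readers may find easier to elicit than a shift in normal quantiles.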

6. CONCLUDING REMARKS

Recently developed methods for estimating causal treatment effects on an outcome measured after a postrandomization event, including the methods of GBH and JR, are valid under an MCAR assumption. We have extended the GBH and JR methods to allow for MAR missingness by augmenting them with LA’s robust likelihood approach, which uses penalized spline propensity prediction. The simulation study showed

that the new method corrects for the bias of the existing methods and improves precision by incorporating

covariates that predict the outcome.

We have focused on accommodating missing data on the outcome of interest $Y_p$, and as such have supposed complete data for the postrandomization event S. The method could be extended to accommodate MAR missingness of S via the IPW approach of Shepherd and others (2008).

If the missing outcome data follow a monotone pattern, such that $Y_{j+1}, \ldots, Y_p$ are missing for all subjects for whom $Y_j$ is missing, then the method can be extended using the approach described in Section 7 of LA. Specifically, the propensity spline model (3.7) can be applied sequentially to each block of missing values. With this approach, the fitted values $\hat{E}_i[g_z^{-1}(Y_p; \alpha_z, \beta_z)]$ and $\hat{E}_i[Y_p g_z^{-1}(Y_p; \alpha_z, \beta_z)]$ appearing in the estimating function $U^M_i(\theta)$ are computed by replacing missing covariate values by predicted values in sequential regression models.

Extensions of interest include accommodating nonmonotone missing data and expanding the sensitivity analysis to evaluate the impact on inferences of deviations from the MAR assumption.
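The order of operations in a sequential fill-in for monotone missingness can be sketched as follows. This is only an illustration under stated assumptions: ordinary least squares stands in for the penalized spline propensity model of LA, the first variable is taken to be fully observed, and all function names and data are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (a must be nonsingular)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols_fit(rows, targets):
    """Least-squares coefficients (intercept first) via the normal equations."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, targets)) for i in range(k)]
    return solve_linear(xtx, xty)


def predict(coef, row):
    """Fitted value for one row of predictors."""
    return coef[0] + sum(c * v for c, v in zip(coef[1:], row))


def sequential_impute(records):
    """Fill None entries column by column; records must follow a monotone pattern."""
    filled = [r[:] for r in records]
    p = len(filled[0])
    for j in range(1, p):
        # Fit Y_{j+1} on Y_1..Y_j using subjects with Y_{j+1} observed.
        observed = [r for r, orig in zip(filled, records) if orig[j] is not None]
        coef = ols_fit([r[:j] for r in observed], [r[j] for r in observed])
        # Predict for the others, using earlier predictions where needed.
        for r, orig in zip(filled, records):
            if orig[j] is None:
                r[j] = predict(coef, r[:j])
    return filled


# Toy data: columns are Y1, Y2, Y3; Y3 = 0.5 + Y1 + 2*Y2 exactly where observed.
records = [
    [0.0, 1.0, 2.5],
    [1.0, 0.0, 1.5],
    [2.0, 3.0, 8.5],
    [3.0, 1.0, 5.5],
    [1.0, 2.0, None],
    [2.0, None, None],
]
filled = sequential_impute(records)
for row in filled:
    print([round(v, 3) for v in row])
```

In the setting described above, the predicted values would feed into the fitted expectations in the estimating function rather than being treated as observed data.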

SUPPLEMENTARY MATERIAL

Supplementary material is available at http://biostatistics.oxfordjournals.org.

ACKNOWLEDGMENTS

Conflict of Interest: None declared.

FUNDING

National Institutes of Health (RO1 AI054165-06) to P.B.

REFERENCES

BUCHBINDER, S. P., MEHROTRA, D. V., DUERR, A., FITZGERALD, D. W., MOGG, R., LI, D., GILBERT, P. B.,

LAMA, J. R., MARMOR, M., DEL RIO, C. and others (2008). The Step study: the first test-of-concept efficacy

trial of a cell-mediated immunity HIV vaccine. Lancet 372, 1881–1893.

CHANG, M. N., GUESS, H. A. AND HEYSE, J. F. (1994). Reduction in the burden of illness: a new efficacy measure

for prevention trials. Statistics in Medicine 13, 1807–1814.

FLYNN, N. M., FORTHAL, D. N., HARRO, C. D., JUDSON, F. N., MAYER, K. H., PARA, M. F. AND THE RGP120

HIV VACCINE STUDY GROUP (2005). Placebo-controlled phase 3 trial of recombinant glycoprotein 120 vaccine

to prevent HIV-1 infection. The Journal of Infectious Diseases 191, 654–665.

GILBERT, P. B., ACKERS, M. L., BERMAN, P. W., FRANCIS, D. P., POPOVIC, V., HU, D. J., HEYWARD, W. L.,

SINANGIL, F., SHEPHERD, B. AND GURWITH, M. (2005). HIV-1 virologic and immunologic progression and

antiretroviral therapy initiation among HIV-1-infected participants in an efficacy trial of recombinant glycoprotein

120 vaccine. The Journal of Infectious Diseases 192, 974–983.

GILBERT, P. B., BOSCH, R. AND HUDGENS, M. G. (2003). Sensitivity analysis for the assessment of causal vaccine

effects on viral load in HIV vaccine trials. Biometrics 59, 531–541.

HAMMER, S. M., ERON, JR, J. J., REISS, P., SCHOOLEY, R. T., THOMPSON, M. A., WALMSLEY, S., CAHN, P.,

FISCHL, M. A., GATELL, J. M., HIRSCH, M. S. and others (2008). Antiretroviral treatment of adult HIV infection: 2008 recommendations of the International AIDS Society-USA panel. Journal of the American Medical Association 300, 555–570.

HAYDEN, D., PAULER, D. K. AND SCHOENFELD, D. (2005). An estimator for treatment comparisons among

survivors in randomized trials. Biometrics 61, 305–310.

HUDGENS, M. G. AND HALLORAN, M. E. (2006). Causal vaccine effects on binary postinfection outcomes. Journal

of the American Statistical Association 101, 51–64.

HUDGENS, M. G., HOERING, A. AND SELF, S. G. (2003). On the analysis of viral load endpoints in HIV vaccine

trials. Statistics in Medicine 22, 2281–2298.

JEMIAI, Y. AND ROTNITZKY, A. (2005). Semiparametric methods for inferring treatment effects on outcomes defined only if a post-randomization event occurs, [Unpublished PhD. Dissertation]. Harvard School of Public Health, Department of Biostatistics.

JEMIAI, Y., ROTNITZKY, A., SHEPHERD, B. AND GILBERT, P. B. (2007). Semiparametric estimation of treatment

effects given base-line covariates on an outcome measured after a post-randomization event occurs. Journal of the

Royal Statistical Society, Series B 69, 879–902.

JOFFE, M. M., SMALL, D. AND HSU, C.-Y. (2007). Defining and estimating intervention effects for groups that will

develop an auxiliary outcome. Statistical Science 22, 74–97.

KANG, J. D. Y. AND SCHAFER, J. L. (2007). Demystifying double robustness: a comparison of alternative strategies

for estimating a population mean from incomplete data. Statistical Science 22, 523–539.

LITTLE, R. AND AN, H. (2004). Robust likelihood-based analysis of multivariate data with missing values. Statistica

Sinica 14, 949–968.

LITTLE, R. J. A. AND RUBIN, D. B. (2002). Statistical Analysis with Missing Data. New York: Wiley.

LITTLE, R. AND ZHANG, G. (2008). Extensions of the penalized spline of propensity prediction method of imputation. Biometrics, doi: 10.1111/j.1541-0420.2008.01155.x.

MEHROTRA, D. V., LI, X. AND GILBERT, P. B. (2006). A comparison of eight methods for the dual-endpoint

evaluation of efficacy in a proof-of-concept HIV vaccine trial. Biometrics 62, 893–900.

MOGG, R. AND MEHROTRA, D. V. (2007). Analysis of antiretroviral immunotherapy trials with potentially

non-normal and incomplete longitudinal data. Statistics in Medicine 26, 484–497.

PITISUTTITHUM, P., GILBERT, P. B., GURWITH, M., HEYWARD, W., MARTIN, M., VAN GRIENSVEN, F.,

HU, D., TAPPERO, J. W., CHOOPANYA, K. AND THE BANGKOK VACCINE EVALUATION GROUP (2006).

Randomized, double-blind, placebo-controlled efficacy trial of a bivalent recombinant glycoprotein 120 HIV-1

vaccine among injection drug users in Bangkok, Thailand. The Journal of Infectious Diseases 194, 1661–1671.

ROBINS, J. M., ROTNITZKY, A. AND ZHAO, L. P. (1995). Analysis of semiparametric regression models for repeated outcomes in the presence of missing data. Journal of the American Statistical Association 90, 106–121.

ROSENBAUM, P. R. AND RUBIN, D. B. (1983). Assessing sensitivity to an unobserved binary covariate in an

observational study with binary outcome. The Journal of the Royal Statistical Society, Series B 45, 212–218.

RUBIN, D. B. (1978). Bayesian inference for causal effects. Annals of Statistics 6, 34–58.

RUBIN, D. B. (2000). Comment on “Causal inference without counterfactuals” by A.P. Dawid. Journal of the

American Statistical Association 95, 435–437.

SHEPHERD, B., GILBERT, P. B., JEMIAI, Y. AND ROTNITZKY, A. (2006). Sensitivity analyses comparing outcomes

only existing in a subset selected post-randomization, conditional on covariates, with application to HIV vaccine

trials. Biometrics 62, 332–342.

SHEPHERD, B., GILBERT, P. B. AND LUMLEY, T. (2007). Sensitivity analyses comparing time-to-event outcomes

only existing in a subset selected post-randomization, conditional on covariates, with application to HIV vaccine

trials. Journal of the American Statistical Association 102, 573–582.

SHEPHERD, B., GILBERT, P. B. AND MEHROTRA, D. V. (2006). Eliciting a counterfactual sensitivity parameter.

The American Statistician 61, 56–63.

SHEPHERD, B. E., REDMAN, M. W. AND ANKERST, D. P. (2008). Does finasteride affect the severity of prostate

cancer? A causal sensitivity analysis. Journal of the American Statistical Association 103, 1392–1404.

ZHANG, J. L. AND RUBIN, D. B. (2003). Estimation of causal effects via principal stratification when some outcomes are truncated by “death.” Journal of Educational and Behavioral Statistics 28, 353–368.

[Received December 30, 2008; revised July 29, 2009; accepted for publication July 30, 2009]