
Causal Inference in Hybrid Intervention Trials Involving Treatment Choice

Qi Long, Roderick J. Little, and Xihong Lin*

June 14, 2006

Abstract

Randomized allocation of treatments is a cornerstone of experimental design, but it has drawbacks when only a limited set of individuals are willing to be randomized, or when the act of randomization undermines the success of the treatment. Choice-based experimental designs allow a subset of the participants to choose their treatments. We discuss here causal inferences for hybrid experimental designs where some participants are randomly allocated to treatments and others receive their treatment preference. This work was motivated by the "Women Take Pride" (WTP) study (Janevic et al., 2001), a doubly randomized preference trial (DRPT) to assess behavioral interventions for women with heart disease. We propose a model for estimating the causal effects in the subpopulations defined by treatment preferences, and hence preference effects. An EM algorithm is described for computing maximum likelihood estimates of the model parameters. The method is illustrated by analyzing sickness impact profile (SIP) scores and treatment adherence in the WTP data. Our results show (1) some evidence that SIP scores were improved when women received their preferred treatment; and (2) strong preference effects on program adherence; that is, women assigned to their preferred treatment were more likely to adhere to the program. We also provide a framework for assessing the DRPT and other hybrid trial designs, and discuss some alternative designs from the perspective of the strength of assumptions required to make causal inferences.

KEY WORDS: Clinical Trials; Doubly Randomized Preference Trials; EM algorithm; Partially Randomized Preference Trials; Randomization; Selection Bias.

*Qi Long is Rollins Assistant Professor, Department of Biostatistics, Emory University, Atlanta, GA 30329; Roderick J. Little is Professor, Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109; Xihong Lin is Professor, Department of Biostatistics, Harvard University, Boston, MA 02115. This research is supported by the National Cancer Institute grant R01CA76404. We would like to thank Noreen Clark for providing the WTP data and useful comments.


1 Introduction

Randomized assignment of subjects to treatments is a cornerstone of good experimental design. With full compliance and no missing data, randomization allows valid estimates of average treatment effects under minimal assumptions, by avoiding selection bias and ensuring that the observed mean for each treatment is an unbiased estimate of the overall mean if all individuals in the population had received that treatment (e.g., Little and Rubin, 2001).

However, it is also well known that randomization does not solve all problems in experiments involving alternative treatments. It is not always ethically feasible, as in medical trials when the principle of equipoise is not widely accepted, or in assessments of potentially harmful environmental effects that are unlikely to be beneficial. Inferences are only possible for the subset of subjects willing to be randomized, potentially excluding a significant fraction of the population, including subjects with strong treatment preferences. A behavioral treatment, in which strong motivation on the part of the participant is required and treatment assignment cannot be blinded, may be more successful if subjects are allowed to choose a treatment rather than being randomized to one. Random assignment to a treatment perceived to be inferior may lead to issues of noncompliance and missing data that undermine the randomization and complicate causal inferences.

An alternative to randomization is to simply allow participants to choose their treatments. Participants who choose their treatment are more likely to participate and fully comply with the protocol, and the trial may more realistically measure the outcome of the treatment in the population of interest. On the other hand, this approach has all the well-known problems of observational studies, in that treatment comparisons are obscured by the confounding effects of selection. Few experimentalists would accept the notion that the advantages of these designs compensate for their major weaknesses.

A natural question is whether there are hybrid designs between the extremes of randomization and choice that improve on either design. One candidate is Zelen's (1990) randomized consent design, which reverses the usual order of consent and randomization, by randomizing prior to consent and predicating the consent process on the randomized treatment. In a study with two alternative treatments (say A and B), participants are randomized to treatment A or B. Those randomized to A are asked if they are willing to receive A, after a discussion of the two treatments. If the participant agrees, A is given. If not, then B is given. The same procedure is followed in the other arm, with roles of A and B reversed. Zelen (1990) shows that this design allows for valid tests of the null hypothesis of no treatment effect, and can be more powerful than a randomized design restricted to participants willing to be randomized. However, the ethics of describing the treatments after the treatment assignment has already been made have been questioned (Ellenberg, 1992). See Altman et al. (1995) for more discussion of this design.

We consider here other hybrid designs that combine features of randomization and patient choice of treatments (Lambert and Wood, 2000). Perhaps the simplest approach is to ask the participants' treatment preferences in the context of a conventional fully randomized trial (Torgerson et al., 1996), and use that information as a covariate or effect-modifier; however, randomizing to a treatment other than the stated preference may be problematic. A more radical approach is the partially randomized preference trial (PRPT), where participants who are willing to be randomized to the treatments are randomized, and those who are not are assigned to their preferred treatment (Brewin and Bradley, 1989). A variation on this theme is the doubly randomized preference trial (DRPT), where participants are randomized into a "randomization arm", within which treatments are randomized, and a "preference arm", within which participants get to choose their treatments. Versions of the DRPT are described by Rücker (1989), Wennberg et al. (1993) and Janevic et al. (2003). It seems plausible that in PRPTs and DRPTs, the additional information on participants who get to choose their treatments might usefully supplement the information from the participants who are randomized, although as discussed below this often requires modeling assumptions. Overall, hybrid designs enable us to estimate the preference effects and causal effects in subpopulations defined by treatment preferences, which cannot be estimated in a completely randomized trial.

We consider causal inference for these designs within the framework of potential outcomes, also known as Rubin's causal model (Holland, 1986). Originally formalized by Neyman (1923) in the context of randomized experiments, this framework was generalized and extended by Rubin (1974, 1977, 1978) to nonrandomized studies. The key underlying idea is that causal estimands are comparisons of the potential outcomes that would have been observed under different exposures of the same units to treatments at a particular place and time. Robins (1986, 1987) extended Rubin's "point treatment" potential outcome framework to evaluate direct and indirect effects of time-varying treatments in experimental and observational longitudinal studies. However, those methods are not directly applicable to the hybrid designs discussed in this paper.

The first objective of this paper is to propose a general model for assessing preference effects on the outcomes of interest. Our second objective is to propose a framework based on recent statistical ideas of causal inference for assessing the hybrid designs described above, and extensions. The basic idea is to classify individuals in the population into strata, which may or may not be observed for participants in the trial, and then assess assumptions required to identify the causal effects of treatments within these strata. Causal effects are defined as the difference between the average outcome if all individuals within a stratum were assigned to treatment A and the average outcome if all individuals within that stratum were assigned to treatment B (e.g., Rubin, 1974).

Our paper is motivated by the "Women Take Pride" study (Janevic et al., 2003), which utilized a DRPT to assess behavioral interventions for women with heart disease. In Section 2 we describe the design of that study, and show how to estimate causal effects in subpopulations defined by treatment preference, using a method of moments approach similar to that proposed by Rücker (1989). In Section 3, we propose a conceptual model for analyzing DRPTs and other hybrid trial designs. In Section 4, we propose a more general likelihood-based method of analysis that accommodates covariates and estimates causal parameters using the EM algorithm, and in Section 5 we apply these methods to the WTP data. In Section 6, we present a simulation study to evaluate the performance of the proposed method. In Section 7 we discuss alternative designs from the perspective of the strength of assumptions required to make causal inference. Section 8 presents some concluding remarks.

2 The "Women Take Pride" Study

The "Women Take Pride" (WTP) intervention study (Janevic et al., 2001) concerns women aged 60 years and older with diagnosed cardiac disease being treated by daily heart medication. The interventions are behavioral programs aimed at enhancing the women's ability to manage their disease, based on principles of self-regulation from social cognitive theory. The comparison is between two versions of an intervention consisting of six weekly units: a Group treatment ($T_i = A$), where 6 to 8 women meet for 2 to 2.5 hours in a group setting; and a Self-Directed treatment ($T_i = B$), where the participant studies at home following an initial orientation session. Motivation and support are provided through the social environment in the Group version, and through weekly telephone calls from the health educator and peer leader in the Self-Directed treatment. These two versions of the intervention present the same material and differ only in format.

A DRPT design where some participants choose their treatment was seen as preferable to a completely randomized design in this setting, since the choice more accurately reflects a clinical situation. In a "real world" setting, patients may well have a preference for a group or self-directed format, and preference may well impact the motivation and adherence of the participants, and hence the success of the treatment. The design is summarized in Figure 1. At the first stage, a total of 3079 women with heart disease were randomized to a Random arm ($W_i = R$), where treatments were randomly assigned, and a Choice arm ($W_i = P$), where participants received their preferred treatment. Within the Random arm (n = 1613), 575 (35.6%) women agreed to participate; within the Choice arm (n = 1466), 496 (33.8%) agreed to participate. At the second stage, the women in the Random arm were randomized to three groups: control (n = 184), the Group treatment A (n = 190) and the Self-Directed treatment B (n = 201); women in the Choice arm were asked to choose between Treatment A (n = 321) and Treatment B (n = 175). We do not analyze the data for the Control group here, since our focus is on comparisons of the Group and Self-Directed treatments, and our general conceptual model is more simply presented for the case of two treatments. Extensions of our methods to more than two treatments are straightforward, however.

Primary outcomes Y of the study are measures of improved physical and psychosocial functioning, frequency and severity of symptoms, and health-care utilization measured at baseline, 4, 12 and 18 months. In this paper, we analyze physical, psychological and total scores of the sickness impact profile (SIP) at month 12. These are measures of physical, psychological and total functional health status, scored between 0 and 100, with higher scores indicating greater impairment due to illness (Bergner et al., 1981). To improve normality assumptions we analyze all three SIP scores on the log-transformed scale after adding a constant 0.05. We conduct an intent-to-treat analysis of the SIP score outcomes. However, we also analyze a measure of treatment adherence as a secondary outcome measure, specifically a binary variable for whether a woman completed at least one unit of materials. Table 1a summarizes the observed data for those outcomes.
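The log transformation with an added constant can be written in one line; the constant is needed because a SIP score of 0 has no logarithm. The following is a minimal sketch of that transformation only. The function name `log_sip` and the example scores are hypothetical and are not taken from the WTP data.

```python
import math

def log_sip(score, offset=0.05):
    """Log-transform a SIP score in [0, 100] after adding a small constant."""
    if not 0.0 <= score <= 100.0:
        raise ValueError("SIP scores are expected to lie between 0 and 100")
    return math.log(score + offset)

# Hypothetical scores, including the boundary value 0 that motivates the offset.
transformed = [log_sip(s) for s in (0.0, 5.0, 100.0)]
```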

Table 1a shows that in the Random arm ($W_i = R$), women assigned the Self-Directed treatment ($T_i = A$) have higher average SIP scores than women assigned the Group treatment ($T_i = B$); the same trend is observed in the Choice arm ($W_i = P$), with a greater mean difference. Adherence rates are 76% for both treatments in the Random arm, while in the Choice arm the adherence rate is higher for the Group treatment (93%) than for the Self-Directed treatment (77%).

The treatment comparisons in the Random arm are valid because of the random allocation. On the other hand, the treatment comparisons in the Choice arm are potentially biased by the effects of self-selection. Let $C_i$ denote treatment preference, with $C_i = A$ if an individual prefers treatment A and $C_i = B$ if an individual prefers treatment B. The mean outcome for treatment A is for individuals with $C_i = A$, and the mean outcome for treatment B is for individuals with $C_i = B$. The comparison of these two means does not estimate a causal effect in a particular population, which is the key requirement for a causal effect in Rubin's (1974) sense. A direct comparison of these two means requires the very debatable assumption that the subpopulation with $C_i = A$ and the subpopulation with $C_i = B$ are equivalent with respect to treatment outcomes. This assumption might be improved by regression adjustment for known characteristics of participants in the two groups, but as in any observational setting, such adjustments do not necessarily remove the bias. A direct comparison of individuals with $C_i = A$ ($B$) in the choice arm and individuals with $T_i = A$ ($B$) in the random arm, as was done in Janevic et al. (2003), is problematic for similar reasons.

A causal analysis that addresses the issue of choice is to construct estimates of mean outcomes of treatments A and B within each of the two preference subpopulations $C_i = A$ and $C_i = B$. This is not possible from data in the Choice arm alone, because it requires outcome data for participants who do not receive their treatment of choice. However, it can be addressed with a DRPT, since some participants in the Random arm do not receive their treatment of choice, and treatment assignment remains random within the two preference subpopulations, $C_i = A$ and $C_i = B$. Specifically, define

• $\mu(A)$ = overall mean outcome if assigned to treatment A in the whole population
• $\mu_A(A)$ = mean outcome if assigned to treatment A in the subpopulation that prefers A ($C_i = A$)
• $\mu_B(A)$ = mean outcome if assigned to A in the subpopulation that prefers B ($C_i = B$),

and define $\mu(B)$, $\mu_A(B)$ and $\mu_B(B)$ as the corresponding mean outcomes if assigned to the Group treatment B, in the overall population and the two subpopulations respectively. Let $\pi_B$ be the proportion of the population that prefers the Group treatment ($C_i = B$). Then

\[ \mu(A) = \pi_B \mu_B(A) + (1 - \pi_B)\mu_A(A), \]
\[ \mu(B) = \pi_B \mu_B(B) + (1 - \pi_B)\mu_A(B). \]

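From the choice arm, we can estimate $\hat{\mu}_A(A)$, $\hat{\mu}_B(B)$, and $\hat{\pi}_B = 321/496 = 0.65$ (Fig. 1). From the random arm, we can estimate $\hat{\mu}(A)$ and $\hat{\mu}(B)$. Thus, $\hat{\mu}_B(A)$ and $\hat{\mu}_A(B)$ can be estimated by solving the two linear equations above, which gives $\hat{\mu}_B(A) = \{\hat{\mu}(A) - (1 - \hat{\pi}_B)\hat{\mu}_A(A)\}/\hat{\pi}_B$ and $\hat{\mu}_A(B) = \{\hat{\mu}(B) - \hat{\pi}_B\hat{\mu}_B(B)\}/(1 - \hat{\pi}_B)$.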

Let $\Delta_B = \mu_B(B) - \mu_B(A)$, the difference in outcome means for treatments B and A in the subpopulation that prefers B. Then $\Delta_B$ can be estimated as

\[ \hat{\Delta}_B = \hat{\mu}_B(B) - \hat{\mu}_B(A). \]

Similarly, in the subpopulation that prefers A, we have $\Delta_A = \mu_A(B) - \mu_A(A)$, which is estimated by

\[ \hat{\Delta}_A = \hat{\mu}_A(B) - \hat{\mu}_A(A). \]

The difference $\Delta_B - \Delta_A$ is defined as the preference effect for the two treatments, and measures the extent to which treatment preference modifies the treatment effect.

We first apply this method to our health outcome measures in the WTP study. Our results (Table 1b) show that for women who preferred the Group format, average SIP physical scores at month 12 were lower (-0.370) when they were assigned to the Group format than when they were assigned to the Self-Directed format; for women who preferred the Self-Directed format, average SIP physical scores at month 12 were higher (+0.080) when they were assigned to the Group format than when they were assigned to the Self-Directed format. These results, though not statistically significant, are in the direction of women having better physical functional health status when assigned their treatment of choice. Our results also show a similar trend for the other two SIP scores (results not included). The intervention effects as well as the preference effects, however, are not statistically significant.

We now apply our method to study intervention adherence. The results show that for women who prefer the Group format, the adherence rate when assigned to the Group format is estimated to be 18% (P < 0.001) higher than the adherence rate when assigned to the Self-Directed format; for women who prefer the Self-Directed format, the adherence rate when assigned to the Self-Directed format is estimated to be 33% (P < 0.001) higher than the adherence rate when assigned to the Group format. These results indicate that women are more likely to adhere to the program they prefer. The preference effect, defined as the treatment effect difference between the two subpopulations, is thus estimated as $\hat{\Delta}_B - \hat{\Delta}_A = 0.51$ (P < 0.001). It follows that the treatment effects on adherence are highly significant in the two subpopulations and they are significantly different. The results suggest that the very similar adherence rates for the two intervention groups in the Random arm mask strong preference effects, with much higher adherence rates for the preferred interventions.
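As a rough check on the arithmetic, the two moment equations of this section can be solved directly. The sketch below is illustrative only: the function name `preference_effects` and its argument names are hypothetical, B is taken to be the Group treatment as in the definition of the proportion preferring B above, and the inputs are the rounded adherence rates quoted earlier (76% for both treatments in the Random arm; 93% and 77% in the Choice arm; 321 of 496 Choice-arm participants preferring B). Because the inputs are rounded, the output only approximates estimates computed from the full data.

```python
def preference_effects(mu_A, mu_B, mu_AA, mu_BB, pi_B):
    """Solve the two moment equations for the unobserved cell means.

    mu_A, mu_B   : Random-arm means under assignment to A and to B
    mu_AA, mu_BB : Choice-arm means, i.e. mu_A(A) and mu_B(B)
    pi_B         : proportion preferring B (estimated from the Choice arm)
    """
    if not 0.0 < pi_B < 1.0:
        raise ValueError("pi_B must lie strictly between 0 and 1")
    pi_A = 1.0 - pi_B
    # mu(A) = pi_B * mu_B(A) + pi_A * mu_A(A)  =>  solve for mu_B(A)
    mu_BA = (mu_A - pi_A * mu_AA) / pi_B
    # mu(B) = pi_B * mu_B(B) + pi_A * mu_A(B)  =>  solve for mu_A(B)
    mu_AB = (mu_B - pi_B * mu_BB) / pi_A
    delta_B = mu_BB - mu_BA  # B versus A among those who prefer B
    delta_A = mu_AB - mu_AA  # B versus A among those who prefer A
    return {
        "mu_B(A)": mu_BA,
        "mu_A(B)": mu_AB,
        "delta_B": delta_B,
        "delta_A": delta_A,
        "preference_effect": delta_B - delta_A,
    }

# Rounded adherence rates quoted in the text (assumed inputs, not raw data).
result = preference_effects(mu_A=0.76, mu_B=0.76, mu_AA=0.77, mu_BB=0.93,
                            pi_B=321 / 496)
```

With these rounded inputs the sketch gives a value near 0.18 for `delta_B` and near -0.32 for `delta_A`, close to the 18% and 33% differences reported above; the small remaining gap is what one would expect from rounding the input rates.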

This analysis method is a more formal description of the method of Rücker (1989). It does not require a distributional assumption and is applicable to estimating causal effects of an arbitrary outcome, e.g., the causal effect of a health outcome at a post-intervention time point such as 12 months. We now describe a framework that elucidates the implicit assumptions in this analysis, and then generalize the analysis to include covariates.

3 A Conceptual Model for Analyzing Hybrid Trial Designs

We present a general conceptual model for assessing hybrid intervention trials like the WTP study, which clarifies assumptions that are implicit in the above analysis. This framework will also be applied to assess other designs in Section 7. For simplicity we focus on designs involving just two treatments A and B, although the framework extends in an obvious way to designs with more than two treatments. We first stratify the target population into five groups (Figure 2(a)):

1. The set of individuals unwilling to participate even if given their choice of treatments ($\bar{P}$). Clearly we cannot learn anything empirically about treatment effects for this group without making assumptions that relate it to a group we can study. We do not consider this group further here.

2. The set of individuals willing to participate if given the choice of treatment ($P$). We stratify this group into four subpopulations:

(a) Individuals that prefer A and will not participate unless allowed to choose A ($P\bar{R}A$).

(b) Individuals that prefer A but are willing to participate in a randomized trial ($PRA$).

(c) Individuals that prefer B but are willing to participate in a randomized trial ($PRB$).

(d) Individuals that prefer B and will not participate unless allowed to choose B ($P\bar{R}B$).

We consider two versions of each treatment, a version where the treatment is chosen by the participant ($A_C$ and $B_C$) and a version where the treatment is assigned by randomization ($A_R$ and $B_R$). We allow for the possibility that outcomes under these two versions of each treatment might differ. Throughout we assume the potential outcomes for each individual do not depend on the treatment status of other individuals in the sample, the so-called Stable Unit Treatment Value Assumption (SUTVA) (Rubin, 1978).

The table in Figure 2(b) results from crossing subpopulation stratum with treatment. Mean outcomes in the empty cells can be estimated from the data, but mean outcomes in the cells labeled F are inestimable, since they are a priori counterfactuals (Angrist, Imbens and Rubin, 1996) (AIR): we cannot observe outcomes in these cells under any design. For example, we do not get to see the effect of randomizing to A ($A_R$) in the subpopulation of individuals who prefer A but will not participate in a randomized trial ($P\bar{R}A$). Opinions differ on the extent to which it is meaningful to consider treatment outcomes in such cells, and this needs to be considered in the specific context of each trial. In our discussion we follow AIR and focus attention on outcomes that are measurable under some design, that is, the cells without F's in Figure 2(b). Causal comparisons (in Rubin's sense) are comparisons of column means within rows of Figure 2(b). Comparisons between means in different rows are not causal since they concern different subpopulations.

The WTP study described above is an example of a particular hybrid trial design, the Doubly Randomized Preference Trial (DRPT), which has the generic form of Figure 3(a). People willing to participate ($P$) are first randomized to a choice arm and a random arm. Within the choice arm, people receive their treatment of choice ($PA$ or $PB$). Within the random arm, individuals willing to be randomized ($R$) are randomized to $A_R$ or $B_R$, and those not willing to be randomized do not participate. Figure 3(b) indicates that mean outcomes can be estimated directly for four pooled subpopulations: the mean for $A_C$ in the subpopulation that prefers A, namely $PRA \cup P\bar{R}A$; the means for $A_R$ and $B_R$ in the subpopulation that is willing to be randomized, namely $PRA \cup PRB$; and the mean for $B_C$ in the subpopulation that prefers B, namely $PRB \cup P\bar{R}B$. The only causal effect that is estimable directly without additional assumptions is the comparison of $A_R$ and $B_R$ in the combined population $PR = PRA \cup PRB$; this is the treatment comparison from the random arm of the study.

If the outcome for an individual randomly assigned to a treatment is the same as if that individual had chosen that treatment, then $A_R = A_C = A$ and $B_R = B_C = B$. We follow other authors by calling this assumption the "exclusion restriction" (ER), since it is an example of an exclusion restriction in the sense that the term is used in econometrics (Angrist and Rubin, 1996). Under the ER assumption, the four columns in Figure 3(b) reduce to two. Additional assumptions are still needed to estimate the individual cells in the table, since there are eight cells (two a priori counterfactual) and only four means can be directly estimated from the data. Suppose now we also assume that the random arm and choice arm participants are random samples from the same population, that is $PRA = PA$ and $PRB = PB$. We call this assumption "no selection bias from randomization" (NSBR), which allows us to combine the information from the Random and Choice arms of the study. Under ER and NSBR, the table in Figure 3(b) collapses to Figure 3(c) with just four cells, and the mean outcomes of A and B in the $PA$ and $PB$ subpopulations are then identified. The method described in Section 2 estimates these means, using information from the random and choice arms.

In particular, the analysis of the WTP data in the previous section implicitly makes the ER and NSBR assumptions. The mean outcomes in the two diagonal cells of Figure 3(c) are estimated from the Choice arm, and the column marginal mean outcomes are estimated from the Random arm. The remaining off-diagonal cells can then be estimated, with the proportion of participants that prefer A estimated from the choice arm. In support of the NSBR assumption, we note that the proportion of screened individuals agreeing to participate is comparable in the randomization (575/1613) and choice (496/1466) arms. If a sizeable proportion of the population only participated if given their treatment of choice, we would expect the participation rate to be higher in the choice arm than in the randomization arm. We discuss designs under which the ER and NSBR assumptions can be relaxed in Section 7.
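The participation-rate comparison can be made concrete with a small calculation. The sketch below computes the two rates from the counts quoted above and, as an illustrative addition that is not part of the analysis in this paper, a pooled two-proportion z statistic. The function name `two_proportion_z` is hypothetical.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Return (p1, p2, z) for a pooled two-proportion z statistic."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    return p1, p2, (p1 - p2) / se

# Counts from the text: 575 of 1613 (Random arm), 496 of 1466 (Choice arm).
p_random, p_choice, z = two_proportion_z(575, 1613, 496, 1466)
```

A value of |z| well below 1.96 is in line with the "comparable" participation rates noted above, although a non-significant difference does not by itself establish the NSBR assumption.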

4 A General Model for a DRPT with covariates

Janevic et al. (2003) found that the probability of choosing each treatment was affected by demographic variables and disease severity. The outcomes within the two subpopulations ($C_i = A$ and $C_i = B$) are also likely to be affected by baseline covariates other than the treatments. We hence extend the analysis of the previous section to accommodate covariates. As before, we consider a DRPT with two treatments, generically denoted as A and B. In this section, we assume ER and NSBR within the subpopulations defined by values of the covariates.

4.1 The Model

Suppose the data are comprised of n subjects. For subject i, let $Y_i$ denote the observed outcome of interest, $T_i$ denote the treatment assignment (A or B), $C_i$ denote the treatment preference (A or B), $W_i = R$ if subject i is randomized to the random arm and $W_i = P$ if subject i is randomized to the choice arm, $X_{1i}$ be a set of covariates associated with $Y_i$, and $X_{2i}$ be a set of covariates associated with $C_i$, where $X_{1i}$ and $X_{2i}$ may overlap. In the application to the WTP data, the observed outcome of interest is the SIP physical score, the covariates associated with the SIP physical score are age, employment status and some baseline measures, and the covariates associated with treatment preference $C_i$ are employment status, baseline total symptom impact and baseline SIP physical score.

Let $Y_i(A)$ and $Y_i(B)$ be the potential outcomes of Y for subject i when $T_i = A$ and $T_i = B$, respectively. The average causal effect of treatment assignment for the whole population is

\[ \Delta = E\{Y_i(B)\} - E\{Y_i(A)\}. \]

The average causal effect of treatment assignment for the subpopulation preferring treatment m ($m = A, B$) and having covariates $X_1$ is

\[ \Delta_m(X_1) = E\{Y_i(B) \mid C_i = m, X_1\} - E\{Y_i(A) \mid C_i = m, X_1\}. \]

Averaging over the distribution of $X_1$, the average causal effect of treatment assignment for the subpopulation preferring treatment m equals

\[ \Delta_m = E_{X_1 \mid C_i = m}\left[ E\{Y_i(B) \mid C_i = m, X_1\} - E\{Y_i(A) \mid C_i = m, X_1\} \right]. \]

The causal parameters $\Delta_m(X_1)$ and $\Delta_m$ can be related to estimable quantities under the ER and NSBR assumptions. Specifically, since subjects are randomized to the Choice or Random arms, we have

\[ E\{Y_i(j) \mid C_i = m, X_1\} = E\{Y_i(j) \mid C_i = m, X_1, W_i\}, \]

where m and j take values A, B. In the Random arm, since subjects are randomized to treatment groups, we have

\[ E\{Y_i(j) \mid C_i = m, X_1, W_i = R\} = E\{Y_i(j) \mid T_i = j, C_i = m, X_1, W_i = R\} = E(Y_i \mid T_i = j, C_i = m, X_1, W_i = R). \]

In the Choice arm, subjects are assigned to treatments they prefer, that is, $T_i = C_i$. Thus for $j = A, B$,

\[ E\{Y_i(j) \mid C_i = j, X_1, W_i = P\} = E\{Y_i(j) \mid T_i = C_i, C_i = j, X_1, W_i = P\} = E(Y_i \mid T_i = j, C_i = j, X_1, W_i = P). \]

We cannot estimate $E\{Y_i(A) \mid C_i = B, X_1, W_i = P\}$ and $E\{Y_i(B) \mid C_i = A, X_1, W_i = P\}$ from data from the Choice arm alone. However, under the ER and NSBR assumptions, we have

\[ E\{Y_i(A) \mid C_i = B, X_1, W_i = P\} = E\{Y_i(A) \mid T_i = A, C_i = B, X_1, W_i = R\} = E\{Y_i \mid T_i = A, C_i = B, X_1, W_i = R\}, \]
\[ E\{Y_i(B) \mid C_i = A, X_1, W_i = P\} = E\{Y_i(B) \mid T_i = B, C_i = A, X_1, W_i = R\} = E\{Y_i \mid T_i = B, C_i = A, X_1, W_i = R\}. \]

Hence, we can use data in the random arm in conjunction with the data in the choice arm to estimate these quantities, by viewing each group in the random arm as a mixture of the two preference subpopulations.

For notational simplicity, we recode the values of $T_i$ and $C_i$ by replacing A by 1 and B by 0. We assume that the distribution of $Y_i$ given $T_i$, $C_i$, $X_{1i}$ and $W_i$ belongs to the exponential family

\[ f(Y_i \mid T_i, C_i, X_{1i}, W_i) = \exp\left\{ \frac{Y_i\theta_i - b(\theta_i)}{\phi a_i^{-1}} + c(Y_i, \phi) \right\}, \]

where $a_i$ is a known constant, $\phi$ is a scale parameter, $\theta_i$ is the canonical parameter, and $b(\cdot)$ and $c(\cdot)$ are known functions. The mean of $Y_i$ is $\mu_i = E(Y_i \mid T_i, C_i, X_{1i}, W_i) = b'(\theta_i)$ and is assumed to have the form

\[ g(\mu_i) = \beta_0 + X_{1i}^T\beta_{X_1} + T_i\beta_T + C_i\beta_C + T_i C_i\beta_{TC}, \qquad (1) \]

where $g(\cdot)$ is a monotonic link function (McCullagh and Nelder, 1989). The model is completed by assuming the treatment preference $C_i$ given $X_{2i}$ follows a logistic model with $\pi_i = \Pr(C_i = 1 \mid X_{2i})$ satisfying

\[ \mathrm{logit}(\pi_i) = \alpha_0 + X_{2i}^T\alpha_{X_2}. \qquad (2) \]

The causal effect of treatment assignment for the subpopulation preferring treatment m ($m = 1, 0$) given covariates $X_1$ is

\[ \Delta_m(X_1) = E\{Y(1) \mid C = m, X_1\} - E\{Y(0) \mid C = m, X_1\} = g^{-1}\{\beta_0 + X_1^T\beta_{X_1} + \beta_T + m(\beta_C + \beta_{TC})\} - g^{-1}(\beta_0 + X_1^T\beta_{X_1} + m\beta_C). \]
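To make the last expression concrete, the sketch below evaluates the conditional causal effect for an identity link and for a logit link. The coefficient values are made up purely for illustration and are not estimates from the WTP data; the function and argument names are hypothetical.

```python
import math

def inv_logit(eta):
    """Inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-eta))

def identity(eta):
    """Inverse of the identity link."""
    return eta

def conditional_effect(m, x1, beta0, beta_x1, beta_T, beta_C, beta_TC, inv_link):
    """Difference in mean outcome between T = 1 and T = 0 given C = m and X_1 = x1."""
    # Linear predictor of model (1) with T = 0 and C = m.
    base = beta0 + sum(b * x for b, x in zip(beta_x1, x1)) + m * beta_C
    # Switching to T = 1 adds the main effect and, when m = 1, the interaction.
    eta_treated = base + beta_T + m * beta_TC
    return inv_link(eta_treated) - inv_link(base)

# Hypothetical coefficients, for illustration only.
coef = dict(beta0=0.5, beta_x1=[0.1], beta_T=-0.2, beta_C=0.3, beta_TC=0.4)
d1_identity = conditional_effect(1, [2.0], inv_link=identity, **coef)
d0_identity = conditional_effect(0, [2.0], inv_link=identity, **coef)
d1_logit = conditional_effect(1, [2.0], inv_link=inv_logit, **coef)
```

Under the identity link the two conditional effects differ by exactly the interaction coefficient, so the preference effect does not depend on the covariates; under a nonlinear link such as the logit it generally does.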


The marginal causal effects $\Delta_m$ have the form

\[ \Delta_m = \int \Delta_m(X_1 = x_1) f(x_1 \mid C = m)\,dx_1, \qquad (3) \]

where $f(x_1 \mid C = m)$ can be empirically estimated from the Choice arm, that is, $\hat{f}(x_1 \mid C = m) = \sum_i I(C_i = m, X_{1i} = x_1) / \sum_i I(C_i = m)$, where $I(\cdot)$ is an indicator function.

In the random arm $(W_i, T_i, Y_i)$ are observed but $C_i$ is not observed. In the choice arm, $(W_i, T_i, Y_i, C_i)$ are observed and $T_i = C_i$, since subjects are assigned to their preferred treatment. Denote by $\theta = (\alpha, \beta, \phi)$ the parameter vector. Define $Y = (Y_1, Y_2, \ldots, Y_n)^T$, and $T$, $C$, $X_1$, $X_2$, $W$ similarly. The observed data loglikelihood is given by

\[ \ell(Y, C_{obs} \mid T, X_1, X_2, W; \theta) = \sum_{W_i = P} \left[ \log\{f(Y_i \mid T_i, X_{1i}, C_i)\} + C_i\log(\pi_i) + (1 - C_i)\log(1 - \pi_i) \right] + \sum_{W_i = R} \left[ \log\{\pi_i f(Y_i \mid T_i, X_{1i}, C_i = 1) + (1 - \pi_i) f(Y_i \mid T_i, X_{1i}, C_i = 0)\} \right], \qquad (4) \]

where $C_{obs}$ denotes the observed C values for the Choice arm, $f(Y_i \mid T_i, X_{1i}, C_i)$ follows the generalized linear model (1) and $\pi_i$ follows the logistic model (2).
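As an illustration of how (4) combines the two arms, the following sketch evaluates the observed-data loglikelihood for the special case of a normal outcome with identity link and a known standard deviation. The data layout (a list of dictionaries), the helper names and the all-zero parameter values are assumptions made for this example and are not the implementation used in the paper.

```python
import math

def normal_pdf(y, mean, sd):
    """Normal density, used here as the outcome model f."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def outcome_mean(t, c, x1, beta):
    """Linear predictor of model (1) under an identity link."""
    xb = sum(b * x for b, x in zip(beta["x1"], x1))
    return beta["0"] + xb + t * beta["T"] + c * beta["C"] + t * c * beta["TC"]

def preference_prob(x2, alpha):
    """Pr(C_i = 1 | X_2i) from the logistic model (2)."""
    eta = alpha["0"] + sum(a * x for a, x in zip(alpha["x2"], x2))
    return 1.0 / (1.0 + math.exp(-eta))

def observed_loglik(rows, beta, alpha, sd):
    """rows: dicts with keys w ('P' or 'R'), t, y, x1, x2 and, for w == 'P', c."""
    total = 0.0
    for r in rows:
        pi = preference_prob(r["x2"], alpha)
        if r["w"] == "P":
            # Choice arm: the preference C_i is observed.
            c = r["c"]
            f = normal_pdf(r["y"], outcome_mean(r["t"], c, r["x1"], beta), sd)
            total += math.log(f) + c * math.log(pi) + (1 - c) * math.log(1.0 - pi)
        else:
            # Random arm: mixture over the two unobserved preference classes.
            f1 = normal_pdf(r["y"], outcome_mean(r["t"], 1, r["x1"], beta), sd)
            f0 = normal_pdf(r["y"], outcome_mean(r["t"], 0, r["x1"], beta), sd)
            total += math.log(pi * f1 + (1.0 - pi) * f0)
    return total

# Tiny hypothetical data set: one Choice-arm row and one Random-arm row.
beta = {"0": 0.0, "x1": [0.0], "T": 0.0, "C": 0.0, "TC": 0.0}
alpha = {"0": 0.0, "x2": [0.0]}
rows = [
    {"w": "P", "t": 1, "c": 1, "y": 0.0, "x1": [1.0], "x2": [1.0]},
    {"w": "R", "t": 0, "y": 0.0, "x1": [1.0], "x2": [1.0]},
]
ll = observed_loglik(rows, beta, alpha, sd=1.0)
```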

4.2 Estimation Using the EM Algorithm

An EM algorithm (Dempster, Laird and Rubin, 1977) can be used to calculate the maximum likelihood (ML) estimate of $\theta$ for the above model. The complete data $(Y_i, C_i, T_i, X_{1i}, X_{2i}, W_i)$ have loglikelihood

\[ \ell(Y, C \mid T, X_1, X_2, W; \theta) = \sum_i \ell(Y_i, C_i \mid T_i, X_{1i}, X_{2i}, W_i; \theta) = \sum_i \left[ \log\{f(Y_i \mid T_i, X_{1i}, C_i, W_i)\} + C_i\log(\pi_i) + (1 - C_i)\log(1 - \pi_i) \right]. \qquad (5) \]

The EM algorithm iterates between an E step, which replaces missing values of $C_i$ in (5) by their conditional expectations given the observed data, and an M step, which maximizes the expected complete-data loglikelihood (5) to yield updated parameter estimates.

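1. E Step at the kth iteration. We calculate the expected complete-data loglikelihood given $Y$, $C_{obs}$, $T$, $X_1$, $X_2$, $W$ and the current parameter estimates $\theta^{(k)}$, namely

\[ E\{\ell(Y, C \mid T, X_1, X_2, W; \theta) \mid Y, C_{obs}, T, X_1, X_2, W; \theta^{(k)}\} = \sum_{W_i = R} \sum_{m = 0,1} w_{i,m}^{(k)}\,\ell(Y_i, C_i = m \mid T_i, X_{1i}, X_{2i}; \theta) + \sum_{W_i = P} \ell(Y_i, C_i \mid T_i, X_{1i}, X_{2i}; \theta), \qquad (6) \]

where for participants in the random arm ($W_i = R$),

\[ w_{i,m}^{(k)} = p(C_i = m \mid Y_i, T_i, X_{1i}, X_{2i}; \theta^{(k)}) = \frac{f(Y_i \mid T_i, X_{1i}, C_i = m; \theta^{(k)})\,p(C_i = m \mid T_i, X_{2i}; \theta^{(k)})}{\pi_i^{(k)} f(Y_i \mid T_i, X_{1i}, C_i = 1; \theta^{(k)}) + (1 - \pi_i^{(k)}) f(Y_i \mid T_i, X_{1i}, C_i = 0; \theta^{(k)})}, \]

where $\pi_i^{(k)} = p(C_i = 1 \mid T_i, X_{2i}; \theta^{(k)})$. The E step estimates the weights $w_{i,m}^{(k)}$ in the weighted complete-data loglikelihood (5) for the random arm.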

2. M Step at the $k$th iteration. This step updates the parameter estimates to $\theta^{(k+1)}$ by maximizing the expected complete-data loglikelihood (5). We first construct an augmented data set as follows. Each observation in the random arm, with $C_i$ missing, is replaced by two "filled-in" observations, in which the missing treatment preference indicator $C_i$ is replaced by 0 and 1, respectively, and the corresponding weights $w_{i,0}$ and $w_{i,1}$ are computed using the current estimates of the parameters. The observations in the choice arm are unchanged, with weights set to one. Using this augmented data set, we fit a weighted generalized linear model for $\beta^{(k+1)}$ and $\phi^{(k+1)}$, and a weighted logistic model for $\alpha^{(k+1)}$.
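The E and M steps above can be sketched in code for the special case of a binary outcome with a logistic link. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`sigmoid`, `weighted_logistic`, `bernoulli_density`, `em_hybrid`), the single scalar covariates `x1` and `x2`, the zero starting values, and the Newton-Raphson fitting routine are all hypothetical choices made for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def weighted_logistic(X, y, w, n_iter=25):
    """Weighted logistic regression fitted by Newton-Raphson."""
    coef = np.zeros(X.shape[1])
    ridge = 1e-8 * np.eye(X.shape[1])  # tiny ridge, numerical safeguard only
    for _ in range(n_iter):
        p = sigmoid(X @ coef)
        grad = X.T @ (w * (y - p))
        hess = (X * (w * p * (1.0 - p))[:, None]).T @ X
        coef = coef + np.linalg.solve(hess + ridge, grad)
    return coef

def bernoulli_density(y, X, coef):
    """f(y | covariates) for a binary outcome under a logistic model."""
    mu = sigmoid(X @ coef)
    return np.where(y == 1, mu, 1.0 - mu)

def em_hybrid(y, t, c, x1, x2, is_random, max_iter=500, tol=1e-8):
    """EM sketch; entries of c are ignored wherever is_random is True."""
    r = np.asarray(is_random, dtype=bool)
    ch = ~r
    y, t, c, x1, x2 = (np.asarray(v, dtype=float) for v in (y, t, c, x1, x2))
    n_ch, n_r = int(ch.sum()), int(r.sum())
    # Augmented data: choice-arm rows once, random-arm rows twice (C=0, then C=1).
    stack = lambda v: np.concatenate([v[ch], v[r], v[r]])
    y_aug, t_aug, x1_aug, x2_aug = stack(y), stack(t), stack(x1), stack(x2)
    c_aug = np.concatenate([c[ch], np.zeros(n_r), np.ones(n_r)])
    XY = np.column_stack([np.ones_like(x1_aug), x1_aug, t_aug, c_aug, t_aug * c_aug])
    XC = np.column_stack([np.ones_like(x2_aug), x2_aug])
    beta, alpha = np.zeros(XY.shape[1]), np.zeros(XC.shape[1])
    r0, r1 = slice(n_ch, n_ch + n_r), slice(n_ch + n_r, None)
    loglik = []
    for _ in range(max_iter):
        # E step: posterior weights for the two filled-in copies of each random-arm row.
        pi = sigmoid(XC @ alpha)
        f = bernoulli_density(y_aug, XY, beta)
        num1 = pi[r1] * f[r1]
        num0 = (1.0 - pi[r0]) * f[r0]
        w1 = num1 / (num1 + num0)
        w = np.concatenate([np.ones(n_ch), 1.0 - w1, w1])
        # Observed-data loglikelihood at the current estimates (for monitoring).
        p_c = np.where(c_aug[:n_ch] == 1, pi[:n_ch], 1.0 - pi[:n_ch])
        loglik.append(np.sum(np.log(f[:n_ch] * p_c)) + np.sum(np.log(num1 + num0)))
        # M step: weighted fits of the outcome model and the preference model.
        beta_new = weighted_logistic(XY, y_aug, w)
        alpha_new = weighted_logistic(XC, c_aug, w)
        change = max(np.max(np.abs(beta_new - beta)), np.max(np.abs(alpha_new - alpha)))
        beta, alpha = beta_new, alpha_new
        if change < tol:
            break
    return alpha, beta, np.array(loglik)
```

The returned `loglik` trace is included because the observed-data loglikelihood should not decrease across EM iterations, which gives a simple check on an implementation. The coefficient order in `beta` is intercept, covariate, treatment, preference, and treatment-by-preference interaction.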

The resulting estimates from the EM algorithm at convergence give the ML estimates of $\theta$. For the special case of no covariates $X_{1i}$, $X_{2i}$ and a binary outcome, the ML estimates of $\theta$ transformed to the mean scale reduce to the method of moments estimates in Section 2.2. A similar EM algorithm was proposed by Ibrahim (1990) for missing categorical covariates. Our EM algorithm extends his algorithm to the hybrid randomized-preference design by estimating the weights in the random arm and fixing the weights to be one in the choice arm. We consider three ways of estimating standard errors of the ML estimates: the bootstrap, the observed information, and the approximation proposed in Ibrahim (1990). The observed information is obtained by directly computing the second derivative of the observed-data loglikelihood using a symbolic differentiation algorithm.

5 Analysis of the WTP data

We now apply the methods of Section 4 to the WTP data to estimate preference effects adjusting for covariates. The Group format is coded as 1 and the Self-Directed format as 0. Based on the previous analysis in Janevic et al. (2003), we consider the following covariates in models (1) and (2): employment status, age, total symptom impact at baseline, which is a measure of symptom severity scored between 0 and 70 (Clark et al., 1997), and SIP scores at baseline.

Table 2 presents the estimates of the regression parameters in (1) and (2) and their estimated standard errors for the outcome SIP physical score. The results from model (2) show that employed women (OR = 1.84, P = 0.046) and women with greater physical limitations at baseline (OR = 1.03, P = 0.012) are more likely to choose the Self-Directed format, suggesting that these women tend to opt for the more flexible scheduling that the Self-Directed format provides. Women with a higher total symptom impact score at baseline are less likely to choose the Self-Directed format (OR = 0.97, P = 0.013), suggesting that these women may be interested in opportunities to meet women in a similar situation, which the Group format provides. The magnitudes of these effects on treatment preference are comparable to those in Janevic et al. (2003), which includes a more detailed discussion of predictors of program format preference. In terms of the covariate effects on SIP physical score, women with a higher baseline SIP score or total symptom impact score have a higher SIP physical score at month 12; other baseline covariates are not significant (P > 0.05). The significant interaction between treatment preference and treatment assignment (P = 0.039) suggests an effect of treatment preference for this outcome, with women having a better SIP physical score when assigned their treatment of choice.

Table 3 summarizes the treatment and preference effects on the subpopulation means for SIP physical score and the other two SIP outcomes. These adjusted means are obtained by integrating over the distributions of the covariates using equation (3). The treatment and preference effects for the SIP psychological and SIP total scores have a similar pattern to that for SIP physical score, but are not statistically significant. Overall, our analysis shows limited evidence of a benefit when women receive the type of treatment they prefer. This finding addresses one of the hypotheses proposed by the investigators in the WTP study, namely that making both treatment formats available to women may be advantageous.

The last row of Table 3 shows strong preference effects on treatment adherence. The marginal causal treatment effect on adherence on the probability scale is estimated to be 0.208 (P < 0.001) for the subpopulation preferring the Group treatment, and $-0.336$ (P < 0.001) for the subpopulation preferring the Self-Directed treatment. These results show that women who prefer the Group treatment are 20.8 percentage points more likely to adhere to the treatment if assigned to the Group format than if assigned to the Self-Directed format, whereas women who prefer the Self-Directed treatment are 33.6 percentage points more likely to adhere if assigned to the Self-Directed treatment than if assigned to the Group treatment. These results are consistent with the findings in Section 2 that women are more likely to adhere to the program they prefer. The covariate-adjusted treatment effects are slightly stronger than those in Section 2.2 without covariate adjustments.

6 A Simulation Study

We conducted a simulation study to evaluate the finite sample performance of the proposed method. The design of the simulation study was similar to that of the WTP study. Each data set consisted of 1000 observations. Independent binary observations $C_i$ of the treatment preference indicator were generated using the logistic model

\[
\mathrm{logit}\{\Pr(C_i = 1)\} = \alpha_0 + X_{2i} \alpha_1,
\]

where $\alpha_0 = -1$, $\alpha_1 = 2$, and the $X_{2i}$ were generated from a uniform distribution on the interval $[0, 1]$. The outcome variable $Y_i$ was assumed to be binary and generated independently using the logistic model

\[
\mathrm{logit}\{E(Y_i)\} = \beta_0 + X_{1i} \beta_1 + T_i \beta_T + C_i \beta_C + T_i C_i \beta_{TC},
\]

where $\beta_0 = -2$, $\beta_1 = \beta_T = \beta_C = 2$, $\beta_{TC} = -2$, and the $X_{1i}$ were generated from a uniform distribution on the interval $[0, 1]$. Values of $W_i$ denoting random versus choice arm were generated from a Bernoulli distribution with $P(W_i = R) = 0.5$. The treatment assignment indicators $T_i$ were set equal to $C_i$ in the choice arm ($W_i = P$), and were generated from a Bernoulli distribution with $P(T_i = 1) = 0.5$ in the random arm ($W_i = R$). The preference indicators $C_i$ were set to missing in the random arm ($W_i = R$).
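The data-generating process just described can be sketched as follows. This is an illustrative sketch only: the function name `simulate_hybrid_trial` and the dictionary layout are hypothetical, and the preference-model defaults assume an intercept of -1 and a slope of 2, which is one reading of the values stated above.

```python
import numpy as np

def expit(z):
    return 1.0 / (1.0 + np.exp(-z))

def simulate_hybrid_trial(n=1000, seed=None,
                          alpha=(-1.0, 2.0),                  # assumed (intercept, slope)
                          beta=(-2.0, 2.0, 2.0, 2.0, -2.0)):  # (b0, b1, bT, bC, bTC)
    """Generate one data set from a doubly randomized preference design."""
    rng = np.random.default_rng(seed)
    x1 = rng.uniform(0.0, 1.0, size=n)
    x2 = rng.uniform(0.0, 1.0, size=n)
    # Treatment preference C_i from the logistic preference model.
    c = rng.binomial(1, expit(alpha[0] + alpha[1] * x2))
    # Arm indicator: True for the random arm (W_i = R), False for the choice arm.
    is_random = rng.binomial(1, 0.5, size=n).astype(bool)
    # Treatment is randomized in the random arm and equals the preference otherwise.
    t = np.where(is_random, rng.binomial(1, 0.5, size=n), c)
    b0, b1, bT, bC, bTC = beta
    y = rng.binomial(1, expit(b0 + b1 * x1 + bT * t + bC * c + bTC * t * c))
    # Preferences are unobserved (missing) in the random arm.
    c_obs = np.where(is_random, np.nan, c)
    return {"y": y, "t": t, "c_obs": c_obs, "x1": x1, "x2": x2, "is_random": is_random}

data = simulate_hybrid_trial(n=1000, seed=2006)
```

Repeating the call with different seeds gives independent replicates, to which an estimation routine can be applied in turn.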

A total of 125 simulated data sets were generated and analyzed. Table 4 presents the simulation results. The point estimates are very close to the true values and there is no evidence of bias. We compared three methods for estimating the standard errors: the bootstrap method, the observed information, and the approximation given by Ibrahim (1990). These estimated standard errors can be compared with the empirical standard errors. Our results show that the standard errors based on the observed information are closest to the empirical standard errors; the bootstrap standard errors perform similarly but slightly overestimate the standard errors of the coefficients of treatment preference and of the interaction between treatment preference and treatment assignment. The approximation given by Ibrahim (1990) seems to slightly underestimate the standard errors. This might be because the estimators of the regression parameters $\beta$ and $\alpha$ in models (1) and (2) are assumed independent in the approximation.
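Of the three standard error methods compared above, the bootstrap is the simplest to sketch generically. The example below is a hedged illustration rather than the procedure used in the paper: `bootstrap_se` is a hypothetical helper that resamples individuals with replacement and accepts any estimator function, and the toy estimator (a sample mean) stands in for the EM fit purely to keep the example short.

```python
import numpy as np

def bootstrap_se(data, estimator, n_boot=200, seed=None):
    """Nonparametric bootstrap standard errors.

    data      : dict of equal-length NumPy arrays, one entry per variable
    estimator : function mapping such a dict to a 1-D array of estimates
    """
    rng = np.random.default_rng(seed)
    n = len(next(iter(data.values())))
    replicates = []
    for _ in range(n_boot):
        # Resample individuals (rows) with replacement, keeping variables aligned.
        idx = rng.integers(0, n, size=n)
        resampled = {name: values[idx] for name, values in data.items()}
        replicates.append(estimator(resampled))
    # Standard deviation across bootstrap replicates, one value per parameter.
    return np.std(np.asarray(replicates), axis=0, ddof=1)

# Toy usage: standard error of a sample mean.
toy_rng = np.random.default_rng(0)
toy = {"y": toy_rng.normal(size=400)}
se = bootstrap_se(toy, lambda d: np.array([d["y"].mean()]), n_boot=500, seed=1)
```

In a simulation study, the average of such estimated standard errors across replicates would be set against the empirical standard deviation of the point estimates across the same replicates.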

7 Alternative Hybrid Trial Designs

We now consider some other hybrid designs using the framework developed in Section 3. A simple alternative to the DRPT is the partially randomized preference trial (PRPT) mentioned in Section 1. In this design, individuals willing to be randomized are assigned to the random arm and receive $A_R$ or $B_R$, and individuals not willing to be randomized are allowed to choose the treatment, $A_C$ or $B_C$ (Figure 4(a)). Thus the PR individuals are randomized to $A_R$ and $B_R$, the PRA individuals receive $A_C$ and the PRB individuals receive $B_C$. Figure 4(b) shows which of the cell means in Figure 2(b) can be directly estimated under this design (E); the cells denoted by E are not a priori counterfactual
