
Volume 7, Issue 1 2011 Article 25

The International Journal of

Biostatistics

Clarifying the Role of Principal Stratification

in the Paired Availability Design

Stuart G. Baker, National Institutes of Health

Karen S. Lindeman, Johns Hopkins Medical Institutions

Barnett S. Kramer, National Institutes of Health

Recommended Citation:

Baker, Stuart G.; Lindeman, Karen S.; and Kramer, Barnett S. (2011) "Clarifying the Role of

Principal Stratification in the Paired Availability Design," The International Journal of

Biostatistics: Vol. 7: Iss. 1, Article 25.

DOI: 10.2202/1557-4679.1338

Available at: http://www.bepress.com/ijb/vol7/iss1/25

©2011 Berkeley Electronic Press. All rights reserved.



Abstract

The paired availability design for historical controls postulated four classes corresponding to

the treatment (old or new) a participant would receive if arrival occurred during either of two time

periods associated with different availabilities of treatment. These classes were later extended to

other settings and called principal strata. Judea Pearl asks if principal stratification is a goal or a

tool and lists four interpretations of principal stratification. In the case of the paired availability

design, principal stratification is a tool that falls squarely into Pearl’s interpretation of principal

stratification as “an approximation to research questions concerning population averages.” We

describe the paired availability design and the important role played by principal stratification in

estimating the effect of receipt of treatment in a population using data on changes in availability of

treatment. We discuss the assumptions and their plausibility. We also introduce the extrapolated

estimate to make the generalizability assumption more plausible. By showing why the

assumptions are plausible, we show why the paired availability design, which includes principal

stratification as a key component, is useful for estimating the effect of receipt of treatment in a

population. Thus, for our application, we answer Pearl’s challenge to clearly demonstrate the value

of principal stratification.

KEYWORDS: principal stratification, causal inference, paired availability design


1. Introduction

Judea Pearl asks if principal stratification (Frangakis and Rubin, 2002) is a tool or

a goal (Pearl, 2011). In the paired availability design for historical controls (Baker

and Lindeman, 1994), which involves a type of principal stratification as a key

component, principal stratification is a tool to achieve the goal of estimating the

effect of receipt of treatment in a population. Because many readers may not be

familiar with the paired availability design, we describe it in some detail. Special

emphasis is placed on assumptions for using historical controls, estimating

treatment effect in the principal strata of interest, and generalizing the estimate

from the principal strata of interest to the general population. For the latter, we

propose a new estimate that increases the plausibility of generalizing to the

population. We also discuss the related use of principal stratification with all-or-

none compliance. With this background, we address Judea Pearl’s critique of the

value of principal stratification as it relates to the paired availability design.

2. Overview of paired availability design

The paired availability design uses historical controls to estimate the effect of

receipt of a new treatment if all persons received the new treatment instead of the

old treatment. In terms of statistical inference, an ideal study would randomize

subjects to receive either new or old treatment. However, sometimes

randomization is not feasible or desirable, such as when there are strongly held

views about the merits of treatment or when blinding of patients to treatments is

not feasible. In this situation the paired availability design can play an important

role in estimating the effect of receipt of treatment.

The standard form of the paired availability design uses data from two

time periods in each of many medical centers providing data (Baker and

Lindeman, 1994; Baker et al., 2001). A modified form can also use data from

more than two time periods in any particular medical center (Baker and

Lindeman, 2001). Using multiple medical centers instead of a single medical

center reduces systematic bias. The analysis is complicated because the

change in availability of treatment between time periods differs among medical

centers.

In order to estimate the effect of receipt of treatment, as opposed to the

effect of a change in availability of treatment, Baker and Lindeman (1994)

proposed a four-category potential outcomes model for receipt of treatment if

arrival would have occurred in either period. Their model involved reasonable

assumptions for estimation. Baker and Lindeman (1994) also proposed a

likelihood formulation. This type of model and the same plausible assumptions




for estimation had also been independently proposed by various investigators in

the context of all-or-none compliance in randomized trials. Permutt and Hebel

(1989) proposed a version without an explicit mathematical formulation. Imbens

and Angrist (1994) followed by Angrist et al. (1996) proposed a version based on

instrumental variables. Cuzick et al. (1997) proposed a version in the context of

cancer screening trials. Frangakis and Rubin (2002) extended this model to other

settings and called it principal stratification. If reasonable assumptions hold, the

principal stratification model in Baker and Lindeman (1994) yields an unbiased

estimate of the effect of receipt of treatment among subjects in some principal

strata. An additional assumption is needed to appropriately apply this estimate to

all eligible persons.
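
To illustrate the four-category model, the following minimal sketch enumerates the four classes defined by the treatment (old or new) a participant would receive if arrival occurred in each of the two time periods. It then computes a ratio-type estimate of the effect of receipt of treatment. The function names, dictionary keys, and numerical inputs are hypothetical. The ratio form is an assumption of this sketch: it presumes that the change in outcome between periods arises only among participants whose treatment would change from old to new, and that no participant would change from new to old. It is not a quotation of the formulas in the cited papers.

```python
from itertools import product

TREATMENTS = ("old", "new")


def principal_strata():
    """List the four classes as pairs:
    (treatment if arrival in period 1, treatment if arrival in period 2)."""
    return [
        {"if_period_1": t1, "if_period_2": t2}
        for t1, t2 in product(TREATMENTS, repeat=2)
    ]


def ratio_estimate(p_outcome_1, p_outcome_2, f_new_1, f_new_2):
    """Assumed ratio form (see lead-in): change in outcome probability
    divided by change in the fraction receiving the new treatment."""
    change_in_receipt = f_new_2 - f_new_1
    if change_in_receipt == 0:
        raise ValueError("no change in receipt of treatment between periods")
    return (p_outcome_2 - p_outcome_1) / change_in_receipt


strata = principal_strata()
for stratum in strata:
    print(stratum)

# Hypothetical inputs for one medical center: the outcome probability rises
# from 0.10 to 0.13 while the fraction receiving new treatment rises
# from 0.20 to 0.50.
estimate = ratio_estimate(0.10, 0.13, 0.20, 0.50)
print(estimate)
```

Under the stated assumptions, the estimate applies only to the class that would receive old treatment in the first period and new treatment in the second, which is why an additional assumption is needed to apply it to all eligible persons.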

We discuss the paired availability design and the role of principal

stratification in the context of the original example related to obstetric

anesthesiology (Baker and Lindeman, 1994, 2001). Participants were women in

labor arriving in one or more time periods at various medical centers. The goal

was to estimate the effect of receiving versus not receiving epidural analgesia on

the probability of a Cesarean section (C/S). The paired availability design relies

on three types of assumptions, which we discuss in turn: (1) assumptions needed

to analyze data from different time periods as data from different randomization

groups, (2) assumptions needed to estimate the effect of treatment in some

principal strata, and (3) an assumption that the estimated treatment effect in some

principal strata is a good estimate of the treatment effect for all eligible persons in

the entire population. We modify some of the assumptions listed in Baker et al.

(2001) and Baker and Lindeman (2001).

3. Assumptions to analyze time periods as randomization groups

The paired availability design requires the following four assumptions that justify

analyzing data from the two time periods as if they were data from two

randomization groups.

Assumption 1. Stable Ancillary Care. Between the two time periods, there

are no systematic changes in patient management unrelated to the treatment of

interest that would affect the probability of outcome (after any adjustment).

Assumption 2. Stable Disease Natural History. Between the two time

periods, there are no systematic changes in the timing of disease-related events or

the spectrum of manifestations of disease in the absence of treatment.

Assumption 3. Stable Population. Between the two time periods, there

are no changes in the characteristics of the eligible population that would affect

the probability of outcome.


Assumption 4. Stable Evaluation. Eligibility criteria and definitions of

outcome are constant over time.

In the application to obstetric anesthesiology, Assumption 1 says that,

between the two time periods, there are no systematic changes in obstetric

practice unrelated to epidural analgesia that could affect the probability of C/S.

Even changes in billing or reimbursement rules can affect the validity

of this assumption. Assumption 1 is plausible because medical centers were

situated in various geographic locations and data collection took place at various

times. If data were available from additional medical centers with no change in

availability of epidural analgesia, investigators could estimate the change in the

probability of C/S due to changes in care unrelated to the change in availability of

epidural analgesia. Then Assumption 1 would say that this estimate is sufficient

to adjust for any possible bias due to systematic changes in care.
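
The adjustment described above can be sketched as follows, assuming a simple subtraction on the probability scale. The text does not specify the form of the adjustment, so the subtraction, the function names, and all numbers are hypothetical.

```python
from statistics import mean


def secular_change(no_change_centers):
    """Average change in outcome probability (period 2 minus period 1)
    across centers with no change in availability of treatment."""
    return mean([p2 - p1 for p1, p2 in no_change_centers])


def adjusted_change(p_outcome_1, p_outcome_2, no_change_centers):
    """Observed change in a study center minus the estimated change due to
    care unrelated to availability (assumed additive adjustment)."""
    return (p_outcome_2 - p_outcome_1) - secular_change(no_change_centers)


# Hypothetical (period 1, period 2) probabilities of C/S in two centers
# where availability of epidural analgesia did not change.
no_change_centers = [(0.10, 0.12), (0.15, 0.16)]

# Hypothetical study center in which availability increased.
adjusted = adjusted_change(0.10, 0.16, no_change_centers)
print(adjusted)
```

In this sketch, Assumption 1 amounts to saying that the value returned by `secular_change` captures all systematic changes in care that affect the probability of C/S.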

We cannot think of any examples where Assumption 2 would be seriously

questioned in the application involving obstetric anesthesiology. In other settings,

Assumption 2 could be violated by an increase or decrease in prevalence of

resistant bacteria between the two time periods.

In the application to obstetric anesthesiology, Assumption 3 says that,

between the two time periods, there are no changes in the characteristics of the

eligible population that would affect the probability of C/S. Assumption 3 is

plausible because medical centers were restricted to those serving a closed

population, such as an army medical center or the only hospital in a geographic

region. In other words, Assumption 3 is plausible because it was unlikely that a

woman in labor would go to a considerably less convenient hospital in order to

receive epidural analgesia.

In the application to obstetric anesthesiology, Assumption 4 was plausible

because there was no change between the two time periods in the eligibility

criterion of being in labor and the determination of the outcome of Cesarean

section. In contrast, in an application in oncology where eligibility is determined

by stage of cancer, the use of a new or more sensitive radiologic test to stage

cancer in the second time period may artifactually improve prognosis of each

stage even if the treatment in the two time periods was the same (Feinstein et al.,

1985). Also in the field of oncology, the definition of an outcome of disease

progression can change over time, for example with the increasing use of

"biochemical failure" rather than symptoms of recurrence, such as bone pain.
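
The staging example above can be made concrete with a small arithmetic sketch. The five survival probabilities and the stage labels are hypothetical. Treatment is the same in both periods, and the only change is that a more sensitive test moves one patient from the earlier stage to the later stage.

```python
from statistics import mean

# Hypothetical survival probabilities for the same five patients.
# Period 1: staging with the older test.
period_1 = {"stage_I": [0.9, 0.9, 0.7], "stage_II": [0.5, 0.3]}
# Period 2: the more sensitive test reclassifies the 0.7 patient to stage II.
period_2 = {"stage_I": [0.9, 0.9], "stage_II": [0.7, 0.5, 0.3]}


def stage_means(staging):
    """Mean survival probability within each stage."""
    return {stage: mean(values) for stage, values in staging.items()}


def overall_mean(staging):
    """Mean survival probability over all patients, ignoring stage."""
    return mean([p for values in staging.values() for p in values])


means_1 = stage_means(period_1)
means_2 = stage_means(period_2)
print(means_1, overall_mean(period_1))
print(means_2, overall_mean(period_2))
```

Here the mean within each stage is higher in the second period even though no patient's prognosis changed and the overall mean is identical, which is the kind of artifact that would violate Assumption 4.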
