MODELING MULTISTATE SURVIVAL
ILLUSTRATED IN
BONE MARROW TRANSPLANTATION
John P. Klein, Ph.D.
and
Chunlin Qian, Ph.D.
Technical Report 15
August 1996
Division of Biostatistics
Medical College of Wisconsin
8701 Watertown Plank Road
Milwaukee WI 53226
Phone: (414) 456-8280
KEY WORDS: Proportional hazards models, Time Dependent Covariates, Left Truncation
Abstract
In many applications of survival analysis techniques there are intermediate events whose
occurrence may affect a patient's prognosis. The occurrence of these intermediate events can be
modeled using a proportional hazards model with time dependent covariates or by a model using
distinct hazards for each event that allows for nonproportional hazard rates when other
intermediate events occur. Of interest to clinical investigators are not the estimates of these
transition intensities, but rather synthesized estimates of predictive probabilities of a patient's final
response given their current history of occurrence of these intermediate events. We show, using
an example of bone marrow transplantation taken from the data base of the International Bone
Marrow Transplant Registry, that these predictive probabilities are equivalent to certain transition
probabilities in a multistate Markov model. We show how, by using a combination of proportional
hazards regression and left truncated proportional hazards regression, one can estimate model
parameters and the desired predictive probabilities. Asymptotic properties of the estimators are
discussed. Finally, we show how these predictive probabilities can be used to study the effects of
treatment strategies which alter the rate at which some intermediate events occur.
1. INTRODUCTION
In many applications of survival analysis techniques the ultimate outcome of a patient’s
treatment depends on the occurrence and timing of some intermediate events. This is particularly
true when studying the recovery process of a patient from a bone marrow transplant for leukemia.
Here a patient can experience one of several terminal events, such as death in remission,
recurrence of their leukemia or simply death. As the patient recovers from their transplant a
number of intermediate events may occur that have an influence on their eventual prognosis.
Examples of such intermediate events are the return of the patient’s platelets to a “normal” level, the
development of various types of infections, the occurrence of acute or chronic graft-versus-host
disease, etc.
A natural way to model complex experiments such as this is by using a multistate model.
Andersen et al (1991) (see also Andersen et al 1993) have studied such models using a finite state
Markov process model where the hazard rates for each possible transition in the multistate model
are modeled by a separate Cox (1972) proportional hazards model. Here each of the transition
probabilities is estimated using a (left truncated) Cox model. In a multistate model with two
intermediate events and two terminal events this entails fitting 12 separate Cox models.
Recently, Klein et al (1993) have suggested an alternative approach to multistate modeling.
They suggest fitting a Cox model to each of the events with time dependent covariates used to
model the timing of the intermediate events that precede the event of interest. In a multistate model
with two intermediate events and two terminal events this entails fitting 4 separate Cox models.
This model is discussed in Section 3.
The Klein and Andersen approaches are two extremes of how one can model multistate
survival. In this report we shall examine how one may model multistate survival experiments
where some of the transition rates are assumed to be proportional to others. This general model is
discussed in Section 4.
Once the transition rates are modeled it is necessary to synthesize these rates to provide
predictions of the patient's eventual prognosis. The patient’s prognosis is a dynamic entity that
depends on their history at a given point in time. The models we fit allow us to estimate a series of
predictive probabilities based on potential patient histories which may be observed at some time t.
These patient histories include the information known on the patient at entry to the study (the fixed-
time covariates) and the knowledge of when the intermediate events have occurred.
Recently, Arjas and Eerola (1993) (cf. Eerola (1993)) have described a framework of
“predictive causality” for longitudinal studies that can be used to illustrate how the timing of the
occurrences of the time dependent covariates in a patient’s recovery process changes the prediction
of his or her final prognosis. For a given patient, let $(T,X)=\{(T_m,X_m);\ m \ge 1\}$ denote the ordered
times, $0 < T_1 < T_2 < \dots$, at which events occur during a patient’s recovery from transplantation,
with description, $X_m$, of what has happened to the patient at time $T_m$. In the bone marrow
transplantation recovery process $X_m$ may denote return of the platelets to normal levels, the
development of acute GVHD, or the occurrence of relapse, or death. A patient history, $H_t$, at
some time t post-transplantation consists of all the pre-transplantation information available on the
patient (the fixed-time covariates) and the set of marked points, $\{(T_m,X_m);\ T_m < t\}$, reflecting
what has happened to the patient up to this point in time. We consider the prediction that some
event, W, such as relapse, occurs in time interval, E ($W \in E$), for example within two years
post-transplantation. The predicted probability that $W \in E$ should depend on the patient’s history at the
time t at which this prediction is made. We define a prediction process by
$$\mu_t(E) = P[W \in E \mid H_t].$$
The prediction process allows us to examine the effect of time dependent (and fixed-time)
covariates on the predicted prognosis of a given patient in three ways. First, we can fix the time t
and the history, H, for a patient up to time t and see how the predicted probability of W being in E
changes as the prediction interval E varies. In the bone marrow transplantation example this will
allow us to estimate how the probability of relapse within τ years post-transplantation, changes as τ
varies for a patient with a given history at time t. That is, given a particular history at a given time
for a patient we can provide a prognosis for this patient at times in the future. Second, we can fix a
potential history, H, for a patient and the prediction interval, E, and see how $\mu_t(E)$ changes as t
increases. For example, for a patient with a given history of development of acute GVHD or
platelet recovery, this will give insight into how more and more of a patient history allows us to
refine our prediction of the chance that he or she would relapse within the first two years post-
transplantation, say. Arjas and Eerola call this the learning effect. Finally, we can fix the
prediction interval, E, and the time at which we observe the patient history, t, and look at the
prediction process for patients with different histories. This allows us to study directly the effect
of the timing of the intermediate endpoints on the prognosis of future patients. In the bone
marrow transplantation recovery process this may suggest to the physician that, if certain events
have not occurred by a given time, some additional therapy should be given, based on this model.
The example that is used throughout this paper is from a multicenter bone marrow
transplantation study of patients given HLA-identical sibling transplants, conducted between
1985 and 1990, for patients with acute lymphoblastic leukemia (ALL) or acute myelogenous
leukemia reported to the International Bone Marrow Transplant Registry. The data set consists of
1823 patients with observation times ranging from 10 days to 2236 days. 957 patients were alive
and disease free at their last observation time, 442 died in remission and 424 patients were
observed to relapse. In Section 2 a multistate model for this data is presented and in Section 5 we
shall present some empirical estimates of the predicted probabilities.
2. BONE MARROW TRANSPLANTATION
Bone marrow transplantation is a standard treatment for acute leukemia. Recovery
following bone marrow transplantation is a complex process. Prognosis for recovery may depend
on risk factors known at the time of transplantation, such as patient's or donor's age and sex, the
stage of initial disease, the time from diagnosis to transplantation, and so on. The final prognosis
may change as the patient’s post-transplantation history develops with the occurrence of events
during the recovery process, such as the development of acute or chronic graft-versus-host disease
(GVHD), the return of the platelet count to normal levels, the return of granulocytes to normal
levels, or the development of infections. Transplantation can be considered a failure when a
patient’s leukemia returns (relapse) or when he or she dies while in remission (treatment-related
death). Of interest is how the probabilities of relapse (denoted by R) and treatment-related death
(denoted by D), as well as leukemia-free survival (the probability of being alive and in remission),
depend on the pre-transplantation (fixed-time covariates) and post-transplantation (time dependent
covariates) patient history.
Figure 1 shows a simplified diagram of a patient’s recovery process based on two
intermediate events which may occur in the recovery process. These intermediate events are the
development of acute GVHD which typically occurs within the first 100 days following
transplantation (denoted by an A), and the recovery of the platelet count to a self-sustaining level
$\ge 40 \times 10^9$/L (called platelet recovery in the sequel and denoted by a P). Immediately following
transplantation, patients have depressed platelet counts and are free of acute GVHD. At some point
in time they may develop acute GVHD or have their platelets recover, at which time their prognosis
(probabilities of treatment-related death or relapse at some future time) may change. These events
may occur in any order or a patient may die or relapse without any of these events occurring.
Patients may then experience the other event, which again modifies their prognosis, or they may
die or relapse.
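[Figure 1: diagram of the recovery process, starting from Transplant and passing through the
intermediate events P and A to the terminal events D and R; the arrows are labeled with the
transitions 12, 13, 15, 16, 24, 25, 26, 34, 35, 36, 45 and 46.]
FIGURE 1
Multistate Model For Bone Marrow Transplant Recovery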
Figure 1 shows that there are 12 possible transitions that can occur in this multistate model.
There are six possible states in which a patient may be at any given time, t. These states are:
1 - $\{T_P \ge t,\ T_A \ge t,\ T_D \ge t,\ T_R \ge t\}$ (Alive disease free without having GVHD or having had
platelets recovered)
2 - $\{T_P < t,\ T_A \ge t,\ T_D \ge t,\ T_R \ge t\}$ (Alive disease free without having GVHD with platelets
recovered)
3 - $\{T_P \ge t,\ T_A < t,\ T_D \ge t,\ T_R \ge t\}$ (Alive disease free without platelets recovered having
experienced GVHD)
4 - $\{T_P < t,\ T_A < t,\ T_D \ge t,\ T_R \ge t\}$ (Alive disease free with platelets recovered having
experienced GVHD)
5 - $\{T_D < t,\ T_R \ge t\}$ (Dead prior to relapse)
6 - $\{T_D \le t,\ T_R < t\}$ (Relapsed)
3. PROPORTIONAL HAZARDS MODEL
In this section we shall present a basic model for multistate survival studies based on a
series of Cox regression analyses using time dependent covariates. To model survival we assume
that an individual is at risk of experiencing any one of the events in some set e. This set consists of both the
intermediate events which may affect a patient's eventual prognosis and the terminal events. In
the bone marrow transplant example the set e is {A, P, R, D}, where A is the event GVHD has
occurred, P is the event the platelets have recovered, R is the event relapsed and D is the event
died.
From the events in the set e we can define a set of states s = {1,2,...,p}. Each element of s
tells us which final event has occurred or what combination of intermediate events has occurred.
In the transplant example there are six states listed in the previous section.
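The mapping from event times to states can be made concrete with a short sketch. The function name `state_at` and the use of `math.inf` for "not yet observed" are illustrative assumptions, not notation from the report; the inequalities follow the state definitions of Section 2.

```python
import math

def state_at(t, t_p=math.inf, t_a=math.inf, t_d=math.inf, t_r=math.inf):
    """Return the state (1-6) occupied at time t.

    Each argument is the time of the corresponding event, or math.inf if the
    event has not been observed. Relapse is checked first because state 6 is
    defined by T_R < t alone.
    """
    if t_r < t:
        return 6              # relapsed
    if t_d < t:
        return 5              # dead prior to relapse
    platelets = t_p < t       # platelets recovered before t
    gvhd = t_a < t            # acute GVHD developed before t
    if platelets and gvhd:
        return 4
    if gvhd:
        return 3
    if platelets:
        return 2
    return 1
```

For instance, a patient whose platelets recovered on day 20 and who developed acute GVHD on day 30 is in state 4 on day 50.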
For a given model only certain transitions are possible. We let t be the set of possible
transitions. In the transplant example t has twelve elements as shown in Figure 1. That is,
t = {12, 13, 15, 16, 24, 25, 26, 34, 35, 36, 45, 46}. For any event X ∈ e we define t(X) as the
set of transitions into event X that are possible. In our example t(P) = {12, 34}, t(A) = {13, 24},
t(D) = {15, 25, 35, 45}, and t(R) = {16, 26, 36, 46}.
For any event, X, in e we define the ancestor set a(X) as the set of intermediate events that
may happen prior to the occurrence of the event X. In our example we have a(P)= {A}, a(A)=
{P} and a(R)= a(D) = {A, P}.
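As a check on the bookkeeping, the transition sets t(X) can be derived mechanically from the state definitions rather than listed by hand. The sketch below is illustrative only; the names `TRANSIENT`, `ABSORBING` and `transitions_into` are assumptions introduced here. Each transient state is encoded by whether platelets have recovered and whether acute GVHD has occurred, so the occurrence of A in state 2 leads to state 4.

```python
# State -> (platelets recovered, acute GVHD occurred) for the transient states.
TRANSIENT = {1: (False, False), 2: (True, False), 3: (False, True), 4: (True, True)}
ABSORBING = {"D": 5, "R": 6}

def transitions_into(event):
    """Return the transitions 'ij' that correspond to the occurrence of `event`."""
    result = []
    for i, (p, a) in TRANSIENT.items():
        if event in ABSORBING:
            result.append(f"{i}{ABSORBING[event]}")
        elif event == "P" and not p:
            j = next(k for k, v in TRANSIENT.items() if v == (True, a))
            result.append(f"{i}{j}")
        elif event == "A" and not a:
            j = next(k for k, v in TRANSIENT.items() if v == (p, True))
            result.append(f"{i}{j}")
    return result

T_BY_EVENT = {x: transitions_into(x) for x in "PADR"}
ALL_TRANSITIONS = sorted(ij for ts in T_BY_EVENT.values() for ij in ts)
```

Under this encoding the four sets together contain twelve transitions, matching the count read off Figure 1.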
To model the transition rates for this model we shall use a proportional hazards regression
model. For each event, X, in e we fit a proportional hazards regression model which includes the
fixed time covariates specific to the event as well as a time dependent covariate for each of the events
in the ancestor set of X. Let $Z_F$ be the vector of fixed time covariates that have an influence
on any event in e and let $\beta_{FX}$ be the vector of risk coefficients for these covariates for the event X.
Note that if a fixed covariate has no effect on the timing of event X then the risk coefficient for that
factor is set to 0. The model for the hazard rate of the time to event X is given by
$$\lambda_X(t \mid Z_F) = \lambda_{oX}(t)\,\exp\Big\{\beta_{FX} Z_F + \sum_{X' \in a(X)} \beta_{X'X}\, I[T_{X'} < t]\Big\}. \qquad (3.1)$$
Here I[] is the indicator function and $\beta_{X'X}$ is the risk coefficient for the effect of the occurrence of
event X' on the time to event X. The baseline hazard rate, $\lambda_{oX}(t)$, can be different for distinct
levels of some fixed covariates although for simplicity we shall consider the unstratified case in the
sequel. The parameters in (3.1) can be estimated from any standard Cox regression package.
Using the model (3.1) the hazard rate for any of the transitions in the set t can be modeled.
Specifying a transition determines X and the values to be assigned to the indicators $I[T_{X'} < t]$ for any
intermediate event. For example,
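$$\lambda_{15}(t \mid Z_F) = \lambda_{oD}(t)\,\exp\{\beta_{FD} Z_F\},$$
$$\lambda_{25}(t \mid Z_F) = \lambda_{oD}(t)\,\exp\{\beta_{FD} Z_F + \beta_{PD}\},$$
$$\lambda_{35}(t \mid Z_F) = \lambda_{oD}(t)\,\exp\{\beta_{FD} Z_F + \beta_{AD}\},$$
and
$$\lambda_{45}(t \mid Z_F) = \lambda_{oD}(t)\,\exp\{\beta_{FD} Z_F + \beta_{PD} + \beta_{AD}\}.$$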
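The way a transition fixes the indicator values can be sketched in a few lines. Everything named here (`ANCESTORS`, `OCCURRED`, `log_relative_hazard`) is an illustrative assumption, and the coefficients in the usage lines are hypothetical placeholders rather than estimates from the report. The function returns the exponent of the proportional hazards model for leaving a given state through a given event.

```python
ANCESTORS = {"P": ["A"], "A": ["P"], "D": ["P", "A"], "R": ["P", "A"]}
# Intermediate events that have already occurred in each transient state.
OCCURRED = {1: set(), 2: {"P"}, 3: {"A"}, 4: {"P", "A"}}

def log_relative_hazard(state, event, beta_time_dep, beta_fixed=(), z_fixed=()):
    """Exponent of the hazard model for leaving `state` through `event`.

    beta_time_dep maps (ancestor, event) pairs to risk coefficients;
    beta_fixed and z_fixed are the fixed-covariate coefficients and values.
    """
    lp = sum(b * z for b, z in zip(beta_fixed, z_fixed))
    for ancestor in ANCESTORS[event]:
        if ancestor in OCCURRED[state]:      # indicator I[T_X' < t] equals 1
            lp += beta_time_dep[(ancestor, event)]
    return lp

# HYPOTHETICAL coefficients, for illustration only.
beta = {("P", "D"): -1.0, ("A", "D"): 0.5}
lp_45 = log_relative_hazard(4, "D", beta)    # beta_PD + beta_AD
```

The sketch does not check that the event is still possible from the given state; for example, P cannot occur again in state 2.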
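For any transition, ij, we define the cumulative transition rate as
$$\Lambda_{ij}(t \mid Z_F) = \int_0^t \lambda_{ij}(u \mid Z_F)\,du, \quad i \ne j,\ ij \in t;$$
$$\Lambda_{ij}(t \mid Z_F) = 0 \quad \text{if } i \ne j,\ ij \notin t; \text{ and}$$
$$\Lambda_{ii}(t \mid Z_F) = -\sum_{j \in s,\ j \ne i} \Lambda_{ij}(t \mid Z_F), \quad i \in s.$$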
Since $\Lambda_{ij}(t \mid Z_F)$ is absolutely continuous for any i, j ∈ s it follows that the matrix
$\Lambda = (\Lambda_{ij})_{p \times p}$ is the transition intensity of a Markov process with state space
s = {1,...,p} (see Andersen et al (1993), pp 92-93). The transition probability matrix of this Markov
process is given by
$$P[s,t \mid Z_F] = \prod_{s < u \le t} [I + d\Lambda(u \mid Z_F)], \qquad (3.2)$$
where $\prod$ is the product-integral (cf. Gill and Johansen (1990) for details on the matrix product
integral) and I is the p×p identity matrix. This transition probability matrix serves as the basis for
making an inference about a patient's eventual prognosis given their current history.
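When the cumulative transition rates are step functions, as they are once estimated, the product-integral reduces to a finite matrix product over the jump times. The following sketch illustrates this with NumPy; the helper names `increment_matrix` and `product_integral` and the three-state example with its jump sizes are assumptions made for illustration, not part of the report.

```python
import numpy as np

def increment_matrix(p, jumps):
    """Build dLambda(u) from a dict {(i, j): jump size} using 1-based states."""
    d_lambda = np.zeros((p, p))
    for (i, j), size in jumps.items():
        d_lambda[i - 1, j - 1] = size
    # Diagonal entries make every row sum to zero.
    d_lambda[np.diag_indices(p)] = -d_lambda.sum(axis=1)
    return d_lambda

def product_integral(p, increments):
    """Multiply (I + dLambda(u)) over the ordered jump times u in (s, t]."""
    P = np.eye(p)
    for d_lambda in increments:
        P = P @ (np.eye(p) + d_lambda)
    return P

# HYPOTHETICAL three-state example: 1 -> 2, 1 -> 3 and 2 -> 3, two jump times.
increments = [
    increment_matrix(3, {(1, 2): 0.10, (1, 3): 0.05}),
    increment_matrix(3, {(1, 2): 0.10, (2, 3): 0.20}),
]
P = product_integral(3, increments)
```

Each row of the result sums to one, and the diagonal entry for state 1 is the product of the two "stay" factors, 0.85 × 0.90.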
To estimate the transition probability matrix the required Cox models are fit and the
estimators of β are obtained. Breslow's estimators (Breslow 1972) of the baseline hazard rates are
then computed and substituted into (3.2). For the bone marrow transplant example this yields the
following estimators of the predicted probabilities (here we shall ignore the dependence on $Z_F$ for
notational convenience):
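$$\hat P_{ii}(s,t) = \prod_{s < u \le t} \Big\{1 - \sum_{j:\, i<j} \Delta\hat\Lambda_{ij}(u)\Big\}, \quad i = 1, 2, 3, 4;$$
$$\hat P_{ij}(s,t) = \sum_{s < u \le t} \hat P_{ii}(s,u-)\,\hat P_{jj}(u,t)\,\Delta\hat\Lambda_{ij}(u), \quad ij = 12, 13, 24, 34, 45, 46;$$
$$\hat P_{ij}(s,t) = \sum_{s < u \le t} \hat P_{ii}(s,u-)\,\big[\Delta\hat\Lambda_{ij}(u) + \hat P_{4j}(u,t)\,\Delta\hat\Lambda_{i4}(u)\big], \quad ij = 25, 26, 35, 36;$$
and
$$\hat P_{1j}(s,t) = \sum_{s < u \le t} \hat P_{11}(s,u-)\,\big[\Delta\hat\Lambda_{1j}(u) + \hat P_{2j}(u,t)\,\Delta\hat\Lambda_{12}(u) + \hat P_{3j}(u,t)\,\Delta\hat\Lambda_{13}(u)\big], \quad j = 4, 5, 6.$$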
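A numerical check can confirm that explicit sums of this kind agree with the matrix product over the jump times. The sketch below uses hypothetical jump sizes for the twelve transitions (chosen only so that each row stays a valid probability vector) and compares the closed forms for the "stay" probability, for a directly reached state, and for a state reached either directly or through state 4, with the corresponding entries of the matrix product. All function names are illustrative assumptions.

```python
import numpy as np

# Allowed transitions of the six-state model (1-based state labels).
ALLOWED = [(1, 2), (1, 3), (1, 5), (1, 6), (2, 4), (2, 5), (2, 6),
           (3, 4), (3, 5), (3, 6), (4, 5), (4, 6)]

def toy_increments(n_times=5):
    """HYPOTHETICAL jump sizes, used only to exercise the formulas."""
    out = []
    for k in range(n_times):
        d = np.zeros((6, 6))
        for i, j in ALLOWED:
            d[i - 1, j - 1] = 0.01 * (k + 1) + 0.001 * i * j
        d[np.diag_indices(6)] = -d.sum(axis=1)
        out.append(d)
    return out

def stay(d, i, lo, hi):
    """P_ii over jump indices lo..hi-1: product of 1 - sum_j dLambda_ij(u)."""
    prob = 1.0
    for k in range(lo, hi):
        prob *= 1.0 + d[k][i - 1, i - 1]   # the diagonal holds minus the row sum
    return prob

def direct(d, i, j, lo, hi):
    """P_ij when state j can only be entered from state i in one step."""
    return sum(stay(d, i, lo, k) * d[k][i - 1, j - 1] * stay(d, j, k + 1, hi)
               for k in range(lo, hi))

def via_state_4(d, i, j, lo, hi):
    """P_ij for ij = 25, 26, 35, 36: reached directly or through state 4."""
    return sum(stay(d, i, lo, k)
               * (d[k][i - 1, j - 1] + direct(d, 4, j, k + 1, hi) * d[k][i - 1, 3])
               for k in range(lo, hi))

d = toy_increments()
P = np.eye(6)
for d_lambda in d:
    P = P @ (np.eye(6) + d_lambda)
```

In the code, index `k` plays the role of the jump time u, `stay(d, i, lo, k)` corresponds to the probability of remaining in state i up to just before u, and `stay(d, j, k + 1, hi)` to remaining in state j over (u, t].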
The asymptotic distribution of $\hat P[s,t \mid Z_F]$ can be obtained by basic counting process
techniques. Details are found in Qian (1995). The basic result is as follows (here for ease of
exposition we have suppressed the dependence on the fixed covariates, $Z_F$):
Theorem 1 Under suitable regularity conditions each of the elements of the random matrix
$\sqrt{n}\,\{\hat P[s,t \mid Z_F] - P[s,t \mid Z_F]\}$ converges weakly to a zero-mean Gaussian martingale with
covariance function given by
$$\mathrm{Cov}\big(\sqrt{n}\,\hat P_{ij}(s,t),\ \sqrt{n}\,\hat P_{km}(s,t)\big) = \sum_{X \in e} \left\{ \int_s^t \frac{F_{ij,X}(s,u,t)\,F_{km,X}(s,u,t)}{s_X^{(0)}(\beta_X,u)}\, d\Lambda_{oX}(u) + G'_{ij,X}\,\Sigma_X^{-1}\,G_{km,X} \right\},$$
where
$$F_{ij,X}(s,u,t) = \sum_{\substack{gh \in t(X) \\ i \le g < h \le j}} D_{ighj,X}(s,u,t), \quad i,j \in s;$$
$$G_{ij,X}(s,t) = \int_s^t \sum_{\substack{gh \in t(X) \\ i \le g < h \le j}} D_{ighj,X}(s,u,t)\,\big[Z_{gh} - e_X(\beta_X,u)\big]\, d\Lambda_{oX}(u), \quad i,j \in s;$$
$$D_{ighj,X}(s,u,t) = \exp\{\beta_X Z_{gh}\}\, P_{ig}(s,u-)\,\big[P_{hj}(u,t) - P_{gj}(u,t)\big], \quad i,j,g,h \in s;$$
$$s_X^{(0)}(\beta_X,u) = \sum_{l=1}^{n} \exp\{\beta_X Z_{Xl}(u)\}\, Y_{Xl}(u);$$
$$e_X(\beta_X,u) = \frac{\sum_{l=1}^{n} Z_{Xl}(u)\, \exp\{\beta_X Z_{Xl}(u)\}\, Y_{Xl}(u)}{s_X^{(0)}(\beta_X,u)};$$
and $\Sigma_X$ is the covariance matrix of the estimates of $\beta_X$.
Here $Z_{jk}$ is the union of the set of fixed covariates with a set of indicator covariates that tell
us that an individual is in state j at time t. $Y_{Xl}(t)$ is the indicator that individual l is at risk for event
X at time t, and $Z_{Xl}(t)$ is the covariate vector for event X for individual l at time t.
Estimators of the variability of the predicted probabilities are obtained by substituting the
appropriate estimator into the covariance in Theorem 1. In particular we have that the variance of
$\hat P_{ij}(s,t)$ is estimated consistently by
$$\sum_{X \in e} \left\{ \int_s^t \left[\frac{\hat F_{ij,X}(s,u,t)}{S_X^{(0)}(\beta_X,u)}\right]^2 dN_X(u) + \hat G'_{ij,X}\, i_X^{-1}(\hat\beta_X)\, \hat G_{ij,X} \right\}, \qquad (3.3)$$
where $dN_X(t)$ is the number of type X events occurring at time t and $i_X$ is the observed information
matrix for the regression estimates for event X.
4. CHILD-EVENT MODELS
The model constructed in Section 3 assumes that for any event X in e the hazard rates of
any two X transitions ij, km ∈t(X) are proportional. This is a testable hypothesis that may fail to
be true in some circumstances. In this section we shall look at models that relax this assumption.
To relax this proportionality assumption we consider models with time dependent
stratification. Suppose we can divide the ancestor set a(X) into two disjoint sets $a_s(X)$ and $a_c(X)$.
Here $a_s(X)$ is the set of ancestors of X for which a time dependent stratification will be used and
$a_c(X)$ is the set of ancestors for which the proportional hazards modeling will be used. Let
$m(X) = 2^{|a_s(X)|}$, that is, 2 to the power of the number of elements in $a_s(X)$. Here m(X) is the total
number of distinct baseline hazard rates to be fit in the model. Number the m(X) baseline hazard rates
from (0,...,0) to (1,...,1). At an event time $T_X$ we shall call an event a type $X_h$ event if
$h = (I[T_{X'} < t],\ X' \in a_s(X))$. Thus we have created m(X) "child-events", $X_h$, from each
parent-event X. The $X_h$ transition set is naturally
$t(X_h) = \{ij \in t(X):\ h = (I[T_{X'} < t],\ X' \in a_s(X))$ as determined by state $i\}$.
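The partition of a parent event's transitions into child events can be sketched as follows. The function name `child_events` and the dictionary `OCCURRED` are illustrative assumptions; the stratum label h is built, as in the definition above, from the indicators of which stratifying ancestors have already occurred in the origin state of each transition.

```python
from itertools import product

# Intermediate events that have already occurred in each transient state.
OCCURRED = {1: set(), 2: {"P"}, 3: {"A"}, 4: {"P", "A"}}

def child_events(stratify_on, transitions):
    """Map each stratum label h to the parent-event transitions it contains.

    stratify_on lists the ancestors used for time dependent stratification;
    transitions lists the parent event's transitions as strings 'ij'.
    """
    strata = {h: [] for h in product((0, 1), repeat=len(stratify_on))}
    for ij in transitions:
        origin = int(ij[0])
        h = tuple(int(x in OCCURRED[origin]) for x in stratify_on)
        strata[h].append(ij)
    return strata

death = ["15", "25", "35", "45"]
by_platelets = child_events(["P"], death)
by_both = child_events(["P", "A"], death)
```

Stratifying the death event on P alone gives two child events, and stratifying on both P and A gives four, one per transition into state 5.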
For each child event a distinct baseline hazard rate is assumed so that
$$\lambda_{X_h}(t \mid Z_F) = \lambda_{oX_h}(t)\,\exp\Big\{\beta_{FX} Z_F + \sum_{X' \in a_c(X)} \beta_{X'X}\, I[T_{X'} < t]\Big\}$$
and the hazard rate for each $X_h$ transition is
$$\lambda_{ij}(t) = \lambda_{oX_h}(t)\,\exp\{\beta_X Z_{ij}\}.$$
Here $Z_{ij}$ consists of the fixed covariates and a vector of 0's and 1's with a 1 in the correct position
for any event in $a_c(X)$ which must have occurred prior to time t to be in state i.
Estimates of $\Lambda_{oX_h}(t)$ and the β's can be obtained from standard Cox regression packages.
As opposed to the proportional hazards model, in this analysis there may be some time dependent
stratification so that left truncated regression models must be employed. Once the parameter
estimates are obtained and an estimate for $\Lambda_{ij}(t)$ is obtained then these can be used in (3.2) to obtain
estimates of the predicted probabilities.
To illustrate this approach consider the bone marrow transplantation example. One
possible time dependent stratification is to fit different baseline rates for the death event for
individuals whose platelets have or have not recovered. Consider the parent event D whose
ancestors are the events P and A. The set a(D) is divided into the sets $a_c(D) = \{A\}$ and $a_s(D) = \{P\}$.
Two child events, $D_1$ and $D_2$, are defined by $\{T_P \ge T_D\}$ and $\{T_P < T_D\}$. Here $D_1$ is the event death
without platelets being recovered and $D_2$ the event death with platelets recovered. Two
proportional hazards models are fit to the death event. The first model is
$$\lambda_{D_1}(t \mid Z_F) = \lambda_{oD_1}(t)\,\exp\{\beta_{FD} Z_F + \beta_{AD}\, I[T_A \le t]\}.$$
Individuals are censored for $\lambda_{oD_1}$ when their platelets recover. For the second model we have
$$\lambda_{D_2}(t \mid Z_F) = \lambda_{oD_2}(t)\,\exp\{\beta_{FD} Z_F + \beta_{AD}\, I[T_A \le t]\}.$$
Here the analysis for $\lambda_{oD_2}$ is based on a left truncated Cox regression model with individuals
entering the risk set at the time at which their platelets recover. The four transition rates to the state D are
$$\lambda_{15}(t \mid Z_F) = \lambda_{oD_1}(t)\,\exp\{\beta_{FD} Z_F\},$$
$$\lambda_{25}(t \mid Z_F) = \lambda_{oD_2}(t)\,\exp\{\beta_{FD} Z_F\},$$
$$\lambda_{35}(t \mid Z_F) = \lambda_{oD_1}(t)\,\exp\{\beta_{FD} Z_F + \beta_{AD}\}; \text{ and}$$
$$\lambda_{45}(t \mid Z_F) = \lambda_{oD_2}(t)\,\exp\{\beta_{FD} Z_F + \beta_{AD}\}.$$
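The censoring and left truncation described above amount to splitting each patient's follow-up into risk intervals. The sketch below shows one possible data layout in plain Python; the function name `death_risk_rows` and the row fields are assumptions for illustration. It uses the convention that a covariate or stratum changes just after the corresponding event time, so each row covers an interval (entry, exit].

```python
import math

def death_risk_rows(t_exit, died, t_p=math.inf, t_a=math.inf):
    """Split one patient's follow-up into rows for the stratified death model.

    t_exit is the time of death in remission or of censoring, died flags which
    of the two it is, and t_p / t_a are the platelet recovery and acute GVHD
    times (math.inf if not observed).
    """
    cuts = sorted({0.0, t_exit} | {c for c in (t_p, t_a) if c < t_exit})
    rows = []
    for start, stop in zip(cuts[:-1], cuts[1:]):
        rows.append({
            "stratum": "D2" if t_p <= start else "D1",  # D2: platelets recovered
            "entry": start,                             # left truncation time
            "exit": stop,
            "gvhd": int(t_a <= start),                  # time dependent covariate
            "event": int(died and stop == t_exit),
        })
    return rows

rows = death_risk_rows(100.0, True, t_p=30.0, t_a=20.0)
```

For this hypothetical patient the follow-up contributes to stratum D1 on (0, 20] and (20, 30], where it ends as a censored observation, and enters stratum D2 at day 30 with the GVHD indicator equal to 1.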
If in addition to stratifying on the recovery time for the platelets we also stratify for D on
the occurrence of acute GVHD we have $a_s(D) = \{P, A\}$ and $a_c(D)$ is the empty set. Now there are
four child events for D corresponding to h = (0,0), (1,0), (0,1) and (1,1). These correspond to the
states $\{T_P > T_D,\ T_A > T_D\}$, $\{T_P \le T_D,\ T_A > T_D\}$, $\{T_P > T_D,\ T_A \le T_D\}$ and
$\{T_P \le T_D,\ T_A \le T_D\}$, respectively. The models for the transitions into state D contain distinct
baseline hazard rates for each of these states, and there are no time dependent covariates in the model.
The asymptotic properties of the estimated prediction probabilities are similar to those in
Theorem 1 with the simple change of the summations over X∈e being changed to double sums
over both X∈e and h=1,...,m(X). For example, the estimated variance of the predicted
probability of a type ij transition in the time period (s,t] is
$$\hat V(\hat P_{ij}(s,t)) = \sum_{X \in e} \sum_{h=1}^{m(X)} \left\{ \int_s^t \left[\frac{\hat F_{ij,X}(s,u,t)}{S_{X_h}^{(0)}(\beta_X,u)}\right]^2 dN_{X_h}(u) + \hat G'_{ij,X_h}\, i^{-1}(\hat\beta_X)\, \hat G_{ij,X_h} \right\}.$$
In the model presented above the coefficient vector, $\beta_X$, is the same for all child events,
$X_h$. This assumption can be relaxed as well by allowing each child event to have its own β. This
involves fitting separate Cox models for each child event. The estimation process follows as
above. Here an estimate of the asymptotic variance of $\hat P_{ij}(s,t)$ is
$$\hat V(\hat P_{ij}(s,t)) = \sum_{X \in e} \sum_{h=1}^{m(X)} \left\{ \int_s^t \left[\frac{\hat F_{ij,X_h}(s,u,t)}{S_{X_h}^{(0)}(\beta_{X_h},u)}\right]^2 dN_{X_h}(u) + \hat G'_{ij,X_h}\, i^{-1}(\hat\beta_{X_h})\, \hat G_{ij,X_h} \right\}.$$
The extreme case of this model is where all events are divided to their fullest (i.e. each child
event corresponds to one and only one transition) and each transition has its own β. This is the
usual model for multi-state processes introduced by Andersen et al (1991) (Cf. Andersen et al
(1993) Section VII.2).
5. BONE MARROW TRANSPLANT EXAMPLE
To illustrate these calculations we shall fit the multistate proportional hazards model to the
data from the International Bone Marrow Transplant Registry. As shown in Figure 1 we have a
model with two intermediate events, platelet recovery (P) and acute GVHD (A) and two terminal
events, death in remission (D) and relapse (R). There were 1823 patients in the data set.
After a careful examination of the effects of various fixed time covariates on the four events
we found that the most important covariates were the patient's Karnofsky score at transplant, their
waiting time from diagnosis to transplant and their age. In testing for proportional hazards for each
of these covariates using a time dependent covariate approach (see Klein and Moeschberger
(1996)) we found that the relapse hazards were not proportional at different ages. In the analysis
reported below we have decided to stratify all the analyses on age (two strata: age ≤20 or age >20).
The other two risk factors were discretized as Karnofsky score ≤80 versus Karnofsky score ≥90,
and time from diagnosis to transplant ≤10 weeks versus >10 weeks.
To apply the proportional hazards model we fit four Cox models to the data, one for each
of the four endpoints. For each event, X, we include a time dependent covariate for each event in
a(X). The results are found in Table 1.
Table 1
Estimated Risk Coefficients And Standard Errors For The Proportional Hazards Model

Covariate                Platelet Recovery   Acute GVHD       Death in Remission   Relapse
Karnofsky Score ≤80      -.333 (.075)        .208 (.109) *    .359 (.108)          .414 (.119)
Waiting Time >10 Weeks   -.062 (.060) *      .014 (.099) *    .411 (.099)          .351 (.102)
Platelet Recovered       n/a                 -.347 (.166)     -1.405 (.116)        -.322 (.126)
Acute GVHD               -.433 (.074)        n/a              1.172 (.097)         -.283 (.130)

* Not significant at 5% level
Here we see that patients with a low Karnofsky score tend to take longer to have their
platelets recover and are more likely to die or relapse. Patients with a long waiting time to
transplant also have an increased risk of relapse and death.
Examining the two time dependent covariates we see that when a patient's platelets recover
their risks of GVHD, death and relapse are decreased. When a patient develops GVHD their risk
of relapse is decreased but their risk of death is increased. This decrease in relapse risk is the
well-known graft-versus-leukemia effect of GVHD.
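The coefficients in Table 1 are on the log hazard scale, so exponentiating them gives relative risks, and dividing a coefficient by its standard error gives a rough significance check. The short sketch below does both for a few entries of Table 1. The function names are illustrative, and the use of a two-sided normal critical value of 1.96 is an assumption about how the 5% level was assessed, not something the report states.

```python
import math

# (coefficient, standard error) pairs copied from Table 1.
TABLE1 = {
    ("Karnofsky <=80", "A"): (0.208, 0.109),
    ("Platelet Recovered", "A"): (-0.347, 0.166),
    ("Platelet Recovered", "D"): (-1.405, 0.116),
    ("Acute GVHD", "D"): (1.172, 0.097),
    ("Acute GVHD", "R"): (-0.283, 0.130),
}

def hazard_ratio(beta):
    """Relative risk implied by a log hazard coefficient."""
    return math.exp(beta)

def wald_z(beta, se):
    """Coefficient divided by its standard error."""
    return beta / se

hr_platelet_death = hazard_ratio(TABLE1[("Platelet Recovered", "D")][0])
hr_gvhd_death = hazard_ratio(TABLE1[("Acute GVHD", "D")][0])
hr_gvhd_relapse = hazard_ratio(TABLE1[("Acute GVHD", "R")][0])
```

On this scale platelet recovery multiplies the hazard of death in remission by roughly 0.25, while acute GVHD multiplies it by roughly 3.2 and multiplies the relapse hazard by roughly 0.75.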
To examine the fit of the proportional hazards model we also fit the Andersen model with
distinct baseline hazard rates (stratified on age) and different covariate effects for each transition.
Here a standard Cox model is used for transitions 12, 13, 15, 16 and left truncated Cox models are
used for all other transitions. The results are in Table 2.
Table 2
Estimated Risk Coefficients And Standard Errors From Fitting The Andersen Model

Transition   Karnofsky Score ≤80   Waiting Time >10 Weeks
1->2         -.319 (.083)          -.065 (.065)*
1->3         .251 (.115)           -.013 (.106)
1->5         .422 (.185)           .760 (.170)
1->6         .609 (.251)           .518 (.239)
2->4         -.098 (.364)*         .189 (.288)*
2->5         .959 (.254)           .031 (.267)*
2->6         .332 (.157)           .246 (.127)
3->4         -.334 (.173)          -.040 (.146)
3->5         .142 (.190)*          .330 (.180)*
3->6         1.063 (.454)          .445 (.434)*
4->5         .235 (.273)*          .297 (.233)*
4->6         .133 (.372)*          .474 (.297)*

* Not significant at 5% level
To examine the fit of the simpler proportional hazards model we plot in Figure 2 the logs of the
baseline hazards estimated from the Andersen model for each of the transitions. If the proportional
hazards model holds true then we should have parallel curves for each transition into one of the
four events. A cursory look at these figures does not suggest any marked departure from
proportionality.
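The graphical check rests on a simple fact: if two baseline hazards are proportional, the difference of their logarithms is constant in time, so the curves are parallel. A minimal numerical version of that check is sketched below. The function names and the three baseline curves are hypothetical and serve only to show a parallel and a non-parallel case.

```python
import math

def log_gap(hazard_a, hazard_b):
    """Pointwise difference of log baseline hazards on a common time grid."""
    return [math.log(a) - math.log(b) for a, b in zip(hazard_a, hazard_b)]

def spread(values):
    """Range of the gap; zero when the two curves are exactly parallel."""
    return max(values) - min(values)

# HYPOTHETICAL baseline hazard estimates on a shared time grid.
base_15 = [0.01, 0.03, 0.06, 0.10]
base_35 = [3.0 * x for x in base_15]     # exactly proportional to base_15
base_25 = [0.01, 0.02, 0.05, 0.20]       # not proportional to base_15

parallel = spread(log_gap(base_35, base_15))
not_parallel = spread(log_gap(base_25, base_15))
```

In practice the estimated curves are noisy, so a small non-zero spread would not by itself contradict proportionality.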
We shall use the proportional hazards multistate model to examine how a patient's
prognosis at one year after transplant depends on their history in the first few weeks of their
recovery process. We first estimate the probability of dying in remission in the first year given the
patient's history at s weeks following transplant for each of the four possible states a patient may
be in at s weeks. This estimated probability is given by $\hat P_{i5}[7s,365]$. Figure 3 shows the
estimates under the proportional model for an individual who is under 20 years of age with a
Karnofsky score of 90 or more and a waiting time to transplant of less than 10 weeks. Other
values of the fixed covariates would give slightly different pictures. Here a patient is initially in the
state 1 and we see that when their platelets recover their risk of death drops. The development of
GVHD at any point in time elevates the chance of death. This probability is particularly high if the
platelets have yet to recover. Figure 4 gives the one year probability of relapsing for each of the
four states. Here again a patient is initially in state 1 and has a relatively high likelihood of
relapsing. When graft-versus-host disease occurs this probability drops.
Figure 5 gives the leukemia free survival probabilities for the first year given a patient's
history at s weeks. This is the probability of being alive and disease free at the end of the first year
after transplant. This probability is given by $1 - \{P_{i5}[7s,365] + P_{i6}[7s,365]\}$. The curves
naturally increase as a patient survives disease free for a longer time. We see that once a patient
has their platelets recover their prognosis is much better. The occurrence of GVHD without the
platelets being recovered leads to the least favorable prognosis.
Figure 6 shows 95% confidence intervals and point estimates for the leukemia free survival
at one year for each possible history a patient may have at s weeks. For comparison the
proportional hazards and Andersen models are presented. Here we note that the confidence
intervals based on the proportional hazards model are shorter. This is to be expected since this
model has fewer parameters to estimate.
6. DISCUSSION
In our example we have presented estimates of predicted probabilities for some basic
outcomes in bone marrow transplantation for a patient with a given history at some point in their
recovery process. Similar plots can be used to examine how different values of the fixed time
covariates affect the predicted patient prognosis.
We have chosen here to fix the time, t, to which the prediction is made at one year and to
see how changes in the history affect the estimated probabilities. We could have fixed the time at
which the history was measured and drawn a curve for a range of times. These curves would be
predicted survival curves given a patient's history at some time. An example of this approach can
be found in Klein et al (1993).
The models presented here can also be used to provide some insight into how changing the
rate or the timing of intermediate events affects a patient's eventual prognosis. For example, if
some therapy was developed to increase the rate at which platelets recover this hypothetical therapy
could be compared to existing therapy by modifying the baseline platelet recovery hazard rate and
examining the predicted probabilities of death and relapse. This approach can also be used to
examine how changing the rate at which one competing risk occurs affects the occurrence of
another competing risk. For example, if the treatment mortality rate were cut in half, how does
this affect the predicted probability of relapse? This approach is more reasonable than existing
methods for analyzing competing risks where one postulates a world in which one of the
competing risks cannot occur.
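A toy version of such a "what if" comparison is sketched below. It combines the time dependent coefficients of Table 1 with constant daily baseline hazards that are entirely hypothetical (the report's estimated baseline hazards are not reproduced here), multiplies the platelet recovery hazard by a chosen factor, and recomputes the one-year transition matrix as a product of daily steps. Fixed covariates are set to their reference levels, and the function names are illustrative.

```python
import math
import numpy as np

# Time dependent coefficients from Table 1: BETA[(ancestor, event)].
BETA = {("A", "P"): -0.433, ("P", "A"): -0.347,
        ("P", "D"): -1.405, ("A", "D"): 1.172,
        ("P", "R"): -0.322, ("A", "R"): -0.283}
# HYPOTHETICAL constant daily baseline hazards, chosen only for illustration.
BASE = {"P": 0.03, "A": 0.01, "D": 0.001, "R": 0.0008}
OCCURRED = {1: set(), 2: {"P"}, 3: {"A"}, 4: {"P", "A"}}
TRANSITIONS = {(1, 2): "P", (3, 4): "P", (1, 3): "A", (2, 4): "A",
               **{(i, 5): "D" for i in range(1, 5)},
               **{(i, 6): "R" for i in range(1, 5)}}

def one_year_matrix(platelet_factor=1.0, days=365):
    """Transition matrix over `days` daily steps with a rescaled P hazard."""
    step = np.eye(6)
    for (i, j), event in TRANSITIONS.items():
        lp = sum(b for (anc, ev), b in BETA.items()
                 if ev == event and anc in OCCURRED[i])
        rate = BASE[event] * math.exp(lp)
        if event == "P":
            rate *= platelet_factor          # the hypothetical therapy
        step[i - 1, j - 1] = rate
        step[i - 1, i - 1] -= rate
    return np.linalg.matrix_power(step, days)

def leukemia_free_survival(P, state):
    """1 - {P_i5 + P_i6} for a patient starting in `state`."""
    return 1.0 - P[state - 1, 4] - P[state - 1, 5]

base = one_year_matrix()
fast = one_year_matrix(platelet_factor=2.0)
```

Comparing `leukemia_free_survival(fast, 1)` with `leukemia_free_survival(base, 1)` shows the direction of the effect for a patient starting in state 1; the magnitudes depend entirely on the made-up baseline hazards and should not be read as results.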
The basis of all the models presented here is a sound preliminary analysis of the data using
proportional hazards regression models. This analysis involves not only finding important
prognostic factors, but also involves checking of the proportionality assumptions of the models to
determine the number of child events.
ACKNOWLEDGMENTS
This research was supported by contract 2R01 CA54706-04A1 from the National Cancer
Institute.
REFERENCES
Andersen, P.K., Borgan, Ø., Gill, R.D. and Keiding, N. (1993). Statistical Models
Based on Counting Processes. Springer-Verlag.
Andersen, P.K., Hansen, L.S. and Keiding N. (1991). Non- and Semi- parametric
Estimation of Transition Probabilities from Censored Observation of a Non- homogeneous Markov
Process. Scand. J. Statist. 18: 153-167.
Arjas, E. and Eerola, M. (1993). On predictive Causality in Longitudinal Studies. J. of
Statist. Planning and Inference, 34, 361-386.
Breslow, N.E. (1972). Contribution to the discussion on the paper by D.R. Cox,
Regression and life table. J. Roy. Statist. Soc. B., 34: 216-7.
Cox, D.R. (1972). Regression models and life tables (with discussion). J. Roy. Statist.
Soc. B 34, 187-220.
Eerola, M. (1993). On Predictive Causality in the Statistical Analysis of a Series of
Events. Statistical Research Report 14, The Finnish Statistical Society.
Gill, R.D. and Johansen, S. (1990). A survey of product-integration with a view towards
application in survival analysis. Ann. Statist. 18, 1501-1555.
Klein, J.P., Keiding, N. and Copelan, E.A. (1993). Plotting Summary Predictions in
Multistate Survival Models: Probabilities of Relapse and Death in Remission for Bone Marrow
Transplantation Patients. Statist. Med., 12, 2315-2332.
Klein, J.P. and Moeschberger, M.L. (1996). Survival Analysis, Springer, New York,
(In Press).
Qian, C. (1995). Time-dependent covariates in a general survival model with any finite
number of intermediate and final events. Unpublished Ph.D. Thesis, The Ohio State University.