
On Estimating the Relationship between Longitudinal

Measurements and Time-to-Event Data Using a Simple

Two-Stage Procedure

Paul S. Albert and Joanna H. Shih

Biometric Research Branch

Division of Cancer Treatment and Diagnosis

National Cancer Institute

Bethesda, Maryland U.S.A. 20892

January 30, 2009

SUMMARY

Ye et al. (2008) proposed a joint model for longitudinal measurements and time-to-event

data in which the longitudinal measurements are modeled with a semiparametric mixed

model to allow for the complex patterns in longitudinal biomarker data. They proposed a

two-stage regression calibration approach which is simpler to implement than a joint modeling
approach. In the first stage of their approach, the mixed model is fit without regard
to the time-to-event data. In the second stage, the posterior expectations of an individual’s
random effects from the mixed model are included as covariates in a Cox model. Although

Ye et al. (2008) acknowledged that their regression calibration approach may cause bias

due to the problem of informative dropout and measurement error, they argued that the

bias is small relative to alternative methods. In this article, we show that this bias may

be substantial. We show how to alleviate much of this bias with an alternative regression

calibration approach which can be applied for both discrete and continuous time-to-event

data. Through simulations, the proposed approach is shown to have substantially less bias

than the regression calibration approach proposed by Ye et al. (2008). In agreement with the

methodology proposed by Ye et al., an advantage of our proposed approach over joint modeling
is that it can be implemented with standard statistical software and does not require
complex estimation techniques.


1 Introduction

Ye et al. (2008) proposed a two-stage regression calibration approach for estimating the relationship between longitudinal measurements and time-to-event data. Their approach was motivated by trying to establish such a relationship when the longitudinal measurements follow a complex semi-parametric mixed model with subject-specific random stochastic processes and the time-to-event data follow a proportional hazards model. Specifically, they proposed a semi-parametric model with additive errors for the longitudinal measurements $X_{ij}$ of the form

$$X_{ij} = Z_i^{\top}\beta + \phi(t_{ij}) + U_i(t_{ij})\,b_i + W_i(t_{ij}) + \epsilon_{ij} = X_i^{*}(t_{ij}) + \epsilon_{ij}, \tag{1}$$

where $\beta$ is a vector of regression coefficients associated with fixed effect covariates $Z_i$, $\phi(t)$ is an unknown smooth function over time, and $b_i$ is a vector of subject-specific random effects corresponding to covariates $U_i(t)$, which is assumed normally distributed with mean 0 and variance $\Sigma_b$. Further, $W_i(t_{ij})$ is a zero mean integrated Wiener stochastic process. We denote $X_i$ as all longitudinal measurements on the $i$th individual.

In Ye et al.’s approach, the relationship between the slope of the longitudinal process and a time-to-event outcome $T_i$ is characterized by a Cox proportional hazards model with the slope at time $t$, denoted as $X_i^{*\prime}(t)$, being treated as a time-dependent covariate. The authors proposed a two-stage estimation procedure in which, in the first stage, the mean of the posterior distribution of the slope at time $t$, $E[X_i^{*\prime}(t) \mid X_i, Z_i]$, is estimated using model (1) without regard to the time-to-event process $T_i$. In the second stage, $E[X_i^{*\prime}(t) \mid X_i, Z_i]$ replaces $X_i^{*\prime}(t)$ in the Cox model. Ye et al. (2008) proposed two approaches: (i) the ordinary regression calibration (ORC) approach in which $E[X_i^{*\prime}(t) \mid X_i, Z_i]$ is estimated using (1) with all available longitudinal measurements and (ii) the risk set regression calibration (RRC) approach in which these expectations are obtained by estimating model (1) after each event using only longitudinal measurements for subjects at risk at time $t$ (i.e., subjects who have


an event before time t are removed from the estimation).

The advantage of these regression calibration approaches is that they do not require the complex joint modeling of the longitudinal and time-to-event processes. In the discussion of their paper, Ye et al. acknowledge that these approaches may result in biased estimation due to informative dropout and measurement error, and that improved performance will require incorporating informative dropout and the uncertainty of measurement error into the estimation. In this article we show that an alternative two-stage procedure can be formulated which reduces the bias considerably without requiring complex joint modeling of both processes. For simplicity, we develop the approach for a longitudinal model without the smooth function $\phi(t)$ and the stochastic component $W_i(t)$ in (1), but the proposed approach applies more generally. In this approach, we approximate the conditional distribution of the longitudinal process given the event time, simulate complete follow-up data based on the approximate conditional model, and then fit the longitudinal model with complete follow-up on each patient (hence avoiding the problem of informative dropout in Ye et al.’s approach). Section 2 develops the approach for a discrete event time distribution followed by an approximation for the continuous event time distribution. The results of simulations which show the advantages of the proposed approach over ORC and RRC are provided in Section 3. A discussion follows in Section 4.

2 Modeling Framework

We begin by considering a discrete event time distribution. Define $T_i$ to be a discrete event time which can take on discrete values $t_j$, $j = 1, 2, \ldots, J$, and $Y_{ij}$ to be a binary indicator of whether the $i$th patient is dead at time $t_j$. Then $J_i = \sum_{j=1}^{J} (1 - Y_{ij}) = J - Y_{i\cdot}$, where $Y_{i\cdot} = \sum_{j=1}^{J} Y_{ij}$, indicates the number of follow-up measurements before death or administrative censoring for the $i$th patient. Every patient will be followed until death or the end of follow-up at time $t_J$.
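As a small illustration of this bookkeeping, the sketch below computes $J_i$ from one patient's death indicators. The function name `follow_up_count` and the example indicator vector are hypothetical and are not taken from the paper.

```python
def follow_up_count(y_row):
    """Return J_i = sum_j (1 - Y_ij), the number of visits before death or censoring."""
    return sum(1 - y for y in y_row)


# Hypothetical patient with J = 5 scheduled visits who is first recorded dead at t_4.
y_i = [0, 0, 0, 1, 1]
J_i = follow_up_count(y_i)

# The two expressions in the text agree: sum_j (1 - Y_ij) = J - Y_i.
assert J_i == len(y_i) - sum(y_i)
```

A patient who survives to $t_J$ has all indicators equal to 0 and therefore $J_i = J$.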

For illustrative purposes, we will consider a joint model for longitudinal and discrete time-

to-event data in which the discrete event time distribution is modeled as a linear function of


the slope of an individual’s longitudinal process on the probit scale. Specifically,

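$$P(Y_{ij} = 1 \mid Y_{i(j-1)} = 0) = \Phi(\alpha_{0j} + \alpha_1 b_{i1}), \tag{2}$$

where $j = 1, 2, \ldots, J$, $Y_{i0}$ is taken as 0, $\alpha_{0j}$ governs the baseline discrete event time distribution, and $b_{i1}$ is the individual slope from the linear mixed model,

$$X_{ij} = X_i^{*}(t_j) + \epsilon_{ij}, \tag{3}$$

$$X_i^{*}(t_j) = \beta_0 + \beta_1 t_j + b_{i0} + b_{i1} t_j, \tag{4}$$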

where $i = 1, 2, \ldots, I$ and $j = 1, 2, \ldots, J_i$. In (4), the parameters $\beta_0$ and $\beta_1$ are fixed-effect parameters characterizing the mean intercept and slope of the longitudinal process, respectively, $(b_{i0}, b_{i1})^{\top}$ is a vector of random effects which are assumed multivariate normal with mean 0 and variance

$$\Sigma_b = \begin{pmatrix} \sigma^2_{b0} & \sigma_{b0,b1} \\ \sigma_{b0,b1} & \sigma^2_{b1} \end{pmatrix},$$

and $\epsilon_{ij}$ is a residual error term which is assumed normal with mean zero and variance $\sigma^2_{\epsilon}$. In (2)-(4), the event time and the longitudinal process are linked through $b_{i1}$, and the parameter $\alpha_1$ governs the relationship between the slope of the longitudinal process and the event time distribution. Denote $X_i = (X_{i1}, X_{i2}, \ldots, X_{iJ_i})^{\top}$, $b_i = (b_{i0}, b_{i1})^{\top}$, and $\beta = (\beta_0, \beta_1)^{\top}$. As in Ye et al., the normality assumption for $b_i$ is made for these joint models. Although not the focus of this article, various articles have proposed methods with flexible semi-parametric random effects distributions and have demonstrated that inferences are robust to departures from normality (Song et al., 2002; Hsieh et al., 2006).
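To make the data-generating mechanism in (2)-(4) concrete, the following is a minimal simulation sketch. It assumes independent random effects ($\sigma_{b0,b1} = 0$), visit times $t_j = j$, and the parameter values listed in the caption of Table 1; the names `phi`, `simulate_subject`, and `ALPHA0` are illustrative and not from the paper.

```python
import math
import random


def phi(x):
    """Standard normal CDF via math.erf, so that phi(-inf) evaluates to 0."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def simulate_subject(rng, alpha0, alpha1=0.5, beta0=1.0, beta1=3.0,
                     sd_b0=1.0, sd_b1=1.0, sd_eps=0.75):
    """Simulate one subject from (2)-(4), assuming uncorrelated random effects."""
    J = len(alpha0)
    b0 = rng.gauss(0.0, sd_b0)
    b1 = rng.gauss(0.0, sd_b1)
    # Discrete-time probit hazard (2): once dead, the indicator stays at 1.
    y, dead = [], False
    for j in range(1, J + 1):
        if not dead and rng.random() < phi(alpha0[j - 1] + alpha1 * b1):
            dead = True
        y.append(1 if dead else 0)
    J_i = J - sum(y)
    # Longitudinal measurements (3)-(4) are only observed at t_1, ..., t_{J_i}.
    x = [beta0 + beta1 * t + b0 + b1 * t + rng.gauss(0.0, sd_eps)
         for t in range(1, J_i + 1)]
    return {"b1": b1, "y": y, "J_i": J_i, "x": x}


# alpha_01 = alpha_02 = -inf guarantees at least two measurements per subject.
NEG_INF = float("-inf")
ALPHA0 = [NEG_INF, NEG_INF, -1.5, -1.5, -1.5]

rng = random.Random(2009)
cohort = [simulate_subject(rng, ALPHA0) for _ in range(300)]
```

Because subjects with larger $b_{i1}$ tend to die earlier when $\alpha_1 > 0$, the observed sequences in `cohort` are shorter for steeper slopes, which is the informative dropout discussed in the next paragraph.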

For estimating the relationship between the slope of the longitudinal process and the time-to-event process, the calibration approach of Ye et al. (2008) reduces to first, estimating $E[b_{i1} \mid X_i, \beta]$ using (3) and (4), and second, replacing $b_{i1}$ by $E[b_{i1} \mid X_i, \hat{\beta}]$ in estimating (2). As recognized by Ye et al., this methodology introduces bias in two ways. First, there is the problem of informative dropout, whereby $b_{i0}$ and $b_{i1}$ can depend on the event time $T_i$ (which will occur if $\alpha_1 \neq 0$ in (2)). Ignoring this informative dropout may result in substantial bias. Second, not accounting for the measurement error in $E[b_{i1} \mid X_i, \hat{\beta}]$ relative to true values of $b_{i1}$ will result in attenuated estimation of $\alpha_1$.


We propose a simple approach which reduces these two sources of bias. We first focus on the problem of informative dropout. The bias from informative dropout is a result of differential follow-up whereby the response process is related to the length of follow-up (i.e., in (2)-(4), when $\alpha_1$ is positive, patients who die early are more likely to have large positive slopes). There would be no bias if all $J$ follow-up measurements were observed on all patients. Thus, we recapture these missing measurements by generating data from the conditional distribution of $X_i$ given $T_i$, denoted as $X_i \mid T_i$. Since $X_i \mid T_i$ under (2)-(4) does not have a tractable form, we propose a simple approximation for this conditional distribution. Under model (2)-(4), the distribution of $X_i \mid T_i$ can be expressed as

$$P(X_i \mid T_i) = \int h(X_i \mid b_i, T_i)\, g(b_i \mid T_i)\, db_i. \tag{5}$$

Since $T_i$ and the values of $X_i$ are conditionally independent given $b_i$, $h(X_i \mid b_i, T_i) = h(X_i \mid b_i)$, where $h(X_i \mid b_i)$ is the product of $J_i$ univariate normal density functions each with mean $X_i^{*}(t_j)$ ($j = 1, 2, \ldots, J_i$) and variance $\sigma^2_{\epsilon}$. The distribution of $X_i \mid T_i$ can easily be obtained with standard statistical software if we approximate $g(b_i \mid T_i)$ by a normal distribution. Under the assumption that $g(b_i \mid T_i)$ is normally distributed with mean $\mu_{T_i} = (\mu_{0T_i}, \mu_{1T_i})^{\top}$ and variance $\Sigma^{*}_{bT_i}$, and by rearranging mean structure parameters in the integrand of (5) so that the random effects have mean zero, $X_i \mid T_i$ corresponds to the following mixed model

$$X_{ij} \mid (T_i, b^{*}_{i0T_i}, b^{*}_{i1T_i}) = \beta^{*}_{0T_i} + \beta^{*}_{1T_i} t_j + b^{*}_{i0T_i} + b^{*}_{i1T_i} t_j + \epsilon^{*}_{ij}, \tag{6}$$

where $i = 1, 2, \ldots, I$, $j = 1, 2, \ldots, J_i$, and the residuals $\epsilon^{*}_{ij}$ are assumed to have independent normal distributions with mean zero and variance $\sigma^{*2}_{\epsilon}$. Further, the fixed-effects parameters $\beta^{*}_{0T_i}$ and $\beta^{*}_{1T_i}$ are intercept and slope parameters for patients who have an event at time $T_i$ or who are censored at time $T_i = t_J$. In addition, the associated random effects $b^{*}_{iT_i} = (b^{*}_{i0T_i}, b^{*}_{i1T_i})^{\top}$ are multivariate normal with mean 0 and variance $\Sigma^{*}_{bT_i}$ for each $T_i$. Thus, this flexible conditional model involves estimating separate fixed effect intercept and slope parameters for each potential event-time and for subjects who are censored at time $t_J$.


Likewise, separate random effects distributions are estimated for each of these discrete time points. For example, the intercept and slope fixed-effect parameters for those patients who have an event at time $T_i = 3$ are $\beta^{*}_{03}$ and $\beta^{*}_{13}$, respectively. In addition, the intercept and slope random effects for those patients who have an event at $T_i = 3$, $b^{*}_{iT_i} = (b^{*}_{i03}, b^{*}_{i13})^{\top}$, are multivariate normal with mean 0 and variance $\Sigma^{*}_{b3}$. Model (6) can be fit with standard R code which is available from the first author.

A similar approximation of the conditional distribution of the longitudinal process given dropout time has been proposed for estimating mean change over time in longitudinal measurements subject to informative dropout (Wu and Bailey, 1989; Wu and Follmann, 1999). In this article, we use the approximation to construct complete longitudinal datasets which in turn are used to estimate the mean of the posterior distribution of an individual’s random effects given the data. Specifically, multiple complete longitudinal datasets can then be constructed by simulating $X_{ij}$ values from (6) where the parameters are replaced by their estimated values. Since the simulated datasets have complete follow-up on each individual, the bias in estimating $E[b_{i1} \mid X_i, \beta]$ caused by informative dropout is much reduced.
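The simulation of complete follow-up from (6) can be sketched as follows. This is only an illustration under stated assumptions: the parameter dictionary `group_params` stands in for the estimates that a mixed-model fit of (6) would return for one event-time group, and the names `draw_random_effects` and `simulate_complete` are hypothetical.

```python
import math
import random


def draw_random_effects(rng, var_b0, cov_b01, var_b1):
    """Draw (b0*, b1*) from N(0, Sigma*) using the Cholesky factor of a 2x2 matrix."""
    l11 = math.sqrt(var_b0)
    l21 = cov_b01 / l11
    l22 = math.sqrt(var_b1 - l21 ** 2)
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return l11 * z1, l21 * z1 + l22 * z2


def simulate_complete(rng, group_params, times):
    """Simulate X_ij at every scheduled time from model (6) for one subject.

    group_params holds the (assumed already estimated) parameters of the
    subject's event-time group: beta0, beta1, var_b0, cov_b01, var_b1, sd_eps.
    """
    p = group_params
    b0, b1 = draw_random_effects(rng, p["var_b0"], p["cov_b01"], p["var_b1"])
    return [p["beta0"] + p["beta1"] * t + b0 + b1 * t + rng.gauss(0.0, p["sd_eps"])
            for t in times]
```

Calling `simulate_complete` once per subject, with the parameters of that subject's own event-time group and the full set of times $t_1, \ldots, t_J$, yields one complete pseudo dataset; repeating the call gives the multiple datasets described above.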

We provide a correction to account for the measurement error in using $E[b_{i1} \mid X_i, \hat{\beta}]$, denoted as $\hat{b}_{i1}$, instead of using the actual random slope $b_{i1}$. As in Carroll et al. (1984), who adjust for measurement error in a covariate, we note that

$$P(Y_{ij} = 1 \mid Y_{i(j-1)} = 0, X_i) = \int \Phi(\alpha_{0j} + \alpha_1 b_{i1})\, g(b_{i1} \mid X_i)\, db_{i1} = \Phi\left( \frac{\alpha_{0j} + \alpha_1 \hat{b}_{i1}}{\sqrt{1 + \alpha_1^2\, \mathrm{Var}(\hat{b}_{i1} - b_{i1})}} \right), \tag{7}$$

where $\mathrm{Var}(\hat{b}_{i1} - b_{i1})$ measures the error of estimation in $\hat{b}_{i1}$ relative to $b_{i1}$, which is the 1-1 element in the matrix $\mathrm{Var}(\hat{b}_i - b_i)$, given by

$$\mathrm{Var}(\hat{b}_i - b_i) = \Sigma_b - \Sigma_b R_i^{\top} \left\{ W_i - W_i F_i Q F_i^{\top} W_i \right\} R_i \Sigma_b,$$

where $Q = \left( \sum_{i=1}^{I} F_i^{\top} W_i F_i \right)^{-1}$, and where $W_i = V_i^{-1}$, where $V_i$ is the variance of $X_i$ (Laird and Ware, 1982; Verbeke and Molenberghs, 2000). Further, $F_i$ and $R_i$ are the design matrices of fixed and random effects for the $i$th subject. This variance formula incorporates the error in


estimating the fixed effects in the longitudinal model. Expression (7) follows from the fact that $E[\Phi(a + V)] = \Phi\left( (a + \mu)/\sqrt{1 + \tau^2} \right)$, where $V \sim N(\mu, \tau^2)$. Only individuals who have at least two longitudinal measurements provide useful information in assessing the relationship between an individual’s slope and their time-to-event data, so we assume that all individuals in the analysis have at least two follow-up times. Thus, $\alpha_{01} = \alpha_{02} = -\infty$ and the regression parameters in the discrete-time model $\alpha_{0j}$ ($j = 3, 4, \ldots, J$) and $\alpha_1$ can be estimated by maximizing the likelihood

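$$L = \prod_{i=1}^{I} \left[ \prod_{j=1}^{J_i} \left\{ 1 - P(Y_{ij} = 1 \mid Y_{i(j-1)} = 0, X_i) \right\}^{(1 - Y_{ij})} \right] P(Y_{i(J_i+1)} = 1 \mid Y_{iJ_i} = 0, X_i)^{\mathbf{1}(J_i < J)}. \tag{8}$$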
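For readers who want to see (7) and (8) in computational form, here is a minimal sketch. It assumes each subject is summarized by the triple $(J_i, \hat{b}_{i1}, \mathrm{Var}(\hat{b}_{i1} - b_{i1}))$ and that `alpha0` is a mapping from $j$ to $\alpha_{0j}$ for $j = 3, \ldots, J$; the function names and this data layout are illustrative choices, not the authors' implementation.

```python
import math


def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def corrected_hazard(alpha0_j, alpha1, b1_hat, var_err):
    """Discrete-time hazard with the measurement error correction of (7)."""
    scale = math.sqrt(1.0 + alpha1 ** 2 * var_err)
    return phi((alpha0_j + alpha1 * b1_hat) / scale)


def log_likelihood(alpha0, alpha1, subjects, J):
    """Logarithm of (8).

    subjects: list of (J_i, b1_hat, var_err) with J_i >= 2.
    Terms for j = 1, 2 vanish because alpha_01 = alpha_02 = -inf.
    """
    ll = 0.0
    for J_i, b1_hat, var_err in subjects:
        for j in range(3, J_i + 1):  # survived intervals
            ll += math.log(1.0 - corrected_hazard(alpha0[j], alpha1, b1_hat, var_err))
        if J_i < J:  # event observed at t_{J_i + 1}
            ll += math.log(corrected_hazard(alpha0[J_i + 1], alpha1, b1_hat, var_err))
    return ll
```

Setting `var_err` to zero recovers the uncorrected hazard $\Phi(\alpha_{0j} + \alpha_1 \hat{b}_{i1})$, so the same function covers the analyses with and without the correction. The log-likelihood could then be passed to any general-purpose numerical optimizer.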

Thus, we propose the following algorithm for estimating $\alpha_{0j}$ ($j = 3, 4, \ldots, J$) and $\alpha_1$ with a two-stage procedure:

1. Estimate model (6) with all available longitudinal measurements using linear mixed-modeling software such as lme in R.

2. Simulate complete longitudinal pseudo measurements (i.e., $X_{ij}$ for $i = 1, 2, \ldots, I$ and $j = 1, 2, \ldots, J$) using model (6) with model parameters estimated from step 1. Specifically, these measurements are simulated by first simulating values of $b^{*}_{iT_i}$ from a normal distribution with mean 0 and variance $\Sigma^{*}_{bT_i}$ and $\epsilon^{*}_{ij}$ from a normal distribution with mean 0 and variance $\sigma^{*2}_{\epsilon}$, where the variance parameters are estimated in step 1.

3. Estimate model (3) and (4) (without regard to the event time distribution (2)) with complete longitudinal measurements simulated in step 2 using linear mixed modeling software.

4. Estimate $\alpha_{0j}$ ($j = 3, 4, \ldots, J$) and $\alpha_1$ (denoted as $\hat{\alpha}_{0j}$ and $\hat{\alpha}_1$, respectively) using (7) and (8) with $\hat{b}_{i1}$ obtained from step 3.

5. Repeat steps 2 to 4 $M$ times and average $\hat{\alpha}_{0j}$ and $\hat{\alpha}_1$ to get final estimates.
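The control flow of the five steps above can be sketched as a short skeleton. The four callables are placeholders for whatever mixed-model and event-model fitting routines are used; their names and signatures are assumptions made for illustration, and for brevity the sketch averages only $\hat{\alpha}_1$.

```python
from statistics import fmean


def two_stage_estimate(data, fit_conditional, simulate_complete,
                       fit_longitudinal, fit_event_model, M=10):
    """Average alpha_1 estimates over M simulated complete datasets.

    fit_conditional(data)            -> fitted conditional model (6)    [step 1]
    simulate_complete(fit, data)     -> complete pseudo dataset         [step 2]
    fit_longitudinal(pseudo)         -> empirical Bayes slopes b1_hat   [step 3]
    fit_event_model(b1_hat, data)    -> estimate of alpha_1             [step 4]
    """
    conditional_fit = fit_conditional(data)
    estimates = []
    for _ in range(M):  # step 5: repeat steps 2 to 4
        pseudo = simulate_complete(conditional_fit, data)
        b1_hat = fit_longitudinal(pseudo)
        estimates.append(fit_event_model(b1_hat, data))
    return fmean(estimates)
```

Note that step 1 is performed once, outside the loop, since only the simulation in step 2 introduces Monte Carlo variation between replicates.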


The approach can be generalized for continuous event-time distributions where $T_i$ is the continuous event time for the $i$th individual, all individuals are followed up to time $T_E$, and patients are administratively censored at the end of the study when $T_i > T_E$. In addition, the Cox model, $\lambda(t, b_{i1}) = \lambda_0(t)\exp(\alpha b_{i1})$, is used to relate the longitudinal measurements to time-to-event data. We can approximate the conditional distribution of $X_i$ given $T_i$ by first discretizing the follow-up interval into $K$ equally spaced intervals. We define $d_i$ as a discretized version of the continuous event time distribution, whereby $d_i = k$ when $T_i \in \left( (k-1)T_E/K,\; kT_E/K \right]$, $k = 1, 2, \ldots, K$, and where $d_i = K + 1$ when patient $i$’s event time is administratively censored at time $T_E$. The conditional distribution of the longitudinal measurements given the continuous event time, $X_i \mid T_i$, can be approximated by the distribution of the longitudinal measurements given the discretized version $d_i$, $X_i \mid d_i$, where, as for the discrete event time model, this conditional distribution can be approximated by a linear mixed model

$$X_{ij} \mid (d_i, b^{*}_{i0d_i}, b^{*}_{i1d_i}) = \beta^{*}_{0d_i} + \beta^{*}_{1d_i} t_j + b^{*}_{i0d_i} + b^{*}_{i1d_i} t_j + \epsilon^{*}_{ij}, \tag{9}$$

where $i = 1, 2, \ldots, I$ and $j = 1, 2, \ldots, J_i$, and where $J_i$ is the number of follow-up measurements before death or administrative censoring for the $i$th patient. Similar to (6), $\beta^{*}_{0d_i}$ and $\beta^{*}_{1d_i}$ are intercept and slope parameters for patients with a discretized event time of $d_i$. Also, $b^{*}_{id_i} = (b^{*}_{i0d_i}, b^{*}_{i1d_i})^{\top}$ are assumed to be normally distributed with mean 0 and variance $\Sigma^{*}_{bd_i}$.

For continuous event times, we apply the previous algorithm for discrete-time data except that in Step 1 we fit model (9) for a reasonably large $K$, and in Step 4 we fit a Cox model without a measurement error correction instead of the discrete-time model.
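The discretization of the continuous event time into the group label $d_i$ can be written in a few lines. This sketch assumes right-closed intervals, so that an event exactly at $kT_E/K$ falls in group $k$; the function name `discretize_event_time` is hypothetical.

```python
import math


def discretize_event_time(t, censored, T_E, K):
    """Map a continuous event time to d_i in {1, ..., K + 1}.

    d_i = k when t lies in ((k - 1) * T_E / K, k * T_E / K];
    d_i = K + 1 when the subject is administratively censored at T_E.
    """
    if censored or t > T_E:
        return K + 1
    width = T_E / K
    k = math.ceil(t / width)
    # Guard the boundaries: t = 0 maps to the first interval.
    return min(max(k, 1), K)


# With T_E = 5 years and K = 5, an event at 1.2 years falls in the second interval.
d_example = discretize_event_time(1.2, censored=False, T_E=5.0, K=5)
```

The resulting label plays the role that the discrete event time $T_i$ plays in model (6), indexing the group-specific parameters in (9).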

Asymptotic standard errors from the discrete or continuous event time models cannot be

used for inference since they fail to account for the missing data uncertainty in our procedure.

The bootstrap (Efron and Tibshirani, 1993) can be used for valid standard error estimation.
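A nonparametric bootstrap for the standard error can be sketched as below. The sketch assumes that whole subjects are resampled with replacement and that `estimator` reruns the entire two-stage procedure on each resample; both the resampling unit and the names `bootstrap_se` and `estimator` are assumptions made for illustration, since the text does not give these details.

```python
import random
from statistics import stdev


def bootstrap_se(subjects, estimator, B=200, seed=0):
    """Bootstrap standard error of estimator(subjects).

    subjects:  list with one record per subject (longitudinal and event data).
    estimator: callable that maps a list of subject records to a scalar estimate;
               it should rerun every stage of the procedure on the resample.
    """
    rng = random.Random(seed)
    n = len(subjects)
    estimates = []
    for _ in range(B):
        # Resample subjects, not individual measurements, to keep each
        # subject's longitudinal sequence and event time together.
        resample = [subjects[rng.randrange(n)] for _ in range(n)]
        estimates.append(estimator(resample))
    return stdev(estimates)
```

Because each bootstrap replicate repeats the simulation of pseudo data, the resulting standard error reflects the missing-data uncertainty that the model-based standard errors omit.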


3 Simulations

We evaluated the procedure for both discrete and continuous time-to-event data with a simulation study. For discrete event-time data, we assume that there are potentially five follow-up times ($J = 5$) at discrete times $t_j = j$ ($j = 1, \ldots, 5$) and $I = 300$ subjects having at least two longitudinal measurements (i.e., $\alpha_{01} = \alpha_{02} = -\infty$) with $\alpha_{0j} = 0.50$ for $j = 3, 4,$ and $5$. Table 1 shows the mean and standard deviation for various estimators of $\alpha_1$. These values are provided for estimators in which $b_{i1}$ is assumed known, estimators which use complete simulated data, ORC and RRC, and our proposed approach with and without measurement error correction for different numbers of simulated datasets ($M$). The results show that ORC and RRC have approximately 10% bias (mean estimate 0.45 against a true value of 0.50), while the proposed approach is approximately unbiased (mean estimate 0.49). We also found that choosing $M = 10$ provided a good balance between statistical efficiency and computational efficiency. Further, not incorporating the measurement error correction in the proposed approach had little effect on the results. We found this to be the case even when we increased the measurement error above 1, suggesting that this adjustment is not particularly important for the simple model in which longitudinal measurements and survival are linked through a random slope parameter. The measurement error correction may be more important for a more complex model such as that presented by Ye et al. (2008).

For continuous time-to-event data, the simulation was conducted with an exponential survival distribution with a mean of 5 years when $b_i = 0$, administrative censoring after 5 years, and $\alpha = 0.5$. We also assume that longitudinal measurements are taken at $t_1 = 0$, $t_2 = 0.125$, $t_3 = 0.25$, $t_4 = 0.75$, $t_5 = 1$, $t_6 = 2$, $t_7 = 3$, $t_8 = 4$, and $t_9 = 5$ ($J = 9$), with survival times being categorized into one-year intervals with $K = 5$. Table 2 shows the results of these simulations with $I = 300$, $\alpha = 0.5$, $M = 10$, and different values of the measurement error $\sigma_{\epsilon}$. We present the mean (standard deviation) of parameter estimates with complete longitudinal data, ORC, RRC, and the proposed approach. Although the proposed approach has increasing bias as $\sigma_{\epsilon}$ becomes large, this approach has less bias than both ORC and RRC for all values of $\sigma_{\epsilon}$. Further, we conducted an additional simulation in which measurements $t_3$ to $t_9$ were missing with probability 0.5, creating datasets with


fewer observations on each subject. Results were essentially the same as reported in Table

2, suggesting that our approach does well even with shorter sequences of longitudinal data

(data not shown).
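The continuous-time simulation design can be sketched as follows. This is an illustration under assumptions: the baseline hazard is taken to be constant at $1/5$ per year (matching a mean survival of 5 years when $b_{i1} = 0$), the random effects are uncorrelated, measurements are taken at all scheduled times up to and including the end of follow-up, and $\sigma_{\epsilon} = 0.5$ is one of the values in Table 2. The names used are hypothetical.

```python
import math
import random

# Measurement schedule t_1, ..., t_9 from the text (in years).
MEASUREMENT_TIMES = [0, 0.125, 0.25, 0.75, 1, 2, 3, 4, 5]


def simulate_continuous_subject(rng, alpha=0.5, beta0=1.0, beta1=3.0,
                                sd_b0=1.0, sd_b1=1.0, sd_eps=0.5,
                                mean_survival=5.0, T_E=5.0):
    """Simulate one subject under (3)-(4) with an exponential Cox-type hazard."""
    b0 = rng.gauss(0.0, sd_b0)
    b1 = rng.gauss(0.0, sd_b1)
    # Hazard lambda_0 * exp(alpha * b1) with constant lambda_0 = 1 / mean_survival.
    rate = math.exp(alpha * b1) / mean_survival
    t_event = rng.expovariate(rate)
    censored = t_event > T_E
    follow_up = min(t_event, T_E)
    # Keep only measurements scheduled no later than the end of follow-up.
    x = [(t, beta0 + beta1 * t + b0 + b1 * t + rng.gauss(0.0, sd_eps))
         for t in MEASUREMENT_TIMES if t <= follow_up]
    return {"b1": b1, "follow_up": follow_up, "censored": censored, "x": x}


rng = random.Random(30)
cohort = [simulate_continuous_subject(rng) for _ in range(300)]
```

Each simulated subject always contributes the baseline measurement at $t_1 = 0$, while subjects with early events contribute only the first few closely spaced measurements.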

4 Discussion

This article proposes a simple regression calibration approach for estimating the relationship between longitudinal measurements and time-to-event data which accounts for informative dropout in the longitudinal process. The approach is not completely unbiased since the conditional distribution of the longitudinal process given the event time is approximated by a multivariate normal distribution. Particularly when the longitudinal and time-to-event processes are strongly linked, there may be small amounts of departure from normality. The effect of this lack of normality on bias appears to increase as the measurement error increases. However, in most situations, the bias is substantially smaller than that of the ORC and RRC approaches proposed in Ye et al. The simulation results demonstrate that, in general, the proposed approach results in estimates with increased variance relative to ORC and RRC. More precise estimation is possible under a more parsimonious parameterization. For example, $\beta^{*}_{1T_i}$ in (6) may be modeled as linear in $T_i$.

The proposed approach could be applied to a setting in which the two processes are

linked through the true value of the longitudinal processes and time-to-event distribution.

Further, the approach could be extended to allow for a more complex stochastic process and mean structure for the longitudinal process and for a semi-parametric fixed-effect structure

as proposed by Ye et al. (2008). This would involve fitting model (6) or (9) with a different

smooth curve ϕ(t) and stochastic process Wi(t) for each discretized dropout time. Such a

model could be fit within the framework proposed by Zhang et al. (1998).

Our setup assumes that event times are only administratively censored after a fixed follow-up at the end of the study. For the case in which patients are censored prematurely, dropout times can be imputed based on a model fit using only patients who had the potential to be followed over the entire study duration.


REFERENCES

Carroll, R.J., Spiegelman, C.H., Lan, K.K.G., Bailey, K.T., and Abbott, R.D. (1984). On

errors-in-variables for binary regression models. Biometrika 71, 19-25.

Efron, B. and Tibshirani, R.J. (1993). An Introduction to the Bootstrap. Chapman and Hall: New York.

Hsieh, F., Tseng, Y.K., and Wang, J.L. (2006). Joint modeling of survival and longitudinal data: likelihood approach revisited. Biometrics 62, 1037-1043.

Laird, N.M. and Ware, J.H. (1982). Random-effects models for longitudinal data. Biometrics 38, 963-974.

Song, X., Davidian, M., and Tsiatis, A.A. (2002). A semiparametric likelihood approach for joint modeling of longitudinal data and time-to-event data. Biometrics 58, 742-753.

Tsiatis, A.A. and Davidian, M. (2004). Joint modeling of longitudinal and time-to-event

data: An overview. Statistica Sinica 14, 809-834.

Verbeke, G. and Molenberghs, G. (2000). Linear Mixed Models for Longitudinal Data. New

York: Springer Verlag.

Wu, M.C. and Bailey, K.R. (1989) Estimation and comparison of changes in the presence

of informative right censoring: conditional linear model. Biometrics 45, 939-955.

Wu, M.C. and Follmann, D.A. (1999). Use of summary measurements to adjust for informative missingness in repeated measures data with random effects. Biometrics 55, 75-84.

Zhang, D., Lin, X., Raz, J., and Sowers, M. (1998). Semiparametric stochastic mixed

models for longitudinal data. Journal of the American Statistical Association 93, 710-

719.

Ye, W., Lin, X., and Taylor, J.M.G. (2008). Semiparametric modeling of longitudinal measurements and time-to-event data-A two stage regression calibration approach. Biometrics 64, 1238-1246.


Table 1: Estimates of $\alpha_1$ from model (2)-(4) when $\beta_0 = 1$, $\beta_1 = 3$, $\sigma_{b0} = 1$, $\sigma_{b1} = 1$, and $\sigma_{b0,b1} = 0$. We assume that $\sigma_{\epsilon} = 0.75$, $\alpha_0 = -1.5$, $\alpha_1 = 0.50$, $J = 5$, and $I = 300$. Further, we assume that $t_j = j$ and all individuals who are alive at $t_5 = 5$ are administratively censored at that time point. The means (standard deviations) from 1000 simulations are presented.

Estimator              Mean of $\hat{\alpha}_1$    SD
Known^1                0.50                        0.076
Complete^2             0.50                        0.077
ORC                    0.45                        0.078
RRC                    0.45                        0.075
Prop M=3 w/o MC        0.49                        0.097
Prop M=3 w/ MC         0.49                        0.098
Prop M=10 w/o MC       0.49                        0.090
Prop M=10 w/ MC        0.49                        0.092
Prop M=20 w/o MC       0.49                        0.089
Prop M=20 w/ MC        0.49                        0.090
Prop M=50 w/o MC       0.49                        0.088
Prop M=50 w/ MC        0.49                        0.089
Prop M=100 w/o MC      0.49                        0.087
Prop M=100 w/ MC       0.49                        0.089

^1 Model (2) fit with $b_{i1}$ assumed known.

^2 Model (2) fit with $\hat{b}_{i1}$ replacing $b_{i1}$. The empirical Bayes estimates $\hat{b}_{i1}$ are obtained by fitting (3) and (4) with complete longitudinal measurements.


Table 2: Estimates of $\alpha$ from model (3)-(4) and $\lambda(t, b_{i1}) = \lambda_0(t)\exp(\alpha b_{i1})$ where $I = 300$ and $M = 10$. We also assume that $\beta_0 = 1$, $\beta_1 = 3$, $\sigma_{b0} = 1$, $\sigma_{b1} = 1$, and $\sigma_{b0,b1} = 0$. The means (standard deviations) from 1000 simulations are presented.

Parameters                 Estimators of $\alpha$
$\sigma_{\epsilon}$   $\alpha$    Complete        ORC             RRC             Proposed
0.2                   0.25        0.25 (0.072)    0.22 (0.069)    0.21 (0.071)    0.24 (0.084)
0.2                   0.50        0.50 (0.077)    0.43 (0.072)    0.42 (0.072)    0.48 (0.088)
0.5                   0.25        0.25 (0.072)    0.19 (0.066)    0.19 (0.067)    0.24 (0.104)
0.5                   0.50        0.50 (0.077)    0.37 (0.069)    0.36 (0.068)    0.48 (0.109)
1.0                   0.25        0.25 (0.072)    0.14 (0.067)    0.16 (0.063)    0.23 (0.149)
1.0                   0.50        0.50 (0.078)    0.32 (0.067)    0.30 (0.064)    0.46 (0.155)
