
Biostatistics (2011), 12, 2, pp. 354–368

doi:10.1093/biostatistics/kxq061

Advance Access publication on September 21, 2010

Estimation of the 2-sample hazard ratio function using a semiparametric model

SONG YANG∗

Office of Biostatistics Research, National Heart, Lung, and Blood Institute,

6701 Rockledge Drive, MSC 7913, Bethesda, MD 20892, USA

yangso@nhlbi.nih.gov

ROSS L. PRENTICE

Fred Hutchinson Cancer Research Center, 1100 Fairview Avenue North,

PO Box 19024, Seattle, WA 98109, USA

SUMMARY

The hazard ratio provides a natural target for assessing a treatment effect with survival data, with the

Cox proportional hazards model providing a widely used special case. In general, the hazard ratio is a

function of time and provides a visual display of the temporal pattern of the treatment effect. A variety of

nonproportional hazards models have been proposed in the literature. However, available methods for flexibly estimating a possibly time-dependent hazard ratio are limited. Here, we investigate a semiparametric

model that allows a wide range of time-varying hazard ratio shapes. Point estimates as well as pointwise

confidence intervals and simultaneous confidence bands of the hazard ratio function are established under

this model. The average hazard ratio function is also studied to assess the cumulative treatment effect. We

illustrate corresponding inference procedures using coronary heart disease data from the Women’s Health

Initiative estrogen plus progestin clinical trial.

Keywords: Clinical trial; Empirical process; Gaussian process; Hazard ratio; Simultaneous inference; Survival analysis; Treatment–time interaction.

1. INTRODUCTION

Consider the comparison of failure times between a treated and control group under independent cen-

sorship. The hazard ratio provides a natural target of estimation in many applications since it permits a

focus on relative failure rates across the study follow-up period, without the need to model absolute fail-

ure rates, which may be sensitive to study eligibility criteria and other factors. The proportional hazards

special case of the Cox (1972) regression model is widely used for hazard ratio estimation. The maximum

partial likelihood procedure (Cox, 1975) provides a convenient and robust means of estimating a constant

hazard ratio and yields a log-rank procedure for testing equality of hazards between the 2 groups.

∗To whom correspondence should be addressed.

© The Author 2010. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oxfordjournals.org.


In general, the hazard ratio may be a function of time, and estimation of the hazard ratio function may

provide useful insights into temporal aspects of treatment effects. For example, Gilbert and others (2002)

develop a nonparametric estimation procedure for the log-hazard ratio function with simultaneous confi-

dence bands, for use as an exploratory data analytic tool. Naturally, confidence bands may be wide with

such a nonparametric estimator, particularly at longer follow-up times where data may be sparse. See also

Gray (1992), Kooperberg and others (1995), Cai and Sun (2003), Tian and others (2005), Abrahamowicz

and Mackenzie (2007), and Peng and Huang (2007), and references therein, for additional related work.

Parametric or semiparametric hazard ratio models have potential to contribute valuably to treatment

effect assessment. Hazard ratio models having parameters of useful interpretation, and that embrace a

range of hazard ratio shapes, may be particularly valuable. The Cox model allows time-varying covariates

to be defined that can, for example, allow separate hazard ratios for the elements of a partition of the

time axis or allow the hazard ratio to be a parametric function of follow-up time more generally. Various

other semiparametric regression models have been proposed for failure time data analyses, including

accelerated failure time models, proportional odds models, and linear transformation models, many of

which are embraced by the broad class of models for which Zeng and Lin (2007) develop maximum

likelihood estimation procedures. Further semiparametric models can be found in Vaupel and others

(1979), Hsieh (1996), Chen and Wang (2000), Tsodikov (2002), Yang and Prentice (2005), and Chen and

Cheng (2006). Many of these models induce a semiparametric class of models for the hazard ratio function

that includes proportional hazards as a special case. Hazard ratio estimators under such semiparametric

models can avoid the instability that may attend nonparametric hazard ratio function estimators.

One of these, proposed by Yang and Prentice (2005), involves short-term and long-term hazard ratio

parameters, and a hazard ratio function that depends also on the control group survivor function. Assume

absolutely continuous failure times and label the 2 groups control and treatment, with hazard functions

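$\lambda_C(t)$ and $\lambda_T(t)$, respectively. Let $h(t) = \lambda_T(t)/\lambda_C(t)$ be the hazard ratio function and $S_C(t)$ the survivor function of the control group. The model postulates that

$$h(t) = \frac{1}{e^{-\beta_2} + (e^{-\beta_1} - e^{-\beta_2})\,S_C(t)}, \qquad t < \tau_0, \qquad (1.1)$$

where $\beta_1$ and $\beta_2$ are scalar parameters and

$$\tau_0 = \sup\left\{x : \int_0^x \lambda_C(t)\,dt < \infty\right\}. \qquad (1.2)$$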

This model includes the proportional hazards model and the proportional odds model as special cases. It

has a monotone h(t) with a variety of patterns, including proportional hazards, no initial effect, disappear-

ing effect, and crossing hazards, among others. Thus, the model presumably entails sufficient flexibility

for many applications. It has also been studied for current status data in Tong and others (2007).
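To make these shapes concrete, the short Python sketch below evaluates (1.1) along the control survivor curve. The parameter values are hypothetical, chosen only for illustration, and the helper name `hazard_ratio` is ours. Because $S_C(t)$ falls from 1 at $t = 0$ towards 0, $h(t)$ moves monotonically from $e^{\beta_1}$ (short-term) to $e^{\beta_2}$ (long-term); the row with $\beta_2 = 0$ is the proportional odds special case.

```python
import math


def hazard_ratio(surv_c: float, beta1: float, beta2: float) -> float:
    """Hazard ratio h(t) of model (1.1), expressed through S_C(t) in [0, 1]."""
    return 1.0 / (math.exp(-beta2) + (math.exp(-beta1) - math.exp(-beta2)) * surv_c)


# Hypothetical (beta1, beta2) pairs illustrating the patterns named in the text.
patterns = {
    "proportional hazards": (math.log(1.5), math.log(1.5)),
    "no initial effect": (0.0, math.log(0.5)),
    "disappearing effect": (math.log(0.5), 0.0),
    "crossing hazards": (math.log(1.2), math.log(0.8)),
}

for label, (b1, b2) in patterns.items():
    # Walk along the control survivor curve from S_C = 1 (t = 0) to S_C = 0.
    path = [hazard_ratio(s, b1, b2) for s in (1.0, 0.75, 0.5, 0.25, 0.0)]
    print(f"{label:22s}", " ".join(f"{h:.3f}" for h in path))
```

For $(\beta_1, \beta_2) = (\log 1.2, \log 0.8)$, setting the denominator of (1.1) equal to 1 shows that the curve crosses 1 where $S_C(t) = 0.6$.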

In comparison, for many commonly used special cases of the accelerated failure time model, either $\lim_{t\downarrow 0} h(t) = 1$ or $\lim_{t\uparrow\tau_0} h(t) \in \{0, 1, \infty\}$, and the hazard ratio stays above or below one when $\lambda_C$ is increasing. This is less flexible than desired. For the class of linear transformation models, with the

logarithmic transformation, the hazard ratio also inherits some of these restrictions at many common

baseline distributions. Similar properties hold as well for many other semiparametric models.

Under model (1.1), estimation procedures to date have focused on the finite-dimensional parameters,

as has mostly been the case also for estimation under other semiparametric models. Here, we extend the

estimation to pointwise and simultaneous inference on the hazard ratio function itself. First, consistency

and asymptotic normality of the estimate at a fixed time point are established. Then procedures for con-

structing pointwise confidence intervals and simultaneous confidence bands for the hazard ratio function

are developed, and some modifications are implemented to improve moderate sample size performance.


For additional display of the treatment effect, simultaneous confidence bands are also obtained for the

average hazard ratio function over a time interval. The average hazard ratio gives a summary measure of

treatment comparison and provides a picture of the cumulative treatment effect to augment display of the

temporal pattern of the hazard ratio. These hazard ratio estimation procedures are applied to data from the

Women’s Health Initiative (WHI) estrogen plus progestin clinical trial (Writing Group For the Women’s

Health Initiative Investigators, 2002; Manson and others, 2003), which yielded a hazard ratio function

for the primary coronary heart disease outcome that was decidedly nonproportional. Understanding the

hazard ratio function shape in this setting was important to integrating the clinical trial data with a large

body of preceding observational literature that had failed to identify an early hazard ratio increase (e.g.

Manson and others, 2003; Prentice and others, 2005).

The article is organized as follows. In Section 2, the short-term and long-term hazard ratio model and the hazard ratio estimator are described, and pointwise confidence intervals for the hazard ratio are established. Simultaneous confidence bands for the hazard ratio and the average hazard ratio are provided in

Section 3. Simulation results are presented in Section 4. Application to data from the WHI trial is given in

Section 5. Some concluding remarks are given in Section 6. Proofs of the asymptotic results are contained

in the Supplementary Material available at Biostatistics online.

2. HAZARD RATIO FUNCTION ESTIMATION

Let $T_1,\ldots,T_n$ be the pooled lifetimes of the 2 groups, with $T_1,\ldots,T_{n_1}$, $n_1 < n$, constituting the control group having the survivor function $S_C$. Let $C_1,\ldots,C_n$ be the censoring variables, and $Z_i = I(i > n_1)$, $i = 1,\ldots,n$, where $I(\cdot)$ is the indicator function. The available data consist of the independent triplets $(X_i, \delta_i, Z_i)$, $i = 1,\ldots,n$, where $X_i = \min(T_i, C_i)$ and $\delta_i = I(T_i \le C_i)$. We assume that $T_i$ and $C_i$ are independent given $Z_i$. The censoring variables ($C_i$'s) need not be identically distributed; in particular, the 2 groups may have different censoring patterns. For $t < \tau_0$ with $\tau_0$ defined in (1.2), let $R(t)$ be the odds function $1/S_C(t) - 1$ of the control group. The model of Yang and Prentice (2005) can be expressed as

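$$\lambda_i(t) = \frac{1}{e^{-\beta_1 Z_i} + e^{-\beta_2 Z_i} R(t)}\,\frac{dR(t)}{dt}, \qquad i = 1,\ldots,n, \quad t < \tau_0, \qquad (2.1)$$

where $\lambda_i(t)$ is the hazard function for $T_i$ given $Z_i$. Since setting $Z_i = 0$ in (2.1) gives $\lambda_C(t) = \{dR(t)/dt\}/\{1 + R(t)\}$, the hazard ratio under the model is

$$h(t) = \frac{1 + R(t)}{e^{-\beta_1} + e^{-\beta_2} R(t)}.$$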

To estimate $h(t)$, we need to estimate the parameter $\boldsymbol\beta = (\beta_1, \beta_2)^T$ and the baseline function $R(t)$, where the superscript $T$ denotes transpose. We first review the estimators of Yang and Prentice (2005).

Define

$$K(t) = \sum_{i=1}^n I(X_i \ge t), \qquad H_j(t; b) = \sum_{i=1}^n \delta_i e^{-b_j Z_i} I(X_i \le t), \quad j = 1, 2,$$

where $b = (b_1, b_2)^T$. Let $\tau < \tau_0$ be such that

$$\lim_{n\to\infty} \frac{K(\tau)}{n} > 0 \qquad (2.2)$$

with probability 1. For $t \le \tau$, let

$$\hat P(t; b) = \prod_{s \le t}\left\{1 - \frac{\Delta H_2(s; b)}{K(s)}\right\}, \qquad \hat R(t; b) = \frac{1}{\hat P(t; b)}\int_0^t \frac{\hat P_-(s; b)}{K(s)}\,H_1(ds; b),$$

where $\Delta H_2(s; b)$ denotes the jump of $H_2(s; b)$ in $s$ and $\hat P_-(s; b)$ denotes the left-continuous (in $s$) version of $\hat P(s; b)$. Define the martingale residuals

$$\hat M_i(t; b) = \delta_i I(X_i \le t) - \int_0^t I(X_i \ge s)\,\frac{\hat R(ds; b)}{e^{-b_1 Z_i} + e^{-b_2 Z_i}\hat R(s; b)}, \qquad 1 \le i \le n.$$
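A direct, unoptimized Python transcription of (2.2) and of the martingale residuals is sketched below. It is our own illustration, not the authors' Matlab code: tied uncensored times are pooled at each distinct time, the names `r_hat` and `martingale_residuals` are hypothetical, and it assumes $\hat P(s; b)$ stays positive (Section 4 discusses an exponential alternative). A convenient check is $b = (0, 0)$, where $\hat P$ is the pooled Kaplan–Meier estimator and $\hat R = 1/\hat P - 1$.

```python
import math
from typing import List, Sequence, Tuple


def r_hat(x: Sequence[float], delta: Sequence[int], z: Sequence[int],
          b: Tuple[float, float]) -> List[Tuple[float, float, float]]:
    """Return (s, P_hat(s; b), R_hat(s; b)) at each distinct uncensored time s, following (2.2)."""
    times = sorted({xi for xi, di in zip(x, delta) if di == 1})
    out = []
    p_left = 1.0    # P_hat_-(s; b): product over event times strictly before s
    integral = 0.0  # running value of int_0^s P_hat_-(u; b) / K(u) H_1(du; b)
    for s in times:
        k = sum(1 for xi in x if xi >= s)  # K(s), the number at risk
        events = [zi for xi, di, zi in zip(x, delta, z) if di == 1 and xi == s]
        dh1 = sum(math.exp(-b[0] * zi) for zi in events)  # jump of H_1 at s
        dh2 = sum(math.exp(-b[1] * zi) for zi in events)  # jump of H_2 at s
        integral += p_left * dh1 / k
        p_left *= 1.0 - dh2 / k                           # now P_hat(s; b)
        out.append((s, p_left, integral / p_left))
    return out


def martingale_residuals(x, delta, z, b, tau):
    """M_hat_i(tau; b) for every subject, with R_hat a step function jumping at event times."""
    path = [(s, r) for s, _, r in r_hat(x, delta, z, b) if s <= tau]
    residuals = []
    for xi, di, zi in zip(x, delta, z):
        compensator, r_prev = 0.0, 0.0
        for s, r in path:
            if xi >= s:  # subject i still at risk at s
                compensator += (r - r_prev) / (math.exp(-b[0] * zi) + math.exp(-b[1] * zi) * r)
            r_prev = r
        residuals.append(di * (xi <= tau) - compensator)
    return residuals
```

The quadratic loops favour transparency; a practical version would compute $K(s)$ with one pass over the sorted data.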

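Yang and Prentice (2005) proposed a pseudo maximum likelihood estimator $\hat{\boldsymbol\beta} = (\hat\beta_1, \hat\beta_2)^T$ of $\boldsymbol\beta$, which is the zero of $Q(b)$, where

$$Q(b) = \sum_{i=1}^n \int_0^\tau f_i(t; b)\,\hat M_i(dt; b), \qquad (2.3)$$

with $f_i = (f_{1i}, f_{2i})^T$, where

$$f_{1i}(t; b) = \frac{Z_i e^{-b_1 Z_i}}{e^{-b_1 Z_i} + e^{-b_2 Z_i}\hat R(t; b)}, \qquad f_{2i}(t; b) = \frac{Z_i e^{-b_2 Z_i}\hat R(t; b)}{e^{-b_1 Z_i} + e^{-b_2 Z_i}\hat R(t; b)}.$$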
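The estimating function (2.3) can be evaluated in the same style. Because $f_{1i}$ and $f_{2i}$ carry the factor $Z_i$, only treatment-group subjects contribute. In the sketch below (function names hypothetical; $\hat R$ is recomputed inline so that the block runs on its own), $f_i$ is evaluated at $\hat R(t; b)$ as written in (2.3); Section 4 later recommends the left-continuous version in finite samples. $\hat{\boldsymbol\beta}$ would then be obtained by passing `q_score` to any two-dimensional root finder.

```python
import math
from typing import List, Sequence, Tuple


def r_path(x, delta, z, b) -> List[Tuple[float, float]]:
    """(s, R_hat(s; b)) at distinct uncensored times s, as in (2.2)."""
    times = sorted({xi for xi, di in zip(x, delta) if di == 1})
    path, p_left, integral = [], 1.0, 0.0
    for s in times:
        k = sum(1 for xi in x if xi >= s)
        events = [zi for xi, di, zi in zip(x, delta, z) if di == 1 and xi == s]
        integral += p_left * sum(math.exp(-b[0] * zi) for zi in events) / k
        p_left *= 1.0 - sum(math.exp(-b[1] * zi) for zi in events) / k
        path.append((s, integral / p_left))
    return path


def q_score(x: Sequence[float], delta: Sequence[int], z: Sequence[int],
            b: Tuple[float, float], tau: float) -> List[float]:
    """Q(b) = sum_i int_0^tau f_i(t; b) M_hat_i(dt; b), equation (2.3)."""
    path = [(s, r) for s, r in r_path(x, delta, z, b) if s <= tau]
    q = [0.0, 0.0]
    for xi, di, zi in zip(x, delta, z):
        if zi == 0:
            continue  # f_i = 0 for control subjects
        e1, e2 = math.exp(-b[0] * zi), math.exp(-b[1] * zi)
        r_prev = 0.0
        for s, r in path:
            if s > xi:
                break  # subject no longer at risk
            den = e1 + e2 * r
            f1, f2 = zi * e1 / den, zi * e2 * r / den
            dn = 1.0 if (di == 1 and xi == s) else 0.0  # jump of N_i at s
            dm = dn - (r - r_prev) / den                # increment of M_hat_i at s
            q[0] += f1 * dm
            q[1] += f2 * dm
            r_prev = r
    return q
```

At $b = (0, 0)$, $f_{1i} + f_{2i} = Z_i$, so $Q_1 + Q_2$ reduces to the sum of the treatment-group martingale residuals, which gives a simple hand check.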

Once $\hat{\boldsymbol\beta}$ is obtained, $R(t)$ can be estimated by $\hat R(t; \hat{\boldsymbol\beta})$, and the hazard ratio $h(t)$ can be estimated by

$$\hat h(t) = \frac{1 + \hat R(t; \hat{\boldsymbol\beta})}{e^{-\hat\beta_1} + e^{-\hat\beta_2}\hat R(t; \hat{\boldsymbol\beta})}.$$

In Appendix A of the Supplementary Material available at Biostatistics online, we show that $\hat h(t)$ is strongly consistent for $h(t)$ under model (2.1).

To study the distributional properties of $\hat h(t)$, let

$$W_n(t) = \sqrt n\,\{\hat h(t) - h(t)\}, \qquad t \le \tau.$$

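For the asymptotic distribution of $\hat{\boldsymbol\beta}$, define

$$A(t) = \left(\frac{e^{-\beta_1}}{e^{-\beta_1} + e^{-\beta_2}\hat R(t; \boldsymbol\beta)},\ \frac{e^{-\beta_2}\hat R(t; \boldsymbol\beta)}{e^{-\beta_1} + e^{-\beta_2}\hat R(t; \boldsymbol\beta)}\right)^T,$$

$$K_1(t) = \sum_{i \le n_1} I(X_i \ge t), \qquad K_2(t) = \sum_{i > n_1} I(X_i \ge t),$$

$$\omega(t) = \int_t^\tau \frac{A(s)h(s)K_1(s)K_2(s)\{h(s)e^{-\beta_2} - 1\}}{\{1 + R(s)\}\{1 + \hat R(s; \boldsymbol\beta)\}K(s)}\,\frac{dR(s)}{\hat P(s; \boldsymbol\beta)}.$$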

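From Theorem A2 of Yang and Prentice (2005) and some algebra,

$$Q(\boldsymbol\beta) = \sum_{i \le n_1}\int_0^\tau \mu_1\,dM_i + \sum_{i > n_1}\int_0^\tau \mu_2\,dM_i + o_p(1),$$

where

$$\mu_1(t) = -\frac{A(t)K_2(t)h(t)}{K(t)} + \frac{\hat P_-(t; \boldsymbol\beta)\{1 + \hat R(t; \boldsymbol\beta)\}}{K(t)}\,\omega(t), \qquad \mu_2(t) = \frac{A(t)K_1(t)}{K(t)} + \frac{\hat P_-(t; \boldsymbol\beta)\{e^{-\beta_1} + e^{-\beta_2}\hat R(t; \boldsymbol\beta)\}}{K(t)}\,\omega(t), \qquad (2.4)$$

$$M_i(t) = \delta_i I(X_i \le t) - \int_0^t I(X_i \ge s)\,\frac{dR(s)}{e^{-\beta_1 Z_i} + e^{-\beta_2 Z_i}R(s)}, \qquad i = 1,\ldots,n.$$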

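Now for $\hat R(t; \hat{\boldsymbol\beta})$, from Lemma A3 in Yang and Prentice (2005) and some algebra,

$$\sqrt n\,\{\hat R(t; \boldsymbol\beta) - R(t)\} = \frac{1}{\sqrt n\,\hat P(t; \boldsymbol\beta)}\left\{\sum_{i \le n_1}\int_0^t \nu_1\,dM_i + \sum_{i > n_1}\int_0^t \nu_2\,dM_i\right\}, \qquad (2.5)$$

where

$$\nu_1(t) = \frac{n\hat P_-(t; \boldsymbol\beta)}{K(t)}\{1 + R(t)\}, \qquad \nu_2(t) = \frac{n\hat P_-(t; \boldsymbol\beta)}{K(t)}\{e^{-\beta_1} + e^{-\beta_2}R(t)\}.$$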

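Let

$$D(t; \boldsymbol\beta) = \frac{\partial \hat R(t; \boldsymbol\beta)}{\partial \boldsymbol\beta}, \qquad U = \left\{-\frac{1}{n}\frac{\partial Q(\boldsymbol\beta)}{\partial \boldsymbol\beta}\right\}^{-1},$$

$$B(t) = h(t)A(t) + \frac{e^{-\beta_1} - e^{-\beta_2}}{\{e^{-\beta_1} + e^{-\beta_2}R(t)\}^2}\,D(t; \boldsymbol\beta), \qquad C(t) = \frac{e^{-\beta_1} - e^{-\beta_2}}{\{e^{-\beta_1} + e^{-\beta_2}R(t)\}^2}\,\frac{1}{\hat P(t; \boldsymbol\beta)}.$$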
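The forms of $B(t)$ and $C(t)$ can be traced to a first-order expansion of $h = (1 + R)/(e^{-\beta_1} + e^{-\beta_2}R)$ in $(\boldsymbol\beta, R)$. The following worked step is our own reconstruction of that calculation, offered only as a reading aid. Direct differentiation gives

$$\frac{\partial h}{\partial \beta_1} = h\,\frac{e^{-\beta_1}}{e^{-\beta_1} + e^{-\beta_2}R}, \qquad \frac{\partial h}{\partial \beta_2} = h\,\frac{e^{-\beta_2}R}{e^{-\beta_1} + e^{-\beta_2}R}, \qquad \frac{\partial h}{\partial R} = \frac{e^{-\beta_1} - e^{-\beta_2}}{(e^{-\beta_1} + e^{-\beta_2}R)^2}.$$

Combining these with $\hat R(t; \hat{\boldsymbol\beta}) - R(t) \approx \{\hat R(t; \boldsymbol\beta) - R(t)\} + D^T(t; \boldsymbol\beta)(\hat{\boldsymbol\beta} - \boldsymbol\beta)$ and $\sqrt n(\hat{\boldsymbol\beta} - \boldsymbol\beta) \approx U Q(\boldsymbol\beta)/\sqrt n$ (a Taylor expansion of $Q(\hat{\boldsymbol\beta}) = 0$) gives

$$\sqrt n\,\{\hat h(t) - h(t)\} \approx B^T(t)\,\frac{U Q(\boldsymbol\beta)}{\sqrt n} + \frac{e^{-\beta_1} - e^{-\beta_2}}{\{e^{-\beta_1} + e^{-\beta_2}R(t)\}^2}\,\sqrt n\,\{\hat R(t; \boldsymbol\beta) - R(t)\},$$

where the gradient $\partial h/\partial \boldsymbol\beta$ equals $h(t)A(t)$ once $R(t)$ inside $A$ is replaced by its consistent estimate $\hat R(t; \boldsymbol\beta)$. Substituting the representation of $Q(\boldsymbol\beta)$ and (2.5) then produces the two terms of (2.6), with $C(t)$ absorbing the factor $1/\hat P(t; \boldsymbol\beta)$ from (2.5).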

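For $t \le \tau$, define the process

$$\tilde W_n(t) = \frac{B^T(t)U}{\sqrt n}\left\{\sum_{i \le n_1}\int_0^\tau \mu_1\,dM_i + \sum_{i > n_1}\int_0^\tau \mu_2\,dM_i\right\} + \frac{C(t)}{\sqrt n}\left\{\sum_{i \le n_1}\int_0^t \nu_1\,dM_i + \sum_{i > n_1}\int_0^t \nu_2\,dM_i\right\}. \qquad (2.6)$$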

With the representations for $Q(\boldsymbol\beta)$ and $\sqrt n\{\hat R(t; \boldsymbol\beta) - R(t)\}$, we show in Appendix B of the Supplementary Material available at Biostatistics online that $W_n$ is asymptotically equivalent to $\tilde W_n$, which converges weakly to a zero-mean Gaussian process $W^*$. The weak convergence of $W_n$ thus follows. The limiting covariance function $\sigma(s, t)$ of $W^*$ involves the derivative $D(t; \boldsymbol\beta)$ and the derivative matrix in $U$. Although analytic forms of these derivatives are available, they are complicated and cumbersome, so we approximate them by numerical derivatives. For the functions $B(t)$, $C(t)$, $\mu_1(t)$, $\mu_2(t)$, $\nu_1(t)$, and $\nu_2(t)$, define the corresponding $\hat B(t)$, $\hat C(t)$, $\ldots$, by replacing $\boldsymbol\beta$ with $\hat{\boldsymbol\beta}$, $R(t)$ with $\hat R(t; \hat{\boldsymbol\beta})$, and $D(t; \boldsymbol\beta)$ with the numerical derivatives. Similarly, let $\hat U$ be the numerical approximation of $U$. Simulation studies show


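that the results are fairly stable with respect to the choice of the step size in the numerical derivatives, and that a step size of $n^{-1/2}$ works well. With these approximations, we can estimate $\sigma(s, t)$, $s \le t \le \tau$, by

$$\begin{aligned}
\hat\sigma(s, t) ={}& \hat B^T(s)\hat U\left[\int_0^\tau \frac{\hat\mu_1(w)\hat\mu_1^T(w)K_1(w) + \hat\mu_2(w)\hat\mu_2^T(w)K_2(w)\hat h(w)}{n\{1 + \hat R(w; \hat{\boldsymbol\beta})\}}\,\hat R(dw; \hat{\boldsymbol\beta})\right]\hat U^T\hat B(t)\\
&+ \hat C(s)\hat C(t)\int_0^s \frac{\hat\nu_1^2(w)K_1(w) + \hat\nu_2^2(w)K_2(w)\hat h(w)}{n\{1 + \hat R(w; \hat{\boldsymbol\beta})\}}\,\hat R(dw; \hat{\boldsymbol\beta})\\
&+ \hat C(t)\hat B^T(s)\hat U\int_0^t \frac{\hat\mu_1(w)\hat\nu_1(w)K_1(w) + \hat\mu_2(w)\hat\nu_2(w)K_2(w)\hat h(w)}{n\{1 + \hat R(w; \hat{\boldsymbol\beta})\}}\,\hat R(dw; \hat{\boldsymbol\beta})\\
&+ \hat C(s)\hat B^T(t)\hat U\int_0^s \frac{\hat\mu_1(w)\hat\nu_1(w)K_1(w) + \hat\mu_2(w)\hat\nu_2(w)K_2(w)\hat h(w)}{n\{1 + \hat R(w; \hat{\boldsymbol\beta})\}}\,\hat R(dw; \hat{\boldsymbol\beta}). \qquad (2.7)
\end{aligned}$$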
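The numerical derivatives behind $\hat U$ and $D(t; \hat{\boldsymbol\beta})$ can be sketched as below. The article specifies only the step size $n^{-1/2}$; the use of central differences, the function names, and the closed-form 2 × 2 inverse are our assumptions. Here `q_fun` stands for any routine returning $Q(b)$, for example a hypothetical wrapper around an implementation of (2.3).

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def numerical_jacobian(fun: Callable[[Vector], Vector], x: Vector, n: int) -> List[List[float]]:
    """Central-difference Jacobian J[i][j] = d fun_i / d x_j with step n ** -0.5."""
    step = 1.0 / math.sqrt(n)
    columns = []
    for j in range(len(x)):
        up, down = list(x), list(x)
        up[j] += step
        down[j] -= step
        columns.append([(a - b) / (2.0 * step) for a, b in zip(fun(up), fun(down))])
    # Transpose so that rows index outputs and columns index inputs.
    return [list(row) for row in zip(*columns)]


def inverse_2x2(m: List[List[float]]) -> List[List[float]]:
    """Inverse of a 2 x 2 matrix via the adjugate formula."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0.0:
        raise ZeroDivisionError("singular matrix")
    return [[d / det, -b / det], [-c / det, a / det]]


def u_hat(q_fun: Callable[[Vector], Vector], beta_hat: Vector, n: int) -> List[List[float]]:
    """U_hat = {-(1/n) dQ/db at b = beta_hat}^{-1}, with dQ/db computed numerically."""
    jac = numerical_jacobian(q_fun, beta_hat, n)
    return inverse_2x2([[-v / n for v in row] for row in jac])
```

The same `numerical_jacobian` applied to $b \mapsto \hat R(t; b)$ at $b = \hat{\boldsymbol\beta}$ would give the numerical version of $D(t; \boldsymbol\beta)$.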

Now for a fixed $t_0 \le \tau$, from the above results, confidence intervals for $h(t_0)$ can be obtained from the asymptotic normality of $\hat h(t_0)$ and the estimated variance $\hat\sigma(t_0, t_0)$. The usual logarithmic transformation results in the asymptotic $100(1 - \alpha)\%$ confidence interval

$$\hat h(t_0)\exp\left\{\mp\frac{z_{\alpha/2}\sqrt{\hat\sigma(t_0, t_0)}}{\sqrt n\,\hat h(t_0)}\right\},$$

where $z_{\alpha/2}$ is the $100(1 - \alpha/2)\%$ percentile of the standard normal distribution.
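For concreteness, the log-transformed interval can be computed as follows. The inputs are hypothetical numbers standing in for $\hat h(t_0)$, $\hat\sigma(t_0, t_0)$, and $n$, and the function names are ours; the standard normal quantile comes from Python's `statistics.NormalDist`. The interval is symmetric about $\hat h(t_0)$ on the log scale, and its lower limit is always positive.

```python
import math
from statistics import NormalDist
from typing import Tuple


def log_ci(h_hat: float, sigma_hat: float, n: int, crit: float) -> Tuple[float, float]:
    """h_hat * exp(-/+ crit * sqrt(sigma_hat) / (sqrt(n) * h_hat))."""
    half_width = crit * math.sqrt(sigma_hat) / (math.sqrt(n) * h_hat)
    return h_hat * math.exp(-half_width), h_hat * math.exp(half_width)


def pointwise_ci(h_hat: float, sigma_hat: float, n: int, alpha: float = 0.05) -> Tuple[float, float]:
    """Asymptotic 100(1 - alpha)% pointwise interval for h(t0) on the log scale."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # z_{alpha/2}
    return log_ci(h_hat, sigma_hat, n, z)


# Hypothetical inputs: h_hat(t0) = 1.5, sigma_hat(t0, t0) = 4.0, n = 400.
print(pointwise_ci(1.5, 4.0, 400))
```

Passing a resampled critical value $c_\alpha$ to `log_ci` in place of $z_{\alpha/2}$ gives the equal precision band of Section 3.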

3. SIMULTANEOUS CONFIDENCE BANDS

To make simultaneous inference on $h(t)$ over a time interval $I = [a, b] \subset [0, \tau]$, consider

$$V_n(t) = \frac{\sqrt n\,\hat h(t)}{s(t)}\{\ln \hat h(t) - \ln h(t)\},$$

where $s(t)$ converges in probability, uniformly in $t$ over $I$, to a bounded function $s^*(t) > 0$. From the weak convergence of $W_n$ to $W^*$ and the functional delta method, $V_n$ converges weakly to $W^*/s^*$. Thus, if $c_\alpha$ is the upper $\alpha$th percentile of $\sup_{t \in I}|W^*/s^*|$, an asymptotic $100(1 - \alpha)\%$ simultaneous confidence band for $h(t)$, $t \in I$, can be obtained as

$$\hat h(t)\exp\left\{\mp\frac{c_\alpha\,s(t)}{\sqrt n\,\hat h(t)}\right\}.$$

It is difficult to obtain $c_\alpha$ analytically. One obvious alternative would be the bootstrap. However, it is very time-consuming and results in lower than nominal coverage probabilities in some simulation studies. Lin and others (1993) used a normal resampling approximation to simulate the asymptotic


distribution of sums of martingale residuals for checking the Cox regression model. The normal resam-

pling approach reduces computing time significantly and has become a standard method. It has been used

in many works, including Lin and others (1994), Cheng and others (1997), Gilbert and others (2002),

Tian and others (2005), and Peng and Huang (2007). We will modify this approach for our problem here.

For $t \le \tau$, define the process

$$\begin{aligned}
\hat W_n(t) ={}& \frac{\hat B^T(t)\hat U}{\sqrt n}\left\{\sum_{i \le n_1}\int_0^\tau \hat\mu_1\,d(\xi_i N_i) + \sum_{i > n_1}\int_0^\tau \hat\mu_2\,d(\xi_i N_i)\right\} + \frac{\hat C(t)}{\sqrt n}\left\{\sum_{i \le n_1}\int_0^t \hat\nu_1\,d(\xi_i N_i) + \sum_{i > n_1}\int_0^t \hat\nu_2\,d(\xi_i N_i)\right\}\\
={}& \frac{\hat B^T(t)\hat U}{\sqrt n}\left\{\sum_{i \le n_1}\xi_i\delta_i\hat\mu_1(X_i)I(X_i \le \tau) + \sum_{i > n_1}\xi_i\delta_i\hat\mu_2(X_i)I(X_i \le \tau)\right\}\\
&+ \frac{\hat C(t)}{\sqrt n}\left\{\sum_{i \le n_1}\xi_i\delta_i\hat\nu_1(X_i)I(X_i \le t) + \sum_{i > n_1}\xi_i\delta_i\hat\nu_2(X_i)I(X_i \le t)\right\}, \qquad (3.1)
\end{aligned}$$

where $N_i(t) = \delta_i I(X_i \le t)$ is the counting process of subject $i$, and $\xi_i$, $i = 1,\ldots,n$, are independent variables that are also independent of the data. Furthermore, these variables have mean zero and variance converging to one as $n \to \infty$. In the normal resampling approach mentioned above, the $\xi_i$'s are standard normal variables. However, standard normal variables often result in lower coverage probabilities in various simulation studies. Thus, with moderate-sized samples, some adjustment is needed.

Conditional on $(X_i, \delta_i, Z_i)$, $i = 1,\ldots,n$, $\hat W_n$ is a sum of $n$ independent variables at each time point. In Appendix B of the Supplementary Material available at Biostatistics online, we show that $\hat W_n$ given the data converges weakly to $W^*$. It follows that $\sup_{t \in I}|\hat W_n/s|$ given the data converges in distribution to $\sup_{t \in I}|W^*/s^*|$. Therefore, $c_\alpha$ can be estimated empirically from a large number of realizations of the conditional distribution of $\sup_{t \in I}|\hat W_n/s|$ given the data.

Several choices of the weight $s$ arise from recommendations in the literature for confidence bands of the survivor function and the cumulative hazard function in the one-sample case. The choice $s(t) = \sqrt{\hat\sigma(t, t)}$ results in equal precision bands (Nair, 1984), which differ from pointwise confidence intervals in that $c_\alpha$ replaces $z_{\alpha/2}$. The choice $s(t) = 1 + \hat\sigma(t, t)$ results in the Hall–Wellner type bands recommended by Bie and others (1987), which often have narrower widths in the middle of the data range and wider widths near the extremes of the data range (Lin and others, 1994). One could also choose $s(t) = \hat h(t)$. This choice does not involve $\hat\sigma(t, t)$ and hence is easier to implement. It may be adequate when $\hat\sigma(t, t)$ varies only mildly over time.

Let $a \in (0, \tau)$ and define the average hazard ratio over $[a, t]$,

$$\bar h(t) = \frac{1}{t - a}\int_a^t h(s)\,ds, \qquad a < t < \tau.$$

Note that the average hazard ratio involves an integral of the hazard ratio rather than a ratio of integrated hazards. It provides a measure of the cumulative treatment effect over a time interval, augmenting the display of the temporal effect given by the hazard ratio estimates. It can be estimated by

$$\hat{\bar h}(t) = \frac{1}{t - a}\int_a^t \hat h(s)\,ds, \qquad a < t < \tau.$$
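Because $\hat R(t; \hat{\boldsymbol\beta})$ is a right-continuous step function that jumps only at uncensored times, so is $\hat h(t)$, and the integral defining $\hat{\bar h}(t)$ can be evaluated exactly as a sum of rectangles. A minimal sketch with hypothetical names and inputs follows; `h0` is the value $e^{\hat\beta_1}$ that $\hat h$ takes before the first event, where $\hat R = 0$.

```python
import bisect
import math
from typing import Sequence


def h_from_r(r: float, b1: float, b2: float) -> float:
    """h = (1 + R) / (exp(-b1) + exp(-b2) R), evaluated at an estimate of R."""
    return (1.0 + r) / (math.exp(-b1) + math.exp(-b2) * r)


def average_step(jump_times: Sequence[float], values: Sequence[float],
                 h0: float, a: float, t: float) -> float:
    """(1 / (t - a)) * integral_a^t of a right-continuous step function.

    The function equals h0 before jump_times[0] and values[k] on
    [jump_times[k], jump_times[k + 1]); jump_times must be increasing.
    """
    if not a < t:
        raise ValueError("need a < t")
    knots = [a] + [s for s in jump_times if a < s < t] + [t]
    total = 0.0
    for left, right in zip(knots[:-1], knots[1:]):
        k = bisect.bisect_right(jump_times, left) - 1  # last jump at or before `left`
        total += (h0 if k < 0 else values[k]) * (right - left)
    return total / (t - a)


# Hypothetical estimates: beta_hat and R_hat at three event times.
b1, b2 = 0.4, -0.3
times, r_vals = [1.0, 3.0, 4.0], [0.2, 0.6, 1.4]
h_vals = [h_from_r(r, b1, b2) for r in r_vals]
print(average_step(times, h_vals, math.exp(b1), a=0.5, t=3.5))
```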


To obtain simultaneous confidence bands for the average hazard ratio, let

$$\bar W_n(t) = \sqrt n\,\{\hat{\bar h}(t) - \bar h(t)\}, \qquad a < t < \tau.$$

In Appendix B of the Supplementary Material available at Biostatistics online, we show that $\bar W_n(t)$ converges weakly to the zero-mean Gaussian process $\int_a^t W^*(s)\,ds/(t - a)$. Also, $\hat{\bar h}(t)$ behaves more stably than $\hat h(t)$, and its covariance function is comparatively insensitive to the choice of weight function. Hence, for simplicity, we consider only the process

$$\bar V_n(t) = \sqrt n\,\{\ln \hat{\bar h}(t) - \ln \bar h(t)\}.$$

From the functional delta method, it follows that $\bar V_n(t)$ converges weakly to $\int_a^t W^*(s)\,ds/\{(t - a)\bar h(t)\}$. Thus, if $\bar c_\alpha$ is the upper $\alpha$th percentile of $\sup_{t \in [a, b]}\left|\int_a^t W^*(s)\,ds/\{(t - a)\bar h(t)\}\right|$, an asymptotic $100(1 - \alpha)\%$ simultaneous confidence band for $\bar h(t)$, $t \in I$, can be obtained as

$$\hat{\bar h}(t)\exp\left(\mp\frac{\bar c_\alpha}{\sqrt n}\right).$$

To approximate the critical value $\bar c_\alpha$, we again use a resampling approximation. In Appendix B of the Supplementary Material available at Biostatistics online, the process $\int_a^t \hat W_n(s)\,ds/(t - a)$ given the data is shown to converge weakly to $\int_a^t W^*(s)\,ds/(t - a)$. From this and the strong consistency of $\hat{\bar h}(t)$, $\bar c_\alpha$ can be approximated empirically from a large number of realizations of the conditional distribution of $\sup_{t \in [a, b]}\left|\int_a^t \hat W_n(s)\,ds/\{(t - a)\hat{\bar h}(t)\}\right|$ given the data.
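The resampling steps for $c_\alpha$ and $\bar c_\alpha$ are sketched below under simplifying assumptions of our own. We take as given a hypothetical matrix `contrib[i][k]` holding subject $i$'s summand in (3.1) at grid time $t_k$, without its resampling variable and without the $1/\sqrt n$ factor, so that each resampled path is $n^{-1/2}\sum_i$ (normal draw) × `contrib[i][k]`. The integral over $[a, t]$ is approximated by the trapezoidal rule on the grid, and `weights` holds $s(t_k)$: $\sqrt{\hat\sigma(t_k, t_k)}$ for EP, $1 + \hat\sigma(t_k, t_k)$ for HW, or $\hat h(t_k)$ for UW. All names are hypothetical.

```python
import math
import random
from typing import List, Sequence


def sup_draws(contrib: Sequence[Sequence[float]], weights: Sequence[float],
              n_draws: int = 1000, multiplier: float = 1.0, seed: int = 0) -> List[float]:
    """Realizations of sup_k |W_hat(t_k) / s(t_k)| given the data, with draws multiplier * N(0, 1)."""
    rng = random.Random(seed)
    n, grid = len(contrib), len(weights)
    draws = []
    for _ in range(n_draws):
        xi = [multiplier * rng.gauss(0.0, 1.0) for _ in range(n)]
        path = [sum(xi[i] * contrib[i][k] for i in range(n)) / math.sqrt(n) for k in range(grid)]
        draws.append(max(abs(path[k]) / weights[k] for k in range(grid)))
    return draws


def sup_draws_average(contrib: Sequence[Sequence[float]], times: Sequence[float],
                      hbar_hat: Sequence[float], n_draws: int = 1000, seed: int = 0) -> List[float]:
    """Realizations of sup_k |int_a^{t_k} W_hat(s) ds / {(t_k - a) hbar_hat(t_k)}|; times[0] is a."""
    rng = random.Random(seed)
    n, grid = len(contrib), len(times)
    a = times[0]
    draws = []
    for _ in range(n_draws):
        xi = [rng.gauss(0.0, 1.0) for _ in range(n)]  # no multiplier for the average
        path = [sum(xi[i] * contrib[i][k] for i in range(n)) / math.sqrt(n) for k in range(grid)]
        integral, best = 0.0, 0.0
        for k in range(1, grid):
            integral += 0.5 * (path[k - 1] + path[k]) * (times[k] - times[k - 1])
            best = max(best, abs(integral / ((times[k] - a) * hbar_hat[k])))
        draws.append(best)
    return draws


def upper_quantile(draws: Sequence[float], alpha: float) -> float:
    """Empirical upper alpha-th percentile (order statistic ceil((1 - alpha) * B))."""
    ordered = sorted(draws)
    index = min(len(ordered) - 1, max(0, math.ceil((1.0 - alpha) * len(ordered)) - 1))
    return ordered[index]
```

For the hazard ratio band, $c_\alpha$ would be `upper_quantile(sup_draws(...), alpha)`, plugged into $\hat h(t)\exp\{\mp c_\alpha s(t)/(\sqrt n\,\hat h(t))\}$; Section 4 describes when `multiplier` should differ from 1.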

4. SIMULATION STUDIES

Without any finite-sample modifications, the empirical coverage probabilities of the proposed confidence bands for the hazard ratio were often found to be lower than the nominal levels for small samples, especially with substantial censoring. In a series of simulation studies, we went through an extensive trial-and-error process to evaluate various modifications. In the end, we recommend that the left-continuous versions of the integrands in (2.3) be used. Also, instead of $\hat P(t; b)$, we use the asymptotically equivalent form $\exp\{-\int_0^t H_2(ds; b)/K(s)\}$. In addition, it is best to restrict to the time range $[\inf\kappa, \sup\kappa]$, where $\kappa$ is the set of observations at which the weight function $s(t)$ is less than or equal to the 90th percentile of $s(t_i)$, $i = 1,\ldots,n$, with the $t_i$'s being the uncensored observations. This restriction is similar in spirit to the recommendations of Nair (1984) and Bie and others (1987), except that we measure the extremeness of the data by $s(t_i)$. For the hazard ratio and small to moderate $n$, we choose the $\xi_i$'s in (3.1) to be a multiple of standard normal variables, with an ad hoc multiplier of $1 + 1/(2\sqrt n)$ based on various simulations. For $n$ equal to 400 or larger, standard normal variables can be used. For the average hazard ratio, no such multiplier adjustment is necessary.

Next, we report the results from some representative simulation studies. Here and for the real data application in Section 5, $\tau$ was set to exclude the last order statistic. All numerical computations were done in Matlab. First, under the model of Yang and Prentice (2005), lifetime variables were generated with $R(t)$ chosen to yield the standard exponential distribution for the control group. The values of $\boldsymbol\beta$ were $(\log 0.9, \log 1.2)$ and $(\log 1.2, \log 0.8)$, representing a 1/3 increase or decrease over time from the initial hazard ratio, respectively. The censoring variables were independent and identically distributed with the log-normal distribution, where the normal distribution had mean $c$ and standard deviation 0.5, with $c$ chosen to achieve various censoring rates. The empirical coverage probabilities were obtained


from 1000 repetitions, and for each repetition, the critical values $c_\alpha$ and $\bar c_\alpha$ were calculated empirically from 1000 realizations of the relevant conditional distributions. The results of these simulations are summarized in Table 1, where the equal precision bands, Hall–Wellner type bands, and unweighted bands for the hazard ratio are denoted by EP, HW, and UW, respectively. Results for simultaneous confidence bands of the average hazard ratio are also included, under the column header $\bar h$. From Table 1, the empirical coverage probabilities for the hazard ratio were mostly close to the nominal level. The empirical

coverage probabilities for the average hazard ratio were mostly conservative. The conservative results

were partially due to the finite-sample modifications intended for the hazard ratio. Those modifications

improved the performance of the hazard ratio estimation procedure under some scenarios, while yield-

ing conservatism in others, particularly for the more stable average hazard ratio estimator. The coverage

probabilities for the equal precision bands overall were closer to the nominal level than other types of

bands.
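The finite-sample modifications recommended at the start of this section are simple to code. The sketch below is illustrative only: the percentile convention (linear interpolation between order statistics) is our assumption, since the article does not state one, and all function names are hypothetical. Unlike $\hat P(t; b)$, the exponential form is always positive.

```python
import math
from typing import List, Sequence, Tuple


def resampling_multiplier(n: int) -> float:
    """Ad hoc scale for the normal resampling variables (hazard ratio bands only)."""
    return 1.0 if n >= 400 else 1.0 + 1.0 / (2.0 * math.sqrt(n))


def p_tilde(jumps: Sequence[Tuple[float, int]]) -> List[float]:
    """exp{-int_0^t H_2(ds; b) / K(s)} at successive event times, from (dH_2(s; b), K(s)) pairs."""
    out, cumulative = [], 0.0
    for dh2, at_risk in jumps:
        cumulative += dh2 / at_risk
        out.append(math.exp(-cumulative))
    return out


def percentile_linear(values: Sequence[float], q: float) -> float:
    """q-th quantile (0 <= q <= 1) by linear interpolation between order statistics."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def restricted_range(event_times: Sequence[float], weights: Sequence[float],
                     q: float = 0.9) -> Tuple[float, float]:
    """[inf kappa, sup kappa]: kappa holds uncensored times with s(t_i) <= 90th percentile of s(t_i)."""
    cutoff = percentile_linear(weights, q)
    kappa = [t for t, s in zip(event_times, weights) if s <= cutoff]
    return min(kappa), max(kappa)
```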

To check the robustness of the proposed procedures, we carried out various simulation studies with monotone hazard ratios not satisfying the model of Yang and Prentice (2005). For Table 2, the control

group lifetime variables were standard exponential. The hazard ratio was linear from 0 to the 99th per-

centile of the standard exponential and continuous and constant afterward. The initial and end hazard

ratios again were (0.9,1.2) and (1.2,0.8), respectively, and the censoring variables were the same as

before. From Table 2, the results are similar to those in Table 1, with slight undercoverage under some

scenarios.
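Data for the first simulation design (Table 1) can be generated by inverse-transform sampling. With $R(t) = e^t - 1$ (standard exponential control), integrating (2.1) for $Z_i = 1$ gives the treatment survivor function $S_T(t) = [1 + e^{\beta_1 - \beta_2}\{e^t - 1\}]^{-e^{\beta_2}}$, and solving $S_T(T) = U$ for a uniform $U$ yields the closed form used below. This derivation and the code are our own sketch of the design, not the authors' simulation program, and the censoring location `c` would still have to be tuned to reach a target censoring rate.

```python
import math
import random
from typing import List, Tuple


def simulate(n1: int, n2: int, beta: Tuple[float, float], c: float,
             seed: int = 0) -> List[Tuple[float, int, int]]:
    """Triplets (X_i, delta_i, Z_i): controls first (Z = 0), then treated (Z = 1)."""
    rng = random.Random(seed)
    b1, b2 = beta
    data = []
    for i in range(n1 + n2):
        z = 0 if i < n1 else 1
        u = 1.0 - rng.random()  # uniform on (0, 1]
        if z == 0:
            t = -math.log(u)  # standard exponential control lifetime
        else:
            # Solve S_T(t) = u for R(t), then invert R(t) = exp(t) - 1.
            r = math.exp(b2 - b1) * (u ** (-math.exp(-b2)) - 1.0)
            t = math.log1p(r)
        cens = rng.lognormvariate(c, 0.5)  # log-normal censoring, sd 0.5 on the log scale
        data.append((min(t, cens), int(t <= cens), z))
    return data


def surv_treat(t: float, b1: float, b2: float) -> float:
    """Treatment survivor function implied by (2.1) with R(t) = exp(t) - 1."""
    return (1.0 + math.exp(b1 - b2) * math.expm1(t)) ** (-math.exp(b2))


sample = simulate(40, 40, (math.log(0.9), math.log(1.2)), c=1.0, seed=1)
print(sum(1 - d for _, d, _ in sample) / len(sample))  # observed censoring fraction
```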

Table 1. Empirical coverage probabilities of the simultaneous confidence bands for the hazard ratio (EP, HW, and UW) and the average hazard ratio (h̄), under the model of Yang and Prentice (2005), based on 1000 repetitions

Hazard ratio  n1 = n2  Censoring rate (%)  EP     HW     UW     h̄
0.9 ↑ 1.2     40       10                  0.954  0.946  0.963  0.973
                       30                  0.952  0.946  0.961  0.970
                       50                  0.971  0.960  0.976  0.977
                       75                  0.967  0.966  0.977  0.964
              80       10                  0.955  0.957  0.959  0.963
                       30                  0.947  0.940  0.955  0.962
                       50                  0.955  0.943  0.956  0.965
                       75                  0.967  0.979  0.979  0.976
              160      10                  0.960  0.966  0.966  0.977
                       30                  0.954  0.950  0.951  0.969
                       50                  0.941  0.937  0.940  0.964
                       75                  0.960  0.970  0.971  0.967
1.2 ↓ 0.8     40       10                  0.966  0.980  0.983  0.976
                       30                  0.936  0.948  0.967  0.980
                       50                  0.943  0.948  0.954  0.967
                       75                  0.956  0.959  0.964  0.966
              80       10                  0.959  0.974  0.975  0.971
                       30                  0.926  0.946  0.945  0.964
                       50                  0.930  0.946  0.939  0.953
                       75                  0.959  0.966  0.968  0.965
              160      10                  0.957  0.973  0.963  0.967
                       30                  0.949  0.965  0.945  0.968
                       50                  0.944  0.962  0.947  0.970
                       75                  0.951  0.957  0.954  0.961


Table 2. Empirical coverage probabilities of the simultaneous confidence bands for the hazard ratio (EP, HW, and UW) and the average hazard ratio (h̄), under a monotone hazard ratio model not satisfying the model of Yang and Prentice (2005), based on 1000 repetitions

Hazard ratio  n1 = n2  Censoring rate (%)  EP     HW     UW     h̄
0.9 ↑ 1.2     40       10                  0.955  0.957  0.954  0.973
                       30                  0.965  0.952  0.964  0.976
                       50                  0.945  0.941  0.960  0.962
                       75                  0.971  0.972  0.9754 0.970
              80       10                  0.959  0.963  0.958  0.983
                       30                  0.935  0.943  0.940  0.968
                       50                  0.938  0.943  0.937  0.956
                       75                  0.956  0.958  0.965  0.955
              160      10                  0.963  0.964  0.950  0.974
                       30                  0.952  0.949  0.937  0.966
                       50                  0.940  0.935  0.920  0.960
                       75                  0.957  0.969  0.976  0.971
1.2 ↓ 0.8     40       10                  0.976  0.969  0.975  0.982
                       30                  0.952  0.956  0.967  0.970
                       50                  0.955  0.963  0.966  0.961
                       75                  0.966  0.970  0.975  0.967
              80       10                  0.965  0.967  0.969  0.975
                       30                  0.954  0.967  0.969  0.972
                       50                  0.941  0.948  0.960  0.968
                       75                  0.965  0.965  0.971  0.973
              160      10                  0.969  0.977  0.960  0.976
                       30                  0.963  0.967  0.969  0.967
                       50                  0.937  0.943  0.941  0.963
                       75                  0.955  0.963  0.970  0.970

5. APPLICATION

Let us illustrate the proposed methods with data from the WHI randomized controlled trial of combined

(estrogen plus progestin) postmenopausal hormone therapy, which reported an elevated coronary heart disease risk and overall unfavorable health benefits versus risks over a 5.6-year study period (Writing Group

For the Women’s Health Initiative Investigators, 2002; Manson and others, 2003). Few research reports

have stimulated as much public response, since preceding observational research literature suggested a

40–50% reduction in coronary heart disease incidence among women taking postmenopausal hormone

therapy. Analysis of the WHI observational study shows a similar discrepancy with the WHI clinical trial

for each of coronary heart disease, stroke, and venous thromboembolism. The discrepancy is partially

explained by confounding in the observational study. A remaining source of discrepancy between the

clinical trial and the observational study is elucidated by recognizing a dependence of the hazard ratio on

the therapy duration (e.g. Prentice and others, 2005). Here, we look at the time to coronary heart disease

in the WHI clinical trial, which included 16608 postmenopausal women initially in the age range of 50–79 with a uterus ($n_1 = 8102$). There were 188 and 147 events observed in the treatment and control groups, respectively, implying about 98% censoring, primarily by the trial stopping time. Fitting model (2.1) to this data set, we get $\hat{\boldsymbol\beta} = (0.65, -3.63)^T$. Due to heavy censoring, the value 0.03 of $\exp(\hat\beta_2)$ cannot be interpreted as the estimated long-term hazard ratio in the range of study follow-up times. The estimated

hazard ratio function is needed for a more complete and accurate assessment of the treatment effect.
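The figures quoted above can be reproduced with a few lines of arithmetic. The sketch also evaluates $\hat h$ as a function of $\hat R$ at $\hat{\boldsymbol\beta} = (0.65, -3.63)^T$ over a grid of hypothetical $\hat R$ values (these are not estimates from the WHI data). Since $\hat R$ estimates the odds of a control-group event, the output shows that $\hat h$ approaches $\exp(\hat\beta_2)$ only when $\hat R$ is large, that is, only when most of the control group would have failed.

```python
import math

n_total, n_control = 16608, 8102
events_treatment, events_control = 188, 147
beta1_hat, beta2_hat = 0.65, -3.63

censoring = 1.0 - (events_treatment + events_control) / n_total
print(f"treatment group size: {n_total - n_control}")
print(f"censoring fraction:   {censoring:.3f}")  # about 98%
print(f"exp(beta2_hat):       {math.exp(beta2_hat):.2f}")


def h_from_r(r: float) -> float:
    """h = (1 + R) / (exp(-beta1_hat) + exp(-beta2_hat) R) at the WHI estimates."""
    return (1.0 + r) / (math.exp(-beta1_hat) + math.exp(-beta2_hat) * r)


# Hypothetical R values (odds of a control-group event); not WHI estimates.
for r in (0.0, 0.01, 0.05, 1.0, 100.0, 1e6):
    print(f"R = {r:>9g}: h = {h_from_r(r):.3f}")
```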


To examine model adequacy, we can use a residual plot similar to the method for the Cox regression model (Cox and Snell, 1968). Let $\Lambda_C$ and $\Lambda_T$ be the cumulative hazard functions of the 2 groups, respectively. Then $\Lambda_C(T_i)$, $i \le n_1$, and $\Lambda_T(T_i)$, $i > n_1$, are i.i.d. from the standard exponential distribution. Let $\hat\Lambda_C$ and $\hat\Lambda_T$ be the model-based estimators of $\Lambda_C$ and $\Lambda_T$, respectively, and define the residuals $\hat\Lambda_C(X_i)$, $i \le n_1$, and $\hat\Lambda_T(X_i)$, $i > n_1$. If model (2.1) is correct, the residuals should behave like a censored sample from the standard exponential distribution. Thus, the Aalen–Nelson cumulative hazard estimator based on them should be close to the identity function. If there is noticeable deviation, then model (2.1) is questionable. Similarly, the residual plot can be obtained for the piecewise constant hazard ratio model used in Prentice and others (2005). Both residual plots, not shown here, suggest that the 2 models fit the data adequately, with similar residual behaviors.
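A sketch of this residual check follows. Under (2.1), integrating the hazards gives $\Lambda_C(t) = \log\{1 + R(t)\}$ and $\Lambda_T(t) = e^{\beta_2}\log\{1 + e^{\beta_1 - \beta_2}R(t)\}$, so one natural choice of model-based estimators plugs in $\hat R(t; \hat{\boldsymbol\beta})$ and $\hat{\boldsymbol\beta}$; that choice and the function names are our assumptions. The Aalen–Nelson estimator computed from the residuals is then compared with the identity line.

```python
import math
from typing import List, Sequence, Tuple


def cumhaz_control(r: float) -> float:
    """Lambda_C = log(1 + R) under (2.1)."""
    return math.log1p(r)


def cumhaz_treat(r: float, b1: float, b2: float) -> float:
    """Lambda_T = exp(b2) * log(1 + exp(b1 - b2) R) under (2.1)."""
    return math.exp(b2) * math.log1p(math.exp(b1 - b2) * r)


def nelson_aalen(resid: Sequence[float], delta: Sequence[int]) -> List[Tuple[float, float]]:
    """(e, A_hat(e)) at each distinct uncensored residual e; ties grouped as d / (number at risk)."""
    pairs = sorted(zip(resid, delta))
    n, i, cumulative, out = len(pairs), 0, 0.0, []
    while i < n:
        e, j, d = pairs[i][0], i, 0
        while j < n and pairs[j][0] == e:
            d += pairs[j][1]
            j += 1
        if d:
            cumulative += d / (n - i)  # n - i residuals are >= e
            out.append((e, cumulative))
        i = j
    return out


def max_deviation(curve: Sequence[Tuple[float, float]]) -> float:
    """Largest vertical distance between the Aalen-Nelson curve and the identity line."""
    return max(abs(a - e) for e, a in curve)
```

The residual for a control subject would be `cumhaz_control` at $\hat R(X_i; \hat{\boldsymbol\beta})$, and for a treated subject `cumhaz_treat` at the same value with $(\hat\beta_1, \hat\beta_2)$; a large `max_deviation` relative to sampling variability would cast doubt on (2.1).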

The 95% pointwise confidence intervals and simultaneous confidence bands for the hazard ratio function are given in Figure 1. For comparison, the 95% confidence intervals for 0–2, 2–5, and > 5 years from Prentice and others (2005) are included, each plotted at the median of the uncensored data in its time interval. Compared with the piecewise constant hazard ratio model, the confidence bands do not depend on a partition of the data range and provide a more continuously varying display of the treatment effect. The confidence bands are generally in agreement with the results from Prentice and others (2005). The UW band is wider than the other 2 bands most of the time. The HW band is the narrowest in the middle section but is quite wide at the beginning. Both the EP band and the HW band give narrower intervals for the middle portion of the data range than the piecewise Cox model. Near the end of the data range, all 3 bands have about the same width as the confidence interval from Prentice and others (2005). Overall, the EP band matches the results for the piecewise constant hazard ratio model most closely. The width of the EP band is less than or equal to that of the piecewise model-based confidence intervals for most of the data range, except at the beginning. Note that the constant function 1 is not excluded by the HW and UW bands. In comparison, the EP band stays above 1 for about the first 600 days. From Prentice and others (2005), the confidence interval for 0–2 years excludes 1, indicating an elevation in coronary heart disease risk for the treatment early on. For this data set, the standard error of the estimated hazard ratio begins at 0.43, quickly comes down to below 0.20 at 600 days, and stays below 0.20 for the rest of the data range. Since the UW band does not take the variance into account and the HW band emphasizes the middle range, the elevated standard error at early follow-up times likely explains the discrepancy among the results. Compared with the original analysis, which showed an overall difference between the 2 groups, the results here and those from Prentice and others (2005) give a more detailed analysis of the dependence of the hazard ratio on time and help explain the discrepancy between the results of the WHI clinical trial and preceding observational research, much of which involved cohorts where women could be enrolled some years after initiating hormone therapy.

Fig. 1. The 95% pointwise confidence intervals and simultaneous confidence bands of the hazard ratio function for the WHI data: solid lines, equal precision confidence band; dashed lines, Hall–Wellner type confidence band; dash-dotted lines, unweighted confidence band; outside dotted lines, pointwise confidence limits; central dotted line, the estimated hazard ratio function; vertical segments, 95% confidence intervals for the hazard ratios in the 0–2, 2–5, and > 5 years intervals from Prentice and others (2005), plotted at the median of the uncensored data in each time interval, with "∗" indicating the point estimates.

For the average hazard ratio function, the estimator and the 95% simultaneous confidence band are

given in Figure 2. The standard error of the estimated average hazard ratio varies more mildly over time, and both the estimated average hazard ratio and the confidence band change much more smoothly than the corresponding results for the hazard ratio in Figure 1. Note that the confidence band stays above 1 for $t < 700$ days, in agreement with the results of Prentice and others (2005).

To compare with the nonparametric approach, Figure 3 gives the estimated hazard ratio, the 95% pointwise confidence intervals, and the simultaneous confidence band of Gilbert and others (2002), computed with the R programs from the author’s website. For comparison, the same scale as in Figure 1 is used, which truncates part of the plot. The estimated hazard ratio suggests that the hazard ratio is reasonably monotonic. The nonparametric hazard ratio estimate is somewhat lower than the hazard ratio estimates in Figure 1 under either model (2.1) or the piecewise constant hazard ratio model. The confidence band is wider than those in Figure 1 at the beginning and in the later part of the data range, reflecting the difficulty of making nonparametric inference on the hazard functions, especially with heavy censoring and in the tail region.

Fig. 2. The 95% simultaneous confidence band of the average hazard ratio function for the WHI data: dotted line, estimated average hazard ratio; solid lines, 95% simultaneous confidence band.

Fig. 3. Nonparametric 95% pointwise confidence intervals and simultaneous confidence band of the hazard ratio function for the WHI data: dotted line, estimated hazard ratio; solid lines, simultaneous confidence band; dashed lines, pointwise confidence intervals.

From the results here and from additional numerical studies and real data applications, we find that, for the hazard ratio, the EP bands are preferable if interest lies in the largest possible data range, whereas the HW bands are usually better if interest lies in part of the middle portion. For the average hazard ratio, the simple confidence band proposed here works adequately, although it could possibly be improved by using more elaborate weights.
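A small simulation can illustrate how the weighting changes a simultaneous band. The sketch below is not the paper's procedure. It makes three assumptions. First, it replaces the limiting process with a hypothetical Gaussian process whose pointwise standard errors loosely mimic the pattern described for the WHI data: about 0.43 early, then dropping below 0.20. Second, it sets the EP critical value from the supremum of the standardized process and the UW critical value from the supremum of the unstandardized process. Third, it omits the HW weighting, whose exact form is not reproduced here. The function name `simultaneous_bands` and all inputs are hypothetical.

```python
import numpy as np


def simultaneous_bands(est, se, n_sim=20000, level=0.95, seed=0):
    """Toy equal precision (EP) and unweighted (UW) simultaneous bands.

    est, se: point estimates and pointwise standard errors on a time grid.
    The limiting process is replaced by a hypothetical Gaussian process
    (standardized partial sums of iid normals, rescaled to SD se[j]);
    this is NOT the paper's resampling scheme.
    """
    rng = np.random.default_rng(seed)
    est = np.asarray(est, dtype=float)
    se = np.asarray(se, dtype=float)
    m = se.size
    # Partial sums give positively correlated paths; dividing by sqrt(j) gives unit variance.
    z = np.cumsum(rng.standard_normal((n_sim, m)), axis=1) / np.sqrt(np.arange(1, m + 1))
    paths = z * se  # marginal SD at grid point j is now se[j]
    # EP: critical value from sup_t |W(t)| / se(t); half-width c_ep * se(t) tracks the SE.
    c_ep = np.quantile(np.max(np.abs(paths) / se, axis=1), level)
    # UW: critical value from sup_t |W(t)|; half-width is constant over t.
    c_uw = np.quantile(np.max(np.abs(paths), axis=1), level)
    ep_band = (est - c_ep * se, est + c_ep * se)
    uw_band = (est - c_uw, est + c_uw)
    return c_ep, c_uw, ep_band, uw_band


# Hypothetical SE pattern: large early (0.43), then dropping below 0.20.
se = np.concatenate([np.linspace(0.43, 0.20, 10), np.full(40, 0.18)])
est = np.full(se.size, 1.2)  # hypothetical estimates (constant for simplicity)
c_ep, c_uw, ep_band, uw_band = simultaneous_bands(est, se)
```

In this toy setting, the high-variance early stretch drives the UW critical value. The resulting constant-width band is therefore wider than the EP band over the low-SE stretch, while the EP half-width follows the pointwise standard error across the whole range.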

6. DISCUSSION

We have focused on the model of Yang and Prentice (2005) in deriving inference procedures for the

hazard ratio function. Under this model, the hazard ratio involves the baseline survivor function, but not

the baseline density function, a property shared by some other semiparametric models. Thus, inference

on the hazard ratio may be easier and more reliable than with approaches that involve densities, such as those under the accelerated failure time model or nonparametric approaches.
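The sketch below makes this property concrete. It assumes that the model of Yang and Prentice (2005) gives the hazard ratio as $h(t) = 1/\{e^{-\beta_1}S_C(t) + e^{-\beta_2}(1 - S_C(t))\}$, where $e^{\beta_1}$ and $e^{\beta_2}$ are the short-term and long-term hazard ratios. This form is a restatement that should be checked against model (2.1). The parameter values and the exponential control survivor function are hypothetical, and no estimation is performed.

```python
import numpy as np


def hazard_ratio(s_c, beta1, beta2):
    """Hazard ratio under the assumed short-/long-term hazard ratio model.

    h(t) = 1 / (exp(-beta1) * S_C(t) + exp(-beta2) * (1 - S_C(t))).
    Only the control survivor function S_C(t) enters; no density is needed.
    """
    s_c = np.asarray(s_c, dtype=float)
    return 1.0 / (np.exp(-beta1) * s_c + np.exp(-beta2) * (1.0 - s_c))


# Hypothetical inputs: exponential control survival and illustrative parameters.
t = np.linspace(0.0, 3000.0, 7)          # follow-up time in days
s_c = np.exp(-t / 2000.0)                # hypothetical S_C(t)
beta1, beta2 = np.log(1.8), np.log(0.9)  # short-term HR 1.8, long-term HR 0.9
h = hazard_ratio(s_c, beta1, beta2)
# h starts at exp(beta1) when S_C(0) = 1 and moves toward exp(beta2) as S_C(t) -> 0.
```

In an analysis, $S_C(t)$ would be replaced by an estimate. The sketch shows only that the map from $S_C(t)$ to $h(t)$ is explicit and requires no density estimation or smoothing.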

To assess the cumulative treatment effect, we have worked with the average hazard ratio function here, partly because of its close connection with the hazard ratio and the ready interpretation this affords. Alternatively, the ratios $S_T(t)/S_C(t)$ and $(1 - S_T(t))/(1 - S_C(t))$, or the difference $S_T(t) - S_C(t)$, could be considered.
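As an illustration of these alternatives, the sketch below estimates each group's survivor function with the Kaplan–Meier estimator and forms the three contrasts on a time grid. The sketch rests on several assumptions. It uses nonparametric Kaplan–Meier estimates rather than model-based estimates under the model of Yang and Prentice (2005), and it gives no variance estimates. The data and the function names `kaplan_meier` and `survival_contrasts` are hypothetical.

```python
import numpy as np


def kaplan_meier(time, event, grid):
    """Kaplan-Meier estimate of S(t) at each point of `grid` (right-continuous)."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    surv = []
    for g in np.asarray(grid, dtype=float):
        s = 1.0
        # Product over distinct event times u <= g of (1 - d(u) / n(u)).
        for u in np.unique(time[event & (time <= g)]):
            n_at_risk = np.sum(time >= u)
            n_events = np.sum(event & (time == u))
            s *= 1.0 - n_events / n_at_risk
        surv.append(s)
    return np.array(surv)


def survival_contrasts(t_trt, e_trt, t_ctl, e_ctl, grid):
    """Plug-in estimates of S_T/S_C, (1 - S_T)/(1 - S_C) and S_T - S_C on `grid`."""
    s_t = kaplan_meier(t_trt, e_trt, grid)
    s_c = kaplan_meier(t_ctl, e_ctl, grid)
    with np.errstate(divide="ignore", invalid="ignore"):
        # Ratios are reported as NaN where the denominator is zero.
        ratio_surv = np.where(s_c > 0, s_t / s_c, np.nan)
        ratio_cdf = np.where(s_c < 1, (1 - s_t) / (1 - s_c), np.nan)
    return {"ratio_surv": ratio_surv, "ratio_cdf": ratio_cdf, "diff_surv": s_t - s_c}


# Hypothetical data: follow-up times and event indicators (1 = event, 0 = censored).
res = survival_contrasts([2, 3, 5, 7, 8], [1, 1, 0, 1, 0],
                         [1, 4, 6, 9, 10], [1, 0, 1, 1, 0],
                         grid=[0.5, 6])
```

At $t = 6$ the hypothetical data give $S_T = 0.6$ and $S_C = 8/15$. The three contrasts are therefore 1.125, 6/7, and 1/15. The ratio $(1 - S_T(t))/(1 - S_C(t))$ is undefined before the first control event, which is one practical drawback of that measure early in follow-up.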

We expect that the model of Yang and Prentice (2005) can provide an adequate approximation in a wide range of applications. More rigorous model-checking procedures would be useful to address model fit and robustness issues. Note that the form of this model is not closed under relabeling of the treatment and control groups, so its use may be more natural when there is a “no treatment” or “standard treatment” control group. It would be possible to study hazard ratio function estimation for larger classes of semiparametric models to accommodate an even wider range of time dependence in the hazard ratio. There is, however, a trade-off between model fit on the one hand and increased variance and analysis cost on the other. Also, while we have focused on the 2-sample comparison here, adjustment for covariates may be considered. These and other problems are worthy of further exploration.

SUPPLEMENTARY MATERIAL

Supplementary material is available at http://biostatistics.oxfordjournals.org.

ACKNOWLEDGMENTS

We thank Co-Editor Anastasios Tsiatis, a referee, and an associate editor for helpful comments and suggestions. Conflict of Interest: None declared.

FUNDING

National Institutes of Health (CA 53996 to Ross L. Prentice).

REFERENCES

ABRAHAMOWICZ, M. AND MACKENZIE, T. A. (2007). Joint estimation of time-dependent and non-linear effects

of continuous covariates on survival. Statistics in Medicine 26, 392–408.

BIE, O., BORGAN, Ø. AND LIESTØL, K. (1987). Confidence intervals and confidence bands for the cumulative

hazard rate function and their small-sample properties. Scandinavian Journal of Statistics 14, 221–233.

CAI, Z. AND SUN, Y. (2003). Local linear estimation for time-dependent coefficients in Cox’s regression models.

Scandinavian Journal of Statistics 30, 93–111.

CHEN, Y. Q. AND CHENG, S. (2006). Linear life expectancy regression with censored data. Biometrika 93,

303–313.

CHEN, Y. Q. AND WANG, M. C. (2000). Analysis of accelerated hazards models. Journal of the American Statistical

Association 95, 608–618.

CHENG, S. C., WEI, L. J. AND YING, Z. (1997). Predicting survival probabilities with semiparametric transformation models. Journal of the American Statistical Association 92, 227–235.

COX, D. R. (1972). Regression models and life-tables (with discussion). Journal of the Royal Statistical Society,

Series B 34, 187–220.

COX, D. R. (1975). Partial likelihood. Biometrika 62, 269–276.

COX, D. R. AND SNELL, E. J. (1968). A general definition of residuals (with discussion). Journal of the Royal

Statistical Society, Series B 30, 248–275.

GILBERT, P. B., WEI, L. J., KOSOROK, M. R. AND CLEMENS, J. D. (2002). Simultaneous inferences on the

contrast of two hazard functions with censored observations. Biometrics 58, 773–780.

GRAY, R. J. (1992). Flexible methods for analyzing survival data using splines, with applications to breast cancer

prognosis. Journal of the American Statistical Association 87, 942–951.

HSIEH, F. (1996). A transformation model for two survival curves: an empirical process approach. Biometrika 83,

519–528.


KOOPERBERG, C., STONE, C. J. AND TRUONG, Y. K. (1995). Hazard regression. Journal of the American Statistical Association 90, 78–94.

LIN, D. Y., FLEMING, T. R. AND WEI, L. J. (1994). Confidence bands for survival curves under the proportional

hazards model. Biometrika 81, 73–81.

LIN, D. Y., WEI, L. J. AND YING, Z. (1993). Checking the Cox model with cumulative sums of martingale-based

residuals. Biometrika 80, 557–572.

MANSON, J. E., HSIA, J., JOHNSON, K. C., ROSSOUW, J. E., ASSAF, A. R., LASSER, N. L., TREVISAN, M.,

BLACK, H. R., HECKBERT, S. R., DETRANO, R. and others (2003). Estrogen plus progestin and the risk of

coronary heart disease. The New England Journal of Medicine 349, 523–534.

NAIR, V. N. (1984). Confidence bands for survival functions with censored data: a comparative study. Technometrics

26, 265–275.

PENG, L. AND HUANG, Y. (2007). Survival analysis with temporal covariate effects. Biometrika 94, 719–733.

PRENTICE, R. L., LANGER, R., STEFANICK, M. L., HOWARD, B. V., PETTINGER, M., ANDERSON, G., BARAD,

D., CURB, J. D., KOTCHEN, J., KULLER, L. and others (2005). Combined postmenopausal hormone therapy and

cardiovascular disease: toward resolving the discrepancy between observational studies and the Women’s Health Initiative clinical trial. American Journal of Epidemiology 162, 404–414.

TIAN, L., ZUCKER, D. AND WEI, L. J. (2005). On the Cox model with time-varying regression coefficients. Journal

of the American Statistical Association 100, 172–183.

TONG, X., ZHU, C. AND SUN, J. (2007). Semiparametric regression analysis of two-sample current status data, with

applications to tumorigenicity experiments. The Canadian Journal of Statistics 35, 575–584.

TSODIKOV, A. (2002). Semi-parametric models of long- and short-term survival: an application to the analysis of

breast cancer survival in Utah by age and state. Statistics in Medicine 21, 895–920.

VAUPEL, J. W., MANTON, K. G. AND STALLARD, E. (1979). The impact of heterogeneity in individual frailty on

the dynamics of mortality. Demography 16, 439–454.

WRITING GROUP FOR THE WOMEN’S HEALTH INITIATIVE INVESTIGATORS (2002). Risks and benefits of estrogen plus progestin in healthy postmenopausal women: principal results from the Women’s Health Initiative randomized

controlled trial. Journal of the American Medical Association 288, 321–333.

YANG, S. AND PRENTICE, R. L. (2005). Semiparametric analysis of short-term and long-term hazard ratios with

two-sample survival data. Biometrika 92, 1–17.

ZENG, D. AND LIN, D. Y. (2007). Maximum likelihood estimation in semiparametric regression models with censored data. Journal of the Royal Statistical Society, Series B 69, 507–564.

[Received October 21, 2009; revised August 23, 2010; accepted for publication August 24, 2010]