Biostatistics (2011), 12, 2, pp. 354–368

doi:10.1093/biostatistics/kxq061

Advance Access publication on September 21, 2010

Estimation of the 2-sample hazard ratio function using a semiparametric model

SONG YANG∗

Office of Biostatistics Research, National Heart, Lung, and Blood Institute, 6701 Rockledge Drive, MSC 7913, Bethesda, MD 20892, USA

yangso@nhlbi.nih.gov

ROSS L. PRENTICE

Fred Hutchinson Cancer Research Center, 1100 Fairview Avenue North, PO Box 19024, Seattle, WA 98109, USA

SUMMARY

The hazard ratio provides a natural target for assessing a treatment effect with survival data, with the Cox proportional hazards model providing a widely used special case. In general, the hazard ratio is a function of time and provides a visual display of the temporal pattern of the treatment effect. A variety of nonproportional hazards models have been proposed in the literature. However, available methods for flexibly estimating a possibly time-dependent hazard ratio are limited. Here, we investigate a semiparametric model that allows a wide range of time-varying hazard ratio shapes. Point estimates as well as pointwise confidence intervals and simultaneous confidence bands of the hazard ratio function are established under this model. The average hazard ratio function is also studied to assess the cumulative treatment effect. We illustrate corresponding inference procedures using coronary heart disease data from the Women’s Health Initiative estrogen plus progestin clinical trial.

Keywords: Clinical trial; Empirical process; Gaussian process; Hazard ratio; Simultaneous inference; Survival analysis; Treatment–time interaction.

1. INTRODUCTION

Consider the comparison of failure times between a treated and control group under independent censorship. The hazard ratio provides a natural target of estimation in many applications since it permits a focus on relative failure rates across the study follow-up period, without the need to model absolute failure rates, which may be sensitive to study eligibility criteria and other factors. The proportional hazards special case of the Cox (1972) regression model is widely used for hazard ratio estimation. The maximum partial likelihood procedure (Cox, 1975) provides a convenient and robust means of estimating a constant hazard ratio and yields a log-rank procedure for testing equality of hazards between the 2 groups.

∗To whom correspondence should be addressed.

© The Author 2010. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oxfordjournals.org.

In general, the hazard ratio may be a function of time, and estimation of the hazard ratio function may provide useful insights into temporal aspects of treatment effects. For example, Gilbert and others (2002) develop a nonparametric estimation procedure for the log-hazard ratio function with simultaneous confidence bands, for use as an exploratory data analytic tool. Naturally, confidence bands may be wide with such a nonparametric estimator, particularly at longer follow-up times where data may be sparse. See also Gray (1992), Kooperberg and others (1995), Cai and Sun (2003), Tian and others (2005), Abrahamowicz and Mackenzie (2007), and Peng and Huang (2007), and references therein, for additional related work.

Parametric or semiparametric hazard ratio models have the potential to contribute valuably to treatment effect assessment. Hazard ratio models that have readily interpretable parameters, and that embrace a range of hazard ratio shapes, may be particularly valuable. The Cox model allows time-varying covariates to be defined that can, for example, allow separate hazard ratios for the elements of a partition of the time axis or allow the hazard ratio to be a parametric function of follow-up time more generally. Various other semiparametric regression models have been proposed for failure time data analyses, including accelerated failure time models, proportional odds models, and linear transformation models, many of which are embraced by the broad class of models for which Zeng and Lin (2007) develop maximum likelihood estimation procedures. Further semiparametric models can be found in Vaupel and others (1979), Hsieh (1996), Chen and Wang (2000), Tsodikov (2002), Yang and Prentice (2005), and Chen and Cheng (2006). Many of these models induce a semiparametric class of models for the hazard ratio function that includes proportional hazards as a special case. Hazard ratio estimators under such semiparametric models can avoid the instability that may attend nonparametric hazard ratio function estimators.

One of these, proposed by Yang and Prentice (2005), involves short-term and long-term hazard ratio parameters, and a hazard ratio function that depends also on the control group survivor function. Assume absolutely continuous failure times and label the 2 groups control and treatment, with hazard functions $\lambda_C(t)$ and $\lambda_T(t)$, respectively. Let $h(t) = \lambda_T(t)/\lambda_C(t)$ be the hazard ratio function and $S_C(t)$ the survivor function of the control group. The model postulates that

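$$h(t) = \frac{1}{e^{-\beta_2} + (e^{-\beta_1} - e^{-\beta_2})\, S_C(t)}, \qquad t < \tau_0, \tag{1.1}$$

where $\beta_1$ and $\beta_2$ are scalar parameters and

$$\tau_0 = \sup\left\{ x : \int_0^x \lambda_C(t)\, dt < \infty \right\}. \tag{1.2}$$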

This model includes the proportional hazards model and the proportional odds model as special cases. It has a monotone $h(t)$ with a variety of patterns, including proportional hazards, no initial effect, disappearing effect, and crossing hazards, among others. Thus, the model presumably entails sufficient flexibility for many applications. It has also been studied for current status data in Tong and others (2007).
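The shapes listed above can be illustrated with a minimal Python sketch of (1.1). The parameter values and the labels attached to them are illustrative assumptions, not values from the article: from (1.1), $h(t) = e^{\beta_1}$ where $S_C(t) = 1$ and $h(t)$ approaches $e^{\beta_2}$ as $S_C(t)$ approaches 0, so $\beta_1 = \beta_2$ gives proportional hazards, and a short calculation shows that $\beta_2 = 0$ gives proportional odds. The function name `hazard_ratio` is hypothetical.

```python
import numpy as np

def hazard_ratio(s_c, beta1, beta2):
    """Hazard ratio h(t) of model (1.1), written as a function of S_C(t)."""
    s_c = np.asarray(s_c, dtype=float)
    return 1.0 / (np.exp(-beta2) + (np.exp(-beta1) - np.exp(-beta2)) * s_c)

# Control survivor values decreasing from 1 towards 0 (illustrative grid only).
s_grid = np.linspace(1.0, 0.01, 100)

# Hypothetical parameter values, chosen only to show different shapes.
shapes = {
    "proportional hazards (beta1 == beta2)": (np.log(0.7), np.log(0.7)),
    "no initial effect (beta1 == 0)": (0.0, np.log(0.6)),
    "disappearing effect (beta2 == 0)": (np.log(0.6), 0.0),
    "crossing hazards (opposite signs)": (np.log(1.5), np.log(0.7)),
}

for label, (b1, b2) in shapes.items():
    h = hazard_ratio(s_grid, b1, b2)
    print(f"{label}: h starts at {h[0]:.3f}, ends at {h[-1]:.3f}")
```

Because the denominator of (1.1) is linear in $S_C(t)$, each curve is monotone in $t$, which matches the monotonicity noted above.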

In comparison, for many commonly used special cases of the accelerated failure time model either $\lim_{t \downarrow 0} h(t) = 1$ or $\lim_{t \uparrow \tau_0} h(t) \in \{0, 1, \infty\}$ and the hazard ratio stays above or below one when $\lambda_C$ is increasing. This is less flexible than desired. For the class of linear transformation models, with the logarithmic transformation, the hazard ratio also inherits some of these restrictions at many common baseline distributions. Similar properties hold as well for many other semiparametric models.

Under model (1.1), estimation procedures to date have focused on the finite-dimensional parameters, as has mostly been the case also for estimation under other semiparametric models. Here, we extend the estimation to pointwise and simultaneous inference on the hazard ratio function itself. First, consistency and asymptotic normality of the estimate at a fixed time point are established. Then procedures for constructing pointwise confidence intervals and simultaneous confidence bands for the hazard ratio function are developed, and some modifications are implemented to improve moderate sample size performance.

For additional display of the treatment effect, simultaneous confidence bands are also obtained for the average hazard ratio function over a time interval. The average hazard ratio gives a summary measure of treatment comparison and provides a picture of the cumulative treatment effect to augment display of the temporal pattern of the hazard ratio. These hazard ratio estimation procedures are applied to data from the Women’s Health Initiative (WHI) estrogen plus progestin clinical trial (Writing Group For the Women’s Health Initiative Investigators, 2002; Manson and others, 2003), which yielded a hazard ratio function for the primary coronary heart disease outcome that was decidedly nonproportional. Understanding the hazard ratio function shape in this setting was important to integrating the clinical trial data with a large body of preceding observational literature that had failed to identify an early hazard ratio increase (e.g. Manson and others, 2003; Prentice and others, 2005).

We organize the article as follows: In Section 2, the short-term and long-term hazard ratio model and the hazard ratio estimate are described. Pointwise confidence intervals of the hazard ratio are established. Simultaneous confidence bands for the hazard ratio and the average hazard ratio are provided in Section 3. Simulation results are presented in Section 4. Application to data from the WHI trial is given in Section 5. Some concluding remarks are given in Section 6. Proofs of the asymptotic results are contained in the Supplementary Material available at Biostatistics online.

2. HAZARD RATIO FUNCTION ESTIMATION

Let $T_1, \ldots, T_n$ be the pooled lifetimes of the 2 groups, with $T_1, \ldots, T_{n_1}$, $n_1 < n$, constituting the control group having the survivor function $S_C$. Let $C_1, \ldots, C_n$ be the censoring variables, and $Z_i = I(i > n_1)$, $i = 1, \ldots, n$, where $I(\cdot)$ is the indicator function. The available data consist of the independent triplets $(X_i, \delta_i, Z_i)$, $i = 1, \ldots, n$, where $X_i = \min(T_i, C_i)$ and $\delta_i = I(T_i \le C_i)$. We assume that $T_i$ and $C_i$ are independent given $Z_i$. The censoring variables ($C_i$'s) need not be identically distributed, and in particular, the 2 groups may have different censoring patterns. For $t < \tau_0$ with $\tau_0$ defined in (1.2), let $R(t)$ be the odds function $1/S_C(t) - 1$ of the control group. The model of Yang and Prentice (2005) can be expressed as

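$$\lambda_i(t) = \frac{1}{e^{-\beta_1 Z_i} + e^{-\beta_2 Z_i} R(t)}\, \frac{dR(t)}{dt}, \qquad i = 1, \ldots, n, \quad t < \tau_0, \tag{2.1}$$

where $\lambda_i(t)$ is the hazard function for $T_i$ given $Z_i$. Under the model, the hazard ratio is

$$h(t) = \frac{1 + R(t)}{e^{-\beta_1} + e^{-\beta_2} R(t)}.$$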
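As a consistency check (a worked step added for the reader, not part of the original derivation), substituting $R(t) = 1/S_C(t) - 1$ into this expression recovers the form of the hazard ratio given in (1.1):

```latex
\[
h(t) = \frac{1 + R(t)}{e^{-\beta_1} + e^{-\beta_2} R(t)}
     = \frac{1/S_C(t)}{e^{-\beta_1} + e^{-\beta_2}\{1/S_C(t) - 1\}}
     = \frac{1}{e^{-\beta_2} + (e^{-\beta_1} - e^{-\beta_2})\, S_C(t)} .
\]
```

In particular, $h(t) = e^{\beta_1}$ where $R(t) = 0$, and $h(t)$ approaches $e^{\beta_2}$ whenever $R(t) \to \infty$, which is consistent with reading $\beta_1$ and $\beta_2$ as the short-term and long-term log hazard ratios.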

To estimate $h(t)$, we need to estimate the parameter $\boldsymbol{\beta} = (\beta_1, \beta_2)^T$ and the baseline function $R(t)$, where “T” denotes transpose. Let us first introduce the estimators from Yang and Prentice (2005).

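Define

$$K(t) = \sum_{i=1}^{n} I(X_i \ge t), \qquad H_j(t; b) = \sum_{i=1}^{n} \delta_i e^{-b_j Z_i} I(X_i \le t), \quad j = 1, 2,$$

where $b = (b_1, b_2)^T$. Let $\tau < \tau_0$ be such that

$$\lim_{n} K(\tau) > 0, \tag{2.2}$$

with probability 1. For $t \le \tau$, let

$$\hat{P}(t; b) = \prod_{s \le t} \left( 1 - \frac{\Delta H_2(s; b)}{K(s)} \right), \qquad \hat{R}(t; b) = \frac{1}{\hat{P}(t; b)} \int_0^t \frac{\hat{P}_-(s; b)}{K(s)}\, H_1(ds; b),$$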

where $\Delta H_2(s; b)$ denotes the jump of $H_2(s; b)$ in $s$ and $\hat{P}_-(s; b)$ denotes the left continuous (in $s$) version of $\hat{P}(s; b)$. Define the martingale residuals

$$\hat{M}_i(t; b) = \delta_i I(X_i \le t) - \int_0^t I(X_i \ge s)\, \frac{\hat{R}(ds; b)}{e^{-b_1 Z_i} + e^{-b_2 Z_i} \hat{R}(s; b)}, \qquad 1 \le i \le n.$$
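The definitions of $K$, $H_1$, $H_2$, $\hat{P}$ and $\hat{R}$ can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `r_hat` and the toy data are hypothetical, ties are handled by summing the jumps at each distinct event time, and the code assumes $\hat{P}(t; b) > 0$ at the evaluation time.

```python
import numpy as np

def r_hat(x, delta, z, b, t):
    """R-hat(t; b): walk the distinct event times s <= t, accumulating
    the integral of P-hat_-(s; b) / K(s) against the jumps of H_1."""
    x = np.asarray(x, dtype=float)
    delta = np.asarray(delta, dtype=int)
    z = np.asarray(z, dtype=float)
    b1, b2 = b
    event_times = np.unique(x[(delta == 1) & (x <= t)])
    p_minus = 1.0    # P-hat just before the current event time
    integral = 0.0
    for s in event_times:
        k = np.sum(x >= s)                      # K(s): number still at risk
        at_s = (x == s) & (delta == 1)          # failures observed at s
        dh1 = np.sum(np.exp(-b1 * z[at_s]))     # jump of H_1 at s
        dh2 = np.sum(np.exp(-b2 * z[at_s]))     # jump of H_2 at s
        integral += p_minus / k * dh1
        p_minus *= 1.0 - dh2 / k                # product-limit update
    # After the loop p_minus equals P-hat(t; b); assumed positive here.
    return integral / p_minus

# Hypothetical toy data: observed times, event indicators, group labels.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
delta = [1, 0, 1, 1, 0]
z = [0, 1, 0, 1, 1]
print(r_hat(x, delta, z, b=(0.0, 0.0), t=4.0))
```

With $b = (0, 0)$ all weights $e^{-b_j Z_i}$ equal 1, so $\hat{P}$ reduces to the pooled Kaplan-Meier estimator and $\hat{R}$ to its odds $1/\hat{P} - 1$, which gives a convenient check of the code.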

Yang and Prentice (2005) proposed a pseudo maximum likelihood estimator $\hat{\boldsymbol{\beta}} = (\hat{\beta}_1, \hat{\beta}_2)^T$ of $\boldsymbol{\beta}$, which is the zero of $Q(b)$, where

$$Q(b) = \sum_{i=1}^{n} \int_0^{\tau} f_i(t; b)\, \hat{M}_i(dt; b), \tag{2.3}$$

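with $f_i = (f_{1i}, f_{2i})^T$, where

$$f_{1i}(t; b) = \frac{Z_i e^{-b_1 Z_i}}{e^{-b_1 Z_i} + e^{-b_2 Z_i} \hat{R}(t; b)}, \qquad f_{2i}(t; b) = \frac{Z_i e^{-b_2 Z_i} \hat{R}(t; b)}{e^{-b_1 Z_i} + e^{-b_2 Z_i} \hat{R}(t; b)}.$$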

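Once $\hat{\boldsymbol{\beta}}$ is obtained, $R(t)$ can be estimated by $\hat{R}(t; \hat{\boldsymbol{\beta}})$, and the hazard ratio $h(t)$ can be estimated by

$$\hat{h}(t) = \frac{1 + \hat{R}(t; \hat{\boldsymbol{\beta}})}{e^{-\hat{\beta}_1} + e^{-\hat{\beta}_2} \hat{R}(t; \hat{\boldsymbol{\beta}})}.$$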

In Appendix A of the Supplementary Material available at Biostatistics online, we show that $\hat{h}(t)$ is strongly consistent for $h(t)$ under model (2.1).

To study the distributional properties of $\hat{h}(t)$, let

$$W_n(t) = \sqrt{n}\, (\hat{h}(t) - h(t)), \qquad t \le \tau.$$

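For the asymptotic distribution of $\hat{\boldsymbol{\beta}}$, define

$$A(t) = \left( \frac{e^{-\beta_1}}{e^{-\beta_1} + e^{-\beta_2} \hat{R}(t; \boldsymbol{\beta})},\ \frac{e^{-\beta_2} \hat{R}(t; \boldsymbol{\beta})}{e^{-\beta_1} + e^{-\beta_2} \hat{R}(t; \boldsymbol{\beta})} \right)^T,$$

$$K_1(t) = \sum_{i \le n_1} I(X_i \ge t), \qquad K_2(t) = \sum_{i > n_1} I(X_i \ge t),$$

$$\omega(t) = \int_t^{\tau} \frac{A(s)\, h(s)\, K_1(s) K_2(s)}{(1 + R(s))(1 + \hat{R}(s; \boldsymbol{\beta}))\, K(s)}\, \bigl( h(s) e^{-\beta_2} - 1 \bigr)\, \frac{dR(s)}{\hat{P}(s; \boldsymbol{\beta})}.$$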

From Theorem A2 of Yang and Prentice (2005) and some algebra,

$$Q(\boldsymbol{\beta}) = \sum_{i \le n_1} \int_0^{\tau} \mu_1\, dM_i + \sum_{i > n_1} \int_0^{\tau} \mu_2\, dM_i + o_p(1),$$

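where

$$\mu_1(t) = -\frac{A(t) K_2(t) h(t)}{K(t)} + \frac{\hat{P}_-(t; \boldsymbol{\beta})\, (1 + \hat{R}(t; \boldsymbol{\beta}))}{K(t)}\, \omega(t),$$

$$\mu_2(t) = \frac{A(t) K_1(t)}{K(t)} + \frac{\hat{P}_-(t; \boldsymbol{\beta})\, (e^{-\beta_1} + e^{-\beta_2} \hat{R}(t; \boldsymbol{\beta}))}{K(t)}\, \omega(t), \tag{2.4}$$

$$M_i(t) = \delta_i I(X_i \le t) - \int_0^t I(X_i \ge s)\, \frac{dR(s)}{e^{-\beta_1 Z_i} + e^{-\beta_2 Z_i} R(s)}, \qquad i = 1, \ldots, n.$$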

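Now for $\hat{R}(t; \hat{\boldsymbol{\beta}})$, from Lemma A3 in Yang and Prentice (2005) and some algebra,

$$\sqrt{n}\, (\hat{R}(t; \boldsymbol{\beta}) - R(t)) = \frac{1}{\sqrt{n}\, \hat{P}(t; \boldsymbol{\beta})} \left( \sum_{i \le n_1} \int_0^t \nu_1\, dM_i + \sum_{i > n_1} \int_0^t \nu_2\, dM_i \right), \tag{2.5}$$

where

$$\nu_1(t) = \frac{n \hat{P}_-(t; \boldsymbol{\beta})}{K(t)}\, (1 + R(t)), \qquad \nu_2(t) = \frac{n \hat{P}_-(t; \boldsymbol{\beta})}{K(t)}\, (e^{-\beta_1} + e^{-\beta_2} R(t)).$$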

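Let

$$D(t; \boldsymbol{\beta}) = \frac{\partial \hat{R}(t; \boldsymbol{\beta})}{\partial \boldsymbol{\beta}}, \qquad U = \left( -\frac{1}{n} \frac{\partial Q(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}} \right)^{-1},$$

$$B(t) = h(t) A(t) + \frac{e^{-\beta_1} - e^{-\beta_2}}{(e^{-\beta_1} + e^{-\beta_2} R(t))^2}\, D(t; \boldsymbol{\beta}), \qquad C(t) = \frac{e^{-\beta_1} - e^{-\beta_2}}{(e^{-\beta_1} + e^{-\beta_2} R(t))^2}\, \frac{1}{\hat{P}(t; \boldsymbol{\beta})}.$$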
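The coefficients appearing in $B(t)$ and $C(t)$ can be read as partial derivatives of $h = (1 + R)/(e^{-\beta_1} + e^{-\beta_2} R)$: differentiating with respect to $\boldsymbol{\beta}$ at fixed $R$ gives $h$ times the vector $A$, and differentiating with respect to $R$ gives $(e^{-\beta_1} - e^{-\beta_2})/(e^{-\beta_1} + e^{-\beta_2} R)^2$. This reading comes from direct differentiation and is offered as an interpretation, not a statement taken from the article. The sketch below checks it with central finite differences; the helper names and the numerical values are hypothetical, and $R$ is used in place of $\hat{R}$ inside $A$.

```python
import numpy as np

def h_of(beta, r):
    """Hazard ratio as a function of beta = (beta1, beta2) and the odds value r."""
    b1, b2 = beta
    return (1.0 + r) / (np.exp(-b1) + np.exp(-b2) * r)

def central_diff(f, x, eps=1e-6):
    """Central finite-difference gradient of a scalar function f at the vector x."""
    x = np.asarray(x, dtype=float)
    grad = np.zeros_like(x)
    for k in range(x.size):
        step = np.zeros_like(x)
        step[k] = eps
        grad[k] = (f(x + step) - f(x - step)) / (2.0 * eps)
    return grad

beta = np.array([0.4, -0.3])   # hypothetical parameter values
r = 0.8                        # hypothetical value of R(t)

# Analytic pieces, following the displayed formulas (R used in place of R-hat).
denom = np.exp(-beta[0]) + np.exp(-beta[1]) * r
a_vec = np.array([np.exp(-beta[0]), np.exp(-beta[1]) * r]) / denom
dh_dr = (np.exp(-beta[0]) - np.exp(-beta[1])) / denom**2

# Numerical counterparts.
num_grad_beta = central_diff(lambda b: h_of(b, r), beta)
num_dh_dr = central_diff(lambda v: h_of(beta, v[0]), np.array([r]))[0]

print(np.allclose(num_grad_beta, h_of(beta, r) * a_vec, atol=1e-7))
print(abs(num_dh_dr - dh_dr) < 1e-7)
```

The same `central_diff` pattern is one simple way to approximate a derivative such as $D(t; \boldsymbol{\beta})$ numerically when the analytic form is cumbersome; the step size `eps` is an assumption and would need tuning in practice.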

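For $t \le \tau$, define the process

$$\begin{aligned}
\tilde{W}_n(t) ={}& \frac{B^T(t)\, U}{\sqrt{n}} \left( \sum_{i \le n_1} \int_0^{\tau} \mu_1\, dM_i + \sum_{i > n_1} \int_0^{\tau} \mu_2\, dM_i \right) \\
&+ \frac{C(t)}{\sqrt{n}} \left( \sum_{i \le n_1} \int_0^{t} \nu_1\, dM_i + \sum_{i > n_1} \int_0^{t} \nu_2\, dM_i \right).
\end{aligned} \tag{2.6}$$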

With the representations for $Q(\boldsymbol{\beta})$ and $\sqrt{n}\, (\hat{R}(t; \boldsymbol{\beta}) - R(t))$, in Appendix B of the Supplementary Material available at Biostatistics online, we show that $W_n$ is asymptotically equivalent to $\tilde{W}_n$, which converges weakly to a zero-mean Gaussian process $W^*$. The weak convergence of $W_n$ thus follows. The limiting covariance function $\sigma(s, t)$ of $W^*$ involves the derivative $D(t; \boldsymbol{\beta})$ and the derivative matrix in $U$. Although analytic forms of these derivatives are available, they are quite complicated and cumbersome. Here, we approximate them by numerical derivatives. For the functions $B(t)$, $C(t)$, $\mu_1(t)$, $\mu_2(t)$, $\nu_1(t)$, and $\nu_2(t)$, define corresponding $\hat{B}(t)$, $\hat{C}(t), \ldots,$ by replacing $\boldsymbol{\beta}$ with $\hat{\boldsymbol{\beta}}$, $R(t)$ with $\hat{R}(t; \hat{\boldsymbol{\beta}})$ and $D(t; \boldsymbol{\beta})$ with the numerical derivatives. Similarly, let $\hat{U}$ be the numerical approximation of $U$. Simulation studies show