
A semi-parametric generalization of the Cox proportional hazards

regression model: Inference and Applications

Karthik Devarajan^a,* and Nader Ebrahimi^b

^a Division of Population Science, Fox Chase Cancer Center, Philadelphia, PA 19111

^b Division of Statistics, Northern Illinois University, DeKalb, IL 60115

Abstract

The assumption of proportional hazards (PH) fundamental to the Cox PH model sometimes may not

hold in practice. In this paper, we propose a generalization of the Cox PH model in terms of the

cumulative hazard function taking a form similar to the Cox PH model, with the extension that the

baseline cumulative hazard function is raised to a power function. Our model allows for interaction

between covariates and the baseline hazard and it also includes, for the two sample problem, the case

of two Weibull distributions and two extreme value distributions differing in both scale and shape

parameters. The partial likelihood approach cannot be applied here to estimate the model parameters.

We use the full likelihood approach via a cubic B-spline approximation for the baseline hazard to

estimate the model parameters. A semi-automatic procedure for knot selection based on Akaike’s

Information Criterion is developed. We illustrate the applicability of our approach using real-life

data.

Keywords

censored survival data analysis; crossing hazards; frailty model; maximum likelihood; regression;

spline function; Akaike information criterion; Weibull distribution; extreme value distribution

1 Introduction

The modeling and analysis of data in which the principal endpoint is the time until an event

occurs is often of prime interest in medical and engineering studies. Typically, such an event

is the onset of a disease or death itself as seen in clinical trials or failure of an item or a system

as seen in industrial life testing. The time to an event is normally referred to as survival or

failure time.

The primary goal in analyzing censored survival data is to assess the dependence of survival

time on covariates. The secondary goal is the estimation of the underlying distribution of

survival time. The Cox Proportional Hazards (PH) model (Cox, 1972) is a standard tool for

exploring the association of covariates with survival time. An interesting feature of this model

is that it is semi-parametric in the sense that it can be factored into a parametric part consisting

© 2010 Elsevier B.V. All rights reserved.

*Corresponding author. karthik.devarajan@fccc.edu (Karthik Devarajan), nader@math.niu.edu (Nader Ebrahimi).


NIH Public Access

Author Manuscript

Comput Stat Data Anal. Author manuscript; available in PMC 2012 January 1.

Published in final edited form as:

Comput Stat Data Anal. 2011 January 1; 55(1): 667–676. doi:10.1016/j.csda.2010.06.010.

NIH-PA Author Manuscript


of a regression parameter vector associated with the covariates and a non-parametric part that

can be left completely unspecified.

In the Cox PH model, given a vector of possibly time-dependent covariates z, the hazard

function at time t is assumed to be of the form

$$\lambda(t \mid z) = \lambda_0(t)\, g(z) \tag{1.1}$$

where λ0(t) is the baseline hazard function, denoting the hazard under no covariate effect and

g(z) is a non-negative function of the covariate vector z, referred to as the risk function, such

that g(0) = 1. The most commonly used form of the Cox PH model is

$$\lambda(t \mid z) = \lambda_0(t)\, \exp(\beta' z) \tag{1.2}$$

where β = (β1,…,βp)′ is a p vector of regression coefficients. The focus is on inference for β,

with the baseline hazard function, λ0(t), the non-parametric part, left completely unspecified.

In spite of its semi-parametric feature, the Cox PH model implicitly assumes that the hazard

and survival curves corresponding to two different values of the covariates do not cross.

Although this assumption may be valid in many experimental settings, it has been found to be

suspect in others. For example, if the treatment effect decreases with time, then one might

expect the hazard curves corresponding to the treatment and control groups to converge. Other

examples that indicate the presence of non-proportional hazards are also given in Gore et al.

(1984), and Tonak et al. (1979), among others.

In this paper, we describe a semi-parametric generalization of the Cox PH model which allows

crossing of hazards as well as survival functions. In Section 2, we discuss its unique properties

and place it within the context of censored survival data analysis. In Section 3, we describe an

estimation procedure for this model using cubic B-spline approximations for the baseline

hazard. We illustrate our method with real-life examples in Section 4, and in Section 5 we

provide some concluding remarks.

2 A Semi-Parametric Generalization of the Cox PH Model

We describe a semi-parametric generalization of the Cox PH model in which the hazard

functions corresponding to different values of the covariates can cross. A special case of this

model was originally introduced by Quantin et al. (1996) for the purpose of goodness of fit

testing of the Cox PH model. Devarajan (2000) outlined the unique properties of this non-

proportional hazards regression model as well as inference for this model using maximum

penalized likelihood estimation, and provided a theoretical justification for using spline

approximations for the baseline hazard. In addition, Devarajan and Ebrahimi (2002, p.237)

used this model and developed a goodness of fit procedure for testing the Cox PH model. In

independent work, Hsieh (2001) and Wu & Hsieh (2009) discussed an estimating equations

approach for this model by approximating the baseline hazard using piecewise constants.

In our model, the survival function corresponding to a covariate vector z is assumed to be of

the form

$$S(t \mid z) = \exp\left[-g(z)\,\{-\log S_0(t)\}^{h(z)}\right] \tag{2.1}$$


where S0(t) is an arbitrary baseline survival function, and g(z) and h(z) are nonnegative

functions of the covariate vector z such that g(0) = h(0) = 1. This model includes, for the two

sample problem, the case of two Weibull and two extreme value distributions differing in both

scale and shape parameters. The Cox PH model is obtained as a special case of model (2.1) by

setting h(z) = 1. In this paper, we will consider only the exponential function for g and h, i.e.,

g(z) = exp(β′z) and h(z) = exp(γ′z) where β and γ are unknown p vectors of parameters. In

terms of cumulative hazard functions, our non-proportional hazards regression model takes the

specific form

$$\Lambda(t \mid z) = \exp(\beta' z)\,\{\Lambda_0(t)\}^{\exp(\gamma' z)} \tag{2.2}$$

The Cox PH model is obtained as a special case of model (2.2) by setting γ = 0. The conditional

survival function is

$$S(t \mid z) = \exp\left[-\exp(\beta' z)\,\{\Lambda_0(t)\}^{\exp(\gamma' z)}\right] \tag{2.3}$$

and the conditional hazard function is

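$$\lambda(t \mid z) = \lambda_0(t)\,\exp\{(\beta + \gamma)' z\}\,\{\Lambda_0(t)\}^{\exp(\gamma' z) - 1} \tag{2.4}$$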
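As a minimal numerical sketch of the model, assume a scalar covariate and take the cumulative hazard to be exp(βz){Λ0(t)}^exp(γz), with the hazard obtained by differentiation. The Weibull baseline, its shape and scale values, and all function names below are illustrative assumptions, not part of the paper. The helper `weibull_group_params` illustrates the two-sample remark: with a Weibull baseline, the group with z = 1 is again Weibull, with both shape and scale changed.

```python
import math

def baseline_cum_hazard(t, shape=1.5, scale=2.0):
    """Weibull baseline cumulative hazard (t/scale)**shape (illustrative choice)."""
    return (t / scale) ** shape

def baseline_hazard(t, shape=1.5, scale=2.0):
    """Derivative of the Weibull baseline cumulative hazard."""
    return (shape / scale) * (t / scale) ** (shape - 1.0)

def cum_hazard(t, z, beta, gamma):
    """Lambda(t|z) = exp(beta*z) * Lambda0(t)**exp(gamma*z), scalar z."""
    return math.exp(beta * z) * baseline_cum_hazard(t) ** math.exp(gamma * z)

def survival(t, z, beta, gamma):
    """S(t|z) = exp(-Lambda(t|z))."""
    return math.exp(-cum_hazard(t, z, beta, gamma))

def hazard(t, z, beta, gamma):
    """lambda(t|z) = lambda0(t) * exp((beta+gamma)*z) * Lambda0(t)**(exp(gamma*z)-1)."""
    power = math.exp(gamma * z) - 1.0
    return (baseline_hazard(t) * math.exp((beta + gamma) * z)
            * baseline_cum_hazard(t) ** power)

def weibull_group_params(beta, gamma, shape=1.5, scale=2.0):
    """Weibull (shape, scale) of the z = 1 group implied by the model."""
    new_shape = shape * math.exp(gamma)
    new_scale = scale * math.exp(-beta / new_shape)
    return new_shape, new_scale

if __name__ == "__main__":
    k1, a1 = weibull_group_params(beta=0.4, gamma=-0.3)
    for t in (0.5, 1.0, 3.0):
        # The two columns agree: the z = 1 group is Weibull(k1, a1).
        print(t, cum_hazard(t, 1, 0.4, -0.3), (t / a1) ** k1)
```

Setting gamma to 0 in these functions recovers the Cox PH special case, in which the two groups share the baseline shape.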

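Applying a complementary log(− log) transformation in (2.3), we get

$$\log\{-\log S(t \mid z)\} = \beta' z + \exp(\gamma' z)\,\log\{-\log S_0(t)\}.$$

The above equation can be expressed as

$$\psi\{S(t \mid z)\} = \beta' z + \exp(\gamma' z)\,h(t) \tag{2.5}$$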

where ψ(x) = log(− log(x)) and h(t) = log{− log{S0(t)}}. This can be shown to be a member of

the family of models described in Cheng et al. (1997) (see Devarajan, 2000, pp.45-48 for

details). An equivalent version of (2.5) is

$$\exp(\gamma' z)\,h(T) = -\beta' z + \varepsilon \tag{2.6}$$

where the error ε has distribution function F = 1 − ψ^{−1}. Since the distribution of the baseline

h(T) = log{− log S0(T)} is unit extreme value, equation (2.6) results in a scale and shape

transformation of this unit extreme value distribution. The generalized model (2.3) can be

interpreted as a transformation of the unit extreme value distribution in terms of

reparametrizations of the scale and shape parameters. Similarly, applying a log transformation

in (2.3) (with ψ(x) = log(x) and h(t) = log{S0(t)}) and using similar arguments as above, one

can interpret our generalized model as a transformation of the unit exponential distribution in

terms of reparametrizations of the scale and shape parameters.
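The transformation view suggests a simple way to simulate survival times, sketched below under stated assumptions: a scalar covariate, a Weibull baseline with illustrative shape and scale, and the representation log Λ0(T) = exp(−γz)(ε − βz), where ε is the logarithm of a unit exponential variate (a unit extreme value variate). All function names are hypothetical.

```python
import math
import random

SHAPE, SCALE = 1.5, 2.0  # illustrative Weibull baseline parameters

def cum_hazard(t, z, beta, gamma):
    """Lambda(t|z) = exp(beta*z) * Lambda0(t)**exp(gamma*z), Weibull Lambda0."""
    return math.exp(beta * z) * ((t / SCALE) ** SHAPE) ** math.exp(gamma * z)

def inverse_baseline_cum_hazard(v):
    """Inverse of Lambda0(t) = (t/SCALE)**SHAPE."""
    return SCALE * v ** (1.0 / SHAPE)

def draw_survival_time(z, beta, gamma, rng):
    """Draw T given z; also return the unit exponential variate that was used."""
    e = max(rng.expovariate(1.0), 1e-300)  # guard against log(0)
    eps = math.log(e)                      # unit extreme value error term
    log_l0 = math.exp(-gamma * z) * (eps - beta * z)
    return inverse_baseline_cum_hazard(math.exp(log_l0)), e

if __name__ == "__main__":
    rng = random.Random(2024)
    for z in (0.0, 1.0):
        t, e = draw_survival_time(z, beta=0.5, gamma=-0.2, rng=rng)
        # Lambda(T|z) reproduces the exponential draw, so S(T|z) is uniform.
        print(z, t, cum_hazard(t, z, 0.5, -0.2), e)
```

Because Λ(T∣z) is unit exponential by construction, the simulated times follow the assumed model for any choice of β and γ.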


2.1 Some Useful Features of the Generalization

We describe several useful features of our generalized model and highlight its relationship to

other survival models. These properties provide the framework for a generalized approach to

censored survival data analysis.

2.1.1 Crossing hazards over time—The hazards ratio corresponding to two different

covariate vectors z1 and z2 is

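$$\frac{\lambda(t \mid z_1)}{\lambda(t \mid z_2)} = \exp\{(\beta + \gamma)'(z_1 - z_2)\}\,\{\Lambda_0(t)\}^{\exp(\gamma' z_1) - \exp(\gamma' z_2)} \tag{2.7}$$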

Since this ratio is a monotone function of t, the model allows the hazards ratio to invert over

time. In other words, it allows crossing of hazard curves. For example, when treatment effect

decreases or increases over time, model (2.2) can be applied.
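To make the crossing concrete, the following sketch evaluates the hazards ratio for a scalar covariate, assuming it takes the form exp{(β + γ)(z1 − z2)}{Λ0(t)}^{exp(γz1) − exp(γz2)}. The unit exponential baseline Λ0(t) = t, the parameter values, and the function names are illustrative assumptions only.

```python
import math

def hazard_ratio(t, z1, z2, beta, gamma, cum_hazard0=lambda t: t):
    """Ratio lambda(t|z1)/lambda(t|z2) for scalar covariates."""
    exponent = math.exp(gamma * z1) - math.exp(gamma * z2)
    return math.exp((beta + gamma) * (z1 - z2)) * cum_hazard0(t) ** exponent

def crossing_time(z1, z2, beta, gamma):
    """Time at which the ratio equals 1 when Lambda0(t) = t.

    Requires gamma != 0 and z1 != z2; otherwise the ratio is constant in t.
    """
    exponent = math.exp(gamma * z1) - math.exp(gamma * z2)
    return math.exp(-(beta + gamma) * (z1 - z2) / exponent)

if __name__ == "__main__":
    beta, gamma = 1.0, -0.5  # illustrative values
    t_star = crossing_time(1, 0, beta, gamma)
    for t in (0.5 * t_star, t_star, 2.0 * t_star):
        print(t, hazard_ratio(t, 1, 0, beta, gamma))
```

With these values the ratio is above one before the crossing time and below one after it, which is the inversion of the hazards ratio described above.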

2.1.2 Relation to the Time-Dependent Coefficient Cox PH Model—If γ is assumed

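to be close to zero, then exp(γ′z) ≈ 1 + γ′z and we can approximate the right hand side of equation (2.2) as follows:

$$\Lambda(t \mid z) \approx \Lambda_0(t)\,\exp\{\beta' z + \gamma' z\,\log \Lambda_0(t)\}.$$

Thus, for two different covariate vectors z1 and z2, we have,

$$\frac{\Lambda(t \mid z_1)}{\Lambda(t \mid z_2)} \approx \exp\{\eta(t)'(z_1 - z_2)\} \tag{2.8}$$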

where η(t) = β + g(t) · γ with g(t) = log{Λ0(t)}. This is a special case of the Cox PH model with

time-dependent coefficients η(t) which allows for crossing of hazard curves. When the

deviation from proportional hazards is small, our proposed model approximates the Cox PH

model with time-dependent coefficients.

In terms of the hazard functions,

$$\lambda(t \mid z) \approx \lambda_0(t)\,\exp\{\beta' z + h(t)\,\gamma' z\} \tag{2.9}$$

where h(t) = 1+log{Λ0(t)}. Cox (1972) considered the case where h(t) = t, a dummy time-

dependent variable for goodness of fit testing of the Cox PH model. Therneau and Grambsch

(1994) have considered the model (2.9) with an assumed form for h(t).
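The quality of the small-γ approximation can be checked numerically. The sketch below compares, on the log scale and for a scalar covariate, the exact cumulative hazard ratio implied by Λ(t∣z) = exp(βz){Λ0(t)}^exp(γz) with the time-dependent coefficient form exp{η(t)(z1 − z2)}, η(t) = β + γ log Λ0(t). The baseline Λ0(t) = t and the function names are assumptions made for illustration.

```python
import math

def exact_log_ratio(t, z1, z2, beta, gamma, cum_hazard0=lambda t: t):
    """log of Lambda(t|z1)/Lambda(t|z2) under the generalized model."""
    log_l0 = math.log(cum_hazard0(t))
    return beta * (z1 - z2) + (math.exp(gamma * z1) - math.exp(gamma * z2)) * log_l0

def approx_log_ratio(t, z1, z2, beta, gamma, cum_hazard0=lambda t: t):
    """log of exp(eta(t) * (z1 - z2)) with eta(t) = beta + gamma * log Lambda0(t)."""
    eta_t = beta + gamma * math.log(cum_hazard0(t))
    return eta_t * (z1 - z2)

if __name__ == "__main__":
    for gamma in (0.01, 0.1, 1.0):
        gap = abs(exact_log_ratio(2.0, 1, 0, 0.5, gamma)
                  - approx_log_ratio(2.0, 1, 0, 0.5, gamma))
        print(gamma, gap)  # the gap shrinks as gamma approaches zero
```

The gap between the two forms grows with the magnitude of γ, so the time-dependent coefficient interpretation should only be relied on when the departure from proportional hazards is small.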

2.1.3 Proportionality of the Hazard-Cumulative Hazard Ratios—Using equations

(2.2) and (2.4), we see that,

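$$\frac{\lambda(t \mid z)}{\Lambda(t \mid z)} = \exp(\gamma' z)\,\frac{\lambda_0(t)}{\Lambda_0(t)} \tag{2.10}$$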


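Let k(t∣z) denote the conditional growth rate of the logarithm of the cumulative hazard function, that is, k(t∣z) = d log Λ(t∣z)/dt = λ(t∣z)/Λ(t∣z). Then, using (2.10), we see that

$$k(t \mid z) = \exp(\gamma' z)\,k_0(t) \tag{2.11}$$

where $k_0(t) = \lambda_0(t)/\Lambda_0(t)$ is the growth rate of the logarithm of the baseline cumulative hazard function.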

Equation (2.10) has a form similar to that of (1.2), the Cox PH model. The difference between

the two is that (1.2) models the growth rate of the cumulative hazard function while (2.10)

models the growth rate of the logarithm of the cumulative hazard function. Equation (2.10)

implies that the ratio of the hazard function to the cumulative hazard function, given a covariate

vector z, is proportional to the ratio of the hazard function to the cumulative hazard function

at baseline. It is worth mentioning here that this property is related to the constancy of the ratio

of the hazard function to the cumulative hazard function of the extreme value distribution. Note

that if we set γ = 0 in (2.10), the ratios of hazard to cumulative hazard in (2.10) are equal and

it reduces to the Cox PH model.
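This proportionality can be verified numerically without choosing a Weibull baseline. The sketch below differentiates log Λ(t∣z) by central differences, assuming a scalar covariate and Λ(t∣z) = exp(βz){Λ0(t)}^exp(γz). The baseline Λ0(t) = t² + t and the function names are hypothetical choices for illustration.

```python
import math

def baseline(t):
    """An arbitrary increasing baseline cumulative hazard (illustrative)."""
    return t * t + t

def log_cum_hazard(t, z, beta, gamma, cum_hazard0):
    """log Lambda(t|z) = beta*z + exp(gamma*z) * log Lambda0(t)."""
    return beta * z + math.exp(gamma * z) * math.log(cum_hazard0(t))

def growth_rate(t, z, beta, gamma, cum_hazard0, eps=1e-6):
    """Central-difference estimate of d/dt log Lambda(t|z)."""
    up = log_cum_hazard(t + eps, z, beta, gamma, cum_hazard0)
    down = log_cum_hazard(t - eps, z, beta, gamma, cum_hazard0)
    return (up - down) / (2.0 * eps)

if __name__ == "__main__":
    beta, gamma = 0.7, -0.4  # illustrative values
    for t in (0.5, 1.0, 2.5):
        ratio = (growth_rate(t, 1.0, beta, gamma, baseline)
                 / growth_rate(t, 0.0, beta, gamma, baseline))
        print(t, ratio, math.exp(gamma))  # ratio is constant in t
```

The ratio of growth rates does not depend on t or on β, and it equals one when γ = 0, matching the reduction to the Cox PH model.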

2.1.4 Proportionality of the Logarithm of Cumulative Hazards—For β = 0, model

(2.2) reduces to

$$\Lambda(t \mid z) = \{\Lambda_0(t)\}^{\exp(\gamma' z)} \tag{2.12}$$

Model (2.12) is similar to (2.2) and has all its features with the exception that it does not include

the Cox PH model. By taking logarithm on both sides of (2.12), it is easy to see that the

logarithms of the cumulative hazards are proportional.

2.1.5 Relation to the Frailty Model—In survival analysis, it has been found that differences

between hazards due to covariate effects tend to converge as follow-up time elapses. Such an

effect can be accounted for by postulating the existence of unobserved random effects or

frailties with prognostic value. In terms of hazard functions, we can write

$$\lambda(t \mid z, u) = u\,\lambda_0(t)\,\exp(\beta' z) \tag{2.13}$$

where u is an unobserved frailty or random effect (see Hougaard (2000) pp.215-245 for more

details). Assuming that u has a positive stable distribution with parameter θ, one can easily

show that

(2.14)

It is clear from (2.14) that if we define γ = (θ, 0, ⋯, 0), then our proposed model and (2.14)

will be the same. When frailty interacts with treatment, we can re-write (2.14) as

(2.15)
