Biostatistics (2012), 13, 2, pp. 256–273

doi:10.1093/biostatistics/kxr050

Advance Access publication on January 30, 2012

On the covariate-adjusted estimation for an overall treatment difference with data from a randomized comparative clinical trial

LU TIAN

Department of Health Research & Policy, Stanford University, Stanford, CA 94305, USA

TIANXI CAI

Department of Biostatistics, Harvard University, Boston, MA 02115, USA

LIHUI ZHAO

Department of Preventive Medicine, Northwestern University, Chicago, IL 60611, USA

LEE-JEN WEI∗

Department of Biostatistics, Harvard University, Boston, MA 02115, USA

wei@hsph.harvard.edu

SUMMARY

To estimate an overall treatment difference with data from a randomized comparative clinical study, baseline covariates are often utilized to increase the estimation precision. Using the standard analysis of covariance technique for making inferences about such an average treatment difference may not be appropriate, especially when the fitted model is nonlinear. On the other hand, the novel augmentation procedure recently studied, for example, by Zhang and others (2008. Improving efficiency of inferences in randomized clinical trials using auxiliary covariates. Biometrics 64, 707–715) is quite flexible. However, in general, it is not clear how to select covariates for augmentation effectively. An overly adjusted estimator may inflate the variance and in some cases be biased. Furthermore, the results from the standard inference procedure by ignoring the sampling variation from the variable selection process may not be valid. In this paper, we first propose an estimation procedure, which augments the simple treatment contrast estimator directly with covariates. The new proposal is asymptotically equivalent to the aforementioned augmentation method. To select covariates, we utilize the standard lasso procedure. Furthermore, to make valid inference from the resulting lasso-type estimator, a cross validation method is used. The validity of the new proposal is justified theoretically and empirically. We illustrate the procedure extensively with a well-known primary biliary cirrhosis clinical trial data set.

Keywords: ANCOVA; Cross validation; Efficiency augmentation; Mayo PBC data; Semi-parametric efficiency.

∗To whom correspondence should be addressed.

© The Author 2012. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

1. INTRODUCTION

For a typical randomized clinical trial to compare two treatments, generally a summary measure $\theta_0$ for quantifying the treatment effectiveness difference can be estimated unbiasedly or consistently using its simple two-sample empirical counterpart, say $\hat\theta$. With the subject's baseline covariates, one may obtain a more efficient estimator for $\theta_0$ via a standard analysis of covariance (ANCOVA) technique or a novel augmentation procedure, which is well documented in Zhang and others (2008) and a series of papers (Leon and others, 2003; Tsiatis, 2006; Tsiatis and others, 2008; Lu and Tsiatis, 2008; Gilbert and others, 2009; Zhang and Gilbert, 2010). The ANCOVA approach can be problematic, especially when the regression model is nonlinear, for example, the logistic or Cox model. For this case, the ANCOVA estimator generally does not converge to $\theta_0$, but to a quantity which may be difficult to interpret as a treatment contrast measure. Moreover, in the presence of censored event time observations, this quantity may depend on the censoring distribution. On the other hand, the above augmentation procedure, referred to as ZTD in the literature, always produces a consistent estimator for $\theta_0$, provided that the simple estimator $\hat\theta$ is consistent.

In theory, the ZTD estimator, denoted by $\hat\theta_{ZTD}$ hereafter, is asymptotically more efficient than $\hat\theta$ no matter how many covariates are used for augmentation. In practice, however, an “overly augmented” or “mis-augmented” estimator may have a larger variance than that of $\hat\theta$ and in special cases may even have undesirable finite-sample bias. Recently, Zhang and others (2008) showed empirically that the ZTD via the standard stepwise regression for variable selection performs satisfactorily when the number of covariates is not large. In general, however, it is not clear that the standard inference procedures for $\theta_0$ based on estimators augmented by covariates selected via a rather complex variable selection process are appropriate, especially when the number of covariates involved is not small relative to the sample size. Therefore, it is highly desirable to develop an estimation procedure to properly and systematically augment $\hat\theta$ and make valid inference for the treatment difference using the data with practical sample sizes.

Now, let $Y$ be the response variable, $T$ be the binary treatment indicator, and $Z$ be a $p$-dimensional vector of baseline covariates including 1 as its first element and possibly transformations of original variables. The data, $\{(Y_i, T_i, Z_i), i = 1,\ldots,n\}$, consist of $n$ independent copies of $(Y, T, Z)$, where $T$ and $Z$ are independent of each other. Let $P(T = 1) = \pi \in (0,1)$. First, suppose that we are interested in the mean difference: $\theta_0 = E(Y \mid T = 1) - E(Y \mid T = 0)$. A simple unadjusted estimator is

$$\hat\theta = \frac{1}{n}\sum_{i=1}^{n}\frac{(T_i - \pi)Y_i}{\pi(1-\pi)},$$

which consistently estimates $\theta_0$. To improve efficiency in estimating $\theta_0$, one may employ the standard ANCOVA procedure by fitting the following linear regression “working” model:

$$E(Y \mid T, Z) = \theta T + \gamma' Z,$$

where $\theta$ and $\gamma$ are unknown parameters. Since $T \perp Z$ and $\{(T_i, Z_i), i = 1,\ldots,n\}$ are independent copies of $(T, Z)$, the resulting ANCOVA estimator is asymptotically equivalent to

$$\hat\theta - \hat\gamma'\left\{\frac{1}{n}\sum_{i=1}^{n}\frac{(T_i - \pi)Z_i}{\pi(1-\pi)}\right\}, \tag{1.1}$$

where $\hat\gamma$ is the ordinary least square estimator for $\gamma$ of the model $E(Y \mid Z) = \gamma' Z$. As $n \to \infty$, $\hat\gamma$ converges to

$$\gamma_0 = \operatorname*{argmin}_{\gamma} E(Y - \gamma' Z)^2.$$

It follows that the ANCOVA estimator is asymptotically equivalent to

$$\hat\theta - \gamma_0'\left\{\frac{1}{n}\sum_{i=1}^{n}\frac{(T_i - \pi)Z_i}{\pi(1-\pi)}\right\}. \tag{1.2}$$

In theory, since $\hat\theta$ is consistent for $\theta_0$, the ANCOVA estimator is also consistent for $\theta_0$ and more efficient than $\hat\theta$ regardless of whether the above working model is correctly specified. Furthermore, as noted by Tsiatis and others (2008), the nonparametric ANCOVA estimator proposed by Koch and others (1998) and $\hat\theta_{ZTD}$ are also asymptotically equivalent to (1.2) when $\pi = 0.5$. We give details of this equivalence in Appendix A.
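As a brief worked step (a sketch that uses only the stated assumptions $T \perp Z$ and $P(T = 1) = \pi$), the augmentation term in (1.1) and (1.2) has mean zero, which is why subtracting it does not change the limit of $\hat\theta$:

```latex
\begin{align*}
E\left\{\frac{(T-\pi)Z}{\pi(1-\pi)}\right\}
  &= \frac{E(T-\pi)\,E(Z)}{\pi(1-\pi)} = 0, \\
\operatorname{Var}\left\{\frac{(T-\pi)Z}{\pi(1-\pi)}\right\}
  &= \frac{E\{(T-\pi)^2\}\,E(ZZ')}{\pi^2(1-\pi)^2}
   = \frac{E(ZZ')}{\pi(1-\pi)}.
\end{align*}
```

For any fixed $\gamma$, the augmented estimator therefore targets the same quantity as $\hat\theta$; the choice of $\gamma$ affects only the variance.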

The novel ZTD procedure is derived by specifying optimal estimating functions under a very general semi-parametric setting. The efficiency gain from $\hat\theta_{ZTD}$ has been elegantly justified using the semi-parametric inference theory (Tsiatis, 2006). The ZTD is much more flexible than the ANCOVA method in that it can handle cases when the summary measure $\theta_0$ is beyond the simple difference of two group means. On the other hand, the ANCOVA method may only work under the above simple linear regression model.

In this paper, we study the estimator (1.1), which augments $\hat\theta$ directly with the covariates. The key question is how to choose $\hat\gamma$ in (1.1) especially when $p$ is not small with respect to $n$. Here, we utilize the lasso procedure with a cross validation process to construct a systematic procedure for selecting covariates to increase the estimation precision. The validity of the new proposal is justified theoretically and empirically via an extensive simulation study. The proposal is also illustrated with the data from a clinical trial to evaluate a treatment for a specific liver disease.
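To make the unadjusted estimator and the augmented estimator (1.1) concrete, the following is a minimal sketch in plain Python for the mean difference with a single covariate, so that $Z = (1, x)'$. The function names, the toy data-generating model in `simulate`, and the restriction to one covariate are illustrative assumptions, not part of the paper.

```python
import random

def unadjusted(y, t, pi):
    """Simple estimator: n^{-1} sum_i (T_i - pi) Y_i / {pi (1 - pi)}."""
    n = len(y)
    return sum((ti - pi) * yi for yi, ti in zip(y, t)) / (n * pi * (1 - pi))

def ols_line(y, x):
    """OLS fit of y on Z = (1, x); returns (intercept, slope)."""
    n = len(y)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xv - xbar) ** 2 for xv in x)
    sxy = sum((xv - xbar) * (yv - ybar) for xv, yv in zip(x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope

def augmented(y, t, x, pi):
    """Estimator (1.1) with Z = (1, x) and gamma-hat from OLS of Y on Z."""
    n = len(y)
    g0, g1 = ols_line(y, x)
    c = pi * (1 - pi)
    # Sample means of the two components of (T_i - pi) Z_i / {pi (1 - pi)}.
    aug0 = sum(ti - pi for ti in t) / (n * c)
    aug1 = sum((ti - pi) * xv for ti, xv in zip(t, x)) / (n * c)
    return unadjusted(y, t, pi) - (g0 * aug0 + g1 * aug1)

def simulate(n=200, reps=300, pi=0.5, seed=1):
    """Empirical variances of both estimators under a toy linear model (assumed)."""
    rng = random.Random(seed)
    plain, adj = [], []
    for _ in range(reps):
        x = [rng.gauss(0, 1) for _ in range(n)]
        t = [1 if rng.random() < pi else 0 for _ in range(n)]
        y = [1.0 * ti + 2.0 * xv + rng.gauss(0, 1) for ti, xv in zip(t, x)]
        plain.append(unadjusted(y, t, pi))
        adj.append(augmented(y, t, x, pi))

    def var(v):
        m = sum(v) / len(v)
        return sum((a - m) ** 2 for a in v) / (len(v) - 1)

    return var(plain), var(adj)

if __name__ == "__main__":
    print(simulate())  # (variance of unadjusted, variance of augmented)
```

When the covariate is strongly predictive of the response, as in the toy model, one would expect the second variance to be the smaller of the two; when the covariate happens to be balanced across arms, the augmentation term vanishes and both estimators coincide.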

2. ESTIMATING THE TREATMENT DIFFERENCE VIA PROPER AUGMENTATION FROM COVARIATES

For a general treatment contrast measure $\theta_0$ and its simple two-sample estimator $\hat\theta$, assume that

$$\hat\theta - \theta_0 = n^{-1}\sum_{i=1}^{n}\tau_i(\eta) + o_p\left(\frac{1}{\sqrt{n}}\right),$$

where $\tau_i(\eta)$ is the influence function from the $i$th observation, $\eta$ is a vector of unknown parameters, and $i = 1,\ldots,n$. Note that the influence function generally only involves a rather small number of unknown parameters, which is not dependent on $Z$. Let $\hat\eta$ denote the consistent estimator for $\eta$. Generally, the above asymptotic expansion is also valid with $\tau_i$ being replaced by $\tau_i(\hat\eta)$. Now, (1.2) can be rewritten as

$$\hat\theta - \gamma_0'\left\{n^{-1}\sum_{i=1}^{n}\xi_i\right\},$$

where $\xi_i = (T_i - \pi)Z_i/\{\pi(1-\pi)\}$, $i = 1,\ldots,n$. Then $\hat\gamma$ in (1.1) is the minimizer of

$$\sum_{i=1}^{n}\{\tau_i(\hat\eta) - \gamma'\xi_i\}^2. \tag{2.1}$$

When the dimension of $Z$ is not small, to obtain a stable minimizer, one may consider the following regularized minimand:

$$L_\lambda(\gamma) = \sum_{i=1}^{n}\{\tau_i(\hat\eta) - \gamma'\xi_i\}^2 + \lambda|\gamma|,$$

where $\lambda$ is the lasso tuning parameter (Tibshirani, 1996) and $|\cdot|$ denotes the $L_1$ norm for a vector. For any fixed $\lambda$, let the resulting minimizer be denoted by $\hat\gamma(\lambda)$. The corresponding augmented estimator and its variance estimator are

$$\hat\theta_{lasso}(\lambda) = \hat\theta - \hat\gamma(\lambda)'\left\{n^{-1}\sum_{i=1}^{n}\xi_i\right\}$$

and

$$\hat V_{lasso}(\lambda) = n^{-2}\sum_{i=1}^{n}\{\tau_i(\hat\eta) - \hat\gamma(\lambda)'\xi_i\}^2, \tag{2.2}$$

respectively. Asymptotically, one may ignore the variability of $\hat\gamma(\lambda)$ and treat it as a constant when we make inferences about $\theta_0$. However, in some cases, we have found empirically that, similar to $\hat\theta_{ZTD}$, $\hat\theta_{lasso}(\lambda)$ is biased, partly due to the fact that $\hat\gamma(\lambda)$ and $\{\xi_i, i = 1,\ldots,n\}$ are correlated. In the simulation study, we show via a simple example this undesirable finite-sample phenomenon. In practice, such bias may not have real impact on the conclusions about the treatment difference, $\theta_0$, when the study sample size is relatively large with respect to the dimension of $Z$.
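The lasso-augmented estimator and its plug-in variance estimate (2.2) can be sketched as follows. This is a minimal illustration in plain Python that minimizes $\sum_i\{\tau_i - \gamma'\xi_i\}^2 + \lambda|\gamma|$ by cyclic coordinate descent; the choice of solver, the fixed iteration count, and all function names are assumptions made for the example (the paper does not specify a solver), and every coordinate of $\gamma$ is penalized, following the $L_1$ norm as written.

```python
def soft_threshold(a, b):
    """Soft-thresholding operator: sign(a) * max(|a| - b, 0)."""
    if a > b:
        return a - b
    if a < -b:
        return a + b
    return 0.0

def lasso_cd(tau, xi, lam, n_iter=200):
    """Coordinate descent for sum_i (tau_i - gamma' xi_i)^2 + lam * |gamma|_1."""
    p = len(xi[0])
    gamma = [0.0] * p
    resid = list(tau)  # current residuals tau_i - gamma' xi_i (gamma = 0 at start)
    col_ss = [sum(row[k] ** 2 for row in xi) for k in range(p)]
    for _ in range(n_iter):
        for k in range(p):
            if col_ss[k] == 0.0:
                continue
            # Inner product of column k with the residual that excludes coordinate k.
            rho = sum(row[k] * (r + gamma[k] * row[k]) for row, r in zip(xi, resid))
            new = soft_threshold(rho, lam / 2.0) / col_ss[k]
            delta = new - gamma[k]
            if delta != 0.0:
                resid = [r - delta * row[k] for row, r in zip(xi, resid)]
                gamma[k] = new
    return gamma

def theta_lasso(theta_hat, tau, xi, lam):
    """Return (augmented estimate, plug-in variance estimate as in (2.2), gamma-hat)."""
    n, p = len(tau), len(xi[0])
    gamma = lasso_cd(tau, xi, lam)
    xi_bar = [sum(row[k] for row in xi) / n for k in range(p)]
    est = theta_hat - sum(g * m for g, m in zip(gamma, xi_bar))
    var = sum((t - sum(g * v for g, v in zip(gamma, row))) ** 2
              for t, row in zip(tau, xi)) / n ** 2
    return est, var, gamma
```

Here `tau` holds the estimated influence values $\tau_i(\hat\eta)$ and `xi` the rows $\xi_i$. A large `lam` shrinks every coefficient to zero, in which case the function returns the unadjusted estimate unchanged.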

One possible solution to reduce the correlation between $\hat\gamma(\lambda)$ and $\xi_i$ is to use a cross validation procedure. Specifically, we randomly split the data into $K$ nonoverlapping sets $\{D_1,\ldots,D_K\}$ and construct an estimator for $\theta_0$:

$$\hat\theta_{cv}(\lambda) = \hat\theta - \frac{1}{n}\sum_{i=1}^{n}\hat\gamma_{(-i)}(\lambda)'\xi_i,$$

where $i \in D_{k_i}$, $\hat\gamma_{(-i)}(\lambda)$ is the minimizer of

$$\sum_{j \notin D_{k_i}}\{\tau_j(\hat\eta_{(-i)}) - \gamma'\xi_j\}^2 + \lambda|\gamma|,$$

and $\hat\eta_{(-i)}$ is a consistent estimator for $\eta$ based on all data except those in $D_{k_i}$. Note that $\hat\gamma_{(-i)}(\lambda)$ and $\xi_i$ are independent, so $\hat\theta_{cv}(\lambda)$ carries no extra bias relative to $\hat\theta$. When $n \gg p$, the variance of $\hat\theta_{cv}(\lambda)$ can be estimated by $\hat V_{lasso}(\lambda)$ given in (2.2). However, $\hat V_{lasso}(\lambda)$ tends to underestimate its true variance when $p$ is not small.

Here, we utilize the above cross validation procedure to construct a natural variance estimator:

$$\hat V_{cv}(\lambda) = n^{-2}\sum_{i=1}^{n}\{\tau_i(\hat\eta_{(-i)}) - \hat\gamma_{(-i)}(\lambda)'\xi_i\}^2.$$
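The cross-validated estimator and its variance estimator can be sketched as below. This is a hedged illustration in plain Python: `tau_fn` and `fit_gamma` are hypothetical callables supplied by the user, where `tau_fn(train, idx)` is assumed to return the influence values for the observations in `idx` using $\hat\eta$ estimated from `train` only, and `fit_gamma` is any penalized fit of the training influence values on the training $\xi_j$. The closed-form `lasso_1d` covers only the single-covariate case and is included to keep the example self-contained.

```python
import random

def lasso_1d(tau, xi, lam):
    """Closed-form lasso for one covariate: minimises sum (tau - g*xi)^2 + lam*|g|."""
    sxx = sum(row[0] ** 2 for row in xi)
    if sxx == 0.0:
        return [0.0]
    sxt = sum(row[0] * t for row, t in zip(xi, tau))
    shrunk = max(abs(sxt) - lam / 2.0, 0.0)
    return [(shrunk if sxt >= 0 else -shrunk) / sxx]

def cv_estimate(theta_hat, xi, tau_fn, fit_gamma, lam, K, seed=0):
    """Return (theta_cv(lam), V_cv(lam)) from a K-fold split of the n observations."""
    n = len(xi)
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::K] for k in range(K)]  # K nonoverlapping sets D_1, ..., D_K
    aug_sum, sq_sum = 0.0, 0.0
    for held_out in folds:
        held = set(held_out)
        train = [j for j in range(n) if j not in held]
        # gamma_(-i): fitted without the fold that contains observation i.
        gamma = fit_gamma(tau_fn(train, train), [xi[j] for j in train], lam)
        # tau_i evaluated at eta-hat estimated from the training part only.
        for i, tau_i in zip(held_out, tau_fn(train, held_out)):
            fitted = sum(g * v for g, v in zip(gamma, xi[i]))
            aug_sum += fitted
            sq_sum += (tau_i - fitted) ** 2
    return theta_hat - aug_sum / n, sq_sum / n ** 2
```

Because each $\hat\gamma_{(-i)}(\lambda)$ is fitted without the fold containing observation $i$, the residual for that observation is an out-of-sample residual, which is the reason this variance estimate is less optimistic than the plug-in version.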

In Appendix B, we justify that this estimator is better than $\hat V_{lasso}(\lambda)$. Moreover, when $\lambda$ is close to zero and $p$ is large, that is, one almost uses the standard least squares procedure to obtain $\hat\gamma_{(-i)}(\lambda)$, the above variance estimate can be modified slightly for improving its estimation accuracy (see Appendix B for details). A natural “optimal” estimator using the above lasso procedure is $\hat\theta_{opt} = \hat\theta_{cv}(\hat\lambda)$, where $\hat\lambda$ is the penalty parameter value, which minimizes $\hat V_{cv}(\lambda)$ over a range of $\lambda$ values of interest. As a referee kindly pointed out, when $\theta_0$ is the mean difference, one may replace (2.1) by the simple least squares objective function

$$\sum_{i=1}^{n}\left\{\frac{T_i - \pi}{\pi(1-\pi)}\right\}^2 (Y_i - \gamma' Z_i)^2$$

without the need of estimating the influence function.
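One way to see how this objective relates to (2.1) is the following algebraic identity, which uses only the definition $\xi_i = (T_i - \pi)Z_i/\{\pi(1-\pi)\}$:

```latex
\begin{align*}
\left\{\frac{T_i-\pi}{\pi(1-\pi)}\right\}^2 (Y_i - \gamma' Z_i)^2
  &= \left\{\frac{(T_i-\pi)Y_i}{\pi(1-\pi)} - \gamma'\,\frac{(T_i-\pi)Z_i}{\pi(1-\pi)}\right\}^2 \\
  &= \left\{\frac{(T_i-\pi)Y_i}{\pi(1-\pi)} - \gamma'\xi_i\right\}^2 .
\end{align*}
```

The weighted objective thus has the form of (2.1) with $\tau_i(\hat\eta)$ replaced by the $i$th summand of $\hat\theta$, which involves no unknown parameters. The weight equals $1/\pi^2$ for subjects with $T_i = 1$ and $1/(1-\pi)^2$ for subjects with $T_i = 0$.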

3. APPLICATIONS

In this section, we show how to apply the new estimation procedure to various cases. To this end, we only need to determine the initial estimate $\hat\theta$ for the contrast measure of interest and its corresponding first-order expansion in each application. First, we consider the case that the response is continuous or binary and the group mean difference is the parameter of interest. Here,

$$\hat\theta = \frac{1}{n}\sum_{i=1}^{n}\left\{\frac{T_iY_i}{\pi} - \frac{(1-T_i)Y_i}{1-\pi}\right\}.$$

In this case, it is straightforward to show that

$$\hat\theta - \theta_0 = \frac{1}{n}\sum_{i=1}^{n}\left\{\frac{T_i(Y_i - \hat\mu_1)}{\pi} - \frac{(1-T_i)(Y_i - \hat\mu_0)}{1-\pi}\right\} + o_p\left(\frac{1}{\sqrt{n}}\right),$$

where $\eta = (\mu_1, \mu_0)'$, $\hat\mu_1 = \sum_{i=1}^{n}T_iY_i/(\pi n)$, and $\hat\mu_0 = \sum_{i=1}^{n}(1-T_i)Y_i/\{(1-\pi)n\}$.

Now, when the response is binary with success rate $p_j$ for the treatment group $j$, $j = 0,1$, and the parameter of interest is the log odds ratio $\theta_0 = \log\{p_1(1-p_0)/p_0/(1-p_1)\}$, then

$$\hat\theta = \log(\hat p_1) - \log(1-\hat p_1) - \log(\hat p_0) + \log(1-\hat p_0),$$

where $\hat p_1 = \sum_{i=1}^{n}T_iY_i/(\pi n)$, and $\hat p_0 = \sum_{i=1}^{n}(1-T_i)Y_i/\{(1-\pi)n\}$. For this case,

$$\hat\theta - \theta_0 = \frac{1}{n}\sum_{i=1}^{n}\left\{\frac{(Y_i - \hat p_1)T_i}{\pi\hat p_1(1-\hat p_1)} - \frac{(Y_i - \hat p_0)(1-T_i)}{(1-\pi)\hat p_0(1-\hat p_0)}\right\} + o_p\left(\frac{1}{\sqrt{n}}\right).$$

Last, we consider the case when $Y$ is the time to a specific event but may be censored by an independent censoring variable. To be specific, we observe $(\tilde Y, \Delta)$ where $\tilde Y = Y \wedge C$, $\Delta = I(Y < C)$, $C$ is the censoring time, and $I(\cdot)$ is the indicator function. The most commonly used summary measure for quantifying the treatment difference in survival analysis is the ratio of two hazard functions. The two-sample Cox estimator is often used to estimate such a ratio. However, when the proportional hazards assumption between two groups is not valid, this estimator converges to a parameter which may be difficult to interpret as a measure of the treatment difference. Moreover, this parameter depends on the censoring distribution. Therefore, it is desirable to use a model-free summary measure for the treatment contrast. One may simply use the survival probability at a given time $t_0$ as a model-free summary for survivorship. For this case, $\theta_0 = P(Y > t_0 \mid T = 1) - P(Y > t_0 \mid T = 0)$ and $\hat\theta = \hat S_1(t_0) - \hat S_0(t_0)$, where $\hat S_j(\cdot)$ is the Kaplan–Meier estimator of the survival function of $Y$ in group $j$, $j = 0,1$. For this case,

$$\hat\theta - \theta_0 = -n^{-1}\sum_{i=1}^{n}\left\{\frac{T_i}{\pi}\int_0^{t_0}\frac{\hat S_1(t_0)\,d\hat M_{i1}(s)}{\sum_{j=1}^{N}I(\tilde Y_j \ge s)T_j} - \frac{1-T_i}{1-\pi}\int_0^{t_0}\frac{\hat S_0(t_0)\,d\hat M_{i0}(s)}{\sum_{j=1}^{N}I(\tilde Y_j \ge s)(1-T_j)}\right\} + o_p\left(\frac{1}{\sqrt{n}}\right),$$

where

$$\hat M_{ij}(s) = I(T_i = j)\left\{I(\tilde Y_i \le s)\Delta_i - \int_0^{s}I(\tilde Y_i \ge u)\,d\hat\Lambda_j(u)\right\},$$

and $\hat\Lambda_j(\cdot)$ is the Nelson–Aalen estimator for the cumulative hazard function of $Y$ in group $j$ (Fleming and Harrington, 1991).
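For the survival case, the simple estimator $\hat\theta = \hat S_1(t_0) - \hat S_0(t_0)$ can be sketched with a direct product-limit computation. The code below is a minimal plain-Python illustration; the function names and the tiny data set in the usage lines are made up for the example, and the influence-function terms of the expansion are not computed here.

```python
def kaplan_meier(times, events, t0):
    """Kaplan-Meier estimate of P(Y > t0) from observed times and event indicators.

    times  : observed times (event time or censoring time, whichever is first)
    events : 1 if the event was observed, 0 if the observation was censored
    """
    surv = 1.0
    # Distinct observed event times no later than t0, in increasing order.
    event_times = sorted({u for u, d in zip(times, events) if d == 1 and u <= t0})
    for s in event_times:
        at_risk = sum(1 for u in times if u >= s)
        deaths = sum(1 for u, d in zip(times, events) if u == s and d == 1)
        surv *= 1.0 - deaths / at_risk
    return surv

def survival_difference(times, events, treat, t0):
    """theta-hat = S1-hat(t0) - S0-hat(t0), with groups defined by treat in {0, 1}."""
    est = {}
    for j in (0, 1):
        tj = [u for u, g in zip(times, treat) if g == j]
        dj = [d for d, g in zip(events, treat) if g == j]
        est[j] = kaplan_meier(tj, dj, t0)
    return est[1] - est[0]

if __name__ == "__main__":
    # Illustrative data only: four subjects per arm.
    times = [2, 4, 6, 8, 1, 3, 5, 7]
    events = [1, 0, 1, 1, 1, 1, 0, 1]
    treat = [1, 1, 1, 1, 0, 0, 0, 0]
    print(survival_difference(times, events, treat, t0=5))
```

In the illustrative data, the treated arm has one event before $t_0 = 5$ among four subjects at risk, and the control arm has events at times 1 and 3, so the two Kaplan–Meier estimates are $3/4$ and $3/4 \times 2/3 = 1/2$.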

To summarize a global survivorship beyond using t-year survival rates, one may use the mean survival

time. Unfortunately, in the presence of censoring, such a measure cannot be estimated well. An alternative