
Biostatistics (2012), 13, 2, pp. 256–273

doi:10.1093/biostatistics/kxr050

Advance Access publication on January 30, 2012

On the covariate-adjusted estimation for an overall treatment difference with data from a randomized comparative clinical trial

LU TIAN

Department of Health Research & Policy, Stanford University, Stanford, CA 94305, USA

TIANXI CAI

Department of Biostatistics, Harvard University, Boston, MA 02115, USA

LIHUI ZHAO

Department of Preventive Medicine, Northwestern University, Chicago, IL 60611, USA

LEE-JEN WEI∗

Department of Biostatistics, Harvard University, Boston, MA 02115, USA

wei@hsph.harvard.edu

SUMMARY

To estimate an overall treatment difference with data from a randomized comparative clinical study, baseline covariates are often utilized to increase the estimation precision. Using the standard analysis of covariance technique for making inferences about such an average treatment difference may not be appropriate, especially when the fitted model is nonlinear. On the other hand, the novel augmentation procedure recently studied, for example, by Zhang and others (2008. Improving efficiency of inferences in randomized clinical trials using auxiliary covariates. Biometrics 64, 707–715) is quite flexible. However, in general, it is not clear how to select covariates for augmentation effectively. An overly adjusted estimator may inflate the variance and in some cases be biased. Furthermore, the results from the standard inference procedure, which ignores the sampling variation from the variable selection process, may not be valid. In this paper, we first propose an estimation procedure, which augments the simple treatment contrast estimator directly with covariates. The new proposal is asymptotically equivalent to the aforementioned augmentation method. To select covariates, we utilize the standard lasso procedure. Furthermore, to make valid inference from the resulting lasso-type estimator, a cross validation method is used. The validity of the new proposal is justified theoretically and empirically. We illustrate the procedure extensively with a well-known primary biliary cirrhosis clinical trial data set.

Keywords: ANCOVA; Cross validation; Efficiency augmentation; Mayo PBC data; Semi-parametric efficiency.

∗To whom correspondence should be addressed.

© The Author 2012. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.


On the covariate-adjusted estimation for the treatment difference


1. INTRODUCTION

For a typical randomized clinical trial to compare two treatments, a summary measure $\theta_0$ for quantifying the treatment effectiveness difference can generally be estimated unbiasedly or consistently using its simple two-sample empirical counterpart, say $\hat\theta$. With the subject’s baseline covariates, one may obtain a more efficient estimator for $\theta_0$ via a standard analysis of covariance (ANCOVA) technique or a novel augmentation procedure, which is well documented in Zhang and others (2008) and a series of papers (Leon and others, 2003; Tsiatis, 2006; Tsiatis and others, 2008; Lu and Tsiatis, 2008; Gilbert and others, 2009; Zhang and Gilbert, 2010). The ANCOVA approach can be problematic, especially when the regression model is nonlinear, for example, the logistic or Cox model. For this case, the ANCOVA estimator generally does not converge to $\theta_0$, but to a quantity which may be difficult to interpret as a treatment contrast measure. Moreover, in the presence of censored event time observations, this quantity may depend on the censoring distribution. On the other hand, the above augmentation procedure, referred to as ZTD in the literature, always produces a consistent estimator for $\theta_0$, provided that the simple estimator $\hat\theta$ is consistent.

In theory, the ZTD estimator, denoted by $\hat\theta_{\mathrm{ZTD}}$ hereafter, is asymptotically more efficient than $\hat\theta$ no matter how many covariates are used for augmentation. In practice, however, an “overly augmented” or “mis-augmented” estimator may have a larger variance than that of $\hat\theta$ and in special cases may even have undesirable finite sample bias. Recently, Zhang and others (2008) showed empirically that the ZTD via the standard stepwise regression for variable selection performs satisfactorily when the number of covariates is not large. In general, however, it is not clear that the standard inference procedures for $\theta_0$ based on estimators augmented by covariates selected via a rather complex variable selection process are appropriate, especially when the number of covariates involved is not small relative to the sample size. Therefore, it is highly desirable to develop an estimation procedure to properly and systematically augment $\hat\theta$ and make valid inference for the treatment difference using the data with practical sample sizes.

Now, let $Y$ be the response variable, $T$ be the binary treatment indicator, and $Z$ be a $p$-dimensional vector of baseline covariates including 1 as its first element and possibly transformations of original variables. The data, $\{(Y_i, T_i, Z_i), i = 1,\ldots,n\}$, consist of $n$ independent copies of $(Y, T, Z)$, where $T$ and $Z$ are independent of each other. Let $P(T = 1) = \pi \in (0,1)$. First, suppose that we are interested in the mean difference: $\theta_0 = E(Y|T = 1) - E(Y|T = 0)$. A simple unadjusted estimator is

$$\hat\theta = \frac{1}{n}\sum_{i=1}^{n}\frac{(T_i - \pi)Y_i}{\pi(1 - \pi)},$$

which consistently estimates $\theta_0$. To improve efficiency in estimating $\theta_0$, one may employ the standard ANCOVA procedure by fitting the following linear regression “working” model:

$$E(Y|T,Z) = \theta T + \gamma' Z,$$

where $\theta$ and $\gamma$ are unknown parameters. Since $T \perp Z$ and $\{(T_i, Z_i), i = 1,\ldots,n\}$ are independent copies of $(T, Z)$, the resulting ANCOVA estimator is asymptotically equivalent to

$$\hat\theta - \hat\gamma'\left\{\frac{1}{n}\sum_{i=1}^{n}\frac{(T_i - \pi)Z_i}{\pi(1 - \pi)}\right\}, \tag{1.1}$$

where $\hat\gamma$ is the ordinary least squares estimator for $\gamma$ of the model $E(Y|Z) = \gamma' Z$. As $n \to \infty$, $\hat\gamma$ converges to

$$\gamma_0 = \arg\min_{\gamma} E(Y - \gamma' Z)^2.$$


258L. TIAN AND OTHERS

It follows that the ANCOVA estimator is asymptotically equivalent to

$$\hat\theta - \gamma_0'\left\{\frac{1}{n}\sum_{i=1}^{n}\frac{(T_i - \pi)Z_i}{\pi(1 - \pi)}\right\}. \tag{1.2}$$

In theory, since $\hat\theta$ is consistent for $\theta_0$, the ANCOVA estimator is also consistent for $\theta_0$ and more efficient than $\hat\theta$ regardless of whether the above working model is correctly specified. Furthermore, as noted by Tsiatis and others (2008), the nonparametric ANCOVA estimator proposed by Koch and others (1998) and $\hat\theta_{\mathrm{ZTD}}$ are also asymptotically equivalent to (1.2) when $\pi = 0.5$. We give details of this equivalence in Appendix A.
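The estimators above can be illustrated with a minimal Python sketch. It assumes a single scalar covariate, so that $Z = (1, z)'$ and the least squares fit has a closed form. The function names (`simple_estimator`, `ols_intercept_slope`, `augmented_estimator`) and the toy data are hypothetical and are not part of the original analysis.

```python
def simple_estimator(y, t, pi):
    """Unadjusted estimator: n^{-1} * sum_i (T_i - pi) * Y_i / {pi * (1 - pi)}."""
    n = len(y)
    return sum((ti - pi) * yi for yi, ti in zip(y, t)) / (n * pi * (1 - pi))


def ols_intercept_slope(y, z):
    """Least squares fit of E(Y | Z) = g0 + g1 * z (assumes z is not constant)."""
    n = len(y)
    z_bar = sum(z) / n
    y_bar = sum(y) / n
    s_zz = sum((zi - z_bar) ** 2 for zi in z)
    s_zy = sum((zi - z_bar) * (yi - y_bar) for zi, yi in zip(z, y))
    g1 = s_zy / s_zz
    return y_bar - g1 * z_bar, g1


def augmented_estimator(y, t, z, pi):
    """Estimator of the form (1.1) with Z_i = (1, z_i)'."""
    n = len(y)
    scale = n * pi * (1 - pi)
    g0, g1 = ols_intercept_slope(y, z)
    # Averages of xi_i = (T_i - pi) * Z_i / {pi * (1 - pi)}, one per component of Z.
    xi_bar_0 = sum(ti - pi for ti in t) / scale
    xi_bar_1 = sum((ti - pi) * zi for ti, zi in zip(t, z)) / scale
    return simple_estimator(y, t, pi) - (g0 * xi_bar_0 + g1 * xi_bar_1)


# Toy data: the response is exactly linear in z and there is no treatment effect,
# but the covariate happens to be imbalanced between the two arms.
z = [0.0, 1.0, 2.0, 3.0]
y = [1.0 + 2.0 * zi for zi in z]
t = [0, 0, 1, 1]
print(simple_estimator(y, t, 0.5))        # nonzero, driven by covariate imbalance
print(augmented_estimator(y, t, z, 0.5))  # approximately 0
```

In this toy example the augmentation term removes exactly the part of the simple estimator that is explained by the chance imbalance in $z$.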

The novel ZTD procedure is derived by specifying optimal estimating functions under a very general semi-parametric setting. The efficiency gain from $\hat\theta_{\mathrm{ZTD}}$ has been elegantly justified using the semi-parametric inference theory (Tsiatis, 2006). The ZTD is much more flexible than the ANCOVA method in that it can handle cases when the summary measure $\theta_0$ is beyond the simple difference of two group means. On the other hand, the ANCOVA method may only work under the above simple linear regression model.

In this paper, we study the estimator (1.1), which augments $\hat\theta$ directly with the covariates. The key question is how to choose $\hat\gamma$ in (1.1), especially when $p$ is not small with respect to $n$. Here, we utilize the lasso procedure with a cross validation process to construct a systematic procedure for selecting covariates to increase the estimation precision. The validity of the new proposal is justified theoretically and empirically via an extensive simulation study. The proposal is also illustrated with the data from a clinical trial to evaluate a treatment for a specific liver disease.

2. ESTIMATING THE TREATMENT DIFFERENCE VIA PROPER AUGMENTATION FROM COVARIATES

For a general treatment contrast measure $\theta_0$ and its simple two-sample estimator $\hat\theta$, assume that

$$\hat\theta - \theta_0 = n^{-1}\sum_{i=1}^{n}\tau_i(\eta) + o_p\left(\frac{1}{\sqrt{n}}\right),$$

where $\tau_i(\eta)$ is the influence function from the $i$th observation, $\eta$ is a vector of unknown parameters, and $i = 1,\ldots,n$. Note that the influence function generally only involves a rather small number of unknown parameters, which is not dependent on $Z$. Let $\hat\eta$ denote a consistent estimator for $\eta$. Generally, the above asymptotic expansion is also valid with $\tau_i(\eta)$ being replaced by $\tau_i(\hat\eta)$. Now, (1.2) can be rewritten as

$$\hat\theta - \gamma_0'\left\{n^{-1}\sum_{i=1}^{n}\xi_i\right\},$$

where $\xi_i = (T_i - \pi)Z_i/\{\pi(1 - \pi)\}$, $i = 1,\ldots,n$. Then $\hat\gamma$ in (1.1) is the minimizer of

$$\sum_{i=1}^{n}\{\tau_i(\hat\eta) - \gamma'\xi_i\}^2. \tag{2.1}$$

When the dimension of $Z$ is not small, to obtain a stable minimizer, one may consider the following regularized minimand:

$$L_\lambda(\gamma) = \sum_{i=1}^{n}\{\tau_i(\hat\eta) - \gamma'\xi_i\}^2 + \lambda|\gamma|,$$


where $\lambda$ is the lasso tuning parameter (Tibshirani, 1996) and $|\cdot|$ denotes the $L_1$ norm for a vector. For any fixed $\lambda$, let the resulting minimizer be denoted by $\hat\gamma(\lambda)$. The corresponding augmented estimator and its variance estimator are

$$\hat\theta_{\mathrm{lasso}}(\lambda) = \hat\theta - \hat\gamma(\lambda)'\left\{n^{-1}\sum_{i=1}^{n}\xi_i\right\}$$

and

$$\hat V_{\mathrm{lasso}}(\lambda) = n^{-2}\sum_{i=1}^{n}\{\tau_i(\hat\eta) - \hat\gamma(\lambda)'\xi_i\}^2, \tag{2.2}$$

respectively. Asymptotically, one may ignore the variability of $\hat\gamma(\lambda)$ and treat it as a constant when making inferences about $\theta_0$. However, in some cases, we have found empirically that, similar to $\hat\theta_{\mathrm{ZTD}}$, $\hat\theta_{\mathrm{lasso}}(\lambda)$ is biased, partly due to the fact that $\hat\gamma(\lambda)$ and $\{\xi_i, i = 1,\ldots,n\}$ are correlated. In the simulation study, we show via a simple example this undesirable finite-sample phenomenon. In practice, such bias may not have real impact on the conclusions about the treatment difference, $\theta_0$, when the study sample size is relatively large with respect to the dimension of $Z$.
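As a rough illustration of the regularized fit, the following Python sketch minimizes $\sum_i\{\tau_i - \gamma'\xi_i\}^2 + \lambda|\gamma|$ by cyclic coordinate descent and evaluates a variance estimate of the form $n^{-2}\sum_i\{\tau_i - \gamma'\xi_i\}^2$. Coordinate descent is one standard way to compute a lasso solution and is an assumption here, not necessarily the algorithm used by the authors. The names `soft_threshold`, `lasso_gamma`, and `lasso_variance`, the fixed iteration count, and the toy inputs are all hypothetical.

```python
def soft_threshold(x, c):
    """Soft-thresholding operator: sign(x) * max(|x| - c, 0)."""
    if x > c:
        return x - c
    if x < -c:
        return x + c
    return 0.0


def lasso_gamma(tau, xi, lam, n_iter=200):
    """Coordinate descent for sum_i (tau_i - gamma' xi_i)^2 + lam * |gamma|_1."""
    n, p = len(tau), len(xi[0])
    gamma = [0.0] * p
    col_ss = [sum(xi[i][k] ** 2 for i in range(n)) for k in range(p)]
    for _ in range(n_iter):
        for k in range(p):
            if col_ss[k] == 0.0:
                continue
            # Inner product of column k with the residual that excludes column k.
            rho = 0.0
            for i in range(n):
                fit_others = sum(gamma[l] * xi[i][l] for l in range(p) if l != k)
                rho += xi[i][k] * (tau[i] - fit_others)
            # The penalty lam enters as lam / 2 because the loss is not halved.
            gamma[k] = soft_threshold(rho, lam / 2.0) / col_ss[k]
    return gamma


def lasso_variance(tau, xi, gamma):
    """n^{-2} * sum_i (tau_i - gamma' xi_i)^2."""
    n = len(tau)
    return sum(
        (tau[i] - sum(g * x for g, x in zip(gamma, xi[i]))) ** 2 for i in range(n)
    ) / n ** 2


# Toy inputs with orthogonal columns, so each coordinate is solved in one pass.
xi = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
tau = [2.0, 3.0, 2.0, 3.0]
print(lasso_gamma(tau, xi, 0.0))    # least squares solution
print(lasso_gamma(tau, xi, 2.0))    # shrunken coefficients
print(lasso_gamma(tau, xi, 100.0))  # all coefficients set to zero
```

With a large enough $\lambda$ every coefficient is zero and the augmented estimator reduces to the simple estimator $\hat\theta$.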

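One possible solution to reduce the correlation between $\hat\gamma(\lambda)$ and $\xi_i$ is to use a cross validation procedure. Specifically, we randomly split the data into $K$ nonoverlapping sets $\{D_1,\ldots,D_K\}$ and construct an estimator for $\theta_0$:

$$\hat\theta_{\mathrm{cv}}(\lambda) = \hat\theta - \frac{1}{n}\sum_{i=1}^{n}\hat\gamma_{(-i)}(\lambda)'\xi_i,$$

where $i \in D_{k_i}$, $\hat\gamma_{(-i)}(\lambda)$ is the minimizer of

$$\sum_{j \notin D_{k_i}}\{\tau_j(\hat\eta_{(-i)}) - \gamma'\xi_j\}^2 + \lambda|\gamma|,$$

and $\hat\eta_{(-i)}$ is a consistent estimator for $\eta$ obtained with all data except those from $D_{k_i}$. Note that $\hat\gamma_{(-i)}(\lambda)$ and $\xi_i$ are independent, so no extra bias is introduced in $\hat\theta_{\mathrm{cv}}(\lambda)$ relative to $\hat\theta$. When $n \gg p$, the variance of $\hat\theta_{\mathrm{cv}}(\lambda)$ can be estimated by $\hat V_{\mathrm{lasso}}(\lambda)$ given in (2.2). However, $\hat V_{\mathrm{lasso}}(\lambda)$ tends to underestimate the true variance when $p$ is not small.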

Here, we utilize the above cross validation procedure to construct a natural variance estimator:

$$\hat V_{\mathrm{cv}}(\lambda) = n^{-2}\sum_{i=1}^{n}\{\tau_i(\hat\eta_{(-i)}) - \hat\gamma_{(-i)}(\lambda)'\xi_i\}^2.$$

In Appendix B, we justify that this estimator is better than $\hat V_{\mathrm{lasso}}(\lambda)$. Moreover, when $\lambda$ is close to zero and $p$ is large, that is, one almost uses the standard least squares procedure to obtain $\hat\gamma_{(-i)}(\lambda)$, the above variance estimate can be modified slightly for improving its estimation accuracy (see Appendix B for details). A natural “optimal” estimator using the above lasso procedure is $\hat\theta_{\mathrm{opt}} = \hat\theta_{\mathrm{cv}}(\hat\lambda)$, where $\hat\lambda$ is the penalty parameter value that minimizes $\hat V_{\mathrm{cv}}(\lambda)$ over a range of $\lambda$ values of interest. As a referee kindly pointed out, when $\theta_0$ is the mean difference, one may replace (2.1) by the simple least squares objective function

$$\sum_{i=1}^{n}\left\{\frac{T_i - \pi}{\pi(1 - \pi)}\right\}^2 (Y_i - \gamma' Z_i)^2$$

without the need of estimating the influence function.
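The cross validation steps of this section can be sketched in Python for the mean difference. This is only an illustration under simplifying assumptions that are not in the paper: a single scalar covariate without an intercept (so the lasso fit on the training folds has a closed form), a deterministic fold assignment instead of a random split, and hypothetical function names (`cv_augmented_estimator`, `select_lambda`).

```python
def soft_threshold(x, c):
    """Soft-thresholding operator: sign(x) * max(|x| - c, 0)."""
    if x > c:
        return x - c
    if x < -c:
        return x + c
    return 0.0


def cv_augmented_estimator(y, t, z, pi, lam, n_folds):
    """Return (theta_cv, v_cv) for the mean difference with one scalar covariate."""
    n = len(y)
    c = pi * (1 - pi)
    theta_hat = sum((ti - pi) * yi for yi, ti in zip(y, t)) / (n * c)
    xi = [(ti - pi) * zi / c for ti, zi in zip(t, z)]
    fold = [i % n_folds for i in range(n)]  # deterministic split, for illustration only

    correction = 0.0
    sq_resid = 0.0
    for k in range(n_folds):
        train = [j for j in range(n) if fold[j] != k]
        held_out = [i for i in range(n) if fold[i] == k]
        m = len(train)
        # eta_hat_(-i): group means estimated without the held-out fold.
        mu1 = sum(t[j] * y[j] for j in train) / (pi * m)
        mu0 = sum((1 - t[j]) * y[j] for j in train) / ((1 - pi) * m)

        def tau(i):
            return t[i] * (y[i] - mu1) / pi - (1 - t[i]) * (y[i] - mu0) / (1 - pi)

        # One-dimensional lasso fit on the training folds (closed form).
        ss = sum(xi[j] ** 2 for j in train)
        rho = sum(xi[j] * tau(j) for j in train)
        gamma = soft_threshold(rho, lam / 2.0) / ss if ss > 0 else 0.0

        for i in held_out:
            correction += gamma * xi[i]
            sq_resid += (tau(i) - gamma * xi[i]) ** 2
    return theta_hat - correction / n, sq_resid / n ** 2


def select_lambda(y, t, z, pi, lambdas, n_folds):
    """Pick the grid value of lambda that minimizes the cross-validated variance."""
    results = [(cv_augmented_estimator(y, t, z, pi, lam, n_folds), lam) for lam in lambdas]
    (theta_opt, v_opt), lam_opt = min(results, key=lambda r: r[0][1])
    return theta_opt, v_opt, lam_opt


y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
t = [1, 0, 1, 0, 1, 0, 1, 0]
z = [0.5, -1.0, 1.5, 0.0, 2.0, 1.0, -0.5, 2.5]
print(cv_augmented_estimator(y, t, z, 0.5, 1e9, 4))  # huge penalty: no augmentation
print(select_lambda(y, t, z, 0.5, [0.0, 1.0, 1e9], 4))
```

Because the coefficient applied to observation $i$ is fitted without the fold containing $i$, the correction term does not reuse the held-out data, which is the point of the cross validation step.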


3. APPLICATIONS

In this section, we show how to apply the new estimation procedure to various cases. To this end, we only need to determine the initial estimate $\hat\theta$ for the contrast measure of interest and its corresponding first-order expansion in each application. First, we consider the case that the response is continuous or binary and the group mean difference is the parameter of interest. Here,

$$\hat\theta = \frac{1}{n}\sum_{i=1}^{n}\left\{\frac{T_i Y_i}{\pi} - \frac{(1 - T_i)Y_i}{1 - \pi}\right\}.$$

In this case, it is straightforward to show that

$$\hat\theta - \theta_0 = \frac{1}{n}\sum_{i=1}^{n}\left\{\frac{T_i(Y_i - \hat\mu_1)}{\pi} - \frac{(1 - T_i)(Y_i - \hat\mu_0)}{1 - \pi}\right\} + o_p\left(\frac{1}{\sqrt{n}}\right),$$

where $\eta = (\mu_1, \mu_0)'$, $\hat\mu_1 = \sum_{i=1}^{n} T_i Y_i/(\pi n)$, and $\hat\mu_0 = \sum_{i=1}^{n}(1 - T_i)Y_i/\{(1 - \pi)n\}$.

Now, when the response is binary with success rate $p_j$ for the treatment group $j$, $j = 0,1$, but the parameter of interest is the log odds ratio $\theta_0 = \log[p_1(1 - p_0)/\{p_0(1 - p_1)\}]$, then

$$\hat\theta = \log(\hat p_1) - \log(1 - \hat p_1) - \log(\hat p_0) + \log(1 - \hat p_0),$$

where $\hat p_1 = \sum_{i=1}^{n} T_i Y_i/(\pi n)$, and $\hat p_0 = \sum_{i=1}^{n}(1 - T_i)Y_i/\{(1 - \pi)n\}$. For this case,

$$\hat\theta - \theta_0 = \frac{1}{n}\sum_{i=1}^{n}\left\{\frac{(Y_i - \hat p_1)T_i}{\pi\hat p_1(1 - \hat p_1)} - \frac{(Y_i - \hat p_0)(1 - T_i)}{(1 - \pi)\hat p_0(1 - \hat p_0)}\right\} + o_p\left(\frac{1}{\sqrt{n}}\right).$$

Last, we consider the case when $Y$ is the time to a specific event but may be censored by an independent censoring variable. To be specific, we observe $(\tilde Y, \Delta)$, where $\tilde Y = Y \wedge C$, $\Delta = I(Y < C)$, $C$ is the censoring time, and $I(\cdot)$ is the indicator function. The most commonly used summary measure for quantifying the treatment difference in survival analysis is the ratio of two hazard functions. The two sample Cox estimator is often used to estimate such a ratio. However, when the proportional hazards assumption between two groups is not valid, this estimator converges to a parameter which may be difficult to interpret as a measure of the treatment difference. Moreover, this parameter depends on the censoring distribution. Therefore, it is desirable to use a model-free summary measure for the treatment contrast. One may simply use the survival probability at a given time $t_0$ as a model-free summary for survivorship. For this case, $\theta_0 = P(Y > t_0|T = 1) - P(Y > t_0|T = 0)$ and $\hat\theta = \hat S_1(t_0) - \hat S_0(t_0)$, where $\hat S_j(\cdot)$ is the Kaplan–Meier estimator of the survival function of $Y$ in group $j$, $j = 0,1$. For this case,

$$\hat\theta - \theta_0 = -n^{-1}\sum_{i=1}^{n}\left\{\frac{T_i}{\pi}\int_0^{t_0}\frac{\hat S_1(t_0)\,d\hat M_{i1}(s)}{\sum_{j=1}^{N} I(\tilde Y_j \ge s)T_j} - \frac{1 - T_i}{1 - \pi}\int_0^{t_0}\frac{\hat S_0(t_0)\,d\hat M_{i0}(s)}{\sum_{j=1}^{N} I(\tilde Y_j \ge s)(1 - T_j)}\right\} + o_p\left(\frac{1}{\sqrt{n}}\right),$$

where

$$\hat M_{ij}(s) = I(T_i = j)\left\{I(\tilde Y_i \le s)\Delta_i - \int_0^{s} I(\tilde Y_i \ge u)\,d\hat\Lambda_j(u)\right\},$$

and $\hat\Lambda_j(\cdot)$ is the Nelson–Aalen estimator for the cumulative hazard function of $Y$ in group $j$ (Fleming and Harrington, 1991).
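For the binary case with the log odds ratio, the estimate and the plug-in influence function values can be computed directly, as in the following Python sketch. The function name `log_odds_ratio` and the toy data are hypothetical, and the sketch assumes both estimated success rates lie strictly between 0 and 1.

```python
import math


def log_odds_ratio(y, t, pi):
    """Return (theta_hat, tau_hat) for the log odds ratio with known pi."""
    n = len(y)
    p1 = sum(ti * yi for yi, ti in zip(y, t)) / (pi * n)
    p0 = sum((1 - ti) * yi for yi, ti in zip(y, t)) / ((1 - pi) * n)
    theta = math.log(p1) - math.log(1 - p1) - math.log(p0) + math.log(1 - p0)
    # Plug-in influence function values tau_i(eta_hat) with eta_hat = (p1, p0).
    tau = [
        (yi - p1) * ti / (pi * p1 * (1 - p1))
        - (yi - p0) * (1 - ti) / ((1 - pi) * p0 * (1 - p0))
        for yi, ti in zip(y, t)
    ]
    return theta, tau


# Toy data: 3 of 4 successes under treatment, 1 of 4 under control.
y = [1, 1, 1, 0, 1, 0, 0, 0]
t = [1, 1, 1, 1, 0, 0, 0, 0]
theta_hat, tau_hat = log_odds_ratio(y, t, 0.5)
variance_hat = sum(v * v for v in tau_hat) / len(y) ** 2
print(theta_hat, variance_hat)
```

In this balanced toy example, $n^{-2}\sum_i\tau_i(\hat\eta)^2$ coincides with the familiar sum of reciprocal cell counts, $1/3 + 1 + 1 + 1/3$, for the variance of a log odds ratio.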

To summarize a global survivorship beyond using $t$-year survival rates, one may use the mean survival time. Unfortunately, in the presence of censoring, such a measure cannot be estimated well. An alternative is to use the so-called restricted mean survival time, that is, the area under the survival function up to time point $t_0$. The corresponding consistent estimator is the area under the Kaplan–Meier curve. For this case, $\theta_0 = E(Y \wedge t_0|T = 1) - E(Y \wedge t_0|T = 0)$ and

$$\hat\theta = \int_0^{t_0}\hat S_1(s)\,ds - \int_0^{t_0}\hat S_0(s)\,ds.$$

For this case,

$$\hat\theta - \theta_0 = n^{-1}\sum_{i=1}^{n}\left[-\frac{T_i}{\pi}\int_0^{t_0}\left\{\frac{\int_s^{t_0}\hat S_1(t)\,dt}{\sum_{j=1}^{N} I(\tilde Y_j \ge s)T_j}\right\}d\hat M_{i1}(s) + \frac{1 - T_i}{1 - \pi}\int_0^{t_0}\left\{\frac{\int_s^{t_0}\hat S_0(t)\,dt}{\sum_{j=1}^{N} I(\tilde Y_j \ge s)(1 - T_j)}\right\}d\hat M_{i0}(s)\right] + o_p\left(\frac{1}{\sqrt{n}}\right).$$
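The restricted mean survival time estimate is simply the area under a Kaplan–Meier step function up to $t_0$. The following Python sketch shows one way to compute it and the resulting two-group difference. The function names (`kaplan_meier`, `restricted_mean`, `rmst_difference`) and the toy data are hypothetical, and an observation censored at time $s$ is treated as still at risk at $s$, which is a common convention assumed here.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at the distinct event times (event = 1, censored = 0)."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    steps = []
    i = 0
    while i < len(data):
        current = data[i][0]
        deaths = 0
        removed = 0
        # Collect all observations tied at the current time.
        while i < len(data) and data[i][0] == current:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / n_at_risk
            steps.append((current, surv))
        n_at_risk -= removed
    return steps


def restricted_mean(times, events, t0):
    """Area under the Kaplan-Meier curve on [0, t0]."""
    area, last_time, last_surv = 0.0, 0.0, 1.0
    for time, surv in kaplan_meier(times, events):
        if time >= t0:
            break
        area += last_surv * (time - last_time)
        last_time, last_surv = time, surv
    return area + last_surv * (t0 - last_time)


def rmst_difference(times, events, t, t0):
    """Restricted mean in the treated group minus that in the control group."""
    def arm(j):
        rows = [(s, d) for s, d, ti in zip(times, events, t) if ti == j]
        return restricted_mean([r[0] for r in rows], [r[1] for r in rows], t0)
    return arm(1) - arm(0)


times = [1.0, 2.0, 3.0, 2.0, 4.0, 6.0]
events = [1, 0, 1, 1, 1, 1]
t = [1, 1, 1, 0, 0, 0]
print(rmst_difference(times, events, t, 3.0))
```

Without censoring, the area under the curve equals the sample mean of $\min(Y_i, t_0)$, which gives a quick sanity check for the implementation.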

4. A SIMULATION STUDY

We conducted an extensive simulation study to examine the finite sample performance of the new estimates $\hat\theta_{\mathrm{cv}}(\lambda)$ and $\hat\theta_{\mathrm{opt}}$ for $\theta_0$. First, we investigate whether $\hat V_{\mathrm{cv}}(\lambda)$ estimates the true variance of $\hat\theta_{\mathrm{cv}}(\lambda)$ well under various practical settings. We also examine the finite sample properties for the interval estimation procedure based on the optimal $\hat\theta_{\mathrm{opt}}$. To this end, we consider the following models for generating the underlying data:

1. the linear regression model with continuous response

$$Y = m_T(Z) + N(0,1);$$

2. the logistic regression model with binary response

$$P(Y = 1|T,Z) = [1 + \exp\{-m_T(Z)\}]^{-1};$$

3. the Cox regression model with survival response

$$Y = \epsilon_0\exp\{m_T(Z)\},$$

where $\epsilon_0$ and the censoring time are generated from the unit exponential distribution and $U(0,3)$, respectively, and we are interested in survival curves over the time interval $[0,t_0] = [0,2.5]$.

Throughout we let $n = 200$ and generate $(Z_{[1]},\ldots,Z_{[100]})'$ from a multivariate normal distribution with mean 0, variance 1, and a compound symmetry covariance ℘ chosen to be either 0 or 0.5. For each generated data set, 20-fold cross validation is used to calculate $\hat\theta_{\mathrm{cv}}(\lambda)$ and $\hat V_{\mathrm{cv}}(\lambda)$ over a sequence of tuning parameters $\{\lambda_1,\lambda_2,\ldots,\lambda_{100}\}$, where $\lambda_1$ is chosen such that $\hat\gamma(\lambda_1) = 0$ for all simulated data sets, $\lambda_k = 10^{-3/98}\lambda_{k-1}$ for $k = 2,\ldots,99$, and $\lambda_{100} = 0$. In the first set of simulations, we set

$$m_0(Z) = \sum_{j=1}^{20}\frac{j}{20}Z_{[j]}, \qquad m_1(Z) = 1 + \sum_{j=1}^{20}\frac{j}{20}Z_{[j]}.$$
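The simulation design described above can be sketched in Python as follows. The sketch covers only the tuning parameter grid, the compound symmetry covariates, and $m_T(Z)$ for the first setting. The function names are hypothetical, the starting value `lam_max` stands in for $\lambda_1$ (whose numerical value is not given in the text), and the shared-factor construction of the correlated covariates is one standard way to obtain a common pairwise correlation, assumed here for illustration.

```python
import math
import random


def lambda_grid(lam_max):
    """lam_1 = lam_max, lam_k = 10^(-3/98) * lam_{k-1} for k = 2..99, lam_100 = 0."""
    ratio = 10.0 ** (-3.0 / 98.0)
    grid = [lam_max]
    for _ in range(98):
        grid.append(grid[-1] * ratio)
    grid.append(0.0)
    return grid


def compound_symmetry_normal(p, rho, rng):
    """One draw of p unit-variance normals with common pairwise correlation rho >= 0."""
    shared = rng.gauss(0.0, 1.0)
    return [
        math.sqrt(rho) * shared + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
        for _ in range(p)
    ]


def m_first_setting(z, treatment):
    """m_T(Z) = T + sum_{j=1}^{20} (j / 20) * Z_[j] (first set of simulations)."""
    return treatment + sum((j / 20.0) * z[j - 1] for j in range(1, 21))


rng = random.Random(2012)
grid = lambda_grid(lam_max=10.0)  # lam_max = 10.0 is an arbitrary illustrative value
z = compound_symmetry_normal(100, 0.5, rng)
y_continuous = m_first_setting(z, 1) + rng.gauss(0.0, 1.0)
print(len(grid), grid[98] / grid[0], len(z))
```

The grid spans three orders of magnitude between $\lambda_1$ and $\lambda_{99}$ and ends with the unpenalized fit at $\lambda_{100} = 0$.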


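Fig. 1. Comparing various variance estimates for $\hat\theta_{\mathrm{cv}}(\lambda)$ at $\{\lambda_1,\ldots,\lambda_{100}\}$: the empirical variance of $\hat\theta_{\mathrm{cv}}(\lambda)$ (black curve); $\hat V_{\mathrm{cv}}(\lambda)$ (dashed curve); $\hat V_{\mathrm{lasso}}(\lambda)$ (grey curve); (a–c) for independent covariates; (d–f) for dependent covariates.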

All the results are summarized based on 5000 replications. In Figure 1(a–c), we present the average of $\hat V_{\mathrm{cv}}(\lambda)$, the average of $\hat V_{\mathrm{lasso}}(\lambda)$, and the empirical variance of $\hat\theta_{\mathrm{cv}}(\lambda)$ when ℘ = 0 for continuous, binary, and survival responses, respectively. The results suggest that $\hat V_{\mathrm{cv}}(\lambda)$ approximates the true variance of $\hat\theta_{\mathrm{cv}}(\lambda)$ very well, while $\hat V_{\mathrm{lasso}}(\lambda)$, obtained without cross validation, tends to severely underestimate the true variance. When the covariates are correlated with ℘ = 0.5, the corresponding results are presented in Figure 1(d–f). The results are consistent with the case with ℘ = 0.

Next, we examine the performance of the optimal estimator $\hat\theta_{\mathrm{opt}} = \hat\theta_{\mathrm{cv}}(\hat\lambda)$, where $\hat\lambda$ is chosen to be the minimizer of $\hat V_{\mathrm{cv}}(\lambda)$, $\lambda \in \{\lambda_1,\ldots,\lambda_{100}\}$. For each simulated data set, we construct a 95% confidence interval (CI) based on $\hat\theta_{\mathrm{opt}}$ and $\hat V_{\mathrm{opt}} = \hat V_{\mathrm{cv}}(\hat\lambda)$. We summarized results from the 5000 replications based on the empirical bias, standard error, and coverage level and length of the constructed CIs. For comparisons, we also obtain those values based on the simple estimator $\hat\theta$, $\hat\theta_{\mathrm{ZTD}}$, and $\hat\theta_{\mathrm{cv}}(\lambda_0)$ along with their variance estimators, where $\lambda_0$ is the minimizer of the empirical variance of $\hat\theta_{\mathrm{cv}}(\lambda)$. In all the numerical studies, the forward subset selection procedure coupled with BIC is used to select variables for the efficiency augmentation in the ZTD procedure. The results are summarized in Table 1. The coverage levels for $\hat\theta_{\mathrm{opt}}$ are close to their nominal counterparts, and the interval lengths are almost identical to those based on the estimate with the true optimal $\lambda_0$. On the other hand, the simple estimate $\hat\theta$ tends to have substantially wider interval estimates than $\hat\theta_{\mathrm{opt}}$, $\hat\theta_{\mathrm{cv}}(\lambda_0)$, and $\hat\theta_{\mathrm{ZTD}}$. The empirical standard error of $\hat\theta_{\mathrm{ZTD}}$ is slightly greater than that of $\hat\theta_{\mathrm{opt}}$ or $\hat\theta_{\mathrm{cv}}(\lambda_0)$, which suggests an advantage of the lasso procedure. More importantly, the naive variance estimator of $\hat\theta_{\mathrm{ZTD}}$ may severely underestimate the true variance and thus results in a much more liberal confidence interval estimation procedure, which potentially can be corrected via cross validation. In summary, for all cases studied, the augmented estimators can substantially improve the efficiency of $\hat\theta$ in terms of narrowing the average length of the confidence interval of $\theta_0$, and $\hat\theta_{\mathrm{opt}}$-based inference is more reliable than that based on $\hat\theta_{\mathrm{ZTD}}$. Furthermore, in the variance estimation for $\hat\theta_{\mathrm{opt}} = \hat\theta_{\mathrm{cv}}(\hat\lambda)$, the variability in $\hat\lambda$ may cause a slightly downward bias, which is almost negligible in our empirical studies. Last, all estimators considered here are almost unbiased in the first set of simulations.

Table 1. The empirical bias, standard error, and coverage levels and lengths for the 0.95 CI based on $\hat\theta$, $\hat\theta_{\mathrm{opt}}$, $\hat\theta_{\mathrm{cv}}(\lambda_0)$, and $\hat\theta_{\mathrm{ZTD}}$

$m_T(Z) = \sum_{j=1}^{20} jZ_{[j]}/20 + T$

| Response | Estimator | Independent: BIAS | Independent: ESE | Independent: EAL ($10^{-3}$) | Independent: ECL (%) | Correlated: BIAS | Correlated: ESE | Correlated: EAL ($10^{-3}$) | Correlated: ECL (%) |
|---|---|---|---|---|---|---|---|---|---|
| Continuous | $\hat\theta$ | −0.005 | 0.403 | 1.580 (1.1†) | 94.9 | 0.007 | 1.100 | 4.264 (3.0) | 94.4 |
| | $\hat\theta_{\mathrm{opt}}$ | 0.001 | 0.169 | 0.648 (0.6) | 94.2 | 0.002 | 0.166 | 0.743 (1.9) | 97.0 |
| | $\hat\theta_{\mathrm{cv}}(\lambda_0)$ | 0.001 | 0.167 | 0.652 (0.6) | 94.7 | 0.002 | 0.163 | 0.749 (1.9) | 97.3 |
| | $\hat\theta_{\mathrm{ZTD}}$ | −0.001 | 0.204 | 0.622 (0.6) | 87.2 | 0.003 | 0.359 | 0.749 (1.8) | 72.6 |
| Binary | $\hat\theta$ | 0.004 | 0.291 | 1.136 (0.2) | 95.1 | 0.009 | 0.271 | 1.047 (0.3) | 94.6 |
| | $\hat\theta_{\mathrm{opt}}$ | 0.004 | 0.245 | 0.946 (0.7) | 94.6 | 0.003 | 0.191 | 0.745 (0.5) | 95.2 |
| | $\hat\theta_{\mathrm{cv}}(\lambda_0)$ | 0.003 | 0.243 | 0.953 (0.7) | 94.9 | 0.003 | 0.189 | 0.747 (0.5) | 95.5 |
| | $\hat\theta_{\mathrm{ZTD}}$ | −0.005 | 0.259 | 0.822 (0.7) | 88.9 | −0.011 | 0.201 | 0.508 (0.7) | 78.9 |
| Survival | $\hat\theta$ | 0.005 | 0.164 | 0.626 (0.2) | 94.1 | 0.003 | 0.173 | 0.665 (0.1) | 94.5 |
| | $\hat\theta_{\mathrm{opt}}$ | 0.005 | 0.127 | 0.476 (0.4) | 93.7 | 0.001 | 0.112 | 0.426 (0.3) | 93.9 |
| | $\hat\theta_{\mathrm{cv}}(\lambda_0)$ | 0.005 | 0.127 | 0.479 (0.4) | 94.0 | 0.001 | 0.111 | 0.427 (0.3) | 94.2 |
| | $\hat\theta_{\mathrm{ZTD}}$ | 0.005 | 0.141 | 0.457 (0.4) | 89.5 | 0.004 | 0.122 | 0.401 (0.3) | 89.8 |

$m_T(Z) = (T + 1)\sum_{j=1}^{20}\{(-1)^T(Z_{[j]}^2 - 1)/2 + jZ_{[j]}/20\} + 2T$

| Response | Estimator | Independent: BIAS | Independent: ESE | Independent: EAL ($10^{-3}$) | Independent: ECL (%) | Correlated: BIAS | Correlated: ESE | Correlated: EAL ($10^{-3}$) | Correlated: ECL (%) |
|---|---|---|---|---|---|---|---|---|---|
| Continuous | $\hat\theta$ | 0.019 | 0.876 | 3.476 (2.6) | 94.9 | 0.009 | 1.502 | 5.499 (4.4) | 93.0 |
| | $\hat\theta_{\mathrm{opt}}$ | 0.002 | 0.533 | 2.084 (2.0) | 94.4 | −0.038 | 1.188 | 4.618 (8.4) | 93.9 |
| | $\hat\theta_{\mathrm{cv}}(\lambda_0)$ | 0.016 | 0.530 | 2.097 (2.0) | 94.8 | 0.069 | 1.191 | 4.685 (8.7) | 94.5 |
| | $\hat\theta_{\mathrm{ZTD}}$ | −0.159 | 0.583 | 2.068 (2.2) | 91.1 | 0.390 | 1.305 | 4.193 (7.1) | 88.9 |
| Binary | $\hat\theta$ | 0.023 | 0.288 | 1.130 (0.2) | 94.3 | −0.001 | 0.290 | 1.140 (0.3) | 95.4 |
| | $\hat\theta_{\mathrm{opt}}$ | 0.017 | 0.242 | 0.935 (0.7) | 94.7 | −0.003 | 0.188 | 0.753 (0.6) | 95.4 |
| | $\hat\theta_{\mathrm{cv}}(\lambda_0)$ | 0.021 | 0.240 | 0.941 (0.7) | 95.0 | 0.002 | 0.187 | 0.757 (0.6) | 95.7 |
| | $\hat\theta_{\mathrm{ZTD}}$ | −0.023 | 0.265 | 0.855 (0.8) | 88.8 | −0.006 | 0.201 | 0.546 (0.7) | 82.8 |
| Survival | $\hat\theta$ | −0.003 | 0.173 | 0.659 (0.1) | 93.7 | 0.010 | 0.173 | 0.663 (0.1) | 94.6 |
| | $\hat\theta_{\mathrm{opt}}$ | −0.005 | 0.141 | 0.531 (0.4) | 93.6 | 0.005 | 0.114 | 0.431 (0.3) | 94.4 |
| | $\hat\theta_{\mathrm{cv}}(\lambda_0)$ | −0.002 | 0.140 | 0.534 (0.4) | 93.8 | 0.007 | 0.114 | 0.433 (0.3) | 94.6 |
| | $\hat\theta_{\mathrm{ZTD}}$ | −0.023 | 0.157 | 0.515 (0.4) | 89.4 | 0.014 | 0.120 | 0.411 (0.3) | 91.4 |

BIAS, empirical bias; ESE, empirical standard error of the estimator; EAL, empirical average length; ECL, empirical coverage level.

†The Monte Carlo standard error in estimating the average length.

For the second set of simulations, we repeat the above numerical study with

$$m_0(Z) = \sum_{j=1}^{20}\left\{(Z_{[j]}^2 - 1) + \frac{j}{10}Z_{[j]}\right\}$$

and

$$m_1(Z) = \sum_{j=1}^{20}\left\{-(Z_{[j]}^2 - 1) + \frac{j}{10}Z_{[j]}\right\} + 2.$$

We augment the simple estimator by $Z = (Z_{[1]},\ldots,Z_{[40]}, Z_{[1]}^2,\ldots,Z_{[40]}^2)'$. The corresponding results are reported in Figure 2(a–f) and Table 1. The results are similar to those from the first set of simulations except that, for the continuous outcome, the empirical bias of $\hat\theta_{\mathrm{ZTD}}$ is not trivial relative to the corresponding standard error. On the other hand, the estimate $\hat\theta_{\mathrm{opt}}$ is almost unbiased for all cases, as ensured by the cross validation procedure. Note that without knowing the practical meaning of the response, the absolute magnitude of the bias alone is difficult to interpret, and a seemingly substantial bias relative to the standard error may still be irrelevant in practice. However, the presence of such a bias still poses a risk in making statistical inference on the marginal treatment effect. In further simulations (not reported), we have found that the bias cannot be completely eliminated by increasing the sample size or including quadratic transformations in $Z$. Last, we would like to point out that the presence of bias is an uncommon finite sample phenomenon and does not undermine the asymptotic validity of ZTD and similar procedures. For example, under the aforementioned setup, if we reduce the dimension of $Z$ to 10 and increase the sample size to 500, then the bias becomes essentially 0.

Fig. 2. Comparing various variance estimates for $\hat\theta_{\mathrm{cv}}(\lambda)$ at $\{\lambda_1,\ldots,\lambda_{100}\}$: the empirical variance of $\hat\theta_{\mathrm{cv}}(\lambda)$ (black curve); $\hat V_{\mathrm{cv}}(\lambda)$ (dashed curve); $\hat V_{\mathrm{lasso}}(\lambda)$ (grey curve); (a–c) for independent covariates; (d–f) for dependent covariates.

For the third set of simulations, we examine the potential efficiency loss due to not including important nonlinear transformations of baseline covariates in the efficiency augmentation. To this end, we simulate continuous, binary, and survival outcomes as in the previous simulation study with

$$m_0(Z) = \sum_{j=1}^{20}\left\{\frac{Z_{[j]}^2 - 1}{2} + \frac{j}{20}Z_{[j]}\right\}$$

and

$$m_1(Z) = \sum_{j=1}^{20}\left\{-(Z_{[j]}^2 - 1) + \frac{j}{10}Z_{[j]}\right\} + 2.$$

We augment the efficiency of the initial estimator first by $Z_1 = (Z_{[1]},\ldots,Z_{[100]})'$ and second by $Z_2 = (Z_{[1]},\ldots,Z_{[100]}, Z_{[1]}^2,\ldots,Z_{[20]}^2)'$. In Table 2, we present the empirical bias and standard error of $\hat\theta_{\mathrm{opt}}$ based on 5000 replications. As expected, the empirical performance of the estimator augmented by $Z_2$ is superior to that of its counterpart using $Z_1$. The gains in efficiency for binary and survival outcomes are less pronounced than that for the continuous outcome, which is likely due to the fact that the influence function of $\hat\theta$ is neither a linear nor a quadratic function of $Z_{[j]}$, $j = 1,\ldots,100$, in the binary or survival setting.

Table 2. The empirical bias and standard error of $\hat\theta_{\mathrm{opt}}$ augmented by $Z_1$ and $Z_2$

| Response | Augmentation vector | Independent: BIAS | Independent: ESE | Correlated: BIAS | Correlated: ESE |
|---|---|---|---|---|---|
| Continuous | $Z_1$ | −0.024 | 0.770 | −0.085 | 1.831 |
| | $Z_2$ | −0.020 | 0.745 | −0.035 | 1.492 |
| Binary | $Z_1$ | −0.001 | 0.261 | −0.004 | 0.239 |
| | $Z_2$ | 0.001 | 0.258 | −0.002 | 0.226 |
| Survival | $Z_1$ | 0.037 | 0.156 | 0.004 | 0.133 |
| | $Z_2$ | 0.037 | 0.154 | 0.003 | 0.124 |

BIAS, empirical bias; ESE, empirical standard error.

In the fourth set of simulations, we examine the “null model” setting in which none of the covariates are related to the response. To this end, we generate continuous responses $Y$ from the normal distribution $N(0,1)$ for $T = 0$ and $N(1,1)$ for $T = 1$. The covariate $Z$ is from a standard multivariate normal distribution generated independently of $Y$. For each generated data set, we obtain the optimal estimator $\hat\theta_{\mathrm{opt}}$ and its variance estimator as in the previous simulation study. Based on 3000 replications, we estimate the empirical variance of $\hat\theta_{\mathrm{opt}}$ and the average of the variance estimator for each given combination of $n$ and $p$. To examine the effect of “overadjustment”, we let $p = 0, 20, 40, \ldots, 780$, and 800 while fixing the sample size $n$ at 200. In Figure 3, we present the empirical average of $\hat V_{\mathrm{cv}}(\hat\lambda)$ (dashed curve) and the empirical variance of $\hat\theta_{\mathrm{opt}}$ (solid curve). The optimal estimator is the naive estimator $\hat\theta$ without any covariate-based augmentation in this case. The figure demonstrates that the variance of $\hat\theta_{\mathrm{opt}}$ increases very slowly with the dimension $p$ and is still near the optimal level even with 800 noise covariates. The variance estimator slightly underestimates the true variance, and the downward bias increases with the dimension $p$, which could be attributable to the fact that we use $\hat V_{\mathrm{cv}}(\hat\lambda) = \min_\lambda\{\hat V_{\mathrm{cv}}(\lambda)\}$ as the variance estimator without any adjustments. On the other hand, the bias remains rather low (<6% of the empirical variance), so that valid inference on $\theta_0$ can still be made over the entire range of $p$. Figure 3 also presents similar results with noise covariates generated from a dependent multivariate normal distribution as in the previous simulation studies.

Fig. 3. Empirical variance of $\hat\theta_{\mathrm{opt}}$ (wiggly solid curve) and its variance estimator (dashed curve) in the presence of high-dimensional noise covariates. The horizontal solid line represents the optimal variance level.

5. AN EXAMPLE

We illustrate the new proposal with the data from a clinical trial to compare D-penicillamine and placebo for patients with primary biliary cirrhosis (PBC) of the liver (Therneau and Grambsch, 2000). The primary endpoint is the time to death. The trial was conducted between 1974 and 1984. For illustration, we use the difference of two restricted mean survival times up to $t_0 = 3650$ (days) as the primary parameter $\theta_0$ of interest. Moreover, we consider 18 baseline covariates for augmentation: gender, stages (1, 2, 3, and 4), presence of ascites, edema, hepatomegaly or enlarged liver, blood vessel malformations in the skin, log-transformed age, serum albumin, alkaline phosphatase, aspartate aminotransferase, serum bilirubin, serum cholesterol, urine copper, platelet count, standardized blood clotting time, and triglycerides. There are 276 patients with complete covariate information (136 and 140 in the control and D-penicillamine arms, respectively). The data used in our analysis are given in Appendix D.1 of Fleming and Harrington (1991). Figure 4 provides the Kaplan–Meier curves for the two treatment groups. The simple two sample estimate $\hat\theta$ is 115.2 (days) with an estimated standard error of 156.6 (days). The corresponding 95% confidence interval for the difference is (−191.8, 422.1) (days). The optimal estimate $\hat\theta_{\mathrm{opt}}$ augmented additively with the above 18 covariates is 106.3 with an estimated standard error of 121.4. These estimates were obtained via a 23-fold cross validation (note that 276 = 23 × 12) described in Section 2. The corresponding 95% CI is (−131.8, 344.4). To examine the effect of $K$ on the result, we repeated the analysis with 92-fold cross validation ($n = 276 = 92 \times 3$), and the optimal estimator barely changes (108.3 with a 95% CI of (−128.5, 345.1)). In our limited experience, the estimation result is not sensitive to $K \ge \max(20, n^{1/2})$.

Fig. 4. Analysis results for PBC data.
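As a quick arithmetic check, the reported intervals are consistent with Wald-type intervals of the form estimate ± 1.96 × standard error, up to rounding of the published inputs. The Python sketch below recomputes them and the implied variance ratio; the helper `wald_ci` is hypothetical, and the assumption that the intervals are Wald-type is inferred from the reported numbers rather than stated in the text.

```python
from statistics import NormalDist


def wald_ci(estimate, se, level=0.95):
    """Two-sided Wald interval: estimate +/- z * se."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return estimate - z * se, estimate + z * se


simple_ci = wald_ci(115.2, 156.6)   # reported as (-191.8, 422.1)
optimal_ci = wald_ci(106.3, 121.4)  # reported as (-131.8, 344.4)
# Ratio of estimated variances: augmented estimator relative to the simple one.
variance_ratio = (121.4 / 156.6) ** 2
print(simple_ci, optimal_ci, round(variance_ratio, 2))
```

The variance ratio of about 0.60 corresponds to a reduction of roughly 40% in the estimated variance from covariate augmentation in this data set.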

To examine how robust the new proposal is with respect to different augmentations, we consider a case which includes the above 18 covariates together with their quadratic terms and all their two-way interactions. The dimension of $Z$ is 178 for this case. The resulting optimal $\hat\theta_{\mathrm{opt}}$ is 110.1 with an estimated standard error of 122.6. Note that the resulting estimates are remarkably close to those based on the augmented procedure with 18 covariates only.

To examine the advantage of using the cross validation for the standard error estimation, in Figure 4, we plot $\hat V_{\mathrm{cv}}(\lambda)$ and $\hat V_{\mathrm{lasso}}(\lambda)$ over a sequence of 100 $\lambda$ values, which were generated using the same approach as in Section 4. Note that $\hat V_{\mathrm{lasso}}(\lambda)$ is substantially smaller than $\hat V_{\mathrm{cv}}(\lambda)$, especially when $\lambda$ approaches 0, that is, when there is no penalty for the $L_2$ loss function. For $\hat\theta_{\mathrm{opt}}$, $\hat V_{\mathrm{lasso}}$ is about 20% smaller than its cross validated counterpart.

It has been shown via numerical studies that the ZTD performs well via the standard stepwise re-

gression by ignoring the sampling variation of the estimated weights when the dimension of Z is not

large with respect to n. However, it is not clear how the ZTD augmentation performs with a relatively

high-dimensional covariate vector Z. It would be interesting to compare the ZTD and the new proposal

with the PBC data. To this end, we implement ZTD augmentation procedure using (1) baseline covari-

ates (p = 18); (2) baseline covariates and their quadratic transformations as well as all their two-way

Page 13

268L. TIAN AND OTHERS

Table 3. ComparisonsbetweenthenewandZTDestimatewiththedatafromtheMayoClinicPBCclinical

trial (SE: estimated standard error)

p

The new optimal procedureZTD

Estimate

92.0

106.3

110.1

SE

121.5

121.4

122.6

Estimate

96.3

126.4

65.3

SE

119.4

111.7

114.6

5

18

178

BIAS, empirical bias; ESE, empirical standard error.

interactions (p = 178); and (3) only five baseline covariates: edema and log-transformed age, serum

albumin, serum bilirubin, and standardized blood clotting time, which were selected in building a mul-

tivariate Cox regression model to predict the patient’s survival by Therneau and Grambsch (2000). Note

that the ZTD procedure augments the following estimating equations for θ0:

$$\sum_{i=1}^{n} \frac{(1 - T_i)\tilde{\Delta}_i}{\hat{K}_0(\tilde{Y}_i \wedge t_0)}\,[\tilde{Y}_i \wedge t_0 - a_{t_0}] = 0,$$

$$\sum_{i=1}^{n} \frac{T_i\tilde{\Delta}_i}{\hat{K}_1(\tilde{Y}_i \wedge t_0)}\,[\tilde{Y}_i \wedge t_0 - a_{t_0} - \theta] = 0,$$

where $a_{t_0}$ is the restricted mean for the comparator and $\theta$ is the treatment difference, $\tilde{\Delta}_i = I(Y_i \wedge t_0 < C_i)$, and $\hat{K}_j(\cdot)$ is the Kaplan–Meier estimate for the survival function of the censoring time $C$ in group $T = j$, $j = 0, 1$. In Table 3, we present the resulting ZTD point estimates and their corresponding standard error estimates for the above three cases. Here, we used the standard forward stepwise regression procedure to select the augmentation covariates with the entry Type I error rate of 0.10 (Zhang and others, 2008; Zhang and Gilbert, 2010). It appears that using the entire data set both for selecting covariates and for making inferences about $\theta_0$ may introduce nontrivial bias and an overly optimistic standard error estimate when p is large. On the other hand, the new procedure does not lose efficiency and yields results similar to those of the ZTD procedure when p is small.
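The unaugmented estimating equations above can be solved in closed form, which the following minimal sketch illustrates. It is not the authors' implementation: the function names, the input layout (observed time, event indicator, treatment indicator) and the rule used to decide when $Y_i \wedge t_0$ is observed are assumptions made for illustration, and ties between event and censoring times are assumed absent.

```python
import numpy as np

def censoring_km(time, event):
    """Return u -> K_hat(u), a Kaplan-Meier estimate of P(C > u).

    Censoring plays the role of the "event", so jumps occur where
    event == 0. Assumes no ties between event and censoring times.
    """
    order = np.argsort(time)
    t = time[order]
    censored = 1.0 - event[order]
    at_risk = len(t) - np.arange(len(t))            # n, n-1, ..., 1
    surv = np.cumprod(1.0 - censored / at_risk)

    def K(u):
        u = np.asarray(u, dtype=float)
        idx = np.searchsorted(t, u, side="right") - 1   # last t_j <= u
        out = np.ones_like(u)
        out[idx >= 0] = surv[idx[idx >= 0]]
        return out

    return K

def ipcw_restricted_mean_difference(time, event, treat, t0):
    """Solve the two weighted estimating equations for (theta, a_t0)."""
    arm_mean = {}
    for arm in (0, 1):
        in_arm = treat == arm
        y, d = time[in_arm], event[in_arm]
        K = censoring_km(y, d)
        y_t0 = np.minimum(y, t0)
        # Assumption: Y ^ t0 is observed if the event was seen or follow-up reached t0.
        known = (d == 1) | (y >= t0)
        w = np.zeros(len(y))
        w[known] = 1.0 / K(y_t0[known])             # inverse censoring weights
        arm_mean[arm] = np.sum(w * y_t0) / np.sum(w)
    a_t0 = arm_mean[0]                              # restricted mean, comparator
    theta = arm_mean[1] - arm_mean[0]               # treatment difference
    return theta, a_t0
```

Each arm's root is a weighted mean of the truncated times, so no iterative solver is needed. With no censoring, all weights equal one and the estimate reduces to the difference in sample means of $\min(Y_i, t_0)$.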

6. REMARKS

The new proposal performs well even when the dimension of the covariates involved for augmentation is not large. The new estimation procedure may be implemented to improve estimation precision regardless of whether the marginal distributions of the covariate vectors are balanced between the two treatment groups. On the other hand, to avoid post hoc analyses, we strongly recommend that the investigators prespecify the set of all potential covariates for adjustment in the protocol or the statistical analysis plan before the data from the clinical study are unblinded.

The stratified estimation procedure for the treatment difference is also commonly used for improving the estimation precision using baseline covariate information. Specifically, if we divide the population into K strata based on baseline variables, denoted by $\{Z \in B_1\}, \ldots, \{Z \in B_K\}$, the stratified estimator is

$$\hat{\theta}_{str} = \frac{\sum_{k=1}^{K} \hat{\theta}_k w_k}{\sum_{k=1}^{K} w_k},$$

where $\hat{\theta}_k$ and $w_k$ are the corresponding simple two sample estimator for the treatment difference and the weight for the kth stratum, $k = 1, \ldots, K$. In general, the underlying treatment effect may vary across strata and consequently the stratified estimator may not converge to $\theta_0$. If $\theta_0$ is the mean difference between two groups and $w_k$ is the size of the kth stratum, $\hat{\theta}_{str}$ is a consistent estimator for $\theta_0$. Like the ANCOVA, the stratified estimation procedure may be problematic. On the other hand, one may use the indicators $\{I(Z \in B_1), \ldots, I(Z \in B_K)\}'$ to augment $\hat{\theta}$ to increase the precision for estimating the treatment difference $\theta_0$.
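As a small illustration of the stratified estimator, the sketch below computes the stratum-specific two sample mean differences and their weighted average. The function name and argument layout are hypothetical; stratum sizes are used as the default weights, which is the case the text identifies as consistent for the mean difference.

```python
import numpy as np

def stratified_estimate(y, treat, stratum, weights=None):
    """Weighted average of per-stratum two sample mean differences."""
    strata = np.unique(stratum)
    theta_k = np.array([
        y[(stratum == s) & (treat == 1)].mean()
        - y[(stratum == s) & (treat == 0)].mean()
        for s in strata
    ])
    if weights is None:
        # Default: w_k is the size of the kth stratum.
        w = np.array([(stratum == s).sum() for s in strata], dtype=float)
    else:
        w = np.asarray(weights, dtype=float)
    return float(np.sum(w * theta_k) / np.sum(w)), theta_k, w
```

Passing other weights (for example, equal weights) generally targets a different quantity when the treatment effect varies across strata, which is the concern raised above.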

In this paper, we follow the novel approach taken, for example, by Zhang and others (2008) for augmenting the simple two sample estimator but present a systematic practical procedure for choosing covariates for making valid inferences about the overall treatment difference. When p is large, the proposal has several advantages over other approaches for augmenting $\hat{\theta}$ with covariates. First, it avoids the complex variable selection step in two arms separately as proposed in Zhang and others (2008). Second, compared with other variable selection methods such as the stepwise regression, the lasso method directly controls the variability of $\hat{\gamma}$, which improves the empirical performance of the augmented estimator. Third, the cross validation step enables more accurate estimation of the variance of the augmented estimator. When $\lambda$ increases from 0 to $+\infty$, the resulting estimator varies from the fully augmented estimator using all the components of $Z_i$ to $\hat{\theta}$. With high-dimensional covariates, the lasso procedure is also computationally more efficient than the alternatives. Last, since $\hat{\theta}_{ZTD}$ can also be viewed as a generalized method of moments estimator with

$$\begin{pmatrix} \hat{\theta} - \theta_0 \\ n^{-1}\sum_{i=1}^{n} \xi_i \end{pmatrix} \approx 0$$

as moment conditions (Hall, 2005), the cross validation method introduced here may be extended to a

much broader context than the current setting.

It is important to note that if a permuted block treatment allocation rule is used for assigning patients to the two treatment groups, the augmentation method proposed in the paper can be easily modified. For instance, for the K-fold cross validation process, one may choose the sets $\{D_k, k = 1, \ldots, K\}$ so that no permuted block is split across different sets.
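One simple way to build such cross validation sets is to assign whole blocks, rather than individual patients, to folds. The sketch below is an illustration under the assumption that each patient carries a block identifier; the function name, the `block_id` array and the round-robin assignment are hypothetical choices, not part of the paper.

```python
import numpy as np

def block_folds(block_id, n_folds, seed=0):
    """Assign a fold label to every patient so that all patients in the
    same permuted block share one fold."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(np.unique(block_id))
    # Deal the shuffled blocks to folds in round-robin order.
    fold_of_block = {b: i % n_folds for i, b in enumerate(shuffled)}
    return np.array([fold_of_block[b] for b in block_id])
```

For example, with six blocks of four patients and `n_folds=3`, each fold receives two complete blocks. When blocks have unequal sizes the folds will be of unequal size as well.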

For assigning patients to the treatment groups, a stratified random treatment allocation rule is also often utilized to ensure a certain level of balance between the two groups in each stratum. For this case, a weighted average $\theta_0$ of the treatment differences $\theta_{k0}$ with weight $w_k$, $k = 1, \ldots, K$, across K strata may be the parameter of interest for quantifying an overall treatment contrast. Let $\hat{\theta}_k$ be the simple two sample estimator for $\theta_{k0}$ and $\hat{w}_k$ be the corresponding empirical weight for $w_k$. Then the weighted average $\hat{\theta} = \sum_k \hat{w}_k \hat{\theta}_k / \sum_k \hat{w}_k$ is the simple estimator for $\theta_0$. For the kth stratum, one may use the same approach as discussed in this paper to augment $\hat{\theta}_k$; let the resulting optimal estimator be denoted by $\hat{\theta}_{opt,k}$. Then we can use the weighted average $\sum_k \hat{w}_k \hat{\theta}_{opt,k} / \sum_k \hat{w}_k$ to estimate $\theta_0$. On the other hand, for the case with the dynamic treatment allocation rules (see, e.g., Pocock and Simon, 1975), it is not clear how to obtain a valid variance estimate even for the simple two sample estimator $\hat{\theta}$ (Shao and others, 2010). How to extend the augmentation procedure to cases with more complicated treatment allocation rules warrants further research.

APPENDIX A

Asymptotical equivalence between ZTD and ANCOVA

When the group mean is the parameter of interest, the naive estimator for $\theta_0$ can be viewed as the root of the estimating equation

$$\sum_{i=1}^{n} \begin{pmatrix} T_i \\ 1 - T_i \end{pmatrix} S_0(\theta, a, Y_i, T_i) = \sum_{i=1}^{n} \begin{pmatrix} T_i \\ 1 - T_i \end{pmatrix} (Y_i - a - T_i\theta) = 0,$$

where $a = E(Y \mid T = 0)$ is a nuisance parameter. In the ZTD augmentation procedure, one may augment this simple estimating equation via the following steps:

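• Obtain the initial estimator

$$\begin{pmatrix} \hat{\theta} \\ \hat{a} \end{pmatrix} = \frac{1}{n} \sum_{i=1}^{n} \begin{pmatrix} \dfrac{(T_i - \pi)Y_i}{\pi(1 - \pi)} \\[1ex] \dfrac{(1 - T_i)Y_i}{1 - \pi} \end{pmatrix}$$

from the original estimating equation.

• Obtain $\hat{\beta}_1$ and $\hat{\beta}_0$ by minimizing the objective functions

$$\sum_{i=1}^{n} T_i \{S_0(\hat{\theta}, \hat{a}, Y_i, T_i) - \beta_1' Z_i\}^2 \quad \text{and} \quad \sum_{i=1}^{n} (1 - T_i) \{S_0(\hat{\theta}, \hat{a}, Y_i, T_i) - \beta_0' Z_i\}^2,$$

respectively. In other words, use $\hat{\beta}_j' Z$ to approximate $E\{S_0(\theta_0, a_0; Y, T) \mid Z, T = j\}$.

• Solve the augmented estimating equations

$$\sum_{i=1}^{n} \begin{pmatrix} T_i \\ 1 - T_i \end{pmatrix} S_0(\theta, a, Y_i, T_i) - \sum_{i=1}^{n} (T_i - \pi) \begin{pmatrix} \hat{\beta}_1' Z_i \\ -\hat{\beta}_0' Z_i \end{pmatrix} = 0$$

to obtain $\hat{\theta}_{ZTD}$.

The resulting $\hat{\theta}_{ZTD}$ is always asymptotically more efficient than the naive counterpart, and a simple sandwich variance estimator can be used to consistently estimate the variance of the new estimator. It has been shown that $\hat{\theta}_{ZTD}$ is asymptotically the most efficient one from the class of the estimators

$$\mathcal{A} = \left\{ \hat{\theta}_\gamma = \hat{\theta} - \gamma' \left( n^{-1} \sum_{i=1}^{n} \frac{(T_i - \pi) Z_i}{\pi(1 - \pi)} \right) \,\middle|\, \gamma \in R^p \right\},$$

whose members are all consistent for $\theta_0$ and asymptotically normal. When $\pi = 0.5$,

$$\hat{\theta} - \theta_0 = \frac{1}{n} \sum_{i=1}^{n} \{2(2T_i - 1)Y_i - \theta_0\},$$

and the optimal weight minimizing the variance of

$$\hat{\theta} - \gamma' \frac{1}{n} \sum_{i=1}^{n} 2(2T_i - 1) Z_i$$

is simply

$$[E\{2(2T_i - 1)Z_i\}^{\otimes 2}]^{-1} E[2(2T_i - 1)Z_i\{2(2T_i - 1)Y_i - \theta_0\}] = [E(Z_i^{\otimes 2})]^{-1} E(Z_i Y_i) = \gamma_0.$$

Therefore, $\hat{\theta}_{ZTD}$ is asymptotically equivalent to the commonly used ANCOVA estimator. This equivalence is noted in Tsiatis and others (2008).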
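The equivalence can be checked numerically on simulated data. The following sketch is only an illustration under stated assumptions: the data-generating model, sample size and seed are invented for the example, $\pi = 0.5$ is treated as known, the covariate vector $Z_i$ is assumed to contain an intercept, and $\gamma_0$ is replaced by its sample analogue with $\hat{\theta}$ plugged in for $\theta_0$.

```python
import numpy as np

rng = np.random.default_rng(2012)
n, theta_true = 20000, 0.5

# Simulated 1:1 randomized trial with a linear outcome model (assumed).
treat = rng.integers(0, 2, size=n)
covs = rng.normal(size=(n, 2))
y = 1.0 + theta_true * treat + covs @ np.array([1.0, -0.5]) + rng.normal(size=n)

Z = np.column_stack([np.ones(n), covs])      # intercept included (assumption)
sign = 2.0 * (2 * treat - 1)                 # (T - pi) / {pi (1 - pi)} at pi = 0.5

# Naive estimator and augmentation term xi_i = 2 (2 T_i - 1) Z_i.
theta_naive = np.mean(sign * y)
xi = sign[:, None] * Z

# Sample analogue of the optimal weight gamma_0.
gamma_hat = np.linalg.solve(xi.T @ xi, xi.T @ (sign * y - theta_naive))
theta_aug = theta_naive - gamma_hat @ xi.mean(axis=0)

# ANCOVA: coefficient of T in the least squares fit of Y on (1, T, covariates).
X = np.column_stack([np.ones(n), treat, covs])
theta_ancova = np.linalg.lstsq(X, y, rcond=None)[0][1]

print(theta_naive, theta_aug, theta_ancova)
```

In a run of this kind, the augmented and ANCOVA estimates should differ by far less than their standard errors, while the naive estimate is noticeably more variable.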


APPENDIX B

Justification of the cross validation based variance estimator for $\hat{\theta}_{cv}(\lambda)$

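To justify the cross validation based variance estimator, first consider the expansion

$$\hat{\theta}_{cv}(\lambda) = \left\{ \hat{\theta} - \gamma_0' \left( n^{-1} \sum_{i=1}^{n} \xi_i \right) \right\} - n^{-1} \sum_{i=1}^{n} \{\hat{\gamma}_{(-i)}(\lambda) - \gamma_0\}' \xi_i.$$

The variance of $\hat{\theta}_{cv}(\lambda)$ can be expressed as $V_{11} + V_{22} - 2V_{12}$, where

$$V_{11} = E\left[ \left\{ \hat{\theta} - \gamma_0' \left( n^{-1} \sum_{i=1}^{n} \xi_i \right) \right\}^2 \right],$$

$$V_{22} = \frac{1}{n^2} E\left[ \left( \sum_{i=1}^{n} \{\hat{\gamma}_{(-i)}(\lambda) - \gamma_0\}' \xi_i \right)^2 \right],$$

and

$$V_{12} = \frac{1}{n} E\left[ \left\{ \hat{\theta} - \gamma_0' \left( n^{-1} \sum_{i=1}^{n} \xi_i \right) \right\} \sum_{i=1}^{n} \{\hat{\gamma}_{(-i)}(\lambda) - \gamma_0\}' \xi_i \right].$$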

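First,

$$V_{12} = \frac{1}{n^2} E\left[ \sum_{i=1}^{n} (\tau_i(\hat{\eta}) - \gamma_0' \xi_i) \sum_{j=1}^{n} \{\hat{\gamma}_{(-j)}(\lambda) - \gamma_0\}' \xi_j \right]$$

$$\approx \frac{1}{n^2} \sum_{i \neq j} E[(\tau_i(\hat{\eta}) - \gamma_0' \xi_i)\{\hat{\gamma}_{(-j)}(\lambda) - \gamma_0\}'] E\xi_j + \frac{1}{n^2} \sum_{i=1}^{n} E[(\tau_i(\hat{\eta}) - \gamma_0' \xi_i)\{\hat{\gamma}_{(-i)}(\lambda) - \gamma_0\}' \xi_i]$$

$$\approx \frac{1}{n^2} \sum_{i=1}^{n} E\{\hat{\gamma}_{(-i)}(\lambda) - \gamma_0\}' E[(\tau_i(\hat{\eta}) - \gamma_0' \xi_i)\xi_i] \approx 0.$$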

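Therefore, the variance of the augmented estimator $\hat{\theta}_{cv}(\lambda)$ is approximately

$$V_{11} + V_{22} = \frac{1}{n}\left[ E\{(\tau_i(\hat{\eta}) - \gamma_0' \xi_i)^2\} + E\{(\hat{\gamma}_{(-i)}(\lambda) - \gamma_0)' \xi_i\}^2 \right] + \frac{n - 1}{n} E[\xi_1'\{\hat{\gamma}_{(-1)}(\lambda) - \gamma_0\}\, \xi_2'\{\hat{\gamma}_{(-2)}(\lambda) - \gamma_0\}]$$

$$\approx \hat{V}_{cv}(\lambda) + \frac{n - 1}{n} E[\xi_1' \hat{\gamma}_{(-1)}(\lambda)\, \xi_2' \hat{\gamma}_{(-2)}(\lambda)].$$

In our experience, $d(\lambda) = E[\xi_1' \hat{\gamma}_{(-1)}(\lambda)\, \xi_2' \hat{\gamma}_{(-2)}(\lambda)] = O(n^{-2})$ is very small compared with $\hat{V}_{cv}(\lambda) = O(n^{-1})$ and is negligible when $\lambda$ is not close to 0. Therefore, in general, $\hat{V}_{cv}(\lambda)$ serves as a satisfactory estimator for the variance of $\hat{\theta}_{cv}(\lambda)$. For small $\lambda$, to explicitly estimate $d(\lambda)$, the covariance between $\xi_1' \hat{\gamma}_{(-1)}(\lambda)$ and $\xi_2' \hat{\gamma}_{(-2)}(\lambda)$, one may use

$$\hat{d}(\lambda) = \frac{2(K^2 - 1)}{n(n - 1)K} \sum_{1 \le i < j \le n} \xi_i' \left\{ \frac{K - 1}{K} \hat{\gamma}_{(-j)}(\lambda) - \hat{\gamma}(\lambda) \right\} \xi_j' \left\{ \frac{K - 1}{K} \hat{\gamma}_{(-i)}(\lambda) - \hat{\gamma}(\lambda) \right\} \tag{6.1}$$

as an ad hoc jackknife-type estimator, where $\hat{\gamma}(\lambda)$ is the lasso solution based on the entire data set. To justify the approximation, first note that when $\lambda$ is close to 0,

$$\hat{\gamma}(\lambda) - \gamma_0 \approx \sum_{i=1}^{n} \Upsilon_i \quad \text{and} \quad \hat{\gamma}_{(-i)}(\lambda) - \gamma_0 \approx \frac{K}{K - 1} \sum_{j \notin D_{k_i}} \Upsilon_j,$$

where $\Upsilon_i$ is the mean zero influence function from the ith observation for $\hat{\gamma}(\lambda)$. Therefore,

$$d(\lambda) = E[\xi_1' \hat{\gamma}_{(-1)}(\lambda)\, \xi_2' \hat{\gamma}_{(-2)}(\lambda)] \approx \left( 1 - \frac{1}{K^2} \right) E[\xi_1' \Upsilon_2\, \xi_2' \Upsilon_1],$$

which can be approximated by $\hat{d}(\lambda)$, and one may use $\hat{V}_{cv}(\lambda) + (n - 1)\hat{d}(\lambda)/n$ as the variance estimator for the augmented estimator. Note that the difference between $\hat{V}_{cv}$ and its modified version appears to be negligible in all the numerical studies presented in the paper.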
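The expansion at the start of this appendix implies that the cross validated estimator has the form $\hat{\theta} - n^{-1}\sum_i \hat{\gamma}_{(-i)}(\lambda)'\xi_i$, where $\hat{\gamma}_{(-i)}(\lambda)$ is fitted without the fold containing observation i. The sketch below illustrates only this cross-fitting structure. It makes several assumptions that are not from the paper: the function and argument names are hypothetical, $\xi_i = (T_i - \pi)Z_i/\{\pi(1 - \pi)\}$ is taken from the class of estimators in Appendix A, and an unpenalized least squares fit stands in for the lasso fit (roughly the $\lambda \to 0$ case). The variance estimators $\hat{V}_{cv}(\lambda)$ and $\hat{d}(\lambda)$ are not computed.

```python
import numpy as np

def cross_fitted_estimate(y, treat, Z, fold, pi=0.5, fit_gamma=None):
    """Return (theta_cv, theta_hat) for a given fold assignment."""
    scale = (treat - pi) / (pi * (1.0 - pi))
    tau = scale * y                      # theta_hat is the mean of tau
    xi = scale[:, None] * Z              # augmentation terms xi_i
    theta_hat = tau.mean()

    if fit_gamma is None:
        # Assumption: plain least squares stands in for the lasso fit.
        def fit_gamma(xi_train, tau_train):
            target = tau_train - tau_train.mean()
            return np.linalg.lstsq(xi_train, target, rcond=None)[0]

    fitted = np.empty(len(y))
    for k in np.unique(fold):
        held_out = fold == k
        # gamma is estimated without the held-out fold ...
        gamma_minus_k = fit_gamma(xi[~held_out], tau[~held_out])
        # ... and applied only to the held-out observations.
        fitted[held_out] = xi[held_out] @ gamma_minus_k
    return theta_hat - fitted.mean(), theta_hat
```

A penalized fit can be supplied through the `fit_gamma` argument; a fit that always returns the zero vector reproduces $\hat{\theta}$, which corresponds to the $\lambda \to +\infty$ end of the path.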

ACKNOWLEDGEMENTS

The authors are grateful to the editor and reviewers for their insightful comments. Conflict of Interest:

None declared.

FUNDING

National Institutes of Health (R01 AI052817, RC4 CA155940, U01 AI068616, UM1 AI068634, R01

AI024643, U54 LM008748, R01 HL089778).

REFERENCES

FLEMING, T. AND HARRINGTON, D. (1991). Counting Processes and Survival Analysis. New York: Wiley.

GILBERT, P. B., SATO, M., SUN, X. AND MEHROTRA, D. V. (2009). Efficient and robust method for comparing

the immunogenicity of candidate vaccines in randomized clinical trials. Vaccine 27, 396–401.

HALL, A. (2005). Generalized Method of Moments (Advanced Texts in Econometrics). London: Oxford University

Press.

KOCH, G., TANGEN, C., JUNG, J. AND AMARA, I. (1998). Issues for covariance analysis of dichotomous and ordered categorical data from randomized clinical trials and non-parametric strategies for addressing them. Statistics in Medicine 17, 1863–1892.

LEON, S., TSIATIS, A. AND DAVIDIAN, M. (2003). Semiparametric efficiency estimation of treatment effect in a

pretest-posttest study. Biometrics 59, 1046–1055.

LU, X. AND TSIATIS, A. (2008). Improving efficiency of the log-rank test using auxiliary covariates. Biometrika 95,

676–694.

POCOCK, S. AND SIMON, R. (1975). Sequential treatment assignment with balancing for prognostic factors in the

controlled clinical trial. Biometrics 31, 102–115.

SHAO, J., YU, X. AND ZHONG, B. (2010). A theory for testing hypotheses under covariate-adaptive randomization.

Biometrika 97, 347–360.

THERNEAU, T. AND GRAMBSCH, P. (2000). Modeling Survival Data: Extending the Cox Model. New York: Springer.

TIBSHIRANI, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society,

Series B 58, 267–288.

TSIATIS, A. (2006). Semiparametric Theory and Missing Data. New York: Springer.

TSIATIS, A., DAVIDIAN, M., ZHANG, M. AND LU, X. (2008). Covariate adjustment for two-sample treatment comparisons in randomized clinical trials: a principled yet flexible approach. Statistics in Medicine 27, 4658–4677.


ZHANG, M. AND GILBERT, P. B. (2010). Increasing the efficiency of prevention trials by incorporating baseline covariates. Statistical Communications in Infectious Diseases 2. http://www.bepress.com/scid/vol2/iss1/art1. doi:10.2202/1948-4690.1002.

ZHANG, M., TSIATIS, A. AND DAVIDIAN, M. (2008). Improving efficiency of inferences in randomized clinical

trials using auxiliary covariates. Biometrics 64, 707–715.

[Received July 17, 2011; revised November 20, 2011; accepted for publication December 2, 2011]