arXiv:1105.1575v1 [stat.ME] 9 May 2011

Evaluating the diagnostic powers of variables and their

linear combinations when the gold standard is

continuous

Zhanfeng Wanga,b, Yuan-chin Ivan Changb,1

aDepartment of Statistics and Finance, University of Science and Technology of China,

Hefei, 230026, China

bAcademia Sinica, Taipei, Taiwan, 11529

Abstract

The receiver operating characteristic (ROC) curve is a very useful tool for

analyzing the diagnostic/classification power of instruments/classification

schemes as long as a binary-scale gold standard is available. When the gold

standard is continuous and there is no confirmative threshold, the ROC curve becomes less useful. Hence, several extensions have been proposed for evaluating the diagnostic potential of variables of interest. However, because these nonparametric extensions are computationally demanding, they are not easy to use for finding the optimal combination of variables that improves the individual diagnostic power. Therefore, we propose a new measure,

which extends the AUC index for identifying variables with good potential to

be used in a diagnostic scheme. In addition, we propose a threshold gradient

descent based algorithm for finding the best linear combination of variables

that maximizes this new measure, which is applicable even when the number

of variables is huge. The estimate of the proposed index and its asymptotic

property are studied. The performance of the proposed method is illustrated

using both synthesized and real data sets.

Keywords: ROC curve, Area under curve, Gold standard, Classification

1Corresponding author: ycchang@stat.sinica.edu.tw

Preprint submitted to Computational Statistics and Data Analysis, May 10, 2011


1. Introduction

The ROC curve, founded on a binary gold standard, is one of the most

important tools to measure the diagnostic power of a variable or classifier,

and it has been studied intensively by many authors; see, for example, the textbooks by Pepe (2003) and Krzanowski and Hand (2009). Moreover, when the number of variables is

huge, many algorithms have been proposed for finding the best combination

of variables to increase the individual classification accuracy (Su and Liu

(1993), Pepe (2003), Ma and Huang (2005), and Wang et al. (2007a)). How-

ever, in many classification or diagnostic problems, the professed binary gold

standard is essentially derived from a continuous-valued variable. If there is

no such confirmative threshold for the continuous gold standard, then the

evaluation of variables/classifiers by ROC-curve-based analysis may vary with the choice of threshold and therefore becomes less informative. For example, glycosylated hemoglobin is usually used as a

primary diabetic control index, and is originally measured as a continuous-

valued variable. Health institutes, such as the World Health Organization

and National Institutes of Health (NIH), suggest a cutting point for it based

on current findings for diabetic diagnosis and control. Once its cutting point

is fixed, then the association between the variables of interest, such as new

drugs, and this binary-scale standard can be evaluated using some ROC

curve related analysis methods. However, as advances are made in science

and medicine about this disease, this criterion will be re-evaluated and re-

vised as necessary. Then, the performance evaluation of variables/classifiers

may vary as the binary-recoding scheme is changed. It is clear that an unwarranted performance measure may lead to misleading conclusions and may require re-evaluating all the available diagnostic methods every time a new standard is proposed. Hence, a measure that connects directly to the continuous gold standard is always preferred, and this motivates our study. Our goal in this paper is

to find a robust measure, which is not affected by the choice of cutting point

of a gold standard or how the binary outcome is derived from a continuous

gold standard.

Although there is a large literature on the ROC curve, the case where the gold standard is not binary remains understudied (Krzanowski and Hand, 2009). Henkelman et al. (1990) proposed a maximum likelihood method for an ordinal-scale gold standard. Recently, Zhou et al. (2005),


Choi et al. (2006), and Wang et al. (2007b) considered the ROC curve es-

timation problems based on some nonparametric and Bayesian approaches,

when there is no gold standard. In addition, some ROC-type analysis with-

out a binary gold standard has been considered in Obuchowski (2005) and

Obuchowski (2006), where a nonparametric method is used to construct a

new measure, and many other applications with a continuous gold standard are discussed. However, owing to computational issues, these approaches are not easy to apply when the optimal combination of variables is of interest, especially when the number of variables is large, as in modern

biological/genetic related studies (Waikar et al. (2009)).

In this paper, an extension of the AUC-type measure is proposed, which

is independent of the choice of threshold of the continuous gold standard, and

algorithms for finding the best linear combination of variables that maximizes

the proposed measure are studied. Under the joint multivariate normality

assumption, the best linear combination can be found using the LARS method. When this joint normality assumption is violated, we

propose a threshold gradient descent based method (TGDM) to find the opti-

mal linear combination. Thus, our algorithms also inherit the nice properties

of LARS and TGDM when dealing with the high dimensional and variable

selection problems. Numerical studies are conducted to evaluate the perfor-

mances of the proposed methods with different ranges of cutting points using

both synthesized and real data sets. The estimate of this novel measure and

its asymptotic properties are also presented.
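The threshold gradient descent idea referenced above can be sketched generically: at each iteration, only the coefficients whose gradient magnitude is within a threshold fraction of the largest one are updated, which yields sparse coefficient paths. The sketch below is a minimal illustration on a hypothetical concave quadratic objective, not the paper's AUC-based criterion (developed in Section 3); the names `tgd_maximize`, `tau`, and `step` are assumptions introduced here.

```python
# Generic threshold gradient ascent: update only the coordinates whose
# gradient magnitude is at least tau times the largest magnitude.
def tgd_maximize(grad, beta, step=0.05, tau=0.8, iters=500):
    beta = list(beta)
    for _ in range(iters):
        g = grad(beta)
        gmax = max(abs(gi) for gi in g)
        if gmax == 0:          # gradient vanished: at a stationary point
            break
        for k in range(len(beta)):
            if abs(g[k]) >= tau * gmax:   # thresholding gives sparse updates
                beta[k] += step * g[k]
    return beta

# Hypothetical concave objective f(b) = -sum((b_k - t_k)^2),
# whose gradient is 2 * (t_k - b_k); its maximizer is b = target.
target = [1.0, -2.0, 0.0]
grad = lambda b: [2 * (t - bk) for t, bk in zip(target, b)]
print(tgd_maximize(grad, [0.0, 0.0, 0.0]))
```

With tau = 1 only the steepest coordinate moves at each step, while tau = 0 reduces to ordinary gradient ascent; intermediate values trade off sparsity against fitting, which is what makes the approach usable when the number of variables is large.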

In the next section, we first present a novel measure for evaluating the

diagnostic potential of individual variables and then an estimate of this mea-

sure. The algorithms for finding the best linear combination are discussed

in Section 3. Numerical results based on the synthesized data and some real

examples follow. A summary and conclusions are given in Section 4. The

technical details are presented in the Appendix.

2. An AUC-type Measure with a Continuous Gold Standard

Before introducing a novel AUC-type measure based on a continuous gold

standard, we first fix the notation and briefly review the definition of the ROC

curve and its related measures. Let Z and Y be two continuous real-valued

random variables, where Z denotes the gold standard and Y is a variable

of interest with diagnostic potential to be measured. Then, for example, Z

is a primary index for measuring a disease and Y is some other measure of


subjects that is related to the disease of interest. In some medical diagnostics,

the primary index is difficult to measure, and we are usually looking for

variables that are strongly associated with Z and easy to measure, to be used

as surrogates. That is why we need to evaluate the “level of association” between Y and Z. Likewise, in some bioinformatics studies, in order to develop new treatments, we would like to identify strong associations between genomic-related factors Y and the continuous gold standard Z. Suppose

that there is an unambiguous threshold c of Z that can be used to classify

subjects into two subgroups, and assume further that subjects with Z > c

are classified as diseased, and otherwise as members of the control group.

Then the ROC curve, for such a given c, is defined as ROC(t) ≡ S_D(S_C^{-1}(t)), where S_D(t) = P(Y > t | Z > c) and S_C(t) = P(Y > t | Z ≤ c), and the AUC of variable Y is defined as

AUC(c) = P(Y_c^+ > Y_c^-),    (1)

where the random variables Y_c^+ and Y_c^- respectively denote the Y-values of subjects in the disease and non-disease groups, with density functions f(y | Z > c) and f(y | Z ≤ c). That is, Y_c^+ and Y_c^- are random variables for the sub-populations defined by {Z > c} and {Z ≤ c}, respectively. It is clear that the AUC(c) defined in (1) is a function of c, which will change as the threshold c of Z varies. Hence, when the threshold is dubious, using AUC(c) as a measure may misjudge the diagnostic power of Y or the level of association between Y and Z.

Let f_c(t) be a probability density function defined on the range of possible values of c; then AUC^I is defined as

AUC^I ≡ ∫ AUC(t) f_c(t) dt.    (2)

Hence, by its definition, the proposed AUC^I is independent of the choice of

cutting point for the continuous gold standard, and of any monotonic transformation of Y as well. This threshold-independence property is also

one of the important properties of the ROC curve and AUC when used as

measures of diagnostic performance. Since AUC^I is defined as an integration of AUC(c) over the range of possible cutting points with respect to a

weight function fc(t), the support of fc(t) should be chosen as a subset of

the support of the density of Z. Moreover, we can use fc(t) to put different

weights on all possible cutting points of Z if there is some information about

the possible cutting point. If Z is an ordinal discrete variable, then there are


only countably many cutting points, and fc(t) can be chosen as a probability mass

function of all possible cutting points, and the integration of (2) becomes

AUC^I = Σ_{t_i ∈ C} AUC(t_i) f_c(t_i),    (3)

where C is the set of all possible cutting points. In particular, when Z is binary, we can let f_c(t) be a degenerate probability density, and then AUC^I is the same as the original AUC.

2.1. Estimate of AUC^I

Let random variables (Yi,Zi) denote a pair of measures from subject i,

for i ≥ 1. Suppose that {(yi,zi), i = 1,...,n} are n independent observed

values of random variables (Yi,Zi), i = 1,···,n. For a given cutting point c,

a subject i, i = 1, ..., n, is assigned as a “case” if z_i > c and is otherwise labeled

as a “control”. That is, for a given c, we divide the observed subjects into

two groups; let S1(c) and S0(c) be the case and control groups with sample

sizes n1 and n0, respectively. It is obvious that these assignments depend

on the choice of c. Then for a fixed c, the empirical estimate of AUC(c) is

defined as

Â(c) = (1 / (n_0 n_1)) Σ_{i ∈ S_1(c), j ∈ S_0(c)} ψ(y_i − y_j),    (4)

where ψ(u) = 1 if u > 0, ψ(u) = 0.5 if u = 0, and ψ(u) = 0 if u < 0. (It is easy to see that Â(c) does not exist if either c > max{z_i, i = 1, ..., n} or c < min{z_i, i = 1, ..., n}, since in these two cases we have either n_1 = 0 or n_0 = 0. Therefore, in this paper we set Â(c) = 0.5 when either of these cases occurs.)

If the whole support of Z is considered as the possible range of cutting points, then a natural estimate of AUC^I can be defined as

Â^I = ∫ Â(t) dF̂_c(t),    (5)

where F̂_c(t) is the empirical estimate of the cumulative distribution function

of Z based on {z1,...,zn}. However, in practice, it is rare to choose cutting

points at ranges near the two ends of the distribution of Z. Thus, instead of

the whole range of Z, we might explicitly define a weight function fc(t) on a

particular critical range. Below, we demonstrate three possible choices: (1)
