Evaluating the diagnostic powers of variables and their linear combinations when the gold standard is continuous

Source: arXiv


The receiver operating characteristic (ROC) curve is a very useful tool for analyzing the diagnostic/classification power of instruments/classification schemes as long as a binary-scale gold standard is available. When the gold standard is continuous and there is no confirmative threshold, the ROC curve becomes less useful. Hence, several extensions have been proposed for evaluating the diagnostic potential of variables of interest. However, because these nonparametric extensions are computationally difficult, they are not easy to use for finding the optimal combination of variables that improves on the individual diagnostic power. Therefore, we propose a new measure, which extends the AUC index, for identifying variables with good potential to be used in a diagnostic scheme. In addition, we propose a threshold gradient descent based algorithm for finding the best linear combination of variables that maximizes this new measure, which is applicable even when the number of variables is huge. The estimate of the proposed index and its asymptotic property are studied. The performance of the proposed method is illustrated using both synthesized and real data sets.
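The abstract does not define the proposed measure, so the following is only a minimal sketch of the ingredients it names, under stated assumptions. `empirical_auc` is the standard Mann-Whitney estimate of the AUC for a binary gold standard. `pairwise_concordance` is a generic rank-concordance index for a continuous gold standard, used here as a stand-in and not as the paper's index. `threshold_gradient_step` illustrates the usual threshold gradient rule (update only the coefficients whose gradient magnitude is within a fraction `tau` of the largest one). All function and parameter names are hypothetical.

```python
from itertools import combinations


def empirical_auc(scores, labels):
    """Mann-Whitney estimate of the AUC for a binary gold standard (labels 0/1)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    total = 0.0
    for p in pos:
        for n in neg:
            # A correctly ordered pair counts 1, a tied pair counts 0.5.
            total += 1.0 if p > n else 0.5 if p == n else 0.0
    return total / (len(pos) * len(neg))


def pairwise_concordance(scores, gold):
    """Share of subject pairs ordered the same way by score and by a
    continuous gold standard (assumed stand-in, not the paper's measure).
    Pairs tied on the gold standard are skipped; score ties count 0.5."""
    concordant, usable = 0.0, 0
    for (s_i, g_i), (s_j, g_j) in combinations(zip(scores, gold), 2):
        if g_i == g_j:
            continue
        usable += 1
        if s_i == s_j:
            concordant += 0.5
        elif (s_i - s_j) * (g_i - g_j) > 0:
            concordant += 1.0
    if usable == 0:
        raise ValueError("gold standard has no untied pairs")
    return concordant / usable


def threshold_gradient_step(beta, grad, tau, step):
    """One thresholded ascent step: only coordinates with
    |grad_j| >= tau * max|grad| are updated (0 <= tau <= 1)."""
    largest = max(abs(g) for g in grad)
    return [
        b + step * g if abs(g) >= tau * largest else b
        for b, g in zip(beta, grad)
    ]
```

With a 0/1 gold standard, `pairwise_concordance` gives the same value as `empirical_auc`, which is the sense in which a concordance-type index extends the AUC. Setting `tau` close to 1 updates very few coefficients per step, which is what makes this kind of rule usable when the number of variables is large.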



Available from: Yuan-chin Ivan Chang, Aug 05, 2015
  • The Annals of Statistics 01/2004; 32(2). · 2.18 Impact Factor
  • ABSTRACT: Coronary heart disease (CHD) remains the most common cause of death among blacks, and the difference in CHD mortality between blacks and whites is growing. This trend may be due in part to higher rates of CHD risk factors among blacks. This study was done to determine the prevalence of CHD risk factors among a population-based sample of 403 rural blacks in Virginia. Community-based screening evaluations included the determination of exercise and smoking habits, blood pressure, height, weight, total and high-density lipoprotein (HDL) cholesterol, and glycosylated hemoglobin. The prevalences of smoking (32.5% of men, 20.0% of women), high cholesterol (16.6% of men, 18.9% of women) and sedentary lifestyle (37.5% of men, 66.7% of women) were similar to prevalences reported for other black populations. However, the prevalences of diabetes (13.6% of men, 15.6% of women), hypertension (30.9% of men, 43.1% of women), and obesity (38.7% of men, 64.7% of women) were higher than those reported elsewhere. Increased body mass index was significantly associated with higher prevalences of hypertension, diabetes, and low HDL cholesterol. Innovative methods are needed to decrease the high risk factor prevalences among this population.
    Southern Medical Journal 09/1997; 90(8):814-20. DOI:10.1097/00007611-199708000-00008 · 0.93 Impact Factor
  • ABSTRACT: Protein expression profiling for differences indicative of early cancer holds promise for improving diagnostics. Due to their high dimensionality, statistical analysis of proteomic data from mass spectrometers is challenging in many aspects such as dimension reduction, feature subset selection as well as construction of classification rules. Search of an optimal feature subset, commonly known as the feature subset selection (FSS) problem, is an important step towards disease classification/diagnostics with biomarkers. We develop a parsimonious threshold-independent feature selection (PTIFS) method based on the concept of area under the curve (AUC) of the receiver operating characteristic (ROC). To reduce computational complexity to a manageable level, we use a sigmoid approximation to the empirical AUC as the criterion function. Starting from an anchor feature, the PTIFS method selects a feature subset through an iterative updating algorithm. Highly correlated features that have similar discriminating power are precluded from being selected simultaneously. The classification rule is then determined from the resulting feature subset. The performance of the proposed approach is investigated by extensive simulation studies, and by applying the method to two mass spectrometry data sets of prostate cancer and of liver cancer. We compare the new approach with the threshold gradient descent regularization (TGDR) method. The results show that our method can achieve comparable performance to that of the TGDR method in terms of disease classification, but with fewer features selected. Supplementary Material and the PTIFS implementations are available at Supplementary data are available at Bioinformatics online.
    Bioinformatics 11/2007; 23(20):2788-94. DOI:10.1093/bioinformatics/btm442 · 4.98 Impact Factor
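
The sigmoid approximation to the empirical AUC mentioned in the last abstract can be sketched as follows. This is an illustrative version under assumptions, not the PTIFS implementation: the function names and the bandwidth parameter `h` are hypothetical, and the abstract does not say how the smoothing scale is chosen.

```python
import math


def stable_sigmoid(x):
    """Logistic function written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def smoothed_auc(scores, labels, h=0.1):
    """Sigmoid-smoothed AUC: the indicator 1[pos > neg] in the empirical
    AUC is replaced by sigmoid((pos - neg) / h), which is differentiable.
    `h` is an assumed bandwidth; smaller h is closer to the empirical AUC."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    total = sum(stable_sigmoid((p - n) / h) for p in pos for n in neg)
    return total / (len(pos) * len(neg))
```

Because the smoothed criterion is differentiable in the scores, it can be maximized with gradient-based updates, which the step-function form of the empirical AUC does not allow.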