Evaluating the diagnostic powers of variables and their linear combinations when the gold standard is continuous

Source: arXiv

ABSTRACT The receiver operating characteristic (ROC) curve is a useful tool for analyzing the diagnostic or classification power of instruments and classification schemes, provided that a binary-scale gold standard is available. When the gold standard is continuous and there is no confirmative threshold, the ROC curve becomes less useful. Several extensions have therefore been proposed for evaluating the diagnostic potential of variables of interest. However, because these nonparametric-based extensions are computationally difficult, they are not easy to use for finding the optimal combination of variables that improves on the individual diagnostic power. We therefore propose a new measure, which extends the AUC index, for identifying variables with good potential to be used in a diagnostic scheme. In addition, we propose a threshold gradient descent based algorithm for finding the best linear combination of variables that maximizes this new measure; the algorithm is applicable even when the number of variables is huge. The estimate of the proposed index and its asymptotic property are studied. The performance of the proposed method is illustrated using both synthesized and real data sets.
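
To make the idea of extending the AUC to a continuous gold standard concrete, the following is a minimal sketch and not the index proposed in the paper, whose definition is not given in the abstract. It assumes a simple pairwise concordance measure as a stand-in: among pairs of subjects with different gold-standard values, the fraction that the diagnostic variable orders the same way. The function names `empirical_auc` and `concordance_index` and the toy data are hypothetical. With a binary gold standard, this concordance reduces to the usual Mann-Whitney estimate of the AUC.

```python
from itertools import combinations


def empirical_auc(scores, labels):
    """Mann-Whitney estimate of the AUC for a binary gold standard.

    Estimates P(score of a case > score of a control); ties count 1/2.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    total = 0.0
    for c in cases:
        for d in controls:
            if c > d:
                total += 1.0
            elif c == d:
                total += 0.5
    return total / (len(cases) * len(controls))


def concordance_index(scores, gold):
    """Illustrative stand-in (an assumption, not the paper's measure).

    Among pairs with distinct gold-standard values, returns the fraction
    ordered the same way by the scores; tied scores count 1/2.
    """
    concordant = 0.0
    usable = 0
    for (s_i, g_i), (s_j, g_j) in combinations(zip(scores, gold), 2):
        if g_i == g_j:
            continue  # pair carries no ordering information
        usable += 1
        if s_i == s_j:
            concordant += 0.5
        elif (s_i > s_j) == (g_i > g_j):
            concordant += 1.0
    return concordant / usable


# Hypothetical toy data: one diagnostic variable measured on four subjects.
scores = [0.1, 0.4, 0.35, 0.8]
binary_gold = [0, 0, 1, 1]
continuous_gold = [1.0, 2.0, 2.5, 4.0]

auc = empirical_auc(scores, binary_gold)
c_binary = concordance_index(scores, binary_gold)
c_continuous = concordance_index(scores, continuous_gold)
```

On the binary gold standard the two functions agree, which is the sense in which a pairwise concordance generalizes the AUC. A measure of this kind needs no threshold on the gold standard, but it is not smooth in the coefficients of a linear combination, which is one reason a search over combinations of many variables is computationally demanding.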

  • ABSTRACT: Sensitivity and specificity are used to characterize the accuracy of a diagnostic test. Receiver operating characteristic (ROC) analysis can be used more generally to plot the sensitivity versus (1-specificity) over all possible cutoff points. We develop an ROC analysis that can be applied to diagnostic tests with and without a gold standard. Moreover, the method can be applied to multiple correlated diagnostic tests that are used on the same individual. Simulation studies were performed to assess the discrimination ability of the no-gold-standard method compared with the situation where a gold standard exists. We used the area under the ROC curve (AUC) to quantify the diagnostic accuracy of tests and the difference between AUCs to compare their accuracies. In particular, we can estimate the prevalence of disease/infection under the no-gold-standard method. The method we proposed works well in the absence of a gold standard for correlated test data. Correlation affected the width of posterior probability intervals for these differences. The proposed method was used to analyze ELISA test scores for Johne’s disease in dairy cattle.
    Journal of Agricultural Biological and Environmental Statistics 06/2006; 11(2):210-229. DOI:10.1198/108571106X110883 · 0.78 Impact Factor
  • ABSTRACT: The purpose of model selection algorithms such as All Subsets, Forward Selection and Backward Elimination is to choose a linear model on the basis of the same set of data to which the model will be applied. Typically we have available a large collection of possible covariates from which we hope to select a parsimonious set for the efficient prediction of a response variable. Least Angle Regression (LARS), a new model selection algorithm, is a useful and less greedy version of traditional forward selection methods. Three main properties are derived: (1) A simple modification of the LARS algorithm implements the Lasso, an attractive version of ordinary least squares that constrains the sum of the absolute regression coefficients; the LARS modification calculates all possible Lasso estimates for a given problem, using an order of magnitude less computer time than previous methods. (2) A different LARS modification efficiently implements Forward Stagewise linear regression, another promising new model selection method; this connection explains the similar numerical results previously observed for the Lasso and Stagewise, and helps us understand the properties of both methods, which are seen as constrained versions of the simpler LARS algorithm. (3) A simple approximation for the degrees of freedom of a LARS estimate is available, from which we derive a Cp estimate of prediction error; this allows a principled choice among the range of possible LARS estimates. LARS and its variants are computationally efficient: the paper describes a publicly available algorithm that requires only the same order of magnitude of computational effort as ordinary least squares applied to the full set of covariates.
    The Annals of Statistics 01/2004; 32(2). · 2.44 Impact Factor
  • ABSTRACT: We develop a Bayesian methodology for nonparametric estimation of ROC curves used for evaluation of the accuracy of a diagnostic procedure. We consider the situation where there is no perfect reference test, that is, no “gold standard”. The method is based on a multinomial model for the joint distribution of test-positive and test-negative observations. We use a Bayesian approach which assures the natural monotonicity property of the resulting ROC curve estimate. MCMC methods are used to compute the posterior estimates of the sensitivities and specificities that provide the basis for inference concerning the accuracy of the diagnostic procedure. Because there is no gold standard, identifiability requires that the data come from at least two populations with different prevalences. No assumption is needed concerning the shape of the distributions of test values of the diseased and non-diseased in these populations. We discuss an application to an analysis of ELISA scores in the diagnostic testing of paratuberculosis (Johne’s Disease) for several herds of dairy cows and compare the results to those obtained from some previously proposed methods.
    Journal of Agricultural Biological and Environmental Statistics 03/2007; 12(1):128-146. DOI:10.1198/108571107X178095 · 0.78 Impact Factor
