Søreide, K.: Receiver-operating characteristic (ROC) curve analysis in diagnostic, prognostic and predictive biomarker research. Journal of Clinical Pathology 62(1), 1-5

Department of Surgery, Stavanger University Hospital, Stavanger, Norway.
Journal of clinical pathology (Impact Factor: 2.92). 10/2008; 62(1):1-5. DOI: 10.1136/jcp.2008.061010
Source: PubMed


From a clinical perspective, biomarkers may have a variety of functions, which correspond to different stages in the disease development, e.g. in the progression of cancer. Biomarkers can assist in the care of patients for screening, diagnosis, prognosis, prediction and surveillance. Fundamental to the use of biomarkers in all situations is biomarker accuracy: the ability to correctly classify one condition and/or outcome from another. Receiver-operating characteristic (ROC) curve analysis is a useful tool for assessing biomarker accuracy. Its advantages include testing accuracy across the entire range of scores, so that no predetermined cut-off point is required; easily examined visual and statistical comparisons across tests or scores; and independence from outcome prevalence. Further, ROC curve analysis is a useful tool for evaluating the accuracy of a statistical model that classifies subjects into one of two categories. Diagnostic models differ from predictive and prognostic models in that the latter incorporate time-to-event analysis, for which censored data may weaken the model or the reference standard. However, with the appropriate use of ROC curves, investigators of biomarkers can improve their research and presentation of results. ROC curves help identify the most appropriate classification rules. ROC curves avoid the confounding that results from varying thresholds with subjective ratings. The ROC curve results should always be put in perspective, because a good classifier does not guarantee the eventual clinical outcome, in particular for time-dependent events in screening, prediction, and/or prognosis studies, where particular statistical precautions and methods are needed.
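As an illustration of the threshold-free property described in the abstract, the following is a minimal sketch of how an empirical ROC curve and its area under the curve (AUC) might be computed for a binary outcome. The function names `roc_points` and `auc` and the biomarker scores are hypothetical and invented for this example; they are not taken from the paper, and the sketch ignores the time-to-event and censoring issues the abstract warns about.

```python
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (false positive rate, true positive rate) points.

    Every distinct score is used once as a cut-off, sweeping from the
    highest score to the lowest, so no cut-off has to be chosen in advance.
    labels: 1 = condition present, 0 = condition absent.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both outcome classes must be present")

    # Sort subjects by score, highest first.
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])

    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        cutoff = pairs[i][0]
        # Subjects with tied scores cross the cut-off together.
        while i < len(pairs) and pairs[i][0] == cutoff:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2.0
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


if __name__ == "__main__":
    # Hypothetical biomarker scores (illustrative data only).
    scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
    labels = [1, 1, 0, 1, 1, 0, 0, 0]
    print(auc(roc_points(scores, labels)))  # prints 0.875

    # Tripling the number of unaffected subjects changes the prevalence
    # but leaves the AUC essentially unchanged (up to rounding).
    neg = [s for s, y in zip(scores, labels) if y == 0]
    scores_low_prev = scores + neg + neg
    labels_low_prev = labels + [0] * (2 * len(neg))
    print(round(auc(roc_points(scores_low_prev, labels_low_prev)), 6))
```

The second call is meant to mirror the abstract's point about independence from outcome prevalence: sensitivity and specificity are each computed within one outcome class, so changing the ratio of the classes does not move the curve.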

  • Source
    • "Rice (2003) and Piet & Rice (2004) recognized this as a threshold-response mode of fisheries management, amenable to the Signal Detection Theory (SDT; Egan 1975), used to quantify the probability that an observer (operator) may respond when thresholds are exceeded. We take an analogous approach, similar to receiver–operator characteristics (ROC; Metz 1978; Søreide 2009) to quantify the evidential support behind management of the harvest rate () via total allowable catch (TAC) setting. "
    ABSTRACT: We consider the problem of regulating the rate of harvesting a natural resource, taking account of the wider system represented by a set of ecological and economic indicators, given differing stakeholder priorities. This requires objective and transparent decision making to show how indicators impinge on the resulting regulation decision. We offer a new scheme for combining indicators, derived from assessing the suitability of lowering versus not lowering the harvest rate based on indicator values relative to their pre-defined reference levels. Using the practical example of fisheries management under an ‘ecosystem approach’, we demonstrate how different stakeholder views can be quantitatively represented by weighting-sets applied to these comparisons. Using the scheme in an analysis of historical data from the Celtic Sea fisheries, we find great scope for negotiating agreement among disparate stakeholders.
    Conservation Letters 08/2015; 00:n/a-n/a. DOI:10.1111/conl.12177 · 7.24 Impact Factor
  • Source
    • "These missing predictions were not taken into account when calculating the statistics. The performance of each model is visualized in receiver operating characteristic (ROC) plots (Soreide, 2009). These plots illustrate the performance of a model in terms of true positive rate (sensitivity) and true negative rate (1-specificity), or alternatively, in terms of PPV and 1-NPV. "
    ABSTRACT: Low molecular weight (LMW) respiratory sensitizers can cause occupational asthma but due to a lack of adequate test methods, prospective identification of respiratory sensitizers is currently not possible. This paper presents the evaluation of Structure-Activity Relationship models (SARs) as potential methods to prospectively conclude on the sensitization potential of LMW chemicals. The predictive performance of the SARs calculated from their training sets was compared to their performance on a dataset of newly identified respiratory sensitizers and non-sensitizers, derived from literature. The predictivity of the available SARs for new substances was markedly lower than their published predictive performance. For that reason, no single SAR model can be considered sufficiently reliable to conclude on potential LMW respiratory sensitization properties of a substance. The individual applicability domains of the models were analyzed for adequacies and deficiencies. Based on these findings, a tiered prediction approach is subsequently proposed. This approach combines the two SARs with the highest positive and negative predictivity taking into account model specific chemical applicability domain issues. The tiered approach provided reliable predictions for one third of the respiratory sensitizers and non-sensitizers of the external validation set compiled by us. For these chemicals, a positive predictive value of 96% and a negative predictive value of 89% was obtained. The tiered approach was not able to predict the other two thirds of the chemicals, meaning that additional information is required and that there is an urgent need for other test methods, e.g. in chemico or in vitro, to reach a reliable conclusion.
    Toxicological Sciences 09/2014; 142(2). DOI:10.1093/toxsci/kfu188 · 3.85 Impact Factor
  • Source
    • "To assess the relationship of the items toward PTB diagnosis, we used logistic regression analysis. The discriminating ability of significant items with regard to PTB-diagnosis was assessed with receiver operating characteristic (ROC) analysis (66). Negative predictive value, that is, the probability of a suspect in our cohort not having PTB if the item was absent, and the negative likelihood ratio (LR), that is, the ratio between the false negative tests among patients having the disease and true negative tests among healthy patients, were assessed to describe the item's ability to exclude PTB. "
    ABSTRACT: Background: The tuberculosis (TB) case detection rate has stagnated at 60% due to disorganized case finding and insensitivity of sputum smear microscopy. Of the identified TB cases, 4% die while being treated, monitored with tools that insufficiently predict failure/mortality. Objective: To explore the TBscore, a recently proposed clinical severity measure for pulmonary TB (PTB) patients, and to refine, validate, and investigate its place in case finding. Design: The TBscore's inter-observer agreement was assessed and compared to the Karnofsky Performance Score (KPS) (paper I). The TBscore's variables underlying constructs were assessed, sorting out unrelated items, proposing a more easily assessable TBscoreII, which was validated internally and externally (paper II). Finally, TBscore and TBscoreII's place in PTB-screening was examined in paper III. Results: The inter-observer variability when grading PTB patients into severity classes was moderate for both TBscore (κ W=0.52, 95% CI 0.46-0.56) and KPS (κ W=0.49, 95% CI 0.33-0.65). KPS was influenced by HIV status, whereas TBscore was unaffected by it. In paper II, proposed TBscoreII was validated internally, in Guinea-Bissau, and externally, in Ethiopia. In both settings, a failure to bring down the score by ≥25% from baseline to 2 months of treatment predicted subsequent failure (p=0.007). Finally, in paper III, TBscore and TBscoreII were assessed in health-care-seeking adults and found to be higher in PTB-diagnosed patients, 4.9 (95% CI 4.6-5.2) and 3.9 (95% CI 3.8-4.0), respectively, versus patients not diagnosed with PTB, 3.0 (95% CI 2.7-3.2) and 2.4 (95% CI 2.3-2.5), respectively. Had we referred only patients with cough >2 weeks to sputum smear, we would have missed 32.1% of the smear confirmed cases in our cohort. A TBscoreII≥2 missed 8.6%. Conclusions: TBscore and TBscoreII are useful monitoring tools for PTB patients on treatment, as they could fill the void which currently exists in risk grading of patients. They may also have a role in PTB screening; however, this requires our findings to be repeated elsewhere.
    Global Health Action 05/2014; 7(1):24303. DOI:10.3402/gha.v7.24303 · 1.93 Impact Factor