StAR: a simple tool for the statistical comparison of ROC curves

Departamento de Genética Molecular y Microbiología, Facultad de Ciencias Biológicas, Pontificia Universidad Católica de Chile, Alameda 340, Santiago, Chile.
BMC Bioinformatics (Impact Factor: 2.67). 02/2008; 9:265. DOI: 10.1186/1471-2105-9-265
Source: PubMed

ABSTRACT As in many different areas of science and technology, most important problems in bioinformatics rely on the proper development and assessment of binary classifiers. A generalized assessment of the performance of binary classifiers is typically carried out through the analysis of their receiver operating characteristic (ROC) curves. The area under the ROC curve (AUC) is a popular indicator of the performance of a binary classifier. However, assessing the statistical significance of the difference between any two classifiers based on this measure is not straightforward, because few freely available tools exist. Most existing software is either not free, difficult to use, or hard to automate when many binary classifiers must be compared. Such comparisons are typical when optimizing parameters during the development of new classifiers and when validating their performance against previous work.
In this work we describe and release new software to assess the statistical significance of the observed difference between the AUCs of any two classifiers for a common task estimated from paired data or unpaired balanced data. The software is able to perform a pairwise comparison of many classifiers in a single run, without requiring any expert or advanced knowledge to use it. The software relies on a non-parametric test for the difference of the AUCs that accounts for the correlation of the ROC curves. The results are displayed graphically and can be easily customized by the user. A human-readable report is generated and the complete data resulting from the analysis are also available for download, which can be used for further analysis with other software. The software is released as a web server that can be used in any client platform and also as a standalone application for the Linux operating system.
New software for the statistical comparison of ROC curves is released here as a web server and also as a standalone application for the Linux operating system.
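
The abstract describes a non-parametric test for the difference of two AUCs that accounts for the correlation of the ROC curves when both classifiers are scored on the same samples. It does not spell out the procedure, so the sketch below assumes a test in the style of DeLong et al. (1988), which is one well-known test of this kind; it is not taken from the StAR code. The function names (`compare_aucs` and the helpers) and the toy data are hypothetical, and the code covers only the paired case.

```python
import math


def _psi(x, y):
    """Mann-Whitney kernel: 1 if the positive outranks the negative, 0.5 on ties."""
    return 1.0 if x > y else 0.5 if x == y else 0.0


def _components(scores, labels):
    """Return the AUC plus the per-positive (V10) and per-negative (V01) components."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    v10 = [sum(_psi(x, y) for y in neg) / len(neg) for x in pos]
    v01 = [sum(_psi(x, y) for x in pos) / len(pos) for y in neg]
    return sum(v10) / len(v10), v10, v01


def _cov(a, b):
    """Unbiased sample covariance of two equally long sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (len(a) - 1)


def compare_aucs(scores_a, scores_b, labels):
    """Two-sided test for AUC_a - AUC_b on paired data (assumed DeLong-style).

    labels: 1 for positives, 0 for negatives; both score lists refer to the
    same samples in the same order. Returns (auc_a, auc_b, z, p_value).
    """
    auc_a, v10_a, v01_a = _components(scores_a, labels)
    auc_b, v10_b, v01_b = _components(scores_b, labels)
    m, n = len(v10_a), len(v01_a)  # number of positives and negatives

    def s(p10, p01, q10, q01):
        # One entry of the estimated covariance matrix of the two AUCs.
        return _cov(p10, q10) / m + _cov(p01, q01) / n

    # Var(AUC_a - AUC_b) = Var(a) + Var(b) - 2 Cov(a, b); the covariance
    # term is what accounts for the correlation between the two curves.
    var = (s(v10_a, v01_a, v10_a, v01_a)
           + s(v10_b, v01_b, v10_b, v01_b)
           - 2.0 * s(v10_a, v01_a, v10_b, v01_b))
    if var <= 0.0:
        raise ValueError("variance of the AUC difference is not positive")
    z = (auc_a - auc_b) / math.sqrt(var)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return auc_a, auc_b, z, p_value


# Toy paired data: four positives followed by four negatives.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores_a = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
scores_b = [0.9, 0.3, 0.7, 0.2, 0.8, 0.4, 0.6, 0.1]
auc_a, auc_b, z, p_value = compare_aucs(scores_a, scores_b, labels)
print(f"AUC_a={auc_a:.4f} AUC_b={auc_b:.4f} z={z:.3f} p={p_value:.3f}")
```

Comparing many classifiers in one run, as the abstract describes, would amount to calling such a function for every pair of score lists. This sketch does not apply any correction for multiple comparisons.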

Available from: Alex William Slater, Jan 10, 2014
  • Source
    ABSTRACT: The prediction of essential proteins, the minimal set required for a living cell to support cellular life, is an important task to understand the cellular processes of an organism. Fast progress in high-throughput technologies and the production of large amounts of data enable the discovery of essential proteins at the system level by analyzing Protein-Protein Interaction (PPI) networks, and replacing biological or chemical experiments. Furthermore, additional gene-level annotation information, such as Gene Ontology (GO) terms, helps to detect essential proteins with higher accuracy. Various centrality algorithms have been used to determine essential proteins in a PPI network, and, recently motif centrality GO, which is based on network motifs and GO terms, works best in detecting essential proteins in a Baker's yeast Saccharomyces cerevisiae PPI network, compared to other centrality algorithms. However, each centrality algorithm contributes to the detection of essential proteins with different properties, which makes the integration of them a logical next step. In this paper, we construct a new feature space, named CENT-ING-GO consisting of various centrality measures and GO terms, and provide a computational approach to predict essential proteins with various machine learning techniques. The experimental results show that CENT-ING-GO feature space improves performance over the INT-GO feature space in previous work by Acencio and Lemke in 2009. We also demonstrate that pruning a PPI with informative GO terms can improve the prediction performance further.
    Tsinghua Science & Technology 12/2012; 17(6):645-658. DOI:10.1109/TST.2012.6374366
  • Source
    ABSTRACT: The popularity of social media websites like Flickr and Twitter has created enormous collections of user-generated content online. Latent in these content collections are observations of the world: each photo is a visual snapshot of what the world looked like at a particular point in time and space, for example, while each tweet is a textual expression of the state of a person and his or her environment. Aggregating these observations across millions of social sharing users could lead to new techniques for large-scale monitoring of the state of the world and how it is changing over time. In this paper we step towards that goal, showing that by analyzing the tags and image features of geo-tagged, time-stamped photos we can measure and quantify the occurrence of ecological phenomena including ground snow cover, snow fall and vegetation density. We compare several techniques for dealing with the large degree of noise in the dataset, and show how machine learning can be used to reduce errors caused by misleading tags and ambiguous visual content. We evaluate the accuracy of these techniques by comparing to ground truth data collected both by surface stations and by Earth observing satellites. Besides the immediate application to ecology, our study gives insight into how to accurately crowd-source other types of information from large, noisy social sharing datasets.
  • Source
    ABSTRACT: Early detection of dementia will be important for implementation of disease-modifying treatments in the near future. We aimed to investigate the diagnostic validity and reliability of the Japanese version of the revised Addenbrooke's Cognitive Examination (ACE-R J) for identifying mild cognitive impairment (MCI) and dementia. We translated and adapted the original ACE-R for use with a Japanese population. Standard tests for evaluating cognitive decline and dementing disorders were applied. A total of 242 subjects (controls = 73, MCI = 39, dementia = 130) participated in this study. The optimal cut-off scores of ACE-R J for detecting MCI and dementia were 88/89 (sensitivity 0.87, specificity 0.92) and 82/83 (sensitivity 0.99, specificity 0.99) respectively. ACE-R J was superior to the Mini-Mental State Examination in the detection of MCI (area under the curve (AUC): 0.952 vs. 0.868), while the accuracy of the two instruments did not differ significantly in identifying dementia (AUC: 0.999 vs. 0.993). The inter-rater reliability (ICC = 0.999), test-retest reliability (ICC = 0.883), and internal consistency (Cronbach's α = 0.903) of ACE-R J were excellent. ACE-R J proved to be an accurate cognitive instrument for detecting MCI and mild dementia. Further neuropsychological evaluation is required for the differential diagnosis of dementia subtypes.
    International Psychogeriatrics 08/2011; 24(1):28-37. DOI:10.1017/S1041610211001190 · 1.89 Impact Factor