StAR: A simple tool for the statistical comparison of ROC curves

Departamento de Genética Molecular y Microbiología, Facultad de Ciencias Biológicas, Pontificia Universidad Católica de Chile, Alameda 340, Santiago, Chile.
BMC Bioinformatics (Impact Factor: 2.58). 02/2008; 9(1):265. DOI: 10.1186/1471-2105-9-265
Source: PubMed


As in many different areas of science and technology, most important problems in bioinformatics rely on the proper development and assessment of binary classifiers. A generalized assessment of the performance of binary classifiers is typically carried out through the analysis of their receiver operating characteristic (ROC) curves. The area under the ROC curve (AUC) constitutes a popular indicator of the performance of a binary classifier. However, the assessment of the statistical significance of the difference between any two classifiers based on this measure is not a straightforward task, since not many freely available tools exist. Most existing software is either not free, difficult to use or not easy to automate when a comparative assessment of the performance of many binary classifiers is intended. This constitutes the typical scenario for the optimization of parameters when developing new classifiers and also for their performance validation through the comparison to previous art.
In this work we describe and release new software to assess the statistical significance of the observed difference between the AUCs of any two classifiers for a common task estimated from paired data or unpaired balanced data. The software is able to perform a pairwise comparison of many classifiers in a single run, without requiring any expert or advanced knowledge to use it. The software relies on a non-parametric test for the difference of the AUCs that accounts for the correlation of the ROC curves. The results are displayed graphically and can be easily customized by the user. A human-readable report is generated and the complete data resulting from the analysis are also available for download, which can be used for further analysis with other software. The software is released as a web server that can be used in any client platform and also as a standalone application for the Linux operating system.
New software for the statistical comparison of ROC curves is released here as a web server and also as a standalone application for the Linux operating system.
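The abstract describes a non-parametric test for the difference between two AUCs that accounts for the correlation between ROC curves estimated on the same cases. A minimal sketch of one such test is given below, assuming a DeLong-style approach built on Mann-Whitney U-statistics. It is an illustration only, not the StAR implementation: the function names `placement_values` and `delong_paired_test`, the input layout (two score vectors plus a shared 0/1 label vector), and the use of a two-sided normal approximation are all assumptions made for this example.

```python
import math
import numpy as np


def placement_values(pos, neg):
    """Return (AUC, V10, V01) for one classifier (hypothetical helper).

    pos: scores of the positive cases, neg: scores of the negative cases.
    The AUC is the Mann-Whitney estimate: the fraction of (positive, negative)
    pairs ranked correctly, with ties counted as one half.
    """
    pos = np.asarray(pos, dtype=float)
    neg = np.asarray(neg, dtype=float)
    # psi[i, j] = 1 if pos[i] > neg[j], 0.5 if tied, 0 otherwise
    psi = (pos[:, None] > neg[None, :]) + 0.5 * (pos[:, None] == neg[None, :])
    v10 = psi.mean(axis=1)  # one component per positive case
    v01 = psi.mean(axis=0)  # one component per negative case
    return float(psi.mean()), v10, v01


def delong_paired_test(scores_a, scores_b, labels):
    """Compare two classifiers scored on the same cases (paired data).

    Returns (auc_a, auc_b, z, p), where p is a two-sided p-value from a
    normal approximation. Needs at least two positives and two negatives.
    """
    labels = np.asarray(labels).astype(bool)
    a = np.asarray(scores_a, dtype=float)
    b = np.asarray(scores_b, dtype=float)

    auc_a, v10_a, v01_a = placement_values(a[labels], a[~labels])
    auc_b, v10_b, v01_b = placement_values(b[labels], b[~labels])

    m = int(labels.sum())     # number of positive cases
    n = int((~labels).sum())  # number of negative cases

    # 2x2 covariance matrices of the structural components; the off-diagonal
    # terms carry the correlation induced by scoring the same cases.
    s10 = np.cov(np.vstack([v10_a, v10_b]))
    s01 = np.cov(np.vstack([v01_a, v01_b]))
    cov = s10 / m + s01 / n

    var_diff = cov[0, 0] + cov[1, 1] - 2.0 * cov[0, 1]
    if var_diff <= 0.0:
        # Degenerate case (for example identical or perfectly separating scores).
        return auc_a, auc_b, 0.0, 1.0

    z = (auc_a - auc_b) / math.sqrt(var_diff)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail probability
    return auc_a, auc_b, z, p


if __name__ == "__main__":
    y = [1, 1, 1, 1, 0, 0, 0, 0]
    clf_a = [0.9, 0.8, 0.7, 0.35, 0.4, 0.3, 0.2, 0.1]
    clf_b = [0.6, 0.3, 0.8, 0.2, 0.5, 0.7, 0.1, 0.4]
    print(delong_paired_test(clf_a, clf_b, y))
```

Squaring `z` gives a statistic that can be compared against a chi-square distribution with one degree of freedom, which is an equivalent way to report the same test. Running every pair of classifiers through such a function is one way to obtain the kind of pairwise comparison the abstract mentions.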



Available from: Alex William Slater, Jan 10, 2014
  • Source
    • "We compared the performance of the classifier by Kim, 2012 against that of the classifier by Acencio and Lemke, 2009. In addition, we used the StAR (Statistical Analysis of ROC curves) [54] which is an available tool in a web ( to determine if a result is significantly better than others, based on Mann-Whitney U-statistics test. "
    ABSTRACT: The prediction of essential proteins, the minimal set required for a living cell to support cellular life, is an important task to understand the cellular processes of an organism. Fast progress in high-throughput technologies and the production of large amounts of data enable the discovery of essential proteins at the system level by analyzing Protein-Protein Interaction (PPI) networks, and replacing biological or chemical experiments. Furthermore, additional gene-level annotation information, such as Gene Ontology (GO) terms, helps to detect essential proteins with higher accuracy. Various centrality algorithms have been used to determine essential proteins in a PPI network, and, recently motif centrality GO, which is based on network motifs and GO terms, works best in detecting essential proteins in a Baker's yeast Saccharomyces cerevisiae PPI network, compared to other centrality algorithms. However, each centrality algorithm contributes to the detection of essential proteins with different properties, which makes the integration of them a logical next step. In this paper, we construct a new feature space, named CENT-ING-GO consisting of various centrality measures and GO terms, and provide a computational approach to predict essential proteins with various machine learning techniques. The experimental results show that CENT-ING-GO feature space improves performance over the INT-GO feature space in previous work by Acencio and Lemke in 2009. We also demonstrate that pruning a PPI with informative GO terms can improve the prediction performance further.
    Tsinghua Science & Technology 12/2012; 17(6):645-658. DOI:10.1109/TST.2012.6374366
  • Source
    • "AUC has an interpretation as the probability that a randomly chosen positive instance will be ranked above a randomly chosen negative instance [40]. We also evaluate the statistical significance of the difference in AUCs between two classifiers, using the statistical ROC analysis tool StAR [41]."
    ABSTRACT: Background: Biological databases contain large amounts of data concerning the functions and associations of genes and proteins. Integration of data from several such databases into a single repository can aid the discovery of previously unknown connections spanning multiple types of relationships and databases. Results: Biomine is a system that integrates cross-references from several biological databases into a graph model with multiple types of edges, such as protein interactions, gene-disease associations and gene ontology annotations. Edges are weighted based on their type, reliability, and informativeness. We present Biomine and evaluate its performance in link prediction, where the goal is to predict pairs of nodes that will be connected in the future, based on current data. In particular, we formulate protein interaction prediction and disease gene prioritization tasks as instances of link prediction. The predictions are based on a proximity measure computed on the integrated graph. We consider and experiment with several such measures, and perform a parameter optimization procedure where different edge types are weighted to optimize link prediction accuracy. We also propose a novel method for disease-gene prioritization, defined as finding a subset of candidate genes that cluster together in the graph. We experimentally evaluate Biomine by predicting future annotations in the source databases and prioritizing lists of putative disease genes. Conclusions: The experimental results show that Biomine has strong potential for predicting links when a set of selected candidate links is available. The predictions obtained using the entire Biomine dataset are shown to clearly outperform ones obtained using any single source of data alone, when different types of links are suitably weighted. In the gene prioritization task, an established reference set of disease-associated genes is useful, but the results show that under favorable conditions, Biomine can also perform well when no such information is available. The Biomine system is a proof of concept. Its current version contains 1.1 million entities and 8.1 million relations between them, with focus on human genetics. Some of its functionalities are available in a public query interface at, allowing searching for and visualizing connections between given biological entities.
    BMC Bioinformatics 06/2012; 13(1):119. DOI:10.1186/1471-2105-13-119 · 2.58 Impact Factor
  • Source
    • "The figure compares the likelihood ratio confidence score from equation (1) to the baseline approaches (voting and percentage), using the tag set S={snow, snowy, snowing, snowstorm}. The area under the ROC curve (AUC) statistics are 0.929, 0.905, and 0.903 for confidence, percentage, and voting, respectively, and the improvement of the confidence method is statistically significant with p = 0.0713 according to the statistical test of [29]. The confidence method also outperforms other methods for the other three cities (not shown due to space constraints)."
    ABSTRACT: The popularity of social media websites like Flickr and Twitter has created enormous collections of user-generated content online. Latent in these content collections are observations of the world: each photo is a visual snapshot of what the world looked like at a particular point in time and space, for example, while each tweet is a textual expression of the state of a person and his or her environment. Aggregating these observations across millions of social sharing users could lead to new techniques for large-scale monitoring of the state of the world and how it is changing over time. In this paper we step towards that goal, showing that by analyzing the tags and image features of geo-tagged, time-stamped photos we can measure and quantify the occurrence of ecological phenomena including ground snow cover, snow fall and vegetation density. We compare several techniques for dealing with the large degree of noise in the dataset, and show how machine learning can be used to reduce errors caused by misleading tags and ambiguous visual content. We evaluate the accuracy of these techniques by comparing to ground truth data collected both by surface stations and by Earth observing satellites. Besides the immediate application to ecology, our study gives insight into how to accurately crowd-source other types of information from large, noisy social sharing datasets.