Vol. 21 no. 20 2005, pages 3940–3941
Data and text mining
ROCR: visualizing classifier performance in R
Tobias Sing1,*, Oliver Sander1, Niko Beerenwinkel2 and Thomas Lengauer1
1Department of Computational Biology and Applied Algorithmics, Max-Planck-Institute for Informatics,
Stuhlsatzenhausweg 85, 66123 Saarbrücken, Germany and 2Department of Mathematics,
University of California, Berkeley, CA 94720-3840, USA
Received on March 10, 2005; revised on June 1, 2005; accepted on August 9, 2005
Advance Access publication August 11, 2005
Summary: ROCR is a package for evaluating and visualizing the per-
formance of scoring classifiers in the statistical language R. It features
over 25 performance measures that can be freely combined to create
two-dimensional performance curves. Standard methods for investig-
ating trade-offs between specific performance measures are available
within a uniform framework, including receiver operating characteristic
(ROC) graphs, precision/recall plots, lift charts and cost curves. ROCR
integrates tightly with R’s powerful graphics capabilities, thus allowing
for highly adjustable plots. Being equipped with only three commands
and reasonable default values for optional parameters, ROCR com-
bines flexibility with ease of usage.
Availability: http://rocr.bioinf.mpi-sb.mpg.de. ROCR can be used
under the terms of the GNU General Public License. Running within
R, it is platform-independent.
Pattern classification has become a central tool in bioinformatics,
offering rapid insights into large data sets (Baldi and Brunak, 2001).
While one area of our work involves predicting phenotypic prop-
erties of HIV-1 from genotypic information (Beerenwinkel et al.,
2002, 2003; Sing et al., 2004), scoring or ranking predictors are also
vital in a wide range of other biological problems. Examples include
microarray analysis (e.g. prediction of tissue condition based on
gene expression), protein structural and functional characterization
(remote homology detection, prediction of post-translational modi-
fications and molecular function annotation based on sequence or
structural motifs), genome annotation (gene finding and splice site
identification), protein–ligand interactions (virtual screening and
molecular docking) and structure–activity relationships (predicting
bioavailability or toxicity of drug compounds). In many of these
and extensive noise due to variability in experimental assays com-
plicate predictive modelling. Thus, careful predictor validation is
The real-valued output of scoring classifiers is turned into a bin-
ary class decision by choosing a cutoff. As no cutoff is optimal
according to all possible performance criteria, cutoff choice
involves a trade-off among different measures. Typically, a
trade-off between a pair of criteria (e.g. sensitivity versus
specificity) is visualized as a cutoff-parametrized curve in the
plane spanned by the two measures. Popular examples of
such trade-off visualizations include receiver operating char-
acteristic (ROC) graphs, sensitivity/specificity curves, lift charts
and precision/recall plots. Fawcett (2004) provides a general introduction
to evaluating scoring classifiers with a focus on ROC graphs.
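To make the cutoff parametrization concrete, the following minimal sketch in base R sweeps each observed score as a cutoff and records the resulting true and false positive rates. The toy scores and labels and the helper name roc_points are illustrative assumptions; they are not part of ROCR or of the data discussed here.

```r
# Toy data (hypothetical): real-valued classifier scores and true classes (1 = positive).
scores <- c(0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.1)
labels <- c(1,   1,   0,   1,   0,    1,   0,   0)

# Hypothetical helper: one (fpr, tpr) point per cutoff. A sample is
# predicted positive when its score is >= the cutoff.
roc_points <- function(scores, labels) {
  # Inf is included so that the curve starts with no positive predictions.
  cutoffs <- c(Inf, sort(unique(scores), decreasing = TRUE))
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  tpr <- sapply(cutoffs, function(ct) sum(scores >= ct & labels == 1) / n_pos)
  fpr <- sapply(cutoffs, function(ct) sum(scores >= ct & labels == 0) / n_neg)
  data.frame(cutoff = cutoffs, fpr = fpr, tpr = tpr)
}

pts <- roc_points(scores, labels)
print(pts)
```

Plotting pts$tpr against pts$fpr (for example with plot(pts$fpr, pts$tpr, type = "s")) would trace the ROC graph; replacing the two rates by another pair of measures would give a different trade-off curve over the same cutoffs.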
Although functions for drawing ROC graphs are provided by the
Bioconductor project (http://www.bioconductor.org) or by the
machine learning package Weka (http://www.cs.waikato.ac.nz/~ml),
for example, no comprehensive evaluation suite is available
to date. ROCR is a flexible evaluation package for R
(http://www.r-project.org), a statistical language that is widely used in biomedical
data analysis. Our tool allows for creating cutoff-parametrized
performance curves by freely combining two out of more than
25 performance measures (Table 1). Curves from different cross-
validation or bootstrapping runs can be averaged by various meth-
ods. Standard deviations, standard errors and box plots are available
to summarize the variability across the runs. The parametrization
can be visualized by printing cutoff values at the corresponding
curve positions, or by coloring the curve according to the cutoff. All
components of a performance plot are adjustable using a flexible
mechanism for dispatching optional arguments. Despite this flex-
ibility, ROCR is easy to use, with only three commands and reas-
onable default values for all optional parameters.
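As a hedged illustration of how measures can be combined, the sketch below builds several curves from one prediction object. The toy scores and labels are made up for this example; the measure names ("prec", "rec", "sens", "spec", "lift", "rpp", "acc", "auc") are the identifiers we understand the ROCR package to use, and help(performance) should be consulted for the authoritative list.

```r
library(ROCR)

# Hypothetical toy predictions; any numeric scores with binary labels would do.
scores <- c(0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.1)
labels <- c(1, 1, 0, 1, 0, 1, 0, 0)
pred <- prediction(scores, labels)

# Two measures define a curve: y-axis measure first, x-axis measure second.
prec_rec  <- performance(pred, "prec", "rec")   # precision/recall plot
sens_spec <- performance(pred, "sens", "spec")  # sensitivity/specificity curve
lift_rpp  <- performance(pred, "lift", "rpp")   # lift chart

# A single measure is paired with the cutoff on the x-axis.
acc <- performance(pred, "acc")

# Scalar measures such as the area under the ROC curve are stored in y.values.
auc <- performance(pred, "auc")@y.values[[1]]
print(auc)
```

Each of the curve objects can be passed to plot; the scalar auc is simply a number and needs no plot.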
In the example below, we will briefly introduce ROCR’s three
commands—prediction, performance and plot—applied
to a 10-fold cross-validation set of predictions and correspond-
ing class labels from a study on predicting HIV coreceptor usage
from the sequence of the viral envelope protein. After loading
the dataset, a prediction object is created from the raw predic-
tions and class labels.
pred <- prediction(
Performance measures or combinations thereof are computed by
invoking the performance method on this prediction object.
The resulting performance object can be visualized using the
method plot. For example, an ROC curve that trades off
the rate of true positives against the rate of false positives is
obtained as follows:
perf <- performance(pred, "tpr", "fpr")
*To whom correspondence should be addressed.
© The Author 2005. Published by Oxford University Press. All rights reserved. For Permissions, please email: firstname.lastname@example.org
The optional parameter avg selects a particular form of
performance curve averaging across the validation runs; the
visualization of curve variability is determined with the parameter
spread.estimate.
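The three commands and the averaging options can be sketched end to end as follows. This is an illustration under assumptions rather than the exact code of the study: it assumes that the ROCR.hiv dataset shipped with the package (with the element hiv.svm holding lists of predictions and labels, one per cross-validation run) is a suitable stand-in for the data described above.

```r
library(ROCR)

# Assumption: the HIV coreceptor data bundled with the package stand in for
# the dataset discussed in the text.
data(ROCR.hiv)
pred <- prediction(ROCR.hiv$hiv.svm$predictions, ROCR.hiv$hiv.svm$labels)
perf <- performance(pred, "tpr", "fpr")

# One ROC curve per cross-validation run, colored according to the cutoff.
plot(perf, colorize = TRUE)

# Curves averaged at equal cutoffs, with box plots showing variability.
plot(perf, avg = "threshold", spread.estimate = "boxplot")

# Curves averaged vertically (at equal false positive rates),
# with standard-error bars.
plot(perf, avg = "vertical", spread.estimate = "stderror")
```

Threshold averaging suits scores that are comparable across runs, whereas vertical averaging compares true positive rates at fixed false positive rates; which is preferable depends on the classifier and is left to the reader.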
Issuing demo(ROCR) starts a demonstration of further graphical
capabilities of ROCR. The command help(package=ROCR)
points to the available help pages. In particular, a complete
list of available performance measures can be obtained via
help(performance). A reference manual can be downloaded
from the ROCR website.
In conclusion, ROCR is a comprehensive tool for evaluating
scoring classifiers and producing publication-quality figures. It
allows for studying the intricacies inherent to many biological
datasets and their implications for classifier performance.
Work at MPI supported by EU NoE BioSapiens (LSHG-CT-2003-
Conflict of Interest: none declared.
Baldi,P. and Brunak,S. (2001) Bioinformatics: The Machine Learning Approach.
MIT Press, Cambridge, MA.
Beerenwinkel,N. et al. (2003) Geno2pheno: estimating phenotypic drug resistance
from HIV-1 genotypes. Nucleic Acids Res., 31, 3850–3855.
Beerenwinkel,N. et al. (2002) Diversity and complexity of HIV-1 drug resistance: a
bioinformatics approach to predicting phenotype from genotype. Proc. Natl Acad.
Sci. USA, 99, 8271–8276.
Fawcett,T. (2004) ROC graphs: notes and practical considerations for researchers.
Technical Report HPL-2003-4. HP Labs, Palo Alto, CA.
Sing,T., Beerenwinkel,N. and Lengauer,T. (2004) Learning mixtures of localized rules
by maximizing the area under the ROC curve. In Proceedings of
the 1st International Workshop on ROC Analysis in Artificial Intelligence, Valencia, Spain, 89–96.
Fig. 1. Visualizations of classifier performance (HIV coreceptor usage data): (a) receiver operating characteristic (ROC) curve; (b) peak accuracy across a range
of cutoffs; (c) absolute difference between empirical and predicted rate of positives for windowed cutoff ranges, in order to evaluate how well the scores are
calibrated as probability estimates. Owing to the probabilistic interpretation, cutoffs need to be in the interval [0,1], in contrast to other performance plots.
(d) Score density estimates for the negative (solid) and positive (dotted) class.
Table 1. Performance measures in the ROCR package
Contingency ratios: error rate, accuracy, sensitivity, specificity, true/false positive rate, fallout, miss, precision, recall, negative predictive
Phi/Matthews correlation coefficient, mutual information, χ² test statistic,
F-measure, lift, precision-recall
Performance in ROC space: ROC convex hull, area under the ROC curve
calibration error, mean cross-entropy, root mean-squared error
expected cost, explicit cost