ROCR: Visualizing classifier performance in R

Department of Computational Biology and Applied Algorithmics, Max-Planck-Institute for Informatics, Saarbrücken, Germany.
Bioinformatics (Impact Factor: 4.98). 11/2005; 21(20):3940-1. DOI: 10.1093/bioinformatics/bti623
Source: PubMed


ROCR is a package for evaluating and visualizing the performance of scoring classifiers in the statistical language R. It
features over 25 performance measures that can be freely combined to create two-dimensional performance curves. Standard methods
for investigating trade-offs between specific performance measures are available within a uniform framework, including receiver
operating characteristic (ROC) graphs, precision/recall plots, lift charts and cost curves. ROCR integrates tightly with R's
powerful graphics capabilities, thus allowing for highly adjustable plots. Being equipped with only three commands and reasonable
default values for optional parameters, ROCR combines flexibility with ease of usage.
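
As a minimal sketch of the three-command workflow described above (assuming the ROCR package is installed from CRAN, and that the three commands are its prediction(), performance() and plot() functions), a session on toy data might look like this. The scores and labels are made up for illustration and do not come from the paper.

```r
# Sketch only: assumes the ROCR package is installed from CRAN.
library(ROCR)

# Toy data (hypothetical): classifier scores and the true 0/1 class labels.
scores <- c(0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2)
labels <- c(1,   1,   0,   1,   0,    1,   0,   0)

# Command 1: bundle scores and labels into a prediction object.
pred <- prediction(scores, labels)

# Command 2: combine two measures into a curve, here TPR against FPR (ROC).
roc <- performance(pred, measure = "tpr", x.measure = "fpr")

# Command 3: plot the curve; standard R graphics parameters pass through.
plot(roc, col = "blue", lwd = 2, main = "ROC curve (toy data)")
abline(a = 0, b = 1, lty = 2)  # chance diagonal for reference

# Other pairings reuse the same calls, e.g. precision against recall.
pr <- performance(pred, measure = "prec", x.measure = "rec")
plot(pr)

# Scalar summaries such as the AUC also come from performance().
auc <- performance(pred, measure = "auc")@y.values[[1]]
print(auc)
```

Changing the measure and x.measure arguments is what switches between ROC graphs, precision/recall plots and the other curve types listed in the abstract; the prediction object is built once and reused.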

Availability: ROCR can be used under the terms of the GNU General Public License. Running within R, it is platform-independent.

Contact: tobias.sing{at}

    • "(The R Foundation for Statistical Computing, using procedures from the Epi [32], pROC [28] and ROCR [33] packages."
    ABSTRACT: Currently, the majority of patients diagnosed with pancreatic ductal adenocarcinoma (PDAC) present with locally invasive and/or metastatic disease, resulting in five-year survival of less than 5%. The development of an early diagnostic test is, therefore, expected to significantly impact the patient's prognosis. In this study, we successfully evaluated the feasibility of identifying diagnostic cell free microRNAs (miRNAs) for early stage PDAC, through the analysis of urine samples. Using Affymetrix microarrays, we established a global miRNA profile of 13 PDAC, six chronic pancreatitis (CP), and seven healthy (H) urine specimens. Selected differentially expressed miRNAs were subsequently investigated using an independent technique (RT-PCR) on 101 urine samples including 46 PDAC, 29 CP and 26 H. Receiver operating characteristic (ROC) and logistic regression analyses were applied to determine the discriminatory potential of the candidate miRNA biomarkers. Three miRNAs (miR-143, miR-223, and miR-30e) were significantly over-expressed in patients with Stage I cancer when compared with age-matched healthy individuals (P=0.022, 0.035 and 0.04, respectively); miR-143, miR-223 and miR-204 were also shown to be expressed at higher levels in Stage I compared to Stages II-IV PDAC (P=0.025, 0.013 and 0.008, respectively). Furthermore, miR-223 and miR-204 were able to distinguish patients with early stage cancer from patients with CP (P=0.037 and 0.036). Among the three biomarkers, miR-143 was best able to differentiate Stage I (n=6) from healthy (n=26) with area under the curve (AUC) of 0.862 (95% CI 0.695-1.000), with sensitivity (SN) of 83.3% (95% CI 50.0-100.0), and specificity (SP) of 88.5% (95% CI 73.1-100.0). The combination of miR-143 with miR-30e was significantly better at discriminating between these two groups, achieving an AUC of 0.923 (95% CI 0.793-1.000), with SN of 83.3% (95% CI 50.0-100.0) and SP of 96.2% (95% CI 88.5-100.0). 
    In this feasibility study, we demonstrate for the first time the utility of miRNA biomarkers for non-invasive, early detection of PDAC in urine specimens.
    Full-text · Article · Jan 2016 · American Journal of Cancer Research
    • "We used linear functions for all the variables except soil moisture and rainfall. The habitat suitability model performance was assessed using a threshold independent measure of the Area Under the Curve (AUC) of the Receiver Operator Characteristic (ROC) plot using the ROCR library [49] in R. The AUC is a dimensionless metric that varies between 0 and 1, where values close to 1 represent greater model accuracy. We used a 10-fold cross validation technique, setting aside 20% of the data for validation and estimating model parameters with the remaining 80% of the observations."
    ABSTRACT: Species distribution modeling has been widely used in studying habitat relationships and for conservation purposes. However, neglecting ecological knowledge about species, e.g. their seasonal movements, and ignoring the proper environmental factors that can explain key elements for species survival (shelter, food and water) increase model uncertainty. This study exemplifies how these ecological gaps in species distribution modeling can be addressed by modeling the distribution of the emu (Dromaius novaehollandiae) in Australia. Emus cover a large area during the austral winter. However, their habitat shrinks during the summer months. We show evidence of emu summer habitat shrinkage due to higher fire frequency, and low water and food availability in northern regions. Our findings indicate that emus prefer areas with higher vegetation productivity and low fire recurrence, while their distribution is linked to an optimal intermediate (~0.12 m³ m⁻³) soil moisture range. We propose that the application of three geospatial data products derived from satellite remote sensing, namely fire frequency, ecosystem productivity, and soil water content, provides an effective representation of emu general habitat requirements, and substantially improves species distribution modeling and representation of the species' ecological habitat niche across Australia.
    Full-text · Article · Jan 2016 · PLoS ONE
    • "For this purpose, the values of sensitivity and specificity in relation to the predicted probability were entered on a diagram and the value corresponding to the intersection of the two curves was selected. To evaluate the performance of the model, we calculated the receiver operating characteristic (ROC) curve by plotting true positive points (occupied territory) against false positives with the ROCR package for R (Sing et al. 2005). The area under the resulting curve (AUC) indicates for each model the predictive performance expressed as an index ranging from 0 to 1 (DeLong et al. 1988). "
    ABSTRACT: Ortolan buntings Emberiza hortulana have undergone one of the most severe population declines of any European farmland bird over the last thirty years. The aim of this study was to find out which habitat features, including crop characteristics, ortolan bunting prefers in Estonia in breeding areas. This study compared currently occupied and unoccupied ortolan bunting territories. Occupied areas contained significantly more tall broadleaf trees, crop types, structural elements (trees, bushes, roads, overhead power lines and buildings) and spring wheat, but also had lower crop drilling densities. Ortolan bunting territories were best described by a logistic regression model containing six variables: amount of structural point elements, length of power lines, amount of tall broadleaf trees and number of different crops had a positive effect, whereas crop density and area of autumn-sown crops had a negative effect. Based on the findings of this study, the following conservation measures can be recommended: lower crop densities; spring rather than autumn-sown crops; small-field systems containing a variety of crops; scattered scrub preserved or planted; habitat patches of permanent grasslands, hedges and tall broadleaf trees retained within the agricultural landscape.
    Full-text · Article · Dec 2015 · European Journal of Ecology