Guillou L, Bachar D, Audic S, Bass D, Berney C, Bittner L et al. The Protist Ribosomal Reference database (PR2): a catalog of unicellular eukaryote Small SubUnit rRNA sequences with curated taxonomy. Nucleic Acids Res 41: D597-D604

CNRS, UMR 7144, Adaptation et Diversité en Milieu Marin, 29682 Roscoff, France, UPMC Université Paris 06, UMR 7144, Station Biologique de Roscoff, 29682 Roscoff, France, CNRS, UMR 7138, Systématique Adaptation Evolution, Parc Valrose, BP71. F06108 Nice cedex 02, France, UMR 7138, Université de Nice-Sophia Antipolis, Systématique Adaptation Evolution, Parc Valrose, BP71. F06108 Nice cedex 02, France, Department of Life Sciences, The Natural History Museum, Cromwell Road, London SW7 5BD, UK, Laboratoire Universitaire de Biodiversité et Ecologie Microbienne (EA3882), ESMISAB, Technopôle Brest-Iroise, 29280 Plouzané, France, Department of Marine Biology and Oceanography, Institut de Ciències del Mar (CSIC), Barcelona, Catalonia, Spain, Laboratoire d'Océanographie de Villefranche, Marine Microbial Ecology, UPMC Université Paris 06 et CNRS, UMR7093, Station Zoologique, BP28, 06230 Villefranche-sur-Mer, France, Department of Ecology, University of Kaiserslautern, 67663 Kaiserslautern, Germany, Department of Biology, University of Oslo, Marine Biology, NO-0316 Oslo, Norway, Department of Genetics and Evolution, University of Geneva, Switzerland, Stazione Zoologica Anton Dohrn, Villa Comunale, 80121 Naples, Italy, Laboratory of Soil Biology, University of Neuchâtel, Rue Emile Argand 11, CH-2000, Neuchâtel, Switzerland, UPMC Université Paris 06, 29682 Roscoff, France, CNRS, FR2424, Station Biologique de Roscoff, 29682 Roscoff, France, Ifremer, Centre de Brest, DYNECO/Pelagos BP70, 29280 Plouzané, France and Point Compétence Informatique, Rue Jean-Baptiste Say, 56850 Caudan, France.
Nucleic Acids Research (Impact Factor: 9.11). 11/2012; 41(Database issue). DOI: 10.1093/nar/gks1160

The interrogation of genetic markers in environmental meta-barcoding studies is currently seriously hindered by the lack of taxonomically curated reference data sets for the targeted genes. The Protist Ribosomal Reference database (PR2) provides unique access to eukaryotic small sub-unit (SSU) ribosomal RNA and DNA sequences, with curated taxonomy. The database mainly consists of nuclear-encoded protistan sequences. However, metazoans, land plants, macroscopic fungi and eukaryotic organelles (mitochondrion, plastid and others) are also included because they are useful for the analysis of high-throughput sequencing data sets. Introns and putative chimeric sequences have also been carefully checked. The taxonomic assignment of each sequence consists of eight unique taxonomic fields. In total, 136 866 sequences are nuclear encoded, 45 708 (36 501 mitochondrial and 9657 chloroplastic) are from organelles, the remaining being putative chimeric sequences. The website allows users to download sequences from the entire database or from subsets of it (including representative sequences after clustering at a given level of similarity). Different web tools also allow searches by sequence similarity. The presence of both rRNA and rDNA sequences, the handling of introns (crucial for eukaryotic sequences), a normalized eight-rank taxonomy and updates with new GenBank releases were made possible by a long-term collaboration between experts in taxonomy and computer scientists.

    • "Because of their extreme morphological and behavioural diversity, the study of even relatively narrow lineages requires a high degree of taxonomic expertise (e.g. Guillou et al. 2012; Pawlowski and Holzmann, 2014). As a result, the knowledge of protistan ecology and evolution is limited by the small number of taxonomists, resulting in scarcity of taxonomically well-resolved ecological data. "
    ABSTRACT: Planktonic Foraminifera (Rhizaria) are ubiquitous marine pelagic protists producing calcareous shells with conspicuous morphology. They play an important role in the marine carbon cycle and their exceptional fossil record serves as the basis for biochronostratigraphy and past climate reconstructions. A major worldwide sampling effort over the last two decades has resulted in the establishment of multiple large collections of cryopreserved individual planktonic Foraminifera samples. Thousands of 18S rDNA partial sequences have been generated, representing all major known morphological taxa across their worldwide oceanic range. This comprehensive data coverage provides an opportunity to assess patterns of molecular ecology and evolution in a holistic way for an entire group of planktonic protists. We combined all available published and unpublished genetic data to build PFR², the Planktonic Foraminifera Ribosomal Reference database. The first version of the database includes 3,322 reference 18S rDNA sequences belonging to 32 out of the 47 known morphospecies of extant planktonic Foraminifera, collected from 460 oceanic stations. All sequences have been rigorously taxonomically curated using a six-rank annotation system fully resolved to the morphological species level and linked to a series of metadata. The PFR² website, available at, allows downloading the entire database or specific sections, as well as the identification of new planktonic Foraminiferal sequences. Its novel, fully documented curation process integrates advances in morphological and molecular taxonomy. It allows for an increase in its taxonomic resolution and assures that integrity is maintained by including a complete contingency tracking of annotations and assuring that the annotations remain internally consistent. This article is protected by copyright. All rights reserved.
    Molecular Ecology Resources 03/2015; 15(6). DOI:10.1111/1755-0998.12410 · 3.71 Impact Factor
    • "For the 'Super-group', 'Phylum' and 'Class' levels, the taxonomic framework of PhytoREF was derived from the PR2 database (; Guillou et al. 2013), which mainly follows a comprehensive recent classification framework of eukaryotes (Adl et al. 2012). The 'Family' and 'Order' levels of terrestrial, marine and freshwater micro- and macroalgae were based on the taxonomic classification system of the AlgaeBase database (Guiry & Guiry 2014; Guiry et al. 2014; http:// "
    ABSTRACT: Photosynthetic eukaryotes have a critical role as the main producers in most ecosystems of the biosphere. The ongoing environmental metabarcoding revolution opens the perspective for holistic eco-systems biological studies of these organisms, in particular the unicellular microalgae that often lack distinctive morphological characters and have complex life cycles. In order to interpret environmental sequences, metabarcoding necessarily relies on taxonomically-curated databases containing reference sequences of the targeted gene (or barcode) from identified organisms. To date, no such reference framework exists for photosynthetic eukaryotes. In this study, we built the PhytoREF database that contains 6,490 plastidial 16S rDNA reference sequences that originate from a large diversity of eukaryotes representing all known major photosynthetic lineages. We compiled 3,333 amplicon sequences available from public databases and 879 sequences extracted from plastidial genomes, and generated 411 novel sequences from cultured marine microalgal strains belonging to different eukaryotic lineages. 1,867 environmental Sanger 16S rDNA sequences were also included in the database. Stringent quality filtering and a phylogeny-based taxonomic classification were applied for each 16S rDNA sequence. The database mainly focuses on marine microalgae, but sequences from land plants (representing half of the PhytoREF sequences) and freshwater taxa were also included to broaden the applicability of PhytoREF to different aquatic and terrestrial habitats. PhytoREF, accessible via a web interface (, is a new resource in molecular ecology to foster the discovery, assessment and monitoring of the diversity of photosynthetic eukaryotes using high-throughput sequencing. This article is protected by copyright. All rights reserved.
    Molecular Ecology Resources 03/2015; 15(6). DOI:10.1111/1755-0998.12401 · 3.71 Impact Factor
    • "SSU rRNA gene sequences assembled via Cufflinks and those reconstructed with EMIRGE were classified using the Ribosome Database Project web server with an 80% cutoff for the lowest classification level (Wang et al., 2007). Unclassified sequences (generally Eukaryotic) were searched against the SILVA database (Pruesse et al., 2012), the Protist Ribosomal Reference Database (Guillou et al., 2013) and the non-redundant NCBI database using BLASTN. All sequences were then aligned using the SINA Aligner (Pruesse et al., 2012). "
    ABSTRACT: A fundamental question in microbial ecology relates to community structure, and how this varies across environment types. It is widely believed that some environments, such as those at very low pH, host simple communities based on the low number of taxa, possibly due to the extreme environmental conditions. However, most analyses of species richness have relied on methods that provide relatively low ribosomal RNA (rRNA) sampling depth. Here we used community transcriptomics to analyze the microbial diversity of natural acid mine drainage biofilms from the Richmond Mine at Iron Mountain, California. Our analyses target deep pools of rRNA gene transcripts recovered from both natural and laboratory-grown biofilms across varying developmental stages. In all, 91.8% of the ∼254 million Illumina reads mapped to rRNA genes represented in the SILVA database. Up to 159 different taxa, including Bacteria, Archaea and Eukaryotes, were identified. Diversity measures, ordination and hierarchical clustering separate environmental from laboratory-grown biofilms. In part, this is due to the much larger number of rare members in the environmental biofilms. Although Leptospirillum bacteria generally dominate biofilms, we detect a wide variety of other Nitrospira organisms present at very low abundance. Bacteria from the Chloroflexi phylum were also detected. The results indicate that the primary characteristic that has enabled prior extensive cultivation-independent 'omic' analyses is not simplicity but rather the high dominance by a few taxa. We conclude that a much larger variety of organisms than previously thought have adapted to this extreme environment, although only few are selected for at any one time.
    The ISME Journal 11/2014; 9(4). DOI:10.1038/ismej.2014.200 · 9.30 Impact Factor