Article

The Protist Ribosomal Reference database (PR2): a catalog of unicellular eukaryote Small Sub-Unit rRNA sequences with curated taxonomy

CNRS, UMR 7144, Adaptation et Diversité en Milieu Marin, 29682 Roscoff, France, UPMC Université Paris 06, UMR 7144, Station Biologique de Roscoff, 29682 Roscoff, France, CNRS, UMR 7138, Systématique Adaptation Evolution, Parc Valrose, BP71. F06108 Nice cedex 02, France, UMR 7138, Université de Nice-Sophia Antipolis, Systématique Adaptation Evolution, Parc Valrose, BP71. F06108 Nice cedex 02, France, Department of Life Sciences, The Natural History Museum, Cromwell Road, London SW7 5BD, UK, Laboratoire Universitaire de Biodiversité et Ecologie Microbienne (EA3882), ESMISAB, Technopôle Brest-Iroise, 29280 Plouzané, France, Department of Marine Biology and Oceanography, Institut de Ciències del Mar (CSIC), Barcelona, Catalonia, Spain, Laboratoire d'Océanographie de Villefranche, Marine Microbial Ecology, UPMC Université Paris 06 et CNRS, UMR7093, Station Zoologique, BP28, 06230 Villefranche-sur-Mer, France, Department of Ecology, University of Kaiserslautern, 67663 Kaiserslautern, Germany, Department of Biology, University of Oslo, Marine Biology, NO-0316 Oslo, Norway, Department of Genetics and Evolution, University of Geneva, Switzerland, Stazione Zoologica Anton Dohrn, Villa Comunale, 80121 Naples, Italy, Laboratory of Soil Biology, University of Neuchâtel, Rue Emile Argand 11, CH-2000, Neuchâtel, Switzerland, UPMC Université Paris 06, 29682 Roscoff, France, CNRS, FR2424, Station Biologique de Roscoff, 29682 Roscoff, France, Ifremer, Centre de Brest, DYNECO/Pelagos BP70, 29280 Plouzané, France and Point Compétence Informatique, Rue Jean-Baptiste Say, 56850 Caudan, France.
Nucleic Acids Research (Impact Factor: 9.11). 11/2012; 41(Database issue):D597-D604. DOI: 10.1093/nar/gks1160
Source: PubMed

ABSTRACT

The interrogation of genetic markers in environmental meta-barcoding studies is currently seriously hindered by the lack of taxonomically curated reference data sets for the targeted genes. The Protist Ribosomal Reference database (PR2, http://ssu-rrna.org/) provides unique access to eukaryotic small sub-unit (SSU) ribosomal RNA and DNA sequences with curated taxonomy. The database consists mainly of nuclear-encoded protistan sequences. However, metazoans, land plants, macrosporic fungi and eukaryotic organelles (mitochondrion, plastid and others) are also included because they are useful for the analysis of high-throughput sequencing data sets. Introns and putative chimeric sequences have also been carefully checked. The taxonomic assignment of each sequence consists of eight unique taxonomic fields. In total, 136 866 sequences are nuclear encoded and 45 708 (36 501 mitochondrial and 9657 chloroplastic) are from organelles, the remainder being putative chimeric sequences. The website allows users to download sequences from the entire database or from subsets of it (including representative sequences after clustering at a given level of similarity). Web tools also allow searches by sequence similarity. The presence of both rRNA and rDNA sequences, the handling of introns (crucial for eukaryotic sequences), a normalized eight-rank taxonomy and updates with new GenBank releases were made possible by a long-term collaboration between experts in taxonomy and computer scientists.

Available from: Enrique Lara
  • Source
    • "To illustrate over- and under-grouping of amplicons, the importance of the breaking phase, high-resolution clustering, and Swarm's ability to visualize OTUs' internal structures, we used 18S rRNA amplicon data from the BioMarKs consortium (Logares et al., 2014) that sampled European near-shore marine sites. The PR2 v203 reference database was used for taxonomic assignment (Guillou et al., 2013). The full methods can be found online in html format (File S1)."
    ABSTRACT: Previously we presented Swarm v1, a novel and open source amplicon clustering program that produced fine-scale molecular operational taxonomic units (OTUs), free of arbitrary global clustering thresholds and input-order dependency. Swarm v1 worked with an initial phase that used iterative single-linkage with a local clustering threshold (d), followed by a phase that used the internal abundance structures of clusters to break chained OTUs. Here we present Swarm v2, which has two important novel features: (1) a new algorithm for d = 1 that allows the computation time of the program to scale linearly with increasing amounts of data; and (2) the new fastidious option that reduces under-grouping by grafting low abundant OTUs (e.g., singletons and doubletons) onto larger ones. Swarm v2 also directly integrates the clustering and breaking phases, dereplicates sequencing reads with d = 0, outputs OTU representatives in fasta format, and plots individual OTUs as two-dimensional networks.
    Full-text · Article · Dec 2015 · PeerJ
  • Source
    • "Because of their extreme morphological and behavioural diversity, the study of even relatively narrow lineages requires a high degree of taxonomic expertise (e.g. Guillou et al. 2012; Pawlowski and Holzmann, 2014). As a result, the knowledge of protistan ecology and evolution is limited by the small number of taxonomists, resulting in scarcity of taxonomically well-resolved ecological data. "
    ABSTRACT: Planktonic Foraminifera (Rhizaria) are ubiquitous marine pelagic protists producing calcareous shells with conspicuous morphology. They play an important role in the marine carbon cycle and their exceptional fossil record serves as the basis for biochronostratigraphy and past climate reconstructions. A major worldwide sampling effort over the last two decades has resulted in the establishment of multiple large collections of cryopreserved individual planktonic Foraminifera samples. Thousands of 18S rDNA partial sequences have been generated, representing all major known morphological taxa across their worldwide oceanic range. This comprehensive data coverage provides an opportunity to assess patterns of molecular ecology and evolution in a holistic way for an entire group of planktonic protists. We combined all available published and unpublished genetic data to build PFR², the Planktonic Foraminifera Ribosomal Reference database. The first version of the database includes 3,322 reference 18S rDNA sequences belonging to 32 out of the 47 known morphospecies of extant planktonic Foraminifera, collected from 460 oceanic stations. All sequences have been rigorously taxonomically curated using a six-rank annotation system fully resolved to the morphological species level and linked to a series of metadata. The PFR² website, available at http://pfr2.sb-roscoff.fr, allows downloading the entire database or specific sections, as well as the identification of new planktonic Foraminiferal sequences. Its novel, fully documented curation process integrates advances in morphological and molecular taxonomy. It allows for an increase in its taxonomic resolution and assures that integrity is maintained by including a complete contingency tracking of annotations and assuring that the annotations remain internally consistent. This article is protected by copyright. All rights reserved.
    Full-text · Article · Mar 2015 · Molecular Ecology Resources
  • Source
    • "For the 'Super-group', 'Phylum' and 'Class' levels, the taxonomic framework of PhytoREF was derived from the PR2 database (http://ssu-rrna.org/; Guillou et al. 2013), which mainly follows a comprehensive recent classification framework of eukaryotes (Adl et al. 2012). The 'Family' and 'Order' levels of terrestrial, marine and freshwater micro- and macroalgae were based on the taxonomic classification system of the AlgaeBase database (Guiry & Guiry 2014; Guiry et al. 2014; http://www.algaebase.org/)."
    ABSTRACT: Photosynthetic eukaryotes have a critical role as the main producers in most ecosystems of the biosphere. The ongoing environmental metabarcoding revolution opens the perspective for holistic eco-systems biological studies of these organisms, in particular the unicellular microalgae that often lack distinctive morphological characters and have complex life cycles. In order to interpret environmental sequences, metabarcoding necessarily relies on taxonomically-curated databases containing reference sequences of the targeted gene (or barcode) from identified organisms. To date, no such reference framework exists for photosynthetic eukaryotes. In this study, we built the PhytoREF database that contains 6,490 plastidial 16S rDNA reference sequences that originate from a large diversity of eukaryotes representing all known major photosynthetic lineages. We compiled 3,333 amplicon sequences available from public databases and 879 sequences extracted from plastidial genomes, and generated 411 novel sequences from cultured marine microalgal strains belonging to different eukaryotic lineages. 1,867 environmental Sanger 16S rDNA sequences were also included in the database. Stringent quality filtering and a phylogeny-based taxonomic classification were applied for each 16S rDNA sequence. The database mainly focuses on marine microalgae, but sequences from land plants (representing half of the PhytoREF sequences) and freshwater taxa were also included to broaden the applicability of PhytoREF to different aquatic and terrestrial habitats. PhytoREF, accessible via a web interface (http://phytoref.org), is a new resource in molecular ecology to foster the discovery, assessment and monitoring of the diversity of photosynthetic eukaryotes using high-throughput sequencing. This article is protected by copyright. All rights reserved.
    Full-text · Article · Mar 2015 · Molecular Ecology Resources