Schloss PD, Gevers D, Westcott SL. Reducing the effects of PCR amplification and sequencing artifacts on 16S rRNA-based studies. PLoS One 6: e27310

Department of Microbiology & Immunology, University of Michigan, Ann Arbor, Michigan, United States of America.
PLoS ONE (Impact Factor: 3.23). 12/2011; 6(12):e27310. DOI: 10.1371/journal.pone.0027310
Source: PubMed

ABSTRACT The advent of next generation sequencing has coincided with a growth in interest in using these approaches to better understand the role of the structure and function of the microbial communities in human, animal, and environmental health. Yet, use of next generation sequencing to perform 16S rRNA gene sequence surveys has resulted in considerable controversy surrounding the effects of sequencing errors on downstream analyses. We analyzed 2.7×10^6 reads distributed among 90 identical mock community samples, which were collections of genomic DNA from 21 different species with known 16S rRNA gene sequences; we observed an average error rate of 0.0060. To improve this error rate, we evaluated numerous methods of identifying bad sequence reads, identifying regions within reads of poor quality, and correcting base calls and were able to reduce the overall error rate to 0.0002. Implementation of the PyroNoise algorithm provided the best combination of error rate, sequence length, and number of sequences. Perhaps more problematic than sequencing errors was the presence of chimeras generated during PCR. Because we knew the true sequences within the mock community and the chimeras they could form, we identified 8% of the raw sequence reads as chimeric. After quality filtering the raw sequences and using the Uchime chimera detection program, the overall chimera rate decreased to 1%. The chimeras that could not be detected were largely responsible for the identification of spurious operational taxonomic units (OTUs) and genus-level phylotypes. The number of spurious OTUs and phylotypes increased with sequencing effort indicating that comparison of communities should be made using an equal number of sequences. Finally, we applied our improved quality-filtering pipeline to several benchmarking studies and observed that even with our stringent data curation pipeline, biases in the data generation pipeline and batch effects were observed that could potentially confound the interpretation of microbial community data.
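
The abstract's recommendation to compare communities using an equal number of sequences can be illustrated with a small subsampling sketch. This is a minimal illustration, not the authors' pipeline or mothur's implementation; the function names `rarefy` and `observed_otus`, the dictionary layout, and the example counts are all assumptions made for this example.

```python
import random
from collections import Counter


def rarefy(otu_counts, depth, seed=0):
    """Draw `depth` reads without replacement from one sample's OTU counts.

    `otu_counts` maps a (hypothetical) OTU label to its read count.
    """
    total = sum(otu_counts.values())
    if depth > total:
        raise ValueError(f"cannot draw {depth} reads from a sample of {total}")
    # Expand the counts into one entry per read, then sample without replacement.
    pool = [otu for otu, n in otu_counts.items() for _ in range(n)]
    rng = random.Random(seed)
    return Counter(rng.sample(pool, depth))


def observed_otus(otu_counts):
    """Number of OTUs with at least one read."""
    return sum(1 for n in otu_counts.values() if n > 0)


if __name__ == "__main__":
    # Made-up counts: two samples sequenced to different depths.
    deep = {"OTU1": 500, "OTU2": 300, "OTU3": 195, "OTU4": 3, "OTU5": 2}
    shallow = {"OTU1": 50, "OTU2": 30, "OTU3": 20}
    depth = min(sum(deep.values()), sum(shallow.values()))
    for name, sample in (("deep", deep), ("shallow", shallow)):
        sub = rarefy(sample, depth)
        print(name, observed_otus(sample), observed_otus(sub))
```

Subsampling every sample to the depth of the smallest one means that rare, possibly spurious OTUs have the same chance of being observed in each sample, so richness estimates are not inflated simply because one sample was sequenced more deeply.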

Available from: Dirk Gevers, Sep 27, 2015
  • Source
    • "Those that did not match the primer sequences, were less than 200 bp, or contained any ambiguities were excluded from further analysis. For phylotype analyses, the remaining sequences were denoised (Schloss et al., 2011) and aligned against the SILVA bacterial and archaeal 16S rRNA gene database (Release 115) (Pruesse et al., 2007) in mothur. "
    Soil Biology and Biochemistry 10/2015; 89:238-247. · 3.93 Impact Factor
  • Source
    • "Mendoza et al. (2015) provide a useful overview for navigating these challenges. From our own experience applying HTS to assay macrobial eDNA, we recommend using negative controls with carrier biomaterial (Xu et al. 2009) and positive controls with mock community DNA (Schloss et al. 2011). Different perspectives are still emerging about how control data should be used to 'correct' co-sequenced eDNA data (Nguyen et al. 2015), but the transparent and self-critical cognitive approach recommended by Gilbert et al. (2005) for ancient DNA is highly applicable to all eDNA studies, particularly those with conservation applications where substantial economic and legal consequences may result (Kelly 2014). "
    ABSTRACT: Environmental DNA (eDNA) refers to the genetic material that can be extracted from bulk environmental samples such as soil, water, and even air. The rapidly expanding study of eDNA has generated unprecedented ability to detect species and conduct genetic analyses for conservation, management, and research, particularly in scenarios where collection of whole organisms is impractical or impossible. While the number of studies demonstrating successful eDNA detection has increased rapidly in recent years, less research has explored the "ecology" of eDNA—myriad interactions between extraorganismal genetic material and its environment—and its influence on eDNA detection, quantification, analysis, and application to conservation and research. Here, we outline a framework for understanding the ecology of eDNA, including the origin, state, transport, and fate of extraorganismal genetic material. Using this framework, we review and synthesize the findings of eDNA studies from diverse environments, taxa, and fields of study to highlight important concepts and knowledge gaps in eDNA study and application. Additionally, we identify frontiers of conservation-focused eDNA application where we see the most potential for growth, including the use of eDNA for estimating population size, population genetic and genomic analyses via eDNA, inclusion of other indicator biomolecules such as environmental RNA or proteins, automated sample collection and analysis, and consideration of an expanded array of creative environmental samples. We discuss how a more complete understanding of the ecology of eDNA is integral to advancing these frontiers and maximizing the potential of future eDNA applications in conservation and research.
  • Source
    • "The amplicons were multiplexed and pyrosequenced using 454 FLX-Titanium technology at the J. Craig Venter Institute (Rockville, MD, USA). After removing low-quality sequences (at least Q30, sequences shorter than 200 nt., sequences with homopolymers longer than six nucleotides and sequences containing ambiguous base calls or incorrect primer sequences) with custom-Perl scripts (see Supplemental Material), reads were processed using the online tool mothur and its standard 454 SOP (Schloss et al. 2011; ). "
    Dataset: final
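
The read-level criteria quoted in the excerpts above (minimum length of 200 nt, no ambiguous base calls, no homopolymer longer than six nucleotides, correct primer sequence) can be sketched as a simple filter. This is an illustrative sketch only: it is not the custom Perl scripts or the mothur commands those studies used, it omits the quality-score criterion, and the function name `passes_filter`, the constants, and the exact-match primer check are assumptions made for this example.

```python
import re

MIN_LENGTH = 200      # reads shorter than this are discarded
MAX_HOMOPOLYMER = 6   # runs longer than this are discarded

# One base followed by at least MAX_HOMOPOLYMER copies of itself,
# i.e. a run of MAX_HOMOPOLYMER + 1 or more identical bases.
_LONG_HOMOPOLYMER = re.compile(r"(.)\1{%d,}" % MAX_HOMOPOLYMER)
_AMBIGUOUS = re.compile(r"[^ACGT]")


def passes_filter(seq, primer):
    """Return True if a read meets all four criteria.

    Assumes (for illustration) that the primer must match the start of
    the read exactly; real pipelines often allow a few mismatches.
    """
    seq = seq.upper()
    if not seq.startswith(primer.upper()):
        return False
    if len(seq) < MIN_LENGTH:
        return False
    if _AMBIGUOUS.search(seq):
        return False
    if _LONG_HOMOPOLYMER.search(seq):
        return False
    return True


if __name__ == "__main__":
    primer = "ACGT"  # placeholder primer, not a real 16S primer
    read = primer + "ACGTTGCA" * 30
    print(passes_filter(read, primer))
    print(passes_filter(read[:50] + "N" + read[51:], primer))
```

Each check is independent, so a per-criterion tally of rejected reads can be added easily when reporting how many sequences each rule removes.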