Reducing the Effects of PCR Amplification and Sequencing Artifacts on 16S rRNA-Based Studies

Department of Microbiology & Immunology, University of Michigan, Ann Arbor, Michigan, United States of America.
PLoS ONE (Impact Factor: 3.53). 12/2011; 6(12):e27310. DOI: 10.1371/journal.pone.0027310
Source: PubMed

ABSTRACT The advent of next generation sequencing has coincided with a growth in interest in using these approaches to better understand the role of the structure and function of the microbial communities in human, animal, and environmental health. Yet, use of next generation sequencing to perform 16S rRNA gene sequence surveys has resulted in considerable controversy surrounding the effects of sequencing errors on downstream analyses. We analyzed 2.7×10^6 reads distributed among 90 identical mock community samples, which were collections of genomic DNA from 21 different species with known 16S rRNA gene sequences; we observed an average error rate of 0.0060. To improve this error rate, we evaluated numerous methods of identifying bad sequence reads, identifying regions within reads of poor quality, and correcting base calls and were able to reduce the overall error rate to 0.0002. Implementation of the PyroNoise algorithm provided the best combination of error rate, sequence length, and number of sequences. Perhaps more problematic than sequencing errors was the presence of chimeras generated during PCR. Because we knew the true sequences within the mock community and the chimeras they could form, we identified 8% of the raw sequence reads as chimeric. After quality filtering the raw sequences and using the Uchime chimera detection program, the overall chimera rate decreased to 1%. The chimeras that could not be detected were largely responsible for the identification of spurious operational taxonomic units (OTUs) and genus-level phylotypes. The number of spurious OTUs and phylotypes increased with sequencing effort, indicating that comparison of communities should be made using an equal number of sequences. Finally, we applied our improved quality-filtering pipeline to several benchmarking studies and observed that even with our stringent data curation pipeline, biases in the data generation pipeline and batch effects were observed that could potentially confound the interpretation of microbial community data.
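The recommendation to compare communities using an equal number of sequences can be illustrated with a minimal subsampling sketch. This is not the authors' pipeline: the function name `rarefy`, the sample dictionaries, and the fixed random seed are all hypothetical, and the approach shown is plain random subsampling without replacement down to the depth of the shallowest sample.

```python
import random
from collections import Counter

def rarefy(otu_counts, depth, seed=0):
    """Randomly draw `depth` reads without replacement from a sample's OTU counts."""
    # Expand the count table into one entry per read.
    pool = [otu for otu, n in otu_counts.items() for _ in range(n)]
    if depth > len(pool):
        raise ValueError("depth exceeds the number of reads in the sample")
    rng = random.Random(seed)  # fixed seed only so the sketch is reproducible
    return Counter(rng.sample(pool, depth))

# Hypothetical samples with very different sequencing effort.
sample_a = {"OTU1": 500, "OTU2": 300, "OTU3": 5, "OTU4": 1}
sample_b = {"OTU1": 60, "OTU2": 40}

# Subsample both samples to the depth of the shallowest one.
depth = min(sum(sample_a.values()), sum(sample_b.values()))
rare_a = rarefy(sample_a, depth)
rare_b = rarefy(sample_b, depth)

print(depth, len(rare_a), len(rare_b))
```

After subsampling, OTU counts from the two samples are based on the same number of reads, so a deeper sample no longer accumulates extra rare (and possibly spurious) OTUs simply because it was sequenced more.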

Available from: Dirk Gevers, Jun 26, 2015
  • Source
    ABSTRACT: The elevational diversity pattern for microorganisms has received great attention recently but is still understudied, and phylogenetic relatedness is rarely studied for microbial elevational distributions. Using a bar-coded pyrosequencing technique, we examined the biodiversity patterns for soil bacterial communities of tundra ecosystem along 2000–2500 m elevations on Changbai Mountain in China. Bacterial taxonomic richness displayed a linear decreasing trend with increasing elevation. Phylogenetic diversity and mean nearest taxon distance (MNTD) exhibited a unimodal pattern with elevation. Bacterial communities were more phylogenetically clustered than expected by chance at all elevations based on the standardized effect size of MNTD metric. The bacterial communities differed dramatically among elevations, and the community composition was significantly correlated with soil total carbon (TC), total nitrogen, C:N ratio, and dissolved organic carbon. Multiple ordinary least squares regression analysis showed that the observed biodiversity patterns strongly correlated with soil TC and C:N ratio. Taken together, this is the first time that a significant bacterial diversity pattern has been observed across a small-scale elevational gradient. Our results indicated that soil carbon and nitrogen contents were the critical environmental factors affecting bacterial elevational distribution in Changbai Mountain tundra. This suggested that ecological niche-based environmental filtering processes related to soil carbon and nitrogen contents could play a dominant role in structuring bacterial communities along the elevational gradient.
  • Source
    ABSTRACT: Current limitations in culture-based methods have led to a reliance on culture-independent approaches, based principally on the comparative analysis of primary semantides such as ribosomal gene sequences. DNA can be remarkably stable in some environments, so its presence does not indicate live bacteria, but extracted ribosomal RNA (rRNA) has previously been viewed as an indicator of active cells. Stable isotope probing (SIP) involves the incorporation of heavy isotopes into newly synthesized nucleic acids, and can be used to separate newly synthesized from existing DNA or rRNA. H₂¹⁸O is currently the only potential universal bacterial substrate suitable for SIP of entire bacterial communities. The aim of our work was to compare soil bacterial community composition as revealed by total versus SIP-labeled DNA and rRNA. Soil was supplemented with H₂¹⁸O and after 38 days the DNA and RNA were co-extracted. Heavy nucleic acids were separated out by CsCl and CsTFA density centrifugation. The 16S rRNA gene pools were characterized by DGGE and pyrosequencing, and the sequence results were analyzed using mothur. The majority of DNA (~60%) and RNA (~75%) from the microcosms incubated with H₂¹⁸O were labeled by the isotope. The analysis indicated that total and active members of the same type of nucleic acid represented similar community structures, which suggested that most dominant OTUs in the total nucleic acid extracts contained active members. It also supported the view that H₂¹⁸O was an effective universal label for SIP of both DNA and RNA. DNA- and RNA-derived diversity was dissimilar. RNA from this soil recovered bacterial richness more comprehensively than DNA because the most abundant OTUs were less numerous in RNA-derived than in DNA-derived community data, and dominant OTU pools did not mask rare OTUs as much in RNA.
    04/2015; 4(2):208-219. DOI:10.1002/mbo3.230
  • Source
    ABSTRACT: DNA metabarcoding is a promising method for describing communities and estimating biodiversity. This approach uses high-throughput sequencing of targeted markers to identify species in a complex sample. By convention, sequences are clustered at a predefined sequence divergence threshold (often 3%) into operational taxonomic units (OTUs) that serve as a proxy for species. However, variable levels of interspecific marker variation across taxonomic groups make clustering sequences from a phylogenetically diverse dataset into OTUs at a uniform threshold problematic. In this study, we use mock zooplankton communities to evaluate the accuracy of species richness estimates when following conventional protocols to cluster hypervariable sequences of the V4 region of the small subunit ribosomal RNA gene (18S) into OTUs. By including individually tagged single specimens and “populations” of various species in our communities, we examine the impact of intra- and interspecific diversity on OTU clustering. Communities consisting of single individuals per species generated a correspondence of 59–84% between OTU number and species richness at a 3% divergence threshold. However, when multiple individuals per species were included, the correspondence between OTU number and species richness dropped to 31–63%. Our results suggest that intraspecific variation in this marker can often exceed 3%, such that a single species does not always correspond to one OTU. We advocate the need to apply group-specific divergence thresholds when analyzing complex and taxonomically diverse communities, but also encourage the development of additional filtering steps that allow identification of artifactual rRNA gene sequences or pseudogenes that may generate spurious OTUs.
    Ecology and Evolution 04/2015; 5(11). DOI:10.1002/ece3.1485 · 1.66 Impact Factor
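
The idea of clustering sequences into OTUs at a fixed divergence threshold (often 3%) can be sketched as follows. This is only an illustration under strong assumptions: sequences are taken to be pre-aligned and of equal length, divergence is a simple mismatch fraction, and the greedy centroid scheme and the names `divergence` and `greedy_otus` are hypothetical rather than the method used in any of the studies listed here.

```python
def divergence(a, b):
    """Fraction of mismatched positions between two equal-length aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)

def greedy_otus(seqs, threshold=0.03):
    """Assign each sequence to the first centroid within `threshold`, else start a new OTU."""
    centroids = []
    clusters = []
    for s in seqs:
        for i, c in enumerate(centroids):
            if divergence(s, c) <= threshold:
                clusters[i].append(s)
                break
        else:
            # No existing centroid is close enough: s seeds a new OTU.
            centroids.append(s)
            clusters.append([s])
    return clusters

# Hypothetical 100-base sequences.
ref = "ACGT" * 25
near = "TT" + ref[2:]      # 2 mismatches against ref (2% divergence)
far = "TTTTT" + ref[5:]    # 4 mismatches against ref (4% divergence)

otus = greedy_otus([ref, near, far])
print(len(otus))
```

In this toy input the 2%-divergent sequence joins the reference OTU while the 4%-divergent one forms its own, which is the behaviour that makes a uniform threshold sensitive to intraspecific variation.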