TaxMan: a server to trim rRNA reference databases and inspect taxonomic coverage

Department of Preventive Dentistry, Academic Centre for Dentistry Amsterdam (ACTA), University of Amsterdam and VU University Amsterdam, Amsterdam, The Netherlands.
Nucleic Acids Research (Impact Factor: 8.81). 05/2012; 40(Web Server issue):W82-7. DOI: 10.1093/nar/gks418
Source: PubMed

ABSTRACT Amplicon sequencing of the hypervariable regions of the small subunit ribosomal RNA gene is a widely accepted method for identifying the members of complex bacterial communities. Several rRNA gene sequence reference databases can be used to assign taxonomic names to the sequencing reads using BLAST, USEARCH, GAST or the RDP classifier. Next-generation sequencing methods produce ample reads, but they are short, currently ∼100-450 nt (depending on the technology), as compared to the full rRNA gene of ∼1550 nt. It is important, therefore, to select the right rRNA gene region for sequencing. The primers should amplify the species of interest and the hypervariable regions should differentiate their taxonomy. Here, we introduce TaxMan: a web-based tool that trims reference sequences based on user-selected primer pairs and returns an assessment of the primer specificity by taxa. It allows interactive plotting of taxa, both amplified and missed in silico by the primers used. Additionally, using the trimmed sequences improves the speed of sequence matching algorithms. The smaller database greatly improves run times (up to 98%) and memory usage, not only of similarity searching (BLAST), but also of chimera checking (UCHIME) and of clustering the reads (UCLUST). TaxMan is available at
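The trimming and coverage idea described in the abstract can be illustrated with a minimal Python sketch. It is not TaxMan's implementation, which is not described on this page. It assumes exact matching of IUPAC-degenerate primers with no mismatches allowed, and the function names (`trim_amplicon`, `coverage_by_taxon`) and the short primer strings in the demo are hypothetical.

```python
import re
from collections import defaultdict

# IUPAC nucleotide codes mapped to regex character classes (U treated as T).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def primer_to_regex(primer):
    """Compile a (possibly degenerate) primer into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in primer.upper()))


def reverse_complement(seq):
    """Reverse-complement a DNA string, including IUPAC ambiguity codes."""
    return seq.upper().replace("U", "T").translate(COMPLEMENT)[::-1]


def trim_amplicon(seq, forward, reverse):
    """Return the region between the two primers (primers excluded).

    Returns None when either primer has no exact match, i.e. the
    sequence is 'missed' in silico under this simplified model.
    """
    seq = seq.upper().replace("U", "T")
    fwd = primer_to_regex(forward).search(seq)
    if fwd is None:
        return None
    # The reverse primer anneals to the opposite strand, so search for
    # its reverse complement downstream of the forward primer site.
    rev = primer_to_regex(reverse_complement(reverse)).search(seq, fwd.end())
    if rev is None:
        return None
    return seq[fwd.end():rev.start()]


def coverage_by_taxon(records, forward, reverse):
    """Map each taxon to (amplified, total) over (taxon, sequence) pairs."""
    counts = defaultdict(lambda: [0, 0])
    for taxon, seq in records:
        counts[taxon][1] += 1
        if trim_amplicon(seq, forward, reverse) is not None:
            counts[taxon][0] += 1
    return {taxon: tuple(c) for taxon, c in counts.items()}


if __name__ == "__main__":
    # Toy sequences and primers for illustration only.
    demo = [
        ("Firmicutes", "AAGTGCCAGTTTTGTAGTCCAA"),
        ("Firmicutes", "AAAAAAGTAGTCC"),
        ("Bacteroidetes", "GTGTCAGCCGTAGTCC"),
    ]
    for taxon, (hit, total) in coverage_by_taxon(demo, "GTGYCAG", "GGACTAC").items():
        print(f"{taxon}: {hit}/{total} amplified")
```

A real primer evaluation would normally tolerate a small number of mismatches and would work on full reference databases. The sketch only shows why a trimmed reference is much smaller than the full-length gene and how per-taxon coverage can be tallied.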

  • ABSTRACT: Whole genome amplification methods facilitate the detection and characterization of microbial communities in low biomass environments. We examined the extent to which the actual community structure is reliably revealed and factors contributing to bias. One widely used (MDA - Multiple Displacement Amplification) and one new primer-free method (pWGA - primase-based Whole Genome Amplification) were compared using a PCR-based method as control. Pyrosequencing of an environmental sample and Principal Component Analysis revealed that MDA impacted community profiles more strongly than pWGA and indicated that this related to species GC content, although an influence of DNA integrity could not be excluded. Subsequently, biases by species GC content, DNA integrity and fragment size were separately analysed using defined mixtures of DNA from various species. We found significantly less amplification of species with the highest GC content for MDA-based templates, and to a lesser extent for pWGA. DNA fragmentation also interfered severely: species with more fragmented DNA were less amplified with MDA and pWGA. pWGA was unable to amplify low molecular weight DNA (<1.5 kb), whereas MDA was inefficient. We conclude that pWGA is the most promising method for characterization of microbial communities in low-biomass environments and for currently planned astrobiological missions to Mars.
    Environmental Microbiology 12/2013; DOI:10.1111/1462-2920.12365 · 6.24 Impact Factor
  • ABSTRACT: 16S rDNA pyrosequencing is a powerful approach that requires extensive usage of computational methods for delineating microbial compositions. Previously, it was shown that outcomes of studies relying on this approach vastly depend on the choice of pre-processing and clustering algorithms used. However, obtaining insights into the effects and accuracy of these algorithms is challenging due to difficulties in generating samples of known composition with high enough diversity. Here, we employ in silico microbial datasets to better understand how the experimental data are transformed into taxonomic clusters by computational methods. We were able to qualitatively replicate the raw experimental pyrosequencing data after rigorous adjustments on existing simulation software. This allowed us to simulate datasets of real-life complexity, which we used to assess the influence and performance of two widely used pre-processing methods along with eleven clustering algorithms. We show that the choice, order and mode of the pre-processing methods have a larger impact on the accuracy of the clustering pipeline than the clustering methods themselves. Without pre-processing, the difference between the performances of clustering methods is large. Depending on the clustering algorithm, the most optimal analysis pipeline resulted in significant underestimations of the expected number of clusters (min. 3.4%, max. 13.6%), allowing us to make quantitative estimations of the bacterial complexity of real microbiome samples. Supplementary data are available at Bioinformatics online. The simulated datasets are available via,
    Bioinformatics 02/2014; 30(11). DOI:10.1093/bioinformatics/btu085 · 4.62 Impact Factor
  • ABSTRACT: The leaf microbiome is influenced by both biotic and abiotic factors. Currently, we know little about the relative importance of these factors in determining microbiota composition and dynamics. To explore this issue, we collected weekly leaf samples over a 98-day growing season from multiple cultivars of common bean, soybean, and canola planted at three locations in Ontario, Canada, and performed Illumina-based microbiome analysis. We find that the leaf microbiota at the beginning of the season is very strongly influenced by the soil microbiota but, as the season progresses, it differentiates, becomes significantly less diverse, and transitions to having a greater proportion of leaf-specific taxa that are shared among all samples. A PICRUSt (phylogenetic investigation of communities by reconstruction of unobserved states) imputation of microbiome function inferred from the taxonomic data found significant differences between the soil and leaf microbiome, with a significant enrichment of motility gene categories in the former and metabolic gene categories in the latter. A network co-occurrence analysis identified two highly connected clusters as well as subclusters of putative pathogens and growth-promoting bacteria. These data reveal some of the complex ecological dynamics that occur in microbial communities over the course of a growing season and highlight the importance of community succession.
