TaxMan: A server to trim rRNA reference databases and inspect taxonomic coverage

Department of Preventive Dentistry, Academic Centre for Dentistry Amsterdam (ACTA), University of Amsterdam and VU University Amsterdam, Amsterdam, The Netherlands.
Nucleic Acids Research (Impact Factor: 9.11). 05/2012; 40(Web Server issue):W82-7. DOI: 10.1093/nar/gks418
Source: PubMed


Amplicon sequencing of the hypervariable regions of the small subunit ribosomal RNA gene is a widely accepted method for identifying the members of complex bacterial communities. Several rRNA gene sequence reference databases can be used to assign taxonomic names to the sequencing reads using BLAST, USEARCH, GAST or the RDP classifier. Next-generation sequencing methods produce ample reads, but these are short, currently ∼100-450 nt (depending on the technology), compared with the full rRNA gene of ∼1550 nt. It is important, therefore, to select the right rRNA gene region for sequencing. The primers should amplify the species of interest and the hypervariable regions should differentiate their taxonomy. Here, we introduce TaxMan: a web-based tool that trims reference sequences based on user-selected primer pairs and returns an assessment of the primer specificity by taxa. It allows interactive plotting of the taxa that the chosen primers amplify in silico and of those they miss. Additionally, using the trimmed sequences improves the speed of sequence matching algorithms. The smaller database greatly reduces run times (by up to 98%) and memory usage, not only of similarity searching (BLAST), but also of chimera checking (UCHIME) and of clustering the reads (UCLUST). TaxMan is available at
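The trimming idea described in the abstract can be illustrated with a minimal sketch. This is not TaxMan's implementation; it assumes exact primer matches (with IUPAC degenerate bases but no mismatches), a reverse primer given 5'→3' as it would be ordered, and reference data supplied as (taxon, sequence) pairs. The function names `trim_amplicon` and `coverage_by_taxon` are hypothetical.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
# Complement table that also handles degenerate codes.
COMPLEMENT = str.maketrans("ACGTURYSWKMBDHVN", "TGCAAYRSWMKVHDBN")


def reverse_complement(seq):
    """Reverse complement of a (possibly degenerate) DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def primer_regex(primer):
    """Compile a primer with IUPAC codes into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in primer.upper()))


def trim_amplicon(seq, forward, reverse):
    """Return the region between the primers (primers excluded).

    Returns None when either primer has no exact match, i.e. the
    sequence is 'missed' in silico under this sketch's assumptions.
    """
    seq = seq.upper().replace("U", "T")
    fwd = primer_regex(forward).search(seq)
    if fwd is None:
        return None
    # The reverse primer anneals to the opposite strand, so look for
    # its reverse complement downstream of the forward primer.
    rev = primer_regex(reverse_complement(reverse)).search(seq, fwd.end())
    if rev is None:
        return None
    return seq[fwd.end():rev.start()]


def coverage_by_taxon(references, forward, reverse):
    """Map each taxon to (number amplified, total number of sequences)."""
    counts = {}
    for taxon, seq in references:
        hit, total = counts.get(taxon, (0, 0))
        amplified = trim_amplicon(seq, forward, reverse) is not None
        counts[taxon] = (hit + amplified, total + 1)
    return counts
```

Running `coverage_by_taxon` over a reference set gives, per taxon, how many sequences the primer pair would amplify under these assumptions, and the strings returned by `trim_amplicon` form the smaller, region-specific database. A real tool would typically also tolerate a limited number of primer mismatches.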


Available from: Egija Zaura
  • Source
    • "The minimum confidence was set at 0.8. For taxonomy assignment the SILVA rRNA database [49] was trimmed to span the targeted hypervariable regions V5–V7 as described by Brandt et al. [50]. The taxonomy assigned OTUs were aligned using PyNAST [51]."
    ABSTRACT: Currently there are no evidence-based ecological measures for prevention of overgrowth and subsequent infection by fungi in the oral cavity. The aim of this study was to increase our knowledge on fungal-bacterial ecological interactions. Salivary Candida abundance of 82 Dutch adults aged 58-80 years was established relative to the bacterial load by quantitative PCR analysis of the Internal Transcribed Spacer (ITS) region (Candida) and 16S rDNA gene (bacteria). The salivary microbiome was assessed using barcoded pyrosequencing of the bacterial hypervariable regions V5-V7 of 16S rDNA. Sequencing data was preprocessed by denoising and chimera removal, clustered in Operational Taxonomic Units (OTUs) and assigned to taxonomy. Both OTU-based (PCA, diversity statistics) and phylogeny-based analyses (UniFrac, PCoA) were performed. Saliva of Dutch older adults contained 0-4 × 10^8 CFU/mL Candida with a median Candida load of 0.06%. With increased Candida load the diversity of the salivary microbiome decreased significantly (p<0.001). Increase in the Candida load correlated positively with class Bacilli, and negatively with classes Fusobacteria, Flavobacteria, and Bacteroidia. Microbiomes with high Candida load were less diverse and had a distinct microbial composition, shifted towards dominance by saccharolytic and acidogenic bacteria (streptococci). The control of the acidification of the oral environment may be a potential preventive measure for Candida outgrowth that should be evaluated in longitudinal clinical intervention trials.
    PLoS ONE 08/2012; 7(8):e42770. DOI:10.1371/journal.pone.0042770 · 3.23 Impact Factor
  • Source
    ABSTRACT: Motivation: Massively parallel sequencing allows for rapid sequencing of large numbers of sequences in just a single run. Thus, 16S ribosomal RNA (rRNA) amplicon sequencing of complex microbial communities has become possible. The sequenced 16S rRNA fragments (reads) are clustered into operational taxonomic units and taxonomic categories are assigned. Recent reports suggest that data pre-processing should be performed before clustering. We assessed combinations of data pre-processing steps and clustering algorithms on cluster accuracy for oral microbial sequence data. Results: The number of clusters varied up to two orders of magnitude depending on pre-processing. Pre-processing using both denoising and chimera checking resulted in a number of clusters that was closest to the number of species in the mock dataset (25 versus 15). Based on run time, purity and normalized mutual information, we could not identify a single best clustering algorithm. The differences in clustering accuracy among the algorithms after the same pre-processing were minor compared with the differences in accuracy among different pre-processing steps. Supplementary information: Supplementary data are available at Bioinformatics online.
    Bioinformatics 09/2012; 28(22). DOI:10.1093/bioinformatics/bts552 · 4.98 Impact Factor
  • Source
    ABSTRACT: Whole genome amplification methods facilitate the detection and characterization of microbial communities in low biomass environments. We examined the extent to which the actual community structure is reliably revealed and factors contributing to bias. One widely used (MDA - Multiple Displacement Amplification) and one new primer-free method (pWGA - primase-based Whole Genome Amplification) were compared using a PCR-based method as control. Pyrosequencing of an environmental sample and Principal Component Analysis revealed that MDA impacted community profiles more strongly than pWGA and indicated that this related to species GC content, although an influence of DNA integrity could not be excluded. Subsequently, biases by species GC content, DNA integrity and fragment size were separately analysed using defined mixtures of DNA from various species. We found significantly less amplification of species with the highest GC content for MDA-based templates, and to a lesser extent for pWGA. DNA fragmentation also interfered severely: species with more fragmented DNA were less amplified with MDA and pWGA. pWGA was unable to amplify low molecular weight DNA (<1.5 kb), whereas MDA was inefficient. We conclude that pWGA is the most promising method for characterization of microbial communities in low-biomass environments and for currently planned astrobiological missions to Mars.
    Environmental Microbiology 12/2013; 16(3). DOI:10.1111/1462-2920.12365 · 6.20 Impact Factor