TaxMan: A server to trim rRNA reference databases and inspect taxonomic coverage

Department of Preventive Dentistry, Academic Centre for Dentistry Amsterdam (ACTA), University of Amsterdam and VU University Amsterdam, Amsterdam, The Netherlands.
Nucleic Acids Research (Impact Factor: 9.11). 05/2012; 40(Web Server issue):W82-7. DOI: 10.1093/nar/gks418
Source: PubMed


Amplicon sequencing of the hypervariable regions of the small subunit ribosomal RNA gene is a widely accepted method for identifying
the members of complex bacterial communities. Several rRNA gene sequence reference databases can be used to assign taxonomic
names to the sequencing reads using BLAST, USEARCH, GAST or the RDP classifier. Next-generation sequencing methods produce
ample reads, but they are short, currently ∼100–450 nt (depending on the technology), as compared to the full rRNA gene of
∼1550 nt. It is important, therefore, to select the right rRNA gene region for sequencing. The primers should amplify the
species of interest and the hypervariable regions should differentiate their taxonomy. Here, we introduce TaxMan: a web-based
tool that trims reference sequences based on user-selected primer pairs and returns an assessment of the primer specificity
by taxa. It allows interactive plotting of the taxa that are amplified and those that are missed in silico by the primers used. Additionally,
using the trimmed sequences speeds up sequence matching algorithms. The smaller database greatly reduces run times (by up to 98%)
and memory usage, not only of similarity searching (BLAST), but also of chimera
checking (UCHIME) and of clustering the reads (UCLUST). TaxMan is available at
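
The trimming and coverage assessment described in the abstract can be illustrated with a minimal sketch. This is not TaxMan's implementation: it assumes exact primer matching with IUPAC degenerate bases (no mismatches allowed), excludes the primer sites from the trimmed amplicon, and uses hypothetical function names (`trim_to_amplicon`, `coverage_by_taxon`) and made-up toy sequences chosen only for illustration.

```python
import re
from collections import defaultdict

# IUPAC nucleotide codes expanded to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def normalize(seq):
    """Upper-case a sequence and write RNA 'U' as DNA 'T'."""
    return seq.upper().replace("U", "T")


def reverse_complement(seq):
    """Reverse complement that also handles degenerate IUPAC bases."""
    return normalize(seq).translate(COMPLEMENT)[::-1]


def primer_to_regex(primer):
    """Turn a (possibly degenerate) primer into a regex pattern."""
    return "".join(IUPAC[base] for base in normalize(primer))


def trim_to_amplicon(seq, forward, reverse):
    """Return the region between the primer sites, or None if either
    primer has no exact match (assumption: no mismatches tolerated)."""
    seq = normalize(seq)
    fwd = re.search(primer_to_regex(forward), seq)
    if fwd is None:
        return None
    downstream = seq[fwd.end():]
    # The reverse primer anneals to the opposite strand, so search for
    # its reverse complement downstream of the forward primer site.
    rev = re.search(primer_to_regex(reverse_complement(reverse)), downstream)
    if rev is None:
        return None
    return downstream[:rev.start()]


def coverage_by_taxon(references, forward, reverse):
    """references: iterable of (taxon, sequence) pairs.
    Returns {taxon: (amplified_in_silico, total)}."""
    counts = defaultdict(lambda: [0, 0])
    for taxon, seq in references:
        counts[taxon][1] += 1
        if trim_to_amplicon(seq, forward, reverse) is not None:
            counts[taxon][0] += 1
    return {taxon: tuple(c) for taxon, c in counts.items()}


if __name__ == "__main__":
    # Toy sequences and primers, invented for this example.
    refs = [
        ("TaxonA", "GGACGTAAATTTCCAAGG"),
        ("TaxonA", "TTTTTTTTTTTT"),
        ("TaxonB", "ACGCGGGGCCAA"),
    ]
    print(coverage_by_taxon(refs, "ACGY", "TTGG"))
```

A real tool would typically tolerate a small number of primer mismatches and would work on aligned reference databases; the exact-match regex here is only the simplest way to show how one primer pair yields both a trimmed database and a per-taxon tally of amplified versus missed sequences.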



Available from: Egija Zaura
  • Source
    • "Although it is common to exclude bacteria containing a relative prevalence less than 1% of a microbial community (Jervis-Bardy et al., 2015), organisms at very low prevalence are capable of causing chronic disease (Silva et al., 2015) and it may be erroneous to perform such arbitrary exclusions. Furthermore, most primer sets do not detect or poorly detect certain bacteria (Brandt et al., 2012) such that an organism designated as low prevalence may be a predominant or otherwise significant microbe. Thus, in order to detect the full spectrum of microbes inhabiting a complex microenvironment, it may be necessary to perform several deep sequencing runs on the same sample using different universal primers. "
    ABSTRACT: Microbial metagenomics is hindered in clinical tissue samples by the large amount of human DNA relative to microbial DNA, which acts as a competitive inhibitor of downstream applications. We evaluated the LOOXSTER® Enrichment Kit to separate eukaryotic and prokaryotic DNA in submucosal intestinal tissue samples having a low microbial biomass and to determine the effects of enrichment on 16S rRNA microbiota sequencing. The enrichment kit reduced the amount of human DNA in the samples by 40-70%, resulting in a 3.5-fold increase in the number of 16S bacterial gene sequences detected on the Illumina MiSeq platform. This increase was accompanied by the detection of 41 additional bacterial genera and 94 tentative species. The additional bacterial taxa detected accounted for as much as 25% of the total bacterial population, which significantly altered the relative prevalence and composition of the intestinal microbiota. The ability to reduce the competitive inhibition created by human DNA and the concentration of bacterial DNA may allow metagenomics to be performed on complex tissues containing a low bacterial biomass.
    Full-text · Article · Nov 2015 · Journal of microbiological methods
  • Source
    • "The minimum confidence was set at 0.8. For taxonomy assignment the SILVA rRNA database [49] was trimmed to span the targeted hypervariable regions V5–V7 as described by Brandt et al. [50]. The taxonomy assigned OTUs were aligned using PyNAST [51]. "
    ABSTRACT: Currently there are no evidence-based ecological measures for prevention of overgrowth and subsequent infection by fungi in the oral cavity. The aim of this study was to increase our knowledge on fungal-bacterial ecological interactions. Salivary Candida abundance of 82 Dutch adults aged 58-80 years was established relative to the bacterial load by quantitative PCR analysis of the Internal Transcribed Spacer (ITS) region (Candida) and 16S rDNA gene (bacteria). The salivary microbiome was assessed using barcoded pyrosequencing of the bacterial hypervariable regions V5-V7 of 16S rDNA. Sequencing data were preprocessed by denoising and chimera removal, clustered in Operational Taxonomic Units (OTUs) and assigned to taxonomy. Both OTU-based (PCA, diversity statistics) and phylogeny-based analyses (UniFrac, PCoA) were performed. Saliva of Dutch older adults contained 0-4 × 10^8 CFU/mL Candida with a median Candida load of 0.06%. With increased Candida load the diversity of the salivary microbiome decreased significantly (p<0.001). Increase in the Candida load correlated positively with class Bacilli, and negatively with classes Fusobacteria, Flavobacteria, and Bacteroidia. Microbiomes with high Candida load were less diverse and had a distinct microbial composition, dominated by saccharolytic and acidogenic bacteria (streptococci). The control of the acidification of the oral environment may be a potential preventive measure for Candida outgrowth that should be evaluated in longitudinal clinical intervention trials.
    Full-text · Article · Aug 2012 · PLoS ONE
  • Source
    ABSTRACT: Motivation: Massively parallel sequencing allows for rapid sequencing of large numbers of sequences in just a single run. Thus, 16S ribosomal RNA (rRNA) amplicon sequencing of complex microbial communities has become possible. The sequenced 16S rRNA fragments (reads) are clustered into operational taxonomic units and taxonomic categories are assigned. Recent reports suggest that data pre-processing should be performed before clustering. We assessed combinations of data pre-processing steps and clustering algorithms on cluster accuracy for oral microbial sequence data. Results: The number of clusters varied up to two orders of magnitude depending on pre-processing. Pre-processing using both denoising and chimera checking resulted in a number of clusters that was closest to the number of species in the mock dataset (25 versus 15). Based on run time, purity and normalized mutual information, we could not identify a single best clustering algorithm. The differences in clustering accuracy among the algorithms after the same pre-processing were minor compared with the differences in accuracy among different pre-processing steps. Supplementary information: Supplementary data are available at Bioinformatics online. Contact: or
    Full-text · Article · Sep 2012 · Bioinformatics