ABSTRACT: The nuclear ribosomal internal transcribed spacer (ITS) region is the primary choice for molecular identification of fungi. Its two highly variable spacers (ITS1 and ITS2) are usually species specific, whereas the intercalary 5.8S gene is highly conserved. For sequence clustering and blast searches, it is often advantageous to rely on either one of the variable spacers but not the conserved 5.8S gene. To identify and extract ITS1 and ITS2 from large taxonomic and environmental data sets is, however, often difficult, and many ITS sequences are incorrectly delimited in the public sequence databases. We introduce ITSx, a Perl-based software tool to extract ITS1, 5.8S and ITS2 – as well as full-length ITS sequences – from both Sanger and high-throughput sequencing data sets. ITSx uses hidden Markov models computed from large alignments of a total of 20 groups of eukaryotes, including fungi, metazoans and plants, and the sequence extraction is based on the predicted positions of the ribosomal genes in the sequences. ITSx has a very high proportion of true-positive extractions and a low proportion of false-positive extractions. Additionally, process parallelization permits expedient analyses of very large data sets, such as a one million sequence amplicon pyrosequencing data set. ITSx is rich in features and written to be easily incorporated into automated sequence analysis pipelines. ITSx paves the way for more sensitive blast searches and sequence clustering operations for the ITS region in eukaryotes. The software also permits elimination of non-ITS sequences from any data set. This is particularly useful for amplicon-based next-generation sequencing data sets, where insidious non-target sequences are often found among the target sequences. Such non-target sequences are difficult to find by other means and would contribute noise to diversity estimates if left in the data set.
Methods in Ecology and Evolution 10/2013; 4(10):914-919. · 5.92 Impact Factor
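The abstract above states that extraction is based on the predicted positions of the ribosomal genes. As a minimal sketch of that idea only, and not ITSx's actual Perl implementation, the Python below slices ITS1, 5.8S and ITS2 out of a sequence once the gene boundaries are known. The names `GeneBoundaries` and `extract_its_regions` are hypothetical, and the boundary coordinates are assumed to come from some upstream predictor (for example, HMM hits); that prediction step is not shown.

```python
from typing import Dict, NamedTuple


class GeneBoundaries(NamedTuple):
    """Hypothetical container for predicted gene positions (0-based, end-exclusive)."""
    ssu_end: int     # index one past the last base of the SSU (18S) gene
    s58_start: int   # index of the first base of the 5.8S gene
    s58_end: int     # index one past the last base of the 5.8S gene
    lsu_start: int   # index of the first base of the LSU (28S) gene


def extract_its_regions(seq: str, b: GeneBoundaries) -> Dict[str, str]:
    """Slice the ITS subregions out of seq using the predicted boundaries."""
    if not (0 <= b.ssu_end <= b.s58_start <= b.s58_end <= b.lsu_start <= len(seq)):
        raise ValueError("gene boundaries are out of order or outside the sequence")
    return {
        "ITS1": seq[b.ssu_end:b.s58_start],      # spacer between SSU and 5.8S
        "5.8S": seq[b.s58_start:b.s58_end],      # conserved intercalary gene
        "ITS2": seq[b.s58_end:b.lsu_start],      # spacer between 5.8S and LSU
        "full_ITS": seq[b.ssu_end:b.lsu_start],  # ITS1 + 5.8S + ITS2
    }
```

Keeping the coordinates in one validated structure makes it easy to reject sequences whose predicted genes appear out of order, which is one plausible signal of a non-ITS or anomalous entry.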
ABSTRACT: Fungi from the Ceratobasidiaceae family have important ecological roles as pathogens, saprotrophs, non-mycorrhizal endophytes, orchid mycorrhizal and ectomycorrhizal symbionts, but little is known about the distribution and evolution of these nutritional modes. All public ITS sequences of Ceratobasidiaceae were downloaded from databases, annotated with ecological and taxonomic metadata, and tested for the non-random phylogenetic distribution of nutritional modes. Phylogenetic analysis revealed six main clades within Ceratobasidiaceae and a poor correlation between molecular phylogeny and morphological–cytological characters traditionally used for taxonomy. Sequences derived from soil (representing putative saprotrophs) and orchid mycorrhiza clustered together, but remained distinct from pathogens. All nutritional modes were phylogenetically conserved in the Ceratobasidiaceae based on at least one index. Our analyses suggest that in general, autotrophic orchids form root symbiosis with available Ceratobasidiaceae isolates in soil. Ectomycorrhiza-forming capability has evolved twice within the Ceratobasidiaceae and it had a strong influence on the evolution of mycoheterotrophy and host specificity in certain orchid taxa.
ABSTRACT: Despite recent advances in understanding community ecology of ectomycorrhizal fungi, little is known about their spatial patterning and the underlying mechanisms driving these patterns across different ecosystems. This meta‐study aimed to elucidate the scale, rate and causes of spatial structure of ectomycorrhizal fungal communities in different ecosystems by analysing 16 and 55 sites at the local and global scales, respectively. We examined the distance decay of similarity relationship in species‐ and phylogenetic lineage‐based communities in relation to sampling and environmental variables. Tropical ectomycorrhizal fungal communities exhibited stronger distance‐decay patterns compared to non‐tropical communities. Distance from the equator and sampling area were the main determinants of the extent of distance decay in fungal communities. The rate of distance decay was negatively related to host density at the local scale. At the global scale, lineage‐level community similarity decayed faster with latitude than with longitude. Synthesis. Spatial processes play a stronger role and over a greater scale in structuring local communities of ectomycorrhizal fungi than previously anticipated, particularly in ecosystems with greater vegetation age and closer to the equator. A greater rate of distance decay occurs in ecosystems with lower host density, which may stem from increasing dispersal and establishment limitation. The relatively strong latitude effect on distance decay of lineage‐level community similarity suggests that climate affects large‐scale spatial processes and may cause phylogenetic clustering of ectomycorrhizal fungi at the global scale.
Clinical and Experimental Allergy 01/2013; 101(5). · 5.43 Impact Factor
ABSTRACT: Reverse complementary DNA sequences – sequences that are inadvertently given backwards with all purines and pyrimidines transposed – can affect sequence analysis detrimentally unless taken into account. We present an open-source, high-throughput software tool – v-revcomp (http://www.cmde.science.ubc.ca/mohn/software.html) – to detect and reorient reverse complementary entries of the small-subunit rRNA (16S) gene from sequencing datasets, particularly from environmental sources. The software supports sequence lengths ranging from full length down to the short reads that are characteristic of next-generation sequencing technologies. We evaluated the reliability of v-revcomp by screening all 406 781 16S sequences deposited in release 102 of the curated SILVA database and demonstrated that the tool has a detection accuracy of virtually 100%. We subsequently used v-revcomp to analyse 1 171 646 16S sequences deposited in the International Nucleotide Sequence Databases and found that about 1% of these user-submitted sequences were reverse complementary. In addition, a nontrivial proportion of the entries were otherwise anomalous, including reverse complementary chimeras, sequences associated with wrong taxa, nonribosomal genes, sequences of poor quality or otherwise erroneous sequences without a reasonable match to any other entry in the database. Thus, v-revcomp is highly efficient in detecting and reorienting reverse complementary 16S sequences of almost any length and can be used to detect various sequence anomalies.
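To make the notion of a reverse complementary sequence concrete, the following is a minimal Python sketch of the reorientation step alone: each base is swapped for its complement and the string is reversed. It is an illustration, not v-revcomp's code; the function name `reverse_complement` is an assumption, and deciding whether a given entry needs reorienting (the detection step the abstract describes) is not shown.

```python
# Translation table for the four standard bases, in both cases.
# Characters not listed (for example N or gap symbols) pass through unchanged.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    # Complement every base, then reverse the order of the whole string.
    return seq.translate(_COMPLEMENT)[::-1]
```

Applying the function twice returns the original sequence, which is a convenient sanity check when reorienting entries in bulk.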
ABSTRACT: Sequence analysis of the ribosomal RNA operon, particularly the internal transcribed spacer (ITS) region, provides a powerful tool for identification of mycorrhizal fungi. The sequence data deposited in the International Nucleotide Sequence Databases (INSD) are, however, unfiltered for quality and are often poorly annotated with metadata. To detect chimeric and low-quality sequences and assign the ectomycorrhizal fungi to phylogenetic lineages, fungal ITS sequences were downloaded from INSD, aligned within family-level groups, and examined through phylogenetic analyses and BLAST searches. By combining the fungal sequence database UNITE and the annotation and search tool PlutoF, we also added metadata from the literature to these accessions. Altogether 35,632 sequences belonged to mycorrhizal fungi or originated from ericoid and orchid mycorrhizal roots. Of these sequences, 677 were considered chimeric and 2,174 of low read quality. Information detailing country of collection, geographical coordinates, interacting taxon and isolation source were supplemented to cover 78.0%, 33.0%, 41.7% and 96.4% of the sequences, respectively. These annotated sequences are publicly available via UNITE (http://unite.ut.ee/) for downstream biogeographic, ecological and taxonomic analyses. In European Nucleotide Archive (ENA; http://www.ebi.ac.uk/ena/), the annotated sequences have a special link-out to UNITE. We intend to expand the data annotation to additional genes and all taxonomic groups and functional guilds of fungi.
PLoS ONE 01/2011; 6(9):e24940. · 3.73 Impact Factor
ABSTRACT: Reverse complementary DNA sequences – sequences that are inadvertently cast backward and in which all purines and pyrimidines are transposed – are not uncommon in sequence databases, where they may introduce noise into sequence-based research. We show that about 1% of the public fungal ITS sequences, the most commonly sequenced genetic marker in mycology, are reverse complementary, and we introduce an open source software solution to automate their detection and reorientation. The MacOSX/Linux/UNIX software operates on public or private datasets of any size, although some 50 base pairs of the 5.8S gene of the ITS region are needed for the analysis.
Keywords: DNA barcoding, Environmental sampling, Hidden Markov models, Quality assessment, Sequence identification
ABSTRACT: The internal transcribed spacer (ITS) region of the nuclear ribosomal repeat unit holds a central position in the pursuit of the taxonomic affiliation of fungi recovered through environmental sampling. Newly generated fungal ITS sequences are typically compared against the International Nucleotide Sequence Databases for a species or genus name using the sequence similarity software suite blast. Such searches are not without complications however, and one of them is the presence of chimeric entries among the query or reference sequences. Chimeras are artificial sequences, generated unintentionally during the polymerase chain reaction step, that feature sequence data from two (or possibly more) distinct species. Available software solutions for chimera control do not readily target the fungal ITS region, but the present study introduces a blast-based open source software package (available at http://www.emerencia.org/chimerachecker.html) to examine newly generated fungal ITS sequences for the presence of potentially chimeric elements in batch mode. We used the software package on a random set of 12 300 environmental fungal ITS sequences in the public sequence databases and found 1.5% of the entries to be chimeric at the ordinal level after manual verification of the results. The proportion of chimeras in the sequence databases can be hypothesized to increase as emerging sequencing technologies drawing from pooled DNA samples are becoming important tools in molecular ecology research.
ABSTRACT: DNA sequences accumulating in the International Nucleotide Sequence Databases (INSD) form a rich source of information for taxonomic and ecological meta-analyses. However, these databases include many erroneous entries, and the data are poorly annotated with metadata, making it difficult to target and extract entries of interest with any degree of precision. Here we describe the web-based workbench PlutoF, which is designed to bridge the gap between the needs of contemporary research in biology and the existing software resources and databases. Built on a relational database, PlutoF allows remote-access rapid submission, retrieval, and analysis of study, specimen, and sequence data in INSD as well as for private datasets through web-based thin clients. In contrast to INSD, PlutoF supports internationally standardized terminology to allow very specific annotation and linking of interacting specimens and species. The sequence analysis module is optimized for identification and analysis of environmental ITS sequences of fungi, but it can be modified to operate on any genetic marker and group of organisms. The workbench is available at http://plutof.ut.ee.
ABSTRACT: We introduce an open source software utility to extract the highly variable ITS1 and ITS2 subregions from fungal nuclear ITS sequences, the region of choice for environmental sampling and molecular identification of fungi. Inclusion of parts of the neighbouring, very conserved, ribosomal genes in the sequence identification process regularly leads to distorted results. The utility is available for UNIX-type operating systems, including MacOS X, and processes about 1 000 sequences per minute.