Daniel H Huson
Research interests
-
Computational Science, Algorithm Development, Algorithm Design, Software Development, Software, Metagenomics
Publications
-
Introduction to the Analysis of Environmental Sequences: Metagenomics with MEGAN.
Methods in molecular biology (Clifton, N.J.). 01/2012; 856:415-29.
Metagenomics is the study of microbial organisms using sequencing applied directly to environmental samples. Similarly, in metatranscriptomics and metaproteomics, the RNA and protein sequences of such samples are studied. The analysis of these kinds of data often starts by asking the questions of "who is out there?", "what are they doing?", and "how do they compare?". In this chapter, we describe how these computational questions can be addressed using MEGAN, the MEtaGenome ANalyzer program. We first show how to analyze the taxonomic and functional content of a single dataset and then show how such analyses can be performed in a comparative fashion. We demonstrate how to compare different datasets using ecological indices and other distance measures. The discussion is conducted using a number of published marine datasets comprising metagenomic, metatranscriptomic, metaproteomic, and 16S rRNA data.
-
3.76 Impact points
Analysis of 16S rRNA environmental sequences using MEGAN.
BMC genomics. 11/2011; 12 Suppl 3:S17.
Metagenomics is a rapidly growing field of research aimed at studying assemblages of uncultured organisms using various sequencing technologies, with the hope of understanding the true diversity of microbes, their functions, cooperation and evolution. There are two main approaches to metagenomics: amplicon sequencing, which involves PCR-targeted sequencing of a specific locus, often 16S rRNA, and random shotgun sequencing. Several tools or packages have been developed for analyzing communities using 16S rRNA sequences. Similarly, a number of tools exist for analyzing randomly sequenced DNA reads. We describe an extension of the metagenome analysis tool MEGAN, which allows one to analyze 16S sequences. For the analysis all 16S sequences are blasted against the SILVA database. The result output is imported into MEGAN, using a synonym file that maps the SILVA accession numbers onto the NCBI taxonomy. Environmental samples are often studied using both targeted 16S rRNA sequencing and random shotgun sequencing. Hence tools are needed that allow one to analyze both types of data together, and one such tool is MEGAN. The ideas presented in this paper are implemented in MEGAN 4, which is available from: http://www-ab.informatik.uni-tuebingen.de/software/megan.
-
4.93 Impact points
Fast computation of minimum hybridization networks.
Bioinformatics (Oxford, England). 11/2011; 28(2):191-7.
MOTIVATION: Hybridization events in evolution may lead to incongruent gene trees. One approach to determining possible interspecific hybridization events is to compute a hybridization network that attempts to reconcile incongruent gene trees using a minimum number of hybridization events. RESULTS: We describe how to compute a representative set of minimum hybridization networks for two given bifurcating input trees, using a parallel algorithm, and provide a user-friendly implementation. A simulation study suggests that our program performs significantly better than existing software on biologically relevant data. Finally, we demonstrate the application of such methods in the context of the evolution of the Aegilops/Triticum genera. AVAILABILITY AND IMPLEMENTATION: The algorithm is implemented in the program Dendroscope 3, which is freely available from www.dendroscope.org and runs on all three major operating systems.
-
4.93 Impact points
Tanglegrams for rooted phylogenetic trees and networks.
Bioinformatics (Oxford, England). 07/2011; 27(13):i248-56.
In systematic biology, one is often faced with the task of comparing different phylogenetic trees, in particular in multi-gene analysis or cospeciation studies. One approach is to use a tanglegram in which two rooted phylogenetic trees are drawn opposite each other, using auxiliary lines to connect matching taxa. There is an increasing interest in using rooted phylogenetic networks to represent evolutionary history, so as to explicitly represent reticulate events, such as horizontal gene transfer, hybridization or reassortment. Thus, the question arises how to define and compute a tanglegram for such networks. In this article, we present the first formal definition of a tanglegram for rooted phylogenetic networks and present a heuristic approach for computing one, called the NN-tanglegram method. We compare the performance of our method with existing tree tanglegram algorithms and also show a typical application to real biological datasets. For maximum usability, the algorithm does not require that the trees or networks are bifurcating or bicombining, or that they are on identical taxon sets. The algorithm is implemented in our program Dendroscope 3, which is freely available from www.dendroscope.org. CONTACT: scornava@informatik.uni-tuebingen.de; huson@informatik.uni-tuebingen.de.
-
9.43 Impact points
Reference-guided assembly of four diverse Arabidopsis thaliana genomes.
Proceedings of the National Academy of Sciences of the United States of America. 06/2011; 108(25):10249-54.
We present whole-genome assemblies of four divergent Arabidopsis thaliana strains that complement the 125-Mb reference genome sequence released a decade ago. Using a newly developed reference-guided approach, we assembled large contigs from 9 to 42 Gb of Illumina short-read data from the Landsberg erecta (Ler-1), C24, Bur-0, and Kro-0 strains, which have been sequenced as part of the 1,001 Genomes Project for this species. Using alignments against the reference sequence, we first reduced the complexity of the de novo assembly and later integrated reads without similarity to the reference sequence. As an example, half of the noncentromeric C24 genome was covered by scaffolds that are longer than 260 kb, with a maximum of 2.2 Mb. Moreover, over 96% of the reference genome was covered by the reference-guided assembly, compared with only 87% with a complete de novo assembly. Comparisons with 2 Mb of dideoxy sequence reveal that the per-base error rate of the reference-guided assemblies was below 1 in 10,000. Our assemblies provide a detailed, genomewide picture of large-scale differences between A. thaliana individuals, most of which are difficult to access with alignment-consensus methods only. We demonstrate their practical relevance in studying the expression differences of polymorphic genes and show how the analysis of sRNA sequencing data can lead to erroneous conclusions if aligned against the reference genome alone. Genome assemblies, raw reads, and further information are accessible through http://1001genomes.org/projects/assemblies.html.
-
11.34 Impact points
Integrative analysis of environmental sequences using MEGAN4.
Genome research. 06/2011; 21(9):1552-60.
A major challenge in the analysis of environmental sequences is data integration. The question is how to analyze different types of data in a unified approach, addressing both the taxonomic and functional aspects. To facilitate such analyses, we have substantially extended MEGAN, a widely used taxonomic analysis program. The new program, MEGAN4, provides an integrated approach to the taxonomic and functional analysis of metagenomic, metatranscriptomic, metaproteomic, and rRNA data. While taxonomic analysis is performed based on the NCBI taxonomy, functional analysis is performed using the SEED classification of subsystems and functional roles or the KEGG classification of pathways and enzymes. A number of examples illustrate how such analyses can be performed, and show that one can also import and compare classification results obtained using others' tools. MEGAN4 is freely available for academic purposes, and installers for all three major operating systems can be downloaded from www-ab.informatik.uni-tuebingen.de/software/megan.
-
4.41 Impact points
LOCAS--a low coverage assembly tool for resequencing projects.
PloS one. 01/2011; 6(8):e23455.
Next Generation Sequencing (NGS) is a frequently applied approach to detect sequence variations between highly related genomes. Recent large-scale re-sequencing studies such as the Human 1000 Genomes Project utilize NGS data of low coverage to afford sequencing of hundreds of individuals. Here, SNPs and micro-indels can be detected by applying an alignment-consensus approach. However, computational methods capable of discovering other variations such as novel insertions or highly diverged sequence from low coverage NGS data are still lacking. We present LOCAS, a new NGS assembler particularly designed for low coverage assembly of eukaryotic genomes using a mismatch sensitive overlap-layout-consensus approach. LOCAS assembles homologous regions in a homology-guided manner while it performs de novo assemblies of insertions and highly polymorphic target regions subsequent to an alignment-consensus approach. LOCAS has been evaluated in homology-guided assembly scenarios with low sequence coverage of Arabidopsis thaliana strains sequenced as part of the Arabidopsis 1001 Genomes Project. While assembling the same amount of long insertions as state-of-the-art NGS assemblers, LOCAS showed the best results regarding contig size, error rate and runtime. LOCAS produces excellent results for homology-guided assembly of eukaryotic genomes with short reads and low sequencing depth, and therefore appears to be the assembly tool of choice for the detection of novel sequence variations in this scenario.
-
3.43 Impact points
Functional analysis of metagenomes and metatranscriptomes using SEED and KEGG.
BMC bioinformatics. 01/2011; 12 Suppl 1:S21.
Metagenomics is the study of microbial organisms using sequencing applied directly to environmental samples. Technological advances in next-generation sequencing methods are fueling a rapid increase in the number and scope of metagenome projects. While metagenomics provides information on the gene content, metatranscriptomics aims at understanding gene expression patterns in microbial communities. The initial computational analysis of a metagenome or metatranscriptome addresses three questions: (1) Who is out there? (2) What are they doing? and (3) How do different datasets compare? There is a need for new computational tools to answer these questions. In 2007, the program MEGAN (MEtaGenome ANalyzer) was released, as a standalone interactive tool for analyzing the taxonomic content of a single metagenome dataset. The program has subsequently been extended to support comparative analyses of multiple datasets. The focus of this paper is to report on new features of MEGAN that allow the functional analysis of multiple metagenomes (and metatranscriptomes) based on the SEED hierarchy and KEGG pathways. We have compared our results with the MG-RAST service for different datasets. The MEGAN program now allows the interactive analysis and comparison of the taxonomical and functional content of multiple datasets. As a stand-alone tool, MEGAN provides an alternative to web portals for scientists that have concerns about uploading their unpublished data to a website.
-
A survey of combinatorial methods for phylogenetic networks.
Genome biology and evolution. 11/2010; 3:23-35.
The evolutionary history of a set of species is usually described by a rooted phylogenetic tree. Although it is generally undisputed that bifurcating speciation events and descent with modifications are major forces of evolution, there is a growing belief that reticulate events also have a role to play. Phylogenetic networks provide an alternative to phylogenetic trees and may be more suitable for data sets where evolution involves significant amounts of reticulate events, such as hybridization, horizontal gene transfer, or recombination. In this article, we give an introduction to the topic of phylogenetic networks, very briefly describing the fundamental concepts and summarizing some of the most important combinatorial methods that are available for their computation.
-
6.40 Impact points
Comparison of multiple metagenomes using phylogenetic networks based on ecological indices.
The ISME journal. 10/2010; 4(10):1236-42.
Second-generation sequencing technologies are fueling a vast increase in the number and scope of metagenome projects. There is a great need for the development of new methods for visualizing the relationships between multiple metagenomic data sets. To address this, a novel approach is presented that combines the use of taxonomic analysis, ecological indices and non-hierarchical clustering to provide a network representation of the relationships between different metagenome data sets. The approach is illustrated using several published data sets of different types, including metagenomes, metatranscriptomes and 16S ribosomal profiles. Application of the approach to the same data summarized at different taxonomical levels gives rise to remarkably similar networks, indicating that the analysis is very robust. Importantly, the networks provide both visual definition and metric quantification for the non-rooted relationship between samples, combining the desirable characteristics of other tools into one.
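The abstract above does not say which ecological indices are used. As a rough illustration only, the sketch below assumes the Bray-Curtis dissimilarity, one widely used ecological index, and computes pairwise distances between taxonomic profiles. The function names and the sample data are hypothetical and are not taken from the paper or from MEGAN. A distance matrix of this kind is the sort of input a network-construction method could work from.

```python
from itertools import combinations


def relative_abundance(counts):
    """Convert raw read counts per taxon into fractions that sum to 1."""
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance profiles (0 = identical, 1 = disjoint)."""
    taxa = set(a) | set(b)
    num = sum(abs(a.get(t, 0.0) - b.get(t, 0.0)) for t in taxa)
    den = sum(a.get(t, 0.0) + b.get(t, 0.0) for t in taxa)
    return num / den if den else 0.0


def distance_matrix(samples):
    """Pairwise dissimilarities for a dict mapping sample name -> taxon counts."""
    profiles = {name: relative_abundance(c) for name, c in samples.items()}
    return {
        (x, y): bray_curtis(profiles[x], profiles[y])
        for x, y in combinations(sorted(profiles), 2)
    }


if __name__ == "__main__":
    # Hypothetical read counts per taxon for three samples.
    samples = {
        "A": {"Bacteria": 50, "Archaea": 50},
        "B": {"Bacteria": 5, "Archaea": 5},
        "C": {"Eukaryota": 10},
    }
    for pair, d in distance_matrix(samples).items():
        print(pair, d)
```

Normalizing to relative abundances first means that samples A and B, which differ only in sequencing depth, come out as identical, which is usually what one wants when comparing datasets of different sizes.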
-
3.43 Impact points
Short clones or long clones? A simulation study on the use of paired reads in metagenomics.
BMC bioinformatics. 01/2010; 11 Suppl 1:S12.
Metagenomics is the study of environmental samples using sequencing. Rapid advances in sequencing technology are fueling a vast increase in the number and scope of metagenomics projects. Most metagenome sequencing projects so far have been based on Sanger or Roche-454 sequencing, as only these technologies provide long enough reads, while Illumina sequencing has not been considered suitable for metagenomic studies due to a short read length of only 35 bp. However, now that reads of length 75 bp can be sequenced in pairs, Illumina sequencing has become a viable option for metagenome studies. This paper addresses the problem of taxonomical analysis of paired reads. We describe a new feature of our metagenome analysis software MEGAN that allows one to process sequencing reads in pairs and makes assignments of such reads based on the combined bit scores of their matches to reference sequences. Using this new software in a simulation study, we investigate the use of Illumina paired-sequencing in taxonomical analysis and compare the performance of single reads, short clones and long clones. In addition, we also compare against simulated Roche-454 sequencing runs. This work shows that paired reads perform better than single reads, as expected, but also, perhaps slightly less obviously, that long clones allow more specific assignments than short ones. A new version of the program MEGAN that explicitly takes paired reads into account is available from our website.
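The idea of assigning a read pair from the combined bit scores of both mates can be sketched as follows. This is a minimal illustration under stated assumptions, not MEGAN's actual implementation: the function names, the `top_fraction` threshold, and the example scores are all hypothetical, and the step that maps the retained references to a taxon is left out.

```python
def combine_pair_scores(hits_mate1, hits_mate2):
    """Sum the bit scores of both mates per reference sequence.

    Each argument maps a reference identifier to the bit score of that
    mate's best match to the reference (hypothetical input format).
    """
    combined = dict(hits_mate1)
    for ref, score in hits_mate2.items():
        combined[ref] = combined.get(ref, 0.0) + score
    return combined


def top_references(combined, top_fraction=0.1):
    """Keep references whose combined score is within top_fraction of the best.

    top_fraction is an assumed, illustrative threshold.
    """
    if not combined:
        return []
    best = max(combined.values())
    cutoff = best * (1.0 - top_fraction)
    return sorted(ref for ref, s in combined.items() if s >= cutoff)


if __name__ == "__main__":
    mate1 = {"refA": 60.0, "refB": 58.0}
    mate2 = {"refA": 55.0, "refC": 70.0}
    combined = combine_pair_scores(mate1, mate2)
    print(combined)
    print(top_references(combined))
```

In this made-up example, mate 1 alone cannot separate refA from refB, and mate 2 alone would favor refC, but only refA is matched by both mates, so it dominates the combined score.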
-
4.93 Impact points
Visual and Statistical Comparison of Metagenomes.
Bioinformatics (Oxford, England). 07/2009;
BACKGROUND: Metagenomics is the study of the genomic content of an environmental sample of microbes. Advances in the throughput and cost-efficiency of sequencing technology are fueling a rapid increase in the number and size of metagenomic datasets being generated. Bioinformatics is faced with the problem of how to handle and analyze these datasets in an efficient and useful way. One goal of these metagenomic studies is to get a basic understanding of the microbial world both surrounding us and within us. One major challenge is how to compare multiple datasets. Furthermore, there is a need for bioinformatics tools that can process many large datasets and are easy to use. RESULTS: This paper describes two new and helpful techniques for comparing multiple metagenomic datasets. The first is a visualization technique for multiple datasets and the second is a new statistical method for highlighting the differences in a pairwise comparison. We have developed implementations of both methods that are suitable for very large datasets and provide these in Version 3 of our stand-alone metagenome analysis tool MEGAN. CONCLUSION: These new methods are suitable for the visual comparison of many large metagenomes and the statistical comparison of two metagenomes at a time. Nevertheless, more work needs to be done to support the comparative analysis of multiple metagenome datasets. AVAILABILITY: Version 3 of MEGAN, which implements all ideas presented in this paper, can be obtained from our website at: www-ab.informatik.uni-tuebingen.de/software/megan. CONTACT: mitra@informatik.uni-tuebingen.de.
-
4.93 Impact points
Computing galled networks from real data.
Bioinformatics (Oxford, England). 07/2009; 25(12):i85-93.
MOTIVATION: Developing methods for computing phylogenetic networks from biological data is an important problem posed by molecular evolution and much work is currently being undertaken in this area. Although promising approaches exist, there are no tools available that biologists could easily and routinely use to compute rooted phylogenetic networks on real datasets containing tens or hundreds of taxa. Biologists are interested in clades, i.e. groups of monophyletic taxa, and these are usually represented by clusters in a rooted phylogenetic tree. The problem of computing an optimal rooted phylogenetic network from a set of clusters is hard in general. Indeed, even the problem of just determining whether a given network contains a given cluster is hard. Hence, some researchers have focused on topologically restricted classes of networks, such as galled trees and level-k networks, that are more tractable, but have the practical drawback that a given set of clusters will usually not possess such a representation. RESULTS: In this article, we argue that galled networks (a generalization of galled trees) provide a good trade-off between level of generality and tractability. Any set of clusters can be represented by some galled network and the question whether a cluster is contained in such a network is easy to solve. Although the computation of an optimal galled network involves successively solving instances of two different NP-complete problems, in practice our algorithm solves this problem exactly on large datasets containing hundreds of taxa and many reticulations in seconds, as illustrated by a dataset containing 279 prokaryotes.
AVAILABILITY: We provide a fast, robust and easy-to-use implementation of this work in version 2.0 of our tree-handling software Dendroscope, freely available from http://www.dendroscope.org.
-
11.34 Impact points
The mitochondrial genome sequence of the Tasmanian tiger (Thylacinus cynocephalus).
Genome research. 02/2009;
We report the first two complete mitochondrial genome sequences of the thylacine (Thylacinus cynocephalus), or so-called Tasmanian tiger, extinct since 1936. The thylacine's phylogenetic position within australidelphian marsupials has long been debated, and here we provide strong support for the thylacine's basal position in Dasyuromorphia, aided by mitochondrial genome sequence that we generated from the extant numbat (Myrmecobius fasciatus). Surprisingly, both of our thylacine sequences differ by 11%-15% from putative thylacine mitochondrial genes in GenBank, with one of our samples originating from a direct offspring of the previously sequenced individual. Our data sample each mitochondrial nucleotide an average of 50 times, thereby providing the first high-fidelity reference sequence for thylacine population genetics. Our two sequences differ in only five nucleotides out of 15,452, hinting at a very low genetic diversity shortly before extinction. Despite the samples' heavy contamination with bacterial and human DNA and their temperate storage history, we estimate that as much as one-third of the total DNA in each sample is from the thylacine. The microbial content of the two thylacine samples was subjected to metagenomic analysis, and showed striking differences between a wild-captured individual and a born-in-captivity one. This study therefore adds to the growing evidence that extensive sequencing of museum collections is both feasible and desirable, and can yield complete genomes.
-
3.43 Impact points
Methods for comparative metagenomics.
BMC bioinformatics. 02/2009; 10 Suppl 1:S12.
BACKGROUND: Metagenomics is a rapidly growing field of research that aims at studying uncultured organisms to understand the true diversity of microbes, their functions, cooperation and evolution, in environments such as soil, water, ancient remains of animals, or the digestive system of animals and humans. The recent development of ultra-high throughput sequencing technologies, which do not require cloning or PCR amplification, and can produce huge numbers of DNA reads at an affordable cost, has boosted the number and scope of metagenomic sequencing projects. Increasingly, there is a need for new ways of comparing multiple metagenomics datasets, and for fast and user-friendly implementations of such approaches. RESULTS: This paper introduces a number of new methods for interactively exploring, analyzing and comparing multiple metagenomic datasets, which will be made freely available in a new, comparative version 2.0 of the stand-alone metagenome analysis tool MEGAN. CONCLUSION: There is a great need for powerful and user-friendly tools for comparative analysis of metagenomic data and MEGAN 2.0 will help to fill this gap.
-
8.48 Impact points
Filtered z-closure supernetworks for extracting and visualizing recurrent signal from incongruent gene trees.
Systematic biology. 01/2009; 57(6):939-47.
-
5.20 Impact points
Evolution of Arabidopsis thaliana microRNAs from random sequences.
RNA (New York, N.Y.). 11/2008;
One mechanism for the origin of new plant microRNAs (miRNAs) is from inverted duplications of transcribed genes. However, even though many young MIRNA genes have recently been identified in Arabidopsis thaliana, only a subset shows evidence for having evolved by this route. We propose that the hundreds of thousands of partially self-complementary foldback sequences found in a typical plant genome provide an alternative path for miRNA evolution. Our genome-wide analyses of young MIRNA genes suggest that some arose from DNA that either has self-complementarity by chance or that represents a highly eroded inverted duplication. These observations are compatible with the idea that, following capture of transcriptional regulatory sequences, random foldbacks can occasionally spawn new miRNAs. Subsequent stabilization through coevolution with initially fortuitous targets may lead to fixation of a small subset of these proto-miRNA genes.
-
4.29 Impact points
Drawing explicit phylogenetic networks and their integration into SplitsTree.
BMC evolutionary biology. 02/2008; 8:22.
BACKGROUND: SplitsTree provides a framework for the calculation of phylogenetic trees and networks. It contains a wide variety of methods for the import/export, calculation and visualization of phylogenetic information. The software is developed in Java and implements a command line tool as well as a graphical user interface. RESULTS: In this article, we present solutions to two important problems in the field of phylogenetic networks. The first problem is the visualization of explicit phylogenetic networks. To solve this, we present a modified version of the equal angle algorithm that naturally integrates reticulations into the layout process and thus leads to an appealing visualization of these networks. The second problem is the availability of explicit phylogenetic network methods for the general user. To advance the usage of explicit phylogenetic networks by biologists further, we present an extension to the SplitsTree framework that integrates these networks. By addressing these two problems, SplitsTree is among the first programs that incorporate implicit and explicit network methods together with standard phylogenetic tree methods in a graphical user interface environment. CONCLUSION: In this article, we presented an extension of SplitsTree 4 that incorporates explicit phylogenetic networks. The extension provides a set of core classes to handle explicit phylogenetic networks and a visualization of these networks.
-
4.41 Impact points
MetaSim: a sequencing simulator for genomics and metagenomics.
PLoS ONE. 02/2008; 3(10):e3373.
BACKGROUND: The new research field of metagenomics is providing exciting insights into various, previously unclassified ecological systems. Next-generation sequencing technologies are producing a rapid increase of environmental data in public databases. There is great need for specialized software solutions and statistical methods for dealing with complex metagenome data sets. METHODOLOGY/PRINCIPAL FINDINGS: To facilitate the development and improvement of metagenomic tools and the planning of metagenomic projects, we introduce a sequencing simulator called MetaSim. Our software can be used to generate collections of synthetic reads that reflect the diverse taxonomical composition of typical metagenome data sets. Based on a database of given genomes, the program allows the user to design a metagenome by specifying the number of genomes present at different levels of the NCBI taxonomy, and then to collect reads from the metagenome using a simulation of a number of different sequencing technologies. A population sampler optionally produces evolved sequences based on source genomes and a given evolutionary tree. CONCLUSIONS/SIGNIFICANCE: MetaSim allows the user to simulate individual read datasets that can be used as standardized test scenarios for planning sequencing projects or for benchmarking metagenomic software.
-
4.41 Impact points
Simultaneous assessment of soil microbial community structure and function through analysis of the meta-transcriptome.
PLoS ONE. 02/2008; 3(6):e2527.
BACKGROUND: Soil ecosystems harbor the most complex prokaryotic and eukaryotic microbial communities on Earth. Experimental approaches studying these systems usually focus on either the soil community's taxonomic structure or its functional characteristics. Many methods target DNA as a marker molecule and use PCR for amplification. METHODOLOGY/PRINCIPAL FINDINGS: Here we apply an RNA-centered meta-transcriptomic approach to simultaneously obtain information on both structure and function of a soil community. Total community RNA is randomly reverse-transcribed into cDNA without any PCR or cloning step. Direct pyrosequencing produces large numbers of cDNA rRNA-tags; these are taxonomically profiled in a binning approach using the MEGAN software and two specifically compiled rRNA reference databases containing small and large subunit rRNA sequences. The pyrosequencing also produces mRNA-tags; these provide a sequence-based transcriptome of the community. One soil dataset of 258,411 RNA-tags of approximately 98 bp length contained 193,219 rRNA-tags with valid taxonomic information, together with 21,133 mRNA-tags. Quantitative information about the relative abundance of organisms from all three domains of life and from different trophic levels was obtained in a single experiment. Less frequent taxa, such as soil Crenarchaeota, were well represented in the data set. These were identified by more than 2,000 rRNA-tags; furthermore, their activity in situ was revealed through the presence of mRNA-tags specific for enzymes involved in ammonia oxidation and CO2 fixation.
CONCLUSIONS/SIGNIFICANCE: This approach could be widely applied in microbial ecology by efficiently linking community structure and function in a single experiment while avoiding biases inherent in other methods.