A Bioinformatician's Guide to Metagenomics

Microbial Ecology Program, DOE Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA, USA.
Microbiology and molecular biology reviews: MMBR (Impact Factor: 14.61). 01/2009; 72(4):557-78, Table of Contents. DOI: 10.1128/MMBR.00009-08
Source: PubMed


As random shotgun metagenomic projects proliferate and become the dominant source of publicly available sequence data, procedures for the best practices in their execution and analysis become increasingly important. Based on our experience at the Joint Genome Institute, we describe the chain of decisions accompanying a metagenomic project from the viewpoint of the bioinformatic analysis step by step. We guide the reader through a standard workflow for a metagenomic project beginning with presequencing considerations such as community composition and sequence data type that will greatly influence downstream analyses. We proceed with recommendations for sampling and data generation including sample and metadata collection, community profiling, construction of shotgun libraries, and sequencing strategies. We then discuss the application of generic sequence processing steps (read preprocessing, assembly, and gene prediction and annotation) to metagenomic data sets in contrast to genome projects. Different types of data analyses particular to metagenomes are then presented, including binning, dominant population analysis, and gene-centric analysis. Finally, data management issues are presented and discussed. We hope that this review will assist bioinformaticians and biologists in making better-informed decisions on their journey during a metagenomic project.

Available from: Alla Lapidus,

    • "Metagenome assembly is a critical step, since researchers are dealing with an unknown number of different genomes, and the possibility of assembling a chimeric sequence is real. It is well known that NGS platforms produce shorter reads than traditional dideoxynucleotide sequencing, and short reads are more difficult to assemble, especially for metagenomics (Raes et al., 2007; Kunin et al., 2008). In order to minimize the effect of this sequence mosaic, bioinformaticians have been dedicated to discovering new assembly algorithms and pipelines, which will now be discussed."
    ABSTRACT: In recent years a major worldwide problem has arisen with regard to infectious diseases caused by resistant bacteria. Resistant pathogens are related to high mortality and also to enormous healthcare costs. In this field, cultured microorganisms have been commonly focused in attempts to isolate antibiotic resistance genes or to identify antimicrobial compounds. Although this strategy has been successful in many cases, most of the microbial diversity and related antimicrobial molecules have been completely lost. As an alternative, metagenomics has been used as a reliable approach to reveal the prospective reservoir of antimicrobial compounds and antibiotic resistance genes in the uncultured microbial community that inhabits a number of environments. In this context, this review will focus on resistance genes as well as on novel antibiotics revealed by a metagenomics approach from the soil environment. Biotechnology prospects are also discussed, opening new frontiers for antibiotic development.
    Frontiers in Microbiology; 09/2014
    • "In contrast to whole genome sequencing, metagenomes comprise a variety of differentially abundant species, and there can be substantial interpopulation diversity within a single species. Whether two reads originate from the same gene as an entity depends on many factors: abundance of the organism in the sample, size and copy number of the gene in the original sample [8], [9], effectiveness of enrichment strategies [10], amplification biases introduced during random amplification [11]–[13], biases inherent to next-generation sequencing protocols [14], and depth of sequencing and read lengths [15]. In theory, two reads of the same taxonomic unit should be assembled into a single contig if they have sufficient overlap."
    ABSTRACT: Pathogen surveillance in animals does not provide a sufficient level of vigilance because it is generally confined to surveillance of pathogens with known economic impact in domestic animals and practically nonexistent in wildlife species. As most (re-)emerging viral infections originate from animal sources, it is important to obtain insight into viral pathogens present in the wildlife reservoir from a public health perspective. When monitoring living, free-ranging wildlife for viruses, sample collection can be challenging and availability of nucleic acids isolated from samples is often limited. The development of viral metagenomics platforms allows a more comprehensive inventory of viruses present in wildlife. We report a metagenomic viral survey of the Western Arctic herd of barren ground caribou (Rangifer tarandus granti) in Alaska, USA. The presence of mammalian viruses in eye and nose swabs of 39 free-ranging caribou was investigated by random amplification combined with a metagenomic analysis approach that applied exhaustive iterative assembly of sequencing results to define taxonomic units of each metagenome. Through homology search methods we identified the presence of several mammalian viruses, including different papillomaviruses, a novel parvovirus, polyomavirus, and a virus that potentially represents a member of a novel genus in the family Coronaviridae.
    PLoS ONE 08/2014; 9(8):e105227. DOI:10.1371/journal.pone.0105227 · 3.23 Impact Factor
    • "Early on, culture-based approaches revealed “the great plate count anomaly” wherein only about 1% of visible microscopic cells can be cultured using conventional techniques (Staley and Konopka, 1985; Zhang and Xu, 2008; Stein and Nicol, 2011). The DNA technologies available today use genetic information to model the structure and composition of a microbial community (Venter et al., 2004; Tringe and Rubin, 2005; Hugenholtz and Tyson, 2008; Kunin et al., 2008; Vakhlu et al., 2008; Marguerat and Bähler, 2009; Metzker, 2010; Wooley et al., 2010; Simon and Daniel, 2011; Sun et al., 2011; van Elsas and Boersma, 2011; Thomas et al., 2012; Yousuf et al., 2012; Bibby, 2013; Mathieu et al., 2013). Capable of generating millions of base pairs in a matter of hours for only a few thousand dollars, the primary limitation to next-gen sequencing technologies is handling the expansive datasets and applying appropriate statistical analyses to address the biological questions at hand (Metzker, 2010). "
    ABSTRACT: Plants in terrestrial systems have evolved in direct association with microbes functioning as both agonists and antagonists of plant fitness and adaptability. As such, investigations that segregate plants and microbes provide only a limited scope of the biotic interactions that dictate plant community structure and composition in natural systems. Invasive plants provide an excellent working model to compare and contrast the effects of microbial communities associated with natural plant populations on plant fitness, adaptation, and fecundity. The last decade of DNA sequencing technology advancements opened the door to microbial community analysis, which has led to an increased awareness of the importance of an organism's microbiome and the disease states associated with microbiome shifts. Employing microbiome analysis to study the symbiotic networks associated with invasive plants will help us to understand what microorganisms contribute to plant fitness in natural systems, how different soil microbial communities impact plant fitness and adaptability, specificity of host-microbe interactions in natural plant populations, and the selective pressures that dictate the structure of above-ground and below-ground biotic communities. This review discusses recent advances in invasive plant biology that have resulted from microbiome analyses as well as the microbial factors that direct plant fitness and adaptability in natural systems.
    Frontiers in Microbiology 07/2014; 5:368. DOI:10.3389/fmicb.2014.00368 · 3.99 Impact Factor