A Bioinformatician's Guide to Metagenomics

Microbial Ecology Program, DOE Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA, USA.
Microbiology and Molecular Biology Reviews: MMBR (Impact Factor: 14.61). 01/2009; 72(4):557-78. DOI: 10.1128/MMBR.00009-08
Source: PubMed


As random shotgun metagenomic projects proliferate and become the dominant source of publicly available sequence data, procedures for the best practices in their execution and analysis become increasingly important. Based on our experience at the Joint Genome Institute, we describe the chain of decisions accompanying a metagenomic project from the viewpoint of the bioinformatic analysis step by step. We guide the reader through a standard workflow for a metagenomic project beginning with presequencing considerations such as community composition and sequence data type that will greatly influence downstream analyses. We proceed with recommendations for sampling and data generation including sample and metadata collection, community profiling, construction of shotgun libraries, and sequencing strategies. We then discuss the application of generic sequence processing steps (read preprocessing, assembly, and gene prediction and annotation) to metagenomic data sets in contrast to genome projects. Different types of data analyses particular to metagenomes are then presented, including binning, dominant population analysis, and gene-centric analysis. Finally, data management issues are presented and discussed. We hope that this review will assist bioinformaticians and biologists in making better-informed decisions on their journey during a metagenomic project.

    • "However, due to the lack of whole genome information from the individual members of the community, details on the metabolic pathways and the relationships among members of the community are not well understood. In metagenomic analysis, grouping sequences from a particular genome from microbial community sequencing data is an important step referred to as binning (Kunin et al., 2008; Mande et al., 2012). The binning process can greatly reduce the complexity of metagenomics data by grouping similar sequences together followed by assembly and annotation to the individual genome bins. "
    ABSTRACT: Microcystis bloom, a cyanobacterial mass occurrence often found in eutrophicated water bodies, is one of the most serious threats to freshwater ecosystems worldwide. In nature, Microcystis forms aggregates or colonies that contain heterotrophic bacteria. The Microcystis-bacteria colonies were persistent even when they were maintained in lab culture for a long period. The relationship between Microcystis and the associated bacteria was investigated by a metagenomic approach in this study. We developed a visualization-guided method of binning for genome assembly after total colony DNA sequencing. We found that the method was effective in grouping sequences and it did not require reference genome sequence. Individual genomes of the colony bacteria were obtained and they provided valuable insights into microbial community structures. Analysis of metabolic pathways based on these genomes revealed that while all heterotrophic bacteria were dependent upon Microcystis for carbon and energy, Vitamin B12 biosynthesis, which is required for growth by Microcystis, was accomplished in a cooperative fashion among the bacteria. Our analysis also suggests that individual bacteria in the colony community contributed a complete pathway for degradation of benzoate, which is inhibitory to the cyanobacterial growth, and its ecological implication for Microcystis bloom is discussed.
    Article · Feb 2016 · Frontiers in Microbiology
    • "Metagenome assembly is a critical step, since researchers are dealing with an unknown number of different genomes, and the possibility of assembling a chimeric sequence is real. It is well known that NGS platforms produce shorter reads than traditional dideoxynucleotide sequencing, and short reads are more difficult to assemble, especially for metagenomics (Raes et al., 2007; Kunin et al., 2008). In order to minimize the effect of this sequence mosaic, bioinformaticians have been dedicated to discovering new assembly algorithms and pipelines, which will now be discussed. "
    ABSTRACT: In recent years a major worldwide problem has arisen with regard to infectious diseases caused by resistant bacteria. Resistant pathogens are related to high mortality and also to enormous healthcare costs. In this field, cultured microorganisms have been commonly focused in attempts to isolate antibiotic resistance genes or to identify antimicrobial compounds. Although this strategy has been successful in many cases, most of the microbial diversity and related antimicrobial molecules have been completely lost. As an alternative, metagenomics has been used as a reliable approach to reveal the prospective reservoir of antimicrobial compounds and antibiotic resistance genes in the uncultured microbial community that inhabits a number of environments. In this context, this review will focus on resistance genes as well as on novel antibiotics revealed by a metagenomics approach from the soil environment. Biotechnology prospects are also discussed, opening new frontiers for antibiotic development.
    Conference Paper · Sep 2014
    • "In contrast to whole genome sequencing, metagenomes comprise a variety of differentially abundant species, and there can be substantial interpopulation diversity within a single species. Whether two reads originate from the same gene as an entity is depending on many factors; abundance of the organism in the sample, size and copy number of the gene in the original sample [8], [9], effectiveness of enrichment strategies [10], amplification biases introduced during random amplification [11]–[13], biases inherent to next-generation sequencing protocols [14], and depth of sequencing and read lengths [15]. In theory, two reads of the same taxonomic unit should be assembled into a single contig if they have sufficient overlap. "
    ABSTRACT: Pathogen surveillance in animals does not provide a sufficient level of vigilance because it is generally confined to surveillance of pathogens with known economic impact in domestic animals and practically nonexistent in wildlife species. As most (re-)emerging viral infections originate from animal sources, it is important to obtain insight into viral pathogens present in the wildlife reservoir from a public health perspective. When monitoring living, free-ranging wildlife for viruses, sample collection can be challenging and availability of nucleic acids isolated from samples is often limited. The development of viral metagenomics platforms allows a more comprehensive inventory of viruses present in wildlife. We report a metagenomic viral survey of the Western Arctic herd of barren ground caribou (Rangifer tarandus granti) in Alaska, USA. The presence of mammalian viruses in eye and nose swabs of 39 free-ranging caribou was investigated by random amplification combined with a metagenomic analysis approach that applied exhaustive iterative assembly of sequencing results to define taxonomic units of each metagenome. Through homology search methods we identified the presence of several mammalian viruses, including different papillomaviruses, a novel parvovirus, polyomavirus, and a virus that potentially represents a member of a novel genus in the family Coronaviridae.
    Article · Aug 2014 · PLoS ONE