Metabolic Reconstruction for Metagenomic Data and Its Application to the Human Microbiome

University of California Davis, United States of America
PLoS Computational Biology (Impact Factor: 4.62). 06/2012; 8(6):e1002358. DOI: 10.1371/journal.pcbi.1002358
Source: PubMed


Microbial communities carry out the majority of the biochemical activity on the planet, and they play integral roles in processes including metabolism and immune homeostasis in the human microbiome. Shotgun sequencing of such communities' metagenomes provides information complementary to organismal abundances from taxonomic markers, but the resulting data typically comprise short reads from hundreds of different organisms and are at best challenging to assemble comparably to single-organism genomes. Here, we describe an alternative approach to infer the functional and metabolic potential of a microbial community metagenome. We determined the gene families and pathways present or absent within a community, as well as their relative abundances, directly from short sequence reads. We validated this methodology using a collection of synthetic metagenomes, recovering the presence and abundance both of large pathways and of small functional modules with high accuracy. We subsequently applied this method, HUMAnN, to the microbial communities of 649 metagenomes drawn from seven primary body sites on 102 individuals as part of the Human Microbiome Project (HMP). This provided a means to compare functional diversity and organismal ecology in the human microbiome, and we determined a core of 24 ubiquitously present modules. Core pathways were often implemented by different enzyme families within different body sites, and 168 functional modules and 196 metabolic pathways varied in metagenomic abundance specifically to one or more niches within the microbiome. These included glycosaminoglycan degradation in the gut, as well as phosphate and amino acid transport linked to host phenotype (vaginal pH) in the posterior fornix. 
An implementation of our methodology is available at This provides a means to accurately and efficiently characterize microbial metabolic pathways and functional modules directly from high-throughput sequencing reads, enabling the determination of community roles in the HMP cohort and in future metagenomic studies.
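
As an illustration of the idea in the abstract (gene family and pathway presence and relative abundance read directly from short-read hits), here is a minimal Python sketch. It is not the HUMAnN implementation. The function names, the even splitting of a read's weight across its hits, and the use of a simple mean for pathway abundance are all assumptions made for the example, and the family and pathway identifiers are hypothetical.

```python
from collections import defaultdict


def family_relative_abundance(read_hits):
    """Estimate relative abundance of gene families from read hits.

    read_hits maps a read id to the list of gene family ids it matched.
    Assumption: each read carries a total weight of 1, split evenly
    across the families it hits.
    """
    counts = defaultdict(float)
    for families in read_hits.values():
        if not families:
            continue  # reads without hits contribute nothing
        share = 1.0 / len(families)
        for fam in families:
            counts[fam] += share
    total = sum(counts.values())
    if total == 0:
        return {}
    return {fam: c / total for fam, c in counts.items()}


def pathway_summary(abundance, pathways):
    """Return {pathway: (coverage, mean_abundance)}.

    coverage is the fraction of a pathway's member families that were
    detected; mean_abundance averages member abundances, counting
    undetected members as zero. Both definitions are illustrative.
    """
    summary = {}
    for name, members in pathways.items():
        if not members:
            continue  # skip pathways with no defined members
        detected = sum(1 for fam in members if fam in abundance)
        mean_abund = sum(abundance.get(fam, 0.0) for fam in members) / len(members)
        summary[name] = (detected / len(members), mean_abund)
    return summary


if __name__ == "__main__":
    # Hypothetical reads and pathway definitions.
    hits = {"r1": ["K1"], "r2": ["K1", "K2"], "r3": ["K3"], "r4": ["K3"]}
    paths = {"P_a": ["K1", "K2"], "P_b": ["K3", "K4"]}
    abundance = family_relative_abundance(hits)
    for name, (cov, ab) in pathway_summary(abundance, paths).items():
        print(name, round(cov, 3), round(ab, 3))
```

In this toy input both pathways have the same mean abundance, but only one is fully covered, which is why presence (coverage) and abundance are reported as separate quantities.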


Available from: Brandi L Cantarel,
    • "Also significant in the context of metabolic reconstruction from metagenomic datasets, a naïve pathway mapping strategy (whereby the detection of a protein implies the potential activity of all the biological pathways the protein might be involved in) can lead to an overestimation of the functional diversity of microbial communities. Parsimony approaches, as employed in the HUMAnN pipeline, are then applied to offer a more accurate representation of the functionality of a microbial community by specifically identifying the minimum set of biological pathways that can account for all the protein families detected [31] [58] [59]. While for metagenomics and metatranscriptomics relative quantification and even absolute quantification with the use of internal standards are accessible, protein abundance is harder to determine. "
    ABSTRACT: Some of the most transformative discoveries promising to enable the resolution of this century's grand societal challenges will most likely arise from environmental science and particularly environmental microbiology and biotechnology. Understanding how microbes interact in situ, and how microbial communities respond to environmental changes remains an enormous challenge for science. Systems biology offers a powerful experimental strategy to tackle the exciting task of deciphering microbial interactions. In this framework, entire microbial communities are considered as metaorganisms and each level of biological information (DNA, RNA, proteins and metabolites) is investigated along with in situ environmental characteristics. In this way, systems biology can help unravel the interactions between the different parts of an ecosystem ultimately responsible for its emergent properties. Indeed each level of biological information provides a different level of characterisation of the microbial communities. Metagenomics, metatranscriptomics, metaproteomics, metabolomics and SIP-omics can be employed to investigate collectively microbial community structure, potential, function, activity and interactions. Omics approaches are enabled by high-throughput 21st century technologies and this review will discuss how their implementation has revolutionised our understanding of microbial communities.
    Computational and Structural Biotechnology Journal 12/2015; 13:24–32. DOI:10.1016/j.csbj.2014.11.009
    • "Metatranscriptome analyses typically include the assignment of the predicted function and taxonomic origin of RNA-seq reads, by directly searching metatranscriptomic sequences (bags of reads) against prokaryotic genomes (the reference genomes) (Leimena et al., 2013) or known protein sequences (Franzosa et al., 2014). This way, tools and pipelines—including MG-RAST (Meyer et al., 2008), MEGAN (Huson et al., 2011) and HUMAnN (Abubucker et al., 2012)—that have been developed for metagenome data analysis can be utilized for analyzing metatranscriptomic "
    ABSTRACT: Metagenomics research has accelerated the studies of microbial organisms, providing insights into the composition and potential functionality of various microbial communities. Metatranscriptomics (studies of the transcripts from a mixture of microbial species) and other meta-omics approaches hold even greater promise for providing additional insights into functional and regulatory characteristics of the microbial communities. Current metatranscriptomics projects are often carried out without matched metagenomic datasets (of the same microbial communities). For the projects that produce both metatranscriptomic and metagenomic datasets, their analyses are often not integrated. Metagenome assemblies are far from perfect, partially explaining why metagenome assemblies are not used for the analysis of metatranscriptomic datasets. Here we report a reads mapping algorithm for mapping of short reads onto a de Bruijn graph of assemblies. A hash table of junction k-mers (k-mers spanning branching structures in the de Bruijn graph) is used to facilitate fast mapping of reads to the graph. We developed an application of this mapping algorithm: a reference based approach to metatranscriptome assembly using graphs of metagenome assembly as the reference. Our results show that this new approach (called TAG) helps to assemble substantially more transcripts that otherwise would have been missed or truncated because of the fragmented nature of the reference metagenome. TAG was implemented in C++ and has been tested extensively on the Linux platform. It is available for download as open source at © The Author(s) 2015. Published by Oxford University Press.
    Bioinformatics 04/2015; DOI:10.1093/bioinformatics/btv510 · 4.98 Impact Factor
    • "In the present study, analysis of 16S amplicon sequencing data was performed using the default settings of PICRUSt (version 0.9.1). The resulting metagenomic data were entered into the HMP unified metabolic analysis network (HUMAnN) (Abubucker et al., 2012) pipeline (version 0.98) to sort individual genes into Kyoto encyclopedia of genes and genomes (KEGG) pathways representing varying proportions of each imputed sample metagenome. "
    ABSTRACT: The development of the infant intestinal microbiome in response to dietary and other exposures may shape long-term metabolic and immune function. We examined differences in the community structure and function of the intestinal microbiome between four feeding groups, exclusively breastfed infants before introduction of solid foods (EBF), non-exclusively breastfed infants before introduction of solid foods (non-EBF), EBF infants after introduction of solid foods (EBF+S), and non-EBF infants after introduction of solid foods (non-EBF+S), and tested whether out-of-home daycare attendance was associated with differences in relative abundance of gut bacteria. Bacterial 16S rRNA amplicon sequencing was performed on 49 stool samples collected longitudinally from a cohort of 9 infants (5 male, 4 female). PICRUSt metabolic inference analysis was used to identify metabolic impacts of feeding practices on the infant gut microbiome. Sequencing data identified significant differences across groups defined by feeding and daycare attendance. Non-EBF and daycare-attending infants had higher diversity and species richness than EBF and non-daycare attending infants. The gut microbiome of EBF infants showed increased proportions of Bifidobacterium and lower abundance of Bacteroidetes and Clostridiales than non-EBF infants. PICRUSt analysis indicated that introduction of solid foods had a marginal impact on the microbiome of EBF infants (24 enzymes overrepresented in EBF+S infants). In contrast, over 200 bacterial gene categories were overrepresented in non-EBF+S compared to non-EBF infants including several bacterial methyl-accepting chemotaxis proteins (MCP) involved in signal transduction. The identified differences between EBF and non-EBF infants suggest that breast milk may provide the gut microbiome with a greater plasticity (despite having a lower phylogenetic diversity) that eases the transition into solid foods.
    Frontiers in Cellular and Infection Microbiology 02/2015; 5. DOI:10.3389/fcimb.2015.00003 · 3.72 Impact Factor
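
The parsimony idea quoted in the first citing excerpt (reporting the minimum set of pathways that accounts for all detected protein families, instead of every pathway a detected protein might belong to) can be illustrated with a small Python sketch. Finding an exact minimum set cover is NP-hard, so this example uses the standard greedy approximation. It is not the procedure used by the HUMAnN pipeline; the function name `minimal_pathway_set` and all family and pathway identifiers are hypothetical.

```python
def minimal_pathway_set(detected, pathways):
    """Greedy approximation of a minimum pathway set.

    detected: iterable of detected protein family ids.
    pathways: dict mapping pathway name -> set of member family ids.
    Returns pathway names in the order they were chosen. Families that
    belong to no known pathway are ignored.
    """
    known = set().union(*pathways.values()) if pathways else set()
    uncovered = set(detected) & known
    chosen = []
    while uncovered:
        # Pick the pathway explaining the most still-unexplained families;
        # iterating over sorted names makes tie-breaking deterministic.
        best = max(sorted(pathways), key=lambda p: len(uncovered & set(pathways[p])))
        chosen.append(best)
        uncovered -= set(pathways[best])
    return chosen


if __name__ == "__main__":
    detected = {"K1", "K2", "K3"}
    pathways = {
        "P_a": {"K1", "K2"},
        "P_b": {"K2", "K3"},
        "P_c": {"K2"},
        "P_d": {"K1", "K2", "K3", "K9"},
    }
    # Naive mapping would report all four pathways, since each contains
    # at least one detected family.
    print(minimal_pathway_set(detected, pathways))
```

With this toy input the greedy selection returns a single pathway, whereas naive mapping would count four, which is the overestimation of functional diversity the excerpt warns about.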