Bioinformatics for the Human Microbiome Project

University of California Davis, United States of America
PLoS Computational Biology (Impact Factor: 4.62). 11/2012; 8(11):e1002779. DOI: 10.1371/journal.pcbi.1002779
Source: PubMed


Microbes inhabit virtually all sites of the human body, yet we know very little about the role they play in our health. In recent years, there has been increasing interest in studying human-associated microbial communities, particularly since microbial dysbioses have now been implicated in a number of human diseases [1]–[3]. Dysbiosis, the disruption of the normal microbial community structure, however, is impossible to define without first establishing what “normal microbial community structure” means within the healthy human microbiome. Recent advances in sequencing technologies have made it feasible to perform large-scale studies of microbial communities, providing the tools necessary to begin to address this question [4], [5]. This led to the implementation of the Human Microbiome Project (HMP) in 2007, an initiative funded by the National Institutes of Health Roadmap for Biomedical Research and constructed as a large, genome-scale community research project [6]. Any such project must plan for data analysis, computational methods development, and the public availability of tools and data; here, we provide an overview of the corresponding bioinformatics organization, history, and results from the HMP (Figure 1).


Available from: Dirk Gevers, Dec 15, 2013
  • Source
    ABSTRACT: Understanding the structure, functions, activities and dynamics of microbial communities in natural environments is one of the grand challenges of 21st century science. To address this challenge, over the past decade, numerous technologies have been developed for interrogating microbial communities, of which some are amenable to exploratory work (e.g., high-throughput sequencing and phenotypic screening) and others depend on reference genes or genomes (e.g., phylogenetic and functional gene arrays). Here, we provide a critical review and synthesis of the most commonly applied "open-format" and "closed-format" detection technologies. We discuss their characteristics, advantages, and disadvantages within the context of environmental applications and focus on analysis of complex microbial systems, such as those in soils, in which diversity is high and reference genomes are few. In addition, we discuss crucial issues and considerations associated with applying complementary high-throughput molecular technologies to address important ecological questions. Copyright © 2015 Zhou et al.
    mBio 01/2015; 6(1). DOI:10.1128/mBio.02288-14 · 6.79 Impact Factor
  • Source
    ABSTRACT: The incidence of esophageal adenocarcinoma (EAC) has increased nearly five-fold over the last four decades in the United States. Barrett's esophagus, the replacement of the normal squamous epithelial lining with a mucus-secreting columnar epithelium, is the only known precursor to EAC. Like other parts of the gastrointestinal (GI) tract, the esophagus hosts a variety of bacteria and comparisons among published studies suggest bacterial communities in the stomach and esophagus differ. Chronic infection with Helicobacter pylori in the stomach has been inversely associated with development of EAC, but the mechanisms underlying this association remain unclear. The bacterial composition in the upper GI tract was characterized in a subset of participants (n=12) of the Seattle Barrett's Esophagus Research cohort using broad-range 16S PCR and pyrosequencing of biopsy and brush samples collected from squamous esophagus, Barrett's esophagus, stomach corpus and stomach antrum. Three of the individuals were sampled at two separate time points. Prevalence of H. pylori infection and subsequent development of aneuploidy (n=339) and EAC (n=433) was examined in a larger subset of this cohort. Within individuals, bacterial communities of the stomach and esophagus showed overlapping community membership. Despite closer proximity, the stomach antrum and corpus communities were less similar than the antrum and esophageal samples. Re-sampling of study participants revealed similar upper GI community membership in two of three cases. In this Barrett's esophagus cohort, Streptococcus and Prevotella species dominate the upper GI and the ratio of these two species is associated with waist-to-hip ratio and hiatal hernia length, two known EAC risk factors in Barrett's esophagus. H. pylori-positive individuals had a significantly decreased incidence of aneuploidy and a non-significant trend toward lower incidence of EAC.
    PLoS ONE 06/2015; 10(6):e0129055. DOI:10.1371/journal.pone.0129055 · 3.23 Impact Factor
  • Source
    ABSTRACT: The MetaCyc database provides a comprehensive and freely accessible resource for metabolic pathways and enzymes from all domains of life. The pathways in MetaCyc are experimentally determined, small-molecule metabolic pathways and are curated from the primary scientific literature. MetaCyc contains more than 1800 pathways derived from more than 30,000 publications, and is the largest curated collection of metabolic pathways currently available. Most reactions in MetaCyc pathways are linked to one or more well-characterized enzymes, and both pathways and enzymes are annotated with reviews, evidence codes and literature citations. BioCyc is a collection of more than 1700 organism-specific Pathway/Genome Databases (PGDBs). Each BioCyc PGDB contains the full genome and predicted metabolic network of one organism. The network, which is predicted by the Pathway Tools software using MetaCyc as a reference database, consists of metabolites, enzymes, reactions and metabolic pathways. BioCyc PGDBs contain additional features, including predicted operons, transport systems and pathway-hole fillers. The BioCyc website and Pathway Tools software offer many tools for querying and analysis of PGDBs, including Omics Viewers and comparative analysis. New developments include a zoomable web interface for diagrams; flux-balance analysis model generation from PGDBs; web services; and a new tool called Web Groups.
    Nucleic Acids Research 11/2011; 40(Database issue):D742-53. DOI:10.1093/nar/gkr1014 · 9.11 Impact Factor