ABSTRACT: Stratified sulfurous lakes are appropriate environments for studying the links between composition and functionality in microbial communities and are potentially modern analogs of anoxic conditions prevailing in the ancient ocean. We explored these aspects in the Lake Banyoles karstic area (NE Spain) through metagenomics and in silico reconstruction of carbon, nitrogen and sulfur metabolic pathways, which were tightly coupled through a few bacterial groups. The potential for nitrogen fixation and denitrification was detected in both autotrophs and heterotrophs, with a major role for nitrogen and carbon fixation in Chlorobiaceae. Campylobacterales accounted for a large percentage of denitrification genes, while Gallionellales were putatively involved in denitrification, iron oxidation and carbon fixation and may have a major role in the biogeochemistry of the iron cycle. Bacteroidales were also abundant and showed potential for dissimilatory nitrate reduction to ammonium. The very low abundance of genes for nitrification, the minor presence of anammox genes, the high potential for nitrogen fixation and mineralization and the potential for chemotrophic CO2 fixation and CO oxidation all provide potential clues to how the anoxic zones function. We observed a higher gene abundance of ammonia-oxidizing bacteria than of ammonia-oxidizing archaea, which may reflect a geochemical and evolutionary link related to the dominance of Fe in these environments. Overall, these results offer a more detailed perspective on the microbial ecology of anoxic environments and may help to develop new geochemical proxies to infer biology and chemistry interactions in ancient ecosystems.
The ISME Journal 01/2015; 9(7). DOI:10.1038/ismej.2014.254 · 9.30 Impact Factor
ABSTRACT: Bacterial community composition and functional potential change subtly across gradients in the surface ocean. In contrast, while there are significant phylogenetic divergences between communities from freshwater and marine habitats, the mechanisms underlying this phylogenetic structuring remain unknown. We hypothesized that the functional potential of natural bacterial communities is linked to this striking divide between microbiomes. To test this hypothesis, we explored metagenomic sequencing of microbial communities along a 1,800 km transect in the Baltic Sea area, encompassing a continuous natural salinity gradient from limnic to fully marine conditions. Multivariate statistical analyses showed that salinity is the main determinant of dramatic changes in microbial community composition, but also of large-scale changes in core metabolic functions of bacteria. Strikingly, genetically and metabolically different pathways for key metabolic processes, such as respiration, biosynthesis of quinones and isoprenoids, glycolysis and osmolyte transport, were differentially abundant at high and low salinities. These shifts in functional capacities were observed at multiple taxonomic levels and within dominant bacterial phyla, while bacteria, such as SAR11, were able to adapt to the entire salinity gradient. We propose that the large differences in central metabolism required at high and low salinities dictate the striking divide between freshwater and marine microbiomes, and that the ability to inhabit different salinity regimes evolved early during bacterial phylogenetic differentiation. These findings significantly advance our understanding of microbial distributions and stress the need to incorporate salinity in future climate change models that predict increased levels of precipitation and a reduction in salinity.
PLoS ONE 02/2014; 9(2):e89549. DOI:10.1371/journal.pone.0089549 · 3.23 Impact Factor
ABSTRACT: Understanding the microbial content of the air has important scientific, health, and economic implications. While studies have primarily characterized the taxonomic content of air samples by sequencing the 16S or 18S ribosomal RNA gene, direct analysis of the genomic content of airborne microorganisms has not been possible due to the extremely low density of biological material in airborne environments. We developed sampling and amplification methods to enable adequate DNA recovery to allow metagenomic profiling of air samples collected from indoor and outdoor environments. Air samples were collected from a large urban building, a medical center, a house, and a pier. Analyses of metagenomic data generated from these samples reveal airborne communities with a high degree of diversity and different genera abundance profiles. The identities of many of the taxonomic groups and protein families also allow for the identification of the likely sources of the sampled airborne bacteria.
PLoS ONE 12/2013; 8(12):e81862. DOI:10.1371/journal.pone.0081862 · 3.23 Impact Factor
ABSTRACT: The Proteomics Standard Initiative Common QUery InterfaCe (PSICQUIC) specification was created by the Human Proteome Organization Proteomics Standards Initiative (HUPO-PSI) to enable computational access to molecular-interaction data resources by means of a standard Web Service and query language. Currently providing >150 million binary interaction evidences from 28 servers globally, the PSICQUIC interface allows the concurrent search of multiple molecular-interaction information resources using a single query. Here, we present an extension of the PSICQUIC specification (version 1.3), which has been released to be compliant with the enhanced standards in molecular interactions. The new release also includes a new reference implementation of the PSICQUIC server available to the data providers. It offers augmented web service capabilities and improves the user experience. PSICQUIC has been running for almost 5 years, with a user base growing from only 4 data providers to 28 (April 2013) allowing access to 151 310 109 binary interactions. The power of this web service is shown in PSICQUIC View web application, an example of how to simultaneously query, browse and download results from the different PSICQUIC servers. This application is free and open to all users with no login requirement (http://www.ebi.ac.uk/Tools/webservices/psicquic/view/main.xhtml).
Nucleic Acids Research 05/2013; 41(Web Server issue). DOI:10.1093/nar/gkt392 · 9.11 Impact Factor
ABSTRACT: A variety of microbial communities and their genes (the microbiome) exist throughout the human body, with fundamental roles in human health and disease. The National Institutes of Health (NIH)-funded Human Microbiome Project Consortium has established a population-scale framework to develop metagenomic protocols, resulting in a broad range of quality-controlled resources and data including standardized methods for creating, processing and interpreting distinct types of high-throughput metagenomic data available to the scientific community. Here we present resources from a population of 242 healthy adults sampled at 15 or 18 body sites up to three times, which have generated 5,177 microbial taxonomic profiles from 16S ribosomal RNA genes and over 3.5 terabases of metagenomic sequence so far. In parallel, approximately 800 reference strains isolated from the human body have been sequenced. Collectively, these data represent the largest resource describing the abundance and variety of the human microbiome, while providing a framework for current and future studies.
ABSTRACT: As metagenomic studies continue to increase in number, sequence volume and complexity, the scalability of biological analysis frameworks has become a rate-limiting factor to meaningful data interpretation. To address this issue, we have developed JCVI Metagenomics Reports (METAREP) as an open source tool to query, browse, and compare extremely large volumes of metagenomic annotations. Here we present improvements to this software including the implementation of a dynamic weighting of taxonomic and functional annotation, support for distributed searches, advanced clustering routines, and integration of additional annotation input formats. The utility of these improvements to data interpretation is demonstrated through the application of multiple comparative analysis strategies to shotgun metagenomic data produced by the National Institutes of Health Roadmap for Biomedical Research Human Microbiome Project (HMP) (http://nihroadmap.nih.gov). Specifically, the scalability of the dynamic weighting feature is evaluated and established by its application to the analysis of over 400 million weighted gene annotations derived from 14 billion short reads as predicted by the HMP Unified Metabolic Analysis Network (HUMAnN) pipeline. Further, the capacity of METAREP to facilitate the identification and simultaneous comparison of taxonomic and functional annotations including biological pathway and individual enzyme abundances from hundreds of community samples is demonstrated by providing scenarios that describe how these data can be mined to answer biological questions related to the human microbiome. These strategies provide users with a reference of how to conduct similar large-scale metagenomic analyses using METAREP with their own sequence data, while in this study they reveal insights into the nature and extent of variation in taxonomic and functional profiles across body habitats and individuals. Over one thousand HMP WGS datasets and the latest open source code are available at http://www.jcvi.org/hmp-metarep.
PLoS ONE 06/2012; 7(6):e29044. DOI:10.1371/journal.pone.0029044 · 3.23 Impact Factor
ABSTRACT: Studies of the human microbiome have revealed that even healthy individuals differ remarkably in the microbes that occupy habitats such as the gut, skin and vagina. Much of this diversity remains unexplained, although diet, environment, host genetics and early microbial exposure have all been implicated. Accordingly, to characterize the ecology of human-associated microbial communities, the Human Microbiome Project has analysed the largest cohort and set of distinct, clinically relevant body habitats so far. We found the diversity and abundance of each habitat's signature microbes to vary widely even among healthy subjects, with strong niche specialization both within and among individuals. The project encountered an estimated 81-99% of the genera, enzyme families and community configurations occupied by the healthy Western microbiome. Metagenomic carriage of metabolic pathways was stable among individuals despite variation in community structure, and ethnic/racial background proved to be one of the strongest associations of both pathways and microbes with clinical metadata. These results thus delineate the range of structural and functional configurations normal in the microbial communities of a healthy population, enabling future characterization of the epidemiology, ecology and translational applications of the human microbiome.
ABSTRACT: Microbial communities carry out the majority of the biochemical activity on the planet, and they play integral roles in processes including metabolism and immune homeostasis in the human microbiome. Shotgun sequencing of such communities' metagenomes provides information complementary to organismal abundances from taxonomic markers, but the resulting data typically comprise short reads from hundreds of different organisms and are at best challenging to assemble comparably to single-organism genomes. Here, we describe an alternative approach to infer the functional and metabolic potential of a microbial community metagenome. We determined the gene families and pathways present or absent within a community, as well as their relative abundances, directly from short sequence reads. We validated this methodology using a collection of synthetic metagenomes, recovering the presence and abundance both of large pathways and of small functional modules with high accuracy. We subsequently applied this method, HUMAnN, to the microbial communities of 649 metagenomes drawn from seven primary body sites on 102 individuals as part of the Human Microbiome Project (HMP). This provided a means to compare functional diversity and organismal ecology in the human microbiome, and we determined a core of 24 ubiquitously present modules. Core pathways were often implemented by different enzyme families within different body sites, and 168 functional modules and 196 metabolic pathways varied in metagenomic abundance specifically to one or more niches within the microbiome. These included glycosaminoglycan degradation in the gut, as well as phosphate and amino acid transport linked to host phenotype (vaginal pH) in the posterior fornix. An implementation of our methodology is available at http://huttenhower.sph.harvard.edu/humann. This provides a means to accurately and efficiently characterize microbial metabolic pathways and functional modules directly from high-throughput sequencing reads, enabling the determination of community roles in the HMP cohort and in future metagenomic studies.
ABSTRACT: The International Molecular Exchange (IMEx) consortium is an international collaboration between major public interaction data providers to share literature-curation efforts and make a nonredundant set of protein interactions available in a single search interface on a common website (http://www.imexconsortium.org/). Common curation rules have been developed, and a central registry is used to manage the selection of articles to enter into the dataset. We discuss the advantages of such a service to the user, our quality-control measures and our data-distribution practices.
ABSTRACT: JCVI Metagenomics Reports (METAREP) is a Web 2.0 application designed to help scientists analyze and compare annotated metagenomics datasets. It utilizes Solr/Lucene, a high-performance scalable search engine, to quickly query large data collections. Furthermore, users can use its SQL-like query syntax to filter and refine datasets. METAREP provides graphical summaries for top taxonomic and functional classifications as well as a GO, NCBI Taxonomy and KEGG Pathway Browser. Users can compare absolute and relative counts of multiple datasets at various functional and taxonomic levels. Advanced comparative features comprise statistical tests as well as multidimensional scaling, heatmap and hierarchical clustering plots. Summaries can be exported as tab-delimited files and as publication-quality plots in PDF format. A data management layer allows collaborative data analysis and result sharing.
Web site: http://www.jcvi.org/metarep; source code: http://github.com/jcvi/METAREP. Contact: firstname.lastname@example.org
Supplementary data are available at Bioinformatics online.
ABSTRACT: With the advance of high-throughput genomics and proteomics technologies, it becomes critical to mine and curate protein-protein interaction (PPI) networks from biological research literature. Several PPI knowledge bases have been curated by domain experts but they are far from comprehensive. Observing that PPI-relevant documents can be obtained from PPI knowledge bases recording literature evidences and also that a large number of unlabeled documents (mostly negative) are freely available, we investigated learning from positive and unlabeled data (LPU) and developed an automated system for the retrieval of PPI-relevant articles aiming at assisting the curation of a bacterial PPI knowledge base, MPIDB. Two different approaches of obtaining unlabeled documents were used: one based on PubMed MeSH term search and the other based on an existing knowledge base, UniProtKB. We found unlabeled documents obtained from UniProtKB tend to yield better document classifiers for PPI curation purposes. Our study shows that LPU is a possible scenario for the development of an automated system to retrieve PPI-relevant articles, where there is no requirement for extra annotation effort. Selection of machine learning algorithms and that of unlabeled documents would be critical in constructing an effective LPU-based system.
Keywords: document retrieval; learning from positive and unlabeled; protein-protein interaction
ABSTRACT: The JCVI metagenomics analysis pipeline provides for the efficient and consistent annotation of shotgun metagenomics sequencing data for sampling communities of prokaryotic organisms. The process can be equally applied to individual sequence reads from traditional Sanger capillary electrophoresis sequences, newer technologies such as 454 pyrosequencing, or sequence assemblies derived from one or more of these data types. It includes the analysis of both coding and non-coding genes, whether full-length or, as is often the case for shotgun metagenomics, fragmentary. The system is designed to provide the best-supported conservative functional annotation based on a combination of trusted homology-based scientific evidence and computational assertions and an annotation value hierarchy established through extensive manual curation. The functional annotation attributes assigned by this system include gene name, gene symbol, GO terms, EC numbers, and JCVI functional role categories.
ABSTRACT: Generation of syntactically correct and unambiguous names for proteins is a challenging, yet vital task for functional annotation processes. Proteins are often named based on homology to known proteins, many of which have problematic names. To address the need to generate high-quality protein names, and capture our significant experience correcting protein names manually, we have developed the Protein Naming Utility (PNU, http://www.jcvi.org/pn-utility). The PNU is a web-based database for storing and applying naming rules to identify and correct syntactically incorrect protein names, or to replace synonyms with their preferred name. The PNU allows users to generate and manage collections of naming rules, optionally building upon the growing body of rules generated at the J. Craig Venter Institute (JCVI). Since communities often enforce disparate conventions for naming proteins, the PNU supports grouping rules into user-managed collections. Users can check their protein names against a selected PNU rule collection, generating both statistics and corrected names. The PNU can also be used to correct GenBank table files prior to submission to GenBank. Currently, the database features 3080 manual rules that have been entered by JCVI Bioinformatics Analysts as well as 7458 automatically imported names.
ABSTRACT: Phenotypic data are routinely used to elucidate gene and protein function in most organisms amenable to experimental manipulation. However, although phenotype ontologies exist for many eukaryotic model organisms, no standardized system exists for the capture of phenotypic information in bacteria. We propose to build an Ontology of Microbial Phenotypes and use it to annotate the prokaryotic model organism _Escherichia coli_.
Phenotypes are the observable characteristics of an organism that result from the combination of a particular genotype and a particular environment, and thus are a basic and fundamental aspect of the biology of all organisms. The awesome power of genetics is founded on how the phenotypes of mutant genes, alone and in combination, contribute to understanding the biology of affected systems. To fully exploit the power of phenotypes for functional and comparative genomics, the ability to make comparisons across datasets and systems is vital. Making these comparisons either manually or computationally is hindered by the fact that phenotypes are not described consistently for bacteria. Our project aims to develop annotation infrastructure to improve the ability of microbiologists and bioinformaticians to use both existing and new phenotype information and to capture it in a consistent and standardized manner. This will require two key components: 1) an Ontology of Microbial Phenotypes (OMP) that captures phenotype descriptions in a controlled vocabulary, and 2) a set of evidence codes based on extension of the existing Evidence Code Ontology,^1^ with links to a database of papers and other resources describing the assays used to “measure” these phenotypes.
We have explored two parallel approaches to building the OMP. Both are pre-coordinated approaches that rely on using the terms in the Phenotypic Quality Ontology (PATO) as a basis for building up phenotype terms.^2^ In the first approach we read 100 papers and identified 40 phenotypes described in those papers. We organized the 40 phenotypes into a controlled vocabulary using OBO-Edit.^3^ While this effort was not comprehensive, we were able to classify the 40 phenotypes into five superclasses and assign PATO entities and qualities. In addition, various assays (biochemical, morphological, and physiological) were collected from the papers that were curated to generate phenotype terms. In the second approach we generated a cross product between a selection of PATO terms and two GO nodes relevant to microbial phenotypes, “GO:0044262 : cellular carbohydrate metabolic process” and “GO:0006520 : cellular amino acid metabolic process.” We found the cross product generation method to be quite effective in generating large numbers of relevant terms quickly.
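The cross product generation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' tooling: the list of quality labels is a hypothetical selection ("abolished" and "disrupted" are the two qualities named in this abstract; the other two are added for illustration), the `cross_product_terms` helper and the "quality + process" label format are invented here, and only the two GO nodes come from the text.

```python
from itertools import product

# Hypothetical selection of PATO quality labels (illustrative assumption).
pato_qualities = ["abolished", "disrupted", "increased rate", "decreased rate"]

# The two GO nodes named in the abstract.
go_processes = {
    "GO:0044262": "cellular carbohydrate metabolic process",
    "GO:0006520": "cellular amino acid metabolic process",
}


def cross_product_terms(qualities, processes):
    """Return one candidate phenotype term per (quality, GO process) pair."""
    terms = []
    for quality, (go_id, go_label) in product(qualities, processes.items()):
        terms.append(
            {
                # Assumed label format: "<quality> <GO process label>".
                "label": f"{quality} {go_label}",
                "quality": quality,
                "go_id": go_id,
            }
        )
    return terms


terms = cross_product_terms(pato_qualities, go_processes)
for term in terms:
    print(term["go_id"], term["label"], sep="\t")
```

Because every term is built from the same fixed quality list, two curators running this over the same GO nodes would obtain identical labels, which is the consistency argument made for the pre-coordinated approach.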
The manual and cross product efforts were undertaken independently and in parallel by separate members of the group to see what, if any, consistency would be achieved. We found that although the concepts captured were similar, the different researchers chose different PATO quality terms to represent the same concepts. The manual curator chose “abnormal,” while the person working on cross products chose “abolished” and “disrupted.” The results of this exercise illustrate one reason why the pre-coordinated approach has advantages over the post-coordinated approach. In the post-coordinated approach separate annotators creating phenotype annotations at different points in time may choose different ways of expressing the same concept and thus create inconsistency. In the pre-coordinated approach, one controlled set of PATO terms will be used for term generation, and the fact of storing all the terms in one controlled vocabulary will enforce consistency and uniformity.
If our project is funded, we plan to expand our cross product generation by targeting relevant nodes in the GO and other ontologies. We will extend ECO to include terms that capture the assays used in phenotype analysis. We will apply the OMP and extended ECO to the annotation of _Escherichia coli_ and make the data available using EcoliWiki and other resources.
3. Day-Richter J, Harris MA, Haendel M, The Gene Ontology OBO-Edit Working Group, and Lewis S. OBO-Edit—an ontology editor for biologists. Bioinformatics. 2007;23(16):2198-2200.