Publications (28)
363.5 Total impact
Article: 'Candidatus Thermochlorobacter aerophilum:' an aerobic chlorophotoheterotrophic member of the phylum Chlorobi defined by metagenomics and metatranscriptomics.
ABSTRACT: An uncultured member of the phylum Chlorobi, provisionally named 'Candidatus Thermochlorobacter aerophilum', occurs in the microbial mats of alkaline siliceous hot springs at the Yellowstone National Park. 'Ca. T. aerophilum' was investigated through metagenomic and metatranscriptomic approaches. 'Ca. T. aerophilum' is a member of a novel, family-level lineage of Chlorobi, a chlorophototroph that synthesizes type-1 reaction centers and chlorosomes similar to cultivated relatives among the green sulfur bacteria, but is otherwise very different physiologically. 'Ca. T. aerophilum' is proposed to be an aerobic photoheterotroph that cannot oxidize sulfur compounds, cannot fix N2, and does not fix CO2 autotrophically. Metagenomic analyses suggest that 'Ca. T. aerophilum' depends on other mat organisms for fixed carbon and nitrogen, several amino acids, and other important nutrients. The failure to detect bchU suggests that 'Ca. T. aerophilum' synthesizes bacteriochlorophyll (BChl) d, and thus it occupies a different ecological niche than other chlorosome-containing chlorophototrophs in the mat. Transcription profiling throughout a diel cycle revealed distinctive gene expression patterns. Although 'Ca. T. aerophilum' probably photoassimilates organic carbon sources and synthesizes most of its cell materials during the day, it mainly transcribes genes for BChl synthesis during late afternoon and early morning, and it synthesizes and assembles its photosynthetic apparatus during the night.
The ISME Journal 03/2012; 6(10):1869-82. · 7.38 Impact Factor
Article: Genomic insights to SAR86, an abundant and uncultivated marine bacterial lineage
ABSTRACT: Bacteria in the 16S rRNA clade SAR86 are among the most abundant uncultivated constituents of microbial assemblages in the surface ocean for which little genomic information is currently available. Bioinformatic techniques were used to assemble two nearly complete genomes from marine metagenomes and single-cell sequencing provided two more partial genomes. Recruitment of metagenomic data shows that these SAR86 genomes substantially increase our knowledge of non-photosynthetic bacteria in the surface ocean. Phylogenomic analyses establish SAR86 as a basal and divergent lineage of γ-proteobacteria, and the individual genomes display a temperature-dependent distribution. Modestly sized at 1.25–1.7 Mbp, the SAR86 genomes lack several pathways for amino-acid and vitamin synthesis as well as sulfate reduction, trends commonly observed in other abundant marine microbes. SAR86 appears to be an aerobic chemoheterotroph with the potential for proteorhodopsin-based ATP generation, though the apparent lack of a retinal biosynthesis pathway may require it to scavenge exogenously-derived pigments to utilize proteorhodopsin. The genomes contain an expanded capacity for the degradation of lipids and carbohydrates acquired using a wealth of tonB-dependent outer membrane receptors. Like the abundant planktonic marine bacterial clade SAR11, SAR86 exhibits metabolic streamlining, but also a distinct carbon compound specialization, possibly avoiding competition.
Keywords: SAR86; SAR11; metagenomic assembly; single cell genomics; proteorhodopsin; tonB receptors
The ISME Journal 12/2011; 6(6):1186-1199. · 7.38 Impact Factor
Article: Efficient de novo assembly of single-cell bacterial genomes from short-read data sets.
ABSTRACT: Whole genome amplification by the multiple displacement amplification (MDA) method allows sequencing of DNA from single cells of bacteria that cannot be cultured. Assembling a genome is challenging, however, because MDA generates highly nonuniform coverage of the genome. Here we describe an algorithm tailored for short-read data from single cells that improves assembly through the use of a progressively increasing coverage cutoff. Assembly of reads from single Escherichia coli and Staphylococcus aureus cells captures >91% of genes within contigs, approaching the 95% captured from an assembly based on many E. coli cells. We apply this method to assemble a genome from a single cell of an uncultivated SAR324 clade of Deltaproteobacteria, a cosmopolitan bacterial lineage in the global ocean. Metabolic reconstruction suggests that SAR324 is aerobic, motile and chemotaxic. Our approach enables acquisition of genome assemblies for individual uncultivated bacteria using only short reads, providing cell-specific genetic information absent from metagenomic studies.
Nature Biotechnology 09/2011; 29(10):915-21. · 29.50 Impact Factor
Article: Metatranscriptomic analyses of chlorophototrophs of a hot-spring microbial mat.
ABSTRACT: The phototrophic microbial mat community of Mushroom Spring, an alkaline siliceous hot spring in Yellowstone National Park, was studied by metatranscriptomic methods. RNA was extracted from mat specimens collected at four timepoints during light-to-dark and dark-to-light transitions in one diel cycle, and these RNA samples were analyzed by both pyrosequencing and SOLiD technologies. Pyrosequencing was used to assess the community composition, which showed that ~84% of the rRNA was derived from members of four kingdoms Cyanobacteria, Chloroflexi, Chlorobi and Acidobacteria. Transcription of photosynthesis-related genes conclusively demonstrated the phototrophic nature of two newly discovered populations; these organisms, which were discovered through metagenomics, are currently uncultured and previously undescribed members of Chloroflexi and Chlorobi. Data sets produced by SOLiD sequencing of complementary DNA provided >100-fold greater sequence coverage. The much greater sequencing depth allowed transcripts to be detected from ~15,000 genes and could be used to demonstrate statistically significant differential transcription of thousands of genes. Temporal differences for in situ transcription patterns of photosynthesis-related genes suggested that the six types of chlorophototrophs in the mats may use different strategies for maximizing their solar-energy capture, usage and growth. On the basis of both temporal pattern and transcript abundance, intra-guild gene expression differences were also detected for two populations of the oxygenic photosynthesis guild. This study showed that, when community-relevant genomes and metagenomes are available, SOLiD sequencing technology can be used for metatranscriptomic analyses, and the results suggested that this method can potentially reveal new insights into the ecophysiology of this model microbial community.
The ISME Journal 06/2011; 5(8):1279-90. · 7.38 Impact Factor
Article: Community ecology of hot spring cyanobacterial mats: predominant populations and their functional potential.
ABSTRACT: Phototrophic microbial mat communities from 60°C and 65°C regions in the effluent channels of Mushroom and Octopus Springs (Yellowstone National Park, WY, USA) were investigated by shotgun metagenomic sequencing. Analyses of assembled metagenomic sequences resolved six dominant chlorophototrophic populations and permitted the discovery and characterization of undescribed but predominant community members and their physiological potential. Linkage of phylogenetic marker genes and functional genes showed novel chlorophototrophic bacteria belonging to uncharacterized lineages within the order Chlorobiales and within the Kingdom Chloroflexi. The latter is the first chlorophototrophic member of Kingdom Chloroflexi that lies outside the monophyletic group of chlorophototrophs of the Order Chloroflexales. Direct comparison of unassembled metagenomic sequences to genomes of representative isolates showed extensive genetic diversity, genomic rearrangements and novel physiological potential in native populations as compared with genomic references. Synechococcus spp. metagenomic sequences showed a high degree of synteny with the reference genomes of Synechococcus spp. strains A and B', but synteny declined with decreasing sequence relatedness to these references. There was evidence of horizontal gene transfer among native populations, but the frequency of these events was inversely proportional to phylogenetic relatedness.
The ISME Journal 06/2011; 5(8):1262-78. · 7.38 Impact Factor
Article: Stalking the fourth domain in metagenomic data: searching for, discovering, and interpreting novel, deep branches in marker gene phylogenetic trees.
ABSTRACT: Most of our knowledge about the ancient evolutionary history of organisms has been derived from data associated with specific known organisms (i.e., organisms that we can study directly such as plants, metazoans, and culturable microbes). Recently, however, a new source of data for such studies has arrived: DNA sequence data generated directly from environmental samples. Such metagenomic data has enormous potential in a variety of areas including, as we argue here, in studies of very early events in the evolution of gene families and of species. We designed and implemented new methods for analyzing metagenomic data and used them to search the Global Ocean Sampling (GOS) expedition data set for novel lineages in three gene families commonly used in phylogenetic studies of known and unknown organisms: small subunit rRNA and the recA and rpoB superfamilies. Though the methods available could not accurately identify very deeply branched ss-rRNAs (largely due to difficulties in making robust sequence alignments for novel rRNA fragments), our analysis revealed the existence of multiple novel branches in the recA and rpoB gene families. Analysis of available sequence data likely from the same genomes as these novel recA and rpoB homologs was then used to further characterize the possible organismal source of the novel sequences. Of the novel recA and rpoB homologs identified in the metagenomic data, some likely come from uncharacterized viruses while others may represent ancient paralogs not yet seen in any cultured organism. A third possibility is that some come from novel cellular lineages that are only distantly related to any organisms for which sequence data is currently available. If there exist any major, but so-far-undiscovered, deeply branching lineages in the tree of life, we suggest that methods such as those described herein currently offer the best way to search for them.
PLoS ONE 01/2011; 6(3):e18011. · 4.09 Impact Factor
Article: Genomic and functional adaptation in surface ocean planktonic prokaryotes
ABSTRACT: The understanding of marine microbial ecology and metabolism has been hampered by the paucity of sequenced reference genomes. To this end, we report the sequencing of 137 diverse marine isolates collected from around the world. We analysed these sequences, along with previously published marine prokaryotic genomes, in the context of marine metagenomic data, to gain insights into the ecology of the surface ocean prokaryotic picoplankton (0.1-3.0 μm size range). The results suggest that the sequenced genomes define two microbial groups: one composed of only a few taxa that are nearly always abundant in picoplanktonic communities, and the other consisting of many microbial taxa that are rarely abundant. The genomic content of the second group suggests that these microbes are capable of slow growth and survival in energy-limited environments, and rapid growth in energy-rich environments. By contrast, the abundant and cosmopolitan picoplanktonic prokaryotes for which there is genomic representation have smaller genomes, are probably capable of only slow growth and seem to be relatively unable to sense or rapidly acclimate to energy-rich conditions. Their genomic features also lead us to propose that one method used to avoid predation by viruses and/or bacterivores is by means of slow growth and the maintenance of low biomass.
Nature 11/2010; 468(7320):60-66. · 36.28 Impact Factor
Article: METAREP: JCVI metagenomics reports--an open source tool for high-performance comparative metagenomics.
ABSTRACT: JCVI Metagenomics Reports (METAREP) is a Web 2.0 application designed to help scientists analyze and compare annotated metagenomics datasets. It utilizes Solr/Lucene, a high-performance scalable search engine, to quickly query large data collections. Furthermore, users can use its SQL-like query syntax to filter and refine datasets. METAREP provides graphical summaries for top taxonomic and functional classifications as well as a GO, NCBI Taxonomy and KEGG Pathway Browser. Users can compare absolute and relative counts of multiple datasets at various functional and taxonomic levels. Advanced comparative features comprise statistical tests as well as multidimensional scaling, heatmap and hierarchical clustering plots. Summaries can be exported as tab-delimited files, publication quality plots in PDF format. A data management layer allows collaborative data analysis and result sharing. Web site: http://www.jcvi.org/metarep; source code: http://github.com/jcvi/METAREP. Contact: syooseph@jcvi.org. Supplementary data are available at Bioinformatics online.
Bioinformatics 10/2010; 26(20):2631-2. · 5.47 Impact Factor
Article: Characterization of Prochlorococcus clades from iron-depleted oceanic regions.
ABSTRACT: Prochlorococcus describes a diverse and abundant genus of marine photosynthetic microbes. It is primarily found in oligotrophic waters across the globe and plays a crucial role in energy and nutrient cycling in the ocean ecosystem. The abundance, global distribution, and availability of isolates make Prochlorococcus a model system for understanding marine microbial diversity and biogeochemical cycling. Analysis of 73 metagenomic samples from the Global Ocean Sampling expedition acquired in the Atlantic, Pacific, and Indian Oceans revealed the presence of two uncharacterized Prochlorococcus clades. A phylogenetic analysis using six different genetic markers places the clades close to known lineages adapted to high-light environments. The two uncharacterized clades consistently cooccur and dominate the surface waters of high-temperature, macronutrient-replete, and low-iron regions of the Eastern Equatorial Pacific upwelling and the tropical Indian Ocean. They are genetically distinct from each other and other high-light Prochlorococcus isolates and likely define a previously unrecognized ecotype. Our detailed genomic analysis indicates that these clades comprise organisms that are adapted to iron-depleted environments by reducing their iron quota through the loss of several iron-containing proteins that likely function as electron sinks in the photosynthetic pathway in other Prochlorococcus clades from high-light environments. The presence and inferred physiology of these clades may explain why Prochlorococcus populations from iron-depleted regions do not respond to iron fertilization experiments and further expand our understanding of how phytoplankton adapt to variations in nutrient availability in the ocean.
Proceedings of the National Academy of Sciences 09/2010; 107(37):16184-9. · 9.68 Impact Factor
Article: A catalog of reference genomes from the human microbiome.
ABSTRACT: The human microbiome refers to the community of microorganisms, including prokaryotes, viruses, and microbial eukaryotes, that populate the human body. The National Institutes of Health launched an initiative that focuses on describing the diversity of microbial species that are associated with health and disease. The first phase of this initiative includes the sequencing of hundreds of microbial reference genomes, coupled to metagenomic sequencing from multiple body sites. Here we present results from an initial reference genome sequencing of 178 microbial genomes. From 547,968 predicted polypeptides that correspond to the gene complement of these strains, previously unidentified ("novel") polypeptides that had both unmasked sequence length greater than 100 amino acids and no BLASTP match to any nonreference entry in the nonredundant subset were defined. This analysis resulted in a set of 30,867 polypeptides, of which 29,987 (approximately 97%) were unique. In addition, this set of microbial genomes allows for approximately 40% of random sequences from the microbiome of the gastrointestinal tract to be associated with organisms based on the match criteria used. Insights into pan-genome analysis suggest that we are still far from saturating microbial species genetic data sets. In addition, the associated metrics and standards used by our group for quality assurance are presented.
Science 05/2010; 328(5981):994-9. · 31.20 Impact Factor
Article: The JCVI standard operating procedure for annotating prokaryotic metagenomic shotgun sequencing data.
ABSTRACT: The JCVI metagenomics analysis pipeline provides for the efficient and consistent annotation of shotgun metagenomics sequencing data for sampling communities of prokaryotic organisms. The process can be equally applied to individual sequence reads from traditional Sanger capillary electrophoresis sequences, newer technologies such as 454 pyrosequencing, or sequence assemblies derived from one or more of these data types. It includes the analysis of both coding and non-coding genes, whether full-length or, as is often the case for shotgun metagenomics, fragmentary. The system is designed to provide the best-supported conservative functional annotation based on a combination of trusted homology-based scientific evidence and computational assertions and an annotation value hierarchy established through extensive manual curation. The functional annotation attributes assigned by this system include gene name, gene symbol, GO terms, EC numbers, and JCVI functional role categories.
Standards in Genomic Sciences 01/2010; 2(2):229-37. · 1.62 Impact Factor
Article: Sites Inferred by Metabolic Background Assertion Labeling (SIMBAL): adapting the Partial Phylogenetic Profiling algorithm to scan sequences for signatures that predict protein function.
ABSTRACT: Comparative genomics methods such as phylogenetic profiling can mine powerful inferences from inherently noisy biological data sets. We introduce Sites Inferred by Metabolic Background Assertion Labeling (SIMBAL), a method that applies the Partial Phylogenetic Profiling (PPP) approach locally within a protein sequence to discover short sequence signatures associated with functional sites. The approach is based on the basic scoring mechanism employed by PPP, namely the use of binomial distribution statistics to optimize sequence similarity cutoffs during searches of partitioned training sets. Here we illustrate and validate the ability of the SIMBAL method to find functionally relevant short sequence signatures by application to two well-characterized protein families. In the first example, we partitioned a family of ABC permeases using a metabolic background property (urea utilization). Thus, the TRUE set for this family comprised members whose genome of origin encoded a urea utilization system. By moving a sliding window across the sequence of a permease, and searching each subsequence in turn against the full set of partitioned proteins, the method found which local sequence signatures best correlated with the urea utilization trait. Mapping of SIMBAL "hot spots" onto crystal structures of homologous permeases reveals that the significant sites are gating determinants on the cytosolic face rather than, say, docking sites for the substrate-binding protein on the extracellular face. In the second example, we partitioned a protein methyltransferase family using gene proximity as a criterion. In this case, the TRUE set comprised those methyltransferases encoded near the gene for the substrate RF-1. SIMBAL identifies sequence regions that map onto the substrate-binding interface while ignoring regions involved in the methyltransferase reaction mechanism in general. Neither method for training set construction requires any prior experimental characterization. 
SIMBAL shows that, in functionally divergent protein families, selected short sequences often significantly outperform their full-length parent sequence for making functional predictions by sequence similarity, suggesting avenues for improved functional classifiers. When combined with structural data, SIMBAL affords the ability to localize and model functional sites.
BMC Bioinformatics 01/2010; 11:52. · 2.75 Impact Factor
Article: METAREP: JCVI metagenomics reports - an open source tool for high-performance comparative metagenomics.
Bioinformatics. 01/2010; 26:2631-2632.
Article: Metagenomes from high-temperature chemotrophic systems reveal geochemical controls on microbial community structure and function.
ABSTRACT: The Yellowstone caldera contains the most numerous and diverse geothermal systems on Earth, yielding an extensive array of unique high-temperature environments that host a variety of deeply-rooted and understudied Archaea, Bacteria and Eukarya. The combination of extreme temperature and chemical conditions encountered in geothermal environments often results in considerably less microbial diversity than other terrestrial habitats and offers a tremendous opportunity for studying the structure and function of indigenous microbial communities and for establishing linkages between putative metabolisms and element cycling. Metagenome sequence (14-15,000 Sanger reads per site) was obtained for five high-temperature (>65°C) chemotrophic microbial communities sampled from geothermal springs (or pools) in Yellowstone National Park (YNP) that exhibit a wide range in geochemistry including pH, dissolved sulfide, dissolved oxygen and ferrous iron. Metagenome data revealed significant differences in the predominant phyla associated with each of these geochemical environments. Novel members of the Sulfolobales are dominant in low pH environments, while other Crenarchaeota including distantly-related Thermoproteales and Desulfurococcales populations dominate in suboxic sulfidic sediments. Several novel archaeal groups are well represented in an acidic (pH 3) Fe-oxyhydroxide mat, where a higher O2 influx is accompanied with an increase in archaeal diversity. The presence or absence of genes and pathways important in S oxidation-reduction, H2-oxidation, and aerobic respiration (terminal oxidation) provide insight regarding the metabolic strategies of indigenous organisms present in geothermal systems. Multiple-pathway and protein-specific functional analysis of metagenome sequence data corroborated results from phylogenetic analyses and clearly demonstrate major differences in metabolic potential across sites.
The distribution of functional genes involved in electron transport is consistent with the hypothesis that geochemical parameters (e.g., pH, sulfide, Fe, O2) control microbial community structure and function in YNP geothermal springs.
PLoS ONE 01/2010; 5(3):e9773. · 4.09 Impact Factor
Article: The Sorcerer II Global Ocean Sampling Expedition: metagenomic characterization of viruses within aquatic microbial samples.
ABSTRACT: Viruses are the most abundant biological entities on our planet. Interactions between viruses and their hosts impact several important biological processes in the world's oceans such as horizontal gene transfer, microbial diversity and biogeochemical cycling. Interrogation of microbial metagenomic sequence data collected as part of the Sorcerer II Global Ocean Expedition (GOS) revealed a high abundance of viral sequences, representing approximately 3% of the total predicted proteins. Cluster analyses of the viral sequences revealed hundreds to thousands of viral genes encoding various metabolic and cellular functions. Quantitative analyses of viral genes of host origin performed on the viral fraction of aquatic samples confirmed the viral nature of these sequences and suggested that significant portions of aquatic viral communities behave as reservoirs of such genetic material. Distributional and phylogenetic analyses of these host-derived viral sequences also suggested that viral acquisition of environmentally relevant genes of host origin is a more abundant and widespread phenomenon than previously appreciated. The predominant viral sequences identified within microbial fractions originated from tailed bacteriophages and exhibited varying global distributions according to viral family. Recruitment of GOS viral sequence fragments against 27 complete aquatic viral genomes revealed that only one reference bacteriophage genome was highly abundant and was closely related, but not identical, to the cyanomyovirus P-SSM4. The co-distribution across all sampling sites of P-SSM4-like sequences with the dominant ecotype of its host, Prochlorococcus supports the classification of the viral sequences as P-SSM4-like and suggests that this virus may influence the abundance, distribution and diversity of one of the most dominant components of picophytoplankton in oligotrophic oceans. 
In summary, the abundance and broad geographical distribution of viral sequences within microbial fractions, the prevalence of genes among viral sequences that encode microbial physiological function and their distinct phylogenetic distribution lend strong support to the notion that viral-mediated gene acquisition is a common and ongoing mechanism for generating microbial diversity in the marine environment.
PLoS ONE 02/2008; 3(1):e1456. · 4.09 Impact Factor
Article: Viral photosynthetic reaction center genes and transcripts in the marine environment.
ABSTRACT: Cyanobacteria of the genera Synechococcus and Prochlorococcus are important contributors to photosynthetic productivity in the open ocean. The discovery of genes (psbA, psbD) that encode key photosystem II proteins (D1, D2) in the genomes of phages that infect these cyanobacteria suggests new paradigms for the regulation, function and evolution of photosynthesis in the vast pelagic ecosystem. Reports on the prevalence and expression of phage photosynthesis genes, and evolutionary data showing a potential recombination of phage and host genes, suggest a model in which phage photosynthesis genes help support photosynthetic activity in their hosts during the infection process. Here, using metagenomic data in natural ocean samples, we show that about 60% of the psbA genes in surface water along the global ocean sampling transect are of phage origin, and that the phage genes are undergoing an independent selection for distinct D1 proteins. Furthermore, we show that different viral psbA genes are expressed in the environment.
The ISME Journal 11/2007; 1(6):492-501. · 7.38 Impact Factor
Article: Assessing diversity and biogeography of aerobic anoxygenic phototrophic bacteria in surface waters of the Atlantic and Pacific Oceans using the Global Ocean Sampling expedition metagenomes.
ABSTRACT: Aerobic anoxygenic photosynthetic bacteria (AAnP) were recently proposed to be significant contributors to global oceanic carbon and energy cycles. However, AAnP abundance, spatial distribution, diversity and potential ecological importance remain poorly understood. Here we present metagenomic data from the Global Ocean Sampling expedition indicating that AAnP diversity and abundance vary in different oceanic regions. Furthermore, we show for the first time that the composition of AAnP assemblages changes between different oceanic regions, with specific bacterial assemblages adapted to open ocean or coastal areas, respectively. Our results support the notion that marine AAnP populations are complex and dynamic, and compose an important fraction of bacterioplankton assemblages in certain oceanic areas.
Environmental Microbiology 07/2007; 9(6):1464-75. · 5.84 Impact Factor
Article: The Sorcerer II Global Ocean Sampling expedition: expanding the universe of protein families.
ABSTRACT: Metagenomics projects based on shotgun sequencing of populations of micro-organisms yield insight into protein families. We used sequence similarity clustering to explore proteins with a comprehensive dataset consisting of sequences from available databases together with 6.12 million proteins predicted from an assembly of 7.7 million Global Ocean Sampling (GOS) sequences. The GOS dataset covers nearly all known prokaryotic protein families. A total of 3,995 medium- and large-sized clusters consisting of only GOS sequences are identified, out of which 1,700 have no detectable homology to known families. The GOS-only clusters contain a higher than expected proportion of sequences of viral origin, thus reflecting a poor sampling of viral diversity until now. Protein domain distributions in the GOS dataset and current protein databases show distinct biases. Several protein domains that were previously categorized as kingdom specific are shown to have GOS examples in other kingdoms. About 6,000 sequences (ORFans) from the literature that heretofore lacked similarity to known proteins have matches in the GOS data. The GOS dataset is also used to improve remote homology detection. Overall, besides nearly doubling the number of current proteins, the predicted GOS proteins also add a great deal of diversity to known protein families and shed light on their evolution. These observations are illustrated using several protein families, including phosphatases, proteases, ultraviolet-irradiation DNA damage repair enzymes, glutamine synthetase, and RuBisCO. The diversity added by GOS data has implications for choosing targets for experimental structure characterization as part of structural genomics efforts. Our analysis indicates that new families are being discovered at a rate that is linear or almost linear with the addition of new sequences, implying that we are still far from discovering all protein families in nature.
PLoS Biology 04/2007; 5(3):e16. · 11.45 Impact Factor
Article: The Sorcerer II Global Ocean Sampling expedition: northwest Atlantic through eastern tropical Pacific.
ABSTRACT: The world's oceans contain a complex mixture of micro-organisms that are for the most part, uncharacterized both genetically and biochemically. We report here a metagenomic study of the marine planktonic microbiota in which surface (mostly marine) water samples were analyzed as part of the Sorcerer II Global Ocean Sampling expedition. These samples, collected across a several-thousand km transect from the North Atlantic through the Panama Canal and ending in the South Pacific yielded an extensive dataset consisting of 7.7 million sequencing reads (6.3 billion bp). Though a few major microbial clades dominate the planktonic marine niche, the dataset contains great diversity with 85% of the assembled sequence and 57% of the unassembled data being unique at a 98% sequence identity cutoff. Using the metadata associated with each sample and sequencing library, we developed new comparative genomic and assembly methods. One comparative genomic method, termed "fragment recruitment," addressed questions of genome structure, evolution, and taxonomic or phylogenetic diversity, as well as the biochemical diversity of genes and gene families. A second method, termed "extreme assembly," made possible the assembly and reconstruction of large segments of abundant but clearly nonclonal organisms. Within all abundant populations analyzed, we found extensive intra-ribotype diversity in several forms: (1) extensive sequence variation within orthologous regions throughout a given genome; despite coverage of individual ribotypes approaching 500-fold, most individual sequencing reads are unique; (2) numerous changes in gene content some with direct adaptive implications; and (3) hypervariable genomic islands that are too variable to assemble. The intra-ribotype diversity is organized into genetically isolated populations that have overlapping but independent distributions, implying distinct environmental preference. 
We present novel methods for measuring the genomic similarity between metagenomic samples and show how they may be grouped into several community types. Specific functional adaptations can be identified both within individual ribotypes and across the entire community, including proteorhodopsin spectral tuning and the presence or absence of the phosphate-binding gene PstS.
PLoS Biology 04/2007; 5(3):e77. · 11.45 Impact Factor
Top Journals
- Science (5)
- The ISME Journal (3)
- PLoS ONE (3)
- Nature (2)
- PLoS Biology (2)
Institutions
- 2007–2011: J. Craig Venter Institute, Rockville, MD, USA