ABSTRACT: The human body consists of innumerable multifaceted environments that are predisposed to colonization by a number of distinct microbial communities, which play fundamental roles in human health and disease. In addition to community surveys and shotgun metagenomes that seek to explore the composition and diversity of these microbiomes, there are significant efforts to sequence reference microbial genomes from many body sites of healthy adults. To illustrate the utility of reference genomes when studying more complex metagenomes, we present a reference-based analysis of sequence reads generated from 55 shotgun metagenomes, selected from 5 major body sites, including 16 sub-sites. Interestingly, between 13% and 92% (62.3% on average) of these shotgun reads were aligned to a then-complete list of 2780 reference genomes, including 1583 references for the human microbiome. However, no reference genome was found in all body sites. For any given metagenome, the body site-specific reference genomes, derived from the same body site as the sample, accounted for an average of 58.8% of the mapped reads. While different body sites did differ in their abundant genera, proximal or symmetrical body sites were found to be the most similar to one another. The extent of variation observed, both between individuals sampled within the same microenvironment and at the same site within the same individual over time, calls into question comparative studies across individuals, even when they are sampled at the same body site. This study illustrates the high utility of reference genomes and the need for further site-specific reference microbial genome sequencing, even within the already well-sampled human microbiome.
PLoS ONE 01/2014; 9(1):e84963. · 3.73 Impact Factor
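The two headline percentages above can be computed as in the minimal Python sketch below. The first is the share of reads aligned to any reference. The second is the share of mapped reads attributed to references from the sample's own body site. The sketch assumes an aligner has already assigned each read to at most one reference genome. The function name `mapping_summary`, the input layout, and the toy IDs are hypothetical and do not reproduce the study's pipeline.

```python
def mapping_summary(read_hits, ref_site, sample_site):
    """Return (% of reads mapped, % of mapped reads hitting site-specific references).

    read_hits: one entry per read, a reference genome ID or None if unmapped.
    ref_site: maps reference genome ID -> body site it was isolated from.
    sample_site: body site the metagenome was sampled from.
    """
    mapped = [ref for ref in read_hits if ref is not None]
    site_specific = [ref for ref in mapped if ref_site.get(ref) == sample_site]
    pct_mapped = 100.0 * len(mapped) / len(read_hits) if read_hits else 0.0
    pct_site = 100.0 * len(site_specific) / len(mapped) if mapped else 0.0
    return pct_mapped, pct_site


# Toy example: 8 reads, 5 mapped, 3 of them to references isolated from the oral cavity.
hits = ["r1", "r1", "r2", None, "r3", None, None, "r2"]
sites = {"r1": "oral", "r2": "gut", "r3": "oral"}
print(mapping_summary(hits, sites, "oral"))  # (62.5, 60.0)
```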
ABSTRACT: We compared the classification accuracy of two sections of the fungal Internal Transcribed Spacer (ITS) region, individually and combined, and the 5' section (about 600 bp) of the large-subunit rRNA, using a naïve Bayesian classifier and BLASTN. A hand-curated ITS-LSU training set of 1091 sequences and a larger training set of 8967 ITS region sequences were used. Of the factors evaluated, database composition and quality had the largest effect on classification accuracy, followed by fragment size and use of a bootstrap cutoff to improve classification confidence. The naïve Bayesian classifier and BLASTN gave similar results at higher taxonomic levels, but the classifier was faster and more accurate at the genus level when a bootstrap cutoff was used. All of the ITS and LSU sections performed well (>97.7% accuracy) at higher taxonomic ranks from kingdom to family, and differences between them were small at the genus level (within 0.66-1.23%). When full-length sequence sections were used, the LSU outperformed the ITS1 and ITS2 fragments at the genus level, but ITS1 and ITS2 showed higher accuracy when shorter fragments of equal length and a 50% bootstrap cutoff were used. In a comparison using the larger ITS training set, ITS1 and ITS2 had very similar classification accuracy for fragments between 100 and 200 bp. Collectively, the results show that any of the ITS or LSU sections we tested provided comparable classification accuracy to the genus level, and they underscore the need for larger and more diverse classification training sets.
Applied and Environmental Microbiology 11/2013; · 3.69 Impact Factor
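The naïve Bayesian classification with a bootstrap cutoff described above can be sketched in Python as follows. The sketch is modeled on the publicly described RDP Classifier scheme: overlapping 8-mer words, word-specific priors, and 100 bootstrap trials that each resample one-eighth of the query's words. The function names, `K`, and the default parameters are assumptions, and the code is not the authors' implementation.

```python
import math
import random
from collections import defaultdict

K = 8  # word (k-mer) size; 8 follows the RDP Classifier convention (assumption)


def words(seq, k=K):
    """Distinct overlapping k-mers of a nucleotide sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def train(training):
    """training maps genus name -> list of reference sequences."""
    n_total = sum(len(seqs) for seqs in training.values())
    doc_freq = defaultdict(int)        # sequences containing each word, all genera
    model = {}
    for genus, seqs in training.items():
        counts = defaultdict(int)      # sequences of this genus containing each word
        for s in seqs:
            for w in words(s):
                counts[w] += 1
                doc_freq[w] += 1
        model[genus] = (counts, len(seqs))
    # Word-specific prior, smoothed so that no probability is ever zero.
    prior = {w: (c + 0.5) / (n_total + 1) for w, c in doc_freq.items()}
    return prior, model, n_total


def log_score(query_words, genus, prior, model, n_total):
    """Sum of log P(word | genus) under the naive independence assumption."""
    counts, m = model[genus]
    unseen = 0.5 / (n_total + 1)       # prior for words absent from all training data
    return sum(math.log((counts.get(w, 0) + prior.get(w, unseen)) / (m + 1))
               for w in query_words)


def best_genus(query_words, prior, model, n_total):
    return max(model, key=lambda g: log_score(query_words, g, prior, model, n_total))


def classify(seq, prior, model, n_total, n_boot=100, seed=0):
    """Return (genus, bootstrap confidence in [0, 1])."""
    rng = random.Random(seed)
    qwords = sorted(words(seq))
    if not qwords:
        raise ValueError("sequence shorter than the word size")
    call = best_genus(qwords, prior, model, n_total)
    subset = max(1, len(qwords) // 8)  # each trial resamples one-eighth of the words
    hits = sum(
        best_genus(rng.choices(qwords, k=subset), prior, model, n_total) == call
        for _ in range(n_boot)
    )
    return call, hits / n_boot


def classify_with_cutoff(seq, prior, model, n_total, cutoff=0.5):
    """Report the genus only when bootstrap confidence reaches the cutoff."""
    genus, conf = classify(seq, prior, model, n_total)
    return (genus if conf >= cutoff else "unclassified"), conf
```

With the 50% cutoff used in the comparison, `classify_with_cutoff` reports a genus only when at least half of the bootstrap trials agree with the full-sequence call. A stricter cutoff trades sensitivity for confidence.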
ABSTRACT: Arthrobacter sp. strain FB24 is a member of the genus Arthrobacter Conn and Dimmick 1947, in the family Micrococcaceae and class Actinobacteria. A number of Arthrobacter genome sequences have been completed because of the important role of these organisms in soil, especially in bioremediation. This isolate is of special interest because it is tolerant to multiple metals and extremely resistant to elevated concentrations of chromate. The genome consists of a 4,698,945 bp circular chromosome and three plasmids (96,488, 115,507, and 159,536 bp), for a total of 5,070,478 bp, and encodes 4,536 proteins, of which 1,257 have no known function. This genome was sequenced as part of the DOE Joint Genome Institute Program.
Standards in Genomic Sciences 10/2013; 9(1):106-16. · 2.01 Impact Factor
ABSTRACT: We sequenced the two botulinum toxin gene clusters of Clostridium botulinum strain IBCA10-7060, type Bh. The sequence of bont/H differed substantially from the sequences of the 7 known bont genes for toxin types A-G. The 5' one-third of bont/H, which codes for the botulinum toxin light chain, differed markedly from the light chain coding sequences of toxin types A-G. The 3' two-thirds of bont/H, which codes for the botulinum toxin heavy chain, contained a novel Hn translocation domain coding sequence and a nonneutralizing type A-like Hc binding domain coding sequence. bont/H was part of an orfX toxin gene cluster located at a unique chromosomal site distant from those used by other botulinum toxin gene clusters. The bont/B sequence was similar to that of subtype bont/B2 and was located within its ha toxin gene cluster at the oppA/brnQ site. Our findings further establish that C. botulinum IBCA10-7060 produces a novel BoNT/H.
The Journal of Infectious Diseases 10/2013; · 5.85 Impact Factor
ABSTRACT: We report the sequences of two Klebsiella pneumoniae clinical isolates, strains JHCK1 and VA360, from a newborn with meningitis in Buenos Aires, Argentina, and from a tertiary care medical center in Cleveland, OH, respectively. Both isolates contain one chromosome and at least five plasmids; isolate VA360 contains the Klebsiella pneumoniae carbapenemase (KPC) gene.
ABSTRACT: Sanger and shotgun sequencing of Clostridium botulinum strain Af84 type Af and its botulinum neurotoxin gene (bont) clusters identified three bont gene clusters rather than the expected two. The three toxin gene clusters consisted of bont subtypes A2, F4 and F5. The bont/A2 and bont/F4 gene clusters were located within the chromosome (the latter in a novel location), while the bont/F5 toxin gene cluster was located within a large 246-kb plasmid. These findings represent the first identification of a C. botulinum strain that contains three botulinum neurotoxin gene clusters.
PLoS ONE 01/2013; 8(4):e61205. · 3.73 Impact Factor
ABSTRACT: The Aquificales are thermophilic microorganisms that inhabit hydrothermal systems worldwide and are considered one of the earliest lineages of the domain Bacteria. We analyzed metagenome sequence obtained from six thermal "filamentous streamer" communities (∼40 Mbp per site), which targeted three different groups of Aquificales found in Yellowstone National Park (YNP). Unassembled metagenome sequence and PCR-amplified 16S rRNA gene libraries revealed that acidic, sulfidic sites were dominated by Hydrogenobaculum (Aquificaceae) populations, whereas the circum-neutral pH (6.5-7.8) sites containing dissolved sulfide were dominated by Sulfurihydrogenibium spp. (Hydrogenothermaceae). Thermocrinis (Aquificaceae) populations were found primarily in the circum-neutral sites with undetectable sulfide, and to a lesser extent in one sulfidic system at pH 8. Phylogenetic analysis of assembled sequence containing 16S rRNA genes as well as conserved protein-encoding genes revealed that the composition and function of these communities varied across geochemical conditions. Each Aquificales lineage contained genes for CO2 fixation by the reverse-TCA cycle, but only the Sulfurihydrogenibium populations perform citrate cleavage using ATP citrate lyase (Acl). The Aquificaceae populations use an alternative pathway catalyzed by two separate enzymes, citryl-CoA synthetase (Ccs) and citryl-CoA lyase (Ccl). All three Aquificales lineages contained evidence of aerobic respiration, albeit via completely different types of heme Cu oxidases (subunit I) involved in oxygen reduction. The distribution of Aquificales populations and differences among functional genes involved in energy generation and electron transport are consistent with the hypothesis that geochemical parameters (e.g., pH, sulfide, H2, O2) have resulted in niche specialization among members of the Aquificales.
ABSTRACT: Classification is difficult for shotgun metagenomics data from environments such as soils, where the diversity of sequences is high and where reference sequences from close relatives may not exist. Approaches based on sequence-similarity scores must deal with the confounding effects that inheritance and functional pressures exert on the relation between scores and phylogenetic distance, while approaches based on sequence alignment and tree-building are typically limited to a small fraction of gene families. We describe an approach based on finding one or more exact matches between a read and a precomputed set of peptide 10-mers.
At even the largest phylogenetic distances, thousands of 10-mer peptide exact matches can be found between pairs of bacterial genomes. Genes that share one or more peptide 10-mers typically have high reciprocal BLAST scores. Among a set of 403 representative bacterial genomes, some 20 million 10-mer peptides were found to be shared. We assign each of these peptides as a signature of a particular node in a phylogenetic reference tree based on the RNA polymerase genes. We classify the phylogeny of a genomic fragment (e.g., read) at the most specific node on the reference tree that is consistent with the phylogeny of observed signature peptides it contains. Using both synthetic data from four newly-sequenced soil-bacterium genomes and ten real soil metagenomics data sets, we demonstrate a sensitivity and specificity comparable to that of the MEGAN metagenomics analysis package using BLASTX against the NR database. Phylogenetic and functional similarity metrics applied to real metagenomics data indicate a signal-to-noise ratio of approximately 400 for distinguishing among environments. Our method assigns ~6.6 Gbp/hr on a single CPU, compared with 25 kbp/hr for methods based on BLASTX against the NR database.
Classification by exact matching against a precomputed list of signature peptides provides comparable results to existing techniques for reads longer than about 300 bp and does not degrade severely with shorter reads. Orders of magnitude faster than existing methods, the approach is suitable now for inclusion in analysis pipelines and appears to be extensible in several different directions.
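The placement rule above can be sketched in Python. The sketch makes three assumptions. First, the read has already been translated into a peptide (for example, in all six frames). Second, the reference tree is given as a child-to-parent dictionary. Third, `signatures` maps each shared 10-mer to its tree node. "Most specific consistent node" is read here as the deepest hit when all hits fall on one lineage, and as their lowest common ancestor otherwise. That interpretation is ours rather than a documented detail, and all names are hypothetical.

```python
K = 10  # signature peptide length used in the study


def lineage(node, parent):
    """Path from node up to the root, inclusive (parent maps child -> parent)."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def lca(a, b, parent):
    """Lowest common ancestor of two nodes."""
    above_a = set(lineage(a, parent))
    for node in lineage(b, parent):
        if node in above_a:
            return node
    raise ValueError(f"{a!r} and {b!r} are not in the same tree")


def classify_peptide(peptide, signatures, parent, k=K):
    """Place a translated read at the most specific node consistent with its signatures."""
    hits = {signatures[peptide[i:i + k]]
            for i in range(len(peptide) - k + 1)
            if peptide[i:i + k] in signatures}
    if not hits:
        return None                      # no signature peptide found
    deepest = max(hits, key=lambda n: len(lineage(n, parent)))
    if hits <= set(lineage(deepest, parent)):
        return deepest                   # all hits lie on one root-to-node path
    node = deepest                       # conflicting hits: back off to their LCA
    for other in hits:
        node = lca(node, other, parent)
    return node
```

Each lookup is an exact dictionary hit rather than an alignment, so the work per read grows linearly with its length.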
ABSTRACT: Kingella kingae is a human oral bacterium that can cause infections of the skeletal system in children. The bacterium is also a cardiovascular pathogen causing infective endocarditis in children and adults. We report herein the draft genome sequence of K. kingae strain PYKK081, a septic arthritis isolate.
Journal of Bacteriology 06/2012; 194(11):3017. · 3.94 Impact Factor
ABSTRACT: One form of immune evasion is a developmental state called "persistence" whereby chlamydial pathogens respond to the host-mediated withdrawal of L-tryptophan (Trp). A sophisticated survival mode of reversible quiescence is implemented. A mechanism has evolved which suppresses gene products necessary for rapid pathogen proliferation but allows expression of gene products that underlie the morphological and developmental characteristics of persistence. This switch from one translational profile to an alternative translational profile of newly synthesized proteins is proposed to be accomplished by maximizing the Trp content of some proteins needed for rapid proliferation (e.g., ADP/ATP translocase, hexose-phosphate transporter, phosphoenolpyruvate [PEP] carboxykinase, the Trp transporter, the Pmp protein superfamily for cell adhesion and antigenic variation, and components of the cell division pathway) while minimizing the Trp content of other proteins supporting the state of persistence. The Trp starvation mechanism is best understood in the human-Chlamydia trachomatis relationship, but the similarity of up-Trp and down-Trp proteomic profiles in all of the pathogenic Chlamydiaceae suggests that Trp availability is an underlying cue relied upon by this family of pathogens to trigger developmental transitions. The biochemically expensive pathogen strategy of selectively increased Trp usage to guide the translational profile can be leveraged significantly with minimal overall Trp usage by (i) regional concentration of Trp residue placements, (ii) amplified Trp content of a single protein that is required for expression or maturation of multiple proteins with low Trp content, and (iii) Achilles'-heel vulnerabilities of complex pathways to high Trp content of one or a few enzymes.
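One way to explore up-Trp and down-Trp profiles like those described above is to compare each protein's tryptophan fraction with the proteome-wide frequency. The Python sketch below is illustrative only. The two-fold thresholds, the function names, and the toy sequences are assumptions, not the criteria used in the study.

```python
def clean(seq):
    """Uppercase a protein sequence and drop a trailing stop symbol."""
    return seq.upper().rstrip("*")


def trp_content(seq):
    """Fraction of residues that are tryptophan (W)."""
    s = clean(seq)
    return s.count("W") / len(s) if s else 0.0


def trp_profile(proteome, fold=2.0):
    """Split proteins into Trp-enriched and Trp-depleted sets.

    proteome maps protein name -> amino-acid sequence. A protein is called
    Trp-enriched if its Trp fraction is at least `fold` times the
    proteome-wide Trp frequency, and Trp-depleted if it is at most
    1/`fold` of that frequency (hypothetical thresholds).
    """
    seqs = {name: clean(s) for name, s in proteome.items()}
    total_len = sum(len(s) for s in seqs.values())
    baseline = sum(s.count("W") for s in seqs.values()) / total_len
    up = sorted(n for n, s in seqs.items() if trp_content(s) >= fold * baseline)
    down = sorted(n for n, s in seqs.items() if trp_content(s) <= baseline / fold)
    return baseline, up, down
```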
ABSTRACT: Comparison of genome-wide, high-resolution restriction maps of Klebsiella pneumoniae clinical isolates, including an NDM-1 producer, and in silico-generated restriction maps of sequenced genomes revealed a highly heterogeneous region we designated the 'high heterogeneity zone' (HHZ). The HHZ consists of several regions, including a 'hot spot' prone to insertions and other rearrangements. The HHZ is a characteristic genomic area that can be used in the identification and tracking of outbreak-causing strains.
Clinical Microbiology and Infection 04/2012; 18(7):E254-8. · 4.58 Impact Factor
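The in silico restriction maps mentioned above amount to predicting fragment sizes from the positions of recognition sites in an assembled sequence. The Python sketch below assumes a single enzyme with a palindromic site, so that scanning one strand finds every site. NcoI (C^CATGG) is used only as a placeholder because the enzyme is not named here, and the function names are hypothetical.

```python
def cut_positions(seq, site, cut_offset):
    """0-based cut coordinates for every occurrence of a recognition site."""
    seq, site = seq.upper(), site.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)  # also finds overlapping sites
    return cuts


def fragment_sizes(seq, site="CCATGG", cut_offset=1, circular=False):
    """Predicted fragment lengths, in sequence order, for a single-enzyme digest.

    The default site is NcoI (C^CATGG), a placeholder; it is palindromic, so
    searching the forward strand finds all sites. For circular sequences,
    sites that span the end of the sequence are not detected in this sketch.
    """
    cuts = cut_positions(seq, site, cut_offset)
    if not cuts:
        return [len(seq)]
    if circular:
        sizes = [b - a for a, b in zip(cuts, cuts[1:])]
        sizes.append(len(seq) - cuts[-1] + cuts[0])  # fragment spanning position 0
        return sizes
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]
```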
ABSTRACT: Six terrestrial ecosystems in the USA were exposed to elevated atmospheric CO2 in single or multifactorial experiments for more than a decade to assess potential impacts. We retrospectively assessed soil bacterial community responses in all six field experiments and found ecosystem-specific and common patterns of soil bacterial community response to elevated CO2. Soil bacterial composition differed greatly across the six ecosystems. No common effect of elevated atmospheric CO2 on bacterial biomass, richness or community composition across all of the ecosystems was identified, although significant responses were detected in individual ecosystems. The most striking common trend across the sites was a decrease of up to 3.5-fold in the relative abundance of Acidobacteria Group 1 bacteria in soils exposed to elevated CO2 or other climate factors. The Acidobacteria Group 1 response observed in exploratory 16S rRNA gene clone library surveys was validated in one ecosystem by 100-fold deeper sequencing and semi-quantitative PCR assays. Collectively, the 16S rRNA gene sequencing approach revealed influences of elevated CO2 on multiple ecosystems. Although few common trends across the ecosystems were detected in the small surveys, the trends may be harbingers of more substantive changes in less abundant, more sensitive taxa that can only be detected by deeper surveys. Representative bacterial 16S rRNA gene clone sequences were deposited in GenBank under accession numbers JQ366086–JQ387568.
ABSTRACT: Microbial hydrolysis of polysaccharides is critical to ecosystem functioning and is of great interest in diverse biotechnological applications, such as biofuel production and bioremediation. Here we demonstrate the use of a new, efficient approach to recover genomes of active polysaccharide degraders from natural, complex microbial assemblages, using a combination of fluorescently labeled substrates, fluorescence-activated cell sorting, and single cell genomics. We employed this approach to analyze freshwater and coastal bacterioplankton for degraders of laminarin and xylan, two of the most abundant storage and structural polysaccharides in nature. Our results suggest that a few phylotypes of Verrucomicrobia make a considerable contribution to polysaccharide degradation, although they constituted only a minor fraction of the total microbial community. Genomic sequencing of five cells, representing the most predominant, polysaccharide-active Verrucomicrobia phylotype, revealed significant enrichment in genes encoding a wide spectrum of glycoside hydrolases, sulfatases, peptidases, carbohydrate lyases and esterases, confirming that these organisms were well equipped for the hydrolysis of diverse polysaccharides. Remarkably, this enrichment was on average higher than in the sequenced representatives of Bacteroidetes, which are frequently regarded as highly efficient biopolymer degraders. These findings shed light on the ecological roles of uncultured Verrucomicrobia and suggest specific taxa as promising bioprospecting targets. The employed method offers a powerful tool to rapidly identify and recover discrete genomes of active players in polysaccharide degradation, without the need for cultivation.
PLoS ONE 01/2012; 7(4):e35314. · 3.73 Impact Factor
ABSTRACT: In May of 2011, an enteroaggregative Escherichia coli O104:H4 strain that had acquired a Shiga toxin 2-converting phage caused a large outbreak of bloody diarrhea in Europe which was notable for its high prevalence of hemolytic uremic syndrome cases. Several studies have described the genomic inventory and phylogenies of strains associated with the outbreak and a collection of historical E. coli O104:H4 isolates using draft genome assemblies. We present the complete, closed genome sequences of an isolate from the 2011 outbreak (2011C-3493) and two isolates from cases of bloody diarrhea that occurred in the Republic of Georgia in 2009 (2009EL-2050 and 2009EL-2071). Comparative genome analysis indicates that, while the Georgian strains are the nearest neighbors to the 2011 outbreak isolates sequenced to date, structural and nucleotide-level differences are evident in the Stx2 phage genomes, the mer/tet antibiotic resistance island, and in the prophage and plasmid profiles of the strains, including a previously undescribed plasmid with homology to the pMT virulence plasmid of Yersinia pestis. In addition, multiphenotype analysis showed that 2009EL-2071 possessed higher resistance to polymyxin and membrane-disrupting agents. Finally, we show evidence by electron microscopy of the presence of a common phage morphotype among the European and Georgian strains and a second phage morphotype among the Georgian strains. The presence of at least two stx2 phage genotypes in host genetic backgrounds that may derive from a recent common ancestor of the 2011 outbreak isolates indicates that the emergence of stx2 phage-containing E. coli O104:H4 strains probably occurred more than once, or that the current outbreak isolates may be the result of a recent transfer of a new stx2 phage element into a pre-existing stx2-positive genetic background.
PLoS ONE 01/2012; 7(11):e48228. · 3.73 Impact Factor
ABSTRACT: Bacillus coagulans is a ubiquitous soil bacterium that grows at 50-55 °C and pH 5.0 and ferments various sugars that constitute plant biomass to L(+)-lactic acid. The ability of this sporogenic lactic acid bacterium to grow at 50-55 °C and pH 5.0 makes it an attractive microbial biocatalyst for production of optically pure lactic acid at industrial scale, not only from glucose derived from cellulose but also from xylose, a major constituent of hemicellulose. This bacterium is also considered a potential probiotic. The complete genome sequence of a representative strain, B. coagulans strain 36D1, is presented and discussed.
Standards in Genomic Sciences 12/2011; 5(3):331-40. · 2.01 Impact Factor
ABSTRACT: Taxonomic and phylogenetic fingerprinting based on sequence analysis of gene fragments from the large-subunit rRNA (LSU) gene or the internal transcribed spacer (ITS) region is becoming an integral part of fungal classification. The lack of an accurate and robust classification tool trained by a validated sequence database for taxonomic placement of fungal LSU genes is a severe limitation in taxonomic analysis of fungal isolates or large data sets obtained from environmental surveys. Using a hand-curated set of 8,506 fungal LSU gene fragments, we determined the performance characteristics of a naïve Bayesian classifier across multiple taxonomic levels and compared the classifier performance to that of a sequence similarity-based (BLASTN) approach. The naïve Bayesian classifier was computationally more rapid (>460-fold with our system) than the BLASTN approach, and it provided equal or superior classification accuracy. Classifier accuracies were compared using sequence fragments of 100 bp and 400 bp and two different PCR primer anchor points to mimic sequence read lengths commonly obtained using current high-throughput sequencing technologies. Accuracy was higher with 400-bp sequence reads than with 100-bp reads. It was also significantly affected by sequence location across the 1,400-bp test region. The highest accuracy was obtained across either the D1 or D2 variable region. The naïve Bayesian classifier provides an effective and rapid means to classify fungal LSU sequences from large environmental surveys. The training set and tool are publicly available through the Ribosomal Database Project.
Applied and Environmental Microbiology 12/2011; 78(5):1523-33. · 3.69 Impact Factor
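To mimic high-throughput reads as described above, test fragments of fixed length can be cut from full-length LSU sequences at chosen primer anchor points. The minimal Python sketch below assumes forward reads that start exactly at each anchor. The anchor coordinates and the function name are hypothetical.

```python
def fragments(seq, anchors, lengths=(100, 400)):
    """Return {(anchor, length): fragment} for reads starting at each anchor.

    Fragments that would run past the end of the sequence are skipped,
    so every returned read has exactly the requested length.
    """
    out = {}
    for anchor in anchors:
        for n in lengths:
            frag = seq[anchor:anchor + n]
            if len(frag) == n:
                out[(anchor, n)] = frag
    return out
```

Each fragment could then be classified and the predicted taxon compared with the known one at each rank to estimate accuracy per fragment length and anchor.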
ABSTRACT: As soon as whole-genome sequencing entered the scene in the mid-1990s and demonstrated its use in revealing the entire genetic potential of any given microbial organism, this technique immediately revolutionized the way research on pathogens (and in many other fields) was carried out. The ability to perform whole-genome comparisons further transformed the field and allowed scientists to link phenotypic differences among closely related organisms to their underlying genetic mechanisms. Such comparisons have become commonplace in examining strain-to-strain variability, as well as in comparing pathogens to less pathogenic or nonpathogenic near neighbors. In recent years, a bloom in novel sequencing technologies, along with continuous increases in throughput, has inundated the field with various types of massively parallel sequencing data, further transforming comparative genomics research. Here, we review the evolution of comparative genomics, its impact on understanding pathogen evolution and physiology, and the opportunities and challenges presented by next-generation sequencing as applied to pathogen genome comparisons.
Briefings in Functional Genomics 11/2011; 10(6):322-33. · 4.21 Impact Factor
ABSTRACT: An isolate originally labeled Bacillus megaterium CDC 684 was found to contain both pXO1 and pXO2, was non-hemolytic, was sensitive to gamma-phage, and produced both the protective antigen and the poly-D-glutamic acid capsule. These phenotypes prompted Ezzell et al. (J. Clin. Microbiol. 28:223) to reclassify this isolate as Bacillus anthracis in 1990.
We demonstrate that despite these B. anthracis features, the isolate is severely attenuated in a guinea pig model. This prompted whole genome sequencing and closure. The comparative analysis of CDC 684 with other sequenced B. anthracis isolates, together with further analysis, reveals that: a) CDC 684 is a close relative of a virulent strain, Vollum A0488; b) CDC 684 defines a new B. anthracis lineage (at least 51 SNPs) that includes 15 other isolates; c) the genome of CDC 684 contains a large chromosomal inversion that spans 3.3 Mbp; d) this inversion has shifted the angular separation between the origin of replication (ori) and the terminus of replication (ter) from the usual 180° in wild-type B. anthracis to 120° in CDC 684; and e) this isolate also has altered growth kinetics in liquid media.
We propose two alternative hypotheses explaining the attenuated phenotype of this isolate. Hypothesis 1 suggests that the skewed ori/ter relationship in CDC 684 has altered its DNA replication and/or transcriptome processes resulting in altered growth kinetics and virulence capacity. Hypothesis 2 suggests that one or more of the single nucleotide polymorphisms in CDC 684 has altered the expression of a regulatory element or other genes necessary for virulence.
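The ori/ter geometry in point (d) can be made concrete with a little arithmetic on a circular chromosome. The angle is 360° times the shorter arc between ori and ter, divided by the chromosome length. An inversion of the segment [start, end) sends coordinate p to start + end - 1 - p. The Python sketch below uses a toy 6,000,000-bp chromosome, not CDC 684's actual coordinates. The function names are hypothetical, and inverted segments are assumed not to span coordinate 0.

```python
def ori_ter_angle(genome_length, ori, ter):
    """Angle in degrees between ori and ter along the shorter arc of a circular chromosome."""
    d = abs(ter - ori) % genome_length
    arc = min(d, genome_length - d)
    return 360.0 * arc / genome_length


def position_after_inversion(pos, start, end):
    """Map a coordinate through an inversion of the segment [start, end)."""
    if start <= pos < end:
        return start + end - 1 - pos
    return pos


# Toy chromosome: ori at 0 and ter at the antipode (180 degrees).
genome_len, ori, ter = 6_000_000, 0, 3_000_000
print(ori_ter_angle(genome_len, ori, ter))  # 180.0
new_ter = position_after_inversion(ter, 1_000_000, 4_000_000)  # 1,999,999
print(round(ori_ter_angle(genome_len, ori, new_ter)))  # 120
```

In this toy case, an inversion that contains ter but not ori moves ter along the chromosome. This leaves one replichore much longer than the other, which is the kind of imbalance Hypothesis 1 invokes.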
ABSTRACT: The etiology of dental caries remains elusive because of our limited understanding of the complex oral microbiomes. Current methodologies have been limited by insufficient depth and breadth of microbial sampling, a paucity of data for diseased hosts (particularly at the population level), inconsistency of sampled sites, and the inability to distinguish the underlying microbial factors. By cross-validating 16S rRNA gene amplicon-based and whole-genome-based deep-sequencing technologies, we report the most in-depth, comprehensive and corroborated view to date of the adult saliva microbiomes in pilot populations of 19 caries-active and 26 healthy human hosts. We found that: first, saliva microbiomes in the human population were characterized by a vast phylogenetic diversity yet a minimal organismal core; second, caries microbiomes were significantly more variable in community structure, whereas the healthy ones were relatively conserved; third, abundance changes of certain taxa, such as overabundance of the genus Prevotella, distinguished caries microbiota from healthy ones, and furthermore, caries-active and normal individuals carried different arrays of Prevotella species; and finally, no 'caries-specific' operational taxonomic units (OTUs) were detected, yet 147 OTUs were 'caries-associated', that is, differentially distributed yet present in both healthy and caries-active populations. These findings underscore the necessity of species- and strain-level resolution for caries prognosis and are consistent with the ecological hypothesis that shifts in community structure, rather than the presence or absence of particular groups of microbes, underlie cariogenesis.
The ISME Journal 06/2011; 6(1):1-10. · 8.95 Impact Factor
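The distinction drawn above between 'caries-specific' and 'caries-associated' OTUs can be expressed as a simple rule over per-host relative abundances. In the Python sketch below, a fixed fold-change in mean abundance stands in for the study's differential-distribution test, which is not specified here. The threshold, the extra category names ("health-specific", "shared"), and the function name are assumptions.

```python
def categorize_otus(caries, healthy, min_fold=2.0):
    """Label OTUs by presence and abundance in caries-active vs. healthy hosts.

    caries, healthy: dict OTU -> list of relative abundances, one per host.
    An OTU present in both groups is 'caries-associated' when its mean
    abundance differs by at least min_fold in either direction.
    """
    labels = {}
    for otu in set(caries) | set(healthy):
        c = caries.get(otu, [])
        h = healthy.get(otu, [])
        in_caries = any(x > 0 for x in c)
        in_healthy = any(x > 0 for x in h)
        if in_caries and not in_healthy:
            labels[otu] = "caries-specific"
        elif in_healthy and not in_caries:
            labels[otu] = "health-specific"
        elif in_caries and in_healthy:
            mean_c = sum(c) / len(c)
            mean_h = sum(h) / len(h)
            fold = max(mean_c, mean_h) / min(mean_c, mean_h)
            labels[otu] = "caries-associated" if fold >= min_fold else "shared"
    return labels
```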