ABSTRACT: Here we report the complete, accurate 1.89-Mb genome sequence of Francisella tularensis subsp. holarctica strain FSC200, isolated in 1998 in the Swedish municipality Ljusdal, which is in an area where tularemia is highly endemic. This genome is important because strain FSC200 has been extensively used for functional and genetic studies of Francisella and is well characterized.
Journal of Bacteriology 12/2012; 194(24):6965-6. · 3.94 Impact Factor
ABSTRACT: Burkholderia pseudomallei, the etiologic agent of human melioidosis, is capable of causing severe acute infection with overwhelming septicemia leading to death. A high rate of recurrent disease occurs in adult patients, most often due to recrudescence of the initial infecting strain. Pathogen persistence and evolution during such relapsing infections are not well understood. Bacterial cells present in the primary inoculum and in late infections may differ greatly, as has been observed in chronic disease, or they may be genetically similar. To test these alternative models, we conducted whole-genome comparisons of clonal primary and relapse B. pseudomallei isolates recovered six months to six years apart from four adult Thai patients. We found differences within each of the four pairs, and some, including a 330-kb deletion, affected substantial portions of the genome. Many of the changes were associated with increased antibiotic resistance. We also found evidence of positive selection for deleterious mutations in a TetR family transcriptional regulator from a set of 107 additional B. pseudomallei strains. As part of the study, we sequenced to base-pair accuracy the genome of B. pseudomallei strain 1026b, the model used for genetic studies of B. pseudomallei pathogenesis and antibiotic resistance. Our findings provide new insights into pathogen evolution during long-term infections and have important implications for the development of intervention strategies to combat recurrent melioidosis.
PLoS ONE 01/2012; 7(5):e36507. · 3.73 Impact Factor
ABSTRACT: Phylogeographic reconstruction of some bacterial populations is hindered by low diversity coupled with high levels of lateral gene transfer. A comparison of recombination levels and diversity at seven housekeeping genes for eleven bacterial species, most of which are commonly cited as having high levels of lateral gene transfer, shows that the relative contribution of homologous recombination versus mutation for Burkholderia pseudomallei is over two times higher than for Streptococcus pneumoniae and is thus the highest value yet reported in bacteria. Despite the potential for homologous recombination to increase diversity, B. pseudomallei exhibits a relative lack of diversity at these loci. In these situations, whole genome genotyping of orthologous shared single nucleotide polymorphism loci, discovered using next generation sequencing technologies, can provide very large data sets capable of estimating core phylogenetic relationships. We compared and searched 43 whole genome sequences of B. pseudomallei and its closest relatives for single nucleotide polymorphisms in orthologous shared regions to use in phylogenetic reconstruction.
Bayesian phylogenetic analyses of >14,000 single nucleotide polymorphisms yielded completely resolved trees for these 43 strains with high levels of statistical support. These results enable a better understanding of a separate analysis of population differentiation among >1,700 B. pseudomallei isolates as defined by sequence data from seven housekeeping genes. We analyzed this larger data set for population structure and allele sharing that can be attributed to lateral gene transfer. Our results suggest that despite an almost panmictic population, we can detect two distinct populations of B. pseudomallei that conform to a biogeographic pattern found in many plant and animal species: separation along Wallace's Line, a biogeographic boundary between Southeast Asia and Australia.
We describe an Australian origin for B. pseudomallei, characterized by a single introduction event into Southeast Asia during a recent glacial period, and variable levels of lateral gene transfer within populations. These patterns provide insights into mechanisms of genetic diversification in B. pseudomallei and its closest relatives, and provide a framework for integrating the traditionally separate fields of population genetics and phylogenetics for other bacterial species with high levels of lateral gene transfer.
ABSTRACT: In addition to causing diarrhea, Escherichia coli O157:H7 infection can lead to hemolytic-uremic syndrome (HUS), a severe disease characterized by hemolysis and renal failure. Differences in HUS frequency among E. coli O157:H7 outbreaks have been noted, but our understanding of bacterial factors that promote HUS is incomplete. In 2006, in an outbreak of E. coli O157:H7 caused by consumption of contaminated spinach, there was a notably high frequency of HUS. We sequenced the genome of the strain responsible (TW14359) with the goal of identifying candidate genetic factors that contribute to an enhanced ability to cause HUS. The TW14359 genome contains 70 kb of DNA segments not present in either of the two reference O157:H7 genomes. We identified seven putative virulence determinants, including two putative type III secretion system effector proteins, candidate genes that could result in increased pathogenicity or, alternatively, adaptation to plants, and an intact anaerobic nitric oxide reductase gene, norV. We surveyed 100 O157:H7 isolates for the presence of these putative virulence determinants. A norV deletion was found in over one-half of the strains surveyed and correlated strikingly with the absence of stx(1). The other putative virulence factors were found in 8 to 35% of the O157:H7 isolates surveyed, and their presence also correlated with the presence of norV and the absence of stx(1), indicating that the presence of norV may serve as a marker of a greater propensity for HUS, similar to the correlation between the absence of stx(1) and a propensity for HUS.
Infection and Immunity 07/2009; 77(9):3713-21. · 4.21 Impact Factor
ABSTRACT: Methylotrophy describes the ability of organisms to grow on reduced organic compounds without carbon-carbon bonds. The genomes of two pink-pigmented facultative methylotrophic bacteria of the Alpha-proteobacterial genus Methylobacterium, the reference species Methylobacterium extorquens strain AM1 and the dichloromethane-degrading strain DM4, were compared.
The 6.88 Mb genome of strain AM1 comprises a 5.51 Mb chromosome, a 1.26 Mb megaplasmid and three plasmids, while the 6.12 Mb genome of strain DM4 features a 5.94 Mb chromosome and two plasmids. The chromosomes are highly syntenic and share a large majority of genes, while plasmids are mostly strain-specific, with the exception of a 130 kb region of the strain AM1 megaplasmid which is syntenic to a chromosomal region of strain DM4. Both genomes contain large sets of insertion elements, many of them strain-specific, suggesting an important potential for genomic plasticity. Most of the genomic determinants associated with methylotrophy are nearly identical, with two exceptions that illustrate the metabolic and genomic versatility of Methylobacterium. A 126 kb dichloromethane utilization (dcm) gene cluster is essential for the ability of strain DM4 to use DCM as the sole carbon and energy source for growth and is unique to strain DM4. The methylamine utilization (mau) gene cluster is only found in strain AM1, indicating that strain DM4 employs an alternative system for growth with methylamine. The dcm and mau clusters represent two of the chromosomal genomic islands (AM1: 28; DM4: 17) that were defined. The mau cluster is flanked by mobile elements, whereas the dcm cluster disrupts a gene annotated as chelatase, for which we propose the name "island integration determinant" (iid).
These two genome sequences provide a platform for intra- and interspecies genomic comparisons in the genus Methylobacterium, and for investigations of the adaptive mechanisms which allow bacterial lineages to acquire methylotrophic lifestyles.
PLoS ONE 02/2009; 4(5):e5584. · 3.73 Impact Factor
ABSTRACT: Renibacterium salmoninarum is the causative agent of bacterial kidney disease and a significant threat to healthy and sustainable production of salmonid fish worldwide. This pathogen is difficult to culture in vitro, genetic manipulation is challenging, and current therapies and preventative strategies are only marginally effective in preventing disease. The complete genome of R. salmoninarum ATCC 33209 was sequenced and shown to be a 3,155,250-bp circular chromosome that is predicted to contain 3,507 open-reading frames (ORFs). A total of 80 copies of three different insertion sequence elements are interspersed throughout the genome. Approximately 21% of the predicted ORFs have been inactivated via frameshifts, point mutations, insertion sequences, and putative deletions. The R. salmoninarum genome has extended regions of synteny to the Arthrobacter sp. strain FB24 and Arthrobacter aurescens TC1 genomes, but it is approximately 1.9 Mb smaller than both Arthrobacter genomes and has a lower G+C content, suggesting that significant genome reduction has occurred since divergence from the last common ancestor. A limited set of putative virulence factors appear to have been acquired via horizontal transmission after divergence of the species; these factors include capsular polysaccharides, heme sequestration molecules, and the major secreted cell surface antigen p57 (also known as major soluble antigen). Examination of the genome revealed a number of ORFs homologous to antibiotic resistance genes, including genes encoding beta-lactamases, efflux proteins, macrolide glycosyltransferases, and rRNA methyltransferases. The genome sequence provides new insights into R. salmoninarum evolution and may facilitate identification of chemotherapeutic targets and vaccine candidates that can be used for prevention and treatment of infections in cultured salmonids.
Journal of Bacteriology 09/2008; 190(21):6970-82. · 3.94 Impact Factor
ABSTRACT: Large-insert genome analysis (LIGAN) is a broadly applicable, high-throughput technology designed to characterize genome-scale structural variation. Fosmid paired-end sequences and DNA fingerprints from a query genome are compared to a reference sequence using the Genomic Variation Analysis (GenVal) suite of software tools to pinpoint locations of insertions, deletions, and rearrangements. Fosmids spanning regions that contain new structural variants can then be sequenced. Clonal pairs of Pseudomonas aeruginosa isolates from four cystic fibrosis patients were used to validate the LIGAN technology. Approximately 1.5 Mb of inserted sequences were identified, including 743 kb containing 615 ORFs that are absent from published P. aeruginosa genomes. Six rearrangement breakpoints and 220 kb of deleted sequences were also identified. Our study expands the "genome universe" of P. aeruginosa and validates a technology that complements emerging, short-read sequencing methods that are better suited to characterizing single-nucleotide polymorphisms than structural variation.
ABSTRACT: The human genome sequence has been finished to very high standards; however, more than 340 gaps remained when the finished genome was published by the International Human Genome Sequencing Consortium in 2004. Using fosmid resources generated from multiple individuals, we targeted gaps in the euchromatic part of the human genome. Here we report 2,488,842 bp of previously unknown euchromatic sequence, 363,114 bp of which close 26 of 250 euchromatic gaps, or 10%, including two remaining euchromatic gaps on chromosome 19. Eight (30.7%) of the closed gaps were found to be polymorphic. These sequences allow complete annotation of several human genes as well as the assignment of mRNAs. The gap sequences are 2.3-fold enriched in segmentally duplicated sequences compared to the whole genome. Our analysis confirms that not all gaps within 'finished' genomes are recalcitrant to subcloning and suggests that the paired-end-sequenced fosmid libraries could prove to be a rich resource for completion of the human euchromatic genome.
ABSTRACT: Heterokont algae form a monophyletic group within the stramenopile branch of the tree of life. These organisms display wide morphological diversity, ranging from minute unicells to massive, bladed forms. Surprisingly, chloroplast genome sequences are available only for diatoms, representing two (Coscinodiscophyceae and Bacillariophyceae) of approximately 18 classes of algae that comprise this taxonomic cluster. A universal challenge to chloroplast genome sequencing studies is the retrieval of highly purified DNA in quantities sufficient for analytical processing. To circumvent this problem, we have developed a simplified method for sequencing chloroplast genomes, using fosmids selected from a total cellular DNA library. The technique has been used to sequence chloroplast DNA of two Heterosigma akashiwo strains. This raphidophyte has served as a model system for studies of stramenopile chloroplast biogenesis and evolution.
H. akashiwo strain CCMP452 (West Atlantic) chloroplast DNA is 160,149 bp in size with a 21,822-bp inverted repeat, whereas NIES293 (West Pacific) chloroplast DNA is 159,370 bp in size and has an inverted repeat of 21,665 bp. The fosmid cloning technique reveals that both strains contain an isomeric chloroplast DNA population resulting from an inversion of their single copy domains. Both strains contain multiple small inverted and tandem repeats, non-randomly distributed within the genomes. Although both CCMP452 and NIES293 chloroplast DNAs contain 197 genes, multiple nucleotide polymorphisms are present in both coding and intergenic regions. Several protein-coding genes contain large, in-frame inserts relative to orthologous genes in other plastids. These inserts are maintained in mRNA products. Two genes of interest in H. akashiwo, not previously reported in any chloroplast genome, are tyrC, a tyrosine recombinase, which we hypothesize may be a result of a lateral gene transfer event, and an unidentified 456 amino acid protein, which we hypothesize serves as a G-protein-coupled receptor. The H. akashiwo chloroplast genomes share little synteny with other algal chloroplast genomes sequenced to date.
The fosmid cloning technique eliminates chloroplast isolation, does not require chloroplast DNA purification, and reduces sequencing processing time. Application of this method has provided new insights into chloroplast genome architecture, gene content and evolution within the stramenopile cluster.
ABSTRACT: Francisella tularensis subspecies tularensis and holarctica are pathogenic to humans, whereas the two other subspecies, novicida and mediasiatica, rarely cause disease. To uncover the factors that allow subspecies tularensis and holarctica to be pathogenic to humans, we compared their genome sequences with the genome sequence of Francisella tularensis subspecies novicida U112, which is nonpathogenic to humans.
Comparison of the genomes of human pathogenic Francisella strains with the genome of U112 identifies genes specific to the human pathogenic strains and reveals pseudogenes that previously were unidentified. In addition, this analysis provides a coarse chronology of the evolutionary events that took place during the emergence of the human pathogenic strains. Genomic rearrangements at the level of insertion sequences (IS elements), point mutations, and small indels took place in the human pathogenic strains during and after differentiation from the nonpathogenic strain, resulting in gene inactivation.
The chronology of events suggests a substantial role for genetic drift in the formation of pseudogenes in Francisella genomes. Mutations that occurred early in the evolution, however, might have been fixed in the population either because of evolutionary bottlenecks or because they were pathoadaptive (beneficial in the context of infection). Because the structure of Francisella genomes is similar to that of the genomes of other emerging or highly pathogenic bacteria, this evolutionary scenario may be shared by pathogens from other species.
ABSTRACT: Francisella tularensis is a bacterial pathogen that causes the zoonotic disease tularemia and is important to biodefense. Currently, the only vaccine known to confer protection against tularemia is a specific live vaccine strain (designated LVS) derived from a virulent isolate of Francisella tularensis subsp. holarctica. The origin and source of attenuation of this strain are not known. To assist with the design of a defined live vaccine strain, we sought to determine the genetic basis of the attenuation of LVS. This analysis relied primarily on the comparison between the genome of LVS and Francisella tularensis holarctica strain FSC200, which differ by only 0.08% of their nucleotide sequences. Under the assumption that the attenuation was due to a loss of function(s), only coding regions were examined in this comparison. To complement this analysis, the coding regions of two slightly more distantly related Francisella tularensis strains were also compared against the LVS coding regions. Thirty-five genes show unique sequence variations predicted to alter the protein sequence in LVS compared to the other Francisella tularensis strains. Due to these polymorphisms, the functions of 15 of these genes are very likely lost or impaired. Seven of these genes were demonstrated to be under stronger selective constraints, suggesting that they are the most probable sources of LVS attenuation and are useful for a newly defined vaccine.
Infection and Immunity 01/2007; 74(12):6895-906. · 4.07 Impact Factor
ABSTRACT: Currently, challenges exist to acquire long-range (hundreds of kilobase pairs) phase-discriminated sequence across substantial numbers of individuals. We have developed a straightforward method for isolating and characterizing specific genomic regions in a haplospecific manner. Real-time PCR is carried out to STS content map and genotype pools of fosmid clones arrayed in 384-well microtiter plates. Single-nucleotide polymorphisms, microsatellite markers, and insertion-deletion polymorphisms are used to differentiate the target region into haplotype-specific tiling paths. DNA of clones from these tiling paths is retrieved from the library and either sequenced by standard shotgun methods or amplified in vitro and sequenced by a primer-based, directed method. This approach provides convenient access to complete, haplotype-resolved resequencing data from multiple individuals across tens to hundreds of thousands of basepairs. We illustrate its implementation with a detailed example of more than 400 kbp from the human CFTR region, across 15 individuals, and summarize our experience applying it to many other human loci.
ABSTRACT: Allelic variation in codons that specify amino acids that line the peptide-binding pockets of HLA's Class II antigen-presenting proteins is superimposed on strikingly few deeply diverged haplotypes. These haplotypes appear to have been evolving almost independently for tens of millions of years. By complete resequencing of 20 haplotypes across the approximately 100-kbp region that spans the HLA-DQA1, -DQB1, and -DRB1 genes, we provide a detailed view of the way in which the genome structure at this locus has been shaped by the interplay of selection, gene-gene interaction, and recombination.
Genome Research 10/2005; 15(9):1250-7. · 14.40 Impact Factor