Project

100K Pathogen Genome Project

Goal: The consortium is sequencing 100,000 bacterial pathogens. The goal is to produce genome sequences for examining microbial evolution, determine the genomic features of pathogenicity, and create information for reference databases used in bacterial identification.


Project log

Bart C Weimer
added 2 research items
Klebsiella pneumoniae is recognized as a common cause of nosocomial infections and outbreaks causing pneumonia, septicemia, and urinary tract infections. This opportunistic bacterium shows an increasing acquisition of antibiotic-resistance genes, which complicates treatment of infections. Hence, fast reliable strain typing methods are paramount for the study of this opportunistic pathogen’s multi-drug resistance genetic profiles. In this study, thirty-eight strains of K. pneumoniae isolated from the blood of pediatric patients were characterized by whole-genome sequencing and genomic clustering methods. Genes encoding β-lactamase were found in all the bacterial isolates, among which the blaSHV variant was the most prevalent (53%). Moreover, genes encoding virulence factors such as fimbriae, capsule, outer membrane proteins, T4SS and siderophores were investigated. Additionally, a multi-locus sequence typing (MLST) analysis revealed 24 distinct sequence types identified within the isolates, among which the most frequently represented were ST76 (16%) and ST70 (11%). Based on LPS structure, serotypes O1 and O3 were the most prevalent, accounting for approximately 63% of all infections. The virulence capsular types K10, K136, and K2 were present in 16, 13, and 8% of the isolates, respectively. Phylogenomic analysis based on virtual genome fingerprints correlated with the MLST data. The phylogenomic reconstruction also denoted association between strains with a higher abundance of virulence genes and virulent serotypes compared to strains that do not possess these traits. This study highlights the value of whole-genomic sequencing in the surveillance of virulence attributes among clinical K. pneumoniae strains.
Bart C Weimer
added a research item
Hungatella hathewayi has been observed to be a member of the gut microbiome. Unfortunately, little is known about this organism in spite of its association with human fatalities; it is important to understand its virulence mechanisms and epidemiological potential to cause disease. In this study, a patient with chronic neurologic symptoms presented to the clinic with subsequent isolation of a strain with phenotypic characteristics suggestive of Clostridium difficile. However, whole-genome sequencing found the organism to be H. hathewayi. Analysis including publicly available Hungatella genomes found substantial genomic differences as compared to the type strain, indicating this isolate was not C. difficile. We examined the whole genomes of Hungatella species and related genera, using comparative genomics to fully examine species identification and toxin production. Orthogonal phylogenetic analyses using the 16S rRNA gene and the entire genome, which included genome distance analyses using Genome-to-Genome Distance (GGDC), Average Nucleotide Identity (ANI), and a pan-genome analysis with inclusion of available public genomes, determined the isolate to be Hungatella. Two clearly differentiated groups were identified, one including a reference H. hathewayi genome (strain DSM-13479) and a second group that was determined to be H. effluvii, which included our clinical isolate. Also, some genomes reported as H. hathewayi were found to belong to other genera, including Clostridium and Faecalicatena. We show that the Hungatella species have an open pan-genome reflecting high genomic diversity. This study highlights the importance of correctly assigning taxonomic identification, particularly in disease-associated strains, to better understand virulence and therapeutic options.
Bart C Weimer
added a research item
The spread of SARS-CoV-2 created a pandemic crisis with >150,000 cumulative cases in >65 countries within a few months. The reproductive number (R) is a metric to estimate the transmission of a pathogen during an outbreak. Preliminary published estimates were based on the initial outbreak in China. Whole genome sequence (WGS) analysis found mutational variations in the viral genome; however, previous comparisons failed to show a direct relationship between viral genome diversity, transmission, and the epidemic severity. COVID-19 incidences from different countries were modeled over the epidemic curve. Estimates of the instantaneous R (Wallinga and Teunis method) with a short and standard serial interval were calculated. WGS were used to determine the population's genomic variation, which underpinned creation of the pathogen genome identity (GENI) score; this score was merged with the outbreak curve in four distinct phases. Inference of transmission time was based on a mutation rate of 2 mutations/month. R estimates revealed differences in the transmission and variable infection dynamics between and within outbreak progression for each country examined. Outside China, our R estimates showed propagating dynamics indicating that other countries were poised to move to the takeoff and exponential stages. Population density and local temperatures had no clear relationship to the outbreak progression. Integration of incidence data with the GENI score directly predicted increases in cases as the genome variation increased that led to new variants. Integrating the outbreak curve, dynamic R, and SNP variation found a direct association between increasing cases and transmission genome evolution. By defining the epidemic curve into four stages and integrating the instantaneous country-specific R with the GENI score, we directly connected changes in individual outbreaks based on changes in the virus genome via SNPs. 
This resulted in the ability to forecast potential increases in cases as well as mutations that may defeat PCR screening and the infection process. By using instantaneous R estimations and WGS, outbreak dynamics were defined to be linked to viral mutations, indicating that WGS, as a surveillance tool, is required to predict shifts in each outbreak that will provide actionable decision making information. Integrating epidemiology with genome sequencing and modeling allows for evidence-based disease outbreak tracking with predictive therapeutically valuable insights in near real time.
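The transmission-time inference in this abstract rests on a simple molecular-clock calculation. As a minimal sketch, assuming the time separating two genomes is approximated by their SNP count divided by the stated rate of 2 mutations/month (the function names, the fixed 30-day month, and the toy sequences are illustrative assumptions, not the authors' pipeline):

```python
MUTATIONS_PER_MONTH = 2.0  # rate stated in the abstract
DAYS_PER_MONTH = 30.0      # assumption: fixed 30-day month for simplicity


def snp_distance(seq_a: str, seq_b: str) -> int:
    """Count differing positions between two aligned sequences.

    Positions with a gap ('-') or ambiguous base ('N') in either
    sequence are skipped rather than counted as SNPs.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    skip = {"-", "N"}
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != b and a not in skip and b not in skip
    )


def transmission_time_days(snps: int, rate: float = MUTATIONS_PER_MONTH) -> float:
    """Convert a SNP count into elapsed days under a constant mutation rate."""
    if snps < 0:
        raise ValueError("SNP count cannot be negative")
    if rate <= 0:
        raise ValueError("mutation rate must be positive")
    return snps / rate * DAYS_PER_MONTH


if __name__ == "__main__":
    reference = "ACGTACGTAC"  # hypothetical aligned fragments
    isolate = "ACGAACGTNC"
    distance = snp_distance(reference, isolate)
    print(distance, transmission_time_days(distance))
```

A constant clock is a strong simplification; it ignores rate variation between lineages and sampling effects, so the result is best read as a rough ordering of cases in time.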
Bart C Weimer
added 2 research items
An amendment to this paper has been published and can be accessed via a link at the top of the paper.
Vibrio parahaemolyticus is the most common cause of seafood-borne illness reported in the United States. Draft genomes of 132 North American clinical and oyster V. parahaemolyticus isolates were sequenced to investigate their phylogenetic and biogeographic relationships. The majority of oyster isolate sequence types (STs) were from a single harvest location; however, four were identified from multiple locations. There was population structure along the Gulf and Atlantic Coasts of North America, with what seemed to be a hub of genetic variability along the Gulf Coast with some of the same STs occurring along the Atlantic Coast and one shared between the coastal waters of the Gulf and those of Washington state. Phylogenetic analyses found nine well-supported clades. Two clades were composed of isolates from both clinical and oyster sources. Four were composed entirely from clinical sources and three entirely from oyster sources. Each single source clade consisted of one ST. Some human isolates lack tdh and trh and some T3SS genes, which are established virulence genes of V. parahaemolyticus. Thus, these genes are not essential for pathogenicity. However, isolates in the monophyletic groups from clinical sources were enriched in several categories of genes when compared to those from monophyletic groups of oyster isolates. These functional categories include: cell signaling, transport, and metabolism. Identification of genes in these functional categories provides a basis for future in-depth pathogenicity investigations of V. parahaemolyticus. IMPORTANCE Vibrio parahaemolyticus is the most common cause of seafood-borne illness reported in the United States and is frequently associated with shellfish consumption. This study contributes to our knowledge of the biogeography and functional genomics of this species around North America. 
STs shared between the Gulf Coast and the Atlantic seaboard as well as Pacific waters suggest possible transport via oceanic currents or large shipping vessels. STs frequently isolated from humans, but rarely if ever from the environment, are likely more competitive in the human gut compared to other STs. This could be due to additional functional capabilities in areas like cell signaling, transport, and metabolism, which may give these isolates an advantage in novel nutrient-replete environments like the human gut.
Bart C Weimer
added 4 research items
In this work, we hypothesized that shifts in the food microbiome can be used as an indicator of unexpected contaminants or environmental changes. To test this hypothesis, we sequenced total RNA of 31 high protein powder (HPP) samples of poultry meal pet food ingredients. We developed a microbiome analysis pipeline employing a key eukaryotic matrix filtering step that improved microbe detection specificity to >99.96% during in silico validation. The pipeline identified 119 microbial genera per HPP sample on average with 65 genera present in all samples. The most abundant of these were Bacteroides, Clostridium, Lactococcus, Aeromonas, and Citrobacter. We also observed shifts in the microbial community corresponding to ingredient composition differences. When comparing culture-based results for Salmonella with total RNA sequencing, we found that Salmonella growth did not correlate with multiple sequence analyses. We conclude that microbiome sequencing is useful to characterize complex food microbial communities, while additional work is required for predicting specific species' viability from total RNA sequencing.
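The specificity figure quoted in this abstract follows the standard definition TN / (TN + FP). A minimal sketch of that arithmetic is given here, assuming that a "negative" is a simulated matrix (eukaryotic host) read and a false positive is such a read wrongly reported as microbial; the counts are hypothetical and are not taken from the study:

```python
def specificity(true_negatives: int, false_positives: int) -> float:
    """Standard specificity: fraction of true negatives correctly rejected."""
    total_negatives = true_negatives + false_positives
    if total_negatives == 0:
        raise ValueError("at least one negative example is required")
    return true_negatives / total_negatives


if __name__ == "__main__":
    # Hypothetical in silico validation: 1,000,000 simulated matrix reads,
    # of which 300 are still misclassified as microbial after filtering.
    tn, fp = 999_700, 300
    print(f"specificity = {specificity(tn, fp):.4%}")
```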
Sierra Mixe maize is a geographically remote landrace variety grown on nitrogen-deficient fields in Oaxaca, Mexico that meets its nutritional requirements without synthetic fertilizer by associating with free-living diazotrophs comprising the microbiota of its aerial root mucilage. We selected nearly 500 diazotrophic (N2-fixing) bacteria isolated from Sierra Mixe maize mucilage and sequenced their genomes. Comparative genomic analysis demonstrated that isolates represented diverse genera and composed three major diazotrophic groups based on nitrogen fixation gene content. In addition to nitrogen fixation, we examined deamination of 1-amino-1-cyclopropanecarboxylic acid, biosynthesis of indole-3-acetic acid, and phosphate solubilization as alternative mechanisms of direct plant growth promotion (PGP). Genome mining showed that isolates of all diazotrophic groups possessed marker genes for multiple mechanisms of direct PGP. Implementing in vitro assays corroborated isolate genotypes by measuring each isolate's potential to confer the targeted PGP traits and revealed phenotypic variation among isolates based on diazotrophic group assignment. Investigating the ability of mucilage diazotrophs to confer PGP by direct inoculation of clonally propagated potato plants in planta led to the identification of 16 bio-stimulant candidates. Conducting nitrogen-stress greenhouse experiments demonstrated that potato inoculation with a synthetic community of bio-stimulant candidates, as well as with its individual components, resulted in PGP phenotypes. We further demonstrated that one diazotrophic isolate conferred PGP to a conventional maize variety under nitrogen-stress in the greenhouse. 
These results indicate that, while many diazotrophic isolates from Sierra Mixe maize possessed genotypes and in vitro phenotypes for targeted PGP traits, a subset of these organisms promoted the growth of potato and conventional maize, potentially through the use of multiple promotion mechanisms. PLOS ONE | https://doi.org/10.1371/journal.pone.
A geographically isolated maize landrace cultivated on nitrogen-depleted fields without synthetic fertilizer in the Sierra Mixe region of Oaxaca, Mexico utilizes nitrogen derived from the atmosphere and develops an extensive network of mucilage-secreting aerial roots that harbors a diazotrophic (N2-fixing) microbiota. Targeting these diazotrophs, we selected nearly 600 microbes of a collection obtained from mucilage and confirmed their ability to incorporate heavy nitrogen (15N2) metabolites in vitro. Sequencing their genomes and conducting comparative bioinformatic analyses showed that these genomes had substantial phylogenetic diversity. We examined each diazotroph genome for the presence of nif genes essential to nitrogen fixation (nifHDKENB) and carbohydrate utilization genes relevant to the mucilage polysaccharide digestion. These analyses identified diazotrophs that possessed the canonical nif gene operons, as well as many other operon configurations with concomitant fixation and release of >700 different 15N labeled metabolites. We further demonstrated that many diazotrophs possessed alternative nif gene operons and confirmed their genomic potential to derive chemical energy from mucilage polysaccharide to fuel nitrogen fixation. These results confirm that some diazotrophic bacteria associated with Sierra Mixe maize were capable of incorporating atmospheric nitrogen into their small molecule extracellular metabolites through multiple nif gene configurations while others were able to fix nitrogen without the canonical (nifHDKENB) genes.
Bart C Weimer
added a research item
Sierra Mixe maize is a geographically remote landrace variety grown on nitrogen-deficient fields in Oaxaca, Mexico that meets its nutritional requirements without synthetic fertilizer by associating with free-living diazotrophs comprising the microbiota of its aerial root mucilage. We selected nearly 500 diazotrophic bacteria isolated from Sierra Mixe maize mucilage and sequenced their genomes. Comparative genomic analysis demonstrated that isolates represented diverse genera and possessed multiple marker genes for mechanisms of direct plant growth promotion (PGP). In addition to nitrogen fixation, we examined deamination of 1-amino-1-cyclopropanecarboxylic acid, biosynthesis of indole-3-acetic acid, and phosphate solubilization. Implementing in vitro colorimetric assays revealed each isolate’s potential to confer the alternative PGP activities that corroborated genotype and pathway content. We examined the ability of mucilage diazotrophs to confer PGP by direct inoculation of clonally propagated potato plants in planta, which led to the identification of bio-stimulant candidates that were tested for PGP by inoculating a conventional maize variety. The results indicate that, while many diazotrophic isolates from Sierra Mixe maize possessed genotypes and in vitro phenotypes for targeted PGP traits, a subset of these organisms promoted the growth of potato and conventional maize using multiple promotion mechanisms.
Bart C Weimer
added 7 research items
Taxonomic classification is an essential step in the analysis of microbiome data that depends on a reference database of whole genome sequences. Taxonomic classifiers are built on established reference species, such as those in the Human Microbiome Project database, which is growing rapidly. While constructing a population-wide pangenome of the bacterium Hungatella, we discovered that the Human Microbiome Project reference species Hungatella hathewayi (WAL 18680) was significantly different from other members of this genus. Specifically, the reference lacked the core genome as compared to the other members. Further analysis, using average nucleotide identity (ANI) and 16S rRNA comparisons, indicated that WAL 18680 was misclassified as Hungatella. The error in classification is being amplified in the taxonomic classifiers and will have a compounding effect as microbiome analyses are done, resulting in inaccurate assignment of community members and leading to fallacious conclusions and possibly treatment. As automated genome homology assessment expands for microbiome analysis and outbreak detection, and as public health reliance on whole genomes increases, this issue will likely occur at an increasing rate. These observations highlight the need for developing reference-free methods for epidemiological investigation using whole genome sequences and the criticality of accurate reference databases.
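The observation that a reference "lacked the core genome" can be illustrated with set operations on gene presence/absence data. This is a minimal sketch, assuming each genome is already reduced to a set of gene-family identifiers; the genome labels and gene names are hypothetical, and a real pangenome analysis would first cluster orthologs with a dedicated tool:

```python
from typing import Dict, Set


def core_genome(genomes: Dict[str, Set[str]]) -> Set[str]:
    """Return gene families present in every genome (strict core)."""
    gene_sets = list(genomes.values())
    if not gene_sets:
        return set()
    return set.intersection(*gene_sets)


def core_fraction_present(candidate: Set[str], core: Set[str]) -> float:
    """Fraction of the core gene families found in a candidate genome."""
    if not core:
        raise ValueError("core genome is empty")
    return len(candidate & core) / len(core)


if __name__ == "__main__":
    # Hypothetical gene-family content for three genus members.
    members = {
        "isolate_A": {"g1", "g2", "g3", "g4"},
        "isolate_B": {"g1", "g2", "g3", "g5"},
        "isolate_C": {"g1", "g2", "g3", "g6"},
    }
    suspect_reference = {"g1", "g7", "g8"}  # hypothetical outlier genome

    core = core_genome(members)
    print(sorted(core))
    print(core_fraction_present(suspect_reference, core))
```

A low core fraction on its own only flags a genome for follow-up; the abstract pairs this signal with ANI and 16S rRNA comparisons before concluding misclassification.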
RNA viruses are hypermutable. This project uses reovirus as a model system for hypermutable virus evolution and reassortment. Avian reovirus (ARV) in meat-type chickens manifests as a plethora of clinical signs ranging from runting and stunting to a severe disease characterized by viral tenosynovitis, pericarditis and myocarditis. The strategy to control the disease in meat-type chickens entails breeder live virus vaccination using conventional S1133-like strains, followed by autogenous vaccines using prevalent isolates obtained from the field. However, the rate of change in the virus hinders our ability to obtain vaccines that provide persistent protection.
The goal of the project is to determine novel biomarkers of abortion in ruminants due to Campylobacter jejuni infection using predictive modeling of whole genome sequencing and machine learning. Determining the genomic basis for phenotypes is going to impact infectious disease surveillance, vaccine design and public health. Here we applied a novel approach to population genomics of infectious disease using machine learning.
Bart C Weimer
added a research item
Background: Global spread of COVID-19 created an unprecedented infectious disease crisis that progressed to a pandemic with >180,000 cases in >100 countries. Reproductive number (R) is an outbreak metric estimating the transmission of a pathogen. Initial R values were published based on the early outbreak in China with a limited number of cases with whole genome sequencing. Initial comparisons failed to show a direct relationship between viral genomic diversity and epidemic severity for SARS-CoV-2. Methods: Each country's COVID-19 outbreak status was classified according to epicurve stage (index, takeoff, exponential, decline). Instantaneous R estimates (Wallinga and Teunis method) with a short and standard serial interval examined asymptomatic spread. Whole genome sequences were used to quantify the pathogen genome identity (GENI) score, which was used to estimate transmission time and epicurve stage. Transmission time was estimated based on an evolutionary rate of 2 mutations/month. Findings: The country-specific R revealed variable infection dynamics between and within outbreak stages. Outside China, R estimates revealed propagating epidemics poised to move into the takeoff and exponential stages. Population density and local temperatures had a variable relationship to the outbreaks. GENI scores differentiated countries in index stage with cryptic transmission. Integration of incidence data with genome variation directly linked increases in cases to increased genome variation. Interpretation: R was dynamic for each country and during the outbreak stage. Integrating the outbreak dynamic, dynamic R, and genome variation found a direct association between cases and genome variation. Synergistically, GENI provides an evidence-based transmission metric that can be determined by sequencing the virus from each case. 
We calculated an instantaneous country-specific R at different stages of outbreaks and formulated a novel metric for infection dynamics using viral genome sequences to capture gaps in untraceable transmission. Integrating epidemiology with genome sequencing allows evidence-based dynamic disease outbreak tracking with predictive evidence.
Bart C Weimer
added 2 research items
Ontologies are built in various domains such as biology, chemistry, and business. Ontologies as knowledge bases have great potential to serve as providers of context for analytics not only to yield more relevant results but also to provide meaning in explaining results. Simply put, analysis without context ignores the underlying meaning in data. In this paper, we discuss one important example: how classical classification of organisms in biology can become obsolete given the tremendous amount of genetic data now being analyzed under the lens of gene ontologies. Gene ontologies provide a functional context to how organisms operate and satisfy the functions of life. Ontologies such as gene ontologies encapsulate the collective intelligence of scientists based on many decades of work. In this paper, we put forth a vision of contextual analytics in the field of genetics powered with big data and describe blueprints of an analytics architecture specifically designed to utilize ontologies as reference contextual knowledge bases.
Bacillus velezensis CE2 produces potent antimicrobial compound(s). The draft genome sequence of the strain reported here is 4.1 Mb with a G+C content of 46.1%. Whole-genome sequencing revealed that the strain genetically encodes a novel multicomponent lantibiotic, velezensicidin.
Bart C Weimer
added a research item
ABSTRACT A method was developed to automate the KAPA HTP Library Preparation kit for microbial whole genome sequencing. This method uses the Agilent NGS Workstation, consisting of the NGS Bravo liquid handling platform with its accessories for heating, cooling, shaking, and magnetic bead manipulations in a 96-well format. User intervention in multistep protocols is minimized through the use of other components of the workstation such as the BenchCel 4R Microplate Handler and Labware MiniHub for labware storage and movement. This method has been validated for sequencing on the Illumina platform and consists of three protocols: the first is for end repair to post-ligation cleanup; the second is used for library amplification setup; and the third is for the post-amplification cleanup. The modular design provides the end-user with the flexibility to complete library construction over two days, and is suitable for the construction of high-quality libraries from bacteria of various GC content. This combined solution produced a workflow that is suitable for production-scale sequencing projects such as the 100K Pathogen Genome Project.
Figure 1: 100K Pathogen Genome Project sample preparation workflow for multiplexed, short-read Illumina sequencing.
Figure 2: Representative electropherograms of Listeria (generated on the Agilent 2100 Bioanalyzer system and Agilent 2200 TapeStation system) of bacterial libraries prepared for whole genome sequencing with the KAPA HTP Library Preparation Kit. The average library size for each genus was as indicated. Peaks at 35 and 10381 bp are internal standards used for alignment and quantitation determination with the Agilent 2100 Bioanalyzer system.
Figure 3: Detailed KAPA HTP Library Preparation protocol. The input into library construction is fragmented DNA or cDNA. Each enzymatic reaction is followed by a SPRI-bead cleanup. The "with-bead" protocol uses a single aliquot of SPRI beads for all cleanups prior to library amplification, producing higher yields of adapter-ligated libraries and reducing the number of amplification cycles needed to generate sufficient material for library QC and sequencing.
Bart C Weimer
added a research item
Chitinases are glycosyl hydrolases that catalyze the hydrolysis of the β-1,4 linkages in complex carbohydrates and those that contain GlcNAc. These enzymes are considered emerging virulence factors during infection because the host glycan changes. This is the release of four single chitinase deletion mutants in Salmonella enterica serovar Typhimurium LT2.
Bart C Weimer
added a research item
SigTree tutorial vignette. An overview and demonstration of the main syntax to use the SigTree software package, with sample data. Also archived (with possible future updates) at http://cran.r-project.org/web/packages/SigTree/index.html.
Bart C Weimer
added a research item
The 100K Pathogen Genome Project is producing draft and closed genome sequences from diverse pathogens. This project expanded globally to include a snapshot of global bacterial genome diversity. The genomes form a sequence database that has a variety of uses from systematics to public health.
Bart C Weimer
added a research item
Salmonella is a common food-associated bacterium that has substantial impact on worldwide human health and the global economy. This is the public release of 1,183 Salmonella draft genome sequences as part of the 100K Pathogen Genome Project. These isolates represent global genomic diversity in the Salmonella genus.
Bart C Weimer
added a research item
Microbial community analysis experiments to assess the effect of a treatment intervention (or environmental change) on the relative abundance levels of multiple related microbial species (or operational taxonomic units) simultaneously using high throughput genomics are becoming increasingly common. Within the framework of the evolutionary phylogeny of all species considered in the experiment, this translates to a statistical need to identify the phylogenetic branches that exhibit a significant consensus response (in terms of operational taxonomic unit abundance) to the intervention. We present the R software package SigTree, a collection of flexible tools that make use of meta-analysis methods and regular expressions to identify and visualize significantly responsive branches in a phylogenetic tree, while appropriately adjusting for multiple comparisons.
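To make the idea of a "consensus response" along a branch concrete, the sketch below combines per-tip p-values with Stouffer's method, one standard meta-analysis technique. It is written in Python rather than R, and it is an assumption that this resembles what SigTree does internally: the abstract does not name the combining method, and the function names, branch labels, and p-values here are hypothetical. The p-values are assumed to be one-sided and independent, and the multiple-comparison adjustment mentioned in the abstract is not shown.

```python
from math import sqrt
from statistics import NormalDist
from typing import Dict, List

_STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def stouffer_combined_p(pvalues: List[float]) -> float:
    """Combine one-sided p-values with Stouffer's (unweighted) Z method."""
    if not pvalues:
        raise ValueError("at least one p-value is required")
    if any(not 0.0 < p < 1.0 for p in pvalues):
        raise ValueError("p-values must lie strictly between 0 and 1")
    z_scores = [_STD_NORMAL.inv_cdf(1.0 - p) for p in pvalues]
    z_combined = sum(z_scores) / sqrt(len(z_scores))
    return 1.0 - _STD_NORMAL.cdf(z_combined)


def branch_p_values(
    branch_tips: Dict[str, List[str]], tip_p: Dict[str, float]
) -> Dict[str, float]:
    """Compute one combined p-value per branch from the tips beneath it."""
    return {
        branch: stouffer_combined_p([tip_p[tip] for tip in tips])
        for branch, tips in branch_tips.items()
    }


if __name__ == "__main__":
    # Hypothetical tree: two internal branches over four OTU tips.
    tips_under_branch = {"branch_1": ["otu_a", "otu_b"], "branch_2": ["otu_c", "otu_d"]}
    tip_pvalues = {"otu_a": 0.01, "otu_b": 0.03, "otu_c": 0.40, "otu_d": 0.70}
    for branch, p in branch_p_values(tips_under_branch, tip_pvalues).items():
        print(branch, round(p, 4))
```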
Bart C Weimer
added a research item
Lysozyme enzymes hydrolyze the β-1,4-glycosidic bond in oligosaccharides. These enzymes are part of a broad group of glucoside hydrolases that are poorly characterized; however, they are important for growth and are being recognized as emerging virulence factors. This is the release of four lysozyme-encoding-gene-deletion mutants in Salmonella enterica serovar Typhimurium LT2.
Bart C Weimer
added 2 research items
Sialidases, which are widely distributed in nature, cleave the α-ketosidic bond of terminal sialic acid residue. These emerging virulence factors degrade the host glycan. We report here the release of seven sialidase and one sialic acid transporter deletion in Salmonella enterica serovar Typhimurium strain LT2, which are important in cellular invasion during infection.
Amylases catalyze the cleavage of α-D-1,4 and α-D-1,6-glycosidic bonds in starch and related carbohydrates. Amylases are widely distributed in nature and are important in carbohydrate metabolism. This is the release of four single and two double deletions in Salmonella enterica serovar Typhimurium LT2 that are important for glycan degradation during infection.
Bart C Weimer
added a research item
The Salmonella Syst-OMICS consortium is sequencing 4,500 Salmonella genomes and building an analysis pipeline for the study of Salmonella genome evolution, antibiotic resistance and virulence genes. Metadata, including phenotypic as well as genomic data, for isolates of the collection are provided through the Salmonella Foodborne Syst-OMICS database (SalFoS), at https://salfos.ibis.ulaval.ca/. Here, we present our strategy and the analysis of the first 3,377 genomes. Our data will be used to draw potential links between strains found in fresh produce, humans, animals and the environment. The ultimate goals are to understand how Salmonella evolves over time, improve the accuracy of diagnostic methods, develop control methods in the field, and identify prognostic markers for evidence-based decisions in epidemiology and surveillance.
Bart C Weimer
added 3 research items
The PacBio RS II provides for single molecule, real-time (SMRT) DNA technology to sequence genomes and detect DNA modifications. The quality control methods from gDNA input to the final library using the Agilent Bioanalyzer System and Agilent TapeStation System were evaluated. Automated protocols of PacBio 10 kb library preparation produced libraries with similar technical performance to those generated manually. The TapeStation System proved to be a reliable method that could be used in a 96-well plate format to QC the DNA equivalent to the standard Bioanalyzer System results. The DNA Integrity Number that is calculated in the TapeStation System software upon analysis of genomic DNA is quite helpful to assure that the starting genomic DNA is not degraded. In this respect, the gDNA assay on the TapeStation System is preferable to the DNA 12000 assay on the Bioanalyzer System, which cannot run genomic DNA, nor can the Bioanalyzer work directly from the 96-well plates.
Shigella is a major foodborne pathogen that infects humans and non-human primates and is the major cause of dysentery and reactive arthritis worldwide. This is the initial public release of 16 Shigella genome sequences from four species sequenced as part of the 100K Pathogen Genome Project.
Next Generation Sequencing (NGS) is a process that can be used to construct DNA libraries for large-scale sequencing projects. NGS utilizes the input of high molecular weight and intact genomic DNA (gDNA) to construct high-quality libraries. The assessment of DNA integrity is a key step in library construction. The Agilent 2200 TapeStation System with the genomic DNA assays assists in the determination of DNA quality. The system’s software algorithms allow for a visual inspection of DNA as well as generate a DNA Integrity Number (DIN) to indicate the integrity of extracted DNA. These quality control steps for gDNA allow the next step in the library pipeline to be normalized and at an optimal size to produce quality final libraries using the KAPA HTP Library Preparation Kit. Genomic DNA quality was analyzed with the Agilent 2200 TapeStation Analysis Software and DIN values were generated. Samples with values closer to 10 were accepted as high molecular weight and intact, and can readily produce quality final libraries.
Allison M Weis
added a research item
Campylobacter jejuni is an enteric bacterium that can cause abortion in livestock. This is the release of a multidrug-resistant Campylobacter jejuni genome from an isolate that caused an abortion in a cow in northern California. This isolate is part of the 100K Pathogen Genome Project.
Bart C Weimer
added 2 research items
Background The PacBio RS II provides for single molecule, real-time DNA technology to sequence genomes and detect DNA modifications. The starting point for high-quality sequence production is high molecular weight genomic DNA. To automate the library preparation process, there must be high-throughput methods in place to assess the genomic DNA, to ensure the size and amounts of the sheared DNA fragments and final library. Findings The library construction automation was accomplished using the Agilent NGS workstation with Bravo accessories for heating, shaking, cooling, and magnetic bead manipulations for template purification. The quality control methods from gDNA input to final library using the Agilent Bioanalyzer System and Agilent TapeStation System were evaluated. Conclusions Automated protocols of PacBio 10 kb library preparation produced libraries with similar technical performance to those generated manually. The TapeStation System proved to be a reliable method that could be used in a 96-well plate format to QC the DNA equivalent to the standard Bioanalyzer System results. The DNA Integrity Number that is calculated in the TapeStation System software upon analysis of genomic DNA is quite helpful to assure that the starting genomic DNA is not degraded. In this respect, the gDNA assay on the TapeStation System is preferable to the DNA 12000 assay on the Bioanalyzer System, which cannot run genomic DNA, nor can the Bioanalyzer work directly from the 96-well plates.
Listeria monocytogenes is a Gram-positive, intracellular pathogen that infects immune-compromised populations and has a ~50% mortality rate. It is responsible for numerous food-borne outbreaks worldwide each year and commonly persists in the environment. While L. monocytogenes is the only pathogenic species in this genus, it is increasingly important to define specific genomic features that are predictive for Listeria and the pathogenic species. As part of the 100K Genome Project, >1000 isolates from food, the environment, and humans are being sequenced to better understand the pan-genome of Listeria, so as to enable more robust detection methods for routine testing as well as informatic analysis for inclusion and exclusion of isolates from outbreaks and for quick, accurate differentiation from non-pathogenic species. To facilitate detection and outbreak identification in food-borne outbreaks, select L. monocytogenes genomes were sequenced to produce a closed genome and identify epigenetic modifications using Pacific Biosciences SMRT cell technology. This sequencing approach visualizes polymerase progression along the genome, creating a pattern of nucleotide recognition and identification of DNA modification sites correlating with the duration of polymerase stalling times. Specific modifications are identified by polymerase stalling times and patterns. Results from this study revealed that the L. monocytogenes genome contains multiple sites for DNA methylation. Methylation patterns were strain-specific, with certain strains exhibiting multiple methylation patterns with no correlation to serotype or isolation source. Unexpectedly, a novel DNA modification was detected in an isolate that originated from animals and belongs to serotype 1/2a. The signal observed during sequencing was unlike any previously identified, but was repeatable and measurable in multiple locations in the genome.
While the novel L. monocytogenes DNA modification was detected with SMRT cell technology, the specific chemical group remains to be characterized. These results illustrate the diversity of bacterial epigenetic events, novel enzymes to catalyze the modification, and unknown gene regulation strategies, and highlight the importance of understanding the role of epigenetics in the survival, host association, and pathogenesis of food-borne diseases.
Bart C Weimer
added a research item
The Weimer group studies host/microbe interactions using systems biology and food safety principles to understand how bacteria survive, grow, and persist to cause disease.
Bart C Weimer
added a research item
Many bacterial genomes are highly variable but nonetheless are typically published as a single assembled genome. Experiments tracking bacterial genome evolution have not looked at the variation present at a given point in time. Here, we analyzed the mouse-passaged Helicobacter pylori strain SS1 and its parent PMSS1 to assess intra- and intergenomic variability. Using high sequence coverage depth and experimental validation, we detected extensive genome plasticity within these H. pylori isolates, including movement of the transposable element IS607, large and small inversions, multiple single nucleotide polymorphisms, and variation in cagA copy number. The cagA gene was found as 1 to 4 tandem copies located off the cag island in both SS1 and PMSS1; this copy number variation correlated with protein expression. To gain insight into the changes that occurred during mouse adaptation, we also compared SS1 and PMSS1 and observed 46 differences that were distinct from the within-genome variation. The most substantial was an insertion in cagY, which encodes a protein required for a type IV secretion system function. We detected modifications in genes coding for two proteins known to affect mouse colonization, the HpaA neuraminyllactose-binding protein and the FutB α-1,3 lipopolysaccharide (LPS) fucosyltransferase, as well as genes predicted to modulate diverse properties. In sum, our work suggests that data from consensus genome assemblies from single colonies may be misleading by failing to represent the variability present. Furthermore, we show that high-depth genomic sequencing data of a population can be analyzed to gain insight into the normal variation within bacterial strains. IMPORTANCE Although it is well known that many bacterial genomes are highly variable, it is nonetheless traditional to refer to, analyze, and publish "the genome" of a bacterial strain.
Variability is usually reduced ("only sequence from a single colony"), ignored ("just publish the consensus"), or placed in the "too-hard" basket ("analysis of raw read data is more robust"). Now that whole-genome sequences are regularly used to assess virulence and track outbreaks, a better understanding of the baseline genomic variation present within single strains is needed. Here, we describe the variability seen in typical working stocks and colonies of pathogen Helicobacter pylori model strains SS1 and PMSS1 as revealed by use of high-coverage mate pair next-generation sequencing (NGS) and confirmed by traditional laboratory tech
Bart C Weimer
added a research item
Listeria monocytogenes is a food-associated bacterium that is responsible for food-related illnesses worldwide. This is the initial public release of 306 L. monocytogenes genome sequences as part of the 100K Pathogen Genome Project. These isolates represent global genomic diversity in L. monocytogenes.
Bart C Weimer
added an update
We published three genome releases in the last month that amount to over 500 genomes. More on the way!
 
Bart C Weimer
added a research item
Campylobacter is a food-associated bacterium and a leading cause of foodborne illness worldwide, and it is associated with poultry in the food supply. This is the initial public release of 202 Campylobacter genome sequences as part of the 100K Pathogen Genome Project. These isolates represent global genomic diversity in the Campylobacter genus.
Bart C Weimer
added 19 research items
Shearing of bacterial gDNA within a specific size range prior to sequencing library construction is a critical step in Next Generation Sequencing workflows. Quality control of the sheared bacterial gDNA is required in large multiplexed formats for large volume workflows, such as those used in the 100K Pathogen Genome Sequencing Project. Using the Covaris E220 instrument, the power and treatment time were varied to determine the effect on the optimal fragment size (150–350 bp) in the resulting sheared gDNA of four bacterial pathogens: Salmonella enterica subsp. enterica serovar Saint Paul strain Sp3 and serovar Typhimurium strain LT2, Klebsiella sp. and Vibrio spp. DNA fragments were quantified and sized using an Agilent 2200 TapeStation system and the Agilent High Sensitivity D1000 ScreenTape assay. The 2200 TapeStation system was suitable to determine size distribution after fragmentation of gDNA in a 96-well plate format, a format suitable for high-throughput workflow and compatible with shearing technologies that use a 96-well plate multiplexed format. This approach enabled the measurement of gDNA and sheared DNA using a single technology.
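To illustrate the kind of check this QC step implies, the sketch below selects the shearing conditions whose fragment peak falls inside the 150–350 bp window named above. It is only a sketch under stated assumptions: the `ShearRun` record, the function `runs_in_target`, and every power, time, and peak value are hypothetical placeholders, not settings or measurements from the study.

```python
from typing import List, NamedTuple

# Target fragment size window taken from the abstract (150-350 bp).
TARGET_MIN_BP = 150
TARGET_MAX_BP = 350


class ShearRun(NamedTuple):
    """One shearing condition and its measured fragment peak (hypothetical)."""
    power_w: float   # instrument power setting, placeholder value
    time_s: float    # treatment time in seconds, placeholder value
    peak_bp: float   # fragment peak size reported by the sizing assay


def runs_in_target(
    runs: List[ShearRun],
    low: float = TARGET_MIN_BP,
    high: float = TARGET_MAX_BP,
) -> List[ShearRun]:
    """Return the runs whose fragment peak lies within [low, high] bp."""
    return [run for run in runs if low <= run.peak_bp <= high]


# Made-up grid of conditions for illustration only.
grid = [
    ShearRun(power_w=100, time_s=60, peak_bp=620),
    ShearRun(power_w=140, time_s=120, peak_bp=340),
    ShearRun(power_w=175, time_s=180, peak_bp=210),
    ShearRun(power_w=175, time_s=360, peak_bp=130),
]
passing = runs_in_target(grid)
```

In a 96-well workflow the same filter could be applied per well, with the peak sizes read from the sizing assay's exported results.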
Next Generation Sequencing (NGS) requires the input of high molecular weight genomic DNA (gDNA) to construct quality libraries for large scale sequencing projects, such as the 100K Pathogen Genome Project. The assessment of DNA integrity is a critical first step in obtaining meaningful data, and intact DNA is a key element for successful library construction. The Agilent 2200 TapeStation System plays an important role in the determination of the DNA quality using the DNA genomic assay. Profiles generated on the 2200 TapeStation System yield information on concentration, allow a visual inspection of the DNA quality, and generate a DNA Integrity Number (DIN), which is a value automatically assigned by the software that provides an indication of integrity (that is, lack of degradation). This application note describes a new software algorithm that has been developed to extract information about DNA sample integrity from the 2200 TapeStation System electrophoretic trace.
The initial step in Next Generation Sequencing is to construct a library from genomic DNA. To gain the optimum result, extracted DNA must be of high molecular weight with limited degradation. High-throughput sequencing projects, such as the 100K Pathogen Genome Project, require methods to rapidly assess the quantity and quality of genomic DNA extracts. In this study, the applicability of the Agilent 2200 TapeStation was assessed alongside several accepted high-throughput methods, using genomic DNA from nine foodborne pathogens. The Agilent 2200 TapeStation System with Genomic DNA ScreenTape and Genomic DNA Reagents was easy to use with minimal manual intervention. An important advantage of the 2200 TapeStation over other high-throughput methods was that the quality and quantity of high molecular weight genomic DNA can be measured apart from lower molecular weight size ranges, which benefits the library construction pipeline at this important step in the Next Generation Sequencing process.
Bart C Weimer
added a project goal
The consortium is sequencing 100,000 bacterial pathogens. The goal is to produce and examine microbial evolution, determine pathogenicity genome features, and create information for reference databases used in bacterial identification.