Heike Sichtig’s research while affiliated with U.S. Food and Drug Administration and other places


Publications (11)


Fig. 4 Comparison of NCBI Nt and FDA-ARGOS read classification results. Visualizing bioinformatics analysis with the MegaBLAST tool of metagenomics shotgun data of mock clinical human blood sample spiked with 10⁵ E. avium. The heatmap showed read classification results for triplicate samples run against 200 database instances. Dark blue indicates read numbers below 10. A gradient from white to red indicates read numbers ranging from above 10 to 100,000. Here we demonstrated read classification results for all simulated species. E. avium classification results were consistent across all database instances. In addition, several other species were classified at >1000 reads with the normalized NCBI Nt database instances (Supplementary Data 3 and 4)
Bundibugyo ebolavirus performance summary
EBOV mock clinical trial prior probabilities
Proposed composite reference method (C-RM) for ID-NGS diagnostics. Panel a illustrates a walkthrough of the C-RM. Here, we show in silico target sequence comparison with FDA-ARGOS reference genomes in combination with representative clinical testing to understand the performance of ID-NGS diagnostic tests. Using raw sequence data from the ID-NGS diagnostic test device, in silico comparison of results obtained with the assay in-house database to results when using FDA-ARGOS will evaluate device bioinformatic analysis pipelines and report generation while eliminating the need for additional sample testing with a gold standard comparator (current FDA benchmarks). Overall, we anticipate the use of the C-RM based on assay-specific subsets of clinical samples and/or microbial reference materials (MRMs) for clinical validation in combination with FDA-ARGOS in silico target sequence comparison to generate scientifically valid evidence for understanding the performance of ID NGS diagnostic tests. Panel b lists the required quality control metrics for passing the regulatory-grade reference genome criteria. At a minimum, an FDA-ARGOS regulatory-grade reference genome adheres to six metrics (a–f). Specifically, category f details the minimum data requirements that are further described in (c). In addition, panel d lists the 10 critical metadata that need to be ascribed to a genome to meet the regulatory-grade criteria
FDA-ARGOS quality-controlled reference genomes for diagnostic use. Summary statistics of the current 487 microbial genomes show primary coverage of FDA-ARGOS resides with bacterial isolates, followed by viruses and then eukaryotic parasites (a). Supplementary Data 1 provides accessions for all 487 genomes currently available publicly. A majority of FDA-ARGOS constituents (b) originate from North America and are from human clinical isolation


FDA-ARGOS is a database with public quality-controlled reference genomes for diagnostic use and regulatory science
  • Article
  • Full-text available

July 2019 · 450 Reads · 136 Citations

Heike Sichtig · Yi Yan · [...] · Uwe Scherf

FDA proactively invests in tools to support innovation of emerging technologies, such as infectious disease next generation sequencing (ID-NGS). Here, we introduce FDA-ARGOS quality-controlled reference genomes as a public database for diagnostic purposes and demonstrate its utility through two use cases. We provide quality control metrics for the FDA-ARGOS genomic database resource and outline the need for genome quality gap filling in the public domain. In the first use case, we show more accurate microbial identification of Enterococcus avium from metagenomic samples with FDA-ARGOS reference genomes compared to non-curated GenBank genomes. In the second use case, we demonstrate the utility of FDA-ARGOS reference genomes for Ebola virus target sequence comparison as part of a composite validation strategy for ID-NGS diagnostic tests. The use of FDA-ARGOS as an in silico target sequence comparator tool combined with representative clinical testing could reduce the burden for completing ID-NGS clinical trials.




Fig. 2 mCaller workflow and classification of E. coli sites in R9.4 data. a The pipeline for classification of adenines as methylated or unmethylated. b Probabilities of methylation defined by a neural network classifier for methylated compared to unmethylated positions in E. coli, with a model trained on one dataset and tested on a second. c ROC curves for the neural network model using R9.4 data, showing true positive rate (methylated positions correctly identified) as a function of false positive rate (unmethylated positions called as methylated) with varying probability thresholds for classification. We tested modifications to the standard model ("NN") using only high-quality reads (average base quality > 9, "NN_hq") and classifying observations that included a maximum of two skips ("NN_sk"). A curve was calculated for genomic positions with ≥15× coverage by varying the fraction of reads with probability of methylation scores ≥0.5 required to define a position as methylated ("NN_pos"). Boxplot center lines show medians and whiskers 1.5× interquartile range
m⁶A methylation affects nanopore signal. Picoampere currents deviate from model values as the DNA surrounding a methylated adenine is pulled through a nanopore. a, b The deviations vary according to the position of the adenine within the pore and its surrounding sequence context. c Across all sequence contexts, the greatest deviations for R9 data occurred with the adenine in the fourth or fifth position among six nucleotides considered by the model in and around a pore. Boxplot center lines show medians and whiskers 1.5× interquartile range. Outliers are truncated at +/− 20 pA to better visualize data trends
Motif detection in a microbial reference community. a The percent of sites at different motifs identified as methylated by PacBio and nanopore (ONT) across species in a reference community. b A comparison between the percent m⁶A in the genome detected using a commercial ELISA kit and based on the motif sites identified using PacBio and confirmed by MeDIP-seq (Spearman ρ = 0.97, p = 6.54E−6). c The percent of non-motif sites identified by PacBio that were also called as methylated using mCaller. The total non-motif sites called by PacBio for each strain is noted as n
Comparison to Tombo. a Precision-recall curves for E. coli motifs called by mCaller (per-position calls, for comparison to Tombo) and Tombo for a model trained using E. coli data, with true negatives corresponding to the same motifs within P. aeruginosa. Inset: comparison to true negatives drawn from a random set of true unmethylated motif positions within E. coli suggests GATC motifs are more frequently identified as methylated by both mCaller and Tombo than random adenines. b Precision-recall curves for untrained motifs in E. faecalis, with true negatives again drawn from P. aeruginosa
Methylated motif depletion. Ratio of observed over expected motif counts calculated by maximal order Markov model a in assembled genomes, colored by species and labeled by first nucleotide where there were multiple motifs for a species, b in predicted prophages within assembled genomes, and c in viruses associated with each species. d The distributions of ratios for different types of restriction-modification system across viruses associated with different species in the REBASE database. Boxplot center lines show medians, notches confidence intervals, and whiskers 1.5× interquartile range
Single-molecule sequencing detection of N6-methyladenine in microbial reference materials

February 2019 · 519 Reads · 146 Citations

The DNA base modification N6-methyladenine (m⁶A) is involved in many pathways related to the survival of bacteria and their interactions with hosts. Nanopore sequencing offers a new, portable method to detect base modifications. Here, we show that a neural network can improve m⁶A detection at trained sequence contexts compared to previously published methods using deviations between measured and expected current values as each adenine travels through a pore. The model, implemented as the mCaller software package, can be extended to detect known or confirm suspected methyltransferase target motifs based on predictions of methylation at untrained contexts. We use PacBio, Oxford Nanopore, methylated DNA immunoprecipitation sequencing (MeDIP-seq), and whole-genome bisulfite sequencing data to generate and orthogonally validate methylomes for eight microbial reference species. These well-characterized microbial references can serve as controls in the development and evaluation of future methods for the identification of base modifications from single-molecule sequencing data.



FDA-ARGOS: A Public Quality-Controlled Genome Database Resource for Infectious Disease Sequencing Diagnostics and Regulatory Science Research

November 2018 · 113 Reads · 4 Citations

Infectious disease next generation sequencing (ID-NGS) diagnostics are on the cusp of revolutionizing the clinical market. To facilitate this transition, FDA proactively invested in tools to support innovation of emerging technologies. FDA and collaborators established a publicly available database, FDA dAtabase for Regulatory-Grade micrObial Sequences (FDA-ARGOS), as a tool to fill reference database gaps with quality-controlled genomes. This manuscript discusses quality control metrics for the proposed FDA-ARGOS genomic resource and outlines the need for quality-controlled genome gap filling in the public domain. Here, we also present three case studies showcasing potential applications for FDA-ARGOS in infectious disease diagnostics, specifically: assay design, reference database and in silico sequence comparison in combination with representative microbial organism wet lab testing; a novel composite validation strategy for ID-NGS diagnostics. The use of FDA-ARGOS as an in silico comparator tool could reduce the burden for completing ID-NGS clinical trials. In addition, use cases identifying Enterococcus avium and Ebola virus (Zaire ebolavirus variant Makona) demonstrate the utility of FDA-ARGOS as a reference database for independent performance validation of new tests and for documenting how one would use this database as an in silico sequence target comparator tool for ID-NGS validation, respectively.



Extra-Chromosomal DNA Sequencing Reveals Episomal Prophages Capable of Impacting Virulence Factor Expression in Staphylococcus aureus

July 2018 · 158 Reads · 20 Citations

Staphylococcus aureus is a major human pathogen with well-characterized bacteriophage contributions to its virulence potential. Recently, we identified plasmidial and episomal prophages in S. aureus strains using an extra-chromosomal DNA (exDNA) isolation and sequencing approach, uncovering the plasmidial phage ϕBU01, which was found to encode important virulence determinants. Here, we expanded our extra-chromosomal sequencing of S. aureus, selecting 15 diverse clinical isolates with known chromosomal sequences for exDNA isolation and next-generation sequencing. We uncovered the presence of additional episomal prophages in 5 of 15 samples, but did not identify any plasmidial prophages. exDNA isolation was found to enrich for circular prophage elements, and qPCR characterization of the strains revealed that such prophage enrichment is detectable only in exDNA samples and would likely be missed in whole-genome DNA preparations (e.g., detection of episomal prophages did not correlate with higher prophage excision rates nor higher excised prophage copy numbers in qPCR experiments using whole-genome DNA). In S. aureus MSSA476, we found that enrichment and excision of the prophage ϕSa4ms into the cytoplasm was temporal and that episomal prophage localization did not appear to be a precursor to lytic cycle replication, suggesting ϕSa4ms excision into the cytoplasm may be part of a novel lysogenic switch. For example, we show that ϕSa4ms excision alters the promoter and transcription of htrA2, encoding a stress-response serine protease, and that alternative promotion of htrA2 confers increased heat-stress survival in S. aureus COL. Overall, exDNA isolation and focused sequencing may offer a more complete genomic picture for bacterial pathogens, offering insights into important chromosomal dynamics likely missed with whole-genome DNA-based approaches.



Fig. 1 a ANI neighboring table for the type genome of Cronobacter malonaticus. b Structured comment for the GenBank flatfile, summarizing the evidence that supports the taxonomic identification update. c Screenshot of the kmer tree showing the misidentified Cronobacter sakazakii genome. Type genomes are highlighted in blue, RefSeq reference genomes in purple. ANI spans are shown for several clades. As is evident in a, every genome in the malonaticus clade will be very close to 94.7% ANI with respect to every genome in the sakazakii clade. In addition, two misidentified Cronobacter turicensis genomes appear at the top of the figure
and references for abbreviations and terms used in this paper
Meeting Report: GenBank Microbial Genomic Taxonomy Workshop (12-13 May, 2015)

December 2016 · 397 Reads · 77 Citations

Standards in Genomic Sciences

Many genomes are incorrectly identified at GenBank. We developed a plan to find and correct misidentified genomes using genomic comparison statistics together with a scaffold of reliably identified genomes from type. To review the proposal, the GenBank Microbial Genomic Taxonomy Workshop was organized in Bethesda, MD, on May 12–13, 2015, with broad representation from the bacterial taxonomic community. Electronic supplementary material: The online version of this article (doi:10.1186/s40793-016-0134-1) contains supplementary material, which is available to authorized users.


Citations (6)


... Our method prioritizes genome completeness and precise annotations. By analyzing 734 strains from 514 species with these annotations, we could redefine bacterial proteomes with HOGs across 23% of currently known bacterial species able to infect humans and uncover potential novel widespread pathogenicity determinants, including most common pathogens, as demonstrated by a comparison with a comprehensive list of human-associated pathogens (Bartlett et al., 2022) and the FDA-ARGOS Wanted Organism list (Sichtig et al., 2019). This ensured the inclusion of clinically relevant species, genera, and families across 12 major bacterial phyla. ...

Reference:

Identifying potential novel widespread determinants of bacterial pathogenicity using phylogenetic-based orthology analysis
FDA-ARGOS is a database with public quality-controlled reference genomes for diagnostic use and regulatory science

... Compared with the number of tools available for detecting 5mC and RNA modification, fewer tools are capable of detecting bacterial 6mA [38,39]. Examples of 6mA-detecting available tools include mCaller, a neural network-based tool trained on the sequencing data of E. coli K-12 [40]; Tombo, a comprehensive tool suite developed by the ONT, provides one de novo tool (Tombo_denovo) and two comparison mode tools, namely model_sample_compare (Tombo_modelcom) and level_sample_compare (Tombo_levelcom) [41]; Nanodisco, a widely used tool for de novo modification detection and prediction of the methylation types in bacteria [42]; Dorado (https://github.com/nanoporetech/dorado), a deep-learning-based tool that provides highly accurate basecalling and modification detection for Nanopore sequencing data; ...

Single-molecule sequencing detection of N6-methyladenine in microbial reference materials

... The creation and ongoing improvements of specialized databases are abundant, and there is a need for uniformity and validation [113][114][115][116][117][118]. The FDA-ARGOS database [119], the SILVA ribosomal database (https://www.arb-silva.de/, accessed on 9 May 2024), and RDP (https://www.glbrc.org/data-and-tools/glbrc-data-sets/ribosomal-database-project, ...

FDA-ARGOS: A Public Quality-Controlled Genome Database Resource for Infectious Disease Sequencing Diagnostics and Regulatory Science Research

... Temperate phages are a double-edged sword due to their lytic and lysogenic cycles, which can either kill the bacteria or enhance the fitness and virulence of the host. Previous investigations have demonstrated that lysogenic conversion plays a significant role in S. aureus adaptability and virulence and contributes to the pathogenesis of staphylococcal infections [13,14]. There are also studies indicating that temperate phage in combination with antibiotic synergy eradicates bacteria through depletion of lysogens [15]. ...

Extra-Chromosomal DNA Sequencing Reveals Episomal Prophages Capable of Impacting Virulence Factor Expression in Staphylococcus aureus

... Manual intervention was performed to remove remaining contaminants from the assembly, ensuring high-quality final genome sequences (10) (Table 1). Taxonomic classification was based on a 16S rRNA gene analysis and the NCBI prokaryotic genome annotation pipeline (11,12). The latter identified strain BW1 as 96.769% identical by average nucleotide identity to the type genome of Pseudomonas chengduensis, with 81.5% coverage of the genome. ...

Meeting Report: GenBank Microbial Genomic Taxonomy Workshop (12-13 May, 2015)

Standards in Genomic Sciences

... Why clinicians are not utilising the latest mNGS methods is under-researched, with a lack of qualitative research to understand the current diagnostic landscape and to evaluate why the landscape has remained unchanged for many years. Current literature is saturated with publications on clinical metagenomics, covering topics such as its utility and its limitations when applied to the current diagnostic landscape [14][15][16]; however, there is a gap in the literature which contextualises the recent clinical advancements in IDD and the barriers that hinder the implementation of new diagnostic frameworks. Perspectives and opinions of clinical diagnostic stakeholders influence the uptake of new diagnostic technologies [17]. ...

Making the Leap from Research Laboratory to Clinic: Challenges and Opportunities for Next-Generation Sequencing in Infectious Disease Diagnostics