Khai Luong

Pacific Biosciences of California, Inc., Menlo Park, California, United States

Publications (39) · 263.54 Total impact

  • Source
    ABSTRACT: The methylation of DNA bases plays an important role in numerous biological processes including development, gene expression, and DNA replication. Salmonella is an important foodborne pathogen, and methylation in Salmonella is implicated in virulence. Using single molecule real-time (SMRT) DNA-sequencing, we sequenced and assembled the complete genomes of eleven Salmonella enterica isolates from nine different serovars, and analysed the whole-genome methylation patterns of each genome. We describe 16 distinct N6-methyladenine (m6A) methylated motifs, one N4-methylcytosine (m4C) motif, and one combined m6A-m4C motif. Eight of these motifs are novel, i.e., they have not been previously described. We also identified the methyltransferases (MTases) associated with 13 of the motifs. Some motifs are conserved across all Salmonella serovars tested, while others were found only in a subset of serovars. Eight of the nine serovars contained a unique methylated motif that was not found in any other serovar (most of these motifs were part of Type I restriction modification systems), indicating the high diversity of methylation patterns present in Salmonella.
    Full-text · Article · Apr 2015 · PLoS ONE
  • Source
    ABSTRACT: Base J (β-D-glucosyl-hydroxymethyluracil) replaces 1% of T in the Leishmania genome and is only found in telomeric repeats (99%) and in regions where transcription starts and stops. This highly restricted distribution must be co-determined by the thymidine hydroxylases (JBP1 and JBP2) that catalyze the initial step in J synthesis. To determine the DNA sequences recognized by JBP1/2, we used SMRT sequencing of DNA segments inserted into plasmids grown in Leishmania tarentolae. We show that SMRT sequencing recognizes base J in DNA. Leishmania DNA segments that normally contain J also picked up J when present in the plasmid, whereas control sequences did not. Even a segment of only 10 telomeric (GGGTTA) repeats was modified in the plasmid. We show that J modification usually occurs at pairs of Ts on opposite DNA strands, separated by 12 nucleotides. Modifications occur near G-rich sequences capable of forming G-quadruplexes and JBP2 is needed, as it does not occur in JBP2-null cells. We propose a model whereby de novo J insertion is mediated by JBP2. JBP1 then binds to J and hydroxylates another T 13 bp downstream (but not upstream) on the complementary strand, allowing JBP1 to maintain existing J following DNA replication. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
    Full-text · Article · Feb 2015 · Nucleic Acids Research
  • Source
    ABSTRACT: TET/JBP enzymes oxidize 5-methylpyrimidines in DNA. In mammals, the oxidized methylcytosines (oxi-mCs) function as epigenetic marks and likely intermediates in DNA demethylation. Here we present a method based on diglucosylation of 5-hydroxymethylcytosine (5hmC) to simultaneously map 5hmC, 5-formylcytosine, and 5-carboxylcytosine at near-base-pair resolution. We have used the method to map the distribution of oxi-mC across the genome of Coprinopsis cinerea, a basidiomycete that encodes 47 TET/JBP paralogs in a previously unidentified class of DNA transposons. Like 5-methylcytosine residues from which they are derived, oxi-mC modifications are enriched at centromeres, TET/JBP transposons, and multicopy paralogous genes that are not expressed, but rarely mark genes whose expression changes between two developmental stages. Our study provides evidence for the emergence of an epigenetic regulatory system through recruitment of selfish elements in a eukaryotic lineage, and describes a method to map all three different species of oxi-mCs simultaneously.
    Full-text · Article · Nov 2014 · Proceedings of the National Academy of Sciences
  •
    ABSTRACT: The Campylobacter lari group is a phylogenetic clade within the epsilon subdivision of the Proteobacteria and is part of the thermotolerant Campylobacter spp., a division within the genus that includes the human pathogen Campylobacter jejuni. The C. lari group is currently composed of five species (C. lari, C. insulaenigrae, C. volucris, C. subantarcticus and C. peloridis), as well as a group of strains termed the urease-positive thermophilic Campylobacter (UPTC) and other C. lari-like strains. Here we present the complete genome sequences of 11 C. lari group strains, including the five C. lari group species, four UPTC strains and a lari-like strain isolated in this study. The genome of C. lari subsp. lari strain RM2100 was described previously. Analysis of the C. lari group genomes indicates that this group is highly related at the genome level. Furthermore, these genomes are strongly syntenic with minor rearrangements occurring only in four of the twelve genomes studied. The C. lari group can be bifurcated, based on the flagella and flagellar modification genes. Genomic analysis of the UPTC strains indicated that these organisms are variable but highly-similar, closely related to but distinct from C. lari. Additionally, the C. lari group contains multiple genes encoding hemagglutination domain proteins, which are either contingency genes or linked to conserved contingency genes. Many of the features identified in strain RM2100, such as major deficiencies in amino acid biosynthesis and energy metabolism, are conserved across all 12 genomes, suggesting that these common features may play a role in the association of the C. lari group with coastal environments and watersheds.
    No preview · Article · Nov 2014 · Genome Biology and Evolution
  •
    ABSTRACT: Public health officials have raised concerns that plasmid transfer between Enterobacteriaceae species may spread resistance to carbapenems, an antibiotic class of last resort, thereby rendering common health care-associated infections nearly impossible to treat. To determine the diversity of carbapenemase-encoding plasmids and assess their mobility among bacterial species, we performed comprehensive surveillance and genomic sequencing of carbapenem-resistant Enterobacteriaceae in the National Institutes of Health (NIH) Clinical Center patient population and hospital environment. We isolated a repertoire of carbapenemase-encoding Enterobacteriaceae, including multiple strains of Klebsiella pneumoniae, Klebsiella oxytoca, Escherichia coli, Enterobacter cloacae, Citrobacter freundii, and Pantoea species. Long-read genome sequencing with full end-to-end assembly revealed that these organisms carry the carbapenem resistance genes on a wide array of plasmids. K. pneumoniae and E. cloacae isolated simultaneously from a single patient harbored two different carbapenemase-encoding plasmids, indicating that plasmid transfer between organisms was unlikely within this patient. We did, however, find evidence of horizontal transfer of carbapenemase-encoding plasmids between K. pneumoniae, E. cloacae, and C. freundii in the hospital environment. Our data, including full plasmid identification, challenge assumptions about horizontal gene transfer events within patients and identify possible connections between patients and the hospital environment. In addition, we identified a new carbapenemase-encoding plasmid of potentially high clinical impact carried by K. pneumoniae, E. coli, E. cloacae, and Pantoea species, in unrelated patients and in the hospital environment.
    No preview · Article · Sep 2014 · Science Translational Medicine
  • Source
    ABSTRACT: Bacterial phosphorothioate (PT) DNA modifications are incorporated by Dnd proteins A-E and often function with DndF-H as a restriction-modification (R-M) system, as in Escherichia coli B7A. However, bacteria such as Vibrio cyclitrophicus FF75 lack dndF-H, which points to other PT functions. Here we report two novel, orthogonal technologies to map PTs across the genomes of B7A and FF75 with >90% agreement: single molecule, real-time sequencing and deep sequencing of iodine-induced cleavage at PT (ICDS). In B7A, we detect PT on both strands of GpsAAC/GpsTTC motifs, but with only 12% of 40,701 possible sites modified. In contrast, PT in FF75 occurs as a single-strand modification at CpsCA, again with only 14% of 160,541 sites modified. Single-molecule analysis indicates that modification could be partial at any particular genomic site even with active restriction by DndF-H, with direct interaction of modification proteins with GAAC/GTTC sites demonstrated with oligonucleotides. These results point to highly unusual target selection by PT-modification proteins and rule out known R-M mechanisms.
    Preview · Article · Jun 2014 · Nature Communications
  • Source
    ABSTRACT: The genome of Helicobacter pylori is remarkable for its large number of restriction-modification (R-M) systems, and strain-specific diversity in R-M systems has been suggested to limit natural transformation, the major driving force of genetic diversification in H. pylori. We have determined the comprehensive methylomes of two H. pylori strains at single base resolution, using Single Molecule Real-Time (SMRT®) sequencing. For strains 26695 and J99-R3, 17 and 22 methylated sequence motifs were identified, respectively. For most motifs, almost all sites occurring in the genome were detected as methylated. Twelve novel methylation patterns corresponding to nine recognition sequences were detected (26695, 3; J99-R3, 6). Functional inactivation, correction of frameshifts as well as cloning and expression of candidate methyltransferases (MTases) permitted not only the functional characterization of multiple, yet undescribed, MTases, but also revealed novel features of both Type I and Type II R-M systems, including frameshift-mediated changes of sequence specificity and the interaction of one MTase with two alternative specificity subunits resulting in different methylation patterns. The methylomes of these well-characterized H. pylori strains will provide a valuable resource for future studies investigating the role of H. pylori R-M systems in limiting transformation as well as in gene regulation and host interaction.
    Full-text · Article · Dec 2013 · Nucleic Acids Research
  • Source
    ABSTRACT: The Caulobacter DNA methyltransferase CcrM is one of five master cell-cycle regulators. CcrM is transiently present near the end of DNA replication when it rapidly methylates the adenine in hemimethylated GANTC sequences. The timing of transcription of two master regulator genes and two cell division genes is controlled by the methylation state of GANTC sites in their promoters. To explore the global extent of this regulatory mechanism, we determined the methylation state of the entire chromosome at every base pair at five time points in the cell cycle using single-molecule, real-time sequencing. The methylation state of 4,515 GANTC sites, preferentially positioned in intergenic regions, changed progressively from full to hemimethylation as the replication forks advanced. However, 27 GANTC sites remained unmethylated throughout the cell cycle, suggesting that these protected sites could participate in epigenetic regulatory functions. An analysis of the time of activation of every cell-cycle regulatory transcription start site, coupled to both the position of a GANTC site in their promoter regions and the time in the cell cycle when the GANTC site transitions from full to hemimethylation, allowed the identification of 59 genes as candidates for epigenetic regulation. In addition, we identified two previously unidentified N(6)-methyladenine motifs and showed that they maintained a constant methylation state throughout the cell cycle. The cognate methyltransferase was identified for one of these motifs as well as for one of two 5-methylcytosine motifs.
    Full-text · Article · Nov 2013 · Proceedings of the National Academy of Sciences
  • Source
    ABSTRACT: We performed whole-genome analyses of DNA methylation in Shewanella oneidensis MR-1 to examine its possible role in regulating gene expression and other cellular processes. Single-molecule real-time (SMRT) sequencing revealed extensive methylation of adenine (N6mA) throughout the genome. These methylated bases were located in five sequence motifs, including three novel targets for type I restriction/modification enzymes. The sequence motifs targeted by putative methyltransferases were determined via SMRT sequencing of gene knockout mutants. In addition, we found that S. oneidensis MR-1 cultures grown under various culture conditions displayed different DNA methylation patterns. However, the small number of differentially methylated sites could not be directly linked to the much larger number of differentially expressed genes under these conditions, suggesting that DNA methylation is not a major regulator of gene expression in S. oneidensis MR-1. The enrichment of methylated GATC motifs in the origin of replication indicates that DNA methylation may regulate genome replication in a manner similar to that seen in Escherichia coli. Furthermore, comparative analyses suggest that many Gammaproteobacteria, including all members of the Shewanellaceae family, may also utilize DNA methylation to regulate genome replication.
    Preview · Article · Aug 2013 · Journal of Bacteriology
  • Source
    ABSTRACT: Haplogroup H dominates present-day Western European mitochondrial DNA variability (>40%), yet was less common (~19%) among Early Neolithic farmers (~5450 BC) and virtually absent in Mesolithic hunter-gatherers. Here we investigate this major component of the maternal population history of modern Europeans and sequence 39 complete haplogroup H mitochondrial genomes from ancient human remains. We then compare this ‘real-time’ genetic data with cultural changes taking place between the Early Neolithic (~5450 BC) and Bronze Age (~2200 BC) in Central Europe. Our results reveal that the current diversity and distribution of haplogroup H were largely established by the Mid Neolithic (~4000 BC), but with substantial genetic contributions from subsequent pan-European cultures such as the Bell Beakers expanding out of Iberia in the Late Neolithic (~2800 BC). Dated haplogroup H genomes allow us to reconstruct the recent evolutionary history of haplogroup H and reveal a mutation rate 45% higher than current estimates for human mitochondria.
    Full-text · Article · Apr 2013 · Nature Communications
  • Source
    Dataset: Figure S1
    ABSTRACT: Performance of the hierarchical model in partially modified plasmid data. The red, green and blue curves are ROC curves for the hierarchical model with control data, the case-control method, and the hierarchical model without control data, respectively. These three methods were tested on two different datasets: 1) a 3,589 bases long plasmid with 19 known 4-methylcytosines (4-mC) where 50%, 70%, and 90% molecules from the modified site are actually modified on average (single strand coverage of native sample and control sample are 200x and 65x, respectively) (A,C,E), and 2) a 3,591 bases long plasmid with 23 known N6-methyladenines (6-mA) where 50%, 70%, and 90% molecules from the modified site are actually modified on average (single strand coverage of native sample and control sample are 100x and 20x, respectively) (B,D,F). (TIFF)
    Preview · Dataset · Mar 2013
  • Source
    Dataset: Text S1
    ABSTRACT: EM algorithm for fitting the hierarchical model. Text S1 provides a detailed description of the EM (Expectation-Maximization) algorithm used for estimating hyperparameters of the proposed hierarchical model. (PDF)
    Preview · Dataset · Mar 2013
  • Source
    ABSTRACT: Author Summary: DNA modifications have been found in a wide range of living organisms, from bacteria to humans. Many existing studies have shown that they play important roles in development, disease, bacterial virulence, etc. However, for many types of DNA modification, for example N6-methyladenine and 8-oxoG, there is not an efficient and accurate detection method. Single molecule real time (SMRT) sequencing not only generates DNA sequences, but also generates DNA polymerase kinetic information. The kinetic information is sensitive to DNA modifications in the sequenced DNA template, and therefore can be used for detecting a wide range of DNA modification types. The usual detection strategy is a case-control method, which compares kinetic information between a native sample and a control sample whose modifications have been removed. However, generating a control sample doubles the cost. We proposed a hierarchical model, which can incorporate existing SMRT sequencing data to increase detection accuracy and reduce the coverage requirement of the control sample, or even avoid the need for a control sample in some cases. We tested our method on SMRT sequencing data of plasmids with known modified sites and E. coli K-12 strain to demonstrate that our method can greatly increase detection accuracy and reduce sequencing cost.
    Preview · Article · Mar 2013 · PLoS Computational Biology
  • Source
    ABSTRACT: We report a closed genome of Salmonella enterica subsp. enterica serovar Javiana (S. Javiana). This serotype is a common food-borne pathogen and is often associated with fresh-cut produce. Complete (finished) genome assemblies will support pilot studies testing the utility of next-generation sequencing (NGS) technologies in public health laboratories.
    Full-text · Article · Feb 2013 · Genome Announcements
  • Source
    ABSTRACT: Sequence context dependence of the kinetic signatures for 5mC and 5caC. Top panel (a) is a schematic of the synthetic SMRTbell template with random bases surrounding 5mC or 5caC modifications in a CG sequence context. The modified position is indicated with pink text and an asterisk. The bottom panel (b) is a heat map of IPD ratio values of either 5mC or 5caC relative to an unmodified control sequence. IPD ratio values are shown for all possible sequence contexts of four random bases over ten positions on the DNA template (-3 to +6 relative to the modified base). Light grey boxes within the heatmap denote sequence contexts that did not have sufficient sequencing coverage. A minimum of 10 independent molecules of both modified and control templates were analyzed.
    Preview · Dataset · Jan 2013
  • Source
    ABSTRACT: Table of detection rates for all methylated motifs in B. halodurans C-125. The number and percent detection is shown for all methylated sequence motifs in the genome. A detected genomic position is one that has a kinetic score that is greater than the cutoff value. Detection rates are also shown for common secondary IPD ratio peaks of 6mA (+5) and 5mC (+2, +6) and for off-target motifs with similar sequence content to the methylated motifs. Methylated bases are colored: 6mA (red), 5mC (blue). The interrogated base in the motif is underlined. Unassigned are genomic positions with kinetic scores above the cutoff which are not in a methylated motif or a secondary peak.
    Preview · Dataset · Jan 2013
  • Source
    ABSTRACT: Table of detection rates for all methylated motifs in E. coli MG1655. The number and percent detection is shown for all methylated sequence motifs in the genome. A detected genomic position is one that has a kinetic score that is greater than the cutoff value. Detection rates are also shown for common secondary IPD ratio peaks of 6mA (+5) and 5mC (+2, +6) and for off-target motifs with similar sequence content to the methylated motifs. Methylated bases are colored: 6mA (red), 5mC (blue). The interrogated base in the motif is underlined. Unassigned are genomic positions with kinetic scores above the cutoff which are not in a methylated motif or a secondary peak.
    Preview · Dataset · Jan 2013
  • Source
    ABSTRACT: Conversion of 5mC to 5caC in synthetic oligonucleotides. Kinetic signals for synthetic oligonucleotides carrying two 5mC modified sites (red bars) are shown before (top) and after (bottom) mTet1-mediated oxidation to 5caC. IPD ratio data are plotted for each template position relative to a control template of identical sequence but lacking modifications. The template is shown in the 5' to 3' direction from left to right, the polymerase movement is right to left across the template as indicated by the arrow.
    Preview · Dataset · Jan 2013
  • Source
    ABSTRACT: IPD ratio distributions of all methylated motifs in E. coli MG1655 and B. halodurans C-125. Each plot shows the histograms of IPD ratio values for each methylated motif and an off-target non-methylated motif. The top plots are from native samples and the bottom plots show the same data after Tet1-mediated conversion of 5mC to 5caC.
    Preview · Dataset · Jan 2013
  • Source
    ABSTRACT: Background: DNA methylation serves as an important epigenetic mark in both eukaryotic and prokaryotic organisms. In eukaryotes, the most common epigenetic mark is 5-methylcytosine, whereas prokaryotes can have 6-methyladenine, 4-methylcytosine, or 5-methylcytosine. Single-molecule, real-time sequencing is capable of directly detecting all three types of modified bases. However, the kinetic signature of 5-methylcytosine is subtle, which presents a challenge for detection. We investigated whether conversion of 5-methylcytosine to 5-carboxylcytosine using the enzyme Tet1 would enhance the kinetic signature, thereby improving detection. Results: We characterized the kinetic signatures of various cytosine modifications, demonstrating that 5-carboxylcytosine has a larger impact on the local polymerase rate than 5-methylcytosine. Using Tet1-mediated conversion, we show improved detection of 5-methylcytosine using in vitro methylated templates and apply the method to the characterization of 5-methylcytosine sites in the genomes of Escherichia coli MG1655 and Bacillus halodurans C-125. Conclusions: We have developed a method for the enhancement of directly detecting 5-methylcytosine during single-molecule, real-time sequencing. Using Tet1 to convert 5-methylcytosine to 5-carboxylcytosine improves the detection rate of this important epigenetic marker, thereby complementing the set of readily detectable microbial base modifications, and enhancing the ability to interrogate eukaryotic epigenetic markers.
    Full-text · Article · Jan 2013 · BMC Biology

Publication Stats

1k Citations
263.54 Total Impact Points

Institutions

  • 2012-2014
    • Pacific Biosciences of California, Inc.
      Menlo Park, California, United States
    • Northern Arizona University
      Flagstaff, Arizona, United States
  • 2013
    • U.S. Food and Drug Administration
      • Center for Food Safety and Applied Nutrition
      Washington, D.C., United States