PanOCT: automated clustering of orthologs using conserved gene neighborhood for pan-genomic analysis of bacterial strains and closely related species

J. Craig Venter Institute, 9704 Medical Center Drive, Rockville, MD 20850, USA.
Nucleic Acids Research (Impact Factor: 8.81). 08/2012; 40(22). DOI: 10.1093/nar/gks757
Source: PubMed

ABSTRACT: Pan-genome ortholog clustering tool (PanOCT) is a tool for pan-genomic analysis of closely related prokaryotic species or strains. PanOCT uses conserved gene neighborhood information to separate recently diverged paralogs into orthologous clusters where homology-only clustering methods cannot. The results from PanOCT and three commonly used graph-based ortholog-finding programs were compared using a set of four publicly available strains of the same bacterial species. All four methods agreed on ∼70% of the clusters and ∼86% of the proteins. The clusters on which the methods disagreed were inspected for evidence of correctness, resulting in 85 high-confidence, manually curated clusters that were used to compare all four methods.
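The idea of using conserved gene neighborhood to separate paralogs can be illustrated with a small sketch. This is not PanOCT's actual algorithm or scoring scheme; it is a toy example under stated assumptions: each genome is an ordered list of (gene ID, homology family) pairs, family labels are taken as given, and the function names, the window size, and the strain data are all hypothetical. Among candidates from the same family, the sketch picks the gene whose flanking genes share the most families with the query's flanking genes.

```python
# Toy illustration only: NOT the PanOCT implementation.
# Assumption: a genome is an ordered list of (gene_id, family) tuples, where
# "family" is a precomputed homology group shared by orthologs and paralogs.

def neighborhood(genome, index, window):
    """Return family labels of genes within `window` positions of `index`."""
    lo = max(0, index - window)
    hi = min(len(genome), index + window + 1)
    return [fam for pos, (_, fam) in enumerate(genome[lo:hi], start=lo)
            if pos != index]

def neighborhood_score(genome_a, i, genome_b, j, window=2):
    """Count homology families shared by the flanking genes of two genes."""
    fams_a = set(neighborhood(genome_a, i, window))
    fams_b = set(neighborhood(genome_b, j, window))
    return len(fams_a & fams_b)

def best_ortholog(genome_a, i, genome_b, window=2):
    """Pick the same-family gene in genome_b with the most similar context."""
    query_family = genome_a[i][1]
    candidates = [j for j, (_, fam) in enumerate(genome_b)
                  if fam == query_family]
    if not candidates:
        return None
    best = max(candidates,
               key=lambda j: neighborhood_score(genome_a, i, genome_b, j, window))
    return genome_b[best][0]

# Hypothetical strains, each carrying two copies of a transposase family.
strain_a = [("a1", "F1"), ("a2", "F2"), ("a3", "TRANSPOSASE"), ("a4", "F3"),
            ("a5", "F4"), ("a6", "F5"), ("a7", "TRANSPOSASE"), ("a8", "F6"),
            ("a9", "F7")]
strain_b = [("b1", "F1"), ("b2", "F2"), ("b3", "TRANSPOSASE"), ("b4", "F3"),
            ("b5", "F4"), ("b6", "F8"), ("b7", "F5"), ("b8", "TRANSPOSASE"),
            ("b9", "F6"), ("b10", "F7")]

print(best_ortholog(strain_a, 2, strain_b))  # context of a3 matches b3
print(best_ortholog(strain_a, 6, strain_b))  # context of a7 matches b8
```

Sequence similarity alone could not tell the two transposase copies apart in this toy data, since both candidates belong to the same family; only the flanking genes distinguish them.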


Available from: Jason M Inman, Jan 24, 2014
  • ABSTRACT: We developed a user-friendly program, Genome Profiler (GeP), to refine whole-genome multilocus sequence typing analysis by addressing gene paralogy with conserved gene neighborhoods. In comparison to similar programs, GeP produced the best overall results in terms of accuracy and is thus a useful alternative for resolving relationships among bacterial isolates. Copyright © 2015, American Society for Microbiology. All Rights Reserved.
    Journal of clinical microbiology 03/2015; 53(5). DOI:10.1128/JCM.00051-15 · 4.23 Impact Factor
  • ABSTRACT: Bioinformatics looks to many microbiologists like a service industry. In this view, annotation starts with what is known from experiments in the lab, makes reasonable inferences of which genes match other genes in function, builds databases to make all that we know accessible, but creates nothing truly new. Experiments lead, then biocuration and computational biology follow. But the astounding success of genome sequencing is changing the annotation paradigm. Every genome sequenced is an intercepted coded message from the microbial world, and as all cryptographers know, it is easier to decode a thousand messages than a single message. Some biology is best discovered not by phenomenology, but by decoding genome content, forming hypotheses, and doing the first few rounds of validation computationally. Through such reasoning, a role and function may be assigned to a protein with no sequence similarity to any protein yet studied. Experimentation can follow after the discovery to cement and to extend the findings. Unfortunately, this approach remains so unfamiliar to most bench scientists that lab work and comparative genomics typically segregate to different teams working on unconnected projects. This review will discuss several themes in comparative genomics as a discovery method, including highly derived data, use of patterns of design to reason by analogy, and in silico testing of computationally generated hypotheses. Copyright © 2014 Elsevier Ltd. All rights reserved.
    Current Opinion in Microbiology 01/2015; 23C:189-196. DOI:10.1016/j.mib.2014.11.017 · 7.22 Impact Factor
  • ABSTRACT: Acinetobacter baumannii is an important nosocomial pathogen that poses a serious health threat to immunocompromised patients. Due to its ability to rapidly develop multidrug resistance (MDR), A. baumannii has increasingly become a focus of attention worldwide. To better understand the genetic variation and antibiotic resistance mechanisms of this bacterium at the genomic level, we report high-quality draft genome sequences of 8 clinical isolates with various sequence types and drug susceptibility profiles. We sequenced 7 MDR and 1 drug-sensitive clinical A. baumannii isolates and performed comparative genomic analysis of these draft genomes with 16 A. baumannii complete genomes from GenBank. We found a high degree of variation in A. baumannii, including single nucleotide polymorphisms (SNPs) and large DNA fragment variations in the AbaR-like resistance island (RI) regions, the prophage and the type VI secretion system (T6SS). In addition, we found several new AbaR-like RI regions with highly variable structures in our MDR strains. Interestingly, in the drug-sensitive strain BJ4 we found a novel genomic island (designated GIBJ4) that carries metal resistance genes instead of antibiotic resistance genes and is inserted at the position where AbaR-like RIs commonly reside in other A. baumannii strains. Furthermore, we showed that diverse antibiotic resistance determinants are present outside the RIs in A. baumannii, including integrons bearing antibiotic resistance genes, the blaOXA-23-containing transposon Tn2009, and chromosomal intrinsic antibiotic resistance genes. Our comparative genomic analysis revealed that extensive genomic variation exists in the A. baumannii genome. Transposons, genomic islands and point mutations are the main contributors to the plasticity of the A. baumannii genome and play critical roles in facilitating the development of antibiotic resistance in the clinical isolates.