PanOCT: automated clustering of orthologs using conserved gene neighborhood for pan-genomic analysis of bacterial strains and closely related species

J. Craig Venter Institute, 9704 Medical Center Drive, Rockville, MD 20850, USA.
Nucleic Acids Research (Impact Factor: 8.81). 08/2012; 40(22). DOI: 10.1093/nar/gks757
Source: PubMed

ABSTRACT The pan-genome ortholog clustering tool (PanOCT) is designed for pan-genomic analysis of closely related prokaryotic species or strains. PanOCT uses conserved gene neighborhood information to separate recently diverged paralogs into orthologous clusters, which homology-only clustering methods cannot do. The results from PanOCT and three commonly used graph-based ortholog-finding programs were compared using four publicly available strains of the same bacterial species. All four methods agreed on ∼70% of the clusters and ∼86% of the proteins. The clusters on which the methods disagreed were inspected for evidence of correctness, yielding 85 high-confidence, manually curated clusters that were then used to compare all four methods.
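
The abstract does not say how gene neighborhood is scored, so the following is only a minimal Python sketch of the general idea under stated assumptions, not PanOCT's actual scoring function. It assumes each genome is a single ordered list of gene IDs and that `homologs` (a hypothetical input) maps each gene to the set of genes it matches by sequence similarity, for example from an all-versus-all BLAST search. Among several sequence-similar candidates (recent paralogs), the candidate whose flanking genes are most conserved is chosen as the ortholog. The function names and the `window` parameter are illustrative.

```python
from typing import Dict, List, Set

# Hypothetical inputs (not PanOCT's file formats):
# - a genome is an ordered list of gene IDs (one contig, chromosomal order);
# - homologs maps a gene ID to the set of gene IDs it matches by sequence.

def neighbors(order: List[str], gene: str, window: int) -> List[str]:
    """Return up to `window` genes on each side of `gene`, excluding it."""
    i = order.index(gene)
    return order[max(0, i - window):i] + order[i + 1:i + 1 + window]

def neighborhood_score(gene_a: str, order_a: List[str],
                       gene_b: str, order_b: List[str],
                       homologs: Dict[str, Set[str]], window: int = 5) -> int:
    """Count neighbors of gene_a that have a homolog among the neighbors of gene_b."""
    nb_b = set(neighbors(order_b, gene_b, window))
    return sum(1 for g in neighbors(order_a, gene_a, window)
               if homologs.get(g, set()) & nb_b)

def best_ortholog(gene_a: str, order_a: List[str], candidates: List[str],
                  order_b: List[str], homologs: Dict[str, Set[str]],
                  window: int = 5) -> str:
    """Pick the candidate with the most conserved neighborhood (first wins ties)."""
    return max(candidates,
               key=lambda c: neighborhood_score(gene_a, order_a, c,
                                                order_b, homologs, window))

if __name__ == "__main__":
    # Toy data: gene "p" in genome A matches two paralogs, "pX" and "pY", in B.
    order_a = ["a1", "a2", "p", "a3", "a4"]
    order_b = ["b1", "b2", "pX", "b3", "b4", "x1", "pY", "x2"]
    homologs = {"a1": {"b1"}, "a2": {"b2"}, "a3": {"b3"}, "a4": {"b4"},
                "p": {"pX", "pY"}}
    print(best_ortholog("p", order_a, ["pX", "pY"], order_b, homologs, window=2))
```

In the toy data, `pX` shares all four flanking homologs with `p` while `pY` shares only one, so `pX` is chosen even though sequence similarity alone cannot separate the two paralogs. In practice such a neighborhood signal would presumably be combined with sequence similarity scores rather than used on its own.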

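The abstract reports that all four methods agreed on ∼70% of clusters and ∼86% of proteins but does not define these figures here. The sketch below shows one plausible way to compute such agreement, assuming a cluster counts as agreed only if every method produced exactly the same protein set, and using the union of all distinct clusters and of all clustered proteins as denominators. These definitions are assumptions and may differ from the paper's.

```python
from typing import FrozenSet, List, Set, Tuple

# A clustering is a set of clusters; each cluster is a frozenset of protein IDs.
Clustering = Set[FrozenSet[str]]

def agreement(clusterings: List[Clustering]) -> Tuple[float, float]:
    """Return (cluster agreement, protein agreement) across all methods.

    Assumed definitions (not necessarily the paper's):
    - a cluster is agreed only if every method produced exactly that protein set;
    - cluster agreement = agreed clusters / all distinct clusters from any method;
    - protein agreement = proteins in agreed clusters / all clustered proteins.
    """
    agreed = set.intersection(*clusterings)   # clusters identical in every method
    distinct = set.union(*clusterings)        # every cluster produced by any method
    all_proteins = set().union(*distinct)     # every protein clustered by any method
    agreed_proteins = set().union(*agreed)    # proteins inside agreed clusters
    return len(agreed) / len(distinct), len(agreed_proteins) / len(all_proteins)

if __name__ == "__main__":
    m1 = {frozenset({"a1", "b1"}), frozenset({"a2", "b2"}),
          frozenset({"a3"}), frozenset({"b3"})}
    m2 = {frozenset({"a1", "b1"}), frozenset({"a2", "b2"}),
          frozenset({"a3", "b3"})}
    print(agreement([m1, m2]))
```

In this toy comparison the two methods handle `a3` and `b3` differently, so 2 of the 5 distinct clusters (0.4) and 4 of the 6 proteins lie in agreed clusters.
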
  • ABSTRACT: Acinetobacter baumannii is an important nosocomial pathogen that poses a serious health threat to immunocompromised patients. Because of its ability to rapidly develop multidrug resistance (MDR), A. baumannii has increasingly become a focus of attention worldwide. To better understand the genetic variation and antibiotic resistance mechanisms of this bacterium at the genomic level, we report high-quality draft genome sequences of 8 clinical isolates with various sequence types and drug susceptibility profiles. We sequenced 7 MDR and 1 drug-sensitive clinical A. baumannii isolates and performed a comparative genomic analysis of these draft genomes together with 16 complete A. baumannii genomes from GenBank. We found a high degree of variation in A. baumannii, including single nucleotide polymorphisms (SNPs) and large DNA fragment variations in the AbaR-like resistance island (RI) regions, the prophage and the type VI secretion system (T6SS). In addition, we found several new AbaR-like RI regions with highly variable structures in our MDR strains. Interestingly, we found a novel genomic island (designated GIBJ4) in the drug-sensitive strain BJ4 that carries metal resistance genes instead of antibiotic resistance genes and is inserted at the position where AbaR-like RIs commonly reside in other A. baumannii strains. Furthermore, we showed that diverse antibiotic resistance determinants are present outside the RIs in A. baumannii, including integrons bearing antibiotic resistance genes, the blaOXA-23-containing transposon Tn2009, and intrinsic chromosomal antibiotic resistance genes. Our comparative genomic analysis revealed extensive genomic variation in A. baumannii. Transposons, genomic islands and point mutations are the main contributors to the plasticity of the A. baumannii genome and play critical roles in the development of antibiotic resistance in clinical isolates.
  • ABSTRACT: Typhoid fever poses a significant burden on healthcare systems in Southeast Asia and other endemic regions. Several epidemiological and genomic studies have identified pseudogenisation as the major driving force in the evolution of Salmonella Typhi, although its real potential remains elusive. In the present study, we analyzed genomes of S. Typhi from different parts of Southeast Asia and Oceania, comprising isolates from outbreak, sporadic and carrier cases. The genomes showed high genetic relatedness, with limited opportunity for gene acquisition, as evident from the pan-genome structure. Given that pseudogenisation is an active process in S. Typhi, we further investigated the core and pan-genome profiles of functional genes and pseudogenes separately. We observed a decline in core functional gene content and a significant increase in accessory pseudogene content. Upon functional classification, genes encoding metabolic functions formed a major constituent of the pseudogenes as well as of the core functional gene clusters with SNPs. Further, an in-depth analysis of accessory pseudogene content revealed heterogeneous complements of functional genes and pseudogenes among the strains. These polymorphic genes were also enriched in metabolism-related functions. Thus, the study highlights the existence of heterogeneous strains with varying metabolic potential within a population and suggests that S. Typhi may resort to metabolic fine-tuning for its adaptation.
    Scientific Reports 12/2014; 4:7457. DOI:10.1038/srep07457 · 5.08 Impact Factor
  • ABSTRACT: Bioinformatics looks to many microbiologists like a service industry. In this view, annotation starts with what is known from experiments in the lab, makes reasonable inferences about which genes match other genes in function, and builds databases to make all that we know accessible, but creates nothing truly new. Experiments lead; biocuration and computational biology follow. But the astounding success of genome sequencing is changing the annotation paradigm. Every genome sequenced is an intercepted coded message from the microbial world, and as all cryptographers know, it is easier to decode a thousand messages than a single message. Some biology is best discovered not by phenomenology, but by decoding genome content, forming hypotheses, and performing the first few rounds of validation computationally. Through such reasoning, a role and function may be assigned to a protein with no sequence similarity to any protein yet studied. Experimentation can follow the discovery to cement and extend the findings. Unfortunately, this approach remains so unfamiliar to most bench scientists that lab work and comparative genomics are typically segregated into different teams working on unconnected projects. This review discusses several themes in comparative genomics as a discovery method, including highly derived data, the use of patterns of design to reason by analogy, and in silico testing of computationally generated hypotheses. Copyright © 2014 Elsevier Ltd. All rights reserved.
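
The S. Typhi abstract above contrasts core and accessory content for functional genes and pseudogenes. As a minimal sketch, assuming gene clusters have already been built (for example by an ortholog clustering tool such as PanOCT) and are supplied as a hypothetical presence map from cluster ID to the genomes carrying it, the partition can be computed separately for functional-gene clusters and pseudogene clusters. The strict "present in every genome" core definition is an assumption; studies often relax it with a threshold, exposed here as the illustrative `core_fraction` parameter.

```python
from typing import Dict, Set, Tuple

def core_accessory(presence: Dict[str, Set[str]], n_genomes: int,
                   core_fraction: float = 1.0) -> Tuple[Set[str], Set[str]]:
    """Split gene clusters into core and accessory sets.

    presence maps a cluster ID to the set of genome IDs that carry it.
    A cluster is core if it occurs in at least core_fraction * n_genomes
    genomes (1.0 = strict core, present in every genome); the rest are accessory.
    """
    core: Set[str] = set()
    accessory: Set[str] = set()
    for cluster_id, genomes in presence.items():
        if len(genomes) >= core_fraction * n_genomes:
            core.add(cluster_id)
        else:
            accessory.add(cluster_id)
    return core, accessory

if __name__ == "__main__":
    # Hypothetical clusters over three genomes, kept apart by gene status.
    functional = {"f1": {"g1", "g2", "g3"}, "f2": {"g1", "g2"}}
    pseudo = {"p1": {"g1"}, "p2": {"g1", "g2", "g3"}}
    print(core_accessory(functional, 3))
    print(core_accessory(pseudo, 3))
```

Running the partition once per gene status mirrors the separate functional and pseudogene profiles described in that abstract; tracking how the core set shrinks as genomes are added would give the kind of declining core curve it reports.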
