PanOCT: Automated Clustering of Orthologs Using Conserved Gene Neighborhood for Pan-Genomic Analysis of Bacterial Strains and Closely Related Species

J. Craig Venter Institute, 9704 Medical Center Drive, Rockville, MD 20850, USA.
Nucleic Acids Research (Impact Factor: 9.11). 08/2012; 40(22). DOI: 10.1093/nar/gks757
Source: PubMed


The pan-genome ortholog clustering tool (PanOCT) is a tool for pan-genomic analysis of closely related prokaryotic species or strains. PanOCT uses conserved gene neighborhood information to separate recently diverged paralogs into orthologous clusters where homology-only clustering methods cannot. The results from PanOCT and three commonly used graph-based ortholog-finding programs were compared using a set of four publicly available strains of the same bacterial species. All four methods agreed on ∼70% of the clusters and ∼86% of the proteins. The clusters that did not agree were inspected for evidence of correctness, resulting in 85 high-confidence, manually curated clusters that were used to compare all four methods.
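
The idea of using conserved gene neighborhood to tell paralogs apart can be illustrated with a toy sketch. This is not PanOCT's actual scoring scheme: the function names, the fixed window size, the treatment of each genome as a linear ordered list of homology-family labels, and the example gene orders are all assumptions made for illustration. When a query gene has several equally homologous candidates in another genome, the candidate whose flanking genes share the most families with the query's flanking genes is preferred.

```python
from typing import List, Set


def neighborhood(genome: List[str], index: int, window: int = 2) -> Set[str]:
    """Family labels of genes within `window` positions of `index` (linear genome assumed)."""
    lo = max(0, index - window)
    hi = min(len(genome), index + window + 1)
    return {genome[i] for i in range(lo, hi) if i != index}


def neighborhood_score(genome_a: List[str], idx_a: int,
                       genome_b: List[str], idx_b: int,
                       window: int = 2) -> int:
    """Number of gene families shared by the two flanking neighborhoods."""
    return len(neighborhood(genome_a, idx_a, window)
               & neighborhood(genome_b, idx_b, window))


def pick_ortholog(genome_a: List[str], idx_a: int,
                  genome_b: List[str], candidates: List[int],
                  window: int = 2) -> int:
    """Among homologous candidates in genome_b, return the index with the best-conserved neighborhood."""
    return max(candidates,
               key=lambda j: neighborhood_score(genome_a, idx_a, genome_b, j, window))


# Hypothetical gene orders; "famX" occurs twice in each strain (two paralogs).
strain_a = ["thrA", "thrB", "famX", "thrC", "yaaA", "lacZ", "famX", "lacY", "lacA"]
strain_b = ["lacZ", "famX", "lacY", "lacA", "thrA", "thrB", "famX", "thrC"]

# The famX copy flanked by thr genes in strain_a pairs with the thr-flanked copy in strain_b.
print(pick_ortholog(strain_a, 2, strain_b, [1, 6]))  # prints 6
# The famX copy flanked by lac genes pairs with the lac-flanked copy.
print(pick_ortholog(strain_a, 6, strain_b, [1, 6]))  # prints 1
```

Sequence similarity alone cannot separate the two famX copies in this example, since both candidates belong to the same family; only the flanking genes distinguish them.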

Available from: Jason M Inman, Jan 24, 2014
  • "The currently available genomes of N4-like bacteriophage were retrieved from NCBI in Genbank format. PanOCT [33] was used to identify the genes that are conserved across the N4-like bacteriophages (40% or 70% identity cut-off, at least 70% protein length overlap). Each orthologs cluster was aligned using MUSCLE [34]."
    ABSTRACT: Bacteriophage EC1-UPM is an N4-like bacteriophage which specifically infects Escherichia coli O78:K80, an avian pathogenic strain that causes colibacillosis in poultry. The complete genome sequence of bacteriophage EC1-UPM was analysed and compared with other closely related N4-like phage groups to assess their genetic similarities and differences. Bacteriophage EC1-UPM displays a codon usage profile very similar to that of its host and does not contain any tRNA genes. Comparative genomics analysis reveals close resemblance of bacteriophage EC1-UPM to three N4-like bacteriophages, namely vB_EcoP_G7C, IME11 and KBNP21, with a total of 44 protein coding genes shared at a 70% identity threshold. The genomic region coding for the tail fiber protein was found to be unique in bacteriophage EC1-UPM. Further annotation of the tail fiber protein using HHpred, a highly sensitive homology detection tool, reveals the presence of protein structure homologous to various polysaccharide processing proteins in its C-terminus. Leveraging the availability of multiple N4-like bacteriophage genome sequences, the core genes of N4-like bacteriophages were identified and used to perform a multilocus phylogenetic analysis which enabled the construction of a phylogenetic tree with higher confidence than phylogenetic trees based on single genes. We report for the first time the complete genome sequence of an N4-like bacteriophage which is lytic against avian pathogenic Escherichia coli O78:K80. A novel tail fiber protein of 928 amino acid residues was identified in EC1-UPM which may be useful to further the understanding of phage-host specificity. Multilocus phylogenetic analysis using core genes of sequenced N4-like phages showed that the evolutionary relationship correlated well with the pattern of host specificity.
    Virology Journal 10/2013; 10(1):308. DOI:10.1186/1743-422X-10-308 · 2.18 Impact Factor
  • "An all-versus-all BLASTP was performed on the extracted protein sequences from each strain. The BLAST output were used as an input for the identification of single-copy orthologs using PanOCT (% Identity 65; E-value < 1e-10) [52]. Venerable in R was used to construct the six-way Venn diagram."
    ABSTRACT: Bacteria belonging to the genus Novosphingobium are known to be metabolically versatile and occupy different ecological niches. In the absence of genomic data and/or analysis, knowledge of the bacteria that belong to this genus is currently limited to biochemical characteristics. In this study, we analyzed the whole genome sequencing data of six bacteria in the Novosphingobium genus and provide evidence to show the presence of genes that are associated with salt tolerance, cell-cell signaling and aromatic compound biodegradation phenotypes. Additionally, we show the taxonomic relationship between the sequenced bacteria based on phylogenomic analysis, average amino acid identity (AAI) and genomic signatures. The taxonomic clustering of Novosphingobium strains is generally influenced by their isolation source. AAI and genomic signature provide strong support for the classification of Novosphingobium sp. PP1Y as Novosphingobium pentaromaticivorans PP1Y. The identification and subsequent functional annotation of the unique core genome in the marine Novosphingobium bacteria show that ectoine synthesis may be the main contributing factor in salt water adaptation. Genes coding for the synthesis and receptor of cell-cell signaling molecules of the N-acyl-homoserine lactone (AHL) class are identified. Notably, a solo luxR homolog was found in strain PP1Y that may have been recently acquired via horizontal gene transfer, as evidenced by the presence of multiple mobile elements upstream of the gene. Additionally, phylogenetic tree analysis and sequence comparison with functionally validated aromatic ring hydroxylating dioxygenases (ARDO) revealed the presence of several ARDOs (oxygenase) in Novosphingobium bacteria with the majority of them belonging to the Groups II and III of the enzyme.
    The combination of prior knowledge on the distinctive phenotypes of Novosphingobium strains and meta-analysis of their whole genomes enables the identification of several genes that are relevant in industrial applications and bioremediation. The results from such targeted but comprehensive comparative genomics analysis have the potential to contribute to the understanding of adaptation, cell-cell communication and bioremediation properties of bacteria belonging to the genus Novosphingobium.
    BMC Genomics 06/2013; 14(1):431. DOI:10.1186/1471-2164-14-431 · 3.99 Impact Factor
  • ABSTRACT: Legionella pneumophila is a bacterial pathogen present in aquatic environments that can cause a severe pneumonia called Legionnaires' disease. Soon after its recognition, it was shown that Legionella replicates inside amoeba, suggesting that bacteria replicating in environmental protozoa are able to exploit conserved signaling pathways in human phagocytic cells. Comparative, evolutionary, and functional genomics suggests that the Legionella-amoeba interaction has shaped this pathogen more than previously thought. A complex evolutionary scenario involving mobile genetic elements, type IV secretion systems, and horizontal gene transfer among Legionella, amoeba, and other organisms seems to take place. This long-lasting coevolution led to the development of very sophisticated virulence strategies and a high level of temporal and spatial fine-tuning of bacteria-host cell interactions. We will discuss current knowledge of the evolution of virulence of Legionella from a genomics perspective and propose our vision of the emergence of this human pathogen from the environment.
    Cold Spring Harbor Perspectives in Medicine 06/2013; 3(6). DOI:10.1101/cshperspect.a009993 · 9.47 Impact Factor