Identifying bacterial genes and endosymbiont DNA with GLIMMER

Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, Maryland, United States
Bioinformatics (Impact Factor: 4.98). 04/2007; 23(6):673-9. DOI: 10.1093/bioinformatics/btm009
Source: PubMed


The Glimmer gene-finding software has been successfully used for finding genes in bacteria, archaea and viruses representing hundreds of species. We describe several major changes to the Glimmer system, including improved methods for identifying both coding regions and start codons. We also describe a new module of Glimmer that can distinguish host and endosymbiont DNA. This module was developed in response to the discovery that eukaryotic genome sequencing projects sometimes inadvertently capture the DNA of intracellular bacteria living in the host.
The new methods dramatically reduce the rate of false-positive predictions, while maintaining Glimmer's 99% sensitivity rate at detecting genes in most species, and they find substantially more correct start sites, as measured by comparisons to known and well-curated genes. We show that our interpolated Markov model (IMM) DNA discriminator correctly separated 99% of the sequences in a recent genome project that produced a mixture of sequences from the bacterium Prochloron didemni and its sea squirt host, Lissoclinum patella.
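The accuracy figures above rest on comparing predicted genes with a curated reference set. The following is a minimal sketch of such a comparison, not the authors' evaluation code. It assumes a reference gene counts as found when a prediction shares its strand and 3' (stop) coordinate, and that a start site counts as correct when the 5' coordinate also matches. The `Gene` type, the `evaluate` function, and the coordinates are hypothetical and purely illustrative.

```python
from typing import NamedTuple, Sequence, Tuple


class Gene(NamedTuple):
    strand: str  # "+" or "-"
    start: int   # 5' coordinate (start codon position)
    stop: int    # 3' coordinate (stop codon position)


def evaluate(predicted: Sequence[Gene],
             reference: Sequence[Gene]) -> Tuple[float, float, int]:
    """Return (sensitivity, start_accuracy, extra_predictions).

    Assumption: a reference gene is 'found' when some prediction has the
    same strand and 3' end; its start is 'correct' when the 5' end of
    that prediction also matches.
    """
    pred_by_end = {(g.strand, g.stop): g for g in predicted}
    ref_ends = {(g.strand, g.stop) for g in reference}

    found = [r for r in reference if (r.strand, r.stop) in pred_by_end]
    correct_starts = sum(
        1 for r in found if pred_by_end[(r.strand, r.stop)].start == r.start
    )

    sensitivity = len(found) / len(reference) if reference else 0.0
    start_accuracy = correct_starts / len(found) if found else 0.0
    # Predictions with no reference counterpart. These are only candidate
    # false positives, because a reference set may itself be incomplete.
    extra = sum(1 for g in predicted if (g.strand, g.stop) not in ref_ends)
    return sensitivity, start_accuracy, extra


# Illustrative coordinates only, not data from the paper.
reference = [Gene("+", 100, 400), Gene("-", 900, 500), Gene("+", 1000, 1300)]
predicted = [Gene("+", 100, 400), Gene("-", 870, 500), Gene("+", 2000, 2300)]
sensitivity, start_accuracy, extra = evaluate(predicted, reference)
```

In this toy case two of the three reference genes are found, one of those two has the correct start, and one prediction has no reference counterpart.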
Glimmer is OSI Certified Open Source and available at
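The idea behind the DNA discriminator described in the abstract can be illustrated with a much simpler stand-in: train one Markov model per organism and assign each sequence to the model that gives it the higher log-likelihood. The sketch below uses a fixed-order Markov chain with pseudocounts, which is a simplification and not the interpolated Markov model (IMM) that Glimmer uses. All function names, the model order, and the toy training sequences are assumptions made for illustration.

```python
import math
from collections import defaultdict
from itertools import product

ALPHABET = "ACGT"


def train_markov(seqs, order=3, pseudocount=1.0):
    """Estimate log P(base | previous `order` bases) from training sequences."""
    counts = defaultdict(lambda: defaultdict(float))
    for seq in seqs:
        seq = seq.upper()
        for i in range(order, len(seq)):
            ctx, base = seq[i - order:i], seq[i]
            # Skip windows containing ambiguous bases such as N.
            if base in ALPHABET and all(c in ALPHABET for c in ctx):
                counts[ctx][base] += 1
    model = {}
    for ctx in map("".join, product(ALPHABET, repeat=order)):
        total = sum(counts[ctx].values()) + 4 * pseudocount
        model[ctx] = {
            b: math.log((counts[ctx][b] + pseudocount) / total) for b in ALPHABET
        }
    return model


def log_likelihood(seq, model, order=3):
    """Sum the log-probabilities of each base given its preceding context."""
    seq = seq.upper()
    total = 0.0
    for i in range(order, len(seq)):
        ctx, base = seq[i - order:i], seq[i]
        if ctx in model and base in model[ctx]:
            total += model[ctx][base]
    return total


def classify(seq, bacterial_model, host_model, order=3):
    """Label a sequence by the sign of its log-likelihood ratio."""
    score = (log_likelihood(seq, bacterial_model, order)
             - log_likelihood(seq, host_model, order))
    return ("bacterium" if score > 0 else "host"), score


# Toy training data with exaggerated composition bias (not real genomes).
bacterial_model = train_markov(["GCGCGGCCGCGCGGCC" * 20])
host_model = train_markov(["ATATTAATATTATAAT" * 20])

label_gc, score_gc = classify("GCGGCCGCGCGG", bacterial_model, host_model)
label_at, score_at = classify("TATTAATATTATA", bacterial_model, host_model)
```

A real discriminator would be trained on sequences of known origin from each organism. An IMM additionally blends predictions from several context lengths, which this fixed-order sketch does not attempt.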

Available from:
  • Source
    • "The complete ACN001 genome sequence was analysed using Glimmer 3.0 [10,11] and GeneMark [12,13] for gene prediction, the tRNAscan-SE tool for tRNA identification [14], and RNAmmer [15] for ribosomal RNA identification. The predicted protein-coding genes were translated into amino acid sequences and annotated using the NCBI and UniProt non-redundant sequence databases [16], the Kyoto Encyclopedia of Genes and Genomes database [17], and, subsequently, the Cluster of Orthologous Genes database [18] to identify the specific protein products and their functional categories."
    ABSTRACT: Avian pathogenic Escherichia coli (APEC) is an important etiological agent of avian colibacillosis, which manifests as respiratory, hematogenous, meningitic, and enteric infections in poultry. It is also a potential zoonotic threat to human health. The diverse genomes of APEC strains largely hinder disease prevention and control measures. In the current study, pyrosequencing was used to analyze and characterize APEC strain ACN001 (= CCTCC 2015182T = DSMZ 29979T), which was isolated from the liver of a diseased chicken in China in 2010. Strain ACN001 belongs to extraintestinal pathogenic E. coli phylogenetic group B1, and was highly virulent in chicken and mouse models. Whole genome analysis showed that it consists of six different plasmids along with a circular chromosome of 4,936,576 bp, comprising 4,794 protein-coding genes, 108 RNA genes, and 51 pseudogenes, with an average G + C content of 50.56 %. As well as 237 coding sequences, we identified 39 insertion sequences, 12 predicted genomic islands, 8 prophage-related sequences, and 2 clustered regularly interspaced short palindromic repeats regions on the chromosome, suggesting the possible occurrence of horizontal gene transfer in this strain. In addition, most of the virulence and antibiotic resistance genes were located on the plasmids, which would assist in the distribution of pathogenicity and multidrug resistance elements among E. coli populations. Together, the information provided here on APEC isolate ACN001 will assist in future study of APEC strains, and aid in the development of control measures.
    Full-text · Article · Dec 2016 · Standards in Genomic Sciences
  • Source
    • "Predicted genes were identified using Glimmer version 3.0 [14]. tRNAscan-SE version 1.21 [15] was used to find tRNA genes, whereas ribosomal RNAs were found by using RNAmmer version 1.2 [16]."
    ABSTRACT: Paenibacillus sp. strain A2 is a Gram-negative rod-shaped bacterium isolated from a mixture of formation water and petroleum in Daqing oilfield, China. This facultative aerobic bacterium was found to have a broad capacity for metabolizing hydrocarbon and organosulfur compounds, which is the main reason for the interest in sequencing its genome. Here we describe the features of Paenibacillus sp. strain A2, together with the genome sequence and its annotation. The 7,650,246 bp long genome (1 chromosome but no plasmid) exhibits a G+C content of 54.2 % and contains 7575 protein-coding and 49 RNA genes, including 3 rRNA genes. One putative alkane monooxygenase, one putative alkanesulfonate monooxygenase, one putative alkanesulfonate transporter and four putative sulfate transporters were found in the draft genome.
    Preview · Article · Dec 2016 · Standards in Genomic Sciences
  • Source
    • "The high-quality reads, which provided an approximately 285-fold depth of coverage, were assembled with Velvet Version 1.2.10 [5]. Protein-coding sequences were predicted by Glimmer software version 3.0 [6], while ribosomal RNA (rRNA) and transfer RNA (tRNA) genes were predicted using an RNAmmer 1.2 server [7] and tRNAscan-SE Search Server version 1.21 [8], respectively. Tandem repeats were predicted using Tandem Repeats Finder Version 4.04."
    ABSTRACT: Acidithiobacillus ferrooxidans YQH-1 is a moderately acidophilic bacterium isolated from a river in a volcano of Northeast China. Here, we describe the draft genome of strain YQH-1, which was assembled into 123 contigs containing 3,111,222 bp with a G+C content of 58.63%. A large number of genes related to carbon dioxide fixation, dinitrogen fixation, pH tolerance, heavy metal detoxification, and oxidative stress defense were detected. The genome sequence can be accessed at DDBJ/EMBL/GenBank under the accession no. LJBT00000000.
    Full-text · Article · Dec 2015 · Genomics Data