Identifying bacterial genes and endosymbiont DNA with GLIMMER

Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, Maryland, United States
Bioinformatics, 23(6):673-679, April 2007. DOI: 10.1093/bioinformatics/btm009


The Glimmer gene-finding software has been successfully used for finding genes in bacteria, archaea and viruses representing hundreds of species. We describe several major changes to the Glimmer system, including improved methods for identifying both coding regions and start codons. We also describe a new module of Glimmer that can distinguish host and endosymbiont DNA. This module was developed in response to the discovery that eukaryotic genome sequencing projects sometimes inadvertently capture the DNA of intracellular bacteria living in the host.
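ORF-based gene finders such as Glimmer work from open reading frames (ORFs). They decide which ORFs are genes and which in-frame start codon each gene uses. The sketch below is a simplified illustration, not Glimmer's code. It only enumerates forward-strand ORFs and their candidate start codons. The function name `forward_orfs` and the 90-bp default minimum length are assumptions. The reverse strand and all of the scoring Glimmer uses to choose coding regions and start sites are left out.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}
# ATG, GTG and TTG are the usual bacterial start codons.
START_CODONS = {"ATG", "GTG", "TTG"}


def forward_orfs(seq: str, min_len: int = 90) -> List[Tuple[int, List[int], int]]:
    """Scan the three forward reading frames of seq (hypothetical helper).

    Returns (frame, candidate_starts, stop_end) for each open reading frame
    whose longest form (first in-frame start through the stop codon) is at
    least min_len bases. candidate_starts holds the 0-based positions of
    every in-frame start codon before the stop, in order; choosing among
    them is the start-site decision a real gene finder must score.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        starts: List[int] = []
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon in START_CODONS:
                starts.append(i)
            elif codon in STOP_CODONS:
                stop_end = i + 3  # end coordinate includes the stop codon
                if starts and stop_end - starts[0] >= min_len:
                    orfs.append((frame, starts, stop_end))
                starts = []  # a stop codon closes the current ORF
    return orfs
```

For example, `forward_orfs("ATGGTGTAA", min_len=6)` reports one ORF in frame 0 with two candidate starts (positions 0 and 3) ending at position 9. A gene finder would then have to pick which start is real.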
The new methods dramatically reduce the rate of false-positive predictions while maintaining Glimmer's 99% sensitivity at detecting genes in most species. They also find substantially more correct start sites, as measured by comparisons with known, well-curated genes. We show that our interpolated Markov model (IMM) DNA discriminator correctly separated 99% of the sequences from a recent genome project that produced a mixture of sequences from the bacterium Prochloron didemni and its sea squirt host, Lissoclinum patella.
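The abstract does not describe how the IMM discriminator works internally. A rough sketch follows. It assumes one model is trained on known host DNA and another on known endosymbiont DNA, and it assigns each read to whichever model gives the higher log-likelihood. Several details are assumptions made for illustration and are not Glimmer's actual interface or weighting scheme: the class and function names, the maximum order, and the count-based interpolation weight `n / (n + smoothing)`.

```python
import math
from collections import defaultdict
from typing import Dict, List

ALPHABET = "ACGT"


class SimpleIMM:
    """Simplified interpolated Markov model over DNA (illustrative sketch).

    P_k(b | c) = lam * ML_k(b | c) + (1 - lam) * P_{k-1}(b | shorter c),
    with lam = n(c) / (n(c) + smoothing). This weighting is an assumption,
    not the weighting Glimmer uses.
    """

    def __init__(self, max_order: int = 5, smoothing: float = 10.0):
        self.max_order = max_order
        self.smoothing = smoothing
        # counts[context][base] for every context length 0..max_order
        self.counts: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))

    def train(self, seqs: List[str]) -> None:
        for seq in seqs:
            seq = seq.upper()
            for i, base in enumerate(seq):
                if base not in ALPHABET:
                    continue
                for k in range(self.max_order + 1):
                    if i - k < 0:
                        break
                    ctx = seq[i - k:i]
                    if any(ch not in ALPHABET for ch in ctx):
                        break  # longer contexts would also contain the N
                    self.counts[ctx][base] += 1

    def prob(self, ctx: str, base: str) -> float:
        # Order 0 with add-one pseudocounts, so every probability is > 0.
        zero = self.counts.get("", {})
        p = (zero.get(base, 0) + 1) / (sum(zero.values()) + len(ALPHABET))
        # Interpolate upward through longer and longer context suffixes.
        for k in range(1, min(self.max_order, len(ctx)) + 1):
            table = self.counts.get(ctx[len(ctx) - k:])
            if not table:
                break  # unseen context: no longer context was seen either
            n = sum(table.values())
            lam = n / (n + self.smoothing)
            p = lam * table.get(base, 0) / n + (1 - lam) * p
        return p

    def log_likelihood(self, seq: str) -> float:
        seq = seq.upper()
        total = 0.0
        for i, base in enumerate(seq):
            if base in ALPHABET:
                ctx = seq[max(0, i - self.max_order):i]
                total += math.log(self.prob(ctx, base))
        return total


def classify(seq: str, host: SimpleIMM, symbiont: SimpleIMM) -> str:
    """Label a read by the sign of the host-vs-symbiont log-likelihood ratio."""
    score = host.log_likelihood(seq) - symbiont.log_likelihood(seq)
    return "host" if score > 0 else "symbiont"
```

As a toy example, suppose one model is trained on `"AT" * 200` and the other on `"GC" * 200`. The sketch then labels `"ATATATAT"` as host and `"GCGCGCGC"` as symbiont. Because each interpolation step mixes in a strictly positive lower-order estimate, no base ever receives probability zero, so the logarithm is always defined.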
Glimmer is OSI Certified Open Source and available at
