Properties of overlapping genes are conserved across microbial genomes

Department of Civil and Environmental Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA.
Genome Research (Impact Factor: 14.63). 12/2004; 14(11):2268-72. DOI: 10.1101/gr.2433104
Source: PubMed


There are numerous examples from the genomes of viruses, mitochondria, and chromosomes that adjacent genes can overlap, sharing at least one nucleotide. Overlaps have been hypothesized to be involved in genome size minimization and as a regulatory mechanism of gene expression. Here we show that overlapping genes are a consistent feature (approximately one-third of all genes) across all microbial genomes sequenced to date, have homologs in more microbes than do non-overlapping genes, and are therefore likely more conserved. In addition, the size, phase (reading frame offset), and distribution, among other characteristics, of overlapping genes are most consistent with the hypothesis that overlaps function in the regulation of gene expression. The upstream sequences and conservation of overlapping orthologs of two model organisms from the genus Prochlorococcus that have significantly different GC-content, and therefore different nucleotide sequences for orthologs, are also consistent with small overlapping sequence regions and programmed shifts in reading frame as a common mechanism in the regulation of microbial gene expression.
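The definitions used in the abstract (two adjacent genes overlap when they share at least one nucleotide, and the phase of an overlap is the reading-frame offset between them) can be illustrated with a minimal Python sketch. The `Gene` record, the 1-based inclusive coordinates, the function names, and the example coordinates are all hypothetical and chosen only for illustration; in particular, computing phase as the offset between translation start positions modulo 3 is an assumption, not necessarily the authors' exact procedure.

```python
from typing import List, NamedTuple, Optional, Tuple


class Gene(NamedTuple):
    """Hypothetical gene record; coordinates are 1-based and inclusive."""
    name: str
    start: int   # leftmost coordinate, regardless of strand
    end: int     # rightmost coordinate, regardless of strand
    strand: str  # "+" or "-"


def overlap_length(a: Gene, b: Gene) -> int:
    """Number of nucleotides shared by two genes (0 if they do not overlap)."""
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def frame_offset(a: Gene, b: Gene) -> Optional[int]:
    """Reading-frame offset (0, 1, or 2) between two same-strand genes.

    Assumption: phase is the distance between translation start positions
    modulo 3. Genes on opposite strands return None, because this simple
    definition does not apply to them.
    """
    if a.strand != b.strand:
        return None
    if a.strand == "+":
        return (b.start - a.start) % 3
    # On the minus strand, translation begins at the rightmost coordinate.
    return (b.end - a.end) % 3


def adjacent_overlaps(genes: List[Gene]) -> List[Tuple[str, str, int, Optional[int]]]:
    """List (name_a, name_b, overlap_length, phase) for overlapping neighbours."""
    ordered = sorted(genes, key=lambda g: g.start)
    result = []
    for a, b in zip(ordered, ordered[1:]):
        shared = overlap_length(a, b)
        if shared > 0:
            result.append((a.name, b.name, shared, frame_offset(a, b)))
    return result


# Made-up coordinates: geneA and geneB share 4 nucleotides; geneC is separate.
example = [
    Gene("geneA", 1, 300, "+"),
    Gene("geneB", 297, 600, "+"),
    Gene("geneC", 700, 900, "-"),
]
pairs = adjacent_overlaps(example)
print(pairs)
```

Applied to a full genome annotation, the fraction of genes that appear in at least one returned pair would correspond to the "approximately one-third of all genes" statistic described above, under the same assumptions.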

  • Source
    • "Previously reported work mainly falls into two categories: (i) statistical analyses of large numbers of OGs considered together and often compared with the observed frequencies of START and STOP codons along the different reading frames surrounding the genes [4, 5, 13, 14]; and (ii) “anecdotal” analyses of specific OGPs, usually related to a specific species or biological system [6, 15–17], together with an analysis of the extent of the overlap through evolution. Only a few have looked at the phylogenetic profiles of large number of OGPs [8, 18]. "
    ABSTRACT: Background: The forces underlying genome architecture and organization are still only poorly understood in detail. Overlapping genes (genes partially or entirely overlapping) represent a genomic feature that is shared widely across biological organisms ranging from viruses to multi-cellular organisms. In bacteria, a third of the annotated genes are involved in an overlap. Despite the widespread nature of this arrangement, its evolutionary origins and biological ramifications have so far eluded explanation. Results: Here we present a comparative approach using information from 699 bacterial genomes that sheds light on the evolutionary dynamics of overlapping genes. We show that these structures exhibit high levels of plasticity. Conclusions: We propose a simple model allowing us to explain the observed properties of overlapping genes based on the importance of initiation and termination of transcriptional and translational processes. We believe that taking into account the processes leading to the expression of protein-coding genes holds the key to the understanding of overlapping gene structures.
    Full-text · Article · Aug 2014 · BMC Genomics
  • Source
    • "This arrangement is thought to be responsible for maintaining the ∼1:35 ratio between the two proteins, a ratio likely maintained to prevent undesired cross-talk between the numerous two-component systems (Siryaporn and Goulian, 2008). Accordingly, overlapping genes are primarily found in operons and their patterns are strongly conserved between phylogenetically distant bacteria (Johnson and Chisholm, 2004). More strikingly, operons can also have internal regulatory elements (e.g. "
    ABSTRACT: The proper functioning of bacteria is encoded in their genome at multiple levels or scales, each of which is constrained by specific physical forces. At the smallest spatial scales, interatomic forces dictate the folding and function of proteins and nucleic acids. On longer length scales, stochastic forces emerging from the thermal jiggling of proteins and RNAs impose strong constraints on the organization of genes along chromosomes, more particularly in the context of the building of nucleoprotein complexes and the operational mode of regulatory agents. At the cellular level, transcription, replication and cell division activities generate forces that act on both the internal structure and cellular location of chromosomes. The overall result is a complex multi-scale organization of genomes that reflects the evolutionary tinkering of bacteria. The goal of this review is to highlight avenues for deciphering this complexity by focusing on patterns that are conserved among evolutionarily distant bacteria. To this end, I discuss three different organizational scales: the protein structures, the chromosomal organization of genes and the global structure of chromosomes.
    Full-text · Article · Aug 2014 · Computational Biology and Chemistry
  • Source
    • "Interestingly, the 5′ coding region of the psbC gene seemed to be overlapped with the 3′ coding region of psbD on the same strand in ArM0029B. This phenomenon of two genes overlapping occurs frequently in the genomes of viruses, prokaryotes, mitochondria, and eukaryotes, including humans [23-25]. The overlap of psbD and psbC seemed to exist in all of the Trebouxiophyceae sequenced except for Helicosporidium sp., which lacks psbD and psbC in the plastid genome. "
    ABSTRACT: Chlorella is the representative taxon of Chlorellales in Trebouxiophyceae, and its chloroplast (cp) genomic information has been thought to depend only on studies concerning Chlorella vulgaris and GenBank information of C. variabilis. Mitochondrial (mt) genomic information regarding Chlorella is currently unavailable. To elucidate the evolution of organelle genomes and genetic information of Chlorella, we have sequenced and characterized the cp and mt genomes of Arctic Chlorella sp. ArM0029B. The 119,989-bp cp genome lacking inverted repeats and 65,049-bp mt genome were sequenced. The ArM0029B cp genome contains 114 conserved genes, including 32 tRNA genes, 3 rRNA genes, and 79 genes encoding proteins. Chlorella cp genomes are highly rearranged except for a Chlorella-specific six-gene cluster, and the ArM0029B plastid resembles that of Chlorella variabilis except for a 15-kb gene cluster inversion. In the mt genome, 62 conserved genes, including 27 tRNA genes, 3 rRNA genes, and 32 genes encoding proteins, were determined. The mt genome of ArM0029B is similar to that of the non-photosynthetic species Prototheca and Helicosporidium. The ArM0029B mt genome contains a group I intron, with an ORF containing two LAGLIDADG motifs, in cox1. The intronic ORF is shared by C. vulgaris and Prototheca. The phylogeny of the plastid genome reveals that ArM0029B showed a close relationship of Chlorella to Parachlorella and Oocystis within Chlorellales. The distribution of the cox1 intron at 721 supports membership in the order Chlorellales. Mitochondrial phylogenomic analyses, however, indicated that ArM0029B shows a greater affinity to MX-AZ01 and Coccomyxa than to the Helicosporidium-Prototheca clade, although the detailed phylogenetic relationships among the three taxa remain to be resolved. The plastid genome of ArM0029B is similar to that of C. variabilis. The mt sequence of ArM0029B is the first genome to be reported for Chlorella. Chloroplast genome phylogeny supports monophyly of the seven investigated members of Chlorellales. The presence of the cox1 intron at 721 in all four investigated Chlorellales taxa indicates that the cox1 intron had been introduced in early Chlorellales as a cis-splice form and that the cis-splicing intron was inherited by recent Chlorellales and was recently trans-spliced in Helicosporidium.
    Full-text · Article · Apr 2014 · BMC Genomics