Overlapping genes in the human and mouse genomes

Department of Computer Science, Virginia Tech, Blacksburg, USA.
BMC Genomics (Impact Factor: 4.04). 02/2008; 9:169. DOI: 10.1186/1471-2164-9-169
Source: PubMed

ABSTRACT Increasing evidence suggests that overlapping genes are much more common in eukaryotic genomes than previously thought. In this study we identified and characterized the overlapping genes in a set of 13,484 pairs of human-mouse orthologous genes.
About 10% of the genes under study are overlapping genes, the majority of which are different-strand overlaps. The majority of the same-strand overlaps are embedded forms, whereas most different-strand overlaps are not embedded and are in the convergent transcription orientation. Most of the same-strand overlapping gene pairs show at least a tenfold difference in length, much larger than the length difference between non-overlapping neighboring gene pairs. The length difference between the two genes in a different-strand overlapping pair is less dramatic. Over 27% of the different-strand-overlap relationships are shared between human and mouse, compared to only approximately 8% conservation for same-strand-overlap relationships. More than 96% of the same-strand and different-strand overlaps that are not shared between human and mouse have both genes located on the same chromosome in the species that does not show the overlap. We examined the causes of transition between the overlapping and non-overlapping states in the two species and found that 3' UTR change plays an important role in the transition.
Our study contributes to the understanding of the evolutionary transition between overlapping and non-overlapping genes and demonstrates the high rates of evolutionary change in the untranslated regions.
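The overlap categories named in the abstract (same-strand versus different-strand, embedded versus not embedded, convergent orientation) can be illustrated with a minimal Python sketch. This is not the authors' method: the `Gene` record, the 1-based inclusive coordinates, the function names, and the labels "partial" and "divergent" are all assumptions introduced here for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Gene:
    """Hypothetical gene record; coordinates are assumed 1-based and inclusive."""
    name: str
    chrom: str
    start: int
    end: int
    strand: str  # "+" or "-"


def classify_overlap(a: Gene, b: Gene) -> Optional[Tuple[str, str]]:
    """Return (strand class, form) for an overlapping pair, or None if no overlap."""
    # Genes on different chromosomes or with disjoint intervals do not overlap.
    if a.chrom != b.chrom or a.end < b.start or b.end < a.start:
        return None

    strand_class = "same-strand" if a.strand == b.strand else "different-strand"

    # Embedded: one gene lies entirely within the boundaries of the other.
    a_contains_b = a.start <= b.start and b.end <= a.end
    b_contains_a = b.start <= a.start and a.end <= b.end
    if a_contains_b or b_contains_a:
        return (strand_class, "embedded")

    if strand_class == "same-strand":
        return (strand_class, "partial")  # assumed label for non-embedded overlaps

    # Different strands, not embedded: the upstream-most gene's right end overlaps
    # the other gene's left end. If the left gene is on "+", both 3' ends meet
    # (convergent); otherwise both 5' ends meet (labeled "divergent" here).
    left = a if a.start < b.start else b
    return (strand_class, "convergent" if left.strand == "+" else "divergent")


def length_ratio(a: Gene, b: Gene) -> float:
    """Ratio of the longer gene's length to the shorter gene's length."""
    len_a = a.end - a.start + 1
    len_b = b.end - b.start + 1
    return max(len_a, len_b) / min(len_a, len_b)


if __name__ == "__main__":
    # Made-up coordinates, for illustration only.
    g1 = Gene("geneA", "chr1", 100, 500, "+")
    g2 = Gene("geneB", "chr1", 400, 900, "-")
    print(classify_overlap(g1, g2))
```

The `length_ratio` helper corresponds to the tenfold length comparison mentioned for same-strand pairs; a real analysis would also need to decide which transcript boundaries (including UTRs) define each gene's start and end.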

  • ABSTRACT: BACKGROUND: Although gene overlapping is a common feature of prokaryote and mitochondrial genomes, such genes have also been identified in many eukaryotes. The overlapping genes in eukaryotes are extensively rearranged even between closely related species. In this study, we investigated retention and rearrangement of positionally overlapping genes between the mosquitoes Aedes aegypti (dengue virus vector) and Anopheles gambiae (malaria vector). The overlapping gene pairs of A. aegypti were further compared with orthologs of other selected insects to conduct several hypothesis-driven investigations relating to the evolution and rearrangement of overlapping genes. RESULTS: The results show that as many as ~10% of the predicted genes of A. aegypti and A. gambiae are localized in a positionally overlapping manner. Furthermore, the study shows that differential abundance of introns and simple sequence repeats has a significant association with positional rearrangement of overlapping genes between the two species. Gene expression analysis further suggests that antisense transcripts generated from the oppositely oriented overlapping genes are differentially regulated and may have important regulatory functions in these mosquitoes. Our data further show that synonymous and non-synonymous mutations have a differential but non-significant effect on overlapping localization of orthologous genes in other insect genomes. CONCLUSION: Gene overlapping in insects may be a species-specific evolutionary process, as evident from the lack of dependence of gene overlapping on species phylogeny. Based on the results, our study suggests that overlapping genes may have played an important role in the genome evolution of insects.
    BMC Evolutionary Biology 06/2013; 13(1):124. · 3.41 Impact Factor
  • ABSTRACT: Strand-specific RNA sequencing is rapidly replacing conventional cDNA sequencing as an approach for assessing information about the transcriptome. Alongside improved laboratory protocols, the development of bioinformatical tools is steadily progressing. In the current procedure the Illumina TruSeq library preparation kit is used, along with additional reagents, to make stranded libraries in an automated fashion, which are then sequenced on the Illumina HiSeq 2000. By the use of freely available bioinformatical tools we show, through quality metrics, that the protocol is robust and reproducible. We further highlight the practicality of strand-specific libraries by comparing expression of strand-specific libraries to non-stranded libraries, by looking at known antisense transcription of pseudogenes and by identifying novel transcription. Furthermore, two ribosomal depletion kits, RiboMinus and RiboZero, are compared, and two sequence aligners, TopHat2 and STAR, are also compared.
    BMC Genomics 07/2014; 15(1):631. · 4.04 Impact Factor
  • ABSTRACT: The pace of genome sequencing is accelerating, revealing the genetic background of a growing number of organisms. However, assigning function to each nucleotide in a completed genome remains the rate-limiting step. New technologies in transcriptomics and proteomics have influenced the emergence of proteogenomics, a field at the confluence of genomics, transcriptomics, and proteomics. First generation proteogenomic toolkits employ peptide mass spectrometry to identify novel protein coding. These efforts rely heavily on existing algorithms designed for standard proteome analysis and fail to address the challenges specific to genome annotation. In this work we extend first generation proteogenomic tools to achieve greater accuracy and enable the analysis of large, complex genomes. We apply our pipeline to Zea mays, which has a genome comparable in size to human. Our pipeline begins with the comparison of mass spectra to a putative translation of the genome. Our translation includes the six-frame translation as well as a splice graph for capturing protein splice variants. We distribute the identification of mass spectra across 45 compute nodes for increased efficiency and employ a database-independent scoring method to improve sensitivity. We select novel peptides, those that match to a region of the genome that was not previously known to be protein coding, for grouping into events. Each of our eight event types describes a refinement needed to the genome annotation. We present a novel, Bayesian framework for evaluating the accuracy of each event. Our calculated event probability, or eventProb, considers the number of supporting peptides and spectra, and the quality of each supporting peptide-spectrum match. More than 80% of the maize genome is comprised of repetitive elements. To address this, our eventProb handles uniquely located peptides and shared location peptides separately. Our pipeline predicts 165 novel protein-coding genes and proposes updated models for 741 additional genes.
    Molecular & Cellular Proteomics 10/2013; · 7.25 Impact Factor

