Frequent emergence and functional resurrection of processed pseudogenes in the human and mouse genomes.

Japan Biological Information Research Center, Japan Biological Informatics Consortium, AIST Bio-IT Research Bldg 7F, 2-42 Aomi, Tokyo, Japan.
Gene (Impact Factor: 2.08). 04/2007; 389(2):196-203. DOI: 10.1016/j.gene.2006.11.007
Source: PubMed

ABSTRACT Despite the wide distribution of processed pseudogenes in mammalian genomes, such as those of human and mouse, relatively little is known about their roles in genomic evolution. While gene duplication is recognized as one of the major driving forces in genome evolution, processed pseudogenes, which are retrotransposed copies of mRNAs, have long been regarded as junk or selfish DNA. To elucidate the quantitative and qualitative contribution of processed pseudogenes to mammalian genome evolution, we attempted to detect processed pseudogenes by extensively mapping mRNAs to both the human and mouse genomes, and then estimated the rate of their emergence. We found that the rate of pseudogene emergence was about 1-2% per gene per million years, which was as high as the rate of gene duplication in the human genome (0.9%), although the rate of pseudogene emergence decreased drastically in the hominid lineage. Furthermore, 1% of the processed pseudogenes appeared to have been reinvigorated by post-retrotransposition transcription, many of them preserving intact coding regions. Since the expression patterns of transcribed pseudogenes in various tissues differed markedly between human and mouse, their emergence might have led to species-specific evolution. Our results indicate that the generation of processed pseudogenes was not wholly futile but instead has been an indispensable resource, driving dynamic evolution of the mammalian genomes.
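To give a sense of scale for the emergence rate quoted in the abstract above, the following minimal sketch converts a per-gene rate into an expected number of new processed pseudogenes per genome. It is only illustrative arithmetic, not the authors' estimation method: the function name `expected_new_copies` and the gene count of 20,000 are assumptions introduced here and do not come from the paper.

```python
def expected_new_copies(rate_per_gene_per_myr: float,
                        n_genes: int,
                        myr: float) -> float:
    """Expected number of new copies under a constant per-gene rate.

    rate_per_gene_per_myr: fraction per gene per million years (0.01 = 1%).
    n_genes: number of source genes (hypothetical value, not from the paper).
    myr: elapsed time in millions of years.
    """
    return rate_per_gene_per_myr * n_genes * myr


# Assumed gene count, chosen for illustration only.
N_GENES = 20_000

# The abstract reports roughly 1-2% per gene per million years.
low = expected_new_copies(0.01, N_GENES, 1.0)
high = expected_new_copies(0.02, N_GENES, 1.0)

# The abstract's gene duplication rate (0.9%), under the same assumed gene count.
duplication = expected_new_copies(0.009, N_GENES, 1.0)

print(f"processed pseudogenes per Myr: {low:.0f} to {high:.0f}")
print(f"gene duplications per Myr: {duplication:.0f}")
```

Under these assumptions the two processes are of the same order of magnitude, which is the comparison the abstract draws. The constant-rate model ignores the reported slowdown in the hominid lineage.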

  • ABSTRACT: Pseudogenes are defined as non-functional relatives of genes whose protein-coding abilities are lost and are no longer expressed within cells. They are an outcome of accumulation of mutations within a gene whose end product is not essential for survival. Proper investigation of the procedure of pseudogenization is relevant for estimating occurrence of duplications in genomes. Frankineae houses an interesting group of microorganisms, carving a niche in the microbial world. This study was undertaken with the objective of determining the abundance of pseudogenes, understanding strength of purifying selection, investigating evidence of pseudogene expression, and analysing their molecular nature, their origin, evolution and deterioration patterns amongst domain families. Investigation revealed the occurrence of 956 core pFAM families sharing common characteristics indicating co-evolution. WD40, Rve_3, DDE_Tnp_IS240 and phage integrase core domains are larger families, having more pseudogenes, signifying a probability of harmful foreign genes being disabled within transposable elements. High selective pressure depicted that gene families rapidly duplicating and evolving undoubtedly facilitated creation of a number of pseudogenes in Frankineae. Codon usage analysis between protein-coding genes and pseudogenes indicated a wide degree of variation with respect to different factors. Moreover, the majority of pseudogenes were under the effect of purifying selection. Frankineae pseudogenes were under stronger selective constraints, indicating that they were functional for a very long time and became pseudogenes abruptly. The origin and deterioration of pseudogenes has been attributed to selection and mutational pressure acting upon sequences for adapting to stressed soil environments.
    Journal of Biosciences 11/2013; 38(4):727-32. DOI:10.1007/s12038-013-9356-1 · 1.94 Impact Factor
  • ABSTRACT: Thousands of pseudogenes exist in the human genome and many are transcribed, but their functional potential remains elusive and understudied. To explore these issues systematically, we first developed a computational pipeline to identify transcribed pseudogenes from RNA-Seq data. Applying the pipeline to datasets from 16 distinct normal human tissues identified ∼3,000 pseudogenes that could produce non-coding RNAs in a manner of low abundance but high tissue specificity under normal physiological conditions. Cross-tissue comparison revealed that the transcriptional profiles of pseudogenes and their parent genes showed mostly positive correlations, suggesting that pseudogene transcription could have a positive effect on the expression of their parent genes, perhaps by functioning as competing endogenous RNAs (ceRNAs), as previously suggested and demonstrated with the PTEN pseudogene, PTENP1. Our analysis of the ENCODE project data also found many transcriptionally active pseudogenes in the GM12878 and K562 cell lines; moreover, it showed that many human pseudogenes produced small RNAs (sRNAs) and some pseudogene-derived sRNAs, especially those from antisense strands, exhibited evidence of interfering with gene expression. Further integrated analysis of transcriptomics and epigenomics data, however, demonstrated that trimethylation of histone 3 at lysine 9 (H3K9me3), a posttranslational modification typically associated with gene repression and heterochromatin, was enriched at many transcribed pseudogenes in a transcription-level dependent manner in the two cell lines. The H3K9me3 enrichment was more prominent in pseudogenes that produced sRNAs at pseudogene loci and their adjacent regions, an observation further supported by the co-enrichment of SETDB1 (a H3K9 methyltransferase), suggesting that pseudogene sRNAs may have a role in regional chromatin repression. Taken together, our comprehensive and systematic characterization of pseudogene transcription uncovers a complex picture of how pseudogene ncRNAs could influence gene and pseudogene expression, at both epigenetic and post-transcriptional levels.
    PLoS ONE 04/2014; 9(4):e93972. DOI:10.1371/journal.pone.0093972 · 3.53 Impact Factor
  • ABSTRACT: In primates and other animals reverse transcription of mRNA followed by genomic integration creates retroduplications. Expressed retroduplications are either 'retrogenes' coding for functioning proteins or expressed 'processed pseudogenes', which can function as noncoding RNAs. To date, little is known about the variation in retroduplications in terms of their presence or absence across individuals in the human population. We developed new methodologies allowing us to identify 'novel' retroduplications (i.e., those not present in the reference genome), to find their insertion points, and to genotype them. Using these methods, we catalogued and analyzed 174 retroduplication variants in almost one thousand humans, which were sequenced as part of Phase 1 of the 1000 Genomes Project. The accuracy of our dataset was corroborated by (i) multiple lines of sequencing evidence for retroduplication (e.g., depth of coverage in exons vs. introns), (ii) experimental validation, and (iii) the fact that we can reconstruct a correct phylogenetic tree of human sub-populations based solely on retroduplications. We also show that parent genes of retroduplication variants tend to be expressed at the M-to-G1 transition in the cell cycle, and that M-to-G1 expressed genes have more copies of fixed retroduplications than genes expressed at other times. These findings suggest that cell division is coupled to retrotransposition and perhaps, is even a requirement for it.
    Genome Research 09/2013; DOI:10.1101/gr.154625.113 · 13.85 Impact Factor