Pseudo–Messenger RNA: Phantoms of the Transcriptome

Lawrence Livermore National Laboratory, US
PLoS Genetics (Impact Factor: 8.17). 05/2006; 2(4):e23. DOI: 10.1371/journal.pgen.0020023
Source: PubMed

ABSTRACT: The mammalian transcriptome harbours shadowy entities that resist classification and analysis. In analogy with pseudogenes, we define pseudo-messenger RNA to be RNA molecules that resemble protein-coding mRNA, but cannot encode full-length proteins owing to disruptions of the reading frame. Using a rigorous computational pipeline, which rules out sequencing errors, we identify 10,679 pseudo-messenger RNAs (approximately half of which are transposon-associated) among the 102,801 FANTOM3 mouse cDNAs: just over 10% of the FANTOM3 transcriptome. These comprise not only transcribed pseudogenes, but also disrupted splice variants of otherwise protein-coding genes. Some may encode truncated proteins, only a minority of which appear subject to nonsense-mediated decay. The presence of an excess of transcripts whose only disruptions are opal stop codons suggests that there are more selenoproteins than currently estimated. We also describe compensatory frameshifts, where a segment of the gene has changed frame but remains translatable. In summary, we survey a large class of non-standard but potentially functional transcripts that are likely to encode genetic information and effect biological processes in novel ways. Many of these transcripts do not correspond cleanly to any identifiable object in the genome, implying fundamental limits to the goal of annotating all functional elements at the genome sequence level.
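To make the definition concrete, the Python sketch below illustrates one ingredient of it. The sketch scans a known reading frame for premature in-frame stop codons. It then labels a transcript whose only disruptions are opal (UGA, written TGA in cDNA) codons as a possible selenoprotein candidate. This is a hypothetical illustration, not the authors' pipeline. It assumes that the frame start and the end of the full-length ORF are already known, for example from an alignment to a homologous protein. The function names and labels are invented for this example. Frameshifts and the sequencing-error filtering mentioned above are not modelled.

```python
# Hypothetical sketch, not the FANTOM3 pseudo-mRNA pipeline: it assumes the
# intended reading frame and the end of the full-length ORF are already known
# (e.g. from an alignment to a homologous protein). Frameshifts and
# sequencing-error filtering are deliberately left out.

STOP_CODONS = {"TAA", "TAG", "TGA"}  # TGA is the opal stop (UGA in RNA)


def premature_stops(cdna: str, frame_start: int, orf_end: int) -> list[tuple[int, str]]:
    """Return (position, codon) for each in-frame stop codon before orf_end.

    orf_end is the exclusive end of the coding region, i.e. the index where the
    expected (natural) termination codon begins; that codon is not reported.
    """
    seq = cdna.upper().replace("U", "T")  # accept RNA or DNA letters
    stops = []
    # Step one codon at a time; the last codon checked ends exactly at orf_end.
    for pos in range(frame_start, orf_end - 2, 3):
        codon = seq[pos:pos + 3]
        if codon in STOP_CODONS:
            stops.append((pos, codon))
    return stops


def classify(cdna: str, frame_start: int, orf_end: int) -> str:
    """Label a transcript by the kind of reading-frame disruption it carries."""
    stops = premature_stops(cdna, frame_start, orf_end)
    if not stops:
        return "intact"
    if all(codon == "TGA" for _, codon in stops):
        # Only opal codons: these might be read as selenocysteine, not a stop.
        return "opal-only (selenoprotein candidate)"
    return "pseudo-mRNA candidate"


if __name__ == "__main__":
    print(classify("ATGGCTTGAGGCTAA", 0, 12))  # in-frame TGA only
    print(classify("ATGTAGGCTTAA", 0, 9))      # in-frame TAG
    print(classify("ATGGCTTAA", 0, 6))         # no premature stop
```

Running the script prints "opal-only (selenoprotein candidate)", "pseudo-mRNA candidate" and "intact" in turn. Treating opal-only transcripts as a separate class mirrors the abstract's reasoning. If UGA were sometimes recoded as selenocysteine, such transcripts would not truly be disrupted.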

Available from: Piero Carninci, Jul 05, 2015
  •
    ABSTRACT: The Poaceae family, also known as the grasses, includes agronomically important cereal crops such as rice, maize, sorghum, and wheat. Previous comparative studies have shown that much of the gene content is shared among the grasses; however, functional conservation of orthologous genes has yet to be explored. To gain an understanding of the genome-wide patterns of evolution of gene expression across reproductive tissues, we employed a sequence-based approach to compare analogous transcriptomes in species representing three Poaceae subgroups: the Pooideae (Brachypodium distachyon), the Panicoideae (sorghum), and the Ehrhartoideae (rice). Our transcriptome analyses reveal that only a fraction of orthologous genes exhibit conserved expression patterns. A high proportion of conserved orthologs include genes that are upregulated in physiologically similar tissues such as leaves, anther, pistil, and embryo, while orthologs that are highly expressed in seeds show the most diverged expression patterns. More generally, we show that evolution of gene expression profiles and coding sequences in the grasses may be linked. Genes that are highly and broadly expressed tend to be conserved at the coding sequence level, while genes with narrow expression patterns show accelerated rates of sequence evolution. We further show that orthologs in syntenic genomic blocks are more likely to share correlated expression patterns compared with non-syntenic orthologs. These findings are important for agricultural improvement because sequence information is transferred from model species, such as Brachypodium, rice, and sorghum, to crop plants without sequenced genomes.
    The Plant Journal 03/2012; 71(3):492-502. DOI:10.1111/j.1365-313X.2012.05005.x · 6.82 Impact Factor
  •
    ABSTRACT: Arising from either retrotransposition or genomic duplication of functional genes, pseudogenes are "genomic fossils" valuable for exploring the dynamics and evolution of genes and genomes. Pseudogene identification is an important problem in computational genomics, and is also critical for obtaining an accurate picture of a genome's structure and function. However, no consensus computational scheme for defining and detecting pseudogenes has been developed thus far. As part of the ENCyclopedia Of DNA Elements (ENCODE) project, we have compared several distinct pseudogene annotation strategies and found that different approaches and parameters often resulted in rather distinct sets of pseudogenes. We subsequently developed a consensus approach for annotating pseudogenes (derived from protein coding genes) in the ENCODE regions, resulting in 201 pseudogenes, two-thirds of which originated from retrotransposition. A survey of orthologs for these pseudogenes in 28 vertebrate genomes showed that a significant fraction (approximately 80%) of the processed pseudogenes are primate-specific sequences, highlighting the increasing retrotransposition activity in primates. Analysis of sequence conservation and variation also demonstrated that most pseudogenes evolve neutrally, and processed pseudogenes appear to have lost their coding potential immediately or soon after their emergence. In order to explore the functional implication of pseudogene prevalence, we have extensively examined the transcriptional activity of the ENCODE pseudogenes. We performed a systematic series of pseudogene-specific RACE analyses. These, together with complementary evidence derived from tiling microarrays and high-throughput sequencing, demonstrated that at least a fifth of the 201 pseudogenes are transcribed in one or more cell lines or tissues.
    Genome Research 07/2007; 17(6):839-51. DOI:10.1101/gr.5586307 · 13.85 Impact Factor
  •
    ABSTRACT: Arising from gene duplications or retrotranspositions, pseudogenes are genomic sequences with high sequence similarity to functional genes but unable to encode the same type of functional molecular products as their parental sequences. For those that are copies of protein-coding genes, this means that they have lost the potential to encode a functional protein due to disruptions in their putative open reading frames. Several computational algorithms have been developed for detecting pseudogenes in recent years, and their application has annotated hundreds to thousands of pseudogenes in higher eukaryotic genomes, including the rice and Arabidopsis genomes. While conventional wisdom considers pseudogenes dead and inactive sequences, emerging evidence indicates that a large number of higher eukaryotic pseudogenes are transcriptionally alive and, furthermore, that many of the pseudogene transcripts may play a critical role in regulating gene expression. In particular, analyses of RNAs from both plant and mammalian tissues or organs using deep-sequencing technology have uncovered scores of pseudogene-derived small RNAs. Their sequence features, together with carefully designed biochemical and genetic experiments, indicate that small RNAs from pseudogenes may function at different molecular levels. They may act as small interfering RNAs that directly regulate functional genes or modulate epigenomic silencing in the pseudogenic regions. Alternatively, they may act as decoy RNAs that counteract the inhibitory effect of miRNAs supposedly targeting functional genes. These exciting discoveries suggest that pseudogenes may represent a hidden layer of regulatory elements in eukaryotic genomes, whose functional importance has only just started to be unveiled and appreciated.
    Keywords: DCL, Dicer, Pseudogenes, RDR2, siRNA, Small RNA
    pages 193-208