Transposable element fragments in protein-coding regions and their contributions to human functional proteins

Institute of Bioinformatics, MOE Key Laboratory of Bioinformatics, State Key Laboratory of Biomembrane and Membrane Biotechnology, Department of Biological Sciences and Biotechnology, Tsinghua University, Beijing 100084, China.
Gene (Impact Factor: 2.14). 11/2007; 401(1-2):165-71. DOI: 10.1016/j.gene.2007.07.012
Source: PubMed


Transposable elements (TEs) and their contributions to protein-coding regions are of particular interest. Here we searched for TE fragments in Homo sapiens at both the transcript and protein levels. We found evidence in support of TE exonization and its association with alternative splicing. Despite recent findings that long evolutionary times are required to incorporate TEs into proteins, we found many functional proteins with translated TE cassettes derived from young TEs. Analyses of two Bcl-family proteins and Alu-encoded segments suggest the coding and functional potential of TE sequences.

    • "These exons originate from exonization of different transposable elements (TEs) or are located immediately behind a TE in the human genome. Incorporation of TE-derived sequences into promoters or UTR or coding regions of genes has been documented by many studies [35]–[37]. In accordance with the restricted expression of TE-dependent TCF4 transcripts, TEs are known to be transcriptionally silenced in most mammalian tissues as a defense mechanism against potentially deleterious effects of their activity [38], [39]. "
    ABSTRACT: Transcription factor 4 (TCF4 alias ITF2, E2-2, ME2 or SEF2) is a ubiquitous class A basic helix-loop-helix protein that binds to E-box DNA sequences (CANNTG). While involved in the development and functioning of many different cell types, recent studies point to important roles for TCF4 in the nervous system. Specifically, the human TCF4 gene is implicated in susceptibility to schizophrenia, and TCF4 haploinsufficiency is the cause of the Pitt-Hopkins mental retardation syndrome. However, the structure, expression and coding potential of the human TCF4 gene have not been described in detail. In the present study we used human tissue samples to characterize human TCF4 gene structure and TCF4 expression at the mRNA and protein levels. We report that although widely expressed, human TCF4 mRNA expression is particularly high in the brain. We demonstrate that usage of numerous 5' exons of the human TCF4 gene potentially yields TCF4 protein isoforms with 18 different N-termini. In addition, the diversity of isoforms is increased by alternative splicing of several internal exons. For functional characterization of TCF4 isoforms, we overexpressed individual isoforms in cultured human cells. Our analysis revealed that subcellular distribution of TCF4 isoforms is differentially regulated: some isoforms contain a bipartite nuclear localization signal and are exclusively nuclear, whereas distribution of other isoforms relies on heterodimerization partners. Furthermore, the ability of different TCF4 isoforms to regulate E-box controlled reporter gene transcription varies depending on whether one or both of the two TCF4 transcription activation domains are present in the protein. Both TCF4 activation domains are able to activate transcription independently, but act synergistically in combination. Altogether, in this study we have described the inter-tissue variability of TCF4 expression in human and provided evidence of the functional diversity of the alternative TCF4 protein isoforms.
    PLoS ONE 07/2011; 6(7):e22138. DOI:10.1371/journal.pone.0022138 · 3.23 Impact Factor
    • "These differences have been proposed to play important roles in behavioral adaptation to environmental stresses, including food selection (Despland and Simpson, 2005), responses to conspecies (Despland, 2001) and avoidance to natural enemies (Reynolds et al., 2009) in locusts. Therefore, the locust TEs may mediate the regulation of peripheral and central nervous systems by alternative splicing, protein coding, small RNA production and gene expression regulation (Muotri et al., 2005; Tamura et al., 2007; Thomson and Lin, 2009; Wu et al., 2007). "
    ABSTRACT: It has been reported that many genes and small RNAs are associated with density-dependent polyphenism in locusts. However, the regulatory mechanism underlying gene transcription is still unknown. Here, by analyzing a transcriptome database of the migratory locust, we identified abundant transcripts of transposable elements, mainly CR1, I, L2 and RTE-BovB, which are mediators of genetic variation and gene transcriptional regulation. We cloned one I element, which accounts for the most abundant transcripts among all transposable elements, and investigated its developmental and tissue-specific expression in gregarious and solitary locusts. Although there are no significant differences in I element expression in whole bodies between gregarious and solitary locusts at various developmental stages, this I element exhibits high expression levels and differential expression patterns between gregarious and solitary locusts in central and peripheral nervous tissues, such as the brain, antennae and labial palps. These results suggest that the I element is potentially involved in the response of neural systems to social environmental changes in locusts.
    Journal of Insect Physiology 08/2010; 56(8):943-8. DOI:10.1016/j.jinsphys.2010.05.007 · 2.47 Impact Factor
    • "The insertion of TE sequence fragments into open reading frames (ORFs) of vertebrate genes may be a general phenomenon (Nekrutenko and Li 2001). Consistent with this conjecture, TEs share sequence similarity with thousands of human protein-coding sequences (Britten 2006), many of which remain functional (Wu et al. 2007). "
    ABSTRACT: The goal of this study was to assess the extent to which transposable elements (TEs) have contributed to protein-coding regions in Arabidopsis thaliana. To do this, we first characterized the extent of chimeric TE-gene constructs. We compared a genome-wide TE database to genomic sequences, annotated coding regions, and EST data. The comparison revealed that 7.8% of expressed genes contained a region with close similarity to a known TE sequence. Some groups of TEs, such as helitrons, were underrepresented in exons relative to their genome-wide distribution; in contrast, Copia-like and En/Spm-like sequences were overrepresented in exons. These 7.8% of genes were enriched for some GO-based functions, particularly kinase activity, and lacking in other functions, notably structural molecule activity. We also examined gene family evolution for these genes. Gene family information helped clarify whether the sequence similarity between TE and gene was due to a TE contributing to the gene or, instead, the TE co-opting a portion of the gene. Most (66%) of these genes were not easily assigned to a gene family, and for these we could not infer the direction of the relationship between TE and gene. For the remainder, where appropriate, we built phylogenetic trees to infer the direction of the TE-gene relationship by parsimony. By this method, we verified examples where TEs contributed to expressed proteins. Our results are undoubtedly conservative but suggest that TEs may have contributed small protein segments to as many as 1.2% of all expressed, annotated A. thaliana genes.
    Journal of Molecular Evolution 02/2009; 68(1):80-9. DOI:10.1007/s00239-008-9190-5 · 1.68 Impact Factor