Expression of Conjoined Genes: Another Mechanism for Gene Regulation in Eukaryotes

University of Texas Arlington, United States of America
PLoS ONE (Impact Factor: 3.23). 10/2010; 5(10):e13284. DOI: 10.1371/journal.pone.0013284
Source: PubMed


From the ENCODE project, it has become clear that almost every base of the human genome is transcribed. One class of the resulting transcripts arises from conjoined genes, which are formed by combining the exons of two or more distinct (parent) genes lying on the same strand of a chromosome. Only a very limited number of such genes are known, and the definitions and terminology used for them are highly variable in the public databases. In this work, we have computationally identified and manually curated 751 conjoined genes (CGs) in the human genome that are supported by at least one mRNA or EST sequence available in the NCBI database. A representative set of 353 CGs was subjected to experimental validation using RT-PCR and sequencing, and 291 (82%) could be confirmed. We speculate that these genes arise from novel functional requirements and are not merely artifacts of transcription, since more than 70% of them are conserved in other vertebrate genomes. The unique splicing patterns exhibited by CGs point to possible roles in protein evolution or gene regulation. Novel CGs, for which no transcript is available, could be identified in 80% of randomly selected potential CG-forming regions, indicating that their formation is a routine process. Formation of CGs is not limited to humans, as we have also identified 270 CGs in mouse and 227 in Drosophila using our approach. Additionally, we propose a novel mechanism for the formation of CGs. Finally, we developed a database, ConjoinG, which contains detailed information about all the CGs (800 in total) identified in the human genome. In summary, our findings offer new insights into the functionality of CGs as another possible mechanism for gene regulation and genomic evolution, and into the mechanism leading to their formation.
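The definition of a conjoined gene given in the abstract (a transcript combining exons of two or more parent genes on the same strand of a chromosome) can be illustrated with a minimal sketch. This is not the pipeline used in the paper, whose details are not given here; the `Exon` and `Gene` types, the simple overlap criterion, and the gene names `GENE_A`, `GENE_B` and `GENE_C` are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Exon:
    chrom: str
    strand: str  # "+" or "-"
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive


@dataclass(frozen=True)
class Gene:
    name: str
    exons: Tuple[Exon, ...]


def overlaps(a: Exon, b: Exon) -> bool:
    """True if two exons share at least one base on the same chromosome and strand."""
    return (a.chrom == b.chrom and a.strand == b.strand
            and a.start <= b.end and b.start <= a.end)


def parent_genes(transcript: Sequence[Exon], genes: Sequence[Gene]) -> List[str]:
    """Names (sorted) of annotated genes with an exon overlapping any transcript exon."""
    hits = set()
    for gene in genes:
        if any(overlaps(te, ge) for te in transcript for ge in gene.exons):
            hits.add(gene.name)
    return sorted(hits)


def is_candidate_conjoined(transcript: Sequence[Exon], genes: Sequence[Gene]) -> bool:
    """Assumed criterion: exons from at least two distinct same-strand genes."""
    return len(parent_genes(transcript, genes)) >= 2


# Hypothetical annotation: two neighboring genes on the + strand, one on the - strand.
gene_a = Gene("GENE_A", (Exon("chr1", "+", 100, 200), Exon("chr1", "+", 300, 400)))
gene_b = Gene("GENE_B", (Exon("chr1", "+", 1000, 1100), Exon("chr1", "+", 1300, 1400)))
gene_c = Gene("GENE_C", (Exon("chr1", "-", 1000, 1100),))
annotation = [gene_a, gene_b, gene_c]

# Hypothetical read-through transcript that joins exons of GENE_A and GENE_B.
readthrough = (
    Exon("chr1", "+", 100, 200),
    Exon("chr1", "+", 300, 380),
    Exon("chr1", "+", 1000, 1100),
    Exon("chr1", "+", 1300, 1400),
)

print(parent_genes(readthrough, annotation))
print(is_candidate_conjoined(readthrough, annotation))
```

In this sketch the opposite-strand gene is ignored because the overlap test requires matching strands, mirroring the same-strand condition in the definition. A real screen would work from mRNA or EST alignments and would need manual curation, as the abstract describes.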



Available from: Vineet K Sharma, Apr 18, 2014
    • "In our pancreatic cancer signature, our model also identified several gene fusions not previously associated with this disease, including ANKHD1-EIF4EBP3, a readthrough transcript of the neighboring cell survival scaffolding gene ANKHD1 and the downstream translational repressor EIF4EBP3, both of which are effectors of the RAS/MAPK pathway [83] [84], which is known to play a critical role in the development and progression of pancreatic cancer [85] [86] [87] [88]. "
    ABSTRACT: Metastatic cancer of unknown primary (CUP) accounts for up to 5% of all new cancer cases, with a 5-year survival rate of only 10%. Accurate identification of tissue of origin would allow for directed, personalized therapies to improve clinical outcomes. Our objective was to use transcriptome sequencing (RNA-Seq) to identify lineage-specific biomarker signatures for the cancer types that most commonly metastasize as CUP (colorectum, kidney, liver, lung, ovary, pancreas, prostate, and stomach). RNA-Seq data of 17,471 transcripts from a total of 3,244 cancer samples across 26 different tissue types were compiled from in-house sequencing data and publicly available International Cancer Genome Consortium and The Cancer Genome Atlas datasets. Robust cancer biomarker signatures were extracted using a 10-fold cross-validation method of log transformation, quantile normalization, transcript ranking by area under the receiver operating characteristic curve, and stepwise logistic regression. The entire algorithm was then repeated with a new set of randomly generated training and test sets, yielding highly concordant biomarker signatures. External validation of the cancer-specific signatures yielded high sensitivity (92.0% ± 3.15%; mean ± standard deviation) and specificity (97.7% ± 2.99%) for each cancer biomarker signature. The overall performance of this RNA-Seq biomarker-generating algorithm yielded an accuracy of 90.5%. In conclusion, we demonstrate a computational model for producing highly sensitive and specific cancer biomarker signatures from RNA-Seq data, generating signatures for the top eight cancer types responsible for CUP to accurately identify tumor origin.
    Neoplasia (New York, N.Y.) 11/2014; 16(11). DOI:10.1016/j.neo.2014.09.007 · 4.25 Impact Factor
    • "The remaining 52 transcripts are considered polycistronic candidates, among which three are known transcripts (SNURF-SNRPN, LUZP6 and GDF1; GIs: 29540556, 190886450 and 110349791, respectively). An additional three undergo an unusual transcription pattern: leptin receptor (LEPR, GI: 310923183), which is reported to share the same promoter and the first two exons with the leptin receptor overlapping transcript (LEPROT) gene [36]; the IGF 2 read-through product (GI: 183603938); and the GPR75-ASB3 gene (G protein-coupled receptor 75-ankyrin repeat and SOCS box containing 3; GI: 188528701) read-through product [37] (Table 3, a detailed description in Table S1). "
    ABSTRACT: Eukaryotic polycistronic transcription units are rare and only a few examples are known, mostly being the outcome of serendipitous discovery. We claim that a nonsense-mediated mRNA decay (NMD) immune structure is a common characteristic of polycistronic transcripts, and that this immunity is an emergent property derived from all functional CDSs. The human RefSeq transcriptome was computationally screened for transcripts capable of eliciting NMD, and which contain an additional ORF(s) potentially capable of rescuing the transcript from NMD. Transcripts were further analyzed using domain-based strategies in order to estimate the potential of the candidate ORF to encode a functional protein. Consequently, we predict the existence of forty-nine novel polycistronic transcripts. Experimental verification was carried out using two different types of analyses. First, five Gene Expression Omnibus (GEO) datasets from published NMD-inhibition studies were used to explore whether a given mRNA is indeed insensitive to NMD. All known bicistronic transcripts and eleven out of the twelve predicted genes that were analyzed displayed NMD insensitivity using various NMD inhibitors. For three genes, a mixed expression pattern was observed, presenting both NMD sensitivity and insensitivity in different cell types. Second, we used published global translation initiation sequencing data from HEK293 cells to verify the existence of translation initiation sites in our predicted polycistronic genes. In five of our genes, the predicted rescuing uORFs are indeed identified as translation initiation sites, and in two additional genes, one of the two predicted rescuing uORFs is verified. These results validate our computational analysis and reinforce the possibility that NMD-immune architecture is a parameter by which polycistronic genes can be identified. Moreover, we present evidence for NMD-mediated regulation controlling the production of one or more proteins encoded in the polycistronic transcript.
    PLoS ONE 03/2014; 9(3):e91535. DOI:10.1371/journal.pone.0091535 · 3.23 Impact Factor
    • "This suggests RNA polymerase read-through as a potential mechanism for generating the VTI1A-TCF7L2 transcripts in the absence of a corresponding genomic breakpoint. Several reports have shown that genes in close proximity in the human genome are expressed as conjoined genes, also called tandem chimeras: transcripts that combine at least part of one exon from each of two or more distinct genes that lie on the same chromosome [18]–[20]. It has been suggested that the expression of conjoined genes increases the complexity of the human genome by translating into distinct proteins, or that these transcripts play a role in regulation of canonical transcript levels. "
    ABSTRACT: VTI1A-TCF7L2 was reported as a recurrent fusion gene in colorectal cancer (CRC), found to be expressed in three out of 97 primary cancers, and one cell line, NCI-H508, where a genomic deletion joins the two genes [1]. To investigate this fusion further, we analyzed high-throughput DNA and RNA sequencing data from seven CRC cell lines, and identified the gene RP11-57H14.3 (ENSG00000225292) as a novel fusion partner for TCF7L2. The fusion was discovered from both genome and transcriptome data in the HCT116 cell line. By triplicate nested RT-PCR, we tested both the novel fusion transcript and VTI1A-TCF7L2 for expression in a series of 106 CRC tissues, 21 CRC cell lines, 14 normal colonic mucosa, and 20 normal tissues from miscellaneous anatomical sites. Altogether, 42% and 45% of the CRC samples expressed VTI1A-TCF7L2 and TCF7L2-RP11-57H14.3 fusion transcripts, respectively. The fusion transcripts were both seen in 29% of the normal colonic mucosa samples, and in 25% and 75% of the tested normal tissues from other organs, revealing that the TCF7L2 fusion transcripts are neither specific to cancer nor to the colon and rectum. Seven different splice variants were detected for the VTI1A-TCF7L2 fusion, of which three are novel. Four different splice variants were detected for the TCF7L2-RP11-57H14.3 fusion. In conclusion, we have identified novel variants of VTI1A-TCF7L2 fusion transcripts, including a novel fusion partner gene, RP11-57H14.3, and demonstrated detectable levels in a large fraction of CRC samples, as well as in normal colonic mucosa and other tissue types. We suggest that the fusion transcripts observed in a high frequency of samples are transcription induced chimeras that are expressed at low levels in most samples. The similar fusion transcripts induced by genomic rearrangements observed in individual cancer cell lines may yet have oncogenic potential as suggested in the original study by Bass et al.
    PLoS ONE 03/2014; 9(3):e91264. DOI:10.1371/journal.pone.0091264 · 3.23 Impact Factor