Article

Expression of conjoined genes: another mechanism for gene regulation in eukaryotes.

MetaSystems Research Team, Computational Systems Biology Research Group, Advanced Computational Sciences Department, RIKEN Advanced Science Institute, Yokohama, Japan.
PLoS ONE (Impact Factor: 3.53). 01/2010; 5(10):e13284. DOI: 10.1371/journal.pone.0013284
Source: PubMed

ABSTRACT The ENCODE project has shown that almost every base of the human genome is transcribed. One class of the resulting transcripts arises from conjoined genes, which are formed by combining the exons of two or more distinct (parent) genes lying on the same strand of a chromosome. Only a very limited number of such genes are known, and the definitions and terminology used for them vary widely across public databases. In this work, we computationally identified and manually curated 751 conjoined genes (CGs) in the human genome that are supported by at least one mRNA or EST sequence available in the NCBI database. A representative set of 353 CGs was subjected to experimental validation using RT-PCR and sequencing, and 291 of them (82%) could be confirmed. We speculate that these genes arise from novel functional requirements and are not merely artifacts of transcription, since more than 70% of them are conserved in other vertebrate genomes. The unique splicing patterns exhibited by CGs point to possible roles in protein evolution or gene regulation. Novel CGs, for which no transcript is available, could be identified in 80% of randomly selected potential CG-forming regions, indicating that their formation is a routine process. Formation of CGs is not limited to humans: using our approach, we also identified 270 CGs in mouse and 227 in Drosophila. Additionally, we propose a novel mechanism for the formation of CGs. Finally, we developed a database, ConjoinG, which contains detailed information about all the CGs (800 in total) identified in the human genome. In summary, our findings provide new insights into the functionality of CGs as another possible mechanism for gene regulation and genomic evolution, and into the mechanism leading to their formation.
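
The definition in the abstract (a transcript whose exons come from two or more distinct parent genes on the same strand of a chromosome) can be illustrated with a small sketch. This is not the authors' pipeline, which relied on mRNA/EST alignments and manual curation; the names `Interval`, `Gene`, `parent_genes`, and `is_conjoined_candidate`, as well as all coordinates, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    chrom: str
    strand: str  # "+" or "-"
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive


@dataclass(frozen=True)
class Gene:
    name: str
    span: Interval


def overlaps(a: Interval, b: Interval) -> bool:
    """True if both intervals lie on the same chromosome and strand and share a base."""
    return (
        a.chrom == b.chrom
        and a.strand == b.strand
        and a.start <= b.end
        and b.start <= a.end
    )


def parent_genes(exons: list[Interval], genes: list[Gene]) -> list[str]:
    """Names of genes (in input order) that overlap at least one exon on the same strand."""
    return [g.name for g in genes if any(overlaps(e, g.span) for e in exons)]


def is_conjoined_candidate(exons: list[Interval], genes: list[Gene]) -> bool:
    """A transcript is a candidate if its exons touch two or more distinct genes."""
    return len(parent_genes(exons, genes)) >= 2


# Hypothetical annotation: A and B are neighbours on the "+" strand, C is on "-".
genes = [
    Gene("A", Interval("chr1", "+", 100, 500)),
    Gene("B", Interval("chr1", "+", 800, 1200)),
    Gene("C", Interval("chr1", "-", 300, 900)),
]

# Hypothetical transcript with two exons in A and one in B.
transcript = [
    Interval("chr1", "+", 100, 200),
    Interval("chr1", "+", 400, 500),
    Interval("chr1", "+", 900, 1000),
]

print(parent_genes(transcript, genes))
print(is_conjoined_candidate(transcript, genes))
```

Gene C is ignored here even though its coordinates overlap the transcript, because it lies on the opposite strand. A real analysis would also need to handle overlapping gene annotations and alignment artifacts, which this sketch does not attempt.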

  • ABSTRACT: Metastatic cancer of unknown primary (CUP) accounts for up to 5% of all new cancer cases, with a 5-year survival rate of only 10%. Accurate identification of tissue of origin would allow for directed, personalized therapies to improve clinical outcomes. Our objective was to use transcriptome sequencing (RNA-Seq) to identify lineage-specific biomarker signatures for the cancer types that most commonly metastasize as CUP (colorectum, kidney, liver, lung, ovary, pancreas, prostate, and stomach). RNA-Seq data of 17,471 transcripts from a total of 3,244 cancer samples across 26 different tissue types were compiled from in-house sequencing data and publicly available International Cancer Genome Consortium and The Cancer Genome Atlas datasets. Robust cancer biomarker signatures were extracted using a 10-fold cross-validation method of log transformation, quantile normalization, transcript ranking by area under the receiver operating characteristic curve, and stepwise logistic regression. The entire algorithm was then repeated with a new set of randomly generated training and test sets, yielding highly concordant biomarker signatures. External validation of the cancer-specific signatures yielded high sensitivity (92.0% ± 3.15%; mean ± standard deviation) and specificity (97.7% ± 2.99%) for each cancer biomarker signature. The overall performance of this RNA-Seq biomarker-generating algorithm yielded an accuracy of 90.5%. In conclusion, we demonstrate a computational model for producing highly sensitive and specific cancer biomarker signatures from RNA-Seq data, generating signatures for the top eight cancer types responsible for CUP to accurately identify tumor origin.
    Neoplasia. 11/2014;
  • ABSTRACT: Chimeric RNAs originating from two or more different genes are known to exist not only in cancer, but also in normal tissues, where they can play a role in human evolution. However, the exact mechanism of their formation is unknown. Here, we use RNA sequencing data from 462 healthy individuals representing 5 human populations to systematically identify and characterize in depth 81 RNA tandem chimeric transcripts, 13 of which are novel. We observe that 6 of these 81 chimeras have previously been regarded as cancer-specific. Moreover, we show that a prevalence of long introns at the fusion breakpoint is associated with chimeric transcript formation. We also find that tandem RNA chimeras are less abundant than their partner genes. Finally, by combining our results with genomic data from the same individuals, we uncover intronic genetic variants associated with chimeric RNA formation. Taken together, our findings provide important insight into chimeric transcript formation and open new avenues of research into the role of intronic genetic variants in post-transcriptional processing events.
    PLoS ONE 01/2014; 9(8):e104567. · 3.53 Impact Factor
  • ABSTRACT: Mammalian splicing regulatory protein RNA-binding motif protein 4 (RBM4) has an alanine repeat-containing C-terminal domain (CAD) that confers both nuclear- and splicing speckle-targeting activities. Alanine-repeat expansion has pathological potential. Here we show that the alanine-repeat tracts influence the subnuclear targeting properties of the RBM4 CAD in cultured human cells. Notably, truncation of the alanine tracts redistributed a portion of RBM4 to paraspeckles. The alanine-deficient CAD was sufficient for paraspeckle targeting. On the other hand, alanine-repeat expansion reduced the mobility of RBM4 and impaired its splicing activity. We further took advantage of the putative coactivator activator (CoAA)-RBM4 conjoined splicing factor, CoAZ, to investigate the function of the CAD in subnuclear targeting. Transiently expressed CoAZ formed discrete nuclear foci that emerged and subsequently separated, fully or partially, from paraspeckles. Alanine-repeat expansion appeared to prevent CoAZ separation from paraspeckles, resulting in their complete colocalization. CoAZ foci were dynamic but, unlike paraspeckles, were resistant to RNase treatment. Our results indicate that the alanine-rich CAD, in conjunction with its conjoined RNA-binding domain(s), differentially influences the subnuclear localization and biogenesis of RBM4 and CoAZ. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
    Nucleic Acids Research 11/2014; · 8.81 Impact Factor
