Transcription Factor Binding Profiles Reveal Cyclic Expression of Human Protein-coding Genes and Non-coding RNAs

Institute for Quantitative Biomedical Sciences, Norris Cotton Cancer Center, Geisel School of Medicine at Dartmouth, Lebanon, New Hampshire, United States of America.
PLoS Computational Biology (Impact Factor: 4.62). 07/2013; 9(7):e1003132. DOI: 10.1371/journal.pcbi.1003132
Source: PubMed


Author Summary
The cell cycle is a complex and tightly supervised process that must proceed with regulatory precision to achieve successful cellular division. Microarray time course experiments have been used successfully to identify cell cycle regulated genes, but they have several limitations; for example, they are less effective at identifying genes with low expression. We propose a computational approach that predicts cell cycle genes from TF binding data and motif information in their promoters. Specifically, we take advantage of ChIP-seq TF binding data generated by the ENCODE project and TF binding motif information available from public databases. These data were processed and used as predictors of cell cycle genes with the Random Forest method. Our results show that both the trans TF features and the cis motif features are predictive of cell cycle genes, and that combining the two types of features further improves prediction accuracy. We apply our model to a complete list of GENCODE promoters to predict novel cell cycle driving promoters for both protein-coding genes and non-coding RNAs such as lincRNAs. We find that a similar percentage of lincRNAs and protein-coding genes are cell cycle regulated, suggesting the importance of non-coding RNAs in the cell division cycle.
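The idea of combining TF binding features with motif features in a Random Forest can be illustrated with a small, self-contained sketch. This is not the authors' pipeline: the feature names, the synthetic labelling rule (a promoter is "cell cycle" only when one TF is bound and one motif is present), and every parameter value are assumptions chosen for illustration, and the forest is a minimal pure-Python version restricted to binary features.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a non-empty list of 0/1 labels."""
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def majority(labels):
    """Most common label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]

def grow_tree(X, y, n_try, rng, depth=0, max_depth=4):
    """Grow one tree on binary features, testing n_try random features per node."""
    if depth == max_depth or len(set(y)) == 1:
        return majority(y)
    best = None
    for f in rng.sample(range(len(X[0])), n_try):
        left = [i for i, row in enumerate(X) if row[f] == 0]
        right = [i for i, row in enumerate(X) if row[f] == 1]
        if not left or not right:
            continue  # this feature does not separate the rows at this node
        score = sum(len(side) * gini([y[i] for i in side]) for side in (left, right))
        if best is None or score < best[0]:
            best = (score, f, left, right)
    if best is None:
        return majority(y)
    _, f, left, right = best
    kids = [grow_tree([X[i] for i in side], [y[i] for i in side],
                      n_try, rng, depth + 1, max_depth) for side in (left, right)]
    return (f, kids[0], kids[1])  # internal node: (feature, child if 0, child if 1)

def tree_predict(tree, row):
    """Walk from the root to a leaf and return the leaf label."""
    while isinstance(tree, tuple):
        f, if_zero, if_one = tree
        tree = if_one if row[f] == 1 else if_zero
    return tree

def forest_fit(X, y, n_trees=51, n_try=2, seed=0):
    """Train each tree on a bootstrap sample of the rows."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        forest.append(grow_tree([X[i] for i in idx], [y[i] for i in idx], n_try, rng))
    return forest

def forest_score(forest, row):
    """Fraction of trees voting 'cell cycle' (label 1)."""
    return sum(tree_predict(t, row) for t in forest) / len(forest)

# Hypothetical features: two trans (TF binding) and two cis (motif) indicators.
FEATURES = ["tf_A_bound", "tf_B_bound", "motif_A_present", "motif_B_present"]
# Synthetic promoters: every 0/1 combination of the four features, repeated 10 times.
X = [[(k >> b) & 1 for b in range(4)] for k in range(16)] * 10
# Assumed toy rule: cell cycle only if tf_A is bound AND motif_A is present.
y = [row[0] & row[2] for row in X]

forest = forest_fit(X, y)
bound_with_motif = forest_score(forest, [1, 0, 1, 0])
unbound_no_motif = forest_score(forest, [0, 0, 0, 0])
print(bound_with_motif, unbound_no_motif)
```

Because the toy label depends on one trans feature and one cis feature jointly, neither feature type alone can separate the classes, which mirrors the summary's point that combining the two types improves prediction.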

Available from: Matthew Ung, Jan 07, 2014
  • Source
    • "Hence, it is very timely crucial to comprehensively review existing methods with pros and cons and guide future direction on analytical strategy and application of methods in static and temporal dynamics. Timeseries experiment is largely composed of three different settings, (I) a single-series time course to study a developmental transient pattern (Pauli et al., 2012), (II) a multi-series or factorial time course that interrogates multiple biological reactions to specific external stimuli at each time point (Jager et al., 2011; Sivriver et al., 2011), and (III) a periodical time course in cell-cycle or circadian rhythmic data (Bar-Joseph et al., 2012; Cheng et al., 2013; Lokody, 2014). "
    ABSTRACT: Interpreting gene expression profiles often involves statistical analysis of large numbers of differentially expressed genes, isoforms, and alternative splicing events in either static or dynamic settings. Reduced sequencing costs have made dense time-series analysis of gene expression using RNA-seq feasible; however, statistical methods for temporal RNA-seq data are poorly developed. Here we review current methods for identifying temporal changes in gene expression using RNA-seq, which are limited to static pairwise comparisons of time points and fail to account for temporal dependencies in gene expression patterns. We also review the few recently developed methods specific to temporal dynamic RNA-seq data. Such RNA-seq-specific temporal dynamic methods are under continuous development but are still in their infancy. We fully cover microarray-specific temporal methods and transcriptome studies based on early digital technologies (e.g., SAGE) that came between traditional microarrays and RNA-seq.
    Full-text · Article · Feb 2014 · Frontiers in Genetics
  • Source
    ABSTRACT: Introduction: Genetic and molecular signatures have been incorporated into cancer prognosis prediction and treatment decisions with good success over the past decade. Clinically, these signatures are usually used in early-stage cancers to evaluate whether they require adjuvant therapy following surgical resection. A molecular signature that is prognostic across more clinical contexts would be a useful addition to current signatures. Method: We defined a signature for the ubiquitous tissue factor, E2F4, based on its shared target genes in multiple tissues. These target genes were identified by Chromatin Immunoprecipitation Sequencing (ChIP-seq) experiments using a probabilistic method. We then computationally calculated the regulatory activity score (RAS) of E2F4 in cancer tissues, and examined how E2F4 RAS correlates with patient survival. Results: Genes in our E2F4 signature were 21-fold more likely to be correlated with breast cancer patient survival time compared to randomly selected genes. Using eight independent breast cancer datasets containing over 1,900 unique samples, we stratified patients into low and high E2F4 RAS groups. E2F4 activity stratification was highly predictive of patient outcome, and our results remained robust even when controlling for many factors including patient age, tumor size, grade, estrogen receptor (ER) status, lymph node status, whether the patient received adjuvant therapy, and the patient's other prognostic indices such as Adjuvant! and the Nottingham Prognostic Index scores. Furthermore, the fractions of samples with positive E2F4 RAS vary in different intrinsic breast cancer subtypes, consistent with the different survival profiles of these subtypes. Conclusion: We defined a prognostic signature, the E2F4 regulatory activity score, and showed it to be significantly predictive of patient outcome in breast cancer regardless of treatment status and the states of many other clinicopathological variables. It can be used in conjunction with other breast cancer classification methods such as Oncotype DX to improve clinical outcome prediction.
    Full-text · Article · Dec 2014 · Breast cancer research: BCR
  • Source
    ABSTRACT: In every cell type, gene transcription is tightly controlled by the temporally dynamic orchestration of transcription factor binding. Given a set of known binding sites (BSs) for a transcription factor (TF), computational TFBS screening offers a cost-efficient, large-scale complement to experimental methods. There are two major classes of computational TFBS prediction algorithms, based on tertiary and primary structure, respectively. A tertiary-structure-based algorithm calculates the binding affinity between a query DNA fragment and the tertiary structure of the given TF. Because few TF tertiary structures are available, primary-structure-based TFBS prediction is a necessary complement for large-scale TFBS screening. This study proposes a novel evolutionary algorithm that randomly mutates the weights of different positions in the binding motif of a TF so that overall TFBS prediction accuracy is optimized. A comparison with the most widely used algorithm, the Position Weight Matrix (PWM), suggests that our algorithm performs as well or better on all performance measures, including sensitivity, specificity, accuracy, and Matthews correlation coefficient. Our data also suggest that the widely used assumption of independence between motif positions should be removed. The supplementary material may be found at: .
    Full-text · Article · Jan 2015 · Advances in Experimental Medicine and Biology
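
As a rough illustration of the last abstract's idea, the sketch below builds a standard log-odds PWM from known sites and then runs a very simple mutate-and-keep loop over per-position weights. It is not the algorithm from that study, whose details are not given here: the sequences, the threshold of zero, the Gaussian mutation step, and all function names are assumptions, and accuracy is measured on the training sequences only, where a real evaluation would use held-out data.

```python
import math
import random

BASES = "ACGT"

def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Log-odds position weight matrix from equal-length binding sites."""
    pwm = []
    for pos in range(len(sites[0])):
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2(counts[b] / total / background) for b in BASES})
    return pwm

def score(pwm, seq, weights=None):
    """Weighted sum of per-position log-odds; all weights 1.0 gives the plain PWM score."""
    if weights is None:
        weights = [1.0] * len(pwm)
    return sum(w * col[base] for w, col, base in zip(weights, pwm, seq))

def accuracy(pwm, weights, positives, negatives, threshold=0.0):
    """Fraction of sequences classified correctly at the given score threshold."""
    hits = sum(score(pwm, s, weights) > threshold for s in positives)
    hits += sum(score(pwm, s, weights) <= threshold for s in negatives)
    return hits / (len(positives) + len(negatives))

def evolve_weights(pwm, positives, negatives, generations=200, seed=0):
    """Mutate one position weight at a time; keep the child if accuracy does not drop."""
    rng = random.Random(seed)
    best = [1.0] * len(pwm)
    best_acc = accuracy(pwm, best, positives, negatives)
    for _ in range(generations):
        child = list(best)
        pos = rng.randrange(len(child))
        child[pos] = max(0.0, child[pos] + rng.gauss(0.0, 0.3))
        child_acc = accuracy(pwm, child, positives, negatives)
        if child_acc >= best_acc:
            best, best_acc = child, child_acc
    return best, best_acc

# Made-up example sequences (not real binding data).
positives = ["TTTCGCGC", "TTTGGCGC", "TTTCCCGC", "TTTGCCGC"]
negatives = ["ACGTACGT", "GGGATATA", "CATGCATG", "TTTCATAT"]

pwm = build_pwm(positives)
base_acc = accuracy(pwm, [1.0] * len(pwm), positives, negatives)
weights, best_acc = evolve_weights(pwm, positives, negatives)
print(base_acc, best_acc)
```

Because a mutated weight vector is kept only when accuracy does not decrease, the evolved weights can never do worse than the unweighted PWM on the sequences used for fitting. Note that this sketch still scores positions independently, so it does not address the dependence between motif positions that the abstract mentions.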