Article

Abstract

The quantification of the kinetic rates of RNA synthesis, processing, and degradation is largely based on the integrative analysis of total and nascent transcription, the latter being quantified through RNA metabolic labeling. We developed INSPEcT-, a computational method based on the mathematical modeling of premature and mature RNA expression that is able to quantify kinetic rates from steady-state or time course total RNA-seq data without requiring any information on nascent transcripts. Our approach outperforms available solutions, closely recapitulates the kinetic rates obtained through RNA metabolic labeling, improves the ability to detect changes in transcript half-lives, reduces the cost and complexity of the experiments, and can be adopted to study experimental conditions in which nascent transcription cannot be readily profiled. Finally, we applied INSPEcT- to the characterization of post-transcriptional regulation landscapes in dozens of physiological and disease conditions. This approach was included in the INSPEcT Bioconductor package, which can now unveil RNA dynamics from steady-state or time course data, with or without the profiling of nascent RNA.
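The claim that total RNA-seq alone carries kinetic information can be illustrated with a minimal sketch. It assumes the common first-order model dP/dt = α - k2·P and dM/dt = k2·P - k3·M for premature (P) and mature (M) RNA, with synthesis α, processing k2 and degradation k3. The function names are hypothetical and this is not the INSPEcT- implementation (an R/Bioconductor package). At steady state P = α/k2 and M = α/k3, so the mature-to-premature ratio gives k2/k3 without any nascent-RNA measurement.

```python
def steady_state_levels(alpha: float, k2: float, k3: float) -> tuple[float, float]:
    """Steady-state premature (P) and mature (M) RNA of a first-order model.

    Assumed model (illustrative, not INSPEcT- code):
        dP/dt = alpha - k2 * P     (synthesis, then processing)
        dM/dt = k2 * P - k3 * M    (processing, then degradation)
    """
    premature = alpha / k2  # dP/dt = 0
    mature = alpha / k3     # dM/dt = 0 with P = alpha / k2
    return premature, mature


def processing_over_degradation(premature: float, mature: float) -> float:
    """At steady state M / P = k2 / k3, whatever the synthesis rate alpha."""
    return mature / premature


# Two hypothetical genes with identical synthesis and processing rates
p1, m1 = steady_state_levels(alpha=10.0, k2=2.0, k3=0.5)
p2, m2 = steady_state_levels(alpha=10.0, k2=2.0, k3=0.25)
print(processing_over_degradation(p1, m1))  # 4.0
print(processing_over_degradation(p2, m2))  # 8.0
```

In this toy model a single steady-state snapshot constrains only the ratio k2/k3; separating the absolute rates requires additional information or assumptions.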

... Those methods rely, for each sample, on the separate quantification of labeled RNA on the one hand and of total (mixed labeled and unlabeled) and/or unlabeled (pre-existing) RNA on the other. In its later version, INSPEcT was extended to estimate rates without labeling the sample [26]. ...
... We notice that this last expression is of the same form as the one for the unlabeled fraction (8), but replacing exponentials by their complement to one. Importantly these two fractions do not depend on α , which (unlike [26]) allows our method to estimate processing and degradation rates independently from the synthesis rate. ...
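The α-independence noted in this passage can be checked directly. In the sketch below, β (processing) and γ (degradation) are our own labels for the first-order rates (the passage names only α), and the model assumes steady state before labeling. Pre-existing premature RNA is lost at rate β, so its unlabeled fraction is exp(-βt); pre-existing mature RNA obeys dM_u/dt = βP_u - γM_u starting from M_u(0) = α/γ, and dividing by the steady-state level cancels α. The labeled fractions are the same expressions with every exponential replaced by its complement to one, and each pair sums to one.

```python
import math


def unlabeled_fractions(beta: float, gamma: float, t: float) -> tuple[float, float]:
    """Pre-existing fractions of premature and mature RNA after labeling for time t.

    Assumes steady state before labeling, first-order processing (beta) and
    degradation (gamma), and beta != gamma. The synthesis rate alpha cancels out.
    """
    premature = math.exp(-beta * t)
    mature = (gamma * math.exp(-beta * t) - beta * math.exp(-gamma * t)) / (gamma - beta)
    return premature, mature


def labeled_fractions(beta: float, gamma: float, t: float) -> tuple[float, float]:
    """Same form as above, with every exponential replaced by its complement to one."""
    premature = 1 - math.exp(-beta * t)
    mature = (gamma * (1 - math.exp(-beta * t))
              - beta * (1 - math.exp(-gamma * t))) / (gamma - beta)
    return premature, mature


u_p, u_m = unlabeled_fractions(beta=2.0, gamma=0.5, t=1.0)
l_p, l_m = labeled_fractions(beta=2.0, gamma=0.5, t=1.0)
print(round(u_p + l_p, 12), round(u_m + l_m, 12))  # 1.0 1.0
```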
... Together with the high correlation between synthesis and processing rate, it suggests that modeling transcription and processing as independent events is a simplification that could be reconsidered, as the coupling between the two has been documented [28]. However, this limitation of the Zeisel model is likely to also affect other methods using it [26,37]. ...
Article
Background Over the past decade, experimental procedures such as metabolic labeling for determining RNA turnover rates at the transcriptome-wide scale have been widely adopted and are now turning to single cell measurements. Several computational methods to estimate RNA synthesis, processing and degradation rates from such experiments have been suggested, but they all require several RNA sequencing samples. Here we present a method that can estimate those three rates from a single sample. Methods Our method relies on the analytical solution to the Zeisel model of RNA dynamics. It was validated on metabolic labeling experiments performed on mouse embryonic stem cells. Resulting degradation rates were compared both to previously published rates on the same system and to a state-of-the-art method applied to the same data. Results Our method is computationally efficient and outputs rates that correlate well with previously published data sets. Using it on a single sample, we were able to reproduce the observation that dynamic biological processes tend to involve genes with higher metabolic rates, while stable processes involve genes with lower rates. This supports the hypothesis that cells control not only the mRNA steady-state abundance, but also its responsiveness, i.e., how fast steady state is reached. Moreover, degradation rates obtained with our method compare favourably with the other tested method. Conclusions In addition to saving experimental work and computational time, estimating rates for a single sample has several advantages. It does not require an error-prone normalization across samples and enables the use of replicates to estimate uncertainty and assess sample quality. Finally the method and theoretical results described here are general enough to be useful in other contexts such as nucleotide conversion methods and single cell metabolic labeling experiments.
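A hedged sketch of how a single sample can pin down all three rates once the labeled premature and mature fractions are measured: it inverts the closed-form labeled fractions of the first-order model (our assumption; the authors' estimator and error model are not reproduced). The processing rate follows from the labeled premature fraction, the degradation rate from the labeled mature fraction by bisection (the unlabeled mature fraction decreases monotonically as γ grows), and the synthesis rate from α = βP at steady state. Function names and the search bound `gamma_max` are hypothetical.

```python
import math


def mature_unlabeled_fraction(beta: float, gamma: float, t: float) -> float:
    """Unlabeled (pre-existing) mature fraction in the steady-state first-order model."""
    if abs(gamma - beta) < 1e-9:
        # Limit of the general expression as gamma -> beta
        return (1 + beta * t) * math.exp(-beta * t)
    return (gamma * math.exp(-beta * t) - beta * math.exp(-gamma * t)) / (gamma - beta)


def estimate_rates(labeled_premature: float, labeled_mature: float,
                   premature_level: float, t: float,
                   gamma_max: float = 100.0, tol: float = 1e-10) -> tuple[float, float, float]:
    """Return (alpha, beta, gamma) from one sample; a sketch, not the published estimator."""
    # 1) Processing: the labeled premature fraction is 1 - exp(-beta * t)
    beta = -math.log(1 - labeled_premature) / t
    # 2) Degradation: bisection on gamma, using the monotonic decrease of the
    #    unlabeled mature fraction with gamma
    target = 1 - labeled_mature
    lo, hi = 0.0, gamma_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mature_unlabeled_fraction(beta, mid, t) > target:
            lo = mid  # too much old RNA predicted: degradation must be faster
        else:
            hi = mid
    gamma = 0.5 * (lo + hi)
    # 3) Synthesis: at steady state alpha = beta * P
    alpha = beta * premature_level
    return alpha, beta, gamma


# Round trip on synthetic values (hypothetical rates per hour, t in hours)
true_alpha, true_beta, true_gamma, t = 6.0, 1.2, 0.3, 1.0
lab_p = 1 - math.exp(-true_beta * t)
lab_m = 1 - mature_unlabeled_fraction(true_beta, true_gamma, t)
print(estimate_rates(lab_p, lab_m, premature_level=true_alpha / true_beta, t=t))
# approximately (6.0, 1.2, 0.3)
```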
... Although previous studies have demonstrated that the EISA algorithm [27] and its improved version, the REMBRANDTS package [32,93,94] used in this study, achieve fairly high accuracy in evaluating mRNA stability, the accuracy of inferred mRNA stability may vary significantly between genes. First, differentially expressed long noncoding RNAs (lncRNAs) [95,96] or perturbed factors involved in intron degradation [27,97] could alter the difference in intronic read counts (Δintron) and thereby bias the stability estimate. Adding non-coding RNA annotations to the RNA-Seq alignment may improve the accuracy of mRNA stability inference. ...
... Indeed, 41.88% (634/1514) of stQTLs with stringency ≥0.9 overlap with RBP binding sites, which is significantly higher (P = 6e-44, Fisher's exact test) than 23.69% (1546/6527) of stQTLs with stringency ≤0.5. Finally, it should be noted that the mRNA stability calculated from RNA-Seq using REMBRANDTS is not an absolute value but a differential mRNA stability relative to the average of all samples for a given gene [32,97,98]. Due to these limitations, it may be difficult to directly compare the stQTLs identified using data from different tissues. ...
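The intron/exon logic behind EISA-style stability estimates, and why the result is only relative, can be sketched as follows. Assumptions: exonic counts track mature mRNA and intronic counts track pre-mRNA; within one sample, log2(exon) - log2(intron) cancels library size; and each gene's score is centred on its mean across samples, giving a stability value relative to that average rather than an absolute one. The function is hypothetical and omits any additional corrections applied by EISA or REMBRANDTS.

```python
import math
from statistics import mean


def relative_stability(exon_counts, intron_counts, pseudocount=1.0):
    """Per-sample stability score for one gene, centred on that gene's mean across samples.

    Within a sample, log2(exon) - log2(intron) cancels library size; changes in
    this difference are attributed to mRNA stability rather than transcription.
    """
    raw = [math.log2(e + pseudocount) - math.log2(i + pseudocount)
           for e, i in zip(exon_counts, intron_counts)]
    centre = mean(raw)
    return [r - centre for r in raw]


# Hypothetical gene: exonic signal doubles across samples while intronic signal is flat
scores = relative_stability([100, 200, 400], [10, 10, 10], pseudocount=0.0)
print([round(s, 6) for s in scores])  # close to [-1, 0, 1]
```

Because of the centring, a score of 0 means "average for this gene in this data set", so the same score need not mean the same stability in a different cohort or tissue.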
... Furthermore, computer algorithms based on RNA-Seq are still under continuous development. For example, INSPEcT [97] was recently designed to calculate RNA kinetic rates based on time course RNA-seq data, or to estimate stability by calculating the difference between premature and mature RNA expression [100]. Going forward, stQTLs identified with more accurate mRNA stability profile estimations may further our understanding of how genetic variants regulate gene expression. ...
Article
Background Expression quantitative trait loci (eQTLs) analyses have been widely used to identify genetic variants associated with gene expression levels to understand what molecular mechanisms underlie genetic traits. The resultant eQTLs might affect the expression of associated genes through transcriptional or post-transcriptional regulation. In this study, we attempt to distinguish these two types of regulation by identifying genetic variants associated with mRNA stability of genes (stQTLs). Results Here, we presented a computational framework that takes advantage of recently developed methods to infer the mRNA stability of genes based on RNA-seq data and performed association analysis to identify stQTLs. Using the Genotype-Tissue Expression (GTEx) lung RNA-Seq data, we identified a total of 142,801 stQTLs for 3942 genes and 186,132 eQTLs for 4751 genes from 15,122,700 genetic variants for 13,476 genes on the autosomes, respectively. Interestingly, our results indicated that stQTLs were enriched in the CDS and 3’UTR regions, while eQTLs are enriched in the CDS, 3’UTR, 5’UTR, and upstream regions. We also found that stQTLs are more likely than eQTLs to overlap with RNA binding protein (RBP) and microRNA (miRNA) binding sites. Our analyses demonstrate that simultaneous identification of stQTLs and eQTLs can provide more mechanistic insight on the association between genetic variants and gene expression levels.
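At its core, each stQTL test associates a per-sample stability score with genotype dosage. The minimal sketch below fits an ordinary least-squares slope and its t statistic for one gene-variant pair; the function is hypothetical, and real QTL pipelines typically also handle covariates, many variants per gene, and multiple-testing correction, none of which is shown.

```python
import math
from statistics import mean


def association(dosages, phenotype):
    """OLS slope and t statistic for phenotype ~ genotype dosage (0, 1, 2).

    No covariates and no multiple-testing correction: a toy version of a single test.
    """
    n = len(dosages)
    mx, my = mean(dosages), mean(phenotype)
    sxx = sum((x - mx) ** 2 for x in dosages)
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosages, phenotype))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(dosages, phenotype))
    se = math.sqrt(rss / (n - 2) / sxx)  # standard error of the slope
    return slope, slope / se


# Hypothetical variant: each alternative allele raises the stability score by about 0.5
slope, t_stat = association([0, 0, 1, 1, 2, 2], [0.1, -0.1, 0.5, 0.7, 1.1, 0.9])
print(round(slope, 3), round(t_stat, 2))
```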
... Additionally, the commonly used RNA-Seq of total or Poly-A RNAs (without metabolic labelling) has also been reported to contain a non-negligible proportion of intronic reads in both bulk [7,8] and single-cell samples [9]. Increasing evidence suggests that such intronic reads represent the level of pre-mRNAs and are hence informative for revealing transcriptional regulation [8,10] and RNA kinetic parameters [9,11]. One prominent use of these intronic reads is to estimate the kinetic rates in single cells, followed by quantification of RNA velocity (the time derivative of the mature RNAs), which greatly aids in inferring the future state of each cell on a time scale of hours [9]. ...
... Therefore, we quantify the (relative) splicing efficiency via the stochastic mode in scVelo because of its simpler model assumptions. Unsurprisingly, we found that this quantification is highly consistent with two other tools: velocyto [9], another RNA velocity method with similar settings, and INSPEcT- [11], an RNA kinetics estimator that treats scRNA-seq data in a bulk manner (Pearson's R=0.97 for velocyto and Pearson's R=0.667 for INSPEcT-; Supp. Fig. S3). ...
... This is because the observed spliced and unspliced reads jointly reflect the kinetic rates and their latent induction or suppression time [12]. On the other hand, the relative splicing efficiency, i.e., the ratio between splicing and degradation rates, is more straightforward to obtain, as it is proportional to the spliced RNAs [11]. This relative value is also an important indicator of the RNA processes and a key element of the RNA velocity analysis. ...
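The link between velocity and relative splicing efficiency can be made concrete with the steady-state model used by velocyto and by scVelo's steady-state mode: du/dt = α - βu and ds/dt = βu - γs, so velocity is ds/dt = β(u - γ's) with γ' = γ/β. At steady state u = γ's, hence β/γ = s/u, the relative splicing efficiency mentioned in this passage. The sketch fits γ' by least squares through the origin over all cells; scVelo restricts the fit to cells in the extreme quantiles, and its stochastic mode adds second-order moments, neither of which is reproduced. Function names are hypothetical.

```python
def fit_gamma_rel(unspliced, spliced):
    """Least-squares slope through the origin of u on s: gamma' = gamma / beta."""
    num = sum(u * s for u, s in zip(unspliced, spliced))
    den = sum(s * s for s in spliced)
    return num / den


def velocity(unspliced, spliced, gamma_rel):
    """ds/dt = beta * u - gamma * s, expressed in units of beta: u - gamma' * s."""
    return [u - gamma_rel * s for u, s in zip(unspliced, spliced)]


# Hypothetical cells lying on the steady-state line u = 0.25 * s
spliced, unspliced = [4.0, 8.0, 12.0], [1.0, 2.0, 3.0]
g = fit_gamma_rel(unspliced, spliced)
print(g, 1 / g)                   # 0.25 4.0  (4.0 = relative splicing efficiency beta/gamma)
print(velocity([3.0], [4.0], g))  # [2.0]: excess unspliced RNA, i.e. induction
```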
Preprint
RNA splicing is a key step of gene expression in higher organisms. Accurate quantification of the two-step splicing kinetics is of great interest not only for understanding the regulatory machinery, but also for estimating RNA velocity in single cells. However, the kinetic rates remain poorly understood due to the intrinsically low content of unspliced RNAs and their stochasticity across contexts. Here, we estimated the relative splicing efficiency across a variety of single-cell RNA-Seq data with scVelo. We further extracted three large feature sets, including 92 basic genomic sequence features, 65,536 octamers and 120 RNA binding protein features, and found that they are highly predictive of RNA splicing efficiency across multiple tissues in human and mouse. A set of important features with strong regulatory potential on splicing efficiency was identified. This predictive power brings promise to reveal the complexity of RNA processing and to enhance the estimation of single-cell RNA velocity.
... Profiling of transcriptional maps by high-throughput sequencing is currently considered routine, and public repositories include thousands of RNA-seq experiments that enable both absolute and comparative gene expression quantification [2]. However, RNAs being transient species, the mere quantification of a transcript's copy number is poorly informative of the underlying dynamics and could actually lead to misleading conclusions [3]. Taking a transcript's abundance as a direct measurement of its corresponding gene's transcriptional activity is a widespread oversimplification. ...
... This set of information would allow a fuller understanding of the expression status of a given gene in a given biological condition. First, this information would reveal the close link between a gene's transcriptional activity and its actual transcriptional output, and show that, at steady state, the population of its RNAs is only apparently static [3]. In fact, the tight balance between RNA production and decay results in a constant flow of genetic information, comparable to the constant flow of water in and out of a sink where both the faucet and the drain are open. ...
... Second, this information would unveil a cell's ability to modulate gene expression levels at a given speed [3]. For example, if the faucet were suddenly closed, the time the sink would take to become empty would be entirely dictated by the size of the drain, i.e. the rate of degradation of a given RNA species. ...
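The faucet-and-drain analogy reduces to first-order kinetics: with synthesis α and degradation rate k, the steady state is α/k, and after the faucet closes M(t) = M0·exp(-kt), with half-life ln 2 / k. The sketch below (rates per hour, values hypothetical) compares two genes with the same steady-state level but tenfold different turnover: their RNA pools look identical at steady state, yet they empty on very different time scales.

```python
import math


def half_life(k_deg: float) -> float:
    """Half-life of a species decaying at first-order rate k_deg."""
    return math.log(2) / k_deg


def mature_after_shutoff(m0: float, k_deg: float, t: float) -> float:
    """Faucet closed at t = 0: only the drain acts, so M(t) = M0 * exp(-k_deg * t)."""
    return m0 * math.exp(-k_deg * t)


# Two hypothetical genes with the same steady state M = alpha / k_deg = 10
fast = {"alpha": 10.0, "k_deg": 1.0}   # high turnover
slow = {"alpha": 1.0, "k_deg": 0.1}    # low turnover
for gene in (fast, slow):
    m_ss = gene["alpha"] / gene["k_deg"]
    print(m_ss, round(half_life(gene["k_deg"]), 2),
          round(mature_after_shutoff(m_ss, gene["k_deg"], t=2.0), 2))
# 10.0 0.69 1.35
# 10.0 6.93 8.19
```

The same half-life also sets how quickly each gene approaches a new steady state after its synthesis rate changes, which is the responsiveness this passage refers to.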
Article
Despite gene expression programs being notoriously complex, RNA abundance is usually taken as a proxy for transcriptional activity. Recently developed approaches, able to disentangle transcriptional and post-transcriptional regulatory processes, have revealed a more complex scenario. It is now possible to work out how synthesis, processing and degradation kinetic rates collectively determine the abundance of each gene's RNA. It has become clear that the same transcriptional output can correspond to different combinations of the kinetic rates. This underscores the fact that markedly different modes of gene expression regulation exist, each with profound effects on a gene's ability to modulate its own expression. This review describes the development of the experimental and computational approaches, including RNA metabolic labeling and mathematical modeling, that have been revealing the mechanisms underlying complex transcriptional programs. Current limitations and future perspectives in the field are also discussed.
... In the last few years, several tools have been proposed to infer KRs from experimental data; of these, DRiLL (Rabani et al., 2014) and INSPEcT (de Pretis et al., 2015; Furlan et al., 2020) characterise all the crucial steps of the RNA life cycle from sequencing data. These tools are based on least-squares estimation, and each gene is assumed to be independent of the others. ...
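The gene-by-gene least-squares idea can be illustrated on the simplest possible case. The sketch assumes hypothetical, noise-free decay curves after transcriptional shutoff and fits each gene separately on the log scale. DRiLL and INSPEcT fit much richer models of synthesis, processing and degradation, so this shows only the independence assumption and the least-squares principle, not either tool's procedure.

```python
import math


def fit_decay_rate(times, levels):
    """Least-squares fit of log(level) = log(M0) - k * t; returns k for one gene."""
    logs = [math.log(y) for y in levels]
    n = len(times)
    t_bar, l_bar = sum(times) / n, sum(logs) / n
    sxx = sum((t - t_bar) ** 2 for t in times)
    sxy = sum((t - t_bar) * (l - l_bar) for t, l in zip(times, logs))
    return -sxy / sxx


def fit_all_genes(times, table):
    """Genes are treated as independent: one separate fit per gene."""
    return {gene: fit_decay_rate(times, levels) for gene, levels in table.items()}


times = [0.0, 1.0, 2.0, 4.0]
table = {  # hypothetical, noise-free time courses after transcriptional shutoff
    "geneA": [100 * math.exp(-0.5 * t) for t in times],
    "geneB": [80 * math.exp(-0.1 * t) for t in times],
}
print({g: round(k, 3) for g, k in fit_all_genes(times, table).items()})
# {'geneA': 0.5, 'geneB': 0.1}
```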
... ; https://doi.org/10.1101/2020 beyond the scope of this manuscript. ...
... analysis of the clusters results in meaningful sets of terms in the context of MYC biology, and which are in conceptual agreement with the shape of the responses. This is particularly true for the synthesis rate, which is the most informative regulatory layer in this specific biological system. ...
Preprint
We propose a hierarchical Bayesian approach to infer the RNA synthesis, processing, and degradation rates from sequencing data. We parametrise kinetic rates with novel functional forms and estimate the parameters through a Dirichlet process defined at a low level of hierarchy. Despite the complexity of this approach, we manage to perform inference, clusterisation and model selection simultaneously. We apply our method to investigate transcriptional and post-transcriptional responses of murine fibroblasts to the activation of proto-oncogene MYC. We uncover a widespread choral regulation of the three rates, which was not previously observed in this biological system.
... Both SLAM-seq and TUC-seq have comparable conversion rates (>90%), while lower values were reported for TLS-seq (around 80%); however, neither a direct comparison using the same starting material under the same conditions nor a systematic analysis of RNA kinetics inferred from each protocol has yet been performed. Although different bioinformatics approaches have been used to quantify kinetic rates with [7,9,10] or without the profiling of nascent RNA [21], at the time of writing this manuscript only two open-source software tools were readily available for use with the latest nucleotide conversion methods. The SLAM-DUNK pipeline provides overall conversion rates and was used to analyze SLAM-seq data from 3'-end sequencing (Quant-seq) [12,16,19,22]. ...
... Here, we showed a high efficiency of the streptavidin purification and negligible contamination of the biotin-enriched fraction. As recently advocated, contamination should be assessed in every experiment [21]. ...
Article
Metabolic labeling of newly transcribed RNAs coupled with RNA-seq is being increasingly used for genome-wide analysis of RNA dynamics. Methods including standard biochemical enrichment and recent nucleotide conversion protocols each require special experimental and computational treatment. Despite their immediate relevance, these technologies have not yet been assessed and benchmarked, and no data are currently available to advance reproducible research and the development of better inference tools. Here, we present a systematic evaluation and comparison of four RNA labeling protocols: 4sU-tagging biochemical enrichment, including spike-in RNA controls, SLAM-seq, TimeLapse-seq and TUC-seq. All protocols are evaluated based on practical considerations, conversion efficiency and wet lab requirements to handle hazardous substances. We also compute decay rate estimates and confidence intervals for each protocol using two alternative statistical frameworks, pulseR and GRAND-SLAM, for over 11 600 human genes and evaluate the underlying computational workflows for their robustness and ease of use. Overall, we demonstrate a high inter-method reliability across eight use case scenarios. Our results and data will facilitate reproducible research and serve as a resource contributing to a fuller understanding of RNA biology.
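Nucleotide-conversion protocols infer the new-to-total RNA ratio (NTR) from T>C conversions per read. A minimal sketch, assuming a two-component binomial mixture with known per-position conversion probabilities for labeled and unlabeled RNA (similar in spirit to, but much simpler than, the model used by GRAND-SLAM): an EM loop estimates the NTR and then, assuming steady state and ignoring processing, NTR = 1 - exp(-γt) gives the decay rate. The reads are simulated and all parameter values are hypothetical.

```python
import math
import random


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of k conversions among n convertible positions."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def estimate_ntr(reads, p_new, p_old, iters=200):
    """EM estimate of the new-to-total RNA ratio (NTR) of one gene.

    reads: (k, n) pairs, k = observed T>C conversions, n = convertible positions.
    p_new, p_old: per-position conversion probabilities in labeled and
    unlabeled RNA, treated as known here.
    """
    # Per-read likelihoods under each component do not change across iterations
    lik = [(binom_pmf(k, n, p_new), binom_pmf(k, n, p_old)) for k, n in reads]
    pi = 0.5
    for _ in range(iters):
        # E-step: posterior probability that each read comes from new RNA
        resp = [pi * l_new / (pi * l_new + (1 - pi) * l_old) for l_new, l_old in lik]
        # M-step: the mixing proportion is the mean responsibility
        pi = sum(resp) / len(resp)
    return pi


def decay_rate_from_ntr(ntr: float, t: float) -> float:
    """Steady state, no processing step: NTR = 1 - exp(-gamma * t)."""
    return -math.log(1 - ntr) / t


# Simulated reads for one hypothetical gene (labeling time t = 2 h)
rng = random.Random(0)
true_ntr, n_pos = 0.4, 20
reads = []
for _ in range(2000):
    p = 0.05 if rng.random() < true_ntr else 0.001
    reads.append((sum(rng.random() < p for _ in range(n_pos)), n_pos))
ntr = estimate_ntr(reads, p_new=0.05, p_old=0.001)
print(round(ntr, 3), round(decay_rate_from_ntr(ntr, t=2.0), 3))
```

Reads without conversions are ambiguous (a new read with 20 convertible positions at p = 0.05 shows none about 36% of the time), which is why a mixture model is used instead of simply counting converted reads.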
... LPS, TNF-α in myeloid cells or fibroblasts) (265)(266)(267). Even in the absence of metabolic labeling, recent mathematical models can accurately estimate mRNA degradation rates from total RNA-seq datasets (268,269). These analyses showed that in cells stimulated with LPS or TNF-α, the rise in mRNA levels induced by pro-inflammatory stimuli is mainly due to a global increase in transcription, with globally constant mRNA degradation rates (265,266). ...
Article
Innate immunity is the frontline of defense against infections and tissue damage. It is a fast and semi-specific response involving a myriad of processes essential for protecting the organism. These reactions promote the clearance of danger by activating, among others, an inflammatory response, the complement cascade and by recruiting the adaptive immunity. Any disequilibrium in this functional balance can lead to either inflammation-mediated tissue damage or defense inefficiency. A dynamic and coordinated gene expression program lies at the heart of the innate immune response. This expression program varies depending on the cell-type and the specific danger signal encountered by the cell and involves multiple layers of regulation. While these are achieved mainly via transcriptional control of gene expression, numerous post-transcriptional regulatory pathways involving RNA-binding proteins (RBPs) and other effectors play a critical role in its fine-tuning. Alternative splicing, translational control and mRNA stability have been shown to be tightly regulated during the innate immune response and participate in modulating gene expression in a global or gene specific manner. More recently, microRNAs assisting RBPs and post-transcriptional modification of RNA bases are also emerging as essential players of the innate immune process. In this review, we highlight the numerous roles played by specific RNA-binding effectors in mediating post-transcriptional control of gene expression to shape innate immunity.
Article
In eukaryotes, RNA is synthesised in the nucleus, spliced, and exported to the cytoplasm where it is translated and finally degraded. Any of these steps could be subject to temporal regulation during the circadian cycle, resulting in daily fluctuations of RNA accumulation and affecting the distribution of transcripts in different subcellular compartments. Our study analysed the nuclear and cytoplasmic, poly(A) and total transcriptomes of mouse livers collected over the course of a day. These data provide a genome-wide temporal inventory of enrichment in subcellular RNA, and revealed specific signatures of splicing, nuclear export and cytoplasmic mRNA stability related to transcript and gene lengths. Combined with a mathematical model describing rhythmic RNA profiles, we could test the rhythmicity of export rates and cytoplasmic degradation rates of approximately 1400 genes. With nuclear export times usually much shorter than cytoplasmic half-lives, we found that nuclear export contributes to the modulation and generation of rhythmic profiles of 10% of the cycling nuclear mRNAs. This study contributes to a better understanding of the dynamic regulation of the transcriptome during the day-night cycle.
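Why cytoplasmic degradation shapes rhythmic profiles can be seen in the simplest linear case. The sketch assumes constant export (k_exp) and cytoplasmic degradation (k_deg) rates and a rhythmic nuclear pool; this is our simplification, not the study's model, which also tests rhythmic export and degradation rates. Solving the linear ODE gives a cytoplasmic rhythm whose relative amplitude is damped by k_deg / sqrt(k_deg^2 + ω^2) and delayed by atan(ω/k_deg)/ω, so long-lived transcripts show flatter, later peaks. The function name and half-life values are hypothetical.

```python
import math


def cytoplasmic_rhythm(rel_amp_nuclear: float, k_deg: float, period: float = 24.0):
    """Relative amplitude and phase delay (h) of cytoplasmic mRNA C(t).

    Assumed model: dC/dt = k_exp * N(t) - k_deg * C, with constant k_exp and k_deg
    and a rhythmic nuclear pool N(t) = N0 * (1 + a * cos(w * t)). The periodic
    solution has relative amplitude a * k_deg / sqrt(k_deg**2 + w**2) and lags
    N(t) by atan(w / k_deg) / w hours; k_exp only sets the mean level.
    """
    w = 2 * math.pi / period
    rel_amp = rel_amp_nuclear * k_deg / math.hypot(k_deg, w)
    delay_h = math.atan2(w, k_deg) / w
    return rel_amp, delay_h


for t_half in (1.0, 6.0, 24.0):  # hypothetical cytoplasmic half-lives in hours
    amp, delay = cytoplasmic_rhythm(0.5, math.log(2) / t_half)
    print(t_half, round(amp, 3), round(delay, 2))
```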
Chapter
The RNA abundance of each gene is determined by its rates of transcription and RNA decay. Biochemical experiments that measure these rates, including transcription inhibition and metabolic labelling, are challenging to perform and are largely limited to in vitro settings. Most transcriptomic studies have focused on analyzing changes in RNA abundances without attributing those changes to transcriptional or posttranscriptional regulation. Estimating differential transcription and decay rates of RNA molecules would enable the identification of regulatory factors, such as transcription factors, RNA binding proteins, and microRNAs, that govern large-scale shifts in RNA expression. Here, we describe a protocol for estimating differential stability of RNA molecules between conditions using standard RNA-sequencing data, without the need for transcription inhibition or metabolic labeling. We apply this protocol to in vivo RNA-seq data from individuals with Alzheimer's disease and demonstrate how estimates of differential stability can be leveraged to infer the regulatory factors underlying them.
Article
Motivation The RNA splicing efficiency is of great interest both for understanding the regulatory machinery of gene expression and for estimating RNA velocity in single cells. However, its genomic regulation and stochasticity across contexts remain poorly understood. Results Here, by leveraging a recent RNA velocity tool, we estimated the relative splicing efficiency across a variety of single-cell RNA-Seq data sets. We further extracted large sets of genomic features and 120 RNA binding protein features and found that they are highly predictive of relative RNA splicing efficiency across multiple tissues and organs in human and mouse. This predictive power brings promise to reveal the complexity of RNA processing and to enhance the analysis of single-cell transcription activities. Availability and implementation In order to ensure reproducibility, all preprocessed data sets and scripts used for the prediction and figure generation are publicly available at https://doi.org/10.5281/zenodo.6513669. Supplementary information Supplementary data are available at Bioinformatics online.
Preprint
Background: Expression quantitative trait loci (eQTLs) analyses have been widely used to identify genetic variants associated with gene expression levels to understand what molecular mechanisms underlie genetic traits. The resultant eQTLs might affect the expression of associated genes through transcriptional or post-transcriptional regulation. In this study, we attempt to distinguish these two types of regulation by identifying genetic variants associated with mRNA stability of genes (stQTLs). Results: Here, we presented a computational framework that takes advantage of recently developed methods to infer the mRNA stability of genes based on RNA-seq data and performed association analysis to identify stQTLs. Using the Genotype-Tissue Expression (GTEx) lung RNA-Seq data, we identified a total of 142,801 stQTLs for 3,942 genes and 186,132 eQTLs for 4,751 genes from 15,122,700 genetic variants for 13,476 genes, respectively. Interestingly, our results indicated that stQTLs were enriched in the CDS and 3’UTR regions, while eQTLs are enriched in the CDS, 3’UTR, 5’UTR, and upstream regions. We also found that stQTLs are more likely than eQTLs to overlap with RNA binding protein (RBP) and microRNA (miRNA) binding sites. Our analyses demonstrate that simultaneous identification of stQTLs and eQTLs can provide more mechanistic insight on the association between genetic variants and gene expression levels.
Preprint
Expression quantitative trait loci (eQTLs) analyses have been widely used to identify genetic variants associated with gene expression levels to understand what molecular mechanisms underlie genetic traits. The resultant eQTLs might affect the expression of associated genes through transcriptional or post-transcriptional regulation. In this study, we attempt to distinguish these two types of regulation by identifying genetic variants associated with mRNA stability of genes (stQTLs). Specifically, we computationally inferred mRNA stability of genes based on RNA-seq data and performed association analysis to identify stQTLs. Using the Genotype-Tissue Expression (GTEx) lung RNA-Seq data, we identified a total of 142,801 stQTLs for 3,942 genes and 186,132 eQTLs for 4,751 genes from 15,122,700 genetic variants for 13,476 genes, respectively. Interestingly, our results indicated that stQTLs were enriched in the CDS and 3’UTR regions, while eQTLs are enriched in the CDS, 3’UTR, 5’UTR, and upstream regions. We also found that stQTLs are more likely than eQTLs to overlap with RNA binding protein (RBP) and microRNA (miRNA) binding sites. Our analyses demonstrate that simultaneous identification of stQTLs and eQTLs can provide more mechanistic insight on the association between genetic variants and gene expression levels. Author Summary In the past decade, many studies have identified genetic variants associated with gene expression level (eQTLs) in different phenotypes, including tissues and diseases. Gene expression is the result of cooperation between transcriptional regulation, such as transcriptional activity, and post-transcriptional regulation, such as mRNA stability. Here, we present a computational framework that takes advantage of recently developed methods to estimate mRNA stability from RNA-Seq, which is widely used to estimate gene expression, and then to identify genetic variants associated with mRNA stability (stQTLs) in lung tissue.
Compared to eQTLs, we found that genetic variants that affect mRNA stability are more significantly enriched in the CDS and 3’UTR regions, which are known to interact with RNA-binding proteins (RBPs) or microRNAs to regulate stability. In addition, stQTLs are significantly more likely to overlap the binding sites of RBPs. We show that the six RBPs that most significantly bind to stQTLs are all known to regulate mRNA stability. This pipeline of simultaneously identifying eQTLs and stQTLs using only RNA-Seq data can provide higher resolution than traditional eQTL studies to better understand the molecular mechanisms by which genetic variants regulate gene expression.
Article
Public repositories of large-scale omics datasets represent a valuable resource for researchers. In fact, data re-analysis can either answer novel questions or provide critical data able to complement in-house experiments. However, despite the development of standards for the compilation of metadata, the identification and organization of samples still constitutes a major bottleneck hampering data reuse. We introduce Onassis, an R package within the Bioconductor environment providing key functionalities of Natural Language Processing (NLP) tools. Leveraging biomedical ontologies, Onassis greatly simplifies the association of samples from large-scale repositories to their representation in terms of ontology-based annotations. Moreover, through the use of semantic similarity measures, Onassis hierarchically organizes the datasets of interest, thus supporting the semantically aware analysis of the corresponding omics data. In conclusion, Onassis leverages NLP techniques, biomedical ontologies, and the R statistical framework, to identify, relate, and analyze datasets from public repositories. The tool was tested on various large-scale datasets, including compendia of gene expression, histone marks, and DNA methylation, illustrating how it can facilitate the integrative analysis of various omics data.
Article
RNA velocity has opened up new ways of studying cellular differentiation in single-cell RNA-sequencing data. It describes the rate of gene expression change for an individual gene at a given time point based on the ratio of its spliced and unspliced messenger RNA (mRNA). However, errors in velocity estimates arise if the central assumptions of a common splicing rate and the observation of the full splicing dynamics with steady-state mRNA levels are violated. Here we present scVelo, a method that overcomes these limitations by solving the full transcriptional dynamics of splicing kinetics using a likelihood-based dynamical model. This generalizes RNA velocity to systems with transient cell states, which are common in development and in response to perturbations. We apply scVelo to disentangling subpopulation kinetics in neurogenesis and pancreatic endocrinogenesis. We infer gene-specific rates of transcription, splicing and degradation, recover each cell’s position in the underlying differentiation processes and detect putative driver genes. scVelo will facilitate the study of lineage decisions and gene regulation. scVelo reconstructs transient cell states and differentiation pathways from single-cell RNA-sequencing data.
Article
The abundance of RNA species and their response to perturbations are set by the kinetic rates of RNA synthesis, processing, and degradation. However, the visualization, interpretation, and manipulation of these data require familiarity with mathematical modeling and command line tools. INSPEcT-GUI is an R-Shiny interface that allows researchers without specific training to effortlessly explore how the fine kinetic regulation of the RNA life cycle can shape gene expression programs. In particular, it allows users to: (i) interactively visualize gene-level RNA dynamics; (ii) refine the model fit of experimental data; (iii) test alternative regulatory models; (iv) explore, independently of the availability of data, how the combined action of the RNA kinetic rates impacts premature and mature RNA. INSPEcT-GUI is freely available within the R/Bioconductor package INSPEcT at http://bioconductor.org/packages/INSPEcT/. An HTML vignette including documentation on the tool startup and usage, executable examples, and a video demonstration, are available at: http://bioconductor.org/packages/release/bioc/vignettes/INSPEcT/inst/doc/INSPEcT_GUI.html.
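Point (iv) can be mimicked outside the GUI with a short simulation. The sketch below is a hypothetical Python stand-in, not INSPEcT-GUI code (which is R-Shiny): it integrates the first-order premature/mature model with forward Euler after a step change in synthesis, showing that premature RNA relaxes at the processing rate while mature RNA lags, here at a pace set mainly by the slower degradation rate.

```python
def simulate_step(alpha_before, alpha_after, k2, k3, t_end=10.0, dt=1e-3):
    """Forward-Euler simulation of premature (P) and mature (M) RNA after alpha changes at t = 0.

    Model: dP/dt = alpha - k2 * P, dM/dt = k2 * P - k3 * M, starting from the
    steady state of alpha_before. Returns a list of (t, P, M) tuples.
    """
    p, m = alpha_before / k2, alpha_before / k3
    trajectory = [(0.0, p, m)]
    for i in range(1, int(round(t_end / dt)) + 1):
        dp = alpha_after - k2 * p
        dm = k2 * p - k3 * m
        p, m = p + dt * dp, m + dt * dm
        trajectory.append((i * dt, p, m))
    return trajectory


# Hypothetical gene: synthesis doubles; premature RNA settles faster than mature RNA
traj = simulate_step(alpha_before=10.0, alpha_after=20.0, k2=2.0, k3=0.5)
for t, p, m in traj[::2000]:
    print(f"t={t:4.1f}  P={p:6.2f}  M={m:6.2f}")
```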
Article
It has been known for a few decades that transcripts can be marked by dozens of different modifications. Yet, we are just at the beginning of charting these marks and understanding their functional impact. High-quality methods were developed for the profiling of some of these marks, and approaches to finely study their impact on specific phases of the RNA life-cycle are available, including RNA metabolic labeling. Thanks to these improvements, the most abundant marks, including N6-methyladenosine, are emerging as important determinants of the fate of marked RNAs. However, we still lack approaches to directly study how the set of marks on a given RNA molecule shapes its fate. In this perspective, we first review current leading approaches in the field. Then, we propose an experimental and computational setup, based on direct RNA sequencing and mathematical modeling, to decipher the functional consequences of RNA modifications on the fate of individual RNA molecules and isoforms.
Article
Massively parallel RNA sequencing (RNA-seq) in combination with metabolic labeling has become the de facto standard approach to study alterations in RNA transcription, processing or decay. Despite advances in experimental protocols and techniques, every experimentalist needs to specify the key aspects of experimental design: for example, which protocol should be used (biochemical separation vs. nucleotide conversion) and what is the optimal labeling time? In this work, we provide approximate answers to these questions using the asymptotic theory of optimal design. Specifically, we investigate how the variance of degradation rate estimates depends on the labeling time and derive the optimal time for any given degradation rate. Subsequently, we show that an increase in sample numbers should be preferred over an increase in sequencing depth. Lastly, we provide some guidance on the use cases in which laborious biochemical separation outcompetes recent nucleotide conversion-based methods (such as SLAMseq) and show how inefficient conversion influences the precision of estimates. Code and documentation can be found at https://github.com/dieterich-lab/DesignMetabolicRNAlabeling.
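To make the optimal-labeling-time question concrete, here is a deliberately simplified sketch, not the authors' model: assume each of n reads is independently classified as pre-existing with probability p = e^(-δt) after labeling time t. The Fisher information for δ is then I(δ) = n t² e^(-δt) / (1 - e^(-δt)); writing x = δt, it is proportional to x² / (e^x - 1), which peaks where x = 2(1 - e^(-x)), i.e. at t ≈ 1.59/δ. Conversion efficiency, sequencing errors and normalization are ignored here, and the function names are illustrative.

```python
import math


def fisher_information(delta: float, t: float, n_reads: int = 1) -> float:
    """Fisher information for the degradation rate delta under a binomial model
    in which each read is 'old' with probability exp(-delta * t)."""
    p_old = math.exp(-delta * t)
    dp_ddelta = -t * p_old  # derivative of p_old with respect to delta
    return n_reads * dp_ddelta ** 2 / (p_old * (1.0 - p_old))


def optimal_scaled_time(lo: float = 0.5, hi: float = 5.0, tol: float = 1e-10) -> float:
    """Bisection for the nonzero root of x = 2 * (1 - exp(-x)), where x = delta * t."""
    f = lambda x: x - 2.0 * (1.0 - math.exp(-x))
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(lo) * f(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def optimal_labeling_time(delta: float) -> float:
    """Labeling time that maximizes the Fisher information for a given delta."""
    return optimal_scaled_time() / delta


if __name__ == "__main__":
    print(f"optimal delta*t = {optimal_scaled_time():.4f}")
    # A transcript with a 2 h half-life has delta = ln(2) / 2 per hour.
    delta = math.log(2) / 2.0
    print(f"optimal labeling time = {optimal_labeling_time(delta):.2f} h")
```

Under these assumptions, a transcript with a 2 h half-life would be labeled for roughly 4.6 h; faster-decaying transcripts call for proportionally shorter pulses.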
Article
Single-cell RNA sequencing (scRNA-seq) has highlighted the important role of intercellular heterogeneity in phenotype variability in both health and disease¹. However, current scRNA-seq approaches provide only a snapshot of gene expression and convey little information on the true temporal dynamics and stochastic nature of transcription. A further key limitation of scRNA-seq analysis is that the RNA profile of each individual cell can be analysed only once. Here we introduce single-cell, thiol-(SH)-linked alkylation of RNA for metabolic labelling sequencing (scSLAM-seq), which integrates metabolic RNA labelling², biochemical nucleoside conversion³ and scRNA-seq to record transcriptional activity directly by differentiating between new and old RNA for thousands of genes per single cell. We use scSLAM-seq to study the onset of infection with lytic cytomegalovirus in single mouse fibroblasts. The cell-cycle state and dose of infection deduced from old RNA enable dose–response analysis based on new RNA. scSLAM-seq thereby both visualizes and explains differences in transcriptional activity at the single-cell level. Furthermore, it depicts ‘on–off’ switches and transcriptional burst kinetics in host gene expression with extensive gene-specific differences that correlate with promoter-intrinsic features (TBP–TATA-box interactions and DNA methylation). Thus, gene-specific, and not cell-specific, features explain the heterogeneity in transcriptomes between individual cells and the transcriptional response to perturbations.
Article
Background: Methods to read out naturally occurring or experimentally introduced nucleic acid modifications are emerging as powerful tools to study dynamic cellular processes. The recovery, quantification and interpretation of such events in high-throughput sequencing datasets demand specialized bioinformatics approaches. Results: Here, we present Digital Unmasking of Nucleotide conversions in K-mers (DUNK), a data analysis pipeline enabling the quantification of nucleotide conversions in high-throughput sequencing datasets. We demonstrate using experimentally generated and simulated datasets that DUNK allows constant mapping rates irrespective of nucleotide-conversion rates, promotes the recovery of multimapping reads and employs Single Nucleotide Polymorphism (SNP) masking to uncouple true SNPs from nucleotide conversions to facilitate a robust and sensitive quantification of nucleotide conversions. As a first application, we implement this strategy as SLAM-DUNK for the analysis of SLAMseq profiles, in which 4-thiouridine-labeled transcripts are detected based on T > C conversions. SLAM-DUNK provides both raw counts of nucleotide-conversion-containing reads as well as a base-content and read coverage normalized approach for estimating the fractions of labeled transcripts as readout. Conclusion: Beyond providing a readily accessible tool for analyzing SLAMseq and related time-resolved RNA sequencing methods (TimeLapse-seq, TUC-seq), DUNK establishes a broadly applicable strategy for quantifying nucleotide conversions. Electronic supplementary material: The online version of this article (10.1186/s12859-019-2849-7) contains supplementary material, which is available to authorized users.
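The core counting step can be illustrated with a toy sketch, which is not the DUNK or SLAM-DUNK implementation: assuming reads are already aligned to the forward strand without indels or clipping, count reference-T to read-C mismatches, skip positions listed as known SNPs, and call a read labeled if it carries at least one conversion. The `AlignedRead` class and the function names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class AlignedRead:
    start: int     # 0-based reference position of the first aligned base
    sequence: str  # read bases; assumed gap-free (no indels, no soft clipping)


def count_tc_conversions(read: AlignedRead, reference: str, snp_positions: set[int]) -> int:
    """Count reference-T -> read-C mismatches, skipping masked SNP positions."""
    conversions = 0
    for offset, base in enumerate(read.sequence):
        pos = read.start + offset
        if pos in snp_positions:
            continue  # a genuine T/C SNP must not be counted as a 4sU-induced conversion
        if reference[pos] == "T" and base == "C":
            conversions += 1
    return conversions


def labeled_fraction(reads, reference, snp_positions, min_conversions=1) -> float:
    """Fraction of reads carrying at least `min_conversions` T>C conversions."""
    if not reads:
        return 0.0
    labeled = sum(
        count_tc_conversions(r, reference, snp_positions) >= min_conversions
        for r in reads
    )
    return labeled / len(reads)
```

A real pipeline also has to handle reverse-strand reads (A>G on the read), base qualities and multimappers; the fixed `min_conversions` threshold is the simplest possible labeling call.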
Article
N6-methyladenosine (m6A) is the most abundant RNA modification. It has been implicated in the regulation of RNA metabolism, including degradation and translation, in both physiological and disease conditions. A recent study showed that m6A-mediated degradation of key transcripts also plays a role in the control of T cell homeostasis and IL-7-induced differentiation. We re-analyzed the omics data from that study and, through the integrative analysis of total and nascent RNA-seq data, we were able to comprehensively quantify T cell RNA dynamics and how these are affected by m6A depletion. In addition to the expected impact on RNA degradation, we revealed a broader effect of m6A on RNA dynamics, which included the alteration of RNA synthesis and processing. Altogether, the combined action of m6A on all major steps of the RNA life-cycle closely recapitulated the observed changes in the abundance of premature and mature RNA species. Ultimately, our re-analysis extended the findings of the initial study, focused on RNA stability, and proposed a yet unappreciated role for m6A in RNA synthesis and processing dynamics.
Article
RNA abundance is a powerful indicator of the state of individual cells. Single-cell RNA sequencing can reveal RNA abundance with high quantitative accuracy, sensitivity and throughput¹. However, this approach captures only a static snapshot at a point in time, posing a challenge for the analysis of time-resolved phenomena such as embryogenesis or tissue regeneration. Here we show that RNA velocity, the time derivative of the gene expression state, can be directly estimated by distinguishing between unspliced and spliced mRNAs in common single-cell RNA sequencing protocols. RNA velocity is a high-dimensional vector that predicts the future state of individual cells on a timescale of hours. We validate its accuracy in the neural crest lineage, demonstrate its use on multiple published datasets and technical platforms, reveal the branching lineage tree of the developing mouse hippocampus, and examine the kinetics of transcription in human embryonic brain. We expect RNA velocity to greatly aid the analysis of developmental lineages and cellular dynamics, particularly in humans.
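The steady-state idea behind RNA velocity can be sketched as follows, under simplifying assumptions: spliced (s) and unspliced (u) counts for one gene are already normalized, the splicing rate is scaled to 1 so that ds/dt = u - γs, and γ is fit by least squares through the origin on all cells (the published approach fits on extreme quantiles instead). Cells above the fitted line get positive velocity, which suggests the gene is being induced there. Function names are illustrative.

```python
import numpy as np


def fit_gamma(unspliced: np.ndarray, spliced: np.ndarray) -> float:
    """Least-squares slope of u = gamma * s through the origin (steady-state fit)."""
    return float(np.dot(spliced, unspliced) / np.dot(spliced, spliced))


def rna_velocity(unspliced, spliced):
    """Per-cell velocity v = u - gamma * s for a single gene, plus the fitted gamma."""
    u = np.asarray(unspliced, dtype=float)
    s = np.asarray(spliced, dtype=float)
    gamma = fit_gamma(u, s)
    # v > 0: more unspliced RNA than steady state predicts (induction);
    # v < 0: less unspliced RNA than expected (repression).
    return u - gamma * s, gamma
```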
Article
Epitranscriptomic modification of mRNA affects its metabolism and has recently been shown to regulate brain development. Two studies in this issue of Neuron, Koranda et al. (2018) and Engel et al. (2018), uncover dynamic and critical roles of m6A/m RNA modifications in the adult mammalian brain in regulating physiological and stress-induced behaviors.
Article
Global quantification of total RNA is used to investigate steady-state levels of gene expression. However, being able to differentiate pre-existing RNA (synthesized prior to a defined point in time) from newly transcribed RNA can provide invaluable information, e.g. to estimate RNA half-lives or identify fast and complex regulatory processes. Recently, new techniques based on metabolic labeling and RNA-seq have emerged that allow new and old RNA to be quantified: nucleoside analogs are incorporated into newly transcribed RNA and are made detectable as point mutations in mapped reads. However, relatively infrequent incorporation events and significant sequencing error rates make the differentiation between old and new RNA a highly challenging task. We developed a statistical approach termed GRAND-SLAM that, for the first time, allows estimation of the proportion of old and new RNA in such an experiment. Uncertainty in the estimates is quantified in a Bayesian framework. Simulation experiments show our approach to be unbiased and highly accurate. Furthermore, we analyze how uncertainty in the proportion translates into uncertainty in estimating RNA half-lives and give guidelines for planning experiments. Finally, we demonstrate that our estimates of RNA half-lives compare favorably to other experimental approaches and that biological processes affecting RNA half-lives can be investigated with greater power than offered by any other method. GRAND-SLAM is freely available for non-commercial use at http://software.erhard-lab.de; R scripts to generate all figures are available at zenodo (doi: 10.5281/zenodo.1162340).
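The step from new/old proportions to half-lives can be illustrated with the standard first-order model, assuming steady state and labeling that starts at time 0: old RNA decays as e^(-δt), so a new-to-total ratio ν after labeling time t gives δ = -ln(1 - ν)/t and t½ = ln 2/δ. This is a simplified sketch, not GRAND-SLAM's code. Because t½ decreases monotonically in ν, the endpoints of a credible interval for ν map directly onto an interval for t½, which is one simple way to see how uncertainty in the proportion propagates.

```python
import math


def degradation_rate_from_ntr(ntr: float, labeling_time: float) -> float:
    """Degradation rate delta from the new-to-total ratio (steady state, first-order decay)."""
    if not 0.0 < ntr < 1.0:
        raise ValueError("ntr must lie strictly between 0 and 1")
    return -math.log(1.0 - ntr) / labeling_time


def half_life_from_ntr(ntr: float, labeling_time: float) -> float:
    """Half-life in the same time unit as labeling_time."""
    return math.log(2) / degradation_rate_from_ntr(ntr, labeling_time)


def half_life_interval(ntr_low: float, ntr_high: float, labeling_time: float):
    """Map an interval for the ratio onto an interval for the half-life.
    Higher ntr means faster turnover, so the bounds swap."""
    return (half_life_from_ntr(ntr_high, labeling_time),
            half_life_from_ntr(ntr_low, labeling_time))
```

For example, half of the RNA being new after a 2 h pulse corresponds to a 2 h half-life under these assumptions.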
Article
Cell type-specific transcriptome analysis is an essential tool in understanding biological processes in which diverse types of cells are involved. Although cell isolation methods such as fluorescence-activated cell sorting (FACS) in combination with transcriptome analysis have widely been used so far, their time-consuming and harsh procedures limit their applications. Here, we report a novel in vivo metabolic RNA sequencing method, SLAM-ITseq, which metabolically labels RNA with 4-thiouracil in a specific cell type in vivo followed by detection through an RNA-seq-based method that specifically distinguishes the thiolated uridine by base conversion. This method has successfully identified the cell type-specific transcriptome in three different tissues: endothelial cells in brain, epithelial cells in intestine, and adipocytes in white adipose tissue. Since this method does not require isolation of cells or RNA prior to the transcriptomic analysis, SLAM-ITseq provides an easy yet accurate snapshot of the transcriptional state in vivo.
Article
The gaseous plant hormone ethylene regulates a multitude of growth and developmental processes. How the numerous growth control pathways are coordinated by the ethylene transcriptional response remains elusive. We characterized the dynamic ethylene transcriptional response by identifying targets of the master regulator of the ethylene signaling pathway, ETHYLENE INSENSITIVE3 (EIN3), using chromatin immunoprecipitation sequencing and transcript sequencing during a timecourse of ethylene treatment. Ethylene-induced transcription occurs in temporal waves regulated by EIN3, suggesting distinct layers of transcriptional control. EIN3 binding was found to modulate a multitude of downstream transcriptional cascades, including a major feedback regulatory circuitry of the ethylene signaling pathway, as well as integrating numerous connections between most of the hormone mediated growth response pathways. These findings provide direct evidence linking each of the major plant growth and development networks in novel ways.
Article
RNA sequencing (RNA-seq) offers a snapshot of cellular RNA populations, but not temporal information about the sequenced RNA. Here we report TimeLapse-seq, which uses oxidative-nucleophilic-aromatic substitution to convert 4-thiouridine into cytidine analogs, yielding apparent U-to-C mutations that mark new transcripts upon sequencing. TimeLapse-seq is a single-molecule approach that is adaptable to many applications and reveals RNA dynamics and induced differential expression concealed in traditional RNA-seq.
Article
The turnover of RNA molecules is determined by the rates of transcription and RNA degradation. Several methods have been developed to study RNA turnover since the beginnings of molecular biology. Here we summarize the main methods to measure RNA half-life: transcription inhibition, gene control, and metabolic labelling. These methods were used to detect the cellular activity of the mRNA degradation machinery, including the exo-ribonuclease Xrn1 and the exosome. On the other hand, the study of the differential stability of mature RNAs has been hampered by the fact that different methods have often yielded inconsistent results. Recent advances in the systematic comparison of different method variants in yeast have permitted the identification of the least invasive methodologies that reflect half-lives the most faithfully, which is expected to open the way for a consistent quantitative analysis of the determinants of mRNA stability.
Article
The abundance of mRNA is mainly determined by the rates of RNA transcription and decay. Here, we present a method for unbiased estimation of differential mRNA decay rate from RNA-sequencing data by modeling the kinetics of mRNA metabolism. We show that in all primary human tissues tested, and particularly in the central nervous system, many pathways are regulated at the mRNA stability level. We present a parsimonious regulatory model consisting of two RNA-binding proteins and four microRNAs that modulate the mRNA stability landscape of the brain, which suggests a new link between RBFOX proteins and Alzheimer's disease. We show that downregulation of RBFOX1 leads to destabilization of mRNAs encoding for synaptic transmission proteins, which may contribute to the loss of synaptic function in Alzheimer's disease. RBFOX1 downregulation is more likely to occur in older and female individuals, consistent with the association of Alzheimer's disease with age and gender.
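A minimal version of the exon/intron logic can be written down under a steady-state assumption; this is not the published method, which adds bias corrections. Intronic reads track synthesis α and exonic reads track α/δ, so the change in log2 exonic signal minus the change in log2 intronic signal estimates -Δlog2 δ, i.e. the log2 change in mRNA stability between conditions A and B. Inputs are assumed to be library-size-normalized counts per gene, and the function name is hypothetical.

```python
import numpy as np


def differential_stability(exon_a, intron_a, exon_b, intron_b, pseudocount=1.0):
    """log2 change in mRNA stability (B vs A), i.e. -log2 fold change of the decay rate.

    Assumes steady state, intronic signal proportional to synthesis and exonic
    signal proportional to synthesis / decay rate.
    """
    exon_a, intron_a, exon_b, intron_b = (
        np.asarray(x, dtype=float) + pseudocount
        for x in (exon_a, intron_a, exon_b, intron_b)
    )
    delta_exon = np.log2(exon_b / exon_a)        # change in mature mRNA
    delta_intron = np.log2(intron_b / intron_a)  # change in synthesis proxy
    return delta_exon - delta_intron
```

A gene whose exonic signal doubles while its intronic signal is unchanged scores +1 (stabilized); one where both double scores 0 (change explained by transcription).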
Article
Gene expression profiling by high-throughput sequencing reveals qualitative and quantitative changes in RNA species at steady state but obscures the intracellular dynamics of RNA transcription, processing and decay. We developed thiol(SH)-linked alkylation for the metabolic sequencing of RNA (SLAM seq), an orthogonal-chemistry-based RNA sequencing technology that detects 4-thiouridine (s4U) incorporation in RNA species at single-nucleotide resolution. In combination with well-established metabolic RNA labeling protocols and coupled to standard, low-input, high-throughput RNA sequencing methods, SLAM seq enabled rapid access to RNA-polymerase-II-dependent gene expression dynamics in the context of total RNA. We validated the method in mouse embryonic stem cells by showing that the RNA-polymerase-II-dependent transcriptional output scaled with Oct4/Sox2/Nanog-defined enhancer activity, and we provide quantitative and mechanistic evidence for transcript-specific RNA turnover mediated by post-transcriptional gene regulatory pathways initiated by microRNAs and N6-methyladenosine. SLAM seq facilitates the dissection of fundamental mechanisms that control gene expression in an accessible, cost-effective and scalable manner.
Article
N6-methyladenosine (m6A) is the most common and abundant messenger RNA modification, modulated by 'writers', 'erasers' and 'readers' of this mark. In vitro data have shown that m6A influences all fundamental aspects of mRNA metabolism, mainly mRNA stability, to determine stem cell fates. However, its in vivo physiological function in mammals and adult mammalian cells is still unknown. Here we show that the deletion of the m6A 'writer' protein METTL3 in mouse T cells disrupts T cell homeostasis and differentiation. In a lymphopaenic mouse adoptive transfer model, naive Mettl3-deficient T cells failed to undergo homeostatic expansion and remained in the naive state for up to 12 weeks, thereby preventing colitis. Consistent with these observations, the mRNAs of SOCS family genes encoding the STAT signalling inhibitory proteins SOCS1, SOCS3 and CISH were marked by m6A, exhibited slower mRNA decay and showed increased mRNA and protein levels in Mettl3-deficient naive T cells. This increased SOCS family activity consequently inhibited IL-7-mediated STAT5 activation and T cell homeostatic proliferation and differentiation. We also found that m6A plays an important role in the inducible degradation of Socs mRNAs in response to IL-7 signalling in order to reprogram naive T cells for proliferation and differentiation. Our study elucidates for the first time, to our knowledge, the in vivo biological role of m6A modification in T-cell-mediated pathogenesis and reveals a novel mechanism of T cell homeostasis and signal-dependent induction of mRNA degradation.
Article
The RAF‐MEK‐ERK signalling pathway controls fundamental, often opposing cellular processes such as proliferation and apoptosis. Signal duration has been identified to play a decisive role in these cell fate decisions. However, it remains unclear how the different early and late responding gene expression modules can discriminate short and long signals. We obtained both protein phosphorylation and gene expression time course data from HEK293 cells carrying an inducible construct of the proto‐oncogene RAF. By mathematical modelling, we identified a new gene expression module of immediate–late genes (ILGs) distinct in gene expression dynamics and function. We find that mRNA longevity enables these ILGs to respond late and thus translate ERK signal duration into response amplitude. Despite their late response, their GC‐rich promoter structure suggested and metabolic labelling with 4SU confirmed that transcription of ILGs is induced immediately. A comparative analysis shows that the principle of duration decoding is conserved in PC12 cells and MCF7 cells, two paradigm cell systems for ERK signal duration. Altogether, our findings suggest that ILGs function as a gene expression module to decode ERK signal duration.
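The idea that mRNA longevity converts signal duration into response amplitude can be illustrated with a one-compartment model, which is an assumption here and not the authors' fitted model: during a square signal of duration T, dm/dt = k - δm starting from m(0) = 0, so m(T) = (k/δ)(1 - e^(-δT)). Unstable transcripts saturate within a short signal, whereas long-lived ones keep accumulating, so their amplitude keeps scaling with duration. Function names are illustrative.

```python
import math


def induced_mrna(k_syn: float, delta: float, duration: float) -> float:
    """mRNA accumulated during a square synthesis pulse, starting from zero."""
    return (k_syn / delta) * (1.0 - math.exp(-delta * duration))


def duration_gain(delta: float, short: float, long: float, k_syn: float = 1.0) -> float:
    """Ratio of response amplitudes for a long versus a short signal."""
    return induced_mrna(k_syn, delta, long) / induced_mrna(k_syn, delta, short)


if __name__ == "__main__":
    for half_life_h in (0.25, 8.0):
        delta = math.log(2) / half_life_h
        gain = duration_gain(delta, short=1.0, long=4.0)
        print(f"half-life {half_life_h} h: 4 h vs 1 h signal -> {gain:.2f}-fold")
```

Comparing a 1 h and a 4 h signal, a transcript with an 8 h half-life reaches about 3.5-fold more mRNA under the long signal, while one with a 15 min half-life gains only about 7%.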
Article
To monitor transcriptional regulation in human cells, rapid changes in enhancer and promoter activity must be captured with high sensitivity and temporal resolution. Here, we show that the recently established protocol TT-seq ("transient transcriptome sequencing") can monitor rapid changes in transcription from enhancers and promoters during the immediate response of T cells to ionomycin and phorbol 12-myristate 13-acetate (PMA). TT-seq maps eRNAs and mRNAs every 5 min after T-cell stimulation with high sensitivity and identifies many new primary response genes. TT-seq reveals that the synthesis of 1,601 eRNAs and 650 mRNAs changes significantly within only 15 min after stimulation, when standard RNA-seq does not detect differentially expressed genes. Transcription of enhancers that are primed for activation by nucleosome depletion can occur immediately and simultaneously with transcription of target gene promoters. Our results indicate that enhancer transcription is a good proxy for enhancer regulatory activity in target gene activation, and establish TT-seq as a tool for monitoring the dynamics of enhancer landscapes and transcription programs during cellular responses and differentiation.
Article
Uncontrolled Th17 cell activity is associated with cancer and autoimmune and inflammatory diseases. To validate the potential relevance to human diseases of mouse models targeting the Th17 pathway, we used RNA sequencing to compare the expression of coding and non-coding transcripts during the priming of Th17 cell differentiation in both human and mouse. In addition to already known targets, several transcripts not previously linked to Th17 cell polarization were found in both species. Moreover, a considerable number of human-specific long non-coding RNAs were identified that responded to cytokines stimulating Th17 cell differentiation. We integrated our transcriptomics data with known disease-associated polymorphisms and show that conserved regulation pinpoints genes that are relevant to Th17 cell-mediated human diseases and that can be modelled in mouse. Substantial differences observed in non-coding transcriptomes between the two species as well as increased overlap between Th17 cell-specific gene expression and disease-associated polymorphisms underline the need for parallel analysis of human and mouse models. Comprehensive analysis of genes regulated during Th17 cell priming and their classification as conserved or non-conserved between human and mouse facilitates translational research, pointing out which candidate targets identified in human are worth studying by using in vivo mouse models.
Article
The regulation of miRNAs is critical to the definition of cell identity and behavior in normal physiology and disease. To date, the dynamics of miRNA degradation and the mechanisms involved remain largely obscure, in particular in higher organisms. Here, we developed a pulse-chase approach based on metabolic RNA labeling to calculate miRNA decay rates at genome-wide scale in mammalian cells. Our analysis revealed heterogeneous miRNA half-lives, with many species behaving as stable molecules (T1/2 > 24 h), while others, including passenger miRNAs and a number (25/129) of guide miRNAs, are quickly turned over (T1/2 = 4–14 h). Decay rates were coupled with other features, including genomic organization, transcription rates, structural heterogeneity (IsomiRs) and target abundance, measured through quantitative experimental approaches. This comprehensive analysis highlighted functional mechanisms that mediate miRNA degradation, as well as the importance of decay dynamics in the regulation of the miRNA pool under both steady-state conditions and during cell transitions.
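Decay rates from a pulse-chase design are commonly obtained by fitting first-order decay to the labeled signal over the chase. The sketch below assumes the labeled miRNA signal is already normalized across chase time points and fits log(signal) linearly in time; the 24 h threshold mirrors the cutoff quoted above, while the function names are illustrative and not taken from the study.

```python
import numpy as np


def fit_half_life(times_h, labeled_signal):
    """Fit log(signal) = log(A) - delta * t by least squares; return the half-life in hours."""
    t = np.asarray(times_h, dtype=float)
    y = np.log(np.asarray(labeled_signal, dtype=float))
    slope, _intercept = np.polyfit(t, y, 1)  # degree-1 fit: [slope, intercept]
    delta = -slope
    # No detectable decay over the chase: treat as effectively infinite half-life.
    return np.inf if delta <= 0 else float(np.log(2) / delta)


def classify(half_life_h, stable_threshold_h=24.0):
    """Label a miRNA as stable if its half-life exceeds the threshold."""
    return "stable" if half_life_h > stable_threshold_h else "turned over"
```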
Article
Cellular mRNA levels originate from the combined action of multiple regulatory processes, which can be recapitulated by the rates of pre-mRNA synthesis, pre-mRNA processing, and mRNA degradation. Recent experimental and computational advances set the basis to study these intertwined levels of regulation. Nevertheless, software for the comprehensive quantification of RNA dynamics is still lacking. INSPEcT is an R package for the integrative analysis of RNA- and 4sU-seq data to study the dynamics of transcriptional regulation. INSPEcT provides gene-level quantification of these rates, and a modeling framework to identify which of these regulatory processes are most likely to explain the observed mRNA and pre-mRNA concentrations. Software performance is tested on a synthetic dataset, instrumental to guide the choice of the modeling parameters and the experimental design. INSPEcT is submitted to Bioconductor and is currently available as Additional File 1. Contact: mattia.pelizzola@iit.it. Supplementary information: Supplementary data are available at Bioinformatics online. © The Author (2015). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
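The kinetic scheme described above links premature RNA P and mature RNA M through synthesis (k1), processing (k2) and degradation (k3): dP/dt = k1 - k2·P and dM/dt = k2·P - k3·M. At steady state P = k1/k2 and M = k1/k3, so once k1 is known (e.g. from 4sU-seq), k2 = k1/P and k3 = k1/M. The Python sketch below is not the R package's API; it only checks this relationship by forward-Euler integration. Note that in this model M/P = k2/k3 does not involve k1, so premature and mature levels alone constrain only the ratio of processing to degradation at steady state.

```python
def steady_state_rates(k1: float, premature: float, mature: float):
    """Processing (k2) and degradation (k3) rates at steady state, given synthesis k1."""
    return k1 / premature, k1 / mature


def simulate(k1: float, k2: float, k3: float, t_end: float = 50.0, dt: float = 0.001):
    """Forward-Euler integration of dP/dt = k1 - k2*P, dM/dt = k2*P - k3*M from zero."""
    p = m = 0.0
    for _ in range(round(t_end / dt)):
        dp = k1 - k2 * p
        dm = k2 * p - k3 * m
        p += dp * dt
        m += dm * dt
    return p, m


if __name__ == "__main__":
    p, m = simulate(k1=10.0, k2=2.0, k3=0.5)
    print(f"P = {p:.3f}, M = {m:.3f}")
    print("recovered (k2, k3):", steady_state_rates(10.0, p, m))
```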
Article
Cells control dynamic transitions in transcript levels by regulating transcription, processing, and/or degradation through an integrated regulatory strategy. Here, we combine RNA metabolic labeling, rRNA-depleted RNA-seq, and DRiLL, a novel computational framework, to quantify the level, editing sites, and transcription, processing, and degradation rates of each transcript at splice-junction resolution during the LPS response of mouse dendritic cells. Four key regulatory strategies, dominated by RNA transcription changes, generate most temporal gene expression patterns. Noncanonical strategies that also employ dynamic posttranscriptional regulation control only a minority of genes, but provide unique signal processing features. We validate Tristetraprolin (TTP) as a major regulator of RNA degradation in one noncanonical strategy. Applying DRiLL to the regulation of noncoding RNAs and to zebrafish embryogenesis demonstrates its broad utility. Our study provides a new quantitative approach to discover transcriptional and posttranscriptional events that control dynamic changes in transcript levels using RNA sequencing data. Copyright © 2014 Elsevier Inc. All rights reserved.
Article
MicroRNAs (miRNAs) regulate target mRNAs through a combination of translational repression and mRNA destabilization, with mRNA destabilization dominating at steady state in the few contexts examined globally. Here, we extend the global steady-state measurements to additional mammalian contexts and find that regardless of the miRNA, cell type, growth condition, or translational state, mRNA destabilization explains most (66% to >90%) miRNA-mediated repression. We also determine the relative dynamics of translational repression and mRNA destabilization for endogenous mRNAs as a miRNA is induced. Although translational repression occurs rapidly, its effect is relatively weak, such that by the time consequential repression ensues, the effect of mRNA destabilization dominates. These results imply that consequential miRNA-mediated repression is largely irreversible and provide other insights into the nature of miRNA-mediated regulation. They also simplify future studies, dramatically extending the known contexts and time points for which monitoring mRNA changes captures most of the direct miRNA effects.
Article
The c-myc proto-oncogene product, Myc, is a transcription factor that binds thousands of genomic loci [1]. Recent work suggested that rather than up- and downregulating selected groups of genes [1–3], Myc targets all active promoters and enhancers in the genome (a phenomenon termed 'invasion') and acts as a general amplifier of transcription [4,5]. However, the available data did not readily discriminate between direct and indirect effects of Myc on RNA biogenesis. We addressed this issue with genome-wide chromatin immunoprecipitation and RNA expression profiles during B-cell lymphomagenesis in mice, in cultured B cells and fibroblasts. Consistent with long-standing observations [6], we detected general increases in total RNA or messenger RNA copies per cell (hereafter termed 'amplification') [4,5] when comparing actively proliferating cells with control quiescent cells: this was true whether cells were stimulated by mitogens (requiring endogenous Myc for a proliferative response) [7,8] or by deregulated, oncogenic Myc activity. RNA amplification and promoter/enhancer invasion by Myc were separable phenomena that could occur without one another. Moreover, whether or not associated with RNA amplification, Myc drove the differential expression of distinct subsets of target genes. Hence, although having the potential to interact with all active or poised regulatory elements in the genome [4,5,9–11]
Article
Although transcriptional elongation by RNA polymerase II is coupled with many RNA-related processes, genome-wide elongation rates remain unknown. We describe a method, called 4sUDRB-seq, based on reversible inhibition of transcription elongation coupled with tagging of newly transcribed RNA with 4-thiouridine and high-throughput sequencing, to measure genome-wide transcription elongation rates in cells simultaneously and with high confidence. We find that most genes are transcribed at about 3.5 kb/min, with elongation rates varying between 2 and 6 kb/min. 4sUDRB-seq can facilitate genome-wide exploration of the involvement of specific elongation factors in transcription and the contribution of deregulated transcription elongation to various pathologies.
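Elongation rates in a DRB-release design come down to how far the front of newly transcribed (4sU-labeled) RNA has advanced along a gene at each time point after release. Assuming the front positions have already been called from read coverage (not shown here, and a hypothetical input), the rate is the slope of front position against time, and dividing a gene's length by that rate gives its transit time. This is an illustrative sketch, not the published 4sUDRB-seq analysis; function names are hypothetical.

```python
import numpy as np


def elongation_rate(times_min, front_kb):
    """Slope (kb/min) of a least-squares line through labeled-RNA front positions."""
    slope, _intercept = np.polyfit(np.asarray(times_min, dtype=float),
                                   np.asarray(front_kb, dtype=float), 1)
    return float(slope)


def transit_time_min(gene_length_kb: float, rate_kb_per_min: float) -> float:
    """Minutes for Pol II to traverse a gene at a constant elongation rate."""
    return gene_length_kb / rate_kb_per_min
```

At the typical rate reported above (about 3.5 kb/min), Pol II would need roughly 10 min to traverse a 35 kb gene.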
Article
mRNA synthesis, processing, and destruction involve a complex series of molecular steps that are incompletely understood. Because the RNA intermediates in each of these steps have finite lifetimes, extensive mechanistic and dynamical information is encoded in total cellular RNA. Here we report the development of SnapShot-Seq, a set of computational methods that allow the determination of in vivo rates of pre-mRNA synthesis, splicing, intron degradation, and mRNA decay from a single RNA-Seq snapshot of total cellular RNA. SnapShot-Seq can detect in vivo changes in the rates of specific steps of splicing, and it provides genome-wide estimates of pre-mRNA synthesis rates comparable to those obtained via labeling of newly synthesized RNA. We used SnapShot-Seq to investigate the origins of the intrinsic bimodality of metazoan gene expression levels, and our results suggest that this bimodality is partly due to spillover of transcriptional activation from highly expressed genes to their poorly expressed neighbors. SnapShot-Seq dramatically expands the information obtainable from a standard RNA-Seq experiment.
Article
Sensing and responding to ambient temperature is important for controlling growth and development of many organisms, in part by regulating mRNA levels. mRNA abundance can change with temperature, but it is unclear whether this results from changes in transcription or decay rates, and whether passive or active temperature regulation is involved. Using a base analog labelling method, we directly measured the temperature coefficient, Q10, of mRNA synthesis and degradation rates of the Arabidopsis transcriptome. We show that for most genes, transcript levels are buffered against passive increases in transcription rates by balancing passive increases in the rate of decay. Strikingly, for temperature-responsive transcripts, increasing temperature raises transcript abundance primarily by promoting faster transcription relative to decay and not vice versa, suggesting a global transcriptional process exists that controls mRNA abundance by temperature. This is partly accounted for by gene body H2A.Z which is associated with low transcription rate Q10, but is also influenced by other marks and transcription factor activities. Our data show that less frequent chromatin states can produce temperature responses simply by virtue of their rarity and the difference between their thermal properties and those of the most common states, and underline the advantages of directly measuring transcription rate changes in dynamic systems, rather than inferring rates from changes in mRNA abundance.
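The temperature coefficient used here follows the standard definition Q10 = (R2/R1)^(10/(T2 - T1)). Because steady-state mRNA abundance equals synthesis rate divided by decay rate, the Q10 of abundance is the ratio of the synthesis and decay Q10 values, which formalizes the buffering described above. A minimal sketch; function names are illustrative.

```python
def q10(rate_1: float, rate_2: float, temp_1: float, temp_2: float) -> float:
    """Fold change in a rate per 10 degree C increase, from rates at two temperatures."""
    return (rate_2 / rate_1) ** (10.0 / (temp_2 - temp_1))


def abundance_q10(synthesis_q10: float, decay_q10: float) -> float:
    """At steady state abundance = synthesis / decay, so its Q10 is the ratio of the two."""
    return synthesis_q10 / decay_q10
```

When synthesis and decay share the same Q10, abundance is temperature-insensitive (Q10 = 1); a transcript whose synthesis Q10 exceeds its decay Q10 rises with temperature.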
Article
Post-transcriptional regulation (PTR) of gene expression is now recognized as a major determinant of cell phenotypes. The recent availability of methods to map protein-RNA interactions in entire transcriptomes, such as RIP, CLIP and their variants, together with global polysomal and ribosome profiling techniques, is driving the exponential accumulation of vast amounts of data on mRNA contacts in cells, and of corresponding predictions of PTR events. However, this exceptional quantity of information cannot be fully exploited to reconstruct potential PTR networks, as it still lies scattered throughout several databases and in isolated reports of single interactions. To address this issue, we developed the second and vastly enhanced version of the Atlas of UTR Regulatory Activity (AURA 2), a meta-database centered on mapping interaction of trans-factors with human and mouse UTRs. AURA 2 includes experimentally demonstrated binding sites for RBPs, ncRNAs, thousands of cis-elements, variations, RNA epigenetics data and more. Its user-friendly interface offers various data-mining features including co-regulation search, network generation and regulatory enrichment testing. Gene expression profiles for many tissues and cell lines can also be combined with these analyses to display only the interactions possible in the system under study. AURA 2 aims at becoming a valuable toolbox for PTR studies and at tracing the road for how PTR network-building tools should be designed. AURA 2 is available at http://aura.science.unitn.it.
Article
N6-methyladenosine (m6A) is the most prevalent internal (non-cap) modification present in the messenger RNA of all higher eukaryotes. Although essential to cell viability and development, the exact role of m6A modification remains to be determined. The recent discovery of two m6A demethylases in mammalian cells highlighted the importance of m6A in basic biological functions and disease. Here we show that m6A is selectively recognized by the human YTH domain family 2 (YTHDF2) 'reader' protein to regulate mRNA degradation. We identified over 3,000 cellular RNA targets of YTHDF2, most of which are mRNAs, but which also include non-coding RNAs, with a conserved core motif of G(m6A)C. We further establish the role of YTHDF2 in RNA metabolism, showing that binding of YTHDF2 results in the localization of bound mRNA from the translatable pool to mRNA decay sites, such as processing bodies. The carboxy-terminal domain of YTHDF2 selectively binds to m6A-containing mRNA, whereas the amino-terminal domain is responsible for the localization of the YTHDF2-mRNA complex to cellular RNA decay sites. Our results indicate that the dynamic m6A modification is recognized by selectively binding proteins to affect the translation status and lifetime of mRNA.
Article
RNA-seq is an effective method for studying the transcriptome, but it can be difficult to apply to scarce or degraded RNA from fixed clinical samples, rare cell populations or cadavers. Recent studies have proposed several methods for RNA-seq of low-quality and/or low-quantity samples, but the relative merits of these methods have not been systematically analyzed. Here we compare five such methods using metrics relevant to transcriptome annotation, transcript discovery and gene expression. Using a single human RNA sample, we constructed and sequenced ten libraries with these methods and compared them against two control libraries. We found that the RNase H method performed best for chemically fragmented, low-quality RNA, and we confirmed this through analysis of actual degraded samples. RNase H can even effectively replace oligo(dT)-based methods for standard RNA-seq. SMART and NuGEN had distinct strengths for measuring low-quantity RNA. Our analysis allows biologists to select the most suitable methods and provides a benchmark for future method development.
Article
To monitor eukaryotic mRNA metabolism, we developed comparative dynamic transcriptome analysis (cDTA). cDTA provides absolute rates of mRNA synthesis and decay in Saccharomyces cerevisiae (Sc) cells with the use of Schizosaccharomyces pombe (Sp) as an internal standard. cDTA uses nonperturbing metabolic labeling that supersedes conventional methods for mRNA turnover analysis. cDTA reveals that Sc and Sp transcripts that encode orthologous proteins have similar synthesis rates, whereas decay rates are fivefold lower in Sp, resulting in similar mRNA concentrations despite the larger Sp cell volume. cDTA of Sc mutants reveals that a eukaryote can buffer mRNA levels. Impairing transcription with a point mutation in RNA polymerase (Pol) II causes decreased mRNA synthesis rates as expected, but also decreased decay rates. Impairing mRNA degradation by deleting deadenylase subunits of the Ccr4-Not complex causes decreased decay rates as expected, but also decreased synthesis rates. Extended kinetic modeling reveals mutual feedback between mRNA synthesis and degradation that may be achieved by a factor that inhibits synthesis and enhances degradation.
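Metabolic-labeling methods such as cDTA rest on first-order mRNA kinetics, dR/dt = a − λR. Starting from steady state (R = a/λ), the labeled fraction after a pulse of length t is f = 1 − e^(−λt), so λ = −ln(1 − f)/t, a = λR and t½ = ln 2/λ. The minimal sketch below assumes steady state, normalized labeled and total amounts, and no dilution by cell growth, which a full analysis would need to handle; the function name is hypothetical and this is not the cDTA implementation.

```python
import math

def rates_from_labeled_fraction(labeled: float, total: float, pulse_min: float):
    """Estimate decay rate, synthesis rate and half-life from one labeling pulse.

    Hypothetical helper. Assumes steady state, first-order decay, no growth
    dilution, and that `labeled` and `total` are on the same normalized scale.
    """
    if not 0 < labeled < total:
        raise ValueError("labeled amount must lie strictly between 0 and total")
    frac = labeled / total                      # f = 1 - exp(-lambda * t)
    decay = -math.log(1.0 - frac) / pulse_min   # lambda, per minute
    synthesis = decay * total                   # a = lambda * R at steady state
    half_life = math.log(2) / decay             # t_1/2, minutes
    return decay, synthesis, half_life

# Half of the transcript pool labeled after a 6-minute pulse implies a 6-minute half-life.
decay, synthesis, half_life = rates_from_labeled_fraction(50.0, 100.0, 6.0)
print(f"decay={decay:.4f}/min synthesis={synthesis:.2f} half-life={half_life:.1f} min")
```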
Article
Eukaryotic genes often generate a variety of RNA isoforms that can lead to functionally distinct protein variants. The synthesis and stability of RNA isoforms are poorly characterized because current methods to quantify RNA metabolism use short-read sequencing and cannot detect RNA isoforms. Here we present nanopore sequencing-based isoform dynamics (nano-ID), a method that detects newly synthesized RNA isoforms and monitors isoform metabolism. Nano-ID combines metabolic RNA labeling, long-read nanopore sequencing of native RNA molecules, and machine learning. Nano-ID derives RNA stability estimates and evaluates stability determining factors such as RNA sequence, poly(A)-tail length, secondary structure, translation efficiency, and RNA-binding proteins. Application of nano-ID to the heat shock response in human cells reveals that many RNA isoforms change their stability. Nano-ID also shows that the metabolism of individual RNA isoforms differs strongly from that estimated for the combined RNA signal at a specific gene locus. Nano-ID enables studies of RNA metabolism at the level of single RNA molecules and isoforms in different cell states and conditions.
Article
Gene expression is regulated by the rates of synthesis and degradation of mRNAs, but how these processes are coordinated is poorly understood. Here, we show that reduced transcription dynamics of specific genes leads to enhanced m⁶A deposition, preferential activity of the CCR4-Not complex, shortened poly(A) tails, and reduced stability of the respective mRNAs. These effects are also exerted by internal ribosome entry site (IRES) elements, which we found to be transcriptional pause sites. However, when transcription dynamics, and subsequently poly(A) tails, are globally altered, cells buffer mRNA levels by adjusting the expression of mRNA degradation machinery. Stress-provoked global impediment of transcription elongation leads to a dramatic inhibition of the mRNA degradation machinery and massive mRNA stabilization. Accordingly, globally enhanced transcription, such as following B cell activation or glucose stimulation, has the opposite effects. This study uncovers two molecular pathways that maintain balanced gene expression in mammalian cells by linking transcription to mRNA stability.
Article
The programmes that direct an organism's development and maintenance are encoded in its genome. Decoding of this information begins with regulated transcription of genomic DNA into RNA. Although transcription and its control can be tracked indirectly by measuring stable RNAs, it is only by directly measuring nascent RNAs that the immediate regulatory changes in response to developmental, environmental, disease and metabolic signals are revealed. Multiple complementary methods have been developed to quantitatively track nascent transcription genome-wide at nucleotide resolution, all of which have contributed novel insights into the mechanisms of gene regulation and transcription-coupled RNA processing. Here we critically evaluate the array of strategies used for investigating nascent transcription and discuss the recent conceptual advances they have provided.
Article
Upon activation, lymphocytes exit quiescence and undergo substantial increases in cell size, accompanied by activation of energy-producing and anabolic pathways, widespread chromatin decompaction, and elevated transcriptional activity. These changes depend upon prior induction of the Myc transcription factor, but how Myc controls them remains unclear. We addressed this issue by profiling the response to LPS stimulation in wild-type and c-myc-deleted primary mouse B-cells. Myc is rapidly induced, becomes detectable on virtually all active promoters and enhancers, but has no direct impact on global transcriptional activity. Instead, Myc contributes to the swift up- and down-regulation of several hundred genes, including many known regulators of the aforementioned cellular processes. Myc-activated promoters are enriched for E-box consensus motifs, bind Myc at the highest levels, and show enhanced RNA Polymerase II recruitment, the opposite being true at down-regulated loci. Remarkably, the Myc-dependent signature identified in activated B-cells is also enriched in Myc-driven B-cell lymphomas: hence, besides modulation of new cancer-specific programs, the oncogenic action of Myc may largely rely on sustained deregulation of its normal physiological targets.
Article
The combination of metabolic RNA labeling with biochemical nucleoside conversion now adds a broadly applicable temporal dimension to RNA sequencing.
Article
Control of messenger RNA (mRNA) stability is an important aspect of gene regulation. The gold standard for measuring mRNA stability transcriptome-wide uses metabolic labeling, biochemical isolation of labeled RNA populations, and high-throughput sequencing. However, difficult normalization procedures have inhibited widespread adoption of this approach. Here, we present DRUID (for Determination of Rates Using Intron Dynamics), a new computational pipeline that is robust, easy to use, and freely available. Our pipeline uses endogenous introns to normalize time course data and yields reproducible half-lives, even with datasets that were otherwise unusable. DRUID can handle datasets from a variety of organisms, spanning yeast to humans, and we even applied it retroactively on published datasets. We anticipate that DRUID will allow broad application of metabolic labeling for studies of transcript stability.
Article
Motivation: Metabolic labelling of RNA is a well-established and powerful method to estimate RNA synthesis and decay rates. The pulseR R package simplifies the analysis of RNA-seq count data that emerge from corresponding pulse-chase experiments.
Results: The pulseR package provides a flexible interface and readily accommodates numerous different experimental designs. To our knowledge, it is the first publicly available software solution that models count data with the more appropriate negative-binomial model. Moreover, pulseR handles labelled and unlabelled spike-in sets in its workflow and accounts for potential labelling biases (e.g. number of uridine residues).
Availability and implementation: The pulseR package is freely available at https://github.com/dieterich-lab/pulseR under the GPLv3.0 licence.
Contact: a.uvarovskii@uni-heidelberg.de or christoph.dieterich@uni-heidelberg.de.
Supplementary information: Supplementary data are available at Bioinformatics online.
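To illustrate what modelling counts with a negative-binomial distribution means for rate estimation, the sketch below fits a decay rate λ to chase-phase counts of pre-existing RNA, assuming counts ~ NB(mean μ₀e^(−λt), size r) with a fixed, user-supplied size r and no normalization or spike-in factors. It is a plain grid-search maximum-likelihood sketch with hypothetical function names, not pulseR's API or its exact model; pulseR itself is an R package.

```python
import math

def nb_loglik(k: int, mu: float, size: float) -> float:
    """Negative-binomial log-probability of count k with mean mu and size (1/dispersion)."""
    return (math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
            + size * math.log(size / (size + mu))
            + k * math.log(mu / (size + mu)))

def fit_decay_nb(times, counts, size=50.0):
    """Grid-search MLE of (mu0, lam) for counts ~ NB(mu0 * exp(-lam * t), size).

    Hypothetical helper: pre-existing RNA decaying during a chase, fixed size
    parameter, log-spaced grids for both parameters.
    """
    lam_grid = [10 ** (-3 + i / 100) for i in range(301)]        # 0.001 .. 1.0 per time unit
    top = max(counts)
    mu0_grid = [top * 10 ** (-1 + j / 100) for j in range(201)]  # top/10 .. top*10
    best = (-math.inf, None, None)
    for lam in lam_grid:
        decay = [math.exp(-lam * t) for t in times]
        for mu0 in mu0_grid:
            ll = sum(nb_loglik(k, mu0 * d, size) for k, d in zip(counts, decay))
            if ll > best[0]:
                best = (ll, mu0, lam)
    return best[1], best[2]

times = [0, 2, 4, 8, 16]
counts = [1000, 819, 670, 449, 202]   # synthetic: round(1000 * exp(-0.1 * t))
mu0, lam = fit_decay_nb(times, counts)
print(f"mu0~{mu0:.0f}, lambda~{lam:.3f}, half-life~{math.log(2) / lam:.1f}")
```

Compared with least squares on log counts, the negative-binomial likelihood lets the variance grow with the mean (μ + μ²/r), which is closer to how sequencing counts behave; a real analysis would also estimate r and use a proper optimizer instead of a grid.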
Article
Overexpression of the MYC transcription factor causes its widespread interaction with regulatory elements in the genome but leads to the up- and down-regulation of discrete sets of genes. The molecular determinants of these selective transcriptional responses remain elusive. Here, we present an integrated time-course analysis of transcription and mRNA dynamics following MYC activation in proliferating mouse fibroblasts, based on chromatin immunoprecipitation, metabolic labeling of newly synthesized RNA, extensive sequencing, and mathematical modeling. Transcriptional activation correlated with the highest increases in MYC binding at promoters. Repression followed a reciprocal scenario, with the lowest gains in MYC binding. Altogether, the relative abundance (henceforth, "share") of MYC at promoters was the strongest predictor of transcriptional responses in diverse cell types, predominating over MYC's association with the corepressor ZBTB17 (also known as MIZ1). MYC activation elicited immediate loading of RNA polymerase II (RNAPII) at activated promoters, followed by increases in pause-release, while repressed promoters showed opposite effects. Gains and losses in RNAPII loading were proportional to the changes in the MYC share, suggesting that repression by MYC may be partly indirect, owing to competition for limiting amounts of RNAPII. Secondary to the changes in RNAPII loading, the dynamics of elongation and pre-mRNA processing were also rapidly altered at MYC regulated genes, leading to the transient accumulation of partially or aberrantly processed mRNAs. Altogether, our results shed light on how overexpressed MYC alters the various phases of the RNAPII cycle and the resulting transcriptional response.
Article
RNA modifications are integral to the regulation of RNA metabolism. One abundant mRNA modification is N6-methyladenosine (m6A), which affects various aspects of RNA metabolism, including splicing, translation and degradation. Current knowledge about the proteins recruited to m6A to carry out these molecular processes is still limited. Here we describe comprehensive and systematic mass-spectrometry-based screening of m6A interactors in various cell types and sequence contexts. Among the main findings, we identified G3BP1 as a protein that is repelled by m6A and positively regulates mRNA stability in an m6A-regulated manner. Furthermore, we identified FMR1 as a sequence-context-dependent m6A reader, thus revealing a connection between an mRNA modification and an autism spectrum disorder. Collectively, our data represent a rich resource and shed further light on the complex interplay among m6A, m6A interactors and mRNA homeostasis.
Article
Over 100 types of chemical modifications have been identified in cellular RNAs. While the 5′ cap modification and the poly(A) tail of eukaryotic mRNA play key roles in regulation, internal modifications are gaining attention for their roles in mRNA metabolism. The most abundant internal mRNA modification is N⁶-methyladenosine (m⁶A), and identification of proteins that install, recognize, and remove this and other marks has revealed roles for mRNA modification in nearly every aspect of the mRNA life cycle, as well as in various cellular, developmental, and disease processes. Abundant noncoding RNAs such as tRNAs, rRNAs, and spliceosomal RNAs are also heavily modified and depend on the modifications for their biogenesis and function. Our understanding of the biological contributions of these different chemical modifications is beginning to take shape, but it is clear that in both coding and noncoding RNAs, dynamic modifications represent a new layer of control of genetic information.
Article
Pervasive transcription of the human genome results in a heterogeneous mix of coding RNAs and long noncoding RNAs (lncRNAs). Only a small fraction of lncRNAs have demonstrated regulatory functions, thus making functional lncRNAs difficult to distinguish from nonfunctional transcriptional byproducts. This difficulty has resulted in numerous competing human lncRNA classifications that are complicated by a steady increase in the number of annotated lncRNAs. To address these challenges, we quantitatively examined transcription, splicing, degradation, localization and translation for coding and noncoding human genes. We observed that annotated lncRNAs had lower synthesis and higher degradation rates than mRNAs and discovered mechanistic differences explaining slower lncRNA splicing. We grouped genes into classes with similar RNA metabolism profiles, containing both mRNAs and lncRNAs to varying extents. These classes exhibited distinct RNA metabolism, different evolutionary patterns and differential sensitivity to cellular RNA-regulatory pathways. Our classification provides an alternative to genomic context-driven annotations of lncRNAs.
Article
Upon recruitment to active enhancers and promoters, RNA polymerase II (Pol II) generates short non-coding transcripts of unclear function. The mechanisms that control the length and the amount of ncRNAs generated by cis-regulatory elements are largely unknown. Here, we show that the adaptor protein WDR82 and its associated complexes actively limit such non-coding transcription. WDR82 targets the SET1 H3K4 methyltransferases and the nuclear protein phosphatase 1 (PP1) complexes to the initiating Pol II. WDR82 and PP1 also interact with components of the transcriptional termination and RNA processing machineries. Depletion of WDR82, SET1, or the PP1 subunit required for its nuclear import caused distinct but overlapping transcription termination defects at highly expressed genes and active enhancers and promoters, thus enabling the increased synthesis of unusually long ncRNAs. These data indicate that transcription initiated from cis-regulatory elements is tightly coordinated with termination mechanisms that impose the synthesis of short RNAs. Upon recruitment to enhancers and promoters, RNA polymerase II synthesizes short and poorly abundant non-coding transcripts.
Article
N⁶-methyladenosine (m⁶A) is the most abundant modified base in eukaryotic mRNA and has been linked to diverse effects on mRNA fate. Current mapping approaches localize m⁶A residues to transcript regions 100-200 nt long but cannot identify precise m⁶A positions on a transcriptome-wide level. Here we developed m⁶A individual-nucleotide-resolution cross-linking and immunoprecipitation (miCLIP) and used it to demonstrate that antibodies to m⁶A can induce specific mutational signatures at m⁶A residues after ultraviolet light-induced antibody-RNA cross-linking and reverse transcription. We found that these antibodies similarly induced mutational signatures at N⁶,2′-O-dimethyladenosine (m⁶Am), a modification found at the first nucleotide of certain mRNAs. Using these signatures, we mapped m⁶A and m⁶Am at single-nucleotide resolution in human and mouse mRNA and identified small nucleolar RNAs (snoRNAs) as a new class of m⁶A-containing non-coding RNAs (ncRNAs).
Article
RNA-seq experiments generate reads derived not only from mature RNA transcripts but also from pre-mRNA. Here we present a computational approach called exon-intron split analysis (EISA) that measures changes in mature RNA and pre-mRNA reads across different experimental conditions to quantify transcriptional and post-transcriptional regulation of gene expression. We apply EISA to 17 diverse data sets to show that most intronic reads arise from nuclear RNA and changes in intronic read counts accurately predict changes in transcriptional activity. Furthermore, changes in post-transcriptional regulation can be predicted from differences between exonic and intronic changes. EISA reveals both transcriptional and post-transcriptional contributions to expression changes, increasing the amount of information that can be gained from RNA-seq data sets.
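The core EISA comparison can be sketched per gene: the change in intronic signal (Δintron) approximates the transcriptional change, the change in exonic signal (Δexon) the total change, and Δexon − Δintron the post-transcriptional component. The sketch assumes log2 counts-per-million with a pseudocount of 1; that normalization and the function names are illustrative assumptions, not necessarily the choices made by the EISA authors.

```python
import math

def log2_cpm(count: float, lib_size: float, pseudo: float = 1.0) -> float:
    """log2 counts-per-million with a pseudocount (illustrative normalization)."""
    return math.log2(count / lib_size * 1e6 + pseudo)

def eisa(exon_a, intron_a, exon_b, intron_b, lib_a, lib_b):
    """Return (d_intron, d_exon, d_post) for one gene, condition A -> condition B."""
    d_intron = log2_cpm(intron_b, lib_b) - log2_cpm(intron_a, lib_a)  # ~ transcriptional change
    d_exon = log2_cpm(exon_b, lib_b) - log2_cpm(exon_a, lib_a)        # total (mature) change
    return d_intron, d_exon, d_exon - d_intron                        # post-transcriptional part

# Exonic reads rise ~4-fold while intronic reads stay flat: the increase is
# attributed to post-transcriptional regulation (e.g. mRNA stabilization).
d_intron, d_exon, d_post = eisa(100, 20, 400, 20, 1e6, 1e6)
print(f"d_intron={d_intron:.2f} d_exon={d_exon:.2f} d_post={d_post:.2f}")
```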
Article
Amyotrophic lateral sclerosis (ALS) is a progressive neurodegenerative disease of the motor neurons, which results in weakness and atrophy of voluntary skeletal muscles. Treatments do not modify the disease trajectory effectively, and only modestly improve survival. A complex interaction between genes, environmental exposure and impaired molecular pathways contributes to pathology in patients with ALS. Epigenetic mechanisms control the hereditary and reversible regulation of gene expression without altering the basic genetic code. Aberrant epigenetic patterns, including abnormal microRNA (miRNA) biogenesis and function, DNA modifications, histone remodeling and RNA editing, are acquired throughout life and are influenced by environmental factors. Thus, understanding the molecular processes that lead to epigenetic dysregulation in patients with ALS might facilitate the discovery of novel therapeutic targets and biomarkers that could reduce diagnostic delay. These achievements could prove crucial for successful disease modification in patients with ALS. We review the latest findings regarding the role of miRNA modifications and other epigenetic mechanisms in ALS, and discuss their potential as therapeutic targets.
Article
4sUDRB-seq separately measures, on a genomic scale, the distinct contributions of transcription elongation speed and rate of RNA polymerase II (Pol II) transition into active elongation (TAE) to the overall mRNA production rate. It uses reversible inhibition of transcription elongation with 5,6-dichloro-1-β-D-ribofuranosylbenzimidazole (DRB), combined with a pulse of 4-thiouridine (4sU), to tag newly transcribed RNA. After DRB removal, cells are collected at several time points, and tagged RNA is biotinylated, captured on streptavidin beads and sequenced. 4sUDRB-seq enables the comparison of elongation speeds between different developmental stages or different cell types, and it allows the impact of specific transcription factors on transcription elongation speed versus TAE to be studied. RNA preparation takes ∼4 d to complete, with deep sequencing requiring an additional ∼4-11 d plus 1-3 d for bioinformatics analysis. The experimental protocol requires basic molecular biology skills, whereas data analysis requires knowledge in bioinformatics, particularly MATLAB and the Linux environment.
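After DRB release, the front of newly labeled transcription advances along each gene, so an elongation speed can be read off as the slope of front position against time. The sketch below assumes the front positions have already been called from the sequencing data (a step not shown) and fits an ordinary least-squares line; the protocol itself uses MATLAB, and this Python helper and its example values are hypothetical.

```python
def elongation_rate(times_min, front_kb):
    """Ordinary least-squares slope of wave-front position vs. time, in kb/min.

    Hypothetical helper: `front_kb[i]` is the distance of the labeled-RNA front
    from the TSS at `times_min[i]` minutes after DRB removal.
    """
    n = len(times_min)
    if n < 2 or n != len(front_kb):
        raise ValueError("need at least two paired time points")
    mt = sum(times_min) / n
    mf = sum(front_kb) / n
    sxx = sum((t - mt) ** 2 for t in times_min)
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    sxy = sum((t - mt) * (f - mf) for t, f in zip(times_min, front_kb))
    return sxy / sxx

# Hypothetical fronts at 20, 30 and 40 kb after 8, 12 and 16 minutes.
print(f"elongation speed: {elongation_rate([8, 12, 16], [20.0, 30.0, 40.0]):.2f} kb/min")
```

Using a regression over all time points, rather than a difference between two, makes the estimate less sensitive to noise in any single front call.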