Quantification of stochastic noise of splicing and polyadenylation in Entamoeba histolytica

Institut Pasteur, Unité Biologie Cellulaire du Parasitisme, Département Biologie cellulaire et infection, F-75015 Paris, France, INSERM U786, F-75015 Paris, France, Institut Pasteur, Plate-forme Transcriptome et Epigénome, Département Génomes et Génétique, F-75015 Paris, France, Jawaharlal Nehru University, School of Life Sciences, New Delhi 110067, India, and Jawaharlal Nehru University, School of Computational and Integrative Sciences, New Delhi 110067, India.
Nucleic Acids Research (Impact Factor: 9.11). 12/2012; 41(3). DOI: 10.1093/nar/gks1271
Source: PubMed


Alternative splicing and polyadenylation are observed pervasively in eukaryotic messenger RNAs. These alternative isoforms could be consequences of either physiological regulation or stochastic noise in RNA processing. To quantify the extent of stochastic noise in splicing and polyadenylation, we analyzed the alternative usage of splicing and polyadenylation sites in Entamoeba histolytica using RNA-Seq. First, we identified a large number of rarely spliced alternative junctions and then showed that the occurrence of these alternative splicing events is correlated with splice site sequence, the occurrence of constitutive splicing events and messenger RNA abundance. Our results imply that the majority of these alternative splicing events are likely to be stochastic errors of the splicing machinery, and we estimated the corresponding error rates. Second, we observed extensive microheterogeneity of polyadenylation cleavage sites, and the extent of this microheterogeneity is correlated with the occurrence of constitutive cleavage events, suggesting that most of it is likely to be stochastic. Overall, we observed only a small fraction of alternative splicing and polyadenylation isoforms that are unlikely to be solely stochastic, implying that the functional relevance of alternative splicing and polyadenylation in E. histolytica is limited. Lastly, we revised the gene models and annotated their 3'UTRs in AmoebaDB, providing a valuable resource to the community.
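
As a rough illustration of what an "error rate" estimated from junction read counts could look like, the sketch below computes, for each intron, the fraction of junction-spanning reads that support alternative rather than constitutive junctions, and a pooled rate across introns. This is only a minimal sketch under stated assumptions: the abstract does not describe the estimator actually used, and the names `IntronCounts`, `alternative_fraction` and `pooled_alternative_rate`, as well as the read counts in the example, are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class IntronCounts:
    """Hypothetical per-intron junction read counts (not from the paper)."""
    constitutive_reads: int  # reads supporting the annotated (constitutive) junction
    alternative_reads: int   # reads supporting rarely used alternative junctions


def alternative_fraction(counts: IntronCounts) -> Optional[float]:
    """Fraction of junction reads that support alternative junctions.

    Returns None when the intron has no junction reads at all.
    """
    total = counts.constitutive_reads + counts.alternative_reads
    if total == 0:
        return None
    return counts.alternative_reads / total


def pooled_alternative_rate(introns: Iterable[IntronCounts]) -> Optional[float]:
    """Pooled rate: all alternative reads divided by all junction reads.

    Pooling weights highly expressed introns more heavily; this is an
    assumption of the sketch, not a claim about the published analysis.
    """
    alt = 0
    total = 0
    for counts in introns:
        alt += counts.alternative_reads
        total += counts.constitutive_reads + counts.alternative_reads
    if total == 0:
        return None
    return alt / total


# Made-up example counts, for illustration only.
example = [IntronCounts(990, 10), IntronCounts(1990, 10), IntronCounts(0, 0)]
per_intron = [alternative_fraction(c) for c in example]
pooled = pooled_alternative_rate(example)
```

If alternative junctions are mostly stochastic errors, one would expect their read counts to scale with constitutive junction counts, which is the kind of correlation the abstract reports; a per-intron fraction far above the pooled rate would flag a candidate for non-stochastic usage.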

Available from: Christian Weber
    • "All splicing junctions identified by HMMSplicer [37] were clustered as mentioned in a previous study [52]. A junction cluster is considered to be ‘antisense’ when its representative junction is located within the coding region of a gene on the opposite strand. "
    ABSTRACT: Advances in high-throughput sequencing have led to the discovery of widespread transcription of natural antisense transcripts (NATs) in a large number of organisms, where these transcripts have been shown to play important roles in the regulation of gene expression. Likewise, the existence of NATs has been observed in Plasmodium but our understanding towards their genome-wide distribution remains incomplete due to the limited depth and uncertainties in the level of strand specificity of previous datasets. To gain insights into the genome-wide distribution of NATs in P. falciparum, we performed RNA-ligation based strand-specific RNA sequencing at unprecedented depth. Our data indicate that 78.3% of the genome is transcribed during blood-stage development. Moreover, our analysis reveals significant levels of antisense transcription from at least 24% of protein-coding genes and that while expression levels of NATs change during the intraerythrocytic developmental cycle (IDC), they do not correlate with the corresponding mRNA levels. Interestingly, antisense transcription is not evenly distributed across coding regions (CDSs) but strongly clustered towards the 3'-end of CDSs. Furthermore, for a significant subset of NATs, transcript levels correlate with mRNA levels of neighboring genes. Finally, we were able to identify the polyadenylation sites (PASs) for a subset of NATs, demonstrating that at least some NATs are polyadenylated. We also mapped the PASs of 3443 coding genes, yielding an average 3' untranslated region length of 523 bp. Our strand-specific analysis of the P. falciparum transcriptome expands and strengthens the existing body of evidence that antisense transcription is a substantial phenomenon in P. falciparum. For a subset of neighboring genes we find that sense and antisense transcript levels are intricately linked while other NATs appear to be regulated independently of mRNA transcription.
    Our deep strand-specific dataset will provide a valuable resource for the precise determination of expression levels as it separates sense from antisense transcript levels, which we find to often significantly differ. In addition, the extensive novel data on 3' UTR length will allow others to perform searches for regulatory motifs in the UTRs and help understand post-translational regulation in P. falciparum.
    BMC Genomics 02/2014; 15(1):150. DOI:10.1186/1471-2164-15-150 · 3.99 Impact Factor
    • "The two E. histolytica LMW-PTP proteins (GenBank: XP_656359, coded by GenBank: XM_651267, and GenBank: XP_653357, coded by GenBank: XM_648265), are identical except for a single conservative residue change at position 85 in the protein sequence: XP_656359 has an alanine and XP_653357 a valine. Both genes are expressed in cultured trophozoites, clinical isolates, and cysts [17] [18]. XM_651267, the gene encoding XP_656359, was cloned and expressed for this study, as was its Cys to Ser substrate-trapping mutant form."
    ABSTRACT: Entamoeba histolytica is a eukaryotic intestinal parasite of humans, and is endemic in developing countries. We have characterized the E. histolytica putative low molecular weight protein tyrosine phosphatase (LMW-PTP). The structure for this amebic tyrosine phosphatase was solved, showing the ligand-induced conformational changes necessary for binding of substrate. In amebae, it was expressed at low but detectable levels as detected by immunoprecipitation followed by immunoblotting. A mutant LMW-PTP protein in which the catalytic cysteine in the active site was replaced with a serine lacked phosphatase activity, and was used to identify a number of trapped putative substrate proteins via mass spectrometry analysis. Seven of these putative substrate protein genes were cloned with an epitope tag and overexpressed in amebae. Five of these seven putative substrate proteins were demonstrated to interact specifically with the mutant LMW-PTP. This is the first biochemical study of a small tyrosine phosphatase in Entamoeba, and sets the stage for understanding its role in amebic biology and pathogenesis.
    Molecular and Biochemical Parasitology 02/2014; 193(1):33-44. DOI:10.1016/j.molbiopara.2014.01.003 · 1.79 Impact Factor
    • "Recently, Pickrell et al. [93] have used RNA-seq to show that indeed an important amount of alternative isoforms result from noisy splicing. On the same line, Hon et al. [92] have used RNA-seq in E. histolytica to show that a majority of alternative splicing and polyadenylation isoforms are the result of stochastic processes and therefore unlikely to play a functional role. Reinforcing these results, recent proteomics studies [105] show that a fraction of transcripts do not reach the protein level, and for this reason are less likely to be functional."
    ABSTRACT: At present we know that phenotypic differences between organisms arise from a variety of sources, like protein sequence divergence, regulatory sequence divergence, alternative splicing, etc. However, we do not have yet a complete view of how these sources are related. Here we address this problem, studying the relationship between protein divergence and the ability of genes to express multiple isoforms. We used three genome-wide datasets of human-mouse orthologs to study the relationship between isoform multiplicity co-occurrence between orthologs (the fact that two orthologs have more than one isoform) and protein divergence. In all cases our results showed that there was a monotonic dependence between these two properties. We could explain this relationship in terms of a more fundamental one, between exon number of the largest isoform and protein divergence. We found that this last relationship was present, although with variations, in other species (chimpanzee, cow, rat, chicken, zebrafish and fruit fly). In summary, we have identified a relationship between protein divergence and isoform multiplicity co-occurrence and explained its origin in terms of a simple gene-level property. Finally, we discuss the biological implications of these findings for our understanding of inter-species phenotypic differences.
    PLoS ONE 08/2013; 8(8):e72742. DOI:10.1371/journal.pone.0072742 · 3.23 Impact Factor