
Genomic Distribution of Alus and Their Impact on Gene Expression

Authors: Andrea Tanzer1, Michael T. Wolfinger1, Stefan Badelt1, Mansoureh Tajaddod5, Konstantin Licht5, Ivo L. Hofacker1,2,3 and Michael F. Jantsch4,5

1 Department of Theoretical Chemistry, University of Vienna, Währingerstraße 17, A-1090 Wien, Austria
2 Bioinformatics and Computational Biology Research Group, University of Vienna, Währingerstraße 17, A-1090 Wien, Austria
3 Center for RNA in Technology and Health, Univ. Copenhagen, Grønnegårdsvej 3, Frederiksberg C, Denmark
4 Department for Cell- and Developmental Biology, Medical University of Vienna, Schwarzspanierstr. 17, A-1090 Wien, Austria
5 Max F. Perutz Laboratories, Vienna Biocenter (VBC), Dr. Bohr-Gasse 9, 1030 Vienna, Austria

Abstract

In this study, we analyzed the genomic distribution of Alu elements in the human genome and their impact on transcript abundance. We also classified transcripts by the relative orientation of their Alus. As a first step towards functional analysis, we performed in-silico simulations of co-transcriptional folding to identify kinetic traps that might serve as target sites triggering RNA degradation.
[1] "GENCODE: The reference human genome annotation for the ENCODE project", Harrow et al., Genome Res. 22, 1760-1774 (2012)
[2] "Landscape of transcription in human cells", Djebali et al., Nature 489, 101-108 (2012)
[3] "A multi-split mapping algorithm for circular RNA, splicing, trans-splicing, and fusion detection", Hoffmann et al., Genome Biol. 15, R34 (2014)
[4] "Folding kinetics of large RNAs", Geis et al., J. Mol. Biol. 379, 160-173 (2008)
[5] "ViennaRNA Package 2.0", Lorenz et al., Algorithms Mol. Biol. 6, 26 (2011)
Toolbox
Data:
- human genome assembly: hg19/GRCh37
- gene annotation: GENCODE version 19 [1]
- RNA-seq data: ENCODE project (phase 2) [2], 15 cell lines, whole cell, PolyA+, long RNA fraction
- Alu annotation: UCSC Genome Browser, RepeatMasker track
Tools:
- RNA-seq mapper: segemehl [3]
- transcript quantification: cufflinks2
- annotation comparison: bedtools2
- RNA kinetic folding simulation: Kinwalker [4]
- RNA structure plots: ViennaRNA Package [5]

Location and Orientation
We classified Alus by their position within transcripts (intergenic, transcript, intron, exon, CDS, 5'UTR, 3'UTR, non-coding exon) and by their orientation relative to neighbouring Alus (single Alus, head-to-head, tail-to-tail, direct tandems, no Alu). Only long transcripts were used (gene types lncRNA, protein coding and pseudogene); small RNAs and rRNAs were not considered. Since a gene locus has on average 3.8 transcript isoforms, we used a hierarchical rule set to project individual elements onto the genome and used these projections for genome [...]
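The hierarchical projection step can be illustrated as a priority lookup over intervals pooled from all isoforms of a locus. Each Alu receives the highest-ranked class it overlaps. This is a minimal sketch, not the authors' rule set. The following are all assumptions for illustration: the priority order (CDS > 5'UTR > 3'UTR > non-coding exon > intron), the reduced set of classes, the half-open coordinates, and the names `PRIORITY` and `classify_alu`.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one chromosome

# Assumed priority, highest first; the poster does not specify its rule order.
PRIORITY = ["CDS", "5UTR", "3UTR", "noncoding_exon", "intron"]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def classify_alu(alu: Interval, features: List[Tuple[str, Interval]]) -> str:
    """Project an Alu onto features pooled from all isoforms of a locus and
    return the highest-priority class it overlaps ('intergenic' if none)."""
    hits = {label for label, interval in features if overlaps(alu, interval)}
    for label in PRIORITY:
        if label in hits:
            return label
    return "intergenic"


# Toy locus with two isoforms (hypothetical coordinates).
features = [
    ("5UTR", (100, 150)), ("CDS", (150, 400)),
    ("intron", (400, 900)), ("3UTR", (900, 1200)),  # isoform 1
    ("noncoding_exon", (380, 520)),                 # isoform 2
]
print(classify_alu((450, 750), features))    # noncoding_exon (outranks intron)
print(classify_alu((1000, 1300), features))  # 3UTR
print(classify_alu((2000, 2300), features))  # intergenic
```

Resolving overlaps with a fixed priority makes each Alu's label independent of how many isoforms cover it. This matters when loci carry several isoforms each.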
Contact: at@tbi.univie.ac.at, http://www.tbi.univie.ac.at
Acknowledgments
This study is a joint project of the University of Vienna, the Max F. Perutz Laboratories and the Medical University of Vienna. It is financially supported by the FWF grant "SFB RNA regulation of the transcriptome" (F 43). AT performed the genomic analysis, AT and MTW the transcriptomics analysis, and AT and SB the folding simulations; MT and KL designed and validated the constructs; MFJ and ILH supervised the study.
Alu orientation and RNA abundance
At the α = 0.05 level, we find that transcripts with single Alus are expressed at significantly higher levels than transcripts containing inverted Alus (iAlus; p-value 4.51e-35). The same holds for transcripts with single versus tandem arrangements (p-value 1.77e-22). Transcripts with iAlus show lower expression than those with Alus in tandem arrangement (p-value 1.98e-3). Among iAlu-containing transcripts, those with Alus in head-to-head orientation are expressed at lower levels than those with Alus in tail-to-tail orientation (p-value 2.47e-2).
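The poster reports p-values but does not state which test produced them. The sketch below assumes a two-sided Wilcoxon rank-sum (Mann-Whitney U) test on per-transcript expression values, such as the FPKM estimates reported by cufflinks2, and uses only the standard library. It is for illustration only. The normal approximation without tie correction is a simplification, and the function names and toy values are hypothetical.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_test(x: Sequence[float], y: Sequence[float]) -> float:
    """Two-sided Mann-Whitney U p-value via the normal approximation
    (no tie or continuity correction; suitable for large samples)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


# Hypothetical FPKM values: single-Alu vs iAlu transcripts.
single = [12.0, 8.5, 15.2, 9.9, 20.1, 11.3, 7.8, 14.6]
inverted = [3.1, 5.4, 2.2, 6.8, 4.0, 1.9, 7.5, 3.3]
p = rank_sum_test(single, inverted)
print(f"p = {p:.2e}, significant at alpha = 0.05: {p < 0.05}")
```

A rank-based test does not assume normally distributed expression values, which are typically heavily skewed.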
Alus in genomic elements
1.2 million Alu elements make up 10.76% (311,730,074 bp) of the human genome. They are almost evenly distributed between genic and intergenic regions. Within genic regions, Alus are enriched in introns relative to exons. Coding regions make up 30% of the exonic partition, but only 4% are covered by Alus. In contrast, non-coding exons and 3'-UTRs are enriched by a factor of 1.4.
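An enrichment factor can be read as an observed-over-expected ratio. It is a region's share of all Alu-covered bases divided by that region's share of the reference partition. The sketch below assumes this definition, because the poster does not state its formula. The base-pair counts in the second example are hypothetical. The first calculation only back-computes the genome size implied by the reported 10.76%.

```python
def enrichment(alu_bp_in_region: int, alu_bp_total: int,
               region_bp: int, reference_bp: int) -> float:
    """Observed/expected ratio: the region's share of all Alu-covered bases
    divided by its share of the reference partition (>1 enriched, <1 depleted)."""
    observed = alu_bp_in_region / alu_bp_total
    expected = region_bp / reference_bp
    return observed / expected


# Genome size implied by the reported numbers (311,730,074 bp = 10.76 %).
implied_genome_bp = 311_730_074 / 0.1076
print(f"implied genome size: {implied_genome_bp / 1e9:.2f} Gb")  # 2.90 Gb

# Hypothetical counts: a region spanning 25 % of the exonic partition
# that holds 35 % of the exonic Alu bases is enriched 1.4-fold.
print(f"{enrichment(700_000, 2_000_000, 25_000_000, 100_000_000):.2f}")  # 1.40
```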
Transcripts containing Alus in exons are expressed at significantly lower levels than those with intronic Alus or Alu-free transcripts. This holds even when all isoforms come from one gene locus and are thus presumably under the control of the same [...]
Legend, Alu orientation classes (arrows indicate Alu orientation):
- single: >
- tandem: >>> or <<<
- iAlu: >>><<< or <<<>>>
- head-to-head: <<<>>>
- tail-to-tail: >>><<<
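The legend's arrows can be read as Alu orientation relative to the transcript, listed 5' to 3'. Under that reading, ">" is a sense Alu with its head (5' end) on the left and its poly(A) tail on the right, and "<" is an antisense Alu. Each pair of neighbouring Alus can then be classified from orientations alone. This is a minimal sketch under that assumed reading. The string encoding and function names are hypothetical, and the sketch does not model the poster's criteria for calling two Alus neighbours (for example, a maximum distance).

```python
from typing import List

SENSE, ANTISENSE = ">", "<"  # assumed reading of the legend's arrows


def pair_orientation(upstream: str, downstream: str) -> str:
    """Classify two neighbouring Alus listed 5'->3' along the transcript."""
    for s in (upstream, downstream):
        if s not in (SENSE, ANTISENSE):
            raise ValueError(f"unknown orientation symbol: {s!r}")
    if upstream == downstream:
        return "tandem"        # >>> or <<<
    if upstream == SENSE:
        return "tail-to-tail"  # >>><<< : the two poly(A) tails face each other
    return "head-to-head"      # <<<>>> : the two 5' heads face each other


def classify_arrangement(orientations: str) -> List[str]:
    """Label every neighbouring pair in a string such as '><>'."""
    if any(c not in (SENSE, ANTISENSE) for c in orientations):
        raise ValueError("orientations must consist of '>' and '<'")
    if not orientations:
        return ["no Alu"]
    if len(orientations) == 1:
        return ["single"]
    return [pair_orientation(a, b) for a, b in zip(orientations, orientations[1:])]


def is_inverted(label: str) -> bool:
    """iAlu pairs are the head-to-head and tail-to-tail classes."""
    return label in ("head-to-head", "tail-to-tail")


for arrangement in [">", ">>", "<>", "><", "><>"]:
    print(arrangement, classify_arrangement(arrangement))
```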
Figure: co-transcriptional folding trajectory, intermediate structures str1 to str8.
Summary
We show that
- Alus are depleted in coding regions but enriched in 3'-UTRs and non-coding exons
- the location of Alus within these elements reduces mRNA abundance
- the head-to-head conformation of inverted Alus has the strongest effect
- co-transcriptional folding of iSINEs leads to stable, long helical regions
Our results suggest that Alu elements shape the transcriptomic landscape of human cells. Details of the underlying mechanism(s), however, remain to be determined.
Co-transcriptional folding
Inverted Alus have the strongest effect on RNA abundance. We simulated co-transcriptional folding of a 3'-UTR containing iSINEs in tail-to-tail orientation. The simulations show that the two Alus form a long helical region, even though they belong to different Alu families. Shortly after transcription of Alu2 (blue) begins (str3), the transient structure of Alu1 (red) starts to open and base-pair with Alu2 (str4). The energy barrier of 7.1 kcal/mol appears just low enough to allow duplex formation, yet high enough to stabilize the conformation and thus support co-transcriptional helix extension. Once transcription of Alu2 is complete, another refolding event with a rather high energy barrier (str7 to str8) stabilizes the final helix. Experimental data from different Alu constructs confirm that stable transient structures formed by Alus reduce transcript abundance.
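To put the 7.1 kcal/mol barrier in context, one can assume Arrhenius-type behaviour, in which the crossing rate scales with exp(-ΔE/RT). This sketch holds only under that assumption. The poster does not state the rate model behind the Kinwalker trajectories, and the sketch omits the attempt rate k0. The 10.0 kcal/mol comparison barrier is arbitrary and does not correspond to any value reported on the poster.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def arrhenius_factor(barrier_kcal: float, temp_c: float = 37.0) -> float:
    """Boltzmann factor exp(-dE/RT) for crossing an energy barrier dE (kcal/mol).
    Multiplied by an (unknown) attempt rate k0 it gives an Arrhenius rate."""
    rt = R_KCAL * (temp_c + 273.15)
    return math.exp(-barrier_kcal / rt)


def speedup(lower_kcal: float, higher_kcal: float, temp_c: float = 37.0) -> float:
    """How many times faster the lower barrier is crossed (same k0 assumed)."""
    return arrhenius_factor(lower_kcal, temp_c) / arrhenius_factor(higher_kcal, temp_c)


rt_37 = R_KCAL * 310.15
print(f"RT at 37 C: {rt_37:.3f} kcal/mol")  # 0.616
print(f"exp(-7.1/RT) = {arrhenius_factor(7.1):.2e}")
# Arbitrary comparison barrier; the poster gives no value for str7 -> str8.
print(f"7.1 vs 10.0 kcal/mol: {speedup(7.1, 10.0):.0f}-fold faster")
```

Under this assumption, RT is about 0.616 kcal/mol at 37 °C. Each additional 1.42 kcal/mol of barrier height therefore slows crossing roughly tenfold.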
Figure: schematic of Alu arrangements along transcripts (5' to 3'): single Alu; tandem Alus (tSINEs), head-to-tail and tail-to-head; inverted Alus (iSINEs), head-to-head and tail-to-tail. Panels: transcript abundance by Alu location (intron, CDS exon, non-coding/UTR exon), by Alu orientation in exons, and for single Alu and Alu/non-Alu pairs of transcripts.