ABSTRACT: GC 5' splice sites (5'ss) are present in ∼1% of human introns, but factors promoting their efficient selection are poorly understood. Here, we describe a case of X-linked agammaglobulinemia resulting from a GC 5'ss activated by a mutation in BTK intron 3. This GC 5'ss was intrinsically weak, yet it was selected in >90% of primary transcripts in the presence of a strong and intact natural GT counterpart. We show that efficient selection of this GC 5'ss required a high density of GAA/CAA-containing splicing enhancers in the exonized segment and was promoted by SR proteins 9G8, Tra2β and SC35. The GC 5'ss was efficiently inhibited by splice-switching oligonucleotides targeting either the GC 5'ss itself or the enhancer. Comprehensive analysis of natural GC-AG introns and previously reported pathogenic GC 5'ss showed that their efficient activation was facilitated by higher densities of splicing enhancers and lower densities of silencers than in their GT 5'ss equivalents. Removal of the GC-AG introns was promoted to a minor extent by the splice-site strength of adjacent exons and inhibited by flanking Alu repeats, with the first downstream Alus located on average at a longer distance from the GC 5'ss than other transposable elements. These results provide new insights into the splicing code that governs selection of noncanonical splice sites.
Nucleic Acids Research 05/2011; 39(16):7077-91. · 8.28 Impact Factor
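The enhancer-density comparison in the BTK abstract can be made concrete with a crude proxy. The sketch below assumes that "density" is simply the number of overlapping GAA and CAA trinucleotides per 100 nt; the study's actual enhancer and silencer sets are not reproduced here, and `motif_density` is a hypothetical helper name.

```python
def motif_density(seq: str, motifs=("GAA", "CAA"), per: int = 100) -> float:
    """Count overlapping occurrences of each motif and scale the total per `per` nt.

    Assumption: raw trinucleotide counts stand in for curated enhancer sets.
    """
    seq = seq.upper()
    if not seq:
        return 0.0
    hits = 0
    for motif in motifs:
        k = len(motif)
        # Slide a window of the motif's length; overlapping hits are counted.
        hits += sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == motif)
    return hits * per / len(seq)
```

For example, `motif_density("GAAGAACAAT")` finds two GAA and one CAA in 10 nt and returns 30.0. Comparing the value for an exonized segment with that of a control segment is the kind of contrast the abstract describes, although real analyses score against curated motif sets.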
ABSTRACT: Missense, nonsense, and translationally silent mutations can inactivate genes by altering the inclusion of mutant exons in mRNA, but their overall frequency among disease-causing exonic substitutions is unknown. Here, we have tested missense and silent mutations deposited in the BRCA1 mutation databases of unclassified variants for their effects on exon inclusion. Analysis of 21 BRCA1 variants using minigene assays revealed a single exon-skipping mutation c.231G>T. Comprehensive mutagenesis of an adjacent 12-nt segment showed that this silent mutation resulted in a higher level of exon skipping than the 35 other single-nucleotide substitutions. Exon inclusion levels of mutant constructs correlated significantly with predicted splicing enhancers/silencers, prompting the development of two online utilities freely available at http://www.dbass.org.uk. EX-SKIP quickly estimates which allele is more susceptible to exon skipping, whereas HOT-SKIP examines all possible mutations at each exon position and identifies candidate exon-skipping positions/substitutions. We demonstrate that the distribution of exon-skipping and disease-associated substitutions previously identified in coding regions was biased toward top-ranking HOT-SKIP mutations. Finally, we show that proteins 9G8, SC35, SF2/ASF, Tra2, and hnRNP A1 were associated with significant alterations of BRCA1 exon 6 inclusion in the mRNA. Together, these results facilitate prediction of exonic substitutions that reduce exon inclusion in mature transcripts.
Human Mutation 02/2011; 32(4):436-44. · 5.21 Impact Factor
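HOT-SKIP is described as examining every possible substitution at each exon position. The sketch below shows only that enumeration pattern; its scoring function (enhancer hits minus silencer hits) and the motif sets passed to it are toy placeholders, not the scoring used by EX-SKIP or HOT-SKIP, and `rank_substitutions` is a hypothetical name.

```python
from typing import Iterable, List, Tuple

BASES = "ACGT"


def count_motifs(seq: str, motifs: Iterable[str]) -> int:
    """Count overlapping occurrences of every motif in seq."""
    total = 0
    for m in motifs:
        k = len(m)
        total += sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == m)
    return total


def toy_score(seq: str, enhancers, silencers) -> int:
    """Toy balance score: enhancer hits minus silencer hits (an assumption)."""
    return count_motifs(seq, enhancers) - count_motifs(seq, silencers)


def rank_substitutions(exon: str, enhancers, silencers) -> List[Tuple[int, str, str, int]]:
    """Return (1-based position, ref, alt, score change) for every single-nucleotide
    substitution, most negative change (largest predicted loss) first."""
    exon = exon.upper()
    wild_type = toy_score(exon, enhancers, silencers)
    results = []
    for i, ref in enumerate(exon):
        for alt in BASES:
            if alt == ref:
                continue
            mutant = exon[:i] + alt + exon[i + 1:]
            delta = toy_score(mutant, enhancers, silencers) - wild_type
            results.append((i + 1, ref, alt, delta))
    results.sort(key=lambda r: (r[3], r[0], r[2]))
    return results
```

With the toy motifs `{"GAA"}` and `{"TAA"}`, `rank_substitutions("GAAG", {"GAA"}, {"TAA"})` returns all 12 substitutions, headed by `(1, 'G', 'T', -2)`, because G>T at position 1 both destroys the enhancer and creates the silencer.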
ABSTRACT: Genetic predisposition to type 1 diabetes (T1D) has been associated with a chromosome 11 locus centered on the proinsulin gene (INS) and with differential steady-state levels of INS RNA from T1D-predisposing and -protective haplotypes. Here, we show that the haplotype-specific expression is determined by INS variants that control the splicing efficiency of intron 1. The adenine allele at IVS1-6 (rs689), which rapidly expanded in modern humans, renders the 3' splice site of this intron more dependent on the auxiliary factor of U2 small nuclear ribonucleoprotein (U2AF). This interaction required both zinc fingers of the 35-kD U2AF subunit (U2AF35) and was associated with repression of a competing 3' splice site in INS exon 2. Systematic mutagenesis of reporter constructs showed that intron 1 removal was facilitated by conserved guanosine-rich enhancers and identified additional splicing regulatory motifs in exon 2. Sequencing of intron 1 in primates revealed that relaxation of its 3' splice site in Hominidae coevolved with the introduction of a short upstream open reading frame, providing more efficient coupled control of splicing and translation. Depletion of SR proteins 9G8 and transformer-2 by RNA interference was associated with exon 2 skipping, whereas depletion of SRp20 was associated with increased representation of transcripts containing a cryptic 3' splice site in the last exon. Together, these findings reveal critical interactions underlying the allele-dependent INS expression and INS-mediated risk of T1D and suggest that the increased requirement for U2AF35 in higher primates may hinder thymic presentation of autoantigens encoded by transcripts with weak 3' splice sites.
Human Genetics 10/2010; 128(4):383-400. · 4.63 Impact Factor
ABSTRACT: DBASS3 and DBASS5 provide comprehensive repositories of new exon boundaries that were induced by pathogenic mutations in human disease genes. Aberrant 5'- and 3'-splice sites were activated either by mutations in the consensus sequences of natural exon-intron junctions (cryptic sites) or elsewhere ('de novo' sites). DBASS3 and DBASS5 currently contain approximately 900 records of cryptic and de novo 3'- and 5'-splice sites that were produced by over a thousand different mutations in approximately 360 genes. DBASS3 and DBASS5 data can be searched by disease phenotype, gene, mutation, location of aberrant splice sites in introns and exons and their distance from authentic counterparts, by bibliographic references and by the splice-site strength estimated with several prediction algorithms. The user can also retrieve reference sequences of both aberrant and authentic splice sites with the underlying mutation. These data will facilitate identification of introns or exons frequently involved in aberrant splicing, mutation analysis of human disease genes and study of germline or somatic mutations that impair RNA processing. Finally, this resource will be useful for fine-tuning splice-site prediction algorithms, better definition of auxiliary splicing signals and design of new reporter assays. DBASS3 and DBASS5 are freely available at http://www.dbass.org.uk/.
Nucleic Acids Research 10/2010; 39(Database issue):D86-91. · 8.28 Impact Factor
ABSTRACT: Polymorphic variants and mutations disrupting canonical splicing isoforms are among the leading causes of human hereditary disorders. While there is substantial evidence of aberrant splicing causing Mendelian diseases, the role of such events in multigenic disorders is not yet well understood. We have developed a new tool (SpliceScan II) for predicting the effects of genetic variants on splicing and cis-regulatory elements. The novel Bayesian non-canonical 5'GC splice site (SS) sensor used in our tool allows inference of non-canonical exons.
Our tool performed favorably when compared with existing methods in the context of genes linked to Autism Spectrum Disorder (ASD). SpliceScan II predicted more of the aberrant splicing isoforms triggered by mutations, as documented in the DBASS5 and DBASS3 aberrant splicing databases, than other existing methods. The detrimental effects of some polymorphic variants previously associated with Alzheimer's disease and breast cancer could be explained by changes in predicted splicing patterns.
We have developed SpliceScan II, an effective and sensitive tool for predicting the detrimental effects of genomic variants on splicing leading to Mendelian and complex hereditary disorders. The method could potentially be used to screen resequenced patient DNA to identify de novo mutations and polymorphic variants that could contribute to a genetic disorder.
ABSTRACT: Auxiliary splicing sequences play an important role in ensuring accurate and efficient splicing by promoting or repressing recognition of authentic splice sites. These cis-acting motifs have been termed splicing enhancers and silencers and are located both in introns and exons. They co-evolved into an intricate splicing code together with additional functional constraints, such as tissue-specific and alternative splicing patterns. We used orthologous exons extracted from the University of California Santa Cruz multiple genome alignments of human and 22 Tetrapoda organisms to predict candidate enhancers and silencers that have reproducible and statistically significant bias towards annotated exonic boundaries.
A total of 2,546 Tetrapoda enhancers and silencers were clustered into 15 putative core motifs based on their Markov properties. Most of these elements have been identified previously, but 118 putative silencers and 260 enhancers (~15%) were novel. Examination of previously published experimental data for the presence of predicted elements showed that mutations of these elements altered the splicing pattern as expected in 21/23 (91.3%) cases. Predicted intronic motifs flanking 3' and 5' splice sites had higher evolutionary conservation than other sequences within intronic flanks, and the intronic enhancers differed markedly between 3' and 5' intronic flanks.
The difference between intronic enhancers supporting 5' and 3' splice sites suggests independent splicing commitment for neighboring exons. The increased evolutionary conservation of intronic splicing enhancers and silencers (ISEs/ISSs) within intronic flanks, and the effect of modulating predicted elements on splicing, suggest that the identified elements are functionally significant in splicing regulation. Most of the elements identified have been shown to have direct implications in human splicing and could therefore be useful for building computational splicing models in biomedical research.
ABSTRACT: Transposable elements (TEs) make up half of the human genome, but the extent of their contribution to cryptic exon activation that results in genetic disease is unknown. Here, a comprehensive survey of 78 mutation-induced cryptic exons previously identified in 51 disease genes revealed the presence of TEs in 40 cases (51%). Most TE-containing exons were derived from short interspersed nuclear elements (SINEs), with Alus and mammalian interspersed repeats (MIRs) covering >18 and >16% of the exonized sequences, respectively. The majority of SINE-derived cryptic exons had splice sites at the same positions of the Alu/MIR consensus as existing SINE exons, and their inclusion in the mRNA was facilitated by phylogenetically conserved changes that improved both traditional and auxiliary splicing signals, thus marking intronic TEs amenable to pathogenic exonization. The overrepresentation of MIRs among TE exons is likely to result from their high average exon inclusion levels, which reflect their strong splice sites, a lack of splicing silencers and a high density of enhancers, particularly (G)AA(G) motifs. These elements were markedly depleted in antisense Alu exons, had the most prominent position on the exon-intron gradient scale and are proposed to promote exon definition through enhanced tertiary RNA interactions involving unpaired (di)adenosines. The identification of common mechanisms by which the most dynamic parts of the genome contribute both to new exon creation and to genetic disease will facilitate detection of intronic mutations and the development of computational tools that predict TE hot-spots of cryptic exon activation.
Human Genetics 10/2009; 127(2):135-54. · 4.63 Impact Factor
ABSTRACT: Mutations that affect splicing of precursor messenger RNAs play a major role in the development of hereditary diseases. Most splicing mutations have been found to eliminate GT or AG dinucleotides that define the 5' and 3' ends of introns, leading to exon skipping or cryptic splice-site activation. Although accurate description of the mis-spliced transcripts is critical for predicting phenotypic consequences of these alterations, their exact nature in affected individuals often cannot be determined experimentally. Using a comprehensive collection of exons that sustained cryptic splice-site activation or were skipped as a result of splice-site mutations, we have developed a multivariate logistic discrimination procedure that distinguishes the two aberrant splicing outcomes from DNA sequences. The new algorithm was validated using an independent sample of exons and implemented as a free online utility termed CRYP-SKIP (http://www.dbass.org.uk/cryp-skip/). The web application accepts one or more mutated alleles, each consisting of one exon and flanking intronic sequences, and provides a list of important predictor variables and their values, the overall probability of cryptic splice-site activation versus exon skipping, and the location and intrinsic strength of predicted cryptic splice sites in the input sequence. These results will facilitate phenotypic prediction of splicing mutations and provide further insights into splicing enhancer and silencer elements and their relative importance for splice-site selection in vivo.
European Journal of Human Genetics: EJHG 02/2009; 17(6):759-65. · 3.56 Impact Factor
ABSTRACT: Cryptic exons or pseudoexons are typically activated by point mutations that create GT or AG dinucleotides of new 5' or 3' splice sites in introns, often in repetitive elements. Here we describe two cases of tetrahydrobiopterin deficiency caused by mutations improving the branch point sequence and polypyrimidine tracts of repeat-containing pseudoexons in the PTS gene. In the first case, we demonstrate a novel pathway of antisense Alu exonization, resulting from an intronic deletion that removed the poly(T) tail of an antisense AluSq. The deletion brought a favorable branch point sequence within proximity of the pseudoexon 3' splice site and removed an upstream AG dinucleotide required for 3' splice site repression on normal alleles. New Alu exons can thus arise in the absence of poly(T) tails, which facilitated inclusion of most transposed elements in mRNAs by serving as polypyrimidine tracts, highlighting the extraordinary flexibility of Alu repeats in shaping intron-exon structure. In the other case, a PTS pseudoexon was activated by an A>T substitution 9 nt upstream of its 3' splice site in a LINE-2 sequence, providing the first example of a disease-causing exonization of the most ancient interspersed repeat. These observations expand the spectrum of mutational mechanisms that introduce repetitive sequences into mature transcripts and illustrate the importance of intronic mutations in alternative splicing and phenotypic variability of hereditary disorders.
Human Mutation 01/2009; 30(5):823-31. · 5.21 Impact Factor
ABSTRACT: Despite a growing number of splicing mutations found in hereditary diseases, utilization of aberrant splice sites and their effects on gene expression remain challenging to predict. We compiled sequences of 346 aberrant 5' splice sites (5'ss) that were activated by mutations in 166 human disease genes. Mutations within the 5'ss consensus accounted for 254 cryptic 5'ss and mutations elsewhere activated 92 de novo 5'ss. Point mutations leading to cryptic 5'ss activation were most common in the first intron nucleotide, followed by the fifth nucleotide. Substitutions at position +5 were exclusively G>A transitions, which was largely attributable to high mutability rates of C/G>T/A. However, the frequency of point mutations at position +5 was significantly higher than that observed in the Human Gene Mutation Database, suggesting that alterations of this position are particularly prone to aberrant splicing, possibly due to a requirement for sequential interactions with U1 and U6 snRNAs. Cryptic 5'ss were best predicted by computational algorithms that accommodate nucleotide dependencies and not by weight-matrix models. Discrimination of intronic 5'ss from their authentic counterparts was less effective than for exonic sites, as the former were intrinsically stronger than the latter. Computational prediction of exonic de novo 5'ss was poor, suggesting that their activation critically depends on exonic splicing enhancers or silencers. The authentic counterparts of aberrant 5'ss were significantly weaker than the average human 5'ss. The development of an online database of aberrant 5'ss will be useful for studying basic mechanisms of splice-site selection, identifying splicing mutations and optimizing splice-site prediction algorithms.
Nucleic Acids Research 02/2007; 35(13):4250-63. · 8.28 Impact Factor
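The 5'ss abstract contrasts weight-matrix models, which score each position independently, with algorithms that capture dependencies between positions. A minimal position weight matrix sketch over the conventional 9-nt window (3 exonic plus 6 intronic nucleotides) is shown below; the training sites are hypothetical, and the pseudocount and uniform 0.25 background are assumptions. The per-position sum is exactly the independence assumption that the abstract identifies as a weakness for predicting cryptic 5'ss.

```python
import math
from typing import Dict, List

# Hypothetical 9-nt 5'ss examples (positions -3..+6); not data from the study.
SITES = ["CAGGTAAGT", "AAGGTGAGT", "CAGGTAAGA", "GAGGTAAGG"]


def build_pwm(sites: List[str], pseudocount: float = 0.5) -> List[Dict[str, float]]:
    """Log2-odds weight matrix against a uniform 0.25 background."""
    sites = [s.upper() for s in sites]
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + 4 * pseudocount
    pwm = []
    for i in range(length):
        column = [s[i] for s in sites]
        pwm.append({b: math.log2((column.count(b) + pseudocount) / total / 0.25)
                    for b in "ACGT"})
    return pwm


def pwm_score(pwm: List[Dict[str, float]], site: str) -> float:
    """Sum of independent per-position log-odds (no nucleotide dependencies)."""
    return sum(col[b] for col, b in zip(pwm, site.upper()))
```

Scoring `"CAGGTAAGT"` against `"CAGATAAGT"` with `build_pwm(SITES)` shows the intronic +1G change costing about 3.2 bits, regardless of what surrounds it; a dependency-aware model could weight that loss differently depending on neighboring positions.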
ABSTRACT: Auxiliary splicing signals play a major role in the regulation of constitutive and alternative pre-mRNA splicing, but their relative importance in selection of mutation-induced cryptic or de novo splice sites is poorly understood. Here, we show that exonic sequences between authentic and aberrant splice sites that were activated by splice-site mutations in human disease genes have lower frequencies of splicing enhancers and higher frequencies of splicing silencers than average exons. Conversely, sequences between authentic and intronic aberrant splice sites have more enhancers and fewer silencers than average introns. Exons that were skipped as a result of splice-site mutations were smaller, had lower SF2/ASF motif scores, a decreased availability of decoy splice sites and a higher density of silencers than exons in which splice-site mutations activated cryptic splice sites. These four variables were the strongest predictors of the two aberrant splicing events in a logistic regression model. Elimination or weakening of predicted silencers in two reporters consistently promoted use of intron-proximal splice sites if these elements were maintained at their original positions, with their modular combinations producing the expected modification of splicing. Together, these results show the existence of a gradient in exon and intron definition at the level of pre-mRNA splicing and provide a basis for the development of computational tools that predict aberrant splicing outcomes.
Nucleic Acids Research 02/2007; 35(19):6399-413. · 8.28 Impact Factor
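The four variables named in this abstract (exon size, SF2/ASF motif score, availability of decoy splice sites and silencer density) feed a logistic regression model. The sketch below shows only the model form; every coefficient is a hypothetical placeholder whose sign merely follows the direction reported in the abstract (skipped exons were smaller, had lower SF2/ASF scores, fewer decoys and more silencers), and the feature names and `p_cryptic` are illustrative.

```python
import math
from typing import Dict

# Hypothetical coefficients for illustration only; NOT the fitted values from the study.
# Signs follow the reported direction of effect for cryptic-site activation vs skipping.
COEFFICIENTS = {
    "intercept": 0.0,
    "exon_length": 0.01,       # longer exons favour cryptic-site use
    "sf2asf_score": 0.5,       # higher SF2/ASF motif score favours cryptic-site use
    "decoy_sites": 0.8,        # more decoy splice sites favour cryptic-site use
    "silencer_density": -1.2,  # more silencers favour exon skipping
}


def p_cryptic(features: Dict[str, float], coef: Dict[str, float] = COEFFICIENTS) -> float:
    """Logistic probability of cryptic splice-site activation rather than exon skipping."""
    z = coef["intercept"] + sum(coef[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))
```

With all features at zero the linear predictor is 0 and the probability is 0.5; increasing exon length, or any other positively weighted feature, raises the predicted probability of cryptic splice-site activation.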
ABSTRACT: The branch point sequence (BPS) is a conserved splicing signal important for spliceosome assembly and lariat intron formation. BPS mutations may result in aberrant pre-mRNA splicing and genetic disorders, but their phenotypic consequences have been difficult to predict, largely due to the highly degenerate nature of the BPS consensus. Here, we have examined the splicing pattern of nine reporter pre-mRNAs that have previously been shown to give rise to human hereditary diseases as a result of single-nucleotide substitutions in the predicted BPS. Increased exon skipping and intron retention observed in vivo were recapitulated for each mutated pre-mRNA, but the reproducibility of cryptic splice site activation was lower. BP mutations in reporter pre-mRNAs frequently induced aberrant 3' splice sites and also activated a cryptic 5' splice site. Systematic mutagenesis of BP adenosines showed that in most pre-mRNAs, the expression of canonical transcripts was lower for BP transitions than for BP transversions. The differential splicing outcome for transitions vs. transversions was abrogated or reduced if introns were truncated to 200 nt or less, suggesting that the nature of the BP residue is less critical for interactions across very short introns. Together, these results improve prediction of phenotypic consequences of point mutations upstream of splice acceptor sites and suggest that the overrepresentation of disease-causing adenosine-to-guanosine BP substitutions observed in Mendelian disorders is due to more profound defects of gene expression at the level of pre-mRNA splicing.
Human Mutation 09/2006; 27(8):803-13. · 5.21 Impact Factor
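The branch point comparison hinges on classifying each substitution as a transition (purine to purine, or pyrimidine to pyrimidine) or a transversion. A small helper, with the hypothetical name `substitution_class`, makes the distinction explicit; for a branch point adenosine, A>G is the only possible transition.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
NUCLEOTIDES = PURINES | PYRIMIDINES


def substitution_class(ref: str, alt: str) -> str:
    """Return 'transition' or 'transversion' for a single-nucleotide substitution."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in NUCLEOTIDES or alt not in NUCLEOTIDES or ref == alt:
        raise ValueError(f"not a single-nucleotide substitution: {ref}>{alt}")
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"
```

For example, `substitution_class("A", "G")` returns `"transition"`, while A>C and A>T both return `"transversion"`.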
ABSTRACT: Alternative splicing of HLA-DQB1 exon 4 is allele-dependent and results in variable expression of soluble DQbeta. We have recently shown that differential inclusion of this exon in mature transcripts is largely due to intron 3 variants in the branch point sequence (BPS) and polypyrimidine tract. To identify additional regulatory cis-elements that contribute to haplotype-specific splicing of DQB1, we systematically examined the effect of guanosine (G) repeats on intron 3 removal. We found that the GGG or GGGG repeats generally improved splicing of DQB1 intron 3, except for those that were adjacent to the 5' splice site where they had the opposite effect. The most prominent splicing enhancement was conferred by GGGG motifs arranged in tandem upstream of the BPS. Replacement of a G-rich segment just 5' of the BPS with a series of random sequences markedly repressed splicing, whereas substitutions of a segment further upstream that lacked the G-rich elements and had the same size did not result in comparable splicing inhibition. Systematic mutagenesis of both suprabranch guanosine quadruplets (G(4)) revealed a key role of central G residues in splicing enhancement, whereas cytosines in these positions had the most prominent repressive effects. Together, these results show a significant role of tandem G(4)NG(4) structures in splicing of both complete and truncated DQB1 intron 3, support position dependency of G repeats in splicing promotion and inhibition, and identify positively and negatively acting sequences that contribute to the haplotype-specific DQB1 expression.
The Journal of Immunology 03/2006; 176(4):2381-8. · 5.52 Impact Factor
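The G-repeat analysis in the DQB1 abstract refers to GGG/GGGG runs and tandem G(4)NG(4) structures. The regex-based sketch below locates both; it assumes that N may be any of A, C, G or T, which the abstract does not specify, and the function names are hypothetical.

```python
import re
from typing import List, Tuple


def g_runs(seq: str, min_len: int = 3) -> List[Tuple[int, int]]:
    """Return 0-based, half-open (start, end) spans of maximal runs of >= min_len Gs."""
    return [(m.start(), m.end()) for m in re.finditer(r"G{%d,}" % min_len, seq.upper())]


def tandem_g4ng4(seq: str) -> List[int]:
    """Return 0-based start positions of every GGGGNGGGG match, overlaps included.

    The zero-width lookahead lets overlapping matches be reported.
    """
    return [m.start() for m in re.finditer(r"(?=GGGG[ACGT]GGGG)", seq.upper())]
```

In the hypothetical sequence `"TTGGGGAGGGGTT"`, `g_runs` reports the two quadruplets at `(2, 6)` and `(7, 11)`, and `tandem_g4ng4` reports a single G(4)NG(4) starting at position 2.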
ABSTRACT: Predisposition to type 1 diabetes and juvenile obesity is influenced by the susceptibility locus IDDM2 that includes the insulin gene (INS). Although the risk conferred by IDDM2 has been attributed to a minisatellite upstream of INS, intragenic variants have not been ruled out. We examined whether INS polymorphisms affect pre-mRNA splicing and proinsulin secretion using minigene reporter assays. We show that IVS1-6A/T (-23HphI+/-) is a key INS variant that influences alternative splicing of intron 1 through differential recognition of its 3' splice site. The A allele resulted in an increased production of mature transcripts with a long 5' leader in several cell lines, and the extended mRNAs generated more proinsulin in culture supernatants than natural transcripts. The longer mRNAs were significantly overrepresented among beta-cell-expressed sequenced tags containing the A allele as compared with those with T alleles. In addition, we show that a rare insertion/deletion polymorphism IVS1+5insTTGC (IVS-69), which is exclusively present in Africans, activated a downstream cryptic 5' splice site, extending the 5' leader by 30 bp. These results indicate that -23HphI and IVS-69 are the most important INS variants affecting pre-mRNA splicing and suggest that -23HphI+/- is a common functional single nucleotide polymorphism at IDDM2.
ABSTRACT: The frequency distribution of mutation-induced aberrant 3' splice sites (3'ss) in exons and introns is more complex than for 5' splice sites, largely owing to sequence constraints upstream of intron/exon boundaries. As a result, prediction of their localization remains a challenging task. Here, the nucleotide sequences of 218 previously reported aberrant 3'ss activated by disease-causing mutations in 131 human genes were compared with their authentic counterparts using currently available splice site prediction tools. Each tested algorithm distinguished authentic 3'ss from cryptic sites more effectively than from de novo sites. The best discrimination between aberrant and authentic 3'ss was achieved by the maximum entropy model. Almost half of the aberrant 3'ss were activated by AG-creating mutations, and approximately 95% of the newly created AGs were selected in vivo. The overall nucleotide structure upstream of aberrant 3'ss was characterized by higher purine content than for authentic sites, particularly in position -3, which may be compensated by more stringent requirements for positive and negative nucleotide signatures centred around position -11. A newly developed online database of aberrant 3'ss will facilitate identification of splicing mutations in a gene or phenotype of interest and future optimization of splice site prediction tools.
Nucleic Acids Research 02/2006; 34(16):4630-41. · 8.28 Impact Factor
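Because almost half of the aberrant 3'ss were activated by AG-creating mutations, a first-pass screen can simply ask whether a substitution creates an AG dinucleotide that was absent from the wild-type sequence. The sketch assumes aligned, equal-length sequences (substitutions only) and uses a hypothetical helper name; it does not model the purine content or the position -3 and -11 signatures mentioned in the abstract.

```python
from typing import List


def new_ag_positions(wild_type: str, mutant: str) -> List[int]:
    """Return 0-based positions of AG dinucleotides present in the mutant but not
    in the wild type. A new 3'ss would place the exon start at position + 2."""
    if len(wild_type) != len(mutant):
        raise ValueError("sequences must be aligned and of equal length")
    wt, mt = wild_type.upper(), mutant.upper()
    wt_ag = {i for i in range(len(wt) - 1) if wt[i:i + 2] == "AG"}
    mt_ag = {i for i in range(len(mt) - 1) if mt[i:i + 2] == "AG"}
    return sorted(mt_ag - wt_ag)
```

For the hypothetical pair `"TTTCTGCC"` and `"TTTCAGCC"`, the T>A substitution creates an AG at position 4, so the function returns `[4]`.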
ABSTRACT: Auxiliary splicing signals in introns play an important role in splice site selection, but these elements are poorly understood. We show that a subset of serine/arginine (SR)-rich proteins activate a cryptic 3' splice site in a sense Alu repeat located in intron 4 of the human LST1 gene. Utilization of this cryptic splice site is controlled by juxtaposed Alu-derived splicing silencers and enhancers between closely linked short tandem repeats TNFd and TNFe. Systematic mutagenesis of these elements showed that AG dinucleotides that were not preceded by purine residues were critical for repressing exon inclusion of a chimeric splicing reporter. Since the splice acceptor-like sequences are present in excess in exonic splicing silencers, these signals may contribute to inhibition of a large number of pseudosites in primate genomes.
Molecular and Cellular Biology 09/2005; 25(16):6912-20. · 5.37 Impact Factor
ABSTRACT: Nonsense-mediated mRNA decay (NMD) is a eukaryotic quality-control mechanism that detects and degrades aberrant transcripts prematurely terminating translation. NMD may be elicited by intergenic transcripts that contain premature termination codons (PTCs), but chimeric mRNAs of genes that have introns of identical phase would be predicted to lack PTCs and escape NMD. We examined HLA class II genes containing phase 1 introns for the presence of intergenic mRNAs and found an extraordinary diversity of correctly spliced and polyadenylated intergenic transcripts. They lacked significant homology at the chimeric joins and had no PTCs. Their expression levels were very low and positively correlated with the expression of natural transcripts. In contrast, pair-wise mixtures of separately transcribed plasmids carrying full-length HLA-DQB1, -DQA1, -DRB1, and -DRA cDNAs produced only hybrid molecules that lacked canonical exon boundaries, had homologous chimeric joins, and occasionally contained PTCs, implicating in vitro artifacts generated by template switching of Taq polymerase and reverse transcriptase. The differential exon structure of hybrid molecules observed in vitro and in cellular RNA preparations suggests that intergenic mRNAs with canonical exon boundaries arise in vivo during exon joining and/or transcription. Since the observed intergenic mRNAs may encode mixed class II heterodimers that were previously shown to present antigens, it will be interesting to determine the functional properties of such molecules in future studies.
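The NMD abstract's premise is that joining exons across introns of the same phase keeps the reading frame, so such chimeric mRNAs need not contain PTCs. The sketch below scans a coding sequence for in-frame stop codons before the final codon; `premature_stop_codons` is a hypothetical name, and the sequences in the usage note are hypothetical, with a phase 1 upstream segment ending one nucleotide into a codon and a downstream segment supplying the remaining two.

```python
from typing import List

STOP_CODONS = {"TAA", "TAG", "TGA"}


def premature_stop_codons(cds: str) -> List[int]:
    """Return 0-based codon indices of in-frame stop codons other than the last codon.

    The sequence is read from its first nucleotide; trailing nucleotides that do
    not complete a codon are ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    return [i for i, codon in enumerate(codons[:-1]) if codon in STOP_CODONS]
```

Joining `"ATGGCCG"` (ending one nucleotide into a codon) to `"CTTGGGAATAA"` gives `ATG GCC GCT TGG GAA TAA`, and `premature_stop_codons` returns an empty list, whereas `"ATGTAAGCCTAA"` returns `[1]`.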