www.impactjournals.com/oncotarget/ Oncotarget, Advance Publications 2016
Pseudoexons provide a mechanism for allele-specic expression
of APC in familial adenomatous polyposis
Taina T. Nieminen1, Walter Pavicic1,2, Noora Porkka1, Matti Kankainen3, Heikki J. Järvinen4, Anna Lepistö5, Päivi Peltomäki1
1University of Helsinki, Medical and Clinical Genetics, Helsinki, Finland
2Laboratorio de Citogenética y Mutagénesis, Instituto Multidisciplinario de Biología Celular (IMBICE-CONICET-CICPBA), La Plata, Argentina
3University of Helsinki, Institute for Molecular Medicine Finland, Helsinki, Finland
4Second Department of Surgery, Helsinki University Central Hospital, Helsinki, Finland
5Department of Colorectal Surgery, Abdominal Center, Helsinki University Hospital, Helsinki, Finland
Correspondence to: Taina T. Nieminen, email: Taina.Nieminen@Helsinki.Fi
Keywords: familial adenomatous polyposis, APC, pseudoexon, RNA-seq, allele-specific expression
Received: February 09, 2016 Accepted: September 12, 2016 Published: September 23, 2016
ABSTRACT
Allele-specic expression (ASE) of the Adenomatous Polyposis Coli (APC) gene
occurs in up to one-third of families with adenomatous polyposis (FAP) that have
screened mutation-negative by conventional techniques. To advance our understanding
of the genomic basis of this phenomenon, 54 APC mutation-negative families (21
with classical FAP and 33 with attenuated FAP, AFAP) were investigated. We focused
on four families with validated ASE and scrutinized these families by sequencing of
the blood transcriptomes (RNA-seq) and genomes (WGS). Three families, two with
classical FAP and one with AFAP, revealed deep intronic mutations associated with
pseudoexons. In all three families, intronic mutations (c.646-1806T>G in intron 6,
c.1408+729A>G in intron 11, and c.1408+731C>T in intron 11) created new splice
donor sites resulting in the insertion of intronic sequences (of 127 bp, 83 bp, and 83
bp, respectively) in the APC transcript. The respective intronic mutations were absent
in the remaining polyposis families and the general population. The predicted
premature termination of translation, together with co-segregation with polyposis,
supported the pathogenicity of the pseudoexons. We conclude that next-generation
sequencing of RNA and genomic DNA is an effective strategy to reveal and validate
pseudoexons that are regularly missed by traditional screening methods, and that it is
worth considering in apparently mutation-negative polyposis families.
INTRODUCTION
Familial adenomatous polyposis (FAP; OMIM
#175100) is characterized by a dominant predisposition
to multiple adenomatous polyps throughout the colon and
rectum as a consequence of germline mutations in the
Adenomatous Polyposis Coli (APC) gene [1]. While FAP
is mostly an inherited disease, up to 25% of cases may
result from de novo APC mutations in individuals without any
family history of the disease [2]. The number of adenomatous
polyps in the bowel is used to stratify APC-associated
polyposis into a classical form (FAP; 100 adenomas or
more) and an attenuated form (AFAP; fewer than 100 adenomas).
The two phenotypes also differ in the age at onset
of polyposis (second or third decade of life in
FAP vs. later in AFAP), colonic location (left-sided disease
in FAP vs. frequently right-sided disease in AFAP), and
lifetime risk of colorectal cancer (100% in FAP vs. up to
70% in AFAP) [1, 3].
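The 100-adenoma cut-off can be illustrated with a small helper that classifies polyp counts written the way they appear in Table 1. This is a minimal Python sketch, not a procedure from the study: the function name `classify_polyposis` is hypothetical, and the helper assumes that a range or a qualified count (for example "60-100", ">100", "~10") is judged by the largest number it contains.

```python
import re

CUTOFF = 100  # adenoma count separating classical FAP (>= 100) from AFAP (< 100)


def classify_polyposis(count: str) -> str:
    """Classify a polyp count recorded as in Table 1, e.g. '2000', '100-200', '>100', '~10'.

    Assumption (hypothetical rule, not from the study): ranges and qualified
    counts are judged by the largest number they contain.
    """
    numbers = [int(n) for n in re.findall(r"\d+", count)]
    if not numbers:
        raise ValueError(f"no adenoma count in {count!r}")
    return "FAP" if max(numbers) >= CUTOFF else "AFAP"


for recorded in ["2000", "100-200", "60-100", ">50", "~10", "2-3"]:
    print(recorded, classify_polyposis(recorded))
```

Entries that are not adenoma counts, such as "Colon cancer x 2" for case 165-1, should be excluded before calling the helper, because the regular expression would otherwise pick up the digit 2.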
The APC gene has 16 exons, and translation starts
from exon 2 (http://insight-group.org/variants/database/APC).
More than 1,500 unique germline mutations in APC
are known [4]. The frequency of detectable APC mutations
in polyposis patients varies considerably depending on how
the patients and families were ascertained and on the
strategies used for mutation screening. In a large cohort
of individuals who had undergone clinical genetic testing
because of a personal or family history of polyposis,
58% (851/1457) of those with classic polyposis and 9%
(376/4223) of those with AFAP had APC mutations by
exon-specific sequencing and large rearrangement analysis
of the APC gene [3]. Moreover, Grover et al. [3] found that
the APC mutation rate increased progressively with the
cumulative adenoma count (reaching 80% in individuals with
at least 1,000 adenomas), whereas the mutation rate
of MUTYH, another polyposis-associated gene,
remained constant (below 10%) across all polyp-number
categories. Sanger sequencing of genomic DNA
to examine the coding exons and intron-exon boundaries
of APC, combined with multiplex ligation-dependent
probe amplification (MLPA) for large rearrangements, is
the standard mutation screening strategy adopted by most
laboratories [4]. The protein truncation test (PTT) was
commonly used in previous years and may be beneficial
in certain situations [5]. In a typical PTT design, APC
exons are examined in RNA, except for the last exon,
which is investigated in genomic DNA. Nevertheless,
over 20% of classical FAP and up to 80% of AFAP
patients remain APC mutation-negative, which may be
attributable to methodological limitations in detecting
particular types of mutations [5-8], nontruncating
alterations of uncertain pathogenic significance [2], and
susceptibility conferred by genes other than APC, such
as MUTYH [3], POLE and POLD1 [9], and AXIN2 [10].
Unbalanced expression of the two parental alleles,
due to loss-of-function mutations or various cis- or
trans-acting factors, may facilitate the identification of
susceptibility genes for human diseases [11]. APC mutations
occurring before the last exon of the gene are associated
with allele-specific expression (ASE) [12]. ASE imbalance
of APC has been found in blood samples from 9–31%
of adenomatous polyposis families without any APC mutations
detectable by conventional techniques, suggesting the
existence of hidden mutations [12–14]. Moreover, ASE of
APC may contribute to common forms of colorectal cancer,
as colorectal cancer risk has been shown to increase
with increasing ASE imbalance [15].
This study was undertaken to address the underlying
basis of predisposition in 54 APC mutation-negative
adenomatous polyposis families from Finland, with a
particular focus on four families showing constitutionally
unbalanced mRNA expression of APC alleles by Single
Nucleotide Primer Extension (SNuPE) [13]. Interrogation of
these four families by whole-transcriptome (RNA-seq) and
whole-genome (WGS) sequencing revealed deep intronic
mutations associated with pseudoexons in three of them.
RESULTS
Identication of pseudoexons by RNA-seq and
deep intronic mutations as their underlying
causes
We focused on three FAP families (42, 85, and 103)
from the research-based cohort (Figure 1 and Table 1). The
families showed ASE imbalance of APC by SNuPE, but
no identifiable causative change in APC had been
detected by PTT, Sanger sequencing of all exons
and intron/exon borders, MLPA, or promoter mutation
and methylation analyses (ref. [8] and this study). Only
family 85 included several affected members. Of these,
85-1 [13] and 85-2 (Supplementary Figure S1) showed
ASE imbalance, whereas 85-3 was uninformative in ASE
analysis due to homozygosity for polymorphisms. No
RNA was available from 85-4.
Blood RNA specimens from the three ASE families
were subjected to RNA-seq. Data analysis revealed
aberrant splice junctions in families 42 and 85 that raised
a suspicion of pseudoexons, i.e., inclusion of intronic
sequence in the mature mRNA (Figure 2).
To verify the pseudoexons, APC cDNA was amplified in
five overlapping fragments with primers described by
Spier et al. [6]; in addition, primers in exons
11 (forward) and 13 (reverse) were used to evaluate the
suspected pseudoexon in family 85 (Supplementary Table
S1 and Figure 3). Sequencing of reverse transcription
(RT)-PCR products (fragment 2 in family 42, and fragment
4 as well as the exon 11–13-specific fragment in family 85)
revealed a 127-bp insertion from intron 6 in family 42 and
an 83-bp insertion from intron 11 in family 85.
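Aberrant splice junctions of this kind appear in RNA-seq alignments as split reads whose CIGAR strings contain `N` (skipped reference) operations. The minimal Python sketch below tallies junctions from (alignment start, CIGAR) pairs; it is not the pipeline used in this study, and the function names and toy coordinates are hypothetical. A junction whose acceptor end falls inside an intron and that is supported mainly in patient samples is the kind of event that would prompt a pseudoexon check.

```python
import re
from collections import Counter

# CIGAR operations that consume reference bases (SAM specification)
REF_CONSUMING = set("MDN=X")


def junctions(pos: int, cigar: str):
    """Yield (intron_start, intron_end), 1-based inclusive, for each N operation.

    pos is the 1-based leftmost mapped reference position (SAM POS field).
    """
    ref = pos
    for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        length = int(length)
        if op == "N":
            yield (ref, ref + length - 1)
        if op in REF_CONSUMING:
            ref += length


def junction_depth(alignments):
    """Count how many split reads support each junction (the 'junction depth')."""
    depth = Counter()
    for pos, cigar in alignments:
        depth.update(junctions(pos, cigar))
    return depth


# Hypothetical toy alignments: two reads skip a 500-bp intron; one read
# splices to an acceptor inside that intron, as a pseudoexon-containing read would.
reads = [(1000, "50M500N50M"), (1010, "40M500N60M"), (1020, "30M200N70M")]
print(junction_depth(reads))
```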
Because the predisposing mutations in these families
were unknown, WGS was performed on blood DNA. Mutations
in the APC coding region and at exon/intron borders had
already been screened for (see Materials and
Methods); WGS in particular made it possible to
investigate the entire introns of APC as well as regions
outside APC. Families 42 and 85 revealed deep intronic
mutations, both creating new splice donor sites (/gt):
c.646-1806T>G in intron 6 and c.1408+731C>T in intron
11 of APC, respectively (Figure 4). The changes were
validated by Sanger sequencing.
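Both substitutions create a new intronic gt dinucleotide, the nearly invariant core of a splice donor site: c.646-1806T>G supplies the g and c.1408+731C>T supplies the t. The minimal Python sketch below flags such gain-of-gt events for a single-nucleotide substitution and counts how many positions of the surrounding 8-nt window agree with the AG/gtragt donor consensus given in the Discussion. The function names and sequences are hypothetical toy examples, not the APC intron 6 or 11 sequences, and simple consensus counting is far cruder than dedicated predictors such as the BDGP program used in this study.

```python
# IUPAC codes needed for the donor consensus AG/gtragt (R = A or G)
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT", "N": "ACGT"}
DONOR_CONSENSUS = "AGGTRAGT"  # 2 exonic + 6 intronic positions; gt at indices 2-3


def consensus_matches(window: str) -> int:
    """Number of positions in an 8-nt window that agree with AG/gtragt."""
    return sum(base in IUPAC[code] for base, code in zip(window.upper(), DONOR_CONSENSUS))


def new_donor_gt(ref: str, pos: int, alt: str):
    """Return (gt_start, matches) for each GT created by replacing ref[pos] with alt.

    Positions are 0-based; only GT dinucleotides absent from the reference at
    the same position are reported (hypothetical helper for illustration).
    """
    mut = ref[:pos] + alt + ref[pos + 1:]
    hits = []
    for start in (pos - 1, pos):  # the changed base can be the g or the t of gt
        if start < 2 or start + 6 > len(mut):
            continue  # not enough flanking sequence for the 8-nt window
        if mut[start:start + 2].upper() == "GT" and ref[start:start + 2].upper() != "GT":
            hits.append((start, consensus_matches(mut[start - 2:start + 6])))
    return hits


# Toy T>G creating the g (cf. family 42) and C>T creating the t (cf. family 85)
print(new_donor_gt("CCAGTTAAGTCC", 4, "G"))
print(new_donor_gt("CCAGGCAAGTCC", 5, "T"))
```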
The remaining 51 families (Table 1) were
subsequently screened by Sanger sequencing with primers
from introns 6 and 11 of APC (Supplementary Table S1)
for the deep intronic mutations
identified in families 42 and 85. These particular mutations
were absent in the remaining families. However,
family 163 unexpectedly revealed another nucleotide
substitution (c.1408+729A>G) two nucleotides upstream
of the mutation present in family 85 (Figure 4). The
nucleotide change in family 163 was predicted to activate
the same cryptic splice donor site (AG/gt) as the mutation
in family 85 (the two substitutions created an
apparently viable AG/ and /gt, respectively) (Figure 5).
Family 163 belonged to the clinic-based cohort, for which
only DNA was routinely available. However, we were able
to obtain RNA from the single affected family member in
a separate effort. RNA-seq (Figure 2) and RT-PCR (Figure
3) identified an 83-bp insertion from intron 11, identical
to that in family 85. Unlike the deep intronic mutations
of families 42 and 85, the c.1408+729A>G mutation was
itself part of the resulting transcript (Figure 5).
Figure 1: Pedigrees of ASE families. Pedigrees of adenomatous polyposis families with ASE. Individuals with polyposis and/or
colorectal cancer are indicated (see Table 1 for additional clinical details). Plus sign denotes carriers of deep intronic mutations associated
with pseudoexons of APC. Index persons are marked with arrows.
Table 1: Clinical and molecular characteristics of polyposis cases investigated
Case ID (a) | ASE status (b) | Inheritance pattern | Number of polyps | Age at diagnosis (c) | Extracolonic manifestations (d) | Classification of family (e) | Large rearrangement by MLPA (f) | APC methylation by MS-MLPA (g)
Research-based cohort
42 | ASE | sporadic | 100-1000 | 40 | No | FAP | No | No
78 | N | sporadic | 50 | 55 | No | AFAP | No | No
85-1 | ASE | dominant | 2000 | 38 | Yes | FAP | No | No
85-2 | ASE | dominant | 100-200 | 16 | No | FAP | No | No
85-3 | NI | dominant | 2000 | 44 | No | FAP | NA | No
85-4 |  | dominant | 100-200 | 12 | No | FAP | NA | No
88 | N | sporadic | 100-1000 | 58 | No | FAP | No | No
92 | N | sporadic | 200 | 51 | No | FAP | No | No
96 | NI | sporadic | 561 | 48 | No | FAP | No | No
97 | N | sporadic | >1000 | 58 | No | FAP | No | No
98 |  | dominant | 100-1000 | 30 | No | FAP | No | No
100 | NI | sporadic | 30 | 62 | No | AFAP | No | No
103 | (ASE) | sporadic | >100 | 51 | No | FAP | No | No
104 | N | dominant? | 210 | 54 | Yes | FAP | No | No
111 | NI | sporadic | 30-40 | 36 | No | AFAP | No | No
123 | NI | sporadic | 2100 | 37 | No | FAP | No | No
125 | N | sporadic | 300 | 31 | No | FAP | No | No
Clinic-based cohort
134 |  | sporadic | 200-300 | 55 | No | FAP | No | No
136 |  | sporadic | >100 | 67 | Yes | FAP | No | No
139 |  | sporadic | 100 | 71 | No | FAP | No | No
145 |  | recessive | 20-50 | 61 | No | AFAP | No | No
148 |  | sporadic | 150-200 | 50 | No | FAP | No | No
158 |  | sporadic | 50 | 49 | Yes | AFAP | No | No
159 |  | sporadic | 200 | 50 | No | FAP | No | No
162 |  | sporadic | >50 | 52 | No | AFAP | No | No
163 | (ASE) | sporadic | 10-20 | 16 | Yes | AFAP | No | No
165-1 |  | dominant? | Colon cancer x 2 | 50 | NA | AFAP | No | No
165-2 |  | dominant? | 20-30 | 33 | Yes | AFAP | No | No
168 |  | sporadic | 100 | 56 | Yes | FAP | No | No
177 |  | sporadic | 100-200 | 52 | No | FAP | No | No
179 |  | dominant? | >10 | 23 | No | AFAP | No | No
180 |  | NA | >100 | 38 | NA | FAP | No | No
1001 |  | dominant | 10 | 48 | NA | AFAP | No | No
1003 |  | sporadic | 20-30 | 70 | NA | AFAP | No | No
1005 |  | dominant | 10-20 | 68 | Yes | AFAP | No | No
1006 |  | sporadic | 20 | 60 | No | AFAP | No | No
1007 |  | sporadic | 20 | 30 | No | AFAP | No | No
1010 |  | dominant | 5-10 | 68 | NA | AFAP | No | No
1011 |  | NA | 60-100 | 31 | NA | FAP | No | No
1013 |  | sporadic | >100 | 48 | NA | FAP | No | No
1015 |  | sporadic | 10 | 47 | Yes | AFAP | No | No
1017 |  | sporadic? | 10-20 | 57 | NA | AFAP | No | No
1018 |  | sporadic | 20-30 | 74 | NA | AFAP | No | No
1019 |  | sporadic | 2-3 | 30 | Yes | AFAP | No | No
1020 |  | sporadic | 3 | 35 | Yes | AFAP | No | No
1021 |  | sporadic | 30 | 72 | NA | AFAP | No | No
1022 |  | dominant | 3 | 65 | Yes | AFAP | No | No
1023 |  | sporadic | 40 | 33 | NA | AFAP | No | No
1024 |  | sporadic | 20 | 72 | NA | AFAP | No | No
1025 |  | sporadic | 20-30 | 67 | NA | AFAP | No | No
1026 |  | sporadic | 10-20 | 51 | NA | AFAP | No | No
1029 |  | sporadic | 20-30 | 56 | No | AFAP | No | No
(Continued)
Analysis of the individual RNA reads containing the pseudoexon
confirmed the presence of the variant nucleotide (G) at the
position of the mutation, indicating that the variant
nucleotide was specifically associated with pseudoexon
formation (Supplementary Figure S2). Finally, SNuPE
analysis of cDNA from the index individual of family
163 showed putative ASE, with a value of 1.7 for the
ratio of allelic peak areas in cDNA relative to genomic
DNA at rs2229992 (Supplementary Figure S1).
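The SNuPE ASE value is a ratio of ratios: the allelic peak-area ratio in cDNA divided by the same ratio in genomic DNA, where a heterozygous SNP is present in a 1:1 allele ratio. The minimal Python sketch below shows that normalization; the function name and the peak areas are hypothetical and are not the measurements from family 163.

```python
def snupe_ase_ratio(cdna_a1: float, cdna_a2: float, gdna_a1: float, gdna_a2: float) -> float:
    """ASE ratio R = (allele 1 / allele 2 peak areas in cDNA) / (allele 1 / allele 2 in gDNA).

    Dividing by the genomic DNA ratio corrects for unequal signal of the two
    extension products, because gDNA of a heterozygote carries both alleles 1:1.
    """
    if min(cdna_a1, cdna_a2, gdna_a1, gdna_a2) <= 0:
        raise ValueError("peak areas must be positive")
    return (cdna_a1 / cdna_a2) / (gdna_a1 / gdna_a2)


# Hypothetical peak areas: allele 1 gives a 20% stronger signal than allele 2
# in gDNA, so an observed cDNA ratio of 2.4 corresponds to R = 2.4 / 1.2 = 2.0.
print(snupe_ase_ratio(2400, 1000, 1200, 1000))
```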
Pathogenicity of pseudoexons
The pseudoexon ndings are summarized in Table
2. The pseudoexons in families 42, 85, and 163 were
all predicted to cause premature stop of translation; the
very rst three nucleotides of the pseudoexon in family
42 coded for a stop of translation, whereas in families
85 and 163 the pseudoexon caused a frameshift and a
premature stop 55 codons later. The following evidence
supports the idea that the pseudoexons underlay polyposis
predisposition in all three families. First, the splice
prediction program BDGP (Materials and Methods)
indicated a splice efciency of 99% for the new splice
donor sites introduced in intron 6 in family 42 and intron
11 in family 85. The new splice donor site in intron 11 in
family 163 did not match with the canonical splice site
model and was therefore not recognized by the splice
prediction programs. However, RNA-seq and our cloning
experiment showed that all pseudoexon-containing
transcripts had the variant nucleotide G in the 3’ end of
the pseudoexon (Supplementary Figure S2). Moreover,
our cloning experiment (on the ex 11 – 13 fragment, see
legend to Figure 3) combined with haplotype analysis
(with SNuPE markers) suggested that all transcripts
representing the mutant allele, as inferred from haplotypes,
had the pseudoexon inserted (data not shown). Second, the
intronic variant showed a complete co-segregation with
polyposis in family 85 (Figure 1). The variants were also
absent in the general population (ExAC Browser Beta,
SISu and Ensembl databases and our investigation of 300
anonymous blood donors from Finland). Finally, WGS
data available for families 42 and 85 revealed no other
apparently pathogenic mutations in established cancer
genes as possible alternative explanations for polyposis
predisposition.
SNuPE vs. RNA-seq in the detection of ASE
The ASE diagnoses of the four families (42, 85, 103,
and 163) with unbalanced expression of APC alleles in
our series (Table 1) were initially based on SNuPE. To
evaluate whether ASE imbalance was also recoverable in
RNA-seq data, a genome-wide ASE imbalance analysis was
performed as described in Materials and Methods. The
results are given in Supplementary Table S2. Applying
stringent criteria for ASE, FAP42 and FAP85 (individual
85-2) revealed unequivocal ASE for APC (q-value <
0.05). Three APC-mutation-positive cases not belonging
to the study series specified in Table 1 were also included,
and ASE was detected in one (the remaining two were
uninformative). FAP85 (individual 85-1) and FAP103,
as well as healthy control sample 3, showed borderline
ASE, which was, however, not statistically significant
after multiple hypothesis correction (q-values between
0.05 and 0.15). The ASE value for APC in AFAP163 did not
reach statistical significance. As shown in Supplementary
Table S2, the overall concordance between ASE results by
SNuPE and RNA-seq was high.
Table 1 (continued): Clinical and molecular characteristics of polyposis cases investigated
Case ID (a) | ASE status (b) | Inheritance pattern | Number of polyps | Age at diagnosis (c) | Extracolonic manifestations (d) | Classification of family (e) | Large rearrangement by MLPA (f) | APC methylation by MS-MLPA (g)
1030 |  | sporadic | >10 | 59 | No | AFAP | No | No
1032 |  | NA | 8 | 63 | Yes | AFAP | No | No
1034 |  | NA | >10 | 62 | No | AFAP | No | No
1035 |  | sporadic | 20-30 | 71 | No | AFAP | No | No
1036 |  | dominant | >10 | 61 | No | AFAP | No | No
1037 |  | sporadic | ~10 | 52 | No | AFAP | No | No
(a) Identification number of family, followed by identification number of individual if several family members were studied.
(b) ASE, shows allele-specific expression of APC; (ASE), putative ASE (see Materials and Methods); N, no ASE; NI, not informative (homozygous); blank, no RNA available.
(c) Polyposis or colorectal carcinoma, whichever comes first.
(d) Desmoids and duodenal adenomas in particular.
(e) Based on the number of intestinal adenomas with 100 as the cut-off.
(f) P043-C1 assay from MRC-Holland.
(g) ME001-C1 assay from MRC-Holland.
NA, information not available.
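The genome-wide RNA-seq analysis summarized in "SNuPE vs. RNA-seq in the detection of ASE" used GATK and MBASED, with q-values after multiple-hypothesis correction. MBASED combines evidence across all heterozygous sites of a gene in a meta-analysis framework; the Python sketch below is a deliberately simpler stand-in, not MBASED: an exact two-sided binomial test of alternative versus total read counts at a single heterozygous SNP per gene, followed by Benjamini-Hochberg q-values. Gene names and read counts are hypothetical.

```python
from math import comb


def binom_two_sided_p(k: int, n: int) -> float:
    """Exact two-sided binomial test of k alt reads out of n against p = 0.5.

    For p = 0.5 the distribution is symmetric, so the two-sided p-value is
    twice the smaller tail, capped at 1.
    """
    tail = min(k, n - k)
    p_tail = sum(comb(n, i) for i in range(tail + 1)) / 2 ** n
    return min(1.0, 2 * p_tail)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted q-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value to the smallest
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q


# Hypothetical (alt reads, total reads) at one heterozygous SNP per gene
counts = {"GENE_A": (12, 60), "GENE_B": (29, 60), "GENE_C": (31, 58)}
pvals = [binom_two_sided_p(k, n) for k, n in counts.values()]
for gene, p, q in zip(counts, pvals, benjamini_hochberg(pvals)):
    print(f"{gene}: p={p:.2g} q={q:.2g}")
```

Testing against an expected fraction of exactly 0.5 ignores reference-mapping bias, one of the reasons dedicated tools model allelic counts more carefully.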
DISCUSSION
Canonical splice-site sequences at the intron/exon
borders define exons. The canonical 5’ (splice donor) site
has the consensus sequence AG/gtragt and the 3’ (splice
acceptor) site poly(y)nyag/G, where capital letters indicate
exonic and lowercase letters intronic sequence, r denotes
a purine, y a pyrimidine, and n any nucleotide; the intronic
gt and ag dinucleotides are nearly invariant [16]. Pseudoexons
are intronic sequences of 50–300 bp in length that have
apparent 5’ and 3’ splice sites but are normally ignored
by the splicing machinery [17, 18]. Pseudoexons can be
activated by mutations that create viable splice donor
or acceptor sites by different mechanisms, resulting in
the insertion of intronic sequences into the mature mRNA
[19]. Such mutations can be inherited and may cause
predisposition to cancer syndromes, including
ataxia-telangiectasia (ATM) [20], breast and ovarian cancer
(BRCA2) [21], Lynch syndrome (MSH2) [22], and familial
adenomatous polyposis (APC) (ref. [6] and this study).
From the therapeutic point of view, their location far outside the
coding sequence makes deep intronic mutations excellent
candidates for correction by antisense oligonucleotides to
restore the production of normal protein [20, 21].
Figure 2: RNA-seq (42, 85-2, and 163). Sashimi plots visualizing splice junctions. IGV display of RNA-seq data is provided for
an affected representative of each family and, for reference, a healthy control individual for each region. Sequence alignments are based
on TopHat. The region between APC exons 6 and 7 (GRCh37/Hg19) is shown for FAP42 (Figure 2A) and that between exons 11 and 12
for FAP85 (individual 85-2) and AFAP163 (Figure 2B). The locations of pseudoexons are indicated by horizontal bars. A 54-bp in-frame
insertion that is also present in the normal reference sample and not associated with any genomic change is denoted by a dashed bar (Figure
2B). The same insertion was discovered in an earlier investigation [12]. Numbers on the plots indicate junction depth, i.e., the number of
reads supporting each splice junction. Splice events corresponding to pseudoexons are boxed and those associated with the 54-bp insertion
are underlined; the remaining ones represent canonical splicing.
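The operational definition of a pseudoexon given at the start of this Discussion (an intronic segment of 50–300 bp bounded by an apparent acceptor ag and an apparent donor gt) can be turned into a naive scan. The Python sketch below is only illustrative: the function name and the toy intron are hypothetical, and the scan ignores splice-site strength and splicing regulatory elements, which is why real introns yield many more such candidates than are ever used by the splicing machinery.

```python
def candidate_pseudoexons(intron: str, min_len: int = 50, max_len: int = 300):
    """Return (start, end) 0-based half-open spans that begin right after an 'ag'
    and end right before a 'gt', with a length between min_len and max_len."""
    seq = intron.lower()
    segment_starts = [i + 2 for i in range(len(seq) - 1) if seq[i:i + 2] == "ag"]
    segment_ends = [j for j in range(len(seq) - 1) if seq[j:j + 2] == "gt"]
    spans = []
    for start in segment_starts:
        for end in segment_ends:
            if min_len <= end - start <= max_len:
                spans.append((start, end))
    return spans


# Hypothetical toy intron with one 83-nt segment between an ag and a gt
toy = "cc" + "ag" + "c" * 83 + "gt" + "cc"
print(candidate_pseudoexons(toy))
```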
Using RNA-seq and WGS, we discovered two
different pseudoexons (127-bp insertion from intron
6 and 83-bp insertion from intron 11) caused by three
different heterozygous germline mutations in APC. To
our knowledge, this is the first study to identify
pseudoexons in APC using next-generation
sequencing and the second ever to reveal APC-related
pseudoexons in FAP. The study by Spier et al. [6] was the
first report and described two different APC pseudoexons
(167-bp insertion from intron 5 and 83-bp insertion
from intron 11). These pseudoexons were caused by
three different heterozygous germline mutations. In an
RT-PCR screen of APC cDNA from 125 APC- and
MUTYH-mutation-negative adenomatous polyposis cases from
Germany, 6.4% of cases (8/125) had an identifiable genomic
change underlying pseudoexon formation. Interestingly, the
pseudoexon in intron 11 occurring in our family 85 in
association with the c.1408+731C>T substitution
was identical at both the genomic DNA and RNA levels
to that present in two unrelated German patients [6]. The
region around position +731 in intron 11 may be prone
to pseudoexon formation in general, given the existence
of two additional pseudoexon-associated nucleotide
substitutions in this region, one located two nucleotides
upstream (our study) and another six nucleotides
downstream of position +731 [6]. The overall frequency
of APC pseudoexons in our series from Finland (3/54
index patients, 5.6%) may be an underestimate, since our
full pseudoexon screen focused on four index patients
with unbalanced expression of APC alleles, whereas the
remaining index patients (mostly with only DNA available)
underwent a targeted screen for the same mutations
identified in the former patients.
Diagnostic strategies mostly target coding
regions in DNA [4]. Detection of disease-associated
pseudoexons, in contrast, requires combined RNA- and
DNA-based evidence to demonstrate the insertion
of extraneous sequence into mRNA and to distinguish
errors in transcript processing from a deep
intronic mutation in genomic DNA as the mechanistic
basis of the insertion. Hence, validated disease-associated
pseudoexons have remained scarce [6, 20–22], despite
the fact that potential pseudoexons are frequent in
introns of human genes [18]. The pseudoexons in
families 42 and 85 were missed by our original PTT
screen [23]. Family 42 did reveal a visible truncation,
but the subsequent search for a causative change by
Sanger sequencing of genomic DNA did not extend deep
into the introns [23]. For FAP85, by contrast, no convincing
extra fragment was visible. This is likely
attributable to commonly observed disadvantages
of PTT, such as decreased RNA stability and assay
artifacts [24]. Instead of PTT, family 163 originally
underwent an exon-by-exon screen in genomic DNA [8],
which by design could not capture deep intronic
mutations.
Figure 3: RT-PCR (42, 85-1, 163, 103). RT-PCR analysis of samples from ASE families. RT-PCR products separated by gel
electrophoresis are shown. Arrows denote fragments with intronic insertions (pseudoexons). Fragment 2 (upper panel) encompasses a
615-bp cDNA segment from exon 4 to exon 9 [6] and shows a heterozygous 127-bp insertion in family 42. The wild-type size of the exon
11–exon 13 fragment (lower panel) is 246 bp (Supplementary Table S1). An identical 83-bp insertion is evident in families 85 (case 85-1)
and 163. The RT-PCR products from the index persons and healthy controls were cloned and sequenced to verify their DNA sequences. In
the exon 11–exon 13 fragment, a 54-bp in-frame insertion (see legend to Figure 2) accompanied the pseudoexon and wild-type sequences
in a proportion (up to one-third) of all clones and likely contributed to the multiplicity of fragments seen after gel electrophoresis.
Family 103 showed putative ASE imbalance
(Table 1, Supplementary Table S2), but neither RNA-seq
nor WGS revealed variants that might underlie the
suggestive ASE phenotype. This apparently sporadic
case with classical FAP (Figure 1) might be explained
by mosaicism for an APC mutation; such mosaic mutations are
challenging to detect and verify [5]. Other theoretical
possibilities to consider in future investigations include
cis- or trans-acting regulatory changes and complex
rearrangements that escape detection by sequencing. The
ASE phenotype in FAP103 affected many other genes beyond
APC (Supplementary Table S2), offering possible candidate
genes to be tested for germline alterations. It is important
to note that up to ~20% of all informative genes expressed
in lymphoblastoid cells or blood may show ASE even in
healthy control individuals [25, 26], and the underlying
cause remains elusive for most genes.
Figure 4: WGS (42 and 85-2) and Sanger sequencing. Deep intronic mutations in APC. Upper panels provide IGV display of WGS data for
intron 6 in FAP42 (Figure 4A) and intron 11 in FAP85, individual 85-2 (Figure 4B). Lower panels show Sanger sequence tracings of the
mutations. In Figure 4B, the Sanger sequencing result of AFAP163 is also given (AFAP163 was not included in WGS analysis).
Figure 5: Schematic diagrams of pseudoexons. Schematic diagrams of the APC pseudoexons identified. The canonical splice sites at
the exon/intron borders, pseudoexons (underlined), and the responsible deep intronic mutations (in bold) are highlighted.
The site of the germline mutation in the APC gene is
known to correlate with the disease phenotype [1]. Our
family 42 with pseudoexon 6/7 had classical FAP,
in agreement with genotype-phenotype
expectations. Of the two pseudoexon 11/12 families,
family 85 conformed to established genotype-phenotype
correlations by showing classical FAP, like the two
German families with the same mutation. Family 163 was
classified as AFAP based on polyp count (10–20), but also
showed features more typical (although not exclusive) of
a profuse form of FAP, such as early age at onset (16 years)
and extracolonic manifestations (mandibular
osteomas) (Table 1). In FAP85, we demonstrated
co-segregation of the respective genomic change with
polyposis (Figure 1). Unfortunately, segregation studies
were not possible in the remaining two families because
of the lack of additional affected members.
Next-generation sequencing techniques are changing
the screening for predisposing mutations. Targeted
gene panels capturing entire introns in addition to
exons, combined with deep sequencing, are likely to
replace current screening protocols that rely on
exon-specific Sanger sequencing and MLPA [4, 27]. We show
that deep intronic mutations of the APC gene explained
three of four FAP and AFAP families that displayed ASE
imbalance and remained mutation-negative by traditional
methods. This indicates that our strategy of using ASE
to preselect cases for pseudoexon testing was
effective; ASE could even serve as a proxy for the initial
screening of out-of-frame pseudoexon insertion events
in FAP and AFAP. Unavailability of RNA made ASE and
pseudoexon screening impossible in a significant fraction
of our polyposis families (Table 1); hence, investigation
of larger series is necessary for a reliable determination
of the frequency and clinical significance of ASE and
pseudoexon events in this disease. In the clinical context,
the pathogenicity of pseudoexons requires special attention.
The considerations we point out in the Results, as well
as general recommendations for any splicing aberration
[28], apply. In our experience, next-generation
sequencing of RNA and genomic DNA facilitates
pseudoexon identification and provides valuable tools to
explore the genetic basis of mutation-negative families.
MATERIALS AND METHODS
Patients and samples
The series consisted of 54 unrelated families/cases
from Finland, including 21 with classical FAP and 33
with AFAP (Table 1). Fourteen families represented a
research-based cohort from the nationwide Hereditary
Colorectal Cancer Registry of Finland [13] lacking APC
point mutations by PTT and exon-specific screening
methods (heteroduplex analysis and Sanger sequencing)
and large rearrangements by MLPA (P043-C1) [8]. The
remaining 40 families represented a clinic-based cohort
of consecutive index cases with newly diagnosed FAP
or AFAP and overlapped with the series described in ref.
[8]. These cases were recruited via clinical genetic units
of Finnish university hospitals, and cases remaining APC
mutation-negative after exon-specific sequencing and
MLPA were eligible (additionally, APC epimutations
were excluded by methylation-specific multiplex
ligation-dependent probe amplification [8]). MUTYH-positive
cases and occasional cases with mutations in other
polyposis-related genes were excluded. Cases with
allele-specific expression (ASE) of APC were 42, 85-1, 85-2,
103, and 163 (ref. [13] and this study). No ASE was
detected in cases 78, 88, 92, 97, 104, and 125 [13]. The
remaining families/cases were uninformative or not tested
for ASE because of the lack of RNA (as a rule, no RNA
was available for clinic-based cases).
DNA and RNA were extracted from lymphocytes
or EBV-transformed lymphoblasts as described [13]. This
study was approved by the institutional review board
of the Helsinki University Central Hospital (Helsinki,
Finland).
Table 2: Summary of the variants
Family | Location in APC (GRCh37/GRCh38) | Insertion length (bp) | Genomic variant | RNA alteration | Predicted protein alteration
FAP42 | intron 6 (5:112126337/5:112790640) | 127 | c.646-1806T>G | r.645_646ins646-1933_646-1807 | p.Arg216*
FAP85 | intron 11 (5:112158419/5:112822722) | 83 | c.1408+731C>T | r.1408_1409ins1408+647_1408+729 | p.Gly471Serfs*55
AFAP163 | intron 11 (5:112158417/5:112822720) | 83 | c.1408+729A>G | r.1408_1409ins1408+647_1408+729 | p.Gly471Serfs*55
Note: An updated APC nomenclature based on 16 exons (NM_000038.4, ENST00000257430) was used for exon annotation.
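The insertion lengths in Table 2, and whether each insertion shifts the reading frame, follow from simple arithmetic on the HGVS intronic offsets. The Python sketch below parses only the two r. patterns used in this table (insertions whose both ends are offsets from the same exon boundary); it is not a general HGVS parser, and the function names are hypothetical.

```python
import re

# Matches e.g. 'r.645_646ins646-1933_646-1807' or 'r.1408_1409ins1408+647_1408+729'
INS = re.compile(r"ins(\d+)([+-]\d+)_(\d+)([+-]\d+)$")


def inserted_length(hgvs_r: str) -> int:
    """Length of an intronic insertion whose ends are offsets from one exon boundary."""
    m = INS.search(hgvs_r)
    if m is None:
        raise ValueError(f"unsupported notation: {hgvs_r}")
    anchor1, off1, anchor2, off2 = m.groups()
    if anchor1 != anchor2:
        raise ValueError("both ends must be offsets from the same exon boundary")
    return int(off2) - int(off1) + 1  # inclusive range of intronic positions


def shifts_frame(length: int) -> bool:
    """An insertion whose length is not a multiple of 3 disrupts the reading frame."""
    return length % 3 != 0


for r in ["r.645_646ins646-1933_646-1807", "r.1408_1409ins1408+647_1408+729"]:
    n = inserted_length(r)
    print(r, n, "frameshift" if shifts_frame(n) else "in-frame")
```

The frame check alone does not capture the outcome for family 42, where the first codon of the pseudoexon is already a stop codon (p.Arg216*). By the same rule, the 54-bp insertion seen in controls (Figure 2) is in frame, consistent with its description.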
Single nucleotide primer extension (SNuPE)
SNuPE uses a single dideoxynucleotide (ddNTP)
and a combination of three dNTPs in an extension reaction
in which incorporation of the ddNTP terminates extension
of a primer annealed close to the polymorphic site, so that
the two alleles yield extension products of different lengths
[13]. Four coding single nucleotide polymorphisms (SNPs)
in APC were used to study APC allele-specific expression
(cDNA compared with gDNA) as described in Pavicic et al.
[8]. ASE ratios (R) were validated against SNuPE results
from individuals not carrying any APC mutation [8]. Ratios
R≤0.6 or R≥1.67 were considered to indicate unequivocal
ASE (a reduction of at least 40% of one allele relative to the other allele)
and ratios 0.6<R<0.8 or 1.25<R<1.67 putative ASE (a 21–39%
reduction of one allele relative to the other allele). The ASE
statuses in Table 1 were assigned according to the highest
ASE ratio yielded by any of the four coding polymorphisms.
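The SNuPE cut-offs above can be expressed as a small classification function. This Python sketch is illustrative only. Two points are assumptions not stated in the text: that R is the cDNA allele ratio normalized by the gDNA allele ratio, and that the "highest" ratio is the one deviating most from 1 on a log scale, so that R = 0.5 and R = 2 count as equally strong.

```python
import math
from typing import Iterable


def normalized_ratio(cdna_a: float, cdna_b: float, gdna_a: float, gdna_b: float) -> float:
    """Allele A/B signal in cDNA divided by the same ratio in gDNA (assumed normalization)."""
    return (cdna_a / cdna_b) / (gdna_a / gdna_b)


def classify_ase(r: float) -> str:
    """Apply the SNuPE cut-offs from the text to a single ASE ratio R."""
    if r <= 0:
        raise ValueError("ASE ratio must be positive")
    if r <= 0.6 or r >= 1.67:   # >=40% reduction of one allele
        return "unequivocal"
    if r < 0.8 or r > 1.25:     # 0.6<R<0.8 or 1.25<R<1.67
        return "putative"
    return "none"               # 0.8 <= R <= 1.25


def ase_status(ratios: Iterable[float]) -> str:
    """Status from the most deviant per-SNP ratio (largest |log R|, an assumption)."""
    most_deviant = max(ratios, key=lambda r: abs(math.log(r)))
    return classify_ase(most_deviant)


# Example with invented ratios for the four coding SNPs of one individual.
print(ase_status([1.02, 0.55, 0.9, 1.1]))  # unequivocal
```

Using |log R| rather than R itself treats under- and over-representation of either allele symmetrically, which matches the paired thresholds (0.6 and 1.67, 0.8 and 1.25 are reciprocals of each other to two decimals).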
Transcriptome sequencing (RNA-seq) and transcriptome data analysis
RNA-seq libraries were prepared using the ribodepletion protocol from 12 DNase-treated total RNA
samples, including three from FAP family 85 (individuals 85-1, 85-2, and 85-3), three from the index
persons of families 42, 103, and 163, and three from proven mutation carriers from the APC mutation-positive
families 3, 93, and 63. RNA-seq data for three healthy individuals were generated for comparison.
Samples were sequenced on an Illumina HiSeq 2000 at the Institute for Molecular Medicine Finland (FIMM)
(Helsinki, Finland). The bioinformatics workflow included trimming of adapter sequences and low-quality
bases and removal of reads shorter than 36 bp using Trimmomatic [29]. Paired-end reads passing
pre-processing were aligned to human reference genome build 38 (Ensembl v82) using STAR [30] with the
default two-pass multi-sample mapping settings, except that alignSJstitchMismatchNmax was set to
0 -1 -1 -1, outSJfilterCountUniqueMin to 6 2 2 2, outSJfilterCountTotalMin to 6 2 2 2, and
outSJfilterDistToOtherSJmin to 10 0 0 0, to allow more sensitive recovery of mutations at splice sites.
Duplicate reads were marked with Picard tools (http://picard.sourceforge.net), and strandedness information
was added with Bamutils [31]. Transcripts were assembled with StringTie [32] using the Ensembl v82 reference
annotation file. Transcript predictions across all 12 samples were combined into a non-redundant set of
transcripts using default parameters, except that the minimum input transcript TPM and FPKM were set to 0.5.
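The mapping and assembly settings above can be written out as command lines. The following Python sketch only assembles argument lists and runs nothing. The four junction overrides and the 0.5 TPM/FPKM thresholds come from the text; the file names, the manual two-pass scheme that feeds every sample's first-pass SJ.out.tab back in through STAR's --sjdbFileChrStartEnd, and the BAM output options are assumptions. It also assumes that the combining step was StringTie's --merge mode, in which -F and -T set the minimum input FPKM and TPM (default 1.0 each).

```python
from typing import Dict, List, Sequence

# Non-default STAR junction settings reported in the text; each takes four values.
STAR_JUNCTION_OVERRIDES: Dict[str, List[str]] = {
    "--alignSJstitchMismatchNmax": ["0", "-1", "-1", "-1"],
    "--outSJfilterCountUniqueMin": ["6", "2", "2", "2"],
    "--outSJfilterCountTotalMin": ["6", "2", "2", "2"],
    "--outSJfilterDistToOtherSJmin": ["10", "0", "0", "0"],
}


def star_second_pass(genome_dir: str, read1: str, read2: str,
                     first_pass_sj_tabs: Sequence[str], prefix: str) -> List[str]:
    """Second-pass STAR call re-using junctions collected from all samples (assumed scheme)."""
    cmd = [
        "STAR", "--genomeDir", genome_dir,
        "--readFilesIn", read1, read2,
        "--outSAMtype", "BAM", "SortedByCoordinate",
        "--outFileNamePrefix", prefix,
        # SJ.out.tab files from the first pass of every sample (multi-sample 2-pass).
        "--sjdbFileChrStartEnd", *first_pass_sj_tabs,
    ]
    for flag, values in STAR_JUNCTION_OVERRIDES.items():
        cmd += [flag, *values]
    return cmd


def stringtie_merge(annotation_gtf: str, gtf_list: str, out_gtf: str) -> List[str]:
    """Merge per-sample StringTie assemblies into one non-redundant transcript set."""
    return [
        "stringtie", "--merge",
        "-G", annotation_gtf,
        "-F", "0.5",   # minimum input transcript FPKM (default 1.0)
        "-T", "0.5",   # minimum input transcript TPM (default 1.0)
        "-o", out_gtf,
        gtf_list,      # text file listing the per-sample GTFs
    ]


print(" ".join(star_second_pass("GRCh38_idx", "s1_R1.fq", "s1_R2.fq",
                                ["s1.SJ.out.tab", "s2.SJ.out.tab"], "s1.")))
print(" ".join(stringtie_merge("ensembl_v82.gtf", "sample_gtfs.txt", "merged.gtf")))
```

Keeping the commands as argument lists makes them easy to pass to subprocess.run without shell quoting issues; the overrides are applied only to the second pass here, which is one of several reasonable readings of "default 2-pass multi-sample mapping settings".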
RNA-sequencing data variant calling and ASE analysis
Allele-specific expression of genes was quantified using the Genome Analysis Toolkit (GATK) package [33]
and the ASE detection algorithm MBASED [34]. Briefly, pre-processed and mapped reads were split into exon
segments using GATK SplitNCigarReads, local indel realignment was performed around indels using GATK
IndelRealigner, and base qualities were recalibrated using GATK BaseQualityScoreRecalibration. Variants were
called using GATK HaplotypeCaller and filtered using GATK VariantFiltration according to the GATK best-practice
recommendations for the RNA-seq variant analysis workflow. Multiallelic sites were removed with GATK
SelectVariants, and non-heterozygous variants and variants falling outside StringTie-called exon regions
extended by 3 bp were discarded with GATK VariantFiltration. MBASED [34] was then applied to each variant
set to infer the probability of ASE in genes listed in Ensembl v82 that carried ≥2 variants. Default
non-phased ASE calling settings were used, except that the dispersion estimate was set to 0.004 and the
probability of detecting haplotype 1-supporting reads was set to the average fraction of aligned reads
supporting haplotype 1 at variants with coverage ≥30 in the given sample. Sequence data were visualized
using the Integrative Genomics Viewer (IGV) [35]. Supplementary Table S3 outlines the performance of our
RNA-seq and ASE experiments.
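The site filtering and the sample-specific haplotype 1 prior described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the GATK or MBASED code: the RnaVariant record and its field names are hypothetical, exon intervals are taken as 1-based and inclusive, per-allele read counts are assumed to come from the filtered VCF, and the 0.5 fallback when no site reaches 30× coverage is our own choice.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass
class RnaVariant:          # hypothetical record for one called site
    gene: str
    chrom: str
    pos: int               # 1-based position
    n_alt: int             # number of ALT alleles at the site
    genotype: str          # e.g. "0/1"
    hap1_reads: int        # reads supporting the haplotype 1 allele
    hap2_reads: int        # reads supporting the haplotype 2 allele


def keep_for_ase(v: RnaVariant, exons: Dict[str, List[Tuple[int, int]]], pad: int = 3) -> bool:
    """Biallelic, heterozygous, and inside an assembled exon extended by `pad` bp."""
    if v.n_alt != 1:                                              # multiallelic site
        return False
    if len(set(v.genotype.replace("|", "/").split("/"))) != 2:    # not heterozygous
        return False
    return any(start - pad <= v.pos <= end + pad for start, end in exons.get(v.chrom, []))


def genes_with_min_variants(variants: Sequence[RnaVariant], k: int = 2) -> Dict[str, List[RnaVariant]]:
    """Group variants by gene and keep genes with >= k variants (MBASED input)."""
    by_gene: Dict[str, List[RnaVariant]] = defaultdict(list)
    for v in variants:
        by_gene[v.gene].append(v)
    return {gene: vs for gene, vs in by_gene.items() if len(vs) >= k}


def hap1_background(variants: Sequence[RnaVariant], min_cov: int = 30) -> float:
    """Mean haplotype 1 read fraction over variants with coverage >= min_cov."""
    fractions = [v.hap1_reads / (v.hap1_reads + v.hap2_reads)
                 for v in variants if v.hap1_reads + v.hap2_reads >= min_cov]
    if not fractions:
        return 0.5  # fallback assumption: no allelic bias
    return sum(fractions) / len(fractions)


DISPERSION = 0.004  # dispersion estimate given to MBASED in the text
```

Estimating the background fraction per sample, rather than assuming 0.5, absorbs sample-wide mapping bias toward one allele so that it is not mistaken for gene-level ASE.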
Whole genome sequencing (WGS)
WGS was applied to DNA from individuals 85-1, 85-2, and 85-3 from FAP family 85 and from the index
patients of FAP families 42 and 103. Briefly, DNA was extracted from blood samples, and KAPA and ThruPLEX
sequencing libraries were prepared according to the manufacturers' instructions. Sequencing was then
performed on the Illumina HiSeq 2000 platform at the Institute for Molecular Medicine Finland (FIMM)
(Helsinki, Finland). Sequencing data were analyzed with the FIMM variant calling pipeline (VCP) version 3.1
[36], which includes quality control of raw reads before and after alignment, pre-processing of reads for
sequencing artifacts, alignment of reads to the human reference genome (hg19/GRCh37) with the
Burrows-Wheeler Aligner (BWA) [37], and variant calling with SAMtools [38]. The minimum acceptable read
depth for a variant was 7. Variant data were then analyzed with VarSeq® software version 1.3.2 (Golden
Helix, Inc., Bozeman, MT, www.goldenhelix.com). Genotype quality (the difference between the Phred-scaled
likelihoods of the two most likely genotypes) was assessed on a scale from 0 to 99, and variants with
genotype quality below 70 were excluded. All common variants with minor allele frequency (MAF) ≥0.001 were
removed. In FAP family 85, only heterozygous variants were considered, consistent with dominant
inheritance; in the index patients of FAP families 42 and 103 (sporadic cases, Table 1), any inheritance
pattern was accepted. The identified variants were
checked against the ExAC (http://exac.broadinstitute.org) and SISu (www.sisuproject.) databases, as well as
the Ensembl database (http://www.ensembl.org), to assess population frequencies. Sequence data were
visualized using the Integrative Genomics Viewer (IGV) [35]. Supplementary Table S4 lists essential
performance characteristics of the WGS experiments.
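The WGS filters listed above translate into a short predicate. This Python sketch is illustrative and does not reproduce VarSeq's implementation. It applies read depth ≥7, genotype quality ≥70 (GQ computed as the gap between the two smallest Phred-scaled genotype likelihoods, capped at 99), MAF <0.001, and a heterozygous-only rule for the dominant-inheritance analysis in family 85. The WgsCall record is hypothetical, and treating a variant absent from the population databases (maf=None) as rare is an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


def genotype_quality(pls: Sequence[int], cap: int = 99) -> int:
    """GQ: gap between the two smallest Phred-scaled genotype likelihoods, capped."""
    best, second = sorted(pls)[:2]
    return min(second - best, cap)


@dataclass
class WgsCall:               # hypothetical record for one WGS variant call
    depth: int
    pls: Sequence[int]       # e.g. Phred-scaled likelihoods for 0/0, 0/1, 1/1
    genotype: str            # e.g. "0/1"
    maf: Optional[float]     # population MAF; None if not seen in the databases


def passes_wgs_filters(call: WgsCall, heterozygous_only: bool) -> bool:
    if call.depth < 7:                                # minimum read depth
        return False
    if genotype_quality(call.pls) < 70:               # genotype quality threshold
        return False
    if call.maf is not None and call.maf >= 0.001:    # common variant
        return False
    if heterozygous_only:                             # dominant model (family 85)
        first, second = call.genotype.replace("|", "/").split("/")
        return first != second
    return True                                       # any inheritance pattern (42, 103)


print(passes_wgs_filters(WgsCall(20, [300, 0, 800], "0/1", None), heterozygous_only=True))  # True
```

Because GQ is a difference of likelihoods, a call can have high depth yet low GQ when two genotypes remain nearly equally likely, which is why both thresholds are applied.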
Vercation of pseudoexons by sanger sequencing
To verify the pseudoexons identied by RNA-
seq, relevant fragments of APC cDNA were amplied
with primers from Spier et al. [6] and Sanger sequenced.
The 11/12 pseudoexon was additionally veried from
a cDNA fragment from exon 11 to exon 13 (amplied
with primers given in Supplementary Table S1).
Moreover, cDNA fragments 2 and exon 11 – 13 (Figure
3, Supplementary Table S1) were cloned into a pCR2.1
TOPO vector using the TOPO TA Cloning system
(Invitrogen, Carlsbad, CA, USA) and DNAs extracted
from the resulting white colonies were sequenced. The
genomic variants discovered by WGS were conrmed by
Sanger sequencing using primers around the respective
nucleotide substitutions in introns 6 and 11 of APC
(Supplementary Table S1).
URLs of web resources used
Berkeley Drosophila Genome Project (BDGP), http://www.fruitfly.org/seq_tools/splice.html
Ensembl, http://www.ensembl.org
Exome Aggregation Consortium (ExAC), http://exac.broadinstitute.org
InSiGHT, http://insight-group.org
Sequencing Initiative Suomi (SISu), www.sisuproject.
Picard, http://picard.sourceforge.net
GATK, https://software.broadinstitute.org/gatk/
VarSeq™, http://www.goldenhelix.com
ACKNOWLEDGMENTS
We thank the patients and responsible clinical
experts for participation. Tuula Lehtinen and Beatriz
Alcala-Repo are thanked for collecting clinical data and
Saila Saarinen for expert technical assistance.
CONFLICTS OF INTEREST
No conicts of interests.
GRANT SUPPORT
This work was supported by the European Research
Council (FP7-ERC-232635), the Academy of Finland
(grants no. 257795 and 294643), the Finnish Cancer
Organizations, the Sigrid Juselius Foundation, the Nordic
Cancer Union, Jane and Aatos Erkko Foundation, Maud
Kuistila Memorial Foundation, and Biocentrum Helsinki.
REFERENCES
1. Leoz ML, Carballal S, Moreira L, Ocana T, Balaguer F. The genetic basis of familial adenomatous polyposis and its implications for clinical practice and risk management. Appl Clin Genet. 2015; 8:95-107.
2. Aretz S, Stienen D, Friedrichs N, Stemmler S, Uhlhaas S, Rahner N, Propping P, Friedl W. Somatic APC mosaicism: a frequent cause of familial adenomatous polyposis (FAP). Hum Mutat. 2007; 28:985-992.
3. Grover S, Kastrinos F, Steyerberg EW, Cook EF, Dewanwala A, Burbidge LA, Wenstrup RJ, Syngal S. Prevalence and phenotypes of APC and MUTYH mutations in patients with multiple colorectal adenomas. JAMA. 2012; 308:485-492.
4. Hegde M, Ferber M, Mao R, Samowitz W, Ganguly A, Working Group of the American College of Medical Genetics and Genomics (ACMG) Laboratory Quality Assurance Committee. ACMG technical standards and guidelines for genetic testing for inherited colorectal cancer (Lynch syndrome, familial adenomatous polyposis, and MYH-associated polyposis). Genet Med. 2014; 16:101-116.
5. Necker J, Kovac M, Attenhofer M, Reichlin B, Heinimann K. Detection of APC germ line mosaicism in patients with de novo familial adenomatous polyposis: a plea for the protein truncation test. J Med Genet. 2011; 48:526-529.
6. Spier I, Horpaopan S, Vogt S, Uhlhaas S, Morak M, Stienen D, Draaken M, Ludwig M, Holinski-Feder E, Nothen MM, Hoffmann P, Aretz S. Deep intronic APC mutations explain a substantial proportion of patients with familial or early-onset adenomatous polyposis. Hum Mutat. 2012; 33:1045-1050.
7. Shirts BH, Salipante SJ, Casadei S, Ryan S, Martin J, Jacobson A, Vlaskin T, Koehler K, Livingston RJ, King MC, Walsh T, Pritchard CC. Deep sequencing with intronic capture enables identification of an APC exon 10 inversion in a patient with polyposis. Genet Med. 2014; 16:783-786.
8. Pavicic W, Nieminen TT, Gylling A, Pursiheimo JP, Laiho A, Gyenesei A, Jarvinen HJ, Peltomaki P. Promoter-specific alterations of APC are a rare cause for mutation-negative familial adenomatous polyposis. Genes Chromosomes Cancer. 2014; 53:857-864.
9. Bellido F, Pineda M, Aiza G, Valdes-Mas R, Navarro M, Puente DA, Pons T, Gonzalez S, Iglesias S, Darder E, Pinol V, Soto JL, Valencia A, et al. POLE and POLD1 mutations in 529 kindred with familial colorectal cancer and/or polyposis: review of reported cases and recommendations for genetic testing and surveillance. Genet Med. 2016; 325-332.
10. Lammi L, Arte S, Somer M, Jarvinen H, Lahermo P, Thesleff I, Pirinen S, Nieminen P. Mutations in AXIN2 cause familial tooth agenesis and predispose to colorectal cancer. Am J Hum Genet. 2004; 74:1043-1050.
11. de la Chapelle A. Genetic predisposition to human disease: allele-specific expression and low-penetrance regulatory loci. Oncogene. 2009; 28:3345-3348.
12. Castellsague E, Gonzalez S, Guino E, Stevens KN, Borras E, Raymond VM, Lazaro C, Blanco I, Gruber SB, Capella G. Allele-specific expression of APC in adenomatous polyposis families. Gastroenterology. 2010; 139:439-47.
13. Renkonen ET, Nieminen P, Abdel-Rahman WM, Moisio AL, Jarvela I, Arte S, Jarvinen HJ, Peltomaki P. Adenomatous polyposis families that screen APC mutation-negative by conventional methods are genetically heterogeneous. J Clin Oncol. 2005; 23:5651-5659.
14. Aceto GM, Fantini F, De Iure S, Di Nicola M, Palka G, Valanzano R, Di Gregorio P, Stigliano V, Genuardi M, Battista P, Cama A, Curia MC. Correlation between mutations and mRNA expression of APC and MUTYH genes: new insight into hereditary colorectal polyposis predisposition. J Exp Clin Cancer Res. 2015; 34:131.
15. Curia MC, De Iure S, De Lellis L, Veschi S, Mammarella S, White MJ, Bartlett J, Di Iorio A, Amatetti C, Lombardo M, Di Gregorio P, Battista P, Mariani-Costantini R, et al. Increased variance in germline allele-specific expression of APC associates with colorectal cancer. Gastroenterology. 2012; 142:71-77.
16. Maquat LE. Defects in RNA splicing and the consequence of shortened translational reading frames. Am J Hum Genet. 1996; 59:279-286.
17. Faustino NA, Cooper TA. Pre-mRNA splicing and human disease. Genes Dev. 2003; 17:419-437.
18. Dhir A, Buratti E. Alternative splicing: role of pseudoexons in human disease and potential therapeutic strategies. FEBS J. 2010; 277:841-855.
19. Romano M, Buratti E, Baralle D. Role of pseudoexons and pseudointrons in human cancer. Int J Cell Biol. 2013; 810572.
20. Cavalieri S, Pozzi E, Gatti RA, Brusco A. Deep-intronic ATM mutation detected by genomic resequencing and corrected in vitro by antisense morpholino oligonucleotide (AMO). Eur J Hum Genet. 2013; 21:774-778.
21. Anczukow O, Buisson M, Leone M, Coutanson C, Lasset C, Calender A, Sinilnikova OM, Mazoyer S. BRCA2 deep intronic mutation causing activation of a cryptic exon: opening toward a new preventive therapeutic strategy. Clin Cancer Res. 2012; 18:4903-4909.
22. Clendenning M, Buchanan DD, Walsh MD, Nagler B, Rosty C, Thompson B, Spurdle AB, Hopper JL, Jenkins MA, Young JP. Mutation deep within an intron of MSH2 causes Lynch syndrome. Fam Cancer. 2011; 10:297-301.
23. Moisio AL, Jarvinen H, Peltomaki P. Genetic and clinical characterisation of familial adenomatous polyposis: a population based study. Gut. 2002; 50:845-850.
24. Half E, Bercovich D, Rozen P. Familial adenomatous polyposis. Orphanet J Rare Dis. 2009; 4:22.
25. Lappalainen T, Sammeth M, Friedlander MR, ’t Hoen PA, Monlong J, Rivas MA, Gonzalez-Porta M, Kurbatova N, Griebel T, Ferreira PG, Barann M, Wieland T, Greger L, et al. Transcriptome and genome sequencing uncovers functional variation in humans. Nature. 2013; 501:506-511.
26. Rozowsky J, Abyzov A, Wang J, Alves P, Raha D, Harmanci A, Leng J, Bjornson R, Kong Y, Kitabayashi N, Bhardwaj N, Rubin M, Snyder M, et al. AlleleSeq: analysis of allele-specific expression and binding in a network framework. Mol Syst Biol. 2011; 7:522.
27. Pritchard CC, Smith C, Salipante SJ, Lee MK, Thornton AM, Nord AS, Gulden C, Kupfer SS, Swisher EM, Bennett RL, Novetsky AP, Jarvik GP, Olopade OI, et al. ColoSeq provides comprehensive Lynch and polyposis syndrome mutational analysis using massively parallel sequencing. J Mol Diagn. 2012; 14:357-366.
28. Spurdle AB, Couch FJ, Hogervorst FB, Radice P, Sinilnikova OM, IARC Unclassified Genetic Variants Working Group. Prediction and assessment of splicing alterations: implications for clinical testing. Hum Mutat. 2008; 29:1304-1313.
29. Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014; 30:2114-2120.
30. Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, Batut P, Chaisson M, Gingeras TR. STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 2013; 29:15-21.
31. Breese MR, Liu Y. NGSUtils: a software suite for analyzing and manipulating next-generation sequencing datasets. Bioinformatics. 2013; 29:494-496.
32. Pertea M, Pertea GM, Antonescu CM, Chang TC, Mendell JT, Salzberg SL. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat Biotechnol. 2015; 33:290-295.
33. McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, Daly M, DePristo MA. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 2010; 20:1297-1303.
34. Mayba O, Gilbert HN, Liu J, Haverty PM, Jhunjhunwala S, Jiang Z, Watanabe C, Zhang Z. MBASED: allele-specific expression detection in cancer tissues and cell lines. Genome Biol. 2014; 15:405.
35. Thorvaldsdottir H, Robinson JT, Mesirov JP. Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration. Brief Bioinform. 2013; 14:178-192.
36. Sulonen AM, Ellonen P, Almusa H, Lepisto M, Eldfors S, Hannula S, Miettinen T, Tyynismaa H, Salo P, Heckman C, Joensuu H, Raivio T, Suomalainen A, et al. Comparison of
solution-based exome capture methods for next generation
sequencing. Genome Biol. 2011; 12:R94.
37. Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009; 25:1754-1760.
38. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R, 1000 Genome Project Data Processing Subgroup. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009; 25:2078-2079.
... Mosaic APC variants and deep intronic variants localized in regions not covered by PCR-based diagnostics were previously identified as additional causal factors. Using RNA-based assays and next-generation sequencing (NGS), it has been shown that a proportion of variant-negative FAP patients harbor molecular changes in deep intronic regions of APC [19,20]. These studies identified deep intronic APC variants that result in pseudoexon formation [19,20]. ...
... Using RNA-based assays and next-generation sequencing (NGS), it has been shown that a proportion of variant-negative FAP patients harbor molecular changes in deep intronic regions of APC [19,20]. These studies identified deep intronic APC variants that result in pseudoexon formation [19,20]. Through the use of sensitive techniques, somatic APC mosaicism has been demonstrated in a minority of adenomatous polyposis patients [21][22][23][24][25][26]. ...
... Deep NGS of APC was performed to identify possible undetected pathogenic mosaic variants. Furthermore, APC intronic germline variants described previously [19,20] were studied to evaluate their role. A high-risk cohort was selected for this study, consisting of 80 index patients with ≥ 50 colorectal polyps (Table 1), of whom many had a relatively early onset, which increases the probability of finding undiscovered mosaic or intronic variants. ...
Article
Full-text available
In addition to classic germline APC gene variants, APC mosaicism and deep intronic germline APC variants have also been reported to be causes of adenomatous polyposis. In this study, we investigated 80 unexplained colorectal polyposis patients without germline pathogenic variants in known polyposis predisposing genes to detect mosaic and deep intronic APC variants. All patients developed more than 50 colorectal polyps, with adenomas being predominantly observed. To detect APC mosaicism, we performed next-generation sequencing (NGS) in leukocyte DNA. Furthermore, using Sanger sequencing, the cohort was screened for the following previously reported deep intronic pathogenic germline APC variants: c.1408 + 731C > T, p.(Gly471Serfs*55), c.1408 + 735A > T, p.(Gly471Serfs*55), c.1408 + 729A > G, p.(Gly471Serfs*55) and c.532-941G > A, p.(Phe178Argfs*22). We did not detect mosaic or intronic APC variants in the screened unexplained colorectal polyposis patients. The results of this study indicate that the deep intronic APC variants investigated in this study are not a cause of colorectal polyposis in this Dutch population. In addition, NGS did not detect any further mosaic variants in our cohort.
... However, some highpenetrance mutations may involve promoter regions, introns, or non-coding RNA. For example, mutations residing deep in the introns can activate sequences (pseudoexons) flanked by splice sites, which can result in the insertion of intronic sequences in the mature mRNA [115]. In such cases, wholegenome sequencing and RNA-sequencing would be required to catch the alteration [115]. ...
... For example, mutations residing deep in the introns can activate sequences (pseudoexons) flanked by splice sites, which can result in the insertion of intronic sequences in the mature mRNA [115]. In such cases, wholegenome sequencing and RNA-sequencing would be required to catch the alteration [115]. Likewise, for the detection of constitutional epimutations, a combination of techniques (genetic and epigenetic) would be needed. ...
Article
Full-text available
Introduction Up to one third of colorectal cancers show familial clustering and 5% are hereditary single-gene disorders. Hereditary non-polyposis colorectal cancer comprises DNA mismatch repair-deficient and -proficient subsets, represented by Lynch syndrome (LS) and familial colorectal cancer type X (FCCTX), respectively. Accurate knowledge of molecular etiology and genotype-phenotype correlations are critical for tailored cancer prevention and treatment. Areas covered The authors highlight advances in the molecular dissection of hereditary non-polyposis colorectal cancer, based on recent literature retrieved from PubMed. Future possibilities for novel gene discoveries are discussed. Expert commentary LS is molecularly well established, but new information is accumulating of the associated clinical and tumor phenotypes. FCCTX remains poorly defined, but several promising candidate genes have been discovered and share some preferential biological pathways. Multi-level characterization of specimens from large patient cohorts representing multiple populations, combined with proper bioinformatic and functional analyses, will be necessary to resolve the outstanding questions.
... APC:c.1408+743_1408+745delinsACG located in intron 13 was detected in a patient with colon polyposis (38 tubular adenomas), which are associated with APC pathogenic variants. Additionally, pseudoexon inclusion events in intron 13 of the APC gene have been previously described as pathogenic [21]. The variant was reclassified to likely pathogenic with PS3-m, PM2, PP3, and PP4 criteria. ...
Article
Full-text available
Pathogenic/likely pathogenic variants in susceptibility genes that interrupt RNA splicing are a well-documented mechanism of hereditary cancer syndromes development. However, if RNA studies are not performed, most of the variants beyond the canonical GT-AG splice site are characterized as variants of uncertain significance (VUS). To decrease the VUS burden, we have bioinformatically evaluated all novel VUS detected in 732 consecutive patients tested in the routine genetic counseling process. Twelve VUS that were predicted to cause splicing defects were selected for mRNA analysis. Here, we report a functional characterization of 12 variants located beyond the first two intronic nucleotides using RNAseq in APC, ATM, FH, LZTR1, MSH6, PALB2, RAD51C, and TP53 genes. Based on the analysis of mRNA, we have successfully reclassified 50% of investigated variants. 25% of variants were downgraded to likely benign, whereas 25% were upgraded to likely pathogenic leading to improved clinical management of the patient and the family members.
... Approximately 20% of patients with clinical FAP remain unsolved after routine molecular genetic analysis of the APC and additional polyposis genes, suggesting additional loss-of-function mechanisms. 1 Aside from an abundance of coding variants, several intronic rearrangements, a complex event that generates a complete deletion of exon 5 (c.422+1123_532-577 del ins 423-1933_423-1687 inv), or deep intronic single nucleotide variants have been described for APC. [2][3][4][5][6] The latter loss-of-function mechanisms may frequently escape detection in routine diagnostics. ...
Article
Full-text available
Purpose Approximately 20% of patients with clinical familial adenomatous polyposis (FAP) remain unsolved after molecular genetic analysis of the APC and other polyposis genes, suggesting additional pathomechanisms. Methods We applied multidimensional genomic analysis employing chromosomal microarray profiling, optical mapping, long-read genome and RNA sequencing combined with FISH and standard PCR of genomic and complementary DNA to decode a patient with an attenuated FAP that had remained unsolved by Sanger sequencing and multigene panel next-generation sequencing for years. Results We identified a complex 3.9 Mb rearrangement involving 14 fragments from chromosome 5q22.1q22.3 of which three were lost, 1 reinserted into chromosome 5 and 10 inserted into chromosome 10q21.3 in a seemingly random order and orientation thus fulfilling the major criteria of chromothripsis. The rearrangement separates APC promoter 1B from the coding ORF (open reading frame) thus leading to allele-specific downregulation of APC mRNA. The rearrangement also involves three additional genes implicated in the APC –Axin–GSK3B–β-catenin signalling pathway. Conclusions Based on comprehensive genomic analysis, we propose that constitutional chromothripsis dampening APC expression, possibly modified by additional APC –Axin–GSK3B–β-catenin pathway disruptions, underlies the patient’s clinical phenotype. The combinatorial approach we deployed provides a powerful tool set for deciphering unsolved familial polyposis and potentially other tumour syndromes and monogenic diseases.
... This CE in Adenomatous Polyposis Coli (APC-OMIM #611731) was first reported as a pathogenic inclusion by Spier et al. (2012). Remarkably, three unique donor site SNVs have been reported as being causative of pathogenic APC-11a splicing (Nieminen et al., 2016;Spier et al., 2012). All three mutations caused a phenotype of familial adenomatous polyposis (FAP), a disease characterised by colon polyps and an elevated risk of colon cancer. ...
Article
Full-text available
Background Cryptic exons are typically characterised as deleterious splicing aberrations caused by deep intronic mutations. However, low-level splicing of cryptic exons is sometimes observed in the absence of any pathogenic mutation. Five recent reports have described how low-level splicing of cryptic exons can be modulated by common single-nucleotide polymorphisms (SNPs), resulting in phenotypic differences amongst different genotypes. Methods We sought to investigate whether additional ‘SNPtic’ exons may exist, and whether these could provide an explanatory mechanism for some of the genotype–phenotype correlations revealed by genome-wide association studies. We thoroughly searched the literature for reported cryptic exons, cross-referenced their genomic coordinates against the dbSNP database of common SNPs, then screened out SNPs with no reported phenotype associations. Results This method discovered five probable SNPtic exons in the genes APC, FGB, GHRL, MYPBC3 and OTC. For four of these five exons, we observed that the phenotype associated with the SNP was compatible with the predicted splicing effect of the nucleotide change, whilst the fifth (in GHRL) likely had a more complex splice-switching effect. Conclusion Application of our search methods could augment the knowledge value of future cryptic exon reports and aid in generating better hypotheses for genome-wide association studies.
... Detection of such variants is impossible with just WES and requires simultaneous transcriptome and whole genome -based approaches. Few cases of APC pseudoexons have been reported (100,101). ...
Article
Full-text available
Hereditary colorectal cancer syndromes attributable to high penetrance mutations represent 9–26% of young-onset colorectal cancer cases. The clinical significance of many of these mutations is understood well enough to be used in diagnostics and as an aid in patient care. However, despite the advances made in the field, a significant proportion of familial and early-onset cases remains molecularly uncharacterized and extensive work is still needed to fully understand the genetic nature of colorectal cancer susceptibility. With the emergence of next generation sequencing and associated methods, several predisposition loci have been unravelled but validation is incomplete. Individuals with cancer predisposing mutations are currently enrolled in life-long surveillance, but with the development of new treatments, such as cancer vaccinations, this might change in the not so distant future for at least some individuals. For individuals without a known cause for their disease susceptibility, prevention and therapy options are less precise. Herein, we review the progress achieved in the last three decades with a focus on how colorectal cancer predisposition genes were discovered. Furthermore, we discuss the clinical implications of these discoveries and anticipate what to expect in the next decade.
... Even so, they offer hope with exposing novel mechanism of low toxicity and new opportunity for the use of SSOs in precision medicine with less off-target effect. Although SSOs confer some benefits in patients with Duchenne muscular dystrophy (DMD) [125], its role in CRC patients has not been sufficiently explored [126]. It is worth mentioning that the high polarity and charged characteristics of oligonucleotide drugs make them obvious differences between small chemical molecules and monoclonal antibody drugs in terms of drug delivery system, pharmacokinetic properties, and efficacy [110]. ...
Article
Full-text available
Alternative splicing (AS) is an important event that contributes to posttranscriptional gene regulation. This process leads to several mature transcript variants with diverse physiological functions. Indeed, disruption of various aspects of this multistep process, such as cis- or trans- factor alteration, promotes the progression of colorectal cancer. Therefore, targeting some specific processes of AS may be an effective therapeutic strategy for treating cancer. Here, we provide an overview of the AS events related to colorectal cancer based on research done in the past 5 years. We focus on the mechanisms and functions of variant products of AS that are relevant to malignant hallmarks, with an emphasis on variants with clinical significance. In addition, novel strategies for exploiting the therapeutic value of AS events are discussed.
... Moreover, WGS of patient cohorts might facilitate discovery of missed non-coding variants in known hCRC and polyposis genes. In the past, deep-intronic and promoter variants were described in tumor suppressor genes APC and PTEN, which makes sequencing of these non-coding regions of particular interest for unresolved hCRC and polyposis patients [98][99][100][101][102][103]. Long-read sequencing and optical mapping techniques might be valuable as well, as these techniques are specifically directed to the detection of complex and structural variants, and allow alignment and variant mapping in regions that used to be uncovered in the past due to their nucleotide composition (e.g., extreme GC-rich, and multiple short repeats) [104,105]. ...
Article
Full-text available
To discover novel high-penetrant risk loci for hereditary colorectal cancer (hCRC) and polyposis syndromes many whole-exome and whole-genome sequencing (WES/WGS) studies have been performed. Remarkably, these studies resulted in only a few novel high-penetrant risk genes. Given this observation, the possibility and strategy to identify high-penetrant risk genes for hCRC and polyposis needs reconsideration. Therefore, we reviewed the study design of WES/WGS-based hCRC and polyposis gene discovery studies (n = 37) and provide recommendations to optimize discovery and validation strategies. The group of genetically unresolved patients is phenotypically heterogeneous, and likely composed of distinct molecular subtypes. This knowledge advocates for the screening of a homogeneous, stringently preselected discovery cohort and obtaining multi-level evidence for variant pathogenicity. This evidence can be collected by characterizing the molecular landscape of tumors from individuals with the same affected gene or by functional validation in cell-based models. Together, the combined approach of a phenotype-driven, tumor-based candidate gene search might elucidate the potential contribution of novel genetic predispositions in genetically unresolved hCRC and polyposis.
Article
Full-text available
Understanding pre-mRNA splicing is crucial to accurately diagnosing and treating genetic diseases. However, mutations that alter splicing can exert highly diverse effects. Of all the known types of splicing mutations, perhaps the rarest and most difficult to predict are those that activate pseudoexons, sometimes also called cryptic exons. Unlike other splicing mutations that either destroy or redirect existing splice events, pseudoexon mutations appear to create entirely new exons within introns. Since exon definition in vertebrates requires coordinated arrangements of numerous RNA motifs, one might expect that pseudoexons would only arise when rearrangements of intronic DNA create novel exons by chance. Surprisingly, although such mutations do occur, a far more common cause of pseudoexons is deep-intronic single nucleotide variants, raising the question of why these latent exon-like tracts near the mutation sites have not already been purged from the genome by the evolutionary advantage of more efficient splicing. Possible answers may lie in deep intronic splicing processes such as recursive splicing or poison exon splicing. Because these processes utilize intronic motifs that benignly engage with the spliceosome, the regions involved may be more susceptible to exonization than other intronic regions would be. We speculated that a comprehensive study of reported pseudoexons might detect alignments with known deep intronic splice sites and could also permit the characterisation of novel pseudoexon categories. In this report, we present and analyse a catalogue of over 400 published pseudoexon splice events. In addition to confirming prior observations of the most common pseudoexon mutation types, the size of this catalogue also enabled us to suggest new categories for some of the rarer types of pseudoexon mutation. 
By comparing our catalogue against published datasets of non-canonical splice events, we also found that 15.7% of pseudoexons exhibit some splicing activity at one or both of their splice sites in non-mutant cells. Importantly, this included seven examples of experimentally confirmed recursive splice sites, confirming for the first time a long-suspected link between these two splicing phenomena. These findings have the potential to improve the fidelity of genetic diagnostics and reveal new targets for splice-modulating therapies.
Gene panel and whole exome sequencing are now commonly used to diagnose Mendelian disease, but the molecular diagnostic rate of DNA sequencing is currently only 35%-50%. In recent years, RNA sequencing has emerged as a promising diagnostic method: it can detect new pathogenic mutations and analyze allele-specific expression. This helps in understanding the relationship between disease genotype and phenotype and can complement genome sequencing, expanding the traditional genomic diagnostic methods for Mendelian disease. RNA sequencing is expected to become a routine tool for diagnosing Mendelian diseases. This article reviews the application of RNA sequencing in the clinical diagnosis of Mendelian disease.
Transcript dosage imbalance may influence the transcriptome. To gain insight into the role of altered gene expression in hereditary colorectal polyposis predisposition, we analyzed absolute and allele-specific expression (ASE) of the adenomatous polyposis coli (APC) and mutY homolog (MUTYH) genes. We analyzed DNA and RNA extracted from peripheral blood mononuclear cells (PBMC) of 49 familial polyposis patients and 42 healthy blood donors matched for gender and age. Patients were screened for germline alterations in both genes using dHPLC, MLPA and automated sequencing. APC and MUTYH mRNA expression levels were investigated by quantitative real-time PCR (qRT-PCR) using TaqMan assays and by ASE assays using dHPLC-based primer extension. Twenty of the 49 patients carried germline mutations: 14 in the APC gene and six in the MUTYH gene. The remaining 29 patients had no mutations in either gene. qRT-PCR results indicated that expression of both APC and MUTYH was reduced in the patients analyzed. In particular, APC expression was significantly reduced in patients without an APC germline mutation compared with the control group (P < 0.05), whereas APC expression in mutation carriers, although lower than in control individuals, did not differ significantly. Conversely, MUTYH expression was significantly reduced in patients with MUTYH mutations compared with the control group (P < 0.05). Altered ASE of APC was detected in four of eight APC mutation carriers; one case showed complete loss of one allele. Among APC mutation-negative cases, 4 of 13 showed moderate ASE. No altered ASE of MUTYH was detected in the cases analyzed. Spearman's rank correlation showed a significant positive correlation between APC and MUTYH expression in both cases and controls (P = 0.020 and P < 0.001). APC and MUTYH thus showed reduced germline expression that did not always correspond to a gene mutation.
Reduced APC expression in mutation-negative cases appears to be a promising indicator of FAP predisposition, while for the MUTYH gene, mutation is associated with reduced mRNA expression. This study could improve the predictive genetic diagnosis of at-risk individuals belonging to families with reduced mRNA expression, regardless of the presence of a mutation.
Purpose: Germline mutations in the exonuclease domains of POLE and POLD1 have recently been associated with polyposis and colorectal cancer (CRC) predisposition. Here, we aimed to gain a better understanding of the phenotypic characteristics of this syndrome, in order to establish specific criteria for POLE and POLD1 mutation screening and to help define the clinical management of mutation carriers. Methods: The exonuclease domains of POLE and POLD1 were studied in 529 kindreds, 441 with familial nonpolyposis CRC and 88 with polyposis, using pooled DNA amplification and massively parallel sequencing. Results: Seven novel or rare genetic variants were identified. In addition to the recurrent POLE mutation p.L424V in a patient with polyposis, CRC and oligodendroglioma, six novel or rare POLD1 variants (four of them, p.D316H, p.D316G, p.R409W, and p.L474P, with strong evidence of pathogenicity) were identified in nonpolyposis CRC families. Phenotypic data from these and previously reported POLE/POLD1 carriers point to an associated phenotype characterized by attenuated or oligo-adenomatous colorectal polyposis, CRC, and probably brain tumors. In addition, POLD1 mutations predispose to endometrial and breast tumors. Conclusion: Our results widen the phenotypic spectrum of the POLE/POLD1-associated syndrome and identify novel pathogenic variants. We propose guidelines for genetic testing and recommendations for surveillance.
Familial adenomatous polyposis (FAP) is an inherited disorder and the most common gastrointestinal polyposis syndrome. Germline mutations in the APC gene were initially identified as the cause of FAP; later, several studies also implicated the MUTYH gene in a form of the disease usually referred to as MUTYH-associated polyposis (MAP). FAP and MAP are characterized by the early onset of multiple adenomatous colorectal polyps, a high lifetime risk of colorectal cancer (CRC), and, in some patients, the development of extracolonic manifestations. The goal of colorectal management in these patients is to prevent CRC mortality through endoscopic and surgical approaches. Individuals with FAP and their relatives should receive appropriate genetic counseling and join surveillance programs when indicated. This review focuses on the main clinical and genetic aspects of FAP associated with germline APC mutations and of MAP.
Allele-specific expression (ASE) is an important aspect of gene regulation. We developed MBASED (meta-analysis based allele-specific expression detection), a novel method for detecting ASE in RNA-seq data that aggregates information across multiple single nucleotide variation loci to obtain a gene-level measure of ASE, even when prior phasing information is unavailable. MBASED supports one-sample and two-sample analyses and performs well in simulations. We applied MBASED to a panel of cancer cell lines and paired tumor-normal tissue samples and observed extensive ASE in cancer, but not normal, samples, driven mainly by genomic copy number alterations. Supplementary material is available in the online version of this article (doi:10.1186/s13059-014-0405-3).
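MBASED's own meta-analysis and its handling of unphased variants are not reproduced here. As a much simpler illustration of what aggregating several heterozygous SNVs into one gene-level ASE call means, the following Python sketch assumes the SNVs are already phased, so that read counts can simply be summed per haplotype, and then applies an exact two-sided binomial test against the balanced 1:1 ratio. The function names `binom_two_sided_p` and `gene_level_ase` and the example counts are hypothetical.

```python
import math
from typing import List, Tuple


def binom_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial test (requires 0 < p < 1): total probability
    of all outcomes that are no more likely than the observed count k."""
    if n == 0:
        return 1.0

    def log_pmf(i: int) -> float:
        # Log space avoids float overflow of the binomial coefficient
        # when read counts are large.
        return (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                + i * math.log(p) + (n - i) * math.log(1 - p))

    logs = [log_pmf(i) for i in range(n + 1)]
    cutoff = logs[k] + 1e-7  # small tolerance so exact ties are included
    return min(1.0, sum(math.exp(v) for v in logs if v <= cutoff))


def gene_level_ase(phased_counts: List[Tuple[int, int]]) -> Tuple[float, float]:
    """Pool read counts over the heterozygous SNVs of one gene.

    Each tuple is (haplotype-1 reads, haplotype-2 reads). Phasing is ASSUMED
    to be known, which is what allows counts from different SNVs to be added
    per haplotype. Returns (major-haplotype fraction, two-sided p-value
    against a 1:1 allelic ratio).
    """
    hap1 = sum(a for a, _ in phased_counts)
    hap2 = sum(b for _, b in phased_counts)
    total = hap1 + hap2
    if total == 0:
        return 0.5, 1.0
    return max(hap1, hap2) / total, binom_two_sided_p(hap1, total)


if __name__ == "__main__":
    # Hypothetical counts at three phased SNVs of one gene.
    frac, p = gene_level_ase([(30, 10), (25, 9), (40, 12)])
    print(f"major haplotype fraction {frac:.3f}, p = {p:.2e}")
```

Pooling phased counts gains power over testing each SNV alone, but it silently breaks if the phase is wrong, because a reference allele on one haplotype would then be added to the other haplotype's total. Avoiding that dependence on known phase is the problem the MBASED abstract says its method addresses.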
Although many NGS read pre-processing tools already existed, we could not find any tool or combination of tools that met our requirements in terms of flexibility, correct handling of paired-end data, and high performance. We have developed Trimmomatic as a more flexible and efficient pre-processing tool that correctly handles paired-end data. The value of NGS read pre-processing is demonstrated for both reference-based and reference-free tasks. In all scenarios tested, Trimmomatic produced output that was at least competitive with, and in many cases superior to, that produced by other tools. Trimmomatic is licensed under GPL v3. It is cross-platform (Java 1.5+ required) and available, together with its manual and source code, from http://www.usadellab.org/cms/index.php?page=trimmomatic. Contact: usadel@bio1.rwth-aachen.de.
Purpose: Single-exon inversions have rarely been described in clinical syndromes and are challenging to detect using Sanger sequencing. We report the case of a 40-year-old woman with adenomatous colon polyps too numerous to count, who had a complex inversion spanning the entire exon 10 of APC (the gene encoding adenomatous polyposis coli), causing exon skipping and resulting in a frameshift and premature protein truncation. Methods: We performed complete APC gene sequencing using high-coverage next-generation sequencing (ColoSeq), analysis with BreakDancer and SLOPE software, and confirmatory transcript analysis. Results: ColoSeq identified a complex small genomic rearrangement consisting of an inversion that results in skipping of exon 10 of the APC gene. This mutation would not have been detected by traditional sequencing or gene-dosage methods. Conclusion: We report a case of adenomatous polyposis resulting from a complex single-exon inversion. Our report highlights the benefits of large-scale sequencing methods that capture intronic sequences with sufficient depth of coverage, together with informatics tools, for detecting small pathogenic structural rearrangements.
Methods used to sequence the transcriptome often produce more than 200 million short sequences. We introduce StringTie, a computational method that applies a network flow algorithm originally developed in optimization theory, together with optional de novo assembly, to assemble these complex data sets into transcripts. When used to analyze both simulated and real data sets, StringTie produces more complete and accurate reconstructions of genes and better estimates of expression levels than other leading transcript assembly programs, including Cufflinks, IsoLasso, Scripture and Traph. For example, on 90 million reads from human blood, StringTie correctly assembled 10,990 transcripts, whereas the next best program, Cufflinks, correctly assembled 7,187, a 53% increase. On a simulated data set, StringTie correctly assembled 7,559 transcripts, 20% more than the 6,310 assembled by Cufflinks. In addition to producing a more complete transcriptome assembly, StringTie runs faster than other assembly software, including Cufflinks, on all data sets tested to date.
In familial adenomatous polyposis (FAP), 20% of classical and 70% of attenuated/atypical (AFAP) cases remain mutation-negative after routine testing; yet, allelic expression imbalance may suggest an APC alteration. Our aim was to determine the proportion of families attributable to genetic or epigenetic changes in the APC promoter region. We studied 51 unrelated families/cases (26 with classical FAP and 25 with AFAP) with no point mutations in the exons and exon/intron borders and no rearrangements detected by multiplex ligation-dependent probe amplification (MLPA, P043-B1). Promoter-specific events of APC were addressed by targeted resequencing, MLPA (P043-C1), methylation-specific MLPA, and Sanger sequencing of promoter regions. A novel 132-kb deletion encompassing the APC promoter 1B and upstream sequence occurred in a classical FAP family with allele-specific APC expression. No promoter-specific point mutations or hypermethylation were present in any family. In conclusion, promoter-specific alterations are a rare cause of mutation-negative FAP (1/51, 2%). The frequency and clinical correlations of promoter 1B deletions are poorly defined. This investigation provides frequencies of 1/26 (4%) for classical FAP, 0/25 (0%) for AFAP, and 1/7 (14%) for families with allele-specific expression of APC. Clinically, promoter 1B deletions may be associated with classical FAP without extracolonic manifestations. © 2014 Wiley Periodicals, Inc.
Lynch syndrome, familial adenomatous polyposis, and Mut Y homolog (MYH)-associated polyposis are three major known types of inherited colorectal cancer, which together account for up to 5% of all colon cancer cases. Lynch syndrome is most frequently caused by mutations in the mismatch repair genes MLH1, MSH2, MSH6, and PMS2 and is inherited in an autosomal dominant manner. Familial adenomatous polyposis is manifested as colonic polyposis caused by mutations in the APC gene and is also inherited in an autosomal dominant manner. Finally, MYH-associated polyposis is caused by mutations in the MUTYH gene and is inherited in an autosomal recessive manner but may or may not be associated with polyps. There are variants of both familial adenomatous polyposis (Gardner syndrome, with extracolonic features, and Turcot syndrome, which features medulloblastoma) and Lynch syndrome (Muir-Torre syndrome, which features sebaceous skin carcinomas, and Turcot syndrome, which features glioblastomas). Although a clinical diagnosis of familial adenomatous polyposis can be made using colonoscopy, genetic testing is needed to inform at-risk relatives. Because of the overlapping phenotypes of attenuated familial adenomatous polyposis, MYH-associated polyposis, and Lynch syndrome, genetic testing is needed to distinguish among these conditions. This distinction is important, especially for women with Lynch syndrome, who are at increased risk of gynecological cancers. Clinical testing for these genes has progressed rapidly in the past few years with advances in technologies and the lower cost of reagents, especially for sequencing. To assist clinical laboratories in developing and validating testing for this group of inherited colorectal cancers, the American College of Medical Genetics and Genomics has developed the following technical standards and guidelines. An algorithm for testing is also proposed. Genet Med advance online publication 5 December 2013. Genetics in Medicine (2013); doi:10.1038/gim.2013.166.