
Genetic regulation of TERT splicing affects cancer risk by altering cellular longevity and replicative potential

Springer Nature
Nature Communications

Abstract and Figures

The chromosome 5p15.33 region, which encodes telomerase reverse transcriptase (TERT), harbors multiple germline variants identified by genome-wide association studies (GWAS) as risk for some cancers but protective for others. Here, we characterize a variable number tandem repeat within TERT intron 6, VNTR6-1 (38-bp repeat unit), and detect a strong link between VNTR6-1 alleles (Short: 24-27 repeats, Long: 40.5-66.5 repeats) and GWAS signals rs2242652 and rs10069690 within TERT intron 4. Bioinformatics analyses reveal that rs10069690-T allele increases intron 4 retention while VNTR6-1-Long allele expands a polymorphic G-quadruplex (G4, 35-113 copies) within intron 6, with both variants contributing to variable TERT expression through alternative splicing and nonsense-mediated decay. In two cell lines, CRISPR/Cas9 deletion of VNTR6-1 increases the ratio of TERT-full-length (FL) to the alternative TERT-β isoform, promoting apoptosis and reducing cell proliferation. In contrast, treatment with G4-stabilizing ligands shifts splicing from TERT-FL to TERT-β isoform, implicating VNTR6-1 as a splicing switch. We associate the functional variants VNTR6-1, rs10069690, and their haplotypes with multi-cancer risk and age-related telomere shortening. By regulating TERT splicing, these variants may contribute to fine-tuning cellular longevity and replicative potential in the context of stress due to tissue-specific endogenous and exogenous exposures, thereby influencing the cancer risk conferred by this locus.
VNTR6-1 affects proliferation and apoptosis in the bladder cancer cell line UMUC3 a Analysis of real-time increase in cell counts (cell index) measured with xCELLigence over 283 h in UMUC3 cells. The WT and knockout cells (clone #2, starred samples in Supplementary Data 9) were cultured in a CS medium for 48 h, followed by a switch to fresh CS or full medium for 10 more days. Proliferation rates in response to culturing conditions were significantly lower in V6.1-KO compared to WT cells. The plots present the results of one of three independent experiments, with means ± SEM from n = 6 biological replicates. Statistical significance and β-values for differences in the cell index during the visually determined growth phase (gray highlighting between 50 and 183 h) were calculated using linear mixed-effects interaction models. The reference sample is labeled with a dotted circle; βint represents the change in growth rates between experimental groups. b, c Analysis of cell doublings in UMUC3 cells cultured in (b) CS medium or (c) full medium for four days, using the CFSE assay described in Supplementary Fig. 18c. Data presented are the results of one of three independent experiments, with means ± SD from n = 3 biological replicates. P-values in graphs are for two-way ANOVA using Tukey’s test. d, e Quantification of apoptosis in UMUC3 cells cultured for 48 h (d) with 10 μM cisplatin in CS medium or (e) in full medium, followed by Annexin V/PI staining and flow cytometry analysis to determine the percentage of apoptotic cells. Data presented are the results of one of three independent experiments, with means ± SD from n = 3 biological replicates (normalized to values of CS media and the WT groups). P-values in graphs were determined by one-way ANOVA using Dunnett’s test. 
f, g Gene set enrichment analysis (GSEA) for differential expression of genes involved in (f) pathways related to the downregulation of cellular processes for proliferation-promotion and (g) apoptosis-resistance, as identified by RNA-seq analysis comparing UMUC3 V6.1-KO (clone #3) to WT UMUC3 cells. Genes highlighted in blue are common to both pathways. The source data for all panels are provided in the Source Data file.
Article https://doi.org/10.1038/s41467-025-56947-y
Oscar Florez-Vargas¹, Michelle Ho¹, Maxwell H. Hogshead¹, Brenen W. Papenberg¹, Chia-Han Lee¹, Kaitlin Forsythe¹, Kristine Jones², Wen Luo², Kedest Teshome², Cornelis Blauwendraat³, Kimberly J. Billingsley³, Mikhail Kolmogorov⁴, Melissa Meredith⁵, Benedict Paten⁵, Raj Chari⁶, Chi Zhang², John S. Schneekloth⁷, Mitchell J. Machiela⁸, Stephen J. Chanock⁹, Shahinaz M. Gadalla¹⁰, Sharon A. Savage¹⁰, Sam M. Mbulaiteye¹¹ & Ludmila Prokunina-Olsson¹
Received: 4 July 2024
Accepted: 6 February 2025
A full list of affiliations appears at the end of the paper. e-mail: prokuninal@mail.nih.gov
Nature Communications | (2025) 16:1676
Content courtesy of Springer Nature, terms of use apply. Rights reserved

At least ten independent GWAS signals within the ~100 kb genomic region on chromosome 5p15.33 harboring TERT and CLPTM1L have been associated with cancer risk or protection1–4. TERT encodes the catalytic subunit of telomerase, a reverse transcriptase that extends telomeric repeats at chromosome ends to maintain telomere length and genome integrity5, with telomerase dysfunction implicated in many human diseases6. CLPTM1L encodes a putative oncogene that promotes cancer cell growth and resistance to apoptosis7,8. GWAS-identified signals might be causal or might tag known or as-yet unknown functional polymorphisms. Thus, identifying these variants and the mechanisms underlying their associations may improve the understanding of the etiology and biology of these cancers, leading to optimized cancer risk prediction, prevention, and treatment.
Several variable number tandem repeats (VNTRs) have been reported within the 5p15.33 region9,10, but their characterization has been limited by their high variability, their complexity, and the length of the genomic fragments extended by the repeats. Advances in long-read genome sequencing and assembly11 have resolved many genomic gaps, enabling deeper exploration of complex regions such as VNTRs, which remain challenging to analyze with short-read whole-genome sequencing (WGS) or PCR-based methods. Recent examples12 have shown that VNTRs might account for or contribute to GWAS signals for cancer and other human traits, expanding the list of potentially functional variants to consider.
Here, hypothesizing that VNTRs might be responsible for some of the 5p15.33 GWAS signals, we explore two VNTRs within TERT intron 6 in relation to multi-cancer GWAS signals reported in this region. Among these signals, we detect a strong link only between VNTR6-1 and two single nucleotide polymorphisms (SNPs), rs2242652 and rs10069690, within TERT intron 4. Specifically, we preferentially link VNTR6-1 Long alleles (40.5–66.5 repeats), in contrast with Short alleles (24–27 repeats), with the rs2242652-A and rs10069690-T alleles, both of which were previously associated with a reduced risk of bladder4 and prostate cancer13 but an elevated risk of glioma14, breast15,16, and ovarian cancer17. We present a comprehensive genetic and functional analysis of VNTR6-1 and its linked GWAS signals.
Results
VNTR6-1 is linked with multi-cancer GWAS signals rs2242652 and rs10069690
We explored two previously reported but minimally characterized VNTRs9,10 within TERT intron 6 in relation to all cancer-related GWAS signals within the 5p15.33 multi-cancer region1–4. First, we analyzed 452 long-read WGS assemblies from 226 controls of diverse ancestries generated by the Human Pangenome Reference Consortium (HPRC)11 and the Center for Alzheimer's and Related Dementias (CARD)18. The strongest associations were detected for VNTR6-1 (38-bp repeat unit, range 24–66.5 repeats in the assemblies), with more repeats detected in assemblies carrying the rs2242652-A (p = 5.93E-19) and rs10069690-T (p = 5.40E-11) alleles than in those carrying the alternative alleles at these SNPs, which are located within TERT intron 4 (Fig. 1a, b, Supplementary Fig. 1 and Supplementary Data 1). In contrast, VNTR6-2 (36-bp repeat unit, range 8–155 repeats in the assemblies) was moderately associated with rs2242652-A (p = 7.66E-04) and some other GWAS signals, but not with the rs10069690-T allele (p = 0.84, Fig. 1c and Supplementary Data 1). Thus, we focused on VNTR6-1 as a potential proxy for the multi-cancer GWAS signals rs2242652 and rs10069690.

[Figure 1 panels: (a) the chr5p15.33 region showing TERT exons (exons 6 and 7 labeled), rs10069690 and rs2242652 in intron 4, and VNTR6-1 and VNTR6-2 in intron 6; (b) VNTR6-1 repeat copies by rs10069690 allele (C, n = 328, mean 26.3; T, n = 124, mean 31.8; p = 5.40E-11) and rs2242652 allele (G, n = 383, mean 26.4; A, n = 69, mean 35.6; p = 5.93E-19); (c) VNTR6-2 repeat copies by rs10069690 allele (C, mean 60.5; T, mean 59.3; p = 0.84) and rs2242652 allele (G, mean 59.0; A, mean 67.0; p = 7.66E-04).]

Fig. 1 | Analysis of VNTR6-1 and VNTR6-2 within TERT intron 6 in relation to the multi-cancer GWAS signals rs10069690 and rs2242652. a The chr5p15.33 genomic region with GWAS signals rs10069690 and rs2242652 within TERT intron 4 and VNTRs within intron 6. b, c Distribution of repeat copies in 226 controls of diverse ancestries (n = 452 long-read WGS assemblies, with individual sample sizes for each group indicated in the figure) for (b) VNTR6-1 (38-bp repeat unit) and (c) VNTR6-2 (36-bp repeat unit). The dots represent repeat copies for each chromosome assembly. The box plots define the minima and maxima (ends of the whiskers), the center (median, shown as a horizontal black line), the bounds of the box (first and third quartiles, representing the interquartile range), and the means (black dots, with values displayed above the corresponding plots). Half-violin plots show the density distribution of the data. Five VNTR6-1 alleles (24, 25.5, 27, and 40.5 repeats) were observed above the 5% frequency threshold and accounted for 90.04% of all alleles in the set; VNTR6-2 alleles were scattered between 8 and 155 repeat copies, all under the 5% threshold. P-values were calculated for unpaired two-sided Wilcoxon–Mann–Whitney tests comparing the number of repeat copies between the genotype groups. The source data are provided in the Source Data file.
Since long-read WGS resources are limited, we performed long-read targeted sequencing of the VNTR6-1 PCR amplicon (2126–3750 bp) in various samples (Supplementary Data 2). This analysis confirmed the concordance in repeat scoring between targeted sequencing and WGS for ten HPRC controls with available assemblies11, as well as between five bladder tumors and paired tumor-adjacent normal bladder tissues. In addition, it confirmed Mendelian segregation of VNTR6-1 alleles in HapMap samples from 30 European (Central European from Utah, CEU) and 30 African (Yoruba, YRI) family trios. Although reliable, long-read sequencing is an expensive, labor-intensive, and low-throughput method that requires significant amounts of high-quality DNA. To facilitate the evaluation of VNTR6-1 in large-scale association studies, we explored additional approaches for its analysis.
We noted that in both the assemblies and the HapMap samples with targeted sequencing data, VNTR6-1 alleles clustered into two main groups (Fig. 1, Supplementary Data 2). In the HapMap samples, these groups included alleles designated as Short (CEU: mean 25.8 ± 1.0 repeats, 83.3%; YRI: mean 27.0 ± 2.0 repeats, 80%) and Long (CEU: mean 40.5 ± 0 repeats, 16.7%; YRI: mean 43.75 ± 8.8 repeats, 20%), with an uncommon 66.5-repeat allele detected at 2.5% frequency only in African-ancestry individuals (Supplementary Data 2). We also explored short-read WGS profiles for all individuals of diverse ancestries from the 1000 Genomes Project (1000 G, n = 3201).
WGS profiles of samples with only VNTR6-1-Short alleles (24–27 repeats based on the assemblies or targeted sequencing) appeared similar to the reference human genome (GRCh38, Short allele with 27 repeats), whereas prominent read pileups were observed in samples with at least one copy of the Long allele (40.5–66.5 repeats), with no further separation within these groups (Supplementary Fig. 2a). A supervised-learning approach classified all the 1000 G samples as carriers of at least one copy of the VNTR6-1-Long allele (Long/any genotype) or of the Short/Short genotype (Supplementary Fig. 2b, c and Supplementary Data 3). Treating these classifications as true VNTR6-1 genotypes, we used all SNPs within the 400-kb genomic region (GRCh38 chr5:1,100,000–1,500,000) to construct a random forest classifier across all 1000 G samples. This analysis identified two SNPs, rs56345976 and rs33961405, located 704 bp apart within the VNTR6-1 amplicon, as the best predictors of VNTR6-1 groups across populations despite differences in linkage disequilibrium (LD) profiles (Supplementary Fig. 3, Supplementary Data 4 and Supplementary Data 5). Although these SNPs were not sufficiently informative individually, the rs56345976-A/rs33961405-G haplotype separated the carriers of VNTR6-1-Long alleles from those with VNTR6-1-Short/Short genotypes, which were defined by the three other haplotypes of these two SNPs (AA, GG, and GA) (Supplementary Fig. 4 and Supplementary Data 3). This classification was consistent with the repeat sizes determined by the long-read assemblies and targeted sequencing (Supplementary Fig. 5 and Supplementary Data 6).
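The two-SNP rule above can be written as a small lookup. The sketch below is an illustration, not the authors' random-forest pipeline: it assumes each sample's phased haplotypes are already available as (rs56345976, rs33961405) allele pairs, and the function names are hypothetical.

```python
from typing import Tuple

# One phased haplotype: (rs56345976 allele, rs33961405 allele). Assumed input format.
Haplotype = Tuple[str, str]

# Per the text, only the A/G haplotype tags VNTR6-1-Long; AA, GG and GA tag Short.
LONG_TAG: Haplotype = ("A", "G")


def haplotype_to_vntr(hap: Haplotype) -> str:
    """Map one phased two-SNP haplotype to a VNTR6-1 allele class."""
    return "Long" if hap == LONG_TAG else "Short"


def classify_genotype(hap1: Haplotype, hap2: Haplotype) -> str:
    """Return 'Long/any' if either haplotype tags a Long allele, else 'Short/Short'."""
    classes = {haplotype_to_vntr(hap1), haplotype_to_vntr(hap2)}
    return "Long/any" if "Long" in classes else "Short/Short"


if __name__ == "__main__":
    print(classify_genotype(("A", "G"), ("G", "A")))  # one Long-tagging haplotype
    print(classify_genotype(("A", "A"), ("G", "G")))  # two Short-tagging haplotypes
```

The lookup mirrors the Long/any versus Short/Short grouping used in the text; it cannot distinguish Short/Long from Long/Long, just as the binary classification does not.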
We also created a custom imputation reference panel of the 400-kb region in all 1000 G samples. VNTR6-1 was incorporated into this panel as a biallelic marker with Short/Long alleles determined based on rs56345976/rs33961405 haplotypes (Supplementary Data 3). The 1000 G dataset was randomly split into two equal groups, and the first group was used as a reference for imputation in the second group, achieving 99.3% concordance with the predetermined VNTR6-1 genotypes. These results demonstrated that in WGS datasets, VNTR6-1 could be confidently imputed as a biallelic marker with Short/Long alleles. In 1000 G European populations (1000G-EUR), the VNTR6-1-Long allele was most strongly linked with rs2242652-A (r² = 0.62) and rs10069690-T (r² = 0.48, Supplementary Data 5 and Supplementary Fig. 6), suggesting that VNTR6-1 might contribute to the associations detected for these GWAS signals.
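The r² values above are standard pairwise LD statistics. A minimal sketch, assuming phased haplotypes coded 1 for the allele of interest (for example VNTR6-1-Long and rs2242652-A) and 0 otherwise: r² = D² / [pA(1 − pA)pB(1 − pB)], with D = pAB − pApB. It also includes the simple match rate behind a concordance figure such as the 99.3% reported for imputation. This is not the software the authors used, and the toy haplotypes are invented.

```python
from typing import Sequence, Tuple


def ld_r2(haplotypes: Sequence[Tuple[int, int]]) -> float:
    """Pairwise r^2 from phased haplotypes coded (marker1, marker2), 1 = allele of interest."""
    n = len(haplotypes)
    p_a = sum(h[0] for h in haplotypes) / n
    p_b = sum(h[1] for h in haplotypes) / n
    p_ab = sum(1 for h in haplotypes if h[0] == 1 and h[1] == 1) / n
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when a marker is monomorphic")
    return d * d / denom


def concordance(imputed: Sequence[str], truth: Sequence[str]) -> float:
    """Fraction of samples whose imputed genotype matches the reference call."""
    if len(imputed) != len(truth):
        raise ValueError("inputs must have the same length")
    return sum(i == t for i, t in zip(imputed, truth)) / len(truth)


if __name__ == "__main__":
    # Hypothetical haplotypes: (VNTR6-1-Long?, SNP effect allele?)
    toy = [(1, 1)] * 2 + [(0, 0)] * 6 + [(1, 0), (0, 1)]
    print(round(ld_r2(toy), 3))
```

In practice, dedicated LD tools estimate the same quantity from much larger panels; the formula is the point here.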
VNTR6-1 creates an expandable G-quadruplex that modulates TERT splicing
VNTR6-1 is located ~3.5 kb upstream of TERT exon 7 (Fig. 2a). Simultaneous inclusion or skipping of exons 7 and 8 defines the TERT full-length (TERT-FL) or TERT-β isoform, respectively19. To assess the role of VNTR6-1 in this splicing pattern, we deleted the VNTR6-1 region (2241 bp in the reference human genome) by CRISPR/Cas9 editing; partial deletion of this highly repetitive genomic region was not possible. We established three stable isogenic VNTR6-1 knockout clones (V6.1-KO) in UMUC3, a bladder cancer cell line with high TERT expression (DepMap transcripts per million (TPM) = 6.78; Fig. 2a and Supplementary Fig. 7a, b), and two clones in A549, a lung cancer cell line with moderate TERT expression (TPM = 3.63, Supplementary Fig. 8a). We considered these knockouts and their parental wild-type (WT) cell lines as VNTR6-1 extremes (none vs. 24 repeats), which can be used as isogenic models for VNTR6-1-Short and -Long alleles, respectively. VNTR6-1 knockout increased the inclusion of exons 7 and 8 in both cell lines, increasing the TERT-FL fraction from ~45% to 71% in UMUC3 (Fig. 2b, e, Supplementary Fig. 7c–f) and from 39% to 58% in A549 (Supplementary Fig. 8b, c). These results suggest that VNTR6-1 acts as a splicing switch between TERT-FL (greater fraction in knockout cells) and TERT-β (greater fraction in WT cells).
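The isoform fractions above come from densitometry of RT-PCR bands. A minimal sketch of that calculation, assuming background-subtracted band intensities are already available: TERT-FL (%) = FL / (FL + β) × 100. Because DNA-binding gel stains report mass rather than molecule number, some analyses also divide each intensity by amplicon length (302 bp for TERT-FL and 120 bp for TERT-β in this assay) to approximate molar ratios; the text does not state whether such a correction was applied, so it is included only as an optional assumption. The intensities are hypothetical.

```python
from typing import Dict, Optional


def isoform_percent(
    intensities: Dict[str, float],
    target: str = "TERT-FL",
    lengths_bp: Optional[Dict[str, int]] = None,
) -> float:
    """Percentage of `target` among all quantified bands.

    If `lengths_bp` is given, each intensity is divided by its amplicon length
    (an optional molar correction; an assumption, not necessarily the paper's method).
    """
    signal = dict(intensities)
    if lengths_bp is not None:
        signal = {band: value / lengths_bp[band] for band, value in signal.items()}
    total = sum(signal.values())
    if total <= 0:
        raise ValueError("total band signal must be positive")
    return 100.0 * signal[target] / total


if __name__ == "__main__":
    bands = {"TERT-FL": 4500.0, "TERT-beta": 5500.0}  # hypothetical intensities
    print(round(isoform_percent(bands), 1))
    sizes = {"TERT-FL": 302, "TERT-beta": 120}
    print(round(isoform_percent(bands, lengths_bp=sizes), 1))
```

The uncorrected and length-corrected values can differ substantially, which is why the normalization choice should be reported alongside the percentages.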
In search of functional features that could explain our observations, we found no evidence of differential DNA methylation (Supplementary Fig. 9) or long-range chromatin interactions (Supplementary Fig. 10) involving the VNTR6-1 region. However, we noted a high G content within the 38-bp consensus repeat sequence of VNTR6-1 (5′-GGTGGGGATCTGTGGGATTGGTTTTCATGTGTGGGGTA-3′). Based on G4Hunter analysis and G4-ChIP-seq, we predicted that VNTR6-1 adopts a G-quadruplex (G4) structure in the TERT-sense orientation, creating 35–113 G4 copies per allele with conserved core G-containing motifs (Fig. 2a and Supplementary Fig. 11a, b).
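G4Hunter scores each base by run length: a G in a run of k Gs scores min(k, 4), a C in a run of k Cs scores −min(k, 4), and A/T score 0; mean scores over sliding windows flag G4-prone regions. The sketch below re-implements that scoring rule from its published description; it is not the tool the authors ran, and the 25-nt window and 1.2 threshold are commonly used defaults assumed here. Applied to the 38-bp consensus unit, the per-base scores sum to 50, a mean of about 1.32.

```python
from typing import List


def g4hunter_base_scores(seq: str) -> List[int]:
    """Per-base G4Hunter scores: +min(run, 4) for G runs, -min(run, 4) for C runs, 0 otherwise."""
    seq = seq.upper()
    scores: List[int] = []
    i = 0
    while i < len(seq):
        base = seq[i]
        j = i
        while j < len(seq) and seq[j] == base:
            j += 1  # extend the homopolymer run
        run = j - i
        if base == "G":
            value = min(run, 4)
        elif base == "C":
            value = -min(run, 4)
        else:
            value = 0
        scores.extend([value] * run)
        i = j
    return scores


def g4hunter_windows(seq: str, window: int = 25) -> List[float]:
    """Mean G4Hunter score of every window of `window` bases, sliding by one base."""
    s = g4hunter_base_scores(seq)
    if len(s) < window:
        return []
    current = sum(s[:window])
    out = [current / window]
    for k in range(window, len(s)):
        current += s[k] - s[k - window]  # add the entering base, drop the leaving one
        out.append(current / window)
    return out


CONSENSUS = "GGTGGGGATCTGTGGGATTGGTTTTCATGTGTGGGGTA"  # 38-bp unit from the text

if __name__ == "__main__":
    unit = g4hunter_base_scores(CONSENSUS)
    print(len(unit), sum(unit), round(sum(unit) / len(unit), 2))
    # Two tandem copies, as a crude stand-in for a multi-copy allele (assumption).
    windows = g4hunter_windows(CONSENSUS * 2)
    print(sum(w >= 1.2 for w in windows), "of", len(windows), "windows >= 1.2")
```

Scoring concatenated copies is only a rough proxy for how G4 potential scales with repeat number; the actual G4 counts reported above come from the authors' analysis.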
Since a single invariable G4 upstream of VNTR6-1 has been implicated in TERT-β splicing20, we hypothesized that the variation in G4 copy number created by VNTR6-1-Short vs. Long alleles may explain the observed differences in the TERT-FL:TERT-β ratio. To assess the role of VNTR6-1-G4 in splicing, we treated our WT cell lines, UMUC3 (Fig. 2 and Supplementary Fig. 11c–f) and A549 (Supplementary Fig. 8), as well as their respective knockouts, with two G4-stabilizing ligands. We quantified the expression of exons 6–9 (TERT-β), exons 7–8 (TERT-FL), and total TERT in cDNA from treated and untreated cells. Treatment with the G4 ligands Pidnarulex (CX-5461)21 or PhenDC3 (ref. 22) decreased the TERT-FL fraction while increasing the TERT-β fraction in both UMUC3 and A549 cells, likely by stabilizing VNTR6-1-G4 (Fig. 2f, g and Supplementary Fig. 8d–k). These results support a role for VNTR6-1-G4 in modulating the TERT-FL:TERT-β ratio. A splicing isoform with exon 8 skipping (TERT-Δ8, Supplementary Fig. 12) was observed in knockout and WT cells after G4 ligand treatment.
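Comparisons against DMSO use an unpaired two-sided Student's t test over three independent experiments. A minimal standard-library sketch of the pooled-variance t statistic is below; the TERT-FL percentages are hypothetical, and a p-value would come from the t distribution with n1 + n2 − 2 degrees of freedom (for example via `scipy.stats.ttest_ind` with `equal_var=True`), which is omitted here to avoid the SciPy dependency.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def student_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Pooled-variance (Student's) two-sample t statistic and its degrees of freedom."""
    n1, n2 = len(a), len(b)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two observations")
    df = n1 + n2 - 2
    # statistics.variance is the sample variance (n - 1 denominator).
    pooled = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    se = math.sqrt(pooled * (1 / n1 + 1 / n2))
    return (mean(a) - mean(b)) / se, df


if __name__ == "__main__":
    ligand = [30.0, 34.0, 32.0]  # hypothetical TERT-FL % after G4-ligand treatment
    dmso = [46.0, 44.0, 45.0]    # hypothetical TERT-FL % in vehicle control
    t, df = student_t(ligand, dmso)
    print(round(t, 2), df)
```

With only three replicates per group, the pooled-variance assumption matters; Welch's test (`equal_var=False`) is the usual alternative when variances differ.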
rs10069690-T and VNTR6-1-Long alleles affect TERT expression and splicing
TERT expression is generally lower in normal human tissues (The Genotype-Tissue Expression (GTEx) Project, median TPM = 0.00–2.73) and is not associated with the GWAS signals rs2242652 and rs10069690 (Supplementary Data 7). However, TERT expression is generally higher in tumors (The Cancer Genome Atlas (TCGA), median TPM = 0.02–5.71; Supplementary Data 7) and is associated with these SNPs in some tumor types (kidney chromophobe, KICH, and head and neck squamous carcinoma, HNSC; Supplementary Data 7). Notably, we detected high TERT expression (mean TPM = 59.7, Fig. 3a and Supplementary Data 8) in a set of 78 Burkitt lymphoma (BL) tumors23. BL is an aggressive pediatric cancer originating from germinal center B cells, in which high TERT expression is necessary for the longevity of memory B cells. Two hotspot somatic mutations in the TERT promoter, C228T (−124 bp) and C250T (−146 bp), upregulate TERT expression in many tumors24,25, but these mutations are absent in non-Hodgkin lymphomas, including BL26 and our set of BL tumors. The combination of high TERT expression and the absence of upregulating promoter mutations in BL tumors provides an opportunity to explore the regulation of TERT expression by germline variants.
[Figure 2 panels: (a) G4-ChIP mismatch (%) tracks for HEK-293T and NA18507 across the TERT region (chr5:1,255,000–1,295,000), showing TERT exons 6 and 7, VNTR6-2, VNTR6-1, the upstream G4, guide RNAs 1 and 2, and the sequence logo of the 38-bp repeat unit; (b–d) agarose gels of TERT-FL (302 bp), TERT-β (120 bp), and HPRT1 (100 bp) amplicons, plus TERT-Δ8 (216 bp) after G4-ligand treatment, in WT and V6.1-KO #1–#3 cells or after no treatment, DMSO, CX-5461, or PhenDC3; (e) TERT isoform fractions (TERT-FL: 45% in WT vs. 58–71% in the knockout clones); (f, g) TERT-FL (%) by treatment.]

Fig. 2 | VNTR6-1 affects the TERT-FL:TERT-β splicing ratio. a G4-ChIP results within the TERT region in the cell lines HEK-293T (VNTR6-1: Long/Long) and NA18507 (YRI, 1000 G, VNTR6-1: Short/Short) display mismatches (%) during DNA synthesis, reflecting polymerase stalling after G4 stabilization in both the plus (blue) and minus (orange, direction of TERT transcription) genome strands. The genomic region of TERT intron 6 includes VNTR6-1 (24–66.5 copies of the 38-bp repeat unit), VNTR6-2, G4s in the minus strand (the polymorphic G4 within VNTR6-1 and the constitutive upstream G4), and the CRISPR/Cas9 guide RNAs used to excise VNTR6-1. The sequence logo shows the consensus 38-bp VNTR6-1 repeat unit in UMUC3 cells based on long-read WGS. b Agarose gels of RT–PCR products amplified from the cDNA of the corresponding samples; gDNA, genomic DNA negative control; HPRT1, endogenous normalization control. e Densitometry results for the PCR amplicons in (b). The differences in the TERT isoform ratios are further explored in Supplementary Fig. 11. Experiments in UMUC3 cells comparing TERT splicing and isoform-specific expression after 72 h of treatment with G4-stabilizing ligands, normalized to HPRT1, in the WT (c, f) and V6.1-KO (d, g) cell lines. c, d Representative agarose gels of SYBR-Green RT–qPCR products detecting several isoforms with primers located in exons 6 and 9. The extra PCR band, marked by a red arrow in panels c and d, is further explored in Supplementary Fig. 12. f, g Densitometry analysis of the corresponding agarose gels evaluating TERT-FL (%) relative to the total PCR products. All analyses are based on three independent experiments, presented as means ± SD. One representative gel per experiment is shown. Comparisons were made against the vehicle control (DMSO). P-values are for unpaired two-sided Student's t tests. The source data are provided in the Source Data file.

In BL tumors (Supplementary Data 8), TERT expression decreased with the rs10069690-T allele (β = −13.95 TPM, p = 0.035; Fig. 3b) but not with the rs2242652-A allele (β = 2.53 TPM, p = 0.83; Fig. 3c), with a suggestive trend toward decreased TERT expression for the VNTR6-1-Long allele (β = −16.97 TPM, p = 0.053; Fig. 3d). These variants are in high LD in 1000G-EUR but in low LD in 1000G-AFR and in our set of BL tumors (88% from African patients, Supplementary Fig. 13), suggesting independent effects of rs10069690 and VNTR6-1 on TERT expression. Based on the LD profiles and the associations with TERT expression in BL tumors, we prioritized rs10069690 and VNTR6-1 for further functional analyses.
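The per-allele β values reported for BL tumors come from linear regression adjusted for sex and age. A minimal sketch of such an additive model (genotype coded as 0, 1, or 2 copies of the effect allele), fitted by ordinary least squares with NumPy. The simulated, noise-free data are an assumption chosen so that the fit recovers a known β, and the permutation testing described for Fig. 3 is not reproduced.

```python
import numpy as np


def additive_beta(tpm: np.ndarray, dosage: np.ndarray, sex: np.ndarray, age: np.ndarray) -> float:
    """Per-allele effect on TPM from OLS: tpm ~ intercept + dosage + sex + age."""
    design = np.column_stack([np.ones_like(dosage, dtype=float), dosage, sex, age])
    coef, *_ = np.linalg.lstsq(design, tpm, rcond=None)
    return float(coef[1])  # coefficient on allele dosage


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 78
    dosage = rng.integers(0, 3, size=n).astype(float)  # copies of the effect allele
    sex = rng.integers(0, 2, size=n).astype(float)
    age = rng.uniform(2, 15, size=n)
    # Hypothetical truth: -14 TPM per allele and no noise, so OLS recovers it.
    tpm = 80.0 - 14.0 * dosage + 3.0 * sex + 0.5 * age
    print(round(additive_beta(tpm, dosage, sex, age), 6))
```

Real expression data are noisy and right-skewed, so standard errors and p-values (for example from statsmodels, or by permutation as the authors report) matter more than the point estimate shown here.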
The functional role of the rs10069690-T allele has been attributed to the creation of an alternative splicing site in TERT intron 4, resulting in the coproduction of telomerase-functional TERT-FL and a truncated, telomerase-nonfunctional INS1b isoform27. However, owing to low TERT expression in most human tissues, this relationship has not been explored in relation to 5p15.33 genetic variants27. In BL tumors, 26.2% of all RNA-seq reads between exons 4 and 5 were retained within intron 4, in contrast with neighboring introns 3 and 5 (10.5% and 8.1% of retained reads, respectively; Supplementary Fig. 14a). The rate of TERT intron 4 retention was more strongly associated with rs10069690 (p = 5.36E-09, Supplementary Fig. 14b) than with rs2242652 (p = 5.0E-03, Supplementary Fig. 14c). We analyzed four splicing events between exons 4 and 5: one with canonical intron 4 splicing and three involving intron 4 retention (isoforms INS1 [refs. 19,27], INS1b [ref. 27], and unspliced intron 4; Supplementary Fig. 15a–d). Canonical splicing decreased across rs10069690 genotypes (68.3%, 63.8%, and 57.3% of reads in the CC, CT, and TT groups, respectively; p = 1.65E-05; Supplementary Fig. 15e). With each copy of the rs10069690-T allele, INS1b splicing increased from 0% to 3.8% and 7.0% (p = 3.07E-09; Supplementary Fig. 15g), while INS1 splicing decreased and unspliced intron 4 (excluding reads for the INS1 and INS1b isoforms) increased (Supplementary Fig. 15f, h). These results are consistent with the previously reported association between the rs10069690-T allele and INS1b-type splicing27 but suggest that INS1- and INS1b-type splicing are minor and likely secondary to intron retention, which increases with the rs10069690-T allele. A similar trend was observed for rs2242652, but with weaker associations (Supplementary Fig. 15i–l).
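The genotype-wise percentages above summarize how reads spanning exons 4 and 5 split across the four events. A minimal sketch of one way to compute them, assuming per-sample read counts per event are already available and averaging per-sample percentages within each genotype group; pooling reads per group would be an equally plausible choice, and the text does not say which was used. Event names and counts are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Hypothetical labels for the four exon 4-exon 5 events described in the text.
EVENTS = ("canonical", "INS1", "INS1b", "unspliced")


def event_percentages(counts: Dict[str, int]) -> Dict[str, float]:
    """Percent of one sample's exon 4-5 reads assigned to each event."""
    total = sum(counts.get(event, 0) for event in EVENTS)
    if total == 0:
        raise ValueError("sample has no reads spanning exons 4 and 5")
    return {event: 100.0 * counts.get(event, 0) / total for event in EVENTS}


def mean_by_genotype(
    samples: Sequence[Tuple[str, Dict[str, int]]], event: str
) -> Dict[str, float]:
    """Average per-sample percentage of `event` within each genotype group."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for genotype, counts in samples:
        groups[genotype].append(event_percentages(counts)[event])
    return {g: sum(values) / len(values) for g, values in groups.items()}


if __name__ == "__main__":
    samples = [  # hypothetical read counts
        ("CC", {"canonical": 70, "INS1": 5, "INS1b": 0, "unspliced": 25}),
        ("CC", {"canonical": 66, "INS1": 4, "INS1b": 0, "unspliced": 30}),
        ("TT", {"canonical": 57, "INS1": 3, "INS1b": 7, "unspliced": 33}),
    ]
    print(mean_by_genotype(samples, "canonical"))
```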
[Figure 3 panels: (a) total TERT expression across 78 BL tumors (mean 59.7 TPM); (b) by rs10069690 genotype (CC, n = 23, mean 77.3; CT, n = 33, mean 54.1; TT, n = 22, mean 49.9; β = −13.95, p = 3.51E−02); (c) by rs2242652 genotype (GG, n = 59, mean 59.4; GA, n = 19, mean 60.7; AA, n = 0; β = 2.53, p = 8.30E−01); (d) by VNTR6-1 genotype (Short/Short, n = 47, mean 67.8; Short/Long, n = 28, mean 47.2; Long/Long, n = 3, mean 50.4; β = −16.97, p = 5.33E−02); (e) TERT isoform structures of the VNTR6-1/rs10069690 haplotypes and their associations with TERT expression: Short-C (reference, frequency 0.447); Short-T (0.335; β = −12.22, 95% CI −27.1 to 2.70; p = 0.10); Long-C (0.059; β = −15.92, 95% CI −50.0 to 18.2; p = 0.36); Long-T (0.158; β = −24.18, 95% CI −45.6 to −2.72; p = 0.027).]

Fig. 3 | Analysis of TERT expression in 78 Burkitt lymphoma (BL) tumors. Total TERT expression, measured as transcripts per million (TPM), in BL tumors (n = 78) (a) overall and in relation to the (b) rs10069690, (c) rs2242652, and (d) VNTR6-1 genotypes, with individual sample sizes for each group indicated in the figure. The group means are shown as red dots, with values above the corresponding violin plots. Within each violin plot, the embedded box plots define the center line as the median, the whiskers as the minima and maxima, and the bounds as the 1st (25%) and 3rd (75%) quartiles of the distribution. e Association of the VNTR6-1 and rs10069690 haplotypes with total TERT expression. The reference Short-C haplotype corresponds to the telomerase-encoding TERT-FL isoform, whereas the INS1 and TERT-β isoforms encode truncated proteins without telomerase activity. Effect alleles in haplotypes are marked in red; white boxes, exons; gray boxes, introns; black boxes, intron 4 retention; blue boxes, alternative exons 7 and 8; red lollipops, stop codons. The TERT exons are ordered from right to left, corresponding to the minus strand as presented in the UCSC browser. ATG indicates translation start codons. P-values and β-values are for linear regression models adjusted for sex and age, which were assessed for robustness through permutation testing, with results provided in Supplementary Data 8. The details are provided in the Source Data file.

Several other common TERT isoforms have been reported19 (Supplementary Fig. 16). The TERT-α isoform involves in-frame skipping of 36 bp within exon 6 (Δ6(1–36)), causing partial loss of the reverse transcriptase domain19. As discussed above, TERT-β (Δ7–8)19 results from the simultaneous skipping of exons 7 and 8 (182 bp), terminating the frameshifted protein in exon 10. In addition, TERT-αβ results from concurrent Δ6(1–36) and Δ7–8 splicing events. The expression of these TERT isoforms was not significantly associated with rs10069690, rs2242652, or VNTR6-1 in BL tumors (Supplementary Data 8). Transcripts truncated by premature termination codons (Supplementary Fig. 16), including INS (truncated within exon 5), INS1b (intron 4), and TERT-β or TERT-αβ (exon 10), are likely to be eliminated by nonsense-mediated decay (NMD), reducing total TERT expression. Transcripts that escape NMD would encode alternative TERT proteins that lack telomerase activity but still bind the telomerase RNA component (TERC), thus acting as dominant-negative competitors of the telomerase-functional TERT-FL28.
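Whether an exon-skipping event preserves the reading frame depends only on whether the skipped length is a multiple of 3. A tiny check using the lengths given in the text (36 bp for Δ6(1–36) in TERT-α and 182 bp for Δ7–8 in TERT-β), with TERT-αβ taken as the sum of both; the function names are illustrative.

```python
# Skipped lengths (bp) taken from the text; TERT-alphabeta combines both events.
SKIPS_BP = {"TERT-alpha": 36, "TERT-beta": 182, "TERT-alphabeta": 36 + 182}


def frame_shift(skipped_bp: int) -> int:
    """Residual shift (0, 1, or 2 nt) left after removing `skipped_bp` nucleotides."""
    return skipped_bp % 3


def is_in_frame(skipped_bp: int) -> bool:
    """True if the skip removes whole codons and keeps the downstream frame intact."""
    return frame_shift(skipped_bp) == 0


if __name__ == "__main__":
    for name, bp in SKIPS_BP.items():
        status = "in-frame" if is_in_frame(bp) else f"frameshift ({frame_shift(bp)} nt)"
        print(name, bp, status)
```

The 36-bp skip removes exactly 12 codons, whereas the 182-bp skip leaves a 2-nt shift, consistent with the downstream premature stop in exon 10 described for TERT-β and TERT-αβ.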
Due to premature termination codons (in intron 4 for INS1b and in
exon 10 for TERT-β), both rs10069690 and VNTR6-1 increase the
fraction of NMD-targeted transcripts encoding telomerase-
nonfunctional proteins, decreasing total TERT expression and the
fraction of the telomerase-encoding TERT-FL isoform. To assess the
combined effects of these variants, we analyzed TERT expression
based on the VNTR6-1/rs10069690 haplotypes (Fig. 3e). Compared to
the Short-C haplotype, TERT expression was decreased with the Short-
T(β=12.2 TPM, p= 0.10) and Long-C (β=15.92 TPM, p=0.36) hap-
lotypes, with a greater decrease occurring when both the VNTR6-1-
Long and rs10069690-T alleles were included in the same haplotype
(Long-T, β=24.18 TPM, p=0.027,Fig.3e and Supplementary Data 8).
Thus, two splicing events independently contributed by the VNTR6-1-
Long and rs10069690-T alleles (a splicing switch from the TERT-FL to
TERT-βisoform and intron 4 retention) decrease total TERT expres-
sion, with stronger effects expected in the presence of both alleles.
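The haplotype comparison above relies on linear regression of TERT TPM on haplotype carriage with sex and age as covariates, checked by permutation. The sketch below shows one way to compute a per-copy β and a permutation p-value in plain Python. The function names, the 0/1/2 dosage coding, the simple "shuffle the dosage column" permutation scheme and any input data are assumptions for illustration, not the authors' pipeline (their permutation results are in Supplementary Data 8).

```python
import random


def solve(a, b):
    """Solve the square system a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def dosage_beta(dosage, sex, age, tpm):
    """Per-copy haplotype effect on TPM, adjusted for sex and age (columns: 1, dosage, sex, age)."""
    rows = [[1.0, d, s, a] for d, s, a in zip(dosage, sex, age)]
    return ols(rows, tpm)[1]


def permutation_p(dosage, sex, age, tpm, n_perm=1000, seed=1):
    """Two-sided permutation p-value: shuffle dosages, refit, count |beta_perm| >= |beta_obs|."""
    rng = random.Random(seed)
    observed = abs(dosage_beta(dosage, sex, age, tpm))
    shuffled = list(dosage)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(dosage_beta(shuffled, sex, age, tpm)) >= observed:
            hits += 1
    # The +1 terms count the observed labeling, so p is never exactly zero.
    return (hits + 1) / (n_perm + 1)
```

Comparing each haplotype against the Short-C reference, as in Fig. 3e, would use one dosage column per non-reference haplotype; the single-dosage version keeps the sketch short.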
VNTR6-1 regulates proliferation and apoptosis
We hypothesized that variation in the TERT-FL:TERT-β ratio due to VNTR6-1 length could affect cellular dynamics, such as proliferation. To address this, we monitored the counts of UMUC3 WT and V6.1-KO cells over ten days using the Lionheart automated microscope. The differences in cell counts were not significant when WT and knockout cells were continuously cultured in a medium supplemented with fetal bovine serum (full medium, Supplementary Fig. 17a). However, when the cells were first cultured in a medium without any serum (serum-starved) for 24 h and then switched to a full medium, a strong increase in proliferation was observed only in the WT cells (Supplementary Fig. 17b). To further explore the role of VNTR6-1 in response to culturing conditions, we assessed cell proliferation as cell index, measured with xCELLigence as a real-time increase in cell counts. The cells were first cultured for two days in a medium supplemented with charcoal-stripped serum (CS medium, depleted of hormones and growth factors), followed by culturing in fresh media (CS or full) for ten more days. The knockout clones of both UMUC3 and A549 cell lines proliferated significantly slower than WT cells (Fig. 4a, Supplementary Fig. 18a and Supplementary Data 9). Similarly to the previous results (Supplementary Fig. 17), switching to the full medium resulted in a stronger and faster increase in proliferation in WT compared to knockout cells (Fig. 4a and Supplementary Fig. 18a). The increase in proliferation in WT versus knockout cells was less dramatic (Fig. 4a for UMUC3) or undetectable (Supplementary Fig. 18a for A549) in cells continuously cultured in CS medium. These results support the role of VNTR6-1 in regulating proliferation, potentially by providing adaptation in response to alterations in cellular conditions and stress, such as the availability of serum growth factors and hormones in our experiments.
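The growth-phase comparison in the Fig. 4a legend (β_int, the change in growth rate between experimental groups) can be approximated by comparing least-squares slopes of the cell index within the growth window. The sketch below is a simplification under stated assumptions: it fits ordinary least-squares slopes to one mean trace per group rather than the linear mixed-effects interaction model with biological replicates used in the study, and the function names and toy traces are hypothetical.

```python
def ols_slope(times, values):
    """Least-squares slope of values regressed on times."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    sxx = sum((t - mean_t) ** 2 for t in times)
    return sxy / sxx


def growth_rate(trace, start_h=50.0, end_h=183.0):
    """Slope (cell-index units per hour) of [(hour, cell_index), ...] within the growth window."""
    window = [(t, ci) for t, ci in trace if start_h <= t <= end_h]
    return ols_slope([t for t, _ in window], [ci for _, ci in window])


def rate_difference(reference, other, start_h=50.0, end_h=183.0):
    """Growth-rate difference (other minus reference), a crude analogue of an interaction beta."""
    return growth_rate(other, start_h, end_h) - growth_rate(reference, start_h, end_h)


# Toy traces sampled every 15 minutes (0.25 h) over 283 h; the slopes are invented.
hours = [i * 0.25 for i in range(int(283 / 0.25) + 1)]
wt = [(t, 0.2 + 0.020 * t) for t in hours]
ko = [(t, 0.2 + 0.012 * t) for t in hours]
print(rate_difference(wt, ko))
```

A mixed-effects model would additionally treat replicate wells as random effects, which matters for the p-values but not for the basic idea of comparing slopes.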
Because proliferation reflects the balance between cell division and apoptosis, we analyzed both processes. We stained UMUC3 cells with an intracellular dye (CFSE) and monitored the decrease in fluorescence intensity that occurs as the cells divide. This analysis showed that even though all cells divided faster in the full medium, knockout clones divided slower than WT cells regardless of the culturing medium (Fig. 4b, c and Supplementary Fig. 17c). Annexin V staining of both UMUC3 and A549 cells revealed a significant increase in the percentage of apoptotic cells in knockout compared to WT cells cultured in CS media with cisplatin, an apoptosis-inducing DNA-damaging agent29, with a weaker effect also observed in full media (Fig. 4d, e and Supplementary Fig. 18b, c). RNA-seq analysis of UMUC3 knockout compared to WT cells also demonstrated the role of VNTR6-1 in promoting proliferation and protection from apoptosis (Fig. 4f, g, Supplementary Data 10 and Supplementary Data 11).
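Because CFSE is split roughly equally between daughter cells, each division halves per-cell fluorescence, so the number of doublings follows from the log2 ratio of starting to final intensity. A minimal sketch, assuming median fluorescence intensities (MFI) as inputs and an optional unstained background value; the function names are hypothetical, and the study's gating and exact calculation are not reproduced here.

```python
import math


def cfse_doublings(mfi_start, mfi_end, background=0.0):
    """Divisions implied by CFSE dilution: each division halves per-cell dye intensity."""
    return math.log2((mfi_start - background) / (mfi_end - background))


def doubling_time_hours(doublings, elapsed_hours):
    """Average time per division over the culture period."""
    return elapsed_hours / doublings
```

For example, a drop in background-corrected MFI from 10,000 to 625 corresponds to four halvings, that is, four doublings.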
TERT-β is a dominant-negative competitor of TERT-FL for telomerase function28, but the interplay of these isoforms in proliferation is unclear. We monitored proliferation, measured as cell index, in a bladder cancer cell line with low TERT expression (5637 cells, DepMap TERT TPM = 1.23) after transient transfection with the TERT-FL or TERT-β plasmid (Supplementary Fig. 19a, b). Compared to the GFP control, overexpression of either isoform increased proliferation, with a weaker effect for TERT-FL than for TERT-β (Supplementary Fig. 19c and Supplementary Data 12). Co-transfection at 20:80% and 80:20% TERT-FL:TERT-β plasmid ratios (modeling WT and V6.1-KO cells, respectively) also increased proliferation compared to control. However, cells transfected with more TERT-FL (80:20% ratio) grew slower than those transfected with more TERT-β (20:80% ratio, Supplementary Fig. 19d and Supplementary Data 12), potentially due to reduced levels of anti-apoptotic TERT-β.
We imaged A549 cells, in which visualization is facilitated by a large cytoplasm. In cells co-transfected with an equal ratio of both TERT isoforms, stronger mitochondrial colocalization was observed for TERT-β than for TERT-FL (Fig. 5a and Supplementary Fig. 20). These results further suggest that TERT-β plays a role in mitochondria-localized processes, such as protection from apoptosis, particularly under stress conditions30,31. Collectively, these experiments independently demonstrated that an increased TERT-FL:TERT-β ratio, due to the loss of VNTR6-1 (in V6.1-KO cells) or carriage of the VNTR6-1-Short allele, may reduce TERT-β expression to levels insufficient for protection from apoptosis, thus negatively affecting proliferation. These anti-apoptotic effects might manifest only in specific conditions that increase cellular stress, such as DNA damage, nutrient deficiency, or other microenvironmental challenges (Fig. 5b).
VNTR6-1 and rs10069690 account for multi-cancer GWAS associations
Because we linked the VNTR6-1-Long allele with the GWAS signals rs2242652-A (r² = 0.62) and rs10069690-T (r² = 0.48) in the 1000G-EUR populations, we next sought to compare the cancer associations of these markers. Having validated the rs56345976/rs33961405 haplotypes as a confident predictor of VNTR6-1 Short vs. Long alleles (Supplementary Data 3, Supplementary Data 4, Supplementary Fig. 3 and Supplementary Fig. 4), we used these haplotypes to infer VNTR6-1 alleles in various sets. In the absence of WGS data, we used data based on array genotyping and imputation, although this might reduce confidence in inferring VNTR6-1 status, as it depends on the accuracy of the imputation and phasing of rs56345976 and rs33961405. Specifically, we inferred VNTR6-1 as well as the composite marker VNTR6-1/rs10069690, because the latter captures the functional effects of both variants. Using these markers, we performed association analyses in individuals of European ancestry from the Prostate, Lung, Colorectal, and Ovarian (PLCO) cohort32, comprising cancer-free controls (n = 73,085) and individuals with 16 cancer types (n = 29,623). The PLCO association results for the VNTR6-1-Long allele and VNTR6-1/rs10069690 were comparable to those for the rs10069690-T and rs2242652-A alleles; these alleles were associated with a reduced risk of bladder and prostate cancer but an elevated risk of breast, endometrial, ovarian, and thyroid cancer and glioma (Fig. 6a and Supplementary Data 13). Conditional analysis for VNTR6-1 eliminated or attenuated the associations for rs2242652 and rs10069690. The minor residual associations after adjustment for VNTR6-1 could reflect the limitations in inferring its status in the absence of WGS data. Compared to the reference Short-C haplotype of the combined VNTR6-1/rs10069690 marker, the strongest cancer-specific associations, both positive and negative, were observed for the Long-T haplotype (Supplementary Fig. 21a).
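The PLCO odds ratios come from logistic regression with an additive genetic model, in which the log-odds of disease change linearly with the number of effect-allele copies (0, 1 or 2). The sketch below fits that model by Newton-Raphson for a single marker and reports the odds ratio per allele copy with a Wald 95% confidence interval. It omits the sex and age covariates used in the study, and the function name and input coding are assumptions.

```python
import math


def additive_logistic_or(dosage, is_case, max_iter=50, tol=1e-10):
    """Per-allele odds ratio and Wald 95% CI from logit(P(case)) = b0 + b1 * dosage."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(dosage, is_case):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p          # score for the intercept
            g1 += (y - p) * x    # score for the per-allele effect
            h00 += w             # Fisher information entries
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: information matrix inverse times score vector.
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < tol:
            break
    # Var(b1) from the inverse information of the final (converged) iteration.
    se_b1 = math.sqrt(h00 / det)
    return math.exp(b1), math.exp(b1 - 1.96 * se_b1), math.exp(b1 + 1.96 * se_b1)
```

An OR below 1 for the VNTR6-1-Long allele corresponds to the reduced bladder and prostate cancer risk described above, and an OR above 1 to the elevated risk for the other cancers; covariates such as sex and age would enter as extra columns in the same Newton-Raphson update.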
Associations of TERT isoforms and genetic variants with telomerase-related metrics
TERT-β, which encodes a telomerase-nonfunctional protein, is the major TERT isoform in both normal and tumor tissues (Supplementary Fig. 22). Of the total TERT expression, the TERT-FL and TERT-β isoforms represented on average 17.7% and 67.9%, respectively, in 30 normal tissue types in GTEx, and 38.4% and 44.5%, respectively, in 33 tumor types in TCGA (Supplementary Data 14). To further explore the functional differences between TERT-FL and TERT-β, we assessed four telomerase-related metrics: EXpression based Telomerase ENzymatic activity Detection (EXTEND)33, stemness (mRNAsi)34, the telomerase signature score35 and telomere length in primary tumors35 (Supplementary Fig. 22 and Supplementary Data 15). In GTEx, significant correlations with the EXTEND signature (positive for TERT-FL and negative for TERT-β) were observed in four tissues (blood, colon, esophagus, and testis). Similarly, in TCGA, most tumors with significant correlations across all metrics showed positive values for TERT-FL and negative values for TERT-β. In TCGA, of the four metrics, telomere length in tumors showed the weakest correlations with TERT isoform expression (Supplementary Fig. 22), potentially due to somatic events, including TERT-upregulating promoter mutations24,25.
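The isoform shares quoted above are each isoform's TPM divided by total TERT TPM, and the metric associations are correlations across samples. The sketch below computes both. This section does not state which correlation coefficient was used, so the Spearman rank correlation here (with average ranks for ties) is an assumption, as are the function names; because TERT-FL and TERT-β do not account for all TERT transcripts, an "other" entry keeps the shares summing to 100%.

```python
def isoform_percentages(isoform_tpm):
    """Share (%) of each isoform in total TERT TPM; include an 'other' entry for remaining isoforms."""
    total = sum(isoform_tpm.values())
    return {name: 100.0 * tpm / total for name, tpm in isoform_tpm.items()}


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman correlation as the Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Under this sketch, a positive `spearman(tert_fl_tpm, extend_scores)` and a negative `spearman(tert_beta_tpm, extend_scores)` across samples would match the pattern reported for the four GTEx tissues.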
Because TERT activity is essential for maintaining telomere length, we next tested the associations of VNTR6-1, rs10069690, and their haplotypes with relative leukocyte telomere length (rLTL). We inferred VNTR6-1 and VNTR6-1/rs10069690, as described above, in cancer-free individuals of European ancestry (n = 339,103) from the UK Biobank (UKB) based on SNP genotyping and imputation. In this analysis, the Short-C haplotype was associated with shorter rLTLs (β = 0.049, p = 8.75E-78, Fig. 6b, Supplementary Fig. 21b and Supplementary Data 16). Significant associations were also observed with several known markers36,37 within TERT intron 2, including rs7705526 (β = 0.079, p = 1.02E-219; Supplementary Data 16), and adjustment for rs7705526 eliminated the rLTL association with VNTR6-1/rs10069690 (p = 0.64). Notably, regression slopes for these markers differed by genotype (Fig. 6b). Interaction analysis in 5-year age groups revealed a significantly slower (p_int = 1.39E-02, Supplementary Data 16) decrease in rLTL in younger individuals and a faster decrease in older individuals without the Short-C haplotype (which corresponds to a greater fraction of telomerase-nonfunctional TERT) than in carriers of this haplotype. This effect remained unchanged after adjustment for rs7705526 (p_int = 1.38E-02, Supplementary Data 16), which had an independent significant interaction (p_int = 3.21E-03, Supplementary Data 16). The rLTL association pattern was consistent in a smaller set of healthy individuals whose lymphocyte telomere length was measured by flow-FISH38 and whose VNTR6-1 status was determined by long-read targeted sequencing, but interaction analysis was limited by sample size and age range (Supplementary Data 17).
[Fig. 4 image: (a) xCELLigence cell index over time (h) for WT and V6.1-KO cells seeded in CS medium and switched to fresh CS or full medium, measured every 15 minutes (β_int = 2.56E-04, 1.50E-04, 1.40E-04 and 3.37E-05; all p_int < 2.2E-308); (b, c) cell doublings of WT and V6.1-KO clones #1–#3 in CS or full medium; (d, e) apoptotic cells (%, normalized to CS medium and WT) with 10 μM cisplatin in CS medium or in full medium; (f, g) GSEA enrichment plots for positive regulation of cell population proliferation and negative regulation of programmed cell death (enrichment scores −0.53 and −0.58, normalized enrichment scores −1.54 and −1.68, P_FDR = 1.33E-04 for both), with labeled genes common to both pathways including SEMA5A, DDR2, BIRC6, MARCHF7, SOX9 and SLC7A5.]
Fig. 4 | VNTR6-1 affects proliferation and apoptosis in the bladder cancer cell line UMUC3. a Analysis of the real-time increase in cell counts (cell index) measured with xCELLigence over 283 h in UMUC3 cells. The WT and knockout cells (clone #2, starred samples in Supplementary Data 9) were cultured in a CS medium for 48 h, followed by a switch to fresh CS or full medium for 10 more days. Proliferation rates in response to culturing conditions were significantly lower in V6.1-KO compared to WT cells. The plots present the results of one of three independent experiments, with means ± SEM from n = 6 biological replicates. Statistical significance and β-values for differences in the cell index during the visually determined growth phase (gray highlighting between 50 and 183 h) were calculated using linear mixed-effects interaction models. The reference sample is labeled with a dotted circle; β_int represents the change in growth rates between experimental groups. b, c Analysis of cell doublings in UMUC3 cells cultured in (b) CS medium or (c) full medium for four days, using the CFSE assay described in Supplementary Fig. 18c. Data presented are the results of one of three independent experiments, with means ± SD from n = 3 biological replicates. P-values in graphs are for two-way ANOVA with Tukey's test. d, e Quantification of apoptosis in UMUC3 cells cultured for 48 h (d) with 10 μM cisplatin in CS medium or (e) in full medium, followed by Annexin V/PI staining and flow cytometry analysis to determine the percentage of apoptotic cells. Data presented are the results of one of three independent experiments, with means ± SD from n = 3 biological replicates (normalized to values of the CS media and WT groups). P-values in graphs were determined by one-way ANOVA with Dunnett's test. f, g Gene set enrichment analysis (GSEA) for differential expression of genes involved in (f) pathways related to the downregulation of cellular processes for proliferation-promotion and (g) apoptosis-resistance, as identified by RNA-seq analysis comparing UMUC3 V6.1-KO (clone #3) to WT UMUC3 cells. Genes highlighted in blue are common to both pathways. The source data for all panels are provided in the Source Data file.
[Fig. 5 image: (a) structured illumination microscopy panels for TERT-FL-HA, TERT-β-FLAG, TOM20 and DAPI; (b) schematic linking VNTR6-1 repeat length (Short to Long) with TERT-FL and TERT-β, proliferation and apoptosis.]
Fig. 5 | Functional differences between the TERT-FL and TERT-β isoforms. a Structured illumination microscopy images of A549 cells co-transfected with TERT-FL and TERT-β expression constructs at a 50:50% ratio. For individual channels, the staining is shown as black/white images for better contrast. Tri-color merged panels: green, FLAG (TERT-β) or HA (TERT-FL); blue, DAPI (nuclei). Quad-color merged panel: purple, HA (TERT-FL); green, FLAG (TERT-β); red, TOM20 (mitochondria); blue, DAPI (nuclei). The yellow inset in the TERT-β-FLAG panel is shown at a higher magnification to demonstrate colocalization with mitochondria (yellow staining). The images shown are representative of ten distinct images taken across two independent experiments. Scale bar = 2 μm, inset scale bar = 0.5 μm. b Schematic overview of the proposed relationships between VNTR6-1 repeat length, the TERT-FL:TERT-β ratio, proliferation, and apoptosis.
In cancer-free controls of European ancestry, the Short-C haplotype frequencies were comparable (71.36–72.07%) across 40- to 80-year-old age groups in the UKB and PLCO but decreased to 67% in individuals aged 98–108 years (Supplementary Fig. 23). The difference between centenarians and 40- to 80-year-olds was driven by decreased frequencies of both the rs10069690-C and the VNTR6-1-Short alleles. The VNTR6-1 genomic region is absent in non-primate species (Supplementary Fig. 24). In genomes of several primates, as well as archaic humans (Neandertal and Denisova), only the Long-T haplotypes were observed, with the VNTR6-1 consensus repeat sequences in primates being nearly identical to those in humans (Supplementary Fig. 24 and Supplementary Fig. 25). Thus, the VNTR6-1-Short and rs10069690-C alleles, as well as the Short-C haplotype, which increase the fraction of the telomerase-functional TERT-FL isoform but might negatively affect longevity, are human-specific and major or common in all modern human populations (Supplementary Data 18).
[Fig. 6 image: (a) odds ratios (95% confidence intervals) for VNTR6-1, rs10069690 and the VNTR6-1/rs10069690 composite marker across 16 cancer types, with cases/controls: bladder 1,756/73,085; pancreatic 739/73,085; prostate 7,378/32,679; gastrointestinal 546/73,085; renal 869/73,085; liver 233/73,085; lung 2,960/73,085; colorectal 2,045/73,085; melanoma 2,476/73,085; head and neck 659/73,085; hematopoietic 3,418/73,085; breast 4,591/40,406; endometrial 804/40,406; ovarian 504/40,406; thyroid 339/73,085; glioma 306/73,085. Reference alleles: VNTR6-1-Short, rs10069690-C, two copies of the Short-C haplotype; effect alleles: VNTR6-1-Long, rs10069690-T, zero or one copy of the Short-C haplotype. (b) rLTL (Z-standardised log) versus age (40–70 years), total set n = 339,103. By Short-C copies: 0, −0.0256x + 1.60 (n = 27,360); 1, −0.0243x + 1.49 (n = 137,470); 2, −0.0239x + 1.42 (n = 174,273); β = 0.049, p = 8.75E−78, p_int = 1.39E-02, p_cond = 6.40E-01. By rs7705526 genotype: AA, −0.0254x + 1.62 (n = 35,828); AC, −0.0245x + 1.51 (n = 148,303); CC, −0.0237x + 1.38 (n = 154,972); β = 0.079, p = 1.02E−219, p_int = 3.21E-03, p_cond = 5.50E-144.]
Fig. 6 | Association analyses for cancer risk in PLCO and relative leukocyte telomere length (rLTL) in UKB cancer-free individuals. a Evaluation of cancer risk in the PLCO dataset (n = 102,708) for the VNTR6-1-Long and rs10069690-T alleles and the composite marker (VNTR6-1/rs10069690). Odds ratios (ORs) are shown as squares with 95% confidence intervals (CIs) calculated for comparisons between patients with the indicated cancers and a common group of cancer-free controls using logistic regression analysis with an additive genetic model adjusted for sex and age. b Evaluation of the relationships in UKB cancer-free individuals (n = 339,103) between rLTL and VNTR6-1/rs10069690 and rs7705526 (LD, r² = 0.33 between VNTR6-1/rs10069690 and rs7705526). P-values and β-values were derived from linear regression models adjusted for sex, age, and smoking status. P_int represents the interaction between genotypes and 5-year age groups; P_cond represents the mutual adjustment for rs7705526 or VNTR6-1/rs10069690. The graphs display regression lines and equations, with a shaded area representing the 95% confidence intervals. The analysis revealed a decrease in the rLTLs with more copies of the Short-C haplotype. The results of sex-specific analyses of VNTR6-1/rs10069690 are presented in Supplementary Fig. 21b. The details are provided in the Source Data file.
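The genotype-specific regression lines plotted in Fig. 6b can be evaluated directly to see how the rLTL gap between Short-C dosage groups changes with age. The sketch below hard-codes the slopes and intercepts read from that figure (rLTL as Z-standardised log values, age in years). It only evaluates those fitted lines over their plotted 40–70-year range; it does not reproduce the 5-year age-group interaction test or the covariate adjustment, and the function names are illustrative.

```python
# (slope per year, intercept) for rLTL ~ age, by number of Short-C haplotype copies (Fig. 6b).
SHORT_C_LINES = {
    0: (-0.0256, 1.60),
    1: (-0.0243, 1.49),
    2: (-0.0239, 1.42),
}


def predicted_rltl(copies, age):
    """Value of the fitted rLTL line for a Short-C dosage group at a given age."""
    slope, intercept = SHORT_C_LINES[copies]
    return slope * age + intercept


def gap_vs_two_copies(copies, age):
    """Predicted rLTL difference between a dosage group and two-copy carriers at a given age."""
    return predicted_rltl(copies, age) - predicted_rltl(2, age)


for age in (40, 55, 70):
    print(age, round(gap_vs_two_copies(0, age), 4))
```

Because the zero-copy line is steeper, its advantage over two-copy carriers shrinks from about 0.11 at age 40 to about 0.06 at age 70, in line with the faster decrease in individuals without the Short-C haplotype.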
Discussion
Cancer risk is influenced by complex interactions between genetic and environmental factors. The numbers and replicative potential of stem cells in each tissue type determine the probability of acquiring mutations due to replicative errors occurring with every cell division, further modulating cancer risk39–41. In this work, we showed that the genetic regulation of TERT splicing by an SNP, rs10069690, and by VNTR6-1, a 38-bp intronic tandem repeat, accounts for the reduced or elevated cancer risk associated with the multi-cancer GWAS leads rs2242652 and rs10069690 at 5p15.33.
While many VNTRs have been reported, including within the TERT region9,10, their use in association studies remains limited. VNTRs are often highly polymorphic, with a wide range of repeat numbers that are difficult to quantify and link with biallelic markers commonly used in GWAS, such as SNPs. VNTR6-1 within TERT intron 6 is unusual because its repeat numbers could be binarized into two distinct allelic groups, which we defined as Short (24–27 repeats) and Long (40.5 or 66.5 repeats). Using multiple public and custom datasets and tools, we established VNTR6-1 as a proxy for two multi-cancer GWAS leads in this region, rs2242652 and rs10069690.
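Treating VNTR6-1 as a biallelic Short/Long marker is what allows its linkage disequilibrium with SNPs such as rs2242652 and rs10069690 to be summarized by the familiar r² statistic, r² = D² / (pA(1 − pA)pB(1 − pB)) with D = pAB − pA·pB. A minimal sketch from phased haplotypes, assuming each haplotype is coded as a pair of 0/1 alleles (for example, 1 for VNTR6-1-Long and 1 for rs10069690-T); the coding and function name are illustrative, not the study's software.

```python
def ld_r2(haplotypes):
    """r^2 between two biallelic loci from phased haplotypes given as (allele_a, allele_b) pairs coded 0/1."""
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    if p_a in (0.0, 1.0) or p_b in (0.0, 1.0):
        raise ValueError("r^2 is undefined when either locus is monomorphic")
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
```

Values near 1 mean one marker almost perfectly tags the other; intermediate values, such as those reported for VNTR6-1 with rs2242652 and rs10069690 in 1000G-EUR, indicate partial tagging.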
We inferred the VNTR6-1 allelic groups (Short vs. Long alleles) across diverse populations, including cancer cases and controls, based on haplotypes of the common SNPs rs56345976 and rs33961405. Although predicting VNTR6-1 status might carry more inherent technical uncertainty than the GWAS leads rs2242652 and rs10069690, our genetic analysis of cancer risk revealed comparable associations for these variants. While no functional properties were identified for rs2242652, we demonstrated that both the VNTR6-1-Long and rs10069690-T alleles are functional. Independently and in combination (i.e., the Long-T haplotype), these alleles shift splicing from TERT-FL, which encodes telomerase, to the alternative isoforms INS1b and TERT-β, which encode telomerase-nonfunctional TERT.
In addition to its canonical role in telomerase activity, which is mediated by the TERT-TERC complex and supports telomere maintenance, TERT also has important non-canonical, telomere-independent roles in supporting cellular homeostasis. Our findings, based on several methods and in line with previous observations28,42, support the anti-apoptotic role of the TERT-β isoform, likely related to its mitochondrial localization and contributing to increased proliferation.
We hypothesize (Fig. 7) that the genetic regulation of the TERT-FL:TERT-β ratio has context-dependent consequences. An increase in the fraction of the anti-apoptotic TERT-β isoform extends cellular longevity (the lifespan of individual cells)43, manifesting as increased proliferation (replicative potential), especially in response to cellular stress and other stimuli. The tissues giving rise to the cancers with the most significant associations in opposite directions for the rs10069690-T and VNTR6-1-Long alleles (such as protection from bladder cancer and risk for glioma) have low replicative potential at homeostasis. Under normal conditions, bladder epithelium is one of the slowest-growing epithelial tissues with high resistance to apoptosis44, while the brain has limited cell-specific neurogenesis in restricted regions45. However, through direct contact with urine, bladder epithelium is exposed to pathogens and reactive metabolites that can cause tissue damage and trigger acute regenerative proliferation that restores tissue integrity and function within days44. In contrast, the brain has limited direct exposure to damaging agents requiring tissue regeneration, as well as a limited capacity to regenerate. Thus, cancer susceptibility may depend not only on the replicative potential of normal tissues at homeostasis but also on the types, timing, and intensity of damaging exposures and the ability of the tissue to regenerate after damage. In bladder tissue, an increased fraction of the TERT-β protein, by extending cellular longevity, may limit the need for tissue regeneration, thereby mitigating mutagenesis from replication errors and decreasing cancer risk. In contrast, an increased fraction of the TERT-β protein extending the cellular longevity of glial cells might increase cancer risk by promoting the gradual accumulation of somatic mutations from proliferation, especially under subtle but prolonged exposures, and by preventing the death of damaged cells.
In normal tissues with low replicative potential, including the bladder and brain, tumorigenesis often depends on driver mutations, such as TERT-upregulating promoter mutations that reactivate telomerase and immortalize cancer cells46. Tissue regeneration can also be initiated by rare TERT-high cells acting as stem cells47, leading to tumorigenesis upon the acquisition of driver mutations. While the anti-apoptotic function of TERT-β is important for extending cellular longevity, the reciprocal decrease in the TERT-FL fraction might prevent immortalization of cells with acquired somatic mutations or protect against telomere shortening, especially under oxidative stress42,48, and facilitate the DNA damage response49.
Notably, we did not detect GWAS associations for the same alleles/haplotypes of rs10069690 and VNTR6-1 for cancers originating from tissues with high proliferation at homeostasis (e.g., the gastrointestinal tract). High proliferation rates in stem cells of these tissues, combined with cell death induced by critical telomere shortening in differentiated cells, prevent cells from reaching a malignant state and thus act as a tumor-suppressive mechanism50. For cancer types with no or marginal associations for the alleles/haplotypes tested, TERT-related mechanisms might be more heterogeneous and dependent on cell specificity, tumor subtype, timing, and the nature and intensity of environmental exposure.
Telomere length has been extensively studied in relation to cancer and non-cancer conditions51,52. Mendelian randomization analysis revealed an association between genetically predicted longer telomeres and the risk of 8 out of 22 cancer types tested, especially for rare cancers and those originating from tissues with low replicative potential53. Our analysis in the UKB revealed a strong association between the VNTR6-1/rs10069690 haplotypes and rLTL, although it was weaker than the associations with other TERT rLTL markers (rs7705526, rs2736100, and rs2853677) known to be linked with telomere length36,37. We noted a greater degree of telomere shortening in older than in younger individuals without the Short-C haplotype. Given the anti-apoptotic role of TERT-β, which might extend cellular longevity, this could reflect a greater proportion of circulating leukocytes originating from stem cells and their progenitors that have undergone more cell divisions.
The alleles associated with an increased fraction of telomerase-encoding TERT-FL (VNTR6-1-Short, rs10069690-C, and their Short-C haplotype) are human-specific variants that, in Europeans, are less common in centenarians than in younger individuals. The emergence and retention of these alleles might be consistent with the disposable soma theory of ageing, which postulates that evolution favors factors supporting reproductive fitness and growth at the expense of longevity, which requires substantial maintenance to repair age-related somatic damage54. Female fertility strongly depends on ovarian telomerase55, and telomere shortening is considered an evolutionary cost of reproductive trade-offs56. The evolutionary selection of genetic variants that increase the fraction of the telomerase-encoding TERT-FL isoform might provide this reproductive fitness benefit while decreasing longevity later in life, perhaps due to elevated cancer risk.
[Fig. 7 image: (a) gene (G) × replicative potential (R) × environment (E) model linking the reference VNTR6-1-Short/rs10069690-C and the VNTR6-1-Long/rs10069690-T alleles to the balance of TERT-FL versus TERT-β and INS1b, cellular longevity, telomere length and cell division history of stem or TERT-high cells after stimulation (normal turnover or tissue injury); in vitro models: WT and V6.1-KO cell lines. (b) Bladder and prostate (replication potential low; homeostatic proliferation low; regenerative proliferation high; decreased cancer risk relative to reference) versus brain, thyroid and ovary (replication potential, homeostatic and regenerative proliferation no/low; increased cancer risk relative to reference), with mutations from replicative errors or environmental exposures.]
Fig. 7 | Proposed model for functional effects of VNTR6-1 and rs10069690 contributing to multi-cancer associations within the 5p15.33 region. a Cancer risk as a product of interactions between gene, replicative potential, and environment (G × R × E). The TERT genetic variants VNTR6-1 and rs10069690 and environmental factors define the relative ratios of the telomerase-functional TERT-FL isoform and the telomerase-nonfunctional TERT-β and INS1b isoforms. These isoforms affect cell proliferation, apoptosis, and telomere length, thus modulating cellular longevity and replicative potential, including homeostatic proliferation, which maintains tissue self-renewal, and regenerative proliferation, which responds to environmental factors and tissue damage. b The VNTR6-1-Long and rs10069690-T alleles, or their haplotype (Long-T), are associated with reduced cancer risk in tissues with low homeostatic but high regenerative potential (e.g., bladder). The anti-apoptotic effect of the TERT-β isoform reduces the need for regenerative proliferation, thus decreasing the risk of acquiring mutations from replicative errors. In tissues with no/low homeostatic and regenerative proliferation (e.g., brain, thyroid, ovary), the same alleles and the Long-T haplotype are associated with elevated cancer risk. The anti-apoptotic effect of TERT-β contributes to extended cellular longevity, allowing the accumulation of more mutations from environmental exposures, such as reactive oxygen species (ROS) and cellular metabolites.
Further studies are warranted to explore our findings in the context of other 5p15.33 multi-cancer GWAS signals1,2 and to identify specific splicing factors that bind to VNTR6-1. Non-canonical nucleic acid structures, including G4 and DNA:RNA hybrids (R-loops), are emerging as important regulators under normal and disease conditions57, and their therapeutic targeting through VNTR6-1 might be possible for modulating TERT functions. In conclusion, our multi-faceted study uncovers the complex regulation of TERT functions and multi-cancer risk through a combination of TERT germline variants: an SNP, rs10069690, within intron 4 and a VNTR within intron 6 (VNTR6-1). Our results provide insights into analyses of complex genetic variants and their contributions to cancer susceptibility and telomere biology.
Methods
The research presented in this paper complied with all relevant ethical regulations, and informed consent was obtained by each contributing study that granted access to data to perform the analyses reported in this work. The study used deidentified controlled-access data from the Center for Alzheimer's and Related Dementias (CARD) of the National Institute on Aging (dbGaP phs001300.v4.p1), the Burkitt Lymphoma Genome Sequencing Project (BLGSP, dbGaP phs000527.v6.p2), the Prostate, Lung, Colorectal and Ovarian (PLCO) Cancer Screening Trial (project #PLCO-957), UK Biobank (project #92005) and The Cancer Genome Atlas (TCGA, https://gdc.cancer.gov). The use of deidentified bladder tissue samples was approved by the NIH Office of Human Subjects Research (#4715). The use of deidentified samples from the Center for International Blood and Marrow Transplant Research biorepository (CIBMTR; https://cibmtr.org) was approved by the National Marrow Donor Program Institutional Review Board. All study participants or their guardians provided informed consent for participation in the CIBMTR Research Database and Research Sample Repository Protocols (NCT01166009 and NCT00495300). Non-controlled access data were obtained from the public resources 1000 Genomes Project and GTEx.
Human samples used for targeted PacBio-seq and TaqMan genotyping of select SNPs
Publicly available DNA samples for HapMap I (CEU panel for CEPH Utah residents with ancestry from Northern and Western Europe, n = 90), HapMap III (YRI panel for Yoruba in Ibadan, Nigeria, n = 90), select samples from the Human Pangenome Reference Consortium (HPRC, n = 10), and the European panel of the Georgia Centenarian Collection (n = 100) were purchased from the Coriell Institute for Medical Research. Deidentified tissue samples of bladder tumors and matching adjacent normal samples (n = 5 pairs) were purchased from Asterand Bioscience after approval by the NIH Office of Human Subjects Research (#4715) and used for DNA extraction and genotyping.
Flow-FISH telomere length samples (n = 77) were obtained from donors of hematopoietic cell transplants from the Center for International Blood and Marrow Transplant Research (CIBMTR; https://cibmtr.org) biorepository, comprising 28 females (36.36%) and 49 males (63.64%), aged 21-52 years (mean age 37.68 years). All study participants or their guardians provided informed consent for participation in the CIBMTR Research Database and Research Sample Repository Protocols (NCT01166009 and NCT00495300). The use of the data was approved by the National Marrow Donor Program Institutional Review Board.
Telomere length was measured in total lymphocytes and lymphocyte subsets via the flow-FISH assay described in a previous study38. For the current analysis, the samples were selected to represent a wide range of telomere lengths (4.5-11.2 kb), and telomere length was analyzed in relation to TERT genetic variants using linear regression models adjusted for age and sex.
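The age- and sex-adjusted regression described above can be sketched in Python (the study's analyses were run in R). This is a minimal illustration, assuming additive coding of each variant as the number of effect alleles (0, 1, 2) and sex coded 0/1; the helper names `ols` and `additive_design` are hypothetical.

```python
from typing import List, Sequence


def ols(X: List[List[float]], y: Sequence[float]) -> List[float]:
    """Ordinary least squares by solving the normal equations (X^T X) b = X^T y."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting on the augmented matrix
    aug = [xtx[i] + [xty[i]] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        piv = aug[col][col]
        if abs(piv) < 1e-12:
            raise ValueError("design matrix is singular")
        aug[col] = [v / piv for v in aug[col]]
        for r in range(p):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][p] for i in range(p)]


def additive_design(genotypes, ages, sexes) -> List[List[float]]:
    """Columns: intercept, allele dosage (0/1/2), age, sex (0 = female, 1 = male)."""
    return [[1.0, float(g), float(a), float(s)] for g, a, s in zip(genotypes, ages, sexes)]
```

The returned coefficients are ordered as the design columns, so the second value is the per-allele effect on telomere length.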
Cell lines: The urinary bladder cell lines UMUC3 (CRL-1749), 5637 (HTB-9), HT1376 (CRL-1472), RT4 (HTB-2), T24 (HTB-4), and SCaBER (HTB-3), as well as the Burkitt lymphoma cell line Raji (CCL-86) and the lung cancer cell line A549 (CCL-185), were purchased from ATCC (Manassas) and maintained in the recommended media supplemented with 10% FBS (unless specified otherwise) and 1% antibiotics. All the cell lines were regularly tested for Mycoplasma contamination using the MycoAlert Mycoplasma Detection Kit (Lonza) and, if used longer than one year after initial purchase from ATCC, authenticated with the AmpFLSTR Identifiler Plus Kit (Thermo Fisher). Two versions of EMEM and F-12K complete media, for culturing UMUC3 and A549, respectively, were used for the xCELLigence, CFSE, and apoptosis assays: (1) EMEM or F-12K, both with phenol red, supplemented with 10% FBS and 1% antibiotics (full medium); and (2) phenol red-free EMEM or F-12K supplemented with 10% charcoal-stripped (CS) FBS and 1% antibiotics (CS medium). Cells were moved to CS medium 3-4 days prior to the experiments.
Analyses of BL tumors
RNA-seq and DNA-WGS data (Illumina) for Burkitt lymphoma (BL) tumors were obtained from the National Cancer Institute (NCI) Cancer Genome Characterization Initiative (CGCI): Burkitt Lymphoma Genome Sequencing Project (BLGSP)23,58, dbGaP phs000527.v6.p2, including 78 participants (35.90% females and 64.10% males, aged 1-15 years, mean age 6.95 years). The datasets were accessed through the National Cancer Institute Genomic Data Commons (GDC, https://gdc.cancer.gov/). The RNA-seq BAM files were analyzed using read counts based on the R package FeatureCounts (v2.0.6). Splicing events between TERT exons 4 and 5 were annotated in a custom GTF annotation file to perform read summarization at the feature level, generating a raw count matrix. The total number of reads was determined by counting the reads mapped to the splicing junction between exons 4 and 5 and those that extended into intron 4 by at least 20 bp. Read counts for the splicing events INS1 (a 38-bp extension of exon 4 into intron 4), INS1b (a 480-bp extension of exon 4 into intron 4), and unspliced intron 4 (total reads between exons 4 and 5 minus reads for INS1 and INS1b) were expressed as fractions of the total read count. BAM files were also used to estimate the overall expression of the TERT isoforms α, β, and αβ, which were indexed in a GTF file from ENSEMBL and analyzed using MISO (v0.5.4) with default parameters. Transcripts per million (TPM) values for bulk TERT RNA-seq data were downloaded from the GDC data portal. eQTL analyses were performed under additive genetic models using the lm function in R (v4.3.0), with adjustments for sex and age. TERT intron retention was analyzed with IRFinder (v2.0.1) with default settings, using the GRCh38 reference genome FASTA file and the transcriptome GTF file for annotation.
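One reading of the intron 4 read-fraction definitions above, sketched in Python. It assumes that the total is the sum of exon 4-exon 5 junction reads and reads extending at least 20 bp into intron 4, and that the unspliced count is the intron-extending reads minus INS1 and INS1b reads. The function name and inputs are hypothetical; in the study, counts came from FeatureCounts with a custom GTF.

```python
from typing import Dict


def intron4_event_fractions(junction_reads: int, intron_reads: int,
                            ins1: int, ins1b: int) -> Dict[str, float]:
    """Fractions of informative reads assigned to each exon 4/intron 4 event.

    junction_reads: reads spanning the canonical exon 4-exon 5 splice junction
    intron_reads:   reads extending >= 20 bp into intron 4 (assumed to include INS1/INS1b)
    ins1, ins1b:    reads supporting the 38-bp and 480-bp exon 4 extensions
    """
    if ins1 + ins1b > intron_reads:
        raise ValueError("INS1 + INS1b reads cannot exceed intron 4 reads")
    total = junction_reads + intron_reads
    if total == 0:
        raise ValueError("no informative reads")
    unspliced = intron_reads - ins1 - ins1b
    return {
        "spliced": junction_reads / total,
        "INS1": ins1 / total,
        "INS1b": ins1b / total,
        "unspliced": unspliced / total,
    }
```

Under these assumptions the four fractions sum to one for each sample.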
Analysis of long-read sequences
VNTR6-1 and VNTR6-2 within TERT intron 6 were explored based on long-read sequencing data. Phased genome assemblies for 47 individuals (94 chromosomes) were downloaded in FASTA format from the Human Pangenome Reference Consortium (HPRC)11. In addition, we used 358 long-read sequencing (R9, Oxford Nanopore) DNA assemblies generated by the Center for Alzheimer's and Related Dementias (CARD) of the National Institute on Aging (available from dbGaP phs001300.v4.p1)18. Input DNA was extracted from the brain tissue of 179 neurologically normal individuals of European ancestry, and phased assemblies were generated using the Napu pipeline59.
Genomic sequences in FASTA format were extracted from the assemblies using Cutadapt (v4.0) based on two sets of nested sequences flanking the region of interest (~9 kb; GRCh38 chr5:1,271,950-1,281,050). The extracted sequences were aligned to the GRCh38 reference genome using minimap2 (v2.26) and combined into one BAM file, with each individual represented by two sequences, one
for each chromosome. In this BAM file, SNPs were scored using SAMtools mpileup (v1.17), and VNTRs were scored using Straglr (v1.4) with default settings. The pipeline is available at https://github.com/oorez/HumanGenomeAssemblies and in the repository https://doi.org/10.5281/zenodo.14633198.
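Straglr reports repeat sizes directly. To illustrate how a repeat-tract length maps onto the VNTR6-1 copy numbers quoted in this paper, the sketch below divides the tract length by the 38-bp unit and rounds to the nearest half copy. The Short/Long cutoff of 34 copies is a hypothetical value, chosen only because it falls between the reported Short (24-27) and Long (40.5-66.5) ranges; the function names are illustrative and are not part of Straglr.

```python
UNIT_BP = 38  # VNTR6-1 repeat unit length reported in the text


def copy_number(tract_bp: int, unit_bp: int = UNIT_BP) -> float:
    """Number of repeat units in a tract, rounded to the nearest half copy."""
    if tract_bp < 0 or unit_bp <= 0:
        raise ValueError("lengths must be positive")
    return round(2 * tract_bp / unit_bp) / 2


def vntr6_1_class(copies: float, cutoff: float = 34.0) -> str:
    """Assign Short or Long; the cutoff is a hypothetical midpoint, not from the study."""
    return "Short" if copies < cutoff else "Long"
```

For example, a 2527-bp tract corresponds to 66.5 copies and would be classed as Long.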
Targeted PacBio-seq
PCR amplicons for targeted PacBio sequencing of VNTR6-1 were generated using the LA Taq Hot-Start DNA Polymerase Kit (Takara) and the M13-tagged primers VNTR6-1-M13F and VNTR6-1-M13R (Supplementary Data 19). In the reference human genome, these primers capture a genomic fragment of 2,241 bp. The optimized 20 µl reactions included 4% DMSO, 0.3 µl of LA Taq DNA Polymerase, 2.5 µl of 10x LA Taq PCR Buffer, 4 µl of 2.5 mM dNTPs, 0.5 µl of each 10 µM primer, and 25 ng of genomic DNA. The PCR conditions included denaturation for 1 min at 94 °C, 36 cycles of denaturation for 10 s at 98 °C and combined annealing/extension for 3.5 min at 68 °C, followed by a final extension for 10 min at 72 °C.
The controls included 1000 G DNA samples purchased from the Coriell Institute for Medical Research and selected to represent various repeat lengths determined from HPRC assemblies (HG00741, HG01358, HG01891, HG02080, HG02622, HG02717, HG02723, HG03453, HG03492, and NA18906, Supplementary Data 2). For technical validation of the first and second rounds of PCR, all products were quantified with the Quant-iT PicoGreen dsDNA Assay (Invitrogen), and 5% of the products were analyzed with the TapeStation D5000 Kit (Agilent). The second round of PCR was performed with the LA Taq Hot-Start DNA Polymerase Kit and the SMRTbell Barcoded Adapter Complete Prep Kit (PacBio), and the M13 tags incorporated by the first PCR were used to attach unique barcodes to each sample with primers M13F and M13R, where N represents the unique barcode (Supplementary Data 19). The 25 µl PCRs included 4% DMSO, 0.4 µl of LA Taq DNA Polymerase, 2.5 µl of 10x LA Taq PCR Buffer II, 4 µl of 2.5 mM dNTPs, 1.0 µl of each 3 µM barcoded M13 primer, and 25 ng of product from the first PCR. The PCR conditions included denaturation for 1 min at 94 °C, 10 cycles of denaturation for 10 s at 98 °C and combined annealing/extension for 3.5 min at 68 °C, followed by a final extension for 10 min at 72 °C. The final amplicons from three 96-well PCR plates (288 samples) were pooled, processed with the Sequel II binding kit 3.1 (PacBio), and sequenced on one SMRT Cell on the Sequel II System (PacBio).
PacBio amplicon analysis
The high-fidelity (HiFi) reads were generated by circular consensus sequencing (CCS) within SMRT Link (PacBio), demultiplexed with Lima, and aligned to the reference genome GRCh38 with minimap2. The VNTR6-1 amplicons had an average read coverage of ~10,000 reads per sample. The resulting BAM files were scored for rs56345976 and rs33961405 using SAMtools mpileup (v1.17) and for VNTR6-1 using Straglr (v1.4). The analysis was restricted to reads fully covering the amplicon (GRCh38, chr5:1275500-1277500), excluding outputs from partial reads using SAMtools ampliconclip (v1.17). Phased haplotypes of rs56345976 and rs33961405 were constructed based on PacBio reads.
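The "fully covering reads" criterion above amounts to an interval-containment test. The sketch below shows that test on plain tuples, assuming (name, contig, start, end) records in 1-based inclusive coordinates; it is not the SAMtools ampliconclip implementation, and the function name is hypothetical.

```python
from typing import Iterable, List, Tuple

# GRCh38 amplicon window quoted in the text (1-based, inclusive)
AMPLICON = ("chr5", 1275500, 1277500)

Read = Tuple[str, str, int, int]  # (name, contig, start, end)


def fully_covering(reads: Iterable[Read], region=AMPLICON) -> List[str]:
    """Names of reads whose alignment spans the entire amplicon window."""
    contig, start, end = region
    return [name for name, c, s, e in reads
            if c == contig and s <= start and e >= end]
```

With pysam, the same test could be applied to `AlignedSegment.reference_start` and `reference_end`, remembering that pysam uses 0-based, end-exclusive coordinates.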
DNA genotyping
TaqMan genotyping assays for the TERT SNPs rs56345976 (C__88595060_10), rs33961405 (C__34209972_10), rs10069690 (C__30322061_10), rs2242652 (C__16174622_20), rs7705526 (C__189441058_10), rs2736100 (C__1844009_10) and rs2853677 (C__1844008_10) were purchased from Thermo Fisher. The samples were genotyped in 384-well plates on a QuantStudio 7 Flex Real-Time PCR System (Applied Biosystems) using 2x TaqMan Genotyping Master Mix (Thermo Fisher) in 5-µL reactions with 4 ng of genomic DNA per reaction.
Analyses in the 1000 Genomes (1000 G) Project
High-coverage (30x) short-read WGS data in CRAM format and phased genetic variants for 3201 individuals from the 1000 G populations60 were downloaded from https://www.internationalgenome.org/data-portal/data-collection/30x-grch38 for the 400 kb genomic region (GRCh38 chr5:1,100,000-1,500,000). The depth of coverage of the aligned short reads within the 2290 bp genomic region corresponding to VNTR6-1 (GRCh38 chr5:1,275,210-1,277,500) was analyzed by calculating the median coverage within consecutive 50-base windows using Mosdepth (v0.2.5). All the samples were classified into VNTR6-1-Short/Short genotypes (24-27 copies) and Long/any genotypes (with one or two Long alleles of 40.5 or 66.5 copies) by applying a machine learning approach based on regularized multimodal logistic regression, which was developed with the tidymodels framework and the R package glmnet (v4.1-7). First, a total of 605 samples (18.89%), representing all 1000 G super-populations, were randomly selected from the set, visually examined, and assigned to the Short or Long groups based on their coverage profiles in IGV. The dataset was then split into training (60%) and testing (40%) sets. Fivefold cross-validation was used during training to develop and evaluate the prediction model. The model demonstrated stable performance in classifying VNTR6-1 into the Short and Long categories, with 96.8% specificity, 92.8% sensitivity, an F score of 0.95, and an area under the ROC curve (AUC) of 0.98 (Supplementary Fig. 2).
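The performance figures above follow from a confusion matrix of predicted versus visually assigned classes. The sketch below computes them in Python, assuming "Long" is treated as the positive class and that the reported F score is the F1 score (harmonic mean of precision and sensitivity); the function name is illustrative, and AUC is omitted because it requires predicted probabilities.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[str], y_pred: Sequence[str],
                   positive: str = "Long") -> Dict[str, float]:
    """Sensitivity, specificity, precision and F1 for a two-class labeling."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    sensitivity = tp / (tp + fn)  # true-positive rate
    specificity = tn / (tn + fp)  # true-negative rate
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "precision": precision, "f1": f1}
```

In a real test set every cell of the confusion matrix is expected to be populated; empty classes would raise a division error here.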
To identify variants predictive of VNTR6-1-Short/Long status, all 12,338 biallelic SNPs from the 1000 G phased genetic variant data across the 400 kb genomic region (GRCh38 chr5:1,100,000-1,500,000) were extracted and filtered for MAF > 5%, resulting in 1473 SNPs for analysis. Based on chi-square tests, 594 of these SNPs were significantly associated with the VNTR6-1 Short and Long categories (p < 0.05). A random forest model was then applied using the R package randomForest (v4.7-1.1) to assess the predictive value of the significant SNPs for VNTR6-1 categories, selecting the top 10% based on mean decrease in Gini scores. A total of 60 SNPs were identified as highly informative, with rs56345976 and rs33961405 showing the highest combined predictive probabilities for VNTR6-1 classification.
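The per-SNP association with VNTR6-1 status can be illustrated with a Pearson chi-square test on a 2×2 table of counts (SNP allele × Short/Long). This sketch assumes haplotype-level counts and applies no continuity correction; it obtains the one-degree-of-freedom p-value from the identity P(χ²₁ > x) = erfc(√(x/2)). R's `chisq.test` applies a continuity correction to 2×2 tables by default, so its p-values will differ slightly. The function name is illustrative.

```python
import math
from typing import List, Tuple


def chi_square_2x2(table: List[List[int]]) -> Tuple[float, float]:
    """Pearson chi-square statistic and p-value (1 df) for a 2x2 count table.

    Rows: SNP alleles; columns: VNTR6-1 Short / Long (an assumed layout).
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    if 0 in rows or 0 in cols:
        raise ValueError("a row or column total is zero")
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = rows[i] * cols[j] / n
            stat += (observed - expected) ** 2 / expected
    p_value = math.erfc(math.sqrt(stat / 2))  # survival function of chi2 with 1 df
    return stat, p_value
```

SNPs with p < 0.05 under such a test would then be passed to the random forest step.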
To map the haplotypes of rs56345976 and rs33961405 to the profile of coverage distribution across the genomic region GRCh38 chr5:1,275,210-1,277,500, we applied unsupervised hierarchical clustering using the core hclust function in R (v4.3.0) with the Euclidean distance metric and the complete linkage method. The rs56345976-A/rs33961405-G haplotypes captured the VNTR6-1 Long allele (Cohen's kappa coefficient of 0.78 and agreement of 0.90), whereas all the remaining haplotypes captured the VNTR6-1 Short allele (Supplementary Fig. 4). Phased data from our long-read sequencing, including assemblies and targeted sequencing, were used to confirm the segregation of rs56345976 and rs33961405 with VNTR6-1 (Supplementary Data 2).
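Cohen's kappa, reported above alongside raw agreement, corrects observed agreement for the agreement expected by chance from each labeling's marginal frequencies: κ = (p_o − p_e)/(1 − p_e). A minimal Python sketch, assuming two equal-length lists of Short/Long calls (for example, haplotype-based versus coverage-based); the function name is illustrative.

```python
from collections import Counter
from typing import Sequence, Tuple


def cohens_kappa(a: Sequence[str], b: Sequence[str]) -> Tuple[float, float]:
    """Return (kappa, observed agreement) for two categorical labelings."""
    if len(a) != len(b) or not a:
        raise ValueError("labelings must be non-empty and of equal length")
    n = len(a)
    p_observed = sum(x == y for x, y in zip(a, b)) / n
    counts_a, counts_b = Counter(a), Counter(b)
    # Chance agreement from the product of marginal frequencies per category
    p_expected = sum(counts_a[k] * counts_b[k] for k in set(counts_a) | set(counts_b)) / n ** 2
    if p_expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_expected) / (1 - p_expected), p_observed
```

Because kappa discounts chance agreement, it is lower than the raw agreement whenever one category dominates, as with the 0.78 versus 0.90 reported here.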
We created a custom 1000 G reference panel that included all the markers within the 400 kb genomic region (GRCh38 chr5:1,100,000-1,500,000). In this region, VNTR6-1 was encoded as a biallelic marker, with Short and Long alleles determined by the rs56345976/rs33961405 haplotypes, at position chr5:1,275,400 (Supplementary Data 3). To evaluate scoring performance, the 1000 G dataset (n = 3201) was randomly partitioned into two groups, which served as a reference panel (n = 1601) and a test panel (n = 1600), to perform phasing with SHAPEIT4 (v4.2.0) and imputation with IMPUTE2 (v2.3.2) with default settings. VNTR6-1 was confidently scored in all test panel samples (imputation quality score61, IQS = 0.98), with an overall concordance of 99.3% compared with the predetermined genotypes across the entire dataset. The population-specific concordance rates for VNTR6-1 imputation were as follows: EUR 99.7% (n = 321), AMR 99.6% (n = 243), AFR 99.1% (n = 456), SAS 98.99% (n = 299), and EAS 98.32% (n = 281).
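Overall and population-specific concordance, as reported above, is the fraction of test-panel samples whose imputed genotype matches the predetermined one. A minimal Python sketch, assuming (population, imputed, true) records; the record layout and the "ALL" key are illustrative choices.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def concordance_by_population(records: Iterable[Tuple[str, str, str]]) -> Dict[str, float]:
    """Genotype concordance per population, plus an overall value under key 'ALL'."""
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for population, imputed, truth in records:
        for key in (population, "ALL"):
            totals[key] += 1
            hits[key] += imputed == truth
    return {key: hits[key] / totals[key] for key in totals}
```

Note that the per-population counts quoted above (321 + 243 + 456 + 299 + 281) sum to the 1600 samples of the test panel.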
Analyses in the Prostate, Lung, Colorectal and Ovarian (PLCO) Cancer Screening Trial
PLCO62 is a large population-based cohort that includes 155,000 participants enrolled between November 1993 and July 2001. The individual-level data, including genotyped variants from Illumina arrays, variants imputed using the TOPMed reference panel, and phenotype data, were provided by PLCO upon approved application (project #PLCO-957). The European ancestry dataset included 99,167 individuals (51.66% females and 48.34% males, aged 42-74 years, mean age 62.26 years), comprising 73,085 cancer-free controls (55.28% females and 44.72% males, mean age 62.02 years) and 26,082 patients with 16 cancer types (41.52% females and 58.48% males, mean age 62.84 years), 3239 (12.42%) of whom had multiple cancer types. All the variants within the 400 kb region (GRCh38 chr5:1,100,000-1,500,000) were phased using SHAPEIT4 (v4.2.0), and VNTR6-1 genotypes (Short or Long) were then assigned based on phased rs56345976/rs33961405 haplotypes. Logistic regression analyses were conducted with the logit link function for binary outcomes using the glm function in R (v4.3.0), adjusting for sex and age.
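Assigning VNTR6-1 from phased haplotypes reduces to a lookup: per the haplotype analysis in this paper, rs56345976-A/rs33961405-G tags the Long allele and every other haplotype tags Short. The Python sketch below converts a pair of phased haplotypes into a Long-allele dosage (0, 1, 2) suitable for an additive model; the non-A/G alleles used in any example input are placeholders, and the function names are hypothetical.

```python
from typing import Tuple

Haplotype = Tuple[str, str]  # (rs56345976 allele, rs33961405 allele)
LONG_TAG: Haplotype = ("A", "G")  # haplotype tagging VNTR6-1-Long per the text


def vntr6_1_allele(hap: Haplotype) -> str:
    """Short or Long allele implied by one phased haplotype."""
    return "Long" if tuple(a.upper() for a in hap) == LONG_TAG else "Short"


def long_dosage(hap1: Haplotype, hap2: Haplotype) -> int:
    """Number of VNTR6-1-Long alleles carried by an individual (0, 1 or 2)."""
    return sum(vntr6_1_allele(h) == "Long" for h in (hap1, hap2))
```

The resulting dosage could then enter the sex- and age-adjusted logistic model in place of a directly genotyped variant.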
Analyses in the UK Biobank
Associations between genetic markers and relative leukocyte telomere length (rLTL) in peripheral blood were assessed in the UK Biobank (UKB) (https://www.ukbiobank.ac.uk/), a population-based prospective study in the United Kingdom63, based on an approved application (#92005). The analysis included 339,103 cancer-free participants of European ancestry (54.64% females and 45.36% males, aged 38-73 years, mean age 55.82 years) with SNP data genotyped using the UK Biobank Axiom array and imputed using the Haplotype Reference Consortium and UK10K reference panels, along with rLTL measurements. VNTR6-1 was scored as described above for PLCO. We used linear regression models to assess the associations between the technically adjusted rLTLs (natural log-transformed and then Z-transformed)64 and the genetic markers. This analysis was performed using the lm function in R (v4.3.0), adjusting for sex, age, and smoking status. A conditional linear model was tested by independently adding SNPs (rs2736100, rs2853677, and rs7705526) that are strongly associated with telomere length in multiple populations. To account for trend differences in rLTLs across all ages, the conditional linear model included an interaction term between the genetic markers and 5-year age groups; grouping by 5-year intervals avoided age-heaping bias while maintaining a sufficient sample size for each age class.
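The rLTL transformation and the age grouping described above can be sketched in Python. This assumes the values are already technically adjusted (that step is not reproduced), uses the sample standard deviation for the Z-transform, and bins ages into 5-year classes aligned to multiples of 5; the function names are illustrative.

```python
import math
from statistics import mean, stdev
from typing import List, Sequence


def z_log_transform(values: Sequence[float]) -> List[float]:
    """Natural-log transform, then standardize to mean 0 and SD 1."""
    logs = [math.log(v) for v in values]
    mu, sd = mean(logs), stdev(logs)
    return [(x - mu) / sd for x in logs]


def five_year_group(age: int) -> str:
    """Label a 5-year age class aligned to multiples of 5, e.g. 38 -> '35-39'."""
    lower = 5 * (age // 5)
    return f"{lower}-{lower + 4}"
```

The group labels would then be used as a categorical factor interacting with the genotype term in the conditional model.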
Analyses in The Cancer Genome Atlas (TCGA)
Blood-derived germline data for 9,610 TCGA participants across 33 cancer types were accessed through the National Cancer Institute Genomic Data Commons (GDC, https://gdc.cancer.gov/). Controlled-access genotype calls generated from Affymetrix SNP6.0 array intensities using BIRDSUITE65 were retrieved for the genomic region GRCh37 chr5:335,889-2,321,650. In this region, in addition to the 5453 initially genotyped variants, we imputed approximately 57,000 variants with imputation quality scores exceeding 0.8 using the TOPMed Imputation Server, which includes data from more than 97,000 participants66. The imputation quality scores across cancer types were as follows: mean (min-max) r² = 0.83 (0.78-0.89) for rs56345976, r² = 0.85 (0.75-0.92) for rs33961405, r² = 0.85 (0.76-0.94) for rs10069690, and r² = 0.84 (0.74-0.92) for rs2242652. Direct genotyping from germline WGS files for 387 BLCA samples downloaded from the GDC revealed high concordance rates between imputed and WGS-genotyped markers: 89.90% for rs56345976, 86.79% for rs33961405, 91.17% for rs10069690 and 92.75% for rs2242652.
Transcripts per million (TPM) for bulk TERT RNA-seq data were downloaded from the GDC within the Pan-Cancer Atlas publications67. The TPMs for the TERT-β and TERT-FL transcripts were downloaded from the UCSC Xena platform (https://xenabrowser.net/datapages/) within the UCSC toil RNA-seq Recompute Compendium, cohort TCGA Pan-Cancer (PANCAN). We used pre-computed telomerase-related metrics, including expression-based telomerase enzymatic activity detection (EXTEND) scores based on a 13-gene signature33, stemness indices calculated via a predictive model using one-class logistic regression on mRNA expression34, a telomerase signature score estimated from a 43-gene panel, and telomere length scores calculated using TelSeq based on WGS35.
eQTL analysis was conducted using TPMs for bulk RNA-seq TERT expression data and genetic markers (additive genetic model) using the lm function in R (v4.3.0), with adjustments for sex and age. Spearman rank correlations between TERT expression (TERT-β and TERT-FL) and telomerase-associated metrics for each cancer type were determined using the rcorr function of the Hmisc package in R (v4.3.0).
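Spearman's rank correlation, used above for TERT isoform expression versus telomerase metrics, is the Pearson correlation of the ranks, with tied values given their average rank. A minimal Python sketch of the coefficient only (the study used Hmisc `rcorr` in R, which also reports p-values); the function names are illustrative.

```python
import math
from statistics import mean
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, assigning tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rho as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)
```

Working on ranks makes the coefficient insensitive to the skewed, zero-inflated distributions typical of TPM values.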
Analyses in the Genotype-Tissue Expression (GTEx) project
TPMs for the TERT-β and TERT-FL transcripts were downloaded from the GTEx Portal (https://gtexportal.org/home/downloads/) within the bulk tissue expression database, GTEx Analysis V8 RNA-seq. Pre-computed EXTEND scores based on a 13-gene signature were obtained from the Supplementary Information of the corresponding publication33. Spearman rank correlations between TERT expression (TERT-β and TERT-FL) and EXTEND scores for each tissue type were determined using the rcorr function of the Hmisc package in R (v4.3.0). The eQTLs for rs10069690, rs2242652, and TERT expression were assessed through the GTEx portal.
CFSE proliferation assay
For each condition, cells (9.6E5) were stained with a 5 µM solution of carboxyfluorescein succinimidyl ester (CFSE) dye (CellTrace CFSE Cell Proliferation Kit, Thermo Fisher) for 15 min at 37 °C. Culture medium containing 10% CS FBS was added to an equal volume of staining solution to quench excess dye. CFSE-stained cells were seeded into 6-well plates (Corning) at 1.2E5 cells/well in CS medium and incubated at 37 °C and 5% CO₂. The remaining CFSE-stained cells were analyzed on an Attune NxT (Thermo Fisher) flow cytometer to determine the day 0 (maximal) CFSE intensity ($\mathrm{CFSE}_{\mathrm{start}}$). Seeded cells were grown for 48 h in CS medium to allow all cell lines to attach sufficiently for a medium change and then switched to either full medium or CS medium. The cells were harvested with 0.05% trypsin-EDTA 48 h after the medium change and analyzed by flow cytometry to determine the final CFSE intensities ($\mathrm{CFSE}_{\mathrm{final}}$). The data were re-analyzed using FlowJo v10. The CFSE mean fluorescence intensity (MFI) was determined by taking the geometric mean of fluorescence (collected on the BL1 channel, 530/30 nm) after gating live single cells.
Cell doublings were calculated using the equation: $\text{Cell Doublings} = -\ln\left(\mathrm{CFSE}_{\mathrm{final}}/\mathrm{CFSE}_{\mathrm{start}}\right)/\ln 2$.
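Because CFSE is partitioned roughly equally between daughter cells, each division halves the per-cell signal, so the number of doublings is the base-2 logarithm of the ratio of starting to final intensity, a positive number for proliferating cells. A minimal Python sketch, assuming both MFI values are positive and already background-corrected; the function name is illustrative.

```python
import math


def cell_doublings(cfse_start: float, cfse_final: float) -> float:
    """Doublings implied by CFSE dilution: -ln(final/start) / ln 2 = log2(start/final)."""
    if cfse_start <= 0 or cfse_final <= 0:
        raise ValueError("MFI values must be positive")
    return -math.log(cfse_final / cfse_start) / math.log(2)
```

For example, an eightfold drop in MFI over the 48 h culture corresponds to three doublings.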
CRISPR/Cas9 genome editing
CRISPR/Cas9 guide RNAs flanking the VNTR6-1 region (2241 bp in the reference genome) were designed using sgRNA Scorer 2.068. Annealed oligonucleotides corresponding to two guide RNAs (Supplementary Data 19) were cloned by Golden Gate Assembly into PDG458 (ref. 69, Addgene plasmid #100900; http://n2t.net/addgene:100900; RRID:Addgene_100900, a gift from Paul Thomas). The cells (1.0E6/transfection) were transiently transfected with CRISPR/Cas9-expressing plasmids using the Amaxa 4D nucleofection system (Lonza), a 100 µl SF cell line kit, and the CM-130 program (A549 profile settings were used for all the cell lines). GFP-positive cells were enriched by FACS 48 h post-transfection using an SH800 sorter (Sony). The enriched population was further single-cell sorted into 96-well plates to isolate pure knockout populations. Genomic DNA from the expanded clones was screened by PCR with the primers VNTR6-1F and VNTR6-1R (Supplementary Data 19). These primers generate a 2241 bp PCR
product (based on the reference genome sequence) and a 974 bp PCR product after knockout. Three independent knockout clones (V6.1-KOs) were selected for functional analyses. Clones that were exposed to CRISPR reagents but did not result in knockout were compared with parental controls (WT, no CRISPR treatment) by RNA-seq analysis. CRISPR treatment had negligible effects on gene expression; therefore, statistical analysis of the RNA-seq data was performed comparing V6.1-KOs with the WT.
Cloning
The pCMV-TERT-FL-HA expression construct was generated with high-fidelity Q5 polymerase (NEB) by amplification from a TERT-FL plasmid (GenScript OHu25394), using a forward primer with an AgeI recognition site and a reverse primer with an HA tag and BsrGI recognition sites (Supplementary Data 19). PCR fragments were isolated by electrophoresis and a gel extraction kit (Qiagen) and cloned into an mEGFP-N1 expression vector (Addgene #54767) using AgeI and BsrGI restriction enzymes (NEB), replacing mEGFP. The pCMV-TERT-β-3xFLAG expression construct was generated using two separate Q5 PCRs from the same TERT-FL plasmid. The first PCR used the same AgeI forward primer and a reverse primer with a native BamHI recognition site within TERT exon 9. The second PCR used the BamHI site in its forward primer and a reverse primer with a 3xFLAG tag and a BsrGI recognition site (Supplementary Data 19). These two PCR fragments were isolated by electrophoresis and a gel extraction kit (Qiagen), cloned into pCR4 Blunt-TOPO (Invitrogen), and subcloned into the mEGFP-N1 expression vector using AgeI + BamHI and BamHI + BsrGI, replacing mEGFP.
RNA extraction
Cell lysates were harvested from culture plates using 350 µl of RLT lysis buffer per well and stored at −80 °C before extraction. RNA was extracted with the Qiagen RNeasy Mini Kit on a QIAcube with standard on-column DNase treatment (Qiagen). RNA concentrations were quantified with a Qubit RNA High Sensitivity Kit (Invitrogen).
cDNA synthesis
7.5 µg of RNA from each sample was used in 20 µl reactions with the
iScript Advanced cDNA Synthesis Kit (Bio-Rad). The cDNA was con-
centrated overnight by ethanol precipitation and resuspended in
37.5 µl of water, resulting in an RNA input concentration of 200 ng/µl.
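The cDNA arithmetic above (7.5 µg of RNA in 37.5 µl, i.e. 200 ng/µl RNA equivalent, later diluted to 100 or 50 ng/µl for the TaqMan and SYBR Green assays) follows C1V1 = C2V2. The Python sketch below reproduces it; the 10 µl final volumes used in any example are hypothetical, since the text gives only the target concentrations, and the function names are illustrative.

```python
from typing import Tuple


def rna_equivalent_conc(rna_ng: float, volume_ul: float) -> float:
    """RNA-input equivalent concentration (ng/µl) after resuspension."""
    return rna_ng / volume_ul


def dilution_volumes(c_stock: float, c_target: float, v_final: float) -> Tuple[float, float]:
    """Volumes of stock and diluent (µl) to reach c_target in v_final, via C1V1 = C2V2."""
    if not 0 < c_target <= c_stock:
        raise ValueError("target must be positive and not exceed the stock")
    v_stock = c_target * v_final / c_stock
    return v_stock, v_final - v_stock
```

With the study's numbers, 7500 ng in 37.5 µl gives 200 ng/µl, and a twofold or fourfold dilution yields the 100 and 50 ng/µl working inputs.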
Expression assays
Expression of the TERT-β and TERT-FL transcripts was quantified with two custom TaqMan gene expression assays (Thermo Fisher, Supplementary Data 19) designed to target specific exons and splice junctions. Reactions were multiplexed to include both targets and a custom human HPRT1 endogenous control (NED/MGB probe, primer-limited, Assay ID: Hs99999909_m1, Thermo Fisher). TaqMan reactions were run in technical quadruplicate in 384-well plates on a QuantStudio 7 Flex Real-Time PCR System (Applied Biosystems). Each 6 µl reaction included 2 µl of cDNA diluted to 100 ng/µl from a 200 ng/µl RNA input. All assays (individually and in multiplexed reactions) were validated using the TERT-FL-HA and TERT-β-3xFLAG plasmids in a five-point 10-fold dilution series (from 100 pM to 10 fM). All the assays had experimentally determined PCR efficiencies ranging from 72% to 100%. The identities of the PCR products were confirmed by cloning into the TOPO-pCR4 vector (Invitrogen) and Sanger sequencing with the M13_TOPO primers (Supplementary Data 19).
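PCR efficiency from a dilution series like the one above is conventionally derived from the slope of Ct against log10(input): E = 10^(−1/slope) − 1, so a slope of about −3.32 corresponds to 100%. A minimal Python sketch, assuming one mean Ct per dilution point; the text does not state exactly how the efficiencies were computed, and the function name is illustrative.

```python
import math
from statistics import mean
from typing import Sequence


def pcr_efficiency(concentrations: Sequence[float], cts: Sequence[float]) -> float:
    """Amplification efficiency (1.0 = 100%) from a standard-curve regression."""
    xs = [math.log10(c) for c in concentrations]
    mx, my = mean(xs), mean(cts)
    # Least-squares slope of Ct versus log10(input)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, cts))
             / sum((x - mx) ** 2 for x in xs))
    return 10 ** (-1 / slope) - 1
```

A steeper slope (for example −4.26) gives roughly 72% efficiency, the lower bound reported above.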
SYBR Green RT-qPCR assays were performed with iTaq Universal SYBR Green Supermix (Bio-Rad). The samples were run in 5 µl reactions with 2 µl of cDNA diluted to 50 ng/µl from the RNA input in 12 technical replicates on a QuantStudio 7 Flex Real-Time PCR System. The primers (10 mM, Thermo Fisher) were identical to those used in the TaqMan assays. HPRT1 controls (Supplementary Data 19) were run in parallel reactions. For visualization, technical replicates of selected RT-qPCR products were pooled and resolved on 2% agarose gels alongside a low-molecular-weight DNA ladder (NEB). The gel images were captured on a Bio-Rad ChemiDoc Imaging System and analyzed using Image Lab Software v6.1.0 (Bio-Rad). The ratio of TERT isoforms was calculated from gel densitometry of the PCR products (120 bp and 302 bp).
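The densitometry ratio can be sketched as the TERT-FL band's share of the two-band total. This Python sketch assumes the 302 bp product is TERT-FL and the 120 bp product is TERT-β (a difference of 182 bp, consistent with skipping exons 7-8), and ignores any extra bands. The optional length correction, dividing each intensity by amplicon size because intercalating dyes bind in proportion to length, is an assumption; the text does not say whether it was applied.

```python
def percent_fl(fl_intensity: float, beta_intensity: float,
               fl_bp: int = 302, beta_bp: int = 120,
               length_correct: bool = False) -> float:
    """TERT-FL band as a percentage of the FL + beta band total.

    Band assignments (302 bp = FL, 120 bp = beta) and the optional
    per-bp correction are assumptions for this sketch.
    """
    if length_correct:
        fl_intensity = fl_intensity / fl_bp
        beta_intensity = beta_intensity / beta_bp
    total = fl_intensity + beta_intensity
    if total <= 0:
        raise ValueError("no signal in either band")
    return 100 * fl_intensity / total
```

Without correction, the longer FL amplicon contributes proportionally more dye signal per molecule, which inflates its apparent share.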
Total TERT expression was measured in 5 µL reactions using
TaqMan assays (FAM, exons 3-4) with TERT-Hs00972650_m1 multi-
plexed with the endogenous control HPRT1 (VIC, primer-limited, Assay
ID: Hs99999909_m1) and TaqMan Gene Expression Buffer (all from
Thermo Fisher).
RNA-seq
RNA quality (all RINs > 9.0) was verified using the Bioanalyzer (Agilent) and an RNA 6000 Nano Kit (Agilent). For each sample, 200 ng of total RNA was used to prepare an adapter-ligated library with the KAPA RNA HyperPrep kit with RiboErase (HMR) (KAPA Biosystems) using the xGen Dual Index UMI Adapters (IDT). The multiplexed libraries with 250-350 bp inserts were sequenced on a NovaSeq 6000 (Illumina) to generate 279-418 million paired-end 150 bp reads per sample. Quality assessment of the RNA-seq data was conducted using MultiQC (v1.16)70. Transcript abundance was quantified using Salmon (v0.14.1) in count mode with the validateMappings flag and expressed as transcripts per million (TPM). The raw RNA-seq reads were aligned with STAR71 to the reference genome GRCh38 with GENCODE annotation (v36). Differential expression analysis was conducted with DESeq2 (v1.40.2) based on the estimated counts obtained from Salmon quantification, controlling for the false discovery rate (FDR). Gene-level transcript abundances were estimated with lengthScaledTPM in the R package tximport (v1.28.0). Gene Ontology (GO) analysis and gene set enrichment analysis (GSEA) of differentially expressed genes were conducted with clusterProfiler (v4.8.3).
G4 Hunter prediction analysis
Analysis was performed with G4Hunter (https://bioinformatics.ibp.cz)72. PacBio-generated DNA sequences for UMUC3 (24 repeat copies per allele) and HG03516 (27 and 66.5 repeat copies per allele), flanked by 120 bp on each side of the VNTR6-1 region, were used as inputs.
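The G4Hunter scoring rule can be sketched in a few lines of Python: each G in a run scores the run length capped at 4, each C scores the negative of its capped run length, other bases score 0, and the score is averaged over sliding windows. The window of 25 nt and threshold of 1.2 below are commonly used G4Hunter settings, assumed here rather than taken from this study. Because the VNTR6-1 G4 lies on the minus strand, it appears as strongly negative (C-rich) scores on the plus-strand sequence, hence the absolute value in the threshold test. Function names are illustrative; this is not the web server's code.

```python
from typing import List


def g4hunter_scores(seq: str) -> List[int]:
    """Per-base G4Hunter scores: +min(run, 4) for G runs, -min(run, 4) for C runs."""
    seq = seq.upper()
    scores = [0] * len(seq)
    i = 0
    while i < len(seq):
        j = i
        while j < len(seq) and seq[j] == seq[i]:
            j += 1
        if seq[i] in "GC":
            value = min(j - i, 4) * (1 if seq[i] == "G" else -1)
            for k in range(i, j):
                scores[k] = value
        i = j
    return scores


def g4hunter_windows(seq: str, window: int = 25) -> List[float]:
    """Mean score of every window of the given size."""
    s = g4hunter_scores(seq)
    return [sum(s[i:i + window]) / window for i in range(len(s) - window + 1)]


def candidate_windows(seq: str, window: int = 25, threshold: float = 1.2) -> List[int]:
    """Start positions of windows whose |mean score| reaches the threshold (either strand)."""
    return [i for i, m in enumerate(g4hunter_windows(seq, window)) if abs(m) >= threshold]
```

Longer VNTR6-1 alleles add more G-rich repeat units, which under this rule extends the run of high-scoring windows.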
G4-seq analysis
For the lymphoblastoid cell line NA18507 (VNTR6-1-Short/Short genotype, 24 and 27 repeat copies), ChIP-seq data for G-quadruplexes (G4) detected in forward and reverse orientations were downloaded as BED files from the GEO dataset GSE63874 (ref. 73; files GSE63874_Na_K_minus_hits_intersect.bed.gz and GSE63874_Na_K_plus_hits_intersect.bed.gz). These files were merged into a single BED file and converted to the UCSC BED format. The G4 mismatch quantification bedGraph files GSE63874_Na_K_12_minus.bedGraph.gz and GSE63874_Na_K_12_plus.bedGraph.gz were downloaded and converted into bigWig format using the bedGraphToBigWig tool.
Similarly, for the 293T normal embryonic kidney cell line (VNTR6-1-Long/Long genotype), the G4-seq data were downloaded from GSE110582 (ref. 74; files GSM3003539_Homo_all_w15_th-1_minus.hits.max.K.w50.25.bed.gz and GSM3003539_Homo_all_w15_th-1_plus.hits.max.K.w50.25.bed.gz) and processed as above. The G4 mismatch quantification values were downloaded from GSM3003539_Homo_all_w15_th-1_minus.K.bedGraph.gz and GSM3003539_Homo_all_w15_th-1_plus.K.bedGraph.gz. The G4-seq tracks for NA18507 and 293T cells were visualized through the UCSC Genome Browser (GRCh37).
Evaluation of G4 ligands
Five G4-stabilizing ligands were tested for their ability to stabilize TERT G4. The ligands PhenDC3, TMPyP4, BRACO-19, and Pyridostatin were provided by Dr. John Schneekloth. Pidnarulex (CX-5461) was selected from the literature21 and obtained from Selleck Chem. For optimization, cells were seeded into 6-well plates at 4.0E5 cells/well. After adhering for 24 h, the cells were treated for 24, 48, or 72 h with 0.1, 0.3, 1, 3, 10, or 30 µM ligands dissolved in DMSO, with DMSO vehicle-only and untreated control samples included in each plate. In the 72 h group, the medium was replaced at 48 h, and the cells were harvested at 72 h. The viability of the treated cells was evaluated by counting them with the BioTek Lionheart FX automated microscope (Agilent) every 24 h. Pidnarulex (CX-5461) and PhenDC3 at 3 µM for 72 h were identified as the most effective treatments for modulating TERT exon 7-8 skipping and were used in subsequent experiments. WT and V6.1-KO cells were treated in technical replicates in three independent experiments.
Western blot
BCA-normalized protein samples and 10 µL of SeeBlue Plus2 ladder were loaded onto gels, run in 1X Bolt running buffer at 165 V for 1 h, and transferred to nitrocellulose membranes using an iBlot2 dry transfer instrument (Invitrogen). The membranes were blocked with 5% milk in 1X TBST for 1 h at room temperature and then incubated overnight at 4 °C with primary antibodies in 2.5% milk in 1X TBST (anti-GFP: Invitrogen A-11122; anti-HA: Novus NB600-362; anti-FLAG: Sigma-Aldrich M2; anti-GAPDH: Abcam ab9485). After three 5-min washes with 1X TBST, the membranes were incubated at room temperature for 1 h with secondary antibodies (anti-rabbit: Cell Signaling 7074; anti-mouse: Cell Signaling 7076; anti-goat: Santa Cruz sc-2304) and imaged using Pico and Femto ECL reagents (Thermo).
Structured illumination microscopy fluorescence imaging
A549 cells were chosen for imaging of mitochondria because this
highly transfectable cell line has a larger cytoplasmic area than
UMUC3, allowing better visualization. The cells were seeded in a 12-
well plate at 1.25E5 cells/well and co-transfected with pCMV-TERT-FL-HA
and pCMV-TERT-β-3xFLAG expression constructs at a 50:50 isoform
ratio. Transfections were performed using Lipofectamine 3000
for 4 h. The transfected cells were washed with DPBS, dissociated
using Accutase (StemPro), and counted. The cells were then diluted
and seeded onto CultureWell Chambered Coverglass (Invitrogen).
After 48 h, the coverslips were fixed with 4% formaldehyde in PBS for
10 min, permeabilized with 0.03% Triton X-100 for 10 min, and
blocked with blocking buffer (5% BSA + 0.01% Triton X-100 in PBS)
for 30 min. The coverslips were incubated at 4 °C overnight with the
following primary antibodies: anti-FLAG (Sigma M2, mouse, 1:400
dilution), anti-HA (Novus NB600-362, goat, 1:400 dilution), and anti-
TOM20 (Proteintech 11802-1-AP, rabbit, 1:1000 dilution) diluted in
blocking buffer, followed by incubation at room temperature for
30 min with the following secondary antibodies: anti-mouse-
AlexaFluor488 (Thermo Fisher A21202, 1:500 dilution), anti-goat-
AlexaFluor647 (Thermo Fisher A32849, 1:500 dilution), and anti-
rabbit-AlexaFluor555 (Thermo Fisher A31572, 1:500 dilution) diluted
in blocking buffer. Three washes were performed with PBS between
all the staining steps; after the final wash, the cells were counterstained
with 3 µg/ml DAPI. The coverslips were then mounted onto
glass slides with ProLong Gold Antifade Mountant (Invitrogen) and
sealed with clear nail polish. Superresolution structured illumination
microscopy fluorescence images were obtained using ZEN Black
software on an ELYRA PS.1 superresolution (SR) microscope (Carl
Zeiss, Inc.) with a Plan-Achromat 63X/1.4 NA oil objective and a
Pco.edge sCMOS camera; 405, 488, 561, and 633 nm laser illumination
and standard excitation and emission filter sets were used.
Raw data were acquired by projecting grid patterns onto the sample;
the grids were generated by interference from a phase grating with 23,
28, and 34 µm spacings for 405, 488, and 561 nm excitation,
respectively (3 grid rotations and 5 grid shifts, for a total of 15 images
per super-resolved z-plane per color). The raw images were
processed with ZEN Black software. For publication, images were scaled
to 8-bit RGB identically with a linear LUT and exported in TIFF format
using ImageJ. Figures were made from the TIFF images in Adobe
Illustrator without any change in resolution, except for the inset
zoomed images.
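The 15-image figure is the product of grid rotations and grid shifts. A small helper makes the scaling to full acquisitions explicit. The z-plane and channel counts in the usage lines are hypothetical. The function only counts raw frames and does not reconstruct SIM images.

```python
def raw_frames(n_rotations: int = 3, n_shifts: int = 5,
               n_z_planes: int = 1, n_channels: int = 1) -> int:
    """Count raw SIM frames: rotations x shifts per reconstructed z-plane per channel."""
    if min(n_rotations, n_shifts, n_z_planes, n_channels) < 1:
        raise ValueError("all counts must be >= 1")
    return n_rotations * n_shifts * n_z_planes * n_channels


if __name__ == "__main__":
    print(raw_frames())                             # one plane, one color: 15
    print(raw_frames(n_z_planes=10, n_channels=4))  # hypothetical 10-plane, 4-color stack: 600
```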
Apoptosis assay
Cells were seeded in 6-well plates at 1.2E5 cells/well, and the medium was
changed 48 h later to full medium, CS medium alone, or CS medium
containing 10 µM cisplatin. The cells were harvested with 0.05%
trypsin-EDTA 48 h after the media was changed, pelleted at 500 × g for
5 min, and washed with 1 mL of PBS. The cells were stained with an
Annexin V-FITC conjugate (Thermo Fisher) and propidium iodide
(Thermo Fisher) in Annexin V staining buffer (Thermo Fisher)
according to Rieger et al.75. FITC (ex. 488 nm/em. 517 nm) and PI
(ex. 488 nm/em. 617 nm) fluorescence were analyzed by flow
cytometry on an Attune NxT with a CytKick Autosampler (Thermo Fisher).
Unstained cells, Annexin V-FITC-stained cells, and PI-stained cells were
used as compensation controls. Apoptosis was quantified as the
percentage of Annexin V-FITC-positive cells.
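The readout is the percentage of Annexin V-FITC-positive events, so the gating step reduces to thresholding two intensities per event. The sketch below assumes already-compensated per-event intensities and hypothetical gate thresholds. In practice, these thresholds would be set from the unstained and single-stained controls. The sketch also reports the conventional quadrant breakdown, which the text does not use.

```python
from typing import Dict, Sequence


def apoptosis_summary(fitc: Sequence[float], pi: Sequence[float],
                      fitc_cut: float, pi_cut: float) -> Dict[str, float]:
    """Classify events into Annexin V/PI quadrants and return percentages.

    fitc_cut and pi_cut are hypothetical gate thresholds.
    """
    if len(fitc) != len(pi) or len(fitc) == 0:
        raise ValueError("need matched, non-empty FITC and PI vectors")
    counts = {"live": 0, "early_apoptotic": 0, "late_apoptotic": 0, "pi_only": 0}
    for f, p in zip(fitc, pi):
        annexin_pos, pi_pos = f > fitc_cut, p > pi_cut
        if annexin_pos and pi_pos:
            counts["late_apoptotic"] += 1
        elif annexin_pos:
            counts["early_apoptotic"] += 1
        elif pi_pos:
            counts["pi_only"] += 1
        else:
            counts["live"] += 1
    n = len(fitc)
    pct = {k: 100.0 * v / n for k, v in counts.items()}
    # Metric used in the text: all FITC-positive events, regardless of PI status.
    pct["fitc_positive"] = pct["early_apoptotic"] + pct["late_apoptotic"]
    return pct
```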
Lionheart cell proliferation analysis
WT and V6.1-KO UMUC3 cells were seeded in 6-well plates (Falcon) at
1.0E4 cells/well in EMEM. After adherence for 24 hours (day 0), a label-
free cell counting protocol, with focus and cell size calibrated to
adhered WT UMUC3 cells, was created on the BioTek Lionheart FX
automated microscope (Agilent), and cell counts were recorded every
24 h for 10 days. The fold change in the number of cells was calculated
by dividing the recorded counts by the initial cell counts on day 0.
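The day-0 normalization described above is a per-well division. The following is a minimal sketch, assuming counts are stored per well as a day-to-count mapping. The well labels and counts in the usage lines are made up for illustration.

```python
from typing import Dict


def fold_change(counts_by_day: Dict[int, float]) -> Dict[int, float]:
    """Divide each day's count by the day-0 count of the same well."""
    baseline = counts_by_day[0]
    if baseline <= 0:
        raise ValueError("day-0 count must be positive")
    return {day: n / baseline for day, n in sorted(counts_by_day.items())}


def normalize_wells(wells: Dict[str, Dict[int, float]]) -> Dict[str, Dict[int, float]]:
    """Apply fold_change to every well, keyed by a (hypothetical) well label."""
    return {well: fold_change(counts) for well, counts in wells.items()}


if __name__ == "__main__":
    demo = {"WT_rep1": {0: 200, 1: 300, 2: 800}, "KO_rep1": {0: 250, 1: 250, 2: 500}}
    print(normalize_wells(demo))
```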
Linear mixed models were applied to the data obtained from the
Lionheart FX, normalized to day 0, with treatment type as a fixed-effect
term and technical replicate as a random-effect term. Maximum
likelihood estimation was used to conduct joint-effects likelihood-ratio
tests, whereas restricted maximum likelihood estimation was used for
more precise estimation of effect sizes as beta coefficients, using the
linear mixed-effects function in the R package nlme (v3.1-162).
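The likelihood-ratio step compares nested models fitted by ML, because REML likelihoods are not comparable across models with different fixed effects. The sketch below is not the nlme code used in the study. It assumes the two log-likelihoods are already available (in nlme, from `lme(..., method = "ML")` fits). It then computes the statistic and a chi-square p-value from closed-form tail probabilities for integer degrees of freedom. The log-likelihood values in the usage lines are hypothetical.

```python
import math
from typing import Tuple


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2/pi) * exp(-x/2) * sum_{i=1}^{(df-1)/2} x^(i-1/2) / (1*3*...*(2i-1))
    term = math.sqrt(x)  # i = 1 term: x^(1/2) / 1
    acc = 0.0
    for i in range(1, (df - 1) // 2 + 1):
        if i > 1:
            term *= x / (2 * i - 1)
        acc += term
    return math.erfc(math.sqrt(half)) + math.exp(-half) * math.sqrt(2.0 / math.pi) * acc


def likelihood_ratio_test(loglik_full: float, loglik_reduced: float,
                          df_diff: int) -> Tuple[float, float]:
    """LR statistic 2*(llf_full - llf_reduced) and its chi-square p-value (ML fits only)."""
    stat = max(2.0 * (loglik_full - loglik_reduced), 0.0)
    return stat, chi2_sf(stat, df_diff)


if __name__ == "__main__":
    # Hypothetical ML log-likelihoods for models with and without the treatment term.
    print(likelihood_ratio_test(-100.0, -104.0, df_diff=1))
```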
xCELLigence Real-Time Cell Analysis (RTCA)
In Supplementary Fig. 19c and d, cells were seeded in a 12-well plate at
1.25E5 cells/well and transfected with GFP, pCMV-TERT-FL-HA,
or pCMV-TERT-β-3xFLAG expression constructs, either as single
transfections or as co-transfections at different isoform ratios (80:20
and 20:80). Transfections were performed using Lipofectamine
3000 for 4 h. The transfected cells were washed with DPBS, dissociated
using Accutase (StemPro), and counted. The cells were then diluted
and seeded into an xCELLigence E-Plate 16 microplate (Agilent) at
1.0E3 cells/well and placed on an xCELLigence RTCA DP system (Agi-
lent). The data were collected every 15 minutes in RTCA software for
288 h and then exported for analysis.
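The isoform ratios used here and in the imaging experiment (80:20, 50:50 and 20:80) translate into per-plasmid DNA amounts once a total DNA mass is chosen. The text specifies neither the total mass nor whether the ratios refer to mass or copy number, so both are assumptions. By default, the sketch splits the total by mass. If plasmid lengths are supplied (hypothetical values in the test), it splits by molar ratio instead.

```python
from typing import Optional, Tuple


def plasmid_masses(total_ng: float, ratio: Tuple[float, float],
                   lengths_bp: Optional[Tuple[int, int]] = None) -> Tuple[float, float]:
    """Split a total DNA mass between two plasmids.

    Without lengths, the ratio is applied to mass. With lengths, the split is
    chosen so that the molar (copy-number) ratio matches instead.
    """
    w1, w2 = float(ratio[0]), float(ratio[1])
    if lengths_bp is not None:
        # Mass per plasmid copy scales with length, so weight each share by length.
        w1 *= lengths_bp[0]
        w2 *= lengths_bp[1]
    total_w = w1 + w2
    if total_w <= 0:
        raise ValueError("ratio (and lengths) must be positive")
    return total_ng * w1 / total_w, total_ng * w2 / total_w


if __name__ == "__main__":
    # Hypothetical 1000 ng total per well.
    for r in [(80, 20), (50, 50), (20, 80)]:
        print(r, plasmid_masses(1000, r))
```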
In Fig. 4a and Supplementary Fig. 18, WT or V6.1-KO cells
grown in CS medium were seeded into E-Plate 16 microplates (Agilent)
at 1.0E3 cells/well. Label-free cell impedance in the E-Plate (which
correlates with cell proliferation) was measured every 15 min for 283 h
using the xCELLigence RTCA DP system. Two days after seeding, the medium
was changed to either full medium or CS medium (control).
Linear mixed models were applied to the impedance data
obtained from the xCELLigence system, with treatment type as a
fixed-effect term and technical replicate as a random-effect term.
Maximum likelihood estimation was used to conduct joint-effects
likelihood-ratio tests,
whereas restricted maximum likelihood estimation was used for
more precise estimation of effect sizes as beta coefficients, using the
linear mixed-effects function in the R package nlme (v3.1-162).
HiChIP analysis
The H3K27ac HiChIP libraries for the bladder cancer cell lines T24 and
RT4 were generated using the Arima-HiChIP protocol (Arima Genomics,
A101020). Briefly, 1E6 cells/replicate were collected for chromatin
cross-linking followed by digestion with a restriction enzyme
cocktail, biotin labeling, and ligation. The samples were then purified,
fragmented, and enriched. Pulldown was performed using an antibody
against H3K27ac (Cell Signaling Technology, #8173). The Arima-HiChIP
libraries that passed QC were sequenced on an Illumina NovaSeq
6000 to generate raw FASTQ files for each sample. The paired-end
reads were aligned to the GRCh37 genome using the HiC-Pro pipeline
(v3.1.0, https://github.com/nservant/HiC-Pro). The confirmed interaction
reads were used as inputs for significant loop calling via the
FitHiChIP tool (v11.0, https://github.com/ay-lab/FitHiChIP) with
default settings. The HiChIP loop and ATAC peak calling files for the
GM12878 and normal bladder samples were downloaded from the
Gene Expression Omnibus (GSE188401). The interactions were visua-
lized through the UCSC genome browser.
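To inspect loops at a locus after loop calling, significant interactions can be filtered by anchor overlap. This sketch assumes a generic tab-separated loop table whose first six columns are the two anchors and whose seventh column is a q-value. Check the actual FitHiChIP output header before relying on that column index. The region and rows in the test are toy values, not TERT coordinates.

```python
import csv
from typing import Iterable, List, Tuple

Anchor = Tuple[str, int, int]  # (chrom, start, end), half-open
Region = Tuple[str, int, int]


def overlaps(anchor: Anchor, region: Region) -> bool:
    """True if a half-open anchor interval overlaps the region on the same chromosome."""
    chrom, start, end = anchor
    rchrom, rstart, rend = region
    return chrom == rchrom and start < rend and end > rstart


def loops_touching(lines: Iterable[str], region: Region, q_col: int = 6,
                   q_max: float = 0.01) -> List[Tuple[Anchor, Anchor, float]]:
    """Return loops with q <= q_max and at least one anchor in `region`.

    Assumed row layout: chr1, start1, end1, chr2, start2, end2, ..., q-value at q_col.
    """
    hits = []
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) <= q_col or not row[1].isdigit():
            continue  # blank line or header row
        a1 = (row[0], int(row[1]), int(row[2]))
        a2 = (row[3], int(row[4]), int(row[5]))
        q = float(row[q_col])
        if q <= q_max and (overlaps(a1, region) or overlaps(a2, region)):
            hits.append((a1, a2, q))
    return hits
```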
PacBio DNA methylation analysis
Freshly collected genomic DNA (5 µg) from the HT1376, RT4, T24,
SCaBER, UMUC3, and Raji cell lines was sheared using Covaris g-tubes
at 4800 rpm, followed by size selection using PippinHT. Three SMRT
flow cells were run for each sample library on the PacBio Sequel II
platform. The sequence reads were transformed into FASTQ files and
aligned to the GRCh38 reference genome using the default settings of
the SMRT-Link workflow. 5mC DNA methylation analysis was a part of
the SMRT-Link pipeline, and the corresponding information specifying
the positions and probabilities of 5mC methylation at CpG sites was
integrated into the output file. The methylation plots were generated
in IGV by coloring alignments in PacBio WGS BAM files based on base
modification (5mC).
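Per-CpG methylation levels can be summarized from the modification probabilities described above by counting calls above a threshold. The sketch assumes the per-read calls have already been extracted as (position, probability) pairs on a 0-1 scale. It uses a hypothetical 0.5 cutoff. Extraction from the BAM is not shown; pysam's `AlignedSegment.modified_bases` is one option, and it reports likelihoods on a 0-255 scale.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def site_methylation(calls: Iterable[Tuple[int, float]],
                     threshold: float = 0.5) -> Dict[int, float]:
    """Fraction of per-read calls at each CpG position with probability >= threshold."""
    methylated: Dict[int, int] = defaultdict(int)
    total: Dict[int, int] = defaultdict(int)
    for pos, prob in calls:
        total[pos] += 1
        if prob >= threshold:
            methylated[pos] += 1
    return {pos: methylated[pos] / total[pos] for pos in sorted(total)}


def region_mean(levels: Dict[int, float], start: int, end: int) -> float:
    """Unweighted mean of per-site levels for CpGs in [start, end); NaN if none."""
    vals = [v for pos, v in levels.items() if start <= pos < end]
    return sum(vals) / len(vals) if vals else math.nan
```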
Oxford Nanopore cDNA-seq
cDNA libraries were generated using the PCR cDNA Sequencing Kit
SQK-DCS109 (Oxford Nanopore Technologies), starting with 100 ng
of poly-A RNA. Libraries were loaded onto R9.4.1 PromethION flow
cells mounted on a P2 Solo and run for 96 h. Basecalling was per-
formed using MinKNOW software with the high-accuracy model on a
GridION sequencer (Oxford Nanopore Technologies). The reads
were aligned to GRCh38 via Minimap2 (v2.26) and SAMtools (v1.5).
UMUC3 yielded 25,827,200 reads, with 46 reads aligning to TERT,
whereas UMUC3 V6.1-KO yielded 18,709,848 reads, with 62 reads
aligning to TERT.
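The read counts above come from libraries of different depths, so a depth normalization such as reads per million is needed before comparing them. The arithmetic below is illustrative only and is not an analysis reported in the text. With only 46 and 62 TERT reads, sampling noise is large.

```python
def reads_per_million(gene_reads: int, total_reads: int) -> float:
    """Scale a gene's read count by library size (reads per million total reads)."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return gene_reads / total_reads * 1e6


# Counts from the text: (reads aligning to TERT, total reads per library).
RUNS = {
    "UMUC3": (46, 25_827_200),
    "UMUC3 V6.1-KO": (62, 18_709_848),
}

if __name__ == "__main__":
    for name, (tert, total) in RUNS.items():
        print(f"{name}: {reads_per_million(tert, total):.2f} TERT reads per million")
    # UMUC3: 1.78; UMUC3 V6.1-KO: 3.31
```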
Analysis of sequence conservation in non-human species
Haplotype-resolved Telomere-to-Telomere (T2T) assemblies of primates
were downloaded from GenomeArk (https://www.genomeark.org/). The FASTA
sequences were aligned to the human GRCh38 reference genome using
Minimap2 (v2.26) with the -ax asm10 flag and converted to a BAM file
using SAMtools (v1.5). The TERT VNTR6-1 repeat units were analyzed with
Tandem Repeat Finder (https://tandem.bu.edu/trf/trf.html). The BAM files
of Neandertal (n = 3) and Denisova (n = 1) individuals were downloaded
from the Max Planck Institute for Evolutionary Anthropology resource
(http://cdna.eva.mpg.de/neandertal/Vindija/bam/Pruefer_etal_2017/ and
http://cdna.eva.mpg.de/neandertal/Chagyrskaya/) and visualized using IGV.
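Tandem Repeat Finder reports copy number directly. The underlying arithmetic is tract length divided by the 38-bp unit. The sketch rounds to the nearest half copy because half-unit alleles (e.g., 40.5 and 66.5 copies) are reported. It classifies alleles using the Short (24-27) and Long (40.5-66.5) ranges given in the abstract. The "other" label for values outside those ranges is a placeholder of this sketch.

```python
REPEAT_UNIT_BP = 38  # VNTR6-1 repeat unit length


def copy_number(tract_bp: int, unit_bp: int = REPEAT_UNIT_BP) -> float:
    """Estimate repeat copies from tract length, rounded to the nearest half unit."""
    if tract_bp < 0 or unit_bp <= 0:
        raise ValueError("tract length must be >= 0 and unit length > 0")
    return round(2 * tract_bp / unit_bp) / 2


def allele_class(copies: float) -> str:
    """Classify with the abstract's ranges: Short 24-27, Long 40.5-66.5 copies."""
    if 24 <= copies <= 27:
        return "Short"
    if 40.5 <= copies <= 66.5:
        return "Long"
    return "other"


if __name__ == "__main__":
    for bp in (912, 1539, 2527):
        c = copy_number(bp)
        print(bp, c, allele_class(c))
```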
Statistical analysis
Analyses were performed with R (v4.3.0) in RStudio, GraphPad Prism
(v10), and FlowJo (v9). P-values are for unpaired two-sided tests
(Student's t-test, linear mixed models, and linear or logistic
regression), with adjustment for relevant covariates as indicated.
P-values are reported either without correction for multiple
comparisons or with FDR adjustment or permutation, as indicated.
Error bars correspond to standard deviation (SD), standard
error of the mean (SEM), or 95% confidence intervals (CI), as
indicated.
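The text does not name the FDR procedure. Assuming Benjamini-Hochberg, the adjustment can be written in a few lines. This sketch should match R's `p.adjust(p, method = "BH")` for a vector without missing values.

```python
from typing import List, Sequence


def bh_adjust(pvals: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity with a running minimum.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    print(bh_adjust([0.01, 0.04, 0.03, 0.20]))
```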
This work utilized the computational resources of the NIH HPC
Biowulf cluster (http://hpc.nih.gov). The figures were assembled using
Adobe Illustrator.
Reporting summary
Further information on research design is available in the Nature
Portfolio Reporting Summary linked to this article.
Data availability
Data generated in this study have been deposited in the NCBI Sequence
Read Archive (SRA). PacBio-targeted sequencing data are included in
the BioProject PRJNA1134698. The data for PacBio-WGS, HiChIP, short-read
RNA-seq by Illumina, and long-read RNA-seq by Oxford Nanopore
Technology are included in the BioProject PRJNA1134701. The publicly
available datasets used in the study include RNA-seq expression data
from TCGA (UCSC Xena platform, https://toil.xenahubs.net,
https://toil-xena-hub.s3.us-east-1.amazonaws.com/download/tcga_rsem_isoform_tpm.gz)76;
RNA-seq expression data in normal tissues (GTEx portal,
https://www.gtexportal.org/,
https://storage.googleapis.com/adult-gtex/bulk-gex/v8/rna-seq/GTEx_Analysis_2017-06-05_v8_RSEMv1.3.0_transcript_tpm.gct.gz)77;
haplotype-resolved Telomere-to-Telomere (T2T) assemblies of primates
(GenomeArk database, https://www.genomeark.org/, IDs: mGorGor1, mPanPan1,
mPanTro3, and mPonAbe1, accessed on October 3, 2023,
https://registry.opendata.aws/genomeark);
WGS for Neandertal and Denisova individuals (Max Planck Institute for
Evolutionary Anthropology, https://www.eva.mpg.de/index/, IDs: Altai,
Denisova, Vindija, and Chagyrskaya)78,79; FASTA files for long-read WGS
(Human Pangenome Reference Consortium, https://humanpangenome.org/,
https://github.com/human-pangenomics/HPP_Year1_Data_Freeze_v1.0)11;
HiChIP data (NCBI Gene Expression Omnibus, accession code
GSE188401)80; the 1000 Genomes 30x on GRCh38 data (The International
Genome Sample Resource, https://www.internationalgenome.org/,
https://www.internationalgenome.org/data-portal/data-collection/30x-grch38)60;
and ChIP-seq data for G-quadruplexes (NCBI Gene Expression Omnibus,
accession codes GSE63874 and GSE110582)73,74.
The controlled access data were obtained from PLCO (#PLCO-957) and
UKB (#92005) based on approved applications. The controlled access
long-read sequencing data from the Center for Alzheimer's and Related
Dementias (CARD) of the National Institute on Aging are available from
dbGaP phs001300.v4.p1, and data for the Burkitt Lymphoma Genome
Sequencing Project (BLGSP) are available from dbGaP phs000527.v6.p2.
The remaining data used in this article are available within the Article,
Supplementary Information, or Source data provided with this paper.
Source data are provided in this paper.
Code availability
The pipeline and script used for the analysis of the genome assemblies
are available at GitHub (https://github.com/oorez/HumanGenomeAssemblies)
and Zenodo (https://doi.org/10.5281/zenodo.14633198)81.
References
1. Rafnar, T. et al. Sequence variants at the TERT-CLPTM1L locus associate with many cancer types. Nat. Genet. 41, 221–227 (2009).
2. Wang, Z. et al. Imputation and subset-based association analysis across different cancer types identifies multiple independent risk loci in the TERT-CLPTM1L region on chromosome 5p15.33. Hum. Mol. Genet. 23, 6616–6633 (2014).
3. Chen, H. et al. Large-scale cross-cancer fine-mapping of the 5p15.33 region reveals multiple independent signals. HGG Adv. 2, 100041 (2021).
4. Koutros, S. et al. Genome-wide Association Study of Bladder Cancer Reveals New Biological and Translational Insights. Eur. Urol. 84, 127–137 (2023).
5. Roake, C. M. & Artandi, S. E. Regulation of human telomerase in homeostasis and disease. Nat. Rev. Mol. Cell Biol. 21, 384–397 (2020).
6. Rossiello, F., Jurk, D., Passos, J. F. & d'Adda di Fagagna, F. Telomere dysfunction in ageing and age-related diseases. Nat. Cell Biol. 24, 135–147 (2022).
7. James, M. A. et al. Functional characterization of CLPTM1L as a lung cancer risk candidate gene in the 5p15.33 locus. PLoS ONE 7, e36116 (2012).
8. Jia, J. et al. CLPTM1L promotes growth and enhances aneuploidy in pancreatic cancer cells. Cancer Res. 74, 2785–2795 (2014).
9. Leem, S. H. et al. The human telomerase gene: complete genomic sequence and analysis of tandem repeat polymorphisms in intronic regions. Oncogene 21, 769–777 (2002).
10. Szutorisz, H. et al. Rearrangements of minisatellites in the human telomerase reverse transcriptase gene are not correlated with its expression in colon carcinomas. Oncogene 20, 2600–2605 (2001).
11. Liao, W. W. et al. A draft human pangenome reference. Nature 617, 312–324 (2023).
12. Lee, O. W. et al. Targeted long-read sequencing of the Ewing sarcoma 6p25.1 susceptibility locus identifies germline-somatic interactions with EWSR1-FLI1 binding. Am. J. Hum. Genet. 110, 427–441 (2023).
13. Schumacher, F. R. et al. Association analyses of more than 140,000 men identify 63 new prostate cancer susceptibility loci. Nat. Genet. 50, 928–936 (2018).
14. Melin, B. S. et al. Genome-wide association study of glioma subtypes identifies specific differences in genetic susceptibility to glioblastoma and non-glioblastoma tumors. Nat. Genet. 49, 789–794 (2017).
15. Michailidou, K. et al. Association analysis identifies 65 new breast cancer risk loci. Nature 551, 92–94 (2017).
16. Milne, R. L. et al. Identification of ten variants associated with risk of estrogen-receptor-negative breast cancer. Nat. Genet. 49, 1767–1778 (2017).
17. Phelan, C. M. et al. Identification of 12 new susceptibility loci for different histotypes of epithelial ovarian cancer. Nat. Genet. 49, 680–691 (2017).
18. Billingsley, K. J. et al. Long-read sequencing of hundreds of diverse brains provides insight into the impact of structural variation on gene expression and DNA methylation. Preprint at https://doi.org/10.1101/2024.12.16.628723 (2024).
19. Kilian, A. et al. Isolation of a candidate human telomerase catalytic subunit gene, which reveals complex splicing patterns in different cell types. Hum. Mol. Genet. 6, 2011–2019 (1997).
20. Gomez, D., Lemarteleur, T., Lacroix, L., Mailliet, P., Mergny, J. L. & Riou, J. F. Telomerase downregulation induced by the G-quadruplex ligand 12459 in A549 cells is mediated by hTERT RNA alternative splicing. Nucleic Acids Res. 32, 371–379 (2004).
21. Li, G. et al. Alternative splicing of human telomerase reverse transcriptase in gliomas and its modulation mediated by CX-5461. J. Exp. Clin. Cancer Res. 37, 78 (2018).
22. De Cian, A. et al. Reevaluation of telomerase inhibition by quadruplex ligands and their mechanisms of action. Proc. Natl. Acad. Sci. USA 104, 17347–17352 (2007).
23. Grande, B. M. et al. Genome-wide discovery of somatic coding and noncoding mutations in pediatric endemic and sporadic Burkitt lymphoma. Blood 133, 1313–1324 (2019).
24. Bell, R. J. et al. Understanding TERT promoter mutations: A common path to immortality. Mol. Cancer Res. 14, 315–323 (2016).
25. Vinagre, J. et al. Frequency of TERT promoter mutations in human cancers. Nat. Commun. 4, 2185 (2013).
26. Lam, G., Xian, R. R., Li, Y., Burns, K. H. & Beemon, K. L. Lack of TERT promoter mutations in human B-cell non-Hodgkin Lymphoma. Genes 7, https://doi.org/10.3390/genes7110093 (2016).
27. Killedar, A. et al. A common cancer risk-associated allele in the hTERT locus encodes a dominant negative inhibitor of telomerase. PLoS Genet. 11, e1005286 (2015).
28. Listerman, I., Sun, J., Gazzaniga, F. S., Lukas, J. L. & Blackburn, E. H. The major reverse transcriptase-incompetent splice variant of the human telomerase protein inhibits telomerase activity but protects from apoptosis. Cancer Res. 73, 2817–2828 (2013).
29. Gonzalez, V. M., Fuertes, M. A., Alonso, C. & Perez, J. M. Is cisplatin-induced cell death always produced by apoptosis? Mol. Pharm. 59, 657–663 (2001).
30. Zamzami, N. et al. Mitochondrial control of nuclear apoptosis. J. Exp. Med. 183, 1533–1544 (1996).
31. O'Malley, J., Kumar, R., Inigo, J., Yadava, N. & Chandra, D. Mitochondrial Stress Response and Cancer. Trends Cancer 6, 688–701 (2020).
32. Machiela, M. J. et al. GWAS Explorer: an open-source tool to explore, visualize, and access GWAS summary statistics in the PLCO Atlas. Sci. Data 10, 25 (2023).
33. Noureen, N. et al. Integrated analysis of telomerase enzymatic activity unravels an association with cancer stemness and proliferation. Nat. Commun. 12, 139 (2021).
34. Malta, T. M. et al. Machine Learning Identifies Stemness Features Associated with Oncogenic Dedifferentiation. Cell 173, 338–354 (2018).
35. Barthel, F. P. et al. Systematic analysis of telomere length and somatic alterations in 31 cancer types. Nat. Genet. 49, 349–357 (2017).
36. Taub, M. A. et al. Genetic determinants of telomere length from 109,122 ancestrally diverse whole-genome sequences in TOPMed. Cell Genom. 2, https://doi.org/10.1016/j.xgen.2021.100084 (2022).
37. Codd, V. et al. Identification of seven loci affecting mean telomere length and their association with disease. Nat. Genet. 45, 422–427 (2013).
38. Gadalla, S. M. et al. Donor telomere length and causes of death after unrelated hematopoietic cell transplantation in patients with marrow failure. Blood 131, 2393–2398 (2018).
39. Tomasetti, C. & Vogelstein, B. Cancer etiology. Variation in cancer risk among tissues can be explained by the number of stem cell divisions. Science 347, 78–81 (2015).
40. Tomasetti, C., Li, L. & Vogelstein, B. Stem cell divisions, somatic mutations, cancer etiology, and cancer prevention. Science 355, 1330–1334 (2017).
41. Wu, S., Powers, S., Zhu, W. & Hannun, Y. A. Substantial contribution of extrinsic risk factors to cancer development. Nature 529, 43–47 (2016).
42. von Zglinicki, T., Saretzki, G., Docke, W. & Lotze, C. Mild hyperoxia shortens telomeres and inhibits proliferation of fibroblasts: a model for senescence? Exp. Cell Res. 220, 186–193 (1995).
43. Bree, R. T. et al. Cellular longevity: role of apoptosis and replicative senescence. Biogerontology 3, 195–206 (2002).
44. Dalghi, M. G., Montalbetti, N., Carattino, M. D. & Apodaca, G. The Urothelium: Life in a Liquid Environment. Physiol. Rev. 100, 1621–1705 (2020).
45. Boldrini, M. et al. Human hippocampal neurogenesis persists throughout aging. Cell Stem Cell 22, 589–599.e585 (2018).
46. Killela, P. J. et al. TERT promoter mutations occur frequently in gliomas and a subset of tumors derived from cells with low rates of self-renewal. Proc. Natl. Acad. Sci. USA 110, 6021–6026 (2013).
47. Lin, S. et al. Distributed hepatocytes expressing telomerase repopulate the liver in homeostasis and injury. Nature 556, 244–248 (2018).
48. Ahmed, S. et al. Telomerase does not counteract telomere shortening but protects mitochondrial function under oxidative stress. J. Cell Sci. 121, 1046–1053 (2008).
49. Masutomi, K. et al. The telomerase reverse transcriptase regulates chromatin state and DNA damage responses. Proc. Natl. Acad. Sci. USA 102, 8222–8227 (2005).
50. Maciejowski, J. & de Lange, T. Telomeres in cancer: tumour suppression and genome instability. Nat. Rev. Mol. Cell Biol. 18, 175–186 (2017).
51. Schneider, C. V. et al. Association of Telomere Length With Risk of Disease and Mortality. JAMA Intern. Med. 182, 291–300 (2022).
52. Savage, S. A., Gadalla, S. M. & Chanock, S. J. The long and short of telomeres and cancer association studies. J. Natl. Cancer Inst. 105, 448–449 (2013).
53. Telomeres Mendelian Randomization Collaboration et al. Association between telomere length and risk of cancer and non-neoplastic diseases: A Mendelian randomization study. JAMA Oncol. 3, 636–651 (2017).
54. Kirkwood, T. B. Evolution of ageing. Nature 270, 301–304 (1977).
55. Toupance, S. et al. Ovarian telomerase and female fertility. Biomedicines 9, 842 (2021).
56. Jasienska, G. Costs of reproduction and ageing in the human female. Philos. Trans. R. Soc. Lond. B Biol. Sci. 375, 20190615 (2020).
57. Wulfridge, P. & Sarma, K. Intertwining roles of R-loops and G-quadruplexes in DNA repair, transcription and genome organization. Nat. Cell Biol. 26, 1025–1036 (2024).
58. Thomas, N. et al. Genetic subgroups inform on pathobiology in adult and pediatric Burkitt lymphoma. Blood 141, 904–916 (2023).
59. Kolmogorov, M. et al. Scalable Nanopore sequencing of human genomes provides a comprehensive view of haplotype-resolved variation and methylation. Nat. Methods 20, 1483–1492 (2023).
60. Byrska-Bishop, M. et al. High-coverage whole-genome sequencing of the expanded 1000 Genomes Project cohort including 602 trios. Cell 185, 3426–3440 (2022).
61. Lin, P. et al. A new statistic to evaluate imputation reliability. PLoS ONE 5, e9697 (2010).
62. Hasson, M. A. et al. Design and evolution of the data management systems in the Prostate, Lung, Colorectal and Ovarian (PLCO) Cancer Screening Trial. Control. Clin. Trials 21, 329S–348S (2000).
63. Bycroft, C. et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 562, 203–209 (2018).
64. Codd, V. et al. Measurement and initial characterization of leukocyte telomere length in 474,074 participants in UK Biobank. Nat. Aging 2, 170–179 (2022).
65. Korn, J. M. et al. Integrated genotype calling and association analysis of SNPs, common copy number polymorphisms and rare CNVs. Nat. Genet. 40, 1253–1260 (2008).
66. Taliun, D. et al. Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program. Nature 590, 290–299 (2021).
67. Gao, G. F. et al. Before and after: Comparison of legacy and harmonized TCGA genomic data commons' data. Cell Syst. 9, 24–34 (2019).
68. Chari, R., Yeo, N. C., Chavez, A. & Church, G. M. sgRNA Scorer 2.0: A species-independent model to predict CRISPR/Cas9 activity. ACS Synth. Biol. 6, 902–904 (2017).
69. Adikusuma, F., Pfitzner, C. & Thomas, P. Q. Versatile single-step-assembly CRISPR/Cas9 vectors for dual gRNA expression. PLoS ONE 12, e0187236 (2017).
70. Ewels, P., Magnusson, M., Lundin, S. & Kaller, M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics 32, 3047–3048 (2016).
71. Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).
72. Brazda, V. et al. G4Hunter web application: a web server for G-quadruplex prediction. Bioinformatics 35, 3493–3495 (2019).
73. Chambers, V. S. et al. High-throughput sequencing of DNA G-quadruplex structures in the human genome. Nat. Biotechnol. 33, 877–881 (2015).
74. Marsico, G. et al. Whole genome experimental maps of DNA G-quadruplexes in multiple species. Nucleic Acids Res. 47, 3862–3874 (2019).
75. Rieger, A. M., Nelson, K. L., Konowalchuk, J. D. & Barreda, D. R. Modified annexin V/propidium iodide apoptosis assay for accurate assessment of cell death. J. Vis. Exp. https://doi.org/10.3791/2597 (2011).
76. Goldman, M. J. et al. Visualizing and interpreting cancer genomics data via the Xena platform. Nat. Biotechnol. 38, 675–678 (2020).
77. GTEx Consortium. The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science 369, 1318–1330 (2020).
78. Mafessoni, F. et al. A high-coverage Neandertal genome from Chagyrskaya Cave. Proc. Natl. Acad. Sci. USA 117, 15132–15136 (2020).
79. Prufer, K. et al. A high-coverage Neandertal genome from Vindija Cave in Croatia. Science 358, 655–658 (2017).
80. Donohue, L. K. H. et al. A cis-regulatory lexicon of DNA motif combinations mediating cell-type-specific gene regulation. Cell Genom. 2, 100191 (2022).
81. Florez-Vargas, O. et al. Genetic regulation of TERT splicing affects cancer risk by altering cellular longevity and replicative potential. Zenodo https://doi.org/10.5281/zenodo.14633198 (2025).
Acknowledgements
This work was supported by the Intramural Research Programs of the
Division of Cancer Epidemiology and Genetics (DCEG) and the Center
for Cancer Research (CCR), the National Cancer Institute, and the Center
for Alzheimer's and Related Dementias (CARD) within the Intramural
Research Program of the National Institute on Aging and the National
Institute of Neurological Disorders and Stroke (1ZIAAG000538). BLGSP
was funded in part by the Foundation for Burkitt Lymphoma Research
(http://www.foundationforburkittlymphoma.org) and with U.S. Federal
funds from the National Cancer Institute, National Institutes of Health,
under Contract No. HHSN261200800001E and Contracts No.
HHSN261201100063C and No. HHSN261201100007I (DCEG). The presented
results are, in part, based upon data generated by the TCGA
Research Network. The work was conducted using the UK Biobank
resource (application 92005). The UK Biobank was established by the
Wellcome Trust, the Medical Research Council, the United Kingdom
Department of Health, and the Scottish Government. The UK Biobank
has also received funding from the Welsh Assembly Government, the
British Heart Foundation, and Diabetes UK. The CIBMTR is supported
primarily by the Public Health Service U24CA076518 from the NCI, the
National Heart, Lung and Blood Institute (NHLBI), and the National
Institute of Allergy and Infectious Diseases (NIAID); 75R60222C00011
from the Health Resources and Services Administration (HRSA); and
N00014-23-1-2057 and N00014-24-1-2057 from the Office of Naval
Research. The Cancer Genomics Research (CGR) Laboratory and Genome
Modification Core are funded with Federal funds from the National
Cancer Institute under Contract No. 75N910D00024. B.P. and M.M.
acknowledge the support of the Chan Zuckerberg Initiative and the
National Institutes of Health grants U24HG011853 and OT2OD033761 to
B.P. M.H.H. was supported by the NCI Intramural Continuing Umbrella
for Research Experiences (iCURE) program. We thank Drs. Helen Piont-
kivska, and the members of the Laboratory of Translational Genomics for
comments and discussions. We thank Dr. Tatiana Karpova, Optical
Microscopy Core (NCI/CCR/LRBGE), for helping with super-resolution
imaging. The opinions expressed by the authors are their own and
should not be interpreted as representing the official viewpoint of the
U.S. Department of Health and Human Services, the National Institutes
of Health, or the National Cancer Institute. Open Access funding was
provided by the National Institutes of Health (NIH).
Author contributions
O.F.-V. and L.P.-O. conceived the study; O.F.-V., C.-H. L., and C.Z. per-
formed the data analysis; M.H., M.H.H., B.W.P., and K.F. performed the
experiments; C.B., K.J.B., M.K., M.M., and B.P. generated the long-read
genome assemblies; K.F., M.H.H., K.J., W.L., and K.T. performed the long-
read targeted sequencing; R.C., J.S., M.J.M., S.J.C., S.M.G., S.A.S., and
S.M.M. provided reagents, data, samples and interpretations of the
results; O.F.-V. and L.P.-O. led the manuscript writing with the input of all
the authors; and L.P.-O. supervised the project. Correspondence to
Ludmila Prokunina-Olsson (prokuninal@mail.nih.gov).
Funding
Open access funding provided by the National Institutes of Health.
Competing interests
The authors declare no competing interests.
Additional information
Supplementary information The online version contains
supplementary material available at
https://doi.org/10.1038/s41467-025-56947-y.
Correspondence and requests for materials should be addressed to
Ludmila Prokunina-Olsson.
Peer review information Nature Communications thanks Gabriele Sar-
etzki, and the other anonymous reviewer(s) for their contribution to the
peer review of this work. A peer review file is available.
Reprints and permissions information is available at
http://www.nature.com/reprints
Publisher's note Springer Nature remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.
Open Access This article is licensed under a Creative Commons
Attribution 4.0 International License, which permits use, sharing,
adaptation, distribution and reproduction in any medium or format, as
long as you give appropriate credit to the original author(s) and the
source, provide a link to the Creative Commons licence, and indicate if
changes were made. The images or other third party material in this
article are included in the article's Creative Commons licence, unless
indicated otherwise in a credit line to the material. If material is not
included in the article's Creative Commons licence and your intended
use is not permitted by statutory regulation or exceeds the permitted
use, you will need to obtain permission directly from the copyright
holder. To view a copy of this licence, visit http://creativecommons.org/
licenses/by/4.0/.
© This is a U.S. Government work and not under copyright protection in
the US; foreign copyright protection may apply 2025
1 Laboratory of Translational Genomics, DCEG, National Cancer Institute, Rockville, MD, USA.
2 Cancer Genomics Research Laboratory, Leidos Biomedical Research, Frederick National Laboratory for Cancer Research, Frederick, MD, USA.
3 Center for Alzheimer's and Related Dementias, National Institute on Aging and National Institute of Neurological Disorders and Stroke, Bethesda, MD, USA.
4 Cancer Data Science Laboratory, CCR, National Cancer Institute, Bethesda, MD, USA.
5 UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA.
6 Genome Modification Core, Laboratory Animal Sciences Program, Leidos Biomedical Research, Frederick National Laboratory for Cancer Research, Frederick, MD, USA.
7 Chemical Biology Laboratory, CCR, National Cancer Institute, Frederick, MD, USA.
8 Integrative Tumor Epidemiology Branch, DCEG, National Cancer Institute, Rockville, MD, USA.
9 Laboratory of Genetic Susceptibility, DCEG, National Cancer Institute, Rockville, MD, USA.
10 Clinical Genetics Branch, DCEG, National Cancer Institute, Rockville, MD, USA.
11 Infections and Immunoepidemiology Branch, DCEG, National Cancer Institute, Rockville, MD, USA. e-mail: prokuninal@mail.nih.gov
Article https://doi.org/10.1038/s41467-025-56947-y
Nature Communications | (2025) 16:1676