Abstract and Figures

Cancer is driven by genetic change, and the advent of massively parallel sequencing has enabled systematic documentation of this variation at the whole-genome scale1,2,3. Here we report the integrative analysis of 2,658 whole-cancer genomes and their matching normal tissues across 38 tumour types from the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium of the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA). We describe the generation of the PCAWG resource, facilitated by international data sharing using compute clouds. On average, cancer genomes contained 4–5 driver mutations when combining coding and non-coding genomic elements; however, in around 5% of cases no drivers were identified, suggesting that cancer driver discovery is not yet complete. Chromothripsis, in which many clustered structural variants arise in a single catastrophic event, is frequently an early event in tumour evolution; in acral melanoma, for example, these events precede most somatic point mutations and affect several cancer-associated genes simultaneously. Cancers with abnormal telomere maintenance often originate from tissues with low replicative activity and show several mechanisms of preventing telomere attrition to critical levels. Common and rare germline variants affect patterns of somatic mutation, including point mutations, structural variants and somatic retrotransposition. A collection of papers from the PCAWG Consortium describes non-coding mutations that drive cancer beyond those in the TERT promoter⁴; identifies new signatures of mutational processes that cause base substitutions, small insertions and deletions and structural variation5,6; analyses timings and patterns of tumour evolution⁷; describes the diverse transcriptional consequences of somatic mutation on splicing, expression levels, fusion genes and promoter activity8,9; and evaluates a range of more-specialized features of cancer genomes8,10,11,12,13,14,15,16,17,18.
Association between rare germline PTVs in protein-coding genes and somatic mutational phenotypes a–d, f, Data are based on two-sided rare-variant association testing across n = 2,583 patients, with a stringent P value threshold of P < 2.5 × 10⁻⁶ used to mitigate multiple-hypothesis testing (significant genes marked with coloured circles). Blue/red circles mark genes that decrease/increase somatic mutation rates. The black line represents the identity line that would be followed if the observed P values followed the null expectation, with the shaded area showing the 95% confidence intervals. a, QQ plots for the proportion of somatic SV deletions, tandem duplications, inversions and translocation in cancer genomes. b, QQ plots for the proportion of somatic SV deletions in cancer genomes stratified by four size groups (1–10 kb, 10–100 kb, 100–1,000 kb and >1,000 kb). c, QQ plots for the proportion of somatic SV tandem duplications in cancer genomes stratified by four size groups (1–10 kb, 10–100 kb, 100–1,000 kb and >1,000 kb). d, QQ plot for the presence or absence of somatic SV templated insertion (cycles) in cancer genomes. e, Number of SV-templated insertion cycles in PCAWG tumours with germline BRCA1 PTVs. Only histological samples with at least one germline BRCA1 PTV carrier are shown (n = 1,095 patients combined). The box denotes the interquartile range, with the median marked as a horizontal line. The whiskers extend as far as the range or 1.5× the interquartile range, whichever is less. Outliers are shown as points. f, QQ plot for somatic CpG mutagenesis in cancer genomes based on NpCpG motif analysis. g, Violin plots show estimated densities of the proportion of somatic CpG mutations in PCAWG donors with germline MBD4 and BRCA2 PTVs. The box denotes the interquartile range, with the median marked as a white point. The whiskers extend as far as the range or 1.5× the interquartile range, whichever is less. 
Two-sided hypothesis testing, not corrected for multiple testing, was performed using linear regression models. h, Replication of germline MBD4 and BRCA2 PTV associations with somatic CpG mutagenesis in TCGA whole-exome sequencing donors. Violin plots show the estimated density of the proportion of somatic CpG mutations in TCGA exomes with germline MBD4 and BRCA2 PTVs. The box denotes the interquartile range, with the median marked as a white point. The whiskers extend as far as the range or 1.5× the interquartile range, whichever is less. Two-sided hypothesis testing, not corrected for multiple testing, was performed using linear-regression models. i, Correlation between MBD4 expression and somatic CpG mutagenesis in primary solid PCAWG tumours. Hypothesis testing was two-sided and not corrected for multiple testing, using linear-regression models. The box denotes the interquartile range, with the median marked as a horizontal line. The whiskers extend as far as the range or 1.5× the interquartile range, whichever is less. j, Data are mean ± s.e.m. across n = 20 tumour types. The dashed black line shows the fitted line to the data, estimated using linear-regression models. Hypothesis testing was two-sided and not corrected for multiple testing, using Spearman’s rank correlations. k, MBD4 effect sizes (open circles) with 95% confidence intervals (error bars) for individual cancer types were estimated using linear-regression analysis after (if available) accounting for sex, age at diagnosis (young/old) and ICGC project. Hypothesis testing was two-sided and not corrected for multiple testing.
… 
Germline MEI callset a, Left, dots show the number of transductions promoted by each hot element in individual samples. Arrows highlight retrotransposition burst. Right, the contribution of each hot locus is represented. The total number of transductions mediated by each source element is shown on the right. b, Source L1 activity rate (that is, measured as the average number of transductions mediated by an element) versus the percentage of samples with retrotransposition activity in which the germline element is active. For visualization purposes, extreme points observed for a source L1 with an activity rate of 49 and for a L1 active in 31% of the samples are shown at ‘≥20’ and ‘≥10’, respectively. c, Contrasting allele frequencies for Strombolian and Plinian source loci (sample sizes shown under each axis label). The box denotes the interquartile range, with the median marked as a white point. The whiskers extend as far as the range or 1.5× the interquartile range, whichever is less. Hypothesis testing was performed using two-sided Mann–Whitney U-tests without correction for multiple tests. d, Numbers of active and hot source L1 elements per donor. Data are mean ± s.d. number of elements per donor. e, The novel Plinian source element on 7p12.3 mediates 72 transductions among only 6 cancer samples. This generates a transduction that induces the deletion of the tumour-suppressor gene CDKN2A. f, Violin plots show the estimated number of distinct germline MEI alleles per PCAWG donor. The box denotes the interquartile range, with the median marked as a white point. The whiskers extend as far as the range or 1.5× the interquartile range, whichever is less. Donors are grouped according to their genetic ancestry: AFR, African; AMR, admixed American; EAS, East Asian; EUR, European; SAS, South Asian. Sample sizes are shown under each axis label. 
g, For each type of MEI (L1, Alu and SVA) identified both in PCAWG and in the 1000 Genomes Project (1KGP), the correlations between allele frequency estimates per ancestry derived from both projects are displayed in a blue (0) to red (1) coloured gradient. n = 2,583 PCAWG patients. Two-sided hypothesis testing was performed using Spearman’s rank correlations without correction for multiple tests. h, Example correlation between MEI allele frequencies derived from PCAWG and the 1000 Genomes Project for individuals with European ancestry (n = 1,201 patients in PCAWG). Two-sided hypothesis testing was performed using Spearman’s rank correlations without correction for multiple tests. i, Evaluation of TraFiC-mem false-discovery rate on a liver hepatocellular carcinoma sample (DO50807) and a cell line (NCI-BL2087) sequenced using single-molecule sequencing with MinION (Oxford Nanopore). For each allele frequency bin (common, >5%; low frequency, 1–5%; rare, <1%), the percentage of events supported by N long reads is represented (N ranges from 0–1 to more than 5). MEIs supported by at least two Nanopore reads were considered to be true positives (blue palette) and were classified as false positives (red) otherwise. The total number of germline MEIs per allele frequency bin is shown on the right. j, Correlation between predicted MEI lengths from Illumina and Nanopore data. Two-sided hypothesis testing was performed using Spearman’s rank correlations without correction for multiple testing.
… 
Different mechanisms of telomere lengthening in cancer a, Scatter plot showing the four clusters of tumour-specific telomere patterns identified across PCAWG samples, together with the clusters of matched normal samples, generated by t-distributed stochastic neighbour embedding. Circles represent tumour samples and triangles represent matched normal samples. Points are coloured by tissue of origin. Data are based on n = 2,518 tumour samples and their matched normal samples. b, Patterns of comutation of the relevant driver mutations across individual patients. Columns in plot represent individual patients, coloured by type of abnormality observed. c, Distribution of clonality of driver mutations in genes relevant to telomere maintenance across clusters. Clonal [early], clonal mutations that occurred before duplications involving the relevant chromosome (including whole-genome duplications); clonal [late], clonal mutations that occurred after such duplications; and clonal [NA], mutations that occurred when no duplication was observed. d, Relationship between the estimated number of stem cell divisions per year and rate of telomere maintenance abnormalities across tumour types. The analysis uses data on estimated rates of stem cell division per year across n = 19 tissue types previously collated from the literature⁸². Tumour types are coloured according to the scheme shown in Extended Data Fig. 3. Two-sided hypothesis testing was performed using likelihood ratio tests on Poisson regression models with no correction for multiple tests.
… 
This content is subject to copyright. Terms and conditions apply.
82 | Nature | Vol 578 | 6 February 2020
Article
Pan-cancer analysis of whole genomes
The ICGC/TCGA Pan-Cancer Analysis of Whole Genomes Consortium
Cancer is driven by genetic change, and the advent of massively parallel sequencing has
enabled systematic documentation of this variation at the whole-genome scale1–3. Here
we report the integrative analysis of 2,658 whole-cancer genomes and their matching
normal tissues across 38 tumour types from the Pan-Cancer Analysis of Whole Genomes
(PCAWG) Consortium of the International Cancer Genome Consortium (ICGC) and The
Cancer Genome Atlas (TCGA). We describe the generation of the PCAWG resource,
facilitated by international data sharing using compute clouds. On average, cancer
genomes contained 4–5 driver mutations when combining coding and non-coding
genomic elements; however, in around 5% of cases no drivers were identified,
suggesting that cancer driver discovery is not yet complete. Chromothripsis, in which
many clustered structural variants arise in a single catastrophic event, is frequently an
early event in tumour evolution; in acral melanoma, for example, these events precede
most somatic point mutations and affect several cancer-associated genes
simultaneously. Cancers with abnormal telomere maintenance often originate from
tissues with low replicative activity and show several mechanisms of preventing
telomere attrition to critical levels. Common and rare germline variants affect patterns
of somatic mutation, including point mutations, structural variants and somatic
retrotransposition. A collection of papers from the PCAWG Consortium describes
non-coding mutations that drive cancer beyond those in the TERT promoter4; identifies
new signatures of mutational processes that cause base substitutions, small insertions
and deletions and structural variation5,6; analyses timings and patterns of tumour
evolution7; describes the diverse transcriptional consequences of somatic mutation on
splicing, expression levels, fusion genes and promoter activity8,9; and evaluates a range
of more-specialized features of cancer genomes8,10–18.
Cancer is the second most-frequent cause of death worldwide,
killing more than 8 million people every year; the incidence of cancer
is expected to increase by more than 50% over the coming decades19,20.
‘Cancer’ is a catch-all term used to denote a set of diseases characterized
by autonomous expansion and spread of a somatic clone. To achieve
this behaviour, the cancer clone must co-opt multiple cellular pathways
that enable it to disregard the normal constraints on cell growth, modify
the local microenvironment to favour its own proliferation, invade
through tissue barriers, spread to other organs and evade immune sur-
veillance21. No single cellular program directs these behaviours. Rather,
there is a large pool of potential pathogenic abnormalities from which
individual cancers draw their own combinations: the commonalities
of macroscopic features across tumours belie a vastly heterogeneous
landscape of cellular abnormalities.
This heterogeneity arises from the stochastic nature of Darwinian
evolution. There are three preconditions for Darwinian evolution:
characteristics must vary within a population; this variation must be
heritable from parent to offspring; and there must be competition for
survival within the population. In the context of somatic cells, heritable
variation arises from mutations acquired stochastically throughout
life, notwithstanding additional contributions from germline and
epigenetic variation. A subset of these mutations alter the cellular
phenotype, and a small subset of those variants confer an advantage
on clones during the competition to escape the tight physiological
controls wired into somatic cells. Mutations that provide a selective
advantage to the clone are termed driver mutations, as opposed to
selectively neutral passenger mutations.
Initial studies using massively parallel sequencing demonstrated the
feasibility of identifying every somatic point mutation, copy-number
change and structural variant (SV) in a given cancer1–3. In 2008, recog-
nizing the opportunity that this advance in technology provided, the
global cancer genomics community established the ICGC with the
goal of systematically documenting the somatic mutations that drive
common tumour types22.
The pan-cancer analysis of whole genomes
The expansion of whole-genome sequencing studies from individual
ICGC and TCGA working groups presented the opportunity to under-
take a meta-analysis of genomic features across tumour types. To
achieve this, the PCAWG Consortium was established. A Technical
Working Group implemented the informatics analyses by aggregating
the raw sequencing data from different working groups that studied
individual tumour types, aligning the sequences to the human genome
and delivering a set of high-quality somatic mutation calls for downstream
analysis (Extended Data Fig. 1). Given the recent meta-analysis of exome data
from the TCGA Pan-Cancer Atlas23–25, scientific working groups concentrated
their efforts on analyses best-informed by whole-genome sequencing data.
https://doi.org/10.1038/s41586-020-1969-6
Received: 29 July 2018
Accepted: 11 December 2019
Published online: 5 February 2020
Open access
A list of members and their affiliations appears in the online version of the paper and lists of working groups appear in the Supplementary Information.
Content courtesy of Springer Nature, terms of use apply. Rights reserved
We collected genome data from 2,834 donors (Extended Data Table 1), of which
176 were excluded after quality assurance. A further 75 had minor issues that
could affect some of the analyses (grey-listed donors) and 2,583 had data of
optimal quality (white-listed donors) (Supplementary Table 1). Across the
2,658 white- and grey-listed donors, whole-genome sequencing data were
available from 2,605 primary tumours and 173 metastases or local recurrences.
Mean read coverage was 39× for normal samples, whereas tumours had a bimodal
coverage distribution with modes at 38× and 60× (Supplementary Fig. 1).
RNA-sequencing data were available for 1,222 donors. The final cohort
comprised 1,469 men (55%) and 1,189 women (45%), with a mean age of 56 years
(range, 1–90 years) across 38 tumour types (Extended Data Table 1 and
Supplementary Table 1).
To identify somatic mutations, we analysed all 6,835 samples using a uniform
set of algorithms for alignment, variant calling and quality control
(Extended Data Fig. 1, Supplementary Fig. 2 and Supplementary Methods 2). We
used three established pipelines to call somatic single-nucleotide variations
(SNVs), small insertions and deletions (indels), copy-number alterations
(CNAs) and SVs. Somatic retrotransposition events, mitochondrial DNA mutations
and telomere lengths were also called by bespoke algorithms. RNA-sequencing
data were uniformly processed to call transcriptomic alterations. Germline
variants identified by the three separate pipelines included single-nucleotide
polymorphisms, indels, SVs and mobile-element insertions (Supplementary
Table 2).
The requirement to uniformly realign and call variants on approximately 5,800
whole genomes presented considerable computational challenges, and raised
ethical issues owing to the use of data from different jurisdictions
(Extended Data Table 2). We used cloud computing26,27 to distribute alignment
and variant calling across 13 data centres on 3 continents (Supplementary
Table 3). Core pipelines were packaged into Docker containers28 as
reproducible, stand-alone packages, which we have made available for
download. Data repositories for raw and derived datasets, together with
portals for data visualization and exploration, have also been created (Box 1
and Supplementary Table 4).
Benchmarking of genetic variant calls
To benchmark mutation calling, we ran the 3 core pipelines, together with
10 additional pipelines, on 63 representative tumour–normal genome pairs
(Supplementary Note 1). For 50 of these cases, we performed validation by
hybridization of tumour and matched normal DNA to a custom bait set with deep
sequencing29. The 3 core somatic variant-calling pipelines had individual
estimates of sensitivity of 80–90% to detect a true somatic SNV called by any
of the 13 pipelines; more
Box 1
Online resources for data access, visualization and analysis
The PCAWG landing page (http://docs.icgc.org/pcawg) provides
links to several data resources for interactive online browsing,
analysis and download of PCAWG data and results (Supplementary
Table 4).
Direct download of PCAWG data
Aligned PCAWG read data in BAM format are also available at
the European Genome Phenome Archive (EGA;
https://www.ebi.ac.uk/ega/search/site/pcawg under accession number
EGAS00001001692). In addition, all open-tier PCAWG genomics
data, as well as reference datasets used for analysis, can be
downloaded from the ICGC Data Portal at
http://docs.icgc.org/pcawg/data/. Controlled-tier genomic data, including
SNVs and indels that originated from TCGA projects (in VCF format) and
aligned reads (in BAM format) can be downloaded using the
Score (https://www.overture.bio/) software package, which has
accelerated and secure file transfer, as well as BAM slicing facilities
to selectively download defined regions of genomic alignments.
PCAWG computational pipelines
The core alignment, somatic variant-calling, quality-control and
variant consensus-generation pipelines used by PCAWG have each
been packaged into portable cross-platform images using the
Dockstore system84 and released under an Open Source licence that
enables unrestricted use and redistribution. All PCAWG Dockstore
images are available to the public at
https://dockstore.org/organizations/PCAWG/collections/PCAWG.
ICGC Data Portal
The ICGC Data Portal85 (https://dcc.icgc.org) serves as the main
entry point for accessing PCAWG datasets with a single uniform web
interface and a high-performance data-download client. This uniform
interface provides users with easy access to the myriad of PCAWG
sequencing data and variant calls that reside in many repositories
and compute clouds worldwide. Streaming technology86 provides
users with high-level visualizations in real time of BAM and VCF files
stored remotely on the Cancer Genome Collaboratory.
UCSC Xena
UCSC Xena87 (https://pcawg.xenahubs.net) visualizes all PCAWG
primary results, including copy-number, gene-expression, gene-fusion
and promoter-usage alterations, simple somatic mutations, large
somatic structural variations, mutational signatures and phenotypic
data. These open-access data are available through a public Xena
hub, and consensus simple somatic mutations can be loaded to the
local computer of a user via a private Xena hub. Kaplan–Meier plots,
histograms, box plots, scatter plots and transcript-specific views offer
additional visualization options and statistical analyses.
The Expression Atlas
The Expression Atlas (https://www.ebi.ac.uk/gxa/home) contains
RNA-sequencing and expression microarray data for querying
gene expression across tissues, cell types, developmental stages
and/or experimental conditions88. Two different views of the data
are provided: summarized expression levels for each tumour type
and gene expression at the level of individual samples, including
reference-gene expression datasets for matching normal tissues.
PCAWG Scout
PCAWG Scout (http://pcawgscout.bsc.es/) provides a framework for
-omics worklow and website templating to generate on-demand,
in-depth analyses of the PCAWG data that are openly available to the
whole research community. Views of protected data are available
that still safeguard sensitive data. Through the PCAWG Scout web
interface, users can access an array of reports and visualizations
that leverage on-demand bioinformatic computing infrastructure
to produce results in real time, allowing users to discover trends as
well as form and test hypotheses.
Chromothripsis Explorer
Chromothripsis Explorer (http://compbio.med.harvard.edu/chromothripsis/)
is a portal that allows structural variation in the
PCAWG dataset to be explored on an individual patient basis
through the use of circos plots. Patterns of chromothripsis can also
be explored in aggregated formats.
than 95% of SNV calls made by each of the core pipelines were genuine somatic
variants (Fig. 1a). For indels, a more-challenging class of variants to
identify with short-read sequencing, the 3 core algorithms had individual
sensitivity estimates in the range of 40–50%, with precision of 70–95%
(Fig. 1b). For individual SV algorithms, we estimated precision to be in the
range 80–95% for samples in the 63-sample pilot dataset.
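To make the two benchmarking metrics concrete, the following is a minimal sketch of how sensitivity and precision could be computed for one caller against a validated truth set. The function name, the variant keys and the toy data are all hypothetical; the actual validation procedure is described in Supplementary Note 1.

```python
def sensitivity_precision(calls, validated_true):
    """Return (sensitivity, precision) for one caller.

    calls: set of variants reported by the caller.
    validated_true: set of variants confirmed as genuine somatic variants
        (in the text, among variants called by any of the pipelines).
    """
    true_positives = len(calls & validated_true)
    sensitivity = (
        true_positives / len(validated_true) if validated_true else float("nan")
    )
    precision = true_positives / len(calls) if calls else float("nan")
    return sensitivity, precision


# Toy example with made-up variant keys (chromosome, position, ref, alt).
validated_true = {
    ("1", 100, "A", "T"),
    ("1", 200, "C", "G"),
    ("2", 50, "G", "A"),
    ("3", 75, "T", "C"),
}
calls = {
    ("1", 100, "A", "T"),
    ("1", 200, "C", "G"),
    ("2", 50, "G", "A"),
    ("5", 10, "A", "C"),  # not validated: counts against precision
}

sens, prec = sensitivity_precision(calls, validated_true)
print(sens, prec)  # 0.75 0.75
```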
Next, we defined a strategy to merge results from the three pipelines into
one final call-set to be used for downstream scientific analyses (Methods and
Supplementary Note 2). Sensitivity and precision of consensus somatic variant
calls were 95% (90% confidence interval, 88–98%) and 95% (90% confidence
interval, 71–99%), respectively, for SNVs (Extended Data Fig. 2). For somatic
indels, sensitivity and precision were 60% (34–72%) and 91% (73–96%),
respectively (Extended Data Fig. 2). Regarding somatic SVs, we estimate the
sensitivity of merged calls to be 90% for true calls generated by any one
pipeline; precision was estimated as 97.5%. The improvement in calling
accuracy from combining different pipelines was most noticeable in variants
with low variant allele fractions, which probably originate from tumour
subclones (Fig. 1c, d). Germline variant calls, phased using a
haplotype-reference panel, displayed a precision of more than 99% and a
sensitivity of 92–98% (Supplementary Note 2).
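One simple way to merge call sets is majority-style voting. The sketch below assumes that a 'two-plus' rule keeps any variant reported by at least two callers; that reading is an assumption made here for illustration, and the exact merging rules are those given in the Methods and Supplementary Note 2. Pipeline names and variant identifiers are placeholders.

```python
from collections import Counter


def two_plus_consensus(callsets, min_support=2):
    """Keep variants reported by at least `min_support` of the call sets.

    callsets: mapping of caller name -> iterable of variant identifiers.
    This is an assumed reading of a 'two-plus' rule, not the PCAWG code.
    """
    support = Counter()
    for calls in callsets.values():
        # A caller contributes at most one vote per variant.
        support.update(set(calls))
    return {variant for variant, n in support.items() if n >= min_support}


# Hypothetical call sets from three pipelines.
callsets = {
    "pipeline_a": {"v1", "v2", "v3"},
    "pipeline_b": {"v2", "v3", "v4"},
    "pipeline_c": {"v3", "v5"},
}

consensus = two_plus_consensus(callsets)
print(sorted(consensus))  # ['v2', 'v3']
```

Variants seen by a single caller (v1, v4 and v5 in this toy input) are dropped, which trades some sensitivity for precision.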
Analysis of PCAWG data
The uniformly generated, high-quality set of variant calls across more than
2,500 donors provided the springboard for a series of scientific working
groups to explore the biology of cancer. A comprehensive suite of companion
papers that describe the analyses and discoveries across these thematic areas
is copublished with this paper4–18 (Extended Data Table 3).
Pan-cancer burden of somatic mutations
Across the 2,583white-listed PCAWG donors, we called 43,778,859
somatic SNVs, 410,123somatic multinucleotide variants, 2,418,247
somatic indels, 288,416somatic SVs, 19,166somatic retrotransposition
events and 8,185denovo mitochondrial DNA mutations (Supplemen-
tary Table1). There was considerable heterogeneity in the burden of
somatic mutations across patients and tumour types, with a broad
correlation in mutation burden among different classes of somatic
variation (Extended Data Fig.3). Analysed at a per-patient level, this
correlation held, even when considering tumours with similar purity
and ploidy (Supplementary Fig.3). Why such correlation should apply
on a pan-cancer basis is unclear. It is likely that age has some role, as we
observe a correlation between most classes of somatic mutation and
age at diagnosis (around 190SNVs per year, P=0.02; about 22indels
per year, P=5×10−5; 1.5SVs per year, P<2×10−16; linear regression
with likelihood ratio tests; Supplementary Fig.4). Other factors are
also likely to contribute to the correlations among classes of somatic
mutation, as there is evidence that some DNA-repair defects can cause
multiple types of somatic mutation30, and a single carcinogen can cause
a range of DNA lesions31.
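The per-year rates quoted above are slopes from regressing mutation burden on age at diagnosis. As a minimal sketch of how such a slope is estimated, the following fits an ordinary least-squares line to synthetic data. The ages and counts are invented for illustration (generated with a slope of 190 to echo the SNV figure), and the likelihood ratio tests used in the paper are not reproduced here.

```python
def ols_slope_intercept(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic, noise-free data: NOT PCAWG measurements.
ages = [30, 40, 50, 60, 70]
snv_counts = [1000 + 190 * age for age in ages]

slope, intercept = ols_slope_intercept(ages, snv_counts)
print(slope, intercept)  # 190.0 1000.0
```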
Panorama of driver mutations in cancer
We extracted the subset of somatic mutations in PCAWG tumours that have high
confidence to be driver events on the basis of current knowledge. One
challenge to pinpointing the specific driver mutations in an individual
tumour is that not all point mutations in recurrently mutated
cancer-associated genes are drivers32. For genomic elements significantly
mutated in PCAWG data, we developed a ‘rank-and-cut’ approach to identify the
probable drivers (Supplementary Methods 8.1). This approach works by ranking
the observed mutations in a given genomic element based on recurrence,
estimated functional consequence and expected pattern of drivers in that
element. We then estimate the excess burden of somatic mutations in that
genomic element above that expected for the background mutation rate, and cut
the ranked mutations at this level. Mutations in each element with the
highest driver ranking were then assigned as probable drivers; those below
the threshold will probably have arisen through chance and were assigned as
probable passengers. Improvements to features that are used to rank the
mutations and the methods used to measure them will contribute to further
development of the rank-and-cut approach.
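The rank-and-cut idea can be sketched in a few lines. This is an illustration under stated assumptions, not the consortium's implementation: the scoring function (recurrence multiplied by an impact score), the record fields and the rounding of the excess burden to a whole number of mutations are all hypothetical choices made here; the features actually used are described in Supplementary Methods 8.1.

```python
def rank_and_cut(mutations, expected_background, score):
    """Split one element's mutations into probable drivers and passengers.

    mutations: list of mutation records observed in one genomic element.
    expected_background: number of mutations expected in the element from
        the background mutation rate.
    score: function mapping a mutation to a driver-likeness score
        (higher means more driver-like); hypothetical in this sketch.
    """
    ranked = sorted(mutations, key=score, reverse=True)
    # Excess burden above background, rounded to a whole mutation count
    # (the rounding is an assumption of this sketch).
    excess = max(0, round(len(ranked) - expected_background))
    return ranked[:excess], ranked[excess:]


# Toy records with invented fields and values.
mutations = [
    {"id": "m1", "recurrence": 12, "impact": 0.9},
    {"id": "m2", "recurrence": 1, "impact": 0.2},
    {"id": "m3", "recurrence": 7, "impact": 0.8},
    {"id": "m4", "recurrence": 1, "impact": 0.1},
    {"id": "m5", "recurrence": 3, "impact": 0.7},
]

drivers, passengers = rank_and_cut(
    mutations,
    expected_background=2.0,
    score=lambda m: m["recurrence"] * m["impact"],
)
print([m["id"] for m in drivers])     # ['m1', 'm3', 'm5']
print([m["id"] for m in passengers])  # ['m2', 'm4']
```

With five observed mutations and two expected by chance, the three top-ranked mutations are assigned as probable drivers and the remainder as probable passengers.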
We also needed to account for the fact that some bona fide cancer genomic
elements were not rediscovered in PCAWG data because of low statistical
power. We therefore added previously known cancer-associated genes to the
discovery set, creating a ‘compendium of mutational driver elements’
(Supplementary Methods 8.2). Then, using stringent rules to nominate driver
point mutations that affect these genomic elements on the basis of prior
knowledge33, we separated probable driver from passenger point mutations. To
cover all classes of variant, we also created a compendium of known driver
SVs, using analogous rules to identify which somatic CNAs and SVs are most
likely to act as drivers in each tumour. For probable pathogenic germline
variants, we identified all truncating germline point mutations and SVs that
affect high-penetrance germline cancer-associated genes.
This analysis defined a set of mutations that we could confidently assert,
based on current knowledge, drove tumorigenesis in the more than 2,500 tumours
of PCAWG. We found that 91% of tumours had at least one identified driver
mutation, with an average of 4.6 drivers per tumour identified, showing
extensive variation across cancer types (Fig. 2a). For coding point
mutations, the average was 2.6 drivers per tumour, similar to numbers
estimated in known cancer-associated genes in tumours in the TCGA using
analogous approaches32.
To address the frequency of non-coding driver point mutations,
we combined promoters and enhancers that are known targets of
[Fig. 1 graphic omitted. Panels a and b: sensitivity and precision (each from 0 to 1.00) with F1 contours from 0.1 to 0.8 for the individual callers. Panels c and d: accuracy (F1, precision and sensitivity) across VAF bins [0,0.1], (0.1,0.2], (0.2,0.3], (0.3,0.5] and (0.5,1] for the core callers and for the two merging methods (logistic regression and two_plus).]
Fig. 1 | Validation of variant-calling pipelines in PCAWG. a, Scatter plot of
estimated sensitivity and precision for somatic SNVs across individual
algorithms assessed in the validation exercise across n = 63 PCAWG samples.
Core algorithms included in the final PCAWG call set are shown in blue.
b, Sensitivity and precision estimates across individual algorithms for
somatic indels. c, Accuracy (precision, sensitivity and F1 score, defined as
2 × sensitivity × precision/(sensitivity + precision)) of somatic SNV calls
across variant allele fractions (VAFs) for the core algorithms. The accuracy
of two methods of combining variant calls (two-plus, which was used in the
final dataset, and logistic regression) is also shown. d, Accuracy of indel
calls across variant allele fractions.
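The F1 score defined in the caption is the harmonic mean of sensitivity and precision. As a short worked example, the sketch below applies the formula to the consensus point estimates reported in the text for SNVs (95% and 95%) and indels (60% and 91%); the function name is illustrative and the resulting F1 values are derived here, not quoted from the paper.

```python
def f1_score(sensitivity, precision):
    """F1 = 2 * sensitivity * precision / (sensitivity + precision)."""
    if sensitivity + precision == 0:
        return 0.0  # convention when both metrics are zero
    return 2 * sensitivity * precision / (sensitivity + precision)


# Consensus point estimates quoted in the text.
f1_snv = f1_score(0.95, 0.95)
f1_indel = f1_score(0.60, 0.91)

print(round(f1_snv, 3), round(f1_indel, 3))  # 0.95 0.723
```

Because F1 is a harmonic mean, the indel value sits closer to the lower of the two inputs (the 60% sensitivity) than an arithmetic mean would.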
non-coding drivers34–37 with those newly discovered in PCAWG data; this is
reported in a companion paper4. Using this approach, only 13% (785 out of
5,913) of driver point mutations were non-coding in PCAWG. Nonetheless, 25% of
PCAWG tumours bear at least one putative non-coding driver point mutation,
and one third of these mutations (237 out of 785) affected the TERT promoter
(9% of PCAWG tumours). Overall, non-coding driver point mutations are less
frequent than coding driver mutations. With the exception of the TERT
promoter, individual enhancers and promoters are only infrequent targets of
driver mutations4.
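The proportions in this paragraph follow directly from the quoted counts. The sketch below recomputes them; the variable names are illustrative, and the coding count is simply the remainder implied by the two totals.

```python
# Counts quoted in the text; variable names are illustrative.
all_driver_point_mutations = 5913
noncoding_drivers = 785
tert_promoter_drivers = 237

# Share of driver point mutations that are non-coding (reported as 13%).
frac_noncoding = noncoding_drivers / all_driver_point_mutations

# Share of non-coding drivers at the TERT promoter (reported as one third).
frac_tert = tert_promoter_drivers / noncoding_drivers

# Remaining driver point mutations, which are coding by implication.
coding_drivers = all_driver_point_mutations - noncoding_drivers

print(f"{frac_noncoding:.1%} {frac_tert:.1%} {coding_drivers}")  # 13.3% 30.2% 5128
```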
Across tumour types, SVs and point mutations have different relative
contributions to tumorigenesis. Driver SVs are more prevalent
in breast adenocarcinomas (6.4±3.7 SVs (mean±s.d.) compared
with 2.2±1.3 point mutations; P<1×10−16, Mann–Whitney U-test)
and ovary adenocarcinomas (5.8±2.6 SVs compared with 1.9±1.0
point mutations; P<1×10−16), whereas driver point mutations have
a larger contribution in colorectal adenocarcinomas (2.4±1.4 SVs
compared with 7.4±7.0 point mutations; P=4×10−10) and mature
B cell lymphomas (2.2±1.3 SVs compared with 6±3.8 point mutations;
P<1×10−16), as previously shown38. Across tumour types, there
are differences in which classes of mutation affect a given genomic
element (Fig.2b).
We confirmed that many driver mutations that affect tumour-suppressor
genes are two-hit inactivation events (Fig.2c). For example,
of the 954 tumours in the cohort with driver mutations in TP53,
736 (77%) had both alleles mutated, 96% of which (707 out of 736)
combined a somatic point mutation that affected one allele with
somatic deletion of the other allele. Overall, 17% of patients had
rare germline protein-truncating variants (PTVs) in cancer-predisposition
genes39, DNA-damage response genes40 and somatic driver
genes. Biallelic inactivation due to somatic alteration on top of a
germline PTV was observed in 4.5% of patients overall, with 81% of
[Fig. 2 graphic not reproduced. Panel a: circos plot with rings for germline susceptibility variants, somatic non-coding drivers, somatic coding drivers, SGR drivers, SCNA drivers and WG duplications; bar plot of patients with drivers (%): all, 91; coding point mutations, 76; non-coding point mutations, 25; rearrangements, 26; SCNA, 73; germline, 17; dot plot of the number of drivers. Panel b: heat map of recurrently altered genomic elements (including TP53, CDKN2A, ARID1A, KRAS, PTEN and TERT) by tumour type, with the number of patients and the proportion of patients by alteration class (coding, promoter, intron splicing, 3′ UTR, 5′ UTR, amplified oncogene, deleted TSG, truncated TSG, fusion gene, cis-activating GR). Panel c: number of patients with biallelic inactivation per tumour-suppressor gene (somatic/somatic and germline/somatic combinations), with the proportion of patients with the gene altered as biallelic.]
Fig. 2 | Panorama of driver mutations in PCAWG. a, Top, putative driver
mutations in PCAWG, represented as a circos plot. Each sector represents a
tumour in the cohort. From the periphery to the centre of the plot the
concentric rings represent: (1) the total number of driver alterations; (2) the
presence of whole-genome (WG) duplication; (3) the tumour type; (4) the
number of driver CNAs; (5) the number of driver genomic rearrangements;
(6) driver coding point mutations; (7) driver non-coding point mutations; and
(8) pathogenic germline variants. Bottom, snapshots of the panorama of driver
mutations. The horizontal bar plot (left) represents the proportion of patients
with different types of drivers. The dot plot (right) represents the mean
number of each type of driver mutation across tumours with at least one event
(the square dot) and the standard deviation (grey whiskers), based on n=2,583
patients. b, Genomic elements targeted by different types of mutations in the
cohort altered in more than 65 tumours. Both germline and somatic variants
are included. Left, the heat map shows the recurrence of alterations across
cancer types. The colour indicates the proportion of mutated tumours and the
number indicates the absolute count of mutated tumours. Right, the
proportion of each type of alteration that affects each genomic element.
c, Tumour-suppressor genes with biallelic inactivation in 10 or more patients.
The values included under the gene labels represent the proportions of
patients who have biallelic mutations in the gene out of all patients with a
somatic mutation in that gene. GR, genomic rearrangement; SCNA, somatic
copy-number alteration; SGR, somatic genome rearrangement; TSG, tumour
suppressor gene; UTR, untranslated region.
these affecting known cancer-predisposition genes (such as BRCA1,
BRCA2 and ATM).
PCAWG tumours with no apparent drivers
Although more than 90% of PCAWG cases had identified drivers, we
found none in 181 tumours (Extended Data Fig.4a). Reasons for miss-
ing drivers have not yet been systematically evaluated in a pan-cancer
cohort, and could arise from either technical or biological causes.
Technical explanations could include poor-quality samples, inadequate
sequencing or failures in the bioinformatic algorithms used.
We assessed the quality of the samples and found that 4 of the 181
cases with no known drivers had more than 5% tumour DNA contami-
nation in their matched normal sample (Fig.3a). Using an algorithm
designed to correct for this contamination41, we identified previously
missed mutations in genes relevant to the respective cancer types.
Similarly, if the fraction of tumour cells in the cancer sample is low
through stromal contamination, the detection of driver mutations
can be impaired. Most tumours with no known drivers had an average
power to detect mutations close to 100%; however, a few had
power in the 70–90% range (Fig.3b and Extended Data Fig.4b). Even
in adequately sequenced genomes, lack of read depth at specific
driver loci can impair mutation detection. For example, only around
50% of PCAWG tumours had sufficient coverage to call a mutation
(≥90% power) at the two TERT promoter hotspots, probably because
the high GC content of this region causes biased coverage (Fig.3c).
In fact, 6 hepatocellular carcinomas and 2 biliary cholangiocarcinomas
among the 181 cases with no known drivers actually did contain TERT
mutations, which were discovered after deep targeted sequencing42.
Finally, technical reasons for missing driver mutations include failures
in the bioinformatic algorithms. This affected 35 myeloproliferative
neoplasms in PCAWG, in which the JAK2 V617F driver mutation
should have been called. Our somatic variant-calling algorithms rely
on ‘panels of normals’, typically from blood samples, to remove recur-
rent sequencing artefacts. As 2–5% of healthy individuals carry occult
haematopoietic clones43, recurrent driver mutations in these clones
can enter panels of normals.
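The dependence of detection power on tumour purity and read depth, discussed above, can be illustrated with a simple binomial model. The Python sketch below is an assumption-laden simplification and not the PCAWG power calculation: it ignores sequencing error and caller-specific filters, and the function names, the diploid default and the threshold of three supporting reads are hypothetical choices.

```python
from math import comb


def expected_vaf(purity: float, tumour_cn: int = 2, mutated_copies: int = 1,
                 normal_cn: int = 2) -> float:
    """Expected variant allele fraction of a clonal mutation in a sample
    that mixes tumour cells (fraction `purity`) with normal cells."""
    return purity * mutated_copies / (purity * tumour_cn + (1 - purity) * normal_cn)


def detection_power(depth: int, vaf: float, min_alt_reads: int = 3) -> float:
    """Probability of observing at least `min_alt_reads` variant reads
    when the variant read count follows Binomial(depth, vaf)."""
    p_below = sum(
        comb(depth, k) * vaf ** k * (1 - vaf) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1 - p_below


# Hypothetical scenario: 30% purity, diploid locus, locally reduced coverage.
vaf_low_purity = expected_vaf(purity=0.3)
power_at_10x = detection_power(depth=10, vaf=vaf_low_purity)
power_at_40x = detection_power(depth=40, vaf=vaf_low_purity)
```

Under this model, power falls quickly when either purity or local depth drops, which is consistent with the reduced sensitivity described for low-coverage regions such as the TERT promoter.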
With regard to biological causes, tumours may be driven by muta-
tions in cancer-associated genes that are not yet described for that
tumour type. Using driver discovery algorithms on tumours with no
known drivers, no individual genes reached significance for point muta-
tions. However, we identified a recurrent CNA that spanned SETD2 in
[Fig. 3 graphic not reproduced. Panel a: tumour-in-normal estimate (%) by tumour type. Panel b: average detection sensitivity by tumour type. Panel c: detection sensitivity at the two TERT promoter hotspots (chromosome 5: 1,259,228 and chromosome 5: 1,259,250). Panel d: q values of copy-number losses by chromosome, with labelled regions 2q37.3, 3p21.31, 5q35.2, 8p23.1, 10q26.13, 16q24.3 and 17p13.3 and representative genes including SETD2, PCM1, FGFR2, FANCA and TP53. Panel e: chromosome losses and gains in Kidney–ChRCC and Panc–Endocrine tumours.]
Fig. 3 | Analysis of patients with no detected driver mutations. a, Individual
estimates of the percentage of tumour-in-normal contamination across
patients with no driver mutations in PCAWG (n=181). No data were available for
myelodysplastic syndromes and acute myeloid leukaemia. Points represent
estimates for individual patients, and the coloured areas are estimated density
distributions (violin plots). Abbreviations of the tumour types are defined in
Extended Data Table 1. b, Average detection sensitivity by tumour type for
tumours without known drivers (n=181). Each dot represents a given sample
and is the average sensitivity of detecting clonal substitutions across the
genome, taking into account purity and ploidy. Coloured areas are estimated
density distributions, shown for cohorts with at least five cases. c, Detection
sensitivity for TERT promoter hotspots in tumour types in which TERT is
frequently mutated. Coloured areas are estimated density distributions.
d, Significant copy-number losses identified by two-sided hypothesis testing
using GISTIC2.0, corrected for multiple-hypothesis testing. Numbers in
parentheses indicate the number of genes in significant regions when
analysing medulloblastomas without known drivers (n=42). Significant
regions with known cancer-associated genes are labelled with the
representative cancer-associated gene. e, Aneuploidy in chromophobe renal
cell carcinomas and pancreatic neuroendocrine tumours without known
drivers. Patients are ordered on the y axis by tumour type and then by presence
of whole-genome duplication (bottom) or not (top).
medulloblastomas that lacked known drivers (Fig.3d), indicating that
restricting hypothesis testing to missing-driver cases can improve
power if undiscovered genes are enriched in such tumours. Inactivation
of SETD2 in medulloblastoma significantly decreased gene expres-
sion (P=0.002) (Extended Data Fig.4c). Notably, SETD2 mutations
occurred exclusively in medulloblastoma group-4 tumours (P<1×10−4).
Group-4 medulloblastomas are known for frequent mutations in other
chromatin-modifying genes44, and our results suggest that SETD2 loss
of function is an additional driver that affects chromatin regulators in
this subgroup.
Two tumour types had a surprisingly high fraction of patients with-
out identified driver mutations: chromophobe renal cell carcinoma
(44%; 19 out of 43) and pancreatic neuroendocrine cancers (22%;
18 out of 81) (Extended Data Fig.4a). A notable feature of the miss-
ing-driver cases in both tumour types was a remarkably consistent
profile of chromosomal aneuploidy—patterns that have previously
been reported45,46 (Fig.3e). The absence of other identified driver
mutations in these patients raises the possibility that certain combinations
of whole-chromosome gains and losses may be sufficient to initiate
a cancer in the absence of more-targeted driver events such as point
mutations, fusion genes or focal CNAs.
Even after accounting for technical issues and novel drivers, 5.3% of
PCAWG tumours still had no identifiable driver events. In a research
setting, in which we are interested in drawing conclusions about popu-
lations of patients, the consequences of technical issues that affect
occasional samples will be mitigated by sample size. In a clinical setting,
in which we are interested in the driver mutations in a specific patient,
these issues become substantially more important. Careful and critical
appraisal of the whole pipeline—including sample acquisition, genome
sequencing, mapping, variant calling and driver annotation, as done
[Fig. 4 graphic not reproduced. Panel a: punctuated events across PCAWG by tumour type. Kataegis: fraction of samples and number of foci, classified as APOBEC3, alternative C deamination, C[T>N]T, Pol η or uncertain, with or without SV. Chromoplexy: fraction of samples, split into chromoplexy and balanced translocations. Chromothripsis: fraction of samples and events, categorized as small, amplified, far from telomere, classic single or multiple chromosomes. Panel b: kataegis interfocal distance by genomic position, with annotated genes. Panel c: chromoplexy interfootprint distance by genomic position, with annotated genes. Panel d: number of losses and gains by chromosome, with labelled driver genes (amplification, homozygous deletion or rearrangement): SOX2 (12), TERT (22), EGFR (9), CCND1 (30), MDM2 (36), CDK4 (30), ERBB2 (30), NF1 (11), RB1 (7) and CDKN2A (15); and interbreakpoint distance (bp) by tumour type.]
Fig. 4 | Patterns of clustered mutational processes in PCAWG. a, Kataegis.
Top, prevalence of different types of kataegis and their association with SVs
(≤1 kb from the focus). Bottom, the distribution of the number of foci of
kataegis per sample. Chromoplexy. Prevalence of chromoplexy across cancer
types, subdivided into balanced translocations and more complex events.
Chromothripsis. Top, frequency of chromothripsis across cancer types.
Bottom, for each cancer type a column is shown, in which each row is a
chromothripsis region represented by five coloured rectangles relating to its
categorization. b, Circos rainfall plot showing the distances between
consecutive kataegis events across PCAWG compared with their genomic
position. Lymphoid tumours (khaki, B cell non-Hodgkin’s lymphoma; orange,
chronic lymphocytic leukaemia) have hypermutation hot spots (≥3 foci with
distance ≤1 kb; pale red zone), many of which are near known cancer-associated
genes (red annotations) and have associated SVs (≤10 kb from the focus; shown
as arcs in the centre). c, Circos rainfall plot as in b that shows the distance versus
the position of consecutive chromoplexy and reciprocal translocation
footprints across PCAWG. Lymphoid, prostate and thyroid cancers exhibit
recurrent events (≥2 footprints with distance ≤10 kb; pale red zone) that are
likely to be driver SVs and are annotated with nearby genes and associated SVs,
which are shown as bold and thin arcs for chromoplexy and reciprocal
translocations, respectively (colours as in a). d, Effect of chromothripsis along
the genome and involvement of PCAWG driver genes. Top, number of
chromothripsis-induced gains or losses (grey) and amplifications (blue) or
deletions (red). Within the identified chromothripsis regions, selected
recurrently rearranged (light grey), amplified (blue) and homozygously
deleted (magenta) driver genes are indicated. Bottom, interbreakpoint
distance between all subsequent breakpoints within chromothripsis regions
across cancer types, coloured by cancer type. Regions with an average
interbreakpoint distance <10 kb are highlighted. C[T>N]T, kataegis with a
pattern of thymine mutations in a CpTpT context.
here—should be required for laboratories that offer clinical sequenc-
ing of cancer genomes.
Patterns of clustered mutations and SVs
Some somatic mutational processes generate multiple mutations in a
single catastrophic event, typically clustered in genomic space, leading
to substantial reconfiguration of the genome. Three such processes
have previously been described: (1) chromoplexy, in which repair of
co-occurring double-stranded DNA breaks (typically on different
chromosomes) results in shuffled chains of rearrangements47,48 (Extended
Data Fig.5a); (2) kataegis, a focal hypermutation process that leads to
locally clustered nucleotide substitutions, biased towards a single DNA
strand49–51 (Extended Data Fig.5b); and (3) chromothripsis, in which
tens to hundreds of DNA breaks occur simultaneously, clustered on
one or a few chromosomes, with near-random stitching together of
the resulting fragments52–55 (Extended Data Fig.5c). We characterized
the PCAWG genomes for these three processes (Fig.4).
Chromoplexy events and reciprocal translocations were identified
in 467 (17.8%) samples (Fig.4a, c). Chromoplexy was prominent in
prostate adenocarcinoma and lymphoid malignancies, as previously
described47,48, and—unexpectedly—thyroid adenocarcinoma. Differ-
ent genomic loci were recurrently rearranged by chromoplexy across
the three tumour types, mediated by positive selection for particu-
lar fusion genes or enhancer-hijacking events. Of 13 fusion genes or
enhancer hijacking events in 48 thyroid adenocarcinomas, at least
4 (31%) were caused by chromoplexy, with a further 4 (31%) part of com-
plexes that contained chromoplexy footprints (Extended Data Fig.5a).
These events generated fusion genes that involved RET (two cases) and
NTRK3 (one case)56, and the juxtaposition of the oncogene IGF2BP3
with regulatory elements from highly expressed genes (five cases).
Kataegis events were found in 60.5% of all cancers, with particularly
high abundance in lung squamous cell carcinoma, bladder cancer,
acral melanoma and sarcomas (Fig.4a, b). Typically, kataegis comprises
C>N mutations in a TpC context, which are probably caused
by APOBEC activity49–51, although a T>N conversion in a TpT or CpT
process (the affected T is highlighted in bold) attributed to error-prone
polymerases has recently been described57. The APOBEC signature
accounted for 81.7% of kataegis events and correlated positively with
APOBEC3B expression levels, somatic SV burden and age at diagnosis
(Supplementary Fig.5). Furthermore, 5.7% of kataegis events involved
the T>N error-prone polymerase signature and 2.3% of events, most
notably in sarcomas, showed cytidine deamination in an alternative
GpC or CpC context.
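Kataegis foci are recognized as runs of substitutions separated by unusually short intermutation distances. The Python sketch below shows one simple way to find such runs on a single chromosome. It is not the PCAWG kataegis caller: the function name and the thresholds (at least six mutations, each gap of at most 1,000 bp) are illustrative assumptions.

```python
from typing import List, Tuple


def find_foci(positions: List[int], max_gap: int = 1000,
              min_mutations: int = 6) -> List[Tuple[int, int, int]]:
    """Return (start, end, n_mutations) for each run of mutations on one
    chromosome in which every consecutive gap is at most `max_gap` bp."""
    foci: List[Tuple[int, int, int]] = []
    if not positions:
        return foci
    pos = sorted(positions)
    run_start = 0
    for i in range(1, len(pos) + 1):
        # Close the current run at the end of the list or at a large gap.
        if i == len(pos) or pos[i] - pos[i - 1] > max_gap:
            n = i - run_start
            if n >= min_mutations:
                foci.append((pos[run_start], pos[i - 1], n))
            run_start = i
    return foci


# Hypothetical substitution positions (bp) on one chromosome.
example_positions = [100, 300, 450, 900, 1200, 1500, 50000, 90000, 90100]
example_foci = find_foci(example_positions)
```

With these inputs, the first six positions form a single focus, whereas the isolated and paired mutations further along the chromosome fall below the minimum run length.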
Kataegis events were frequently associated with somatic SV break-
points (Fig.4a and Supplementary Fig.6a), as previously described50,51.
Deletions and complex rearrangements were most-strongly associ-
ated with kataegis, whereas tandem duplications and other simple
SV classes were only infrequently associated (Supplementary Fig.6b).
Kataegis inducing predominantly T>N mutations in CpTpT context
was enriched near deletions, specifically those in the 10–25-kilobase
(kb) range (Supplementary Fig.6c).
Samples with extreme kataegis burden (more than 30 foci) comprise
four types of focal hypermutation (Extended Data Fig.6): (1) off-target
somatic hypermutation and foci of T>N at CpTpT, found in B cell non-
Hodgkin lymphoma and oesophageal adenocarcinomas, respectively;
(2) APOBEC kataegis associated with complex rearrangements, notably
found in sarcoma and melanoma; (3) rearrangement-independent
APOBEC kataegis on the lagging strand and in early-replicating regions,
mainly found in bladder and head and neck cancer; and (4) a mix of
the last two types. Kataegis only occasionally led to driver mutations
(Supplementary Table5).
We identified chromothripsis in 587samples (22.3%), most fre-
quently among sarcoma, glioblastoma, lung squamous cell carci-
noma, melanoma and breast adenocarcinoma18. Chromothripsis
increased with whole-genome duplications in most cancer types
(Extended Data Fig.7a), as previously shown in medulloblastoma
58
.
The most recurrently associated driver was TP53
52
(pan-cancer odds
ratio=3.22; pan-cancer P=8.3×10−35; q<0.05 in breast lobular (odds
ratio=13), colorectal (odds ratio=25), prostate (odds ratio=2.6) and
hepatocellular (odds ratio=3.9) cancers; Fisher–Boschloo tests). In
two cancer types (osteosarcoma and B cell lymphoma), women had a
higher incidence of chromothripsis than men (Extended Data Fig.7b).
In prostate cancer, we observed a higher incidence of chromothripsis
in patients with late-onset than early-onset disease59 (Extended Data
Fig.7c).
Chromothripsis regions coincided with 3.6% of all identified driv-
ers in PCAWG and around 7% of copy-number drivers (Fig.4d). These
proportions are considerably enriched compared to expectation if
selection were not acting on these events (Extended Data Fig.7d). The
majority of coinciding driver events were amplifications (58%), followed
by homozygous deletions (34%) and SVs within genes or promoter
regions (8%). We frequently observed a ≥2-fold increase or decrease in
expression of amplified or deleted drivers, respectively, when these loci
were part of a chromothripsis event, compared with samples without
chromothripsis (Extended Data Fig.7e).
Chromothripsis manifested in diverse patterns and frequencies
across tumour types, which we categorized on the basis of five
characteristics (Fig.4a). In liposarcoma, for example, chromothripsis events
often involved multiple chromosomes, with universal MDM2
amplification60 and co-amplification of TERT in 4 of 19 cases (Fig.4d). By
contrast, in glioblastoma the events tended to affect a smaller region
on a single chromosome that was distant from the telomere, resulting
in focal amplification of EGFR and MDM2 and loss of CDKN2A. Acral
melanomas frequently exhibited CCND1 amplification, and lung squamous
cell carcinomas SOX2 amplifications. In both cases, these drivers
were more-frequently altered by chromothripsis compared with other
drivers in the same cancer type and to other cancer types for the same
driver (Fig.4d and Extended Data Fig.7f). Finally, in chromophobe renal
cell carcinoma, chromothripsis nearly always affected chromosome
5 (Supplementary Fig.7): these samples had breakpoints immediately
adjacent to TERT, increasing TERT expression by 80-fold on average
compared with samples without rearrangements (P=0.0004; Mann–
Whitney U-test).
Timing clustered mutations in evolution
An unanswered question for clustered mutational processes is whether
they occur early or late in cancer evolution. To address this, we used
molecular clocks to define broad epochs in the life history of each
tumour49,61. One transition point is between clonal and subclonal
mutations: clonal mutations occurred before, and subclonal mutations after,
the emergence of the most-recent common ancestor. In regions with
copy-number gains, molecular time can be further divided according
to whether mutations preceded the copy-number gain (and were them-
selves duplicated) or occurred after the gain (and therefore present on
only one chromosomal copy)7.
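The distinction between mutations that preceded a copy-number gain and those that followed it rests on how many chromosome copies carry each mutation. The Python sketch below estimates that number from the variant allele fraction, tumour purity and local copy number. It is a simplified illustration rather than the timing method of the companion paper: the function names and the 1.5-copy threshold are assumptions, and subclonal mutations and sampling noise are ignored.

```python
def mutation_copies(vaf: float, purity: float, tumour_cn: float,
                    normal_cn: float = 2) -> float:
    """Estimated number of chromosome copies per tumour cell that carry a
    mutation, obtained by rearranging
    VAF = purity * m / (purity * tumour_cn + (1 - purity) * normal_cn)."""
    return vaf * (purity * tumour_cn + (1 - purity) * normal_cn) / purity


def classify_timing(vaf: float, purity: float, tumour_cn: float,
                    threshold: float = 1.5) -> str:
    """Label a clonal mutation in a gained region.

    Mutations on more than one copy were duplicated with the gain and so
    preceded it; mutations on a single copy arose afterwards. The
    threshold of 1.5 copies is a hypothetical cut-off for this sketch.
    """
    copies = mutation_copies(vaf, purity, tumour_cn)
    return "before gain" if copies >= threshold else "after gain"


# Hypothetical region with four copies in a pure tumour sample.
early_call = classify_timing(vaf=0.50, purity=1.0, tumour_cn=4)
late_call = classify_timing(vaf=0.25, purity=1.0, tumour_cn=4)
```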
Chromothripsis tended to have greater relative odds of being clonal
than subclonal, suggesting that it occurs early in cancer evolution,
especially in liposarcomas, prostate adenocarcinoma and squamous
cell lung cancer (Fig.5a). As previously reported, chromothripsis was
especially common in melanomas
62
. We identified 89 separate chromo-
thripsis events that affected 66 melanomas (61%); 47 out of 89 events
affected genes known to be recurrently altered in melanoma
63
(Sup-
plementary Table6). Involvement of a region on chromosome 11 that
includes the cell-cycle regulator CCND1 occurred in 21 cases (10 out
of 86 cutaneous, and 11 out of 21 acral or mucosal melanomas), typi-
cally combining chromothripsis with amplification (19 out of 21 cases)
(Extended Data Fig.8). Co-involvement of other cancer-associated
genes in the same chromothripsis event was also frequent, including
TERT (five cases), CDKN2A (three cases), TP53 (two cases) and MYC
(two cases) (Fig.5b). In these co-amplifications, a chromothripsis
event involving multiple chromosomes initiated the process, creat-
ing a derivative chromosome in which hundreds of fragments were
stitched together in a near-random order (Fig.5b). This derivative
then rearranged further, leading to massive co-amplification of the
multiple target oncogenes together with regions located nearby on
the derivative chromosome.
In these cases of amplified chromothripsis, we can use the inferred
number of copies bearing each SNV to time the amplification process.
SNVs present on the chromosome before amplification will them-
selves be amplified and are therefore reported in a high fraction of
sequence reads (Fig.5b and Extended Data Fig.8). By contrast, late
SNVs that occur after the amplification has concluded will be present
on only one chromosome copy out of many, and thus have a low variant
allele fraction. Regions of CCND1 amplification had few—sometimes
zero—mutations at high variant allele fraction in acral melanomas, in
contrast to later CCND1 amplifications in cutaneous melanomas, in
which hundreds to thousands of mutations typically predated ampli-
fication (Fig.5b and Extended Data Fig.9a, b). Thus, both chromoth-
ripsis and the subsequent amplification generally occurred very early
during the evolution of acral melanoma. By comparison, in lung squa-
mous cell carcinomas, similar patterns of chromothripsis followed by
SOX2 amplification are characterized by many amplified SNVs, sug-
gesting a later event in the evolution of these cancers (Extended Data
Fig.9c).
Notably, in cancer types in which the mutational load was sufficiently
high, we could detect a larger-than-expected number of SNVs on an
intermediate number of DNA copies, suggesting that they appeared
during the amplification process (Supplementary Fig.8).
[Fig. 5 graphic not reproduced. Panel a: fraction of samples with chromoplexy, chromothripsis and kataegis, relative odds (clonal/subclonal) and relative odds (early/late), by tumour type. Number of samples as listed across the top: Biliary–AdenoCA, 34; Bladder–TCC, 23; Bone–Benign, 16; Bone–Epith, 10; Bone–Osteosarc, 38; Breast–AdenoCA, 198; Breast–DCIS, 3; Breast–LobularCA, 13; Cervix–AdenoCA, 2; Cervix–SCC, 18; CNS–GBM, 41; CNS–Medullo, 146; CNS–Oligo, 18; CNS–PiloAstro, 89; ColoRect–AdenoCA, 60; Eso–AdenoCA, 98; Head–SCC, 57; Kidney–ChRCC, 45; Kidney–RCC–Clear, 111; Kidney–RCC–Pap, 33; Liver–HCC, 317; Lung–AdenoCA, 38; Lung–SCC, 48; Lymph–BNHL, 107; Lymph–CLL, 95; Myeloid–AML, 13; Myeloid–MDS, 2; Myeloid–MPN, 23; Ovary–AdenoCA, 113; Panc–AdenoCA, 239; Panc–Endocrine, 85; Prost–AdenoCA, 210; Skin–Melanoma–Acral, 20; Skin–Melanoma–Cut, 86; Skin–Melanoma–Mucosal, 1; SoftTissue–Leiomyo, 15; SoftTissue–Liposarc, 19; Stomach–AdenoCA, 75; Thy–AdenoCA, 48; Uterus–AdenoCA, 51. Panel b: copy number and VAF along chromosome 5 (TERT) and chromosome 11 (CCND1) for acral melanoma samples SA557318, SA557322 and SA557416, with point mutations coloured by substitution type (C>A, C>G, C>T, T>A, T>C, T>G).]
Fig. 5 | Timing of clustered events in PCAWG. a, Extent and timing of
chromothripsis, kataegis and chromoplexy across PCAWG. Top, stacked bar
charts illustrate co-occurrence of chromothripsis, kataegis and chromoplexy
in the samples. Middle, relative odds of clustered events being clonal or
subclonal are shown with bootstrapped 95% confidence intervals. Point
estimates are highlighted when they do not overlap odds of 1:1. Bottom,
relative odds of the events being early or late clonal are shown as above. Sample
sizes (number of patients) are shown across the top. b, Three representative
patients with acral melanoma and chromothripsis-induced amplification that
simultaneously affects TERT and CCND1. The black points (top) represent
sequence coverage from individual genomic bins, with SVs shown as coloured
arcs (translocation in black, deletion in purple, duplication in brown, tail-to-tail
inversion in cyan and head-to-head inversion in green). Bottom, the variant
allele fractions of somatic point mutations.
Germline effects on somatic mutations
We integrated the set of 88million germline genetic variant calls
with somatic mutations in PCAWG, to study germline determinants
of somatic mutation rates and patterns. First, we performed a genome-
wide association study of somatic mutational processes with common
germline variants (minor allele frequency (MAF)>5%) in individuals
with inferred European ancestry. An independent genome-wide associ-
ation study was performed in East Asian individuals from Asian cancer
genome projects. We focused on two prevalent endogenous mutational
processes: spontaneous deamination of 5-methylcytosine at
CpG dinucleotides5 (signature 1) and activity of the APOBEC3 family of
cytidine deaminases64 (signatures 2 and 13). No locus reached genome-wide
significance (P<5×10−8) for signature 1 (Extended Data Fig.10a,
b). However, a locus at 22q13.1 predicted an APOBEC3B-like mutagen-
esis at the pan-cancer level65 (Fig.6a). The strongest signal at 22q13.1
was driven by rs12628403, and the minor (non-reference) allele was
protective against APOBEC3B-like mutagenesis (β=−0.43, P=5.6×10−9,
MAF=8.2%, n=1,201donors) (Extended Data Fig.10c). This variant
tags a common, approximately 30-kb germline SV that deletes the
APOBEC3B coding sequence and fuses the APOBEC3B 3′ untranslated
region with the coding sequence of APOBEC3A. The deletion is known
to increase breast cancer risk and APOBEC mutagenesis in breast cancer
genomes66,67. Here, we found that rs12628403 reduces APOBEC3B-like
mutagenesis specifically in cancer types with low levels of APOBEC
mutagenesis (βlow=−0.50, Plow=1×10−8; βhigh=0.17, Phigh=0.2), and
increases APOBEC3A-like mutagenesis in cancer types with high levels
of APOBEC mutagenesis (βhigh=0.44, Phigh=8×10−4; βlow=−0.21,
Plow=0.02). Moreover, we identified a second, novel locus at 22q13.1
that was associated with APOBEC3B-like mutagenesis across cancer
types (rs2142833, β=0.23, P=1.3×10−8). We independently validated the
association between both loci and APOBEC3B-like mutagenesis using
East Asian individuals from Asian cancer genome projects
(βrs12628403=0.57, Prs12628403=4.2×10−12; βrs2142833=0.58, Prs2142833=8×10−15)
(Extended Data Fig.10d). Notably, in a conditional analysis that
accounted for rs12628403, we found that rs2142833 and rs12628403
are inherited independently in Europeans (r²<0.1), and rs2142833
remained significantly associated with APOBEC3B-like mutagenesis
in Europeans (βEUR=0.17, PEUR=3×10−5) and East Asians (βASN=0.25,
PASN=2×10−3) (Extended Data Fig.10e, f). Analysis of donor-matched
expression data further suggests that rs2142833 is a cis-expression
quantitative trait locus (eQTL) for APOBEC3B at the pan-cancer level
(β=0.19, P=2×10−6) (Extended Data Fig.10g, h), consistent with
cis-eQTL studies in normal cells68,69.
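The independence criterion quoted above (r2<0.1) can be illustrated with a minimal sketch. It assumes that linkage disequilibrium is estimated as the squared Pearson correlation between allele dosages (0, 1 or 2 copies of the minor allele per donor), which is a common estimate for unphased genotypes. The function name `dosage_r2` and the toy genotypes are hypothetical; they are not PCAWG data and do not reproduce the study's pipeline.

```python
def dosage_r2(x, y):
    """Squared Pearson correlation between two allele-dosage vectors (0, 1 or 2)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length vectors with at least 2 donors")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squares of the centred dosages
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("monomorphic variant: r2 is undefined")
    return cov * cov / (var_x * var_y)


# Hypothetical toy dosages for eight donors (illustrative only, not PCAWG data).
snp_a = [0, 1, 2, 0, 1, 0, 2, 1]
snp_b = [1, 0, 1, 2, 0, 1, 0, 2]
r2 = dosage_r2(snp_a, snp_b)
# Same cut-off as in the text: variants are treated as independent if r2 < 0.1.
independent = r2 < 0.1
```

Two variants that pass such a cut-off carry largely separate information, which is why a signal that survives conditioning on the first variant can be read as a second, independent association.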
[Fig. 6 graphic. Panel a: Manhattan plot of –log10(P) across chromosomes 1–22 and X, with the 22q13.1 locus labelled. Panel b: SV plot for Prost–AdenoCA (DO51965) with tracks (1)–(4) and an SV legend (interchromosomal, deletion, duplication, inversion (tail-to-tail), inversion (head-to-head)); long-read alignment (kb) to Chr. 2: 59,279,205–59,289,368 and Chr. 5: 148,202,017–148,202,805; germline and tumour short reads at breakpoints 1–3. Panel c: –log10(Pobs) against –log10(Pexp), with BRCA2 and MBD4 labelled. Panel d: chromosomal map of L1 source elements; legend: contribution (%) shown as volcano size; Strombolian, Plinian, not hot.]
Fig. 6 | Germline determinants of the somatic mutation landscape.
a, Association between common (MAF>5%) germline variants and somatic
APOBEC3B-like mutagenesis in individuals of European ancestry (n=1,201).
Two-sided hypothesis testing was performed with PLINK v.1.9. To mitigate
multiple-hypothesis testing, the significance threshold was set to
genome-wide significance (P<5×10−8). b, Templated insertion SVs in a
BRCA1-associated prostate cancer. Left, chromosome bands (1); SVs≤10 megabases
(Mb) (2); 1-kb read depth corrected to copy number 0–6 (3); inter- and
intrachromosomal SVs>10 Mb (4). Right, a complex somatic SV composed of a
2.2-kb tandem duplication on chromosome 2 together with a 232-base-pair
(bp) inverted templated insertion SV that is derived from chromosome 5 and
inserted in between the tandem duplication (bottom). Consensus sequence
alignment of locally assembled Oxford Nanopore Technologies long
sequencing reads to chromosomes 2 and 5 of the human reference genome
(top). Breakpoints are circled and marked as 1 (beginning of tandem
duplication), 2 (end of tandem duplication) or 3 (inverted templated insertion).
For each breakpoint, the middle panel shows Illumina short reads at SV
breakpoints. c, Association between rare germline PTVs (MAF<0.5%) and
somatic CpG mutagenesis (approximately with signature 1) in individuals of
European ancestry (n=1,201). Genes highlighted in blue or red were associated
with lower or higher somatic mutation rates. Two-sided hypothesis testing was
performed using linear-regression models with sex, age at diagnosis and
cancer project as variables. To mitigate multiple-hypothesis testing, the
significance threshold was set to exome-wide significance (P<2.5×10−6).
The black line represents the identity line that would be followed if the
observed P values followed the null expectation; the shaded area shows
the 95% confidence intervals. d, Catalogue of polymorphic germline L1 source
elements that are active in cancer. The chromosomal map shows germline
source L1 elements as volcano symbols. Each volcano is colour-coded
according to the type of source L1 activity. The contribution of each source
locus (expressed as a percentage) to the total number of transductions
identified in PCAWG tumours is represented as a gradient of volcano size, with
top contributing elements exhibiting larger sizes.
Content courtesy of Springer Nature, terms of use apply. Rights reserved
Nature | Vol 578 | 6 February 2020 | 91
Second, we performed a rare-variant association study (MAF <0.5%)
to investigate the relationship between germline PTVs and somatic
DNA rearrangements in individuals with European ancestry (Extended
Data Fig.11a–c). Germline BRCA2 and BRCA1 PTVs were associated
with an increased burden of small (less than 10 kb) somatic SV deletions
(P=1×10−8) and tandem duplications (P=6×10−13), respectively,
corroborating recent studies in breast and ovarian cancer30,70. In
PCAWG data, this pattern also extends to other tumour types, includ-
ing adenocarcinomas of the prostate and pancreas6, typically in the
setting of biallelic inactivation. In addition, tumours with high lev-
els of small SV tandem duplications frequently exhibited a novel and
distinct class of SVs termed ‘cycles of templated insertions’6. These
complex SV events consist of DNA templates that are copied from
across the genome, joined into one contiguous sequence and inserted
into a single derivative chromosome. We found a significant associa-
tion between germline BRCA1 PTVs and templated insertions at the
pan-cancer level (P=4×10−1 5) (Extended Data Fig.11d, e). Whole-genome
long-read sequencing data generated for a BRCA1-deficient PCAWG
prostate tumour verified the small tandem-duplication and templated-insertion
SV phenotypes (Fig. 6b). Almost all (20 out of 21) BRCA1-associated
tumours with a templated-insertion SV phenotype displayed
combined germline and somatic hits in the gene. Together, these data
suggest that biallelic inactivation of BRCA1 is a driver of the templated-
insertion SV phenotype.
Third, rare-variant association analysis revealed that patients with
germline MBD4 PTVs had increased rates of somatic C>T mutation
at CpG dinucleotides (P<2.5×10−6) (Fig. 6c and Extended Data
Fig. 11f, g). Analysis of previously published whole-exome sequencing
samples from the TCGA (n=8,134) replicated the association between
germline MBD4 PTVs and increased somatic CpG mutagenesis at the
pan-cancer level (P=7.1×10
−4
) (Extended Data Fig.11h). Moreover,
gene-expression profiling revealed a significant but modest correlation
between MBD4 expression and somatic CpG mutation rates between
and within PCAWG tumour types (Extended Data Fig. 11i–k). MBD4
encodes a DNA glycosylase that removes thymines from T:G mismatches
within methylated CpG sites71, a functionality that would be
consistent with a CpG mutational signature in cancer.
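The significance thresholds used in these association analyses match conventional Bonferroni-style corrections. As a rough sketch, assume a family-wise error rate of 0.05, about one million independent common-variant tests genome-wide and about 20,000 genes tested exome-wide. Both test counts are conventional approximations and are not stated in this paper; the function names are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test P value threshold that keeps the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def is_significant(p_value, threshold):
    """True if an observed P value passes the corrected threshold."""
    return p_value < threshold


# Assumed numbers of tests (conventional approximations, not taken from the paper).
genome_wide = bonferroni_threshold(0.05, 1_000_000)  # approximately 5e-8
exome_wide = bonferroni_threshold(0.05, 20_000)      # approximately 2.5e-6

# P values quoted in the text, checked against the two thresholds.
apobec_locus_passes = is_significant(5.6e-9, genome_wide)
tcga_replication_passes = is_significant(7.1e-4, exome_wide)
```

Under these assumptions the TCGA replication P value would not pass an exome-wide threshold on its own. That is expected for a replication test, which evaluates one pre-specified gene rather than scanning all genes.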
Fourth, we assessed long interspersed nuclear elements (LINE-1; L1
hereafter) that mediate somatic retrotransposition events72–74. We
identified 114 germline source L1 elements capable of active somatic
retrotransposition, including 70 that represent insertions with respect to the
human reference genome (Fig.6d and Supplementary Table7), and 53
that were tagged by single-nucleotide polymorphisms in strong linkage
disequilibrium (Supplementary Table7). Only 16 germline L1 elements
accounted for 67% (2,440 out of 3,669) of all L1-mediated
transductions10 detected in the PCAWG dataset (Extended Data Fig. 12a). These
16 hot-L1 elements followed two broad patterns of somatic activity (8
of each), which we term Strombolian and Plinian in analogy to patterns
of volcanic activity. Strombolian L1s are frequently active in cancer,
but mediate only small-to-modest eruptions of somatic L1 activity in
cancer samples (Extended Data Fig.12b). By contrast, Plinian L1s are
more rarely seen, but display aggressive somatic activity. Whereas
Strombolian elements are typically relatively common (MAF>2%) and
sometimes even fixed in the human population, all Plinian elements
were infrequent (MAF≤2%) in PCAWG donors (Extended Data Fig.12c;
P=0.001, Mann–Whitney U-test). This dichotomous pattern of activ-
ity and allele frequency may reflect differences in age and selective
pressures, with Plinian elements potentially inserted into the human
germline more recently. PCAWG donors bear on average between 50
and 60 L1 source elements and between 5 and 7 elements with hot
activity (Extended Data Fig.12d), but only 38% (1,075 out of 2,814) of
PCAWG donors carried ≥1 Plinian element. Some L1 germline source
loci caused somatic loss of tumour-suppressor genes (Extended Data
Fig.12e). Many are restricted to individual continental population
ancestries (Extended Data Fig.12f–j).
Replicative immortality
One of the hallmarks of cancer is its ability to evade cellular
senescence21. Normal somatic cells typically have finite cell division
potential; telomere attrition is one mechanism to limit numbers of
mitoses75. Cancers enlist multiple strategies to achieve replicative
immortality. Overexpression of the telomerase gene, TERT, which main-
tains telomere lengths, is especially prevalent. This can be achieved
through point mutations in the promoter that lead to de novo
transcription factor binding34,37; hitching TERT to highly active regulatory
elements elsewhere in the genome46,76; insertions of viral enhancers
upstream of the gene77,78; and increased dosage through chromosomal
amplification, as we have seen in melanoma (Fig.5b). In addition, there is
an ‘alternative lengthening of telomeres’ (ALT) pathway, in which telom-
eres are lengthened through homologous recombination, mediated by
loss-of-function mutations in the ATRX and DAXX genes79.
[Fig. 7 graphic. Panel a: t-SNE dimension 1 against t-SNE dimension 2, with tumour clusters T1–T4 and normal clusters N1–N4. Panel b: distribution of clusters (%) by tumour type. Panel c: percentage of samples mutated in ATRX, DAXX, RB1 and TERT for cluster 1 (47 samples), cluster 2 (42 samples), cluster 3 (33 samples) and cluster 4 (2,396 samples), by alteration type (CNV loss, SNV, SV, truncating). Panel d: fraction of patients by tumour type; TMM mutations: TERT promoter SNV, TERT SV hijacking, TERT amplification, ATRX or DAXX, multiple mutations; no TMM mutations: ALT features (cluster 1), ALT features (cluster 2), cluster 3.]
Fig. 7 | Telomere sequence patterns across PCAWG. a, Scatter plot of the
clusters of telomere patterns identified across PCAWG using t-distributed
stochastic neighbour embedding (t-SNE), based on n=2,518 tumour samples
and their matched normal samples. Axes have arbitrary dimensions such that
samples with similar telomere profiles are clustered together and samples with
dissimilar telomere profiles are far apart with high probability. b, Distribution
of the four tumour-specific clusters of telomere patterns in selected tumour
types from PCAWG. c, Distribution of relevant driver mutations associated
with alternative lengthening of telomere and normal telomere maintenance
across the four clusters. d, Distribution of telomere maintenance
abnormalities across tumour types with more than 40 patients in PCAWG.
Samples were classified as tumour clusters 1–3 if they fell into a relevant cluster
without mutations in TERT, ATRX or DAXX and had no ALT phenotype. TMM,
telomere maintenance mechanisms.
As reported in a companion paper13, 16% of tumours in the PCAWG
dataset exhibited somatic mutations in at least one of ATRX, DAXX
and TERT. TERT alterations were detected in 270 samples, whereas
128 tumours had alterations in ATRX or DAXX, of which 71 were
protein-truncating. In the companion paper, which focused on describing
patterns of ALT and TERT-mediated telomere maintenance13, 12 features
of telomeric sequence were measured in the PCAWG cohort. These
included counts of nine variants of the core hexameric sequence,
the number of ectopic telomere-like insertions within the genome,
the number of genomic breakpoints and telomere length as a ratio
between tumour and normal. Here we used the 12 features as an overview
of telomere integrity across all tumours in the PCAWG dataset.
On the basis of these 12 features, tumour samples formed 4 distinct
subclusters (Fig. 7a and Extended Data Fig. 13a), suggesting that
telomere-maintenance mechanisms are more diverse than the
well-established TERT and ALT dichotomy. Clusters C1 (47 tumours) and
C2 (42 tumours) were enriched for traits of the ALT pathway, having
longer telomeres, more genomic breakpoints, more ectopic telomere
insertions and variant telomere sequence motifs (Supplementary
Fig.9). C1 and C2 were distinguished from one another by the latter
having a considerable increase in the number of TTCGGG and TGAGGG
variant motifs among the telomeric hexamers. Thyroid adenocarci-
nomas were markedly enriched among C3 samples (26 out of 33 C3
samples; P<10−16); the C1 cluster (ALT subtype 1) was common among
sarcomas; and both pancreatic endocrine neoplasms and low-grade
gliomas had a high proportion of samples in the C2 cluster (ALT sub-
type 2) (Fig.7b). Notably, some of the thyroid adenocarcinomas and
pancreatic neuroendocrine tumours that cluster together (cluster C3)
had matched normal samples that also cluster together (normal cluster
N3) (Extended Data Fig.13a) and which share common properties. For
example, the GTAGGG repeat was overrepresented among samples in
this group (Supplementary Fig.10).
Somatic driver mutations were also unevenly distributed across the
four clusters (Fig.7c). C1 tumours were enriched for RB1 mutations or
SVs (P=3×10
−5
), as well as frequent SVs that affected ATRX (P=6×10
−14
),
but not DAXX. RB1 and ATRX mutations were largely mutually exclusive
(Extended Data Fig.13b). By contrast, C2 tumours were enriched for
somatic point mutations in ATRX and DAXX (P=6×10−5), but not RB1.
The enrichment of RB1 mutations in C1 remained significant when
only leiomyosarcomas and osteosarcomas were considered, confirm-
ing that this enrichment is not merely a consequence of the different
distribution of tumour types across clusters. C3 samples had frequent
TERT promoter mutations (30%; P=2×10−6).
There was a marked predominance of RB1 mutations in C1. Nearly
a third of the samples in C1 contained an RB1 alteration; these alterations
were evenly distributed across truncating SNVs, SVs and shallow deletions
(Extended Data Fig. 13c). Previous research has shown that RB1
mutations are associated with long telomeres in the absence of TERT
mutations and ATRX inactivation80, and studies using mouse models
have shown that knockout of Rb-family proteins causes elongated
telomeres81. The association with the C1 cluster here suggests that RB1
mutations can represent another route to activating the ALT pathway,
which has subtly different properties of telomeric sequence com-
pared with the inactivation of DAXX—these fall almost exclusively in
cluster C2.
Tumour types with the highest rates of abnormal telomere mainte-
nance mechanisms often originate in tissues that have low endogenous
replicative activity (Fig.7d). In support of this, we found an inverse cor-
relation between previously estimated rates of stem cell division across
tissues
82
and the frequency of telomere maintenance abnormalities
(P=0.01, Poisson regression) (Extended Data Fig.13d). This suggests
that restriction of telomere maintenance is an important tumour-
suppression mechanism, particularly in tissues with low steady-state
cellular proliferation, in which a clone must overcome this constraint
to achieve replicative immortality.
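The inverse relationship tested above by Poisson regression can be sketched as follows. The paper does not spell out the model here, so this example assumes a single covariate (for instance, log10 of the estimated stem cell divisions per tissue) and a count response (for instance, the number of tumours with a telomere maintenance abnormality), with no offset for cohort size. The function `fit_poisson` and the toy numbers are hypothetical, and no P value is derived.

```python
from math import exp, log


def fit_poisson(x, y, n_iter=50, tol=1e-10):
    """Fit log(E[y]) = b0 + b1 * x by Newton-Raphson; returns (b0, b1)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length vectors with at least 2 points")
    b0, b1 = log(sum(y) / len(y)), 0.0  # start from the intercept-only fit
    for _ in range(n_iter):
        mu = [exp(b0 + b1 * xi) for xi in x]
        # Score vector X^T (y - mu)
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Fisher information X^T diag(mu) X, a symmetric 2x2 matrix
        h00 = sum(mu)
        h01 = sum(xi * mi for xi, mi in zip(x, mu))
        h11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        if det <= 0:
            raise ValueError("singular information matrix (is x constant?)")
        # Newton step: inverse information times score
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) < tol and abs(d1) < tol:
            break
    return b0, b1


# Hypothetical tissues: covariate and counts are illustrative only.
log10_divisions = [8, 9, 10, 11, 12]
abnormal_counts = [30, 22, 12, 9, 4]
intercept, slope = fit_poisson(log10_divisions, abnormal_counts)
inverse_relationship = slope < 0
```

A negative slope means that the expected count falls multiplicatively with each unit increase in the covariate, which is the direction of effect described in the text.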
Conclusions and future perspectives
The resource reported in this paper and its companion papers has
yielded insights into the nature and timing of the many mutational
processes that shape large- and small-scale somatic variation in the
cancer genome; the patterns of selection that act on these varia-
tions; the widespread effect of somatic variants on transcription;
the complementary roles of the coding and non-coding genome for
both germline and somatic mutations; the ubiquity of intratumoral
heterogeneity; and the distinctive evolutionary trajectory of each
cancer type. Many of these insights can be obtained only from an
integrated analysis of all classes of somatic mutation on a whole-
genome scale, and would not be accessible with, for example, targeted
exome sequencing.
The promise of precision medicine is to match patients to targeted
therapies using genomics. A major barrier to its evidence-based imple-
mentation is the daunting heterogeneity of cancer chronicled in these
papers, from tumour type to tumour type, from patient to patient, from
clone to clone and from cell to cell. Building meaningful clinical predic-
tors from genomic data can be achieved, but will require knowledge
banks comprising tens of thousands of patients with comprehensive
clinical characterization83. As these sample sizes will be too large for
any single funding agency, pharmaceutical company or health system,
international collaboration and data sharing will be required. The next
phase of ICGC, ICGC-ARGO (https://www.icgc-argo.org/), will bring
the cancer genomics community together with healthcare providers,
pharmaceutical companies, data science and clinical trials groups to
build comprehensive knowledge banks of clinical outcome and treat-
ment data from patients with a wide variety of cancers, matched with
detailed molecular profiling.
Extending the story begun by TCGA, ICGC and other cancer genom-
ics projects, the PCAWG has brought us closer to a comprehensive
narrative of the causal biological changes that drive cancer phenotypes.
We must now translate this knowledge into sustainable, meaningful
clinical treatments.
Online content
Any methods, additional references, Nature Research reporting sum-
maries, source data, extended data, supplementary information,
acknowledgements, peer review information; details of author con-
tributions and competing interests; and statements of data and code
availability are available at https://doi.org/10.1038/s41586-020-1969-6.
1. Pleasance, E. D. et al. A comprehensive catalogue of somatic mutations from a human
cancer genome. Nature 463, 191–196 (2010).
2. Pleasance, E. D. et al. A small-cell lung cancer genome with complex signatures of
tobacco exposure. Nature 463, 184–190 (2010).
3. Ley, T. J. et al. DNA sequencing of a cytogenetically normal acute myeloid leukaemia
genome. Nature 456, 66–72 (2008).
4. Rheinbay, E. et al. Analyses of non-coding somatic drivers in 2,693 cancer whole
genomes. Nature https://doi.org/10.1038/s41586-020-1965-x (2020).
5. Alexandrov, L. B. et al. The repertoire of mutational signatures in human cancer. Nature
https://doi.org/10.1038/s41586-020-1943-3 (2020).
6. Li, Y. et al. Patterns of somatic structural variation in human cancer genomes. Nature
https://doi.org/10.1038/s41586-019-1913-9 (2020).
7. Gerstung, M. et al. The evolutionary history of 2,658 cancers. Nature
https://doi.org/10.1038/s41586-019-1907-7 (2020).
8. PCAWG Transcriptome Core Group et al. Genomic basis of RNA alterations in cancer.
Nature https://doi.org/10.1038/s41586-020-1970-0 (2020).
9. Zhang, Y. et al. High-coverage whole-genome analysis of 1,220 cancers reveals
hundreds of genes deregulated by rearrangement-mediated cis-regulatory
alterations. Nat. Commun. https://doi.org/10.1038/s41467-019-13885-w (2020).
10. Rodriguez-Martin, B. et al. Pan-cancer analysis of whole genomes identifies driver
rearrangements promoted by LINE-1 retrotransposition. Nat. Genet.
https://doi.org/10.1038/s41588-019-0562-0 (2020).
11. Zapatka, M. et al. The landscape of viral associations in human cancers. Nat. Genet.
https://doi.org/10.1038/s41588-019-0558-9 (2020).
12. Jiao, W. et al. A deep learning system can accurately classify primary and metastatic
cancers based on patterns of passenger mutations. Nat. Commun.
https://doi.org/10.1038/s41467-019-13825-8 (2020).
13. Sieverling, L. et al. Genomic footprints of activated telomere maintenance mechanisms
in cancer. Nat. Commun. https://doi.org/10.1038/s41467-019-13824-9 (2020).
14. Yuan, Y. et al. Comprehensive molecular characterization of mitochondrial genomes in
human cancers. Nat. Genet. https://doi.org/10.1038/s41588-019-0557-x (2020).
15. Akdemir, K. C. et al. Chromatin folding domains disruptions by somatic genomic
rearrangements in human cancers. Nat. Genet.
https://doi.org/10.1038/s41588-019-0564-y (2020).
16. Reyna, M. A. et al. Pathway and network analysis of more than 2,500 whole cancer
genomes. Nat. Commun. https://doi.org/10.1038/s41467-020-14351-8 (2020).
17. Bailey, M. H. et al. Retrospective evaluation of whole exome and genome mutation calls
in 746 cancer samples. Nat. Commun. (2020).
18. Cortes-Ciriano, I. et al. Comprehensive analysis of chromothripsis in 2,658 human
cancers using whole-genome sequencing. Nat. Genet.
https://doi.org/10.1038/s41588-019-0576-7 (2020).
19. Bray, F., Ren, J.-S., Masuyer, E. & Ferlay, J. Global estimates of cancer prevalence for 27
sites in the adult population in 2008. Int. J. Cancer 132, 1133–1145 (2013).
20. Tarver, T. Cancer Facts & Figures 2012. American Cancer Society (ACS). J. Consum. Health
Internet 16, 366–367 (2012).
21. Hanahan, D. & Weinberg, R. A. Hallmarks of cancer: the next generation. Cell 144,
646–674 (2011).
22. International Cancer Genome Consortium. International network of cancer genome
projects. Nature 464, 993–998 (2010).
23. Bailey, M. H. et al. Comprehensive characterization of cancer driver genes and mutations.
Cell 173, 371–385 (2018).
24. Sanchez-Vega, F. et al. Oncogenic signaling pathways in The Cancer Genome Atlas. Cell
173, 321–337 (2018).
25. Hoadley, K. A. et al. Cell-of-origin patterns dominate the molecular classification of
10,000 tumors from 33 types of cancer. Cell 173, 291–304 (2018).
26. Stein, L. D., Knoppers, B. M., Campbell, P., Getz, G. & Korbel, J. O. Data analysis: create a
cloud commons. Nature 523, 149–151 (2015).
27. Phillips, M. et al. Genomics: data sharing needs international code of conduct. Nature
https://doi.org/10.1038/d41586-020-00082-9 (2020).
28. Krochmalski, J. Developing with Docker (Packt Publishing, 2016).
29. Welch, J. S. et al. The origin and evolution of mutations in acute myeloid leukemia. Cell
150, 264–278 (2012).
30. Nik-Zainal, S. et al. Landscape of somatic mutations in 560 breast cancer whole-genome
sequences. Nature 534, 47–54 (2016).
31. Meier, B. et al. C. elegans whole-genome sequencing reveals mutational signatures
related to carcinogens and DNA repair deficiency. Genome Res. 24, 1624–1636 (2014).
32. Martincorena, I. et al. Universal patterns of selection in cancer and somatic tissues. Cell
171, 1029–1041 (2017).
33. Tamborero, D. et al. Cancer Genome Interpreter annotates the biological and clinical
relevance of tumor alterations. Genome Med. 10, 25 (2018).
34. Huang, F. W. et al. Highly recurrent TERT promoter mutations in human melanoma.
Science 339, 957–959 (2013).
35. Rheinbay, E. et al. Recurrent and functional regulatory mutations in breast cancer. Nature
547, 55–60 (2017).
36. Fredriksson, N. J., Ny, L., Nilsson, J. A. & Larsson, E. Systematic analysis of noncoding
somatic mutations and gene expression alterations across 14 tumor types. Nat. Genet.
46, 1258–1263 (2014).
37. Horn, S. et al. TERT promoter mutations in familial and sporadic melanoma. Science 339,
959–961 (2013).
38. Ciriello, G. et al. Emerging landscape of oncogenic signatures across human cancers.
Nat. Genet. 45, 1127–1133 (2013).
39. Rahman, N. Realizing the promise of cancer predisposition genes. Nature 505, 302–308
(2014).
40. Pearl, L. H., Schierz, A. C., Ward, S. E., Al-Lazikani, B. & Pearl, F. M. G. Therapeutic
opportunities within the DNA damage response. Nat. Rev. Cancer 15, 166–180 (2015).
41. Taylor-Weiner, A. et al. DeTiN: overcoming tumor-in-normal contamination. Nat. Methods
15, 531–534 (2018).
42. Fujimoto, A. et al. Whole-genome mutational landscape and characterization of
noncoding and structural mutations in liver cancer. Nat. Genet. 48, 500–509 (2016).
43. Shlush, L. I. Age-related clonal hematopoiesis. Blood 131, 496–504 (2018).
44. Northcott, P. A. et al. The whole-genome landscape of medulloblastoma subtypes.
Nature 547, 311–317 (2017).
45. Scarpa, A. et al. Whole-genome landscape of pancreatic neuroendocrine tumours.
Nature 543, 65–71 (2017).
46. Davis, C. F. et al. The somatic genomic landscape of chromophobe renal cell carcinoma.
Cancer Cell 26, 319–330 (2014).
47. Berger, M. F. et al. The genomic complexity of primary human prostate cancer. Nature
470, 214–220 (2011).
48. Baca, S. C. et al. Punctuated evolution of prostate cancer genomes. Cell 153, 666–677 (2013).
49. Nik-Zainal, S. et al. The life history of 21 breast cancers. Cell 149, 994–1007 (2012).
50. Nik-Zainal, S. et al. Mutational processes molding the genomes of 21 breast cancers. Cell
149, 979–993 (2012).
51. Roberts, S. A. et al. Clustered mutations in yeast and in human cancers can arise from
damaged long single-strand DNA regions. Mol. Cell 46, 424–435 (2012).
52. Rausch, T. et al. Genome sequencing of pediatric medulloblastoma links catastrophic
DNA rearrangements with TP53 mutations. Cell 148, 59–71 (2012).
53. Stephens, P. J. et al. Massive genomic rearrangement acquired in a single catastrophic
event during cancer development. Cell 144, 27–40 (2011).
54. Korbel, J. O. & Campbell, P. J. Criteria for inference of chromothripsis in cancer genomes.
Cell 152, 1226–1236 (2013).
55. Zhang, C.-Z. et al. Chromothripsis from DNA damage in micronuclei. Nature 522, 179–184
(2015).
56. The Cancer Genome Atlas Research Network. Integrated genomic characterization of
papillary thyroid carcinoma. Cell 159, 676–690 (2014).
57. Supek, F. & Lehner, B. Clustered mutation signatures reveal that error-prone DNA repair
targets mutations to active genes. Cell 170, 534–547 (2017).
58. Mardin, B. R. et al. A cell-based model system links chromothripsis with hyperploidy. Mol.
Syst. Biol. 11, 828 (2015).
59. Weischenfeldt, J. et al. Integrative genomic analyses reveal an androgen-driven somatic
alteration landscape in early-onset prostate cancer. Cancer Cell 23, 159–170 (2013).
60. Garsed, D. W. et al. The architecture and evolution of cancer neochromosomes. Cancer
Cell 26, 653–667 (2014).
61. Durinck, S. et al. Temporal dissection of tumorigenesis in primary cancers. Cancer Discov.
1, 137–143 (2011).
62. Hayward, N. K. et al. Whole-genome landscapes of major melanoma subtypes. Nature
545, 175–180 (2017).
63. The Cancer Genome Atlas Network. Genomic classification of cutaneous melanoma. Cell
161, 1681–1696 (2015).
64. Alexandrov, L. B. et al. Signatures of mutational processes in human cancer. Nature 500,
415–421 (2013).
65. Chan, K. et al. An APOBEC3A hypermutation signature is distinguishable from the
signature of background mutagenesis by APOBEC3B in human cancers. Nat. Genet. 47,
1067–1072 (2015).
66. Nik-Zainal, S. et al. Association of a germline copy number polymorphism of APOBEC3A
and APOBEC3B with burden of putative APOBEC-dependent mutations in breast cancer.
Nat. Genet. 46, 487–491 (2014).
67. Middlebrooks, C. D. et al. Association of germline variants in the APOBEC3 region with
cancer risk and enrichment with APOBEC-signature mutations in tumors. Nat. Genet. 48,
1330–1338 (2016).
68. Westra, H.-J. et al. Systematic identification of trans eQTLs as putative drivers of known
disease associations. Nat. Genet. 45, 1238–1243 (2013).
69. Stranger, B. E. et al. Population genomics of human gene expression. Nat. Genet. 39,
1217–1224 (2007).
70. Menghi, F. et al. The tandem duplicator phenotype as a distinct genomic configuration in
cancer. Proc. Natl Acad. Sci. USA 113, E2373–E2382 (2016).
71. Hendrich, B., Hardeland, U., Ng, H. H., Jiricny, J. & Bird, A. The thymine glycosylase MBD4
can bind to the product of deamination at methylated CpG sites. Nature 401, 301–304
(1999).
72. Lee, E. et al. Landscape of somatic retrotransposition in human cancers. Science 337,
967–971 (2012).
73. Tubio, J. M. C. et al. Extensive transduction of nonrepetitive DNA mediated by L1
retrotransposition in cancer genomes. Science 345, 1251343 (2014).
74. Helman, E. et al. Somatic retrotransposition in human cancer revealed by whole-genome
and exome sequencing. Genome Res. 24, 1053–1063 (2014).
75. Shay, J. W. & Wright, W. E. Hayflick, his limit, and cellular ageing. Nat. Rev. Mol. Cell Biol. 1,
72–76 (2000).
76. Peifer, M. et al. Telomerase activation by genomic rearrangements in high-risk
neuroblastoma. Nature 526, 700–704 (2015).
77. Totoki, Y. et al. Trans-ancestry mutational landscape of hepatocellular carcinoma
genomes. Nat. Genet. 46, 1267–1273 (2014).
78. Paterlini-Bréchot, P. et al. Hepatitis B virus-related insertional mutagenesis occurs
frequently in human liver cancers and recurrently targets human telomerase gene.
Oncogene 22, 3911–3916 (2003).
79. Heaphy, C. M. et al. Prevalence of the alternative lengthening of telomeres telomere
maintenance mechanism in human cancer subtypes. Am. J. Pathol. 179, 1608–1615
(2011).
80. Barthel, F. P. et al. Systematic analysis of telomere length and somatic alterations in 31
cancer types. Nat. Genet. 49, 349–357 (2017).
81. García-Cao, M., Gonzalo, S., Dean, D. & Blasco, M. A. A role for the Rb family of proteins in
controlling telomere length. Nat. Genet. 32, 415–419 (2002).
82. Tomasetti, C. & Vogelstein, B. Variation in cancer risk among tissues can be explained by
the number of stem cell divisions. Science 347, 78–81 (2015).
83. Gerstung, M. et al. Precision oncology for acute myeloid leukemia using a knowledge
bank approach. Nat. Genet. 49, 332–340 (2017).
84. O’Connor, B. D. et al. The Dockstore: enabling modular, community-focused sharing of
Docker-based genomics tools and workflows. F1000Res. 6, 52 (2017).
85. Zhang, J. et al. The International Cancer Genome Consortium Data Portal.
Nat. Biotechnol. 37, 367–369 (2019).
86. Miller, C. A., Qiao, Y., DiSera, T., D’Astous, B. & Marth, G. T. bam.iobio: a web-based,
real-time, sequence alignment file inspector. Nat. Methods 11, 1189 (2014).
87. Goldman, M. et al. The UCSC Xena platform for public and private cancer genomics data
visualization and interpretation. Preprint at
https://www.biorxiv.org/content/10.1101/326470v6 (2019).
88. Papatheodorou, I. et al. Expression Atlas: gene and protein expression across multiple
studies and organisms. Nucleic Acids Res. 46, D246–D251 (2018).
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in
published maps and institutional affiliations.
Open Access This article is licensed under a Creative Commons Attribution
4.0 International License, which permits use, sharing, adaptation, distribution
and reproduction in any medium or format, as long as you give appropriate
credit to the original author(s) and the source, provide a link to the Creative Commons license,
and indicate if changes were made. The images or other third party material in this article are
included in the article’s Creative Commons license, unless indicated otherwise in a credit line
to the material. If material is not included in the article’s Creative Commons license and your
intended use is not permitted by statutory regulation or exceeds the permitted use, you will
need to obtain permission directly from the copyright holder. To view a copy of this license,
visit http://creativecommons.org/licenses/by/4.0/.
© The Author(s) 2020
The ICGC/TCGA Pan-Cancer Analysis of Whole Genomes Consortium
Peter J. Campbell1,2,745 *, Gad Getz3,4,5,6,745*, Jan O. Korbel7,8,745*, Joshua M. Stuart9,745*,
Jennifer L. Jennings10,11,745, Lincoln D. Stein12,13,745*, Marc D. Perry14,15, Hardeep K. Nahal-
Bose15, B. F. Francis Ouellette16,17, Constance H. Li12,18, Esther Rheinbay3,6,19 , G. Petur
Nielsen19, Dennis C. Sgroi19, Chin-Lee Wu19, William C. Faquin19, Vikram Deshpande19, Paul
C. Boutros12,18,20,21, Alexander J. Lazar22, Katherine A. Hoadley23,24, David N. Louis19, L.
Jonathan Dursi12,25, Christina K. Yung15, Matthew H. Bailey26,27, Gordon Saksena3, Keiran M.
Raine1, Ivo Buchhalter28,29,30, Kortine Kleinheinz28,30, Matthias Schlesner28,31, Junjun Zhang15,
Wenyi Wang32, David A. Wheeler33,34, Li Ding26,27,35, Jared T. Simpson12,36, Brian D.
O’Connor15,37, Sergei Yakneen8, Kyle Ellrott38, Naoki Miyoshi39, Adam P. Butler1, Romina
Royo40, Solomon I. Shorser12, Miguel Vazquez40,41, Tobias Rausch8, Grace Tiao3, Sebastian
M. Waszak8, Bernardo Rodriguez-Martin42,43,44, Suyash Shringarpure45, Dai-Ying Wu46,
German M. Demidov47,48,49, Olivier Delaneau50,51,52, Shuto Hayashi39, Seiya Imoto39, Nina
Habermann8, Ayellet V. Segre3,53, Erik Garrison1, Andy Cafferkey7, Eva G. Alvarez42,43,44, José
María Heredia-Genestar54, Francesc Muyas47,48,49, Oliver Drechsel47,49, Alicia L.
Bruzos42,43,44, Javier Temes42,43, Jorge Zamora1,42,43,44, Adrian Baez-Ortega55, Hyung-Lae
Kim56, R. Jay Mashl27,57, Kai Ye58,59, Anthony DiBiase60, Kuan-lin Huang27,61, Ivica Letunic62,
Michael D. McLellan26,27,35, Steven J. Newhouse7, Tal Shmaya46, Sushant Kumar63,64, David C.
Wedge1,65,66, Mark H. Wright45, Venkata D. Yellapantula67,68, Mark Gerstein63,64,69, Ekta
Khurana70,71,72,73, Tomas Marques-Bonet74,75,76,77, Arcadi Navarro74,75,76, Carlos D.
Bustamante45,78, Reiner Siebert79,80, Hidewaki Nakagawa81, Douglas F. Easton82,83, Stephan
Ossowski47,48,49, Jose M. C. Tubio42,43,44, Francisco M. De La Vega45,46,78, Xavier Estivill47,84,
Denis Yuen12, George L. Mihaiescu15, Larsson Omberg85, Vincent Ferretti15,86,
Radhakrishnan Sabarinathan87,88,89, Oriol Pich87,89, Abel Gonzalez-Perez87,89, Amaro Taylor-Weiner90,
Matthew W. Fittall91, Jonas Demeulemeester91,92, Maxime Tarabichi1,91, Nicola D.
Roberts1, Peter Van Loo91,92, Isidro Cortés-Ciriano93,94,95, Lara Urban7,8, Peter Park94,95, Bin
Zhu96, Esa Pitkänen8, Yilong Li1, Natalie Saini97, Leszek J. Klimczak98, Joachim
Weischenfeldt8,99,100, Nikos Sidiropoulos100, Ludmil B. Alexandrov1,101, Raquel
Rabionet47,49,102, Georgia Escaramis47,103,104, Mattia Bosio40,47,49, Aliaksei Z. Holik47, Hana
Susak47,49, Aparna Prasad49, Serap Erkek8, Claudia Calabrese7,8, Benjamin Raeder8, Eoghan
Harrington105, Simon Mayes106, Daniel Turner106, Sissel Juul105, Steven A. Roberts107, Lei
Song96, Roelof Koster108, Lisa Mirabello96, Xing Hua96, Tomas J. Tanskanen109, Marta Tojo44,
Jieming Chen64,110, Lauri A. Aaltonen111, Gunnar Rätsch112,113,114,115,116,117, Roland F.
Schwarz7,118,119,120, Atul J. Butte121, Alvis Brazma7, Stephen J. Chanock96, Nilanjan
Chatterjee122,123, Oliver Stegle7,8,124, Olivier Harismendy125, G. Steven Bova126, Dmitry A.
Gordenin97, David Haan9, Lina Sieverling127,128, Lars Feuerbach127, Don Chalmers129, Yann
Joly130, Bartha Knoppers130, Fruzsina Molnár-Gábor131, Mark Phillips130, Adrian
Thorogood130, David Townend130, Mary Goldman132, Nuno A. Fonseca7,133, Qian Xiang15,
Brian Craft132, Elena Piñeiro-Yáñez134, Alfonso Muñoz7, Robert Petryszak7, Anja Füllgrabe7,
Fatima Al-Shahrour134, Maria Keays7, David Haussler132,135, John Weinstein136,137, Wolfgang
Huber8, Alfonso Valencia40,76, Irene Papatheodorou7, Jingchun Zhu132, Yu Fan32, David
Torrents40,76, Matthias Bieg138,139, Ken Chen140, Zechen Chong141, Kristian Cibulskis3, Roland
Eils28,30,142,143, Robert S. Fulton26,27,35, Josep L. Gelpi40,144, Santiago Gonzalez7,8, Ivo G. Gut49,74,
Faraz Hach145,146, Michael Heinold28,30, Taobo Hu147, Vincent Huang12, Barbara Hutter139,148,149,
Natalie Jäger28, Jongsun Jung150, Yogesh Kumar147, Christopher Lalansingh12, Ignaty
Leshchiner3, Dimitri Livitz3, Eric Z. Ma147, Yosef E. Maruvka3,19,151, Ana Milovanovic40, Morten
Muhlig Nielsen152, Nagarajan Paramasivam28,139, Jakob Skou Pedersen152,153, Montserrat
Puiggròs40, S. Cenk Sahinalp146,154,155, Iman Sarrai146,155, Chip Stewart3, Miranda D.
Stobbe49,74, Jeremiah A. Wala3,6,156, Jiayin Wang27,58,157, Michael Wendl27,158,159, Johannes
Werner28,160, Zhenggang Wu147, Hong Xue147, Takafumi N. Yamaguchi12, Venkata
Yellapantula67,68, Brandi N. Davis-Dusenbery161, Robert L. Grossman162, Youngwook
Kim163,164, Michael C. Heinold28,30, Jonathan Hinton1, David R. Jones1, Andrew Menzies1, Lucy
Stebbings1, Julian M. Hess3,151, Mara Rosenberg3,19, Andrew J. Dunford3, Manaswi Gupta3,
Marcin Imielinski165,166, Matthew Meyerson3,6,156, Rameen Beroukhim3,6,167, Jüri Reimand12,18,
Priyanka Dhingra71,73, Francesco Favero168, Stefan Dentro1,65,91, Jeff Wintersinger169,170,171,
Vasilisa Rudneva8, Ji Wan Park172, Eun Pyo Hong172, Seong Gu Heo172, André
Kahles112,113,114,115,116, Kjong-Van Lehmann112,114,115,173,174, Cameron M. Soulette37, Yuichi
Shiraishi39, Fenglin Liu175,176, Yao He175, Deniz Demircioğlu177,178, Natalie R.
Davidson112,114,115,117,173, Liliana Greger7, Siliang Li179,180, Dongbing Liu179,180, Stefan G.
Stark115,173,181,182, Fan Zhang175, Samirkumar B. Amin183,184,185, Peter Bailey186, Aurélien
Chateigner15, Milana Frenkel-Morgenstern187, Yong Hou179,180, Matthew R. Huska118, Helena
Kilpinen188, Fabien C. Lamaze12, Chang Li179,180, Xiaobo Li179,180, Xinyue Li179, Xingmin Liu179,180,
Maximillian G. Marin37, Julia Markowski118, Tannistha Nandi189, Akinyemi I. Ojesina190,191,192,
Qiang Pan-Hammarström179,193, Peter J. Park94,95, Chandra Sekhar Pedamallu3,6,167, Hong
Su179,180, Patrick Tan189,194,195,196, Bin Tean Teh194,195,196,197,198, Jian Wang179, Heng Xiong179,180,
Chen Ye179,180, Christina Yung15, Xiuqing Zhang179, Liangtao Zheng175, Shida Zhu179,180, Philip
Awadalla12,13, Chad J. Creighton199, Kui Wu179,180, Huanming Yang179, Jonathan Göke177,200,
Zemin Zhang175,201, Angela N. Brooks3,37,156, Matthew W. Fittall91, Iñigo Martincorena1, Carlota
Rubio-Perez87,89,202, Malene Juul152, Steven Schumacher3,203, Ofer Shapira3,156, David
Tamborero87,89, Loris Mularoni87,89, Henrik Hornshøj152, Jordi Deu-Pons89,204, Ferran
Muiños87,89, Johanna Bertl152,205, Qianyun Guo153, Abel Gonzalez-Perez87,89,206, Qian Xiang207,
Wojciech Bazant7, Elisabet Barrera7, Sultan T. Al-Sedairy208, Axel Aretz209, Cindy Bell210,
Miguel Betancourt211, Christiane Buchholz212, Fabien Calvo213, Christine Chomienne214,
Michael Dunn215, Stuart Edmonds216, Eric Green217, Shailja Gupta218, Carolyn M. Hutter217,
Karine Jegalian219, Nic Jones220, Youyong Lu221,222,223, Hitoshi Nakagama224, Gerd
Nettekoven225, Laura Planko225, David Scott220, Tatsuhiro Shibata226,227, Kiyo Shimizu228,
Michael R. Stratton1, Takashi Yugawa228, Giampaolo Tortora229,230, K. VijayRaghavan218, Jean
C. Zenklusen231, David Townend232, Bartha M. Knoppers130, Brice Aminou15, Javier
Bartolome40, Keith A. Boroevich81,233, Rich Boyce7, Alex Buchanan38, Niall J. Byrne15,
Zhaohong Chen234, Sunghoon Cho235, Wan Choi236, Peter Clapham1, Michelle T. Dow234,
Lewis Jonathan Dursi12,25, Juergen Eils142,143, Claudiu Farcas234, Nodirjon Fayzullaev15, Paul
Flicek7, Allison P. Heath237, Oliver Hofmann238, Jongwhi H. Hong239, Thomas J. Hudson240,241,
Daniel Hübschmann30,120,142,242,243, Sinisa Ivkovic244, Seung-Hyup Jeon236, Wei Jiao12, Rolf
Kabbe28, Andre Kahles112,113,114,115,174, Jules N. A. Kerssemakers28, Hyunghwan Kim236, Jihoon
Kim245, Michael Koscher246, Antonios Koures234, Milena Kovacevic244, Chris Lawerenz143, Jia
Liu247, Sanja Mijalkovic244, Ana Mijalkovic Mijalkovic-Lazic244, Satoru Miyano39, Mia
Nastic244, Jonathan Nicholson1, David Ocana7, Kazuhiro Ohi39, Lucila Ohno-Machado234,
Todd D. Pihl248, Manuel Prinz28, Petar Radovic244, Charles Short7, Heidi J. Soia217, Jonathan
Spring162, Adam J. Struck38, Nebojsa Tijanic244, David Vicente40, Zhining Wang231, Ashley
Williams234, Youngchoon Woo236, Adam J. Wright12, Liming Yang231, Mark P. Hamilton249,
Todd A. Johnson233, Abdullah Kahraman250,251,252, Manolis Kellis3,253, Paz Polak3,4,6, Richard
Sallari3, Nasa Sinnott-Armstrong3,45, Christian von Mering252,254, Sergi Beltran49,74, Daniela
S. Gerhard255, Marta Gut49,74, Jean-Rémi Trotta74, Justin P. Whalley74, Beifang Niu256,
Shadrielle M. G. Espiritu12, Shengjie Gao179, Yi Huang157,257, Christopher M. Lalansingh12, Jon
W. Teague1, Michael C. Wendl27,158,159, Federico Abascal1, Gary D. Bader13, Pratiti
Bandopadhayay3,258,259, Jonathan Barenboim12, Søren Brunak260,261, Joana Carlevaro-
Fita262,263,264, Dimple Chakravarty265,266, Calvin Wing Yiu Chan28,128, Jung Kyoon Choi267, Klev
Diamanti268, J. Lynn Fink40,269, Joan Frigola204, Carlo Gambacorti-Passerini270, Dale W.
Garsed271, Nicholas J. Haradhvala3,19, Arif O. Harmanci64,272, Mohamed Helmy170, Carl
Herrmann28,30,273, Asger Hobolth153,205, Ermin Hodzic155, Chen Hong127,128, Keren Isaev12,18,
Jose M. G. Izarzugaza260, Rory Johnson263,274, Randi Istrup Juul152, Jaegil Kim3, Jong K.
Kim275, Jan Komorowski268,276, Andrés Lanzós263,264,274, Erik Larsson112, Donghoon Lee64,
Shantao Li64, Xiaotong Li64, Ziao Lin3,277, Eric Minwei Liu71,73,278, Lucas Lochovsky63,64,185,
Shaoke Lou63,64, Tobias Madsen152, Kathleen Marchal279,280, Alexander Martinez-
Fundichely71,72,73, Patrick D. McGillivray63, William Meyerson64,281, Marta Paczkowska12,
Keunchil Park282,283, Kiejung Park284, Tirso Pons285, Sergio Pulido-Tamayo279,280, Iker Reyes-Salazar87,
Matthew A. Reyna286, Mark A. Rubin274,287,288,289,290, Leonidas Salichos63,64, Chris