Mapping the Hallmarks of Lung Adenocarcinoma with Massively Parallel Sequencing
Lung adenocarcinoma, the most common subtype of non-small cell lung cancer, is responsible for more than 500,000 deaths per year worldwide. Here, we report exome and genome sequences of 183 lung adenocarcinoma tumor/normal DNA pairs. These analyses revealed a mean exonic somatic mutation rate of 12.0 events/megabase and identified the majority of genes previously reported as significantly mutated in lung adenocarcinoma. In addition, we identified statistically recurrent somatic mutations in the splicing factor gene U2AF1 and truncating mutations affecting RBM10 and ARID1A. Analysis of nucleotide context-specific mutation signatures grouped the sample set into distinct clusters that correlated with smoking history and alterations of reported lung adenocarcinoma genes. Whole-genome sequence analysis revealed frequent structural rearrangements, including in-frame exonic alterations within EGFR and SIK2 kinases. The candidate genes identified in this study are attractive targets for biological characterization and therapeutic targeting of lung adenocarcinoma.
Alice H. Berger,
Peter S. Hammerman,
Trevor J. Pugh,
Michael S. Lawrence,
Luc de Waal,
Jürgen Wolf,
Bruce E. Johnson,
Pasi A. Ja
Vincent A. Miller,
William D. Travis,
Harvey I. Pass,
Stacey B. Gabriel,
Eric S. Lander,
Roman K. Thomas,
Levi A. Garraway,
and Matthew Meyerson
Broad Institute of Harvard and MIT, 7 Cambridge Center, Cambridge, MA 02142, USA
Department of Pathology, Massachusetts General Hospital, 55 Fruit Street, Boston, MA 02114, USA
Department of Pathology
Department of Systems Biology
Harvard Medical School, 25 Shattuck Street, Boston, MA 02115, USA
Department of Medical Oncology, Dana Farber Cancer Institute, 450 Brookline Avenue, Boston, MA 02115, USA
Samsung Research Institute, Samsung Medical Center, Seoul 135-967, Republic of Korea
Department of Pathology
Department of Cardiothoracic Surgery
Langone Medical Center, New York University, New York, NY 10016, USA
Department of Internal Medicine and Center for Integrated Oncology
Laboratory of Translational Cancer Genomics
Köln-Bonn, University of Cologne, 50924 Cologne, Germany
Max Planck Institute for Neurological Research, 50924 Cologne, Germany
Thoracic Oncology Service, Department of Medicine
Department of Pathology
Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
Division of Hematology/Oncology, Vanderbilt-Ingram Cancer Center, Nashville, TN 37232, USA
Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02142, USA
Department of Translational Genomics, University of Cologne, Weyertal 115b, 50931 Cologne, Germany
Department of Pathology, University of Cologne, Kerpener Strasse 62, 50937 Cologne, Germany
These authors contributed equally to this work
Lung cancer is a leading cause of death worldwide, resulting in
more than 1.3 million deaths per year, of which more than 40%
are lung adenocarcinomas (World Health Organization, 2012;
Travis, 2002). Most often, tumors are discovered as locally
advanced or metastatic disease, and despite improvements in
molecular diagnosis and targeted therapies, the average 5 year
Cell 150, 1107–1120, September 14, 2012 ©2012 Elsevier Inc.
survival rate for lung adenocarcinoma is 15% (Minna and
Molecular genotyping is now routinely used to guide clinical
care of lung adenocarcinoma patients, largely due to clinical
trials that demonstrated superior efﬁcacy of targeted kinase
inhibitors as compared to standard chemotherapy for patients
with EGFR mutations or ALK fusions (Kwak et al., 2010; Pao
and Chmielecki, 2010). In addition to EGFR and ALK alterations
found in 15% of U.S. cases, lung adenocarcinomas frequently
harbor activating mutations in KRAS, BRAF, ERBB2, and
PIK3CA or translocations in RET and ROS1 (Pao and Hutchinson,
2012), all of which are being pursued as targets in ongoing
clinical trials (http://clinicaltrials.gov/). Lung adenocarcinomas
also often harbor loss-of-function mutations and deletions in
tumor suppressor genes TP53, STK11, RB1, NF1, CDKN2A,
SMARCA4, and KEAP1 (Ding et al., 2008; Kan et al., 2010;
Sanchez-Cespedes et al., 2002). Unfortunately, such alterations are
difﬁcult to exploit therapeutically. Therefore, knowledge of addi-
tional genes altered in lung adenocarcinoma is needed to further
guide diagnosis and treatment.
Previous efforts in lung adenocarcinoma genome character-
ization include array-based proﬁling of copy number changes
(Tanaka et al., 2007; Weir et al., 2007), targeted sequencing of
candidate protein-coding genes (Ding et al., 2008; Kan et al.,
2010), and whole-genome sequencing of a single tumor/normal
pair (Ju et al., 2012; Lee et al., 2010). These studies identiﬁed
somatic focal ampliﬁcations of NKX2-1, substitutions and copy
number alterations in known oncogenes and tumor suppressor
genes, and recurrent in-frame fusions of KIF5B and RET. These
studies have also nominated several putative cancer genes with
somatic mutations (EPHA family, NTRK family, TLR4, LPHN3,
and GLI), but the functional consequence of many alter-
ations is unknown. A recent study describing whole-exome
sequencing of 16 lung adenocarcinomas (Liu et al., 2012)
enumerated several mutated genes but did not identify genes
undergoing positive selection for mutation in the studied tumors.
In this study, we used next-generation sequencing to se-
quence the exomes and/or genomes of DNA from 183 lung
adenocarcinomas and matched normal adjacent tissue pairs.
In addition to verifying genes with frequent somatic alteration
in previous studies of lung adenocarcinoma, we identified novel
mutated genes that show statistical evidence of selection and
likely contribute to pathogenesis. Together, these data represent
a signiﬁcant advance toward a comprehensive annotation of
somatic alterations in lung adenocarcinoma.
Patient Cohort Description
We sequenced DNA from 183 lung adenocarcinomas and
matched normal tissues by using paired-end massively parallel
sequencing technology (Bentley et al., 2008). The cohort in-
cluded 27 never-smokers, 17 light smokers (deﬁned by less
than ten pack years of tobacco use), 118 heavy smokers (more
than ten pack years), and 21 patients of unknown smoking status
(Table 1). The cohort included 90 stage I, 36 stage II, 22 stage III,
and 10 stage IV lung adenocarcinoma cases, as well as 25
patients with unknown stage. All tumors were chemotherapy-
naive, primary resection specimens except for one case with
whole-genome sequence data (LU-A08-43) that was a postche-
motherapy metastatic tumor from a never-smoker. Sample
acquisition details are provided in Extended Experimental
Procedures. Additional clinical descriptors of the cohort are
provided in Table 1. Comprehensive clinical and histopatholog-
ical annotations, sequence characteristics, and major variants
for each patient in the study are provided in Table S1 (available
Mutation Detection and Validation
We examined 183 lung adenocarcinoma tumor/normal pairs with
a combination of whole-exome sequencing (WES) or whole-
genome sequencing (WGS): 159 WES, 23 WES and WGS, and
1 WGS only. Exomes were sequenced to a median fold coverage
of 92 (range: 51–201) on 36.6 Mb of target sequence (Fisher
et al., 2011). Genomes were sequenced to a median coverage
of 69 (range: 25–103) in the tumor and 36 (range: 28–55) in the
normal, with the higher tumor coverage to adjust for stromal
contamination. Complementary SNP array analysis of 183 pairs
was used to detect genome-wide somatic copy number alter-
ations. See Table 2 and Extended Experimental Procedures for
Table 1. Summary of Clinical Features
Age at Surgery (Median; Range)       66 (36–87)
Smoking Status (AJCC 7th Edition)
smoker >10 years                     118
smoker ≤10 years                     17
pack years (median; range)           30 (0–128)
follow-up available                  135
follow-up unavailable                48
PFS in months (median; range)        9 (0–63)
Distribution of selected clinical variables from 183 lung adenocarcinoma

We identified somatic substitutions and small insertions
and deletions (indels) through statistical comparison of paired
tumor/normal sequence data by using algorithms calibrated
for stromally contaminated cancer tissues (Banerji et al., 2012;
Stransky et al., 2011) (www.broadinstitute.org/cancer/cga;
Extended Experimental Procedures). Exonic regions of the 183
cases contained 77,736 somatic variants corresponding to
a median of 8.1 mutations/Mb and a mean of 11.9 mutations/Mb
(range: 0.04–117.4). These comprised 43,813 missense,
14,801 silent, 3,504 nonsense, 1,460 splice-site, 2,310 deletions,
839 insertions, and 11,009 other mutations (predominantly
residing in 5′ and 3′ untranslated regions [UTRs]). Of
the 3,149 indels, 182 were in-frame, 1,785 were predicted to
cause a frame shift, 68 occurred at a splice site, and 1,114
were otherwise classified.
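The category counts reported above can be cross-checked with simple arithmetic. The following is a minimal sketch that only re-adds the numbers quoted in the text; the dictionary keys are illustrative names chosen here, not identifiers from the study's pipeline.

```python
# Exonic somatic variant counts by category, as quoted in the text.
category_counts = {
    "missense": 43_813,
    "silent": 14_801,
    "nonsense": 3_504,
    "splice_site": 1_460,
    "deletion": 2_310,
    "insertion": 839,
    "other": 11_009,
}

# Indel breakdown by predicted consequence, as quoted in the text.
indel_classes = {
    "in_frame": 182,
    "frameshift": 1_785,
    "splice_site": 68,
    "other": 1_114,
}

total_variants = sum(category_counts.values())
total_indels = category_counts["deletion"] + category_counts["insertion"]
indel_class_total = sum(indel_classes.values())

# Print each category with its share of all exonic variants.
for name, count in category_counts.items():
    print(f"{name:12s} {count:7,d} {count / total_variants:6.1%}")
print("total variants:", total_variants)
print("indels:", total_indels, "classified indels:", indel_class_total)
```

Both totals agree with the figures in the paragraph (77,736 variants and 3,149 indels).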
Mutation calls were validated by cross-comparison of coding
mutations detected by WES and WGS from 24 cases with both
data types. We validated 84% of 380 indels and 97% of 9,354
substitutions identiﬁed by WGS at sufﬁciently powered sites in
the corresponding WES tumor sample. In the converse analysis,
we validated 86% of 338 indels and 98% of 8,912 substitutions
from WES at sufﬁciently powered sites in the corresponding
WGS tumor sample (Figure S1, Table S2A, and Extended Exper-
imental Procedures). To validate mutations from cases with only
WES data, we randomly selected 69 candidate mutations for
ultradeep (>1,000-fold) targeted resequencing. Somatic status
was conﬁrmed for 30 of 33 (91%) indel events and 33 of 36
(92%) substitution events (Table S2B). These validation rates
generally meet or exceed those reported in similar sequencing
studies (Banerji et al., 2012; Berger et al., 2012; Gerlinger
et al., 2012; Nikolaev et al., 2012; Stransky et al., 2011; Cancer
Genome Atlas Research Network, 2011; Totoki et al., 2011;
Zang et al., 2012).
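As a rough illustration of how much uncertainty surrounds validation rates based on a few dozen events, the sketch below recomputes the two targeted-resequencing rates and attaches a Wilson score interval to each. The interval is an addition made here for illustration; the text reports only the point estimates.

```python
import math

def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    p = successes / trials
    denom = 1 + z ** 2 / trials
    center = (p + z ** 2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z ** 2 / (4 * trials ** 2)) / denom
    return center - half, center + half

# Confirmed / tested events from the ultradeep resequencing experiment.
validated = {"indels": (30, 33), "substitutions": (33, 36)}

for label, (confirmed, tested) in validated.items():
    low, high = wilson_interval(confirmed, tested)
    rate = confirmed / tested
    print(f"{label}: {confirmed}/{tested} = {rate:.0%} (95% CI {low:.0%} to {high:.0%})")
```

With only 33 and 36 events, the intervals are wide, which is worth keeping in mind when comparing these rates against other studies.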
Somatic Genetic Signatures of Mutagen Exposure in
Consistent with previous studies (Ding et al., 2008; Kan et al.,
2010; Zang et al., 2012), we observed signiﬁcantly higher exonic
mutation rates in tumors from smokers (median: 9.8/Mb;
mean: 12.9/Mb; range: 0.04–117.4/Mb) compared to never-
smokers (median: 1.7/Mb; mean: 2.9/Mb; range: 0.07–22.1/Mb;
p = 3.0 × 10^[…], Wilcoxon rank sum test). Lung adenocarcinoma
mutation rates in our cohort exceeded those reported for other
epithelial tumor types, except melanoma and squamous cell
lung cancer (Hodis et al., 2012; Nikolaev et al., 2012; Cancer
Genome Atlas Research Network, 2012; Wei et al., 2011).
To characterize the mutation spectrum of lung adenocarci-
noma, we analyzed somatic substitutions and covered bases
within their trinucleotide sequence context (Figure 1A). The
most frequent mutation signatures were C→T transitions in the
setting of CpG dinucleotides (CpG→T) and C→A transversions.
The least frequent mutation type was A→C. Unbiased hierarchical
clustering of context-specific mutation rates across 182
WES cases yielded five mutation spectrum clusters. These clusters
represented grades of increasing mutational complexity:
cluster 1 was enriched for CpG→T mutations and was marked
by an overall low mutation rate; cluster 2 was characterized
by CpG→T transitions and CpG→A transversions; cluster 3
showed additional C→A transversions outside of the CpG
context; cluster 4 showed additional C→T transitions outside
of the CpG context and TpC transversions that mutated to either
a T or a G; and cluster 5 comprised hypermutated tumors containing
a broad mutational spectrum that included rare mutation
signatures, such as A→T transversions. Mutation spectrum
clusters in tumors correlated with clinical features of patients.
Cluster 1 was signiﬁcantly enriched in never- and light smokers
(p = 1.9 × 10^[…], Fisher’s exact test), whereas cluster 4 was
signiﬁcantly enriched in patients with advanced (IIIB or IV) stage
(p = 0.0063, Fisher’s exact test).
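The 96 strand-collapsed trinucleotide categories used for this clustering can be enumerated in a few lines. The sketch below assumes that substitutions are expressed with the reference base as C or A, which matches the names used in the text (for example C-to-T, C-to-A, A-to-C, and A-to-T); other conventions exist, and the function names here are illustrative rather than taken from the study's code.

```python
from itertools import product

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def reverse_complement(seq):
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def collapse(trinucleotide, alt):
    """Express a substitution of the middle base with reference C or A.

    Substitutions whose reference base is G or T are flipped to the
    opposite strand (assumed convention, see lead-in).
    """
    ref = trinucleotide[1]
    if ref in "CA":
        return trinucleotide, alt
    return reverse_complement(trinucleotide), COMPLEMENT[alt]

def all_categories():
    """All (context, alt) pairs: 4 x 2 x 4 contexts times 3 alternate bases."""
    categories = set()
    for five, ref, three in product("ACGT", "CA", "ACGT"):
        for alt in "ACGT":
            if alt != ref:
                categories.add((five + ref + three, alt))
    return categories

def simple_label(trinucleotide, alt):
    """Simplified label that separates CpG sites from other C contexts."""
    context, alt = collapse(trinucleotide, alt)
    ref = context[1]
    if ref == "C" and context[2] == "G":
        return f"CpG>{alt}"
    return f"{ref}>{alt}"

print(len(all_categories()))
print(simple_label("CGA", "A"))  # G>A at a CpG site, seen from the other strand
```

Per-sample mutation counts in each category, divided by the number of covered bases in the matching context, would then give the context-specific rates that are clustered.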
Table 2. Whole-Genome and Whole-Exome Sequencing
                                                                   Whole-Exome Capture   Whole Genome
Total tumor Gb
Median fold tumor target coverage (range)                          91 (51–201)           69 (25–103)
Median normal fold target coverage (range)                         92 (62–141)           36 (28–55)
Median somatic mutation rate per Mb in target territory (range)    6.8 (0.3–94.7)        13.3 (4.5–55.3)
Median number of coding mutations per patient (range)              216 (1–3,512)         323 (63–2,279)
Median number of […] per patient (range)                           167 (1–2,721)         248 (53–1,770)
Median number of transcribed noncoding mutations per […]           187 (13–2,559)        18,314
Total number of
Total number of
Total number of frame-
Median number of genes powered at 20% exonic
Median number of genes powered at 50% exonic
Selected sequencing statistics for 183 WES and WGS cases. “Tumor
Target Territory” refers to the exonic territory targeted by the exome
capture bait set reported by Fisher et al. (2011) and used in this study.
The “Whole-Exome Capture” column does not include data on 23 cases
analyzed by both WES and WGS.

Differentiation of smokers and never-smokers was evident
from comparison of mutation counts from the most frequent
mutational signatures, CpG→T and C→A (Figure 1B). These
results were consistent with previous reports of signatures of
DNA damage by tobacco (Hainaut and Pfeifer, 2001). Applying
thresholds to a log-adjusted ratio of CpG→T and C→A mutations
(see Experimental Procedures), we imputed smoking
status for 21 patients who lacked reported smoking history
and accurately recapitulated reported smoking status for more
than 75% of the remaining cases (Figure 1B). Exonic and intronic
mutation rates, context-specific mutation counts, imputed
smoking status, and mutation spectrum cluster assignments for
each patient are provided in Table S1.
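The imputation step described above can be sketched as a thresholded log ratio of two mutation counts. The text does not give the exact adjustment or thresholds, so the pseudocount, the base-10 logarithm, the single cutoff at 0, and the example counts below are all assumptions made for illustration.

```python
import math

def smoking_score(c_to_a, cpg_to_t, pseudocount=1.0):
    """Log10 ratio of C-to-A to CpG-to-T mutation counts.

    The pseudocount (an assumption) keeps the ratio defined when a count is 0.
    """
    return math.log10((c_to_a + pseudocount) / (cpg_to_t + pseudocount))

def impute_smoking(c_to_a, cpg_to_t, threshold=0.0):
    """Classify a tumor by comparing its score to a hypothetical threshold."""
    if smoking_score(c_to_a, cpg_to_t) > threshold:
        return "smoker"
    return "never/light smoker"

# Hypothetical per-tumor counts, not taken from Table S1.
examples = {"tumor_1": (120, 15), "tumor_2": (3, 12)}
for name, (c_to_a, cpg_to_t) in examples.items():
    score = smoking_score(c_to_a, cpg_to_t)
    print(name, round(score, 2), impute_smoking(c_to_a, cpg_to_t))
```

A tumor dominated by C-to-A transversions scores above the cutoff, while one dominated by CpG-to-T transitions scores below it, which mirrors the separation described for Figure 1B.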
Calibration of a Statistical Approach to the Analysis of
High Mutation Rate Tumors
The high mutation rates in lung adenocarcinoma and other
tumors (Hodis et al., 2012; Cancer Genome Atlas Research
Network, 2012) present a challenge for unbiased discovery of
mutated genes undergoing positive somatic selection. More
than 13,000 of 18,616 genes with adequate sequence coverage
had nonsynonymous somatic mutations in at least one tumor,
and more than 3,000 were mutated in at least ﬁve patients. These
genes included those with very large genomic footprints (e.g.,
TTN), genes with low basal expression in lung adenocarcinomas
(e.g., CSMD3), and genes accumulating high numbers of silent
substitutions (e.g., LRP1B).
Application of a standard binomial background mutation
model assuming a constant mutation rate in each patient and
nucleotide context stratum (Berger et al., 2011) yielded profound
test statistic inﬂation (Figure S2A) and identiﬁed more than 1,300
signiﬁcantly mutated genes. Genes with signiﬁcant p values in
this analysis had low basal expression in lung adenocarcinoma
cell lines (Barretina et al., 2012) (Figure S2B), harbored high
fractions of synonymous mutations, and were enriched in gene
classes previously unassociated with cancer (e.g., olfactory
receptors and solute transporters). Recalibration of this model
by limiting to genes with evidence of expression improved, but
did not completely correct, this statistical inﬂation (Figure S2C).
These results suggested a high degree of variation in neutral
somatic mutation rates among genes, including expression-
dependent variation. This observation is consistent with reports
of regional mutation rates correlated with density of H3K9 chromatin
marks across cancers (Schuster-Böckler and Lehner,
2012) and with gene expression in multiple myeloma (Chapman
et al., 2011).
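To see why a constant-rate binomial background can flag many genes, consider the simplest form of such a test. The sketch below uses one rate for the whole cohort rather than one per patient and nucleotide context, so it is a simplification of the model described above; the gene footprint and observed counts are hypothetical, and the rate is the cohort median of 8.1 mutations/Mb from the text.

```python
import math

def log_binom_pmf(k, n, p):
    """Log of P(X = k) for X ~ Binomial(n, p), via lgamma to avoid overflow."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))

def binom_upper_tail(k, n, p):
    """P(X >= k), computed as 1 - P(X < k).

    Adequate for a demonstration; it loses precision for extremely
    small tail probabilities.
    """
    if k <= 0:
        return 1.0
    lower = sum(math.exp(log_binom_pmf(i, n, p)) for i in range(k))
    return max(0.0, 1.0 - lower)

covered_bases = 1_500    # hypothetical coding footprint of one gene
patients = 183
rate_per_base = 8.1e-6   # cohort median of 8.1 mutations/Mb
trials = covered_bases * patients
expected = trials * rate_per_base

for observed in (3, 10):  # hypothetical observed mutation counts
    p_value = binom_upper_tail(observed, trials, rate_per_base)
    print(observed, round(expected, 2), p_value)
```

If the true neutral rate of a gene is several times higher than the cohort-wide rate, for example because the gene is weakly expressed, counts like the second example become unremarkable, yet this model would still report them as significant.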
[Figure 1 graphic omitted. Surviving axis and legend labels: context-specific mutation rate (mutations/Mb), scale 1 to 500; Age > 55; Age > 70; C→A mutations; CpG→T mutations]
Figure 1. Mutation Spectrum Analysis of 183 Lung Adenocarcinomas
(A) Hierarchical clustering of 183 lung adenocarcinomas according to their nucleotide context-specific exonic mutation rates. Each column represents a case,
and each row represents one of 96 strand-collapsed trinucleotide context mutation signatures. Top bar, patient-cluster membership; left bar, simplified single-nucleotide
context mutational signature; bottom bars, reported tumor stage, age, and smoking status for each patient; right gradient, mutation rate scale.
(B) Stratification of reported versus imputed smoking status by the log transform of the adjusted ratio of C→A transversion rates and CpG→T transition rates. The
color of each inner solid point represents the reported smoking status for that particular patient. The color of each outer circle indicates that patient’s imputed
smoking status as predicted by the classifier. Additional analytic details are provided in the Extended Experimental Procedures.
See also Figure S1 and Tables S2 and S3.

To more adequately model variation of neutral somatic mutation
rates among genes, we applied the InVEx algorithm (Hodis
et al., 2012) to exploit the abundant noncoding mutations detected
by both WES and WGS. InVEx permutes coding, untranslated,
and intronic mutations within covered territories of each
gene, patient, and nucleotide context to generate within-gene
null distributions of “functional impact” across a sample set
(see Experimental Procedures).
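The idea of a within-gene permutation null can be illustrated with a deliberately simplified sketch. It handles one gene in one sample, ignores nucleotide context, and scores impact as a plain sum, so it is not the InVEx implementation; the position map, impact values, and function name are hypothetical.

```python
import random

def permutation_p_value(impact_by_position, observed_positions,
                        n_permutations=10_000, seed=0):
    """Fraction of random placements whose summed impact >= the observed sum.

    impact_by_position maps every covered position in the gene (coding,
    UTR, or intronic) to the impact a mutation there would have.
    """
    rng = random.Random(seed)
    positions = list(impact_by_position)
    observed = sum(impact_by_position[pos] for pos in observed_positions)
    n_mutations = len(observed_positions)
    hits = 0
    for _ in range(n_permutations):
        sampled = rng.sample(positions, n_mutations)
        if sum(impact_by_position[pos] for pos in sampled) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)

# Hypothetical gene: 20 damaging coding positions, 80 neutral noncoding ones.
impact = {pos: (1.0 if pos < 20 else 0.0) for pos in range(100)}

p_coding = permutation_p_value(impact, [2, 7, 11])        # all three are damaging
p_noncoding = permutation_p_value(impact, [50, 60, 70])   # all three are neutral
print(p_coding, p_noncoding)
```

Because the null is built from the gene's own covered territory, a gene with a high local mutation rate accumulates noncoding mutations as well, and the test asks only whether mutations land on high-impact positions more often than chance placement would.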
Our primary InVEx analysis employed a PolyPhen-2 (PPH2)-
based metric (Adzhubei et al., 2010) to assess the functional
impact of observed and permuted mutations. Applying this anal-
ysis to 12,907 mutated genes with at least one PPH2-scored
event yielded a well-distributed test statistic with minimal inflation
(Figure S2D) and without gene expression bias in lung
adenocarcinoma cell lines (Figure S2B). To increase specificity
and power, we restricted our analysis to 7,260 genes demonstrating
expression (median Robust Multiarray Average [RMA]
value ≥5) in a panel of 40 lung adenocarcinoma cell lines (Barretina
et al., 2012), which resulted in a similarly well-calibrated test
statistic (Figure S2E).
Next, we tested for enrichment of loss-of-function (LOF) muta-
tions by considering only truncating mutations as functional and
all remaining mutation types as neutral. We applied this method
to 2,266 genes with evidence of expression in lung adenocarci-
noma cell lines and at least one truncating mutational event.
Finally, we applied both PPH2 and LOF InVEx analyses to a
focused set of Cancer Gene Census (CGC) genes expressed in
lung adenocarcinoma and mutated or ampliﬁed in one or more
Statistical Driver Analysis Yields Previously Reported
and Novel Lung Adenocarcinoma Genes
The primary PPH2 InVEx analysis yielded 13 genes with statis-
tical evidence of positive selection (q < 0.25) (Table S3A). These
included lung adenocarcinoma genes with nonsynonymous
mutation frequencies that were consistent with previous reports:
TP53 (50%), KRAS (27%), EGFR (17%), STK11 (15%), KEAP1
(12%), NF1 (11%), BRAF (8%), and SMAD4 (3%). This analysis
also uncovered five novel candidates, including CHEK2, a
gene driven by an apparent recurrent mapping artifact in three
tumors and removed from all subsequent analyses (see Extended
Experimental Procedures). The remaining candidates
were mutated at frequencies lower than most previously re-
ported genes, demonstrating the increased power of our large
sample set. The LOF InVEx yielded six signiﬁcantly mutated
genes (q < 0.25), including BRD3, which is an additional gene
not contained in the PPH2 analysis (Table S3B). The CGC-only
PPH2 and LOF analyses yielded 15 and 10 genes, respectively,
including CTNNB1, FGFR3, ATM, CBL, PIK3CA, PTEN, FBXW7,
ARID1A, and SETD2 (Tables S3C and S3D). In total, the union of
these four analyses nominated 25 genes as signiﬁcantly mutated
in our cohort (Figure 2A). Somatic coding mutations in signiﬁ-
cantly mutated genes and known lung adenocarcinoma genes
are provided in Table S1. The entire list of somatic coding muta-
tions for all covered genes is provided in Table S4.
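The q < 0.25 cutoff used in these analyses is a false discovery rate threshold. The text does not state which procedure produced the q values, so the sketch below assumes the Benjamini-Hochberg adjustment; the gene labels and p values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values (q values) in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        q_values[index] = running_min
    return q_values

# Hypothetical per-gene p values from a permutation test.
genes = ["gene_A", "gene_B", "gene_C", "gene_D"]
p_values = [0.01, 0.04, 0.03, 0.5]
q_values = benjamini_hochberg(p_values)

significant = [gene for gene, q in zip(genes, q_values) if q < 0.25]
print(list(zip(genes, q_values)))
print(significant)
```

A threshold as permissive as 0.25 accepts that up to roughly a quarter of the nominated genes may be false positives, which suits a discovery screen followed by validation.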
To compare our results with previous reports, we reviewed the
CGC and lung adenocarcinoma literature to identify genes with
evidence for functional somatic mutation in lung adeno-
carcinoma (see Extended Experimental Procedures for criteria
and references). Of the 19 genes with reported functional muta-
tions, 13 were signiﬁcantly mutated genes nominated by our
analysis (KRAS, TP53, EGFR, STK11, SMARCA4, NF1, RB1,
BRAF, KEAP1, SMAD4, CTNNB1, PIK3CA, and ATM). The alter-
ations driving the statistical enrichment of these genes included
previously reported and novel mutations (Figures S3A–S3C). The
remaining six reported lung adenocarcinoma genes (CDKN2A,
ERBB2, AKT1, NRAS, HRAS, and APC) were not signiﬁcant
in our mutation analysis (Table S3E), although we did identify
canonical driver mutations in these genes (e.g., AKT1 p.E17K,
NRAS p.Q61L) and although CDKN2A is signiﬁcantly deleted
(see Figure 2) and rearranged (see below). This may reﬂect a
power limitation of our cohort or analytic methods we applied,
particularly when identifying infrequently mutated genes such
as AKT1, NRAS, and HRAS. Also missing among our signiﬁ-
cantly mutated genes were 22 genes nominated by two previous
large-scale targeted lung adenocarcinoma sequencing studies
of similar or smaller size (Ding et al., 2008; Kan et al., 2010)
(see Extended Experimental Procedures for the complete list).
Most of these genes (20 of 22) did not pass our gene expression
ﬁlter and thus were not included in our global analysis. Targeted
analysis of these genes identiﬁed four with nominal evidence for
positive selection via PPH2 InVEx (EPHA3, LPHN3, GRM1, and
TLR4), the most signiﬁcant of these being EPHA3 (p = 0.0027,
Correlations among Alterations in Signiﬁcantly Mutated
Genes and Clinicopathologic and Genomic Features
We correlated mutation status of the 25 signiﬁcantly mutated
genes with clinical features (smoking, age, and stage), genomic
variables (mutation rate, mutation spectrum cluster, and imputed
smoking status), and presence of driver alterations in 25 genes
frequently or functionally altered in lung adenocarcinoma. These
alterations included genes with reported high frequency of
somatic mutation (e.g., KRAS) or focal ampliﬁcation (e.g.,
or deletion (e.g., TP53). High-frequency somatic copy
number alterations used for this analysis were curated from
published surveys of lung adenocarcinoma (Tanaka et al.,
2007; Weir et al., 2007). See Hallmarks Analysis in the Experi-
mental Procedures for the strict deﬁnition of driver alterations.
In our cohort, we observed gains of TERT (42% of cases, 15%
focal), MYC (31%), EGFR (22%), and NKX2-1 (18%, 10% focal).
Frequent losses were seen in TP53 (18%) and CDKN2A (24%,
10% homozygous), as well as in other signiﬁcantly mutated
genes, including SMAD4, KEAP1, and SMARCA4.
EGFR mutations were significantly anticorrelated with KRAS
mutations (p = 3.3 × 10^[…]) and somatic mutation rate (p =
5.9 × 10^[…]). EGFR mutations significantly correlated with
never-/light smoker status (p = 2.0 × 10^[…]), imputed never-/light
smoker status (p = 1.5 × 10^[…]), and membership in spectrum cluster
1 (p = 0.0015). KRAS, STK11, SMARCA4, and KEAP1 mutations
were signiﬁcantly anticorrelated with both spectrum cluster 1
and imputed never-/light smoking status (p < 0.005). These ﬁnd-
ings are consistent with reported associations (Koivunen et al.,
2008; Pao et al., 2004, 2005; Slebos et al., 1991). In addition,
NF1 mutations were signiﬁcantly depleted in spectrum cluster
) and co-occurred with U2AF1 mutations (p =
0.0011). KRAS driver alterations (including both mutations and
copy number alterations) signiﬁcantly associated with spectrum
cluster 3 (p = 0.00071). STK11 driver alterations were signiﬁ-
cantly enriched in spectrum cluster 2 (p = 0.0026). Correlation
results are graphically summarized in Figure S3D.
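Associations between two binary features, such as the mutual exclusivity of mutations in two genes, can be tested on a 2×2 table. The section does not name the test behind each p value, so the sketch below assumes a two-sided Fisher's exact test, the test cited elsewhere in the text for cluster enrichment. The cell counts are hypothetical and are not the cohort's actual overlap.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Sums the probabilities of all tables with the same margins that are
    no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    p_value = sum(prob(x) for x in range(lowest, highest + 1)
                  if prob(x) <= p_observed * (1 + 1e-9))
    return min(1.0, p_value)

# Hypothetical counts for two genes across 183 tumors:
# both mutated, first only, second only, neither.
both, first_only, second_only, neither = 0, 31, 49, 103
p_exclusive = fisher_exact_two_sided(both, first_only, second_only, neither)
print(p_exclusive)
```

The small tolerance in the comparison guards against floating-point ties between tables of equal probability.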
Finally, we screened the 25 signiﬁcantly mutated genes for as-
sociation with progression-free survival (PFS) across 135 patients
with PFS data. U2AF1 (p = 0.00011, log rank test) and TP53 muta-
tions (p = 0.0014, log rank test) were associated with signiﬁcantly
reduced survival (Figure S3E). The latter ﬁnding was consistent
with previous reports (Kosaka et al., 2009; Mitsudomi et al.,
1993). No other signiﬁcant associations with PFS were seen.
Nomination of Candidate Lung Adenocarcinoma Genes
One of the most signiﬁcantly mutated genes in this lung adeno-
carcinoma cohort was U2AF1 (p = 2.0 × 10^[…], PPH2 InVEx),
which had nonsynonymous mutations in 3% of cases (Figure 3A).
Identical c.101C>T, p.S34F mutations were seen in four of five
U2AF1 mutant cases (Figure 3A); this is the exact mutation re-
ported in myelodysplastic syndrome (MDS) (Graubert et al.,
2012; Yoshida et al., 2011). To our knowledge, this study is the
ﬁrst report of U2AF1 mutations in an epithelial tumor. One of
four p.S34F mutations occurred with an activating event in
KRAS (p.Q61H), suggesting that U2AF1 mutations may confer
tumorigenic capability independent of known proliferation-
sustaining driver genes. As mentioned above, four patients with
U2AF1 mutations and survival data had signiﬁcantly reduced
PFS (Figure S3E). Nonsynonymous mutations in genes encoding
other members of the spliceosome complex (including SF3B1,
U2AF2, and PRPF40B) were found in 14 additional cases (Yosh-
ida et al., 2011).
Figure 2. Somatic Mutations and Copy Number Changes in 183 Lung Adenocarcinomas
Top panel shows a summary of exonic somatic mutations in 25 significantly mutated genes (see text and Table S3 for details). Tumors are arranged from left to
right by the number of nonsilent mutations per sample, shown in the top track. Significantly mutated genes are listed vertically in decreasing order of nonsilent
mutation prevalence in the sequenced cohort. Colored rectangles indicate mutation category observed in a given gene and tumor. Bar chart (right) indicates
prevalence of each mutation category in each gene. Asterisks indicate genes significantly enriched in truncating (nonsense, frameshift) mutations. Middle bars
indicate smoking status and mutation spectrum cluster for each patient. White boxes indicate unknown status. Bottom panel shows a summary of somatic copy
number alterations derived from SNP array data. Colored rectangles indicate the copy number change seen for a given gene and tumor. See also Figure S2.

RBM10 was frequently mutated (12/183 cases; 7%) and subject
to recurrent nonsense, frameshift, or splice-site mutations,
which were present in 7 of 12 mutated cases (4% of overall
cohort) (Figure 3B). This resulted in significant enrichment in
the global PPH2 InVEx analysis (p = 0.00042) (Table S3). Like
U2AF1, RBM10 is an RNA-binding protein that is highly expressed
in lung adenocarcinoma cell lines (data not shown),
and its mutations co-occurred with those in known lung adenocarcinoma
oncogenes (KRAS, EGFR, and PIK3CA). ARID1A,
encoding a key protein in the SWI/SNF chromatin-remodeling
complex, was mutated in 8% of cases (Figure 3C) and showed
significant accumulation of nonsense substitutions and frameshift
indels (p = 0.027, CGC LOF InVEx).
Figure 3. Somatic Mutations of Lung Adenocarcinoma Candidate Genes U2AF1, RBM10, and ARID1A
(A) Schematic representation of identified somatic mutations in U2AF1 shown in the context of the known domain structure of the protein. Numbers refer to amino acid residues. Each rectangle corresponds to an independent, mutated tumor sample. Silent mutations are not shown. Missense mutations are shown in black.
(B) Schematic of somatic RBM10 mutations. Splice-site mutations are shown in purple; truncating mutations are shown in red. Other notations as in (A).
(C) Schematic of somatic ARID1A mutations. Notations as in (A) and (B).
See also Figure S3.

Whole-Genome Rearrangement Analysis Reveals Novel
and Recurrent Structural Variants
We used paired-end and split-read mapping of whole-genome
data (Banerji et al., 2012; Bass et al., 2011; Medvedev et al.,
2009) to detect and map the breakpoints of 2,349 somatic rearrangements
across 24 WGS cases. The majority of these were
intrachromosomal rearrangements (1,818 events), and the remaining
531 were interchromosomal events. Among these were 1,443 (61.4%)
genic rearrangements (i.e., in which one breakpoint was contained
within the promoter, UTR, intron, or exon of a gene) and
906 (38.6%) purely intergenic events. Lung adenocarcinomas
harbored a wide range of total rearrangements (median: 98;
range: 18–246), genic rearrangements (median: 50; range: 12–173)
(Figure 4A), and overall genome complexity (Figure S4).
The variability of rearrangement counts between cases did not
correlate with clinical variables (Figure S4
and Table S1) or mutation spectrum. Rearrangement
coordinates and interpretations are provided as Table S5.
The reading frame of affected genes
was preserved by 3% of detected rear-
rangements (71 of 2,349). These included
34 protein fusions, 13 duplications, and
24 deletions. We found 44 rearrange-
ments that fused UTRs of two genes
without affecting the protein-coding se-
quence of either gene. All 25 genic fusions
we tested were conﬁrmed by PCR and
Illumina sequencing (see Extended Experimental Procedures).
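Whether a whole-exon deletion or duplication preserves the reading frame comes down to whether the number of coding bases removed or added is a multiple of three. The sketch below shows that check under the assumption that breakpoints fall in introns, so entire exons are gained or lost; the exon lengths are hypothetical and are not taken from any gene annotation.

```python
def preserves_reading_frame(affected_coding_lengths):
    """True if the total coding bases removed or added is a multiple of 3."""
    return sum(affected_coding_lengths) % 3 == 0

# Hypothetical exon lengths in base pairs.
two_exon_deletion = [120, 87]      # 207 bases removed
single_exon_duplication = [100]    # 100 bases added

print(preserves_reading_frame(two_exon_deletion))
print(preserves_reading_frame(single_exon_duplication))
```

Gene fusions need an additional check that the codon phase at the end of the upstream partner matches the phase at the start of the downstream partner, which this sketch does not cover.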
The gene with the highest rate of rearrangements for its size
was CDKN2A (4.3 rearrangements/sequenced Mb). Two cases
had out-of-frame, antisense fusions (with MTAP and C9orf53),
and a third harbored an in-frame deletion (Figure 4B). As shown
in lung squamous cell carcinomas, rearrangements represent a
further mechanism of CDKN2A inactivation, in addition to reported
mutation, homozygous deletion, and methylation (Cancer
Genome Atlas Research Network, 2012). Additional lung adenocarcinoma
tumor suppressors affected by predicted null or
truncating rearrangements included STK11 (2.5 kb deletion
removing the translational start site) and APC (midexon rear-
rangement) (Figure 4B).
We next focused on potentially activating in-frame rearrange-
ments of kinase genes. This analysis uncovered a two-exon
deletion in EGFR, which was previously identiﬁed in glioblastoma
multiforme but is novel in lung adenocarcinoma, ablating a
portion of the C terminus of EGFR encoded by exons 25 and
26 (Figures 4B, 5A, and S5), including residues associated
with interaction with PIK3C2B (Wheeler and Domin, 2001) and
CBL (Grøvdal et al., 2004). Similar C-terminal deletion variants
(EGFR vIVb) have been previously identiﬁed in glioblastoma
(Ekstrand et al., 1992) and have been shown to be oncogenic
in cellular and animal models (Cho et al., 2011; Pines et al.,
2010). This tumor contained a second somatic alteration in
EGFR, a p.G719S mutation, suggesting possible synergy of
Cell 150, 1107–1120, September 14, 2012 ª2012 Elsevier Inc. 1113
Figure 4. Whole-Genome Sequencing of Lung Adenocarcinoma
(A) Summary of genic rearrangement types across 25 lung adenocarcinoma whole genomes. Stacked-bar plot depicting the types of somatic rearrangement
found in annotated genes by analysis of whole-genome sequence data from 25 tumor/normal pairs. The ‘‘Other Genic’’ category refers to rearrangements linking
an intergenic region to the 3′
portion of a genic footprint.
(B) Representative Circos (Krzywinski et al., 2009) plots of whole-genome sequence data with rearrangements targeting known lung adenocarcinoma genes
CDKN2A, STK11, and EGFR and novel genes MAST2, SIK2, and ROCK1. Chromosomes are arranged circularly end to end with each chromosome’s cytobands
marked in the outer ring. The inner ring displays copy number data inferred from WGS with intrachromosomal events in green and interchromosomal trans-
locations in purple.
See also Figure S4 and Table S5.
activating EGFR mutations or presence of independent, subclo-
nal activating mutations.
To assess oncogenicity of this novel EGFR variant, we ectop-
ically expressed an EGFR transgene lacking exons 25 and 26 in
NIH 3T3 cells. As has been previously observed for oncogenic
EGFR mutations, cells stably expressing this transgene demon-
strated colony formation in soft agar (Figure 5B) and increased
EGFR and AKT phosphorylation in the absence of EGF (Fig-
ure 5C). In contrast, cells expressing wild-type EGFR formed
colonies only in the presence of EGF (Figure 5B). Overexpression
of the EGFR transgene in Ba/F3 cells led to interleukin-3 inde-
pendent proliferation that was blocked by treatment with an
EGFR tyrosine kinase inhibitor, erlotinib (Figure 5D), at concen-
trations previously shown to be sufﬁcient for inhibition of acti-
vated variants of EGFR (Yuza et al., 2007).
Kinases with in-frame rearrangements in tumors without muta-
tions in lung adenocarcinoma oncogenes included SIK2 and
ROCK1 (Figure 4B). An in-frame kinase domain duplication in
SIK2 (salt-inducible kinase 2) was identiﬁed and validated by
quantitative PCR (qPCR). The duplication occurred 15 amino
[Figure 5 graphic. Labels recovered from the extraction: EGFR exons 24–27 above a protein model (residues 100–1200; extracellular, kinase, and C-terminal regions); number of colonies; cell viability (% control) versus erlotinib concentration (0–10 μM).]
Figure 5. Identiﬁcation of a Novel Lung Adenocarcinoma In-Frame Deletion in EGFR
(A) Schematic representation of reported EGFR alterations (above protein model) for comparison with a C-terminal deletion event found in this study by WGS
(below protein model). A schematic depiction of sequencing data shows the expected wild-type reads (gray) in contrast with the observed reads (black) spanning
or split by the deletion breakpoint. Supporting paired-end and split-read mapping data are shown in Figure S5.
(B) Soft agar colony forming assay of NIH 3T3 cells expressing exon 25- and 26-deleted EGFR (Ex25&26 del) or wild-type EGFR in the presence or absence of
ligand stimulation. The bar graph shows the number of colonies formed by indicated cells with or without EGF in soft agar. Data shown are mean +SD of three
replicates of a single experiment. The results are representative of three independent experiments.
(C) Ex25&26 del EGFR is constitutively active in the absence of EGF. The same NIH 3T3 cells used for the assay in (B) were subjected to immunoblotting with
anti-phospho-tyrosine (4G10), anti-EGFR, and anti-phospho-Akt (S473) antibodies. Blots were probed with anti-Akt and anti-β-actin antibodies (loading control).
(D) Cell growth induced by the oncogenic EGFR deletion mutant is suppressed by erlotinib treatment. Ba/F3 cells transformed by either L858R or Ex25&26 del
mutants were treated with increasing concentrations of erlotinib as indicated for 72 hr and were assayed for cell viability. Data shown are mean ±SD of six
replicates of a single experiment. The results are representative of three independent experiments.
See also Figure S5.
acids upstream of Thr-175, where a related kinase, SIK1, is acti-
vated by STK11 (Hashimoto et al., 2008). A 19 exon duplication
was uncovered in ROCK1, which is a serine/threonine kinase
that acts as an effector of Rho signaling (Pearce et al., 2010).
Notably, we did not identify any in-frame rearrangements
involving the known lung adenocarcinoma kinase fusion targets ALK,
RET, and ROS1. Given their reported 2%–7% frequency in
lung adenocarcinoma (Bergethon et al., 2012; Takeuchi et al.,
2012), our study of 24 tumor/normal pairs may not be large
enough to detect these rearrangements. Interestingly, an out-
of-frame ROS1-CD74 translocation was identiﬁed in a single
patient without evidence for the previously characterized recip-
rocal activating event. In-frame fusions and indels are annotated
for each WGS case in Table S1.
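The sample-size argument above can be made concrete with a simple binomial calculation. The sketch below assumes, for illustration only, that tumors are independent draws and that a given fusion occurs at a fixed population frequency; the function name `prob_no_events` is hypothetical and not part of the study's methods.

```python
def prob_no_events(frequency: float, n_tumors: int) -> float:
    """Probability that none of n_tumors independent tumors carries an
    event that occurs at the given population frequency."""
    return (1.0 - frequency) ** n_tumors


# Bounds of the reported 2%-7% frequency range, and the 24 pairs
# mentioned in the text.
for freq in (0.02, 0.07):
    missed = prob_no_events(freq, 24)
    print(f"frequency {freq:.0%}: P(zero events in 24 tumors) = {missed:.2f}")
```

Under these assumptions, an event present in 2% of tumors would be absent from a set of 24 about 62% of the time, and one present in 7% about 18% of the time, which is consistent with the caution expressed in the text.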
Charting the Next-Generation Hallmarks of Lung Adenocarcinoma
The ‘‘hallmarks of cancer,’’ as deﬁned by Hanahan and Weinberg
(2000, 2011), comprise a set of cellular traits thought to be
necessary for tumorigenesis. They also represent a powerful
framework to evaluate our understanding of genetic alterations
driving lung adenocarcinoma. With this aim, we mapped each
of 25 experimentally validated lung adenocarcinoma genes to
one or more cancer hallmarks from Hanahan and Weinberg
(2000, 2011) (Table S6 and Experimental Procedures). These
25 genes include the 19 previously reported genes discussed
above, in addition to six genes subject to frequent copy num-
ber alteration in lung adenocarcinoma (NKX2-1, TERT, PTEN,
MDM2, CCND1, and MYC). Next, we integrated this gene hall-
mark mapping with our somatic mutation and copy number
data to estimate the prevalence of cancer hallmark alterations
in lung adenocarcinoma (Figure 6 and Table S1).
For many cases in our cohort, we could attribute only a minority
of the ten cancer hallmarks to a distinct genetic lesion (Figure S6).
Only 6% of tumors had alterations assigned to all six classic hall-
marks, and none had alterations impacting all ten emerging and
classic hallmarks. In contrast, 15% of our cohort did not have
a single hallmark alteration, and 38% had three or fewer. This
ﬁnding is likely explained in part by alteration of cancer genes
by mechanisms not assayed in our study and also suggests
that many lung adenocarcinoma genes have not been identi-
ﬁed. This may be especially relevant for the hallmarks of avoid-
ing immune destruction and tumor-promoting inﬂammation, to
which none of the recurrently mutated genes identiﬁed in our
study or previous studies could be linked. One of the most
important and therapeutically targetable cancer hallmarks is
sustaining proliferative signaling (Figures 6 and S6). Less than
half (47%) of our cohort harbored a mutation in a known driver
gene for this hallmark, and only slightly more (55%) did so
when including high-level ampliﬁcation in one or more prolifera-
tive signaling genes (e.g., EGFR, ERBB2, and MYC).
Mapping of somatic alterations to cancer hallmarks
illuminates speciﬁc gaps in the understanding of the somatic
genetic underpinnings of lung adenocarcinoma. Around half of
the sequenced cohort lacked a mutation supporting sustained
proliferative signaling, and a majority lacked a genetic alteration
explaining the phenotypes of invasion and metastasis or
angiogenesis. This phenotypic gap may be explained by novel
capabilities not yet attributed to alterations in known lung adeno-
carcinoma genes or through novel alterations in genes previ-
ously unassociated with this disease that will emerge through
additional unbiased analyses.
While annotating the 25 known lung adenocarcinoma genes,
we noted that SMARCA4, an epigenetic regulator and tumor
suppressor, could not be clearly mapped to any cancer hallmark.
Given the frequent somatic mutations in epigenetic and splicing
regulators found by recent cancer genome scans (Elsässer et al.,
2011) and our study (U2AF1, ARID1A, RBM10, SETD2, and
BRD3), we speculated that these alterations may represent
a novel hallmark of epigenetic and RNA deregulation. Together,
these genes implicate the proposed eleventh hallmark in a con-
siderable proportion of cases (10% including only SMARCA4,
22% including nominated genes).
Efficiency and Power in Somatic Genetic Studies of Lung Adenocarcinoma
This study represents the largest sequencing analysis of lung
adenocarcinoma to date. Our analysis reveals the genomic
complexity of lung adenocarcinoma at the base-pair and struc-
tural levels, exceeding that observed in genome characterization
studies of most other tumor types. We have applied a recently
published statistical method (Hodis et al., 2012) for identifying
somatically mutated genes displaying evidence of positive
selection in cancer. This permutation approach exploits the
abundant supply of intronic and flanking mutation events detected
in both WES and WGS to adequately model the gene-specific
variation in neutral mutation rates (Hodis et al., 2012).
We believe that such a calibrated approach is required to identify
signals of positive somatic selection in large unbiased cancer
genome scans. This concern is particularly relevant to tumor
types harboring high rates of somatic mutation, such as lung
adenocarcinoma or melanoma.
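The idea of calibrating a gene's coding mutation burden against its own noncoding mutations can be illustrated with a toy permutation test. This is a deliberately simplified sketch and not the published InVEx method, which scores functional impact and uses a more detailed null model; the function name, parameters, and example counts below are all hypothetical.

```python
import random


def permutation_p_value(n_coding_bases, n_noncoding_bases,
                        coding_hits, noncoding_hits,
                        n_perm=10000, seed=0):
    """Toy test: are a gene's mutations enriched in its coding bases?

    Null model (an assumption of this sketch): each of the gene's observed
    mutations falls uniformly at random on its covered coding, intronic,
    and flanking bases.
    """
    rng = random.Random(seed)
    total_bases = n_coding_bases + n_noncoding_bases
    total_hits = coding_hits + noncoding_hits
    at_least_as_extreme = 0
    for _ in range(n_perm):
        # Re-place every mutation and count how many land in coding sequence.
        simulated = sum(
            1 for _hit in range(total_hits)
            if rng.randrange(total_bases) < n_coding_bases
        )
        if simulated >= coding_hits:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (at_least_as_extreme + 1) / (n_perm + 1)


# Hypothetical gene: 1 kb coding, 9 kb intronic/flanking, 8 of 10 hits coding.
print(permutation_p_value(1000, 9000, coding_hits=8, noncoding_hits=2))
```

Because the null distribution is built from the gene's own mutations, a gene sitting in a highly mutable region is not flagged merely for carrying many mutations overall.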
This study has led to discovery of signiﬁcant mutation of 25
genes in lung adenocarcinoma. Notably, our study did not iden-
tify a mutated oncogene in every tumor sample. Furthermore, we
were unable to statistically nominate several important, but
rarely mutated, lung adenocarcinoma genes (AKT1, ERBB2,
NRAS, and HRAS, each with ≤3 events in our cohort). Therefore,
future studies of larger cohorts by The Cancer Genome Atlas
and other consortia that combine analysis of data from RNA
sequencing (RNA-seq), methylation proﬁling, and other omic
platforms will likely yield an even more complete annotation of
genes signiﬁcant to lung adenocarcinoma.
This study represents a signiﬁcant advance toward complete
characterization of the genomic alterations of lung adenocarci-
noma. These results are a testament to the power of unbiased,
large-scale next-generation sequencing technology to expand
our understanding of tumor biology. The novel mutated genes
identiﬁed in this study warrant further investigation to determine
their biologic, prognostic, and/or therapeutic signiﬁcance in lung
adenocarcinoma, potentially leading to clinical translation and
improved outcomes for patients with this deadly disease.
Details of sample preparation and analysis are described in the Extended Experimental Procedures.
Patient and Sample Characteristics
We obtained DNA from tumor and matched normal adjacent tissue from six
source sites. DNA was obtained from frozen tissue primary lung cancer resec-
tion specimens for all samples, with the exception of one patient (LU-A08-14),
for whom a liver metastasis was obtained at autopsy. The 183 lung adenocar-
cinoma diagnoses were either certiﬁed by a clinical surgical pathology report
provided by the external tissue bank or collaborator or verified through
in-house review by an anatomical pathologist at the Broad Institute of MIT and
Harvard. A second round of pathology review was conducted by an expert
committee led by W.D.T. Informed consent (Institutional Review Board) was
obtained for each sample by using protocols approved by the Broad Institute
of Harvard and MIT and each originating tissue source site.
Massively Parallel Sequencing
Exome capture was performed by using Agilent SureSelect Human All Exon 50
Mb according to the manufacturer’s instructions. All WES and WGS was
performed on the Illumina HiSeq platform. Basic alignment and sequence quality
control were done by using the Picard and Firehose pipelines at the Broad
Institute. Mapped genomes were processed by the Broad Firehose pipeline
to perform additional quality control, variant calling, and mutational signiﬁ-
Gene expression data for 40 lung adenocarcinoma cell lines were obtained
from the Cancer Cell Line Encyclopedia (CCLE) (http://www.broadinstitute.org/ccle/home)
as RMA normalized tab-delimited text data (Barretina et al., 2012).
Figure 6. Next-Generation Hallmarks of Lung Adenocarcinoma
Left, the prevalence of mutation or SCNA of Sanger Cancer Gene Census (Futreal et al., 2004) genes mapping to cancer hallmarks defined by Hanahan and
Weinberg (2011). Suspected passenger mutations were filtered out of the analysis, as described in Experimental Procedures. Top right, genes comprising the
mutated genes in the hallmark of sustaining proliferative signaling are shown. Bottom right, a proposed eleventh hallmark of epigenetic and RNA deregulation is
shown, depicted as above. Genes shown in gray are candidate lung adenocarcinoma genes identified in this study that may additionally contribute to the hallmark.
See also Figure S6 and Table S6.
We evaluated statistical evidence for somatic selection within the longest tran-
script of each gene by using InVEx (Hodis et al., 2012) with PolyPhen-2-based
(Adzhubei et al., 2010) and LOF-based scoring schemes. The method was im-
plemented in Python (http://www.python.org) and is available for download
(http://www.broadinstitute.org/software/invex/). Gene ranking according to
a stratiﬁed binomial model was performed by using the MutSig method from
Berger et al. (2011) and was implemented in MATLAB. Correlations between
genotype status, mutation/rearrangement spectrum data, and clinical variables
were assessed by Fisher’s exact test for dichotomous variables and
by Wilcoxon rank sum test for dichotomous variables versus numeric data
(e.g., mutation status versus total mutation rate). All remaining statistical
computing, including cluster analysis and visualization, was performed by
using standard packages in R (http://www.r-project.org).
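For readers unfamiliar with the test used for pairs of dichotomous variables, the following is a minimal standard-library sketch of a two-sided Fisher's exact test on a 2×2 table. It is an illustration only, not the study's R code; the function name and the example counts are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one;
    # the small tolerance guards against floating-point ties.
    return sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )


# Hypothetical counts: rows = gene mutated / wild-type,
# columns = smoker / never-smoker.
print(f"p = {fisher_exact_two_sided(8, 2, 1, 5):.4f}")
```

The Wilcoxon rank sum test used for dichotomous versus numeric comparisons is not shown here.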
We manually assigned 25 genes—implicated by previous studies to be
frequently or functionally altered in lung adenocarcinoma—to one or more
cancer hallmarks as deﬁned by Hanahan and Weinberg (2000, 2011) (see
Extended Experimental Procedures). We determined whether alterations in
gene i could be implicated as a ‘‘driver’’ of one or more cancer hallmarks in
case j by applying the following criteria: we inferred the activation status of
genes annotated by the Sanger Gene Census as ‘‘dominant’’ cancer genes
(e.g., KRAS) in each patient by evaluating every nonsynonymous variant in
the gene for its presence within a COSMIC hot spot (Forbes et al., 2011). Muta-
tions that were present in the COSMIC database (http://www.sanger.ac.uk/
genetics/CGP/cosmic/) at least ten times were considered oncogenic muta-
tions. We considered a dominant gene activated if it harbored such a variant
or a high-level, focal ampliﬁcation. We considered recessive cancer genes
(e.g., TP53) to be inactivated if the gene had (1) a truncating mutation, (2)
compound missense mutations, (3) a hemizygous missense mutation, or (4)
homozygous copy number loss. We mapped each patient j to hallmark k if
the sample contained at least one activating or inactivating event in a dominant
or recessive cancer gene, respectively, that mapped to hallmark k.
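The criteria above can be sketched as a small rule set. The record layout, function names, and example gene-to-hallmark assignments below are hypothetical and chosen for illustration; in particular, treating "compound missense mutations" as two or more missense mutations in the same gene is an assumption of this sketch, and the hallmark shown for TP53 is a placeholder rather than the assignment in Table S6.

```python
from dataclasses import dataclass, field

COSMIC_MIN_COUNT = 10  # "at least ten times" in COSMIC, as stated in the text


@dataclass
class GeneEvents:
    """Somatic events seen in one gene in one tumor (hypothetical layout)."""
    cosmic_counts: list = field(default_factory=list)  # per nonsynonymous variant
    focal_amplification: bool = False
    truncating: bool = False
    missense_count: int = 0
    hemizygous_missense: bool = False
    homozygous_loss: bool = False


def dominant_activated(ev: GeneEvents) -> bool:
    """Dominant gene: COSMIC hot-spot variant or high-level focal amplification."""
    return ev.focal_amplification or any(
        count >= COSMIC_MIN_COUNT for count in ev.cosmic_counts
    )


def recessive_inactivated(ev: GeneEvents) -> bool:
    """Recessive gene: any of the four inactivation criteria listed in the text."""
    return (
        ev.truncating
        or ev.missense_count >= 2  # assumed reading of "compound missense"
        or ev.hemizygous_missense
        or ev.homozygous_loss
    )


def hallmarks_for_case(events, gene_class, gene_hallmarks):
    """Return the set of hallmarks with at least one driver event in this case."""
    hit = set()
    for gene, ev in events.items():
        cls = gene_class.get(gene)
        if (cls == "dominant" and dominant_activated(ev)) or (
            cls == "recessive" and recessive_inactivated(ev)
        ):
            hit.update(gene_hallmarks.get(gene, ()))
    return hit


# Illustrative inputs only.
gene_class = {"EGFR": "dominant", "TP53": "recessive"}
gene_hallmarks = {
    "EGFR": {"sustaining proliferative signaling"},
    "TP53": {"evading growth suppressors"},  # placeholder assignment
}
case = {
    "EGFR": GeneEvents(cosmic_counts=[250]),  # hot-spot variant
    "TP53": GeneEvents(missense_count=1),     # single missense: not counted
}
print(hallmarks_for_case(case, gene_class, gene_hallmarks))
```

Running the same function over every case and counting how often each hallmark appears would give prevalence estimates of the kind summarized in Figure 6.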
The dbGAP accession number for the data reported in this paper is
Supplemental Information includes Extended Experimental Procedures, six
ﬁgures, and six tables and can be found with this article online at http://dx.
We thank all members of the Biological Samples Platform, DNA Sequencing
Platform, and Genetic Analysis Platforms of the Broad Institute, without whose
work this sequencing project could not have occurred. M.I. is supported by
NCI Training Grant T32 CA9216. T.J.P. is supported by a Canadian Institutes
of Health Research Fellowship. A.H.B. is supported by a postdoctoral
fellowship from the American Cancer Society. P.S.H. is supported by a Young
Investigator Award from the National Lung Cancer Partnership and a Career
Development Award from the Dana-Farber/Harvard Cancer Center Lung
Cancer SPORE P50 CA090578. E.H. is supported by NIGMS training
grant T32GM07753. R.K.T. is supported by the German Ministry of Science
and Education (BMBF) as a member of the NGFN-Plus program (grant
01GS08100), by the Max Planck Society (M.I.F.A.NEUR8061), by the Deutsche
Forschungsgemeinschaft (DFG) through SFB832 (TP6) and grant TH1386/3-1,
by the EU-Framework Programme CURELUNG (HEALTH-F2-2010-258677),
by Stand Up To Cancer-American Association for Cancer Research Innovative
Research Grant (SU2C-AACR-IR60109), by the Behrens-Weise Foundation,
and by an anonymous foundation. This work was supported by the National
Human Genome Research Institute (E.S.L.) and by the National Cancer
Institute, Uniting Against Lung Cancer, the Lung Cancer Research Foundation,
and the American Lung Association (M.M.). M.M., E.L., and L.A.G. are
founders and equity holders of Foundation Medicine, a for-proﬁt company
that provides next-generation sequencing diagnostic services. V.A.M. is an
employee of Foundation Medicine.
Received: May 8, 2012
Revised: July 27, 2012
Accepted: August 27, 2012
Published: September 13, 2012
Adzhubei, I.A., Schmidt, S., Peshkin, L., Ramensky, V.E., Gerasimova, A.,
Bork, P., Kondrashov, A.S., and Sunyaev, S.R. (2010). A method and server
for predicting damaging missense mutations. Nat. Methods 7, 248–249.
Banerji, S., Cibulskis, K., Rangel-Escareno, C., Brown, K.K., Carter, S.L., Fred-
erick, A.M., Lawrence, M.S., Sivachenko, A.Y., Sougnez, C., Zou, L., et al.
(2012). Sequence analysis of mutations and translocations across breast
cancer subtypes. Nature 486, 405–409.
Barretina, J., Caponigro, G., Stransky, N., Venkatesan, K., Margolin, A.A., Kim,
S., Wilson, C.J., Lehár, J., Kryukov, G.V., Sonkin, D., et al. (2012). The Cancer
Cell Line Encyclopedia enables predictive modelling of anticancer drug sensi-
tivity. Nature 483, 603–607.
Bass, A.J., Lawrence, M.S., Brace, L.E., Ramos, A.H., Drier, Y., Cibulskis, K.,
Sougnez, C., Voet, D., Saksena, G., Sivachenko, A., et al. (2011). Genomic
sequencing of colorectal adenocarcinomas identiﬁes a recurrent VTI1A-
TCF7L2 fusion. Nat. Genet. 43, 964–968.
Bentley, D.R., Balasubramanian, S., Swerdlow, H.P., Smith, G.P., Milton, J.,
Brown, C.G., Hall, K.P., Evers, D.J., Barnes, C.L., Bignell, H.R., et al. (2008).
Accurate whole human genome sequencing using reversible terminator chem-
istry. Nature 456, 53–59.
Berger, M.F., Lawrence, M.S., Demichelis, F., Drier, Y., Cibulskis, K., Siva-
chenko, A.Y., Sboner, A., Esgueva, R., Pﬂueger, D., Sougnez, C., et al.
(2011). The genomic complexity of primary human prostate cancer. Nature
Berger, M.F., Hodis, E., Heffernan, T.P., Deribe, Y.L., Lawrence, M.S., Proto-
popov, A., Ivanova, E., Watson, I.R., Nickerson, E., Ghosh, P., et al. (2012).
Melanoma genome sequencing reveals frequent PREX2 mutations. Nature
Bergethon, K., Shaw, A.T., Ou, S.H., Katayama, R., Lovly, C.M., McDonald,
N.T., Massion, P.P., Siwak-Tapp, C., Gonzalez, A., Fang, R., et al. (2012).
ROS1 rearrangements deﬁne a unique molecular class of lung cancers. J.
Clin. Oncol. 30, 863–870.
Cancer Genome Atlas Research Network. (2011). Integrated genomic anal-
yses of ovarian carcinoma. Nature 474, 609–615.
Cancer Genome Atlas Research Network. (2012). Comprehensive genomic
characterization of squamous cell lung cancers. Nature. Published online
September 9, 2012. http://dx.doi.org/10.1038/nature11404.
Chapman, M.A., Lawrence, M.S., Keats, J.J., Cibulskis, K., Sougnez, C.,
Schinzel, A.C., Harview, C.L., Brunet, J.P., Ahmann, G.J., Adli, M., et al.
(2011). Initial genome sequencing and analysis of multiple myeloma. Nature
Cho, J., Pastorino, S., Zeng, Q., Xu, X., Johnson, W., Vandenberg, S., Verhaak,
R., Cherniack, A.D., Watanabe, H., Dutt, A., et al. (2011). Glioblastoma-derived
epidermal growth factor receptor carboxyl-terminal deletion mutants are
transforming and are sensitive to EGFR-directed therapies. Cancer Res. 71,
Ding, L., Getz, G., Wheeler, D.A., Mardis, E.R., McLellan, M.D., Cibulskis, K.,
Sougnez, C., Greulich, H., Muzny, D.M., Morgan, M.B., et al. (2008). Somatic
mutations affect key pathways in lung adenocarcinoma. Nature 455, 1069–
Ekstrand, A.J., Sugawa, N., James, C.D., and Collins, V.P. (1992). Ampliﬁed
and rearranged epidermal growth factor receptor genes in human glioblas-
tomas reveal deletions of sequences encoding portions of the N- and/or
C-terminal tails. Proc. Natl. Acad. Sci. USA 89, 4309–4313.
Elsässer, S.J., Allis, C.D., and Lewis, P.W. (2011). Cancer. New epigenetic
drivers of cancers. Science 331, 1145–1146.
Fisher, S., Barry, A., Abreu, J., Minie, B., Nolan, J., Delorey, T.M., Young, G.,
Fennell, T.J., Allen, A., Ambrogio, L., et al. (2011). A scalable, fully automated
process for construction of sequence-ready human exome targeted capture
libraries. Genome Biol. 12, R1.
Forbes, S.A., Bindal, N., Bamford, S., Cole, C., Kok, C.Y., Beare, D., Jia, M.,
Shepherd, R., Leung, K., Menzies, A., et al. (2011). COSMIC: mining complete
cancer genomes in the Catalogue of Somatic Mutations in Cancer. Nucleic
Acids Res. 39 (Database issue), D945–D950. Published online October 15,
Futreal, P.A., Coin, L., Marshall, M., Down, T., Hubbard, T., Wooster, R., Rah-
man, N., and Stratton, M.R. (2004). A census of human cancer genes. Nat. Rev.
Cancer 4, 177–183.
Gerlinger, M., Rowan, A.J., Horswell, S., Larkin, J., Endesfelder, D., Gronroos,
E., Martinez, P., Matthews, N., Stewart, A., Tarpey, P., et al. (2012). Intratumor
heterogeneity and branched evolution revealed by multiregion sequencing.
N. Engl. J. Med. 366, 883–892.
Graubert, T.A., Shen, D., Ding, L., Okeyo-Owuor, T., Lunn, C.L., Shao, J.,
Krysiak, K., Harris, C.C., Koboldt, D.C., Larson, D.E., et al. (2012). Recurrent
mutations in the U2AF1 splicing factor in myelodysplastic syndromes. Nat.
Genet. 44, 53–57.
Grøvdal, L.M., Stang, E., Sorkin, A., and Madshus, I.H. (2004). Direct interac-
tion of Cbl with pTyr 1045 of the EGF receptor (EGFR) is required to sort the
EGFR to lysosomes for degradation. Exp. Cell Res. 300, 388–395.
Hainaut, P., and Pfeifer, G.P. (2001). Patterns of p53 G→T transversions in
lung cancers reﬂect the primary mutagenic signature of DNA-damage by
tobacco smoke. Carcinogenesis 22, 367–374.
Hanahan, D., and Weinberg, R.A. (2000). The hallmarks of cancer. Cell 100,
Hanahan, D., and Weinberg, R.A. (2011). Hallmarks of cancer: the next gener-
ation. Cell 144, 646–674.
Hashimoto, Y.K., Satoh, T., Okamoto, M., and Takemori, H. (2008). Importance
of autophosphorylation at Ser186 in the A-loop of salt inducible kinase 1 for its
sustained kinase activity. J. Cell. Biochem. 104, 1724–1739.
Hodis, E., Watson, I.R., Kryukov, G.V., Arold, S.T., Imielinski, M., Theurillat,
J.P., Nickerson, E., Auclair, D., Li, L., Place, C., et al. (2012). A landscape of
driver mutations in melanoma. Cell 150, 251–263.
Ju, Y.S., Lee, W.C., Shin, J.Y., Lee, S., Bleazard, T., Won, J.K., Kim, Y.T., Kim,
J.I., Kang, J.H., and Seo, J.S. (2012). A transforming KIF5B and RET gene
fusion in lung adenocarcinoma revealed from whole-genome and transcrip-
tome sequencing. Genome Res. 22, 436–445.
Kan, Z., Jaiswal, B.S., Stinson, J., Janakiraman, V., Bhatt, D., Stern, H.M., Yue,
P., Haverty, P.M., Bourgon, R., Zheng, J., et al. (2010). Diverse somatic muta-
tion patterns and pathway alterations in human cancers. Nature 466, 869–873.
Koivunen, J.P., Kim, J., Lee, J., Rogers, A.M., Park, J.O., Zhao, X., Naoki, K.,
Okamoto, I., Nakagawa, K., Yeap, B.Y., et al. (2008). Mutations in the LKB1
tumour suppressor are frequently detected in tumours from Caucasian but
not Asian lung cancer patients. Br. J. Cancer 99, 245–252.
Kosaka, T., Yatabe, Y., Onozato, R., Kuwano, H., and Mitsudomi, T. (2009).
Prognostic implication of EGFR, KRAS, and TP53 gene mutations in a large
cohort of Japanese patients with surgically treated lung adenocarcinoma.
J. Thorac. Oncol. 4, 22–29.
Krzywinski, M., Schein, J., Birol, I., Connors, J., Gascoyne, R., Horsman, D.,
Jones, S.J., and Marra, M.A. (2009). Circos: an information aesthetic for
comparative genomics. Genome Res. 19, 1639–1645.
Kwak, E.L., Bang, Y.J., Camidge, D.R., Shaw, A.T., Solomon, B., Maki, R.G.,
Ou, S.H., Dezube, B.J., Jänne, P.A., Costa, D.B., et al. (2010). Anaplastic
lymphoma kinase inhibition in non-small-cell lung cancer. N. Engl. J. Med.
Lee, W., Jiang, Z., Liu, J., Haverty, P.M., Guan, Y., Stinson, J., Yue, P., Zhang,
Y., Pant, K.P., Bhatt, D., et al. (2010). The mutation spectrum revealed by
paired genome sequences from a lung cancer patient. Nature 465, 473–477.
Liu, P., Morrison, C., Wang, L., Xiong, D., Vedell, P., Cui, P., Hua, X., Ding, F.,
Lu, Y., James, M., et al. (2012). Identiﬁcation of somatic mutations in non-small
cell lung carcinomas using whole-exome sequencing. Carcinogenesis 33,
Medvedev, P., Stanciu, M., and Brudno, M. (2009). Computational methods
for discovering structural variation with next-generation sequencing. Nat.
Methods Suppl. 6, S13–S20.
Minna, J.D., and Schiller, J.H. (2008). Lung Cancer. In Harrison’s Principles
of Internal Medicine, 17th Edition, A.S. Fauci, E. Braunwald, D.L. Kasper,
S.L. Hauser, D.L. Longo, J.L. Jameson, and J. Loscalzo, eds. (New York:
McGraw-Hill), pp. 551–562.
Mitsudomi, T., Oyama, T., Kusano, T., Osaki, T., Nakanishi, R., and Shira-
kusa, T. (1993). Mutations of the p53 gene as a predictor of poor prognosis
in patients with non-small-cell lung cancer. J. Natl. Cancer Inst. 85, 2018–
Nikolaev, S.I., Rimoldi, D., Iseli, C., Valsesia, A., Robyr, D., Gehrig, C., Harshman,
K., Guipponi, M., Bukach, O., Zoete, V., et al. (2012). Exome sequencing
identiﬁes recurrent somatic MAP2K1 and MAP2K2 mutations in melanoma.
Nat. Genet. 44, 133–139.
Pao, W., and Chmielecki, J. (2010). Rational, biologically based treatment of
EGFR-mutant non-small-cell lung cancer. Nat. Rev. Cancer 10, 760–774.
Pao, W., and Hutchinson, K.E. (2012). Chipping away at the lung cancer
genome. Nat. Med. 18, 349–351.
Pao, W., Miller, V., Zakowski, M., Doherty, J., Politi, K., Sarkaria, I., Singh, B.,
Heelan, R., Rusch, V., Fulton, L., et al. (2004). EGF receptor gene mutations are
common in lung cancers from ‘‘never smokers’’ and are associated with sensi-
tivity of tumors to geﬁtinib and erlotinib. Proc. Natl. Acad. Sci. USA 101,
Pao, W., Wang, T.Y., Riely, G.J., Miller, V.A., Pan, Q., Ladanyi, M., Zakowski,
M.F., Heelan, R.T., Kris, M.G., and Varmus, H.E. (2005). KRAS mutations and
primary resistance of lung adenocarcinomas to geﬁtinib or erlotinib. PLoS
Med. 2, e17.
Pearce, L.R., Komander, D., and Alessi, D.R. (2010). The nuts and bolts of AGC
protein kinases. Nat. Rev. Mol. Cell Biol. 11, 9–22.
Pines, G., Huang, P.H., Zwang, Y., White, F.M., and Yarden, Y. (2010).
EGFRvIV: a previously uncharacterized oncogenic mutant reveals a kinase
autoinhibitory mechanism. Oncogene 29, 5850–5860.
Sanchez-Cespedes, M., Parrella, P., Esteller, M., Nomoto, S., Trink, B., Eng-
les, J.M., Westra, W.H., Herman, J.G., and Sidransky, D. (2002). Inactivation
of LKB1/STK11 is a common event in adenocarcinomas of the lung. Cancer
Res. 62, 3659–3662.
Schuster-Böckler, B., and Lehner, B. (2012). Chromatin organization is a
major inﬂuence on regional mutation rates in human cancer cells. Nature
Slebos, R.J., Hruban, R.H., Dalesio, O., Mooi, W.J., Offerhaus, G.J., and Ro-
denhuis, S. (1991). Relationship between K-ras oncogene activation and
smoking in adenocarcinoma of the human lung. J. Natl. Cancer Inst. 83,
Stransky, N., Egloff, A.M., Tward, A.D., Kostic, A.D., Cibulskis, K., Sivachenko,
A., Kryukov, G.V., Lawrence, M.S., Sougnez, C., McKenna, A., et al. (2011).
The mutational landscape of head and neck squamous cell carcinoma.
Science 333, 1157–1160.
Takeuchi, K., Soda, M., Togashi, Y., Suzuki, R., Sakata, S., Hatano, S., Asaka,
R., Hamanaka, W., Ninomiya, H., Uehara, H., et al. (2012). RET, ROS1 and ALK
fusions in lung cancer. Nat. Med. 18, 378–381.
Tanaka, H., Yanagisawa, K., Shinjo, K., Taguchi, A., Maeno, K., Tomida, S.,
Shimada, Y., Osada, H., Kosaka, T., Matsubara, H., et al. (2007). Lineage-
speciﬁc dependency of lung adenocarcinomas on the lung development
regulator TTF-1. Cancer Res. 67, 6007–6011.
Totoki, Y., Tatsuno, K., Yamamoto, S., Arai, Y., Hosoda, F., Ishikawa, S., Tsut-
sumi, S., Sonoda, K., Totsuka, H., Shirakihara, T., et al. (2011). High-resolution
characterization of a hepatocellular carcinoma genome. Nat. Genet. 43,
Travis, W.D. (2002). Pathology of lung cancer. Clin. Chest Med. 23, 65–81, viii.
Wei, X., Walia, V., Lin, J.C., Teer, J.K., Prickett, T.D., Gartner, J., Davis, S.,
Stemke-Hale, K., Davies, M.A., Gershenwald, J.E., et al; NISC Comparative
Sequencing Program. (2011). Exome sequencing identiﬁes GRIN2A as
frequently mutated in melanoma. Nat. Genet. 43, 442–446.
Weir, B.A., Woo, M.S., Getz, G., Perner, S., Ding, L., Beroukhim, R., Lin, W.M.,
Province, M.A., Kraja, A., Johnson, L.A., et al. (2007). Characterizing the
cancer genome in lung adenocarcinoma. Nature 450, 893–898.
Wheeler, M., and Domin, J. (2001). Recruitment of the class II phosphoinosi-
tide 3-kinase C2beta to the epidermal growth factor receptor: role of Grb2.
Mol. Cell. Biol. 21, 6660–6667.
World Health Organization (2012). Cancer (http://www.who.int/cancer/en/).
Yoshida, K., Sanada, M., Shiraishi, Y., Nowak, D., Nagata, Y., Yamamoto, R.,
Sato, Y., Sato-Otsubo, A., Kon, A., Nagasaki, M., et al. (2011). Frequent
pathway mutations of splicing machinery in myelodysplasia. Nature 478,
Yuza, Y., Glatt, K.A., Jiang, J., Greulich, H., Minami, Y., Woo, M.S., Shima-
mura, T., Shapiro, G., Lee, J.C., Ji, H., et al. (2007). Allele-dependent variation
in the relative cellular potency of distinct EGFR inhibitors. Cancer Biol. Ther. 6,
Zang, Z.J., Cutcutache, I., Poon, S.L., Zhang, S.L., McPherson, J.R., Tao, J.,
Rajasegaran, V., Heng, H.L., Deng, N., Gan, A., et al. (2012). Exome
sequencing of gastric adenocarcinoma identiﬁes recurrent somatic mutations
in cell adhesion and chromatin remodeling genes. Nat. Genet. 44, 570–574.