Evaluation of Exome Sequencing to Estimate Tumor Burden in Plasma

Daniel Klevebring1., Mårten Neiman1., Simon Sundling1, Louise Eriksson1, Eva Darai Ramqvist2, Fuat Celebioglu3, Kamila Czene4, Per Hall4, Lars Egevad2, Henrik Grönberg4, Johan Lindberg1*
1 Department of Medical Epidemiology and Biostatistics, Science for Life Laboratory, Karolinska Institutet, Stockholm, Sweden, 2 Department of Pathology and Cytology,
Karolinska University Hospital, Stockholm, Sweden, 3 Department of Clinical Science and Education, Södersjukhuset, Stockholm, Sweden, 4 Department of Medical
Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
Abstract
Accurate estimation of systemic tumor load from the blood of cancer patients has enormous potential. One avenue is to
measure the presence of cell-free circulating tumor DNA in plasma. Various approaches have been investigated,
predominantly covering hotspot mutations or customized, patient-specific assays. Therefore, we investigated the utility of
using exome sequencing to monitor circulating tumor DNA levels through the detection of single nucleotide variants in
plasma. Two technologies, claiming to offer efficient library preparation from nanogram levels of DNA, were evaluated. This
allowed us to estimate the proportion of starting molecules measurable by sequence capture (<5%). As cell-free DNA is
highly fragmented, we designed and provide software for efficient identification of PCR duplicates in single-end libraries
with a varying size distribution. On average, this improved sequence coverage by 38% in comparison to standard tools. By
exploiting the redundant information in PCR-duplicates the background noise was reduced to ∼1/35000. By applying our
optimized analysis pipeline to a simulation analysis, we determined the current sensitivity limit to ∼1/2400, starting with
30 ng of cell-free DNA. Subsequently, circulating tumor DNA levels were assessed in seven breast- and one prostate cancer
patient. One patient carried detectable levels of circulating tumor DNA, as verified by break-point specific PCR. These results
demonstrate exome sequencing on cell-free DNA to be a powerful tool for disease monitoring of metastatic cancers. To
enable a broad implementation in the diagnostic settings, the efficiency limitations of sequence capture and the inherent
noise levels of the Illumina sequencing technology must be further improved.
Citation: Klevebring D, Neiman M, Sundling S, Eriksson L, Darai Ramqvist E, et al. (2014) Evaluation of Exome Sequencing to Estimate Tumor Burden in
Plasma. PLoS ONE 9(8): e104417. doi:10.1371/journal.pone.0104417
Editor: Natasha Kyprianou, University of Kentucky College of Medicine, United States of America
Received May 26, 2014; Accepted July 8, 2014; Published August 18, 2014
Copyright: © 2014 Klevebring et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits
unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: The authors confirm that, for approved reasons, some access restrictions apply to the data underlying the findings. All data relevant for the
interpretation of our findings is provided in the main manuscript or the supplementary information except for the raw sequence data. Any data providing
genotype information is considered to be a personal registry by the Swedish law (Personal Data Act), thereby prohibiting the submission to a public repository.
The raw sequence data is instead available upon request from the authors if approval has been obtained from the Regional Ethical Vetting Board in Stockholm.
Funding: This study was supported by the Linnaeus Cancer Risk Prediction Center (grant number 70867902) and the AstraZeneca Translational
Research Center. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing Interests: The authors have declared that no competing interests exist.
* Email: johan.lindberg@ki.se
.These authors contributed equally to this work.
Introduction
All human individuals harbor cell-free DNA (cfDNA) in the
circulation [1,2]. In cancer patients, as tumor cells die, DNA is
shed into the bloodstream. Circulating tumor DNA (ctDNA) constitutes
a fingerprint, which can be used for disease monitoring.
Circulating tumor DNA has been correlated to both early
detection and prognosis [3,4]. Since the half-life of cfDNA is less
than one hour, it has been successfully used to monitor treatment
progression [5,6]. Although ctDNA is an extremely promising
biomarker, clinical implementation has been impeded, not only by
inherent challenges in the characteristics of cfDNA, but also in
tumor biology as well as technology. Cell-free DNA is present in
low concentration, and the majority of fragments are short which
limits the efficiency of PCR based methodologies. Circulating
tumor DNA fractions are low, except in metastatic and high-grade
disease. Levels of ctDNA were demonstrated to be <1% on average
for non-metastatic colorectal tumors [4], which marks the upper
bound for a desired sensitivity. Furthermore, as revealed by the
large ongoing cancer sequencing efforts, any two individuals
harboring the same cancer diagnosis share few, if any, somatic
events [7], which requires a high degree of flexibility. Various
methods have been used for the detection of tumor-specific
somatic lesions in the circulation. Monitoring genomic break-
points through digital PCR is highly specific, allowing for the
detection of single copy cancer genomes in milliliters of plasma
[5,6]. Sequencing of a selected subset of genes has demonstrated
potential to detect ctDNA down to 0.14% [8] and excellent
correlation to orthogonal technologies such as digital PCR [9].
Chan and colleagues demonstrated the feasibility of using whole
genome sequencing of plasma DNA in cancer patients to detect
somatic copy number alterations according to the same rationale
as previously shown for trisomy 21 [1,10]. Although promising,
the approach requires unfeasibly deep coverage for a sensitivity
level of 1% [11]. Recently, Murtaza and colleagues demonstrated the
use of exome sequencing to monitor multiple mutations in
PLOS ONE | www.plosone.org 1 August 2014 | Volume 9 | Issue 8 | e104417
concert. Targeted sequencing still has the economic advantage
over whole-genome sequencing, while capturing the majority of
known driver mutations [7]. Nevertheless, the individuals studied
suffered from metastatic late-stage disease, and whether the
sensitivity is good enough for detection of low levels of ctDNA
remains unknown [12]. Additionally, unlike other assays [13],
whole-exome sequencing does not require individual assays to be
tailored for the vast majority of patients [14], a requirement for a
broader clinical utility. Here we investigate the utility of using
exome sequencing for monitoring of ctDNA levels through
detection of single nucleotide variants in plasma. Since the
number of variants is commonly <100 for most solid tumors [14],
and in order to retain maximal sensitivity, we 1) evaluated the
capability of two promising approaches to generate sequencing
libraries with high complexity from small amounts of fragmented
DNA without pre-amplification; 2) optimized data analysis
pipelines for read depth and accuracy; 3) applied exome
sequencing on plasma obtained from prostate cancer and breast
cancer patients to demonstrate its utility. In conclusion, the main
limiting factor was the low efficiency of library preparation and
subsequent targeted capture. Less than five percent of starting
molecules were observable, limiting the sensitivity to 1/2433 using
10 ml of plasma, thereby restricting the applicability to locally
advanced cancers reported to emit fragments into the circulation.
Materials and Methods
Samples and clinical data
Prostate tumor tissue and clinical data were collected from men
who underwent radical prostatectomy at the Karolinska University
Hospital in Stockholm as described previously [15]. Blood was
collected at patient registration on the eve of surgery, directly after
surgery, at discharge and at return visit. Breast tumor tissue was
collected from women who underwent surgery for breast cancer at
the South General Hospital in Stockholm. Blood was collected at
the patient registration, approximately one week prior to surgery.
Signed informed consent was obtained for all study participants.
Ethical approval was given by the Regional Ethical Vetting
Board in Stockholm (located at the Karolinska Institutet) with
registration numbers 2009/1357-32 (prostate cancer samples) and
2010/958-31/3 (breast cancer samples).
Extraction of nucleotides
DNA was extracted from whole blood using the QIAamp spin
miniprep kit (Qiagen, Hilden, Germany). DNA/RNA and
proteins were simultaneously extracted from prostate and breast
cancer tissues, as described previously [15]. Cell-free circulating
DNA was isolated from plasma using the QIAamp Circulating
Nucleic Acid Kit (Qiagen). All extractions were done according
to the manufacturer's recommendations. High molecular weight
fragments were removed from the cell-free circulating DNA
samples by polyethylene glycol (PEG) mediated precipitation on
carboxylic acid coated magnetic beads (MyOne, Invitrogen) as
described previously [16] using 8% and 25% PEG 6000 (Merck) in
the first and second solution respectively in a Magnatrix™ 1200
(NorDiag ASA, Oslo, Norway) liquid handling robot. DNA
concentrations were measured using a Qubit fluorometer (Invitrogen,
CA, USA) with the dsDNA HS kit, and the size distributions of the
cell-free DNA were assessed using an Agilent 2100 Bioanalyzer (Agilent
Technologies, Santa Clara, CA, USA) with the DNA HS kit.
Simulated ctDNA
DNA samples derived from tumor tissue were sheared by
suspension in 120 µl nuclease-free water and sonication using the
Covaris (Covaris Inc., MA, USA) sonication system with the
settings for a 150 bp peak according to the manufacturer's
instructions. 1 µl of each sample was analyzed using an Agilent
2100 Bioanalyzer and the DNA 1000 kit. Automated size-selection
was done as described previously [16] using 10% and 11% PEG
6000 (Merck) in the first and second solution respectively in a
Magnatrix™ 1200 (NorDiag) liquid handling robot and the
resulting size distributions were assessed using the Bioanalyzer and the
DNA 1000 kit (figure S1).
Exome capture
For the evaluation of performing exome sequencing on minute
amounts of sample, sequencing library preparation was done from
1 and 10 ng DNA derived from prostate cancer tissue using the
Mondrian SP+ System (NuGEN Technologies Inc., CA, USA) or the
ThruPLEX-FD Prep Kit (Rubicon Genomics, MI, USA) according
to the manufacturers' recommendations. Exome capture was
performed as described previously [15]. Custom blocking adapters
were used for the respective technologies. For primary tumor material,
whole blood or plasma, libraries were prepared using the ThruPLEX-FD
Prep Kit (Rubicon Genomics). Exome capture was carried out
using the SeqCap EZ Exome Library Version 1 (Roche
Nimblegen Inc, Madison, WI, USA) according to the manufacturer's
instructions.
Sequencing
Sequencing was carried out using Illumina 2×100 bp paired-end
sequencing on a HiSeq 2500 instrument according to the
manufacturer's recommendations using the TruSeq PE Cluster
Generation Kit v3 and the TruSeq SBS Kit v3. All lanes were spiked
with 1% phiX as a quality control.
Processing of sequence data
Three analysis pipelines were implemented to compare the
performance of using 1) standard sequencing processing; 2)
standard sequencing processing but with merging of overlapping
paired-end reads to improve base qualities and reduce noise rates;
3) standard sequencing processing but with merging of overlapping
paired-end reads and subsequent optimized PCR duplicate
processing to improve base qualities and to reduce noise rates.
Standard sequencing processing was defined as 1) removal of
adapter sequences only, using SeqPrep (v. 1.1) [17]; 2) alignment to
the reference genome (hg19) using BWA (v. 0.6.2) [18]; 3)
realignment using GATK [19]; 4) removal of technical duplicates
using Picard [20]. All QC metrics were obtained using Picard.
Sequence data from tumor tissues and normal blood DNA was
processed using standard sequencing processing. Realignment and
base quality recalibration was carried out using GATK v2.8-1
before the identification of somatic point mutations using Mutect v
1.1.5 [21]. Merging of overlapping reads was performed using the
SeqPrep software. SeqPrep was modified to set discordant
overlapping base pairs to N with quality 2. Concordant base
pairs were used to boost base qualities by addition to a maximum
of 45. The modified version of SeqPrep is available at
https://github.com/dakl/SeqPrep. For optimal utilization of data and to
further reduce noise rates, MergeDuplicates was designed. Unlike
MarkDuplicates, provided in the Picard software suite, MergeDuplicates
takes amplicon length of single-end data into account for
the identification of PCR duplicates and also merges duplicates, to
provide a consensus call, for increasing base qualities (figure S2).
For each set of duplicate molecules, each base is traversed and if at
least 75% of bases in each position are identical, this base is kept
and the phred-scaled qualities are boosted by addition, otherwise
the base is set to N with quality 2. Maximum base quality was set
Exome Sequencing to Estimate Tumor Burden in Plasma
to 50. MergeDuplicates is available at
https://bitbucket.org/dakl/mergeduplicates. Note, the boosted base qualities from SeqPrep or
MergeDuplicates were not used for variant identification but as a
means of separating data with support from multiple independent
sources at the level of overlapping sequencing (modified version of
SeqPrep) and PCR duplicates (MergeDuplicates). Mutational data
for each position was obtained using Samtools [22]. Samtools
associates each base with a Base Alignment Quality (BAQ). The
BAQ gives the phred-scaled probability of each base being
misaligned; the quality reported for a base is the minimum of its
base quality and the BAQ [23]. To reduce background noise, variants were filtered to
remove positions residing in regions of the human genome with
low uniqueness (mappability) [24] or known to harbor germline
variants (fig. S3). To further restrict to regions with excellent
mapping characteristics, variants were excluded if located within 50 bp
of a region with mappability <1. Data management and
statistical analysis were done in R [25].
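The consensus rule described above for MergeDuplicates can be illustrated with a minimal Python sketch. This is not the authors' implementation: the data layout (a list of equal-length `(sequence, qualities)` tuples) and the function name are hypothetical, and summing only the qualities of the agreeing bases is an assumption. The 75% agreement threshold, the quality cap of 50, and the N with quality 2 are taken from the text.

```python
from collections import Counter

MIN_AGREEMENT = 0.75  # fraction of duplicates that must agree (from the text)
MAX_QUALITY = 50      # cap on the boosted phred quality (from the text)
N_QUALITY = 2         # quality assigned to masked bases (from the text)


def merge_duplicates(reads):
    """Collapse duplicate reads into one consensus (sequence, qualities).

    `reads` is a list of (sequence, qualities) tuples of equal length,
    where qualities is a list of phred scores (hypothetical layout).
    """
    if not reads:
        raise ValueError("need at least one read")
    length = len(reads[0][0])
    if any(len(seq) != length or len(quals) != length for seq, quals in reads):
        raise ValueError("duplicates must have equal length")

    consensus_seq, consensus_qual = [], []
    for i in range(length):
        column = [(seq[i], quals[i]) for seq, quals in reads]
        base, count = Counter(b for b, _ in column).most_common(1)[0]
        if base != "N" and count / len(column) >= MIN_AGREEMENT:
            # Assumption: only the agreeing bases contribute to the boost.
            boosted = sum(q for b, q in column if b == base)
            consensus_seq.append(base)
            consensus_qual.append(min(boosted, MAX_QUALITY))
        else:
            consensus_seq.append("N")
            consensus_qual.append(N_QUALITY)
    return "".join(consensus_seq), consensus_qual
```

A singleton read passes through unchanged, so under this sketch only bases seen in several duplicates can exceed the original quality ceiling, which is consistent with using the boosted qualities as a marker of independent support.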
Simulation
To investigate the sensitivity of using exome sequencing in
plasma, the following variables were varied: proportion of ctDNA
(range 0.00001–0.05) and amount of starting material (range 3–60 ng).
As the quality of the background variant reads was lower
than that of reference bases, a quality filter was set at the cutoff
where the maximum fraction of reference reads was kept relative to the noise.
For simplicity, an exome was assumed to contain 50 variants,
which in concert with the determined assay efficiency set the
collective depth for each iteration. Sample bases were drawn
with a probability of drawing a variant base equal to the current
fraction of ctDNA. For the background, the whole set of data was
used for each iteration. Subsequently, a one-sided Fisher's exact test
was performed to test if the fraction of ctDNA was significantly
higher relative to the background data. This process was repeated
1000 times for each fraction of ctDNA and each amount of
starting material.
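The simulation loop described above can be sketched as follows. This is an illustrative Python reconstruction, not the R code used in the study: the function names, the significance threshold `alpha = 0.05`, and the fixed random seed are assumptions. Only the overall scheme (draw sample bases with probability equal to the ctDNA fraction, test against the pooled background with a one-sided Fisher's exact test, repeat 1000 times) is taken from the text.

```python
import math
import random


def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def fisher_greater(a, b, c, d):
    """One-sided Fisher's exact test, P(X >= a), for the table [[a, b], [c, d]].

    a, b: variant and reference reads in the sample;
    c, d: variant and reference reads in the background.
    """
    total, variants, draws = a + b + c + d, a + c, a + b
    log_denominator = log_comb(total, draws)
    p = 0.0
    for x in range(a, min(variants, draws) + 1):
        p += math.exp(log_comb(variants, x)
                      + log_comb(total - variants, draws - x)
                      - log_denominator)
    return min(p, 1.0)


def detection_power(fraction, depth, bg_variant, bg_reference,
                    iterations=1000, alpha=0.05, seed=0):
    """Fraction of iterations in which the sample is called positive."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(iterations):
        # Each sampled base is a variant with probability `fraction`.
        variant = sum(rng.random() < fraction for _ in range(depth))
        p_value = fisher_greater(variant, depth - variant,
                                 bg_variant, bg_reference)
        hits += p_value < alpha
    return hits / iterations
```

In this sketch `depth` stands for the collective depth over all pooled variant positions, which in the study was set by the starting amount, the assay efficiency and the assumed 50 variants per exome.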
Identification and validation of somatic rearrangements
To identify somatic chromosomal rearrangements, we performed
whole-genome sequencing (WGS) using long-insert (approximately
700 bp) libraries for the breast tumors and paired normal
DNA from blood. DNA was fragmented using a Covaris S1 system
with the following settings: Duty Cycle 5%, Intensity 3, Cycles per
burst 200, and Time 50 s. Fragmented DNA was prepared as
described previously [26] and sequenced to an average of 3× base
coverage on an Illumina HiSeq 2000 system. From the WGS data,
we used BreakDancer 1.3.5.1 to identify candidate somatic breakpoints
and BICseq to identify copy number variants. We manually
filtered these data to keep regions with good support from
BreakDancer as well as CNV support from BICseq. To generate
primer pairs for validation, reads spanning the breakpoints were
extracted from the original fastq files. Each read pair was
concatenated (read 2 reverse complemented) with a spacer of 30 Ns
between the reads and fed into primer3 for design. In order to
minimize the risk that sequencing errors were used in the primer
design step, primer3 was instructed not to allow any bases with a
quality <20 in the primers. For each breakpoint, the highest
scoring primer pair was used. In total across 5 patients (BC_B,
BC_C, BC_D, BC_F, BC_G), 19 primer pairs were designed, of
which 18 validated, giving a band specific to the tumor. For the
primer pairs that gave unspecific product, the shortest band was
specific to the tumor. Sanger sequencing confirmed the 18
rearrangements. We selected 8 of the rearrangements for analysis
in the plasma samples (B3, C1, D1, D3, F4, F5, G1, G2). All
rearrangements except two gave good signals in the tumors, with
estimated allele frequencies between 5% and 78%.
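The construction of the primer design template from a breakpoint-spanning read pair can be sketched as below. The function names are hypothetical, and assigning quality 0 to the spacer is an assumption made here so that no primer can be placed on it; how the template and qualities were actually passed to primer3 is not described in the text.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (N stays N)."""
    return seq.translate(COMPLEMENT)[::-1]


def primer_template(read1, quals1, read2, quals2, spacer_length=30):
    """Join read 1, a spacer of Ns, and the reverse complement of read 2.

    Returns the template sequence and matching per-base qualities.
    Read 2 qualities are reversed to follow the reverse-complemented bases.
    """
    template = read1 + "N" * spacer_length + reverse_complement(read2)
    qualities = list(quals1) + [0] * spacer_length + list(reversed(quals2))
    return template, qualities


def low_quality_positions(qualities, min_quality=20):
    """Template positions that should not be covered by a primer."""
    return [i for i, q in enumerate(qualities) if q < min_quality]
```

For example, `primer_template("ACG", [30, 30, 30], "AAC", [10, 25, 30], spacer_length=2)` gives the template `ACGNNGTT`, with the low-quality base of read 2 ending up at the last template position.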
Results
Evaluation of library preparation methodologies
A key aspect of performing exome sequencing of cfDNA is
efficient library preparation, as 1 ml of plasma from prostate
cancer (PC) or breast cancer (BC) patients commonly yields
3 ng of cfDNA (data not shown). To avoid amplification biases
[27] we set out to evaluate candidate technologies claiming to
enable sequence analysis of nanogram levels of DNA. As cfDNA
is heavily fragmented (∼180 bp peak), the tagmentation-based kit
from Illumina (Nextera) was excluded as it causes further shearing
of the template DNA. To obtain enough DNA for repeated
comparisons and to create a source of simulated cfDNA (figure
S1), DNA from a tumor, previously profiled using whole-exome
sequencing (SWE-54) [15] was carefully prepared to mimic the
true size distribution of cfDNA. The simulated cfDNA was
prepared for capture and sequencing using the ThruPLEX kit
(Rubicon Genomics) and the Mondrian system (NuGEN Tech-
nologies). To optimize procedure efficiency, we evaluated: 1)
amount of starting material (1 and 10 ng); 2) number of cycles of
PCR performed after capture (9 and 18 cycles); 3) the capture
plexity (1, 4 and 8 plex capture, figure S4). Capture was performed
using a 5 Mb kit (∼1300 genes associated with cancer) to facilitate
sequencing to saturation, and thereby, to better assess the
limitations of the respective technologies. Assay performance was
investigated using tools in the Picard software suite [20]. To ensure
a successful capture using both technologies, the fold enrichment
of target regions was assessed using only 10,000 reads, a level of
sequence depth where complexity is not limiting (Mondrian range
390–440 fold enrichment, ThruPLEX range 370–400 fold
enrichment). Library complexity was estimated as the average
sequence coverage obtained after removing PCR duplicates. To
enable a comparison of complexity throughout the whole range of
sequence depths, the data was subsampled, starting with 10,000
reads and incrementing by a factor of 1.25 until all available data
was used for each sample (figure 1). Furthermore, analysis of variance was
performed (table S1) to estimate the significance of each factor,
which revealed only the starting amount of DNA and the library
preparation approach to be relevant for library complexity. The
average coverage using 1 ng and 10 ng of starting DNA was 2x
and 3x for the Mondrian system and 16x and 85x using
ThruPLEX, respectively. Therefore, we used the ThruPLEX
technology for further processing of plasma samples.
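The geometric subsampling schedule used for the complexity comparison (start at 10,000 reads, grow by a factor of 1.25 until all reads are used) can be written as a short sketch. The function name and the choice to truncate fractional sizes to integers are assumptions.

```python
def subsample_sizes(total_reads, start=10_000, factor=1.25):
    """Read counts at which a library is subsampled, ending with all reads."""
    sizes = []
    size = float(start)
    while size < total_reads:
        sizes.append(int(size))  # truncate fractional read counts
        size *= factor
    sizes.append(total_reads)    # final point uses all available data
    return sizes
```

A geometric schedule gives dense sampling at low depth, where duplicate rates change quickly, and sparse sampling at high depth, which matches the split into a lower and a higher range in figure 1.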
Optimized procedures for exome sequencing of cell-free
DNA
To evaluate the possibility of tracking point mutations in plasma
we performed whole-exome sequencing on tumor tissue and blood
from seven BC and one PC patient (table 1). This identified
somatic variants for each individual, with the potential to act as
personalized biomarkers in the circulation. Subsequently, we
performed exome sequencing on plasma samples obtained before
(all patients) and 1 month after surgery (the PC patient only). The
level of uniqueness varies throughout genomes as do alignment-
related errors, even in the presence of stringent filtering [28].
Therefore, to test if the fraction of variant reads found in cfDNA is
significant relative to noise, the same positions must be used in the
background distribution of samples. As background for each
position we used data from all plasma samples profiled here,
excluding samples where identical mutations were found in the
primary tumor material. To obtain a high-quality call set, all
variants identified in the primary tumor tissue were restricted to
unique regions, not previously reported to harbor SNPs (figure
S3). This removed 24% of all positions, retaining on average 81
variants per individual.
A potentially limiting factor of sensitivity is the overall error rate
of the Illumina sequencing technology, which was recently
reported to be as high as 0.38% in cell-free DNA [10], here
found to be 0.29%. Previously, and as Illumina offers paired-end
sequencing, overlapping sequencing has been used to reduce
errors made during sequencing by synthesis [29]. As cfDNA is
fragmented (∼180 bp) we explored this option, although this does
not allow for the correction of PCR errors occurring before
sequencing. Commonly, during low-level processing of sequence
data, PCR duplicates are identified through the mapped starting
positions of individual reads. As these duplicates originate from the
same molecule, they offer means to identify PCR errors and to
decrease the error rate. Therefore, we explored the variation in
noise rates, going from standard sequencing processing to merging
of overlapping reads and lastly, using PCR duplicates to reduce
noise (analysis pipelines 1–3, methods). Additionally, for paired-
end data, the mapped position of both ends is used for efficient
identification of PCR duplicates. In contrast, only the starting
position is used for single-end data, as all fragments commonly
have the same read-length. For cfDNA, and for other merged
libraries with short insert-sizes, the distribution of fragment sizes
obtained after merging offers means to distinguish reads originat-
ing from different starting molecules harboring the same mapped
starting position (figure S2). On average, not merging reads
inflated coverage by 58% relative to merging followed by
standard MarkDuplicates processing in Picard [20]. Taking
fragment size into account substantially improved coverage: there
was only a 15% difference between not merging and merging in
combination with MergeDuplicates (custom software, figure S5).
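The difference between start-position-only duplicate detection and detection that also uses fragment length can be illustrated with a small grouping sketch. The read representation `(chrom, start, strand, length)` and the function name are hypothetical, and including the strand in the key is an assumption; neither reflects the internal logic of Picard or MergeDuplicates.

```python
from collections import defaultdict


def group_duplicates(reads, use_length=True):
    """Group merged single-end reads that are presumed PCR duplicates.

    Each read is a (chrom, start, strand, length) tuple. With
    use_length=False, reads sharing a start position are grouped together;
    with use_length=True, the merged fragment length must also match.
    """
    groups = defaultdict(list)
    for read in reads:
        chrom, start, strand, length = read
        key = (chrom, start, strand, length) if use_length else (chrom, start, strand)
        groups[key].append(read)
    return dict(groups)
```

With reads of lengths 150, 150 and 170 starting at the same position, the start-only key reports one molecule, whereas the length-aware key reports two, so fewer distinct starting molecules are discarded as duplicates.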
To minimize the background noise, BAQ (base alignment quality)
filters of increasing stringency were applied; the noise rate
decreased significantly with increasing stringency (figure 2A).
Proportionally, the highest fraction of reference reads relative to the
noise was retained at the minimum noise rate (figure 2B, table 2).
Noise rate was defined as the number of reads supporting the
presence of a mutation in the background samples divided by the
total number of reference reads in the same positions. Merging
reads in combination with optimal PCR duplicate processing
yielded the lowest noise rate (1/35419) at a BAQ cutoff of 46
(table 2). Therefore, we processed all plasma samples according to
pipeline 3) using a BAQ cutoff of 46.
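The noise rate as defined above, and its relation to the phred scale, can be sketched as follows. The input layout (a list of `(is_mutant, baq)` pairs from background samples) and the use of `baq >= cutoff` as the filter are assumptions made for illustration.

```python
import math


def noise_rate(observations, cutoff):
    """Mutant reads divided by reference reads among bases passing the cutoff.

    `observations` is a list of (is_mutant, baq) pairs collected from the
    background samples at the mutated positions (hypothetical layout).
    """
    mutant = sum(1 for is_mutant, baq in observations
                 if is_mutant and baq >= cutoff)
    reference = sum(1 for is_mutant, baq in observations
                    if not is_mutant and baq >= cutoff)
    return mutant / reference if reference else float("nan")


def phred(error_rate):
    """Convert an error probability to the phred scale."""
    return -10 * math.log10(error_rate)
```

On this scale the reported minimum noise rate of 1/35419 corresponds to a phred value of about 45.5, close to the chosen BAQ cutoff of 46.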
Finally, a one-sided Fisher’s Exact Test was performed to assess
if ctDNA could be detected in the patients’ paired plasma samples
(table 1, table S2). The prostate patient carried lymph-node
metastases at surgery. The metastases were, in conjunction with
the primary tumor, exome sequenced previously (SWE-54 A-C)
[15] and variants from the metastases were used for ctDNA
estimation. For this individual, the pre-surgical sample was positive
for ctDNA, albeit non-significant, whereas the post-surgical
sample was negative. For the breast cancer samples, only one of
the six patients was positive (BC_D). The breast tumors sequenced
here come from a prospective collection of patients, consisting of
newly diagnosed invasive breast cancers at least 1 cm in size.
Interestingly, the only positive sample was also the one with
highest proliferation (as determined by percent cells staining with
Ki67, table 1). For validation purposes we performed whole-
genome long-insert sequencing (700 bps inserts) of five BC tumors
and paired normal samples to <3x base coverage, corresponding
to approximately 10x physical coverage of the genome. This data
was used to identify candidate breakpoints of somatic chromo-
somal rearrangements. In total, 18 of 19 candidates were validated
using Sanger sequencing. A digital PCR assay was set up for
Figure 1. Duplication rates using the ThruPLEX kit and the Mondrian system. The proportion of PCR duplicates in relation to sequencing
depth, demonstrated by subsampling of deeply sequenced libraries. A) The lower range and B) the higher range of sequencing depth. At any given
number of reads sequenced, libraries with an input amount of 10 ng show a lower fraction of duplicated reads compared to 1 ng. Furthermore,
ThruPLEX-prepared libraries consistently show a lower fraction of duplicated reads compared to Mondrian-prepared libraries.
doi:10.1371/journal.pone.0104417.g001
Table 1. Clinical and profiling data.
Column groups: Tissue/Blood exome sequencing (Tumor*, Blood*, Nbr point mutations); Plasma DNA exome sequencing (cfDNA source, DNA (ng), Plasma*, Fraction ctDNA, P-value); dPCR breakpoint profiling (dPCR DNA (ng), dPCR fraction ctDNA).
SampleID | Clinical data | Tumor* | Blood* | Nbr point mutations | cfDNA source | DNA (ng) | Plasma* | Fraction ctDNA | P-value | dPCR DNA (ng) | dPCR fraction ctDNA
SWE-54_B | Gleason 5+4, T3A | 87 | 86 | 27** | Before surgery | 3 | 39 | 0.001 | 0.265 | NA | NA
SWE-54_A | Gleason 5+4, T3A | 87 | 86 | 27** | 1 month after surgery | 8 | 42 | 0 | 1 | NA | NA
BC_A | Elston 3, prolif 75%, 21 mm, ER+, PR+, HER2+ | 122 | 147 | 26 | Before surgery | 5 | 62 | 0 | 1 | 5 | NA
BC_B | Elston 3, prolif 70%, 18 mm, ER+, HER2+ | 114 | 137 | 184 | Before surgery | 1 | 18 | 0 | 1 | 1 | 0
BC_C | Elston 2, prolif 13%, 16 mm, ER+, PR+ | 111 | 146 | 17 | Before surgery | 5 | 57 | 0 | 1 | 5 | 0
BC_D | Elston 3, prolif 90%, 18 mm, ER+, HER2+ | 101 | 118 | 245 | Before surgery | 3 | 43 | 0.003 | 1.16E-18 | 3 | 0.026
BC_E | Elston 1, prolif 10%, 38 mm, ER+, PR+ | 134 | 188 | 20 | Before surgery | 5 | 55 | 0 | 1 | 5 | NA
BC_F | Elston 2, prolif 2%, 12 mm, ER+, PR+ | 175 | 153 | 82 | Before surgery | 5 | 48 | 0 | 1 | 5 | 0
BC_G | Elston 3, prolif 85%, 24 mm, ER+, PR+ | 163 | 168 | 47 | Before surgery | 5 | 39 | 0 | 1 | 5 | 0
*Mean coverage throughout the exome.
**The point mutations originate from the two lymph-node metastases that were sequenced in ref 15.
doi:10.1371/journal.pone.0104417.t001
each break-point, using 3–15 ng of cfDNA. This verified BC_D to
harbor detectable levels of ctDNA, whereas all others were
negative.
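The relation between base coverage and physical coverage quoted above for the long-insert libraries can be checked with a short calculation. The read length of 2×100 bp is an assumption here, since the read length of the whole-genome runs is not stated; the function name is hypothetical.

```python
def physical_coverage(base_coverage, insert_size, read_length, reads_per_fragment=2):
    """Coverage counted over whole fragments rather than sequenced bases.

    Each fragment spans `insert_size` bp but contributes only
    `read_length * reads_per_fragment` sequenced bases.
    """
    sequenced_bases = read_length * reads_per_fragment
    return base_coverage * insert_size / sequenced_bases
```

Under these assumptions, `physical_coverage(3, 700, 100)` gives 10.5, in line with the approximately 10x physical coverage stated for 3x base coverage with 700 bp inserts.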
Sensitivity of whole-exome sequencing of cell-free DNA
Several factors affect the sensitivity: the background noise rate,
the amount of cfDNA obtained from plasma, the fraction of
Figure 2. Base alignment quality filtering to reduce noise. A) The noise rate in background samples for analysis pipelines 1)–3) as the base
alignment quality (BAQ) cutoff is increased. Rate is defined here as total number of mutant reads/(total number of mutant reads+the total number of
reference reads) in the background samples at mutated positions. B) The log2 ratio of (proportion of mutant reads)/(proportion of reference reads)
left by increasing BAQ cutoffs in the background samples at mutated positions. Colors scale according to analysis pipelines. Pipeline 1) BAQ limited to
40, as qualities were not altered. Pipeline 2) BAQ limited to 45 through merging of overlapping reads. Pipeline 3) BAQ limited to 50 by merging reads
and also accounting for concordance between PCR duplicates originating from the same starting molecule.
doi:10.1371/journal.pone.0104417.g002
Table 2. Comparison of analysis pipelines 1–3.
Pipeline 1) | Proportion of N-bases 2) | Optimal BAQ cutoff 3) | Proportion of data left 4) | Noise rate 5) | Sensitivity 6)
1 | 0.00018 | No cutoff | 1.00 | 1/2176 | 1/852
1 | 0.00018 | 38 | 0.40 | 1/11451 | 1/1372
2 | 0.00067 | 43 | 0.17 | 1/8673 | 1/775
3 | 0.00330 | 46 | 0.61 | 1/35419 | 1/2433
1) Analysis pipelines as described in Materials and Methods.
2) Proportion of bases set to "N" during processing.
3) Optimal base alignment quality cutoff (BAQ).
4) Proportion of data left using the BAQ cutoff set in 3).
5) Noise rate defined as the number of mutant bases in the background divided by the number of reference bases.
6) Sensitivity of exome sequencing to detect ctDNA based on in silico evaluation.
doi:10.1371/journal.pone.0104417.t002
ctDNA but also the efficiency of the sequence capture procedure,
including library preparation and enrichment. Six of the
independent library preparations performed, using the simulated
cfDNA, were sequenced to such depth that the proportion of new
unique molecules outside target regions was in majority (average
duplicate rate, 81%). Given the 1 or 10 ng of starting DNA and
the coverage retrieved, the fraction of starting molecules
accounted for was 4.7%. Averaging over the whole set of samples
sequenced here, also including lower duplicate levels, the average
fraction was 3.8%.
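One way to relate input mass, coverage and the fraction of observable molecules is sketched below. The text does not state how the efficiency was computed, so this is only a back-of-the-envelope illustration: it assumes that one haploid human genome weighs about 3.3 pg and that unique coverage at a position counts the starting molecules observed there. The function names are hypothetical.

```python
HAPLOID_GENOME_PG = 3.3  # assumed mass of one haploid human genome (picograms)


def genome_equivalents(nanograms):
    """Approximate number of haploid genome copies in a DNA sample."""
    return nanograms * 1000 / HAPLOID_GENOME_PG


def capture_efficiency(unique_coverage, nanograms):
    """Fraction of starting molecules observed at an average target position."""
    return unique_coverage / genome_equivalents(nanograms)


def observable_molecules(nanograms, efficiency):
    """Expected unique molecules per position for a given input and efficiency."""
    return genome_equivalents(nanograms) * efficiency
```

Under these assumptions, 30 ng of cfDNA holds roughly 9,100 genome equivalents, of which about 430 per position would be observable at 4.7% efficiency. The same arithmetic applied to the average ThruPLEX coverages reported earlier (16x from 1 ng, 85x from 10 ng) gives roughly 5% and 3%, the same order as the efficiencies reported here.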
To investigate how efficiency, preprocessing and other factors
impact the sensitivity, we performed an in silico evaluation. Since
the simulated cfDNA libraries used for technical evaluation
originated from a previously exome-sequenced prostate tumor,
variants were recalled (18 point mutations passing filters). Positions
harboring mutations were used to sample variant and reference
reads in various fractions and depths, representing the sample
signal (9.4% mutation-supporting reads among all 4733 reads
from 18 positions). The background noise level was estimated
from all plasma samples assayed here by collectively investigating all
known variant positions in all tumor samples (678 positions),
excluding each sample's own mutations; this accumulated
517973 reads for the simulation. Due to the low efficiency, and
to enable sensitive detection of ctDNA, all variant positions were
pooled and collectively tested against the background distribution of
reads using a frequency test. Importantly, there was no difference
in error rate (Wilcoxon rank sum test, p-value = 0.917) or size
distribution (Figure S1) between the simulated cfDNA and the real
plasma DNA, a prerequisite to avoid inflation of BAQs. Several
lessons could be drawn from this exercise (Figure 3). Due to the
small proportion of data left after BAQ filtering (Table 2)
relative to ctDNA levels, boosting qualities through merging reads
did not improve sensitivity. Improved processing of duplicates in
combination with BAQ filtering gave the most sensitive approach,
although limited to 1/2433 at 95% sensitivity. To reach such
detection levels, 30 ng of cell-free DNA, commonly retrieved from
10 ml of plasma, is required. Increasing input amounts by a further
factor of two improved the detection limit only marginally, due to the
efficiency limitations of sequence capture (Figure S6). As sequencing
costs continue to drop, we investigated the sensitivity gain from
whole genome sequencing (WGS), assuming 3000 variants per
tumor genome and 30× average coverage. Although sensitivity
was improved (1/5747), it is still not at the level required to
estimate the tumor burden in patients with locally confined, early
stage tumors [13].
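The logic of this evaluation (pool the variant positions, compare mutant and reference read counts with the background using Fisher's exact test, and report the proportion of significant iterations) can be sketched as below. This is not the authors' implementation. It assumes a one-sided test, a Poisson approximation for the number of mutant reads, and a read budget derived from 30 ng input, 3.8% capture efficiency, 61% of data kept and 50 variants. All function names are hypothetical, and the printed values are not expected to reproduce Figure 3.

```python
import math
import random


def log_comb(n: int, k: int) -> float:
    """Natural log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def fisher_greater(mut_s: int, ref_s: int, mut_b: int, ref_b: int) -> float:
    """One-sided Fisher's exact test: is the sample enriched for mutant reads?"""
    total = mut_s + ref_s + mut_b + ref_b
    mutants = mut_s + mut_b
    sample = mut_s + ref_s
    log_denominator = log_comb(total, sample)
    p = 0.0
    # Sum the hypergeometric tail over tables at least as extreme as observed.
    for k in range(mut_s, min(mutants, sample) + 1):
        p += math.exp(log_comb(mutants, k)
                      + log_comb(total - mutants, sample - k)
                      - log_denominator)
    return min(p, 1.0)


def poisson(rng: random.Random, lam: float) -> int:
    """Knuth's Poisson sampler; adequate for the small means used here."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def sensitivity(ctdna_fraction: float, reads: int, noise_rate: float,
                mut_b: int, ref_b: int, iterations: int = 1000,
                alpha: float = 0.05, seed: int = 1) -> float:
    """Proportion of simulated samples whose pooled test passes `alpha`."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(iterations):
        # Mutant reads = true signal plus sequencing noise (Poisson approximation).
        mut_s = min(reads, poisson(rng, reads * (ctdna_fraction + noise_rate)))
        if fisher_greater(mut_s, reads - mut_s, mut_b, ref_b) < alpha:
            hits += 1
    return hits / iterations


# Illustrative inputs combining figures quoted in the text (an assumption of this sketch).
NOISE = 1 / 35419
READS = round(50 * (30_000 / 3.218) * 0.038 * 0.61)  # pooled usable reads
BACKGROUND_READS = 517_973
MUT_B = round(BACKGROUND_READS * NOISE)              # hypothetical mutant read count
REF_B = BACKGROUND_READS - MUT_B

for denominator in (10_000, 2_433, 500):
    value = sensitivity(1 / denominator, READS, NOISE, MUT_B, REF_B)
    print(f"ctDNA fraction 1/{denominator}: sensitivity {value:.2f}")
```

Pooling matters here because each individual position receives too few reads for a per-position test to have any power at these fractions.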
Discussion
The presence of tumor fragments in the circulation holds
promise to revolutionize care by offering efficient means to
monitor systemic disease. This has spurred an active field of
research, where many different approaches have been taken to
assess ctDNA levels. Nevertheless, in order to become clinical
routine, several requirements have to be fulfilled, including not only
high sensitivity, but also practical applicability. Therefore we set
out to investigate the utility of applying exome sequencing to
monitor ctDNA levels. We evaluated two technologies that claim
to enable sequencing of nanogram levels of starting material
without prior amplification. In brief, the data obtained through
use of the ThruPLEX kit was superior, and therefore chosen for
further evaluation. When we set out to investigate the utility of
ctDNA estimation through exome sequencing, we did not foresee
the low levels of ctDNA present in these samples.
Only one sample contained detectable levels of ctDNA. Also, the
levels of ctDNA were estimated to be one order of magnitude
greater utilizing a break-point specific digital assay. It is probable
that different mutations are found at varying fractions in the
circulation as different clones in the heterogeneous primary tumor
mass have different characteristics, a phenomenon noted
previously [12,30]. Still, our effort revealed several aspects not previously
reported. First, the efficiency of sequence capture is low (<5%),
impairing the use of exome sequencing to track specific mutations.
The most probable reason is the library preparation in itself, as
performing capture on eight samples simultaneously did not have
a significant effect on coverage. Therefore, unless massive amounts
of plasma are available, clinicians must resort to other
methodologies to implement liquid biopsies as companion diagnostics, e.g.
to search for KRAS mutations in colorectal patients treated with
EGFR targeted therapy [31]. Nevertheless, to optimize the signal,
we designed a new algorithm for the identification of PCR
duplicates in merged read libraries. We envision this software to be
broadly used, not only for cfDNA libraries, but also for
formalin-fixed, paraffin-embedded tumor material. Furthermore, by
utilizing information in PCR duplicates to boost the quality of
bases observed multiple times, the noise rate was significantly
reduced to 1/35419, albeit at the cost of filtering out 39% of all
data. Still, the sensitivity was limited to 1/2433, assuming the
availability of 30 ng of cfDNA. Therefore, we investigated the
sensitivity of performing in silico whole genome sequencing to
30× coverage, tracking 3000 mutations. As no enrichment is
required for whole genome sequencing, data was only lost while
reducing the noise rate. This improved the detection limit to 1/5747
and marks the limit imposed by the optimized background noise rate in
relation to the low fraction of ctDNA present in clinical
samples. Early-stage colorectal tumors were reported to harbor
levels down to 1/10,000 [4]. At this fraction, 30× coverage would
be expected to give on average 5.5 mutant reads (30-fold coverage ×
3000 variants × 0.61 fraction kept after quality filtering × 0.0001
fraction of ctDNA). This must be put in the context of break-point specific
PCR [5,6], enabling the detection of single molecules. Assuming
10 ng of cfDNA and an unlikely 100% PCR efficiency, 3108
genome copies (10,000 pg × 100% / 3.218 pg per haploid human
genome copy) would be available for interrogation, making it
unlikely to detect such fractions without access to ≫10 ml of
plasma, commonly not utilized in the literature. This also
underlines the potential power to use multiple markers to track
ctDNA. Nevertheless, to reach broader clinical implementation
the challenges of using exome sequencing must be addressed. The
low efficiency of sequence capture is likely to be improved with
new library preparation approaches and in a system allowing for
agitation during enrichment, an approach popular while in-house
spotted arrays were utilized for gene expression experiments. The
low fraction of ctDNA likely requires sampling of larger volumes of
plasma, also for other technologies. Commonly 1–5 ml of plasma
is used in ctDNA experiments, although 10 times as much could
be obtained from patients without any obvious ethical dilemmas.
For exome sequencing to be effective, using such input amounts,
the inherent noise levels of short read data must be significantly
reduced. A previously demonstrated approach to obliterate the
background noise, suggested by Loeb and colleagues, is the
addition of a random barcode to the Illumina adapter construct
[29]. This enables the removal of basically all PCR related errors,
but the introduction of a random barcode is likely to complicate
adapter blocking during capture, with the risk of decreasing the
already low efficiency of capture. Although we used plasma
samples from cancer patients to estimate background noise, this is
highly unlikely to have affected our sensitivity estimates, as
the background noise rate was 16 times lower relative to unfiltered
data processed according to standard tools used by the academic
community. Collectively, we demonstrate the use of exome
sequencing as a tool to detect ctDNA but, as explained, unless
the current inherent limitations of the approach are addressed,
researchers and clinicians will have to resort to other
options to estimate ctDNA in patients suffering
from most organ-confined, low-grade primary cancers.
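The back-of-envelope figures used in this Discussion can be checked with a few lines of arithmetic. The sketch below uses only values quoted in the text. The final step (the plasma volume expected to contain a single mutant copy of one locus) is an added illustration that assumes 3 ng of cfDNA per ml of plasma, in line with 30 ng being retrieved from 10 ml.

```python
PG_PER_HAPLOID_GENOME = 3.218   # pg per haploid human genome copy (from the text)
CTDNA_FRACTION = 1 / 10_000     # early-stage level reported in [4]

# Whole genome sequencing: expected mutant reads pooled over all tracked variants
# (30x coverage, 3000 variants, 0.61 of the data kept after quality filtering).
expected_mutant_reads = 30 * 3000 * 0.61 * CTDNA_FRACTION

# Break-point specific PCR on 10 ng (10,000 pg) of cfDNA at 100% efficiency.
genome_copies = 10_000 / PG_PER_HAPLOID_GENOME
expected_mutant_copies = genome_copies * CTDNA_FRACTION

# cfDNA mass and plasma volume expected to hold one mutant copy of a single locus,
# assuming 3 ng of cfDNA per ml of plasma (an assumption of this sketch).
ng_for_one_copy = PG_PER_HAPLOID_GENOME / CTDNA_FRACTION / 1000
ml_for_one_copy = ng_for_one_copy / 3

print(f"{expected_mutant_reads:.2f} expected mutant reads (WGS)")       # 5.49
print(f"{genome_copies:.0f} genome copies in 10 ng")                    # 3108
print(f"{expected_mutant_copies:.2f} expected mutant copies per locus") # 0.31
print(f"{ml_for_one_copy:.1f} ml of plasma per expected mutant copy")   # 10.7
```

With roughly 0.3 expected mutant copies per locus in 10 ng, a single-marker assay would usually see no mutant molecule at all, which is why tracking many markers at once is attractive.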
Supporting Information
Figure S1 An electropherogram from a BioAnalyzer
instrument (Agilent) comparing the size-distribution of
the simulated cell-free DNA (top) and a real plasma
sample (bottom). Y-axis, fluorescence units (FU). X-axis,
fragment size in base pairs (bp).
(TIF)
Figure S2 Definition of PCR duplicates. The scheme
describes how PCR duplicates are defined using analysis pipelines
1)–3), in corresponding order. Left: Adapters are trimmed using
SeqPrep. Subsequently, PCR duplicates are removed using
MarkDuplicates (Picard), which takes the leftmost and the
rightmost base into account for the identification of PCR
duplicates. Middle: Reads not merged are processed as in Left.
PCR duplicates of merged reads are identified by leftmost starting
position. Right: Reads not merged are processed as in Left. PCR
duplicates of merged reads are identified by starting position and
template length.
(EPS)
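The duplicate definition for merged reads in the rightmost scheme can be illustrated with a short sketch. It is not the published software: it groups merged reads by chromosome, leftmost position and template length (taken here as the length of the merged sequence), and builds a consensus in which positions where the duplicates disagree are set to "N". The N-masking rule, the `MergedRead` type and the function name are assumptions of this sketch, and strand is ignored for brevity.

```python
from collections import defaultdict
from typing import List, NamedTuple, Tuple


class MergedRead(NamedTuple):
    chrom: str
    start: int     # leftmost aligned position
    sequence: str  # merged fragment; its length stands in for template length


def collapse_duplicates(reads: List[MergedRead]) -> List[Tuple[MergedRead, int]]:
    """Collapse reads sharing (chrom, start, template length) into one consensus.

    Returns (consensus read, number of duplicates) pairs sorted by position.
    """
    groups = defaultdict(list)
    for read in reads:
        groups[(read.chrom, read.start, len(read.sequence))].append(read.sequence)

    collapsed = []
    for (chrom, start, _length), sequences in sorted(groups.items()):
        # Keep a base only when every duplicate agrees; otherwise mask it as N.
        consensus = "".join(
            column[0] if len(set(column)) == 1 else "N"
            for column in zip(*sequences)
        )
        collapsed.append((MergedRead(chrom, start, consensus), len(sequences)))
    return collapsed


reads = [
    MergedRead("chr1", 100, "ACGTAC"),
    MergedRead("chr1", 100, "ACGTAC"),
    MergedRead("chr1", 100, "ACGAAC"),    # same fragment, one discordant base
    MergedRead("chr1", 100, "ACGTACGG"),  # same start, different template length
]
for consensus, count in collapse_duplicates(reads):
    print(consensus.start, consensus.sequence, count)
```

Using the start position alone would collapse all four reads into one, whereas adding template length keeps the longer fragment as a separate molecule.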
Figure S3 Filtering somatic variants to obtain a set with
minimum background noise. Y-axis, number of variants
from each sample. Variants were filtered against simple repeat regions,
regions with low mappability, germline variants from the 1000
Genomes Project, and de novo germline variants identified in these
individuals. This set was used for detection of ctDNA. For clarity,
PROT_EFF represents the number of variants with potential to
affect protein function (non_synonymous, truncating etc.).
Regions harboring simple repeats and low mappability
(50-mer) were downloaded from the UCSC Genome Browser. The
1000 Genomes variant set was available in the GATK resource
bundle.
(EPS)
Figure S4 The evaluation was performed on a 5 Mb
capture kit to facilitate sequencing the samples to
saturation. The variables evaluated are shown from left to
right: 1) The number of samples captured simultaneously. 2) The
number of PCR cycles after capture but before sequencing. The
Nimblegen SeqCap EZ standard protocol contains 18 rounds of
PCR, yielding unnecessarily high amounts of material. As
amplification is performed on beads, it is not possible to use a
qPCR instrument for the post-capture PCR. 3) The Mondrian
system and the ThruPLEX kit were evaluated for their capability to
provide sequence libraries with high complexity for capture. 4)
Input amounts of 1 ng and 10 ng, representing cell-free DNA
starting amounts commonly available from plasma samples. For
both Mondrian and ThruPLEX, three independent library
preparations were performed for both 1 ng and 10 ng of DNA,
all represented in the chart using 18 cycles of post-capture PCR.
As 18-cycle post-capture PCR yielded micrograms of material, the
impact was evaluated by taking the remaining material from the
six ThruPLEX libraries and performing another round of capture.
As the ThruPLEX data was superior, we chose to evaluate this
variable using ThruPLEX libraries only.
(EPS)
Figure 3. Sensitivity of exome sequencing to track ctDNA. Analysis pipelines 1)–3) are displayed here with optimal base alignment quality (BAQ)
cutoffs, and also without a cutoff for pipeline 1 to display the effects of BAQ filtering. As exome sequencing is limited by the efficiency of the capture
procedure, 30× whole genome sequencing was also simulated assuming 3000 variants in the genome. 1000 iterations were performed for each
ctDNA fraction assuming 50 variants in the exome, starting with 10 ml of plasma (30 ng of cfDNA). The sensitivity is defined as the
proportion of tests passing the significance threshold for each set of 1000 iterations (p<0.05, Fisher's exact test, comparing the number of variant and
reference reads from sample and background).
doi:10.1371/journal.pone.0104417.g003
Figure S5 Mean coverage obtained for the same sample
using analysis pipelines 1)–3). Identical samples are
connected with lines. Left: 5 Mb target region used for technological
evaluation. Right: Whole-exome data (26 Mb target region)
obtained from plasma samples.
(EPS)
Figure S6 The effect of varying input amounts of cell-free
DNA for exome sequencing to track ctDNA. 1000
iterations were performed for each ctDNA fraction and input
amount assessed here, assuming 50 variants for each exome. The
sensitivity is defined as the proportion of tests passing
the significance threshold for each set of 1000 iterations (p<0.05,
Fisher's exact test, comparing the number of variant and reference
reads from sample and background). Assuming 3 ng/ml, the
colored lines represent 1, 3, 10, 15 and 20 ml of plasma.
(EPS)
Table S1 An analysis of variance table showing the influence
of different parameters on library quality (measured as
percent duplicated reads) when performing exome sequencing
from small amounts of starting material. Listed parameters: cycles,
9 or 18 PCR cycles after capture but before sequencing; plex,
the number of samples captured simultaneously, here 1,
4 or 8; input, the starting amount of DNA before library
preparation, here 1 or 10 ng; prep, the technology used for
library preparation, here Mondrian or ThruPLEX.
(PDF)
Table S2 The number of reads supporting either the mutations
or reference bases in foreground- and background samples.
(PDF)
Acknowledgments
The authors would like to thank Anna Westring and Gabriela Prochazka
for excellent laboratory support. Furthermore, we acknowledge support
from Science for Life Laboratory, the Swedish national infrastructure
SNISS, and Uppmax for providing assistance in massively parallel
sequencing and computational infrastructure.
Author Contributions
Conceived and designed the experiments: DK MN JL. Performed the
experiments: MN SS. Analyzed the data: DK JL. Contributed reagents/
materials/analysis tools: L. Egevad EDR FC KC PH L. Eriksson HG.
Contributed to the writing of the manuscript: DK JL MN KC L. Eriksson.
References
1. Chiu RWK, Chan KCA, Gao Y, Lau VYM, Zheng W, et al. (2008) Noninvasive
prenatal diagnosis of fetal chromosomal aneuploidy by massively parallel
genomic sequencing of DNA in maternal plasma. Proceedings of the National
Academy of Sciences 105: 20458–20463. doi:10.1073/pnas.0810641105.
2. Beck J, Urnovitz HB, Riggert J, Clerici M, Schütz E (2009) Profile of the
circulating DNA in apparently healthy individuals. Clin Chem 55: 730–738.
doi:10.1373/clinchem.2008.113597.
3. Diehl F, Schmidt K, Choti MA, Romans K, Goodman S, et al. (2008)
Circulating mutant DNA to assess tumor dynamics. Nat Med 14: 985–990.
doi:10.1038/nm.1789.
4. Diehl F, Li M, Dressman D, He Y, Shen D, et al. (2005) Detection and
quantification of mutations in the plasma of patients with colorectal tumors. Proc
Natl Acad Sci USA 102: 16368–16373. doi:10.1073/pnas.0507904102.
5. Leary RJ, Kinde I, Diehl F, Schmidt K, Clouser C, et al. (2010) Development of
Personalized Tumor Biomarkers Using Massively Parallel Sequencing. Science
Translational Medicine 2: 20ra14–20ra14. doi:10.1126/scitranslmed.3000702.
6. Mcbride DJ, Orpana AK, Sotiriou C, Joensuu H, Stephens PJ, et al. (2010) Use
of cancer-specific genomic rearrangements to quantify disease burden in plasma
from patients with solid tumors. Genes Chromosom Cancer 49: 1062–1069.
doi:10.1002/gcc.20815.
7. Vogelstein B, Papadopoulos N, Velculescu VE, Zhou S, Diaz LA, et al. (2013)
Cancer Genome Landscapes. Science (New York, NY) 339: 1546–1558.
doi:10.1126/science.1235122.
8. Dawson S-J, Tsui DWY, Murtaza M, Biggs H, Rueda OM, et al. (2013) Analysis
of Circulating Tumor DNA to Monitor Metastatic Breast Cancer.
N Engl J Med: 130313140010009. doi:10.1056/NEJMoa1213261.
9. Forshew T, Murtaza M, Parkinson C, Gale D, Tsui DWY, et al. (2012)
Noninvasive Identification and Monitoring of Cancer Mutations by Targeted
Deep Sequencing of Plasma DNA. Science Translational Medicine 4: 136ra68–
136ra68. doi:10.1126/scitranslmed.3003726.
10. Chan KCA, Jiang P, Zheng YWL, Liao GJW, Sun H, et al. (2012) Cancer
Genome Scanning in Plasma: Detection of Tumor-Associated Copy Number
Aberrations, Single-Nucleotide Variants, and Tumoral Heterogeneity by
Massively Parallel Sequencing. Clin Chem. doi:10.1373/clinchem.2012.196014.
11. Leary RJ, Sausen M, Kinde I, Papadopoulos N, Carpten JD, et al. (2012)
Detection of Chromosomal Alterations in the Circulation of Cancer Patients
with Whole-Genome Sequencing. Science Translational Medicine 4: 162ra154–
162ra154. doi:10.1126/scitranslmed.3004742.
12. Murtaza M, Dawson S-J, Tsui DWY, Gale D, Forshew T, et al. (2013) Non-
invasive analysis of acquired resistance to cancer therapy by sequencing of
plasma DNA. Nature: 1–6. doi:10.1038/nature12065.
13. Bettegowda C, Sausen M, Leary RJ, Kinde I, Wang Y, et al. (2014) Detection of
Circulating Tumor DNA in Early- and Late-Stage Human Malignancies.
Science Translational Medicine 6: 224ra24–224ra24.
doi:10.1126/scitranslmed.3007094.
14. Lawrence MS, Stojanov P, Polak P, Kryukov GV, Cibulskis K, et al. (2013)
Mutational heterogeneity in cancer and the search for new cancer-associated
genes. Nature: 1–5. doi:10.1038/nature12213.
15. Lindberg J, Mills IG, Klevebring D, Liu W, Neiman M, et al. (2013) The
Mitochondrial and Autosomal Mutation Landscapes of Prostate Cancer.
European Urology 63: 702–708. doi:10.1016/j.eururo.2012.11.053.
16. Borgström E, Lundin S, Lundeberg J (2011) Large scale library generation for
high throughput sequencing. PLoS ONE 6: e19119–e19119.
doi:10.1371/journal.pone.0019119.
17. St John J, editor (n.d.) SeqPrep. Available: https://github.com/jstjohn/SeqPrep.
Accessed 6 March 2014.
18. Li H, Durbin R (2009) Fast and accurate short read alignment with
Burrows-Wheeler transform. Bioinformatics 25: 1754–1760.
doi:10.1093/bioinformatics/btp324.
19. DePristo MA, Banks E, Poplin R, Garimella KV, Maguire JR, et al. (2011) A
framework for variation discovery and genotyping using next-generation DNA
sequencing data. Nat Genet 43: 491–498. doi:10.1038/ng.806.
20. Picard (n.d.) Picard. Available: http://picard.sourceforge.net. Accessed 6 March
2014.
21. Cibulskis K, Lawrence MS, Carter SL, Sivachenko A, Jaffe D, et al. (2013)
Sensitive detection of somatic point mutations in impure and heterogeneous
cancer samples. Nat Biotechnol 31: 213–219. doi:10.1038/nbt.2514.
22. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, et al. (2009) The Sequence
Alignment/Map format and SAMtools. Bioinformatics 25: 2078–2079.
doi:10.1093/bioinformatics/btp352.
23. Li H (2011) Improving SNP discovery by base alignment quality. Bioinformatics
27: 1157–1158. doi:10.1093/bioinformatics/btr076.
24. Karolchik D (2004) The UCSC Table Browser data retrieval tool. Nucleic Acids
Res 32: 493D–496. doi:10.1093/nar/gkh103.
25. Team RC, editor (n.d.) R: A Language and Environment for Statistical
Computing. R Foundation for Statistical Computing. Available:
http://www.R-project.org. Accessed 6 March 2014.
26. Neiman M, Sundling S, Grönberg H, Hall P, Czene K, et al. (2012) Library
Preparation and Multiplex Capture for Massive Parallel Sequencing Applications
Made Efficient and Easy. PLoS ONE 7: e48616.
doi:10.1371/journal.pone.0048616.g002.
27. Voet T, Kumar P, Van Loo P, Cooke SL, Marshall J, et al. (2013) Single-cell
paired-end genome sequencing reveals structural variation per cell cycle. Nucleic
Acids Res 41: 6119–6138. doi:10.1093/nar/gkt345.
28. Minoche AE, Dohm JC, Himmelbauer H (2011) Evaluation of genomic
high-throughput sequencing data generated on Illumina HiSeq and Genome Analyzer
systems. Genome Biol 12: R112. doi:10.1186/gb-2011-12-11-r112.
29. Schmitt MW, Kennedy SR, Salk JJ, Fox EJ, Hiatt JB, et al. (2012) Detection of
ultra-rare mutations by next-generation sequencing. Proceedings of the National
Academy of Sciences: –. doi:10.1073/pnas.1208715109.
30. Navin N, Krasnitz A, Rodgers L, Cook K, Meth J, et al. (2010) Inferring tumor
progression from genomic heterogeneity. Genome Research 20: 68–80.
doi:10.1101/gr.099622.109.
31. Diaz LA, Williams RT, Wu J, Kinde I, Hecht JR, et al. (2012) The molecular
evolution of acquired resistance to targeted EGFR blockade in colorectal
cancers. Nature: 1–4. doi:10.1038/nature11219.
Exome Sequencing to Estimate Tumor Burden in Plasma
PLOS ONE | www.plosone.org 10 August 2014 | Volume 9 | Issue 8 | e104417
... When using commonly available WES technologies, large amounts of circula DNA are needed, which cannot be obtained from adequate volumes of plasma samp In WES based on hybridization, the amount of circulating DNA and the complexity of sequencing library can limit the process since a higher number of PCR cycles is requ when the input material is limited. To address this issue, we utilized the ThruPLEX Prep Kit (Rubicon Genomics, Inc., Ann Arbor, MI, USA) to maximize the yield for libr generation [55][56][57]. cfDNA is usually present at low concentrations and highly f mented, and its abundance depends on the cancer type and stage, as well as the sam treatment prior to analysis [58,59]. Shearing and other fragmentation methods have a c siderable effect on the size distribution within fragments and therefore the results of a ysis. ...
... In WES based on hybridization, the amount of circulating DNA and the complexity of the sequencing library can limit the process since a higher number of PCR cycles is required when the input material is limited. To address this issue, we utilized the ThruPLEX-FD Prep Kit (Rubicon Genomics, Inc., Ann Arbor, MI, USA) to maximize the yield for library generation [55][56][57]. cfDNA is usually present at low concentrations and highly fragmented, and its abundance depends on the cancer type and stage, as well as the sample treatment prior to analysis [58,59]. Shearing and other fragmentation methods have a considerable effect on the size distribution within fragments and therefore the results of analysis. ...
Article
Full-text available
The accurate diagnosis and treatment of oral squamous cell carcinoma (OSCC) requires an understanding of its genomic alterations. Liquid biopsies, especially cell-free DNA (cfDNA) analysis, are a minimally invasive technique used for genomic profiling. We conducted comprehensive whole-exome sequencing (WES) of 50 paired OSCC cell-free plasma with whole blood samples using multiple mutation calling pipelines and filtering criteria. Integrative Genomics Viewer (IGV) was used to validate somatic mutations. Mutation burden and mutant genes were correlated to clinico-pathological parameters. The plasma mutation burden of cfDNA was significantly associated with clinical staging and distant metastasis status. The genes TTN, PLEC, SYNE1, and USH2A were most frequently mutated in OSCC, and known driver genes, including KMT2D, LRP1B, TRRAP, and FLNA, were also significantly and frequently mutated. Additionally, the novel mutated genes CCDC168, HMCN2, STARD9, and CRAMP1 were significantly and frequently present in patients with OSCC. The mutated genes most frequently found in patients with metastatic OSCC were RORC, SLC49A3, and NUMBL. Further analysis revealed that branched-chain amino acid (BCAA) catabolism, extracellular matrix–receptor interaction, and the hypoxia-related pathway were associated with OSCC prognosis. Choline metabolism in cancer, O-glycan biosynthesis, and protein processing in the endoplasmic reticulum pathway were associated with distant metastatic status. About 20% of tumors carried at least one aberrant event in BCAA catabolism signaling that could possibly be targeted by an approved therapeutic agent. We identified molecular-level OSCC that were correlated with etiology and prognosis while defining the landscape of major altered events of the OSCC plasma genome. These findings will be useful in the design of clinical trials for targeted therapies and the stratification of patients with OSCC according to therapeutic efficacy.
... Mutation analysis by liquid biopsy presents several challenges; one of them is the sensitivity of DNA detection, which is affected by cfDNA concentration 32 , background noise rate, ctDNA abundance in plasma, capture e ciency, and requires high sequencing depths for detection 33,34 , in addition to tumor heterogeneity and the presence of variants caused by clonal hematopoiesis 35,36 . Sometimes, detecting these mutations is di cult because the patients still have a small tumor mass, producing low ctDNA rates which results in a negative result. ...
Preprint
Full-text available
Differential presence of exons (DPE) by next generation sequencing (NGS) is a method of interpretation of whole exome sequencing. This method has been proposed to design a predictive and diagnostic algorithm with clinical value in plasma from patients bearing colorectal cancer (CRC). The aim of the present study was to determine a common exonic signature to discriminate between different clinical pictures, such as non-metastatic, metastatic and non-disease (healthy), using a sustainable and novel technology in liquid biopsy. Through DPE analysis, we determined the differences in DNA exon levels circulating in plasma between patients bearing CRC vs. healthy, patients bearing CRC metastasis vs. non-metastatic and patients bearing CRC metastasis vs. healthy comparisons. We identified a set of 510 exons (469 up and 41 down) whose differential presence in plasma allowed us to group and classify between the three cohorts. Random forest classification (machine learning) was performed and an estimated out-of-bag (OOB) error rate of 35.9% was obtained and the predictive model had an accuracy of 75% with a confidence interval (CI) of 56.6–88.5. In conclusion, the DPE analysis allowed us to discriminate between different patho-physiological status such as metastatic, non-metastatic and healthy donors. In addition, this analysis allowed us to obtain very significant values with respect to previous published results, since we increased the number of samples in our study. These results suggest that circulating DNA in patient’s plasma may be actively released by cells and may be involved in intercellular communication and, therefore, may play a pivotal role in malignant transformation (genometastasis).
... Second, to acquire the tissue sample of lung tumor is sometimes challenging and thus using liquid biopsy might be elegant option how to analyse the tumor DNA. The main limitations of TMB assessment from liquid biopsy is the low amount of available ctDNA, which affects sensitivity and concerns that ctDNA is more associated with metastases than with primary tumors [17][18][19]. To avoid this issue we enrolled patients with localised NSCLC, thus the bTMB result won't be affected by the presence of metastases. ...
Article
Full-text available
Immunotherapy has dramatically influenced and changed therapeutical approach in non-small cell lung cancer (NSCLC) in recent five years. Even though we can reach long-term response to this treatment in approximately 20% of patients with NSCLC, we are still not able to identify this cohort of patients based on predictive biomarkers. In our study we have focused on tumor mutation burden (TMB), one of the potential biomarkers which could predict effectiveness of check-point inhibitors, but has several limitations, especially in multiple approaches to TMB quantification and ununiform threshold. We determined the value of TMB in tumor tissue (tTMB) and blood (bTMB) in 20 patients with early stage NSCLC using original custom gene panel LMB_TMB1. We evaluated various possibilities of TMB calculation and concluded that TMB should be counted from both somatic non-synonymous and synonymous mutations. Considering various factors, we established cut-offs of tTMB in/excluding HLA genes as ≥22 mut/Mb and 12 mut/Mb respectively, and cut-offs of bTMB were defined as ≥21 mut/Mb and ≥5 mut/Mb, respectively. We also observed trend in correlation of somatic mutations in HLA genes with overall survival of patients.
... Sensitivity of cfDNA based mutation detection assays may be aided by an improvement of amplification efficiency. Plasma cfDNA is known to be highly fragmented (Fleischhacker et al., 2011;Klevebring et al., 2014; Figure 2B). Therefore, it is commonly recognized that an increase in length of PCR amplicons may result in the elimination of a majority of the extracted DNA fragments as possible templates. ...
... BTMB, however, seemed not. Previous literature [44] is searched to discover one important reason for bTMB's inadequate predictive capability compared to tTMB as it demands a minimal amount of ctDNA. The tumors must shed DNA into the blood, if ctDNA could be detected in the blood for optimal assay performance. ...
Article
Full-text available
Background: Nonsmall cell lung cancer (NSCLC) is the most common type of lung cancer, and the majority of NSCLC patients are diagnosed at the advanced stage. Chemotherapy is still the main treatment at present, and the overall prognosis is poor. In recent years, immunotherapy has developed rapidly. Immune checkpoint inhibitors (ICIs) as the representative have been extensively applied for treating various types of cancers. Tumor mutation burden (TMB) as a potential biomarker is used to screen appropriate patients for treatment of ICIs. To verify the predictive efficacy of TMB, a systematic review and meta-analysis were conducted to explore the association between TMB and ICIs. Method: PubMed, EMBASE, Cochrane Library, and son on were systematically searched from inception to April 2020. Objective response rate (ORR), progression-free survival (PFS), and overall survival (OS) were estimated. Results: A total of 11 studies consisting of 1525 nonsmall cell lung cancer (NSCLC) patients were included. Comparison of high and low TMB: pooled HRs for OS, 0.57 (95% CI 0.32 to 0.99; P = 0.046); PFS, 0.48 (95% CI 0.33 to 0.69; P < 0.001); ORR, 3.15 (95% CI 2.29 to 4.33; P < 0.001). Subgroup analysis values: pooled HRs for OS, 0.75 (95% CI 0.29 to 1.92, P = 0.548) for blood TMB (bTMB), 0.44 (95% CI 0.26 to 0.75, P = 0.003) for tissue TMB (tTMB); for PFS, 0.54 (95% CI 0.29 to 0.98, P = 0.044) and 0.43 (95% CI 0.26 to 0.71, P = 0.001), respectively. Conclusions: These findings imply that NSCLC patients with high TMB possess significant clinical benefits from ICIs compared to those with low TMB. As opposed to bTMB, tTMB was thought more appropriate for stratifying NSCLC patients for ICI treatment.
... Thirdly, the sensitivity of cfDNA sequencing was dependent on sequencing depth. 10 High sequencing depth will help to find rare mutations in tissue DNA. In this study, we acquired 100 sequencing depth, so the mutation with less than 1% AF is unable to be identified theoretically. ...
... It is a minimally invasive technique to monitor many diseases of the CNS and has fewer complications, such as intracranial hematoma, compared with surgery. Recently, some studies have demonstrated that tumor mutations are detectable in CSF of patients with various primary and metastatic brain tumors, and ctDNA in CSF recapitulates the genomic landscape of brain tumor better than plasma and precipitates of CSF [24,25]. ...
Article
Objective: The 2016 World Health Organization (WHO) Classification of Tumors of the Central Nervous System (CNS) was revised to include molecular biomarkers as diagnostic criteria. However, conventional biopsies of gliomas were spatially and temporally limited. This study aimed to determine whether circulating tumor DNA (ctDNA) from cerebrospinal fluid (CSF) could provide more comprehensive diagnostic information to gliomas. Methods: Combined with clinical data, we analyzed gene alterations from CSF and tumor tissues of newly diagnosed patients, and detected mutations of ctDNA in recurrent patients. We simultaneously analyzed mutations of ctDNA in different glioma subtypes, and in lower-grade gliomas (LrGG) versus glioblastoma multiforme (GBM). Results: CSF ctDNA mutations had high concordance rates with tumor DNA (tDNA). CSF ctDNA mutations of PTEN and TP53 were commonly detected in recurrent gliomas patients. IDH mutation was detected in most of CSF ctDNA derived from IDH-mutant diffuse astrocytomas, while CSF ctDNA mutations of RB1 and EGFR were found in IDH-wild-type GBM. IDH mutation was detected in LrGG, whereas Rb1 mutation was more commonly detected in GBM. Conclusions: CSF ctDNA detection can be an alternative method as liquid biopsy in gliomas.
Article
Ovarian cancer is the fifth leading cause of cancer-related mortality in women worldwide. Despite the development of technologies over decades to improve the diagnosis and treatment of patients with ovarian cancer, the survival rate remains dismal, mainly because most patients are diagnosed at a late stage. Traditional treatment methods and biomarkers such as cancer antigen-125 as a cancer screening tool lack specificity and cannot offer personalized combinatorial therapy schemes. Circulating tumor DNA (ctDNA) is a promising biomarker for ovarian cancer and can be detected using a noninvasive liquid biopsy. A wide variety of ctDNA applications are being elucidated in multiple studies for tracking ovarian carcinoma during diagnostic and prognostic evaluations of patients and are being integrated into clinical trials to evaluate the disease. Furthermore, ctDNA analysis may be used in combination with multiple “omic” techniques to analyze proteins, epigenetics, RNA, nucleosomes, exosomes, and associated immune markers to promote early detection. However, several technical and biological hurdles impede the application of ctDNA analysis. Certain intrinsic features of ctDNA that may enhance its utility as a biomarker are problematic for its detection, including ctDNA lengths, copy number variations, and methylation. Before the development of ctDNA assays for integration in the clinic, such issues are required to be resolved since these assays have substantial potential as a test for cancer screening. This review focuses on studies concerning the potential clinical applications of ctDNA in ovarian cancer diagnosis and discusses our perspective on the clinical research aimed to treat this daunting form of cancer.
Thesis
Pediatric cancers represent a therapeutic challenge, and an understanding of the mechanisms by which tumors escape treatment is needed to develop new therapeutic approaches. Circulating DNA can be released by a tumor into body fluids and makes it possible to detect and follow tumor genetic alterations through successive minimally invasive samples such as a blood draw. In this work, a whole-exome sequencing technique (sequencing of all exons) for circulating DNA was developed to enable the study of tumor genetic alterations in the plasma of children with cancer. These analyses highlight the great spatial and temporal genetic heterogeneity of pediatric cancers. In addition, an important role of clonal evolution in disease progression was demonstrated. Approaches exploiting the particular characteristics of circulating DNA also made it possible to infer the expression profile, based on the footprint of transcription start sites, or the epigenetic profile of the tumor. Besides aiding tumor classification, these features may help in observing a change of cellular identity under treatment. Circulating DNA is therefore a powerful tool for better understanding how a tumor escapes treatment through its spatial and temporal heterogeneity and its plasticity.
Article
The development of noninvasive methods to detect and monitor tumors continues to be a major challenge in oncology. We used digital polymerase chain reaction-based technologies to evaluate the ability of circulating tumor DNA (ctDNA) to detect tumors in 640 patients with various cancer types. We found that ctDNA was detectable in >75% of patients with advanced pancreatic, ovarian, colorectal, bladder, gastroesophageal, breast, melanoma, hepatocellular, and head and neck cancers, but in less than 50% of primary brain, renal, prostate, or thyroid cancers. In patients with localized tumors, ctDNA was detected in 73, 57, 48, and 50% of patients with colorectal cancer, gastroesophageal cancer, pancreatic cancer, and breast adenocarcinoma, respectively. ctDNA was often present in patients without detectable circulating tumor cells, suggesting that these two biomarkers are distinct entities. In a separate panel of 206 patients with metastatic colorectal cancers, we showed that the sensitivity of ctDNA for detection of clinically relevant KRAS gene mutations was 87.2% and its specificity was 99.2%. Finally, we assessed whether ctDNA could provide clues into the mechanisms underlying resistance to epidermal growth factor receptor blockade in 24 patients who objectively responded to therapy but subsequently relapsed. Twenty-three (96%) of these patients developed one or more mutations in genes involved in the mitogen-activated protein kinase pathway. Together, these data suggest that ctDNA is a broadly applicable, sensitive, and specific biomarker that can be used for a variety of clinical and research purposes in patients with multiple different types of cancer.
Article
Major international projects are underway that are aimed at creating a comprehensive catalogue of all the genes responsible for the initiation and progression of cancer. These studies involve the sequencing of matched tumour-normal samples followed by mathematical analysis to identify those genes in which mutations occur more frequently than expected by random chance. Here we describe a fundamental problem with cancer genome studies: as the sample size increases, the list of putatively significant genes produced by current analytical methods burgeons into the hundreds. The list includes many implausible genes (such as those encoding olfactory receptors and the muscle protein titin), suggesting extensive false-positive findings that overshadow true driver events. We show that this problem stems largely from mutational heterogeneity and provide a novel analytical methodology, MutSigCV, for resolving the problem. We apply MutSigCV to exome sequences from 3,083 tumour-normal pairs and discover extraordinary variation in mutation frequency and spectrum within cancer types, which sheds light on mutational processes and disease aetiology, and in mutation frequency across the genome, which is strongly correlated with DNA replication timing and also with transcriptional activity. By incorporating mutational heterogeneity into the analyses, MutSigCV is able to eliminate most of the apparent artefactual findings and enable the identification of genes truly associated with cancer.
Article
The nature and pace of genome mutation is largely unknown. Because standard methods sequence DNA from populations of cells, the genetic composition of individual cells is lost, de novo mutations in cells are concealed within the bulk signal and per cell cycle mutation rates and mechanisms remain elusive. Although single-cell genome analyses could resolve these problems, such analyses are error-prone because of whole-genome amplification (WGA) artefacts and are limited in the types of DNA mutation that can be discerned. We developed methods for paired-end sequence analysis of single-cell WGA products that enable (i) detecting multiple classes of DNA mutation, (ii) distinguishing DNA copy number changes from allelic WGA-amplification artefacts by the discovery of matching aberrantly mapping read pairs among the surfeit of paired-end WGA and mapping artefacts and (iii) delineating the break points and architecture of structural variants. By applying the methods, we capture DNA copy number changes acquired over one cell cycle in breast cancer cells and in blastomeres derived from a human zygote after in vitro fertilization. Furthermore, we were able to discover and fine-map a heritable inter-chromosomal rearrangement t(1;16)(p36;p12) by sequencing a single blastomere. The methods will expedite applications in basic genome research and provide a stepping stone to novel approaches for clinical genetic diagnosis.
Article
Cancers acquire resistance to systemic treatment as a result of clonal evolution and selection. Repeat biopsies to study genomic evolution as a result of therapy are difficult, invasive and may be confounded by intra-tumour heterogeneity. Recent studies have shown that genomic alterations in solid cancers can be characterized by massively parallel sequencing of circulating cell-free tumour DNA released from cancer cells into plasma, representing a non-invasive liquid biopsy. Here we report sequencing of cancer exomes in serial plasma samples to track genomic evolution of metastatic cancers in response to therapy. Six patients with advanced breast, ovarian and lung cancers were followed over 1-2 years. For each case, exome sequencing was performed on 2-5 plasma samples (19 in total) spanning multiple courses of treatment, at selected time points when the allele fraction of tumour mutations in plasma was high, allowing improved sensitivity. For two cases, synchronous biopsies were also analysed, confirming genome-wide representation of the tumour genome in plasma. Quantification of allele fractions in plasma identified increased representation of mutant alleles in association with emergence of therapy resistance. These included an activating mutation in PIK3CA (phosphatidylinositol-4,5-bisphosphate 3-kinase, catalytic subunit alpha) following treatment with paclitaxel; a truncating mutation in RB1 (retinoblastoma 1) following treatment with cisplatin; a truncating mutation in MED1 (mediator complex subunit 1) following treatment with tamoxifen and trastuzumab, and following subsequent treatment with lapatinib, a splicing mutation in GAS6 (growth arrest-specific 6) in the same patient; and a resistance-conferring mutation in EGFR (epidermal growth factor receptor; T790M) following treatment with gefitinib. 
These results establish proof of principle that exome-wide analysis of circulating tumour DNA could complement current invasive biopsy approaches to identify mutations associated with acquired drug resistance in advanced cancers. Serial analysis of cancer genomes in plasma constitutes a new paradigm for the study of clonal evolution in human cancers.
Article
Over the past decade, comprehensive sequencing efforts have revealed the genomic landscapes of common forms of human cancer. For most cancer types, this landscape consists of a small number of “mountains” (genes altered in a high percentage of tumors) and a much larger number of “hills” (genes altered infrequently). To date, these studies have revealed ~140 genes that, when altered by intragenic mutations, can promote or “drive” tumorigenesis. A typical tumor contains two to eight of these “driver gene” mutations; the remaining mutations are passengers that confer no selective growth advantage. Driver genes can be classified into 12 signaling pathways that regulate three core cellular processes: cell fate, cell survival, and genome maintenance. A better understanding of these pathways is one of the most pressing needs in basic cancer research. Even now, however, our knowledge of cancer genomes is sufficient to guide the development of more effective approaches for reducing cancer morbidity and mortality.
Article
Background: The management of metastatic breast cancer requires monitoring of the tumor burden to determine the response to treatment, and improved biomarkers are needed. Biomarkers such as cancer antigen 15-3 (CA 15-3) and circulating tumor cells have been widely studied. However, circulating cell-free DNA carrying tumor-specific alterations (circulating tumor DNA) has not been extensively investigated or compared with other circulating biomarkers in breast cancer. Methods: We compared the radiographic imaging of tumors with the assay of circulating tumor DNA, CA 15-3, and circulating tumor cells in 30 women with metastatic breast cancer who were receiving systemic therapy. We used targeted or whole-genome sequencing to identify somatic genomic alterations and designed personalized assays to quantify circulating tumor DNA in serially collected plasma specimens. CA 15-3 levels and numbers of circulating tumor cells were measured at identical time points. Results: Circulating tumor DNA was successfully detected in 29 of the 30 women (97%) in whom somatic genomic alterations were identified; CA 15-3 and circulating tumor cells were detected in 21 of 27 women (78%) and 26 of 30 women (87%), respectively. Circulating tumor DNA levels showed a greater dynamic range, and greater correlation with changes in tumor burden, than did CA 15-3 or circulating tumor cells. Among the measures tested, circulating tumor DNA provided the earliest measure of treatment response in 10 of 19 women (53%). Conclusions: This proof-of-concept analysis showed that circulating tumor DNA is an informative, inherently specific, and highly sensitive biomarker of metastatic breast cancer. (Funded by Cancer Research UK and others.)
Article
Detection of somatic point substitutions is a key step in characterizing the cancer genome. However, existing methods typically miss low-allelic-fraction mutations that occur in only a subset of the sequenced cells owing to either tumor heterogeneity or contamination by normal cells. Here we present MuTect, a method that applies a Bayesian classifier to detect somatic mutations with very low allele fractions, requiring only a few supporting reads, followed by carefully tuned filters that ensure high specificity. We also describe benchmarking approaches that use real, rather than simulated, sequencing data to evaluate the sensitivity and specificity as a function of sequencing depth, base quality and allelic fraction. Compared with other methods, MuTect has higher sensitivity with similar specificity, especially for mutations with allelic fractions as low as 0.1 and below, making MuTect particularly useful for studying cancer subclones and their evolution in standard exome and genome sequencing data.