RESEARCH ARTICLE
Amplicon Sequencing of Colorectal Cancer: Variant Calling in Frozen and Formalin-Fixed Samples
Johannes Betge1,2*, Grainne Kerr1, Thilo Miersch1, Svenja Leible1, Gerrit Erdmann1, Christian L. Galata3, Tianzuo Zhan1,2, Timo Gaiser4, Stefan Post3, Matthias P. Ebert2, Karoline Horisberger3, Michael Boutros1*
1 Division of Signaling and Functional Genomics, German Cancer Research Center (DKFZ) and Department of Cell and Molecular Biology, Medical Faculty Mannheim, Heidelberg University, Heidelberg, Germany
2 Department of Medicine II, University Hospital Mannheim, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
3 Department of Surgery, University Hospital Mannheim, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
4 Institute of Pathology, University Hospital Mannheim, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
* j.betge@dkfz.de; m.boutros@dkfz.de
Abstract
Next generation sequencing (NGS) is an emerging technology becoming relevant for geno-
typing of clinical samples. Here, we assessed the stability of amplicon sequencing from for-
malin-fixed paraffin-embedded (FFPE) and paired frozen samples from colorectal cancer
metastases with different analysis pipelines. 212 amplicon regions in 48 cancer related
genes were sequenced with Illumina MiSeq using DNA isolated from resection specimens
from 17 patients with colorectal cancer liver metastases. From ten of these patients, paired
fresh frozen and routinely processed FFPE tissue was available for comparative study.
Sample quality of FFPE tissues was determined by the amount of amplifiable DNA using
qPCR, sequencing libraries were evaluated using Bioanalyzer. Three bioinformatic pipe-
lines were compared for analysis of amplicon sequencing data. Selected hot spot mutations
were reviewed using Sanger sequencing. In the sequenced samples from 16 patients, 29
non-synonymous coding mutations were identified in eleven genes. Most frequent were mu-
tations in TP53 (10), APC (7), PIK3CA (3) and KRAS (2). A high concordance of FFPE and
paired frozen tissue samples was observed in ten matched samples, revealing 21 identical
mutation calls and only two mutations differing. Comparison of these results with two other
commonly used variant calling tools, however, showed high discrepancies. Hence, ampli-
con sequencing can potentially be used to identify hot spot mutations in colorectal cancer
metastases in frozen and FFPE tissue. However, remarkable differences exist among re-
sults of different variant calling tools, which are not only related to DNA sample quality. Our
study highlights the need for standardization and benchmarking of variant calling pipelines,
which will be required for translational and clinical applications.
PLOS ONE | DOI:10.1371/journal.pone.0127146 May 26, 2015 1/18
OPEN ACCESS
Citation: Betge J, Kerr G, Miersch T, Leible S,
Erdmann G, Galata CL, et al. (2015) Amplicon
Sequencing of Colorectal Cancer: Variant Calling in
Frozen and Formalin-Fixed Samples. PLoS ONE
10(5): e0127146. doi:10.1371/journal.pone.0127146
Academic Editor: Jeong-Sun Seo, Seoul National
University College of Medicine, REPUBLIC OF
KOREA
Received: January 10, 2015
Accepted: April 13, 2015
Published: May 26, 2015
Copyright: © 2015 Betge et al. This is an open
access article distributed under the terms of the
Creative Commons Attribution License, which permits
unrestricted use, distribution, and reproduction in any
medium, provided the original author and source are
credited.
Data Availability Statement: All relevant data are
available via the European Nucleotide Archive (ENA)
under accession number PRJEB8754.
Funding: JB has been supported by a fellowship
from the Hartmut-Hoffmann-Berling International
Graduate School (HBIGS).
Competing Interests: The authors have declared
that no competing interests exist.
Introduction
Due to recent advances in deep sequencing technologies, remarkable insights have been gained
on the alterations acquired by colorectal cancer (CRC) genomes during the carcinogenic process,
largely expanding our view on CRC genomic progression [1–3]. The promise that after structural
characterization of cancer genomes, clinical decision-making would be guided by individual ge-
nomic tumor profiles, however, remains to be fulfilled. Nevertheless, the development of novel
targeted therapeutics highlights the need for reliable and cost effective methods for molecular
characterization of cancer genomes to identify patients that ultimately respond to treatment on
the basis of druggable mutations, predictive alterations or acquired resistance markers.
Targeted sequencing based on PCR amplicons represents a feasible approach for evaluation
of actionable mutations, mutational hot spots or predictive alterations in cancer genomes for
clinical studies. Compared to genome-wide or exome-wide sequencing, a high depth of se-
quencing (>1000 reads) at the genomic loci of interest can be reached, thus facilitating detec-
tion of low-frequency variants in heterogeneous tumor samples admixed with stromal cells
[4,5]. Moreover, due to the comparably low number of base pairs to be sequenced per patient,
multiple samples, also for longitudinal analysis, can be analyzed in parallel on bench-top ma-
chines such as Illumina MiSeq, lowering costs and potentially allowing routine clinical applica-
tion in the near future.
However, for clinical application and for translational studies on archived clinical samples,
many problems remain to be solved. Most widely available specimens for clinical diagnostics
and biomarker studies are formalin-fixed, paraffin-embedded (FFPE) tissues from pathology
archives, as their long-term storage is relatively simple and cost efficient compared to frozen
material. However, it is known that formalin fixation leads to covalent linking of DNA, RNA
and protein by methylene bridges, deamination and oxidation reactions, formation of cyclic
base derivatives and also to DNA fragmentation [6]. These DNA alterations hamper sequenc-
ing technologies leading to less robust results and difficulties in interpreting data from se-
quencing experiments. Furthermore, a gold standard method for analysis of next-generation
sequencing (NGS) data is lacking and quality assurance programs are not launched yet. Differ-
ent bioinformatic analysis tools and pipelines have been developed for NGS data. However, it
appears that reproducibility between them needs to be improved [7]. Moreover, statistical
models for variant discovery and variant evaluation, designed for whole-exome or whole-ge-
nome data consisting of many samples with low coverage, may not be optimal for small ampli-
con datasets with few targeted regions. Thus, there is no generally accepted standard on how to
perform variant calling on amplicon sequencing data. These problems highlight the need for
sample preparation and data analysis pipelines optimized for amplicon sequencing of clinical
samples.
In this study, we describe an experimental and bioinformatic pipeline for amplicon sequenc-
ing of clinical fresh frozen and FFPE samples from CRC. Special focus is drawn on preparation
of sequencing libraries from low-quality FFPE samples. The bioinformatics pipeline, using an
adapted Genome Analysis Toolkit (GATK) Unified Genotyper, is explained in detail and com-
pared with other commonly used variant calling methods with respect to their suitability for
amplicon sequencing using FFPE material.
Materials and Methods
Patients
Thirty-three samples from 17 patients who underwent resection of liver metastasis of CRC
in the Department of Surgery, University Hospital Mannheim, between February 2012 and
February 2013 were included in this study. For all of these patients, either fresh frozen or for-
malin-fixed paraffin-embedded (FFPE) tissue was used for DNA isolation. From 10 patients,
paired frozen and FFPE tissue was available for study and from 5 patients, matched primary tu-
mors could be obtained from the archives of the Institute of Pathology, University Hospital
Mannheim. Additionally, one matched primary-metastasis pair from a neuroendocrine carci-
noma of the small bowel (Pat05), primary culture material from one patient (Pat16), material
from a prostate cancer patient and cell lines DLD-1, HCT116, HT55, HUH7, HEK293T, HS68
and SW480 were included in sequencing runs and analysis for other projects or as controls.
Samples were analyzed in two sequencing runs; one patient (Pat13) was analyzed in both runs
as a control. All cell lines were obtained from ATCC. Information about patients can be found in
S1 Table.
Ethics approval
Ethics board approval was obtained from the Medical Ethics Commission II of the Medical
Faculty Mannheim, Heidelberg University, Mannheim, Germany (No. 2012-293N-MA, 2013-
841R-MA, 2014-551N-MA). Written informed consent from the donors of tissue samples was
obtained for the use in research.
Sample preparation
Frozen samples and cell lines. Samples from hepatic metastases from CRC patients were
transported in RPMI cell culture medium and were snap frozen on dry ice and subsequently
stored at -80°C. DNA isolation was done with the Qiagen DNeasy Blood & Tissue Kit (Qiagen,
Hilden, Germany) according to the manufacturer's recommendations, including RNAse digestion
(Fig 1A). Cell lines were pelleted and DNA was isolated with the same protocol. Extracted
DNA was diluted and directly used for preparation of sequencing libraries.
FFPE samples. Tissue from hepatic metastases had been fixed in formalin and embedded
in paraffin during routine pathological work-up. Suitable blocks were chosen and five 10μm
slices were used for DNA extraction without microdissection. A slide stained with haematoxy-
lin and eosin (H&E) from each block was used to estimate the tumor cell content of the corre-
sponding slices by two investigators (TG and JB) using a double-headed microscope. DNA was
isolated using the Qiagen QIAamp DNA FFPE Kit according to the manufacturer's instructions.
DNA was eluted in 40μl Buffer ATE and concentrations were measured with NanoDrop
2000 (NanoDrop, Wilmington, USA) and Qubit BR kit (Life Technologies, Darmstadt, Ger-
many). Isolation yielded between 4.8μg and 22.8μg (mean 10.23μg) when measured with the
Qubit BR kit. Detailed information about preparation of FFPE samples can be found in S2
Table.
Library Preparation
DNA quality of FFPE samples was evaluated by determining the amount of amplifiable DNA
using the FFPE QC PCR (Illumina, San Diego, USA) according to the manufacturer's recommendations.
The mean ΔCq value of all FFPE samples was 2.0 (median 1.9, min 0.9, max 4.1).
Nine samples (47%) had a ΔCq value higher than the recommended 2.0 (S2 Table). TruSeq
Amplicon Cancer Panel (Cat. No. FC-130-1008, Illumina) libraries were prepared with the recommended
DNA amounts (150ng for fresh frozen material and cell lines, 250ng for FFPE samples).
The panel includes 212 amplicons of 170–190bp length, targeting mutational hot spots in
48 cancer related genes. Amplicon regions are depicted in S3 Table.
Bioanalyzer (Agilent Technologies, Böblingen, Germany) was used to confirm successful li-
brary amplification and library quality of FFPE samples by assessing concentration of DNA
with the desired size (~310bp) and of short DNA fragments (<150bp). To compare amounts of DNA
within the desired size region, the concentration of DNA amplicons in the range of 250–450bp
was calculated. The concentration of DNA with a size between 250bp and 450bp varied greatly, between
51.7 and 93831.9 pg/μl (mean 5675.1 pg/μl, median 672.2 pg/μl), among the libraries of
different samples and inversely correlated with ΔCq values (Spearman's coefficient: -0.805, Fig
1B, S2 Table). For the samples with low DNA concentrations at the 310bp amplicon, library
preparation was repeated using the highest possible DNA amounts (S1 Fig, S2 Table). Bioanalyzer
revealed higher concentrations of DNA around 250–450bp (365.3 pg/μl–5669.8 pg/μl; mean
6190.9 pg/μl; median 1996.3 pg/μl), however, with significant background of short DNA
fragments. After PCR clean-up of libraries, short DNA fragments were reduced, but three samples
also showed diminished amounts of the 310bp amplicon and were thus excluded from
sequencing.
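The rank correlation reported above can be illustrated with a minimal sketch. The code below computes Spearman's coefficient as the Pearson correlation of average ranks, using only the Python standard library. The input values are made up for illustration and are not the study's measurements; the function names are hypothetical and do not describe the software used by the authors.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Illustrative values only (NOT data from the study)
delta_cq = [0.9, 1.5, 1.9, 2.4, 4.1]
library_conc_pg_per_ul = [9000.0, 2500.0, 700.0, 900.0, 60.0]
rho = spearman(delta_cq, library_conc_pg_per_ul)
print(f"Spearman's coefficient: {rho:.3f}")
```

A negative coefficient, as in this toy input, corresponds to the inverse relationship described in the text: libraries from samples with higher ΔCq values contain less DNA in the 250–450bp range.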
Data processing
The bioinformatic analysis pipeline is shown in Fig 2A. Reads were aligned against the hg19 reference
genome using the BWA algorithm implemented in the MiSeq software (MiSeq Reporter v2.2.29).
Fig 1. Depth of Sequencing correlates with DNA quality. (A) Sample preparation workflow. DNA was isolated from fresh frozen or FFPE CRC liver
metastasis resection specimens with Qiagen Blood and Tissue or FFPE kit, respectively. Frozen samples then directly underwent sequencing library
preparation, pooling of libraries, quality control and sequencing. FFPE samples were additionally tested for DNA quality by qPCR. Library quality was tested
with Bioanalyzer. For samples with low amounts of correctly sized DNA amplicons (fragments at 310bp), new libraries were prepared with higher starting
DNA concentrations and re-analyzed with Bioanalyzer. Samples with yet low amounts of DNA with correct size and highly fragmented DNA were excluded.
(B) ΔCq-values of quality control PCR indicate poor sample quality. DNA concentration of fragments between 250bp and 450bp after library preparation was
calculated with Agilent Bioanalyzer and plotted against ΔCq values of FFPE quality control PCR. (C) higher ΔCq-values correlate with lower mean depth of
sequencing. (D) Coverage distribution of amplicons from all paired FFPE and frozen samples, normalized to total sample coverage. Frozen samples had a
mean depth of 4,622, FFPE samples 1,852.
doi:10.1371/journal.pone.0127146.g001
Fig 2. Amplicon Sequencing identifies hot-spot mutations in CRC metastases. (A) Sequencing analysis workflow. Sequence alignment files underwent
local-realignment around Indels, left alignment and base quality score recalibration. After variant calling with GATK Unified Genotyper, annotation and effect
prediction of detected variants was done using SnpEff. Raw variants of all samples were filtered by custom parameters with SnpSift. Variants included in the
1000 Genomes Project data were excluded to only obtain somatic mutations in cancer. (B) High frequency of TP53 and APC mutations among somatic
mutations identified in CRC liver metastases (frozen and FFPE tissue). Colored fields represent presence of a nonsynonymous coding SNP (blue), a
mutation leading to a stop-codon (grey) or a frameshift mutation (orange). Bars sum up mutations present in each patient (vertical bars) or each mutated
gene (horizontal bars). Of note, some genes contain more than one mutation.
doi:10.1371/journal.pone.0127146.g002
BAM files were quality-checked with FASTQC (v.0.9.5; http://www.bioinformatics.babraham.
ac.uk/projects/fastqc/). Indels in sequence alignment files were left-aligned and local realign-
ment around Indels was done with the RealignerTargetCreator and the IndelRealigner tools
from the Genome Analysis Toolkit (GATK, version 2.4-9) [8]. Base quality score recalibration
was performed. Duplicate mapping and marking was not deemed suitable for amplicon se-
quencing and thus omitted.
Unified Genotyper pipeline
Variant calling. Unified Genotyper from the GATK (version 2.4-9) was used for variant
calling. All samples were processed in parallel and split into individual variant files for each
sample after variant calling. Maximum coverage per locus was increased from the default 250
to 9,000,000 to take into account the high depth of amplicon sequencing. (Downsampling to
lower depth is done in whole-exome studies to increase speed by saving memory). The mini-
mum confidence threshold for calling was set to 10, the minimum confidence threshold for
emitting to 30. SNPs and Indels were evaluated simultaneously. A region list of all amplicons
was used to define regions for single nucleotide polymorphism (SNP) and Indel calling to in-
crease analysis speed. As an alternative, the Unified Genotyper pipeline was used by processing
each sample individually, otherwise the same parameters were used.
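The calling parameters described above can be summarized as a command line. The sketch below assembles such an argument list in Python without running anything. The flag names (-glm, -dcov, -stand_call_conf, -stand_emit_conf, -L) are given as they are commonly documented for GATK 2.x and should be checked against the documentation of the exact release; the jar location and all file names are hypothetical placeholders, and this is not the authors' actual invocation.

```python
def unified_genotyper_args(reference, bams, regions, out_vcf,
                           gatk_jar="GenomeAnalysisTK.jar"):
    """Build (but do not run) a GATK 2.x Unified Genotyper command line.

    All paths are placeholders; flag names are assumptions to be verified
    against the documentation of the GATK release in use.
    """
    args = ["java", "-jar", gatk_jar, "-T", "UnifiedGenotyper", "-R", reference]
    for bam in bams:
        args += ["-I", bam]               # all samples processed together
    args += [
        "-L", regions,                    # restrict calling to amplicon regions
        "-glm", "BOTH",                   # evaluate SNPs and Indels simultaneously
        "-dcov", "9000000",               # raise per-locus coverage cap from 250
        "-stand_call_conf", "10",         # calling threshold stated in the text
        "-stand_emit_conf", "30",         # emitting threshold stated in the text
        "-o", out_vcf,
    ]
    return args


cmd = unified_genotyper_args(
    "hg19.fa", ["pat01_ffpe.bam", "pat01_frozen.bam"],
    "amplicons.intervals", "all_samples.raw.vcf",
)
print(" ".join(cmd))
```

Building the list separately from execution makes it easy to run the same parameters either on all samples jointly or on one sample at a time, the two modes mentioned in the text.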
Variant annotation and effect prediction. SnpEff (version 2.0.5, http://snpeff.sourceforge.net/) [9]
was used for variant annotation and effect prediction and the GATK VariantAnnotator
tool was run with the -A SnpEff option to add the SnpEff annotations with the
highest biological significance for each variant to the variant calling format (vcf) files. Subse-
quently, the vcf file with information about all sequenced samples was split into individual
sample variant files using the GATK SelectVariants program. Variants were annotated with the
variant frequencies in the 1000 genomes project using the SnpSift (http://snpeff.sourceforge.
net/SnpSift.html) annotate feature [9].
Variant filtering. SnpSift from the SnpEff package was used for filtering of raw variants.
The following quality-filter criteria were applied: quality by depth greater than 0.8 (QD >0.8),
total depth for calling variants at a specific locus greater than 200 (DP >200), Fisher strand
(Phred-scaled p-value using Fisher's Exact Test to detect strand bias) smaller than 70 (FS <
70), minimum variant confidence greater than 1500 (QUAL >1500), mapping quality greater
than 40 (MQ >40) and mapping quality rank sum test higher than -15 (! exists MQRankSum |
MQRankSum >-15). Filter criteria had been optimized by explorative analysis. Moreover,
only the coding variants were selected with the following expression: (SNPEFF_EFFECT =
'NON_SYNONYMOUS_CODING') | (SNPEFF_EFFECT = 'CODON_CHANGE_PLUS_CODON_DELETION') |
(SNPEFF_EFFECT = 'CODON_DELETION') | (SNPEFF_EFFECT = 'FRAME_SHIFT') |
(SNPEFF_EFFECT = 'STOP_GAINED'). All variants present in the 1000
Genomes data were excluded to obtain only somatic mutation data and exclude common
germline variants. Variant recalibration was not done due to the nature of targeted sequencing
data and the relatively small dataset.
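The filter criteria listed above can be expressed as a single predicate. The following minimal sketch applies the stated thresholds to a variant represented as a plain Python dictionary. The record layout and the `in_1000g` key are hypothetical conveniences for illustration; the study itself applied these criteria with SnpSift, not with this code.

```python
# Effect classes retained by the selection expression quoted in the text
CODING_EFFECTS = {
    "NON_SYNONYMOUS_CODING",
    "CODON_CHANGE_PLUS_CODON_DELETION",
    "CODON_DELETION",
    "FRAME_SHIFT",
    "STOP_GAINED",
}


def passes_quality(qual, info):
    """Apply the quality thresholds stated in the text to one variant."""
    mq_rank_sum = info.get("MQRankSum")  # annotation may be absent
    return (
        info["QD"] > 0.8
        and info["DP"] > 200
        and info["FS"] < 70
        and qual > 1500
        and info["MQ"] > 40
        and (mq_rank_sum is None or mq_rank_sum > -15)
    )


def keep_variant(variant):
    """Keep high-quality coding variants that are absent from 1000 Genomes."""
    info = variant["info"]
    return (
        passes_quality(variant["qual"], info)
        and info.get("SNPEFF_EFFECT") in CODING_EFFECTS
        and not variant["in_1000g"]  # hypothetical flag set during annotation
    )


# Made-up example record (NOT a variant from the study)
example = {
    "qual": 5200.0,
    "in_1000g": False,
    "info": {"QD": 12.4, "DP": 3150, "FS": 3.2, "MQ": 49.6,
             "SNPEFF_EFFECT": "STOP_GAINED"},
}
print(keep_variant(example))
```

Treating a missing MQRankSum as passing mirrors the "! exists MQRankSum | MQRankSum > -15" clause in the text.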
SAMtools mpileup/BCF-tools pipeline
SAMtools (version 0.1.18) mpileup was used to generate raw variant calls with the -u (generate
uncompressed BCF output), -f (faidx indexed reference sequence file), -D (output per-sample
DP) and -S (output per-sample strand bias P-value) options and hg19 as reference genome,
processing all samples in parallel. Maximum per-sample depth for Indel and SNP calling was set
to 10,000. Bcftools view with the -bvcg options (output BCF file format, output potential variant
sites only, call SNPs, call genotypes at variant sites) was used for variant calling. Data were
processed and variants were annotated as for GATK data described above. Variants at loci with
a depth of less than 50 were filtered out, as well as all non-coding variants and all variants pres-
ent in the 1000G data.
Illumina Somatic Variant Caller pipeline
MiSeq on-board software Somatic Variant Caller was run with default parameters. Vcf files
containing variant information were downloaded from Basespace. Subsequently, they were an-
notated with 1000G variant frequencies. All non-coding, silent, synonymous and unknown
variants were filtered out, as well as all variants present in 1000G data. Moreover, all variants at
a locus with coverage of <200, variants with a variant frequency <0.05 or with a genotype
quality less than 100 were excluded.
Data analysis and visualization
Filtered variants were exported from variant files into tab-delimited files using SnpSift and
concatenated into a single tab-delimited file including all variants of all patients. Descriptive
statistics and data visualization were performed using Microsoft Excel and R packages
(http://www.r-project.org/). Venn diagrams were made using venny
(http://bioinfogp.cnb.csic.es/tools/venny/index.html) and jvenn [10]. The Integrative Genomics
Viewer was used for analysis and visualization of specific mutated loci [11].
The amplicon sequencing data of all samples were deposited in the European Nucleotide
Archive (ENA) and can be accessed with accession number PRJEB8754.
Sanger sequencing
Sanger sequencing was performed to evaluate KRAS exon 2 and BRAF exon 15 status as described
previously [12]. Briefly, genomic DNA was extracted from FFPE tumor tissue after manual
macro-dissection using the QIAamp DNA Micro kit (Qiagen, Hilden, Germany). The following
PCR primers were used for amplification: 5′-AACACATTTCAAGCCCCAAA-3′ (BRAF-F),
5′-GAAACTGGTTTCAAAATATTCGTT-3′ (BRAF-R), 5′-AGGCCTGCTGAAAATGACTGAATA-3′ (KRAS-F),
5′-CTGTATCAAAGAATGGTCCTGCAC-3′ (KRAS-R), 5′-
Thermal cycling conditions were 5 min at 94°C, followed by 35 cycles of 94°C for 30 sec-
onds, 53°C (BRAF) or 60°C (KRAS) for 30 seconds and 72°C for 30 seconds followed by a final
incubation at 72°C for 7 minutes. After dye-terminator sequencing using the PCR amplifica-
tion primers, analyses by capillary electrophoresis were performed on a 3130 Genetic Analyzer
(Applied Biosystems, Foster City, CA).
Results
Depth of sequencing correlates with DNA quality
We sequenced 212 amplicon regions in 48 cancer related genes with Illumina MiSeq using
DNA isolated from resection specimens from 17 patients with CRC liver metastases. From ten
of these patients, paired fresh frozen and routinely processed FFPE tissue was available for
comparative study. Sequencing statistics and DNA quality measurements were analyzed to
evaluate differences of FFPE and frozen material (Fig 1A).
The numbers of raw paired reads and of mapped paired reads were significantly higher in frozen
samples than in FFPE samples; however, the percentage of mapped/raw reads was only
78% in frozen samples compared with 96% in FFPE samples (Table 1). Mean sequencing quality
(Phred score 38 vs. 37) was slightly higher in FFPE samples than in frozen samples; the GC content
was also higher in FFPE than in frozen tissue (49% vs. 45%). Detailed sequencing statistics for each
frozen and FFPE sample are shown in S4 Table. Frozen samples had a mean depth of 4,622
reads, FFPE samples of 1,852 reads. In FFPE samples, we investigated the correlation of se-
quencing depth with DNA quality measured by quality control PCR. This step is performed be-
fore library preparation and estimates the amount of amplifiable DNA as a surrogate for
functional DNA quality (Fig 1B and 1C). We found that higher ΔCq-values, indicative of lower
DNA quality, correlated with lower mean depth of sequencing (Pearson Coefficient -0.505, Fig
1C). Of note, higher ΔCq-values also correlated with higher GC-content of the samples (Pear-
son Coefficient 0.488, S2 Fig) while the depth of sequencing appeared to be independent of
mean GC-content of the sequenced sample (S2 Fig). Fig 1D shows histograms of the coverage
of amplicons for each paired FFPE and frozen samples, normalized to total coverage of the
sample. FFPE samples tend to have a less balanced distribution of coverage on the different
amplicons than frozen samples.
These data indicate that sequencing performance correlates with DNA quality of sequenced
FFPE samples.
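The normalization used for Fig 1D can be sketched as follows: the read count of each amplicon is divided by the total coverage of the sample, which makes samples of different depth comparable. The coefficient of variation used here as a measure of balance is an added assumption for illustration, not a statistic reported in the study, and the read counts are made up.

```python
from statistics import mean, pstdev


def normalize_coverage(amplicon_reads):
    """Express each amplicon's read count as a fraction of total sample coverage."""
    total = sum(amplicon_reads.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {name: n / total for name, n in amplicon_reads.items()}


def coverage_cv(amplicon_reads):
    """Coefficient of variation of normalized coverage (0 = perfectly balanced)."""
    fractions = list(normalize_coverage(amplicon_reads).values())
    return pstdev(fractions) / mean(fractions)


# Made-up read counts for three amplicons (NOT data from the study)
frozen = {"amplicon_A": 400, "amplicon_B": 500, "amplicon_C": 600}
ffpe = {"amplicon_A": 100, "amplicon_B": 200, "amplicon_C": 1200}

print(normalize_coverage(frozen))
print(f"CV frozen: {coverage_cv(frozen):.2f}, CV FFPE: {coverage_cv(ffpe):.2f}")
```

In this toy input the FFPE sample has the larger coefficient of variation, which is the pattern one would expect for a less balanced coverage distribution.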
High concordance of mutations identified in frozen and FFPE samples
from CRC metastases
Recent large scale projects have identified the most common mutations occurring in CRC [1].
Sequencing 212 amplicon regions in 48 cancer related genes, we analyzed variant calls using an
adapted Unified Genotyper analysis pipeline.
In the sequenced tumor samples from 16 patients (frozen and/or FFPE), a total of 29 muta-
tions were identified in eleven genes after excluding all non-coding mutations, all synonymous
variants, and all non-harmful variants present in the 1000 Genomes data (Fig 2A–2B). The
number of mutations per patient varied from zero to four, mean number of mutations per pa-
tient was 1.8. Of the mutations, 16 were SNPs, four were Indels leading to a frameshift and
nine to a stop codon. The most frequently mutated gene was TP53, which showed 10 mutations
in nine of the patients. We observed seven APC mutations in six patients, while KRAS and
PIK3CA were mutated two and three times, respectively (Fig 2B).
DNA from FFPE tissues may have alterations due to the process of fixation in formalin. We
compared the variants identified in paired frozen and FFPE tissues. In ten sequenced patients
with paired frozen and FFPE tissue, 23 mutations were identified in FFPE samples and 21 mu-
tations in frozen samples, thus a concordance of 91% could be observed (Fig 3A and 3B). The
two non-matching mutations (BRAF V600E and ATM E1971G) were both identified in the
FFPE but not in the frozen sample of patient 09. Sanger sequencing of the BRAF mutational
hotspot in exon 15 was performed, revealing the V600E mutation. Of note, six percent of >10,000
reads at the BRAF V600E locus in the frozen sample showed the alternative base T, which
however did not lead to a variant call with Unified Genotyper pipeline (Fig 3C).
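The concordance figure can be reproduced with simple set arithmetic. The sketch below assumes that concordance is defined as the number of calls shared by both tissue types divided by the number of distinct calls overall; this definition is an assumption, chosen because it matches the reported 21 of 23 calls. Only the two discordant calls are named in the text, so the 21 shared calls are represented by placeholders.

```python
def concordance(calls_a, calls_b):
    """Fraction of all distinct calls that are present in both call sets."""
    union = calls_a | calls_b
    if not union:
        return float("nan")
    return len(calls_a & calls_b) / len(union)


# Placeholders for the 21 shared calls (identities not listed in the text)
shared_calls = {("placeholder", f"shared_{i:02d}", "NA") for i in range(21)}

# The two calls found only in FFPE tissue, as named in the text
ffpe_only = {("Pat09", "BRAF", "V600E"), ("Pat09", "ATM", "E1971G")}

ffpe = shared_calls | ffpe_only
frozen = set(shared_calls)

pct = round(100 * concordance(ffpe, frozen))
print(f"Concordance: {pct}%")
```

Keying each call by patient, gene and amino acid change ensures that the same mutation in two different patients is counted as two calls.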
Table 1. Sequencing statistics of all patient samples.
                                   FFPE       Frozen
# Raw Paired Reads                 423,355    1,278,334
# Paired Reads Mapped              407,493    984,164
% Mapped/Raw                       96%        78%
Mean Depth                         1,852      4,622
%GC                                49         45
Mean Seq. Quality (Phred score)    38         37
Mean Mapping Quality               49.63      48.98
doi:10.1371/journal.pone.0127146.t001
The correlation between observed percentage of tumor cells on representative FFPE slides
and calculated variant frequency for selected mutations was moderate (Fig 3D).
These data show that sequencing of FFPE tissue can lead to overall similar results as se-
quencing frozen material and could thus be a feasible approach for routine clinical samples.
Low reproducibility of variant calling in FFPE and frozen tissue with
different bioinformatics pipelines
Low reproducibility between different variant calling pipelines has been reported for whole-ge-
nome or whole-exome sequencing data [7]. To test whether this problem also occurs with
amplicon sequencing data, we compared different tools for variant calling in order to test re-
producibility of our results. We observed marked differences between different variant calling
Fig 3. Paired frozen and FFPE samples of CRC liver metastases have a high concordance of mutations in hotspot cancer genes. (A) GATK Unified
Genotyper variant calling pipeline was used to identify non-synonymous coding mutations in FFPE (green) and frozen samples (red). (B) Venn-Diagram of
non-synonymous coding mutations identified in FFPE and frozen samples. (C) Representative images of reads mapped to the site of BRAF V600E mutation
identified in FFPE but not in frozen tissue of patient 09, displayed with the Integrative Genomics Viewer. (D) Variant frequency of selected mutations and
estimated tumor cell content analyzing FFPE samples.
doi:10.1371/journal.pone.0127146.g003
software (Fig 4). Samtools/BCFtools found five of the mutations identified with the Unified
Genotyper pipeline (Fig 4A and 4B): patient 04 APC, patient 09 CDH1, patient 12 KRAS and
TP53, and patient 14 TP53. The APC mutation of patient 09 was also identified at the same locus
but only in the frozen sample. However, two additional APC frameshift mutations in patients 03
and 13 were called only by Samtools/BCFtools. In contrast, 15 mutations called with the Unified
Genotyper pipeline in both FFPE and frozen as well as two mutations called only in FFPE tissue
were not identified with Samtools/BCFtools.
Thus, Samtools/BCFtools as used in our pipeline seems to be less sensitive, although it may
identify additional small Indels leading to frameshift mutations (Fig 4C and 4D). Moreover, re-
sults from Illumina MiSeq on-board Somatic Variant Caller pipeline are shown in Fig 4E and
4F. Notably, this pipeline appears to call variants in both frozen and FFPE samples that are not
identified by other pipelines.
Regarding the paired primary CRCs we analyzed from patients 04, 10, 11 and 14, Illumina
Somatic Variant Caller again called more variants than others, especially in patient 04 (S5
Table). Cell lines that were included as controls are shown in S6 Table. In the cell lines, almost
identical results were obtained with the Unified Genotyper pipeline and Illumina Somatic Vari-
ant Caller, while Samtools mpileup/Bcftools was less sensitive.
All variant data from patients and cell lines obtained with different variant calling pipelines
can be found in S7 Table.
These data indicate that remarkable differences exist among results of different variant call-
ing pipelines, which are not only related to DNA sample quality.
Sensitivity and specificity of amplicon sequencing with respect to different
variant calling pipelines using frozen and FFPE tissues
To evaluate sensitivity and specificity of amplicon sequencing analyzed with different bioinformatics
tools, we performed Sanger sequencing of KRAS exon 2. As shown in Table 2, sensitivity
and specificity were 100% using Unified Genotyper with DNA isolated from frozen samples. In
FFPE samples, one discordant case (patient 02) was noted, which had KRAS c.38G>A
Fig 4. Comparison of different methods for variant calling. Mutations identified in matched frozen and FFPE tissue of CRC liver metastases detected
with (A, B) Genome Analysis Toolkit (GATK) Unified Genotyper (C, D) Samtools mpileup/Bcftools and (E, F) Somatic variant caller. Green color represents
FFPE samples, red represents frozen, color intensities represent number of non-synonymous coding mutations per gene.
doi:10.1371/journal.pone.0127146.g004
Table 2. KRAS mutations identified by sanger sequencing compared to deep amplicon sequencing analyzed with different variant calling tools.
No. Sanger UG frozen UG FFPE SAM frozen SAM FFPE SVC frozen SVC FFPE
Pat02 c.38G>A NA 0 NA 0 NA c.423T>A
Pat03 0 0 0 0 0 0 0
Pat04 c.35G>A c.35G>A c.35G>A 0 0 c.35G>A c.35G>A
Pat06 0 NA 0 NA 0 NA 0
Pat08 0 0 0 0 0 0 0
Pat09 0 0 0 0 0 0 0
Pat11 0 0 0 0 0 0 0
Pat12 c.35G>T c.35G>T c.35G>T c.35G>T c.35G>T c.35G>T c.35G>T
Pat17 0 0 NA 0 NA 0 NA
Pat18 0 0 NA 0 NA 0 NA
UG, Unied Genotyper pipeline; SAM, Samtools mpileup/Bcftools pipeline, SVC, Somatic Variant Caller; NA, not available
doi:10.1371/journal.pone.0127146.t002
Amplicon Sequencing of Frozen and Formalin Fixed Samples
PLOS ONE | DOI:10.1371/journal.pone.0127146 May 26, 2015 11 / 18
mutation according to Sanger sequencing. However, of note, Sanger sequencing was performed
with material from the primary tumor and the metastatic piece analyzed with amplicon se-
quencing had estimated tumor content of only 10%. In addition, none of the reads had the mu-
tated variant at the mutation locus (S3 Fig). Frozen tumor sample was not available from this
patient. Regarding other variant calling pipelines, Samtools/BCFtools failed to identify KRAS
mutation of patient 04, while Somatic Variant Caller had a false positive call in patient 02 FFPE
sample, missing the mutation at codon 38.
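The comparison with Sanger sequencing can be retraced from the entries of Table 2. The following is a minimal sketch, not the procedure used in the study: it treats the Sanger result as the reference, skips samples marked NA, and counts a call that differs from the Sanger result (patient 02, SVC FFPE) as both a missed mutation and a spurious call. That last convention, like the function and variable names, is an assumption made for illustration.

```python
NA = None  # sample not available

# Table 2: Sanger result per patient, followed by the six pipeline/tissue columns.
COLUMNS = ["UG frozen", "UG FFPE", "SAM frozen", "SAM FFPE", "SVC frozen", "SVC FFPE"]
TABLE2 = {
    "Pat02": ("c.38G>A", [NA, "0", NA, "0", NA, "c.423T>A"]),
    "Pat03": ("0", ["0"] * 6),
    "Pat04": ("c.35G>A", ["c.35G>A", "c.35G>A", "0", "0", "c.35G>A", "c.35G>A"]),
    "Pat06": ("0", [NA, "0", NA, "0", NA, "0"]),
    "Pat08": ("0", ["0"] * 6),
    "Pat09": ("0", ["0"] * 6),
    "Pat11": ("0", ["0"] * 6),
    "Pat12": ("c.35G>T", ["c.35G>T"] * 6),
    "Pat17": ("0", ["0", NA, "0", NA, "0", NA]),
    "Pat18": ("0", ["0", NA, "0", NA, "0", NA]),
}


def confusion(column: str) -> tuple:
    """Return (TP, FP, TN, FN) for one pipeline/tissue column against Sanger."""
    idx = COLUMNS.index(column)
    tp = fp = tn = fn = 0
    for sanger, calls in TABLE2.values():
        call = calls[idx]
        if call is NA:
            continue  # no sample of this tissue type for this patient
        if sanger == "0" and call == "0":
            tn += 1
        elif call == sanger:
            tp += 1
        else:
            if sanger != "0":
                fn += 1  # Sanger mutation not reproduced
            if call != "0":
                fp += 1  # call not supported by Sanger
    return tp, fp, tn, fn


def sensitivity_specificity(column: str) -> tuple:
    tp, fp, tn, fn = confusion(column)
    return tp / (tp + fn), tn / (tn + fp)


for col in COLUMNS:
    sens, spec = sensitivity_specificity(col)
    print(f"{col}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

With only two or three Sanger-positive cases per column, these ratios are coarse and should be read as a summary of the table rather than as robust performance estimates.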
Additionally, human cancer cell lines were analyzed to test concordance of variant calling
pipelines irrespective of sample quality and to evaluate suitability of filter criteria. As shown in
S4 Fig, a high concordance is observed between variant loci identified in cancer cell lines after
filtering out poor-quality and non-harmful variants. Moreover, almost all of the variant loci in cell
lines HCT116, HT55, HUH7 and SW480 identified with the Unified Genotyper pipeline were also
reported in the large-scale databases Cell Line Encyclopedia [13] and COSMIC [14], while
discordant loci were largely eliminated from our data upon filtering (S4 Fig).
Accordingly, in CRC metastases substantial differences can be observed between raw data-
sets and datasets after filtering variants by quality measures and functional annotations. Vari-
ant count is substantially reduced, while concordance between frozen and FFPE, as well as
between different variant calling pipelines increases. Results are presented in S5 Fig.
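The filtering step described here (removal of low-quality calls, synonymous and non-coding variants, and variants listed in the 1000G data) can be sketched as a simple predicate over annotated variant records. This is a hedged illustration only: the record fields, the effect labels and the quality threshold are hypothetical and do not restate the settings used in this study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    locus: str
    quality: float   # caller-reported quality score
    effect: str      # e.g. "missense", "frameshift", "synonymous", "intronic"
    in_1000g: bool   # True if the variant is listed in the 1000G data


# Hypothetical settings chosen for illustration only.
MIN_QUALITY = 50.0
KEPT_EFFECTS = {"missense", "nonsense", "frameshift"}


def passes_filters(v: Variant, min_quality: float = MIN_QUALITY) -> bool:
    """Keep only well-supported, non-synonymous coding variants absent from 1000G."""
    return v.quality >= min_quality and v.effect in KEPT_EFFECTS and not v.in_1000g


def filter_variants(variants, min_quality: float = MIN_QUALITY) -> list:
    return [v for v in variants if passes_filters(v, min_quality)]


# Placeholder records, not study data.
raw = [
    Variant("locus_A", 900.0, "missense", False),
    Variant("locus_B", 12.0, "missense", False),     # removed: low quality
    Variant("locus_C", 700.0, "synonymous", False),  # removed: synonymous
    Variant("locus_D", 650.0, "intronic", False),    # removed: non-coding
    Variant("locus_E", 800.0, "missense", True),     # removed: present in 1000G
]
kept = filter_variants(raw)
print([v.locus for v in kept])
```

Applying the same predicate to the output of every pipeline keeps the comparison between pipelines independent of differences in their default reporting.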
Processing all sequence alignment files together for variant calling is
more sensitive than processing them separately
Processing many samples together for variant calling is generally recommended for whole-ge-
nome or whole-exome sequencing data in order to increase the number of reads at specific loci.
However, it is not known whether this is also beneficial for deep amplicon sequencing, since it
might lower the impact of rare variants only present in a subset of tumor cells in few samples.
In contrast, it might increase sensitivity for common mutations present in many samples. We
observed a general increase in sensitivity for variant calling when samples were processed in
parallel (S6A Fig and S6B Fig) compared to separate processing with otherwise identical pipe-
line and filter criteria (S6C Fig and S6D Fig). Separate variant calling identified no additional
mutation compared to combined variant calling, but missed three mutations in frozen samples
and five mutations in FFPE samples. Hence, even in high-depth amplicon sequencing data,
processing samples in parallel appears to be beneficial.
Discussion
We performed amplicon sequencing of hot-spot mutational regions in cancer related genes in
clinical samples from 16 patients with metastatic CRC with Illumina MiSeq. From ten patients,
we compared results of fresh frozen and FFPE tissue and observed a high concordance of vari-
ant calls using GATK Unified Genotyper pipeline with adapted filter criteria and processing
variant calling on all samples in parallel. Thereby, we illustrate the general feasibility of ampli-
con sequencing in FFPE tissue. However, we observed marked differences among tested variant
calling pipelines even in this small dataset, highlighting the importance of benchmarking and
development of more robust variant calling methods. Moreover, preparation of sequencing
libraries with DNA from low quality FFPE samples remains challenging. Here, we prepared se-
quencing libraries also for samples with poor quality by increasing input DNA and demon-
strated successful library amplification by analysis with Agilent Bioanalyzer. However, we
observed that samples with poor DNA quality also had lower sequencing coverage and were
more problematic for variant calling.
Observed mutation frequencies were in line with literature data for APC, TP53, PIK3CA
and KRAS being most frequently altered in CRC [1]. TP53 but not APC had the highest muta-
tion frequency in our cohort, which may be due to the fact that we sequenced metastases and
TP53 mutation is known to be a comparably late step in the carcinogenic process [15]. APC
mutations occurred in lower frequency than expected. It is likely that mutations in regions not
targeted by our approach have been missed.
Mutation calling in NGS data is challenging due to various potential sources of error,
including not only sequencing errors, but also artifacts occurring during PCR amplification,
incorrect local alignment or problems due to tumor heterogeneity [16]. According to the data
presented here, concordance of sequencing results with different variant calling pipelines was
generally low. A pipeline based on Unified Genotyper by GATK and SnpEff was used and
compared with the output of SAMtools/BCFtools and the Illumina Somatic Variant Caller
in both FFPE and frozen samples. SAMtools/BCFtools appeared to be less
sensitive than the Unified Genotyper pipeline; however, it also identified variants not found by the
other pipelines. The Illumina Somatic Variant Caller produced more variant calls than the other
pipelines. However, since many of these calls were present in only the FFPE or only the frozen sample
of the same patient, and since several mutations that are unexpected or unusual in CRC appeared,
especially in poor quality samples, it is very likely that many of the additional variants identified by this
pipeline are false positives. Compared with Sanger sequencing of KRAS exon 2, a high
concordance of Unified Genotyper pipeline results was shown. For one patient, we observed a
discordant mutation status by Sanger and Illumina sequencing. However, Sanger sequencing was
performed with material from the primary tumor and the sample from the liver metastasis had
an estimated tumor content of only 10%. Notably, no mutant reads were observed at the
KRAS exon 2 locus in the deep sequencing data. Hence, tumor heterogeneity or low tumor
content of the sequenced material may have led to the false-negative result. Other groups have
analyzed concordance of different variant calling pipelines in whole-exome and whole-genome
data [7,16–18]. O'Rawe et al. [7] sequenced 15 exomes and found a low concordance of only
57% for SNPs running five different analysis pipelines and 27% for Indels running three
different analysis pipelines with near default parameters. Pabinger et al. [17] provided a broad
overview on software for NGS data analysis and tested 32 different programs for variant
identification, annotation and visualization on four data sets including two cancer datasets. They
grouped tools into such categories as "germline callers" and "somatic callers". The concordance
of five tested "germline callers" was low for SNPs and zero for Indels, while they also found no
common variant with three "somatic callers", analyzing whole-exome datasets. These studies
highlight the problems of accurate variant identification in large-scale whole-exome data. Our
study, however, is to our knowledge the first to raise this issue in rather small-scale, targeted
amplicon sequencing data from clinical, formalin-fixed samples. Some authors have suggested
analyzing datasets with different variant callers and combining their results. However, comparing
the results of different tools that apply different quality metrics can be difficult and time
consuming. Statistical methods, such as applying false-discovery-rate confidence values, have been
developed to rank mutation calls from different tools [19]. Such methods, however, require more
elaborate experimental procedures like sequencing replicates of normal tissue, which reduces
their feasibility for clinical amplicon sequencing, for which in many retrospective
settings not even normal tissue is available as a reference. The same holds true for extensive
validation of data by applying different sequencing technologies, e.g. Sanger sequencing, to rule
out false-positive calls. This may in our view be a feasible approach to confirm novel SNPs
correlated to inherited diseases in whole-exome studies of individual patients, but not for the
study of cancer genomes in retrospective studies on limited archival tumor material. Moreover,
the problem of false negatives cannot be overcome with this approach.
The study of FFPE material makes analysis even more difficult due to highly fragmented
DNA and SNP artifacts, for instance related to cytosine deamination to uracil [20]. Our data
indicate general feasibility of amplicon sequencing using FFPE tissue, demonstrating a high
concordance of variant calls in matched frozen and FFPE samples using the GATK Unified Genotyper
pipeline with adapted filter criteria. Notably, the two non-matching mutations (BRAF
V600E and ATM E1971G) were both identified in the FFPE sample but not in the frozen sample
of the same patient, suggesting false positive calls due to low DNA quality (the FFPE sample of
patient 09 had a comparably high ΔCq value, compare S2 Table). Sanger sequencing of the BRAF
mutational hotspot in exon 15, however, revealed the V600E mutation. Hence, variant calling of
amplicon sequencing led to a false negative result in frozen tissue, most likely due to low
amounts of tumor cells or tumor heterogeneity. Interestingly, six percent mutated reads among
>10,000 reads at the BRAF V600E locus in the frozen sample were not sufficient for a mutation
call by the Unified Genotyper pipeline.
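To put the read counts at the BRAF V600E locus into perspective, the following worked example asks how likely it would be to see that many mutant reads from sequencing error alone. It is a sketch under stated assumptions and does not reproduce the statistical model of Unified Genotyper: the depth is rounded to 10,000 reads, six percent is taken as 600 mutant reads, and the per-base error rate of 1% is a hypothetical value chosen for illustration.

```python
import math


def log10_binomial_tail(n: int, k: int, p: float) -> float:
    """log10 of P(X >= k) for X ~ Binomial(n, p), summed in log space to avoid underflow."""
    log_terms = [
        math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
        + i * math.log(p) + (n - i) * math.log1p(-p)
        for i in range(k, n + 1)
    ]
    peak = max(log_terms)
    log_sum = peak + math.log(sum(math.exp(t - peak) for t in log_terms))
    return log_sum / math.log(10)


depth = 10_000                      # rounded from ">10,000 reads"
mutant_reads = round(0.06 * depth)  # six percent mutated reads
assumed_error_rate = 0.01           # hypothetical per-base error rate

log10_p = log10_binomial_tail(depth, mutant_reads, assumed_error_rate)
print(f"log10 P(>= {mutant_reads} mutant reads by error alone) = {log10_p:.0f}")
```

Under these assumptions the error-only explanation is extremely improbable, which suggests that the missed call reflects the genotyping model or filter settings of the pipeline rather than a lack of supporting reads.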
Substantiating our results, a few authors have reported NGS studies using FFPE material,
also demonstrating feasibility of this approach [6,21–28]. For instance, Wagle et al. [26]
successfully applied exome capture sequencing to target 137 "druggable" mutations in 10 FFPE
samples from colon and breast cancer patients. Spencer et al. [6] provided the only study to date
directly comparing paired frozen and FFPE samples from 16 patients with lung cancer, using
hybridization capture enrichment for sequencing 27 cancer related genes. They also used Unified
Genotyper from GATK for variant calling. They found greater coverage variability and
increased C to T transitions in FFPE samples, while base calls between paired frozen and FFPE
samples had concordances as high as 99%.
An important issue for sequencing of FFPE samples is DNA preparation, qualification and
library preparation [29]. It is recommended to measure DNA with Qubit (and NanoDrop) as-
says to assess purity and quantity of DNA [29]. As in previous reports, quantity of DNA isolat-
ed from our samples measured by NanoDrop differed from Qubit results, with NanoDrop
generally overestimating DNA amounts [29]. Moreover, it has been recommended to quantify
the functional, amplifiable DNA content of DNA isolated from FFPE tissue before applying
to NGS techniques, especially to PCR based amplicon sequencing. Sah et al. [30] reported that
a qPCR based assay (QFI-PCR), similar to the FFPE quality assay (Illumina) that we performed
on our samples, could identify poorest quality samples. Moreover, similar to our approach,
samples with low amounts of amplifiable DNA could be "rescued" by increasing input
amounts. According to our data, the amount of amplifiable DNA (represented by a low ΔCq value)
in FFPE samples correlates with the amount of properly amplified library DNA. We
also could rescue some of the samples that had libraries with low amounts of properly amplified
DNA by increasing DNA input. Since we used maximum DNA input, the ideal increase in
DNA amount for poor samples and also the cut-off to exclude poorest samples remain to be
defined. This is especially relevant because the input of precious DNA should be kept as low as possible.
Interestingly, the samples with lowest amounts of amplifiable DNA also had a higher number of
variant calls and a markedly increased number of false positive calls according to data from
Sah et al. [30], indicating that the amount of amplifiable DNA is also a surrogate for general
DNA quality. In our dataset, similar effects could be observed. The depth of sequencing was
lower for samples with high ΔCq values. Moreover, samples with low amounts of properly amplified
library DNA, as measured by Bioanalyzer, tended to have many (most likely false-positive)
variant calls when analyzed with the Somatic Variant Caller pipeline (compare S2 Table, Fig 4
and S5 Table). However, with the Unified Genotyper variant calling pipeline and strict variant
filtering, this effect could be diminished. Thus, poor sample quality can in part be compensated
with advanced bioinformatics methods. Nevertheless, this pipeline appears to have the problem
of enhanced false negatives in samples with poor quality compared to the Somatic Variant
Caller pipeline (compare Fig 4), indicating that variant calling remains problematic especially
in samples with poor DNA quality. In any case, best possible sample preparation is crucial to
allow optimal results in variant calling. Methods for biochemical modification of DNA from
FFPE tissues have been proposed [20], however, more data is needed to verify these interesting
results before implementing into current practice. Criteria for excluding poor quality samples
from sequencing also have to be refined. A recommended cut-off of ΔCq >2 would have ex-
cluded almost half of our samples, which is not satisfactory for clinical studies with rare patient
material. Further, larger studies would need to be employed to identify potentially suitable cut-
off values. Remarkably, large differences exist between the recommended amounts of input
DNA between Illumina TruSeq amplicon panel (>250ng) and IonTorrent AmpliSeq panel, for
which libraries can be prepared with as little as 10ng DNA.
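The meaning of a ΔCq cut-off can be made concrete with a short calculation. The sketch below assumes that ΔCq is the difference between the quantification cycle of the FFPE sample and that of a control template, and that the qPCR doubles the product in every cycle (100% efficiency); both are assumptions for illustration, and the function names are hypothetical.

```python
def relative_amplifiable_dna(delta_cq: float, efficiency: float = 1.0) -> float:
    """Amplifiable template relative to the control, assuming dCq = Cq(sample) - Cq(control)."""
    return (1.0 + efficiency) ** (-delta_cq)


def input_scale_factor(delta_cq: float, efficiency: float = 1.0) -> float:
    """Factor by which input DNA would need to increase to match the control template amount."""
    return 1.0 / relative_amplifiable_dna(delta_cq, efficiency)


for dcq in (0, 1, 2, 4):
    print(dcq, relative_amplifiable_dna(dcq), input_scale_factor(dcq))
```

Under these assumptions a ΔCq of 2 corresponds to a quarter of the amplifiable template of the control, so a sample at the cut-off would need roughly four times the input DNA to provide the same amount of amplifiable material.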
Our study has several limitations. First, only a small set of patients has been analyzed. Larger
series of FFPE samples have to be sequenced to show feasibility for routine practice and clinical
studies. Moreover, many other variant calling algorithms and pipelines are available, that are
steadily improving. Nevertheless, we believe that the problems of reproducibility of variant calling
can be well demonstrated with this small dataset of matched frozen and FFPE samples, exemplified
by the tested variant calling pipelines. Almost all of our samples were from 2012. A larger
number of older FFPE samples would have to be analyzed to define their usability for clinical
sequencing. Data from previous studies, however, suggest that age does not generally have a big
impact on DNA quality, but rather fixation time in formalin seems to be of major importance [6].
Conclusions
In conclusion, our data show that amplicon sequencing of clinical CRC samples is a viable
approach to characterize druggable, predictive or prognostic mutations in the cancer genome. The
high concordance between mutations identified in frozen tissue and paired FFPE samples
furthermore suggests that archived tissues from pathology departments can also be used for
genomic profiling with this method. However, bioinformatic pipelines for data analysis still show
marked differences in results. Moreover, dedicated sample and library preparation and
qualification, including exclusion of poorest quality samples, are required. For the use of amplicon
sequencing in routine diagnostics or in clinical studies, gold standard methods have to be
defined, which should lead to higher reproducibility.
Supporting Information
S1 Fig. Sequencing libraries produced with low-quality DNA. Bioanalyzer was used to mea-
sure amounts of DNA by fragment length of DNA from FFPE samples during the library prep-
aration workflow. Three representative samples are shown: Patient 04 had high levels of DNA
with the desired DNA fragment size of ~310bp and low amounts of short DNA fragments
<100bp after initial library preparation with standard input amounts. Patients 13 and
15 are examples for low quality DNA with low amounts of DNA around 310bp and high
amounts of highly fragmented DNA. Library preparation was repeated for those samples using
maximum DNA input (compare S2 Table), which led to significantly higher concentrations of
correct sized DNA for sample 13, but not for sample 15. Highly increased background, short-
fragment DNA was shown to be reduced after the PCR clean-up step. Patient 13 was then se-
quenced, while patient 15 was excluded.
(TIF)
S2 Fig. Functional DNA amount correlates with GC-content. (A) ΔCq-values of quality con-
trol PCR of FFPE samples are plotted against GC content of sample DNA. (B) GC content of
sample DNA and mean depth of sequencing of FFPE samples analyzed.
(TIF)
S3 Fig. No KRAS mutation identified by amplicon sequencing in patient 02. Representative
image of reads mapped to the site of KRAS exon 2 with no mutated reads detected at the mutational
hot-spot at position c.38 (codon 13) in FFPE tissue from the liver metastasis, displayed with the
Integrative Genomics Viewer. KRAS mutation had been detected in the primary tumor by Sanger
sequencing. The expected mutational locus is indicated by black lines.
(TIF)
S4 Fig. Filtering removes false positive variant calls in cell lines analyzed by deep amplicon
sequencing. (A,B) Variant calling of deep amplicon sequencing data from cell lines HCT116,
DLD-1, SW480, HUH7, HT55, HEK293T and HS68 was performed with GATK Unified Geno-
typer (UG), SamTools/BcfTools (SAM) or Illumina Somatic Variant Caller (SVC) without any
filtering of variants (A) or with exclusion of variants below defined quality thresholds, synony-
mous and non-coding variants, as well as variants present in the 1000G data (B). Concordance
of genomic variant loci identified with the three pipelines was analyzed with jvenn. (C,D) Overlap
of variant loci identified in HCT116, HT55, HUH7 and SW480 with the GATK Unified
Genotyper pipeline with variant loci detected by the Cell Line Encyclopedia Project [13] is
shown without (C) or with (D) filtering out variants below quality thresholds, synonymous
and non-coding variants, as well as variants present in the 1000G data. (E,F) Overlap of variant
loci identified in HCT116, HT55 and HUH7 with the GATK Unified Genotyper pipeline with
variant loci detected by the COSMIC cell line project [14] is shown without (E) or with (F) fil-
tering out low quality variants, synonymous and non-coding variants, as well as non-harmful
variants present in the 1000G data.
(TIF)
S5 Fig. Concordance of variant loci in frozen and FFPE samples analyzed with three different
variant calling pipelines with and without filtering. (A) Variant calling of sequencing
data from matched frozen and FFPE samples was performed with GATK Unified Genotyper
(UG), SamTools/BcfTools (SAM) or Illumina Somatic Variant Caller (SVC) without any filtering
of variants. Overlap of genomic variant loci identified in each group is shown. Below, the
number of variant loci identified in each group is outlined. (B) Variants from (A) were annotated
and variants with low quality metrics, synonymous and non-coding variants, as well as
variants present in the 1000G data were filtered out. Again, overlap of genomic variant loci
identified in each group is shown. Below, the number of variant loci identified in each group
is outlined. Fields with "0" overlap are left empty.
(TIF)
S6 Fig. Variant calling is more sensitive when samples are processed together compared
with analyzing each sample individually. (A, B) GATK Unified Genotyper pipeline with variant
calling in all analyzed samples together or (C, D) separately. Green color represents FFPE
samples, red represents frozen, color intensities represent number of non-synonymous coding
mutations per gene.
(TIF)
S1 Table. Patients.
(PDF)
S2 Table. Sample and library preparation.
(PDF)
S3 Table. List of amplicons and targeted regions.
(XLSX)
S4 Table. Patient sequencing statistics.
(XLSX)
S5 Table. Mutations identified in primary tumors with different variant calling pipelines.
(PDF)
S6 Table. Mutations identified in cell lines with different variant calling pipelines.
(PDF)
S7 Table. All variants identified in analyzed samples
(XLSX)
S8 Table. All unfiltered variants identified in analyzed samples
(XLSX)
Author Contributions
Conceived and designed the experiments: JB GK SP KH MB. Performed the experiments: JB
TM SL GE TG. Analyzed the data: JB GK TZ MPE MB. Contributed reagents/materials/analy-
sis tools: CLG SP KH. Wrote the paper: JB GK TG MPE KH MB.
References
1. Cancer Genome Atlas Network. Comprehensive molecular characterization of human colon and rectal cancer. Nature. 2012; 487: 330–337. doi: 10.1038/nature11252 PMID: 22810696
2. Wood LD, Parsons DW, Jones S, Lin J, Sjoblom T, Leary RJ, et al. The Genomic Landscapes of Human Breast and Colorectal Cancers. Science. 2007; 318: 1108–1113. PMID: 17932254
3. Fearon ER, Vogelstein B. A genetic model for colorectal tumorigenesis. Cell. 1990; 61: 759–767. PMID: 2188735
4. Han S-W, Kim H-P, Shin J-Y, Jeong E-G, Lee W-C, Lee K-H, et al. Targeted Sequencing of Cancer-Related Genes in Colorectal Cancer Using Next-Generation Sequencing. PLoS ONE. 2013; 8: e64271. doi: 10.1371/journal.pone.0064271 PMID: 23700467
5. Tougeron D, Lecomte T, Pages JC, Villalva C, Collin C, Ferru A, et al. Effect of low-frequency KRAS mutations on the response to anti-EGFR therapy in metastatic colorectal cancer. Ann Oncol. 2013; 24: 1267–1273. doi: 10.1093/annonc/mds620 PMID: 23293113
6. Spencer DH, Sehn JK, Abel HJ, Watson MA, Pfeifer JD, Duncavage EJ. Comparison of clinical targeted next-generation sequence data from formalin-fixed and fresh-frozen tissue specimens. J Mol Diagn. 2013; 15: 623–633. doi: 10.1016/j.jmoldx.2013.05.004 PMID: 23810758
7. O'Rawe J, Jiang T, Sun G, Wu Y, Wang W, Hu J, et al. Low concordance of multiple variant-calling pipelines: practical implications for exome and genome sequencing. Genome Med. 2013; 5: 28. doi: 10.1186/gm432 PMID: 23537139
8. McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 2010; 20: 1297–1303. doi: 10.1101/gr.107524.110 PMID: 20644199
9. Cingolani P, Platts A, Wang Le, Coon M, Nguyen T, Wang L, et al. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w (1118); iso-2; iso-3. Fly. 2012; 6: 80–92. doi: 10.4161/fly.19695 PMID: 22728672
10. Bardou P, Mariette J, Escudié F, Djemiel C, Klopp C. jvenn: an interactive Venn diagram viewer. BMC Bioinformatics. 2014; 15: 293. doi: 10.1186/1471-2105-15-293 PMID: 25176396
11. Thorvaldsdottir H, Robinson JT, Mesirov JP. Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration. Brief Bioinform. 2013; 14: 178–192. doi: 10.1093/bib/bbs017 PMID: 22517427
12. Ahls MG, Niedergethmann M, Dinter D, Sauer C, Lüttges J, Post S, et al. Case report: Intraductal tubulopapillary neoplasm of the pancreas with unique clear cell phenotype. Diagn Pathol. 2014; 9: 11. doi: 10.1186/1746-1596-9-11 PMID: 24443801
13. Barretina J, Caponigro G, Stransky N, Venkatesan K, Margolin AA, Kim S, et al. The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature. 2012; 483: 603–607. doi: 10.1038/nature11003 PMID: 22460905
14. Forbes SA, Beare D, Gunasekaran P, Leung K, Bindal N, Boutselakis H, et al. COSMIC: exploring the world's knowledge of somatic mutations in human cancer. Nucleic Acids Research. 2015; 43: D805–811. doi: 10.1093/nar/gku1075 PMID: 25355519
15. Baker SJ, Preisinger AC, Jessup JM, Paraskeva C, Markowitz S, Willson JK, et al. p53 gene mutations occur in combination with 17p allelic deletions as late events in colorectal tumorigenesis. Cancer Res. 1990; 50: 7717–7722. PMID: 2253215
16. Kim SY, Speed TP. Comparing somatic mutation-callers: beyond Venn diagrams. BMC Bioinformatics. 2013; 14: 189. doi: 10.1186/1471-2105-14-189 PMID: 23758877
17. Pabinger S, Dander A, Fischer M, Snajder R, Sperk M, Efremova M, et al. A survey of tools for variant analysis of next-generation genome sequencing data. Brief Bioinform. 2014; 15: 256–278. doi: 10.1093/bib/bbs086 PMID: 23341494
18. Roberts ND, Kortschak RD, Parker WT, Schreiber AW, Branford S, Scott HS, et al. A comparative analysis of algorithms for somatic SNV detection in cancer. Bioinformatics. 2013; 29: 2223–2230. doi: 10.1093/bioinformatics/btt375 PMID: 23842810
19. Löwer M, Renard BY, de Graaf J, Wagner M, Paret C, Kneip C, et al. Confidence-based Somatic Mutation Evaluation and Prioritization. PLoS Comput Biol. 2012; 8: e1002714. doi: 10.1371/journal.pcbi.1002714 PMID: 23028300
20. Do H, Dobrovic A. Dramatic reduction of sequence artefacts from DNA isolated from formalin-fixed cancer biopsies by treatment with uracil-DNA glycosylase. Oncotarget. 2012; 3: 546–558. PMID: 22643842
21. Pritchard CC, Salipante SJ, Koehler K, Smith C, Scroggins S, Wood B, et al. Validation and Implementation of Targeted Capture and Sequencing for the Detection of Actionable Mutation, Copy Number Variation, and Gene Rearrangement in Clinical Cancer Specimens. J Mol Diagn. 2013. doi: 10.1016/j.jmoldx.2013.08.004
22. Frampton GM, Fichtenholtz A, Otto GA, Wang K, Downing SR, He J, et al. Development and validation of a clinical cancer genomic profiling test based on massively parallel DNA sequencing. Nat Biotechnol. 2013; 31: 1023–1031. doi: 10.1038/nbt.2696 PMID: 24142049
23. Endris V, Penzel R, Warth A, Muckenhuber A, Schirmacher P, Stenzinger A, et al. Molecular diagnostic profiling of lung cancer specimens with a semiconductor-based massive parallel sequencing approach: feasibility, costs, and performance compared with conventional sequencing. J Mol Diagn. 2013; 15: 765–775. doi: 10.1016/j.jmoldx.2013.06.002 PMID: 23973117
24. Becker K, Vollbrecht C, Koitzsch U, Koenig K, Fassunke J, Huss S, et al. Deep ion sequencing of amplicon adapter ligated libraries: a novel tool in molecular diagnostics of formalin fixed and paraffin embedded tissues. J Clin Pathol. 2013; 66: 803–806. doi: 10.1136/jclinpath-2013-201549 PMID: 23618693
25. Hadd AG, Houghton J, Choudhary A, Sah S, Chen L, Marko AC, et al. Targeted, high-depth, next-generation sequencing of cancer genes in formalin-fixed, paraffin-embedded and fine-needle aspiration tumor specimens. J Mol Diagn. 2013; 15: 234–247. doi: 10.1016/j.jmoldx.2012.11.006 PMID: 23321017
26. Wagle N, Berger MF, Davis MJ, Blumenstiel B, Defelice M, Pochanard P, et al. High-throughput detection of actionable genomic alterations in clinical tumor samples by targeted, massively parallel sequencing. Cancer Discov. 2012; 2: 82–93. doi: 10.1158/2159-8290.CD-11-0184 PMID: 22585170
27. Kerick M, Isau M, Timmermann B, Sültmann H, Herwig R, Krobitsch S, et al. Targeted high throughput sequencing in clinical cancer settings: formaldehyde fixed-paraffin embedded (FFPE) tumor tissues, input amount and tumor heterogeneity. BMC Med Genomics. 2011; 4: 68. doi: 10.1186/1755-8794-4-68 PMID: 21958464
28. Schweiger MR, Kerick M, Timmermann B, Albrecht MW, Borodina T, Parkhomchuk D, et al. Genome-Wide Massively Parallel Sequencing of Formaldehyde Fixed-Paraffin Embedded (FFPE) Tumor Tissues for Copy-Number- and Mutation-Analysis. PLoS ONE. 2009; 4: e5548. doi: 10.1371/journal.pone.0005548 PMID: 19440246
29. Simbolo M, Gottardi M, Corbo V, Fassan M, Mafficini A, Malpeli G, et al. DNA Qualification Workflow for Next Generation Sequencing of Histopathological Samples. PLoS ONE. 2013; 8: e62692. doi: 10.1371/journal.pone.0062692 PMID: 23762227
30. Sah S, Chen L, Houghton J, Kemppainen J, Marko AC, Zeigler R, et al. Functional DNA quantification guides accurate next-generation sequencing mutation detection in formalin-fixed, paraffin-embedded tumor biopsies. Genome Med. 2013; 5: 77. doi: 10.1186/gm481 PMID: 24001039
Amplicon Sequencing of Frozen and Formalin Fixed Samples
PLOS ONE | DOI:10.1371/journal.pone.0127146 May 26, 2015 18 / 18
... We are aware of 17 published studies in which FFPE and FF biospecimens from the same patient have been sequenced in the context of NGS (summarized in Table 1). 3,14,15,21,[23][24][25][26][27][28][29][30][31][32][33][34][35] A common finding is that DNA sequenced from FFPE biospecimens has a lower percentage of mapped reads (i.e., reads that were aligned to the reference genome) than that from FF. Seven of the studies report that DNA from FFPE has lower coverage than that from FF samples, but still above the usually applied quality thresholds for NGS. 3,14,24,27,30,34,35 Another seven studies found or show no statistically significant difference in coverage between FFPE and FF. ...
... 3,14,15,21,[23][24][25][26][27][28][29][30][31][32][33][34][35] A common finding is that DNA sequenced from FFPE biospecimens has a lower percentage of mapped reads (i.e., reads that were aligned to the reference genome) than that from FF. Seven of the studies report that DNA from FFPE has lower coverage than that from FF samples, but still above the usually applied quality thresholds for NGS. 3,14,24,27,30,34,35 Another seven studies found or show no statistically significant difference in coverage between FFPE and FF. 15,21,23,25,29,31,32 One paper reports greater coverage in FFPE than FF, and two papers do not report any. ...
... 14 An enrichment in C:G > T:A mutations in the FFPE DNA compared with the FF DNA was reported in four of the reports (average coverage per study, 77-130×), 3,14,30,32 and there were no statistically significant enrichments in five studies. 15,21,24,25,35 One study only found formaldehyde-induced artifactual mutations when the FFPE DNA was particularly degraded, 23 one study only found it in CpG sites, 29 and six studies did not report any. [26][27][28]31,33,34 Formaldehyde-induced artifacts are random in their nature, and so they become less likely to materialize when sequencing coverage increases. ...
Article
Full-text available
Fresh-frozen tissue is the “gold standard” biospecimen type for next-generation sequencing (NGS). However, collecting frozen tissue is usually not feasible because clinical workflows deliver formalin-fixed, paraffin-embedded (FFPE) tissue blocks. Some clinicians and researchers are reticent to embrace the use of FFPE tissue for NGS because FFPE tissue can yield low quantities of degraded DNA, containing formalin-induced mutations. We describe the process by which formalin-induced deamination can lead to artifactual cytosine (C) to thymine (T) and guanine (G) to adenine (A) (C:G > T:A) mutation calls and perform a literature review of 17 publications that compare NGS data from patient-matched fresh-frozen and FFPE tissue blocks. We conclude that although it is indeed true that sequencing data from FFPE tissue can be poorer than those from frozen tissue, any differences occur at an inconsequential magnitude, and FFPE biospecimens can be used in genomic medicine with confidence:
... Variants were selected for known single nucleotide polymorphisms (SNPs) and synonymous mutations. All non-coding, silent, synonymous, unknown and common germline variants were filtered out, as well as all variants present in 1,000 G data (22). Moreover, all variants at a locus with coverage of <200, or variants with a variant frequency <0.05 were excluded. ...
... Betge et al. studied gene mutations in paired FFPE and fresh frozen tissue samples from hepatic metastases from 10 patients with CRC. The results revealed a high concordance between samples, with 21 identical variants and 2 different variants (22). All of the above-mentioned studies used NGS to detect mutations, but the testing methods were different. ...
... However, because the accuracy of mutation detection in FFPE tissues is influenced by multiple factors, it is important to standardize the procedure in order to minimize variability. Standardized protocols should be elaborated for sample preparation, storage room requirements, library preparation, evaluating the quality of extracted DNA, and the exclusion of poor-quality samples (22). This study had some limitations. ...
Article
Full-text available
Background: Next generation sequencing (NGS)-based multi-gene panel tests have been performed to predict the treatment response and prognosis in patients with colorectal cancer (CRC). Whether the multi-gene mutation results of formalin-fixed paraffin-embedded (FFPE) tissues are identical to those of fresh frozen tissues remains unknown. Methods: A 22-gene panel with 103 hotspots was used to detect mutations in paired fresh frozen tissue and FFPE tissue from 118 patients with CRC. Results: In our study, 117 patients (99.2%) had one or more variants, with 226 variants in FFPE tissue and 221 in fresh frozen tissue. Of the 129 variants identified in this study, 96 variants were present in both FFPE and fresh frozen tissues; 27 variants were found in FFPE tissues only; 6 variants were found only in fresh frozen tissues. The mutation results demonstrated >94.0% concordance in all variants, with Kappa coefficient >0.500 in 64.3% (83/129) of variants. At the gene level, concordance ranged from 73.8 to 100.0%, with Kappa coefficient >0.500 in 81.3% (13/16) of genes. Conclusions: The results of mutation analysis performed with a multi-gene panel and FFPE and fresh frozen tissue were highly concordant in patients with CRC, at both the variant and gene levels. There were, however, some important differences in mutation results between the two tissue types. Therefore, fresh frozen tissue should not routinely be replaced with FFPE tissue for mutation analysis with a multi-gene panel. Rather, FFPE tissue is a reasonable alternative for fresh frozen tissue when the latter is unavailable.
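The Kappa coefficient reported above measures agreement between FFPE and fresh frozen calls beyond what chance alone would produce. The following is a minimal sketch of Cohen's kappa for a single variant, assuming one binary call (detected or not) per patient in each tissue type. The function name and the toy call lists are hypothetical illustrations, not data or code from the study.

```python
import math


def cohens_kappa(ffpe_calls, frozen_calls):
    """Cohen's kappa for two equal-length lists of binary calls (1 = variant detected)."""
    if len(ffpe_calls) != len(frozen_calls) or not ffpe_calls:
        raise ValueError("call lists must be non-empty and of equal length")
    n = len(ffpe_calls)
    # Observed agreement: fraction of patients with the same call in both tissues.
    observed = sum(a == b for a, b in zip(ffpe_calls, frozen_calls)) / n
    # Chance agreement, derived from the marginal detection rate of each tissue type.
    p_ffpe = sum(ffpe_calls) / n
    p_frozen = sum(frozen_calls) / n
    expected = p_ffpe * p_frozen + (1 - p_ffpe) * (1 - p_frozen)
    if expected == 1.0:
        # Both tissues give the same constant call; kappa is undefined here.
        return math.nan
    return (observed - expected) / (1 - expected)


# Hypothetical toy data: 10 patients, one variant, calls differ in one patient.
ffpe = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
frozen = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
kappa = cohens_kappa(ffpe, frozen)
print(f"raw agreement = 0.90, kappa = {kappa:.3f}")
```

In this toy case the raw agreement is 90% while kappa is noticeably lower, which illustrates how a study can report high concordance and still find modest kappa values for rarely detected variants.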
... Our study showed concordance with Betge et al. (2015), who compared sequencing in fresh and frozen tissue samples and reported that convincing results were obtained with FFPE tissues (Betge et al., 2015). Past studies have reported that NGS (Gall et al., 1993), genome-wide massively parallel sequencing (Schweiger et al., 2009), whole-genome sequencing (Van Allen et al., 2014) and whole-exome sequencing (Astolfi et al., 2015; Bonfiglio et al., 2016) have been performed on FFPE tissues. ...
... Various prediction tools were used to evaluate the possible effects of the identified pathogenic/likely pathogenic variants on the functional properties of the protein encoded by the gene. Of the pathogenic frameshift mutations identified in sequences encoding the Ig-like domains involved in ligand-receptor interaction [R55Gfs*49 (3/30), W66Mfs*4 (10/30), L71Hfs*3 (4/30), Q79Rfs*25 ( (26,27). Among the newly identified pathogenic/likely pathogenic variants, the p.C1255Afs*3 frameshift identified in exon 25 (22/30) lies in the region that forms the ATP-binding cavity (R1253, N1254, C1255, L1256), and this region constitutes one of the targets used for targeted drug interaction in ALK mutations (28); together with the prediction that this and the other newly identified mutations would completely abolish the ATP-binding sequence, this reveals the potential of mutations in this gene to cause abnormal kinase activity and drug resistance in the cases. ...
Article
Aim: Paranasal sinus cancers are a very rare, heterogeneous group of diseases. Maxillary sinus squamous cell carcinoma is the most common anatomical and histological subtype of paranasal sinus cancers. The limited knowledge of the genetic profile of this cancer prevents patients from benefiting from targeted therapy options. In our study, we aimed to identify receptor tyrosine kinase mutations in this rare cancer and to predict the possible functional effects of these mutations. Materials and Methods: For this purpose, DNA was isolated from FFPE tumor tissues of 30 cases, and the mutation profile of the cases was determined by next-generation sequencing and bioinformatic evaluation. The functional effects of the identified pathogenic/likely pathogenic variants were predicted with different in silico tools. Results: At least one pathogenic/likely pathogenic KIT, PDGFRA and RET mutation was identified in all cases. Mutations in the catalytic region of the KIT gene were predicted to increase kinase activity. The p.P567P and p.D1074D mutations in the PDGFRA gene were identified in all 30 cases and in all reads from normal tissues obtained from the SRA database. Conclusion: The finding that receptor tyrosine kinase mutations may also play an important role in paranasal sinus cancers is of considerable importance, particularly because it has the potential to make therapeutic approaches targeting increased kinase activity accessible to these patients.
... To overcome these drawbacks, the use of formalin-fixed paraffin-embedded specimens (FFPE) has been explored [11,12]. Compared with the frozen material, FFPE tissues are more suitable for relatively simple long-term storage at room temperature and are widely available from biobanks in pathology departments [13]. Although this biotype harbors a great potential for expanding metagenomics studies (i.e., allowing access to clinical samples from a wide range of locations and times), FFPE specimens carry several limitations for genomic analysis [14] mostly derived from the formalin fixation process and storage that negatively impact the DNA integrity (e.g., cross-linking, fragmentation, and mutations) [15]. ...
Article
Full-text available
Formalin-fixed, paraffin-embedded (FFPE) tissues represent the most widely available clinical material to study colorectal cancer (CRC). However, the accuracy and clinical validity of FFPE microbiome profiling in CRC is uncertain. Here, we compared the microbial composition of 10 paired fresh-frozen (FF) and FFPE CRC tissues using 16S rRNA sequencing and RNA-ISH. Both sample types showed different microbial diversity and composition. FF samples were enriched in archaea and representative CRC-associated bacteria, such as Firmicutes, Bacteroidetes and Fusobacteria. Conversely, FFPE samples were mainly enriched in typical contaminants, such as Sphingomonadales and Rhodobacterales. RNA-ISH in FFPE tissues confirmed the presence of CRC-associated bacteria, such as Fusobacterium and Bacteroides, as well as Propionibacterium allowing discrimination between tumor-associated and contaminant taxa. An internal quality index showed that the degree of similarity within sample pairs inversely correlated with the dominance of contaminant taxa. Given the importance of FFPE specimens for larger studies in human cancer genomics, our findings may provide useful indications on potential confounding factors to consider for accurate and reproducible metagenomics analyses.
... Mutational comparisons have also been undertaken in colorectal cancer (CRC) specimens; the detected concordance rate was up to 81.9% in a study of 33 matched metastatic CRC samples [32]. In a cohort of 10 paired metastatic liver CRC specimens, a high mutational concordance was observed when 212 amplicon regions in 48 cancer-related genes were sequenced, revealing 21 identical mutation calls and only two differing mutations [33]. Furthermore, Gao et al. conducted an extensive study using a 22-gene panel detecting 103 hotspot mutations in paired FFPE and fresh-frozen primary CRC tissues from 118 patients [34]. ...
Article
Full-text available
1. Background: The application of massively parallel sequencing has led to the identification of aberrant druggable pathways and somatic mutations within therapeutically relevant genes in gastro-oesophageal cancer. Given the widespread use of formalin-fixed paraffin-embedded (FFPE) samples in the study of this disease, it would be beneficial, especially for the purposes of biomarker evaluation, to assess the concordance between comprehensive exome-wide sequencing data from archival FFPE samples originating from a prospective clinical study and those derived from fresh-frozen material. 2. Methods: We analysed whole-exome sequencing data to define the mutational concordance of 16 matched fresh-frozen and FFPE gastro-oesophageal tumours (N = 32) from a prospective clinical study. We assessed DNA integrity prior to sequencing and then identified coding mutations in genes that have previously been implicated in other cancers. In addition, we calculated the mutant-allele heterogeneity (MATH) for these samples. 3. Results: Although there was increased degradation of DNA in FFPE samples compared with frozen samples, sequencing data from only two FFPE samples failed to reach an adequate mapping quality threshold. Using a filtering threshold of mutant read counts of at least ten and a minimum of 5% variant allele frequency (VAF) we found that there was a high median mutational concordance of 97% (range 80.1-98.68%) between fresh-frozen and FFPE gastro-oesophageal tumour-derived exomes. However, the majority of FFPE tumours had higher mutant-allele heterogeneity (MATH) scores when compared with corresponding frozen tumours (p < 0.001), suggesting that FFPE-based exome sequencing is likely to over-represent tumour heterogeneity in FFPE samples compared to fresh-frozen samples. Furthermore, we identified coding mutations in 120 cancer-related genes, including those associated with chromatin remodelling and Wnt/β-catenin and Receptor Tyrosine Kinase signalling. 4. Conclusions: These data suggest that comprehensive genomic data can be generated from exome sequencing of selected DNA samples extracted from archival FFPE gastro-oesophageal tumour tissues within the context of prospective clinical trials.
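The filtering rule described in the abstract above (at least ten mutant reads and a VAF of at least 5%) can be sketched as follows. This is only an illustration under stated assumptions: the `Call` record, the variant keys, and the read counts are hypothetical, and the concordance definition used here (shared calls divided by all calls passing the filter in either tissue) is an assumption, since the abstract does not spell out how concordance was computed.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    key: str        # hypothetical variant identifier
    alt_reads: int  # reads supporting the mutant allele
    depth: int      # total reads covering the position


def passes(call, min_alt_reads=10, min_vaf=0.05):
    """Keep a call only if it has enough mutant reads and a high enough VAF."""
    if call.depth <= 0:
        return False
    vaf = call.alt_reads / call.depth
    return call.alt_reads >= min_alt_reads and vaf >= min_vaf


def concordance(frozen_calls, ffpe_calls):
    """Share of filtered variants found in both tissues (assumed definition)."""
    frozen = {c.key for c in frozen_calls if passes(c)}
    ffpe = {c.key for c in ffpe_calls if passes(c)}
    union = frozen | ffpe
    return len(frozen & ffpe) / len(union) if union else math.nan


# Hypothetical toy calls for one matched tumour pair.
frozen_calls = [Call("var_A", 50, 200), Call("var_B", 12, 100), Call("var_C", 8, 400)]
ffpe_calls = [Call("var_A", 40, 300), Call("var_B", 15, 500), Call("var_D", 20, 100)]
result = concordance(frozen_calls, ffpe_calls)
print(f"concordance = {result:.2f}")
```

Here `var_C` is dropped for having fewer than ten mutant reads and the FFPE copy of `var_B` is dropped for a VAF of 3%, so the thresholds themselves change which calls count as discordant.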
... Particularly in the context of amplicon workflows, the standardization of variant calling pipelines remains elusive (Betge et al., 2015). ...
Preprint
Motivation: Next-Generation Sequencing (NGS) technology is transitioning quickly from research labs to clinical settings. The diagnosis and treatment selection for many acquired and autosomal conditions necessitate a method for accurately detecting somatic and germline variants, suitable for the clinic. Results: We have developed Pisces, a rapid, versatile and accurate small variant calling suite designed for somatic and germline amplicon sequencing applications. Pisces accuracy is achieved by four distinct modules, the Pisces Read Stitcher, Pisces Variant Caller, the Pisces Variant Quality Recalibrator, and the Pisces Variant Phaser. Each module incorporates a number of novel algorithmic strategies aimed at reducing noise or increasing the likelihood of detecting a true variant. Availability: Pisces is distributed under an open source license and can be downloaded from https://github.com/Illumina/Pisces . Pisces is available on the BaseSpace™ SequenceHub as part of the TruSeq Amplicon workflow and the Illumina Ampliseq Workflow. Pisces is distributed on Illumina sequencing platforms such as the MiSeq™, and is included in the Praxis™ Extended RAS Panel test which was recently approved by the FDA for the detection of multiple RAS gene mutations. Contact: pisces@illumina.com Supplementary information: Supplementary data are available online.
... Mutational comparisons have also been undertaken in colorectal cancer (CRC) specimens; the detected concordance rate was up to 81.9% in a study of 33 matched metastatic CRC samples [32]. In a cohort of 10 paired metastatic liver CRC specimens, a high mutational concordance was observed when 212 amplicon regions in 48 cancer-related genes were sequenced, revealing 21 identical mutation calls and only two differing mutations [33]. Furthermore, Gao et al. conducted an extensive study using a 22-gene panel detecting 103 hotspot mutations in paired FFPE and fresh-frozen primary CRC tissues from 118 patients [34]. ...
Research Proposal
Full-text available
Dear Colleagues, Gastric cancer represents one of the most frequent and lethal tumors worldwide today, finding itself in the fifth place in incidence and the third in mortality. Surgery remains the only curative treatment for localized tumors, but only 20% of patients are suitable for surgery due to the lack of specific symptoms and the late diagnosis, especially in Western countries. Additionally, even in patients who receive curative treatment, rates of locoregional relapse and distant metastasis remain high. Palliative chemotherapy is the principal treatment in cases of metastatic disease even if the prognosis of patients receiving chemotherapy is still poor. Therefore, a multidisciplinary evaluation is important in order to improve the efficacy of active treatments. In this context, there is an unmet need for a better understanding of genetic alterations and prognostic and predictive factors in order to choose the best tailored therapy for each patient. The aim of this Special Issue is to focus on the results and problems of multimodality treatment in metastatic gastric cancer, the search for prognostic and predictive factors, and the evaluation of novel strategies for individualized treatment. We are inviting relevant original research, systematic reviews, meta-analyses, and short communications covering the above-mentioned topics. https://www.mdpi.com/journal/jcm/special_issues/Gastric_Treatments
Article
The discovery of increasing numbers of actionable molecular and gene targets for cancer treatment has driven the demand for tissue sampling for next-generation sequencing (NGS). Requirements for sequencing can be very specific, and inadequate sampling leads to delays in management and decision making. It is important that interventional radiologists are aware of NGS technologies and their common applications and be cognizant of the factors that contribute to successful sample sequencing. This review summarizes the fundamentals of cancer tissue collection and processing for NGS. It elaborates on sequencing technologies and their applications with the aim of providing readers with a working understanding that can enhance their clinical practice. It then describes imaging, tumor, biopsy, and sample collection factors that improve the chances of NGS success. Finally, it discusses future practice, highlighting the problem of undersampling in both clinical and research settings and the opportunities within interventional radiology to address this.
Article
Identification of somatic variants in cancer by high-throughput sequencing has become common clinical practice largely because many of these variants may be predictive biomarkers for targeted therapies. However, there can be high sample quality control (QC) failure rates for some tests preventing the return of results. SLIMamp is a patented technology that has been incorporated into commercially available cancer NGS testing kits with the claimed advantage that these kits can interrogate challenging formalin-fixed paraffin-embedded tissue (FFPET) samples with low tumor purity, poor DNA quality, and/or low input DNA, resulting in a high sample QC pass rate. The aim of this study was to substantiate that claim using Pillar's oncoReveal Solid Tumor Panel. 48 samples that had failed one or more pre-analytical QC sample parameters for whole exome sequencing (WES) from ATGC's ISO15189 accredited diagnostic genomics laboratory were acquired. XING Genomic Services (XGS) performed an exploratory data analysis to characterize the samples and then tested the samples in their ISO15189 accredited laboratory. Clinical reports could be generated for 37 samples (77%), of which 29 (60%) contained clinically actionable or significant variants that would not have otherwise been identified. 11 samples were deemed to be unreportable and the sequencing data were likely dominated by artefacts. A novel post-sequencing QC metric was developed which can discriminate between clinically reportable and unreportable samples.
Article
Full-text available
To characterize somatic alterations in colorectal carcinoma, we conducted a genome-scale analysis of 276 samples, analysing exome sequence, DNA copy number, promoter methylation and messenger RNA and microRNA expression. A subset of these samples (97) underwent low-depth-of-coverage whole-genome sequencing. In total, 16% of colorectal carcinomas were found to be hypermutated: three-quarters of these had the expected high microsatellite instability, usually with hypermethylation and MLH1 silencing, and one-quarter had somatic mismatch-repair gene and polymerase ε (POLE) mutations. Excluding the hypermutated cancers, colon and rectum cancers were found to have considerably similar patterns of genomic alteration. Twenty-four genes were significantly mutated, and in addition to the expected APC, TP53, SMAD4, PIK3CA and KRAS mutations, we found frequent mutations in ARID1A, SOX9 and FAM123B. Recurrent copy-number alterations include potentially drug-targetable amplifications of ERBB2 and newly discovered amplification of IGF2. Recurrent chromosomal translocations include the fusion of NAV2 and WNT pathway member TCF7L1. Integrative analyses suggest new markers for aggressive colorectal carcinoma and an important role for MYC-directed transcriptional activation and repression. Supplementary information The online version of this article (doi:10.1038/nature11252) contains supplementary material, which is available to authorized users.
Article
Full-text available
COSMIC, the Catalogue Of Somatic Mutations In Cancer (http://cancer.sanger.ac.uk) is the world's largest and most comprehensive resource for exploring the impact of somatic mutations in human cancer. Our latest release (v70; Aug 2014) describes 2 002 811 coding point mutations in over one million tumor samples and across most human genes. To emphasize depth of knowledge on known cancer genes, mutation information is curated manually from the scientific literature, allowing very precise definitions of disease types and patient details. Combination of almost 20 000 published studies gives substantial resolution of how mutations and phenotypes relate in human cancer, providing insights into the stratification of mutations and biomarkers across cancer patient populations. Conversely, our curation of cancer genomes (over 12 000) emphasizes knowledge breadth, driving discovery of unrecognized cancer-driving hotspots and molecular targets. Our high-resolution curation approach is globally unique, giving substantial insight into molecular biomarkers in human oncology. In addition, COSMIC also details more than six million noncoding mutations, 10 534 gene fusions, 61 299 genome rearrangements, 695 504 abnormal copy number segments and 60 119 787 abnormal expression variants. All these types of somatic mutation are annotated to both the human genome and each affected coding gene, then correlated across disease and mutation types.
Article
Full-text available
Background: Venn diagrams are commonly used to display list comparison. In biology, they are widely used to show the differences between gene lists originating from different differential analyses, for instance. They thus allow the comparison between different experimental conditions or between different methods. However, when the number of input lists exceeds four, the diagram becomes difficult to read. Alternative layouts and dynamic display features can improve its use and its readability. Results: jvenn is a new JavaScript library. It processes lists and produces Venn diagrams. It handles up to six input lists and presents results using classical or Edwards-Venn layouts. User interactions can be controlled and customized. Finally, jvenn can easily be embedded in a web page, making dynamic Venn diagrams possible. Conclusions: jvenn is an open source component for web environments helping scientists to analyze their data. The library package, which comes with full documentation and an example, is freely available at http://bioinfo.genotoul.fr/jvenn.
Article
Full-text available
Intraductal tubulopapillary neoplasms of the pancreas are very rare tumors characterized by intraductal tubulopapillary growth, ductal differentiation, scant intracellular mucin production and cellular dysplasia. Here, we report the first case of an intraductal tubulopapillary neoplasm of the pancreas with clear cell morphology. The tumor was detected during the diagnostic work-up of acute pancreatitis in a 43-year-old female. Histological examination revealed a tumor with the typical architecture of an intraductal tubulopapillary neoplasm of the pancreas with tumor cells showing abundant clear cytoplasm and Di-PAS negativity. Immunohistochemistry revealed positivity for Pan-CK, CK7, CK8/18, MUC1, MUC6, carbonic anhydrase IX, CD10, EMA, β-catenin and e-cadherin. Sanger sequencing did not detect mutations for β-catenin, BRAF, KRAS, PIK3CA and GNAS. Altogether, histology, immunohistochemical expression profile (MUC1+, MUC6+, MUC2-, MUC5AC-, trypsin-, chymotrypsin-, CDX2-) and sequencing results led to the diagnosis of intraductal tubulopapillary neoplasm. However, the neoplasm consisted of cells showing abundant clear cytoplasm, a morphological pattern not described so far in the current classification of pancreatic intraductal neoplasms. Potential differential diagnosis and the molecular basis of clear cell morphology are discussed. In conclusion, we consider this tumor as intraductal tubulopapillary neoplasm of the pancreas with unique clear cell phenotype. After surgery and without adjuvant therapy, the patient’s clinical course has been uneventful for over two years now. Virtual slides: The virtual slide(s) for this article can be found here: http://www.diagnosticpathology.diagnomx.eu/vs/1051828790117127
Article
Full-text available
Recent years have seen development and implementation of anticancer therapies targeted to particular gene mutations, but methods to assay clinical cancer specimens in a comprehensive way for the critical mutations remain underdeveloped. We have developed UW-OncoPlex, a clinical molecular diagnostic assay to provide simultaneous deep-sequencing information, based on >500× average coverage, for all classes of mutations in 194 clinically relevant genes. To validate UW-OncoPlex, we tested 98 previously characterized clinical tumor specimens from 10 different cancer types, including 41 formalin-fixed paraffin-embedded tissue samples. Mixing studies indicated reliable mutation detection in samples with ≥10% tumor cells. In clinical samples with ≥10% tumor cells, UW-OncoPlex correctly identified 129 of 130 known mutations [sensitivity 99.2%, (95% CI, 95.8%-99.9%)], including single nucleotide variants, small insertions and deletions, internal tandem duplications, gene copy number gains and amplifications, gene copy losses, chromosomal gains and losses, and actionable genomic rearrangements, including ALK-EML4, ROS1, PML-RARA, and BCR-ABL. In the same samples, the assay also identified actionable point mutations in genes not previously analyzed and novel gene rearrangements of MLL and GRIK4 in melanoma, and of ASXL1, PIK3R1, and SGCZ in acute myeloid leukemia. To best guide existing and emerging treatment regimens and facilitate integration of genomic testing with patient care, we developed a framework for data analysis, decision support, and reporting clinically actionable results.
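The sensitivity figure quoted above follows from simple counts (129 of 130 known mutations detected). As a hedged sketch, the snippet below computes that point estimate together with a Wilson score interval. Using the Wilson method is an assumption made for illustration, since the abstract does not state which interval method the authors applied, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def sensitivity_with_wilson_ci(detected, total, confidence=0.95):
    """Return (sensitivity, lower, upper) using a Wilson score interval."""
    if total <= 0 or not 0 <= detected <= total:
        raise ValueError("need 0 <= detected <= total and total > 0")
    p = detected / total
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return p, centre - half_width, centre + half_width


# Counts taken from the abstract: 129 of 130 known mutations identified.
sens, lower, upper = sensitivity_with_wilson_ci(129, 130)
print(f"sensitivity = {sens:.1%}, 95% CI = ({lower:.1%}, {upper:.1%})")
```

With only one missed mutation the interval is strongly asymmetric around the point estimate, which is why a score-type interval is often preferred over the simple normal approximation for proportions this close to 100%.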