Amplicon Sequencing of Colorectal Cancer: Variant Calling in Frozen and Formalin-Fixed Samples

May 2015
PLOS ONE 10(5):e0127146

DOI:10.1371/journal.pone.0127146

License
CC BY 4.0

Authors:

Johannes Betge

German Cancer Research Center

Svenja Leible

German Cancer Research Center

KRAS mutations identified by Sanger sequencing compared to deep amplicon sequencing analyzed with different variant calling tools.

RESEARCH ARTICLE

Amplicon Sequencing of Colorectal Cancer: Variant Calling in Frozen and Formalin-Fixed Samples

Johannes Betge1,2*, Grainne Kerr, Thilo Miersch, Svenja Leible, Gerrit Erdmann, Christian L. Galata, Tianzuo Zhan1,2, Timo Gaiser, Stefan Post, Matthias P. Ebert, Karoline Horisberger, Michael Boutros

1 Division of Signaling and Functional Genomics, German Cancer Research Center (DKFZ) and Department of Cell and Molecular Biology, Medical Faculty Mannheim, Heidelberg University, Heidelberg, Germany, 2 Department of Medicine II, University Hospital Mannheim, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany, 3 Department of Surgery, University Hospital Mannheim, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany, 4 Institute of Pathology, University Hospital Mannheim, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany

* j.betge@dkfz.de; m.boutros@dkfz.de

Abstract

Next generation sequencing (NGS) is an emerging technology becoming relevant for genotyping of clinical samples. Here, we assessed the stability of amplicon sequencing from formalin-fixed paraffin-embedded (FFPE) and paired frozen samples from colorectal cancer metastases with different analysis pipelines. 212 amplicon regions in 48 cancer-related genes were sequenced with Illumina MiSeq using DNA isolated from resection specimens from 17 patients with colorectal cancer liver metastases. From ten of these patients, paired fresh frozen and routinely processed FFPE tissue was available for comparative study. Sample quality of FFPE tissues was determined by the amount of amplifiable DNA using qPCR; sequencing libraries were evaluated using Bioanalyzer. Three bioinformatic pipelines were compared for analysis of amplicon sequencing data. Selected hot spot mutations were reviewed using Sanger sequencing. In the sequenced samples from 16 patients, 29 non-synonymous coding mutations were identified in eleven genes. Most frequent were mutations in TP53 (10), APC (7), PIK3CA (3) and KRAS (2). A high concordance of FFPE and paired frozen tissue samples was observed in ten matched samples, revealing 21 identical mutation calls and only two mutations differing. Comparison of these results with two other commonly used variant calling tools, however, showed high discrepancies. Hence, amplicon sequencing can potentially be used to identify hot spot mutations in colorectal cancer metastases in frozen and FFPE tissue. However, remarkable differences exist among results of different variant calling tools, which are not only related to DNA sample quality. Our study highlights the need for standardization and benchmarking of variant calling pipelines, which will be required for translational and clinical applications.

PLOS ONE | DOI:10.1371/journal.pone.0127146 May 26, 2015 1/18

OPEN ACCESS

Citation: Betge J, Kerr G, Miersch T, Leible S, Erdmann G, Galata CL, et al. (2015) Amplicon Sequencing of Colorectal Cancer: Variant Calling in Frozen and Formalin-Fixed Samples. PLoS ONE 10(5): e0127146. doi:10.1371/journal.pone.0127146

Academic Editor: Jeong-Sun Seo, Seoul National University College of Medicine, REPUBLIC OF KOREA

Received: January 10, 2015

Accepted: April 13, 2015

Published: May 26, 2015

This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability Statement: All relevant data are available via the European Nucleotide Archive (ENA) under accession number PRJEB8754.

Funding: JB has been supported by a fellowship from the Hartmut-Hoffmann-Berling International Graduate School (HBIGS).

Competing Interests: The authors have declared that no competing interests exist.

Introduction

Due to recent advances in deep sequencing technologies, remarkable insights have been gained into the alterations acquired by colorectal cancer (CRC) genomes during the carcinogenic process, greatly expanding our view of CRC genomic progression [1–3]. The promise that, after structural characterization of cancer genomes, clinical decision-making would be guided by individual genomic tumor profiles, however, remains to be fulfilled. Nevertheless, the development of novel targeted therapeutics highlights the need for reliable and cost-effective methods for molecular characterization of cancer genomes to identify patients who ultimately respond to treatment on the basis of druggable mutations, predictive alterations or acquired resistance markers.

Targeted sequencing based on PCR amplicons represents a feasible approach for evaluation of actionable mutations, mutational hot spots or predictive alterations in cancer genomes for clinical studies. Compared to genome-wide or exome-wide sequencing, a high depth of sequencing (>1000 reads) at the genomic loci of interest can be reached, thus facilitating detection of low-frequency variants in heterogeneous tumor samples admixed with stromal cells [4,5]. Moreover, due to the comparably low number of base pairs to be sequenced per patient, multiple samples, also for longitudinal analysis, can be analyzed in parallel on bench-top machines such as Illumina MiSeq, lowering costs and potentially allowing routine clinical application in the near future.
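As a rough illustration of why depth matters for low-frequency variants, the following minimal sketch uses a plain binomial model to estimate the probability of observing at least a given number of variant-supporting reads. The binomial model, the function name `prob_at_least` and the example numbers (5% variant allele fraction, at least 10 supporting reads) are assumptions chosen for illustration; the model ignores sequencing error and is not part of the pipeline described in this study.

```python
from math import comb


def prob_at_least(k, depth, allele_fraction):
    """Return P(X >= k) for X ~ Binomial(depth, allele_fraction)."""
    # Probability of seeing fewer than k variant reads, summed term by term.
    p_fewer = sum(
        comb(depth, i) * allele_fraction**i * (1 - allele_fraction) ** (depth - i)
        for i in range(k)
    )
    return 1.0 - p_fewer


# Hypothetical scenario: a variant carried by 5% of reads and a caller
# that needs at least 10 supporting reads to report it.
for depth in (50, 1000):
    print(depth, prob_at_least(10, depth, 0.05))
```

Under these assumptions, a depth of 50 reads almost never yields 10 supporting reads, whereas a depth of 1000 reads almost always does.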

However, for clinical application and for translational studies on archived clinical samples, many problems remain to be solved. The most widely available specimens for clinical diagnostics and biomarker studies are formalin-fixed, paraffin-embedded (FFPE) tissues from pathology archives, as their long-term storage is relatively simple and cost efficient compared to frozen material. However, it is known that formalin fixation leads to covalent linking of DNA, RNA and protein by methylene bridges, deamination and oxidation reactions, formation of cyclic base derivatives and also to DNA fragmentation [6]. These DNA alterations hamper sequencing technologies, leading to less robust results and difficulties in interpreting data from sequencing experiments. Furthermore, a gold standard method for analysis of next-generation sequencing (NGS) data is lacking and quality assurance programs have not yet been launched. Different bioinformatic analysis tools and pipelines have been developed for NGS data. However, it appears that reproducibility between them needs to be improved [7]. Moreover, statistical models for variant discovery and variant evaluation, designed for whole-exome or whole-genome data consisting of many samples with low coverage, may not be optimal for small amplicon datasets with few targeted regions. Thus, there is no generally accepted standard on how to perform variant calling on amplicon sequencing data. These problems highlight the need for sample preparation and data analysis pipelines optimized for amplicon sequencing of clinical samples.

In this study, we describe an experimental and bioinformatic pipeline for amplicon sequencing of clinical fresh frozen and FFPE samples from CRC. Special focus is placed on preparation of sequencing libraries from low-quality FFPE samples. The bioinformatics pipeline, using an adapted Genome Analysis Toolkit (GATK) Unified Genotyper, is explained in detail and compared with other commonly used variant calling methods with respect to their suitability for amplicon sequencing using FFPE material.

Materials and Methods

Patients

Thirty-three samples from 17 patients who underwent resection of liver metastasis of CRC

in the Department of Surgery, University Hospital Mannheim, between February 2012 and

Amplicon Sequencing of Frozen and Formalin Fixed Samples

February 2013 were included in this study. For all of these patients, either fresh frozen or formalin-fixed paraffin-embedded (FFPE) tissue was used for DNA isolation. From 10 patients, paired frozen and FFPE tissue was available for study, and from 5 patients, matched primary tumors could be obtained from the archives of the Institute of Pathology, University Hospital Mannheim. Additionally, one matched primary-metastasis pair from a neuroendocrine carcinoma of the small bowel (Pat05), primary culture material from one patient (Pat16), material from a prostate cancer patient and cell lines DLD-1, HCT116, HT55, HUH7, HEK293T, HS68 and SW480 were included in sequencing runs and analysis for other projects or as controls. Samples were analyzed in two sequencing runs; one patient (Pat13) was analyzed in both runs as control. All cell lines were obtained from ATCC. Information about patients can be found in S1 Table.

Ethics approval

Ethics board approval was obtained from the Medical Ethics Commission II of the Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany (No. 2012-293N-MA, 2013-841R-MA, 2014-551N-MA). Written informed consent from the donors of tissue samples was obtained for the use in research.

Sample preparation

Frozen samples and cell lines. Samples from hepatic metastases from CRC patients were transported in RPMI cell culture medium, snap frozen on dry ice and subsequently stored at -80°C. DNA isolation was done with the Qiagen DNeasy Blood & Tissue Kit (Qiagen, Hilden, Germany) according to the manufacturer's recommendations, including RNase digestion (Fig 1A). Cell lines were pelleted and DNA was isolated with the same protocol. Extracted DNA was diluted and directly used for preparation of sequencing libraries.

FFPE samples. Tissue from hepatic metastases had been fixed in formalin and embedded in paraffin during routine pathological work-up. Suitable blocks were chosen and five 10μm slices were used for DNA extraction without microdissection. A slide stained with haematoxylin and eosin (H&E) from each block was used to estimate the tumor cell content of the corresponding slices by two investigators (TG and JB) using a double-headed microscope. DNA was isolated using the Qiagen QIAamp DNA FFPE Kit according to the manufacturer's instructions. DNA was eluted in 40μl Buffer ATE and concentrations were measured with NanoDrop 2000 (NanoDrop, Wilmington, USA) and Qubit BR kit (Life Technologies, Darmstadt, Germany). Isolation yielded between 4.8μg and 22.8μg (mean 10.23μg) when measured with the Qubit BR kit. Detailed information about preparation of FFPE samples can be found in S2 Table.

Library Preparation

DNA quality of FFPE samples was evaluated by determining the amount of amplifiable DNA using the FFPE QC PCR (Illumina, San Diego, USA) according to the manufacturer's recommendations. The mean ΔCq value of all FFPE samples was 2.0 (median 1.9, min 0.9, max 4.1). Nine samples (47%) had a ΔCq value higher than the recommended 2.0 (S2 Table). TruSeq Amplicon Cancer Panel (Cat. No. FC-130-1008, Illumina) libraries were prepared with the recommended DNA amounts (150ng for fresh frozen material and cell lines, 250ng for FFPE samples). The panel includes 212 amplicons of 170–190bp length, targeting mutational hot spots in 48 cancer-related genes. Amplicon regions are listed in S3 Table.
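The ΔCq criterion can be sketched in a few lines. The sketch assumes that ΔCq is the Cq of the sample minus the Cq of the kit's control template, which is how such QC assays are commonly evaluated but is not spelled out in the text. The function names and all Cq values are hypothetical and are not taken from S2 Table.

```python
def delta_cq(sample_cq, control_cq):
    """ΔCq of a sample relative to the control template (assumed definition)."""
    return sample_cq - control_cq


def samples_above_threshold(sample_cqs, control_cq, threshold=2.0):
    """Return the names of samples whose ΔCq exceeds the threshold."""
    return sorted(
        name
        for name, cq in sample_cqs.items()
        if delta_cq(cq, control_cq) > threshold
    )


# Hypothetical Cq values for three FFPE samples and one control reaction.
control_cq = 20.0
sample_cqs = {"FFPE_A": 20.9, "FFPE_B": 22.5, "FFPE_C": 24.1}
print(samples_above_threshold(sample_cqs, control_cq))
```

Samples returned by `samples_above_threshold` would correspond to those exceeding the recommended ΔCq of 2.0, that is, samples with less amplifiable DNA.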

Bioanalyzer (Agilent Technologies, Böblingen, Germany) was used to confirm successful library amplification and library quality of FFPE samples by assessing the concentration of DNA with the desired size (~310bp) and of short DNA fragments (<150bp). To compare amounts of DNA within the desired size region, the concentration of DNA amplicons in the range of 250–450bp was calculated. The concentration of DNA with a size between 250bp and 450bp varied greatly, between 51.7 and 93831.9 pg/μl (mean 5675.1 pg/μl, median 672.2 pg/μl), among the libraries of different samples and inversely correlated with ΔCq values (Spearman's coefficient: -0.805, Fig 1B, S2 Table). For the samples with low DNA concentrations at the 310bp amplicon, library preparation was repeated using the highest possible DNA amounts (S1 Fig, S2 Table). Bioanalyzer revealed higher concentrations of DNA around 250–450bp (365.3 pg/μl to 5669.8 pg/μl; mean 6190.9 pg/μl; median 1996.3 pg/μl), however, with significant background of short DNA fragments. After PCR clean-up of libraries, short DNA fragments were reduced, but three samples also showed diminished amounts of the 310bp amplicon and were thus excluded from sequencing.
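For readers unfamiliar with the rank-based statistic quoted above, the following minimal sketch computes Spearman's coefficient as the Pearson correlation of average ranks. The helper names and the example values are hypothetical; the numbers are not the measurements from S2 Table, and the software actually used for this calculation is not stated in the text.

```python
def average_ranks(values):
    """Assign 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical ΔCq values and library concentrations (pg/μl) for five samples.
delta_cq_values = [0.9, 1.5, 1.9, 2.6, 4.1]
concentrations = [9000.0, 2500.0, 700.0, 300.0, 60.0]
print(spearman(delta_cq_values, concentrations))
```

Because only ranks enter the calculation, the coefficient is insensitive to the very wide range of concentrations, which is one reason a rank-based measure suits data like these.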

Data processing

The bioinformatic analysis pipeline is shown in Fig 2A. Reads were aligned against the hg19 reference genome using the BWA algorithm implemented in the MiSeq software (MiSeq Reporter v2.2.29).

Fig 1. Depth of Sequencing correlates with DNA quality. (A) Sample preparation workflow. DNA was isolated from fresh frozen or FFPE CRC liver metastasis resection specimens with Qiagen Blood and Tissue or FFPE kit, respectively. Frozen samples then directly underwent sequencing library preparation, pooling of libraries, quality control and sequencing. FFPE samples were additionally tested for DNA quality by qPCR. Library quality was tested with Bioanalyzer. For samples with low amounts of correctly sized DNA amplicons (fragments at 310bp), new libraries were prepared with higher starting DNA concentrations and re-analyzed with Bioanalyzer. Samples with yet low amounts of DNA with correct size and highly fragmented DNA were excluded. (B) ΔCq-values of quality control PCR indicate poor sample quality. DNA concentration of fragments between 250bp and 450bp after library preparation was calculated with Agilent Bioanalyzer and plotted against ΔCq values of FFPE quality control PCR. (C) Higher ΔCq-values correlate with lower mean depth of sequencing. (D) Coverage distribution of amplicons from all paired FFPE and frozen samples, normalized to total sample coverage. Frozen samples had a mean depth of 4,622, FFPE samples 1,852.

doi:10.1371/journal.pone.0127146.g001

Fig 2. Amplicon Sequencing identifies hot-spot mutations in CRC metastases. (A) Sequencing analysis workflow. Sequence alignment files underwent local-realignment around Indels, left alignment and base quality score recalibration. After variant calling with GATK Unified Genotyper, annotation and effect prediction of detected variants was done using SnpEff. Raw variants of all samples were filtered by custom parameters with SnpSift. Variants included in the 1000 Genomes Project data were excluded to only obtain somatic mutations in cancer. (B) High frequency of TP53 and APC mutations among somatic mutations identified in CRC liver metastases (frozen and FFPE tissue). Colored fields represent presence of a nonsynonymous coding SNP (blue), a mutation leading to a stop-codon (grey) or a frameshift mutation (orange). Bars sum up mutations present in each patient (vertical bars) or each mutated gene (horizontal bars). Of note, some genes contain more than one mutation.

doi:10.1371/journal.pone.0127146.g002

BAM files were quality-checked with FastQC (v.0.9.5; http://www.bioinformatics.babraham.ac.uk/projects/fastqc/). Indels in sequence alignment files were left-aligned and local realignment around Indels was done with the RealignerTargetCreator and the IndelRealigner tools from the Genome Analysis Toolkit (GATK, version 2.4-9) [8]. Base quality score recalibration was performed. Duplicate mapping and marking was not deemed suitable for amplicon sequencing and thus omitted.

Unified Genotyper pipeline

Variant calling. Unified Genotyper from the GATK (version 2.4-9) was used for variant calling. All samples were processed in parallel and split into individual variant files for each sample after variant calling. Maximum coverage per locus was increased from the default 250 to 9,000,000 to take into account the high depth of amplicon sequencing (downsampling to lower depth is done in whole-exome studies to increase speed by saving memory). The minimum confidence threshold for calling was set to 10 and the minimum confidence threshold for emitting to 30. SNPs and Indels were evaluated simultaneously. A region list of all amplicons was used to define regions for single nucleotide polymorphism (SNP) and Indel calling to increase analysis speed. As an alternative, the Unified Genotyper pipeline was also run on each sample individually, with otherwise identical parameters.
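The settings described above can be sketched as a command line assembled in Python. This is an illustration under stated assumptions, not the authors' actual invocation, which is not given in the text: the flag names (`-dcov`, `-stand_call_conf`, `-stand_emit_conf`, `-glm BOTH`, `-L`) are the GATK 2.x Unified Genotyper options that appear to correspond to the described parameters, and the jar path, function name and all file names are hypothetical.

```python
def unified_genotyper_command(reference, bam_files, intervals, output_vcf,
                              gatk_jar="GenomeAnalysisTK.jar"):
    """Build a GATK 2.x UnifiedGenotyper command line as a list of arguments."""
    cmd = ["java", "-jar", gatk_jar, "-T", "UnifiedGenotyper", "-R", reference]
    # One -I per BAM file so that all samples are called jointly.
    for bam in bam_files:
        cmd += ["-I", bam]
    cmd += [
        "-L", intervals,            # restrict calling to the amplicon regions
        "-glm", "BOTH",             # evaluate SNPs and Indels simultaneously
        "-dcov", "9000000",         # raise the per-locus coverage cap from 250
        "-stand_call_conf", "10",   # minimum confidence threshold for calling
        "-stand_emit_conf", "30",   # minimum confidence threshold for emitting
        "-o", output_vcf,
    ]
    return cmd


# Hypothetical file names, for illustration only.
command = unified_genotyper_command(
    "hg19.fa", ["pat01_ffpe.bam", "pat01_frozen.bam"], "amplicons.bed", "raw.vcf"
)
print(" ".join(command))
```

Building the argument list rather than a single string keeps the sketch safe to pass to `subprocess.run` without shell quoting, should one wish to execute it.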

Variant annotation and effect prediction. SnpEff (version 2.0.5, http://snpeff.sourceforge.net/) [9] was used for variant annotation and effect prediction, and the GATK VariantAnnotator tool was run with the -A SnpEff option to add the SnpEff annotations with the highest biological significance for each variant to the variant call format (VCF) files. Subsequently, the VCF file with information about all sequenced samples was split into individual sample variant files using the GATK SelectVariants program. Variants were annotated with the variant frequencies in the 1000 Genomes Project using the SnpSift (http://snpeff.sourceforge.net/SnpSift.html) annotate feature [9].

Variant filtering. SnpSift from the SnpEff package was used for filtering of raw variants. The following quality-filter criteria were applied: quality by depth greater than 0.8 (QD > 0.8), total depth for calling variants at a specific locus greater than 200 (DP > 200), Fisher strand (Phred-scaled p-value using Fisher's Exact Test to detect strand bias) smaller than 70 (FS < 70), minimum variant confidence greater than 1500 (QUAL > 1500), mapping quality greater than 40 (MQ > 40) and mapping quality rank sum test higher than -15 (! exists MQRankSum | MQRankSum > -15). Filter criteria had been optimized by explorative analysis. Moreover, only the coding variants were selected with the following expression: ((SNPEFF_EFFECT = 'NON_SYNONYMOUS_CODING') | (SNPEFF_EFFECT = 'CODON_CHANGE_PLUS_CODON_DELETION') | (SNPEFF_EFFECT = 'CODON_DELETION') | (SNPEFF_EFFECT = 'FRAME_SHIFT') | (SNPEFF_EFFECT = 'STOP_GAINED')). All variants present in the 1000 Genomes data were excluded to obtain only somatic mutation data and exclude common germline variants. Variant recalibration was not done due to the nature of targeted sequencing data and the relatively small dataset.
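The filter logic above can be restated as a small Python predicate, which may help when reproducing the criteria outside SnpSift. This is a sketch under assumptions: it operates on an already parsed record (a QUAL value plus a dictionary of INFO fields), the function and variable names are hypothetical, and it is not the SnpSift implementation itself.

```python
# Effects retained by the coding-variant expression quoted in the text.
SELECTED_EFFECTS = {
    "NON_SYNONYMOUS_CODING",
    "CODON_CHANGE_PLUS_CODON_DELETION",
    "CODON_DELETION",
    "FRAME_SHIFT",
    "STOP_GAINED",
}


def passes_quality_filters(qual, info):
    """Apply the QD, DP, FS, QUAL, MQ and MQRankSum thresholds to one record."""
    mq_rank_sum = info.get("MQRankSum")
    return (
        info["QD"] > 0.8
        and info["DP"] > 200
        and info["FS"] < 70
        and qual > 1500
        and info["MQ"] > 40
        # Mirrors "! exists MQRankSum | MQRankSum > -15".
        and (mq_rank_sum is None or mq_rank_sum > -15)
    )


def is_selected_effect(info):
    """Keep only variants whose SnpEff effect is in the selected coding set."""
    return info.get("SNPEFF_EFFECT") in SELECTED_EFFECTS


# Hypothetical record, for illustration only.
example_info = {"QD": 12.0, "DP": 1500, "FS": 3.2, "MQ": 49.6,
                "SNPEFF_EFFECT": "STOP_GAINED"}
print(passes_quality_filters(2500.0, example_info) and is_selected_effect(example_info))
```

Note that a record without an MQRankSum annotation passes that criterion, as in the original expression.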

SAMtools mpileup/BCF-tools pipeline

SAMtools (version 0.1.18) mpileup was used to generate raw variant calls with the -u (generate uncompressed BCF output), -f (faidx indexed reference sequence file), -D (output per-sample DP) and -S (output per-sample strand bias P-value) options and hg19 as reference genome, processing all samples in parallel. Maximum per-sample depth for Indel and SNP calling was set to 10,000. Bcftools view with -bvcg options (output BCF file format, output potential variant sites only, call SNPs, call genotypes at variant sites) was used for variant calling. Data were processed and variants were annotated as described above for GATK data. Variants at loci with a depth of less than 50 were filtered out, as well as all non-coding variants and all variants present in the 1000G data.

Illumina Somatic Variant Caller pipeline

The MiSeq on-board software Somatic Variant Caller was run with default parameters. VCF files containing variant information were downloaded from BaseSpace. Subsequently, they were annotated with 1000G variant frequencies. All non-coding, silent, synonymous and unknown variants were filtered out, as well as all variants present in 1000G data. Moreover, all variants at a locus with coverage of <200, variants with a variant frequency <0.05 or with a genotype quality less than 100 were excluded.

Data analysis and visualization

Filtered variants were exported from variant files into tab-delimited files using SnpSift and concatenated into a single tab-delimited file including all variants of all patients. Descriptive statistics and data visualization were performed using Microsoft Excel and R packages (http://www.r-project.org/). Venn diagrams were made using venny (http://bioinfogp.cnb.csic.es/tools/venny/index.html) and jvenn [10]. The Integrative Genomics Viewer was used for analysis and visualization of specific mutated loci [11].

The amplicon sequencing data of all samples were deposited in the European Nucleotide Archive (ENA) and can be accessed with accession number PRJEB8754.

Sanger sequencing

Sanger sequencing was performed to evaluate KRAS exon 2 and BRAF exon 15 status as described previously [12]. Briefly, genomic DNA was extracted from FFPE tumor tissue after manual macro-dissection using the QIAamp DNA Micro kit (Qiagen, Hilden, Germany). The following PCR primers were used for amplification: 5′-AACACATTTCAAGCCCCAAA-3′ (BRAF-F), 5′-GAAACTGGTTTCAAAATATTCGTT-3′ (BRAF-R), 5′-AGGCCTGCTGAAAATGACTGAATA-3′ (KRAS-F) and 5′-CTGTATCAAAGAATGGTCCTGCAC-3′ (KRAS-R).

Thermal cycling conditions were 5 min at 94°C, followed by 35 cycles of 94°C for 30 seconds, 53°C (BRAF) or 60°C (KRAS) for 30 seconds and 72°C for 30 seconds, followed by a final incubation at 72°C for 7 minutes. After dye-terminator sequencing using the PCR amplification primers, analyses by capillary electrophoresis were performed on a 3130 Genetic Analyzer (Applied Biosystems, Foster City, CA).

Results

Depth of sequencing correlates with DNA quality

We sequenced 212 amplicon regions in 48 cancer-related genes with Illumina MiSeq using DNA isolated from resection specimens from 17 patients with CRC liver metastases. From ten of these patients, paired fresh frozen and routinely processed FFPE tissue was available for comparative study. Sequencing statistics and DNA quality measurements were analyzed to evaluate differences of FFPE and frozen material (Fig 1A).

The number of paired reads and paired reads mapped was significantly higher in frozen samples compared with FFPE samples; however, the percentage of mapped/raw reads in frozen samples was only 78% compared with 96% in FFPE (Table 1). Mean sequencing quality (Phred score 38 vs. 37) was slightly higher in FFPE samples compared to frozen samples; the GC content was also higher in FFPE than in frozen tissue (49% vs. 45%). Detailed sequencing statistics for each frozen and FFPE sample are shown in S4 Table. Frozen samples had a mean depth of 4,622 reads, FFPE samples of 1,852 reads. In FFPE samples, we investigated the correlation of sequencing depth with DNA quality measured by quality control PCR. This step is performed before library preparation and estimates the amount of amplifiable DNA as a surrogate for functional DNA quality (Fig 1B and 1C). We found that higher ΔCq-values, indicative of lower DNA quality, correlated with lower mean depth of sequencing (Pearson coefficient -0.505, Fig 1C). Of note, higher ΔCq-values also correlated with higher GC-content of the samples (Pearson coefficient 0.488, S2 Fig), while the depth of sequencing appeared to be independent of mean GC-content of the sequenced sample (S2 Fig). Fig 1D shows histograms of the coverage of amplicons for each paired FFPE and frozen sample, normalized to total coverage of the sample. FFPE samples tend to have a less balanced distribution of coverage across the different amplicons than frozen samples.

These data indicate that sequencing performance correlates with DNA quality of sequenced FFPE samples.
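The normalization behind Fig 1D can be sketched as dividing each amplicon's coverage by the total coverage of its sample. Interpreting "normalized to total coverage" as a simple fraction is an assumption, and the coefficient of variation used here to summarize how balanced a sample is was added for illustration; it is not a statistic reported in the study. All names and numbers are hypothetical.

```python
from statistics import mean, pstdev


def normalize_coverage(amplicon_depths):
    """Express each amplicon's coverage as a fraction of the sample total."""
    total = sum(amplicon_depths.values())
    if total == 0:
        raise ValueError("sample has no coverage")
    return {amplicon: depth / total for amplicon, depth in amplicon_depths.items()}


def coefficient_of_variation(values):
    """Standard deviation divided by the mean; larger means less balanced."""
    values = list(values)
    return pstdev(values) / mean(values)


# Hypothetical per-amplicon depths for a balanced and a skewed sample.
balanced = {"amp1": 1000, "amp2": 1000, "amp3": 1000, "amp4": 1000}
skewed = {"amp1": 3400, "amp2": 400, "amp3": 150, "amp4": 50}

for name, sample in (("balanced", balanced), ("skewed", skewed)):
    fractions = normalize_coverage(sample)
    print(name, coefficient_of_variation(fractions.values()))
```

Normalizing first makes samples with very different total read counts, such as frozen and FFPE libraries, directly comparable.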

High concordance of mutations identified in frozen and FFPE samples from CRC metastases

Recent large-scale projects have identified the most common mutations occurring in CRC [1]. Sequencing 212 amplicon regions in 48 cancer-related genes, we analyzed variant calls using an adapted Unified Genotyper analysis pipeline.

In the sequenced tumor samples from 16 patients (frozen and/or FFPE), a total of 29 mutations were identified in eleven genes after excluding all non-coding mutations, all synonymous variants, and all non-harmful variants present in the 1000 Genomes data (Fig 2A–2B). The number of mutations per patient varied from zero to four; the mean number of mutations per patient was 1.8. Of the mutations, 16 were SNPs, four were Indels leading to a frameshift, and nine led to a stop codon. The most frequently mutated gene was TP53, which showed 10 mutations in nine of the patients. We observed seven APC mutations in six patients, while KRAS and PIK3CA were mutated two and three times, respectively (Fig 2B).

DNA from FFPE tissues may have alterations due to the process of fixation in formalin. We therefore compared the variants identified in paired frozen and FFPE tissues. In ten sequenced patients with paired frozen and FFPE tissue, 23 mutations were identified in FFPE samples and 21 mutations in frozen samples; thus, a concordance of 91% could be observed (Fig 3A and 3B). The two non-matching mutations (BRAF V600E and ATM E1971G) were both identified in the FFPE but not in the frozen sample of patient 09. Sanger sequencing of the BRAF mutational hotspot in exon 15 was performed, revealing the V600E mutation. Of note, six percent of >10,000 reads at the BRAF V600E locus in the frozen sample showed the alternative base “T”, which, however, did not lead to a variant call with the Unified Genotyper pipeline (Fig 3C).
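The reported concordance can be reproduced with simple set arithmetic, assuming it was computed as the number of calls shared by both tissue types divided by the number of distinct calls overall (21 / 23, about 91%). That definition is an assumption, since the text does not state the formula. The identifiers for the 21 shared calls are placeholders; only the two FFPE-only mutations are named in the text.

```python
def concordance(calls_a, calls_b):
    """Fraction of all distinct calls that were made in both call sets."""
    union = calls_a | calls_b
    if not union:
        raise ValueError("no calls to compare")
    return len(calls_a & calls_b) / len(union)


# Placeholder identifiers for the 21 calls found in both tissue types.
shared_calls = {f"shared_{i:02d}" for i in range(1, 22)}
# The two calls made only in the FFPE sample of patient 09.
ffpe_only = {("Pat09", "BRAF", "V600E"), ("Pat09", "ATM", "E1971G")}

ffpe_calls = shared_calls | ffpe_only
frozen_calls = set(shared_calls)

print(f"{concordance(ffpe_calls, frozen_calls):.1%}")
```

Other definitions, for example dividing by the number of calls in the frozen samples only, would give a different figure, so the denominator should be stated whenever concordance is reported.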

Table 1. Sequencing statistics of all patient samples.

                                  FFPE      Frozen
# Raw Paired Reads                423,355   1,278,334
# Paired Reads Mapped             407,493   984,164
% Mapped/Raw                      96%       78%
Mean Depth                        1,852     4,622
%GC                               49        45
Mean Seq. Quality (Phred score)   38        37
Mean Mapping Quality              49.63     48.98

doi:10.1371/journal.pone.0127146.t001

The correlation between observed percentage of tumor cells on representative FFPE slides and calculated variant frequency for selected mutations was moderate (Fig 3D).

These data show that sequencing of FFPE tissue can lead to overall similar results as sequencing frozen material and could thus be a feasible approach for routine clinical samples.

Fig 3. Paired frozen and FFPE samples of CRC liver metastases have a high concordance of mutations in hotspot cancer genes. (A) GATK Unified Genotyper variant calling pipeline was used to identify non-synonymous coding mutations in FFPE (green) and frozen samples (red). (B) Venn diagram of non-synonymous coding mutations identified in FFPE and frozen samples. (C) Representative images of reads mapped to the site of the BRAF V600E mutation identified in FFPE but not in frozen tissue of patient 09, displayed with the Integrative Genomics Viewer. (D) Variant frequency of selected mutations and estimated tumor cell content in FFPE samples.

doi:10.1371/journal.pone.0127146.g003

Low reproducibility of variant calling in FFPE and frozen tissue with different bioinformatics pipelines

Low reproducibility between different variant calling pipelines has been reported for

whole-genome or whole-exome sequencing data [7]. To test whether this problem also occurs

with amplicon sequencing data, we compared different tools for variant calling in order to test

the reproducibility of our results. We observed marked differences between different variant calling

software (Fig 4). Compared to the Unified Genotyper pipeline (Fig 4A and 4B), Samtools/BCFtools

found five of the mutations identified with the Unified Genotyper pipeline (patient 04 APC,

patient 09 CDH1, patient 12 KRAS and TP53 and patient 14 TP53). The APC mutation of pa-

tient 09 was also identified at the same locus but only in the frozen sample. However, two addi-

tional APC frameshift mutations in patients 03 and 13 were called only by Samtools/BCFtools.

In contrast, 15 mutations called with the Unified Genotyper pipeline in both FFPE and frozen

as well as two mutations called only in FFPE tissue were not identified with Samtools/BCFtools.

Thus, Samtools/BCFtools as used in our pipeline seems to be less sensitive, although it may

identify additional small Indels leading to frameshift mutations (Fig 4C and 4D). Moreover, re-

sults from Illumina MiSeq on-board Somatic Variant Caller pipeline are shown in Fig 4E and

4F. Notably, this pipeline appears to call variants in both frozen and FFPE samples that are not

identified by other pipelines.

Regarding the paired primary CRCs we analyzed from patients 04, 10, 11 and 14, Illumina

Somatic Variant Caller again called more variants than others, especially in patient 04 (S5

Table). Cell lines that were included as controls are shown in S6 Table. In the cell lines, almost

identical results were obtained with the Unified Genotyper pipeline and Illumina Somatic Vari-

ant Caller, while Samtools mpileup/Bcftools was less sensitive.

All variant data from patients and cell lines obtained with different variant calling pipelines

can be found in S7 Table.

These data indicate that remarkable differences exist among results of different variant call-

ing pipelines, which are not only related to DNA sample quality.

Fig 4. Comparison of different methods for variant calling. Mutations identified in matched frozen and FFPE tissue of CRC liver metastases detected with (A, B) Genome Analysis Toolkit (GATK) Unified Genotyper, (C, D) Samtools mpileup/Bcftools and (E, F) Somatic Variant Caller. Green color represents FFPE samples, red represents frozen, color intensities represent number of non-synonymous coding mutations per gene.

doi:10.1371/journal.pone.0127146.g004

Sensitivity and specificity of amplicon sequencing with respect to different variant calling pipelines using frozen and FFPE tissues

To evaluate sensitivity and specificity of amplicon sequencing analyzed with different bioinformatics tools, we performed Sanger sequencing of KRAS exon 2. As shown in Table 2, sensitivity and specificity were 100% using Unified Genotyper with DNA isolated from frozen samples. In FFPE samples, one discordant case (patient 02) was noted, which had a KRAS c.38G>A mutation according to Sanger sequencing. However, of note, Sanger sequencing was performed with material from the primary tumor, and the metastatic piece analyzed with amplicon sequencing had an estimated tumor content of only 10%. In addition, none of the reads had the mutated variant at the mutation locus (S3 Fig). A frozen tumor sample was not available from this patient. Regarding the other variant calling pipelines, Samtools/BCFtools failed to identify the KRAS mutation of patient 04, while Somatic Variant Caller had a false positive call in the patient 02 FFPE sample and missed the c.38G>A mutation.

Table 2. KRAS mutations identified by Sanger sequencing compared to deep amplicon sequencing analyzed with different variant calling tools.

No.     Sanger    UG frozen   UG FFPE   SAM frozen   SAM FFPE   SVC frozen   SVC FFPE
Pat02   c.38G>A   NA          0         NA           0          NA           c.423T>A
Pat03   0         0           0         0            0          0            0
Pat04   c.35G>A   c.35G>A     c.35G>A   0            0          c.35G>A      c.35G>A
Pat06   0         NA          0         NA           0          NA           0
Pat08   0         0           0         0            0          0            0
Pat09   0         0           0         0            0          0            0
Pat11   0         0           0         0            0          0            0
Pat12   c.35G>T   c.35G>T     c.35G>T   c.35G>T      c.35G>T    c.35G>T      c.35G>T
Pat17   0         0           NA        0            NA         0            NA
Pat18   0         0           NA        0            NA         0            NA

UG, Unified Genotyper pipeline; SAM, Samtools mpileup/Bcftools pipeline; SVC, Somatic Variant Caller; NA, not available

doi:10.1371/journal.pone.0127146.t002
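The sensitivity and specificity values can be recomputed from Table 2 by scoring each pipeline column against the Sanger result. The Python sketch below is one way to do this and rests on a counting convention that is an assumption here, not something stated in the text: samples marked NA are skipped, and a call that differs from the Sanger-confirmed mutation counts as a false negative.

```python
# Table 2 transcribed: "0" = no KRAS exon 2 mutation, None = not available (NA).
COLUMNS = ["UG frozen", "UG FFPE", "SAM frozen", "SAM FFPE", "SVC frozen", "SVC FFPE"]
TABLE2 = {
    # patient: (Sanger, then one entry per column in COLUMNS)
    "Pat02": ("c.38G>A", None, "0", None, "0", None, "c.423T>A"),
    "Pat03": ("0",) * 7,
    "Pat04": ("c.35G>A", "c.35G>A", "c.35G>A", "0", "0", "c.35G>A", "c.35G>A"),
    "Pat06": ("0", None, "0", None, "0", None, "0"),
    "Pat08": ("0",) * 7,
    "Pat09": ("0",) * 7,
    "Pat11": ("0",) * 7,
    "Pat12": ("c.35G>T",) * 7,
    "Pat17": ("0", "0", None, "0", None, "0", None),
    "Pat18": ("0", "0", None, "0", None, "0", None),
}


def sens_spec(column: str):
    """Return (sensitivity, specificity) of one pipeline/tissue column vs Sanger."""
    idx = COLUMNS.index(column) + 1
    tp = fn = tn = fp = 0
    for row in TABLE2.values():
        sanger, call = row[0], row[idx]
        if call is None:            # no sequencing data for this tissue
            continue
        if sanger != "0":
            if call == sanger:
                tp += 1
            else:                   # mutation missed, or a different variant called
                fn += 1
        elif call == "0":
            tn += 1
        else:
            fp += 1
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec


for col in COLUMNS:
    sens, spec = sens_spec(col)
    print(f"{col:<10} sensitivity {sens:.0%}  specificity {spec:.0%}")
```

Under this per-sample convention the discordant Somatic Variant Caller call in the patient 02 FFPE sample lowers sensitivity only, because the patient is Sanger-positive; a per-variant count would also register it as a false positive.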

Additionally, human cancer cell lines were analyzed to test concordance of variant calling

pipelines irrespective of sample quality and to evaluate suitability of filter criteria. As shown in

S4 Fig, a high concordance is observed between variant loci identified in cancer cell lines after

filtering poor quality and non-harmful variants. Moreover, almost all of the variant loci in cell

lines HCT116, HT55, HUH7 and SW480 identified with Unified Genotyper pipeline were also

identified by large scale databases Cell Line Encyclopedia [13] and COSMIC [14], while discor-

dant loci were largely eliminated from our data upon filtering (S4 Fig).

Accordingly, in CRC metastases substantial differences can be observed between raw data-

sets and datasets after filtering variants by quality measures and functional annotations. Vari-

ant count is substantially reduced, while concordance between frozen and FFPE, as well as

between different variant calling pipelines increases. Results are presented in S5 Fig.
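The filtering step described above can be pictured as a predicate applied to every raw variant record. The following Python sketch is illustrative only: the field names, thresholds and effect labels are hypothetical, the toy records are invented, and the rule is simplified in that it drops every variant found in the 1000 Genomes data, whereas the pipeline described here removes only the non-harmful ones.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    gene: str
    quality: float    # caller-reported quality score
    depth: int        # read depth at the locus
    effect: str       # functional annotation, e.g. "missense"
    in_1000g: bool    # present in the 1000 Genomes data


# Hypothetical thresholds and effect labels, chosen only for illustration.
MIN_QUALITY = 50.0
MIN_DEPTH = 100
KEPT_EFFECTS = {"missense", "nonsense", "frameshift"}


def passes_filters(v: Variant) -> bool:
    """Keep well-supported, protein-altering variants absent from 1000 Genomes."""
    return (
        v.quality >= MIN_QUALITY
        and v.depth >= MIN_DEPTH
        and v.effect in KEPT_EFFECTS
        and not v.in_1000g
    )


# Invented toy records, one per filter outcome.
raw = [
    Variant("GENE_A", 900.0, 5000, "missense", False),    # kept
    Variant("GENE_B", 800.0, 4000, "synonymous", False),  # dropped: synonymous
    Variant("GENE_C", 12.0, 40, "frameshift", False),     # dropped: low quality
    Variant("GENE_D", 950.0, 6000, "missense", True),     # dropped: in 1000 Genomes
]
filtered = [v for v in raw if passes_filters(v)]
print(len(raw), "raw ->", len(filtered), "after filtering")
```

Applying the same predicate to the output of every caller keeps the comparison between pipelines on equal terms.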

Processing all sequence alignment files together for variant calling is

more sensitive than separately

Processing many samples together for variant calling is generally recommended for whole-ge-

nome or whole-exome sequencing data in order to increase the number of reads at specific loci.

However, it is not known whether this is also beneficial for deep amplicon sequencing, since it

might lower the impact of rare variants only present in a subset of tumor cells in few samples.

In contrast, it might increase sensitivity for common mutations present in many samples. We

observed a general increase in sensitivity for variant calling when samples were processed in

parallel (S6A Fig and S6B Fig) compared to separate processing with otherwise identical pipe-

line and filter criteria (S6C Fig and S6D Fig). Separate variant calling identified no additional

mutation compared to combined variant calling, but missed three mutations in frozen samples

and five mutations in FFPE samples. Hence, even in high-depth amplicon sequencing data,

processing samples in parallel appears to be beneficial.

Discussion

We performed amplicon sequencing of hot-spot mutational regions in cancer related genes in

clinical samples from 16 patients with metastatic CRC with Illumina MiSeq. From ten patients,

we compared results of fresh frozen and FFPE tissue and observed a high concordance of vari-

ant calls using GATK Unified Genotyper pipeline with adapted filter criteria and processing

variant calling on all samples in parallel. Thereby, we illustrate the general feasibility of ampli-

con sequencing in FFPE tissue. However, we observed marked differences among tested variant

calling pipelines even in this small dataset, highlighting the importance of benchmarking and

development of more robust variant calling methods. Moreover, preparation of sequencing

libraries with DNA from low quality FFPE samples remains challenging. Here, we prepared se-

quencing libraries also for samples with poor quality by increasing input DNA and demon-

strated successful library amplification by analysis with Agilent Bioanalyzer. However, we

observed that samples with poor DNA quality also had lower sequencing coverage and were

more problematic for variant calling.

Observed mutation frequencies were in line with literature data for APC, TP53, PIK3CA

and KRAS being most frequently altered in CRC [1]. TP53 but not APC had the highest muta-

tion frequency in our cohort, which may be due to the fact that we sequenced metastases and

TP53 mutation is known to be a comparably late step in the carcinogenic process [15]. APC

mutations occurred in lower frequency than expected. It is likely that mutations in regions not

targeted by our approach have been missed.

Mutation calling in NGS data is challenging due to various potential sources of error, in-

cluding not only sequencing errors, but also artifacts occurring during PCR amplification, in-

correct local alignment or problems due to tumor heterogeneity [16]. According to the data

presented here, concordance of sequencing results with different variant calling pipelines was

generally low. A pipeline based on Unified Genotyper by GATK and SnpEff was used and

compared with the output of SAMtools/BCFtools and the Illumina Somatic Variant Caller

regarding both FFPE and frozen samples in direct comparison. The former appeared to be less

sensitive than the Unified Genotyper pipeline; however, it also identified variants not found by the

other pipelines. The latter method showed more variant calls than the other pipelines.

However, since many variant calls were present either only in the FFPE or the frozen sample of the

same patient, and since several mutations that are unexpected or unusual in CRC appeared, especially

in poor quality samples, it is very likely that many of the additional variants identified by this

pipeline are false positives. Compared with Sanger sequencing of KRAS exon 2, a high

concordance of Unified Genotyper pipeline results was shown. For one patient, we observed a

discordant mutation status by Sanger and Illumina sequencing. However, Sanger sequencing was

performed with material from the primary tumor and the sample from the liver metastasis had

an estimated tumor content of only 10%. Notably, no mutant reads were observed at the

KRAS exon 2 locus in the deep sequencing data. Hence, tumor heterogeneity or low tumor

content of the sequenced material may have led to the false-negative result. Other groups have

analyzed concordance of different variant calling pipelines in whole-exome and whole-genome

data [7,16–18]. O'Rawe et al. [7] sequenced 15 exomes and found a low concordance of only

57% for SNPs running five different analysis pipelines and 27% for Indels running three dif-

ferent analysis pipelines with near default parameters. Pabinger et al. [17] provided a broad

overview on software for NGS data analysis and tested 32 different programs for variant identi-

fication, annotation and visualization on four data sets including two cancer datasets. They

grouped tools into such categories as “germline callers” and “somatic callers”. The concordance

of five tested “germline callers” was low for SNPs and zero for Indels, while they also found no

common variant with three “somatic callers”, analyzing whole-exome datasets. These studies

highlight the problems of accurate variant identification in large-scale whole-exome data. Our

study, however, is to our knowledge the first to raise this issue in rather small-scale, targeted

amplicon sequencing data from clinical, formalin-fixed samples. Some authors have suggested

analyzing datasets with different variant callers and combining their results. However, comparing

the results of different tools that apply different quality metrics can be difficult and

time-consuming. Statistical methods, such as applying false-discovery-rate confidence values, have been

developed to rank mutation calls from different tools [19]. More elaborate experimental proce-

dures like sequencing replicates of normal tissue are necessary for such methods, which reduce

feasibility of such methods for clinical amplicon sequencing, for which in many retrospective

settings not even normal tissue is available as a reference. The same holds true for extensive val-

idation of data by applying different sequencing technologies, e.g. Sanger sequencing, to rule

out false-positive calls. This may in our view be a feasible approach to confirm novel SNPs cor-

related to inherited diseases in whole-exome studies of individual patients, but not for the

study of cancer genomes in retrospective studies on limited archival tumor material. Moreover,

the problem of false negatives cannot be overcome with this approach.

The study of FFPE material makes analysis even more difficult due to highly fragmented

DNA and SNP artifacts, for instance related to cytosine deamination to uracil [20]. Our data

indicate general feasibility of amplicon sequencing using FFPE tissue, demonstrating a high

concordance of variant calls in matched frozen and FFPE samples using GATK Unified Geno-

typer pipeline with adapted filter criteria. Notably, the two non-matching mutations (BRAF

V600E and ATM E1971G) were both identified in the FFPE sample but not in the frozen

sample of the same patient, suggesting false positive calls due to low DNA quality (the FFPE sample of

patient 09 had a comparatively high ΔCq value; compare S2 Table). Sanger sequencing of the BRAF

mutational hotspot in exon 15, however, revealed V600E mutation. Hence, variant calling of

amplicon sequencing led to a false negative result in frozen tissue, most likely due to low

amounts of tumor cells or tumor heterogeneity. Interestingly, six per cent mutated reads of

>10,000 reads at the BRAF V600E locus in the frozen sample were not sufficient for a mutation

call by Unified Genotyper pipeline.
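One possible contributing factor, offered here as an assumption and not as a finding of this study, is that a caller built around diploid genotypes weighs a homozygous-reference model against a heterozygous model with an expected allele fraction near 50%. The Python sketch below is a toy binomial calculation and not the Unified Genotyper implementation; the per-base error rate and the read counts (600 alternative reads out of 10,000) are illustrative values chosen to match roughly six per cent of about 10,000 reads.

```python
import math


def log_binom_likelihood(alt_reads: int, depth: int, p_alt: float) -> float:
    """Log-likelihood of the alt read count if each read shows alt with probability p_alt.

    The binomial coefficient is omitted because it is identical for every model.
    """
    ref_reads = depth - alt_reads
    return alt_reads * math.log(p_alt) + ref_reads * math.log(1.0 - p_alt)


depth, alt_reads = 10_000, 600   # illustrative: about 6% alternative reads
error_rate = 0.01                # hypothetical per-base error rate

models = {
    "hom_ref": error_rate,       # alternative reads arise only from errors
    "het": 0.5,                  # germline heterozygous expectation
    "subclonal_6pct": 0.06,      # somatic variant at a low allele fraction
}
scores = {name: log_binom_likelihood(alt_reads, depth, p) for name, p in models.items()}

for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name:<15} log-likelihood {score:10.1f}")
```

Under these assumptions, a comparison restricted to the two diploid genotypes prefers homozygous reference over heterozygous, whereas a model that allows low allele fractions fits the counts best. This would be consistent with low-fraction somatic variants being missed by a germline-oriented genotyper.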

Substantiating our results, a few authors have reported NGS studies using FFPE material,

also demonstrating feasibility of this approach [6,21–28]. For instance, Wagle et al. [26]

successfully applied exome capture sequencing to target 137 “druggable” mutations in 10 FFPE

samples from colon and breast cancer patients. Spencer et al. [6] provided the only study to date

directly comparing paired frozen and FFPE samples from 16 patients with lung cancer, using

hybridization capture enrichment for sequencing 27 cancer related genes. They also used Unified

Genotyper from GATK for variant calling. They found greater coverage variability and

increased C to T transitions in FFPE samples, while base calls between paired frozen and FFPE

samples had concordances as high as 99%.

An important issue for sequencing of FFPE samples is DNA preparation, qualification and

library preparation [29]. It is recommended to measure DNA with Qubit (and NanoDrop) as-

says to assess purity and quantity of DNA [29]. As in previous reports, quantity of DNA isolat-

ed from our samples measured by NanoDrop differed from Qubit results, with NanoDrop

generally overestimating DNA amounts [29]. Moreover, it has been recommended to quantify

the “functional”, amplifiable DNA content of DNA isolated from FFPE tissue before applying

to NGS techniques, especially to PCR based amplicon sequencing. Sah et al. [30] reported that

a qPCR based assay (QFI-PCR), similar to the FFPE quality assay (Illumina) that we performed

on our samples, could identify poorest quality samples. Moreover, similar to our approach,

samples with low amounts of amplifiable DNA could be “rescued” by increasing input

amounts. According to our data, the amount of amplifiable DNA (represented by a low

ΔCq-value) in FFPE samples correlates with the amount of properly amplified library DNA. We

also could rescue some of the samples that had libraries with low amounts of properly ampli-

fied DNA by increasing DNA input. Since we used maximum DNA input, the ideal increase in

DNA amount for poor samples and also the cut-off to exclude poorest samples remains to be

defined. This is especially relevant because the input of precious DNA should be kept as low as possible.

Interestingly, the samples with lowest amounts of amplifiable DNA also had a higher number of

variant calls and a markedly increased number of false positive calls according to data from

Sah et al. [30], indicating that the amount of amplifiable DNA is also a surrogate for general

DNA quality. In our dataset, similar effects could be observed. The depth of sequencing was

lower for samples with high ΔCq-values. Moreover, samples with low amounts of properly

amplified library DNA, as measured by Bioanalyzer, tended to have many (most likely

false-positive) variant calls when analyzed with the Somatic Variant Caller pipeline (compare S2 Table,

Fig 4, S5 Table). However, with the Unified Genotyper variant calling pipeline and strict variant

filtering, this effect could be diminished. Thus, poor sample quality can in part be compensated

with advanced bioinformatics methods. Nevertheless, this pipeline appears to have the problem

of enhanced false negatives in samples with poor quality compared to the Somatic Variant

Caller pipeline (compare Fig 4), indicating that variant calling remains problematic especially

in samples with poor DNA quality. In any case, best possible sample preparation is crucial to

allow optimal results in variant calling. Methods for biochemical modification of DNA from

FFPE tissues have been proposed [20], however, more data is needed to verify these interesting

results before implementing into current practice. Criteria for excluding poor quality samples

from sequencing also have to be refined. A recommended cut-off of ΔCq >2 would have ex-

cluded almost half of our samples, which is not satisfactory for clinical studies with rare patient

material. Further, larger studies would need to be employed to identify potentially suitable cut-

off values. Remarkably, large differences exist between the recommended amounts of input

DNA between Illumina TruSeq amplicon panel (>250ng) and IonTorrent AmpliSeq panel, for

which libraries can be prepared with as little as 10ng DNA.

Our study has several limitations. First, only a small set of patients has been analyzed. Larger

series of FFPE samples have to be sequenced to show feasibility for routine practice and clinical

studies. Moreover, many other variant calling algorithms and pipelines are available, that are

steadily improving. Nevertheless, we believe that the problems of reproducibility of variant calling

can be well demonstrated with this small dataset of matched frozen and FFPE samples, exempli-

fied by the tested variant calling pipelines. Almost all of our samples were from 2012. A larger

amount of older FFPE samples would have to be analyzed to define their usability for clinical se-

quencing. Data from previous studies, however, suggests that age does not generally have a big

impact on DNA quality, but rather fixation time in formalin seems to be of major importance [6].

Conclusions

In conclusion, our data shows that amplicon sequencing of clinical CRC samples is a viable ap-

proach to characterize druggable, predictive or prognostic mutations in the cancer genome. A

high concordance between mutations identified in frozen tissue and paired FFPE samples does

furthermore suggest that archived tissues from pathology departments can also be used for

genomic profiling with this method. However, bioinformatic pipelines for data analysis still show

marked differences in results. Moreover, dedicated sample and library preparation and qualifi-

cation, including exclusion of poorest quality samples, have to be done. For the use of amplicon

sequencing in routine diagnostics or in clinical studies, gold standard methods have to be de-

fined, which should lead to higher reproducibility.

Supporting Information

S1 Fig. Sequencing libraries produced with low-quality DNA. Bioanalyzer was used to mea-

sure amounts of DNA by fragment length of DNA from FFPE samples during the library prep-

aration workflow. Three representative samples are shown: Patient 04 had high levels of DNA

with the desired DNA fragment size of ~310bp and low amounts of short DNA fragments

<100bp after initial library preparation with standard input amounts. Patients 13 and

15 are examples for low quality DNA with low amounts of DNA around 310bp and high

amounts of highly fragmented DNA. Library preparation was repeated for those samples using

maximum DNA input (compare S2 Table), which led to significantly higher concentrations of

correct sized DNA for sample 13, but not for sample 15. Highly increased background, short-

fragment DNA was shown to be reduced after the PCR clean-up step. Patient 13 was then se-

quenced, while patient 15 was excluded.

(TIF)

S2 Fig. Functional DNA amount correlates with GC-content. (A) ΔCq-values of quality con-

trol PCR of FFPE samples are plotted against GC content of sample DNA. (B) GC content of

sample DNA and mean depth of sequencing of FFPE samples analyzed.

(TIF)

S3 Fig. No KRAS mutation identified by amplicon sequencing in patient 02. Representative

image of reads mapped to the site of KRAS exon 2 with no mutated reads detected at the

mutational hot-spot at position c.38 (codon 13) in FFPE tissue from the liver metastasis, displayed with the

Integrative Genomics Viewer. KRAS mutation had been detected in the primary tumor by Sanger

sequencing. The expected mutational locus is indicated by black lines.

(TIF)

S4 Fig. Filtering removes false positive variant calls in cell lines analyzed by deep amplicon

sequencing. (A,B) Variant calling of deep amplicon sequencing data from cell lines HCT116,

DLD-1, SW480, HUH7, HT55, HEK293T and HS68 was performed with GATK Unified Geno-

typer (UG), SamTools/BcfTools (SAM) or Illumina Somatic Variant Caller (SVC) without any

filtering of variants (A) or with exclusion of variants below defined quality thresholds, synony-

mous and non-coding variants, as well as variants present in the 1000G data (B). Concordance

of genomic variant loci identified with the three pipelines was analyzed with jvenn. (C,D)

Overlap of variant loci identified in HCT116, HT55, HUH7 and SW480 with the GATK Unified

Genotyper pipeline with variant loci detected by the Cell Line Encyclopedia Project [13] is

shown without (C) or with (D) filtering out variants below quality thresholds, synonymous

and non-coding variants, as well as variants present in the 1000G data. (E,F) Overlap of variant

loci identified in HCT116, HT55 and HUH7 with the GATK Unified Genotyper pipeline with

variant loci detected by the COSMIC cell line project [14] is shown without (E) or with (F) fil-

tering out low quality variants, synonymous and non-coding variants, as well as non-harmful

variants present in the 1000G data.

(TIF)

S5 Fig. Concordance of variant loci in frozen and FFPE samples analyzed with three

different variant calling pipelines with and without filtering. (A) Variant calling of sequencing

data from matched frozen and FFPE samples was performed with GATK Unified Genotyper

(UG), SamTools/BcfTools (SAM) or Illumina Somatic Variant Caller (SVC) without any

filtering of variants. The overlap of genomic variant loci identified in each group is shown. Below, the

number of variant loci identified in each group is outlined. (B) Variants from (A) were

annotated and variants with low quality metrics, synonymous and non-coding variants, as well as

variants present in the 1000G data were filtered out. Again, the overlap of genomic variant loci

identified in each group is shown. Below, the number of variant loci identified in each group

is outlined. Fields with “0” overlap are left empty.

(TIF)

S6 Fig. Variant calling is more sensitive when samples are processed together compared

with analyzing each sample individually. (A, B) GATK Unified Genotyper pipeline with variant

calling in all analyzed samples together or (C, D) separately. Green color represents FFPE

samples, red represents frozen, color intensities represent number of non-synonymous coding

mutations per gene.

(TIF)

S1 Table. Patients.

(PDF)

S2 Table. Sample and library preparation.

(PDF)

S3 Table. List of amplicons and targeted regions.

(XLSX)

S4 Table. Patient sequencing statistics.

(XLSX)

S5 Table. Mutations identified in primary tumors with different variant calling pipelines.

(PDF)

S6 Table. Mutations identified in cell lines with different variant calling pipelines.

(PDF)

S7 Table. All variants identified in analyzed samples.

(XLSX)

S8 Table. All unfiltered variants identified in analyzed samples.

(XLSX)

Author Contributions

Conceived and designed the experiments: JB GK SP KH MB. Performed the experiments: JB

TM SL GE TG. Analyzed the data: JB GK TZ MPE MB. Contributed reagents/materials/analy-

sis tools: CLG SP KH. Wrote the paper: JB GK TG MPE KH MB.

References

1. Cancer Genome Atlas Network. Comprehensive molecular characterization of human colon and rectal

cancer. Nature. 2012; 487: 330–337. doi: 10.1038/nature11252 PMID: 22810696

2. Wood LD, Parsons DW, Jones S, Lin J, Sjoblom T, Leary RJ, et al. The Genomic Landscapes of

Human Breast and Colorectal Cancers. Science. 2007; 318: 1108–1113. PMID: 17932254

3. Fearon ER, Vogelstein B. A genetic model for colorectal tumorigenesis. Cell. 1990; 61: 759–767.

PMID: 2188735

4. Han S-W, Kim H-P, Shin J-Y, Jeong E-G, Lee W-C, Lee K-H, et al. Targeted Sequencing of Cancer-Re-

lated Genes in Colorectal Cancer Using Next-Generation Sequencing. PLoS ONE. 2013; 8: e64271.

doi: 10.1371/journal.pone.0064271 PMID: 23700467

5. Tougeron D, Lecomte T, Pages JC, Villalva C, Collin C, Ferru A, et al. Effect of low-frequency KRAS

mutations on the response to anti-EGFR therapy in metastatic colorectal cancer. Ann Oncol. 2013; 24:

1267–1273. doi: 10.1093/annonc/mds620 PMID: 23293113

6. Spencer DH, Sehn JK, Abel HJ, Watson MA, Pfeifer JD, Duncavage EJ. Comparison of clinical targeted

next-generation sequence data from formalin-fixed and fresh-frozen tissue specimens. J Mol Diagn.

2013; 15: 623–633. doi: 10.1016/j.jmoldx.2013.05.004 PMID: 23810758

7. O'Rawe J, Jiang T, Sun G, Wu Y, Wang W, Hu J, et al. Low concordance of multiple variant-calling

pipelines: practical implications for exome and genome sequencing. Genome Med. 2013; 5: 28.

doi: 10.1186/gm432 PMID: 23537139

8. McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, et al. The Genome Analysis

Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res.

2010; 20: 1297–1303. doi: 10.1101/gr.107524.110 PMID: 20644199

9. Cingolani P, Platts A, Wang Le, Coon M, Nguyen T, Wang L, et al. A program for annotating and pre-

dicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila mel-

anogaster strain w (1118); iso-2; iso-3. Fly. 2012; 6: 80–92. doi: 10.4161/fly.19695 PMID: 22728672

10. Bardou P, Mariette J, Escudié F, Djemiel C, Klopp C. jvenn: an interactive Venn diagram viewer. BMC

Bioinformatics 2014; 15:293. doi: 10.1186/1471-2105-15-293 PMID: 25176396

11. Thorvaldsdottir H, Robinson JT, Mesirov JP. Integrative Genomics Viewer (IGV): high-performance ge-

nomics data visualization and exploration. Brief Bioinform 2013; 14:178–192. doi: 10.1093/bib/bbs017

PMID: 22517427

12. Ahls MG, Niedergethmann M, Dinter D, Sauer C, Lüttges J, Post S, et al. Case report: Intraductal tubu-

lopapillary neoplasm of the pancreas with unique clear cell phenotype. Diagn Pathol 2014; 9:11. doi:

10.1186/1746-1596-9-11 PMID: 24443801

13. Barretina J, Caponigro G, Stransky N, Venkatesan K, Margolin AA, Kim S, et al. The Cancer Cell Line

Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature 2012; 483:603–607.

doi: 10.1038/nature11003 PMID: 22460905

14. Forbes SA, Beare D, Gunasekaran P, Leung K, Bindal N, Boutselakis H, et al. COSMIC: exploring the

world's knowledge of somatic mutations in human cancer. Nucleic Acids Research 2015;

43:D805–811. doi: 10.1093/nar/gku1075 PMID: 25355519

15. Baker SJ, Preisinger AC, Jessup JM, Paraskeva C, Markowitz S, Willson JK, et al. p53 gene mutations

occur in combination with 17p allelic deletions as late events in colorectal tumorigenesis. Cancer Res

1990; 50:7717–7722. PMID: 2253215

16. Kim SY, Speed TP. Comparing somatic mutation-callers: beyond Venn diagrams. BMC Bioinformatics

2013; 14:189. doi: 10.1186/1471-2105-14-189 PMID: 23758877

17. Pabinger S, Dander A, Fischer M, Snajder R, Sperk M, Efremova M, et al. A survey of tools for variant

analysis of next-generation genome sequencing data. Brief Bioinform 2014; 15:256–278.

doi: 10.1093/bib/bbs086 PMID: 23341494

18. Roberts ND, Kortschak RD, Parker WT, Schreiber AW, Branford S, Scott HS, et al. A comparative

analysis of algorithms for somatic SNV detection in cancer. Bioinformatics 2013; 29:2223–2230.

doi: 10.1093/bioinformatics/btt375 PMID: 23842810

19. Löwer M, Renard BY, de Graaf J, Wagner M, Paret C, Kneip C, et al. Confidence-based Somatic

Mutation Evaluation and Prioritization. PLoS Comput Biol 2012; 8:e1002714.

doi: 10.1371/journal.pcbi.1002714 PMID: 23028300

20. Do H, Dobrovic A. Dramatic reduction of sequence artefacts from DNA isolated from formalin-fixed

cancer biopsies by treatment with uracil-DNA glycosylase. Oncotarget 2012; 3:546–558. PMID: 22643842

21. Pritchard CC, Salipante SJ, Koehler K, Smith C, Scroggins S, Wood B, et al. Validation and

Implementation of Targeted Capture and Sequencing for the Detection of Actionable Mutation, Copy Number

Variation, and Gene Rearrangement in Clinical Cancer Specimens. J Mol Diagn 2013.

doi: 10.1016/j.jmoldx.2013.08.004

22. Frampton GM, Fichtenholtz A, Otto GA, Wang K, Downing SR, He J, et al. Development and validation

of a clinical cancer genomic profiling test based on massively parallel DNA sequencing. Nat. Biotech-

nol. 2013; 31:1023–1031. doi: 10.1038/nbt.2696 PMID: 24142049

23. Endris V, Penzel R, Warth A, Muckenhuber A, Schirmacher P, Stenzinger A, et al. Molecular diagnostic

profiling of lung cancer specimens with a semiconductor-based massive parallel sequencing approach:

feasibility, costs, and performance compared with conventional sequencing. J Mol Diagn 2013;

15:765–775. doi: 10.1016/j.jmoldx.2013.06.002 PMID: 23973117

24. Becker K, Vollbrecht C, Koitzsch U, Koenig K, Fassunke J, Huss S, et al. Deep ion sequencing of amplicon adapter ligated libraries: a novel tool in molecular diagnostics of formalin fixed and paraffin embedded tissues. J Clin Pathol 2013; 66:803–806. doi: 10.1136/jclinpath-2013-201549 PMID: 23618693

25. Hadd AG, Houghton J, Choudhary A, Sah S, Chen L, Marko AC, et al. Targeted, high-depth, next-generation sequencing of cancer genes in formalin-fixed, paraffin-embedded and fine-needle aspiration tumor specimens. J Mol Diagn 2013; 15:234–247. doi: 10.1016/j.jmoldx.2012.11.006 PMID: 23321017

26. Wagle N, Berger MF, Davis MJ, Blumenstiel B, Defelice M, Pochanard P, et al. High-throughput detection of actionable genomic alterations in clinical tumor samples by targeted, massively parallel sequencing. Cancer Discov 2012; 2:82–93. doi: 10.1158/2159-8290.CD-11-0184 PMID: 22585170

27. Kerick M, Isau M, Timmermann B, Sültmann H, Herwig R, Krobitsch S, et al. Targeted high throughput sequencing in clinical cancer settings: formaldehyde fixed-paraffin embedded (FFPE) tumor tissues, input amount and tumor heterogeneity. BMC Med Genomics 2011; 4:68. doi: 10.1186/1755-8794-4-68 PMID: 21958464

28. Schweiger MR, Kerick M, Timmermann B, Albrecht MW, Borodina T, Parkhomchuk D, et al. Genome-Wide Massively Parallel Sequencing of Formaldehyde Fixed-Paraffin Embedded (FFPE) Tumor Tissues for Copy-Number- and Mutation-Analysis. PLoS ONE 2009; 4:e5548. doi: 10.1371/journal.pone.0005548 PMID: 19440246

29. Simbolo M, Gottardi M, Corbo V, Fassan M, Mafficini A, Malpeli G, et al. DNA Qualification Workflow for Next Generation Sequencing of Histopathological Samples. PLoS ONE 2013; 8:e62692. doi: 10.1371/journal.pone.0062692 PMID: 23762227

30. Sah S, Chen L, Houghton J, Kemppainen J, Marko AC, Zeigler R, et al. Functional DNA quantification guides accurate next-generation sequencing mutation detection in formalin-fixed, paraffin-embedded tumor biopsies. Genome Med 2013; 5:77. doi: 10.1186/gm481 PMID: 24001039

Supplementary Figures and Tables

Data

May 2015

Johannes Betge · Grainne Kerr · Thilo Miersch · Svenja Leible · Michael Boutros

Why Formalin-fixed, Paraffin-embedded Biospecimens Must Be Used in Genomic Medicine: An Evidence-based Review and Conclusion

Article

Jul 2020

Fresh-frozen tissue is the “gold standard” biospecimen type for next-generation sequencing (NGS). However, collecting frozen tissue is usually not feasible because clinical workflows deliver formalin-fixed, paraffin-embedded (FFPE) tissue blocks. Some clinicians and researchers are reticent to embrace the use of FFPE tissue for NGS because FFPE tissue can yield low quantities of degraded DNA, containing formalin-induced mutations. We describe the process by which formalin-induced deamination can lead to artifactual cytosine (C) to thymine (T) and guanine (G) to adenine (A) (C:G > T:A) mutation calls and perform a literature review of 17 publications that compare NGS data from patient-matched fresh-frozen and FFPE tissue blocks. We conclude that although it is indeed true that sequencing data from FFPE tissue can be poorer than those from frozen tissue, any differences occur at an inconsequential magnitude, and FFPE biospecimens can be used in genomic medicine with confidence.

Comparison of Fresh Frozen Tissue With Formalin-Fixed Paraffin-Embedded Tissue for Mutation Analysis Using a Multi-Gene Panel in Patients With Colorectal Cancer

Article

Mar 2020

Background: Next generation sequencing (NGS)-based multi-gene panel tests have been performed to predict the treatment response and prognosis in patients with colorectal cancer (CRC). Whether the multi-gene mutation results of formalin-fixed paraffin-embedded (FFPE) tissues are identical to those of fresh frozen tissues remains unknown. Methods: A 22-gene panel with 103 hotspots was used to detect mutations in paired fresh frozen tissue and FFPE tissue from 118 patients with CRC. Results: In our study, 117 patients (99.2%) had one or more variants, with 226 variants in FFPE tissue and 221 in fresh frozen tissue. Of the 129 variants identified in this study, 96 variants were present in both FFPE and fresh frozen tissues; 27 variants were found in FFPE tissues only; 6 variants were found only in fresh frozen tissues. The mutation results demonstrated >94.0% concordance in all variants, with Kappa coefficient >0.500 in 64.3% (83/129) of variants. At the gene level, concordance ranged from 73.8 to 100.0%, with Kappa coefficient >0.500 in 81.3% (13/16) of genes. Conclusions: The results of mutation analysis performed with a multi-gene panel and FFPE and fresh frozen tissue were highly concordant in patients with CRC, at both the variant and gene levels. There were, however, some important differences in mutation results between the two tissue types. Therefore, fresh frozen tissue should not routinely be replaced with FFPE tissue for mutation analysis with a multi-gene panel. Rather, FFPE tissue is a reasonable alternative for fresh frozen tissue when the latter is unavailable.

An Effective and Viable DNA Extraction Protocol for FFPE Tissues and its Effect on Downstream Molecular Application

Article

Jan 2020

Novel receptor tyrosine kinase mutations in rare paranasal sinus cancers and their potential functional implications

Article

Mar 2023

Aim: Paranasal sinus cancers are a heterogeneous group of very rare diseases. Maxillary sinus squamous cell carcinoma is the most common anatomical and histological subtype of paranasal sinus cancer. Because knowledge of the genetic profile of this cancer is limited, patients cannot benefit from targeted treatment options. This study aimed to identify receptor tyrosine kinase mutations in this rare cancer and to predict their possible functional effects. Materials and Methods: DNA was isolated from FFPE tumor tissue of 30 cases, and the mutation profile of the cases was determined by next-generation sequencing and bioinformatic evaluation. The functional effects of the identified pathogenic/likely pathogenic variants were predicted using different in silico tools. Results: At least one pathogenic/likely pathogenic KIT, PDGFRA, or RET mutation was identified in all cases. Mutations in the catalytic domain of the KIT gene were predicted to increase kinase activity. The p.P567P and p.D1074D mutations in the PDGFRA gene were detected in all 30 cases and in all reads from normal tissues obtained from the SRA database. Conclusion: The finding that receptor tyrosine kinase mutations may also play an important role in paranasal sinus cancers is significant, particularly because it could make treatment approaches that target increased kinase activity available to these patients.

Performance of 16S Metagenomic Profiling in Formalin-Fixed Paraffin-Embedded versus Fresh-Frozen Colorectal Cancer Tissues

Article

Oct 2021

Formalin-fixed, paraffin-embedded (FFPE) tissues represent the most widely available clinical material to study colorectal cancer (CRC). However, the accuracy and clinical validity of FFPE microbiome profiling in CRC is uncertain. Here, we compared the microbial composition of 10 paired fresh-frozen (FF) and FFPE CRC tissues using 16S rRNA sequencing and RNA-ISH. Both sample types showed different microbial diversity and composition. FF samples were enriched in archaea and representative CRC-associated bacteria, such as Firmicutes, Bacteroidetes and Fusobacteria. Conversely, FFPE samples were mainly enriched in typical contaminants, such as Sphingomonadales and Rhodobacterales. RNA-ISH in FFPE tissues confirmed the presence of CRC-associated bacteria, such as Fusobacterium and Bacteroides, as well as Propionibacterium allowing discrimination between tumor-associated and contaminant taxa. An internal quality index showed that the degree of similarity within sample pairs inversely correlated with the dominance of contaminant taxa. Given the importance of FFPE specimens for larger studies in human cancer genomics, our findings may provide useful indications on potential confounding factors to consider for accurate and reproducible metagenomics analyses.

The Mutational Concordance of Fixed Formalin Paraffin Embedded and Fresh Frozen Gastro-Oesophageal Tumours Using Whole Exome Sequencing

Article

Jan 2021

1. Background: The application of massively parallel sequencing has led to the identification of aberrant druggable pathways and somatic mutations within therapeutically relevant genes in gastro-oesophageal cancer. Given the widespread use of formalin-fixed paraffin-embedded (FFPE) samples in the study of this disease, it would be beneficial, especially for the purposes of biomarker evaluation, to assess the concordance between comprehensive exome-wide sequencing data from archival FFPE samples originating from a prospective clinical study and those derived from fresh-frozen material. 2. Methods: We analysed whole-exome sequencing data to define the mutational concordance of 16 matched fresh-frozen and FFPE gastro-oesophageal tumours (N = 32) from a prospective clinical study. We assessed DNA integrity prior to sequencing and then identified coding mutations in genes that have previously been implicated in other cancers. In addition, we calculated the mutant-allele heterogeneity (MATH) for these samples. 3. Results: Although there was increased degradation of DNA in FFPE samples compared with frozen samples, sequencing data from only two FFPE samples failed to reach an adequate mapping quality threshold. Using a filtering threshold of mutant read counts of at least ten and a minimum of 5% variant allele frequency (VAF) we found that there was a high median mutational concordance of 97% (range 80.1-98.68%) between fresh-frozen and FFPE gastro-oesophageal tumour-derived exomes. However, the majority of FFPE tumours had higher mutant-allele heterogeneity (MATH) scores when compared with corresponding frozen tumours (p < 0.001), suggesting that FFPE-based exome sequencing is likely to over-represent tumour heterogeneity in FFPE samples compared to fresh-frozen samples. Furthermore, we identified coding mutations in 120 cancer-related genes, including those associated with chromatin remodelling and Wnt/β-catenin and Receptor Tyrosine Kinase signalling. 4. Conclusions: These data suggest that comprehensive genomic data can be generated from exome sequencing of selected DNA samples extracted from archival FFPE gastro-oesophageal tumour tissues within the context of prospective clinical trials.

Pisces: An Accurate and Versatile Variant Caller for Somatic and Germline Next-Generation Sequencing Data

Preprint

Apr 2018

Motivation: Next-Generation Sequencing (NGS) technology is transitioning quickly from research labs to clinical settings. The diagnosis and treatment selection for many acquired and autosomal conditions necessitate a method for accurately detecting somatic and germline variants, suitable for the clinic. Results: We have developed Pisces, a rapid, versatile and accurate small variant calling suite designed for somatic and germline amplicon sequencing applications. Pisces accuracy is achieved by four distinct modules, the Pisces Read Stitcher, Pisces Variant Caller, the Pisces Variant Quality Recalibrator, and the Pisces Variant Phaser. Each module incorporates a number of novel algorithmic strategies aimed at reducing noise or increasing the likelihood of detecting a true variant. Availability: Pisces is distributed under an open source license and can be downloaded from https://github.com/Illumina/Pisces. Pisces is available on the BaseSpace™ SequenceHub as part of the TruSeq Amplicon workflow and the Illumina Ampliseq Workflow. Pisces is distributed on Illumina sequencing platforms such as the MiSeq™, and is included in the Praxis™ Extended RAS Panel test which was recently approved by the FDA for the detection of multiple RAS gene mutations. Contact: pisces@illumina.com. Supplementary information: Supplementary data are available online.

Special Issue "Multimodality Treatments in Metastatic Gastric Cancer", The Journal of Clinical Medicine (ISSN 2077-0383, IF = 3.303); Message from the guest editor

Research Proposal

Apr 2021

Angelica Petrillo

Dear Colleagues, Gastric cancer represents one of the most frequent and lethal tumors worldwide today, finding itself in the fifth place in incidence and the third in mortality. Surgery remains the only curative treatment for localized tumors, but only 20% of patients are suitable for surgery due to the lack of specific symptoms and the late diagnosis, especially in Western countries. Additionally, even in patients who receive curative treatment, rates of locoregional relapse and distant metastasis remain high. Palliative chemotherapy is the principal treatment in cases of metastatic disease even if the prognosis of patients receiving chemotherapy is still poor. Therefore, a multidisciplinary evaluation is important in order to improve the efficacy of active treatments. In this context, there is an unmet need for a better understanding of genetic alterations and prognostic and predictive factors in order to choose the best tailored therapy for each patient. The aim of this Special Issue is to focus on the results and problems of multimodality treatment in metastatic gastric cancer, the search for prognostic and predictive factors, and the evaluation of novel strategies for individualized treatment. We are inviting relevant original research, systematic reviews, meta-analyses, and short communications covering the above-mentioned topics. https://www.mdpi.com/journal/jcm/special_issues/Gastric_Treatments

Next-Generation Sequencing and Image-Guided Tissue Sampling: A Primer for Interventional Radiologists

Article

Mar 2023

The discovery of increasing numbers of actionable molecular and gene targets for cancer treatment has driven the demand for tissue sampling for next-generation sequencing (NGS). Requirements for sequencing can be very specific, and inadequate sampling leads to delays in management and decision making. It is important that interventional radiologists are aware of NGS technologies and their common applications and be cognizant of the factors that contribute to successful sample sequencing. This review summarizes the fundamentals of cancer tissue collection and processing for NGS. It elaborates on sequencing technologies and their applications with the aim of providing readers with a working understanding that can enhance their clinical practice. It then describes imaging, tumor, biopsy, and sample collection factors that improve the chances of NGS success. Finally, it discusses future practice, highlighting the problem of undersampling in both clinical and research settings and the opportunities within interventional radiology to address this.

Minimizing Sample Failure Rates for Challenging Clinical Tumor Samples

Article

Feb 2023

Identification of somatic variants in cancer by high-throughput sequencing has become common clinical practice largely because many of these variants may be predictive biomarkers for targeted therapies. However, there can be high sample quality control (QC) failure rates for some tests preventing the return of results. SLIMamp is a patented technology that has been incorporated into commercially available cancer NGS testing kits with the claimed advantage that these kits can interrogate challenging formalin-fixed paraffin-embedded tissue (FFPET) samples with low tumor purity, poor DNA quality, and/or low input DNA, resulting in a high sample QC pass rate. The aim of this study was to substantiate that claim using Pillar's oncoReveal Solid Tumor Panel. 48 samples that had failed one or more pre-analytical QC sample parameters for whole exome sequencing (WES) from ATGC's ISO15189 accredited diagnostic genomics laboratory were acquired. XING Genomic Services (XGS) performed an exploratory data analysis to characterize the samples and then tested the samples in their ISO15189 accredited laboratory. Clinical reports could be generated for 37 samples (77%), of which 29 (60%) contained clinically actionable or significant variants that would not have otherwise been identified. 11 samples were deemed to be unreportable and the sequencing data were likely dominated by artefacts. A novel post-sequencing QC metric was developed which can discriminate between clinically reportable and unreportable samples.

Comprehensive molecular characterization of human colon and rectal cancer. Nature 487:330-337

Article

Jul 2012

To characterize somatic alterations in colorectal carcinoma, we conducted a genome-scale analysis of 276 samples, analysing exome sequence, DNA copy number, promoter methylation and messenger RNA and microRNA expression. A subset of these samples (97) underwent low-depth-of-coverage whole-genome sequencing. In total, 16% of colorectal carcinomas were found to be hypermutated: three-quarters of these had the expected high microsatellite instability, usually with hypermethylation and MLH1 silencing, and one-quarter had somatic mismatch-repair gene and polymerase ε (POLE) mutations. Excluding the hypermutated cancers, colon and rectum cancers were found to have considerably similar patterns of genomic alteration. Twenty-four genes were significantly mutated, and in addition to the expected APC, TP53, SMAD4, PIK3CA and KRAS mutations, we found frequent mutations in ARID1A, SOX9 and FAM123B. Recurrent copy-number alterations include potentially drug-targetable amplifications of ERBB2 and newly discovered amplification of IGF2. Recurrent chromosomal translocations include the fusion of NAV2 and WNT pathway member TCF7L1. Integrative analyses suggest new markers for aggressive colorectal carcinoma and an important role for MYC-directed transcriptional activation and repression.

Abstract 5142: COSMIC: Exploring the world's knowledge of somatic mutations in cancer.

Article

Oct 2014
NUCLEIC ACIDS RES

COSMIC, the Catalogue Of Somatic Mutations In Cancer (http://cancer.sanger.ac.uk) is the world's largest and most comprehensive resource for exploring the impact of somatic mutations in human cancer. Our latest release (v70; Aug 2014) describes 2 002 811 coding point mutations in over one million tumor samples and across most human genes. To emphasize depth of knowledge on known cancer genes, mutation information is curated manually from the scientific literature, allowing very precise definitions of disease types and patient details. Combination of almost 20 000 published studies gives substantial resolution of how mutations and phenotypes relate in human cancer, providing insights into the stratification of mutations and biomarkers across cancer patient populations. Conversely, our curation of cancer genomes (over 12 000) emphasizes knowledge breadth, driving discovery of unrecognized cancer-driving hotspots and molecular targets. Our high-resolution curation approach is globally unique, giving substantial insight into molecular biomarkers in human oncology. In addition, COSMIC also details more than six million noncoding mutations, 10 534 gene fusions, 61 299 genome rearrangements, 695 504 abnormal copy number segments and 60 119 787 abnormal expression variants. All these types of somatic mutation are annotated to both the human genome and each affected coding gene, then correlated across disease and mutation types.

Jvenn: An interactive Venn diagram viewer

Article

Aug 2014
BMC BIOINFORMATICS

Background: Venn diagrams are commonly used to display list comparison. In biology, they are widely used to show the differences between gene lists originating from different differential analyses, for instance. They thus allow the comparison between different experimental conditions or between different methods. However, when the number of input lists exceeds four, the diagram becomes difficult to read. Alternative layouts and dynamic display features can improve its use and its readability. Results: jvenn is a new JavaScript library. It processes lists and produces Venn diagrams. It handles up to six input lists and presents results using classical or Edwards-Venn layouts. User interactions can be controlled and customized. Finally, jvenn can easily be embedded in a web page, which makes dynamic Venn diagrams possible. Conclusions: jvenn is an open source component for web environments helping scientists to analyze their data. The library package, which comes with full documentation and an example, is freely available at http://bioinfo.genotoul.fr/jvenn.

Case report: Intraductal tubulopapillary neoplasm of the pancreas with unique clear cell phenotype

Article

Jan 2014
Diagn Pathol

Intraductal tubulopapillary neoplasms of the pancreas are very rare tumors characterized by intraductal tubulopapillary growth, ductal differentiation, scant intracellular mucin production and cellular dysplasia. Here, we report the first case of an intraductal tubulopapillary neoplasm of the pancreas with clear cell morphology. The tumor was detected during the diagnostic work-up of acute pancreatitis in a 43-year-old female. Histological examination revealed a tumor with the typical architecture of an intraductal tubulopapillary neoplasm of the pancreas with tumor cells showing abundant clear cytoplasm and Di-PAS negativity. Immunohistochemistry revealed positivity for Pan-CK, CK7, CK8/18, MUC1, MUC6, carbonic anhydrase IX, CD10, EMA, β-catenin and e-cadherin. Sanger sequencing did not detect mutations for β-catenin, BRAF, KRAS, PIK3CA and GNAS. Altogether, histology, immunohistochemical expression profile (MUC1+, MUC6+, MUC2-, MUC5AC-, trypsin-, chymotrypsin-, CDX2-) and sequencing results led to the diagnosis of intraductal tubulopapillary neoplasm. However, the neoplasm consisted of cells showing abundant clear cytoplasm, a morphological pattern not described so far in the current classification of pancreatic intraductal neoplasms. Potential differential diagnosis and the molecular basis of clear cell morphology are discussed. In conclusion, we consider this tumor as intraductal tubulopapillary neoplasm of the pancreas with unique clear cell phenotype. After surgery and without adjuvant therapy, the patient’s clinical course has been uneventful for over two years now. Virtual slides: The virtual slide(s) for this article can be found here: http://www.diagnosticpathology.diagnomx.eu/vs/1051828790117127

Validation and Implementation of Targeted Capture and Sequencing for the Detection of Actionable Mutation, Copy Number Variation, and Gene Rearrangement in Clinical Cancer Specimens

Article

Nov 2013

Recent years have seen development and implementation of anticancer therapies targeted to particular gene mutations, but methods to assay clinical cancer specimens in a comprehensive way for the critical mutations remain underdeveloped. We have developed UW-OncoPlex, a clinical molecular diagnostic assay to provide simultaneous deep-sequencing information, based on >500× average coverage, for all classes of mutations in 194 clinically relevant genes. To validate UW-OncoPlex, we tested 98 previously characterized clinical tumor specimens from 10 different cancer types, including 41 formalin-fixed paraffin-embedded tissue samples. Mixing studies indicated reliable mutation detection in samples with ≥10% tumor cells. In clinical samples with ≥10% tumor cells, UW-OncoPlex correctly identified 129 of 130 known mutations [sensitivity 99.2%, (95% CI, 95.8%-99.9%)], including single nucleotide variants, small insertions and deletions, internal tandem duplications, gene copy number gains and amplifications, gene copy losses, chromosomal gains and losses, and actionable genomic rearrangements, including ALK-EML4, ROS1, PML-RARA, and BCR-ABL. In the same samples, the assay also identified actionable point mutations in genes not previously analyzed and novel gene rearrangements of MLL and GRIK4 in melanoma, and of ASXL1, PIK3R1, and SGCZ in acute myeloid leukemia. To best guide existing and emerging treatment regimens and facilitate integration of genomic testing with patient care, we developed a framework for data analysis, decision support, and reporting clinically actionable results.

A genetic model for colorectal carcinogenesis

Article