Quantitation of next generation sequencing library preparation protocol efficiencies using droplet digital PCR assays - a systematic comparison of DNA library preparation kits for Illumina sequencing

METHODOLOGY ARTICLE Open Access
Louise Aigrain*, Yong Gu and Michael A. Quail
Abstract
Background: The emergence of next-generation sequencing (NGS) technologies in the past decade has allowed
the democratization of DNA sequencing both in terms of price per sequenced base and ease of producing DNA
libraries. When it comes to preparing DNA sequencing libraries for Illumina, the current market leader, a plethora of
kits is available and it can be difficult for users to determine which kit is the most appropriate and efficient for
their applications, the main concerns being not only cost but also minimal bias, yield and time efficiency.
Results: We compared 9 commercially available library preparation kits in a systematic manner using the same DNA
sample by probing the amount of DNA remaining after each protocol step using a new droplet digital PCR (ddPCR)
assay. This method allows the precise quantification of fragments bearing either adaptors or P5/P7 sequences on both
ends just after ligation or PCR enrichment. We also investigated the potential influence of DNA input and DNA fragment
size on the final library preparation efficiency. The overall library preparation efficiencies show important
variations between the different kits, with the ones combining several steps into a single one exhibiting final yields
4 to 7 times higher than the other kits. Detailed ddPCR data also reveal that the adaptor ligation yield itself varies by
more than a factor of 10 between kits, certain ligation efficiencies being so low that they could impair the original library
complexity and impoverish the sequencing results. When a PCR enrichment step is necessary, lower adaptor-ligated
DNA inputs lead to greater amplification yields, hiding the latent disparity between kits.
Conclusion: We describe a ddPCR assay that allows us to probe the efficiency of the most critical step in the library
preparation, ligation, and to draw conclusions on which kits are more likely to preserve the sample heterogeneity and
reduce the need for amplification.
Keywords: DNA library preparation, Next generation sequencing, NGS, Illumina sequencing, Droplet digital PCR
* Correspondence: la8@sanger.ac.uk
Wellcome Trust Sanger Institute, Wellcome Trust Campus, Hinxton, Cambs
CB10 1SA, UK
© 2016 The Author(s). Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0
International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and
reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to
the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver
(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
Aigrain et al. BMC Genomics (2016) 17:458
DOI 10.1186/s12864-016-2757-4
Content courtesy of Springer Nature, terms of use apply. Rights reserved.
Background
Laboratories preparing DNA for Illumina sequencing
have access to a large number of protocols and commercial
kits, and their number is constantly increasing. These
kits vary not only in price but also in their protocol.
Some of them follow the classical protocol of shearing,
end-repair, A-tailing, adaptor ligation and amplification
with clean-up between most or all steps, while others
have bespoke adaptor ligation steps, or combine several
of these steps into a single one, or don't even require
any amplification at all [1, 2]. The nature of the protocol
and reagents used might greatly affect the efficiency of
the library preparation, but very few laboratories conduct
a quantitative comparison between several available kits
before choosing the most appropriate one for their
specific application [3–5].
We developed an assay based on droplet digital
PCR (ddPCR) technology to measure the amount of
DNA remaining after each step of a protocol, as well
as the percentage of fragments bearing adaptors at
their ends after the ligation step, or P5/P7 primers
after amplification [6]. In contrast with qPCR, ddPCR
doesn't require the use of any standards to calculate
the absolute number of specific molecules in a sample
[4, 7–10]. This allows the quantification of not only
the overall yield, as normally done with qPCR, but
also of the yield of some critical intermediate steps
such as the adaptor ligation [11–14].
We present here the quantitative comparison of 9 kits:
NEBNext and NEBNext Ultra from New England
Biolabs, SureSelectXT from Agilent, Truseq Nano and
Truseq DNA PCR-free from Illumina, Accel-NGS 1S
and Accel-NGS 2S from Swift Biosciences, and KAPA
Hyper and KAPA HyperPlus from KAPA Biosystems. All
libraries were prepared using the same DNA sample
(barcoded amplicons from phiX174 [15]), and the different
kits were compared in terms of overall and stepwise
efficiencies, DNA loss, protocol length, flexibility
and complexity. We also noticed variations in the size of
the final libraries despite the use of identical bead ratios
during the clean-up steps. Our results should help
laboratories already in or entering the NGS field to
choose the most appropriate kit for their specific
applications and requirements.
Results
DNA library preparation kits for Illumina sequencing
We tested the 9 kits listed in Table 1 following the protocol
recommended in each manual but keeping the ratio of
beads during the clean-up steps, and the PCR reagents and
settings for the amplification step, identical between kits
in order to allow a direct comparison between the
ddPCR results. We made sure that these slight modifications
always remained within the ranges recommended by
the manufacturers. Table 2 summarises the overall protocol
for each of the kits and the total number of steps
required. The total number of steps correlates well with
the length of the library preparation in terms of both
overall preparation time and hands-on time. Combining
several steps into a single one, as is done in the NEBNext
Ultra and both KAPA kits, not only decreases the
overall preparation time but also improves the DNA
recovery, as most DNA loss occurs during bead clean-up
steps [16, 17]. The KAPA HyperPlus kit also contains a
fragmentase step instead of the classical mechanical
shearing step and post-shearing clean-up required
by every other kit [1, 3, 4, 18, 19]. After fragmentase
treatment, the sample can go straight into the end repair
and A-tailing step, improving the DNA recovery and
reducing overall preparation time even further.
Certain kits offer more flexibility than others when it
comes to the choice of adaptors. Every kit except the
KAPA ones provides its own adaptors; however, for
most of them users can choose to use their own if
necessary. All the adaptors tested in this study exhibit
an identical sequence in the first dozen double-stranded
bases directly involved in the ligation step, ensuring a
similar behaviour independently of the adaptor chosen
Table 1 List of the library preparation kits, DNA inputs and adaptors tested
Kit | Manufacturer | Reference | DNA inputs (ng) | Adaptors
NEBNext® | New England Biolabs® Inc. | Cat. #E6040S/L | 500 | Sanger ([1], current protocols)
NEBNext® Ultra | New England Biolabs® Inc. | Cat. #E7370S/L | 500 | Sanger
SureSelectXT | Agilent | Cat. #930075 | 500 | Sanger
Truseq® Nano | Illumina® | Cat. # FC-121-9010DOC, Part # 15041110 Rev. B | 500 & 100 | Sanger & Illumina
Truseq® DNA PCR-free | Illumina® | Cat. # FC-121-9006DOC, Part # 15036187 Rev. B | 500 | Sanger & Illumina
Accel-NGS 1S | Swift Biosciences | Cat. No. DL-ILM1S-12/48, Version 04291444 | 500 & 100 | Swift Biosciences
Accel-NGS 2S | Swift Biosciences | Cat. No. DL-ILM2-48, Version 01131444/2.8 | 500 & 20 | Swift Biosciences
KAPA Hyper | KAPA Biosystems | Cat. #KR0961 v1.14 | 500 | Sanger
KAPA HyperPlus | KAPA Biosystems | Cat. #KR1145 v14.1 | 500 & 20 | Sanger
(Additional file 1: Figure S1). The exception to this is
the kits from Swift Biosciences, where adaptor ligation is
split into 2 sequential steps, on one DNA strand and
then the other, making it difficult for users to use
their own adaptors.
Yields and DNA input
Our droplet digital PCR assay allowed us to probe the
amount of DNA remaining in each sample after A-tailing,
after adaptor ligation and after PCR (Figs. 1 and
2). We also measured the amount of adaptor-ligated
DNA after the ligation step and the amount of fragments
bearing P5 and P7 primers after the PCR step (Fig. 3). In
the case of Truseq DNA PCR-free, the adaptors used
already contained the P5 and P7 sequences, so that the
post-ligation sample is ready for sequencing.
During all the steps before ligation, little or no DNA
loss was observed except with the Truseq DNA PCR-free
kit, where more than 80 % of the initial DNA was lost
due to the more numerous and stringent bead clean-up steps
recommended (upper and lower Spri clean-ups, Fig. 4)
[17]. This explains why the user is advised to start with
1 μg of DNA for the Truseq DNA PCR-free protocol.
After adaptor ligation, we were able to probe both the
amount of DNA remaining and the efficiency of the
ligation reaction itself, which, as expected, was the most
critical step of all. Unfortunately, in the case of the Swift
Biosciences kits, we were not able to measure the
amount of DNA bearing adaptors at its ends due to
the specificity of the Swift Biosciences adaptor ligation
chemistry, which prevented us from using our own
adaptors and primers. For the other kits, the variation in
ligation efficiency was very marked, some kits exhibiting
Table 2 Description of the type and number of steps for each DNA library kit tested
Step columns, in order: End repair; Bead cleaning; A-tailing; Bead cleaning; Adaptor ligation; Bead cleaning; PCR & bead cleaning
Kit | Steps performed (marks in column order) | Number of steps after shearing
NEBNext | x x x x x x x | 8
NEBNext Ultra | 2 in 1 (a), x x x | 5
SureSelect | x x x x x x x | 8
Truseq Nano | x x x x x x | 7
Truseq DNA PCR-free | x x x (b) x x x | 6
Accel-NGS 1S (c) | Adaptase, 1st extension, x, 2nd extension, x, x | 7
Accel-NGS 2S (c) | 4 different steps + 4 bead cleaning, x | 10
KAPA Hyper (d) | 2 in 1 (a), x, x, (x) | 3 (or 5)
KAPA HyperPlus (d, e) | x, x, x, (x) | 3 (or 5)
(a) Both End-repair and A-tailing enzymes are combined in a single reaction mix
(b) Illumina recommends performing an upper and lower bead clean-up selection after the end repair step
(c) Swift Biosciences Accel protocols follow different chemical steps than the classical end-repair, A-tailing, adaptor ligation and PCR
(d) KAPA Hyper and KAPA HyperPlus protocols don't always require a PCR amplification step
(e) KAPA HyperPlus protocol starts with non-sheared DNA. The 1st step of the protocol corresponds to the enzymatic shearing of the DNA sample (fragmentase).
This fragmentase step leaves blunt-ended DNA fragments which don't require End-repair and can go straight to A-tailing without any bead clean-up
Fig. 1 Principle of ddPCR: the droplet generator creates an emulsion with the sample containing the DNA, PCR enzyme and buffer, specific
primers and Taqman probe (left). Only droplets containing a DNA fragment will exhibit a high fluorescence after the PCR amplification (middle).
The sample is then analysed with a droplet reader, which counts the number of fluorescent and empty droplets in a channel, corresponding to
the initial number of target molecules in the sample
Fig. 2 Schematic of the ddPCR assay to test the amount of DNA remaining at each step of the library preparation (using DNA fragment specific
primers shown by blue arrows [15]) and to measure the amount of DNA fragment bearing adaptors after the ligation step (using the adaptor
specific primers shown in orange) and P5/P7 primers after the amplification step (using the P5/P7 primers shown in green)
Fig. 3 Example of yield measurements obtained with the NEBNext kit when comparing the amount of DNA at each step versus the initial DNA
input (500 ng). Blue bars correspond to values measured using the PhiX specific primers and reflect the DNA loss at each step mainly due to
pipetting and bead clean up. The orange bar corresponds to the value measured with the adaptor specific primers and reflects the amount of
DNA in the sample bearing adaptor after the ligation step in comparison with the initial DNA input. The green bar corresponds to the value
measured with the P5/P7 primers and reflects the amount of sequencable DNA in the sample at the end of the library preparation
such low adaptor ligation yields that it could impair the
final complexity of the library (Fig. 4), while others
performed extremely well. This remains the case even when
looking at the yield in a stepwise manner (Additional file
1: Figure S2) rather than the overall yield. For NEBNext,
SureSelectXT, Illumina Truseq Nano and KAPA Hyper,
the ligation step yield varies from 15 to 40 %. A very low
step yield of 3.5 % was measured for NEBNext Ultra and,
in contrast, 100 % ligation efficiency was observed for
the KAPA HyperPlus kit.
Such variation in ligation efficiencies can be entirely
masked when focusing on the post-PCR yields. Most kits
exhibit an overall post-PCR yield between 100 and
150 % after 10 cycles of amplification when measuring
the amount of fragments bearing P5 and P7 primers versus
DNA input, with the exception of the KAPA HyperPlus
kit, for which the overall PCR yield is just above 800 %.
However, the stepwise yields of the PCR steps, when
compared with the amount of DNA bearing adaptors
just after the ligation, were much more variable, with
values ranging from 500 % to almost 4000 %. The yields
of the PCR step also appeared anticorrelated with the
yield of the ligation step.
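The distinction between overall and stepwise yields can be illustrated with a short sketch. The function names and the molecule counts below are hypothetical and chosen only for illustration; they are not measurements from this study.

```python
def overall_yield(molecules_at_step: float, input_molecules: float) -> float:
    """Yield of a step relative to the initial DNA input, in percent."""
    return 100.0 * molecules_at_step / input_molecules


def stepwise_yield(molecules_after: float, molecules_before: float) -> float:
    """Yield of a single step relative to what entered that step, in percent."""
    return 100.0 * molecules_after / molecules_before


# Hypothetical ddPCR counts (arbitrary units), not data from the paper.
dna_input = 1000.0        # PhiX-specific primers, before library preparation
adaptor_ligated = 50.0    # adaptor-specific primers, after ligation
p5_p7_after_pcr = 1250.0  # P5/P7 primers, after PCR

print(overall_yield(adaptor_ligated, dna_input))         # overall ligation yield
print(overall_yield(p5_p7_after_pcr, dna_input))         # overall post-PCR yield
print(stepwise_yield(p5_p7_after_pcr, adaptor_ligated))  # PCR step yield
```

With these made-up numbers, a poor overall ligation yield sits behind an overall post-PCR yield above 100 %, because the PCR step is evaluated against the small amount of adaptor-ligated DNA that entered it.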
For kits designed specifically for low DNA input, we
tested the same DNA input as for the other kits, 500 ng,
and compared it with lower inputs (100 ng or 20 ng). We
noticed that the ligation step was slightly more efficient
with the higher DNA input; however, the same high DNA
input led to lower PCR step yields (Additional file 1:
Figure S3). A high DNA input can indeed inhibit the PCR
amplification reaction, explaining the anticorrelation
observed between ligation and PCR yields; very efficient
Fig. 4 Bar charts showing the overall DNA library preparation yields of the different tested kits in comparison with the initial DNA input (500 ng).
Except where mentioned otherwise, all libraries were prepared using the original Illumina Paired end adaptor (also named Sanger adaptors) [6,
22]. After end repair and A-tailing, the DNA loss was estimated using the PhiX specific primers (1st column). After adaptor ligation, both the DNA
amount (PhiX primers, 2nd column) and the adaptor ligation efficiency (adaptor specific primers, 3rd column) were measured, except for the Accel
kits, for which we were not able to measure the adaptor ligation efficiency directly as the adaptor sequences were unknown; these are marked by a
star (NM for not measured). After PCR, both the total amount of DNA (PhiX primers, 4th column) and the amount of DNA bearing P5 and P7
primers at its ends were measured (P5 & P7 primers, last column)
ligation steps leading to high DNA input for the PCR step.
Other factors such as limiting dNTPs or primers during
amplification might also have a similar effect.
Bias on the fragment size
During this study, sample fragment sizes were assayed
using a Bioanalyzer to check the profile of the input
DNA (same DNA stock for all the samples) and the final
libraries [20]. We noticed that the profile of libraries
prepared with different kits varied significantly despite
the fact that both the DNA input and the bead clean-up
ratios were kept identical (except for the Truseq DNA
PCR-free kit, which recommends an upper and lower
Spri clean-up after end repair). All the libraries were
started with an equimolar ratio of the 3 PhiX DNA fragments
used, and we expected some slight variation after the
library preparation as the smallest DNA fragment might
be prone to more loss during the bead clean-up steps.
However, the variation observed between kits was much more
serious than just loss of the shorter fragments, as can
clearly be seen in the example Bioanalyzer
traces in Additional file 1: Figure S4.
To quantify this variation more accurately, we calculated
the ratio between the 3 PhiX fragments before
library preparation (equimolar ratio of ~33 % each) and
post library preparation. We then plotted the variation
between the pre- and post-library preparation ratios as
shown in Fig. 5. The libraries prepared with the Truseq
DNA PCR-free kit were not included in Fig. 5 due to the
difference in the protocols, which prevented us from making
any direct comparison.
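The ratio comparison described above can be sketched as follows. The exact formula used for Fig. 5 is not given in the text, so expressing the variation as a difference in percentage points is an assumption, as are the function names and the concentration values, which are made up for illustration.

```python
def molar_fractions(concentrations):
    """Convert molar concentrations of the fragments into percentages of the total."""
    total = sum(concentrations)
    return [100.0 * c / total for c in concentrations]


def ratio_variation(pre, post):
    """Difference (post minus pre) between molar fractions, in percentage points."""
    pre_pct = molar_fractions(pre)
    post_pct = molar_fractions(post)
    return [b - a for a, b in zip(pre_pct, post_pct)]


# Hypothetical Bioanalyzer molar concentrations for the 214, 397 and 568 bp fragments.
pre_library = [1.0, 1.0, 1.0]   # equimolar input, ~33 % each
post_library = [0.5, 1.0, 1.5]  # made-up post-library values

for size, delta in zip((214, 397, 568), ratio_variation(pre_library, post_library)):
    print(f"{size} bp: {delta:+.1f} percentage points")
```

Because the fractions always sum to 100 %, the three differences sum to zero: a loss of one fragment necessarily appears as a relative gain of the others.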
Data quality
All the libraries prepared during this study were sequenced
on an Illumina Miseq platform. To compare the
data quality of different libraries, we compared the error
rates such as insertions and mismatches (Fig. 6). While all
libraries performed well, with overall error rates lower than
0.2 %, we observed some differences between kits. Both
Accel kits exhibit higher error rates than the others, above
0.18 %, while all the other kits lead to error rates below
0.13 %. The main source of error for all the kits was always
mismatches; however, in the case of the Accel-NGS 2S
kit, insertions were also observed. Among the other kits,
the NEB, Agilent and KAPA ones had the best performance,
with error rates below 0.1 %.
Discussion
We compared the practicality, reproducibility and quality
of the libraries and sequencing data produced using 9
Fig. 5 Bar chart representing the percentage of variation between the 3 different PhiX fragments before and after library preparation with the
different kits tested. For each sample, the molar concentration of the 3 PhiX fragments was estimated before and after library preparation on a
Bioanalyzer chip. The ratio between the 3 fragments before library prep was close to 33 % for each, but important variations were observed after
library preparation. Here we plotted the difference between the pre- and post-library preparation ratio as a percentage. All the libraries were
prepared using the original Illumina Paired end adaptor [6, 22] except for the Truseq Nano kit and the Swift kits
different kits to prepare Illumina DNA libraries. By
practicality we mean the overall time required to prepare
a library, the hands-on time, and the number of steps
involved in the process [1, 3, 4]. In our experience, overall
preparation time correlates very well with the total number
of steps in a protocol when including clean-up steps.
Therefore any kit combining several steps into a single
one and limiting the number of clean-ups should be
favoured if preparation time is a critical parameter in the
project. The fastest protocols are the NEBNext Ultra kit
and the KAPA kits, particularly the KAPA HyperPlus, and
to a certain extent the Illumina Truseq DNA PCR-free.
Certain kits are specifically designed for low DNA input,
such as the NEBNext Ultra and Swift Accel-NGS 2S,
while others such as the KAPA ones accept a wide range
of DNA input from 1 ng to 1 μg. However, if the
ligation efficiency of a kit is very low (<15 %), as is the
case for the NEBNext Ultra kit, or if the DNA loss during
the library preparation is high (>50 %), as is the
case for the Accel kits, the final amount of sequencable
DNA becomes worryingly low. It is important to highlight
that this study focuses on evaluating the efficiency
of each step of different library preparation protocols
and that we did not directly assess the complexity of the
library. Bearing this in mind, the KAPA HyperPlus kit,
which exhibits a fully efficient adaptor ligation step and
less than 10 % DNA loss, appears to be the kit of choice for
any low DNA input sample.
The Truseq DNA PCR-free kit is the only one recommending
an input as high as 1 μg, due to the stringent
clean-up steps used to remove both too long and too short
DNA fragments from the library. Nonetheless, avoiding
any amplification step presents great advantages not
only in terms of preparation time but also in minimising
bias. The amplification step can indeed introduce artificial
mutations which are difficult to distinguish from real
SNPs [1, 21, 22]. The sample composition can also be
affected by polymerases preferentially amplifying certain
fragments over others, and this phenomenon can
become very pronounced for non-GC neutral samples
[13, 23–25]. Although certain enzymes have been shown
to exhibit very high fidelity and low bias even for AT- or
GC-rich DNA, the possibility of simply avoiding any
amplification at all can drastically improve the data quality for
such samples [1, 2]. It is important to highlight that not
only the Truseq DNA PCR-free kit but also any other kit
exhibiting a high ligation efficiency could potentially be
used without any PCR step, as long as the sequence of
the adaptors used contains the P5/P7 primer sequences
necessary for sequencing on an Illumina platform.
Another factor often ignored is the shearing step. Most
protocols require already sheared and cleaned-up
DNA to start the library preparation, and sonication on a
Covaris instrument is often the method of choice due to
its reproducibility and tunability [1, 3]. Enzymatic shearing
presents several advantages such as low cost (no need to
invest in either a specific instrument or consumables)
and low DNA loss (samples can go straight from enzymatic
shearing to end-repair without any intermediate
clean-up step) [4]. However, until recently most enzymatic
shearing mixes available exhibited high bias toward samples
of certain GC content and made it difficult to control the average
DNA fragment size in a library. But the latest generation
of enzymatic shearing mixes, such as the fragmentase
Fig. 6 Comparison of sequencing data quality between libraries prepared with different DNA library preparation kits. Deletions were not detected
while insertions (red bars) were extremely low for all kits. The most common source of error is mismatches (blue bars) which vary between 0.08 %
and 0.2 % depending on the kit used
provided with the KAPA HyperPlus kit, appears much
more reliable, controllable and less susceptible to bias
(Additional file 1: Figure S5, [26]). KAPA HyperPlus isn't
the only kit using such a streamlined protocol, and
subsequently we have tested other kits, such as the NEB UltraII,
that also exhibit very high ligation yields in early testing
(>85 %, data not shown).
We observed an interesting phenomenon when comparing
the ligation and PCR yields of the different kits, as
both appear almost anticorrelated in our data (Fig. 4 and
Additional file 1: Figures S2 and S3). An explanation
could be that when the initial DNA input is low or when
the ligation step efficiency is poor, the amount of
adaptor-ligated DNA going into the PCR reaction is very
small; on the other hand, if the adaptor ligation is very
efficient or the starting DNA input very high, a large
amount of DNA goes into the PCR reaction. Yet a high
amount of DNA substrate isn't recommended for PCR reactions as
it is known to inhibit the amplification reaction. Such a
phenomenon can hide differences between kits, since a
protocol exhibiting high ligation efficiency will produce
a high concentration of PCR substrate (adaptor-ligated
fragments) which can inhibit amplification, while on the
other hand a kit exhibiting low ligation efficiency will
lead to a very efficient PCR (no substrate excess), both
kits giving a similar amount of final library product. A
high ligation yield ensures the preservation of the sample
diversity and decreases the amount of amplification
required, avoiding the introduction of additional bias during
PCR [27]. In that respect the Illumina Truseq Nano and
PCR-free kits, as well as the KAPA Hyper kit, exhibited
some of the highest ligation yields, above 30 %, and the
ligation step with the KAPA HyperPlus was fully efficient.
Finally, we noticed variations in the ratios of our 3 control
amplicons in the final libraries when prepared with
different kits. We cannot discriminate between the two possible
sources of variation, fragment size or fragment sequence,
and both are most probably playing a role here. To avoid
introducing any bias in our comparison, we used the
same Spri ratio during the clean-up steps with every kit
tested except Truseq DNA PCR-free. However, the same
Truseq Nano kit resulted in very different fragment ratios
when using the Sanger adaptors [1] rather than the
Illumina adaptors (royal and navy blue bars in Fig. 5),
implying that the sequence of the adaptors and of the
DNA fragments involved in the library preparation does
play a role and might introduce some bias. The kits
leading to the lowest variations (<25 % for each fragment
size), and therefore probably introducing the least bias,
were KAPA HyperPlus and NEBNext.
Conclusion
We identified the kits that are the most practical and the
most efficient, both characteristics often going hand
in hand. Using a novel ddPCR assay, we were able to
deconvolute the influence of each intermediate step in
the library preparation and highlight the significance of
adaptor ligation efficiency, which can be hidden when
focusing only on the overall library preparation yield after
amplification. Unlike qPCR measurements [11, 12], our
ddPCR assay doesn't require any specific standards and
can be used to assess the efficiency of any other kit or
protocol not mentioned in this study or not yet developed,
providing a great tool for direct comparison and objective
selection. The emergence of PCR-free protocols and
simplified protocols merging several steps into one will
certainly improve not only the workflow and the overall and
hands-on times of DNA library preparation, but also
their chemical efficiency.
Method
DNA sample
All the libraries compared in this study were prepared
with the same DNA sample stock. The sample consisted
of three amplicons of different sizes but sharing some
homologous sequence from PhiX174 (214 bp, 397 bp
and 568 bp, see Table 3) [15].
Table 3 Description of the primers and Taqman probe used in ddPCR assay
Oligonucleotide | Sequence | Comments
PhiXa sens | GGC GCT CGT CTT TGG TAT GTA | Amplification and detection of 214 bp fragment
PhiXb sens | TGA ATT GTT CGC GTT TAC CTT | Amplification of 397 bp fragment
PhiXc sens | GTA CGC TGG ACT TTG TAG GAT | Amplification of 568 bp fragment
PhiX rev | GGC GTC CAT CTC GAA G | Amplification and detection of all 3 DNA fragments
Adaptor sens | CTT TCC CTA CAC GAC GCT CTT | Detection of adaptor ligated fragments
Adaptor rev | ATT CCT GCT GAA CCG CTC TTC | Detection of adaptor ligated fragments
P5 primer | AAT GAT ACG GCG ACC ACC GA | Detection of final library fragments
P7 primer | CAA GCA GAA GAC GGC ATA CGA | Detection of final library fragments
Taqman probe | [6FAM]GCGATAACCGGAGTAGTTGAAATG[TAM] | Taqman probe targeting the common sequence between the 3 DNA fragments
DNA library preparation kits for Illumina sequencing
In this study, the following kits were tested and compared:
NEBNext and NEBNext Ultra from NEB, SureSelectXT
from Agilent, Truseq Nano and Truseq DNA PCR-free
from Illumina, Accel-NGS 1S and 2S from Swift Biosciences,
and KAPA Hyper and KAPA HyperPlus from KAPA
Biosystems (see Table 1). All kits were tested with 500 ng
DNA input and the ones designed specifically for low input
DNA were also tested with a lower amount of starting
material (see Table 1). All samples were processed in triplicate
and the error estimations of our values correspond
to the standard deviations calculated on each triplicate set.
We closely followed the manufacturer's recommended
protocol for each kit, including the amount of adaptor
added to the sample prior to ligation for the
DNA input used. For the sake of consistency and to
allow an objective comparison between libraries, all the
libraries which underwent a PCR step, independently of
the kit used, were amplified using the KAPA HiFi Master
Mix (KR0370 v5.13) and P5/P7 primers, following precisely
the protocol and program recommended by KAPA
for 6 amplification cycles.
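As a small illustration of the triplicate error estimation mentioned above, the sketch below computes a mean and standard deviation for one set of replicates. The text does not say whether the sample or the population standard deviation was used; the sample form (n − 1) is assumed here, and the yield values are hypothetical.

```python
from statistics import mean, stdev


def summarise_triplicate(values):
    """Return the mean and sample standard deviation of replicate measurements."""
    return mean(values), stdev(values)


# Hypothetical yields (%) for one kit and one step, measured in triplicate.
ligation_yields = [28.0, 31.0, 34.0]
avg, sd = summarise_triplicate(ligation_yields)
print(f"{avg:.1f} % ± {sd:.1f} %")
```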
In order to mimic the preparation of a normal genomic
DNA library, the DNA stock was sheared using a Covaris
S200 (settings for 500 bp peak as recommended by the
manufacturer) and cleaned up with a 1.8:1 beads:DNA ratio
before starting the library protocol following the kit manuals.
The only exception was the KAPA HyperPlus kit,
which contains its own enzymatic shearing step. In this
specific case, we followed the recommended protocol
without any initial Covaris shearing, incubating the
DNA with the fragmentase mix for 5 min at 37 °C.
Droplet digital PCR (ddPCR) assay
In order to evaluate the efficiency of each library preparation,
we developed an assay based on droplet digital
PCR technology [4, 7, 28–30]. All measurements were
performed on a Bio-Rad QX200 instrument. Samples were diluted
and mixed with the recommended ddPCR master mix
and with specific primers and a Taqman probe targeting
the homologous region of our amplicons (Table 3). An
example of the precise dilutions required for a library
starting with 500 ng DNA input is given in
Additional file 2: Table S1; the dilution factor typically varies between
10^5 and 10^7 depending on the library preparation step
and the specific reaction volume at this step. The dilutions
were decreased accordingly for lower-input libraries
(5 times less for 100 ng input, 25 times less for 20 ng
input). We always aimed for a maximum of 10,000 molecules
per ddPCR reaction. The aqueous ddPCR
reaction mix was then converted on the Droplet Generator
into an emulsion of tens of thousands of droplets,
each containing either zero or a single DNA fragment
owing to the very low DNA concentration.
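The dilution scaling and the back-calculation from a ddPCR reading to the number of molecules in the undiluted sample can be sketched as follows. This is a minimal illustration, not the authors' spreadsheet: it assumes that the reader reports a concentration in copies per μL of reaction and that 1 μL of diluted sample is pipetted into a 20 μL reaction (as noted for Additional file 2: Table S1). The function names and the example numbers are hypothetical.

```python
def scaled_dilution(dilution_500ng: float, input_ng: float) -> float:
    """Scale a dilution factor chosen for a 500 ng library to a lower DNA input.

    100 ng input gives a 5 times lower dilution, 20 ng a 25 times lower one.
    """
    return dilution_500ng * input_ng / 500.0


def molecules_in_sample(
    copies_per_ul_reaction: float,
    dilution: float,
    sample_volume_ul: float,
    reaction_volume_ul: float = 20.0,  # total ddPCR reaction volume
    loaded_ul: float = 1.0,            # diluted sample added to the reaction
) -> float:
    """Back-calculate the number of molecules in the undiluted sample."""
    copies_in_reaction = copies_per_ul_reaction * reaction_volume_ul
    copies_per_ul_diluted = copies_in_reaction / loaded_ul
    return copies_per_ul_diluted * dilution * sample_volume_ul


# Hypothetical reading: 500 copies/uL in the reaction (10,000 molecules in
# 20 uL), a 10^6-fold dilution and a 50 uL sample at this protocol step.
n_molecules = molecules_in_sample(500.0, 1e6, 50.0)
```

Keeping the dilution and the volumes as explicit arguments makes it easy to apply the same calculation at each protocol step, where the reaction volume of the library preparation differs.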
The ddPCR program corresponds to the following settings,
with a temperature ramp rate of 2 °C/s: denaturation for
10 min at 95 °C, then 40 cycles of denaturation for 30 s at
94 °C and annealing/extension for 60 s at 65 °C, and a
final enzyme deactivation at 98 °C for 10 min. After PCR,
only droplets that initially contained a target molecule exhibit high fluorescence
from the Taqman probe, which allows the droplet reader to count
the number of molecules in the initial sample
without the need for any standards
(Fig. 1) [31]. Each measurement was done in triplicate.
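Counting positive droplets yields an absolute number of molecules because digital PCR in general relies on Poisson statistics: if a fraction p of droplets is positive, the mean number of target copies per droplet is λ = −ln(1 − p), which also accounts for the occasional droplet that received more than one molecule. The sketch below illustrates that general principle; the paper does not describe the correction itself, which is normally applied by the instrument software. The droplet volume of 0.85 nL is an assumed nominal value and the function names are hypothetical.

```python
import math


def copies_per_droplet(positive: int, total: int) -> float:
    """Mean target copies per droplet from the fraction of positive droplets."""
    if total <= 0 or not 0 <= positive < total:
        raise ValueError("need 0 <= positive < total and total > 0")
    return -math.log(1.0 - positive / total)


def copies_in_reaction(
    positive: int,
    total: int,
    reaction_volume_ul: float = 20.0,
    droplet_volume_nl: float = 0.85,  # assumed nominal droplet volume
) -> float:
    """Estimate the number of target copies in the whole ddPCR reaction."""
    lam = copies_per_droplet(positive, total)
    copies_per_ul = lam / (droplet_volume_nl * 1e-3)  # nL converted to uL
    return copies_per_ul * reaction_volume_ul
```

The estimate is undefined when every droplet is positive, which is one reason to dilute samples so that a clear population of empty droplets remains.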
To evaluate the amount of DNA remaining after each
step as well as the yield of the reactions, two independent
measurements were carried out: (i) the total number of
molecules remaining in the sample at each step of the
protocol (after A-tailing, after adaptor ligation and after
PCR), using primers targeting the homologous sequence
of the DNA fragments, and (ii) the number of molecules
bearing adaptors after ligation or P5/P7 sequences after
PCR amplification, this time using adaptor-specific and
P5/P7 primers (Table 3, Fig. 2). One advantage of the
ddPCR method is that it does not depend on equivalent
PCR efficiencies for each measurement, as it gives a binary
answer for each droplet [31]. The critical point is to
ensure a clear distinction between the fluorescence intensities
of loaded and empty droplets (Fig. 2).
The DNA loss and chemical yield of each step and of
the overall library preparation were calculated by combining
the different ddPCR measurements: the total DNA
remaining at a given step, the adaptor-ligated DNA, and
the final library bearing P5/P7 adaptors at their ends.
For the first steps of the library preparation (DNA shearing,
end repair and A-tailing), only the DNA loss due to
bead clean-up was measured. However, both DNA loss and
chemical efficiency were calculated for the last 2 steps of
each protocol, adaptor ligation and DNA amplification.
It is important to highlight that the ligation yield
corresponds here to the overall yield of all the previous
chemical steps up to the ligation, including end repair
and A-tailing, so variations in ligation yield between
protocols might also reflect differences in the end repair
or A-tailing steps rather than in the ligation itself.
Yield calculations
In this study, we distinguish the "overall yield" of a library
preparation protocol step from the "stepwise yield". The
overall yield corresponds to the amount of DNA remaining
after a certain step in comparison with the initial DNA input
of the library preparation (500 ng, 100 ng or 20 ng, depending
on the sample). The stepwise yield
measures the efficiency of a chemical step itself
by comparing the number of molecules successfully
transformed (for example, the number of molecules bearing
adaptors on both ends after the adaptor ligation step) with
the total number of molecules remaining in the sample (in
our example, the total number of molecules after ligation,
regardless of the presence of adaptors, measured by ddPCR
using the PhiX primers). Comparing the overall
yield and the stepwise yield for the same step allows us to
deconvolute the DNA loss due simply to bead
clean-ups and pipetting from the actual efficiency of a
chemical step such as the ligation of adaptors. More details
on the overall yield and stepwise yield calculations can be found
below and in Additional file 2: Table S1.
Overall yields are calculated as the ratio of the number
of DNA molecules left at a certain step of the library
preparation protocol (for example $\mathrm{DNA}^{\mathrm{post\ ligation}}_{\mathrm{adaptor}}$ for
the DNA amount left after ligation and bearing adaptors,
measured with adaptor primers) to the initial DNA
input ($\mathrm{DNA}^{\mathrm{starting\ input}}_{\mathrm{total}}$, measured with PhiX primers;
Figs. 2 and 3 and Additional file 2: Table S1):
$$\mathrm{Yield}^{\mathrm{overall}}_{\mathrm{ligation}} = \mathrm{DNA}^{\mathrm{post\ ligation}}_{\mathrm{adaptor}} \,/\, \mathrm{DNA}^{\mathrm{starting\ input}}_{\mathrm{total}}$$
The efficiency of a specific step (the stepwise yield) for a
sample prepared with a specific protocol is calculated by
comparing the overall number of DNA molecules
remaining in the sample just after this step (for the
ligation stepwise yield, $\mathrm{DNA}^{\mathrm{post\ ligation}}_{\mathrm{total}}$, measured
with the PhiX-specific primers, Fig. 2) with the amount
of DNA fragments bearing adaptors at their ends in the
very same sample (for the ligation stepwise yield,
$\mathrm{DNA}^{\mathrm{post\ ligation}}_{\mathrm{adaptor}}$, measured this time with the adaptor-specific
primers, Fig. 2, Additional file 1: Figure S2
and Additional file 2: Table S1):
$$\mathrm{Yield}^{\mathrm{stepwise}}_{\mathrm{ligation}} = \mathrm{DNA}^{\mathrm{post\ ligation}}_{\mathrm{adaptor}} \,/\, \mathrm{DNA}^{\mathrm{post\ ligation}}_{\mathrm{total}}$$
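The two yield definitions can be written as a short sketch for the ligation step. The function names and the molecule counts are hypothetical. The `recovery` helper is not a quantity named in the paper: it is simply the ratio of the two yields, which by algebra equals the fraction of input molecules still present after the step regardless of adaptors, and so isolates the loss due to clean-ups and pipetting.

```python
def overall_yield(adaptor_post_step: float, total_starting_input: float) -> float:
    """Adaptor-bearing molecules after the step, relative to the initial input."""
    return adaptor_post_step / total_starting_input


def stepwise_yield(adaptor_post_step: float, total_post_step: float) -> float:
    """Adaptor-bearing molecules relative to all molecules left after the step."""
    return adaptor_post_step / total_post_step


def recovery(total_post_step: float, total_starting_input: float) -> float:
    """Fraction of input molecules still present after the step (derived here)."""
    return total_post_step / total_starting_input


# Hypothetical ddPCR counts for one library (number of molecules).
starting_total = 1.0e12       # PhiX primers, starting input
post_ligation_total = 6.0e11  # PhiX primers, after ligation
post_ligation_adaptor = 3.0e11  # adaptor primers, after ligation

y_overall = overall_yield(post_ligation_adaptor, starting_total)
y_stepwise = stepwise_yield(post_ligation_adaptor, post_ligation_total)
y_recovery = recovery(post_ligation_total, starting_total)
```

With these example counts the overall yield is the product of the stepwise yield and the recovery, which is the sense in which comparing the two yields separates physical loss from chemical efficiency.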
Sequencing and data processing
Libraries were multiplexed in batches of 15 and sequenced
on an Illumina MiSeq instrument with the V2
chemistry. Runs consisted of 150-base paired-end reads and the
appropriate single index read.
After sequencing, reads were mapped to the reference
using BWA [32]. Base errors (mismatches, insertions
and deletions) were then counted throughout the mapped reads,
and the error rates were obtained
by dividing these counts by the total number of bases in
the mapped regions of all reads.
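The error-rate calculation described above amounts to pooling the error counts over all mapped reads and dividing by the pooled number of mapped bases. The sketch below assumes that per-read counts have already been extracted from the alignments; how this was done is not specified in the text, and the `ReadErrors` record and `error_rates` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class ReadErrors:
    """Per-read counts taken from an alignment (hypothetical record)."""
    mapped_bases: int
    mismatches: int
    insertions: int
    deletions: int


def error_rates(reads: Iterable[ReadErrors]) -> Dict[str, float]:
    """Pool error counts over all reads and normalise by total mapped bases."""
    reads = list(reads)
    total_bases = sum(r.mapped_bases for r in reads)
    if total_bases == 0:
        raise ValueError("no mapped bases")
    return {
        "mismatch": sum(r.mismatches for r in reads) / total_bases,
        "insertion": sum(r.insertions for r in reads) / total_bases,
        "deletion": sum(r.deletions for r in reads) / total_bases,
    }
```

Pooling before dividing weights every mapped base equally, so short or partially mapped reads do not dominate the estimate as they could if per-read rates were averaged.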
Additional files
Additional file 1: Figure S1. Comparison of the overall yields for libraries
prepared with the Truseq Nano kit with either the Sanger adaptors (original
Illumina adaptors, in pink) or the modern Illumina adaptors (in blue).
Figure S2. Bar charts showing the stepwise DNA library preparation yields of
the different kits tested. Initial DNA input: 500 ng. Except where mentioned
otherwise, all libraries were prepared using the original Illumina paired-end
adaptors (Sanger adaptors) [6, 22]. The most critical step is the
adaptor ligation, for which the yield varies from 3.50 to 100 % depending on
the kit tested. Figure S3. Bar charts showing the comparison of the overall
DNA library preparation yields of the different kits tested depending on the
initial DNA input. Although higher DNA inputs lead to slightly higher adaptor
ligation yields, the final PCR yield appears much greater when the initial DNA
input is low. Figure S4. Bioanalyzer traces of 3 libraries prepared with the PhiX
amplicons of 3 different sizes. The initial input sample contained an equimolar
ratio of the 3 amplicons, whereas this ratio varies in the final libraries presented
here depending on the kit used (Truseq Nano in red, SureSelect in blue and
KAPA Hyper in green). Figure S5. Enzymatic shearing using the fragmentase
provided with the KAPA HyperPlus kit. A) Tunability and robustness of the
fragmentase treatments depending on the GC content of the DNA sample,
DNA input and the incubation time. B) GC contents of the KAPA HyperPlus libraries
and their correlation with the theoretical values. (PPTX 367 kb)
Additional file 2: Table S1. Example of dilution factors and yield
calculations for the NEBNext libraries with 500 ng DNA input and Sanger
adaptors. The calculation of the number of molecules measured by ddPCR
at each step of the library preparation is described in the first 7 columns.
The number 20 in column 7 corresponds to the ddPCR reaction volume in
μL, as only 1 μL of diluted sample is pipetted into the final ddPCR reaction mix
of 20 μL total. Columns 8 to 10 describe the calculation of the Overall Yield
of each step, whereas columns 11 to 14 (table split over 2 pages) explain the
calculation of the Step Yield. The equations corresponding to each cell/
column value are displayed in blue. (DOCX 22 kb)
Abbreviations
bp, base pairs; ddPCR, droplet digital PCR; NGS, next-generation sequencing;
PCR, polymerase chain reaction; Tm, melting temperature
Funding
This work was supported by the Wellcome Trust [grant number 098051].
Availability of data and material
Not applicable as the data is included in the results, figures and
supplementary materials.
Authors' contributions
LA designed the ddPCR assay, carried out the ddPCR measurements, data
analysis, sequencing experiments and wrote the manuscript. YG performed
the bioinformatic analysis. MQ designed the study and helped to draft the
manuscript. All authors read and approved the final manuscript.
Competing interests
MQ is a member of the NEB Key Opinion Leader panel. The authors declare
that they have no other competing interests.
Consent for publication
Not applicable.
Ethics approval and consent to participate
Not applicable.
Received: 23 December 2015 Accepted: 19 May 2016
References
1. Quail MA, Kozarewa I, Smith F, Scally A, Stephens PJ, Durbin R, Swerdlow H,
Turner DJ. A large genome center's improvements to the Illumina
sequencing system. Nat Methods. 2008;5(12):1005–10.
2. Kozarewa I, Ning Z, Quail MA, Sanders MJ, Berriman M, Turner DJ.
Amplification-free Illumina sequencing-library preparation facilitates
improved mapping and assembly of (G + C)-biased genomes. Nat Methods.
2009;6(4):291–5.
3. Head SR, Komori HK, LaMere SA, Whisenant T, Van Nieuwerburgh F,
Salomon DR, Ordoukhanian P. Library construction for next-generation
sequencing: overviews and challenges. Biotechniques. 2014;56(2):61–4, 66,
68, passim.
4. Linnarsson S. Recent advances in DNA sequencing methods - general
principles of sample preparation. Exp Cell Res. 2010;316(8):1339–43.
5. van Dijk EL, Auger H, Jaszczyszyn Y, Thermes C. Ten years of next-
generation sequencing technology. Trends Genet. 2014;30(9):418–26.
6. Bentley DR, Balasubramanian S, Swerdlow HP, Smith GP, Milton J, Brown
CG, Hall KP, Evers DJ, Barnes CL, Bignell HR, et al. Accurate whole human
genome sequencing using reversible terminator chemistry. Nature. 2008;
456(7218):53–9.
7. Cai Y, Li X, Lv R, Yang J, Li J, He Y, Pan L. Quantitative analysis of pork and
chicken products by droplet digital PCR. Biomed Res Int. 2014;2014:810209.
8. Hindson CM, Chevillet JR, Briggs HA, Gallichotte EN, Ruf IK, Hindson BJ,
Vessella RL, Tewari M. Absolute quantification by droplet digital PCR versus
analog real-time PCR. Nat Methods. 2013;10(10):1003–5.
9. Yang R, Paparini A, Monis P, Ryan U. Comparison of next-generation droplet
digital PCR (ddPCR) with quantitative PCR (qPCR) for enumeration of
Cryptosporidium oocysts in faecal samples. Int J Parasitol. 2014;44(14):1105–13.
10. Hindson BJ, Ness KD, Masquelier DA, Belgrader P, Heredia NJ, Makarewicz
AJ, Bright IJ, Lucero MY, Hiddessen AL, Legler TC, et al. High-throughput
droplet digital PCR system for absolute quantitation of DNA copy number.
Anal Chem. 2011;83(22):8604–10.
11. Laurie MT, Bertout JA, Taylor SD, Burton JN, Shendure JA, Bielas JH.
Simultaneous digital quantification and fluorescence-based size
characterization of massively parallel sequencing libraries. Biotechniques.
2013;55(2):61–7.
12. Taylor SC, Carbonneau J, Shelton DN, Boivin G. Optimization of Droplet
Digital PCR from RNA and DNA extracts with direct comparison to RT-qPCR:
Clinical implications for quantification of Oseltamivir-resistant
subpopulations. J Virol Methods. 2015;224:58–66.
13. Aird D, Ross MG, Chen WS, Danielsson M, Fennell T, Russ C, Jaffe DB,
Nusbaum C, Gnirke A. Analyzing and minimizing PCR amplification bias in
Illumina sequencing libraries. Genome Biol. 2011;12(2):R18.
14. Simbolo M, Gottardi M, Corbo V, Fassan M, Mafficini A, Malpeli G, Lawlor RT,
Scarpa A. DNA qualification workflow for next generation sequencing of
histopathological samples. PLoS One. 2013;8(6):e62692.
15. Quail MA, Smith M, Jackson D, Leonard S, Skelly T, Swerdlow HP, Gu Y, Ellis
P. SASI-Seq: sample assurance Spike-Ins, and highly differentiating 384
barcoding for Illumina sequencing. BMC Genomics. 2014;15(1):110.
16. DeAngelis MM, Wang DG, Hawkins TL. Solid-phase reversible immobilization
for the isolation of PCR products. Nucleic Acids Res. 1995;23(22):4742–3.
17. Fisher S, Barry A, Abreu J, Minie B, Nolan J, Delorey TM, Young G, Fennell TJ,
Allen A, Ambrogio L, et al. A scalable, fully automated process for
construction of sequence-ready human exome targeted capture libraries.
Genome Biol. 2011;12(1):R1.
18. Knierim E, Lucke B, Schwarz JM, Schuelke M, Seelow D. Systematic
comparison of three methods for fragmentation of long-range PCR
products for next generation sequencing. PLoS One. 2011;6(11):e28240.
19. Marine R, Polson SW, Ravel J, Hatfull G, Russell D, Sullivan M, Syed F, Dumas
M, Wommack KE. Evaluation of a transposase protocol for rapid generation
of shotgun high-throughput sequencing libraries from nanogram quantities
of DNA. Appl Environ Microbiol. 2011;77(22):8071–9.
20. Hussing C, Kampmann ML, Mogensen HS, Borsting NM. Comparison of
techniques for quantification of next-generation sequencing libraries.
Forensic Science International: Genetics Supplement Series 2015. In press.
21. Dohm JC, Lottaz C, Borodina T, Himmelbauer H. Substantial biases in ultra-
short read data sets from high-throughput DNA sequencing. Nucleic Acids
Res. 2008;36(16):e105.
22. Quail MA, Otto TD, Gu Y, Harris SR, Skelly TF, McQuillan JA, Swerdlow HP,
Oyola SO. Optimal enzymes for amplifying sequencing libraries. Nat Methods.
2012;9(1):10–1.
23. Dabney J, Meyer M. Length and GC-biases during sequencing library
amplification: a comparison of various polymerase-buffer systems with ancient
and modern DNA sequencing libraries. Biotechniques. 2012;52(2):87–94.
24. Oyola SO, Otto TD, Gu Y, Maslen G, Manske M, Campino S, Turner DJ,
MacInnis B, Kwiatkowski DP, Swerdlow HP, et al. Optimizing Illumina next-
generation sequencing library preparation for extremely AT-biased genomes.
BMC Genomics. 2012;13.
25. Perelygina L, Zhu L, Zurkuhlen H, Mills R, Borodovsky M, Hilliard JK.
Complete sequence and comparative analysis of the genome of herpes B
virus (Cercopithecine herpesvirus 1) from a rhesus monkey. J Virol. 2003;
77(11):6167–77.
26. Miller BE, van Kets V, van Rooyen B, Whitehorn H, Jones P, Ranik M, Geldart
A, van der Walt E, Appel M: A novel, single-tube enzymatic fragmentation
and library construction method enables fast turnaround times and
improved data quality for microbial whole-genome sequencing.
KAPABiosystem 2015, APP109001(1.15):10.
27. Seguin-Orlando A, Schubert M, Clary J, Stagegaard J, Alberdi MT, Prado JL,
Prieto A, Willerslev E, Orlando L. Ligation bias in illumina next-generation
DNA libraries: implications for sequencing ancient genomes. PLoS One.
2013;8(10):e78575.
28. Ludlow AT, Robin JD, Sayed M, Litterst CM, Shelton DN, Shay JW, Wright
WE. Quantitative telomerase enzyme activity determination using droplet
digital PCR with single cell resolution. Nucleic Acids Res. 2014;42(13):e104.
29. Wang Q, Yang X, He Y, Ma Q, Lin L, Fu P, Xiao H. Droplet Digital PCR for
Absolute Quantification of EML4-ALK Gene Rearrangement in Lung
Adenocarcinoma. J Mol Diagn. 2015;17(5):515–20.
30. Yang W, Shelton DN, Berman JR, Zhang B, Cooper S, Svilen T, Hefner E,
Regan JF. Droplet Digital PCR: Multiplex detection of kras mutations in
formalin-fixed, paraffin-embedded colorectal cancer samples. Biotechniques.
2015;58:2.
31. Hatch AC, Fisher JS, Tovar AR, Hsieh AT, Lin R, Pentoney SL, Yang DL, Lee
AP. 1-Million droplet array with wide-field fluorescence imaging for digital
PCR. Lab Chip. 2011;11(22):3838–45.
32. Li H, Durbin R. Fast and accurate short read alignment with Burrows-
Wheeler transform. Bioinformatics. 2009;25(14):1754–60.
We accept pre-submission inquiries
Our selector tool helps you to find the most relevant journal
We provide round the clock customer support
Convenient online submission
Thorough peer review
Inclusion in PubMed and all major indexing services
Maximum visibility for your research
Submit your manuscript at
www.biomedcentral.com/submit
Submit your next manuscript to BioMed Central
and we will help you at every step:
Aigrain et al. BMC Genomics (2016) 17:458 Page 11 of 11
Content courtesy of Springer Nature, terms of use apply. Rights reserved.
1.
2.
3.
4.
5.
6.
Terms and Conditions
Springer Nature journal content, brought to you courtesy of Springer Nature Customer Service Center GmbH (“Springer Nature”).
Springer Nature supports a reasonable amount of sharing of research papers by authors, subscribers and authorised users (“Users”), for small-
scale personal, non-commercial use provided that all copyright, trade and service marks and other proprietary notices are maintained. By
accessing, sharing, receiving or otherwise using the Springer Nature journal content you agree to these terms of use (“Terms”). For these
purposes, Springer Nature considers academic use (by researchers and students) to be non-commercial.
These Terms are supplementary and will apply in addition to any applicable website terms and conditions, a relevant site licence or a personal
subscription. These Terms will prevail over any conflict or ambiguity with regards to the relevant terms, a site licence or a personal subscription
(to the extent of the conflict or ambiguity only). For Creative Commons-licensed articles, the terms of the Creative Commons license used will
apply.
We collect and use personal data to provide access to the Springer Nature journal content. We may also use these personal data internally within
ResearchGate and Springer Nature and as agreed share it, in an anonymised way, for purposes of tracking, analysis and reporting. We will not
otherwise disclose your personal data outside the ResearchGate or the Springer Nature group of companies unless we have your permission as
detailed in the Privacy Policy.
While Users may use the Springer Nature journal content for small scale, personal non-commercial use, it is important to note that Users may
not:
use such content for the purpose of providing other users with access on a regular or large scale basis or as a means to circumvent access
control;
use such content where to do so would be considered a criminal or statutory offence in any jurisdiction, or gives rise to civil liability, or is
otherwise unlawful;
falsely or misleadingly imply or suggest endorsement, approval , sponsorship, or association unless explicitly agreed to by Springer Nature in
writing;
use bots or other automated methods to access the content or redirect messages
override any security feature or exclusionary protocol; or
share the content in order to create substitute for Springer Nature products or services or a systematic database of Springer Nature journal
content.
In line with the restriction against commercial use, Springer Nature does not permit the creation of a product or service that creates revenue,
royalties, rent or income from our content or its inclusion as part of a paid for service or for other commercial gain. Springer Nature journal
content cannot be used for inter-library loans and librarians may not upload Springer Nature journal content on a large scale into their, or any
other, institutional repository.
These terms of use are reviewed regularly and may be amended at any time. Springer Nature is not obligated to publish any information or
content on this website and may remove it or features or functionality at our sole discretion, at any time with or without notice. Springer Nature
may revoke this licence to you at any time and remove access to any copies of the Springer Nature journal content which have been saved.
To the fullest extent permitted by law, Springer Nature makes no warranties, representations or guarantees to Users, either express or implied
with respect to the Springer nature journal content and all parties disclaim and waive any implied warranties or warranties imposed by law,
including merchantability or fitness for any particular purpose.
Please note that these rights do not automatically extend to content, data or other material published by Springer Nature that may be licensed
from third parties.
If you would like to use or distribute our Springer Nature journal content to a wider audience or on a regular basis or in any other manner not
expressly permitted by these Terms, please contact Springer Nature at
onlineservice@springernature.com
... Whole-genome resequencing and subsequent analysis were outsourced to Macrogen Inc. (Seoul, South Korea). The sequencing libraries were prepared according to the manufacturer's instructions for the TruSeq Nano DNA High Throughput Library Prep Kit (Illumina) (21). Briefly, 100 ng of genomic DNA was sheared using adaptive focused acoustic technology (Covaris), and the fragmented DNA was end-repaired to create 5′-phosphorylated, blunt-ended dsDNA molecules. ...
Article
Full-text available
Mycobacterium abscessus is a group of emerging antimicrobial-resistant nontuberculous mycobacteria that causes severe lung disease in infected patients globally. Recently, molecular epidemiology studies have indicated that horizontal gene transfer (HGT) events in the rpoB gene are prevalent between M. abscessus subspecies. To determine the global prevalence of M. abscessus strains subjected to rpoB HGT, we performed phylogenetic inference using a 711-bp rpoB sequence extracted from 1,786 M. abscessus isolates for which the whole-genome sequence was publicly available. Our data showed that a total of 74 isolates (4.1%) from 1,786 strains are subject to rpoB HGT, which is more prevalent than strains with hsp65 HGT (19 isolates from 1,786, 1.1%). Most of these (69 isolates) belong to two major groups of Mycobacterium massiliense, of which the rpoB gene is horizontally transferred from M. abscessus (Rec-mas), dominant circulating clone 7 (DCC7) (44 isolates) and ST46 type by multilocus sequence typing (25 isolates). The Rec-mas strains of the two groups have distinct geographical patient distributions, of which the former is mainly distributed in the United States, while the latter is prevalent in Asia. Our further genome-based analysis indicated that the ST46 type is a novel DCC candidate of M. massiliense that is responsible for dissemination between noncystic fibrosis patients in Asia. In conclusion, our global phylogenetic analysis revealed two major Rec-mas clones with distinct geographical distributions, namely, DCC7 and ST46. This study provides insights into the genetic clustering and person-to-person transmission of globally dominant and area-specific strains harboring the HGT rpoB gene. IMPORTANCE Horizontal gene transfer (HGT) events play a pivotal role in the evolution of Mycobacterium abscessus into dominant circulating clones (DCCs), which is capable of causing patient-to-patient transmission. 
In particular, HGT of the rpoB gene between strains of different subspecies of M. abscessus could also compromise differentiation between strains of M. abscessus. Here, for the first time, using 1,786 M. abscessus genome sequences, we evaluated the global prevalence of M. abscessus strains subjected to rpoB HGT. We found a greater prevalence of M. abscessus subjected to rpoB HGT than to those subjected to hsp65 HGT, which is mainly due to two Rec-mas clones, dominant circulating clone 7 and ST46, which are responsible for dissemination between non-CF patients in Asia. Our data highlight the importance of rpoB HGT in the evolution of M. abscessus, particularly Mycobacterium massiliense, into virulent DCC clones.
... For DNA sequencing, one of the most critical components is the choice of library preparation methods. While many studies exist for the selection of library preparation methods for high-coverage sequencing (Aigrain 2016;Ribarska 2022), researchers seeking to do low coverage whole genome sequencing frequently make decisions at this critical step without guidance. The absence of publicly available guidelines for low coverage library preparations exacerbates this issue. ...
Preprint
Full-text available
In the fields of human health and agricultural research, low coverage whole-genome sequencing followed by imputation to a large haplotype reference panel has emerged as a cost-effective alternative to genotyping arrays for assaying large numbers of samples. However, a systematic comparison of library preparation methods tailored for low coverage sequencing remains absent in the existing literature. In this study, we evaluated one full sized kit from IDT and miniaturized and evaluated three Illumina-compatible library preparation kits-the KAPA HyperPlus kit (Roche), the DNA Prep kit (Illumina), and an IDT kit-using 96 human DNA samples. Metrics evaluated included imputation concordance with high-depth genotypes, coverage, duplication rates, time for library preparation, and additional optimization requirements. Despite slightly elevated duplication rates in IDT kits, we find that all four kits perform well in terms of imputation accuracy, with IDT kits being only marginally less performant than Illumina and Roche kits. Laboratory handling of the kits was similar: thus, the choice of a kit will largely depend on (1) existing or planned infrastructure, such as liquid handling capabilities, (2) whether a specific characteristic is desired, such as the use of full-length adapters, shorter processing times, or (3) use case, for instance, long vs short read sequencing. Our findings offer a comprehensive resource for both commercial and research workflows of low-cost library preparation methods suitable for high-throughput low coverage whole genome sequencing.
... Starting from 10 ng of amplified DNA or from the unamplified DNA in the picogramrange, we successfully prepared sequencing libraries using the Accel-NGS 1S Plus kit. This kit has been previously tested with DNA quantities diluted up to picograms as well as from 100 ng of virome DNA [48], allowing the capture of both ss-and dsDNA molecules [49,50]. The higher number of PCR cycles required for the picogram DNA quantities resulted in a higher number of identical reads in the unamplified samples, compared to the amplified ones, which need to be removed to avoid possibly assembly and mapping biases [51]. ...
Article
Full-text available
Viruses are the most abundant 'biological entities' in the world's oceans. However, technical and methodological constraints limit our understanding of their diversity, particularly in benthic abyssal ecosystems (>4000 m depth). To verify advantages and limitations of analyzing virome DNA subjected either to random amplification or unamplified, we applied shotgun sequencing-by-synthesis to two sample pairs obtained from benthic abyssal sites located in the Northeastern Atlantic Ocean at ca. 4700 m depth. One amplified DNA sample was also subjected to single-molecule long-read sequencing for comparative purposes. Overall, we identified 24,828 viral Operational Taxonomic Units (vOTUs), belonging to 22 viral families. Viral reads were more abundant in the amplified DNA samples (38.5-49.9%) compared to the unamplified ones (4.4-5.8%), with the latter showing a greater viral diversity and 11-16% of dsDNA viruses almost undetectable in the amplified samples. From a procedural point of view, the viromes obtained by direct sequencing (without amplification step) provided a broader overview of both ss and dsDNA viral diversity. Nevertheless, our results suggest that the contextual use of random amplification of the same sample and long-read technology can improve the assessment of viral assemblages by reducing off-target reads.
... While it is possible to reduce the costs of the library preparation step in many different ways as was outlined above, it is crucial that the quality of the resulting data sets is not impaired and has no negative impact on downstream analyses (Aigrain et al., 2016;Alberti et al., 2014;Dabney and Meyer, 2012;McNulty et al., 2020;Romero et al., 2014). However, to the best of our knowledge, no comprehensive characterization of library complexity and biases was performed for manual library preparation miniaturizations. ...
Article
Full-text available
We present an easy-to-reproduce manual miniaturized full-length RNA sequencing (RNAseq) library preparation workflow that does not require the upfront investment in expensive lab equipment or long setup times. With minimal adjustments to an established commercial protocol, we were able to manually miniaturize the RNAseq library preparation by a factor of up to 1:8. This led to cost savings for miniaturized library preparation of up to 86.1% compared to the gold standard. The resulting data were the basis of a rigorous quality control analysis that inspected: sequencing quality metrics, gene body coverage, raw read duplications, alignment statistics, read pair duplications, detected transcripts and sequence variants. We also included a deep dive data analysis identifying rRNA contamination and suggested ways to circumvent these. In the end, we could not find any indication of biases or inaccuracies caused by the RNAseq library miniaturization. The variance in detected transcripts was minimal and not influenced by the miniaturization level. Our results suggest that the workflow is highly reproducible and the sequence data suitable for downstream analyses such as differential gene expression analysis or variant calling.
... The use of multicolor dPCR detection technology spans across a variety of applications, including detection of SNPs [29,30], clinical diagnostics, oncology [31,32], environmental monitoring [33,34], single-cell analysis [35,36] and food or agricultural testing [37,38]. dPCR also serves as a supplement to qPCR [39,40], NGS [41], prenatal testing [42,43], and copy number variation (CNV) genotyping [44,45]. Currently, dPCR technologies that have been commercialized are capable of meeting applications in a variety of settings; however, a number of critical challenges remain, such as the dPCR system has high cost and complex operation and the cost of chip processing is high. ...
Article
Full-text available
The Kirsten rat sarcoma virus gene (KRAS) is the most common tumor in human cancer, and KRAS plays an important role in the growth of tumor cells. Normal KRAS inhibits tumor cell growth. When mutated, it will continuously stimulate cell growth, resulting in tumor development. There are currently few drugs that target the KRAS gene. Here, we developed a microfluidic chip. The chip design uses parallel fluid channels combined with cylindrical chamber arrays to generate 20,000 cylindrical microchambers. The microfluidic chip designed by us can be used for the microsegmentation of KRAS gene samples. The thermal cycling required for the PCR stage is performed on a flat-panel instrument and detected using a four-color fluorescence system. “Glass-PDMS-glass” sandwich structure effectively reduces reagent volatilization; in addition, a valve is installed at the sample inlet and outlet on the upper layer of the chip to facilitate automatic control. The liquid separation performance of the chip was verified by an automated platform. Finally, using the constructed KRAS gene mutation detection system, it is verified that the chip has good application potential for digital polymerase chain reaction (dPCR). The experimental results show that the chip has a stable performance and can achieve a dynamic detection range of four orders of magnitude and a gene mutation detection of 0.2%. In addition, the four-color fluorescence detection system developed based on the chip can distinguish three different KRAS gene mutation types simultaneously on a single chip.
... molecular loss during the purification after ligation and the amplification effect before ligation. For most hybrid capture technologies that are not amplified before ligation, the library conversion rate is generally not higher than 30%, which accounts for the main limiting factor [10]. The final detection efficiency is approximately the product of "theoretical detection ratio" and "library conversion rate". ...
Preprint
Background: Cell-free DNA (cfDNA) promises to serve as a surrogate biomarker for non-invasive molecular diagnostics. Disease-specific cfDNA, such as circulating tumor DNA (ctDNA), is short and rare, which limits the detection performance of current targeted sequencing methods. Methods: By introducing a linear pre-amplification process and optimizing the adapter ligation with customized reagents, we developed the One-PrimER Amplification (OPERA) system. In this study, we examined its performance in detecting mutations of low variant allelic frequency (VAF) in various samples with short DNA fragments. Results: In cell line-derived samples containing sonication-sheared DNA fragments of 50-150 bp (peak at 70-80 bp), OPERA was capable of detecting mutations as low as 0.0025% VAF, while CAPP-Seq only detected mutations of >0.03% VAF. Both single-nucleotide variants and insertions/deletions can be detected by OPERA. In synthetic fragments as short as 80 bp with low VAF (0.03%-0.1%), the detection sensitivity of OPERA was significantly higher than that of droplet digital polymerase chain reaction. The error rate was 5.9 × 10⁻⁵ errors per base after de-duplication in plasma samples collected from healthy volunteers. By suppressing single-strand errors, the error rate can be further lowered by more than 5-fold at the EGFR T790M hotspot. In plasma samples collected from lung cancer patients, OPERA detected mutations in 57.1% of stage I patients with 100% specificity and achieved a sensitivity of 30.0% in patients with a tumor volume of less than 1 cm³. Conclusions: OPERA can effectively detect mutations in rare and highly fragmented DNA. Trial registration: This study was registered on ChiCTR (ChiCTR1900024028) on 23 June 2019.
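The citing excerpt above describes the final detection efficiency as approximately the product of the "theoretical detection ratio" and the "library conversion rate". A minimal Python sketch of that arithmetic follows; the function names and the example input values are illustrative assumptions, and only the 30% conversion ceiling comes from the excerpt.

```python
def detection_efficiency(theoretical_detection_ratio: float,
                         library_conversion_rate: float) -> float:
    """Approximate final detection efficiency as the product of two fractions.

    Both arguments are fractions in [0, 1]. The names are hypothetical;
    the product relationship is the one stated in the excerpt.
    """
    for name, value in (("theoretical_detection_ratio", theoretical_detection_ratio),
                        ("library_conversion_rate", library_conversion_rate)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return theoretical_detection_ratio * library_conversion_rate


def expected_detected_molecules(input_molecules: float,
                                theoretical_detection_ratio: float,
                                library_conversion_rate: float) -> float:
    """Expected number of input molecules that end up detectable."""
    if input_molecules < 0:
        raise ValueError("input_molecules must be non-negative")
    return input_molecules * detection_efficiency(
        theoretical_detection_ratio, library_conversion_rate)


if __name__ == "__main__":
    # Assumed example: a perfect theoretical detection ratio with the
    # 30% conversion ceiling quoted in the excerpt, for 1,000 input molecules.
    print(expected_detected_molecules(1000, 1.0, 0.30))
```

Under these assumptions the conversion rate dominates: even with a perfect theoretical detection ratio, fewer than a third of the input molecules would remain detectable.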
... Vogelstein and Kinzler (1999) coined the term "digital PCR" in 1999, after which the term was adopted by others (Vogelstein and Kinzler, 1999;Lo et al., 2007;Wang et al., 2010;Warren et al., 2006;Zhou et al., 2001). Individual dPCR reactions are prepared following procedures and reagents analogous to qPCR, including the addition of primers, hydrolysis probes (if applicable), intercalating dyes, and reaction enzymes (Pecoraro et al., 2019;Botes et al., 2013;Aigrain et al., 2016). In dPCR, the reaction mixture is divided into thousands to millions of small partitions (physically separate reaction cells). ...
Article
Digital polymerase chain reaction (dPCR) is emerging as a reliable platform for quantifying microorganisms in the field of health-related water microbiology. This paper reviews the fundamental principles of dPCR and its applications for health-related water microbiology. The relevant literature indicates increasing adoption of dPCR for measuring fecal indicator bacteria, microbial source tracking marker genes, and pathogens in various aquatic environments. The adoption of dPCR has accelerated recently due to its increasing use for wastewater surveillance of SARS-CoV-2, the virus that causes COVID-19. The collective experience in the scientific literature indicates that well-optimized dPCR assays can quantify genetic fragments of microorganisms without the need for a calibration curve and often with superior analytical performance (i.e., greater sensitivity, precision, and reproducibility) than quantitative polymerase chain reaction (qPCR). Nonetheless, dPCR should not be viewed as a panacea for the fundamental uncertainties and limitations associated with measuring microorganisms in health-related water microbiology. With dPCR platforms, the sample analysis cost and processing time are typically greater than with qPCR. However, if improved analytical performance (i.e., sensitivity and accuracy) is required, dPCR can be an alternative option for quantifying microorganisms, including pathogens, in aquatic environments.
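The calibration-free quantification described above rests on partitioning: the reaction is split into many partitions, the fraction of positive partitions is counted, and a Poisson correction accounts for partitions that received more than one template copy. The Python sketch below illustrates this standard calculation. The function names are hypothetical, and the partition volume is an instrument-specific value that has to be supplied by the user; the 1 nL used in the example is an assumption, not a figure from any of the papers listed here.

```python
import math


def copies_per_partition(positive: int, total: int) -> float:
    """Mean template copies per partition, from the Poisson relation
    lambda = -ln(1 - p), where p is the fraction of positive partitions."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= positive <= total:
        raise ValueError("positive must be between 0 and total")
    if positive == total:
        # Every partition is positive: the estimate diverges (saturation).
        raise ValueError("all partitions positive; sample must be diluted")
    return -math.log(1.0 - positive / total)


def concentration_copies_per_ul(positive: int, total: int,
                                partition_volume_nl: float) -> float:
    """Template concentration in copies per microlitre of reaction mix.

    partition_volume_nl is an assumed, instrument-specific input.
    """
    if partition_volume_nl <= 0:
        raise ValueError("partition_volume_nl must be positive")
    lam = copies_per_partition(positive, total)
    return lam / (partition_volume_nl * 1e-3)  # convert nL to uL


if __name__ == "__main__":
    # Assumed example: 10,000 of 20,000 partitions positive, 1 nL each.
    print(concentration_copies_per_ul(10_000, 20_000, 1.0))
```

Because the estimate depends only on partition counts and partition volume, no standard curve is involved, which is the property the review highlights when comparing dPCR with qPCR.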
... As the number of washing steps increases, sample loss becomes worse. The DNA loss for short fragments is often greater than expected (Aigrain et al. 2016). Furthermore, the more often the lids of reaction tubes are opened or the samples are pipetted for the clean-up steps, the greater the risk of cross-contamination. ...
Article
Library preparation is an essential step for next-generation sequencing applications such as whole-genome sequencing, reduced-representation genome sequencing, exome sequencing and transcriptome sequencing. Library preparation often involves many steps, including DNA fragmentation, end repair, ligation and amplification. Each step involves different enzymes and buffer systems, so many washing steps are implemented in between to clean up the enzymes and solutes from the previous step. Those extra washing steps are not only tedious and costly but, more importantly, may introduce cross-contamination and reduce the final library yield. Here, we modified the common Illumina library preparation protocol to reduce the number of washing steps by deactivating the enzymes with high temperature. The modified protocol has two fewer washing steps than the original one, which can save more than 40 min of hands-on time and reduce the potential risk of cross-contamination. We compared our protocol with the original one by constructing libraries using 200 ng of Tetraodon nigroviridis DNA. The results showed that libraries prepared with the modified protocol had higher yields than those prepared with the original protocol (53.4 ± 16.8 ng/ml vs. 8 ± 0.7 ng/ml), whereas the coverage and PCR duplication rate were similar. Furthermore, we eliminated the very first washing step after DNA shearing to preserve short DNA fragments, which increased the proportion of DNA fragments shorter than 100 bp from 0.82% to 2.99%. In conclusion, the modified protocols not only save time and money, but also generate higher yields and retain more short DNA fragments. Supplementary information: The online version contains supplementary material available at 10.1007/s13205-022-03168-5.
Article
Circulating tumor DNA (ctDNA) is short and rare, which limits the detection performance of current targeted sequencing methods. We developed the One-PrimER Amplification (OPERA) system and examined its performance in detecting mutations of low variant allelic frequency (VAF) in various samples with short DNA fragments. In cell line-derived samples containing sonication-sheared DNA fragments of 50-150 bp, OPERA was capable of detecting mutations as low as 0.0025% VAF, while CAPP-Seq only detected mutations of >0.03% VAF. Both single-nucleotide variants and insertions/deletions can be detected by OPERA. In synthetic fragments as short as 80 bp with low VAF (0.03%-0.1%), the detection sensitivity of OPERA was significantly higher than that of droplet digital polymerase chain reaction. The error rate was 5.9 × 10⁻⁵ errors per base after de-duplication in plasma samples collected from healthy volunteers. By suppressing "single-strand errors", the error rate can be further lowered by more than 5-fold at the EGFR T790M hotspot. In plasma samples collected from lung cancer patients, OPERA detected mutations in 57.1% of stage I patients with 100% specificity and achieved a sensitivity of 30.0% in patients with a tumor volume of less than 1 cm³. OPERA can effectively detect mutations in rare and highly fragmented DNA.
Article
DNA sequence information underpins genetic research, enabling discoveries of important biological or medical benefit. Sequencing projects have traditionally used long (400-800 base pair) reads, but the existence of reference sequences for the human and many other genomes makes it possible to develop new, fast approaches to re-sequencing, whereby shorter reads are compared to a reference to identify intraspecies genetic variation. Here we report an approach that generates several billion bases of accurate nucleotide sequence per experiment at low cost. Single molecules of DNA are attached to a flat surface, amplified in situ and used as templates for synthetic sequencing with fluorescent reversible terminator deoxyribonucleotides. Images of the surface are analysed to generate high-quality sequence. We demonstrate application of this approach to human genome sequencing on flow-sorted X chromosomes and then scale the approach to determine the genome sequence of a male Yoruba from Ibadan, Nigeria. We build an accurate consensus sequence from >30x average depth of paired 35-base reads. We characterize four million single-nucleotide polymorphisms and four hundred thousand structural variants, many of which were previously unknown. Our approach is effective for accurate, rapid and economical whole-genome re-sequencing and many other biomedical applications.
Article
Targeted therapies in many cancers have allowed unprecedented progress in the treatment of disease. However, routine implementation of genomic testing is constrained due to: 1) limited amounts of sample (pg–ng range) per biological specimen, 2) diagnostic turnaround time and workflow, 3) cost, and 4) difficulties in detection of mutational loads below 5%. KRAS is mutated in approximately 40% of colorectal cancers (CRCs). The majority of mutations affect codons 12, 13, and 61 and indicate a negative response to anti–epidermal growth factor receptor (EGFR) therapy. To optimize therapy strategies for personalized care, it is critical to rapidly screen patient samples for the presence of multiple KRAS mutations.
Technical Report
Next-generation whole genome sequencing of microbes demands rapid, robust, and scalable library construction workflows, capable of generating high-quality sequence data across a wide range of genome sizes, complexities and genomic GC content. In this Application Note, we describe a streamlined library preparation method that results in minimal bias, high uniform coverage, and facilitates de novo assembly of microbial genomes.
Article
The recent introduction of Droplet Digital PCR (ddPCR) has provided researchers with a tool that permits direct quantification of nucleic acids from a wide range of samples with increased precision and sensitivity versus RT-qPCR. The sample interdependence of RT-qPCR stemming from the measurement of Cq and ΔCq values is eliminated with ddPCR which provides an independent measure of the absolute nucleic acid concentration for each sample without standard curves thereby reducing inter-well and inter-plate variability. Well-characterized RNA purified from H275-wild type (WT) and H275Y-point mutated (MUT) neuraminidase of influenza A (H1N1) pandemic 2009 virus was used to demonstrate a ddPCR optimization workflow to assure robust data for downstream analysis. The ddPCR reaction mix was also tested with RT-qPCR and gave excellent reaction efficiency (between 90% and 100%) with the optimized MUT/WT duplexed assay thus enabling the direct comparison of the two platforms from the same reaction mix and thermal cycling protocol. ddPCR gave a marked improvement in sensitivity (>30-fold) for mutation abundance using a mixture of purified MUT and WT RNA and increased precision (>10 fold, p<0.05 for both inter- and intra-assay variability) versus RT-qPCR from patient samples to accurately identify residual mutant viral population during recovery. Copyright © 2015 The Authors. Published by Elsevier B.V. All rights reserved.
Article
In this project, a highly precise quantitative method based on the digital polymerase chain reaction (dPCR) technique was developed to determine the weight of pork and chicken in meat products. Real-time quantitative polymerase chain reaction (qPCR) is currently used for quantitative molecular analysis of the presence of species-specific DNAs in meat products. However, it is limited by amplification efficiency and by its reliance on standard-curve-based Ct values, particularly when detecting and quantifying low-copy-number target DNA, as in some complex mixed meat products. Using the dPCR method, we found that the relationships between raw meat weight and DNA weight, and between DNA weight and DNA copy number, were both close to linear. This enabled us to establish formulae to calculate the raw meat weight from the DNA copy number. The accuracy and applicability of this method were tested and verified using samples of pork and chicken powder mixed in known proportions. Quantitative analysis indicated that dPCR is highly precise in quantifying pork and chicken in meat products and therefore has the potential to be used in routine analysis by government regulators and quality control departments of commercial food and feed enterprises.
Article
To ensure efficient sequencing, the DNA of next-generation sequencing (NGS) libraries must be quantified correctly. Therefore, an accurate, sensitive and stable method for DNA quantification is crucial. In this study, seven different methods for DNA quantification were compared by quantifying NGS libraries for the Ion Torrent™ and Illumina® platforms as well as dsDNA oligos with known DNA concentrations. Rather large variations in library concentration estimates were observed. The highest and lowest concentration estimates differed by a factor of 5-100, depending on the library concentration. The Bioanalyzer, TapeStation and Qubit® instruments gave concentrations closest to the expected values when quantifying dsDNA oligos. At very low concentrations (2-4 pg/µl) only the Bioanalyzer could reliably quantify the dsDNA oligos.
Article
Crizotinib treatment significantly prolongs progression-free survival, increases response rates, and improves the quality of life in patients with ALK-positive non-small-cell lung cancer. Droplet Digital PCR (ddPCR), a recently developed technique with high sensitivity and specificity, was used in this study to evaluate the association between the abundance of ALK rearrangements and crizotinib effectiveness. FFPE tissues were obtained from 103 consecutive patients with lung adenocarcinoma. Fluorescent in situ hybridization (FISH) and ddPCR were performed. The results revealed that 14 (13.6%) of the 103 patients were positive by dual-color, break-apart FISH. Three variants (1, 2, and 3) of the EML4-ALK gene rearrangements were detected. Thirteen of 14 ALK-positive cases identified by FISH were confirmed by ddPCR (four with variant 1, two with variant 2, and seven with variant 3). The case missed by ddPCR was identified as KIF5B-ALK gene rearrangement by PCR-based direct sequencing. Sixteen patients were detected with low copy numbers of EML4-ALK gene rearrangement, which failed to meet the positive cutoff point of FISH. Two of them responded well to crizotinib after unsuccessful chemotherapy. Our study indicates that ddPCR can be used as a molecular analytical tool to accurately measure the EML4-ALK rearrangement copy numbers in FFPE samples of lung adenocarcinoma patients. Copyright © 2015 American Society for Investigative Pathology and the Association for Molecular Pathology. Published by Elsevier Inc. All rights reserved.
Article
Clinical microbiology laboratories rely on quantitative PCR (qPCR) for its speed, sensitivity, specificity and ease-of-use. However, qPCR quantitation requires the use of a standard curve or normalization to reference genes. Droplet digital PCR (ddPCR) provides absolute quantitation without the need for calibration curves. A comparison between ddPCR and qPCR-based analyses was conducted for the enteric parasite Cryptosporidium, which is an important cause of gastritis in both humans and animals. Two loci were analysed (18S rRNA and actin) using a range of Cryptosporidium DNA templates, including recombinant plasmids, purified haemocytometer-counted oocysts, commercial flow cytometry-counted oocysts and faecal DNA samples from sheep, cattle and humans. Each method was evaluated for linearity, precision, limit of detection (LOD) and cost. Across the same range of detection, both methods showed a high degree of linearity and positive correlation for standards (R² ≥ 0.999) and faecal samples (R² ≥ 0.9750). The precision of ddPCR, as measured by mean Relative Standard Deviation (RSD; %), was consistently better compared with qPCR, particularly for the 18S rRNA locus, but was poorer as DNA concentration decreased. The quantitative detection of qPCR was unaffected by DNA concentration, but ddPCR was less affected by the presence of inhibitors, compared with qPCR. For most templates analysed including Cryptosporidium-positive faecal DNA, the template copy numbers, as determined by ddPCR, were consistently lower than by qPCR. However, the quantitations obtained by qPCR are dependent on the accuracy of the standard curve and when the qPCR data were corrected for pipetting and DNA losses (as determined by ddPCR), then the sensitivity of both methods was comparable. A cost analysis based on 96 samples revealed that the overall cost (consumables and labour) of ddPCR was two times higher than qPCR. Using ddPCR to precisely quantify standard dilutions used for high-throughput and cost-effective amplifications by qPCR would be one way to combine the advantages of the two technologies.
Article
Ten years ago next-generation sequencing (NGS) technologies appeared on the market. During the past decade, tremendous progress has been made in terms of speed, read length, and throughput, along with a sharp reduction in per-base cost. Together, these advances democratized NGS and paved the way for the development of a large number of novel NGS applications in basic science as well as in translational research areas such as clinical diagnostics, agrigenomics, and forensic science. Here we provide an overview of the evolution of NGS and discuss the most significant improvements in sequencing technologies and library preparation protocols. We also explore the current landscape of NGS applications and provide a perspective for future developments.