Quality Control of Library
Construction Pipeline for PacBio
SMRTbell 10 kb Library Using an
Agilent 2200 TapeStation System
Authors
Nguyet Kong, Whitney Ng, and
Bart Weimer
100K Pathogen Genome Project
Population Health and Reproduction
Department
School of Veterinary Medicine
University of California-Davis
Davis, CA, USA
Lenore Kelly
Agilent Technologies, Inc.
Santa Clara, CA, USA
Application Note
Abstract
The PacBio RS II uses a single molecule real-time (SMRT) DNA technology to
sequence molecules and detect DNA modifications in bacterial genomes. SMRT has
enabled the understanding of key biomarkers pertinent to microbial genome stability
and pathogenicity, and it is a useful tool for creating robust diagnostics. As with
many next generation sequencing technologies, high-quality sequence production
starts with the production of high molecular weight genomic DNA. Large
scale high-throughput sequencing projects such as the 100K Pathogen Genome
Project require methods that can rapidly assess quantity and quality of DNA in a
multiplexed format to enable the use of a streamlined sequencing pipeline. In this
study, the Agilent 2100 Bioanalyzer and Agilent 2200 TapeStation Systems were
used to analyze sheared DNA and final libraries for sequencing. The 2200
TapeStation System’s ability to analyze the various quality control steps required in
library construction on a single platform proved to be advantageous in a
high-throughput sequencing pipeline.
Introduction
The 100K Foodborne Pathogen Genome Project
(http://100kgenome.vetmed.ucdavis.edu/) is a global effort to
sequence the genomes of 100,000 microbes that are impor-
tant to food security through a consortium of government,
academic, and industrial partners. Recent advancements in
DNA sequencing technologies allow whole genome sequenc-
ing to be accomplished at an unprecedented rate and
sequencing quality. The use of next generation sequencing
(NGS) methods has increased productivity and revealed new
information about the structure and function of bacterial
genomes. NGS methods rely on short reads to produce draft
genomes, whereas single molecule real-time (SMRT) DNA
sequencing on the PacBio RS II
provides extra-long read lengths that can improve genome
assembly by reducing the number of contigs. However, to
generate long subread lengths, we need high-quality double
stranded gDNA with high molecular weight fragments. The
DNA must be sheared to a target size of 10 kb. When the
sample input is already fragmented, it will be difficult to con-
struct a library with inserts averaging 10 kb. Additionally,
removal of short insert SMRTbells will allow more efficient
loading of large inserts. As a result, consensus from de novo
assembly and targeted sequencing is more accurate.
Structural and cell type variation can also be observed, a
feature not accessible with any other sequencing technology.
In large-scale sequencing projects such as the 100K
Foodborne Pathogen Genome Project, robust methods of DNA
quantification and sizing using high-throughput procedures
from DNA extraction to library construction are critical in
various steps (Figure 1) [1-4].
DNA quantity and size are often assessed using gel
electrophoresis, but this approach is not suitable for a
high-throughput workflow. Estimating DNA size and concentration
against a ladder on an agarose gel offers
low resolution and cannot be automated for increased
reproducibility. The Pacific Biosciences template preparation
process does not use amplification techniques and requires
double stranded high molecular weight gDNA to be made into
a SMRTbell template. Quality assessments are recommended
to ensure that the DNA is pure and of high molecular weight
prior to fragmentation using the Covaris instrument. Currently,
Pacific Biosciences uses the Agilent 2100 Bioanalyzer instru-
ment with the DNA 12000 assay to perform qualitative and
quantitative analysis of the DNA fragments. Ideally, there
should be a distribution of the sheared fragments around
10 kb with some loss of starting material. The 2100
Bioanalyzer instrument with the DNA 12000 assay offers a
high-resolution analysis of DNA fragments up to 12,000 bp
with up to 12 samples per chip. It is advantageous to use a
system that enables analysis of gDNA as well as other steps
in the process of library construction. An alternative is the
Agilent 2200 TapeStation System, which resolves gDNA,
sheared DNA, and final libraries in a high-throughput format,
up to 96 samples per run. It also offers the ability to quanti-
tate and size DNA [5,6]. This system offers the characteristics
required by a workflow of microbial genome library prepara-
tion for a project of this magnitude.
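To illustrate what the difference in format means at project scale, the following minimal sketch counts the runs needed at the two capacities quoted above (12 samples per DNA 12000 chip, up to 96 samples per 2200 TapeStation run). The function name and the assumption of one QC measurement per genome are illustrative only and are not part of the published workflow.

```python
import math

# Capacities quoted in the text.
CHIP_CAPACITY = 12    # samples per 2100 Bioanalyzer DNA 12000 chip
PLATE_CAPACITY = 96   # samples per 2200 TapeStation run (96-well plate)


def runs_needed(n_samples: int, capacity: int) -> int:
    """Return the number of runs needed to analyze n_samples."""
    if n_samples < 0 or capacity <= 0:
        raise ValueError("n_samples must be >= 0 and capacity must be > 0")
    return math.ceil(n_samples / capacity)


if __name__ == "__main__":
    # Assumption: a single QC measurement for each of 100,000 genomes.
    n = 100_000
    print("DNA 12000 chips:", runs_needed(n, CHIP_CAPACITY))
    print("TapeStation runs:", runs_needed(n, PLATE_CAPACITY))
```

Because the pipeline has three QC points (gDNA, sheared DNA, and final library), the real number of measurements per genome would be higher than this single-measurement estimate.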
Figure 1. 100K Pathogen Genome Project sample preparation workflow for PacBio 10 kb Library Preparation using
the Agilent 2200 TapeStation System for quality control.
Next generation sequencing pipeline (workflow step: workflow need)
1. Genomic DNA extraction: consistent cell lysis to yield HMW gDNA
2. QC, gDNA: HTP method to measure HMW gDNA
3. Shear gDNA: consistent method
4. QC, fragmented DNA: HTP method to measure fragmented gDNA
5. Library construction: automated protocols
6. QC, library quality and quantity: consistent method to measure final library
7. Library normalization and pooling: consistent ratios of each library
8. NGS sequencing
Application notes shown in the figure: Lysis and extraction (5991-3722EN);
Agilent 2200 TapeStation (5991-4003EN and 5991-5075EN); Short read, KAPA
(5991-4296EN); Long read, PacBio (5991-4482EN); this Application Note.
Three steps are marked "Need automation".
The system automatically loads the prepared samples from
the 96-well plates onto the Genomic DNA ScreenTape. The
analysis is automated and reports an electropherogram and
a gel image. The high resolution of the 2200 TapeStation
analysis is important to assess the quality of the gDNA,
ensuring that it is intact with minimal degradation prior to
using the Covaris g-TUBE device to shear the gDNA into the
targeted fragment size [7]. It is critical that the fragmented
DNA is accurately quantified in preparation for the PacBio
SMRTbell 10 kb Library Construction on the Agilent NGS
Workstation. This application note presents a comparative
study of the quantification and sizing of microbial
gDNA, sheared DNA, and final libraries obtained with the
2200 TapeStation System using the Genomic DNA ScreenTape
assay.
Figure 2. Electropherogram and gel image of high molecular weight gDNA from an Agilent 2200 TapeStation
using the Genomic DNA ScreenTape System.
Table 1.
ID no.  Color  Bacterium                                GC content (%)  Gram reaction  Approx. genome size (Mb)
1       Green  Campylobacter jejuni                     30              Negative       2
2       Blue   Listeria monocytogenes                   38              Positive       2
3       Aqua   Vibrio fluvialis                         41              Negative       5
4       Red    Salmonella enterica serovar Enteritidis  52              Negative       5
Materials and Methods
The 100K Pathogen Genome Project sample preparation
workflow begins with high molecular weight gDNA [1-3].
Libraries were successfully constructed from selected bacterial
isolates with different GC content, which supports the use of the
2200 TapeStation throughout the entire library construction pipeline.
These microbes are listed in Table 1, and include:
Campylobacter (%GC = 30), Listeria (%GC = 38),
Vibrio (%GC = 43), and Salmonella (%GC = 52). DNA was
extracted, followed by cleanup with a Qiagen QIAamp
DNA Mini Kit (51306) according to the manufacturer’s instructions [1].
The extracted DNA was analyzed using the 2200 TapeStation
System for high molecular weight DNA prior to shearing
(Figure 2) [4-6].
[Figure 2 plot: (A) electropherogram, sample intensity (FU) versus fragment size (bp); (B) gel image of samples 1 to 4 against a 100 to 48,500 bp ladder.]
DNA was sheared using a Covaris g-TUBE (520079) [7]. The size
and quantity of the fragmented DNA were determined with the
2200 TapeStation System using the Genomic DNA ScreenTape
assay to confirm a normal size distribution around a ~10 kb
peak (Figure 3). Fragmented DNA was quantified using the
method of Jeannotte, et al. (5991-4003EN), which describes
a 96-well plate high-throughput workflow on the 2200 TapeStation
System for quantitation and sizing of gDNA
samples for library construction [5,6,8]. Results are obtained
in approximately one to two minutes per sample.
The input into library construction with the PacBio SMRTbell
10 kb Library Preparation (Figure 4) was normalized for all
samples. Once the standard 10 kb PacBio library was made
on the Agilent NGS Workstation [9,10], the size of the final
library was confirmed with the 2200 TapeStation System using the
Genomic DNA ScreenTape assay. Libraries were quantified using a Life Technologies
Qubit 2.0 Fluorometer with a dsDNA HS assay (Q32854)
following the manufacturer's protocol before submission to the
sequencing facility [11].
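Normalizing the input means drawing the same DNA mass from every sample, so the volume taken depends on each sample's measured concentration. The sketch below shows that arithmetic under stated assumptions: the function name, the 1,000 ng target, and the example concentrations are hypothetical and are not values reported in this note.

```python
def volume_for_mass_ul(target_ng: float, conc_ng_per_ul: float) -> float:
    """Volume (uL) that contains target_ng of DNA at the given concentration."""
    if target_ng <= 0:
        raise ValueError("target_ng must be positive")
    if conc_ng_per_ul <= 0:
        raise ValueError("conc_ng_per_ul must be positive")
    return target_ng / conc_ng_per_ul


if __name__ == "__main__":
    # Hypothetical concentrations (ng/uL), for illustration only.
    concentrations = {"sample_1": 50.0, "sample_2": 80.0, "sample_3": 125.0}
    target_ng = 1000.0  # hypothetical normalized input mass
    for sample, conc in concentrations.items():
        vol = volume_for_mass_ul(target_ng, conc)
        print(f"{sample}: take {vol:.1f} uL for {target_ng:.0f} ng")
```

In practice the computed volume would also need to be checked against the volume actually available and the volume limits of the library preparation protocol.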
Results and Discussion
Genomic DNA from four bacterial samples was extracted and
analyzed for quantity and quality. The 260/280 nm and
260/230 nm ratios were used to quickly assess contamination
of the gDNA with protein or organics, respectively. A
ratio of 1.8 was acceptable to proceed with additional DNA
assessment. Because the 2100 Bioanalyzer System resolves
DNA fragments only up to 12,000 bp, which is
much smaller than intact gDNA, it is appropriate for QC of
fragments after shearing but not of gDNA. In contrast, the 2200 TapeStation
System with the Genomic DNA ScreenTape assay is able to
assess the quality and quantity of gDNA with minimal user
intervention. In electropherogram mode, each gDNA sample
was measured to determine the DNA size and quantity against an
acceptable size of > 50 kb for the NGS pipeline. An overlay
of the electropherograms and a virtual gel image of the high
molecular weight gDNA used for library construction from
these microbes are provided in
Figure 2.
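The two acceptance criteria described here (an absorbance ratio of 1.8 and a gDNA size above 50 kb) can be expressed as a simple gate. This is a minimal sketch, not the project's actual software: the class and function names are hypothetical, and treating 1.8 as a lower bound on the 260/280 ratio is an assumption, since the text does not state how the ratio was applied.

```python
from dataclasses import dataclass


@dataclass
class GdnaQc:
    sample_id: str
    a260_280: float   # absorbance ratio from spectrophotometry
    size_kb: float    # gDNA size reported by the sizing assay


def passes_gdna_qc(qc: GdnaQc,
                   min_ratio: float = 1.8,
                   min_size_kb: float = 50.0) -> bool:
    """Return True if the sample meets both thresholds.

    Assumption: the ratio threshold is a minimum (>= 1.8) and the size
    threshold is strict (> 50 kb), following the wording "> 50 kb".
    """
    return qc.a260_280 >= min_ratio and qc.size_kb > min_size_kb


if __name__ == "__main__":
    # Hypothetical measurements, for illustration only.
    samples = [
        GdnaQc("intact", 1.85, 55.0),
        GdnaQc("degraded", 1.85, 40.0),
        GdnaQc("impure", 1.60, 60.0),
    ]
    for s in samples:
        print(s.sample_id, "PASS" if passes_gdna_qc(s) else "FAIL")
```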
[Figure 3 plot: (A) electropherogram, sample intensity (FU) versus fragment size (bp); (B) virtual gel of samples 1 to 4 against a 100 to 48,500 bp ladder.]
Figure 3. Representative electropherogram (A) and virtual gel (B) of sheared bacterial genomic DNA
generated using the Agilent 2200 TapeStation System with the Genomic DNA ScreenTape assay.
Average shearing sizes: Campylobacter (green, 16 kb), Listeria (blue, 12 kb), Vibrio (aqua,
14 kb), and Salmonella (red, 20 kb).
Sheared data
After the gDNA size is determined to be > 50 kb by the 2200
TapeStation System, it is fragmented using a Covaris g-TUBE
device following the manufacturer's protocol for shearing DNA
to make PacBio 10 kb libraries. Once the DNA is fragmented,
the size is normally determined using the 2100 Bioanalyzer
System with a DNA 12000 Kit. This application note used the
2200 TapeStation System with the Genomic DNA ScreenTape
assay to confirm a normal size distribution (Figure 3). The
advantage of the 2200 TapeStation System is the unique
capacity to quantify double-stranded gDNA across a large
range of sizes, especially size ranges larger than 12 kb. The
2200 TapeStation System is able to consistently measure the
size on the Genomic DNA ScreenTape assay (Figure 3).
Table 2 shows the average shearing size of the four bacteria,
many within the upper range of the 2100 Bioanalyzer System.
However, for Salmonella, whose sheared DNA averaged 20 kb, it is best
to use the 2200 TapeStation System with the Genomic DNA
ScreenTape assay. This assay can size fragments up to 60 kb, so 20 kb
is easily determined. For the PacBio 10 kb
library construction workflow, correct sizing is important
so that the sequencing facility can properly load the
libraries. This capability could introduce a new paradigm in
DNA fragmentation and library construction in which the
input of double-stranded gDNA is quantified precisely
in the size range of interest. An input range of 1 to 5 µg of sheared
gDNA from four different bacterial isolates of varying GC
content (Table 2) was used to construct libraries.
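The sizing limits discussed in this section (up to 12 kb for the DNA 12000 assay, up to 60 kb for the Genomic DNA ScreenTape assay) suggest a simple check of which assay covers an expected fragment size. The sketch below is illustrative only: the function name is hypothetical, and it considers nominal upper limits alone, ignoring lower limits and the marker overlap that can affect fragments near 12 kb.

```python
# Nominal upper sizing limits quoted in the text (kb).
ASSAY_MAX_KB = {
    "2100 Bioanalyzer, DNA 12000": 12.0,
    "2200 TapeStation, Genomic DNA ScreenTape": 60.0,
}


def assays_covering(size_kb: float) -> list[str]:
    """Return the assays whose nominal upper limit covers size_kb."""
    if size_kb <= 0:
        raise ValueError("size_kb must be positive")
    return [name for name, max_kb in ASSAY_MAX_KB.items() if size_kb <= max_kb]


if __name__ == "__main__":
    # Average shearing sizes from Table 2 (kb).
    shearing_kb = {
        "C. jejuni": 11.0,
        "L. monocytogenes": 12.0,
        "V. fluvialis": 14.0,
        "S. Enteritidis": 20.0,
    }
    for organism, size in shearing_kb.items():
        print(f"{organism} ({size:g} kb): {', '.join(assays_covering(size))}")
```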
Final library
Libraries were made following the PacBio SMRTbell 10 kb
Library Preparation procedure. PacBio library size is traditionally
confirmed with the 2100 Bioanalyzer System using the DNA
12000 Kit. As expected, when the four PacBio libraries were
analyzed this way, the samples overlapped with the upper marker.
Because the 2200 TapeStation System can size larger fragments, up to
60 kb, it resolved the libraries better,
without interference from other components within the assay
(Figure 4).
Figure 4. Representative electropherogram (A) and virtual gel (B) of DNA libraries prepared for
sequencing with the PacBio SMRTbell 10 kb Library Preparation Kit on the Agilent NGS
Workstation, generated with the Agilent 2200 TapeStation System with the Genomic DNA
ScreenTape assay. Average library sizes: Campylobacter (green, 19 kb), Listeria
(blue, 19 kb), Vibrio (aqua, 26 kb), and Salmonella (red, 24 kb).
Table 2. Average Shearing Size and Average Final Library for each Bacterium
ID no.  Color  Bacterium         Average shearing (kb),  Average final library (kb),  Average final library (kb),
                                 Agilent TapeStation     Agilent TapeStation          Competitor
1       Green  C. jejuni         11                      19                           16
2       Blue   L. monocytogenes  12                      19                           21
3       Aqua   V. fluvialis      14                      25                           25.5
4       Red    S. Enteritidis    20                      24                           25
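As a rough way to read the final library columns of Table 2, the following sketch computes the relative difference between the TapeStation and competitor sizes for each organism. The sizes are copied from Table 2; the function name and the choice of the competitor value as the reference are assumptions made for illustration, and the note itself does not report these percentages.

```python
# Average final library sizes from Table 2, in kb:
# organism -> (Agilent TapeStation, competitor)
FINAL_LIBRARY_KB = {
    "C. jejuni": (19.0, 16.0),
    "L. monocytogenes": (19.0, 21.0),
    "V. fluvialis": (25.0, 25.5),
    "S. Enteritidis": (24.0, 25.0),
}


def percent_difference(tapestation_kb: float, competitor_kb: float) -> float:
    """Signed difference of the TapeStation size relative to the competitor, in %."""
    if competitor_kb <= 0:
        raise ValueError("competitor_kb must be positive")
    return 100.0 * (tapestation_kb - competitor_kb) / competitor_kb


if __name__ == "__main__":
    for organism, (ts_kb, comp_kb) in FINAL_LIBRARY_KB.items():
        diff = percent_difference(ts_kb, comp_kb)
        print(f"{organism}: {ts_kb:g} kb vs {comp_kb:g} kb ({diff:+.1f}%)")
```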
[Figure 4 plot: (A) electropherogram, sample intensity (FU) versus fragment size (bp); (B) virtual gel of samples 1 to 4 against a 100 to 48,500 bp ladder.]
www.agilent.com/chem
Agilent shall not be liable for errors contained herein or for incidental or consequential
damages in connection with the furnishing, performance, or use of this material.
Information, descriptions, and specifications in this publication are subject to change
without notice.
© Agilent Technologies, Inc., 2015
Printed in the USA
December 18, 2015
5991-6521EN
Conclusion
Genomic DNA from four bacteria was extracted and analyzed
for quantity and quality using the Agilent 2200 TapeStation
System with the Genomic DNA ScreenTape assay in place of
the traditionally used Agilent 2100 Bioanalyzer System.
Fragmented DNA for PacBio 10 kb libraries is generally
greater than 10 kb in size. The classical approach,
recommended by Pacific Biosciences, is to analyze it using
the 2100 Bioanalyzer System with the DNA 12000 Kit. When
PacBio libraries are larger than 10 kb, they can merge into
the upper marker, causing inaccuracies when assessing the
DNA fragment sizes and concentrations. Accurate quantification
and sizing of the final PacBio libraries is necessary for
the proper annealing of the sequencing primer and binding
of the polymerase to the SMRTbell templates before loading on the
PacBio RS II sequencer. The 2200 TapeStation System is a
high-throughput alternative that offers sufficient resolution
of large DNA fragments. The range of the 2200 TapeStation
System provides the versatility and convenience of using one
platform to accurately assess all of the important quality
control steps required of PacBio library construction.
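One reason sizing accuracy matters is that converting a mass concentration into a molar concentration depends on fragment length, so a sizing error carries directly into the molar estimate. The sketch below uses the common approximation of 660 g/mol per base pair of double-stranded DNA. It is an assumption-laden illustration, not the PacBio calculation: it ignores the SMRTbell hairpin adapters, and the function name and example values are hypothetical.

```python
AVG_BP_MASS_G_PER_MOL = 660.0  # common approximation for one dsDNA base pair


def dsdna_nanomolar(conc_ng_per_ul: float, length_bp: float) -> float:
    """Approximate molar concentration (nM) of dsDNA fragments.

    nM = (ng/uL * 1e6) / (660 g/mol/bp * length in bp)
    """
    if conc_ng_per_ul < 0:
        raise ValueError("conc_ng_per_ul must be non-negative")
    if length_bp <= 0:
        raise ValueError("length_bp must be positive")
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS_G_PER_MOL * length_bp)


if __name__ == "__main__":
    conc = 10.0  # hypothetical library concentration, ng/uL
    # Same mass concentration, two different size estimates.
    for size_bp in (12_000, 20_000):
        print(f"{size_bp} bp: {dsdna_nanomolar(conc, size_bp):.3f} nM")
```

For a fixed mass concentration, the molar estimate falls in proportion to the assumed size, so a library sized at 12 kb when it is really 20 kb would have its molarity overestimated by the same ratio.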
Acknowledgements
We gratefully acknowledge the technical assistance provided
by Vivian Lee, Alvin Leonardo, Lucy Cai, San Mak, Kendra Liu,
Patrick Ancheta and Regina Agulto from the laboratory of Dr.
Bart Weimer at University of California, Davis.
References
1. Qiagen QIAamp DNA Mini Kit: http://www.qiagen.com/products/catalog/sample-technologies/dna-sample-technologies/genomic-dna/qiaampdna-mini-kit
2. N. Kong, et al. Production and Analysis of High Molecular
Weight Genomic DNA for NGS Pipelines Using Agilent
DNA Extraction Kit (p/n 200600), Agilent Technologies,
publication number 5991-3722EN (2014).
3. B. Ganesan, et al. “Identification of the Leucine-to-2-
Methylbutyric Acid Catabolic Pathway of Lactococcus
lactis” Appl. Environ. Microbiol. 72, 4264-73 (2006).
4. Y. Xie, et al. “Expression profile of Lactococcus lactis
ssp. lactis IL1403 during environmental stress with a
DNA macroarray” Appl. Environ. Microbiol. 70, 6738-6747
(2004).
5. Agilent 2200 TapeStation User Manual, Agilent
Technologies (p/n G2964-90001).
6. Agilent Genomic DNA ScreenTape System Quick Guide,
Agilent Technologies (p/n G2964-90040 rev.B).
7. Covaris User Manual g-TUBE, http://covarisinc.com/wp-content/uploads/pn_010154.pdf
8. R. Jeannotte, et al. “High-Throughput Analysis of
Foodborne Bacterial Genomic DNA Using Agilent 2200
TapeStation and Genomic DNA ScreenTape System”
Agilent Technologies, publication number 5991-4003EN,
(2014).
9. PacBio Procedure & Checklist - Low-Input 10 kb Library
Preparation and Sequencing, http://www.pacificbiosciences.com/samplenet/ProcedureChecklistLowInput10kbLibraryPreparationandSequencingMagBeadStation.pdf
10. Agilent NGS Bravo Workstation (G5541A)
11. Qubit 2.0 Fluorometer User Manual, Life Technologies:
http://www.ebc.uu.se/digitalAssets/176/176882_3qubit2fluorometerusermanual.pdf
For More Information
These data represent typical results. For more information
on our products and services, visit our Web site at
www.agilent.com/chem.