Towards quantitative metagenomics of wild viruses and other ultra-low concentration DNA samples: a rigorous assessment and optimization of the linker amplification method.
ABSTRACT Metagenomics generates and tests hypotheses about dynamics and mechanistic drivers in wild populations, yet commonly suffers from insufficient (< 1 ng) starting genomic material for sequencing. Current solutions for amplifying sufficient DNA for metagenomics analyses include linear amplification for deep sequencing (LADS), which requires more DNA than is normally available, linker-amplified shotgun libraries (LASLs), which is prohibitively low throughput, and whole-genome amplification, which is significantly biased and thus non-quantitative. Here, we adapt the LASL approach to next generation sequencing by offering an alternate polymerase for challenging samples, developing a more efficient sizing step, integrating a 'reconditioning PCR' step to increase yield and minimize late-cycle PCR artefacts, and empirically documenting the quantitative capability of the optimized method with both laboratory isolate and wild community viral DNA. Our optimized linker amplification method requires as little as 1 pg of DNA and is the most precise and accurate available, with G + C content amplification biases less than 1.5-fold, even for complex samples as diverse as a wild virus community. While optimized here for 454 sequencing, this linker amplification method can be used to prepare metagenomics libraries for sequencing with next-generation platforms, including Illumina and Ion Torrent, the first of which we tested and present data for here.
-
Citations (0)
- Cited In (1)
-
Article: The Pacific Ocean Virome (POV): A Marine Viral Metagenomic Dataset and Associated Protein Clusters for Quantitative Viral Ecology.
[show abstract] [hide abstract]
ABSTRACT: Bacteria and their viruses (phage) are fundamental drivers of many ecosystem processes including global biogeochemistry and horizontal gene transfer. While databases and resources for studying function in uncultured bacterial communities are relatively advanced, many fewer exist for their viral counterparts. The issue is largely technical in that the majority (often 90%) of viral sequences are functionally 'unknown' making viruses a virtually untapped resource of functional and physiological information. Here, we provide a community resource that organizes this unknown sequence space into 27 K high confidence protein clusters using 32 viral metagenomes from four biogeographic regions in the Pacific Ocean that vary by season, depth, and proximity to land, and include some of the first deep pelagic ocean viral metagenomes. These protein clusters more than double currently available viral protein clusters, including those from environmental datasets. Further, a protein cluster guided analysis of functional diversity revealed that richness decreased () from deep to surface waters, () from winter to summer, () and with distance from shore in surface waters only. These data provide a framework from which to draw on for future metadata-enabled functional inquiries of the vast viral unknown.PLoS ONE 01/2013; 8(2):e57355. · 4.09 Impact Factor
Page 1
Towards quantitative metagenomics of wild viruses
and other ultra-low concentration DNA samples:
a rigorous assessment and optimization of the
linker amplification methodemi_27911..12
Melissa B. Duhaime,1,†Li Deng1,2, Bonnie T. Poulos1
and Matthew B. Sullivan1,*
1Department of Ecology and Evolutionary Biology,
University of Arizona, Tucson, AZ, USA.
2Helmholtz Zentrum München-German Research Center
for Environmental Health, Institute of Groundwater
Ecology, Neuherberg, Germany.
Summary
Metagenomics generates and tests hypotheses about
dynamics and mechanistic drivers in wild popula-
tions, yet commonly suffers from insufficient (< 1 ng)
starting genomic material for sequencing. Current
solutions for amplifying sufficient DNA for metage-
nomics analyses include linear amplification for deep
sequencing (LADS), which requires more DNA than is
normally available, linker-amplified shotgun libraries
(LASLs), which is prohibitively low throughput, and
whole-genome amplification, which is significantly
biased and thus non-quantitative. Here, we adapt the
LASL approach to next generation sequencing by
offering an alternate polymerase for challenging
samples, developing a more efficient sizing step, inte-
grating a ‘reconditioning PCR’ step to increase yield
and minimize late-cycle PCR artefacts, and empiri-
cally documenting the quantitative capability of the
optimized method with both laboratory isolate and
wild community viral DNA. Our optimized linker
amplification method requires as little as 1 pg of DNA
and is the most precise and accurate available, with
G + C content amplification biases less than 1.5-fold,
even for complex samples as diverse as a wild virus
community. While optimized here for 454 sequencing,
this linker amplification method can be used to
prepare metagenomics libraries for sequencing with
next-generation platforms, including Illumina and Ion
Torrent, the first of which we tested and present data
for here.
Introduction
Microbial processes drive much of the biogeochemistry
that fuels the planet (Falkowski et al., 2008), and viruses
meddle with these microbial processes at the level of the
single cell hosts they infect, resulting in modulation of
local- and global-scale biogeochemical processes. This
has been best demonstrated in the ocean cyanobacteria
and their viruses (cyanophages), whose genomes contain
metabolically and environmentally significant genes,
including genes for photosynthesis (Mann et al., 2003;
Lindell et al., 2004; Millard et al., 2004; Sullivan et al.,
2006), phosphate stress response (Sullivan et al., 2005),
nitrogen stress response (Sullivan et al., 2010), and
nucleotide scavenging (Sullivan et al., 2005). In model
systems, these core photosynthesis genes are expressed
(Clokie et al., 2006; Lindell et al., 2007) and translated
into proteins (Lindell et al., 2005) during infection, and are
predicted to boost phage fitness (Bragg and Chisholm,
2008; Hellweger, 2009). Further, cyanophages shuffle
genes in niche-defining host genomic islands (Coleman
et al., 2006; Kettler et al., 2007; Rodriguez-Valera et al.,
2009), resulting in viral-driven changes of the host cell
surface (Avrani et al., 2011).
Yet, in most environments, there are few such model
systems available, and ultimately a community-scale
context is needed to understand the extent to which
model system findings are a reliable proxy for wild popu-
lations. Researchers commonly turn to whole community
sequencing, metagenomics (Handelsman et al., 1998), to
probe viral and microbial diversity, protein function and
population genomics. Many of these studies are hindered
by limited biomass, a consequence of targeted genomics
[e.g. stable-isotope probing (Neufeld et al., 2007), cell
sorting (Woyke et al., 2009)], low cell density microbial
communities (Biddle et al., 2008) or, as in virus studies,
small target genome sizes. For example, a typical 20 l
ocean virus sample yields on the order of 1 pg to 1 ng
Received 30 November, 2011; revised 5 April, 2012; accepted 4 May,
2012. *For correspondence. E-mail mbsulli@email.arizona.edu; Tel.
(+1) 520 626 9100; Fax(+1) 520 621 9903.
Ecology and Evolutionary Biology, University of Michigan, Ann Arbor,
Michigan, USA.
Re-useof this articleispermitted
Terms and Conditions set out at http://wileyonlinelibrary.com/
onlineopen#OnlineOpen_Terms
†Presentaddress:
inaccordancewith the
bs_bs_banner
Environmental Microbiology (2012)doi:10.1111/j.1462-2920.2012.02791.x
© 2012 Society for Applied Microbiology and Blackwell Publishing Ltd
Page 2
DNA, while 454 pyrosequencing and Illumina require
1–5 mg for standard library prep, with slightly less DNA
necessary for recent methodological advances, such as
linear amplified deep sequencing (LADS), which requires
3–40 ng DNA (Hoeijmakers et al., 2011) and Nextera,
which requires > 50 ng (Marine et al., 2011). To date, viral
researchers have relied on linker amplification shotgun
libraries (LASLs; Breitbart et al., 2002; Vega Thurber,
2009) or whole-genome amplification methods [e.g. mul-
tiple displacement amplification (MDA);Angly et al., 2006;
Dinsdale et al., 2008] to generate sufficient material.
However, the former suffers from cloning biases and
does not scale for next-generation sequencing and the
latter suffers from stochastic amplification biases, which
render the resultingmetagenomes
(Abulencia et al., 2006; Zhang et al., 2006; Arriola et al.,
2007) and can skew a community’s taxonomic profile
(Yilmaz et al., 2010), rendering cross-sample comparison
meaningless.
Two recent developments set the stage for progress,
particularly in environmental viral genomics. First, a new
precipitation method improves aquatic viral concentration
efficiencies from < 25% (typical of tangential flow filtration)
to nearly 100% (John et al., 2011). Second, the Broad
Institute recently modified LASL protocols (Breitbart et al.,
2002) for 454 pyrosequencing (Henn et al., 2010).
Briefly, in this linker amplification (LA) modification for
non-quantitative
next-generation platforms, DNA is sheared, blunt-end
repaired and linker-ligated, then gel-sized to a narrow size
range before PCR amplification to generate greater quan-
tities of the target DNA. Genome sequencing of viral iso-
lates suggested that these features minimize inherent
PCR biases (Henn et al., 2010), generally thought to be
due to heterogeneous fragment lengths and variable
primer site annealing.
Here, we further improve upon the LA method through
assessment of sensitivity – by answering ‘how low can
we go?’ with respect to starting DNA concentrations,
efficiency – by identifying and optimizing steps where
sample loss occurs, accuracy – by empirically quantifying
sequence biases introduced, and applicability – by suc-
cessful application of the method to multiple next-
generation sequencing platforms.
Results and discussion
LA method optimizations
Briefly (Fig. 1), extracted DNA is sheared to 400–800 bp
using an ultrasonic technique (Covaris). The sheared
DNAis end-repaired to facilitate ligation of oligonucleotide
linkers. Linker-ligated DNA is then size fractionated (400–
800 bp) to target the properly ligated DNA. A small-scale
PCR titration is performed to determine the optimal cycle
Fig. 1. Linker amplification (LA) method schema. This study assesses an optimized LA method, with particular focus on providing new
bar-codes in the linker ligation step to facilitate pooling of samples, as well as quantitative evaluation of the impact of amplification on resulting
isolate and community DNA genomic sequencing.
2 M. B. Duhaime, L. Deng, B. T. Poulos and M. B. Sullivan
© 2012 Society for Applied Microbiology and Blackwell Publishing Ltd, Environmental Microbiology
Page 3
number (lowest number of cycles resulting in a high
molecular weight product) for large-scale PCR. A three-
cycle reconditioning step is performed to reduce hetero-
duplexes, increase product yield, and enrich for high
molecular weight DNA (Thompson et al., 2002) that can
be sequenced with next generation sequencers. Here,
each step of this LA method was assessed and optimized
(Table 1; Table S1), empirically determining the effects of
DNA concentration, PCR cycle number, and recondition-
ing PCR on the resulting datasets generated from each a
clonal virus isolate and an environmental viral community
(Table 1).
First, we designed a new set of 5 bp barcodes to label
DNA samples uniquely during amplification and allow
pooling of multiple samples on the sequencing plate
(Fig. 1, Table 1).
Second, we identified an alternative high-fidelity poly-
merase (LA TaKaRa HS) to complement that previously
used (Pfu Turbo HotStart; Henn et al., 2010). Both
enzymes yield product from starting DNA concentrations
as low as 100 fg in only 30–35 PCR cycles (Table S2).
However, differences emerged. Based on sensitivity,
TaKaRa outperformed Pfu for microbial 16S samples and
an isolate genome dilution series, while the opposite held
Table 1. Summary of treatments studied in linker amplification sequence analysis.
PoolTreatment Input DNA (ng)PCR cyclesBarcode (5′–3′) Linker
Reads
(post-QC)
Pseudoalteromonas phage H105/1: clonal virus lysate
1 cyc15A 10
cyc15B
cyc15C
cyc18A1
cyc18B
cyc18C
cyc20A 0.1
cyc20B
cyc20C
cyc25A0.01
cyc25B
cyc25C
cyc30A0.001
cyc30B
cyc30C
2 cyc15rA10
cyc15rB
cyc15rC
cyc18rA1
cyc18rB
cyc18rC
cyc20rA0.1
cyc20rB
cyc20rC
cyc25rA0.01
cyc25rB
cyc25rC
cyc30rA0.001
cyc30rB
cyc30rC
unampn/a
Biosphere2 Ocean: environmental virus sample
1B2cyc15A10
B2cyc25A0.1
B2cyc15rA 10
B2cyc25rA 0.1
2B2cyc15B 10
B2cyc25B0.1
3B2cyc15C10
B2cyc25C0.1
4unamp An/a
5unamp B n/a
15 CGACA
CATAG
ATGTA
CGTGT
ACGTG
TGAGT
CTCTA
ACTCT
TGCTG
CTATG
AGCAT
TCGCA
CTGAG
ATCAG
TCATA
CGACA
CATAG
ATGTA
CGTGT
ACGTG
TGAGT
CTCTA
ACTCT
TGCTG
CTATG
AGCAT
TCGCA
CTGAG
ATCAG
TCATA
None
CCA CAC AGA TCA CGA AGC ATA C4 306
1 626
7 582
8 072
11 889
12 739
8 729
18
20
3
5 186
8 722
8 006
6 091
7 250
8 201
10 992
1 287
2 088
1 710
1 183
436
900
4 431
25
30
15 + 3a
CCA CAC AGA TCA CGA AGC ATA C
18 + 3
20 + 3
4
1 194
2 768
1 157
1 527
1 775
5 195
1 252
30 279
25 + 3
30 + 3
No ampACG AGT GCG TAT ATC GCG AGT CAT
15
25
15 + 3
25 + 3
15
25
15
25
No amp
No amp
CGACA
CAGAT
ACGTG
TACGA
CGACA
CAGAT
CGACA
CAGAT
None
None
CCA CAC AGA TCA CGA AGC ATA C222 421
212 093
119 144
111 680
261 245
340 488
246 313
310 311
132 639
160 879
CCA CAC AGA TCA CGA AGC ATA C
CCA CAC AGA TCA CGA AGC ATA C
ACG AGT GCG TAT ATC GCG AGT CAT
None
Triplicates are differentiated asA, B and C; reconditioned samples are identified with an ‘r’. When text in a row is blank, refer to the previously listed
text; for example, input DNA for each of cyc15A, cyc15B, and cyc15C is 10 ng.
a. Three additional cycles represent the reconditioning PCR.
n/a, not applicable; No amp, no amplification.
Linker amplification for ultra-low DNA samples3
© 2012 Society for Applied Microbiology and Blackwell Publishing Ltd, Environmental Microbiology
Page 4
true for DNA extracted from a varied collection of ocean
virus concentrates (Table S2). However, the sensitivity of
Pfu came at a cost, as this enzyme amplified a ‘no tem-
plate control’ at 30 and 35 cycles, while TaKaRa did not
(Table S2). Further, in select samples, TaKaRa yielded
more product and with a broader size range than Pfu
(Fig. S1). Finally, while Pfu was more sensitive, amplifying
some samples that could not be amplified by TaKaRa
(Table S2), the LA TaKaRa HS enzyme enriched for
sequences that were of extremely low abundance, ‘rares’,
in the original sample (addressed below in sequence
analysis, as well as in companion paper, Hurwitz et al.,
2012). In summary, we found that enzyme choice
depended on sample type and study question. If one
enzyme is unable to amplify a challenging sample,
success with the other is likely. Furthermore, the use of LA
TaKaRa HS polymerase may be of particular interest to
studies targeting components of the rare biosphere (see
Results and discussion: Assessment of systematic biases
due to amplification).
Third, we sought to determine the most efficient and
precise post-shearing size selection protocol. Efficiency
and precision are necessary as environmental sam-
ples often yield ultra-low DNA concentrations, yet
next-generation sequencing libraries require significant
amounts of DNA, e.g. 1–5 mg, of precise sizes, e.g.
400–800 bp for 454 pyrosequencing (Roche Diagnostics
GmbH,2009a)and300–450 bpforIllumina(2009;Illumina
Sample Preparation Guide).
Of the three sizing fractionation methods tested for
target recovery efficiency (fraction recovered DNA in
target 400–600 bp size range), throughput (ease of appli-
cability to numerous samples simultaneously), and risk of
cross-sample contamination, Pippin Prep, an automated
optical electrophoretic system that does not require gel
extraction, was the most efficient and reproducible (94–
96% of input DNA, n = 3; Table S3), with the tightest,
most specific sizing (Fig. 2B). Pippin Prep and Solid
Phase Reversible Immobilization (SPRI), a method
based on size-specific capture of DNA by AmPure
beads, were equally as high-throughput with low risk of
cross-contamination. Yet, SPRI was the least efficient,
recovering 46–50% of the targeted size fraction post-
shearing (n = 3), as it can not be used to bound the
upper size range (leaving DNA > 600 bp) due to a sub-
optimal ratio of PEG:DNA, the mechanism used to
remove DNA fragments (Hawkins et al., 1994). However,
to size fractionate DNA for Illumina libraries (targeting
300–450 bp), the double-SPRI (dSPRI) method has
been shown to be a viable method (Rodrigue et al.,
2010), as it caps both high and low size ranges. Of
the three methods tested, standard gel extraction had
Fig. 2. A. Comparison of sheared H105/1 genomic DNA versus unsheared. DNA was run on Agilent chip (DNA 7500 ladder).
B. Comparison of size-fractionation methods. Size-fractionation of sheared DNA targeting the 400–600 bp range. DNA was run on Agilent chip
(high-sensitivity ladder).
4 M. B. Duhaime, L. Deng, B. T. Poulos and M. B. Sullivan
© 2012 Society for Applied Microbiology and Blackwell Publishing Ltd, Environmental Microbiology
Page 5
moderate target recovery efficiency (64–74%). However,
this efficiency varies greatly with researcher proficiency
and, further, the size selection takes orders of magnitude
more time and risks cross-sample contamination. Based
on this comparative analysis, we recommend the Pippin
Prep automated electrophoretic system to prepare
samples for 454 pyrosequencing libraries.
Finally, we quantified the full process post-shearing for
seven phage lysates and environmental virus samples
and found that the concentration, blunt-end repair,
linker ligation and size fractionation (gel-based) was
5.7% ? 2.3% efficient, with respect to total DNA reco-
very (Table S4; note this calculation is based on gel size
fractionation – it is likely that efficiency is higher with the
newly assessed Pippin Prep, the most efficient sizing
method tested; Table S3). We have found the typical
increase in DNA yield due to LA to range from 10- to
1200-fold (Table S5).
Empirical evaluation of amplification biases
After optimizing the LA method, we sought to determine
quantitatively the influence of starting DNA concentra-
tion and amplification parameters, such as cycle number
and reconditioning (three extra rounds of amplification to
reduce heteroduplex formation; Thompson et al., 2002),
on sequence data from the clonal virus isolate, H105/1,
and environmental virus community DNA from the Bio-
sphere2 ocean (B2O; Oracle, AZ).
Effect of starting DNA, cycle number and reconditioning
on read depth. In order to discern which treatments have
an effect on read depth, the deviation of amplified read
depths from unamplified was compared (all coloured lines
of Fig. 3A). There was no significant difference between
treatments of different starting quantities of DNA and
numbers of PCR cycling (P = 0.13–0.82, pairwise two-
tailed t-tests; Table S6). This is in contrast to the popularly
used MDA, with which limiting template DNA concentra-
tion (1 ng) imposes dramatic representational biases on
resulting sequence data (Wu et al., 2006). At first pass,
there was a significant difference between recondi-
tioned and non-reconditioned treatments (P = 1.6E-05).
However, this was due to the unintentional reduced
sequencing effort of the reconditioned samples (fewer
reads causing even some regions of the genome to
approach, though never reach, zero coverage; Table 1,
Fig. 2A) and subsequent scaling of individual datasets by
sequencing effort, which artificially inflates the magnitude
of reconditioned read depths. When low coverage areas
(< sevenfold) are masked from the genome and the same
test performed, there is no significant difference between
reconditioned and non-reconditioned samples (P = 0.33).
Aiming for average genome coverage of 15¥ should help
to minimize these low coverage areas and avoid scaling
issues that can interfere with cross-dataset comparisons
when coverage is disparate. Based on this absence of
discernible biases imposed by reconditioning, this step is
highly recommended, as it has been shown to minimize
heteroduplex formation during amplification (Thompson
et al., 2002) and will ultimately result in a threefold
increase in product yield, an important consideration in
low DNA samples.
Assessment of systematic biases due to amplifica-
tion. On the whole, there was notable difference between
the amplified and unamplified samples (Fig. 3A), which
we hypothesized to result from systematic biases intro-
duced during amplification (e.g. %G + C). Indeed, by nor-
malizing the relative frequency of %G + C-binned reads to
those observed in the unamplified treatment, we found the
LA method to under-represent regions of the H105/1
genome with < 40% G + C and over-represent regions
above, by 0.5- and 1.5-fold respectively (Fig. 3B). Extend-
ing this assessment to the B2O metagenome, we found a
similar onefold over- and under-representation of reads,
relative to that seen in unamplified treatments (Fig. 4).
Notably, the %G + C of the B2O metagenome (12–84%)
spans a much wider range than H105/1 (31–55%), and is
a range characteristic of most sequenced dsDNA virus
metagenomes (Fig. S2). At the %G + C extremes seen in
the B2O community assemblage, both high and low
%G + C reads are under-represented in amplified treat-
ments, a common phenomenon inherent to PCR amplifi-
cation (Hoeijmakers et al., 2011). Regardless, these
systematic differences resulting in 0.5- to 1.5-fold biases
are a marked improvement over the stochastic represen-
tation biases of whole genome amplification, which can
lead to 100s-fold (Woyke et al., 2009) to 10 000s-fold
changes (Zhang et al., 2006) and have been shown to
render MDA-generated metagenomes non-quantitative
(Yilmaz et al., 2010). Our optimized LA results are com-
parable to observations of the new LADS (Hoeijmakers
et al., 2011) and amplification-free (Kozarewa et al., 2009)
methods, though these methods require significantly
more DNA, 3–40 ng and 100s of ng of input DNA, respec-
tively, which is often unattainable from environmental
samples.
Our experimental design allowed us to evaluate vari-
ability in replicate metagenome preparations to assess
how PCR cycling and reconditioning impacted resultant
datasets. We anticipated that increased cycle number
might exacerbate the %G + C biases. However, the lack
of such a trend indicates that replicate LA datasets pre-
pared with varying amounts of starting DNA (1 pg to
10 ng) and using different cycling conditions are quantita-
tively comparable (Figs 3 and 4), with few significant dif-
ferences between treatments (Table S7a). As with the
Linker amplification for ultra-low DNA samples5
© 2012 Society for Applied Microbiology and Blackwell Publishing Ltd, Environmental Microbiology
Page 6
read depth comparison, there was a higher incidence of
significant differences between non-reconditioned and
reconditionedsamples, especially
numbers (Table S7b). Indeed, a slight positive trend
existedbetweenreconditioned
number, which did not exist for the non-reconditioned
samples (Fig. S3). Thus, the additional 3-cycle recondi-
tioning step resulted in a 10-fold increase in DNA – and
DNA of higher molecular weight (Fig. S1) – at the cost of
only slight %G + C bias at higher cycle numbers (Fig. 3B).
Importantly, all biases imposed across all treatments are
still never more than 0.5- to 1.5-fold (Figs 3 and 4).
As a common goal of ecology is cross-community com-
parisons of diversity, we also examined how LA impacted
diversity profiles, represented as rarefaction curves of
at highercycle
samplesandcycle
protein clusters with at least 20 members (Fig. 5A).
Regardless of cycle number or reconditioning, the rar-
efaction curves of amplified treatments were nearly iden-
tical, while those from the unamplified treatments were
less diverse (Fig. 5). Quantification of ‘singleton’ reads
(defined at 90% identity), indicate that rare reads from the
original community DNA are enriched for in the amplified
treatments (over 62% percent of the original reads;
Fig. 5B), with more than 10% fewer singletons in the
unamplified treatments.
We hypothesize that this enrichment of rare reads is a
feature of the LA TaKaRa polymerase enzyme used
during PCR. A related study has shown the enrichment of
‘rares’does not occur in similarly prepared samples where
the Pfu Turbo master mix was used, with the resultant
Fig. 3. Quantitative evaluation of resulting isolate genome sequencing data.
A. Read depth of treatments, as mapped to the Phage H105/1 reference genome: 15–30 PCR cycles, with (red) and without (blue)
reconditioning, and unamplified (black) genomic DNA. Counts are scaled by the total number of nucleotides per treatment (Table 1) and
multiplied by a factor (1 222 442, the average number of nucleotides in all treatments), to scale to a relatable ‘read depth’ value.
B. H105/1 genome ‘GC-bias curve’ representing the relationship between %G + C and read depth, as calculated in a 500 bp sliding window
across the genome. Colour scale shared between (A) and (B).
6M. B. Duhaime, L. Deng, B. T. Poulos and M. B. Sullivan
© 2012 Society for Applied Microbiology and Blackwell Publishing Ltd, Environmental Microbiology
Page 7
amplified datasets thus appearing more diverse (data
published in companion paper; Hurwitz et al., 2012). This
trend is supported by Wu and colleagues, who found
pyrotag datasets prepared with a Pfu polymerase to be
less diverse than the same sample amplified with a
TaKaRa polymerase, which they attribute, quite generally,
to possible differences in the enzymes’ processivity or
proof-reading (Wu et al., 2010). These observations may
be related to the fact that during PCR, hydrolytic deami-
nation of dCTP to dUTP in already amplified products can
inhibit further amplification of these fragments by Pfu, as
it is an archaeal family B polymerase known to stall at
dU-containing DNA (Lasken et al., 1996). However, the
Pfu Turbo master mix contains a dUTPase enhancement
Fig. 4. Biosphere2 ocean metagenome
‘GC-bias curve’ representing the %G + C of
each read per treatment, relative to the
average %G + C of the unamplified
treatments.
0.20.4 0.60.8
-1.0
-0.5
0.0
0.5
1.0
B2O metagenome GC-Bias curve (relative to unamplified)
read G+C/ALL
fold change in read depth
AVG unamp
unamp A
unamp B
cyc15A, 10 ng
cyc15B, 10 ng
cyc15C, 10 ng
cyc25A, 0.1 ng
cyc25B, 0.1 ng
cyc25C, 0.1 ng
cyc15rA, 10 ng
cyc25rA, 0.1 ng
45
50
55
60
65
70
75
Distribution of singleton reads
per treatment
percent of original reads as singletons
(b)
Fig. 5. Protein cluster diversity in the Biosphere 2 ocean viral community.
A. Rarefaction curve representing the relative sequence diversity of each treatment, as measured by protein clusters (with > 20 sequence
members) derived from all amplified and unamplified treatments. Higher diversity in the amplified treatments is likely due to a preferential
amplification of the rare biosphere by the LA TaKaRa-HS DNA polymerase used in PCR.
B. Boxplot representing the range of percent original reads as singletons in each treatment. The two outliers at 51% are the unamplified
treatments.
Linker amplification for ultra-low DNA samples7
© 2012 Society for Applied Microbiology and Blackwell Publishing Ltd, Environmental Microbiology
Page 8
factor (Hogrefe et al., 2002) that effectively deaminates
dUTP back to dCTP, allowing continued amplification of
already amplified fragments and thus presumably main-
taining the original relative abundances of DNAfragments
in the sample. Thus, one alternate explanation may be
that if the TaKaRa enzyme is at all hindered by the pres-
ence of dUTP in already amplified templates, the enzyme
will preferentially amplify the low abundance, yet to be
amplified rare reads, as we observe. These rare reads
amplified to detection by TaKaRa do not appear to be
artefacts of amplification, as they have sequence similar-
ity to established protein clusters at the same frequency
as the abundant reads in the same dataset (Hurwitz et al.,
2012). Whether this enrichment of rare reads is beneficial
or detrimental depends heavily on the downstream appli-
cation of the sequenced dataset.
Application to other sequencing platforms. While the pro-
tocol and analysis presented here are developed for 454
pyrosequencing, with minor adjustments this LA protocol
is appropriate for samples intended for other sequencing
platforms. For instance, shearing conditions can be
altered to size templates for shorter read-length sequenc-
ing technologies (e.g. 200 bp for Ion Torrent’s current
316 chip). Further, template mixtures can be used to
sequence with Illumina technologies to circumvent issues
with the non-random linker sequence on linker-amplified
templates, which causes erroneous base-calling by the
Illumina software. To this end, we recently pooled linker
amplified DNA from 20 freshwater cyanophage isolates at
a ratio of 1:1 with known phi29 DNA template (our tem-
plate mixture) and successfully generated high quality,
full-length Illumina sequence data (Fig. S4; L. Deng and
M.B. Sullivan, unpubl. data). Alternate approaches to
prep libraries for Illumina sequencing include (i) adding a
3′-U during PCR amplification to the LA linker-containing
DNA, such that the linker can be cleaved at the U with,
e.g. the USER Enzyme (New England Biolabs; Beverly,
MA, USA), followed by an S1 nuclease clean-up of
remnant ssDNA, resulting in linker-free DNA for Illumina
library prep (described by Rodrigue et al., 2009); or (ii)
performing the LA method with Illumina linkers. The
benefit of the former being that the amplification behav-
iour of linkers in this study has been rigorously tested and
assessed here.
With ever-declining sequencing costs and sequence
analysis tools becoming more readily available, metage-
nomics has become a standard tool for investigating wild
communities. As such, it is essential to optimize and
popularize robust methods to preserve a quantitative
metagenomic signal. The optimized LAmethod presented
here enables a several orders of magnitude increase in
DNA yield that results in minimally biased (0.5- to 1.5-fold
as compared with unamplified) metagenomes appropri-
ate for comparative analyses from samples with limiting
amounts (< 1 ng) of DNA.
Experimental procedures
The development and optimization of the LA method
described below and (detailed in Table S1) are derived from
an accumulation of knowledge gained from preparing over 50
varied samples (Tables S2–5; Figs S1–5). The analyses of
sequence biases introduced by the method are based on
a single phage isolate, Pseudoaltermonas phage H105/1
(herein H105/1; Duhaime et al., 2011), and an environmental
sample from the Biosphere2 ocean (herein B2O; Oracle, AZ,
USA).
Linker amplification
Isolation of DNA. To isolate genomic DNA, H105/1 was
grown on Pseudoalteromonas sp. H105, the lysate was
polyethylene glycol/NaCl precipitated and cesium chloride
(CsCl) purified, as previously described (Sambrook and
Russell, 2001; Duhaime et al., 2011). Genomic DNA was
extracted using Wizard Prep Resin (Promega; Madison, WI,
USA) and mini-columns (Promega) as described by Henn
and colleagues (2010). For environmental virus community
DNA, 1080 l seawater was 0.2 mm filtered, and the virus-
containing filtrate iron chloride precipitated and concentrated
per John and colleagues (2011). The resuspended viral con-
centrate was treated with DNase I (100 units ml-1) for 2 h at
room temperature, then DNase activity inactivated with
100 mM EDTA/EGTA. This virus preparation was purified on
a CsCl gradient (recovering 7.1 ¥ 1011 viruses ml-1 from the
r = 1.4–1.52 CsCl fraction) (Sambrook and Russell, 2001)
and DNA was extracted from 0.6 ml using Wizard Prep
Resin (as above). Lysate and environmental DNA was
ligated to Adaptor-A (Fig. 1), diluted, and amplified at various
PCR cycle numbers (Table 1) per the LA protocol described
below.
Preparation of sheared, Linker-A Ligated DNA. Covaris
Adaptive FocusedAcoustics (AFA) was used to shear DNAto
400–800 bp, with the following parameters: 130 ml of DNA in
Tris EDTA (TE) buffer (up to 5 mg total DNA), duty cycle of
5%, intensity of 3200 cycles per burst, at 6–8°C for 62 or
120 s, for the Covaris E210 and S2 models respectively. DNA
fragment sizes were determined before and after shearing on
a DNA7500 or High Sensitivity chip in theAgilent Bioanalyzer
2100 (Agilent Technologies; Santa Clara, CA, USA). Follow-
ing shearing, low yield (< 500 ng) samples were concentrated
two- to threefold (final volume 35–50 ml) with Amicon
Ultra-0.5100 kDa filter units (Millipore; Billerica, MA, USA),
according to manufacturer’s directions.
Hemi-phosphorylated Linker-A (Fig. 1) was prepared by
annealing a synthesized forward linker (5′-phosphorylated-
GTA TGC TTC GTG ATC TGT GTG GGT GT-3′; 1.14 mM in
TE) to the reverse linker (5′-CCA CAC AGA TCA CGA AGC
ATA C-3′; 1.14 mM in TE; not phosphorylated in order to
promote unidirectional ligation) in a 50 mM NaCl buffer. Equal
volumes of forward and reverse linkers were heated to
100°C, slowly cooled on the bench to room temperature, and
8 M. B. Duhaime, L. Deng, B. T. Poulos and M. B. Sullivan
© 2012 Society for Applied Microbiology and Blackwell Publishing Ltd, Environmental Microbiology
Page 9
placed on ice for 5 min. Resultant Linker-A was diluted to
10 mM in TE and stored at -20°C for subsequent use.
Blunt-end repair of sheared DNA, ligation to Linker-A and
required cleanup reactions (Fig. 1) were performed as
described (Henn et al., 2010) using the End-It DNA End-
Repair kit (Epicentre Biotechnologies; Madison, WI, USA),
Fast-Link DNA Ligation kit (Epicentre Biotechnologies)
and Min-Elute Reaction Clean-up kit (Qiagen; Valencia, CA,
USA) respectively. Sheared, linker-ligated DNA was then
size-fractionated.
H105/1 genomic DNA, 10 mg in 130 ml (77.4 ng ml-1) was
sheared to 400–600 bp (Covaris) to test three sizing methods
in triplicate: gel electrophoresis, SPRI and a digital optical
electrophoretic system, Pippin Prep (Sage Science; Beverly,
MA, USA). Each method was tested with 13.7 ml of sheared
DNA at a concentration of 14.08 ng ml-1(183 ng total DNA).
For gel extraction, sheared DNA was run on a 1.5% agarose
gel stained with ethidium bromide for 90 min at 80 V. Gel
fragments in the 400–600 bp size range were excised, puri-
fied to remove agarose (Qiagen MinElute Gell Extraction kit),
and eluted in 20 ml EB buffer, as described (Henn et al.,
2010). SPRI with Agencourt AMPure XP beads (Beckman
Coulter; Danvers, MA, USA) was used to remove the less
than 400 bp fragments (detailed in Results and discussion:
LA method optimizations). To specifically retain the desired
greater than 400 bp fragments a bead to DNA ratio of 55:100
was used, as determined per Roche (Roche Diagnostics
GmbH, 2009a), such that the final reaction contained 13 ml
sheared DNA, 7 ml TE buffer, and 11 ml Ampure beads. DNA
was eluted in 20 ml EB buffer. Pippin Prep samples were run
in pre-cast 2% agarose gel cassettes, pre-stained with
ethidium bromide (Sage Science), set to recover the 400–
600 bp fragments. DNA was recovered in 39–42 ml final
volumes (Table S3). Following each method, DNA accurately
retained in the target 400–600 bp range was quantified to
determine target recovery efficiency using an Agilent Bioana-
lyzer 2100 (Agilent Technologies; Santa Clara, CA, USA).
Polymerase evaluation, barcode design. Two high fidelity
polymerases, Pfu Turbo Hotstart (Stratagene; La Jolla, CA,
USA) and TaKaRa LA Taq Hotstart (Takara Bio; Shiga,
Japan), both with 3′ to 5′ exonuclease (‘proof-reading’) activ-
ity and an antibody quencher for hotstart capability, were
assessed for efficiency and sensitivity. Both enzyme reac-
tions used 1–2 ml Linker-A ligated and size-fractionated DNA
with 0.5 ml (5 pmol) of the 10 mM PCR phos-A primer. For
the Pfu Turbo Hotstart system, 12.5 ml Pfu Turbo Hotstart
2¥ Master Mix (0.1 U Pfu Turbo ml-1) was used, while the
TaKaRa LA HS system required 2.5 ml 10¥ PCR buffer, 4.0 ml
of 2.5 mM dNTP mix (10 nmol each), and 0.25 ml TaKaRa LS
Taq HS (5 U LA TaKaRa ml-1). Both reactions were brought to
25 ml with nuclease-free water. For all reactions prepared for
the sequence analysis (H105/1 phage genome and B2O
environmental DNA), the TaKaRa LA Taq Hotstart system
was used.
A series of 5 bp barcodes was added to the ‘phos-A PCR
primer’(5′-p-CCACACAGATCACGAAGCATAC-3′)
et al., 2010), such that a sample-specific, unique barcode
would be added at the 5′ end of each DNA fragment during
amplification (Fig. 1, barcodes listed in Table 1). The bar-
codes were designed such that (i) no consecutive duplicate
(Henn
nucleotides exist, (ii) barcodes differ by at least 2 nucleotides,
(iii) no C appears at the 3′ end, as Linker-Ahas a 5′ C (Fig. 1),
and (iv) no G appears at the 5′ end, as the 454 emPCR
primers have a 3′ G (forward: 5′-CGTATCGCCTCCCT
CGCGCCATCAG-3′; reverse: 5′-CTATGCGCCTTGCCAG
CCCGCTCAG-3′) (Roche Diagnostics GmbH, 2009b).
Small-scale PCR titration. To determine the minimum
number of PCR cycles needed for amplification of each
sample, a small-scale PCR titration was performed with
varying cycle numbers [95°C for 2 min, (95°C for 30 s, 60°C
for 60 s, 72°C for 90 s) ¥ 15, 18, 20, 25, or 30, 72°C for
10 min]. Generally, three cycle numbers were tested accord-
ing to the amount of DNA available after ligation and sizing,
as a log-linear relationship exists between input DNA and
cycle number needed for amplification (Fig. S5). Generally,
1–10 ng DNA samples were run for 15–20 cycles, 0.1–1 ng
for 18–25 cycles, 10–100 pg for 22–30 cycles, and less than
10 pg for 25–35 cycles. If DNA was not amplified by 35
cycles, more DNAwas added to the PCR reaction (up to 10%
of the PCR reaction volume) or the sample was concentrated
using AMPure XP beads (80 ml beads to 100 ml DNA) and
PCR titration attempted once more.
PCR products were analysed on 1.5% agarose gels with
0.5 ng ml-1ethidium bromide, run in 1¥ Tris-acetate-EDTA
(TAE) buffer at 90 V for 30 min, using 5 ml PCR product mixed
with 1 ml 6¥ Blue/Orange Loading Dye (Promega). Quick
Load 100 bp DNA Ladder (New England Biolabs; Beverly,
MA, USA), 250 bp DNA Ladder (Invitrogen; Carlsbad, CA,
USA), or 1 kB Plus DNA Ladder (Invitrogen) was used for
fragment size determination.
Large-scale amplification and ‘reconditioning PCR’. Sample
DNA was amplified (per PCR protocol above) using the
number of cycles determined by the small-scale titration.
Depending on quantity of starting DNA, six to ten reactions
were performed in this step, resulting in up to 250 ml PCR
product, to ensure sufficient DNA for sequencing (1–5 mg per
standard 454 pyrosequencing or Illumina library).
To both increase yield and minimize heteroduplex forma-
tion, a ‘reconditioning PCR’ step was added, sensu Thomp-
son and colleagues (2002). To recondition, the amplified DNA
is diluted 10-fold in a fresh PCR reaction mix (200 ml reac-
tions with 2.5 ml TaKaRa LA HS and 20 ml of small-scale PCR
product as template, all other reagents in same proportions
as in small-scale titration) and amplified for three cycles
[95°C for 2 min, (95°C for 30 s, 60°C for 60 s, 72°C for
90 s) ¥ 3, 72°C for 10 min], effectively increasing the primer-
to-template ratio by replenishing the reaction with new primer.
All PCR products previously generated were reconditioned,
resulting in up to 2.5 ml of final reconditioned product,
which was then concentrated to 250 ml using Amicon
Ultra-0.5100 kDa centrifugal columns, purified with the
MinElute PCR Purification (Qiagen) kit, and DNA eluted off
the mini-columns with 25–40 ml TE buffer warmed to 80°C.
Note that in order to determine the effect of this treatment on
resultant sequence data, parallel treatments of phage H105/1
genome and B2O environmental DNA were not recondi-
tioned (Table 1). Finally, amplified samples were quantified
(Quant-iT Pico Green for dsDNA; Invitrogen) and, where
sample pooling was necessary, samples with unique bar-
Linker amplification for ultra-low DNA samples9
© 2012 Society for Applied Microbiology and Blackwell Publishing Ltd, Environmental Microbiology
Page 10
codes were mixed in equimolar amounts for sequencing
library preparation. Libraries were sequenced using GS FLX
Titanium 454 pyrosequencing (no paired-ends) or the Illu-
mina HiSeq (paired-end reads, one channel with raw output
of 19.9 GB).
Sequence analysis
Phage genome analysis. Genomic reads were mapped
per treatment to the complete H105/1 genome (RefSeq
NC_015293) using gsMapper (v 2.5.3). Read depths were
divided by the total number of nucleotides per treatment to
normalize for sequencing effort. %G + C and average read
depths were calculated over a 500 bp sliding window to
generate ‘GC bias’ plots.
Environmental
metagenome reads were subjected to quality control to
remove reads that (i) contained an ambiguous base, (ii) were
at least two standard deviations from the mean sequence
length per plate, (iii) were at least two standard deviations
from the mean quality score per plate, or (iv) were identified
by cd-hit454 (default parameters) as emulsion-PCR repli-
cates. %G + C was calculated per read. For the rarefaction
analysis, reads from all treatments were assembled using
Velvet (Zerbino and Birney, 2008) with automatic coverage
estimation and a k-mer length of 21. Genes were predicted
on all raw reads and contigs greater than 100 bp using Prodi-
gal (metagenomic gene finding, remainder of parameters as
default) (Hyatt et al., 2010). Protein clusters with 60% within-
cluster identity were built using cd-hit (Li and Godzik, 2006)
with word length of 4. Original reads were non-redundantly
mapped to protein cluster representative sequences using
blastx (Altschul et al., 1990) with 60% identity cut-off. Rar-
efaction curves were generated by sampling protein clusters
of > 20 members without replacement, increasing sampling
effort in 5000-read steps (100 replicates) using R scripts. To
compare the number of ‘rares’, reads were clustered per
treatment using cdhit-est (Li and Godzik, 2006) at 90%
within-cluster identity, considering both strands, and with a
word size of 8. The number of singletons per treatment was
defined as the number of reads in single-member clusters.
virusmetagenome. Biosphere2ocean
Statistical tests. Tests comparing the relative read depths
and %G + C bias between unamplified and amplified treat-
ments were performed using a series of paired two-tailed
Student’s t-tests. To quantify the magnitude of read depth and
%G + C bias, the area between the unamplified and amplified
curves was calculated by trapezoid integration. Pairwise tests
were performed to compare (i) all reconditioned versus all
non-reconditioned samples (n = 14 pairs), (ii) the recondi-
tioned and non-reconditioned per cycle number (n = 3 pairs),
(iii) all treatments of a cycle number against each of the
remaining cycle number treatments, reconditioned and non-
reconditioned combined (n = 6 pairs), and (iv) ‘c’ repeated,
testing reconditioned and non-reconditioned separately
(n = 3 pairs).
The LA protocol is available at http://eebweb.arizona.edu/
faculty/mbsulli/protocols.htm. All sequence data is available
on the CAMERA web portal, tracking number CAM_
P_0000912.
Acknowledgements
We thank N. Strange-Thomann, M. Osburne and V. Rich for
discussion during methodological optimization, V. Denef for
discussions during analysis, members of the Tucson Marine
Phage Lab for comments on manuscript drafts. Biosphere2
Ocean samples were collected by A. Gregory, N. Solonenko
and C. Decker. Phage isolate sequencing was performed by
Emory University GRA Genomic Center; environmental
samples were sequenced by the University ofArizona Genet-
ics Core Facility. We further acknowledge the UITS Research
Computing Group and the Arizona Research Laboratories
Biotech Computing for high-performance computing access
and support. The Joint Genome Institute Community
Sequencing Program is acknowledged for providing the
freshwater cyanophage pooling experiment on Illumina
HiSeq (CSP grant awarded to L.D.) and discussions regard-
ing possible modifications for Illumina library prep (personal
communication with C. Hoover); the remainder of the work
was funded by a Gordon and Betty Moore Foundation grant
to M.B.S.
References
Abulencia, C.B., Wyborski, D.L., Garcia, J.A., Podar, M.,
Chen, W., and Chang, S.H. (2006) Environmental whole-
genome amplification to access microbial populations in
contaminated sediments. Appl Environ Microbiol 72: 3291–
3301.
Altschul, S.F., Gish, W., Miller, W., Myers, E.W., and Lipman,
D.J. (1990) Basic local alignment search tool. J Mol Biol
215: 403–410.
Angly, F.E., Felts, B., Breitbart, M., Salamon, P., Edwards,
R.A., and Carlson, C. (2006) The marine viromes of four
oceanic regions. PLoS Biol 4: e368.
Arriola, E., Lambros, M.B., Jones, C., Dexter, T., Mackay, A.,
and Tan, D.S. (2007) Evaluation of Phi29-based whole-
genome amplification for microarray-based comparative
genomic hybridisation. Lab Invest 87: 75–83.
Avrani, S., Wurtzel, O., Sharon, I., Sorek, R., and Lindell, D.
(2011) Genomicisland
Prochlorococcus-virus coexistence. Nature 474: 604–608.
Biddle, J.F., Fitz-Gibbon, S., Schuster, S.C., Brenchley, J.E.,
and House, C.H. (2008) Metagenomic signatures of the
Peru Margin subseafloor biosphere show a genetically dis-
tinct environment. Proc Natl Acad Sci USA 105: 10583–
10588.
Bragg, J.G., and Chisholm, S.W. (2008) Modeling the fitness
consequences of a cyanophage-encoded photosynthesis
gene. PLoS ONE 3: e3550.
Breitbart, M., Salamon, P., Andresen, B., Mahaffy, J.M.,
Segall, A.M., and Mead, D. (2002) Genomic analysis of
uncultured marine viral communities. Proc Natl Acad Sci
USA 99: 14250–14255.
Clokie, M.R., Shan, J., Bailey, S., Jia, Y., Krisch, H.M., and
West, S. (2006) Transcription of a ‘photosynthetic’ T4-type
phage during infection of a marine cyanobacterium.
Environ Microbiol 8: 827–835.
Coleman, M.L., Sullivan, M.B., Martiny, A.C., Steglich, C.,
Barry, K., and Delong, E.F. (2006) Genomic islands and the
ecology and evolution of Prochlorococcus. Science 311:
1768–1770.
variability facilitates
10M. B. Duhaime, L. Deng, B. T. Poulos and M. B. Sullivan
© 2012 Society for Applied Microbiology and Blackwell Publishing Ltd, Environmental Microbiology
Page 11
Dinsdale, E.A., Edwards, R.A., Hall, D., Angly, F., Breitbart,
M., and Brulc, J.M. (2008) Functional metagenomic profil-
ing of nine biomes. Nature 452: 629–632.
Duhaime, M.B., Wichels, A., Waldmann, J., Teeling, H., and
Glöckner, F.O. (2011) Ecogenomics and genome land-
scapes of marine Pseudoalteromonas phage H105/1.
ISME J 5: 107–121.
Falkowski, P.G., Fenchel, T., and Delong, E.F. (2008) The
microbial engines that drive Earth’s biogeochemical cycles.
Science 320: 1034–1039.
Handelsman, J., Rondon, M.R., Brady, S.F., Clardy, J., and
Goodman, R.M. (1998) Molecular biological access to the
chemistry of unknown soil microbes: a new frontier for
natural products. Chem Biol 5: R245–R249.
Hawkins, T.L., O’Connor-Morin, T., Roy, A., and Santillan, C.
(1994) DNA purification and isolation using a solid-phase.
Nucleic Acids Res 22: 4543–4544.
Hellweger, F.L. (2009) Carrying photosynthesis genes
increases ecological fitness of cyanophage in silico.
Environ Microbiol 11: 1386–1394.
Henn, M.R., Sullivan, M.B., Stange-Thomann, N., Osburne,
M.S., Berlin, A.M., and Kelly, L. (2010) Analysis of high-
throughput sequencing and annotation strategies for
phage genomes. PLoS ONE 5: e9083.
Hoeijmakers, W.A., Bártfai, R., Françoijs, K.J., and Stunnen-
berg, H.G. (2011) Linear amplification for deep sequencing.
Nat Protoc 6: 1026–1036.
Hogrefe, H.H., Hansen, C.J., Scott, B.R., and Nielson, K.B.
(2002) Archaeal dUTPase
fications with archaeal DNA polymerases by prevent-
ing dUTP incorporation. Proc Natl Acad Sci USA 99:
596–601.
Hurwitz, B.H., Deng, L., Poulos, B.T., and Sullivan, M.B.
(2012) Evaluation of methods to concentrate and purify
wild ocean virus communities through comparative, repli-
cated metagenomics. Environ Microbiol, in press.
Hyatt, D., Chen, G.L., Locascio, P.F., Land, M.L., Larimer,
F.W., and Hauser, L.J. (2010) Prodigal: prokaryotic gene
recognition and translation initiation site identification. BMC
Bioinformatics 11: 119.
Illumina (2009) Paired-End Sequencing Sample Preparation
Guide. Catalog # PE-930-1001 Part # 1005063 Rev. B.
John, S.G., Mendez, C.B., Deng, L., Poulos, B., Kauffman,
A.K., and Kern, S. (2011) A simple and efficient method for
concentration of ocean viruses by chemical flocculation.
Environ Microbiol Rep 3: 195–202.
Kettler, G.C., Martiny, A.C., Huang, K., Zucker, J., Coleman,
M.L., and Rodrigue, S. (2007) Patterns and implications of
gene gain and loss in the evolution of Prochlorococcus.
PLoS Genet 3: e231.
Kozarewa, I., Ning, Z., Quail, M.A., Sanders, M.J., Berriman,
M., and Turner, D.J. (2009) Amplification-free Illumina
sequencing-library preparation
mapping and assembly of (G+C)-biased genomes. Nat
Methods 6: 291–295.
Lasken, R.S., Schuster, D.M., and Rashtchian, A. (1996)
Archaebacterial DNA polymerases tightly bind uracil-
containing DNA. J Biol Chem 271: 17692–17696.
Li, W., and Godzik, A. (2006) Cd-hit: a fast program for
clustering and comparing large sets of protein or nucleotide
sequences. Bioinformatics 22: 1658–1659.
enhances PCRampli-
facilitates improved
Lindell, D., Sullivan, M.B., Johnson, Z.I., Tolonen, A.C.,
Rohwer, F., and Chisholm, S.W. (2004) Transfer of photo-
synthesis genes to and from Prochlorococcus viruses.
Proc Natl Acad Sci USA 101: 11013–11018.
Lindell, D., Jaffe, J.D., Johnson, Z.I., Church, G.M., and
Chisholm, S.W. (2005) Photosynthesis genes in marine
viruses yield proteins during host infection. Nature 438:
86–89.
Lindell, D., Jaffe, J.D., Coleman, M.L., Futschik, M.E.,
Axmann, I.M., and Rector, T. (2007) Genome-wide expres-
sion dynamics of a marine virus and host reveal features of
co-evolution. Nature 449: 83–86.
Mann, N.H., Cook, A., Millard, A., Bailey, S., and Clokie, M.
(2003) Bacterial photosynthesis genes in a virus. Nature
424: 741.
Marine, R., Polson, S.W., Ravel, J., Hatfull, G., Russell, D.,
and Sullivan, M., et al. (2011) Evaluation of a transposase
protocol for rapid generation of shotgun high-throughput
sequencing libraries from nanogram quantities of DNA.
Appl Environ Microbiol 77: 8071–8079.
Millard, A., Clokie, M.R., Shub, D.A., and Mann, N.H. (2004)
Genetic organization of the psbAD region in phages infect-
ing marine Synechococcus strains. Proc Natl Acad Sci
USA 101: 11007–11012.
Neufeld, J.D., Vohra, J., Dumont, M.G., Lueders, T., Man-
efield, M., and Friedrich, M.W. (2007) DNA stable-isotope
probing. Nat Protoc 2: 860–866.
Roche Diagnostics GmbH (2009a) GS FLX Titanium General
Library Preparation. Method Manual USM-0048.B.
Roche Diagnostics GmbH (2009b) Amplicon Fusion Primer
Design Guidelines for GS FLX Titanium Series Lib-A
Chemistry. Technical Bulletin TCB No. 013-2009: 1–3.
Rodrigue, S., Malmstrom, R.R., Berlin, A.M., Birren, B.W.,
Henn, M.R., and Chisholm, S.W. (2009) Whole genome
amplification and de novo assembly of single bacterial
cells. PLoS ONE 4: e6864.
Rodrigue, S., Materna, A.C., Timberlake, S.C., Blackburn,
M.C., Malmstrom, R.R., and Alm, E.J. (2010) Unlocking
short read sequencing for metagenomics. PLoS ONE 5:
e11840.
Rodriguez-Valera, F., Martin-Cuadrado, A.B., Rodriguez-
Brito, B., Pasic ´, L., Thingstad, T.F., and Rohwer, F. (2009)
Explaining microbial population genomics through phage
predation. Nat Rev Microbiol 7: 828–836.
Sambrook, J., and Russell, D.W. (2001) Molecular Cloning: A
Laboratory Manual, 3rd edn. Cold Spring Harbor, NY, USA:
Cold Spring Harbor Laboratory Press.
Sullivan, M.B., Coleman, M.L., Weigele, P., Rohwer, F., and
Chisholm, S.W. (2005) Three Prochlorococcus cyanoph-
age genomes: signature features and ecological interpre-
tations. PLoS Biol 3: e144.
Sullivan, M.B., Lindell, D., Lee, J.A., Thompson, L.R.,
Bielawski, J.P., and Chisholm, S.W. (2006) Prevalence
and evolution of core photosystem II genes in marine
cyanobacterial viruses and their hosts. PLoS Biol 4:
e234.
Sullivan, M.B., Huang, K.H., Ignacio-Espinoza, J.C., Berlin,
A.M., Kelly, L., and Weigele, P.R. (2010) Genomic analysis
of oceanic cyanobacterial myoviruses compared with
T4-like myoviruses from diverse hosts and environments.
Environ Microbiol 12: 3035–3056.
Linker amplification for ultra-low DNA samples 11
© 2012 Society for Applied Microbiology and Blackwell Publishing Ltd, Environmental Microbiology
Page 12
Thompson, J.R., Marcelino, L.A., and Polz, M.F. (2002) Het-
eroduplexes in mixed-template amplifications: formation,
consequence and elimination by ‘reconditioning PCR.
Nucleic Acids Res 30: 2083–2088.
Vega Thurber, R. (2009) Current insights into phage biodi-
versity and biogeography. Curr Opin Microbiol 12: 582–
587.
Woyke, T., Xie, G., Copeland, A., González, J.M., Han, C.,
and Kiss, H. (2009) Assembling the marine metagenome,
one cell at a time. PLoS ONE 4: e5299.
Wu, J.Y., Jiang, X.T., Jiang, Y.X., Lu, S.Y., Zou, F., and Zhou,
H.W. (2010) Effects of polymerase, template dilution and
cycle number on PCR based 16 S rRNA diversity analysis
using the deep sequencing method. BMC Microbiol
10: 255.
Wu, L., Liu, X., Schadt, C.W., and Zhou, J. (2006) Microarray-
based analysis of subnanogram quantities of microbial
community DNAs by using whole-community genome
amplification. Appl Environ Microbiol 72: 4931–4941.
Yilmaz, S., Allgaier, M., and Hugenholtz, P. (2010) Multiple
displacementamplification
analysis of metagenomes. Nat Methods 7: 943–944.
Zerbino, D.R., and Birney, E. (2008) Velvet: algorithms for de
novo short read assembly using de Bruijn graphs. Genome
Res 18: 821–829.
Zhang, K., Martiny, A.C., Reppas, N.B., Barry, K.W., Malek,
J., and Chisholm, S.W. (2006) Sequencing genomes from
single cells by polymerase cloning. Nat Biotechnol 24:
680–686.
compromisesquantitative
Supporting information
Additional Supporting Information may be found in the online
version of this article:
Fig. S1. Comparison of DNA polymerase efficiency and
effect of reconditioning PCR on two phage lysates, TUSD #20
and #23. LA-TaKaRa yielded more product and with a
broader size range than PFU Turbo HotStart. Original PCR
product is diluted 10¥ for input in reconditioning reaction,
and thus results in a 10-fold increase in product. Also, note
the enrichment for high molecular weight product following
reconditioning.
Fig. S2. %G + C range of virus communities, as seen using
various amplification (linker amplification, phi29 multiple dis-
placement amplification, and linker amplified shotgun librar-
ies)and sequencingplatforms
Sanger sequencing). (A) in-house sequence data, (B) and
(C) are identifiable by name and available on the CAMERA
web portal, including all metadata and publication refer-
ences.
Fig. S3. The magnitude of difference in %G + C bias between
unamplified and amplified treatments was assessed by the
magnitude of difference between the integrated area under
their %G + C-bias plot curves (Fig. 3). As cycle number
increased (and quantity of starting material decreased), there
was a slight trend in the deviation of amplified from unampli-
fied treatments.
Fig. S4. Quality score profile of Illumina reads from 20 pooled
freshwater cyanophage genomes (L. Deng and M.B. Sulli-
van, unpubl. data) generated from a linker amplified library.
Boxplot generated by Fastx Toolbox.
Fig. S5. Log-linear relationship between PCR cycle numbers
and starting DNA, as it is diluted to extinction. Test was
performed with dilution of a single phage genome, H105/1.
Table S1.
Detailed break-down
method, including time and cost estimates from sample to
sequence for a typical 20 l aquatic virus sample.
Table S2. Comparison of DNA polymerase sensitivity and
specificity.
Table S3. Comparison of sheared DNA size-fractionation
techniques.
Table S4. Evaluation of linker amplification efficiency.
Table S5. Increase in DNA as a result of linker amplification,
from both environmental and phage lysate DNA preps.
Table S6. Two-tailed paired Student’s t-tests comparing the
integrated areas between unamplified read depth curve and
curve of each amplified treatment.
Table S7. Two-tailed paired Student’s t-tests comparing the
integrated areas between unamplified %G + C bias curve of
unamplified and each amplified treatment for H105/1.
(454pyrosequencing,
of linkeramplification
Please note: Wiley-Blackwell are not responsible for the
content or functionality of any supporting materials supplied
by the authors. Any queries (other than missing material)
should be directed to the corresponding author for the
article.
12M. B. Duhaime, L. Deng, B. T. Poulos and M. B. Sullivan
© 2012 Society for Applied Microbiology and Blackwell Publishing Ltd, Environmental Microbiology
Page 13
Supplementary?? Figure?? 1.?? Comparison?? of?? DNA?? polymerase?? efficiency?? and?? effect?? of?? reconditioning??
PCR?? on?? two?? phage?? lysates,?? TUSD?? #20?? and?? #23.?? LA-‐TaKaRa?? yielded?? more?? product?? and?? with?? a??
broader?? size?? range?? than?? PFU?? Turbo?? HotStart.?? Original?? PCR?? product?? is?? diluted?? 10X?? for?? input?? in??
reconditioning?? reaction,?? and?? thus?? results?? in?? a?? 10-‐fold?? increase?? in?? product.?? Also,?? note?? the??
enrichment?? for?? high?? molecular?? weight?? product?? following?? reconditioning.??
??
??
??
??
??
??
??
Page 14
Supplementary?? Figure?? 2.?? ?? %G+C?? range?? of?? virus?? communities,?? as?? seen?? using?? various?? amplification??
(linker?? amplification,?? phi29?? multiple?? displacement?? amplification,?? and?? linker?? amplified?? shotgun??
libraries)?? and?? sequencing?? platforms?? (454?? pyrosequencing,?? Sanger?? sequencing).?? (a)?? in-‐house??
sequence?? data,?? (b)?? and?? (c)?? are?? identifiable?? by?? name?? and?? available?? on?? the?? CAMERA?? web?? portal,??
including?? all?? metadata?? and?? publication?? references.??
??
??
0.00.20.40.60.8 1.0
0
2
4
6
8
10
12
Linker Amplifcation / 454 Pyrosequencing
g+c/all
read count density
unamp (this study)
amp (this study)
SIO10 TFF-CsCl
SIO2 FeCl-DNase
SIO9 FeCl-Sucrose
Fitzroy2
Dunk3
AntJanFreeVir
AntJanIndVir
AntNovFreeVir
AntNovIndVir
MBARI Coastal 10m
MBARI Upwell 10m
MBARI Upwell-DCM 42m
MBARI Oligo-DCM 105m
MBARI Oligo 1000m
MBARI Oligo 4300m.gc
0.0 0.20.4 0.60.8 1.0
0
2
4
6
8
10
12
phi29 Amplifcation / 454 pyrosequencing
g+c/all
read count density
Stroma Highborne
Stroma Pozas Azules
Stroma Rio Mosquites
Recl H2O: Effluent DNA
Recl H2O: Effluent RNA
Recl H2O: Nursery DNA
Recl H2O: Nursery DNA
Recl H2O: Park DNA
Recl H2O: Potal DNA
Tampa Bay Prophage
0.0 0.20.40.60.8 1.0
0
2
4
6
8
10
12
LA Shotgun Library / dideoxy Sanger
g+c/all
read count density
Human Fecal RNA 1
Human Fecal RNA 2
Human Fecal RNA 3
Hot Spring BEARPAW
Hot Spring OCTOPUS
Chesapeake Bay
(a)
(b)(c)
Page 15
Supplementary?? Figure?? 3.?? The?? magnitude?? of?? difference?? in?? %G+C?? bias?? between?? unamplified?? and??
amplified?? treatments?? was?? assessed?? by?? the?? magnitude?? of?? difference?? between?? the?? integrated?? area??
under?? their?? %G+C-‐bias?? plot?? curves?? (Figure?? 3).?? As?? cycle?? number?? increased?? (and?? quantity?? of??
starting?? material?? decreased),?? there?? was?? a?? slight?? trend?? in?? the?? deviation?? of?? amplified?? from??
unamplified?? treatments.??
??
Trend in cycle # and integrated diference between %G+C-bias curves
of unamplifed and amplifed treatments
??
??