NOTE
PCR cycles above routine numbers do not compromise high-throughput DNA barcoding results
J. Vierna, J. Doña, A. Vizcaíno, D. Serrano, and R. Jovani
Abstract: High-throughput DNA barcoding has become essential in ecology and evolution, but some technical questions remain. Increasing the number of PCR cycles above the routine 20–30 cycles is a common practice when working with old-type specimens, which provide small amounts of DNA, or when facing annealing issues with the primers. However, increasing the number of cycles can raise the number of artificial mutations due to polymerase errors. In this work, we sequenced 20 COI libraries on the Illumina MiSeq platform. Libraries were prepared with 40, 45, 50, 55, and 60 PCR cycles from four individuals belonging to four species of four genera of cephalopods. We found no relationship between the number of PCR cycles and the number of mutations despite using a nonproofreading polymerase. Moreover, even when using a high number of PCR cycles, the resulting number of mutations was low enough not to be an issue in the context of high-throughput DNA barcoding (although high cycle numbers may remain an issue in DNA metabarcoding because of chimera formation). We conclude that the common practice of increasing the number of PCR cycles should not negatively impact the outcome of a high-throughput DNA barcoding study in terms of the occurrence of point mutations.
Key words: COI, DNA barcoding, Illumina, library, mutations, non-proofreading polymerase.
Résumé : Le codage à barres à haut débit de l’ADN est devenu essentiel en écologie et en évolution, mais certaines questions techniques demeurent. Un accroissement du nombre de cycles de PCR au-delà des 20–30 cycles usuels est pratique commune lorsqu’on travaille avec des spécimens anciens, lesquels fournissent peu d’ADN, ou lorsque des problèmes d’appariement sont rencontrés avec les amorces. Cependant, l’accroissement du nombre de cycles peut augmenter le nombre de mutations artificielles dues aux erreurs de la polymérase. Dans ce travail, les auteurs ont séquencé 20 librairies COI sur un appareil MiSeq d’Illumina. Les librairies ont été préparées à partir de quatre individus appartenant à quatre espèces au sein de quatre genres de céphalopodes en complétant 40, 45, 50, 55 ou 60 cycles de PCR. Les auteurs n’ont observé aucune relation entre le nombre de cycles de PCR et le nombre de mutations, en dépit de l’utilisation d’une enzyme sans activité exonucléase 3′→5′ (« proofreading »). De plus, même au terme d’un grand nombre de cycles, le nombre de mutations était suffisamment faible pour ne pas constituer un problème dans le contexte du codage à barres à haut débit (bien qu’il puisse en constituer un dans le cas du métacodage à barres de l’ADN en raison de la formation de chimères). Les auteurs concluent que la pratique courante d’augmenter le nombre de cycles de PCR ne devrait pas avoir d’impact négatif sur les résultats d’études faisant appel au codage à barres à haut débit en matière d’occurrence de mutations ponctuelles. [Traduit par la Rédaction]
Mots-clés : COI, codage à barres de l’ADN, Illumina, librairie, mutations, polymérase sans activité exonucléase 3′→5′.
Introduction
High-throughput DNA barcoding (for single specimens; Shokralla et al. 2014, 2015; Toju 2015), as well as similar methods such as DNA metabarcoding (for mixed species samples; Taberlet et al. 2012) or amplicon metagenomics, combine DNA-based species identification using standardised markers (DNA barcoding, Hebert et al. 2003) with the power of high-throughput sequencing (HTS). These methods are powerful tools in life sciences research (Taberlet et al. 2012; Kress et al. 2015; Toju 2015), from studying century-old type specimens (Prosser et al. 2016), to assessing species composition of gut microbiota (Abdelrhman et al. 2016) from mixed samples.
Here, we focus on high-throughput DNA barcoding. This methodology overcomes some of the problems that currently limit DNA barcoding, such as the high DNA template concentration required for Sanger sequencing and the co-amplification of other DNA templates due to intrasample contamination, Wolbachia infection, gut contents, heteroplasmy, and pseudogenes. Moreover, high-throughput DNA barcoding reduces both per-specimen costs and labour time by nearly 80%, thus allowing it to be scaled up to deal with large-scale biodiversity monitoring projects (Shokralla et al. 2015; Cruaud et al. 2017).
However, even though high-throughput DNA barcoding is a promising method, some technical issues require further study. For example, some authors have explored the impact of the sequencing platform (Smith and Peay 2014), the polymerase used (Oliver et al. 2015; Brandariz-Fontes et al. 2015), the DNA barcode length (Hajibabaei et al. 2006; Doña et al. 2015), the library preparation method (Schirmer et al. 2015), the primers (Schirmer et al. 2015), the annealing temperature (Schmidt et al. 2013), or the phenomenon known as mistagging (Schnell et al. 2015; Esling et al. 2015) in DNA metabarcoding or amplicon sequencing. Recently, Geisen et al. (2015) and Díaz-Real et al. (2015) studied to what extent DNA metabarcoding produced quantitative (and not only qualitative) and reliable results in two groups of symbionts. Finally, several other papers have dealt with some of these issues through bioinformatic analysis of the HTS reads (Caporaso et al. 2010; Coissac et al. 2012; Edgar 2013; Bokulich et al. 2013; Boyer et al. 2016).
Received 5 April 2017. Accepted 21 June 2017.
Corresponding Editor: F. Chain.
J. Vierna* and A. Vizcaíno. AllGenetics & Biology SL. Edificio CICA, Campus de Elviña s/n. E-15008 A Coruña, Spain.
J. Doña* and R. Jovani. Department of Evolutionary Ecology, Estación Biológica de Doñana (CSIC), Avenida Américo Vespucio s/n. E-41092 Sevilla, Spain.
D. Serrano. Department of Conservation Biology, Estación Biológica de Doñana (CSIC), Avenida Américo Vespucio s/n. E-41092 Sevilla, Spain.
Corresponding author: R. Jovani (email: jovani@ebd.csic.es).
*These authors contributed equally to this work.
Copyright remains with the author(s) or their institution(s). Permission for reuse (free in most cases) can be obtained from RightsLink.
Pagination not final (cite DOI) / Pagination provisoire (citer le DOI)
Genome 00: 1–6 (0000) dx.doi.org/10.1139/gen-2017-0081 Published at www.nrcresearchpress.com/gen on 28 July 2017.
Here, we focused on the number of PCR cycles used for library preparation. This is a technical issue that can potentially impact the biological conclusions of high-throughput DNA barcoding projects, but that has not yet been studied in detail. Increasing the number of PCR cycles above the normal 20–35 cycles (e.g., Shokralla et al. 2014, 2015; Carew et al. 2017) is a common practice: for example, when working with old-type specimens (Prosser et al. 2016), which provide small amounts of input DNA, or when the PCR is inefficient (e.g., Blaalid et al. 2013; Ellis et al. 2013; Carew et al. 2017). However, a large number of PCR cycles may entail the risk of increasing the number of artificial mutations in the output sequencing reads because of DNA polymerase errors and the amplification of these errors in subsequent PCR cycles (Cha and Thilly 1993; Hengen 1995; Casbon et al. 2011; Brandariz-Fontes et al. 2015). This is a potential major problem for high-throughput DNA barcoding because it can eventually distort, among other things, genetic threshold-based species delimitation. Yet, to our knowledge, how these extra cycles affect DNA barcoding results has never been investigated.
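The concern about errors accumulating over cycles can be made concrete with a back-of-the-envelope calculation. The sketch below is not part of the original analysis. It assumes, purely for illustration, that each base is miscopied independently at a fixed rate per copying event and that the ancestry of every read includes one copying event per cycle. Real reactions plateau long before 60 cycles, so this is a deliberately pessimistic simplification. The function name and the parameter values are hypothetical.

```python
def fraction_reads_with_error(error_rate: float, window_nt: int, duplications: int) -> float:
    """Probability that a window of `window_nt` bases carries at least one
    polymerase error after `duplications` copying events.

    Assumption: errors are independent, and each read descends from one
    copying event per cycle (no plateau, no template carry-over).
    """
    error_free = (1.0 - error_rate) ** (window_nt * duplications)
    return 1.0 - error_free


# Illustrative values only: 1e-5 errors per base per copying event,
# a 50-nucleotide window, and three hypothetical cycle numbers.
for cycles in (20, 40, 60):
    p = fraction_reads_with_error(1e-5, 50, cycles)
    print(f"{cycles} cycles: {p:.4f}")
```

Under these assumptions the predicted fraction of affected reads grows almost linearly with the number of cycles, which is the naive expectation that the experiment described here was designed to test.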
To explore the consequences of the number of PCR cycles upon the number of artificial mutations, we extracted DNA from four different individuals belonging to four cephalopod species. From each of the four DNA samples, we prepared five high-throughput DNA barcoding libraries with different numbers of PCR cycles: from 40, i.e., roughly 20 cycles higher than regular numbers, to 60, as done commonly when dealing with problematic samples. After sequencing the 20 libraries on the Illumina MiSeq platform, we studied the relationship between the number of PCR cycles and the number of mutations present in the MiSeq reads. Our results show that, for a number of cycles between 40 and 60, there is no relationship between the number of PCR cycles and the number of mutations, and that the number of reads with mutations is very low. Therefore, we conclude that a number of PCR cycles as high as 60 does not compromise the success of a high-throughput DNA barcoding project in terms of the occurrence of point mutations.
Materials and methods
Four ethanol-preserved tissues obtained from different cephalopod species belonging to the orders Octopoda, Oegopsida, and Sepiida were analysed (see sample IDs and cephalopod species in Table 1). Species were identified according to morphology and DNA barcoding (Fernando Fernández-Álvarez, personal communication). The genetic p-distances between the selected individuals were between 80.1 and 85.7 for the cytochrome c oxidase subunit I gene (COI) used in this study.
Total DNA was extracted from each individual using the NZY-
Tissue gDNA Isolation Kit (NZYTech). DNAs were quantified with
the Qubit dsDNA HS Assay Kit (ThermoFisher Scientific) and used
as input for the preparation of the libraries.
We followed a standard Illumina library preparation protocol. In brief, we amplified the COI region (i.e., the standard animal barcode, Hebert et al. 2003) and included the Illumina-specific adapters and indices by following a two-step PCR approach, slightly modified from Lange et al. (2014). For the sake of clarity, we refer to these PCRs as PCR1 and PCR2.
PCR1 primers were LCO1490 and HCO2198 (Folmer et al. 1994), which proved successful in a previous study in which the same specimens were DNA barcoded (Fernando Fernández-Álvarez et al., personal communication). Oligonucleotide tails bearing the Illumina sequencing primers were attached to the 5′ ends of primers LCO1490 and HCO2198. PCR2 was carried out with tailed primers that bear the indices and adapters and anneal to the Illumina sequencing primers (see Fig. 1 for a schematic representation of the binding process).
PCR1 was carried out using 25 ng of total DNA in a final volume of 25 µL containing 6.50 µL of Supreme NZYTaq Green PCR Master Mix (NZYTech) (nonproofreading polymerase; error rate of 1 × 10⁻⁵ according to the manufacturer), 0.5 µM of each primer, and PCR-grade water up to 25 µL. The thermal cycling conditions were as follows: an initial denaturation step at 95 °C for 5 min, followed by 35, 40, 45, 50, or 55 cycles (see Fig. 2) of denaturation at 95 °C for 30 s, annealing at 53 °C for 30 s, and extension at 72 °C for 45 s; and a final extension step at 72 °C for 10 min. The products of PCR1 were purified using the SPRI method (DeAngelis et al. 1995), with Mag-Bind RXNPure Plus magnetic beads (Omega Biotek). The purified products were loaded in a 1% agarose gel stained with GreenSafe (NZYTech) and visualised under UV light.
PCR2 was carried out using 2.5 µL of the purified PCR1 products, and the same conditions as for PCR1 except for the number of cycles, which was set to five (Fig. 2), and the annealing temperature (60 °C). The products obtained were purified following the SPRI method as indicated above. Then, the purified products were loaded in a 1% agarose gel stained with GreenSafe (NZYTech) and visualised under UV light. All samples yielded libraries of the expected size.
Libraries were quantified using the Qubit dsDNA HS Assay Kit (ThermoFisher Scientific) and pooled in equimolar amounts. The pool was sequenced in a fraction of a 600-cycle run (MiSeq Reagent Kit v3; PE300) on an Illumina MiSeq sequencer at Macrogen (Seoul, Korea), along with a PhiX library used to increase the sequence diversity of the overall library.
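Equimolar pooling requires converting the mass concentration reported by a fluorometric assay into a molar concentration. The following sketch shows one common way of doing this; it is not taken from the authors' protocol. It assumes an average mass of 660 g/mol per base pair of double-stranded DNA, and the function names, library names, concentrations, library length, and target amount are all hypothetical.

```python
AVG_BP_MASS = 660.0  # g/mol per base pair of dsDNA (widely used approximation)


def ng_per_ul_to_nm(conc_ng_per_ul: float, length_bp: int) -> float:
    """Convert a dsDNA concentration from ng/µL to nmol/L (nM)."""
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS * length_bp)


def equimolar_volumes(concs_ng_per_ul: dict, length_bp: int, fmol_each: float) -> dict:
    """Volume (µL) of each library that contributes `fmol_each` femtomoles."""
    volumes = {}
    for name, conc in concs_ng_per_ul.items():
        nm = ng_per_ul_to_nm(conc, length_bp)  # 1 nM equals 1 fmol/µL
        volumes[name] = fmol_each / nm
    return volumes


# Hypothetical example: two libraries of 1000 bp, 20 fmol of each in the pool.
print(equimolar_volumes({"lib_A": 6.6, "lib_B": 13.2}, 1000, 20.0))
```

Libraries with a higher measured concentration contribute a proportionally smaller volume, so that each one is represented by the same number of molecules in the pool.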
Fig. 1. Schematic representation of the primers used for PCR1 and PCR2 (see main text). The positions of the Illumina adapters, indices, and
sequencing primers are also shown. Note that primers are not drawn to scale.
FASTQ files were demultiplexed using RTA 1.18.54 (Illumina) and checked with FastQC 0.11.3 (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/). Then, they were quality-trimmed using very conservative parameters in Trimmomatic 0.36 (Bolger et al. 2014) with the option SLIDINGWINDOW:1:30. SLIDINGWINDOW starts scanning at the 5′ end and clips the read once the average quality within the window falls below a threshold (Trimmomatic Manual 0.32). We set the size of the window to 1 and the quality threshold to 30 (Phred Quality Score). Therefore, when the quality of a single nucleotide fell below a Phred Quality Score of 30, the read was clipped from this position to the 3′ end. We used these very conservative parameters to make sure that the mutations observed in the sequencing results were due to PCR errors and not to sequencing errors. The quality of the resulting files was checked again with FastQC.
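The effect of this setting can be illustrated with a few lines of Python. The sketch below is not Trimmomatic's code; it only mimics the behaviour described above for a window of size 1, assuming Phred+33 quality encoding. The function name and the example read are hypothetical.

```python
def clip_at_first_low_quality(seq: str, qual: str, min_phred: int = 30, offset: int = 33):
    """Keep the 5' part of a read up to (not including) the first base whose
    Phred score is below `min_phred`; that base and everything 3' of it
    are discarded."""
    for i, q in enumerate(qual):
        if ord(q) - offset < min_phred:
            return seq[:i], qual[:i]
    return seq, qual  # no base fell below the threshold


# Hypothetical read: 'I' encodes Phred 40 and '5' encodes Phred 20.
print(clip_at_first_low_quality("ACGTAC", "IIII5I"))
```

In this example the fifth base falls below the threshold, so only the first four bases are kept, even though the sixth base is of high quality. This is why such a setting discards a large share of the data.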
Quality-trimmed FASTQ files were imported into Geneious 8.1.6 (http://www.geneious.com, Kearse et al. 2012). Each pair of R1 and R2 files was set as paired reads to improve the mapping. A map-to-reference analysis was carried out with the Geneious mapper using relaxed parameters (maximum number of mismatches per read, 25%; minimum overlap identity, 80%) to allow potentially mutated reads to map. The DNA barcode sequences from the four cephalopod specimens were set as references (DDBJ/EMBL/GenBank accession numbers KX078469–KX078472). The results of the map-to-reference analysis were inspected manually to verify that the reads of each library mapped to the correct reference sequence. We obtained 20 assembly files corresponding to the four species by the five PCR treatments.
Regions including the first 50 nucleotides of the mapped R1 and R2 reads (starting immediately after the primer annealing region) were aligned in each assembly with Muscle (Edgar 2004) as implemented in Geneious 8.1.6. We selected these two 50-nucleotide regions because such read length accumulated the maximum number of reads after passing the quality threshold (see above); using larger regions would have reduced the sample size and, therefore, the statistical power of the analysis. Reads were trimmed to the same length to simplify later bioinformatic analyses.
For each alignment file, we calculated the number of mutations per read by comparing every read against the consensus sequence. The consensus sequences obtained from the FASTA files of the same species were identical to one another (regardless of the number of PCR cycles) and to the corresponding COI sequences available in DDBJ/EMBL/GenBank. To count mutations, we used a custom R function (R Core Team 2016) that multiplies the pairwise genetic p-distance by the total length of our reads. The function treated insertions and deletions (indels) as single mutational steps, and the genetic p-distance was calculated with the dist.dna function (raw model) from the ape 3.4 R package (Paradis et al. 2004). Then, we ran a Poisson generalised linear mixed model (GLMM) on the entire resulting data set (glmer function from package lme4 1.1-12; Bates et al. 2015). We considered the number of mutations as the response variable, the number of cycles as the predictor variable, and the species as a random factor. We checked the assumptions underlying the GLMM by inspecting a Q-Q plot of the regression residuals for normality.
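The counting step can be sketched in a few lines. The example below is an independent Python illustration, not the authors' R function, and it does not reproduce how dist.dna handles gaps. It assumes that the read and the consensus are already aligned to the same length and that "-" marks a gap; each substitution counts as one mutation and each contiguous run of gaps in one sequence counts as a single indel. The function name is hypothetical.

```python
def count_mutations(read: str, consensus: str, gap: str = "-") -> int:
    """Count differences between an aligned read and the consensus.

    Substitutions count one each; a contiguous run of gaps in the same
    sequence counts as a single insertion/deletion event.
    """
    if len(read) != len(consensus):
        raise ValueError("sequences must be aligned to the same length")
    mutations = 0
    indel_side = None  # which sequence carries the gap run currently open
    for r, c in zip(read.upper(), consensus.upper()):
        if r == gap and c == gap:
            continue  # gap in both sequences: column carries no information
        if r == gap or c == gap:
            side = "read" if r == gap else "consensus"
            if side != indel_side:
                mutations += 1  # a new indel event starts here
            indel_side = side
        else:
            indel_side = None
            if r != c:
                mutations += 1  # substitution
    return mutations


# Hypothetical aligned pair: one 2-nt deletion plus one substitution.
print(count_mutations("AC--TA", "ACGGTT"))
```

For reads without gaps, this count equals the p-distance (the proportion of differing sites) multiplied by the read length, which is the quantity described above.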
Fig. 2. From each cephalopod sample, five different high-throughput DNA barcoding libraries were constructed and sequenced in the
Illumina MiSeq platform. In each of these five libraries, the number of PCR cycles during PCR1 was different (35, 40, 45, 50, and 55 cycles).
Finally, to make sure that the PCR1 reaction was still functioning after 55 cycles (i.e., that the emergence of new artificial mutations was still possible), qPCRs were performed on all four samples with the same parameters as in PCR1, but with 60 cycles to cover the whole range of our experiment. The resulting plots of fluorescence versus number of cycles were inspected visually, confirming that the reaction was still taking place after 55 cycles.
Results
Due to the stringent quality-filtering, only 2.26% of the raw
reads were used for the statistical analyses (see supplementary
material, Table S1). The average quality of both the raw and
quality-trimmed reads, as measured with FastQC, is available in
the supplementary material, Fig. S1.
We detected mutations in 4176 (5.98%) of the 69 792 reads analysed, i.e., the reads that passed the quality-filtering step, mapped to the correct reference sequence, and were located within the 50-nucleotide stretches after the primer annealing regions. The number of mutations was consistent across species, and the maximum number of mutations per read was three across treatments (Fig. 3; Table 1). Accordingly, we found no effect of the number of cycles on the number of mutations (Fig. 3; slope ± SE = 0.0002 ± 0.0024, Z = 0.096, P = 0.923).
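As a quick consistency check, the pooled figures given in the text can be recovered from the per-library values in Table 1. The snippet below is only an arithmetic sketch written for this purpose, with hypothetical variable and function names; it uses the percentage of reads with zero mutations and the number of reads per library as printed in the table.

```python
# (percentage of reads with 0 mutations, number of reads), copied from Table 1
TABLE_1 = {
    "CEP007": (94.055, 22105),
    "CEP016": (94.169, 14408),
    "CEP023": (93.409, 22637),
    "SEP006": (95.019, 10642),
}


def pooled_mutated_reads(table: dict):
    """Return (total reads, estimated reads with >= 1 mutation, pooled %)."""
    total = sum(n for _, n in table.values())
    mutated = sum(n * (100.0 - pct_zero) / 100.0 for pct_zero, n in table.values())
    return total, mutated, 100.0 * mutated / total


total, mutated, pct = pooled_mutated_reads(TABLE_1)
print(total, round(mutated), round(pct, 2))
```

With the values as printed, the totals agree with the 69 792 reads and 4176 mutated reads reported in the text, up to rounding of the percentages.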
Table 1. Percentage of reads with 0, 1, 2, or 3 mutations relative to the reference sequence.
Library ID                                 0        1       2       3       No. of reads
CEP007 (Bathypolypus sponsalis)            94.055   5.772   0.167   0.004   22 105
CEP016 (Ancistroteuthis lichtensteini)     94.169   5.607   0.222   0       14 408
CEP023 (Todaropsis eblanae)                93.409   6.37    0.198   0.022   22 637
SEP006 (Sepietta oweniana)                 95.019   4.839   0.14    0       10 642
Note: No sequence with four or more mutations was found.
Fig. 3. Number of mutations relative to the reference sequence observed in each PCR treatment. (a) Bathypolypus sponsalis. (b) Ancistroteuthis lichtensteini. (c) Todaropsis eblanae. (d) Sepietta oweniana.
Discussion
In this work, we investigated whether increasing the number of PCR cycles during library preparation produces a higher number of mutations that could eventually impact the outcome of a high-throughput DNA barcoding study. We demonstrated that even for a high number of cycles (60, i.e., up to 55 cycles for PCR1 and five additional cycles for PCR2) the number of reads with mutations remained very low despite using a non-proofreading enzyme and despite the potential occurrence of heteroplasmy (which would increase the number of mutated positions when compared to the reference sequence). However, we only analysed two regions of 50 nucleotides each from the COI animal DNA barcode, whereas different genomic regions may impose different error rates on DNA polymerase (e.g., Arezi et al. 2003). Nevertheless, the lack of effect we found in these regions with high sequence quality by experimentally increasing the number of PCR cycles indicates that PCR cycles might have negligible impacts on point mutations and subsequent taxonomic assignment.
Some DNA metabarcoding-specific technical issues can arise from an increase in the number of PCR cycles, and thus require further study. For instance, chimeras are hybrid amplicons that can be formed during a PCR when an aborted extension product from an earlier cycle functions as a primer in a subsequent PCR cycle (Haas et al. 2011). Chimeras inflate diversity in an artificial manner and should be carefully taken into account. In this work, chimeras were not an issue because we prepared our libraries using DNA from individual specimens (i.e., high-throughput DNA barcoding libraries). However, the formation of chimeras has been found to be correlated with the number of PCR cycles and with the consumption of the primers (Wang and Wang 1996; Qiu et al. 2001; Thompson et al. 2002). Fortunately, several bioinformatic tools have been developed to deal with chimeras and thus their impact can be greatly reduced (Edgar et al. 2011; Haas et al. 2011; Coissac et al. 2012; Boyer et al. 2016). Thus, even though our results hold for DNA metabarcoding studies in terms of point mutations, the formation of chimeras at high PCR cycle numbers is a separate problem that should be considered in DNA metabarcoding studies.
Overall, our results show that increasing the number of PCR cycles above routine levels during library preparation is not risky for high-throughput DNA barcoding studies in terms of the number of point mutations produced by polymerase errors, even when a non-proofreading enzyme is used. Therefore, this strategy can be safely followed with small amounts of input DNA or when there are mismatches in the primer annealing regions that make the PCRs inefficient.
Data accessibility
The MiSeq raw data, the sequence files, and the supplementary material have been deposited in Figshare (https://doi.org/10.6084/m9.figshare.3860958). The R code used for the analyses is available on the GitHub repository (https://github.com/Jorge-Dona/Barcoding-tools).
Acknowledgements
We thank Fernando Fernández-Álvarez for letting us analyse the cephalopod samples, which belong to the research project CALOCEAN-2 (AGL2012-39077), funded by the Ministerio de Economía y Competitividad (Spain). This work was supported by the Ministerio de Economía y Competitividad (Spain) with a Ramón y Cajal research contract RYC-2009-03967 to R.J., and two research projects (CGL2011-24466, CGL2015-69650-P) to D.S. and R.J. J.D. was also supported by the Ministerio de Economía y Competitividad (Spain) (SVP-2013-067939).
References
Abdelrhman, K.F.A., Bacci, G., Mancusi, C., Mengoni, A., Serena, F., and Ugolini, A. 2016. A first insight into the gut microbiota of the sea turtle Caretta caretta. Front. Microbiol. 7: 1060. PMID:27458451.
Arezi, B., Xing, W., Sorge, J.A., and Hogrefe, H.H. 2003. Amplification efficiency of thermostable DNA polymerases. Anal. Biochem. 321: 226–235. doi:10.1016/S0003-2697(03)00465-2. PMID:14511688.
Bates, D., Mächler, M., Bolker, B., and Walker, S. 2015. Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67(1): 1–48. doi:10.18637/jss.v067.i01.
Blaalid, R., Kumar, S., Nilsson, R.H., Abarenkov, K., Kirk, P.M., and Kauserud, H. 2013. ITS1 versus ITS2 as DNA metabarcodes for fungi. Mol. Ecol. Resour. 13: 218–224. doi:10.1111/1755-0998.12065. PMID:23350562.
Bokulich, N.A., Subramanian, S., Faith, J.J., Gevers, D., Gordon, J.I., Knight, R., et al. 2013. Quality-filtering vastly improves diversity estimates from Illumina amplicon sequencing. Nat. Methods, 10: 57–59. PMID:23202435.
Bolger, A.M., Lohse, M., and Usadel, B. 2014. Trimmomatic: a flexible trimmer for Illumina Sequence data. Bioinformatics, 30(15): 2114–2120. doi:10.1093/bioinformatics/btu170. PMID:24695404.
Boyer, F., Mercier, C., Bonin, A., Le Bras, Y., Taberlet, P., and Coissac, E. 2016. obitools: a unix-inspired software package for DNA metabarcoding. Mol. Ecol. Resour. 16: 176–182. doi:10.1111/1755-0998.12428. PMID:25959493.
Brandariz-Fontes, C., Camacho-Sánchez, M., Vilà, C., Vega-Pla, J.L., Rico, C., and Leonard, J.A. 2015. Effect of the enzyme and PCR conditions on the quality of high-throughput DNA sequencing results. Sci. Rep. 5: 8056. doi:10.1038/srep08056. PMID:25623996.
Caporaso, J.G., Kuczynski, J., Stombaugh, J., Bittinger, K., Bushman, F.D., Costello, E.K., et al. 2010. QIIME allows analysis of high-throughput community sequencing data. Nat. Methods, 7: 335–336. doi:10.1038/nmeth.f.303. PMID:20383131.
Carew, M.E., Metzeling, L., StClair, R., and Hoffmann, A.A. 2017. Detecting invertebrate species in archived collections using next generation sequencing. Mol. Ecol. Resour. [Online ahead of print.] doi:10.1111/1755-0998.12644.
Casbon, J.A., Osborne, R.J., Brenner, S., and Lichtenstein, C.P. 2011. A method for counting PCR template molecules with application to next-generation sequencing. Nucleic Acids Res. 39: e81. doi:10.1093/nar/gkr217. PMID:21490082.
Cha, R.S., and Thilly, W.G. 1993. Specificity, efficiency, and fidelity of PCR. Genome Res. 3: S18–S29. doi:10.1101/gr.3.3.S18. PMID:8118393.
Coissac, E., Riaz, T., and Puillandre, N. 2012. Bioinformatic challenges for DNA metabarcoding of plants and animals. Mol. Ecol. 21: 1834–1847. doi:10.1111/j.1365-294X.2012.05550.x. PMID:22486822.
Cruaud, P., Rasplus, J.Y., Rodriguez, L.J., and Cruaud, A. 2017. High-throughput sequencing of multiple amplicons for barcoding and integrative taxonomy. Sci. Rep. 7: 41948. doi:10.1038/srep41948. PMID:28165046.
DeAngelis, M.M., Wang, D.G., and Hawkins, T.L. 1995. Solid-phase reversible immobilization for the isolation of PCR products. Nucleic Acids Res. 23: 4742–4743. doi:10.1093/nar/23.22.4742. PMID:8524672.
Diaz-Real, J., Serrano, D., Píriz, A., and Jovani, R. 2015. NGS metabarcoding proves successful for quantitative assessment of symbiont abundance: the case of feather mites on birds. Exp. Appl. Acarol. 67: 209–218. doi:10.1007/s10493-015-9944-x. PMID:26139533.
Doña, J., Diaz-Real, J., Mironov, S., Bazaga, P., Serrano, D., and Jovani, R. 2015. DNA barcoding and minibarcoding as a powerful tool for feather mite studies. Mol. Ecol. Resour. 15: 1216–1225. doi:10.1111/1755-0998.12384. PMID:25655349.
Edgar, R.C. 2004. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32: 1792–1797. doi:10.1093/nar/gkh340. PMID:15034147.
Edgar, R.C. 2013. UPARSE: highly accurate OTU sequences from microbial amplicon reads. Nat. Methods, 10: 996–998. doi:10.1038/nmeth.2604. PMID:23955772.
Edgar, R.C., Haas, B.J., Clemente, J.C., Quince, C., and Knight, R. 2011. UCHIME improves sensitivity and speed of chimera detection. Bioinformatics, 27: 2194–2200. doi:10.1093/bioinformatics/btr381.
Ellis, R.J., Bruce, K.D., Jenkins, C., Stothard, J.R., Ajarova, L., Mugisha, L., and Viney, M.E. 2013. Comparison of the distal gut microbiota from people and animals in Africa. PLoS ONE, 8: e54783. doi:10.1371/journal.pone.0054783. PMID:23355898.
Esling, P., Lejzerowicz, F., and Pawlowski, J. 2015. Accurate multiplexing and filtering for high-throughput amplicon-sequencing. Nucleic Acids Res. 43: 2513–2524. doi:10.1093/nar/gkv107. PMID:25690897.
Folmer, O., Black, M., Hoeh, W., Lutz, R., and Vrijenhoek, R. 1994. DNA primers for amplification of mitochondrial cytochrome c oxidase subunit I from diverse metazoan invertebrates. Mol. Mar. Biol. Biotechnol. 3: 294–299. PMID:7881515.
Geisen, S., Laros, I., Vizcaíno, A., Bonkowski, M., and de Groot, G.A. 2015. Not all are free-living: high-throughput DNA metabarcoding reveals a diverse community of protists parasitizing soil metazoa. Mol. Ecol. 24: 4556–4569. doi:10.1111/mec.13238. PMID:25966360.
Haas, B.J., Gevers, D., Earl, A.M., Feldgarden, M., Ward, D.V., Giannoukos, G., et al. 2011. Chimeric 16S rRNA sequence formation and detection in Sanger and 454-pyrosequenced PCR amplicons. Genome Res. 21: 494–504. doi:10.1101/gr.112730.110. PMID:21212162.
Hajibabaei, M., Smith, M., Janzen, D.H., Rodriguez, J.J., Whitfield, J.B., and Hebert, P.D. 2006. A minimalist barcode can identify a specimen whose DNA is degraded. Mol. Ecol. Notes, 6: 959–964. doi:10.1111/j.1471-8286.2006.01470.x.
Hebert, P.D.N., Cywinska, A., Ball, S.L., and deWaard, J.R. 2003. Biological identifications through DNA barcodes. Proc. R. Soc. B Biol. Sci. 270: 313–321. doi:10.1098/rspb.2002.2218.
Hengen, P.N. 1995. Methods and reagents: fidelity of DNA polymerases for PCR. Trends Biochem. Sci. 20: 324–325. doi:10.1016/S0968-0004(00)89060-X. PMID:7667892.
Kearse, M., Moir, R., Wilson, A., Stones-Havas, S., Cheung, M., Sturrock, S., et al. 2012. Geneious Basic: an integrated and extendable desktop software platform for the organization and analysis of sequence data. Bioinformatics, 28: 1647–1649. doi:10.1093/bioinformatics/bts199. PMID:22543367.
Kress, W.J., García-Robledo, C., Uriarte, M., and Erickson, D.L. 2015. DNA barcodes for ecology, evolution, and conservation. Trends Ecol. Evol. 30: 25–35. doi:10.1016/j.tree.2014.10.008.
Lange, V., Böhme, I., Hofmann, J., Lang, K., Sauter, J., Schöne, B., et al. 2014. Cost-efficient high-throughput HLA typing by MiSeq amplicon sequencing. BMC Genomics, 15: 63. doi:10.1186/1471-2164-15-63. PMID:24460756.
Oliver, A.K., Brown, S.P., Callaham, M.A., Jr., and Jumpponen, A. 2015. Polymerase
matters: non-proofreading enzymes inflate fungal community richness esti-
mates by up to 15%. Fungal Ecol. 15: 86–89. doi:10.1016/j.funeco.2015.03.003.
Paradis, E., Claude, J., and Strimmer, K. 2004. APE: analyses of phylogenetics
and evolution in R language. Bioinformatics, 20: 289–290. doi:10.1093/
bioinformatics/btg412. PMID:14734327.
Prosser, S.W., deWaard, J.R., Miller, S.E., and Hebert, P.D. 2016. DNA barcodes
from century-old type specimens using next-generation sequencing. Mol.
Ecol. Resour. 16: 487–497. doi:10.1111/1755-0998.12474. PMID:26426290.
Qiu, X., Wu, L., Huang, H., McDonel, P.E., Palumbo, A.V., Tiedje, J.M., and Zhou, J.
2001. Evaluation of PCR-generated chimeras, mutations, and heteroduplexes
with 16S rRNA gene-based cloning. Appl. Environ. Microbiol. 67: 880–887.
doi:10.1128/AEM.67.2.880-887.2001. PMID:11157258.
R Core Team. 2016. R: a language and environment for statistical computing.
R Foundation for Statistical Computing, Vienna, Austria. Available from
https://www.R-project.org/.
Schirmer, M., Ijaz, U.Z., D’Amore, R., Hall, N., Sloan, W.T., and Quince, C. 2015.
Insight into biases and sequencing errors for amplicon sequencing with
the Illumina MiSeq platform. Nucleic Acids. Res. 43: e37. doi:10.1093/nar/
gku1341. PMID:25586220.
Schmidt, P.-A., Bálint, M., Greshake, B., Bandow, C., Römbke, J., and Schmitt, I.
2013. Illumina metabarcoding of a soil fungal community. Soil Biol. Biochem.
65: 128–132. doi:10.1016/j.soilbio.2013.05.014.
Schnell, I.B., Bohmann, K., and Gilbert, M.T. 2015. Tag jumps illuminated –
reducing sequence-to-sample misidentifications in metabarcoding studies.
Mol. Ecol. Resour. 15: 1289–1303. doi:10.1111/1755-0998.12402. PMID:25740652.
Shokralla, S., Gibson, J.F., Nikbakht, H., Janzen, D.H., Hallwachs, W., and
Hajibabaei, M. 2014. Next-generation DNA barcoding: using next-generation
sequencing to enhance and accelerate DNA barcode capture from single
specimens. Mol. Ecol. Resour. 14: 892–901. PMID:24641208.
Shokralla, S., Porter, T.M., Gibson, J.F., Dobosz, R., Janzen, D.H., Hallwachs, W.,
et al. 2015. Massively parallel multiplex DNA sequencing for specimen
identification using an Illumina MiSeq platform. Sci. Rep. 5: 9687.
doi:10.1038/srep09687. PMID:25884109.
Smith, D.P., and Peay, K.G. 2014. Sequence depth, not PCR replication, improves
ecological inference from next generation DNA sequencing. PLoS ONE, 9:
e90234. doi:10.1371/journal.pone.0090234. PMID:24587293.
Taberlet, P., Coissac, E., Pompanon, F., Brochmann, C., and Willerslev, E. 2012.
Towards next-generation biodiversity assessment using DNA metabarcoding.
Mol. Ecol. 21: 2045–2050. doi:10.1111/j.1365-294X.2012.05470.x. PMID:22486824.
Thompson, J.R., Marcelino, L.A., and Polz, M.F. 2002. Heteroduplexes in
mixed-template amplifications: formation, consequence and elimination by
‘reconditioning PCR’. Nucleic Acids Res. 30: 2083–2088. doi:10.1093/nar/30.9.2083.
PMID:11972349.
Toju, H. 2015. High-throughput DNA barcoding for ecological network studies.
Popul. Ecol. 57: 37–51. doi:10.1007/s10144-014-0472-z.
Wang, G.C., and Wang, Y. 1996. The frequency of chimeric molecules as a
consequence of PCR co-amplification of 16S rRNA genes from different bacterial
species. Microbiology, 142: 1107–1114. doi:10.1099/13500872-142-5-1107.
PMID:8704952.
... During this process, the product of the first PCR reaction was used as a template for the second PCR (Cha and Thilly 1993). Double PCR can, in theory, be problematic due to potential polymerase-induced mutations during the increased number of amplification steps of the PCR cycles (Cha and Thilly 1993; Vierna et al. 2017). However, raising the number of PCR cycles is recommended (and is a common practice) in cases of very low concentrations of template DNA (Rameckers et al. 1997; Vierna et al. 2017). Moreover, the probability of polymerase-induced mutations is low in cases of small amounts of template DNA and short lengths of the amplified DNA (Cha and Thilly 1993) and no relationship between the number of PCR cycles and the number of mutations was found (Vierna et al. 2017). We used a double pre-PCR to amplify mtDNA from blow samples, and all double-PCR products were successfully aligned with known SRW haplotypes with 100% accordance suggesting a lack of PCR artefacts. ...
Here, we present the potential of genetic non-invasive sampling within a citizen science framework using commercial whale-watching boats. We collected various non-invasive samples, including sloughed skin, faeces, and blow samples. Considering the body size and intensity of exhalation, blows generate a sufficient amount of genetic material to enable individual sampling, identification, and population genetic analyses. For the first time, exhale samples of southern right whales were collected from a commercial vessel and used for DNA amplification. We compared the DNA yield and genotyping success of mitochondrial sequences and nuclear microsatellite markers from blow samples to other non-invasive and minimally invasive sampling methods. Blow samples typically contain very small amounts of cells and possibly fragmented DNA, resulting in relatively low target DNA yield. After PCR optimisation, mitochondrial DNA (mtDNA) was sequenced in 37.5% of blow samples, and a minimum of six microsatellites were successfully amplified in 25% of blow samples. This study highlights the potential for cetaceans, iconic subjects of conservation biology, to serve as a future model in utilising citizen science and airborne DNA for population monitoring, addressing the unique challenges of the marine environment.
... Illumina sequencing primer sequences were attached to these primers at their 5' ends. A two-step PCR protocol was applied for library preparation (see e.g., Vierna et al. 2017). In the first amplification step, PCRs were carried out in triplicate in a final volume of 12.5 μL, containing 2.5 μL of template DNA (diluted 1:10), 0.5 μM of the primers, 6.25 μL of Supreme NZYTaq 2x Green Master Mix (NZYTech), 1X CES (Ralser et al. 2006), and ultrapure water up to 12.5 μL. ...
The most important reason for the lack of a German nationwide and standardised survey of soil organisms is probably the time‐consuming and expensive identification of soil invertebrates. The present study aims to help solve this problem. Earthworms and soil were sampled at 25 sites, and the animals were identified morphologically and by community DNA (comDNA) and environmental DNA (eDNA) metabarcoding. The comparison of results showed that comDNA detected more species (3.6 on average) than eDNA (3.0) and morphological identification (2.8). In contrast, eDNA, on average, detected a similar number of species as morphological identification. However, some species appear to have a different probability of being detected by eDNA than others, depending on their abundance, behaviour, biology or body size. All three identification methods can differentiate between sites with different species composition, and the degree of separation can vary depending on the identification method. The relative proportion of eDNA reads shows potential as a surrogate of relative abundance/biomass for endogeic but not for anecic species. The overall aim of the ‘MetaSOL’ project (which the present contribution originated from) was to develop recommendations for efficient and routinely implementable monitoring of soil fauna. The results showed that genetic identification methods are suitable for earthworms. Before genetic identification methods can be introduced into official practice, key preconditions such as comprehensive, well‐curated and quality‐controlled DNA reference databases and method standardisation must be addressed. Robust indices of soil health based on soil organism data need to be developed. The inclusion of further groups in addition to earthworms should be examined.
... The reaction mixture was incubated as follows: an initial denaturation step at 95℃ for 5 min, followed by 35 cycles of 95℃ for 30 s, 48℃ for 45 s, 72℃ for 45 s and a final extension step at 72℃ for 7 min. The oligonucleotide indices that are required for multiplexing different libraries in the same sequencing pool were attached in a second amplification step with identical conditions but only for 5 cycles and at 60℃ annealing temperature [31]. A negative control that contained no DNA (BPCR) was included in every PCR round to check for contamination during library preparation [11]. ...
The Middle Route of the South to North Water Diversion Project (MRP) and its water source, the Danjiangkou Reservoir (DJK), play a pivotal role in mitigating the chronic water scarcity challenges faced by northern China. Eukaryotic plankton are widespread in aquatic ecosystems, which are crucial for the water quality stability of DJK and MRP, yet comparative studies on their contemporaneous dynamics and assembly processes are scarce. In this study, amplicon sequencing was used to investigate the eukaryotic plankton communities. The results revealed that the similarity in community composition of DJK is significantly higher than that of MRP, exhibiting distance-decay patterns. Environmental heterogeneity exhibits significant differences between DJK and MRP, and it significantly influences community composition and alpha diversity. Additionally, the assembly processes of eukaryotic plankton in both DJK and MRP are predominantly influenced by stochastic processes. However, in comparison to DJK, deterministic processes have a more pronounced impact on MRP, accounting for 39.29% and 1.82%, respectively. The variations in total nitrogen (TN), chlorophyll a (Chl. a), and conductivity (Spc) have led to a transition in the assembly of eukaryotic phytoplankton communities in MRP from a stochastic process to a deterministic process. This study extends insights into the dynamics and assembly processes of eukaryotic plankton communities in the large, engineered drinking water diversion project and its water source, which is also useful for the management and regulation of the DJK and MRP. Supplementary Information: The online version contains supplementary material available at 10.1038/s41598-025-87983-9.
... PCR, library preparation and sequencing were performed by AllGenetics & Biology SL (A Coruña, Spain). The metabarcoding libraries were prepared using a two-step PCR protocol following the methodology described in Vierna et al. (2017). This involved the amplification of target DNA region in the first PCR, followed by a second PCR to attach Illumina adapters and unique indices for multiplexing. ...
Marine macroalgae surfaces create a nutrient‐rich environment that promotes the formation of epiphyte biofilms. Biofilms are complex systems that facilitate ecological interactions within the community, yet parasitism remains largely unexplored. This study describes the diversity and temporal dynamics of the microeukaryotic community in the biofilm of Mediterranean macroalgae during summer, focusing on parasitic groups. Protist diversity was assessed using metabarcoding sequencing of the V4 region of the 18S rDNA gene using primers biased against metazoans. The macroalgal biofilm exhibited dynamic shifts in the microeukaryotic community structure associated with three phases of biofilm formation. Each phase was characterised by the dominance of specific eukaryotic and parasitic groups with clear successions between them. Our study revealed a high diversity of parasitic protists from different lineages in the macroalgal biofilm. These parasites can infect a wide variety of hosts, including the basibiont, species within the biofilm (micro‐ and macrocolonizers), nearby marine hosts and terrestrial organisms. The highest diversity and abundance of parasites were found in the mature phase of the biofilm, where the complexity and stability of the system seem to favour parasitism. The parasite assemblage was dominated by Apicomplexa, with many corresponding to unknown diversity, demonstrating that biofilms are a hotspot of unknown parasitic interactions. These parasites could potentially affect the dynamics of these communities and facilitate ecological interactions between the biofilm and surrounding organisms, suggesting that parasitism plays a key but still unexplored role in shaping complex marine biofilm networks.
... The library preparation and Illumina NovaSeq sequencing were carried out by AllGenetics & Biology SL (https://www.allgenetics.eu) following a two-step polymerase chain reaction (PCR) protocol (following [58]). To enable the detection of a broad range of metazoan prey in the stomachs, a multimarker approach was applied, combining the 'Leray'-fragment of the COI with the V1-V2 region of 18S rDNA. ...
The waters of Greenland harbour a high species richness and biomass of gelatinous zooplankton (GZP); however, their role in the diet of the many fish species, including commercially exploited species, has not yet been verified. Traditionally, GZP was considered to be a trophic dead end, i.e. with a limited contribution as prey for higher trophic levels. We applied DNA metabarcoding of two gene fragments (COI, 18S V1–V2) to the stomach contents of seven pelagic and demersal fish species in Greenland waters, to identify their prey composition as well as the occurrence of GZP predation. We detected GZP DNA reads in the stomachs of all investigated fish species, with frequencies of occurrence ranging from 12.5% (for Melanogrammus aeglefinus) to 50% (for Argentina silus). GZP predation had not yet been reported for several of these species. GZP were found to contribute substantially to the diet of A. silus and Anarhichas denticulatus; in particular, the siphonophore Nanomia cara and the scyphozoan Atolla, respectively, were of high importance as prey. The use of multiple genetic markers enabled us to detect a total of 59 GZP taxa in the fish stomachs with several GZP species being detected only by one of the markers.
... The reaction mixture was incubated as follows: an initial denaturation step at 95 °C for 5 min, followed by 35 cycles of 95 °C for 30 s, 48 °C for 45 s, 72 °C for 45 s and a final extension step at 72 °C for 7 min. The oligonucleotide indices that are required for multiplexing different libraries in the same sequencing pool were attached in a second amplification step with identical conditions but only for 5 cycles and at 60 °C annealing temperature [47]. A negative control that contained no DNA (BPCR) was included in every PCR round to check for contamination during library preparation. ...
Plankton studies serve as a basis for marine ecosystem research, but knowledge of marine plankton is still incomplete due to its extreme taxonomic and functional complexity. The application of metabarcoding is very valuable for the characterisation of the plankton community. The plankton community of the Southern Adriatic is subject to strong environmental fluctuations and changes, which underlines the need for frequent, reliable and comprehensive characterisation of the plankton. The aim of this study was to determine the taxonomic composition and seasonal distribution of eukaryotic plankton in the Southern Adriatic. Plankton samples were collected monthly for one year at the coastal station of the Southern Adriatic and metabarcoding was used for taxonomic identification. The results showed a high taxonomic diversity and dynamic seasonal distribution patterns for both the protist and metazoan plankton communities. Metabarcoding revealed both the core, year-round plankton community and previously unrecorded plankton organisms in the Southern Adriatic. The results provide for the first time a comprehensive overview of the plankton community in this area by metabarcoding. The identified seasonal patterns of plankton genera and species in the Southern Adriatic will contribute to the understanding of plankton interactions and future changes in community diversity characterisation.
... This round utilized identical conditions, except with only five cycles and an annealing temperature of 60°C. For a visual representation of the library preparation process, refer to Figure 1 in Vierna et al. (2017). To ensure accuracy and reliability, negative controls were included during library preparation and DNA extraction. ...
With a reduction in available chemical treatments, there is an increased interest in biological control of grapevine trunk diseases. Few studies have investigated the impact of introducing beneficial microorganisms into the rhizosphere on the indigenous soil microbiome. In this study, we explored the effect of two biological control agents, Trichoderma atroviride SC1 (commercial product Vintec® from Certis Belchim, Ta SC1) and Bacillus subtilis PTA-271 (Bs PTA-271), on the grapevine rhizosphere bacterial and fungal microbiome, and on plant defense expression, using High-Throughput Amplicon Sequencing and qPCR, respectively. Additionally, we quantified both Ta SC1 and Bs PTA-271 in the rhizosphere over time using digital droplet PCR. The fungal microbiome was more affected by factors such as soil type, BCA treatment, and sampling time than the bacterial microbiome. Specifically, Ta SC1 application produced negative impacts on fungal diversity, while applications of BCAs did not affect bacterial diversity. Interestingly, the survival and establishment of both BCAs showed opposite trends depending on the soil type, indicating that the physicochemical properties of soils play a role in BCA establishment. Fungal co-occurrence networks were less complex than bacterial networks, but highly impacted by Ta SC1 application. Soils treated with Ta SC1 presented more complex and stable co-occurrence networks, with a higher number of positive correlations. Induced grapevine defenses also differed according to the soil, being more affected by BCA inoculation on sandy soil.
The continual investigation of novel genetic markers has yielded promising solutions for addressing the challenges encountered in forensic DNA analysis. In this study, we have introduced a custom‐designed panel capable of simultaneously amplifying 41 novel Multi‐insertion/deletion (Multi‐InDel) markers and an amelogenin locus using the capillary electrophoresis platform. Through a developmental validation study conducted in accordance with guidelines recommended by the Scientific Working Group on DNA Analysis Methods, we demonstrated that the new Multi‐InDel system exhibited the sensitivity to produce reliable genotyping profiles with as little as 62.5 pg of template DNA. Accurate and complete genotyping profiles could be obtained even in the presence of specific concentrations of PCR inhibitors. Furthermore, the maximum amplicon size for this system was limited to under 220 bp in the genotyping profile, resulting in its superior efficiency compared to commercially available short tandem repeat kits for both naturally and artificially degraded samples. In the context of mixed DNA analysis, the Multi‐InDel system proved informative in the identification of two‐person DNA mixtures, even when the template DNA of the minor contributor was as low as 50 pg. In conclusion, a series of performance evaluation studies have provided compelling evidence that the new Multi‐InDel system holds promise as a valuable tool for forensic DNA analysis.
http://biorxiv.org/content/early/2016/09/04/073304 Until now, the potential of NGS has been seldom realised for the construction of barcode reference libraries. Using a two-step PCR approach and MiSeq sequencing, we tested a cost-effective method and developed a custom workflow to simultaneously sequence multiple markers (COI, Cytb and EF, altogether 2 kb) from hundreds of specimens. Interestingly, primers and PCR conditions used for Sanger sequencing did not require optimisation to construct the MiSeq library. After completion of quality controls, 87% of the species and 76% of the specimens had valid sequences for the three markers. Nine specimens (3%) exhibited two divergent (up to 10%) sequence clusters. In 95% of the species, MiSeq and Sanger sequences obtained from the same samplings were similar. For the remaining 5%, species were paraphyletic or the sequences clustered into two divergent groups (>7%) on the final trees (Sanger + MiSeq). These problematic cases are difficult to explain but may represent coding NUMTS or heteroplasms. These results highlight the importance of performing quality control steps, working with expert taxonomists and using more than one marker for DNA-taxonomy or species diversity assessment. The power and simplicity of this method appear promising to build on existing experience, tools and resources while taking advantage of NGS.
Although much biological research depends upon species diagnoses, taxonomic expertise is collapsing. We are convinced that the sole prospect for a sustainable identification capability lies in the construction of systems that employ DNA sequences as taxon 'barcodes'. We establish that the mitochondrial gene cytochrome c oxidase I (COI) can serve as the core of a global bioidentification system for animals. First, we demonstrate that COI profiles, derived from the low-density sampling of higher taxonomic categories, ordinarily assign newly analysed taxa to the appropriate phylum or order. Second, we demonstrate that species-level assignments can be obtained by creating comprehensive COI profiles. A model COI profile, based upon the analysis of a single individual from each of 200 closely allied species of lepidopterans, was 100% successful in correctly identifying subsequent specimens. When fully developed, a COI identification system will provide a reliable, cost-effective and accessible solution to the current problem of species identification. Its assembly will also generate important new insights into the diversification of life and the rules of molecular evolution.
Understanding the ecological function of species and the structure of communities is crucial in the study of ecological interactions among species. For this purpose, not only the occurrence of particular species but also their abundance in ecological communities is required. However, abundance quantification of species through morphological characters is often difficult or time/money consuming when dealing with elusive or small taxa. Here we tested the use of next-generation sequencing (NGS) for abundance estimation of two species of feather mites (Proctophyllodes stylifer and Pteronyssoides parinus) under five proportions (16:1, 16:4, 16:16, 16:64, and 16:256 mites) against a mock community composed by Proctophyllodes clavatus and Proctophyllodes sylviae. In all mixtures, we retrieved sequence reads from all species. We found a strong linear relationship between 454 reads and the real proportion of individuals in the mixture for both focal species. The slope for Pr. stylifer was close to one (0.904), and the intercept close to zero (-0.007), thus showing an almost perfect correspondence between real and estimated proportions. The slope for Pt. parinus was 0.351 and the intercept 0.307, showing that while the estimated proportion increased linearly relative to real proportions of individuals in the samples, proportions were overestimated at low real proportions and underestimated at larger ones. Additionally, pyrosequencing replicates from each DNA extraction were highly repeatable (R = 0.920 and 0.972, respectively), showing that the quantification method is highly consistent given a DNA extract. Our study suggests that NGS is a promising tool for abundance estimation of feather mites' communities in birds.
Invertebrate biodiversity measured at mostly family level is widely used in biological monitoring programs to assess anthropogenic impacts on ecosystems. However, next generation sequencing (NGS) could allow development of new more sensitive biomonitoring tools by allowing rapid species identification. This could be accelerated if archived invertebrate collections and environmental information from past programs are used to understand species distributions and their environmental responses. In this study, we take archived macroinvertebrate samples from two sites collected on multiple occasions and test if NGS can successfully detect species. Samples had been stored in 70% ethanol at room temperature for up to 12 years. Three amplicons ranging from 197-274 bps within the DNA barcode region were amplified from samples and compared to DNA barcoding libraries to identify species. We were able to amplify partial DNA barcodes from most samples, and species were often detected with multiple amplicons. However, some singletons and taxa poorly covered by DNA barcoding were missed. This suggests additional DNA barcodes will be required to fill 'gaps' in current DNA barcode libraries for aquatic macroinvertebrates and/or that it may not be possible to detect all taxa in a sample. Furthermore, older samples often detected fewer taxa and were less reliable for amplification, suggesting NGS is best used on samples within eight years of collection. Nevertheless, many common taxa with existing DNA barcodes were reliably identified with NGS and were often present at sites across multiple years, showing the potential of NGS for detecting common and abundant species in archived material.
Type specimens have high scientific importance because they provide the only certain connection between the application of a Linnean name and a physical specimen. Many other individuals may have been identified as a particular species, but their linkage to the taxon concept is inferential. Because type specimens are often more than a century old and have experienced conditions unfavorable for DNA preservation, success in sequence recovery has been uncertain. The present study addresses this challenge by employing next generation sequencing (NGS) to recover sequences for the barcode region of the cytochrome c oxidase 1 gene from small amounts of template DNA. DNA quality was first screened in more than 1800 century-old type specimens of Lepidoptera by attempting to recover 164bp and 94bp reads via Sanger sequencing. This analysis permitted the assignment of each specimen to one of three DNA quality categories - high (164bp sequence), medium (94bp sequence), or low (no sequence). Ten specimens from each category were subsequently analyzed via a PCR-based NGS protocol requiring very little template DNA. It recovered sequence information from all specimens with average read lengths ranging from 458bp to 610bp for the three DNA categories. By sequencing ten specimens in each NGS run, costs were similar to Sanger analysis. Future increases in the number of specimens processed in each run promise substantial reductions in cost, making it possible to anticipate a future where barcode sequences are available from most type specimens.
The network theoretical framework of ecological community studies is expected to promote not only the basic understanding of ecological and coevolutionary dynamics but also the application of those scientific insights into ecosystem management. However, our knowledge of ecological network architecture in the wild largely stems from empirical studies on macro-organismal systems such as those of plant-pollinator, plant-seed disperser, and prey-predator interactions. In this sense, we have remained ignorant of the diversity of ecological network architecture, its underlying assembly processes, and its consequences on ecological and coevolutionary dynamics. In this paper, I discuss how the high-throughput DNA barcoding of microbes, especially that based on next-generation sequencing, potentially expands the target of ecological network studies. I review the methodological platforms of next-generation sequencing-based analyses of microbe-host animal/plant networks and then introduce some case studies on the networks of plants and their hyper-diverse fungal symbionts. As those preliminary studies are uncovering the unexpected diversity of ecological network architecture, further application of such next-generation sequencing-based analyses to a diverse array of microbial systems will significantly improve our views on community ecological and coevolutionary processes.
Protists, the most diverse eukaryotes, are largely considered to be free-living bacterivores, but vast numbers of taxa are known to parasitize plants or animals. High-throughput sequencing (HTS) approaches now commonly replace cultivation-based approaches in studying soil protists, but insights into common biases associated with this method are limited to aquatic taxa and samples. We created a mock community of common free-living soil protists (amoebae, flagellates, ciliates), extracted DNA and amplified it in the presence of metazoan DNA using 454 HTS. We aimed to evaluate whether HTS quantitatively reveals true relative abundances of soil protists and to investigate whether the expected protist community structure is altered by the co-amplification of metazoan-associated protist taxa. Indeed, HTS revealed fundamentally different protist communities from those expected. Ciliate sequences were highly overrepresented, while those of most amoebae and flagellates were underrepresented or totally absent. These results underpin the biases introduced by HTS that prevent reliable quantitative estimations of free-living protist communities. Furthermore, we detected a wide range of non-added protist taxa likely introduced along with metazoan DNA, which altered the protist community structure. Among those, 20 taxa most closely resembled parasitic, often pathogenic taxa. Therewith, we provide the first HTS data in support of classical observational studies that showed that potential protist parasites are hosted by soil metazoa. Taken together, profound differences in amplification success between protist taxa and an inevitable co-extraction of protist taxa parasitizing soil metazoa obscure the true diversity of free-living soil protist communities.
DNA metabarcoding offers new perspectives in biodiversity research. This recently developed approach to ecosystem study relies heavily on the use of next-generation sequencing (NGS), and thus calls upon the ability to deal with huge sequence datasets. The OBITools package satisfies this requirement thanks to a set of programs specifically designed for analyzing NGS data in a DNA metabarcoding context. Their capacity to filter and edit sequences while taking into account taxonomic annotation helps to set up tailor-made analysis pipelines for a broad range of DNA metabarcoding applications, including biodiversity surveys or diet analyses. The OBITools package is distributed as open source software available on the following website: http://metabarcoding.org/obitools. A Galaxy wrapper is available on the GeneOuest Core facility Toolshed: http://toolshed.genouest.org.