Quaternary 2021, 4, 6. https://doi.org/10.3390/quat4010006 www.mdpi.com/journal/quaternary
Review
Lake Sedimentary DNA Research on Past Terrestrial and
Aquatic Biodiversity: Overview and Recommendations
Eric Capo 1,*, Charline Giguet-Covex 2,*,†, Alexandra Rouillard 3,4,*,†, Kevin Nota 5,*,†,
Peter D. Heintzman 6,*,†, Aurèle Vuillemin 7,8,*,†, Daniel Ariztegui 9, Fabien Arnaud 2,
Simon Belle 10, Stefan Bertilsson 10, Christian Bigler 1, Richard Bindler 1, Antony G. Brown 6,11,
Charlotte L. Clarke 11, Sarah E. Crump 12, Didier Debroas 13, Göran Englund 1,
Gentile Francesco Ficetola 14,15, Rebecca E. Garner 16,17, Joanna Gauthier 17,18,
Irene Gregory-Eaves 17,18, Liv Heinecke 19,20, Ulrike Herzschuh 19,21, Anan Ibrahim 22,
Veljo Kisand 23, Kurt H. Kjær 4, Youri Lammers 6, Joanne Littlefair 24, Erwan Messager 2,
Marie-Eve Monchamp 17,18, Fredrik Olajos 1, William Orsi 7,8, Mikkel W. Pedersen 4,
Dilli P. Rijal 6, Johan Rydberg 1, Trisha Spanbauer 25, Kathleen R. Stoof-Leichsenring 19,
Pierre Taberlet 6,15, Liisi Talas 23, Camille Thomas 9, David A. Walsh 16, Yucheng Wang 4,26,
Eske Willerslev 4, Anne van Woerkom 1, Heike H. Zimmermann 19, Marco J. L. Coolen 27,*,‡,
Laura S. Epp 28,*,‡, Isabelle Domaizon 29,30,*,‡, Inger G. Alsos 6,‡ and Laura Parducci 5,31,*,‡
1 Department of Ecology and Environmental Science, Umeå University, Umeå 90736, Sweden;
christian.bigler@umu.se (C.B.); richard.bindler@umu.se (R.B.); goran.englund@umu.se (G.E.);
fredrik.olajos@umu.se (F.O.); johan.rydberg@umu.se (J.R.); annevwoerkom@gmail.com (A.v.W.)
2 Department Environment, Dynamics and Territories of the Mountains (EDYTEM), UMR 5204 CNRS,
University Savoie Mont Blanc, 73370 Le Bourget du Lac, France; fabien.arnaud@univ-smb.fr (F.A.);
erwan.messager@univ-smb.fr (E.M.)
3 Department of Geosciences, UiT the Arctic University of Norway, 9019 Tromsø, Norway
4 Section for Geogenetics, GLOBE Institute, University of Copenhagen, 1350 Copenhagen, Denmark;
kurtk@snm.ku.dk (K.H.K.); mwpedersen@sund.ku.dk (M.W.P.); yw502@cam.ac.uk (Y.W.);
ewillerslev@sund.ku.dk (E.W.)
5 Department of Ecology and Genetics, the Evolutionary Biology Centre, Uppsala University, 752 36 Uppsala,
Sweden
6 The Arctic University Museum of Norway, UiT the Arctic University of Norway, 9010 Tromsø, Norway;
tony.brown@soton.ac.uk (A.G.B.); youri.lammers@uit.no (Y.L.); dilli.p.rijal@uit.no (D.P.R.);
pierre.taberlet@univ-grenoble-alpes.fr (P.T.); inger.g.alsos@uit.no (I.G.A.)
7 Department of Earth & Environmental Sciences, Ludwig-Maximilians-Universität München, 80331 Munich,
Germany; w.orsi@lrz.uni-muenchen.de
8 GeoBio-Center LMU, Ludwig-Maximilians-Universität München, 80331 Munich, Germany
9 Department of Earth Sciences, University of Geneva, 1205 Geneva, Switzerland;
daniel.ariztegui@unige.ch (D.A.); camille.thomas@unige.ch (C.T.)
10 Department of Aquatic Sciences and Assessment, Swedish University of Agricultural Sciences, 75007
Uppsala, Sweden; simon.belle@slu.se (S.B.); stefan.bertilsson@slu.se (S.B.)
11 School of Geography and Environmental Science, University of Southampton, Southampton SO17 1BJ, UK;
C.Clarke@soton.ac.uk
12 Institute of Arctic and Alpine Research, University of Colorado Boulder, Boulder, CO 80309, USA;
sarah.crump@colorado.edu
13 LMGE, UMR CNRS 6023, University Clermont Auvergne, 63000 Clermont-Ferrand, France;
didier.debroas@uca.fr
14 Department of Environmental Science and Policy, University of Milan, 20122 Milan, Italy;
francesco.ficetola@unimi.it
15 University Grenoble Alpes, CNRS, Université Savoie Mont Blanc, LECA, 38610 Grenoble, France
16 Department of Biology, Concordia University, Montréal, QC H3G 1M8, Canada;
rebecca.garner@mail.concordia.ca (R.E.G.); david.walsh@concordia.ca (D.A.W.)
17 Groupe de recherche interuniversitaire en limnologie, Montréal, QC H3C 3J7, Canada;
gauthier.joanna@gmail.com (J.G.); irene.gregory-eaves@mcgill.ca (I.G.-E.);
me.monchamp@gmail.com (M.-E.M.)
18 Department of Biology, University McGill, Montréal, QC H3A 0G4, Canada
19 Alfred Wegener Institute Helmholtz Centre for Polar and Marine Research, 27570 Potsdam, Germany;
liv.heinecke@uni-potsdam.de (L.H.); Ulrike.Herzschuh@awi.de (U.H.);
kathleen.stoof-leichsenring@awi.de (K.R.S.-L.); heike.zimmermann@awi.de (H.H.Z.)
20 Institute for Mathematics, University of Potsdam, 14469 Potsdam, Germany
21 Institute for Environmental Sciences and Geography, University of Potsdam, 14469 Potsdam, Germany
22 Department of Biology, University of Konstanz, 78464 Konstanz, Germany; anan.ibrahim@uni-konstanz.de
Citation: Capo, E.; Giguet-Covex, C.; Rouillard, A.; Nota, K.; Heintzman, P.D.; Vuillemin, A.;
Ariztegui, D.; Arnaud, F.; Belle, S.; Bertilsson, S.; et al. Lake Sedimentary DNA Research
on Past Terrestrial and Aquatic Biodiversity: Overview and Recommendations. Quaternary 2021, 4, 6.
https://doi.org/10.3390/quat4010006
Academic Editor: Matthew Peros
Received: 27 November 2020
Accepted: 29 January 2021
Published: 13 February 2021
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and
institutional affiliations.
Copyright: © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution (CC BY)
license (http://creativecommons.org/licenses/by/4.0/).
23 Institute of Technology, University of Tartu, 50090 Tartu, Estonia; veljo.kisand@ut.ee (V.K.);
liisi.talas@gmail.com (L.T.)
24 School of Biological and Chemical Sciences, Queen Mary University of London, London E1 4NS, UK;
j.e.littlefair@qmul.ac.uk
25 Department of Environmental Sciences and Lake Erie Center, University of Toledo, Toledo, OH 43606, USA;
trisha.spanbauer@utoledo.edu
26 Department of Zoology, University of Cambridge, Cambridge CB2 1TN, UK
27 Western Australia Organic and Isotope Geochemistry Centre, School of Earth and Planetary Sciences, the
Institute for Geoscience Research (TIGeR), Curtin University, Bentley 6102, Australia
28 Limnological Institute, Department of Biology, University of Konstanz, 78464 Konstanz, Germany
29 INRAE, University Savoie Mont Blanc, CARRTEL, 74200 Thonon les bains, France
30 UMR CARRTEL, Pôle R&D ECLA, 74200 Thonon les bains, France
31 Department of Environmental Biology, Sapienza University of Rome, 00185 Rome, Italy
* Correspondence: eric.capo@hotmail.fr (E.C.); charline.giguet-covex@univ-smb.fr (C.G.-C.);
alexandra.rouillard@sund.ku.dk (A.R.); kevin.nota@ebc.uu.se (K.N.); peter.d.heintzman@uit.no (P.D.H.);
a.vuillemin@lrz.uni-muenchen.de (A.V.); Marco.Coolen@curtin.edu.au (M.J.L.C.);
laura.epp@uni-konstanz.de (L.S.E.); isabelle.domaizon@inrae.fr (I.D.); laura.parducci@uniroma1.it (L.P.)
† Joint second authors.
‡ Joint last authors.
Abstract: The use of lake sedimentary DNA to track the long-term changes in both terrestrial and
aquatic biota is a rapidly advancing field in paleoecological research. Although the approach is now
widely applied, knowledge gaps remain, and further research is needed to
ensure the reliability of the sedimentary DNA signal. Building on the most recent literature and
seven original case studies, we synthesize the state-of-the-art analytical procedures for effective
sampling, extraction, amplification, quantification and/or generation of DNA inventories from
sedimentary ancient DNA (sedaDNA) via high-throughput sequencing technologies. We provide
recommendations based on current knowledge and best practices.
Keywords: sedimentary ancient DNA; sedimentary DNA; lake sediments; paleolimnology;
paleoecology; paleogenetics; paleogenomics; metabarcoding; metagenomics; biodiversity
1. Tracking Past Ecological Changes from Lakes and Catchments with Sedimentary DNA
1.1. Sedimentary DNA, a Powerful Proxy to Track Past Biodiversity Changes
Lake sediments consist of both autochthonous (in-lake) and allochthonous (from the
catchment and beyond) organic and inorganic matter. Paleoecological inquiries using
biological archives stored in lake sediments have been largely dominated by microscopic
analyses of the relatively limited number of aquatic and terrestrial groups that leave
well-preserved and readily identifiable morphological remains in the sediment (e.g., silicified
diatoms, calcified nannofossils, organic-walled or calcified dinoflagellates, chrysophyte
cysts, Cladocera remains, chironomid head capsules, fungal spores, pollen and plant
macrofossils) [1–9], keeping the remaining biological diversity out of reach. Where these
morphological remains are not well preserved or cannot be taxonomically identified to
the species level, alternative proxies have been sought to build a more detailed and
comprehensive understanding of past biological diversity in a broader range of environments
[10,11]. In particular, the use of biomarkers, such as pigments and lipids, has emerged
as a reliable alternative for the study of past ecosystem functioning [12]. However, their
taxonomic specificity remains limited.
A more recent alternative that is now gaining in popularity is to target nucleic acids
(DNA) preserved in the sediment archive. The sequencing of targeted genetic regions or
of the metagenome (all DNA present) in sedimentary DNA extracts can be used to
reconstruct changes in biodiversity (i.e., the distribution of species based on genetic
diversity) over a range of temporal and spatio-ecological scales [13–16]. Depending on its
state of degradation (affected by age, temperature and other environmental variables
[17]), DNA recovered from sediments can be regarded as sedimentary DNA (sedDNA; younger and
better preserved) or as sedimentary ancient DNA (sedaDNA; often older and more poorly preserved).
For simplicity, we define sedaDNA as the fraction of environmental DNA buried in
subsurface sediment that originates from organisms that are no longer physiologically active. To
the best of our knowledge, the first study to report the presence of ancient DNA in lake
sediments was performed by Coolen and Overmann [18], who demonstrated that
Holocene changes in water stratification and euxinia (i.e., a water column that is both anoxic
and sulfidic) can be inferred from downcore changes in sulfur-reducing bacteria, based
on analyses of sedimentary 16S rRNA amplicons. Since then, the number of publications
in the emerging field of sedaDNA has increased dramatically, particularly in the last five
years (Figure 1, Table S1).
Figure 1. Bar chart showing the number of publications per year from 2000 to 2020 containing at least one of the following
terms in the title, the keywords, or the abstract: sedaDNA; sedimentary ancient DNA; sedDNA; sedimentary DNA. The
document search was conducted through Scopus on 25 November 2020 and was restricted to journal articles and review
articles. The results are split into two categories for visualization: publications that contained the search terms sedaDNA
or sedimentary ancient DNA (dark grey bars); and publications that had the search terms sedDNA or sedimentary DNA,
but that did not contain the first two terms (pale grey bars).
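The two-category split described in the Figure 1 caption can be sketched in a few lines of Python. This is only an illustration of the counting logic, not the authors' actual workflow: the function names, the `(year, text)` record layout, and the assumption that title, keywords and abstract have been concatenated into one string are all hypothetical.

```python
from collections import Counter

# Search terms as listed in the Figure 1 caption, lower-cased for matching.
ANCIENT_TERMS = ("sedadna", "sedimentary ancient dna")
GENERAL_TERMS = ("seddna", "sedimentary dna")


def classify(text):
    """Assign a record to 'ancient', 'general', or None (no search term found).

    `text` is assumed to hold the title, keywords and abstract of one record.
    A record mentioning any 'ancient' term goes to the first category; the
    second category is reserved for records that only match the general terms.
    """
    lowered = text.lower()
    if any(term in lowered for term in ANCIENT_TERMS):
        return "ancient"
    if any(term in lowered for term in GENERAL_TERMS):
        return "general"
    return None


def count_by_year(records):
    """Count matching records per (year, category) pair.

    `records` is an iterable of (year, text) tuples (hypothetical layout).
    """
    counts = Counter()
    for year, text in records:
        category = classify(text)
        if category is not None:
            counts[(year, category)] += 1
    return counts
```

Checking the "ancient" terms first mirrors the caption, where a publication containing both kinds of term is counted only in the dark grey category.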
In recent years, sedaDNA has shown great potential for reconstructing past biological
assemblages within lakes and their terrestrial catchments (for review see [13–15,19–22]).
Following extraction of sedaDNA, several analytical methods have been used (Table
S1), including: (i) methods aiming to detect or quantify the occurrence of specific target
organisms on the basis of endpoint PCR assays, quantitative real-time PCR (qPCR) and special
applications such as droplet digital PCR (ddPCR); (ii) DNA metabarcoding-based methods
combining PCR amplification of marker loci (often described as “barcodes”) with
high-throughput sequencing; (iii) metagenomic approaches based on untargeted shotgun
sequencing of the total pool of DNA recovered from the sediment; and (iv) hybridization-based
target enrichment methods to recover DNA fragments of interest from the sediment
metagenome (see Section 3.8).
Because sedaDNA captures biological changes in both the aquatic and terrestrial
ecosystems, past genetic signals have the potential to provide insights about landscape
development (including terrestrial floral and faunal changes and anthropogenic impacts)
and the lake ecology itself (Table S1). Using sedaDNA to reconstruct changes in local
vegetation has become a particularly well-established method as a complement to traditional
microscopic analyses of lake sediment (e.g., pollen) e.g., [23–30]. It also offers the
opportunity to study the response of organisms to disturbances of natural and/or
anthropogenic origin [31–35], the interactions of species at different trophic levels [36–38] and
between native and introduced species [35,39], or species recovery and ecosystem
restoration [40]. Lake sedaDNA is not only of interest to paleoecologists but also to
archaeologists, because the recovered data can provide insights into human history and interactions
with the environment such as agriculture and urbanisation e.g., [41,42]. Lake sedaDNA
can also be used by geomorphologists to trace sediment sources or by evolutionary
biologists to study population changes through time.
In the present work, we aim to provide an overview of the most recent advances in
sedaDNA research on lakes. We also identify the major methodological knowledge gaps
that remain and present original data (in the form of seven case studies) that tackle these
challenges. Finally, we recommend a series of current best laboratory practices to
successfully and robustly reconstruct past environmental change from lake sediment DNA
archives, while acknowledging that this is a rapidly advancing field that requires continuous
updating.
1.2. sedaDNA to Study Past Vegetation Changes from Lake Catchment
DNA-based studies of past vegetation have mainly focused on arctic, boreal, and
alpine regions because of their high sensitivity to climate change, providing new insights
on past vegetation dynamics and species distributions e.g., [16,26,27,43–49]. In
high-latitude regions, sedaDNA has, for example, contributed to increased knowledge of the
occurrence of insect-pollinated plants, which are typically underestimated in pollen
analyses [24,27,50]. Some recent examples of studies are described below.
An exceptionally rich sedaDNA record that covers 24,000 years of vegetation
dynamics in the Polar Urals provides empirical evidence that arctic-alpine species survived early
Holocene forest expansion in this heterogeneous landscape [27,47]. A study of 10 lakes in
northern Fennoscandia shows that there was a continuous increase in the regional species
pool from the onset of the Holocene until the last two millennia, suggesting a severe time
lag in colonization [16]. In the European Alps, sedaDNA studies have provided new
knowledge about the history of agricultural activities; for instance, fruit trees were
detected by sedaDNA but were not detected by microscopic pollen analyses [51]. Other
sedaDNA studies have focused on the effects of soil evolution, climate change and pastoral
activities on plant communities e.g., [32]. Based on the current body of literature, we know
that sedaDNA can also trace vegetation changes over millennia in environments other than cold
high-latitude regions, including tropical regions, from high-altitude sites with lower
temperatures to warmer lowland sites [52–54]. At the same time, some of the above-cited
works highlighted confounding methodological issues related to accelerated degradation
of sedaDNA with temperature, which may result in less informative historical DNA
signals in lake sediments from moderate to hot temperature environments e.g., [53].
1.3. sedaDNA to Detect Human and Animal Presence in the Lake Catchment
Human activities have been inferred from microscopic analyses of pollen from
cultivated plants, native plants that are favored by livestock grazing or other disturbances and
coprophilous fungal spores associated with livestock [55]. The first studies that aimed to
track the presence of humans and domesticated animals in a lake catchment with
sedimentary DNA-based methods used bacterial DNA indicative of past human and animal
faecal waste [56,57]. In contrast to pollen, which normally gives both a local and regional
signal in a lake, bacterial DNA provides only a local and catchment-specific signal,
although it does not allow identification of which livestock species were present. The
metabarcoding approach applied to a high-altitude lake sediment record in the Alps
revealed the first history of livestock compositional dynamics, including Bos (cow), Ovis
(sheep), and Equus (horse/donkey) [31]. Other studies of lakes in the northwestern Alps
have similarly provided new detailed insights into the temporal patterns of pastoral
activities and changes in livestock species composition, especially during the Medieval
Period [58–61]. Even in the very remote site of Kerguelen Island, an ongoing rabbit invasion
could be successfully traced using sedaDNA, elucidating the impact that the invasion had
on the plant community and landscape erosion [35]. Other studies have used shotgun
metagenomics to detect the presence of mammalian megafauna and other animals formerly
present in lake catchments. Such approaches have contributed to constraining the timing
of an island woolly mammoth population extinction in Alaska during the Holocene [33]
and understanding when newly deglaciated landscapes became biologically viable in
North America at the end of the last ice age [62]. However, due to methodological
limitations and/or lack of animal sedaDNA preservation in lakes, recent studies that set out to target
mammalian DNA from lake sediments have not been particularly successful [63] and
instead resulted in by-catch of different animal groups [64] (see Section 1.5).
1.4. sedaDNA to Unravel Past Diversity and Composition of Lake Biota
The use of long-term sedaDNA now offers the possibility to study the impact of
climatic, environmental, and anthropogenic perturbations on aquatic biota e.g., [11,34,65].
This approach provides information about Holocene and Late Pleistocene biota, including
taxa that leave no morphologically identifiable remains preserved in lakes (see [13] for
review). Thus far, the most commonly investigated aquatic organisms in sedaDNA studies
have been photosynthetic bacteria and phytoplankton [18,34,66–81]. Although
macrophytes and some algal groups are typically well detected in sedaDNA plant surveys
[24–26,81,82], there are few papers reporting the presence of zooplanktonic DNA [83,84] and
fish DNA from lake sediments [85–87].
Most of these studies describe the long-term responses of communities to
environmental perturbations (see [13]), while others investigate ecological interactions through
time, such as the co-existence of parasitic groups and their phytoplanktonic hosts [36,37] or
the role of double-stranded DNA viruses in terminating past algal blooms [88]. One study
succeeded in tracing within-population genetic variation in an algal population during
the Last Glacial Maximum [81]. The presence of key functional sets of genes from aquatic
organisms can also be traced using the sedaDNA approach, as in the case of mcy genes
responsible for producing microcystin by specific cyanobacteria [89,90], amoA genes
associated with ammonia oxidation [91] and merA genes involved in mercury detoxification
[92,93]. Finally, several studies have shown that it is possible to study the past activity of
methanotrophs in lake systems [66,94–97].
1.5. Influence of Taphonomic Processes on the Burial and Persistence of sedaDNA
The studies presented above show the utility of sedaDNA methods to unravel
biodiversity changes in lakes and their catchments, but several concerns remain about the
interpretation of such data because of our limited knowledge of the taphonomic
processes occurring in lakes (i.e., origin, transport, and preservation of genetic material under
prevailing environmental conditions). While issues related to taphonomy in lakes have
been reviewed elsewhere [21,39], we provide here a brief update.
From source environment to sediment: For aquatic organisms living in the lake,
several factors can influence the processes of incorporation and burial of DNA in the
sediments, e.g., organism abundance, its spatial distribution, the ability to form cysts, and
edibility of the focal taxon [98,99]. For terrestrial organisms living in the lake catchment, it
is also important to consider the transport of material and DNA from the catchment
to the lake, which likely depends on multiple factors including soil erosion and
hydrological connectivity [14,30,39,64,82].
Degradation and adsorption of DNA molecules at the water–sediment interface and
in sediments: Environmental conditions in the water, at the water–sediment interface and
within the sediment column (such as temperature, redox state, conductivity and pH)
influence the rates and extent of both abiotic and biotic DNA degradation e.g., [66,67,100].
The adsorption of DNA to mineral particles was recently reported to be a strong factor
controlling the persistence of DNA molecules in sedimentary archives [101,102].
Desorption of adsorbed DNA from the particulate phase in sediments can also occur and is
dependent on the mineralogic composition, pore-water pH or the valence and
concentrations of cations in the sediment [103]. Changes in environmental conditions at the
water–sediment interface are thus likely to be of crucial importance for the long-term
preservation of DNA molecules during burial into sediments. Meanwhile, sedaDNA has also been
successfully recovered and analyzed from aquatic systems that did not provide ideal
preservation conditions, including sediments from warm, tropical lakes [52,83,104,105]
and oxygenated deep-sea sediments [11,106]. However, the state of such preservation was
found to be poor with the exception of the upper sediments corresponding to the last ~200
years, as found by Bremond et al. [53] in tropical Lake Sele (Africa). This may be due to
accelerated DNA degradation rates related to higher temperatures (i.e., average annual
air temperature of 28 °C) and the associated higher bacterial
activity at their study site. As such, there is still much to be learned regarding the
conditions that promote and compromise sedaDNA preservation.
Early diagenesis of DNA molecules during burial in sediments: Experimental
evidence from lake sediments suggests that there is only a limited effect of early diagenesis
on the DNA signal of microbial eukaryotic communities [107]. However, it is possible that
DNA is degraded or damaged during early diagenesis, e.g., by microbes using
extracellular DNA as energy sources [108] or by environmentally-induced strand breakage [109]. In
contrast, DNA from resting stages, such as cyanobacterial akinetes,
Cladocera resting eggs (ephippia) or protist resting propagules, is likely better protected
than extracellular DNA or intracellular DNA from organisms with more fragile outer
membranes (see Section 2.3). Similarly, recalcitrant structural elements such as lignin in
terrestrial plants are likely to protect cellular DNA against microbial degradation more
efficiently than in most aquatic primary producers (algae and macrophytes).
Long-term persistence of DNA molecules in sediments: Overall, the processes
involved in the preservation of DNA on long time scales in sediments are not yet fully
understood, but DNA preservation still seems to be affected mainly by environmental
conditions (temperature and anoxia) during incorporation at the bottom of the lake.
Nevertheless, sedaDNA molecules have been reported from more than 270,000-year-old
sediments of Lake Van in Turkey [72], and from marine sediments nearly a million years old
[110]. Such reports, however, are very close to the theoretical limit of ancient DNA
preservation [111] and should therefore be treated with caution until they are repeated using
methods that can properly authenticate ancient DNA from these substrates [112]. Today
we know that strand breakage, miscoding lesions and crosslinks may heavily compromise the
analysis of sedaDNA molecules [109,113–115]. Thus, the quality of DNA molecules needs
to be carefully inspected to avoid confounding temporal biodiversity change with changes
in DNA quality over time, particularly when inferring long-term patterns of species
persistence and richness e.g., [16,26,39,47,116]. Authenticity of sedaDNA older than 10,000
years and/or from poorer preservation settings can be demonstrated by identifying
characteristic DNA damage patterns in metagenomic data and supported by corroborating
proxy data whenever possible [33].
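One widely used damage signature in ancient DNA is an excess of apparent C-to-T substitutions near the 5' ends of sequenced fragments, caused by cytosine deamination. The sketch below illustrates how such a per-position frequency could be tallied. It is a simplified illustration under stated assumptions, not the procedure of any study cited here: the function name and the input layout (ungapped, equal-length reference/read string pairs, with the read in 5'-to-3' orientation) are hypothetical, and dedicated authentication tools handle gaps, strand, base quality and statistical modelling that are ignored here.

```python
def c_to_t_frequency(alignments, n_positions=5):
    """Per-position C-to-T mismatch frequency at the 5' end of reads.

    `alignments` is an iterable of (reference, read) string pairs that are
    assumed to be ungapped and aligned base-for-base (hypothetical layout).
    Returns a list of length `n_positions`; an entry is None when no
    reference cytosine was observed at that position.
    """
    ref_c_counts = [0] * n_positions   # reference C observed at position i
    c_to_t_counts = [0] * n_positions  # of those, read shows T

    for reference, read in alignments:
        limit = min(n_positions, len(reference), len(read))
        for i in range(limit):
            if reference[i].upper() == "C":
                ref_c_counts[i] += 1
                if read[i].upper() == "T":
                    c_to_t_counts[i] += 1

    return [
        c_to_t_counts[i] / ref_c_counts[i] if ref_c_counts[i] else None
        for i in range(n_positions)
    ]
```

A frequency that is elevated at the first positions and decays toward the read interior is the kind of pattern that would be inspected; a flat profile would not, by itself, support an ancient origin.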
2. To What Degree Does the sedaDNA Signal Represent Past Communities?
Success in recovering targeted ancient DNA from sedimentary archives depends on
the taphonomy and preservation of DNA molecules within the sediments (as described
in Section 1.5) and the ability to extract, sequence and taxonomically identify the DNA
molecules [82,117]. The recovery success can be evaluated by comparing plant DNA
signals with flora surveys in the lake catchments and signals from other sedimentary proxies
like macrofossils, pollen, biomarkers or historical records. Yet, it has become evident
that the taxonomic breadth of environmental reconstructions based on metagenomic
approaches (e.g., [62,81,118]) can now potentially expand to all groups of organisms and is
therefore too vast to be systematically “validated” by other proxies. The degradation of
DNA molecules over time in sediments and the presence of active microbes that
compromise or obscure the archived ancient signal should also be considered carefully when
doing sedaDNA analysis.
2.1. sedaDNA Data Compared to Historical, Archaeological and Monitoring Data
Terrestrial organisms: For the DNA molecules of an organism that lived in a lake's
catchment, the chances of reaching the lake bottom depend on several factors [39]. One of
these factors is organism biomass, which affects DNA production at the source
[14,30,64,82]. A study comparing sedaDNA from surface samples with vegetation surveys
from eleven lakes in the boreal/subarctic ecozone showed that the effective detection of
plant taxa mainly depends on the abundance of the taxa in the vegetation and distance
from the lake [82]. Additionally, there were differences among the lakes studied and
among plant families. Overall, the sedaDNA signal was strong enough to reconstruct
near-shore vegetation types, and the detection of aquatic species was particularly good
[82]. When compared to historical maps, sedaDNA accurately tracked exotic conifer
plantations, and in contrast to pollen analyses from the same site, it did not show any signal
from taxa dispersed over longer distances [119]. Thus, the DNA signal from lake sediment
represents the vegetation within the catchment, in particular that in close hydrologic
connection to the lake (e.g., shoreline, streams).
Meanwhile, studies assessing the reliability of mammal DNA records from lake
sediments in comparison to historical or archaeological data are rare. This is because most
sedaDNA studies to date have focused on areas where such data are lacking, leaving this
gap as yet unfilled. However, the first study reconstructing livestock composition has
been validated by archaeological remains and historical data [31]. Finally, the temporal
occurrence of sedaDNA from rabbits, which had invaded an island in the 19th Century,
was consistent with known historical records [35].
Plankton DNA compared to water monitoring data: Several studies confirmed that
DNA from most aquatic planktonic organisms (from photosynthetic microbes to fish)
can be detected in lake sediments [11,18,73,85,120]. Recently, a few studies have provided first
insights into the reliability of DNA signals preserved in sediments to represent lacustrine
communities. For example, Capo et al. [120] revealed that 70% of the microbial eukaryotic
taxa (phylogenetic units in this work) living in the water column were retrieved in the
sedimentary archives of a temperate lake. However, a comparison of the protist
composition in the water column and in the underlying surface sediments showed that some
groups, including cryptophytes, were underrepresented in sediments [120]. This is
thought to be related to their high nutritional value, which makes cryptophyte cells likely
to be preferentially grazed by herbivorous zooplankton [121,122], which would explain
why their DNA does not reach the sediment.
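A recovery figure such as the 70% reported above is, in essence, the share of water-column taxa that are also found in the sediment. The following minimal sketch shows that calculation on plain sets of taxon labels. The function name and the label format are hypothetical, and the cited study may have defined and filtered its phylogenetic units differently.

```python
def fraction_recovered(water_taxa, sediment_taxa):
    """Share of water-column taxa that are also detected in the sediment.

    Both arguments are iterables of taxon labels (hypothetical format).
    Taxa found only in the sediment do not affect the result.
    """
    water = set(water_taxa)
    if not water:
        raise ValueError("water_taxa must contain at least one taxon")
    shared = water & set(sediment_taxa)
    return len(shared) / len(water)


def missing_from_sediment(water_taxa, sediment_taxa):
    """Water-column taxa with no match in the sediment, sorted by label."""
    return sorted(set(water_taxa) - set(sediment_taxa))
```

Listing the taxa that are missing from the sediment, rather than only the fraction, is what allows group-level biases (such as the underrepresentation of cryptophytes noted above) to be spotted.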
Processes involved during the transport of DNA molecules or dead cells in the water
column are still poorly understood [99], especially for smaller sized aquatic prokaryotes
and protists with poorly discernible morphological traits. Few bacterial taxa can be
identified using traditional microscopy cell counts, with some exceptions such as (large-size
fraction) cyanobacteria. The work of Monchamp et al. [73] indeed revealed a correlation
between the pelagic cyanobacterial taxa identified morphologically in the water
column and the genetic information obtained from the sediment. Similarly, Garner et al. [80]
identified DNA from heterotrophic bacteria and viruses in sediment using the
contemporary microbial diversity in surface-water metagenomes from the corresponding lakes as
references. The presence of DNA from pelagic zooplankton in lake sediments confirms
the potential of using the sedaDNA approach to study temporal community changes in
larger taxa, including rotifers, copepods, and Cladocera e.g., [11,45,83,123]. Altogether, as
recently stated by Armbrecht et al. [22], it is “reasonable to assume that obligate
photosynthetic plankton and/or zooplankton do not survive and reproduce after burial”. While
there is relatively limited evidence to date of how the active growth of modern-day
microbial communities affects the ancient DNA signal at depth (i.e., below the first few
centimeters in the sediment column), sedaDNA studies investigating past bacterial and
archaeal communities can be strengthened by the authentication of ancient DNA signals
with procedures to map DNA damage patterns such as those described in Pedersen et al.
[62] and Lammers et al. [81].
2.2. sedaDNA Data Compared to Other Sediment Proxies
Molecular reconstructions of past flora and fauna based on sedaDNA have been
compared against a range of biological and geochemical sediment proxies, often showing
complementarity between the different approaches [27–29,33,48]. In particular, comparisons
with well-established proxies such as plant macrofossils, pollen and diatom enumeration,
coprophilous fungal spores, or specific biomarker identification, have provided unique
opportunities to evaluate the nature and reliability of the DNA signal obtained from
sediments.
Plant DNA compared to pollen and macrofossil analyses: The first studies comparing
pollen and sedaDNA showed rather limited overlap, therefore only partly confirming the
reliability of the taxa detected by DNA signatures [23,28,124]. However, the detection of
taxa in sedaDNA has greatly increased in recent years with expanded reference libraries
and improved molecular methods, and we see now higher taxonomic overlap between
DNA and pollen [25,27]. A major issue for interpreting sedaDNA data, also applying to
pollen data, is the source area. Plant abundances inferred from metabarcoding reads seem
to decline with distance from the lake, suggesting that sedaDNA provides a local signal
[82]. Pollen is likely not a major source of DNA in sediment records, as pollen from taxa
typically dispersed over long distances is poorly recorded in sedaDNA [25,27,28,125–127].
A metagenomic study focusing on woody taxa from a Beringian island also found that
sedaDNA from spruce (Picea) could not be detected, although it was present in the pollen
record, whereas willow could be detected in both records, with the sedaDNA data
authenticated [127]. Compared to pollen, sedaDNA provides a more detailed account of
agricultural activities due to its higher taxonomic resolution (e.g., identification of fruit or
vegetable plant species; [39,58,128]). Thus, sedaDNA provides a more detailed description
of the catchment area, particularly for the areas with close connectivity to the lake
such as the shoreline and inflowing streams and for the areas impacted by erosion pro-
cesses.
Fewer studies have compared sedaDNA with macrofossils [23–25,29,125] and the
expectation is that plant species detected as macrofossils can also be detected with
sedaDNA. However, some taxa detected as macrofossils may not be observed in the DNA
record and vice versa. Which proxy is more sensitive may vary according to the nature
and conditions of the site, the taxonomic group and the amount of biomass produced
[24,25]. Nevertheless, the majority of the taxa detected are similar in comparative studies
where macrofossil preservation is good.
In conclusion, comparison with pollen and macrofossils, as well as vegetation sur-
veys and historical maps, confirms that sedaDNA detects the common species and has
sufficient detectability and resolution to determine the vegetation types.
Mammal DNA compared to fossil records: The presence of herbivorous animal herds
in a lake catchment can be inferred from sediments by using other proxies than sedaDNA.
The most established approach is based on the analysis of pollen and the detection of
nitrophilous and ruderal taxa favored by animal faeces, trampling and selection due to plant
consumption (e.g., Rumex sp., Urtica sp., Chenopodium sp., Plantago sp.; [55]). Some
studies also focused on the presence of DNA from bacterial lineages specific to the gut
microbiota [56,57,129–131]. Other approaches involve the use of spores of coprophilous fungi,
which develop on herbivore faeces (Sporormiella sp.; [33,35,130]), lipid biomarkers [132],
or corroborating data from radiocarbon dated bones [33]. All these approaches can be
used to assess the robustness of mammalian DNA records [33,35,39,130]. Indeed, several
of these studies showed that the detection rate of mammalian DNA is lower than rates
estimated via other inferred methods. A good detection of mammals has been observed
at sites where there was only one source of drinking water [33] or where the concentration
of mammals was high due to pastoral practices (e.g., presence of stabling areas; [31,39])
or migration routes [62]. In contrast, the detection of mammals in tundra sites is poorer,
likely because individuals are scattered and the sources of drinking water are multiple
[64]. At a more southern site, mammal sedaDNA has not been detected even in the pres-
ence of abundant coprophilous fungal spores [63]. Nevertheless, sedaDNA has the ad-
vantage of allowing identification at the species level, which is not possible using pollen,
Sporormiella spores or specific Bacteroidales. While currently no mammalian DNA and
faecal DNA datasets have been compared, this is, nevertheless, a promising complemen-
tary approach. Indeed, ratios between different stanols and bile acids can be used to dis-
tinguish between omnivore and ruminant species and between humans and pigs
[131,133–135]. Though limited, evidence from archaeological sites suggests that shotgun
metagenomic reads assigned to a range of different animal taxa mirror their respective
biomass estimated using classical analyses of bone remains [136]. Nevertheless, some
work remains to establish whether this quantitative relationship for sedaDNA might hold
across different types of archives.
DNA from aquatic biota compared to lipids, pigments and subfossils: The sedaDNA
signal of past aquatic biota has mainly been compared to lipid, pigment, or subfossil (such
as diatom frustules) records. The current view is that the DNA information in sediments
degrades faster than pigments or lipids, although significant positive correlations have
been observed between these different proxies [18,72,77,137–141]. Studies comparing dia-
tom diversity retrieved from sediment using morphological and genetic approaches con-
sistently show that beta-diversity values obtained from both methods were highly com-
parable (i.e., the turnover is similar), whereas alpha-diversity values and taxonomic as-
signment datasets were not [45,142–147]. This discrepancy in the reconstructions is at least
in part attributable to very sparsely populated genetic databases available for diatoms,
which can be tested in the near future thanks to the exponential growth of genomic
databases and initiatives such as the Earth Biogenome Project [148]. Heinecke et al. [149]
demonstrated that the detection of sedaDNA identified as Potamogetonaceae was
consistent with the recovery of subfossil remains from a species within this family of
hydrophytes. Clarke et al. [26] recently detected Callitriche and Sparganium in both
pollen and sedaDNA, whereas the individual proxies each detected a further two and five taxa
of aquatic macrophytes.
2.3. Dead or Alive: What Makes Up the sedaDNA Pool?
DNA from animals, land plants, zooplankton, and from many photosynthetic bacte-
ria and protists can be preserved in sediments. These data provide information on past
ecosystems prevailing at the time of deposition. This is because the pool of DNA is un-
likely to originate from organisms living in dark and/or anoxic conditions in the sediments
upon burial. In contrast, subsurface microbial communities (notably facultative and obli-
gate anaerobic microorganisms) are generally thought to be structured through in situ
environmental conditions such as the availability of electron acceptors and donors, poros-
ity, and sediment lithology e.g., [150,151]. However, recent studies suggest that subsur-
face microbial taxa were present at the time of deposition and that their vertical distribu-
tion in the sedimentary record was shaped by the paleoenvironmental conditions that
prevailed at the time of deposition [152–156]. For example, downcore sedimentary 16S
rRNA gene profiling revealed that Holocene sediments of Laguna Potrok Aike in
Argentina reflected a vertical stratification linked to electron acceptor availability, whereas in the
Late Pleistocene samples, up to 50,000 years in age, salinity, organic matter type and the
depositional conditions over the Last Glacial–Interglacial cycle were the most important
selective stressors [153]. Analogously, shotgun metagenomic analyses of sediments from
the Arabian Sea revealed subseafloor bacteria that were involved in denitrification pro-
cesses during the formation of an extensive oxygen minimum zone [154]. A switch to fer-
mentation is a likely explanation for their subsequent long-term post-depositional sur-
vival. However, none of these examples determined to what extent the identified commu-
nities represented dead, dormant, or metabolically active communities. Conversely, a re-
cent study of Dead Sea sediments [157] illustrated a new pathway of carbon transfor-
mation in the subsurface and demonstrated how life can be maintained in extreme envi-
ronments characterized by long-term isolation and minimal energetic resources.
Besides revival and cultivation of the living subset of the community or methods
based on metabolic probing, several indirect approaches have been used to test for micro-
bial viability (live/dead) and/or activity (see [158] for a review). Here we describe the most
feasible approaches that can be optimised for use with sedimentary records and which are
also compatible with downstream DNA sequencing. It is generally accepted that a cell
must be intact, capable of reproduction, and metabolically active to be considered alive.
The separate extraction of intracellular vs. extracellular DNA and subsequent amplicon
or shotgun metagenomic sequencing analysis can reveal the diversity and metabolic po-
tential of intact living vs. dead subsurface bacteria. This approach was applied to shallow
sediments of tropical Lake Towuti (Indonesia) to reveal which microbial populations
grew, declined, or persisted at low density with sediment depth [159]. Sequencing analy-
sis of reverse transcribed sedimentary RNA markers most likely reflects the activities of
microbes that were alive at the time of sampling [160]. That is because transcription is
among the first levels of cellular response to environmental stimuli and RNA has a much
shorter average half-life than DNA, being in the order of hours or days for ribosomal RNA
and hours to minutes for messenger RNA [161]. Viability PCR via propidium monoazide
(PMA) is another promising live/dead approach for sedimentary bacteria [162]. This nu-
cleic acid intercalating dye binds to extracellular DNA and DNA inside damaged cells
whereas it cannot enter living cells with intact membranes. Upon exposure to a bright
light source, photoactivation causes PMA to form covalent bonds so that the irreversibly
damaged DNA cannot be amplified in PCR assays. A comparison of microbial communi-
ties between untreated and PMA treated samples will reveal, respectively, total vs. living
bacteria. However, this approach needs to be performed on freshly collected sediments
and efficient exposure to the light source is essential and requires optimisation ([158] and
references therein).
Overall, short read lengths (<200 base pairs (bp)) have often been associated with the
more damaged signatures of ancient DNA libraries (e.g., [33,62,104,118]) providing an
idea about ancient and modern DNA in the sediment DNA pool. A few more sophisti-
cated bioinformatic approaches have also been developed. For instance, the growth rate
of environmental bacteria may be calculated by measuring genome replication rates from
shotgun metagenomic data. The most promising of these approaches is the Growth Rate
Index (GRiD) [163] because it can infer growth rates of specific microbial populations from
complete or draft genomes as well as metagenomic bins at ultra-low sequence coverage
(0.2x). If used in high throughput mode, prior knowledge of the microbial composition
and coverage is not required [163]. Finally, the assessment of ancient DNA damage pat-
terns, a bioinformatic method applied to metagenomic data (see Section 3.8), has recently
been applied to sedaDNA to identify ancient DNA sequences with post-mortem damage
[29,33,62,81,118,127]. Such a procedure is a powerful method to ensure the authenticity of
the DNA fragments assigned as ancient in sedaDNA studies.
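The principle behind damage-pattern authentication can be illustrated with a minimal sketch. Post-mortem cytosine deamination produces an excess of C-to-T mismatches near the 5′ ends of ancient reads, so the fraction of reference cytosines read as thymine should be elevated at the first positions and decline further into the read. The function name, the gap-free alignment assumption and the toy sequences below are hypothetical and serve only to illustrate the calculation; they are not taken from any of the cited studies.

```python
from collections import Counter

def five_prime_c_to_t(alignments, max_pos=5):
    """Fraction of reference C sites read as T, per distance from the 5' end.

    `alignments` is an iterable of (reference, read) string pairs of equal
    length, assumed to be aligned without gaps (a simplifying assumption).
    Returns a list of length `max_pos`; an entry is None when no reference C
    was observed at that position.
    """
    c_sites = Counter()  # number of reference C observed at each position
    c_to_t = Counter()   # number of those sites where the read shows T
    for ref, read in alignments:
        for i, (r, q) in enumerate(zip(ref[:max_pos], read[:max_pos])):
            if r == "C":
                c_sites[i] += 1
                if q == "T":
                    c_to_t[i] += 1
    return [c_to_t[i] / c_sites[i] if c_sites[i] else None
            for i in range(max_pos)]

# Hypothetical toy alignments (reference, read); not real data.
toy = [
    ("CACGT", "TACGT"),
    ("CCGTA", "TCGTA"),
    ("CGGCA", "CGGCA"),
    ("ACCGT", "ACCGT"),
]
profile = five_prime_c_to_t(toy, max_pos=3)
print(profile)
```

In this toy example the C-to-T fraction is highest at the first position and zero at the following two, which is the qualitative shape expected for damaged molecules. Real analyses work on mapped reads, also consider the complementary G-to-A signal at 3′ ends, and require far more reads per taxon.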
3. State of the Art Lake sedaDNA Analyses
Working on sensitive samples such as sedaDNA requires the application of strict
sampling and laboratory protocols to prevent contamination by modern exogenous
DNA. Lake-sediment cores should be opened and sub-sampled in clean, dedicated an-
cient DNA laboratories [164,165]. The DNA is extracted from the samples, and molecular
methods are then applied using targeted approaches (PCR, qPCR, ddPCR, metabarcod-
ing) or whole-(meta)genome shotgun sequencing (shotgun metagenomics, target enrich-
ment through hybridization capture) (see Table S1 for detailed information about the
methodology used in each study). Recovered DNA sequences are then taxonomically
and/or functionally annotated using a suite of bioinformatic tools to answer paleoecolog-
ical questions. The potential and limitations of DNA approaches have been largely dis-
cussed [117,166,167]. However, several considerations specifically related to the study of
ancient DNA in sediments are addressed here [13,14,21,22,103]. The appropriate proce-
dure for the analysis of sedaDNA will inevitably depend on an array of parameters, from
the origin and distribution of studied organisms in their ecosystems to the factors influ-
encing sedaDNA preservation, extractability and, in the case of PCR-directed approaches,
the probability of amplifying authentic sedaDNA. In this section, we synthesize the most
recent literature concerning the different steps of experimental design for sedaDNA work
which we augment with seven original case studies described in detail in Appendix A.
3.1. Criteria for the Selection of Lakes
Lake selection is first and foremost defined by the specific scientific aims and budg-
etary considerations. For many purposes, like archaeological studies, choice might be lim-
ited to natural archives close to the site of interest and conditions that lead to efficient
burial and preservation of DNA in sediments. DNA paleo-reconstructions have mostly
used sediments from alpine, boreal and arctic lakes, where ancient DNA is well preserved
due, for instance, to low temperatures, bottom water anoxia, and little to no bioturbation
[100]. Nevertheless, successful sedaDNA studies have also been conducted in temperate
lakes, where conditions appear to be less suitable for DNA preservation. Thus, we likely
do not yet have a full understanding of the factors controlling DNA preservation. For ex-
ample, it has been suggested that faster sedimentation rates in temperate and tropical re-
gions [168] result in the DNA being buried faster below the active surface sediment zone
which would cause more rapid preservation and immobilization in anoxic conditions.
This may in turn overshadow higher degradation rates that result from the comparatively
high temperatures that prevail at the bottom of such systems. More work is needed to
accurately define the environmental conditions where sedaDNA is most likely to be effi-
ciently preserved.
When considering the effect of past environmental changes on terrestrial ecosystems,
the topography and size of a lake catchment relative to the lake size can influence the
prospects of successfully using a sedaDNA approach. For extracellular DNA, which is
readily adsorbed to clay particles, the source of material transported to lake sediment can
be strongly influenced by erosional processes [39]. For example, surface soil horizons
would be expected to provide more plant DNA than deep mineral horizons, bare soil or
glacial flours. In addition, a well-developed hydrological network and higher rates of top-
soil erosion may transport and deposit DNA that provides a better representation of a
catchment flora and thus of the different potential habitats than less hydrologically con-
nected areas [39]. Thus, for lakes with low allochthonous sediment input from areas with
low relief and/or no major inflow streams, the DNA signal will mostly represent the plant
community in the riparian zone within a few hundred meters [82], or the lake itself. On
the other hand, the DNA signal from lakes with a larger hydrological catchment and con-
siderable riverine input may represent a much larger source area [27,47]. No sedaDNA
research, to our knowledge, has focused on transport processes in arid zones; however, it
is likely that the intermittent hydrological connection to the catchment would also shift
the representativity of the DNA signal towards a stronger riparian signature, in a similar
way to what has been found for bulk organic carbon pools and biomarkers [169,170].
Additionally, different sediment lithologies, including those from within the same
lake basin, pose varying challenges to successful PCR amplification of sedaDNA. This is
the case for the presence of humic substances, co-extracted with DNA, that can act as in-
hibitors and have adverse effects on the performance of any PCR or other nucleic acid
analysis (e.g., [171]). Depending on the concentration and type of inhibitors and the par-
ticular enzymes used for the PCR, effects can be highly variable, i.e., some enzymes being
more sensitive than others. In our case study A1 (Appendix A), we evaluated to what
extent PCR inhibition was related to sediment quality by taking advantage of
a minerogenic-rich to organic-rich sediment continuum. Analyzing PCR inhibition along
a sediment profile (138 sediment samples), no clear relationships between inhibition and
sediment type were observed. However, it was noted that minerogenic-rich sediments
(~32,000–21,000 yr. BP) had little to no inhibition while organic-rich sediments (~21,000–
14,000 yr. BP) resulted in stronger inhibition even if these effects were variable. A slow
decrease in inhibition is observed towards the youngest organic part of the core, creating
an overall V-shape pattern (Figure 2). The reason for this remains unclear. It might be
related to community changes in or around the lake, or (bio)chemical changes in the
sediments. One key result of this case study is the strong negative correlation (r² = 0.66, p <
0.01) between PCR inhibition and the number of plant DNA sequences amplified and
sequenced in our work (Figure A1 in Appendix A).
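As a minimal sketch of how such a relationship can be quantified, the snippet below computes the Pearson correlation coefficient r and the coefficient of determination r² between an inhibition measure and log-transformed read counts. The input values are hypothetical placeholders, not the data from case study A1, and the helper name `pearson_r_squared` is introduced here for illustration only.

```python
import math

def pearson_r_squared(x, y):
    """Return (r, r_squared) for two equally long numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    return r, r * r

# Hypothetical values: dilution volume needed for qPCR to succeed (µL)
# and log10 of the mean number of plant metabarcoding reads per sample.
dilution_ul = [0, 0, 2, 4, 8, 16]
log10_reads = [4.5, 4.2, 3.9, 3.1, 2.6, 1.8]

r, r2 = pearson_r_squared(dilution_ul, log10_reads)
print(f"r = {r:.2f}, r^2 = {r2:.2f}")
```

A negative r with a high r² indicates that samples requiring stronger dilution yield fewer reads; a significance test (the p-value) would need an additional statistical routine.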
Figure 2. The plot shows three measures of DNA extraction/amplification quality with time on the
x-axis (years before present (BP)). The first plot (blue dots) shows the DNA concentration (ng µL⁻¹)
with a moderate to strongly negative correlation with age. The second plot (pink dots) shows the
PCR inhibition as the dilution volume (in µL) necessary for qPCR reactions to succeed. Inhibition is
absent until ~23,000 years BP and increases steadily afterwards. The third plot (green dots) shows
the log₁₀ of the mean number of raw reads from plant DNA metabarcoding. The number of reads is
strongly and negatively correlated to the level of inhibition (see Appendix A). A simplified
description of sediment lithology is provided at the top of the figure: minerogenic (M), organic (O), and
minerogenic–organic (MO) sediment types.
3.2. Number of Sediment Cores to Collect for sedaDNA
In molecular paleoecological studies, it is common practice to collect a single sedi-
ment core from the deepest part of a lake because we assume that the genetic information
is homogeneously distributed across the sediments and that this in turn reflects the biodi-
versity in the catchment. This practice is largely based on the assumption that DNA is
distributed in a manner similar to fine-grained material such as organic matter or pollen,
where one core is typically representative for the entire studied lake basin. However, this
assumption can be questioned because we know that micro and macrofossils and organic
matter can have a patchy distribution in the lake basin [172–174]. Additionally, the trans-
fer and deposition of organic matter—and therefore catchment-DNA—in lake sediments
is not necessarily homogeneous and may depend on catchment features [39]. Indeed,
while a single-core signal may be suitable for capturing the temporal dynamics of small
planktonic organisms that are evenly distributed in the water mass, the detection of DNA
from larger aquatic organisms (e.g., fish, hydrophytes, littoral mussel species) can be
strongly influenced by their more heterogeneous in-lake distributions, as shown
previously in eDNA studies [99,175,176]. A complex lake topography (e.g., lakes with two
distinct basins) may also cause spatial variation in the DNA signal. Thus, there is a need to
assess spatial variability in sedaDNA signals.
The use of “field replicates” i.e., collection of several sediment cores within one lake
basin, in sedaDNA studies may be used to assess (i) how consistent the signal is at a
specific site (core-site replicates) and (ii) whether or not there is spatial variability in the
sedaDNA signal. The work of Etienne et al. [177] showed that field replicates revealed high
spatial heterogeneity in the signal of fungal spores. In contrast, the recent work of
Weisbrod et al. [178] with surface sediment DNA showed that a single sediment core can
capture the dominant microbial taxa when targeting toxin-producing cyanobacteria. Re-
garding aquatic plants, their dispersion potential in the water has been proposed as a fac-
tor that can influence their detection in sedaDNA studies, particularly in large and deep
lakes (5.45 km² surface area, 71 m maximum depth; [179]). In that study, free floating-leaf
plants, which can be more easily dispersed, were also readily detected in the deepest part of
the lake. In contrast, helophytes, which are rooted in the near-shore area (littoral zone),
were less-well detected and submerged plants were in an intermediate position. These
results contrast with the findings from a survey of 11 smaller and shallower lakes (0.04–
27 ha; 1.7–20 m), where two samples taken 15 cm apart in the center of each lake allowed
for the detection of 90% of the common and dominant and 30–60% of scattered and rare
taxa of macrophytes [82].
Taking multiple sediment cores from the same site, i.e., “core-site replicates” may
also facilitate the detection of organisms that are in relatively low abundance in aquatic
systems or further away from the lake within the catchment. It may, for example, be nec-
essary for detection of fish DNA in sediments. Indeed, despite numerous attempts by
multiple authors involved in the present work, the detection of fish from sedaDNA ar-
chives has only been reported in a few lake systems to date [85–87,180]. Although there is
still uncertainty about the reasons for this apparent failure, the low amounts of fish DNA,
compared to microbial DNA, present in the sediment may be one explanation. The poten-
tial to increase the sensitivity of such analyses by capitalizing on the dramatic rise in se-
quencing capacity that we are currently experiencing may indeed help us tackle this prob-
lem.
3.3. Storage of Sediment Cores Prior to DNA Analysis
Inadequate handling during coring and subsequent storage of sediment samples can
have unexpected consequences such as (i) degradation of DNA by fast-growing bacteria
that use nucleic acids as a substrate or due to hydrolysis and oxidation and (ii) modifica-
tion of the composition of mainly the microbial DNA pool by growth. For instance, even
minimal exposure to oxygen results in rapid fungal and bacterial growth. Even storing
sediment in well-sealed conditions at 4 °C can clearly affect the reliability of the sedaDNA
signal, particularly when studying past microbial diversity with e.g., 16S or 18S rRNA
metabarcoding. The 16S rRNA gene amplicon data from the case study A2 (Appendix A)
highlight the effect of secondary anaerobic growth on the DNA signal observed from sed-
imentary archives. In particular, these data show that microbial seed banks can reactivate
or alternatively, that new microbes can colonize the sediments shortly after exposure to
oxygen (Figure 3). In the two cases reported here, freeze–thaw cycles (Figure 3A) and
oxygen diffusion (Figure 3B) increased the proportion of extractable DNA that mainly
originated from growth of facultative anaerobic and metabolically versatile
Gammaproteobacteria. Therefore, in 16S rRNA amplicon DNA inventories, secondary growth can
induce a significant bias in the signal from past and present bacterial communities in the
sediment (Figure 3B, Appendix A).
Figure 3. 16S rRNA gene taxonomic diversity in sediment samples showing secondary growth. (A) Relative abundances
(%) of 16S rRNA genes in anoxic abyssal clay from the Northern Atlantic Ocean subjected to repeated freeze–thaws that
led to oxidation of pore water ammonia with secondary growth during successive biological replicates (1 to 4). (B) Relative
abundances (%) of 16S rRNA genes in ferruginous sediments from 0.5 cm (1) and 7.5 cm (2) depth, with sediments stored
unfrozen under anoxia in hermetic bags (top) and those that experienced oxygen diffusion in Falcon tubes (bottom). Sec-
ondary growth occurred due to pore water oxidation during the 4 months of sample storage, which also resulted in higher
intracellular DNA concentrations in the aforementioned oxidized (light green) than pristine (dark green)
ferruginous sediment samples.
Overall, it is likely desirable to split sediment cores lengthwise into two halves—one
for other analyses (e.g., geochemistry, dating) and one for DNA analysis. Additionally, it
is prudent to sub-sample sediments immediately after opening the core and then store
subsamples frozen at −20 °C or below. However, freezing is not needed in all cases and
storing cores at 4 °C under a protected, oxygen-free atmosphere in tightly sealed contain-
ers may improve their preservation by avoiding unnecessary freeze–thaw cycles. For ex-
ample, the pristineness of DNA extracted two years after sediment core sub-sampling was
validated by using qPCR assays targeting facultative anaerobes [154]. These results ensure
that metagenomic information can be interpreted in terms of past microbial processes in
the water column at the time of deposition. Similarly, a study of vascular plant sedaDNA
from lake sediment cores that were retrieved in 2009, left untouched and stored at 4 °C
until being sub-sampled in 2014 yielded a detailed floristic sedaDNA record [26,27]. Other
cores stored for 5–10 years at 4 °C have been successfully used to recover plant DNA
[16,181]. Some early studies [83,105,182] stored the sediment samples in Queen’s Tissue
Buffer, but generally, the use of chemical DNA preservatives, such as ethanol, Longmire’s
lysis buffer or RNAlater for sediment subsamples is not common for studies of
sedimentary ancient DNA. While a dedicated test has not been published to the best of our
knowledge, the addition of chemical preservatives can potentially cause problems in
DNA extraction, and would introduce a further potential source of contamination.
3.4. Number of Analytical Replicates to Perform for sedaDNA Research
Similar to field replicates, the use of analytical replicates for both DNA extraction and
PCR amplification is required for detecting taxa with low detection probabilities like rare
species (e.g., specific fish populations) or species that are remotely located (e.g., terrestrial
mammals). In order to increase the probability of detection, processing a number of rep-
licates has proven to be helpful [39,183,184]. Numerous sedaDNA studies have included
replicate samples at the extraction and/or PCR steps [16,24,25,31,35,39,49]. In the context
of studying taxa with low detection probabilities (e.g., mammals), Ficetola et al. [183]
recommended the use of at least eight PCR replicates. For catchment vegetation, comparisons
to macrofossils [24] and vegetation surveys [24] have shown that one positive PCR out of
four or eight PCR replicates, respectively, may represent true positives (i.e., the presence
of the desired targeted DNA in the environmental sample). However, it was also men-
tioned that increasing the number of analytical replicates can increase the probability of
false positives [185].
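The trade-off described above can be made concrete with a short worked calculation. Under the simplifying assumption that PCR replicates are independent and share the same per-replicate probability p of a positive result, the probability of at least one positive among n replicates is 1 − (1 − p)^n. The same expression applies to a per-replicate false-positive rate, which is why adding replicates also raises the chance of a spurious detection. The function names and the probabilities used below are hypothetical illustrations, not estimates from the cited studies.

```python
def p_at_least_one(p, n):
    """Probability of at least one positive among n independent replicates,
    each with per-replicate probability p (independence is an assumption)."""
    if not 0.0 <= p <= 1.0 or n < 0:
        raise ValueError("p must be in [0, 1] and n must be non-negative")
    return 1.0 - (1.0 - p) ** n

def min_replicates(p, target=0.95):
    """Smallest number of replicates giving at least `target` probability
    of one or more positives."""
    if not 0.0 < p <= 1.0 or not 0.0 < target < 1.0:
        raise ValueError("p must be in (0, 1] and target in (0, 1)")
    n = 1
    while p_at_least_one(p, n) < target:
        n += 1
    return n

# Hypothetical per-replicate detection probability of a rare taxon.
print(min_replicates(0.3))                 # replicates needed for 95% detection
# Hypothetical per-replicate false-positive rate of 1% over eight replicates.
print(round(p_at_least_one(0.01, 8), 4))   # cumulative false-positive risk
```

With a detection probability of 0.3 per replicate, nine replicates are needed to reach 95%, while a 1% false-positive rate per replicate accumulates to roughly 7.7% over eight replicates. Site occupancy-detection models estimate these probabilities from the data rather than assuming them.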
The use of analytical replicates to assess the diversity and composition of planktonic
communities from sedaDNA has been shown to provide a highly consistent assemblage
composition of dominant taxa e.g., [34,38,76,141]. For instance, Ibrahim et al. [38] revealed
similarity between DNA inventories obtained from triplicate extraction replicates in terms
of community structure of microbial eukaryotes, diatoms and cyanobacterial assemblages
obtained from a sediment record covering the last 100 years. Although the relative abun-
dances of dominant microbial molecular taxa do not vary much between analytical repli-
cates, it is noticeable that the proportion of shared taxa can be relatively low (<40% shared
taxa between extraction replicates; [34,107]). Such low levels of consistency can be caused
by the detection of numerous rare taxa, illustrating the generic challenges involved in es-
timating absolute richness from DNA metabarcoding data [184].
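A small sketch illustrates why the proportion of shared taxa can be low even when dominant taxa agree. Here the shared proportion is computed as the number of taxa found in both replicates divided by the number found in either one (a Jaccard-type index, which is an assumption about how "shared taxa" is measured), optionally after discarding taxa below a read-count threshold. The taxon labels and read counts are hypothetical.

```python
def shared_fraction(rep_a, rep_b, min_reads=1):
    """Fraction of taxa present in both replicates out of all taxa present
    in at least one, keeping only taxa with >= min_reads reads."""
    a = {taxon for taxon, reads in rep_a.items() if reads >= min_reads}
    b = {taxon for taxon, reads in rep_b.items() if reads >= min_reads}
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical read counts per molecular taxon in two extraction replicates.
replicate_1 = {"OTU_1": 900, "OTU_2": 450, "OTU_3": 3, "OTU_4": 1, "OTU_5": 2}
replicate_2 = {"OTU_1": 870, "OTU_2": 500, "OTU_6": 2, "OTU_7": 1, "OTU_8": 4}

all_taxa = shared_fraction(replicate_1, replicate_2)
dominant_only = shared_fraction(replicate_1, replicate_2, min_reads=100)
print(all_taxa, dominant_only)
```

In this toy case only a quarter of all taxa are shared, yet the two dominant taxa are recovered in both replicates, mirroring the pattern described in the text.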
3.5. Tracing Contamination of sedaDNA Samples
The process of recovering sedaDNA must be carried out following guidelines for an-
cient DNA, due to high contamination risks e.g., [186]. There are many ways in which
sedaDNA samples can be contaminated, from the time of field sampling to sequencing.
Contamination can originate from the equipment and consumables used during the sam-
pling collection, sediment core extrusion and core splitting, but also derive from compro-
mised cleanliness in the ancient DNA laboratory, non-sterile laboratory reagents and in-
sufficient precautions taken by the operator(s) to avoid cross-contamination and introduc-
tion of exogenous DNA sources. Sterile tools, clean working environments, appropriate
clothing (single-use suit, gloves, mask, hairnets) and molecular biology grade reagents
that may need to be decontaminated with UV irradiation can help to minimize the intro-
duction of exogenous modern DNA.
Because modern DNA molecules are intact, typically show limited post-mortem
damage, and are normally present in higher concentrations than ancient
DNA, there is always a risk that such “false targets” are amplified during PCR. It should
also be noted that samples with the lowest concentrations of sedaDNA are more suscep-
tible to contamination, because of lower competitiveness during PCR [39]. PCR amplicons
generated by qPCR or metabarcoding are particularly insidious forms of contamination,
as they are highly concentrated and indistinguishable from authentic results. For this rea-
son, it is crucial that pre- and post-PCR facilities are physically separated and strict pro-
tocols are used in the way reagents and personnel transit between these areas [164]. Con-
tamination by modern DNA is more likely when studying microbial, fungal, and human
aDNA [113], or plant species that are widespread and/or used for furniture/building con-
struction such as pine and spruce [25]. Even when all precautions are taken, and quality
control procedures are followed (i.e., integrating all the required negative controls and
taking multiple replicates), the authenticity of sedaDNA sequences can be difficult to
demonstrate especially from metabarcoding data (see also Section 3.8), and the bioinfor-
matic filtering procedure required is not always straightforward (see also Section 3.10).
One way to trace contamination that has occurred during coring activities is to spray
or paint the coring equipment with an artificial DNA tracer, such as DNA extracts or
amplicons of a plasmid or an exotic species not likely to be present in the original sample,
ensuring that only contaminant-free internal parts of the core are analyzed [16,62]. During
subsampling, DNA extraction, PCR amplification and library preparation, negative con-
trols are always necessary to track potential contamination and can be used to filter DNA
inventories for potential contaminants [14,25,66,185,186]. Positive PCR controls are largely
used in environmental DNA research to verify the success of the molecular biology pro-
cedure and evaluate the presence of sequencing errors. For ancient DNA analyses, they
should be avoided or used in a laboratory physically separated from the ancient DNA
areas, to avoid potential cross-contamination [16]. Finally, the use of occupancy-detection
models is a new approach for estimating the frequency of false positives and can be in-
formed by the results of negative controls [187,188].
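As an illustration of how negative controls can be used to filter a DNA inventory, the following minimal Python sketch removes detections that do not clearly exceed the background seen in the controls. The function name, the data layout, the sample and taxon names, and the `min_ratio` threshold are all assumptions made for this example; they are not taken from the cited studies, and real pipelines typically combine several criteria (replicates, read proportions, occupancy-detection models).

```python
from collections import defaultdict


def filter_by_negative_controls(read_counts, negative_controls, min_ratio=10.0):
    """Keep detections whose read count clearly exceeds the control background.

    read_counts: dict mapping sample name -> {taxon: read count}
    negative_controls: set of sample names that are negative controls
    min_ratio: hypothetical threshold; a detection is kept only if its count
        is at least min_ratio times the highest count for that taxon in any
        negative control
    """
    # Highest read count observed for each taxon across all negative controls
    max_in_controls = defaultdict(int)
    for name in negative_controls:
        for taxon, count in read_counts.get(name, {}).items():
            max_in_controls[taxon] = max(max_in_controls[taxon], count)

    filtered = {}
    for name, taxa in read_counts.items():
        if name in negative_controls:
            continue  # controls are not part of the final inventory
        kept = {}
        for taxon, count in taxa.items():
            background = max_in_controls.get(taxon, 0)
            if count > 0 and count >= min_ratio * background:
                kept[taxon] = count
        filtered[name] = kept
    return filtered


# Illustrative (made-up) read counts
counts = {
    "core_10cm": {"Pinus": 500, "Betula": 120, "Homo": 40},
    "core_20cm": {"Pinus": 15, "Salix": 60},
    "extraction_blank": {"Pinus": 30, "Homo": 35},
    "pcr_blank": {"Homo": 10},
}
inventory = filter_by_negative_controls(counts, {"extraction_blank", "pcr_blank"})
```

In this made-up example, Pinus is retained in the sample where it far exceeds the blank but dropped where it does not, which mirrors the caution expressed above about widespread taxa such as pine.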
3.6. DNA Extraction Methods for sedaDNA Research
The DNA extraction method used may influence the DNA signal obtained from sed-
iments. The PowerSoil, PowerMax, and UltraClean DNA Isolation kits (Qiagen) have so
far been widely used by terrestrial and aquatic molecular ecologists (Figure 4), while var-
ious other kits and custom protocols have also been used by a number of studies. Based
on our review (Table S1), the choice of DNA extraction protocol appears to be driven
largely by the prior success of the research group with a particular kit. As more studies
comparing and/or optimizing extraction methods are published (case studies described in
Box 1; [189]), the importance of selecting extraction methods optimized for the different
sediment types becomes apparent.
Figure 4. DNA extraction protocols used in the 160 publications compiled in Table S1.
DNA deposited at the water–sediment interface can be either extracellular (exDNA)
or intracellular (inDNA) [13,22,103] and different DNA extraction protocols can be used
to preferentially extract the two fractions. InDNA is likely to be more protected inside
protective resting stages such as cysts or spores, cell membranes, or lignins, but can also
be attacked by nucleases present in the cells. In contrast, exDNA released into the envi-
ronment after cell lysis can be quickly adsorbed on clay minerals, which significantly pro-
motes preservation by decreasing chemical and physical degradation processes [101] and
making its molecules less accessible as a food source for indigenous sediment bacteria
[190]. One of the protocols that allows desorption of exDNA fragments bound to minerogenic
particles is described in Taberlet et al. [191], hereafter named the Taberlet2012 protocol.
This protocol relies on the use of a phosphate buffer and is coupled to the NucleoSpin
soil kit protocol (Macherey-Nagel). This approach has been successfully used in
several recent lake sediments studies (e.g., [31,32,39,51] but see also [25]). However, no
explicit comparison has been made between molecular inventories obtained from total
DNA (inDNA and exDNA), inDNA alone, and exDNA alone. Here, we provide three case studies
(A3, A4 and A5) that highlight differences and similarities between extraction protocols
targeting the different fractions of the DNA (see Box 1, Appendix A).
The physical and chemical properties of sediments can have strong effects on the ef-
ficiency of DNA extraction from sediments. For instance, clay minerals and calcite both
bind tightly to DNA [101,102,192] but require different extraction approaches to maximize
DNA recovery. In the case study A6, we show that carbonate-rich sediments yield lower
amounts of sedaDNA than organic-rich sediments using a conventional extraction protocol
(PowerSoil kit), and that an optimized extraction protocol can greatly improve vascular
plant DNA recovery from carbonate-rich samples (Box 1, Appendix A). The co-extraction
of inhibitors, particularly from organic-rich sediments, can cause major problems for
downstream analyses. Although new DNA extraction methods are being developed to
reduce the effects of inhibitors [189], further refinements and a better understanding of
the nature of inhibition are still necessary. An alternative approach to reducing the con-
centration of inhibitors in organic-rich sediments is to include an additional clean-up step
(e.g., [193]). This was shown in case study A1, using the OneStep PCR inhibitor removal
kit (Zymo Research) after the PowerSoil kit, which successfully reduced the level of inhi-
bition with only a limited loss of DNA (mean DNA recovery of 91%, see Appendix A).
To reduce reagent usage in downstream procedures, it may be desirable to concen-
trate the supernatant or eluate, either during or after DNA extraction, respectively. To
achieve this, a concentration step using spin columns can be added, either just after the
desorption when using the phosphate buffer or our carbonate-optimised approach or at
the end of the extraction when using the PowerMax kit. This approach has been recently
applied to sediments [29,35,42,53,62,81,154] and in the case study A7 we provide new data
showing its efficiency (Box 1, Appendix A).
Altogether, the findings of our case studies allow us to provide recommendations
about the type of DNA extraction protocols to use for specific ecosystems and target taxa.
Ecosystems with different features (lake basin and catchment) can result in different relative
abundances and richness because the studied biological communities and potential
sources of origin are different. Additionally, contrasting sediment lithologies (e.g.,
minerogenic vs. organic-rich sediments) can influence the extractability of sedaDNA and
thus the DNA signal recovered from sedimentary archives. For instance, carbonated post-
glacial lake sediments should be extracted with a DNA extraction protocol optimized for
DNA extractability to maximize the recovery of plant richness from the sedimentary ar-
chives (case study A6). Furthermore, sediment records with changing lithologies over
time can lead to different efficiency in DNA extractability and PCR inhibitions (case study
A1). Conversely, different extraction protocols may yield significantly different relative
abundances and richness in taxa and genes, and should therefore be selected following
preliminary analyses for suitability, notably where the heterogeneity between sample
types and reconstructed assemblages is expected to be important. In most cases, it is better
to use a DNA extraction protocol that extracts both intracellular and extracellular DNA,
such as the NucleoSpin Soil, PowerSoil and PowerMax kits, including a physical lysis step for
improved detection of taxa, regardless of the targeted group (case studies A3, A4 and A5).
While no test suggests the need for concentrating DNA to successfully recover the sedaDNA
signal from relatively abundant and small organisms such as microbes (case studies
A3 and A4), optimizing the DNA extraction protocol to increase DNA amounts
is recommended when targeting remotely located and low-abundance organisms such as
terrestrial mammals (case study A7). While our case studies show that sediment type and
extraction protocol may have a significant impact on taxonomic composition recovered
from ancient sedaDNA, much like in the modern environmental DNA literature, advances
in measuring and modelling the taxonomic bias inherent to the metagenomic experi-
mental workflow towards more quantitative results are promising [194].
Box 1. Optimizing DNA extraction protocols for molecular paleoecology.
In this box we describe the main findings of five case studies and for each of them additional information is presented
in Appendix A. The case study A3 compares eukaryotic inventories obtained using a protocol extracting both exDNA
and inDNA (NucleoSpin protocol) and a protocol favoring exDNA extraction (Taberlet2012 protocol). We showed
that the composition of eukaryotes varies depending on the extraction protocol used, even when considering the high
variability in the signal recovered from each lake (Figure 5A). This is also the case in terms of richness (Figure A3 in
Appendix A). One striking finding is that the extraction of total DNA with the NucleoSpin protocol appeared to be
more efficient for detecting rotifer DNA than the Taberlet2012 protocol for extracting exDNA, most likely because
of the improved extraction of DNA from rotifer eggs with the lysis buffer from the NucleoSpin protocol. The case
study A4 revealed that the qPCR amplification of several aquatic and terrestrial taxa targeted (bacteria, diatoms,
eukaryotes, plants, arthropods and vertebrates) led to similar amplification levels (lower quantification cycle (Cq) values
correspond to higher amplification success) when comparing inDNA and exDNA fractions obtained from a
modified Powersoil protocol, with the exception of arthropods that were found to be amplified preferentially from
intracellular DNA (Figure 5B). In addition, the use of the unmodified Powersoil protocol for extracting total DNA
was more efficient for detecting and amplifying sedaDNA from several biological groups (average Cq values lower).
For plant DNA in surface sediments, similar results were obtained in the case study A5 across four different lakes, in
which differences in the diversity retrieved from total DNA vs. exDNA (Taberlet2012) were investigated both for use
with the NucleoSpin kit and the PowerMax Soil kit. For samples from lakes surrounded by a high taxonomic diversity
of terrestrial plants, distinct differences in the number of plant molecular taxa retrieved were observed between ex-
traction protocols, with the unmodified PowerMax kit revealing the highest number of Molecular Operational Taxo-
nomic Units (MOTUs) (Figure 5C). In our case study A6, we show that a modified DNA extraction protocol designed
to release the mineral-bound sedaDNA from calcitic minerals using EDTA-based chelation provides higher richness
estimates of plant assemblages in calcite-rich sediments but was comparable for organic-rich sediments, though with
a lower level of reproducibility (Figure 5D). Finally, while investigating the number of positive PCR replicates re-
quired for the analyses of domesticated mammal DNA (ovine and bovine) in the case study A7, we reported higher
amplification success when using Amicon filters coupled with the Taberlet2012 protocol relative to using the stand-
ard approach (Taberlet2012 protocol) (Figure 5E).
Figure 5. (A) Case study A3. Proportion of reads within microbial eukaryotic and metazoan assemblages for DNA ex-
tracted with the NucleoSpin protocol (intracellular and extracellular DNA) and Taberlet 2012 protocol (extracellular DNA)
from sediments collected in four lakes (Bourget (LDB), Geneva (LEM), Lauzanier (LAZ), Serre de l'Homme (SDH)). For
each extraction protocol, two DNA extracts were obtained. (B) Case study A4. qPCR Amplification success (quantification
cycle (Cq) value) of DNA extracts obtained with different extraction methods: the standard PowerSoil kit protocol (PS
protocol), a Powersoil protocol coupled with Phosphate Buffer to extract the extracellular (exPS protocol) and intracellular
DNA (inPS protocol) fractions. Lower Cq values imply a higher qPCR amplification success (i.e., a lower number of PCR
cycles necessary to detect a signal above the fluorescence background). (C) Case study A5. Number of plant molecular
taxa identified in six surface sediments after DNA extraction with four different protocols: FastDNA® Spin Kit for soil
(FD), Favor Prep Soil DNA Isolation Midi Kit (FP), Nucleospin Soil Kit (NS) and PowerMax Soil DNA Isolation Kit (PM).
The samples were obtained from three small lakes from the Southern Taymyr Peninsula, Siberia, and from three locations
within Lake Karakul, Pamir Mountains, Tajikistan. The molecular taxa were split into three categories: terrestrial plants,
aquatic plants and bryophytes. (D) Case study A6. Detection of plant molecular taxa within six tested layers using an
original protocol using Qiagen DNeasy PowerSoil kit (Powersoil protocol) and an optimized protocol for calcite-rich sediments
(optimized protocol). M: minerogenic-rich; O: organic-rich; M-O: minerogenic–organic. Taxa are ordered and coloured
by functional group as shown with the bars on the extreme right: brown, woody taxa; yellow, graminoids; dark
green, forbs; light green, ferns/horsetails/club mosses; dark blue, aquatics; gray, non-vascular plants. (E) Case study A7.
Detection of domestic mammals (Bos sp. and Ovis sp.) in positive PCRs in two sediment samples from the Anterne Lake
(France) after DNA extraction with the Taberlet2012 protocol and a modified version of this protocol including a step with
Amicon® Ultra-15 10K centrifugal filters (Millipore) to concentrate the DNA extract (AmTaberlet2012 protocol). The two
samples were dated to the Roman age (2010 cal. years BP) and very recent years (–56 years BP). A total of 16 and 8 PCR
replicates were performed for the Taberlet2012 protocol and AmTaberlet2012 protocol, respectively.
3.7. Sediment Amount to Use for DNA Extraction
It makes sense to take as much sediment as possible to avoid a "nugget effect",
whereby DNA is heterogeneously distributed, and to obtain a sample that best represents the
variation present within a sample, although this depends on how heterogeneous the sed-
iments are and if larger samples have been homogenized before subsampling for extrac-
tion. Furthermore, trade-offs need to be made with other complementary analyses that
also require sediments. Many DNA extraction kits commonly used in sedaDNA studies
(e.g., DNeasy PowerSoil kit, NucleoSpin soil kits) are limited to smaller amounts of sedi-
ment (<2 g wet sediment). However, one of the most widely used extraction kits (Figure
4) is the DNeasy PowerMax Soil DNA Isolation kit, which can use up to 10 g of starting
material. By contrast, the Taberlet2012 protocol allows the extraction of DNA from a large
amount of sediment (e.g., 15 g of wet sediment) to which phosphate buffer is added. A
recent study by Kang et al. [195] showed that sediment mass input (0.5 g vs. 10 g) did not
affect the resulting diatom richness or community structure inferred from metabarcoding.
The case study A3 found differences in the total DNA concentration of eukaryotic se-
daDNA extracted when applying the Taberlet2012 protocol with different sediment
masses (0.75 and 4 g of wet sediment). Although only four lakes were examined, higher
sediment masses did not consistently lead to higher total DNA concentrations in the ex-
tracts (Figure A3 in Appendix A). Secondly, the predominant OTUs differed strongly be-
tween the two sediment masses for both micro-eukaryotic and metazoan groups: in lake
Lauzanier (LAZ), diatoms and rotifers were highly abundant in 0.75 g samples, whereas
Cercozoa, nematodes, and unclassified Stramenopiles and Metazoa were predominant in
4 g of material. In contrast, extraction using phosphate buffer (Taberlet2012 protocol) re-
sulted in a lower number of reads, poorer repeatability, and less diversity detected com-
pared to PowerMax Soil kit both for minerogenic [24] and organic sediments [25].
3.8. Molecular Methods for Generating sedaDNA Data
The method used to generate sedaDNA data is first and foremost constrained by the
ecological question of the specific project. We describe below the three main methods that
are currently used in sedaDNA research.
Targeted quantitative analysis: is used to detect and/or quantify specific taxa through
the use of methods such as qPCR and ddPCR. In qPCR, target relative abundance is quan-
tified to provide information about the occurrence of historical taxa. However, inhibition
during qPCR reactions can bias this quantification. Alternatively, this bias can be utilized
to quantify inhibition (e.g., [189]). Unlike qPCR, the recently developed ddPCR does not
require standard curves and inhibition assays, due to pre-amplification partitioning of
target templates into thousands of droplets of defined minute volumes where individual
PCR reactions will take place (see [196,197] for application to modern environmental sam-
ples). For this method, the detection limit is very low, which may be advantageous given
the issues that can be present in sedaDNA extracts. Meanwhile, as in environmental DNA
studies [167], the use of these methods should be validated by providing supporting data
to confirm that the DNA amplified truly corresponds to the target. Two options include
the use of TaqMan probes, increasing the specificity of binding to the targeted marker
region, or the sequencing of qPCR products.
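The difference between the two quantification strategies can be made concrete with a short Python sketch. It uses the standard relationships only: a qPCR standard curve of the form Cq = slope × log10(copies) + intercept, with efficiency 10^(−1/slope) − 1, and the Poisson correction used for droplet-based assays, in which the mean number of copies per droplet is −ln(1 − fraction of positive droplets). The function names and the default droplet volume of 0.85 nL are assumptions for illustration and should be replaced by the values for the instrument actually used.

```python
import math


def qpcr_efficiency(slope):
    """Amplification efficiency from a standard-curve slope (Cq vs log10 copies).

    A slope of about -3.32 corresponds to 100% efficiency (perfect doubling).
    """
    return 10 ** (-1.0 / slope) - 1.0


def copies_from_cq(cq, slope, intercept):
    """Estimate starting copy number from a Cq value using the standard curve."""
    return 10 ** ((cq - intercept) / slope)


def ddpcr_copies_per_ul(positive, total, droplet_volume_nl=0.85):
    """Poisson-corrected target concentration (copies per microlitre of reaction).

    positive / total: number of positive droplets and of accepted droplets.
    droplet_volume_nl: assumed droplet volume in nanolitres (instrument-specific).
    """
    if total <= 0 or not 0 <= positive < total:
        raise ValueError("need 0 <= positive < total and total > 0")
    # Mean number of target copies per droplet under a Poisson model
    copies_per_droplet = -math.log(1.0 - positive / total)
    return copies_per_droplet / (droplet_volume_nl * 1e-3)  # nL -> µL
```

Note that the ddPCR estimate needs no standard curve, whereas `copies_from_cq` is only as good as the curve (slope and intercept) it is given, and inhibition that shifts Cq values will shift the qPCR estimate with it.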
DNA metabarcoding: can be used to assess the diversity and composition of specific
assemblages (e.g., plankton, vegetation, fish) [167]. This method is based on the barcoding
principle, which consists of sequencing standardised markers that are conserved enough
to be specific to the target higher taxonomic group but variable enough to contain enough
information to discriminate lower taxonomic groups, such as species or genera [198].
Available reference sequences are compiled into databases to which the metabarcode se-
quences are compared for taxonomic assignment. Metabarcoding is a robust and powerful
tool that has been widely applied in sedaDNA studies (Table S1) [11,13,74,181,199]. The
power, but also limitation, of metabarcoding is the fact that it is PCR-based. This makes it
possible to amplify minute quantities of template molecules, but also introduces PCR and
amplification biases (e.g., [200]). Because the approach is targeted, reference databases can often be
relatively complete for specific biological groups, and bioinformatics becomes easier to handle.
Many samples can also be multiplexed in a single sequencing run. This is achieved by adding
unique combinations of forward and reverse primers that include
short unique tags of 6–15 nucleotides, which enables pooling of a large number of
PCR products that can be sequenced on the same run. After sequencing, the DNA reads
obtained can be demultiplexed by subsetting the DNA reads based on the tag associated
with each primer. Metabarcoding requires the template to be present in the sample, and,
if the fragments in the sample are too degraded, then an intact template, including the
primer binding sites, might no longer be present. Thus, generally short barcodes are tar-
geted, which may have lower taxonomic resolution compared to longer ones [201]. Fur-
thermore, this method does not retain the damage patterns found at or near the ends of
aDNA molecules, which are typically used to authenticate ancient DNA. This is because
primers, which are artificial constructs, would either bind to or exclude the ends of the
template molecules where damage is present. The damage signal is therefore lost during
PCR amplification. Nevertheless, it remains the most used approach to studies of many
eukaryotes as large numbers of samples can be run with comparatively low processing
efforts and costs (see [167]).
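The tag-based demultiplexing step described above can be sketched in a few lines of Python. This is a deliberately simplified illustration: it assumes reads are already oriented, tags have a fixed length, matching is exact, and the reverse primer and tag are given as they appear in the read. The tag and primer sequences and sample names are placeholders, not real primers. Dedicated tools handle mismatches, both orientations and tag-jumping, which this sketch does not.

```python
def demultiplex(reads, tag_pairs, fwd_primer, rev_primer_in_read, tag_len=6):
    """Assign reads to samples from their (forward tag, reverse tag) combination.

    Assumed read layout: fwd_tag + fwd_primer + barcode + rev_primer + rev_tag.
    tag_pairs: dict mapping (fwd_tag, rev_tag) -> sample name.
    Returns (dict sample -> list of barcode sequences, list of unassigned reads).
    """
    by_sample = {sample: [] for sample in tag_pairs.values()}
    unassigned = []
    min_core = len(fwd_primer) + len(rev_primer_in_read)
    for read in reads:
        fwd_tag, rev_tag = read[:tag_len], read[-tag_len:]
        core = read[tag_len:len(read) - tag_len]
        sample = tag_pairs.get((fwd_tag, rev_tag))
        if (
            sample is not None
            and len(core) >= min_core
            and core.startswith(fwd_primer)
            and core.endswith(rev_primer_in_read)
        ):
            # Strip primers so only the informative barcode region remains
            barcode = core[len(fwd_primer):len(core) - len(rev_primer_in_read)]
            by_sample[sample].append(barcode)
        else:
            unassigned.append(read)  # unknown tag combination or missing primer
    return by_sample, unassigned


# Placeholder tags and primers, for illustration only
tags = {("ACACAC", "GTGTGT"): "sample_A", ("TCTCTC", "GTGTGT"): "sample_B"}
example_reads = [
    "ACACAC" + "GGGC" + "ATCCG" + "CCTT" + "GTGTGT",  # sample_A
    "TCTCTC" + "GGGC" + "TTTAA" + "CCTT" + "GTGTGT",  # sample_B
    "GTGTGT" + "GGGC" + "ATCCG" + "CCTT" + "ACACAC",  # tag combination not in use
]
assigned, leftover = demultiplex(example_reads, tags, "GGGC", "CCTT")
```

Reads carrying a tag combination that was never used in the experiment end up in `leftover`; tracking how many such reads occur can be a useful indicator of tag switching between samples.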
Shotgun and target-enriched metagenomics: represent nascent but promising ap-
proaches to reconstructing past biodiversity preserved in sedaDNA [29,33,62,202]. The
immediate advantage of metagenomic over PCR-based approaches is that they can re-
solve the ultrashort DNA sequences that cannot be amplified by PCR but are characteristic
of the vast majority of sedaDNA molecules. In particular, shotgun metagenomics does not
have the taxonomic biases and blind-spots that are inherent to PCR approaches, and
which may preclude this latter approach for certain ecological questions and/or taxonomic
groups.
Unlike PCR-based approaches, metagenomic approaches allow for the ends of an-
cient DNA molecules to be sequenced. This allows one to differentiate between modern
DNA (from indigenous sediment microbiota or exogenous contamination) and sedaDNA
by identifying patterns of DNA damage that accumulate via age- and temperature-related
hydrolytic and oxidative decay (e.g., cytosine deamination, depurination-induced DNA
strand breakage) [29,33,62,81,118,127]. However, to retain damage patterns in metagenomic
libraries, it is necessary that the polymerase used during the indexing PCR step
does not stall when amplifying these templates. Great care is also required when selecting
a proofreading, high fidelity polymerase because they often cannot read through uracils.
Notably, polymerases such as PfuTurbo Cx HotStart (Agilent) or AccuPrime polymerase
(Thermo Fisher) are specially engineered to be able to amplify templates containing uracil
bases with minimal bias [203].
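To illustrate what "identifying patterns of DNA damage" means in practice, the sketch below tallies the frequency of C-to-T mismatches by position from the 5' end of reads, since cytosine deamination is expected to be most frequent near fragment ends. It is a minimal example under strong assumptions: alignments are supplied as gap-free (reference, read) string pairs on the read strand, and the function name and input format are invented for this illustration. Established tools that work directly on alignment files should be preferred for real data.

```python
def c_to_t_by_position(alignments, max_pos=10):
    """C->T mismatch frequency at each of the first max_pos positions of reads.

    alignments: iterable of (reference, read) pairs of equal length, gap-free,
        both written 5'->3' on the read strand (an assumed, simplified format).
    Returns a list of length max_pos; entries are None where no reference C
    was observed at that position.
    """
    c_total = [0] * max_pos
    c_to_t = [0] * max_pos
    for ref, read in alignments:
        if len(ref) != len(read):
            raise ValueError("reference and read must have the same length")
        for i in range(min(max_pos, len(ref))):
            if ref[i] == "C":
                c_total[i] += 1
                if read[i] == "T":
                    c_to_t[i] += 1  # reference C read as T: putative deamination
    return [m / t if t else None for m, t in zip(c_to_t, c_total)]


# Made-up alignments: two of three reads with a reference C at position 0 show T
example_alignments = [
    ("CACGT", "TACGT"),
    ("CCGTA", "TCGTA"),
    ("CAGTC", "CAGTC"),
    ("GCATC", "GCATC"),
]
frequencies = c_to_t_by_position(example_alignments, max_pos=3)
```

A frequency that is elevated at the first positions and decays towards the read interior is the pattern typically interpreted as consistent with ancient DNA; a flat profile is what would be expected from modern contaminant or indigenous microbial DNA.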
The phylogenomic resolving power of metagenomics, either through non-targeted
shotgun sequencing or targeted enrichment via hybridization-based capture, can be har-
nessed to reconstruct population-level diversity. Schulte et al. [202] demonstrated that an-
cient Siberian larch species can be resolved by designing hybridization probes based on
contemporary chloroplast genomes. To reconstruct an ancient algal population, Lammers
et al. [81] used an iterative-mapping approach to reconstruct full organellar genomes and
were able to distinguish multiple haplogroups. Fragment recruitment strategies, as used
by these studies, could also be deployed to reconstruct microbial ecotypes which, if match-
ing with high similarity to contemporary ecotypes that are sufficiently described, can lend
insights into the ecological niches of paleoenvironments [80].
Metagenomics is positioned to become a powerful approach to reconstructing histor-
ical lake systems through microbial functional analysis and may enable access to the func-
tional diversity of past microbiomes. For example, a novel approach to describing paleo-
oceanographic conditions through the functional analysis of Black Sea sediment meta-
genomes draws an interesting perspective on reconstructing microbial functions by lev-
eraging the depositional-age functional traits that survive in the metabolisms of extant
sediment microbiota [193].
When sufficiently comprehensive genomic reference databases are available, the de-
tection of species may be dramatically enhanced [29]. With the growth of these databases
the potential for identifying species using the shotgun metagenomics approach will
greatly increase. One such promising initiative is the recently launched Earth BioGenome
Project, which sets out to provide a high-quality genome inventory of eukaryotic life on Earth
[148]. Currently, there is a strong bias towards mammals, agricultural and medicinal
plants, and pathogenic microbes in available databases. However, this will change as an
increasing number of large-scale genome-skimming datasets (i.e., shallow shotgun sequencing of
genomes to reconstruct multicopy markers and organellar genomes) become available
[204,205]. Thus, we expect metagenomics-based approaches to be increasingly used in the
near future.
Published comparisons between methods: In the wider molecular ecology literature,
comparisons between these methods have been performed on environmental DNA sam-
ples and mock communities. For example, Wood et al. [206] found that ddPCR had the
highest detection rate of the Mediterranean fanworm in water and biofouling samples
when compared with qPCR and metabarcoding (which had the lowest detection rate) ap-
proaches. qPCR was also found to have a higher detection rate over metabarcoding by
Harper et al. [207] in an eDNA study; however, other studies have emphasized the power
of metabarcoding for discovering members of the community which are not anticipated
at the stage of selecting more specific primers [208]. The choice between metabarcoding
and metagenomics is equally blurred. Studies have generally shown that metabarcoding
provides slightly more accurate assignment [209,210], with higher detection frequencies
[211], but may also generate more spurious Molecular Operational Taxonomic Units (MO-
TUs) than metagenomic approaches. We note that the performance of metagenomics will
almost certainly improve further as genomic reference databases become more complete.
Differences in taxonomic composition can be observed between metabarcoding and met-
agenomics datasets, especially if the DNA is very degraded [189,212]. In all cases, com-
parisons will be context-dependent, relying on parameters such as DNA degradation, se-
quencing depth, primer selection, detection thresholds, available reference libraries, and
appropriate use of controls.
3.9. DNA Markers and Reference Databases Used in Current sedaDNA Research
sedaDNA from lake sediments is degraded into short fragments. For instance, shot-
gun libraries from Holocene lake sediments consisted mainly of DNA reads between 30
and 100 bp [33,62], although fragments up to 560 bp have been found in 10,000-year-old
sediments in the Black Sea [11]. Thus, targeting long barcodes (typically >150 bp; [213]),
which generally have higher taxonomic resolution [198,214,215] is not a viable option for
sedaDNA approaches, even if shorter barcodes generally have lower taxonomic resolu-
tion [201,216,217]. An alternative is to use multiple barcodes and compare the results [48].
If shotgun metagenomics is used, then a representative sample of all DNA fragments pre-
sent in an extract is sequenced, independent of the length.
Nevertheless, as mentioned in Section 3.8, there will be a bias in the data related to
representation in the reference databases [29]. This bias increases the risk of random
matches to model organisms that are well covered in reference libraries [81], although
fortunately, methods have recently been developed to address this issue (e.g., [218]). For
any of the methods, there is a real risk of false positive assignment due to incomplete
reference libraries or inappropriate filtering (see Section 3.10). This risk is inversely related
to the taxonomic resolution of the marker, as there is a higher chance of obtaining incorrect
assignments for a conserved sequence than a highly specific one. An error in the reference
library will furthermore have a greater effect the more conserved the marker is.
Here, we propose DNA markers that can be used to study sedaDNA for various types
of organisms (Table 1). For each type of organism, we identify the current most-suitable
DNA marker to use but acknowledge that as overall genomic resources increase
[148,219,220], the potential for developing new markers, or using full organellar and/or
nuclear genomes, also increases.
Table 1. Recommended DNA markers for sedaDNA research. For sedaDNA applications, we recommend that markers of
<150 bp are targeted (see Section 3.9). Amplicon length includes primer binding sites. Microbial eukaryotes include uni-
cellular fungi.
Organisms                   | Genic region | Length (bp) | Reference
----------------------------|--------------|-------------|----------
plants                      | trnL P6 loop | 49–188      | [201]
mammals                     | 16S mtDNA    | 60–84       | [31]
fish                        | 12S gene     | 163–185     | [221]
zooplankton                 | 18S V7       | 100–110     | [222]
zooplankton                 | COI gene     | 313         | [223]
microbial eukaryotes        | 18S V1-V3    | 560         | [11]
microbial eukaryotes        | 18S V4       | 300–540     | [78]
microbial eukaryotes        | 18S V7       | 260         | [76]
microbial eukaryotes        | 18S V9       | 130         | [79]
fungi                       | ITS2 region  | 250–500     | [224]
diatoms                     | rbcL gene    | 67–76       | [105]
diatoms                     | rbcL gene    | 577         | [142]
diatoms                     | 18S V4       | 400         | [225]
archaea                     | 16S gene     | 220         | [91]
bacteria                    | 16S gene     | 194         | [97]
ammonium-oxidizing bacteria | amoA gene    | 635         | [91]
cyanobacteria               | 16S gene     | 400         | [74]
methanotrophs               | 16S gene     | 111–200     | [97]
dsDNA virus                 | mcp gene     | 260         | [88]
DNA markers to reconstruct past vegetation changes: Most environmental aDNA
studies of plants target the short and variable P6 loop of the trnL (UAA) intron [14,44,201],
which is not a standard plant barcode (CBOL 2009, Ref. [214]). There is a well-curated trnL
(UAA) intron reference library (ArctBorBryo) containing 2445 sequences of 815 arctic and
835 boreal vascular plants [50,226], as well as 455 low-arctic bryophytes [227]. Recent
large-scale genome skimming reference libraries have been generated for Norway/Polar
regions, the European Alps, the Carpathians (n = 6655, Ref. [25]), China (n = 1659, Ref.
[204]) and Australia (n = 672, Ref. [205]). These new reference libraries give not only the
full nuclear ribosomal DNA and chloroplast genome sequences (including the P6 loop),
but also allow for improved detection using shotgun metagenomics [29] and the design of probes
for target enrichment [189,202].
DNA markers to detect mammalian presence in a lake catchment: Several universal
mammalian primer sets were initially designed for ancient DNA analyses from animal
remains [228], which were then applied to environmental samples, such as coprolites, fro-
zen soils, and cave sediments [43,229,230]. Ultimately, a large set of mitochondrial univer-
sal and species-specific markers were developed [43,229,231–234]. Because it is near-im-
possible to avoid modern human contamination, universal primers have to be combined
with a human-blocking probe to inhibit the exponential amplification of human DNA
templates (such as in [235]). In lake sedaDNA studies, a new universal primer, MamP007,
leading to the amplification of a fragment of the mitochondrial 16S rRNA gene, was pro-
posed and has been applied successfully [31,35,39,51,59,60]. Because of the low number of
differences between this mammalian primer and the binding sites for avian species and
clitellate worms, as well as the generally low mammal sedaDNA template content in lake
sediments, MamP007 was also able to amplify these non-target groups of taxa [64], along
with fish taxa and amphibians [179,236]. Consequently, the mammalian DNA of targeted
species has to be considered as “rare” for lake sedaDNA applications. The scarcity of
mammalian aDNA in lake sediment archives will also depend on biomass, behavior, hu-
man practices for domestic herbivores, and as explained in Section 2.1, the transfer capac-
ity in the catchment–lake system [39,64]. The quantity of mammalian DNA can thus
strongly vary from one site to another and according to the animals of interest.
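The reason a "universal" primer can also amplify non-target groups is simply that few positions differ between the primer and the binding sites of those groups. The following Python sketch counts mismatches between a primer (which may contain IUPAC degenerate bases) and candidate binding sites. The primer and site sequences below are invented placeholders, not the actual MamP007 sequences, and the mismatch threshold is an arbitrary assumption; whether a given number of mismatches permits amplification depends on their position and on PCR conditions.

```python
# Standard IUPAC nucleotide codes and the bases each one matches
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def primer_mismatches(primer, site):
    """Number of positions where the site base is not allowed by the primer.

    Both sequences are assumed to be given on the same strand and orientation.
    """
    if len(primer) != len(site):
        raise ValueError("primer and binding site must have the same length")
    return sum(base not in IUPAC[p] for p, base in zip(primer.upper(), site.upper()))


def possibly_amplified(primer, sites, max_mismatches=2):
    """Taxa whose binding site has at most max_mismatches (hypothetical cutoff)."""
    result = {}
    for taxon, site in sites.items():
        n = primer_mismatches(primer, site)
        if n <= max_mismatches:
            result[taxon] = n
    return result


# Placeholder sequences for illustration only
hypothetical_primer = "ACGTRYCC"
hypothetical_sites = {
    "target mammal": "ACGTACCC",
    "non-target bird": "ACGTGTCA",
    "non-target plant": "TCGAACGG",
}
candidates = possibly_amplified(hypothetical_primer, hypothetical_sites)
```

In this toy case the bird site differs from the primer at a single position and would be flagged as a likely co-amplified non-target, which is the situation described above for avian and clitellate templates.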
DNA markers to study the past diversity of aquatic organisms: To date, 16S rRNA
genes have been the most common target used to investigate ancient bacterial and ar-
chaeal diversity in metabarcoding studies [18,66,91,94–97,104]. For instance, 16S rRNA
and amoA genes allowed for the detection of ammonium-oxidizing Archaea and inference
of past variation in nutrient and salinity levels in lake sediments [91]. Total bacteria, type
I methanotrophs, type II methanotrophs and the NC10 phylum were traced using primers
for amplicons between 111 and 200 bp long encoding their 16S rRNA genes to study past
methane oxidation across <2000 years freshwater sediment profiles [94–97]. For cyanobac-
terial assemblages, Monchamp et al. [73,74,84] used a 16S primer set (400 bp) specifically
designed for cyanobacteria by Nübel et al. [237].
To track past changes in the diversity and composition of microbial eukaryotic com-
munities (including phytoplankton and fungi), the 18S rRNA gene V7 region marker used
in Capo et al. [76] offers a good tradeoff between taxonomic resolution and fragment
length (260 bp). It has been proven to capture past modifications related to environmental
change [34,38,76,141]. Coolen et al. [11] targeted the V1–V3 region of 18S rRNA genes
while Kisand et al. [78] and More et al. [79] targeted 18S V4 and V9 region, respectively.
Specific databases, such as PR2 [238] and SILVA [239], can be used to identify microbial
eukaryotes from any of the 18S genetic regions. In addition, the ITS region has been pro-
posed as a DNA marker for fungal specific barcoding from environmental samples [240].
Such DNA markers have been used, for instance, by Ortega-Arbulú et al. [241], targeting ITS1,
and by Tõnno et al. [242], targeting ITS2, to assign fungal taxonomy against the UNITE database
[243].
Diatom communities in environmental DNA can be traced by targeting
the chloroplast rbcL gene [105,144,147,244] and the nuclear 18S V4 region [245] because
both markers are well represented in current reference databases, which facilitates the
assignment of sequences to lower taxonomic levels. The short rbcL metabarcode (67–76
bp) [105] has been shown to facilitate specific amplification of diatoms from tropical [105]
and arctic lake sediments [143,146,147], as well as marine deposits [246]. Moreover, the
amplification of larger (577 bp) rbcL fragments could be achieved from more recent sedi-
mentary deposits [142,247], while the amplification of the hypervariable V4 region of the
18S results in the detection of additional taxa next to diatoms [248], which include other
Stramenopiles, Alveolata and Rhizaria. A more specific diatom amplification with 18S V4
is enabled when applying the marker to filtered water DNA [249] or modern biofilm sam-
ples [250]. In addition, primers targeting a 230-bp region of the viral major capsid protein
(mcp) gene have been used to study the diversity of dsDNA algal viruses
(Phycodnaviridae) in Holocene sediments [88].
Complete zooplankton assemblages have not yet been extensively studied from sed-
imentary archives. Studies have targeted specific groups i.e., copepods [45,123] and roti-
fers [83], but studies employing universal primers are rare. A combination of both 18S and
COI primers has been used to analyze eukaryotic organisms [251] but, to our knowledge,
no work has been published to specifically study sedaDNA for zooplankton assemblages
across multiple phylogenetic groups. One complication is that the V7 region of the 18S
rRNA gene of Daphnia (and probably other cladoceran taxa) is longer than that of other
zooplankton species [252], which may bias PCR amplification and/or sequencing when this
marker is used to reconstruct past zooplankton communities. Additionally, in the
case study A3 (Box 1, Appendix A), 18S rRNA gene V7 region amplicon sequencing
showed that microeukaryotes dominated in terms of taxonomic richness (76 to 96% of the
MOTUs are represented by microeukaryotes).
There are few sedaDNA studies that have successfully applied metabarcoding tech-
niques to fish. Published studies have focused on short single-marker PCR or qPCR ap-
proaches, designed for species such as Perca flavescens [253], Coregonus lavaretus [85], and
Oncorhynchus sp. [86]. Evaluations of markers have been carried out in the eDNA litera-
ture for applications such as biomonitoring [254,255] and seafood identification [256].
Achieving a balance between specificity and breadth in a marker has proven challenging
due to the diversity of fish taxa and ecotypes; however, several studies have highlighted
the potential utility of the 12S ribosomal region. Collins et al. [254] found that a 12S rRNA
gene marker achieved higher levels of universality and taxonomic discrimination, com-
bined with lower levels of non-specific amplification when compared with COI, 16S
rRNA, and cytb genes. Although COI has a much greater reference database coverage, due
to being the universal metazoan barcode [198], many reads are assigned to non-target
groups making it difficult to obtain reproducible signals for fish taxa, which will be espe-
cially problematic given the low amounts of fish DNA compared to bacterial or planktonic
groups in sediments. The MiFish-U 12S primers, in combination with a high annealing
temperature, have been used to successfully amplify a hypervariable 163–185 bp fragment
of fish DNA in surface sediments [221], although applications to older sediments remain
uncertain.
3.10. Bioinformatic Filtering and Analysis of sedaDNA Data
While a comprehensive review of bioinformatic methods for sedaDNA goes beyond
the scope of this work, we provide here an overview of some of the main tools employed
as background to our case studies and we discuss some of the specific implications related
to study design and selection of laboratory methods recommended in Section 4. Currently,
there are no standard procedures or criteria for data “filtering” or “cleaning”, for either
metabarcoding or shotgun metagenomics data, because the development and eventual
standardization of bioinformatic filtering criteria are still in their infancy. Generally,
taxa that are detected more frequently in negative controls than in samples can be
considered suspect and should be removed from the data set. Regardless of
the procedure applied, “true” presences can be removed (type II error), and “false”
presences retained (type I error) [82,183]. To limit these errors, sedaDNA data can be
cross-validated by comparing it with independent proxies [22,24,39,82]. When no cross-
validation is possible, one can set the threshold where the majority of retained taxa are
biogeographically likely to be true positives whereas the taxa filtered out are not expected
in the sample [27,47]. However, such an approach has the potential to be overly conserva-
tive and bias the taxonomic inventory toward the present. The detection of a given suspect
taxon in the same sample as taxa with similar ecological preferences can also be used to
validate the presence of this suspect taxon [39]. For metagenomic datasets, taxa with very
low read counts (e.g., n = ~3–5 per sample or 1 count in each of 3–5 samples) are generally
excluded as spurious and controls thoroughly screened [62]. In metagenomic data sets, it
is also imperative that a strict minimum fragment length cutoff (e.g., 35 bp) is applied to
reduce spurious alignments [257]. Where doubt remains, it is important that the filtering
procedure does not impact the conclusions of the study [25].
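The filtering heuristics described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a standard pipeline: the function name `filter_taxa`, the dictionary layout of the read tables, and the default threshold of three reads are all hypothetical choices made for the example.

```python
from collections import defaultdict


def filter_taxa(sample_counts, control_counts, min_reads=3):
    """Return taxa retained after two heuristic filters.

    sample_counts / control_counts: dict mapping a sample (or control) name
    to a dict {taxon: read_count}. This layout is hypothetical.
    """
    def detection_frequency(tables):
        # Fraction of tables in which each taxon has at least one read.
        hits = defaultdict(int)
        for counts in tables.values():
            for taxon, n in counts.items():
                if n > 0:
                    hits[taxon] += 1
        total = max(len(tables), 1)
        return {taxon: k / total for taxon, k in hits.items()}

    freq_samples = detection_frequency(sample_counts)
    freq_controls = detection_frequency(control_counts)

    retained = set()
    for taxon, f_sample in freq_samples.items():
        # Filter 1: suspect if detected more often in controls than in samples.
        if freq_controls.get(taxon, 0.0) > f_sample:
            continue
        # Filter 2: require at least one sample at or above the read threshold
        # (the threshold value is an assumption of this sketch).
        if max(c.get(taxon, 0) for c in sample_counts.values()) < min_reads:
            continue
        retained.add(taxon)
    return retained


# Hypothetical read tables for three samples and two negative controls.
samples = {
    "S1": {"Betula": 120, "Homo": 2, "Salix": 1},
    "S2": {"Betula": 80, "Salix": 2},
    "S3": {"Betula": 95},
}
controls = {
    "NEG1": {"Homo": 15},
    "NEG2": {"Homo": 9},
}
kept = filter_taxa(samples, controls)
print(sorted(kept))
```

In this toy data set, one taxon is dropped because it is more frequent in the controls and another because its read counts are very low. Any real threshold should be checked against its effect on the conclusions of the study, as noted above.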
A major issue with the lack of standard bioinformatic pipelines to analyze DNA
metabarcoding or shotgun metagenomics data is that slightly different procedures may
lead to contrasting results [167,258–260]. Many sedaDNA studies use OBITools [261] to
analyze DNA metabarcoding data (e.g., [16,31,35,199]), using mainly the EMBL nucleotide
database or a custom reference database depending on the target taxonomic group or ge-
ographic region. This package, along with similar packages such as the Anacapa toolkit
[262], contains various functions allowing for a comprehensive control of different sec-
tions of the data handling pipeline from read alignments, cleaning and filtering (i.e., re-
moval of potential sequence variants generated by sequencing errors) to taxonomic clas-
sifications with simple user-customizable parameters. For shotgun metagenomics data
obtained from sedaDNA, an example emerging procedure is the Holi pipeline [62]. By
mapping shotgun reads to the full EMBL genomic and/or nucleotide database, which en-
compasses all realms of life, and subsequently assigning those matches to the lowest com-
mon ancestor taxa, Holi greatly diminishes the risk of misassignments [62]. In addition to
the cleaning, merging, mapping, and annotation steps provided by the tools compiled in
this pipeline, tools such as PMDtools [263] and mapDamage [264] can be used to identify
ancient DNA present in alignments and ultimately differentiate between modern and ancient
DNA molecules, thereby ensuring the reliability of the sedaDNA signal obtained, as
described in Section 2.3.
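The lowest-common-ancestor principle mentioned above can be illustrated with a toy example. This sketch is not the Holi implementation; the function name and the lineages are hypothetical, and real pipelines work on taxonomy identifiers and alignment scores rather than plain name lists.

```python
def lowest_common_ancestor(lineages):
    """Return the deepest taxon shared by all lineages.

    Each lineage is a list of taxon names ordered from root to tip.
    """
    if not lineages:
        return None
    shared = []
    for names in zip(*lineages):
        # Stop at the first rank where the database hits disagree.
        if len(set(names)) != 1:
            break
        shared.append(names[0])
    return shared[-1] if shared else None


# Hypothetical lineages for three database hits of a single read.
hits = [
    ["Eukaryota", "Viridiplantae", "Betulaceae", "Betula", "Betula pendula"],
    ["Eukaryota", "Viridiplantae", "Betulaceae", "Betula", "Betula pubescens"],
    ["Eukaryota", "Viridiplantae", "Betulaceae", "Alnus", "Alnus incana"],
]
assignment = lowest_common_ancestor(hits)
print(assignment)
```

A read matching several species equally well is therefore reported at the rank where its hits converge, which trades taxonomic resolution for a lower risk of misassignment.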
The choice of reference library used to annotate DNA sequences is a key component
for obtaining reliable ecological information. Indeed, a major source of false positives may
also be due to errors in reference libraries or sequence sharing combined with local species
lacking in the reference library. Hence, the choice between the use of curated and non-
curated reference databases (using more liberal alignment parameters) and the use of re-
gionally local reference databases (like that made for the Norway/Polar regions and the
European Alps) is of utmost importance to avoid misidentifications.
Details on statistical data analysis for processed sedaDNA data (univariate and mul-
tivariate) are beyond the scope of the present review. We instead direct readers to some
key publications providing relevant recommendations about this step of the workflow
[188,265–267].
4. Recommendations
In this section, we use state-of-the-art developments in lake sedaDNA research iden-
tified from our synthesis of previous studies and compiled case studies to provide meth-
odological recommendations for future sedaDNA work. Because of the high variability in
the sedaDNA signal from lakes with contrasting catchment morphologies as well as nu-
merous other factors influencing the transport of DNA from sources to sediments, its ex-
tractability and amplification, it remains difficult to provide clear and concise guidelines
about how to collect, analyze, and interpret sedaDNA data. However, we aim for this
effort to further promote and guide future sedaDNA research, which will result in more
robust reconstructions of past changes in aquatic and terrestrial biodiversity and offer
predictions of the consequences of current and future climatic and environmental changes
on biota.
Lake selection: Select the lake based on ecological questions and, if needed, adapt
protocols to improve the ability to extract and recover DNA from the studied sediments.
As a coring campaign is a major effort, especially in remote areas, we suggest, as a pilot
study, analyses of surface sediments to test the amenability of the sedimentary record for
sedaDNA analysis. If studying DNA from terrestrial organisms, it is important to first
consider the size and topography of lake catchment and the hydrographic network to es-
timate the efficiency of DNA transfer to the lake following detailed recommendations
from Giguet-Covex et al. [39]. Sediment lithology (e.g., clay mineral content) may also
impact the preservation potential of DNA in lake sediment [101], but further study is re-
quired to create a predictive framework for lithology-based site selection.
Field replicates: The ideal number of sediment cores to sample depends on the target
taxon and question. If one wants to assess the signal from unequally distributed large
organisms (e.g., fish), the use of spatial field replicates may be beneficial for the detection
of your target species. For investigation of terrestrial or more evenly distributed aquatic
organisms such as plankton, a single core taken in the central part of the lake may suffice.
Core-site replicates: If the target organisms are rare or remotely-located, the use of
multiple replicate cores from the same site is recommended to increase the probability of
detection. However, taking multiple cores is time-consuming and costly and requires
inter-calibration between cores for dating. Coring with broad-diameter tubes
(e.g., 90 mm) may be an alternative to core-site replicates to increase the amount of mate-
rial collected.
Analytical replicates: When enough material (i.e., sediment) is available, use multi-
ple analytical replicates to ensure the reliability of the data obtained. When material is
limited: if targeting abundant biological organisms (i.e., plankton), two to three analytical
replicates (extraction and/or PCR replicates) should suffice to capture their sedaDNA sig-
nal; if targeting rare or remotely-located organisms, the use of at least six to eight analyt-
ical replicates is recommended. For further discussion about how to calculate the suitable
number of replicates needed for statistical considerations, we direct readers to relevant
publications [167,188].
Handling of analytical replicates: Extraction replicates can be pooled prior to PCR
step and PCR replicates can be multiplexed with the same tagged primers to reduce costs
and increase the number of reads per sample. However, such an approach will result only
in presence–absence data and will not allow for assessments of replicability or semi-quan-
tification for the results. If replicability data are desired, then it is essential that different
tagged primer sets are used so that replicates can be separated and analyzed inde-
pendently in silico.
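When replicates carry different tagged primer sets, their reads can be separated in silico and replicability summarised per taxon, for instance as the proportion of replicates in which a taxon is detected. The following is a minimal sketch of that summary; the function name, data layout, and read counts are hypothetical.

```python
def replicate_detection(replicate_counts, min_reads=1):
    """Proportion of PCR replicates in which each taxon is detected.

    replicate_counts: list of dicts {taxon: read_count}, one dict per
    replicate (hypothetical layout, one per tagged primer combination).
    """
    n = len(replicate_counts)
    taxa = set().union(*replicate_counts) if replicate_counts else set()
    return {
        taxon: sum(rep.get(taxon, 0) >= min_reads for rep in replicate_counts) / n
        for taxon in taxa
    }


# Four hypothetical PCR replicates of one sediment extract.
replicates = [
    {"Pinus": 340, "Rangifer": 3},
    {"Pinus": 290},
    {"Pinus": 410, "Rangifer": 1},
    {"Pinus": 305},
]
proportions = replicate_detection(replicates)
print(proportions)
```

Pooling the same four replicates under a single tag would only show that both taxa were present, without revealing that one of them appeared in half of the replicates.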
Storage of cores and sediment subsampling: For core storage, recreate in situ con-
ditions in the sediment column (cold, dark, anoxic) as much as possible. Perform sub-
sampling immediately after core collection, or immediately after splitting the core in half.
After the core liners are cut, a wire cutter is often used to split the core, which often results
in contaminated surfaces of the core halves. It is therefore essential that these surfaces are
removed using sterile implements, so that an unexposed layer is sampled. Sterile syringes
with the ends cut off with sterilized scissors, blades or a heated wire can serve as mini
corers to obtain subsamples (e.g., procedure outlined by Pedersen et al. 2016). The col-
lected material can then be transferred to sterile tubes, while avoiding the potentially
cross-contaminated smeared surface (e.g., [18]). Directly store the subsampled sediment
at −20 °C or proceed to DNA extraction. For vascular plant and mammalian sedaDNA
analysis, the core can be stored cool, ideally unopened, for several years prior to sampling,
although secondary growth by fungi, for example, may lead to reduced efficiency of tar-
geted sedaDNA recovery. To avoid freeze–thaw cycles of the sediment, sub-sample the
sediment in multiple tubes for later extraction, if needed.
Monitoring of contamination: Consider using a synthetic DNA or exotic amplicon
tracer during coring. Follow established protocols for minimizing contamination in an-
cient DNA laboratories [164]. Include and sequence negative controls from sampling, ex-
traction, and downstream steps, as a minimum. Compare sequences/taxa that appear in
both the controls and samples—these should, at a minimum, be treated with caution or
removed entirely. Whenever possible, perform a multi-proxy approach with diagnostic
macroscopic, microscopic, cellular, and/or DNA markers for specific taxa to cross validate
the sedaDNA approach, especially if results are unexpected or ground-breaking. Use an-
alytical methods that remove putative contaminants from the dataset.
Choice of DNA extraction protocol: In most cases, the following protocols should
efficiently extract DNA from sediments: Powersoil (Pro) kit for small amounts of sediment
(approx 0.25 g), PowerMax or FastDNA kit for larger sample sizes (up to 10 g), although
the sediment type (organic-, clay-, carbonate-rich, etc.) should also inform the protocol
used. Different extraction protocols may yield significantly different richness and relative
abundances in taxa and genes, and should therefore be selected following preliminary
analyses for suitability, notably where the heterogeneity between sample types and re-
constructed assemblages is expected to be important. For shotgun sequencing approaches
on very old sediments, it is better to use a protocol that retains ultrashort DNA fragments,
such as the protocol from Pedersen et al. [62] or Murchie et al. [189]. Combine DNA in-
ventories obtained from different DNA extraction protocols to maximize your chances to
detect a specific target and/or robustly reconstruct the studied biological assemblage.
Choice of the data generation method: It depends on the question and target organ-
ism(s) of interest. Shotgun metagenomics can give an indication about the degradation
state of the DNA molecules, which can be necessary when data authentication is required
(e.g., working with microbes that can live in, or contaminate, lake sediments), but may not
be strictly necessary if interested in terrestrial organisms or aquatic macroorganisms.
However, metabarcoding and quantitative methods (qPCR and ddPCR) can be used to
process more samples at lower cost and are therefore recommended for long time series
or large data sets, provided there is sufficient sedaDNA preservation. If DNA preserva-
tion is poor and molecules of the target taxa are likely to be rare in the sedaDNA mixture,
then a target enrichment approach is recommended.
Supplementary Materials: The following are available online at www.mdpi.com/2571-550X/4/1/6/s1.
Author Contributions: Conceptualization, E.C., C.G.-C., A.R., K.N., P.D.H., A.V., M.J.L.C., L.S.E.,
I.D., I.G.A. and L.P.; Methodology, E.C., C.G.-C., K.N., P.D.H., A.V., M.J.L.C., L.S.E., I.D., I.G.A.,
L.P., W.O., I.G.-E., A.G.B., L.H., U.H., H.H.Z., G.F.F. and P.T.; Formal Analysis, E.C., C.G.-C., K.N.,
P.D.H., A.V., M.J.L.C., L.S.E., I.D., I.G.A., L.P., W.O., I.G.-E., A.G.B., L.H., U.H., L.H., H.H.Z., G.F.F.,
M.-E.M. and P.T.; Investigation, E.C., C.G.-C., K.N., P.D.H., A.V., M.J.L.C., L.S.E., I.D., I.G.A., L.P.,
W.O., I.G.-E., A.G.B., L.H., U.H., H.H.Z., G.F.F. and P.T.; Writing Original Draft Preparation, E.C.,
C.G.-C., A.R., K.N., P.D.H., A.V., D.A., F.A., S.B. (Simon Belle), S.B. (Stefan Bertilsson), C.B., R.B.,
A.G.B., C.L.C., S.E.C., D.D., G.E., G.F.F., R.E.G., J.G., I.G.-E., L.H., U.H., A.I., V.K., K.H.K., Y.L., J.L.,
E.M., M.-E.M., F.O., W.O., M.W.P., D.P.R., J.R., T.S., K.R.S.-L., P.T., L.T., C.T., D.A.W., Y.W., E.W.,
A.v.W., H.H.Z., M.J.L.C., L.S.E., I.D., I.G.A. and L.P.; Writing Review & Editing, E.C., C.G.-C., A.R.,
K.N., P.D.H., A.V., D.A., F.A., S.B. (Simon Belle), S.B. (Stefan Bertilsson), C.B., R.B., A.G.B., C.L.C.,
S.E.C., D.D., G.E., G.F.F., R.E.G., J.G., I.G.-E., L.H., U.H., A.I., V.K., K.H.K., Y.L., J.L., E.M., M.-E.M.,
F.O., W.O., M.W.P., D.P.R., J.R., T.S., K.R.S.-L., P.T., L.T., C.T., D.A.W., Y.W., E.W., A.v.W., H.H.Z.,
M.J.L.C., L.S.E., I.D., I.G.A. and L.P.; Visualization, E.C., C.G.-C., K.N., P.D.H., A.V., M.J.L.C., L.S.E.,
I.D., I.G.A., L.P., W.O., I.G.-E., A.G.B., L.H., U.H., L.H., H.H.Z., G.F.F. and P.T.; Supervision,
M.J.L.C., L.S.E., I.D., I.G.A. and L.P.
Funding: This work was supported by funding from the Knut and Alice Wallenberg Foundation
(grant 2016.0083), the Swedish Research Council for Sustainable Development Formas (grant FR-
2016/0005), the Norwegian Research Council (grant 250963/F20, “ECOGEN”), the Deutsche
Forschungsgemeinschaft (grants OR 417/1-1 to W.O. and VU 94/1-1 to A.V.), the German
Research Foundation (grant EP98/2-1 to L.S.E.), the Australian Research Council Discovery
(grant DP160102587), and the RTG 164 (PhD scholarship to L.H.). The computations and data handling
of case studies A1 and A4 were enabled by resources provided by the Swedish National Infrastruc-
ture for Computing (SNIC) at UPPMAX partially funded by the Swedish Research Council through
grant agreement no. 2018-05973.
Data Availability Statement: For case studies A1, A3, A4, A5, A7, raw data are available at
https://doi.org/10.6084/m9.figshare.13007279.v1. For case study 1, the raw amplicon sequencing
reads are not yet published but might be shared on request. Contact Kevin Nota
(kevin.nota@ebc.uu.se) or Laura Parducci (laura.parducci@uniroma1.it). For case study A2, the 16S
data are available in SRA BioSample accessions SAMN13324854 to SAMN13324920 and from the
European Nucleotide Archive (http://www.ebi.ac.uk/ena) under study accession number
PRJEB14484. For case study A6, raw data are available within amplicon libraries JIE4 and AOHL-
3-8 under project accession PRJEB39329 on the European Nucleotide Archive (ENA). Tag sequences
and the bioinformatic filtering scripts are freely available on GitHub at
https://github.com/pheintzman/metabarcoding and https://github.com/Y-Lammers/MergeAndFilter,
respectively. A table
from case study A6 is available at https://doi.org/10.6084/m9.figshare.13007279.v1.
Acknowledgments: For case studies A1 and A4, we thank the Swedish Phytogeographical Society
(SVS) for funding this work through the B. Lundman’s fund for botanical studies scholarship. For
case study A3, we thank Cécile Chardon and Louis Jacas (UMR CARRTEL INRAE, France) for the
laboratory work. For case study A5, we thank Claudia Havel for assistance in the
lab and Paolo Ballota, Zafar Mahmoudov and Rasmus Thiede for assistance during fieldwork. For
case study A6, we thank Francisco Javier Ancin Murguzur for assistance in the field, Iva Pitelkova
for conducting LOI, Enrique Tejero Caballo and Karina Monsen for support with XRF scanning and
high-resolution imagery, and Youri Lammers for assistance with bioinformatics. For case study A7,
we thank Ludovic Gielly for his assistance in labwork. We would also like to thank ASTERS, the
manager of the Haute Savoie natural reserves, for constant help since the onset of the palaeoenvi-
ronmental studies on Lake Anterne. We are grateful for the useful comments and suggestions from
three anonymous reviewers.
Conflicts of Interest: The authors declare no conflict of interest.
Appendix A
Appendix A.1. Case Study A1—Effects of Sediment Type on PCR Amplification Success
By Kevin Nota 5 and Laura Parducci 5,31
When amplifying DNA using PCR, or performing any other enzymatic reaction, in-
hibitors co-extracted with DNA can have adverse effects on performance. Depending on
the concentration and type of the inhibitor and the particular enzymes, effects can be
highly variable; for example, some enzymes are more sensitive than others.
The aim of this case study is to establish if PCR inhibition of lake sedaDNA is related
to the type of sediment, here ranging from minerogenic to organic. Briefly, we sub-sampled
138 sediment layers from a single varved lake sediment record (Lago Grande di Monticchio,
southern Italy; time period ~1993–31,190 years before present, yr BP). DNA was
extracted using the PowerSoil kit (Qiagen) with some modifications (see Material and
Methods section). The PCR inhibition was analyzed using a synthetic oligonucleotide (85
bp, including unique primer binding sites) spiked in qPCR reactions with standardized
amounts (0.0001 pM). To evaluate the level of inhibition, the DNA extracts were diluted
until no PCR inhibition was detected for the standardized amount of synthetic oligonu-
cleotide. In particular, we defined the removal of the effect of PCR inhibitors when the
qPCR amplification curve of the synthetic oligonucleotide was the same as for a qPCR
reaction with ultrapure water instead of the DNA template. In addition, the effects of PCR
inhibitors on the success of a DNA metabarcoding approach (here number of raw
reads/DNA sequences) were studied using primers developed to amplify vascular plant
communities (trnL-p6 loop, [201]). The Zymo OneStep PCR inhibitor removal kit was
tested on ten samples with varying degrees of inhibition to investigate whether adding
this purification step after DNA extraction with the PowerSoil kit improves qPCR ampli-
fication success.
qPCR was used for testing inhibition, rather than using the regular PCR used for
metabarcoding. This was done because qPCR has the advantage of visualizing the PCR
amplification in real-time. This allows for scoring of the inhibition much more accurately
and reduces hands-on lab time because an agarose gel is not necessary for the visualization
of PCR success. One disadvantage of qPCR is that its polymerases tend to be more sensitive
to inhibitors than those commonly used for metabarcoding. Dilution factors obtained in the
qPCR are, therefore, not directly applicable to more robust PCR enzymes, which are
developed to work with more challenging samples. However, the qPCR does give a
representation of relative levels of inhibition between samples. This is highly valuable
information for PCR optimization, because it is possible to select the most inhibited
samples to test the metabarcoding protocol. The idea behind this is that if a PCR protocol
works for the most inhibited sample in the core, the protocol will likely also work for the
remaining samples.
PCR inhibition was largely detected in sediments younger than 25,000 years (Figure A1).
However, within this interval, no clear relationship was found between inhibition and
sediment type, which varied from minerogenic at the end of the late-glacial period
(~15,000 years) to organic in the late Holocene period (c. 2000 years ago). In addition,
PCR inhibition was
not significantly correlated with total organic content (TOC) (results not shown, TOC data
were available for samples between 11,000 and 22,600 yr BP) indicating that PCR inhibi-
tion is not related to the proportion of organic compounds in the sediment. One key result
of this case study is the strong negative correlation (r² = 0.659, p < 0.01) between PCR
inhibition and the number of DNA amplicons sequenced (Figure A2). In contrast, no
correlation was detected between the DNA concentration and PCR inhibition. DNA
concentration was also negatively correlated to age (r² = 0.608, p < 0.01). The Zymo
OneStep PCR inhibitor removal kit was able to reduce the dilution factor required to remove
inhibition by ~50% (Table A1). Passing the extract through multiple Zymo OneStep columns,
or running the extract multiple times through the same column, did not improve the
efficiency of inhibitor removal.
Overall, it is unclear what exactly the inhibition signal represents. It might be related
to the presence of certain plant taxa that produce different levels of inhibitors that are
more difficult to remove during DNA extraction. Although we do not know the cause and
origin of inhibition, we should recognize its importance during performance of DNA
metabarcoding. It is therefore very important to understand the levels of inhibition pre-
sent in sediment samples in order to either dilute DNA extracts or preferably optimize
PCR reactions to guarantee the best amplification and avoid bias during metabarcoding
analyses. Adding a Zymo purification step after extraction gives, in most cases, a reduc-
tion in inhibition but does not remove it all.
Figure A1. Correlations between (A) plant read count and PCR inhibition level (B) sediment age and DNA concentration.
Figure A2. Comparison of inhibition in PowerSoil extracts with or without additional purification: the Zymo column used
once according to the kit manual (green), two separate clean columns (light blue), and the extract run three
times through the same Zymo column (pink).
Table A1. DNA recovery after cleaning with the Zymo OneStep Inhibitor Removal Kit.
                                                 DNA Concentration in ng/µL
Sample    Age          Inhibition           Before     After      Difference   Recovery
          (Years BP)   (Before Cleaning)    Cleaning   Cleaning                (fraction)
Mon 163   31,191       0                    0.21       0.16       0.05         0.75
Mon 157   28,151       0                    3.94       3.06       0.88         0.78
Mon 144   21,676       1                    0.25       0.2        0.06         0.78
Mon 136   17,467       5                    0.38       0.37       0.01         0.98
Mon 127   14,476       5                    0.42       0.42       0.01         1.02
Mon 121   14,181       5                    0.38       0.38       0.01         0.98
Mon 115   13,933       20                   0.37       0.38       0.01         1.04
Mon 100   12,779       2                    0.66       0.6        0.06         0.91
Mon 064   10,271       10                   0.89       0.88       0.01         0.99
Mon 063   10,008       2                    0.83       0.81       0.02         0.98
Mon 050   7510         10                   1.26       1.16       0.1          0.92
Mon 047   6964         5                    1.07       0.97       0.1          0.91
Mon 040   5462         2                    1.75       1.52       0.23         0.87
Mon 027   3425         10                   1.28       1.17       0.11         0.91
Mon 025   3054         1                    2.9        2.5        0.4          0.86
Mon 020   2061         1                    2.38       2.07       0.31         0.87
Mean                                                                           0.91
Material and Methods
DNA extraction—All 138 sediment samples were homogenized by vortexing before
DNA extraction. DNA was extracted using the PowerSoil kit (Qiagen) with the following
modifications. After bead-beating, 2 µL of 20 mg/mL Proteinase K and 25 µL of 1 M
dithiothreitol (DTT) were added to the bead-beating tubes, which were incubated overnight
at 37 °C in a rotating incubator. The volume of Solution C3 was increased from 200 µL to
250 µL and that of Solution C4 from 1200 µL to 1400 µL. Samples were incubated for 10 min
at room temperature before eluting twice in 60 µL elution buffer containing 10 mM Tris-HCl
and 0.05% Tween-20. Bead-beating was done on a “normal” vortexer at the highest speed for
10 min by taping the tubes horizontally on the vortex.
Inhibitor control template design—The oligonucleotide used as template for the
inhibition control was designed by generating random 85-nucleotide fragments in R with a
GC content close to 50%. Candidate sequences were checked for secondary structures using
the IDT oligo analyzer (https://eu.idtdna.com/calc/analyzer). Sequences that did not show
any secondary structures were blasted against all sequences present in GenBank. We
selected and further used only sequences with no matches in GenBank. The primer binding
sites were manually edited to create the most “optimal” primer pair (Table A2). The
85-nucleotide “Inh_Nota85” oligos were synthesized as primers, diluted to 0.0001 pM and
used as the template in qPCR.
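The first design step, generating random candidates with a GC content near 50%, can be sketched as follows. The case study used R; this sketch uses Python instead, and the function names, the tolerance of ±5 percentage points, and the fixed seed are assumptions made for the example. The secondary-structure and GenBank checks described above are not reproduced here.

```python
import random


def gc_fraction(seq):
    """Fraction of G and C bases in a DNA sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def random_candidates(length=85, n=5, gc_target=0.5, tolerance=0.05, seed=1):
    """Generate random DNA sequences whose GC content is near the target."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    candidates = []
    while len(candidates) < n:
        seq = "".join(rng.choice("ACGT") for _ in range(length))
        # Keep only sequences within the (assumed) GC tolerance.
        if abs(gc_fraction(seq) - gc_target) <= tolerance:
            candidates.append(seq)
    return candidates


candidates = random_candidates()
for seq in candidates:
    print(seq, round(gc_fraction(seq), 3))
```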
Table A2. Nucleotide sequences of primers (In_CF_85, In_CR_85) and the 85 bp oligonucleotide (Inh_NotaTemplate_85)
designed to be used as an inhibition control. Primer binding sites are underlined. Tm is the annealing temperature.
Oligo Name            Sequence                                                                                GC%    Tm (°C)
In_CF_85              ACGGAGTGCGGTCTTAATGG                                                                    55     57.3
In_CR_85              GGTACGGGTCTGTCGGATAG                                                                    60     56.5
Inh_Template_Nota85   ACGGAGTGCGGTCTTAATGGCGTTCAATTGCGTTAATTGACGGCTCGAGTGTCCCCTACATCTTGCTATCCGACAGACCCGTACC   52.9   72.7
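The sequences in Table A2 can be checked for internal consistency with a short script. This is only a sanity check written for illustration: it confirms the template length, its GC content, and that the forward primer and the reverse complement of the reverse primer sit at the two ends of the template. The helper name `reverse_complement` is hypothetical.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


forward = "ACGGAGTGCGGTCTTAATGG"
reverse = "GGTACGGGTCTGTCGGATAG"
template = (
    "ACGGAGTGCGGTCTTAATGGCGTTCAATT"
    "GCGTTAATTGACGGCTCGAG"
    "TGTCCCCTACATCTTGCTATCCGACAGACCCG"
    "TACC"
)

length = len(template)
gc_percent = 100 * sum(base in "GC" for base in template) / length
# The forward primer should match the 5' end of the template, and the
# reverse primer should bind the 3' end (i.e., match its reverse complement).
forward_site = template.startswith(forward)
reverse_site = template.endswith(reverse_complement(reverse))
print(length, round(gc_percent, 1), forward_site, reverse_site)
```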
Inhibition testing with qPCR amplification—All qPCR reactions were run on the
same qPCR machine (CFX96, BioRad). qPCR amplification was performed in 10-µL reaction
volumes containing 1X TATAA SYBR® GrandMaster Mix, 0.5 µM forward primer, 0.5 µM reverse
primer, and 0.1 pM inhibitor control oligo. The thermocycling program was as follows: 30 s
at 95 °C, followed by 50 cycles of 5 s at 95 °C, 30 s at 55 °C, and 10 s at 72 °C. After the
qPCR cycles, a melt curve was obtained by increasing the temperature from 60 to 95 °C in
0.5 °C increments every 5 s. qPCR amplifications were tested using 3 µL of undiluted DNA
extract, 1 µL of undiluted extract, and 1 µL of extract diluted 2X, 5X, 10X, 15X, or 20X.
The dilution factor was used as the inhibition value (e.g., an inhibition score of 20 was
given to samples that first amplified as expected with a 20X dilution).
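The scoring rule can be written down explicitly. The sketch below is one possible reading, not the authors' script: "amplified as expected" is taken to mean a quantification cycle (Cq) within a fixed tolerance of the water control, scores of 0 and 1 are assumed to stand for 3 µL and 1 µL of undiluted extract, and the Cq values are invented for illustration.

```python
def inhibition_score(cq_by_condition, cq_water, tolerance=0.5):
    """Lowest tested condition at which the spike-in amplifies as expected.

    cq_by_condition: dict {score: Cq of the synthetic oligo}, where the key
    is the dilution factor (0 and 1 are assumed here to stand for 3 µL and
    1 µL of undiluted extract). A Cq of None means no amplification.
    """
    for score in sorted(cq_by_condition):
        cq = cq_by_condition[score]
        # "As expected" is taken to mean a Cq within `tolerance` cycles of
        # the water control; this criterion is an assumption of the sketch.
        if cq is not None and abs(cq - cq_water) <= tolerance:
            return score
    return None  # still inhibited at the highest dilution tested


# Hypothetical Cq values for one extract across the tested conditions.
observed = {0: None, 1: None, 2: 34.8, 5: 31.9, 10: 30.2, 15: 30.1, 20: 30.0}
score = inhibition_score(observed, cq_water=30.0)
print(score)
```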
Metabarcoding—Metabarcoding was done using the trnL g/h primers [201]. PCRs were
carried out in 20-µL volumes containing 0.04 U/µL Platinum™ II Taq Hot-Start DNA
Polymerase (Invitrogen), 1X Platinum™ II PCR Buffer, 0.2 mM of each dNTP, 0.2 µM tagged
forward primer, 0.2 µM tagged reverse primer, and 4 µL of DNA extract. The thermal cycler
program was as follows: 94 °C for 2 min, followed by 55 cycles of 94 °C for 30 s, 55 °C
for 30 s, and 68 °C for 15 s. Both forward and reverse primers contained eight-nucleotide
tags at the 5′ ends. We used 96 unique dual tags in both primers. PCRs were set up in
96-well plates; each plate contained 2 × 40 samples randomly
distributed in the plate, 4 PCR negatives, 2 × 4 extraction negatives; 4 wells were kept
empty to control for background primer contamination (4 tagged primers are not used,
their presence in the sequences would indicate primer contamination). In total 8 × 120
samples were amplified over 12 96-well plates. From all PCR reactions on a plate, 10 µL
was taken and pooled together. Three times 100 µL pooled PCR products were purified
using the MinElute PCR Purification kit (Qiagen), DNA was eluted in 20 µL and the three
purifications from the same pool were pooled together. DNA was quantified using Qubit,
and ~100 ng of pooled DNA was used for library prep using the Carøe et al. [268]
single-tube protocol with the following modifications: the reaction volume was increased to
40 µL; a MinElute clean-up was done after adapter ligation, rather than after the fill-in
step, to remove adapter dimers; and no clean-up was done after the fill-in step. The index
PCR contained the following: 2.5 U Pfu Turbo polymerase (Agilent Technologies), 0.2 mM of
each dNTP, 1X
Pfu reaction buffer, 0.2 µM of both index primers [269]. Each plate was indexed with
unique dual indexes, plates 1–4, 9–12 were indexed for 10 cycles, and plates 5–8 for 16
cycles (low copy number of libraries, highly likely due to PCR inhibitors). Plates 1–4, 5–8,
and 9–12 were pooled together in equal volume and sequenced on 3 MiSeq lanes using v2
300 cycle paired-end chemistry (Table A3). Sequencing was done at SciLifeLab, Uppsala
(Sweden).
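The plate design described above can be checked arithmetically and turned into a randomized layout. In this sketch, "2 × 40 samples" is read as 40 samples in two PCR replicates per plate, which is an interpretation; the function name and the choice to shuffle controls together with samples are simplifications made for the example.

```python
import random

WELLS_PER_PLATE = 96
layout = {
    "sample": 2 * 40,              # read here as 40 samples x 2 replicates
    "extraction_negative": 2 * 4,
    "pcr_negative": 4,
    "empty": 4,                    # wells for the unused tag combinations
}
# The four categories should fill a 96-well plate exactly.
assert sum(layout.values()) == WELLS_PER_PLATE

n_plates = 12
total_sample_wells = n_plates * layout["sample"]


def randomized_plate(layout, seed=0):
    """Assign well contents to random positions on a 96-well plate."""
    rows, cols = "ABCDEFGH", range(1, 13)
    wells = [f"{r}{c}" for r in rows for c in cols]
    contents = [kind for kind, n in layout.items() for _ in range(n)]
    random.Random(seed).shuffle(contents)
    return dict(zip(wells, contents))


plate = randomized_plate(layout)
print(total_sample_wells, len(plate))
```

The total of 960 sample wells over 12 plates agrees with the 8 × 120 sample amplifications reported in the text.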
Zymo OneStep Inhibitor Removal Kit—For 10 samples ~100 µL of the extract was run
through the Zymo inhibitor removal kit following the instructions of the manufacturer.
For two of the samples, the extract was run through the inhibitor removal column three
times. Inhibitor tests as described before were run after every flowthrough. For another
two samples, the extract was run through two separate Zymo columns and inhibitor re-
moval assay was run after every flowthrough. For 16 samples the DNA concentration was
measured with Qubit (ds DNA high sensitivity, Invitrogen) before and after purification,
see Table A1. The mean recovery of DNA after purification was 91%. For some samples
the recovery was nearly 100%; however, the three oldest samples had a recovery between
75% and 78%.
Table A3. Metabarcoding plate set-up.
| Plate | Sample Age Range | Number of Samples | Extraction Negatives | PCR Negatives | Index Cycles |
|---|---|---|---|---|---|
| Lane 1 | 14,149–31,191 yr BP | 4 × 2 × 40 | 4 × 2 × 4 | 4 | 10 |
| Lane 2 | 11,520–14,132 yr BP | 4 × 2 × 40 | 4 × 2 × 4 | 4 | 16 |
| Lane 3 | 1993–11,458 yr BP | 4 × 2 × 40 | 4 × 2 × 4 | 4 | 10 |
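The plate layout described in this case study can be checked with a simple well count. The following is a minimal Python sketch, assuming that each plate holds 2 × 40 samples (one reading of the "4 × 2 × 40" entries in Table A3 as four plates per lane); the function and variable names are illustrative and not part of the original workflow.

```python
WELLS_PER_PLATE = 96
N_PLATES = 12

def plate_budget(samples=2 * 40, extraction_negatives=2 * 4,
                 pcr_negatives=4, empty_wells=4):
    """Count the wells used per category on one plate (assumed layout)."""
    budget = {
        "samples": samples,
        "extraction_negatives": extraction_negatives,
        "pcr_negatives": pcr_negatives,
        "empty_wells": empty_wells,
    }
    # Total is computed before the "total" key is added to the dictionary
    budget["total"] = sum(budget.values())
    return budget

budget = plate_budget()
total_samples = N_PLATES * budget["samples"]

# Both checks should hold if the assumed per-plate layout is right
print(budget["total"] == WELLS_PER_PLATE)
print(total_samples == 8 * 120)
```

Under this assumption the categories fill a 96-well plate exactly, and twelve plates give the 8 × 120 samples stated in the text.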
Appendix A.2. Case Study A2—Secondary Growth of Metabolically Versatile and Facultative
Anaerobes during Sediment Storage and Handling of Sediment Cores
By Aurèle Vuillemin 7,8 and William D. Orsi 7,8
Anoxic sediments from the abyssal Northern Atlantic Ocean and ferruginous Lake
Towuti, Indonesia, experienced inadvertent short exposure to oxygen during sampling,
storage and aliquoting. This resulted in substantial secondary growth of metabolically
versatile and facultative anaerobes in the samples. Oxidation of pore water redox sensitive
species is difficult to monitor and has the consequence of replenishing energetically more
favorable terminal electron acceptors for microbial respiration (i.e., O₂, NO₃⁻, Fe³⁺, SO₄²⁻)
from their initially reduced counterparts (e.g., Fe²⁺, NH₄⁺, H₂S, H₂, CO₂). Although facultative
anaerobes are normally minor in pristine samples, fast growers are known to outcompete
slow growers in sediment incubations [270] and thereby rapidly mask the taxonomic
assemblage corresponding to in-situ conditions. Here, we briefly report two cases of sec-
ondary growth in anoxic sediments due to unexpected pore water oxidation.
In the first case, the inner part of a piston core consisting of anoxic abyssal clay was
subsampled on board using sterile end-cut syringes that were directly frozen in liquid
nitrogen. The end of the syringe was systematically discarded [271]. We extracted and
sequenced total DNA in four successive biological replicates [160]. Although sediments
were immediately stored back at −80 °C, pore water oxidation occurred due to partial
thawing of the sample, which allowed secondary growth to resume briefly with each
freeze–thaw cycle (Figure 3A in main text). The taxonomic assignment pointed to
microorganisms from the water column that remained viable in the sediment, for instance
Alteromonas and Alcanivorax among Gammaproteobacteria, Comamonadaceae (e.g.,
Variovorax, Hydrogenophaga) among Betaproteobacteria and Sphingomonadaceae (e.g.,
Sphingobium, Sphingomonas) among Alphaproteobacteria. These taxa are all known
hydrogen-oxidizing oxalotrophic bacteria, using oxalate (C₂O₄²⁻) as both carbon and energy
source [272].
In the second case, anoxic iron-rich sediments were retrieved via gravity coring and
aliquoted into successive 2-cm and 5-cm sections inside a glove bag filled with nitrogen
[104]. The sediment rims were discarded using a sterile spatula, the remainder of each
section transferred into aluminum-foil bags flushed with nitrogen and hermetically heat-
sealed. The upper samples of one core were transferred into Falcon tubes instead and
sealed with plastic foil. For DNA extraction, sediments were pressed out through a hole
poked in the bag, not letting any air in, and heat-sealed again, whereas Falcon tubes were
not hermetic enough to fully prevent oxygen diffusion into the sample. Because bottom
waters in tropical Lake Towuti are 28 °C throughout the year [104], we stored all samples
at room temperature to keep them close to in situ conditions. Ammonia is also a redox
sensitive species in pore water, whose oxidation replenishes energetically favorable elec-
tron acceptors, mostly nitrate and nitrite. The subsequent reduction of these electron ac-
ceptors via denitrification can be coupled with microbial oxidation of iron, implying that
secondary growth can still proceed once anoxic conditions are restored in the sample.
Here, we compare pristine and oxidized sample replicates (Figure 3B in main text) and
show that secondary growth resulted in a doubled amount of extracted intracellular DNA
[159]. The main taxa that actively grew during the four months of sample storage were
identified as Betaproteobacteria, including iron-oxidizing Gallionella and the facultatively
anaerobic photoheterotroph Rhodocyclus. These strains perform iron oxidation coupled to
nitrate reduction and are often misinterpreted as obligate instead of facultative anaerobes.
In addition, obligate anaerobic Deltaproteobacteria and Firmicutes such as Desulfovibrio,
Desulfosporosinus and Pelotomaculum also grew in the samples due to metabolic versatility
in the use of terminal electron acceptors (e.g., NO₃⁻, Fe³⁺, SO₄²⁻, S₂O₃²⁻) [153].
To conclude, we draw attention to the presence of metabolically versatile and facul-
tative anaerobes that remain viable in the sediment under anoxic conditions. We warn of
rapid oxidation of pore water that would result in increased bacterial respiration rates
with rapid secondary growth in the samples. For successive DNA extractions, we advise
heating a sterile blade and cutting the whole frozen sediment at once into single-use
2–3 g aliquots to be kept at −80 °C.
Appendix A.3. Case Study A3—Variability in Eukaryotic Inventories across Different DNA
Extraction Protocols
By Isabelle Domaizon 29,30, Eric Capo 1, Charline Giguet-Covex 2 and Irene Gregory-Eaves 18
We evaluated how the use of different DNA extraction methods affected DNA in-
ventories obtained for eukaryotic assemblages. Our case study focused on recent sediment
samples (less than 100 years old) collected from four lakes. We compared samples from
two large (>44 km²) deep hard-water lakes (Bourget (LDB), Geneva (LEM)) and two small
(<0.1 km²), shallow and organic-rich high-altitude lakes (Lauzanier (LAZ), Serre de
l’Homme (SDH)). For each lake, the DNA was extracted using two protocols: the
NucleoSpin Soil extraction kit (NucleoSpin protocol, hereafter NS) and the Taberlet2012 protocol
(hereafter PB), the latter with two different sediment masses (0.75 and 4 g). Extraction
duplicates were performed for the three protocol variants (NS-0.75g, PB-0.75g and PB-4g), resulting
in a total of six DNA extracts per lake. An 18S metabarcoding approach was used
to compare DNA inventories obtained for eukaryotic diversity, including microbial
eukaryotes (e.g., bacillariophyta, chlorophytes, ciliates) and metazoans (metazooplankton,
oligochaeta, teleostei).
Differences were detected in both total DNA concentrations and the composition of
eukaryotes across the DNA extraction protocols (Figure A3). For the two deep lakes (LDB
and LEM), the NS-0.75g and PB-0.75g protocols resulted in similar DNA concentrations
(maximum of 503 ng g⁻¹ wet sediment for one LEM sample with NS-0.75g), while the PB-4g protocol
yielded much higher DNA concentrations (from 900 to 2205 ng g⁻¹ wet sediment). In contrast,
higher DNA concentrations were obtained for the two shallow high-altitude lakes with the
NS-0.75g protocol than with the PB protocol at either 0.75 or 4 g (at least 2.5 and 6 times higher
for LAZ and SDH, respectively). In terms of the composition of the assemblages, we found
that micro-eukaryotic DNA reads made up 65 to 99% of inventories
(with the exception of one NS-0.75g LDB replicate). In the two deep peri-alpine lakes, the
richness in microbial eukaryotes and metazoans was higher for DNA extracted
using the NS protocol than for the PB protocol with either 0.75 or 4 g of sediment,
while the opposite pattern was found for the two high-altitude
lakes. We found no clear relationship between MOTU numbers and either the type of
extraction or the amount of sediment extracted.
The structure of the eukaryotic assemblages (here shown as hierarchical clustering
trees) from the two deep peri-alpine lakes appears to be primarily associated with the DNA
extraction protocol, whereas the quantity of sediment used was a more distinguishing
variable for the two shallow high-altitude lakes (Figure A3). The composition of both
micro-eukaryotic and metazoan inventories varies between lakes but also between DNA extraction
protocols. For example, ciliate reads were more abundant in PB-0.75g than NS-0.75g
inventories in LDB, but the opposite pattern was observed for LEM and SDH. Similarly,
higher bicosoecidan read numbers were observed in the LEM inventory obtained from
the NS-0.75g protocol than from the PB-0.75g protocol (the opposite for LAZ). Nevertheless,
similarities were observed between inventories from low sediment masses (0.75 g) compared
to the 4 g extraction protocol, with the predominance of bacillariophyta reads in LAZ in
the 0.75 g protocols. Nematodes were more easily detected in inventories from low (0.75 g)
rather than high sediment masses (4 g). Additionally, we noticed a predominance of
rotifers within the metazoan DNA reads when the NS protocol was used for LEM, LAZ and
SDH but not LDB.
Altogether, our findings highlight that the eukaryotic community DNA signal is sen-
sitive to the extraction protocol applied and the initial sediment sample mass extracted.
Finally, the differences in signal also vary depending on the lakes studied (here four dif-
ferent lakes), which may be related to the composition of the microbial communities and
the type of lake (and their catchment) studied, which could in turn have an effect on
pre/post depositional taphonomic processes.
Figure A3. For each sample, total DNA concentration (in ng g⁻¹ sediment), number of microbial eukaryotic
and metazoan MOTUs, and proportion of reads from microbial eukaryotic and metazoan groups. Samples were obtained
from two large (>44 km²) deep peri-alpine lakes (Bourget (LDB), Geneva (LEM)) and two small (<0.1 km²), shallow,
high-altitude lakes (Lauzanier (LAZ), Serre de l’Homme (SDH)). A color code distinguishes the three DNA extraction
treatments (red for NS-0.75g, green for PB-0.75g and blue for PB-4g). For each lake, hierarchical clustering analysis
was performed independently from the total MOTU abundance table for all DNA inventories. NS: NucleoSpin protocol;
PB: Taberlet2012 protocol.
Material and Methods
Study site—Lake Bourget (45°44′ N, 5°52′ E; 18 km long, 2.8 km wide, maximum depth
145 m, mesotrophic, id: LDB) and Lake Geneva (46°26′ N, 6°33′ E; 73 km long, 14 km wide,
maximum depth 310 m, mesotrophic, id: LEM) are peri-alpine hard-water lakes. While
Lake Bourget is the largest French natural freshwater reserve, Lake Geneva, located on
the border between France and Switzerland at the northern end of the French Alps, is the
largest European lake. These lakes have experienced increases in nutrient inputs,
especially phosphorus, since the 1950s. Lakes Serre de l’Homme (44°46′28.3″ N, 6°23′45.4″ E;
2235 m asl, 0.25 ha lake surface area, 1.95 ha catchment surface area, 1.5 m maximum
depth, id: SDH) and Lauzanier (44°22′43.9″ N, 6°52′20.4″ E; 2285 m asl, 3.85 ha lake surface
area, 148 ha catchment surface area and 8 m maximum depth, id: LAZ) are small and
shallow subalpine lakes located in the Southern French Alps (Parcs naturels des Ecrins
and du Mercantour, respectively). Their catchments are made of sandstones; for
Lauzanier, marls and calcareous rocks are also present. Sedimentological, geochemical and
DNA analyses performed on Serre de l’Homme showed high development of aquatic
plants in the last three hundred years, which led to higher organic matter accumulation
in the sediments [39]. The ecological state of Lake Lauzanier changed substantially
during the last 2000 years, with a marked increase in productivity and thus in organic
content in the last 60 years [273]. In addition to the geology, size and depth, these
high-altitude lakes differ from the peri-alpine lakes by the presence of ice cover for
approximately 7 months each year.
Sediment sampling—Sediment cores from Lake Bourget and Lake Geneva were sam-
pled during the ANR Iper Retro program. The detailed sampling and subsampling pro-
cedures were reported in previous articles where sedimentary DNA was analyzed (e.g.,
[40]). Sediment cores from lakes Serre de l’Homme (SDH 09 P1; N° IGSN: IEFRA00AW)
and Lauzanier (LAZ 12 P1; N° IGSN: IEFRA0082) were taken in 2009 and 2012, respec-
tively. The sampling procedure is described in previous articles (e.g., [39]) as well as in
protocols.io (dx.doi.org/10.17504/protocols.io.bdwsi7ee). Ages and sedimentological fea-
tures of sediment samples are presented in Table A4.
Table A4. Information about sediment samples used in this work.
| Lake | Age (yr BP) | Sediment Features |
|---|---|---|
| Lauzanier | 30 to 7 | 15 to 20% of organic matter (OM), 3 to 3.5% of carbonate, >75% of loss on ignition (LOI) residue |
| Serre de l’Homme | 36 to 24 | 52 to 62% of OM, 2.3 to 2.5% of carbonates, >35.5% of LOI residue |
| Bourget | 25 to 6 | 4 to 7% of organic matter (OM), 45 to 60% of carbonates |
| Léman | 30 to 8 | 4 to 9% of organic matter (OM), 35 to 58% of carbonates |
DNA extractions—For each lake, DNA was extracted from two sediment layers using
the three following DNA extraction protocols: the NucleoSpin Soil kit (NS protocol) with
0.75 g of sediment (NS-0.75g), the Taberlet2012 protocol (PB protocol) with 0.75 g (PB-
0.75g) and 4 g of sediment (PB-4g). The three extraction protocols were performed in du-
plicate for each lake. All protocols (NS and PB) are based on the same approach, with five
main steps: cell lysis (NS) or DNA desorption (Taberlet2012), filtration through a pre-col-
umn (to retain contaminants), precipitation and fixing of DNA on a membrane, washing
of the membrane and elution of extracted DNA. For the NS protocol, DNA extraction was
performed on approximately 0.75 g of wet sediment using the NucleoSpin® Soil kit ac-
cording to the manufacturer’s instructions (Macherey-Nagel, Düren, Germany). This kit
includes a step for cell lysis to release intracellular DNA (both chemical treatment via the
use of lysis buffer and physical treatment via the use of magnetic beads), but is limited in
terms of amount of sediment that can be treated (less than 1 g). For the Taberlet2012 pro-
tocol, DNA extraction was performed on approximately 0.75 and 4 g of wet sediment by
using the procedure adapted from Taberlet et al. [191] and Giguet-Covex et al. [31]. The
detailed protocols are accessible at https://dx.doi.org/10.17504/protocols.io.beenjbde for
extraction of 4 g of sediment and https://dx.doi.org/10.17504/protocols.io.betsjene for
extraction from 0.75 g of sediment. As a first step, phosphate buffer (0.12 M Na₂HPO₄; pH
8) is used to desorb DNA attached to particles. For the extraction of 0.75 g of sediment, we
then used the NucleoSpin® Soil kit (Macherey-Nagel). For the extraction of 4 g of sediment, the
NucleoSpin® Plant II Midi kit (Macherey-Nagel) was used to fix the desorbed DNA;
however, we also used the SB buffer (solution from the NucleoSpin Soil kit) to precipitate
the DNA. After elution, DNA was kept at −20 °C until PCR and downstream analysis. The
bulk DNA concentration was estimated using a Nanodrop ND-1000 Spectrophotometer
(Thermo Scientific, Wilmington, DE, USA).
PCR amplifications, high-throughput sequencing—A 260-bp long fragment of the V7
region of the 18S rRNA gene was PCR amplified from ca. 25 ng of environmental DNA
extract for each sample using the general eukaryotic primers 960f
(5′-GGCTTAATTTGACTCAACRCG-3′) [274] and NSR1438
(5′-GGGCATCACAGACCTGTTAT-3′) [275]. Each PCR was performed in duplicate in a total volume of 25 µL
containing 3 µL of 10X NH₄ reaction buffer, 1.2 µL of 50 mM MgCl₂, 0.25 µL of BioTaq
(Bioline), 0.24 µL of 10 mM dNTP, 0.36 µL of 0.5 µg/µL BSA and 1 µL of each primer (500 nM).
The amplification conditions consisted of an initial denaturation at 94 °C for 10 min
followed by 35 cycles of 1 min at 94 °C, 1 min at 55 °C and 90 s at 72 °C. The amplicons were
then subjected to a final 10-min extension at 72 °C. Resulting PCR products were
quantified using the Quant-iT PicoGreen kit (Invitrogen, Carlsbad, CA, USA). Finally, the 80
amplicons were pooled at equimolar concentrations, purified using the Illustra™ GFX™ PCR DNA
and Gel Band Purification Kit (GE Healthcare Life Sciences, Vélizy-Villacoublay, France)
and sent to Fasteris SA (Geneva, Switzerland) for library preparation and paired-end (2 ×
250 bp) sequencing on a MiSeq Illumina instrument (San Diego, CA, USA).
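The reagent volumes listed for the 25 µL reaction can be converted to final concentrations with the usual dilution rule (final = stock × volume added / total volume). The following is a minimal Python sketch of that arithmetic, using only the stock concentrations and volumes stated above; the dictionary keys and function name are illustrative, and whether the 10 mM dNTP stock refers to each nucleotide or to the mix is not specified in the text.

```python
TOTAL_VOLUME_UL = 25.0

# (stock concentration, volume added in µL), values as stated in the text
reagents = {
    "MgCl2_mM": (50.0, 1.2),
    "dNTP_mM": (10.0, 0.24),
    "BSA_ug_per_uL": (0.5, 0.36),
}

def final_concentration(stock, volume_ul, total_ul=TOTAL_VOLUME_UL):
    """Dilution rule: C_final = C_stock * V_added / V_total."""
    return stock * volume_ul / total_ul

final = {name: final_concentration(c, v) for name, (c, v) in reagents.items()}

# Buffer, MgCl2, polymerase, dNTP, BSA and the two primers (µL)
listed_volumes = [3.0, 1.2, 0.25, 0.24, 0.36, 1.0, 1.0]
# Volume left for template DNA and water
remaining_ul = TOTAL_VOLUME_UL - sum(listed_volumes)

for name, value in final.items():
    print(f"{name}: {value:.4g}")
print(f"remaining volume: {remaining_ul:.2f} µL")
```

The remaining volume is what is available for the ca. 25 ng of template and water in each reaction.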
Bioinformatics and data analysis—The paired-end reads were merged using
UPARSE tools (option -fastq_mergepairs with a minimum overlap of 150 and no
mismatch allowed) [276], yielding 5,847,791 raw DNA sequences. These
DNA sequences were then submitted to three cleaning procedures: (i) no undefined bases
(Ns), (ii) a minimum sequence length of 200 bp, and (iii) no sequencing error in the
forward and reverse primers. Putative chimeras were detected by UCHIME [277].
Combining these cleaning procedures with a read length (2 × 250 bp) covering the entire
amplicon length (~220 bp without the primers) meant that each base was sequenced twice,
drastically reducing sequencing errors. After this cleaning step and the
demultiplexing process, the remaining DNA sequences were clustered at a 95% similarity
threshold with UPARSE 7.0 [276] to obtain the seed MOTUs (option -cluster_fast). The
taxonomic affiliation was performed by BLAST against the SSURef SILVA database [278]
after application of the following selection criteria: length >1200 bp, quality score >75%
and a pintail value >50; the SILVA database was enriched with lacustrine DNA sequences
originating from various studies on lacustrine systems [279–282]. The taxonomy of a
MOTU corresponded to the best hit given by the similarity search. If a MOTU was
associated with several best hits (hits with the same identity), then its taxonomy was the
common taxonomy of these hits, also called the lowest common ancestor. MOTUs affiliated
to embryophytes, cercozoa nuclear and unclassified Eukaryota were removed from these
analyses in order to consider only unicellular eukaryotes, fungi and metazoa.
The two molecular inventories obtained from one SDH DNA extract
(protocol PB-0.75g) had low read abundances and were thus removed
from further analysis of the dataset. For each sample, the molecular inventories were
merged and MOTUs detected in only one of the two duplicates were discarded. In
addition, all MOTUs with a total abundance lower than 3 DNA read sequences were removed.
Finally, the resulting 23 molecular inventories were standardized at 26,584 reads, resulting in a
final MOTU relative abundance table with a total of 4802 MOTUs. For each lake
independently, a hierarchical clustering tree was built from the Bray–Curtis
dissimilarity matrices calculated from the final MOTU relative abundance table using the
hclust function in R and the package “vegan” [283].
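The duplicate-merging, rare-MOTU removal and Bray–Curtis steps can be illustrated with a small example. The following is a minimal Python sketch, not the authors' pipeline: the read counts and MOTU names are hypothetical, the function names are illustrative, and the standardization to a common read depth and the clustering itself are omitted for brevity.

```python
def merge_duplicates(dup_a, dup_b):
    """Sum reads of two PCR duplicates, keeping only MOTUs detected in both."""
    shared = [m for m in dup_a if dup_a[m] > 0 and dup_b.get(m, 0) > 0]
    return {m: dup_a[m] + dup_b[m] for m in shared}

def drop_rare(table, min_total=3):
    """Remove MOTUs whose total read count across all samples is below min_total."""
    totals = {}
    for counts in table.values():
        for motu, n in counts.items():
            totals[motu] = totals.get(motu, 0) + n
    return {sample: {m: n for m, n in counts.items() if totals[m] >= min_total}
            for sample, counts in table.items()}

def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two count (or abundance) profiles."""
    motus = set(x) | set(y)
    diff = sum(abs(x.get(m, 0) - y.get(m, 0)) for m in motus)
    total = sum(x.get(m, 0) + y.get(m, 0) for m in motus)
    return diff / total if total else 0.0

# Hypothetical read counts for two extracts, each with two PCR duplicates
raw = {
    "NS-0.75g": ({"motu1": 10, "motu2": 1, "motu3": 5},
                 {"motu1": 8, "motu2": 1, "motu4": 2}),
    "PB-4g": ({"motu1": 4, "motu3": 6},
              {"motu1": 6, "motu2": 3, "motu3": 4}),
}

merged = {sample: merge_duplicates(a, b) for sample, (a, b) in raw.items()}
filtered = drop_rare(merged)
bc = bray_curtis(filtered["NS-0.75g"], filtered["PB-4g"])

print(filtered)
print(round(bc, 3))
```

In practice, a pairwise matrix of such dissimilarities would be passed to a hierarchical clustering routine to build the trees shown in Figure A3.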
Appendix A.4. Case Study A4—Variability in Biological Groups across Different DNA Extraction
Methods
By Kevin Nota 5 and Laura Parducci 5,31
We extracted extracellular DNA (exDNA), intracellular DNA (inDNA) and total
DNA using four different extraction protocols and compared qPCR amplification results
for six biological groups (bacteria, eukaryotes, diatoms, plants, vertebrates and
arthropods). DNA was extracted from seven sediment samples (each ca. 0.25 g) collected from
European and Russian lakes and dated from ca. 2000 to 42,000 years BP. DNA was
extracted using three approaches: (i) the standard PowerSoil kit protocol (PS protocol), (ii)
a modified PowerSoil kit protocol using Andersen’s lysis buffer (aPS protocol), and (iii) a
PowerSoil protocol coupled with phosphate buffer to extract the extracellular (exPS protocol)
and intracellular (inPS protocol) DNA fractions. All samples were extracted and qPCR
amplified in duplicate. In all samples, including controls, we measured the quantification
cycle (Cq) value, i.e., the number of PCR cycles necessary to detect a signal above
fluorescence background, and the melting temperature (Tm) value, which gives information
about the sequence composition of the amplicons.
The PowerSoil protocol extracted DNA amounts (557 ± 448 ng g⁻¹ sediment) that were on average at least
10-fold higher than those of the other protocols (34 ± 29, 24 ± 25 and 24 ± 15 ng g⁻¹ sediment for the exPS, inPS and
aPS protocols, respectively; see Table A5). Adding together the exDNA and inDNA fractions did not
result in the amount obtained with the PowerSoil protocol. The qPCR results are shown
in Figure A4. The majority of Cq values from samples were lower than those from
extraction and PCR negatives, suggesting successful amplification of the targeted organisms
from sedaDNA. The extraction negatives show on average lower Cq values than the PCR
negatives in all groups of taxa, suggesting that contamination may have occurred during
DNA extraction but not during the PCR step. Tm values tend to differ between the
negatives (extraction and PCR) and the samples, indicating that the amplified products
are different, with the exception of bacteria. The within-protocol variation in Tm is likely
due to the different ages and locations of the samples. However, if the extraction
protocols recovered the same diversity, the within- and between-protocol variation should
be similar, which is not the case for most of the extraction protocols. With the exception
of bacteria, Tm values varied extensively between extraction protocols. The exPS and aPS
protocols show melting temperatures dissimilar to those of the inPS and PowerSoil protocols
for arthropods and eukaryotes. For plants, Tm values were different for the four DNA
extracts, indicating that the different protocols may have an impact on the amplified
diversity, and that there may be a difference between the inDNA and exDNA fractions. The
results show a similar trend in amplification between the different taxa and extraction
protocols and, overall, the lowest Cq values were obtained when the total DNA was extracted
(PS protocol). With the inPS and exPS protocols, Cq values did not vary between taxa,
except for arthropods, where the inDNA fraction was better amplified than the exDNA
fraction and shows Cq values close to those obtained using the PS protocol.
Overall, our results show that the PS protocol is the most efficient in extracting DNA
from the sediments for all groups of taxa. The PowerSoil extraction with modified lysis
buffer (aPS protocol) seems to reduce extraction efficiency for all taxa. Combined, the inDNA
and exDNA fractions do not come close to the amount of DNA extracted with the
PowerSoil kit, indicating that the inDNA and exDNA fractions together do not represent the whole
DNA pool, which makes direct comparisons difficult. inDNA and exDNA show similar results for
most taxa, except for arthropods.
Table A5. DNA quantification in ng g⁻¹ sediment.
| Extraction | Mean (±sd) | Median | Min | Max |
|---|---|---|---|---|
| PS protocol | 557 ± 448 | 442 | 81 | 1531 |
| exPS protocol | 34 ± 29 | 23 | 9 | 102 |
| inPS protocol | 24 ± 25 | 15 | 0 | 94 |
| aPS protocol | 24 ± 15 | 21 | 0 | 50 |
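The "at least 10-fold" statement and the comparison of the combined exDNA and inDNA fractions with the PS yield can be reproduced from the mean values in Table A5. The following is a minimal Python sketch; the variable names are illustrative, and the ratios are ratios of protocol means, which may differ from per-sample fold differences.

```python
# Mean DNA yields from Table A5 (ng per g sediment)
mean_yield = {"PS": 557, "exPS": 34, "inPS": 24, "aPS": 24}

# Fold difference of the PS protocol relative to each other protocol
fold_vs_ps = {
    protocol: mean_yield["PS"] / value
    for protocol, value in mean_yield.items()
    if protocol != "PS"
}

# Sum of the extracellular and intracellular fractions, and its share of PS
combined_fractions = mean_yield["exPS"] + mean_yield["inPS"]
share_of_ps = combined_fractions / mean_yield["PS"]

for protocol, fold in fold_vs_ps.items():
    print(f"PS vs {protocol}: {fold:.1f}-fold")
print(f"exPS + inPS = {combined_fractions} ({share_of_ps:.0%} of PS)")
```

On these means, the combined fractions account for roughly a tenth of the PS yield, consistent with the statement that they do not add up to the total DNA extract.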
Figure A4. Overview of the qPCR results. In the bottom graphs, the Cq values are summarized in box plots; all
individual measurements are plotted over the box plots in a lighter color, and points with the same color as the box plot are outliers
(n = 28 for all extraction methods, except for plants, n = 42; PCR negatives, n = 8; extraction negatives, n = 10). The top
graphs show the melting temperature for each Cq in the bottom graph (if multiple peaks were observed, the peak with the highest
intensity was considered). For each of the six biological groups studied, the extraction protocols are listed as: (i) unmodified
PowerSoil kit protocol (PS protocol), (ii) modified PowerSoil kit protocol using Andersen’s lysis buffer (aPS protocol), and (iii)
PowerSoil protocol coupled with phosphate buffer to extract the extracellular (exPS protocol) and intracellular (inPS
protocol) DNA fractions. PCR neg and Ext neg correspond to the PCR negatives and extraction negatives, respectively.
Material and Methods
Study sites—The study sites are listed in Table A6.
Table A6. Overview of the sediment samples used in this work.
| Id | Location | Sediment Type | Age (yr BP) | References |
|---|---|---|---|---|
| NER2 | North-east Russia | Peat-Permafrost | ~6000 | [125] |
| NER11 | North-east Russia | Peat-Permafrost | ~8200 | [125] |
| MON18 | Southern Italy | Lake | ~1993 | [284] |
| MON78 | Southern Italy | Lake | ~11,562 | [284] |
| NWF2 | North-west Finland | Peat | ~42,000 | [125] |
| ATT2 | Southern Sweden | Lake | ~11,000 | [285] |
| ATT25 | Southern Sweden | Lake | ~15,000 | [285] |
DNA extraction—Three extraction protocols were conducted on seven sediment
samples in duplicate; see Figure A5 for the extraction schemes. All extractions were
performed from ca. 0.25 g of sediment. For the extracellular and intracellular fractions, the
sodium phosphate buffer approach according to Taberlet et al. [191], with the addition of
extraction of the cell pellet according to Alawi et al. [286], was used. An amount of 675 µL
of 0.12 M NaP buffer (pH 8) was added to the sediment and
incubated at room temperature for 15 min. Samples were vortexed and centrifuged for 10
min at 500× g at room temperature, and the supernatant was transferred to a clean 2-mL tube. This
was repeated twice (three times in total). The pooled supernatant was centrifuged at
10,000× g for 30 min at room temperature. The supernatant was transferred to a 2-mL 30
kDa Centrifugal Filter Unit to concentrate DNA. Concentrated DNA was further purified
with the PowerSoil kit after the lysis step. The pellet was resuspended in 150 µL
nuclease-free water before extraction with the PowerSoil kit. Total DNA was extracted using an
unmodified PowerSoil kit protocol (PowerSoil protocol), except for a 10 min incubation at
room temperature before eluting. Bead-beating was done on a “normal” vortexer at
highest speed for 10 min by taping the tubes horizontally on the vortexer. For the aPS protocol,
675 µL Andersen’s lysis buffer was added to the sediment [62] and incubated overnight at 37
°C in a rotating incubator rack. Samples were centrifuged for 1 min at 10,000× g and the
supernatant was transferred to a clean 2-mL tube. The supernatant was extracted further with
the PowerSoil kit after the kit lysis step. The lysis buffer contained 8 mM N-lauroylsarcosine
sodium salt, 50 mM Tris-HCl (pH 8.0), 150 mM NaCl, 20 mM EDTA (pH 8.0), 50 µL
2-mercaptoethanol (per 1 mL of lysis buffer), 33 µL DTT (per 1 mL of lysis buffer), and
Proteinase K [287]. DNA in all extracts was quantified with a Qubit® 2.0 Fluorometer and the Qubit™
1X dsDNA HS Assay Kit. The total DNA extraction has on average at least a 10-fold
higher DNA concentration than the other extraction protocols. Adding the exDNA and
inDNA fractions together does not result in a similar amount of DNA.
Figure A5. Overview of the extractions. In the graph, (a) represents the intracellular and
extracellular DNA extraction, and (b) the Andersen lysis buffer total DNA extraction.
qPCR amplification—All qPCR reactions were done on the same qPCR machine
(CFX96, BioRad). qPCR amplification was done in 10-µL reaction volumes containing
1X TATAA SYBR® GrandMaster Mix, 0.5 µM forward primer and 0.5 µM reverse
primer (0.2 µM was used instead of 0.5 µM for the diatom primers). The thermocycling
program was the same for all amplifications except for the annealing temperatures (see
Table A7). The thermocycling program was as follows: 30 s at 95 °C, followed by 60 cycles
of 5 s at 95 °C, 30 s annealing (48–59 °C), and 10 s at 72 °C. After the qPCR cycles, a melting
curve was obtained by increasing the temperature from 60 to 95 °C in 0.2 °C increments
every 5 s.
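The programmed hold times of this thermocycling protocol can be summed to estimate a lower bound on run time. The following is a minimal Python sketch, assuming that each melting-curve increment is held for 5 s and ignoring ramp and plate-read times, which depend on the instrument; the function names are illustrative.

```python
def cycling_seconds(initial=30, cycles=60, denature=5, anneal=30, extend=10):
    """Hold time of the initial denaturation plus all amplification cycles."""
    return initial + cycles * (denature + anneal + extend)

def melt_curve_seconds(start_c=60.0, end_c=95.0, step_c=0.2, hold_s=5):
    """Hold time of the melting curve (assumes one hold per increment)."""
    # Work in tenths of a degree to avoid floating-point drift
    n_steps = round((end_c - start_c) * 10) // round(step_c * 10)
    return n_steps * hold_s

total_s = cycling_seconds() + melt_curve_seconds()
print(cycling_seconds(), melt_curve_seconds(), total_s)
print(f"about {total_s / 60:.0f} min of programmed hold time")
```

Because the annealing step has the same duration for all primer pairs, this estimate applies to every marker listed in Table A7.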
Table A7. Summary of the markers, primers and annealing temperatures used for the qPCR amplification.
| Primer Name | Target Taxa | Target Marker | Annealing Temperature | Length (bp) | References |
|---|---|---|---|---|---|
| Euka01For / Euka01Rev | Eukaryota | 18S rRNA | 59 °C | 48–777 | [288] |
| Diat_rbcL_705F / Diat_rbcL_808R | Diatoms | rbcL | 49 °C | 76 | [105] |
| Sper01For (g Taberlet) / Sper01Rev (h Taberlet) | Seed plants | trnL | 52 °C | 10–220 | [201] |
| Atrh01For / Atrh01Rev | Arthropoda | 16S mt DNA | 48 °C | 18–97 | [167] |
| Vert01For / Vert01Rev | Vertebrata | 12S mt DNA | 49 °C | 56–132 | [289] |
| A967F / 1046R | Bacteria | 16S rRNA | 55 °C | 98 | [290] |
Appendix A.5. Case Study A5—Effects of DNA Extraction Methods on the Diversity of the
Plant DNA Signal
Appendix A.5.1. By Laura S. Epp 23, Liv Heinecke 19,20, Heike H. Zimmermann 19,
Kathleen R. Stoof-Leichsenring 19 and Ulrike Herzschuh 19,21
Summary
We tested a suite of different kit-based extraction protocols on six different lake sur-
face sediment samples from two geographical regions to evaluate the effects of the extrac-
tion procedure on the recovered plant diversity. The two regions, the Southern Taymyr
Peninsula and the high Pamir Mountains, differ strongly in terrestrial vegetation, and in
the type of lakes targeted. For two extraction kits, we compared protocols for extracting
total DNA (standard kit buffers including a physical lysis) with protocols for extracting
extracellular DNA, the latter based on using a Sodium Phosphate buffer wash as initial
step [191]. Extracted DNA was amplified using the universal plant metabarcoding pri-
mers g and h [201], and PCR products were Illumina sequenced. We found that for sam-
ples from the Taymyr Peninsula, which has a high diversity and biomass of terrestrial
vegetation, and where we targeted three small, shallow lakes, results were highly depend-
ent on the extraction protocol, with the number of identified MOTUs ranging from below
10 to over 75 for a single sample. For samples from the high Pamir Mountains, where
terrestrial vegetation is very sparse and low in diversity, and which originate from a very
large, alkaline lake, the number of MOTUs was low (mostly below 10) for all extraction
protocols, and no particular difference between protocols could be observed. Overall, pro-
tocols including a physical lysis step yielded a higher diversity than protocols relying on
extracellular DNA.
Experimental Procedures
Six lake surface sediment samples from two geographical areas were processed for
this study: three from small lakes on the Southern Taymyr Peninsula, Siberia, and three
from locations within Lake Karakul, Pamir Mountains, Tajikistan (Table A8). The lakes
differ in their properties, as does the surrounding vegetation. The high latitude Southern
Taymyr Peninsula is characterized by the transition from boreal larch forest to forest tun-
dra and tundra, and altogether the area has a high vegetation cover [30]. The catchment
of Lake Karakul is very arid and only sparsely vegetated [149,291]. Furthermore, the water
of Lake Karakul is alkaline (pH ~9), which negatively affects DNA preservation, while the
Siberian lakes have near neutral pH values.
Table A8. Geographical provenance and characteristics of the surface sediment samples.
| Sample | Lake | Geographical Area | Coordinates | Elevation (m asl) | Water Depth (m) | pH | Lake Area |
|---|---|---|---|---|---|---|---|
| 11-CH06 | 11-CH06 | Siberia, Southern Taymyr Peninsula | 97.715861 N/70.667444 E | 103 | 4.8 | 6.42 | 0.05 km² |
| 11-CH12 | 11-CH12 | Siberia, Southern Taymyr Peninsula | 102.288566 N/72.398881 E | 60 | 14.3 | 7.5 | 0.03 km² |
| 11-CH17 | 11-CH17 | Siberia, Southern Taymyr Peninsula | 102.235194 N/72.244486 E | 51 | 3.4 | 7.87 | 0.03 km² |
| KK13_SS3 | Lake Karakul | Tajikistan, High Pamir Mountains | 39.01814 N/73.52910 E | 3915 | 15.9 | 9.2 | 380 km² |
| KK13_SS7 | Lake Karakul | Tajikistan, High Pamir Mountains | 39.02255 N/73.51955 E | 3915 | 20.4 | 9.2 | 380 km² |
| KK13_SS8 | Lake Karakul | Tajikistan, High Pamir Mountains | 73.51955 N/73.53254 E | 3915 | 13.8 | 9.2 | 380 km² |
DNA from the six surface sediment samples was extracted using four different DNA
extraction kits (Table A9), some with modified lysis buffers. In particular, for both the
NucleoSpin Soil Kit and the PowerMax Soil Kit, we conducted DNA extractions targeting
extracellular DNA by replacing the kit lysis buffer with a wash in sodium phosphate
buffer [191]. For the PowerMax Soil Kit, we additionally used a third lysis buffer [292] that
has been used in a number of earlier sedaDNA studies [45,50,181,293]. We used
approximately 5 g of sediment for each extraction, except for the NucleoSpin Soil Mini Kit
with its kit lysis buffer, for which we used about 0.25 g (Table A9). Protocols using the kit
lysis buffers were carried out according to the manufacturer’s instructions. The wash with
the sodium phosphate buffer was carried out as in Gebremedhin et al. [294]. For lysis with
the “Bulat” buffer and Proteinase K, the sediments were incubated with the buffer
overnight at 56 °C in a shaking incubator.
Table A9. DNA extraction kits, lysis buffers and weight of inserted sediment for each extraction protocol.
| Extraction Kit | Kit Lysis Buffer | Phosphate Buffer (Extracellular) | Bulat Buffer + Proteinase K |
|---|---|---|---|
| FastDNA Spin Kit for Soil, 50 mL tubes (MP Biomedicals) | ~5 g | - | - |
| FavorPrep Soil DNA Isolation Midi Kit (FavorGen) | ~5 g | - | - |
| NucleoSpin Soil, Mini Kit (Macherey-Nagel) | ~0.25 g | ~5 g | - |
| PowerMax Soil Kit (originally MoBio, now Qiagen) | ~5 g | ~5 g | ~5 g |
Extracted DNA was amplified with the plant-specific metabarcoding primers g and
h [201], modified with an 8-bp identifier tag preceded by NNN at the 5′ end, and prepared
for Illumina sequencing as in Heinecke et al. [149]. Bioinformatic processing was performed
with the OBItools as in Epp et al. [46], with the modification that after filtering no
minimum number of reads was required for a sequence type to be further considered.
Results
Samples from the lakes in the Southern Taymyr Peninsula yielded MOTU numbers
ranging from fewer than 10 to more than 75, with strong differences between extraction
protocols (Figure A6). PCR products with a high number of MOTUs contained a high
proportion of terrestrial vascular plants. Extractions of the surface sediment samples from
Lake Karakul showed a much lower diversity. With a single exception, the PCR products
yielded fewer than 10 MOTUs each, and the proportion of terrestrial vascular plants
compared to aquatic plants was lower. The diversity retrieved in the samples from Lake
Karakul was quite uniform among the samples.
In the samples from the Southern Taymyr Peninsula, highest MOTU numbers were
retrieved with either the PowerMax Soil Kit with standard kit buffers, or with the Nucle-
oSpin Mini Kit with standard kit buffers. In all three analyzed lakes, these two extraction
protocols were first and second in terms of MOTU numbers. The highest numbers overall,
and the highest degree of reproducibility between PCRs, were achieved with the
PowerMax Soil Kit for the sample from Lake CH12. Apart from this sample, the degree of
reproducibility between PCR replicates was low in this experiment.
For the two kits that were tested both with and without a physical lysis step,
physical lysis resulted in higher diversity than a wash with a sodium phosphate buffer
without lysis. This was true even when a much smaller amount of sediment was used
for lysis and extraction, as was done for the NucleoSpin Soil Mini Kit.
Conclusions
In our tests, DNA extraction protocols made a large difference in sediments from the
Southern Taymyr Peninsula, an area that harbors a high diversity of terrestrial vascular
plants. Surface sediment samples from Lake Karakul, which is surrounded by very sparse
vegetation, yielded a much lower diversity, irrespective of the extraction method. For the
Southern Taymyr Peninsula samples, extraction protocols employing a physical lysis step
resulted in a higher diversity of plant MOTUs, and two of the tested protocols (PowerMax
Soil Kit and NucleoSpin Soil Mini Kit) yielded a relatively high diversity across all three
samples, but reproducibility of results was highly variable. Our results suggest that
preliminary analyses of samples from new areas, prior to conducting large-scale experiments
with many samples, can decisively influence the quality of the results.
Figure A6. Number of MOTUs identified in samples of (A) surface sediments of lakes of the Southern Taymyr Peninsula
and (B) Lake Karakul after extraction with a range of different protocols.
Appendix A.6. Case Study A6—A Protocol for Ancient DNA Extraction from Calcite-Rich
Minerogenic Lake Sediments
Peter D. Heintzman 6, Dilli P. Rijal 6, Antony G. Brown 6,11 and Inger G. Alsos 6
Postglacial lake sediment records are often characterized by minerogenic sediments
at their base, which were deposited before the development of organic-rich soils in the
catchment and high rates of within-lake organic production. These sediments may be cal-
cite- or silicate-based, or a mixture, depending on the catchment bedrock geology. During
analysis of a lake sediment core derived from a calcite-rich catchment in northern Norway
(Figure A7A), we found vascular plant DNA metabarcoding results became far poorer
(reduced taxonomic richness and PCR replicability) in these basal minerogenic layers. As
calcitic minerals can tightly bind DNA [102,192], we hypothesized that our DNA extrac-
tion protocol was not effectively releasing DNA during the lysis step. We therefore devel-
oped a modification to digest calcite, using EDTA-based chelation, thereby releasing ma-
trix-bound DNA before continuing with our existing protocol. We tested this new method
on six sediment layers previously extracted using our original protocol.
Figure A7. (A) Location of Lake Gauptjern in northern Norway. Map image from Google Earth (all
map data sources and attributions are within the image). (B) Layers within the basal section of the
Gauptjern core that were tested (orange circles). M: minerogenic-rich; O: organic-rich; M-O:
minerogenic–organic.
We first confirmed that the basal minerogenic sediments of Lake Gauptjern are cal-
cite-rich by visual examination (Figure A7B), mass loss-on-ignition (LOI) analysis at 950
°C, and calcium/titanium (Ca/Ti) values derived from X-ray fluorescence (Table A10).
Greatly elevated LOI950 and Ca/Ti values in the basal minerogenic sediments suggest a
significant contribution from calcium carbonate and confirm these sediments as likely rep-
resenting a biogenic calcitic marl [295,296].
Across all samples and DNA extraction protocols, we detected 60 plant taxa. Our
original extraction protocol worked well for the two organic-rich (O) sediment layers,
which had LOI550 values of 51–79%, resulting in a richness of 32–44 taxa (Table A10). How-
ever, the intermediate minerogenic–organic (M-O) through to minerogenic-rich (M) sedi-
ments display a declining taxonomic richness, with only 9–10 taxa identified in the basal
calcitic sediment. The optimized protocol also exhibits a decline in richness, although this
is far less severe with 35–38 taxa in the organic-rich layers through to 24–27 in the basal
calcitic sediment. The two protocols gave comparable richness results for the organic-rich
sediment samples, although we note that PCR replicability was reduced with the opti-
mized protocol. All minerogenic sediment samples (M, M-O), with an elevated LOI950
(>20%) and Ca/Ti (>100), exhibited a 1.5–3.0-fold increase in richness, with PCR replicabil-
ity increasing by an average of 64%, when the optimized protocol was applied (Table A10,
Figure 5D in the main text). The majority of taxa detected in the minerogenic sediment
samples using the original protocol are also recovered with the optimized protocol (mean
of 85%, Table A10). Altogether, these data demonstrate that our optimized DNA extrac-
tion protocol yields superior results for this particular type of minerogenic-rich sediment.
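
The derived columns of Table A10 can be recomputed from the reported richness and replicability values. The following is a minimal sketch in Python, assuming that the fold increase is the optimized value divided by the original value and that the overlap proportion is the number of shared taxa divided by the original-protocol richness. The variable and function names are illustrative and are not part of the published analysis scripts.

```python
# Values copied from Table A10:
# (sample, layer, LOI950 %, Ca/Ti, richness original, richness optimized,
#  shared taxa, replicability original, replicability optimized)
ROWS = [
    ("EG13_L097", "O", 4.66, 9.92, 44, 38, 36, 0.707, 0.599),
    ("EG13_L141", "O", 2.64, 7.53, 32, 35, 26, 0.539, 0.496),
    ("EG13_L151", "M-O", 23.43, 161.64, 20, 30, 14, 0.438, 0.600),
    ("EG13_L153", "M-O", 25.08, 197.30, 11, 26, 9, 0.227, 0.543),
    ("EG13_L157", "M", 34.91, 384.53, 10, 24, 10, 0.300, 0.568),
    ("EG13_L161", "M", 33.73, 238.23, 9, 27, 8, 0.611, 0.551),
]


def derive(row):
    """Recompute the overlap and fold-increase columns for one sample."""
    sample, layer, loi950, ca_ti, r_orig, r_opt, shared, rep_orig, rep_opt = row
    return {
        "sample": sample,
        "layer": layer,
        # Thresholds quoted in the text for the minerogenic samples
        "calcite_rich": loi950 > 20 and ca_ti > 100,
        "overlap_prop": shared / r_orig,
        "fold_richness": r_opt / r_orig,
        "fold_replic": rep_opt / rep_orig,
    }


def mean(values):
    values = list(values)
    return sum(values) / len(values)


derived = [derive(row) for row in ROWS]
minerogenic = [d for d in derived if d["calcite_rich"]]

mean_fold_replic = mean(d["fold_replic"] for d in minerogenic)
mean_overlap = mean(d["overlap_prop"] for d in minerogenic)

for d in derived:
    print(f'{d["sample"]}: richness x{d["fold_richness"]:.2f}, '
          f'replicability x{d["fold_replic"]:.2f}, overlap {d["overlap_prop"]:.2f}')
print(f"Minerogenic mean replicability ratio: {mean_fold_replic:.2f}")
print(f"Minerogenic mean overlap: {mean_overlap:.2f}")
```

Under these assumptions, the four samples with LOI950 > 20% and Ca/Ti > 100 give a mean replicability ratio of about 1.64, which corresponds to the 64% average increase quoted above, and a mean overlap of about 85%.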
As well as impacting taxonomic richness estimates, the results from the two DNA
extraction protocols would lead to differing ecological interpretations of the early post-
glacial vegetation history of the catchment surrounding Lake Gauptjern. Based on the re-
sults from the original extraction protocol, we would interpret a species-poor vascular
plant community with a higher-than-expected fern component (e.g., Gymnocarpium, Dry-
opteris) in the early postglacial (162.5 cm core depth), followed by a tripling of plant com-
munity richness and corresponding increase in community complexity between 158.5 and
142.5 cm core depth. On the other hand, the results from the optimized extraction protocol
give a more reasonable interpretation, with a diverse plant community in the early postglacial
and richness less than doubling over the same interval.
In addition to these general patterns, the two protocols give contrasting results for
the inferred first occurrence of key woody taxa, with crowberry (Empetrum nigrum), blue-
berry (Vaccinium myrtillus), lingonberry (V. vitis-idaea), pine (Pinus sylvestris), and alder
(Alnus) first appearing within a 12-cm interval between 154.5 and 142.5 cm, based on data
from the original protocol (Figure 5D in main text). However, the optimized protocol in-
dicates that crowberry was already present in the catchment during the early postglacial
(162.5 cm), with blueberry and lingonberry appearing from at least 158.5 cm. The first
appearances of pine and alder, on the other hand, are unaffected by the protocol used,
which gives us high confidence that these taxa first appear in the record between 152.5
and 142.5 cm.
Although inhibition of DNA extracts, especially from organic-rich sediments, is a
major problem and the focus of methodological optimizations [189], we show here that
minerogenic-rich substrates can also impact DNA recovery. In this study, we used a mod-
ified digestion strategy to improve the release of mineral-bound DNA from calcite-rich
sediments. Although our study concerned biogenic marl, we believe that the results
would also hold for calcites of detrital or pedogenic origin. Our findings highlight that a
one-method-fits-all approach to lake sediment DNA extraction would have been inappropriate
and would have led to erroneous interpretations of the early postglacial succession of
vascular plants around this lake.
Table A10. Metadata for the six tested sediment samples and five negative controls. M: minerogenic-rich; O: organic-rich;
M-O: minerogenic-organic.
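| Sample | Depth (cm) | M/O | LOI 550 (%) | LOI 950 (%) | XRF Ca/Ti | XRF Si/Ti | XRF Fe/Ti | Original Protocol Richness | Original Protocol Replic. | Optimised Protocol Richness | Optimised Protocol Replic. | Overlap Richness | Overlap Prop. | Fold Increase Richness | Fold Increase Replic. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| EG13_NEC | na | na | na | na | na | na | na | 0 | na | 0 | na | 0 | 0 | na | na |
| EG13_NEC | na | na | na | na | na | na | na | 0 | na | 2 | 0.125 | 0 | 0 | na | na |
| EG13_NEC | na | na | na | na | na | na | na | 1 | 0.125 | 0 | na | 0 | 0 | na | na |
| EG13_NPC | na | na | na | na | na | na | na | 0 | na | 0 | na | 0 | 0 | na | na |
| EG13_NPC | na | na | na | na | na | na | na | 0 | na | 0 | na | 0 | 0 | na | na |
| EG13_L097 | 98.5 | O | 50.56 | 4.66 | 9.92 | 4.67 | 9.23 | 44 | 0.707 | 38 | 0.599 | 36 | 0.82 | 0.86 | 0.85 |
| EG13_L141 | 142.5 | O | 79.16 | 2.64 | 7.53 | 0.58 | 5.89 | 32 | 0.539 | 35 | 0.496 | 26 | 0.81 | 1.09 | 0.92 |
| EG13_L151 | 152.5 | M-O | 34.50 | 23.43 | 161.64 | 0.89 | 12.51 | 20 | 0.438 | 30 | 0.600 | 14 | 0.70 | 1.50 | 1.37 |
| EG13_L153 | 154.5 | M-O | 31.83 | 25.08 | 197.30 | 1.10 | 13.94 | 11 | 0.227 | 26 | 0.543 | 9 | 0.82 | 2.36 | 2.39 |
| EG13_L157 | 158.5 | M | 16.42 | 34.91 | 384.53 | 0.87 | 5.07 | 10 | 0.300 | 24 | 0.568 | 10 | 1.00 | 2.40 | 1.89 |
| EG13_L161 | 162.5 | M | 12.36 | 33.73 | 238.23 | 1.01 | 2.01 | 9 | 0.611 | 27 | 0.551 | 8 | 0.89 | 3.00 | 0.90 |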
Footnotes: LOI, loss-on-ignition (at 550 and 550–950 °C); XRF, X-ray fluorescence; M, minerogenic-rich; O, organic-rich;
Replic., replicability; Prop., proportion of taxa detected by the original protocol that were also detected by the optimized
protocol; NEC, negative extraction control; NPC, negative PCR control.
Material and Methods
Study site, sediment sampling and geochemistry—Lake Gauptjern is located on a
calcite-marble bedrock in northern Norway (400 m a.s.l.; 68.85647 N, 19.61843 E; Figure
A7A) and has a lake area of 0.78 ha and a catchment area of 0.13 km2. The Gauptjern sed-
iment record has previously been described and analyzed for palaeoecological proxies,
including pollen and macrofossils [297]. In March 2017, we collected a 163-cm long sedi-
ment core (EG13) from Lake Gauptjern using a 10-cm diameter Nesje corer [298], which
included the basal minerogenic layers identified by Jensen and Vorren [297]. We com-
pared DNA extraction methods for six samples, which were taken from the minerogenic
(M), intermediate (M-O), and organic-rich (O) layers (Table A10, Figure A7B). To confirm
that the minerogenic sediments were a calcitic marl, we calculated mass loss-on-ignition
(LOI) of dried sediment and performed X-ray fluorescence (XRF) scanning. For LOI and
XRF, the methods followed Clarke et al. (2019), except that XRF was conducted at 5 mm
resolution and LOI was calculated at both 550 °C (LOI550; for organic carbon content) and
550–950 °C (LOI950; for carbonate/inorganic carbon content) [295,296]. We took DNA and
LOI samples concurrently in the specialized clean-room ancient DNA facilities at The Arc-
tic University Museum of Norway in Tromsø.
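
The two LOI values can be illustrated with a short calculation. This is a minimal sketch in Python, assuming the common sequential loss-on-ignition convention in which both mass losses are expressed as a percentage of the initial dry mass. The text above does not state the denominator, so this convention, the function name and the example masses are assumptions made for illustration only.

```python
def loss_on_ignition(dry_mass_g, mass_after_550_g, mass_after_950_g):
    """Return (LOI550, LOI950) in percent of the initial dry mass.

    LOI550 uses the mass lost up to 550 °C (organic carbon proxy);
    LOI950 uses the mass lost between 550 and 950 °C (carbonate proxy).
    Assumption: both are normalized to the initial dry mass.
    """
    if dry_mass_g <= 0:
        raise ValueError("dry mass must be positive")
    loi550 = 100.0 * (dry_mass_g - mass_after_550_g) / dry_mass_g
    loi950 = 100.0 * (mass_after_550_g - mass_after_950_g) / dry_mass_g
    return loi550, loi950


# Hypothetical masses, not measurements from this study
example_loi550, example_loi950 = loss_on_ignition(1.000, 0.800, 0.550)
print(f"LOI550 = {example_loi550:.1f}%, LOI950 = {example_loi950:.1f}%")
```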
DNA extraction—We initially extracted DNA following Rijal et al. [16], whose
method uses 0.25–0.35 g of input and is modified from the Qiagen DNeasy PowerSoil kit
(Qiagen Norge, Oslo, Norway). Due to poorer results from the minerogenic layers (Table
A10), we re-extracted DNA using a modified protocol to improve the release of DNA from
calcite-rich sediments. For this optimized protocol, we incubated the sediment overnight
at 56 °C in a lysis buffer consisting of 1 mL of 0.5 M EDTA (pH 8.0), 2.5 µL of 20 mg/mL
proteinase K, and 32 µL of 1 M dithiothreitol (DTT). We then centrifuged the digested
mixture to separate the supernatant from the pellet. We removed the supernatant, which
we concentrated to 100 µL in a 30 kDa Vivaspin-500 column (GE Healthcare, Oslo, Nor-
way). The concentrated supernatant was then used as input to the standard method, but
with the bead-beating and overnight lysis steps omitted.
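
For readers who want to scale the lysis buffer, the final concentration of each component follows from the stated stock concentrations and volumes. The sketch below, in Python, assumes that the volumes are additive and ignores the volume contributed by the sediment itself; the dictionary layout and names are illustrative.

```python
# component: (volume in µL, stock concentration, unit), as stated in the text
COMPONENTS = {
    "EDTA": (1000.0, 0.5, "M"),
    "proteinase K": (2.5, 20.0, "mg/mL"),
    "DTT": (32.0, 1.0, "M"),
}

# Assumption: volumes are additive and sediment volume is ignored
total_ul = sum(volume for volume, _, _ in COMPONENTS.values())

# Dilution: final concentration = stock concentration * (volume / total volume)
final = {
    name: (stock * volume / total_ul, unit)
    for name, (volume, stock, unit) in COMPONENTS.items()
}

print(f"Total volume: {total_ul:.1f} µL")
for name, (concentration, unit) in final.items():
    print(f"{name}: {concentration:.4f} {unit}")
```

Under these assumptions the buffer is approximately 0.48 M EDTA, 48 µg/mL proteinase K and 31 mM DTT.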
PCR amplifications, high-throughput sequencing—We amplified each DNA extract
using unique dual-tagged ‘gh’ primers [201] to target the vascular plant trnL p6-loop locus
(following [49]), for eight PCR replicates per extract (see Table in figshare folder
https://doi.org/10.6084/m9.figshare.13007279.v1). We applied negative controls during
both DNA extraction and PCR setup. After PCR, we pooled and purified PCR products
into two pools following Clarke et al. [26]. One pool (JIE4) was shipped to FASTERIS, SA
(Switzerland) and converted into a single-indexed PCR-free MetaFast library (following
[26]). The second pool (AOHL-3–8) was converted into a unique dual-indexed library in
Tromsø following the Illumina TruSeq DNA PCR-free protocol (Illumina, Inc., CA, USA),
with the bead-cleanup steps modified to retain our short amplicon inserts. Both libraries
were sequenced on ~10% of separate 2 × 150 cycle mid-output flow cells on the Illumina
NextSeq platform at FASTERIS.
Bioinformatics and data analysis—We followed the bioinformatics pipeline pre-
sented by Rijal et al. (2020), which uses the ObiTools software package [261] and custom
R scripts (available at https://github.com/Y-Lammers/MergeAndFilter). Briefly, we
merged overlapping paired-end reads and retained only those that merged. We then de-
multiplexed our tagged amplicons using the tag-PCR replicate lookup presented in. We
then collapsed identical sequences, removed putative artifactual sequences from our data
that may have derived from Illumina library index-swaps or PCR/sequencing errors, and
removed sequences that had fewer than 3 reads in all PCR replicates. We next identified
amplicon sequence variants (ASVs) that had 100% identity agreement with either the
ArctBorBryo and/or EMBL nucleotide (rl133 release) databases, following Volstad et al.
[49]. We further removed identified ASVs that 100% matched two “blacklists” consisting
of known contaminants/exotics or synthetic sequences [16]. The final taxonomic assign-
ment of the 71 retained ASVs was determined by one of us (Alsos) using regional botanical
taxonomic expertise. We calculated the proportion of weighted replicates (wtRep) for each
retained ASV in each sample using the method presented by Rijal et al. [16], which weights
PCR replicate detections based on relative sequencing depth. For each of the six taxa that
were represented by more than one ASV, we selected the ASV that had the greatest da-
taset-combined wtRep. This resulted in a final data set of 60 taxa that were used for all
taxonomic richness and PCR replicability statistics. For visualization, we selected the top
40 taxa that had the greatest dataset-combined wtRep. We visualized the results in R v3.4.4
(R Core Team 2018) using the rioja v0.9-21 package [299]. PCR replicability was calculated
as the mean proportion of detections in PCR replicates across all taxa within a sample.
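
Two of the steps above, the minimum-read filter and the PCR replicability statistic, can be sketched as follows. This is a simplified illustration in Python, not the ObiTools or R code used in the study. It assumes that a detection is any non-zero read count in a replicate after filtering, that replicability is averaged over taxa detected at least once, and it omits the sequencing-depth weighting used for wtRep. All names and the toy read counts are hypothetical.

```python
MIN_READS = 3  # threshold stated in the text


def filter_low_count(counts, min_reads=MIN_READS):
    """Drop sequences with fewer than `min_reads` reads in every PCR replicate."""
    return {
        seq: reps
        for seq, reps in counts.items()
        if any(n >= min_reads for n in reps)
    }


def pcr_replicability(counts):
    """Mean proportion of PCR replicates with a detection, across detected taxa.

    Returns None when nothing was detected (reported as "na" in Table A10).
    """
    proportions = [
        sum(1 for n in reps if n > 0) / len(reps)
        for reps in counts.values()
        if any(n > 0 for n in reps)
    ]
    if not proportions:
        return None
    return sum(proportions) / len(proportions)


# Toy read counts for eight PCR replicates of one sample (hypothetical data)
toy_counts = {
    "taxon_a": [10, 5, 0, 7, 3, 0, 12, 4],  # detected in 6 of 8 replicates
    "taxon_b": [0, 0, 4, 0, 0, 0, 0, 0],    # detected in 1 of 8 replicates
    "noise": [1, 0, 2, 0, 0, 1, 0, 0],      # never reaches 3 reads
}

kept = filter_low_count(toy_counts)
replicability = pcr_replicability(kept)
print(sorted(kept), replicability)
```

In this toy example the low-count sequence is removed and the replicability is the mean of 6/8 and 1/8.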
Appendix A.7. Case Study A7—Improvement of DNA Extraction Methods for the Detection of
Catchment Mammal DNA Signal
By Charline Giguet-Covex 2, Gentile Francesco Ficetola 14,15 and Pierre Taberlet 6,15
We tested whether adding a concentration step to a DNA extraction protocol can improve
the performance of sedaDNA extraction. To the established DNA extraction protocol
combining a phosphate buffer step with the NucleoSpin Soil kit protocol (Taberlet2012
protocol), which has been used successfully in several studies analyzing lake sedaDNA (e.g.,
[31,32,51]), we added a step with Amicon® Ultra-15 10k centrifugal filters (Millipore) to
concentrate the DNA extract (AmTaberlet2012 protocol).
eDNA was extracted from four sediment samples from Lake Anterne (French
Alps), already analyzed by Giguet-Covex et al. [31]. We used one sample dating to
the Neolithic age (roughly 4800 cal. years BP), one dating to the Bronze age (3160 cal.
years BP), one from the Roman age (2010 cal. years BP) and one very recent sample (-56
years BP). In the AmTaberlet2012 protocol, 15 g of sediment was mixed with 15 mL of
phosphate buffer and centrifuged for 10 min at 10,000× g. Then, 12 mL of the resulting
supernatant was transferred to Amicon® Ultra-15 10k centrifugal filters (Millipore) and
centrifuged at 4000× g for ultrafiltration and concentration of the DNA-containing buffer.
The centrifugation time varied among sediments; the aim of this step was to reduce
the volume as much as possible, from 12 mL to 500–700 µL of supernatant containing a
high concentration of DNA. For most samples, approximately 20 min of centrifugation
reduced the volume to the desired level. However, a few samples required a longer
centrifugation time (up to 30 min). Then, 400 µL of the resulting
concentrate was kept as starting material for the following extraction steps, using the
standard protocols with the NucleoSpin® Soil kit (Macherey-Nagel, Düren, Germany)
[167,300]. The extracted DNA was then amplified using the primers MamP007, following
Giguet-Covex et al. [31]. These primers have