Metagenome of High Arctic lake sediments 1
Microbial genomes retrieved from High Arctic lake sediments provision their microbes with genes to strive in cold and oligotrophic environments
Matti O. Ruuskanen1*, Graham Colby2, Kyra A. St.Pierre3, Vincent L. St.Louis4, Stéphane Aris-Brosou5 and Alexandre J. Poulain6
1,2,5,6Department of Biology, University of Ottawa, Ottawa, ON, Canada.
3,4Department of Biological Sciences, University of Alberta, Edmonton, AB, Canada.
5Department of Mathematics & Statistics, University of Ottawa, Ottawa, ON, Canada.
1matti.ruuskanen@uottawa.ca, 2gcolb087@uottawa.ca, 3kyra2@ualberta.ca, 4vince.stlouis@ualberta.ca, 5sarisbro@uottawa.ca, 6apoulain@uottawa.ca
*Correspondence: Matti Ruuskanen, matti.ruuskanen@gmail.com
Running head: Metagenome of High Arctic lake sediments
bioRxiv preprint first posted online Aug. 5, 2019; doi: http://dx.doi.org/10.1101/724781. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
Abstract
The Arctic is currently warming at an unprecedented rate, which may affect environmental constraints on the freshwater microbial communities found there. Yet, our knowledge of the community structure and functional potential of High Arctic freshwater microbes remains poor, even though they play key roles in nutrient cycling and other ecosystem services. Here, using high-throughput metagenomic sequencing and genome assembly, we show that sediment microbial communities in the High Arctic’s largest lake by volume, Lake Hazen, are phylogenetically diverse, ranging from Proteobacteria, Verrucomicrobia, and Planctomycetes to members of the newly discovered Candidate Phyla Radiation (CPR) groups. These genomes displayed a high prevalence of pathways involved in lipid chemistry, and a low prevalence of nutrient uptake pathways, which might represent adaptations to the specific, cold (~3.5°C) and extremely oligotrophic conditions in Lake Hazen. Despite these potential adaptations, it is unclear how ongoing environmental changes will affect microbial communities, the makeup of their genomic idiosyncrasies, as well as the possible implications at higher trophic levels.
Keywords: Metagenomics, arctic lakes, lake sediments, psychrotrophy, oligotrophy, microbial diversity
Introduction
Climate change is transforming Arctic ecosystems: elevated temperatures and increasing precipitation have facilitated permafrost thaw (Romanovsky et al., 2010) and glacial melt (Lehnherr et al., 2018; Milner et al., 2017), leading to impacts on the functioning of aquatic and terrestrial ecosystems, and the natural services they provide. Microbes are major players in the biogeochemical cycling of organic matter and inorganic nutrients; as such, studying contemporary Arctic microbial communities is critical for both documenting and predicting how these cycles might respond to ongoing and future environmental change. However, baseline microbial community data from the Arctic are still lacking because these environments are remote and challenging to study. Furthermore, in-depth study of microbial communities has only recently become possible with culture-independent methods like high-throughput sequencing (e.g., Shokralla et al., 2012). As Arctic environments are already responding to climate change at the watershed scale (Lehnherr et al., 2018), and these warming-related changes are projected to continue (IPCC, 2018), there is an urgent need to gather data on the current state of Arctic microbial communities and their function, against which future changes can be compared.
To date, culture-independent studies of microbial communities in Arctic lakes have most often been performed by amplifying and sequencing taxonomic markers like the 16S rRNA gene (e.g., Crump et al., 2012; Mohit et al., 2017; Ruuskanen et al., 2018; Stoeva et al., 2014; Wang et al., 2016, 2019). However, amplicon-based methods are subject to PCR amplification bias, which might alter the estimates of microbial community composition and diversity. Furthermore, while taxonomy based on a marker gene can be used to predict the functional potential of microbes (Louca et al., 2016), these predictions are purely hypothetical in the absence of functional data. Metagenomic sequencing enables the reconstruction of nearly complete Metagenome Assembled Genomes (MAGs) solely from environmental DNA (Zhou et al., 2015). Gene sequences coding for proteins derived from contiguous sequences in the metagenomes can also be used to reconstruct the functional potential of microbial communities in the sampled environment. For example, metagenomic sequencing enabled the discovery of the Candidate Phyla Radiation (CPR; Brown et al., 2015; Hug et al., 2016), consisting of uncultured, deeply branching bacterial lineages, which had previously evaded detection in purely amplicon-based studies. After their initial discovery, CPR members have been found in a variety of environments, including the deep subsurface (Danczak et al., 2017), marine sediments (León‐Zayas et al., 2017), and hypersaline soda lakes (Vavourakis et al., 2018). The presence of CPR bacteria has also been reported in Arctic freshwater environments (Vigneron et al., 2019; Wurzbacher et al., 2017). While they appeared to be absent in 16S rRNA amplicon data, a metagenomic analysis of the same samples revealed them to be highly abundant (Vigneron et al., 2019). However, shotgun metagenomic data from lake sediment microbial communities in Arctic environments are still scarce (Wang et al., 2019), and both larger lakes and High Arctic lakes remain thus far uncharted by these methods. Such knowledge would expand our understanding of Arctic and global microbial diversity.
To investigate this diversity of microbes and their metabolisms in understudied Arctic lakes, we analyzed shotgun metagenomes of DNA extracted from the sediments of Lake Hazen, the world’s largest High Arctic lake by volume. In a previous study using 16S rRNA gene amplicons, we hypothesized that taxonomically dissimilar sediment communities at different locations in the lake might be functionally similar (Ruuskanen et al., 2018). In the current study, we revisited this question by investigating the taxonomic diversity and functional potential of the sediment microbial community using metagenomics. We identified rarely studied organisms within the sediment, characterized the metabolic pathways that are over- and underrepresented in these reconstructed genomes (compared to reference data from online repositories as of June 2018), described the ecologically important nutrient cycles that are potentially present in the sediments, and identified which taxa might play key roles in them.
Material and methods
Sampling and chemical analyses
Lake Hazen (located within Quttinirpaaq National Park on northern Ellesmere Island, Nunavut, Canada; 81.8°N, 71.4°W) is 544 km² in surface area, with a maximum depth of 267 m, making it the largest lake by volume north of the Arctic Circle. The area immediately surrounding the lake is a polar oasis with higher than average temperatures for similar latitudes. Temperatures over 0°C have been observed in this region on more than 80 days per year (Soper and Powell, 1985), which is likely due to thermal shielding by the Grant Land Mountains in the northwestern portion of Lake Hazen’s watershed (France, 1993). The lake is primarily fed by meltwaters of the outlet glaciers of the Grant Land Ice Cap, and has a single major outflow, the Ruggles River, which flows southeastward to the eastern coast of Ellesmere Island. Sediment cores were collected from two sites: Deep Hole [261 m] on the 4th of August and Snowgoose Bay [49 m] on the 8th of August 2016 (Figure S1). Sampling was conducted from a boat using a UWITEC (Mondsee, Austria) gravity corer with 86 mm inner diameter polyvinyl chloride core tubes. At both sites, three cores were collected: one each for DNA extraction, porewater chemistry, and microsensor measurements. While the extracted cores were up to 40 cm in length, the subsectioning was restricted to the topmost 6 cm in all cores, since microprobes cannot be pushed any deeper than this into the sediment cores. The core from which DNA was extracted was sectioned at 0.5 cm intervals in the field, preserved with LifeGuard™ (MO BIO, Carlsbad, CA), and stored at -18°C until DNA extraction. Because of logistical difficulties involved with sampling in the High Arctic, the sectioning equipment could only be cleaned with non-sterile lake water and bleach between each section before putting the complete sections into sterile 50 mL centrifuge tubes. Non-powdered nitrile gloves were worn while handling the samples. For porewater analyses (NH₄⁺, NO₂⁻/NO₃⁻, SO₄²⁻, TDP, Cl⁻), the core was similarly sectioned at 1 cm intervals and placed in sterile 50 mL centrifuge tubes. The sediment sections were centrifuged at 4500 rpm for 15 minutes to separate the sediments from the porewater. The supernatant was then filtered through 0.45-µm cellulose acetate syringe filters that were first rinsed with a small volume of sample water. The remaining filtrate was stored in sterile 15 mL Corning polystyrene centrifuge tubes, and then immediately frozen at -18°C until analyses for NH₄⁺, NO₂⁻/NO₃⁻, SO₄²⁻, TDP and Cl⁻. Porewater chemical concentrations were determined using CALA-certified protocols at the Biogeochemical Analytical Service Laboratory (University of Alberta, Edmonton, AB, Canada). On the final core, 100-µm resolution microprofiles of O₂, redox potential, and pH were measured using Unisense (Aarhus, Denmark) glass microsensors interfaced with the Unisense Field Multimeter immediately upon return to camp. Cores were maintained at ambient temperatures (~4°C) throughout profiling. The microprofiles of these cores have also been described in an earlier study (St. Pierre et al., 2019).
116
DNA extraction and sequencing
117
For the extraction of environmental DNA, triplicate ~0.5 g (wet weight) subsamples were taken from
118
three intervals per sediment core (Deep Hole: 0.0 0.5 cm, 1.0 1.5 cm, 2.0 2.5 cm; Snowgoose
119
Bay: 0.0 0.5 cm, 1.5 2 cm, 3.0 3.5 cm). The samples were first washed once with a saline buffer
120
(10 mM EDTA, 50 mM Tris-HCl, 50 mM Na2HPO4·7H2O at pH 8.0) to remove inhibitors (Poulain et
121
al., 2015; Zhou et al., 1996), and then DNA was extracted from the samples (and a negative reagent
122
control) with the DNeasy PowerSoil Kit (Qiagen, Hilden, Germany) according to the manufacturer’s
123
instructions. Sample manipulations and extractions were conducted with sterilized equipment in a
124
laminar flow cabinet (HEPA 100). The quality of the DNA was checked with a NanoDrop 2000
125
(Thermo Fisher Scientific, Wilmington, DE, USA) and through confirming the amplification of the
126
glnA gene from the extracts (with E. coli DNA as the positive and sterile H2O as the negative control)
127
by PCR and gel electrophoresis (see SI text). The glnA gene was chosen as the control for DNA quality
128
.CC-BY-NC-ND 4.0 International licenseIt is made available under a
was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint (which. http://dx.doi.org/10.1101/724781doi: bioRxiv preprint first posted online Aug. 5, 2019;
Metagenome of High Arctic lake sediments 7
because its genomic copy number should be lower than that of the 16S rRNA gene (Stoeva et al., 2014)
129
and a positive result would better confirm the quality of the DNA. The triplicate DNA extracts were
130
then combined for each core horizon. Library preparation and sequencing was completed by Genome
131
Quebec (Montreal, QC, Canada) with Illumina HiSeq 2500 PE125 in triplicate lanes for each sample.
132
Preprocessing high-throughput sequencing data
133
The forward and reverse reads were trimmed and filtered for size and quality with Trimmomatic 0.36
134
(Bolger et al., 2014) and the data from all six samples were co-assembled with Megahit v1.1.2 (Li et
135
al., 2015). Anvi’o v4 (Eren et al., 2015) was used for database management, following their standard
136
metagenomic workflow. Briefly, reads were mapped to the contigs with Bowtie 2 (Langmead and
137
Salzberg, 2012) and contigs longer than 1 kbp were binned with CONCOCT 0.4.1 (Alneberg et al.,
138
2014). Open reading frames were identified with Prodigal (Hyatt et al., 2010), and functional
139
annotations were inferred based on six reference systems: NCBI’s Cluster of Orthologous Genes
140
(COG; Galperin et al., 2015; Tatusov et al., 2003) was run within Anvi’o using DIAMOND (Buchfink
141
et al., 2015) for the protein alignments using the default E-value cutoff of 0.001. Following this, Pfam
142
(Finn et al., 2014), TIGRFAM (Selengut et al., 2007) and Gene Ontology (GO; Ashburner et al., 2000;
143
The Gene Ontology Consortium, 2017) annotations for proteins were added with InterProScan 5.29-
144
68.0 (Jones et al., 2014). Briefly, for both the Pfam and TIGRFAM reference systems, the query
145
protein sequences were searched with HMMER3 (Eddy, 2009) against the respective hidden Markov
146
model databases. Hits for individual query proteins were filtered based on curated model-specific cut-
147
offs and, in the case of Pfam, lower-scoring hits in the same Pfam clan were removed (Jones et al.,
148
2014). Finally, GO terms were associated with the proteins within InterProScan through cross-
149
referencing the Pfam and TIGRFAM annotations with the InterPro database (Hunter et al., 2012).
The bins that were assembled automatically from the contigs were then manually refined in Anvi’o to < 10% contamination (named “redundancy” in the Anvi’o documentation), based on single-copy genes following Campbell et al. (2013) for bacteria and Rinke et al. (2013) for archaea. These contamination estimates were cross-checked against the lineage-specific marker genes from CheckM v1.0.11 (Parks et al., 2015), and bins that still had CheckM contamination > 10% were further refined manually by splitting. Final completion values for the refined genome bins were also estimated with CheckM. Read coverages of the MAGs in each sample were calculated with Anvi’o, normalized to the total number of reads in each sample, and finally scaled to binned reads in each sample. Genome bins were analyzed for the presence of KEGG (Kanehisa et al., 2017) and MetaCyc (Caspi et al., 2018) pathways by mapping GO terms of each bin to Enzyme Commission (EC) categories and reconstructing parsimonious pathways with MinPath (Ye and Doak, 2009). The reconstructed genomes were also analyzed for the presence of marker genes (Table S1) and MetaCyc pathways (Table S2) for core elemental cycles (C, N, P, and S), together with metal regulation and homeostasis genes. Abundances of individual MetaCyc pathways were calculated by summing the sample-wise normalized abundances of each MAG indicated to contain the pathway. The abundance of the individual marker genes in the separate samples was calculated from gene-level read coverages in Anvi’o, followed by normalization to total reads in a sample.
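The abundance calculations described above can be illustrated with a minimal Python sketch. It is not the code used in the study: the function names, the dictionary layout, and the toy numbers are assumptions made for illustration, and the final scaling to binned reads is left out because the text does not specify it in enough detail to reproduce.

```python
from collections import defaultdict


def normalize_to_total_reads(coverage, total_reads):
    """Divide each MAG's coverage by the total read count of its sample.

    coverage:    {mag_id: {sample_id: read coverage}}   (hypothetical layout)
    total_reads: {sample_id: total number of reads in that sample}
    """
    return {
        mag: {s: cov / total_reads[s] for s, cov in by_sample.items()}
        for mag, by_sample in coverage.items()
    }


def pathway_abundance(normalized, mag_pathways):
    """Sum the normalized abundance of every MAG that contains a pathway."""
    totals = defaultdict(lambda: defaultdict(float))
    for mag, pathways in mag_pathways.items():
        for pathway in pathways:
            for sample, value in normalized.get(mag, {}).items():
                totals[pathway][sample] += value
    return {pathway: dict(by_sample) for pathway, by_sample in totals.items()}


# Toy, made-up numbers (not data from the study).
coverage = {"MAG_1": {"S1": 20.0, "S2": 10.0}, "MAG_2": {"S1": 5.0, "S2": 30.0}}
total_reads = {"S1": 100.0, "S2": 200.0}
mag_pathways = {"MAG_1": {"denitrification", "sulfate reduction"},
                "MAG_2": {"denitrification"}}

normalized = normalize_to_total_reads(coverage, total_reads)
print(pathway_abundance(normalized, mag_pathways))
```

Under this reading, a pathway's abundance in a sample is simply the summed abundance of the MAGs that carry it, so one abundant MAG can dominate a pathway's signal.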
To estimate the robustness of the MAG-based community composition in the samples, reads identified as 16S rRNA genes were extracted from the dataset, and assembled with MATAM (Pericard et al., 2018). The 16S rRNA gene contigs were then classified against the SILVA 128 NR95 database (Quast et al., 2013) with the RDP Classifier (Wang et al., 2007). The abundances of the operational taxonomic units (individual 16S rRNA contigs) in MATAM were calculated as the proportion of 16S rRNA reads mapping to each contig per sample.
For phylogenetic and taxonomic comparisons of MAGs to reference data, all non-redundant (one per species) complete bacterial and archaeal genomes, and all the available Candidate Phyla Radiation (CPR; Brown et al., 2015; Hug et al., 2016) genomes, were downloaded from the NCBI GenBank database (Benson et al., 2012). As of June 2018, this comprised 3,362 bacterial, 240 archaeal and 3,561 CPR genomes. A further 71 CPR genomes were added from Danczak et al. (2017). Open reading frames were used if available, or they were identified de novo with Prodigal. Functional annotations for genes and pathways in these genomes were performed de novo as described above for MAGs. For assessing the phylogeny, sequences of 16 ribosomal proteins were extracted from both the NCBI genomes and our MAGs that were > 70% complete (following Hug et al., 2016; Table S3). The sequences were aligned per ribosomal protein with MAFFT 7.402 (Katoh and Standley, 2013) using translated protein sequences, and back-translated to the original nucleotide sequences with TranslatorX (Abascal et al., 2010). Badly aligned sequences were removed, and the alignments were trimmed with trimAl 1.2rev59 using the ‘-gappyout’ mode (Capella-Gutiérrez et al., 2009). Phylogenetic trees were constructed from each ribosomal protein with FastTree 2.1.9, compiled with double precision to accurately estimate short branch lengths (as recommended in the manual), and using the GTR + Γ model of sequence evolution (e.g., Aris-Brosou and Rodrigue, 2012). Sequences with unexpectedly long branches in the individual ribosomal protein trees were then removed with treeshrink (Mai and Mirarab, 2018) with tolerance of false positives ‘--quantiles’ set to 0.01. The trimmed alignments were concatenated for each genome with gap characters added for missing ribosomal proteins. NCBI genome entries with more than 25% gaps and MAGs with more than 50% gaps over the full alignment were then removed.
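The concatenation and gap-filtering step can be sketched in a few lines of Python. This is an illustrative reading of the description above, not the authors' script: the function names, the `is_mag` callback, and the toy alignments are hypothetical, and each per-protein alignment is assumed to contain sequences of equal length.

```python
def concatenate(alignments, genomes):
    """Concatenate per-protein alignments, padding missing proteins with gaps.

    alignments: list of {genome_id: aligned sequence}, one dict per protein.
    genomes:    every genome id that should appear in the output.
    """
    parts = {g: [] for g in genomes}
    for aln in alignments:
        length = len(next(iter(aln.values())))  # alignment width for this protein
        for g in genomes:
            parts[g].append(aln.get(g, "-" * length))
    return {g: "".join(p) for g, p in parts.items()}


def gap_fraction(seq):
    """Proportion of gap characters in a concatenated sequence."""
    return seq.count("-") / len(seq) if seq else 1.0


def filter_by_gaps(concatenated, is_mag, ref_max=0.25, mag_max=0.50):
    """Drop references with > 25% gaps and MAGs with > 50% gaps."""
    kept = {}
    for genome, seq in concatenated.items():
        limit = mag_max if is_mag(genome) else ref_max
        if gap_fraction(seq) <= limit:
            kept[genome] = seq
    return kept


# Toy alignments (made up): ref2 lacks the first protein, mag1 lacks the third.
alignments = [
    {"ref1": "ACGT", "mag1": "AC-T"},
    {"ref1": "GGCC", "ref2": "GGCA", "mag1": "GGCC"},
    {"ref1": "TTAA", "ref2": "TTAA"},
]
concatenated = concatenate(alignments, ["ref1", "ref2", "mag1"])
kept = filter_by_gaps(concatenated, is_mag=lambda g: g.startswith("mag"))
print(sorted(kept))
```

In this toy case the reference with one missing protein is dropped while the MAG with a larger share of gaps is retained, which mirrors the asymmetric cut-offs described in the text.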
The higher cut-off on gap proportion for MAGs (50%) compared to the reference genomes (25%) was used to enable inclusion of a higher number of MAGs in the downstream analyses. However, together with the low cut-off used in the binning step (all > 1 kbp contigs included), this could have increased the phylogenetic uncertainty of our taxonomic assignments. Thus, for each of the ribosomal protein trees, we calculated the pairwise phylogenetic distance of each MAG against the reference genomes with the ‘cophenetic’ function from ape (Paradis et al., 2004), and counted the number of incidences of each reference genome among the ten closest genomes for each MAG. The proportional incidence of the most commonly found reference genome for each MAG was then regressed against the number of different contigs containing ribosomal proteins in that MAG with a least-squares linear regression, where the number of contigs was log10-transformed. Finally, the phylogenetic tree containing both the MAGs and the reference genomes was constructed with FastTree 2.1.9 under the GTR + Γ model of sequence evolution, as above.
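A rough Python sketch of the uncertainty check described above follows. It assumes the pairwise distances have already been computed (in the study this was done in R with ape), and the data layout, function names, and toy values are hypothetical. The incidence is expressed relative to the number of trees in which a MAG appears, which is one plausible reading of "proportional incidence".

```python
import math
from collections import Counter


def top_reference_incidence(distances_per_tree, k=10):
    """For each MAG, find the reference most often among its k closest genomes.

    distances_per_tree: list of {mag_id: {reference_id: distance}}, one dict
    per ribosomal protein tree (hypothetical layout).
    Returns {mag_id: (reference_id, proportion of trees it appeared in)}.
    """
    counts, n_trees = {}, Counter()
    for tree in distances_per_tree:
        for mag, dists in tree.items():
            closest = sorted(dists, key=dists.get)[:k]
            counts.setdefault(mag, Counter()).update(closest)
            n_trees[mag] += 1
    result = {}
    for mag, counter in counts.items():
        reference, hits = counter.most_common(1)[0]
        result[mag] = (reference, hits / n_trees[mag])
    return result


def least_squares(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Toy example: incidence regressed on log10(number of contigs).
contigs = [1, 10, 100]
incidence = [1.0, 0.8, 0.6]
slope, intercept = least_squares([math.log10(c) for c in contigs], incidence)
print(slope, intercept)
```

A slope close to zero in such a regression would indicate that MAGs whose ribosomal proteins are spread over many contigs are placed no less consistently than those on few contigs.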
Taxonomy analysis and functional potential of the microbial community
For further data analysis, only MAGs with < 50% gaps over the complete alignment were preserved (n = 55; Table S4). The phylogenetic tree containing the MAGs and reference genomes was visualized with ggtree (Yu et al., 2017), and annotated based on NCBI taxonomy. The taxonomy of the MAGs was manually annotated based on the lowest level in a monophyletic clade starting from a node with a support value of at least 0.5 (the full tree is included in the SI both as a PDF and a Newick tree file). For MAGs with uncertain taxonomy assignments at the phylum level, 16S rRNA sequences were extracted from their genomes (> 200 bp; n = 2) with ssu-align 0.0.1 (Nawrocki and Eddy, 2010). The 16S rRNA sequences of their closest relative reference genomes based on the ribosomal protein tree were also extracted. The MAG 16S rRNA sequences were matched against NCBI’s nr database with BLASTN 2.8.1+ (Zhang et al., 2000) and the top 50 hits for each query were downloaded. These collections, consisting of 16S sequences from the MAG itself, Thermanaerovibrio acidaminovorans DSM 6589, Candidatus Caldatribacterium saccharofermentans OP9-77CS, Acetomicrobium hydrogeniformans (NCBI Reference Sequence: NR_116842.1), and the 50 top matches from the BLASTN searches, were then aligned and trees were built with the Silva Alignment and Tree service (Pruesse et al., 2012), using FastTree under the GTR + Γ model of sequence evolution. The trees were then rooted using the Acetomicrobium hydrogeniformans 16S rRNA gene as the outgroup.
Phyloseq 1.24.0 (McMurdie and Holmes, 2013) was used to manage abundance and phylogenetic data. Phylum-level microbial community composition was qualitatively compared between the 16S rRNA assembly and the genome assembly of 55 MAGs, where taxonomy assignments were based on the tree of ribosomal proteins. Differences in the phylum-level 16S rRNA based taxonomy between the samples were also assessed by fitting a generalized linear model with a quasi-binomial family (using ‘glm’; R Core Team, 2018) to the data. To examine patterns in community structure and function of the MAGs, between-sample distances were first calculated for genome phylogeny using patristic distances with Double Principal Coordinate Analysis (DPCoA; Pavoine et al., 2004). For 16S-based taxonomy and functional pathways, Bray-Curtis dissimilarities were calculated in vegan 2.5.2 (Oksanen et al., 2018). The distance matrices were ordinated with NMDS, and envfit (Oksanen et al., 2018) was used to correlate the ordinations with sample physicochemistry. To further compare clustering patterns, the distance matrices were dimensionally reduced with t-Distributed Stochastic Neighbor Embedding (t-SNE; van der Maaten and Hinton, 2008) with the R package Rtsne 0.13 (Krijthe and van der Maaten, 2017) and ‘perplexity’ set to 1.6. Clusters were then detected with Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN; Campello et al., 2013) in dbscan 1.1.2 (Hahsler et al., 2018) with ‘minPts’ set to 2.
The marker genes and MetaCyc pathways were classified into environmentally relevant cellular processes (Carbon (C), Nitrogen (N), Sulfur (S), Phosphorus (P), and toxic metal cycling) and these were divided into categories (Table S5). Some pathways were deemed to be misclassified based on their usual taxonomic range specified in the MetaCyc pathway descriptions and subsequently removed from the data. The proportion of genomes that had at least one marker gene or pathway for a process was quantitatively compared between the MAGs (n = 55) and the reference genomes subset to phyla shared with the MAGs (n = 2,486). The homogeneity of pathway abundances between the MAGs and the reference genomes was assessed with a Pearson’s χ² test where P-values were estimated based on 10,000 permutations. Differentially abundant processes for each comparison, which contributed more than their equal share (out of n = 46 processes) to the total χ² score, were visualized with heatmaps of their Pearson residuals. Finally, the gene-level read coverages (normalized to total number of reads per sample) of N, S, P, and toxic metal processing marker genes in the Lake Hazen sediments were compared between the two sites and between oxic (> 0.0 mg L⁻¹) and anoxic (0.0 mg L⁻¹) samples. The significances of the differences were assessed with pairwise t-tests, using ‘mt’ (False Discovery Rate-based correction for multiple testing) from Phyloseq 1.24.0 (McMurdie and Holmes, 2013). To identify the most abundant MAGs and phyla for each process across the samples, the read coverage of each of the 55 MAGs (relative to total number of reads per sample) was averaged over all six samples (Table S6). The relative abundances of the MAGs were also summed together at the phylum level (or class for Proteobacteria) by processes identified through MetaCyc pathways or individual marker genes (Table S7). The most abundant phylum in each process was then identified for inclusion in the flow-charts of the N, S, and Hg cycles (Figure 5).
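The selection of differentially abundant processes by their contribution to the χ² score can be sketched as follows. This Python example is an assumption-laden illustration rather than the study's R code: it treats the data as a two-row contingency table (MAGs versus reference genomes, one column per process), assumes every column total is positive, and does not reproduce the permutation-based P-values. The function names and counts are hypothetical.

```python
import math


def pearson_residuals(table):
    """Pearson residuals (observed - expected) / sqrt(expected) for a count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    return [
        [(obs - rt * ct / grand) / math.sqrt(rt * ct / grand)
         for obs, ct in zip(row, col_totals)]
        for row, rt in zip(table, row_totals)
    ]


def flag_processes(table, names):
    """Return the chi-square score and the processes exceeding an equal share of it."""
    residuals = pearson_residuals(table)
    contributions = [sum(row[j] ** 2 for row in residuals) for j in range(len(names))]
    chi2 = sum(contributions)
    equal_share = 1 / len(names)
    flagged = [name for name, c in zip(names, contributions)
               if chi2 > 0 and c / chi2 > equal_share]
    return chi2, flagged


# Toy counts (made up): rows are MAGs and reference genomes, columns are processes.
table = [[30, 10, 10],
         [10, 10, 30]]
chi2, flagged = flag_processes(table, ["process_a", "process_b", "process_c"])
print(round(chi2, 3), flagged)
```

With 46 processes, as in the text, the threshold would be 1/46 of the total score; the sign of each flagged residual then shows whether the process is over- or underrepresented in the MAGs.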
Primary sequence data produced in this study are available in the NCBI Sequence Read Archive and the 55 medium-quality MAGs are available in GenBank (https://www.ncbi.nlm.nih.gov/bioproject/PRJNA525692). The geochemical data are available in the NSF Arctic Data Center online repository (https://arcticdata.io/catalog/#view/doi:10.18739/A2SJ19Q9P). Shell scripts and code used in R 3.5.2 (R Core Team, 2018) are available in the supplementary material and through GitHub (https://github.com/Begia/Hazen-metagenome).
Results and discussion
Reconstruction of Metagenome Assembled Genomes (MAGs)
A total of 115.6 Gbp of sequence in 685 million reads were obtained from the six samples (Figure S2). The reads were co-assembled into 5.3 million contigs consisting of 3.7 Gbp of non-redundant sequences. The longest contig was 408.1 kbp and the N50 was 748 bp. The performance of our assembly was slightly better than when the same assembler was used for a complex soil metagenome (Howe et al., 2014; Li et al., 2015), but comparable to other studies of sediments with the same assembler (e.g., Carr et al., 2018; Lau et al., 2018). We sorted the contigs into 1,488 bins with < 10% contamination, of which 146 were at least ‘medium-quality draft’ assemblies with > 50% completion (after Bowers et al., 2017). The remaining bins were ‘low-quality draft’ assemblies at < 50% completion (n = 1,342). On average, 58% of all reads per sample were included in the 1,488 bins.
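For readers unfamiliar with the N50 statistic reported above, a minimal Python sketch of its usual definition follows: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly length. The function name and the toy contig lengths are illustrative only.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths."""
    if not lengths:
        raise ValueError("at least one contig length is required")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down until half of the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


# Toy contig lengths in bp (made up, not from the study).
print(n50([2, 2, 2, 3, 3, 4, 8, 8]))
```

Because the statistic is driven by the longest contigs, an assembly with millions of short contigs can have a low N50 even when individual genome bins are far more contiguous.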
Of these 146 medium-quality draft genomes, 68 were found to be > 70% complete, of which only a subset of 55 MAGs had > 50% of the nucleotides in the concatenated alignment of ribosomal proteins, deemed to be the minimum amount of information for placing them in the genome tree. On average, these 55 MAGs had a completeness of 88.2% based on single-copy genes, a GC content of 52.9%, an N50 of 22 kbp, a genome size of 3.1 Mb, and contained 3,117 genes with a coding density of 90% (Table S4). Note that we used a 1 kbp lower bound on contig size during the binning process, essentially to increase the number of contigs included in this step; this bound is somewhat lower than those used recently (e.g., 2.5 kbp and 5 kbp in Delmont et al., 2018). However, only eight MAGs (14.5% of the total) appeared to have low N50 values (2.3–5 kbp), and the phylogenetic uncertainty in the taxonomic assignments of the MAGs was not correlated with the number of contigs containing ribosomal proteins in each MAG (P = 0.9). Thus, while the 1 kbp cut-off that we used in binning and taxonomy assignment of the MAGs might have increased their fragmentation, this liberal cut-off did not increase the uncertainty of their phylogenetic placement. These are key considerations for the reliability of the downstream analyses, because both the reconstructions of the metabolic pathways of the 55 MAGs (with MinPath) and their phylogenetic placement (with ribosomal proteins) are based on the collection of annotated genes contained in the individual MAGs.
294
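For readers less familiar with the N50 statistic discussed above, it is the contig length such that contigs of that length or longer contain at least half of all assembled bases. The sketch below computes it for a hypothetical list of contig lengths and shows how a lower bound on contig size changes the value; the function name and the example lengths are illustrative assumptions, not data from this study.

```python
def n50(lengths, min_length=0):
    """Return the N50 of the contigs that are at least min_length long."""
    kept = sorted((n for n in lengths if n >= min_length), reverse=True)
    if not kept:
        raise ValueError("no contigs left after length filtering")
    half_total = sum(kept) / 2
    running = 0
    for length in kept:
        running += length
        if running >= half_total:
            # First contig at which the cumulative length reaches half the total.
            return length

# Hypothetical assembly: many short contigs and a few long ones (lengths in bp).
contig_lengths = [400] * 10 + [1000, 2000, 6000]

n50_all = n50(contig_lengths)                        # all contigs
n50_filtered = n50(contig_lengths, min_length=1000)  # 1 kbp lower bound
print(n50_all, n50_filtered)
```

In this toy example the unfiltered N50 is 2000 bp and the filtered N50 is 6000 bp, which illustrates why N50 values are only comparable between assemblies or bins when the same length cut-off is applied.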
Differential recovery of MAGs compared to 16S rRNA gene data
The most common phyla (or classes for Proteobacteria) in the MAGs were Verrucomicrobia (14%), Betaproteobacteria (13%), Alphaproteobacteria (12%), Planctomycetes (10%) and Actinobacteria (9%; Figure 1a). To quantify the reliability of this taxonomic profile, we also binned the raw reads separately with the MATAM pipeline, which is specifically designed to identify 16S rRNA genes (Pericard et al., 2018). This analysis, performed with the default settings of the pipeline, produced 166 OTUs that covered at least 500 bp of the complete 16S rRNA gene, about a third of its length. Among these OTUs, the most common bacterial and archaeal phyla were Betaproteobacteria (20%), Woesearchaeota (13%), Alphaproteobacteria (10%), Actinobacteria (8%) and Deltaproteobacteria (5%; Figure 1b). As a result, the 55 medium-quality MAGs represented a differential recovery of the total (16S rRNA reads-derived) microbial community, apparently missing at least two major taxonomic groups, Archaea and Firmicutes. This may be due to challenges inherent to identifying taxa based on 16S data and discrepancies between the NCBI and the SILVA databases (Parks et al., 2018), as well as at least two additional issues.
First, while we identified 25 archaeal bins in the assembly, only six of them were over 10% complete and none were more than 70% complete. It should be noted that the six sequenced samples had similar community compositions based on the 16S rRNA reads-derived data (Figure 1b; all P = 1.0 at the phylum-level), which might have increased the binning difficulty because binning utilizes differences in read coverages of contigs between samples (e.g., Alneberg et al., 2014). Second, Firmicutes appeared to be underrepresented among the assembled bins. Only nine low-quality Firmicutes genomes were assembled, of which seven were between 10% and 50% complete, and the rest below 10% complete. This underrepresentation of Firmicutes in the assembled genomes might have been caused by their endosporulation affecting DNA extraction, making them detectable only by marker genes (Filippidou et al., 2015). Despite the difficulties related to the metagenomic assembly, the taxonomic composition of the microbial community in Lake Hazen sediments was very similar to that in our previous study of the same sites in spring 2015 (Ruuskanen et al., 2018). However, several groups that were found here to be highly prevalent (Archaea, Omnitrophica) were likely missed in the earlier study because of PCR selection bias (Suzuki and Giovannoni, 1996). The community was also similar to those in previous metagenomic studies of oligotrophic lake sediments (e.g., Wang et al., 2016). One notable exception to previous studies (e.g., Rautio et al., 2011) was that Cyanobacteria were much less common in Lake Hazen. This is likely due to most other arctic studies having focused on shallow thaw ponds, whereas Lake Hazen is ~260 m deep, with light penetration restricted to the upper ~25 m of the water column. Furthermore, other physicochemical characteristics of Lake Hazen (such as ultra-oligotrophy) may not be favorable to Cyanobacteria (St. Pierre et al., 2019), which might affect their abundance in sediments as well. It should also be noted that at least the phylum-level microbial community composition in Lake Hazen sediments appeared to be stable from spring 2015 to summer 2016, despite high sedimentation rates in summer 2015 resulting from enhanced glacial melt (St. Pierre et al., 2019).
Figure 1. Composition of the microbial communities in the samples from Lake Hazen sediments. (a) Medium-quality draft MAGs (n = 55) with manual taxonomy assignments based on reference genomes in a phylogenetic tree constructed from ribosomal proteins. The phylum Proteobacteria is here subdivided into classes. (b) Raw reads binned into 16S rRNA contigs (n = 166) with MATAM and automatic taxonomy assignments from the SILVA 128 NR95 database.
To understand how the physicochemistry of the lake sediments (Figure S3) can drive community structure and function, we used ordination and clustering methods to compare the samples (for an in-depth analysis of Lake Hazen contemporary limnology, see St. Pierre et al., 2019), first based on the 16S-derived data. Here, the sites from Lake Hazen did not have significantly different microbial communities (based on 10,000 permutations; P = 0.10), and Cl⁻ concentration was the only chemical variable with a significant linear correlation with the differences in microbial community structure (P = 0.03; Figure S4). However, the Cl⁻ concentration was very low, varied within a narrow range (0.11-0.27 mg L⁻¹), and covaried with both pH and SO₄²⁻ concentration, which are both known to influence the community structure in Lake Hazen (Ruuskanen et al., 2018). Furthermore, we observed no significant differences in community structure along the redox gradient (P = 0.77) or O₂ concentration (P = 0.23), which is only partly consistent with a previous study of Lake Hazen sediments that showed that the microbial community was not constrained by oxygen concentration, but that the redox gradient was associated with community structure (Ruuskanen et al., 2018). This discrepancy is, however, not unexpected, as the range of redox potentials of the samples in the current study (Figure S3) was much narrower than in the previous study. Furthermore, the O₂ concentration likely plays some role in the community structure and function in Lake Hazen sediments, but the gradient can be extremely steep (Figure S3; Ruuskanen et al., 2018), and differences might only be seen in comparisons of much thinner sediment horizons.
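To make the logic of a permutation-based comparison of communities between sites concrete, the following is a minimal sketch of a PERMANOVA-style test on Bray-Curtis dissimilarities. It is not the code used in this study: the choice of dissimilarity and of the pseudo-F statistic are assumptions, and the abundance profiles and site labels are hypothetical.

```python
import random
from itertools import combinations

def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance profiles."""
    num = sum(abs(x - y) for x, y in zip(a, b))
    den = sum(x + y for x, y in zip(a, b))
    return num / den if den else 0.0

def pseudo_f(dist, labels):
    """Pseudo-F statistic computed from a distance matrix and group labels."""
    n = len(labels)
    groups = sorted(set(labels))
    ss_total = sum(dist[i][j] ** 2 for i, j in combinations(range(n), 2)) / n
    ss_within = 0.0
    for g in groups:
        idx = [i for i, lab in enumerate(labels) if lab == g]
        ss_within += sum(dist[i][j] ** 2 for i, j in combinations(idx, 2)) / len(idx)
    ss_between = ss_total - ss_within
    return (ss_between / (len(groups) - 1)) / (ss_within / (n - len(groups)))

def permutation_test(profiles, labels, n_perm=10_000, seed=0):
    """Return the observed pseudo-F and its permutation P value."""
    rng = random.Random(seed)
    n = len(profiles)
    dist = [[bray_curtis(profiles[i], profiles[j]) for j in range(n)] for i in range(n)]
    observed = pseudo_f(dist, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # reassign site labels at random
        # Small tolerance so that equivalent groupings are not lost to rounding.
        if pseudo_f(dist, shuffled) >= observed - 1e-12:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical taxon counts for six samples: three per site.
profiles = [[10, 0, 1], [9, 1, 0], [11, 0, 2],
            [0, 10, 1], [1, 9, 0], [0, 11, 2]]
labels = ["site_A"] * 3 + ["site_B"] * 3

observed_f, p_value = permutation_test(profiles, labels)
print(observed_f, p_value)
```

One point worth keeping in mind when reading such P values: with three samples per group there are only 20 distinct ways to split six samples into two groups of three, and two of them reproduce the observed grouping, so a test of this kind cannot return a P value much below 0.1 however different the groups are.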
To assess the validity of these 16S-derived results, we turned to the 55 MAGs, which also showed no significant differences in terms of community structure among samples, or in terms of correlations between community structure and physicochemistry (all P > 0.10). Similarly, a clustering analysis was unable to fully separate the sites based on these 55 MAGs (Figure S5A), whereas the 16S rRNA contig data exhibited more differences among sites (with identical settings as for the MAGs; Figure S5B); note, however, that these clustering analyses are qualitative by nature, as no statistical tests were performed. Finally, we saw no separation of the samples in clustering when using functional pathway data from the 55 MAGs (derived with MetaCyc; Figure S6). This was also likely due to the small differences in the community composition of the two sites. As the communities appeared to be similar at both sampling sites, we pooled the data from both sites for all downstream analyses, in order to assess the extent of phylogenetic and potential metabolic microbial diversity in those lake sediments.
Lake Hazen sediments harbor phylogenetically diverse bacteria
To place the 55 MAGs in a phylogenetic and taxonomic context, we reconstructed a tree adding 5,942 reference genomes to our MAG collection (Figure 2). The analysis showed that five of the MAGs (9%) likely represented CPR bacteria from previously known, NCBI-classified, candidate phyla (namely: Kerfeldbacteria, Falkowbacteria, Berkelbacteria, Pacebacteria and Shapirobacteria). This result highlights the importance of sequencing samples from rarely studied environments, such as High Arctic lakes. Two of these MAGs displayed similar characteristics to previously studied CPR (Hug et al., 2016), such as small genome size (< 1.3 Mbp) and a low number of genes (1309 and 1255 open reading frames; Table S4). In addition, three MAGs could not be classified to any known phylum by either their ribosomal proteins or their partial 16S rRNA gene sequences (SH-like aLRT support < 50%). We note, however, that among these three unclassified MAGs, LH_MA_65_9 was likely related to uncultured bacteria close to candidate division OP11 (Figure S7), and LH_MA_57_9 was likely related to bacteria close to candidate division OP10 (Figure S8), a division mostly sampled from lake sediments. For example, the sediments of Upper Mystic Lake (Massachusetts, USA, www.ncbi.nlm.nih.gov/nuccore/DQ166697.1) contained the closest match to LH_MA_65_9 (99% BLAST identity).
Figure 2. Phylogenetic tree of MAGs aligned against reference genomes annotated at phylum-level based on the NCBI taxonomy database. Genomes identified in NCBI as belonging to each phylum are indicated with different colors, which match those in Figure 1. Black diamonds indicate the phylogenetic placement of the MAGs assembled in the current study. The scale bar is in units of substitutions per site. The full tree with SH-like node support values is included in the supplementary files (Full_tree_with_supports.pdf and Full_tree_with_supports.nwk).
Lake Hazen MAGs are enriched in genes to thrive at low temperatures and under oligotrophy
To test if these reconstructed genomes from Lake Hazen possess unique metabolic features, we quantified the presence of selected pathways in both the MAGs and a set of reference genomes at the same taxonomic rank (phylum level; n = 2,486 genomes), and compared their prevalence (Figure 3). In particular, we found that the prevalence of marker genes and pathways for cellular metabolism and nutrient cycling (Table S5) was significantly different between the two sets (Pearson’s χ² test; χ² = 361.25; P = 0.0001 based on 10,000 permutations).
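The kind of comparison described above can be sketched as follows: Pearson's χ² statistic is computed on a contingency table of genome counts, and the Pearson residuals show which cells drive the difference (the quantity displayed as a heatmap in Figure 3b). The table layout (pathways in rows, data sets in columns), the pathway names and the counts are hypothetical, and the permutation-based P value is not reproduced here.

```python
import math

def chi_square_with_residuals(table):
    """Return Pearson's chi-squared statistic and the matrix of Pearson residuals.

    table is a list of rows of observed counts; all margins are assumed nonzero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    residuals = []
    for i, row in enumerate(table):
        residual_row = []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            residual = (observed - expected) / math.sqrt(expected)
            residual_row.append(residual)
            statistic += residual ** 2  # chi-squared is the sum of squared residuals
        residuals.append(residual_row)
    return statistic, residuals

# Hypothetical counts of genomes carrying each pathway: [MAGs, reference genomes]
pathways = ["pathway_1", "pathway_2"]
counts = [[10, 20],
          [30, 40]]

chi2, pearson_residuals = chi_square_with_residuals(counts)
for name, row in zip(pathways, pearson_residuals):
    print(name, [round(r, 3) for r in row])
print(round(chi2, 4))
```

A positive residual in the MAG column indicates a pathway found more often than expected in the MAGs, and a negative residual indicates one found less often than expected.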
Among the most differentially represented pathways, three were overrepresented in the MAGs (Figure 3b). Two of these, glycerol (glycerophosphodiester) degradation and fatty acid / lipid biosynthesis, are likely linked with temperature tolerance and energy conservation strategies. Indeed, both cold stress (Chintalapati et al., 2004) and starvation (Lever et al., 2015) can induce changes in the lipid composition of microbial cell membranes. A high prevalence of fatty acid desaturases was also recently seen in several Antarctic lake metagenomes (Koo et al., 2018), suggesting that these pathways are important for psychrotrophic microbes, which is relevant here given that water temperatures in Lake Hazen are around 3.5°C below a depth of 50 m (St. Pierre et al., 2019). Alternatively, triacylglycerols could be utilized for energy storage in the form of lipid droplets (Alvarez and Steinbüchel, 2002). In addition to energy storage, lipids, in particular when present as droplets, might also play a role in regulating the stress response in bacteria (Zhang et al., 2017). Both of these roles could be beneficial to the bacteria harboring them in the Lake Hazen sediments, as most nutrients and oxygen are delivered to the lake in glacial meltwater during summer (St. Pierre et al., 2019). This short period of higher nutrient and oxygen availability correlates with a temporary jump in microbial activity (St. Pierre et al., 2019), during which energy might be stored as triacylglycerols that are then gradually consumed to maintain metabolism during the long winter.
Figure 3. Cellular processes and metabolism in reference genomes and MAGs. (a) Percent representation of genomes in each data set (reference genomes and MAGs) with at least one marker pathway or gene for each process or metabolism. Names of pathways or processes with the most differentially abundant presence in MAGs compared to the reference genomes subset to common phyla are indicated in bold italics. (b) The most differentially abundant processes according to the Pearson’s χ² test (P = 0.001) as a heatmap of their Pearson residual scores.
The third overrepresented pathway, Dissimilatory Nitrite Reduction to Ammonia (DNRA) / Polysulfide reduction, shows that the MAGs are also enriched in membrane-linked molybdopterin oxidoreductase genes of the NrfD/PsrC family, which includes genes coding for tetrathionate-, dimethyl sulfoxide-, polysulfide-, and nitrite reductases (Jormakka et al., 2008). It is more likely that these genes in the MAGs (matching the annotation Pfam 03916) are related to sulfur reduction than to DNRA, because the more specific markers for DNRA, such as nrfH, nirB and nirD, were rarer in the MAGs than in the reference genomes (Figure 3a). Furthermore, the same marker gene (nrfD) has been found, for instance, in metagenomes from sediments of saline (Ferrer et al., 2011) and freshwater lakes (Lin et al., 2011), where it was associated with anaerobic respiration of oxidized sulfur compounds. The nrfD gene is thus likely also important for anaerobic respiration in the mostly anoxic sediments of Lake Hazen.
433
Underrepresented pathways in the Lake Hazen sediment MAGs (Figure 3b) might be
434
underutilized and thus subject to genome streamlining, which is common in oligotrophic organisms
435
(Giovannoni et al., 2014). These pathways include, for example, nitrogen assimilation as NO3- and
436
NH4+. Likely, because of the low concentration of inorganic nitrogen in Lake Hazen (St. Pierre et al.,
437
2019) it might be directly assimilated only by a small number of organisms. The majority of microbes
438
might instead resort to recycling of existing organic nitrogen compounds or nitrogen fixation to fulfill
439
their needs for this nutrient (Figure 3). Similarly, aromatic compound degradation, methylotrophic
440
metabolism and sulfur compound oxidation (largely desulfonation) were rarer in the MAGs than
441
reference genomes. The lower prevalence of these pathways might also be a consequence of low
442
nutrient and substrate availability in Lake Hazen and the low number of methanogens (this study,
443
Emmerton et al., 2016; Ruuskanen et al., 2018; St. Pierre et al., 2019b). In addition, selenate reduction
444
was underrepresented in the MAGs. This pathway is usually important for anaerobic metabolism
445
(Staicu and Barton, 2017), but in Lake Hazen its lower prevalence could similarly reflect the low
446
availability of selenium because the site is distant from anthropogenic sources (e.g., Chapman et al.,
447
2010). It should be noted, that marker genes used to identify the aforementioned pathways may also
448
have been missed due to technological biases, on our focusing the analysis on only the 55 most
449
complete MAGs, and / or their incompleteness. Despite these potential biases however, the
450
underrepresentation of these specific pathways in the MAGs makes sense in the light of the conditions
451
.CC-BY-NC-ND 4.0 International licenseIt is made available under a
was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint (which. http://dx.doi.org/10.1101/724781doi: bioRxiv preprint first posted online Aug. 5, 2019;
Metagenome of High Arctic lake sediments 23
in Lake Hazen sediments. Indeed, Lake Hazen is rapidly changing with the increased delivery of
452
sediment, nutrients, organic carbon and contaminants, perpetuated by enhanced glacial melt throughout
453
the watershed (Lehnherr et al., 2018). The dense and turbid glacial waters entering Lake Hazen flow
454
directly to the bottom of the lake (St. Pierre et al., 2019), and this might effectively lead to the
455
disappearance of oligotrophic ecological niches in the sediments with increased productivity.
456
Spatial homogeneity of nutrient / toxic metal cycling in Lake Hazen sediments
457
Finally, to investigate more closely nutrient and toxic metal cycling in the sediments and evaluate the
458
possible contribution of different microbes to these cycles, we identified marker genes or pathways for
459
these relevant processes. However, because the most parsimonious pathways are calculated using
460
marker genes present in a single genome bin, it is possible that low quality bins could have strongly
461
biased pathway inference (Ye and Doak, 2009). To alleviate this issue, we analyzed differences in
462
abundances of all individual marker genes found in all assembled contigs in the metagenome longer
463
than 1 kbp and not only the 55 MAGs using gene-level normalized read coverages summarized per
464
each marker gene (Figure 4).
465
Comparing within and between sites, as well as oxic (surface samples; < 0.5 cm) vs. anoxic
466
samples (deeper samples are completely anoxic at both sites (Figure S3), we found that none of the
467
individual marker genes for the nutrient cycles were differentially abundant between the two sites
468
(pairwise t-tests, all P > 0.9 after FDR correction) or between the oxic and anoxic sediment horizons
469
(pairwise t-tests, all P > 0.8 after FDR correction). Thus, for the next analysis we averaged the
470
normalized abundances of the 55 MAGs across all six samples (the two sites and three depths) to
471
identify the likely key organisms in the nitrogen, sulfur, and mercury cycles which improved our
472
ability to investigate these more fine-grain patterns of nutrient cycling.
473
474
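Because one test is run per marker gene, the P values above are corrected for multiple testing. The text does not state which FDR procedure was used; the sketch below assumes the widely used Benjamini-Hochberg adjustment, and the raw P values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending P
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonic adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw P values, one per marker gene.
raw_p = [0.01, 0.04, 0.03, 0.5]
adjusted_p = benjamini_hochberg(raw_p)
print([round(p, 4) for p in adjusted_p])
```

When most raw P values are large, as would be expected if marker gene abundances barely differ between groups, the adjusted values are pushed toward 1, which is in line with the uniformly high corrected P values reported above.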
Figure 4. Normalized read coverages of individual marker genes from the whole metagenomic data across the samples. Samples from oxic environments (O₂ concentration > 0.0 mg L⁻¹) are indicated in bold. Differences in abundances of individual marker genes between the two sites (Deep Hole and Snowgoose Bay) and oxygen levels (oxic and anoxic) were not significant (pairwise t-test; all P > 0.8 after FDR correction). Gene family/domain annotations used to identify the individual marker genes are shown in Table S5.
Based on these average read coverages, we identified the most abundant MAGs that included markers for nitrogen, sulfur and mercury cycling (Table S6), as summarized in Table S7 for each ecologically important process in the nutrient cycles and shown in Figure 5. Proteobacteria (mostly beta- and alpha-) were overall the most abundant phylum that had marker genes or pathways for nitrogen and sulfur cycling, but other phyla were often the most abundant when looking at individual processes (SI text; Table S7). The metagenome also contained markers for the full sulfur cycle, although the genes for dissimilatory sulfate reduction to sulfite (aprAB) and dissimilatory sulfite reduction to sulfide (dsrAB) were not detected specifically in the 55 MAGs (SI text; Figure 5b). While no marker genes for mercury methylation (hgcA; Pfam 03599) were found in the MAGs (although they were present in 32 reference genomes and 27 low quality bins from the metagenome), six reconstructed genomes harbored mer-operon genes involved in mercury resistance, with Alphaproteobacteria as the most abundant of these (Figure 5c). The nitrogen and sulfur cycles in the Lake Hazen sediments appeared to be closely intertwined and catalyzed by taxonomically diverse lineages. Certain MAGs, such as LH_MA_37_3, likely from Geobacter, and LH_MA_65_9, might represent highly important organisms in the lake ecosystem that can fix dissolved nitrogen gas and are thus capable of returning it to the biological cycles. Furthermore, other MAGs, such as LH_MA_55_1, might represent microbes that play roles simultaneously in both sulfur and nitrogen cycles (SI text). These results are consistent with our previous work that, based on 16S rRNA gene amplicon data, predicted the presence of key functional groups such as aerobic ammonia oxidizers (LH_MA_61_7, likely from Nitrosomonadales), nitrate reducers (e.g., LH_MA_28_10, likely from Rhodoferax) and sulfate reducers (presence of aprAB genes in the metagenomic dataset) in the Lake Hazen sediments (Ruuskanen et al., 2018).
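The selection step described above, averaging each MAG's normalized abundance across samples and then picking the most abundant MAG that carries a marker for a given process, can be sketched as follows. The MAG identifiers, coverage values and marker assignments are hypothetical, and the function is an illustration rather than the procedure used to build Table S7.

```python
from statistics import mean

def most_abundant_mag(coverage_by_mag, mags_with_marker):
    """Return (MAG, mean coverage) for the most abundant MAG carrying a marker.

    coverage_by_mag maps a MAG name to its normalized coverages, one per sample.
    Returns None when no MAG with coverage data carries the marker.
    """
    averages = {
        mag: mean(coverage_by_mag[mag])
        for mag in mags_with_marker
        if mag in coverage_by_mag
    }
    if not averages:
        return None
    best = max(averages, key=averages.get)
    return best, averages[best]

# Hypothetical normalized coverages across six samples.
coverage = {
    "MAG_A": [1.0, 2.0, 3.0, 2.0, 1.0, 3.0],
    "MAG_B": [5.0, 4.0, 6.0, 5.0, 4.0, 6.0],
    "MAG_C": [0.5, 0.5, 1.0, 1.0, 0.5, 0.5],
}
# Hypothetical marker presence per process.
markers = {
    "nitrogen fixation": ["MAG_A", "MAG_C"],
    "mercury resistance": ["MAG_B"],
    "mercury methylation": [],
}

key_mags = {process: most_abundant_mag(coverage, mags)
            for process, mags in markers.items()}
print(key_mags)
```

A process for which the function returns None corresponds to the situation shown by the gray arrows in Figure 5, where no marker was found in the 55 MAGs.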
Figure 5. Biogeochemical cycling in the Lake Hazen sediment. Most common phyla participating in each process are indicated. The phylum Proteobacteria is here subdivided into classes. Marker gene symbols or ‘MetaCyc’ for marker pathways are indicated in gray text next to each process. Gray arrows denote processes that were not found in the 55 MAGs. If the gene is not indicated, a specific marker for the process could not be identified in the annotation. (a) Nitrogen cycle. DNRA: Dissimilatory Reduction of Nitrite to Ammonia. (b) Sulfur cycle. (c) Mercury cycle.
Conclusions
We show that in addition to being unique in its location (81°N), dimensions, and volume, Lake Hazen hosts a phylogenetically diverse set of microbes whose reconstructed genomes contain a high prevalence of pathways that make them fit for thriving in a cold (~3.5°C) and oligotrophic environment. This diversity includes organisms from recently discovered groups, such as the Candidate Phyla Radiation, and from uncultured branches of the tree of life. Because metagenomic surveys from arctic lake sediments are currently scarce, our results provide the scientific community with a baseline for the composition and functional potential of these ecosystems. Indeed, as the oligotrophic niches that extant microbes inhabit are likely to be affected by ongoing environmental changes in Lake Hazen, changes in the microbial community at the base of the lake ecosystem might have unforeseen consequences, with possible repercussions extending even to higher trophic levels.
Acknowledgements
We would like to thank the Poulain and Aris-Brosou lab members at the University of Ottawa, and members of the Virta and Hultman labs at the University of Helsinki for their helpful comments during data analysis and writing of the manuscript. This work was funded by the Natural Resources Canada / Polar Continental Shelf Project (VStL, AJP), the Natural Sciences and Engineering Research Council of Canada (VStL, SAB, AJP), the ArcticNet Centre for Excellence (VStL, AJP), the Canada Foundation for Innovation (SAB, AJP) and the Finnish Academy of Science and Letters / the Vilho, Yrjö, and Kalle Väisälä Fund (MR).
Author contributions
MR wrote the manuscript and analyzed the data. GC participated in sampling and performed the DNA extractions and quality control in the laboratory. KStP and VStL performed the sampling and measured the physicochemical data. AJP and SAB supervised the project. All authors contributed to the writing and approved the final version of the manuscript.
References
Abascal, F., Zardoya, R., and Telford, M. J. (2010). TranslatorX: multiple alignment of nucleotide sequences guided by amino acid translations. Nucl. Acids Res., gkq291. doi:10.1093/nar/gkq291.
Alneberg, J., Bjarnason, B. S., de Bruijn, I., Schirmer, M., Quick, J., Ijaz, U. Z., et al. (2014). Binning metagenomic contigs by coverage and composition. Nature Methods 11, 1144-1146. doi:10.1038/nmeth.3103.
Alvarez, H. M., and Steinbüchel, A. (2002). Triacylglycerols in prokaryotic microorganisms. Appl. Microbiol. Biotechnol. 60, 367-376. doi:10.1007/s00253-002-1135-0.
Aris-Brosou, S., and Rodrigue, N. (2012). “The essentials of computational molecular evolution,” in Evolutionary Genomics Methods in Molecular Biology. (Humana Press, Totowa, NJ), 111-152. doi:10.1007/978-1-61779-582-4_4.
Ashburner, M., Ball, C. A., Blake, J. A., Botstein, D., Butler, H., Cherry, J. M., et al. (2000). Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet. 25, 25-29. doi:10.1038/75556.
Benson, D. A., Cavanaugh, M., Clark, K., Karsch-Mizrachi, I., Lipman, D. J., Ostell, J., et al. (2012). GenBank. Nucleic Acids Research 41, D36-D42. doi:10.1093/nar/gks1195.
Bolger, A. M., Lohse, M., and Usadel, B. (2014). Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114-2120. doi:10.1093/bioinformatics/btu170.
Bowers, R. M., Kyrpides, N. C., Stepanauskas, R., Harmon-Smith, M., Doud, D., Reddy, T. B. K., et al. (2017). Minimum information about a single amplified genome (MISAG) and a metagenome-assembled genome (MIMAG) of bacteria and archaea. Nature Biotechnology 35, 725-731. doi:10.1038/nbt.3893.
Brown, C. T., Hug, L. A., Thomas, B. C., Sharon, I., Castelle, C. J., Singh, A., et al. (2015). Unusual biology across a group comprising more than 15% of domain Bacteria. Nature 523, 208.
Buchfink, B., Xie, C., and Huson, D. H. (2015). Fast and sensitive protein alignment using DIAMOND. Nature Methods 12, 59-60. doi:10.1038/nmeth.3176.
Campbell, J. H., O’Donoghue, P., Campbell, A. G., Schwientek, P., Sczyrba, A., Woyke, T., et al. (2013). UGA is an additional glycine codon in uncultured SR1 bacteria from the human microbiota. PNAS 110, 5540-5545. doi:10.1073/pnas.1303090110.
Campello, R. J. G. B., Moulavi, D., and Sander, J. (2013). “Density-based clustering based on hierarchical density estimates,” in Advances in Knowledge Discovery and Data Mining Lecture Notes in Computer Science. (Springer, Berlin, Heidelberg), 160-172. doi:10.1007/978-3-642-37456-2_14.
Capella-Gutiérrez, S., Silla-Martínez, J. M., and Gabaldón, T. (2009). trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics 25, 1972-1973. doi:10.1093/bioinformatics/btp348.
Carr, S. A., Schubotz, F., Dunbar, R. B., Mills, C. T., Dias, R., Summons, R. E., et al. (2018). Acetoclastic Methanosaeta are dominant methanogens in organic-rich Antarctic marine sediments. The ISME Journal 12, 330-342. doi:10.1038/ismej.2017.150.
Caspi, R., Billington, R., Fulcher, C. A., Keseler, I. M., Kothari, A., Krummenacker, M., et al. (2018). The MetaCyc database of metabolic pathways and enzymes. Nucleic Acids Res 46, D633-D639. doi:10.1093/nar/gkx935.
Chapman, P. M., Adams, W. J., Brooks, M., Delos, C. G., Luoma, S. N., Maher, W. A., et al. (2010). Ecological Assessment of Selenium in the Aquatic Environment. CRC Press.
Chintalapati, S., Kiran, M. D., and Shivaji, S. (2004). Role of membrane lipid fatty acids in cold adaptation. Cell. Mol. Biol. (Noisy-le-grand) 50, 631-642.
Crump, B. C., Amaral-Zettler, L. A., and Kling, G. W. (2012). Microbial diversity in arctic freshwaters
582
is structured by inoculation of microbes from soils. ISME J 6, 16291639.
583
doi:10.1038/ismej.2012.9.
584
Danczak, R. E., Johnston, M. D., Kenah, C., Slattery, M., Wrighton, K. C., and Wilkins, M. J. (2017).
585
Members of the Candidate Phyla Radiation are functionally differentiated by carbon- and
586
nitrogen-cycling capabilities. Microbiome 5. doi:10.1186/s40168-017-0331-1.
587
Delmont, T. O., Quince, C., Shaiber, A., Esen, Ö. C., Lee, S. T., Rappé, M. S., et al. (2018). Nitrogen-
588
fixing populations of Planctomycetes and Proteobacteria are abundant in surface ocean
589
metagenomes. Nat Microbiol 3, 804813. doi:10.1038/s41564-018-0176-9.
590
Eddy, S. R. (2009). A new generation of homology search tools based on probabilistic inference.
591
Genome Inform 23, 205211.
592
Emmerton, C. A., St. Louis, V. L., Lehnherr, I., Graydon, J. A., Kirk, J. L., and Rondeau, K. J. (2016).
593
The importance of freshwater systems to the net atmospheric exchange of carbon dioxide and
594
methane with a rapidly changing high Arctic watershed. Biogeosciences 13, 58495863.
595
doi:10.5194/bg-13-5849-2016.
596
Eren, A. M., Esen, Ö. C., Quince, C., Vineis, J. H., Morrison, H. G., Sogin, M. L., et al. (2015).
597
Anvi’o: an advanced analysis and visualization platform for ‘omics data. PeerJ 3, e1319.
598
doi:10.7717/peerj.1319.
599
Ferrer, M., Guazzaroni, M.-E., Richter, M., García-Salamanca, A., Yarza, P., Suárez-Suárez, A., et al.
600
(2011). Taxonomic and Functional Metagenomic Profiling of the Microbial Community in the
601
Anoxic Sediment of a Sub-saline Shallow Lake (Laguna de Carrizo, Central Spain). Microbial
602
Ecology 62, 824837.
603
.CC-BY-NC-ND 4.0 International licenseIt is made available under a
was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint (which. http://dx.doi.org/10.1101/724781doi: bioRxiv preprint first posted online Aug. 5, 2019;
Metagenome of High Arctic lake sediments 30
Filippidou, S., Junier, T., Wunderlin, T., Lo, C.-C., Li, P.-E., Chain, P. S., et al. (2015). Under-
604
detection of endospore-forming Firmicutes in metagenomic data. Computational and Structural
605
Biotechnology Journal 13, 299306. doi:10.1016/j.csbj.2015.04.002.
606
Finn, R. D., Bateman, A., Clements, J., Coggill, P., Eberhardt, R. Y., Eddy, S. R., et al. (2014). Pfam:
607
the protein families database. Nucleic Acids Res 42, D222D230. doi:10.1093/nar/gkt1223.
608
France, R. L. (1993). The Lake Hazen Trough: A late winter oasis in a polar desert. Biological
609
Conservation 63, 149151. doi:10.1016/0006-3207(93)90503-S.
610
Galperin, M. Y., Makarova, K. S., Wolf, Y. I., and Koonin, E. V. (2015). Expanded microbial genome
611
coverage and improved protein family annotation in the COG database. Nucleic Acids Res 43,
612
D261D269. doi:10.1093/nar/gku1223.
613
Giovannoni, S. J., Cameron Thrash, J., and Temperton, B. (2014). Implications of streamlining theory
614
for microbial ecology. ISME J 8, 15531565. doi:10.1038/ismej.2014.60.
615
Hahsler, M., Piekenbrock, M., Arya, S., and Mount, D. (2018). dbscan: Density Based Clustering of
616
Applications with Noise (DBSCAN) and Related Algorithms. Available at: https://CRAN.R-
617
project.org/package=dbscan [Accessed June 4, 2018].
618
Howe, A. C., Jansson, J. K., Malfatti, S. A., Tringe, S. G., Tiedje, J. M., and Brown, C. T. (2014).
619
Tackling soil diversity with the assembly of large, complex metagenomes. PNAS, 201402564.
620
doi:10.1073/pnas.1402564111.
621
Hug, L. A., Baker, B. J., Anantharaman, K., Brown, C. T., Probst, A. J., Castelle, C. J., et al. (2016). A
622
new view of the tree of life. Nature Microbiology 1, 16048.
623
Hunter, S., Jones, P., Mitchell, A., Apweiler, R., Attwood, T. K., Bateman, A., et al. (2012). InterPro in
624
2011: new developments in the family and domain prediction database. Nucleic Acids Res 40,
625
D306D312. doi:10.1093/nar/gkr948.
626
Hyatt, D., Chen, G.-L., LoCascio, P. F., Land, M. L., Larimer, F. W., and Hauser, L. J. (2010).
627
Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC
628
Bioinformatics 11, 119. doi:10.1186/1471-2105-11-119.
629
IPCC (2018). Global warming of 1.5 C An IPCC Special Report on the impacts of global warming of
630
1.5 C above pre-industrial levels and related global greenhouse gas emission pathways, in the
631
context of strengthening the global response to the threat of climate change, sustainable
632
development, and efforts to eradicate poverty. , eds. V. Masson-Delmotte, P. Zhai, H. O.
633
Pörtner, D. Roberts, J. Skea, P. R. Shukla, et al. Summary for Policymakers Edited by Science
634
Officer Science Assistant ….
635
Jones, P., Binns, D., Chang, H.-Y., Fraser, M., Li, W., McAnulla, C., et al. (2014). InterProScan 5:
636
genome-scale protein function classification. Bioinformatics 30, 12361240.
637
doi:10.1093/bioinformatics/btu031.
638
.CC-BY-NC-ND 4.0 International licenseIt is made available under a
was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint (which. http://dx.doi.org/10.1101/724781doi: bioRxiv preprint first posted online Aug. 5, 2019;
Metagenome of High Arctic lake sediments 31
Jormakka, M., Yokoyama, K., Yano, T., Tamakoshi, M., Akimoto, S., Shimamura, T., et al. (2008).
639
Molecular mechanism of energy conservation in polysulfide respiration. Nat Struct Mol Biol 15,
640
730737. doi:10.1038/nsmb.1434.
641
Kanehisa, M., Furumichi, M., Tanabe, M., Sato, Y., and Morishima, K. (2017). KEGG: new
642
perspectives on genomes, pathways, diseases and drugs. Nucleic Acids Res 45, D353D361.
643
doi:10.1093/nar/gkw1092.
644
Katoh, K., and Standley, D. M. (2013). MAFFT Multiple Sequence Alignment Software Version 7:
645
Improvements in Performance and Usability. Molecular Biology and Evolution 30, 772780.
646
doi:10.1093/molbev/mst010.
647
Koo, H., Hakim, J. A., Morrow, C. D., Crowley, M. R., Andersen, D. T., and Bej, A. K. (2018).
648
Metagenomic Analysis of Microbial Community Compositions and Cold-Responsive Stress
649
Genes in Selected Antarctic Lacustrine and Soil Ecosystems. Life (Basel) 8.
650
doi:10.3390/life8030029.
651
Krijthe, J., and van der Maaten, L. (2017). rtsne: T-distributed stochastic neighbor embedding using a
652
Barnes-Hut implementation. Available at: https://cran.r-
653
project.org/web/packages/Rtsne/index.html [Accessed November 29, 2017].
654
Langmead, B., and Salzberg, S. L. (2012). Fast gapped-read alignment with Bowtie 2. Nat. Methods 9,
655
357359. doi:10.1038/nmeth.1923.
656
Lau, N.-S., Zarkasi, K. Z., Md Sah, A. S. R., and Shu-Chien, A. C. (2018). Diversity and Coding
657
Potential of the Microbiota in the Photic and Aphotic Zones of Tropical Man-Made Lake with
658
Intensive Aquaculture Activities: a Case Study on Temengor Lake, Malaysia. Microb Ecol.
659
doi:10.1007/s00248-018-1283-0.
660
Lehnherr, I., Louis, V. L. S., Sharp, M., Gardner, A. S., Smol, J. P., Schiff, S. L., et al. (2018). The
661
world’s largest High Arctic lake responds rapidly to climate warming. Nature Communications
662
9, 1290. doi:10.1038/s41467-018-03685-z.
663
León‐Zayas, R., Peoples, L., Biddle, J. F., Podell, S., Novotny, M., Cameron, J., et al. (2017). The
664
metabolic potential of the single cell genomes obtained from the Challenger Deep, Mariana
665
Trench within the candidate superphylum Parcubacteria (OD1). Environmental Microbiology
666
19, 27692784. doi:10.1111/1462-2920.13789.
667
Lever, M. A., Rogers, K. L., Lloyd, K. G., Overmann, J., Schink, B., Thauer, R. K., et al. (2015). Life
668
under extreme energy limitation: a synthesis of laboratory- and field-based investigations.
669
FEMS Microbiol Rev 39, 688728. doi:10.1093/femsre/fuv020.
670
Li, D., Liu, C.-M., Luo, R., Sadakane, K., and Lam, T.-W. (2015). MEGAHIT: an ultra-fast single-
671
node solution for large and complex metagenomics assembly via succinct de Bruijn graph.
672
Bioinformatics 31, 16741676. doi:10.1093/bioinformatics/btv033.
673
Lin, W., Jogler, C., Schüler, D., and Pan, Y. (2011). Metagenomic Analysis Reveals Unexpected
674
Subgenomic Diversity of Magnetotactic Bacteria within the Phylum Nitrospirae. Appl. Environ.
675
Microbiol. 77, 323326. doi:10.1128/AEM.01476-10.
676
.CC-BY-NC-ND 4.0 International licenseIt is made available under a
was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint (which. http://dx.doi.org/10.1101/724781doi: bioRxiv preprint first posted online Aug. 5, 2019;
Metagenome of High Arctic lake sediments 32
Louca, S., Parfrey, L. W., and Doebeli, M. (2016). Decoupling function and taxonomy in the global
677
ocean microbiome. Science 353, 12721277. doi:10.1126/science.aaf4507.
678
Mai, U., and Mirarab, S. (2018). TreeShrink: fast and accurate detection of outlier long branches in
679
collections of phylogenetic trees. BMC Genomics 19, 272. doi:10.1186/s12864-018-4620-2.
680
McMurdie, P. J., and Holmes, S. (2013). phyloseq: An R package for reproducible interactive analysis
681
and graphics of microbiome census data. PLoS ONE 8, e61217.
682
doi:10.1371/journal.pone.0061217.
683
Milner, A. M., Khamis, K., Battin, T. J., Brittain, J. E., Barrand, N. E., Füreder, L., et al. (2017).
684
Glacier shrinkage driving global changes in downstream systems. PNAS 114, 97709778.
685
doi:10.1073/pnas.1619807114.
686
Mohit, V., Culley, A., Lovejoy, C., Bouchard, F., and Vincent, W. F. (2017). Hidden biofilms in a far
687
northern lake and implications for the changing Arctic. NPJ Biofilms Microbiomes 3.
688
doi:10.1038/s41522-017-0024-3.
689
Nawrocki, E. P., and Eddy, S. R. (2010). ssu-align: a tool for structural alignment of SSU rRNA
690
sequences. URL h ttp://selab. janelia. org/software. html.
691
Oksanen, J., Blanchet, F. G., Friendly, M., Kindt, R., Legendre, P., McGlinn, D., et al. (2018). vegan:
692
Community Ecology Package. Available at: https://CRAN.R-project.org/package=vegan
693
[Accessed June 4, 2018].
694
Paradis, E., Claude, J., and Strimmer, K. (2004). APE: Analyses of phylogenetics and evolution in R
695
language. Bioinformatics 20, 289290. doi:10.1093/bioinformatics/btg412.
696
Parks, D. H., Chuvochina, M., Waite, D. W., Rinke, C., Skarshewski, A., Chaumeil, P.-A., et al.
697
(2018). A standardized bacterial taxonomy based on genome phylogeny substantially revises
698
the tree of life. Nature biotechnology.
699
Parks, D. H., Imelfort, M., Skennerton, C. T., Hugenholtz, P., and Tyson, G. W. (2015). CheckM:
700
assessing the quality of microbial genomes recovered from isolates, single cells, and
701
metagenomes. Genome Res. 25, 10431055. doi:10.1101/gr.186072.114.
702
Pavoine, S., Dufour, A.-B. A.-B., and Chessel, D. (2004). From dissimilarities among species to
703
dissimilarities among communities: a double principal coordinate analysis. J. Theor. Biol. 228,
704
523537. doi:10.1016/j.jtbi.2004.02.014.
705
Pericard, P., Dufresne, Y., Couderc, L., Blanquart, S., Touzet, H., and Birol, I. (2018). MATAM:
706
reconstruction of phylogenetic marker genes from short sequencing reads in metagenomes.
707
Bioinformatics 34, 585591. doi:10.1093/bioinformatics/btx644.
708
Poulain, A. J., Aris-Brosou, S., Blais, J. M., Brazeau, M., Keller, W. (Bill), and Paterson, A. M. (2015).
709
Microbial DNA records historical delivery of anthropogenic mercury. The ISME Journal.
710
doi:10.1038/ismej.2015.86.
711
.CC-BY-NC-ND 4.0 International licenseIt is made available under a
was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint (which. http://dx.doi.org/10.1101/724781doi: bioRxiv preprint first posted online Aug. 5, 2019;
Metagenome of High Arctic lake sediments 33
Pruesse, E., Peplies, J., and Glöckner, F. O. (2012). SINA: Accurate high-throughput multiple sequence
712
alignment of ribosomal RNA genes. Bioinformatics 28, 18231829.
713
doi:10.1093/bioinformatics/bts252.
714
Quast, C., Pruesse, E., Yilmaz, P., Gerken, J., Schweer, T., Yarza, P., et al. (2013). The SILVA
715
ribosomal RNA gene database project: Improved data processing and web-based tools. Nucl.
716
Acids Res. 41, D590D596. doi:10.1093/nar/gks1219.
717
R Core Team (2018). R: A language and environment for statistical computing. Vienna, Austria: R
718
Foundation for Statistical Computing Available at: https://www.R-project.org/.
719
Rautio, M., Dufresne, F., Laurion, I., Bonilla, S., Vincent, W. F., and Christoffersen, K. S. (2011).
720
Shallow freshwater ecosystems of the circumpolar Arctic1. Écoscience; Sainte-Foy 18, 204
721
222.
722
Rinke, C., Schwientek, P., Sczyrba, A., Ivanova, N. N., Anderson, I. J., Cheng, J.-F., et al. (2013).
723
Insights into the phylogeny and coding potential of microbial dark matter. Nature 499, 431
724
437. doi:10.1038/nature12352.
725
Romanovsky, V. E., Smith, S. L., and Christiansen, H. H. (2010). Permafrost thermal state in the polar
726
Northern Hemisphere during the international polar year 20072009: a synthesis. Permafrost
727
and Periglacial Processes 21, 106116. doi:10.1002/ppp.689.
728
Ruuskanen, M. O., St. Pierre, K. A., St. Louis, V. L., Aris-Brosou, S., and Poulain, A. J. (2018).
729
Physicochemical Drivers of Microbial Community Structure in Sediments of Lake Hazen,
730
Nunavut, Canada. Front. Microbiol. 9. doi:10.3389/fmicb.2018.01138.
731
Selengut, J. D., Haft, D. H., Davidsen, T., Ganapathy, A., Gwinn-Giglio, M., Nelson, W. C., et al.
732
(2007). TIGRFAMs and Genome Properties: tools for the assignment of molecular function and
733
biological process in prokaryotic genomes. Nucleic Acids Res. 35, D260-264.
734
doi:10.1093/nar/gkl1043.
735
Shokralla, S., Spall, J. L., Gibson, J. F., and Hajibabaei, M. (2012). Next-generation sequencing
736
technologies for environmental DNA research. Molecular Ecology 21, 17941805.
737
doi:10.1111/j.1365-294X.2012.05538.x.
738
Soper, J. H., and Powell, J. M. (1985). Botanical studies in the Lake Hazen Region, northern Ellesmere
739
Island, Northwest Territories, Canada.
740
St. Pierre, K. A., St. Louis, V. L., Lehnherr, I., Schiff, S. L., Muir, D. C. G., Poulain, A. J., et al.
741
(2019). Contemporary limnology of the rapidly changing glacierized watershed of the world’s
742
largest High Arctic lake. Scientific Reports. doi:10.1038/s41598-019-39918-4.
743
Staicu, L. C., and Barton, L. L. (2017). “Bacterial Metabolism of Selenium—For Survival or Profit,” in
744
Bioremediation of Selenium Contaminated Wastewater, ed. E. D. van Hullebusch (Cham:
745
Springer International Publishing), 131. doi:10.1007/978-3-319-57831-6_1.
746
.CC-BY-NC-ND 4.0 International licenseIt is made available under a
was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint (which. http://dx.doi.org/10.1101/724781doi: bioRxiv preprint first posted online Aug. 5, 2019;
Metagenome of High Arctic lake sediments 34
Stoeva, M. K., Aris-Brosou, S., Chételat, J., Hintelmann, H., Pelletier, P., and Poulain, A. J. (2014).
747
Microbial community structure in lake and wetland sediments from a High Arctic polar desert
748
revealed by targeted transcriptomics. PLOS ONE 9, e89531. doi:10.1371/journal.pone.0089531.
749
Suzuki, M. T., and Giovannoni, S. J. (1996). Bias caused by template annealing in the amplification of
750
mixtures of 16S rRNA genes by PCR. Appl. Environ. Microbiol. 62, 625630.
751
Tatusov, R. L., Fedorova, N. D., Jackson, J. D., Jacobs, A. R., Kiryutin, B., Koonin, E. V., et al.
752
(2003). The COG database: an updated version includes eukaryotes. BMC Bioinformatics 4, 41.
753
doi:10.1186/1471-2105-4-41.
754
The Gene Ontology Consortium (2017). Expansion of the Gene Ontology knowledgebase and
755
resources. Nucleic Acids Res. 45, D331D338. doi:10.1093/nar/gkw1108.
756
van der Maaten, L., and Hinton, G. (2008). Visualizing data using t-SNE. Journal of Machine Learning
757
Research 9, 25792605.
758
Vavourakis, C. D., Andrei, A.-S., Mehrshad, M., Ghai, R., Sorokin, D. Y., and Muyzer, G. (2018). A
759
metagenomics roadmap to the uncultured genome diversity in hypersaline soda lake sediments.
760
Microbiome 6, 168. doi:10.1186/s40168-018-0548-7.
761
Vigneron, A., Lovejoy, C., Cruaud, P., Kalenitchenko, D., Culley, A., and Vincent, W. F. (2019).
762
Contrasting Winter Versus Summer Microbial Communities and Metabolic Functions in a
763
Permafrost Thaw Lake. Front. Microbiol. 10. doi:10.3389/fmicb.2019.01656.
764
Wang, N. F., Zhang, T., Yang, X., Wang, S., Yu, Y., Dong, L. L., et al. (2016). Diversity and
765
composition of bacterial community in soils and lake sediments from an arctic lake area. Front
766
Microbiol 7. doi:10.3389/fmicb.2016.01170.
767
Wang, N., Guo, Y., Li, G., Xia, Y., Ma, M., Zang, J., et al. (2019). Geochemical-Compositional-
768
Functional Changes in Arctic Soil Microbiomes Post Land Submergence Revealed by
769
Metagenomics. Microbes Environ., ME18091. doi:10.1264/jsme2.ME18091.
770
Wang, Q., Garrity, G. M., Tiedje, J. M., and Cole, J. R. (2007). Naïve Bayesian Classifier for Rapid
771
Assignment of rRNA Sequences into the New Bacterial Taxonomy. Appl. Environ. Microbiol.
772
73, 52615267. doi:10.1128/AEM.00062-07.
773
Wurzbacher, C., Nilsson, R. H., Rautio, M., and Peura, S. (2017). Poorly known microbial taxa
774
dominate the microbiome of permafrost thaw ponds. ISME J 11, 19381941.
775
doi:10.1038/ismej.2017.54.
776
Ye, Y., and Doak, T. G. (2009). A Parsimony Approach to Biological Pathway
777
Reconstruction/Inference for Genomes and Metagenomes. PLOS Computational Biology 5,
778
e1000465. doi:10.1371/journal.pcbi.1000465.
779
Yu, G., Smith, D. K., Zhu, H., Guan, Y., and Lam, T. T.-Y. (2017). GGTREE: an R package for
780
visualization and annotation of phylogenetic trees with their covariates and other associated
781
data. Methods in Ecology and Evolution 8, 2836. doi:10.1111/2041-210X.12628.
782
.CC-BY-NC-ND 4.0 International licenseIt is made available under a
was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint (which. http://dx.doi.org/10.1101/724781doi: bioRxiv preprint first posted online Aug. 5, 2019;
Metagenome of High Arctic lake sediments 35
Zhang, C., Yang, L., Ding, Y., Wang, Y., Lan, L., Ma, Q., et al. (2017). Bacterial lipid droplets bind to
783
DNA via an intermediary protein that enhances survival under stress. Nature Communications
784
8, 15979. doi:10.1038/ncomms15979.
785
Zhang, Z., Schwartz, S., Wagner, L., and Miller, W. (2000). A greedy algorithm for aligning DNA
786
sequences. J. Comput. Biol. 7, 203214. doi:10.1089/10665270050081478.
787
Zhou, J., Bruns, M. A., and Tiedje, J. M. (1996). DNA recovery from soils of diverse composition.
788
Appl. Environ. Microbiol. 62, 316322.
789
Zhou, J., He, Z., Yang, Y., Deng, Y., Tringe, S. G., and Alvarez-Cohen, L. (2015). High-Throughput
790
Metagenomic Technologies for Complex Microbial Community Analysis: Open and Closed
791
Formats. mBio 6, e02288-14. doi:10.1128/mBio.02288-14.
792
.CC-BY-NC-ND 4.0 International licenseIt is made available under a
was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint (which. http://dx.doi.org/10.1101/724781doi: bioRxiv preprint first posted online Aug. 5, 2019;
Article
This study describes microbial community compositions, and various cold-responsive stress genes, encompassing cold-induced proteins (CIPs) and cold-associated general stress-responsive proteins (CASPs) in selected Antarctic lake water, sediment, and soil metagenomes. Overall, Proteobacteria and Bacteroidetes were the major taxa in all metagenomes. Prochlorococcus and Thiomicrospira were highly abundant in waters, while Myxococcus, Anaeromyxobacter, Haliangium, and Gloeobacter were dominant in the soil and lake sediment metagenomes. Among CIPs, genes necessary for DNA replication, translation initiation, and transcription termination were highly abundant in all metagenomes. However, genes for fatty acid desaturase (FAD) and trehalose synthase (TS) were common in the soil and lake sediment metagenomes. Interestingly, the Lake Untersee water and sediment metagenome samples contained histone-like nucleoid structuring protein (H-NS) and all genes for CIPs. As for the CASPs, high abundances of a wide range of genes for cryo- and osmo-protectants (glutamate, glycine, choline, and betaine) were identified in all metagenomes. However, genes for exopolysaccharide biosynthesis were dominant in Lake Untersee water, sediment, and other soil metagenomes. The results from this study indicate that although diverse microbial communities are present in various metagenomes, they share common cold-responsive stress genes necessary for their survival and sustenance in the extreme Antarctic conditions.
Book
The Intergovernmental Panel on Climate Change (IPCC) is the leading international body for assessing the science related to climate change. It provides regular assessments of the scientific basis of climate change, its impacts and future risks, and options for adaptation and mitigation. This IPCC Special Report is a comprehensive assessment of our understanding of global warming of 1.5°C, future climate change, potential impacts and associated risks, emission pathways, and system transitions consistent with 1.5°C global warming, and strengthening the global response to climate change in the context of sustainable development and efforts to eradicate poverty. It serves policymakers, decision makers, stakeholders and all interested parties with unbiased, up-to-date, policy-relevant information. This title is also available as Open Access on Cambridge Core.