Deep nirS amplicon sequencing of San Francisco Bay sediments enables prediction of geography and environmental conditions from denitrifying community composition

Jessica A. Lee and Christopher A. Francis*
Department of Earth System Science, Stanford University, Stanford, CA, USA.
Summary
Denitrification is a dominant nitrogen loss process in the sediments of San Francisco Bay. In this study, we sought to understand the ecology of denitrifying bacteria by using next-generation sequencing (NGS) to survey the diversity of a denitrification functional gene, nirS (encoding cytochrome-cd1 nitrite reductase), along the salinity gradient of San Francisco Bay over the course of a year. We compared our dataset to a library of nirS sequences obtained previously from the same samples by standard PCR cloning and Sanger sequencing, and showed that both methods similarly identified geography, salinity and, to a lesser extent, nitrogen as strong determinants of community composition. Furthermore, the depth afforded by NGS enabled new approaches to measuring the association between environment and community composition. We used Random Forests modelling to demonstrate that the site and salinity of a sample could be predicted from its nirS sequences, and to identify indicator taxa associated with those environmental characteristics. This work advances our understanding of the distribution and dynamics of denitrifying communities in San Francisco Bay, and provides tools for the further study of this key N-cycling guild in estuarine systems.
Introduction
Denitrification, the anaerobic transformation of nitrate to gaseous products such as nitrous oxide and dinitrogen, is commonly recognized as the dominant nitrogen loss process in estuarine sediments (Ward, 2013). In fact, the environmental importance of this process has been recognized since the 1800s (e.g., Smith, 1871); yet the responsible organisms, and their abundance, diversity and distribution, are only beginning to be understood.
One of the most frequently used methods for studying the ecology of denitrifiers in the environment is targeted PCR amplification and sequencing of a key functional gene of the denitrification pathway, nirS, encoding cytochrome-cd1 nitrite reductase, which catalyses the first committed step to a gaseous product (NO2− → NO). In this study, we present an in-depth survey of nirS diversity in San Francisco Bay, following denitrifier communities as they change spatially along the salinity gradient and temporally over the course of a year.
The San Francisco Bay watershed encompasses nearly half of the state of California, and the estuary is subject to substantial anthropogenic influence. The greatest nitrogen sources are agricultural runoff and municipal wastewater treatment (Hager and Schemel, 1992). The fate of these inputs is a matter of concern because of their bearing on primary productivity; however, nitrogen fluxes in the estuary have only recently begun to be studied in depth (Cornwell et al., 2014; Damashek et al., 2016). Previously, we conducted a survey of sediment denitrifier populations using nirS PCR clone libraries and Sanger sequencing (Lee and Francis, 2017). However, as in many other environmental studies of denitrifiers based on clone libraries (e.g., Braker et al., 1998; Santoro et al., 2006; Abell et al., 2013), these methods fell short of covering the enormous diversity of the denitrifier populations present in San Francisco Bay sediments. We therefore turned to the Ion Torrent Personal Genome Machine (PGM), a next-generation sequencing (NGS) platform, to re-examine the same samples at much higher sequence coverage and higher temporal resolution.
Received 8 January, 2017; accepted 26 August, 2017. *For correspondence. E-mail: caf@stanford.edu; Tel. (+1) 650 724 0301; Fax (+1) 650 725 2199.
Present address: Department of Biological Sciences, University of Idaho, Moscow, ID, USA
© 2017 Society for Applied Microbiology and John Wiley & Sons Ltd
Environmental Microbiology (2017) 00(00), 00–00 doi:10.1111/1462-2920.13920
A small number of studies have recently adapted NGS techniques to target the nirS gene in environments such as salt marsh sediment (Bowen et al., 2013), boreal lake sediment (Saarenheimo et al., 2015) and estuarine sediment (Decleyre et al., 2015). One striking finding common to all of these studies is the measurement of greater nirS diversity than had been described previously. Whereas earlier clone library studies typically failed to saturate the predicted OTU richness in the functional gene (Santoro et al., 2006; Mosier and Francis, 2010; Francis et al., 2013), more recent studies using 454 pyrosequencing achieved coverage values of >90% (Decleyre et al., 2015; Saarenheimo et al., 2015) and a higher number of observed OTUs. Notably, NGS-based studies have employed a broader OTU definition (80–90% similarity) than the 95% similarity typically used by clone library studies of nirS; because broader OTUs group more sequences together, the increase in observed diversity is actually even greater than it appears. In short, nirS diversity has likely been chronically undersampled until now. Here, by applying NGS to samples that have already been sequenced using clone libraries, we have the unique opportunity to assess the impact of both deeper sequencing and a shorter amplicon length on the ecological conclusions we have drawn, which is key knowledge if future studies using NGS for nirS sequencing are to be compared to the clone library studies of the previous decades.
The undersampling of nirS diversity and the lack of rigorous temporal analyses may both contribute to our inability thus far to accurately assess the contributions of environmental factors in shaping that diversity. With the possible exception of salinity, there is little agreement as to which other factors (e.g., nitrogen availability, temperature) are most important in influencing denitrifier diversity (Jones and Hallin, 2010). In this study, we took advantage of our extremely deep sequencing datasets by carrying out a novel application of the machine learning technique Random Forests (RF) to elucidate ecological trends in the San Francisco Bay denitrifying community. Here, we show that RF can be used to link microbial communities with their environments in a way that is not only descriptive but, in some cases, even predictive; furthermore, RF can lend focus to the analysis of a complex microbial community by identifying the specific taxa that are most strongly linked to a particular environmental condition.
Results and discussion
Overall community structure
We sampled sediment at five sites in central and northern San Francisco Bay (Fig. 1A) on a monthly basis between July 2011 and June 2012; we extracted DNA from samples collected in seven of those months, and amplified and sequenced nirS using Ion Torrent PGM. After resampling each sample to 13 000 reads, the PGM nirS dataset consisted of 455 000 reads evenly distributed among 35 samples, clustered into 1776 OTUs at 88% sequence identity. Species accumulation curves indicated that sequencing had sampled the majority of the diversity present (Fig. 1B). Chao 1 richness estimates ranged from 227 to 684 OTUs per sample, and the number of OTUs observed ranged from 72% to 93% of the Chao 1 estimates, with a median of 86% (Fig. 1C,D). While richness estimates did not differ significantly among sites, evenness did: inverse Simpson diversity at site 8.1 (salinity 2–15 psu) was significantly higher than at the higher-salinity sites (15–30 psu), even when averaging across the whole year (p < 0.05 for sites 13, 21 and 24 by Dunn's test) (Fig. 1E,F). Site 8.1 was also the only station at which distinct heterogeneity in sediment particle size (i.e., both sand and clay) was regularly observed, and the differences in associated microbiota may have played a role in the high nirS evenness observed there; however, we did not systematically study the effect of sediment type on community composition. Site 13 showed the widest range of temporal variation in both richness and evenness, which may be associated with the site's relatively high temporal variability in salinity and in sediment characteristics such as total nitrogen and carbon (Supporting Information Fig. S1).
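The alpha-diversity quantities used above can be illustrated with a short sketch. It is a minimal, standard-library Python illustration, not the authors' pipeline: it assumes one sample's OTU counts are given as a list of read counts, rarefies them without replacement (as in the resampling to 13 000 reads per sample), and uses the bias-corrected form of Chao1, S_obs + F1(F1 − 1)/(2(F2 + 1)), where F1 and F2 are the numbers of singleton and doubleton OTUs. Which Chao1 variant the original analysis used is an assumption here, and the function names are hypothetical.

```python
import random
from collections import Counter


def rarefy(counts, depth, seed=0):
    """Subsample a vector of OTU read counts to `depth` reads, without replacement."""
    pool = [otu for otu, n in enumerate(counts) for _ in range(n)]
    if depth > len(pool):
        raise ValueError("depth exceeds the number of reads in the sample")
    drawn = Counter(random.Random(seed).sample(pool, depth))
    return [drawn.get(otu, 0) for otu in range(len(counts))]


def observed_otus(counts):
    """Number of OTUs with at least one read."""
    return sum(1 for n in counts if n > 0)


def chao1(counts):
    """Bias-corrected Chao1 estimate: S_obs + F1*(F1 - 1) / (2*(F2 + 1))."""
    f1 = sum(1 for n in counts if n == 1)  # singletons
    f2 = sum(1 for n in counts if n == 2)  # doubletons
    return observed_otus(counts) + f1 * (f1 - 1) / (2 * (f2 + 1))


def inverse_simpson(counts):
    """Reciprocal of the Simpson index, 1 / sum(p_i^2); higher means more even."""
    total = sum(counts)
    return 1.0 / sum((n / total) ** 2 for n in counts if n > 0)


sample = [5, 1, 1, 2, 0, 3]  # toy read counts for six OTUs
coverage = observed_otus(sample) / chao1(sample)  # fraction of estimated richness observed
```

For this toy sample, five OTUs are observed and Chao1 is 5.5, so about 91% of the estimated richness has been observed; the 72–93% range reported above is the same ratio computed for each real sample.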
Relationship of environment and community composition
Weighted Unifrac distances and Principal Coordinates Analysis (PCoA) showed that nirS community composition was strongly associated with site of sampling (Fig. 2A). The greatest explained variation (61.1%) occurred along PCoA Axis 1, which also correlated strongly with salinity (Pearson's r = 0.851, p < 0.001; Fig. 2B). Axis 2, which describes 11.1% of the variation, was correlated with sediment total N concentration (Pearson's r = 0.395, p < 0.001; Fig. 2C). Although salinity also correlated negatively with sediment Cu and positively with S, and total N correlated positively with Cu and S (Supporting Information Fig. S2), salinity and N had the strongest relationships with the first two PCoA axes.
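For readers less familiar with PCoA, the ordination can be sketched as classical (Gower) multidimensional scaling of a distance matrix. This is a minimal NumPy sketch, not the software used for Fig. 2: it assumes a precomputed square, symmetric distance matrix (here a toy one standing in for weighted Unifrac), reports each axis's share of variance relative to the sum of positive eigenvalues (one of several conventions when distances are non-Euclidean), and correlates Axis 1 with a hypothetical salinity vector using Pearson's r.

```python
import numpy as np


def pcoa(dist):
    """Classical PCoA of a square, symmetric distance matrix.

    Returns sample coordinates and the fraction of variance per axis,
    computed over positive eigenvalues only.
    """
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    centring = np.eye(n) - np.ones((n, n)) / n
    b = centring @ (-0.5 * d ** 2) @ centring  # Gower double-centred matrix
    eigvals, eigvecs = np.linalg.eigh(b)
    order = np.argsort(eigvals)[::-1]  # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > 1e-10 * eigvals.max()
    coords = eigvecs[:, keep] * np.sqrt(eigvals[keep])
    explained = eigvals[keep] / eigvals[keep].sum()
    return coords, explained


# Toy example: samples whose community distances track a salinity gradient.
salinity = np.array([2.0, 8.0, 15.0, 24.0, 30.0])  # hypothetical psu values
dist = np.abs(salinity[:, None] - salinity[None, :]) / 30.0
coords, explained = pcoa(dist)
r_axis1 = np.corrcoef(coords[:, 0], salinity)[0, 1]  # sign of a PCoA axis is arbitrary
```

Because the toy distances are one-dimensional, a single axis carries all the variance and correlates perfectly (up to sign) with salinity; real Unifrac matrices spread variance over many axes, as the 61.1% and 11.1% above show.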
PERMANOVA was used to quantify the degree to which environmental variables could explain the Unifrac distances among samples. We tested the effects of geography (Site) and time (Month) alone, as well as a linear combination of seven environmental characteristics of the sediment and bottom water. Of the many measurements made (Supporting Information Fig. S1), the variables used in the model were selected based on previous evidence for their role in influencing denitrifier communities, and were limited in number to minimize collinearity. It should be noted that a few of the seven variables covary strongly with others not included in the model: for instance, total N is strongly correlated with total C (Pearson's r = 0.86, p < 0.001) and may therefore be interpreted as an indicator of sediment organic matter; S is correlated with Pb (r = 0.57, p < 0.001); and Cu is correlated with Fe, Mn and P (r = 0.73, 0.57 and 0.57 respectively, p < 0.001). Relationships among these characteristics of the samples are shown in Supporting Information Figs S1 and S2.
Taken alone, site was highly explanatory of microbial community composition (R² = 0.76; p = 0.001). In a multivariate linear model composed of environmental variables, salinity clearly showed the strongest relationship (R² = 0.44; p = 0.001), with smaller but statistically significant effects from bottom water NH4+ (R² = 0.060; p = 0.009) and sediment total N content (R² = 0.064; p = 0.003). Smaller still, but also significant, were the contributions of sediment Cu (R² = 0.044, p = 0.023), sediment S (R² = 0.033, p = 0.042) and bottom water temperature (R² = 0.035, p = 0.035) (Table 1).
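The single-factor case of PERMANOVA (e.g., the Site-only model in Table 1) can be sketched directly from a distance matrix using the standard PERMANOVA partition of squared distances: R² is the between-group sum of squares divided by the total, and the p-value comes from permuting group labels. This is a minimal NumPy illustration under stated assumptions (toy one-dimensional distances, a hypothetical function name); it does not reproduce the sequential seven-variable models, which add terms in the order described in the Table 1 note.

```python
import numpy as np


def permanova_one_factor(dist, groups, permutations=999, seed=0):
    """One-factor PERMANOVA: returns pseudo-F, R^2 and a permutation p-value."""
    d2 = np.asarray(dist, dtype=float) ** 2
    labels = np.asarray(groups)
    n = len(labels)
    n_groups = len(np.unique(labels))
    ss_total = d2[np.triu_indices(n, k=1)].sum() / n

    def statistics(lab):
        ss_within = 0.0
        for g in np.unique(lab):
            idx = np.flatnonzero(lab == g)
            block = d2[np.ix_(idx, idx)]
            ss_within += block[np.triu_indices(len(idx), k=1)].sum() / len(idx)
        ss_between = ss_total - ss_within
        f = (ss_between / (n_groups - 1)) / (ss_within / (n - n_groups))
        return f, ss_between / ss_total

    f_obs, r2 = statistics(labels)
    rng = np.random.default_rng(seed)
    hits = sum(statistics(rng.permutation(labels))[0] >= f_obs for _ in range(permutations))
    return f_obs, r2, (hits + 1) / (permutations + 1)


# Toy data: two sites whose samples sit at opposite ends of one axis.
positions = np.concatenate([np.arange(8) * 0.1, 10 + np.arange(8) * 0.1])
site = np.array(["A"] * 8 + ["B"] * 8)
dist = np.abs(positions[:, None] - positions[None, :])
f_obs, r2, p = permanova_one_factor(dist, site, permutations=199)
```

With Euclidean distances the partition reduces to ordinary one-way ANOVA, which gives a convenient check: here R² = 400/400.84, about 0.998.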
[Figure 1 panels: A, map of sampling stations 4.1, 8.1, 13, 21 and 24 (Suisun Bay, San Pablo Bay, Central Bay and South Bay labelled); B, Observed OTUs vs. Reads sequenced; C–F, Observed OTUs, Chao1 richness, Inverse Simpson and SES PD by sampling date, 2011–2012.]
Fig. 1. A. Map of sample stations. Station numbers are those given by the US Geological Survey for their Water Quality of San Francisco Bay monitoring program. Map tiles were modified from Stamen Design, under CC BY 3.0. Data by OpenStreetMap, under ODbL. Colours used in panels B–F correspond to sample site as indicated in panel A. B. Collector's curve for nirS OTUs in the resampled dataset (13 000 reads per sample), showing the relationship of new OTUs observed (at the 88% identity cut-off level) to the number of reads sequenced. Each curve represents one of the 35 sequenced sediment samples, colour-coded by site, and was generated by random sampling of all reads in the sample. Vertical bars represent standard error. C–F. Calculated alpha-diversity metrics for each site across time. C. Observed OTUs: number of OTUs observed, at the 88% cut-off level. D. Chao1 richness: Chao 1 richness estimator, a prediction of the total number of OTUs in the sample. Vertical bars represent the standard error of estimates. E. Inverse Simpson: the reciprocal of the Simpson index; higher values of Inverse Simpson correspond to greater diversity. F. SES-PD: standardized effect size of phylogenetic diversity, a measure of branch-length-based phylogenetic diversity, normalized for the number of individuals in the sample.

The strength of association we found between microbial community and geographic origin agrees with previous studies showing a strong association between habitat and microbial community (The Human Microbiome Consortium, 2012; Bokulich et al., 2014). However, because environment often co-varies with geographic distance, we tested whether distance or time of sampling could provide an alternative interpretation of the community differences between samples. Mantel tests were used to assess the correlation between the Unifrac distance matrix and each of three potentially explanatory distance matrices: one based on geographic distances among sites, one based on temporal distances between sampling dates and one based on the same seven environmental variables tested in PERMANOVA. Mantel tests revealed no relationship between Unifrac and time (i.e., samples taken closer together in time were not more similar to each other), but there was a significant correlation between Unifrac and geography when controlling for environment, and a stronger correlation between Unifrac and environment when controlling for geography (Table 2). It thus remains difficult to discern how much of the observed diversity patterns are due to environmental filtering and how much to dispersal (e.g., sediment transport), for which net movement is generally seaward from the river delta (Schoellhamer et al., 2012). As further information becomes available on the patterns of microbial diversity in San Francisco Bay sediments, studies relating sediment transport and circulation patterns to microbial diversity could help elucidate the relative importance of selection versus dispersal for structuring the microbial communities in the estuary.
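The Mantel and partial Mantel statistics summarized in Table 2 can be illustrated with a short NumPy sketch. It assumes precomputed square distance matrices with samples in the same order; the Mantel r is the Pearson correlation of their upper triangles, its one-sided p-value comes from jointly permuting the rows and columns of one matrix, and the partial statistic r(x, y | z) is the first-order partial correlation built from the three pairwise Mantel correlations. Function names are hypothetical, and the toy matrices are random coordinates, not the study's data.

```python
import numpy as np


def upper(m):
    """Upper-triangle entries (diagonal excluded) of a square matrix."""
    m = np.asarray(m, dtype=float)
    return m[np.triu_indices(m.shape[0], k=1)]


def mantel(x, y, permutations=999, seed=0):
    """Pearson Mantel r between two distance matrices with a one-sided permutation p-value."""
    x = np.asarray(x, dtype=float)
    r_obs = np.corrcoef(upper(x), upper(y))[0, 1]
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(permutations):
        order = rng.permutation(x.shape[0])  # relabel samples in x only
        r_perm = np.corrcoef(upper(x[np.ix_(order, order)]), upper(y))[0, 1]
        hits += int(r_perm >= r_obs)
    return r_obs, (hits + 1) / (permutations + 1)


def partial_mantel_r(x, y, z):
    """Partial Mantel statistic r(x, y | z): x versus y holding z constant."""
    rxy = np.corrcoef(upper(x), upper(y))[0, 1]
    rxz = np.corrcoef(upper(x), upper(z))[0, 1]
    ryz = np.corrcoef(upper(y), upper(z))[0, 1]
    return (rxy - rxz * ryz) / np.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


def euclidean(points):
    """Pairwise Euclidean distance matrix for rows of `points`."""
    points = np.asarray(points, dtype=float)
    return np.sqrt(((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1))


rng = np.random.default_rng(1)
pts = rng.normal(size=(12, 3))  # hypothetical sample coordinates
community = euclidean(pts[:, :2])
geography = euclidean(pts[:, 1:])
environment = euclidean(pts[:, [0, 2]])
r_geo, p_geo = mantel(community, geography, permutations=199)
r_partial = partial_mantel_r(community, geography, environment)
```

The partial statistic equals the correlation between the residuals left after regressing each of the two matrices' entries on the third, which is why it is read as "holding the third matrix constant".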
[Figure 2 panels A–C: PCoA ordinations, Axis.1 (61.1%) vs. Axis.2 (11.1%); legends: Site (4.1, 8.1, 13, 21, 24) and Month (Jul, Sep, Oct, Nov, Jan, Mar, May) in A; Sal in B; Ntot in C.]
Fig. 2. Principal coordinates analysis (PCoA) plots of weighted Unifrac distances among the samples of the nirS PGM sequence dataset. The same plot is shown with three different legends. A. Colours indicate site and shapes indicate month of sampling. B. Colour indicates salinity of bottom water (psu) (Sal). C. Colour indicates total nitrogen content of sediment (% by mass) (Ntot).

Table 1. Comparison of PERMANOVA results of four nirS sequence libraries.

Dataset                            Model        Variable   df  R²     p
Full PGM                           Site         Site       4   0.76   0.001*
                                   Month        Month      6   0.053  1
                                   7 variables  NO3−       1   0.010  0.40
                                                Temp       1   0.035  0.035*
                                                Ntot       1   0.064  0.003*
                                                NH4+       1   0.060  0.009*
                                                Sal        1   0.44   0.001*
                                                Cu         1   0.044  0.023*
                                                S          1   0.033  0.042*
                                                Residuals  13  0.35
Full Clone Library                 Site         Site       4   0.35   0.005*
                                   Month        Month      3   0.20   0.17
                                   7 variables  NO3−       1   0.032  0.68
                                                Temp       1   0.048  0.31
                                                Ntot       1   0.14   0.006*
                                                NH4+       1   0.054  0.23
                                                Sal        1   0.13   0.002*
                                                Cu         1   0.050  0.28
                                                S          1   0.036  0.60
                                                Residuals  13  0.55
PGM, shared OTUs only              Site         Site       4   0.73   0.001*
                                   Month        Month      3   0.070  0.99
                                   7 variables  NO3−       1   0.029  0.32
                                                Temp       1   0.053  0.097
                                                Ntot       1   0.077  0.028*
                                                NH4+       1   0.12   0.005*
                                                Sal        1   0.30   0.001*
                                                Cu         1   0.057  0.089
                                                S          1   0.042  0.15
                                                Residuals  13  0.36
Clone Library, shared OTUs only    Site         Site       4   0.41   0.001*
                                   Month        Month      3   0.16   0.47
                                   7 variables  NO3−       1   0.060  0.11
                                                Temp       1   0.052  0.16
                                                Ntot       1   0.063  0.061
                                                NH4+       1   0.071  0.038*
                                                Sal        1   0.17   0.001*
                                                Cu         1   0.063  0.068
                                                S          1   0.047  0.26
                                                Residuals  13  0.52

Each Unifrac distance matrix was tested against three PERMANOVA models: site alone, month alone, or a linear combination of seven physicochemical variables. The variables were added in the order listed in the table, with no interactions. df = degrees of freedom. 'Full PGM' refers to the full PGM sequence library (35 samples); 'Full Clone Library' refers to the full clone library (20 samples); 'PGM, shared OTUs only' refers to the PGM library subsetted to the 187 OTUs and 20 samples that were present in both the PGM and clone libraries; 'Clone Library, shared OTUs only' refers to the clone library subsetted to the 187 shared OTUs. Entries marked * are significant at p < 0.05. Abbreviations: Sal: salinity of bottom water. NO3−: dissolved nitrate in bottom water. Temp: temperature of bottom water. NH4+: dissolved ammonium in bottom water. Ntot: total nitrogen of the dried sediment, by mass. Cu: total copper content in sediment, by mass. S: total sulfur content in dried sediment, by mass.

Comparison of sequence coverage by PGM sequencing and clone library
To establish context for comparing our results to earlier studies based on clone libraries and Sanger sequencing, we analysed our library produced by Ion Torrent PGM sequencing (4.55 × 10⁵ reads, 215–265 bp long) alongside a library of nirS clones (6.6 × 10² reads, 797–890 bp long) previously generated from a subset of the same samples (Lee and Francis, 2017). The clone library and PGM datasets were generated using the same forward primer but different reverse primers; in both cases, we clustered sequences into OTUs at the 88% identity level. In agreement with other studies using NGS (Bowen et al., 2013; Decleyre et al., 2015; Saarenheimo et al., 2015), the deeper sampling provided by PGM increased estimates of the total diversity in each sample. The Chao 1 richness estimate for the clone library ranged between 8 and 179 OTUs per sample with a mean of 50.5; for the PGM dataset it ranged between 226 and 683 OTUs per sample with a mean of 441.2 (Fig. 1D). In summary, increasing sequencing depth by roughly three orders of magnitude increased the estimate of total species richness by about one order of magnitude. In addition, the precision of the estimate increased dramatically: in the clone libraries, the standard error of the Chao 1 metric was 20–57% of the estimated value, whereas it was 3–7% of the estimated value for the PGM dataset.
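The gain in precision can be made concrete with the classic variance formula for the Chao1 estimator, Var = F2[(F1/F2)⁴/4 + (F1/F2)³ + (F1/F2)²/2], paired with the classic estimate S_obs + F1²/(2F2). The sketch below uses that classic form as an assumption; software such as vegan or QIIME may apply bias-corrected variants, so it illustrates the behaviour rather than reproducing the reported values. The function name and toy counts are hypothetical.

```python
import math


def chao1_classic(counts):
    """Classic Chao1 estimate and its standard error from OTU read counts."""
    s_obs = sum(1 for n in counts if n > 0)
    f1 = sum(1 for n in counts if n == 1)  # singletons
    f2 = sum(1 for n in counts if n == 2)  # doubletons
    if f2 == 0:
        raise ValueError("the classic formula needs at least one doubleton")
    ratio = f1 / f2
    estimate = s_obs + f1 ** 2 / (2 * f2)
    variance = f2 * (ratio ** 4 / 4 + ratio ** 3 + ratio ** 2 / 2)
    return estimate, math.sqrt(variance)


# A shallow sample dominated by singletons versus a deeper one with few singletons.
shallow = [1, 1, 1, 1, 2, 2, 5, 10]
deep = [1, 1] + [2] * 8 + [20] * 40
for counts in (shallow, deep):
    est, se = chao1_classic(counts)
    print(f"Chao1 = {est:.2f}, SE = {se:.2f} ({100 * se / est:.1f}% of estimate)")
```

For the shallow toy sample the standard error is about 44% of the estimate, and for the deeper one about 1%: as sequencing depth converts singletons into repeatedly observed OTUs, the ratio F1/F2 falls and the estimate tightens.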
We anticipated that low-abundance clades would have less representation in the clone library, but that high-abundance clades would be present in both libraries; and that, if sampling was unbiased, the sequences detected in the clone library would be distributed evenly across the phylogeny of all San Francisco Bay nirS sequences. To test these hypotheses, USEARCH was used to map clone library sequences to the OTUs from the PGM dataset. This analysis revealed that 523 (79%) clone sequences mapped to 187 OTUs from the PGM dataset, and 137 (21%) clone sequences did not map. Conversely, 1592 (90%) OTUs in the PGM dataset had no representatives in the clone library (Supporting Information Fig. S3). The 187 (10%) OTUs that did map were more abundant than average, as they included 1.68 × 10⁵ (37%) of the 4.55 × 10⁵ sequence reads in the PGM dataset; however, counter to our hypothesis, they were not the most abundant OTUs (Fig. 3A). Furthermore, quantitative measures showed that the clone library sequences were clustered, rather than being evenly distributed across the nirS phylogeny. The Standardized Effect Size of the Mean Pairwise Distance (SES-MPD) of the clone library OTUs with respect to the full tree was −6.44 (p = 0.001), and the Standardized Effect Size of the Mean Nearest Taxon Distance (SES-MNTD) was −8.16 (p = 0.001). (For both metrics, negative values indicate that taxa are more closely related than would be expected by chance (Vamosi et al., 2009).) In particular, one clade was highly represented in the clone library but not represented at all in the PGM dataset (Supporting Information Fig. S3); clearly distinct from cultured Proteobacteria nirS sequences, this cluster appeared most closely related to the Deinococcus-Thermus group, though only at a level of 50% pairwise nucleotide identity.
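The standardized effect sizes used here follow the usual form SES = (observed − mean of null)/(standard deviation of null). The minimal NumPy sketch below assumes a precomputed matrix of pairwise phylogenetic distances among all OTUs and a null model that draws the same number of OTUs at random from the full pool (one of several null models offered by tools such as picante); negative values mean the chosen OTUs are more closely related than random draws. The toy distances and function names are hypothetical.

```python
import numpy as np


def mpd(dist, idx):
    """Mean pairwise distance among the taxa in idx."""
    sub = dist[np.ix_(idx, idx)]
    return sub[np.triu_indices(len(idx), k=1)].mean()


def mntd(dist, idx):
    """Mean distance from each taxon in idx to its nearest other taxon in idx."""
    sub = dist[np.ix_(idx, idx)].copy()
    np.fill_diagonal(sub, np.inf)
    return sub.min(axis=1).mean()


def ses(metric, dist, idx, runs=999, seed=0):
    """Standardized effect size and lower-tail rank p-value against random draws."""
    dist = np.asarray(dist, dtype=float)
    observed = metric(dist, idx)
    rng = np.random.default_rng(seed)
    null = np.array([
        metric(dist, rng.choice(dist.shape[0], size=len(idx), replace=False))
        for _ in range(runs)
    ])
    z = (observed - null.mean()) / null.std(ddof=1)
    p_lower = (np.sum(null <= observed) + 1) / (runs + 1)
    return z, p_lower


# Toy "tree": two clades of 20 OTUs each, far apart; the subset comes from one clade.
positions = np.concatenate([np.arange(20) * 0.01, 100 + np.arange(20) * 0.01])
dist = np.abs(positions[:, None] - positions[None, :])
subset = np.arange(10)
ses_mpd, p_mpd = ses(mpd, dist, subset, runs=199)
ses_mntd, p_mntd = ses(mntd, dist, subset, runs=199)
```

MPD responds to clustering deep in the tree (which clades are present), whereas MNTD responds to clustering near the tips; reporting both, as above, separates the two patterns.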
To investigate whether the different reverse primer was responsible for the difference in coverage between the two libraries, we compared the clone library sequences that did not map to PGM library OTUs with those that did, specifically examining the region where primer nirS-q-R (the reverse primer used for amplification prior to PGM sequencing) binds. Among the clone sequences that mapped to the PGM library, the majority (53%) matched nirS-q-R perfectly, whereas among the clone sequences that did not map to the PGM library, the majority (67%) had two mismatches over the 20-bp primer site and only 12% matched perfectly (Supporting Information Fig. S4). Mismatches with the reverse primer might therefore partially explain the difference in coverage between the two datasets.
Comparison of ecological trends by PGM sequencing and clone library
It is possible for different sequencing methods to sample at different depths and yet produce similar results in measures of the emergent ecological properties of the community. To investigate this possibility, we directly compared the two nirS datasets in terms of diversity and the relationship of community composition to environment, in the 20 samples that were sequenced using both methods. We initially conducted the analysis by comparing alpha-diversity between the two datasets, including all the OTUs that had been found using either sequencing method. Because PGM yielded a much higher number of OTUs overall, both richness metrics tested were approximately one order of magnitude higher in the PGM dataset than in the clone library dataset for all samples, but trends among samples were conserved between the two datasets (for Observed OTUs, Spearman's rho = 0.50, p = 0.025 between the two datasets; for Chao 1, Spearman's rho = 0.44, p = 0.055). In contrast, evenness (Inverse Simpson, Shannon Diversity and the Standardized Effect Size of Phylogenetic Diversity) in the PGM samples was not correlated with evenness in the clone library samples (Table 3). Additionally, the Unifrac distance matrices of the two sample sets (beta-diversity) showed no significant similarity by Mantel test (r_Mantel = −0.16, p = 0.98).

Table 2. Results of Mantel and partial Mantel tests to assess correlation among distance matrices describing samples in the PGM dataset.

             Time                  Geography|Environment  Environment|Geography  Geography             Environment
Unifrac      r = −0.034, p = 0.66  r = 0.36, p = 0.001    r = 0.66, p = 0.001    r = 0.74, p = 0.001   r = 0.55, p = 0.001
Time                                                                             r = −0.063, p = 0.98  r = −0.0066, p = 0.51
Geography                                                                                              r = 0.45, p = 0.001

r is the Mantel statistic (r_Mantel). The Time matrix was based on the number of days between sampling times; the Geography matrix was based on distance between sites as measured along a transect passing through the USGS Water Quality monitoring stations; the Environment matrix was calculated from a Principal Components Analysis based on seven environmental variables (salinity, temperature, bottom water nitrate concentration, bottom water ammonium concentration, sediment total N content, sediment total Cu content and sediment total S content). 'Geography|Environment' denotes a test of the correlation with geography while holding environment constant.
To determine whether the differences in evenness and beta-diversity could be attributed to the greater phylogenetic coverage of the PGM dataset, we repeated the analysis with only the 187 OTUs that the two datasets shared. The abundances of those shared OTUs did correlate between the two datasets (Spearman's rho = 0.43, p < 0.001) (Fig. 3B). However, when Unifrac matrices were calculated from either of the two datasets using only those 187 OTUs, the Unifrac matrix of each reduced dataset resembled that of its own original dataset more closely than it resembled the other reduced dataset (for PGM, r_Mantel = 0.92, p = 0.001 between the Unifrac distance matrices of the reduced and original libraries; for the clone library, r_Mantel = 0.48, p = 0.001; between reduced PGM and reduced clone library, r_Mantel = −0.16, p = 0.98), indicating differences in the degree to which each OTU was represented in the different datasets.

Fig. 3. Comparison of clone library and PGM dataset, based on the OTUs detected in both.
A. Rank-abundance plot of the 100 most abundant OTUs in the PGM dataset. Bars are colour-coded by whether they were also detected in the clone library (red) or not detected (grey). OTU identification numbers are omitted for clarity. Read abundance of each OTU is shown as its total abundance in the resampled library of 4.55 × 10⁵ reads.
B. Abundances of shared OTUs in each of the two datasets. Spearman's rho = 0.43, p < 0.001.
C. Principal Coordinates Analysis (PCoA) plot of Unifrac distances among samples in the clone library, when analysis was reduced to include only the OTUs that were also present in the PGM dataset.
D. Principal Coordinates Analysis (PCoA) plot of Unifrac distances among samples in the PGM dataset, when analysis was reduced to include only the 20 samples that were also sequenced in the clone library, and only the OTUs that were also present in the clone library.

Table 3. Spearman rank correlations between alpha-diversity metrics of the PGM dataset and clone library.

Metric              rho    p
Observed species    0.498  0.025
Chao richness       0.442  0.051
Inverse Simpson     0.008  0.977
Shannon diversity   0.119  0.617
SES-PD              0.194  0.411

Only the samples in common between the two datasets (n = 20) were included.
Nevertheless, PERMANOVA analysis indicated that all four sequence datasets (the full clone library and full PGM library, as well as the two libraries reduced to just the 187 shared OTUs) showed similar statistical relationships between environment and community composition (Table 1, Fig. 3C,D). Specifically, when taken alone, site was always significant; in the multivariate model, salinity was always significant (in agreement with its strong correlation with PCoA Axis 1), and a significant influence was always found from either ammonium or total N or both. The consistency of our findings across both sequencing methods suggests that the differences between the two methods have little influence on our ability to detect the dominant ecological trends. This result bodes well for meta-analyses attempting to compare studies that used very different depths of sequencing.
Importantly, several of the weaker environmental effects were found to be significant by PERMANOVA in the full PGM dataset but not in the others, and the total variance explained by environmental variables was much higher in the PGM dataset than in the clone library dataset. Similarly, in the Unifrac PCoA analysis of the PGM library, the axes explained more variance, and samples separated by site more clearly, than they did in the clone library. Thus, the PGM dataset was more sensitive at detecting ecological relationships by these methods.
Random forests for modelling and prediction of environmental variables
The wealth of data available within the PGM nirS datasets, and the observation that nirS communities separate clearly by site and possibly also by salinity and sediment nitrogen, prompted us to ask: if we know the composition of the microbial population in a sample, how well can we predict the environmental conditions at the location sampled? This question, posed in the reverse direction relative to most ecological classification problems, allows us to test the hypothesis of a community-environment association in a different way, and simultaneously to interrogate our libraries to determine whether individual taxa show particularly strong associations with specific environmental conditions. The high dimensionality of microbial community composition data requires a method that can accommodate a very large, sparse set of input variables with unknown probability distributions. We therefore tested the potential of machine learning to predict the location or environment of a sample based on the abundances of the 1776 nirS OTUs found in that sample.
Several different machine learning models have been
used by ecologists in recent decades, almost always to
predict species abundance from environmental data and
not vice versa (e.g., Parkhurst et al., 2005; Cutler et al.,
2007; Kampichler et al., 2010). Among the most popular
are Support Vector Machines (SVM), Artificial Neural Net-
works (ANN) and Random Forests (RF); whereas Linear
Discriminant Analysis (LDA) is a more classical statistical
method often used for similar problems. For this proof-of-
concept, we chose to use Random Forests, an ensemble
method in which many classification and regression trees
are built from subsets of the data using a bootstrap aggre-
gation technique, and classification is carried out by
majority vote (Breiman, 2001; Cutler et al., 2007). RF
offers multiple advantages over other methods: it supports
non-linear relationships among variables (unlike LDA), has
a relatively transparent model structure (compared to
ANN), can be used for both regression and classification
into multiple classes (unlike LDA or standard SVM), has
remarkable robustness against overfitting even without
prior dimensionality reduction (compared to ANN), requires
specification of just a small number of parameters with
minimal tuning (compared with SVM), and it is fast and
parallelizable and has relatively low computational require-
ments (Guyon and Elisseeff, 2003; Cutler et al., 2007;
Olden et al., 2008; Kampichler et al., 2010; Crisci et al.,
2012; Pappu and Pardalos, 2014).
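As a rough illustration of bootstrap aggregation with majority voting, the Python sketch below grows an ensemble of one-split "stumps", each fitted to a bootstrap sample using a random subset of candidate variables, and scores each sample only with the trees for which it was out-of-bag. The stump learner, the function names (fit_stump, fit_bagged_stumps, predict, oob_error) and the defaults are hypothetical simplifications; this is not the randomForest package used in this study, which grows full trees and samples mtry candidate variables at every split.

```python
# Hypothetical, simplified illustration of bagging + majority vote + OOB error.
import random
from collections import Counter


def fit_stump(X, y, features):
    """Fit a one-split tree: the (feature, threshold) pair with fewest training errors."""
    default = Counter(y).most_common(1)[0][0]  # fallback label if one side is empty
    best = None
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            left_lab = Counter(left).most_common(1)[0][0] if left else default
            right_lab = Counter(right).most_common(1)[0][0] if right else default
            errors = sum(lab != left_lab for lab in left) + sum(lab != right_lab for lab in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, left_lab, right_lab)
    _, f, t, left_lab, right_lab = best
    return lambda row: left_lab if row[f] <= t else right_lab


def fit_bagged_stumps(X, y, n_trees=25, n_features=1, seed=0):
    """Grow n_trees stumps, each on a bootstrap sample and a random variable subset."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    ensemble = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap: n draws with replacement
        oob = set(range(n)) - set(idx)  # samples never drawn are out-of-bag for this tree
        feats = rng.sample(range(p), n_features)  # random subset of candidate variables
        stump = fit_stump([X[i] for i in idx], [y[i] for i in idx], feats)
        ensemble.append((stump, oob))
    return ensemble


def predict(ensemble, row):
    """Classify a new sample by majority vote over all trees."""
    return Counter(stump(row) for stump, _ in ensemble).most_common(1)[0][0]


def oob_error(ensemble, X, y):
    """Fraction of samples misclassified when each is voted on only by its OOB trees."""
    wrong = total = 0
    for i, (row, lab) in enumerate(zip(X, y)):
        votes = Counter(stump(row) for stump, oob in ensemble if i in oob)
        if votes:
            total += 1
            wrong += votes.most_common(1)[0][0] != lab
    return wrong / total if total else float("nan")
```

Because each bootstrap omits roughly a third of the samples, every sample is usually out-of-bag for several trees, which is what makes an error estimate possible without a separate test set.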
One particularly appealing characteristic of RF is its
embedded method for variable selection, which has the
potential to identify individual variables (in our case,
OTUs) that are important to a model because they are
associated with the sample characteristic being pre-
dicted. The importance of each variable is calculated
during model construction as the increase in Out-Of-Bag
(OOB) prediction error when the values of that variable are randomly permuted,
and because each tree in the model uses only a random
subset of variables, correlation between variables does
not pose a problem to the measurement of importance
(Cutler et al., 2007; Kampichler et al., 2010). Interpreting
variable importance in the other non-linear methods
such as ANN, in contrast, requires the use of wrapper
methods, which may substantially increase computa-
tional complexity (Gevrey et al., 2003; Guyon and
Elisseeff, 2003; Olden et al., 2004).
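The permutation idea can be sketched in a few lines of Python. The sketch assumes any fitted model exposed as a plain function from a feature row to a prediction; it shuffles one variable at a time and records the average increase in mean squared error. For simplicity it scores a single held-out set, whereas randomForest computes the increase on each tree's out-of-bag samples; the function names are hypothetical.

```python
# Hypothetical sketch of permutation importance (not the randomForest code).
import random


def mse(model, X, y):
    """Mean squared error of model predictions on rows X against targets y."""
    return sum((model(row) - target) ** 2 for row, target in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=5, seed=0):
    """Average increase in MSE after randomly permuting each column of X."""
    rng = random.Random(seed)
    baseline = mse(model, X, y)
    importances = []
    for j in range(len(X[0])):
        increase = 0.0
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between variable j and the response
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increase += mse(model, X_perm, y) - baseline
        importances.append(increase / n_repeats)
    return importances
```

A variable the model never uses receives an importance of exactly zero, because shuffling it cannot change any prediction.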
We built four sets of RF models. For each, we used the
OTU abundance table as input. One set consisted of
classification models to predict the site of each of the 35
samples; the other three were regression models predicting salinity, sediment total nitrogen or temperature. Based on our previous analyses using
other methods (described above), we expected that nirS
community composition would predict site and salinity
San Francisco Bay deep nirS amplicon sequencing 7
© 2017 Society for Applied Microbiology and John Wiley & Sons Ltd, Environmental Microbiology, 00, 00–00
well and nitrogen moderately well, but not temperature.
To assess model accuracy we used the OOB error mea-
sured during model construction, and we used a
repeated training-and-testing procedure to generate
examples for visual analysis (Fig. 4).
The model for predicting site based on OTU abundances
classified most samples into their site of origin with high
accuracy: total OOB error (the frequency with which sam-
ples were misclassified as the incorrect site) was 5.9%.
The OOB confusion matrices (not shown), the OOB rates
for each site (Supporting Information Table S1), and the
results of the model training-and-testing procedure (Fig.
4A) revealed that when errors were made, they always
misclassified a sample into an adjacent site – e.g., sam-
ples from site 8.1 were sometimes misclassified as site
4.1, and the northernmost and southernmost sites were
never misclassified. This agrees with our findings using
Unifrac and Mantel analysis, which indicated that samples
from adjacent sites host similar communities.
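The overall OOB error, the per-site rates and the check that misclassifications fall only into adjacent sites can all be read from a confusion matrix. The Python sketch below assumes that sites are ordered along the estuary axis as they are listed in Fig. 4 (4.1, 8.1, 13, 21, 24); the function names and the predictions in the example are hypothetical, not results from this study.

```python
# Hypothetical sketch: error rates and an adjacency check from a confusion matrix.
from collections import Counter

SITES = ["4.1", "8.1", "13", "21", "24"]  # assumed order along the estuary axis


def confusion_matrix(actual, predicted, labels=SITES):
    """Rows are actual sites, columns are predicted sites."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]


def error_rates(matrix):
    """Overall misclassification rate and the per-site (per-row) error rate."""
    total = sum(map(sum, matrix))
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    per_site = [1 - row[i] / sum(row) if sum(row) else float("nan") for i, row in enumerate(matrix)]
    return 1 - correct / total, per_site


def errors_only_adjacent(matrix):
    """True if every misclassification lands in a neighbouring site."""
    k = len(matrix)
    return all(matrix[i][j] == 0 for i in range(k) for j in range(k) if abs(i - j) > 1)


# Hypothetical example: one sample from site 8.1 predicted as site 4.1.
actual = ["4.1", "4.1", "8.1", "8.1", "13", "21", "24"]
predicted = ["4.1", "4.1", "4.1", "8.1", "13", "21", "24"]
m = confusion_matrix(actual, predicted)
overall, per_site = error_rates(m)
```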
In contrast, for salinity, nitrogen, and temperature, the
range of predictions was always narrower than the range
of true values (Fig. 4B–D). Average pseudo-r² values for the models were: salinity, 0.75; nitrogen, 0.37; temperature, −0.15. (Pseudo-r² is calculated as $1 - \mathrm{MSE}/\mathrm{var}(y)$, where MSE is the mean squared OOB error and $\mathrm{var}(y)$ is the variance of the dependent variable.) In agreement with our
earlier findings regarding the strength of the relationships
between these environmental variables and community
composition, the model linking salinity to OTU abundances
could make predictions with some accuracy, whereas the
model based on nitrogen was weaker and the model based
on temperature had no predictive power. The plot of the
[Fig. 4 image: (A) fraction of predicted sites for each sample site (4.1, 8.1, 13, 21, 24); (B) predicted vs. sample salinity (PSU); (C) predicted vs. sample total N (%); (D) predicted vs. sample temperature (ºC); colour legend: site.]
Fig. 4. Results of Random Forests models to predict environmental characteristics of samples based on community composition. Each plot
shows the result of 10 model-test runs. In each run, 25 randomly chosen samples were used to build a model and the remaining 10 samples
were used to test the model.
A. Predictions by the classification model to predict sample site. Each bar represents a particular site (actual value); the colour of the bar
indicates the relative frequency with which samples from that site were predicted by the model to have originated from each of the five
possible sites (predicted values).
B–D. Predictions by the salinity, nitrogen, and temperature regression models, respectively. The x-axis represents actual values, and the y-axis
represents predicted values. In all plots, the site of origin of the sample is indicated by colour as given in the legend at lower right.
8 J. A. Lee and C. A. Francis
salinity model predictions also reveals that the accuracy of
the salinity model may be partly due to the fact that sam-
ples from the same site are all predicted to have similar
salinities (Fig. 4B).
For comparison, we carried out the same Random For-
ests modelling using the clone library dataset. As
expected, due to the smaller number of samples and lower
dimensionality of the dataset, the accuracy of the clone
library models was much lower than that of the PGM models. The clone library model for site misclassified more samples than it classified correctly (OOB error was 62%), and the pseudo-r² values were 0.54, −0.07, and −0.25 for
salinity, nitrogen, and temperature respectively (Supporting
Information Table S1 and Fig. S5).
Random forests for selection of environmentally relevant OTUs
We then investigated how the variable importance mea-
sure of the RF method could be used to identify OTUs
that were most strongly associated with specific environ-
mental conditions. This would enable RF to be used as a
method of indicator species analysis, a common ecologi-
cal technique for identifying important taxa to study for
the purposes of assessing environmental change or
organism dispersal (Fortunato et al., 2013; Barberán et al., 2015). Several popular methods, such as IndVal (Dufrêne and Legendre, 1997), Metastats (White et al.,
2009) or LEfSe (Segata et al., 2011), find species asso-
ciated with discrete biological conditions or habitats, but
RF would enable the identification of OTUs associated
with continuous variables such as salinity. For this pur-
pose, we used the built-in permutational measure of
importance from the RF model-training procedure
(based on the difference in OOB error before and after
permuting the values of a predictor variable), with an
iterative selection procedure to identify OTUs that were
important in only one of the three predictive models: e.g.
OTUs of high importance for predicting site but not salin-
ity or nitrogen (see Experimental procedures for details).
This process resulted in 51 OTUs strongly associated
with site, 46 associated with salinity, and 38 associated
with nitrogen. Each of the groups of OTUs was phyloge-
netically diverse, and included representatives from
most of the major nirS clades (Fig. 5).
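A minimal Python sketch of this selection logic follows. It assumes that each trained model yields a dictionary of importance scores keyed by OTU, and it interprets "not important in the other models" as never ranking in the top k in any run for the other variables; that exclusion rule, like the function names, is an assumption made here for illustration, and the criteria actually applied are those described in Experimental procedures.

```python
# Hypothetical sketch of indicator-OTU selection from repeated RF importance rankings.
def top_k(importance, k):
    """The k OTUs with the highest importance scores in one model run."""
    return set(sorted(importance, key=importance.get, reverse=True)[:k])


def consistent_top_k(runs, k):
    """OTUs ranked in the top k in every run (one importance dict per run)."""
    return set.intersection(*(top_k(run, k) for run in runs))


def exclusive_indicators(runs_by_variable, k=100):
    """Per variable: OTUs consistently in its top k, never in the top k for another variable."""
    ever_high = {
        v: set().union(*(top_k(run, k) for run in runs)) for v, runs in runs_by_variable.items()
    }
    result = {}
    for v, runs in runs_by_variable.items():
        others = set().union(*(ever_high[w] for w in runs_by_variable if w != v))
        result[v] = consistent_top_k(runs, k) - others
    return result
```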
The relationship between environment and phylogeny
was clear for both the site-associated and salinity-
associated OTUs. Previous studies along salinity gradients
have found abundance of Betaproteobacteria to decrease,
and abundance of Alphaproteobacteria and Gammapro-
teobacteria to increase, with increasing salinity
(Herlemann et al., 2011). Our results were partially consis-
tent with those findings: the clades of Betaproteobacteria
nirS sequences contained only OTUs associated with low
salinities, and the OTUs associated with high salinities
were found only among the Gammaproteobacteria; however, most of the indicator OTUs identified as
Alphaproteobacteria were associated with low salinity
rather than high (Fig. 5B). This may be because most of
the putative Alphaproteobacteria indicator OTUs were spe-
cifically related to the Magnetospirillum genus, which is
commonly found in freshwater environments (e.g., Lefèvre et al., 2012). This result also demonstrates that the method is capable of highlighting environmental preferences within specific lineages that may not be evident from examining shifts in the abundance of broader taxonomic groups.
Phylogenetic clustering of site-associated OTUs
occurred on an even finer taxonomic scale: for instance,
within the Gammaproteobacteria, a cluster of several Halomonas-like nirS OTUs was found to be associated with site 13, and a cluster of Thiohalomonas/Marinobacter-like nirS OTUs was associated with site 21 (Fig. 5A). Notably,
we were able to identify site-indicator OTUs that were dis-
tinct from the salinity-indicator OTUs, possibly a sign that
although salinity and site are strongly associated, other
factors may still play a role in determining the site prefer-
ence of a specific OTU. The one exception to this is that
many indicator OTUs generated by all three models fell
into the Azoarcus-like clade, but this was likely because it
was the most highly represented clade in the sequence
library, having contributed 247 of the 1776 OTUs (Support-
ing Information Fig. S3). Some of the indicator taxa
identified are present only at a single site; some at multiple
sites within a limited salinity range; and some at all sites
but preferentially abundant at one of the sites.
The phylogeny of the OTUs specific to sediment total
nitrogen is less clear, which is understandable given the
poor predictive power of the nitrogen model (Fig. 5C). A
few OTUs were identified that were unambiguously associ-
ated with high- or low-nitrogen. Notably, the Leptothrix/
Acidovorax-like group of Betaproteobacteria contained
OTUs of both types. However, because sediment total
nitrogen correlated strongly with sediment total carbon, it
may alternatively be interpreted as representing sediment
organic matter content.
Our approach to the nirS PGM dataset is analogous to that
used by biomedical researchers employing RF to analyse
gene expression data: frequently, RF is used to identify cer-
tain genes whose expression levels in human subjects
correlate with a phenotype of interest. A logical future step for its
use in microbial ecology would be to employ variable selec-
tion tools that have been developed for use with gene
expression data, such as varSelRF (Diaz-Uriarte, 2007), to
define a limited set of indicator OTUs using logical criteria.
Nevertheless, our rudimentary method of selecting OTUs that
scored highly in importance repeatedly across many model
runs was quite successful in identifying taxa that responded
strongly to environmental gradients of interest. This method
has a wide range of possible further applications, from the
generation of hypotheses about known taxa, to aiding primer
design for PCR assays targeting specific clades as biological
indicators of certain environmental conditions.
Conclusions
Our in-depth NGS-enabled survey of nirS-type denitrifiers
in the San Francisco Bay estuary has revealed very strong,
previously unobserved ecological patterns in the sediment
microbial community. We have provided evidence for a
robust association between community composition and
geography, which may be attributed to environmental
selection by salinity and nitrogen availability, but may also
be influenced by dispersal. These observations were con-
sistent using both low-throughput (clone libraries) and
high-throughput (NGS) sequencing methods. We also
demonstrated that the remarkable site- and environment-specificity of denitrifier communities in the San Francisco Bay estuary allows the identification of a sample’s location
and salinity from its nirS sequences alone, using Random
Forests modelling. Although focused on a specific func-
tional group, this study highlights ecological processes that
may also shape the biogeography of all sediment micro-
biota in the estuary; more general community surveys will
be necessary in the future to distinguish which of the
effects observed here are specific to denitrifying microorganisms. A full understanding of the ecological implications of our results would also benefit from work
that can distinguish active cells from dormant cells (Len-
non and Jones, 2011) and relic DNA (Carini et al., 2016),
and from applying models of flow and sediment transport
to understand microbial residence time (Crump et al.,
2004), to better understand the relative importance of dis-
persal versus selection on the biogeographical patterns we
have observed here.
[Fig. 5 image: three phylogenetic trees (A–C) of indicator OTUs with 49 nirS reference sequences (tip labels not reproduced); legends: site (4.1, 8.1, 13, 21, 24), salinity (PSU, 10–30), sediment nitrogen (%, 0.025–0.100) and OTU abundance (10–1,000).]
Fig. 5. Phylogenetic trees of indicator OTUs identified by Random Forests models, with reference sequences from the literature. Each of the
three trees contains the same reference sequences but different indicator OTUs. Each unlabelled branch represents an OTU, and each circle
next to the branch represents one sample in which the OTU was found, with size indicating the abundance of the OTU in that sample. In each
tree, the OTUs shown are those that consistently ranked within the top 100 OTUs of highest importance in all of 10 independently trained
models for that particular environmental characteristic, and were not ranked highly in importance in models predicting the other two
characteristics. The number of OTUs meeting these criteria differed among the different sets of models.
A. OTUs of high importance in RF models predicting site (51 OTUs). Colour indicates site. Circles are ordered from left to right, first by site
and then by month.
B. OTUs of high importance in RF models predicting salinity (46 OTUs). Colour indicates salinity of bottom water in PSU. Circles are ordered
from left to right by salinity, from low to high.
C. OTUs of high importance in RF models predicting nitrogen (38 OTUs). Colour indicates total nitrogen content of sediment sample, in % by
mass of dry sediment. Circles are ordered from left to right by nitrogen content, from low to high.
In addition to providing specific insight on microbial com-
munities in the San Francisco Bay estuary, our RF-based
method for identifying indicator species has the potential
for broad utilization. Microbial indicator species analysis is
used in fields as diverse as ecology [for identifying distinc-
tive biomes (Fortunato et al., 2013) or assessing microbial
dispersal patterns (Barberán et al., 2015)], environmental
health [for tracking pathogens in the environment (McLel-
lan and Eren, 2014; Tan et al., 2015)] and medicine [for
identifying microbiota associated with host disease states
(Segata et al., 2011)], and the monitoring of microbial com-
munities in relation to environmental change will continue
to become more prominent as NGS technology becomes
increasingly accessible (e.g., Leff et al., 2015). The technique we have introduced here, which can identify indicators not just of discrete source populations but also of continuous environmental variables, is a novel contribution. In San
Francisco Bay, microbial indicator species might allow sen-
sitive temporal monitoring of ecosystem dynamics. Climate
change is expected to bring a number of alterations to the
San Francisco Bay estuary, including higher temperatures
and salinity, and reduced runoff and suspended sediment
(Cloern et al., 2011), and in this changing environmental
context, nutrient dynamics are also a matter of increasing
concern (Rogers, 2013). Establishing a baseline under-
standing of microbial community ecology in the estuary,
and developing powerful tools for monitoring community
dynamics, will be essential for understanding and main-
taining the health of the ecosystem in the future.
Experimental procedures
Field sampling, DNA extraction and environmental measurements
Sampling was conducted between July 2011 and June 2012
aboard monthly full-bay cruises on the R/V Polaris (USGS;
Menlo Park, CA) as part of the Water Quality of San Francisco
Bay monitoring program (http://sfbay.wr.usgs.gov/access/
wqdata/). Sample sites are shown in Fig. 1A. Sample collec-
tion, extraction of sediment DNA and chemical measurements
in sediment and bottom water were carried out as previously
described (Lee and Francis, 2017). Briefly, sediments were
collected by van Veen grab and sampled through a door in the
top of the grab using cut-off sterile 3 mL syringes, and stored
on dry ice until processing. Total DNA was extracted from the
top 1 cm from triplicate cores from each sample and pooled
before sequencing. Bottom water was sampled from the water
trapped above the sediment in the grab.
Gene amplification, sequencing and sequence processing
The generation and sequencing of the PCR clone libraries
were described previously (Lee and Francis, 2017) and the
sequences are available from GenBank (accession numbers
KR060622 – KR061281). In summary, the primers nirS1F (5′-CCTAYTGGCCGCCRCART-3′) and nirS6R (5′-CGTTGAACTTRCCGGT-3′) (Braker et al., 1998) were used to amplify
nirS from DNA samples taken in July 2011, October 2011,
January 2012 and May 2012, yielding products ranging from
797 to 890 bp long; amplicons were cloned using the pGEM-T
vector system and sequenced via Sanger sequencing (Lee
and Francis, 2017). The PGM sequence library was con-
structed from the same DNA as that used for the clone
libraries, and also from samples from September and Novem-
ber 2011 and March 2012. DNA amplification, barcode library
construction, and sequencing were carried out by Molecular
Research LP (MR DNA; Shallowater, TX). A 265-bp region of
the nirS gene was amplified using nirS1F (the same forward
primer as was used for the PCR clone libraries) and nirS-q-R
(5′-TCCMAGCCRCCRTCRTGCAG-3′), a reverse primer used
successfully with nirS1F for nirS qPCR analysis in previous
studies (Mosier and Francis, 2010; Smith et al., 2015; Lee and
Francis, 2017). Sequencing was performed using the nirS1F
primer on an Ion Torrent Personal Genome Machine (PGM)
sequencer. Initial quality control was carried out by the
sequencing facility: sequences were depleted of barcodes and
primers, sequences shorter than 150-bp were removed, and
sequences with ambiguous base calls or with homopolymer
runs exceeding 6-bp were also removed. This resulted in an ini-
tial library of 3 796 589 reads, between 150 and 265-bp long.
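The three read-level filters described above (minimum length of 150 bp, no ambiguous base calls, homopolymer runs of at most 6 bp) can be expressed as a single predicate. This Python sketch is not the sequencing facility's pipeline; treating every IUPAC ambiguity code as an "ambiguous base call" is an assumption, and the function name is hypothetical.

```python
# Hypothetical sketch of the PGM read quality filters.
import re

AMBIGUOUS = set("NRYKMSWBDHV")  # IUPAC ambiguity codes; counting all as "ambiguous" is assumed


def passes_qc(read, min_len=150, max_homopolymer=6):
    """Apply the length, ambiguous-base and homopolymer filters to one read."""
    seq = read.upper()
    if len(seq) < min_len:
        return False
    if any(base in AMBIGUOUS for base in seq):
        return False
    # Longest run of one repeated base, e.g. "AAAAAAA" has length 7.
    longest = max(len(m.group(0)) for m in re.finditer(r"(.)\1*", seq))
    return longest <= max_homopolymer
```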
For OTU library formation, reads were initially trimmed to
215-bp and all shorter reads were discarded. The trimmed
library was then entered into the UPARSE pipeline implement-
ing USEARCH v7 (Edgar, 2013) for dereplication and de-novo
clustering to the 88% similarity level following Bowen et al. (2013)
(see also Lee and Francis, 2017), with removal of singletons
and chimeras. The representative sequences were then
entered into the FunFrame v0.9.3 pipeline (Bowen et al.,
2013) for error correction using HMM-Frame with the Cyto-
chrome D1 Hidden Markov Model (PFAM family PF02239)
and minimum score filter of 80. A final library of 1881 repre-
sentative OTU sequences was retained.
The sequences representing PGM OTUs were aligned
using PyNAST v1.2 within QIIME v1.7.0 (Caporaso et al.,
2010), with a seed alignment that had been generated from
3111 sequences from the FunGene repository (Fish et al.,
2013). The alignment was then used to generate a phyloge-
netic tree with FastTree v2.1.3 within QIIME. To assess OTU
abundances, UPARSE was used to map all reads (untrimmed,
singletons included) to the 1881 representative OTU sequen-
ces at an 88% identity threshold. 7.3% of reads failed to map
and were disregarded.
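Mapping reads back to representative OTUs at an identity threshold can be illustrated as below. UPARSE/USEARCH computes identity from a proper alignment that allows gaps; this Python sketch deliberately simplifies that to an ungapped, position-by-position comparison over the shorter sequence, so it shows the assignment and "failed to map" bookkeeping rather than the real alignment. The function names and toy sequences are hypothetical.

```python
# Hypothetical sketch: assign reads to OTUs at an identity threshold (ungapped simplification).
def identity(a, b):
    """Ungapped identity: fraction of matching positions over the shorter sequence."""
    n = min(len(a), len(b))
    return sum(x == y for x, y in zip(a, b)) / n


def map_reads(reads, otus, threshold=0.88):
    """Assign each read to its most similar OTU if identity >= threshold.

    Returns per-OTU read counts and the fraction of reads left unmapped."""
    counts = {name: 0 for name in otus}
    unmapped = 0
    for read in reads:
        best_name, best_id = max(
            ((name, identity(read, seq)) for name, seq in otus.items()),
            key=lambda pair: pair[1],
        )
        if best_id >= threshold:
            counts[best_name] += 1
        else:
            unmapped += 1
    return counts, unmapped / len(reads)
```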
Nucleotide accession numbers
The PGM sequences reported in this study have been
deposited in the NCBI Sequence Read Archive database
under the BioProject PRJNA285824. The clone library
sequences, as previously reported, are available in GenBank under accession nos. KR060622 – KR061281.
Community diversity analysis
The full nirS PGM library was initially resampled to achieve
an even sampling depth of 13 000 sequences per sample,
resulting in a final library of 1776 OTUs. To determine
whether resampling led to any systematic biases, we con-
ducted all alpha-diversity, Unifrac and PERMANOVA
analyses on both the full and resampled dataset; for all
these analyses, results were similar between the two data-
sets. We also used least squares linear regression to
assess the effect of sample size on alpha diversity metrics
(Supporting Information Table S2). We found that the num-
ber of reads in a sample had a significant impact on all
diversity metrics except for Simpson; for this reason, we
moved forward with the resampled dataset in order to elim-
inate the influence of sample size, and report only those
results in the main text. The clone library was not
resampled.
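Rarefying each sample to an even depth can be sketched as follows. One plausible reason the OTU count falls after resampling (here, from 1881 to 1776) is that some rare OTUs lose all their reads, so the sketch also drops OTUs that are empty in every sample. The study used phyloseq in R; this Python version subsamples without replacement, which is one common choice (phyloseq's rarefy_even_depth can sample with or without replacement, and the setting used is not stated here). Function names are hypothetical.

```python
# Hypothetical sketch of rarefaction to an even sequencing depth.
import random


def rarefy(counts, depth, seed=0):
    """Subsample one sample's OTU counts to `depth` reads, without replacement."""
    if sum(counts.values()) < depth:
        raise ValueError("sample has fewer reads than the target depth")
    pool = [otu for otu, n in counts.items() for _ in range(n)]  # one entry per read
    result = dict.fromkeys(counts, 0)
    for otu in random.Random(seed).sample(pool, depth):
        result[otu] += 1
    return result


def drop_empty_otus(samples):
    """Remove OTUs with zero counts in every (rarefied) sample."""
    keep = {otu for s in samples for otu, n in s.items() if n > 0}
    return [{otu: n for otu, n in s.items() if otu in keep} for s in samples]
```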
All diversity analyses were carried out in R v3.0.2 (R
Core Team, 2014), using RStudio v0.98.1074 (Boston,
MA). Resampling, alpha-diversity metrics, weighted Uni-
frac, and Principal Coordinates Analysis (PCoA) were
carried out in the Phyloseq package v1.8.2 (McMurdie and
Holmes, 2013). The Standardized Effect Size of Phyloge-
netic Diversity (SES-PD) (Webb et al., 2002) was
calculated with Picante v1.6-2 (Kembel et al., 2010) in R,
using 999 permutations and a null model that shuffled all
taxa across the phylogeny. Linear regression was per-
formed using the lm function in R. To compare alpha-
diversity values between groups of samples, Dunn’s test
(Dunn, 1961) was performed, using the dunn.test package
in R (Dinno, 2016).
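SES-PD compares a sample's observed phylogenetic diversity with values expected under a null model, as (observed − null mean) / null standard deviation. The Python sketch below computes Faith's PD as the total length of branches connecting a set of tips to the root, and builds the null by drawing random tip sets of the same size, which for presence-absence data is equivalent to shuffling taxa labels across the phylogeny. It is an illustration only, not Picante's implementation; the tree encoding and function names are hypothetical.

```python
# Hypothetical sketch of Faith's PD and its standardized effect size (SES-PD).
import random
import statistics


def faith_pd(tips, parent):
    """Faith's PD: summed length of all branches linking `tips` to the root.

    `parent` maps each non-root node to (parent_node, branch_length)."""
    edges = set()
    for node in tips:
        while node in parent:
            edges.add(node)  # the branch directly above `node`
            node = parent[node][0]
    return sum(parent[n][1] for n in edges)


def ses_pd(tips, parent, all_tips, runs=999, seed=0):
    """(observed PD - null mean) / null sd, drawing random tip sets of equal size."""
    rng = random.Random(seed)
    observed = faith_pd(tips, parent)
    null = [faith_pd(rng.sample(all_tips, len(tips)), parent) for _ in range(runs)]
    return (observed - statistics.mean(null)) / statistics.stdev(null)
```

A negative SES-PD indicates that the taxa in a sample are more closely related than expected from random draws of the same size.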
Statistical analyses of environmental variables
Of the environmental variables measured at time of sam-
pling, only seven were used in models to test for statistical
correlation with community data in San Francisco Bay. The
variables were chosen based on ecological relevance, as described previously in the literature, and in order to minimize collinearity, which was assessed by PCA and by Pearson’s product-moment correlation (Supporting Information Figs S1 and S2). The variables tested were similar to those used by Lee and Francis (2017), with some minor changes; they included salinity, temperature, bottom water NO₃⁻ concentration, bottom water NH₄⁺ concentration, sediment total N content, sediment total Cu content and sediment total S content. Bottom water measurements of NO₃⁻ and NH₄⁺ were used because porewater was not available for all the samples sequenced in this study.
We generated an environmental distance matrix for use
in Mantel tests by conducting Principal Components Analy-
sis (PCA) on the samples using the seven chosen
variables, then calculating a Euclidean distance matrix
from the PCA. We generated a spatial distance matrix by
calculating the Euclidean distance among the sites accord-
ing to their location along the major axis of the estuary. A
temporal distance matrix was generated by calculating the
Euclidean distance among sampling dates, in units of days.
Mantel and Partial Mantel tests were conducted in Vegan
v2.2–0 (Oksanen et al., 2007) in R, using Pearson correla-
tion and 999 permutations.
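A Mantel test correlates the off-diagonal entries of two distance matrices and judges significance by permuting the samples of one matrix. The Python sketch below builds 1-D Euclidean distance matrices of the kind described for site position and sampling date, computes the Pearson Mantel statistic, and returns a one-sided permutation p-value of the form (hits + 1)/(permutations + 1). It is a minimal stand-in for vegan's mantel, not a reproduction of it; the function names are hypothetical.

```python
# Hypothetical sketch of a Pearson Mantel test with a permutation p-value.
import random
import statistics


def distance_matrix(values):
    """Euclidean distance matrix for 1-D positions (e.g. position along the axis, or day)."""
    return [[abs(a - b) for b in values] for a in values]


def upper_triangle(m, order=None):
    """Off-diagonal upper-triangle entries, optionally after reordering rows and columns."""
    n = len(m)
    order = list(range(n)) if order is None else order
    return [m[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def mantel(d1, d2, permutations=999, seed=0):
    """Pearson Mantel statistic and one-sided permutation p-value."""
    rng = random.Random(seed)
    x = upper_triangle(d1)
    r_obs = pearson(x, upper_triangle(d2))
    hits = 0
    for _ in range(permutations):
        order = list(range(len(d2)))
        rng.shuffle(order)  # relabel samples: permute rows and columns of d2 together
        if pearson(x, upper_triangle(d2, order)) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (permutations + 1)
```

Permuting rows and columns together preserves the internal structure of each matrix, so the null distribution reflects only the loss of correspondence between samples.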
The relationship between environmental variables and
Unifrac distances was assessed by Permutational Multivar-
iate Analysis of Variance (PERMANOVA) (Anderson,
2001) in Vegan, with 999 permutations.
Reference phylogenetic tree
The reference tree used to analyse PGM data (Supporting
Information Fig. S3) was composed of 49 published nirS
nucleotide sequences from cultured isolates and the 660
sequences from the San Francisco Bay nirS clone libraries
(Lee and Francis, 2017). Published sequences were
trimmed to match the length of the clone library sequen-
ces, and alignment was conducted using the ‘Translation
Align’ function in Geneious v5.6.4 (Biomatters, Inc.).
Sequences representing OTUs from the PGM dataset
were then added to this custom seed alignment using
PyNAST v1.2.2 (in QIIME v1.8.0), and the alignment was
used to generate a tree in FastTree v2.1.7 using a
GTR+CAT model, with ‘Candidatus Kuenenia stuttgartiensis’, an anammox bacterium, as the outgroup. Clades were
manually collapsed and annotated using FigTree v1.4.2
(Rambaut, 2014).
Comparisons between PGM library and clone library sequences
Spearman rank correlations between alpha-diversity met-
rics of the two datasets (clone library and PGM) were
calculated in Vegan. Mantel tests to compare Unifrac dis-
tance matrices were also performed in Vegan, with Pearson
correlation and 999 permutations. The coverage of the nirS-
q-R primer among the sequences in the clone library data-
set was assessed by motif search in Geneious v5.6.4.
The overlap in coverage between the clone library and
the PGM dataset was assessed by using UPARSE to map
the clone library reads to the sequences representing
PGM OTUs at an 88% identity threshold. Phylogenetic
clustering of the PGM OTUs that mapped to the clone
library was assessed relative to the full set of PGM OTUs
by calculating the Standardized Effect Sizes of the Mean
Nearest Taxon Distance (SES-MNTD) and the Mean Pair-
wise Distance (SES-MPD) (Vamosi et al., 2009) in Picante
v1.6-2 within R, using no abundance weighting, 999 runs
and a null model that shuffled labels across all taxa. The
abundances of the shared OTUs in the clone library data-
set were compared to the corresponding abundances in
the PGM dataset by Spearman rank correlation, carried
out in Vegan. Finally, Unifrac distance matrices were re-
calculated for the clone library and for the PGM dataset,
using only the OTUs that the two datasets had in common,
and those distance matrices were compared by Mantel
test as described above, to each other and to the original
Unifrac distance matrices of the full libraries.
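MPD is the mean phylogenetic distance over all pairs of taxa in a set, and MNTD is the mean distance from each taxon to its closest relative in the set; each is standardized against a null model in the same way as SES-PD. The Python sketch below is unweighted, matching the "no abundance weighting" setting, and draws null sets of the same size from the full pool of OTUs. Function names are hypothetical, and Picante's own implementation may differ in details.

```python
# Hypothetical sketch of unweighted MPD, MNTD and their standardized effect sizes.
import itertools
import random
import statistics


def mpd(taxa, dist):
    """Mean pairwise distance among all pairs of taxa in the set."""
    return statistics.mean(dist[a][b] for a, b in itertools.combinations(taxa, 2))


def mntd(taxa, dist):
    """Mean distance from each taxon to its nearest other taxon in the set."""
    return statistics.mean(min(dist[a][b] for b in taxa if b != a) for a in taxa)


def ses(metric, taxa, dist, pool, runs=999, seed=0):
    """(observed - null mean) / null sd; negative values indicate phylogenetic clustering."""
    rng = random.Random(seed)
    observed = metric(taxa, dist)
    null = [metric(rng.sample(pool, len(taxa)), dist) for _ in range(runs)]
    return (observed - statistics.mean(null)) / statistics.stdev(null)
```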
Random forests model generation and variable importance
Random Forests (RF) was implemented using the randomForest v4.6-10 package for R (Liaw and Wiener, 2002). Models were built using the OTU table from the PGM dataset as the input, with either sample site (a classification model) or bottom water salinity, sediment total nitrogen or bottom water temperature (regression models) as the dependent variable. In all cases, mtry (the number of variables sampled as candidates at each split) was set to 80 and each forest had 2000 trees. For comparison, RF models were also constructed from the clone library dataset using the same procedure. All reported measures of model accuracy are out-of-bag (OOB) error estimates produced by randomForest during the training of 100 models of each type. For the classification model, OOB error is the proportion of times samples were misclassified; for the regression models, $\text{pseudo-}r^2 = 1 - \mathrm{MSE}/\mathrm{var}(y)$, where MSE is the mean squared OOB error and var(y) is the variance of the dependent variable.
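Both accuracy measures defined above can be computed from per-sample OOB predictions, that is, predictions made only by the trees that did not see a given sample during training. In R these summaries are already available as the `err.rate` and `rsq` components of a fitted randomForest object; the sketch below simply spells out the arithmetic. It uses the population variance for var(y); whether an n or n − 1 denominator is intended is an assumption of this sketch, and the difference is small for 35 samples.

```python
import statistics


def oob_misclassification_rate(actual, oob_predicted):
    """Fraction of samples whose out-of-bag prediction is the wrong class."""
    wrong = sum(a != p for a, p in zip(actual, oob_predicted))
    return wrong / len(actual)


def pseudo_r2(actual, oob_predicted):
    """1 - MSE/var(y), where MSE is the mean squared OOB error."""
    mse = statistics.fmean((a - p) ** 2 for a, p in zip(actual, oob_predicted))
    return 1 - mse / statistics.pvariance(actual)
```

A model that always predicts the mean of y scores a pseudo-r² of 0, and a worse-than-mean model can score below 0, which is why pseudo-r² is not a squared correlation.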
To generate visual displays of the predictive accuracy of RF models (because the OOB error estimator does not store values for plotting), 10 model-test pairs were constructed for each environmental variable. For each model-test pair, a model was trained on 25 randomly chosen samples from the PGM dataset (out of 35 in total) and tested on the remaining 10 samples, or trained on 15 randomly chosen samples from the clone library dataset and tested on the remaining 5. This yielded 100 PGM or 50 clone library predictions for each environmental variable, which were then compiled into a single dataset. These predictions were used solely to generate plots as a visual aid and for qualitative analysis, not for rigorous statistical analysis. Because test samples were drawn at random, most samples were represented several times in these compiled datasets, but a few samples were not represented at all.
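The repeated random train/test procedure above can be sketched as follows. The `fit_predict` argument is a hypothetical placeholder for fitting an RF model and predicting the held-out samples; `mean_predictor` is a deliberately trivial stand-in so that the sketch runs on its own. Counting how often each sample lands in a test set shows why some samples appear several times in the pooled predictions and others not at all.

```python
import random
from collections import Counter


def repeated_holdout(samples, n_train, n_pairs, fit_predict, seed=0):
    """Build n_pairs random train/test splits and pool the test predictions.

    fit_predict(train, test) must return one prediction per test sample.
    Returns a list of (sample, prediction) pairs from every test set.
    """
    rng = random.Random(seed)
    pooled = []
    for _ in range(n_pairs):
        shuffled = samples[:]
        rng.shuffle(shuffled)
        train, test = shuffled[:n_train], shuffled[n_train:]
        pooled.extend(zip(test, fit_predict(train, test)))
    return pooled


def times_tested(pooled):
    """How many times each sample ended up in a test set."""
    return Counter(sample for sample, _ in pooled)


def mean_predictor(train, test):
    # Hypothetical stand-in for an RF model: predicts the training mean.
    m = sum(value for _, value in train) / len(train)
    return [m] * len(test)


if __name__ == "__main__":
    samples = [(f"s{i}", float(i)) for i in range(35)]
    pooled = repeated_holdout(samples, n_train=25, n_pairs=10, fit_predict=mean_predictor)
    print(len(pooled))  # 100 predictions (10 pairs x 10 test samples)
```

With 10 draws of 10 test samples out of 35, each sample has roughly a (25/35)^10 ≈ 3% chance of never being tested, consistent with a few samples being absent from the plots.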
We identified OTUs that were strongly associated with site, salinity or sediment nitrogen content using the built-in permutational measure of variable importance (Error Rate for classification models, Mean Squared Error for regression models) generated by randomForest during model training. For each environmental variable, an RF model was constructed from the full PGM dataset 10 independent times, and each time the 100 OTUs ranking highest in importance were recorded. Only the OTUs that appeared in the top 100 of all 10 models were kept (approximately 40–70 OTUs for each variable). Of these, only the OTUs that were important to one environmental variable and not to the others were retained as indicator OTUs (approximately 30–50 OTUs for each variable). For analysis of the phylogeny of these groups of OTUs, the nirS reference tree described above (‘Reference Phylogenetic Tree’) was pruned to retain just the indicator OTUs along with reference sequences from cultivated organisms.
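The two filtering steps above (stable top-100 membership across 10 models, then exclusivity to a single variable) reduce to set operations. In this sketch the importance scores are supplied as plain dictionaries, one per trained model, standing in for randomForest's permutation importance; the function names and toy OTU labels are hypothetical.

```python
from collections import Counter


def stable_top_otus(importance_runs, top_n=100):
    """OTUs ranked in the top `top_n` by importance in every run.

    importance_runs: list of dicts mapping OTU -> importance score,
    one dict per independently trained model.
    """
    tops = [set(sorted(run, key=run.get, reverse=True)[:top_n])
            for run in importance_runs]
    return set.intersection(*tops)


def indicator_otus(stable_by_variable):
    """For each variable, keep only OTUs that are not stable for any other."""
    counts = Counter(otu for otus in stable_by_variable.values() for otu in otus)
    return {var: {otu for otu in otus if counts[otu] == 1}
            for var, otus in stable_by_variable.items()}
```

Requiring membership in every run's top list, rather than averaging importance across runs, discards OTUs whose rank depends on the random tree construction of a particular forest.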
Acknowledgements
Funding for this study was provided by NSF CAREER Grant OCE-0847266 (to C.A.F.) and by a Stanford Graduate Fellowship (from William R. and Sara Hart Kimball) and a Marshall-EPA Scholarship (to J.A.L.). We thank Julian Damashek for his abundant help with sample collection, and we are grateful to him as well as to Bradley Tolar and Linta Reji for helpful feedback on the manuscript. Jennifer Bowen was exceptionally generous with advice and feedback on sequence analysis methods. Finally, we also owe profuse thanks to Jim Cloern, Jessica Dyke, Amy Kleckner, Jan Thompson and the other USGS scientists and staff who made our work on the R/V Polaris possible.
References
Abell, G., Ross, D., Keane, J., Oakes, J., Eyre, B., Robert, S., and Volkman, J. (2013) Nitrifying and denitrifying microbial communities and their relationship to nutrient fluxes and sediment geochemistry in the Derwent Estuary, Tasmania. Aquat Microb Ecol 70: 63–75.
Anderson, M.J. (2001) A new method for non-parametric multivariate analysis of variance. Austral Ecol 26: 32–46.
Barberán, A., Ladau, J., Leff, J.W., Pollard, K.S., Menninger, H.L., Dunn, R.R., and Fierer, N. (2015) Continental-scale distributions of dust-associated bacteria and fungi. Proc Natl Acad Sci USA 112: 5756–5761.
Bokulich, N.A., Thorngate, J.H., Richardson, P.M., and Mills, D.A. (2014) Microbial biogeography of wine grapes is conditioned by cultivar, vintage, and climate. Proc Natl Acad Sci USA 111: E139–E148.
Bowen, J.L., Byrnes, J.E., Weisman, D., and Colaneri, C. (2013) Functional gene pyrosequencing and network analysis: an approach to examine the response of denitrifying bacteria to increased nitrogen supply in salt marsh sediments. Front Microbiol 4: 342.
Braker, G., Fesefeldt, A., and Witzel, K.-P. (1998) Development of PCR primer systems for amplification of nitrite reductase genes (nirK and nirS) to detect denitrifying bacteria in environmental samples. Appl Environ Microbiol 64: 3769–3775.
Breiman, L. (2001) Random Forests. Mach Learn 45: 5–32.
Caporaso, J.G., Kuczynski, J., Stombaugh, J., Bittinger, K., Bushman, F.D., Costello, E.K., et al. (2010) QIIME allows analysis of high-throughput community sequencing data. Nat Methods 7: 335–336.
Carini, P., Marsden, P.J., Leff, J.W., Morgan, E.E., Strickland, M.S., and Fierer, N. (2016) Relic DNA is abundant in soil and obscures estimates of soil microbial diversity. Nat Microbiol 2: nmicrobiol2016242.
Cloern, J.E., Knowles, N., Brown, L.R., Cayan, D., Dettinger, M.D., Morgan, T.L., et al. (2011) Projected evolution of California’s San Francisco Bay-delta-river system in a century of climate change. PLoS One 6: e24465.
Cornwell, J.C., Glibert, P.M., and Owens, M.S. (2014) Nutrient fluxes from sediments in the San Francisco Bay delta. Estuaries Coasts 37: 1120–1133.
Crisci, C., Ghattas, B., and Perera, G. (2012) A review of supervised machine learning algorithms and their applications to ecological data. Ecol Model 240: 113–122.
Crump, B.C., Hopkinson, C.S., Sogin, M.L., and Hobbie, J.E. (2004) Microbial biogeography along an estuarine salinity gradient: combined influences of bacterial growth and residence time. Appl Environ Microbiol 70: 1494–1505.
Cutler, D.R., Edwards, T.C., Beard, K.H., Cutler, A., Hess, K.T., Gibson, J., and Lawler, J.J. (2007) Random Forests for classification in ecology. Ecology 88: 2783–2792.
Damashek, J., Casciotti, K.L., and Francis, C.A. (2016) Variable nitrification rates across environmental gradients in turbid, nutrient-rich estuary waters of San Francisco Bay. Estuaries Coasts 39: 1050–1071.
Decleyre, H., Heylen, K., Sabbe, K., Tytgat, B., Deforce, D., Van Nieuwerburgh, F., et al. (2015) A doubling of microphytobenthos biomass coincides with a tenfold increase in denitrifier and total bacterial abundances in intertidal sediments of a temperate estuary. PLoS One 10: e0126583.
Diaz-Uriarte, R. (2007) GeneSrF and varSelRF: a web-based tool and R package for gene selection and classification using random forest. BMC Bioinformatics 8: 328.
Dinno, A. (2016) dunn.test: Dunn’s test of multiple comparisons using rank sums. R Package Version 1.3.2.
Dufrêne, M., and Legendre, P. (1997) Species assemblages and indicator species: the need for a flexible asymmetrical approach. Ecol Monogr 67: 345–366.
Dunn, O.J. (1961) Multiple comparisons among means. J Am Stat Assoc 56: 52–64.
Edgar, R.C. (2013) UPARSE: highly accurate OTU sequences from microbial amplicon reads. Nat Methods 10: 996–998.
Fish, J.A., Chai, B., Wang, Q., Sun, Y., Brown, C.T., Tiedje, J.M., and Cole, J.R. (2013) FunGene: the functional gene pipeline and repository. Front Microbiol 4: 291.
Fortunato, C.S., Eiler, A., Herfort, L., Needoba, J.A., Peterson, T.D., and Crump, B.C. (2013) Determining indicator taxa across spatial and seasonal gradients in the Columbia River coastal margin. ISME J 7: 1899–1911.
Francis, C.A., O’Mullan, G.D., Cornwell, J.C., and Ward, B.B. (2013) Transitions in nirS-type denitrifier diversity, community composition, and biogeochemical activity along the Chesapeake Bay Estuary. Front Aquat Microbiol 4: 237.
Gevrey, M., Dimopoulos, I., and Lek, S. (2003) Review and comparison of methods to study the contribution of variables in artificial neural network models. Ecol Model 160: 249–264.
Guyon, I., and Elisseeff, A. (2003) An introduction to variable and feature selection. J Mach Learn Res 3: 1157–1182.
Hager, S.W., and Schemel, L.E. (1992) Sources of nitrogen and phosphorus to Northern San Francisco Bay. Estuaries 15: 40–52.
Herlemann, D.P., Labrenz, M., Jürgens, K., Bertilsson, S., Waniek, J.J., and Andersson, A.F. (2011) Transitions in bacterial communities along the 2000 km salinity gradient of the Baltic Sea. ISME J 5: 1571–1579.
Jones, C.M., and Hallin, S. (2010) Ecological and evolutionary factors underlying global and local assembly of denitrifier communities. ISME J 4: 633–641.
Kampichler, C., Wieland, R., Calmé, S., Weissenberger, H., and Arriaga-Weiss, S. (2010) Classification in conservation biology: a comparison of five machine-learning methods. Ecol Inform 5: 441–450.
Kembel, S.W., Cowan, P.D., Helmus, M.R., Cornwell, W.K., Morlon, H., Ackerly, D.D., et al. (2010) Picante: R tools for integrating phylogenies and ecology. Bioinformatics 26: 1463–1464.
Lee, J.A., and Francis, C.A. (2017) Spatiotemporal characterization of San Francisco Bay denitrifying communities: a comparison of nirK and nirS diversity and abundance. Microb Ecol 73: 271–284.
Lefèvre, C.T., Schmidt, M.L., Viloria, N., Trubitsyn, D., Schüler, D., and Bazylinski, D.A. (2012) Insight into the evolution of magnetotaxis in Magnetospirillum spp., based on mam gene phylogeny. Appl Environ Microbiol 78: 7238–7248.
Leff, J.W., Jones, S.E., Prober, S.M., Barberán, A., Borer, E.T., Firn, J.L., et al. (2015) Consistent responses of soil microbial communities to elevated nutrient inputs in grasslands across the globe. Proc Natl Acad Sci USA 112: 10967–10972.
Lennon, J.T., and Jones, S.E. (2011) Microbial seed banks: the ecological and evolutionary implications of dormancy. Nat Rev Microbiol 9: 119–130.
Liaw, A., and Wiener, M. (2002) Classification and regression by randomForest. R News 2: 18–22.
McLellan, S.L., and Eren, A.M. (2014) Discovering new indicators of fecal pollution. Trends Microbiol 22: 697–706.
McMurdie, P.J., and Holmes, S. (2013) phyloseq: an R package for reproducible interactive analysis and graphics of microbiome census data. PLoS One 8: e61217.
Mosier, A.C., and Francis, C.A. (2010) Denitrifier abundance and activity across the San Francisco Bay estuary. Environ Microbiol Rep 2: 667–676.
Oksanen, J., Blanchet, F.G., Kindt, R., Legendre, P., Minchin, P.R., O’Hara, R.B., et al. (2007) The vegan package. Community Ecol Package 10: 631–637.
Olden, J.D., Joy, M.K., and Death, R.G. (2004) An accurate comparison of methods for quantifying variable importance in artificial neural networks using simulated data. Ecol Model 178: 389–397.
Olden, J.D., Lawler, J.J., and Poff, N.L. (2008) Machine learning methods without tears: a primer for ecologists. Q Rev Biol 83: 171–193.
Pappu, V., and Pardalos, P.M. (2014) High-dimensional data classification. In Clusters, Orders, and Trees: Methods and Applications, Springer Optimization and Its Applications. Aleskerov, F., Goldengorin, B., and Pardalos, P.M. (eds). New York: Springer, pp. 119–150.
Parkhurst, D.F., Brenner, K.P., Dufour, A.P., and Wymer, L.J. (2005) Indicator bacteria at five swimming beaches–analysis using random forests. Water Res 39: 1354–1360.
R Core Team (2014) R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.
Rambaut, A. (2014) FigTree v1.4.2. http://tree.bio.ed.ac.uk/software/figtree/
Rogers, P. (2013, November 30) San Francisco Bay waters are becoming clearer, but that may mean threats from algae growth. San Jose Mercury News.
Saarenheimo, J., Tiirola, M.A., and Rissanen, A.J. (2015) Functional gene pyrosequencing reveals core proteobacterial denitrifiers in boreal lakes. Front Microbiol 6: 674.
Santoro, A.E., Boehm, A.B., and Francis, C.A. (2006) Denitrifier community composition along a nitrate and salinity gradient in a coastal aquifer. Appl Environ Microbiol 72: 2102–2109.
Schoellhamer, D.H., Wright, S.A., and Drexler, J. (2012) A conceptual model of sedimentation in the Sacramento–San Joaquin delta. San Franc Estuary Watershed Sci 10.
Segata, N., Izard, J., Waldron, L., Gevers, D., Miropolsky, L., Garrett, W.S., and Huttenhower, C. (2011) Metagenomic biomarker discovery and explanation. Genome Biol 12: R60.
Smith, J.M., Mosier, A.C., and Francis, C.A. (2015) Spatiotemporal relationships between the abundance, distribution, and potential activities of ammonia-oxidizing and denitrifying microorganisms in intertidal sediments. Microb Ecol 69: 13–24.
Smith, R.A. (1871) On the examination of water for organic matter. Mem Lit Philos Soc Manch 4: 37–88.
Tan, B., Ng, C., Nshimyimana, J.P., Loh, L.L., Gin, K.Y.-H., and Thompson, J.R. (2015) Next-generation sequencing (NGS) for assessment of microbial water quality: current progress, challenges, and future opportunities. Front Microbiol 6: 1027.
The Human Microbiome Consortium (2012) Structure, function and diversity of the healthy human microbiome. Nature 486: 207–214.
Vamosi, S.M., Heard, S.B., Vamosi, J.C., and Webb, C.O. (2009) Emerging patterns in the comparative analysis of phylogenetic community structure. Mol Ecol 18: 572–592.
Ward, B.B. (2013) How nitrogen is lost. Science 341: 352–353.
Webb, C.O., Ackerly, D.D., McPeek, M.A., and Donoghue, M.J. (2002) Phylogenies and community ecology. Annu Rev Ecol Syst 33: 475–505.
White, J.R., Nagarajan, N., and Pop, M. (2009) Statistical methods for detecting differentially abundant features in clinical metagenomic samples. PLoS Comput Biol 5: e1000352.
Supporting information
Additional Supporting Information may be found in the
online version of this article at the publisher’s web-site:
Fig. S1. Principal Components Analysis biplot of samples used in PGM sequence libraries with selected environmental measurements. Labelled arrows represent environmental variables. Points represent samples, with sample site indicated by colour and month indicated by shape. Abbreviations: Temp: temperature of bottom water. NO₃: dissolved nitrate in bottom water. NH₄: dissolved ammonium in bottom water. Sal: salinity of bottom water. Dist: distance from the head of the estuary, as measured along the transect passing through the USGS Water Quality monitoring stations. abund: abundance of nirS as measured by quantitative PCR, in copies/gram dry sediment. Ctot, Ntot: total carbon and nitrogen, respectively, of the dried sediment, by mass. CN: carbon/nitrogen ratio of sediment, by mass. Fe, Al, Mg, Na, Cl, S, Pb, Cu, P, Mn: total content of each element in sediment, by mass.
Fig. S2. Distributions and pairwise correlations of the seven environmental variables featured in the model for PERMANOVA analysis. The lower-left panels show scatterplots of each pair of variables, with each point representing one sample and a red line showing the average using a locally weighted polynomial regression smoother. Panels on the diagonal show histograms of each variable. Upper-right panels show the Pearson correlation coefficient and significance for each pair of variables. Abbreviations: Sal: salinity of bottom water. NO₃: dissolved nitrate in bottom water. Temp: temperature of bottom water. NH₄: dissolved ammonium in bottom water. Ntot: total nitrogen of the dried sediment, by mass. Cu: total copper content in sediment, by mass. S: total sulfur content in dried sediment, by mass. Plots were generated using the pairs function in R with the panel.smooth function.
Fig. S3. Phylogenetic tree of all nirS sequences detected in the PGM dataset and clone library, with reference sequences from cultured organisms. Large clades are collapsed into wedges, where the length of the wedge is equal to the longest branch in the clade. Annotations describe the taxonomic class as inferred from the cultured organisms; the genus names of selected cultured organisms from within the clade; and the number of San Francisco Bay sequences in the clade. The number of OTUs detected in the PGM dataset (defined at the 88% similarity level) is given in parentheses; the number of clone sequences (not clustered into OTUs) is given in brackets. The four individual OTUs shown (OTU 484, OTU 1668, OTU 1213, OTU 431) were present in the PGM dataset only. Confidence values for each branch were calculated using FastTree’s Shimodaira-Hasegawa test and are expressed as percentages. The phylogeny is rooted with the nirS sequence of Candidatus Kuenenia stuttgartensis, an anammox bacterium.
Fig. S4. Mismatches between the nirS-q-R PCR primer and clone library sequences not found in the PGM dataset.
A. Number of mismatches, per sequence, between the nirS-q-R primer and clone library sequences that mapped to OTU representative sequences in the PGM dataset. Mapping was done at 88% sequence identity.
B. Number of mismatches, per sequence, between the nirS-q-R primer and clone library sequences that did not map to OTUs in the PGM dataset.
C. Locations of mismatches observed between the nirS-q-R primer and the clone library sequences. The sequence of the primer is shown above the alignment, in grey. Locations of mismatches in the sequences are annotated with red rectangles above the mismatching nucleotides. The amino acid sequence of the enzyme is shown below the nucleotide sequence. Each nucleotide sequence shown represents one set of mismatches observed in the dataset; each set was present at a different abundance. Alignment produced using Geneious v. 5.4.6.
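Fig. S4 tallies mismatches between the nirS-q-R primer and the clone sequences. The counting step can be sketched as below, assuming the clone sequences have already been aligned so that the primer-binding site is available as a string of the same length as the primer. Degenerate primer positions are matched using standard IUPAC nucleotide codes. Assuming from its name that nirS-q-R is a reverse primer, a sense-strand site would be compared against the primer's reverse complement. The primer strings used in the checks are made up; the actual nirS-q-R sequence is not given in this section.

```python
# Standard IUPAC nucleotide codes: which bases each primer symbol accepts.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
# Complement table that also maps degenerate codes (e.g. R <-> Y, K <-> M).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Reverse complement of a (possibly degenerate) nucleotide string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def mismatch_positions(primer, site):
    """0-based positions where `site` is not accepted by the primer base.

    primer and site must be aligned and of equal length; gaps or other
    symbols in `site` are counted as mismatches.
    """
    if len(primer) != len(site):
        raise ValueError("primer and site must have equal length")
    return [i for i, (p, s) in enumerate(zip(primer.upper(), site.upper()))
            if s not in IUPAC[p]]
```

The number of mismatches per sequence, as plotted in panels A and B, is then simply `len(mismatch_positions(...))`.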
Fig. S5. Results of Random Forests models to predict environmental characteristics of samples based on community composition in the clone library dataset. Each plot shows the result of 10 model-test runs. In each run, 15 randomly chosen samples were used to build a model and the remaining five samples were used to test the model.
A. Predictions of sample site by the classification model. Each bar represents a particular site (actual value); the colour of the bar indicates the relative frequency with which samples from that site were predicted by the model to have originated from each of the five possible sites (predicted values).
B–D. Predictions by the salinity, nitrogen, and temperature regression models, respectively. The x-axis represents actual values, and the y-axis represents predicted values.
In all plots, the site of origin of the sample is indicated by colour as given in the legend at lower right.
Table S1. Accuracy of Random Forest classification and regression models using nirS community composition to predict sample properties, in either the clone library dataset or the PGM dataset. Each model was built using either all 20 samples and 237 OTUs of the clone library dataset, or all 35 samples and 1776 OTUs of the PGM dataset. Values shown are the means from 100 models built from each dataset. Top: Out-Of-Bag (OOB) classification error for each category in the site classification model, as a fraction of 1, and the error for the whole model, averaged across all sites. An OOB rate of zero means that no samples from that site were misclassified; a rate of one means that all samples from that site were misclassified. Bottom: pseudo-$r^2$ values from each of three independent models. Pseudo-$r^2$ is calculated as $1 - \mathrm{MSE}/\mathrm{var}(y)$, where MSE is the mean squared OOB error and var(y) is the variance of the dependent variable.
Table S2. Linear least-squares models explaining values of four alpha-diversity metrics in the full dataset (not resampled, with an uneven number of reads per sample) in relation to (i) the diversity metric in the resampled dataset (with 13 000 reads per sample), to assess the portion of diversity that is independent of sequencing depth, and (ii) the number of reads per sample. Models were additive with no interactions. For all diversity metrics, values in the resampled dataset and those in the original dataset corresponded very closely. However, for all diversity metrics except Simpson, sample size also mattered: samples with greater numbers of reads had higher diversity.
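The Table S2 models regress each diversity metric from the full dataset on two predictors, the same metric after resampling and the read count, with no interaction term. In R this would be a single `lm()` call; the stdlib-only sketch below solves the equivalent ordinary least-squares problem through the 3 × 3 normal equations. The function name and the synthetic inputs in the check are hypothetical.

```python
def fit_additive(y, x1, x2):
    """Ordinary least squares for y = b0 + b1*x1 + b2*x2 (no interaction).

    Returns [b0, b1, b2] by solving the normal equations X'X b = X'y.
    """
    rows = [(1.0, a, b) for a, b in zip(x1, x2)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(3)]
    # Gaussian elimination with partial pivoting on the augmented matrix.
    m = [xtx[i] + [xty[i]] for i in range(3)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda k: abs(m[k][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    # Back substitution, last coefficient first.
    b = [0.0] * 3
    for i in (2, 1, 0):
        b[i] = (m[i][3] - sum(m[i][j] * b[j] for j in range(i + 1, 3))) / m[i][i]
    return b
```

With x1 as the resampled metric and x2 as reads per sample, a coefficient b1 near 1 would reflect the close correspondence between resampled and original values, and a positive b2 would reflect the caption's observation that samples with more reads had higher diversity.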
... Additionally, chemolithotrophic denitrification may be responsible for N 2 production in OMZs where hydrogen sulfide (H 2 S) accumulates (Galán et al., 2014). The input and type of organic matter and salinity are among the key factors controlling denitrification rates and denitrifier community distribution in estuaries (e.g., Mosier and Francis, 2010;Eyre et al., 2013;Francis et al., 2013;Zhang Y. et al., 2014;Lee and Francis, 2017). Since NirK and NosZ require Cu, it could represent a regulating factor in denitrification and the production of N 2 O in marine environments (Granger and Ward, 2003). ...
... Therefore, most studies have documented the diversity and activity of denitrifiers in estuary sediments (e.g., Abell et al., 2010;Magalhães et al., 2011;Wang et al., 2014;, with a few of them in estuary waters (e.g., Santoro et al., 2006;Zhang Y. et al., 2014;. Denitrifiers often change along the estuarine salinity gradient, with distinct communities in fresh and marine regions (e.g., Abell et al., 2013;Francis et al., 2013;Lee and Francis, 2017). For instance, in the San Francisco Bay estuary, the abundance of nirK is higher in the riverine zone, whereas nirS is more abundant in marine zones (Mosier and Francis, 2010). ...
Article
Full-text available
Nitrogen (N) is a key element for life in the oceans. It controls primary productivity in many parts of the global ocean, consequently playing a crucial role in the uptake of atmospheric carbon dioxide. The marine N cycle is driven by multiple biogeochemical transformations mediated by microorganisms, including processes contributing to the marine fixed N pool (N2 fixation) and retained N pool (nitrification, assimilation, and dissimilatory nitrate reduction to ammonia), as well as processes contributing to the fixed N loss (denitrification, anaerobic ammonium oxidation and nitrite-dependent anaerobic methane oxidation). The N cycle maintains the functioning of marine ecosystems and will be a crucial component in how the ocean responds to global environmental change. In this review, we summarize the current understanding of the marine microbial N cycle, the ecology and distribution of the main functional players involved, and the main impacts of anthropogenic activities on the marine N cycle.
... A 12-bp barcode sequence unique for each sample was added between the sequencing adaptor and the reverse primer to differentiate samples. PCR was performed in triplicate following the protocols as described previously (Throbäck et al. 2004, Lee andFrancis 2017) (detailed PCR conditions are provided in Table S2). The PCR products of the nirS, nirK and nosZ-I genes were subsequently purified with DNA Gel Extraction Kit (Axygen, USA), then pooled in equimolar concentrations for paired-ends sequencing (2×250 bp) on an Illumina Hiseq 2500 platform. ...
... In addition, the absence of nirK-, nosZ-I-and nosZ-II-like denitrifying bacterial communities in the studied CKL sediment (salinity as high as 340 g/L) may indicate a severe environmental filtering (e.g. high salinity), because the primer pairs employed in this study can cover these functional genes in high-salinity environments with a salinity range of 0-130 g/L (Desnues et al. 2007, Dini-Andreote et al. 2016, Lee and Francis 2017, Miao et al. 2020. The difference in the sensitivity of nirS-/nirK-like nitrite-reducing bacteria and nosZ-I-/nosZ-II-like N 2 O-reducing bacteria to salinity are discussed in the following section. ...
Article
The distribution of nitrite- and N2O-reducing bacteria is key to potential N2O emission from lakes. However, such information in highland saline lakes remains unknown. Here, we investigated the abundance and community composition of nitrite- and N2O-reducing bacteria in the sediments of six saline lakes on the Qing-Tibetan Plateau. These studied lakes covered a wide range of salinity (1.0-340.0 g/L). Results showed that in the studied saline lake sediments nitrite-reducing bacteria were significantly more abundant than N2O-reducing bacteria, and their abundances ranged 7.14×103-8.26×108 and 1.18×106-6.51×107 copies per gram sediment (dry weight), respectively. Nitrite-reducing bacteria were mainly affiliated withα-, β-, and γ- Proteobacteria, with β- and α-Proteobacteria being dominant in low- and high-salinity lakes, respectively; N2O-reducing bacterial communities mainly consisted of Proteobacteria (α-, β-, γ-, and δ-subgroups), Bacteroidetes, Verrucomicrobia, Actinobacteria, Chloroflexi, Gemmatimonadetes and Balneolaeota, with Proteobacteria and Bacteroidetes/Verrucomicrobia dominating in low- and high-salinity lakes, respectively. The nitrite- and N2O-reducing bacterial communities showed distinct responses to ecological factors, and they were mainly regulated by mineralogical and physicochemical factors, respectively. In response to salinity change, the community composition of nitrite-reducing bacteria was more stable than that of N2O-reducing bacteria. These findings suggest that nitrite- and N2O-reducing bacteria may prefer niches with different salinity.
... Quantification real-time PCR (qPCR) of denitrifying nirS gene. We assessed the denitrification potential in the coral holobiont via the relative quantification of nirS gene, which catalyzes the conversion of nitrite to nitric oxide in the denitrification cascade (85) and has been previously used to determine denitrifier abundance and diversity (27,83,86). The nirS gene was amplified using the same primer pair nirS-1F, qR (83) previously used for in silico PCR as outlined above, and validated with Sanger sequencing (StarSEQ, Mainz, Germany). ...
Article
Full-text available
Mutualistic nutrient cycling in the coral-algae symbiosis depends on limited nitrogen (N) availability for algal symbionts. Denitrifying prokaryotes capable of reducing nitrate or nitrite to dinitrogen could thus support coral holobiont functioning by limiting N availability. Octocorals show some of the highest denitrification rates among reef organisms, however little is known about the community structures of associated denitrifiers and their response to environmental fluctuations. Combining 16S rRNA gene amplicon sequencing with nirS in-silico PCR and quantitative PCR, we found differences in bacterial community dynamics between two octocorals exposed to excess dissolved organic carbon (DOC) and concomitant warming. While bacterial communities of the gorgonian Pinnigorgia flava remained largely unaffected by DOC and warming, the soft coral Xenia umbellata exhibited a pronounced shift towards Alphaproteobacteria dominance under excess DOC. Likewise, the relative abundance of denitrifiers was not altered in P. flava , but decreased by one order of magnitude in X. umbellata under excess DOC likely due to decreased proportions of Ruegeria spp. Given that holobiont C:N ratios remained stable in P. flava but showed a pronounced increase with excess DOC in X. umbellata host, our results suggest that microbial community dynamics may reflect the nutritional status of the holobiont. Hence, denitrifier abundance may be directly linked to N availability. This suggests a passive regulation of N cycling microbes, which could help stabilize nutrient limitation in the coral-algal symbiosis and thereby support holobiont functioning in a changing environment. Importance Octocorals are important members of reef-associated benthic communities that can rapidly replace scleractinian corals as the dominant ecosystem engineers on degraded reefs. 
Considering the substantial change in the (a)biotic environment that is commonly driving reef degradation, maintaining a dynamic and metabolically diverse microbial community might contribute to octocoral acclimatization and ecological adaptation. Nitrogen (N) cycling microbes, in particular denitrifying prokaryotes, may support holobiont functioning by limiting internal N availability, but little is known about the identity and (a)biotic drivers of octocoral-associated denitrifiers. Here, we show contrasting dynamics of bacterial communities associated with two common octocoral species, the soft coral Xenia umbellata and the gorgonian Pinnigorgia flava after a six-week exposure to excess dissolved organic carbon (DOC) under concomitant warming conditions. The specific responses of denitrifier communities associated with the two octocoral species aligned with the nutritional status of holobiont members. This suggests a passive regulation of this microbial trait based on N availability in the coral holobiont.
... were optimal for species classification (Online Resource Fig S1). These values seems relatively high, compared to those ranging 85-90% in other amplicon sequencing studies targeting cytochrome-cd 1 nitrite reductase (Lee and Francis 2017), dissimilatory sulfite reductase (Pelikan et al. 2016) and ammonia monooxygenase (Pester et al. 2012). This might imply intraspecies conservation of CODHech genes within Firmicutes, while the CODHech genes are often horizontally transferred among different or same species (Techtmann et al. 2012;Sant'Anna et al. 2015). ...
Article
Full-text available
The microbial H2-producing (hydrogenogenic) carbon monoxide (CO)-oxidizing activity by the membrane-associated CO dehydrogenase (CODH)/energy-converting hydrogenase (ECH) complex is an important metabolic process in the microbial community. However, the studies on hydrogenogenic carboxydotrophs had to rely on inherently cultivation and isolation methods due to their rare abundance, which was a bottleneck in ecological study. Here, we provided gene-targeted sequencing method for the diversity estimation of thermophilic hydrogenogenic carboxydotrophs. We designed six new degenerate primer pairs which effectively amplified the coding regions of CODH genes forming gene clusters with ECH genes (CODHech genes) in Firmicutes which includes major thermophilic hydrogenogenic carboxydotrophs in terrestrial thermal habitats. Amplicon sequencing by these primers using DNAs from terrestrial hydrothermal sediments and CO-gas-incubated samples specifically detected multiple CODH genes which were identical or phylogenetically related to the CODHech genes in Firmictes. Furthermore, we found that phylogenetically distinct CODHech genes were enriched in CO-gas-incubated samples, suggesting that our primers detected uncultured hydrogenogenic carboxydotrophs as well. The new CODH-targeted primers provided us with a fine-grained (~ 97.9% in nucleotide sequence identity) diversity analysis of thermophilic hydrogenogenic carboxydotrophs by amplicon sequencing and will bolster the ecological study of these microorganisms.
... To remove heterogeneity in sequence depth, the samples were rarefied to the same sequence depth based on the least number of sequences for further analysis of alpha diversity (Chao 1, observed species, and Shannon). Beta diversity was calculated based on Euclidian distances (Lee and Francis, 2017) and visualized by principal co-ordinates analysis (PCoA). ...
Article
Dam construction has significantly altered riparian hydrological regime and environmental conditions in the reservoir region, yet knowledge concerning how bacterial community and N-cycling genes respond to these changes remains limited. In this study, we investigated the bacterial community composition, network structure and N-cycling genes in the water level fluctuation zones (WLFZs) of the Three Gorges Reservoir (TGR). Here, samples collected from five different water levels were divided into three groups: waterward sediments, interface sediments, and landward soils. Our results show that higher contents of NO2⁻-N, SOC, DOC, NH4⁺-N, and TP were characterized in waterward and interface sediments whereas higher NO3⁻-N content was observed in landward soils. The α-diversity of bacterial community decreased gradually from waterward sediments to landward soils. Compared with waterward sediments and landward soils, the interface sediments showed a unique bacterial community pattern with diverse primary producers as well as N-cycling microbes. The interface sediments also had a much more complex co-occurrence network and a higher possible community stability. Among all of N-cycling genes, higher abundances of nrfA and AOA amoA genes were observed in interface sediments. The dissimilarity in bacterial community composition and N-cycling gene abundance was mainly driven by water-level. Moreover, random forest model revealed that AOA amoA and nirS genes were the most sensitive indicators in response to water level fluctuations. Overall, this study suggests distinct abundance, diversity, and network structure of microbes in riparian sediments and soils across the gradient of water levels and enhances our understanding with respect to comprehensive effects of dam construction on nitrogen cycle.
... A total of 234 nirS sequences were obtained from six water samples in the fall season (Fig. S3) and were assigned to 135 OTUs (Fig. 2a) based on an 88% nucleotide similarity cutoff (Lee and Francis, 2017). Phylogenetic analysis placed the 135 OTUs in 12 clusters, of which cluster 7 was the most abundant (20.8% of nirS sequences on average) (Fig. 2b). ...
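Assigning sequences to OTUs at a fixed similarity cutoff can be illustrated with greedy centroid clustering on pre-aligned sequences. This technique was chosen here for brevity; the excerpt does not state which clustering algorithm was used. The helpers `identity` and `greedy_otus` are hypothetical. Identity is computed as the fraction of matching alignment columns, ignoring columns where both sequences have gaps.

```python
def identity(a, b):
    """Fraction of identical columns between two aligned, equal-length sequences.
    Columns where both sequences have a gap ('-') are ignored."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    cols = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    if not cols:
        raise ValueError("no comparable columns")
    matches = sum(1 for x, y in cols if x == y)
    return matches / len(cols)


def greedy_otus(seqs, cutoff=0.88):
    """Assign each sequence to the first centroid within `cutoff` identity,
    otherwise make it the centroid of a new OTU."""
    centroids = []   # index of the representative sequence of each OTU
    assignment = []  # OTU index for each input sequence
    for i, s in enumerate(seqs):
        for otu, c in enumerate(centroids):
            if identity(s, seqs[c]) >= cutoff:
                assignment.append(otu)
                break
        else:
            centroids.append(i)
            assignment.append(len(centroids) - 1)
    return assignment, centroids


# Hypothetical 10-column alignment: 9/10 identical passes 0.88, 8/10 does not.
assignment, centroids = greedy_otus(
    ["AAAAAAAAAA", "AAAAAAAAAT", "AAAAAAAATT", "CCCCCCCCCC"])
```

Each sequence is compared only with existing centroids, so the result depends on input order. In this example, the third sequence is 90% identical to the second but still forms its own OTU. Many greedy tools sort sequences by abundance or length first to reduce this effect.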
... Alpha diversity measures of community richness (Chao1) and diversity (Shannon and Simpson) were calculated with Mothur software (Schloss et al., 2009). Beta diversity was calculated from Euclidean distances (Lee and Francis, 2017) and visualized by principal component analysis (PCA) using the Scikit-Learn package for Python (Pedregosa et al., 2011). The PCA was used to assess the dissimilarity of microbial communities among samples. ...
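The three alpha-diversity indices named above can be computed directly from an OTU count vector. This sketch assumes the following variants:

- the bias-corrected Chao1 form, S_obs + F1(F1 − 1)/(2(F2 + 1)), where F1 and F2 are the numbers of singletons and doubletons;
- the natural-log Shannon index;
- the without-replacement Simpson index, Σ nᵢ(nᵢ − 1)/(N(N − 1)).

Software packages differ in which variants they report, so these functions are not claimed to reproduce Mothur's output exactly.

```python
import math


def chao1(counts):
    """Bias-corrected Chao1 richness estimate."""
    counts = [c for c in counts if c > 0]
    s_obs = len(counts)
    f1 = sum(1 for c in counts if c == 1)  # singletons
    f2 = sum(1 for c in counts if c == 2)  # doubletons
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


def shannon(counts):
    """Shannon diversity H' = -sum(p * ln p)."""
    n = sum(counts)
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)


def simpson(counts):
    """Probability that two reads drawn without replacement share an OTU
    (lower values mean higher diversity)."""
    n = sum(counts)
    return sum(c * (c - 1) for c in counts) / (n * (n - 1))


# Hypothetical OTU counts for one sample.
counts = [10, 5, 2, 1, 1, 1]
```

For these counts, there are 6 observed OTUs, 3 singletons, and 1 doubleton. Chao1 is therefore 6 + 3·2/(2·2) = 7.5.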
Article
The relationships between denitrifying microbial communities and their controlling factors are largely unknown in eutrophic estuarine sediments. This work showed that, in the Liaohe Estuary, nirS-type denitrifiers were consistently more abundant and diverse than nirK- and nosZ-type denitrifiers, which suggests that they play an important role in nitrogen removal, particularly around nearshore stations. The dominant genera of nirK-, nirS-, and nosZ-type denitrifiers were Sinorhizobium, Pseudomonas, and Azospirillum. Salinity, nitrogen levels, and sediment grain size were the main factors affecting the denitrification process in this eutrophic estuary. These results provide more information about the dynamics of denitrifying microbiota in marine sediments. Summary: The relationships between the composition and abundance of denitrifying microbial communities in estuarine sediments and their controlling factors are largely unknown, especially in eutrophic estuaries. In this study, the nitrite reductase genes (nirS, nirK) and the nitrous oxide reductase gene (nosZ) were used as molecular markers. qPCR and Illumina MiSeq high-throughput sequencing were used to study the relative abundance of key functional microbial groups and the major environmental factors influencing them in the Liaohe Estuary. The results showed that nirS-type denitrifiers were consistently more abundant and diverse than nirK- and nosZ-type denitrifiers, which suggests that nirS-type denitrifiers play an important role in nitrogen removal in the Liaohe Estuary, particularly around nearshore stations. The dominant genera of bacteria containing nirK, nirS, and nosZ genes were Sinorhizobium, Pseudomonas, and Azospirillum. Of these, Sinorhizobium and Azospirillum are nitrogen-fixing bacteria, while Pseudomonas was classified as a denitrifying genus. The differences among the dominant denitrifiers indicate that sedimentary denitrification is accomplished by the cooperation of different denitrifying species rather than by a single species.
Salinity, NH4⁺, NO3⁻, NO2⁻, and sediment grain size were identified as determinants of the denitrification process in the estuarine sediment. Overall, the results of this study suggest that a comprehensive analysis of different denitrifying functional genes may provide more information about the dynamics of denitrifying microbiota in marine sediments.
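Absolute gene abundances from qPCR, as used above for nirS, nirK, and nosZ, are usually derived from a standard curve of Ct against log10 copy number. This minimal sketch assumes a dilution series of standards with known copy numbers (the values below are invented). It proceeds in three steps:

1. Fit the line Ct = slope · log10(copies) + intercept.
2. Convert the slope to amplification efficiency with E = 10^(−1/slope) − 1.
3. Invert the line to estimate copy numbers for unknown samples.

```python
def fit_standard_curve(log10_copies, ct):
    """Ordinary least-squares fit of Ct = slope * log10(copies) + intercept."""
    n = len(ct)
    mx = sum(log10_copies) / n
    my = sum(ct) / n
    sxx = sum((x - mx) ** 2 for x in log10_copies)
    sxy = sum((x - mx) * (y - my) for x, y in zip(log10_copies, ct))
    slope = sxy / sxx
    return slope, my - slope * mx


def efficiency(slope):
    """Amplification efficiency; 1.0 means perfect doubling every cycle."""
    return 10 ** (-1 / slope) - 1


def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate copies in an unknown reaction."""
    return 10 ** ((ct - intercept) / slope)


# Hypothetical standards spanning 10^3 to 10^7 copies per reaction.
logs = [3, 4, 5, 6, 7]
cts = [40 - 3.3 * x for x in logs]
slope, intercept = fit_standard_curve(logs, cts)
unknown = copies_from_ct(23.5, slope, intercept)
```

A slope near −3.32 corresponds to 100% efficiency. Converting copies per reaction to copies per gram of sediment would also require the DNA extraction volume and sample mass, which are not modeled here.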
Article
Amplicon sequencing of functional genes is a powerful technique for exploring the diversity and abundance of microbes involved in biogeochemical processes. One such key process, denitrification, is of particular importance because it can transform nitrate (NO3⁻) into N2 gas that is released to the atmosphere. In nitrogen-limited alpine wetlands, assessing bacterial denitrification under the stress of wetland desertification is fundamental to understanding nutrient cycling, especially nitrogen cycling. It is therefore imperative for maintaining healthy alpine wetland ecosystems. We applied amplicon sequencing of the nirS gene to analyze the response of the denitrifying bacterial community to alpine wetland desertification in Zoige, China. Raw reads were quality-filtered and translated with frameshift correction. A total of 95,316 nirS gene sequences were then used for rarefaction analysis, and the 1011 OTUs detected were used in downstream analysis. Compared with the pristine swamp soil, edaphic parameters were significantly lower in the moderately degraded meadow soil and the severely degraded sandy soil. These parameters included water content, organic carbon, total nitrogen, total phosphorus, available nitrogen, available phosphorus, and potential denitrification rate. The diversity of soil nirS-type denitrifying bacterial communities increased along the Zoige wetland desertification gradient, and Proteobacteria and Chloroflexi were the dominant denitrifying bacterial phyla. The genera Cupriavidus (formerly Wautersia), Azoarcus, Azospira, and Thiothrix, as well as the order Rhizobiales, were significantly (P<0.05) depleted along the wetland desertification succession. Soil available phosphorus was the key determinant of the composition of the nirS-containing denitrifying bacterial communities.
The proportion of depleted taxa increased along the desertification of the Zoige wetland. This suggests that wetland desertification created specific physicochemical conditions that reduced both the microhabitats available to bacterial denitrifiers and the genetic diversity related to denitrification.
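Rarefaction analysis, as applied above to the 95,316 nirS sequences, can be computed analytically rather than by repeated random subsampling. This sketch uses Hurlbert's expected-richness formula:

E[Sₙ] = Σᵢ [1 − C(N − Nᵢ, n)/C(N, n)]

Here N is the library size and Nᵢ is the count of OTU i. The helper names and toy counts are hypothetical.

```python
from math import comb


def expected_richness(counts, n):
    """Hurlbert's expected number of OTUs in a random subsample of n reads."""
    total = sum(counts)
    if not 0 < n <= total:
        raise ValueError("n must be between 1 and the library size")
    denom = comb(total, n)
    # comb(a, b) is 0 when b > a, so an OTU that must appear contributes exactly 1.
    return sum(1 - comb(total - c, n) / denom for c in counts if c > 0)


def rarefaction_curve(counts, step=1):
    """Expected richness at subsample sizes step, 2*step, ... up to the library size."""
    total = sum(counts)
    return [expected_richness(counts, n) for n in range(step, total + 1, step)]


# Hypothetical library of 10 reads spread over 4 OTUs.
counts = [5, 3, 1, 1]
curve = rarefaction_curve(counts)
```

A curve that flattens before reaching the full library size suggests that further sequencing would add few new OTUs. For libraries of tens of thousands of reads, the exact binomial coefficients become very large, and a log-gamma formulation (for example, via `math.lgamma`) would be faster.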
Article
Extracellular DNA from dead microorganisms can persist in soil for weeks to years (1-3). Although it is implicitly assumed that the microbial DNA recovered from soil predominantly represents intact cells, it is unclear how extracellular DNA affects molecular analyses of microbial diversity. We examined a wide range of soils using viability PCR based on the photoreactive DNA-intercalating dye propidium monoazide (4). We found that, on average, 40% of both prokaryotic and fungal DNA was extracellular or came from cells that were no longer intact. Extracellular DNA inflated the observed prokaryotic and fungal richness by up to 55% and caused significant misestimation of taxon relative abundances, including those of taxa integral to key ecosystem processes. Extracellular DNA was not present in measurable amounts in every soil. It was more likely to be present in soils with low exchangeable base cation concentrations, and the effect of its removal on microbial community structure was more pronounced in high-pH soils. Together, these findings imply that this 'relic DNA' remaining in soil after cell death can obscure treatment effects, spatiotemporal patterns, and relationships between microbial taxa and environmental conditions.
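The two quantities reported above reduce to simple ratios. This sketch assumes that the relic DNA fraction is estimated by comparing qPCR copy numbers from untreated and propidium monoazide (PMA)-treated aliquots of the same sample. It also assumes that richness inflation is expressed relative to the PMA-treated (intact-cell) richness. The function names and input values are illustrative only.

```python
def relic_fraction(copies_untreated, copies_pma):
    """Fraction of DNA excluded by PMA, i.e. not protected by an intact cell membrane.
    Clamped at 0 because qPCR noise can make the PMA-treated value slightly larger."""
    if copies_untreated <= 0:
        raise ValueError("untreated copy number must be positive")
    return max(0.0, 1 - copies_pma / copies_untreated)


def richness_inflation(richness_total, richness_intact):
    """Percent by which total-DNA richness exceeds intact-cell (PMA-treated) richness."""
    return 100 * (richness_total - richness_intact) / richness_intact


# Hypothetical values: 1e8 copies untreated vs 6e7 after PMA; 1550 vs 1000 OTUs.
fraction = relic_fraction(1e8, 6e7)
inflation = richness_inflation(1550, 1000)
```

With these invented inputs, the relic fraction is 0.4 and the richness inflation is 55%, matching the magnitudes quoted in the abstract.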
Article
Denitrifying bacteria play a critical role in the estuarine nitrogen cycle. Through the transformation of nitrate into nitrogen gas, these organisms contribute to the loss of bioavailable (i.e., fixed) nitrogen from low-oxygen environments such as estuary sediments. Denitrifiers have been shown to vary in abundance and diversity across the spatial environmental gradients that characterize estuaries, such as salinity and nitrogen availability; however, little is known about how their communities change in response to temporal changes in those environmental properties. Here, we present a 1-year survey of sediment denitrifier communities along the estuarine salinity gradient of San Francisco Bay. We used quantitative PCR and sequencing of functional genes coding for a key denitrifying enzyme, dissimilatory nitrite reductase, to compare two groups of denitrifiers: those with nirK (encoding copper-dependent nitrite reductase) and those with nirS (encoding the cytochrome-cd1-dependent variant). We found that nirS was consistently more abundant and more diverse than nirK in all parts of the estuary. The abundances of the two genes were tightly linked across space but differed temporally, with nirK peaking when temperature was low and nirS peaking when nitrate was high. Likewise, the diversity and composition of nirK- versus nirS-type communities differed in their responses to seasonal variations, though both were strongly determined by site. Furthermore, our sequence libraries detected deeply branching clades with no cultured isolates, evidence of enormous diversity within the denitrifiers that remains to be explored.
Article
Water quality is an emergent property of a complex system composed of interacting microbial populations and introduced microbial and chemical contaminants. Studies leveraging next-generation sequencing (NGS) technologies are providing new insights into the ecology of microbially mediated processes that influence freshwater quality, such as algal blooms, contaminant biodegradation, and pathogen dissemination. In addition, sequencing methods targeting small subunit (SSU) rRNA hypervariable regions have allowed identification of signature microbial species that serve as bioindicators of sewage contamination in these environments. Beyond amplicon sequencing, metagenomic and metatranscriptomic analyses of microbial communities in freshwater environments reveal the genetic capabilities and interplay of waterborne microorganisms, shedding light on the mechanisms of production and biodegradation of toxins and other contaminants. This review discusses the challenges and benefits of applying NGS-based methods to water quality research and assessment. First, we consider the suitability and biases inherent in applying NGS as a screening tool for assessing biological risks, and we discuss the potential and limitations of interpreting NGS data directly and quantitatively. Second, we examine case studies from the recent literature in which NGS-based methods have been applied to topics in water quality assessment. These topics include the development of bioindicators for sewage pollution and microbial source tracking, the distribution of toxin and antibiotic resistance genes in water samples, and the mechanisms of biodegradation of harmful pollutants that threaten water quality. Finally, we provide a short review of emerging NGS platforms and their potential applications to the next generation of water quality assessment tools.
Article
Understanding rates of nitrogen cycling in estuaries is crucial for understanding their productivity and resilience to eutrophication. Nitrification, the microbial oxidation of ammonia to nitrite and nitrate, links reduced and oxidized forms of inorganic nitrogen and is therefore an important step of the nitrogen cycle. However, rates of nitrification in estuary waters are poorly characterized. In fall and winter of 2011–2012, we measured nitrification rates throughout the water column of all major regions of San Francisco Bay, a large, turbid, nutrient-rich estuary on the west coast of North America. Nitrification rates were highest in regions furthest from the ocean, including many samples with rates higher than those typically measured in the sea. In bottom waters, nitrification rates were commonly at least twice the magnitude of surface rates. Strong positive correlations were found between nitrification and both suspended particulate matter and ammonium concentration. Our results are consistent with previous studies documenting high nitrification rates in brackish, turbid regions of other estuaries, many of which also showed correlations with suspended sediment and ammonium concentrations. Overall, nitrification in estuary waters appears to play a significant role in the estuarine nitrogen cycle, though the maximum rate of nitrification can differ dramatically between estuaries.
Article
The conversion of nitrite to nitric oxide in the denitrification pathway is catalyzed by at least two structurally dissimilar nitrite reductases, NirS and NirK. Although the two enzymes are functionally equivalent, a genome encoding both has yet to be found. This exclusivity raises two questions: whether denitrifiers carrying nirS or nirK are ecologically equivalent, and how different ecological and evolutionary factors influence the community assembly of nirS and nirK denitrifiers. Using phylogeny-based methods for analyzing community structure, we analyzed nirS and nirK data sets compiled from sequence repositories. Global patterns of phylogenetic community structure were determined using UniFrac, whereas community assembly processes were inferred using several community relatedness metrics. Similarities between globally distributed communities for both genes corresponded to similarities in habitat salinity. The majority of communities for both genes were phylogenetically clustered; however, nirK marine communities were more phylogenetically overdispersed than nirK soil communities or nirS communities. A more in-depth analysis was performed using three case studies in which nirS and nirK community relatedness within sites could be compared along environmental gradients. From these studies, we observed that nirS communities respond to environmental gradients differently than nirK communities do. Although it is difficult to attribute nonrandom patterns of phylogenetic diversity to specific niche-based or neutral assembly processes, our results indicate that coexisting nirS and nirK denitrifier communities are not governed by the same community assembly rules in different environments.
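One widely used community relatedness metric of the kind mentioned above is the net relatedness index (NRI). NRI standardizes the observed mean pairwise phylogenetic distance (MPD) of a community against a null model. This minimal sketch assumes a precomputed pairwise distance matrix and a null model that draws random sets of taxa, of the same size as the community, from the full pool. It is not necessarily the exact metric or null model used in the study. Positive NRI indicates phylogenetic clustering, and negative NRI indicates overdispersion.

```python
import random
import statistics
from itertools import combinations


def mpd(dist, taxa):
    """Mean pairwise phylogenetic distance among the taxa in one community."""
    pairs = list(combinations(taxa, 2))
    return sum(dist[a][b] for a, b in pairs) / len(pairs)


def nri(dist, community, n_null=999, seed=1):
    """Net relatedness index: -(MPD_obs - mean(MPD_null)) / sd(MPD_null)."""
    rng = random.Random(seed)
    pool = list(range(len(dist)))
    observed = mpd(dist, community)
    # Null model (an assumption): random taxa sets of the same size from the whole pool.
    null = [mpd(dist, rng.sample(pool, len(community))) for _ in range(n_null)]
    return -(observed - statistics.mean(null)) / statistics.stdev(null)


# Hypothetical tree with two clades of four taxa: distance 1 within a clade, 10 between.
clade = [0, 0, 0, 0, 1, 1, 1, 1]
dist = [[0 if i == j else (1 if clade[i] == clade[j] else 10) for j in range(8)]
        for i in range(8)]
clustered = nri(dist, [0, 1, 2])   # all from one clade
overdispersed = nri(dist, [0, 4])  # one taxon from each clade
```

In practice, the distances would be patristic distances taken from a gene phylogeny. Other null models, such as shuffling taxon labels on the tree or constraining draws by abundance, can change the result.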