Monitoring the microbiome for food safety and quality using deep shotgun sequencing
Kristen L. Beck, Niina Haiminen, David Chambliss, Stefan Edlund, Mark Kunitomi, B. Carol Huang, Nguyet Kong, Balasubramanian Ganesan, Robert Baker, Peter Markwell, Ban Kawas, Matthew Davis, Robert J. Prill, Harsha Krishnareddy, Ed Seabolt, Carl H. Marlowe, Sophie Pierre, André Quintanar, Laxmi Parida, Geraud Dubois, James Kaufman and Bart C. Weimer
In this work, we hypothesized that shifts in the food microbiome can be used as an indicator of unexpected contaminants or environmental changes. To test this hypothesis, we sequenced the total RNA of 31 high protein powder (HPP) samples of poultry meal pet food ingredients. We developed a microbiome analysis pipeline employing a key eukaryotic matrix filtering step that improved microbe detection specificity to >99.96% during in silico validation. The pipeline identified 119 microbial genera per HPP sample on average with 65 genera present in all samples. The most abundant of these were Bacteroides, Clostridium, Lactococcus, Aeromonas, and Citrobacter. We also observed shifts in the microbial community corresponding to ingredient composition differences. When comparing culture-based results for Salmonella with total RNA sequencing, we found that Salmonella growth did not correlate with multiple sequence analyses. We conclude that microbiome sequencing is useful to characterize complex food microbial communities, while additional work is required for predicting specific species' viability from total RNA sequencing.
npj Science of Food (2021) 5:3
Sequencing the microbiome of food may reveal characteristics of the associated microbial content that culturing or targeted whole-genome sequencing (WGS) alone cannot. However, to meet the various needs of food safety and quality, next-generation sequencing (NGS) and analysis techniques require additional development with specific consideration for accuracy, speed, and applicability across the supply chain. Microbial communities and their characteristics have been studied in relation to flavor and quality in fermented foods, agricultural processes in grape and apple, and manufacturing processes and production batches in Cheddar cheese. However, the advantage of using the microbiome specifically for food safety and quality has yet to be demonstrated.
Currently, food safety regulatory agencies including the Food and Drug Administration (FDA), Centers for Disease Control and Prevention (CDC), United States Department of Agriculture (USDA), and European Food Safety Authority (EFSA) are converging on the use of WGS for pathogen detection and outbreak investigation. Large-scale WGS of food-associated bacteria was first initiated via the 100 K Pathogen Genome Project with the goal of expanding the diversity of bacterial reference genomes, a crucial need for foodborne illness outbreak investigation, traceability, and microbiome studies. However, since WGS relies on culturing a microbial isolate prior to sequencing, there are inherent biases and limitations in its ability to describe the microorganisms and their interactions in a food sample. Such information would be very valuable for food safety and quality.
High-throughput sequencing of total DNA and total RNA are promising approaches to characterize microbial niches in their native state without introducing bias due to culturing. In addition, total RNA sequencing has the potential to provide evidence of live and biologically active components of the sample. It also provides accurate microbial naming, relative microbial abundance, and better reproducibility than total DNA or amplicon sequencing. Total RNA sequencing minimizes the PCR amplification bias that occurs in single-gene amplicon sequencing and overcomes the decreased detection sensitivity from using DNA sequencing in metagenomics. Total RNA metatranscriptome sequencing, however, is yet to be examined in raw food ingredients as a method to provide a robust characterization of the microbial communities and the interacting population dynamics.
From a single sequenced food microbiome, numerous dimensions of the sample can be characterized that may yield important indicators of safety and quality. Using total DNA or RNA, evidence for the eukaryotic food matrix can be examined. In Haiminen et al., we quantitatively demonstrated the utility of metagenome sequencing to authenticate the composition of complex food matrices. In addition, from total DNA or RNA, one can observe signatures from commensal microbes, pathogenic microbes, and genetic information for functional potential (from DNA) or biologically active function (from RNA). Detecting active transcription from live microbes in food is very important to avoid spurious microbial observations that may instead be false positives due to quiescent DNA in the sample. The use of RNA in food analytics also offers the opportunity to examine the expression of metabolic processes that are related to antibiotic resistance, virulence factors, or replication genes, among others. In addition, it has the potential to define viable microbes that are capable of replication in the food and even microorganisms that stop replicating but continue to produce metabolic activity that changes food quality and safety.
Consortium for Sequencing the Food Supply Chain, San Jose, CA, USA.
IBM Almaden Research Center, San Jose, CA, USA.
IBM T.J. Watson Research Center, Yorktown Heights, Ossining, NY, USA.
University of California Davis, School of Veterinary Medicine, 100 K Pathogen Genome Project, Davis, CA 95616, USA.
Mars Global Food Safety Center, Beijing, China.
Wisdom Health, A Division of Mars Petcare, Vancouver, WA, USA.
Bio-Rad Laboratories, Hercules, CA, USA.
Bio-Rad, Food Science Division, Marnes-La-Coquette, France.
These authors contributed equally: Kristen L. Beck, Niina Haiminen.
Published in partnership with Beijing Technology and Business University
Microorganisms are sensitive to changes in temperature, salinity, pH, oxygen content, and many other physicochemical factors that alter their ability to grow, persist, and cause disease. They exist in dynamic communities that change in response to environmental perturbation, just as the gut microbiome shifts in response to diet. Shifts in microbiome composition or activity can be leveraged in the application of microbiome characterization to monitor the food supply chain. For example, Noyes et al. followed the microbiome of cattle from the feedlot to the food packaging, concluding that the microbial community and antibiotic resistance characteristics change based on the processing stage. We hypothesize that observable shifts in microbial communities of food can serve as an indicator of food quality and safety.
In this work, we examined 31 high protein powder samples (HPP; derived from poultry meal). HPP are commonly used raw materials in pet foods. They are subject to microbial growth prior to preparation and continued survival in powder form. We subjected the HPP samples to deep total RNA sequencing with ~300 million reads per sample. In order to process the 31 samples collected over ~1.5 years from two suppliers at a single location, we defined and calibrated the appropriate methods, from sample preparation to bioinformatics analysis, needed to taxonomically identify the community members present and to detect key features of microbial growth. First, we removed the HPP's food matrix RNA content as eukaryotic background with an important bioinformatic filtering step designed specifically for food analysis. The remaining sequences were used for relative quantification of microbiome members and for identifying shifts based on food matrix content, production source, and Salmonella culturability. This work demonstrates that total RNA sequencing is a robust approach for monitoring the food microbiome for use in food safety and quality applications, while additional work is required for predicting pathogen viability.
Evaluation of microbial identification capability in total RNA and DNA sequencing
Microbial identification in microbiomes often leverages shotgun DNA sequencing; however, total RNA sequencing can provide additional information about viable bacterial activity in a community via transcriptional activity. Since using total RNA to study food microbiomes is novel, each step of the analysis workflow (Fig. 1) was carefully designed and scrutinized for accuracy. For all analyses done in this study, we report relative abundance in reads per million (RPM) (Eq. 1) as recommended by Gloor et al. and apply the conservative threshold of RPM > 0.1 to indicate presence as indicated by Langelier et al. and Illot et al. Numerically, this threshold translates to ~30 reads per genus per sample considering a sequencing depth of ~300 million reads per sample (see the section "Microbial identification"). First, we examined the effectiveness of RNA for taxonomic identification and relative quantification of microbes in the presence of food matrix reads. We observed that RNA sequencing results correlated (R² = 0.93) with the genus relative quantification provided by DNA sequencing (Supplementary Fig. S1). RNA sequencing also detected more genera, as demonstrated by a higher α-diversity than with DNA (Supplementary Fig. S2). In addition, from the same starting material, total RNA sequencing resulted in 2.4-fold more reads classified to microbial genera compared to total DNA sequencing (after normalizing for sequencing depth). This increase is substantial because microbial reads are such a small fraction of the total sequenced reads. Considering these results, we further examined the microbial content from total RNA extracted from 31 HPP samples (Supplementary Table 1), which yielded an average of ~300 million paired-end 150 bp sequencing reads per sample in this study.
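The RPM normalization and presence threshold described above can be illustrated with a minimal sketch. It assumes that Eq. 1 divides the reads assigned to a genus by the total number of reads in the sample and scales by one million; the choice of denominator and all function names here are assumptions made for illustration, not the pipeline's actual code.

```python
def reads_per_million(genus_reads: int, total_reads: int) -> float:
    """Relative abundance of a genus in reads per million (RPM)."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return genus_reads / total_reads * 1_000_000


def is_present(genus_reads: int, total_reads: int, threshold: float = 0.1) -> bool:
    """Apply the RPM > 0.1 presence threshold quoted in the text."""
    return reads_per_million(genus_reads, total_reads) > threshold


def min_reads_for_presence(total_reads: int, threshold: float = 0.1) -> float:
    """Read count at which a genus reaches the RPM threshold."""
    return threshold * total_reads / 1_000_000


depth = 300_000_000  # ~300 million reads per sample, as reported in the text
print(min_reads_for_presence(depth))                 # roughly 30 reads
print(is_present(31, depth), is_present(29, depth))  # just above / just below the cutoff
```

At the reported depth, the threshold works out to about 30 reads per genus, which matches the figure given in the paragraph above.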
Evaluation and application of in silico filtering of eukaryotic food matrix reads
Sequenced reads from the eukaryotic host or food matrix may lead to false positives for microbial identification in microbiome studies. This may occur partly because reads originating from low-complexity regions of eukaryotic genomes, e.g., telomeric and centromeric repeats, are misclassified as spurious microbial hits. In total DNA or RNA sequencing of clinical, animal, or even plant microbiomes, eukaryotic content may often comprise >90% of the total sequencing reads. This presents an important bioinformatic challenge that we addressed by filtering matrix content with the k-mer classification tool Kraken against a custom-built reference database of 31 common food ingredient and contaminant genomes (Supplementary Table 2). This step allows for rapidly classifying all sequenced reads (~300 million reads for each of 31 samples) as matrix or non-matrix. The matrix filtering process yielded an estimate of the total percent matrix content for a sample. See our work in Haiminen et al. on quantifying the eukaryotic food matrix components with further precision.

Fig. 1 Bioinformatic pipeline schematic for processing microbiome samples in the presence of matrix content. Description of the bioinformatic steps (light gray) applied to high protein powder metatranscriptome samples (dark gray). Black arrows indicate data flow and blue boxes describe outputs from the pipeline.
To validate the matrix filtering step, we constructed in silico mock food microbiomes with a high proportion of complex food matrix content and low microbial content (Supplementary Table 3). We then computed the true positive, false positive, and false-negative rates of observed microbial genera and sequenced reads (Table 1). False-positive viral, archaeal, and eukaryotic microbial genera (as well as bacteria) were observed without matrix filtering, although bacteria were the only microbes included in the simulated mixtures. Introducing a matrix filtering step to the pipeline improved read classification specificity to >99.96% (from 90.05% and 94.45% without filtering, Table 1) and raised genus-level specificity to 78% and 93% (from 29% and 33%) in the two simulated food mixtures while maintaining zero false negatives. With this level of demonstrated accuracy, we used bioinformatic matrix filtering prior to further microbiome analysis.
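The idea behind matrix filtering can be shown with a deliberately simplified toy. This is not Kraken's algorithm (Kraken assigns reads to taxa via k-mer lookups in a taxonomic database); the sketch below only flags a read as "matrix" when a chosen fraction of its k-mers occurs in a set of matrix reference sequences. The sequences, the value of k, and the `min_fraction` cutoff are all hypothetical.

```python
def kmers(seq: str, k: int) -> set[str]:
    """All overlapping substrings of length k in seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_matrix_index(matrix_refs: list[str], k: int) -> set[str]:
    """Union of k-mers over all matrix (food ingredient) reference sequences."""
    index: set[str] = set()
    for ref in matrix_refs:
        index |= kmers(ref, k)
    return index


def is_matrix_read(read: str, index: set[str], k: int, min_fraction: float = 0.5) -> bool:
    """Call a read 'matrix' if enough of its k-mers occur in the matrix index."""
    read_kmers = kmers(read, k)
    if not read_kmers:
        return False  # read shorter than k: cannot be assigned
    hits = sum(1 for km in read_kmers if km in index)
    return hits / len(read_kmers) >= min_fraction


def filter_matrix(reads: list[str], index: set[str], k: int, min_fraction: float = 0.5):
    """Return the non-matrix reads and the percent of reads called matrix."""
    kept = [r for r in reads if not is_matrix_read(r, index, k, min_fraction)]
    percent_matrix = 100.0 * (len(reads) - len(kept)) / len(reads) if reads else 0.0
    return kept, percent_matrix


# Hypothetical toy data: one matrix reference and three reads.
index = build_matrix_index(["ACGTACGTTAGCCGATAGGCTTAACCGG"], k=5)
reads = ["ACGTACGTTAGC", "GATAGGCTTAAC", "TTTTGGGGCCCCAAAA"]
kept, percent_matrix = filter_matrix(reads, index, k=5)
print(kept, round(percent_matrix, 1))
```

Only the reads that survive this step would be passed on to microbial identification, and the removed fraction gives a rough estimate of percent matrix content.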
HPP microbiome ecology
After filtering eukaryotic matrix sequences, we applied the remaining steps in the bioinformatic workflow (Fig. 1) to examine the shift in the HPP microbiome membership and to quantify the relative abundance of microbes at the genus level. Genus is the first informative taxonomic rank for food pathogen identification that can be considered accurate given the current incompleteness of reference databases and was therefore used in subsequent analyses. Overall, between 98 and 195 microbial genera (avg. 119) were identified (RPM > 0.1) per HPP sample (Supplementary Table 4). When analyzing α-diversity, i.e., the number of microbes detected per sample, inter-sample comparisons may become skewed unless a common number of reads is considered, since deeper sequenced samples may contain more observed genera merely due to a greater sampling depth. Thus, we utilized bioinformatic rarefaction (i.e., subsampling analysis) to show how microbial diversity was altered by sequencing depth. Examination of α-diversity across a range of in silico subsampled sequencing depths showed that the community diversity varied across samples (Fig. 2a). One sample (MFMB-04) had 1.7 times as many genera (195) as the average across other samples (avg. 116, range 98 to 143) and exhibited higher α-diversity than any other sample at each in silico sampled sequencing depth (Fig. 2a). Rarefaction analysis further demonstrated that when considering fewer than ~67 million sequenced reads, the observable microbial population was not saturated (median elbow calculated as indicated in Satopää et al.). This observation suggests that deeper sequencing or more selective sequencing of the HPP microbiomes will reveal more microbial diversity.

Notably, between 2 and 4% (~5,000,000 to 14,000,000) of reads per sample remained unclassified as either eukaryotic matrix or microbe (Supplementary Fig. S3). However, the unclassified reads exhibited a GC (guanine plus cytosine) distribution similar to reads classified as microbial (Supplementary Fig. S4), indicating these reads may represent microbial content that is absent or sufficiently divergent from existing references.
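The rarefaction idea can be sketched as follows. As a simplification, this toy subsamples genus-assigned reads directly (the study subsampled sequenced reads before classification) and counts a genus as observed when it receives at least one read; the counts, the `min_reads` cutoff, and the function names are hypothetical.

```python
import random
from collections import Counter


def rarefy_genus_counts(genus_counts: dict[str, int], depth: int, seed: int = 0) -> Counter:
    """Subsample `depth` genus-assigned reads without replacement."""
    pool = [genus for genus, n in genus_counts.items() for _ in range(n)]
    if depth > len(pool):
        raise ValueError("depth exceeds the number of available reads")
    return Counter(random.Random(seed).sample(pool, depth))


def observed_genera(counts: Counter, min_reads: int = 1) -> int:
    """Alpha diversity as the number of genera with at least min_reads reads."""
    return sum(1 for n in counts.values() if n >= min_reads)


def rarefaction_curve(genus_counts: dict[str, int], depths: list[int], seed: int = 0):
    """Observed genera at each subsampled depth."""
    return [(d, observed_genera(rarefy_genus_counts(genus_counts, d, seed))) for d in depths]


# Hypothetical read counts for five genera named in the text.
toy_counts = {"Bacteroides": 500, "Clostridium": 200, "Lactococcus": 50,
              "Gyrovirus": 5, "Yersinia": 1}
for depth, richness in rarefaction_curve(toy_counts, [10, 50, 200, 756]):
    print(depth, richness)
```

Low-abundance genera tend to drop out at shallow depths, which is why a curve that is still rising at the deepest subsample suggests the community has not been sampled to saturation.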
We calculated β-diversity to study inter-sample microbiome differences and to identify any potential outliers among the sample collection. The Aitchison distances of microbial relative abundances were calculated between samples (as recommended for compositional microbiome data), and the samples were hierarchically clustered based on the resulting distances (Fig. 2b). The two primary clades were mostly defined by the supplier (except for MFMB-17). Samples were collected over several months with Supplier A contributing three batches over time and Supplier B contributing one shipment batch (Supplementary Table 1); despite time point differences, the microbiome composition still clusters into separate clades by the supplier. In Haiminen et al., we reported that three of the HPP samples contained unexpected eukaryotic species. We hypothesized that the presence of these contaminating matrix components (beef identifiable as Bos taurus and pork identifiable as Sus scrofa) would alter the microbiome as compared to chicken (identifiable as Gallus gallus) alone. Clustering HPP samples using their microbiome membership placed the matrix-contaminated samples in a distinctly different group, supporting this hypothesis (Fig. 2b). These observations indicate that samples can be discriminated based on their microbiome content for originating source and supplier, which is necessary for source tracking potential hazards in food.
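For readers unfamiliar with the Aitchison distance, a minimal sketch follows: it is the Euclidean distance between centered log-ratio (CLR) transformed compositions. The pseudocount used to handle zero abundances is an assumption of this sketch (zero handling is not described in this section), and the abundance vectors are hypothetical.

```python
import math


def clr(abundances: list[float], pseudocount: float = 0.5) -> list[float]:
    """Centered log-ratio transform; the pseudocount guards against log(0)."""
    logs = [math.log(x + pseudocount) for x in abundances]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


def aitchison_distance(a: list[float], b: list[float], pseudocount: float = 0.5) -> float:
    """Euclidean distance between the CLR-transformed compositions."""
    ca, cb = clr(a, pseudocount), clr(b, pseudocount)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(ca, cb)))


# Hypothetical RPM vectors over the same four genera in two samples.
sample_1 = [148.0, 37.0, 36.0, 11.0]
sample_2 = [60.0, 90.0, 80.0, 12.0]
print(aitchison_distance(sample_1, sample_2))

# With no pseudocount, rescaling a composition leaves the distance at zero,
# which is the property that makes this metric suitable for relative abundances.
doubled = [2 * x for x in sample_1]
print(aitchison_distance(sample_1, doubled, pseudocount=0.0))
```

A matrix of such pairwise distances is what a hierarchical clustering routine would take as input to produce a dendrogram like the one in Fig. 2b.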
Table 1. Accuracy of microbial identification using two in silico constructed simulated food mixtures.

| | Mixture 1, no MF (# genera) | Mixture 1, MF (# genera) | Mixture 1, no MF (# reads) | Mixture 1, MF (# reads) | Mixture 2, no MF (# genera) | Mixture 2, MF (# genera) | Mixture 2, no MF (# reads) | Mixture 2, MF (# reads) |
|---|---|---|---|---|---|---|---|---|
| Bacteria in mixture (expected content) | 14 | 14 | 15,000 | 15,000 | 14 | 14 | 15,000 | 15,000 |
| Observed bacteria | 34 | 18 | 13,700 | 13,517 | 33 | 15 | 13,999 | 13,551 |
| Observed viruses | 9 | 0 | 563 | 0 | 4 | 0 | 328 | 0 |
| Observed archaea | 1 | 0 | 1 | 0 | 1 | 0 | 3 | 0 |
| Observed Eukaryota | 4 | 0 | 104 | 0 | 4 | 0 | 799 | 0 |
| Total observed (TO) | 48 | 18 | 14,368 | 13,517 | 42 | 15 | 15,129 | 13,551 |
| True positives (TP) | 14 | 14 | 13,571 | 13,511 | 14 | 14 | 13,623 | 13,548 |
| TP as % of TO | 29% | 78% | 94.45% | 99.96% | 33% | 93% | 90.05% | 99.98% |
| False positives (FP) | 34 | 4 | 797 | 6 | 28 | 1 | 1506 | 3 |
| FP as % of TO | 71% | 22% | 5.55% | 0.04% | 67% | 7% | 9.95% | 0.02% |
| FP removed with MF | | 30 | | 791 | | 27 | | 1503 |
| % FP removed with MF | | 88.2% | | 99.2% | | 96.4% | | 99.8% |

The Simulated Food Mixtures (Mixture 1 and Mixture 2, see Supplementary Table 3) contain food matrix and microbial sequences. Microbial identification results are shown without matrix filtering (no MF) and with matrix filtering (MF). The number of observed genera (# genera) and observed genus-assigned reads (# reads) are shown for each category and summarized as the total observed (TO) counts. True positive (TP) and false-positive (FP) counts and fractions of TO are shown. The last two rows show the counts and percentages of false positives removed with matrix filtering.
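The derived rows of Table 1 follow directly from the total observed (TO) and true positive (TP) counts. The short sketch below recomputes them for the Mixture 1 read counts; the function names are illustrative only.

```python
def summarize(total_observed: int, true_positives: int) -> dict[str, float]:
    """False positives and TP/FP as percentages of total observed."""
    false_positives = total_observed - true_positives
    return {
        "false_positives": false_positives,
        "tp_pct": 100.0 * true_positives / total_observed,
        "fp_pct": 100.0 * false_positives / total_observed,
    }


def fp_removed_pct(fp_without_mf: int, fp_with_mf: int) -> float:
    """Percent of false positives removed by matrix filtering (MF)."""
    return 100.0 * (fp_without_mf - fp_with_mf) / fp_without_mf


# Mixture 1 read counts taken from Table 1.
no_mf = summarize(total_observed=14_368, true_positives=13_571)
with_mf = summarize(total_observed=13_517, true_positives=13_511)

print(round(no_mf["tp_pct"], 2), round(with_mf["tp_pct"], 2))
print(round(fp_removed_pct(no_mf["false_positives"], with_mf["false_positives"]), 1))
```

The same two functions reproduce the genus-level columns when given the genus counts (for example, TO = 48 and TP = 14 for Mixture 1 without filtering).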
Comparative analysis of HPP microbiome membership and composition
We identified 65 genera present in all HPP samples (Fig. 3a), whose combined abundance accounted for between 88 and 99% of the total abundances of detected genera per sample. Bacteroides, Clostridium, Lactococcus, Aeromonas, and Citrobacter were the five most abundant of these microbial genera. The identified microbial genera also included viruses, the most abundant of which was Gyrovirus (<10 RPM per sample). Gyrovirus represents a genus of non-enveloped DNA viruses responsible for chicken anemia, which is ubiquitous in poultry. While there were only 65 microbial genera identified in all 31 HPP samples, the α-diversity per sample was on average twofold greater, as noted above.

Beyond the collection of 65 microbes observed in all samples, there were an additional 164 microbes present in various HPP samples. Together, we identified a total of 229 genera among the 31 HPP samples tested (Figs 3b and 4, Supplementary Table 4). In order to identify genera that were most variable between samples, we computed the median absolute deviation (MAD) using the normalized relative abundance of each microbe (Fig. 5a). The abundance of Bacteroides was the most variable among samples (median = 148.1 RPM, MAD = 30.6) and showed increased abundance in samples from Supplier A (excluding samples with known host contamination) compared to Supplier B (Benjamini-Hochberg adjusted P < 0.00005). In total, there were 55 genera with significant differences in abundance between Supplier A and Supplier B (adjusted P < 0.01). Of the ten most variable genera based on MAD, Aeromonas, Enterobacter, Pseudomonas, and Lactobacillus also had significant differences between Supplier A and B (adjusted P < 0.01, with their relative abundances shown in Fig. 5b). In addition, Clostridium (median = 37.4 RPM, MAD = 24.2), Lactococcus (median = 36.8 RPM, MAD = 18.2), and Lactobacillus (median = 24.2 RPM, MAD = 7.2) were also highly variable and threefold to fourfold more abundant in samples MFMB-04 and MFMB-20 compared to other samples (Fig. 5b). Pseudomonas (median = 11.1 RPM, MAD = 12.2) was markedly more abundant in MFMB-83 than any other sample (Fig. 5b). These genera highlight variability between microbiomes based on supplier origin or food source and may provide insights into other dissimilarities in these samples.
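As a minimal sketch of ranking genera by variability, the median absolute deviation can be computed as the median of absolute deviations from the median. This sketch uses the unscaled definition; whether the study applied a consistency constant (as some statistics packages do by default) is not stated here, so treat that as an assumption. The genus labels and RPM values are hypothetical.

```python
from statistics import median


def median_absolute_deviation(values: list[float]) -> float:
    """Unscaled MAD: median of |x - median(x)|."""
    center = median(values)
    return median([abs(v - center) for v in values])


def rank_by_mad(rpm_by_genus: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Genera sorted from most to least variable across samples."""
    scored = [(genus, median_absolute_deviation(v)) for genus, v in rpm_by_genus.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical RPM values for three genera across five samples.
rpm = {
    "GenusA": [10, 12, 11, 50, 9],   # one outlier sample, otherwise stable
    "GenusB": [5, 5, 5, 5, 5],       # constant
    "GenusC": [1, 20, 40, 60, 80],   # broadly variable
}
for genus, mad in rank_by_mad(rpm):
    print(genus, mad)
```

Because MAD is based on medians, a single extreme sample (GenusA) contributes far less than it would to a standard deviation, so the ranking favors genera that vary across many samples.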
Fig. 2 Ecological metrics of microbiome community. a Alpha diversity (number of genera) for all (n = 31) high protein powder metatranscriptomes is compared to the total number of sequenced reads for a range of in silico subsampled sequencing depths. The dashed vertical line indicates the median elbow (at ~67 million reads). b Hierarchical clustering of Aitchison distance values of poultry meal samples based on microbial composition. Samples were received from Supplier A (blue and red) and Supplier B (green). Matrix-contaminated samples are additionally marked in red.

Fig. 3 Microbial genera detected in high protein powder samples. a Phylogram of the 65 microbial genera present in all samples with RPM > 0.1. b Phylogram of microbes observed in any sample. Log of the median RPM value across samples is indicated. Gray indicates a median RPM value of 0.
Microbiome shifts in response to changes in food matrix
We tested the hypothesis that the microbiome composition will shift in response to changes in the food matrix and can be a unique signal to indicate contamination or adulteration. In 28 of the 31 HPP samples, >99% of the matrix reads were determined in our related work to originate from poultry (Gallus gallus), which was the only ingredient expected based on ingredient specifications. However, three samples had higher pork and beef content than all other HPP samples: MFMB-04 (7.74% pork, 8.99% beef), MFMB-20 (0.53% pork, 1.00% beef), and MFMB-38 (0.92% pork, 0.29% beef), compared to the highest pork (0.01%) and beef (0.00%) content among the other 28 HPP samples (Supplementary Data by Haiminen et al.). The microbiomes of these matrix-contaminated samples, each coming from Supplier A, also clustered into a separate sub-cluster (Fig. 2b). This demonstrated that a shift in the food matrix composition was associated with an observable shift in the food microbiome.
We further computed pairwise Spearman's correlation between all samples, using the RPM vectors for the 229 detected genera as input (Supplementary Fig. S5). Here, we exclude MFMB-04, MFMB-20, and MFMB-38 from the group "Supplier A samples" and consider them as a separate matrix-contaminated group. The mean correlation between Supplier A samples was 0.946, while the mean correlation between Supplier B samples was 0.816. The mean correlation between Supplier A and Supplier B samples was 0.805, lower than either within-group correlation. Contrasted with this, the mean correlation between MFMB-04 and Supplier A samples was 0.656; analogously, for MFMB-20 the mean correlation was 0.866, and for MFMB-38 it was 0.885. The increasing correlation values correspond with decreasing percentages of cattle and pork reads in the matrix-contaminated samples (16.7% in MFMB-04, 1.5% in MFMB-20, and 1.2% in MFMB-38), indicating a trend toward the microbial baseline with decreasing matrix contamination.
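A minimal sketch of the group-wise correlation summary, using only the standard library: Spearman's correlation is computed as Pearson's correlation of tie-averaged ranks, and group means are taken over all pairs of distinct samples. The RPM vectors are hypothetical, and the helper names are illustrative.

```python
import math


def midranks(values):
    """1-based ascending ranks, with ties given their average rank."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant input."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rank correlation between two abundance vectors."""
    return pearson(midranks(x), midranks(y))


def mean_correlation(group_a, group_b):
    """Mean Spearman correlation over pairs of distinct samples.

    Pass the same list twice to get a within-group mean.
    """
    pairs = [(a, b) for a in group_a for b in group_b if a is not b]
    return sum(spearman(a, b) for a, b in pairs) / len(pairs)


# Hypothetical RPM vectors (one value per genus) for two small groups.
group_1 = [[100, 40, 30, 5, 1], [120, 35, 33, 6, 2]]
group_2 = [[20, 80, 10, 50, 3], [25, 70, 12, 60, 1]]
print(mean_correlation(group_1, group_1))
print(mean_correlation(group_1, group_2))
```

In this toy, the within-group mean is higher than the between-group mean, mirroring the pattern reported for Supplier A versus Supplier B.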
MFMB-04 and MFMB-20 had the highest percentage of microbial reads compared to other samples (Supplementary Fig. S3). They also exhibited an increase in Lactococcus, Lactobacillus, and Streptococcus relative abundances compared to other samples (Fig. 5b), which was also reflected at the respective higher taxonomic levels above genus (Supplementary Fig. S6).

Fig. 4 High protein powder (HPP) microbial composition and relative abundance per sample. Heatmap (log-scale) of HPP microbial composition and relative abundance (RPM) where absence (RPM < 0.1) is indicated in gray. Genera are ordered by summed abundance across samples. Samples were received from Supplier A (blue) and Supplier B (green). Red stars indicate matrix-contaminated samples (from Supplier A).

Fig. 5 Variability of microbial genera relative abundance. a All identified microbial genera are plotted with median value and median absolute deviation (MAD) of RPM abundance. Genera with MAD > 5 are labeled with the genus name and a linear fit is indicated by a blue dotted line. b Heatmap (log-scale) of ten microbial genera with the largest median absolute deviation (MAD) across samples. Genera are ordered by decreasing MAD from top to bottom. Samples were received from Supplier A (blue) and Supplier B (green). Red stars indicate matrix-contaminated samples (from Supplier A).
There were 53 genera identified uniquely in MFMB-04 and/or MFMB-20, i.e., with RPM values above the aforementioned threshold in these samples but not present in any other sample. (MFMB-38 had a very low microbial load and contributed no uniquely identified genera above the abundance threshold.) MFMB-04 contained 44 unique genera (Fig. 4) with the most abundant being Macrococcus (35.8 RPM), Psychrobacter (23.8 RPM), and Brevibacterium (18.1 RPM). In addition, Paenalcaligenes was present only in MFMB-04 and MFMB-20 with an RPM of 6.4 and 0.3, respectively, compared to a median RPM of 0.004 among other samples. Notable differences in the matrix-contaminated samples' unique microbial community membership compared to other samples may provide microbial indicators associated with unanticipated pork or beef.
Genus-level identification of foodborne microbes
We evaluated the ability of total RNA sequencing to identify genera of commonly known foodborne pathogens within the microbiome. We focused on fourteen pathogen-containing genera including Aeromonas, Bacillus, Campylobacter, Clostridium, Corynebacterium, Cronobacter, Escherichia, Helicobacter, Listeria, Salmonella, Shigella, Staphylococcus, Vibrio, and Yersinia that were found to be present in the HPP samples with varying relative abundances. Of these genera, Aeromonas, Bacillus, Campylobacter, Clostridium, Corynebacterium, Escherichia, Salmonella, and Staphylococcus were detected in every HPP with median abundance values between 0.58 and 48.31 RPM (Fig. 6a). This indicated that a baseline fraction of reads can be attributed to foodborne microbes when using NGS. Of those genera appearing in all samples, there was observed sample-to-sample variation in their abundance with some genera exhibiting longer tails of high abundance, e.g., Staphylococcus and Salmonella, whereas others exhibit very low abundance barely above the threshold of detection, e.g., Bacillus and Yersinia (Fig. 6a). None of the pathogen-containing genera showed consistently higher relative abundances attributable to differences in food matrix composition. Bacillus and Corynebacterium exhibited slightly higher relative abundances in sample MFMB-04, which contained 7.7% pork and 9.0% beef (Fig. 6b). Yet while MFMB-04 contained higher cumulative levels of these foodborne microbes, the next highest sample was MFMB-93, which was not associated with altered matrix composition, and both MFMB-04 and MFMB-93 contained higher levels of Staphylococcus (Fig. 6b). Thus, matrix composition alone did not explain variations of these pathogen-containing genera.

Interestingly, low to moderate levels of Salmonella were detected within all 31 HPP microbiomes (Fig. 6a). The presence of Salmonella in HPP is expected, but the viability of Salmonella is an important indicator of safety and quality. Thus, we further sought to delineate Salmonella growth capability within these microbiomes by comparing culturability with multiple established bioinformatic NGS methods for Salmonella relative abundances in the samples.
Assessment of Salmonella culturability and total RNA sequencing
Total RNA sequencing of food microbiomes has the potential to provide additional sensitivity beyond standard culture-based food safety testing to confirm or reject the presence of potentially pathogenic microbes. In all of the examined HPP samples, some portion of the sequenced reads were classified as belonging to pathogen-containing genera (Fig. 6); however, the presence of RNA transcripts does not necessarily indicate the current growth of the organism itself. We further inspected one pathogen of interest, Salmonella, to determine the congruence between sequencing-based and culturability results. Of the 31 samples examined with total RNA sequencing, Salmonella culture testing was applied to 27 samples, of which four were culture-positive. Surprisingly, Salmonella culture-positive samples were not among those with the highest relative abundance of Salmonella from sequencing (Fig. 7a). When ranking the samples by decreasing Salmonella abundance, the culture-positive samples were not enriched for higher ranks (P = 0.86 from Wilcoxon rank-sum test, indicating that the distributions are not significantly different, Table 2). To confirm that the microbiome analysis pipeline did not miss Salmonella reads present, we completed two orthogonal analyses on the same dataset used in the microbial identification step. The reference genomes relevant to these additional analyses were publicly available, closed, high-quality genomes obtained from the sources indicated below.
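The rank comparison above can be illustrated with an exact one-sided rank-sum test. This is a sketch under assumptions: the test direction (culture-positive samples having higher abundance) and the exact enumeration are choices made here for illustration, not necessarily the settings used in the study, and the RPM values are hypothetical. With 4 positive and 23 negative samples there are only 17,550 possible assignments, so full enumeration is feasible.

```python
from itertools import combinations


def midranks(values):
    """1-based ascending ranks, with ties given their average rank."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]


def rank_sum_exact_p(positive, negative):
    """One-sided exact p-value that `positive` values tend to be larger.

    Enumerates every way to choose len(positive) samples from the pooled
    data and counts how often the rank sum is at least the observed one.
    """
    pooled = list(positive) + list(negative)
    ranks = midranks(pooled)
    n_pos = len(positive)
    observed = sum(ranks[:n_pos])
    total = extreme = 0
    for combo in combinations(ranks, n_pos):
        total += 1
        if sum(combo) >= observed - 1e-9:  # tolerance for half-integer ranks
            extreme += 1
    return extreme / total


# Hypothetical Salmonella RPM values for culture-positive and culture-negative samples.
culture_positive = [1.2, 0.8, 2.5, 0.6]
culture_negative = [0.9, 3.1, 1.7, 0.5, 2.2, 1.1, 0.7, 4.0]
print(rank_sum_exact_p(culture_positive, culture_negative))
```

A large p-value, as in this toy, means the culture-positive samples do not sit systematically higher in the abundance ranking.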
Fig. 6 Relative abundance for fourteen pathogen-containing genera. a Relative abundance distribution of genera with high relevance to food safety and quality from high protein powder (HPP) total RNA sequenced microbiomes. The width of the violin plot indicates the density of samples with relative abundance at that value. Observation threshold of RPM = 0.1 is indicated with the horizontal black line. b The relative abundances of those same genera are shown across samples of HPP total RNA sequenced samples.
First, for targeted analysis, we aligned the sequenced reads using a different tool, Bowtie2, to an augmented Salmonella-only reference database. This reference comprised the 264 Salmonella genomes extracted from NCBI RefSeq Complete (used in our previous microbial identification step) as well as an additional 1183 public Salmonella genomes which represent global diversity within the genus. The number of reads that aligned to the Salmonella-only reference was on average 370-fold higher than the number identified as Salmonella by Kraken using the multi-microbe NCBI RefSeq Complete. In this additional analysis, the culture-positive samples had overall higher ranks compared to culture-negative samples (P = 0.06, Table 2), indicating that additional Salmonella genomic data in the reference markedly improved discriminatory identification power, although the enrichment did not reach significance at the 0.05 level. Salmonella culture-positive samples were still not the most abundant (Fig. 7b), but with an enriched database, sequencing positioned all four culturable samples within the top ten rankings.

The second additional analysis examined the alignment of the reads to a specific gene required for replication and protein production in actively dividing Salmonella: elongation factor Tu (ef-Tu). This was done by aligning the reads to 4846 gene sequences for ef-Tu extracted from a larger corpus of Salmonella genomes from the Functional Genomics Platform. The relative abundances of this transcript in culture-positive samples were still comparable to culture-negative samples (Fig. 7c). Culture-positive samples did not exhibit higher ranks compared to culture-negative samples (P = 0.56, Table 2), indicating that ef-Tu relative abundance alone was not sufficient to improve the lack of concordance in culturability vs sequencing. These two orthogonal analyses demonstrated that results from carefully developed culture-based testing and those from current high-throughput sequencing technologies, whether assessed at overall reads aligned or specific gene abundances, were not conclusively in agreement when detecting active Salmonella in food samples (Fig. 7 and Table 2). However, the use of a reference database enriched in whole-genome sequences of the specific organism of interest was found appropriate for food safety applications.
Since microbes compete for available resources within an
environmental niche and therefore impact one another, we
investigated Salmonella culture results in conjunction with
co-occurrence patterns of other microbes in the total RNA
sequencing data (Fig. 8). Point-biserial correlation coefficients
($r_{pb}$) were calculated between Salmonella culturability results
(presence or absence, which were available for 27 of the 31 samples) and
microbiome relative abundance. We observed 31 genera that
positively correlated with Salmonella presence ($r_{pb}$ > 0.5).
Erysipelothrix, Lactobacillus, Anaerococcus, Brachyspira, and
Jeotgalibaca exhibited the largest positive correlations. Gyrovirus was
negatively correlated with Salmonella growth ($r_{pb}$ = −0.54). In
three of the four Salmonella-positive samples (MFMB-04, MFMB-20,
and MFMB-38), food matrix contamination was also observed
(Supplementary Data in Haiminen et al.). The concurrency of
Salmonella growth and matrix contamination was affirmed by the
microbial co-occurrence (specifically Erysipelothrix, Brachyspira,
and Gyrovirus). This highlights the complex dynamic and
Fig. 7 Salmonella culture-positive status vs. high-throughput sequencing read abundance. Read abundance (RPM) per sample ID
is shown from a k-mer classification to NCBI Microbial RefSeq Complete (panel title: Salmonella Content from
Metatranscriptome Classifications By Sample), b alignments to 1447 Salmonella genomes (panel title: Salmonella Alignments
to Complete Genomes By Sample), and c alignments to 4846 EF-Tu gene sequences (panel title: Salmonella Alignments to ef-Tu
By Sample). Salmonella presence (red) indicates culture-positive result, absence (green) indicates culture-negative result,
and no record (black) indicates samples for which no culture test was completed (legend: detection_per_100g).
Table 2. Salmonella analyses.
Salmonella-positive sample    k-mer classification    Genome alignments    ef-Tu alignments
MFMB-04                       8th                     10th                 1st
MFMB-20                       9th                     9th                  4th
MFMB-38                       20th                    3rd                  21st
MFMB-41                       30th                    6th                  28th
Rank-sum test                 P = 0.86                P = 0.06             P = 0.56
The ranks for Salmonella-positive samples and the associated P values from the
Wilcoxon rank-sum test are shown for high-throughput sequencing read
abundance (in RPM) for multiple analyses: k-mer classification to NCBI
Microbial RefSeq Complete (left), alignments to 1447 Salmonella genomes
(middle), and alignments to 4846 ef-Tu gene sequences (right). The
corresponding Salmonella relative abundances are shown in Fig. 7a–c.
community co-dependency of food microbiomes, yet shows that
multiple dimensions of the data (microbiome composition,
culture-based methods, and microbial load) will signal anomalies
from typical samples when there is an issue in the supply chain.
Accurate and appropriate tests for detecting potential hazards in
the food supply chain are key to ensuring consumer safety and
food quality. Monitoring and regular testing of raw ingredients
can reveal fluctuations within the supply chain that may be an
indicator of an ingredient's quality or of a potential hazard. Such
quality is assessed by standardized tests for the chemical and
microbial composition to meet legal requirements and specifications
from government agencies throughout the world. For raw
materials or finished products to meet these bounds of safety and
quality, their composition must usually have a low microbiological
load (except in fermented foods) and be chemically identical in
macro-components such as carbohydrate, protein, and fat.
Methods in this space must avoid false-negative results that
could endanger consumers, while also minimizing false positives
which could lead to unnecessary recalls and food loss.
Existing microbial detection technologies used in food safety
today, such as pulsed-field gel electrophoresis (PFGE) and WGS,
require microbial isolation. This provides biased outcomes as it
removes microbes from their native environment where other
biotic members also subsist and selects microbes by culturability
alone. Amplicon sequencing, while a low-cost alternative to
metagenome or metatranscriptome sequencing for bacteria, also
imparts PCR amplification bias and reduces detection sensitivity
due to reliance on a single gene (16S ribosomal RNA). We,
therefore, investigated the utility of total RNA sequencing of food
microbiomes and demonstrated that from this single test,
we are able to yield several pertinent results about food safety
and quality.
For this evaluation, we developed a pipeline to characterize the
microbiome of typical food ingredient samples and to detect
potentially hazardous outliers. Special considerations for food
samples were made as computational pipelines for human or
other microbiome analyses are not sufficient for applications in
food safety without modification. In food, the eukaryotic matrix
needs to be confirmed, may be mixed, and, as we and others
have shown, affects the identification accuracy of microbes that
are present. By filtering food matrix sequence data properly,
we avoid incorrect microbial identification and characterization of
the microbiome while also increasing the computational
efficiency for downstream processing. The addition of this filtering
step in the pipeline removed ~90% of false-positive genera and
provided results at 99.96% specificity when evaluating simulated
mixtures of food matrix and microbes (Table 1).
Through the analysis of 31 HPP total RNA sequencing samples,
we demonstrated the pipeline's ability to characterize food
microbiomes and indicate outliers. In this sample collection, we
identified a core catalog of 65 microbial genera found in all
samples where Bacteroides, Clostridium, and Lactococcus were the
most abundant (Supplementary Table 4). We also demonstrated
that in these food microbiomes the overall diversity was twofold
greater than the core microbe set. Fluctuations in the microbiome
can indicate important differences between samples as observed
here, as well as in the literature for grape berry and apple fruit
microbiomes (pertaining to organic versus conventional farming)
or indicate inherent variability between production batches or
suppliers as observed here and during cheddar cheese
manufacturing. Specifically, we observed a shift in the microbial
composition (Fig. 2b) and the microbial load (Supplementary Fig.
S3) in HPP samples (derived from poultry meal) where unexpected
pork and beef were observed. Matrix-contaminated samples were
marked by increased relative abundances of specific microbes
including Lactococcus, Lactobacillus, and Streptococcus (Fig. 5b).
This work shows that the microbiome shifts with observed food
matrix contamination from sources with similar macronutrient
content and thus, the microbiome alone is a likely signal of
compositional change in food.
Beyond shifts in the microbiome, we focused on a set of
well-defined foodborne-pathogen-containing genera and explored
their relative abundances observed from total RNA sequencing.
Fig. 8 Salmonella status correlations with genus relative abundances. Only those genera with the absolute value of the correlation
coefficient >0.5 are shown. Positive and negative correlations are indicated in gray and blue, respectively.
Of these genera, Aeromonas, Bacillus, Campylobacter, Clostridium,
Corynebacterium, Escherichia, Salmonella, and Staphylococcus were
detected in every HPP sample. This highlights that when using
NGS there may be an observable baseline of sequences assigned
to potentially pathogenic microbes. For this ingredient type, this
result lends a range of normalcy of relative abundance generated
by NGS. Further work is needed to establish a definitive and
quantitative range of typical variation in samples of a particular
food source and the degree of an anomaly for a new sample or
genus abundance. However, preliminary studies of this nature can
inform the development of guidelines when working with
increasingly sensitive shotgun metagenomic or metatranscrip-
tomic analysis.
Furthermore, sequenced DNA or RNA alone does not imply
microbial viability. Therefore, we investigated the relatedness of
culture-based tests and total RNA sequencing for the pathogenic
bacterium Salmonella in the HPP samples. As has been reported
for human gut and deep sea microbiomes, we also did not
detect a correlation between Salmonella read abundance and
culturability (Fig. 7 and Table 2). Sequence reads matching
Salmonella references were observed for all samples (both culture-
positive and culture-negative) as determined by multiple analysis
techniques: microbiome classification, alignment to Salmonella
genomes, and targeted growth gene analysis. When ranking the
HPP samples based on Salmonella abundance from whole-
genome alignments, the culture-positive samples were enriched
for higher ranks (P=0.06). However, the culture-positive samples
were still intermixed in ranking with culture-negative samples.
This indicated that there was no clear minimum threshold of
sequence data as evidence for culturability and that this analysis
alone is not predictive of pathogen growth. One possible reason
for this is that the culture-positive variant of Salmonella is missing
from existing reference data sets. Potentially, Salmonella attained
a nonculturable state wherein it was detected by sequencing
techniques yet remained nonculturable from the HPP sources.
Successful isolation of total RNA and DNA and gene expression
analysis from experimentally known nonculturable bacteria has
been demonstrated by Ganesan et al. in multiple studies in other
. The physiological state should thus be taken under
consideration when benchmarking sequencing technologies in
comparison with culture-based methods. Thus, total RNA sequen-
cing of food samples may identify shifts that standard food testing
does not, but the incongruity between sequencing read data
and culture-based results highlights the need to perform more
benchmarking in food microbiome analysis for pathogen
The characterization of HPP food microbiomes leveraged
current accepted public reference databases, yet it is known that
these databases are still inadequate. Furthermore, when
considering congruence between Salmonella culturability and
NGS read mapping techniques, the genetic breadth and depth of
multi-genome reference sequences are essential. For example,
focusing on ef-Tu, a known marker gene for Salmonella growth,
was not sufficient to mirror the viability of in vitro culture tests.
This highlights the limitations of single-gene approaches for
identification. When the sequenced reads were examined in the
context of an augmented reference collection of Salmonella
genomes, we observed improved ranking and read mapping rate
for culture-positive samples (yet we did not achieve complete
concordance). This improvement underlined the increased analy-
tical robustness yielded from a multi-genome reference. We also
recognize that the read mapping rate may be exaggerated as
reads from non-Salmonella genomes could map to Salmonella in
the absence of any other reference genomes. Overall for robust
analysis and applicability to food safety and quality, microbial
references must be expanded to include more genetically
diverse representatives of pathogenic and spoilage organisms.
Description of food microbiomes will only improve as additional
public sequence data is collected and leveraged.
In our sample collection, 24% (effectively 5 to 14 million) of
reads remain unclassified. The GC content distribution of
unclassified reads matched microbial GC content distribution
(Supplementary Fig. S4), suggesting that these reads may have
been derived from microbes missing from the current reference
database that have not yet been isolated or sequenced. By
sequencing the microbiome, we sampled environmental niches in
their native state in a culture-independent manner and therefore
collected data from diverse and potentially never-before-seen
microbes. Tracking unclassified reads will also be essential for
monitoring food microbiomes. The inability to provide a name
from existing references does not eliminate the possibility that the
sequence is from an unwanted microbe or indicates a hazard. In
addition to tracking known microbes, quantitative or qualitative
shifts in the unclassified sequences might be used to detect when
a sample is different from its peers.
We demonstrated the potential utility of analyzing food
microbiomes for food safety using raw ingredients. This study
resulted in the detection of shifts in the microbiome composition
corresponding to unexpected matrix contaminants. This signifies
that the microbiome is likely an important and effective hazard
indicator in the food supply chain. While we have used total RNA
sequencing for the detection of microbiome membership, the
technology has future applicability for the detection of
antimicrobial resistance, virulence, and biological function for multiple
food sources, and for other sample types. Notably, while this
pipeline was developed for food monitoring, with applicable
modifications and identification of material-specific indicators, it
can be applied to other microbiomes including human and
Sample collection, preparation, and sequencing
HPP (2.5 kg) samples were each collected from a train car in Reno, NV,
USA between April 2015 and February 2016 in four batches from two
suppliers. Each HPP sample was composed of five sub-samples from random
locations within the train car prior to shipment. Each HPP was shipped to
the Weimer laboratory at UC Davis (Davis, CA) with 2-day delivery. Upon
arrival, each HPP was aliquoted into at least three tubes containing Trizol
for long-term storage and use in sequencing studies (see extraction section
for further processing before sequencing). The remaining HPP was sealed
in the plastic bag it arrived in. Those bags were put in closed storage tubs
that were stored at room temperature (~25 °C) for the remainder of the
study. Sample preparation, total RNA extraction and integrity confirmation,
cDNA construction, and library construction for the sample material
used were described in our companion publication.
Sequencing was performed by BGI@UC Davis (Sacramento, CA) using
Illumina HiSeq 4000 (San Diego, CA) with 150 bp paired-end chemistry for
each sample except the following: HiSeq 3000 with 150 bp paired-end
chemistry was used for MFMB-04 and MFMB-17. All total RNA sequencing
data are available via the 100K Pathogen Genome Project BioProject
(PRJNA186441) at NCBI (Supplementary Table 1).
For evaluation of total RNA sequencing for microbial classification in
paired processing steps, total RNA and total DNA were extracted from the
same sample and denoted as MFMB-03 and MFMB-08, respectively. The
total RNA was extracted and sequenced as described above. The total DNA
was extracted and sequenced as described elsewhere. The
Illumina HiSeq 2000 with 100 bp paired-end chemistry was used for MFMB-03
and MFMB-08.
Sequence data quality control
Illumina Universal adapters were removed and reads were trimmed using
Trim Galore with a minimum read length parameter of 50 bp. The resulting
reads were filtered using Kraken, as described below in the section Matrix
filtering process and validation, with
a custom database built from the PhiX genome (NCBI Reference Sequence:
NC_001422.1). Removal of PhiX content is suggested as it is a common
contaminant in Illumina sequencing data. Trimmed non-PhiX reads were
used in subsequent matrix filtering and microbial identification steps.
Matrix filtering process and validation
Kraken with a k-mer size of 31 bp (optimal size described in the Kraken
reference publication) was used to identify and remove reads that
matched a pre-determined list of 31 common food matrix and potential
contaminant eukaryotic genomes (Supplementary Table 2). These food
matrix organisms were chosen based on preliminary eukaryotic read
alignment experiments of the HPP samples as well as high-volume food
components in the supply chain. Due to the large size of eukaryotic
genomes in the custom Kraken database, a random k-mer reduction was
applied to reduce the size of the database by 58% using kraken-build with
the option --max-db-size, in order to fit the database in 188 GB for in-memory
processing. A conservative Kraken score threshold of 0.1 was applied to
avoid filtering microbial reads. The matrix filtering database includes
low-complexity and repeat regions of eukaryotic genomes to capture all
possible matrix reads. This filtering database with the score threshold was
also used in the matrix filtering in silico testing as described below.
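The role of the score threshold can be illustrated with a toy sketch. This is not Kraken's algorithm, which assigns reads through a taxonomy-aware k-mer lookup. Here a read's score is simply the fraction of its k-mers found in a matrix k-mer set, and reads scoring at or above the threshold are removed. The function names, the exact-match scoring, and the "at or above" comparison are assumptions made for illustration only.

```python
def kmer_set(sequence, k):
    """Return the set of all k-mers in a reference sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def matrix_score(read, matrix_kmers, k):
    """Fraction of the read's k-mers that occur in the matrix k-mer set."""
    read_kmers = [read[i:i + k] for i in range(len(read) - k + 1)]
    if not read_kmers:
        return 0.0  # read shorter than k: no evidence either way
    hits = sum(kmer in matrix_kmers for kmer in read_kmers)
    return hits / len(read_kmers)


def filter_matrix_reads(reads, matrix_kmers, k=31, threshold=0.1):
    """Keep only reads whose matrix score is below the threshold (assumed rule)."""
    return [r for r in reads if matrix_score(r, matrix_kmers, k) < threshold]
```

With a short k for readability, `filter_matrix_reads(["ACGTACG", "TTTTTTT"], kmer_set("ACGTACGTAC", 4), k=4)` keeps only the second read, since every 4-mer of the first read occurs in the reference.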
Matrix filtering was validated by constructing synthetic paired-end reads
(150 bp) using DWGSIM with mutations from reference sequences using
the following parameters: base error rate (e) = 0.005, the outer distance
between the two ends of a read pair (d) = 500, rate of mutations (r) =
0.001, fraction of mutations that are indels (R) = 0.15, probability an indel
is extended (X) = 0.3. Reference sequences are detailed in Supplementary Table 3. We
constructed two in silico mixtures of sequencing reads by randomly
sampling reads from eukaryotic reference genomes. Simulated Food
Mixture 1 was composed of nine species with the following number of
reads per genome: 2 M cattle, 2 M salmon, 1 M goat, 1 M lamb, 1 M tilapia
(transcriptome), 962 K chicken (transcriptome), 10 K duck, 1 K horse, and
1 K rat, totaling 7.974 M matrix reads. Simulated Food Mixture 2 contained
5 M soybean, 4 M rice, 3 M potato, 2 M corn, 200 K rat, and 10 K
drain fly reads, totaling 14.210 M matrix reads. Both simulated food
mixtures included 1000 microbial sequence reads generated from each of
15 different microbial species for a total of 15 K sequence reads
(Supplementary Table 3).
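The mixture totals above can be checked with a few lines of arithmetic, and the simulation parameters can be collected into a command line. The sketch below is illustrative only: the read counts and the parameter letters e, d, r, R, and X come from the text, while the `-1`, `-2`, and `-N` options (read lengths and number of read pairs) and the helper names are assumptions that should be checked against the installed DWGSIM version. The command is built but not executed.

```python
# Matrix read counts per simulated food mixture, as listed in the text.
MIXTURE_1 = {
    "cattle": 2_000_000, "salmon": 2_000_000, "goat": 1_000_000,
    "lamb": 1_000_000, "tilapia": 1_000_000, "chicken": 962_000,
    "duck": 10_000, "horse": 1_000, "rat": 1_000,
}
MIXTURE_2 = {
    "soybean": 5_000_000, "rice": 4_000_000, "potato": 3_000_000,
    "corn": 2_000_000, "rat": 200_000, "drain fly": 10_000,
}
MICROBIAL_READS = 15 * 1000  # 1000 reads from each of 15 microbial species


def matrix_total(mixture):
    """Total number of matrix reads in a mixture."""
    return sum(mixture.values())


def microbial_fraction(mixture):
    """Share of microbial reads among all simulated reads in a mixture."""
    return MICROBIAL_READS / (matrix_total(mixture) + MICROBIAL_READS)


def dwgsim_args(reference_fasta, out_prefix, n_pairs):
    """Assemble a DWGSIM argument list (hypothetical helper; not executed here)."""
    return [
        "dwgsim",
        "-e", "0.005",   # base error rate
        "-d", "500",     # outer distance between the two ends of a pair
        "-r", "0.001",   # rate of mutations
        "-R", "0.15",    # fraction of mutations that are indels
        "-X", "0.3",     # probability an indel is extended
        "-1", "150", "-2", "150",   # assumed options for 150 bp read lengths
        "-N", str(n_pairs),         # assumed option for number of read pairs
        reference_fasta, out_prefix,
    ]
```

Microbial reads make up well under 1% of either mixture, which is why a small number of misclassified matrix reads can noticeably inflate the list of false-positive genera.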
Microbial identification
Remaining reads after quality control and matrix filtering were classified
using Kraken against a microbial database with a k-mer size of 31 bp to
determine the microbial composition within each sample. NCBI RefSeq
Complete genomes were obtained for bacterial, archaeal, viral, and
eukaryotic microorganisms (~7800 genomes retrieved April 2017). Low
complexity regions of the genomes were masked using Dustmasker with
default parameters. A threshold of 0.05 was applied to the Kraken score in
an effort to maximize the F-score of the result (as demonstrated in Kraken's
operating manual). Taxa-specific sequence reads were used to calculate a
relative abundance in reads per million (RPM; Eq. 1), where $R_S$ represents
the reads classified per microbial entity (e.g., the genus Salmonella) and $R_Q$
represents the number of sequenced reads remaining after quality control
(trimming and PhiX removal) for an individual sample, including any reads
classified as eukaryotic:
$$\mathrm{RPM} = \frac{R_S}{R_Q} \times 10^{6} \qquad (1)$$
This value provides a relative abundance of the microbial entity of
interest and was used in comparisons of taxa among samples. Genera with
a conservative threshold of RPM > 0.1 were defined as present, as
previously applied by others in the contexts of human infectious disease
and gut microbiome studies. Pearson correlation of resulting microbial
genus counts was computed.
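As a small worked example of the RPM normalization and the presence threshold, the following sketch computes RPM from a classified read count and the number of reads remaining after quality control. The function and variable names are illustrative, not taken from the pipeline.

```python
def reads_per_million(classified_reads, qc_reads):
    """RPM: reads classified to a taxon per million quality-controlled reads."""
    if qc_reads <= 0:
        raise ValueError("qc_reads must be positive")
    return classified_reads * 1_000_000 / qc_reads


def present_genera(genus_counts, qc_reads, threshold=0.1):
    """Genera whose RPM strictly exceeds the observation threshold."""
    return {
        genus
        for genus, count in genus_counts.items()
        if reads_per_million(count, qc_reads) > threshold
    }
```

For a sample with 20 million quality-controlled reads, a genus needs at least three classified reads to exceed RPM = 0.1, since two reads give exactly 0.1.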
Community ecology analysis
Rarefaction analysis at multiple subsampled read depths $R_D$ was performed
by multiplying the microbial genus read counts with $R_D / R_Q$ and rounding
the results down to the nearest integer to represent observed read counts.
Here, $R_Q$ is the total number of reads in the sample after quality control
(including microbial, matrix, and unclassified reads). Resulting α-diversity at
read depth $R_D$ was computed as the number of genera with resulting
RPM > 0.1 and plotted at five million read intervals: $R_D$ = 5 M, 10 M,
15 M, …, $R_Q$. If, due to random sampling and rounding effects, the
computed α-diversity was lower than the diversity computed at any
previous depth, the previous higher α-diversity was used for plotting. The
median elbow was calculated as indicated using the R package kneed
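The scaling-based rarefaction described above can be sketched as follows. This is a minimal illustration under two assumptions that the text does not state explicitly: RPM at each subsampled depth is computed against that depth, and the final plotted depth is the full read count of the sample. Names are illustrative.

```python
def rarefaction_curve(genus_counts, total_reads, step=5_000_000, rpm_threshold=0.1):
    """Return (depth, alpha_diversity) pairs at fixed read-depth intervals."""
    depths = list(range(step, total_reads, step)) + [total_reads]
    curve = []
    best_so_far = 0
    for depth in depths:
        richness = 0
        for count in genus_counts.values():
            # Scale the count to the subsampled depth and round down.
            scaled = count * depth // total_reads
            # Assumed: RPM is computed relative to the subsampled depth.
            if scaled * 1_000_000 / depth > rpm_threshold:
                richness += 1
        # Never plot a value lower than one seen at a shallower depth.
        best_so_far = max(best_so_far, richness)
        curve.append((depth, best_so_far))
    return curve
```

For example, a 12 million read sample with genus counts of 12, 2, and 1 yields one genus above threshold at 5 M and 10 M reads and two genera at the full depth.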
In compositional data analysis, non-zero values are required when
computing β-diversity based on Aitchison distance. Therefore, read
counts assigned to each genus were pseudo-counted by adding one in
advance of computation of RPM (Eq. 1) prior to calculating the Aitchison
distance for the microbial table. β-diversity was calculated using the R
package robCompositions and hierarchical clustering was performed
using the base R function hclust with the ward.D2 method as recommended
for compositional data analysis
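For readers unfamiliar with the Aitchison distance, a minimal sketch is given below. It is a plain-Python stand-in for illustration and is not the robCompositions implementation used in the study. The distance is computed as the Euclidean distance between centered log-ratio (CLR) transformed vectors after adding a pseudo-count of one. Because the CLR transform is scale invariant, converting the pseudo-counted values to RPM first would not change the result.

```python
import math


def clr(counts, pseudo_count=1):
    """Centered log-ratio transform of pseudo-counted values."""
    logs = [math.log(c + pseudo_count) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


def aitchison_distance(counts_a, counts_b):
    """Euclidean distance between the CLR-transformed count vectors."""
    clr_a, clr_b = clr(counts_a), clr(counts_b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(clr_a, clr_b)))
```

Two samples whose pseudo-counted genus counts are proportional, such as `[1, 3, 7]` and `[3, 7, 15]`, have a distance of zero up to floating-point error, which reflects the compositional nature of the measure.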
Pairwise Spearman's correlation was computed between all samples
(with the Matlab function corr) using the RPM vectors for the 229 detected
genera as input. For the purpose of comparing correlation values within
and between suppliers, the samples MFMB-04, MFMB-20, and MFMB-38
were excluded from the group "Supplier A samples" and considered as a
separate matrix-contaminated group. In addition, a two-sample t test was
calculated per genus on the RPM abundances from samples from Supplier
A (excluding MFMB-04, MFMB-20, and MFMB-38 due to known non-poultry
matrix content) and Supplier B using base R with a Benjamini–Hochberg
adjustment for multiple hypothesis testing.
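The Benjamini–Hochberg adjustment applied to the per-genus P values can be sketched in a few lines. The study used base R for this step. The Python function below is an illustrative equivalent of the standard step-up procedure, and its name is an assumption.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted
```

For the P values 0.01, 0.04, 0.03, and 0.005, the adjusted values are 0.02, 0.04, 0.04, and 0.02 (up to floating-point rounding).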
Unclassified read analysis
The GC percent distributions of the matrix (from matrix filtering), microbial,
and remaining unclassied reads per sample were computed using
and collated across samples with MultiQC
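To make the GC comparison concrete, the sketch below computes the GC percentage of each read and bins the values into a coarse histogram. It is a plain-Python illustration of the quantity being compared and does not reproduce the tools used in the study. The function names and the bin width are assumptions.

```python
def gc_percent(sequence):
    """GC percentage of a read, ignoring ambiguous bases such as N."""
    sequence = sequence.upper()
    gc = sequence.count("G") + sequence.count("C")
    at = sequence.count("A") + sequence.count("T")
    total = gc + at
    return 100.0 * gc / total if total else 0.0


def gc_histogram(reads, bin_width=10):
    """Count reads per GC-percent bin; the last bin holds reads at 100% GC."""
    bins = [0] * (100 // bin_width + 1)
    for read in reads:
        bins[int(gc_percent(read) // bin_width)] += 1
    return bins
```

Comparing such histograms for unclassified, microbial, and matrix reads from the same sample gives a quick view of whether the unclassified fraction resembles either group.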
Analysis of Salmonella culturability
Growth of Salmonella was determined using a real-time quantitative PCR
method for the confirmation of Salmonella isolates for presumptive generic
identification of foodborne Salmonella. Testing was performed fully in
concordance with the Bacteriological Analytical Manual (BAM) for
for this approach that is also AOAC-approved. All samples
with positive results for Salmonella were classified as containing actively
growing Salmonella. To compare culture results with those from total RNA
sequencing, Salmonella RPM values were parsed from the genus-level
microbe table (described in the section Microbial identification).
Two additional approaches were employed to examine Salmonella read
mapping with a more sensitive tool and broader reference databases.
Quality-controlled, matrix-filtered reads were aligned using Bowtie2 in
--very-sensitive-local mode to (1) an expanded collection of whole
Salmonella genomes and (2) a curated growth gene reference for
elongation factor Tu (ef-Tu). For results from both complete genome and
ef-Tu gene alignments, the relative abundance (RPM) was computed as
shown in Eq. 1.
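An alignment call of this kind might be assembled as shown below. The sketch only builds the argument list and does not run the aligner. The index prefix, file names, and thread count are hypothetical placeholders, and only the `--very-sensitive-local` preset is taken from the text.

```python
def bowtie2_args(index_prefix, reads_1, reads_2, sam_out, threads=4):
    """Assemble a Bowtie2 paired-end command line (not executed here)."""
    return [
        "bowtie2",
        "--very-sensitive-local",  # alignment preset named in the text
        "-p", str(threads),        # number of alignment threads
        "-x", index_prefix,        # prefix of a prebuilt Bowtie2 index
        "-1", reads_1,             # first mates
        "-2", reads_2,             # second mates
        "-S", sam_out,             # SAM output file
    ]
```

The same call shape would apply to both references, for example with a hypothetical index prefix such as `salmonella_genomes` for the whole-genome collection and `ef_tu_genes` for the gene reference.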
For whole-genome alignments, a reference was constructed from 1183
recently published Salmonella genomes in addition to the 264 Salmonella
genomes extracted from the aforementioned NCBI RefSeq Complete
collection (see the section Microbial identification).
To construct a curated growth gene (ef-Tu) reference, gene sequences
annotated in Salmonella genomes as "elongation factor Tu", "EF-Tu" or
"eftu" (case insensitive) were retrieved from the Functional Genomics
Platform (formerly OMXWare) using its Python package. This query
yielded 4846 unique gene sequences from a total of 36,242 Salmonella
genomes which were assembled or retrieved from the NCBI Sequence
Read Archive or RefSeq Complete Sequences as previously indicated. The
retrieved ef-Tu gene sequences were subsequently used to build a custom
reference. Read alignment was completed with Bowtie2 in --very-sensitive-local mode.
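The annotation matching and de-duplication described above can be illustrated with a short sketch. It does not use the Functional Genomics Platform Python package, whose interface is not described here. Instead it assumes the annotations and sequences are already available as pairs, and the function names are hypothetical.

```python
import re

# Case-insensitive match for the three annotation spellings named in the text.
EF_TU_PATTERN = re.compile(r"elongation factor tu|ef-tu|eftu", re.IGNORECASE)


def is_ef_tu(annotation):
    """True if a gene annotation names elongation factor Tu."""
    return EF_TU_PATTERN.search(annotation) is not None


def unique_ef_tu_sequences(records):
    """Unique ef-Tu sequences from (annotation, sequence) pairs, in input order."""
    seen = set()
    unique = []
    for annotation, sequence in records:
        if is_ef_tu(annotation) and sequence not in seen:
            seen.add(sequence)
            unique.append(sequence)
    return unique
```

De-duplicating on the sequence itself is what reduces the gene copies from many genomes to a smaller set of distinct reference sequences.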
The read counts for each sample were ranked and the Wilcoxon rank-
sum test was computed between the rank vectors of 4 Salmonella-positive
and 23 Salmonella-negative samples. The four samples with unknown
Salmonella status were excluded from the rankings.
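A rank-sum comparison of this shape can be sketched with an exact enumeration, which is feasible here because choosing 4 positions out of 27 gives only 17,550 combinations. The sketch assumes no tied ranks and a two-sided test that doubles the smaller tail. Whether the study used an exact or an approximate P value is not stated, so this is an illustration rather than a reproduction.

```python
from itertools import combinations


def rank_sum_exact_p(positive_ranks, n_total):
    """Two-sided exact P value for the rank sum of the positive group.

    positive_ranks: ranks (1..n_total, no ties) held by the positive samples.
    n_total: total number of ranked samples in both groups.
    """
    k = len(positive_ranks)
    observed = sum(positive_ranks)
    # Null distribution: every choice of k ranks is equally likely.
    sums = [sum(c) for c in combinations(range(1, n_total + 1), k)]
    lower = sum(s <= observed for s in sums) / len(sums)
    upper = sum(s >= observed for s in sums) / len(sums)
    return min(1.0, 2 * min(lower, upper))
```

If the four culture-positive samples held ranks 1 to 4 among 27 samples, the P value would be 2/17,550, the smallest value this design can produce.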
Point-biserial correlation coefficients ($r_{pb}$) were calculated between
Salmonella growth indicated by culture results (+1 and −1 for presence
and absence, respectively) and observed relative abundance from total
RNA sequencing results using the R package ltm. The point-biserial
correlation is a special case of the Pearson correlation that is better suited
for a binary variable, e.g., when Salmonella is reported as present or absent
(a sample's Salmonella status).
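The point-biserial coefficient can be written directly from the two group means and the overall standard deviation. The sketch below is a plain-Python illustration and is not the ltm implementation used in the study. It uses the +1 and −1 coding from the text and assumes that both groups are non-empty and that the abundances are not all identical.

```python
import math


def point_biserial(status, abundances):
    """Point-biserial correlation between a +1/-1 status and a numeric variable."""
    present = [a for s, a in zip(status, abundances) if s == 1]
    absent = [a for s, a in zip(status, abundances) if s != 1]
    n1, n0, n = len(present), len(absent), len(abundances)
    mean_all = sum(abundances) / n
    # Population standard deviation of all abundances.
    sd = math.sqrt(sum((a - mean_all) ** 2 for a in abundances) / n)
    mean_diff = sum(present) / n1 - sum(absent) / n0
    return mean_diff / sd * math.sqrt(n1 * n0 / n ** 2)
```

For statuses `[1, 1, -1, -1]` and abundances `[3, 4, 1, 2]` the coefficient is about 0.894, the same value the Pearson formula gives for these two vectors.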
All high protein powder (HPP) poultry meal sequences are available through the
100K Pathogen Genome Project (PRJNA186441) in the NCBI BioProject (see
Supplementary Table 1 for a complete list of accession numbers).
The pipeline and microbial or matrix references were constructed from publicly
available tools and reference sequences as described in Methods. Automated
usability of this pipeline is available through membership in the Consortium for
Sequencing the Food Supply Chain.
Received: 14 January 2020; Accepted: 24 November 2020;
1. Kovac, J., Bakker, H. den, Carroll, L. M. & Wiedmann, M. Precision food safety: a
systems approach to food safety facilitated by genomics tools. TrAC Trends Anal.
Chem. (2017).
2. Weimer, B. C. et al. Defining the food microbiome for authentication, safety, and
process management. IBM J. Res. Dev. 60, 1 (2016).
3. Walsh, A. M. et al. Microbial succession and flavor production in the fermented
dairy beverage kefir. mSystems 1, e00052-16 (2016).
4. Walsh, A. M. et al. Species classifier choice is a key consideration when analysing
low-complexity food microbiome data. Microbiome 6, 50 (2018).
5. Duru, I. C. et al. Metagenomic and metatranscriptomic analysis of the microbial
community in Swiss-type Maasdam cheese during ripening. Int. J. Food Microbiol.
281, 10–22 (2018).
6. Martins, G. et al. Grape berry bacterial microbiota: Impact of the ripening process
and the farming system. Int. J. Food Microbiol. 158, 93–100 (2012).
7. Abdelfattah, A., Wisniewski, M., Droby, S. & Schena, L. Spatial and compositional
variation in the fungal communities of organic and conventionally grown apple
fruit at the consumer point-of-purchase. Hortic. Res. 3, 16047 (2016).
8. Williams, A. G., Choi, S.-C. & Banks, J. M. Variability of the species and strain
phenotype composition of the non-starter lactic acid bacterial population of
cheddar cheese manufactured in a commercial creamery. Food Res. Int. 35,
483–493 (2002).
9. Weimer, B. C. 100K pathogen genome project. Genome Announc. 5, e00594-17
10. Emond-Rheault, J.-G. et al. A Syst-OMICS approach to ensuring food safety and
reducing the economic burden of salmonellosis. Front. Microbiol. 8, 996 (2017).
11. Kaufman, J. H. et al. Insular microbiogeography: Three pathogens as exemplars.
Curr. Issues Mol. Biol. 36, 89–108 (2020).
12. Bashiardes, S., Zilberman-Schapira, G. & Elinav, E. Use of metatranscriptomics in
microbiome research. Bioinform. Biol. Insights 10, 19–25 (2016).
13. McGrath, K. C. et al. Isolation and analysis of mRNA from environmental microbial
communities. J. Microbiol. Methods 75, 172–176 (2008).
14. Cottier, F. et al. Advantages of meta-total RNA sequencing (MeTRS) over shotgun
metagenomics and amplicon-based sequencing in the profiling of complex
microbial communities. npj Biofilms Microbiomes 4, 2 (2018).
15. Macklaim, J. M. et al. Comparative meta-RNA-seq of the vaginal microbiota and
differential expression by Lactobacillus iners in health and dysbiosis. Microbiome
1, 12 (2013).
16. Haiminen, N. et al. Food authentication from shotgun sequencing reads with an
application on high protein powders. npj Sci. Food 3, 1–11 (2019).
17. Lakin, S. M. et al. MEGARes: an antimicrobial resistance database for high
throughput sequencing. Nucleic Acids Res. 45, D574–D580 (2016).
18. Noyes, N. R. et al. Resistome diversity in cattle and the environment decreases
during beef production. eLife 5, e13195 (2016).
19. Ganesan, B., Dobrowolski, P. & Weimer, B. C. Identification of the leucine-to-2-
methylbutyric acid catabolic pathway of Lactococcus lactis. Appl. Environ.
Microbiol. 72, 4264–4273 (2006).
20. Ganesan, B., Seefeldt, K., Koka, R. C., Dias, B. & Weimer, B. C. Monocarboxylic acid
production by lactococci and lactobacilli. Int. Dairy J. 14, 237–246 (2004).
21. Ganesan, B., Seefeldt, K. & Weimer, B. C. Fatty acid production from amino acids
and α-keto acids by Brevibacterium linens BL2. Appl. Environ. Microbiol. 70,
6385–6393 (2004).
22. Ganesan, B., Stuart, M. R. & Weimer, B. C. Carbohydrate starvation causes a
metabolically active but nonculturable state in Lactococcus lactis. Appl. Environ.
Microbiol. 73, 2498–2512 (2007).
23. Ganesan, B. et al. Probiotic bacteria survive in Cheddar cheese and modify
populations of other lactic acid bacteria. J. Appl. Microbiol. 116, 1642–1656 (2014).
24. Ganesan, B. & Weimer, B. C. Cheese: Chemistry, Physics, and Microbiology (Elsevier,
25. Sheflin, A. M., Melby, C. L., Carbonero, F. & Weir, T. L. Linking dietary patterns with
gut microbial composition and function. Gut Microbes 8, 113–129 (2017).
26. McDonald, D. et al. American gut: an open platform for citizen science
microbiome research. mSystems 3, e00031-18 (2018).
27. Clemente, J. C., Ursell, L. K., Parfrey, L. W. & Knight, R. The impact of the gut
microbiota on human health: an integrative view. Cell 148, 1258–1270 (2012).
28. Richards, J. L., Yap, Y. A., McLeod, K. H., Mackay, C. R. & Mariño, E. Dietary
metabolites and the gut microbiota: an alternative approach to control
inflammatory and autoimmune diseases. Clin. Trans. Immunol. 5, e82 (2016).
29. Yang, X. et al. Use of metagenomic shotgun sequencing technology to detect
foodborne pathogens within the microbiome of the beef production chain. Appl.
Env. Microbiol. 82, 2433–2443 (2016).
30. Hofacre, C. L. et al. Characterization of antibiotic-resistant bacteria in rendered
animal products. Avian Dis. 45, 953–961 (2001).
31. Gloor, G. B., Macklaim, J. M., Pawlowsky-Glahn, V. & Egozcue, J. J. Microbiome
datasets are compositional: and this is not optional. Front. Microbiol. 8, 2224 (2017).
32. Gloor, G. B. & Reid, G. Compositional analysis: a valid approach to analyze
microbiome high-throughput sequencing data. Can. J. Microbiol. 62, 692–703
33. Langelier, C. et al. Integrating host response and unbiased microbe detection for
lower respiratory tract infection diagnosis in critically ill adults. Proc. Natl Acad.
Sci. USA 115, E12353E12362 (2018).
34. Ilott, N. E. et al. Dening the microbial transcriptional response to colitis through
integrated host and microbiome proling. ISME J. 10, 23892404 (2016).
35. Ripp, F. et al. All-Food-Seq (AFS): a quantiable screen for species in biological
samples by deep DNA sequencing. BMC Genomics 15, 639 (2014).
36. Lee, A. Y., Lee, C. S. & Gelder, R. N. Van. Scalable metagenomics alignment
research tool (SMART): a scalable, rapid, and complete search heuristic for the
classication of metagenomic sequences from complex sequence populations.
BMC Bioinforma. 17, 292 (2016).
37. Wood, D. E. & Salzberg, S. L. Kraken: ultrafast metagenomic sequence classica-
tion using exact alignments. Genome Biol. 15, R46 (2014).
38. Hug, L. A. et al. A new view of the tree of life. Nat. Microbiol. 1, 16048 (2016).
39. Wu, D. et al. A phylogeny-driven genomic encyclopaedia of bacteria and archaea.
Nature 462, 1056–1060 (2009).
40. Kyrpides, N. C. et al. Genomic encyclopedia of bacteria and archaea: sequencing a
myriad of type strains. PLoS Biol. 12, e1001920 (2014).
41. Kyrpides, N. C., Eloe-Fadrosh, E. A. & Ivanova, N. N. Microbiome data science:
understanding our microbial planet. Trends Microbiol. 24, 425–427 (2016).
42. Thompson, L. R. et al. A communal catalogue reveals Earth’s multiscale microbial
diversity. Nature 551, 457 (2017).
43. Nayfach, S. & Pollard, K. S. Toward accurate and quantitative comparative
metagenomics. Cell 166, 1103–1116 (2016).
44. Satopaa, V., Albrecht, J., Irwin, D. & Raghavan, B. Finding a “Kneedle” in a Haystack:
Detecting knee points in system behavior. In 31st International Conference
on Distributed Computing Systems Workshops, Minneapolis, USA, 20–24 June 2011,
pp 166–171 (2011).
45. Aitchison, J., Barceló-Vidal, C., Martín-Fernández, J. A. & Pawlowsky-Glahn, V.
Logratio analysis and compositional distance. Math. Geol. 32, 271–275 (2000).
46. Di Palma, M. A. & Gallo, M. A co-median approach to detect compositional
outliers. J. Appl. Stat. 43, 2348–2362 (2016).
47. Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat.
Methods 9, 357–359 (2012).
48. Kong, N. et al. Draft genome sequences of 1,183 Salmonella strains from the 100K
pathogen genome project. Genome Announc. 5, e00518-17 (2017).
49. Tubulekas, I. & Hughes, D. A single amino acid substitution in elongation factor
Tu disrupts interaction between the ternary complex and the ribosome. J. Bacteriol.
175, 240–250 (1993).
50. Seabolt, E. et al. IBM functional genomics platform, a cloud-based platform for
studying microbial life at scale. IEEE/ACM Trans. Comput. Biol. Bioinforma. 1–1 (2020).
51. Zelezniak, A. et al. Metabolic dependencies drive species co-occurrence in diverse
microbial communities. Proc. Natl Acad. Sci. USA 112, 6449–6454 (2015).
52. Jones, M. B. et al. Library preparation methodology can influence genomic and
functional predictions in human microbiome research. Proc Natl Acad Sci USA. (2015).
53. Pollock, J., Glendinning, L., Wisedchanwet, T. & Watson, M. The madness of
microbiome: attempting to find consensus “best practice” for 16S microbiome
studies. Appl. Environ. Microbiol. AEM.02627-17 (2018).
54. Browne, H. P. et al. Culturing of ‘unculturable’ human microbiota reveals novel
taxa and extensive sporulation. Nature 533, 543–546 (2016).
55. Eilers, H., Pernthaler, J., Glöckner, F. O. & Amann, R. Culturability and in situ
abundance of pelagic bacteria from the North Sea. Appl. Environ. Microbiol. 66,
3044–3051 (2000).
56. Hinchliff, C. E. et al. Synthesis of phylogeny and taxonomy into a comprehensive
tree of life. Proc. Natl Acad. Sci. USA 112, 12764–12769 (2015).
57. Knight, R. et al. Best practices for analysing microbiomes. Nat. Rev. Microbiol. 16,
410–422 (2018).
58. Weis, A. M. et al. Genomic comparison of Campylobacter spp. and their potential
for zoonotic transmission between birds, primates, and livestock. Appl. Environ.
Microbiol. 82, 7165–7175 (2016).
59. Miller, B. et al. A novel, single-tube enzymatic fragmentation and library con-
struction method enables fast turnaround times and improved data quality for
microbial whole-genome sequencing. Kapa Biosyst. Appl. Note 18. https://doi.
org/10.13140/RG.2.1.4534.3440 (2015).
60. Lüdeke, C. H. M., Kong, N., Weimer, B. C., Fischer, M. & Jones, J. L. Complete
genome sequences of a clinical isolate and an environmental isolate of Vibrio
parahaemolyticus. Genome Announc. 3, e00216-15 (2015).
61. Jeannotte, R. et al. High-throughput analysis of foodborne bacterial genomic
DNA using Agilent 2200 TapeStation and genomic DNA ScreenTape system. Agil.
Appl. Note 18. (2015).
62. Arabyan, N. et al. Salmonella degrades the host glycocalyx leading to altered
infection and glycan remodeling. Sci. Rep. 6, 1–11 (2016).
63. Krueger, F. TrimGalore: A wrapper around Cutadapt and FastQC to consistently
apply adapter and quality trimming to FastQ files, with extra functionality for
RRBS data. GitHub. Available online at:
TrimGalore (2018). Accessed 28 Jun 2018.
64. Mukherjee, S., Huntemann, M., Ivanova, N., Kyrpides, N. C. & Pati, A. Large-scale
contamination of microbial isolate genomes by Illumina PhiX control. Stand.
Genom. Sci. 10, 18 (2015).
65. Homer, N. DWGIM: Whole genome simulator for next-generation sequencing.
GitHub. (2011). Accessed 14 Jun 2017.
66. O’Leary, N. A. et al. Reference sequence (RefSeq) database at NCBI: current status,
taxonomic expansion, and functional annotation. Nucleic Acids Res. 44,
D733–D745 (2016).
67. Morgulis, A., Gertz, E. M., Schäffer, A. A. & Agarwala, R. A fast and symmetric DUST
implementation to mask low-complexity DNA sequences. J. Comput. Biol. 13,
1028–1040 (2006).
68. Templ, M., Hron, K. & Filzmoser, P. robCompositions: an R-package for Robust
Statistical Analysis of Compositional Data. In: Buccianti A. & Pawlowsky-Glahn V.
Compositional Data Analysis, John Wiley & Sons, Ltd, pp 341–355 (2011).
69. Andrews, S. FastQC: A quality control tool for high throughput sequence data.
Babraham Bioinformatics.
fastqc/ (2010). Accessed 01 Oct 2018.
70. Ewels, P., Magnusson, M., Lundin, S. & Käller, M. MultiQC: summarize analysis
results for multiple tools and samples in a single report. Bioinformatics 32,
3047–3048 (2016).
71. Andrews, W. H., Wang, H., Jacobson, A. & Hammack, T. Bacteriological analytical
manual (BAM) Chapter 5: Salmonella. In Bacteriological Analytical Manual U.S.
Food and Drug Administration (2018). Accessed 21 Jun 2019.
72. Grim, C. J. et al. High-resolution microbiome profiling for detection and tracking
of Salmonella enterica. Front. Microbiol. 8, 1587 (2017).
73. Rizopoulos, D. ltm: an R package for latent variable modeling and item response
theory analyses. J. Stat. Softw. 17, 1–25 (2006).
The authors would like to acknowledge the IBM Research Functional Genomics Platform
team (formerly OMXWare) for their data management support and availability for the
retrieval and processing of microbial genomes. This research project was financially
supported by the Consortium for Sequencing the Food Supply Chain. Funding for the
total RNA sequencing of high protein powder factory ingredients was provided by
Mars, Incorporated to B.C.W. with a specific interest in metagenomics of the food
K.L.B. and N.H. conceived of the experimental design, developed the approach,
completed and oversaw the experiments, performed analyses, and wrote the
paper and are represented as co-first authors; D.C., S.E., M.K., B.K., M.D., R.P., H.K.,
and E.S. developed the approach, analyzed the data, and revised the paper; B.C.H.
completed nucleic acid extraction method development and sequencing library
construction, and contributed to data analysis and writing; N.K. coordinated
sample collection and processing, nucleic acid extraction, and contributed to
writing; R.B. and P.M. conceived of the experimental design, developed the
approach, and reviewed the paper; B.G. contributed to the experimental design,
developed the approach, and wrote the paper; G.D., C.H.M., S.P., and A.Q.
participated in the conception of the experimental design and in the review of
the paper; L.P. conceived of the experiment, contributed to the data
analysis, and wrote the paper; J.H.K. conceived of the experiment, developed
the approach, and wrote the paper; B.C.W. conceived of the experimental design,
developed the approach, oversaw the experiments, performed analyses, and
wrote the paper.
The authors were employed by private or academic organizations as described in the
author affiliations at the time this work was completed. IBM Corporation, Mars
Incorporated, and Bio-Rad Laboratories are members of the Consortium for
Sequencing the Food Supply Chain. The authors declare no other competing interests.
Supplementary information is available for this paper at
Correspondence and requests for materials should be addressed to K.L.B. or B.C.W.
Reprints and permission information is available at
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims
in published maps and institutional affiliations.
Open Access This article is licensed under a Creative Commons
Attribution 4.0 International License, which permits use, sharing,
adaptation, distribution and reproduction in any medium or format, as long as you give
appropriate credit to the original author(s) and the source, provide a link to the Creative
Commons license, and indicate if changes were made. The images or other third party
material in this article are included in the article’s Creative Commons license, unless
indicated otherwise in a credit line to the material. If material is not included in the
article’s Creative Commons license and your intended use is not permitted by statutory
regulation or exceeds the permitted use, you will need to obtain permission directly
from the copyright holder. To view a copy of this license, visit http://creativecommons.
© The Author(s) 2021
K.L. Beck et al.
npj Science of Food (2021) 3 Published in partnership with Beijing Technology and Business University
... Since July 2012, the '100K Pathogen Genome Project' consisting of the United States Food and Drug Administration (US-FDA), Food Safety and Nutrition Administration and Agilent Technologies, at its center at UC Davis, has been in progress. This large-scale project aims to accumulate 100 000 food poisoning bacteria genome information and build a database by acquiring genome information of various food poisoning bacteria and viruses [45]. In addition, 'GenomeTrakr,' launched jointly by the FDA and Centers for Disease Control (CDC), is analyzing the genome information of various food poisoning bacteria such as Escherichia coli, Campylobacter species and Vibrio species [46]. ...
... However, it is hard to differentiate between live and dormant groups of microbiota [41]. Using ribonucleic acid sequencing (RNA-Seq) to record expressed RNA transcripts in an ecosystem at a given time under a given condition provides better knowledge about the active participants [45]. A combination of computational modeling and spectrometry affords a potent perspective on effectively conveyed proteins [46,47]. ...
Full-text available
There is currently a transformed interest toward understanding the impact of fermentation on functional food development due to growing consumer interest on modified health benefits of sustainable foods. In this review, we attempt to summarize recent findings regarding the impact of Next-generation sequencing and other bioinformatics methods in the food microbiome and use prediction software to understand the critical role of microbes in producing fermented foods. Traditionally, fermentation methods and starter culture development were considered conventional methods needing optimization to eliminate errors in technique and were influenced by technical knowledge of fermentation. Recent advances in high-output omics innovations permit the implementation of additional logical tactics for developing fermentation methods. Further, the review describes the multiple functions of the predictions based on docking studies and the correlation of genomic and metabolomic analysis to develop trends to understand the potential food microbiome interactions and associated products to become a part of a healthy diet.
... Library construction and sequencing for tissue RNA-seq Total RNA was extracted from the duodenum, jejunum, and ileum with standard protocols previously described (11). The resulting total RNA was analyzed for integrity using the Agilent 2100 Bioanalyzer system (Agilent Technologies) and samples with an integrity number > 7 were used for library construction as previously described (17,18). With an input of 2 µg of total RNA with the KAPA Stranded mRNA-Seq kit (Roche) and Agencourt R ...
Full-text available
Malnourishment is a risk factor for childhood mortality, jeopardizing the health of children by aggravating pneumonia/acute respiratory infections and diarrheal diseases. Malnourishment causes morphophysiological changes resulting in stunting and wasting that have long-lasting consequences such as cognitive deficit and metabolic dysfunction. Using a pig model of malnutrition, the interplay between the phenotypic data displayed by the malnourished animals, the gene expression pattern along the intestinal tract, microbiota composition of the intestinal contents, and hepatic metabolite concentrations from the same animals were correlated using a multi-omics approach. Samples from the duodenum, jejunum, and ileum of malnourished (protein and calorie-restricted diet) and full-fed (no dietary restrictions) piglets were subjected to RNA-seq. Gene co-expression analysis and phenotypic correlations were made with WGCNA, while the integration of transcriptome with microbiota composition and the hepatic metabolite profile was done using mixOmics. Malnourishment caused changes in tissue gene expression that influenced energetic balance, cell proliferation, nutrient absorption, and response to stress. Repression of antioxidant genes, including glutathione peroxidase, in coordination with induction of metal ion transporters corresponded to the hepatic metabolite changes. These data indicate oxidative stress in the intestine of malnourished animals. Furthermore, several of the phenotypes displayed by these animals could be explained by changes in gene expression.
... The high-throughput shotgun sequencing of environmental samples is now the best approach to generate comprehensive genomic data about food safety or spoilage-related microorganisms (bacterial, eukaryotic, viral) in a microbial community [47][48][49] . Such data generated and accumulated over time provide valuable information about each microbial community and its members with a time-place stamp. ...
Full-text available
The development and application of modern sequencing technologies have led to many new improvements in food safety and public health. With unprecedented resolution and big data, high-throughput sequencing (HTS) has enabled food safety specialists to sequence marker genes, whole genomes, and transcriptomes of microorganisms almost in real-time. These data reveal not only the identity of a pathogen or an organism of interest in the food supply but its virulence potential and functional characteristics. HTS of amplicons, allow better characterization of the microbial communities associated with food and the environment. New and powerful bioinformatics tools, algorithms, and machine learning allow for development of new models to predict and tackle important events such as foodborne disease outbreaks. Despite its potential, the integration of HTS into current food safety systems is far from complete. Government agencies have embraced this new technology, and use it for disease diagnostics, food safety inspections, and outbreak investigations. However, adoption and application of HTS by the food industry have been comparatively slow, sporadic, and fragmented. Incorporation of HTS by food manufacturers in their food safety programs could reinforce the design and verification of effectiveness of control measures by providing greater insight into the characteristics, origin, relatedness, and evolution of microorganisms in our foods and environment. Here, we discuss this new technology, its power, and potential. A brief history of implementation by public health agencies is presented, as are the benefits and challenges for the food industry, and its future in the context of food safety.
... (continued on next page) food matrix composition, in microbiome shifts and the resulting ecological succession, therefore, uniquely and precisely trace the source and point of contamination (Beck et al., 2021;Yeung, 2012). Advanced predictive models derived by incorporating molecular and genetic variability on a single cell level as well as the community level enhances the accuracy of prediction and risk management (Plaza-Rodríguez et al., 2018). ...
Full-text available
With the increasing consumption of packaged and ready-to-eat food products, the risk of foodborne illness has drastically increased and so has the dire need for proper management. The conventional Microbial Risk Assessment (MRA) investigations require prior knowledge of process flow, exposure, and hazard assessment throughout the supply chain. These data are often generated using conventional microbiological approaches based either on shelf-life studies or specific spoilage organisms (SSOs), frequently overlooking crucial information such as antimicrobial resistance (AMR), biofilm formation, virulence factors and other physiological variations coupled with bio-chemical characteristics of food matrix. Additionally, the microbial risks in food are diverse and heterogenous, that might be an outcome of growth and activity of multiple microbial populations rather than a single species contamination. The uncertainty on the microbial source, time as well as point of entry into the food supply chain poses a constraint to the efficiency of preventive approaches and conventional MRA. In the last few decades, significant breakthroughs in molecular methods and continuously progressing bioinformatics tools have opened up a new horizon for risk analysis-based approaches in food safety. Real time polymerase chain reaction (qPCR) and kit-based assays provide better accuracy and precision with shorter processing time. Despite these improvements, the effect of complex food matrix on growth environment and recovery of pathogen is a persistent problem for risk assessors. The dairy industry is highly impacted by spoilage and pathogenic microorganisms. Therefore, this review discusses the evolution and recent advances in MRA methodologies equipped with predictive interventions and “multi-omics” approach for robust MRA specifically targeting dairy products. It also highlights the limiting gap area and the opportunity for improvement in this field to ensure precision food safety.
... Compared to other human-related environments, food microbiomes, including their resistance potential, are rarely investigated [10]. The microbiome of food is a decisive factor for food quality [11], shelf life [12] and fermentation process [13], and already highlighted as the "missing link" in food safety policies and standards [14]. ...
Full-text available
Background A detailed understanding of antimicrobial resistance trends among all human-related environments is key to combat global health threats. In food science, however, the resistome is still little considered. Here, we studied the apple microbiome and resistome from different cultivars (Royal Gala and Braeburn) and sources (freshly harvested in South Africa and exported apples in Austrian supermarkets) by metagenomic approaches, genome reconstruction and isolate sequencing. Results All fruits harbor an indigenous, versatile resistome composed of 132 antimicrobial resistance genes (ARGs) encoding for 19 different antibiotic classes. ARGs are partially of clinical relevance and plasmid-encoded; however, their abundance within the metagenomes is very low (≤ 0.03%). Post-harvest, after intercontinental transport, the apple microbiome and resistome was significantly changed independently of the cultivar. In comparison to fresh apples, the post-harvest microbiome is characterized by higher abundance of Enterobacteriales, and a more diversified pool of ARGs, especially associated with multidrug resistance, as well as quinolone, rifampicin, fosfomycin and aminoglycoside resistance. The association of ARGs with metagenome-assembled genomes (MAGs) suggests resistance interconnectivity within the microbiome. Bacterial isolates of the phyla Gammaproteobacteria , Alphaproteobacteria and Actinobacteria served as representatives actively possessing multidrug resistance and ARGs were confirmed by genome sequencing. Conclusion Our results revealed intrinsic and potentially acquired antimicrobial resistance in apples and strengthen the argument that all plant microbiomes harbor diverse resistance features. Although the apple resistome appears comparatively inconspicuous, we identified storage and transport as potential risk parameters to distribute AMR globally and highlight the need for surveillance of resistance emergence along complex food chains.
... As the ecological importance for microbial communities is revealed, there is increasing interest in manipulating the microbiota [30] and its "theater of activity"-the Microbiome [31]-that is expected to revolutionize personalized medicine [32,33], agriculture [34], food production [35,36], and numerous other processes that microbiomes impact. A bottleneck to such efforts is that resolving biological interactions between viruses and their microbial hosts and characterizing virus phenotypes beyond genomes in complex communities is not keeping up with the blistering pace of virus discovery. ...
Full-text available
Viral metagenomics (viromics) has reshaped our understanding of DNA viral diversity, ecology, and evolution across Earth’s ecosystems. However, viromics now needs approaches to link newly discovered viruses to their host cells and characterize them at scale. This study adapts one such method, sequencing-enabled viral tagging (VT), to establish “Viral Tag and Grow” (VT + Grow) to rapidly capture and characterize viruses that infect a cultivated target bacterium, Pseudoalteromonas. First, baseline cytometric and microscopy data improved understanding of how infection conditions and host physiology impact populations in VT flow cytograms. Next, we extensively evaluated “and grow” capability to assess where VT signals reflect adsorption alone or wholly successful infections that lead to lysis. Third, we applied VT + Grow to a clonal virus stock, which, coupled to traditional plaque assays, revealed significant variability in burst size—findings that hint at a viral “individuality” parallel to the microbial phenotypic heterogeneity literature. Finally, we established a live protocol for public comment and improvement via to maximally empower the research community. Together these efforts provide a robust foundation for VT researchers, and establish VT + Grow as a promising scalable technology to capture and characterize viruses from mixed community source samples that infect cultivable bacteria.
... Whole metagenome sequencing (WMS) allows scanning for several species simultaneously even when these are present in a small quantity in a food matrix [111]. This approach is widely used in the food security sector to identify and characterize complex microbial communities in food samples [112]. An important advantage of using WMS in food-borne hurtful microbial detection is the possibility of also detecting non-culturable pathogens; moreover, the production of draft genome sequences of the bacteria responsible for food-borne alerts is also possible, allowing for the identification of contamination sources [113]. ...
Full-text available
In the last decades, the demand for molecular tools for authenticating and tracing agri-food products has significantly increased. Food safety and quality have gained an increased interest for consumers, producers, and retailers, therefore, the availability of analytical methods for the determination of food authenticity and the detection of major adulterations takes on a fundamental role. Among the different molecular approaches, some techniques such as the molecular markers-based methods are well established, while some innovative approaches such as isothermal amplification-based methods and DNA metabarcoding have only recently found application in the agri-food sector. In this review, we provide an overview of the most widely used molecular techniques for fresh and processed agri-food authentication and traceability, showing their recent advances and applications and discussing their main advantages and limitations. The application of these techniques to agri-food traceability and authentication can contribute a great deal to the reassurance of consumers in terms of transparency and food safety and may allow producers and retailers to adequately promote their products.
The distinct microbial diversity present in fermented foods influencing flavor profile is now commonly referred to as microbial terroir. Understanding how microbial communities develop in fermented food is important because it can explain how different flavor profiles develop and how community stability leads to food preservation. Using a common DNA-sequencing approach to characterize the microbial communities that developed in eight fermented food products, we show that fermentation is primarily influenced by the main ingredient being cooked. Moreover, we found that each fermented food group harbored microbial communities similar to those previously described in traditional cooking styles. Thus, our study not only provides methodologies to characterize the microbial terroir signature of fermented foods in a professional kitchen but also enables us to understand further the value of local fermented food in our culinary journey.
Full-text available
The aim of study was to determine the occurrence of virulence factors and virulence-related genes among enterococci isolated from food of animal origin and effects of osmotic and high pressure stress on expression of virulence-related genes. The number of 78 isolates were analyzed. None of them showed a strong ability to form biofilm, 38.5% (n = 30) had the slime production ability, 41% (n = 32) had gelatinase activity, γ -type hemolysis was observed in 55% of isolates, and α-type hemolysis in 45%. All of the isolates carried 1–13 virulence-related genes. The most common genes were gelE (85.9%), sprE (78.2%) and asa1 (75.6%). There were also observed changes in the expression of the gelE, esp, asa1 and cylL genes in response to various NaCl concentration and high pressure processing. Results obtained in this study indicate that enterococci isolated from food may act as reservoirs of virulence genes. The presence of virulence factors among enterococci, especially the ability to biofilm formation is important for food safety and the protection of public health. The results presented in our work demonstrate that stress that can occur during food preservation and food processing can induce the changes in the virulence-related genes expression.
Full-text available
Untargeted sequencing of nucleic acids present in food can inform the detection of food safety and origin, as well as product tampering and mislabeling issues. The application of such technologies to food analysis may reveal valuable insights that are simply unobtainable by targeted testing, leading to the efforts of applying such technologies in the food industry. However, before these approaches can be applied, it is imperative to verify that the most appropriate methods are used at every step of the process: Gathering of primary material, laboratory methods, data analysis, and interpretation. The focus of this study is on gathering the primary material, in this case, DNA. We used bovine milk as a model to (i) evaluate commercially available kits for their ability to extract nucleic acids from inoculated bovine milk, (ii) evaluate host DNA depletion methods for use with milk, and (iii) develop and evaluate a selective lysis-propidium monoazide (PMA)-based protocol for host DNA depletion in milk. Our results suggest that magnetically based nucleic acid extraction methods are best for nucleic acid isolation of bovine milk. Removal of host DNA remains a challenge for untargeted sequencing of milk, highlighting the finding that the individual matrix characteristics should always be considered in food testing. Some reported methods introduce bias against specific types of microbes, which may be particularly problematic in food safety, where the detection of Gram-negative pathogens and hygiene indicators is essential. Continuous efforts are needed to develop and validate new approaches for untargeted metagenomics in samples with large amounts of DNA from a single host.
Full-text available
Here we propose that using shotgun sequencing to examine food leads to accurate authentication of ingredients and detection of contaminants. To demonstrate this, we developed a bioinformatic pipeline, FASER (Food Authentication from SEquencing Reads), designed to resolve the relative composition of mixtures of eukaryotic species using RNA or DNA sequencing. Our comprehensive database includes >6000 plants and animals that may be present in food. FASER accurately identified eukaryotic species with 0.4% median absolute difference between observed and expected proportions on sequence data from various sources including sausage meat, plants, and fish. FASER was applied to 31 high protein powder raw factory ingredient total RNA samples. The samples mostly contained the expected source ingredient, chicken, while three samples unexpectedly contained pork and beef. Our results demonstrate that DNA/RNA sequencing of food ingredients, combined with a robust analysis, can be used to find contaminants and authenticate food ingredients in a single assay.
Full-text available
Traditional taxonomy in biology assumes that life is organized in a simple tree. Attempts to classify microorganisms in this way in the genomics era led microbiologists to look for finite sets of 'core' genes that uniquely group taxa as clades in the tree. However, the diversity revealed by large-scale whole genome sequencing is calling into question the long-held model of a hierarchical tree of life, which leads to questioning of the definition of a species. Large-scale studies of microbial genome diversity reveal that the cumulative number of new genes discovered increases with the number of genomes studied as a power law and subsequently leads to the lack of evidence for a unique core genome within closely related organisms. Sampling 'enough' new genomes leads to the discovery of a replacement or alternative to any gene. This power law behaviour points to an underlying self-organizing critical process that may be guided by mutation and niche selection. Microbes in any particular niche exist within a local web of organism interdependence known as the microbiome. The same mechanism that underpins the macro-ecological scaling first observed by MacArthur and Wilson also applies to microbial communities. Recent metagenomic studies of a food microbiome demonstrate the diverse distribution of community members, but also genotypes for a single species within a more complex community. Collectively, these results suggest that traditional taxonomic classification of bacteria could be replaced with a quasispecies model. This model is commonly accepted in virology and better describes the diversity and dynamic exchange of genes that also hold true for bacteria. This model will enable microbiologists to conduct population-scale studies to describe microbial behaviour, as opposed to a single isolate as a representative.
Full-text available
Lower respiratory tract infections (LRTIs) lead to more deaths each year than any other infectious disease category. Despite this, etiologic LRTI pathogens are infrequently identified due to limitations of existing microbiologic tests. In critically ill patients, noninfectious inflammatory syndromes resembling LRTIs further complicate diagnosis. To address the need for improved LRTI diagnostics, we performed metagenomic next-generation sequencing (mNGS) on tracheal aspirates from 92 adults with acute respiratory failure and simultaneously assessed pathogens, the airway microbiome, and the host transcriptome. To differentiate pathogens from respiratory commensals, we developed a rules-based model (RBM) and logistic regression model (LRM) in a derivation cohort of 20 patients with LRTIs or noninfectious acute respiratory illnesses. When tested in an independent validation cohort of 24 patients, both models achieved accuracies of 95.5%. We next developed pathogen, microbiome diversity, and host gene expression metrics to identify LRTI-positive patients and differentiate them from critically ill controls with noninfectious acute respiratory illnesses. When tested in the validation cohort, the pathogen metric performed with an area under the receiver-operating curve (AUC) of 0.96 (95% CI, 0.86–1.00), the diversity metric with an AUC of 0.80 (95% CI, 0.63–0.98), and the host transcriptional classifier with an AUC of 0.88 (95% CI, 0.75–1.00). Combining these achieved a negative predictive value of 100%. This study suggests that a single streamlined protocol offering an integrated genomic portrait of pathogen, microbiome, and host transcriptome may hold promise as a tool for LRTI diagnosis.
Complex microbial communities shape the dynamics of various environments, ranging from the mammalian gastrointestinal tract to the soil. Advances in DNA sequencing technologies and data analysis have provided drastic improvements in microbiome analyses, for example, in taxonomic resolution, false discovery rate control and other properties, over earlier methods. In this Review, we discuss the best practices for performing a microbiome study, including experimental design, choice of molecular analysis technology, methods for data analysis and the integration of multiple omics data sets. We focus on recent findings that suggest that operational taxonomic unit-based analyses should be replaced with new methods that are based on exact sequence variants, methods for integrating metagenomic and metabolomic data, and issues surrounding compositional data analysis, where advances have been particularly rapid. We note that although some of these approaches are new, it is important to keep sight of the classic issues that arise during experimental design and relate to research reproducibility. We describe how keeping these issues in mind allows researchers to obtain more insight from their microbiome data sets.
In Swiss-type cheeses, characteristic nut-like and sweet flavor develops during the cheese ripening due to the metabolic activities of cheese microbiota. Temperature changes during warm and cold room ripening, and duration of ripening can significantly change the gene expression of the cheese microbiota, which can affect the flavor formation. In this study, a metagenomic and metatranscriptomic analysis of Swiss-type Maasdam cheese was performed on samples obtained during ripening in the warm and cold rooms. We reconstructed four different bacterial genomes (Lactococcus lactis, Lactobacillus rhamnosus, Lactobacillus helveticus, and Propionibacterium freudenreichii subsp. shermanii strain JS) from the Maasdam cheese to near completeness. Based on the DNA and RNA mean coverage, Lc. lactis strongly dominated (~80-90%) within the cheese microbial community. Genome annotation showed the potential for the presence of several flavor forming pathways in these species, such as production of methanethiol, free fatty acids, acetoin, diacetyl, acetate, ethanol, and propionate. Using the metatranscriptomic data, we showed that, with the exception of Lc. lactis, the central metabolism of the microbiota was downregulated during cold room ripening suggesting that fewer flavor compounds such as acetoin and propionate were produced. In contrast, Lc. lactis genes related to the central metabolism, including the vitamin biosynthesis and homolactic fermentation, were upregulated during cold room ripening.
We show that a citizen science, self-selected cohort shipping samples through the mail at room temperature recaptures many known microbiome results from clinically collected cohorts and reveals new ones. Of particular interest is integrating n = 1 study data with the population data, showing that the extent of microbiome change after events such as surgery can exceed differences between distinct environmental biomes, and the effect of diverse plants in the diet, which we confirm with untargeted metabolomics on hundreds of samples. Although much work has linked the human microbiome to specific phenotypes and lifestyle variables, data from different projects have been challenging to integrate and the extent of microbial and molecular diversity in human stool remains unknown. Using standardized protocols from the Earth Microbiome Project and sample contributions from over 10,000 citizen-scientists, together with an open research network, we compare human microbiome specimens primarily from the United States, United Kingdom, and Australia to one another and to environmental samples. Our results show an unexpected range of beta-diversity in human stool microbiomes compared to environmental samples; demonstrate the utility of procedures for removing the effects of overgrowth during room-temperature shipping for revealing phenotype correlations; uncover new molecules and kinds of molecular communities in the human stool metabolome; and examine emergent associations among the microbiome, metabolome, and the diversity of plants that are consumed (rather than relying on reductive categorical variables such as veganism, which have little or no explanatory power). We also demonstrate the utility of the living data resource and cross-cohort comparison to confirm existing associations between the microbiome and psychiatric illness and to reveal the extent of microbiome change within one individual during surgery, providing a paradigm for open microbiome research and education.
Background: The use of shotgun metagenomics to analyse low-complexity microbial communities in foods has the potential to be of considerable fundamental and applied value. However, there is currently no consensus with respect to choice of species classification tool, platform, or sequencing depth. Here, we benchmarked the performances of three high-throughput short-read sequencing platforms, the Illumina MiSeq, NextSeq 500, and Ion Proton, for shotgun metagenomics of food microbiota. Briefly, we sequenced six kefir DNA samples and a mock community DNA sample, the latter constructed by evenly mixing genomic DNA from 13 food-related bacterial species. A variety of bioinformatic tools were used to analyse the data generated, and the effects of sequencing depth on these analyses were tested by randomly subsampling reads. Results: Compositional analysis results were consistent between the platforms at divergent sequencing depths. However, we observed pronounced differences in the predictions from species classification tools. Indeed, PERMANOVA indicated that there were no significant differences between the compositional results generated by the different sequencers (p = 0.693, R² = 0.011), but there was a significant difference between the results predicted by the species classifiers (p = 0.01, R² = 0.127). The relative abundances predicted by the classifiers, apart from MetaPhlAn2, were apparently biased by reference genome sizes. Additionally, we observed varying false-positive rates among the classifiers. MetaPhlAn2 had the lowest false-positive rate, whereas SLIMM had the greatest false-positive rate. Strain-level analysis results were also similar across platforms. Each platform correctly identified the strains present in the mock community, but accuracy was improved slightly with greater sequencing depth. Notably, PanPhlAn detected the dominant strains in each kefir sample above 500,000 reads per sample. Again, the outputs from functional profiling analysis using SUPER-FOCUS were generally accordant between the platforms at different sequencing depths. Finally, and expectedly, metagenome assembly completeness was significantly lower on the MiSeq than either on the NextSeq (p = 0.03) or the Proton (p = 0.011), and it improved with increased sequencing depth. Conclusions: Our results demonstrate a remarkable similarity in the results generated by the three sequencing platforms at different sequencing depths, and, in fact, the choice of bioinformatics methodology had a more evident impact on results than the choice of sequencer did.
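The random subsampling used above to test sequencing-depth effects can be sketched as follows. This is an illustrative assumption about how such subsampling might be done, not the cited study's tooling; `subsample_reads`, the read identifiers, and the depths are all hypothetical.

```python
import random

def subsample_reads(reads, depth, seed=0):
    """Draw `depth` reads without replacement, reproducibly for a given seed."""
    if depth > len(reads):
        raise ValueError("requested depth exceeds the number of available reads")
    rng = random.Random(seed)  # local generator so global random state is untouched
    return rng.sample(reads, depth)

# Hypothetical read identifiers standing in for a real sequencing run.
all_reads = [f"read_{i}" for i in range(1000)]

# Simulate progressively shallower sequencing of the same sample.
subsamples = {depth: subsample_reads(all_reads, depth, seed=42)
              for depth in (100, 250, 500)}
```

Fixing the seed keeps each simulated depth reproducible, so differences between downstream results can be attributed to depth rather than to the random draw.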
The development and continuous improvement of high-throughput sequencing platforms has stimulated interest in the study of complex microbial communities. Currently, the most popular sequencing approach to study microbial community composition and dynamics is targeted 16S rRNA gene metabarcoding. To prepare samples for sequencing, there are a variety of processing steps, each with the potential to introduce bias at the data analysis stage. In this short review, key information from the literature pertaining to each processing step is described and consequently, general recommendations for future 16S rRNA gene metabarcoding experiments are made.
Sequencing-based microbiome profiling aims at detecting and quantifying individual members of a microbial community in a culture-independent manner. While amplicon-based sequencing (ABS) of bacterial or fungal ribosomal DNA is the most widely used technology due to its low cost, it suffers from PCR amplification biases that hinder accurate representation of microbial population structures. Shotgun metagenomics (SMG) conversely allows unbiased microbiome profiling but requires high sequencing depth. Here we report the development of a meta-total RNA sequencing (MeTRS) method based on shotgun sequencing of total RNA and benchmark it on a human stool sample spiked in with known abundances of bacterial and fungal cells. MeTRS displayed the highest overall sensitivity and linearity for both bacteria and fungi, the greatest reproducibility compared to SMG and ABS, while requiring a ~20-fold lower sequencing depth than SMG. We therefore present MeTRS as a valuable alternative to existing technologies for large-scale profiling of complex microbiomes.
The rapid growth in biological sequence data is revolutionizing our understanding of genotypic diversity and challenging conventional approaches to informatics. Due to the increasing available genomic data, traditional bioinformatic tools require substantial computational time and the creation of ever-larger indices each time a researcher seeks to gain insight from the data. To address this, we pre-computed important relationships between biological entities spanning the Central Dogma of Molecular Biology and captured this information in a relational database. The database can be queried across hundreds of millions of entities and returns results in a fraction of the time required by traditional methods. We describe IBM Functional Genomics Platform, a comprehensive database relating genotype to phenotype for bacterial life. Continually updated, the platform contains data derived from 200,000 curated, self-consistently assembled genomes. The database stores functional data for over 68 million genes, 52 million proteins, and 239 million domains with associated biological activity annotations from Gene Ontology, KEGG, MetaCyc, and Reactome. It maps the connections between each biological entity including the originating genome, gene, protein, and protein domain. We describe the data selection, the pipeline to create and update, and the developer tools.