
Defining the food microbiome for authentication, safety, and process management

Authors:
  • Mars Global Food Safety Center

Abstract

Under intense scrutiny for safety and authenticity, our food supply encompasses probiotic supplementation, fermentation organisms, pathogenic bacteria, and microbial toxins: in short, the microbiome and metabolome of food. Recent claims regarding probiotic supplements, additives, and cultured foods highlight the need for widely accepted protocols for evidence-based oversight of such products, as well as specific methods to assess their safety and authenticity. Rapid improvements in high-throughput sequencing technologies, curated and annotated reference databases of whole genome sequences, bacterial strain banks, and novel informatics techniques coupled to a scalable computing platform are poised to provide a robust solution that can be extended to the systematic authentication of the microbiome and its variations up and down the supply chain. Members of the Sequence the Food Supply Chain Consortium are working to characterize and quantify the microbiome at baseline and after processing. They are also working to create reference databases and to develop a Metagenomics Computation and Analytics Workbench capable of verifying the effectiveness of good manufacturing practices and monitoring the control measures highlighted in a site's Hazard Analysis Critical Control Point plan. In this paper, we argue that microbial ecology, evolvability, and phylogenetic diversity call for the application of new molecular techniques to assure the safety, authenticity, and traceability of wholesome food.
B. C. Weimer, D. B. Storey, C. A. Elkins, R. C. Baker, P. Markwell, D. D. Chambliss, S. B. Edlund, J. H. Kaufman
Introduction
According to the Food and Agriculture Organization [1], by 2050 the world's food supply must grow by 70% to meet the increase in the world population [2]. That growth is projected to occur primarily in tropical zones, where agriculture is largely still developing and where growing conditions differ from those in the current centers of food production. This will likely introduce new hazards into the food supply and could lead to large food safety and public health problems as these products are transported around the world. These regions also have fewer food safety laws, and this relative lack of regulation creates opportunities for fraud and for the addition of bacteria that are not normally part of today's food supply.
As our food supply continues to grow globally, the complexities of production, distribution, and consumption will scale as well. The industry's ability to measure and ensure that food and food ingredients are authentic and safe throughout the global food supply chain is paramount to guaranteeing that the global food supply is safe, well-regulated, and wholesome for consumers. New food safety regulations are being implemented globally as a result. In September 2015, for example, the United States published the final rules implementing the Food Safety Modernization Act, which enable the use of new analytical methods [3]. In October 2015, China implemented over 150 new food safety laws that also enable new analytical capacity, as well as a committee, with members from academia and industry, to oversee their implementation. Both examples reflect the recognition that expanding the global food supply must be anchored in safety and quality.
Digital Object Identifier: 10.1147/JRD.2016.2582598
B. C. WEIMER ET AL. IBM J. RES. & DEV. VOL. 60 NO. 5/6 PAPER 1 SEPTEMBER/NOVEMBER 2016
© Copyright 2016 by International Business Machines Corporation.
Advanced genomics-based techniques are now widely available around the world to meet the new analytical challenges of food safety. These include whole genome sequencing (WGS) and, more recently, metagenomic methods [4–6] (see Table 1 for a definition). Table 1 also defines other important terms commonly used in discussing genomics and metagenomics. These new technologies are often called "culture-independent" or "cultivation-independent" methods: molecular techniques that provide information on the presence, identity, and amounts of microbes or genes independently of cultivation media, which may favor the growth of particular microbes (thus biasing test results).
These technologies are gaining acceptance but are not yet widely implemented in the food supply chain. One recent case in point was an investigation of herbal dietary supplements by the New York State Attorney General's office, which found significant numbers of fraudulent and potentially dangerous herbal products for sale by major retailers [7]. In this effort, investigators used culture-independent genetic fingerprinting to assess product authenticity and to detect potential mislabeling and contamination; the conclusions of this investigation remain unclear. The approach was similar to the "genetic barcoding" currently used by the U.S. Food and Drug Administration (FDA) to investigate fraud in the seafood industry. In the same vein, the FDA is currently investigating and developing genomic methods for probiotic dietary supplements, involving microarrays, sequencing, and associated bioinformatic analysis, for product identification and labeling verification [8, 9]. These entirely new approaches are hampered primarily by the limited availability and taxonomic depth of whole genome sequences for the organisms and microbial communities associated with specific food types, by the effects of processing, and by the genetic mutations induced during processing. Implementing new methods for food testing is critical to meet the demands of supply chain globalization.
Table 1 A glossary of common terms and abbreviations used in discussions of genomics and sequencing.
Until recently, technologies were not sufficiently developed to provide a scientifically sound approach that could be integrated into reliable solutions that food producers or processors could use with confidence. Since 2011, however, many groups that produce and regulate food have been working to bring to bear fundamentally new technologies that can be used routinely to monitor the safety of the global food chain [6, 10]. Such efforts may provide the analytical capability to address food authenticity and safety together, especially in cases where live microbes are intentionally added and intended for consumption. A number of countries in the European Union, North America, and Asia are evaluating options for updating methods for testing, hazard identification, and molecular risk assessment, but all plan to implement NGS (next-generation DNA sequencing) for public health and food safety outbreak investigations [11]. Modernizing food safety methods throughout the supply chain should help food producers, ingredient distributors, processors, and manufacturers comply with international standards, and should provide the confidence, grounded in the most modern molecular methods available, that enables trade and safe consumption. Providing the means to quickly identify unsafe ingredients before they are integrated into the supply chain can save billions of dollars in revenue and minimize risks to consumers by ensuring that added beneficial microbes meet the claims on the product label. To this end, platforms that use NGS technology and data for food safety applications, with whole genomes as a database, have the potential to improve safety across the food supply. Community sequencing (termed metagenomics, as mentioned above) and culture-independent 16S amplicon sequencing (often used only to study microbial diversity) are also being developed on a smaller scale; both are limited by the cost of generating enough data to robustly determine microbial community membership within a sample. However, efforts to study the whole "metagenome" are growing, as evidenced by group efforts such as the Sequence the Food Supply Chain Consortium (SFSCC), described in greater detail later in this paper [12–14].
After reviewing the persistence of foodborne pathogens in the environment, we assess strategies to develop and employ metagenomics as an analytical tool, and we discuss how the same metagenomic techniques can validate probiotic content and activity and provide information on the authentication, food safety, and process issues critical to effective management of the global food supply. We also describe efforts to define and articulate the role that the evolving field of genomics and NGS analytics can play in monitoring and quality control. We conclude by describing the development of a scalable, web-based informatics software service for use in protecting public health.
Intersection of genomics and enteric foodborne pathogen burden
Despite efforts to reduce foodborne outbreaks, Salmonella, Campylobacter, enteropathogenic Escherichia coli, and Listeria monocytogenes outbreaks continue to occur. The largest number of cases and outbreaks arise from Vibrio, while Salmonella and Campylobacter infections have remained constant in recent years [15]. Listeria cases have recently been increasing per capita and, in the United States, are approaching levels seen during the 1990s [15]. These trends signal that the extensive measures taken in previous years to mitigate the persistence of pathogens are having limited impact in curtailing foodborne illness. In part, the globalization of the food chain brings about a new paradigm in microbial foodborne illness. Long-distance travel and expanded global distribution have fueled increased consumption of food produced in increasingly distant locations. One such example is the Salmonella outbreak in May 2015 affecting Washington, D.C., and nine states, attributed to sushi-grade tuna. The product originated in India and was routed through at least three ports before it was finally served in the Washington, D.C., area [16].
It is increasingly recognized that the persistence of pathogens in the environment leads to linked outbreaks over many years, as recently highlighted by a multi-year outbreak associated with chicken in the western United States. In this outbreak, caused by a single Salmonella serotype, at least seven genotypes persisted in the food supply for over five years [17].
Foodborne pathogens can adapt to multiple environments and animal hosts, which may influence their persistence in the food supply [18, 19]. These adaptations occur over very short time frames of days to weeks, not years. One example is found in Vibrio, where genome evolution, induced in response to low pH, high salt, and temperature shifts in the ocean, leads to local adaptation and increased persistence [20]. These same environmental factors are modified during food processing to control pathogens in the food supply and may lead to the evolution of resistant strains that could become persistent within the processing chain. Shapiro et al. [20] found that genome-scale changes occur at a much more rapid pace than was previously appreciated; they concluded that these changes are the basis for population-scale evolution and can be used, with NGS and WGS, to trace the origin of a specific strain.
Another example of this evolution is the worldwide emergence of hyper-virulent Salmonella during zoonotic transmission [21]. Using gene expression profiling, the Weimer lab (School of Veterinary Medicine, University of California, Davis) demonstrated that changes in an entire series of metabolic capabilities induced hyper-virulence in this organism. The trait is now stable and is being linked to genome evolution and mutation rate using NGS and methylome characterization [22]. Lastly, Salmonella enterica serotype Enteritidis is usually considered a clonal organism on the basis of commonly available typing methods, such as pulsed-field gel electrophoresis (PFGE). Recently, however, Salmonella Enteritidis isolates from food, the environment, and a sea mammal outbreak were sequenced using short-read technologies, revealing that this serotype is much more genetically diverse than previously appreciated and that specific lineages can be conclusively determined using NGS and WGS [23].
These examples demonstrate the rapid and consistent diagnostic power of NGS methods compared with conventional pathogen typing methods, including PFGE, Multi-Locus Sequence Typing (MLST), and Multiple Locus Variable-Number Tandem Repeat Analysis (MLVA). This realization presents food safety science with new challenges: implementing new methods that rapidly detect outbreaks, produce actionable information, and allow quick interventions during outbreaks.
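To make the contrast with band-based typing more concrete, the following minimal sketch computes SNP counts and the proportional (p-) distance between aligned core-genome sequences. It then groups isolates into putative lineages by single-linkage clustering under a SNP threshold. It assumes that a core-genome alignment has already been produced. The isolate names, toy sequences, and threshold are hypothetical, and real surveillance pipelines choose thresholds per organism and per investigation.

```python
from itertools import combinations

GAP_OR_AMBIGUOUS = set("-N")


def informative_pairs(a: str, b: str) -> list[tuple[str, str]]:
    """Aligned site pairs where neither sequence has a gap or an ambiguous base (N)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return [(x, y) for x, y in zip(a.upper(), b.upper())
            if x not in GAP_OR_AMBIGUOUS and y not in GAP_OR_AMBIGUOUS]


def snp_count(a: str, b: str) -> int:
    """Number of informative sites at which the two aligned sequences differ."""
    return sum(x != y for x, y in informative_pairs(a, b))


def p_distance(a: str, b: str) -> float:
    """Proportional (p-) distance: differing sites divided by informative sites."""
    pairs = informative_pairs(a, b)
    return sum(x != y for x, y in pairs) / len(pairs) if pairs else 0.0


def cluster_by_snps(alignment: dict[str, str], max_snps: int) -> list[set[str]]:
    """Single-linkage clusters: isolates within max_snps of any member share a cluster."""
    parent = {name: name for name in alignment}

    def find(name: str) -> str:
        # Union-find root lookup with path halving.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(alignment, 2):
        if snp_count(alignment[a], alignment[b]) <= max_snps:
            parent[find(a)] = find(b)

    clusters: dict[str, set[str]] = {}
    for name in alignment:
        clusters.setdefault(find(name), set()).add(name)
    return sorted(clusters.values(), key=sorted)


# Hypothetical toy core-genome alignment (not real isolates).
alignment = {
    "iso1": "ACGTACGTAC",
    "iso2": "ACGTACGTAT",
    "iso3": "TCGAACCTAG",
    "iso4": "TCGAACCTAG",
}
print(cluster_by_snps(alignment, max_snps=2))
```

Single linkage is chosen here only because it is the simplest rule to state. Under this rule, a chain of closely related isolates can merge into one cluster, which may or may not be desirable in a given outbreak.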
Current surveillance schemes commonly mandate culturing microbes from subsamples, usually under conditions that constitute a stressful environment for the pathogen. The resulting bacterial populations can be used for outbreak detection and traceability. However, the long waiting times involved (10 days for a result and 3 to 4 weeks for serotype identification) do not meet the demands of the current U.S. regulatory environment or the spirit of the Food Safety Modernization Act [3].
Delays in pathogen detection negatively affect public safety and the ability of producers, processors, retailers, and consumers to keep pathogen levels below those expected by public health authorities. Compounding these delays is mounting evidence that the conditions used to process food create selection pressures that, via induced mutations, shift populations within the community [20]. Shah et al. [24] recently verified that conditions used in food processing, especially temperature and acid stress, lead to increased survival and virulence. These conditions completely rescued Salmonella from the lethal treatments applied to the sample and led to resistance to antimicrobial treatments, with growth and survival at the same rate as the untreated culture.
Recent advances in sequencing technologies present new opportunities to introduce molecular risk assessment into public health, using molecular genetics to better understand how microbial adaptation may affect hazards from bacteria in the food supply. Together, these examples demonstrate that genomic determination, through NGS and bioinformatic workflows, is uniquely positioned to enable new and innovative analyses for foodborne pathogen detection, outbreak traceability, and human case linkage on a time scale that matches the epidemiological demands of a global food supply.
Whole genome sequence availability and challenges for culture-independent analytics
One hindrance to implementing an NGS-based approach is the lack of a baseline set of genomic information that captures the genomic diversity of zoonotic pathogens of significance to the food supply and public health. Two studies highlight how much new knowledge each additional genome sequence can provide. First, the Genomic Encyclopedia of Bacteria and Archaea (GEBA) project sequenced 3,500 genomes from specific phylogenetic branches, with the intent of increasing the breadth of phylogenomic-based assignments. With as few as 50 new genomes spread over the tree of life, approximately 1,060 new protein families were found [25]. In the opposite approach, 34 very closely related Salmonella genomes were sequenced, revealing a linear increase in new protein families [26]. These two studies demonstrate that more genome sequences are needed to provide a comprehensive database representing the global genome of foodborne bacteria. These results, combined with the observation that abiotic stress induces genome diversity in Vibrio bacteria, suggest that a large database of genomic resources is essential to capture the breadth of genomic diversity required for informed decisions in food safety applications and for comparisons in disease outbreak investigations. The information required to create more complete databases is growing quickly with the release of new genomes [27–29]. Genomes can be found in public databases such as the Sequence Read Archive (SRA), the European Nucleotide Archive (ENA), and the DNA Data Bank of Japan (DDBJ). BioProjects such as the 100K Pathogen Genome Project (PRJNA186441) currently contain over 4,200 sequencing runs.
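The gene-discovery curves discussed in this section, and shown in Figures 1 and 3, can be computed from per-genome sets of gene-family identifiers. The minimal sketch below averages, over random orderings of the genomes, the cumulative number of distinct families seen after k genomes. It also reports how many previously unseen families each additional genome contributes on average. It assumes that gene families have already been clustered upstream. The function names and toy family sets are hypothetical.

```python
import random


def accumulation_curve(genomes: list[set[str]], permutations: int = 100,
                       seed: int = 0) -> list[float]:
    """Mean number of distinct gene families after adding k genomes, for k = 1..n.

    Each permutation adds the genomes in a random order; averaging over
    permutations removes the dependence on any single sampling order.
    """
    rng = random.Random(seed)
    totals = [0] * len(genomes)
    for _ in range(permutations):
        order = list(genomes)
        rng.shuffle(order)
        seen: set[str] = set()
        for k, families in enumerate(order):
            seen |= families
            totals[k] += len(seen)
    return [total / permutations for total in totals]


def new_families_per_genome(curve: list[float]) -> list[float]:
    """Mean number of previously unseen families contributed by the k-th genome."""
    return [curve[0]] + [b - a for a, b in zip(curve, curve[1:])]


# Hypothetical gene-family sets for three genomes.
genomes = [{"fam1", "fam2"}, {"fam1", "fam3"}, {"fam1", "fam4"}]
curve = accumulation_curve(genomes)
print(curve)                           # [2.0, 3.0, 4.0]
print(new_families_per_genome(curve))  # [2.0, 1.0, 1.0]
```

If the second list stays roughly constant as more genomes are added, the result corresponds to the "linear increase in new protein families" described above for closely related Salmonella genomes.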
The availability of genome sequences for the most common foodborne pathogens is uneven and does not sufficiently represent the genetic diversity of each organism needed to make adequate public health decisions (Table 2). The diversity of Salmonella genomes [23, 26] suggests that a very large number of genome sequences are required to provide a reference database adequate to pinpoint specific epidemiological questions and to begin tracking virulence traits, such as antibiotic resistance and effector molecule conservation. With this type of information, a new field of molecular risk assessment can be imagined that will bring a new perspective on public health risk [4]. This information would also provide the basis for mitigation strategies. The lack of this resource highlights the dilemma of linking serotype with virulence, as well as the burgeoning sero-diversity of Salmonella and the emergence of hypervirulence in Salmonella and other foodborne pathogens.
In a related enteric pathogen example, non-O157 E. coli serotypes, as well as unique combinations of emerging hybrid virulent pathotypes, highlight the need for prospective vigilance. The 2011 outbreak of E. coli O104 in Europe demonstrates how a rarely pathogenic serotype became deadly through chimeric horizontal gene transfer, loss of a key diagnostic gene (eae), and large-scale genome rearrangement [30, 31]. This scenario was very difficult to identify and track using traditional methods, but it was successfully resolved using NGS workflows [31, 32]. In retrospect, however, this strain was very close in sequence, hybrid pathotype, and overall genomic architecture to strains from the Republic of Georgia dating from 2009 [30]. Had these closely related strains been captured in real-time genomic databases, the outbreak could have been limited through appropriate diagnostic markers and serotype awareness. In the absence of such efforts, it became the largest outbreak of Shiga toxigenic E. coli (STEC) etiology in terms of the magnitude and severity of illness and death. The question of sequencing depth can therefore be approached most suitably with these two enteric pathogens. In 2008, Rasko et al. [33] examined core, unique, and total genes for 18 E. coli isolates, with a presumed asymptotic limit of 15,000 to 20,000 genes. Using public data as an example, Figure 1 shows that the number of genes discovered within the Escherichia pangenome increases with the number of sequenced genotypes according to a power law. Our observations of the approximately 3,300 complete and high-quality assemblies currently available publicly suggest that the limit for new discoverable genes has now more than doubled. It must be noted that a single limit on discoverable genes is not a universal concept applicable to every bacterial species, given the inherent differences in clonality and diversity relative to S. enterica sensu stricto (Figure 2). Indeed, the power-law behavior observed in Figure 1 suggests that there may be no limit. In addition, clinical isolates dominate the E. coli phylogenetic tree, which underrepresents environmental, commensal, and probiotic isolates (e.g., see Table 2) and includes some newly observed cryptic lineages. Therefore, sequencing depth and strain attribution are relative to the species and, in fact, should redefine taxonomies and, perhaps more importantly, the species concept itself. As we move forward with culture-independent analytics and metagenomics, the specific and verifiable identification of organisms hinges on the availability of reference WGS and will be paramount for realizing the full, applied potential of this technology for food safety and authentication.
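The power-law claim can be checked in the following way. Fit N(n) = κn^γ as a straight line in log-log space; in pangenome studies this form is often called Heaps' law, and γ > 0 indicates an open pangenome with no finite gene limit. Then compare its residuals with those of a logarithmic model, N = a + b ln n. The text does not state which fitting procedure was used for Figure 1, so the ordinary least squares fit below is an assumption. The synthetic data are illustrative and are not taken from the figure.

```python
import math


def linear_fit(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_power_law(genomes, genes) -> tuple[float, float]:
    """Fit genes = kappa * genomes**gamma as a line in log-log space; returns (kappa, gamma)."""
    intercept, gamma = linear_fit([math.log(n) for n in genomes],
                                  [math.log(g) for g in genes])
    return math.exp(intercept), gamma


def fit_logarithmic(genomes, genes) -> tuple[float, float]:
    """Fit genes = a + b*ln(genomes); returns (a, b)."""
    return linear_fit([math.log(n) for n in genomes], list(genes))


def rss(predicted, observed) -> float:
    """Residual sum of squares on the original (untransformed) scale."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed))


# Synthetic accumulation data that follow an exact power law (illustrative only).
ns = [1, 2, 4, 8, 16, 32, 64, 128, 256]
ys = [4000 * n ** 0.3 for n in ns]

kappa, gamma = fit_power_law(ns, ys)
a, b = fit_logarithmic(ns, ys)
print(round(gamma, 6))  # 0.3
print(rss([kappa * n ** gamma for n in ns], ys)
      < rss([a + b * math.log(n) for n in ns], ys))  # True
```

On real curves, the two models are best compared on held-out points or with an information criterion, because fitting in log space weights small-n points differently from fitting on the original scale.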
Figure 1
Observed pangenome using publicly available closed or partially assembled E. coli whole genome sequences from NCBI. The best functional fit to these data is a power law (not logarithmic or exponential). Curve fitting was done with respect to 344 observed genome types (representing 3,348 genome sequences), with new genes determined using BLAST criteria of 95% identity over 90% of the gene length. Genome types were identified to computationally condense and simplify whole genome data to a common reference representative with more than 0.2% genetic difference.
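The two thresholds in the Figure 1 caption can be written as simple decision rules. In the minimal sketch below, a query gene counts as already known if some hit has at least 95% identity over at least 90% of the query length. Genomes are condensed into "genome types" by greedily assigning each genome to the first representative within 0.2% distance. Several details are assumptions for illustration, because the caption does not say how the condensation was implemented: the Hit record (whose fields are named after BLAST+ tabular output specifiers such as pident, length, and qlen), the greedy strategy, and the caller-supplied distance function.

```python
from typing import Callable, Iterable, NamedTuple


class Hit(NamedTuple):
    """One alignment of a query gene against a reference gene (hypothetical record)."""
    query_id: str
    subject_id: str
    percent_identity: float  # cf. BLAST+ 'pident'
    alignment_length: int    # cf. BLAST+ 'length'
    query_length: int        # cf. BLAST+ 'qlen'


def is_known_gene(hits: Iterable[Hit], min_identity: float = 95.0,
                  min_coverage: float = 0.90) -> bool:
    """True if any hit meets both the identity and the query-coverage thresholds."""
    return any(h.percent_identity >= min_identity
               and h.alignment_length / h.query_length >= min_coverage
               for h in hits)


def condense_genome_types(genomes: list[str],
                          distance: Callable[[str, str], float],
                          max_diff: float = 0.002) -> dict[str, list[str]]:
    """Greedy clustering: each genome joins the first representative within
    max_diff (0.2% by default); otherwise it becomes a new representative."""
    types: dict[str, list[str]] = {}
    for genome in genomes:
        for rep in types:
            if distance(genome, rep) <= max_diff:
                types[rep].append(genome)
                break
        else:
            types[genome] = [genome]
    return types


hits = [Hit("gene_001", "ref_A", 97.2, 940, 1000)]
print(is_known_gene(hits))  # True: 97.2% identity over 94% of the query
```

Greedy assignment depends on input order. Sorting genomes by assembly quality before condensing is one common way to make the chosen representatives more meaningful.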
Table 2 Genome sequence availability of probiotic
bacteria from various public databases (as of 9/2015).
The data in this table comes from the National Center
for Biotechnology Information (NCBI). Sequence Read
Archive (SRA) submissions include both DNA and
RNA sequencing samples.
Cultured foods as a paradigm for microbial signature process monitoring
As a proof of concept, foods with live microbial cultures, such as yogurt, present a measurable microbial signature with which to stratify the nature of changes in the food microbiome during manufacturing. The use of probiotic bacteria has been projected to increase [34] as it becomes more common to add probiotics to many types of foods, resulting in microbiome shifts via live and active strains [35]. Both health and nutritional benefits of probiotics have been claimed. Barzegari et al. [36] suggest that probiotic additives should be designed for different foods to target the succession of gut microbiota communities at different developmental stages. Ganesan et al. [37] found that probiotics survive well as food ingredients for extended periods in fermented products, where, although perhaps not culturable, they actively produce bioactive compounds. Together, these findings indicate that the baseline, or foundational, microbiome of a food can shift as beneficial bacteria are added. The relative (and absolute) abundance of these beneficial organisms may be predictive of addition and safety. This concept suggests that a stable and disease-resistant microbiome in the supply chain can ultimately reduce foodborne disease. Measuring the microbiome of foods and their ingredients will enable robust classification and predictions that can be used in developing bioinformatic toolsets. It can also support the definition of new biomarkers for surveillance assays that routinely test the fidelity of products in the food supply chain.
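This is a minimal sketch of using relative abundance as evidence of addition. The functions below convert per-taxon read counts from a product sample into relative abundances and then report whether each organism named on the label reaches a detection threshold. The function names, taxon counts, and 1% threshold are hypothetical. Real label verification would need validated thresholds and, for "live and active" claims, evidence of viability that DNA read counts alone cannot provide.

```python
def relative_abundance(counts: dict[str, int]) -> dict[str, float]:
    """Convert per-taxon read counts into fractions of all assigned reads."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("no reads were assigned to any taxon")
    return {taxon: n / total for taxon, n in counts.items()}


def check_label(counts: dict[str, int], labeled_taxa: list[str],
                min_fraction: float = 0.01) -> dict[str, bool]:
    """For each labeled taxon, report whether its relative abundance reaches min_fraction."""
    abundance = relative_abundance(counts)
    return {taxon: abundance.get(taxon, 0.0) >= min_fraction for taxon in labeled_taxa}


# Hypothetical read counts from one yogurt sample.
counts = {
    "Streptococcus thermophilus": 8000,
    "Lactobacillus acidophilus": 1200,
    "Bifidobacterium animalis": 50,
    "unclassified": 750,
}
print(check_label(counts, ["Lactobacillus acidophilus", "Bifidobacterium animalis"]))
# {'Lactobacillus acidophilus': True, 'Bifidobacterium animalis': False}
```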
Probiotics marketed with generalized health claims should contain live and active cultures. In cultured foods, standard plating techniques are unlikely to provide accurate profiles of the food microbiome. Interestingly, the addition of a single probiotic bacterium can shift the food microbiota in many fermented products, in addition to changing the metabolite profiles [37, 38]. This is also observed for the production of bioactive ingredients that alter the microbial community through small molecules [39–43]. This body of work presents a challenge that must be solved by integrating multiple genomic, proteomic, and metabolomic (so-called multi-omic) methods for analysis; individual methods are not sufficient to fully capture the live and active nature of the biosystem. This is especially true when a live and active probiotic is added with the intent of also altering the human microbiome and health.
The food matrix itself can be viewed as the habitat for a community of microbes; as food ages and spoils, the resulting biochemical changes lead to a succession in the community. Similarly, routine processing and handling (such as marinating meat) lead to changes in pH and salinity and to a corresponding change in the microbial community. Numerous scientific studies have demonstrated the importance of microbial ecology to understanding the effects of food processing and spoilage [44].
Microbial communities are dynamic systems that change in response to changes in their environment, including, for example, the addition of new (probiotic) organisms [45]. Traditionally, sequencing of only 16S genes has been used to define community membership. However, this measure is uneven for phylogenetic estimation and often provides too little information about the state of the community and its viability. By studying all the genes (i.e., the complete metagenome), both community composition and functional potential can be observed, providing evidence for a safe microbiome or for a microbiome functioning as intended (e.g., in a fermented product). This is predicated on the hypothesis that the background microbiome inherent to the food is sufficiently stable to be predictive of both the food type (source) and the shifts in response to the addition of specific components that drive the biochemistry within the food to shift the community. Viewed as a biochemical dynamic system, a food or food ingredient provides an environment for a "predictable set of genes" (i.e., sets of organisms and sets of metabolic pathways) constrained by the active genes in the microbiome (i.e., expressed RNA) [46]. For this reason, it is not sufficient to study microbiome DNA (most of which may be derived from the host ingredient itself). Rather, it is necessary to identify both normal and abnormal microbial communities in terms of the active genes, derived from live cells (host and microbiome), that cooperate to create a novel environment (which may reflect the changing metabolite profiles mentioned above). RNAseq data, a set of transcripts, can be used to identify genes and to study the functions they perform [47].
Figure 2
Comparing the species landscape diversities for the genera Escherichia versus Salmonella, which are major foodborne pathogens with available and extensive sequence depth. The scale bars indicate a genetic distance of 1% relative to the respective core genome. Salmonella and Escherichia are both classified as genera, but the species diversity (and the phylogenetic complexity) of Escherichia is higher than that of Salmonella. Neighbor-joining trees were generated from the proportional distance of SNP (single nucleotide polymorphism) differences in core genes using the MEGA 3.1 software.
We envision using metaRNAseq not as a tool for measuring gene expression, but rather as a tool to reveal genes and gene sets that change as the microbiome shifts over time or in response to changes in food composition [48]. This approach may be analytically useful. Measuring the resulting meta-transcriptome of the microbiome directly from the product could uncover hundreds of lines of evidence that may be predictive of process effects relative to specific organisms in the community, and of how those organisms shift the microbiome in constructive or "healthy" ways. Constructive shifts may include resistance to spoilage and/or protection from pathogen incursion. Conversely, undesirable shifts could favor pathogen growth and lead to an increased risk of disease, which, again, could be mitigated by process control. These aims require measures of live and active organisms, in addition to the metabolites they produce after addition and during storage.
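One simple way to turn two metatranscriptome snapshots into a list of "genes that change" works in two steps. First, normalize each gene's read count to counts per million (CPM). Then compute a log2 fold change with a pseudocount. The minimal sketch below does this and returns the genes whose absolute log2 fold change meets a cutoff. The gene names, counts, pseudocount, and cutoff are hypothetical. A production analysis would use replicate samples and a statistical model of count variance rather than a fixed cutoff.

```python
import math


def cpm(counts: dict[str, int]) -> dict[str, float]:
    """Counts per million: scale each gene's reads by the sample's total reads."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("sample has no reads")
    return {gene: c * 1e6 / total for gene, c in counts.items()}


def shifted_genes(before: dict[str, int], after: dict[str, int],
                  min_abs_log2fc: float = 1.0,
                  pseudocount: float = 1.0) -> dict[str, float]:
    """Genes whose log2 fold change in CPM (after vs. before) meets the cutoff,
    ordered from largest to smallest absolute change."""
    genes = set(before) | set(after)
    b = cpm({g: before.get(g, 0) for g in genes})
    a = cpm({g: after.get(g, 0) for g in genes})
    changes = {g: math.log2((a[g] + pseudocount) / (b[g] + pseudocount)) for g in genes}
    kept = [(g, fc) for g, fc in changes.items() if abs(fc) >= min_abs_log2fc]
    return dict(sorted(kept, key=lambda item: abs(item[1]), reverse=True))


# Hypothetical transcript counts from one product at two storage times.
before = {"gadA": 100, "lacZ": 500, "rpoB": 400}
after = {"gadA": 400, "lacZ": 500, "rpoB": 100}
print(shifted_genes(before, after))  # gadA up (about +2), rpoB down (about -2); lacZ unchanged
```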
Focusing initially on the community metaRNAseq signature (mRNAsig) has several additional advantages over culturing followed by WGS, or WGS alone. It is "agnostic" to, and robust against, horizontal gene transfer and the transfer of plasmids. Over time, evolving organisms may respond to pressure by incorporating new genes; the mRNAsig will detect this, and the information can be used in detection, source tracking, and outbreak investigation. Because the RNA transcriptome also includes 16S ribosomal RNA, this approach provides an opportunity to calibrate the metaRNAseq method against more traditional 16S community analyses, allowing for both species and gene identification [47, 48].
As mentioned previously, implementing an NGS-based approach for authenticating probiotic microbiota in food requires reference databases of whole genomes (WGDs) and metagenomes (MGDs). WGDs are needed to ensure that added bacteria are classified correctly within the community and to capture the genomic diversity of specific organisms. The food microbiome database is also needed to determine what is normal and how this shifts with and without probiotic supplementation. The absence of such a standard database is a significant hindrance to broadly implementing genomics-based methods for global food safety and public health. Projects such as GenomeTrakr and the 100K Pathogen Genome Project attempt to solve this problem for pathogenic bacteria, but there is not yet a coordinated effort to address it, or to validate the completeness of candidate references, for probiotic and functional microbes. Validation and authentication of probiotic composition, by definition, must detect not only organisms by name but also the active genes in a probiotic microbiome.
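Determining "what is normal" presupposes a way to score how far a new sample lies from a reference set. A common choice in community ecology is Bray-Curtis dissimilarity on relative-abundance profiles. It is used here as an illustrative assumption, not as the consortium's method. The minimal sketch flags a sample when its mean dissimilarity to the baseline samples exceeds the largest leave-one-out mean dissimilarity within the baseline itself. The flagging rule, genus names, and profiles are hypothetical.

```python
def bray_curtis(p: dict[str, float], q: dict[str, float]) -> float:
    """Bray-Curtis dissimilarity: sum |p_i - q_i| / sum (p_i + q_i).
    0 means identical profiles; 1 means no shared taxa (for nonnegative abundances)."""
    taxa = set(p) | set(q)
    numerator = sum(abs(p.get(t, 0.0) - q.get(t, 0.0)) for t in taxa)
    denominator = sum(p.get(t, 0.0) + q.get(t, 0.0) for t in taxa)
    return numerator / denominator if denominator else 0.0


def mean_dissimilarity(sample: dict[str, float],
                       references: list[dict[str, float]]) -> float:
    """Average Bray-Curtis dissimilarity from one sample to a set of references."""
    return sum(bray_curtis(sample, r) for r in references) / len(references)


def is_outside_baseline(sample: dict[str, float],
                        baseline: list[dict[str, float]]) -> bool:
    """Flag the sample if it is, on average, farther from the baseline than any
    baseline member is from the remaining members (a hypothetical rule)."""
    if len(baseline) < 2:
        raise ValueError("need at least two baseline samples")
    threshold = max(
        mean_dissimilarity(member, baseline[:i] + baseline[i + 1:])
        for i, member in enumerate(baseline)
    )
    return mean_dissimilarity(sample, baseline) > threshold


# Hypothetical relative-abundance profiles for one product type.
baseline = [
    {"Lactococcus": 0.60, "Leuconostoc": 0.40},
    {"Lactococcus": 0.55, "Leuconostoc": 0.45},
    {"Lactococcus": 0.65, "Leuconostoc": 0.35},
]
print(is_outside_baseline({"Lactococcus": 0.60, "Leuconostoc": 0.40}, baseline))  # False
print(is_outside_baseline({"Lactococcus": 0.20, "Leuconostoc": 0.30,
                           "Pseudomonas": 0.50}, baseline))                       # True
```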
The Lactic Acid Bacteria Genome Consortium also used this database-development approach in the early 2000s to sequence the genomes of closely related lactic acid bacteria, resulting in the reclassification of the entire group of organisms [49]. Since then, additional genomes from this group of probiotic bacteria have been sequenced, and technologies have advanced so that closed genomes can be produced in a short time.
Genome sequencing: Quality control and scaling for discrete data
The availability of genomic sequences for bacteria offers
resources for forming reference data sets for analysis, but
these resources are uneven, poorly curated, of questionable
quality, and do not adequately represent the required
organism diversity [50].
The same is true for probiotic organisms (Table 2).
Recently it was found that the SRA has a large proportion
of sequences from bacterial sources that are not of
sufficient quality to be used as a reference because they
cannot be assembled into a meaningful sequence [51–54].
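One common way to quantify whether an assembly is "meaningful" is its contiguity. The sketch below computes the standard N50 statistic (the contig length at which contigs of that length or longer hold at least half of all assembled bases) and applies a screening rule. The function names and every threshold (minimum N50, maximum contig count, minimum total length) are illustrative assumptions, not criteria used by the SRA or by the studies cited above.

```python
from typing import Sequence


def n50(contig_lengths: Sequence[int]) -> int:
    """Length L such that contigs of length >= L hold at least half
    of all assembled bases (0 for an empty assembly)."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    return 0


def usable_as_reference(contig_lengths: Sequence[int],
                        min_n50: int = 20_000,
                        max_contigs: int = 500,
                        min_total_bp: int = 1_000_000) -> bool:
    """Hypothetical screening rule; every threshold here is illustrative."""
    if not contig_lengths:
        return False
    return (n50(contig_lengths) >= min_n50
            and len(contig_lengths) <= max_contigs
            and sum(contig_lengths) >= min_total_bp)
```

Under these assumed thresholds, an assembly of 100 contigs of 50 kb each (N50 = 50 kb, 5 Mb total) passes, whereas a fragmented assembly of 3,000 contigs of 1 kb each does not.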
For example, the diversity of Salmonella genomes
(Figure 3) suggests that a very large number of genome
sequences may be required to provide an adequate
reference database to pinpoint specific epidemiology
questions and to begin tracking virulence traits, such as
antibiotic resistance and effector molecule conservation,
that are known to be directly involved in disease risk and
outbreaks.
Figure 3
The cumulative rate of discovery of putative new genes as a
function of the number of isolates of Salmonella increases as a
power law (linear on a log-log plot). Sequencer output for each of
792 isolates was assembled using ABySS (abyss-pe v1.5.2) [52, 53],
and the resulting contigs (contiguous overlapping sequences) were
annotated using Prokka (v1.10) [54]. Random subsets of those
isolates were selected. Each point is the number of distinct amino
acid sequences inferred for regions annotated as hypothetical
proteins for one subset, plotted against the number of isolates in
that subset.
B. C. WEIMER ET AL. 1:7IBM J. RES. & DEV. VOL. 60 NO. 5/6 PAPER 1 SEPTEMBER/NOVEMBER 2016
Figure 3 shows the cumulative rate of observation
of new putative genes as a function of the number of
isolates sampled. The data were obtained by de novo
assembly of over 4,000 individual isolates from the 100K
Pathogen Genome Project. A subset of 792 samples was
then selected, random subsets of it were drawn in separate
(i.e., jackknife) trials, and the cumulative gene count was
determined for each trial. The approximately linear behavior
observed on the log-log plot suggests a power law, with the
cumulative number of distinct genes growing with the number
of isolates, as observed above with de novo assembly
data. In the following formulation, $N_{\text{genes}}$ is the number of
genes, and $N_{\text{isolates}}$ is the number of isolates:
$$N_{\text{genes}} \propto N_{\text{isolates}}^{0.465}.$$
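The subsampling-and-fit procedure described above can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the code behind Figure 3: each isolate is assumed to be already reduced to a set of distinct protein sequences (for example, the hypothetical-protein sequences from an annotation step), the function names are hypothetical, and the exponent is estimated by ordinary least squares on log-transformed counts, one common way to fit a straight line on a log-log plot.

```python
import math
import random
from typing import Dict, List, Sequence, Set, Tuple


def accumulation_curve(isolate_genes: Dict[str, Set[str]],
                       subset_sizes: Sequence[int],
                       trials: int = 10,
                       seed: int = 0) -> List[Tuple[int, int]]:
    """For each subset size n, draw `trials` random subsets of isolates
    (without replacement) and record how many distinct genes they contain."""
    rng = random.Random(seed)
    isolates = list(isolate_genes)
    points = []
    for n in subset_sizes:
        for _ in range(trials):
            subset = rng.sample(isolates, n)
            distinct = set().union(*(isolate_genes[i] for i in subset))
            points.append((n, len(distinct)))
    return points


def fit_power_law(points: Sequence[Tuple[int, float]]) -> Tuple[float, float]:
    """Ordinary least squares on log N_genes = log c + alpha * log N_isolates.
    Returns (c, alpha); needs at least two distinct subset sizes."""
    xs = [math.log(n) for n, _ in points]
    ys = [math.log(g) for _, g in points]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    alpha = sxy / sxx
    c = math.exp(mean_y - alpha * mean_x)
    return c, alpha
```

Applied to per-isolate gene sets, `fit_power_law(accumulation_curve(...))` returns estimates of the prefactor and the exponent; in the Salmonella analysis above, the exponent was 0.465.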
The exponent observed in this analysis of Salmonella
diversity (0.465) is quite close to the exponent observed in
our study of Escherichia, suggesting possible universality.
This type of scale-free or power-law behavior is
reminiscent of the MacArthur-Wilson scaling law, well
established in insular biogeography and other population
ecology studies [55, 56]. Considering that horizontal gene
transfer is common in the environment and among many
types of bacteria, the power-law behavior suggests that
each and every isolate may be considered as a sample
from a different microbial community, and the diversity
of that community scales with the number of isolates
just as the biodiversity in any ecological habitat scales
with the area of the underlying habitat. Understanding the
underlying scaling behavior will allow prediction of the
number of samples required for gene discovery and can
reveal the universality of observed exponents. This
perspective is new to food microbiology and enables the
move to a population-based understanding of the
isolates that treats them as individuals rather than as a
homogeneous type of organism (e.g., Salmonella). The
concept can also be expanded to metagenomes or
metaRNAseq data sets, where individual samples have
ecologies that are unique within a type of environment
(e.g., a factory, a field, or a food source).
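As a worked illustration of how the fitted exponent could guide sampling decisions, assume the relationship holds exactly as $N_{\text{genes}} = c\,N_{\text{isolates}}^{\alpha}$ with $\alpha = 0.465$ and a constant prefactor $c$ over the range of interest. That is an extrapolation the data do not guarantee; the derivation below only follows from the assumed form.

```latex
\begin{aligned}
N_{\text{genes}} &= c\,N_{\text{isolates}}^{\alpha}, \qquad \alpha = 0.465 \\
\frac{dN_{\text{genes}}}{dN_{\text{isolates}}} &= \alpha c\,N_{\text{isolates}}^{\alpha-1}
  = 0.465\,c\,N_{\text{isolates}}^{-0.535} \\
\frac{N_{\text{genes}}(2n)}{N_{\text{genes}}(n)} &= 2^{0.465} \approx 1.38 \\
\frac{N_{\text{genes}}(kn)}{N_{\text{genes}}(n)} = 2 &\;\Longrightarrow\; k = 2^{1/0.465} \approx 4.44
\end{aligned}
```

Under this assumption, every added isolate still contributes new genes, but the expected yield per isolate declines as the collection grows: doubling the number of isolates raises the distinct-gene count by about 38%, and doubling the gene count would take roughly 4.4 times as many isolates. The prefactor $c$ cancels in both ratios, so only the exponent is needed for these relative estimates.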
Over time, as the necessary data are collected, a new
field of molecular risk assessment will evolve, bringing
new perspectives on public health benefits and risks.
These in turn will enable a new type of large-scale
genomic-based microbiology to solve long-standing
questions that traditional isolate and growth microbiology
cannot address. For these reasons, the development of
reference type cultures and genomes of probiotic taxa
and of metagenomic databases is critical for avoiding errors,
such as concluding that an organism is present when it
is not actually in the sample (a false positive).
Sequence the food supply chain: Paths to
harnessing HTS analytics
Recently, Mars and IBM announced the SFSCC, a
community effort to explore how advanced
next-generation sequencing technologies can be harnessed
as an analytical tool, providing a window into food
maturation processes at a scale and depth never before
achieved [12–14]. Consortium members, including
universities, are conducting experiments to follow the
microbiota of foods and food ingredients using
meta-transcriptomes. This technique can provide
conclusive evidence that demonstrates live and active
bacteria in the food as well as quantitative estimates
for monitoring probiotic mixtures and validating ingredient
claims, which are not possible using culture methods.
More generally, the method provides insight into how
microbial ecology affects any food [57].
In addition to microbial content shifts, parallel
controlled studies of the metabolome in the same samples
can provide evidence for the composition, diversity,
and metabolic byproducts of probiotic additives, as well as
metabolic shifts in the underlying microbiome. Moreover,
newly defined, fast, and inexpensive small-molecule
monitors and tests can be implemented in a manufacturing
environment as surrogate markers that provide early
warning signs to trigger more advanced and definitive tests,
such as HTS (high-throughput sequencing), to confirm the
initial results. Broadly stated, the impact of this program
could go beyond safety to guide new food development
and process improvement paradigms for improved
efficiency and significant economic benefit. Studying the
complex communities of organisms in food requires
this type of multidimensional program. He et al. [57]
recently demonstrated that there is a predictable and
microbe-specific response in vitro. They combined
meta-transcriptomics and NMR metabolomics to determine
that specific strains of E. coli have different small
molecule signatures during infection and association.
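One simple way a surrogate marker could serve as an early warning is a Shewhart-style control limit: readings that fall outside the mean plus or minus k standard deviations of in-control baseline runs trigger a definitive test such as HTS. The sketch below assumes a marker whose level is roughly stable during normal production; the function names, the baseline values, and the conventional choice k = 3 are assumptions for illustration, since the text does not specify a decision rule.

```python
import statistics
from typing import Sequence, Tuple


def control_limits(baseline: Sequence[float], k: float = 3.0) -> Tuple[float, float]:
    """Mean +/- k sample standard deviations of in-control baseline runs."""
    mean = statistics.mean(baseline)
    sd = statistics.stdev(baseline)  # requires at least two baseline values
    return mean - k * sd, mean + k * sd


def needs_confirmation(value: float, baseline: Sequence[float], k: float = 3.0) -> bool:
    """Flag a surrogate-marker reading outside the control limits so that a
    more definitive test (e.g., HTS) can be scheduled."""
    low, high = control_limits(baseline, k)
    return not (low <= value <= high)
```

With a hypothetical baseline of [10, 12, 11, 9, 10, 11, 12, 9], the limits are roughly 6.9 to 14.1, so a reading of 10.8 passes while a reading of 20 is flagged for confirmation.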
A scalable informatics platform to address the food
microbiome using culture-independent HTS and
metaRNAseq can be extended to metabolomics directly in
food. This would establish the baseline microbial
community for food ingredients and measure how the
inclusion of new organisms shifts both the microbiome
and desirable food chemistry. Understanding this baseline
requires more than just organism identification.
Quantification of the organisms, the gene sets, and their
dynamics is also required. We are targeting four areas
for investigation as prerequisites to harnessing HTS
analytics for food safety applications. The first three
provide the framework for the fourth.
Area 1: Characterization and quantification of the
baseline microbiome
A starting hypothesis is that microbial communities are
strongly and consistently influenced by their surrounding
environment. We anticipate that between communities,
group metabolism is patterned, consistent, and predictive,
so that actively expressed gene sets can be used as
markers of particular communities. Determining the
normal, or core, gene sets, as well as those that are
highly variable between foods, can then fuel authentication
estimates. These in turn will predict specific communities
that are indicative of environmental (or ingredient)
shifts, including shifts indicative of food fraud.
We further hypothesize that communities from different
regions (but the same food product) may vary in
taxonomic community measurements, but will be
conserved in actively expressed metabolic pathways.
As such, it is necessary to investigate and compare
communities in terms of both phylogenies and metabolic
function so as to identify specific sets of genes that are
predictive, given food type and chemistry.
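The distinction between core and highly variable gene sets can be sketched with a simple prevalence threshold, a convention borrowed from pan-genome analysis: a gene is "core" if it is actively expressed in at least a chosen fraction of samples, and "variable" otherwise. The function name and the 0.95 default are assumptions for illustration; the text does not fix a threshold.

```python
from collections import Counter
from typing import Dict, Set, Tuple


def partition_genes(sample_genes: Dict[str, Set[str]],
                    core_fraction: float = 0.95) -> Tuple[Set[str], Set[str]]:
    """Split actively expressed genes into 'core' (expressed in at least
    core_fraction of samples) and 'variable' (all other observed genes).

    sample_genes maps a sample ID to the set of genes detected as expressed."""
    n_samples = len(sample_genes)
    counts = Counter(g for genes in sample_genes.values() for g in genes)
    core = {g for g, c in counts.items() if c >= core_fraction * n_samples}
    variable = set(counts) - core
    return core, variable
```

For three samples expressing {a, b, c}, {a, b}, and {a, d}, a strict threshold of 1.0 gives a core set of {a}; relaxing it to 0.6 adds b, which appears in two of the three samples.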
Area 2: Characterization and quantification of the
shifting microbiome after the addition of probiotics
The introduction of new members to a community
(e.g., probiotic additives) to food products and ingredients
will shift the community equilibrium to a new steady
state. We anticipate that this shift will be measurable and
consistent through changes in gene expression (through
RNAseq) and metabolites, and that these shifts will
occur in a predictable manner over time. The acquisition of
these data, in combination with the baseline active
microbiome estimates of raw products, will also allow for
the development of predictive models and robust assays
through data feature extraction.
Defining the mathematical relationship between
probiotic communities in finished products as a function
of the starting microbial communities, introduced bacteria,
and environmental factors requires a scalable
microbiological and computational effort.
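One way to quantify such a shift gene by gene is the log2 fold change in relative abundance between matched baseline and supplemented samples. The sketch below assumes per-gene read counts from metaRNAseq and adds a small pseudocount (an arbitrary, hypothetical choice) so that genes seen in only one condition stay finite; the function names are illustrative. Established tools such as DESeq2 and edgeR go further by modeling count dispersion across biological replicates, which a single pair of samples cannot capture.

```python
import math
from typing import Dict


def relative_abundance(counts: Dict[str, int]) -> Dict[str, float]:
    """Convert raw per-gene read counts into fractions of the sample total."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no counts")
    return {gene: c / total for gene, c in counts.items()}


def log2_fold_changes(baseline: Dict[str, int],
                      supplemented: Dict[str, int],
                      pseudocount: float = 1e-6) -> Dict[str, float]:
    """Per-gene log2(supplemented / baseline) of relative abundance.
    The pseudocount keeps genes seen in only one sample finite."""
    base = relative_abundance(baseline)
    supp = relative_abundance(supplemented)
    return {
        gene: math.log2((supp.get(gene, 0.0) + pseudocount)
                        / (base.get(gene, 0.0) + pseudocount))
        for gene in base.keys() | supp.keys()
    }
```

For instance, a gene whose share of reads falls from 50% to 25% after supplementation has a log2 fold change of about -1, while a gene that appears only after supplementation receives a large positive value driven by the pseudocount.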
Area 3: Systematic creation of a reference database of
probiotic genomes from well-defined isolates
Common analytical methods associated with HTS are not
exempt from errors that may falsely indicate risk where
there is none or fail to detect risk where it in fact exists.
The probability of such errors can be minimized by
implementing good experimental design and rigorous
statistical techniques. In order to further minimize
these risks, the systematic creation of a reference database
of probiotic genomes from well-defined isolates is
paramount. While other projects are doing this for
pathogens, no such project yet exists for probiotic bacteria
or metaRNAseq microbiome estimates.
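As one example of the rigorous statistical techniques mentioned above, a presence call can be framed as a hypothesis test. The sketch assumes that reads are misassigned to a taxon independently at a known background rate (for instance, estimated from negative controls or mock communities), so that under "taxon absent" the number of assigned reads is binomially distributed; the taxon is called present only if the observed count would be improbable under that null. The background rate, the alpha level, and the function names are hypothetical, and a real pipeline would also correct for testing many taxa at once.

```python
import math
from typing import Tuple


def binom_log_pmf(i: int, n: int, p: float) -> float:
    """log P(X = i) for X ~ Binomial(n, p), computed with lgamma to avoid overflow."""
    return (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * math.log(p) + (n - i) * math.log1p(-p))


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p) with 0 < p < 1."""
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    total = 0.0
    for i in range(k, n + 1):
        term = math.exp(binom_log_pmf(i, n, p))
        total += term
        # Beyond the mode (about n * p) terms only shrink; stop once negligible.
        if i > n * p and term <= total * 1e-16:
            break
    return min(total, 1.0)


def call_present(assigned_reads: int, total_reads: int,
                 background_rate: float, alpha: float = 0.01) -> Tuple[bool, float]:
    """Call a taxon present only if its read count is unlikely to arise from
    misassignment alone at the (hypothetical) background_rate."""
    p_value = binomial_upper_tail(assigned_reads, total_reads, background_rate)
    return p_value < alpha, p_value
```

With 100,000 reads and an assumed background rate of 1 in 10,000, about 10 misassigned reads are expected by chance; 12 assigned reads would therefore not support a presence call, whereas 50 would.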
Area 4: Deployment of a software service for the analysis
of microbiomes
Leveraging a systems biology approach in an
environmental setting will allow samples to be
characterized in terms of both taxonomy and function.
Ensuring that this is performed in a timely fashion requires
flexible and scalable informatics solutions. However,
industry adoption of new molecular methods may result in
unintended observations of previously unrecognized
hazards in the food supply. It is likely that in-depth
sequencing will uncover previously unrecognized and
undetected community members. Caution is required to
ensure that the sequences do not falsely indicate the
presence of organisms (which are not present) if short
sequence comparisons are used. Creation of a probiotic
reference culture and metagenomic database will reduce
the likelihood of false positives, determine sensitivity
and selectivity of molecular protocols, and support
proper calibration of molecular methods, all required for
effective regulatory policies and action.
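The caution about short sequence comparisons can be made concrete with a back-of-the-envelope model. If a reference of length G is treated as random, with uniform and independent bases, then the chance that a specific k-mer appears somewhere in it by coincidence is approximately 1 - (1 - 4^-k)^(G - k + 1). The model, the reverse-strand simplification, and the function name are assumptions for illustration; real genomes are not random, so actual spurious-match rates differ.

```python
import math


def chance_kmer_hit_probability(k: int, reference_length: int) -> float:
    """Approximate probability that a specific k-mer occurs at least once by
    chance in a random reference sequence (uniform, independent bases;
    reverse strand ignored): 1 - (1 - 4**-k) ** positions."""
    positions = max(reference_length - k + 1, 0)
    # log1p/expm1 keep the result accurate when 4**-k is tiny.
    return -math.expm1(positions * math.log1p(-(0.25 ** k)))


if __name__ == "__main__":
    genome_length = 5_000_000  # illustrative bacterial genome size
    for k in (11, 21, 31):
        print(k, chance_kmer_hit_probability(k, genome_length))
```

For a 5 Mb reference, this model gives an 11-mer about a 70% chance of a coincidental match, a 21-mer about one in a million, and a 31-mer essentially none, which illustrates why short matches alone should not be taken as evidence that an organism is present.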
To address this fourth area, we have created the
Metagenomics Computation and Analytics Workbench
(MCAW), a scalable web-based informatics system [59].
Understanding that not all organisms, and not all genes,
have been identified in food systems, we recognize that
the system needs to accommodate and make use of a
growing corpus of reference data, probably expanding to
hundreds of terabytes. Therefore, the system is extensible,
allowing for new informatics techniques to be incorporated
and for workflows to be developed to reflect new scientific
knowledge. The system also provides elastic performance and
high-performance parallel computing across multiple
nodes to serve the needs of multiple concurrent users,
each with terabytes of data of interest. A future
cloud-based implementation will be necessary to support
the productivity of users in different roles who may not be
experts in software and systems, efficiently presenting
views of pivotal metrics, and automatically coping
with metadata and storage management on tens of
thousands of datasets.
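The extensibility requirement described above (incorporating new informatics techniques and workflows without changing the core system) is often met with a plugin registry. The sketch below is a generic illustration of that pattern, not MCAW's actual design, which is described in [59]; the registry, the step names `trim` and `count`, and the dictionary-based data passing are all hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical plugin registry: each step takes and returns a dict of data.
STEP_REGISTRY: Dict[str, Callable[[dict], dict]] = {}


def register_step(name: str):
    """Decorator that adds a workflow step under a unique name."""
    def decorator(func: Callable[[dict], dict]) -> Callable[[dict], dict]:
        if name in STEP_REGISTRY:
            raise ValueError(f"step {name!r} is already registered")
        STEP_REGISTRY[name] = func
        return func
    return decorator


def run_workflow(step_names: List[str], data: dict) -> dict:
    """Run registered steps in order; an unknown step name raises KeyError."""
    for name in step_names:
        data = STEP_REGISTRY[name](data)
    return data


@register_step("trim")
def trim_reads(data: dict) -> dict:
    max_len = data.get("max_len", 100)
    return {**data, "reads": [read[:max_len] for read in data["reads"]]}


@register_step("count")
def count_reads(data: dict) -> dict:
    return {**data, "n_reads": len(data["reads"])}
```

Because workflows are just ordered lists of registered step names, a new technique can be added by registering one more function, and existing workflows keep running unchanged.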
Analysis of a variety of microbiome data sets
demonstrates that as many as 50% to 80% of
metaRNAseq reads are unidentified [49]. However,
these same unidentified reads are repeatable features of
datasets from equivalent sources. For this reason,
a metagenomic software service for food safety must
track both annotated and un-annotated transcripts, and
compare both known and unknown components from
sample to sample. This necessitates a well-designed
highly structured database and associated operations for
querying and adding data. Furthermore, extraction of
salient features from the data for use as predictive markers
will be necessary in order to deliver robust assays for use
in field settings.
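Tracking unannotated transcripts from sample to sample requires that the same unknown sequence receive the same identifier everywhere it appears. One simple way to do this, sketched below under stated assumptions, is to derive an identifier from a cryptographic hash of the sequence itself, so that unknown features can be counted and compared across samples just like annotated ones. The function names and the `unk_` prefix are hypothetical, and exact-sequence hashing only groups identical sequences; in practice, near-identical transcripts would first need to be clustered (for example, by sequence similarity) and a representative hashed.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple


def feature_id(sequence: str, annotation: Optional[str] = None) -> str:
    """Annotated features keep their name; unannotated ones get a stable
    ID derived from the sequence, so the same unknown transcript receives
    the same ID in every sample."""
    if annotation:
        return annotation
    digest = hashlib.sha256(sequence.upper().encode("ascii")).hexdigest()
    return "unk_" + digest[:16]


def feature_table(samples: Dict[str, List[Tuple[str, Optional[str]]]]) -> Dict[str, Dict[str, int]]:
    """Build feature -> {sample: count} from (sequence, annotation) pairs."""
    table: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for sample_id, features in samples.items():
        for sequence, annotation in features:
            table[feature_id(sequence, annotation)][sample_id] += 1
    return {fid: dict(counts) for fid, counts in table.items()}


def recurring_unknowns(table: Dict[str, Dict[str, int]], min_samples: int = 2) -> Set[str]:
    """Unannotated features observed in at least min_samples samples."""
    return {fid for fid, counts in table.items()
            if fid.startswith("unk_") and len(counts) >= min_samples}
```

Unknowns that recur across samples from equivalent sources are then natural candidates for the predictive markers discussed above, even before they are annotated.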
Conclusion
Ongoing improvement in next-generation sequencing
technologies, curated and annotated genomic references,
and bacterial strain banks are poised to enable new food
safety, traceability, and quality solutions, leveraging
systematic authentication of the microbiome and its
variation up and down the supply chain. This approach
depends upon easily available and scalable computation
and informatics. Bioinformatic workflows are too often
built and executed as ad hoc, lab-specific procedures,
and the actual process is rarely well documented
or presented. In order for next-generation
sequencing to realize its full potential across a wide
variety of food authentication, quality, and food safety
applications, there is a need to expose, formalize, and
standardize workflows for bioinformatics analysis,
processing, and data management. The provenance of
intermediate and final results data must be retained in a
system of record, both to support scientific validation and
to provide auditability of externalized conclusions.
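A provenance record of the kind described above could, under simple assumptions, capture content hashes of every input and output together with the tool version, the parameters, and the version of the reference database used. It could then derive a record ID from a canonical serialization, so that any change to inputs, settings, or references produces a different ID. The field names and example values are hypothetical; this is a sketch of the idea, not a description of any system of record used by the consortium.

```python
import hashlib
import json
from typing import Dict


def sha256_of(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def provenance_record(step: str, tool_version: str, parameters: Dict,
                      inputs: Dict[str, bytes], outputs: Dict[str, bytes],
                      reference_db: str) -> Dict:
    """Describe one processing step by content hashes rather than file paths,
    and derive a deterministic record ID from a canonical JSON encoding."""
    record = {
        "step": step,
        "tool_version": tool_version,
        "parameters": parameters,
        "reference_db": reference_db,  # which reference version was used
        "inputs": {name: sha256_of(b) for name, b in sorted(inputs.items())},
        "outputs": {name: sha256_of(b) for name, b in sorted(outputs.items())},
    }
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    record["record_id"] = sha256_of(canonical.encode("utf-8"))
    return record
```

Because the ID depends only on content, rerunning the same step on the same data with the same settings reproduces the same record, which is the property an audit needs; chaining records (using one step's output hashes as the next step's input hashes) links final results back to the raw data.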
Industrial organizations seeking to make use of
genomic, metagenomic, and metabolomic data to protect
food safety require that the informatics technologies
be available as an inexpensive and scalable cloud
computing and web-based service. With such a service, all
data sets and the workflows used to create them should
be stored using well-designed data management systems that
make use of industry-standard software and architecture.
The critical role of the bioinformatics used to create the
necessary evidence requires the creation, demonstration,
and use of an extensible and modular informatics
software service. As discussed above, standard libraries
for authentication of probiotic composition (and safety)
will improve as more genome sequences become available,
providing a robust database to represent the global genome
for use in metaRNAseq and other metagenomic
applications. As such, a bioinformatic software service
must index and track the history of all reference databases
used in analyses by dataset. It should also track and
organize the inevitable discovery of "unknown" proteins
(genes) and gene families, indexing those data to aid in
the refinement and growth of the reference database itself.
The Metagenomics Computation and Analytics
Workbench, now under development by the SFSCC, offers
a model for a scalable software service that applies HTS
technologies to food safety. The workbench builds upon
the consortium's research, described above, to characterize
and quantify the microbiome from baseline to final steady
state and create a reference database from well-defined
isolates to support the analysis of microbiomes and
development of assays.
The consortium's research promises to impact other
related and important goals. We expect the new
biomarkers identified by this work will have a variety of
applications, including surveillance assays to routinely
test the fidelity of products in the food supply chain.
Furthermore, the informatics service now in place to
achieve the goals of this project will have direct
application to an even broader range of goals for food
safety. The FDA, the U.S. Department of Agriculture,
the Centers for Disease Control and Prevention, and other
entities all stand to benefit from an informatics facility that
is reproducible, scalable, flexible (extensible), and high
performance. It will drive end-to-end workflows on large
collections of samples, so that both the labor cost and risk
of error in production deployments can be minimized.
The system can be adapted to several applications and
use cases, and the platform can accommodate users with
different roles and expertise, be resource- and
cost-efficient, and scale through cloud and other
distributed techniques.
Acknowledgments
We would like to acknowledge Mr. Mark K. Mammel
and Dr. David W. Lacher from the Food and Drug
Administration, Division of Molecular Biology,
Center for Food Safety and Applied Nutrition, for their
contributions to this manuscript including the creation of
Figures 1 and 2. We also thank Judy Douglas for her
help and advice.
References
1 Food and Agriculture Organization of the United Nations,
"2050: A third more mouths to feed." [Online]. Available:
http://www.fao.org/news/story/en/item/35571/icode/
2 R. A. Fischer, D. Byerlee, and G. O. Edmeades, "Can
technology deliver on the yield challenge to 2050," presented at
the Expert Meeting on How to Feed the World in 2050, 2009.
[Online]. Available:
http://www.z-punkt.de/megatrends-uebersicht.html
3 "Food safety modernization act (FSMA)," U.S. Food Drug
Admin., Silver Spring, MD, USA, Public Law 111-353, 2011.
4 C. A. Elkins, M. L. Kotewicz, S. A. Jackson, D. W. Lacher,
G. S. Abu-Ali, and I. R. Patel, "Genomic paradigms for
food-borne enteric pathogen analysis at the USFDA: Case studies
highlighting method utility, integration and resolution," Food
Additives Contaminants A, Chem. Anal. Control Expo. Risk
Assess., vol. 30, pp. 1422–36, 2013.
5 S. R. Leonard, M. K. Mammel, D. W. Lacher, and C. A. Elkins,
"Application of metagenomic sequencing to food safety:
Detection of Shiga toxin-producing Escherichia coli on
fresh bagged spinach," Appl. Environ. Microbiol., vol. 81,
pp. 8183–91, 2015.
6 T. M. Bergholz, A. I. Moreno Switt, and M. Wiedmann,
"Omics approaches in food safety: Fulfilling the promise?"
Trends Microbiol., vol. 22, pp. 275–281, 2014.
7 A. O'Connor, "New York attorney general targets supplements
at major retailers," New York Times, New York, NY, USA,
Feb. 3, 2015. [Online]. Available:
http://well.blogs.nytimes.com/2015/02/03/new-york-attorney-general-targets-supplements-at-major-retailers
8 J. N. Patro, P. Ramachandran, J. L. Lewis, M. K. Mammel,
T. Barnaba, E. A. Pfeiler, and C. A. Elkins, "Development and
utility of the FDA 'GutProbe' DNA microarray for identification,
genotyping and metagenomic analysis of commercially available
probiotics," J. Appl. Microbiol., vol. 118, pp. 1478–88, 2015.
9 J. N. Patro, P. Ramachandran, T. Barnaba, M. K. Mammel,
J. L. Lewis, and C. A. Elkins, "Culture-independent metagenomic
surveillance of commercially available probiotics with
high-throughput next-generation sequencing," mSphere, vol. 1,
no. 2, Mar.–Apr. 2016, Art. no. e00057-16,
doi: 10.1128/mSphere.00057-16.
10 K. Kupferschmidt, "Epidemiology: Outbreak detectives embrace
the genome era," Science, vol. 333, no. 6051, pp. 1818–1819, 2011.
11 J. Whitworth, "WGS will become sole method but current typing
capacity should be available: ECDC," Food Quality News,
Oct. 23, 2015. [Online]. Available:
http://www.foodqualitynews.com/Lab-Technology/WGS-cost-and-time-comparable-to-current-typing-methods
12 J. Welser, "Sequencing the Food Supply Chain: How a New
Consortium Will Improve Food Safety," Forbes, Jan. 29, 2015.
[Online]. Available:
http://www.forbes.com/sites/ibm/2015/01/29/sequencing-the-food-supply-chain-how-a-new-consortium-will-improve-food-safety/
13 "Launch pioneering effort to drive advances in global food
safety," IBM, Mars, Inc., Armonk, NY, USA, Jan. 2015.
[Online]. Available:
http://www.mars.com/global/press-center/global-food-safety.aspx
14 "Consortium for Sequencing the Food Supply Chain: IBM
Research and Mars tackle global health with food safety
partnership." [Online]. Available:
http://www.research.ibm.com/client-programs/foodsafety/
15 "Enteric diseases epidemiology branch: Surveillance reports,"
Centers Disease Control Prevent., FoodNet, Atlanta, GA, USA.
[Online]. Available:
http://www.cdc.gov/ncezid/dfwed/edeb/reports.html
16 L. Bever, "Salmonella outbreak linked to raw tuna sushi spreads
to nine states," The Washington Post, Washington, DC, USA,
May 22, 2015. [Online]. Available:
http://www.washingtonpost.com/news/morning-mix/wp/2015/05/22/salmonella-outbreak-linked
17 A. C. Kimura, V. Reddy, R. Marcus, P. R. Cieslak,
J. C. Mohle-Boetani, H. D. Kassenborg, S. D. Segler,
F. P. Harnett, T. Barrett, and D. L. Swerdlow, "Chicken
consumption is a newly identified risk factor for sporadic
Salmonella enterica serotype Enteritidis infections in the
United States: A case-control study in FoodNet sites," Clin.
Infect. Diseases, vol. 38, suppl. 3, pp. S244–S252, 2004.
18 V. Ferreira, M. Wiedmann, P. Teixeira, and M. J. Stasiewicz,
"Listeria monocytogenes persistence in food-associated
environments: Epidemiology, strain characteristics, and
implications for public health," J. Food Protect., vol. 77,
pp. 150–70, 2014.
19 M. Begley and C. Hill, "Stress adaptation in foodborne
pathogens," Annu. Rev. Food Sci. Technol., vol. 6, pp. 191–210,
2015.
20 B. J. Shapiro, J. Friedman, O. X. Cordero, S. P. Preheim,
S. C. Timberlake, G. Szabó, M. F. Polz, and E. J. Alm,
"Population genomics of early events in the ecological
differentiation of bacteria," Science, vol. 336, pp. 48–51, 2012.
21 D. M. Heithoff, W. R. Shimp, J. K. House, Y. Xie,
B. C. Weimer, R. L. Sinsheimer, and M. J. Mahan, "Intraspecies
variation in the emergence of hyperinfectious bacterial
strains in nature," PLoS Pathogen, vol. 8, no. 4, 2012,
Art. no. e1002647.
22 P. Chen, R. Jeannotte, and B. C. Weimer, "Exploring bacterial
epigenomics in the NGS era: A new approach for an emerging
frontier," Trends Microbiol., vol. 22, pp. 292–300, 2014.
23 X. Deng, P. T. Desai, H. C. den Bakker, M. Mikoleit, B. Tolar,
E. Trees, R. S. Hendriksen, J. Frye, S. Porwollik, B. C. Weimer,
M. Wiedmann, G. M. Weinstock, M. McClelland, and
P. I. Fields, "Genomic epidemiology of Salmonella enterica
serotype Enteritidis based on population structure of prevalent
lineages," Emerg. Infect. Diseases, vol. 20, no. 9,
pp. 1481–1489, 2014.
24 J. Shah, P. Desai, D. Chen, J. Stevens, and B. C. Weimer,
"Proteomics of cold stress in Salmonella enterica sv. Typhimurium
LT2," Appl. Environ. Microbiol., vol. 79,
pp. 7281–7289, 2014.
25 D. Wu, P. Hugenholtz, K. Mavromatis, R. Pukall, E. Dalin,
N. N. Ivanova, V. Kunin, L. Goodwin, M. Wu, and J. A. Eisen,
"A phylogeny-driven genomic encyclopedia of Bacteria and
Archaea," Nature, vol. 462, pp. 1056–1060, 2009.
26 A. Jacobsen, R. S. Hendriksen, F. M. Aarestrup, D. W. Ussery,
and C. Friis, "The Salmonella enterica pan-genome," Microb.
Ecol., vol. 62, pp. 487–504, 2011.
27 C. H. Lüdeke, N. Kong, B. C. Weimer, M. Fischer, and
J. L. Jones, "Complete genome sequences of a clinical and an
environmental Vibrio parahaemolyticus isolate," Genome
Announce., vol. 3, no. 2, 2015, Art. no. e00216.
28 D. B. Storey, A. M. Weis, N. Kong, A. K. Townsend,
W. A. Miller, B. A. Byrne, C. C. Taff, B. Gilpin, C. Mason,
C. Fitzgerald, and B. C. Weimer, "Large-scale release of
Campylobacter draft genomes; resources for food safety and
public health from the 100K Pathogen Genome Project," 100K
Project Bioproject, 2015. [Online]. Available:
http://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA186441
29 D. B. Storey, N. Arabyan, W. Ng, K. Thao, P. Chen, N. Kong,
C. Huang, S. Fouthoui, and B. C. Weimer, "Large-scale
release of Salmonella draft genomes; resources for food safety
and public health from the 100K Pathogen Genome Project," 2015.
[Online]. Available:
http://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA186441
30 S. A. Jackson, M. L. Kotewicz, I. R. Patel, D. W. Lacher,
J. Gangiredla, and C. A. Elkins, "Rapid genomic-scale analysis
of Escherichia coli O104:H4 by using high-resolution alternative
methods to next-generation sequencing," Appl. Environ.
Microbiol., vol. 78, pp. 1601–1605, 2012.
31 A. Mellmann, D. Harmsen, C. A. Cummings, E. B. Zentz,
S. R. Leopold, A. Rico, K. Prior, R. Szczepanowski, Y. Ji,
W. Zhang, S. F. McLaughlin, J. K. Henkhaus, B. Leopold,
M. Bielaszewska, R. Prager, P. M. Brzoska, R. L. Moore,
S. Guenther, J. M. Rothberg, and H. Karch, "Prospective
genomic characterization of the German enterohemorrhagic
Escherichia coli O104:H4 outbreak by rapid next generation
sequencing technology," PLoS One, vol. 6, no. 7, 2011,
Art. no. e22751.
32 Y. H. Grad, M. Lipsitch, M. Feldgarden, H. M. Arachi,
G. C. Cerqueira, M. FitzGerald, B. J. Haas, C. I. Murphy,
C. Russ, S. Sykes, B. J. Walker, J. R. Wortman, S. Young,
Q. Zeng, A. Abouelleil, J. Bochicchio, S. Chauvin, T. DeSmet,
S. Gujja, C. McCowan, A. Montmayeur, S. Steelman,
J. Frimodt-Moller, A. M. Petersen, C. Struve, K. A. Krogfelt,
E. Bingen, F.-X. Weill, E. S. Lander, C. Nusbaum,
B. W. Birren, D. T. Hung, and W. P. Hanage, "Genomic
epidemiology of the Escherichia coli O104:H4 outbreaks in
Europe, 2011," Proc. Nat. Acad. Sci., vol. 109, no. 8,
pp. 3065–3070, 2012.
33 D. A. Rasko, M. J. Rosovitz, G. S. Myers, E. F. Mongodin,
W. F. Fricke, P. Gajer, J. Crabtree, M. Sebaihia, N. R. Thomson,
R. Chaudhuri, I. R. Henderson, V. Sperandio, and J. Ravel,
"The pangenome structure of Escherichia coli: Comparative
genomic analysis of E. coli commensal and pathogenic isolates,"
J. Bacteriol., vol. 190, pp. 6881–6893, 2008.
34 I. Siró, E. Kápolna, B. Kápolna, and A. Lugasi, "Functional
food. Product development, marketing and consumer
acceptance: A review," Appetite, vol. 51, pp. 456–467, 2008.
35 B. Ganesan, C. Brothersen, and D. J. McMahon, "Fortification
of foods with omega-3 polyunsaturated fatty acids," Critical Rev.
Food Sci. Nutrition, vol. 54, no. 1, pp. 98–114, 2014.
36 A. Barzegari and A. A. Saei, "Designing probiotics with respect
to the native microbiome," Future Microbiol., vol. 7, no. 5,
pp. 571–575, 2012.
37 B. Ganesan, P. Dobrowolski, and B. C. Weimer, "Identification
of the Leucine-to-2-Methylbutyric Acid Catabolic Pathway of
Lactococcus lactis," Appl. Environ. Microbiol., vol. 72,
pp. 4264–4273, 2006.
38 B. Ganesan, M. Stuart, and B. C. Weimer, "Carbohydrate
starvation causes a metabolically active but nonculturable state in
Lactococcus lactis," Appl. Environ. Microbiol., vol. 73,
pp. 2498–2512, 2007.
39 J. Ferreyra, K. J. Wu, A. J. Hrykowian, D. M. Bouley,
B. C. Weimer, and J. L. Sonnenburg, "Gut microbiota-produced
succinate promotes Clostridium difficile infection after antibiotics
or motility disturbance," Cell Host Microbe, vol. 16, no. 6,
pp. 770–777, 2014.
40 A. Marcobal, M. Barboza, E. D. Sonnenburg, E. Martens,
P. Desai, C. Lebrilla, B. C. Weimer, D. A. Mills, B. German,
and J. L. Sonnenburg, "Bacteroides in the Infant Gut Consume
Milk Oligosaccharides via Mucus-Utilization Pathways," Cell
Host Microbe, vol. 10, pp. 507–14, 2011.
41 E. A. Maga, B. C. Weimer, and J. D. Murray, "Dissecting the
role of milk components on gut microbiota composition,"
Gut Microbes, vol. 4, no. 2, pp. 136–139, 2013.
42 E. A. Maga, P. Desai, B. C. Weimer, N. Dao, D. Küeltz, and
J. D. Murray, "Consumption of lysozyme-rich milk can alter
microbial fecal populations," Appl. Environ. Microbiol., vol. 78,
no. 17, pp. 6153–6160, 2012.
43 M. Barboza, J. W. Froehlich, J. Pinzon, I. Moeller, B. Lonnerdal,
J. B. German, B. C. Weimer, and C. B. Lebrilla, "Glycosylation
of human milk lactoferrin exhibits dynamic changes during
early lactation enhancing its role in pathogenic bacteria-host
interactions," Molecular Cellular Proteomics, vol. 11, no. 6,
2012, Art. no. 015248.
44 T. T. Nieminen, K. Koskinen, P. Laine, J. Hultman, E. Säde,
L. Paulin, A. Paloranta, P. Johansson, J. Björkroth, and
P. Auvinen, "Comparison of microbial communities in marinated
and unmarinated broiler meat by metagenomics," Int. J. Food
Microbiol., vol. 157, no. 2, pp. 142–149, 2012.
45 M. Powell, W. Schlosser, and E. Ebel, "Considering the
complexity of microbial community dynamics in food safety risk
assessment," Int. J. Food Microbiol., vol. 2, pp. 171–179, 2004.
46 S. van Hijum, E. E. Vaughan, and R. F. Vogel, "Application of
state-of-art sequencing technologies to indigenous food
fermentations," Current Opinion Biotechnol., vol. 24, no. 2,
pp. 178–186, 2013.
47 S. Weckx, R. Van der Meulen, J. Allermeersch, G. Huys,
P. Vandamme, and P. Van Hummelen, "Community dynamics
of bacteria in sourdough fermentations as revealed by their
metatranscriptome," Appl. Environ. Microbiol., vol. 76, no. 16,
pp. 5402–5408, 2010.
48 M. M. Leimena, J. Ramiro-Garcia, M. Davids, B. van den
Bogert, H. Smidt, E. J. Smid, J. Boekhort, E. G. Zoendal,
P. J. Schapp, and M. Kleerebezem, "A comprehensive
metatranscriptome analysis pipeline and its validation using
human small intestine microbiota datasets," BMC Genomics,
vol. 14, no. 530, pp. 2–14, 2013.
49 K. Makarova, A. Slesarev, Y. Wolf, A. Sorokin, B. Mirkin,
E. Koonin, and D. Mills, "Comparative genomics of the lactic
acid bacteria," Proc. Nat. Acad. Sci., vol. 103, no. 42,
pp. 15611–15616, 2006.
50 R. E. Timme, M. W. Allard, Y. Lao, E. Strain, J. Pettengill,
C. Want, C. Li, C. E. Keys, J. Zheng, R. Stones, M. R. Wilson,
S. M. Musser, and E. W. Brown, "Draft genome sequences of
21 Salmonella enterica serovar Enteritidis strains," J. Bacteriol.,
vol. 194, no. 21, pp. 5994–5995, 2012.
51 M. L. Land, D. Hyatt, S. R. Jun, G. H. Kora, L. J. Hauser,
O. Lukjancenko, and D. W. Ussery, "Quality scores for
32,000 genomes," Stds. Genomic Sci., vol. 9, no. 1, p. 20,
2014.
52 R. H. MacArthur and E. O. Wilson, The Theory of Island
Biogeography. Princeton, NJ, USA: Princeton Univ. Press, 1967.
53 J. T. Simpson, K. Wong, S. D. Jackman, J. E. Schein,
S. J. M. Jones, and İ. Birol, "ABySS: A parallel assembler for
short read sequence data," Genome Res., vol. 19, no. 6,
pp. 1117–1123, 2009.
54 T. Seemann, "Prokka: Rapid prokaryotic genome annotation,"
Bioinformatics, vol. 30, no. 14, pp. 2068–2069, Jul. 2014.
[Online]. Available: http://www.ncbi.nlm.nih.gov/pubmed/24642063
55 E. Afshinnekoo, C. Meydan, S. Chowdhury, D. Jaroudi,
C. Boyer, N. Bernstein, J. M. Maritz, D. Reeves, J. Gandara,
and C. E. Mason, "Geospatial resolution of human and bacterial
diversity with city-scale metagenomics," Cell Syst., vol. 1, no. 1,
pp. 72–87, 2015.
56 D. Ercolini, "High-throughput sequencing and metagenomics:
Moving forward in the culture-independent analysis of food
microbial ecology," Appl. Environ. Microbiol., vol. 79, no. 10,
pp. 3148–3155, 2013.
57 X. He, D. O. Mishchuk, J. Shah, B. C. Weimer, and
C. M. Slupsky, "Cross-talk between two E. coli strains and a
human colorectal adenocarcinoma-derived cell line," Sci. Rep.,
vol. 3, pp. 3416–3426, 2013.
58 S. Edlund, D. Chambliss, J. Kaufman, D. B. Storey, and
B. Weimer, "A scalable platform for meta-genomic analysis,"
presented at the 143rd American Public Health Association
(APHA) Annual Meeting and Exposition, Oct. 31–Nov. 4, 2015.
[Online]. Available:
https://apha.confex.com/apha/143am/webprogram/Paper334559.html
59 S. B. Edlund, K. L. Beck, N. Haiminen, L. P. Parida, D. B. Storey,
B. C. Weimer, J. H. Kaufman, and D. D. Chambliss, "Design of
the MCAW compute service for food safety bioinformatics,"
IBM J. Res. & Dev., vol. 60, no. 5/6, Paper 2, pp. 2:1–2:12, 2016
(this issue).
Received October 29, 2015; accepted for publication
December 1, 2015
Bart C. Weimer Davis School of Veterinary Medicine,
University of California, Davis, CA 95616 USA (bcweimer@ucdavis.edu).
Dr. Weimer is Professor of Microbiology at the
University of California, Davis. His laboratory focuses on microbial
physiology and function using systems biology approaches
concerning food, animals, and the environment. He has special
interest in host/microbe interactions and the microbiome. His
group leads the 100K Pathogen Genome Sequencing Project that
is creating a reference database of 100,000 bacterial pathogens
associated with food, animals, and humans, with the express purpose
of enabling advanced genomics in public health. His group has
published over 110 peer-reviewed scientific papers and book
chapters, been awarded six patents, authored three books, and
mentored over 30 graduate students.
Dylan Bobby Storey Davis School of Veterinary Medicine,
University of California, Davis, CA 95616 USA (dylan.storey@gmail.com).
Dr. Storey is a postdoctoral fellow at the University of
California, Davis. He received B.Sc. and M.Sc. degrees in biology
from California State University, Fresno in 2006 and 2011,
respectively, and a Ph.D. degree in life sciences from the University
of Tennessee, Knoxville in 2014, with fellowships from the National
Science Foundation and the U.S. Department of Agriculture to
conduct research at the intersections of computer science,
mathematics, and biology. His research focuses on the application
of bioinformatic and data-science techniques to large biological
datasets, including integration of large-scale "-omic" data types.
Christopher A. Elkins
Center for Food Safety and Applied Nutrition (CFSAN), Food and Drug Administration (FDA), Laurel, MD 20708 USA (chris.elkins@fda.hhs.gov). Dr. Elkins is Director of the Division of Molecular Biology in the Office of Applied Research and Safety Assessment at CFSAN, FDA. He received a B.A. degree in biology and history from Case Western Reserve University and a Ph.D. degree in microbiology from the University of Tennessee, Knoxville, and then served as a postdoctoral fellow at the University of California, Berkeley. At the FDA, he served as a staff microbiologist and principal investigator at FDA's National Center for Toxicological Research. His research interests include enteric microbiology and antimicrobial resistance mechanisms; his current research directions include genomic-scale analysis of probiotics and enteric foodborne pathogens, and metagenomic methods to advance food safety. Dr. Elkins is a member of the American Society for Microbiology and serves as an Editor of Applied and Environmental Microbiology.
1 : 12 B. C. WEIMER ET AL. IBM J. RES. & DEV. VOL. 60 NO. 5/6 PAPER 1 SEPTEMBER/NOVEMBER 2016
Robert C. Baker
Mars Global Food Safety Center, Mars Incorporated, McLean, VA 22101 USA (robert.c.baker@effem.com). Mr. Baker is Director of Mars Global Food Safety Center and Global Head of Technical Food Safety Development for Mars, Incorporated. He started his career in the pharmaceutical industry and moved to the food industry in 1987 as a microbiologist at Mars. Before his current position, Mr. Baker was the Global Head of Food Safety for Mars, Incorporated, accountable for the development and execution of the Corporate Food Safety Management strategy. He received his B.S. degree in microbiology from Fairleigh Dickinson University and his M.S. degree in food science from Rutgers University. Mr. Baker is a registered microbiologist and a member of several professional organizations.
Peter Markwell
Food Safety Science, Mars Incorporated, McLean, VA 22101 USA (peter.markwell@effem.com). Mr. Markwell is Corporate Head of Food Safety Science for Mars, Incorporated, where he leads Food Safety Science research projects. These projects span four strategic platforms: pathogen management, mycotoxin management, raw material integrity management, and transforming food safety through data integration. Mr. Markwell has been with Mars, Incorporated, for 30 years. During his career, he has led a wide range of research programs and published more than 150 papers and research abstracts. He has lectured to academic and other audiences in more than 30 countries.
David D. Chambliss
IBM Research, Almaden Research Center, San Jose, CA 95120 USA (chamb@us.ibm.com). Dr. Chambliss is a Research Staff Member in the Accelerated Discovery Lab at the IBM Almaden Research Center. He received a Ph.D. degree in applied physics from Cornell University in 1989, an M.A. degree from Cambridge University in 1983, and a B.S.E. degree from Princeton University in 1981. He leads the development of the Metagenomics Computation and Analytics Workbench (MCAW). Dr. Chambliss has led projects in storage systems, such as deduplication, quality of service, access pattern analytics, and declustered RAID (Redundant Array of Independent Disks). He has also conducted research in the physics of surfaces, with emphasis on scanning tunneling microscope studies of epitaxial metal growth.
Stefan B. Edlund
IBM Research, Almaden Research Center, San Jose, CA 95120 USA (sedlund@us.ibm.com). Mr. Edlund is a Senior Software Engineer in the Industrial and Applied Genomics team at IBM Research - Almaden. He has over 15 years of experience in IBM Research and is currently performing research in food safety and bioinformatics, where he is designing a workbench for running (meta)genomic pipelines and analyzing their outputs. Mr. Edlund holds an M.S. degree in computer science from the Royal Institute of Technology in Stockholm, and he is a member of the Association for Computing Machinery.
James H. Kaufman
IBM Research, Almaden Research Center, San Jose, CA 95120 USA (jhkauf@us.ibm.com). Dr. Kaufman is a scientist in the Advanced Discovery Laboratory at the IBM Almaden Research Center in San Jose, California. He is currently principal investigator for the Sequence the Food Supply Chain Consortium (SFSCC), a new project exploring metagenomics for food safety. He is also an Eclipse project co-lead for the SpatioTemporal Epidemiological Modeler (http://www.eclipse.org/stem). Dr. Kaufman received his B.A. degree in physics from Cornell University and his Ph.D. degree in physics from the University of California, Santa Barbara. He is a Fellow of the American Physical Society, a Distinguished Scientist of the Association for Computing Machinery, and an IBM Distinguished Research Staff Member. During his career with IBM Research, Dr. Kaufman has made contributions in diverse fields including simulation science, magnetic device technology, pattern formation, conducting polymers, diamond-like carbon, superconductivity, experimental studies of the Moon Illusion, distributed computing, privacy protection, and grid middleware.
... The characterization of HPP food microbiomes leveraged currently accepted public reference databases, yet these databases are known to be still inadequate [1,2,11,56,57]. Furthermore, when considering congruence between Salmonella culturability and NGS read mapping techniques, the genetic breadth and depth of multi-genome reference sequences are essential. ...
In this work, we hypothesized that shifts in the food microbiome can be used as an indicator of unexpected contaminants or environmental changes. To test this hypothesis, we sequenced the total RNA of 31 high protein powder (HPP) samples of poultry meal pet food ingredients. We developed a microbiome analysis pipeline employing a key eukaryotic matrix filtering step that improved microbe detection specificity to >99.96% during in silico validation. The pipeline identified 119 microbial genera per HPP sample on average with 65 genera present in all samples. The most abundant of these were Bacteroides, Clostridium, Lactococcus, Aeromonas, and Citrobacter. We also observed shifts in the microbial community corresponding to ingredient composition differences. When comparing culture-based results for Salmonella with total RNA sequencing, we found that Salmonella growth did not correlate with multiple sequence analyses. We conclude that microbiome sequencing is useful to characterize complex food microbial communities, while additional work is required for predicting specific species' viability from total RNA sequencing.
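The two community summaries reported above, the mean number of genera per sample and the set of genera present in every sample, can be computed from per-read classifications along the lines of the following minimal sketch. It assumes a hypothetical input of `(sample_id, domain, genus)` tuples from an upstream classifier and approximates the eukaryotic matrix filter by simply discarding reads labelled `Eukaryota`; it is not the published pipeline, whose filtering method is not detailed here.

```python
from statistics import mean


def genera_per_sample(assignments, excluded_domains=frozenset({"Eukaryota"})):
    """Map each sample to the set of genera observed in it.

    `assignments` is an iterable of hypothetical (sample_id, domain, genus)
    tuples. Reads whose domain is in `excluded_domains` are dropped, a
    simplified stand-in for filtering out food-matrix (host) reads.
    """
    per_sample = {}
    for sample_id, domain, genus in assignments:
        genera = per_sample.setdefault(sample_id, set())  # keep empty samples
        if domain not in excluded_domains:
            genera.add(genus)
    return per_sample


def richness_and_core(per_sample):
    """Return (mean genera per sample, genera present in every sample)."""
    if not per_sample:
        return 0.0, set()
    mean_richness = mean(len(g) for g in per_sample.values())
    core = set.intersection(*per_sample.values())
    return mean_richness, core


if __name__ == "__main__":
    reads = [
        ("HPP-01", "Bacteria", "Bacteroides"),
        ("HPP-01", "Bacteria", "Clostridium"),
        ("HPP-01", "Eukaryota", "Gallus"),  # poultry matrix read, filtered out
        ("HPP-02", "Bacteria", "Bacteroides"),
        ("HPP-02", "Bacteria", "Aeromonas"),
    ]
    richness, core = richness_and_core(genera_per_sample(reads))
    print(richness, sorted(core))
```

Keeping an entry for a sample even when all of its reads are filtered out means an empty sample correctly pulls both the mean richness and the core set down, rather than silently disappearing.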
... Each lab uses different protocols and the documentation of the actual process is rarely well recorded or presented. Hence, it is paramount to expose, formalize, and standardize sampling techniques, as well as workflows for bioinformatics pipelines, processing, and data management [48][49][50] . In that way, it would be possible to compare datasets and leverage systematic authentication of the microbiome and its variation throughout the supply chain to understand microbial contamination during food production on a broader scale. ...
Microbial food spoilage is responsible for a considerable amount of waste and can cause food-borne diseases in humans, particularly in immunocompromised individuals and children. Therefore, preventing microbial food spoilage is a major concern for health authorities, regulators, consumers, and the food industry. However, the contamination of food products is difficult to control because there are several potential sources during production, processing, storage, distribution, and consumption, where microorganisms come in contact with the product. Here, we use high-throughput full-length 16S rRNA gene sequencing to provide insights into bacterial community structure throughout a pork-processing plant. Specifically, we investigated what proportion of bacteria on meat are presumptively not animal-associated and are therefore transferred during cutting via personnel, equipment, machines, or the slaughter environment. We then created a facility-specific transmission map of bacterial flow, which predicted previously unknown sources of bacterial contamination. This allowed us to pinpoint specific taxa to particular environmental sources and provide the facility with essential information for targeted disinfection. For example, Moraxella spp., a prominent meat spoilage organism, which was one of the most abundant amplicon sequence variants (ASVs) detected on the meat, was most likely transferred from the gloves of employees, a railing at the classification step, and the polishing tunnel whips. Our results suggest that high-throughput full-length 16S rRNA gene sequencing has great potential in food monitoring applications.
... These advantages, combined with downstream inspection of the prioritized rankings, further power biological discovery to bring insightful observations about the genome and the phenotype, especially when large genome populations are used in the analysis, from very divergent populations of alleles. To extend this concept, highly divergent sequences with similar function that are missed with automated gene calling approaches can be brought back into biological relevance, especially if gene mutations are tracked as new genes as was done by Weimer et al. [4] and Kaufman et al. [5][6][7]. ...
High-dimensional data generated from bacterial whole-genome sequencing is providing an unprecedented scale of information that requires an appropriate statistical analysis framework to infer biological function from populations of genomes. The application of genome-wide association study (GWAS) methods is an appropriate framework for bacterial population genome analysis that yields a list of candidate genes associated with a phenotype, but it provides an unranked measure of importance. Here, we validated a novel framework to define infection mechanism using the combination of GWAS, machine learning, and bacterial population genomics that ranked allelic variants that accurately identified disease. This approach parsed a dataset of 1.2 million single nucleotide polymorphisms (SNPs) and indels that resulted in an importance ranked list of associated alleles of porA in Campylobacter jejuni using spatiotemporal analysis over 30 years. We validated this approach using previously proven laboratory experimental alleles from an in vivo guinea pig abortion model. This framework, termed PathML, defined intestinal and extraintestinal groups that have differential allelic porA variants that cause abortion. Divergent variants containing indels that defeated automated annotation were rescued using biological context and knowledge that resulted in defining rare, divergent variants that were maintained in the population over two continents and 30 years. This study defines the capability of machine learning coupled with GWAS and population genomics to simultaneously identify and rank alleles to define their role in infectious disease mechanisms.
... The introduction of machine learning and other computational techniques to the fields of biology and medicine has revolutionized the way research can be conducted in these disciplines [2]. Due to machine learning, researchers are now able to leverage the power of data to identify patterns that can potentially help solve important problems, such as antimicrobial resistance (AR) [3][4][5][6][7][8][9] and detecting food hazards [10][11][12][13][14]. Computational advances in these areas have also led to the emergence of consumer-centric industries. ...
Computational learning methods allow researchers to make predictions, draw inferences, and automate the generation of mathematical models. These models are crucial to solving real-world problems, such as antimicrobial resistance, pathogen detection, and protein evolution. Machine learning methods depend upon ground truth data to achieve specificity and sensitivity. Because these data are limited in this case, as we show in this paper, and because the size of available data increases super-linearly, it is of paramount importance to understand the distribution of ground truth data, the analyses it is suited for, and where it may have limitations that bias downstream learning methods. In this paper, we focus on the training data required to model antimicrobial resistance (AR). We report an analysis of bacterial biochemical assay data associated with whole genome sequencing (WGS) from the National Center for Biotechnology Information (NCBI), and discuss important implications of using assay data and genetic features as training data for machine learning models. A complete discussion of machine learning model implementation is outside the scope of this paper and the subject of a later publication. The antimicrobial assay data were obtained from NCBI BioSample, which contains descriptive information about the physical biological specimen from which experimental data are obtained, as well as the results of those experiments themselves.[1] Assay data include minimum inhibitory concentrations (MIC) of antibiotics, links to associated microbial WGS data, and treatment of a particular microorganism with antibiotics. We observe that minimal microbial data are available for many antibiotics and for targeted taxonomic groups. The antibiotics with the highest number of assays have fewer than 1500 measurements each.
The corresponding bias in available assays makes machine learning problematic for some important microbes and for building more advanced models that can work across microbial genera. In this study, we therefore focus on the antibiotic with the most assay data (tetracycline) and the genus with the most available sequence (Acinetobacter, with 14000 measurements across 49 antibiotic compounds). Using these data for training and testing, we observed contradictions in the distribution of assay outcomes and report methods to identify and resolve such conflicts. Per antibiotic, we find that up to 30% of measurements can be conflicting, though resolvable. As more data become available, automated training data curation will be an important part of creating useful machine learning models to predict antibiotic resistance. CCS CONCEPTS • Applied computing → Computational biology; Computational genomics; Bioinformatics;
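Identifying conflicting assay outcomes, the curation step described above, can be sketched by grouping measurements by sample and antibiotic and flagging any group with more than one distinct interpretation. This is a minimal sketch under stated assumptions: the `(biosample_id, antibiotic, outcome)` tuples and the "S"/"R" calls are hypothetical, conflicts are counted per (sample, antibiotic) group rather than per individual measurement (one possible definition among several), and the abstract's resolution method is not reproduced.

```python
from collections import defaultdict


def conflict_fractions(measurements):
    """Per antibiotic, the fraction of (biosample, antibiotic) groups whose
    repeated measurements disagree.

    `measurements` is an iterable of hypothetical
    (biosample_id, antibiotic, outcome) tuples, where `outcome` is an
    interpretive call such as "S" (susceptible) or "R" (resistant).
    """
    outcomes = defaultdict(set)  # (biosample, antibiotic) -> distinct outcomes
    for biosample_id, antibiotic, outcome in measurements:
        outcomes[(biosample_id, antibiotic)].add(outcome)

    groups = defaultdict(int)
    conflicts = defaultdict(int)
    for (_, antibiotic), seen in outcomes.items():
        groups[antibiotic] += 1
        if len(seen) > 1:  # more than one distinct call for the same sample
            conflicts[antibiotic] += 1
    return {abx: conflicts[abx] / groups[abx] for abx in groups}


if __name__ == "__main__":
    data = [
        ("SAMN-A", "tetracycline", "R"),
        ("SAMN-A", "tetracycline", "S"),  # disagrees with the row above
        ("SAMN-B", "tetracycline", "R"),
        ("SAMN-B", "tetracycline", "R"),
        ("SAMN-A", "ampicillin", "S"),
    ]
    print(conflict_fractions(data))
```

Sets make duplicate agreeing measurements collapse automatically, so only genuine disagreements are counted as conflicts.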
... Even though standardized techniques and a well-curated, high-quality database of genomic sequences for pathogenic and functional microbes and probiotics are needed to implement NGS methods for food safety management [98], groups such as the Consortium for Sequencing the Food Supply Chain (CSFSC) are working to characterize and quantify the microbiome before and after processing, as well as to collect genome information on pathogenic bacteria across the food supply chain, to assure food safety, traceability, and authenticity [99]. ...
Undoubtedly, the food industry is undergoing a dynamic transformation as it develops to meet the requirements, and solve the great problems, posed by a constantly growing global population that demands food in both quantity and quality. In this sense, it is necessary to evaluate the technological trends and advances that will change the landscape of the food processing industry, highlighting the latest requirements for equipment functionality. In particular, it is crucial to evaluate the influence of sustainable, green, biotechnology-based technologies in consolidating the food industry of the future today, and this must be done by analyzing the mega-consumption trends that shape the future of the industry, which range from local sourcing to on-the-go food to an increase in organic foods and clean labels (ingredients on food labels that consumers can understand). While these trends may seem alien to food manufacturing, they have a considerable influence on the way products are manufactured. This paper reviews the conditions of the food industry in detail and, in particular, analyzes the application of emerging technologies in food preservation, extraction of bioactive compounds, bioengineering tools, and other bio-based strategies for the development of the food industry.
The implementation of omics technologies and associated bioinformatics approaches holds significant promise for generating additional evidence for food and feed risk assessments, thereby enhancing the European Food Safety Authority's (EFSA) capacity to deliver scientific opinions and guidance documents in the future. To explore this possibility, EFSA launched a Call for the development of a roadmap to identify the main actions needed for a wider use of Omics in future risk assessments. To address this objective, this action roadmap outlines six project proposals. These proposals are based on a comprehensive mapping of state-of-the-art omics and associated bioinformatics technologies in research, EFSA's activities, and current and planned activities of other relevant regulatory bodies and organisations. The outlined recommendations also address some of the main knowledge gaps identified and highlight the added value that further investment in the different food and feed safety scientific domains could bring. In addition, this roadmap addresses some key challenges and blockers that might hinder a wider integration of omics in risk assessment and leverages the opportunities for cooperation with external stakeholders. Finally, this roadmap provides suggestions on how EFSA may more broadly and effectively engage with relevant stakeholders in the use of omics technologies and associated bioinformatics approaches in regulatory science.
Nanotechnology is one of the most innovative techniques in the food and beverage sector. Applications of nanotechnology in many sectors of the food business have expanded as research into using nanotechnology in food science has progressed. The food sector can use nanotechnology for food production, food safety and processing, packaging, and quality control. Unlike traditional microscale substances, nanomaterials with unique properties can enhance the sensory properties of food by contributing new texture, color, and appearance. Nanotechnology has been used to develop nanosensors for detecting hazardous components in food, as well as smart packaging systems that can identify foodborne illness quickly and accurately. This review focuses on current advances in nanotechnology applied to the food and beverage sector, from use and production to packaging and safety. Keywords: Nanotechnology; Application; Food & Beverage Sector; Nanomaterial component.
In this chapter, we examine how we might go beyond common practice within disciplinary domains to explore new ground in three systemic contexts of resilience and community capacity building: health, urban systems, food systems and how those might impact urban design. We then propose how a transdisciplinary perspective might integrate all three into a novel design approach for resilience capacity building, an approach we call Design + Health.
High-throughput sequencing could become a powerful tool in food safety. This study was the first to investigate artisanal cheeses from Belgium (31 batches) using metagenetics, in relation to Listeria monocytogenes growth data acquired during a previous project. Five cheese types were considered, namely unripened acid-curd cheeses, smear- and mold-ripened soft cheeses, and Gouda-type and Saint-Paulin-type cheeses. Each batch was analyzed in triplicate on the first and last days of storage at 8 °C. Globally, 2697 OTUs belonging to 277 genera and to 15 phyla were identified. Lactococcus was dominant in all types, but Streptococcus was co-dominant in smear-ripened soft cheeses and Saint-Paulin-type cheeses. The dominant population was not always associated with added starter cultures. Bacterial richness and diversity were significantly higher in both types of soft cheeses than in other categories, with particular genera such as Prevotella, Faecalibacterium, and Hafnia-Obesumbacterium in mold-ripened cheeses and Brevibacterium, Brachybacterium, Microbacterium, Bacteroides, Corynebacterium, Marinilactibacillus, Fusobacterium, Halomonas, and Psychrobacter in smear-ripened soft cheeses. A strong correlation was observed between no growth of L. monocytogenes in a smear-ripened cheese and the presence of an unknown Fusobacterium (relative abundance around 10%). This in silico correlation should be confirmed by further experiments in vitro and in situ.
Campylobacter is a food-associated bacterium and a leading cause of foodborne illness worldwide, being associated with poultry in the food supply. This is the initial public release of 202 Campylobacter genome sequences as part of the 100K Pathogen Genome Project. These isolates represent global genomic diversity in the Campylobacter genus.
The techniques of microbe community genome sequencing as applied to environmental samples - metagenomics - offer powerful insight into microbial community structure and ecology that can affect food safety decisions for public health security. In this paper, the design and characteristics of a new informatics service, the Metagenomics Computation and Analytics Workbench (MCAW), are presented and illustrated with reference to the analysis of metagenomics data. The service is designed to meet the requirements for analyzing metagenomic and metatranscriptomic sequence data to assess microbial hazards and food authentication in the supply chain. Moreover, MCAW provides for reliable storage and management of raw genomic sequences and analysis results, high-volume informatics processing, meticulous tracking of data provenance and processing steps, and function-rich visualization of results.
Millions of people consume dietary supplements either following a doctor’s recommendation or at their own discretion to improve their overall health and well-being. This is a rapidly growing trend, with an associated and expanding manufacturing industry to meet the demand for new health-related products. In this study, we examined the contents and microbial viability of several popular probiotic products on the United States market. Culture-independent methods are proving ideal for fast and efficient analysis of foodborne pathogens and their associated microbial communities but may also be relevant for analyzing probiotics containing mixed microbial constituents. These products were subjected to next-generation whole-genome sequencing and analyzed by a custom in-house-developed k-mer counting method to validate manufacturer label information. In addition, the batch variability of respective products was examined to determine if any changes in their formulations and/or the manufacturing process occurred. Overall, the products we tested adhered to the ingredient claims and lot-to-lot differences were minimal. However, there were a few discrepancies in the naming of closely related Lactobacillus and Bifidobacterium species, whereas one product contained an apparent Enterococcus contaminant in two of its three lots. With the microbial contents of the products identified, we used traditional PCR and colony counting methods to comparatively assess our results and verify the viability of the microbes in these products with regard to the labeling claims. Of all the supplements examined, only one was found to be inaccurate in viability. Our use of next-generation sequencing as an analytical tool clearly demonstrated its utility for quickly analyzing commercially available products containing multiple microbes to ensure consumer safety. IMPORTANCE The rapidly growing supplement industry operates without a formal premarket approval process. 
Consumers rely on product labels to be accurate and true. Those products containing live microbials report both identity and viability on most product labels. This study used next-generation sequencing technology as an analytical tool in conjunction with classic culture methods to examine the validity of the labels on supplement products containing live microbials found in the United States marketplace. Our results show the importance of testing these products for identity, viability, and potential contaminants, as well as introduce a new culture-independent diagnostic approach for testing these products.
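The study above validated label claims with a custom in-house k-mer counting method whose details are not given here. The general idea can be illustrated with a simple k-mer containment check: for each organism named on a label, measure what fraction of its reference k-mers appear in the sequencing reads. This is a minimal sketch, not the authors' method; the label names, reference sequences, and default `k=21` are hypothetical choices.

```python
def kmers(seq, k):
    """Set of all length-k substrings of a DNA sequence (forward strand only)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def label_support(reads, references, k=21):
    """For each labelled organism, the fraction of its reference k-mers
    that also occur somewhere in the sequencing reads.

    `references` maps a hypothetical label (e.g. a species named on the
    product) to a reference sequence. Reverse complements are ignored for
    brevity; practical tools usually count canonical k-mers instead.
    """
    read_kmers = set()
    for read in reads:
        read_kmers |= kmers(read, k)

    support = {}
    for label, ref_seq in references.items():
        ref_kmers = kmers(ref_seq, k)
        shared = len(ref_kmers & read_kmers)
        support[label] = shared / len(ref_kmers) if ref_kmers else 0.0
    return support


if __name__ == "__main__":
    refs = {"Species A": "ACGTACGGTTAC", "Species B": "TTTTGGGGCCCC"}
    reads = ["ACGTACGG", "CGGTTAC"]
    print(label_support(reads, refs, k=4))
```

A label with low support, or a large share of read k-mers that match no labelled reference (a possible contaminant such as the Enterococcus noted above), would be a signal to investigate further rather than a verdict on its own.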
Culture-independent diagnostics reduce the reliance on traditional (and slower) culture-based methodologies. Here we capitalize on advances in next generation sequencing (NGS) to apply this approach to food pathogen detection utilizing NGS as an analytical tool. In this study, spiking of spinach with Shiga toxin-producing Escherichia coli following an established FDA culture-based protocol was used in conjunction with shotgun metagenomic sequencing to determine limits of detection, sensitivity, and specificity levels as well as inform on the microbiology of the protocol. We show that an expected level of contamination (∼10 CFU/100 g) could be adequately detected (including key virulence determinants and strain-level specificity) within 8 hours of enrichment at a sequencing depth of 10,000,000 reads. We also rationalize the relative benefit of static versus shaking culture conditions and the addition of selected antimicrobial agents, thereby validating the long-standing culture-based parameters behind such protocols. Moreover, the shotgun metagenomics approach was informative regarding the dynamics of microbial communities during the enrichment process including initial surveys of microbial loads associated with bagged spinach which included key genera such as Pseudomonas, Pantoea, and Exiguobacterium. Collectively, our metagenomic study highlights and considers various parameters required for transitioning to such sequencing-based diagnostics for food safety and the potential to develop better enrichment processes in a high throughput manner not previously possible. Future studies will investigate new species-specific DNA signature target regimes, rational design of media components in concert with judicious use of additives such as antibiotics, and alterations in the sample processing protocol to enhance detection.
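The spinach study above reports sensitivity and specificity for sequencing-based detection; how those figures were tallied is not described here, but the standard definitions are sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP). The following minimal sketch computes both from hypothetical per-sample lists of ground truth (spiked or not) and test calls (target detected or not).

```python
def detection_metrics(truth, calls):
    """Return (sensitivity, specificity) from paired booleans.

    truth[i] is True if sample i was spiked with the target organism;
    calls[i] is True if the sequencing-based test reported the target.
    A metric is NaN when its denominator is zero.
    """
    if len(truth) != len(calls):
        raise ValueError("truth and calls must have the same length")
    pairs = list(zip(truth, calls))
    tp = sum(t and c for t, c in pairs)          # spiked and detected
    fn = sum(t and not c for t, c in pairs)      # spiked but missed
    tn = sum(not t and not c for t, c in pairs)  # clean and not flagged
    fp = sum(not t and c for t, c in pairs)      # clean but flagged
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


if __name__ == "__main__":
    truth = [True, True, True, True, False, False]
    calls = [True, True, True, False, False, True]
    print(detection_metrics(truth, calls))
```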
Vibrio parahaemolyticus is the leading cause of seafood-borne infections in the United States. We report complete genome sequences for two V. parahaemolyticus strains isolated in 2007, CDC_K4557 and FDA_R31 of clinical and oyster origin, respectively. These two sequences might assist in the investigation of differential virulence of this organism.
More than 80% of the microbial genomes in GenBank are of 'draft' quality (12,553 draft vs. 2,679 finished, as of October, 2013). We have examined all the microbial DNA sequences available for complete, draft, and Sequence Read Archive genomes in GenBank as well as three other major public databases, and assigned quality scores for more than 30,000 prokaryotic genome sequences. Scores were assigned using four categories: the completeness of the assembly, the presence of full-length rRNA genes, tRNA composition and the presence of a set of 102 conserved genes in prokaryotes. Most (~88%) of the genomes had quality scores of 0.8 or better and can be safely used for standard comparative genomics analysis. We compared genomes across factors that may influence the score. We found that although sequencing depth coverage of over 100x did not ensure a better score, sequencing read length was a better indicator of sequencing quality. With few exceptions, most of the 30,000 genomes have nearly all the 102 essential genes. The score can be used to set thresholds for screening data when analyzing "all published genomes" and reference data is either not available or not applicable. The scores highlighted organisms for which commonly used tools do not perform well. This information can be used to improve tools and to serve a broad group of users as more diverse organisms are sequenced. Unexpectedly, the comparison of predicted tRNAs across 15,000 high quality genomes showed that anticodons beginning with an 'A' (codons ending with a 'U') are almost non-existent, with the exception of one arginine codon (CGU); this has been noted previously in the literature for a few genomes, but not with the depth found here.
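The genome quality score described above combines four categories (assembly completeness, full-length rRNA genes, tRNA composition, and the 102 conserved genes) and uses 0.8 as a practical cut-off. The exact formula is not given in the abstract, so the following is only a minimal sketch that assumes each category has already been scored in [0, 1] and combines them with equal weights; the category keys and gene IDs are hypothetical.

```python
CATEGORIES = ("assembly", "rrna", "trna", "conserved_genes")


def conserved_gene_fraction(genes_found, expected_genes):
    """Share of the expected conserved genes (102 in the study) that were detected."""
    expected = set(expected_genes)
    return len(set(genes_found) & expected) / len(expected)


def quality_score(components, weights=None):
    """Weighted mean of the four per-category scores, each already in [0, 1].

    Equal weights are an assumption for illustration only.
    """
    if weights is None:
        weights = {c: 1.0 for c in CATEGORIES}
    missing = [c for c in CATEGORIES if c not in components]
    if missing:
        raise KeyError(f"missing categories: {missing}")
    total = sum(weights[c] for c in CATEGORIES)
    return sum(weights[c] * components[c] for c in CATEGORIES) / total


def usable_for_comparative_genomics(components, threshold=0.8):
    """Apply the 0.8 cut-off mentioned in the abstract."""
    return quality_score(components) >= threshold


if __name__ == "__main__":
    expected = [f"gene_{i}" for i in range(102)]  # hypothetical gene IDs
    genome = {
        "assembly": 0.9,
        "rrna": 1.0,
        "trna": 0.95,
        "conserved_genes": conserved_gene_fraction(expected[:97], expected),
    }
    print(round(quality_score(genome), 3), usable_for_comparative_genomics(genome))
```

Intersecting detected genes with the expected set keeps the conserved-gene component within [0, 1] even if the gene finder reports extra genes.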
Lactic acid bacteria are beneficial microbes added to many food products and dietary supplements for their purported health benefits. Proper identification of bacteria is important for assessing safety and ensuring proper product labeling. A custom microarray (FDA GutProbe) was developed to verify accurate labeling in commercial dietary supplements. Strain-specific attribution was achieved with the GutProbe array, which contains genes from the species most commonly found in probiotic supplements and food ingredients. The applied utility of the array was assessed with direct-from-product DNA hybridization to determine (1) whether multiple strains in one sample can be identified and (2) whether any lot-to-lot variation exists among eight probiotics found on the US market. GutProbe is a useful tool for identifying a mixture of microbials in probiotics and revealed some product variation. In addition, the array is able to identify lot-to-lot differences in these products. This strain-level attribution may be useful for routine monitoring of batch variation as part of a "Good Manufacturing Practices" process. The FDA GutProbe is an efficient and reliable platform for identifying microbial ingredients and determining microbial differences in dietary supplements, and a rapid method for direct community profiling or food matrix sampling.
Foodborne bacterial pathogens encounter many environmental insults or stresses during food production, processing, storage, distribution, and preparation. However, these pathogens can sense changes in their surroundings and can respond by altering gene expression. A protective response may follow that increases tolerance to one or more stresses. This phenomenon is referred to as stress adaptation and has been shown to aid in the survival of pathogens in food products and in the food processing environment. Furthermore, stress adaptation may alter the virulence properties of pathogens and can contribute to survival in vivo during infection. Elucidating the molecular mechanisms underlying stress adaptation in bacterial food pathogens is essential for the development and implementation of more effective control measures and will permit the design of optimal processing regimes that combine maximum safety with consumer demands for more fresh-like, minimally processed foods.