Defining the food microbiome for authentication, safety, and process management
B. C. Weimer
D. B. Storey
C. A. Elkins
R. C. Baker
P. Markwell
D. D. Chambliss
S. B. Edlund
J. H. Kaufman
Under intense scrutiny for safety and authenticity, our food supply
encompasses probiotic supplementation, fermentation organisms,
pathogenic bacteria, and microbial toxins—in short, the
microbiome and metabolome of food. Recent claims regarding
probiotic supplements, additives, and cultured foods highlight
the need for widely accepted protocols for evidence-based
oversight of such products, as well as specific methods to assess
their safety and authenticity. Rapid improvements in
high-throughput sequencing technologies, curated and annotated
reference databases of whole genome sequences, bacterial strain
banks, and novel informatics techniques coupled to a scalable
computing platform are poised to provide a robust solution
extendable to encompass systematic authentication of the
microbiome and its variations up and down the supply chain.
Members of the Sequence the Food Supply Chain Consortium are
working to characterize and quantify the microbiome at a baseline
and after processing. They are also working to create reference
databases and develop a Metagenomics Computation and
Analytics Workbench, capable of verifying the effectiveness of
good manufacturing practices and monitoring control measures
highlighted in a site’s Hazard Analysis Critical Control Point
plan. In this paper, we propose how microbial ecology,
evolvability, and phylogenetic diversity exhort the application
of new molecular techniques to assure safety, authenticity,
and traceability for wholesome food.
Introduction
According to the Food and Agriculture Organization [1],
by 2050 the world’s food supply must grow by 70% to
meet the increase in the world population [2]. That growth
is projected to occur primarily in tropical zones, where
agriculture is largely developing, and where unique
growing conditions are distinct from those in the current
centers for food production. This will likely lead to new
hazards in the food supply and potentially lead to large
food safety and public health issues as these items are
transported around the world. These regions also have
fewer food safety laws, and this relative lack of laws
enables opportunities for fraud and addition of bacteria
that may not normally be part of the food supply today.
As our food supply continues to grow globally, the
complexities of production, distribution, and consumption
will scale as well. The industry’s ability to measure and
ensure that food and food ingredients are authentic and
safe throughout the global food supply chain is paramount
to guarantee that the global food supply is safe,
well-regulated, and wholesome for consumers. New food
safety regulations are being implemented globally as a
result of this situation. In September 2015, for example,
the United States published the final rules for
implementing the Food Safety Modernization Act
for enabling use of new analytical methods [3]. In
October 2015, China implemented over 150 new food
safety laws that also enable new analytical capacity, as
well as a committee to oversee implementation of new
food safety laws, with members from academia and
industry. Both examples highlight the recognition
that expanding the global food supply must be anchored
in safety and quality.
Digital Object Identifier: 10.1147/JRD.2016.2582598
B. C. WEIMER ET AL. 1:1 IBM J. RES. & DEV. VOL. 60 NO. 5/6 PAPER 1 SEPTEMBER/NOVEMBER 2016
©Copyright 2016 by International Business Machines Corporation. Copying in printed form for private use is permitted without payment of royalty provided that (1) each reproduction is done without
alteration and (2) the Journal reference and IBM copyright notice are included on the first page. The title and abstract, but no other portions, of this paper may be copied by any means or distributed
royalty free without further permission by computer-based and other information-service systems. Permission to republish any other portion of this paper must be obtained from the Editor.
0018-8646/16 ©2016 IBM
Advanced techniques based in genomics are now
widely available around the world to meet the new
analytical challenges of food safety. This includes
whole genome sequencing (WGS) and, more recently,
metagenomic methods [4–6] (see Table 1 for a definition).
In Table 1, we also provide additional definitions of
some of the important terms commonly used in
discussing genomics and metagenomics. These new
technologies are often called “culture-independent”
or “cultivation-independent” methods, which refer
to molecular techniques that provide information on the
presence, identity, and amounts of microbes or genes,
independent of cultivation media, which may
favor the growth of particular microbes
(thus biasing test results).
These technologies are gaining acceptance but are not
widely implemented in the food supply chain. One recent
case in point involved an investigation of dietary herbal
supplements by the New York State Attorney General’s
office that found significant fraudulent and potentially
dangerous dietary supplements (herbal products) for sale
by major retailers [7]. In this effort, they employed
culture-independent genetic fingerprinting for product
authenticity to determine potential mislabeling and
contamination; the conclusions of this are still unclear.
The approach was similar to genetic “barcoding,” currently
used by the U.S. Food and Drug Administration (FDA)
to investigate fraud in the seafood industry. In the same
vein, the FDA is currently investigating and developing
genomic methods for “probiotic” dietary supplements
involving microarray, sequencing, and associated
bioinformatic analysis for product identification and
labeling verification [8, 9]. These entirely new approaches
are hampered primarily by availability and taxonomic
depth of whole genome sequences for organisms and
microbial communities associated with specific food
types, the effect of processing, and the induced genetic
mutations during processing. Implementation of new
methods for use in food testing is critical to meet the
demands of supply chain globalization.
Table 1 A glossary of terms and abbreviations commonly used in discussions of genomics and
sequencing.
Until recently, technologies were not sufficiently
developed to provide a scientifically sound approach that
could be integrated into reliable solutions food producers
or processors could use with confidence. Since 2011,
however, many groups that produce and regulate food
have been working to bring fundamentally new
technologies to bear that can be used routinely to monitor
the safety of the global food chain [6, 10]. Such efforts
may enable the analytical capability to address the
combination of food authenticity and safety, especially in
cases where live microbes are intentionally added and
intended for consumption. A number of countries in the
European Union, North America, and Asia are evaluating
many options for implementation to update methods for
testing, hazard identification, and molecular risk
assessment but all will implement NGS (next-generation
DNA sequencing) for public health and food safety
outbreak investigations [11]. Modernizing food safety
methods throughout the supply chain should help food
producers, ingredient distributors, processors, and
manufacturers ensure that they comply with international
standards and provide confidence that enables trade and
safe consumption with the most modern molecular
methods available. Providing the means to quickly identify
unsafe ingredients before they are integrated into the
supply chain can save billions of dollars of revenue and
minimize risks to consumers by ensuring added beneficial
microbes meet claims on the product label. To this end,
platforms using NGS technology and data for food-safety
applications, with whole genomes as a database, have
potential to improve safety across the food supply. Use of
community sequencing (as mentioned, termed
metagenomics) as well as culture-independent 16S
amplicon sequencing (often used only for studying
microbial diversity) are also being developed on a smaller
scale; both are limited by the costs associated with
generating sufficient data to allow for the robust
determination of microbial community membership
within a sample. However, efforts to study the whole
“metagenome” are growing as evidenced by group
efforts, such as the Sequence the Food Supply Chain
Consortium (SFSCC), described in greater detail later in
this paper [12–14].
After reviewing the persistence of foodborne pathogens
in the environment, we assess strategies to develop and
employ metagenomics as an analytical tool and discuss
how the same metagenomic techniques can serve to
validate probiotic content and activity, and provide
information on various food-related authentication, food
safety, and process issues critical to effective management
of the global food supply. We also describe efforts to
define and articulate the role that the evolving field of
genomics and next-generation sequencing (NGS) analytics can
play in monitoring and quality control. We conclude the
paper by describing the development of a scalable
web-based informatics software service for use in
protecting the public health.
Intersection of genomics and enteric foodborne
pathogen burden
Despite efforts to reduce foodborne outbreaks, Salmonella,
Campylobacter, enteropathogenic Escherichia coli, and
Listeria monocytogenes outbreaks continue to occur.
The largest number of cases and outbreaks arise from
Vibrio, while Salmonella and Campylobacter infections
have remained constant in recent years [15]. Recently, Listeria
cases have been increasing per capita and, in the United States,
are approaching levels seen during the 1990s [15]. These
trends signal that the extensive processes to mitigate the
persistence of pathogens in previous years are having
limited impact in curtailing foodborne illness. In part,
the globalization of the food chain brings about a new
paradigm in microbial foodborne illness. Long-distance
travel as well as expanded global distribution has fueled
the increased consumption of food from increasingly
distant locations of production. One such example is the
Salmonella outbreak in May 2015 affecting Washington,
D.C., and nine states, attributed to sushi-grade tuna.
The product originated in India and was routed through at
least three ports before it was finally served in the
Washington, D.C., area [16].
It is increasingly recognized that persistence of
pathogens in the environment is leading to linked
outbreaks over many years, which was recently
highlighted by a multi-year outbreak associated with
chicken in the western United States. In this outbreak
from the same Salmonella serotype, at least seven
genotypes were persistent in the food supply for over
five years [17].
Foodborne pathogens can adapt to multiple
environments and animal hosts, which may influence
their persistence in the food supply [18, 19]. These
adaptations occur over very short time frames, days to
weeks (not years). One example of this is found in Vibrio,
where genome evolution leads to local adaptation and
increased persistence, and is induced in response to low
pH, high salt, and temperature shifts in the ocean [20].
These same environmental factors are modified during
food processing to control pathogens in the food supply
and may lead to the evolution of resistant strains that
could become persistent within the processing chain.
Shapiro et al. [20] found that genome-scale changes occur
at a much more rapid pace than was previously
appreciated, and they concluded that this is the basis for
population-scale evolution and can be used to trace the
origin of a specific strain using NGS and WGS.
Another example of this evolution is the emergence,
worldwide, of hyper-virulent Salmonella during zoonotic
transmission [21]. Using gene expression profiling, the
Weimer lab (School of Veterinary Medicine, University of
California, Davis) demonstrated that an entire series of
changes in metabolic capabilities induced hyper-virulence of
this organism. The trait is now stable and is being linked
to the genome evolution and mutation rate using NGS
and methylome characterization [22]. Lastly, Salmonella
enterica serotype Enteritidis, usually considered a “clonal”
organism, was detected using commonly available
typing methods, such as pulsed field gel electrophoresis
(PFGE). Recently, however, Salmonella Enteritidis was
sequenced using short read technologies with isolates
from food, the environment, and a sea mammal outbreak
to find that this serotype is much more genetically diverse
than previously appreciated and that specific lineages
can be conclusively determined using NGS and WGS [23].
These specific examples demonstrate the rapid and
consistent diagnostic power of NGS methods compared to
conventional pathogen typing methods—including
PFGE, Multi-Locus Sequence Typing (MLST), and
Multiple Locus Variable-Number Tandem Repeat Analysis
(MLVA). This realization presents food safety science
new challenges involving the implementation of new
methods that rapidly detect outbreaks, produce
actionable information, and allow quick interventions
during outbreaks.
Current surveillance schemes commonly mandate
culturing microbes from subsamples of samples, usually
under conditions that constitute a stressful environment
for the pathogen. These bacterial populations can be
used for detection of outbreaks and traceability. However,
the long waiting times involved (∼10 days for the result
and 3 to 4 weeks for serotype identification) are not
meeting the demands of the current U.S. regulatory
environment, or the spirit of the Food Safety
Modernization Act [3].
Delays in pathogen detection time negatively impact
public safety and the ability of producers, processors,
retailers, and consumers to adequately limit pathogen
levels below those expected by public health authorities.
Compounding detection time delays is the mounting
evidence that conditions used to process food create
selection pressures that result in population shifts
within the community, via induced mutations [20].
Shah et al. [24] recently verified that conditions used in
food processing lead to increased survival and virulence,
especially with temperature and acid stress, that
completely rescued Salmonella from the lethal conditions
applied to the sample, and led to resistance to
antimicrobial treatments resulting in growth and survival
at the same rate as the untreated culture.
Recent advances in sequencing technologies present
new opportunities to introduce molecular risk assessment
into public health via molecular genetics to more fully
appreciate how microbial adaptation may affect hazards
from bacteria in the food supply. These combined
examples demonstrate that genomic determination,
by the application of NGS and bioinformatic workflows,
is uniquely positioned to empower new and innovative
analyses for foodborne pathogen detection, outbreak
traceability, and human case linkage in a time scale
that matches the epidemiological demands for a global
food supply.
Whole genome sequence availability and
challenges for culture-independent analytics
One hindrance to implementation of an NGS-based
approach is the lack of a baseline set of genomic
information that captures the genomic diversity of
zoonotic pathogens of significance to the food supply and
public health. Two studies highlight increased knowledge
from the genome sequence. First, the Genomic
Encyclopedia of Bacteria and Archaea (GEBA) project
sequenced 3,500 genomes that represent specific phylogenetic
branches, with the intent of increasing the breadth of
phylogenomic-based assignments. With as few as 50 new
genomes spread over the tree of life, approximately
1,060 new protein families were found [25]. In an opposite
approach, 34 very closely related Salmonella genomes
were sequenced to find a linear increase in new protein
families [26]. These two studies demonstrate that more
genome sequences are needed to provide a comprehensive
database to represent the global genome for foodborne
bacteria. These results, combined with the observation that
abiotic stress induces genome diversity in Vibrio bacteria,
suggest that a large database of genomic resources is
essential to capture the breadth of genomic diversity
required for informed decisions in food safety applications
and comparisons for disease outbreak investigation.
Information required to create more complete databases is
growing quickly with the release of new genomes [27–29].
Genomes can be found in public databases such as
the Sequence Read Archive (SRA), European Nucleotide
Archive (ENA), and DNA Data Bank of Japan (DDBJ).
Bioprojects, such as the 100K Pathogen Genome
Project (PRJNA186441), currently contain over
4,200 sequencing runs.
The genomic sequence availability of the most
common foodborne pathogens is uneven and does not
sufficiently represent the required genetic diversity of each
organism to make adequate public health decisions
(Table 2). The diversity of Salmonella genomes [23, 26]
suggests that a very large number of genome sequences
are required to provide an adequate reference database to
pinpoint specific epidemiology questions and to begin
tracking virulence traits, such as antibiotic resistance
and effector molecule conservation. With this type of
information a new field of molecular risk assessment can
be imagined that will bring about a new perspective on
public health risk [4]. Additionally, it provides the basis
for mitigation strategies. The lack of this resource
highlights the dilemma of linking serotype with virulence
as well as the burgeoning increase in sero-diversity of
Salmonella and the emergence of hypervirulence in
Salmonella and other foodborne pathogens.
In a related enteric pathogen example, non-O157
E. coli serotypes, as well as the unique combinations of
emerging hybrid virulent pathotypes, highlight a need for
prospective vigilance. The 2011 outbreak of E. coli
O104 in Europe demonstrates how a rarely pathogenic
serotype became deadly through chimeric horizontal
gene transfer, loss of a key diagnostic gene (eae),
and large-scale genome rearrangement [30, 31]. The
scenario presented a very difficult problem to identify
and track using traditional methods, but was successfully
solved using NGS workflows [31, 32]. However, in
retrospect, this strain was very close in sequence,
hybrid pathotype, and overall genomic architecture to
Republic of Georgia strains dating from 2009 [30].
If these closely related strains had been captured by using
real-time genomic databases, the outbreak would have
been limited using appropriate diagnostic markers and
serotype awareness. Unfortunately, in the absence of
such efforts, it became the largest E. coli outbreak with
Shiga Toxigenic E. coli (STEC) etiology in terms of
magnitude/severity of illness and death. Thus, the
question of sequencing depth can most suitably be
approached with these two enteric pathogens. In 2008,
Rasko et al. [33] examined core, unique, and total genes
for 18 E. coli isolates with a presumed asymptotic limit
of 15,000 to 20,000 genes. Using this public data as an
example, Figure 1 shows that the number of genes
discovered within the Escherichia pangenome increases
with the number of sequenced genotypes as a power law.
Our observations of the approximately 3,300 currently
complete and high-quality publicly available assemblies
would suggest that the limit for new discoverable genes
has now more than doubled. It must be noted that a
single limit on discoverable genes is not a universal
concept to be applied to every bacterial species, given the
inherent differences in their clonality and diversity
relative to S. enterica sensu stricto (Figure 2). Indeed,
the power law behavior observed in Figure 1 suggests
there may be no limit. In addition, clinical isolates
dominate the E. coli phylogenetic tree, which is
underrepresented for environmental, commensal, and
probiotic (e.g., see Table 2) isolates and includes some
newly observed cryptic lineages. Therefore, depth and
strain attribution become relative to species and, in fact,
should redefine taxonomies and, perhaps more
importantly, the species concept itself. As we move
forward with culture-independent analytics and
metagenomics, specific and verifiable identification of
organisms hinges on availability of reference WGS and
will be paramount for realizing and developing the
full, applied potential of this technology for food
safety and authentication.
Figure 1
Observed pangenome using publicly available closed or partially
assembled E. coli whole genome sequences from NCBI. The best
functional fit to this data is a power law (not logarithmic or exponential).
Curve fitting was done with respect to 344 observed genome types
(representing 3,348 genome sequences), with new genes determined
using BLAST criteria of 95% identity over 90% of the gene length.
Genome types were identified to computationally condense and simplify
whole genome data to a common reference representative with more
than 0.2% genetic difference.
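The "new gene" criterion in the Figure 1 caption (95% identity over 90% of the gene length) can be illustrated with a minimal sketch. It assumes that alignment hits against the already-known genes have been computed elsewhere, for example with BLAST. The `Hit` record and the `is_new_gene` function are hypothetical names introduced here for illustration, not the authors' pipeline, and coverage is assumed to be measured against the length of the query gene.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Hit:
    """One alignment of a query gene to a known gene (hypothetical record)."""
    percent_identity: float  # percent identical positions, e.g. 97.5
    aligned_length: int      # number of query positions covered by the alignment


def is_new_gene(query_length: int, hits: Iterable[Hit],
                min_identity: float = 95.0, min_coverage: float = 0.90) -> bool:
    """Return True when no hit meets both the identity and coverage thresholds."""
    if query_length <= 0:
        raise ValueError("query_length must be positive")
    for hit in hits:
        coverage = hit.aligned_length / query_length
        if hit.percent_identity >= min_identity and coverage >= min_coverage:
            return False  # a sufficiently similar gene is already known
    return True


# Illustrative calls for a 1,000 bp query gene (values are made up).
known_match = is_new_gene(1000, [Hit(96.0, 950)])    # identity and coverage both pass
short_match = is_new_gene(1000, [Hit(96.0, 800)])    # coverage 0.80 is below 0.90
no_hits = is_new_gene(1000, [])                      # nothing similar was found
print(known_match, short_match, no_hits)
```

Under these assumptions, a gene is counted as new only when every hit fails at least one of the two thresholds, so a short but highly similar fragment does not suppress a discovery.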
Table 2 Genome sequence availability of probiotic
bacteria from various public databases (as of 9/2015).
The data in this table comes from the National Center
for Biotechnology Information (NCBI). Sequence Read
Archive (SRA) submissions include both DNA and
RNA sequencing samples.
Cultured foods as a paradigm for microbial
signature process monitoring
For proof of concept, foods with live microbial cultures
such as yogurt present a measurable microbial signature to
stratify the nature of changes in the food microbiome
during manufacturing. The use of probiotic bacteria has
been projected to increase [34], as it becomes more
common to add probiotics to many types of foods that
result in microbiome shifts via live and active strains [35].
Both health and nutritional benefits of probiotics have
been claimed. Barzegari et al. [36] suggest that probiotic
additives should be designed for different foods that target
succession of gut microbiota communities at different
developmental stages. Ganesan et al. [37] found that
probiotics survive well as food ingredients for extended
periods in fermented products, where, although perhaps
not culturable, they actively produce bioactive compounds.
Together, these findings indicate that the baseline or
foundational microbiome of a food can shift, as beneficial
bacteria are added. Relative (and absolute) abundance of
these beneficial organisms may be predictive of addition
and safety. This concept suggests that a stable and
disease-resistant microbiome in the supply chain can
ultimately reduce foodborne disease. Measurement of
the microbiome of foods and their ingredients will enable
robust classification and predictions that can be used in
developing bioinformatic toolsets. It can also support
the definition of new biomarkers for surveillance assays
to routinely test the fidelity of products in the food
supply chain.
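One simple way to quantify such a shift is to compare relative abundance profiles before and after supplementation. The sketch below is illustrative only: the read counts are invented, the function names are hypothetical, and Bray-Curtis dissimilarity is a common ecological measure chosen here as an assumption rather than a method named in this paper.

```python
from typing import Dict


def relative_abundance(counts: Dict[str, int]) -> Dict[str, float]:
    """Convert per-taxon read counts into fractions that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no assigned reads")
    return {taxon: n / total for taxon, n in counts.items()}


def bray_curtis(profile_a: Dict[str, float], profile_b: Dict[str, float]) -> float:
    """Bray-Curtis dissimilarity: 0 for identical profiles, 1 for no shared taxa."""
    taxa = set(profile_a) | set(profile_b)
    shared = sum(min(profile_a.get(t, 0.0), profile_b.get(t, 0.0)) for t in taxa)
    total = sum(profile_a.values()) + sum(profile_b.values())
    return 1.0 - 2.0 * shared / total


# Invented read counts for a yogurt-like product, before and after adding a probiotic.
baseline_counts = {"Streptococcus thermophilus": 600, "Lactobacillus delbrueckii": 400}
supplemented_counts = {"Streptococcus thermophilus": 500,
                       "Lactobacillus delbrueckii": 300,
                       "Bifidobacterium animalis": 200}

baseline = relative_abundance(baseline_counts)
supplemented = relative_abundance(supplemented_counts)
shift = bray_curtis(baseline, supplemented)
print(f"Bray-Curtis shift: {shift:.3f}")  # approximately 0.2 for these counts
```

A monitoring scheme built on this idea would need a validated baseline for each food type and a threshold for what counts as an abnormal shift, neither of which is specified here.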
Probiotics marketed with generalized health claims
should contain live and active cultures. In cultured
foods, standard plating techniques are unlikely to provide
accurate profiles of the food microbiome. Interestingly,
addition of single probiotic bacteria can shift the food
microbiota in many fermented products, in addition to
changing the metabolite profiles [37, 38]. This is also
observed for production of bioactive ingredients that alter
the microbial community using small molecules [39–43].
This body of work presents a challenge that must be
solved by integrating multiple genomic, proteomic, and
metabolomic (so-called “multi-omic”) methods for use in
analysis; individual methods are not sufficient to fully
capture the live and active nature of the biosystem.
This is especially true with addition of a live and active
probiotic intended to alter the human microbiome and
health as well.
The food matrix itself can be viewed as the habitat for a
community of microbes; as food ages and spoils, the
resulting biochemical changes lead to a succession in the
community. Similarly, routine processing and handling
(such as marinating meat) leads to changes in pH and salinity,
and to a corresponding change in the microbial community.
Numerous scientific studies have demonstrated the
importance of microbial ecology to understanding the
effects of food processing and spoilage [44].
Microbial communities are dynamic systems that change
in response to changes in their environment including, for
example, the addition of new (probiotic) organisms [45].
Traditionally, sequencing of only 16S genes has been used
to define community membership. However, this measure
is uneven for phylogenetic estimation and often leaves
one wanting more information about the state of the
community and its viability. By studying all the genes
(i.e., the complete metagenome), the community
composition and functional potential can be observed,
so as to provide evidence for a safe microbiome, or a
microbiome functioning as intended (e.g., for a fermented
product). This is predicated on the hypothesis that the
background microbiome inherent to the food is sufficiently
stable to be predictive of both the food type (source) and
shifts in response to the addition of specific components
that drive the biochemistry within the food to shift the
community. Viewed as a biochemical dynamic system,
a food or food ingredient provides an environment for a
predictable “set of genes” (i.e., sets of organisms and sets
of metabolic pathways) restrained by the active genes
in the microbiome (i.e., expressed RNA) [46]. For this
reason, it is not sufficient to study microbiome DNA (most
of which may be derived from the host ingredient itself).
Rather it is necessary to identify both normal and
abnormal microbial communities in terms of the active
genes derived from live cells (host and microbiome)
that cooperate to create a novel environment (that may
reflect the changing metabolite profiles mentioned above).
The RNAseq data, a set of transcripts, can be used for
identification of genes and for studying the functions they
are performing [47].
Figure 2
Comparing the species landscape diversities for the genera
Escherichia versus Salmonella, which are major foodborne pathogens
with available and extensive sequence depth. The scale bars indicate a
genetic distance of 1% relative to the respective core genome.
Salmonella and Escherichia are both classified as genera, but the species
diversity (and the phylogenetic complexity) of Escherichia is
higher than that of Salmonella. Neighbor joining trees were generated
from the proportional distance of SNP (single nucleotide polymorphism)
differences in core genes using the MEGA 3.1 software.
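The Figure 2 caption refers to trees built from the proportional distance of SNP differences in core genes. As a minimal sketch of that distance only (not of the neighbor joining step, and not of the MEGA software itself), the proportional distance between two aligned sequences is the fraction of compared sites that differ. The function names and the toy alignment below are hypothetical, and skipping sites with gaps or ambiguous bases is an assumption.

```python
from typing import Dict


def p_distance(seq_a: str, seq_b: str) -> float:
    """Fraction of differing sites between two aligned sequences of equal length.

    Sites where either sequence has a gap or an ambiguous base are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "ACGT" and b in "ACGT":
            compared += 1
            if a != b:
                differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def distance_matrix(alignment: Dict[str, str]) -> Dict[str, Dict[str, float]]:
    """Pairwise proportional distances for every pair of named sequences."""
    return {a: {b: p_distance(alignment[a], alignment[b]) for b in alignment}
            for a in alignment}


# Toy core-gene alignment (invented sequences, 10 sites each).
toy_alignment = {
    "isolate_1": "ACGTACGTAC",
    "isolate_2": "ACGTACGTAA",  # one SNP relative to isolate_1
    "isolate_3": "ACGAACGTTA",  # three SNPs relative to isolate_1
}
matrix = distance_matrix(toy_alignment)
print(matrix["isolate_1"]["isolate_2"], matrix["isolate_1"]["isolate_3"])
```

A matrix of this kind is the usual input to a neighbor joining implementation; the 1% scale bar in the figure corresponds to a proportional distance of 0.01.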
We envision using metaRNAseq, not as a tool for gene
expression, but rather as a tool to reveal genes and gene
sets that change as the microbiome shifts in time or in
response to changes in food composition [48]. This
approach may be useful analytically by measuring the
resulting meta-transcriptome of the microbiome
direct-from-product to discover hundreds of evidence lines
that may be predictive of process effects relative to
specific organisms in the community, and how they shift
the microbiome in constructive or “healthy” ways.
Constructive shifts may include resistance to spoilage
and/or protection from pathogenic incursion. Conversely,
undesirable shifts could favor pathogen growth and lead
to an increased risk of disease, which again could be
mitigated by process control. These aims directly require
the use of measures for live and active organisms in
addition to the metabolites that they produce after addition
and during storage.
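To make the idea of genes that change as the microbiome shifts more concrete, the following sketch compares transcript counts from two time points and reports genes whose normalized abundance changes by at least a chosen fold. It is a deliberately naive illustration: the counts are invented, the function names and the threshold are hypothetical, and a real analysis would use replicates and a statistical model rather than a single fold-change cutoff.

```python
import math
from typing import Dict


def counts_per_million(counts: Dict[str, int]) -> Dict[str, float]:
    """Scale raw transcript counts so that each sample sums to one million."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no counted transcripts")
    return {gene: 1e6 * n / total for gene, n in counts.items()}


def shifted_genes(before: Dict[str, int], after: Dict[str, int],
                  min_abs_log2fc: float = 1.0,
                  pseudocount: float = 1.0) -> Dict[str, float]:
    """Return log2 fold changes for genes that shift by at least the threshold.

    The pseudocount avoids division by zero for genes seen in only one sample.
    """
    cpm_before = counts_per_million(before)
    cpm_after = counts_per_million(after)
    changed = {}
    for gene in set(before) | set(after):
        b = cpm_before.get(gene, 0.0) + pseudocount
        a = cpm_after.get(gene, 0.0) + pseudocount
        log2fc = math.log2(a / b)
        if abs(log2fc) >= min_abs_log2fc:
            changed[gene] = log2fc
    return changed


# Invented counts for three genes at production (day 0) and after storage (day 14).
day0 = {"geneA": 100, "geneB": 100, "geneC": 800}
day14 = {"geneA": 400, "geneB": 100, "geneC": 500}
signature = shifted_genes(day0, day14)
print(signature)  # only geneA passes the default two-fold threshold
```

The set of genes returned, tracked across many samples, is one possible reading of a community signature; which genes are informative for a given process would have to be established empirically.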
Focusing initially on the community metaRNAseq
signature (mRNAsig) has several additional advantages
over the use of culturing followed by WGS or WGS
alone. It is “agnostic” to and robust against
horizontal gene transfer and the transfer of plasmids.
Over time, evolving organisms may respond to pressure by
incorporating new genes. The mRNAsig will detect this
and can be used in detection, source tracking,
and outbreak investigation. Since the RNA transcriptome
will also provide ribosomal 16S RNA, this approach also
provides an opportunity to calibrate the metaRNAseq
method with respect to more traditional 16S community
analyses, allowing for both species and gene
identification [47, 48].
As mentioned previously, implementation of an
NGS-based approach for authentication of probiotic
microbiota in food requires a reference database of whole
genomes (WGDs) and metagenomes (MGDs). WGDs are
needed to ensure that added bacteria are classified
correctly in the community and to capture the genomic
diversity of specific organisms. The food microbiome
database is also needed to determine what is “normal”
and how this shifts with and without probiotic bacteria
supplementation. The absence of such a standard
database is a significant hindrance to global food safety
and public health in broadly implementing genomics-based
methods. Projects like GenomeTrakr and the
100K Pathogen Genome Project attempt to solve this for
pathogenic bacteria, but there is not yet a coordinated
effort to address this problem or to validate the
completeness of candidate references in the probiotic and
functional microbe arena. Validation and authentication of
probiotic composition, by definition, must detect not only
organisms by names, but also the active genes in a
probiotic microbiome.
The Lactic Acid Bacteria Genome Consortium also used
this database-development approach in the early 2000s to
sequence the genomes of closely related lactic acid
bacteria, resulting in reclassification of the entire group of
organisms [49]. Since then, additional genomes from this
group of probiotic bacteria have been sequenced, and
technologies have advanced so that closed genomes can be
produced in a short time.
Genome sequencing: Quality control and scaling
for discrete data
The availability of genomic sequences for bacteria offers
resources for forming reference data sets for analysis, but
these resources are uneven, poorly curated, of questionable
quality, and do not adequately represent the required
organism diversity [50].
The same is true for probiotic organisms (Table 2).
Recently it was found that the SRA has a large proportion
of sequences from bacterial sources that are not of
sufficient quality to be used as a reference because they
cannot be assembled into a meaningful sequence [51–54].
Figure 3
The cumulative rate of discovery of putative “new genes” as a function
of the number of isolates of Salmonella increases as a power law
(linear on a log-log plot). Sequencer output for each of 792 isolates
was assembled using ABySS (abyss-pe v1.5.2) [52, 53], and the
resulting contigs (contiguous overlapping sequences) were annotated
using Prokka (v1.10) [54]. Random subsets of those isolates were
selected. Each point is the number of distinct amino acid sequences
inferred for regions annotated as hypothetical proteins for one subset,
plotted against the number of isolates in that subset.
For example, the diversity of Salmonella genomes
(Figure 3) suggests that a very large number of genome
sequences may be required to provide an adequate
reference database to pinpoint specific epidemiology
questions and to begin tracking virulence traits, such as
antibiotic resistance and effector molecule conservation,
that are known to be directly involved in disease risk and
outbreaks. Figure 3 shows the cumulative rate of observation
of “new” putative genes as a function of the number of
isolates sampled. The data were obtained by de novo
assembly of over 4,000 individual isolates from the 100K
Pathogen Genome Project. A subset of 792 samples was
then selected at random in separate (i.e., jackknife) trials,
and the cumulative gene count was determined for each
trial. The approximately linear behavior observed on the
log-log plot suggests a power law behavior, with the
cumulative number of genes discovered increasing with the number
of isolates, as observed above with de novo assembly
data. In the following formulation, $N_{\text{genes}}$ is the number of
genes, and $\omega_{\text{isolates}}$ is the number of isolates:
$$N_{\text{genes}} \propto \omega_{\text{isolates}}^{0.465}.$$
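The subsampling and fitting procedure described above can be sketched as follows. This is a minimal illustration only, assuming that each isolate is represented as a set of gene identifiers and that the exponent is estimated by ordinary least squares on log-transformed values. The function names `cumulative_gene_counts` and `fit_power_law` and the synthetic data are hypothetical and are not taken from the analysis behind Figure 3.

```python
import math
import random


def cumulative_gene_counts(isolate_genes, subset_sizes, rng):
    """Draw one random subset of isolates per size and count its distinct genes."""
    points = []
    for size in subset_sizes:
        subset = rng.sample(isolate_genes, size)
        distinct = set().union(*subset)
        points.append((size, len(distinct)))
    return points


def fit_power_law(sizes, counts):
    """Fit counts ~ c * sizes**k by least squares on the log-log scale."""
    log_x = [math.log(x) for x in sizes]
    log_y = [math.log(y) for y in counts]
    mean_x = sum(log_x) / len(log_x)
    mean_y = sum(log_y) / len(log_y)
    sxx = sum((x - mean_x) ** 2 for x in log_x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_x, log_y))
    k = sxy / sxx
    c = math.exp(mean_y - k * mean_x)
    return c, k


# Synthetic check (not real data): points generated from an exact power law
# with exponent 0.465 should give that exponent back.
sizes = [1, 10, 100, 1000]
counts = [100.0 * n ** 0.465 for n in sizes]
c, k = fit_power_law(sizes, counts)
print(round(c, 3), round(k, 3))
```

With real data, the input to `fit_power_law` would be the `(size, count)` pairs returned by `cumulative_gene_counts` over many random trials, and the scatter around the fitted line would indicate how well a single exponent describes the collection.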
The exponent observed in this analysis of Salmonella
diversity (0.465) is quite close to the exponent observed in
our study of Escherichia, suggesting possible universality.
This type of scale-free or power-law behavior is
reminiscent of the MacArthur-Wilson scaling law, well
established in insular biogeography and other population
ecology studies [55, 56]. Considering that horizontal gene
transfer is common in the environment and among many
types of bacteria, the power-law behavior suggests that
each and every isolate may be considered as a “sample”
from a different microbial community, and the diversity
of that community scales with the number of isolates
just as the biodiversity in any ecological habitat scales
with the area of the underlying habitat. Understanding the
underlying scaling behavior will allow prediction of the
number of samples required for gene discovery and can
reveal the universality of observed exponents. This
perspective is new to food microbiology and enables the
move to a population-based understanding of the
isolates that treats them as individuals rather than as a
homogeneous type of organism, e.g., Salmonella. The
concept can also be expanded to metagenomes or
metaRNAseq data sets, where individual samples have
ecologies that are unique within a type of environment
(e.g., a factory, a field, or a food source).
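As an illustration of how a scaling exponent could be used to estimate sampling requirements, the power law can be inverted to give the number of isolates needed for a target gene count. The sketch below assumes the relationship holds beyond the sampled range and uses a hypothetical proportionality constant `coefficient`, which is not reported in the text and would have to be estimated from data.

```python
def isolates_needed(target_genes, coefficient, exponent=0.465):
    """Invert N_genes = coefficient * isolates**exponent for the isolate count."""
    return (target_genes / coefficient) ** (1.0 / exponent)


def isolate_multiplier(gene_multiplier, exponent=0.465):
    """Factor by which sampling must grow to multiply the cumulative gene count."""
    return gene_multiplier ** (1.0 / exponent)


# Doubling the cumulative gene count under an exponent of 0.465.
print(round(isolate_multiplier(2.0), 2))
```

Under these assumptions, doubling the number of observed genes would require a little over four times as many isolates, which is one way to see why a very large reference collection may be needed.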
Over time, as the necessary data are collected, a new
field of molecular risk assessment will evolve, bringing
new perspectives on public health benefits and risks.
These in turn will enable a new type of large-scale
genomic-based microbiology to solve long-standing
questions that traditional isolate and growth microbiology
cannot address. For these reasons, the development of
reference “type” cultures and genomes of probiotic taxa,
together with metagenomic databases, is critical for avoiding
errors such as concluding that an organism is present when it
is not actually in the sample (a false positive).
Sequence the food supply chain: Paths to
harnessing HTS analytics
Recently, Mars and IBM announced the SFSCC, a
community effort to explore how advanced
next-generation sequencing technologies can be harnessed
as an analytical tool, providing a window into food
maturation processes at a scale and depth never before
achieved [12–14]. Consortium members, including
universities, are conducting experiments to follow the
microbiota of foods and food ingredients using
meta-transcriptomes. This technique can provide
conclusive evidence of live and active
bacteria in the food as well as quantitative estimates
for monitoring probiotic mixtures and validating ingredient
claims, which are not possible using culture methods.
More generally, the method provides insight into how
microbial ecology affects any food [57].
In addition to microbial content shifts, parallel
controlled studies of the metabolome in the same samples
can provide evidence for the composition, diversity,
and metabolic byproducts of probiotic additives, as well as
metabolic shifts in the underlying microbiome. Moreover,
new fast and inexpensive small-molecule monitors and tests
can be defined and implemented in a manufacturing
environment as surrogate markers, providing early
warning signs that call for more advanced and definitive tests,
such as HTS (high-throughput sequencing), to confirm the
initial results. Broadly stated, the impact of this program
could go beyond safety to guide new food development
and process improvement paradigms for improved
efficiency and significant economic benefit. Studying the
complex communities of organisms in food requires
this type of multidimensional program. He et al. [57]
recently demonstrated that there is a predictable and
microbe-specific response in vitro. They combined
meta-transcriptomics and NMR metabolomics to determine
that specific strains of E. coli have different small
molecule signatures during infection and association.
A scalable informatics platform to address the food
microbiome using culture-independent HTS and
metaRNAseq can be extended to metabolomics directly in
food. This would establish the baseline microbial
community for food ingredients and measure how the
inclusion of new organisms shifts both the microbiome
and desirable food chemistry. Understanding this baseline
requires more than just organism identification.
Quantification of the organisms, the gene sets, and their
dynamics is also required. We are targeting four areas
for investigation as prerequisites to harnessing HTS
analytics for food safety applications. The first three
provide the framework for the fourth.
Area 1: Characterization and quantification of the
baseline microbiome
A starting hypothesis is that microbial communities are
strongly and consistently influenced by their surrounding
environment. We anticipate that between communities,
group metabolism is patterned, consistent, and predictive,
so that actively expressed gene sets can be used as
markers of particular communities. Determining the
normal, or core, gene sets, as well as those that are
highly variable between foods, can then fuel authentication
estimates. These in turn will predict specific communities
that are indicative of environmental (or ingredient)
shifts, including shifts indicative of food fraud.
We further hypothesize that communities from different
regions (but the same food product) may vary in
taxonomic community measurements, but will be
conserved in actively expressed metabolic pathways.
As such, it is necessary to investigate and compare
communities in terms of both phylogenies and metabolic
function so as to identify specific sets of genes that are
predictive, given food type and chemistry.
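The distinction between core gene sets and those that vary between foods can be illustrated with simple set operations. The sketch below is a simplified reading of that idea, assuming each sample has already been reduced to the set of genes detected as actively expressed. The sample labels and gene names are placeholders chosen for illustration, and a real analysis would also need expression thresholds and replicate handling.

```python
def core_and_variable(expressed_by_sample):
    """Split genes into a core set (expressed in every sample) and a variable set."""
    gene_sets = list(expressed_by_sample.values())
    if not gene_sets:
        return set(), set()
    core = set.intersection(*gene_sets)
    variable = set.union(*gene_sets) - core
    return core, variable


# Hypothetical expressed-gene sets for three lots of the same food product.
samples = {
    "lot_a": {"ldh", "pepN", "lacZ"},
    "lot_b": {"ldh", "pepN"},
    "lot_c": {"ldh", "pepN", "nisA"},
}
core, variable = core_and_variable(samples)
print(sorted(core), sorted(variable))
```

In this toy case the core set is what all lots share, while the variable set holds the genes that distinguish individual lots and could therefore be examined as candidate markers of an ingredient or environmental shift.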
Area 2: Characterization and quantification of the
shifting microbiome after the addition of probiotics
The introduction of new community members
(e.g., probiotic additives) to food products and ingredients
will shift the community equilibrium to a new steady
state. We anticipate that this shift will be measurable and
consistent through changes in gene expression (through
RNAseq) and metabolites, and that these shifts will
occur in a predictable manner over time. The acquisition of
these data, in combination with the baseline active
microbiome estimates of raw products, will also allow for
the development of predictive models and robust assays
through data feature extraction.
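One simple way to put a number on such a shift is a dissimilarity index between the baseline profile and the profile after addition. The sketch below uses the Bray-Curtis dissimilarity, a standard ecological measure that is our choice for illustration and is not named in the text. The feature names and counts are hypothetical, and real profiles would first need normalization for sequencing depth.

```python
def bray_curtis(profile_a, profile_b):
    """Bray-Curtis dissimilarity between two abundance profiles (0 = identical)."""
    features = set(profile_a) | set(profile_b)
    difference = sum(abs(profile_a.get(f, 0) - profile_b.get(f, 0)) for f in features)
    total = sum(profile_a.get(f, 0) + profile_b.get(f, 0) for f in features)
    return difference / total if total else 0.0


# Hypothetical transcript counts before and after adding a probiotic strain.
baseline = {"gene_1": 50, "gene_2": 50}
after_addition = {"gene_1": 50, "gene_2": 30, "gene_3": 20}
print(bray_curtis(baseline, after_addition))
```

Tracking this value over successive time points would show whether the community settles at a new steady state, as anticipated above.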
Defining the mathematical relationship between
probiotic communities in finished products as a function
of the starting microbial communities, introduced bacteria,
and environmental factors requires a scalable
microbiological and computational effort.
Area 3: Systematic creation of a reference database of
probiotic genomes from well-defined isolates
Common analytical methods associated with HTS are not
exempt from errors that may falsely indicate risk where
there is none or fail to detect risk where it in fact exists.
The probability of such errors can be minimized by
implementing good experimental design and rigorous
statistical techniques. To further minimize
these risks, the systematic creation of a reference database
of probiotic genomes from well-defined isolates is
paramount. While other projects are doing this for
pathogens, no such project yet exists for probiotic bacteria
or metaRNAseq microbiome estimates.
Area 4: Deployment of a software service for the analysis
of microbiomes
Leveraging a systems biology approach in an
environmental setting will allow samples to be
characterized in terms of both taxonomy and function.
Ensuring that this is performed in a timely fashion requires
flexible and scalable informatics solutions. However,
industry adoption of new molecular methods may result in
unintended observations of previously unrecognized
hazards in the food supply. It is likely that in-depth
sequencing will uncover previously unrecognized and
undetected community members. Caution is required to
ensure that, when short sequence comparisons are used,
the sequences do not falsely indicate the presence of
organisms that are not actually present. Creation of a probiotic
reference culture and metagenomic database will reduce
the likelihood of false positives, determine sensitivity
and selectivity of molecular protocols, and support
proper calibration of molecular methods, all required for
effective regulatory policies and action.
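The risk posed by short sequence comparisons can be seen in a toy example. The sketch below assumes a naive matcher that reports every reference sharing at least one exact k-mer with a query read. The reference sequences and organism names are invented for illustration, and production tools use far more sophisticated scoring.

```python
def kmers(sequence, k):
    """Return the set of all length-k substrings of a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def matching_references(query, references, k):
    """Names of references that share at least one exact k-mer with the query."""
    query_kmers = kmers(query, k)
    return sorted(
        name for name, seq in references.items() if query_kmers & kmers(seq, k)
    )


# Invented sequences: the query read is drawn from organism_A only.
references = {
    "organism_A": "ATGGCGTACGTTAGC",
    "organism_B": "TTGACGTACGTTCCA",
}
query = "GCGTACGTTAG"
print(matching_references(query, references, 6))
print(matching_references(query, references, 11))
```

With a short comparison length the read appears to match both organisms, whereas the longer comparison resolves it to its true source. A well-characterized reference database makes it possible to measure how often such ambiguous calls occur.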
To address this fourth area, we have created the
Metagenomics Computation and Analytics Workbench
(MCAW), a scalable web-based informatics system [59].
Understanding that not all organisms, and not all genes,
have been identified in food systems, we recognize that
the system needs to accommodate and make use of a
growing corpus of reference data, probably expanding to
hundreds of terabytes. Therefore, the system is extensible,
allowing for new informatics techniques to be incorporated
and for workflows to be developed to reflect new scientific
knowledge. This provides elastic performance and
high-performance parallel computing across multiple
nodes to service the needs of multiple concurrent users,
each with terabytes of data of interest. A future
cloud-based implementation will be necessary to support
the productivity of users in different roles who may not be
experts in software and systems, efficiently presenting
views of pivotal metrics, and automatically coping
with metadata and storage management on tens of
thousands of datasets.
Analysis of a variety of microbiome metadata sets
demonstrates that as many as 50% to 80% of
metaRNAseq reads are unidentified [49]. However,
these same unidentified reads are repeatable features of
datasets from equivalent sources. For this reason,
a metagenomic software service for food safety must
track both annotated and un-annotated transcripts, and
compare both known and unknown components from
sample to sample. This necessitates a well-designed,
highly structured database and associated operations for
querying and adding data. Furthermore, extraction of
salient features from the data for use as predictive markers
will be necessary in order to deliver robust assays for use
in field settings.
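The requirement to track unannotated transcripts alongside annotated ones can be sketched with a simple keyed store. This is a deliberately simplified illustration that assumes transcripts can be compared by exact sequence, whereas a real system would cluster or assemble similar reads first. The function names and the toy sequences are hypothetical and do not describe the MCAW implementation.

```python
import hashlib
from collections import Counter


def feature_id(sequence):
    """Stable identifier for a transcript sequence, whether or not it is annotated."""
    return hashlib.sha256(sequence.upper().encode("ascii")).hexdigest()[:16]


def profile_sample(reads, annotations):
    """Count reads per feature, keeping annotated and unannotated features apart."""
    annotated, unannotated = Counter(), Counter()
    for read in reads:
        fid = feature_id(read)
        if fid in annotations:
            annotated[fid] += 1
        else:
            unannotated[fid] += 1
    return annotated, unannotated


def shared_unknowns(profiles):
    """Unannotated features that recur in every profiled sample."""
    unknown_sets = [set(unannotated) for _, unannotated in profiles]
    return set.intersection(*unknown_sets) if unknown_sets else set()


# Toy data: one known transcript and several unidentified ones.
annotations = {feature_id("ACGT"): "known_gene"}
sample_1 = profile_sample(["ACGT", "TTTT", "TTTT", "GGGG"], annotations)
sample_2 = profile_sample(["acgt", "TTTT", "CCCC"], annotations)
print(len(shared_unknowns([sample_1, sample_2])))
```

Because unidentified features receive stable identifiers, they can be compared from sample to sample and later linked to an annotation once the reference database grows.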
Conclusion
Ongoing improvements in next-generation sequencing
technologies, curated and annotated genomic references,
and bacterial strain banks are poised to enable new food
safety, traceability, and quality solutions, leveraging
systematic authentication of the microbiome and its
variation up and down the supply chain. This approach
depends upon easily available and scalable computation
and informatics. Bioinformatic workflows are too often
built and executed based on ad hoc, lab-specific workflows,
with documentation of the actual process rarely being well
recorded or presented. In order for next-generation
sequencing to realize its full potential across a wide
variety of food authentication, quality, and food safety
applications, there is a need to expose, formalize, and
standardize workflows for bioinformatics analysis,
processing, and data management. The provenance of
intermediate and final results data must be retained in a
system of record, both to support scientific validation and
to provide auditability of externalized conclusions.
Industrial organizations seeking to make use of
genomic, metagenomic, and metabolomic data to protect
food safety require that the informatics technologies
be available as an inexpensive and scalable cloud
computing and web-based service. With such a service, all
data sets and the workflows used to create them should
be stored using well-designed data management systems that
make use of industry-standard software and architecture.
The critical role of the bioinformatics used to create the
necessary evidence requires the creation, demonstration,
and use of an extensible and modular informatics
software service. As discussed above, standard libraries
for authentication of probiotic composition (and safety)
will improve as more genome sequences become available,
providing a robust database to represent the global genome
for use in metaRNAseq and other metagenomic
applications. As such, a bioinformatic software service
must index and track the history of all reference databases
used in analyses by dataset. It should also track and
organize the inevitable discovery of “unknown” proteins
(genes) and gene families, indexing those data to aid in
the refinement and growth of the reference database itself.
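A record of which reference database version was used for each dataset might look like the following. This is a minimal sketch under our own assumptions, not the design of the Metagenomics Computation and Analytics Workbench. The class name, field names, and identifiers are hypothetical.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class AnalysisRecord:
    """Provenance of one analysis: dataset, workflow, and reference database used."""
    dataset_id: str
    workflow_name: str
    workflow_version: str
    reference_db_name: str
    reference_db_version: str
    parameters: tuple  # sorted (name, value) pairs

    def fingerprint(self):
        """Deterministic hash of the record, usable as an audit key."""
        payload = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def datasets_using(records, db_name, db_version):
    """Datasets analyzed against a given reference database version."""
    return sorted(
        r.dataset_id
        for r in records
        if r.reference_db_name == db_name and r.reference_db_version == db_version
    )


# Hypothetical records for two datasets analyzed against different versions.
records = [
    AnalysisRecord("sample-001", "profile", "0.1", "ref-genomes", "2015-10", (("k", 31),)),
    AnalysisRecord("sample-002", "profile", "0.1", "ref-genomes", "2015-11", (("k", 31),)),
]
print(datasets_using(records, "ref-genomes", "2015-10"))
```

Keeping such records makes it possible to identify which results should be revisited when the reference database is updated, and supports the auditability discussed above.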
The Metagenomics Computation and Analytics
Workbench, now under development by the SFSCC, offers
a model for a scalable software service that applies HTS
technologies to food safety. The workbench builds upon
the consortium’s research, described above, to characterize
and quantify the microbiome from baseline to final steady
state and create a reference database from well-defined
isolates to support the analysis of microbiomes and
development of assays.
The consortium’s research promises to impact other
related and important goals. We expect the new
biomarkers identified by this work will have a variety of
applications including surveillance assays to routinely
test the fidelity of products in the food supply chain.
Furthermore, the informatics service now in place to
achieve the goals of this project will have direct
application to an even broader range of goals for food
safety. The FDA, the U.S. Department of Agriculture,
the Centers for Disease Control and Prevention, and other
entities all stand to benefit from an informatics facility that
is reproducible, scalable, flexible (extendible) and high
performance. It will drive end-to-end workflows on large
collections of samples, so that both the labor cost and risk
of error in production deployments can be minimized.
The system can be adapted to several applications and
use cases, and the platform can accommodate users with
different roles and expertise, be resource- and
cost-efficient, and scale through cloud and other
distributed techniques.
Acknowledgments
We would like to acknowledge Mr. Mark K. Mammel
and Dr. David W. Lacher from the Food and Drug
Administration, Division of Molecular Biology,
Center for Food Safety and Applied Nutrition, for their
contributions to this manuscript including the creation of
Figures 1 and 2. We also thank Judy Douglas for her
help and advice.
References
1. Food and Agriculture Organization of the United Nations,
“2050: A third more mouths to feed.” [Online]. Available:
http://www.fao.org/news/story/en/item/35571/icode/
2. R. A. Fischer, D. Byerlee, and G. O. Edmeades, “Can
technology deliver on the yield challenge to 2050,” presented at
the Expert Meeting on How to Feed the World in 2050, 2009.
[Online]. Available: http://www.z-punkt.de/megatrends-uebersicht.html
3. “Food safety modernization act (FSMA),” U.S. Food Drug
Admin., Silver Spring, MD, USA, Public Law 111-353, 2011.
4. C. A. Elkins, M. L. Kotewicz, S. A. Jackson, D. W. Lacher,
G. S. Abu-Ali, and I. R. Patel, “Genomic paradigms for
food-borne enteric pathogen analysis at the USFDA: Case studies
highlighting method utility, integration and resolution,” Food
Additives Contaminants, A Chem. Anal. Control Expo. Risk
Assess., vol. 30, pp. 1422–36, 2013.
5. S. R. Leonard, M. K. Mammel, D. W. Lacher, and C. A. Elkins,
“Application of metagenomic sequencing to food safety:
Detection of shiga toxin-producing escherichia coli on
fresh bagged spinach,” Appl. Environ. Microbiol., vol. 81,
pp. 8183–91, 2015.
6. T. M. Bergholz, A. I. Moreno Switt, and M. Wiedmann,
“Omics approaches in food safety: Fulfilling the promise?”
Trends Microbiol., vol. 22, pp. 275–281, 2014.
7. A. O’Connor, “New York attorney general targets supplements
at major retailers,” New York Times, New York, NY, USA,
Feb. 3, 2015. [Online]. Available: http://well.blogs.nytimes.com/
2015/02/03/new-york-attorney-general-targets-supplements-at-major-retailers
8. J. N. Patro, P. Ramachandran, J. L. Lewis, M. K. Mammel,
T. Barnaba, E. A. Pfeiler, and C. A. Elkins, “Development and
utility of the FDA ‘GutProbe’ DNA microarray for identification,
genotyping and metagenomic analysis of commercially available
probiotics,” J. Appl. Microbiol., vol. 118, pp. 1478–88, 2015.
9. J. N. Patro, P. Ramachandran, T. Barnaba, M. K. Mammel,
J. L. Lewis, and C. A. Elkins, “Culture-independent metagenomic
surveillance of commercially available probiotics with
high-throughput next-generation sequencing,” mSphere, vol. 1,
no. 2, Mar.-Apr. 2016, Art. no. e00057–16. doi: 10.1128/
mSphere.00057-16
10. K. Kupferschmidt, “Epidemiology: Outbreak detectives embrace
the genome era,” Science, vol. 333, no. 6051, pp. 1818–1819, 2011.
11. J. Whitworth, “WGS will become sole method but current typing
capacity should be available-ECDC,” Food Quality News,
Oct. 23, 2015. [Online]. Available: http://www.foodqualitynews.com/
Lab-Technology/WGS-cost-and-time-comparable-to-current-typing-methods
12. J. Welser, “Sequencing the Food Supply Chain: How a New
Consortium Will Improve Food Safety,” Forbes, Jan. 29, 2015.
[Online]. Available: http://www.forbes.com/sites/ibm/2015/01/29/
sequencing-the-food-supply-chain-how-a-new-consortium-will-improve-food-safety/
13. “Launch pioneering effort to drive advances in global food
safety,” IBM, Mars, Inc., Armonk, NY, USA, Jan. 2015.
[Online]. Available: http://www.mars.com/global/press-center/
global-food-safety.aspx
14. Consortium for Sequencing the Food Supply Chain: IBM
Research and Mars tackle global health with food safety
partnership. [Online]. Available: http://www.research.ibm.com/
client-programs/foodsafety/
15. “Enteric diseases epidemiology branch: Surveillance reports,”
Centers Disease Control Prevent., Foodnet, Atlanta, GA, USA.
[Online]. Available: http://www.cdc.gov/ncezid/dfwed/edeb/
reports.html
16. L. Bever, “Salmonella outbreak linked to raw tuna sushi spreads
to nine states,” The Washington Post, Washington, DC, USA,
May 22, 2015. [Online]. Available: http://www.washingtonpost.com/
news/morning-mix/wp/2015/05/22/salmonella-outbreak-linked
17. A. C. Kimura, V. Reddy, R. Marcus, P. R. Cieslak,
J. C. Mohle-Boetani, H. D. Kassenborg, S. D. Segler,
F. P. Harnett, T. Barrett, and D. L. Swerdlow, “Chicken
consumption is a newly identified risk factor for sporadic
Salmonella enterica serotype Enteritidis infections in the
United States: A case-control study in FoodNet sites,” Clin.
Infect. Diseases, vol. 38, suppl. 3, pp. S244–S252, 2004.
18. V. Ferreira, M. Wiedmann, P. Teixeira, and M. J. Stasiewicz,
“Listeria monocytogenes persistence in food-associated
environments: Epidemiology, strain characteristics, and
implications for public health,” J. Food Protect., vol. 77,
pp. 150–70, 2014.
19. M. Begley and C. Hill, “Stress adaptation in foodborne
pathogens,” Annu. Rev. Food Sci. Technol., vol. 6, pp. 191–210,
2015.
20. B. J. Shapiro, J. Friedman, O. X. Cordero, S. P. Preheim,
S. C. Timberlake, G. Szabó, M. F. Polz, and E. J. Alm,
“Population genomics of early events in the ecological
differentiation of bacteria,” Science, vol. 336, pp. 48–51, 2012.
21. D. M. Heithoff, W. R. Shimp, J. K. House, Y. Xie,
B. C. Weimer, R. L. Sinsheimer, and M. J. Mahan, “Intraspecies
variation in the emergence of hyperinfectious bacterial
strains in nature,” PLoS Pathogen, vol. 8, no. 4, 2012,
Art. no. e1002647.
22. P. Chen, R. Jeannotte, and B. C. Weimer, “Exploring bacterial
epigenomics in the NGS era-a new approach for an emerging
frontier,” Trends Microbiol., vol. 22, pp. 292–300, 2014.
23. X. Deng, P. T. Desai, H. C. den Bakker, M. Mikoleit, B. Tolar,
E. Trees, R. S. Hendriksen, J. Frye, S. Porwollik, B. C. Weimer,
M. Wiedmann, G. M. Weinstock, M. McClelland, and
P. I. Fields, “Genomic epidemiology of Salmonella enterica
serotype Enteritidis based on population structure of prevalent
lineages,” Emerg. Infect. Diseases, vol. 20, no. 9,
pp. 1481–1489, 2014.
24. J. Shah, P. Desai, D. Chen, J. Stevens, and B. C. Weimer,
“Proteomics of cold stress in Salmonella enterica sv. Typhimurium
LT2,” Appl. Environ. Microbiol., vol. 79,
pp. 7281–7289, 2014.
25. D. Wu, P. Hugenholtz, K. Mavromatis, R. Pukall, E. Dalin,
N. N. Ivanova, V. Kunin, L. Goodwin, M. Wu, and J. A. Eisen,
“A phylogeny-driven genomic encyclopedia of Bacteria and
Archaea,” Nature, vol. 462, pp. 1056–1060, 2009.
26. A. Jacobsen, R. S. Hendriksen, F. M. Aarestrup, D. W. Ussery,
and C. Friis, “The Salmonella enterica pan-genome,” Microb.
Ecol., vol. 62, pp. 487–504, 2011.
27. C. H. Lüdeke, N. Kong, B. C. Weimer, M. Fischer, and
J. L. Jones, “Complete genome sequences of a clinical and an
environmental Vibrio parahaemolyticus isolate,” Genome
Announce, vol. 3, no. 2, 2015, Art. no. e00216.
28. D. B. Storey, A. M. Weis, N. Kong, A. K. Townsend,
W. A. Miller, B. A. Byrne, C. C. Taff, B. Gilpin, C. Mason,
C. Fitzgerald, and B. C. Weimer, “Large-scale release of
Campylobacter draft genomes; resources for food safety and
public health from the 100K Pathogen Genome Project,” 100K
Project Bioproject, 2015. [Online]. Available:
http://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA186441
29. D. B. Storey, N. Arabyan, W. Ng, K. Thao, P. Chen, N. Kong,
C. Huang, S. Fouthoui, and B. C. Weimer, “Large-scale
release of Salmonella draft genomes; resources for food safety
and public health from the 100K Pathogen Genome Project,” 2015.
[Online]. Available:
http://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA186441
30. S. A. Jackson, M. L. Kotewicz, I. R. Patel, D. W. Lacher,
J. Gangiredla, and C. A. Elkins, “Rapid genomic-scale analysis
of Escherichia coli O104:H4 by using high-resolution alternative
methods to next-generation sequencing,” Appl. Environ.
Microbiol., vol. 78, pp. 1601–1605, 2012.
31. A. Mellmann, D. Harmsen, C. A. Cummings, E. B. Zentz,
S. R. Leopold, A. Rico, K. Prior, R. Szczepanowski, Y. Ji,
W. Zhang, S. F. McLaughlin, J. K. Henkhaus, B. Leopold,
M. Bielaszewska, R. Prager, P. M. Brzoska, R. L. Moore,
S. Guenther, J. M. Rothberg, and H. Karch, “Prospective
genomic characterization of the German enterohemorrhagic
Escherichia coli O104:H4 outbreak by rapid next generation
sequencing technology,” PLoS One, vol. 6, no. 7, 2011,
Art. no. e22751.
32. Y. H. Grad, M. Lipsitch, M. Feldgarden, H. M. Arachi,
G. C. Cerqueira, M. FitzGerald, B. J. Haas, C. I. Murphy,
C. Russ, S. Sykes, B. J. Walker, J. R. Wortman, S. Young,
Q. Zeng, A. Abouelleil, J. Bochicchio, S. Chauvin, T. DeSmet,
S. Gujja, C. McCowan, A. Montmayeur, S. Steelman,
J. Frimodt-Moller, A. M. Petersen, C. Struve, K. A. Krogfelt,
E. Bingen, F.-X. Weill, E.-S. Lander, C. Nusbaum,
B. W. Birren, D. T. Hung, and W. P. Hanage, “Genomic
epidemiology of the Escherichia coli O104:H4 outbreaks in
Europe, 2011,” Proc. Nat. Acad. Sci., vol. 109, no. 8,
pp. 3065–3070, 2012.
33. D. A. Rasko, M. J. Rosovitz, G. S. Myers, E. F. Mongodin,
W. F. Fricke, P. Gajer, J. Crabtree, M. Sebaihia, N. R. Thomson,
R. Chaudhuri, I. R. Henderson, V. Sperandio, and J. Ravel,
“The pangenome structure of Escherichia coli: Comparative
genomic analysis of E. coli commensal and pathogenic isolates,”
J. Bacteriol., vol. 190, pp. 6881–6893, 2008.
34. I. Siró, E. Kápolna, B. Kápolna, and A. Lugasi, “Functional
food: Product development, marketing and consumer
acceptance—A review,” Appetite, vol. 51, pp. 456–467, 2008.
35. B. Ganesan, C. Brothersen, and D. J. McMahon, “Fortification
of foods with omega-3 polyunsaturated fatty acids,” Critical Rev.
Food Sci. Nutrition, vol. 54, no. 1, pp. 98–114, 2014.
36. A. Barzegari and A. A. Saei, “Designing probiotics with respect
to the native microbiome,” Future Microbiol., vol. 7, no. 5,
pp. 571–575, 2012.
37. B. Ganesan, P. Dobrowolski, and B. C. Weimer, “Identification
of the Leucine-to-2-Methylbutyric Acid Catabolic Pathway of
Lactococcus lactis,” Appl. Environ. Microbiol., vol. 72,
pp. 4264–4273, 2006.
38. B. Ganesan, M. Stuart, and B. C. Weimer, “Carbohydrate
starvation causes a metabolically active but nonculturable state in
Lactococcus lactis,” Appl. Environ. Microbiol., vol. 73,
pp. 2498–2512, 2007.
39. J. Ferreyra, K. J. Wu, A. J. Hrykowian, D. M. Bouley,
B. C. Weimer, and J. L. Sonnenburg, “Gut microbiota-produced
succinate promotes Clostridium difficile infection after antibiotics
or motility disturbance,” Cell Host Microbe, vol. 16, no. 6,
pp. 770–777, 2014.
40. A. Marcobal, M. Barboza, E. D. Sonnenburg, E. Martens,
P. Desai, C. Lebrilla, B. C. Weimer, D. A. Mills, B. German,
and J. L. Sonnenburg, “Bacteroides in the Infant Gut Consume
Milk Oligosaccharides via Mucus-Utilization Pathways,” Cell
Host Microbe, vol. 10, pp. 507–14, 2011.
41. E. A. Maga, B. C. Weimer, and J. D. Murray, “Dissecting the
role of milk components on gut microbiota composition,”
Gut Microbes, vol. 4, no. 2, pp. 136–139, 2013.
42. E. A. Maga, P. Desai, B. C. Weimer, N. Dao, D. Küeltz, and
J. D. Murray, “Consumption of lysozyme-rich milk can alter
microbial fecal populations,” Appl. Environ. Microbiol., vol. 78,
no. 17, pp. 6153–6160, 2012.
43. M. Barboza, J. W. Froehlich, J. Pinzon, I. Moeller, B. Lonnerdal,
J. B. German, B. C. Weimer, and C. B. Lebrilla, “Glycosylation
of human milk lactoferrin exhibits dynamic changes during
early lactation enhancing its role in pathogenic bacteria-host
interactions,” Molecular Cellular Proteomics, vol. 11, no. 6,
2012, Art. no. 015248.
44. T. T. Nieminen, K. Koskinen, P. Laine, J. Hultman, E. Säde,
L. Paulin, A. Paloranta, P. Johansson, J. Björkroth, and
P. Auvinen, “Comparison of microbial communities in marinated
and unmarinated broiler meat by metagenomics,” Int. J. Food
Microbiol., vol. 157, no. 2, pp. 142–149, 2012.
45. M. Powell, W. Schlosser, and E. Ebel, “Considering the
complexity of microbial community dynamics in food safety risk
assessment,” Int. J. Food Microbiol., vol. 2, pp. 171–179, 2004.
46. S. van Hijum, E. E. Vaughan, and R. F. Vogel, “Application of
state-of-art sequencing technologies to indigenous food
fermentations,” Current Opinion Biotechnol., vol. 24, no. 2,
pp. 178–186, 2013.
47. S. Weckx, R. Van der Meulen, J. Allermeersch, G. Huys,
P. Vandamme, and P. Van Hummelen, “Community dynamics
of bacteria in sourdough fermentations as revealed by their
metatranscriptome,” Appl. Environ. Microbiol., vol. 76, no. 16,
pp. 5402–5408, 2010.
48. M. M. Leimena, J. Ramiro-Garcia, M. Davids, B. van den
Bogert, H. Smidt, E. J. Smid, J. Boekhort, E. G. Zoendal,
P. J. Schapp, and M. Kleerebezem, “A comprehensive
metatranscriptome analysis pipeline and its validation using
human small intestine microbiota datasets,” BMC Genomics,
vol. 14, no. 530, pp. 2–14, 2013.
49. K. Makarova, A. Slesarev, Y. Wolf, A. Sorokin, B. Mirkin,
E. Koonin, and D. Mills, “Comparative genomics of the lactic
acid bacteria,” Proc. Nat. Acad. Sci., vol. 103, no. 42,
pp. 15611–15616, 2006.
50. R. E. Timme, M. W. Allard, Y. Lao, E. Strain, J. Pettengill,
C. Want, C. Li, C. E. Keys, J. Zheng, R. Stones, M. R. Wilson,
S. M. Musser, and E. W. Brown, “Draft genome sequences of
21 Salmonella enterica serovar Enteritidis strains,” J. Bacteriol.,
vol. 194, no. 21, pp. 5994–5995, 2012.
51. M. L. Land, D. Hyatt, S. R. Jun, G. H. Kora, L. J. Hauser,
O. Lukjancenko, and D. W. Ussery, “Quality scores for
32,000 genomes,” Stds. Genomic Sci., vol. 9, no. 1, p. 20,
2014.
52. R. H. MacArthur and E. O. Wilson, The Theory of Island
Biogeography. Princeton, NJ, USA: Princeton Univ. Press, 1967.
53. J. T. Simpson, K. Wong, S. D. Jackman, J. E. Schein,
S. J. M. Jones, and İ. Birol, “ABySS: A parallel assembler for
short read sequence data,” Genome Res., vol. 19, no. 6,
pp. 1117–1123, 2009.
54. T. Seemann, “Prokka: Rapid prokaryotic genome annotation,”
Bioinformatics, vol. 30, no. 14, pp. 2068–2069, Jul. 2014.
[Online]. Available: http://www.ncbi.nlm.nih.gov/pubmed/
24642063
55. E. Afshinnekoo, C. Meydan, S. Chowdhury, D. Jaroudi,
C. Boyer, N. Bernstein, J. M. Maritz, D. Reeves, J. Gandara,
and C. E. Mason, “Geospatial resolution of human and bacterial
diversity with city-scale metagenomics,” Cell Syst., vol. 1, no. 1,
pp. 72–87, 2015.
56. D. Ercolini, “High-throughput sequencing and metagenomics:
Moving forward in the culture-independent analysis of food
microbial ecology,” Appl. Environ. Microbiol., vol. 79, no. 10,
pp. 3148–3155, 2013.
57. X. He, D. O. Mishchuk, J. Shah, B. C. Weimer, and
C. M. Slupsky, “Cross-talk between two E. coli strains and a
human colorectal adenocarcinoma-derived cell line,” Sci. Rep.,
vol. 3, pp. 3416–3426, 2013.
58. S. Edlund, D. Chambliss, J. Kaufman, D. B. Storey, and
B. Weimer, “A scalable platform for meta-genomic analysis,”
presented at the 143rd American Public Health Association
(APHA) Annual Meeting and Exposition, Oct. 31–Nov. 4, 2015.
[Online]. Available: https://apha.confex.com/apha/143am/
webprogram/Paper334559.html
59. S. B. Edlund, K. L. Beck, N. Haiminen, L. P. Parida, D. B. Storey,
B. C. Weimer, J. H. Kaufman, and D. D. Chambliss, “Design of
the MCAW compute service for food safety bioinformatics,”
IBM J. Res. & Dev., vol. 60, no. 5/6, Paper 2, pp. 2:1–2:12, 2016
(this issue).
Received October 29, 2015; accepted for publication
December 1, 2015
Bart C. Weimer Davis School of Veterinary Medicine,
University of California, Davis, CA 95616 USA
(bcweimer@ucdavis.edu). Dr. Weimer is Professor of Microbiology at the
University of California, Davis. His laboratory focuses on microbial
physiology and function using systems biology approaches
concerning food, animals, and the environment. He has special
interest in host/microbe interactions and the microbiome. His
group leads the 100K Pathogen Genome Sequencing Project that
is creating a reference database of 100,000 bacterial pathogens
associated with food, animals, and humans, with the express purpose
of enabling advanced genomics in public health. His group has
published over 110 peer-reviewed scientific papers and book
chapters, been awarded six patents, authored three books, and
mentored over 30 graduate students.
Dylan Bobby Storey Davis School of Veterinary Medicine,
University of California, Davis, CA 95616 USA
(dylan.storey@gmail.com). Dr. Storey is a postdoctoral fellow at the
University of California, Davis. He received B.Sc. and M.Sc. degrees
in biology from California State University, Fresno in 2006 and 2011,
respectively, and a Ph.D. degree in life sciences from the University
of Tennessee, Knoxville in 2014, with fellowships from the National
Science Foundation and the U.S. Department of Agriculture to
conduct research at the intersections of computer science,
mathematics, and biology. His research focuses on the application
of bioinformatic and data-science techniques to large biological
datasets including integration of large-scale “-omic” data types.
Christopher A. Elkins Center for Food Safety and Applied
Nutrition (CFSAN), Food and Drug Administration (FDA), Laurel,
MD 20708 USA (chris.elkins@fda.hhs.gov). Dr. Elkins is Director
of the Division of Molecular Biology in the Office of Applied
Research and Safety Assessment at CFSAN, FDA. He received a
B.A. degree in biology and history from Case Western Reserve
University and a Ph.D. degree in microbiology from the University
of Tennessee, Knoxville, and then served as postdoctoral fellow at
University of California Berkeley. At the FDA, he served as staff
microbiologist and principal investigator at FDA’s National Center
for Toxicological Research. His current research interests include
enteric microbiology and antimicrobial resistance mechanisms.
Current research directions include genomic-scale analysis
of probiotics and enteric foodborne pathogens, and metagenomic
methods to advance food safety. Dr. Elkins is a member of the
American Society for Microbiology and serves as Editor of
Applied and Environmental Microbiology.
1 : 12 B. C. WEIMER ET AL. IBM J. RES. & DEV. VOL. 60 NO. 5/6 PAPER 1 SEPTEMBER/NOVEMBER 2016
Robert C. Baker Mars Global Food Safety Center, Mars
Incorporated, McLean, VA 22101 USA (robert.c.baker@effem.com).
Mr. Baker is Director of Mars Global Food Safety Center and
Global Head of Technical Food Safety Development for Mars,
Incorporated. He started his career in the pharmaceutical industry,
moving to the food industry as a microbiologist at Mars in 1987.
Before his current position, Mr. Baker was the Global Head of Food
Safety for Mars, Incorporated, accountable for the development and
execution of the Corporate Food Safety Management strategy. He
received his B.S. degree in microbiology from Fairleigh Dickinson
University and his M.S. degree in food science from Rutgers
University. Mr. Baker is a registered microbiologist and a member
of several professional organizations.
Peter Markwell Food Safety Science, Mars Incorporated,
McLean, VA 22101 USA (peter.markwell@effem.com).
Mr. Markwell is Corporate Head of Food Safety Science for Mars,
Incorporated. He is responsible for leading Food Safety Science
research projects for Mars, Incorporated. These projects are spread
across four strategic platforms: pathogen management, mycotoxin
management, raw material integrity management, and transforming
food safety through data integration. Mr. Markwell has been with
Mars, Incorporated, for 30 years. During his career, he has led a
wide range of research programs and published more than 150
papers and research abstracts. He has lectured to academic and other
audiences in more than 30 countries.
David D. Chambliss IBM Research, Almaden Research
Center, San Jose, CA 95120 USA (chamb@us.ibm.com).
Dr. Chambliss is a Research Staff Member in the Accelerated
Discovery Lab at the IBM Almaden Research Center. He received
a Ph.D. degree in applied physics from Cornell University in 1989,
an M.A. degree from Cambridge University in 1983, and a B.S.E.
degree from Princeton University in 1981. He leads the development
of the Metagenomics Computation and Analytics Workbench
(MCAW). Dr. Chambliss has led projects in storage systems, such
as deduplication, quality of service, access pattern analytics, and
declustered RAID (Redundant Array of Independent Disks). He has
also conducted research in the physics of surfaces, with emphasis on
scanning tunneling microscope studies of epitaxial metal growth.
Stefan B. Edlund IBM Research, Almaden Research Center,
San Jose, CA 95120 USA (sedlund@us.ibm.com).
Mr. Edlund is a Senior Software Engineer in the Industrial and Applied
Genomics team at IBM Research - Almaden. He has over 15 years of
experience in IBM Research and is currently performing research in
the area of food safety and bioinformatics, where he is designing a
workbench for running and analyzing outputs from (meta)genomic
pipelines. Mr. Edlund holds an M.S. degree in computer science from
the Royal Institute of Technology in Stockholm, and he is a member
of the Association for Computing Machinery.
James H. Kaufman IBM Research, Almaden Research
Center, San Jose, CA 95120 USA (jhkauf@us.ibm.com).
Dr. Kaufman is a scientist in the Advanced Discovery Laboratory at
the IBM Almaden Research Center in San Jose, California. He is
currently principal investigator for the Sequence the Food Supply
Chain Consortium (SFSCC), a new project exploring metagenomics
for food safety. He is also an Eclipse project co-lead for the
SpatioTemporal Epidemiological Modeler
(http://www.eclipse.org/stem). Dr. Kaufman received his B.A. degree in physics from
Cornell University and his Ph.D. in physics from University of
California, Santa Barbara. He is a Fellow of the American Physical
Society, a Distinguished Scientist of the Association for Computing
Machinery, and an IBM Distinguished Research Staff Member.
During his career with IBM Research, Dr. Kaufman has made
contributions in diverse fields including simulation science,
magnetic device technology, pattern formation, conducting
polymers, diamond-like carbon, superconductivity, experimental
studies of the Moon Illusion, distributed computing, privacy
protection, and grid middleware.