Matthew Davis’s research while affiliated with IBM Research - Thomas J. Watson Research Center and other places


Publications (18)


A Method and Means to Create and Maintain Accurate Databases for Pathogen Identification.
Patent · January 2022 · 10 Reads · Matthew A. Davis

For each unique pair of data items in a complete set, a computing device determines the distance between the two items. The computing device then repeats the following steps until no data items remain in the complete set. For each data item remaining in the complete set, it determines a similarity subset containing every other data item whose distance from that item is less than a target difference threshold. It moves a selected data item from the largest similarity subset to a reference database, which is a subset of the complete set. It then removes from the complete set every data item whose distance from the selected data item is less than the threshold. A new data item can be classified using the reference database.
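The selection loop described in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names `build_reference_database` and `classify`, the tie-breaking rule (lowest index wins), the reading that the "selected data item" is the item whose own similarity subset is largest, and nearest-neighbour classification are all assumptions made for illustration.

```python
from itertools import combinations


def build_reference_database(items, distance, threshold):
    """Greedy selection of representative items (illustrative sketch only)."""
    n = len(items)
    # Distance for each unique pair, stored symmetrically for lookup.
    dist = {}
    for i, j in combinations(range(n), 2):
        d = distance(items[i], items[j])
        dist[(i, j)] = d
        dist[(j, i)] = d

    remaining = set(range(n))
    reference = []
    while remaining:
        # Similarity subset of each remaining item: the other remaining
        # items closer than the threshold.
        subsets = {
            i: {j for j in remaining if j != i and dist[(i, j)] < threshold}
            for i in remaining
        }
        # Assumption: select the item with the largest similarity subset,
        # breaking ties by lowest index.
        selected = min(remaining, key=lambda i: (-len(subsets[i]), i))
        reference.append(items[selected])
        # Remove the selected item and everything within the threshold of it.
        remaining -= subsets[selected] | {selected}
    return reference


def classify(new_item, reference, distance):
    """Assumed classification rule: return the nearest reference item."""
    return min(reference, key=lambda ref: distance(new_item, ref))
```

With integer items `[0, 1, 2, 50, 51, 90]`, absolute difference as the distance, and a threshold of 2, the sketch keeps one representative per cluster. Real inputs would presumably be genomes compared with a genome distance measure, which the abstract does not specify.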


Fig. 1 Bioinformatic pipeline schematic for processing microbiome samples in the presence of matrix content. Description of the bioinformatic steps (light gray) applied to high protein powder metatranscriptome samples (dark gray). Black arrows indicate data flow and blue boxes describe outputs from the pipeline.
Fig. 8 Salmonella status correlations with genus relative abundances. Only those genera with the absolute value of the correlation coefficient >0.5 are shown. Positive and negative correlations are indicated in gray and blue, respectively.
Accuracy of microbial identification using two in silico constructed simulated food mixtures.
Monitoring the microbiome for food safety and quality using deep shotgun sequencing
Article · Full-text available · December 2021 · 267 Reads · 34 Citations · npj Science of Food · David Chambliss · [...]

In this work, we hypothesized that shifts in the food microbiome can be used as an indicator of unexpected contaminants or environmental changes. To test this hypothesis, we sequenced the total RNA of 31 high protein powder (HPP) samples of poultry meal pet food ingredients. We developed a microbiome analysis pipeline employing a key eukaryotic matrix filtering step that improved microbe detection specificity to >99.96% during in silico validation. The pipeline identified 119 microbial genera per HPP sample on average with 65 genera present in all samples. The most abundant of these were Bacteroides, Clostridium, Lactococcus, Aeromonas, and Citrobacter. We also observed shifts in the microbial community corresponding to ingredient composition differences. When comparing culture-based results for Salmonella with total RNA sequencing, we found that Salmonella growth did not correlate with multiple sequence analyses. We conclude that microbiome sequencing is useful to characterize complex food microbial communities, while additional work is required for predicting specific species’ viability from total RNA sequencing.

Monitoring the microbiome for food safety and quality using deep shotgun sequencing

May 2020 · 213 Reads · 3 Citations

In this work, we hypothesized that shifts in the food microbiome can be used as an indicator of unexpected contaminants or environmental changes. To test this hypothesis, we sequenced total RNA of 31 high protein powder (HPP) samples of poultry meal pet food ingredients. We developed a microbiome analysis pipeline employing a key eukaryotic matrix filtering step that improved microbe detection specificity to >99.96% during in silico validation. The pipeline identified 119 microbial genera per HPP sample on average with 65 genera present in all samples. The most abundant of these were Bacteroides, Clostridium, Lactococcus, Aeromonas, and Citrobacter. We also observed shifts in the microbial community corresponding to ingredient composition differences. When comparing culture-based results for Salmonella with total RNA sequencing, we found that Salmonella growth did not correlate with multiple sequence analyses. We conclude that microbiome sequencing is useful to characterize complex food microbial communities, while additional work is required for predicting specific species' viability from total RNA sequencing.


FASER results on experimental samples
FASER results on experimental food mixture
High protein powder sequences mapping to observed source genomes
Food authentication from shotgun sequencing reads with an application on high protein powders

November 2019 · 155 Reads · 40 Citations · npj Science of Food

Here we propose that using shotgun sequencing to examine food leads to accurate authentication of ingredients and detection of contaminants. To demonstrate this, we developed a bioinformatic pipeline, FASER (Food Authentication from SEquencing Reads), designed to resolve the relative composition of mixtures of eukaryotic species using RNA or DNA sequencing. Our comprehensive database includes >6000 plants and animals that may be present in food. FASER accurately identified eukaryotic species with 0.4% median absolute difference between observed and expected proportions on sequence data from various sources including sausage meat, plants, and fish. FASER was applied to 31 high protein powder raw factory ingredient total RNA samples. The samples mostly contained the expected source ingredient, chicken, while three samples unexpectedly contained pork and beef. Our results demonstrate that DNA/RNA sequencing of food ingredients, combined with a robust analysis, can be used to find contaminants and authenticate food ingredients in a single assay.


Figure 4.1 The cumulative rate of discovery of 'new genes' as a function of the number of (a) 866 Salmonella isolates, (b) E. coli genotypes from 3348 representing 334 genotypes, and (c) 15,158 Campylobacter isolates. All increase as a power law (linear on a log-log plot) defined by Equations 4.1-4.3.
Figure 4.3 Heat maps showing the genome-genome similarity matrix derived from three Campylobacter databases containing (a) 90 genomes, (b) 218 genomes, and (c) 715 genomes.
Figure 4.4 Heat maps of genome-genome similarity (Pearson correlations) of pangenome gene (allele) presence measured with custom-designed Affymetrix microarray platforms. (a) Hybridization intensities for 1094 E. coli genomes; (b) hybridization intensities for 600 S. enterica genomes.
Figure 4.5 A circle plot showing the mapping of all log-normalized read alignments from a metagenomic study of poultry meal to 218 Campylobacter genomes in the curated Ensembl database. Each circle represents one specific genome in the reference. The area of each circle is the log of the number of sequence reads that matched that genome. The arrangement of the small circles is arbitrary. The dominant Campylobacter species are indicated in the legend.
Figure 4.6 The occurrence frequency of unique Campylobacter genome SNP variants within a metaRNAseq study of a poultry meal. The function is approximately a power law with slope near −1.
Insular Microbiogeography: Three Pathogens as Exemplars

October 2019 · 132 Reads · 8 Citations

Traditional taxonomy in biology assumes that life is organized in a simple tree. Attempts to classify microorganisms in this way in the genomics era led microbiologists to look for finite sets of 'core' genes that uniquely group taxa as clades in the tree. However, the diversity revealed by large-scale whole genome sequencing is calling into question the long-held model of a hierarchical tree of life, which leads to questioning of the definition of a species. Large-scale studies of microbial genome diversity reveal that the cumulative number of new genes discovered increases with the number of genomes studied as a power law and subsequently leads to the lack of evidence for a unique core genome within closely related organisms. Sampling 'enough' new genomes leads to the discovery of a replacement or alternative to any gene. This power law behaviour points to an underlying self-organizing critical process that may be guided by mutation and niche selection. Microbes in any particular niche exist within a local web of organism interdependence known as the microbiome. The same mechanism that underpins the macro-ecological scaling first observed by MacArthur and Wilson also applies to microbial communities. Recent metagenomic studies of a food microbiome demonstrate the diverse distribution of community members, but also genotypes for a single species within a more complex community. Collectively, these results suggest that traditional taxonomic classification of bacteria could be replaced with a quasispecies model. This model is commonly accepted in virology and better describes the diversity and dynamic exchange of genes that also hold true for bacteria. This model will enable microbiologists to conduct population-scale studies to describe microbial behaviour, as opposed to a single isolate as a representative.



Draft Genome Sequences of 1,183 Salmonella Strains from the 100K Pathogen Genome Project

July 2017 · 180 Reads · 14 Citations · Genome Announcements

Salmonella is a common food-associated bacterium that has substantial impact on worldwide human health and the global economy. This is the public release of 1,183 Salmonella draft genome sequences as part of the 100K Pathogen Genome Project. These isolates represent global genomic diversity in the Salmonella genus.


Insular microbiogeography

March 2017 · 196 Reads · 3 Citations

The diversity revealed by large-scale genomics in microbiology is calling into question long-held beliefs about genome stability, evolutionary rate, and even the definition of a species. MacArthur and Wilson's theory of insular biogeography provides an explanation for the diversity of macroscopic animal and plant species as a consequence of the associated hierarchical web of species interdependence. We report a large-scale study of microbial diversity that reveals that the cumulative number of genes discovered increases with the number of genomes studied as a simple power law. This result is demonstrated for three different genera comparing over 15,000 isolates. We show that this power law is formally related to the MacArthur-Wilson exponent, suggesting the emerging diversity of microbial genotypes arises because the scale-independent behavior first reported by MacArthur and Wilson extends down to the scale of microbes and their genes. Assessing the depth of available whole genome sequences implies a dynamically changing core genome, suggesting that traditional taxonomic classifications should be replaced with a quasispecies model that captures the diversity and dynamic exchange of genes. We report species population "clouds" in a defined microbiome, with scale invariance extending down to the level of single-nucleotide polymorphisms (SNPs).
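A power law of the form G = kappa * N^gamma is a straight line on a log-log plot, so its exponent can be estimated with an ordinary least-squares fit of log G against log N. The sketch below shows that idea only. The symbols `kappa` and `gamma`, the function name `fit_power_law`, and the synthetic numbers are assumptions for illustration; they are not the paper's equations, fitting procedure, or data.

```python
import math


def fit_power_law(n_genomes, n_genes):
    """Least-squares fit of log(G) = log(kappa) + gamma * log(N).

    Returns (kappa, gamma). Inputs must be positive and of equal length.
    """
    xs = [math.log(n) for n in n_genomes]
    ys = [math.log(g) for g in n_genes]
    m = len(xs)
    mean_x = sum(xs) / m
    mean_y = sum(ys) / m
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    gamma = sxy / sxx                           # slope on the log-log plot
    kappa = math.exp(mean_y - gamma * mean_x)   # prefactor from the intercept
    return kappa, gamma


# Synthetic example (not data from the study): G = 100 * N**0.5.
genomes = [1, 4, 16, 64, 256]
genes = [100 * n ** 0.5 for n in genomes]
kappa, gamma = fit_power_law(genomes, genes)
```

On real cumulative gene counts, the fitted slope would also depend on the order in which genomes are added, so a practical analysis would likely average over several random orderings.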


Figure 2: Experiment set up with population and retail store information in Berlin for each simulation  
Figure 3: Simulation result with real Berlin population and retail store data in a scale-free network.  
From farm to fork: how spatial-temporal data can accelerate foodborne illness investigation in a global food supply chain

June 2016 · 166 Reads · 2 Citations · SIGSPATIAL Special

Foodborne disease is a global public health problem that affects millions of people every year. During a foodborne illness outbreak, rapid identification of contaminated food is vital to minimize illness, loss and impact on society. Public health officials face a significant challenge and long delays in obtaining critical information to help identify a contaminated product using traditional methods such as surveys and questionnaires. We propose a novel approach mapping geo-coded sales data against geo-coded confirmed case reports, which has the potential to reduce the time required for foodborne illness investigation. Using real grocery retail scanner data with spatial information from Germany, we have implemented a likelihood-based framework to study how such spatial data can be used to accelerate the investigation during the early stages of an outbreak. Our analysis shows that after receiving as few as 10 laboratory confirmed case reports it is possible to narrow the investigation to approximately 12 suspect products with the contaminated product included in this subset 90% of the time for approximately 80% of food products studied.
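One simple way to picture a likelihood-based comparison of geo-coded sales against geo-coded case reports is sketched below. It is an illustrative assumption, not the framework from the paper: it scores each product by the log-likelihood of the observed case locations under that product's regional sales distribution. The function name `rank_products`, the `smoothing` parameter, and the toy region and product names are hypothetical.

```python
import math


def rank_products(sales, case_regions, smoothing=1e-6):
    """Rank products by log-likelihood of the observed case locations.

    sales: {product: {region: units_sold}}
    case_regions: list of regions, one entry per confirmed case report
    smoothing: small constant so regions with zero sales do not give log(0)
    """
    scores = {}
    for product, by_region in sales.items():
        total = sum(by_region.values())
        log_likelihood = 0.0
        for region in case_regions:
            # Assumed model: a case occurs in a region with probability
            # equal to that region's share of the product's sales.
            share = by_region.get(region, 0.0) / total if total > 0 else 0.0
            log_likelihood += math.log(share + smoothing)
        scores[product] = log_likelihood
    # Most likely source first.
    return sorted(scores, key=scores.get, reverse=True)


# Toy data, invented for illustration only.
toy_sales = {
    "product_A": {"north": 90, "south": 10},
    "product_B": {"north": 50, "south": 50},
    "product_C": {"north": 10, "south": 90},
}
toy_cases = ["north", "north", "south", "north"]
ranking = rank_products(toy_sales, toy_cases)
```

Truncating such a ranking to its top entries gives a list of suspect products, similar in spirit to the roughly 12 suspect products mentioned in the abstract.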


Fig. 1. Average success rate for all products at 10, 20, 50, 100 and 1000 case reports. Normally, the more case reports used in the estimation, the higher the success rate obtained from the likelihood-based method, and more products if contaminated
Fig. 2. The convergence of success rate for three representative products. The results shown in this graph indicate the stratification of the empirical data set used in this study. For some products, like product B (blue curve) and C (red curve), the success rates of identification if contaminated converge rapidly to 1 (the most successful) as the number of case reports increases. However, for product A (green curve), even with 1000 case reports, the trajectory of success rate does not converge to 1 at all.
Fig. 3. First Appearance in Ordered List (FAOL). The products are sorted by First Appearance in the Top One Ordered List in this graph (blue curve). For each product shown on the X axis, the contaminated product is consistently identified in the top 3 or top 5 FAOL list. During an early stage of outbreak investigation, using the FAOL Top 5 metric, the contaminated product can be identified in 90% of outbreaks with 20 or fewer case reports. This will greatly accelerate the investigation process.
Fig. 4. Persistent Containment in Ordered List (PCOL). The data is sorted by the Persistent Containment in the Top One Ordered List (i.e., PCOL_Top_One). This metric measures the stability of the framework components in identifying the correct contaminated food product. It is observed that the likelihood method becomes stable within 100 cases for 90% of products. The method achieves better performance when we relax the constraint to the top three and top five PCOL lists, shown in red and green.
Fig. 5. Rate of Appearance in Ordered List (RAOL) for a cluster of highly correlated food products. This group of products recurs consistently together in the ranking of the likelihood-based estimation method. In this plot, for contaminated product 1, we can see that products 2 and 3 occur frequently with product 1 over the duration of an outbreak. The variance in frequency establishes a pattern that could be used to accelerate the identification of products with similar distributions but statistically significant differences in the RAOL metric.
A modeling framework to accelerate food-borne outbreak investigations

January 2016 · 146 Reads · 24 Citations · Food Control

Food safety procedures are critical to reducing pathogen-caused food-borne disease (FBD). However, there is no way to completely eliminate the risk of consuming contaminated products. When prevention efforts fail, rapid identification of the contaminated product is essential. The medical and economic losses incurred grow with the duration of the outbreak. In this paper we show that before an outbreak occurs, analysis of food sales data, as a proactive intervention, can provide useful product intelligence that we can exploit during an outbreak investigation to accelerate the identification process. Using real grocery retail sales data from Germany, we have implemented a likelihood-based approach to study how such data can be used to accelerate the investigation during the early stages of an outbreak.


Citations (14)


... [35] was then used to A cytobrush (FLOQSwabs, Coplan, Italy, EU) was used to swab the oral mucosa lateral to the palatoglossal folds, then placed in 500 µL of DNA/RNA Shield (Zymo, Irvine, CA, USA), vortexed, and stored at −20 °C. Bacterial cells were enzymatically lysed according to the protocol used by the 100 K pathogen project [29], and then RNA was isolated using Trizol LS (Ambion, Austin, TX, USA) according to manufacturer instructions. RNA sequencing libraries were prepared as described previously [30][31][32], with RNA purity and integrity confirmed using TapeStation (Agilent Technologies Inc., Santa Clara, CA, USA). Sequencing libraries were constructed using the enzymatic-based KAPA HyperPlus Library Preparation kit (KK8514) (Kappa Biosystems, Wilmington, MA, USA) on a PerkinElmer Sciclone G3 (PerkinElmer Inc. ...

Reference:

Case Report: Shift from Aggressive Periodontitis to Feline Chronic Gingivostomatitis Is Linked to Increased Microbial Diversity
Monitoring the microbiome for food safety and quality using deep shotgun sequencing

npj Science of Food

... Metagenomics is a powerful tool for characterizing microbial communities, and the translation of "omics" technologies like this to food microbiology will have a significant impact in the food industry and for public health (31,32). The applications of this technology extend far beyond just public health, they can also provide valuable insights about food quality, and there is evidence that the microbiome is likely an important and effective hazard indicator within the food supply chain (33). ...

Monitoring the microbiome for food safety and quality using deep shotgun sequencing

... Beyond this, milk is used as an ingredient to make a variety of products and other foods, with raw milk quality having considerable impacts on finished product quality, safety, and production efficiency. Other studies have aimed to characterize the microbiome of food ingredients in production settings, for example, in high protein powders (5,6), produce (7,8), and fermented foods (9)(10)(11)(12). These studies are useful in demonstrating the potential that metagenomics and metatranscriptomics have in advancing food safety and quality for targeted assessments as well as for improving sensitivity for regular surveillance. ...

Food authentication from shotgun sequencing reads with an application on high protein powders

npj Science of Food

... Currently, food safety regulatory agencies including the Food and Drug Administration (FDA), Centers for Disease Control and Prevention (CDC), United States Department of Agriculture (USDA), and European Food Safety Authority (EFSA) are converging on the use of WGS for pathogen detection and outbreak investigation. Large scale WGS of food-associated bacteria was first initiated via the 100 K Pathogen Genome Project [9] with the goal of expanding the diversity of bacterial reference genomes, a crucial need for foodborne illness outbreak investigation, traceability, and microbiome studies [10,11]. However, since WGS relies on culturing a microbial isolate prior to sequencing, there are inherent biases and limitations in its ability to describe the microorganisms and their interactions in a food sample. ...

Insular Microbiogeography: Three Pathogens as Exemplars

... Sequences were assembled using Shovil (v1.0.4) (83), checked for quality, size (4.5-6.5Mbp genome), completeness (>95% estimate), and contamination (<10% estimate) using CheckM (84), and assessed for approximate genera and species and further identity test for possible contamination using Kraken (85)(86)(87)(88)(89)(90). Sixteen sequences that did not meet quality criteria were removed from downstream analysis. ...

Insular Microbiogeography: Three Pathogens as Exemplars

... Whole genome sequencing was performed by the Weimer laboratory at the University of California, Davis, as part of the 100K Pathogen Genome Project (http://www.genomes4health.org/) (7,(34)(35)(36)(37)(38). Briefly, whole genome sequencing was performed using Illumina HiSeq XTEN with PE 150 plus index read (Illumina, San Diego, CA, USA). ...

Draft Genome Sequences of 1,183 Salmonella Strains from the 100K Pathogen Genome Project

Genome Announcements

... Introduction of machine learning and other computational techniques to the field of biology and medicine have revolutionized the way research can be conducted in these disciplines [2]. Due to machine learning, researchers are now able to leverage the power of data in order to identify patterns that can potentially help solve important problems, such as antimicrobial resistance (AR) [3][4][5][6][7][8][9]and detecting food hazards [10][11][12][13][14]. Computational advances in these areas has also led to the emergence of consumer-centric industries. ...

From farm to fork: how spatial-temporal data can accelerate foodborne illness investigation in a global food supply chain

SIGSPATIAL Special

... Listeriosis is a serious foodborne bacterial infection in humans and animals, caused by Listeria monocytogenes [1,2]. Listeria is mostly found in soil, unclean water (lakes, rivers), plants, refrigerators, and food [3]. ...

A modeling framework to accelerate food-borne outbreak investigations

Food Control

... A policy framework for technology and data utilization is needed, and a crisis situation may push policy development and reformulation of operational strategies (Kuščer et al., 2022) as well as data collection and communication planning (Barkbook-Johnson et al., 2017). Open-source software for data exchange and co-operation in a crisis is one way to advance situational picture formulation and problem solving (Falenski et al., 2013). ...

A Generic Open-Source Software Framework Supporting Scenario Simulations in Bioterrorist Crises

Biosecurity and Bioterrorism: Biodefense Strategy, Practice, and Science