Eric A. Franzosa’s research while affiliated with Broad Institute of MIT and Harvard and other places


Publications (171)


Quantitative pangenomic analysis identifies expansion of Fic gene families with Fap2⁺ fusobacterial strains. (a) K-mer-based subtractive genomic signature detection by Neptune’s algorithm using draft-level assemblies of Fusobacterium CTIs. Inclusion and exclusion genomes are Fap2⁺ Fna CTI-1, 2, 6 and Fap2– CTI-3, 5, and 7, respectively. Neptune’s signature scores are a sum of BLAST identity-based sensitivity and specificity of a genomic signature matching inclusion genome regions. Each dot represents a bacterial genome fragment that contains open reading frame(s). (b) Analysis of genome length-adjusted Fic gene copy number from 622 publicly available fusobacterial genomes by Fap2 protein coverage (left, scatterplot; ~3 kbp Fap2 alignment length) or Fap2 genotype (right, boxplot). Fnavp (F. nucleatum, animalis, vincentii, polymorphum) in red and non-Fnavp species in black. (c) Abundance of Fic gene families in colorectal cancer-associated bacterial strains stratified by genus and species-level taxa. Enterotoxigenic B. fragilis (ETBF) strains are positive for fragilysin (bft) genes. Polyketide synthase-positive Escherichia coli (pks⁺ E. coli) strains are defined by the presence of one or more clb cluster genes (i.e., clbA, clbB, clbS, clbQ) in their genomes. Unless otherwise noted, minimum BLAST identity threshold for protein annotation in classifying bacterial genotype is 50%. NTBF, non-toxigenic B. fragilis. Plus symbols represent mean values.
Metagenomic quantification of fusobacterial Fic gene families and Fap2 in patients with colorectal adenomas and adenocarcinomas. Linear regression analysis of Fusobacteriota taxon-specific abundance of Fic gene families and Fap2 in 2,088 fecal microbiomes by (a) case-control classes and (c) colorectal cancer tumor-node-metastasis (TNM) staging. Lines in left and bottom marginal plots represent metagenomes that are negative for either fusobacterial Fap2, Fic, or both. Metagenomic gene family abundance was normalized by sequencing depth and average microbial genome size. Analysis of fusobacterial Fic-Fap2 gene prevalence in clinical samples stratified by (b) diagnostic groups and (d) TNM stages. A positive sample is defined as having non-zero metagenomic abundance of a gene of interest. P-values from Fisher’s exact tests were adjusted by Benjamini-Hochberg (BH) step-up procedure; *q < 0.05; **q < 0.01; ***q < 0.001; ****q < 0.0001.
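The Benjamini-Hochberg (BH) step-up adjustment named in the caption above can be illustrated with a minimal sketch. This is not the authors' code: the function names `bh_adjust` and `stars` are hypothetical, and only the star thresholds (q < 0.05, 0.01, 0.001, 0.0001) come from the caption.

```python
def bh_adjust(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Step-up pass: walk from the largest p-value down to the smallest,
    # scaling each by m / rank and enforcing monotonicity with a running minimum.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


def stars(q):
    """Map a q-value to the significance stars used in the figure legends."""
    for threshold, label in ((0.0001, "****"), (0.001, "***"), (0.01, "**"), (0.05, "*")):
        if q < threshold:
            return label
    return ""


if __name__ == "__main__":
    raw = [0.01, 0.04, 0.03, 0.005]  # made-up p-values for illustration only
    for p, q in zip(raw, bh_adjust(raw)):
        print(p, round(q, 4), stars(q))
```

In practice an established routine (for example in R or SciPy) would normally be used; the sketch only makes the rank-based scaling explicit.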
Characterization of Fic family proteins from Fusobacterium colon tumor isolates and type strains. (a) Clustal Omega alignment of Fic family proteins for identifications of conserved Fic motifs and their autoinhibitory domains. A phylogenetic tree was constructed using an identity matrix of aligned protein sequences and the njs function from the ape R package. VopS and FICD are protein adenylyltransferases from V. parahaemolyticus serotype O3:K6 and Homo sapiens/Mus musculus, respectively. X-axis denotes aligned amino acid coordinates. (b) AlphaFold2 structural predictions of six representative Fic enzymes encoded by Fa7/1 (39). Protein structures are assigned with corresponding clade colors as in (a). Green, autoinhibitory loop. Light-gray, Fic motif. (c) Prevalence of Fa7/1 Fic enzyme homologs having at least 50% BLAST identities in 146 publicly available Fnavp genomes stratified by Fap2 genotype. Fa, F. animalis; Fn, Fusobacterium nucleatum; Fp, Fusobacterium polymorphum; Fv, Fusobacterium vincentii.
Genetic architecture and evolution of Fic gene loci in Fusobacterium animalis strains. Synteny analysis of gene blocks in Fic2 and Fic5 loci by minimap2 alignment of Fa7/1 Fic locus sequences against representative (a) Fap2⁺ and (b) Fap2– Fa genomes. Locus sequences are genomic regions covering at least 10 kbp upstream and downstream of Fic genes. ORFs whose start and stop codons overlap or are within 5 bps apart are considered coupled ORFs. Repeat and promoter locus sequences were predicted by RepeatModeler2 and Promotech, respectively (44, 45). Locus-specific ORFs were clustered into protein families at 50% identity and coverage via MMseqs2 (46). Average percent GC was calculated over 50 bps sliding windows. Fic loci are segregated by line breaks per Fa strain.
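The sliding-window GC calculation mentioned in the caption above can be sketched as follows. The 50 bp window comes from the caption; the step size of 1 bp and the function name `gc_percent_windows` are assumptions, since the caption does not state how far the window advances.

```python
def gc_percent_windows(seq, window=50, step=1):
    """Percent GC for each window of `window` bases, advancing by `step` bases.

    `step` is an assumption; the caption only specifies the 50 bp window size.
    """
    seq = seq.upper()
    values = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc_count = sum(1 for base in chunk if base in "GC")
        values.append(100.0 * gc_count / window)
    return values


if __name__ == "__main__":
    # Toy sequence with a small window so the output is easy to check by eye.
    print(gc_percent_windows("GGCCAATT", window=4, step=2))
```

A sequence shorter than the window yields an empty list rather than a partial window.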
Fa7/1 Fic expression in vitro. (a) RT-qPCR analysis of Fusobacterium Fic gene expression at 6 h in Fa7/1 cocultures with Colon26 tumorspheres under anaerobic conditions. MOI, 10:1 (Fa7/1 colony-forming units [CFUs] to Colon26 cancer cells). Relative gene expression values were normalized per experiment. Data represent seven independent experiments. Each symbol is one independent experiment. Error bars are SEM. P-values from Wilcoxon’s rank-sum tests were adjusted by Benjamini-Hochberg (BH) step-up procedure; *q < 0.05; **q < 0.01; ***q < 0.001; ****q < 0.0001. (b) Liquid chromatography tandem mass spectrometry-based proteomic detection of Fic family proteins in Fa7/1 monoculture supernatants from distinct growth phases. Ion peaks of peptides matching Fa7/1 Fic protein sequences for Fic2, 4, and 5.
Virulence factor discovery identifies associations between the Fic gene family and Fap2 fusobacteria in colorectal cancer microbiomes
January 2025 · 31 Reads · Geicho Nakatsu · Duhyun Ko · [...] · Wendy S. Garrett

Fusobacterium is a bacterium associated with colorectal cancer (CRC) tumorigenesis, progression, and metastasis. Fap2 is a fusobacteria-specific outer membrane galactose-binding lectin that mediates Fusobacterium adherence to and invasion of CRC tumors. Advances in omics analyses provide an opportunity to profile and identify microbial genomic features that correlate with the cancer-associated bacterial virulence factor Fap2. Here, we analyze genomes of Fusobacterium colon tumor isolates and find that a family of post-translational modification enzymes containing Fic domains is associated with Fap2 positivity in these strains. We demonstrate that Fic family genes expand with the presence of Fap2 in the fusobacterial pangenome. Through comparative genomic analysis, we find that Fap2⁺ Fusobacteriota are highly enriched with Fic gene families compared to other cancer-associated and human gut microbiome bacterial taxa. Using a global data set of CRC shotgun metagenomes, we show that fusobacterial Fic and Fap2 genes frequently co-occur in the fecal microbiomes of individuals with late-stage CRC. We further characterize specific Fic gene families harbored by Fap2⁺ Fusobacterium animalis genomes and detect recombination events and elements of horizontal gene transfer via synteny analysis of Fic gene loci. Exposure of an F. animalis strain to a colon adenocarcinoma cell line increases gene expression of fusobacterial Fic and virulence-associated adhesins. Finally, we demonstrate that Fic proteins are synthesized by F. animalis as Fic peptides are detectable in F. animalis monoculture supernatants. Taken together, our study uncovers Fic genes as potential virulence factors in Fap2⁺ fusobacterial genomes.

IMPORTANCE Accumulating data support that bacterial members of the intra-tumoral microbiota critically influence colorectal cancer progression. Yet, relatively little is known about non-adhesin fusobacterial virulence factors that may influence carcinogenesis. Our genomic analysis and expression assays in fusobacteria identify Fic domain-containing genes, well-studied virulence factors in pathogenic bacteria, as potential fusobacterial virulence features. The Fic family proteins that we find are encoded by fusobacteria and expressed by Fusobacterium animalis merit future investigation to assess their roles in colorectal cancer development and progression.


Figure 1. Inhibiting gut bacterial urease activity is a promising strategy for modulating host ammonia levels. (A) Urease enzymes convert urea into ammonia and carbon dioxide. (B) Structure of the Bacillus pasteurii urease (PDB: 4UBP) highlighting its bis-nickel metallocofactor bound to the clinically used urease inhibitor acetohydroxamic acid. Ni atoms represented as blue spheres. (C) Strategies for manipulation of urease activity in the gut microbiota including previously successful fecal microbiota transplant and proposed use of small-molecule urease inhibitors.
Figure 2. Acetohydroxamic acid (AHA) and benurestat inhibit ammonia production by urease-encoding gut bacteria. (A) Chemical structures of the FDA-approved urease inhibitor AHA and the clinical candidate benurestat. The key hydroxamate pharmacophore responsible for inhibitor activity is labeled in green. (B) Relative quantification of inhibition of ammonia production by cultured bacterial isolates grown with 8 mM urea and treated with AHA. Activity was normalized to a no inhibitor control and this experiment was performed in biological triplicate. (C) Relative quantification of ammonia production by cultured bacterial isolates grown with 8 mM urea and treated with benurestat. This experiment was performed in biological triplicate. (D) Benurestat inhibits urease activity in mouse fecal suspensions ex vivo. Error bars represent mean ± standard deviation of biological triplicate experiments for each concentration.
Figure 3. Benurestat treatment reduces fecal and serum ammonia levels in mice. (A) Study of conventional mice treated with vehicle or benurestat (administered 2 times a day for 3 days via oral gavage) and euthanized (labeled as X) on day 4. (B) Quantification of ammonia concentrations in fecal and serum samples collected from vehicle or benurestat treated conventional mice. (C) Study of a mouse model of liver injury (injection with 100 mg/kg TAA) treated with vehicle or benurestat (administered 2 times a day for 3 days via oral gavage) and euthanized on day 4. (D) Quantification of ammonia concentrations in fecal and serum samples collected from vehicle or benurestat treated conventional mice induced with hyperammonemia. P values were calculated using an unpaired t test to determine the significance between the individual groups. (**, p < 0.01; ****, p < 0.0001; ns, not statistically significant).
Figure 4. Benurestat treatment rescues mice from a lethal dose of TAA. (A) Schematic of benurestat treatment in a mouse model of acute liver injury. 7-week old Swiss Webster conventional mice were treated with 200 mg/kg of TAA on day 0 and given either vehicle or 100 mg/kg of benurestat 2 times a day over 3 days. (B) Survival analysis of vehicle or benurestat treatment in a model of acute liver injury.
A Small-Molecule Inhibitor of Gut Bacterial Urease Protects the Host from Liver Injury

January 2025 · 13 Reads · ACS Chemical Biology

Hyperammonemia is characterized by the accumulation of ammonia within the bloodstream upon liver injury. Left untreated, hyperammonemia contributes to conditions such as hepatic encephalopathy that have high rates of patient morbidity and mortality. Previous studies have identified gut bacterial urease, an enzyme that converts urea into ammonia, as a major contributor to systemic ammonia levels. Here, we demonstrate use of benurestat, a clinical candidate used against ureolytic organisms in encrusted uropathy, to inhibit urease activity in gut bacteria. Benurestat inhibits ammonia production by urease-encoding gut bacteria and is effective against individual microbes and complex gut microbiota. When administered to conventional mice with liver injury induced by thioacetamide exposure, benurestat reduced gut and serum ammonia levels and rescued 100% of mice from lethal acute liver injury. Overall, this study provides an important proof-of-concept for modulating host ammonia levels and microbiota-driven risks for hyperammonemia with gut microbiota-targeted small-molecule inhibitors.


Quantifying Metagenomic Strain Associations from Microbiomes with Anpan

January 2025 · 34 Reads

Genetic and genomic variation among microbial strains can dramatically influence their phenotypes and environmental impact, including on human health. However, inferential methods for quantifying these differences have been lacking. Strain-level metagenomic profiling data has several features that make traditional statistical methods challenging to use, including high dimensionality, extreme variation among samples, and complex phylogenetic relatedness. We present Anpan, a set of quantitative methods addressing three key challenges in microbiome strain epidemiology. First, adaptive filtering designed to interrogate microbial strain gene carriage is combined with linear models to identify strain-specific genetic elements associated with host health outcomes and other phenotypes. Second, phylogenetic generalized linear mixed models are used to characterize the association of sub-species lineages with such phenotypes. Finally, random effects models are used to identify pathways more likely to be retained or lost by outcome-associated strains. We validated our methods by simulation, showing that we achieve more accurate effect size estimation and a lower false positive rate compared to alternative methodologies. We then applied our methods to a dataset of 1,262 colorectal cancer patients, identifying functionally adaptive genes and strong phylogenetic effects associated with CRC status, sometimes complementing and sometimes extending known species-level microbiome CRC biomarkers. Anpan's methods have been implemented as a publicly available R library to support microbial community strain and genetic epidemiology in a variety of contexts, environments, and phenotypes.


Profiling lateral gene transfer events in the human microbiome using WAAFLE

January 2025 · 96 Reads · 3 Citations · Nature Microbiology

Lateral gene transfer (LGT), also known as horizontal gene transfer, facilitates genomic diversification in microbial populations. While previous work has surveyed LGT in human-associated microbial isolate genomes, the landscape of LGT arising in personal microbiomes is not well understood, as there are no widely adopted methods to characterize LGT from complex communities. Here we developed, benchmarked and validated a computational algorithm (WAAFLE or Workflow to Annotate Assemblies and Find LGT Events) to profile LGT from assembled metagenomes. WAAFLE prioritizes specificity while maintaining high sensitivity for intergenus LGT. Applying WAAFLE to >2,000 human metagenomes from diverse body sites, we identified >100,000 high-confidence previously uncharacterized LGT (~2 per microbial genome-equivalent). These were enriched for mobile elements, as well as restriction–modification functions associated with the destruction of foreign DNA. LGT frequency was influenced by biogeography, phylogenetic similarity of involved pairs (for example, Fusobacterium periodonticum and F. nucleatum) and donor abundance. These forces manifest as networks in which hub taxa donate unequally with phylogenetic neighbours. Our findings suggest that human microbiome LGT may be more ubiquitous than previously described.


Microbiome and metabolomic profiles in response to starch/fiber food intervention in healthy dogs. (a) Design of the dietary fiber intervention study: 18 dogs were fed 12 foods for a period of 7 days each in a single random order (details in Table S2). Foods are listed in the order they were fed. Fecal samples were collected on the 7th day of each period for metabolomic and metagenomic profiling. Based on fiber and starch content, foods were classified into three groups: HSLF, MSMF, and LSHF. (b) Composition of foods based on fiber types, starch, and NFE. Numbers indicate percentages of macronutrients in each food. Foods are listed according to their starch content. (c) Bray-Curtis principal coordinate analysis of microbiomes and metabolomes following consumption of the control food (HSLF_Con_1). (d) Relative abundances of microbial families in the first sample (HSLF_Con_1) across dogs. Species-level profiles are shown in Fig. S1, and detailed taxonomic profiles (i.e., relative abundances measured using MetaPhlAn) are provided in Table S3. (e) Chemical subclasses in the HSLF_Con_1 fecal metabolomes of dogs and those of healthy humans for comparison from the PRISM cohort (n = 34). (f) Procrustes analysis of microbiomes and metabolomes following consumption of HSLF_Con_1. Points from the same dog are connected. ADS, adsorbant; BR, brewers’ rice; Con, control; COV, coefficient of variation; FR, soluble/insoluble fiber ratio; HSLF, high starch, low fiber; IFOS, inulin, fructooligosaccharides; LSHF, low starch, high fiber; MDS, multidimensional scaling; MSMF, medium starch, medium fiber; NFE, nitrogen-free extract; PC, principal component; PP, plant protein; Sol, highly soluble fiber; TDF, total dietary fiber.
Effect of specific foods on the canine gut microbiome and fecal metabolome. (a) Comparison of the intra-subject BC distances among microbiome taxonomic profiles and metabolomes of the subject population in response to each food in comparison with the control food, i.e., HSLF_Con_1 (initial), and with the preceding food. n = 216 for each box. Metabolomes but not microbiomes corresponding to HSLF foods are more similar to those of HSLF_Con_1. Similarly, metabolomes in response to a food are more similar to the metabolome of the preceding food provided both foods are in the same food group. (b) The microbiomes and metabolomes of all subjects in response to each food were compared (inter-subject, as opposed to the intra-subject data in panel a) to identify foods that induce more similar microbiomes/metabolomes versus those that are more heterogeneous. Overall, metabolomes in response to the same food are more similar than the corresponding microbiomes. n = 216 for each box. (c) Relationship between metabolome and microbiome BC distances stratified by food. Pearson correlation coefficients and P values are shown. For 8 of the 12 foods, microbiome dissimilarities translate into metabolome dissimilarities. (d) BC univariate PERMANOVA showing components of overall compositional differences in both metabolomes and microbiomes (***P < 0.001 and **P < 0.01). Food, starch, and insoluble fiber explain the most variance in microbiomes and more so, in metabolomes. (e) BC distance-based principal coordinate analysis of microbiomes and metabolomes in response to the test foods shows that metabolomes corresponding to the same food group tend to be more similar than microbiomes. Shades of red, green, and blue represent the three food groups in all panels.
ADS, adsorbant; BC, Bray-Curtis; BR, brewers’ rice; Con, control; FR, soluble/insoluble fiber ratio; HSLF, high starch, low fiber; IFOS, inulin, fructooligosaccharides; LSHF, low starch, high fiber; MBW, metabolic body weight; MSMF, medium starch, medium fiber; NFE, nitrogen-free extract; PC, principal coordinate; PERMANOVA, permutational multivariate analysis of variance; PP, plant protein; Sol, highly soluble fiber; TDF, total dietary fiber.
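The Bray-Curtis (BC) distance used throughout the caption above can be written out in a few lines. This is a minimal sketch of the standard definition, not the software used in the study (which the caption does not name); the function name `bray_curtis` and the toy profiles are illustrative assumptions.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two non-negative abundance profiles.

    Defined as sum(|u_i - v_i|) / sum(u_i + v_i); 0 means identical profiles,
    1 means the profiles share no features.
    """
    if len(u) != len(v):
        raise ValueError("profiles must have the same number of features")
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    # Two all-zero profiles are treated as identical (an assumption of this sketch).
    return numerator / denominator if denominator else 0.0


if __name__ == "__main__":
    # Toy relative-abundance profiles for the same subject on two foods.
    food_a = [0.5, 0.5, 0.0]
    food_b = [0.25, 0.25, 0.5]
    print(bray_curtis(food_a, food_b))
```

Intra-subject comparisons as in panel a would apply this to pairs of profiles from the same dog, while inter-subject comparisons as in panel b would apply it to pairs of dogs on the same food.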
Associations between canine gut microbial and metabolic features and dietary macronutrients. (a) Significant associations between gut microbial and metabolic features and dietary macronutrients determined using univariate linear models, with macronutrient intake as the fixed effect and subject as the random effect. A filtered list is shown for species (q value < 0.05, coefficient > 50th percentile), enzymes (q value < 0.05, coefficient > 50th percentile that are associated with ≥3 macronutrients), fatty acids (q value < 0.05), and metabolites (q value < 0.05, coefficient > 50th percentile that are associated with ≥7 macronutrients). Complete results are in Table S7. (***q < 0.001, **q < 0.01, and *q < 0.05). (b) Log10 (relative abundance) of fiber-responsive bacterial species of Bacteroides, Prevotella, and Butyricicoccus following consumption of the HSLF, MSMF, and LSHF foods. (c) Log10 (RPKM) of enzymes involved in SCFA synthesis (butyrate: EC 2.7.1.45, EC 4.2.1.55, and EC 6.2.1.16; propionate: EC 4.1.1.41) and pectin degradation (EC 3.1.1.11 and EC 3.2.1.82) following consumption of the HSLF, MSMF, and LSHF foods. Enzyme names as in panel a. (d) Total SCFA and individual SCFAs are correlated with total dietary fiber intake (g/MBW). Total SCFA: R = 0.31, P < 0.001; acetic acid: R = 0.33, P < 0.001; propionic acid: R = 0.27, P < 0.001; butyric acid: R = 0.091, P = 0.17. CAG, co-abundant gene; EC, Enzyme Commission; HSLF, high starch, low fiber; LSHF, low starch, high fiber; MBW, metabolic body weight; MSMF, medium starch, medium fiber; RPKM, reads per kilobase of exon per million reads mapped; SCFA, short-chain fatty acid.
Heterogeneity in microbial responses to dietary fiber. (a) Microbial species (n = 16) differentially abundant among food groups were identified by mixed-effect linear models and comparison of EMMs. White circles indicate species (n = 8) enriched in only one group. (b) Association between the SCFAs butyric acid/propionic acid and insoluble fiber, stratified by food groups. SCFAs are significantly associated with insoluble fiber intake only in LSHF foods. (c) Association between beneficial microbial metabolites N-acetylputrescine and butyric acid across food groups. (d) Association between products of protein degradation (indole and total BCFA) across food groups. CAG, co-abundant gene; HSLF, high starch, low fiber; LSHF, low starch, high fiber; MSMF, medium starch, medium fiber; NSC, normalized spectral counts; SCFA, short-chain fatty acid.
Metabolome and especially microbiome responses to food are highly individualized even in a relatively homogeneous companion animal population. (a, left) Subject-stratified distribution of q values for associations of insoluble fiber, soluble fiber, total dietary fiber, and starch (jointly) with microbial (n = 37) and metabolic features (n = 100; top 25 significantly associated with each macronutrient) previously identified to be significantly associated at the population level (see Fig. 3a). For subject labels, green denotes responders, pink non-responders, and gray neither. (Right) Percentage of species and metabolites significantly associated with fiber or species in each dog. (b) Consistency of significant associations between species/metabolites and fiber across animals. (c) Distribution of pairwise BC distances within and between response categories. ADS, adsorbant; BC, Bray-Curtis; BR, brewers’ rice; CAG, co-abundant gene; Con, control; FR, soluble/insoluble fiber ratio; HSLF, high starch, low fiber; IFOS, inulin, fructooligosaccharides; LSHF, low starch, high fiber; MSMF, medium starch, medium fiber; NRNR, comparison between non-responders; PP, plant protein; RNR, comparison between a responder and non-responder; RR, comparison between responders; Sol, highly soluble fiber; TDF, total dietary fiber.
Response of the gut microbiome and metabolome to dietary fiber in healthy dogs

December 2024 · 58 Reads · 1 Citation

Dietary fiber confers multiple health benefits originating from the expansion of beneficial gut microbial activity. However, very few studies have established the metabolic consequences of interactions among specific fibers, microbiome composition, and function in either human or representative animal models. In a study design reflective of realistic population dietary variation, fecal metagenomic and metabolomic profiles were analyzed from healthy dogs fed 12 test foods containing different fiber sources and quantities (5–13% as-fed basis). Taxa and functions were identified whose abundances were associated either with overall fiber intake or with specific fiber compositions. Fourteen microbial species were significantly enriched in response to ≥1 specific fiber source; enrichment of fiber-derived metabolites was more pronounced in response to these fiber sources. Positively associated fecal metabolites, including short-chain fatty acids, acylglycerols, fiber bound sugars, and polyphenols, co-occurred with microbes enriched in specific food groups. Critically, the specific metabolite pools responsive to differential fiber intake were dependent on differences both in individual microbial community membership and in overall ecological configuration. This helps to explain, for the first time, differences in microbiome-diet associations observed in companion animal epidemiology. Thus, our study corroborates findings in human cohorts and reinforces the role of personalized microbiomes even in seemingly phenotypically homogeneous subjects.

IMPORTANCE Consumption of dietary fiber changes the composition of the gut microbiome and, to a larger extent, the associated metabolites. Production of health-relevant metabolites such as short-chain fatty acids from fiber depends both on the consumption of a specific fiber and on the enrichment of beneficial metabolite-producing species in response to it. Even in a seemingly homogeneous population, the benefit received from fiber consumption is personalized and emphasizes specific fiber-microbe-host interactions. These observations are relevant for both population-wide and personalized nutrition applications.


MaAsLin 3: Refining and extending generalized multivariable linear models for meta-omic association discovery

December 2024 · 89 Reads

A key question in microbial community analysis is determining which microbial features are associated with community properties such as environmental or health phenotypes. This statistical task is impeded by characteristics of typical microbial community profiling technologies, including sparsity (which can be either technical or biological) and the compositionality imposed by most nucleotide sequencing approaches. Many models have been proposed that focus on how the relative abundance of a feature (e.g. taxon or pathway) relates to one or more covariates. Few of these, however, simultaneously control false discovery rates, achieve reasonable power, incorporate complex modeling terms such as random effects, and also permit assessment of prevalence (presence/absence) associations and absolute abundance associations (when appropriate measurements are available, e.g. qPCR or spike-ins). Here, we introduce MaAsLin 3 (Microbiome Multivariable Associations with Linear Models), a modeling framework that simultaneously identifies both abundance and prevalence relationships in microbiome studies with modern, potentially complex designs. MaAsLin 3 also newly accounts for compositionality with experimental (spike-ins and total microbial load estimation) or computational techniques, and it expands the space of biological hypotheses that can be tested with inference for new covariate types. On a variety of synthetic and real datasets, MaAsLin 3 outperformed current state-of-the-art differential abundance methods in testing and inferring associations from compositional data. When applied to the Inflammatory Bowel Disease Multi-omics Database, MaAsLin 3 corroborated many previously reported microbial associations with the inflammatory bowel diseases, but notably 77% of associations were with feature prevalence rather than abundance. In summary, MaAsLin 3 enables researchers to identify microbiome associations with higher accuracy and more specific association types, especially in complex datasets with multiple covariates and repeated measures.


Consistent global links between coffee consumption and the human gut microbiome
a, Five UK and/or US PREDICT cohorts (n = 975, 11,798, 8,470, 1,098 and 12,353), the MBS and the MLVS (n = 213 and n = 307, respectively) were used to assess diet–microbiome relationships (total n = 35,214). For later comparisons of microbiome distributions across different populations, we retrieved n = 18,984 metagenomic samples from public sources, including healthy adult individuals, newborns, non-Westernized (non-West.) individuals, ancient samples and non-human primates (NHP). P1, PREDICT1; P2, PREDICT2; P3, PREDICT3. b, We combined faecal metagenomics (n = 54,198), faecal metatranscriptomics (n = 364) and plasma metabolomics (n = 438), with the latter two from the MBS and MLVS cohorts. FFQs surveyed nutritional habits of the participants from four PREDICT cohorts, MBS and MLVS (n = 22,867 after removing individuals above the 99th percentile of coffee intake in the PREDICT cohorts as outliers). Participants were categorized as ‘high’, ‘moderate’ and ‘never’ coffee drinkers as previously established²⁵. c, Median Spearman’s correlation and median AUCs from a random forest regressor and a random forest classifier trained on the microbiome composition estimated by MetaPhlAn 4 (ref. ³⁰). d, The number of never (light green), moderate (dark cyan) and high-coffee drinkers (brown). e, ROC and AUC of random forest classifiers discriminating participants between pairs of the three coffee drinker classes, assessed in a tenfold, ten times repeated cross-validations (CV) that benefited from the other cohorts during the training phase as in the leave-one-dataset-out approach (LODO; Methods). The shaded areas represent the 95% confidence intervals (CIs) of a linear interpolation over all the folds of the test. Machine learning results using either only a CV or a LODO approach are reported in Extended Data Fig. 2a,b.
L. asaccharolyticus drives the association between the gut microbiome and coffee intake
a, The top ten SGBs from a meta-analysis of partial correlations between SGB-ranked abundances and total per-individual coffee intake considering the five cohorts analysed in this study (q < 0.001). The black markers show the per-cohort partial correlations and the light blue markers indicate the average Spearman’s correlations adjusted by sex, age and BMI. b, The same SGBs are meta-analysed with Spearman’s partial correlations (par. corr.) between SGB abundances and decaffeinated (decaf.) coffee intake in the PREDICT1 and PREDICT3 UK22A cohorts, excluding individuals who consumed caffeinated coffee only (n = 262 and 4,055). The black markers show the per-cohort correlations and dark blue symbols refer to average correlations adjusted by sex, age, BMI and caffeinated coffee. c, The prevalence of the ten SGBs in the five cohorts analysed. d, The prevalence of L. asaccharolyticus across never, moderate and high coffee drinkers and nine US regions in the PREDICT2 and PREDICT3 US22A cohorts (n = 9,210).
L. asaccharolyticus is highly prevalent with about fourfold higher average abundance in coffee drinkers, and its growth is stimulated by coffee supplementation in vitro
a, The relative abundance of L. asaccharolyticus in each cohort by coffee consumption category (never, moderate or high). The boxes represent the median and interquartile range (IQR) of the distributions, and top and bottom whiskers mark the point at 1.5 IQR. The median fold change of the high versus never comparison is reported on top if post hoc Dunn q < 0.01, and median fold change (FC) of the other two comparisons are reported on the top of each combination. n.s. (not significant) refers to post hoc Dunn q > 0.01. Total sample sizes are presented in Extended Data Fig. 1. b, L. asaccharolyticus growth on agar plates supplemented with increasing concentrations of coffee and measured by plate count (c.f.u. per ml). P values refer to one-sample t-tests compared with the control (ctrl) experiment value. c–e, Bacterial growth of L. asaccharolyticus (c), E. coli (d) and B. fragilis (e) in liquid medium supplemented with increasing coffee concentrations and measured by changes in optical density (OD650). Percentage growth is relative to the culture medium control not supplemented with coffee (100%). Absolute OD650 values are reported in Supplementary Tables 15 and 16. The bars and error lines indicate the mean ± s.d. of five technical replicates, except for E. coli control (n = 3 and n = 4) and B. fragilis instant 5 g l⁻¹ (n = 4). The minus and plus signs refer to significant tests (Dunnett q < 0.01) that overcome specific thresholds of fold increase (incr.) or decrease (decr.).
L. asaccharolyticus is ubiquitous in modern, Westernized, adult populations and almost absent elsewhere
a, The prevalence of L. asaccharolyticus in 11 different types of host (219 subpopulations, N = 54,198) including children and adults; healthy and diseased participants; from Westernized and non-Westernized communities; non-human primates and ancient samples, compared with the ZOE PREDICT and MBS–MLVS cohorts. Human, modern samples and participant records were obtained from a development version of curatedMetagenomicData³⁵ (Supplementary Table 17). b, The per capita coffee consumption (kg per year, estimated by https://worldpopulationreview.com) for 25 countries (AUT, Austria; CHE, Switzerland; DEU, Germany; DNK, Denmark; ESP, Spain; FIN, Finland; FRA, France; GBR, UK; IRL, Ireland; ITA, Italy; LUX, Luxembourg; NLD, Netherlands; SWE, Sweden; CHN, China; IND, India; ISR, Israel; JPN, Japan; KAZ, Kazakhstan; KOR, Korea; MNG, Mongolia; MYS, Malaysia; ARG, Argentina; CAN, Canada; AUS, Australia) correlates with the prevalence of L. asaccharolyticus in healthy and diseased populations. The shaded areas around the regression line represent the 95% confidence interval estimated by bootstrapping.
Unannotated metabolites covarying with quinic acid are associated with L. asaccharolyticus
a, The correlation of coffee intake versus abundances of six known coffee metabolites in plasma metabolomics samples from the MLVS (blue) and MBS (red). The highest rank correlation is reported in each plot. Three metabolites were not measured in MBS. b, Left, a heat map showing standardized abundances of the 14 unannotated and 8 previously annotated metabolites in the MLVS cohort (n = 307) with the highest MACARRoN priority score with respect to the presence of L. asaccharolyticus. QA, quinic acid; Trig, trigonelline. Right, MACARRoN priority scores. Samples are reported by coffee intake category. c, The log2-transformed abundances of quinic acid and the top six quinic acid-correlated unannotated metabolites according to L. asaccharolyticus relative abundance (RA) categories (absent, RA <0.01%; low, 0.1%> RA ≥0.01%; high, RA >0.1%) in 190 coffee drinkers. The boxes represent the median and IQR of the distributions, and top and bottom whiskers mark the point at 1.5 IQR.
Coffee consumption is associated with intestinal Lawsonibacter asaccharolyticus abundance and prevalence across multiple cohorts

November 2024 · 530 Reads · 2 Citations

Nature Microbiology

Although diet is a substantial determinant of the human gut microbiome, the interplay between specific foods and microbial community structure remains poorly understood. Coffee is a habitually consumed beverage with established metabolic and health benefits. We previously found that coffee is, among >150 items, the food showing the highest correlation with microbiome components. Here we conducted a multi-cohort, multi-omic analysis of US and UK populations with detailed dietary information from a total of 22,867 participants, which we then integrated with public data from 211 cohorts (N = 54,198). The link between coffee consumption and microbiome was highly reproducible across different populations (area under the curve of 0.89), largely driven by the presence and abundance of the species Lawsonibacter asaccharolyticus. Using in vitro experiments, we show that coffee can stimulate growth of L. asaccharolyticus. Plasma metabolomics on 438 samples identified several metabolites enriched among coffee consumers, with quinic acid and its potential derivatives associated with coffee and L. asaccharolyticus. This study reveals a metabolic link between a specific gut microorganism and a specific food item, providing a framework for the understanding of microbial dietary responses at the biochemical level.


Figure 3: Methods assign different profiles in real environmental data. A. Assigned taxonomic profiles depended heavily on
Figure 4: Inferred community structure varies by method. A. Methods agreed poorly on sample alpha diversity (inverse
Figure 5: Methods show improved agreement in identifying important microbial associations. A. PERMANOVA consistency
Evaluating metagenomic analyses for undercharacterized environments: what's needed to light up the microbial dark matter?

November 2024 · 118 Reads

Non-human-associated microbial communities play important biological roles, but they remain less understood than human-associated communities. Here, we assess the impact of key environmental sample properties on a variety of state-of-the-art metagenomic analysis methods. In simulated datasets, all methods performed similarly at high taxonomic ranks, but newer marker-based methods incorporating metagenomic assembled genomes outperformed others at lower taxonomic levels. In real environmental data, taxonomic profiles assigned to the same sample by different methods showed little agreement at lower taxonomic levels, but the methods agreed better on community diversity estimates and estimates of the relationships between environmental parameters and microbial profiles.




Citations (62)


... com/biobakery/waafle) from metagenomic data. WAAFLE identifies potential HGT events in metagenomes by aligning metagenomic contigs with microbial reference sequences 36. Initially, we processed approximately 839 GB of raw metagenomic data from 70 soil samples. ...

Reference:

Gene horizontal transfers and functional diversity negatively correlated with bacterial taxonomic diversity along a nitrogen gradient
Profiling lateral gene transfer events in the human microbiome using WAAFLE

Nature Microbiology

... Beyond sequencing, other nascent methods have the potential to more directly ascertain the functional interactions of skin microbes with their hosts (14). Metabolomics and metaproteomics would interrogate the small molecules and proteins produced and modified by both host and microbial cells that may modulate niche colonization, immune responses, and tissue remodeling (15,16). ...

A host–microbiota interactome reveals extensive transkingdom connectivity

Nature

... With other tools such as Centrifuge or Kraken2, we get information on the number of reads, but no information on the distribution of reads across the genome [83,84]. While MetaPhlAn is a very accurate tool for profiling the composition of microbial communities (bacteria, archaea and eukaryotes) from metagenomic sequencing data, it still has difficulties in classifying viruses [85]. In this context, MetaAll represents the approach that maximizes sensitivity (in the clinical cases presented and the CAMI dataset, MetaAll had a sensitivity of 100%) while attempting to retain specificity through method triangulation, qualitative assessment and expert input. ...

Discovering and exploring the hidden diversity of human gut viruses using highly enriched virome samples

...
  • BugSigDB: a community-editable database of manually curated microbial signatures from published differential abundance studies [13]
  • GMrepo v2: a curated human gut microbiome database with special focus on disease markers and cross-dataset comparison [14]
  • gutMGene: a comprehensive database for target genes of gut microbes and microbial metabolites [15]
  • mBodyMap: a curated database for microbes across human body and their associations with health and diseases [16]
  • MiMeDB: the Human Microbial Metabolome Database [17]
For each source above, taxon sets were downloaded, taxonomic identifiers converted to NCBI taxonomic IDs by querying the NCBI Entrez API, made non-redundant and transformed into a named list of taxon sets in R where each element is a named set and the members NCBI taxonomic IDs. As MiMeDB does not provide API access or bulk downloads, a manually curated subset of taxon sets was downloaded. ...

BugSigDB captures patterns of differential abundance across a broad range of host-associated microbial signatures

Nature Biotechnology

... Upon entering the human body, they can cause a range of health issues, including autoimmune disorders, inflammation, inflammatory bowel disease, obesity, and carcinogenesis [47]. Additionally, there is a risk of horizontal gene transfer of foreign genes and mobile genetic elements within the human microbiota, posing further health risks [48]. ...

Profiling novel lateral gene transfer events in the human microbiome

... In addition, DNA-based analyses do not differentiate between viable and dead cells [26,27], which is particularly relevant for post-sanitation surfaces. Metagenomic sequencing is additionally constrained by the limitation of current reference databases and by contamination in low-biomass samples [28], although these constraints have been partially addressed by using appropriate sequencing depth, controls, and suitable protocols, e.g., by propidium monoazide treatment or by sequencing of RNA rather than DNA [8,29,30]. Culture-based methods used to identify bacteria in food processing facilities focused on foodborne pathogens and employed selective media to enumerate or isolate L. monocytogenes, E. coli O157:H7, and Salmonella [22,31]. ...

RNA-based amplicon sequencing is ineffective in measuring metabolic activity in environmental microbial communities

... Intermittent antibiotic therapy is rarely prescribed in medicine; exceptions are pulmonary disease due to atypical or typical mycobacteria [28] and bronchiectasis as prophylaxis of acute exacerbation [29]. Moreover, recent laboratory experiments suggest caution in using intermittent antibiotic treatment due to the risk of favouring the rapid evolution of antimicrobial resistance [30,31]. The emergence of antibiotic-resistant bacteria is a problem mainly related to incorrect use of antibiotics and is more often seen in upper and lower respiratory tract infections with a viral etiology, where antibiotics should not be used at all [32]. ...

Pulsed antibiotic treatments of gnotobiotic mice manifest in complex bacterial community dynamics and resistance effects
  • Citing Article
  • June 2023

Cell Host & Microbe

... Traditional treatment for IBD usually includes medications, nutritional interventions, and surgical procedures. Commonly used medications include 5-aminosalicylic acid (5-ASA) [4], corticosteroids, immunosuppressants (such as azathioprine and methotrexate), and biologics (such as antitumor necrosis factor-α antibodies) [5]. Although these treatments can relieve symptoms and control inflammation, they often fail to fundamentally solve IBD, and long-term use may lead to side effects and drug resistance [6]. ...

Gut microbial metabolism of 5-ASA diminishes its clinical efficacy in inflammatory bowel disease

Nature Medicine

... For read-based analysis, high-quality microbial reads were used for species-level community profiling, with relative abundances determined using MetaPhlAn4 v4.0.6 [66], applying against the mpa_vOct22_CHOCOPhlAnSGB_202212 database under default parameters to capture all taxonomic levels. Additionally, functional potential profiling is performed using HUMAnN3 v3.7 [23]. ...

Extending and improving metagenomic taxonomic profiling with uncharacterized species using MetaPhlAn 4

Nature Biotechnology

... The idea of collecting information about bacteria from multiple experiments is not new, and several microbial repositories and databases currently exist, e.g. Qiita (20), redbiom (21), MGnify (22), GutMDisorder (23), BugSigDB (24) and Disbiome database (25). However, dbBact provides a combination of several key aspects that to the best of our knowledge are not available together in any other resource: (a) Manual annotation: genotype-phenotype associations for each study are manually curated by human experts that understand the experimental setting and therefore can identify bacteria related to the different phenotypic groups. ...

BugSigDB: accelerating microbiome research through systematic comparison to published microbial signatures