Examples of CNN image inputs generated from OTU tables. A The image is filled with species abundances (left) or presences (right). B For a single sample, the phylogenetic tree is constructed, populated with species abundances, and rearranged into a matrix.


Source publication
Article
The many microbial communities around us form interactive and dynamic ecosystems called microbiomes. Though concealed from the naked eye, microbiomes govern and influence macroscopic systems including human health, plant resilience, and biogeochemical cycling. Such feats have attracted interest from the scientific community, which has recently turn...

Contexts in source publication

Context 1
... et al. [51,52] rendered an OTU table into an image by reshaping each sample into a square, where each pixel was colored based on the abundance or presence of microbial taxa (Fig. 1A). taxoNN rearranges an OTU table based on its inherent phylogenetic information [53], whereas PopPhy-CNN [54,55] populates a phylogenetic tree with OTU abundances, and then transforms the tree into a two-dimensional matrix (Fig. 1B). Generally, these approaches have outperformed their benchmarks (both traditional ML methods and FCNNs) ...
Context 2
... each sample into a square, where each pixel was colored based on the abundance or presence of microbial taxa (Fig. 1A). taxoNN rearranges an OTU table based on its inherent phylogenetic information [53], whereas PopPhy-CNN [54,55] populates a phylogenetic tree with OTU abundances, and then transforms the tree into a two-dimensional matrix (Fig. 1B). Generally, these approaches have outperformed their benchmarks (both traditional ML methods and FCNNs) in the task of host phenotype ...
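The reshaping idea described in these contexts (Fig. 1A) can be sketched roughly as follows. This is a minimal illustration, not the cited authors' implementation: the function name `sample_to_image`, the zero-padding of unused pixels, and the row-major fill order are all assumptions made for the example.

```python
import math

import numpy as np


def sample_to_image(abundances, presence_only=False):
    """Reshape one sample's OTU vector into a square 2D array (hypothetical helper).

    Pixels beyond the number of taxa are zero-padded; this padding choice is an
    assumption, not something stated in the source.
    """
    values = np.asarray(abundances, dtype=float)
    if presence_only:
        # Presence/absence image: 1.0 where the taxon was observed, else 0.0.
        values = (values > 0).astype(float)
    # Smallest square side that can hold every taxon.
    side = math.ceil(math.sqrt(values.size))
    padded = np.zeros(side * side, dtype=float)
    padded[: values.size] = values
    return padded.reshape(side, side)


# Toy sample with 7 taxa (made-up counts).
sample = [5, 0, 12, 3, 0, 7, 1]
abundance_image = sample_to_image(sample)
presence_image = sample_to_image(sample, presence_only=True)
```

With 7 taxa the result is a 3 x 3 image whose last two pixels are padding. The tree-based layout of Fig. 1B is not covered by this sketch.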

Citations

... Additionally, advances in machine learning approaches can enable integration of complex variables, such as longitudinal data, into microbiome models. Medina et al. reviewed machine learning applications in microbiome research, emphasizing its potential to uncover patterns and interactions that traditional models may overlook [79]. ...
Article
Purpose of Review: This review explores the application of classical ecological theory to host-associated microbiomes during initial colonization, maintenance, and recovery. We discuss unique challenges of applying these theories to host-associated microbiomes and host factors to consider going forward. Recent Findings: Recent studies applying community ecology principles to host microbiomes continue to demonstrate a role for both selective and stochastic processes in shaping host-associated microbiomes. However, ecological frameworks developed to describe dynamics during homeostasis do not necessarily apply during diseased or highly perturbed states, where large variations can potentially lead to alternate stable states. Summary: Despite providing valuable insights, the application of ecological theories to host-associated microbiomes has some unique challenges. The integration of host-specific factors, such as genotype or immune dynamics, in ecological models or frameworks is crucial for understanding host microbiome assembly and stability, which could improve our ability to predict microbiome outcomes and improve host health.
... Sugar alcohols, such as inositol, are significant in root exudates, where they function as both carbon sources and signaling molecules. Inositol promotes several microbiological processes, including bacterial chemotaxis and biofilm formation (Medina and Kutuzova, 2022). Its transport and secretion in plants are regulated by specific transporters, and its metabolism can boost bacterial colonization competence, emphasizing its functional significance in the rhizosphere (Nazir et al., 2014). ...
... The rhizosphere, the soil region directly influenced by plant roots, is a dynamic ecosystem teeming with microbial organisms that play a crucial role in plant growth and resilience. Effective management of this microbial community holds immense potential for improving plant adaptability and achieving sustainable agricultural practices (He et al., 2010;Saeed et al., 2021;Medina and Kutuzova, 2022). Table 2 summarizes a variety of microbes and crops, illustrating how rhizosphere engineering can harness indigenous plant microbes and foster beneficial plant-microbe interactions (Yang et al., 2018). ...
Article
The rhizosphere, a dynamic and biologically active zone where plant roots interface with soil, plays a pivotal role in enhancing plant health, resilience, and stress tolerance. It is increasingly regarded as central to achieving Sustainable Development Goal 2 by fostering sustainable agricultural productivity. Engineering the rhizosphere microbiome has emerged as a transformative approach to promoting plant growth, improving stress adaptation, and restoring soil health while mitigating the adverse impacts of conventional farming practices. Advancements in omics technologies, sequencing tools, and synthetic microbial communities (SynComs) have shed light on the intricate plant-microbe interactions that regulate nutrient cycling, suppress diseases, and alleviate environmental stresses. Root exudates comprising organic acids, amino acids, sugars, and secondary metabolites act as biochemical cues that attract and shape beneficial microbial communities in the rhizosphere. This review highlights the potential of tailored SynComs to enhance plant resilience against abiotic stresses (e.g., drought, salinity) and biotic challenges (e.g., pathogens, pests). It further explores how advanced omics techniques, including metagenomics and metabolomics, decipher the mechanisms by which root exudates influence microbial communities and plant health. By integrating multi-disciplinary approaches and optimizing root exudate profiles, ecological engineering of plant-microbiome interactions offers a sustainable pathway for boosting crop productivity, managing soil-borne diseases, and reducing dependence on chemical inputs. These innovative strategies align with Sustainable Development Goal, contributing to global food security, long-term agricultural productivity, and ecological sustainability while preserving soil and plant health for future generations.
... Bacteria and archaea are often heavily underrepresented in deep learning models trained on genetic data (Dalla-Torre et al. 2023). While modeling human genetic diversity has many direct implications for human health (Sapoval et al. 2022;Clapp et al. 2017), developing models that incorporate the vast genetic diversity across the microbial tree of life may lead to similar benefits, such as the development of novel microbiome therapeutics, inferring the health benefits of microbe-produced metabolites, and predicting the evolution of antibiotic resistance (Hernández Medina et al. 2022). Unlike the relatively static nature of the human genome, the microbiome is highly dynamic, adapting to environmental changes and interactions with its host or environment (Lloyd-Price et al. 2017;Ducarmon et al. 2023). ...
... For example, Traitar (Weimann et al. 2016a) uses support vector machines with a sparsity penalty to predict phenotypes based on Pfam annotations (Mistry et al. 2021). Those features can be aggregated over large collections of genes to use as input for machine learning methods (Weimann et al. 2016b;Barash et al. 2018;Wheeler, Gardner, and Barquist 2018;Hernández Medina et al. 2022;D'Elia et al. 2023). A different approach is to ignore gene-level information and directly work on taxonomic compositional count data (Li 2015;Calle 2019;Knight et al. 2018;Zhou and Gallins 2019;Huang et al. 2023). ...
Article
Leveraging the vast genetic diversity within microbiomes offers unparalleled insights into complex phenotypes, yet the task of accurately predicting and understanding such traits from genomic data remains challenging. We propose a framework taking advantage of existing large models for gene vectorization to predict habitat specificity from entire microbial genome sequences. Based on our model, we develop attribution techniques to elucidate gene interaction effects that drive microbial adaptation to diverse environments. We train and validate our approach on a large dataset of high quality microbiome genomes from different habitats. We not only demonstrate solid predictive performance, but also how sequence-level information of entire genomes allows us to identify gene associations underlying complex phenotypes. Our attribution recovers known important interaction networks and proposes new candidates for experimental follow up.
... Researching and adhering to the data use policies and citing data from repositories properly can save time during revisions and can foster trust and incentives for those that report hesitancy with data sharing. Lastly, we anticipate newer tools like incorporating machine learning or artificial intelligence within repositories will also help to address challenges in data quality control for both raw and processed data (reviewed in Hernández Medina et al., 2022;Kumar et al., 2024). ...
Article
Microbiome research is becoming a mature field with a wealth of data amassed from diverse ecosystems, yet the ability to fully leverage multi-omics data for reuse remains challenging. To provide a view into researchers’ behavior and attitudes towards data reuse, we surveyed over 700 microbiome researchers to evaluate data sharing and reuse challenges. We found that many researchers are impeded by difficulties with metadata records, challenges with processing and bioinformatics, and problems with data repository submissions. We also explored the cost constraints of data reuse at each step of the data reuse process to better understand “pain points” and to provide a more quantitative perspective from sixteen active researchers. The bioinformatics and data processing step was estimated to be the most time consuming, which aligns with some of the most frequently reported challenges from the community survey. From these two approaches, we present evidence-based recommendations for how to address data sharing and reuse challenges with concrete actions for future work.
... Artificial intelligence (AI), particularly machine learning, has become an essential tool in metagenomics, playing a crucial role in identifying, classifying, and functionally annotating viral sequences (Hernandez Medina et al., 2022;Wani et al., 2022;Yan et al., 2024). AI-driven deep learning models and ensemble classifiers have been successfully employed for predicting virus-host interactions, reconstructing viral genomes, and improving classification accuracy (Yakimovich, 2021;Elste et al., 2024). ...
Article
Introduction: The human vaginal virome is an essential yet understudied component of the vaginal microbiome. Its diversity and potential contributions to health and disease, particularly vaginitis, remain poorly understood. Methods: We conducted metagenomic sequencing on 24 pooled vaginal swab libraries collected from 267 women, including both healthy individuals and those diagnosed with vaginitis. Viral community composition, diversity indices (Shannon, Richness, and Pielou), and phylogenetic characteristics were analyzed. Virus–host associations were also investigated. Results: DNA viruses dominated the vaginal virome. Anelloviridae and Papillomaviridae were the most prevalent eukaryotic viruses, while Siphoviridae and Microviridae were the leading bacteriophages. Compared to healthy controls, the vaginitis group exhibited significantly reduced alpha diversity and greater beta diversity dispersion, indicating altered viral community structure. Anelloviruses, detected in both groups, showed extensive lineage diversity, frequent recombination, and pronounced phylogenetic divergence. HPV diversity and richness were significantly elevated in the vaginitis group, alongside an unbalanced distribution of viral lineages. Novel phage–bacterial associations were also identified, suggesting a potential role for bacteriophages in shaping the vaginal microbiome. Discussion: These findings provide new insights into the composition and structure of the vaginal virome and its potential association with vaginal dysbiosis. The distinct virome characteristics observed in women with vaginitis highlight the relevance of viral communities in reproductive health. Future studies incorporating individual-level sequencing and metatranscriptomics are warranted to explore intra-host viral dynamics, assess viral activity, and clarify the functional roles of vaginal viruses in host–microbiome interactions.
... AI-driven bioinformatics pipelines can analyze vast metagenomic datasets, classify microbial taxa, and identify novel microbial strains with enhanced precision (43). Deep learning algorithms facilitate the functional annotation of genes and metabolic pathways, improving our understanding of microbial roles in health and disease (44). ...
... To investigate microbiome-trait associations, one primary focus has been on identifying predictive microbial markers for disease prediction from microbial samples [8]. Here, a microbial sample is typically characterized by its taxonomic profile, which includes the abundance of microbial taxa at certain taxonomic levels [9], such as species, genus, family, and so on. ...
... Another difficulty in analyzing microbiome data stems from its sparsity, with a substantial portion of data entries being zeros [8]. These zeros can indicate either the true absence of the taxa in the environmental sample (i.e., biological zeros) or the failure to detect the taxa due to low sequencing depth and sampling variation (i.e., technical zeros) [11]. ...
... b Each internal neuron in MIOSTONE has the capability to discern whether taxa within the corresponding taxonomic group provide a more effective explanation of the trait when assessed either holistically (i.e., additively) as a group or individually (i.e., non-linearly) as distinct taxa. c MIOSTONE establishes a versatile microbiome data analysis pipeline, applicable to a variety of tasks including disease status prediction, microbiome representation learning, microbiome-disease association identification, and enhancement of predictive performance in tasks with limited samples through knowledge transfer ... Machine (SVM), XGBoost, and multi-layer perceptron (MLP), are widely used for predicting disease status [8]. Notably, tree-aware methods, such as DeepBiome [34], Ph-CNN [35], PopPhy-CNN [28], TaxoNN [27], and MDeep [31], are specifically designed to leverage phylogenetic or taxonomic structures in microbial taxa to enhance disease prediction (refer to the "Baseline methods" and "Benchmark details" sections for more baseline details). ...
Article
The human microbiome, a complex ecosystem of microorganisms inhabiting the body, plays a critical role in human health. Investigating its association with host traits is essential for understanding its impact on various diseases. Although shotgun metagenomic sequencing technologies have produced vast amounts of microbiome data, analyzing such data is highly challenging due to its sparsity, noisiness, and high feature dimensionality. Here, we develop MIOSTONE, an accurate and interpretable neural network model for microbiome-disease association that simulates a real taxonomy by encoding the relationships among microbial features. The taxonomy-encoding architecture provides a natural bridge from variations in microbial taxa abundance to variations in traits, encompassing increasingly coarse scales from species to domains. MIOSTONE has the ability to determine whether taxa within the corresponding taxonomic group provide a better explanation in a data-driven manner. MIOSTONE serves as an effective predictive model, as it not only accurately predicts microbiome-trait associations across extensive simulated and real datasets but also offers interpretability for scientific discovery. Both attributes are crucial for facilitating in silico investigations into the biological mechanisms underlying such associations among microbial taxa.
... In recent years, machine learning (ML) methods (e.g., Linear Regression, Random Forest, and Support Vector Machines) and deep learning (DL) methods (e.g., Fully-Connected Neural Networks and Convolutional Neural Networks) have been widely applied in microbiome research, particularly for biomarker discovery [40][41][42]. ...
Article
Purpose: The ocular surface (OS) microbiome is influenced by various factors and impacts ocular health. Understanding its composition and dynamics is crucial for developing targeted interventions for ocular diseases. This study aims to identify host variables, including physiological, environmental, and lifestyle (PEL) factors, that influence the ocular microbiome composition and establish valid associations between the ocular microbiome and health outcomes. Methods: 16S rRNA gene sequencing was performed on OS samples collected from 135 healthy individuals using eSwab. DNA was extracted, libraries prepared, and PCR products purified and analyzed. PEL confounding factors were identified, and a cross-validation strategy using various bioinformatics methods, including machine learning, was used to identify features that classify microbial profiles. Results: Nationality, allergy, sport practice, and eyeglasses usage are significant PEL confounding factors influencing the eye microbiome. Alpha-diversity analysis revealed significant differences between Spanish and Italian subjects (p-value < 0.001), with a median Shannon index of 1.05 for Spanish subjects and 0.59 for Italian subjects. Additionally, 8 microbial genera were significantly associated with eyeglass usage. Beta-diversity analysis indicated significant differences in microbial community composition based on nationality, age, sport, and eyeglasses usage. Differential abundance analysis identified several microbial genera associated with these PEL factors. The Support Vector Machine (SVM) model for Nationality achieved an accuracy of 100%, with an AUC-ROC score of 1.0, indicating excellent performance in classifying microbial profiles. Conclusion: This study underscores the importance of considering PEL factors when studying the ocular microbiome. Our findings highlight the complex interplay between environmental, lifestyle, and demographic factors in shaping the OS microbiome. Future research should further explore these interactions to develop personalized approaches for managing ocular health.
... On the other hand, lightweight architectures like MobileNetV3-L have proven to be viable alternatives for energy-efficient applications [57]. The study also demonstrates how the integration of deep learning models with environmental data contributes to a broader understanding of fungal diversity [58,59]. However, the reliance of deep learning algorithms on large datasets and the requirement for high computational power pose challenges to their broader adoption [60,61]. ...
Article
Fungi play a critical role in ecosystems, contributing to biodiversity and providing economic and biotechnological value. In this study, we developed a novel deep learning-based framework for the classification of seven macrofungi species from the genera Mycena and Marasmius, leveraging their unique ecological and morphological characteristics. The proposed approach integrates a custom convolutional neural network (CNN) with a self-organizing map (SOM) adapted for supervised learning and a Kolmogorov–Arnold Network (KAN) layer to enhance classification performance. The experimental results demonstrate significant improvements in classification metrics when using the CNN-SOM and CNN-KAN architectures. Additionally, advanced pretrained models such as MaxViT-S and ResNetV2-50 achieved high accuracy rates, with MaxViT-S achieving 98.9% accuracy. Statistical analyses using the chi-square test confirmed the reliability of the results, emphasizing the importance of validating evaluation metrics statistically. This research represents the first application of SOM in fungal classification and highlights the potential of deep learning in advancing fungal taxonomy. Future work will focus on optimizing the KAN architecture and expanding the dataset to include more fungal classes, further enhancing classification accuracy and ecological understanding.
... Compositional tables are usually used to identify the relative abundances of specific species, but each sample contains a huge number of features, many of which are sparse in terms of numbers; furthermore, there are excessive zero counts [15]. Typically, the application of a prevalence percentage filter, the use of log-transformations, applying a staying-in-the-simplex approach, or using ratio calculations are the normal approaches to solving the above problems [16]. In spite of technological advances and the use of broad metadata collections in published studies, further analytical refinement, together with improved study design and increased sample size, is warranted in order to facilitate standardization of the methods used and the translation of study findings into useful clinical findings. ...
... The use of different ML algorithms during the analysis of the microbiome composition can also enhance microbial biomarker classification, phenotype prediction, possible host interactions and potential endogenous component interactions [12,15]. In this study, we adopt an integrative approach involving the use of ML together with differential abundance methods in order to investigate the composition and diversity of gut microbiota by analyzing full-length 16S rRNA gene sequencing. The fecal samples came from a relatively large cohort of diabetic patients with diverse levels of renal function, as well as from control subjects with normal renal function. ...
Article
Diabetic kidney disease (DKD) is a serious healthcare dilemma. Nonetheless, the interplay between the functional capacity of gut microbiota and their host remains elusive for DKD. This study aims to elucidate the functional capability of gut microbiota to affect kidney function of DKD patients. A total of 990 subjects were enrolled, consisting of a control group (n = 455), a type 2 diabetes mellitus group (DM, n = 204), a DKD group (n = 182) and a chronic kidney disease group (CKD, n = 149). Full-length sequencing of 16S rRNA genes from stool DNA was conducted. Three findings are pinpointed. Firstly, new types of microbiota biomarkers have been created using a machine-learning (ML) method, namely relative abundance of a microbe, presence or absence of a microbe, and the hierarchy ratio between two different taxonomies. Four different panels of features were selected to be analyzed: (i) DM vs. Control, (ii) DKD vs. DM, (iii) DKD vs. CKD, and (iv) CKD vs. Control. These had accuracy rates between 0.72 and 0.78 and areas under curve between 0.79 and 0.86. Secondly, 13 gut microbiota biomarkers, which are strongly correlated with anthropometric, metabolic and/or renal indexes and were concomitantly identified by the ML algorithm and the differential abundance method, were highly discriminatory. Finally, the predicted functional capability of a DKD-specific biomarker, Gemmiger spp., is enriched in carbohydrate metabolism and branched-chain amino acid (BCAA) biosynthesis. Coincidentally, the circulating levels of various BCAAs (L-valine, L-leucine and L-isoleucine) and their precursor, L-glutamate, are significantly increased in DM and DKD patients, which suggests that, when hyperglycemia is present, there have been alterations in various interconnected pathways associated with glycolysis, pyruvate fermentation and BCAA biosynthesis. Our findings demonstrate that there is a link involving the gut-kidney axis in DKD patients. Furthermore, our findings highlight specific gut bacteria that can act as useful biomarkers; these could have mechanistic and diagnostic implications.
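
The citing context for this article mentions prevalence filters, log-transformations, and ratio-based approaches as common ways to handle sparse, zero-heavy compositional tables. A minimal sketch of two of them is given below, assuming a samples-by-taxa count matrix. The function names, the 50% prevalence threshold, the pseudocount of 1, and the use of the centered log-ratio (CLR) as the ratio-based transform are illustrative assumptions, not the procedure used in any of the cited studies.

```python
import numpy as np


def prevalence_filter(counts, min_prevalence=0.1):
    """Keep taxa (columns) observed in at least `min_prevalence` of samples (rows)."""
    counts = np.asarray(counts, dtype=float)
    # Fraction of samples in which each taxon has a non-zero count.
    prevalence = (counts > 0).mean(axis=0)
    keep = prevalence >= min_prevalence
    return counts[:, keep], keep


def clr_transform(counts, pseudocount=1.0):
    """Centered log-ratio transform per sample; the pseudocount avoids log(0)."""
    logged = np.log(np.asarray(counts, dtype=float) + pseudocount)
    # Subtracting the per-sample mean log equals dividing by the geometric mean.
    return logged - logged.mean(axis=1, keepdims=True)


# Toy table: 3 samples x 4 taxa (made-up counts).
counts = np.array([
    [10, 0, 3, 0],
    [0, 0, 8, 1],
    [4, 0, 6, 0],
])
filtered, kept = prevalence_filter(counts, min_prevalence=0.5)
clr = clr_transform(filtered)
```

In this toy table only the first and third taxa pass the 50% prevalence threshold, and each row of the CLR output sums to zero by construction.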