Figure - available via license: Creative Commons Attribution 4.0 International
Content may be subject to copyright.
Source publication
The many microbial communities around us form interactive and dynamic ecosystems called microbiomes. Though concealed from the naked eye, microbiomes govern and influence macroscopic systems including human health, plant resilience, and biogeochemical cycling. Such feats have attracted interest from the scientific community, which has recently turn...
Context in source publication
Context 1
... microbiome research, ML has been applied to tackle tasks such as phenotyping (namely, predicting an environmental or host phenotype), microbial feature classification (i.e., determining the abundance, diversity, or distribution of the microbiota), studying the complex physical and chemical interactions between the microbiome's components, and monitoring for changes in the composition of the microbiome [9,10]. In Table 1, we enumerate select examples of each of these tasks. ...
Citations
... On the other hand, lightweight architectures like MobileNetV3-L have proven to be viable alternatives for energy-efficient applications [57]. The study also demonstrates how the integration of deep learning models with environmental data contributes to a broader understanding of fungal diversity [58,59]. However, the reliance of deep learning algorithms on large datasets and the requirement for high computational power pose challenges to their broader adoption [60,61]. ...
Fungi play a critical role in ecosystems, contributing to biodiversity and providing economic and biotechnological value. In this study, we developed a novel deep learning-based framework for the classification of seven macrofungi species from the genera Mycena and Marasmius, leveraging their unique ecological and morphological characteristics. The proposed approach integrates a custom convolutional neural network (CNN) with a self-organizing map (SOM) adapted for supervised learning and a Kolmogorov–Arnold Network (KAN) layer to enhance classification performance. The experimental results demonstrate significant improvements in classification metrics when using the CNN-SOM and CNN-KAN architectures. Additionally, advanced pretrained models such as MaxViT-S and ResNetV2-50 achieved high accuracy rates, with MaxViT-S achieving 98.9% accuracy. Statistical analyses using the chi-square test confirmed the reliability of the results, emphasizing the importance of validating evaluation metrics statistically. This research represents the first application of SOM in fungal classification and highlights the potential of deep learning in advancing fungal taxonomy. Future work will focus on optimizing the KAN architecture and expanding the dataset to include more fungal classes, further enhancing classification accuracy and ecological understanding.
... Compositional tables are usually used to identify the relative abundances of specific species, but each sample contains a huge number of features, many of which are sparse; furthermore, there are excessive zero counts. 15 Typically, applying a prevalence percentage filter, using log-transformations, applying a staying-in-the-simplex approach, or using ratio calculations are the normal approaches to solving the above problems. 16 In spite of technological advances and the use of broad metadata collections in published studies, further analytical refinement, together with improved study design and increased sample size, is warranted in order to facilitate standardization of the methods used and the translation of study findings into useful clinical findings. ...
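Two of the preprocessing steps named above, a prevalence filter and a log-ratio transform, can be sketched as follows. This is a minimal illustration under stated assumptions, not the cited authors' pipeline: the function names, the 50% prevalence threshold, the pseudocount of 0.5, and the toy count table are all hypothetical, and the centered log-ratio (CLR) transform is used here as one common log-ratio choice.

```python
import numpy as np

def prevalence_filter(counts, min_prevalence=0.1):
    """Keep taxa (columns) that are non-zero in at least
    `min_prevalence` of the samples (rows)."""
    prevalence = (counts > 0).mean(axis=0)
    keep = prevalence >= min_prevalence
    return counts[:, keep], keep

def clr_transform(counts, pseudocount=0.5):
    """Centered log-ratio transform, applied per sample (row).
    The pseudocount (an assumption) avoids log(0) on zero counts."""
    logged = np.log(counts + pseudocount)
    return logged - logged.mean(axis=1, keepdims=True)

# Hypothetical toy table: 4 samples (rows) x 4 taxa (columns).
counts = np.array([
    [10, 0, 5, 0],
    [20, 0, 0, 1],
    [ 0, 0, 8, 0],
    [15, 0, 3, 0],
], dtype=float)

# Drop taxa seen in fewer than half of the samples, then transform.
filtered, keep = prevalence_filter(counts, min_prevalence=0.5)
clr = clr_transform(filtered)
```

After the CLR step each sample's values sum to zero, which is what removes the arbitrary total-count scale from the data. The choice of pseudocount affects results on sparse tables and should be treated as a tunable assumption.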
... The use of different ML algorithms during the analysis of the microbiome composition can also enhance microbial biomarker classification, phenotype prediction, possible host interactions and potential endogenous component interactions. 12,15 In this study, we adopt an integrative approach involving the use of ML together with differential abundance methods in order to investigate the composition and diversity of gut microbiota by analyzing full-length 16S rRNA gene sequencing. The fecal samples came from a relatively large cohort of diabetic patients with diverse levels of renal function, as well as from control subjects with normal renal function. ...
Diabetic kidney disease (DKD) is a serious healthcare dilemma. Nonetheless, the interplay between the functional capacity of gut microbiota and their host remains elusive for DKD. This study aims to elucidate the functional capability of gut microbiota to affect kidney function of DKD patients. A total of 990 subjects were enrolled consisting of a control group (n = 455), a type 2 diabetes mellitus group (DM, n = 204), a DKD group (n = 182) and a chronic kidney disease group (CKD, n = 149). Full-length sequencing of 16S rRNA genes from stool DNA was conducted. Three findings are pinpointed. Firstly, new types of microbiota biomarkers have been created using a machine-learning (ML) method, namely relative abundance of a microbe, presence or absence of a microbe, and the hierarchy ratio between two different taxonomies. Four different panels of features were selected to be analyzed: (i) DM vs. Control, (ii) DKD vs. DM, (iii) DKD vs. CKD, and (iv) CKD vs. Control. These had accuracy rates between 0.72 and 0.78 and areas under curve between 0.79 and 0.86. Secondly, 13 gut microbiota biomarkers, which are strongly correlated with anthropometric, metabolic and/or renal indexes, concomitantly identified by the ML algorithm and the differential abundance method were highly discriminatory. Finally, the predicted functional capability of a DKD-specific biomarker, Gemmiger spp., is enriched in carbohydrate metabolism and branched-chain amino acid (BCAA) biosynthesis. Coincidentally, the circulating levels of various BCAAs (L-valine, L-leucine and L-isoleucine) and their precursor, L-glutamate, are significantly increased in DM and DKD patients, which suggests that, when hyperglycemia is present, there have been alterations in various interconnected pathways associated with glycolysis, pyruvate fermentation and BCAA biosynthesis. Our findings demonstrate that there is a link involving the gut-kidney axis in DKD patients.
Furthermore, our findings highlight specific gut bacteria that can act as useful biomarkers; these could have mechanistic and diagnostic implications.
... Moreover, the integration of multi-omics data (i.e., transcriptomics, proteomics, and metabolomics of strains and communities) or complementary modeling approaches (e.g., kinetic/regulatory network models, dynamic models) can help to further refine models of microbial community dynamics. Finally, machine learning and artificial intelligence (AI) should be applied to confidently identify key patterns without the need to fully resolve the complete structure of the interaction network [97]. Given these exciting opportunities, it is likely that the next years will see a major transition in our ability to unravel structure-function relationships in microbial communities. ...
The structure and function of microbial communities is shaped by intricate ecological interactions amongst the constituent microorganisms. Thus, a mechanistic understanding of emergent community-level functions requires knowledge on how the architecture of the underlying interaction networks affects these properties. To address this, researchers employ different sequencing-based and experimental approaches to infer the topology of a given network. However, it remains generally unclear which method is best suited for quantifying critical network parameters. Here we provide a comparative overview of different approaches serving this purpose, with particular emphasis on their strengths and weaknesses. In this way, our work can help to guide the design of studies that aim at unraveling structure-function relationships in microbial communities.
... AI models like DeepMicro leverage high-throughput sequencing data to identify patterns, classify microorganisms, and forecast disease associations [55]. For example, Meta-Spec combines host and microbial data to illustrate the relationship between particular microbiome patterns and specific diseases [56]. Moreover, machine learning methods such as Random Forest and Support Vector Machines have been utilized to detect microbial signatures linked to different health conditions, enhancing diagnostic precision [57]. ...
CRC remains a significant public health challenge due to its high prevalence and mortality rates. Emerging evidence highlights the critical role of the gut microbiota in both the pathogenesis of CRC and the efficacy of treatment strategies, including chemotherapy and immunotherapy. Dysbiosis, characterized by imbalances in microbial communities, has been implicated in CRC progression and therapeutic outcomes. This review examines the intricate relationship between gut microbiota composition and CRC, emphasizing the potential for microbial profiles to serve as biomarkers for early detection and prognosis. Various interventions, such as prebiotics, probiotics, postbiotics, fecal microbiota transplantation, and dietary modifications, aim to restore microbiota balance and shift dysbiosis toward eubiosis, thereby improving health outcomes. Additionally, the integration of microbial profiling into clinical practice could enhance diagnostic capabilities and personalize treatment strategies, advancing the field of oncology. The study of intratumoral microbiota offers new diagnostic and prognostic tools that, combined with artificial intelligence algorithms, could predict treatment responses and assess the risk of adverse effects. Given the growing understanding of the gut microbiome–cancer axis, developing microbiota-oriented strategies for CRC prevention and treatment holds promise for improving patient care and clinical outcomes.
... While metagenomic tools can accurately identify the exact composition of a small number of gut microbiomes, the computational complexity required to process the large datasets that describe vast numbers of samples is still too high [59]. One potential method of characterising and predicting the composition of a large number of gut microbiomes in a more computationally affordable yet coarse-grained way involves the use of machine learning (ML) techniques to make several simplifying assumptions [20] due to their ability to identify patterns in large sets of data without being explicitly programmed to do so [20]. From these patterns, simplified assumptions can be made to construct models that can estimate the microbiome of a sample based on several input parameters. ...
While current efforts to control agricultural insect pests largely focus on the widespread use of insecticides, predicting microbiome composition can provide important data for creating more efficient and long-lasting pest control methods by analysing the pest’s food-digesting capacity and resistance to bacteria or viruses. We aim to develop a machine learning model to predict the microbiome composition in agricultural pests and investigate the dynamics of these microbiome compositions using metagenomic samples taken from fruit flies. In this paper, we propose three machine learning-based biological models. Firstly, we propose an intrafamilial model that predicts the relative abundance of bacterial families within themselves using their past generations. Next, we propose two interfamilial models following quantitative and qualitative approaches. The quantitative model predicts the number of bacterial families in a given sample based on the presence of other families in that sample. The qualitative model predicts the relative abundance using binary information of all bacterial families. All three models were tested against least angle regression, random forest, elastic-net, and Lasso. The third approach exhibits promising results by applying a random forest with the lowest mean coefficient of variance of 1.25. The overall results of this study highlight how complex these dynamic systems are and demonstrate that more computationally efficient methods can characterise them quickly. The results of this study are intended to be used as a tool to identify vital taxological families, genera and species of the potential microbiome for better pest control.
... Additionally, ML has been instrumental in predicting sample origins, as demonstrated in global studies like MetaSUB (The International MetaSUB Consortium, 2021). By integrating resistome, virome, and mobilome datasets, ML provides a comprehensive view of microbial ecosystems (Medina et al., 2022; Bhattacharya et al., 2022). Its application in this work not only enhances predictive precision but also underscores ML's critical role in advancing environmental microbiology and shaping public health strategies. ...
Antimicrobial resistance (AMR) is a growing global health concern, driven by urbanization and anthropogenic activities. This study investigated AMR distribution and dynamics across microbiomes from six U.S. cities, focusing on resistomes, viromes, and mobile genetic elements (MGEs). Using metagenomic data from the CAMDA 2023 challenge, we applied tools such as AMR++, Bowtie, AMRFinderPlus, and RGI for resistome profiling, along with clustering, normalization, and machine learning techniques to identify predictive markers. AMR++ and Bowtie outperformed other tools in detecting diverse AMR markers, with binary normalization improving classification accuracy. MGEs were found to play a critical role in AMR dissemination, with 394 genes shared across all cities. Removal of MGE-associated AMR genes altered resistome profiles and reduced model performance. The findings reveal a heterogeneous AMR landscape in urban microbiomes, particularly in New York City, which showed the highest resistome diversity. These results underscore the importance of MGEs in AMR profiling and provide valuable insights for designing targeted strategies to address AMR in urban settings.
... Therefore, the aim of this manuscript is to develop a more comprehensive understanding of how various deep learning architectures can improve our insights into microbiome dynamics, functions, and interactions within microbial communities and with hosts. The paper surpasses previous reviews focused on ML techniques that merely describe deep learning approaches for the analysis of microbiome datasets (Hernández Medina et al., 2022; Geman et al., 2018; Mathieu et al., 2022; LaPierre et al., 2019; Deng et al., 2021; Roy et al., 2024). It introduces non-specialized readers without background technical knowledge to a clear understanding of various deep learning architectures, along with their specific applications in microbiome analysis, illustrated by diverse examples and schemes. ...
... Finally, simulated data are cheap to generate compared to mock or biological data. However, generating a realistic dataset is challenging, as methods need to take into account the characteristics of microbiome data such as correlation between taxa, sparsity, overdispersion, and compositionality (He et al., 2024). Ideally, for benchmarking purposes, several different datasets should be analyzed, as different sample types (for example gut vs. soil) can be characterized by different microbial diversity. ...
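To make the listed characteristics concrete, the sketch below simulates a count table that is sparse, overdispersed, and compositional once normalized. It is a minimal illustration, not a method from the cited work: the function name, all parameter values, and the use of a gamma-Poisson (negative binomial) mixture with random structural zeros are assumptions. Correlation between taxa is deliberately not modeled, which is one reason such simple simulators fall short of realistic data.

```python
import numpy as np

def simulate_counts(n_samples, n_taxa, dispersion=0.5,
                    zero_prob=0.3, depth=10_000, seed=0):
    """Simulate a samples x taxa count table (hypothetical helper).
    Taxa are drawn independently, so between-taxon correlation
    is NOT captured by this sketch."""
    rng = np.random.default_rng(seed)
    # Skewed mean abundances: a few dominant taxa, many rare ones.
    base = rng.lognormal(mean=0.0, sigma=1.5, size=n_taxa)
    mean_counts = depth * base / base.sum()
    # Overdispersion via a gamma-Poisson mixture (negative binomial):
    # variance = mu + dispersion * mu**2.
    shape = 1.0 / dispersion
    lam = rng.gamma(shape=shape, scale=mean_counts / shape,
                    size=(n_samples, n_taxa))
    counts = rng.poisson(lam)
    # Sparsity: force a random fraction of entries to zero.
    mask = rng.random((n_samples, n_taxa)) < zero_prob
    counts[mask] = 0
    return counts

counts = simulate_counts(n_samples=20, n_taxa=50)
# Compositionality: only relative abundances per sample are kept.
totals = np.maximum(counts.sum(axis=1, keepdims=True), 1)
rel = counts / totals
```

Each row of `rel` sums to one, so an increase in one taxon necessarily lowers the relative abundance of the others. That constraint is the compositionality the passage refers to.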
Microbiome research, the study of microbial communities in diverse environments, has seen significant advances due to the integration of deep learning (DL) methods. These computational techniques have become essential for addressing the inherent complexity and high-dimensionality of microbiome data, which consist of different types of omics datasets. Deep learning algorithms have shown remarkable capabilities in pattern recognition, feature extraction, and predictive modeling, enabling researchers to uncover hidden relationships within microbial ecosystems. By automating the detection of functional genes, microbial interactions, and host-microbiome dynamics, DL methods offer unprecedented precision in understanding microbiome composition and its impact on health, disease, and the environment. However, despite their potential, deep learning approaches face significant challenges in microbiome research. Additionally, the biological variability in microbiome datasets requires tailored approaches to ensure robust and generalizable outcomes. As microbiome research continues to generate vast and complex datasets, addressing these challenges will be crucial for advancing microbiological insights and translating them into practical applications with DL. This review provides an overview of different deep learning models in microbiome research, discussing their strengths, practical uses, and implications for future studies. We examine how these models are being applied to solve key problems and highlight potential pathways to overcome current limitations, emphasizing the transformative impact DL could have on the field moving forward.
... In summary, autoencoders and other ANN architectures have proven to be efficient in handling the complex and diverse characteristics of microbiome data in medical science. This is critical for accurately predicting phenotypes and identifying biomarkers, as demonstrated by Hernández Medina et al. (2022) [149]. Expanding the use of ML techniques from medical to agricultural microbiome analysis presents numerous challenges, including data collection, sparsity, and biomarker identification. ...
Soil is a depletable and non-renewable resource essential for food production, crop growth, and supporting ecosystem services, such as the retaining and cycling of various elements, including water. Therefore, characterization and preservation of soil biological health is a key point for the development of sustainable agriculture. We conducted a comprehensive review of the use of Artificial Intelligence (AI) techniques to develop forecasting models based on soil microbiota data able to monitor and predict soil health. We also investigated the potential of AI-based Decision Support Systems (DSSs) for improving the use of microorganisms to enhance soil health and fertility. While available studies are limited, potential applications of AI seem relevant to develop predictive models for soil fertility, based on its biological properties and activities, and implement sustainable precision agriculture, safeguarding ecosystems, bolstering soil resilience, and ensuring the production of high-quality food.
... This research area has a broad scope, encompassing applications such as age prediction [21–24], health status assessment [25–32], and microbial source tracking [33–37] (Supplementary Table 1). These specific predictions require a large number of microbiome samples and sophisticated methods to extract meaningful patterns for predictive purposes [38]. ...
The volume of microbiome data is growing at an exponential rate, and the current methodologies for big data mining are encountering substantial obstacles. Effectively managing and extracting valuable insights from these vast microbiome datasets has emerged as a significant challenge in the field of contemporary microbiome research. This comprehensive review delves into the utilization of foundation models and transfer learning techniques within the context of microbiome-based classification and prediction tasks, advocating for a transition away from traditional task-specific or scenario-specific models towards more adaptable, continuous learning models. The article underscores the practicality and benefits of initially constructing a robust foundation model, which can then be fine-tuned using transfer learning to tackle specific context tasks. In real-world scenarios, the application of transfer learning empowers models to leverage disease-related data from one geographical area and enhance diagnostic precision in different regions. This transition from relying on "good models" to embracing "adaptive models" resonates with the philosophy of "teaching a man to fish," thereby paving the way for advancements in personalized medicine and accurate diagnosis. Empirical research suggests that the integration of foundation models with transfer learning methodologies substantially boosts the performance of models when dealing with large-scale and diverse microbiome datasets, effectively mitigating the challenges posed by data heterogeneity.
... Deep learning models go further by integrating temporal data, identifying prognostic biomarkers, and differentiating between inflammatory skin conditions with overlapping clinical features. These computational tools enable clinicians to tailor interventions precisely, reducing trial-and-error prescribing and improving outcomes [10,11,89]. ...
... As these biomarkers are validated, they can be incorporated into clinical decision support systems, allowing physicians to predict treatment responses with greater confidence and refine therapeutic protocols for individual patients. Over time, these advancements may improve the precision and personalization of dermatological care, particularly for individuals with underlying metabolic dysfunctions [89–91]. ...
Metabolic disorders, including type 2 diabetes mellitus (T2DM), obesity, and metabolic syndrome, are systemic conditions that profoundly impact the skin microbiota, a dynamic community of bacteria, fungi, viruses, and mites essential for cutaneous health. Dysbiosis caused by metabolic dysfunction contributes to skin barrier disruption, immune dysregulation, and increased susceptibility to inflammatory skin diseases, including psoriasis, atopic dermatitis, and acne. For instance, hyperglycemia in T2DM leads to the formation of advanced glycation end products (AGEs), which bind to the receptor for AGEs (RAGE) on keratinocytes and immune cells, promoting oxidative stress and inflammation while facilitating Staphylococcus aureus colonization in atopic dermatitis. Similarly, obesity-induced dysregulation of sebaceous lipid composition increases saturated fatty acids, favoring pathogenic strains of Cutibacterium acnes, which produce inflammatory metabolites that exacerbate acne. Advances in metabolomics and microbiome sequencing have unveiled critical biomarkers, such as short-chain fatty acids and microbial signatures, predictive of therapeutic outcomes. For example, elevated butyrate levels in psoriasis have been associated with reduced Th17-mediated inflammation, while the presence of specific Lactobacillus strains has shown potential to modulate immune tolerance in atopic dermatitis. Furthermore, machine learning models are increasingly used to integrate multi-omics data, enabling personalized interventions. Emerging therapies, such as probiotics and postbiotics, aim to restore microbial diversity, while phage therapy selectively targets pathogenic bacteria like Staphylococcus aureus without disrupting beneficial flora. Clinical trials have demonstrated significant reductions in inflammatory lesions and improved quality-of-life metrics in patients receiving these microbiota-targeted treatments. 
This review synthesizes current evidence on the bidirectional interplay between metabolic disorders and skin microbiota, highlighting therapeutic implications and future directions. By addressing systemic metabolic dysfunction and microbiota-mediated pathways, precision strategies are paving the way for improved patient outcomes in dermatologic care.