Article (Literature Review)

Shotgun metagenomics, from sampling to analysis

Abstract

Diverse microbial communities of bacteria, archaea, viruses and single-celled eukaryotes have crucial roles in the environment and in human health. However, microbes are frequently difficult to culture in the laboratory, which can confound cataloging of members and understanding of how communities function. High-throughput sequencing technologies and a suite of computational pipelines have been combined into shotgun metagenomics methods that have transformed microbiology. Still, computational approaches to overcome the challenges that affect both assembly-based and mapping-based metagenomic profiling, particularly of high-complexity samples or environments containing organisms with limited similarity to sequenced genomes, are needed. Understanding the functions and characterizing specific strains of these communities offers biotechnological promise in therapeutic discovery and innovative ways to synthesize products using microbial factories and can pinpoint the contributions of microorganisms to planetary, animal and human health.

... Regarding sample collection and preparation, it is important to note that protocols for this step can affect the accuracy of the metagenomic data, and as such, collection and storage methods that are validated for a specific sample type cannot be used for other sample types with full confidence [75]. The key objective of this step is to collect sufficient microbial biomass that represents the diversity of the environment being studied [76]. Once a sufficient sample has been taken it must undergo processing to isolate and extract the DNA contained within it. ...
... Library preparation and DNA sequencing occur next, and the methods are chosen based on several variables such as cost, availability of required materials/services, DNA sample quantification, etc. [76]. The four basic steps of library preparation are as follows: DNA fragmentation, adapter ligation, size selection, and final library quantification and QC [75]. ...
... These fragments can then be sequenced on next-generation sequencing (NGS) platforms. Several companies offer their own library preparation and sequencing kits, such as Illumina DNA Prep (previously known as Nextera DNA Flex) and TruSeq DNA PCR-Free, which is predominant in WGS studies; both have been recognized as industry standards. However, several alternatives are available, such as Ion Torrent S5, Oxford Nanopore MinION and Pacific Biosciences Sequel [76,81,82]. ...
Article
Bioinformatic methodologies play a crucial role in the assessment of gut microbiota, offering advanced tools for analyzing complex microbial communities. These methodologies involve high-throughput sequencing technologies, such as 16S rRNA gene sequencing and metagenomics, which generate vast amounts of data on microbial diversity and functional potential, as well as whole-genome sequencing, which, while being more costly, has a more expansive potential. Bioinformatics tools and algorithms process these data to identify microbial taxa and quantify and elucidate their roles within the microbiome. Advanced statistical and computational models further enable the identification of microbiota patterns associated with various diseases and health conditions. Overall, bioinformatic approaches are essential for deciphering the complexities of gut microbiota so that, in the future, we may be able to discover treatments and technologies aimed at restoring or optimizing the microbiome. The goal of this review is to describe the differences in methodology and utilization of 16S versus whole-genome sequencing to address the increased understanding of the role that the gut microbiome plays in human physiology and pathology.
... Advances in genomic technologies have created an explosion of biological data that offers unparalleled opportunities to investigate the mechanisms and spread of resistance at a molecular level [9]. Millions of genomic data records have been generated by high-throughput sequencing and metagenomic analyses of clinical isolates, environmental samples, and microbiomes [10]. Analyzing genetic material recovered directly from environmental samples through metagenomics avoids the need for culturing [11]. ...
... To understand the underlying structure of the data and the clustering outcomes, PCA was performed to reduce the dimensionality of the dataset from two features to two principal components [10]. The first two principal components captured a significant portion of the variance in the data, enabling effective visualization. ...
... PCA was employed to reduce the dimensionality of the data and facilitate visualization [10]. This analysis transforms the original variables into a new set of uncorrelated variables called principal components, ordered by the amount of variance they capture from the data [22]. ...
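The transformation described in the excerpts above can be sketched for the two-feature case. This is a minimal, illustrative Python sketch and not the cited authors' code: the function name `pca_2d` and the toy input are assumptions, and a closed-form eigendecomposition of the 2x2 covariance matrix stands in for a general-purpose PCA routine.

```python
import math

def pca_2d(points):
    """PCA for two-feature data (hypothetical helper, illustration only).

    Returns (scores, explained_variance_ratio), where scores are the
    coordinates of each point on the two principal components.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two observations")
    # Center each feature on its mean.
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]
    # Sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum(x * x for x, _ in centered) / (n - 1)
    syy = sum(y * y for _, y in centered) / (n - 1)
    sxy = sum(x * y for x, y in centered) / (n - 1)
    # Closed-form eigenvalues of a symmetric 2x2 matrix, largest first.
    half_trace = (sxx + syy) / 2
    radius = math.hypot((sxx - syy) / 2, sxy)
    eigenvalues = [half_trace + radius, half_trace - radius]
    # Direction of the leading eigenvector (the first principal component).
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    pc1 = (math.cos(theta), math.sin(theta))
    pc2 = (-math.sin(theta), math.cos(theta))
    # Project the centered data onto the two orthogonal components.
    scores = [(x * pc1[0] + y * pc1[1], x * pc2[0] + y * pc2[1])
              for x, y in centered]
    total = sum(eigenvalues)
    ratios = [ev / total if total > 0 else 0.0 for ev in eigenvalues]
    return scores, ratios

# Toy data (invented for illustration, not from the cited study).
toy = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6)]
toy_scores, toy_ratios = pca_2d(toy)
```

The components are ordered by the variance they capture, and the projected scores are uncorrelated with each other, which is the property the excerpt describes. For data with more than two features, a library routine would be the usual choice.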
Article
Citation: Sakagianni, A.; Koufopoulou, C.; Koufopoulos, P.; Kalantzi, S.; Theodorakis, N.; Nikolaou, M.; Paxinou, E.; Kalles, D.; Verykios, V.S.; Myrianthefs, P.; et al.
Abstract: Background/Objectives: The emergence of antimicrobial resistance (AMR) due to the misuse and overuse of antibiotics has become a critical threat to global public health. There is a pressing need to forecast AMR and to understand the underlying mechanisms of resistance in order to develop effective interventions. This paper explores the capability of machine learning (ML) methods, particularly unsupervised learning methods, to enhance the understanding and prediction of AMR. It aims to identify patterns in AMR gene data that are clinically relevant and capable of informing public health strategies. Methods: We analyzed AMR gene data in the PanRes dataset by applying unsupervised learning techniques, namely K-means clustering and Principal Component Analysis (PCA). These techniques were applied to identify clusters based on gene length and distribution according to resistance class, offering insights into the resistance genes' structural and functional properties. Data preprocessing, such as filtering and normalization, was conducted prior to applying machine learning methods to ensure consistency and accuracy. Our methodology included the preprocessing of data and reduction of dimensionality to ensure that our models were both accurate and interpretable. Results: The unsupervised learning models highlighted distinct clusters of AMR genes, with significant patterns in gene length and in the associated resistance classes. Further dimensionality reduction by PCA allowed for clearer visualization of relationships among gene groupings. These patterns provide novel insights into the potential mechanisms of resistance, particularly the role of gene length in different resistance pathways. Conclusions: This study demonstrates the potential of ML, specifically unsupervised approaches, to enhance the understanding of AMR. The identified patterns in resistance genes could support clinical decision-making and inform public health interventions. However, challenges remain, particularly in integrating genomic data and ensuring model interpretability. Further research is needed to advance ML applications in AMR prediction and management.
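The normalization and K-means steps named in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the helper names `min_max_normalize` and `kmeans_1d`, the deterministic centroid initialization, and the gene lengths are all invented for the example and are not taken from the PanRes dataset.

```python
def min_max_normalize(values):
    """Rescale values to the range [0, 1] (one common normalization choice)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]

def kmeans_1d(values, k, iterations=100):
    """Lloyd's K-means on one feature (hypothetical helper, illustration only)."""
    ordered = sorted(values)
    # Deterministic start: pick k values spread evenly through the sorted data.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c]))
                  for v in values]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centroids.append(sum(members) / len(members) if members
                                 else centroids[c])
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return labels, centroids

# Invented gene lengths in base pairs, for illustration only.
gene_lengths = [300, 320, 350, 900, 950, 1000, 2900, 3000, 3100]
labels, centroids = kmeans_1d(min_max_normalize(gene_lengths), k=3)
```

With real data the number of clusters would need to be chosen and justified, for example with an elbow or silhouette analysis. The abstract does not state how this was done.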
... This is a distinct and extensive field of research with its own terminology, methodology and even philosophy. Two main approaches are used to study clinical samples from the urinary tract (UT), mainly urine and, less often, aspirate [14][15][16]. The first consists in identifying specific genera and species of microorganisms by PCR sequencing of certain variable regions of 16S rRNA (searching for amplicons defined by the primers). Material for the study is taken from colonies, followed by the expanded culture method EQUC. ...
... The stage of mathematical processing of the data, within the bioinformatic approaches currently in use, is also not free of pitfalls. The results of the calculations are determined entirely by the databases on which the analysis was based; the issue is not even the completeness of a given database but the principles and methods of its construction, the depth of data processing, and so on [16]. In addition, the incomparability of results is aggravated by the lack of free access to all databases. ...
Article
This review presents the latest data on the microbiome of the urinary tract (UT). The existence of a resident UT microbiome has been an unexpected discovery of recent years. Slow-growing, difficult-to-culture species, including anaerobic bacteria, were found to predominate among the permanent inhabitants of the UT. As a result, there is an urgent need to revise a number of aspects of the generally accepted diagnostic schemes, the picture of pathogenesis, and the therapeutic approaches to urinary tract infections, all of which previously relied on the assumption that the UT is sterile. The concept of a "healthy microbiome" of the UT is being developed. The need to compare microbiological findings with the clinical picture of disease has been confirmed once again. Changes in the gut microbial community have been shown to directly affect the microflora of the UT. One promising approach to the therapy of chronic urinary tract infections is therefore the creation of complex multicomponent synbiotics containing normal inhabitants of the intestinal tract and the substrates required for their growth. Probiotics, fecal transplantation and special diets are already used in the combined treatment of chronic urinary tract infections. The rapid growth in the number of studies of human microbiomes suggests that significant progress in the practical application of these results can be expected in the near future.
... Shotgun metagenomics, the sequencing of all microbial genomes in a sample, has allowed for unprecedented insight into microbial communities without the need for cultivating those communities in a wet lab [1,2]. Computational analyses often proceed by assembling metagenome-assembled genomes (MAGs) or profiling the reads against a database of reference sequences. ...
... We can use this estimator once we know λ. In practice, we first estimate λ using equation (1), because this estimate does not depend on τ, and then plug the estimated λ into equation (3) to obtain our final estimate of the ANI, τ. Note that the λ-adjusted ANI in equation (3) could be >1, in which case we threshold it to τ = 1. ...
Article
Profiling metagenomes against databases allows for the detection and quantification of microorganisms, even at low abundances where assembly is not possible. We introduce sylph, a species-level metagenome profiler that estimates genome-to-metagenome containment average nucleotide identity (ANI) through zero-inflated Poisson k-mer statistics, enabling ANI-based taxa detection. On the Critical Assessment of Metagenome Interpretation II (CAMI2) Marine dataset, sylph was the most accurate profiling method of seven tested. For multisample profiling, sylph took >10-fold less central processing unit time compared to Kraken2 and used 30-fold less memory. Sylph’s ANI estimates provided an orthogonal signal to abundance, allowing for an ANI-based metagenome-wide association study for Parkinson disease (PD) against 289,232 genomes while confirming known butyrate–PD associations at the strain level. Sylph took <1 min and 16 GB of random-access memory to profile metagenomes against 85,205 prokaryotic and 2,917,516 viral genomes, detecting 30-fold more viral sequences in the human gut compared to RefSeq. Sylph offers precise, efficient profiling with accurate containment ANI estimation even for low-coverage genomes.
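The two-stage estimate described in the excerpt (first estimate the coverage λ, then correct the containment and convert it to an ANI capped at 1) can be sketched as below. This is an illustrative sketch only and not sylph's implementation: equations (1) and (3) are not reproduced in the excerpt, so the λ estimator used here (twice the ratio of k-mers seen twice to k-mers seen once, a property of the Poisson distribution), the 1 − e^(−λ) correction, the default k = 31 and all function names are assumptions.

```python
import math
from collections import Counter

def estimate_lambda(multiplicities):
    """Assumed stand-in for equation (1): Poisson mean from the N2/N1 ratio.

    For Poisson counts, P(X=2) / P(X=1) = lambda / 2, so lambda ~ 2 * N2 / N1.
    Returns None when no k-mer is seen exactly once.
    """
    counts = Counter(multiplicities)
    n1, n2 = counts.get(1, 0), counts.get(2, 0)
    if n1 == 0:
        return None
    return 2.0 * n2 / n1

def containment_ani(multiplicities, k=31):
    """Assumed stand-in for equation (3): coverage-adjusted containment ANI.

    multiplicities[i] is how often genome k-mer i occurs in the metagenome
    reads (0 if absent).
    """
    total = len(multiplicities)
    found = sum(1 for m in multiplicities if m > 0)
    if total == 0 or found == 0:
        return 0.0
    naive_containment = found / total
    lam = estimate_lambda(multiplicities)
    if lam is not None and lam > 0:
        # At low coverage a k-mer is missed by chance with probability
        # exp(-lambda); divide that effect out of the naive containment.
        adjusted = naive_containment / (1.0 - math.exp(-lam))
    else:
        adjusted = naive_containment
    # Containment of k-mers relates to per-base identity as identity ** k.
    ani = adjusted ** (1.0 / k)
    # The adjusted value can exceed 1, in which case it is thresholded.
    return min(ani, 1.0)

# Invented k-mer multiplicities, for illustration only.
example = [0] * 50 + [1] * 30 + [2] * 15 + [3] * 5
example_ani = containment_ani(example)
```

The point of the adjustment is that, at low coverage, many genome k-mers are absent from the reads by chance rather than because of sequence divergence, so the uncorrected containment would understate the ANI.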
... The microbiome can be profiled using high-throughput sequencing technologies that focus on specific marker genes, such as the 16S ribosomal RNA (rRNA) gene for bacteria (Jovel, 2016), or via metagenomic sequencing that analyzes the entire genetic content of microbial communities (Quince et al., 2017). Targeted 16S rRNA gene amplicon sequencing is widely used due to its cost-effectiveness and ability to classify bacteria at various taxonomic levels. ...
... Metagenomic sequencing captures all the genetic material of the metagenome in a sample, allowing for species-level identification and insights into the functional potential of the microbiome. However, this method is more expensive, requires more complex data analysis, and can be influenced by the presence of host DNA (Quince et al., 2017). In addition, a lack of standardized amplification and sequencing protocols, and universally accepted databases, combined with inconsistent analytical methods, complicates the study of the non-bacterial components of the microbiome, contributing to the relative scarcity of data on these microbial members (Zhou et al., 2022). ...
... These include ecosystems with low microbial densities, such as the skin, airway system, and urogenital tract [82]. However, current technologies face challenges, including sequencing errors, biases in data interpretation, and incomplete reference databases, which may hinder the accuracy and comprehensiveness of microbial profiling [83]. ...
Article
Although global life expectancy has increased over the past 20 years due to advancements in managing infectious diseases, one-fifth of people still die from infections. In response to this ongoing threat, significant efforts are underway to develop vaccines and antimicrobial agents. However, pathogens evolve resistance mechanisms, complicating their control. The COVID-19 pandemic has underscored the limitations of focusing solely on the pathogen-killing strategies of immunology and microbiology to address complex, multisystemic infectious diseases. This highlights the urgent need for practical advancements, such as microbiome therapeutics, that address these limitations while complementing traditional approaches. Our review emphasizes key outcomes in the field, including evidence of probiotics reducing disease severity and insights into host-microbiome crosstalk that have informed novel therapeutic strategies. These findings underscore the potential of microbiome-based interventions to promote physiological function alongside existing strategies aimed at enhancing host immune responses and pathogen destruction. This narrative review explores microbiome therapeutics as next-generation treatments for infectious diseases, focusing on the application of probiotics and their role in host-microbiome interactions. While offering a novel perspective grounded in a cooperative defense system, this review also addresses the practical challenges and limitations in translating these advancements into clinical settings.
... In addition, another promising line is the use of functional genes as indicators using metagenomic data. This methodology provides access to the genomic content of communities of microorganisms in the environment and allows the characterization of the gene expression potential of environmental samples and sequences not represented in databases (Quince et al. 2017). Recent research efforts on metagenomics have focused on monitoring pollution in aquatic ecosystems using biomarkers of aquatic toxicology monitoring (Chen et al. 2015;Behera et al. 2020). ...
... While our group usually analyzes the whole 16S rRNA gene (approximately 1500 bp) in fecal samples [88], the maximum amplicon size reached in tissue samples after several attempts was approximately 450 bp, corresponding to the V3-V4 region of the 16S rRNA gene. Similar limitations have been observed when investigating mucosal microbiota with shotgun metagenomics [89]. We hypothesized that studying the complete 16S rRNA gene amplicon in tissue samples is unfeasible due to the high proportion of human DNA and the fragmentation of the target gene. ...
Article
Background/Objective: Colorectal cancer (CRC) is one of the most common cancers worldwide. Increasing scientific evidence supports the idea that gut microbiota dysbiosis accompanies colorectal tumorigenesis, and these changes could be causative. Implementing gut microbiota analysis in clinical practice is limited by sample type, sequencing platform and taxonomic classification. This article aims to address these limitations, providing new insights into the microbiota associated with CRC pathogenesis and implementing its analyses in personalized medicine. Methods: To that aim, we evaluate differences in the bacterial composition of 130 paired tumor and non-tumor adjacent tissues from a cohort of CRC patients from the Biobank of the University of Navarra, Spain. The V3–V4 region of the 16S rRNA gene was amplified, sequenced using the MinION platform, and taxonomically classified using the NCBI database. Results: To our knowledge, this is the first study to report an increased relative abundance of Streptococcus periodonticum and a decreased relative abundance of Corynebacterium associated with CRC. Genera such as Fusobacterium, Leptotrichia and Streptococcus showed higher relative abundances in tumor than in non-tumor tissues, as previously described in the literature. Specifically, we identified higher levels of Fusobacterium animalis, Fusobacterium nucleatum, Fusobacterium polymorphum and S. periodonticum in tumor tissues. In contrast, genera such as Bacteroides and Corynebacterium showed lower relative abundances in tumor tissues. There were also differences at the taxonomic level between tumor locations. Conclusions: These results, consistent with previous studies, further support the hypothesis that Leptotrichia and Fusobacterium contribute to CRC progression, with F. nucleatum and F. animalis proposed as key CRC pathogenic taxa. 
Overall, these results contribute to a better understanding of the CRC-associated microbiota, addressing critical barriers to its implementation in personalized medicine.
... [6,7] It involves the extraction, cloning, functional screening, and direct random shotgun sequencing of the entire genetic complement of a habitat. [8][9][10] The genomes of uncultured microorganisms comprise vast quantities of data, and metagenomics is one of the most advanced methods to discover and investigate this potential. Pharmaceuticals, agrochemicals, and fine chemicals are all produced using metagenomics technology, since the advantages of chiral synthesis catalysed by enzymes are becoming more widely acknowledged. ...
Article
The quantity and diversity of the microbial community in soil make it possibly the most challenging of all natural ecosystems to study. Up to 99% of the microorganisms in a given environment are thought to be difficult to culture. The complexity of microbial diversity is affected by numerous interconnected factors, including soil structure, water content, biotic activity, pH, and fluctuations in climate. Environmental DNA isolation and purification are often the first steps in the soil metagenomic analysis process. Creating genomic DNA libraries and then using them for high-throughput sequencing or library screening are the main steps in the application of metagenomics. These genomic sequences are currently being used to advance our knowledge of the ecology and physiology of these bacteria as well as for new biotechnological and medicinal applications. Metagenomic sequences are employed to understand how microbial communities operate and how different microorganisms interact within specific niches. This study focuses on the latest advancements in biotechnological approaches and procedures for identifying novel genes from uncultured microorganisms and intricate microbial habitats.
... Recent developments in microbiome research utilizing cutting-edge microbiological technologies, such as whole genome sequencing and multi-omics analysis, have revealed the role of dysbiosis in the pathophysiology of IBS [43,44]. Approximately 10% of IBS cases have been reported following episodes of gastroenteritis, leading to PI-IBS [45]. ...
Article
Irritable bowel syndrome (IBS) is one of the most prevalent functional gastrointestinal disorders characterized by recurrent abdominal pain and altered bowel habits. The exact pathophysiological mechanisms for IBS development are not completely understood. Several factors, including genetic predisposition, environmental and psychological influences, low-grade inflammation, alterations in gastrointestinal motility, and dietary habits, have been implicated in the pathophysiology of the disorder. Additionally, emerging evidence highlights the role of gut microbiota in the pathophysiology of IBS. This review aims to thoroughly investigate how alterations in the gut microbiota impact physiological functions such as the brain–gut axis, immune system activation, mucosal inflammation, gut permeability, and intestinal motility. Our research focuses on the dynamic “microbiome shifts”, emphasizing the enrichment or depletion of specific bacterial taxa in IBS and their profound impact on disease progression and pathology. The data indicated that specific bacterial populations are implicated in IBS, including reductions in beneficial species such as Lactobacillus and Bifidobacterium, along with increases in potentially harmful bacteria like Firmicutes and Proteobacteria. Emphasis is placed on the imperative need for further research to delineate the role of specific microbiome alterations and their potential as therapeutic targets, providing new insights into personalized treatments for IBS.
... Various omics technologies are widely used to understand the genetic makeup of pathogens. Shotgun and amplicon-based metagenomics have numerous applications, particularly in detecting airborne and soilborne pathogens such as Magnaporthe, Rhizoctonia, Fusarium, Verticillium, Zymoseptoria, Sclerotinia, Pythium, and Phytophthora, which affect many major crops (Català et al., 2015;Islam et al., 2016;Quince et al., 2017;Keepers et al., 2019). These pathogens are among the top ten economically important due to the severe yield losses they cause, which can reach up to 100%. ...
Article
Plant diseases are responsible for 20–40% of global crop yield losses, posing a significant threat to food security in the face of an ever-growing population. Genomic surveillance emerges as a powerful tool for diagnosing, early warning, and mitigating emerging plant diseases. This approach provides molecular insights into plant-pathogen interactions, essential for developing durable management strategies. Various omics techniques, including metagenomics, are employed in genomic surveillance to systematically monitor and analyze pathogen genomes. These analyses enable early detection of emerging threats, characterization of pathogen populations, tracking of pathogen movement, and accurate prediction of disease outbreaks. Genomic data serve as the foundation for point-of-care disease management using genome-specific primers and CRISPR technology. Despite its significant advantages, genomic surveillance faces challenges such as data analysis complexity, protocol standardization, ethical considerations, and technology accessibility. Key strategies to address these challenges include open data sharing, open science, and international collaboration. Recent advancements in sequencing technologies, bioinformatics tools, and collaborative networks offer promising solutions to these challenges, enhancing the potential of genomic surveillance in plant pathology. This comprehensive review updates the current progress and future prospects of genomic surveillance in disease detection and sustainable plant health management. It critically discusses the challenges of large-scale application and explores mitigation strategies through open data sharing, open science, and international collaboration.
... Although there have been selective studies of the rice rhizosphere [15,16], the overall pattern of the genomic and functional composition, especially for nutrient cycles, of the rice rhizosphere microbial community across soil salinity and rice varieties remains largely unexplored. Metagenomics-based taxonomic and functional gene assemblages may reveal both taxonomic abundances of all taxa within the community and functional genes present in the rice root biota [17,18]. ...
Preprint
Rice rhizosphere microbiota plays a crucial role in crop yield and abiotic stress tolerance. However, little is known about how the composition and function of rhizosphere soil microbial communities respond to soil salinity, alkalinity, and rice variety in rice paddy ecosystems. In this study, we analyzed the composition and function of rhizosphere soil microbial communities associated with two rice varieties (Jida177 and Tongxi933) cultivated in soils with different levels of salinity-alkalinity in Northeast China using a metagenomics approach. Our results indicate that the rhizospheres of Jida177 and Tongxi933 rice varieties harbor distinct microbial communities, and these microbial communities are differentiated based on both soil salinity-alkalinity and rice varieties. Furthermore, the observed differences in rice yield and grain quality between the Jida177 and Tongxi933 rice varieties suggest that these changes may be attributed to alterations in the rhizosphere microbiome under varying salinity conditions. These findings may pave the way for more efficient soil management and deeper understanding of the potential effects of soil salinization on the rice rhizosphere system.
... Whole community fingerprinting, including the evolving classification accuracy of "next-generation" sequencing and "long-read" sequencing (Tedersoo et al. 2021; Satam et al. 2023), relies on high-quality DNA suitable for library preparation followed by sequencing (Endrullat et al. 2016; Bag et al. 2016; Costa et al. 2021). Community profiling by amplicon sequencing or shotgun sequencing has been commonplace for bacterial communities and is of increasing interest for larger-genome fungal communities (Escobar-Zepeda, Vera-Ponce De León, and Sanchez-Flores 2015; Quince et al. 2017; Donovan et al. 2018; Nilsson et al. 2019). To obtain suitable DNA, methods must be evaluated and optimized regarding their yields, integrity, and purity. ...
Article
As technologies advance alongside metabarcoding and metagenomic resources, particularly for larger fungal genomes, DNA extraction methods must be optimized to meet higher thresholds, especially from complex environmental substrates. This study focused on extracting fungal genomic compounds from woody substrates, a challenge due to the embedment of endophytic and saprotrophic fungi within wood cells, the physical recalcitrance of wood, the adsorption of nucleic acids to wood polymers, and the release of downstream inhibitors. Hypothesizing that cetyltrimethylammonium bromide would be the best option, we compared prominent methods by extracting and sequencing microbial DNA from sound and decayed birch (Betula papyrifera) and pine (Pinus resinosa). DNA quantities varied significantly depending on extraction methods and decay stage. The quality of DNA, in terms of purity and integrity, significantly impacted whether the samples could be amplified and sequenced. However, amplicon sequencing of bacterial and fungal communities revealed no significant extraction bias. This, along with the sequencing effectiveness and cost/time efficiency, indicates that Qiagen is the gold standard for woody substrates. This study increases confidence in published amplicon data sets regardless of the extraction methods, provides a cost‐benefit table for making protocol decisions, and offers guidance on fungal DNA extractions from complex organic substrates (sound and decayed wood) that would best suit future metagenomic efforts.
... Shotgun metagenomic sequencing facilitates high-throughput profiling of complex microbial communities in environmental samples. [1][2][3] Applied to the human gut microbiome, metagenomics reveals its structure, function, and variations [4-6] as well as its associations with host physiologies, including diseases, immune function, and response to cancer therapy. [7][8][9][10][11] However, the microbial profiles obtained from metagenomic analysis are inherently compositional, with the abundance of each microbial species represented in relative proportions (fraction of total reads). ...
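What "compositional" means here can be shown with a short sketch. This is an illustrative example under stated assumptions, not code from the cited work: the function name `relative_abundance`, the taxon labels and the read counts are invented.

```python
def relative_abundance(read_counts):
    """Convert per-taxon read counts into fractions of total reads."""
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {taxon: count / total for taxon, count in read_counts.items()}

# Two invented samples: the second has exactly twice the reads per taxon,
# as could happen with deeper sequencing or a higher microbial load.
sample_a = {"taxon_A": 600, "taxon_B": 300, "taxon_C": 100}
sample_b = {taxon: 2 * count for taxon, count in sample_a.items()}

profile_a = relative_abundance(sample_a)
profile_b = relative_abundance(sample_b)
```

Both samples yield the same profile, so the relative proportions alone cannot distinguish a change in total microbial load from a change in sequencing depth. An increase in one taxon's fraction also forces the other fractions down even if their absolute amounts are unchanged.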
... These findings provide deeper insights into the gene regulatory dynamics influenced by maternal effects throughout different developmental stages. The 'Toll-like receptor signaling pathway' plays a crucial role in recognizing pathogen-associated molecular patterns (PAMPs) and initiating immune responses [41]. During the maternal and offspring stages, exposure to external environments increases the likelihood of encountering bacteria and viruses, which in turn activates the expression of relevant characteristic genes involved in this pathway [42]. ...
Article
Chickens are important breeding animals and models for biomedical research, particularly because their oviparous nature makes them ideal subjects for studying maternal effects. This study employs RNA-Seq to conduct a comprehensive analysis of the transcriptomics of the poultry liver, with a focus on maternal transgenerational effects. Samples were examined from broiler breeders, E19 embryos, and 21-day-old offspring, identifying 2,753 DEGs. GO analysis revealed significant enrichment of differentially expressed RNAs in functions such as actin filament binding and lysosomal activity. KEGG analysis identified pathways associated with endocytosis and Toll-like receptor signaling, displaying a high-low-high expression pattern across the broiler breeders, embryos, and offspring, which is closely linked to immune function regulation. Conversely, the Neuroactive ligand-receptor interaction and Calcium signaling pathways exhibited a low-high-low expression pattern, which is closely associated with organogenesis and embryonic development. Additionally, based on DEGs, genes such as IGF1, IGFBP, FASN, and ELOVL were identified, which are significantly expressed in embryos and are crucial for development and lipid metabolism regulation. In summary, the present research provides a valuable transcriptional regulatory network for studying maternal effects on liver tissue development in broiler breeders, laying a foundation for further exploration of the molecular mechanisms underlying maternal effects.
... While most studies rely on the former for taxonomic classification and quantification of bacterial and archaeal taxa (Gotschlich et al., 2019), 16S rRNA sequencing does not provide direct information on microbial gene contents, which prevents direct inference of metabolic functions (Langille et al., 2013). In consequence, genome-resolved metagenomics (GRM) derived from shotgun sequencing is becoming increasingly popular for its enhanced capacity to yield direct information on microbial functional capabilities through the reconstruction of metagenome-assembled genomes (MAGs) (Quince et al., 2017; Durazzi et al., 2021). Recent studies have evidenced its suitability to assess the functional potential of microbial communities and taxonomic identification up to species- or strain-level (Cao et al., 2020; Kayani et al., 2022; Koziol et al., 2023). ...
Preprint
Genome-resolved metagenomics, based on shotgun sequencing, has become a powerful strategy for investigating animal-associated microbiomes, due to its heightened capability for delivering detailed taxonomic, phylogenetic, and functional insights compared to amplicon sequencing-based approaches. While genome-resolved metagenomics holds promise across various non-lethal sample types, the effectiveness of these sample types in yielding high-quality metagenome-assembled genomes (MAGs) remains largely unexplored. Our investigation of the fecal and cloacal microbiota of mesquite lizards (Sceloporus grammicus) using genome-resolved metagenomics revealed that fecal samples contributed 97% of the 127 reconstructed bacterial genomes, whereas only 3% were recovered from cloacal swabs, which were largely enriched with host DNA. Taxonomic, phylogenetic and functional alpha microbial diversity was greater in fecal samples than in cloacal swabs. We also observed significant differences in microbial community composition between sampling methods, and higher inter-individual variation in cloacal swabs. Bacteroides, Phocaeicola and Parabacteroides (all Bacteroidota) were more abundant in the feces, whereas Hafnia and Salmonella (both Pseudomonadota) increased in the cloaca. Functional analyses showed that metabolic capacities of the microbiota to degrade polysaccharides, sugars and nitrogen compounds were enriched in fecal samples, likely reflecting the role of the microbiota in nutrient metabolism. Overall, our results indicate that fecal samples outperform cloacal swabs in characterizing microbial assemblages within lizards using genome-resolved metagenomics.
... In order to study the role of the oral microbiome in periodontitis, we analyzed shotgun metagenomic sequencing data (11) of the gingival microbiome of 262 samples from healthy individuals and from patients with periodontitis. We profiled subjects with healthy subgingival or supragingival plaque microbiome (n = 214) or with periodontitis (n = 48). ...
Article
Full-text available
Oral microbial dysbiosis has been associated with periodontitis in studies using 16S rRNA gene sequencing analysis. However, this technology is not sufficient to consistently separate the bacterial species to species level, and reproducible oral microbiome signatures are scarce. Obtaining these signatures would significantly enhance our understanding of the underlying pathophysiological processes of this condition and foster the development of improved therapeutic strategies, potentially personalized to individual patients. Here, we sequenced newly collected samples from 24 patients with periodontitis, and we collected available oral microbiome data from 24 samples in patients with periodontitis and from 214 samples in healthy individuals (n = 262). Data were harmonized, and we performed a pooled analysis of individual patient data. By metagenomic sequencing of the plaque microbiome, we found microbial signatures for periodontitis and defined a periodontitis-related complex, composed of the most discriminative bacteria. A simple two-factor decision tree, based on Tannerella forsythia and Fretibacterium fastidiosum, was associated with periodontitis with high accuracy (area under the curve: 0.94). Altogether, we defined robust oral microbiome signatures relevant to the pathophysiology of periodontitis that can help define promising targets for microbiome therapeutic modulation when caring for patients with periodontitis. IMPORTANCE Oral microbial dysbiosis has been associated with periodontitis in studies using 16S rRNA gene sequencing analysis. However, this technology is not sufficient to consistently separate the bacterial species to species level, and reproducible oral microbiome signatures are scarce. Here, using ultra-deep metagenomic sequencing and machine learning tools, we defined a simple two-factor decision tree, based on Tannerella forsythia and Fretibacterium fastidiosum, that was highly associated with periodontitis.
Altogether, we defined robust oral microbiome signatures relevant to the pathophysiology of periodontitis that can help define promising targets for microbiome therapeutic modulation when caring for patients with periodontitis.
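As a rough sketch of what a two-factor decision tree and its area under the curve look like in practice, the example below scores samples from the relative abundances of the two species and computes the AUC by pairwise ranking. The thresholds, leaf scores, and sample values are hypothetical placeholders chosen for illustration; the abstract does not report the fitted tree, and this is not the authors' model.

```python
def two_factor_tree(t_forsythia, f_fastidiosum, t_cut=0.01, f_cut=0.005):
    """Depth-two rule returning a periodontitis score in [0, 1].

    The cut-offs and leaf scores are hypothetical, not values from the study.
    """
    if t_forsythia >= t_cut:
        return 1.0 if f_fastidiosum >= f_cut else 0.6
    return 0.4 if f_fastidiosum >= f_cut else 0.0


def auc(scores, labels):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical samples: (T. forsythia abundance, F. fastidiosum abundance, label),
# where label 1 = periodontitis and 0 = healthy.
samples = [
    (0.030, 0.010, 1),
    (0.020, 0.001, 1),
    (0.002, 0.008, 1),
    (0.001, 0.000, 0),
    (0.000, 0.006, 0),
    (0.004, 0.001, 0),
    (0.015, 0.002, 0),
]

scores = [two_factor_tree(t, f) for t, f, _ in samples]
labels = [y for _, _, y in samples]
print(f"toy AUC = {auc(scores, labels):.3f}")
```

The pairwise definition of AUC is used because it needs no external library; on real data a fitted tree and a held-out evaluation set would replace the hand-written rule and toy samples.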
... As high-throughput sequencing methods have advanced, metagenomic sequencing and its associated analytical methods and computational tools provide new strategies for studying the metabolic potential, microbial interaction, and viral diversity of microbial communities [21][22][23][24]. The whole metabolic potential of microbial communities has been extensively studied in many natural ecologies, such as mangrove ecosystems [25,26], glacial ecosystems [27], and underground estuaries [28]. ...
Article
Full-text available
Background The ecosystems of marine ranching have enhanced marine biodiversity and ecological balance and have promoted the natural recovery and enhancement of fishery resources. The microbial communities of these ecosystems, including bacteria, fungi, protists, and viruses, are the drivers of biogeochemical cycles. Although seasonal changes in microbial communities are critical for ecosystem functioning, the current understanding of microbial-driven metabolic properties and their viral communities in marine sediments remains limited. Here, we employed amplicon (16S and 18S) and metagenomic approaches aiming to reveal the seasonal patterns of microbial communities, bacterial-eukaryotic interactions, whole metabolic potential, and their coupling mechanisms with carbon (C), nitrogen (N), and sulfur (S) cycling in marine ranching sediments. Additionally, the characterization and diversity of viral communities in different seasons were explored in marine ranching sediments. Results The current study demonstrated that seasonal variations dramatically affected the diversity of microbial communities in marine ranching sediments and the bacterial-eukaryotic interkingdom co-occurrence networks. Metabolic reconstruction of the 113 medium to high-quality metagenome-assembled genomes (MAGs) was conducted, and a total of 8 MAGs were found to be involved in key metabolic genes and pathways (methane oxidation - denitrification - S oxidation), suggesting a possible coupling effect between the C, N, and S cycles. In total, 338 viral operational taxonomic units (vOTUs) were identified, all possessing specific ecological characteristics in different seasons and primarily belonging to Caudoviricetes, revealing their widespread distribution and variety in marine sediment ecosystems. In addition, predicted virus-host linkages showed that high host specificity was observed, with few viruses associated with specific hosts.
Conclusions This finding deepens our knowledge of element cycling and viral diversity in fisheries enrichment ecosystems, providing insights into microbial-virus interactions in marine sediments and their effects on biogeochemical cycling. These findings have potential applications in marine ranching management and ecological conservation.
... The linear discriminant analysis effect size (LEfSe) based on linear discriminant analysis (LDA) was used to estimate the magnitude of the effect of abundance of each species on the differential effect and to screen for differential species between groups (Quince et al., 2017). We subjected UHF-C, PHF-C, and EHF-C to analysis to assess differential species at different levels of hydrolysis for formulation powders prepared with bovine WPC (Figure 5e,f). ...
Article
Full-text available
Milk protein sensitivity is a major challenge in infant feeding, especially for infants who cannot receive adequate breastfeeding. Hydrolyzed milk protein is a mainstream way to address this difficulty. The aim of this study was to assess the effect of differences in whey protein concentrate (WPC) source and the degree of hydrolysis on blocking allergy and to analyze the possible mechanisms by which hydrolyzed infant formula (IF) blocks allergy through colony‐metabolism–immunity response. First, we prepared six groups of goat's milk IF with unhydrolyzed, partially, and extensively hydrolyzed WPC, which come from cow's milk WPC and goat's milk WPC. Subsequently, we evaluated their effects on allergy. The results showed that the hydrolyzed IF improved the allergic characteristics of mice, including low levels of total immunoglobulin E (IgE), specific IgE, histamine, and mucosal mast cell protease‐1 (mMCP‐1). Furthermore, the hydrolyzed IF promoted the immune response of T helper 1 (Th1) and regulatory T (Treg) cells by enhancing the messenger RNA (mRNA) expression of T‐box transcription factor 21 (T‐bet) and forkhead box protein P3 (Foxp3), which in turn suppressed the T helper 2 (Th2) overexpressed immune response in allergy (GATA‐binding protein 3 (GATA‐3) and retinoic‐acid‐receptor‐related orphan receptor gamma t (RORγt) mRNA expression, as well as interleukin 4 (IL‐4) and interleukin 5 (IL‐5) levels). Hydrolyzed IF promoted an increase in beneficial gut microbe Lactobacillus and Alistipes, which in turn promoted an increase in intestinal butyrate levels. The beneficial bacteria and their metabolized butyrate may have suppressed the abundance of the allergy‐characterizing bacterium Rikenellaceae‐RC9‐gut‐group. The final result we obtained was that for both cow's milk WPC and goat's milk WPC, at similar levels of hydrolysis, they did not bring about a significant effect on allergy symptoms. 
The hydrolyzed IF improved the allergic characteristics of mice; the deeper the degree of hydrolysis of WPC, the more obvious the effect of reducing allergic symptoms in model mice.
... The ability to reconstruct nearly complete metagenome-assembled genomes (MAGs) from environmental metagenomes has facilitated the recovery of many previously uncharacterized microbes and the expansion of the phylogenetic and functional diversity of uncultivated Bacteria and Archaea [15,16]. Through reconstructed MAGs, members of the newly defined superphylum Patescibacteria, consisting of uncultured, deeply branching lineages in bacteria, have been found in a variety of environments, including groundwater [17], soils [18], marine sediments [19], and the human oral cavity [20]. ...
Article
Full-text available
Background Built environments (BEs) are typically considered to be oligotrophic and harsh environments for microbial communities under normal, non-damp conditions. However, the metabolic functions of microbial inhabitants in BEs remain poorly understood. This study aimed to shed light on the functional capabilities of microbes in BEs by analyzing 860 representative metagenome-assembled genomes (rMAGs) reconstructed from 738 samples collected from BEs across the city of Hong Kong and from the skin surfaces of human occupants. The study specifically focused on the metabolic functions of rMAGs that are either phylogenetically novel or prevalent in BEs. Results The diversity and composition of BE microbiomes were primarily shaped by the sample type, with Micrococcus luteus and Cutibacterium acnes being prevalent. The metabolic functions of rMAGs varied significantly based on taxonomy, even at the strain level. A novel strain affiliated with the Candidatus class Xenobia in the Candidatus phylum Eremiobacterota and two novel strains affiliated with the superphylum Patescibacteria exhibited unique functions compared with their close relatives, potentially aiding their survival in BEs and on human skins. The novel strains in the class Xenobia possessed genes for transporting nitrate and nitrite as nitrogen sources and nitrosative stress mitigation induced by nitric oxide during denitrification. The two novel Patescibacteria strains both possessed a broad array of genes for amino acid and trace element transport, while one of them carried genes for carotenoid and ubiquinone biosynthesis. The globally prevalent M. luteus in BEs displayed a large and open pangenome, with high infraspecific genomic diversity contributed by 11 conspecific strains recovered from BEs in a single geographic region. The versatile metabolic functions encoded in the large accessory genomes of M. luteus may contribute to its global ubiquity and specialization in BEs. 
Conclusions This study illustrates that the microbial inhabitants of BEs possess metabolic potentials that enable them to tolerate and counter different biotic and abiotic conditions. Additionally, these microbes can efficiently utilize various limited residual resources from occupant activities, potentially enhancing their survival and persistence within BEs. A better understanding of the metabolic functions of BE microbes will ultimately facilitate the development of strategies to create a healthy indoor microbiome.
... Consequently, homology-based approaches are often ineffective when dealing with new species. Additionally, the slow data processing speed of homology-based methods severely restricts their utility [27]. ...
Article
Full-text available
Background The rapid advancements in deep neural network models have significantly enhanced the ability to extract features from microbial sequence data, which is critical for addressing biological challenges. However, the scarcity and complexity of labeled microbial data pose substantial difficulties for supervised learning approaches. To address these issues, we propose DNASimCLR, an unsupervised framework designed for efficient gene sequence data feature extraction. Results DNASimCLR leverages convolutional neural networks and the SimCLR framework, based on contrastive learning, to extract intricate features from diverse microbial gene sequences. Pre-training was conducted on two classic large scale unlabelled datasets encompassing metagenomes and viral gene sequences. Subsequent classification tasks were performed by fine-tuning the pretrained model using the previously acquired model. Our experiments demonstrate that DNASimCLR is at least comparable to state-of-the-art techniques for gene sequence classification. For convolutional neural network-based approaches, DNASimCLR surpasses the latest existing methods, clearly establishing its superiority over the state-of-the-art CNN-based feature extraction techniques. Furthermore, the model exhibits superior performance across diverse tasks in analyzing biological sequence data, showcasing its robust adaptability. Conclusions DNASimCLR represents a robust and database-agnostic solution for gene sequence classification. Its versatility allows it to perform well in scenarios involving novel or previously unseen gene sequences, making it a valuable tool for diverse applications in genomics.
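The contrastive objective behind the SimCLR framework mentioned above can be sketched in a few lines. The example below implements the NT-Xent (normalized temperature-scaled cross-entropy) loss used by SimCLR, in plain Python on toy two-dimensional embeddings. The embeddings, the temperature of 0.5, and the function names are assumptions for illustration; the abstract does not state DNASimCLR's exact loss settings, and in a real pipeline the embeddings would come from the convolutional encoder.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nt_xent(view_a, view_b, temperature=0.5):
    """NT-Xent loss over N positive pairs (view_a[i], view_b[i]).

    Each embedding is pulled towards its augmented partner and pushed away
    from the other 2N - 2 embeddings in the batch.
    """
    z = list(view_a) + list(view_b)
    n = len(view_a)
    total = 0.0
    for i in range(2 * n):
        j = (i + n) % (2 * n)  # index of the positive partner of i
        logits = [cosine(z[i], z[k]) / temperature for k in range(2 * n) if k != i]
        positive = cosine(z[i], z[j]) / temperature
        total += math.log(sum(math.exp(x) for x in logits)) - positive
    return total / (2 * n)


# Toy embeddings of two augmented views of two sequences (illustration only).
view_a = [[1.0, 0.0], [0.0, 1.0]]
aligned_b = [[0.9, 0.1], [0.1, 0.9]]      # partners point the same way
misaligned_b = [[0.1, 0.9], [0.9, 0.1]]   # partners swapped

print(f"aligned loss    = {nt_xent(view_a, aligned_b):.3f}")
print(f"misaligned loss = {nt_xent(view_a, misaligned_b):.3f}")
```

The loss is lower when each sequence's two views embed close together, which is what pre-training on unlabelled data optimizes before any fine-tuning on labelled classes.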
... Studying the functional contributions of microorganisms can provide deeper insights into the mechanisms through which microorganisms enhance vertebrates' acclimation capacity (Koziol et al. 2023) and therefore inform strategies to best modulate microbiomes (Shetty et al. 2017). This can be achieved through genome-resolved metagenomics, which allows for the near-complete reconstruction of bacterial genomes from faecal samples (Quince et al. 2017), which can then be functionally annotated to gain insights into their metabolic capacities (Shaffer et al. 2020). The incorporation of RNAseq transcriptomics data not only provides information on functional capacities but also measures the actual functional activity of microorganisms by identifying genes that are differentially expressed (up- or down-regulated) under different conditions. ...
Article
Full-text available
Microorganisms associated with animals harbour a unique set of functional traits pivotal for the normal functioning of their hosts. This realisation has led researchers to hypothesise that animal‐associated microbial communities may boost the capacity of their hosts to acclimatise and adapt to environmental changes, two eco‐evolutionary processes with significant applied relevance. Aiming to assess the importance of microorganisms for wild vertebrate conservation, we conducted a quantitative systematic review to evaluate the scientific evidence for the contribution of gut microorganisms to the acclimation and adaptation capacity of wild vertebrate hosts. After screening 1974 publications, we scrutinised the 109 studies that met the inclusion criteria based on 10 metrics encompassing study design, methodology and reproducibility. We found that the studies published so far were not able to resolve the contribution of gut microorganisms due to insufficient study design and research methods for addressing the hypothesis. Our findings underscore the limited application to date of microbiome knowledge in vertebrate conservation and management, highlighting the need for a paradigm shift in research approaches. Considering these results, we advocate for a shift from observational studies to experimental manipulations, where fitness or related indicators are measured, coupled with an update in molecular techniques used to analyse microbial functions. In addition, closer collaboration with conservation managers and practitioners from the inception of the project is needed to encourage meaningful application of microbiome knowledge in adaptive wildlife conservation management.
... Metagenomics in plant pathology allows researchers to explore the composition and dynamics of microbial communities in different environments, such as phyllosphere (leaf surface), rhizosphere (root zone), and soil. High-throughput sequencing technologies, including Next-Generation Sequencing (NGS), have revolutionized metagenomic studies by enabling the generation of vast amounts of sequence data from diverse environmental samples (Quince et al., 2017). ...
Article
Full-text available
The field of plant pathology has undergone a transformative evolution, transitioning from traditional, labor-intensive methods to the genomic era marked by significant advancements in molecular biology and computational sciences. This shift has revolutionized our understanding of plant diseases and disease resistance. Genomics, particularly Next-Generation Sequencing (NGS) and CRISPR-Cas systems, has played a central role in this transformation. NGS has allowed for comprehensive genome and transcriptome analysis, facilitating the identification of disease resistance genes and the study of gene expression during pathogen attacks. CRISPR-Cas systems have enabled precise genome editing, contributing to our understanding of disease resistance mechanisms and the development of disease-resistant plant varieties. While these advancements offer exciting prospects, they also come with challenges, including data analysis complexity, off-target effects, and ethical considerations. Nevertheless, the genomic era of plant pathology promises to reshape agriculture and disease management, offering sustainable solutions to crop losses and food security challenges. The integration of genomics in plant pathology has revolutionized our understanding of plant-pathogen interactions and disease resistance mechanisms. This article highlights the significance of genomics in various aspects of plant pathology, from the study of microbial communities through metagenomics to the identification and manipulation of disease resistance genes. The use of technologies like Next-Generation Sequencing (NGS) and CRISPR-Cas systems has enabled precise genome analysis and editing, facilitating the development of disease-resistant crop varieties. However, challenges such as regulatory approval, genetic erosion, climate change, and ethical considerations must be addressed. 
Despite these challenges, genomics offers promising opportunities to enhance crop disease resistance and ensure global food security in the face of evolving pathogens and changing environments. Collaboration between researchers, breeders, policymakers, and capacity building in developing countries will be essential to fully leverage the potential of genomics in agriculture.
... In the last decade, metagenomics has accelerated the pace of research into microbial ecology and human-associated microbiomes and into their intimate association with our health (1). Hundreds of thousands of microbial and viral genomes assembled from shotgun metagenomics permit the study of microbes and their interactions with each other and their environment (2)(3)(4). ...
Preprint
Full-text available
Metagenomics has revolutionized environmental and human-associated microbiome studies. However, the limited fraction of proteins with known biological process and molecular functions presents a major bottleneck. In prokaryotes and viruses, evolution favors keeping genes participating in the same biological processes co-localized as conserved gene clusters. Conversely, conservation of gene neighborhood indicates functional association. Spacedust is a tool for systematic, de novo discovery of conserved gene clusters. To find homologous protein matches it uses fast and sensitive structure comparison with Foldseek. Partially conserved clusters are detected using novel clustering and order conservation P-values. We demonstrate Spacedust's sensitivity with an all-vs-all analysis of 1308 bacterial genomes, identifying 72843 conserved gene clusters containing 58% of the 4.2 million genes. It recovered 95% of antiviral defense system clusters annotated by a specialized tool. Spacedust's high sensitivity and speed will facilitate the large-scale annotation of the huge numbers of sequenced bacterial, archaeal and viral genomes.
... Whole-genome shotgun (WGS) sequencing is also a molecular sequencing technique that involves sequencing the entire genetic material in a sample, providing a comprehensive view of the microbiome, including bacteria, archaea, viruses, and fungi [41]. The process requires four steps: sample collection (fecal or intestinal tissue), DNA extraction, sequencing the whole genome via high-throughput sequencing platforms, and, finally, data analysis [42]. In contrast to 16S rRNA sequencing, shotgun metagenomics sequences the entire genome of all organisms present and allows the characterization of the genetic and genomic diversity along with the functional potential of the microbial domains and offers the possibility to assign taxonomy at the species and strain levels [41]. ...
Article
Full-text available
The complex biology of the microbiome was elucidated once the genomics era began. The proteogenomic approach analyzes and integrates genetic makeup (genomics) and microbial communities' expressed proteins (proteomics). Therefore, researchers gained insights into gene expression, protein functions, and metabolic pathways, understanding microbial dynamics and behavior, interactions with host cells, and responses to environmental stimuli. In this context, our work aims to bring together data regarding the application of genomics, proteomics, and bioinformatics in microbiome research and to provide new perspectives for applying microbiota modulation in clinical practice with maximum efficiency. This review also synthesizes data from the literature, shedding light on the potential biomarkers and therapeutic targets for various diseases influenced by the microbiome.
Article
The prenatal period is a critical developmental juncture with enduring effects on offspring health trajectories. An individual's gut microbiome is associated with health and developmental outcomes across the lifespan. Prenatal stress can disrupt an infant's microbiome, thereby increasing susceptibility to adverse outcomes. This cross-species systematic review investigates whether maternal prenatal stress affects the offspring's gut microbiome. The study analyzes 19 empirical, peer-reviewed research articles, including humans, rodents, and non-human primates, that included prenatal stress as a primary independent variable and offspring gut microbiome characteristics as an outcome variable. Prenatal stress appeared to correlate with differences in beta diversity and specific microbial taxa, but not alpha diversity. Prenatal stress is positively correlated with Proteobacteria, Bacteroidaceae, Lachnospiraceae, Prevotellaceae, Bacteroides, and Serratia. Negative correlations were observed for Actinobacteria, Enterobacteriaceae, Streptococcaceae, Bifidobacteria, Eggerthella, Parabacteroides, and Streptococcus. Evidence for the direction of association between prenatal stress and Lactobacillus was mixed. The synthesis of findings was limited by differences in study design, operationalization and timing of prenatal stress, timing of infant microbiome sampling, and microbiome analysis methods.
Chapter
In this chapter, we provide a brief overview of the tools, technologies and models used to interrogate the microbiome and probe its diversity, functionality, and prevalence, along with the caveats of their usage.
Article
Full-text available
During a bacterial infection or colonization, the detection of antimicrobial resistance (AMR) is critical, but slow due to culture-based approaches for clinical and screening samples. Culture-based phenotypic AMR detection and confirmation require up to 72 hours (h) or even weeks for slow-growing bacteria. Direct shotgun metagenomics by long-read sequencing using Oxford Nanopore Technologies (ONT) may reduce the time for bacterial species and AMR gene identification. However, screening swabs for metagenomics is complex due to the range of Gram-negative and -positive bacteria, diverse AMR genes, and host DNA present in the samples. Therefore, DNA extraction is a critical initial step. We aimed to compare the performance of different DNA extraction protocols for ONT applications to reliably identify species and AMR genes using a shotgun long-read metagenomic approach. We included three different sample types: ZymoBIOMICS Microbial Community Standard, an in-house mock community of ESKAPE pathogens including Enterococcus faecium, Staphylococcus aureus, Klebsiella pneumoniae, Acinetobacter baumannii, Pseudomonas aeruginosa, and Escherichia coli (ESKAPE Mock), and anonymized clinical swab samples. We processed all sample types with four different DNA extraction kits utilizing different lysis (enzymatic vs. mechanical) and purification (spin-column vs. magnetic beads) methods. We used kits from Qiagen (QIAamp DNA Mini and QIAamp PowerFecal Pro DNA) and Promega (Maxwell RSC Cultured Cells and Maxwell RSC Buccal Swab DNA). After extraction, samples were subject to the Rapid Barcoding Kit (RBK004) for library preparation followed by sequencing on the GridION with R9.4.1 flow cells. The fast5 files were base called to fastq files using Guppy in High Accuracy (HAC) mode with the inbuilt MinKNOW software. Raw read quality was assessed using NanoPlot and human reads were removed using Minimap2 alignment against the Hg38 genome. 
Taxonomy identification was performed on the raw reads using Kraken2 and on assembled contigs using Minimap2. The AMR genes were identified using Minimap2 with alignment against the CARD database on both the raw reads and assembled contigs. We identified all bacterial species present in the Zymo Mock Community (8/8) and ESKAPE Mock (6/6) with Qiagen PowerFecal Pro DNA kit (chemical and mechanical lysis) at read and assembly levels. Enzymatic lysis retrieved fewer aligned bases for the Gram-positive species (Staphylococcus aureus and Enterococcus faecium) from the ESKAPE Mock on the assembly level compared to the mechanical lysis. We detected the AMR genes from Gram-negative and -positive species in the ESKAPE Mock with the QIAamp PowerFecal Pro DNA kit on reads level with a maximum median time of 1.9 h of sequencing. Long-read metagenomics with ONT may reduce the turnaround time in screening for AMR genes. Currently, the QIAamp PowerFecal Pro DNA kit (chemical and mechanical lysis) for DNA extraction along with the Rapid Barcoding Kit for the ONT sequencing captured the best taxonomy and AMR identification for our specific use case. Supplementary Information The online version contains supplementary material available at 10.1038/s41598-024-80660-3.
Article
Full-text available
Hologenomics, the joint analysis of host genomes and microbial metagenomes, has the potential to address fundamental biological questions from a systemic host‐microbiota perspective. However, multiple fieldwork, laboratory and bioinformatic steps challenge quality, representativeness and comparability of hologenomic data. Leveraging the first 2025 samples sourced from 151 wild vertebrate species analysed in the Earth Hologenome Initiative, we scrutinise hologenomic data generation steps, including laboratory and bioinformatic procedures. Comparisons across taxa and sample types provide novel insights into the relationships between laboratory quality metrics and derived data, the variation of host, prokaryotic and non‐prokaryotic fractions of shotgun data, and the relationship between data quality and quantity with genome and metagenome reconstruction. Our results show that faecal samples are significantly better than anal and cloacal swabs to study intestinal microbiomes using genome‐resolved metagenomics. We also report that birds and bats both have substantially lower microbial DNA fractions and a higher degree of sample‐to‐sample variability compared to amphibians, reptiles and non‐flying mammals. Based on these data, we provide suggestions to the field for robustly and efficiently generating hologenomic data from wild vertebrates.
Article
Coral reefs, the rainforests of the sea, are vital hotspots for marine biodiversity. However, the persistent challenge of climate change directly threatens the delicate balance of coral reef ecosystems, impacting myriad species and critical ecosystem services. Therefore, this comprehensive review critically discusses the associated challenges in assessing and preserving coral reef diversity, emphasizing the need for novel biomonitoring techniques due to the elusive and cryptic nature of many reef organisms. The review focuses on environmental DNA (eDNA) analysis as a non-invasive tool for coral species monitoring at various ecological levels. The review highlights that using eDNA in coral reef monitoring requires careful consideration of multiple factors, such as strategic assay development, optimization, and marker selection, substrate selection, and sample volume, which are critical for maximizing the probability of species detection. Moreover, integrating environmental RNA (eRNA) provides additional insights into temporal aspects advancing the coral reef biodiversity research and conservation efforts.
Article
Full-text available
The rapid advancements in sequencing technologies and bioinformatics have enabled metagenomic research of complex microbial systems, but reliable results depend on consistent laboratory and bioinformatics approaches. Current efforts to identify best practices often focus on optimizing specific steps, making it challenging to understand the influence of each stage on microbial population analysis and compare data across studies. This study evaluated DNA extraction, library construction methodologies, sequencing platforms, and computational approaches using a dog stool sample, two synthetic microbial community mixtures, and various sequencing data sources. Our work is the most comprehensive evaluation of metagenomic methods to date. We developed a software tool, termed minitax, which provides consistent results across the range of platforms and methodologies. Our findings showed that the Zymo Research Quick-DNA HMW MagBead Kit, Illumina DNA Prep library preparation method, and the minitax bioinformatics tool were the most effective for high-quality microbial diversity analysis. However, the effectiveness of pipelines or method combinations is sample-specific, making it difficult to identify a universally optimal approach. Therefore, employing multiple approaches is crucial for obtaining reliable outcomes in microbial systems.
Article
Full-text available
Introduction The 1980 eruption of Mount St. Helens had devastating effects above and belowground in forested montane ecosystems, including the burial and destruction of soil microbes. Soil microbial propagules and legacies in recovering ecosystems are important for determining post-disturbance successional trajectories. Soil microorganisms regulate nutrient cycling, interact with many other organisms, and therefore may support successional pathways and complementary ecosystem functions, even in harsh conditions. Historic forest management methods, such as old-growth and clearcut regimes, and locations of small mammal enclo and clearcut forests, as well as in locations of historic short-term gopher enclosures (Thomomys talpoides), to evaluate community response to forest management practices and to examine vectors for dispersing microbial consortia to the surface of the volcanic landscape. These biotic interactions may have primed ecological succession in the volcanic landscape, specifically Bear Meadow and the Pumice Plain, by creating microsite conditions conducive to primary succession and plant establishment. Methods and results Using molecular techniques, we examined bacterial, fungal, and AMF communities to determine how these variables affected microbial communities and soil properties. We found that bacterial/archaeal 16S, fungal ITS2, and AMF SSU community composition varied among forestry practices and across sites with long-term lupine plots and gopher enclosures. The findings also related to detected differences in C and N concentrations and ratios in soil from our study sites. Fungal communities from previously clearcut locations were less diverse than in gopher plots within the Pumice Plain. Yet, clearcut meadows harbored fewer ancestral AM fungal taxa than were found within the old-growth forest.
Discussion: By investigating both forestry practices and mammals in microbial dispersal, we evaluated how these interactions may have promoted revegetation and ecological succession within the Pumice Plain of Mount St. Helens. In addition to providing evidence about how dispersal vectors and forest structure influence post-eruption soil microbiomes, this project also informs research and management communities about belowground processes and microbial functional traits in facilitating succession and ecosystem function.
Article
Full-text available
A healthy ocular surface is inhabited by microorganisms that constitute the ocular microbiome. The core of the ocular microbiome is still a subject of debate. Numerous culture-dependent and gene sequencing studies have revealed the composition of the ocular microbiome. There was a confirmed correlation between the ocular microbiome and ocular surface homeostasis as well as between ocular dysbiosis and pathologies such as blepharitis, microbial keratitis, and conjunctivitis. However, the role of the ocular microbiome in the pathogenesis and treatment of ocular surface diseases remains unclear. This article reviews available data on the ocular microbiome and microbiota, their role in maintaining ocular homeostasis, and the impact of dysbiosis on several ophthalmic disorders. Moreover, we aimed to discuss potential treatment targets within the ocular microbiota.
Article
Full-text available
The implementation of omics technologies and associated bioinformatics approaches holds significant promise for generating additional evidence for food and feed risk assessments, thereby enhancing the European Food Safety Authority (EFSA) capacity to deliver scientific opinions and guidance documents in the future. To explore this possibility, EFSA launched a Call for the development of a roadmap to identify the main actions needed for a wider use of Omics in future risk assessments. To address this objective, this action roadmap outlines six project proposals. These proposals are based on a comprehensive mapping of the state‐of‐the‐art omics and associated bioinformatics technologies in research, EFSA's activities as well as current and planned activities from other relevant regulatory bodies and organisations. The outlined recommendations also address some of the identified main knowledge gaps and highlight the added value that further investments in the different food & feed safety scientific domains could bring. In addition, the work in this roadmap addresses some key challenges and blockers that might hinder a wider integration of omics in risk assessment and leverages the opportunities for cooperation with external stakeholders. Finally, this roadmap provides suggestions on how EFSA may more broadly and effectively engage with relevant stakeholders in the use of omics technologies and associated bioinformatics approaches in regulatory science.
Book
"Teknik Desain Primer Real-Time PCR" is a comprehensive guide that explores the intricacies of primer design for Real-Time PCR applications. The book combines fundamental theory with current practice, guiding readers from basic concepts to advanced strategies in primer optimization. Beginning with an introduction to Real-Time PCR, the book then examines the characteristics of an ideal primer and the critical factors in its design. Readers will learn the systematic steps of the design process, including target sequence selection, GC content considerations, and strategies for avoiding unwanted secondary structures. The book also covers the use of various tools and software for primer design, as well as optimization and validation techniques. Practical case studies provide insight into real-world applications. A standout feature is the chapter on recent trends, which explores the use of AI, nanotechnology, and other innovative approaches in primer design. With its emphasis on integrating traditional knowledge and cutting-edge technology, this book is a valuable resource for researchers, laboratory technicians, and students who want to master the art and science of primer design for Real-Time PCR.
Article
Full-text available
The early-life gut microbiota (GM) is increasingly recognized for its contributions to human health and disease over time. Microbiota composition, influenced by factors like race, geography, lifestyle, and individual differences, is subject to change. The GM serves dual roles, defending against pathogens and shaping the host immune system. Disruptions in microbial composition can lead to immune dysregulation, impacting defense mechanisms. Additionally, the GM aids digestion, releasing nutrients and influencing physiological systems like the liver, brain, and endocrine system through microbial metabolites. Dysbiosis disrupts intestinal homeostasis, contributing to age-related diseases. Recent studies are elucidating the bacterial species that characterize a healthy microbiota, defining what constitutes a ‘healthy’ colonic microbiota. The present review article focuses on the importance of microbiome composition for the development of homeostasis and on the roles of the GM during aging and in the age-related diseases caused by alterations in gut microbial communities. This article may also help readers identify GM-targeted treatments for effectively preventing the diseases linked to it.
Article
Despite the ecological and biotechnological significance of microalgae-bacterium symbionts, the response of host-virus interactions to external environmental fluctuations and the role of viruses in the phycosphere remain largely unexplored. Herein, we employed algal-bacterial granular sludge (ABGS) with varying light intensity and organic carbon loading to investigate the mechanisms of microalgae-bacterium-virus symbionts in response to environmental fluctuations. Metagenomics revealed that enhanced light intensity decreased the diversity of microalgae as well as the diversity of symbiotic bacteria and viruses. As carbon sources decreased, bacteria promoted horizontal gene transfer in the phycosphere by 12.76%-157.40% and increased the proportion of oligotrophs as keystone species (0.00% vs 14.29%) as well as of viruses using oligotrophs as hosts (18.52% vs 25.00%). Furthermore, viruses carried auxiliary metabolic genes (AMGs) and biosynthetic gene clusters (BGCs) encoding vitamin B12 synthesis (e.g., cobS), antioxidation (e.g., queC), and microbial aggregation (e.g., cysE). Additionally, phylogenetic and similarity analysis further revealed the evolutionary origin and potential horizontal transfer of the AMGs and BGCs, which could potentially enhance the adaptability of bacteria and eukaryotic microalgae. Overall, our research demonstrates that environmental fluctuations have cascading effects on the microalgae-bacteria-virus interactions, and emphasizes the important role of viruses in maintaining the stability of the phycosphere symbiotic community.
Article
Tremendous advances in mass spectrometric and bioinformatic approaches have expanded proteomics into the field of microbial ecology. The commonly used spectral annotation method for metaproteomics data relies on database searching, which requires sample-specific databases obtained from whole metagenome sequencing experiments. However, creating these databases is complex, time-consuming, and prone to errors, potentially biasing experimental outcomes and conclusions. This calls for alternative approaches that can provide rapid and orthogonal insights into metaproteomics data. Here we present NovoLign, a de novo metaproteomics pipeline that performs sequence alignment of de novo sequences from complete metaproteomics experiments. The pipeline enables rapid taxonomic profiling of complex communities and evaluates the taxonomic coverage of metaproteomics outcomes obtained from database searches. Furthermore, the NovoLign pipeline supports the creation of reference sequence databases for database searching to ensure comprehensive coverage. We assessed the NovoLign pipeline for taxonomic coverage and false positive annotations using a wide range of in silico and experimental data, including pure reference strains, laboratory enrichment cultures, synthetic communities, and environmental microbial communities. In summary, we present NovoLign, a de novo metaproteomics pipeline that employs large-scale sequence alignment to enable rapid taxonomic profiling, evaluation of database searching outcomes, and the creation of reference sequence databases. The NovoLign pipeline is publicly available via: https://github.com/hbckleikamp/NovoLign.
Article
Geosmin and 2‐methylisoborneol (MIB) are known to cause taste‐and‐odour problems in recirculating aquaculture systems (RAS). Both geosmin and MIB are microbial metabolites belonging to terpenoids. Precursors for terpenoids are biosynthesized via the methylerythritol phosphate (MEP) and the mevalonate (MVA) pathways. We carried out a metagenomic analysis of 50 samples from five RAS to investigate terpenoid biosynthesis and metabolic potential for geosmin and MIB production in RAS microbiomes. A total of 1008 metagenome‐assembled genomes (MAGs) representing 26 bacterial and three archaeal phyla were recovered. Although most archaea are thought to use the MVA pathway for terpenoid precursor biosynthesis, an Iainarchaeota archaeal MAG is shown to harbour a complete set of genes encoding the MEP pathway but lacking genes associated with the MVA pathway. In this study, a total of 16 MAGs affiliated with five bacterial phyla ( Acidobacteriota , Actinobacteriota , Bacteroidota , Chloroflexota , and Myxococcota ) were identified as possessing potential geosmin or MIB synthases. These putative taste and odour producers were diverse, many were taxonomically unidentified at the genus or species level, and their relative abundance differed between the investigated RAS farms. The metagenomic study of the RAS microbiomes revealed a previously unknown phylogenetic diversity of the potential to produce geosmin and MIB.
Article
Full-text available
Methods for assembly, taxonomic profiling and binning are key to interpreting metagenome data, but a lack of consensus about benchmarking complicates performance assessment. The Critical Assessment of Metagenome Interpretation (CAMI) challenge has engaged the global developer community to benchmark their programs on highly complex and realistic data sets, generated from ~700 newly sequenced microorganisms and ~600 novel viruses and plasmids and representing common experimental setups. Assembly and genome binning programs performed well for species represented by individual genomes but were substantially affected by the presence of related strains. Taxonomic profiling and binning programs were proficient at high taxonomic ranks, with a notable performance decrease below family level. Parameter settings markedly affected performance, underscoring their importance for program reproducibility. The CAMI results highlight current challenges but also provide a roadmap for software selection to answer specific research questions.
Article
Full-text available
Background: We introduce DESMAN for De novo Extraction of Strains from MetAgeNomes. Metagenome sequencing generates short reads from throughout the genomes of a microbial community. Increasingly large, multi-sample metagenomes, stratified in space and time, are being generated from communities with thousands of species. Repeats result in fragmentary co-assemblies with potentially millions of contigs. Contigs can be binned into metagenome assembled genomes (MAGs) but strain level variation will remain. DESMAN identifies variants on core genes, then uses co-occurrence across samples to link variants into strain sequences and abundance profiles. These strain profiles are then searched for on non-core genes to determine the accessory genes present in each strain. Results: We validated DESMAN on a synthetic twenty genome community with 64 samples. We could resolve the five E. coli strains present with 99.58% accuracy across core gene variable sites and their gene complement with 95.7% accuracy. Similarly, on real fecal metagenomes from the 2011 E. coli (STEC) O104:H4 outbreak, the outbreak strain was reconstructed with 99.8% core sequence accuracy. Application to an anaerobic digester metagenome time series reveals that strain level variation is endemic, with 16 out of 26 MAGs (61.5%) examined exhibiting two strains. In almost all cases the strain proportions were not statistically different between replicate reactors, suggesting intra-species niche partitioning. The only exception was when the two strains had almost identical gene complement and, hence, functional capability. Conclusions: DESMAN will provide a powerful tool for de novo resolution of fine-scale variation in microbial communities. It is available as open source software from https://github.com/chrisquince/DESMAN.
Article
Full-text available
Among the human health conditions linked to microbial communities, phenotypes are often associated with only a subset of strains within causal microbial groups. While it has been critical for decades in microbial physiology to characterize individual strains, this has been challenging when using culture-independent high-throughput metagenomics. We introduce StrainPhlAn, a novel metagenomic strain identification approach, and apply it to characterize the genetic structure of thousands of strains from >125 species in >1,500 gut metagenomes drawn from populations spanning North/South American, European, Asian, and African countries. The method relies on per-sample dominant sequence variant reconstruction within species-specific marker genes. It identified primarily subject-specific strain variants (<5% inter-subject strain sharing), and we determined that a single strain typically dominated each species and was retained over time (for >70% of species). Microbial population structure was correlated in several distinct ways with the geographic structure of the host population. In some cases discrete subspecies (e.g. for Eubacterium rectale and Prevotella copri) or continuous microbial genetic variations (e.g. for Faecalibacterium prausnitzii) were associated with geographically distinct human populations, whereas few strains occurred in multiple unrelated cohorts. We further estimated the genetic variability of gut microbes, with Bacteroides species appearing remarkably consistent (0.45% median number of nucleotide variants between strains) whereas P. copri was among the most plastic gut colonizers. We thus characterize here the population genetics of previously inaccessible intestinal microbes, providing a comprehensive strain-level genetic overview of the gut microbial diversity.
Article
Full-text available
Background: Microbiome sequencing projects typically collect tens of millions of short reads per sample. Depending on the goals of the project, the short reads can either be subjected to direct sequence analysis or be assembled into longer contigs. The assembly of whole genomes from metagenomic sequencing reads is a very difficult problem. However, for some questions, only specific genes of interest need to be assembled. This is then a gene-centric assembly where the goal is to assemble reads into contigs for a family of orthologous genes. Methods: We present a new method for performing gene-centric assembly, called protein-alignment-guided assembly, and provide an implementation in our metagenome analysis tool MEGAN. Genes are assembled on the fly, based on the alignment of all reads against a protein reference database such as NCBI-nr. Specifically, the user selects a gene family based on a classification such as KEGG and all reads binned to that gene family are assembled. Results: Using published synthetic community metagenome sequencing reads and a set of 41 gene families, we show that the performance of this approach compares favorably with that of full-featured assemblers and that of a recently published HMM-based gene-centric assembler, both in terms of the number of reference genes detected and of the percentage of reference sequence covered. Conclusions: Protein-alignment-guided assembly of orthologous gene families complements whole-metagenome assembly in a new and very useful way.
Article
Full-text available
We present the Metagenomic Intra-species Diversity Analysis System (MIDAS), which is an integrated computational pipeline for quantifying bacterial species abundance and strain-level genomic variation, including gene content and single-nucleotide polymorphisms (SNPs), from shotgun metagenomes. Our method leverages a database of more than 30,000 bacterial reference genomes that we clustered into species groups. These cover the majority of abundant species in the human microbiome but only a small proportion of microbes in other environments, including soil and seawater. We applied MIDAS to stool metagenomes from 98 Swedish mothers and their infants over one year and used rare SNPs to track strains between hosts. Using this approach, we found that although species compositions of mothers and infants converged over time, strain-level similarity diverged. Specifically, early colonizing bacteria were often transmitted from an infant's mother, while late colonizing bacteria were often transmitted from other sources in the environment and were enriched for spore-formation genes. We also applied MIDAS to 198 globally distributed marine metagenomes and used gene content to show that many prevalent bacterial species have population structure that correlates with geographic location. Strain-level genetic variants present in metagenomes clearly reveal extensive structure and dynamics that are obscured when data are analyzed at a coarser taxonomic resolution.
Article
Full-text available
Shotgun metagenomic analysis of the human associated microbiome provides a rich set of microbial features for prediction and biomarker discovery in the context of human diseases and health conditions. However, the use of such high-resolution microbial features presents new challenges, and validated computational tools for learning tasks are lacking. Moreover, classification rules have scarcely been validated in independent studies, posing questions about the generality and generalization of disease-predictive models across cohorts. In this paper, we comprehensively assess approaches to metagenomics-based prediction tasks and for quantitative assessment of the strength of potential microbiome-phenotype associations. We develop a computational framework for prediction tasks using quantitative microbiome profiles, including species-level relative abundances and presence of strain-specific markers. A comprehensive meta-analysis, with particular emphasis on generalization across cohorts, was performed in a collection of 2424 publicly available metagenomic samples from eight large-scale studies. Cross-validation revealed good disease-prediction capabilities, which were in general improved by feature selection and use of strain-specific markers instead of species-level taxonomic abundance. In cross-study analysis, models transferred between studies were in some cases less accurate than models tested by within-study cross-validation. Interestingly, the addition of healthy (control) samples from other studies to training sets improved disease prediction capabilities. Some microbial species (most notably Streptococcus anginosus) seem to characterize general dysbiotic states of the microbiome rather than connections with a specific disease. Our results in modelling features of the “healthy” microbiome can be considered a first step toward defining general microbial dysbiosis. 
The software framework, microbiome profiles, and metadata for thousands of samples are publicly available at http://segatalab.cibio.unitn.it/tools/metaml.
Article
Full-text available
“Normal” for the gut microbiota: For the benefit of future clinical studies, it is critical to establish what constitutes a “normal” gut microbiome, if it exists at all. Through fecal samples and questionnaires, Falony et al. and Zhernakova et al. targeted general populations in Belgium and the Netherlands, respectively. Gut microbiota composition correlated with a range of factors including diet, use of medication, red blood cell counts, fecal chromogranin A, and stool consistency. The data give some hints for possible biomarkers of normal gut communities. Science, this issue pp. 560 and 565
Article
Full-text available
The tree of life is one of the most important organizing principles in biology [1]. Gene surveys suggest the existence of an enormous number of branches [2], but even an approximation of the full scale of the tree has remained elusive. Recent depictions of the tree of life have focused either on the nature of deep evolutionary relationships [3, 4, 5] or on the known, well-classified diversity of life with an emphasis on eukaryotes [6]. These approaches overlook the dramatic change in our understanding of life's diversity resulting from genomic sampling of previously unexamined environments. New methods to generate genome sequences illuminate the identity of organisms and their metabolic capacities, placing them in community and ecosystem contexts [7, 8]. Here, we use new genomic data from over 1,000 uncultivated and little known organisms, together with published sequences, to infer a dramatically expanded version of the tree of life, with Bacteria, Archaea and Eukarya included. The depiction is both a global overview and a snapshot of the diversity within each major lineage. The results reveal the dominance of bacterial diversification and underline the importance of organisms lacking isolated representatives, with substantial evolution concentrated in a major radiation of such organisms. This tree highlights major lineages currently underrepresented in biogeochemical models and identifies radiations that are probably important for future evolutionary analyses.
Article
Full-text available
Background: In the last 5 years, the rapid pace of innovations and improvements in sequencing technologies has completely changed the landscape of metagenomic and metagenetic experiments. Therefore, it is critical to benchmark the various methodologies for interrogating the composition of microbial communities, so that we can assess their strengths and limitations. The most common phylogenetic marker for microbial community diversity studies is the 16S ribosomal RNA gene, and in the last 10 years the field has moved from sequencing a small number of amplicons and samples to more complex studies where thousands of samples and multiple different gene regions are interrogated.
Article
Full-text available
Profiling microbial community function from metagenomic sequencing data remains a computationally challenging problem. Mapping millions of DNA reads from such samples to reference protein databases requires long run-times, and short read lengths can result in spurious hits to unrelated proteins (loss of specificity). We developed ShortBRED (Short, Better Representative Extract Dataset) to address these challenges, facilitating fast, accurate functional profiling of metagenomic samples. ShortBRED consists of two components: (i) a method that reduces reference proteins of interest to short, highly representative amino acid sequences ("markers") and (ii) a search step that maps reads to these markers to quantify the relative abundance of their associated proteins. After evaluating ShortBRED on synthetic data, we applied it to profile antibiotic resistance protein families in the gut microbiomes of individuals from the United States, China, Malawi, and Venezuela. Our results support antibiotic resistance as a core function in the human gut microbiome, with tetracycline-resistant ribosomal protection proteins and Class A beta-lactamases being the most widely distributed resistance mechanisms worldwide. ShortBRED markers are applicable to other homology-based search tasks, which we demonstrate here by identifying phylogenetic signatures of antibiotic resistance across more than 3,000 microbial isolate genomes. ShortBRED can be applied to profile a wide variety of protein families of interest; the software, source code, and documentation are available for download at http://huttenhower.sph.harvard.edu/shortbred.
Article
Full-text available
Identifying bacterial strains in metagenome and microbiome samples using computational analyses of short-read sequences remains a difficult problem. Here, we present an analysis of a human gut microbiome using TruSeq synthetic long reads combined with computational tools for metagenomic long-read assembly, variant calling and haplotyping (Nanoscope and Lens). Our analysis identifies 178 bacterial species, of which 51 were not found using shotgun reads alone. We recover bacterial contigs that comprise multiple operons, including 22 contigs of >1 Mbp. Furthermore, we observe extensive intraspecies variation within microbial strains in the form of haplotypes that span up to hundreds of Kbp. Incorporation of synthetic long-read sequencing technology with standard short-read approaches enables more precise and comprehensive analyses of metagenomic samples.
Article
Full-text available
Significance: The field of microbiome research is moving from 16S rDNA gene sequencing to metagenomic sequencing of complete communities, which clearly gives a more comprehensive genomic and functional representation of the organisms present. Here we describe, quantify, and compare biases associated with four currently available next-generation sequencing library preparation methods using a synthetic DNA mock community and an extraction spike-in control of microbial cells. Our study highlights a critical need for consistency in protocols and data analysis procedures, especially when attempting to interpret human microbiome data for human health.
Article
Full-text available
Advances in high-throughput sequencing and ‘omics technologies are revolutionizing studies of naturally occurring microbial communities. Comprehensive investigations of microbial lifestyles require the ability to interactively organize and visualize genetic information and to incorporate subtle differences that enable greater resolution of complex data. Here we introduce anvi’o, an advanced analysis and visualization platform that offers automated and human-guided characterization of microbial genomes in metagenomic assemblies, with interactive interfaces that can link ‘omics data from multiple sources into a single, intuitive display. Its extensible visualization approach distills multiple dimensions of information about each contig, offering a dynamic and unified work environment for data exploration, manipulation, and reporting. Using anvi’o, we re-analyzed publicly available datasets and explored temporal genomic changes within naturally occurring microbial populations through de novo characterization of single nucleotide variations, and linked cultivar and single-cell genomes with metagenomic and metatranscriptomic data. Anvi’o is an open-source platform that empowers researchers without extensive bioinformatics skills to perform and communicate in-depth analyses on large ‘omics datasets.
Article
Full-text available
Assessment and characterization of gut microbiota has become a major research area in human disease, including type 2 diabetes, the most prevalent endocrine disease worldwide. To carry out analysis on gut microbial content in patients with type 2 diabetes, we developed a protocol for a metagenome-wide association study (MGWAS) and undertook a two-stage MGWAS based on deep shotgun sequencing of the gut microbial DNA from 345 Chinese individuals. We identified and validated approximately 60,000 type-2-diabetes-associated markers and established the concept of a metagenomic linkage group, enabling taxonomic species-level analyses. MGWAS analysis showed that patients with type 2 diabetes were characterized by a moderate degree of gut microbial dysbiosis, a decrease in the abundance of some universal butyrate-producing bacteria and an increase in various opportunistic pathogens, as well as an enrichment of other microbial functions conferring sulphate reduction and oxidative stress resistance. An analysis of 23 additional individuals demonstrated that these gut microbial markers might be useful for classifying type 2 diabetes.
Article
Full-text available
We established a catalog of the mouse gut metagenome comprising ∼2.6 million nonredundant genes by sequencing DNA from fecal samples of 184 mice. To secure high microbiome diversity, we used mouse strains of diverse genetic backgrounds, from different providers, kept in different housing laboratories and fed either a low-fat or high-fat diet. Similar to the human gut microbiome, >99% of the cataloged genes are bacterial. We identified 541 metagenomic species and defined a core set of 26 metagenomic species found in 95% of the mice. The mouse gut microbiome is functionally similar to its human counterpart, with 95.2% of its Kyoto Encyclopedia of Genes and Genomes (KEGG) orthologous groups in common. However, only 4.0% of the mouse gut microbial genes were shared (95% identity, 90% coverage) with those of the human gut microbiome. This catalog provides a useful reference for future studies.
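The sharing criterion quoted above (95% identity, 90% coverage) can be illustrated with a small filter over pairwise alignment hits. This is only a minimal sketch: the `Hit` record, its field names, and the choice to measure coverage as aligned length over the query gene length are assumptions made for illustration, not the catalog authors' pipeline.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical alignment record for one query gene against another catalog."""
    query_gene: str
    identity_pct: float   # percent identity over the aligned region
    aligned_length: int   # aligned bases on the query gene
    query_length: int     # full length of the query gene


def shared_fraction(hits, n_query_genes, min_identity=95.0, min_coverage=90.0):
    """Fraction of query genes with at least one hit passing both thresholds."""
    shared = set()
    for hit in hits:
        # Assumed definition: coverage is the aligned share of the query gene.
        coverage = 100.0 * hit.aligned_length / hit.query_length
        if hit.identity_pct >= min_identity and coverage >= min_coverage:
            shared.add(hit.query_gene)  # a gene counts once, however many hits pass
    return len(shared) / n_query_genes
```

Counting each gene once keeps multiple passing hits for the same gene from inflating the shared fraction.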
Article
Full-text available
Grouping large genomic fragments assembled from shotgun metagenomic sequences to deconvolute complex microbial communities, or metagenome binning, enables the study of individual organisms and their interactions. Because of the complex nature of these communities, existing metagenome binning methods often miss a large number of microbial species. In addition, most of the tools are not scalable to large datasets. Here we introduce automated software called MetaBAT that integrates empirical probabilistic distances of genome abundance and tetranucleotide frequency for accurate metagenome binning. MetaBAT outperforms alternative methods in accuracy and computational efficiency on both synthetic and real metagenome datasets. It automatically forms hundreds of high-quality genome bins on a very large assembly consisting of millions of contigs in a matter of hours on a single node. MetaBAT is open source software and available at https://bitbucket.org/berkeleylab/metabat.
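To make the tetranucleotide frequency feature concrete, the sketch below computes a strand-independent 4-mer profile for a single contig. It shows only the raw feature; the empirical probabilistic distances that MetaBAT builds on top of it are not reproduced here, and the function names are hypothetical.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def tetranucleotide_frequencies(sequence):
    """Relative frequencies of canonical 4-mers in one contig.

    Windows containing characters other than A, C, G, T are skipped.
    Returns an empty dict if no valid 4-mer is found.
    """
    seq = sequence.upper()
    counts = Counter()
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if set(kmer) <= set("ACGT"):
            counts[canonical(kmer)] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: count / total for kmer, count in counts.items()}
```

Collapsing each 4-mer with its reverse complement means a contig and its reverse-complemented copy yield the same profile, which matters because assembled contigs have arbitrary strand orientation.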
Article
Full-text available
Targeted manipulation of the gut flora is increasingly being recognized as a means to improve human health. Yet, the temporal dynamics and intra- and interindividual heterogeneity of the microbiome represent experimental limitations, especially in human cross-sectional studies. Therefore, rodent models represent an invaluable tool to study the host-microbiota interface. Progress in technical and computational tools to investigate the composition and function of the microbiome has opened a new era of research and we gradually begin to understand the parameters that influence variation of host-associated microbial communities. To isolate true effects from confounding factors, it is essential to include such parameters in model intervention studies. Also, explicit journal instructions to include essential information on animal experiments are mandatory. The purpose of this review is to summarize the factors that influence microbiota composition in mice and to provide guidelines to improve the reproducibility of animal experiments.
Article
Full-text available
Metagenomic sequencing increased our understanding of the role of the microbiome in health and disease, yet it only provides a snapshot of a highly dynamic ecosystem. Here, we show that the pattern of metagenomic sequencing read coverage for different microbial genomes contains a single trough and a single peak, the latter coinciding with the bacterial origin of replication. Furthermore, the ratio of sequencing coverage between the peak and trough provides a quantitative measure of a species’ growth rate. We demonstrate this in vitro and in vivo, under different growth conditions, and in complex bacterial communities. For several bacterial species, peak-to-trough coverage ratios, but not relative abundances, correlated with the manifestation of inflammatory bowel disease and type II diabetes.
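The peak-to-trough idea can be sketched for a single circular genome as follows. The binning into 100 windows, the per-bin median, and the 3-bin circular moving average are illustrative assumptions, not the published procedure; the function name is hypothetical.

```python
from statistics import median


def peak_to_trough_ratio(coverage, n_bins=100):
    """Rough peak-to-trough coverage ratio for one circular genome.

    `coverage` is a sequence of per-position read depths along the genome.
    Positions left over after splitting into `n_bins` equal bins are ignored.
    """
    if len(coverage) < n_bins:
        raise ValueError("coverage vector is shorter than the number of bins")
    bin_size = len(coverage) // n_bins
    # Summarise each bin by its median so isolated spikes do not dominate.
    binned = [
        median(coverage[i * bin_size:(i + 1) * bin_size])
        for i in range(n_bins)
    ]
    # Smooth with a 3-bin moving average that wraps around the circular genome.
    smoothed = [
        (binned[i - 1] + binned[i] + binned[(i + 1) % n_bins]) / 3
        for i in range(n_bins)
    ]
    trough = min(smoothed)
    if trough <= 0:
        raise ValueError("trough coverage is zero; ratio is undefined")
    return max(smoothed) / trough
```

A flat coverage profile gives a ratio of 1, while a profile that is twice as deep near the origin of replication as near the terminus gives a ratio close to 2 (slightly below it because of the smoothing).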
Article
In comparative high-throughput sequencing assays, a fundamental task is the analysis of count data, such as read counts per gene in RNA-seq, for evidence of systematic changes across experimental conditions. Small replicate numbers, discreteness, large dynamic range and the presence of outliers require a suitable statistical approach. We present DESeq2, a method for differential analysis of count data, using shrinkage estimation for dispersions and fold changes to improve stability and interpretability of estimates. This enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression. The DESeq2 package is available at http://www.bioconductor.org/packages/release/bioc/html/DESeq2.html. Electronic supplementary material The online version of this article (doi:10.1186/s13059-014-0550-8) contains supplementary material, which is available to authorized users.
Article
The increased availability of genomic and metagenomic data poses challenges at multiple analysis levels, including visualization of very large-scale microbial and microbial community data paired with rich metadata. We developed GraPhlAn (Graphical Phylogenetic Analysis), a computational tool that produces high-quality, compact visualizations of microbial genomes and metagenomes. This includes phylogenies spanning up to thousands of taxa, annotated with metadata ranging from microbial community abundances to microbial physiology or host and environmental phenotypes. GraPhlAn has been developed as an open-source command-driven tool in order to be easily integrated into complex, publication-quality bioinformatics pipelines. It can be executed either locally or through an online Galaxy web application. We present several examples including taxonomic and phylogenetic visualization of microbial communities, metabolic functions, and biomarker discovery that illustrate GraPhlAn's potential for modern microbial and community genomics.
Article
A prominent feature of the bacterial domain is a radiation of major lineages that are defined as candidate phyla because they lack isolated representatives. Bacteria from these phyla occur in diverse environments and are thought to mediate carbon and hydrogen cycles. Genomic analyses of a few representatives suggested that metabolic limitations have prevented their cultivation. Here we reconstructed 8 complete and 789 draft genomes from bacteria representing >35 phyla and documented features that consistently distinguish these organisms from other bacteria. We infer that this group, which may comprise >15% of the bacterial domain, has shared evolutionary history, and describe it as the candidate phyla radiation (CPR). All CPR genomes are small and most lack numerous biosynthetic pathways. Owing to divergent 16S ribosomal RNA (rRNA) gene sequences, 50-100% of organisms sampled from specific phyla would evade detection in typical cultivation-independent surveys. CPR organisms often have self-splicing introns and proteins encoded within their rRNA genes, a feature rarely reported in bacteria. Furthermore, they have unusual ribosome compositions. All are missing a ribosomal protein often absent in symbionts, and specific lineages are missing ribosomal proteins and biogenesis factors considered universal in bacteria. This implies different ribosome structures and biogenesis mechanisms, and underlines unusual biology across a large part of the bacterial domain.
Article
Whole-genome sequencing has become an indispensable tool of modern biology. However, the cost of sample preparation relative to the cost of sequencing remains high, especially for small genomes where the former is dominant. Here we present a protocol for rapid and inexpensive preparation of hundreds of multiplexed genomic libraries for Illumina sequencing. By carrying out the Nextera tagmentation reaction in small volumes, replacing costly reagents with cheaper equivalents, and omitting unnecessary steps, we achieve a cost of library preparation of $8 per sample, approximately 6 times cheaper than the standard Nextera XT protocol. Furthermore, our procedure takes less than 5 hours for 96 samples. Several hundred samples can then be pooled on the same HiSeq lane via custom barcodes. Our method will be useful for re-sequencing of microbial or viral genomes, including those from evolution experiments, genetic screens, and environmental samples, as well as for other sequencing applications including large amplicon, open chromosome, artificial chromosomes, and RNA sequencing.
Article
Whole-genome amplification (WGA) has become an important tool to explore the genomic information of microorganisms in an environmental sample with limited biomass; however, potential selective biases during the amplification processes are poorly understood. Here, we describe the effects of WGA on 31 different microbial communities from five biotopes that also included low-biomass samples from drinking water and groundwater. Our findings provide evidence that microbiome segregation by biotope was possible despite WGA treatment. Nevertheless, samples from different biotopes revealed different levels of distortion, with genomic GC content significantly correlated with WGA perturbation. Certain phylogenetic clades revealed a homogeneous trend across various sample types; for instance, Alpha- and Betaproteobacteria showed a decrease in their abundance after WGA treatment. On the other hand, Enterobacteriaceae, an important biomarker group for fecal contamination in groundwater and drinking water, were strongly affected by WGA treatment without a predictable pattern. These novel results describe the impact of WGA on low-biomass samples and may highlight issues to be aware of when designing future metagenomic studies that necessitate preceding WGA treatment.
Article
Microbes are dominant drivers of biogeochemical processes, yet drawing a global picture of functional diversity, microbial community structure, and their ecological determinants remains a grand challenge. We analyzed 7.2 terabases of metagenomic data from 243 Tara Oceans samples from 68 locations in epipelagic and mesopelagic waters across the globe to generate an ocean microbial reference gene catalog with >40 million nonredundant, mostly novel sequences from viruses, prokaryotes, and picoeukaryotes. Using 139 prokaryote-enriched samples, containing >35,000 species, we show vertical stratification with epipelagic community composition mostly driven by temperature rather than other environmental factors or geography. We identify ocean microbial core functionality and reveal that >73% of its abundance is shared with the human gut microbiome despite the physicochemical differences between these two ecosystems. Copyright © 2015, American Association for the Advancement of Science.
Article
Despite extensive direct sequencing efforts and advanced analytical tools, reconstructing microbial genomes from soil using metagenomics has been challenging due to the tremendous diversity and relatively uniform distribution of genomes found in this system. Here we used enrichment techniques in an attempt to decrease the complexity of a soil microbiome prior to sequencing by submitting it to a range of physical and chemical stresses in 23 separate microcosms for 4 months. The metagenomic analysis of these microcosms at the end of the treatment yielded 540 Mb of assembly using standard de novo assembly techniques (a total of 559,555 genes and 29,176 functions), from which we could recover novel bacterial genomes, plasmids and phages. The recovered genomes belonged to Leifsonia (n = 2), Rhodanobacter (n = 5), Acidobacteria (n = 2), Sporolactobacillus (n = 2, novel nitrogen fixing taxon), Ktedonobacter (n = 1, second representative of the family Ktedonobacteraceae), Streptomyces (n = 3, novel polyketide synthase modules), and Burkholderia (n = 2, includes mega-plasmids conferring mercury resistance). Assembled genomes averaged 5.9 Mb, with relative abundances ranging from rare (<0.0001%) to relatively abundant (>0.01%) in the original soil microbiome. Furthermore, we detected them in samples collected from geographically distant locations, more often in temperate soils than in samples originating from high-latitude soils and deserts. To the best of our knowledge, this study is the first successful attempt to assemble multiple bacterial genomes directly from a soil sample. Our findings demonstrate that developing pertinent enrichment conditions can stimulate environmental genomic discoveries that would have been impossible to achieve with canonical approaches that focus solely upon post-sequencing data treatment.
Article
Accurate evaluation of microbial communities is essential for understanding global biogeochemical processes, and can guide bioremediation and medical treatments. Metagenomics is most commonly used to analyze microbial diversity and metabolic potential, but assemblies of the short reads generated by current sequencing platforms may fail to recover heterogeneous strain populations and rare organisms. Here we used short and long (multi-kb) synthetic reads to evaluate strain heterogeneity and study microorganisms at low abundance in complex microbial communities from terrestrial sediments. The long read data revealed multiple (probably dozens of) closely related species and strains from previously undescribed Deltaproteobacteria and Aminicenantes (candidate phylum OP8). Notably, these are the most abundant organisms in the communities, yet short-read assemblies achieved only partial genome coverage, mostly in the form of short scaffolds (N50 of approximately 2,200 bp). Genome architecture and metabolic potential for these lineages were reconstructed using a new synteny-based method. Analysis of long read data also revealed thousands of species, whose abundances were <0.1%, in all samples. Most of the organisms in this 'long tail' of rare organisms belong to phyla that are also represented by abundant organisms. Genes encoding glycosyl hydrolases are significantly more abundant than expected in rare genomes, suggesting that rare species may augment the capability for carbon turnover and confer resilience to changing environmental conditions. Overall, the study showed that a diversity of closely related strains and rare organisms accounts for a major portion of the communities. These are probably common features of many microbial communities, and can be effectively studied using a combination of long and short reads. Published by Cold Spring Harbor Laboratory Press.
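The scaffold N50 quoted in this abstract is a standard assembly contiguity statistic: the length of the scaffold at which the cumulative length of all scaffolds, taken from longest to shortest, first reaches half of the total assembly length. A minimal sketch of that calculation follows; the function name `n50` and the example lengths are illustrative assumptions and do not come from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold or contig lengths (in bp)."""
    if not lengths:
        raise ValueError("at least one length is required")
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold downwards until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical scaffold lengths in bp, chosen only to exercise the function.
example_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(example_lengths))  # prints 8
```

A low N50 relative to expected genome size, as reported for the short-read assemblies here, indicates that the assembly is broken into many small pieces.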
Article
The accessibility of high-throughput sequencing has revolutionized many fields of biology. In order to better understand host-associated viral and microbial communities, a comprehensive workflow for DNA and RNA extraction was developed. The workflow concurrently generates viral and microbial metagenomes, as well as metatranscriptomes, from a single sample for next-generation sequencing. The coupling of these approaches provides an overview of both the taxonomical characteristics and the community encoded functions. The presented methods use Cystic Fibrosis (CF) sputum, a problematic sample type, because it is exceptionally viscous and contains high amounts of mucins, free neutrophil DNA, and other unknown contaminants. The protocols described here target these problems and successfully recover viral and microbial DNA with minimal human DNA contamination. To complement the metagenomics studies, a metatranscriptomics protocol was optimized to recover both microbial and host mRNA that contains relatively few ribosomal RNA (rRNA) sequences. An overview of the data characteristics is presented to serve as a reference for assessing the success of the methods. Additional CF sputum samples were also collected to (i) evaluate the consistency of the microbiome profiles across seven consecutive days within a single patient, and (ii) compare the consistency of the metagenomic approach with 16S ribosomal RNA gene-based sequencing. The results showed that daily fluctuation of microbial profiles without antibiotic perturbation was minimal and the taxonomy profiles of the common CF-associated bacteria were highly similar between the 16S rDNA libraries and metagenomes generated from the hypotonic lysis (HL)-derived DNA. However, the differences between 16S rDNA taxonomical profiles generated from total DNA and HL-derived DNA suggest that hypotonic lysis and the washing steps help remove not only human-derived DNA but also microbial-derived extracellular DNA that may misrepresent the actual microbial profiles.
Article
Several bacterial species have been implicated in the development of colorectal carcinoma (CRC), but CRC-associated changes of fecal microbiota and their potential for cancer screening remain to be explored. Here, we used metagenomic sequencing of fecal samples to identify taxonomic markers that distinguished CRC patients from tumor-free controls in a study population of 156 participants. Accuracy of metagenomic CRC detection was similar to the standard fecal occult blood test (FOBT) and when both approaches were combined, sensitivity improved > 45% relative to the FOBT, while maintaining its specificity. Accuracy of metagenomic CRC detection did not differ significantly between early- and late-stage cancer and could be validated in independent patient and control populations (N = 335) from different countries. CRC-associated changes in the fecal microbiome at least partially reflected microbial community composition at the tumor itself, indicating that observed gene pool differences may reveal tumor-related host–microbe interactions. Indeed, we deduced a metabolic shift from fiber degradation in controls to utilization of host carbohydrates and amino acids in CRC patients, accompanied by an increase of lipopolysaccharide metabolism.
Article
Background The MinION™ is a new, portable single-molecule sequencer developed by Oxford Nanopore Technologies. It measures four inches in length and is powered from the USB 3.0 port of a laptop computer. The MinION™ measures the change in current resulting from DNA strands interacting with a charged protein nanopore. These measurements can then be used to deduce the underlying nucleotide sequence. Findings We present a read dataset from whole-genome shotgun sequencing of the model organism Escherichia coli K-12 substr. MG1655 generated on a MinION™ device during the early-access MinION™ Access Program (MAP). Sequencing runs of the MinION™ are presented, one generated using R7 chemistry (released in July 2014) and one using R7.3 (released in September 2014). Conclusions Base-called sequence data are provided to demonstrate the nature of data produced by the MinION™ platform and to encourage the development of customised methods for alignment, consensus and variant calling, de novo assembly and scaffolding. FAST5 files containing event data within the HDF5 container format are provided to assist with the development of improved base-calling methods.
Article
Background The study of microbial communities has been revolutionised in recent years by the widespread adoption of culture independent analytical techniques such as 16S rRNA gene sequencing and metagenomics. One potential confounder of these sequence-based approaches is the presence of contamination in DNA extraction kits and other laboratory reagents. Results In this study we demonstrate that contamin