Article: Literature Review

Shotgun metagenomics, from sampling to analysis


Abstract

Diverse microbial communities of bacteria, archaea, viruses and single-celled eukaryotes have crucial roles in the environment and in human health. However, microbes are frequently difficult to culture in the laboratory, which can confound cataloging of members and understanding of how communities function. High-throughput sequencing technologies and a suite of computational pipelines have been combined into shotgun metagenomics methods that have transformed microbiology. Still, computational approaches to overcome the challenges that affect both assembly-based and mapping-based metagenomic profiling, particularly of high-complexity samples or environments containing organisms with limited similarity to sequenced genomes, are needed. Understanding the functions and characterizing specific strains of these communities offers biotechnological promise in therapeutic discovery and innovative ways to synthesize products using microbial factories and can pinpoint the contributions of microorganisms to planetary, animal and human health.


... As is the converse, recognizing the bioinformatics steps to be used is highly valuable for setting up study designs and laboratory procedures. While such critical aspects are mentioned in other review articles [13][14][15], strategies to detect and minimize their impact are seldom proposed. ...
... Microbial shotgun metagenomics can be conducted following many different field, laboratory, and bioinformatic procedures, which have been explained in detail elsewhere [13,14]. Once the study design is established, the workflow begins with the collection and preservation of samples, and continues with their laboratory processing to generate DNA libraries, which are bioinformatically processed after DNA sequencing (Figure 1). ...
... If the funds or infrastructure available do not permit sequencing samples deeply enough to recover the desired metagenomic depth (see P11), one possibility is to deplete non-microbial host DNA during laboratory processing [55,56]. Although this often comes at the expense of introducing compositional biases [14], it might be the best option when relative abundances of taxa are not that relevant, such as when building de novo bacterial genome catalogues (see P11 and P15). Alternatively, if a screening using shallow depth sequencing has been performed, researchers can sequence the samples with lower relative amounts of microbial DNA more deeply, or discard the samples with excessive non-microbial DNA. ...
... First, there can be genes that are truly present in the genome the MAG represents but are unobserved in a MAG. Common reasons for this error include inadequate sequencing depth, high diversity in the metagenomes under study, and the inherent limitations of short read sequencing for reconstructing repetitive regions [8][9][10][11][12]. A second type of error in MAGs is erroneously observed genes: genes that are included in a MAG that are not truly present in the originating genome. ...
... Researchers often have to decide how to allocate budget across number of samples (including replicates and control data) and sequencing depth per sample. While existing guidelines for sequencing depth have focused on taxonomy estimation, MAG reconstruction, and gene detection [9][10][11][32][33][34], our proposed modeling approach enables the principled study of the design of shotgun sequencing experiments to maximize power to detect differences in gene presence across sample groups. ...
Article
Full-text available
Recovering metagenome-assembled genomes (MAGs) from shotgun sequencing data is an increasingly common task in microbiome studies, as MAGs provide deeper insight into the functional potential of both culturable and non-culturable microorganisms. However, metagenome-assembled genomes vary in quality and may contain omissions and contamination. These errors present challenges for detecting genes and comparing gene enrichment across sample types. To address this, we propose happi, an approach to testing hypotheses about gene enrichment that accounts for genome quality. We illustrate the advantages of happi over existing approaches using published Saccharibacteria MAGs, Streptococcus thermophilus MAGs, and via simulation.
... Metagenomic approaches (e.g., 16S amplicon and shotgun sequencing) provide relatively simple and rapid ways to profile the taxonomic composition and functional potential of microbial communities and to recover whole genome sequences without the necessity of culturing [14]. Recent metagenomic surveys on desert microbiomes have significantly advanced our current understanding of the composition and function of microbial populations in the global deserts [15][16][17], such as the Atacama Desert [18-21], Namib Desert [22][23][24], Negev Desert [25,26], Gurbantunggut Desert [27] and polar deserts [28,29], laying an important foundation for further in-depth exploration of desert microbial resources. ...
... It is noteworthy that strategies for selective culture enrichment to reduce community complexity may aid the metagenomic studies in specific environmental biomes [14] (e.g., gut [37,38]). However, no studies have jointly employed culturomics and culture-enriched metagenomic sequencing to study the desert microbiome. ...
Article
Full-text available
Deserts occupy one-third of the Earth’s terrestrial surface and represent a potentially significant reservoir of microbial biodiversity, yet the majority of desert microorganisms remain uncharacterized and are seen as “microbial dark matter”. Here, we introduce a multi-omics strategy, culturomics-based metagenomics (CBM), that integrates large-scale cultivation, full-length 16S rRNA gene amplicon, and shotgun metagenomic sequencing. The results showed that CBM captured a significant amount of taxonomic and functional diversity missed in direct sequencing by increasing the recovery of amplicon sequence variants (ASVs) and high/medium-quality metagenome-assembled genomes (MAGs). Importantly, CBM allowed the post hoc recovery of microbes of interest (e.g., novel or specific taxa), even those with extremely low abundance in the culture. Furthermore, strain-level analyses based on CBM and direct sequencing revealed that the desert soils harbored a considerable number of novel bacterial candidates (1941, 51.4%), of which 1095 (from CBM) were culturable. However, CBM would not exactly reflect the relative abundance of true microbial composition and functional pathways in the in situ environment, and its use coupled with direct metagenomic sequencing could provide greater insight into desert microbiomes. Overall, this study shows that the high-resolution CBM strategy is an ideal way to deeply explore the untapped novel bacterial resources in desert soils, and it substantially expands our knowledge of the microbial dark matter hidden in the vast expanse of deserts.
... A summary of the complete workflows for these approaches is shown in Figure 1. (2) In a shotgun metagenomics (MGx) experiment [21], one extracts, shears, and sequences whole genomes from bacterial cells in the community. Therefore, through the production of full genome assemblies, this is the method that is most capable of accurately locating a genome in a phylogenetic tree and thus identifying novel species. ...
Article
Full-text available
The interaction between the microbial communities in the human body and the onset and progression of cancer has not been investigated until recently. The vast majority of the metagenomics research in this area has concentrated on the composition of microbiomes, attempting to link the overabundance or depletion of certain microorganisms to cancer proliferation, metastatic behaviour, and its resistance to therapies. However, studies elucidating the functional implications of the microbiome activity in cancer patients are still scarce; in particular, there is an overwhelming lack of studies assessing such implications directly, through analysis of the transcriptome of the bacterial community. This review summarises the contributions of metagenomics and metatranscriptomics to the knowledge of the microbial environment associated with several cancers; most importantly, it highlights all the advantages that metatranscriptomics has over metagenomics and suggests how such an approach can be leveraged to advance the knowledge of the cancer bacterial environment.
... SMS libraries for gDNA obtained from native and pre-enriched stool samples (hereafter referred to as "native SMS" and "pre-enriched SMS," respectively) were prepared with the NEBNext® Ultra™ II FS DNA Library Prep Kit and sequenced on a NovaSeq 6000 platform (2 × 150-bp paired-end output; minimum depth of raw reads, 10 million [M]) by Eurofins Genomics. Downstream analyses were conducted as suggested by the general guidelines described by Quince et al. (2017). The subsequent SMS raw reads were quality checked with FastQC v0.11.9. ...
... Although pre-enriched SMS is a promising alternative to non-targeted native SMS, further studies should investigate the effect of deep sequencing (beyond 10 M reads) with and without non-targeted pre-enriched SMS (i.e., no addition of antibiotic) and native SMS alone, since performance [i.e., sensitivity to detect specific ARGs (blaCTX-M or blaDHA) or all present ARGs] may simply increase due to the availability of more reads or vice versa, as may have been the case in this study. Furthermore, the pre-analytical effects (e.g., stool collection methods, gDNA isolation kit) should be considered as they may impact the sensitivity of SMS analyses (Thomas et al., 2012; Quince et al., 2017; Guan et al., 2021). ...
Article
Full-text available
We implemented culture- and shotgun metagenomic sequencing (SMS)-based methods to assess the gut colonization with extended-spectrum cephalosporin-resistant Enterobacterales (ESC-R-Ent) in 42 volunteers. Both methods were performed using native and pre-enriched (broth supplemented with cefuroxime) stools. Native culture screening on CHROMID® ESBL plates resulted in 17 positive samples, whereas the pre-enriched culture (gold-standard) identified 23 carriers. Overall, 26 ESC-R-Ent strains (24 Escherichia coli) were identified: 25 CTX-M and 3 DHA-1 producers (2 co-producing CTX-Ms). Using the SMS on native stool (“native SMS”) with thresholds ≥60% for both identity and coverage, only 7 of the 23 pre-enriched culture-positive samples resulted positive for blaCTX-M/blaDHA genes (native SMS reads mapping to blaCTX-M/blaDHAs identified in gold-standard: sensitivity, 59.0%; specificity 100%). Moreover, an average of 31.5 and 24.6 antimicrobial resistance genes (ARGs) were detected in the 23 pre-enriched culture-positive and the 19 negative samples, respectively. When the pre-enriched SMS was implemented, more blaCTX-M/blaDHA genes were detected than in the native assay, including in stools that were pre-enriched culture-negative (pre-enriched SMS reads mapping to blaCTX-M/blaDHAs identified in gold-standard: sensitivity, 78.3%; specificity 75.0%). In addition, the pre-enriched SMS identified on average 38.6 ARGs/sample, whereas for the corresponding native SMS it was 29.4 ARGs/sample. Notably, stools resulting false-negative by using the native SMS had lower concentrations of ESC-R-Ent (average: ~10⁵ vs. ~10⁷ CFU/g) and E. coli classified reads (average: 193,959 vs. 1.45 million) than those of native SMS positive samples. Finally, the detection of blaCTX-M/blaDHA genes was compared with two well-established bioinformatic tools. In conclusion, only the pre-enriched SMS assured detection of most carriers of ESC-R-Ent. However, its performance was not comparable to the pre-enriched culture-based approach.
... It is important to develop tools that make it possible to characterize and understand both its composition and its links with human health and disease. The common approach used to explore the microbiome is to start by charting the species that compose it, quantify their diversity as well as their abundance and eventually, their functional potential. This is made possible by advances of Next-Generation Sequencing (NGS) technologies, without the need to cultivate specific organisms. ...
... Unlike 16S, WGS data are highly resolutive and more complex, enabling differentiation down to the strain level as well as direct functional potential profiling ([12] and [13]). However, the short read technologies used can make it challenging for the bioinformatics pipelines to classify sequences. ...
Preprint
Full-text available
The ever decreasing cost of sequencing and the multiplication of potential applications for the study of metagenomes have led to an unprecedented increase in the volume of data generated. One of the most prevalent applications of metagenomics is the study of microbial environments, such as the human gut. The gut microbiome has been shown to play an important role in human health, providing critical information for patient diagnosis and prognosis. However, the analysis of metagenomic data remains challenging for many reasons, including reference catalogs, sparsity and compositionality of the data, to name a few. Deep learning (DL) enables novel and promising approaches that complement state-of-the-art microbiome pipelines. In fact, DL-based methods can address almost all aspects of microbiome analysis, including novel pathogen detection, sequence classification, patient stratification, and disease prediction. Beyond the generation of predictive models, a key aspect of such methods remains their interpretability. In this article, we provide a systematic review of deep learning approaches in metagenomics, whether based on convolutional networks, autoencoders, or attention-based models. These methods aggregate contextualized data and pave the way for improved patient care and a better understanding of the key role the microbiome plays in our health.
... (Preprint version posted July 27, 2023.) In the mock samples, the most abundant taxa were represented by Beta-proteobacteria (Burkholderia, Ralstonia), Gamma-proteobacteria (Acinetobacter, Escherichia/Shigella), Actinomycetia (Cutibacterium), and other bacterial genera known to be kit and laboratory contaminants (Fig. 7A) (82)(83)(84)(85). A full list of OTUs found in the mock samples can be found in Table S1. ...
... Our study has obvious limitations, as a multitude of additional parameters, such as details of sampling, storage and homogenization procedures, can affect the quality of purified DNA and the composition of communities (29,58,84,92,95). Also, the usage of environmental samples for kit benchmarking does not allow the identification of kit biases for specific bacterial taxa ("taxa-specific biases"), as the "ground truth" composition of the investigated communities was not known. ...
Preprint
Full-text available
Metagenomics is widely applied to study marine microbial communities. High-quality DNA samples are critical for the success of metagenomic projects. In this work, we systematically evaluated the performance of eight widely used commercial DNA purification kits (Stool; Microbiome; PowerFecal; Blood and Tissue, and PowerSoil (all from Qiagen), PureLink Microbiome (Invitrogen), Monarch HMW DNA (NEB), and Soil Genomic DNA Isolation (LSBio)) with three types of samples: water, sea floor sediments, and digestive tract of a model invertebrate, the Pacific oyster Magallana gigas. For each kit-sample combination we measured the quantity of purified DNA, extent of DNA fragmentation, the presence of PCR-inhibiting contaminants, admixture of eukaryotic DNA, alpha-diversity, and reproducibility of the resulting community composition based on 16S rRNA amplicon sequencing. Additionally, we determined a kitome, i.e., a set of contaminating taxa inherent to each type of DNA purification kit used. Each kit was ranked according to its performance. The resulting matrix of evaluated parameters allows one to select the best DNA purification procedure for a given type of sample.
... Shotgun metagenomics, i.e. the untargeted sequencing of DNA fragments from a mixed sample of genomes in a community, is now an established tool in microbial community analysis [1]. It enables the sequencing of genetic material from microbes that cannot otherwise be studied, for example through isolation and culturing [1]. Pioneering studies have used metagenomics to survey the taxonomic and functional composition of various microbiomes such as the human gut or soil [2]. ...
Preprint
Full-text available
We introduce a novel metagenomics assembler for high-accuracy long reads. Our approach, implemented as metaMDBG, combines highly efficient de Bruijn graph assembly in minimizer space, with both a multi-k′ approach for dealing with variations in genome coverage depth and an abundance-based filtering strategy for simplifying strain complexity. The resulting algorithm is more efficient than the state-of-the-art but with better assembly results. metaMDBG was 1.5 to 12 times faster than competing assemblers and requires between one-tenth and one-thirtieth of the memory across a range of data sets. We obtained up to twice as many high-quality circularised prokaryotic metagenome assembled genomes (MAGs) on the most complex communities, and a better recovery of viruses and plasmids. metaMDBG performs particularly well for abundant organisms whilst being robust to the presence of strain diversity. The result is that for the first time it is possible to efficiently reconstruct the majority of complex communities by abundance as near-complete MAGs.
... Accurate strain-level target pathogen identification is critical for public health surveillance, especially for controlling disease outbreaks (Buytaers et al., 2021). Metagenomic sequencing is an efficient and advanced technology for the study of microbiomes and target pathogen detection (Quince et al., 2017). The accurate identification of target pathogens in metagenomic data is of major importance. ...
Article
Full-text available
Motivation: High-resolution target pathogen detection using metagenomic sequencing data represents a major challenge due to the low concentration of target pathogens in samples. We introduced mStrain, a novel Yersinia pestis strain/lineage-level identification tool that utilizes metagenomic data. mStrain successfully identified Y. pestis at the strain/lineage level by extracting sufficient information regarding single nucleotide polymorphisms (SNPs), which can therefore be an effective tool for identification and source tracking of Y. pestis based on metagenomic data during a plague outbreak. Definitions: Strain-level identification: assigning the reads in the metagenomic sequencing data to an exactly known or most closely representative Y. pestis strain. Lineage-level identification: assigning the reads in the metagenomic sequencing data to a specific lineage on the phylogenetic tree. CanoSNPs: the unique and typical SNPs present in all representative strains. Ancestor/derived state: an SNP is defined as the ancestor state when consistent with the allele of Y. pseudotuberculosis strain IP32953; otherwise, the SNP is defined as the derived state (Li and Cui, 2018). Availability: The code for running mStrain, the test dataset, and instructions for running the code can be found at the following GitHub repository: https://github.com/xwqian1123/mStrain. Supplementary information: Supplementary data are available at Bioinformatics Advances online.
... In 16S rRNA sequencing, the 16S rRNA gene, which is ubiquitous in all bacterial organisms but also has distinct variable regions that can be used to discriminate between different bacteria, is first PCR-amplified and then sequenced [10]. Shotgun sequencing, on the other hand, is an untargeted sequencing of all microbial genomes in a sample [13]. In either case, short reads are preprocessed through quality control and filtering steps. ...
Article
Full-text available
Discrete data such as counts of microbiome taxa resulting from next-generation sequencing are routinely encountered in bioinformatics. Taxa count data in microbiome studies are typically high-dimensional, over-dispersed, and can only reveal relative abundance therefore being treated as compositional. Analyzing compositional data presents many challenges because they are restricted to a simplex. In a logistic normal multinomial model, the relative abundance is mapped from a simplex to a latent variable that exists on the real Euclidean space using the additive log-ratio transformation. While a logistic normal multinomial approach brings flexibility for modeling the data, it comes with a heavy computational cost as the parameter estimation typically relies on Bayesian techniques. In this paper, we develop a novel mixture of logistic normal multinomial models for clustering microbiome data. Additionally, we utilize an efficient framework for parameter estimation using variational Gaussian approximations (VGA). Adopting a variational Gaussian approximation for the posterior of the latent variable reduces the computational overhead substantially. The proposed method is illustrated on simulated and real datasets.
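To make the mapping described in the abstract above concrete, the following is a minimal sketch of the additive log-ratio (ALR) transformation and its inverse in plain Python. It is not the authors' implementation: the function names `closure`, `alr`, and `alr_inverse`, the choice of the last taxon as the reference, and the pseudocount used to avoid taking the logarithm of zero are all illustrative assumptions.

```python
import math


def closure(counts, pseudocount=0.5):
    """Turn raw taxa counts into proportions that sum to 1.

    The pseudocount is an assumption made here so that zero counts
    do not produce log(0) later; it is not taken from the paper.
    """
    shifted = [c + pseudocount for c in counts]
    total = sum(shifted)
    return [s / total for s in shifted]


def alr(proportions):
    """Additive log-ratio: log of each part over the last (reference) part.

    Maps a composition with D parts on the simplex to D - 1 real numbers.
    """
    reference = proportions[-1]
    return [math.log(p / reference) for p in proportions[:-1]]


def alr_inverse(values):
    """Map D - 1 real-valued ALR coordinates back to a D-part composition."""
    exps = [math.exp(v) for v in values] + [1.0]  # reference part has log-ratio 0
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    counts = [30, 50, 20]              # hypothetical counts for three taxa
    props = closure(counts, pseudocount=0)
    coords = alr(props)                # unconstrained real coordinates
    print(coords)
    print(alr_inverse(coords))         # recovers the original proportions
```

The transformed coordinates are unconstrained real numbers, which is what allows a Gaussian latent variable to be placed on them; the inverse shows that no information about relative abundance is lost, only the choice of reference taxon is fixed.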
... A metagenomic analysis refers to the untargeted replication and analysis of all DNA present in a sample using WGS techniques. From this, an idea of the taxonomic diversity, the presence and functionality of recognized genes, the discovery of novel genes and whole genome sequence recovery can be envisaged [234]. ...
Article
Full-text available
The extended-spectrum β-lactamase (ESBL)-producing Enterobacterales (ESBL-EB) encompass several important human pathogens and are found on the World Health Organization (WHO) priority pathogens list of antibiotic-resistant bacteria. They are a group of organisms which demonstrate resistance to third-generation cephalosporins (3GC) and their presence has been documented worldwide, including in aquaculture and the aquatic environment. This risk profile was developed following the Codex Guidelines for Risk Analysis of Foodborne Antimicrobial Resistance with the objectives of describing the current state of knowledge of ESBL-EB in relation to retail shrimp and salmon available to consumers in Canada, the primary aquacultured species consumed in Canada. The risk profile found that Enterobacterales and ESBL-EB have been found in multiple aquatic environments, as well as multiple host species and production levels. Although the information available did not permit the conclusion as to whether there is a human health risk related to ESBLs in Enterobacterales in salmon and shrimp available for consumption by Canadians, ESBL-EB in imported seafood available at the retail level in Canada have been found. Surveillance activities to detect ESBL-EB in seafood are needed; salmon and shrimp could be used in initial surveillance activities, representing domestic and imported products.
... On the other hand, DNA-to-Marker methods also suffer from low Recall more than DNA-to-DNA methods, i.e., false-negative identifications in microbial profiling [13], because DNA-to-Marker methods such as MetaPhlAn and mOTUs have fewer identifiable species in their reference databases compared to DNA-to-DNA methods, which is caused by (1) the absence of universal markers in some microbial genomes; (2) incomplete genome information in publicly available databases, which may contribute to the missing marker issue; and (3) unfriendly reference database customization [24]. Notably, it is possible that the markers of low abundance species may not be fully detected in the sequencing data, especially if the markers do not cover the entire genome of the microbe [38]. ...
Article
Full-text available
Accurate species identification and abundance estimation are critical for the interpretation of whole metagenome sequencing (WMS) data. Yet, existing metagenomic profilers suffer from false-positive identifications, which can account for more than 90% of total identified species. Here, by leveraging species-specific Type IIB restriction endonuclease digestion sites as reference instead of universal markers or whole microbial genomes, we present a metagenomic profiler, MAP2B (MetAgenomic Profiler based on type IIB restriction sites), to resolve those issues. We first illustrate the pitfalls of using relative abundance as the only feature in determining false positives. We then propose a feature set to distinguish false positives from true positives, and using simulated metagenomes from CAMI2, we establish a false-positive recognition model. By benchmarking the performance in metagenomic profiling using a simulation dataset with varying sequencing depth and species richness, we illustrate the superior performance of MAP2B over existing metagenomic profilers in species identification. We further test the performance of MAP2B using real WMS data from an ATCC mock community, confirming its superior precision against sequencing depth. Finally, by leveraging WMS data from an IBD cohort, we demonstrate the taxonomic features generated by MAP2B can better discriminate IBD and predict metabolomic profiles.
... [1] "Shotgun metagenomics", the non-targeted sequencing of microbial genomes in a sample, allows the investigation of complex microbial communities by a number of analyses. [2] Analyzing metagenomes through de novo assembly to contigs and binning of those contigs to yield so-called metagenome-assembled genomes (MAGs) has become increasingly common over the course of the last years, with studies assembling large numbers of genomes from metagenomic reads (see for example [3][4][5]). This approach has already provided a greater overview of microbial diversity and deeper insights into new metabolic pathways. ...
Preprint
Full-text available
Background The possibility of recovering metagenome-assembled genomes (MAGs) from sequence reads allows for further insights into microbial communities and their members, possibly even analyzing such sequences with tools designed for single-isolate genomes. As result quality depends on sequence quality, performance of tools for single-isolate genomes on MAGs should be tested beforehand. Bioinformatics can be leveraged to quickly create varied synthetic test sets with known composition for this purpose. Results We present MAGICIAN, a flexible, user-friendly pipeline for the simulation of MAGs. MAGICIAN combines a synthetic metagenome simulator with a metagenomic assembly and binning pipeline to simulate MAGs based on user-supplied input genomes, allowing users to test performance of tools on MAGs while having a ground truth to compare results to. We also demonstrate the use of simulated MAGs by evaluating the suitability of such genomes obtained with MAGICIAN's current default pipeline for analysis with the antimicrobial resistance gene identification tool ResFinder. Conclusions Using MAGICIAN, it is possible to simulate MAGs which, while generally high in quality, reflect issues encountered with real-world data, thus providing realistic best-case data. Evaluating the results of ResFinder analysis of these genomes revealed a risk for plausible-looking false positives, which underlines the need for pipeline validation so that researchers are aware of the potential issues when interpreting real-world data.
... The most commonly used untargeted methods are 16S rRNA gene sequencing [8][9][10] and DNA shotgun metagenomic sequencing. The former technique provides an overview of bacterial communities up to the genus level, whereas the latter provides bacterial taxonomy up to the species and strain-level resolution [11][12][13]. Amplifying specific gene regions is the first step in 16S rRNA sequencing. However, the choice of primer sets can lead to variable results with different levels of amplification bias. ...
Article
Full-text available
DNA shotgun sequencing is an untargeted approach for identifying changes in relative abundances, while qPCR allows reproducible quantification of specific bacteria. The canine dysbiosis index (DI) assesses the canine fecal microbiota by using a mathematical algorithm based on qPCR results. We evaluated the correlation between qPCR and shotgun sequencing using fecal samples from 296 dogs with different clinical phenotypes. While significant correlations were found between qPCR and sequencing, certain taxa were only detectable by qPCR and not by sequencing. Based on sequencing, less than 2% of bacterial species (17/1190) were consistently present in all healthy dogs (n = 76). Dogs with an abnormal DI had lower alpha-diversity compared to dogs with normal DI. Increases in the DI correctly predicted the gradual shifts in microbiota observed by sequencing: minor changes (R = 0.19, DI < 0 with any targeted taxa outside the reference interval, RI), mild-moderate changes (R = 0.24, 0 < DI < 2), and significant dysbiosis (R = 0.54, 0.73, and 0.91 for DI > 2, DI > 5, and DI > 8, respectively), compared to dogs with a normal DI (DI < 0, all targets within the RI), as higher R-values indicated larger dissimilarities. In conclusion, the qPCR-based DI is an effective indicator of overall microbiota shifts observed by shotgun sequencing in dogs.
... This technology, coupled with recent advances in bioinformatics, enables the determination of genotypic variation within species [17], which helps to delineate metabolic potential and diversity. This leads to inferring functional information [18], enriching taxonomic profiling with unprecedented depth up to species- and intra-species-level variation of microorganisms that could affect the flavour of cheese, and discovering phage abundance [19]. However, a significant challenge facing this approach is the difficulty in assembling genomes from highly diverse sequences of cheese microbiota using as a reference the available genomes deposited in widely used public databases [20]. ...
Article
Full-text available
Shotgun metagenomic sequencing was used to investigate the diversity of the microbial community of Cheddar cheese ripened over 32 months. The changes in taxa abundance were compared from assembly-based, non-assembly-based, and mOTUs2 sequencing pipelines to delineate the community profile for each age group. Metagenomic assembled genomes (MAGs) passing the quality threshold were obtained for 11 species from 58 samples. Although Lactococcus cremoris and Lacticaseibacillus paracasei were dominant across the shotgun samples, other species were identified using MG-RAST. NMDS analysis of the beta diversity of the microbial community revealed the similarity of the cheeses in older age groups (7 months to 32 months). As expected, the abundance of Lactococcus cremoris consistently decreased over ripening, while the proportion of permeable cells increased. Over the ripening period, the relative abundance of viable Lacticaseibacillus paracasei progressively increased, but at a variable rate among trials. Reads attributed to Siphoviridae and Ascomycota remained below 1% relative abundance. The functional profiles of PMA-treated cheeses differed from those of non-PMA-treated cheeses. Starter rotation was reflected in the single nucleotide variant profiles of Lactococcus cremoris (SNVs of this species using mOTUs2), while the incoming milk was the leading factor in discriminating Lacticaseibacillus paracasei/casei SNV profiles. The relative abundance estimates from Kraken2, non-assembly-based (MG-RAST) and marker gene clusters (mOTUs2) were consistent across age groups for the two dominant taxa. Metagenomics enabled sequence variant analysis below the bacterial species level and functional profiling that may affect the metabolic interactions between subpopulations in cheese during ripening, which could help explain the overall flavour development of cheese. 
Future work will integrate microbial variants with volatile profiles to associate the development of compounds related to cheese flavour at each ripening stage.
... Despite its convenience, amplicon sequencing suffers from PCR bias and can have limited resolution in discriminating closely related species or strains of the same species. Metagenomic sequencing, which directly sequences all genomic DNA within an environment, enables both the profiling of phylogenetic diversity and the comprehensive accounting of all the genes present within a microbiome [5]. However, because the data is acquired as a pool of mixed sequencing reads originating from all organisms, the bioinformatic reassembly requires sophisticated computational algorithms for assembly and sometimes yields disconnected genomic fragments [6]. ...
Preprint
Single cell sequencing is useful for resolving complex systems into their composite cell types and computationally mining them for unique features that are masked in pooled sequencing. However, while commercial instruments have made single cell analysis widespread for mammalian cells, analogous tools for microbes are limited. Here, we present EASi-seq (Easily Accessible Single microbe sequencing). By adapting the single cell workflow of the commercial Mission Bio Tapestri instrument, this method allows for efficient sequencing of individual microbes' genomes. EASi-seq allows thousands of microbes to be sequenced per run and, as we show, can generate detailed atlases of human and environmental microbiomes. The ability to capture large shotgun genome datasets from thousands of single microbes provides new opportunities in discovering and analyzing species subpopulations. To facilitate this, we develop a companion bioinformatic pipeline that clusters microbes by similarity, improving whole genome assembly, strain identification, taxonomic classification, and gene annotation. In addition, we demonstrate integration of metagenomic contigs with the EASi-seq datasets to reduce capture bias and increase coverage. Overall, EASi-seq enables high quality single cell genomic data for microbiome samples using an accessible workflow that can be run on a commercially available platform.
... According to NaPDoS, the C and KS domain sequences with below 85% identity may indicate that the domain of interest may contribute to BGC that has most likely not yet been characterized [34], suggesting that some of those BGCs might be encoding the novel metabolites. As in this case, shotgun metagenomics may lead to fragmented gene clusters, which may account for many low identity scores [55]. Indeed, the phylogenetic results showed a wide range of bioactivity of the retrieved C and KS domain sequences, including antibacterial, antitumor, antifungal, and immunosuppressant activities, although further studies would be required to assess the extent of these novelties. ...
Article
Full-text available
The Borra caves, the second largest subterranean karst cave ecosystem in the Indian sub-continent, are located at the Ananthagiri hills of Araku Valley in the Alluri district of Andhra Pradesh, India. The present investigation applied a shotgun metagenomic approach to gain insights into the microbial community structure, metabolic potential, and biosynthetic gene cluster (BGC) diversity of the microbes colonizing the surface of the speleothems from the aphotic zone of Borra caves. The taxonomic analysis of the metagenome data illustrated that the speleothem-colonizing core microbial community was dominated mainly by Alpha-, Beta-, and Gamma-Proteobacteria, Actinobacteria, Firmicutes, and Bacteroidetes. The key energy metabolic pathways analysis provides strong evidence of chemolithoautotrophic and chemoheterotrophic modes of nutrition in the speleothem-colonizing microbial community. Metagenome data suggests that sulfur reducers and sulfur-disproportionating microbes might play a vital role in energy generation in this ecosystem. Our metagenome data also suggest that the dissimilatory nitrifiers and nitrifying denitrifiers might play an essential role in conserving nitrogen pools in the ecosystem. Furthermore, metagenome-wide BGCs mining retrieved 451 putative BGCs; NRPS was the most abundant (24%). Phylogenetic analysis of the C domain of NRPS showed that sequences were distributed across all six function categories of the known C domain, including several novel subclades. For example, a novel subclade had been recovered within the LCL domain clade as a sister subclade of immunosuppressant cyclosporin encoding C domain sequences. Our result suggested that subterranean cave microbiomes might be a potential reservoir of novel microbial metabolites.
... Furthermore, we determined and compared the genome completeness of both approaches to highlight the advantages and disadvantages of either approach. Additionally, since total RNA-Seq has not yet been extensively tested, we also determined the impact of commonly used data processing tools on SSU rRNA reconstruction, as it has been repeatedly shown that results based on HTS are heavily influenced by the choice of bioinformatics tools (Bashiardes et al., 2016; Knight et al., 2018; McIntyre et al., 2017; Quince et al., 2017; Shakya et al., 2019; Vollmers et al., 2017). Lastly, we applied metagenomics and total RNA-Seq and the same data processing tools to an aquarium sample, which served as a proxy for an environmental freshwater sample. ...
Article
Full-text available
The small subunit (SSU) ribosomal RNA (rRNA) is the most commonly used marker for the identification of microbial taxa, but its full‐length reconstruction from high‐throughput sequencing (HTS) data remains challenging. Metagenomics and total RNA sequencing (total RNA‐Seq) are target‐PCR‐free HTS methods that are used to characterize microbial communities and simultaneously reconstruct SSU rRNA sequences. However, more testing is required to determine and improve their effectiveness. We processed metagenomics and total RNA‐Seq data retrieved from a commercially available mock microbial community and an aquarium sample using 112 combinations of data processing tools. We determined the SSU rRNA reconstruction completeness of both sequencing methods for both samples and analysed the impact of data processing tools on SSU rRNA completeness. In contrast to metagenomics, total RNA‐Seq allowed for the complete or near‐complete reconstruction of all mock community SSU rRNA sequences and generated up to 438 SSU rRNA sequences with ≥80% completeness from the aquarium sample using only 1/5 of an Illumina MiSeq run. SSU rRNA completeness of metagenomics significantly correlated with the genome size of mock community species. Data processing tools impacted SSU rRNA completeness, in particular the utilized assemblers. These results are promising for the high‐throughput reconstruction of novel full‐length SSU rRNA sequences and could advance the simultaneous application of multiple ‐omics approaches in routine environmental assessments to allow for more holistic assessments of ecosystems.
... The metagenomic analysis was conducted following the general guidelines [34] and based on the bioBakery computational environment [35,36]. High-resolution taxonomic profiling of the TwinsUK and ZOE PREDICT-1 metagenomes was performed using MetaPhlAn 4.beta.2 ...
Article
Full-text available
Short-chain fatty acids (SCFA) are involved in immune system and inflammatory responses. We comprehensively assessed the host genetic and gut microbial contribution to a panel of eight serum and stool SCFAs in two cohorts (TwinsUK, n = 2507; ZOE PREDICT-1, n = 328), examined their postprandial changes and explored their links with chronic and acute inflammatory responses in healthy individuals and trauma patients. We report low concordance between circulating and fecal SCFAs, significant postprandial changes in most circulating SCFAs, and a heritable genetic component (average h²: serum = 14% (SD = 14%); stool = 12% (SD = 6%)). Furthermore, we find that gut microbiome can accurately predict their fecal levels (AUC > 0.71) while presenting weaker associations with serum. Finally, we report different correlation patterns with inflammatory markers depending on the type of inflammatory response (chronic or acute trauma). Our results illustrate the breadth of the physiological relevance of SCFAs on human inflammatory and metabolic responses highlighting the need for a deeper understanding of this important class of molecules.
... Although sequence-based methods have been transformative for microbiome research, they are not perfect. Biases can be introduced at every step of sequence-based studies, from sample collection and storage, through laboratory-based steps such as DNA extraction, to choice of bioinformatic pipelines and reference databases used to analyse the data [33]. Comparisons of sequence-based versus culture-based studies of the microbiota have shown that sequence-based approaches completely failed to detect some species that were only recovered using traditional culturing methods [34]. ...
Article
Over the past two decades, interest in human microbiome research has increased exponentially. Regrettably, this increased activity has brought with it a degree of hype and misinformation, which can undermine progress and public confidence in the research. Here we highlight selected human microbiome myths and misconceptions that lack a solid evidence base. By presenting these examples, we hope to draw increased attention to the implications of inaccurate dogma becoming embedded in the literature, and the importance of acknowledging nuance when describing the complex human microbiome. Free full text available via Readcube: https://rdcu.be/dioPr
... Indeed, a significant fraction of common human syndromes with suspected infectious causes remain of unknown or unidentifiable etiology despite extensive screening (Denno et al., 2005;Jain et al., 2015;Kapikian, 1993;Khetsuriani et al., 2002;Sivertsen and Christensen, 1996). In contrast, the shotgun-driven approach by metagenomic Next-Generation Sequencing (mNGS) is an unbiased method in which the total nucleic acid content within a given clinical sample is randomly amplified and sequenced (Delwart, 2007;Dulanto Chiang and Dekker, 2020;Mokili et al., 2012;Quince et al., 2017;Schlaberg et al., 2017). This enables the simultaneous identification of virtually any pathogen of bacterial, viral, fungal, or parasitic origin, potentially even if unknown so far, in just one analysis (Tschumi et al., 2019;Naccache et al., 2015;Cordey et al., 2016). ...
Article
The ability of viral metagenomic Next-Generation Sequencing (mNGS) to unbiasedly detect nucleic acids in a clinical sample is a powerful tool for advanced diagnosis of viral infections. When clinical symptoms do not provide a clear differential diagnosis, extensive laboratory testing with virus-specific PCR and serology can be replaced by a single viral mNGS analysis. However, widespread diagnostic use of viral mNGS is thus far limited by long sample-to-result times, as most protocols rely on Illumina sequencing, which provides high and accurate sequencing output but is time-consuming and expensive. Here, we describe the development of an mNGS protocol based on the more cost-effective Nanopore Flongle sequencing with decreased turnaround time and lower, yet sufficient sequencing output to provide sensitive virus detection. Sample preparation (6h) and sequencing (2h) times are substantially reduced compared to Illumina mNGS and allow detection of DNA/RNA viruses at low input (up to 33-38 cycle threshold of specific qPCR). Although Flongles yield lower sequencing output, direct comparison with Illumina mNGS on diverse clinical samples showed similar results. Collectively, the novel Nanopore mNGS approach is specifically tailored for use in clinical diagnostics and provides a rapid and cost-effective mNGS strategy for individual testing of severe cases.
... In metagenomics, DNA is isolated from the environment and sequenced either as 16S, 23S, 28S, or ITS amplicons to study taxonomic content, or by whole-genome shotgun NGS methods to study both the taxonomic and the functional content of microbial groups. Next-generation sequencing workflows for metagenomes have three basic steps: library preparation, sequencing, and data analysis (Quince et al. 2017). The sequencer's built-in software automatically identifies the nucleotides and writes the called bases to large files known as raw reads. ...
Article
Full-text available
Metagenomics has now evolved into a promising technology for understanding the microbial population in the environment. Using metagenomics, a number of extreme and complex environments have been explored for their microbial populations. Using this technology, researchers have brought out novel genes and their potential characteristics, which have robust applications in food, pharmaceutical, scientific research, and other biotechnological fields. A sequencing platform can provide sequences of the microbial populations in any given environment. The sequences need to be analysed computationally to derive meaningful information. It is presumed that only bioinformaticians with extensive computational skills can process the sequencing data through to the downstream end. However, numerous open-source software packages and online servers developed for biologists with limited computational skills are available to analyse metagenomic data. This review is focused on bioinformatics tools such as Galaxy, CSI-NGS portal, ANASTASIA and SHAMAN, EBI-metagenomics, IDseq, and MG-RAST for analysing metagenomic data.
... Approaches for profiling the mobile fraction of the community resistome. To perform shotgun sequencing, the total metagenomic DNA of the community is first extracted and then sequenced to produce short (~150-300 bp) sequencing reads [149]. The reads can subsequently be assembled into larger contigs, which are then grouped into bins based on the predicted common microbial origin [149] (see the figure). ...
Article
Antibiotic-mediated perturbation of the gut microbiome is associated with numerous infectious and autoimmune diseases of the gastrointestinal tract. Yet, as the gut microbiome is a complex ecological network of microorganisms, the effects of antibiotics can be highly variable. With the advent of multi-omic approaches for systems-level profiling of microbial communities, we are beginning to identify microbiome-intrinsic and microbiome-extrinsic factors that affect microbiome dynamics during antibiotic exposure and subsequent recovery. In this Review, we discuss factors that influence restructuring of the gut microbiome on antibiotic exposure. We present an overview of the currently complex picture of treatment-induced changes to the microbial community and highlight essential considerations for future investigations of antibiotic-specific outcomes. Finally, we provide a synopsis of available strategies to minimize antibiotic-induced damage or to restore the pretreatment architectures of the gut microbial community.
... Currently, insights into the structure and function of the microbiota community mainly come from 16S rRNA gene profiling and shotgun metagenomics. While 16S rRNA amplicon sequencing offers a cost-efficient way to assess bacterial abundance at a higher taxonomic level, whole-genome shotgun metagenomics resolves the abundance of species and strains, together with the functional potential they encode (Quince et al, 2017; Almeida et al, 2019; Pasolli et al, 2019). In addition, gene and protein expression and metabolite abundance in the community can be quantified with metatranscriptomics (Bashiardes et al, 2016), metaproteomics (Zhang & Figeys, 2019) and metabolomics (Han et al, 2021), respectively. ...
Article
Full-text available
Multi-omics analyses are used in microbiome studies to understand molecular changes in microbial communities exposed to different conditions. However, it is not always clear how much each omics data type contributes to our understanding and whether they are concordant with each other. Here, we map the molecular response of a synthetic community of 32 human gut bacteria to three non-antibiotic drugs by using five omics layers (16S rRNA gene profiling, metagenomics, metatranscriptomics, metaproteomics and metabolomics). We find that all the omics methods with species resolution are highly consistent in estimating relative species abundances. Furthermore, different omics methods complement each other for capturing functional changes. For example, while nearly all the omics data types captured that the antipsychotic drug chlorpromazine selectively inhibits Bacteroidota representatives in the community, the metatranscriptome and metaproteome suggested that the drug induces stress responses related to protein quality control. Metabolomics revealed a decrease in oligosaccharide uptake, likely caused by Bacteroidota depletion. Our study highlights how multi-omics datasets can be utilized to reveal complex molecular responses to external perturbations in microbial communities.
... Whole genome shotgun sequencing overcomes the limitations of 16S rRNA gene sequencing as it profiles the taxonomic composition of complex bacterial, viral, fungal and archaeal communities down to species level [16,17]. Whereas one study examined the airway metagenome of healthy infants and reported on the importance of both the high- (95% most abundant) and low-abundance (5% least abundant) species biosphere for shaping a healthy microbiome [18], to our knowledge, the longitudinal development of the airway metagenome of preterm infants has not been studied. ...
Article
Full-text available
Preterm birth is accompanied by many complications and requires severe therapeutic regimens at the neonatal intensive care unit. The influence of the above-mentioned factors on the premature-born infants' respiratory metagenome or more generally its maturation is unknown. We therefore applied shotgun metagenome sequencing of oropharyngeal swabs to analyze the airway metagenome development of 24 preterm infants from one week postpartum to 15 months of age. Beta diversity analysis revealed a distinct clustering of airway microbial communities from hospitalized preterms and samples after hospital discharge. At nine and 15 months of age, the preterm infants lost their hospital-acquired individual metagenome signatures towards a common taxonomic structure. However, ecological network analysis and Random Forest classification of cross-sectional data revealed that by this age the preterm infants did not succeed in establishing the uniform and stable bacterial community structures that are characteristic of healthy full-term infants.
... Most environmental bacteria cannot be isolated and the few organisms that are culturable outside of their natural environments fail to adequately represent prokaryotic diversity [9,10]. Metagenomic sequencing can provide functional 'potential' [11] and can be used to estimate bacterial replication rates [12]. However, genome-based indicators of functional potential often fail to predict observed traits. ...
Article
Full-text available
Predicting ecosystem function is critical to assess and mitigate the impacts of climate change. Quantitative predictions of microbially mediated ecosystem processes are typically uninformed by microbial biodiversity. Yet new tools allow the measurement of taxon-specific traits within natural microbial communities. There is mounting evidence of a phylogenetic signal in these traits, which may support prediction and microbiome management frameworks. We investigated phylogeny-based trait prediction using bacterial growth rates from soil communities in Arctic, boreal, temperate, and tropical ecosystems. Here we show that phylogeny predicts growth rates of soil bacteria, explaining an average of 31%, and up to 58%, of the variation within ecosystems. Despite limited overlap in community composition across these ecosystems, shared nodes in the phylogeny enabled ancestral trait reconstruction and cross-ecosystem predictions. Phylogenetic relationships could explain up to 38% (averaging 14%) of the variation in growth rates across the highly disparate ecosystems studied. Our results suggest that shared evolutionary history contributes to similarity in the relative growth rates of related bacteria in the wild, allowing phylogeny-based predictions to explain a substantial amount of the variation in taxon-specific functional traits, within and across ecosystems.
... To overcome sequencing error and short read length, we used a mapping-based consensus method. This method cannot capture underlying diversity in non-dominant sequence types [59]. Prochlorococcus has been shown to have multiple co-existing microdiverse haplotypes in situ, which vary in abundance [43]. ...
Article
Full-text available
Prochlorococcus is the most numerically abundant photosynthetic organism in the surface ocean. The Prochlorococcus high-light and warm-water adapted ecotype (HLII) is comprised of extensive microdiversity, but specific functional differences between microdiverse sub-clades remain elusive. Here we characterized both functional and phylogenetic diversity within the HLII ecotype using Bio-GO-SHIP metagenomes. We found widespread variation in gene frequency connected to local environmental conditions. Metagenome-assembled marker genes and genomes revealed a globally distributed novel HLII haplotype defined by adaptation to chronically low P conditions (HLII-P). Environmental correlation analysis revealed different factors were driving gene abundances versus phylogenetic differences. An analysis of cultured HLII genomes and metagenome-assembled genomes revealed a subclade within HLII, which corresponded to the novel HLII-P haplotype. This work represents the first global assessment of the HLII ecotype’s phylogeography and corresponding functional differences. These findings together expand our understanding of how microdiversity structures functional differences and reveal the importance of nutrients as drivers of microdiversity in Prochlorococcus.
... In particular, there is a lack of understanding regarding the long-term fate and functionality of ENMs in the soil-microbe-plant environment. Recent advances in shotgun metagenomic analyses of microbial communities provide a powerful platform to mechanistically probe the interplay between ENMs and the plant-soil microbiome [16]. ...
Article
The plant-associated microbiome is known to be a critical component for crop growth, nutrient acquisition, resistance to pathogens, and abiotic stress tolerance. Conventional approaches have been attempted to manipulate the plant–soil microbiome to improve plant performance; however, several issues have arisen, such as collateral negative impacts on microbiota composition. The lack of reliability and robustness of conventional techniques warrants efforts to develop novel alternative strategies. Nano-enabled approaches have emerged as promising platforms for enhancing agricultural sustainability and global food security. Specifically, the use of engineered nanomaterials (ENMs) as nanoscale agrochemicals has great potential to modulate the plant-associated microbiome. We review the dynamic interplay between nano-agrochemicals and the plant-associated microbiome for the safe development and use of nano-enabled microbiome engineering.
... Because it lacks an amplification step, it represents only the most common genes. It also represents only genetic potential, as opposed to those genes undergoing expression or translation [38]. Shotgun metatranscriptomic (MTT) data provides a shotgun representation of the community translation pool. ...
Article
Full-text available
While healthy gut microbiomes are critical to human health, pertinent microbial processes remain largely undefined, partially due to differential bias among profiling techniques. By simultaneously integrating multiple profiling methods, multi-omic analysis can define generalizable microbial processes, and is especially useful in understanding complex conditions such as Autism. Challenges with integrating heterogeneous data produced by multiple profiling methods can be overcome using Latent Dirichlet Allocation (LDA), a promising natural language processing technique that identifies topics in heterogeneous documents. In this study, we apply LDA to multi-omic microbial data (16S rRNA amplicon, shotgun metagenomic, shotgun metatranscriptomic, and untargeted metabolomic profiling) from the stool of 81 children with and without Autism. We identify topics, or microbial processes, that summarize complex phenomena occurring within gut microbial communities. We then subset stool samples by topic distribution, and identify metabolites, specifically neurotransmitter precursors and fatty acid derivatives, that differ significantly between children with and without Autism. We identify clusters of topics, deemed “cross-omic topics”, which we hypothesize are representative of generalizable microbial processes observable regardless of profiling method. Interpreting topics, we find each represents a particular diet, and we heuristically label each cross-omic topic as: healthy/general function, age-associated function, transcriptional regulation, and opportunistic pathogenesis.
Preprint
Full-text available
Background: Investigating microbial communities and their contributions to carbon and nutrient cycling along water gradients can enhance our comprehension of climate change impacts on ecosystem services.
Results: We conducted an assessment of microbial communities, metagenomic functions, and metabolomic profiles within four ecosystems, i.e., desert grassland (DG), shrub-steppe (SS), forest (FO) and marsh (MA) in the Altai region of Xinjiang, China. Soil total carbon (TC), total nitrogen, NH4⁺, and NO3⁻ increased linearly, but pH decreased with soil water gradients. Microbial abundances and richness also increased with soil moisture, except that the abundances of fungi and protists were lowest in MA. Within prokaryotes, the relative abundances of Proteobacteria and Acidobacteria increased, whereas those of Actinobacteria and Thaumarchaeota decreased along water gradients. In fungi and protists, Basidiomycota and Mortierellomycota, Evosea and Endomyxa became dominant in FO and MA, respectively, but the relative abundance of Cercozoa decreased along soil moisture gradients. The β-diversity of microbiomes, metagenomic and metabolomic functioning were linearly distributed along soil moisture gradients, significantly associated with soil factors of TC, NH4⁺, and pH. For soil metagenomic functions, the metabolic genes related to Carbohydrate (CO2 fixation, Di- and oligosaccharides, Fermentation, and One-carbon metabolism) and Iron (Iron acquisition in Vibrio and Campylobacter iron metabolism) decreased with soil moisture, while genes related to the metabolisms of Nitrogen (Ammonia assimilation, Denitrification, Nitrogen fixation, and Nitrosative stress) and Potassium (Potassium homeostasis) increased linearly along water gradients. Additionally, MA harbored the most abundant metabolites, dominated by lipids and lipid-like molecules (Erucic acid, Hypogeic acid, and Kojibiose, etc.) and organic oxygen compounds (Maltotetraose, Quinone, Sucrose, and Trehalose, etc.), although certain metabolites, such as N'-Hydroxymethylnorcotinine and 5-Hydroxyenterolactone, showed declining trends along water gradients.
Conclusions: Our study suggests that future ecosystem succession facilitated by changes in rainfall patterns will significantly alter soil microbial taxa, functional potential and metabolite fractions.
Article
Full-text available
The literature of human and other host-associated microbiome studies is expanding rapidly, but systematic comparisons among published results of host-associated microbiome signatures of differential abundance remain difficult. We present BugSigDB, a community-editable database of manually curated microbial signatures from published differential abundance studies accompanied by information on study geography, health outcomes, host body site and experimental, epidemiological and statistical methods using controlled vocabulary. The initial release of the database contains >2,500 manually curated signatures from >600 published studies on three host species, enabling high-throughput analysis of signature similarity, taxon enrichment, co-occurrence and coexclusion and consensus signatures. These data allow assessment of microbiome differential abundance within and across experimental conditions, environments or body sites. Database-wide analysis reveals experimental conditions with the highest level of consistency in signatures reported by independent studies and identifies commonalities among disease-associated signatures, including frequent introgression of oral pathobionts into the gut.
Article
A growing body of evidence suggests that the gut microbiota contributes to the development of neurodegenerative diseases via the microbiota-gut-brain axis. As a contributing factor, microbiota dysbiosis always occurs in pathological changes of neurodegenerative diseases, such as Alzheimer’s disease, Parkinson’s disease, and amyotrophic lateral sclerosis. High-throughput sequencing technology has helped to reveal that the bidirectional communication between the central nervous system and the enteric nervous system is facilitated by the microbiota’s diverse microorganisms, and for both neuroimmune and neuroendocrine systems. Here, we summarize the bioinformatics analysis and wet-biology validation for the gut metagenomics in neurodegenerative diseases, with an emphasis on multi-omics studies and the gut virome. The pathogen-associated signaling biomarkers for identifying brain disorders and potential therapeutic targets are also elucidated. Finally, we discuss the role of diet, prebiotics, probiotics, postbiotics and exercise interventions in remodeling the microbiome and reducing the symptoms of neurodegenerative diseases.
Article
Carotenoids have been associated with risk reduction for several chronic diseases, including the association of their dietary intake/circulating levels with reduced incidence of obesity, type 2 diabetes, certain types of cancer, and even lower total mortality. In addition to some carotenoids constituting vitamin A precursors, they are implicated in potential antioxidant effects and pathways related to inflammation and oxidative stress, including transcription factors such as nuclear factor kappa B (NF-κB) and nuclear factor erythroid 2-related factor 2 (Nrf2). Carotenoids and metabolites may also interact with nuclear receptors, mainly retinoic acid receptor/retinoid X receptor (RAR/RXR) and peroxisome proliferator-activated receptors (PPARs) that play a role in the immune system and cellular differentiation. Therefore, a large number of downstream targets are likely influenced by carotenoids, including but not limited to genes, proteins implicated in oxidative stress and inflammation, antioxidant enzymes and cellular differentiation processes. Furthermore, recent studies also propose an association between carotenoid intake and gut microbiota. While all these endpoints could be individually assessed, a more complete/integrative way to determine a multitude of health-related aspects of carotenoids includes (multi)-omics-related techniques, especially transcriptomics, proteomics, lipidomics and metabolomics, but also metagenomics, measured in a variety of biospecimens including plasma, urine, stool, white blood cells or other tissue cellular extracts. In this review, we highlight the use of -omics technologies to assess health-related effects of carotenoids in mammalian organisms and models.
Preprint
Archaea, bacteria, and fungi in the soil are increasingly recognized as determinants of agricultural productivity and sustainability. A crucial step for exploring soil microbiomes with high ecosystem functions is to perform statistical analyses on the potential relationships between microbiome structure and function based on comparisons of hundreds or thousands of environmental samples collected across broad geographic ranges. In this study, we integrated agricultural field metadata with microbial community analyses by targeting > 2,000 soil samples collected along a latitudinal gradient from cool-temperate to subtropical regions in Japan (26.1-42.8 N). The data involving 579 archaeal, 26,640 bacterial, and 6,306 fungal amplicon sequencing variants detected across the fields of 19 crop plant species allowed us to conduct statistical analyses on the relationships among edaphic factors, microbiome compositions, and crop disease prevalence. We then found that not only the compositions of prokaryotic and fungal communities but also the balance between prokaryotic and fungal abundance had statistically significant impacts on crop disease status. A network analysis suggested that the prokaryotes and fungi could be classified into several species sets (network modules), which differed substantially in associations with crop disease prevalence. Within the network of microbe-to-microbe coexistence, ammonium-oxidizing archaea and nitrite-oxidizing bacteria were inferred to play some roles in shifts between crop-disease-promotive and crop-disease-suppressive states of soil microbiomes. The bird's-eye view of soil microbiome structure will provide a basis for designing agroecosystems with high nutrient-use efficiency and disease-suppressive functions.
Article
As coral reef ecosystems experience unprecedented change, effective monitoring of reef features supports management, conservation, and intervention efforts. ‘Omics techniques show promise in quantifying key components of reef ecosystems, dissolved metabolites and microorganisms, that may serve as invisible sensors for reef ecosystem dynamics. Dissolved metabolites are released by reef organisms and transferred among microorganisms, acting as chemical currencies, and contributing to nutrient cycling and signaling on reefs. Here we applied four ‘omics techniques (taxonomic microbiome via amplicon sequencing, functional microbiome via shotgun metagenomics, targeted metabolomics and untargeted metabolomics) to waters overlying Florida’s Coral Reef, as well as microbiome profiling on individual coral colonies from these reefs to understand how microbes and dissolved metabolites reflect biogeographical, benthic and nutrient properties of this 500-km barrier reef. We show that the microbial and metabolite ‘omics approaches both differentiated reef habitats based on geographic zone. Further, seawater microbiome profiling and targeted metabolomics were significantly related to more reef habitat characteristics, such as amount of hard and soft coral, compared to metagenomic sequencing and untargeted metabolomics. Across five coral species, microbiomes were also significantly related to reef zone, followed by species and disease status, suggesting that the geographic water circulation patterns in Florida also impact the microbiomes of reef builders. A combination of differential abundance and indicator species analyses revealed metabolite and microbial signatures of specific reef zones, which demonstrates the utility of these techniques to provide new insights into reef microbial and metabolite features that reflect broader ecosystem processes.
Article
Microbiological and biomolecular approaches to cultural heritage research have expanded the established research horizon from the prevalent focus on the cultural objects' conservation and human health protection to the relatively recent applications to provenance inquiry and assessment of environmental impacts in a global context of a changing climate. Standard microbiology and molecular biology methods developed for other materials, specimens, and contexts could, in principle, be applied to cultural heritage research. However, given certain characteristics common to several heritage objects (such as uniqueness, fragility, high value, and restricted access), tailored approaches are required. In addition, samples of heritage objects may yield low microbial biomass, rendering them highly susceptible to cross-contamination. Therefore, dedicated methodology addressing these limitations and operational hurdles is needed. Here, we review the main experimental challenges and propose a standardized workflow to study the microbiome of cultural heritage objects, illustrated by the exploration of bacterial taxa. The methodology was developed targeting the challenging side of the spectrum of cultural heritage objects, such as the delicate written record, while retaining flexibility to adapt and/or upscale it to heritage artifacts of a more robust constitution or larger dimensions. We hope this tailored review and workflow will facilitate the interdisciplinary inquiry and interactions among the cultural heritage research community.
Article
Methylmercury (MeHg) formation is a concerning environmental issue described in waters and sediments from multiple aquatic ecosystems. The genetic and metabolic bases of mercury (Hg) methylation have been well described in anoxic environments, but a number of factors seem to point towards alternative pathways potentially occurring in pelagic waters under oxic conditions. Boreal aquatic ecosystems are predicted to undergo increasing concentrations of dissolved organic matter (DOM) as a result of higher terrestrial runoff induced by climate change, which may have important implications in the formation of MeHg in the water column. In this review, different Hg methylation mechanisms postulated in the literature are discussed, with particular focus on potential pathways independent of the hgcAB gene pair and occurring under oxic conditions. Potential effects of DOM on Hg methylation and MeHg bioaccumulation are examined in the context of climate in boreal aquatic ecosystems. Furthermore, the implementation of meta-omic technologies and standardized methods into field measurements and incubation experiments is discussed as a valuable tool to determine taxonomic and functional aspects of Hg methylation in oxic waters and under climate change-induced conditions.
Irreversible pulpitis is an inflammation of the tooth pulp caused by an opportunity-driven invasion of the pulp space by microbiota typically prevalent in the oral cavity. Microbial organisms are extensively recognised to be the fundamental cause of endodontic infections and treatment failures. Previously, bacterial species responsible for these infections were largely recognised using conventional microbial culture techniques, lending credence to the widely held belief that anaerobic Gram-negative bacteria frequently enter the pulp space and trigger endodontic infections. The advent of novel technologies grants the advantage of detecting and studying microbial populations via an amalgamation of the modern "Omics" techniques and meticulous bioinformatics analysis, additionally detecting the metatranscriptome, metaproteome and metabolome along with the metagenome. Amongst these analytical strategies, metagenomic analyses are essentially pragmatic for investigating the oral microbiome. Metagenomics not only favors assessment of microbial composition in diseased conditions but also contributes to the detection of novel, potentially pathogenic species, including non-viable bacteria. The present review describes current knowledge of the root canal microbiome, including its composition and functional attributes, the novel strategies available for detection of the microbiome and the associated challenges, and provides some crucial pointers for areas of future research.
Article
KEY POINTS: Evidence suggests that the intestinal microbiome may play an important role in the pathogenesis and progression of acute critical illness in humans and other mammals, although evidence in small animal medicine is scarce. Moreover, the intestinal microbiota plays many important metabolic roles (production of short-chain fatty acids, trimethylamine-N-oxide, and normal bile acid metabolism) and is crucial for immunity as well as defense against enteropathogens. Multiple changes can occur as a result of critical illness (ie, hypoperfusion, shock, inflammation, impaired immunity, dietary changes, medication, and decreased intestinal motility), which can make the cat or dog prone to the development of dysbiosis. The use of probiotics and fecal microbiota transplantation as instruments to modulate the intestinal microbiota seems to be safe and effective in studies on critically ill dogs with acute gastrointestinal diseases.
Article
Oral biofilms or dental plaques are one of the major etiological factors for diverse oral diseases. We aimed to evaluate the effect of a multichannel oral irrigator (MCOI) on periodontal health in 29 participants randomly divided into two groups: the MCOI group and the control group. To evaluate the effect of the MCOI on periodontal health, the modified Quigley Hein Plaque Index (PI), Mühlemann-Son Sulcus Bleeding Index (SBI), bleeding on probing (BOP), and swelling were evaluated and compared before and after MCOI use for 3 days. Although PI and SBI showed statistically significant increases in the control group, the MCOI group showed no significant changes in either parameter. Moreover, the percentage of BOP was significantly lower in the MCOI group. Saliva samples were analyzed by 16S rRNA amplicon sequencing to investigate changes in the oral microbiome. Sequencing results showed that Porphyromonas spp. were significantly increased in the control group, whereas no significant change was detected in the MCOI group. Using the MCOI, enriched populations and functional pathways were detected in pioneer species comprising non-mutans streptococci. These findings provide evidence of the effectiveness of the MCOI in maintaining periodontal health and a healthy microbial ecology in the oral cavity.
Article
Objective: Accumulating evidence from microbial studies has highlighted the modulatory roles of intestinal microbes in numerous human diseases; however, the shared microbial signatures across different diseases remain relatively unclear. Methods: To consolidate existing knowledge across multiple studies, we performed meta-analyses of 17 disease types, covering 34 case-control datasets of 16S rRNA sequencing data, to identify shared alterations amongst different diseases. Furthermore, the impact of a microbial species, L. salivarius, was established in a DSS-induced colitis model and a CII-induced arthritis mouse model. Results: Microbial alterations amongst autoimmune diseases were substantially more consistent than those of other diseases (cancer, metabolic disease and nervous system disease), with microbial signatures exhibiting notable discriminative power for disease prediction. Autoimmune diseases were characterized by the enrichment of Enterococcus, Veillonella, Streptococcus, Lactobacillus, and the depletion of Ruminococcus, Gemmiger, Oscillibacter, Faecalibacterium, Lachnospiracea incertae sedis, Anaerostipes, Coprococcus, Alistipes, Roseburia, Bilophila, Barnesiella, Dorea, Ruminococcus2, Butyricicoccus, Phascolarctobacterium, Parabacteroides and Odoribacter, amongst others. Functional investigation of L. salivarius, whose genus was commonly enriched in numerous autoimmune diseases, demonstrated protective roles in two separate inflammatory mouse models. Conclusion: Our study highlights a strong link between autoimmune diseases and the gut microbiota, with notably more consistent microbial alterations than those of other diseases, indicating that therapeutic strategies which target the gut microbiome may be transferable across different autoimmune diseases. Functional validation of L. salivarius highlighted that bacterial genera associated with disease may not always be antagonistic, but may represent protective or adaptive responses to disease.
Article
The gut microbiome plays a significant role in methamphetamine addiction. Previous studies using short-read amplicon sequencing have described alterations in microbiota at the genus level and predicted function, in which taxonomic resolution is insufficient for accurate functional measurements. To address this limitation, we employed metagenome sequencing to intuitively associate species to functions of gut microbiota in methamphetamine-induced conditioned place preference. We observed differential perturbations of species-level functional profiling of the gut microbiota across phases of METH-induced CPP, with alterations in SCFA metabolism and bacterial motility at the acquisition phase and substance dependence-alcoholism pathway and amino acid metabolism at the extinction phase. Our findings suggest that reduced beneficial bacteria, i.e., Lactobacillus reuteri, contributed to the alteration of SCFA metabolism, while the increased abundance of Akkermansia muciniphila during the extinction phase may be associated with altered phenylalanine, tyrosine, and tryptophan metabolism and substance dependence pathway. Our study further supports the association between specific microbial taxa and METH-induced rewarding.
Article
Methods for assembly, taxonomic profiling and binning are key to interpreting metagenome data, but a lack of consensus about benchmarking complicates performance assessment. The Critical Assessment of Metagenome Interpretation (CAMI) challenge has engaged the global developer community to benchmark their programs on highly complex and realistic data sets, generated from ~700 newly sequenced microorganisms and ~600 novel viruses and plasmids and representing common experimental setups. Assembly and genome binning programs performed well for species represented by individual genomes but were substantially affected by the presence of related strains. Taxonomic profiling and binning programs were proficient at high taxonomic ranks, with a notable performance decrease below family level. Parameter settings markedly affected performance, underscoring their importance for program reproducibility. The CAMI results highlight current challenges but also provide a roadmap for software selection to answer specific research questions.
Article
Background: We introduce DESMAN for De novo Extraction of Strains from MetAgeNomes. Metagenome sequencing generates short reads from throughout the genomes of a microbial community. Increasingly large, multi-sample metagenomes, stratified in space and time, are being generated from communities with thousands of species. Repeats result in fragmentary co-assemblies with potentially millions of contigs. Contigs can be binned into metagenome assembled genomes (MAGs) but strain level variation will remain. DESMAN identifies variants on core genes, then uses co-occurrence across samples to link variants into strain sequences and abundance profiles. These strain profiles are then searched for on non-core genes to determine the accessory genes present in each strain. Results: We validated DESMAN on a synthetic twenty genome community with 64 samples. We could resolve the five E. coli strains present with 99.58% accuracy across core gene variable sites and their gene complement with 95.7% accuracy. Similarly, on real fecal metagenomes from the 2011 E. coli (STEC) O104:H4 outbreak, the outbreak strain was reconstructed with 99.8% core sequence accuracy. Application to an anaerobic digester metagenome time series reveals that strain level variation is endemic with 16 out of 26 MAGs (61.5%) examined exhibiting two strains. In almost all cases the strain proportions were not statistically different between replicate reactors, suggesting intra-species niche partitioning. The only exception was when the two strains had almost identical gene complement and, hence, functional capability. Conclusions: DESMAN will provide a powerful tool for de novo resolution of fine-scale variation in microbial communities. It is available as open source software from https://github.com/chrisquince/DESMAN.
Article
Among the human health conditions linked to microbial communities, phenotypes are often associated with only a subset of strains within causal microbial groups. While it has been critical for decades in microbial physiology to characterize individual strains, this has been challenging when using culture-independent high-throughput metagenomics. We introduce StrainPhlAn, a novel metagenomic strain identification approach, and apply it to characterize the genetic structure of thousands of strains from >125 species in >1,500 gut metagenomes drawn from populations spanning North/South American, European, Asian, and African countries. The method relies on per-sample dominant sequence variant reconstruction within species-specific marker genes. It identified primarily subject-specific strain variants (<5% inter-subject strain sharing), and we determined that a single strain typically dominated each species and was retained over time (for >70% of species). Microbial population structure was correlated in several distinct ways with the geographic structure of the host population. In some cases discrete subspecies (e.g. for Eubacterium rectale and Prevotella copri) or continuous microbial genetic variations (e.g. for Faecalibacterium prausnitzii) were associated with geographically distinct human populations, whereas few strains occurred in multiple unrelated cohorts. We further estimated the genetic variability of gut microbes, with Bacteroides species appearing remarkably consistent (0.45% median number of nucleotide variants between strains) whereas P. copri was among the most plastic gut colonizers. We thus characterize here the population genetics of previously inaccessible intestinal microbes, providing a comprehensive strain-level genetic overview of the gut microbial diversity.
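The idea of "per-sample dominant sequence variant reconstruction" mentioned above can be illustrated with a minimal sketch. This is not StrainPhlAn's implementation; the function name `dominant_consensus`, the pileup representation and the `min_depth` threshold are assumptions made purely for illustration. At each marker position the most frequently observed base is kept, and poorly covered positions are masked.

```python
def dominant_consensus(pileup, min_depth=2):
    """Build a consensus by keeping the most frequent base at each position.

    pileup: list of dicts, one per marker position, mapping base -> read count
            (hypothetical input format, not a StrainPhlAn data structure).
    min_depth: positions covered by fewer reads are masked with 'N'.
    """
    consensus = []
    for counts in pileup:
        depth = sum(counts.values())
        if depth < min_depth:
            consensus.append("N")  # too little evidence to call a base
            continue
        # Iterating over sorted bases makes ties resolve alphabetically,
        # because max() returns the first maximal element it encounters.
        best = max(sorted(counts), key=lambda base: counts[base])
        consensus.append(best)
    return "".join(consensus)


example_pileup = [
    {"A": 9, "G": 1},  # clear dominant allele
    {"C": 1},          # depth below threshold, masked
    {"T": 3, "C": 3},  # tie, resolved alphabetically
    {},                # no coverage, masked
]
print(dominant_consensus(example_pileup))
```

Comparing such per-sample consensus sequences across samples is one simple way to ask whether two metagenomes carry the same dominant strain of a species; a real tool would additionally handle alignment, quality filtering and polymorphic sites.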
Article
Background: Microbiome sequencing projects typically collect tens of millions of short reads per sample. Depending on the goals of the project, the short reads can either be subjected to direct sequence analysis or be assembled into longer contigs. The assembly of whole genomes from metagenomic sequencing reads is a very difficult problem. However, for some questions, only specific genes of interest need to be assembled. This is then a gene-centric assembly where the goal is to assemble reads into contigs for a family of orthologous genes. Methods: We present a new method for performing gene-centric assembly, called protein-alignment-guided assembly, and provide an implementation in our metagenome analysis tool MEGAN. Genes are assembled on the fly, based on the alignment of all reads against a protein reference database such as NCBI-nr. Specifically, the user selects a gene family based on a classification such as KEGG and all reads binned to that gene family are assembled. Results: Using published synthetic community metagenome sequencing reads and a set of 41 gene families, we show that the performance of this approach compares favorably with that of full-featured assemblers and that of a recently published HMM-based gene-centric assembler, both in terms of the number of reference genes detected and of the percentage of reference sequence covered. Conclusions: Protein-alignment-guided assembly of orthologous gene families complements whole-metagenome assembly in a new and very useful way.
Article
We present the Metagenomic Intra-species Diversity Analysis System (MIDAS), which is an integrated computational pipeline for quantifying bacterial species abundance and strain-level genomic variation, including gene content and single-nucleotide polymorphisms (SNPs), from shotgun metagenomes. Our method leverages a database of more than 30,000 bacterial reference genomes that we clustered into species groups. These cover the majority of abundant species in the human microbiome but only a small proportion of microbes in other environments, including soil and seawater. We applied MIDAS to stool metagenomes from 98 Swedish mothers and their infants over one year and used rare SNPs to track strains between hosts. Using this approach, we found that although species compositions of mothers and infants converged over time, strain-level similarity diverged. Specifically, early colonizing bacteria were often transmitted from an infant's mother, while late colonizing bacteria were often transmitted from other sources in the environment and were enriched for spore-formation genes. We also applied MIDAS to 198 globally distributed marine metagenomes and used gene content to show that many prevalent bacterial species have population structure that correlates with geographic location. Strain-level genetic variants present in metagenomes clearly reveal extensive structure and dynamics that are obscured when data are analyzed at a coarser taxonomic resolution.
Article
Shotgun metagenomic analysis of the human associated microbiome provides a rich set of microbial features for prediction and biomarker discovery in the context of human diseases and health conditions. However, the use of such high-resolution microbial features presents new challenges, and validated computational tools for learning tasks are lacking. Moreover, classification rules have scarcely been validated in independent studies, posing questions about the generality and generalization of disease-predictive models across cohorts. In this paper, we comprehensively assess approaches to metagenomics-based prediction tasks and for quantitative assessment of the strength of potential microbiome-phenotype associations. We develop a computational framework for prediction tasks using quantitative microbiome profiles, including species-level relative abundances and presence of strain-specific markers. A comprehensive meta-analysis, with particular emphasis on generalization across cohorts, was performed in a collection of 2424 publicly available metagenomic samples from eight large-scale studies. Cross-validation revealed good disease-prediction capabilities, which were in general improved by feature selection and use of strain-specific markers instead of species-level taxonomic abundance. In cross-study analysis, models transferred between studies were in some cases less accurate than models tested by within-study cross-validation. Interestingly, the addition of healthy (control) samples from other studies to training sets improved disease prediction capabilities. Some microbial species (most notably Streptococcus anginosus) seem to characterize general dysbiotic states of the microbiome rather than connections with a specific disease. Our results in modelling features of the “healthy” microbiome can be considered a first step toward defining general microbial dysbiosis. 
The software framework, microbiome profiles, and metadata for thousands of samples are publicly available at http://segatalab.cibio.unitn.it/tools/metaml.
Article
“Normal” for the gut microbiota: For the benefit of future clinical studies, it is critical to establish what constitutes a “normal” gut microbiome, if it exists at all. Through fecal samples and questionnaires, Falony et al. and Zhernakova et al. targeted general populations in Belgium and the Netherlands, respectively. Gut microbiota composition correlated with a range of factors including diet, use of medication, red blood cell counts, fecal chromogranin A, and stool consistency. The data give some hints for possible biomarkers of normal gut communities. Science, this issue pp. 560 and 565
Article
The tree of life is one of the most important organizing principles in biology [1]. Gene surveys suggest the existence of an enormous number of branches [2], but even an approximation of the full scale of the tree has remained elusive. Recent depictions of the tree of life have focused either on the nature of deep evolutionary relationships [3-5] or on the known, well-classified diversity of life with an emphasis on eukaryotes [6]. These approaches overlook the dramatic change in our understanding of life's diversity resulting from genomic sampling of previously unexamined environments. New methods to generate genome sequences illuminate the identity of organisms and their metabolic capacities, placing them in community and ecosystem contexts [7,8]. Here, we use new genomic data from over 1,000 uncultivated and little known organisms, together with published sequences, to infer a dramatically expanded version of the tree of life, with Bacteria, Archaea and Eukarya included. The depiction is both a global overview and a snapshot of the diversity within each major lineage. The results reveal the dominance of bacterial diversification and underline the importance of organisms lacking isolated representatives, with substantial evolution concentrated in a major radiation of such organisms. This tree highlights major lineages currently underrepresented in biogeochemical models and identifies radiations that are probably important for future evolutionary analyses.
Article
Background: In the last 5 years, the rapid pace of innovations and improvements in sequencing technologies has completely changed the landscape of metagenomic and metagenetic experiments. Therefore, it is critical to benchmark the various methodologies for interrogating the composition of microbial communities, so that we can assess their strengths and limitations. The most common phylogenetic marker for microbial community diversity studies is the 16S ribosomal RNA gene and in the last 10 years the field has moved from sequencing a small number of amplicons and samples to more complex studies where thousands of samples and multiple different gene regions are interrogated.
Article
Profiling microbial community function from metagenomic sequencing data remains a computationally challenging problem. Mapping millions of DNA reads from such samples to reference protein databases requires long run-times, and short read lengths can result in spurious hits to unrelated proteins (loss of specificity). We developed ShortBRED (Short, Better Representative Extract Dataset) to address these challenges, facilitating fast, accurate functional profiling of metagenomic samples. ShortBRED consists of two components: (i) a method that reduces reference proteins of interest to short, highly representative amino acid sequences ("markers") and (ii) a search step that maps reads to these markers to quantify the relative abundance of their associated proteins. After evaluating ShortBRED on synthetic data, we applied it to profile antibiotic resistance protein families in the gut microbiomes of individuals from the United States, China, Malawi, and Venezuela. Our results support antibiotic resistance as a core function in the human gut microbiome, with tetracycline-resistant ribosomal protection proteins and Class A beta-lactamases being the most widely distributed resistance mechanisms worldwide. ShortBRED markers are applicable to other homology-based search tasks, which we demonstrate here by identifying phylogenetic signatures of antibiotic resistance across more than 3,000 microbial isolate genomes. ShortBRED can be applied to profile a wide variety of protein families of interest; the software, source code, and documentation are available for download at http://huttenhower.sph.harvard.edu/shortbred.
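The second component described above (mapping reads to short markers and turning hit counts into relative abundances) depends on correcting for marker length and sequencing depth, since longer markers and deeper samples attract more hits. The following is a generic reads-per-kilobase-per-million (RPKM-style) sketch of that correction, not ShortBRED's own code; the function name `normalize_marker_hits`, the family names and the input dictionaries are hypothetical.

```python
def normalize_marker_hits(hits, marker_length_nt, total_reads):
    """Convert raw read hits per protein family into RPKM-style abundances.

    hits: dict family -> number of reads hitting that family's markers
    marker_length_nt: dict family -> total marker length in nucleotides
    total_reads: number of reads in the sample
    All inputs are illustrative; real tools derive them from alignment output.
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    abundances = {}
    for family, count in hits.items():
        kilobases = marker_length_nt[family] / 1_000
        millions_of_reads = total_reads / 1_000_000
        abundances[family] = count / kilobases / millions_of_reads
    return abundances


sample = normalize_marker_hits(
    hits={"famA": 50, "famB": 50},
    marker_length_nt={"famA": 500, "famB": 1_000},
    total_reads=2_000_000,
)
print(sample)
```

In this toy sample both families receive 50 hits, but `famA` has half the marker length and therefore twice the normalized abundance, which is the effect the length correction is meant to capture.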
Article
Identifying bacterial strains in metagenome and microbiome samples using computational analyses of short-read sequences remains a difficult problem. Here, we present an analysis of a human gut microbiome using TruSeq synthetic long reads combined with computational tools for metagenomic long-read assembly, variant calling and haplotyping (Nanoscope and Lens). Our analysis identifies 178 bacterial species, of which 51 were not found using shotgun reads alone. We recover bacterial contigs that comprise multiple operons, including 22 contigs of >1 Mbp. Furthermore, we observe extensive intraspecies variation within microbial strains in the form of haplotypes that span up to hundreds of Kbp. Incorporation of synthetic long-read sequencing technology with standard short-read approaches enables more precise and comprehensive analyses of metagenomic samples.
Article
Significance: The field of microbiome research is moving from 16S rDNA gene sequencing to metagenomic sequencing of complete communities, which clearly gives a more comprehensive genomic and functional representation of the organisms present. Here we describe, quantify, and compare biases associated with four currently available next-generation sequencing library preparation methods using a synthetic DNA mock community and an extraction spike-in control of microbial cells. Our study highlights a critical need for consistency in protocols and data analysis procedures, especially when attempting to interpret human microbiome data for human health.
Article
Advances in high-throughput sequencing and ‘omics technologies are revolutionizing studies of naturally occurring microbial communities. Comprehensive investigations of microbial lifestyles require the ability to interactively organize and visualize genetic information and to incorporate subtle differences that enable greater resolution of complex data. Here we introduce anvi’o, an advanced analysis and visualization platform that offers automated and human-guided characterization of microbial genomes in metagenomic assemblies, with interactive interfaces that can link ‘omics data from multiple sources into a single, intuitive display. Its extensible visualization approach distills multiple dimensions of information about each contig, offering a dynamic and unified work environment for data exploration, manipulation, and reporting. Using anvi’o, we re-analyzed publicly available datasets and explored temporal genomic changes within naturally occurring microbial populations through de novo characterization of single nucleotide variations, and linked cultivar and single-cell genomes with metagenomic and metatranscriptomic data. Anvi’o is an open-source platform that empowers researchers without extensive bioinformatics skills to perform and communicate in-depth analyses on large ‘omics datasets.
Article
Assessment and characterization of gut microbiota has become a major research area in human disease, including type 2 diabetes, the most prevalent endocrine disease worldwide. To carry out analysis on gut microbial content in patients with type 2 diabetes, we developed a protocol for a metagenome-wide association study (MGWAS) and undertook a two-stage MGWAS based on deep shotgun sequencing of the gut microbial DNA from 345 Chinese individuals. We identified and validated approximately 60,000 type-2-diabetes-associated markers and established the concept of a metagenomic linkage group, enabling taxonomic species-level analyses. MGWAS analysis showed that patients with type 2 diabetes were characterized by a moderate degree of gut microbial dysbiosis, a decrease in the abundance of some universal butyrate-producing bacteria and an increase in various opportunistic pathogens, as well as an enrichment of other microbial functions conferring sulphate reduction and oxidative stress resistance. An analysis of 23 additional individuals demonstrated that these gut microbial markers might be useful for classifying type 2 diabetes.
Article
Full-text available
We established a catalog of the mouse gut metagenome comprising ∼2.6 million nonredundant genes by sequencing DNA from fecal samples of 184 mice. To secure high microbiome diversity, we used mouse strains of diverse genetic backgrounds, from different providers, kept in different housing laboratories and fed either a low-fat or high-fat diet. Similar to the human gut microbiome, >99% of the cataloged genes are bacterial. We identified 541 metagenomic species and defined a core set of 26 metagenomic species found in 95% of the mice. The mouse gut microbiome is functionally similar to its human counterpart, with 95.2% of its Kyoto Encyclopedia of Genes and Genomes (KEGG) orthologous groups in common. However, only 4.0% of the mouse gut microbial genes were shared (95% identity, 90% coverage) with those of the human gut microbiome. This catalog provides a useful reference for future studies.
Article
Full-text available
Grouping large genomic fragments assembled from shotgun metagenomic sequences to deconvolute complex microbial communities, or metagenome binning, enables the study of individual organisms and their interactions. Because of the complex nature of these communities, existing metagenome binning methods often miss a large number of microbial species. In addition, most of the tools are not scalable to large datasets. Here we introduce automated software called MetaBAT that integrates empirical probabilistic distances of genome abundance and tetranucleotide frequency for accurate metagenome binning. MetaBAT outperforms alternative methods in accuracy and computational efficiency on both synthetic and real metagenome datasets. It automatically forms hundreds of high-quality genome bins on a very large assembly consisting of millions of contigs in a matter of hours on a single node. MetaBAT is open source software and available at https://bitbucket.org/berkeleylab/metabat.
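To make the composition signal mentioned in this abstract concrete, here is a minimal Python sketch that computes the tetranucleotide frequency vector of a single contig. It is illustrative only: the function name is hypothetical, and it does not reproduce MetaBAT's empirical probabilistic distances or its use of abundance; it only shows the raw feature that such distances could be computed from.

```python
from collections import Counter
from itertools import product


def tetranucleotide_frequencies(sequence):
    """Return relative frequencies of all 256 tetranucleotides in a contig.

    Hypothetical helper for illustration; windows containing characters
    other than A, C, G, T (for example N) are ignored.
    """
    seq = sequence.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=4)]
    # Count every overlapping window of length 4.
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    total = sum(counts[k] for k in kmers)
    if total == 0:
        return {k: 0.0 for k in kmers}
    return {k: counts[k] / total for k in kmers}


freqs = tetranucleotide_frequencies("ACGTACGT")
print(len(freqs), freqs["ACGT"])
```

Contigs from the same genome tend to have similar vectors, which is why composition is commonly combined with abundance when binning.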
Article
Full-text available
Targeted manipulation of the gut flora is increasingly being recognized as a means to improve human health. Yet, the temporal dynamics and intra- and interindividual heterogeneity of the microbiome represent experimental limitations, especially in human cross-sectional studies. Therefore, rodent models represent an invaluable tool to study the host-microbiota interface. Progress in technical and computational tools to investigate the composition and function of the microbiome has opened a new era of research and we gradually begin to understand the parameters that influence variation of host-associated microbial communities. To isolate true effects from confounding factors, it is essential to include such parameters in model intervention studies. Also, explicit journal instructions to include essential information on animal experiments are mandatory. The purpose of this review is to summarize the factors that influence microbiota composition in mice and to provide guidelines to improve the reproducibility of animal experiments.
Article
Full-text available
Metagenomic sequencing increased our understanding of the role of the microbiome in health and disease, yet it only provides a snapshot of a highly dynamic ecosystem. Here, we show that the pattern of metagenomic sequencing read coverage for different microbial genomes contains a single trough and a single peak, the latter coinciding with the bacterial origin of replication. Furthermore, the ratio of sequencing coverage between the peak and trough provides a quantitative measure of a species’ growth rate. We demonstrate this in vitro and in vivo, under different growth conditions, and in complex bacterial communities. For several bacterial species, peak-to-trough coverage ratios, but not relative abundances, correlated with the manifestation of inflammatory bowel disease and type II diabetes.
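The peak-to-trough idea in this abstract can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the authors' method: coverage is assumed to be already binned along a circular chromosome, smoothing is a plain circular moving average, and the function name and `window` parameter are hypothetical.

```python
def peak_to_trough_ratio(coverage, window=3):
    """Ratio of the highest to the lowest smoothed coverage value.

    `coverage` is a list of binned read depths along one genome, assumed
    circular; `window` is the odd width of the moving average. Both are
    illustrative choices, not parameters taken from the study.
    """
    n = len(coverage)
    if n == 0:
        raise ValueError("coverage must not be empty")
    half = window // 2
    smoothed = []
    for i in range(n):
        # Wrap around the ends because the genome is treated as circular.
        neighbours = [coverage[(i + j) % n] for j in range(-half, half + 1)]
        smoothed.append(sum(neighbours) / len(neighbours))
    trough = min(smoothed)
    if trough <= 0:
        raise ValueError("trough coverage must be positive")
    return max(smoothed) / trough


print(peak_to_trough_ratio([10, 20, 10, 5], window=1))
```

A ratio near 1 corresponds to flat coverage, while larger values indicate more reads near the replication origin than near the terminus.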
Article
Full-text available
In comparative high-throughput sequencing assays, a fundamental task is the analysis of count data, such as read counts per gene in RNA-seq, for evidence of systematic changes across experimental conditions. Small replicate numbers, discreteness, large dynamic range and the presence of outliers require a suitable statistical approach. We present DESeq2, a method for differential analysis of count data, using shrinkage estimation for dispersions and fold changes to improve stability and interpretability of estimates. This enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression. The DESeq2 package is available at http://www.bioconductor.org/packages/release/bioc/html/DESeq2.html.
Article
Full-text available
The increased availability of genomic and metagenomic data poses challenges at multiple analysis levels, including visualization of very large-scale microbial and microbial community data paired with rich metadata. We developed GraPhlAn (Graphical Phylogenetic Analysis), a computational tool that produces high-quality, compact visualizations of microbial genomes and metagenomes. This includes phylogenies spanning up to thousands of taxa, annotated with metadata ranging from microbial community abundances to microbial physiology or host and environmental phenotypes. GraPhlAn has been developed as an open-source command-driven tool in order to be easily integrated into complex, publication-quality bioinformatics pipelines. It can be executed either locally or through an online Galaxy web application. We present several examples including taxonomic and phylogenetic visualization of microbial communities, metabolic functions, and biomarker discovery that illustrate GraPhlAn's potential for modern microbial and community genomics.
Article
Full-text available
A prominent feature of the bacterial domain is a radiation of major lineages that are defined as candidate phyla because they lack isolated representatives. Bacteria from these phyla occur in diverse environments and are thought to mediate carbon and hydrogen cycles. Genomic analyses of a few representatives suggested that metabolic limitations have prevented their cultivation. Here we reconstructed 8 complete and 789 draft genomes from bacteria representing >35 phyla and documented features that consistently distinguish these organisms from other bacteria. We infer that this group, which may comprise >15% of the bacterial domain, has shared evolutionary history, and describe it as the candidate phyla radiation (CPR). All CPR genomes are small and most lack numerous biosynthetic pathways. Owing to divergent 16S ribosomal RNA (rRNA) gene sequences, 50-100% of organisms sampled from specific phyla would evade detection in typical cultivation-independent surveys. CPR organisms often have self-splicing introns and proteins encoded within their rRNA genes, a feature rarely reported in bacteria. Furthermore, they have unusual ribosome compositions. All are missing a ribosomal protein often absent in symbionts, and specific lineages are missing ribosomal proteins and biogenesis factors considered universal in bacteria. This implies different ribosome structures and biogenesis mechanisms, and underlines unusual biology across a large part of the bacterial domain.
Article
Full-text available
Whole-genome sequencing has become an indispensable tool of modern biology. However, the cost of sample preparation relative to the cost of sequencing remains high, especially for small genomes where the former is dominant. Here we present a protocol for rapid and inexpensive preparation of hundreds of multiplexed genomic libraries for Illumina sequencing. By carrying out the Nextera tagmentation reaction in small volumes, replacing costly reagents with cheaper equivalents, and omitting unnecessary steps, we achieve a cost of library preparation of $8 per sample, approximately 6 times cheaper than the standard Nextera XT protocol. Furthermore, our procedure takes less than 5 hours for 96 samples. Several hundred samples can then be pooled on the same HiSeq lane via custom barcodes. Our method will be useful for re-sequencing of microbial or viral genomes, including those from evolution experiments, genetic screens, and environmental samples, as well as for other sequencing applications including large amplicon, open chromosome, artificial chromosomes, and RNA sequencing.
Article
Full-text available
Whole-genome amplification (WGA) has become an important tool to explore the genomic information of microorganisms in an environmental sample with limited biomass; however, potential selective biases during the amplification processes are poorly understood. Here, we describe the effects of WGA on 31 different microbial communities from five biotopes that also included low-biomass samples from drinking water and groundwater. Our findings provide evidence that microbiome segregation by biotope was possible despite WGA treatment. Nevertheless, samples from different biotopes revealed different levels of distortion, with genomic GC content significantly correlated with WGA perturbation. Certain phylogenetic clades revealed a homogeneous trend across various sample types; for instance, Alpha- and Betaproteobacteria showed a decrease in their abundance after WGA treatment. On the other hand, Enterobacteriaceae, an important biomarker group for fecal contamination in groundwater and drinking water, were strongly affected by WGA treatment without a predictable pattern. These novel results describe the impact of WGA on low-biomass samples and may highlight issues to be aware of when designing future metagenomic studies that necessitate preceding WGA treatment.
Article
Full-text available
Microbes are dominant drivers of biogeochemical processes, yet drawing a global picture of functional diversity, microbial community structure, and their ecological determinants remains a grand challenge. We analyzed 7.2 terabases of metagenomic data from 243 Tara Oceans samples from 68 locations in epipelagic and mesopelagic waters across the globe to generate an ocean microbial reference gene catalog with >40 million nonredundant, mostly novel sequences from viruses, prokaryotes, and picoeukaryotes. Using 139 prokaryote-enriched samples, containing >35,000 species, we show vertical stratification with epipelagic community composition mostly driven by temperature rather than other environmental factors or geography. We identify ocean microbial core functionality and reveal that >73% of its abundance is shared with the human gut microbiome despite the physicochemical differences between these two ecosystems.
Article
Full-text available
Despite extensive direct sequencing efforts and advanced analytical tools, reconstructing microbial genomes from soil using metagenomics has been challenging due to the tremendous diversity and relatively uniform distribution of genomes found in this system. Here we used enrichment techniques in an attempt to decrease the complexity of a soil microbiome prior to sequencing by submitting it to a range of physical and chemical stresses in 23 separate microcosms for 4 months. The metagenomic analysis of these microcosms at the end of the treatment yielded 540 Mb of assembly using standard de novo assembly techniques (a total of 559,555 genes and 29,176 functions), from which we could recover novel bacterial genomes, plasmids and phages. The recovered genomes belonged to Leifsonia (n = 2), Rhodanobacter (n = 5), Acidobacteria (n = 2), Sporolactobacillus (n = 2, novel nitrogen fixing taxon), Ktedonobacter (n = 1, second representative of the family Ktedonobacteraceae), Streptomyces (n = 3, novel polyketide synthase modules), and Burkholderia (n = 2, includes mega-plasmids conferring mercury resistance). Assembled genomes averaged 5.9 Mb, with relative abundances ranging from rare (<0.0001%) to relatively abundant (>0.01%) in the original soil microbiome. Furthermore, we detected them in samples collected from geographically distant locations, more often in temperate soils than in samples originating from high-latitude soils and deserts. To the best of our knowledge, this study is the first successful attempt to assemble multiple bacterial genomes directly from a soil sample. Our findings demonstrate that developing pertinent enrichment conditions can stimulate environmental genomic discoveries that would have been impossible to achieve with canonical approaches that focus solely upon post-sequencing data treatment.
Article
Full-text available
Accurate evaluation of microbial communities is essential for understanding global biogeochemical processes, and can guide bioremediation and medical treatments. Metagenomics is most commonly used to analyze microbial diversity and metabolic potential, but assemblies of the short reads generated by current sequencing platforms may fail to recover heterogeneous strain populations and rare organisms. Here we used short and long (multi-kb) synthetic reads to evaluate strain heterogeneity and study microorganisms at low abundance in complex microbial communities from terrestrial sediments. The long read data revealed multiple (probably dozens of) closely related species and strains from previously undescribed Deltaproteobacteria and Aminicenantes (candidate phylum OP8). Notably, these are the most abundant organisms in the communities, yet short-read assemblies achieved only partial genome coverage, mostly in the form of short scaffolds (N50 ≈ 2,200 bp). Genome architecture and metabolic potential for these lineages were reconstructed using a new synteny-based method. Analysis of long read data also revealed thousands of species, whose abundances were <0.1%, in all samples. Most of the organisms in this 'long tail' of rare organisms belong to phyla that are also represented by abundant organisms. Genes encoding glycosyl hydrolases are significantly more abundant than expected in rare genomes, suggesting that rare species may augment the capability for carbon turnover and confer resilience to changing environmental conditions. Overall, the study showed that a diversity of closely related strains and rare organisms account for a major portion of the communities. These are probably common features of many microbial communities, and can be effectively studied using a combination of long and short reads.
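For readers unfamiliar with the N50 statistic quoted in this abstract, the following Python sketch shows the usual definition: the length of the scaffold at which the cumulative length, counted from the longest scaffold downwards, first reaches half of the total assembly size. The function name is hypothetical and the input lengths are made-up illustration values, not data from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold or contig lengths."""
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence to the shortest.
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop once at least half of the assembly is covered.
        if running * 2 >= total:
            return length
    raise ValueError("lengths must not be empty")


print(n50([2, 3, 4, 5, 6]))
```

A low N50, such as the roughly 2,200 bp reported here, means that half of the assembled bases sit in scaffolds no longer than that value, which is the fragmentation the abstract describes.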
Article
Full-text available
The accessibility of high-throughput sequencing has revolutionized many fields of biology. In order to better understand host-associated viral and microbial communities, a comprehensive workflow for DNA and RNA extraction was developed. The workflow concurrently generates viral and microbial metagenomes, as well as metatranscriptomes, from a single sample for next-generation sequencing. The coupling of these approaches provides an overview of both the taxonomical characteristics and the community encoded functions. The presented methods use Cystic Fibrosis (CF) sputum, a problematic sample type, because it is exceptionally viscous and contains high amounts of mucins, free neutrophil DNA, and other unknown contaminants. The protocols described here target these problems and successfully recover viral and microbial DNA with minimal human DNA contamination. To complement the metagenomics studies, a metatranscriptomics protocol was optimized to recover both microbial and host mRNA that contains relatively few ribosomal RNA (rRNA) sequences. An overview of the data characteristics is presented to serve as a reference for assessing the success of the methods. Additional CF sputum samples were also collected to (i) evaluate the consistency of the microbiome profiles across seven consecutive days within a single patient, and (ii) compare the consistency of the metagenomic approach with 16S ribosomal RNA gene-based sequencing. The results showed that daily fluctuation of microbial profiles without antibiotic perturbation was minimal and the taxonomy profiles of the common CF-associated bacteria were highly similar between the 16S rDNA libraries and metagenomes generated from the hypotonic lysis (HL)-derived DNA. However, the differences between 16S rDNA taxonomical profiles generated from total DNA and HL-derived DNA suggest that hypotonic lysis and the washing steps help to remove not only the human-derived DNA but also microbial-derived extracellular DNA that may misrepresent the actual microbial profiles.
Article
Full-text available
Several bacterial species have been implicated in the development of colorectal carcinoma (CRC), but CRC-associated changes of fecal microbiota and their potential for cancer screening remain to be explored. Here, we used metagenomic sequencing of fecal samples to identify taxonomic markers that distinguished CRC patients from tumor-free controls in a study population of 156 participants. Accuracy of metagenomic CRC detection was similar to the standard fecal occult blood test (FOBT) and when both approaches were combined, sensitivity improved > 45% relative to the FOBT, while maintaining its specificity. Accuracy of metagenomic CRC detection did not differ significantly between early- and late-stage cancer and could be validated in independent patient and control populations (N = 335) from different countries. CRC-associated changes in the fecal microbiome at least partially reflected microbial community composition at the tumor itself, indicating that observed gene pool differences may reveal tumor-related host–microbe interactions. Indeed, we deduced a metabolic shift from fiber degradation in controls to utilization of host carbohydrates and amino acids in CRC patients, accompanied by an increase of lipopolysaccharide metabolism.