Supervised classification of human microbiota
Department of Computer Science, University of Colorado, Boulder, CO, USA.
FEMS Microbiology Reviews (Impact Factor: 13.24). 09/2010; 35(2):343-59. DOI: 10.1111/j.1574-6976.2010.00251.x
Recent advances in DNA sequencing technology have allowed the collection of high-dimensional data from human-associated microbial communities on an unprecedented scale. A major goal of these studies is the identification of important groups of microorganisms that vary according to physiological or disease states in the host, but the incidence of rare taxa and the large numbers of taxa observed make that goal difficult to obtain using traditional approaches. Fortunately, similar problems have been addressed by the machine learning community in other fields of study such as microarray analysis and text classification. In this review, we demonstrate that several existing supervised classifiers can be applied effectively to microbiota classification, both for selecting subsets of taxa that are highly discriminative of the type of community, and for building models that can accurately classify unlabeled data. To encourage the development of new approaches to supervised classification of microbiota, we discuss several structures inherent in microbial community data that may be available for exploitation in novel approaches, and we include as supplemental information several benchmark classification tasks for use by the community.
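As a rough illustration of what "selecting subsets of taxa that are highly discriminative" can mean in practice, the sketch below ranks taxa with a simple filter score (difference in class means divided by the summed within-class spread). This is not a method taken from the review; the function name `rank_taxa`, the taxon names, and the abundance values are all hypothetical and chosen only to make the example self-contained.

```python
from statistics import mean, pstdev


def rank_taxa(samples, labels):
    """Rank taxa by a simple two-class separation score (illustrative only).

    samples: list of dicts mapping taxon name -> relative abundance
    labels:  list of class labels with exactly two distinct values
    """
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError("this sketch handles exactly two classes")

    # Collect every taxon seen in any sample; absent taxa count as 0.0.
    taxa = sorted({taxon for sample in samples for taxon in sample})

    scores = {}
    for taxon in taxa:
        a = [s.get(taxon, 0.0) for s, y in zip(samples, labels) if y == classes[0]]
        b = [s.get(taxon, 0.0) for s, y in zip(samples, labels) if y == classes[1]]
        # Small constant avoids division by zero for taxa with no variation.
        spread = pstdev(a) + pstdev(b) + 1e-9
        scores[taxon] = abs(mean(a) - mean(b)) / spread

    # Highest score first: most discriminative taxa lead the list.
    return sorted(taxa, key=lambda t: scores[t], reverse=True)


# Hypothetical toy data: four samples, two community types, three taxa.
toy_samples = [
    {"taxon_A": 0.60, "taxon_B": 0.05, "rare_taxon": 0.00},
    {"taxon_A": 0.55, "taxon_B": 0.10, "rare_taxon": 0.01},
    {"taxon_A": 0.10, "taxon_B": 0.50, "rare_taxon": 0.00},
    {"taxon_A": 0.15, "taxon_B": 0.45, "rare_taxon": 0.00},
]
toy_labels = ["type_1", "type_1", "type_2", "type_2"]

ranking = rank_taxa(toy_samples, toy_labels)
print(ranking)
```

A filter score like this treats each taxon independently; classifier-based selection would usually account for interactions among taxa as well.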
- "Both settings are commonly observed in metagenomics. For example, recent works have examined supervised learning of "microbiome phenotypes", where labeled data are available; and other recent works have examined using unsupervised learning to develop binning algorithms for clustering sequences with a high level of similarity. The primary advantage of machine learning is that once a model is learned from training data, it can be used to predict the classification of future data, whose labels are unknown."
ABSTRACT: Recent advances in machine learning, specifically in deep learning with neural networks, have made a profound impact on fields such as natural language processing, image classification, and language modeling; however, the feasibility and potential benefits of these approaches for metagenomic data analysis have been largely under-explored. Deep learning exploits many layers of learning nonlinear feature representations, typically in an unsupervised fashion, and recent results have shown outstanding generalization performance on previously unseen data. Furthermore, some deep learning methods can also represent the structure in a data set. Consequently, deep learning and neural networks may prove to be an appropriate approach for metagenomic data. To determine whether such approaches are indeed appropriate for metagenomics, we experiment with two deep learning methods: (i) a deep belief network, and (ii) a recursive neural network, the latter of which provides a tree representing the structure of the data. We compare these approaches to the standard multilayer perceptron, which has been well-established in the machine learning community as a powerful prediction algorithm, though its presence is largely missing in the metagenomics literature. We find that traditional neural networks can be quite powerful classifiers on metagenomic data compared to baseline methods, such as random forests. On the other hand, while the deep learning approaches did not result in improvements to the classification accuracy, they do provide the ability to learn hierarchical representations of a data set that standard classification methods do not allow. Our goal in this effort is not to determine the best algorithm in terms of accuracy, as that depends on the specific application, but rather to highlight the benefits and drawbacks of each of the approaches we discuss and provide insight on how they can be improved for predictive metagenomic analysis.
- "[com]munities. Note that these counts of shared microbial OTUs are sensitive to sampling effort; more extensive sampling of water and invertebrate microbiota would presumably reveal additional microbes and might therefore increase these estimates of overlap with the fish gut microbiota. We then used Bayesian community-level source tracking (Knights et al., 2011) to estimate how much of the stickleback gut microbiota is from water or invertebrate prey sources (after filtering OTUs found in fewer than 1% of all samples): on average 12.6% of the fish gut microbiota was from water sources, 73.3% from prey sources and 14.1% unknown (<0.05% of all samples came from presumed human gut"
ABSTRACT: To explain differences in gut microbial communities we must determine how processes regulating microbial community assembly (colonization, persistence) differ among hosts and affect microbiota composition. We surveyed the gut microbiota of threespine stickleback (Gasterosteus aculeatus) from 10 geographically clustered populations and sequenced environmental samples to track potential colonizing microbes and quantify the effects of host environment and genotype. Gut microbiota composition and diversity varied among populations. These among-population differences were associated with multiple covarying ecological variables: habitat type (lake, stream, estuary), lake geomorphology and food- (but not water-) associated microbiota. Fish genotype also covaried with gut microbiota composition; more genetically divergent populations exhibited more divergent gut microbiota. Our results suggest that population level differences in stickleback gut microbiota may depend more on internal sorting processes (host genotype) than on colonization processes (transient environmental effects).
- "We compared OTU frequencies between preservation methods using a Kruskal–Wallis test and FDR corrected p-values (QIIME script: group_significance.py). Supervised learning analyses were also performed in QIIME to determine if preservation method or week affected microbial composition (OTUs) (Breiman, 2001; Knights et al., 2011). This analysis uses 80% of the data as a training set, and 20% of the data as a test set. "
ABSTRACT: Studies of the gut microbiome have become increasingly common with recent technological advances. Gut microbes play an important role in human and animal health, and gut microbiome analysis holds great potential for evaluating health in wildlife, as microbiota can be assessed from non-invasively collected fecal samples. However, many common fecal preservation protocols (e.g. freezing at -80°C) are not suitable for field conditions, or have not been tested for long-term (greater than 2 weeks) storage. In this study, we collected fresh fecal samples from captive spider monkeys (Ateles geoffroyi) at the Columbian Park Zoo (Lafayette, IN, USA). The samples were pooled, homogenized, and preserved for up to 8 weeks prior to DNA extraction and sequencing. Preservation methods included: freezing at -20°C, freezing at -80°C, immersion in 100% ethanol, application to FTA cards, and immersion in RNAlater. At 0 (fresh), 1, 2, 4, and 8 weeks from fecal collection, DNA was extracted and microbial DNA was amplified and sequenced. DNA concentration, purity, microbial diversity, and microbial composition were compared across all methods and time points. DNA concentration and purity did not correlate with microbial diversity or composition. Microbial composition of frozen and ethanol samples was most similar to that of fresh samples. FTA card and RNAlater-preserved samples had the least similar microbial composition and abundance compared to fresh samples. Microbial composition and diversity were relatively stable over time within each preservation method. Based on these results, if freezers are not available, we recommend preserving fecal samples in ethanol (for up to 8 weeks) prior to microbial extraction and analysis. Copyright © 2015. Published by Elsevier B.V.
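The excerpt above mentions training on 80% of the data and testing on the remaining 20%. A minimal sketch of that kind of holdout evaluation follows. It does not reproduce the QIIME workflow or the random forest cited there; a nearest-centroid classifier is swapped in so the example needs only the Python standard library. The helper names, the group labels, and the two-feature toy data are all hypothetical.

```python
import random


def split_80_20(n_items, seed=0):
    """Return (train_indices, test_indices) for a shuffled 80/20 holdout."""
    rng = random.Random(seed)
    indices = list(range(n_items))
    rng.shuffle(indices)
    cut = (n_items * 4) // 5  # integer arithmetic: 80% of the items
    return indices[:cut], indices[cut:]


def fit_centroids(rows, labels):
    """Compute the per-class mean vector (stand-in for a real classifier)."""
    sums, counts = {}, {}
    for row, label in zip(rows, labels):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {c: [v / counts[c] for v in acc] for c, acc in sums.items()}


def predict(centroids, row):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    return min(
        centroids,
        key=lambda c: sum((a - b) ** 2 for a, b in zip(centroids[c], row)),
    )


# Hypothetical toy data: 20 samples, 2 features, 2 well-separated groups.
X = [[0.70 + 0.01 * i, 0.30 - 0.01 * i] for i in range(10)]
X += [[0.20 + 0.01 * i, 0.80 - 0.01 * i] for i in range(10)]
y = ["group_1"] * 10 + ["group_2"] * 10

train_idx, test_idx = split_80_20(len(X), seed=0)
model = fit_centroids([X[i] for i in train_idx], [y[i] for i in train_idx])

# Accuracy is measured only on the held-out 20%, never on training samples.
correct = sum(predict(model, X[i]) == y[i] for i in test_idx)
accuracy = correct / len(test_idx)
print(len(train_idx), len(test_idx), accuracy)
```

With so few test samples, a single split gives a noisy accuracy estimate, so repeating the split with different seeds or using cross-validation is a common refinement.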