Supervised classification of human microbiota

Department of Computer Science, University of Colorado, Boulder, CO, USA.
FEMS Microbiology Reviews (Impact Factor: 13.24). 09/2010; 35(2):343-59. DOI: 10.1111/j.1574-6976.2010.00251.x
Source: PubMed


Recent advances in DNA sequencing technology have allowed the collection of high-dimensional data from human-associated microbial communities on an unprecedented scale. A major goal of these studies is the identification of important groups of microorganisms that vary according to physiological or disease states in the host, but the incidence of rare taxa and the large numbers of taxa observed make that goal difficult to obtain using traditional approaches. Fortunately, similar problems have been addressed by the machine learning community in other fields of study such as microarray analysis and text classification. In this review, we demonstrate that several existing supervised classifiers can be applied effectively to microbiota classification, both for selecting subsets of taxa that are highly discriminative of the type of community, and for building models that can accurately classify unlabeled data. To encourage the development of new approaches to supervised classification of microbiota, we discuss several structures inherent in microbial community data that may be available for exploitation in novel approaches, and we include as supplemental information several benchmark classification tasks for use by the community.

    • "Both settings are commonly observed in metagenomics. For example, recent works have examined supervised learning of "microbiome phenotypes" [12], where labeled data are available; and other recent works have examined using unsupervised learning for developing binning algorithms for clustering sequences with a high level of similarity [13]. The primary advantage of machine learning is that once a model is learned from training data, it can be used to predict the classification of future data, whose labels are unknown."
    ABSTRACT: Recent advances in machine learning, specifically in deep learning with neural networks, have made a profound impact on fields such as natural language processing, image classification, and language modeling; however, the feasibility and potential benefits of these approaches for metagenomic data analysis have been largely under-explored. Deep learning exploits many layers of learning nonlinear feature representations, typically in an unsupervised fashion, and recent results have shown outstanding generalization performance on previously unseen data. Furthermore, some deep learning methods can also represent the structure in a data set. Consequently, deep learning and neural networks may prove to be an appropriate approach for metagenomic data. To determine whether such approaches are indeed appropriate for metagenomics, we experiment with two deep learning methods: (i) a deep belief network, and (ii) a recursive neural network, the latter of which provides a tree representing the structure of the data. We compare these approaches to the standard multilayer perceptron, which has been well-established in the machine learning community as a powerful prediction algorithm, though its presence is largely missing in the metagenomics literature. We find that traditional neural networks can be quite powerful classifiers on metagenomic data compared to baseline methods, such as random forests. On the other hand, while the deep learning approaches did not result in improvements to the classification accuracy, they do provide the ability to learn hierarchical representations of a data set that standard classification methods do not allow. Our goal in this effort is not to determine the best algorithm in terms of accuracy, as that depends on the specific application, but rather to highlight the benefits and drawbacks of each of the approaches we discuss and provide insight on how they can be improved for predictive metagenomic analysis.
    IEEE Transactions on NanoBioscience 08/2015; 14(6). DOI:10.1109/TNB.2015.2461219 · 2.31 Impact Factor
    • "munities. Note that these counts of shared microbial OTUs are sensitive to sampling effort; more extensive sampling of water and invertebrate microbiota would presumably reveal additional microbes and might therefore increase these estimates of overlap with the fish gut microbiota. We then used Bayesian community-level source tracking (Knights et al., 2011) to estimate how much of the stickleback gut microbiota is from water, or invertebrate prey sources (after filtering OTUs found in fewer than 1% of all samples): on average 12.6% of the fish gut microbiota was from water sources, 73.3% from prey sources and 14.1% unknown (<0.05% of all samples came from presumed human gut"
    ABSTRACT: To explain differences in gut microbial communities we must determine how processes regulating microbial community assembly (colonization, persistence) differ among hosts and affect microbiota composition. We surveyed the gut microbiota of threespine stickleback (Gasterosteus aculeatus) from 10 geographically clustered populations and sequenced environmental samples to track potential colonizing microbes and quantify the effects of host environment and genotype. Gut microbiota composition and diversity varied among populations. These among-population differences were associated with multiple covarying ecological variables: habitat type (lake, stream, estuary), lake geomorphology and food- (but not water-) associated microbiota. Fish genotype also covaried with gut microbiota composition; more genetically divergent populations exhibited more divergent gut microbiota. Our results suggest that population level differences in stickleback gut microbiota may depend more on internal sorting processes (host genotype) than on colonization processes (transient environmental effects).
    The ISME Journal 04/2015; DOI:10.1038/ismej.2015.64 · 9.30 Impact Factor
    • "Comparisons of alpha diversities were based on averages of 1000 rarefactions. Random forests supervised learning classification as implemented in QIIME (Knights et al. 2011) as well as ANOSIM and PERMANOVA tests of Bray-Curtis, weighted UniFrac, and unweighted UniFrac beta diversity metrics were used to compare community level differences between treatments. Individual OTUs that appeared in at least 25% of samples were examined for relative abundance differences between treatments using ANOVA with Bonferroni correction."
    ABSTRACT: The recent development of methods applying next-generation sequencing to microbial community characterization has led to the proliferation of these studies in a wide variety of sample types. Yet, variation in the physical properties of environmental samples demands that optimal DNA extraction techniques be explored for each new environment. The microbiota associated with many species of insects offer an extraction challenge as they are frequently surrounded by an armored exoskeleton, inhibiting disruption of the tissues within. In this study, we examine the efficacy of several commonly used protocols for extracting bacterial DNA from ants. While bacterial community composition recovered using Illumina 16S rRNA amplicon sequencing was not detectably biased by any method, the quantity of bacterial DNA varied drastically, reducing the number of samples that could be amplified and sequenced. These results indicate that the concentration necessary for dependable sequencing is around 10,000 copies of target DNA per microliter. Exoskeletal pulverization and tissue digestion increased the reliability of extractions, suggesting that these steps should be included in any study of insect-associated microorganisms that relies on obtaining microbial DNA from intact body segments. Although laboratory and analysis techniques should be standardized across diverse sample types as much as possible, minimal modifications such as these will increase the number of environments in which bacterial communities can be successfully studied.
    MicrobiologyOpen 09/2014; 3(6). DOI:10.1002/mbo3.216 · 2.21 Impact Factor