Extrapolation of Urn Models via Poissonization: Accurate Measurements of the Microbial Unknown

Department of Applied Mathematics, University of Colorado, Boulder, Colorado, United States of America.
PLoS ONE (Impact Factor: 3.23). 06/2011; 6(6):e21105. DOI: 10.1371/journal.pone.0021105
Source: PubMed


The availability of high-throughput parallel methods for sequencing microbial communities is increasing our knowledge of the microbial world at an unprecedented rate. Though most attention has focused on determining lower bounds on the α-diversity, i.e., the total number of different species present in the environment, tight bounds on this quantity may be highly uncertain because a small fraction of the environment could be composed of a vast number of different species. To better assess what remains unknown, we propose instead to predict the fraction of the environment that belongs to unsampled classes. Modeling samples as draws with replacement of colored balls from an urn with an unknown composition, and under the sole assumption that there are still undiscovered species, we show that conditionally unbiased predictors and exact prediction intervals (of constant length in logarithmic scale) are possible for the fraction of the environment that belongs to unsampled classes. Our predictions are based on a poissonization argument, which we have implemented in what we call the Embedding algorithm. For fixed (i.e., non-randomized) sample sizes, the algorithm leads to very accurate predictions on a sub-sample of the original sample. We quantify the effect of fixed sample sizes on our prediction intervals and test our methods and others found in the literature against simulated environments, which we devise taking into account datasets from human gut and hand microbiota. Our methodology applies to any dataset that can be conceptualized as a sample with replacement from an urn. In particular, it could be applied, for example, to quantify the proportion of all the unseen solutions to a binding site problem in a random RNA pool, or to reassess the surveillance of a certain terrorist group by predicting the conditional probability that it deploys a new tactic in its next attack.
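To make the target quantity concrete, the following minimal sketch simulates draws with replacement from an urn and compares the true fraction of the urn belonging to unsampled colors with a simple estimate of it. It does not implement the Embedding algorithm, whose details are not given in this abstract. As a stand-in it uses the classical Good-Turing estimator (the number of colors seen exactly once, divided by the sample size). The urn composition, the sample size, and all function names are assumptions chosen for illustration.

```python
import random
from collections import Counter


def good_turing_unseen_fraction(sample):
    """Classical Good-Turing estimate of the urn fraction held by colors
    absent from `sample`: (# colors seen exactly once) / (sample size).
    This is a stand-in, not the paper's Embedding algorithm."""
    if not sample:
        raise ValueError("sample must be non-empty")
    counts = Counter(sample)
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / len(sample)


def true_unseen_fraction(sample, urn):
    """Exact fraction of the urn in colors that never appear in `sample`.
    Only computable in simulation, where the composition `urn`
    (a dict mapping color -> proportion) is known."""
    seen = set(sample)
    return sum(p for color, p in urn.items() if color not in seen)


# Hypothetical urn: 200 colors with proportions decaying like 1/(rank + 1).
weights = {f"species_{i}": 1.0 / (i + 1) for i in range(200)}
total = sum(weights.values())
urn = {color: w / total for color, w in weights.items()}

# Draw 500 balls with replacement (fixed seed for reproducibility).
rng = random.Random(0)
sample = rng.choices(list(urn), weights=list(urn.values()), k=500)

estimate = good_turing_unseen_fraction(sample)
truth = true_unseen_fraction(sample, urn)
print(f"estimated unseen fraction: {estimate:.4f}")
print(f"true unseen fraction:      {truth:.4f}")
```

In a real dataset only the estimate is available; the true value is shown here because the simulated urn's composition is known.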



Available from: Manuel E. Lladser
  • ABSTRACT: A classical problem in statistics is estimating the expected coverage of a sample, which has had applications in gene expression, microbial ecology, optimization, and even numismatics. Here we consider a related extension of this problem to random samples of two discrete distributions. Specifically, we estimate what we call the dissimilarity probability of a sample, i.e., the probability of a draw from one distribution not being observed in [Formula: see text] draws from another distribution. We show our estimator of dissimilarity to be a [Formula: see text]-statistic and a uniformly minimum variance unbiased estimator of dissimilarity over the largest appropriate range of [Formula: see text]. Furthermore, despite the non-Markovian nature of our estimator when applied sequentially over [Formula: see text], we show it converges uniformly in probability to the dissimilarity parameter, and we present criteria when it is approximately normally distributed and admits a consistent jackknife estimator of its variance. As proof of concept, we analyze V35 16S rRNA data to discern between various microbial environments. Other potential applications concern any situation where dissimilarity of two discrete distributions may be of interest. For instance, in SELEX experiments, each urn could represent a random RNA pool and each draw a possible solution to a particular binding site problem over that pool. The dissimilarity of these pools is then related to the probability of finding binding site solutions in one pool that are absent in the other.
    PLoS ONE 11/2012; 7(11):e42368. DOI:10.1371/journal.pone.0042368 · 3.23 Impact Factor
  • ABSTRACT: In this paper, we consider moderate deviations for Good's coverage estimator. The moderate deviation principle and the self-normalized moderate deviation principle for Good's coverage estimator are established. The results are also applied to the hypothesis testing problem and the confidence interval for the coverage.
    The Annals of Statistics 05/2013; 41(2). DOI:10.1214/13-AOS1091 · 2.18 Impact Factor
  • ABSTRACT: Vaginally administered antiviral agents may reduce the risk of HIV and HSV acquisition. Delivery of these drugs using intravaginal rings (IVRs) holds the potential benefits of improving adherence and decreasing systemic exposure, while maintaining steady-state drug levels in the vaginal tract. Elucidating how IVRs interact with the vaginal microbiome constitutes a critical step in evaluating the safety of these devices, as shifts in the vaginal microbiome have been linked with several disease states. To date, clinical IVR trials have relied on culture-dependent methods that omit the high diversity of unculturable microbial populations. Longitudinal, culture-independent characterization of the microbiota in vaginal samples from 6 women with recurrent genital HSV who used an acyclovir IVR was carried out and compared to the communities developing in biofilms on the IVR surface. The analysis utilized Illumina MiSeq sequence datasets generated from bar-coded amplicons of 16S rRNA gene fragments. Specific taxa in the vaginal communities of the study participants were found to be associated with the duration of recurrent genital HSV status and the number of HSV outbreaks. Taxonomic comparison of the vaginal and IVR biofilm communities did not reveal any significant differences, suggesting that the IVRs were not systematically enriched with members of the vaginal microbiome. Device usage did not alter the participants' vaginal microbial communities, within the confines of the current study design. Rigorous molecular analysis of the effects of intravaginal devices on the corresponding microbial communities shows promise for integration with traditional approaches in the clinical evaluation of candidate products.
    Antiviral research 12/2013; 102. DOI:10.1016/j.antiviral.2013.12.004 · 3.94 Impact Factor