Estimating population diversity with CatchAll

Department of Statistical Science, Cornell University, Ithaca, NY, 14853, USA.
Bioinformatics (Impact Factor: 4.62). 02/2012; DOI: 10.1093/bioinformatics/bts075
Source: PubMed

ABSTRACT
Motivation: The massive data produced by next-generation sequencing require advanced statistical tools. We address estimating the total diversity or species richness in a population. To date, only relatively simple methods have been implemented in available software. There is a need for software employing modern, computationally intensive statistical analyses including error, goodness-of-fit and robustness assessments.
Results: We present CatchAll, a fast, easy-to-use, platform-independent program that computes maximum likelihood estimates for finite-mixture models, weighted linear regression-based analyses and coverage-based non-parametric methods, along with outlier diagnostics. Given sample 'frequency count' data, CatchAll computes 12 different diversity estimates and applies a model-selection algorithm. CatchAll also derives discounted diversity estimates to adjust for possibly uncertain low-frequency counts. It is accompanied by an Excel-based graphics program.
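
To make the notion of 'frequency count' data concrete, here is a minimal Python sketch. It is not CatchAll's code and does not reproduce its input format, which the abstract does not describe. The function names are hypothetical, and Chao1 and the Good-Turing coverage estimate were picked only as familiar examples of simple non-parametric calculations on such data; whether they match any of the 12 estimates CatchAll reports is not stated here.

```python
from collections import Counter


def frequency_counts(abundances):
    """Return {j: f_j}, where f_j is the number of taxa seen exactly j times.

    `abundances` is a list with one observed count per taxon (illustrative input).
    """
    return dict(Counter(a for a in abundances if a > 0))


def good_turing_coverage(freq):
    """Good-Turing sample coverage estimate: 1 - f_1 / n."""
    n = sum(j * f for j, f in freq.items())  # total number of individuals sampled
    f1 = freq.get(1, 0)                      # number of singletons
    return 1.0 - f1 / n


def chao1(freq):
    """Chao1 richness estimate from frequency counts."""
    s_obs = sum(freq.values())  # number of taxa actually observed
    f1 = freq.get(1, 0)         # singletons
    f2 = freq.get(2, 0)         # doubletons
    if f2 > 0:
        return s_obs + f1 * f1 / (2.0 * f2)
    # Bias-corrected form, used here when there are no doubletons
    return s_obs + f1 * (f1 - 1) / 2.0


if __name__ == "__main__":
    freq = frequency_counts([1, 1, 1, 1, 2, 2, 3, 5, 8])
    print(freq)
    print(good_turing_coverage(freq))
    print(chao1(freq))
```

In this toy sample, nine taxa are observed, four of them only once, so the estimated richness exceeds the observed count. Estimates of this kind depend heavily on the singleton and doubleton counts, which is the sort of sensitivity to low-frequency counts that the discounted estimates mentioned in the abstract are meant to address.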

  • Source
    ABSTRACT: Viruses have unique properties, such as small genomes and regions of high similarity, whose effects on metagenomic assemblies have not been characterized so far. This study uses diverse in silico simulated viromes to evaluate how extensively genomes can be assembled using different sequencing platforms and assemblers. Further, it investigates the suitability of different methods to estimate viral diversity in metagenomes. We created in silico metagenomes mimicking various platforms at different sequencing depths. The CLC assembler proved subpar compared to IDBA_UD and CAMERA, which are metagenomic-specific. Up to a saturation point, Illumina platforms proved more capable of reconstructing large portions of viral genomes compared to 454. Read length was an important factor for limiting chimericity, while scaffolding marginally improved contig length and accuracy. The genome length of the various viruses in the metagenomes did not significantly affect genome reconstruction, but the co-existence of highly similar genomes was detrimental. When evaluating diversity estimation tools, we found that PHACCS results were more accurate than those from CatchAll and clustering, which were both orders of magnitude above expected. Assemblers designed specifically for the analysis of metagenomes should be used to facilitate the creation of high-quality long contigs. Despite the high coverage possible, scientists should not expect to always obtain complete genomes, because their reconstruction may be hindered by co-existing species bearing highly similar genomic regions. Further development of metagenomics-oriented assemblers may help bypass these limitations in future studies. Meanwhile, the lack of fully reconstructed communities keeps methods to estimate viral diversity relevant. While none of the three methods tested had absolute precision, only PHACCS was deemed suitable for comparative studies.
    BMC Genomics 11/2014; 15(1):989. DOI:10.1186/1471-2164-15-989 · 4.04 Impact Factor
  • Source
    ABSTRACT: We wish to estimate the total number of classes in a population. The classical approach assumes that each class independently contributes a Poisson number of representatives to the sample according to its sampling intensity; these intensities follow a stochastic abundance distribution. In this paper we present what we believe to be the first parametric departure from the mixed Poisson framework. We draw on probability theory that characterizes distributions on the integers by the ratios of their consecutive probabilities. Based on these distributions we construct a nonlinear regression model for the ratios of consecutive frequency counts; this allows us to predict the unobserved count and hence to estimate the total diversity. We find that this approach results in realistic estimates with good fits to data and reasonable standard errors, and it is geometrically intuitive. The method is especially well-suited to the high diversity setting typical of modern microbial datasets derived from next-generation sequencing. We demonstrate its performance in low, medium and high diversity contexts, and via simulation. Finally, we present a dataset for which our method outperforms all competitors.
  • Source
    ABSTRACT: The oral and conjunctival microbiotas likely play important roles in protection from opportunistic infections, while also being the source of potential pathogens. Yet, there has been limited investigation in cats, and the impact of comorbidities such as feline immunodeficiency virus (FIV) infection has not been reported. Oral and conjunctival swabs were collected from cats with FIV infection and FIV-uninfected controls, and subjected to 16S rRNA gene (V4) PCR and next-generation sequencing. 9,249 OTUs were identified from conjunctival swabs, yet the most common 20 (0.22%) OTUs accounted for 76% of sequences. The two most abundant OTUs both belonged to Staphylococcus, and accounted for 37% of sequences. Cats with FIV infection had significantly lower relative abundances of Verrucomicrobia, Fibrobacteres, Spirochaetes, Bacteroidetes and Tenericutes, and a higher relative abundance of Deinococcus-Thermus. There were significant differences in both community membership (P = 0.006) and community structure (P = 0.02) between FIV-infected and FIV-uninfected cats. FIV-infected cats had significantly higher relative abundances of Fusobacteria and Actinobacteria in the oral cavity, and significantly higher relative abundances of several bacterial classes including Fusobacteria (0.022 vs 0.007, P = 0.006), Actinobacteria (0.017 vs 0.003, P = 0.003), Sphingobacteria (0.00015 vs 0.00003, P = 0.0013) and Flavobacteria (0.0073 vs 0.0034, P = 0.030). The feline conjunctival and oral microbiotas are complex polymicrobial communities but dominated by a limited number of genera. There is an apparent impact of FIV infection on various components of the microbiota, and assessment of the clinical relevance of these alterations is required.
    Veterinary Research 03/2015; 46(1). DOI:10.1186/s13567-014-0140-5 · 3.38 Impact Factor

