Differential abundance analysis for microbial marker-gene surveys

[1] Graduate Program in Applied Mathematics & Statistics, and Scientific Computation, University of Maryland, College Park, Maryland, USA. [2] Center for Bioinformatics and Computational Biology, University of Maryland, College Park, Maryland, USA.
Nature Methods (Impact Factor: 32.07). 09/2013; 10(12). DOI: 10.1038/nmeth.2658
Source: PubMed


We introduce a methodology to assess differential abundance in sparse high-throughput microbial marker-gene survey data. Our approach, implemented in the metagenomeSeq Bioconductor package, relies on a novel normalization technique and a statistical model that accounts for undersampling, a common feature of large-scale marker-gene studies. Using simulated data and several published microbiota data sets, we show that metagenomeSeq outperforms the tools currently used in this field.
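
The normalization technique mentioned above is referred to in the citing excerpts on this page as cumulative-sum scaling. The following is a minimal Python sketch of that idea, not the metagenomeSeq implementation: each sample's counts are divided by the sum of its counts that fall at or below a chosen quantile of that sample's nonzero counts, then multiplied by a constant. The function name css_normalize, the fixed default quantile of 0.5, and the scaling constant of 1000 are assumptions made for illustration; the package itself is an R library and selects the quantile from the data.

```python
import numpy as np


def css_normalize(counts, quantile=0.5, scale=1000.0):
    """Cumulative-sum scaling sketch (hypothetical helper, not the package API).

    counts   : features x samples array of nonnegative counts
    quantile : quantile of each sample's nonzero counts used as the cutoff
               (fixed here as an assumption)
    scale    : constant applied after dividing by the scaling factor
    """
    counts = np.asarray(counts, dtype=float)
    normalized = np.zeros_like(counts)
    for j in range(counts.shape[1]):
        sample = counts[:, j]
        nonzero = sample[sample > 0]
        if nonzero.size == 0:
            continue  # sample with no reads: leave the column as zeros
        cutoff = np.quantile(nonzero, quantile)
        # Scaling factor: sum of this sample's counts at or below the cutoff.
        factor = sample[sample <= cutoff].sum()
        normalized[:, j] = sample / factor * scale
    return normalized


# Toy table: 4 features (rows) x 2 samples (columns).
toy = np.array([[1, 10],
                [2, 0],
                [3, 20],
                [100, 30]])
normalized_toy = css_normalize(toy)
```

In the toy table the single very abundant feature in the first sample (100 reads) does not enter that sample's scaling factor, which is the motivation for scaling by a partial sum rather than by the total read count.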



Available from: Joseph Paulson, Feb 17, 2015
    • "Sequences representing any phylotypes classified as mitochondria or chloroplast were removed. In order to reduce potential amplicon sequencing biases, we first removed samples with < 10 000 sequences and then we normalised the sequence counts, using a cumulative-sum scaling approach (Paulson et al. 2013). The total number of samples included in downstream analyses was 556 for bacteria and 480 for fungi."
    ABSTRACT: The complexities of the relationships between plant and soil microbial communities remain unresolved. We determined the associations between plant aboveground and belowground (root) distributions and the communities of soil fungi and bacteria found across a diverse tropical forest plot. Soil microbial community composition was correlated with the taxonomic and phylogenetic structure of the aboveground plant assemblages even after controlling for differences in soil characteristics, but these relationships were stronger for fungi than for bacteria. In contrast to expectations, the species composition of roots in our soil core samples was a poor predictor of microbial community composition perhaps due to the patchy, ephemeral, and highly overlapping nature of fine root distributions. Our ability to predict soil microbial composition was not improved by incorporating information on plant functional traits suggesting that the most commonly measured plant traits are not particularly useful for predicting the plot-level variability in belowground microbial communities.
    Ecology Letters 10/2015; DOI:10.1111/ele.12536 · 10.69 Impact Factor
    • "We used MetagenomeSeq (Paulson et al., 2013) software to determine the differentially abundant OTUs, families, and genera, present across all groups. We also used nonparametric Kruskal–Wallis H-test (post hoc Tukey Kramer tests, Bonferroni multiple test correction) for multi-group comparisons."
    ABSTRACT: Dietary intervention with extensively hydrolyzed casein formula supplemented with Lactobacillus rhamnosus GG (EHCF+LGG) accelerates tolerance acquisition in infants with cow's milk allergy (CMA). We examined whether this effect is attributable, at least in part, to an influence on the gut microbiota. Fecal samples from healthy controls (n=20) and from CMA infants (n=19) before and after treatment with EHCF with (n=12) and without (n=7) supplementation with LGG were compared by 16S rRNA-based operational taxonomic unit clustering and oligotyping. Differential feature selection and generalized linear model fitting revealed that the CMA infants have a diverse gut microbial community structure dominated by Lachnospiraceae (20.5±9.7%) and Ruminococcaceae (16.2±9.1%). Blautia, Roseburia and Coprococcus were significantly enriched following treatment with EHCF and LGG, but only one genus, Oscillospira, was significantly different between infants that became tolerant and those that remained allergic. However, most tolerant infants showed a significant increase in fecal butyrate levels, and those taxa that were significantly enriched in these samples, Blautia and Roseburia, exhibited specific strain-level demarcations between tolerant and allergic infants. Our data suggest that EHCF+LGG promotes tolerance in infants with CMA, in part, by influencing the strain-level bacterial community structure of the infant gut.
    The ISME Journal 09/2015; DOI:10.1038/ismej.2015.151 · 9.30 Impact Factor
    • "Two recently proposed methods were collaboratively used for multinomial statistical analysis of the microbiome data. The statistical analysis consisted of three steps: (1) for each microbiome community, use the R statistical software package for HMP (HMP-R) by La Rosa et al. [39,40] to test the underlying probabilistic model based on the Dirichlet multinomial (DM) distribution and to determine the DM parameters, proportions, and dispersion [39]; (2) use the HMP-R to perform hypothesis testing of overall significant differences between communities; and (3) use the R software package metagenomeSeq to determine OTUs that are statistically different in the two communities [41,42]."
    ABSTRACT: Background: Sample storage conditions, extraction methods, PCR primers, and parameters are major factors that affect metagenomics analysis based on microbial 16S rRNA gene sequencing. Most published studies were limited to the comparison of only one or two types of these factors. Systematic multi-factor explorations are needed to evaluate the conditions that may impact validity of a microbiome analysis. This study was aimed to improve methodological options to facilitate the best technical approaches in the design of a microbiome study. Three readily available mock bacterial community materials and two commercial extraction techniques, Qiagen DNeasy and MO BIO PowerSoil DNA purification methods, were used to assess procedures for 16S ribosomal DNA amplification and pyrosequencing-based analysis. Primers were chosen for 16S rDNA quantitative PCR and amplification of region V3 to V1. Swabs spiked with mock bacterial community cells and clinical oropharyngeal swabs were incubated at respective temperatures of -80°C, -20°C, 4°C, and 37°C for 4 weeks, then extracted with the two methods, and subjected to pyrosequencing and taxonomic and statistical analyses to investigate microbiome profile stability. Results: The bacterial compositions for the mock community DNA samples determined in this study were consistent with the projected levels and agreed with the literature. The quantitation accuracy of abundances for several genera was improved with changes made to the standard Human Microbiome Project (HMP) procedure. The data for the samples purified with DNeasy and PowerSoil methods were statistically distinct; however, both results were reproducible and in good agreement with each other. The temperature effect on storage stability was investigated by using mock community cells and showed that the microbial community profiles were altered with the increase in incubation temperature. However, this phenomenon was not detected when clinical oropharyngeal swabs were used in the experiment. Conclusions: Mock community materials originated from the HMP study are valuable controls in developing 16S metagenomics analysis procedures. Long-term exposure to a high temperature may introduce variation into analysis for oropharyngeal swabs, suggestive of storage at 4°C or lower. The observed variations due to sample storage temperature are in a similar range as the intrapersonal variability among different clinical oropharyngeal swab samples.
    09/2014; 2(1):31. DOI:10.1186/2049-2618-2-31