Differential abundance analysis for microbial marker-gene surveys

[1] Graduate Program in Applied Mathematics & Statistics, and Scientific Computation, University of Maryland, College Park, Maryland, USA. [2] Center for Bioinformatics and Computational Biology, University of Maryland, College Park, Maryland, USA.
Nature Methods (Impact Factor: 32.07). 09/2013; 10(12). DOI: 10.1038/nmeth.2658
Source: PubMed


We introduce a methodology to assess differential abundance in sparse high-throughput microbial marker-gene survey data. Our approach, implemented in the metagenomeSeq Bioconductor package, relies on a novel normalization technique and a statistical model that accounts for undersampling, a common feature of large-scale marker-gene studies. Using simulated data and several published microbiota data sets, we show that metagenomeSeq outperforms the tools currently used in this field.



Available from: Joseph Paulson, Feb 17, 2015
    • "We used MetagenomeSeq (Paulson et al., 2013) software to determine the differentially abundant OTUs, families, and genera, present across all groups. We also used nonparametric Kruskal–Wallis H-test (post hoc Tukey Kramer tests, Bonferroni multiple test correction) for multi-group comparisons."
    ABSTRACT: Dietary intervention with extensively hydrolyzed casein formula supplemented with Lactobacillus rhamnosus GG (EHCF+LGG) accelerates tolerance acquisition in infants with cow's milk allergy (CMA). We examined whether this effect is attributable, at least in part, to an influence on the gut microbiota. Fecal samples from healthy controls (n=20) and from CMA infants (n=19) before and after treatment with EHCF with (n=12) and without (n=7) supplementation with LGG were compared by 16S rRNA-based operational taxonomic unit clustering and oligotyping. Differential feature selection and generalized linear model fitting revealed that the CMA infants have a diverse gut microbial community structure dominated by Lachnospiraceae (20.5±9.7%) and Ruminococcaceae (16.2±9.1%). Blautia, Roseburia and Coprococcus were significantly enriched following treatment with EHCF and LGG, but only one genus, Oscillospira, was significantly different between infants that became tolerant and those that remained allergic. However, most tolerant infants showed a significant increase in fecal butyrate levels, and those taxa that were significantly enriched in these samples, Blautia and Roseburia, exhibited specific strain-level demarcations between tolerant and allergic infants. Our data suggest that EHCF+LGG promotes tolerance in infants with CMA, in part, by influencing the strain-level bacterial community structure of the infant gut.
    The ISME Journal 09/2015; DOI:10.1038/ismej.2015.151 · 9.30 Impact Factor
    • "Two recently proposed methods were collaboratively used for multinomial statistical analysis of the microbiome data. The statistical analysis consisted of three steps: (1) for each microbiome community, use the R statistical software package for HMP (HMP-R) by La Rosa et al. [39,40] to test the underlying probabilistic model based on the Dirichlet multinomial (DM) distribution and to determine the DM parameters, proportions, and dispersion [39]; (2) use the HMP-R to perform hypothesis testing of overall significant differences between communities; and (3) use the R software package metagenomeSeq to determine OTUs that are statistically different in the two communities [41,42]."
    ABSTRACT: Background: Sample storage conditions, extraction methods, PCR primers, and parameters are major factors that affect metagenomics analysis based on microbial 16S rRNA gene sequencing. Most published studies were limited to the comparison of only one or two types of these factors. Systematic multi-factor explorations are needed to evaluate the conditions that may impact validity of a microbiome analysis. This study was aimed to improve methodological options to facilitate the best technical approaches in the design of a microbiome study. Three readily available mock bacterial community materials and two commercial extraction techniques, Qiagen DNeasy and MO BIO PowerSoil DNA purification methods, were used to assess procedures for 16S ribosomal DNA amplification and pyrosequencing-based analysis. Primers were chosen for 16S rDNA quantitative PCR and amplification of region V3 to V1. Swabs spiked with mock bacterial community cells and clinical oropharyngeal swabs were incubated at respective temperatures of -80°C, -20°C, 4°C, and 37°C for 4 weeks, then extracted with the two methods, and subjected to pyrosequencing and taxonomic and statistical analyses to investigate microbiome profile stability. Results: The bacterial compositions for the mock community DNA samples determined in this study were consistent with the projected levels and agreed with the literature. The quantitation accuracy of abundances for several genera was improved with changes made to the standard Human Microbiome Project (HMP) procedure. The data for the samples purified with DNeasy and PowerSoil methods were statistically distinct; however, both results were reproducible and in good agreement with each other. The temperature effect on storage stability was investigated by using mock community cells and showed that the microbial community profiles were altered with the increase in incubation temperature. However, this phenomenon was not detected when clinical oropharyngeal swabs were used in the experiment. Conclusions: Mock community materials originated from the HMP study are valuable controls in developing 16S metagenomics analysis procedures. Long-term exposure to a high temperature may introduce variation into analysis for oropharyngeal swabs, suggestive of storage at 4°C or lower. The observed variations due to sample storage temperature are in a similar range as the intrapersonal variability among different clinical oropharyngeal swab samples.
    09/2014; 2(1):31. DOI:10.1186/2049-2618-2-31
    • "We segregated diarrheal stool based on diagnosis of dysentery (presence of blood) and found a total of 30 OTUs that were strongly correlated with dysentery when comparing with non-dysentery diarrheal stool (metagenomeSeq [18], P < 0.05). These include several well-known pathogens such as Enterococcus faecalis, Campylobacter jejuni, Bacteroides fragilis, Clostridium perfringens, Enterobacter cancerogenus, and members of the Granulicatella, Haemophilus, Klebsiella, and Escherichia/Shigella genera."
    ABSTRACT: Background: Diarrheal diseases continue to contribute significantly to morbidity and mortality in infants and young children in developing countries. There is an urgent need to better understand the contributions of novel, potentially uncultured, diarrheal pathogens to severe diarrheal disease, as well as distortions in normal gut microbiota composition that might facilitate severe disease. Results: We use high throughput 16S rRNA gene sequencing to compare fecal microbiota composition in children under five years of age who have been diagnosed with moderate to severe diarrhea (MSD) with the microbiota from diarrhea-free controls. Our study includes 992 children from four low-income countries in West and East Africa, and Southeast Asia. Known pathogens, as well as bacteria currently not considered as important diarrhea-causing pathogens, are positively associated with MSD, and these include Escherichia/Shigella, and Granulicatella species, and Streptococcus mitis/pneumoniae groups. In both cases and controls, there tend to be distinct negative correlations between facultative anaerobic lineages and obligate anaerobic lineages. Overall genus-level microbiota composition exhibit a shift in controls from low to high levels of Prevotella and in MSD cases from high to low levels of Escherichia/Shigella in younger versus older children; however, there was significant variation among many genera by both site and age. Conclusions: Our findings expand the current understanding of microbiota-associated diarrhea pathogenicity in young children from developing countries. Our findings are necessarily based on correlative analyses and must be further validated through epidemiological and molecular techniques.
    Genome Biology 06/2014; 15(R76). DOI:10.1186/gb-2014-15-6-r76 · 10.81 Impact Factor