ABSTRACT: Background
We previously showed that stool samples of pre-adolescent and adolescent US children diagnosed with diarrhea-predominant IBS (IBS-D) had different compositions of microbiota and metabolites compared to healthy age-matched controls. Here we explored whether observed fecal microbiota and metabolite differences between these two adolescent populations can be used to discriminate between IBS and health.
We constructed individual microbiota- and metabolite-based sample classification models based on the partial least squares multivariate analysis and then applied a Bayesian approach to integrate individual models into a single classifier. The resulting combined classification achieved 84 % accuracy of correct sample group assignment and 86 % prediction for IBS-D in cross-validation tests. The performance of the cumulative classification model was further validated by the de novo analysis of stool samples from a small independent IBS-D cohort.
High-throughput microbial and metabolite profiling of subject stool samples can be used to facilitate IBS diagnosis.
Electronic supplementary material
The online version of this article (doi:10.1186/s40168-015-0139-9) contains supplementary material, which is available to authorized users.
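The abstract above does not say which Bayesian scheme was used to integrate the microbiota- and metabolite-based models. As a minimal sketch only, assuming each model outputs a posterior probability of IBS-D and that the two models are conditionally independent given the class, a naive Bayes style fusion could look like this. The function name `combine_posteriors` and the example probabilities are hypothetical.

```python
def combine_posteriors(posteriors, prior):
    """Fuse per-model P(IBS-D | data) values into one probability.

    Assumes the models are conditionally independent given the class,
    so each model contributes its likelihood ratio (posterior odds
    divided by prior odds) to the combined odds.
    """
    prior_odds = prior / (1.0 - prior)
    odds = prior_odds
    for p in posteriors:
        odds *= (p / (1.0 - p)) / prior_odds
    return odds / (1.0 + odds)


# Hypothetical outputs of the two individual classifiers for one sample.
microbiota_p = 0.7
metabolite_p = 0.8
combined_p = combine_posteriors([microbiota_p, metabolite_p], prior=0.5)
```

When both models lean toward the same class, the fused probability is more confident than either one alone. With a single model the function returns that model's posterior unchanged.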
ABSTRACT: The goal of this study was to determine if fecal metabolite and microbiota profiles can serve as biomarkers of human intestinal diseases, and to uncover possible gut microbe-metabolite associations. We employed proton nuclear magnetic resonance to measure fecal metabolites of healthy children and those diagnosed with diarrhea-predominant irritable bowel syndrome (IBS-D). Metabolite levels were associated with fecal microbial abundances. Using several ordination techniques, healthy and irritable bowel syndrome (IBS) samples could be distinguished based on the metabolite profiles of fecal samples, and such partitioning was congruent with the microbiota-based sample separation. Measurements of individual metabolites indicated that the intestinal environment in IBS-D was characterized by increased proteolysis, incomplete anaerobic fermentation and possible change in methane production. By correlating metabolite levels with abundances of microbial genera, a number of statistically significant metabolite-genus associations were detected in stools of healthy children. No such associations were evident for IBS children. This finding complemented the previously observed reduction in the number of microbe-microbe associations in the distal gut of the same cohort of IBS-D children. The ISME Journal advance online publication, 30 January 2015; doi:10.1038/ismej.2014.258.
Article · Jan 2015 · The ISME Journal
ABSTRACT: Studies aimed at identifying serum markers of cellular metabolism (biomarkers) that are associated at baseline with aerobic capacity (VO2max) in young, healthy individuals have yet to be reported. Therefore, the goal of the present study was to use the standard chemistry screen and untargeted mass spectrometry (MS)-based metabolomic profiling to identify significant associations between baseline levels of serum analytes or metabolites and VO2max (77 subjects, age range 18–35 years). Use of multivariable linear regression identified three analytes (standard chemistry screen) and twenty-three metabolites (MS-based metabolomics) with significant, sex-adjusted associations with VO2max. In addition, fourteen metabolites were found to have sex-specific associations with aerobic capacity. Subsequent stepwise multivariable linear regression identified the combination of SGOT, 4-ethylphenylsulfate, tryptophan, γ-tocopherol, and α-hydroxyisovalerate as overall, sex-adjusted baseline predictors of VO2max (adjusted R² = 0.66). However, the results of the stepwise model were found to be sensitive to outliers; therefore, random forest (RF) regression was performed. Use of RF regression identified a combination of seven covariates that explained 57.6 % of the variability inherent in VO2max. Furthermore, inclusion of significant analytes, metabolites and sex-specific metabolites into a stepwise regression model identified the combination of five metabolites in males and seven metabolites in females as being able to explain 80 and 58 % of the variability inherent in VO2max, respectively. In conclusion, the evidence presented in the current report is the first attempt to identify baseline serum biomarkers that are significantly associated with VO2max in young, healthy adult humans.
Article · Nov 2012 · Arbeitsphysiologie
ABSTRACT: The interpretation of nuclear magnetic resonance (NMR) experimental results for metabolomics studies requires intensive signal processing and multivariate data analysis techniques. Standard quantification techniques attempt to minimize effects from variations in peak positions caused by sample pH, ionic strength, and composition. These techniques fail to account for adjacent signals, which can lead to drastic quantification errors. Attempts at full spectrum deconvolution have been limited in adoption and development due to the computational resources required. Herein, we develop a novel localized deconvolution algorithm for general purpose quantification of NMR-based metabolomics studies. Localized deconvolution decreases average absolute quantification error by 97% and average relative quantification error by 88%. When applied to a 1H metabolomics study, the cross-validation metric, Q², improved 16% by reducing within-group variability. This increase in accuracy leads to additional computing costs that are overcome by translating the algorithm to the map-reduce design paradigm.
ABSTRACT: 2,3,7,8-Tetrachlorodibenzo-p-dioxin (TCDD) elicits a broad spectrum of species-specific effects that have not yet been fully characterized. This study compares the temporal effects of TCDD on hepatic aqueous and lipid metabolite extracts from immature ovariectomized C57BL/6 mice and Sprague-Dawley rats using gas chromatography-mass spectrometry and nuclear magnetic resonance-based metabolomic approaches and integrates published gene expression data to identify species-specific pathways affected by treatment. TCDD elicited metabolite and gene expression changes associated with lipid metabolism and transport, choline metabolism, bile acid metabolism, glycolysis, and glycerophospholipid metabolism. Lipid metabolism is altered in mice resulting in increased hepatic triacylglycerol as well as mono- and polyunsaturated fatty acid (FA) levels. Mouse-specific changes included the induction of CD36 and other cell surface receptors as well as lipases- and FA-binding proteins consistent with hepatic triglyceride and FA accumulation. In contrast, there was minimal hepatic fat accumulation in rats and decreased CD36 expression. However, choline metabolism was altered in rats, as indicated by decreases in betaine and increases in phosphocholine with the concomitant induction of betaine-homocysteine methyltransferase and choline kinase gene expression. Results from these studies show that aryl hydrocarbon receptor-mediated differential gene expression could be linked to metabolite changes and species-specific alterations of biochemical pathways.
ABSTRACT: The interpretation of nuclear magnetic resonance (NMR) experimental results for metabolomics studies requires intensive signal
processing and multivariate data analysis techniques. A key step in this process is the quantification of spectral features,
which is commonly accomplished by dividing an NMR spectrum into several hundred integral regions or bins. Binning attempts
to minimize effects from variations in peak positions caused by sample pH, ionic strength, and composition, while reducing
the dimensionality for multivariate statistical analyses. Herein we develop an improved novel spectral quantification technique,
dynamic adaptive binning. With this technique, bin boundaries are determined by optimizing an objective function using a dynamic
programming strategy. The objective function measures the quality of a bin configuration based on the number of peaks per
bin. This technique shows a significant improvement over both traditional uniform binning and other adaptive binning techniques.
This improvement is quantified via synthetic validation sets by analyzing an algorithm’s ability to create bins that do not
contain more than a single peak and that maximize the distance from peak to bin boundary. The validation sets are developed
by characterizing the salient distributions in experimental NMR spectroscopic data. Further, dynamic adaptive binning is applied
to a 1H NMR-based experiment to monitor rat urinary metabolites to empirically demonstrate improved spectral quantification.
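The abstract describes choosing bin boundaries by dynamic programming over an objective based on the number of peaks per bin, but it does not give the objective itself. The following is a minimal sketch under stated assumptions: the per-bin cost `(peak_count - 1) ** 2`, the width limits, and the function name `adaptive_bins` are all hypothetical stand-ins, not the published method.

```python
def adaptive_bins(n_points, peaks, min_width, max_width):
    """Pick boundaries 0 = b0 < b1 < ... < bk = n_points by dynamic programming.

    Assumed objective (hypothetical): each bin [i, j) costs
    (number of peaks in the bin - 1) ** 2, so bins holding exactly
    one peak are free and empty or crowded bins are penalized.
    """
    # prefix[k] = number of peak indices strictly below k
    is_peak = [0] * n_points
    for p in peaks:
        is_peak[p] = 1
    prefix = [0]
    for flag in is_peak:
        prefix.append(prefix[-1] + flag)

    def cost(i, j):
        return (prefix[j] - prefix[i] - 1) ** 2

    inf = float("inf")
    best = [inf] * (n_points + 1)   # best[j] = minimal cost of binning [0, j)
    prev = [-1] * (n_points + 1)    # prev[j] = start of the last bin in that optimum
    best[0] = 0.0
    for j in range(1, n_points + 1):
        for i in range(max(0, j - max_width), j - min_width + 1):
            candidate = best[i] + cost(i, j)
            if candidate < best[j]:
                best[j] = candidate
                prev[j] = i
    if best[n_points] == inf:
        raise ValueError("no bin layout satisfies the width limits")

    # Walk the prev pointers back from the end to recover the boundaries.
    boundaries = [n_points]
    while boundaries[-1] > 0:
        boundaries.append(prev[boundaries[-1]])
    return boundaries[::-1], best[n_points]


# Toy spectrum of 12 points with peak maxima at indices 2, 5 and 9.
bounds, total_cost = adaptive_bins(12, [2, 5, 9], min_width=2, max_width=6)
```

On this toy input a zero-cost layout exists, so every returned bin contains exactly one peak. A fuller objective would also reward distance between each peak and its nearest boundary, which the abstract names as a validation criterion.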
ABSTRACT: As metabolomic technology expands, validated techniques for analyzing highly dimensional categorical data are becoming increasingly
important. This manuscript presents a novel latent vector-based methodology for analyzing complex data sets with multiple
groups that include both high and low doses using orthogonal projections to latent structures (OPLS) coupled with hierarchical
clustering. This general methodology allows complex experimental designs (e.g., multiple dose and time combinations) to be
encoded and directly compared. Further, it allows for the inclusion of low dose samples that do not exhibit a strong enough
individual response to be modeled independently. A dose- and time-responsive metabolomic study was completed to evaluate and
demonstrate this methodology. Single doses (0.1–100 mg/kg body weight) of α-naphthylisothiocyanate (ANIT), a common model
of hepatic cholestasis, were administered orally in corn oil to male Fischer 344 rats. Urine samples were collected pre-dose
and daily through day 4 post-dose. Blood samples were collected pre- and post-dose to assess indices of clinical toxicity.
Urine samples were analyzed by 1H-NMR spectroscopy, and the spectra were adaptively binned to reduce dimensionality. The proposed methodology for NMR-based
urinary metabolomics was sensitive enough to detect ANIT-induced effects with respect to both dose and time at doses below
the threshold of clinical toxicity. A pattern of ANIT-dependent effects established at the highest dose was seen in the 50
and 20 mg/kg dose groups, an effect not directly identifiable with individual principal component analysis (PCA). Coupling
the pattern found by the OPLS algorithm and hierarchical clustering revealed a relationship between the 100, 50 and 20 mg/kg
dose groups, suggesting a characteristic effect of ANIT exposure. These studies demonstrate that the use of a metabolomics
approach with flexible binning of 1H spectra and appropriate application of multivariate analyses can reveal biologically relevant information about the temporal
metabolic perturbations caused by exposure and toxicity.
Keywords: NMR metabolomics; High dimension categorical data; Adaptive binning
ABSTRACT: Common contemporary practice within the nuclear magnetic resonance (NMR) metabolomics community is to evaluate and validate novel algorithms on empirical data or simplified simulated data. Empirical data captures the complex characteristics of experimental data, but the optimal or most correct analysis is unknown a priori; therefore, researchers are forced to rely on indirect performance metrics, which are of limited value. In order to achieve fair and complete analysis of competing techniques more exacting metrics are required. Thus, metabolomics researchers often evaluate their algorithms on simplified simulated data with a known answer. Unfortunately, the conclusions obtained on simulated data are only of value if the data sets are complex enough for results to generalize to true experimental data. Ideally, synthetic data should be indistinguishable from empirical data, yet retain a known best analysis.
We have developed a technique for creating realistic synthetic metabolomics validation sets based on NMR spectroscopic data. The validation sets are developed by characterizing the salient distributions in sets of empirical spectroscopic data. Using this technique, several validation sets are constructed with a variety of characteristics present in 'real' data. A case study is then presented to compare the relative accuracy of several alignment algorithms using the increased precision afforded by these synthetic data sets.
These data sets are available for download at http://birg.cs.wright.edu/nmr_synthetic_data_sets.
ABSTRACT: Metabolomics offers the potential to assess the effects of toxicants on metabolite levels. To fully realize this potential,
a robust analytical workflow for identifying and quantifying treatment-elicited changes in metabolite levels by nuclear magnetic
resonance (NMR) spectrometry has been developed that isolates and aligns spectral regions across treatment and vehicle groups
to facilitate analytical comparisons. The method excludes noise regions from the resulting reduced spectra, significantly
reducing data size. Principal components analysis (PCA) identifies data clusters associated with experimental parameters.
Cluster-centroid scores, derived from the principal components that separate treatment from vehicle samples, are used to reconstruct
the mean spectral estimates for each treatment and vehicle group. Peak amplitudes are determined by scanning the reconstructed
mean spectral estimates. Confidence levels from Mann–Whitney order statistics and amplitude change ratios are used to identify
treatment-related changes in peak amplitudes. As a demonstration of the method, analysis of 13C NMR data from hepatic lipid extracts of immature, ovariectomized C57BL/6 mice treated with 30 μg/kg 2,3,7,8-tetrachlorodibenzo-p-dioxin (TCDD) or sesame oil vehicle, sacrificed at 72, 120, or 168 h, identified 152 salient peaks. PCA clustering showed
a prominent treatment effect at all three time points studied, and very little difference between time points of treated animals.
Phenotypic differences between two animal cohorts were also observed. Based on spectral peak identification, hepatic lipid
extracts from treated animals exhibited redistribution of unsaturated fatty acids, cholesterols, and triacylglycerols. This
method identified significant changes in peaks without the loss of information associated with spectral binning, increasing
the likelihood of identifying treatment-elicited metabolite changes.
ABSTRACT: In this study we examined the urinary metabolite profiles from rats following a single exposure to the kidney toxicants D-serine, puromycin, hippuric acid and amphotericin B at various doses, and as a function of time post-dose. In toxicology, such dose-time metabonomics studies are important for an accurate determination of the severity of biological effects, and for biomarker identification that may be associated with toxicity. The metabonomics analysis yielded a dose-response curve in principal component analysis space, and was able to detect exposure to D-serine and puromycin at much lower doses than standard clinical chemistry measures. Additionally, characteristic features in the urinary metabolite profiles could be ascertained as a function of dose. The results showed common features and some unique features in urinary metabolite profiles when analyzed by NMR and LC-MS, respectively.
ABSTRACT: The work described in the following report was initiated to investigate the possibility of using novel biotechnologies for the discovery, down-selection, and pre-validation of biomarkers of toxic substance effects within the warfighter prior to health and operational performance decrement. Using the biotechnology of metabonomics, this effort focused on using nuclear magnetic resonance (NMR) spectroscopy and ultra pressure liquid chromatography mass spectrometry (UPLC/MS) for identification of liver-selective toxic effects following exposure to a known hepatotoxicant (alpha-naphthylisothiocyanate; ANIT) that induces cholestasis. Urine samples were analyzed by NMR spectroscopy and UPLC/MS and the data processed and analyzed by principal component analysis, linear discriminant analysis, and hierarchical clustering analysis. NMR- and UPLC/MS-based urinary metabonomics were sensitive enough to detect ANIT-induced toxic effects with respect to both dose and time. Understanding the cellular response to chemical exposure at the molecular level will not only facilitate the elucidation of the mechanism of chemical toxicity, but also allow accurate prediction of chemical toxicity and phenotypic outcome. Ultimately, this will lead to the identification of novel biomarkers for rapid monitoring and prediction of health hazards to the warfighter associated with chemical exposure.
ABSTRACT: In many metabolomics studies, NMR spectra are divided into bins of fixed width. This spectral quantification technique, known
as uniform binning, is used to reduce the number of variables for pattern recognition techniques and to mitigate effects from
variations in peak positions; however, shifts in peaks near the boundaries can cause dramatic quantitative changes in adjacent
bins due to non-overlapping boundaries. Here we describe a new Gaussian binning method that incorporates overlapping bins
to minimize these effects. A Gaussian kernel weights the signal contribution relative to distance from bin center, and the
overlap between bins is controlled by the kernel standard deviation. Sensitivity to peak shift was assessed for a series of
test spectra where the offset frequency was incremented in 0.5 Hz steps. For a 4 Hz shift within a bin width of 24 Hz, the
error for uniform binning increased by 150%, while the error for Gaussian binning increased by 50%. Further, using a urinary
metabolomics data set (from a toxicity study) and principal component analysis (PCA), we showed that the information content
in the quantified features was equivalent for Gaussian and uniform binning methods. The separation between groups in the PCA
scores plot, measured by the J2 quality metric, is as good as or better for Gaussian binning than for uniform binning. The Gaussian method is shown to be robust
with regard to peak shift, while still retaining the information needed by classification and multivariate statistical techniques
for NMR-metabolomics data.
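To illustrate why overlapping, kernel-weighted bins respond more smoothly to peak shift than fixed-width bins, here is a minimal sketch. It assumes an unnormalized Gaussian kernel and a toy one-point "peak"; the kernel width `sigma`, the function names, and the test spectrum are hypothetical and are not taken from the study.

```python
import math


def uniform_bin(axis_hz, intensity, edges):
    """Sum intensities inside half-open intervals [edges[i], edges[i + 1])."""
    binned = [0.0] * (len(edges) - 1)
    for x, y in zip(axis_hz, intensity):
        for i in range(len(edges) - 1):
            if edges[i] <= x < edges[i + 1]:
                binned[i] += y
                break
    return binned


def gaussian_bin(axis_hz, intensity, centers, sigma):
    """Weight every point by its distance from each bin center.

    The kernel standard deviation sigma controls how much neighboring
    bins overlap (unnormalized kernel, an assumption of this sketch).
    """
    binned = []
    for c in centers:
        total = 0.0
        for x, y in zip(axis_hz, intensity):
            total += y * math.exp(-0.5 * ((x - c) / sigma) ** 2)
        binned.append(total)
    return binned


# Toy spectrum: one unit-height point that moves 2 Hz across the 24 Hz boundary.
axis_hz = list(range(48))
before = [1.0 if x == 23 else 0.0 for x in axis_hz]
after = [1.0 if x == 25 else 0.0 for x in axis_hz]

edges = [0, 24, 48]   # two uniform bins, each 24 Hz wide
centers = [12, 36]    # Gaussian bins centered on the same intervals
sigma = 12.0          # assumed kernel width

uniform_change = [
    abs(a - b)
    for a, b in zip(uniform_bin(axis_hz, before, edges), uniform_bin(axis_hz, after, edges))
]
gaussian_change = [
    abs(a - b)
    for a, b in zip(
        gaussian_bin(axis_hz, before, centers, sigma),
        gaussian_bin(axis_hz, after, centers, sigma),
    )
]
```

In this toy case the whole signal jumps from one uniform bin to the other, while each Gaussian bin value changes by only about 0.1.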
ABSTRACT: In many metabolomics studies, NMR spectra are divided into bins of fixed width to reduce the number of variables for pattern recognition techniques and to mitigate effects from variations in peak positions. Using this method, shifts in peaks near bin boundaries can cause dramatic quantitative changes in adjacent bins. Here we describe a quantization technique using a Gaussian kernel that incorporates overlapping bins to minimize these effects. Sensitivity to peak shift was assessed for a series of test spectra where the offset frequency was incremented in 1 Hz steps. For a 4 Hz shift within a bin width of 24 Hz, the error for uniform binning increased by 150% while the error for Gaussian binning increased by 50%. Using a urinary metabolomics dataset (from a toxicity study) and principal component analysis (PCA), we showed that the information content in the quantified features was equivalent for Gaussian and uniform binning methods.
ABSTRACT: Nuclear magnetic resonance (NMR) spectroscopy is a non-invasive method of acquiring a metabolic profile from biofluids. This metabolic information may provide keys to the early detection of exposure to a toxin. A typical NMR toxicology data set has low sample size and high dimensionality. Thus, traditional pattern recognition techniques are not always feasible. In this paper, we evaluate several common alternatives for isolating these biomarkers. The fold test, unpaired t-test, and paired t-test were performed on an NMR-derived toxicological data set and results were compared. The paired t-test method was preferred, due to its ability to attribute statistical significance, to take into consideration consistency of a single subject over a time course, and to mitigate the low sample, high dimensionality problem. We then grouped the resulting statistically salient potential biomarkers based on their significance patterns and compared results to several known metabolites affected by the tested toxin. Based on these results, we present a statistical protocol of sequential t-tests and clustering techniques for identifying putative biomarkers. We then present the results of this protocol applied to a specific real world toxicological data set.
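The preference for the paired t-test can be illustrated with a small sketch that computes the three statistics named above for one spectral region. The data are invented for illustration, the unpaired statistic uses Welch's form (an assumption, since the abstract does not say which variant was used), and all function names are hypothetical.

```python
import math
import statistics


def fold_change(before, after):
    """Ratio of group means; carries no statistical significance on its own."""
    return statistics.mean(after) / statistics.mean(before)


def unpaired_t(before, after):
    """Welch's t statistic, treating the two groups as independent samples."""
    se = math.sqrt(
        statistics.variance(before) / len(before)
        + statistics.variance(after) / len(after)
    )
    return (statistics.mean(after) - statistics.mean(before)) / se


def paired_t(before, after):
    """t statistic on within-subject differences (same animals, two time points)."""
    diffs = [a - b for b, a in zip(before, after)]
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))


# Invented bin intensities for four subjects, pre-dose and post-dose.
pre_dose = [1.0, 2.0, 3.0, 4.0]
post_dose = [1.5, 2.4, 3.6, 4.5]

fold = fold_change(pre_dose, post_dose)
t_unpaired = unpaired_t(pre_dose, post_dose)
t_paired = paired_t(pre_dose, post_dose)
```

Because every subject shifts by a similar amount while baselines differ widely between subjects, the paired statistic is large and the unpaired one is small. This is the within-subject consistency the abstract refers to.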
ABSTRACT: Nuclear magnetic resonance (NMR) spectroscopy is a non-invasive method of acquiring a metabolic profile from biofluids. Identifying biomarkers from these profiles may provide keys to the early detection of exposure to a toxin. Two common features of NMR data sets are small sample size and a large number of variables (i.e. high dimensionality). The high dimensionality arises from each sample spectrum being divided into a large number of regions, each of which is a dimension. Pattern recognition techniques can then be used to identify biomarkers from a data set that consists of metabolic profiles from a small number of samples. A typical first step of this analysis is to individually identify responsive spectral regions, followed by associating these regions with metabolites and biomarkers. In this paper, we evaluate several common alternatives to identify responsive regions, including the fold test, paired t-test, and logistic regression. Further, when performing these types of analyses, the issues of multiple-comparisons and false positive rates must be addressed. We compare several corrections for these issues including the Bonferroni, Holm’s, Westfall and Young, permutation, and bootstrap methods. The results of these statistical tests in combination with the multiple-comparison corrections were compared on both a simulated data set and an NMR-derived toxicology data set. Based on these results, we present a statistical protocol for determining putative biomarkers, designed to mitigate the low sample size, high dimensionality, and false positive issues associated with NMR data.
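Two of the corrections named above, Bonferroni and Holm's step-down procedure, are simple enough to sketch directly. The p-values below are invented, and the function names are hypothetical; the resampling-based methods (Westfall and Young, permutation, bootstrap) are not shown.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject a hypothesis only if its p-value is at most alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def holm(p_values, alpha=0.05):
    """Holm's step-down procedure.

    Sort p-values ascending and compare the k-th smallest (k = 0, 1, ...)
    with alpha / (m - k); stop at the first failure.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break
    return reject


# Invented p-values for four spectral regions.
p_values = [0.001, 0.013, 0.02, 0.04]
bonferroni_hits = bonferroni(p_values)
holm_hits = holm(p_values)
```

Both procedures control the family-wise error rate, but Holm's relaxes the threshold after each rejection, so on these values it flags all four regions while Bonferroni flags only the first.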