
MetaboLights - An open-access general-purpose repository for metabolomics studies and associated meta-data


Abstract and Figures

MetaboLights (http://www.ebi.ac.uk/metabolights) is the first general-purpose, open-access repository for metabolomics studies, their raw experimental data and associated metadata, maintained by one of the major open-access data providers in molecular biology. Metabolomic profiling is an important tool for research into biological functioning and into the systemic perturbations caused by diseases, diet and the environment. The effectiveness of such methods depends on the availability of public open data across a broad range of experimental methods and conditions. The MetaboLights repository, powered by the open source ISA framework, is cross-species and cross-technique. It will cover metabolite structures and their reference spectra as well as their biological roles, locations, concentrations and raw data from metabolic experiments. Studies automatically receive a stable unique accession number that can be used as a publication reference (e.g. MTBLS1). At present, the repository includes 15 submitted studies, encompassing 93 protocols for 714 assays and spanning 8 different species, including human, Caenorhabditis elegans, Mus musculus and Arabidopsis thaliana. Eight hundred twenty-seven of the metabolites identified in these studies have been mapped to ChEBI. These studies cover a variety of techniques, including NMR spectroscopy and mass spectrometry.
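Because each study is referenced by its accession number, it can be useful to pull those accessions out of free text such as a methods section. The following is a minimal sketch in Python, not part of the repository's tooling. It assumes that accessions consist of the prefix "MTBLS" followed by a positive integer, a pattern inferred only from examples such as MTBLS1 and not from an official specification. The function name `find_accessions` is hypothetical.

```python
import re

# Assumed pattern: "MTBLS" followed by a positive integer without leading zeros.
# This is inferred from examples such as MTBLS1, not from an official specification.
ACCESSION_RE = re.compile(r"\bMTBLS[1-9]\d*\b")


def find_accessions(text):
    """Return the distinct MTBLS-style accessions in text, in order of first appearance."""
    seen = []
    for match in ACCESSION_RE.findall(text):
        if match not in seen:
            seen.append(match)
    return seen


example = "Raw spectra: MTBLS1 and MTBLS2137; see also MTBLS1."
print(find_accessions(example))
```

A list is used instead of a set so that the order of first appearance is preserved, which keeps the output stable when it is pasted into a reference section.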
... ¹H NMR human metabolome was obtained from 84 healthy volunteers and 50 T2DM patients. Raw 1D Bruker spectral data files were found in the MetaboLights database (Haug et al. (2013); study MTBLS1). In the original study, spectra were normalized by the area under the curve after excluding water (4.24-5.04 ...
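The normalization mentioned in this excerpt (scaling by the area under the curve after excluding the water region) can be sketched as follows. This is an illustration under stated assumptions, not the original study's code. The excerpt is truncated, so the unit of the excluded interval is not given; the sketch simply treats 4.24 and 5.04 as positions on the spectrum's x-axis. Zeroing the excluded points (instead of deleting them) and using the trapezoidal rule are choices made here, and the function name `normalize_by_area` is hypothetical.

```python
def normalize_by_area(axis, intensities, exclude=(4.24, 5.04)):
    """Zero the excluded interval, then scale so the remaining trapezoidal area is 1.

    axis and intensities are equal-length sequences; exclude is the (low, high)
    interval on the x-axis to ignore. Its unit is an assumption of this sketch.
    """
    lo, hi = exclude
    # Points inside the excluded interval contribute nothing to the area.
    kept = [0.0 if lo <= x <= hi else y for x, y in zip(axis, intensities)]
    # Trapezoidal rule; abs() makes it work for ascending or descending axes.
    area = sum(
        abs(axis[i + 1] - axis[i]) * (kept[i] + kept[i + 1]) / 2.0
        for i in range(len(axis) - 1)
    )
    if area == 0:
        raise ValueError("spectrum has zero area outside the excluded interval")
    return [y / area for y in kept]


# Toy spectrum (hypothetical values) with a large peak inside the excluded interval.
toy_axis = [0.0, 1.0, 2.0, 3.0, 4.0, 4.5, 6.0]
toy_intensities = [1.0, 1.0, 1.0, 1.0, 1.0, 100.0, 1.0]
normalized = normalize_by_area(toy_axis, toy_intensities)
```

After normalization the excluded point is zero and the spectrum integrates to one, so spectra from different samples become comparable in overall intensity.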
... In this project, ¹H NMR spectra were acquired on a Bruker Avance III HD NMR spectrometer (Bruker SA, Wissembourg, France) operating at 600.13 MHz for ¹H resonance frequency from plasma of 97 Large White newborns collected from the umbilical cord. NMR raw spectra are available in the MetaboLights database (Haug et al., 2013): MTBLS2137. The same samples were also used to obtain the concentrations of 27 targeted amino acids measured by ultra-performance liquid chromatography (UPLC). ...
Thesis
Among the many omics data that describe the biological functioning of an organism, the metabolome is attracting growing interest because it is closer to the phenotypes of interest and therefore has considerable potential for biomarker discovery. Nuclear magnetic resonance (NMR) spectrometry is a high-throughput technology that produces spectra characteristic of the complex mixture of metabolites present in a sample of interest. However, their biological interpretation is difficult because these spectra do not give an explicit measure of the quantities of the different metabolites present in the sample. A promising approach for analysing these data is to identify and quantify the metabolites present in the complex mixture from its spectrum and to perform the statistical analysis on the results of this quantification. The first part of this thesis consisted of improving an existing quantification method, ASICS, and implementing it in an R/Bioconductor package. A new method, which takes into account all the spectra of an experiment during quantification, was also proposed with the aim of improving the reliability of the results. The second part of this thesis concerns the application of this method to the problem of neonatal mortality in piglets, and more specifically to the description of the mechanisms involved in the establishment of maturity. Analysis of NMR spectra of plasma, urine and amniotic fluid from fetuses at the end of gestation identified metabolic pathways involving numerous amino acids and sugars (growth and energy supply) as well as glutathione metabolism (oxidative stress).
... The Ribo-seq and RNA-seq data from this study have been submitted to the NCBI Gene Expression Omnibus (GEO) under accession number GSE143390. The raw proteomics data have been deposited to the MetaboLights study 70 repository with the dataset identifier MTBLS2443. Data resources used in this study are: hg38 genome FASTA (https://hgdownload.soe.ucsc.edu/ ...
Article
Ample evidence indicates that codon usage bias regulates gene expression. How viruses, such as the emerging mosquito-borne Chikungunya virus (CHIKV), express their genomes at high levels despite an enrichment in rare codons remains a puzzling question. Using ribosome footprinting, we analyze translational changes that occur upon CHIKV infection. We show that CHIKV infection induces codon-specific reprogramming of the host translation machinery to favor the translation of viral RNA genomes over host mRNAs with an otherwise optimal codon usage. This reprogramming was mostly apparent at the endoplasmic reticulum, where CHIKV RNAs show high ribosome occupancy. Mechanistically, it involves CHIKV-induced overexpression of KIAA1456, an enzyme that modifies the wobble U34 position in the anticodon of tRNAs, which is required for proper decoding of codons that are highly enriched in CHIKV RNAs. Our findings demonstrate an unprecedented interplay of viruses with the host tRNA epitranscriptome to adapt the host translation machinery to viral production.
... More generic repositories include MetaboLights (Haug et al., 2013) or Metabolomics Workbench (Sud et al., 2016). Nevertheless, their coverage of MSI experiments is rather limited. ...
Article
Mass spectrometry imaging (MSI) has become a widespread analytical technique to perform nonlabeled spatial molecular identification. The Achilles' heel of MSI is the annotation and identification of molecular species due to intrinsic limitations of the technique (lack of chromatographic separation and the difficulty of applying tandem MS). Successful strategies to perform annotation and identification combine extra analytical steps, such as using orthogonal analytical techniques to identify compounds, with algorithms that integrate the spectral and spatial information. In this review, we discuss different experimental strategies and bioinformatics tools to annotate and identify compounds in MSI experiments. We target strategies and tools for small molecule applications, such as lipidomics and metabolomics. First, we explain how sample preparation and the acquisition process influence annotation and identification, from sample preservation to the use of orthogonal techniques. Then, we review twelve software tools for annotation and identification in MSI. Finally, we offer perspectives on two current needs of the MSI community: the adaptation of guidelines for communicating confidence levels in identifications; and the creation of a standard format to store and exchange annotations and identifications in MSI.
... Raw LC-MS data and other details are publicly available for download with the accession number MTBLS2401 from the MetaboLights public repository (www.ebi.ac.uk/metabolights/MTBLS2401; Haug et al., 2013), according to the grapevine and wine metabolomics-based guidelines for FAIR data and metadata management (Savoi et al., 2021). ...
Article
Grape juice is a major source of potential health‐promoting bioactive polyphenols, especially for children and those who do not consume wine. Since the subtropical climate may negatively affect the concentrations of grape polyphenols, especially anthocyanins, elicitors such as methyl jasmonate (MeJa) could be used to promote polyphenol biosynthesis. This work aimed to investigate the impact of MeJa treatment on grape juice produced via a traditional low‐cost process from two Vitis labrusca cultivars and in two Brazilian regions. The untargeted LC‐MS analytical protocol demonstrated that Isabel Precoce juices strongly benefited from MeJa treatment, especially regarding their anthocyanic profile, regardless of the cultivation region. Known MeJa markers in wine and V. vinifera grapes (flavanols, flavonols, and stilbenes) in this experiment had mixed behaviours depending on the region/variety/cultivation. Moreover, it was found that all the detected hydroxycinnamates were influenced by the treatment, especially the concentration of their glucosides, which was increased. Glutathione, 2‐S‐glutathionyl caftaric acid, and indole lactic acid glucoside were identified for the first time as MeJa treatment biomarkers in grape products, indicating a possible positive effect on juice antioxidant properties.
... The strong association among age, obesity, and diabetes and significant disease outcomes suggests that metabolic disturbances may play important roles in how the infection progresses (Haug et al., 2013). Recent studies have revealed critical metabolic dysregulations occurring in COVID-19 cases (Wu et al., 2020a; Jimenez et al., 2021). ...
Article
The severity, disabilities, and lethality caused by the coronavirus 2019 (COVID-19) disease have dumbfounded the entire world on an unprecedented scale. The multifactorial aspect of the infection has generated interest in understanding the clinical history of COVID-19, particularly the classification of severity and early prediction on prognosis. Metabolomics is a powerful tool for identifying metabolite signatures when profiling parasitic, metabolic, and microbial diseases. This study undertook a metabolomic approach to identify potential metabolic signatures to discriminate severe COVID-19 from non-severe COVID-19. The secondary aim was to determine whether the clinical and laboratory data from the severe and non-severe COVID-19 patients were compatible with the metabolomic findings. Metabolomic analysis of samples revealed that 43 metabolites from 9 classes indicated COVID-19 severity: 29 metabolites for non-severe and 14 metabolites for severe disease. The metabolites from porphyrin and purine pathways were significantly elevated in the severe disease group, suggesting that they could be potential prognostic biomarkers. Elevated levels of the cholesteryl ester CE (18:3) in non-severe patients matched the significantly different blood cholesterol components (total cholesterol and HDL, both p < 0.001) that were detected. Pathway analysis identified 8 metabolomic pathways associated with the 43 discriminating metabolites. Metabolomic pathway analysis revealed that COVID-19 affected glycerophospholipid and porphyrin metabolism but significantly affected the glycerophospholipid and linoleic acid metabolism pathways (p = 0.025 and p = 0.035, respectively). Our results indicate that these metabolomics-based markers could have prognostic and diagnostic potential when managing and understanding the evolution of COVID-19.
Article
Peptidic natural products (PNPs) represent a medically important class of secondary metabolites that includes antibiotics, anti-inflammatory and antitumor agents. Advances in tandem mass spectra (MS/MS) acquisition and in silico database search methods have enabled high-throughput PNP discovery. However, the resulting spectra annotations are often error-prone and their validation remains a bottleneck. Here, we present NPvis, a visualizer suitable for the evaluation of PNP–MS/MS matches. The tool interactively maps annotated spectrum peaks to the corresponding PNP fragments and allows researchers to assess the match correctness. NPvis accounts for the wide chemical diversity of PNPs that prevents the use of the existing proteomics visualizers. Moreover, NPvis works even if the exact chemical structure of the matching PNP is unknown. The tool is available online and as a standalone application. We hope that it will benefit the community by streamlining PNP data analysis and validation.
Article
The types of metabolites measured in metabolomics studies vary depending on many factors, including differences in methods. Centralizing the distributed raw data is also often difficult due to confidentiality issues. These difficulties prevent the integrated analysis of metabolomic data from multiple studies. In this study, we extend the data collaboration analysis, an integrated data analysis method, by sharing dimensionality-reduced intermediate representations instead of the raw data to allow it to be applied to distributed data where the samples are completely different, and features are partially common. We then evaluated the improvement in performance using non-common features in the data collaboration analysis. For each of the four artificial datasets and the two datasets generated from metabolomics public data where samples are completely different and features are partially common, we compared the area under the curve in the receiver operating characteristic curve (ROC-AUC), AUC in the precision-recall curve (PR-AUC), accuracy, precision, recall, and F1 score using the data collaboration analysis with all the features including non-common features, the data collaboration analysis with only the common features of the distributed datasets, and a case where only local data were used for training. In most cases, the data collaboration analysis using all features demonstrated better results compared to the data collaboration analysis only using common features (by 1.3–4.8 points ROC-AUC for each dataset on average) or that trained on only one of the datasets (by 1.8–2.9 points ROC-AUC for each dataset on average). It was confirmed that the data collaboration analysis could integrate and analyze distributed data where samples are completely different and features are partially common, which can improve the classification accuracy in machine learning without sharing the raw data.
Article
Dysregulation of adipose tissue plasmalogen metabolism is associated with obesity-related metabolic diseases. We report that feeding mice a high-fat diet reduces adipose tissue lysoplasmalogen levels and increases transmembrane protein 86 A (TMEM86A), a putative lysoplasmalogenase. Untargeted lipidomic analysis demonstrates that adipocyte-specific TMEM86A-knockout (AKO) increases lysoplasmalogen content in adipose tissue, including plasmenyl lysophosphatidylethanolamine 18:0 (LPE P-18:0). Surprisingly, TMEM86A AKO increases protein kinase A signalling pathways owing to inhibition of phosphodiesterase 3B and elevation of cyclic adenosine monophosphate. TMEM86A AKO upregulates mitochondrial oxidative metabolism, elevates energy expenditure, and protects mice from metabolic dysfunction induced by high-fat feeding. Importantly, the effects of TMEM86A AKO are largely reproduced in vitro and in vivo by LPE P-18:0 supplementation. LPE P-18:0 levels are significantly lower in adipose tissue of human patients with obesity, suggesting that TMEM86A inhibition or lysoplasmalogen supplementation might be therapeutic approaches for preventing or treating obesity-related metabolic diseases. Dysregulation of plasmalogen metabolism in adipose tissue is associated with metabolic diseases. Here the authors characterize the role of adipocyte TMEM86A as a lysoplasmalogenase and show its deletion is protective against high fat diet induced metabolic disease, an effect that can be recapitulated by plasmenyl lysophosphatidylethanolamine 18:0 supplementation.
Article
To elucidate the function of oxidative phosphorylation (OxPhos) during B cell differentiation, we employ CD23Cre-driven expression of the dominant-negative K320E mutant of the mitochondrial helicase Twinkle (DNT). DNT-expression depletes mitochondrial DNA during B cell maturation, reduces the abundance of respiratory chain protein subunits encoded by mitochondrial DNA, and, consequently, respiratory chain super-complexes in activated B cells. Whereas B cell development in DNT mice is normal, B cell proliferation, germinal centers, class switch to IgG, plasma cell maturation, and T cell-dependent as well as T cell-independent humoral immunity are diminished. DNT expression dampens OxPhos but increases glycolysis in lipopolysaccharide and B cell receptor-activated cells. Lipopolysaccharide-activated DNT-B cells exhibit altered metabolites of glycolysis, the pentose phosphate pathway, and the tricarboxylic acid cycle and a lower amount of phosphatidic acid. Consequently, mTORC1 activity and BLIMP1 induction are curtailed, whereas HIF1α is stabilized. Hence, mitochondrial DNA controls the metabolism of activated B cells via OxPhos to foster humoral immunity.
Article
Metabolite profiling in biomarker discovery, enzyme substrate assignment, drug activity/specificity determination, and basic metabolic research requires new data preprocessing approaches to correlate specific metabolites to their biological origin. Here we introduce an LC/MS-based data analysis approach, XCMS, which incorporates novel nonlinear retention time alignment, matched filtration, peak detection, and peak matching. Without using internal standards, the method dynamically identifies hundreds of endogenous metabolites for use as standards, calculating a nonlinear retention time correction profile for each sample. Following retention time correction, the relative metabolite ion intensities are directly compared to identify changes in specific endogenous metabolites, such as potential biomarkers. The software is demonstrated using data sets from a previously reported enzyme knockout study and a large-scale study of plasma samples. XCMS is freely available under an open-source license at http://metlin.scripps.edu/download/.
Article
Metabolic serotypes sensitive to caloric intake may enable sera metabolomic profiles to validate epidemiological parameters and predict disease risk in humans. This long-range goal is complicated by the lack of known state markers and the requirement for simultaneous monitoring of multiple small changes. Therefore, analytical precision for appropriate high data density studies using HPLC separations coupled with coulometric array detectors was evaluated over a two month period in pooled rat sera samples (previously collected and stored at −80 °C), and in authentic biochemical standards. In sera, mean coefficients of variation (CV) of retention time and ratio accuracy within the established metabolic serotype varied within 1% and 3%, respectively. In sets of purified standards, the same parameters fluctuated, correspondingly, in ranges of 0.1% and 1%. Median CV of the metabolite concentrations were ~13% in standards and ~11–19% in sera, and varied non-monotonically with the analytical system status and experimental design. These parameters were shown to be sufficiently controlled so as not to dominate intra-group biological variability in serum metabolomics studies. Continuation of experimental runs across an analytical breakpoint (column replacement) was associated with disproportionate changes in metabolite concentrations, independent of maintained analytical precision. These changes were sufficient to shift overall profile localization in megavariate projection analyses. We developed a mathematical approach to normalize this break and use partial least squares projection to latent structure discriminant analysis to confirm validity of this normalization approach. This generally applicable mathematical correction helps enable longer term high data density studies by removing a critical source of systemic variation.
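The precision figures in this abstract are coefficients of variation. As a reminder of how such a figure is computed, here is a minimal Python sketch that derives a per-metabolite CV and the median CV across metabolites. The replicate values and metabolite names are hypothetical, and the use of the sample standard deviation is an assumption; the abstract does not state which estimator the authors used.

```python
from statistics import mean, median, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the absolute mean, as a percentage."""
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined for a zero mean")
    return 100.0 * stdev(values) / abs(m)


# Hypothetical repeated measurements of three metabolites (arbitrary units).
replicates = {
    "metabolite_a": [9.0, 10.0, 11.0],
    "metabolite_b": [18.0, 20.0, 22.0],
    "metabolite_c": [4.0, 5.0, 6.0],
}

# One CV per metabolite, then the median CV across metabolites.
cvs = {name: coefficient_of_variation(vals) for name, vals in replicates.items()}
median_cv = median(cvs.values())
```

Reporting the median over metabolites, as the abstract does, keeps a few noisy analytes from dominating the summary.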
Article
The goal of this group is to define the reporting requirements associated with the statistical analysis (including univariate, multivariate, informatics, machine learning etc.) of metabolite data with respect to other measured/collected experimental data (often called meta-data). These definitions will embrace as many aspects of a complete metabolomics study as possible at this time. In chronological order this will include: Experimental Design, both in terms of sample collection/matching, and data acquisition scheduling of samples through whichever spectroscopic technology used; Deconvolution (if required); Pre-processing, for example, data cleaning, outlier detection, row/column scaling, or other transformations; Definition and parameterization of subsequent visualizations and Statistical/Machine learning Methods applied to the dataset; If required, a clear definition of the Model Validation Scheme used (including how data are split into training/validation/test sets); Formal indication on whether the data analysis has been Independently Tested (either by experimental reproduction, or blind hold out test set). Finally, data interpretation and the visual representations and hypotheses obtained from the data analyses.
Article
To make full use of research data, the bioscience community needs to adopt technologies and reward mechanisms that support interoperability and promote the growth of an open 'data commoning' culture. Here we describe the prerequisites for data commoning and present an established and growing ecosystem of solutions using the shared 'Investigation-Study-Assay' framework to support that vision.
Chapter
In the post-genomic era, biological science continues a transition from a predominantly qualitative towards an increasingly quantitative science. Genomic, transcriptomic, proteomic, and now metabolomic technologies significantly contribute to the generation of huge amounts of data. These data, which typically describe changes in gene expression or changes in protein and metabolite pools, cannot effectively be analysed and interpreted by computer based programming if access is only provided through traditional publication schemes. Therefore ‘-omics’ data sets require formalised representation and access through databases. Otherwise important information will be lost which may serve as reference data for current and future science. Transcript and protein profiling is dominated by few almost comprehensive technologies. In contrast, the metabolomic field will require multiple analytical profiling approaches to cover the chemical multitude of primary and secondary metabolism. As a consequence, technology-oriented metabolomics databases start to emerge. We will use GC-TOF-MS-based metabolite profiling as an example for the prototypical design of central database objects and structures. The focus will be on the required detailed information for the archiving of metabolite fingerprinting and profiling data sets. Special consideration is given to aspects of maintaining information sufficient and necessary for the experimental reproduction of metabolite identification and quantification results. Both aspects are essential for the sustainable use of GC-TOF-MS-based metabolite profiling and for the comparison to other metabolomics technologies.
Article
The most successful collaborative projects in the scientific panoply are those that provide solutions to problems shared by many worldwide. The ELIXIR project is one such example, its aim being to ensure that European countries are able to respond to the set of grand challenges that they all face. ELIXIR has as its stated mission ‘the construction and operation of a sustainable infrastructure for biological information in Europe to support life science research and its translation to medicine and the environment, the bio-industries and society’.
Article
Many species that contribute to the commercial and ecological richness of our marine ecosystems are harbingers of environmental change. The ability of organisms to rapidly detect and respond to changes in the surrounding environment represents the foundation for application of molecular profiling technologies towards marine sentinel species in an attempt to identify signature profiles that may reside within the transcriptome, proteome, or metabolome and that are indicative of a particular environmental exposure event. The current review highlights recent examples of the biological information obtained for marine sentinel teleosts, mammals, and invertebrates. While in its infancy, such basal information can provide a systems biology framework in the detection and evaluation of environmental chemical contaminant effects on marine fauna. Repeated evaluation across different seasons and local marine environs will lead to discrimination between signature profiles representing normal variation within the complex milieu of environmental factors that trigger biological response in a given sentinel species and permit a greater understanding of normal versus anthropogenic-associated modulation of biological pathways, which prove detrimental to marine fauna. It is anticipated that incorporation of contaminant-specific molecular signatures into current risk assessment paradigms will lead to enhanced wildlife management strategies that minimize the impacts of our industrialized society on marine ecosystems.