Consolidating metabolite identifiers to enable contextual and multi-platform metabolomics data analysis

Metabolomics Research Group, RIKEN Plant Science Center, 1-7-22 Tsurumi-ku, Suehiro-cho, Yokohama, Kanagawa, 230-0045, Japan.
BMC Bioinformatics (Impact Factor: 2.67). 04/2010; 11:214. DOI: 10.1186/1471-2105-11-214

ABSTRACT Analysis of data from high-throughput experiments depends on the availability of well-structured data that describe the assayed biomolecules. Procedures for obtaining and organizing such meta-data on genes, transcripts and proteins have been streamlined in many data analysis packages, but are still lacking for metabolites. Chemical identifiers are notoriously incoherent, encompassing a wide range of different referencing schemes with varying scope and coverage. Online chemical databases use multiple types of identifiers in parallel but lack a common primary key for reliable database consolidation. Connecting identifiers of analytes found in experimental data with the identifiers of their parent metabolites in public databases can therefore be very laborious.
Here we present a strategy and a software tool for integrating metabolite identifiers from local reference libraries and public databases that do not depend on a single common primary identifier. The program constructs groups of interconnected identifiers of analytes and metabolites to obtain a local metabolite-centric SQLite database. The resulting database can be used to map in-house identifiers and synonyms to external resources such as the KEGG database. New identifiers can be imported and directly integrated with existing data. Queries can be performed flexibly, both from the command line and from the statistical programming environment R, to obtain identifier mappings tailored to a given data set.
Efficient cross-referencing of metabolite identifiers is a key technology for metabolomics data analysis. We provide a practical and flexible solution to this task and an open-source program, the metabolite masking tool (MetMask), available at, that implements our ideas.
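
The grouping idea described above can be illustrated with a minimal Python sketch. It is not MetMask's actual code, schema, or merging rules, which the abstract does not describe. It assumes that any two records sharing at least one identifier refer to the same metabolite, merges them with a union-find structure, and stores the groups in SQLite. The function names (group_identifiers, build_database, map_identifier), the table layout, and the in-house identifiers are hypothetical and chosen only for illustration.

```python
import sqlite3
from collections import defaultdict


def group_identifiers(records):
    """Group identifiers that are linked through shared records.

    Each record is a dict mapping an identifier type to a value.
    Assumption for this sketch: sharing any identifier merges two records.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for record in records:
        ids = list(record.items())  # (type, value) pairs
        if not ids:
            continue
        find(ids[0])  # register records that hold a single identifier
        for other in ids[1:]:
            union(ids[0], other)

    groups = defaultdict(set)
    for ident in list(parent):
        groups[find(ident)].add(ident)
    return list(groups.values())


def build_database(groups, path=":memory:"):
    """Write one row per identifier, keyed by a local group number."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE identifier (group_id INTEGER, id_type TEXT, value TEXT)"
    )
    conn.execute("CREATE INDEX idx_identifier ON identifier (id_type, value)")
    for group_id, group in enumerate(groups):
        conn.executemany(
            "INSERT INTO identifier VALUES (?, ?, ?)",
            [(group_id, id_type, value) for id_type, value in sorted(group)],
        )
    conn.commit()
    return conn


def map_identifier(conn, from_type, value, to_type):
    """Return all identifiers of to_type in the same group as the query."""
    rows = conn.execute(
        "SELECT DISTINCT b.value FROM identifier a "
        "JOIN identifier b ON a.group_id = b.group_id "
        "WHERE a.id_type = ? AND a.value = ? AND b.id_type = ? "
        "ORDER BY b.value",
        (from_type, value, to_type),
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    # Illustrative records: a local library entry and two public entries.
    records = [
        {"inhouse": "A0001", "synonym": "L-alanine", "cas": "56-41-7"},
        {"kegg": "C00041", "cas": "56-41-7"},
        {"kegg": "C00031", "synonym": "D-glucose"},
    ]
    conn = build_database(group_identifiers(records))
    print(map_identifier(conn, "inhouse", "A0001", "kegg"))
```

In this sketch the in-house entry and the KEGG entry are joined only through their shared CAS number, so no single identifier type has to act as a primary key. Importing new identifiers would here mean regrouping and rebuilding the table; an incremental update is left out for brevity.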

  • Source
    ABSTRACT: One of the ultimate goals in plant systems biology is to elucidate the genotype-phenotype relationship in plant cellular systems. Integrated network analysis that combines omics data with mathematical models has received particular attention. Here we focus on the latest cutting-edge computational advances that facilitate their combination. We highlight (1) network visualization tools, (2) pathway analyses, (3) genome-scale metabolic reconstruction, and (4) the integration of high-throughput experimental data and mathematical models. Multi-omics data that contain the genome, transcriptome, proteome, and metabolome and mathematical models are expected to integrate and expand our knowledge of complex plant metabolisms.
    Frontiers in Plant Science 11/2014; 5:598. DOI:10.3389/fpls.2014.00598 · 3.64 Impact Factor
  • Source
    ABSTRACT: Despite recent intensive research efforts in functional genomics, the functions of only a limited number of Arabidopsis (Arabidopsis thaliana) genes have been determined experimentally and improving gene annotation remains a major challenge in plant science. As metabolite profiling can characterize the metabolomic phenotype of a genetic perturbation in the plant metabolism, it provides clues to the function(s) of genes of interest. We chose 50 Arabidopsis mutants including a set of characterized and uncharacterized mutants, that resemble wild-type plants. We performed metabolite profiling of the plants using gas chromatography-mass spectrometry (GC-MS). To make the dataset available as an efficient public functional genomics tool for hypothesis generation, we developed our MeKO database. It allows evaluation of whether a mutation affects metabolism during normal plant growth and contains images of mutants, data on differences in metabolite accumulation, and interactive analysis tools. Non-processed data, including chromatograms, mass spectra, and experimental metadata, follow the guidelines set by Metabolomics Standards Initiative (MSI) and are freely downloadable. Proof-of-concept analysis suggests that the MeKO database is highly useful for the generation of hypotheses for genes of interest and for improving gene annotation. MeKO is publicly available at
    Plant Physiology 05/2014; 165(3). DOI:10.1104/pp.114.240986 · 7.39 Impact Factor
  • Source
    ABSTRACT: Plants produce various volatile organic compounds (VOCs), which are thought to be a crucial factor in their interactions with harmful insects, plants and animals. Composition of VOCs may differ when plants are grown under different nutrient conditions, e.g., macronutrient-deficient conditions. However, in plants, relationships between macronutrient assimilation and VOC composition remain unclear. In order to identify the kinds of VOCs that can be emitted when plants are grown under various environmental conditions, we established a conventional method for VOC profiling in Arabidopsis thaliana (Arabidopsis) involving headspace-solid-phase microextraction-gas chromatography-time-of-flight-mass spectrometry (HS-SPME-GC-TOF-MS). We grew Arabidopsis seedlings in an HS vial to directly perform HS analysis. To maximize the analytical performance for VOCs, we optimized the extraction method and the analytical conditions of HS-SPME-GC-TOF-MS. Using the optimized method, we conducted VOC profiling of Arabidopsis seedlings grown under two different nutrition conditions, nutrition-rich and nutrition-deficient. The VOC profiles clearly showed a distinct pattern for each condition. This study suggests that HS-SPME-GC-TOF-MS analysis has immense potential to detect changes in the levels of VOCs not only in Arabidopsis but also in other plants grown under various environmental conditions.
    06/2013; 3(2):223-42. DOI:10.3390/metabo3020223
