
Consolidating metabolite identifiers to enable contextual and multi-platform metabolomics data analysis.

Metabolomics Research Group, RIKEN Plant Science Center, 1-7-22 Tsurumi-ku, Suehiro-cho, Yokohama, Kanagawa, 230-0045, Japan.
BMC Bioinformatics (impact factor: 2.75), 01/2010; 11:214. DOI: 10.1186/1471-2105-11-214

ABSTRACT
Analysis of data from high-throughput experiments depends on the availability of well-structured data describing the assayed biomolecules. Procedures for obtaining and organizing such metadata on genes, transcripts and proteins have been streamlined in many data analysis packages, but are still lacking for metabolites. Chemical identifiers are notoriously incoherent, spanning a wide range of referencing schemes with varying scope and coverage. Online chemical databases use multiple types of identifiers in parallel but lack a common primary key for reliable database consolidation. Connecting the identifiers of analytes found in experimental data with those of their parent metabolites in public databases can therefore be very laborious.
Here we present a strategy, and a software tool implementing it, for integrating metabolite identifiers from local reference libraries and public databases without relying on a single common primary identifier. The program constructs groups of interconnected identifiers of analytes and metabolites to obtain a local, metabolite-centric SQLite database. This database can then be used to map in-house identifiers and synonyms to external resources such as the KEGG database. New identifiers can be imported and integrated directly with existing data. Queries can be run flexibly, both from the command line and from the statistical programming environment R, to obtain identifier mappings tailored to a given data set.
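The idea of grouping interconnected identifiers can be illustrated with a union-find (disjoint-set) structure: each record from a reference library or database is a set of identifiers of different types, and any two records sharing at least one identifier end up in the same group, so no single identifier type needs to be present in every source. This is a minimal Python sketch under our own assumptions, not MetMask's implementation; the function name `consolidate`, the `(type, value)` tuple representation and the in-house codes are hypothetical, and the CAS, PubChem and KEGG values serve only as illustrations. Naive transitive merging also over-merges when an ambiguous identifier, such as a common-name synonym, is shared by different compounds; a practical tool has to guard against that, and this sketch does not.

```python
from collections import defaultdict


def consolidate(records):
    """Merge records that share at least one identifier (connected components).

    records: list of sets of (identifier_type, value) tuples.
    Returns a list of merged identifier sets, one per group.
    """
    parent = list(range(len(records)))

    def find(i):
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_j] = root_i

    # Link every record to the first record in which each identifier appeared.
    first_seen = {}
    for idx, record in enumerate(records):
        for ident in record:
            if ident in first_seen:
                union(first_seen[ident], idx)
            else:
                first_seen[ident] = idx

    # Collect the union of identifiers under each group root.
    groups = defaultdict(set)
    for idx, record in enumerate(records):
        groups[find(idx)] |= record
    return list(groups.values())


# Hypothetical records: no identifier type occurs in all of them.
records = [
    {("inhouse", "A123"), ("cas", "50-99-7")},   # local reference library
    {("cas", "50-99-7"), ("pubchem", "5793")},   # database record 1
    {("pubchem", "5793"), ("kegg", "C00031")},   # database record 2
    {("inhouse", "A456"), ("kegg", "C00022")},   # unrelated analyte
]
groups = consolidate(records)
print(len(groups))  # 2
```

Because each record only needs to overlap with one other record, the in-house code `A123` is linked to the KEGG entry through a chain of two intermediate records, even though no single record contains both.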
Efficient cross-referencing of metabolite identifiers is a key technology for metabolomics data analysis. We provide a practical and flexible solution to this task, implemented in the open-source metabolite masking tool (MetMask), available at http://metmask.sourceforge.net.
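Once identifiers are grouped, cross-referencing reduces to a self-join on a group column. The sketch below uses Python's built-in `sqlite3` module to store groups in a single table, map in-house codes to KEGG identifiers, and attach a newly imported record to existing groups (merging groups it links together). The table layout and the functions `build_db`, `map_ids` and `import_record` are hypothetical illustrations, not MetMask's actual schema or its command-line and R interfaces.

```python
import sqlite3


def build_db(groups):
    """Store identifier groups in an in-memory SQLite table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    # (kind, value) is the primary key, so each identifier belongs to one group.
    conn.execute(
        "CREATE TABLE ident (grp INTEGER, kind TEXT, value TEXT, "
        "PRIMARY KEY (kind, value))"
    )
    for grp, idents in enumerate(groups):
        conn.executemany(
            "INSERT INTO ident (grp, kind, value) VALUES (?, ?, ?)",
            [(grp, kind, value) for kind, value in sorted(idents)],
        )
    return conn


def map_ids(conn, kind, values, target):
    """Map identifiers of one kind to identifiers of another kind via their group."""
    result = {}
    for value in values:
        rows = conn.execute(
            "SELECT t.value FROM ident s JOIN ident t ON s.grp = t.grp "
            "WHERE s.kind = ? AND s.value = ? AND t.kind = ? ORDER BY t.value",
            (kind, value, target),
        ).fetchall()
        result[value] = [row[0] for row in rows]  # empty list if unmapped
    return result


def import_record(conn, idents):
    """Add a new record, joining (and merging) any groups it shares identifiers with."""
    idents = sorted(idents)
    hits = set()
    for kind, value in idents:
        row = conn.execute(
            "SELECT grp FROM ident WHERE kind = ? AND value = ?", (kind, value)
        ).fetchone()
        if row is not None:
            hits.add(row[0])
    if hits:
        grp = min(hits)
        # Fold every other matched group into the lowest group id.
        for other in hits - {grp}:
            conn.execute("UPDATE ident SET grp = ? WHERE grp = ?", (grp, other))
    else:
        # No overlap: start a new group.
        grp = conn.execute("SELECT COALESCE(MAX(grp), -1) FROM ident").fetchone()[0] + 1
    conn.executemany(
        "INSERT OR IGNORE INTO ident (grp, kind, value) VALUES (?, ?, ?)",
        [(grp, kind, value) for kind, value in idents],
    )
    return grp


groups = [
    {("inhouse", "A123"), ("cas", "50-99-7"), ("kegg", "C00031")},
    {("inhouse", "A456"), ("kegg", "C00022")},
]
conn = build_db(groups)
print(map_ids(conn, "inhouse", ["A123", "X999"], "kegg"))
# {'A123': ['C00031'], 'X999': []}
```

Keeping `(kind, value)` as the primary key makes the join unambiguous, and unmapped codes come back as empty lists, so a data set's full identifier list can be mapped in one pass without losing track of analytes that have no external match.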
