ABSTRACT: In metabolomics there is an ever-growing need for faster and more comprehensive analysis methods to cope with the increasing size of biological studies. Direct-infusion ion-cyclotron-resonance Fourier-transform mass spectrometry (DI-ICR-FT-MS) is used in non-targeted metabolomics to obtain high-resolution snapshots of the metabolic state of a system. We applied this technology to a Caenorhabditis elegans-Pseudomonas aeruginosa infection model and optimized the times needed for cultivation and mass-spectrometric analysis. Our results reveal that DI-ICR-FT-MS is a promising tool for high-throughput in-depth non-targeted metabolomics. We performed whole-worm metabolomics and recovered markers of the metabolic changes induced in C. elegans by interaction with pathogens. In this investigation, we reveal complex metabolic phenotypes enabling clustering based upon challenge. Specifically, we observed a marked decrease in amino-acid metabolism with infection by P. aeruginosa and a marked increase in sugar metabolism with infection by Salmonella enterica. We were also able to discriminate between infection with a virulent wild-type Pseudomonas and with an attenuated mutant, making it possible to use this method in larger genetic screens to identify host and pathogen effectors affecting the metabolic phenotype of infection.
Analytical and Bioanalytical Chemistry 11/2014; · 3.66 Impact Factor
ABSTRACT: Recent genome-wide association studies (GWAS) with metabolomics data linked genetic variation in the human genome to differences in individual metabolite levels. A strong relevance of this metabolic individuality for biomedical and pharmaceutical research has been reported. However, a considerable number of the molecules currently quantified by modern metabolomics techniques are chemically unidentified. The identification of these "unknown metabolites" is still a demanding and intricate task, limiting their usability as functional markers of metabolic processes. As a consequence, previous GWAS largely ignored unknown metabolites as metabolic traits for the analysis. Here we present a systems-level approach that combines genome-wide association analysis and Gaussian graphical modeling with metabolomics to predict the identity of the unknown metabolites. We apply our method to original data of 517 metabolic traits, of which 225 are unknowns, and genotyping information on 655,658 genetic variants, measured in 1,768 human blood samples. We report previously undescribed genotype-metabotype associations for six distinct gene loci (SLC22A2, COMT, CYP3A5, CYP2C18, GBA3, UGT3A1) and one locus not related to any known gene (rs12413935). Overlaying the inferred genetic associations, metabolic networks, and knowledge-based pathway information, we derive testable hypotheses on the biochemical identities of 106 unknown metabolites. As a proof of principle, we experimentally confirm nine concrete predictions. We demonstrate the benefit of our method for the functional interpretation of previous metabolomics biomarker studies on liver detoxification, hypertension, and insulin resistance. Our approach is generic in nature and can be directly transferred to metabolomics data from different experimental platforms.
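The idea behind Gaussian graphical modeling can be illustrated with a minimal sketch: an edge between two metabolites is kept only if they remain correlated after conditioning on the others. The example below is not the authors' pipeline. It assumes a simulated three-metabolite chain (x → y → z) and uses the first-order partial-correlation formula in place of a full model over hundreds of traits; the function names `pearson`, `partial_corr` and `simulate_chain` are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Plain Pearson correlation between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, z, y):
    """Correlation of x and z after conditioning on a single variable y."""
    r_xz, r_xy, r_yz = pearson(x, z), pearson(x, y), pearson(y, z)
    return (r_xz - r_xy * r_yz) / math.sqrt((1 - r_xy ** 2) * (1 - r_yz ** 2))


def simulate_chain(n=5000, seed=0):
    """Simulated data (an assumption): x influences y, and y influences z."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [a + 0.5 * rng.gauss(0, 1) for a in x]
    z = [b + 0.5 * rng.gauss(0, 1) for b in y]
    return x, y, z


if __name__ == "__main__":
    x, y, z = simulate_chain()
    print("marginal correlation x-z:", round(pearson(x, z), 3))
    print("partial correlation x-z given y:", round(partial_corr(x, z, y), 3))
```

In this toy setting the marginal correlation between x and z is strong, while the partial correlation given y is close to zero, so only the direct links x-y and y-z would be drawn as edges.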
ABSTRACT: The pathobiology of common diseases is influenced by heterogeneous factors interacting in complex networks. CIDeR (http://mips.helmholtz-muenchen.de/cider/) is a publicly available, manually curated, integrative database of metabolic and neurological disorders. The resource provides structured information on 18,813 experimentally validated interactions between molecules, bioprocesses and environmental factors extracted from the scientific literature. Systematic annotation and interactive graphical representation of disease networks make CIDeR a versatile knowledge base for biologists, for the analysis of large-scale data and for systems biology approaches.
ABSTRACT: BACKGROUND: Genome-wide association studies (GWAS) with metabolic traits and metabolome-wide association studies (MWAS) with traits of biomedical relevance are powerful tools to identify the contribution of genetic, environmental and lifestyle factors to the etiology of complex diseases. Hypothesis-free testing of ratios between all possible metabolite pairs in GWAS and MWAS has proven to be an innovative approach in the discovery of new biologically meaningful associations. The p-gain statistic was introduced as an ad-hoc measure to determine whether a ratio between two metabolite concentrations carries more information than the two corresponding metabolite concentrations alone. So far, only a rule of thumb has been applied to determine the significance of the p-gain. RESULTS: Here we explore the statistical properties of the p-gain through simulation of its density and by sampling of experimental data. We derive critical values of the p-gain for different levels of correlation between metabolite pairs and show that B/(2*alpha) is a conservative critical value for the p-gain, where alpha is the level of significance and B the number of tested metabolite pairs. CONCLUSIONS: We show that the p-gain is a well-defined measure that can be used to identify statistically significant metabolite ratios in association studies and provide a conservative significance cut-off for the p-gain for use in future association studies with metabolic traits.
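The conservative cut-off B/(2*alpha) stated above can be applied in a few lines. The sketch below is illustrative only. It assumes the usual definition of the p-gain as the smaller of the two single-metabolite p-values divided by the p-value of the ratio, which the abstract does not spell out; the function names and the p-values in the usage section are made up.

```python
def p_gain(p_ratio, p_metabolite_1, p_metabolite_2):
    """p-gain of a metabolite ratio (assumed definition, see lead-in):
    smaller single-metabolite p-value divided by the p-value of the ratio."""
    return min(p_metabolite_1, p_metabolite_2) / p_ratio


def critical_p_gain(n_pairs, alpha=0.05):
    """Conservative critical value B / (2 * alpha) from the abstract,
    where B is the number of tested metabolite pairs."""
    return n_pairs / (2 * alpha)


def is_significant(p_ratio, p_metabolite_1, p_metabolite_2, n_pairs, alpha=0.05):
    """True if the ratio's p-gain exceeds the conservative cut-off."""
    return p_gain(p_ratio, p_metabolite_1, p_metabolite_2) > critical_p_gain(n_pairs, alpha)


if __name__ == "__main__":
    # Made-up p-values for one ratio among 100 tested pairs.
    print(critical_p_gain(100, alpha=0.05))
    print(is_significant(1e-12, 1e-6, 1e-4, n_pairs=100))
```

Because the cut-off grows linearly with B, testing all pairs among many metabolites quickly demands very large p-gains before a ratio counts as informative.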
ABSTRACT: Systems Biology is a field in biological science that focuses on the combination of several or all "omics" approaches in order to find out how genes, transcripts, proteins and metabolites act together in the network of life. Metabolomics, as an analog to genomics, transcriptomics and proteomics, is increasingly integrated into biological studies, and transcriptomic and metabolomic experiments are often combined in one setup. At first glance the two data types seem to be completely different, but both produce information on biological entities, either transcripts or metabolites. Both types can be overlaid on metabolic pathways to obtain biological information on the studied system. The MassTRIX webserver was updated for the joint analysis of both data types. MassTRIX is freely available at www.masstrix.org.
PLoS ONE 01/2012; 7(7):e39860. · 3.53 Impact Factor
ABSTRACT: Genome-wide association studies (GWAS) have identified many risk loci for complex diseases, but effect sizes are typically small and information on the underlying biological processes is often lacking. Associations with metabolic traits as functional intermediates can overcome these problems and potentially inform individualized therapy. Here we report a comprehensive analysis of genotype-dependent metabolic phenotypes using a GWAS with non-targeted metabolomics. We identified 37 genetic loci associated with blood metabolite concentrations, of which 25 show effect sizes that are unusually high for GWAS and account for 10-60% differences in metabolite levels per allele copy. Our associations provide new functional insights for many disease-related associations that have been reported in previous studies, including those for cardiovascular and kidney disorders, type 2 diabetes, cancer, gout, venous thromboembolism and Crohn's disease. The study advances our knowledge of the genetic basis of metabolic individuality in humans and generates many new hypotheses for biomedical and pharmaceutical research.
ABSTRACT: Metabolomics is an emerging field that is based on the quantitative measurement of as many small organic molecules occurring in a biological sample as possible. Due to recent technical advances, metabolomics can now be used widely as an analytical high-throughput technology in drug testing and epidemiological metabolome- and genome-wide association studies. Analogous to chip-based gene expression analyses, the enormous amount of data produced by modern kit-based metabolomics experiments poses new challenges regarding their biological interpretation in the context of various sample phenotypes. We developed metaP-server to facilitate data interpretation. metaP-server provides automated and standardized data analysis for quantitative metabolomics data, covering the following steps from data acquisition to biological interpretation: (i) data quality checks, (ii) estimation of reproducibility and batch effects, (iii) hypothesis tests for multiple categorical phenotypes, (iv) correlation tests for metric phenotypes, (v) optionally including all possible pairs of metabolite concentration ratios, (vi) principal component analysis (PCA), and (vii) mapping of metabolites onto colored KEGG pathway maps. Graphical output is clickable and cross-linked to sample and metabolite identifiers. Interactive coloring of PCA and bar plots by phenotype facilitates on-line data exploration. For users of commercial metabolomics kits, cross-references to the HMDB, LipidMaps, KEGG, PubChem, and CAS databases are provided. metaP-server is freely accessible at http://metabolomics.helmholtz-muenchen.de/metap2/.
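Step (v) above, forming all possible pairs of metabolite concentration ratios, can be sketched as follows. This is not metaP-server code; the function name, the metabolite names and the concentrations are hypothetical, and skipping non-positive denominators is an assumption made here for the sake of a runnable example.

```python
from itertools import combinations


def pairwise_ratios(concentrations):
    """Return the ratio a/b for every unordered pair of metabolites.

    `concentrations` maps metabolite name -> concentration for one sample.
    Pairs are formed in alphabetical order of the names; pairs with a
    non-positive denominator are skipped (an assumption of this sketch).
    """
    ratios = {}
    for (name_a, a), (name_b, b) in combinations(sorted(concentrations.items()), 2):
        if b > 0:
            ratios[f"{name_a}/{name_b}"] = a / b
    return ratios


if __name__ == "__main__":
    # Made-up concentrations for a single sample.
    sample = {"glucose": 5.0, "lactate": 1.0, "alanine": 0.5}
    for pair, value in pairwise_ratios(sample).items():
        print(pair, value)
```

For n metabolites this yields n(n-1)/2 ratios per sample, which is why the number of tested pairs grows quickly as more metabolites are measured.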
BioMed Research International 01/2011; 2011. · 2.71 Impact Factor
ABSTRACT: A decline in whole-body insulin sensitivity in apparently healthy individuals indicates a high risk of developing type 2 diabetes. Investigating the metabolic fingerprints of individuals with different whole-body insulin sensitivity according to the formula of Matsuda et al. (ISI(Matsuda)) by a non-targeted metabolomics approach, we aimed a) to identify an unsuspicious and an altered metabolic pattern, b) to estimate a threshold related to these changes based on the ISI, and c) to identify the metabolic pathways responsible for the discrimination of the two patterns.
By applying infusion ion cyclotron resonance Fourier transform mass spectrometry, we analyzed plasma of 46 non-diabetic subjects exhibiting high to low insulin sensitivities. The orthogonal partial least squares model revealed a cluster of 28 individuals with alterations in their metabolic fingerprints associated with a decline in insulin sensitivity. This group could be separated from 18 subjects with an unsuspicious metabolite pattern. The orthogonal signal correction score scatter plot suggests a threshold of an ISI(Matsuda) of 15 for the discrimination of these two groups. Of note, a potential subgroup represented by eight individuals (ISI(Matsuda) value between 8.5 and 15) was identified in different models. This subgroup may indicate a metabolic transition state, since it is already located within the cluster of individuals with declined insulin sensitivity but the metabolic fingerprints still show some similarities with unaffected individuals (ISI >15). Moreover, the highest number of metabolite intensity differences between unsuspicious and altered metabolic fingerprints was detected in lipid metabolic pathways (arachidonic acid metabolism, metabolism of essential fatty acids and biosynthesis of unsaturated fatty acids), steroid hormone biosynthesis and bile acid metabolism, based on data evaluation using the metabolic annotation interface MassTRIX.
Our results suggest that altered metabolite patterns that reflect changes in insulin sensitivity, as measured by the ISI(Matsuda), are dominated by lipid-related pathways. Furthermore, a metabolic transition state reflected by heterogeneous metabolite fingerprints may precede severe alterations of metabolism. Our findings offer future prospects for novel insights into the pathogenesis of the pre-diabetic phase.
PLoS ONE 01/2010; 5(10):e13317. · 3.53 Impact Factor
ABSTRACT: CORUM is a database that provides a manually curated repository of experimentally characterized protein complexes from mammalian organisms, mainly human (64%), mouse (16%) and rat (12%). Protein complexes are key molecular entities that integrate multiple gene products to perform cellular functions. The new CORUM 2.0 release encompasses 2837 protein complexes, offering the largest and most comprehensive publicly available dataset of mammalian protein complexes. The CORUM dataset is built from 3198 different genes, representing approximately 16% of the protein-coding genes in humans. Each protein complex is described by a protein complex name, subunit composition, function as well as the literature reference that characterizes the respective protein complex. Recent developments include mapping of functional annotation to Gene Ontology terms as well as cross-references to Entrez Gene identifiers. In addition, a 'Phylogenetic Conservation' analysis tool was implemented that analyses the potential occurrence of orthologous protein complex subunits in mammals and other selected groups of organisms. This allows one to predict the occurrence of protein complexes in different phylogenetic groups. CORUM is freely accessible at http://mips.helmholtz-muenchen.de/genre/proj/corum/index.html.
Nucleic Acids Research 11/2009; 38(Database issue):D497-501. · 8.81 Impact Factor
ABSTRACT: The bacterial type II protein secretion (T2S) and type IV piliation (T4P) systems share several common features. In particular, it is well established that the T2S system requires the function of a pilus-like structure, called the pseudopilus, which is built by assembly of pilin-like subunits, called pseudopilins. Pilins and pseudopilins have a hydrophobic N-terminal region, which precedes an extended hydrophilic C-terminal region. In the case of pilins, it was shown that oligomerisation and formation of helical fibers take place through interaction between the hydrophobic domains. XcpT is the most abundant protein of the Pseudomonas aeruginosa T2S system and was proposed to be the main component of the pseudopilus. In this study we present the high-resolution NMR structure of the hydrophilic domain of XcpT (XcpTp). XcpTp lacks the C-terminal disulfide-bridged "D" domain found in type IV pilins and likely involved in receptor binding. This is in agreement with the idea that the XcpT-containing pseudopilus is required for protein secretion and not for bacterial attachment. Interestingly, by solving the 3D structure of XcpTp we revealed that the pilin region previously called the αβ-loop is in fact highly conserved among major type II pseudopilins and constitutes a specific consensus motif for identifying major pseudopilins, which belong to this family.
Journal of Structural Biology 09/2009; 169(1):75-80. · 3.36 Impact Factor
ABSTRACT: Cross-mapping of gene and protein identifiers between different databases is a tedious and time-consuming task. To overcome this, we developed CRONOS, a cross-reference server that contains entries from five mammalian organisms presented by major gene and protein information resources. Sequence similarity analysis of the mapped entries shows that the cross-references are highly accurate. In total, up to 18 different identifier types can be used for identification of cross-references. The quality of the mapping could be improved substantially by exclusion of ambiguous gene and protein names which were manually validated. Organism-specific lists of ambiguous terms, which are valuable for a variety of bioinformatics applications such as text mining, are available for download. AVAILABILITY: CRONOS is freely available to non-commercial users at http://mips.gsf.de/genre/proj/cronos/index.html; web services are available at http://mips.gsf.de/CronosWSService/CronosWS?wsdl.
ABSTRACT: The generation of expressed sequence tag (EST) libraries offers an affordable approach to investigate organisms if no genome sequence is available. OREST (http://mips.gsf.de/genre/proj/orest/index.html) is a server-based EST analysis pipeline, which allows the rapid analysis of large amounts of ESTs or cDNAs from mammals and fungi. In order to assign the ESTs to genes or proteins, OREST maps DNA sequences to reference datasets of gene products and, in a second step, to complete genome sequences. Mapping against genome sequences recovers an additional 13% of EST data, which would otherwise escape further analysis. To enable functional analysis of the datasets, ESTs are functionally annotated using the hierarchical FunCat annotation scheme as well as GO annotation terms. OREST also allows the prediction of associations between gene products and diseases by Morbid Map (OMIM) classification. A statistical analysis of the results is possible with the included PROMPT software, which provides information about enrichment and depletion of functional and disease annotation terms. OREST was successfully applied for the identification and functional characterization of more than 3000 EST sequences of the common marmoset monkey (Callithrix jacchus) as part of an international collaboration.
Nucleic Acids Research 08/2008; 36(Web Server issue):W140-4. · 8.81 Impact Factor
ABSTRACT: Protein complexes are key molecular entities that integrate multiple gene products to perform cellular functions. The CORUM (http://mips.gsf.de/genre/proj/corum/index.html) database is a collection of experimentally verified mammalian protein complexes. Information is manually derived by expert annotators through critical reading of the scientific literature. Information about protein complexes includes protein complex names, subunits, literature references as well as the function of the complexes. For functional annotation, we use the FunCat catalogue, which makes it possible to organize the protein complex space into biologically meaningful subsets. The database contains more than 1750 protein complexes that are built from 2400 different genes, thus representing 12% of the protein-coding genes in human. A web-based system is available to query, view and download the data. CORUM provides a comprehensive dataset of protein complexes for discoveries in systems biology, analyses of protein networks and protein complex-associated diseases. Comparable to the MIPS reference dataset of protein complexes from yeast, CORUM intends to serve as a reference for mammalian protein complexes.
Nucleic Acids Research 02/2008; 36(Database issue):D646-50. · 8.81 Impact Factor
ABSTRACT: The common marmoset monkey (Callithrix jacchus), a small non-endangered New World primate native to eastern Brazil, is increasingly used as a non-human primate model in biomedical research, drug development and safety assessment. In contrast to the growing interest in the marmoset as an animal model, the molecular tools for genetic analysis are extremely limited.
Here we report the development of the first marmoset-specific oligonucleotide microarray (EUMAMA) containing probe sets targeting 1541 different marmoset transcripts expressed in hippocampus. These 1541 transcripts represent a wide variety of different functional gene classes. Hybridisation of the marmoset microarray with labelled RNA from hippocampus, cortex and a panel of 7 different peripheral tissues resulted in high detection rates of 85% in the neuronal tissues and on average 70% in the non-neuronal tissues. The expression profiles of the 2 neuronal tissues, hippocampus and cortex, were highly similar, as indicated by a correlation coefficient of 0.96. Several transcripts with a tissue-specific pattern of expression were identified. Besides the marmoset microarray we have generated 3215 ESTs derived from marmoset hippocampus, which have been annotated and submitted to GenBank [GenBank: EF214838-EF215447, EH380242-EH382846].
We have generated the first marmoset-specific DNA microarray and demonstrated its use to characterise large-scale gene expression profiles of hippocampus but also of other neuronal and non-neuronal tissues. In addition, we have generated a large collection of ESTs of marmoset origin, which are now available in the public domain. These new tools will facilitate molecular genetic research into this non-human primate animal model.