Brigitte Wägele

Technische Universität München, München, Bavaria, Germany

Publications (21) · 115.63 Total impact

  • Source
    ABSTRACT: In metabolomics there is an ever-growing need for faster and more comprehensive analysis methods to cope with the increasing size of biological studies. Direct-infusion ion-cyclotron-resonance Fourier-transform spectrometry (DI-ICR-FT-MS) is used in non-targeted metabolomics to obtain high-resolution snapshots of the metabolic state of a system. We applied this technology to a Caenorhabditis elegans-Pseudomonas aeruginosa infection model and optimized times needed for cultivation and mass-spectrometric analysis. Our results reveal that DI-ICR-FT-MS is a promising tool for high-throughput in-depth non-targeted metabolomics. We performed whole-worm metabolomics and recovered markers of the induced metabolic changes in C. elegans brought about by interaction with pathogens. In this investigation, we reveal complex metabolic phenotypes enabling clustering based upon challenge. Specifically, we observed a marked decrease in amino-acid metabolism with infection by P. aeruginosa and a marked increase in sugar metabolism with infection by Salmonella enterica. We were also able to discriminate between infection with a virulent wild-type Pseudomonas and with an attenuated mutant, making it possible to use this method in larger genetic screens to identify host and pathogen effectors affecting the metabolic phenotype of infection.
    Analytical and Bioanalytical Chemistry 11/2014; 407(4):1-15. DOI:10.1007/s00216-014-8331-5 · 3.58 Impact Factor
  • Source
    ABSTRACT: Recent genome-wide association studies (GWAS) with metabolomics data linked genetic variation in the human genome to differences in individual metabolite levels. A strong relevance of this metabolic individuality for biomedical and pharmaceutical research has been reported. However, a considerable amount of the molecules currently quantified by modern metabolomics techniques are chemically unidentified. The identification of these "unknown metabolites" is still a demanding and intricate task, limiting their usability as functional markers of metabolic processes. As a consequence, previous GWAS largely ignored unknown metabolites as metabolic traits for the analysis. Here we present a systems-level approach that combines genome-wide association analysis and Gaussian graphical modeling with metabolomics to predict the identity of the unknown metabolites. We apply our method to original data of 517 metabolic traits, of which 225 are unknowns, and genotyping information on 655,658 genetic variants, measured in 1,768 human blood samples. We report previously undescribed genotype-metabotype associations for six distinct gene loci (SLC22A2, COMT, CYP3A5, CYP2C18, GBA3, UGT3A1) and one locus not related to any known gene (rs12413935). Overlaying the inferred genetic associations, metabolic networks, and knowledge-based pathway information, we derive testable hypotheses on the biochemical identities of 106 unknown metabolites. As a proof of principle, we experimentally confirm nine concrete predictions. We demonstrate the benefit of our method for the functional interpretation of previous metabolomics biomarker studies on liver detoxification, hypertension, and insulin resistance. Our approach is generic in nature and can be directly transferred to metabolomics data from different experimental platforms.
    PLoS Genetics 10/2012; 8(10):e1003005. DOI:10.1371/journal.pgen.1003005 · 8.17 Impact Factor
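    Gaussian graphical modeling, as mentioned in the abstract above, links two metabolites by their partial correlation, that is, the correlation left over once the influence of the other measured variables is removed. The following is a minimal sketch of the three-variable case only; the function name is hypothetical and this is not the authors' implementation, which would work with full-order partial correlations across all metabolic traits.

```python
import math


def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation of x and y after removing the linear effect of z.

    Illustrative helper (hypothetical name): first-order partial
    correlation computed from three pairwise Pearson correlations.
    """
    denom = math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))
    return (r_xy - r_xz * r_yz) / denom


if __name__ == "__main__":
    # x and y look correlated (0.49), but only because both follow z (0.7 each):
    # the partial correlation is essentially zero, so no direct edge is drawn.
    print(partial_correlation(0.49, 0.7, 0.7))
    # Here z is unrelated to both, so the direct correlation is unchanged.
    print(partial_correlation(0.5, 0.0, 0.0))
```

    In this toy case the first pair would not be connected in the network, whereas the second pair would keep its direct association.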
  • Source
    ABSTRACT: The pathobiology of common diseases is influenced by heterogeneous factors interacting in complex networks. CIDeR (http://mips.helmholtz-muenchen.de/cider/) is a publicly available, manually curated, integrative database of metabolic and neurological disorders. The resource provides structured information on 18,813 experimentally validated interactions between molecules, bioprocesses and environmental factors extracted from the scientific literature. Systematic annotation and interactive graphical representation of disease networks make CIDeR a versatile knowledge base for biologists, analysis of large-scale data and systems biology approaches.
    Genome biology 07/2012; 13(7):R62. DOI:10.1186/gb-2012-13-7-r62 · 10.47 Impact Factor
  • Source
    ABSTRACT: Systems Biology is a field in biological science that focuses on the combination of several or all "omics" approaches in order to find out how genes, transcripts, proteins and metabolites act together in the network of life. Metabolomics, as an analog to genomics, transcriptomics and proteomics, is increasingly integrated into biological studies, and transcriptomic and metabolomic experiments are often combined in one setup. At first glance the two data types seem to be completely different, but both produce information on biological entities, either transcripts or metabolites. Both types can be overlaid on metabolic pathways to obtain biological information on the studied system. For the joint analysis of both data types the MassTRIX webserver was updated. MassTRIX is freely available at www.masstrix.org.
    PLoS ONE 07/2012; 7(7):e39860. DOI:10.1371/journal.pone.0039860 · 3.23 Impact Factor
  • Source
    ABSTRACT: Background: Genome-wide association studies (GWAS) with metabolic traits and metabolome-wide association studies (MWAS) with traits of biomedical relevance are powerful tools to identify the contribution of genetic, environmental and lifestyle factors to the etiology of complex diseases. Hypothesis-free testing of ratios between all possible metabolite pairs in GWAS and MWAS has proven to be an innovative approach in the discovery of new biologically meaningful associations. The p-gain statistic was introduced as an ad-hoc measure to determine whether a ratio between two metabolite concentrations carries more information than the two corresponding metabolite concentrations alone. So far, only a rule of thumb was applied to determine the significance of the p-gain. Results: Here we explore the statistical properties of the p-gain through simulation of its density and by sampling of experimental data. We derive critical values of the p-gain for different levels of correlation between metabolite pairs and show that B/(2*α) is a conservative critical value for the p-gain, where α is the level of significance and B the number of tested metabolite pairs. Conclusions: We show that the p-gain is a well defined measure that can be used to identify statistically significant metabolite ratios in association studies and provide a conservative significance cut-off for the p-gain for use in future association studies with metabolic traits.
    BMC Bioinformatics 06/2012; 13(1):120. DOI:10.1186/1471-2105-13-120 · 2.67 Impact Factor
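    The cut-off quoted in the abstract above can be illustrated with a short sketch. It assumes the usual definition of the p-gain (the smaller of the two single-metabolite p-values divided by the p-value of their ratio), which the abstract itself does not spell out; the function names and the example numbers are hypothetical.

```python
def p_gain(p_m1, p_m2, p_ratio):
    """p-gain of a metabolite ratio.

    Assumed definition: the smaller single-metabolite p-value divided
    by the p-value of the ratio of the two metabolites.
    """
    return min(p_m1, p_m2) / p_ratio


def critical_p_gain(n_pairs, alpha=0.05):
    """Conservative critical value B / (2 * alpha) from the abstract,
    with B = number of tested metabolite pairs."""
    return n_pairs / (2 * alpha)


def ratio_is_significant(p_m1, p_m2, p_ratio, n_pairs, alpha=0.05):
    """True if the ratio's p-gain exceeds the conservative cut-off."""
    return p_gain(p_m1, p_m2, p_ratio) > critical_p_gain(n_pairs, alpha)


if __name__ == "__main__":
    # Made-up example: 100 tested pairs at alpha = 0.05.
    print(critical_p_gain(100))
    print(ratio_is_significant(1e-4, 1e-3, 1e-9, n_pairs=100))
    print(ratio_is_significant(1e-4, 1e-3, 1e-6, n_pairs=100))
```

    With these made-up inputs the first ratio clears the cut-off and the second does not, even though both ratios have smaller p-values than either metabolite alone.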
  • ABSTRACT: Caenorhabditis elegans is a widely used model organism, which was introduced in the 1960s by Sydney Brenner. Its sequenced genome, easy cultivation, short reproduction cycle and transparency make it an ideal model organism. We use C. elegans to explore the effects of pathogens, like Pseudomonas aeruginosa, or beneficial bacteria, like probiotics, on the metabolome. For non-targeted metabolomics studies, ultrahigh performance liquid chromatography – ultrahigh resolution time of flight mass spectrometry (UPLC-UHR-ToF-MS) and ion cyclotron resonance – Fourier transform mass spectrometry (ICR-FT/MS) are used. The two techniques allow precise measurement of up to 5000 features (UPLC-UHR-ToF-MS) or over 20000 masses (ICR-FT/MS) on a routine basis. Together with sophisticated extraction methods this yields a broad coverage of the C. elegans metabolome. First work shows that this methodology is able to discriminate between different states in P. aeruginosa infection and other stresses. To evaluate whether probiotic feeding offers benefits in P. aeruginosa infection, C. elegans was pre-fed with different strains of probiotics prior to infection. Survival was compared against Escherichia coli OP50, a standard laboratory food for C. elegans. Most of the probiotic strains did not show any effect, whereas two showed a positive and one a negative effect on survival. C. elegans fed with these strains are currently undergoing metabolome measurement. First results show that our platform is able to discriminate between different infection states, proving that it is applicable to C. elegans biology. Previous research already discovered positive effects of Lactobacillus isolates in a C. elegans – S. typhimurium model. Our first results from C. elegans killing assays show that two probiotics also offer protection in P. aeruginosa infections.
    Metabolome analysis will show which metabolic pathways are affected by infection and how probiotics can alter the metabolism to increase resistance. Together with bioactivity-guided fractionation of small molecules from probiotic cultures, this will allow identification of the active compounds produced by probiotics that confer these positive effects.
    3rd TNO Beneficial Microbes Conference, Amsterdam; 03/2012
  • Source
    ABSTRACT: Genome-wide association studies (GWAS) have identified many risk loci for complex diseases, but effect sizes are typically small and information on the underlying biological processes is often lacking. Associations with metabolic traits as functional intermediates can overcome these problems and potentially inform individualized therapy. Here we report a comprehensive analysis of genotype-dependent metabolic phenotypes using a GWAS with non-targeted metabolomics. We identified 37 genetic loci associated with blood metabolite concentrations, of which 25 show effect sizes that are unusually high for GWAS and account for 10-60% differences in metabolite levels per allele copy. Our associations provide new functional insights for many disease-related associations that have been reported in previous studies, including those for cardiovascular and kidney disorders, type 2 diabetes, cancer, gout, venous thromboembolism and Crohn's disease. The study advances our knowledge of the genetic basis of metabolic individuality in humans and generates many new hypotheses for biomedical and pharmaceutical research.
    Nature 09/2011; 477(7362):54-60. DOI:10.1038/nature10354 · 42.35 Impact Factor
  • ABSTRACT: Caenorhabditis elegans is a widely used model organism, which was introduced in the 1960s by Sydney Brenner. Its sequenced genome, easy cultivation, short reproduction cycle and transparency make it an ideal model organism. It is routinely used for studying host-pathogen interactions, neurobiology, physiology, ecology and many other topics. Despite its popularity, surprisingly few metabolomics studies, mostly using NMR, have been carried out using C. elegans. Here we present the setup of an MS-based metabolomic platform for C. elegans studies. The platform consists of two separate instruments: an ICR-FT-MS with a 12 T magnet and a UPLC-UHR-ToF-MS, used for non-targeted metabolomics and compound purification. Additionally, we apply a NanoMate as chip-ESI source and online fraction collector at the UPLC-MS system. The chromatographic methods used were RP and HILIC, for the separation of both non-polar and polar metabolites. All measurements were carried out in positive and negative ionization mode. Mass accuracy and precision during the measurement were achieved by independent calibration of each UPLC-MS run with a standard mix, in combination with a lock mass. Automated data pre-processing, including calibration, chromatogram deconvolution, peak picking and alignment, saved valuable time. For data analysis, multivariate statistics with unsupervised (e.g. HCA, PCA) and supervised (PLS) methods were chosen. The developed methods are currently applied to different projects using C. elegans as a model organism in the fields of host-pathogen interactions, nutritional studies, probiotic research and the analysis of knock-out mutants. Ongoing developments in sample preparation and chromatographic methods, in combination with the implemented automated data pre-processing, will allow us to use this platform for high-throughput metabolome analysis of larger sample sets and studies in the future.
    Trends in Metabolomics - Analytics and Applications; 05/2011
  • ABSTRACT: Caenorhabditis elegans is a widely used model organism, which was introduced in the 1960s by Sydney Brenner. Its sequenced genome, easy cultivation, short reproduction cycle and transparency make it an ideal model organism. It is routinely used for studying host-pathogen interactions, neurobiology, physiology, ecology and many other topics. Despite its popularity, surprisingly few metabolomics studies, mostly using NMR, have been carried out using C. elegans. Here we present the setup of an MS-based metabolomic platform for C. elegans studies. The platform consists of two separate instruments: an ICR-FT-MS with a 12 T magnet and a UPLC-UHR-ToF-MS, used for non-targeted metabolomics and compound purification. Additionally, we apply a NanoMate as chip-ESI source and online fraction collector at the UPLC-MS system. The chromatographic methods used were RP and HILIC, for the separation of both non-polar and polar metabolites. All measurements were carried out in positive and negative ionization mode. Mass accuracy and precision during the measurement were achieved by independent calibration of each UPLC-MS run with a standard mix, in combination with a lock mass. Automated data pre-processing, including calibration, chromatogram deconvolution, peak picking and alignment, saved valuable time. For data analysis, multivariate statistics with unsupervised (e.g. HCA, PCA) and supervised (PLS) methods were chosen. The developed methods are currently applied to different projects using C. elegans as a model organism in the fields of host-pathogen interactions, nutritional studies, probiotic research and the analysis of knock-out mutants. Ongoing developments in sample preparation and chromatographic methods, in combination with the implemented automated data pre-processing, will allow us to use this platform for high-throughput metabolome analysis of larger sample sets and studies in the future.
    Anakon 2011; 03/2011
  • Source
    ABSTRACT: Metabolomics is an emerging field that is based on the quantitative measurement of as many small organic molecules occurring in a biological sample as possible. Due to recent technical advances, metabolomics can now be used widely as an analytical high-throughput technology in drug testing and epidemiological metabolome and genome wide association studies. Analogous to chip-based gene expression analyses, the enormous amount of data produced by modern kit-based metabolomics experiments poses new challenges regarding their biological interpretation in the context of various sample phenotypes. We developed metaP-server to facilitate data interpretation. metaP-server provides automated and standardized data analysis for quantitative metabolomics data, covering the following steps from data acquisition to biological interpretation: (i) data quality checks, (ii) estimation of reproducibility and batch effects, (iii) hypothesis tests for multiple categorical phenotypes, (iv) correlation tests for metric phenotypes, (v) optionally including all possible pairs of metabolite concentration ratios, (vi) principal component analysis (PCA), and (vii) mapping of metabolites onto colored KEGG pathway maps. Graphical output is clickable and cross-linked to sample and metabolite identifiers. Interactive coloring of PCA and bar plots by phenotype facilitates on-line data exploration. For users of commercial metabolomics kits, cross-references to the HMDB, LipidMaps, KEGG, PubChem, and CAS databases are provided. metaP-server is freely accessible at http://metabolomics.helmholtz-muenchen.de/metap2/.
    BioMed Research International 01/2011; 2011. DOI:10.1155/2011/839862 · 2.71 Impact Factor
  • Source
    ABSTRACT: A decline in body insulin sensitivity in apparently healthy individuals indicates a high risk to develop type 2 diabetes. Investigating the metabolic fingerprints of individuals with different whole body insulin sensitivity according to the formula of Matsuda, et al. (ISI(Matsuda)) by a non-targeted metabolomics approach we aimed a) to figure out an unsuspicious and altered metabolic pattern, b) to estimate a threshold related to these changes based on the ISI, and c) to identify the metabolic pathways responsible for the discrimination of the two patterns. By applying infusion ion cyclotron resonance Fourier transform mass spectrometry, we analyzed plasma of 46 non-diabetic subjects exhibiting high to low insulin sensitivities. The orthogonal partial least square model revealed a cluster of 28 individuals with alterations in their metabolic fingerprints associated with a decline in insulin sensitivity. This group could be separated from 18 subjects with an unsuspicious metabolite pattern. The orthogonal signal correction score scatter plot suggests a threshold of an ISI(Matsuda) of 15 for the discrimination of these two groups. Of note, a potential subgroup represented by eight individuals (ISI(Matsuda) value between 8.5 and 15) was identified in different models. This subgroup may indicate a metabolic transition state, since it is already located within the cluster of individuals with declined insulin sensitivity but the metabolic fingerprints still show some similarities with unaffected individuals (ISI >15). Moreover, the highest number of metabolite intensity differences between unsuspicious and altered metabolic fingerprints was detected in lipid metabolic pathways (arachidonic acid metabolism, metabolism of essential fatty acids and biosynthesis of unsaturated fatty acids), steroid hormone biosyntheses and bile acid metabolism, based on data evaluation using the metabolic annotation interface MassTRIX. 
    Our results suggest that altered metabolite patterns reflecting changes in insulin sensitivity, as measured by the ISI(Matsuda), are dominated by lipid-related pathways. Furthermore, a metabolic transition state reflected by heterogeneous metabolite fingerprints may precede severe alterations of metabolism. Our findings offer future prospects for novel insights into the pathogenesis of the pre-diabetic phase.
    PLoS ONE 10/2010; 5(10):e13317. DOI:10.1371/journal.pone.0013317 · 3.23 Impact Factor
  • ABSTRACT: Bacterial protein secretion represents a key mechanism for infection and pathogenesis and enables the modulation of infected hosts by pathogenic bacteria. The diagnostic and therapeutic potential of this process is not limited to the secretion machineries and the secreted proteins themselves, but includes the largely unknown effects of these proteins on the host cells. To exploit this potential, the Pathomics project (http://webclu.bio.wzw.tum.de/pathomics) within the ERA-NET PathoGenoMics focuses on the systematic and integrative investigation of the host-pathogen protein-protein interactomes of human pathogens and their influence on the metabolic system of the host cells. Two bacteria are the topic of this research, funded by the ERA-NET PathoGenoMics: Pseudomonas aeruginosa, a Gram-negative opportunistic human pathogen, and Chlamydiae, Gram-negative obligate intracellular human pathogens which cause several diseases such as trachoma or sexually transmitted diseases. The project is divided into three work packages (WP). WP1 focuses on the transcriptional context of the secretomes, WP2 on the host-pathogen protein-protein interactomes and WP3 on the host-pathogen metabolomes. We present here the different approaches for the analysis of the metabolomes in the consortium. Non-targeted analysis is performed on FT-ICR-MS. Since thousands of signals are expected from these experiments, MassTRIX, a web server for database-driven metabolite annotation, is used as one part of the data analysis. Furthermore, different targeted and non-targeted analytical methods like UPLC-MS, CE-MS or 2D-LC, also coupled to FT-ICR-MS, are used for targeted analysis and high-resolution mapping of the metabolomes. Suhre, K. and Schmitt-Kopplin, Ph. MassTRIX: mass translator into pathways, Nucleic Acids Research, 2008; doi: 10.1093/nar/gkn194
    Metabolomics and More, München; 10/2010
  • Source
    ABSTRACT: CORUM is a database that provides a manually curated repository of experimentally characterized protein complexes from mammalian organisms, mainly human (64%), mouse (16%) and rat (12%). Protein complexes are key molecular entities that integrate multiple gene products to perform cellular functions. The new CORUM 2.0 release encompasses 2837 protein complexes offering the largest and most comprehensive publicly available dataset of mammalian protein complexes. The CORUM dataset is built from 3198 different genes, representing approximately 16% of the protein coding genes in humans. Each protein complex is described by a protein complex name, subunit composition, function as well as the literature reference that characterizes the respective protein complex. Recent developments include mapping of functional annotation to Gene Ontology terms as well as cross-references to Entrez Gene identifiers. In addition, a 'Phylogenetic Conservation' analysis tool was implemented that analyses the potential occurrence of orthologous protein complex subunits in mammals and other selected groups of organisms. This allows one to predict the occurrence of protein complexes in different phylogenetic groups. CORUM is freely accessible at (http://mips.helmholtz-muenchen.de/genre/proj/corum/index.html).
    Nucleic Acids Research 11/2009; 38(Database issue):D497-501. DOI:10.1093/nar/gkp914 · 9.11 Impact Factor
  • Source
    ABSTRACT: The bacterial type II protein secretion (T2S) and type IV piliation (T4P) systems share several common features. In particular, it is well established that the T2S system requires the function of a pilus-like structure, called the pseudopilus, which is built by assembly of pilin-like subunits, called pseudopilins. Pilins and pseudopilins have a hydrophobic N-terminal region, which precedes an extended hydrophilic C-terminal region. In the case of pilins, it was shown that oligomerisation and formation of helical fibers take place through interaction between the hydrophobic domains. XcpT is the most abundant protein of the Pseudomonas aeruginosa T2S system and was proposed to be the main component of the pseudopilus. In this study we present the high-resolution NMR structure of the hydrophilic domain of XcpT (XcpTp). XcpTp lacks the C-terminal disulfide-bridged "D" domain found in type IV pilins and likely involved in receptor binding. This is in agreement with the idea that the XcpT-containing pseudopilus is required for protein secretion and not for bacterial attachment. Interestingly, by solving the 3D structure of XcpTp we revealed that the region previously called the αβ-loop pilin region is in fact highly conserved among major type II pseudopilins and constitutes a specific consensus motif for identifying major pseudopilins, which belong to this family.
    Journal of Structural Biology 09/2009; 169(1):75-80. DOI:10.1016/j.jsb.2009.09.003 · 3.23 Impact Factor
  • Source
    Brigitte Wägele
    ABSTRACT: Recent developments in bioanalytical techniques allow for comprehensive analyses of organisms with respect to genomics, transcriptomics and proteomics. To visualize the heterogeneous data types in their genomic context, generic methods for mapping transcript and protein sequences to genomic reference data sets have been developed. Additionally, an application for automated functional annotation of sets of ESTs, that statistically analyses the enrichment and depletion of functional modules has been established. To provide data integration for systems biology approaches, a tool for automated cross-referencing of entries from different databases has been designed. For that purpose lists of ambiguous biological terms were created, that will allow for enhanced text mining in biomedical science.
  • Source
    ABSTRACT: Cross-mapping of gene and protein identifiers between different databases is a tedious and time-consuming task. To overcome this, we developed CRONOS, a cross-reference server that contains entries from five mammalian organisms presented by major gene and protein information resources. Sequence similarity analysis of the mapped entries shows that the cross-references are highly accurate. In total, up to 18 different identifier types can be used for identification of cross-references. The quality of the mapping could be improved substantially by exclusion of ambiguous gene and protein names which were manually validated. Organism-specific lists of ambiguous terms, which are valuable for a variety of bioinformatics applications like text mining are available for download. Availability: CRONOS is freely available to non-commercial users at http://mips.gsf.de/genre/proj/cronos/index.html, web services are available at http://mips.gsf.de/CronosWSService/CronosWS?wsdl. Contact: brigitte.waegele@helmholtz-muenchen.de Supplementary information: Supplementary data are available at Bioinformatics online. The online Supplementary Material contains all figures and tables referenced by this article.
    Bioinformatics 12/2008; 25(1):141-3. DOI:10.1093/bioinformatics/btn590 · 4.62 Impact Factor
  • Source
    Brigitte Waegele · Thorsten Schmidt · H Werner Mewes · Andreas Ruepp
    ABSTRACT: The generation of expressed sequence tag (EST) libraries offers an affordable approach to investigate organisms if no genome sequence is available. OREST (http://mips.gsf.de/genre/proj/orest/index.html) is a server-based EST analysis pipeline, which allows the rapid analysis of large amounts of ESTs or cDNAs from mammals and fungi. In order to assign the ESTs to genes or proteins, OREST maps DNA sequences to reference datasets of gene products and, in a second step, to complete genome sequences. Mapping against genome sequences recovers an additional 13% of EST data, which would otherwise escape further analysis. To enable functional analysis of the datasets, ESTs are functionally annotated using the hierarchical FunCat annotation scheme as well as GO annotation terms. OREST also allows prediction of the association of gene products and diseases by Morbid Map (OMIM) classification. A statistical analysis of the results of the dataset is possible with the included PROMPT software, which provides information about enrichment and depletion of functional and disease annotation terms. OREST was successfully applied for the identification and functional characterization of more than 3000 EST sequences of the common marmoset monkey (Callithrix jacchus) as part of an international collaboration.
    Nucleic Acids Research 08/2008; 36(Web Server issue):W140-4. DOI:10.1093/nar/gkn253 · 9.11 Impact Factor
  • Source
    ABSTRACT: Protein complexes are key molecular entities that integrate multiple gene products to perform cellular functions. The CORUM (http://mips.gsf.de/genre/proj/corum/index.html) database is a collection of experimentally verified mammalian protein complexes. Information is manually derived by expert annotators through critical reading of the scientific literature. Information about protein complexes includes protein complex names, subunits, literature references as well as the function of the complexes. For functional annotation, we use the FunCat catalogue, which makes it possible to organize the protein complex space into biologically meaningful subsets. The database contains more than 1750 protein complexes that are built from 2400 different genes, thus representing 12% of the protein-coding genes in human. A web-based system is available to query, view and download the data. CORUM provides a comprehensive dataset of protein complexes for discoveries in systems biology, analyses of protein networks and protein complex-associated diseases. Comparable to the MIPS reference dataset of protein complexes from yeast, CORUM intends to serve as a reference for mammalian protein complexes.
    Nucleic Acids Research 02/2008; 36(Database issue):D646-50. DOI:10.1093/nar/gkm936 · 9.11 Impact Factor