Integrated pathway-level analysis of transcriptomics and metabolomics data with IMPaLA

Department of Vertebrate Genomics, Max Planck Institute for Molecular Genetics, Berlin, Germany.
Bioinformatics (Impact Factor: 4.62). 09/2011; 27(20):2917-8. DOI: 10.1093/bioinformatics/btr499
Source: PubMed

ABSTRACT Pathway-level analysis is a powerful approach enabling interpretation of post-genomic data at a higher level than that of individual biomolecules. Yet, it is currently hard to integrate more than one type of omics data in such an approach. Here, we present a web tool 'IMPaLA' for the joint pathway analysis of transcriptomics or proteomics and metabolomics data. It performs over-representation or enrichment analysis with user-specified lists of metabolites and genes using over 3000 pre-annotated pathways from 11 databases. As a result, pathways can be identified that may be dysregulated on the transcriptional level, the metabolic level or both. Evidence of pathway dysregulation is combined, allowing for the identification of additional pathways with changed activity that would not be highlighted when analysis is applied to any of the functional levels alone. The tool has been implemented both as an interactive website and as a web service that provides a programmatic interface.
The web interface of IMPaLA is available at [URL not preserved in source]. A web services programming interface is provided at [URL not preserved in source].
Supplementary data are available at Bioinformatics online.
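
The abstract mentions over-representation analysis on gene and metabolite lists and the combination of evidence from both levels. The following is a minimal Python sketch of that general idea, not IMPaLA's actual implementation: it assumes a one-sided hypergeometric test for over-representation and Fisher's method for combining the two p-values. The function names and all numbers in the demo are hypothetical.

```python
from math import comb, exp, factorial, log


def hypergeom_sf(overlap, background, pathway_size, list_size):
    """P(X >= overlap) when list_size entities are drawn without replacement
    from a background containing pathway_size pathway members."""
    upper = min(pathway_size, list_size)
    total = comb(background, list_size)
    hits = sum(
        comb(pathway_size, i) * comb(background - pathway_size, list_size - i)
        for i in range(max(overlap, 0), upper + 1)
    )
    return hits / total


def fisher_combine(pvalues):
    """Combine independent p-values with Fisher's method (an assumption here).

    The statistic -2 * sum(ln p) follows a chi-square distribution with 2k
    degrees of freedom, whose survival function has a closed form for even
    degrees of freedom, so no external library is needed.
    """
    if not pvalues or any(not (0.0 < p <= 1.0) for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    half_stat = -sum(log(p) for p in pvalues)  # half of the chi-square statistic
    k = len(pvalues)
    return exp(-half_stat) * sum(half_stat**i / factorial(i) for i in range(k))


if __name__ == "__main__":
    # Hypothetical toy numbers for a single pathway.
    p_genes = hypergeom_sf(overlap=8, background=20000, pathway_size=50, list_size=300)
    p_metabolites = hypergeom_sf(overlap=3, background=3000, pathway_size=20, list_size=40)
    print("genes:", p_genes)
    print("metabolites:", p_metabolites)
    print("joint:", fisher_combine([p_genes, p_metabolites]))
```

In such a scheme, a pathway with only moderate evidence at each level can still receive a small joint p-value, which matches the abstract's point that some pathways would not be highlighted by either level alone. In practice the p-values would also be corrected for multiple testing across all pathways.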


Available from: Atanas Kamburov, Jul 28, 2015
  • Source
    • "The availability of these massive data ideally allows for complex and detailed modeling of the underlying biological system, but they also pose a serious potential multiple testing problem because the number of covariates is typically orders of magnitude larger than the number of observations. Recently, investigators have started to combine several of these high-dimensional datasets, which has increased the demand for analysis methods that accommodate these vast data (Nie et al., 2006; Brink-Jensen et al., 2013; Su et al., 2011; Kamburov et al., 2011)."
    ABSTRACT: Penalized regression models such as the Lasso have proved useful for variable selection in many fields, especially in situations with high-dimensional data where the number of predictors far exceeds the number of observations. These methods identify and rank variables of importance but do not generally provide any inference for the selected variables. Thus, the variables selected might be the "most important" but need not be significant. We propose a significance test for the selection found by the Lasso. We introduce a procedure that computes inference and p-values for features chosen by the Lasso. This method rephrases the null hypothesis and uses a randomization approach which ensures that the error rate is controlled even for small samples. We demonstrate the ability of the algorithm to compute p-values of the expected magnitude with simulated data using a multitude of scenarios that involve various effect strengths and correlations between predictors. The algorithm is also applied to a prostate cancer dataset that has been analyzed in recent papers on the subject. The proposed method is found to provide a powerful way to make inference for feature selection even for small samples and when the number of predictors is several orders of magnitude larger than the number of observations. The algorithm is implemented in the MESS package in R and is freely available.
  • Source
    ABSTRACT: Citation: Taboureau O, Hersey A, Audouze K, Gautier L, Jacobsen UP, et al. (2012) Toxicogenomics Investigation Under the eTOX Project. J Pharmacogenom Pharmacoproteomics S7:001. Copyright: © 2012 Taboureau O, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. The current cost to bring a drug candidate to market is estimated at US $1.8 billion, with an average success rate of 8% [1]. However, there has been a significant decrease in the development of new and effective drugs, and one of the most important reasons for attrition is clinical side effects and toxicity. Interestingly, since the advent of DNA microarray technology (15 years ago), the field of toxicology has discussed the great potential of genome-wide expression profiling for toxicity testing: the promise is that the mechanism of action of a chemical at the cellular level, and thus the risk of chemical toxicity, can be identified through the transcriptional activity of cells. The keyword toxicogenomics was coined to identify this systematic approach. Moreover, at the molecular level, as the human and rodent genomes exhibit more than 90% similarity, toxicogenomics could be of benefit for the extrapolation of toxic effects between species. A similar argument applies for extrapolating in-vivo effects from in-vitro experiments, although most often different parameters are measured in both experiments [2]. Over the last decade, a number of toxicogenomics studies have been performed taking advantage of the maturity of the microarray technology, and we consider that technology for expression profiling as an indicator of how the concept is gaining adoption. 
Looking at the number of references mentioning "gene expression" in the PubMed database, we can observe that microarray technology is not applied solely to toxicology; the method allows study of the global transcriptional changes of a given biological system in response to any stress perturbation. The "toxicogenomics" field has been actively investigated since 2004, when gene expression experiments on drugs and toxicants started to become publicly available (Figure 1). Toxicogenomics has proven to be useful in toxicology [3,4]. For example, in carcinogenicity, gene expression profiling at early time points accurately predicted non-genotoxic carcinogenesis and hepatocarcinogenicity [5,6]. Toxicogenomics was also of relevance in evaluating the potential immunotoxicity of small interfering RNAs (siRNAs) considered for potential therapeutic application [7]. Compounds inducing gene expression profiles similar to those of known model toxicants can be identified as putatively toxic based on the common mechanisms of response at the molecular level. Nonetheless,
  • Source
    ABSTRACT: Microarray data analysis has been the subject of extensive and ongoing pipeline development due to its complexity, the availability of several options at each analysis step, and the development of new analysis demands, including integration with new data sources. Bioinformatics pipelines are usually custom built for different applications, making them typically difficult to modify, extend and repurpose. Scientific workflow systems are intended to address these issues by providing general-purpose frameworks in which to develop and execute such pipelines. The Kepler workflow environment is a well-established system under continual development that is employed in several areas of scientific research. Kepler provides a flexible graphical interface, featuring clear display of parameter values, for design and modification of workflows. It has capabilities for developing novel computational components in the R, Python, and Java programming languages, all of which are widely used for bioinformatics algorithm development, along with capabilities for invoking external applications and using web services. We developed a series of fully functional bioinformatics pipelines addressing common tasks in microarray processing in the Kepler workflow environment. These pipelines consist of a set of tools for GFF file processing of NimbleGen chromatin immunoprecipitation on microarray (ChIP-chip) datasets and more comprehensive workflows for Affymetrix gene expression microarray bioinformatics and basic primer design for PCR experiments, which are often used to validate microarray results. Although functional in themselves, these workflows can be easily customized, extended, or repurposed to match the needs of specific projects and are designed to be a toolkit and starting point for specific applications. 
These workflows illustrate a workflow programming paradigm focusing on local resources (programs and data) and therefore are close to traditional shell scripting or R/BioConductor scripting approaches to pipeline design. Finally, we suggest that microarray data processing task workflows may provide a basis for future example-based comparison of different workflow systems. We provide a set of tools and complete workflows for microarray data analysis in the Kepler environment, which has the advantages of offering graphical, clear display of conceptual steps and parameters and the ability to easily integrate other resources such as remote data and web services.
    BMC Bioinformatics 05/2012; 13:102. DOI:10.1186/1471-2105-13-102 · 2.67 Impact Factor