Article

IPAD: the Integrated Pathway Analysis Database for Systematic Enrichment Analysis

Department of Academic and Institutional Resources and Technology, University of North Texas Health Science Center, Fort Worth, USA.
BMC Bioinformatics (Impact Factor: 2.58). 09/2012; 13 Suppl 15(Suppl 15):S7. DOI: 10.1186/1471-2105-13-S15-S7
Source: PubMed

ABSTRACT

Next-Generation Sequencing (NGS) technologies and Genome-Wide Association Studies (GWAS) generate millions of reads and hundreds of datasets, and there is an urgent need for a better way to accurately interpret and distill such large amounts of data. Extensive pathway and network analysis allows the discovery of highly significant pathways from sets of disease vs. healthy samples in NGS and GWAS data. Knowledge of the activation of these processes will lead to elucidation of the complex biological pathways affected by drug treatment, to patient stratification studies of new and existing drug treatments, and to an understanding of the underlying anti-cancer drug effects. There are approximately 141 biological human pathway resources as of Jan 2012 according to the Pathguide database. However, most currently available resources do not contain disease, drug or organ specificity information such as disease-pathway, drug-pathway, and organ-pathway associations. Systematically integrating pathway, disease, drug and organ specificity is becoming increasingly crucial for understanding the interrelationships between signaling, metabolic and regulatory pathways, drug action, disease susceptibility, and organ specificity from high-throughput omics data (genomics, transcriptomics, proteomics and metabolomics).
We designed the Integrated Pathway Analysis Database for Systematic Enrichment Analysis (IPAD, http://bioinfo.hsc.unt.edu/ipad), defining inter-association between pathway, disease, drug and organ specificity, based on six criteria: 1) comprehensive pathway coverage; 2) gene/protein to pathway/disease/drug/organ association; 3) inter-association between pathway, disease, drug, and organ; 4) multiple and quantitative measurement of enrichment and inter-association; 5) assessment of enrichment and inter-association analysis in the context of the existing biological knowledge and a "gold standard" constructed from reputable and reliable sources; and 6) cross-linking of multiple available data sources. IPAD is a comprehensive database covering about 22,498 genes, 25,469 proteins, 1956 pathways, 6704 diseases, 5615 drugs, and 52 organs integrated from databases including BioCarta, KEGG, NCI-Nature curated, Reactome, CTD, PharmGKB, DrugBank, and HOMER. The database has a web-based user interface that allows users to perform enrichment analysis from genes/proteins/molecules and inter-association analysis from a pathway, disease, drug, or organ. Moreover, the quality of the database was validated in the context of the existing biological knowledge and a "gold standard" constructed from reputable and reliable sources. Two case studies were also presented to demonstrate: 1) self-validation of enrichment analysis and inter-association analysis on brain-specific markers, and 2) identification of previously undiscovered components by the enrichment analysis from a prostate cancer study.
IPAD is a new resource for analyzing, identifying, and validating pathway, disease, drug, organ specificity and their inter-associations. The statistical method we developed for enrichment and similarity measurement and the two criteria we described for setting the threshold parameters can be extended to other enrichment applications. Enriched pathways, diseases, drugs, organs and their inter-associations can be searched, displayed, and downloaded from our online user interface. The current IPAD database can help users address a wide range of biological pathway related, disease susceptibility related, drug target related and organ specificity related questions in human disease studies.
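As an illustration of what an enrichment test of this kind computes, the following is a minimal sketch and not IPAD's own statistical method, which the abstract does not specify. It assumes the common over-representation approach: an upper-tail hypergeometric p-value for the overlap between a query gene list and one pathway (or disease, drug, or organ) gene set, followed by a Benjamini-Hochberg adjustment to obtain q-values across many sets. The function names and the toy numbers are hypothetical.

```python
from math import comb


def enrichment_p_value(universe_size, set_size, query_size, overlap):
    """Upper-tail hypergeometric probability P(X >= overlap).

    universe_size: number of genes in the background (assumed known)
    set_size:      number of background genes annotated to the pathway
    query_size:    number of genes in the user's query list
    overlap:       number of query genes that are annotated to the pathway
    """
    max_overlap = min(set_size, query_size)
    total = comb(universe_size, query_size)
    tail = sum(
        comb(set_size, k) * comb(universe_size - set_size, query_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted q-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values


if __name__ == "__main__":
    # Toy example (hypothetical numbers): 10 background genes, a pathway
    # with 4 genes, a query of 3 genes, 2 of which fall in the pathway.
    p = enrichment_p_value(universe_size=10, set_size=4, query_size=3, overlap=2)
    print(f"p = {p:.4f}")
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.5]))
```

In practice the choice of background (for example, all genes covered by the database versus all genes measured in the experiment) changes the p-values noticeably, so it should be stated alongside any reported enrichment.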

Available from: Renee Drabier
  • Source
    • "The inferred regulatory network for IL15 is shown in Additional file 7: Figure S2. Using the Integrated Pathway Analysis Database (IPAD) for Systematic Enrichment Analysis [53], we found genes involved in the immune system (pathway ID: 168256) (q = 0.013) and metabolism (pathway ID: 1430728) (q = 0.066) were enriched among the DEGs. IPAD-based disease-associated gene enrichment analysis suggested genes involved in several diseases were overrepresented among DEGs (q < 1E-05), including taste disorder, eating disorder, anorexia, hyperphagia, obesity, insulin resistance, mitochondrial diseases and lymphocytosis."
    ABSTRACT: Background: Improving feed efficiency (FE) of pigs by genetic selection is of economic and environmental significance. An increasingly accepted measure of feed efficiency is residual feed intake (RFI). Currently, the molecular mechanisms underlying RFI are largely unknown. Additionally, to incorporate RFI into animal breeding programs, feed intake must be recorded on individual pigs, which is costly and time-consuming. Thus, convenient and predictive biomarkers for RFI that can be measured at an early age are greatly desired. In this study, we aimed to explore whether differences exist in the global gene expression profiles of peripheral blood of 35 to 42 day-old pigs with extremely low (more efficient) and high RFI (less efficient) values from two lines that were divergently selected for RFI during the grow-finish phase, to use such information to explore the potential molecular basis of RFI differences, and to initiate development of predictive biomarkers for RFI. Results: We identified 1972 differentially expressed genes (DEGs) (q ≤ 0.15) between the low (n = 15) and high (n = 16) RFI groups of animals by using RNA sequencing technology. We validated 24 of 37 selected DEGs by reverse transcription-quantitative PCR (RT-qPCR) in a joint analysis of 24 (12 per line) of the 31 samples already used for RNA-seq plus 24 (12 per line) novel samples from the same contemporary group of pigs. Using an analysis of the 24 novel samples alone, only nine of the 37 selected DEGs were validated. Genes involved in small molecule biosynthetic process, antigen processing and presentation of peptide antigen via major histocompatibility complex (MHC) class I, and steroid biosynthetic process were overrepresented among DEGs that had higher expression in the low versus high RFI animals. Genes known to function in the proteasome complex or mitochondrion were also significantly enriched among genes with higher expression in the low versus high RFI animals. Alternatively, genes involved in signal transduction, bone mineralization and regulation of phosphorylation were overrepresented among DEGs with lower expression in the low versus high RFI animals. The DEGs significantly overlapped with genes associated with disease, including hyperphagia, eating disorders and mitochondrial diseases (q < 1E-05). A weighted gene co-expression network analysis (WGCNA) identified four co-expression modules that were differentially expressed between the low and high RFI groups. Genes involved in lipid metabolism, regulation of bone mineralization, cellular immunity and response to stimulus were overrepresented within the two modules that were most significantly differentially expressed between the low and high RFI groups. We also found five of the DEGs and one of the co-expression modules were significantly associated with the RFI phenotype of individual animals (q < 0.05). Conclusions: The post-weaning blood transcriptome was clearly different between the low and high RFI groups. The identified DEGs suggested potential differences in mitochondrial and proteasomal activities, small molecule biosynthetic process, and signal transduction between the two RFI groups and provided potential new insights into the molecular basis of RFI in pigs, although the observed relationship between the post-weaning blood gene expression and RFI phenotype measured during the grow-finish phase was not strong. DEGs and representative genes in co-expression modules that were associated with RFI phenotype provide a preliminary list for developing predictive biomarkers for RFI in pigs.
    Preview · Article · Dec 2016 · BMC Genomics
  • Source
    • "Pathway analysis is performed using the following databases: Integrated Pathway Analysis Database (IPAD) http://bioinfo.hsc.unt.edu/ipad/ [20]."
    ABSTRACT: In the past several years, there has been increasing interest and enthusiasm in molecular biomarkers as tools for early detection of cancer. Liquid chromatography tandem mass spectrometry (LC/MS/MS) based plasma proteomics profiling is a promising technology platform for studying candidate protein biomarkers for early detection of cancer. Factors such as inherent variability, protein detectability limitations, and peptide discovery biases among LC/MS/MS platforms have made the classification and prediction of proteomics profiles challenging. Developing neural network-based proteomics data analysis methods to identify multi-protein biomarker panels for breast cancer diagnosis provides hope for improving both the sensitivity and the specificity of candidate cancer biomarkers for early detection. In our previous method, we developed a Feed Forward Neural Network (FFNN)-based method to build a classifier for plasma samples of breast cancer and then applied the classifier to predict a blind dataset of breast cancer. However, the optimal combination C* in our previous method was actually determined by applying the trained FFNN to the testing set with the combination. Therefore, in this paper, we applied a three-way data split (training, validation, and testing) to the Feed Forward Neural Network. We found that the prediction performance of the FFNN model based on the three-way data split outperforms our previous method, improving from (AUC = 0.8706, precision = 82.5%, accuracy = 82.5%, sensitivity = 82.5%, specificity = 82.5% for the testing set) to (AUC = 0.895, precision = 86.84%, accuracy = 85%, sensitivity = 82.5%, specificity = 87.5% for the testing set). Further pathway analysis showed that the top three five-marker panels are associated with complement and coagulation cascades, signaling, activation, and hemostasis, which is consistent with previous findings. We believe the new approach is a better solution for multi-biomarker panel discovery and can be applied to other clinical proteomics studies.
    Full-text · Article · Dec 2013 · BMC proceedings
  • Source
    • "The Integrated Pathway Analysis Database (IPAD) (http://bioinfo.hsc.unt.edu/ipad/) [17] is used for pathway analysis."
    ABSTRACT: Detecting breast cancer at early stages can be challenging. Traditional mammography and tissue microarrays, which have been studied for early breast cancer detection and prediction, have many drawbacks. Therefore, there is a need for more reliable diagnostic tools for early detection of breast cancer. In this paper, we present a five-marker panel approach based on SVM for early detection of breast cancer in peripheral blood and show how to use SVM to model the classification and prediction problem of early detection of breast cancer in peripheral blood. We found that the five-marker panel can improve the prediction performance (area under curve) in the testing data set from 0.5826 to 0.7879. Further pathway analysis showed that the top four five-marker panels are associated with signaling, steroid hormones, metabolism, immune system, and hemostasis, which is consistent with previous findings. Our prediction model can serve as a general model for multibiomarker panel discovery in early detection of other cancers.
    Full-text · Article · Nov 2013