Mining Alzheimer disease relevant proteins from integrated protein interactome data.

Indiana University School of Informatics, Purdue University School of Science, Dept. of Computer and Information Science Indianapolis, IN 46202, USA.
Pacific Symposium on Biocomputing 02/2006; DOI: 10.1142/9789812701626_0034
Source: PubMed

ABSTRACT Huge unrealized post-genome opportunities remain in the understanding of detailed molecular mechanisms for Alzheimer Disease (AD). In this work, we developed a computational method to rank-order AD-related proteins, based on an initial list of AD-related genes and public human protein interaction data. In this method, we first collected an initial seed list of 65 AD-related genes from the OMIM database and mapped them to 70 AD seed proteins. We then expanded the seed proteins to an enriched AD set of 765 proteins using protein interactions from the Online Predicted Human Interaction Database (OPHID). We showed that the expanded AD-related proteins form a highly connected and statistically significant protein interaction sub-network. We further analyzed the sub-network to develop an algorithm that can be used to automatically score and rank-order each protein for its biological relevance to AD pathway(s). Our results show that functionally relevant AD proteins were consistently ranked at the top: among the top 20 of 765 expanded AD proteins, 19 proteins are confirmed to belong to the original 70 AD seed protein set. Our method represents a novel use of protein interaction network data for Alzheimer disease studies and may be generalized for other disease areas in the future.
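
The seed-expansion and ranking steps described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the scoring formula, so the score used here (the number of interaction partners a protein has inside the expanded set) is only an assumed stand-in, not the authors' algorithm. The protein identifiers and edges are hypothetical toy data, and the function names are invented for illustration.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected adjacency map from (protein_a, protein_b) pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-interactions
            graph[a].add(b)
            graph[b].add(a)
    return graph


def expand_seeds(graph, seeds):
    """Return the seeds plus every protein that interacts directly with a seed."""
    expanded = set(seeds)
    for seed in seeds:
        expanded |= graph.get(seed, set())
    return expanded


def rank_by_subnetwork_degree(graph, proteins):
    """Rank proteins by their number of partners inside `proteins`.

    ASSUMPTION: this degree-based score is a placeholder; the paper's
    actual scoring function is not stated in the abstract.
    """
    scores = {p: len(graph.get(p, set()) & proteins) for p in proteins}
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical toy network: S* are seeds, P* and Q* are other proteins.
edges = [("S1", "S2"), ("S1", "P1"), ("S2", "P1"),
         ("S2", "P2"), ("P2", "Q1"), ("Q1", "Q2")]
seeds = {"S1", "S2"}

graph = build_graph(edges)
expanded = expand_seeds(graph, seeds)
ranking = rank_by_subnetwork_degree(graph, expanded)

for protein, score in ranking:
    print(protein, score)
```

In this toy case Q1 and Q2 are excluded because neither interacts directly with a seed, which mirrors the one-step expansion from seed proteins to an enriched set.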

  • Source
    ABSTRACT: We developed a new computational technique called Step-Level Differential Response (SLDR) to identify genetic regulatory relationships. Our technique takes advantage of functional genomics data for the same species under different perturbation conditions and is therefore complementary to current popular computational techniques. In particular, it can identify "rare" activation/inhibition relationship events that can be difficult to find in experimental results. In SLDR, we model each candidate target gene as being controlled by N binary-state regulators that lead to ≤2^N observable states ("step-levels") for the target. We applied SLDR to the study of the GEO microarray data set GSE25644, which consists of 158 different mutant S. cerevisiae gene expression profiles. For each target gene t, we first clustered ordered samples into various clusters, each approximating an observable step-level of t, to screen out the "de-centric" target. Then, we ordered each gene x as a candidate regulator and aligned t to x for the purpose of examining the step-level correlations between the low expression set of x (Ro) and the high expression set of x (Rh) from the regulator x to t, by finding max f(t, x): |Ro-Rh| over all candidate x in the genome for each t. We thereby obtained activation and inhibition events from different combinations of Ro and Rh. Furthermore, we developed criteria for filtering out less-confident regulators, estimated the number of regulators for each target t, and evaluated the identified top-ranking regulator-target relationships. Our results can be cross-validated with the Yeast Fitness database. SLDR is also computationally efficient, with O(N^2) complexity. In summary, we believe SLDR can be applied to the mining of functional genomics big data for future network biology and network medicine applications.
    BMC Bioinformatics 10/2014; 15(S11):S1. · 2.67 Impact Factor
  • Source
    ABSTRACT: The identification of candidate molecular entities involved in a specific disease has been a primary focus of cancer biomarker discovery. Prioritizing proteins from a disease-specific protein-protein interaction (PPI) network has become an efficient computational strategy for cancer biomarker discovery. Although some successful methods, such as the random walk ranking (RWR) algorithm, can exploit global network topology to prioritize proteins, this network-based computational strategy still needs more comprehensive prior knowledge, such as genome-wide association studies (GWAS), to improve its discovery capability. In this paper, we first analyzed genome-wide association loci for human diseases and built disease association networks (DANs), in which two diseases are associated if they share common genetic variants. Then we assigned each node in a human PPI network a disease-specific weight, based on knowledge from the DANs and text mining. Finally, we presented a seed-weighted random walk ranking (SW-RWR) method to prioritize biomarkers in the global human PPI network. We used a lung cancer case study to show that our ranking strategy has better accuracy and sensitivity in discovering potential clinically useful biomarkers than a similar network-based ranking method. This result suggests that close association among different diseases could play an important role in biomarker discovery.
    Proceedings of the International Symposium on Biocomputing; 02/2010
  • Source
    ABSTRACT: We developed a new computational technique to identify de-centric genetic regulatory relationship candidates. Our technique takes advantage of functional genomics data for the same species under different perturbation conditions, therefore making it complementary to current computational techniques including database search, clustering of gene expression profiles, motif matching, structural modeling, and network effect simulation methods. It is fast and addresses the need of biologists to determine activation/inhibition relationship details often missing in synthetic lethality or ChIP-seq experiments. We used GEO microarray data set GSE25644 with 158 different mutant genes in S. cerevisiae. We screened out 83 targets with 610 activation pairs and 93 targets with 494 inhibition pairs. In the Yeast Fitness database, 33 targets (40%) with 126 activation pairs and 31 targets (33%) with 97 inhibition pairs were identified. To be identified further are 50 targets with 484 activation pairs and 62 targets with 397 inhibition pairs. The aggregation test confirmed that all discovered de-centric regulatory relationships are significantly different from random discovery at a p-value=0.002; therefore, this method is highly complementary to others that tend to discover hub-related regulatory relationships. We also developed criteria for rejecting a gene x as a candidate regulator and for assessing the ranking of the regulator-target relationships identified. The top 10 highly suspected regulators determined by our criteria were found to be significant, pending future experimental verification.
    International Symposium on Bioinformatics Research and Applications, Zhangjiajie, China; 06/2014
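
The low-versus-high comparison mentioned in the SLDR abstracts above (|Ro-Rh| for a candidate regulator x and target t) can be approximated by the following Python sketch. This is one possible reading, not the published method: it assumes Ro and Rh are the mean expression of t over the samples where x is lowest and highest, splits the samples at the median rank of x, and omits the clustering into step-levels and the filtering criteria entirely. All names and the toy values are hypothetical.

```python
from statistics import mean


def step_difference(x_values, t_values):
    """Mean target expression in high-x samples minus that in low-x samples.

    ASSUMPTION: a median-rank split of the samples stands in for the
    low (Ro) and high (Rh) expression sets of the candidate regulator.
    """
    if len(x_values) != len(t_values):
        raise ValueError("x and t must cover the same samples")
    order = sorted(range(len(x_values)), key=lambda i: x_values[i])
    half = len(order) // 2
    if half == 0:
        raise ValueError("need at least two samples")
    low = [t_values[i] for i in order[:half]]
    high = [t_values[i] for i in order[-half:]]
    return mean(high) - mean(low)


def best_regulator(target, candidates):
    """Pick the candidate with the largest absolute step difference."""
    scored = {name: step_difference(x, target) for name, x in candidates.items()}
    name = max(scored, key=lambda n: abs(scored[n]))
    # A positive difference is read as activation, a negative one as inhibition.
    kind = "activation" if scored[name] > 0 else "inhibition"
    return name, scored[name], kind


# Hypothetical expression values over four samples.
target = [1.0, 1.0, 5.0, 5.0]
candidates = {
    "g1": [0.1, 0.2, 0.9, 1.0],  # rises together with the target
    "g2": [0.5, 0.1, 0.6, 0.2],  # unrelated to the target
}

print(best_regulator(target, candidates))
```

Scoring every candidate against every target in this way costs one pass per gene pair, which is consistent with the quadratic complexity quoted in the abstract, although the sort inside each comparison adds a factor in the number of samples.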
