Mining Alzheimer Disease Relevant Proteins from Integrated Protein Interactome Data

Indiana University School of Informatics, Purdue University School of Science, Dept. of Computer and Information Science, Indianapolis, IN 46202, USA.
Pacific Symposium on Biocomputing 02/2006; 11:367-78. DOI: 10.1142/9789812701626_0034
Source: PubMed


Substantial post-genome opportunities remain for understanding the detailed molecular mechanisms of Alzheimer Disease (AD). In this work, we developed a computational method to rank-order AD-related proteins, based on an initial list of AD-related genes and public human protein interaction data. We first collected an initial seed list of 65 AD-related genes from the OMIM database and mapped them to 70 AD seed proteins. We then expanded the seed proteins to an enriched AD set of 765 proteins using protein interactions from the Online Predicted Human Interaction Database (OPHID). We showed that the expanded AD-related proteins form a highly connected and statistically significant protein interaction sub-network. We further analyzed the sub-network to develop an algorithm that automatically scores and rank-orders each protein for its biological relevance to AD pathway(s). Our results show that functionally relevant AD proteins were consistently ranked at the top: among the top 20 of the 765 expanded AD proteins, 19 belong to the original set of 70 AD seed proteins. Our method represents a novel use of protein interaction network data for Alzheimer disease studies and may be generalized to other disease areas in the future.
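The seed-expansion step described in the abstract can be illustrated with a small sketch. Everything here is an assumption made for illustration: the function names `expand_seeds` and `rank_by_degree`, the single-letter protein identifiers, and the choice to keep only interactions that touch at least one seed. The degree-based ordering is a simple stand-in and is not the scoring algorithm of the paper, which the abstract does not specify.

```python
from collections import Counter
from typing import FrozenSet, Iterable, List, Set, Tuple

Edge = Tuple[str, str]


def expand_seeds(
    seeds: Iterable[str], interactions: Iterable[Edge]
) -> Tuple[Set[str], Set[FrozenSet[str]]]:
    """Return the seeds plus their direct interaction partners.

    Assumption: an interaction is kept only if at least one of its
    endpoints is a seed protein (one-step, nearest-neighbor expansion).
    """
    seed_set = set(seeds)
    proteins = set(seed_set)
    edges: Set[FrozenSet[str]] = set()
    for a, b in interactions:
        if a in seed_set or b in seed_set:
            proteins.update((a, b))
            edges.add(frozenset((a, b)))  # undirected, de-duplicated
    return proteins, edges


def rank_by_degree(proteins: Set[str], edges: Set[FrozenSet[str]]) -> List[str]:
    """Placeholder ranking: order proteins by degree in the sub-network.

    This is NOT the paper's scoring function; it only shows where a
    sub-network-based score would plug in.
    """
    degree: Counter = Counter()
    for edge in edges:
        for protein in edge:
            degree[protein] += 1
    # Highest degree first; ties broken alphabetically for determinism.
    return sorted(proteins, key=lambda p: (-degree[p], p))


# Hypothetical toy data: two seeds and six interactions.
toy_seeds = {"A", "B"}
toy_interactions = [
    ("A", "B"), ("A", "C"), ("B", "C"),
    ("A", "F"), ("C", "D"), ("D", "E"),
]
expanded, sub_edges = expand_seeds(toy_seeds, toy_interactions)
ranking = rank_by_degree(expanded, sub_edges)
print(sorted(expanded), ranking)
```

In this toy network, proteins D and E are excluded because neither interacts directly with a seed, which mirrors how one-step expansion keeps the enriched set close to the original seed list.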


Available from: Jake Chen
  • "We calculated the distribution of biomarkers that surrounds the known disease genes. The sub-network was expanded by the nearest neighbor expansion (NNE) method (Chen et al., 2006). We also calculated the coverage of potential leukemia biomarkers, listed in Table 2 and Table 3. Fig. 3 is the Venn diagram of the overlap among the genes in leukemia disease gene expansion network, the genes in the top 5 related disease genes expansion network and the experiential leukemia biomarker set."
    ABSTRACT: A central focus of clinical proteomics for cancer is to identify protein biomarkers with diagnostic and therapeutic application potential. Network-based analyses have been used in computational disease-related gene prioritisation for several years. The Random Walk Ranking (RWR) algorithm has been successfully applied to prioritising disease-related gene candidates by exploiting global network topology in a Protein-Protein Interaction (PPI) network. Increasing the specificity and sensitivity of biomarkers may require consideration of similar or closely-related disease phenotypes and molecular pathological mechanisms shared across different disease phenotypes. In this paper, we propose a method called Seed-Weighted Random Walk Ranking (SW-RWR) for prioritizing cancer biomarker candidates. This method uses the information of cancer phenotype association to assign to each gene a disease-specific, weighted value to guide the RWR algorithm in a global human PPI network. In a case study of prioritizing leukaemia biomarkers, SW-RWR outperformed a typical local network-based analysis in coverage and also showed better accuracy and sensitivity than the original RWR method (a global network-based analysis). Our results suggest that the tight correlation among different cancer phenotypes could play an important role in cancer biomarker discovery.
    International Journal of Data Mining and Bioinformatics 02/2014; 9(2):135-148. DOI:10.1504/IJDMB.2014.059064 · 0.50 Impact Factor
  • "Several investigators have examined network-based methods for gene prioritization. One of the earliest applications of network-based gene prioritization was to rank each protein in the Online Predicted Human Interaction Database (OPHID) according to the protein's association with Alzheimer's disease [7]. Any gene which directly interacted with a known gene on the PPIN was considered to be a candidate gene – this is known as a “nearest neighbor” based approach."
    ABSTRACT: Candidate gene prioritization aims to identify promising new genes associated with a disease or a biological process from a larger set of candidate genes. In recent years, network-based methods - which utilize a knowledge network derived from biological knowledge - have been utilized for gene prioritization. Biological knowledge can be encoded either through the network's links or nodes. Current network-based methods can only encode knowledge through links. This paper describes a new network-based method that can encode knowledge in links as well as in nodes. We developed a new network inference algorithm called the Knowledge Network Gene Prioritization (KNGP) algorithm which can incorporate both link and node knowledge. The performance of the KNGP algorithm was evaluated on both synthetic networks and on networks incorporating biological knowledge. The results showed that the combination of link knowledge and node knowledge provided a significant benefit across 19 experimental diseases over using link knowledge alone or node knowledge alone. The KNGP algorithm provides an advance over current network-based algorithms, because the algorithm can encode both link and node knowledge. We hope the algorithm will aid researchers with gene prioritization.
    PLoS ONE 11/2013; 8(11):e79564. DOI:10.1371/journal.pone.0079564 · 3.23 Impact Factor
  • "These interactions are assigned a high confidence score of 0.9 (Chen, 2006)."
    ABSTRACT: The purpose of this study was to construct a protein-protein interaction (PPI) network related to oral squamous cell carcinoma (OSCC). Each protein was ranked and those most associated with OSCC were mined within the network. First, OSCC-related genes were retrieved from the Online Mendelian Inheritance in Man (OMIM) database. They were then mapped to their protein identifiers to build a seed set of proteins. The seed proteins were expanded using the nearest neighbor expansion method to construct a PPI network through the Online Predicted Human Interaction Database (OPHID). The network was verified to be statistically significant, a score was computed for each protein, and the OSCC-related proteins were then ranked. The 38 OSCC-related seed proteins were expanded to 750 protein pairs. A protein-protein interaction network was then constructed and the 30 top-ranked proteins listed. The four highest-scoring seed proteins were SMAD4, CTNNB1, HRAS and NOTCH1, and four non-seed proteins (P53, EP300, SMAD3 and SRC) were mined using the nearest neighbor expansion method. The methods shown here may facilitate the discovery of important OSCC proteins and guide medical researchers in further pertinent studies.
    Asian Pacific journal of cancer prevention: APJCP 08/2013; 14(8):4621-4625. DOI:10.7314/APJCP.2013.14.8.4621 · 2.51 Impact Factor
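
The first citing abstract above describes guiding a random walk over a PPI network with per-gene seed weights. The sketch below shows a generic random walk with restart in which the restart distribution is proportional to such weights. It is an illustration under stated assumptions and not the published SW-RWR implementation: the function name, the restart probability of 0.3, the toy path network, and the weights themselves are all hypothetical, and how the weights are derived from phenotype associations is not modeled.

```python
from typing import Dict, Iterable, Tuple


def random_walk_with_restart(
    edges: Iterable[Tuple[str, str]],
    seed_weights: Dict[str, float],
    restart: float = 0.3,   # assumed value, not taken from the cited work
    tol: float = 1e-10,
    max_iter: int = 1000,
) -> Dict[str, float]:
    """Iterate p <- (1 - restart) * W p + restart * p0 until convergence.

    W is the column-normalized adjacency matrix of the undirected network
    and p0 is the seed-weight vector normalized to sum to 1.
    """
    neighbors: Dict[str, set] = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    # Only seeds that appear in the network contribute to p0.
    total = sum(w for n, w in seed_weights.items() if n in neighbors)
    if total <= 0:
        raise ValueError("no positively weighted seed is present in the network")
    p0 = {n: seed_weights.get(n, 0.0) / total for n in neighbors}

    p = dict(p0)
    for _ in range(max_iter):
        nxt = {}
        for n in neighbors:
            # Each neighbor m passes an equal share of its mass to n.
            inflow = sum(p[m] / len(neighbors[m]) for m in neighbors[n])
            nxt[n] = (1 - restart) * inflow + restart * p0[n]
        delta = sum(abs(nxt[n] - p[n]) for n in neighbors)
        p = nxt
        if delta < tol:
            break
    return p


# Hypothetical path network A - B - C - D with two unequally weighted seeds.
toy_edges = [("A", "B"), ("B", "C"), ("C", "D")]
scores = random_walk_with_restart(toy_edges, {"A": 2.0, "D": 1.0})
rwr_ranking = sorted(scores, key=scores.get, reverse=True)
print(rwr_ranking)
```

Unlike one-step neighbor expansion, this kind of walk assigns a score to every protein in the network, so candidates that are several interactions away from a seed can still be ranked.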