A human functional protein interaction network and its application to cancer data analysis

Ontario Institute for Cancer Research, MaRS Centre, South Tower, 101 College Street, Suite 800, Toronto, ON M5G 0A3, Canada.
Genome Biology (Impact Factor: 10.81). 05/2010; 11(5):R53. DOI: 10.1186/gb-2010-11-5-r53
Source: PubMed


One challenge facing biologists is to tease out useful information from massive data sets for further analysis. A pathway-based analysis may shed light by projecting candidate genes onto protein functional relationship networks. We are building such a pathway-based analysis system.
We have constructed a protein functional interaction network by extending curated pathways with non-curated sources of information, including protein-protein interactions, gene coexpression, protein domain interaction, Gene Ontology (GO) annotations and text-mined protein interactions, which cover close to 50% of the human proteome. By applying this network to two glioblastoma multiforme (GBM) data sets and projecting cancer candidate genes onto the network, we found that the majority of GBM candidate genes form a cluster and are closer than expected by chance, and the majority of GBM samples have sequence-altered genes in two network modules, one mainly comprising genes whose products are localized in the cytoplasm and plasma membrane, and another comprising gene products in the nucleus. Both modules are highly enriched in known oncogenes, tumor suppressors and genes involved in signal transduction. Similar network patterns were also found in breast, colorectal and pancreatic cancers.
We have built a highly reliable functional interaction network upon expert-curated pathways and applied this network to the analysis of two genome-wide GBM and several other cancer data sets. The network patterns revealed from our results suggest common mechanisms in cancer biology. Our system should provide a foundation for a network or pathway-based analysis platform for cancer and other diseases.
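
The claim that candidate genes are "closer than expected by chance" can be illustrated with a permutation test on network distances. The sketch below is only one plausible reading: the abstract does not state which distance statistic or null model was used, so the mean pairwise shortest-path length, the uniform random sampling of gene sets, the function names and the toy chain network are all assumptions made for illustration.

```python
import random
from collections import deque
from itertools import combinations


def build_graph(edges):
    """Build an undirected adjacency map from (gene_a, gene_b) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def shortest_path_length(graph, source, target):
    """Hop count between two nodes via breadth-first search; None if unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in graph[node]:
            if neighbor == target:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


def mean_pairwise_distance(graph, genes):
    """Average shortest-path length over all reachable pairs (unreachable pairs are skipped)."""
    lengths = [shortest_path_length(graph, a, b) for a, b in combinations(genes, 2)]
    lengths = [d for d in lengths if d is not None]
    return sum(lengths) / len(lengths) if lengths else float("inf")


def empirical_p_value(graph, candidates, n_permutations=1000, seed=0):
    """Fraction of random gene sets that are at least as close as the candidates."""
    rng = random.Random(seed)
    observed = mean_pairwise_distance(graph, candidates)
    nodes = sorted(graph)
    hits = 0
    for _ in range(n_permutations):
        sample = rng.sample(nodes, len(candidates))
        if mean_pairwise_distance(graph, sample) <= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (n_permutations + 1)


# Hypothetical toy network: a chain g0 - g1 - ... - g9 (not real gene names).
toy_graph = build_graph([(f"g{i}", f"g{i + 1}") for i in range(9)])
observed, p_value = empirical_p_value(toy_graph, ["g0", "g1", "g2"], n_permutations=200)
print(f"mean pairwise distance = {observed:.3f}, empirical p = {p_value:.3f}")
```

A real analysis would load the functional interaction network in place of the toy chain, and might match random gene sets on node degree rather than sampling uniformly.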

Available from: PubMed Central · License: CC BY
    • "PARADIGM (Vaske et al., 2010) is a method for pathway analysis of heterogenic data, which goes a step further by incorporating information on the type of interaction between elements of a pathway. It enables the analysis of low frequency variations in cross-platform datasets thereby supporting the analysis in situations where the disease in each person, or subgroup of people, is caused by different types of variations (for additional methods for integrative network analysis, see Cerami et al., 2010; Wu et al., 2010; Ciriello et al., 2012). Creating a tool that incorporates biological knowledge in the analysis process is highly dependent on the availability of reliable, maintained, and well-structured databases. "
    ABSTRACT: The development and progression of cancer, a collection of diseases with complex genetic architectures, is facilitated by the interplay of multiple etiological factors. This complexity challenges the traditional single-platform study design and calls for an integrated approach to data analysis. However, integration of heterogeneous measurements of biological variation is a non-trivial exercise due to the diversity of the human genome and the variety of output data formats and genome coverage obtained from the commonly used molecular platforms. This review article will provide an introduction to integration strategies used for analyzing genetic risk factors for cancer. We critically examine the ability of these strategies to handle the complexity of the human genome and also accommodate information about the biological and functional interactions between the elements that have been measured – making the assessment of disease risk against a composite genomic factor possible. The focus of this review is to provide an overview and introduction to the main strategies and to discuss where there is a need for further development.
    Preview · Article · Feb 2016 · Frontiers in Genetics
    • "We used Cytoscape v_2.8.1 and Cytoscape v_3.1.1 (Shannon et al. 2003) to perform network parametric analysis and to import several interaction types of networks via online data resources (Csermely et al. 2013). ReactomeFIPlugin (Wu et al., 2010) was our main software application for modularity measurements. Most of the observed natural, biological and social networks follow power law degree distributions P(k) ~ k^(-γ), with P(k) being the probability for a node to have k edges. The Barabasi-Albert algorithm is the first proposed generative model of scale-free networks, acting by the rule of preferential attachment: each new vertex has a higher probability to be linked to a vertex that already has a large number of connections (Barabasi and Albert, 1999). Due to the importance of neural networks and brain structural and functional mapping, investigation of multi-scale properties of brain graphs and imaging has attracted much attention. "
    ABSTRACT: The definition of general topological principles allowing for graph characterization is an important pre-requisite for investigating structure-function relationships in biological networks. Here we approached the problem by means of an explorative, data-driven strategy, building upon a size-balanced data set made of around 200 distinct biological networks from seven functional classes and simulated networks coming from three mathematical graph models. A clear link between topological structure and biological function did emerge in terms of class membership prediction (Average 67% of correct predictions, p<0.0001) with a varying degree of 'peculiarity' across classes going from a very low (25%) recognition efficiency for neural and brain networks to the extremely high (80%) peculiarity of amino acid-amino acid interaction (AAI) networks. We recognized four main dimensions (principal components) as main organization principles of biological networks. These components allowed for an efficient description of network architectures and for the identification of 'not-physiological' (in this case cancer metabolic networks acting as test set) wiring patterns. We highlighted as well the need of developing new theoretical generative models for biological networks overcoming the limitations of present mathematical graph idealizations.
    Full-text · Article · Feb 2016 · Bio Systems
    • "In addition to sequence similarity, the cross and overlap of networks contribute to screening for significant genes involved in the target phenotype (Luo and Liang 2014). Any gene or protein must participate in a biological system network to fulfill its specific role (Wu et al. 2010). Therefore, an algorithm based on network interactions is essential to determine the molecular mechanisms behind the clustered phenotypes and predict similar functional genes. "
    ABSTRACT: Studies of protein phenotypes represent a central challenge of modern genetics in the post-genome era because effective and accurate investigation of protein phenotypes is one of the most critical procedures to identify functional biological processes in microscale, which involves the analysis of multifactorial traits and has greatly contributed to the development of modern biology in the post genome era. Therefore, we have developed a novel computational method that identifies novel proteins associated with certain phenotypes in yeast based on the protein-protein interaction network. Unlike some existing network-based computational methods that identify the phenotype of a query protein based on its direct neighbors in the local network, the proposed method identifies novel candidate proteins for a certain phenotype by considering all annotated proteins with this phenotype on the global network using a shortest path (SP) algorithm. The identified proteins are further filtered using both a permutation test and their interactions and sequence similarities to annotated proteins. We compared our method with another widely used method called random walk with restart (RWR). The biological functions of proteins for each phenotype identified by our SP method and the RWR method were analyzed and compared. The results confirmed a large proportion of our novel protein phenotype annotation, and the RWR method showed a higher false positive rate than the SP method. Our method is equally effective for the prediction of proteins involved in all the eleven clustered yeast phenotypes with a quite low false positive rate. Considering the universality and generalizability of our supporting materials and computing strategies, our method can further be applied to study other organisms and the new functions we predicted can provide pertinent instructions for the further experimental verifications.
    Full-text · Article · Jan 2016 · Molecular Genetics and Genomics