A human functional protein interaction network and its application to cancer data analysis

Ontario Institute for Cancer Research, MaRS Centre, South Tower, 101 College Street, Suite 800, Toronto, ON M5G 0A3, Canada.
Genome Biology (Impact Factor: 10.47). 05/2010; 11(5):R53. DOI: 10.1186/gb-2010-11-5-r53
Source: PubMed

ABSTRACT One challenge facing biologists is to tease out useful information from massive data sets for further analysis. A pathway-based analysis may shed light by projecting candidate genes onto protein functional relationship networks. We are building such a pathway-based analysis system.
We have constructed a protein functional interaction network by extending curated pathways with non-curated sources of information, including protein-protein interactions, gene coexpression, protein domain interaction, Gene Ontology (GO) annotations and text-mined protein interactions, which cover close to 50% of the human proteome. By applying this network to two glioblastoma multiforme (GBM) data sets and projecting cancer candidate genes onto the network, we found that the majority of GBM candidate genes form a cluster and are closer than expected by chance, and the majority of GBM samples have sequence-altered genes in two network modules, one mainly comprising genes whose products are localized in the cytoplasm and plasma membrane, and another comprising gene products in the nucleus. Both modules are highly enriched in known oncogenes, tumor suppressors and genes involved in signal transduction. Similar network patterns were also found in breast, colorectal and pancreatic cancers.
We have built a highly reliable functional interaction network upon expert-curated pathways and applied this network to the analysis of two genome-wide GBM and several other cancer data sets. The network patterns revealed from our results suggest common mechanisms in the cancer biology. Our system should provide a foundation for a network or pathway-based analysis platform for cancer and other diseases.
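The claim that candidate genes are "closer than expected by chance" can be illustrated with a permutation test on network distance. The sketch below is a minimal illustration under stated assumptions, not the authors' procedure: the abstract does not say which distance measure or null model was used. Mean pairwise shortest-path length, the uniform random-gene null model, the toy path-shaped network, and all function and gene names are hypothetical choices made for this example.

```python
import random
from collections import deque
from itertools import combinations


def bfs_distances(graph, source):
    """Shortest-path length (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def mean_pairwise_distance(graph, genes):
    """Average shortest-path length over all pairs in genes (graph must be connected)."""
    genes = list(genes)
    dist_from = {g: bfs_distances(graph, g) for g in genes}
    pairs = list(combinations(genes, 2))
    return sum(dist_from[a][b] for a, b in pairs) / len(pairs)


def permutation_p_value(graph, candidates, n_perm=1000, seed=0):
    """Empirical p-value: how often a random gene set is at least as compact."""
    rng = random.Random(seed)
    observed = mean_pairwise_distance(graph, candidates)
    nodes = sorted(graph)
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(nodes, len(candidates))
        if mean_pairwise_distance(graph, sample) <= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical toy network: 12 genes connected in a single chain.
names = [f"g{i}" for i in range(12)]
toy_network = {name: set() for name in names}
for a, b in zip(names, names[1:]):
    toy_network[a].add(b)
    toy_network[b].add(a)

# Hypothetical candidate set: three adjacent genes on the chain.
observed, p_value = permutation_p_value(toy_network, ["g0", "g1", "g2"])
print(f"observed mean distance = {observed:.3f}, empirical p = {p_value:.3f}")
```

A small p-value here means that few random gene sets of the same size are as tightly connected as the candidates. A real analysis would also need to handle disconnected components and probably control for node degree, which this sketch ignores.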

  • Source
    ABSTRACT: Neurodegenerative diseases in general and specifically late-onset Alzheimer's disease (LOAD) involve a genetically complex and largely obscure ensemble of causative and risk factors accompanied by complex feedback responses. The advent of "high-throughput" transcriptome investigation technologies such as microarray and deep sequencing is increasingly being combined with sophisticated statistical and bioinformatics analysis methods complemented by knowledge-based approaches such as Bayesian Networks or network and graph analyses. Together, such "integrative" studies are beginning to identify co-regulated gene networks linked with biological pathways and potentially modulating disease predisposition, outcome, and progression. Specifically, bioinformatics analyses of integrated microarray and genotyping data in cases and controls reveal changes in gene expression of both protein-coding and small and long regulatory RNAs; highlight relevant quantitative transcriptional differences between LOAD and non-demented control brains; and demonstrate reconfiguration of functionally meaningful molecular interaction structures in LOAD. These may be measured as changes in connectivity in "hub nodes" of relevant gene networks (Zhang et al., 2013). We illustrate here the open analytical questions in the transcriptome investigation of neurodegenerative disease studies, proposing "ad hoc" strategies for the evaluation of differential gene expression and hints for a simple analysis of the non-coding RNA (ncRNA) part of such datasets. We then survey the emerging role of long ncRNAs (lncRNAs) in the healthy and diseased brain transcriptome and describe the main current methods for computational modeling of gene networks. We propose accessible modular and pathway-oriented methods and guidelines for bioinformatics investigations of whole transcriptome next generation sequencing datasets. We finally present methods and databases for functional interpretations of lncRNAs and propose a simple heuristic approach to visualize and represent physical and functional interactions of the coding and non-coding components of the transcriptome. Integrating coding and ncRNA analyses into a single functional view is of utmost importance for current and future analyses of neurodegenerative transcriptomes.
    Frontiers in Cellular Neuroscience 03/2014; 8:89. DOI:10.3389/fncel.2014.00089 · 4.18 Impact Factor
  • Source
    ABSTRACT: Over recent years, with the advances in next-generation sequencing, a large number of cancer mutations have been identified and accumulated in public repositories. Coupled to this is our increased ability to generate detailed interactome maps that help to enrich our knowledge of the biological implications of cancer mutations. As a result, network analysis approaches have become an invaluable tool to predict and interpret mutations that are associated with tumour survival and progression. Our understanding of cancer mechanisms is further enhanced by mapping protein structure information to such networks. Here we review the current methodologies for annotating the functional impacts of cancer mutations, which range from analysis of protein structures to protein-protein interaction network studies.
    Seminars in Cancer Biology 05/2013; 23(4). DOI:10.1016/j.semcancer.2013.05.002 · 9.14 Impact Factor
  • Source
    ABSTRACT: Proteomics is a rapidly emerging frontier in post-genomics medicine and biology, but the quantitative analysis and validation of proteomic data are in need of further improvements. Before selecting potential candidate proteomic biomarkers, it is important to understand the broader context of how biological processes are regulated under different conditions or in different phenotypes. The enrichment of proteomic data consists of extracting as much biological meaning as possible from curated, pathway-based, functional protein interaction networks. Currently, most of the enrichment tools are intended for microarray data and require parametric data, whereas proteomic data are often nonparametric. In this study, we aimed to select a suite of interactive tools that can enrich proteomic results with a graphical overview. This facilitated diagnosis and interpretation prior to further analysis. From a list of proteins, a network was constructed using a map of the most severely disrupted biological process, and the disease entity was then identified on the basis of clinical data. Taken together, this graphical and interactive method ranks potential proteins via functional analysis in order to improve the choice of biomarkers for validation with the following advantages: 1) It adds neighbor proteins that are not selected by mass spectrometry analysis, but could in fact be key proteins; 2) pinpoints the biological process most often involved; and 3) predicts the most likely disease on the basis of clinical data.
    Omics: a journal of integrative biology 05/2013; DOI:10.1089/omi.2012.0084 · 2.73 Impact Factor
