A human functional protein interaction network and its application to cancer data analysis

Ontario Institute for Cancer Research, MaRS Centre, South Tower, 101 College Street, Suite 800, Toronto, ON M5G 0A3, Canada.
Genome Biology (Impact Factor: 10.47). 05/2010; 11(5):R53. DOI: 10.1186/gb-2010-11-5-r53
Source: PubMed

ABSTRACT One challenge facing biologists is to tease out useful information from massive data sets for further analysis. A pathway-based analysis may shed light by projecting candidate genes onto protein functional relationship networks. We are building such a pathway-based analysis system.
We have constructed a protein functional interaction network by extending curated pathways with non-curated sources of information, including protein-protein interactions, gene coexpression, protein domain interaction, Gene Ontology (GO) annotations and text-mined protein interactions, which cover close to 50% of the human proteome. By applying this network to two glioblastoma multiforme (GBM) data sets and projecting cancer candidate genes onto the network, we found that the majority of GBM candidate genes form a cluster and are closer than expected by chance, and the majority of GBM samples have sequence-altered genes in two network modules, one mainly comprising genes whose products are localized in the cytoplasm and plasma membrane, and another comprising gene products in the nucleus. Both modules are highly enriched in known oncogenes, tumor suppressors and genes involved in signal transduction. Similar network patterns were also found in breast, colorectal and pancreatic cancers.
We have built a highly reliable functional interaction network upon expert-curated pathways and applied this network to the analysis of two genome-wide GBM data sets and several other cancer data sets. The network patterns revealed by our results suggest common mechanisms in cancer biology. Our system should provide a foundation for a network- or pathway-based analysis platform for cancer and other diseases.
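
The claim that candidate genes are "closer than expected by chance" can be illustrated with a small permutation test. The sketch below is not the authors' method: it assumes a connected, unweighted network, uses mean pairwise shortest-path length as the closeness measure, and compares it against randomly drawn gene sets of the same size. The toy edge list and all function names are hypothetical.

```python
import random
from collections import deque
from itertools import combinations

# Hypothetical toy network: a tight triangle (A, B, C) attached to a chain.
TOY_EDGES = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"),
             ("D", "E"), ("E", "F"), ("F", "G"), ("G", "H")]


def build_adjacency(edges):
    """Turn an undirected edge list into a dict of neighbour sets."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def bfs_distances(adj, source):
    """Shortest-path length (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def mean_pairwise_distance(adj, genes):
    """Average shortest-path length over all gene pairs.

    Assumes the network is connected; a disconnected pair raises KeyError.
    """
    genes = list(genes)
    dist_from = {g: bfs_distances(adj, g) for g in genes}
    pairs = list(combinations(genes, 2))
    return sum(dist_from[a][b] for a, b in pairs) / len(pairs)


def closeness_p_value(adj, candidates, n_permutations=1000, seed=0):
    """Empirical p-value that random gene sets are at least as close."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    observed = mean_pairwise_distance(adj, candidates)
    hits = 0
    for _ in range(n_permutations):
        sample = rng.sample(nodes, len(candidates))
        if mean_pairwise_distance(adj, sample) <= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (hits + 1) / (n_permutations + 1)


if __name__ == "__main__":
    network = build_adjacency(TOY_EDGES)
    observed, p_value = closeness_p_value(network, ["A", "B", "C"])
    print(f"mean pairwise distance = {observed:.2f}, empirical p = {p_value:.4f}")
```

A real analysis would likely also control for node degree when drawing random gene sets, since highly connected genes are close to many others regardless of disease relevance.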

  • ABSTRACT: The interplay between genetic and epigenetic variation is only partially understood. One form of epigenetic variation is methylation at CpG sites, which can be measured as methylation quantitative trait loci (meQTL). Here we report that in a panel of lymphocytes from 1,748 individuals, methylation levels at 1,919 CpG sites are correlated with at least one distal (trans) single-nucleotide polymorphism (SNP) (P < 3.2 × 10^-13; FDR < 5%). These trans-meQTLs include 1,657 SNP–CpG pairs from different chromosomes and 262 pairs from the same chromosome that are >1 Mb apart. Over 90% of these pairs are replicated (FDR < 5%) in at least one of two independent data sets. Genomic loci harbouring trans-meQTLs are significantly enriched (P < 0.001) for long non-coding transcripts (2.2-fold), known epigenetic regulators (2.3-fold), piwi-interacting RNA clusters (3.6-fold) and curated transcription factors (4.1-fold), including zinc-finger proteins (8.75-fold). Long-range epigenetic networks uncovered by this approach may be relevant to normal and disease states.
    Nature Communications 02/2015; 6. DOI:10.1038/ncomms7326 · 10.74 Impact Factor
  • ABSTRACT: The successful determination of reliable protein interaction networks (PINs) in several species in the post-genomic era has hitherto facilitated the quest to understanding systems and structural properties of such networks. It is envisaged that a clearer understanding of their intrinsic topological properties would elucidate evolutionary and biological topography of organisms. This, in turn, may inform the understanding of diseases' aetiology. By analysing sub-networks that are induced in various layers identified by zones defined as distance from central proteins, we show that zones of human PINs display self-similarity patterns. What is observed at a global level is repeated at lower levels of inducement. Furthermore, it is observed that these levels of strength point to refinement and specialisations in these layers. This may point to the fact that various levels of representations in the self-similarity phenomenon offer a way of measuring and distinguishing the importance of proteins in the network. To consolidate our findings, we have also considered a gene co-expression network and a class of gene regulatory networks in the same framework. In all cases, the phenomenon is significantly evident. In particular, the truly unbiased regulatory networks show finer level of articulation of self-similarity.
    Scientific Reports 02/2015; 5:7628. DOI:10.1038/srep07628 · 5.08 Impact Factor
  • ABSTRACT: This paper presents a machine learning approach for assessing the reliability of protein–protein interactions in a high-throughput dataset. We use an alternating decision tree algorithm to distinguish true interacting protein pairs from noisy high-throughput data using various biological attributes of interacting proteins. The alternating decision tree algorithm is used both for identifying discriminating biological features that could be used for assessing protein interaction reliability and for constructing a classifier to identify true positive interacting pairs. Experimental results show that the proposed approach has a good performance in distinguishing true interacting protein pairs from noisy protein–protein interaction data. Moreover, our alternating decision tree classifier supplemented with domain knowledge may be helpful to understand the biological conditions in connection with interacting protein pairs.
    08/2014; 1(3):169-178. DOI:10.1007/s40595-014-0018-5
