Constructing a robust protein-protein interaction network by integrating multiple public databases

Department of Information Science, University of Arkansas at Little Rock, 2801 S. University Ave, Little Rock, AR 72204-1099, USA.
BMC Bioinformatics (Impact Factor: 2.67). 10/2011; 12(Suppl 10):S7. DOI: 10.1186/1471-2105-12-S10-S7
Source: PubMed

ABSTRACT: Protein-protein interactions (PPIs) are a critical component of many underlying biological processes. A PPI network can provide insight into the mechanisms of these processes, as well as into the relationships among the proteins and toxicants that are potentially involved in them. Many PPI databases are publicly available, each with its own specific focus. The challenge is how to combine their contents effectively to generate a robust and biologically relevant PPI network.
In this study, seven public PPI databases (BioGRID, DIP, HPRD, IntAct, MINT, REACTOME, and SPIKE) were used to explore an approach for combining multiple PPI databases into an integrated PPI network. We developed a novel method called k-votes and used it to create seven integrated networks, one for each value of k from 1 to 7. Functional modules were mined with SCAN, a Structural Clustering Algorithm for Networks. Overall module quality was evaluated for each integrated network using four statistical and biological measures: (1) modularity, (2) similarity-based modularity, (3) clustering score, and (4) enrichment.
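To make the module-mining step more concrete, here is a minimal Python sketch of SCAN's core definitions together with Newman modularity (measure (1)). It follows the standard SCAN definitions: the structural neighbourhood Γ(v) of a vertex is the vertex plus its neighbours, the structural similarity of two vertices is |Γ(v) ∩ Γ(w)| / √(|Γ(v)|·|Γ(w)|), a vertex is a core if at least μ members of Γ(v) have similarity ≥ ε with it, and clusters grow outward from cores; unclustered vertices touching two or more clusters are hubs and the rest are outliers. The values ε = 0.7 and μ = 2, the function names, and the toy graph are assumptions chosen for illustration, not the settings used in the study. Similarity-based modularity, clustering score, and enrichment are not shown.

```python
from collections import deque
from math import sqrt


def gamma(adj, v):
    """Structural neighbourhood: v together with its direct neighbours."""
    return adj[v] | {v}


def similarity(adj, v, w):
    """SCAN structural similarity between vertices v and w."""
    gv, gw = gamma(adj, v), gamma(adj, w)
    return len(gv & gw) / sqrt(len(gv) * len(gw))


def scan(adj, eps=0.7, mu=2):
    """Cluster an undirected graph given as {vertex: set_of_neighbours}.

    Returns (clusters, hubs, outliers). eps and mu are illustrative defaults.
    """
    def eps_neighbours(v):
        return {w for w in gamma(adj, v) if similarity(adj, v, w) >= eps}

    label = {}      # vertex -> index of the cluster it was assigned to
    clusters = []
    for v in adj:
        if v in label or len(eps_neighbours(v)) < mu:
            continue  # already clustered, or not a core vertex
        cid = len(clusters)
        members = {v}
        label[v] = cid
        queue = deque([v])
        while queue:
            x = queue.popleft()
            x_nbrs = eps_neighbours(x)
            if len(x_nbrs) < mu:
                continue  # border vertex: joins the cluster but does not expand it
            for y in x_nbrs:
                if y not in label:
                    label[y] = cid
                    members.add(y)
                    queue.append(y)
        clusters.append(members)

    hubs, outliers = set(), set()
    for v in adj:
        if v not in label:
            touched = {label[w] for w in adj[v] if w in label}
            (hubs if len(touched) >= 2 else outliers).add(v)
    return clusters, hubs, outliers


def modularity(adj, clusters):
    """Newman modularity Q = sum over clusters c of [L_c / m - (d_c / 2m)^2]."""
    m = sum(len(nbrs) for nbrs in adj.values()) / 2
    q = 0.0
    for c in clusters:
        internal = sum(1 for v in c for w in adj[v] if w in c) / 2  # edges inside c
        degree = sum(len(adj[v]) for v in c)                        # total degree of c
        q += internal / m - (degree / (2 * m)) ** 2
    return q


# Invented toy graph: two triangles joined by the single edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
adj = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)

clusters, hubs, outliers = scan(adj)
print(clusters, hubs, outliers)             # [{0, 1, 2}, {3, 4, 5}] set() set()
print(round(modularity(adj, clusters), 3))  # 0.357
```

The bridging edge 2-3 has similarity 0.5, below ε, so SCAN separates the two triangles, and the resulting partition scores Q = 5/14.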
Each integrated human PPI network was constructed from the number of votes that an interaction receives from the committee of the seven original PPI databases: the network for a given k keeps only interactions that receive at least k votes. The functional modules obtained by SCAN from each integrated network were then evaluated, and the optimal value of k was determined from this functional module analysis. Our results demonstrate that the k-votes method outperforms the traditional union approach in terms of both statistical significance and biological meaning. The best network is achieved at k = 2, i.e., the network composed of interactions confirmed in at least two PPI databases. In contrast, the traditional union approach (equivalent to k = 1) yields an integrated network consisting of all interactions from the seven PPI databases, which may suffer from a high rate of false positives.
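The voting rule behind k-votes can be sketched in a few lines of Python. This is an illustration, not the authors' code: it assumes every database has already been mapped to a shared protein identifier namespace, treats interactions as undirected pairs, and lets each database vote at most once per pair. The database names db1 to db3 and the proteins P1 to P5 are invented.

```python
from collections import Counter


def normalize(a, b):
    """Store an undirected interaction with its two identifiers in sorted order."""
    return (a, b) if a <= b else (b, a)


def k_votes(databases, k):
    """Return the interactions reported by at least k source databases.

    databases maps a database name to an iterable of (protein_a, protein_b) pairs.
    """
    votes = Counter()
    for pairs in databases.values():
        # A set ensures each database votes at most once per interaction,
        # even if it lists the pair twice or in both orientations.
        votes.update({normalize(a, b) for a, b in pairs})
    return {pair for pair, n in votes.items() if n >= k}


# Invented toy input with three databases instead of seven.
dbs = {
    "db1": [("P1", "P2"), ("P2", "P3")],
    "db2": [("P2", "P1"), ("P3", "P4")],
    "db3": [("P1", "P2"), ("P3", "P4"), ("P4", "P5")],
}

for k in range(1, len(dbs) + 1):
    print(k, sorted(k_votes(dbs, k)))
# 1 [('P1', 'P2'), ('P2', 'P3'), ('P3', 'P4'), ('P4', 'P5')]
# 2 [('P1', 'P2'), ('P3', 'P4')]
# 3 [('P1', 'P2')]
```

With k = 1 the rule reduces to the plain union of all databases; raising k trades coverage for cross-database agreement.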
We conclude that the k-votes method for constructing a robust PPI network by integrating multiple public databases outperforms previously reported approaches, and that k = 2 provides the best results. The strategies developed here for combining databases show promise for advancing network construction and modeling.

  • ABSTRACT: Objective: In the context of “network medicine”, gene prioritization methods are one of the main tools for discovering candidate disease genes by exploiting the large amount of data covering different types of functional relationships between genes. Several works have proposed integrating multiple data sources to improve disease gene prioritization, but to our knowledge no systematic study has focused on quantitatively evaluating the impact of network integration on gene prioritization. In this paper we aim to provide an extensive analysis of gene-disease associations not limited to genetic disorders, and a systematic comparison of different network integration methods for gene prioritization. Materials and Methods: We collected 9 different functional networks representing different functional relationships between genes, and we combined them through both unweighted and weighted network integration methods. We then prioritized genes with respect to each of the 708 medical subject headings (MeSH) diseases considered by applying classical guilt-by-association, random walk and random walk with restart algorithms, and the recently proposed kernelized score functions. Results: Classical random walk algorithms applied to the best single network achieved an average area under the curve (AUC) across the 708 MeSH diseases of about 0.82, while kernelized score functions and network integration boosted the average AUC to about 0.89. Weighted integration, by exploiting the different “informativeness” embedded in different functional networks, outperforms unweighted integration at the 0.01 significance level, according to the Wilcoxon signed rank sum test. For each MeSH disease we provide the top-ranked unannotated candidate genes, available for further biomedical investigation. Conclusions: Network integration is necessary to boost the performance of gene prioritization methods. Moreover, methods based on kernelized score functions can further enhance disease gene ranking results by adopting both local and global learning strategies that exploit the overall topology of the network.
    Artificial Intelligence in Medicine 06/2014; DOI:10.1016/j.artmed.2014.03.003 · 1.65 Impact Factor
  • ABSTRACT: Chinese medicine has been widely used in clinical practice, but its mode of action often remains obscure. This has seriously hindered the further development and better clinical application of Chinese medicine. Among the most critical questions to be addressed, the identification of active ingredients is an important one requiring more research. Existing methods consider only the potential pharmacological effects of the individual purified chemical ingredients, without considering the contents of these ingredients, which are critical to the overall effect of Chinese medicine. Here, a novel approach is proposed that integrates network pharmacology analysis with ingredient content to identify active ingredients in Chinese medicine. The therapeutic action of Xuesaitong (XST) injection on myocardial infarction was analyzed as an example in this study. First, we built a cardiovascular disease (CVD)-related protein-protein interaction (PPI) network. Second, the potential targets of the ingredients of XST were identified by integrating microarray data, text mining, and pharmacophore model-based prediction. The target-ingredient relationships were then mapped onto the network. Topological attributes of the ingredients' targets, together with the ingredients' contents, were combined to calculate a composition-weighted index for integrative evaluation of ingredient efficacy. Our results indicated that the major active ingredients in XST were notoginsenoside R1 and ginsenosides Rg1, Rb1, Rd, and Re, which was further validated in myocardial infarction rat models. In conclusion, this study presents a novel approach for identifying active ingredients in Chinese medicine.
    Molecular BioSystems 04/2014; 10(7). DOI:10.1039/c3mb70581a · 3.35 Impact Factor
  • ABSTRACT: Transgenerational inheritance of environmentally induced phenotypes requires transmission of epigenetic information through the germline. Whereas several epigenetic factors have been implicated in germline transmission, mediators of information transfer from soma to germline remain unidentified in mammals. Notably, a recent bioinformatic analysis showed an association of extracellular microRNAs (miRNAs) and altered transcriptomes in diverse instances of mammalian epigenetic inheritance involving different environmental factors, tissues, life cycle stages, generations, and genders. It was predicted that regulatory non-coding RNAs (ncRNAs) may mediate soma-to-germline information transfer. Remarkably, the present bioinformatic evidence suggests a similar association of exosomal mRNAs and proteins. The differentially expressed genes reported in previous genome-level expression profiling studies related or relevant to epigenetic inheritance showed enrichment for a documented set of exosomal mRNAs in a few instances of epigenetic inheritance, and for exosomal proteins in a majority of instances. Differentially expressed genes encoding exosomal miRNAs and proteins, directly or indirectly through first- and/or second-degree interactome networks, overrepresented biological processes related to the environmental factors used in these instances, as well as to epigenetic alterations, especially chromatin and histone modifications. These findings predict that exosomal mRNAs and proteins, like extracellular miRNAs, may also mediate soma-to-germline information transfer. A convergent conceptual model is presented wherein extracellular ncRNAs/miRNAs, mRNAs, and proteins provide the much-needed continuum inclusive of epigenetic inheritance. The phrase “transgenerational systems biology” is introduced to signify that the realm of systems biology extends beyond an individual organism and encompasses generations.
    Journal of Theoretical Biology 09/2014; 357:143–149. DOI:10.1016/j.jtbi.2014.05.019 · 2.35 Impact Factor
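The random walk with restart (RWR) prioritization named in the first abstract of the list above can be sketched as follows. This is a generic formulation, not the implementation from that study: the walker moves to a neighbour with probability proportional to edge weight and, with probability `restart`, jumps back to a uniformly chosen seed (known disease) gene; non-seed genes are then ranked by their steady-state probability. The restart value 0.7, the tolerance, the function name, and the toy path graph are assumptions.

```python
def random_walk_with_restart(adj, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Steady-state RWR probabilities on a weighted undirected graph.

    adj maps node -> {neighbour: weight}; seeds is a non-empty set of nodes in adj.
    """
    nodes = list(adj)
    p0 = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in nodes}
    strength = {v: sum(adj[v].values()) for v in nodes}  # weighted degree
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {v: restart * p0[v] for v in nodes}
        for v in nodes:
            if strength[v] == 0:
                # Isolated node: return its walking mass to the seeds so none is lost.
                for s in seeds:
                    nxt[s] += (1 - restart) * p[v] / len(seeds)
                continue
            for w, weight in adj[v].items():
                nxt[w] += (1 - restart) * p[v] * weight / strength[v]
        delta = sum(abs(nxt[v] - p[v]) for v in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Invented toy path graph A - B - C - D with unit weights; A is the only seed gene.
graph = {
    "A": {"B": 1.0},
    "B": {"A": 1.0, "C": 1.0},
    "C": {"B": 1.0, "D": 1.0},
    "D": {"C": 1.0},
}
scores = random_walk_with_restart(graph, {"A"})
ranking = sorted((v for v in scores if v not in {"A"}), key=scores.get, reverse=True)
print(ranking)  # ['B', 'C', 'D']
```

Because edge weights only enter through the transition probabilities, a weighted integrated network can be passed in unchanged; genes closer to the seeds in the network receive higher scores.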
