Article

CoreAlign: Core-based Global Alignment for Protein-Protein Interaction Networks

Abstract

Biological network alignment aims to find similar functional and topological regions to guide the transfer of biological knowledge of cellular functioning from known, well-studied species to less-studied ones. The proposed aligner (CoreAlign) relies on the structure of the protein-protein interaction (PPI) network, decomposing it into what are called shells, or internal network cores. The aligner searches the space of each core to build the alignment. CoreAlign has been compared with many aligners and achieves competitive results on both topological and biological measures.
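The decomposition into shells mentioned in the abstract can be illustrated with the standard k-core peeling procedure. The sketch below is only an illustration of that general technique, not CoreAlign's implementation; the function names `core_numbers` and `shells` and the toy protein labels are assumptions made for the example.

```python
from collections import defaultdict

def core_numbers(edges):
    """Return {node: core number} by repeatedly peeling a minimum-degree node."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:                      # ignore self-interactions
            adj[u].add(v)
            adj[v].add(u)
    degree = {n: len(nb) for n, nb in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        n = min(remaining, key=lambda x: degree[x])
        k = max(k, degree[n])           # the core level never decreases while peeling
        core[n] = k
        remaining.remove(n)
        for m in adj[n]:
            if m in remaining:
                degree[m] -= 1
    return core

def shells(core):
    """Group nodes by core number: shell k holds the nodes whose core number is k."""
    out = defaultdict(set)
    for n, k in core.items():
        out[k].add(n)
    return dict(out)

# Toy network (hypothetical labels): a triangle A-B-C with a tail C-D-E.
edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"), ("D", "E")]
core = core_numbers(edges)
print(shells(core))
```

In this toy network the triangle forms the innermost shell and the tail forms the outer one. An aligner working shell by shell would then search for matches within each such group.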

Article
Motivation: As an increasing amount of protein–protein interaction (PPI) data becomes available, their computational interpretation has become an important problem in bioinformatics. The alignment of PPI networks from different species provides valuable information about conserved subnetworks, evolutionary pathways and functional orthologs. Although several methods have been proposed for global network alignment, there is a pressing need for methods that produce more accurate alignments in terms of both topological and functional consistency. Results: In this work, we present a novel global network alignment algorithm, named ModuleAlign, which makes use of local topology information to define a module-based homology score. Based on a hierarchical clustering of functionally coherent proteins involved in the same module, ModuleAlign employs a novel iterative scheme to find the alignment between two networks. Evaluated on a diverse set of benchmarks, ModuleAlign outperforms state-of-the-art methods in producing functionally consistent alignments. By aligning Pathogen–Human PPI networks, ModuleAlign also detects a novel set of conserved human genes that pathogens preferentially target to cause pathogenesis. Availability: http://ttic.uchicago.edu/∼hashemifar/ModuleAlign.html Contact: canzar@ttic.edu or j3xu.ttic.edu Supplementary information: Supplementary data are available at Bioinformatics online.
Article
Biological network alignment aims to find regions of topological and functional (dis)similarities between molecular networks of different species. Then, network alignment can guide the transfer of biological knowledge from well-studied model species to less well-studied species between conserved (aligned) network regions, thus complementing valuable insights that have already been provided by genomic sequence alignment. Here, we review computational challenges behind the network alignment problem, existing approaches for solving the problem, ways of evaluating their alignment quality, and the approaches’ biomedical applications. We discuss recent innovative efforts of improving the existing view of network alignment. We conclude with open research questions in comparative biological network research that could further our understanding of principles of life, evolution, disease, and therapeutics.
Article
Motivation: Discovering and understanding patterns in networks of protein-protein interactions (PPIs) is a central problem in systems biology. Alignments between these networks aid functional understanding as they uncover important information, such as evolutionarily conserved pathways, protein complexes and functional orthologs. A few methods have been proposed for global PPI network alignments, but because of the NP-completeness of the underlying sub-graph isomorphism problem, producing topologically and biologically accurate alignments remains a challenge. Results: We introduce a novel global network alignment tool, Lagrangian GRAphlet-based ALigner (L-GRAAL), which directly optimizes both the protein and the interaction functional conservations, using a novel alignment search heuristic based on integer programming and Lagrangian relaxation. We compare L-GRAAL with the state-of-the-art network aligners on the largest available PPI networks from BioGRID and observe that L-GRAAL uncovers the largest common sub-graphs between the networks, as measured by edge-correctness and symmetric sub-structures scores, which allow transferring more functional information across networks. We assess the biological quality of the protein mappings using the semantic similarity of their Gene Ontology annotations and observe that L-GRAAL best uncovers functionally conserved proteins. Furthermore, we introduce for the first time a measure of the semantic similarity of the mapped interactions and show that L-GRAAL also best uncovers functionally conserved interactions. In addition, we illustrate on the PPI networks of baker's yeast and human the ability of L-GRAAL to predict new PPIs. Finally, L-GRAAL's results are the first to show that topological information is more important than sequence information for uncovering functionally conserved interactions. Availability and implementation: L-GRAAL is coded in C++. Software is available at: http://bio-nets.doc.ic.ac.uk/L-GRAAL/. Contact: n.malod-dognin@imperial.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online.
Article
Statistical properties of static networks have been extensively studied. However, online social networks evolve dynamically, and understanding the evolving characteristics of the core is one of the major concerns in online social network analysis. In this paper, we empirically investigate the evolving characteristics of the Facebook core. Firstly, we separate the Facebook-link (FL) and Facebook-wall (FW) datasets into 28 snapshots in terms of timestamps. By employing the k-core decomposition method to identify the core of each snapshot, we find that the cores of the FL and FW networks contain about 672 and 373 nodes, respectively, regardless of the exponential growth of the network sizes. Secondly, we analyze evolving topological properties of the core, including the k-core value, assortative coefficient, clustering coefficient and the average shortest path length. Empirical results show that nodes in the core are getting more interconnected in the evolving process. Thirdly, we investigate the life span of nodes belonging to the core. More than 50% of nodes stay in the core for more than one year, and 19% of nodes always stay in the core from the first snapshot. Finally, we analyze the connections between the core and the whole network, and find that nodes belonging to the core prefer to connect to nodes with high k-core values rather than to high-degree ones. This work could provide new insights into online social network analysis.
Article
Motivation: High-throughput experimental techniques have produced a large amount of protein–protein interaction (PPI) data. The study of PPI networks, such as comparative analysis, shall benefit the understanding of life process and diseases at the molecular level. One way of comparative analysis is to align PPI networks to identify conserved or species-specific subnetwork motifs. A few methods have been developed for global PPI network alignment, but it still remains challenging in terms of both accuracy and efficiency. Results: This paper presents a novel global network alignment algorithm, denoted as HubAlign, that makes use of both network topology and sequence homology information, based upon the observation that topologically important proteins in a PPI network usually are much more conserved and thus, more likely to be aligned. HubAlign uses a minimum-degree heuristic algorithm to estimate the topological and functional importance of a protein from the global network topology information. Then HubAlign aligns topologically important proteins first and gradually extends the alignment to the whole network. Extensive tests indicate that HubAlign greatly outperforms several popular methods in terms of both accuracy and efficiency, especially in detecting functionally similar proteins. Availability: HubAlign is available freely for non-commercial purposes at http://ttic.uchicago.edu/∼hashemifar/software/HubAlign.zip Contact: jinboxu@gmail.com Supplementary information: Supplementary data are available at Bioinformatics online.
Article
As biological inquiry produces ever more network data, such as protein-protein interaction networks, gene regulatory networks, and metabolic networks, many algorithms have been proposed for the purpose of pairwise network alignment: finding a mapping from the nodes of one network to the nodes of another in such a way that the mapped nodes can be considered to correspond with respect to both their place in the network topology and their biological attributes. This technique is helpful in identifying previously undiscovered homologies between proteins of different species and revealing functionally similar subnetworks. In the past few years, a wealth of different aligners have been published, but few of them have been compared to one another, and no comprehensive review of these algorithms has yet appeared. We present the problem of biological network alignment, provide a guide to existing alignment algorithms, and comprehensively benchmark existing algorithms on both synthetic and real-world biological data, finding dramatic differences between existing algorithms in the quality of the alignments they produce. Additionally, we find that many of these tools are inconvenient to use in practice, and there remains a need for easy-to-use, cross-platform tools for performing network alignment. Contact: cclark@uccs.edu, jkalita@uccs.edu.
Article
Motivation: The interactions among proteins and the resulting networks of such interactions have a central role in cell biology. Aligning these networks gives us important information, such as conserved complexes and evolutionary relationships. Although there have been several publications on the global alignment of protein networks, none of the proposed methods is able to produce a highly conserved and meaningful alignment. Moreover, the time complexity of current algorithms makes them impossible to use for multiple alignment of several large networks together. Results: We present a novel algorithm for the global alignment of protein-protein interaction networks. It uses a greedy method, based on the alignment scoring matrix, which is derived from both biological and topological information of input networks to find the best global network alignment. NETAL outperforms other global alignment methods in terms of several measurements, such as Edge Correctness, Largest Common Connected Subgraphs and the number of common Gene Ontology terms between aligned proteins. As the running time of NETAL is much less than that of other available methods, NETAL can be easily expanded to a multiple alignment algorithm. Furthermore, NETAL outperforms all other existing algorithms in terms of performance, and its short running time allowed us to implement it as the first server for global alignment of protein-protein interaction networks. Availability: Binaries supported on linux are freely available for download at http://www.bioinf.cs.ipm.ir/software/netal. Supplementary information: Supplementary data are available at Bioinformatics online.
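A greedy alignment driven by a scoring matrix can be sketched as follows. This is a minimal illustration of the general idea only: it assumes a fixed, precomputed score for every candidate pair and does not model how NETAL derives or updates its matrix. The function name `greedy_align` and the toy scores are hypothetical.

```python
def greedy_align(scores):
    """Build a one-to-one alignment by taking candidate pairs in decreasing score order.

    scores maps (node_in_network_1, node_in_network_2) to a similarity score.
    """
    alignment = {}
    used = set()
    # Highest score first; ties are broken by node names so the result is reproducible.
    for (u, v), _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0])):
        if u not in alignment and v not in used:
            alignment[u] = v
            used.add(v)
    return alignment

# Toy scoring matrix (hypothetical values).
scores = {
    ("a", "x"): 0.90, ("a", "y"): 0.80,
    ("b", "x"): 0.85, ("b", "y"): 0.10,
}
alignment = greedy_align(scores)
print(alignment)
```

On these toy scores the greedy choice pairs a with x first, which forces b onto y for a total of 1.00, whereas pairing a with y and b with x would total 1.65. This shows the usual trade-off: greedy selection is fast but is not guaranteed to maximize the total score.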
Article
The UniProt knowledgebase is a large resource of protein sequences and associated detailed annotation. The database contains over 60 million sequences, of which over half a million sequences have been curated by experts who critically review experimental and predicted data for each protein. The remainder are automatically annotated based on rule systems that rely on the expert curated knowledge. Since our last update in 2014, we have more than doubled the number of reference proteomes to 5631, giving a greater coverage of taxonomic diversity. We implemented a pipeline to remove redundant highly similar proteomes that were causing excessive redundancy in UniProt. The initial run of this pipeline reduced the number of sequences in UniProt by 47 million. For our users interested in the accessory proteomes, we have made available sets of pan proteome sequences that cover the diversity of sequences for each species that is found in its strains and sub-strains. To help interpretation of genomic variants, we provide tracks of detailed protein information for the major genome browsers. We provide a SPARQL endpoint that allows complex queries of the more than 22 billion triples of data in UniProt (http://sparql.uniprot.org/). UniProt resources can be accessed via the website at http://www.uniprot.org/.
Article
Local network alignment is an important component of the analysis of protein-protein interaction networks that may lead to the identification of evolutionarily related complexes. We present AlignNemo, a new algorithm that, given the networks of two organisms, uncovers subnetworks of proteins that relate in biological function and topology of interactions. The discovered conserved subnetworks have a general topology and need not correspond to specific interaction patterns, so that they more closely fit the models of functional complexes proposed in the literature. The algorithm is able to handle sparse interaction data with an expansion process that at each step explores the local topology of the networks beyond the proteins directly interacting with the current solution. To assess the performance of AlignNemo, we ran a series of benchmarks using statistical measures as well as biological knowledge. Based on reference datasets of protein complexes, AlignNemo shows better performance than other methods in terms of both precision and recall. We show our solutions to be biologically sound using the concept of semantic similarity applied to Gene Ontology vocabularies. The binaries of AlignNemo and supplementary details about the algorithms and the experiments are available at: sourceforge.net/p/alignnemo.
Article
Kyoto Encyclopedia of Genes and Genomes (KEGG, http://www.genome.jp/kegg/ or http://www.kegg.jp/) is a database resource that integrates genomic, chemical and systemic functional information. In particular, gene catalogs from completely sequenced genomes are linked to higher-level systemic functions of the cell, the organism and the ecosystem. Major efforts have been undertaken to manually create a knowledge base for such systemic functions by capturing and organizing experimental knowledge in computable forms; namely, in the forms of KEGG pathway maps, BRITE functional hierarchies and KEGG modules. Continuous efforts have also been made to develop and improve the cross-species annotation procedure for linking genomes to the molecular networks through the KEGG Orthology system. Here we report KEGG Mapper, a collection of tools for KEGG PATHWAY, BRITE and MODULE mapping, enabling integration and interpretation of large-scale data sets. We also report a variant of the KEGG mapping procedure to extend the knowledge base, where different types of data and knowledge, such as disease genes and drug targets, are integrated as part of the KEGG molecular networks. Finally, we describe recent enhancements to the KEGG content, especially the incorporation of disease and drug information used in practice and in society, to support translational bioinformatics.
Article
Important biological information is encoded in the topology of biological networks. Comparative analyses of biological networks are proving to be valuable, as they can lead to transfer of knowledge between species and give deeper insights into biological function, disease, and evolution. We introduce a new method that uses the Hungarian algorithm to produce optimal global alignment between two networks using any cost function. We design a cost function based solely on network topology and use it in our network alignment. Our method can be applied to any two networks, not just biological ones, since it is based only on network topology. We use our new method to align protein-protein interaction networks of two eukaryotic species and demonstrate that our alignment exposes large and topologically complex regions of network similarity. At the same time, our alignment is biologically valid, since many of the aligned protein pairs perform the same biological function. From the alignment, we predict function of yet unannotated proteins, many of which we validate in the literature. Also, we apply our method to find topological similarities between metabolic networks of different species and build phylogenetic trees based on our network alignment score. The phylogenetic trees obtained in this way bear a striking resemblance to the ones obtained by sequence alignments. Our method detects topologically similar regions in large networks that are statistically significant. It does this independent of protein sequence or any other information external to network topology.
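To make the role of the Hungarian algorithm concrete, the sketch below solves a small assignment problem with a compact pure-Python implementation of that algorithm. The cost used here, the absolute difference of node degrees, is only a stand-in chosen for the example and is not the cost function of the paper; the function name `hungarian` and the toy degree lists are assumptions.

```python
def hungarian(cost):
    """Minimum-cost assignment of rows to columns (needs len(cost) <= len(cost[0])).

    Returns (assignment, total) where assignment[i] is the column given to row i.
    """
    n, m = len(cost), len(cost[0])
    INF = float("inf")
    u = [0] * (n + 1)        # row potentials
    v = [0] * (m + 1)        # column potentials
    p = [0] * (m + 1)        # p[j]: row currently matched to column j (0 = none)
    way = [0] * (m + 1)      # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [INF] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], INF, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:   # reached a free column: augmenting path found
                break
        while j0:            # flip the matching along the path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assignment = [None] * n
    for j in range(1, m + 1):
        if p[j]:
            assignment[p[j] - 1] = j - 1
    total = sum(cost[i][assignment[i]] for i in range(n))
    return assignment, total

# Toy example: node degrees of a smaller and a larger network (hypothetical values).
deg1 = [3, 1, 2]
deg2 = [1, 2, 3, 2]
cost = [[abs(a - b) for b in deg2] for a in deg1]
assignment, total = hungarian(cost)
print(assignment, total)
```

Because the solver only sees a cost matrix, any node-to-node cost can be plugged in, which matches the abstract's point that the method works with any cost function. The result minimizes the summed pairwise cost, which is a different objective from directly maximizing conserved edges.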
Article
Sequence comparison and alignment have had an enormous impact on our understanding of evolution, biology and disease. Comparison and alignment of biological networks will probably have a similar impact. Existing network alignments use information external to the networks, such as sequence, because no good algorithm for purely topological alignment has yet been devised. In this paper, we present a novel algorithm based solely on network topology, that can be used to align any two networks. We apply it to biological networks to produce by far the most complete topological alignments of biological networks to date. We demonstrate that both species phylogeny and detailed biological function of individual proteins can be extracted from our alignments. Topology-based alignments have the potential to provide a completely new, independent source of phylogenetic information. Our alignment of the protein-protein interaction networks of two very different species, yeast and human, indicates that even distant species share a surprising amount of network topology, suggesting broad similarities in internal cellular wiring across all life on Earth.
Article
PathBLAST is a network alignment and search tool for comparing protein interaction networks across species to identify protein pathways and complexes that have been conserved by evolution. The basic method searches for high-scoring alignments between pairs of protein interaction paths, for which proteins of the first path are paired with putative orthologs occurring in the same order in the second path. This technique discriminates between true- and false-positive interactions and allows for functional annotation of protein interaction pathways based on similarity to the network of another, well-characterized species. PathBLAST is now available at http://www.pathblast.org/ as a web-based query. In this implementation, the user specifies a short protein interaction path for query against a target protein–protein interaction network selected from a network database. PathBLAST returns a ranked list of matching paths from the target network along with a graphical view of these paths and the overlap among them. Target protein–protein interaction networks are currently available for Helicobacter pylori, Saccharomyces cerevisiae, Caenorhabditis elegans and Drosophila melanogaster. Just as BLAST enables rapid comparison of protein sequences between genomes, tools such as PathBLAST are enabling comparative genomics at the network level.
Article
With an ever-increasing amount of available data on protein-protein interaction (PPI) networks and research revealing that these networks evolve at a modular level, discovery of conserved patterns in these networks becomes an important problem. Although available data on protein-protein interactions is currently limited, recently developed algorithms have been shown to convey novel biological insights through employment of elegant mathematical models. The main challenge in aligning PPI networks is to define a graph theoretical measure of similarity between graph structures that captures underlying biological phenomena accurately. In this respect, modeling of conservation and divergence of interactions, as well as the interpretation of resulting alignments, are important design parameters. In this paper, we develop a framework for comprehensive alignment of PPI networks, which is inspired by duplication/divergence models that focus on understanding the evolution of protein interactions. We propose a mathematical model that extends the concepts of match, mismatch, and gap in sequence alignment to that of match, mismatch, and duplication in network alignment and evaluates similarity between graph structures through a scoring function that accounts for evolutionary events. By relying on evolutionary models, the proposed framework facilitates interpretation of resulting alignments in terms of not only conservation but also divergence of modularity in PPI networks. Furthermore, as in the case of sequence alignment, our model allows flexibility in adjusting parameters to quantify underlying evolutionary relationships. Based on the proposed model, we formulate PPI network alignment as an optimization problem and present fast algorithms to solve this problem. 
Detailed experimental results from an implementation of the proposed framework show that our algorithm is able to discover conserved interaction patterns very effectively, in terms of both accuracies and computational cost.
Article
In this paper, we survey algorithms that perform global alignment of networks or graphs. Global network alignment aligns two or more given networks to find the best mapping from nodes in one network to nodes in other networks. Since graphs are a common method of data representation, graph alignment has become important with many significant applications. Protein-protein interactions can be modeled as networks, and aligning these networks of protein interactions has many applications in biological research. In this survey, we review algorithms for global pairwise alignment, highlighting various proposed approaches, and classify them based on their methodology. Evaluation metrics that are used to measure the quality of the resulting alignments are also surveyed. We discuss and present a comparison between selected aligners on the same datasets and evaluate using the same evaluation metrics. Finally, a quick overview of the most popular databases of protein interaction networks is presented, focusing on datasets that have been used recently.
Article
Network alignment aims to find conserved regions between different networks. Existing methods aim to maximize total similarity over all aligned nodes (i.e. node conservation). Then, they evaluate alignment quality by measuring the amount of conserved edges, but only after the alignment is constructed. Thus, we recently introduced MAGNA to directly maximize edge conservation while producing alignments and showed its superiority over the existing methods. Here, we extend the original MAGNA with several important algorithmic advances into a new MAGNA++ framework. MAGNA++ introduces several novelties: 1) It simultaneously maximizes any one of three different measures of edge conservation (including our recent superior S(3) measure) and any desired node conservation measure, which further improves alignment quality compared to maximizing only node conservation or only edge conservation. 2) It speeds up the original MAGNA algorithm by parallelizing it to automatically use all available resources, as well as by reimplementing the edge conservation measures more efficiently. 3) It provides a friendly graphical user interface for easy use by domain (e.g., biological) scientists. 4) At the same time, MAGNA++ offers source code for easy extensibility by computational scientists. Availability: http://www.nd.edu/~cone/MAGNA++/ Contact: tmilenko@nd.edu.
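The edge conservation measures commonly used in this line of work are edge correctness (EC), induced conserved structure (ICS) and the symmetric substructure score (S3). The sketch below computes all three for a given alignment using their usual definitions from the network alignment literature; the function name `edge_conservation` and the toy networks are assumptions made for illustration, and this is not code from MAGNA++.

```python
def edge_conservation(edges1, edges2, alignment):
    """Return (EC, ICS, S3) for an injective alignment from network 1 into network 2.

    Every node that appears in edges1 must be a key of alignment.
    """
    e1 = {frozenset(e) for e in edges1}
    e2 = {frozenset(e) for e in edges2}
    image = set(alignment.values())
    # Edges of network 1 whose endpoints map onto an edge of network 2.
    conserved = sum(1 for e in e1 if frozenset(alignment[n] for n in e) in e2)
    # Edges of network 2 with both endpoints inside the aligned node set.
    induced = sum(1 for e in e2 if e <= image)
    ec = conserved / len(e1)
    ics = conserved / induced if induced else 0.0
    s3 = conserved / (len(e1) + induced - conserved)
    return ec, ics, s3

# Toy networks (hypothetical): a path a-b-c-d aligned into a triangle with a tail.
edges1 = [("a", "b"), ("b", "c"), ("c", "d")]
edges2 = [(1, 2), (2, 3), (3, 1), (3, 4), (4, 5)]
alignment = {"a": 1, "b": 2, "c": 3, "d": 5}
ec, ics, s3 = edge_conservation(edges1, edges2, alignment)
print(ec, ics, s3)
```

Here two of the three path edges are conserved and the aligned region of the second network contains three edges. EC penalizes only missed edges of the first network, ICS penalizes only extra edges in the aligned region of the second, and S3 penalizes both.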
Article
Motivation: Biological network alignment aims to identify similar regions between networks of different species. Existing methods compute node similarities to rapidly identify from possible alignments the high-scoring alignments with respect to the overall node similarity. But, the accuracy of the alignments is then evaluated with some other measure that is different than the node similarity used to construct the alignments. Typically, one measures the amount of conserved edges. Thus, the existing methods align similar nodes between networks hoping to conserve many edges (after the alignment is constructed!). Results: Instead, we introduce MAGNA to directly 'optimize' edge conservation while the alignment is constructed, without decreasing the quality of node mapping. MAGNA uses a genetic algorithm and our novel function for 'crossover' of two 'parent' alignments into a superior 'child' alignment to simulate a 'population' of alignments that 'evolves' over time; the 'fittest' alignments survive and proceed to the next 'generation', until the alignment accuracy cannot be optimized further. While we optimize our new and superior measure of the amount of conserved edges, MAGNA can optimize any alignment accuracy measure, including a combined measure of both node and edge conservation. In systematic evaluations against state-of-the-art methods (IsoRank, MI-GRAAL and GHOST), on both synthetic networks and real-world biological data, MAGNA outperforms all of the existing methods, in terms of both node and edge conservation as well as both topological and biological alignment accuracy. Availability: Software: http://nd.edu/∼cone/MAGNA Contact: tmilenko@nd.edu Supplementary information: Supplementary data are available at Bioinformatics online.
Article
Large amounts of protein-protein interaction (PPI) data are available. The human PPI network currently contains over 56 000 interactions between 11 100 proteins. It has been demonstrated that the structure of this network is not random and that the same wiring patterns in it underlie the same biological processes and diseases. In this paper, we ask if there exists a subnetwork of the human PPI network such that its topology is the key to disease formation and hence should be the primary object of therapeutic intervention. We demonstrate that such a subnetwork exists and can be obtained purely computationally. In particular, by successively pruning the entire human PPI network, we are left with a "core" subnetwork that is not only topologically and functionally homogeneous, but is also enriched in disease genes, drug targets, and it contains genes that are known to drive disease formation. We call this subnetwork the Core Diseasome. Furthermore, we show that the topology of the Core Diseasome is unique in the human PPI network suggesting that it may be the wiring of this network that governs the mutagenesis that leads to disease. Explaining the mechanisms behind this phenomenon and exploiting them remains a challenge.
Article
Networks are an invaluable framework for modeling biological systems. Analyzing protein-protein interaction (PPI) networks can provide insight into underlying cellular processes. It is expected that comparison and alignment of biological networks will have a similar impact on our understanding of evolution, biological function, and disease as did sequence comparison and alignment. Here, we introduce a novel pairwise global alignment algorithm called Common-neighbors based GRAph ALigner (C-GRAAL) that uses heuristics for maximizing the number of aligned edges between two networks and is based solely on network topology. As such, it can be applied to any type of network, such as social, transportation, or electrical networks. We apply C-GRAAL to align PPI networks of eukaryotic and prokaryotic species, as well as inter-species PPI networks, and we demonstrate that the resulting alignments expose large connected and functionally topologically aligned regions. We use the resulting alignments to transfer biological knowledge across species, successfully validating many of the predictions. Moreover, we show that C-GRAAL can be used to align human-pathogen inter-species PPI networks and that it can identify patterns of pathogen interactions with host proteins solely from network topology.
Article
High-throughput methods for detecting molecular interactions have produced large sets of biological network data with much more yet to come. Analogous to sequence alignment, efficient and reliable network alignment methods are expected to improve our understanding of biological systems. Unlike sequence alignment, network alignment is computationally intractable. Hence, devising efficient network alignment heuristics is currently a foremost challenge in computational biology. We introduce a novel network alignment algorithm, called Matching-based Integrative GRAph ALigner (MI-GRAAL), which can integrate any number and type of similarity measures between network nodes (e.g. proteins), including, but not limited to, any topological network similarity measure, sequence similarity, functional similarity and structural similarity. Hence, we resolve the ties in similarity measures and find a combination of similarity measures yielding the largest contiguous (i.e. connected) and biologically sound alignments. MI-GRAAL exposes the largest functional, connected regions of protein-protein interaction (PPI) network similarity to date: surprisingly, it reveals that 77.7% of proteins in the baker's yeast high-confidence PPI network participate in such a subnetwork that is fully contained in the human high-confidence PPI network. This is the first demonstration that species as diverse as yeast and human contain so large, continuous regions of global network similarity. We apply MI-GRAAL's alignments to predict functions of un-annotated proteins in yeast, human and bacteria validating our predictions in the literature. Furthermore, using network alignment scores for PPI networks of different herpes viruses, we reconstruct their phylogenetic relationship. This is the first time that phylogeny is exactly reconstructed from purely topological alignments of PPI networks. Supplementary files and MI-GRAAL executables: http://bio-nets.doc.ic.ac.uk/MI-GRAAL/.