Publications (49), 29.05 Total impact
- ABSTRACT: Because most complex genetic diseases are caused by defects in cell signaling, illuminating signaling cascades is essential for understanding their mechanisms. We present three novel computational algorithms to reconstruct signaling networks between a starting protein and an ending protein using genome-wide protein-protein interaction (PPI) networks and Gene Ontology (GO) annotation data. A signaling network is represented as a directed acyclic graph that merges multiple linear pathways. An advanced semantic similarity metric is applied to weight PPIs as a preprocessing step for all three methods. The first algorithm repeatedly extends the list of nodes based on path frequency towards the ending protein. The second algorithm repeatedly appends edges based on the occurrence of network motifs, that is, link patterns that appear more frequently in a PPI network than in a random graph. The last algorithm uses an information propagation technique that iteratively updates edge orientations based on path strength and merges the selected directed edges. Our experimental results demonstrate that the proposed algorithms achieve higher accuracy than previous methods when tested on well-studied pathways of S. cerevisiae. Furthermore, we introduce an interactive web application tool, called P-Finder, to visualize reconstructed signaling networks.
- ABSTRACT: Reconstruction of signaling pathways is crucial for understanding cellular mechanisms. A pathway is represented as a path of a signaling cascade involving a series of proteins that perform a particular function. Since a protein pair involved in signaling and response has a strong interaction, putative pathways can be detected from protein-protein interaction (PPI) networks. However, predicting directed pathways from undirected genome-wide PPI networks has been challenging. We present a novel computational algorithm to efficiently predict signaling pathways from PPI networks given a starting protein and an ending protein. Our approach integrates topological analysis of PPI networks and semantic analysis of PPIs using Gene Ontology data. An advanced semantic similarity measure is used for weighting each interacting protein pair. Our distance-wise algorithm iteratively selects an adjacent protein from a PPI network to build a pathway based on a distance condition. On each iteration, the strength of a hypothetical path passing through a candidate edge is estimated by a local heuristic. We evaluate the performance by comparing the resultant paths to known signaling pathways in yeast. The results show that our approach has higher accuracy and efficiency than previous methods.
- ABSTRACT: More than a decade of experimental study of signal transduction has made a substantial contribution to understanding functional mechanisms in a cell. A signaling pathway represents a linear path of a signaling cascade involving a series of proteins. As an advanced model, multiple linear pathways with extensive cross-talk between receptors can be merged into a larger-scale signaling network. We present an efficient computational approach to predict signaling networks by integrating genome-wide protein-protein interaction (PPI) data and ontological annotation data. We adopt an advanced semantic similarity metric for weighting PPIs, and an information propagation algorithm that runs on a weighted PPI network. This algorithm iteratively selects potential directed edges for a signaling cascade using user-specified path strength parameters. Our approach also includes a preprocessing step that filters the large-scale PPI network by a distance condition using the maximum path length parameter. Our experimental results show that the proposed approach runs substantially faster than existing computational methods and has competitive accuracy in predicting well-studied pathways of S. cerevisiae and C. elegans. The high efficiency of this approach would facilitate the development of a web-based application tool to discover potential signaling networks.
- ABSTRACT: A signaling pathway, which is represented as a chain of interacting proteins for a biological process, can be predicted from protein-protein interaction (PPI) networks. However, pathway prediction is computationally challenging because of (1) inefficiency in searching all possible paths from the large-scale PPI networks and (2) unreliability of current PPI data generated by automated high-throughput methods. In this paper, we propose a novel approach to efficiently predict signaling pathways from PPI networks when a starting protein (source) and an ending protein (target) are given. Our approach is a combination of topological analysis of the networks and ontological analysis of interacting proteins. Starting from the source, this method repeatedly extends the list of proteins to form a pathway based on the improved support model (iSup). This model integrates (1) the frequency of the paths towards the target and (2) the semantic similarity between each adjacent pair in a pathway. The path frequency is computed by a heuristic data-mining technique to determine the most frequent paths towards the target in a PPI network. The semantic similarity is measured by the distance of the information contents of Gene Ontology (GO) terms annotating interacting proteins. To further improve computational efficiency, we propose two additional strategies: filtering the PPI networks and precomputing approximate path frequency. The experiment with the yeast PPI data demonstrates that our approach predicted MAPK signaling pathways with higher accuracy and efficiency than other existing methods.
- ABSTRACT: Protein-protein interactions (PPIs) play a key role in understanding the mechanisms of cellular processes. The availability of interactome data has catalyzed the development of computational approaches to elucidate functional behaviors of proteins on a system level. Gene Ontology (GO) and its annotations are a significant resource for functional characterization of proteins. Because of their wide coverage, GO data have often been adopted as a benchmark for protein function prediction on the genomic scale. We propose a computational approach, called M-Finder, for functional association pattern mining. This method employs semantic analytics to integrate genome-wide PPIs with GO data. We also introduce an interactive web application tool that visualizes a functional association network linked to a protein specified by a user. The proposed approach comprises two major components. First, the PPIs that have been generated by high-throughput methods are weighted in terms of their functional consistency using GO and its annotations. We assess two advanced semantic similarity metrics which quantify the functional association level of each interacting protein pair. We demonstrate that these measures outperform other existing methods by evaluating their agreement with other biological features, such as sequence similarity, the presence of common Pfam domains, and core PPIs. Second, an information flow-based algorithm is employed to efficiently discover a set of proteins functionally associated with the query protein, together with their links. This algorithm reconstructs a functional association network of the query protein. The output network size can be flexibly determined by parameters. M-Finder provides a useful framework to investigate functional association patterns with any protein. This software will also allow users to perform further systematic analysis of a set of proteins for any specific function. It is available online at http://bionet.ecs.baylor.edu/mfinder.
- ABSTRACT: Predicting protein complexes from protein-protein interaction (PPI) networks has been the focus of many computational approaches over the last decade. These methods tend to vary in performance based on the structure of the network and the parameters provided to the algorithm. Here, we evaluate the merits of enhancing PPI networks with semantic similarity edge weights using Gene Ontology (GO) and its annotation data. We compare the cluster features and predictive efficacy of six well-known unweighted protein complex detection methods (Clique Percolation, MCODE, DPClus, IPCA, Graph Entropy, and CoAch) against updated weighted implementations. We conclude that incorporating semantic similarity edge weighting in PPI network analysis unequivocally increases the performance of these methods.
- ABSTRACT: The generation of protein-protein interactions (PPIs) has created the need for efficient computational approaches that can discover highly modular clusters of good quality. These clusters represent protein complexes or functional modules. A number of seed-growth style algorithms exist to identify protein complexes from genome-wide PPI networks. However, these methods lose accuracy when the networks are comparatively large and have complex connectivity. To combat the noise that exists in these large PPI networks, we propose an improvement to the graph entropy approach, which is one of the seed-growth style algorithms. As a novel information-theoretic definition, Graph Entropy is a measure of the structural complexity of a graph. For example, a loss of entropy represents an increase in modularity of the graph. The original algorithm considers only the interconnected nature of vertices, whereas the modified definition also considers edge weights. These edge weights are obtained by measuring the semantic similarity of PPIs. The weighted graph entropy approach is applied to the S. cerevisiae PPI data set from BioGRID. The output clusters are compared with known protein complexes so that we can calculate f-scores and use them to evaluate cluster accuracy. The proposed improvement to the graph entropy approach enhances the quality of clusters as potential protein complexes when compared to the other seed-growth style algorithms.
- ABSTRACT: Protein-protein interactions (PPIs) play a key role in understanding functional behavior of genes. Discovering association patterns from PPI networks is crucial for functional characterization on a system level. We present a novel approach to discover the functional association pattern of a query gene from the genome-wide PPI networks. This approach consists of two major components. First, we transform the PPI network to a weighted graph representation by measuring semantic similarity. Three enhanced semantic similarity methods are proposed to estimate functional closeness of each interacting pair. Second, we apply a dynamic propagation algorithm to detect the functional association pattern of a gene, represented as a sub-network. The size of the sub-networks is flexibly determined by user-specific parameters. In this paper, we also introduce an interactive web application, called M-Finder, to visualize the functional association pattern of a gene entered by a user. The semantic similarity measures and the dynamic propagation algorithm are embedded in this tool to run on up-to-date PPI networks of model species. M-Finder allows users to carry out further systematic analysis for functional characterization on the genomic scale.
- ABSTRACT: Recent computational techniques have facilitated analyzing genome-wide protein-protein interaction data for several model organisms. Various graph-clustering algorithms have been applied to protein interaction networks on the genomic scale for predicting the entire set of potential protein complexes. In particular, density-based clustering algorithms that are able to generate overlapping clusters, i.e. clusters sharing a set of nodes, are well suited to protein complex detection because each protein could be a member of multiple complexes. However, their accuracy is still limited because of the complex overlap patterns of their output clusters. We present a systematic approach to refining the overlapping clusters identified from protein interaction networks. We have designed novel metrics to assess cluster overlaps: overlap coverage and overlapping consistency. We then propose an overlap refinement algorithm. It takes as input the clusters produced by existing density-based graph-clustering methods and generates a set of refined clusters by parameterizing the metrics. To evaluate protein complex prediction accuracy, we used the f-measure by comparing each refined cluster to known protein complexes. The experimental results with the yeast protein-protein interaction data sets from BioGRID and DIP demonstrate that accuracy in protein complex prediction increased significantly after refining cluster overlaps. The effectiveness of the proposed cluster overlap refinement approach for protein complex detection has been validated in this study. Analyzing overlaps of the clusters from protein interaction networks is a crucial task for understanding the functional roles of proteins and the topological characteristics of functional systems.
- ABSTRACT: Complex systems have been widely studied to characterize their structural behaviors from a topological perspective. High modularity is one of the recurrent features of real-world complex systems. Various graph clustering algorithms have been applied to identifying communities in social networks or modules in biological networks. However, their applicability to real-world systems has been limited because of the massive scale and complex connectivity of the networks. In this study, we exploit a novel information-theoretic model for graph clustering. The entropy-based clustering approach finds locally optimal clusters by growing a random seed in a manner that minimizes graph entropy. We design and analyze modifications that further improve its performance. Assigning priority in seed selection and seed growth is well suited to scale-free networks characterized by a hub-oriented structure. Computing seed growth in parallel streams also decomposes an extremely large network efficiently. The experimental results with real biological and social networks show that the entropy-based approach has better performance than competing methods in terms of accuracy and efficiency.
- ABSTRACT: Recent computational techniques have facilitated analyzing genome-wide protein-protein interactions. Various graph-clustering algorithms have been applied to protein interaction networks for identifying protein complexes and functional modules. Since each protein performs multiple functions, a clustering algorithm should be able to produce overlapping clusters. In this paper, we use the seed-refinement algorithm to generate a set of preliminary overlapping clusters. Next, to further refine the preliminary clusters, we carry out a systematic analysis of their overlaps using novel metrics: overlap coverage and overlapping consistency. We propose the cluster-merging algorithm to yield final clusters by parameterizing the metrics. In a test with the yeast protein-protein interaction network, we demonstrate that the proposed approach improves accuracy in detecting protein complexes and functional modules by optimizing the parameter values.
- ABSTRACT: Recent high-throughput experiments have generated protein-protein interaction data on a genomic scale, yielding the complete interactome for several organisms. Various graph clustering algorithms have been applied to protein interaction networks for identifying protein complexes and functional modules. Although the previous algorithms are scalable and robust, their accuracy is still limited because of the complex connectivity found in protein interaction networks. In this study, we propose a novel information-theoretic definition, graph entropy, as a measure of the structural complexity of a graph. Loss of graph entropy represents an increase in modularity of the graph. Based on this concept, we present a graph clustering algorithm which searches for the local optimum in modularity. The algorithm detects each optimal cluster by growing a seed in a manner that minimizes graph entropy. In the experiments with the yeast interactome, the results show that the graph entropy approach has higher accuracy in predicting protein complexes and functional modules than the best competing method. We statistically compared output clusters to both known protein complexes and Gene Ontology annotations in the biological process and molecular function categories in order to measure f-scores and p-scores as clustering accuracy. Because this algorithm is also scalable, it can be applied to the larger scale human protein interaction network.
- ABSTRACT: Protein-protein interactions are fundamental to the biological processes within a cell. In the scale-free, small-world networks that typically model protein interactions, hubs play a key role in maintaining the network structure. From the biological perspective, hubs are expected to be functionally essential proteins, participating in critical interactions of biological processes. Hubs can be classified into two different categories, party hubs (intra-module hubs) and date hubs (inter-module hubs), which vary in the timing and place of their associations with their interacting partners. This paper introduces a novel measure for identifying and differentiating party and date hubs in a protein interaction network. Our approach is based on the semantic similarity measure integrated with Gene Ontology data. Combined with the centrality measures of degree, betweenness, and closeness, we demonstrate that this measure detects potential party hubs and date hubs that match the confirmed party and date hubs with high accuracy.
Conference Paper: Decomposing protein interactome networks by graph entropy
- ABSTRACT: Recent high-throughput experimental methods have generated protein-protein interaction data on a genome scale, collectively called the interactome. Various graph clustering algorithms have been applied to protein interactome networks for identifying protein complexes and predicting functional modules. Although the previous algorithms are scalable and robust, their accuracy is still limited because of the complex connectivity of the networks. In this study, we propose a novel information-theoretic definition, Graph Entropy, as a measure of the structural complexity of a graph. Loss of graph entropy represents an increase in modularity of the graph. Based on this concept, we present a graph clustering algorithm. Starting from a random seed vertex and its neighbors as a seed cluster, the algorithm iteratively adds or removes vertices on the border of the cluster to minimize graph entropy. We make an additional improvement to the algorithm for generating overlapping clusters. In the experiments with the yeast protein interactome network, we show that the graph entropy-based approach has higher accuracy in predicting functional modules than other competing methods.
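The seed-growth procedure described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not give the entropy formula, so the sketch assumes a binary entropy per vertex over the fraction of its neighbors inside the cluster, summed over all vertices. The function names (`vertex_entropy`, `graph_entropy`, `grow_cluster`), the rule that the seed is never removed, and the toy graph are all hypothetical choices made for the example.

```python
import math

def vertex_entropy(graph, cluster, v):
    """Binary entropy of v's neighbor split (inside vs. outside the cluster).
    Assumed definition; the abstract does not state the formula."""
    neighbors = graph[v]
    if not neighbors:
        return 0.0
    p_in = sum(1 for u in neighbors if u in cluster) / len(neighbors)
    return -sum(p * math.log2(p) for p in (p_in, 1.0 - p_in) if p > 0.0)

def graph_entropy(graph, cluster):
    """Sum of vertex entropies over the whole graph for a given cluster."""
    return sum(vertex_entropy(graph, cluster, v) for v in graph)

def grow_cluster(graph, seed):
    """Start from the seed and its neighbors, then greedily add or remove
    one border vertex at a time while graph entropy strictly decreases."""
    cluster = {seed} | graph[seed]
    best = graph_entropy(graph, cluster)
    improved = True
    while improved:
        improved = False
        # Members with a neighbor outside the cluster (the seed is kept).
        inner = {v for v in cluster if v != seed and graph[v] - cluster}
        # Non-members adjacent to the cluster.
        outer = {u for v in cluster for u in graph[v]} - cluster
        for v in sorted(inner | outer):
            candidate = cluster ^ {v}  # toggle membership of v
            score = graph_entropy(graph, candidate)
            if score < best - 1e-12:
                cluster, best, improved = candidate, score, True
                break  # recompute the border after each accepted move
    return cluster

# Toy graph (hypothetical): two triangles joined by the edge c-d.
graph = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
print(sorted(grow_cluster(graph, "c")))
```

Seeding at `c` starts with the cluster `{a, b, c, d}`; removing `d` lowers the entropy, after which no single border move improves it further. Each accepted move strictly lowers the entropy, so the loop always terminates at a local optimum.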
- ABSTRACT: Protein-protein interactions play a key role in biological processes of proteins within a cell. Recent high-throughput techniques have generated protein-protein interaction data on a genome scale. A wide range of computational approaches have been applied to interactome network analysis for uncovering functional organizations and pathways. However, they have been challenged by complex connectivity. Studies have shown that protein interaction networks are typically characterized by intrinsic topological features: high modularity and a hub-oriented structure. Elucidating the structural roles of modules and hubs is a critical step in complex interactome network analysis. We propose a novel approach to convert the complex structure of an interactome network into a hierarchical ordering of proteins. This algorithm measures functional similarity between proteins based on the path strength model, and reveals a hub-oriented tree structure hidden in the complex network. We score hub confidence and identify functional modules in the tree structure of proteins retrieved by our algorithm. Our experimental results in the yeast protein interactome network demonstrate that the selected hubs are essential proteins for performing functions. In network topology, they have a role in bridging different functional modules. Furthermore, our approach has high accuracy in identifying hierarchically distributed functional modules. Decomposing, converting, and synthesizing complex interaction networks are fundamental tasks for modeling their structural behaviors. In this study, we systematically analyzed complex interactome network structures for retrieving functional information. Unlike previous hierarchical clustering methods, this approach dynamically explores the hierarchical structure of proteins in a global view. It is well suited to the interactome networks of higher-level organisms because of its efficiency and scalability.
- ABSTRACT: Biological networks having complex connectivity have been widely studied recently. By characterizing their inherent and structural behaviors from a topological perspective, these studies have attempted to discover hidden knowledge in the systems. However, even though various algorithms with graph-theoretical modeling have provided fundamentals in network analysis, the availability of practical approaches to efficiently handle the complexity has been limited. In this paper, we present a novel flow-based approach, called flowNet, to efficiently analyze large, complex networks. Our approach is based on the functional influence model that quantifies the influence of a biological component on another. We introduce a dynamic flow simulation algorithm to generate a flow pattern which is a unique characteristic for each component. The set of patterns can be used in identifying functional modules (i.e., clustering). The proposed flow simulation algorithm runs very efficiently in sparse networks. Since our approach uses a weighted network as an input, we also discuss supervised and unsupervised weighting schemes for unweighted biological networks. In experiments on the yeast protein interaction network, we demonstrate that our approach outperforms previous graph clustering methods with respect to accuracy.
- ABSTRACT: The systematic analysis of protein-protein interactions is one of the most fundamental challenges in understanding cellular organizations, processes and functions. The interaction between two proteins provides significant insight into their functional association. Recent high-throughput experiments have determined protein-protein interactions on a genome scale, collectively called the interactome. A wide range of graph-theoretic computational approaches have been presented to characterize protein functions from the interactome networks. In this work, we quantitatively analyze topological features of interactome networks. Our topological model is based on hierarchical modularity in the scale-free nature. First, we use connectivity and betweenness centrality to measure the likelihood of bridging two clusters for each node and edge. Next, we propose an efficient algorithm to detect clusters by collapsing the bridging nodes and edges. To assess the measurement of bridges, we compute the clustering coefficients of the networks which are built from successive deletion of bridging nodes. The alteration pattern of the clustering coefficients approximates the number of bridging nodes in a network. We also investigate the biological importance of bridging nodes based on protein lethality information. The modularization results show that our approach accurately identifies functional modules in the interactome network of S. cerevisiae. We finally apply our approach to predicting biological functions of uncharacterized proteins in S. cerevisiae.
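The idea in the last abstract of tracking clustering coefficients as bridging nodes are deleted can be illustrated with a small sketch. It uses the standard local clustering coefficient (the fraction of a node's neighbor pairs that are themselves connected); the abstract's actual bridging score, based on connectivity and betweenness centrality, is not specified there, so the bridging node is simply chosen by hand. The function names and the toy graph are hypothetical.

```python
from itertools import combinations

def local_clustering(graph, v):
    """Fraction of v's neighbor pairs that are directly connected."""
    neighbors = graph[v]
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(neighbors, 2) if b in graph[a])
    return 2.0 * links / (k * (k - 1))

def average_clustering(graph):
    """Mean local clustering coefficient over all nodes."""
    if not graph:
        return 0.0
    return sum(local_clustering(graph, v) for v in graph) / len(graph)

def delete_node(graph, v):
    """Return a copy of the graph with node v and its edges removed."""
    return {u: nbrs - {v} for u, nbrs in graph.items() if u != v}

# Toy graph (hypothetical): triangles a-b-c and d-e-f joined through node x.
graph = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "x"},
    "x": {"c", "d"},
    "d": {"x", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
before = average_clustering(graph)
after = average_clustering(delete_node(graph, "x"))  # x picked by hand
print(before, after)
```

In this toy graph the average clustering coefficient rises once `x` is removed, because `x` contributes unconnected neighbor pairs to itself and to the two nodes it links. Repeating the deletion over a ranked list of candidate bridges would give the kind of alteration pattern the abstract refers to.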
Waco, Texas, United States
- Department of Computer Science
University at Buffalo, The State University of New York
Buffalo, New York, United States
- Department of Computer Science and Engineering