Young-Rae Cho

Baylor University, Waco, Texas, United States

Publications (40), 29.46 Total Impact Points

  • Young-Rae Cho, Yanan Xin, Greg Speegle
    ABSTRACT: Because most complex genetic diseases are caused by defects of cell signaling, illuminating a signaling cascade is essential for understanding their mechanisms. We present three novel computational algorithms to reconstruct signaling networks between a starting protein and an ending protein using genome-wide protein-protein interaction (PPI) networks and gene ontology (GO) annotation data. A signaling network is represented as a directed acyclic graph in a merged form of multiple linear pathways. An advanced semantic similarity metric is applied for weighting PPIs as the preprocessing of all three methods. The first algorithm repeatedly extends the list of nodes based on path frequency towards an ending protein. The second algorithm repeatedly appends edges based on the occurrence of network motifs which indicate the link patterns more frequently appearing in a PPI network than in a random graph. The last algorithm uses the information propagation technique which iteratively updates edge orientations based on the path strength and merges the selected directed edges. Our experimental results demonstrate that the proposed algorithms achieve higher accuracy than previous methods when they are tested on well-studied pathways of S. cerevisiae. Furthermore, we introduce an interactive web application tool, called P-Finder, to visualize reconstructed signaling networks.
    IEEE/ACM Transactions on Computational Biology and Bioinformatics 04/2015; 12(2):309-321. DOI:10.1109/TCBB.2014.2355216 · 1.54 Impact Factor
  • Slavka Jaromerska, Petr Praus, Young-Rae Cho
    ABSTRACT: Reconstruction of signaling pathways is crucial for understanding cellular mechanisms. A pathway is represented as a path of a signaling cascade involving a series of proteins that perform a particular function. Since a protein pair involved in signaling and response has a strong interaction, putative pathways can be detected from protein-protein interaction (PPI) networks. However, predicting directed pathways from undirected genome-wide PPI networks has been challenging. We present a novel computational algorithm to efficiently predict signaling pathways from PPI networks given a starting protein and an ending protein. Our approach integrates topological analysis of PPI networks and semantic analysis of PPIs using Gene Ontology data. An advanced semantic similarity measure is used for weighting each interacting protein pair. Our distance-wise algorithm iteratively selects an adjacent protein from a PPI network to build a pathway based on a distance condition. On each iteration, the strength of a hypothetical path passing through a candidate edge is estimated by a local heuristic. We evaluate the performance by comparing the resultant paths to known signaling pathways in yeast. The results show that our approach has higher accuracy and efficiency than previous methods.
    Journal of Bioinformatics and Computational Biology 02/2014; 12(1):1450004. DOI:10.1142/S0219720014500048 · 0.93 Impact Factor
  • Young-Rae Cho, Slavka Jaromerska
    ABSTRACT: The experimental study of signal transduction over the past decade has made a substantial contribution to understanding functional mechanisms in a cell. A signaling pathway represents a linear path of a signaling cascade involving a series of proteins. As an advanced model, multiple linear pathways with extensive cross-talk between receptors can be merged into a larger-scale signaling network. We present an efficient computational approach to predict signaling networks by integrating genome-wide protein-protein interaction (PPI) data and ontological annotation data. We adopt an advanced semantic similarity metric for weighting PPIs, and an information propagation algorithm that runs on a weighted PPI network. This algorithm iteratively selects potential directed edges for the signaling cascade using user-specified path strength parameters. Our approach also includes a preprocessing step that filters the large-scale PPI network by a distance condition using the maximum path length parameter. Our experimental results show that the proposed approach runs much faster than existing computational methods and has competitive accuracy in predicting well-studied pathways of S. cerevisiae and C. elegans. The high efficiency of this approach would facilitate the development of a web-based application tool to discover potential signaling networks.
    2013 IEEE International Conference on Bioinformatics and Biomedicine (BIBM); 12/2013
  • Yilan Bai, Greg Speegle, Young-Rae Cho
    ABSTRACT: A signaling pathway, which is represented as a chain of interacting proteins for a biological process, can be predicted from protein-protein interaction (PPI) networks. However, pathway prediction is computationally challenging because of (1) inefficiency in searching all possible paths from the large-scale PPI networks and (2) unreliability of current PPI data generated by automated high-throughput methods. In this paper, we propose a novel approach to efficiently predict signaling pathways from PPI networks when a starting protein (source) and an ending protein (target) are given. Our approach is a combination of topological analysis of the networks and ontological analysis of interacting proteins. Starting from the source, this method repeatedly extends the list of proteins to form a pathway based on the improved support model (iSup). This model integrates (1) the frequency of the paths towards the target and (2) the semantic similarity between each adjacent pair in a pathway. The path frequency is computed by a heuristic data-mining technique to determine the most frequent paths towards the target in a PPI network. The semantic similarity is measured by the distance of the information contents of Gene Ontology (GO) terms annotating interacting proteins. To further improve computational efficiency, we propose two additional strategies: filtering the PPI networks and precomputing approximate path frequency. The experiment with the yeast PPI data demonstrates that our approach predicted MAPK signaling pathways with higher accuracy and efficiency than other existing methods.
    2013 IEEE International Conference on Bioinformatics and Biomedicine (BIBM); 12/2013
  • Authors not listed
    ABSTRACT: Protein-protein interactions (PPIs) play a key role in understanding the mechanisms of cellular processes. The availability of interactome data has catalyzed the development of computational approaches to elucidate functional behaviors of proteins on a system level. Gene Ontology (GO) and its annotations are a significant resource for functional characterization of proteins. Because of their wide coverage, GO data have often been adopted as a benchmark for protein function prediction on the genomic scale. We propose a computational approach, called M-Finder, for functional association pattern mining. This method employs semantic analytics to integrate the genome-wide PPIs with GO data. We also introduce an interactive web application tool that visualizes a functional association network linked to a protein specified by a user. The proposed approach comprises two major components. First, the PPIs that have been generated by high-throughput methods are weighted in terms of their functional consistency using GO and its annotations. We assess two advanced semantic similarity metrics which quantify the functional association level of each interacting protein pair. We demonstrate that these measures outperform other existing methods by evaluating their agreement with other biological features, such as sequence similarity, the presence of common Pfam domains, and core PPIs. Second, an information flow-based algorithm is employed to efficiently discover the set of proteins functionally associated with a query protein, together with their links. This algorithm reconstructs a functional association network of the query protein. The output network size can be flexibly determined by parameters. M-Finder provides a useful framework to investigate functional association patterns of any protein. This software will also allow users to perform further systematic analysis of a set of proteins for any specific function. It is available online at http://bionet.ecs.baylor.edu/mfinder.
    Proteome Science 11/2013; 11(Suppl 1):S3. DOI:10.1186/1477-5956-11-S1-S3 · 1.88 Impact Factor
  • True Price, Francisco I Peña, Young-Rae Cho
    ABSTRACT: Predicting protein complexes from protein-protein interaction (PPI) networks has been the focus of many computational approaches over the last decade. These methods tend to vary in performance based on the structure of the network and the parameters provided to the algorithm. Here, we evaluate the merits of enhancing PPI networks with semantic similarity edge weights using Gene Ontology (GO) and its annotation data. We compare the cluster features and predictive efficacy of six well-known unweighted protein complex detection methods (Clique Percolation, MCODE, DPClus, IPCA, Graph Entropy, and CoAch) against updated weighted implementations. We conclude that incorporating semantic similarity edge weighting in PPI network analysis unequivocally increases the performance of these methods.
    Interdisciplinary Sciences Computational Life Sciences 09/2013; 5(3):196-210. DOI:10.1007/s12539-013-0174-9 · 0.66 Impact Factor
  • Pietro Hiram Guzzi, Young-Rae Cho
    Interdisciplinary Sciences Computational Life Sciences 09/2013; 5(3):165-6. DOI:10.1007/s12539-013-0175-8 · 0.66 Impact Factor
  • Tak Chien Chiam, Young-Rae Cho
    ABSTRACT: Recent computational techniques have facilitated analyzing genome-wide protein-protein interaction data for several model organisms. Various graph-clustering algorithms have been applied to protein interaction networks on the genomic scale for predicting the entire set of potential protein complexes. In particular, density-based clustering algorithms that can generate overlapping clusters, i.e., clusters that share a set of nodes, are well suited to protein complex detection because each protein could be a member of multiple complexes. However, their accuracy is still limited because of the complex overlap patterns of their output clusters. We present a systematic approach for refining the overlapping clusters identified from protein interaction networks. We have designed novel metrics to assess cluster overlaps: overlap coverage and overlapping consistency. We then propose an overlap refinement algorithm. It takes as input the clusters produced by existing density-based graph-clustering methods and generates a set of refined clusters by parameterizing the metrics. To evaluate protein complex prediction accuracy, we used the f-measure, comparing each refined cluster to known protein complexes. The experimental results with the yeast protein-protein interaction data sets from BioGRID and DIP demonstrate that protein complex prediction accuracy increased significantly after refining cluster overlaps. The effectiveness of the proposed cluster overlap refinement approach for protein complex detection has been validated in this study. Analyzing the overlaps of clusters from protein interaction networks is a crucial task for understanding the functional roles of proteins and the topological characteristics of functional systems.
    Proteome Science 06/2012; 10(Suppl 1):S3. DOI:10.1186/1477-5956-10-S1-S3 · 1.88 Impact Factor
  • F.I. Pena, Young-Rae Cho
    ABSTRACT: The generation of protein-protein interaction (PPI) data has created the need for efficient computational approaches that can discover highly modular clusters of good quality. These clusters represent protein complexes or functional modules. A number of seed-growth style algorithms exist to identify protein complexes from genome-wide PPI networks. However, these methods lose accuracy when the networks are comparatively large and have complex connectivity. To combat the noise in these large PPI networks, we propose an improvement to the graph entropy approach, which is one of the seed-growth style algorithms. As a novel information-theoretic definition, graph entropy is a measure of the structural complexity of a graph; a loss of entropy represents an increase in the modularity of the graph. The original algorithm considers only the interconnections among vertices, whereas the modified definition also considers edge weights. These edge weights are obtained by measuring the semantic similarity of PPIs. The weighted graph entropy approach is applied to the S. cerevisiae PPI data set from BioGRID. The output clusters are compared with known protein complexes to calculate f-scores, which are used to evaluate the accuracy of the clusters. The proposed improvement to the graph entropy approach enhances the quality of clusters as potential protein complexes compared with the other seed-growth style algorithms.
    2012 IEEE International Conference on Bioinformatics and Biomedicine Workshops (BIBMW); 01/2012
  • Young-Rae Cho, Tak Chien Chiam, Yanxin Lu
    ABSTRACT: Protein-protein interactions (PPIs) play a key role in understanding the functional behavior of genes. Discovering association patterns from PPI networks is crucial for functional characterization on a system level. We present a novel approach to discover the functional association pattern of a query gene from genome-wide PPI networks. This approach consists of two major components. First, we transform the PPI network into a weighted graph representation by measuring semantic similarity. Three enhanced semantic similarity methods are proposed to estimate the functional closeness of each interacting pair. Second, we apply a dynamic propagation algorithm to detect the functional association pattern of a gene, represented as a sub-network. The size of the sub-networks is flexibly determined by user-specified parameters. In this paper, we also introduce an interactive web application, called M-Finder, to visualize the functional association pattern of a gene entered by a user. The semantic similarity measures and the dynamic propagation algorithm are embedded in this tool to run on up-to-date PPI networks of model species. M-Finder allows users to carry out further systematic analysis for functional characterization on the genomic scale.
    2012 IEEE International Conference on Bioinformatics and Biomedicine (BIBM); 01/2012
  • Edward Casey Kenley, Young-Rae Cho
    ABSTRACT: Recent high-throughput experiments have generated protein–protein interaction data on a genomic scale, yielding the complete interactome for several organisms. Various graph clustering algorithms have been applied to protein interaction networks for identifying protein complexes and functional modules. Although the previous algorithms are scalable and robust, their accuracy is still limited because of the complex connectivity found in protein interaction networks. In this study, we propose a novel information-theoretic definition, graph entropy, as a measure of the structural complexity of a graph. Loss of graph entropy represents an increase in modularity of the graph. Based on this concept, we present a graph clustering algorithm which searches for the local optimum in modularity. The algorithm detects each optimal cluster by growing a seed in a manner that minimizes graph entropy. In the experiments with the yeast interactome, the results show that the graph entropy approach has higher accuracy in predicting protein complexes and functional modules than the best competing method. We statistically compared output clusters to both known protein complexes and Gene Ontology annotations in the biological process and molecular function categories in order to measure f-scores and p-scores as clustering accuracy. Because this algorithm is also scalable, it can be applied to the larger scale human protein interaction network.
    Proteomics 10/2011; 11(19):3835-3844. DOI:10.1002/pmic.201100193 · 3.97 Impact Factor
  • Authors not listed
    ABSTRACT: Protein-protein interactions are fundamental to the biological processes within a cell. In the scale-free, small-world topology typical of protein interaction networks, hubs play a key role in maintaining the network structure. From the biological perspective, hubs are expected to be functionally essential proteins, participating in critical interactions of biological processes. Hubs can be classified into two categories, party hubs (intra-module hubs) and date hubs (inter-module hubs), which differ in the timing and location of their associations with their interacting partners. This paper introduces a novel measure for identifying and differentiating party and date hubs in a protein interaction network. Our approach is based on a semantic similarity measure integrated with Gene Ontology data. Combined with the centrality measures of degree, betweenness, and closeness, we demonstrate that this measure detects potential party hubs and date hubs that match the confirmed party and date hubs with high accuracy.
  • Edward Casey Kenley, Young-Rae Cho
    ABSTRACT: Complex systems have been widely studied to characterize their structural behaviors from a topological perspective. High modularity is one of the recurrent features of real-world complex systems. Various graph clustering algorithms have been applied to identifying communities in social networks or modules in biological networks. However, their applicability to real-world systems has been limited because of the massive scale and complex connectivity of the networks. In this study, we exploit a novel information-theoretic model for graph clustering. The entropy-based clustering approach finds locally optimal clusters by growing a random seed in a manner that minimizes graph entropy. We design and analyze modifications that further improve its performance. Assigning priority in seed selection and seed growth is well suited to scale-free networks characterized by a hub-oriented structure. Computing seed growth in parallel streams also decomposes an extremely large network efficiently. The experimental results with real biological and social networks show that the entropy-based approach performs better than competing methods in terms of accuracy and efficiency.
    11th IEEE International Conference on Data Mining, ICDM 2011, Vancouver, BC, Canada, December 11-14, 2011; 01/2011
  • Nicholas Soltau, Young-Rae Cho
    ABSTRACT: Recent computational techniques have facilitated analyzing genome-wide protein-protein interactions. Various graph-clustering algorithms have been applied to the protein interaction networks for identifying protein complexes and functional modules. Since each protein performs multiple functions, a clustering algorithm should be able to produce overlapping clusters. In this paper, we use the seed-refinement algorithm to generate a set of preliminary overlapping clusters. Next, for further refining the preliminary clusters, we carry out a systematic analysis of their overlaps by novel metrics: overlap coverage and overlapping consistency. We propose the cluster-merging algorithm to yield final clusters by parameterizing the metrics. In the test with the yeast protein-protein interaction network, we demonstrate the proposed approach improves accuracy on detecting protein complexes and functional modules by optimizing the parameter values.
    IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2011, Atlanta, GA, USA, 12-15 November, 2011; 01/2011
  • Young-Rae Cho, Aidong Zhang
    ABSTRACT: Protein-protein interactions play a key role in the biological processes of proteins within a cell. Recent high-throughput techniques have generated protein-protein interaction data on a genome scale. A wide range of computational approaches have been applied to interactome network analysis for uncovering functional organizations and pathways. However, they have been challenged by the complex connectivity of these networks. Protein interaction networks have been shown to be typically characterized by intrinsic topological features: high modularity and a hub-oriented structure. Elucidating the structural roles of modules and hubs is a critical step in complex interactome network analysis. We propose a novel approach to convert the complex structure of an interactome network into a hierarchical ordering of proteins. This algorithm measures functional similarity between proteins based on the path strength model and reveals a hub-oriented tree structure hidden in the complex network. We score hub confidence and identify functional modules in the tree structure of proteins retrieved by our algorithm. Our experimental results on the yeast protein interactome network demonstrate that the selected hubs are essential proteins for performing functions. In the network topology, they bridge different functional modules. Furthermore, our approach has high accuracy in identifying hierarchically distributed functional modules. Decomposing, converting, and synthesizing complex interaction networks are fundamental tasks for modeling their structural behaviors. In this study, we systematically analyzed complex interactome network structures for retrieving functional information. Unlike previous hierarchical clustering methods, this approach dynamically explores the hierarchical structure of proteins from a global view. Because of its efficiency and scalability, it is well suited to the interactome networks of higher organisms.
    BMC Bioinformatics 04/2010; 11(Suppl 3):S3. DOI:10.1186/1471-2105-11-S3-S3 · 2.67 Impact Factor
  • Young-Rae Cho, Lei Shi, Aidong Zhang
    ABSTRACT: Biological networks with complex connectivity have been widely studied recently. By characterizing their inherent and structural behaviors from a topological perspective, these studies have attempted to discover hidden knowledge in the systems. However, even though various algorithms based on graph-theoretical modeling have provided foundations for network analysis, practical approaches that efficiently handle this complexity have been limited. In this paper, we present a novel flow-based approach, called flowNet, to efficiently analyze large, complex networks. Our approach is based on the functional influence model, which quantifies the influence of one biological component on another. We introduce a dynamic flow simulation algorithm to generate a flow pattern, which is a unique characteristic of each component. The set of patterns can be used to identify functional modules (i.e., clustering). The proposed flow simulation algorithm runs very efficiently on sparse networks. Since our approach takes a weighted network as input, we also discuss supervised and unsupervised weighting schemes for unweighted biological networks. In experiments on the yeast protein interaction network, we demonstrate that our approach outperforms previous graph clustering methods with respect to accuracy.
    Ninth IEEE International Conference on Data Mining (ICDM '09), 2009; 01/2010
  • Hao Lian, Chengsen Song, Young-Rae Cho
    ABSTRACT: Recent high-throughput experimental methods have generated protein-protein interaction data in the genome scale, called interactome. Various graph clustering algorithms have been applied to the protein interactome networks for identifying protein complexes and predicting functional modules. Although the previous algorithms are scalable and robust, their accuracy is still limited because of complex connectivity of the networks. In this study, we propose a novel information-theoretic definition, Graph Entropy, as a measure of structural complexity of a graph. Loss of graph entropy represents an increase in modularity of the graph. Based on this concept, we present a graph clustering algorithm. Starting from a random seed vertex and its neighbors as a seed cluster, the algorithm iteratively adds or removes vertices on the border of the cluster to minimize graph entropy. We make an additional improvement on the algorithm for generating overlapping clusters. In the experiments with the yeast protein interactome network, we show the graph entropy-based approach has higher accuracy in predicting functional modules than other competing methods.
    2010 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2010, Hong Kong, China, 18 - 21 December 2010, Proceedings; 01/2010
  • Young-Rae Cho, Aidong Zhang
  • Young-Rae Cho, Aidong Zhang
    01/2010; 1:20-35. DOI:10.4018/jkdb.2010070102
  • Lei Shi, Young-Rae Cho, Aidong Zhang
    ABSTRACT: Protein-protein interactions (PPIs) play fundamental roles in nearly all biological processes and differ based on the composition, affinity and lifetime of the association. A vast amount of PPI data for various organisms is available from MIPS, DIP and other sources. The identification of functional modules in PPI network is of great interest because they often reveal unknown functional ties between proteins and hence predict functions for unknown proteins. In this paper, we propose using functional flow simulation and the topology of the network for the functional module detection and function prediction problem. Our approach is based on the functional influence model that quantifies the influence of a biological component on another. We introduce a flow simulation algorithm to generate a functional profile for each component. In addition, a new clustering method FMD (Functional Module Detection) is designed to associate with functional profiles to detect functional modules. We evaluate the proposed technique on three different yeast networks with MIPS functional categories and compare it with several other existing techniques in terms of precision and recall. Our experiments show that our approach achieves better accuracy than other existing methods.
    10th IEEE International Conference on Bioinformatics and Bioengineering, BIBE 2010, Philadelphia, Pennsylvania, USA, May 31-June 3 2010; 01/2010
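
Several of the abstracts above (Kenley and Cho; Lian, Song and Cho; Pena and Cho) describe clustering by growing a seed so as to minimize graph entropy. The Python sketch below is an illustrative reading of that idea, not the authors' implementation. It assumes that a vertex's entropy is the binary entropy of the fraction of its neighbours inside the current cluster, that graph entropy is the sum of vertex entropies, and that growth first prunes seed neighbours and then adds boundary vertices while entropy keeps dropping. The function names are hypothetical, and the weighted variants mentioned in the abstracts are not modeled.

```python
import math
from typing import Dict, Set

# Undirected graph as an adjacency map: vertex -> set of neighbours.
Graph = Dict[str, Set[str]]


def vertex_entropy(graph: Graph, v: str, cluster: Set[str]) -> float:
    """Binary entropy of the share of v's neighbours that lie inside the cluster."""
    neighbours = graph[v]
    if not neighbours:
        return 0.0
    p_in = len(neighbours & cluster) / len(neighbours)
    entropy = 0.0
    for p in (p_in, 1.0 - p_in):
        if p > 0.0:
            entropy -= p * math.log2(p)
    return entropy


def graph_entropy(graph: Graph, cluster: Set[str]) -> float:
    """Assumed graph entropy: sum of vertex entropies over every vertex."""
    return sum(vertex_entropy(graph, v, cluster) for v in graph)


def grow_cluster(graph: Graph, seed: str) -> Set[str]:
    """Hypothetical seed-growth loop that only accepts entropy-lowering moves."""
    cluster = {seed} | graph[seed]
    best = graph_entropy(graph, cluster)

    # Phase 1: drop seed neighbours whose removal lowers graph entropy.
    for v in sorted(graph[seed]):
        candidate = cluster - {v}
        e = graph_entropy(graph, candidate)
        if e < best:
            cluster, best = candidate, e

    # Phase 2: add outer-boundary vertices while doing so keeps lowering entropy.
    improved = True
    while improved:
        improved = False
        boundary = set().union(*(graph[u] for u in cluster)) - cluster
        for v in sorted(boundary):
            candidate = cluster | {v}
            e = graph_entropy(graph, candidate)
            if e < best:
                cluster, best = candidate, e
                improved = True
    return cluster


if __name__ == "__main__":
    # Two triangles joined by the single edge c-d.
    edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"),
             ("d", "e"), ("e", "f"), ("d", "f")]
    g: Graph = {}
    for x, y in edges:
        g.setdefault(x, set()).add(y)
        g.setdefault(y, set()).add(x)
    print(sorted(grow_cluster(g, "a")))  # ['a', 'b', 'c']
```

On the toy graph, growing from "a" stops at its own triangle: pulling "d" across the bridge would raise the entropy of "e" and "f", so the move is rejected. Recomputing the full graph entropy on every move keeps the sketch short; a practical version would update only the vertices adjacent to the moved one.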
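
Several abstracts (Chiam and Cho; Kenley and Cho; Pena and Cho) evaluate predicted clusters by an f-measure against known protein complexes. The abstracts do not spell out the exact matching rule, so the sketch below assumes a common formulation: precision and recall are computed from the overlap between a cluster and a complex, combined as their harmonic mean, and each cluster is scored against its best-matching complex. The function names are illustrative only.

```python
from typing import List, Sequence, Set


def f_measure(cluster: Set[str], known: Set[str]) -> float:
    """Harmonic mean of precision |C & K|/|C| and recall |C & K|/|K|."""
    overlap = len(cluster & known)
    if overlap == 0:
        return 0.0
    precision = overlap / len(cluster)
    recall = overlap / len(known)
    return 2 * precision * recall / (precision + recall)


def best_match_f_scores(clusters: Sequence[Set[str]],
                        complexes: Sequence[Set[str]]) -> List[float]:
    """For each predicted cluster, its f-measure against the best-matching known complex."""
    return [max((f_measure(c, k) for k in complexes), default=0.0)
            for c in clusters]


if __name__ == "__main__":
    predicted = [{"a", "b", "c"}, {"e", "x"}]
    known = [{"a", "b", "c", "d"}, {"e", "f"}]
    print(best_match_f_scores(predicted, known))
```

In the usage example, the first cluster covers three of the four members of the first complex (precision 1, recall 0.75, f about 0.857), and the second shares one of two members with the second complex (f = 0.5). Studies often also threshold these scores or average them over all clusters; those choices are not stated in the abstracts.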

Publication Stats

289 Citations
29.46 Total Impact Points

Institutions

  • 2009–2015
    • Baylor University
      • Department of Computer Science
      Waco, Texas, United States
  • 2–2009
    • University at Buffalo, The State University of New York
      • Department of Computer Science and Engineering
      Buffalo, New York, United States
  • 2007
    • State University of New York
      New York City, New York, United States