ABSTRACT: Biological networks obtained by high-throughput profiling or human curation are typically noisy. For functional module identification, single network clustering algorithms may not yield accurate and robust results. In order to borrow information across multiple sources to alleviate such problems due to data quality, we propose a new joint network clustering algorithm ASModel in this paper. We construct an integrated network to combine network topological information based on protein-protein interaction (PPI) datasets and homological information introduced by constituent similarity between proteins across networks. A novel random walk strategy on the integrated network is developed for joint network clustering and an optimization problem is formulated by searching for low conductance sets defined on the derived transition matrix of the random walk, which fuses both topology and homology information. The optimization problem of joint clustering is solved by a derived spectral clustering algorithm. Network clustering using several state-of-the-art algorithms has been applied both to PPI networks within the same species (two yeast PPI networks and two human PPI networks) and to those from different species (a yeast PPI network and a human PPI network). Experimental results demonstrate that ASModel outperforms the existing single network clustering algorithms as well as another recent joint clustering algorithm in terms of complex prediction and Gene Ontology (GO) enrichment analysis.
ABSTRACT: Identifying functional modules in protein-protein interaction (PPI) networks may shed light on cellular functional organization and thereafter underlying cellular mechanisms. Many existing module identification algorithms aim to detect densely connected groups of proteins as potential modules. However, based on this simple topological criterion of "higher than expected connectivity", those algorithms may miss biologically meaningful modules of functional significance, in which proteins have similar interaction patterns to other proteins in networks but may not be densely connected to each other. A few blockmodel module identification algorithms have been proposed to address the problem, but the lack of a global optimum guarantee and the prohibitive computational complexity have been the bottleneck for their application to real-world large-scale PPI networks.
In this paper, we propose a novel optimization formulation LCP(2) using the concept of Markov random walk on graphs, which enables simultaneous identification of both dense and sparse modules based on protein interaction patterns in given networks through searching for low two-hop conductance sets by random walk. A spectral approximate algorithm SLCP(2) is derived to identify non-overlapping functional modules. Based on a bottom-up greedy strategy, we further extend LCP(2) to a new algorithm GLCP(2) to identify overlapping functional modules. We compare SLCP(2) and GLCP(2) with a range of state-of-the-art algorithms on synthetic networks and real-world PPI networks. The performance evaluation based on several criteria with respect to protein complex prediction, high level GO (Gene Ontology) term prediction, and especially, sparse module detection, has demonstrated that our algorithms based on searching for low two-hop conductance sets outperform all other compared algorithms.
All data and code are available at http://www.cse.usf.edu/xqian/fmi/slcp2hop/.
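The contrast between one-hop and two-hop conductance described in this abstract can be illustrated with a minimal sketch. It assumes that "two-hop conductance" means the ordinary conductance of a node set measured on the two-step transition matrix P·P; the paper's exact LCP(2) objective may differ, and the toy network and all function names here are hypothetical.

```python
def transition_matrix(adjacency):
    """Row-normalize an adjacency matrix into random-walk transition probabilities."""
    return [[w / sum(row) for w in row] for row in adjacency]


def two_hop(p):
    """Transition probabilities of two consecutive steps of the walk (P times P)."""
    n = len(p)
    return [[sum(p[i][k] * p[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def conductance(p, pi, subset):
    """Probability that one step of chain `p` leaves `subset`, given a stationary start inside it."""
    inside = set(subset)
    escape = sum(pi[i] * p[i][j] for i in inside for j in range(len(p)) if j not in inside)
    return escape / sum(pi[i] for i in inside)


# Hypothetical toy network: proteins 0 and 1 never interact with each other
# but share the partners 2 and 3; protein 4 hangs off protein 2.
adjacency = [
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 0],
    [1, 1, 0, 0, 1],
    [1, 1, 0, 0, 0],
    [0, 0, 1, 0, 0],
]
degrees = [sum(row) for row in adjacency]
pi = [d / sum(degrees) for d in degrees]  # stationary distribution of a walk on an undirected graph
p1 = transition_matrix(adjacency)
p2 = two_hop(p1)

one_hop_value = conductance(p1, pi, [0, 1])
two_hop_value = conductance(p2, pi, [0, 1])
print(one_hop_value, two_hop_value)
```

For the sparse set {0, 1}, every single step leaves the set, so the one-hop value is 1, while the two-hop value is 1/6 because the walk usually returns to a protein with the same interaction partners. A criterion based only on dense connectivity would not group these two proteins.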
ABSTRACT: Real-world problems often involve complex systems that cannot be perfectly modeled or identified, and many engineering applications aim to design operators that can perform reliably in the presence of such uncertainty. In this paper, we propose a novel Bayesian framework for objective-based uncertainty quantification (UQ), which quantifies the uncertainty in a given system based on the expected increase of the operational cost that it induces. This measure of uncertainty, called MOCU (mean objective cost of uncertainty), provides a practical way of quantifying the effect of various types of system uncertainties on the operation of interest. Furthermore, the proposed UQ framework provides a general mathematical basis for designing robust operators, and it can be applied to diverse applications, including robust filtering, classification, and control. We demonstrate the utility and effectiveness of the proposed framework by applying it to the problem of robust structural intervention of gene regulatory networks, an important application in translational genomics.
IEEE Transactions on Signal Processing 05/2013; 61(9):2256-2266. · 2.81 Impact Factor
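The idea of measuring uncertainty by the expected increase in operational cost can be sketched for a finite case. The example below assumes a finite uncertainty class of candidate models, a finite set of operators, and a known cost table; the numbers and the function name `mocu` are hypothetical and are not taken from the paper.

```python
def mocu(prior, cost):
    """Return (robust operator index, expected extra cost due to model uncertainty).

    prior[t]   : probability that candidate model t is the true one
    cost[t][a] : operational cost of operator a when model t is true
    """
    n_ops = len(cost[0])
    # Expected cost of each operator over the uncertainty class.
    expected = [sum(p * row[a] for p, row in zip(prior, cost)) for a in range(n_ops)]
    # The robust operator minimizes the expected cost.
    robust = min(range(n_ops), key=expected.__getitem__)
    # Expected gap between the robust operator and the model-specific optimum.
    value = sum(p * (row[robust] - min(row)) for p, row in zip(prior, cost))
    return robust, value


# Hypothetical example: two equally likely models, three operators.
prior = [0.5, 0.5]
cost = [
    [0.0, 1.0, 0.4],  # costs when model 0 is true
    [1.0, 0.0, 0.4],  # costs when model 1 is true
]
robust_op, uncertainty_cost = mocu(prior, cost)
print(robust_op, uncertainty_cost)
```

Here operator 2 is the robust choice (expected cost 0.4 versus 0.5 for the others), and the uncertainty costs 0.4 on average relative to knowing the true model. If the prior collapsed onto a single model, the value would drop to zero.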
ABSTRACT: Identifying functional modules and understanding their organization in biological networks is of great importance. Recently, module identification by block modeling has demonstrated its advantages over existing algorithms that consider only topologically "cohesive" modules. In this paper, we aim to identify biologically meaningful functional modules by not only considering topologically "cohesive" modules but also taking into account the modules with nodes sparsely connected but sharing similar interaction patterns. In our adopted block modeling framework, we propose a novel efficient optimization algorithm by combining Simulated Annealing (SA) and Path Relinking (PR) to solve this difficult combinatorial optimization problem. We have evaluated the performance of our algorithm on a set of synthetic benchmark networks and a human protein-protein interaction (PPI) network. Our results show that our new SAPR algorithm achieves higher accuracy than existing state-of-the-art algorithms. The new algorithm also has significantly reduced computation time compared to the traditional SA algorithm with competitive accuracy. Preliminary results for identifying functional modules in the human PPI network and the comparison with the commonly adopted Markov Clustering (MCL) algorithm have demonstrated the potential of our algorithm to discover new types of modules, within which proteins are sparsely connected but with significantly enriched biological functionalities.
Proceedings of the ACM Conference on Bioinformatics, Computational Biology and Biomedicine; 10/2012
ABSTRACT: A salient purpose for studying gene regulatory networks is to derive intervention strategies to identify potential drug targets and design gene-based therapeutic intervention. Optimal and approximate intervention strategies based on the transition probability matrix of the underlying Markov chain have been studied extensively for probabilistic Boolean networks. While the key goal of control is to reduce the steady-state probability mass of undesirable network states, in practice it is important to limit collateral damage and this constraint should be taken into account when designing intervention strategies with network models. In this paper, we propose two new phenotypically constrained stationary control policies by directly investigating the effects on the network long-run behavior. They are derived to reduce the risk of visiting undesirable states in conjunction with constraints on the shift of undesirable steady-state mass so that only limited collateral damage can be introduced. We have studied the performance of the new constrained control policies together with the previous greedy control policies on randomly generated probabilistic Boolean networks. A preliminary example for intervening in a metastatic melanoma network is also given to show their potential application in designing genetic therapeutics to reduce the risk of entering both aberrant phenotypes and other ambiguous states corresponding to complications or collateral damage. Experiments on both random network ensembles and the melanoma network demonstrate that, in general, the new proposed control policies exhibit the desired performance. As shown by intervening in the melanoma network, these control policies can potentially serve as future practical gene therapeutic intervention strategies.
IEEE/ACM Transactions on Computational Biology and Bioinformatics 03/2012; · 1.62 Impact Factor
ABSTRACT: One of the ultimate objectives of studying gene regulatory networks is to derive potential intervention strategies to avoid aberrant cellular behavior. Boolean networks (BNs) and their stochastic extension, probabilistic Boolean networks (PBNs), provide a convenient framework to design different types of intervention strategies. In this paper, we focus on studying structural intervention, in which we perturb regulatory Boolean functions to alter the long-term network dynamics to obtain desirable behavior. Specifically, we extend our previous work that derives optimal structural intervention for rank-1 function perturbations to more general solutions for arbitrary rank-k function perturbations. The analytic solution is derived using the Sherman-Morrison-Woodbury (SMW) formula. We apply the derived structural intervention to a mutated mammalian cell cycle network. Our results show that our intervention strategy correctly identifies the main targets to stop uncontrolled cell growth in the mutated cell cycle network.
Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on; 01/2012 · 4.63 Impact Factor
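The Sherman-Morrison-Woodbury identity named in this abstract states that (A + UCV)^-1 = A^-1 - A^-1 U (C^-1 + V A^-1 U)^-1 V A^-1, so a rank-k update of A only requires inverting k-by-k matrices once A^-1 is known. The sketch below checks the identity with exact rational arithmetic on a small example; the matrices are arbitrary illustrations and the helper names are hypothetical, and it does not reproduce the paper's derivation for Boolean network perturbations.

```python
from fractions import Fraction


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add(a, b, sign=1):
    """Entrywise a + sign * b."""
    return [[x + sign * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def inverse(m):
    """Gauss-Jordan inverse using exact fractions (assumes `m` is invertible)."""
    n = len(m)
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def woodbury_inverse(a_inv, u, c, v):
    """Inverse of A + U C V computed from a known inverse of A."""
    core = inverse(add(inverse(c), matmul(matmul(v, a_inv), u)))  # k-by-k system
    correction = matmul(matmul(matmul(a_inv, u), core), matmul(v, a_inv))
    return add(a_inv, correction, sign=-1)


# Arbitrary 3-by-3 matrix with a rank-2 update (k = 2).
A = [[4, 1, 0], [1, 3, 0], [0, 0, 2]]
U = [[1, 0], [0, 1], [1, 1]]
C = [[1, 0], [0, 2]]
V = [[1, 0, 1], [0, 1, 0]]

direct = inverse(add(A, matmul(matmul(U, C), V)))
via_woodbury = woodbury_inverse(inverse(A), U, C, V)
print(via_woodbury == direct)
```

Exact fractions are used only so that the two results can be compared with `==`; a floating-point implementation would compare within a tolerance instead.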
ABSTRACT: To identify highly discriminating biomarkers for better disease prognosis and diagnosis, we present two new network-based methods that search for the cliques with the maximum node and edge weights that integrate both individual discriminating power and pairwise synergistic interactions. Under this novel framework of Maximum Weighted Multiple Clique Problem (MWMCP), we have derived the first analytical algorithm based on the column generation method for its optimal solution. We have also developed a sequential heuristic solution for large-scale networks. In a preliminary study of immunologic and metabolic indices regarding the development of Type-1 Diabetes (T1D) from the Diabetes Prevention Trial-Type 1 (DPT-1) study, we have shown that the proposed methods can identify important biomarkers for T1D onset.
Genomic Signal Processing and Statistics, (GENSIPS), 2012 IEEE International Workshop on; 01/2012
ABSTRACT: The diverse cellular mechanisms that sustain the life of living organisms are carried out by numerous biomolecules, such as deoxyribonucleic acids (DNAs), ribonucleic acids (RNAs), and proteins. Over the past few decades, significant research efforts have been made to sequence the genomes of various species and to search these genomes to track down genes that give rise to proteins and noncoding RNAs (ncRNAs). As a result, the catalog of known functional molecules in cells has experienced a rapid expansion.
IEEE Signal Processing Magazine 01/2012; 29(1):22-34. · 3.37 Impact Factor
ABSTRACT: The segmentation of ultrasound images is challenging due to the difficulty of appropriate modeling of their appearance variations including speckle as well as signal dropout. We propose a novel automatic segmentation method for 2D cardiac ultrasound images based on hidden Markov models (HMMs). By directly exploiting the local image characteristics around contour points in images and integrating them into contour-based HMMs, we solve the segmentation problem by graph matching using an efficient dynamic programming algorithm. Due to the direct integration of local properties in our HMMs, our segmentation method automatically deals with inhomogeneity but avoids the complexities of explicit appearance modeling in classical Maximum A Posteriori (MAP) approaches. The optimization for contour extraction is straightforward and guarantees globally optimal results. We successfully applied our method to segment the endocardium in short-axis cardiac ultrasound images. The method can also be used for other image modalities in the presence of image inhomogeneity.
Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on; 06/2011 · 4.63 Impact Factor
ABSTRACT: Comparative network analysis aims to identify common subnetworks in biological networks. It can facilitate the prediction of conserved functional modules across different species and provide deep insights into their underlying regulatory mechanisms. Recently, it has been shown that hidden Markov models (HMMs) can provide a flexible and computationally efficient framework for modeling and comparing biological networks.
In this work, we show that using global correspondence scores between molecules can improve the accuracy of the HMM-based network alignment results. The global correspondence scores are computed by performing a semi-Markov random walk on the networks to be compared. The resulting score naturally integrates the sequence similarity between molecules and the topological similarity between their molecular interactions, thereby providing a more effective measure for estimating the functional similarity between molecules. By incorporating the global correspondence scores, instead of relying on sequence similarity or functional annotation scores used by previous approaches, our HMM-based network alignment method can identify conserved subnetworks that are functionally more coherent.
Performance analysis based on synthetic and microbial networks demonstrates that the proposed network alignment strategy significantly improves the robustness and specificity of the predicted alignment results, in terms of conserved functional similarity measured based on KEGG ortholog (KO) groups. These results clearly show that the HMM-based network alignment framework using global correspondence scores can effectively find conserved biological pathways and has the potential to be used for automatic functional annotation of biomolecules.
ABSTRACT: Human immunodeficiency virus type one (HIV-1) is the major pathogen that causes the acquired immune deficiency syndrome (AIDS). With the availability of large-scale protein-protein interaction (PPI) measurements, comparative network analysis can provide a promising way to study the host-virus interactions and their functional significance in the pathogenesis of AIDS. Until now, there have been a large number of HIV studies based on various animal models. In this paper, we present a novel framework for studying the host-HIV interactions through comparative network analysis across different species.
Based on the proposed framework, we test our hypothesis that HIV-1 attacks essential biological pathways that are conserved across species. We selected the Homo sapiens and Mus musculus PPI networks with the largest coverage among the PPI networks that are available from public databases. By using a local network alignment algorithm based on hidden Markov models (HMMs), we first identified the pathways that are conserved in both networks. Next, we analyzed the HIV-1 susceptibility of these pathways, in comparison with random pathways in the human PPI network. Our analysis shows that the conserved pathways have a significantly higher probability of being intercepted by HIV-1. Furthermore, Gene Ontology (GO) enrichment analysis shows that most of the enriched GO terms are related to signal transduction, which has been conjectured to be one of the major mechanisms targeted by HIV-1 for the takeover of the host cell.
This proof-of-concept study clearly shows that the comparative analysis of PPI networks across different species can provide important insights into the host-HIV interactions and the detailed mechanisms of HIV-1. We expect that comparative multiple network analysis of various species that have different levels of susceptibility to similar lentiviruses may provide a very effective framework for generating novel, and experimentally verifiable hypotheses on the mechanisms of HIV-1. We believe that the proposed framework has the potential to expedite the elucidation of the important mechanisms of HIV-1, and ultimately, the discovery of novel anti-HIV drugs.
ABSTRACT: One of the most important goals of the mathematical modeling of gene regulatory networks is to alter their behavior toward desirable phenotypes. Therapeutic techniques are derived for intervention in terms of stationary control policies. In large networks, it becomes computationally burdensome to derive an optimal control policy. To overcome this problem, greedy intervention approaches based on the concept of the Mean First Passage Time or the steady-state probability mass of the network states were previously proposed. Another possible approach is to use reduction mappings to compress the network and develop control policies on its reduced version. However, such mappings lead to loss of information and require an induction step when designing the control policy for the original network.
In this paper, we propose a novel solution, CoD-CP, for designing intervention policies for large Boolean networks. The new method utilizes the Coefficient of Determination (CoD) and the Steady-State Distribution (SSD) of the model. The main advantage of CoD-CP in comparison with the previously proposed methods is that it does not require any compression of the original model, and thus can be directly designed on large networks. The simulation studies on small synthetic networks show that CoD-CP performs comparably to previously proposed greedy policies that were induced from the compressed versions of the networks. Furthermore, on a large 17-gene gastrointestinal cancer network, CoD-CP outperforms the other two available greedy techniques, which is precisely the kind of case for which CoD-CP has been developed. Finally, our experiments show that CoD-CP is robust with respect to the attractor structure of the model.
The newly proposed CoD-CP provides an attractive alternative for intervening in large networks, where other available greedy methods require size reduction on the network and an extra induction step before designing a control policy.
ABSTRACT: In this paper, we propose a definition for the essentiality of regulatory relationships among molecules in a Boolean network model, which takes the regulatory relationships between the molecules into account, in addition to their connectivity. The proposed definition of essentiality is tightly related to the ultimate goal of designing intervention strategies to achieve beneficial dynamic changes in the network. Focusing on Boolean networks, we define the essentiality of each regulatory relationship as the difference between the expected performance of the Bayesian robust structural intervention over the uncertainty class of networks, which arises from the uncertainty in the given regulatory relationship, and the performance of the optimal structural intervention for the known network in which there is no uncertainty. For a specific regulatory relationship, a large difference in performance implies that the given relationship is critical for designing effective therapeutic strategies. On the other hand, a small difference implies that the regulatory relationship under consideration may not be crucial in designing intervention strategies. This new definition of essentiality, grounded on the quantification of uncertainty in network dynamics, may provide a deep understanding of the robustness, adaptability, and controllability of gene regulatory networks.
ABSTRACT: MOTIVATION: A key goal of studying biological systems is to design therapeutic intervention strategies. Probabilistic Boolean networks (PBNs) constitute a mathematical model which enables modeling, predicting and intervening in their long-run behavior using Markov chain theory. The long-run dynamics of a PBN, as represented by its steady-state distribution (SSD), can guide the design of effective intervention strategies for the modeled systems. A major obstacle for its application is the large state space of the underlying Markov chain, which poses a serious computational challenge. Hence, it is critical to reduce the model complexity of PBNs for practical applications. RESULTS: We propose a strategy to reduce the state space of the underlying Markov chain of a PBN based on a criterion that the reduction least distorts the proportional change of stationary masses for critical states, for instance, the network attractors. In comparison to previous reduction methods, we reduce the state space directly, without deleting genes. We then derive stationary control policies on the reduced network that can be naturally induced back to the original network. Computational experiments study the effects of the reduction on model complexity and the performance of designed control policies which is measured by the shift of stationary mass away from undesirable states, those associated with undesirable phenotypes. We consider randomly generated networks as well as a 17-gene gastrointestinal cancer network, which, if not reduced, has a 2^17 × 2^17 transition probability matrix. Such a dimension is too large for direct application of many previously proposed PBN intervention strategies.
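To make the state-space issue concrete, the following sketch builds the transition matrix of a single Boolean network with random gene perturbation and computes its steady-state distribution by power iteration. It assumes the common perturbation model in which each gene flips independently with probability p and the update rule applies only when no gene flips; the 3-gene update rule, the choice of undesirable states, and all names are hypothetical and unrelated to the networks studied in the paper.

```python
from itertools import product


def bnp_transition_matrix(update, n_genes, p):
    """Transition matrix of a Boolean network whose genes each flip with probability p."""
    states = list(product((0, 1), repeat=n_genes))
    index = {s: i for i, s in enumerate(states)}
    matrix = [[0.0] * len(states) for _ in states]
    for s in states:
        row = matrix[index[s]]
        # No gene is perturbed: follow the Boolean update rule.
        row[index[update(s)]] += (1 - p) ** n_genes
        # At least one gene is perturbed: jump to the state t that differs in d genes.
        for t in states:
            d = sum(a != b for a, b in zip(s, t))
            if d > 0:
                row[index[t]] += p ** d * (1 - p) ** (n_genes - d)
    return states, matrix


def stationary_distribution(matrix, tol=1e-14, max_iter=100000):
    """Power iteration pi <- pi P, started from the uniform distribution."""
    size = len(matrix)
    pi = [1.0 / size] * size
    for _ in range(max_iter):
        nxt = [sum(pi[i] * matrix[i][j] for i in range(size)) for j in range(size)]
        converged = max(abs(a - b) for a, b in zip(pi, nxt)) < tol
        pi = nxt
        if converged:
            break
    return pi


def update(state):
    """Hypothetical 3-gene regulatory rule."""
    x0, x1, x2 = state
    return (x1 & x2, x0 | x2, 1 - x0)


states, matrix = bnp_transition_matrix(update, n_genes=3, p=0.1)
pi = stationary_distribution(matrix)
# Treat states with gene 0 switched on as undesirable (an arbitrary choice).
undesirable_mass = sum(prob for s, prob in zip(states, pi) if s[0] == 1)
print(round(undesirable_mass, 4))
```

With 3 genes the matrix has 8 rows. A dense matrix for 17 genes would have 2^34 (roughly 1.7 × 10^10) entries, which is why this direct construction does not scale and reduction strategies are of interest.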
ABSTRACT: Developing computational models paves the way to understanding, predicting, and influencing the long-term behavior of genomic regulatory systems. However, several major challenges have to be addressed before such models are successfully applied in practice. Their inherent high complexity requires strategies for complexity reduction. Reducing the complexity of the model by removing genes and interpreting them as latent variables leads to the problem of selecting which states and their corresponding transitions best account for the presence of such latent variables. We use the Boolean network (BN) model to develop the general framework for selection and reduction of the model's complexity via designating some of the model's variables as latent ones. We also study the effects of the selection policies on the steady-state distribution and the controllability of the model.
IEEE Transactions on Signal Processing 10/2010; · 2.81 Impact Factor
ABSTRACT: Gene regulatory networks serve as models from which to derive therapeutic intervention strategies, in particular, stationary control policies over time that shift the probability mass of the steady state distribution (SSD) away from states associated with undesirable phenotypes. Derivation of control policies is hindered by the high-dimensional state spaces associated with gene regulatory networks. Hence, network reduction is a fundamental issue for intervention.
The network model that has been most used for the study of intervention in gene regulatory networks is the probabilistic Boolean network (PBN), which is a collection of constituent Boolean networks (BNs) with perturbation. In this article, we propose an algorithm that reduces a BN with perturbation, designs a control policy on the reduced network and then induces that policy to the original network. The coefficient of determination (CoD) is used to choose a gene for deletion, and a reduction mapping is used to rewire the remaining genes. This CoD-reduction procedure is used to construct a reduced network, then either the previously proposed mean first-passage time (MFPT) or SSD stationary control policy is designed on the reduced network, and these policies are induced to the original network. The efficacy of the overall algorithm is demonstrated on networks of 10 genes or fewer, where it is possible to compare the steady state shifts of the induced and original policies (because the latter can be derived), and by applying it to a 17-gene gastrointestinal network where it is shown that there is substantial beneficial steady state shift.
The code for the algorithms is available at: http://gsp.tamu.edu/Publications/supplementary/ghaffari10a/
Please contact Noushin Ghaffari at email@example.com for further questions.
Supplementary data are available at Bioinformatics online.
ABSTRACT: We present a novel framework based on hidden Markov models (HMMs) for matching feature point sets, which capture the shapes of object contours of interest. Point matching algorithms provide effective tools for shape analysis, an important problem in computer vision and image processing applications. Typically, it is computationally expensive to find the optimal correspondence between feature points in different sets, hence existing algorithms often resort to various heuristics that find suboptimal solutions. Unlike most of the previous algorithms, the proposed HMM-based framework allows us to find the optimal correspondence using an efficient dynamic programming algorithm, where the computational complexity of the resulting shape matching algorithm grows only linearly with the size of the respective point sets. We demonstrate the promising potential of the proposed algorithm based on several benchmark data sets.
Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2010, 14-19 March 2010, Sheraton Dallas Hotel, Dallas, Texas, USA; 01/2010 · 4.63 Impact Factor
ABSTRACT: This paper addresses the problem of indexing shapes in medical image databases. Shapes of organs are often indicative of disease, making shape similarity queries important in medical image databases. Mathematically, shapes with landmarks belong to shape spaces which are curved manifolds with a well defined metric. The challenge in shape indexing is to index data in such curved spaces. One natural indexing scheme is to use metric trees, but metric trees are prone to inefficiency. This paper proposes a more efficient alternative. We show that it is possible to optimally embed finite sets of shapes in shape space into a Euclidean space. After embedding, classical coordinate-based trees can be used for efficient shape retrieval. The embedding proposed in the paper is optimal in the sense that it least distorts the partial Procrustes shape distance. The proposed indexing technique is used to retrieve images by vertebral shape from the NHANES II database of cervical and lumbar spine X-ray images maintained at the National Library of Medicine. Vertebral shape strongly correlates with the presence of osteophytes, and shape similarity retrieval is proposed as a tool for retrieval by osteophyte presence and severity. Experimental results included in the paper evaluate (1) the usefulness of shape similarity as a proxy for osteophytes, (2) the computational and disk access efficiency of the new indexing scheme, (3) the relative performance of indexing with embedding to the performance of indexing without embedding, and (4) the computational cost of indexing using the proposed embedding versus the cost of an alternate embedding. The experimental results clearly show the relevance of shape indexing and the advantage of using the proposed embedding.
Medical image analysis 01/2010; 14(3):243-54. · 3.09 Impact Factor
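The partial Procrustes distance mentioned in this abstract can be sketched for 2D landmark shapes. The example below uses the standard complex-number formulation: landmarks are centered and scaled to unit size, and the distance after optimal rotation is sqrt(2(1 - |<z, w>|)). The landmark coordinates and function names are hypothetical, and the sketch covers only the distance itself, not the paper's embedding or indexing scheme.

```python
import math


def preshape(points):
    """Center 2D landmarks and scale them to unit size, stored as complex numbers."""
    z = [complex(x, y) for x, y in points]
    mean = sum(z) / len(z)
    z = [v - mean for v in z]
    size = math.sqrt(sum(abs(v) ** 2 for v in z))
    return [v / size for v in z]


def partial_procrustes_distance(a, b):
    """Distance between two landmark shapes after removing translation, scale and rotation."""
    za, zb = preshape(a), preshape(b)
    inner = sum(u.conjugate() * v for u, v in zip(za, zb))
    # The optimal rotation aligns the phase of the inner product, leaving its magnitude.
    return math.sqrt(max(0.0, 2.0 * (1.0 - abs(inner))))


def similarity_transform(points, angle, scale, shift):
    """Rotate, scale and translate a landmark set (used here only to build a test case)."""
    c, s = math.cos(angle), math.sin(angle)
    return [(scale * (c * x - s * y) + shift[0], scale * (s * x + c * y) + shift[1])
            for x, y in points]


square = [(0, 0), (1, 0), (1, 1), (0, 1)]
moved_square = similarity_transform(square, math.radians(30), 2.5, (3, -1))
rectangle = [(0, 0), (3, 0), (3, 1), (0, 1)]

same_shape = partial_procrustes_distance(square, moved_square)
different_shape = partial_procrustes_distance(square, rectangle)
print(same_shape, different_shape)
```

The first distance is zero up to rounding because the two landmark sets differ only by a similarity transform, while the square and the rectangle are separated by a clearly positive distance. Landmarks are assumed to be listed in corresponding order in both shapes.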
ABSTRACT: Sensitivity analysis is a critical yet challenging problem for understanding complex systems. In genomic signal processing, it has been recognized that many biological systems are asymptotically stable. The sensitivity regarding the structural and dynamical uncertainty of network models may provide a deep understanding of the robustness, adaptability, and controllability of biological processes. We focus on the Boolean network model, as it has been shown to be able to capture the switching behavior of many biological processes by appropriate modeling of multivariate nonlinear relationships among genes. We study two different sensitivity measures for the Boolean network model, one directly related to individual predictor Boolean functions and the other to long-term network dynamics. Although there is some correlation between the measures, our study shows that these different sensitivities characterize different aspects of network behavior, so that their application depends on how they relate to specific translational goals.