Detection of the dominant direction of information flow and feedback links in densely interconnected regulatory networks

Ariadne Inc., 9430 Key West Ave, Suite 113 Rockville, MD 20850, USA.
BMC Bioinformatics (Impact Factor: 2.58). 11/2008; 9(1):424. DOI: 10.1186/1471-2105-9-424
Source: PubMed


Finding the dominant direction of flow of information in densely interconnected regulatory or signaling networks is required in many applications in computational biology and neuroscience. This is achieved by first identifying and removing links which close up feedback loops in the original network and hierarchically arranging nodes in the remaining network. In mathematical language this corresponds to a problem of making a graph acyclic by removing as few links as possible and thus altering the original graph in the least possible way. The exact solution of this problem requires enumeration of all cycles and combinations of removed links, which, as an NP-hard problem, is computationally prohibitive even for modest-size networks.
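To make the objective concrete, the following is a minimal sketch of the exact problem on a toy graph. It does not enumerate cycles as described above. Instead it swaps in an equivalent brute-force search over all node orderings, where a link counts as "backward" when it points from a later position to an earlier one. The function names and the five-link example network are hypothetical and chosen only for illustration.

```python
from itertools import permutations

def backward_links(edges, order):
    """Return the links that point against the given node ordering.

    Position 0 is treated as the top (most upstream) level; this
    convention is an assumption made for the sketch.
    """
    pos = {node: i for i, node in enumerate(order)}
    return [(u, v) for u, v in edges if pos[u] >= pos[v]]

def exact_min_feedback(edges):
    """Try every ordering and keep the one with the fewest backward links.

    The cost grows as n! in the number of nodes, so this is only
    feasible for a handful of nodes.
    """
    nodes = sorted({n for edge in edges for n in edge})
    best = None
    for order in permutations(nodes):
        back = backward_links(edges, order)
        if best is None or len(back) < len(best):
            best = back
    return best

# Hypothetical toy network with two overlapping feedback cycles:
# A -> B -> C -> A and B -> C -> D -> B.
edges = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D"), ("D", "B")]
print(exact_min_feedback(edges))  # -> [('B', 'C')]
```

The link B -> C lies on both cycles, so removing it alone makes the toy graph acyclic. The factorial number of orderings shows why exhaustive search is impractical beyond very small networks.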
We introduce and compare two approximate numerical algorithms for solving this problem: a probabilistic one based on simulated annealing of the hierarchical layout of the network, which minimizes the number of "backward" links going from lower to higher hierarchical levels, and a deterministic, "greedy" algorithm that sequentially cuts the links that participate in the largest number of feedback cycles. We find that the annealing algorithm outperforms the deterministic one in terms of speed, memory requirement, and the actual number of removed links. To further improve the visual perception of the layout produced by the annealing algorithm, we perform an additional minimization of the length of hierarchical links while keeping the number of anti-hierarchical links at its minimum. The annealing algorithm is then tested on several examples of regulatory and signaling networks/pathways operating in human cells.
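The annealing idea can be sketched as follows. This is a simplified illustration, not the authors' F90 or Matlab implementation. It assumes a strict linear ordering of nodes instead of a full hierarchical layout, uses pairwise position swaps as the move set, and follows a geometric cooling schedule. The function names, parameters, and defaults are all hypothetical.

```python
import math
import random

def count_backward(edges, pos):
    """Energy: number of links that do not point strictly forward."""
    return sum(1 for u, v in edges if pos[u] >= pos[v])

def anneal_order(edges, steps=5000, t_start=1.0, t_end=0.01, seed=0):
    """Search for a node ordering with few backward links.

    Returns (best_order, feedback_links). Removing feedback_links
    leaves only forward links, so the remaining graph is acyclic.
    """
    rng = random.Random(seed)
    order = sorted({n for edge in edges for n in edge})
    pos = {n: i for i, n in enumerate(order)}
    energy = count_backward(edges, pos)
    best_order, best_energy = list(order), energy
    if len(order) >= 2:
        for step in range(steps):
            # Geometric cooling from t_start towards t_end (assumed schedule).
            t = t_start * (t_end / t_start) ** (step / steps)
            i, j = rng.sample(range(len(order)), 2)
            order[i], order[j] = order[j], order[i]
            pos[order[i]], pos[order[j]] = i, j
            delta = count_backward(edges, pos) - energy
            # Metropolis rule: always accept improvements, and accept
            # worse orderings with probability exp(-delta / t).
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                energy += delta
                if energy < best_energy:
                    best_order, best_energy = list(order), energy
            else:
                # Rejected: undo the swap.
                order[i], order[j] = order[j], order[i]
                pos[order[i]], pos[order[j]] = i, j
    best_pos = {n: i for i, n in enumerate(best_order)}
    feedback = [(u, v) for u, v in edges if best_pos[u] >= best_pos[v]]
    return best_order, feedback

# Hypothetical toy network with two overlapping feedback cycles.
edges = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D"), ("D", "B")]
order, feedback = anneal_order(edges)
print(order, feedback)
```

Recomputing the energy over all links after every swap keeps the sketch short. An implementation aimed at networks with many thousands of links would update only the links touching the two swapped nodes.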
The proposed annealing algorithm is powerful enough to produce often-optimal layouts of protein networks in whole organisms, consisting of approximately 10^4 nodes and approximately 10^5 links, while the applicability of the greedy algorithm is limited to individual pathways with approximately 100 vertices. The considered examples indicate that the annealing algorithm produces biologically meaningful layouts: the function of most of the anti-hierarchical links is indeed to send a feedback signal to upstream pathway elements. Source code of the F90 and Matlab implementations of the two algorithms is available at



Available from: Sergei S Maslov
  • Source
    • "The importance of mixed signs in incoherent feedback cycles has been discussed in [14]. Coherent loops can produce multi-stationary behavior in all affected nodes [15]. For a discussion of the mathematical properties of feed-back loops in a wider sense see [16]. "
    ABSTRACT: Network motifs play an important role in the qualitative analysis and quantitative characterization of networks. The feed-forward loop is a semantically important and statistically highly significant motif. In this paper, we extend the definition of the feed-forward loop to subgraphs of arbitrary size. To avoid the complexity of path enumeration, we define generalized feed-forward loops as pairs of source and target nodes that have two or more internally disjoint connecting paths. Based on this definition, we formally derive an approach for the detection of this generalized motif. Our quantitative analysis demonstrates that generalized feed-forward loops up to a certain path length are statistically significant. Loops of greater size are statistically underrepresented and hence an anti-motif.
    Full-text · Article · Dec 2012 · Procedia Computer Science
  • Source
    ABSTRACT: In this chapter, we have introduced the miniTUBA system in detail and shown how to apply the miniTUBA dynamic Bayesian network (DBN) approach to analyze a typical use case in the area of host-pathogen interactions using high-throughput microarray data. DBNs are powerful tools for modeling the stochastic evolution of a set of random variables over time. Since biological processes and various measurement errors are stochastic in nature, the DBN has been considered a suitable technique for studying biological networks and pathways. Bayesian networks (BNs) and DBNs are based on a multinomial distribution. This distribution is very flexible, and each node has a different parameterization. Therefore, it is feasible to use DBNs to model the dynamics of biological systems and their responses to parameter perturbations. However, although a few applications of both BNs and DBNs to modeling gene expression data have been discussed and reported, their usefulness remains to be shown on better-understood pathways (Xia, et al., 2004). Programmed cell death (i.e., apoptosis) pathways are well studied and important for all plant and animal organisms. We first demonstrated in this report how DBN analysis can be used to predict crucial genes for a cell death pathway, which led to correct experimental verification. Two major challenges exist in DBN analysis for biological network modeling. First, continuous gene expression data has to be discretized, leading to a loss of information. The discretization simplifies the computation and stabilizes the predicted results. However, current equal-quantile and equal-interval discretization methods often do not reflect biological reality. The customized discretization method is too time-consuming and may not correlate with the unknown truth either. Therefore, alternative approaches will need to be explored to improve the discretization and minimize the loss of information.
How to find reliable ways to model continuous data remains a major challenge in DBN and other modeling studies. Second, it is a big challenge to identify the correct time steps (i.e., Markov lags) for DBN modeling. By default, we require all variables to have the same time step size. However, it might be possible to allow a mixture of different time step sizes. The time scale likely differs between variables. To identify the relevant time scale, we may allow different discretization schemes. While more finely discretized variables offer slower changes, it might be difficult to determine how many are appropriate. The generation of very large numbers of discretizations is also time-consuming. One solution is to allow mixtures of time steps in the learning step. However, this is very difficult in practice because the current step depends on a range of past experiences. If the previous time steps are not multiples of each other, a complex splining function is usually needed to dynamically interpolate the missing data. Alternatively, we can explicitly search for an optimum informative time step. A DBN search will favor small time steps because they mean more data to be used. However, if the additional data are merely interpolated, they would not help. While DBN analysis can be improved in different directions, the two areas of DBN research with the largest impact are probably discretization and correct time step setting. Besides addressing the above challenges, dynamic Bayesian networks can further be improved in different directions: (i) those strong links (or edges) are conserved among top networks and can be detected by consensus analysis (Fig. 4). (ii) cross-species
    Full-text · Chapter · Aug 2010
  • Source
    ABSTRACT: In this paper we propose three different graph-theoretical decompositions of large-scale biological networks, all three aiming at highlighting specific dynamical properties of the system. The first consists in finding a maximal directed acyclic subgraph in the network, which dynamically corresponds to searching for the maximal open-loop subsystem of the given system. The other two decompositions deal with the strong monotonicity property, and aim at decomposing the system into strongly monotone components with different structural characteristics: a single large strongly connected monotone subsystem in one case, and a set of smaller disjoint monotone subsystems in the other. For all three decompositions we provide original heuristic algorithms.
    Full-text · Article · Sep 2010