DREAM3: Network Inference Using Dynamic Context Likelihood of Relatedness and the Inferelator

Center for Genomic Regulation, Spain
PLoS ONE (Impact Factor: 3.23). 03/2010; 5(3):e9803. DOI: 10.1371/journal.pone.0009803
Source: PubMed


Many current works aiming to learn regulatory networks from systems biology data must balance model complexity with respect to data availability and quality. Methods that learn regulatory associations based on unit-less metrics, such as Mutual Information, are attractive in that they scale well and reduce the number of free parameters (model complexity) per interaction to a minimum. In contrast, methods for learning regulatory networks based on explicit dynamical models are more complex and scale less gracefully, but are attractive as they may allow direct prediction of transcriptional dynamics and resolve the directionality of many regulatory interactions.
We aim to investigate whether scalable information based methods (like the Context Likelihood of Relatedness method) and more explicit dynamical models (like Inferelator 1.0) prove synergistic when combined. We test a pipeline where a novel modification of the Context Likelihood of Relatedness (mixed-CLR, modified to use time series data) is first used to define likely regulatory interactions and then Inferelator 1.0 is used for final model selection and to build an explicit dynamical model.
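The scoring idea behind the Context Likelihood of Relatedness can be sketched as follows. This is a minimal illustration of the standard CLR background correction applied to a precomputed mutual-information matrix, not the mixed-CLR variant described above, whose time-series handling is not specified here. The function name `clr_scores`, the exclusion of the diagonal from the background, and the toy matrix are assumptions made for illustration.

```python
import numpy as np


def clr_scores(mi):
    """Standard CLR scores from a symmetric mutual-information matrix.

    Hypothetical helper: each MI value is converted to a z-score against
    the background of all MI values involving gene i, and likewise for
    gene j; the two clipped z-scores are then combined.
    """
    mi = np.asarray(mi, dtype=float)
    n = mi.shape[0]

    # Exclude self-information from each gene's background (an assumption).
    off_diag = ~np.eye(n, dtype=bool)
    masked = np.where(off_diag, mi, np.nan)
    mu = np.nanmean(masked, axis=1)
    sigma = np.nanstd(masked, axis=1)
    sigma = np.where(sigma > 0, sigma, 1.0)  # avoid division by zero

    # z[i, j]: how unusual MI(i, j) is relative to gene i's background.
    z = np.maximum(0.0, (mi - mu[:, None]) / sigma[:, None])

    # Combine the view from gene i (z) and from gene j (z.T).
    scores = np.sqrt(z ** 2 + z.T ** 2)
    np.fill_diagonal(scores, 0.0)
    return scores


# Toy example: genes 0 and 1 share high MI; everything else is background.
mi = np.array([
    [0.0, 0.9, 0.1, 0.1],
    [0.9, 0.0, 0.1, 0.1],
    [0.1, 0.1, 0.0, 0.1],
    [0.1, 0.1, 0.1, 0.0],
])
scores = clr_scores(mi)
```

Because the score is symmetric, it ranks gene pairs but does not say which gene regulates which; in the pipeline above, that step is left to the explicit dynamical model.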
Our method ranked 2nd out of 22 in the DREAM3 100-gene in silico networks challenge. Mixed-CLR and Inferelator 1.0 are complementary, demonstrating a large performance gain relative to any single tested method, with precision being especially high at low recall values. Partitioning the provided data set into four groups (knock-down, knock-out, time-series, and combined) revealed that using comprehensive knock-out data alone provides optimal performance. Inferelator 1.0 proved particularly powerful at resolving the directionality of regulatory interactions, i.e. "who regulates who" (approximately of identified true positives were correctly resolved). Performance drops for high in-degree genes, i.e. as the number of regulators per target gene increases, but not with out-degree, i.e. performance is not affected by the presence of regulatory hubs.
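To make "precision at low recall" concrete, the sketch below computes precision and recall along a ranked list of predicted directed edges. It is an illustrative calculation only, not the DREAM3 scoring code; the function name `precision_recall_curve` and the toy edges are hypothetical. Edges are written as (regulator, target) tuples, so a prediction with the wrong direction counts as a miss.

```python
def precision_recall_curve(ranked_edges, true_edges):
    """Return (recall, precision) after each prediction in a ranked list.

    ranked_edges: directed (regulator, target) tuples, most confident first.
    true_edges:   iterable of directed edges in the gold-standard network.
    """
    truth = set(true_edges)
    if not truth:
        raise ValueError("true_edges must not be empty")

    hits = 0
    curve = []
    for k, edge in enumerate(ranked_edges, start=1):
        if edge in truth:
            hits += 1
        # Recall: fraction of true edges recovered so far.
        # Precision: fraction of the top-k predictions that are true.
        curve.append((hits / len(truth), hits / k))
    return curve


# Toy example: the second prediction has the direction reversed.
ranked = [("A", "B"), ("B", "A"), ("A", "C")]
truth = {("A", "B"), ("A", "C")}
curve = precision_recall_curve(ranked, truth)
```

In this toy case the first prediction gives precision 1.0 at recall 0.5, and the reversed edge then lowers precision without raising recall.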

Available from: Aviv Madar,
    • "Furthermore, previous studies inferred networks using Pearson correlation, a measure that has numerous issues [37], and only compared networks at a fixed co-expression threshold. We conduct our analysis over a range of co-expression thresholds and infer networks using mutual information (MI) and context likelihood of relatedness (CLR), an approach shown to be state of the art in comparative studies [38,39]. As a result, our approach yielded several novel biological observations. "
    ABSTRACT: Divergence in gene regulation has emerged as a key mechanism underlying species differentiation. Comparative analysis of co-expression networks across species can reveal conservation and divergence in the regulation of genes. We inferred co-expression networks of A. thaliana, Populus spp. and O. sativa using state-of-the-art methods based on mutual information and context likelihood of relatedness, and conducted a comprehensive comparison of these networks across a range of co-expression thresholds. In addition to quantifying gene-gene link and network neighbourhood conservation, we also applied recent advancements in network analysis to do cross-species comparisons of network properties such as scale free characteristics and gene centrality as well as network motifs. We found that in all species the networks emerged as scale free only above a certain co-expression threshold, and that the high-centrality genes upholding this organization tended to be conserved. Network motifs, in particular the feed-forward loop, were found to be significantly enriched in specific functional subnetworks but were much less conserved across species than gene centrality. Although individual gene-gene co-expression had massively diverged, up to ~80% of the genes still had a significantly conserved network neighbourhood. For genes with multiple predicted orthologs, about half had one ortholog with conserved regulation and another ortholog with diverged or non-conserved regulation. Furthermore, the most sequence similar ortholog was not the one with the most conserved gene regulation in over half of the cases. We have provided a comprehensive analysis of gene regulation evolution in plants and built a web tool for Comparative analysis of Plant co-Expression networks (ComPlEx). The tool can be particularly useful for identifying the ortholog with the most conserved regulation among several sequence-similar alternatives and can thus be of practical importance in e.g. finding candidate genes for perturbation experiments.
    BMC Genomics 02/2014; 15(1):106. DOI:10.1186/1471-2164-15-106 · 3.99 Impact Factor
    • "A set of the algorithms operates under such a hypothesis that coexpressed [3–5], roughly coordinated genes [6, 7] and genes with dependency [8–10] across a set of samples indicate a functional relationship [11, 12]. As one of the best gene network construction methods, the context likelihood of relatedness (CLR) method [9] utilizing the mutual information (MI) for scoring the similarity of gene pairs has been widely used to decipher gene networks for multiple species, such as yeast, bacteria, mammalian, and plants [9, 13–16]. However, it is computationally infeasible to decipher genome-scale networks for species with large genomes on a single computer due to physical limits on CPU speeds and memory capacities. "
    ABSTRACT: Analysis of genome-scale gene networks (GNs) using large-scale gene expression data provides unprecedented opportunities to uncover gene interactions and regulatory networks involved in various biological processes and developmental programs, leading to accelerated discovery of novel knowledge of various biological processes, pathways and systems. The widely used context likelihood of relatedness (CLR) method, which uses mutual information (MI) to score the similarity of gene pairs, is one of the most accurate methods currently available for inferring GNs. However, the MI-based reverse engineering method can achieve satisfactory performance only when the sample size exceeds one hundred. This in turn limits its application to GN construction from expression data sets with small sample sizes. We developed a high performance web server, DeGNServer, to reverse engineer and decipher genome-scale networks. It extended the CLR method by integrating different correlation methods that are suitable for analyzing data sets ranging from moderate to large scale, such as expression profiles with tens to hundreds of microarray hybridizations, and implemented all analysis algorithms using parallel computing techniques to infer gene-gene associations at extraordinary speed. In addition, we integrated the SNBuilder and GeNa algorithms for subnetwork extraction and functional module discovery. DeGNServer is publicly and freely available online.
    BioMed Research International 11/2013; 2013:856325. DOI:10.1155/2013/856325 · 2.71 Impact Factor
    • "The CLR algorithm was introduced in [18] and extended in [19], [20] for the DREAM competitions. CLR increases the contrast between direct and indirect relationships using the local network context to compute a statistical metric of similarity between expression profiles. "
    ABSTRACT: Regulation of gene expression is crucial for organism growth, and it is one of the challenges in systems biology to reconstruct the underlying regulatory biological networks from transcriptomic data. The formation of lateral roots in Arabidopsis thaliana is stimulated by a cascade of regulators of which only the interactions of its initial elements have been identified. Using simulated gene expression data with known network topology, we compare the performance of inference algorithms, based on different approaches, for which ready-to-use software is available. We show that their performance improves with the network size and the inclusion of mutants. We then analyze two sets of genes, whose activity is likely to be relevant to lateral root initiation in Arabidopsis, and assess causality of their regulatory interactions by integrating sequence analysis with the intersection of the results of the best performing methods on time series and mutants. The methods applied capture known interactions between genes that are candidate regulators at early stages of development. The network inferred from genes significantly expressed during lateral root formation exhibits distinct scale free, small world and hierarchical properties and the nodes with a high out-degree may warrant further investigation.
    IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM 05/2013; 10(1):50-60. DOI:10.1109/TCBB.2013.3 · 1.44 Impact Factor