DREAM3: Network Inference Using Dynamic Context Likelihood of Relatedness and the Inferelator
ABSTRACT Many current efforts to learn regulatory networks from systems biology data must balance model complexity against data availability and quality. Methods that learn regulatory associations based on unit-less metrics, such as Mutual Information, are attractive in that they scale well and reduce the number of free parameters (model complexity) per interaction to a minimum. In contrast, methods for learning regulatory networks based on explicit dynamical models are more complex and scale less gracefully, but are attractive as they may allow direct prediction of transcriptional dynamics and resolve the directionality of many regulatory interactions.
We aim to investigate whether scalable information based methods (like the Context Likelihood of Relatedness method) and more explicit dynamical models (like Inferelator 1.0) prove synergistic when combined. We test a pipeline where a novel modification of the Context Likelihood of Relatedness (mixed-CLR, modified to use time series data) is first used to define likely regulatory interactions and then Inferelator 1.0 is used for final model selection and to build an explicit dynamical model.
Our method ranked 2nd out of 22 in the DREAM3 100-gene in silico networks challenge. Mixed-CLR and Inferelator 1.0 are complementary, demonstrating a large performance gain relative to any single tested method, with precision being especially high at low recall values. Partitioning the provided data set into four groups (knock-down, knock-out, time-series, and combined) revealed that using comprehensive knock-out data alone provides optimal performance. Inferelator 1.0 proved particularly powerful at resolving the directionality of regulatory interactions, i.e. "who regulates who" (approximately of identified true positives were correctly resolved). Performance drops for high in-degree genes, i.e. as the number of regulators per target gene increases, but not with out-degree, i.e. performance is not affected by the presence of regulatory hubs.
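As a rough illustration of the information-based first stage, the sketch below scores gene pairs with the standard CLR background correction: each mutual information value is converted to a z-score against the background distribution of both genes, negative z-scores are clipped to zero, and the two are combined. This is only a minimal sketch of plain CLR, not the mixed-CLR time-series modification or the Inferelator 1.0 step described above. The function names, the histogram-based mutual information estimator, the bin count, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def mutual_information(x, y, bins=8):
    """Histogram-based mutual information estimate (in nats); an assumed estimator."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)   # marginal of y, shape (1, bins)
    nz = pxy > 0                          # skip empty cells to avoid log(0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

def clr_scores(mi):
    """Standard CLR scores from a symmetric mutual information matrix."""
    mi = np.asarray(mi, dtype=float)
    n = mi.shape[0]
    off_diagonal = ~np.eye(n, dtype=bool)
    # Background statistics per gene, ignoring each gene's MI with itself.
    background = np.where(off_diagonal, mi, np.nan)
    mu = np.nanmean(background, axis=1)
    sd = np.nanstd(background, axis=1)
    sd[sd == 0] = 1.0                     # guard against a flat background
    # z[i, j]: how unusual MI(i, j) is relative to gene i's background.
    z = np.maximum(0.0, (mi - mu[:, None]) / sd[:, None])
    scores = np.sqrt(z ** 2 + z.T ** 2)   # combine both genes' z-scores
    np.fill_diagonal(scores, 0.0)
    return scores

# Toy data (hypothetical): 5 genes, 200 samples, gene 1 closely tracks gene 0.
rng = np.random.default_rng(0)
expr = rng.normal(size=(5, 200))
expr[1] = expr[0] + 0.1 * rng.normal(size=200)

mi = np.array([[mutual_information(a, b) for b in expr] for a in expr])
scores = clr_scores(mi)
best_partner_of_gene0 = int(np.argmax(scores[0]))
```

The resulting score matrix is symmetric, so on its own it ranks candidate interactions without indicating which gene regulates which; in the pipeline described above, that directionality is what the dynamical modeling step is meant to resolve.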
Source available from: Armita Zarnegar
ABSTRACT: A Gene Regulatory Network (GRN) is a graph that represents the way in which genes inhibit or activate other genes. The discovery of GRNs is one of the most important and challenging tasks in bioinformatics. This is not only because of the role of GRNs in providing insight into processes and functions inside cells, but also because of their potential for the treatment of diseases and drug discovery. Usually, the technology used for collecting information about changes in gene activity is the microarray. Microarray data is complex and noisy and its analysis requires the assistance of computational methods. This thesis focuses on the automated discovery of GRNs from microarray gene expression data using heuristics from the molecular biology domain. We employed heuristic information for GRN discovery in three different approaches and employed a synthetic data generator called SynTReN to generate different benchmark problems to evaluate each approach. In the first approach, a combination of local search with gene expression programming was advanced, which we called Memetic Gene Expression Programming, to solve a system of differential equations that modeled a GRN. This resulted in an improvement over techniques previously applied to this problem. Our memetic gene expression programming technique also proved to be promising for any other application where there is a need for solving a system of differential equations. Despite the improvements, this method was found to be unsuitable to solve a large-scale real-sized GRN. In the second approach we used a coarse-grain equation-free model with another combined evolutionary algorithm (Memetic Algorithm) for the automated discovery of large scale real-sized networks. In this approach, we found that the evolutionary algorithm was not sufficiently efficient for exploring such a large search space. In the third approach, we integrated heuristics from domain knowledge to a greater extent than the two previous approaches. 
The third approach followed two strands. In the first strand, we advanced a new method to measure and visualize the way a gene activates or inhibits another gene. We called this a 2D Visualized Co-regulation function and used it to select gene pairs for building a GRN. We also advanced two postprocessing steps in order to reduce the number of incorrect associations. The first postprocessing method used heuristic information and the second one used an information processing technique. In the second strand, the structural properties of known networks were used to discover the GRN. Finding the correct structure of the GRN has been reported to be the most challenging aspect of GRN discovery. Our solution to finding the correct structure of the GRN is based on using a Hub Network to build the core structure of the network. Hubs are nodes with a high number of links attached to them and are known to be the most important genes. We first detected hub genes from domain knowledge and then built a network based on them from microarray data. This resulted in a plausible structure for building the rest of the network. We built the rest of the network incrementally using heuristic information such as the degree of the nodes. The results obtained using the third approach showed considerable improvement in the performance of GRN discovery when we compared them against existing approaches. We thus demonstrated that the process of discovering GRNs can be improved by using heuristic information along with computational modeling.
08/2011, Degree: PhD, Supervisor: Peter Vamplew, Andrew Stranieri
ABSTRACT: During the last two decades, molecular genetic studies and the completion of the sequencing of the Arabidopsis thaliana genome have increased knowledge of hormonal regulation in plants. These signal transduction pathways act in concert through gene regulatory and signalling networks whose main components have begun to be elucidated. Our understanding of the resulting cellular processes is hindered by the complex, and sometimes counter-intuitive, dynamics of the networks, which may be interconnected through feedback controls and cross-regulation. Mathematical modelling provides a valuable tool to investigate such dynamics and to perform in silico experiments that may not be easily carried out in a laboratory. In this article, we first review general methods for modelling gene and signalling networks and their application in plants. We then describe specific models of hormonal perception and cross-talk in plants. This mathematical analysis of sub-cellular molecular mechanisms paves the way for more comprehensive modelling studies of hormonal transport and signalling in a multi-scale setting.
Mathematical Modelling of Natural Phenomena 01/2013; 8(4). DOI:10.1051/mmnp/20138402 · 0.73 Impact Factor
ABSTRACT: Omics technologies enable unbiased investigation of biological systems through massively parallel sequence acquisition or molecular measurements, bringing the life sciences into the era of Big Data. A central challenge posed by such omics datasets is how to transform this data into biological knowledge. For example, how to use this data to answer questions such as: which functional pathways are involved in cell differentiation? Which genes should we target to stop cancer? Network analysis is a powerful and general approach to solve this problem consisting of two fundamental stages, network reconstruction and network interrogation. Herein, we provide an overview of network analysis including a step by step guide on how to perform and use this approach to investigate a biological question. In this guide, we also include the software packages that we and others employ for each of the steps of a network analysis workflow.