ABSTRACT: The interpretation of high-throughput datasets has remained one of the central challenges of computational biology over the past decade. Furthermore, as the amount of biological knowledge increases, it becomes more and more difficult to integrate this large body of knowledge in a meaningful manner. In this article, we propose a particular solution to both of these challenges.
We integrate available biological knowledge by constructing a network of molecular interactions of a specific kind: causal interactions. The resulting causal graph can be queried to suggest molecular hypotheses that explain the variations observed in a high-throughput gene expression experiment. We show that a simple scoring function can discriminate between a large number of competing molecular hypotheses about the upstream cause of the changes observed in a gene expression profile. We then develop an analytical method for computing the statistical significance of each score. This analytical method also helps assess the effects of random or adversarial noise on the predictive power of our model.
Our results show that the causal graph we constructed from known biological literature is extremely robust to random noise and to missing or spurious information. We demonstrate the power of our causal reasoning model on two specific examples, one from a cancer dataset and the other from a cardiac hypertrophy experiment. We conclude that causal reasoning models provide a valuable addition to the biologist's toolkit for the interpretation of gene expression data.
R source code for the method is available upon request.
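The abstract does not give the scoring function itself. As a minimal sketch, assuming a signed causal graph and a score equal to correct minus incorrect one-step predictions, the idea of ranking competing upstream hypotheses could look like the following. The graph, gene names, and the function `score_hypothesis` are hypothetical and are not taken from the authors' R code.

```python
from typing import Dict, Tuple

# Hypothetical toy causal graph: (source, target) -> sign of the causal effect
# (+1 = source increases target, -1 = source decreases target).
CausalGraph = Dict[Tuple[str, str], int]


def score_hypothesis(
    graph: CausalGraph,
    regulator: str,
    direction: int,
    observed: Dict[str, int],
) -> int:
    """Score the hypothesis "regulator changed in `direction`" (+1 up, -1 down).

    Assumed scoring rule (not confirmed by the source): the number of
    correctly predicted direct targets minus the number predicted wrongly.
    """
    correct = 0
    incorrect = 0
    for (source, target), sign in graph.items():
        # Only direct targets of this regulator with an observed change count.
        if source != regulator or target not in observed:
            continue
        predicted = direction * sign
        if predicted == observed[target]:
            correct += 1
        else:
            incorrect += 1
    return correct - incorrect


# Toy data: observed expression changes (+1 up-regulated, -1 down-regulated).
graph: CausalGraph = {
    ("TF_A", "gene1"): +1,
    ("TF_A", "gene2"): -1,
    ("TF_A", "gene3"): +1,
    ("TF_B", "gene1"): -1,
}
observed = {"gene1": +1, "gene2": -1, "gene3": +1}

# Score every (regulator, direction) pair and pick the best-supported one.
scores = {
    (reg, d): score_hypothesis(graph, reg, d, observed)
    for reg in ("TF_A", "TF_B")
    for d in (+1, -1)
}
best = max(scores, key=scores.get)
print(best, scores[best])
```

A significance test, such as the analytical method the abstract mentions, would then be needed to judge whether a given score could arise by chance. That step is not sketched here.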
ABSTRACT: Triglyceride accumulation is associated with obesity and type 2 diabetes. Genetic disruption of diacylglycerol acyltransferase 1 (DGAT1), which catalyzes the final reaction of triglyceride synthesis, confers dramatic resistance to high-fat-diet-induced obesity. Hence, DGAT1 is considered a potential therapeutic target for treating obesity and related metabolic disorders. However, the molecular events shaping the mechanism of action of DGAT1 pharmacological inhibition have not yet been fully explored. Here, we investigate the metabolic molecular mechanisms induced in response to pharmacological inhibition of DGAT1 using a recently developed computational systems biology approach, the Causal Reasoning Engine (CRE). The CRE algorithm uses microarray transcriptomic data and causal statements derived from the biomedical literature to infer upstream molecular events driving these transcriptional changes. The inferred upstream events (also called hypotheses) are aggregated into biological models using a set of analytical tools that allow for evaluation and integration of the hypotheses in the context of their supporting evidence. In comparison to gene ontology enrichment analysis, which pointed to high-level changes in metabolic processes, the CRE results provide detailed molecular hypotheses to explain the measured transcriptional changes. CRE analysis of gene expression changes in high-fat-habituated rats treated with a potent and selective DGAT1 inhibitor demonstrates that the majority of transcriptomic changes support a metabolic network indicative of reversal of high-fat-diet effects that includes a number of molecular hypotheses such as PPARG, HNF4A and SREBPs. Finally, the CRE-generated molecular hypotheses from DGAT1-inhibitor-treated rats were found to capture the major molecular characteristics of DGAT1-deficient mice, supporting a phenotype of decreased lipid and increased insulin sensitivity.
PLoS ONE 01/2011; 6(11):e27009. · 3.73 Impact Factor
ABSTRACT: Over the past decade gene expression data sets have been generated at an increasing pace. In addition to ever-increasing data generation, the biomedical literature is growing exponentially. The PubMed database (Sayers et al., 2010) comprises more than 20 million citations as of October 2010. The goal of our method is the prediction of putative upstream regulators of observed expression changes based on a set of over 400,000 causal relationships. The resulting putative regulators constitute directly testable hypotheses for follow-up.
Research in Computational Molecular Biology - 15th Annual International Conference, RECOMB 2011, Vancouver, BC, Canada, March 28-31, 2011. Proceedings; 01/2011
ABSTRACT: Models of regulatory networks become more difficult to construct and understand as they grow in size and complexity. Large models are usually built up from smaller models, representing subsets of reactions within the larger network. To assist modelers in this composition process, we present a formal approach for model composition, a wizard-style program for implementing the approach, and suggested language extensions to the Systems Biology Markup Language to support model composition. To illustrate the features of our approach and how to use the JigCell Composition Wizard, we build up a model of the eukaryotic cell cycle "engine" from smaller pieces.
IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM 01/2010; 7(2):278-87. · 2.25 Impact Factor
ABSTRACT: Models of regulatory networks become more difficult to construct and understand as they grow in size and complexity. Modelers naturally build large models from smaller components that each represent subsets of reactions within the larger network. To assist modelers in this process, we present model aggregation, which defines models in terms of components that are designed for the purpose of being combined.
We have implemented a model editor that incorporates model aggregation, and we suggest supporting extensions to the Systems Biology Markup Language (SBML) Level 3. We illustrate aggregation with a model of the eukaryotic cell cycle 'engine' created from smaller pieces.
Java implementations are available in the JigCell Aggregation Connector. See http://jigcell.biol.vt.edu.
ABSTRACT: We demonstrate how to model macromolecular regulatory networks with JigCell and the Parameter Estimation Toolkit (PET). These software tools are designed specifically to support the process typically used by systems biologists to model complex regulatory circuits. A detailed example illustrates how a model of the cell cycle in frog eggs is created and then refined through comparison of simulation output with experimental data. We show how parameter estimation tools automatically generate rate constants that fit a model to experimental data.
ABSTRACT: We describe procedures for converting a macromolecular regulatory model from the most common deterministic formulation to one suitable for stochastic simulation. To avoid error, we seek to automate as much of the process as possible. However, deterministic models often omit key information necessary to a stochastic formulation. In this paper we describe how we implement conversion in the JigCell modeling environment. Our tool makes it easier for the modeler to include complete details. Stochastic simulations are known for being computationally intensive, and thus require high-performance computing facilities to be practical. We provide the first stochastic simulation results for realistic cell cycle models, using Virginia Tech's System X supercomputer.
Proceedings of the 2008 Spring Simulation Multiconference, SpringSim 2008, Ottawa, Canada, April 14-17, 2008; 01/2008
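One example of information that a deterministic model can leave out is the reaction volume, which is needed to turn concentration-based rate constants into molecule-count propensities. The sketch below is only an illustration under that assumption: it converts a hypothetical synthesis/degradation model and runs Gillespie's direct method on it. It is not the JigCell conversion procedure, and the names `to_stochastic_constants` and `gillespie`, as well as all parameter values, are made up for the example.

```python
import random
from typing import List, Tuple

AVOGADRO = 6.022e23  # molecules per mole


def to_stochastic_constants(
    k_syn: float, k_deg: float, volume_l: float
) -> Tuple[float, float]:
    """Convert deterministic rate constants to stochastic ones.

    k_syn: zeroth-order synthesis rate in mol/(L*s)
    k_deg: first-order degradation rate in 1/s
    volume_l: reaction volume in litres (absent from the deterministic model)
    """
    c_syn = k_syn * AVOGADRO * volume_l  # molecules per second
    c_deg = k_deg                        # per molecule per second, unchanged
    return c_syn, c_deg


def gillespie(
    c_syn: float, c_deg: float, n0: int, t_end: float, rng: random.Random
) -> List[Tuple[float, int]]:
    """Gillespie direct method for: 0 -> X (c_syn), X -> 0 (c_deg * n)."""
    t, n = 0.0, n0
    trajectory = [(t, n)]
    while True:
        a_syn = c_syn          # propensity of synthesis
        a_deg = c_deg * n      # propensity of degradation (0 when n == 0)
        a_total = a_syn + a_deg
        if a_total == 0.0:
            break              # no reaction can fire
        t += rng.expovariate(a_total)  # waiting time to the next reaction
        if t > t_end:
            break
        if rng.random() * a_total < a_syn:
            n += 1
        else:
            n -= 1
        trajectory.append((t, n))
    return trajectory


# Hypothetical parameters: 1 nM/s synthesis in a 1 fL volume.
c_syn, c_deg = to_stochastic_constants(k_syn=1e-9, k_deg=0.01, volume_l=1e-15)
t_end = 1000.0
trajectory = gillespie(c_syn, c_deg, n0=0, t_end=t_end, rng=random.Random(42))
print(f"c_syn = {c_syn:.4f} molecules/s, events simulated: {len(trajectory) - 1}")
```

With these values the deterministic steady state corresponds to roughly 60 molecules, a regime where individual trajectories fluctuate visibly around the mean. Every reaction event is simulated one at a time, which is why realistic models become computationally expensive.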
ABSTRACT: Today's macromolecular regulatory network models are small compared to the amount of information known about the corresponding cellular pathways, in part because current modeling languages and tools are unable to handle significantly larger models. Most pathway models are small models of individual pathways which are relatively easy to construct and manage. The hope is someday to put these pieces together to create a more complete picture of the underlying molecular machinery. While efforts to make large models can benefit from reusing existing components, there currently exists little tool or representational support for combining or composing models. In this paper we present a tool for merging two or more models (we call this process model fusion) and a concrete proposal for implementing composition in the context of the Systems Biology Markup Language (SBML).
Proceedings of the 2007 Spring Simulation Multiconference, SpringSim 2007, Norfolk, Virginia, USA, March 25-29, 2007, Volume 2; 01/2007
ABSTRACT: Today's macromolecular regulatory network models are small compared to the amount of information known about a particular cellular pathway, in part because current modeling languages and tools are unable to handle significantly larger models. Thus, most pathway modeling work today focuses on building small models of individual pathways since they are easy to construct and manage. The hope is someday to put these pieces together to create a more complete picture of the underlying molecular machinery. While efforts to make large models benefit from reusing existing components, unfortunately, there currently exists little tool or representational support for combining or composing models. We have identified four distinct modeling processes related to model composition: fusion, composition, aggregation, and flattening. We present concrete proposals for implementing all four processes in the context of the
Proceedings of the Winter Simulation Conference WSC 2006, Monterey, California, USA, December 3-6, 2006; 01/2006