Figure 1 - available via license: Creative Commons Attribution 4.0 International
The RE:IN (Reasoning Engine for Interaction Networks) methodology, illustrated by example. First, critical network components must be identified: genes A, B and C are critical regulators of a given cell state, while S1 and S2 are input signals (panel 1). Components can be active or inactive, to fit a Boolean formalism. Second, definite and possible interactions should be defined (panel 2): S1 activates A (solid arrow), B may activate C (dashed arrow). These define the topology of an abstract network, which describes 2^4 = 16 unique, concrete networks, in which each possible interaction is present or not (panel 3). By combining this topology with known or hypothesized regulation conditions at each node (panel 4), we characterize an Abstract Boolean Network (ABN, panel 5). Next, experimental observations are encoded as constraints on state trajectories (panel 6). A constrained Abstract Boolean Network (cABN) defines an ABN together with the constraints describing system observations, thus integrating available knowledge describing the structure, dynamics and observed behavior of the process (panel 7). We can enumerate the concrete models that satisfy these constraints (panel 8). In addition, we can use the cABN to formulate predictions (panel 9): to identify minimal networks, which have the fewest optional interactions instantiated (concrete model 2, panel 8), as well as required (or disallowed) interactions that are present in all (none) concrete models. We can also study genetic perturbations. Once predictions have been tested experimentally (panel 10), they can be added to the set of experimental constraints. If no concrete models are identified, then the process is iterated, starting by re-examining our assumptions about components, interactions, dynamics and behavior.
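As a rough illustration of how an abstract topology expands into concrete networks, the sketch below enumerates every subset of the possible interactions on top of the definite ones. Only "S1 activates A" and "B may activate C" come from the caption; the other three possible interactions and all function names are assumptions made up so that the count matches the four optional edges implied by the caption. It is not the RE:IN implementation.

```python
from itertools import product

# Interactions as (source, target, sign). Only the first entry of each list
# comes from the caption; the remaining possible edges are hypothetical.
definite = [("S1", "A", "+")]
possible = [("B", "C", "+"), ("S2", "B", "+"), ("A", "B", "-"), ("C", "A", "+")]

def concrete_networks(definite, possible):
    """Yield each concrete network: all definite edges plus one subset of the possible edges."""
    for mask in product([False, True], repeat=len(possible)):
        chosen = [edge for edge, keep in zip(possible, mask) if keep]
        yield definite + chosen

networks = list(concrete_networks(definite, possible))
# Before any experimental constraints are applied, the smallest network
# is the one with no optional interaction instantiated.
smallest = min(networks, key=len)
print(len(networks), smallest)
```

In the methodology described above, experimental constraints would then filter this set, and minimal networks would be sought among the survivors rather than among all candidates.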
Source publication
Predictive biology is elusive because rigorous, data-constrained, mechanistic models of complex biological systems are difficult to derive and validate. Current approaches tend to construct and examine static interaction network models, which are descriptively rich, but often lack explanatory and predictive power, or dynamic models that can be simu...
Similar publications
A number of real world problems in many domains (e.g. sociology, biology, political science and communication networks) can be modeled as dynamic networks with nodes representing entities of interest and edges representing interactions among the entities at different points in time. A common representation for such models is the snapshot model - wh...
Citations
... The resulting interaction graph is then called a Prior Knowledge Network (PKN). Several recent methods using this approach, including CellNetOptimizer and its evolutions [16] [25], Caspo-ts [26], RE:IN [27], and BRE:IN [28], have shown convincing performance. ...
... The iteration over all possible functions to decide whether to select them or not is not a viable strategy. Therefore, in such decision-based methods, the space of Boolean functions is usually reduced by adding constraints like the minimality properties in Caspo-ts [26], by specifying a template for acceptable functions as in RE:IN [27] and BRE:IN [28], or by rounding off the functions to a subset of their terms [29]. Alternatively, or in addition, to the reduction of the solution space, solutions can be searched for heuristically rather than enumerated as in CellNetOptimizer [16] [25], GAPORE [24], or CGA-CNI [30]. ...
Large amounts of knowledge regarding biological processes are readily available in the literature and aggregated in diverse databases. Boolean networks are powerful tools to render that knowledge into models that can mimic and simulate biological phenomena at multiple scales. Yet, when a model is required to understand or predict the behavior of a biological system in given conditions, existing information often does not completely match this context. Networks built from only prior knowledge can overlook mechanisms, lack specificity, and just partially recapitulate experimental observations. To address this limitation, context-specific data needs to be integrated. However, the brute-force identification of qualitative rules matching these data becomes infeasible as the number of candidates explodes for increasingly complex systems. Here, we used Zhegalkin polynomials to transform this identification into a binary value assignment for exponentially fewer variables, which we addressed with a state-of-the-art SAT solver. We evaluated our implemented method alongside two widely recognized tools, CellNetOptimizer and Caspo-ts, on both artificial toy models and large-scale models based on experimental data from the HPN-DREAM challenge. Our approach demonstrated benchmark-leading capabilities on networks of significant size and intricate complexity. It thus appears promising for the in silico modeling of ever more comprehensive biological systems.
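To make the Zhegalkin representation concrete, the sketch below computes the Zhegalkin polynomial (algebraic normal form) of a Boolean function from its truth table using the standard in-place Möbius transform over XOR. The function name and the bit-ordering convention are assumptions chosen for illustration; the abstract does not describe the authors' encoding, and this is not their implementation.

```python
def zhegalkin_coefficients(truth_table):
    """Return the Zhegalkin (ANF) coefficients of a Boolean function.

    truth_table[i] is f(x), where bit k of index i is the value of variable x_k.
    In the result, coefficient m == 1 means the monomial formed by AND-ing the
    variables whose bits are set in m appears in the XOR sum (m == 0 is the constant 1).
    """
    coeffs = list(truth_table)
    n = len(coeffs)
    assert n > 0 and n & (n - 1) == 0, "truth table length must be a power of two"
    step = 1
    while step < n:
        for i in range(n):
            if i & step:
                # Fold in the entry that differs only in this variable's bit.
                coeffs[i] ^= coeffs[i ^ step]
        step <<= 1
    return coeffs

# Two-variable examples, index order 00, 01, 10, 11 (x1 x0).
or_anf = zhegalkin_coefficients([0, 1, 1, 1])   # x0 XOR x1 XOR x0*x1
and_anf = zhegalkin_coefficients([0, 0, 0, 1])  # x0*x1
xor_anf = zhegalkin_coefficients([0, 1, 1, 0])  # x0 XOR x1
print(or_anf, and_anf, xor_anf)
```

Each coefficient is a single binary unknown, which gives a feel for how choosing an update rule can be phrased as a binary value assignment.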
... Among the approaches for GRN inference, RE:IN [3,4,5,6] is capable of inferring interactions among numerous genes using a Boolean network modelling approach and simulating knockouts of one or two genes. However, RE:IN relied on both knockout experiments and bulk transcriptomic data, lacking single-cell resolution, and needed to include other types of data (ChIP sequencing data, named ChIP-seq, and literature-curated interactions) to constrain the model. ...
Physiological and pathological processes are governed by a network of genes called gene regulatory networks (GRNs). By reconstructing GRNs, we can accurately model how cells behave in their natural state and predict how genetic changes will affect them. Transcriptomic data of single cells are now available for a wide range of cellular processes in multiple species. Thus, a method building predictive GRNs from single-cell RNA sequencing (scRNA-seq) data, without any additional prior knowledge, could have a great impact on our understanding of biological processes and the genes playing a key role in them. To this aim, we developed IGNITE (Inference of Gene Networks using Inverse kinetic Theory and Experiments), an unsupervised machine learning framework designed to infer directed, weighted, and signed GRNs directly from unperturbed single-cell RNA sequencing data. IGNITE uses the GRNs to generate gene expression data upon single and multiple genetic perturbations. IGNITE is based on the inverse problem for a kinetic Ising model, a model from statistical physics that has been successfully applied to biological networks. We tested IGNITE on murine pluripotent stem cells (PSCs) transitioning from the naive to formative states. Using as input only scRNA-seq data of unperturbed PSCs, IGNITE simulated single and triple gene knockouts. Comparison with experimental data revealed high accuracy, up to 74%, outperforming currently available methods. In sum, IGNITE identifies predictive GRNs from scRNA-seq data without additional prior knowledge and faithfully simulates single and multiple gene perturbations. Applications of IGNITE range from studying cell differentiation to identifying genes specifically active under pathological conditions.
... The computational modelling was performed using the reasoning engine for interaction networks (RE:IN) [17,63,96]. This approach supports the modelling of gene networks via Abstract Boolean Networks (ABNs), allowing the specification of partially known networks by specifying certain interactions as definite while other interactions are designated as possible. ...
During embryonic development, naive pluripotent epiblast cells transit to a formative state. The formative epiblast cells form a polarized epithelium, exhibit distinct transcriptional and epigenetic profiles and acquire competence to differentiate into all somatic and germline lineages. However, we have limited understanding of how the transition to a formative state is molecularly controlled. Here we used murine embryonic stem cell models to show that ESRRB is both required and sufficient to activate formative genes. Genetic inactivation of Esrrb leads to illegitimate expression of mesendoderm and extra-embryonic markers, impaired formative expression and failure to self-organize in 3D. Functionally, this results in impaired ability to generate formative stem cells and primordial germ cells in the absence of Esrrb. Computational modelling and genomic analyses revealed that ESRRB occupies key formative genes in naive cells and throughout the formative state. In so doing, ESRRB kickstarts the formative transition, leading to timely and unbiased capacity for multi-lineage differentiation.
... The closest related work on the inference of logical models with the help of model-checking methods is the framework of abstract Boolean Networks (ABN) introduced in Yordanov et al. (2016) and implemented in RE:IN by Goldfeder and Kugler (2019). ABNs are associated with experimental constraints (corresponding to a subclass of dynamical restrictions in our framework), which makes them comparable with data-informed sketches (see the supplement for details). ...
... In our approach, network sketches employ a richer logic allowing significantly more expressive specifications: steady-state behaviour (attractors), advanced reachability (e.g., monotonicity in between measurements in a time series), and a combination of both (e.g., basins of attraction). Crucially, the synthesis process of Yordanov et al. (2016) is limited to a pre-defined set of "patterns" for update functions and is, therefore, not truly exhaustive. ...
Motivation:
The problem of model inference is of fundamental importance to systems biology. Logical models (e.g., Boolean networks; BNs) represent a computationally attractive approach capable of handling large biological networks. The models are typically inferred from experimental data. However, even with a substantial amount of experimental data supported by some prior knowledge, existing inference methods often focus on a small sample of admissible candidate models only.
Results:
We propose Boolean network sketches as a new formal instrument for the inference of Boolean networks. A sketch integrates (typically partial) knowledge about the network's topology and the update logic (obtained through, e.g., a biological knowledge base or a literature search), as well as further assumptions about the properties of the network's transitions (e.g., the form of its attractor landscape), and additional restrictions on the model dynamics given by the measured experimental data. Our new BNs inference algorithm starts with an initial sketch which is extended by adding restrictions representing experimental data to a data-informed sketch and subsequently computes all BNs consistent with the data-informed sketch. Our algorithm is based on a symbolic representation and coloured model-checking. Our approach is unique in its ability to cover a broad spectrum of knowledge and efficiently produce a compact representation of all inferred BNs. We evaluate the method on a non-trivial collection of real-world and simulated data.
Availability:
All software and data are freely available as a reproducible artefact at https://doi.org/10.5281/zenodo.7688740.
Supplementary information:
Supplementary data available online through Bioinformatics.
... This approach (Dunn, 2019; Dunn et al., 2014; Yordanov et al., 2016) offered several advantages over how biological models had been constructed and analyzed previously. Together with its various extensions (Goldfeder and Kugler, 2019a; Goldfeder and Kugler, 2019b; Shavit et al., 2016) and related SMT-based methodologies, these techniques have so far been applied to study stem cell decision making (Dunn et al., 2014), sea urchin development (Paoletti et al., 2014), neuron maturation (Shavit et al., 2016), epidermal commitment (Mishra et al., 2017), genetic motifs and function (Kugler et al., 2018), synthetic biology (Yordanov et al., 2013b), and DNA computing (Yordanov et al., 2013a). ...
... The Reasoning Engine was implemented using the F# programming language (Harrop, 2011; Syme, 2020) with Z3 (de Moura and Bjørner, 2008) as a built-in SMT solver and includes the DSL and reasoning methodologies supporting Reasoning Engine for Interaction Networks, RE:IN (Dunn et al., 2014; Yordanov et al., 2016) (available so far only as a stand-alone tool that is currently not accessible online), RE:SIN (Shavit et al., 2016), and RE:MOTE (Kugler et al., 2018), which were previously unreleased. The resulting library (Yordanov et al., 2023b) can be used to develop novel stand-alone tools and libraries using .NET or can be accessed from within Jupyter (Kluyver et al., 2016) or .NET Interactive (.NET Interactive, 2023) notebooks. ...
... The Reasoning Engine encodes a REIL program as an SMT problem using an approach inspired by Bounded Model Checking (BMC) (Biere et al., 1999; Yordanov et al., 2016). The problem variables from a REIL program are encoded as SMT variables of an appropriate type, together with additional constraints to ensure that all variables are indeed in the specified ranges. ...
We present a framework called the Reasoning Engine, which implements Satisfiability Modulo Theories (SMT) based methods within a unified computational environment to address diverse biological analysis problems. The reasoning engine was used to reproduce results from key scientific studies, as well as supporting new research in stem cell biology. The framework utilizes an intermediate language for encoding partially specified discrete dynamical systems, which bridges the gap between high-level domain specific languages (DSLs) and low-level SMT solvers. We provide this framework as open source together with various biological case studies, illustrating the synthesis, enumeration, optimization and reasoning over models consistent with experimental observations to reveal novel biological insights.
... Constraint solving based methods pose the problem as a series of logical constraints, e.g. that the update functions must be consistent with steady states described in the data. These constraints are typically encoded as Boolean logic equations or in a more abstract formalism such as answer set programming (ASP) (Chevalier et al., 2019, 2020) or satisfiability modulo theories (SMT) problems (Yordanov et al., 2016; Fisher et al., 2015). Specialized solvers then find a set of models which satisfy all the constraints specified by the data and the modeling assumptions. ...
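The steady-state constraint mentioned above can be illustrated with a toy search. The sketch below swaps the ASP or SMT solver for plain brute-force enumeration, which only works at this tiny scale. The two-node network, the candidate update functions, and the observed states are all made-up assumptions for illustration.

```python
from itertools import product

# Hypothetical candidate update functions for a two-node network (x, y).
CANDIDATES = {
    "x AND y": lambda x, y: x and y,
    "x OR y": lambda x, y: x or y,
    "NOT x": lambda x, y: not x,
    "NOT y": lambda x, y: not y,
    "x": lambda x, y: x,
    "y": lambda x, y: y,
}

def consistent_models(steady_states):
    """Return every (f_x, f_y) pair of rule names for which each observed state is a fixed point."""
    models = []
    for fx, fy in product(CANDIDATES, repeat=2):
        if all(
            bool(CANDIDATES[fx](x, y)) == bool(x)
            and bool(CANDIDATES[fy](x, y)) == bool(y)
            for x, y in steady_states
        ):
            models.append((fx, fy))
    return models

# Made-up observations: adding a third steady state narrows the model set.
two_states = consistent_models([(0, 0), (1, 1)])
three_states = consistent_models([(0, 0), (1, 1), (1, 0)])
print(len(two_states), len(three_states), three_states)
```

A dedicated solver explores the same kind of space symbolically instead of listing every candidate, which is what makes larger networks tractable.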
Motivation
Many important processes in biology, such as signaling and gene regulation, can be described using logic models. These logic models are typically built to behaviorally emulate experimentally observed phenotypes, which are assumed to be steady states of a biological system. Most models are built by hand and therefore researchers are only able to consider one or perhaps a few potential mechanisms. We present a method to automatically synthesize Boolean logic models with a specified set of steady states. Our method, called MC-Boomer, is based on Monte Carlo Tree Search (MCTS), an efficient, parallel search method using reinforcement learning. Our approach enables users to constrain the model search space using prior knowledge or biochemical interaction databases, thus leading to generation of biologically plausible mechanistic hypotheses. Our approach can generate very large numbers of data-consistent models. To help develop mechanistic insight from these models, we developed analytical tools for multi-model inference and model selection. These tools reveal the key sets of interactions that govern the behavior of the models.
Results
We demonstrate that MC-Boomer works well at reconstructing randomly generated models. Then, using single time point measurements and reasonable biological constraints, our method generates hundreds of thousands of candidate models that match experimentally validated in-vivo behaviors of the Drosophila segment polarity network. Finally we outline how our multimodel analysis procedures elucidate potentially novel biological mechanisms and provide opportunities for model-driven experimental validation.
Availability
Code is available at: www.github.com/bglazer/mcboomer
... Here we present a first detailed network model for germline stem cells, that explores the specification of the cell fate in C. elegans by means of state-of-the-art formal reasoning synthesis methods, and the reasoning engine for interaction networks tool (RE:IN) (Dunn et al., 2014;Goldfeder and Kugler, 2019;Yordanov et al., 2016). RE:IN is a synthesis-based tool, that is now available as an open source data science framework (the reasoning framework) that supports scalable formal reasoning procedures combined with a user friendly interface to specify interaction network models constrained by experimental results. ...
... To investigate the dynamics of the germline genetic network and the underlying regulatory interactions, we used the reasoning framework (for more information see Yordanov et al., 2016 and the Materials and Methods section). This approach supports the modeling of gene networks via Abstract Boolean Networks (ABN). ...
... For example, if two components g1 and g2 regulate component g, the choice of the regulation condition will determine if both g1 and g2 are required to activate g (AND logical function) or either g1 or g2 are sufficient to activate g (OR logical function). In general, the current 18 regulation conditions supported in the reasoning framework take into account multiple regulators (activators and inhibitors), and define the activation of a gene as a logical function depending on the activity state (active/inactive) of these regulating components (see Yordanov et al., 2016 and Supplementary Material, Figure S1). The regulation conditions distinguish between a case where all, some or none of the activators (inhibitors) are active. ...
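The AND versus OR distinction in the example above can be written out as two tiny update functions. This is only a sketch of the two-activator case described in the text; the function names are hypothetical, and it does not reproduce the 18 regulation conditions of the reasoning framework.

```python
from itertools import product

def update_and(g1, g2):
    """g becomes active only if both regulators g1 and g2 are active."""
    return g1 and g2

def update_or(g1, g2):
    """g becomes active if at least one of g1, g2 is active."""
    return g1 or g2

# Print the next state of g for every combination of regulator states.
for g1, g2 in product([False, True], repeat=2):
    print(g1, g2, "AND:", update_and(g1, g2), "OR:", update_or(g1, g2))
```

The two rules differ only when exactly one regulator is active, which is the kind of situation an experimental observation would need to capture in order to tell them apart.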
Computational methods and tools are a powerful complementary approach to experimental work for studying regulatory interactions in living cells and systems. We demonstrate the use of formal reasoning methods as applied to the Caenorhabditis elegans germ line, which is an accessible system for stem cell research. The dynamics of the underlying genetic networks and their potential regulatory interactions are key for understanding mechanisms that control cellular decision-making between stem cells and differentiation. We model the “stem cell fate” versus entry into the “meiotic development” pathway decision circuit in the young adult germ line based on an extensive study of published experimental data and known/hypothesized genetic interactions. We apply a formal reasoning framework to derive predictive networks for control of differentiation. Using this approach we simultaneously specify many possible scenarios and experiments together with potential genetic interactions, and synthesize genetic networks consistent with all encoded experimental observations. In silico analysis of knock-down and overexpression experiments within our model recapitulates published phenotypes of mutant animals and can be applied to make predictions on cellular decision-making. A methodological contribution of this work is demonstrating how to effectively model within a formal reasoning framework a complex genetic network with a wealth of known experimental data and constraints. We provide a summary of the steps that we have found useful for the development and analysis of this model and that can potentially be applied to other genetic networks. This work also lays a foundation for developing realistic whole tissue models of the C. elegans germ line where each cell in the model will execute a synthesized genetic network.
... In a notable advancement, automated formal reasoning successfully identified a set of minimal GRNs underlying naive pluripotency in mice. Gene expression observations across multiple culture conditions were used to logically constrain possible GRN configurations, and the resulting set was able to accurately predict the outcome of 70% of new experiments [8,9]. Yet, these methods are not based on high-throughput data. ...
The increasing availability of single-cell RNA-sequencing (scRNA-seq) data from various developmental systems provides the opportunity to infer gene regulatory networks (GRNs) directly from data. Herein we describe IQCELL, a platform to infer, simulate, and study executable logical GRNs directly from scRNA-seq data. Such executable GRNs allow simulation of fundamental hypotheses governing developmental programs and help accelerate the design of strategies to control stem cell fate. We first describe the architecture of IQCELL. Next, we apply IQCELL to scRNA-seq datasets from early mouse T-cell and red blood cell development, and show that the platform can infer overall over 74% of causal gene interactions previously reported from decades of research. We will also show that dynamic simulations of the generated GRN qualitatively recapitulate the effects of known gene perturbations. Finally, we implement an IQCELL gene selection pipeline that allows us to identify candidate genes, without prior knowledge. We demonstrate that GRN simulations based on the inferred set yield results similar to the original curated lists. In summary, the IQCELL platform offers a versatile tool to infer, simulate, and study executable GRNs in dynamic biological systems.
... Here we present a first detailed network model for germline stem cells, that explores the specification of the cell fate in C. elegans by means of state-of-the-art formal reasoning synthesis methods, and the reasoning engine for interaction networks tool (RE:IN) [9,10,11]. RE:IN is a synthesis-based tool, that is now available as an open source data science framework (the reasoning framework) that supports scalable formal reasoning procedures combined with a user friendly interface to specify interaction network models constrained by experimental results. Synthesis approaches for biological modeling are becoming an important area of research and applications, see for example [12,13,14,15,16,17,18,19,20] and references within. ...
... To investigate the dynamics of this genetic network and potential regulatory interactions, we used the reasoning framework (for more information see Yordanov et al. [11] and the Materials and Methods section). This approach supports the modeling of gene networks via Abstract Boolean Networks (ABN). ...
... For example, if two components g1 and g2 regulate component g, the choice of the regulation condition will determine if both g1 and g2 are required to activate g (AND logical function) or either g1 or g2 are sufficient to activate g (OR logical function). In general, the current 18 regulation conditions supported in the reasoning framework take into account multiple regulators (activators and inhibitors), and define the activation of a gene as a logical function depending on the activity state (active/inactive) of these regulating components (see [11] and Supplementary Material, Figure S1). The regulation conditions distinguish between a case where all, some or none of the activators (inhibitors) are active and have a property of monotonicity, implying that if a regulated component is active, and one of its activators switches from inactive to active, the regulated component will remain active, and similarly if a regulated component is inactive, and one of its inhibitors switches from inactive to active, the regulated component will remain inactive. ...
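The monotonicity property stated above can be checked exhaustively for small rules. In the sketch below, `is_monotone` and both example rules are hypothetical names introduced for illustration; the rules are not claimed to be among the 18 regulation conditions of the reasoning framework.

```python
from itertools import product

def is_monotone(rule, n_act, n_inh):
    """Check the monotonicity property for rule(activators, inhibitors) -> bool."""
    for acts in product([False, True], repeat=n_act):
        for inhs in product([False, True], repeat=n_inh):
            out = rule(acts, inhs)
            # An active target must stay active when one more activator turns on.
            for i in range(n_act):
                if out and not acts[i]:
                    more = acts[:i] + (True,) + acts[i + 1:]
                    if not rule(more, inhs):
                        return False
            # An inactive target must stay inactive when one more inhibitor turns on.
            for j in range(n_inh):
                if not out and not inhs[j]:
                    more = inhs[:j] + (True,) + inhs[j + 1:]
                    if rule(acts, more):
                        return False
    return True

def all_activators_no_inhibitors(acts, inhs):
    """Active only when every activator is on and every inhibitor is off."""
    return all(acts) and not any(inhs)

def exactly_one_activator(acts, inhs):
    """Deliberately non-monotone: a second active activator switches the target off."""
    return sum(acts) == 1

print(is_monotone(all_activators_no_inhibitors, 2, 1))
print(is_monotone(exactly_one_activator, 2, 1))
```

The second rule fails the check because turning on an additional activator deactivates the target, which is exactly what the property rules out.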
Computational methods and tools are a powerful complementary approach to experimental work for studying regulatory interactions in living cells and systems. We demonstrate the use of formal reasoning methods as applied to the Caenorhabditis elegans germ line, which is an accessible model system for stem cell research. The dynamics of the underlying genetic networks and their potential regulatory interactions are key for understanding mechanisms that control cellular decision-making between stem cells and differentiation. We model the "stem cell fate" versus entry into the "meiotic development" pathway decision circuit in the young adult germ line based on an extensive study of published experimental data and known/hypothesized genetic interactions. We apply a formal reasoning framework to derive predictive networks for control of differentiation. Using this approach we simultaneously specify many possible scenarios and experiments together with potential genetic interactions, and synthesize genetic networks consistent with all encoded experimental observations. In silico analysis of knock-down and overexpression experiments within our model recapitulate published phenotypes of mutant animals and can be applied to make predictions on cellular decision-making. This work lays a foundation for developing realistic whole tissue models of the C. elegans germ line where each cell in the model will execute a synthesized genetic network.
... The first is that of Reactive BNs Figueiredo and Barbosa (2018), which introduces the notion of reactive frames Gabbay and Marcelino (2009a) into BNs. The second one builds upon Abstract BNs Yordanov et al. (2016)-whereby update functions might be partially known-and provides a model checking tool for the verification of network dynamical properties Goldfeder and Kugler (2018). ...
In this work, we explore the properties of a control mechanism exerted on random Boolean networks that takes inspiration from the methylation mechanisms in cell differentiation and consists in progressively freezing (i.e. clamping to 0) some nodes of the network. We study the main dynamical properties of this mechanism both theoretically and in simulation. In particular, we show that when applied to random Boolean networks, it makes it possible to attain dynamics and path dependence typical of biological cells undergoing differentiation.
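A minimal sketch of the freezing mechanism described above, assuming a classical random Boolean network with k inputs per node and synchronous updates. The network size, the freezing schedule (one extra node every five steps), and all function names are assumptions for illustration; the abstract does not specify them.

```python
import random

def make_rbn(n, k, rng):
    """Random Boolean network: each node gets k random inputs and a random truth table."""
    inputs = [rng.sample(range(n), k) for _ in range(n)]
    tables = [[rng.randint(0, 1) for _ in range(2 ** k)] for _ in range(n)]
    return inputs, tables

def step(state, inputs, tables, frozen):
    """One synchronous update; nodes in `frozen` are clamped to 0."""
    new_state = []
    for node, (ins, table) in enumerate(zip(inputs, tables)):
        if node in frozen:
            new_state.append(0)
            continue
        index = 0
        for src in ins:
            # Build the truth-table index from the current input values.
            index = (index << 1) | state[src]
        new_state.append(table[index])
    return new_state

rng = random.Random(0)
n, k = 8, 2
inputs, tables = make_rbn(n, k, rng)
state = [rng.randint(0, 1) for _ in range(n)]

frozen = set()
for node in range(4):        # hypothetical schedule: freeze nodes 0..3 one at a time
    frozen.add(node)
    for _ in range(5):       # let the network run between freezing events
        state = step(state, inputs, tables, frozen)
print(state, sorted(frozen))
```

Because frozen nodes are never released, the order in which nodes are clamped shapes which states remain reachable, which is one simple way to picture the path dependence mentioned in the abstract.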