Figure 4 - uploaded by Hillel Kugler

# Studying the biological program governing myeloid progenitor differentiation. (a) The differentiation of a common myeloid progenitor towards four different blood cell types is considered. (b) The network topology proposed by Krumsiek et al. (c) The set of experimental observations indicates that, starting from the progenitor cellular state (step 0), each state characterizing a different cell type is reached after 20 steps and the system stabilizes (indicated by a star). The megakaryocyte GATA-2 was observed as active in experiments but was inactive in the model from Krumsiek et al. (red box). (d) 15 of the possible interactions were identified as required (solid red and green arrows) and 2 were identified as disallowed (solid black arrows) in the cABN satisfying the constraints in c. (e) If all interactions from the original model in b are considered as definite, the correct expression of megakaryocyte GATA-2 can be achieved by including one of 12 possible interactions. (f) The experimental constraints are modified to specify that the cell-fate decision is made in response to whether the hypothetical signals X and Y are present or not. (g) Two minimal models are identified when considering the hypothetical signals. Three novel interactions (signal X activating Fli1, signal Y activating EKLF and Fli1 activating GATA-2) appear in both models. In the first minimal model Y represses Gfi1, while in the second this signal activates cjun.

Source publication

Predictive biology is elusive because rigorous, data-constrained, mechanistic models of complex biological systems are difficult to derive and validate. Current approaches tend to construct and examine static interaction network models, which are descriptively rich, but often lack explanatory and predictive power, or dynamic models that can be simu...

## Contexts in source publication

**Context 1**

... the first minimal model Y represses Gfi1, while in the second this signal activates cjun. Myeloid progenitor differentiation To model myeloid progenitor differentiation (Figure 4a), Krumsiek et al. [12] constructed an asynchronous BN of 11 regulators and 28 interactions based on the literature (Figure 4b). By directly exploring the 2^11 = 2,048 nodes of the state-transition graph, four stable states (attractors) were shown to be reachable from a common progenitor state. ...
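The exhaustive exploration described in this excerpt can be illustrated on a much smaller system. The following is a minimal sketch, assuming a hypothetical three-component toy network (two mutually repressing components and one downstream target); it is not the Krumsiek et al. model, and the function names are illustrative only. It enumerates every state, lists the stable states, and computes which states are reachable under asynchronous updates.

```python
from itertools import product
from collections import deque

N = 3  # toy size (assumption); the network in the text has 11 components, i.e. 2**11 states


def target(state):
    """Value each component would take if updated (hypothetical toy rules)."""
    a, b, c = state
    return (1 - b, 1 - a, a)  # A and B repress each other, A activates C


def async_successors(state):
    """Asynchronous update: exactly one component that disagrees with its target flips."""
    t = target(state)
    return [state[:i] + (t[i],) + state[i + 1:] for i in range(N) if t[i] != state[i]]


def stable_states():
    """States in which no component wants to change (fixed-point attractors)."""
    return [s for s in product((0, 1), repeat=N) if target(s) == s]


def reachable(start):
    """Breadth-first search over the asynchronous state-transition graph."""
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for nxt in async_successors(current):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


if __name__ == "__main__":
    print(stable_states())
    print(sorted(reachable((0, 0, 0))))
```

In this toy case both stable states are reachable from the all-inactive state, which loosely mirrors several fates being reachable from one progenitor state. Direct enumeration of this kind scales as 2^n in the number of components.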

**Context 3**

... first studied this proposed network topology (Figure 4b). The specified update functions named regulators for each component, and so we instead applied our regulation conditions, assuming at least one activator is required for component activation (Figure 2). ...

**Context 4**

... specified update functions named regulators for each component, and so we instead applied our regulation conditions, assuming at least one activator is required for component activation (Figure 2). We employed an asynchronous update strategy, and used the gene expression patterns of the 5 cell types as observations (Figures 4a and c). RE:IN identified that these constraints are satisfiable, despite our use of potentially different regulation rules. ...

**Context 5**

... we correct the constraint that GATA-2 is active in megakaryocytes, as observed experimentally [12], no consistent models exist. This is not the case if every interaction is marked as possible, and under this scenario we identified that to reproduce the observed behavior, 15 interactions are required and 2 are disallowed (Figure 4d). However, previous experimental evidence supports the inclusion of these two disallowed interactions. ...

**Context 6**

... investigate this, we constructed an ABN by setting the interactions from Krumsiek et al. as definite and adding all other interactions (activation and repression between each pair of components) as possible. Identifying the minimal networks in this case reveals that the observations can be reproduced with only one additional interaction (Figure 4e). Our results suggest 12 candidate interactions, at least 3 of which (Fli1 to GATA-2, SCL to GATA-2, Gfi1 to GATA-1) are consistent with interactions reported elsewhere. ...
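The idea of fixing some interactions as definite, marking others as possible, and then looking for the smallest set of additions that satisfies an observation can be sketched by brute force. Everything below is a hypothetical toy: the components `S`, `A`, `B`, the interaction lists, the single regulation rule (active if at least one activator is active and no repressor is active), and the observation are assumptions for illustration. RE:IN itself uses SMT-based reasoning rather than this kind of enumeration.

```python
from itertools import combinations

COMPONENTS = ("S", "A", "B")  # hypothetical components
# Interactions are (source, target, sign); "+" activates, "-" represses.
DEFINITE = {("S", "S", "+"), ("A", "B", "+")}
POSSIBLE = [("S", "A", "+"), ("S", "B", "+"), ("B", "A", "-")]


def step(state, interactions):
    """One synchronous update under a single assumed regulation rule."""
    nxt = {}
    for comp in COMPONENTS:
        activators = [s for s, t, sign in interactions if t == comp and sign == "+"]
        repressors = [s for s, t, sign in interactions if t == comp and sign == "-"]
        active = any(state[a] for a in activators) and not any(state[r] for r in repressors)
        nxt[comp] = int(active)
    return nxt


def satisfies(interactions):
    """Toy observation: starting with only S active, B is active at step 2 and the state is stable."""
    state = {"S": 1, "A": 0, "B": 0}
    for _ in range(2):
        state = step(state, interactions)
    return state["B"] == 1 and step(state, interactions) == state


def minimal_models():
    """Try additions in order of increasing size; return all satisfying sets of the smallest size."""
    for k in range(len(POSSIBLE) + 1):
        found = [set(extra) for extra in combinations(POSSIBLE, k)
                 if satisfies(DEFINITE | set(extra))]
        if found:
            return found
    return []


if __name__ == "__main__":
    print(minimal_models())
```

Here the definite interactions alone cannot reproduce the observation, while either of two single additions can, so the toy has two minimal models of size one.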

**Context 7**

... alternative approach, consistent with our view of biological programs, would be to describe this decision as the result of the deterministic information processing of a number of inputs (e.g., cytokines) that regulate haematopoiesis [25]. To illustrate this, we considered two hypothetical signals (X and Y) that deterministically specify cell fate (Figure 4f), and employed synchronous updates. Once set, the signals remain unchanged, but their effects can propagate throughout the network over a number of updates. ...

**Context 8**

... set, the signals remain unchanged, but their effects can propagate throughout the network over a number of updates. With no prior knowledge of how such signals could input to the network, we included a possible positive and negative interaction from each signal to every component of the network, while again considering all original interactions as definite, and the 12 interactions from Figure 4e as possible. We then identified that there are only two minimal models (Figure 4g). ...

**Context 9**

... no prior knowledge of how such signals could input to the network, we included a possible positive and negative interaction from each signal to every component of the network, while again considering all original interactions as definite, and the 12 interactions from Figure 4e as possible. We then identified that there are only two minimal models (Figure 4g). In both, Fli1 activates GATA-2, and signals X and Y activate Fli1 and EKLF, respectively. ...

**Context 10**

... we considered deterministic myeloid differentiation with signals X and Y (Figure 4f). Analysis using caspo led to memory errors, potentially caused by the complexity of this system. ...

**Context 11**

... using caspo led to memory errors, potentially caused by the complexity of this system. Therefore we simplified the ABN by preserving only 2 of the additional possible interactions (Figure 4e, SCL and Fli each activate GATA2) and considered all interactions between X and Y and the four components EKLF, Fli1, cjun and Gfi1 as possible (Supplementary Figure S1). ...

**Context 12**

... on this reduced model, brute-force simulation failed to identify a single valid model in over 5 days of computation, while RE:IN identified 2 minimal models in ~7 s (Figure 4g). In contrast, caspo identified 264 minimal models in about 5 s. ...

**Context 13**

... property can be used to model self-degrading (self-activating) signals, which are active (inactive) only during the initial state of an experiment (in the case of the yeast cell cycle model, we used the delayed threshold rule for this purpose). To ensure that an input signal is sustained throughout each experiment (either as active or inactive depending on the initial value) we include a single definite self-activation (Supplementary Figure S4c). Alternatively, an oscillating signal can be modeled by including a single definite self-repression interaction (Supplementary Figure S4d). ...

**Context 14**

... ensure that an input signal is sustained throughout each experiment (either as active or inactive depending on the initial value) we include a single definite self-activation (Supplementary Figure S4c). Alternatively, an oscillating signal can be modeled by including a single definite self-repression interaction (Supplementary Figure S4d). ...
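The effect of a single definite self-interaction on an input signal can be shown with a few lines of simulation. This is a minimal sketch under an assumed simplification: the signal has no other regulators, a self-activation copies the current value forward, and a self-repression inverts it. The function name `trajectory` is illustrative and does not come from the source.

```python
def trajectory(initial, self_sign, steps=4):
    """Simulate a signal whose only regulator is itself.

    self_sign "+" models a definite self-activation (value is sustained);
    self_sign "-" models a definite self-repression (value alternates).
    """
    values = [initial]
    for _ in range(steps):
        current = values[-1]
        values.append(current if self_sign == "+" else 1 - current)
    return values


if __name__ == "__main__":
    print(trajectory(1, "+"))  # sustained active signal
    print(trajectory(0, "+"))  # sustained inactive signal
    print(trajectory(1, "-"))  # oscillating signal
```

With self-activation the signal keeps whatever value it was given initially, while with self-repression it alternates on every update.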

## Similar publications

A number of real world problems in many domains (e.g. sociology, biology, political science and communication networks) can be modeled as dynamic networks with nodes representing entities of interest and edges representing interactions among the entities at different points in time. A common representation for such models is the snapshot model - wh...

## Citations

... The computational modelling was performed using the reasoning engine for interaction networks (RE:IN) 17,63,96 . This approach supports the modelling of gene networks via Abstract Boolean Networks (ABNs), allowing the specification of partially known networks by specifying certain interactions as definite while other interactions are designated as possible. ...

During embryonic development, naive pluripotent epiblast cells transit to a formative state. The formative epiblast cells form a polarized epithelium, exhibit distinct transcriptional and epigenetic profiles and acquire competence to differentiate into all somatic and germline lineages. However, we have limited understanding of how the transition to a formative state is molecularly controlled. Here we used murine embryonic stem cell models to show that ESRRB is both required and sufficient to activate formative genes. Genetic inactivation of Esrrb leads to illegitimate expression of mesendoderm and extra-embryonic markers, impaired formative expression and failure to self-organize in 3D. Functionally, this results in impaired ability to generate formative stem cells and primordial germ cells in the absence of Esrrb. Computational modelling and genomic analyses revealed that ESRRB occupies key formative genes in naive cells and throughout the formative state. In so doing, ESRRB kickstarts the formative transition, leading to timely and unbiased capacity for multi-lineage differentiation.

... The closest related work on the inference of logical models with the help of model-checking methods is the framework of abstract Boolean Networks (ABN) introduced in Yordanov et al. (2016) and implemented in RE:IN by Goldfeder and Kugler (2019). ABNs are associated with experimental constraints (corresponding to a subclass of dynamical restrictions in our framework), which makes them comparable with data-informed sketches (see the supplement for details). ...

... In our approach, network sketches employ a richer logic allowing significantly more expressive specifications: steady-state behaviour (attractors), advanced reachability (e.g., monotonicity in between measurements in a time series), and a combination of both (e.g., basins of attraction). Crucially, the synthesis process of Yordanov et al. (2016) is limited to a pre-defined set of "patterns" for update functions and is, therefore, not truly exhaustive. ...

Motivation:
The problem of model inference is of fundamental importance to systems biology. Logical models (e.g., Boolean networks; BNs) represent a computationally attractive approach capable of handling large biological networks. The models are typically inferred from experimental data. However, even with a substantial amount of experimental data supported by some prior knowledge, existing inference methods often focus on a small sample of admissible candidate models only.
Results:
We propose Boolean network sketches as a new formal instrument for the inference of Boolean networks. A sketch integrates (typically partial) knowledge about the network's topology and the update logic (obtained through, e.g., a biological knowledge base or a literature search), as well as further assumptions about the properties of the network's transitions (e.g., the form of its attractor landscape), and additional restrictions on the model dynamics given by the measured experimental data. Our new BNs inference algorithm starts with an initial sketch which is extended by adding restrictions representing experimental data to a data-informed sketch and subsequently computes all BNs consistent with the data-informed sketch. Our algorithm is based on a symbolic representation and coloured model-checking. Our approach is unique in its ability to cover a broad spectrum of knowledge and efficiently produce a compact representation of all inferred BNs. We evaluate the method on a non-trivial collection of real-world and simulated data.
Availability:
All software and data are freely available as a reproducible artefact at https://doi.org/10.5281/zenodo.7688740.
Supplementary information:
Supplementary data available online through Bioinformatics.

... genetic interactions between various transcription factors) that had been uncovered through years of painstaking experimentation, together with various constraints encoding experimental observations about the activity of genes and transcription factors in different chemical contexts. By encoding computational analysis queries as Satisfiability Modulo Theories (SMT) problems, solvable using off-the-shelf tools [4,33], the resulting methodology [41] allowed consistent models to be synthesized, and the behavior of a system under untested experimental conditions to be predicted [14]. ...

... This approach [12,14,15,41] offered several advantages over how biological models had been constructed and analyzed previously. Together with its various extensions [17,38] and related SMT-based methodologies, these techniques have so far been applied to study stem cell decision-making [13,14], sea urchin development [34], neuron maturation [38], epidermal commitment [32], genetic motifs and function [11,26], synthetic biology [42] and DNA computing [43]. ...

... The Reasoning Engine was implemented using F# [20,40] with Z3 [33] as a built-in SMT-solver and includes the DSL and reasoning methodologies supporting RE:IN [14,41] (available so far only as a stand-alone tool), RE:SIN [38] and RE:MOTE [11,26], which were previously unreleased. The resulting library [44] can be used to develop novel stand-alone tools and libraries using .NET or can be accessed from within Jupyter [22] or .NET Interactive [1] notebooks. ...

We present a framework called the Reasoning Engine, which implements Satisfiability Modulo Theories (SMT) based methods within a unified computational environment to address diverse biological analysis problems. The reasoning engine was used to reproduce results from key scientific studies, as well as supporting new research in stem cell biology. The framework utilizes an intermediate language for encoding partially specified discrete dynamical systems, which bridges the gap between high-level domain specific languages (DSLs) and low-level SMT solvers. We provide this framework as open source together with various biological case studies, illustrating the synthesis, enumeration, optimization and reasoning over models consistent with experimental observations to reveal novel biological insights.

... Constraint solving based methods pose the problem as a series of logical constraints, e.g. that the update functions must be consistent with steady states described in the data. These constraints are typically encoded as Boolean logic equations or in a more abstract formalism such as answer set programming (ASP) (Chevalier et al., 2019, 2020) or satisfiability modulo theories (SMT) problems (Yordanov et al., 2016; Fisher et al., 2015). Specialized solvers then find a set of models which satisfy all the constraints specified by the data and the modeling assumptions. ...

Motivation
Many important processes in biology, such as signaling and gene regulation, can be described using logic models. These logic models are typically built to behaviorally emulate experimentally observed phenotypes, which are assumed to be steady states of a biological system. Most models are built by hand and therefore researchers are only able to consider one or perhaps a few potential mechanisms. We present a method to automatically synthesize Boolean logic models with a specified set of steady states. Our method, called MC-Boomer, is based on Monte Carlo Tree Search (MCTS), an efficient, parallel search method using reinforcement learning. Our approach enables users to constrain the model search space using prior knowledge or biochemical interaction databases, thus leading to generation of biologically plausible mechanistic hypotheses. Our approach can generate very large numbers of data-consistent models. To help develop mechanistic insight from these models, we developed analytical tools for multi-model inference and model selection. These tools reveal the key sets of interactions that govern the behavior of the models.
Results
We demonstrate that MC-Boomer works well at reconstructing randomly generated models. Then, using single time point measurements and reasonable biological constraints, our method generates hundreds of thousands of candidate models that match experimentally validated in-vivo behaviors of the Drosophila segment polarity network. Finally we outline how our multimodel analysis procedures elucidate potentially novel biological mechanisms and provide opportunities for model-driven experimental validation.
Availability
Code is available at: www.github.com/bglazer/mcboomer

... Here we present a first detailed network model for germline stem cells, that explores the specification of the cell fate in C. elegans by means of state-of-the-art formal reasoning synthesis methods, and the reasoning engine for interaction networks tool (RE:IN) (Dunn et al., 2014;Goldfeder and Kugler, 2019;Yordanov et al., 2016). RE:IN is a synthesis-based tool, that is now available as an open source data science framework (the reasoning framework) that supports scalable formal reasoning procedures combined with a user friendly interface to specify interaction network models constrained by experimental results. ...

... To investigate the dynamics of the germline genetic network and the underlying regulatory interactions, we used the reasoning framework (for more information see Yordanov et al., 2016 and the Materials and Methods section). This approach supports the modeling of gene networks via Abstract Boolean Networks (ABN). ...

... For example, if two components g1 and g2 regulate component g, the choice of the regulation condition will determine if both g1 and g2 are required to activate g (AND logical function) or either g1 or g2 are sufficient to activate g (OR logical function). In general, the current 18 regulation conditions supported in the reasoning framework take into account multiple regulators (activators and inhibitors), and define the activation of a gene as a logical function depending on the activity state (active/inactive) of these regulating components (see Yordanov et al., 2016 and Supplementary Material, Figure S1). The regulation conditions distinguish between a case where all, some or none of the activators (inhibitors) are active. ...

Computational methods and tools are a powerful complementary approach to experimental work for studying regulatory interactions in living cells and systems. We demonstrate the use of formal reasoning methods as applied to the Caenorhabditis elegans germ line, which is an accessible system for stem cell research. The dynamics of the underlying genetic networks and their potential regulatory interactions are key for understanding mechanisms that control cellular decision-making between stem cells and differentiation. We model the “stem cell fate” versus entry into the “meiotic development” pathway decision circuit in the young adult germ line based on an extensive study of published experimental data and known/hypothesized genetic interactions. We apply a formal reasoning framework to derive predictive networks for control of differentiation. Using this approach we simultaneously specify many possible scenarios and experiments together with potential genetic interactions, and synthesize genetic networks consistent with all encoded experimental observations. In silico analysis of knock-down and overexpression experiments within our model recapitulate published phenotypes of mutant animals and can be applied to make predictions on cellular decision-making. A methodological contribution of this work is demonstrating how to effectively model within a formal reasoning framework a complex genetic network with a wealth of known experimental data and constraints. We provide a summary of the steps we have found useful for the development and analysis of this model and can potentially be applicable to other genetic networks. This work also lays a foundation for developing realistic whole tissue models of the C. elegans germ line where each cell in the model will execute a synthesized genetic network.

... Here we present a first detailed network model for germline stem cells, that explores the specification of the cell fate in C. elegans by means of state-of-the-art formal reasoning synthesis methods, and the reasoning engine for interaction networks tool (RE:IN) [9,10,11]. RE:IN is a synthesis-based tool, that is now available as an open source data science framework (the reasoning framework) that supports scalable formal reasoning procedures combined with a user friendly interface to specify interaction network models constrained by experimental results. Synthesis approaches for biological modeling are becoming an important area of research and applications, see for example [12,13,14,15,16,17,18,19,20] and references within. ...

... To investigate the dynamics of this genetic network and potential regulatory interactions, we used the reasoning framework (for more information see Yordanov et al. [11] and the Materials and Methods section). This approach supports the modeling of gene networks via Abstract Boolean Networks (ABN). ...

... For example, if two components g1 and g2 regulate component g, the choice of the regulation condition will determine if both g1 and g2 are required to activate g (AND logical function) or either g1 or g2 are sufficient to activate g (OR logical function). In general, the current 18 regulation conditions supported in the reasoning framework take into account multiple regulators (activators and inhibitors), and define the activation of a gene as a logical function depending on the activity state (active/inactive) of these regulating components (see [11] and Supplementary Material, Figure S1). The regulation conditions distinguish between a case where all, some or none of the activators (inhibitors) are active and have a property of monotonicity, implying that if a regulated component is active, and one of its activators switches from inactive to active, the regulated component will remain active, and similarly if a regulated component is inactive, and one of its inhibitors switches from inactive to active, the regulated component will remain inactive. ...
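The AND/OR distinction and the monotonicity property described in this excerpt can be checked directly on two-activator rules. The sketch below is illustrative only: the function names are hypothetical, it covers activators but not inhibitors, and it does not reproduce the 18 regulation conditions of the reasoning framework. XOR is included purely as a counterexample that fails the monotonicity check.

```python
from itertools import product


def and_rule(g1, g2):
    """g is active only if both activators are active."""
    return int(g1 and g2)


def or_rule(g1, g2):
    """g is active if either activator is active."""
    return int(g1 or g2)


def xor_rule(g1, g2):
    """Counterexample: not a monotone regulation rule."""
    return g1 ^ g2


def is_monotone_in_activators(rule):
    """Check that switching an activator from inactive to active never deactivates g."""
    for g1, g2 in product((0, 1), repeat=2):
        base = rule(g1, g2)
        for raised in ((1, g2), (g1, 1)):
            if base == 1 and rule(*raised) == 0:
                return False
    return True


if __name__ == "__main__":
    for rule in (and_rule, or_rule, xor_rule):
        print(rule.__name__, is_monotone_in_activators(rule))
```

Both the AND and OR rules pass the check, whereas XOR fails because activating the second activator while the first is already active switches the target off.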

Computational methods and tools are a powerful complementary approach to experimental work for studying regulatory interactions in living cells and systems. We demonstrate the use of formal reasoning methods as applied to the Caenorhabditis elegans germ line, which is an accessible model system for stem cell research. The dynamics of the underlying genetic networks and their potential regulatory interactions are key for understanding mechanisms that control cellular decision-making between stem cells and differentiation. We model the "stem cell fate" versus entry into the "meiotic development" pathway decision circuit in the young adult germ line based on an extensive study of published experimental data and known/hypothesized genetic interactions. We apply a formal reasoning framework to derive predictive networks for control of differentiation. Using this approach we simultaneously specify many possible scenarios and experiments together with potential genetic interactions, and synthesize genetic networks consistent with all encoded experimental observations. In silico analysis of knock-down and overexpression experiments within our model recapitulate published phenotypes of mutant animals and can be applied to make predictions on cellular decision-making. This work lays a foundation for developing realistic whole tissue models of the C. elegans germ line where each cell in the model will execute a synthesized genetic network.

... The first is that of Reactive BNs Figueiredo and Barbosa (2018), which introduces the notion of reactive frames Gabbay and Marcelino (2009a) into BNs. The second one builds upon Abstract BNs Yordanov et al. (2016)-whereby update functions might be partially known-and provides a model checking tool for the verification of network dynamical properties Goldfeder and Kugler (2018). ...

In this work, we explore the properties of a control mechanism exerted on random Boolean networks that takes inspiration from the methylation mechanisms in cell differentiation and consists in progressively freezing (i.e. clamping to 0) some nodes of the network. We study the main dynamical properties of this mechanism both theoretically and in simulation. In particular, we show that when applied to random Boolean networks, it makes it possible to attain dynamics and path dependence typical of biological cells undergoing differentiation.

... Gene expression observations across multiple culture conditions were used to logically constrain possible GRN configurations, and the resulting set was able to accurately predict the outcome of 70% of new experiments (S.-J. Dunn et al., 2014; Yordanov et al., 2016). Yet, these methods are not based on high-throughput data. ...

The increasing availability of single-cell RNA-sequencing (scRNA-seq) data from various developmental systems provides the opportunity to infer gene regulatory networks (GRNs) directly from data. Herein we describe IQCELL, a platform to infer, simulate, and study executable logical GRNs directly from scRNA-seq data. Such executable GRNs provide an opportunity to inform fundamental hypotheses in developmental programs and help accelerate the design of stem cell-based technologies. We first describe the architecture of IQCELL. Next, we apply IQCELL to a scRNA-seq dataset of early mouse T-cell development and show that it can infer a priori over 75% of causal gene interactions previously reported via decades of research. We will also show that dynamic simulations of the derived GRN qualitatively recapitulate the effects of the known gene perturbations on the T-cell developmental trajectory. IQCELL is applicable to many developmental systems and offers a versatile tool to infer, simulate, and study GRNs in biological systems. (https://gitlab.com/stemcellbioengineering/iqcell)

... Gene expression observations across multiple culture conditions were used to logically constrain possible GRN configurations, and the resulting set was able to accurately predict the outcome of 70% of new experiments (S.-J. Dunn et al., 2014; Yordanov et al., 2016). Yet, these methods are not based on high-throughput data. ...

The increasing availability of single-cell RNA-sequencing (scRNA-seq) data from various developmental systems provides the opportunity to infer gene regulatory networks (GRNs) directly from data. Herein we describe IQCELL, a platform to infer, simulate, and study executable logical GRNs directly from scRNA-seq data. Such executable GRNs provide an opportunity to inform fundamental hypotheses in developmental programs and help accelerate the design of stem cell-based technologies. We first describe the architecture of IQCELL. Next, we apply IQCELL to a scRNA-seq dataset of early mouse T-cell development and show that it can infer a priori over 75% of causal gene interactions previously reported via decades of research. We will also show that dynamic simulations of the derived GRN qualitatively recapitulate the effects of the known gene perturbations on the T-cell developmental trajectory. IQCELL is applicable to many developmental systems and offers a versatile tool to infer, simulate, and study GRNs in biological systems.

... One of the prominent areas of application of SAT and SMT solvers is modeling and understanding gene regulatory networks (GRN) [28,33,49,63,70,89]. The gene regulatory network question is encoded either directly as a Boolean satisfiability problem or in addition with combinations of background theories. ...

... Whereas, [28] uses a symbolic, SAT-based approach on the Boolean network model. [33,89] presents an SMT based encoding of the problem based on the presented Boolean model. ...

Vesicle traffic systems (VTSs) transport cargo among the intracellular compartments of eukaryotic cells. The compartments are viewed as nodes that are labeled by their chemical identity and the transport vesicles are similarly viewed as labeled edges between the nodes. Several interesting questions about VTSs translate to combinatorial search and synthesis problems. We present novel encodings for the problems based on Boolean satisfiability (SAT), satisfiability modulo theories and quantified Boolean formula of the properties over vesicle traffic systems. We have implemented the presented encodings in a tool that searches for the networks that satisfy properties related to transport consistency conditions using these solvers. In our numerical experiments, we show that our tool can search for networks of sizes that are relevant to real cellular systems. Our work illustrates the potential of novel biological applications of SAT solving technology.