Rybiński et al. BMC Bioinformatics (2020) 21:34
https://doi.org/10.1186/s12859-020-3343-y
SOFTWARE Open Access
TopoFilter: a MATLAB package for mechanistic model identification in systems biology
Mikołaj Rybiński1,2, Simon Möller1, Mikael Sunnåker1, Claude Lormeau1,3 and Jörg Stelling1*
Abstract
Background: To develop mechanistic dynamic models in systems biology, one often needs to identify all (or
minimal) representations of the biological processes that are consistent with experimental data, out of a potentially
large set of hypothetical mechanisms. However, a simple enumeration of all alternatives quickly becomes intractable
when the number of model parameters grows. Selecting appropriate dynamic models out of a large ensemble of
models, taking the uncertainty in our biological knowledge and in the experimental data into account, is therefore a
key current problem in systems biology.
Results: The TopoFilter package addresses this problem in a heuristic and automated fashion by implementing the
previously described topological filtering method for Bayesian model selection. It includes a core heuristic for
searching the space of submodels of a parametrized model, coupled with a sampling-based exploration of the
parameter space. Recent developments of the method make it possible to balance exhaustiveness and speed of the model
space search, to efficiently re-sample parameters, to parallelize the search, and to use custom scoring functions. We
use a theoretical example to motivate these features and then demonstrate TopoFilter’s applicability for a yeast
signaling network with more than 250’000 possible model structures.
Conclusions: TopoFilter is a flexible software framework that makes Bayesian model selection and reduction efficient
and scalable to network models of a complexity that represents contemporary problems in, for example, cell
signaling. TopoFilter is open-source, available under the GPL-3.0 license at https://gitlab.com/csb.ethz/TopoFilter. It
includes installation instructions, a quickstart guide, a description of all package options, and multiple examples.
Keywords: Ensemble modeling, Bayesian model selection, Topological filtering, Signal transduction
*Correspondence: joerg.stelling@bsse.ethz.ch
1Department of Biosystems Science and Engineering and SIB Swiss Institute of Bioinformatics, ETH Zurich, Mattenstr. 26, 4058 Basel, Switzerland
Full list of author information is available at the end of the article
© The Author(s). 2020 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
Background
Uncertainty poses a key challenge for developing predictive models in systems biology [1]. One aspect of this challenge, parameter inference for systems biology models, has seen important progress in the development and implementation of computational methods that scale to real-world problems [2,3]. In particular, given that systems biology model parameters are often not uniquely identifiable with the available experimental data, ensemble modeling approaches have gained attention. They represent quantitative uncertainties of biology not by a single parametrization of a model, but by ensembles of parameter values, for applications in areas such as cell signaling [4,5] and metabolic network analysis [6,7]. The corresponding methods differ in important details, such as how parameter ensembles are generated; constrained multi-objective optimization [8] and random sampling [9] are possibilities. Bayesian methods such as Approximate Bayesian Computation (ABC) [10–12], a simulation-based method for approximating the Bayesian posterior in parameter space and thereby systematically quantifying uncertainties (in model parameters) [4], are key techniques for model analysis in this context.
These approaches, however, address only one part of the problem in that they assume the underlying network to be uniquely determined. Often, the mechanisms of interactions (the model topology) are also uncertain and need to be identified by combining known mechanisms, biological hypotheses, and experimental data. This is a pertinent problem, for instance, for cell signaling studies [13]. If there are few competing hypotheses on mechanisms, and hence few possible model topologies, they can be enumerated and, for example, one can apply ABC to each model topology to select the topology that is most consistent with the data [10,14–17]. Such Bayesian model selection has been successful for elucidating mechanisms of mammalian epidermal growth factor (EGF) [18] and target of rapamycin (TOR) [19] signaling and of gene regulatory networks in yeast nutrient sensing [20].
With many biological hypotheses, however, the number of possible model topologies explodes in a combinatorial fashion, making enumeration infeasible. To perform model selection in hypothesis spaces with hundreds or thousands of alternatives without full enumeration, three classes of approaches have been proposed. First, it is possible to use simpler, qualitative models to represent alternative biological hypotheses [21,22], but in this case the quantitative characteristics of the modeled system are not represented. A second option is to combine efficient search in the space of model topologies, by formulating a mixed integer optimization problem [23] or by using heuristics to generate candidate topologies [24], with optimization-based parameter estimation. However, this leads to single point estimates for model parameters that do not necessarily reflect the parameter uncertainty, and criteria for model selection that one can use with such point estimates are only justified asymptotically for large numbers of data points, which is rarely the case [25]. The third alternative, ABC for model selection, circumvents these limitations by sampling parameters and topologies (which are again encoded as integers) jointly [10,15,16], but with high computational effort and very limited scalability. In particular, these ABC-based methods do not exploit that candidate topologies may be related to each other, which prevents the re-use, between model topologies, of samples that require costly model simulations.
To enable more efficient and scalable Bayesian model selection, we previously proposed a method termed topological filtering [26]. While the original method constituted a first assessment of the basic idea, here we describe an implementation in the TopoFilter package that generalizes to a variety of applications in systems and synthetic biology, makes the method directly usable in the form of a well-documented toolbox, and includes new features compared to the version in [26].
Implementation
Principle of topological filtering
Biochemical reaction networks, composed of species and reactions that couple them, are the key basis for developing (dynamic) systems biology models. To capture how $l$ molecular species interact via $m$ reactions, we consider a parametric model $\mathcal{M}(p)$ with $d \geq m$ real-valued parameters $p$ in a bounded parameter space $\mathcal{P}$. Often in systems biology applications, such a parametric model is given in the form of a system of ordinary differential equations (ODEs):
$$\frac{dx(t)}{dt} = S \cdot v(x(t); p), \quad x(0) = x_0,$$
where the state vector $x(t) \geq 0$ is a time-dependent vector of concentrations of the $l$ species. The time-invariant stoichiometric matrix $S$, which captures how the $l$ species interact via the $m$ reactions, and the reaction rate laws encoded in the non-negative vector function $v$, which depends on the $d \geq m$ parameters $p$, together define the topology of the model $\mathcal{M}(p)$.
The model generates predictions $y(p)$ corresponding to experimental data $y_0$ with known measurement errors $\sigma$. A scoring function $s$ decides whether, for a fixed $p$, $\mathcal{M}(p)$ is viable, that is, whether it describes the data sufficiently well. Correspondingly, we define the viable subspace of the parameter space as
$$\tilde{\mathcal{P}} = \{\, p \in \mathcal{P} \mid s(p; y_0) \leq s_0(y_0) \,\},$$
where $s_0$ is a model-independent viability threshold (Fig. 1a).
To identify model topologies that are consistent with the data, topological filtering defines a root model based on a network that includes the confirmed reactions as well as all hypothetical reactions, and then finds viable reductions of this model. The key idea is to re-formulate model reductions as projections in parameter space (Fig. 1a). A rate law of a biochemical reaction $j \leq m$ is of the form $v_j(x; p) = p_j \cdot r_j(x; p_{m+1}, \ldots, p_d)$, with $x \geq 0$ the species concentrations and $r_j$ scalar functions corresponding to the concentration-dependent terms [27]. By projecting a multiplicative kinetic constant $p_j$ to $p_j = 0$, we eliminate reaction $j$. Additional parameters $p_{m+1}, \ldots, p_d$ may be projected to different values. For example, one could project a Michaelis-Menten constant to infinity to remove an enzyme-catalyzed reaction.
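As a minimal illustration of reductions as projections, the following Python sketch applies both kinds of projection to a toy network. The two-reaction network, the function names `rate_vector` and `project`, and all numerical values are hypothetical and are not part of TopoFilter.

```python
import math


def rate_vector(x, p):
    """Toy rate laws of the form v_j = p_j * r_j(x; p_3) (hypothetical network).

    Reaction 1 converts A to B with mass-action kinetics; reaction 2 converts
    B back to A with Michaelis-Menten kinetics and constant p[2].
    """
    a, b = x
    v1 = p[0] * a
    v2 = p[1] * b / (p[2] + b)
    return [v1, v2]


def project(p, projections):
    """Return a copy of p with the given indices set to their projection values."""
    q = list(p)
    for index, value in projections.items():
        q[index] = value
    return q


x = [2.0, 1.0]
p = [0.5, 1.0, 2.0]
full = rate_vector(x, p)                                # both reactions active
without_r1 = rate_vector(x, project(p, {0: 0.0}))       # kinetic constant projected to zero
without_r2 = rate_vector(x, project(p, {2: math.inf}))  # Michaelis-Menten constant projected to infinity
```

In both reduced cases the rate of the targeted reaction vanishes while the other reaction is unchanged, so a submodel can be evaluated with the same simulation code as the root model.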
For the confirmed reactions in the root model, which represent well-established mechanisms, the associated parameters are non-removable (while they may assume different values, they cannot be projected). For the hypothetical reactions, we consider projections of any subset of their associated $\tilde{d} \leq d$ parameters to given projection values, and each of these subsets defines a submodel. Because the values of all $d$ parameters are uncertain (parameters of known values are not part of the problem), we not only need to search the topology space of $2^{\tilde{d}}$ candidate submodels (Fig. 1b), but also the parameter space of each submodel. Topological filtering achieves this by filtering candidate reductions and by exploring their lower-dimensional parameter spaces with an efficient sampling algorithm [9].
Fig. 1 TopoFilter method. a Parameter space $\mathcal{P} = [p_1^{\text{min}}, p_1^{\text{max}}] \times [p_2^{\text{min}}, p_2^{\text{max}}]$ and viable subspace $\tilde{\mathcal{P}}$ (gray). Sample parameter points (black dots), when projected to zero separately for the two coordinates (arrows), yield viable (green) or non-viable (red) reductions. Colored lines at the axes and a point at the origin denote viable (green) and non-viable (red) lower-dimensional subspaces. b Topological filtering step with a rank 1 exhaustive search, for a viable (green) model $\mathcal{M}$ with $\tilde{d} = 4$ reducible parameters; $\mathcal{M}_{I}$ ($\mathcal{M}_{\setminus I}$) denotes the model with (without) reducible parameters $I \subseteq \{1, \ldots, \tilde{d}\}$. For a single viable parameter sample p, all rank 1 parameter reductions (1P) are tested for viability. A union of the three viable 1P reductions skips over the 2P reduction candidates (gray) and goes directly to a single 3P reduction (blue) for viability test. The remaining subspace of models (white) induced by the non-viable 1P reduction (red) is pruned from testing for the current parameter sample. Reductions that have been skipped (gray, white) may still be tested using another parameter sample or in a recursive step
TopoFilter algorithm
Here, we describe the original algorithm for topological
filtering in [26], and focus on the new features in the
subsequent sections (see also Fig. 2).
Topological filtering starts with the root model (that is, the most complex model without any parameter eliminated, denoted as $\mathcal{M}_{\setminus\{\}}$) and one sample in the viable parameter space for this model (or for short: one viable sample). The initial viable sample can be obtained by standard parameter estimation methods. Starting from such a single viable sample, the original implementation uses a combination of out-of-equilibrium adaptive Monte Carlo (OEAMC) [28] sampling and multiple ellipsoid based sampling (MEBS) [9] to explore the parameter space of the root model. For each sample, the algorithm proceeds by testing if single-parameter projections are viable, thereby identifying possible single-parameter (1P) reductions of the root model (submodels $\mathcal{M}_{\setminus\{1\}}$, $\mathcal{M}_{\setminus\{2\}}$, and $\mathcal{M}_{\setminus\{3\}}$ in Fig. 1b). Subsequently, a union of 1P reductions resulting from a sample is tested for viability. The 1P and union reduced models with their associated viable parameter samples serve as starting points for the next iterations of the search, until no further reductions are possible. Thus, TopoFilter explores the space of models by reducing viability testing to possibly distant descendants of viable submodels.
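The per-sample step described above (test all 1P reductions, then test their union) can be sketched in Python as follows. This is only an illustration under simplifying assumptions: all projections are to zero, and `filter_step`, `is_viable`, and the two toy viability predicates are hypothetical names that do not correspond to TopoFilter's MATLAB functions.

```python
def filter_step(sample, reducible, is_viable):
    """One rank-1 filtering step for a single viable parameter sample.

    sample:    list of parameter values, assumed viable for the current model
    reducible: indices of parameters that may be projected to zero
    is_viable: hypothetical user-supplied predicate on a parameter vector
    Returns (viable_1p, union_is_viable). The second entry is None when fewer
    than two 1P reductions are viable, because no new union exists then.
    """
    def projected(indices):
        return [0.0 if i in indices else value for i, value in enumerate(sample)]

    viable_1p = [j for j in reducible if is_viable(projected({j}))]
    if len(viable_1p) < 2:
        return viable_1p, None
    return viable_1p, is_viable(projected(set(viable_1p)))


sample = [1.0, 1.0, 1.0, 1.0]
reducible = [0, 1, 2, 3]

# Toy predicate 1: only the last parameter is essential.
needs_last = lambda p: p[3] > 0.0
# Toy predicate 2: the first two parameters are coupled (at least one is needed).
coupled = lambda p: p[3] > 0.0 and (p[0] + p[1]) > 0.0

result_independent = filter_step(sample, reducible, needs_last)
result_coupled = filter_step(sample, reducible, coupled)
```

With the first predicate, the union of the three viable 1P reductions is viable, as in Fig. 1b. With the second predicate, the same three 1P reductions are viable but their union is not, which is the kind of situation where the skipped intermediate reductions become relevant.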
New features
To further develop the method since its first implementation [26], we focused on the scope of applications, the accuracy of the model search, and the computational efficiency. More specifically, the current implementation of TopoFilter includes: (i) customizable scoring functions that enable applications beyond model inference; (ii) simultaneous reductions of several parameters to obtain a viable submodel (see “Results and discussion” section for an example where this feature is important); (iii) adaptive re-sampling to avoid situations in which viable submodels are not detected because the parameter samples from the root model are thinned out during the iterations over the model space; (iv) efficient search heuristics over the model space for cases where we are primarily interested in finding maximal reductions (the most compact submodels that are still consistent with the experimental observations); and (v) more comprehensive options for parallelization, which can avoid redundant searches over model and parameter subspaces. Relevant changes to the algorithm are highlighted in Fig. 2.
Customizable scoring functions
By default, we assume uncorrelated and normally distributed errors $E = y(p) - y_0 \sim \mathcal{N}(0, \operatorname{diag}\sigma)$. The sum of squared, weighted residuals $\varepsilon = E^{T} \operatorname{diag}(\sigma)^{-1} E$ is then $\chi^2$-distributed. From the distribution of $E$, the negative log-likelihood of the data given the model is $\ell = \varepsilon/2 + C$, with $C = \ln\sqrt{(2\pi)^{k} \prod_{i} \sigma_i}$ and $k$ the number of measured data. TopoFilter uses this negative log-likelihood as the default scoring function $s$ and a threshold $s_0(y_0)$ based on quantiles of the $\chi^2$ distribution with $k$ degrees of freedom, an upper bound in the standard goodness of fit test for model evaluation.
Fig. 2 TopoFilter pseudo-code with an implementation outline. Note that, because of an option to re-sample and save viable points for all found projections, preparing points actually comes after the for $I \in \mathcal{I}_{n-1}$ loop, in a separate for $I \in \mathcal{I}_{n}$ loop, and during initialization. Parts in red highlight differences to the original algorithm [26] such as choice of enumeration level, custom scoring and threshold functions, automatic (tail) recursion, and adaptive resampling from whole (representative) sets of previous samples
The original scoring function from [26], which relied on a threshold derived as an expected value plus two standard deviations over all data points of the (unknown) true model and its true parameterization, remains available. In addition, TopoFilter supports custom scoring functions, including likelihood-free functions, for example, to enable model-based design in synthetic biology. In such applications, one can score a model according to a desired circuit performance (for example, the ability to adapt to an external signal), without providing experimental data [29].
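To make a likelihood-based viability test concrete, the following Python sketch scores predictions with a negative log-likelihood for independent normal errors and compares the score with a quantile-based threshold. It is an illustration rather than TopoFilter's implementation: the function names are hypothetical, `sigma` is assumed to hold error variances, the threshold form (quantile divided by two plus the normalization constant) is an assumption, and the closed-form quantile is valid only for two degrees of freedom.

```python
import math


def normalization_constant(sigma):
    """Log of the Gaussian normalization factor for independent errors."""
    k = len(sigma)
    return 0.5 * (k * math.log(2.0 * math.pi) + sum(math.log(s) for s in sigma))


def negative_log_likelihood(predicted, observed, sigma):
    """Score eps/2 + C, with eps the variance-weighted sum of squared residuals."""
    eps = sum((y - y0) ** 2 / s for y, y0, s in zip(predicted, observed, sigma))
    return eps / 2.0 + normalization_constant(sigma)


def viability_threshold(chi2_quantile, sigma):
    """Assumed threshold: an upper bound on eps translated to the score scale."""
    return chi2_quantile / 2.0 + normalization_constant(sigma)


def chi2_quantile_two_dof(q):
    """Exact chi-square quantile for two degrees of freedom only."""
    return -2.0 * math.log(1.0 - q)


observed = [1.0, 2.0]
sigma = [0.25, 0.25]
threshold = viability_threshold(chi2_quantile_two_dof(0.95), sigma)
close_fit_viable = negative_log_likelihood([1.5, 2.5], observed, sigma) <= threshold
poor_fit_viable = negative_log_likelihood([3.0, 2.0], observed, sigma) <= threshold
```

A custom, likelihood-free score would replace `negative_log_likelihood` and `viability_threshold` with functions of the simulated behavior alone, while the comparison against a threshold stays the same.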
Variable-order projections and parameter coupling
We denote the number of parameters simultaneously tested for projection as the rank $r$ of a reduction (relative to the (sub)model the projection is applied to). While the original algorithm supports only reductions of rank $r = 1$ and their subsequent unions in one iteration of topological filtering, TopoFilter exhaustively checks reductions up to a given, user-defined rank (and their subsequent unions; Fig. 1b). As illustrated in the theoretical example in the “Results and discussion” section, higher-rank reductions may find valid submodels not detected otherwise because of couplings between parameters that are not necessarily obvious. Moreover, TopoFilter supports a user-defined asymmetric coupling of parameters, for example, to eliminate the associated Michaelis-Menten constants when eliminating the multiplicative kinetic constant for typical enzyme kinetics. Such definitions help to increase the efficiency of model space exploration.
Adaptive re-sampling
Originally, the $D$ parameter samples in each step of topological filtering carry over from previous steps, and they originate from sampling the root model. Besides the samples thinning out for higher-order reductions, the distribution of samples in lower-dimensional parameter spaces may not be representative of the corresponding submodels. We therefore provide an option in TopoFilter to obtain new samples adaptively when too few samples are carried over. Re-sampling explores the viable subspace with HYPERSPACE [9], but in contrast to the initial sampling, the OEAMC exploration uses all previously found viable samples to improve MEBS performance. Re-sampling improves discovery of viable submodels, but it increases the computational cost. In addition to the model evaluations for parameter space exploration resulting in $D$ samples, the number of model evaluations for model space search in each recursive step of TopoFilter is $D \cdot \sum_{i=0}^{r} \binom{\tilde{d}}{i}$, which is linear in $D$ and in the number of projectable parameters $\tilde{d}$ for a rank $r = 1$ exhaustive search, quadratic in $\tilde{d}$ for $r = 2$, etc. TopoFilter therefore allows the user to control minimal and maximal sample sizes as well as the maximal number of model evaluations per sampling step.
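The growth of this evaluation count with the rank can be checked numerically. The short Python sketch below evaluates the expression for the number of model evaluations per recursive step; the function name and the example values (100 samples, 18 projectable parameters) are chosen for illustration only.

```python
import math


def model_space_evaluations(num_samples, num_projectable, rank):
    """Evaluate D * sum_{i=0}^{r} C(d_tilde, i) for an exhaustive search up to rank r."""
    return num_samples * sum(math.comb(num_projectable, i) for i in range(rank + 1))


rank_one = model_space_evaluations(100, 18, 1)   # linear in the number of projectable parameters
rank_two = model_space_evaluations(100, 18, 2)   # quadratic in the number of projectable parameters
full_enumeration = model_space_evaluations(1, 18, 18)  # all subsets for a single sample
```

Setting the rank equal to the number of projectable parameters recovers full enumeration of all subsets for each sample, which shows why the rank is kept small in practice.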
Efficient model space exploration
In addition to exploring the model space exhaustively up to a given depth (investigating all reductions up to a given rank for each viable submodel), TopoFilter provides options to speed up the search for higher-order reductions. The algorithm can “jump” heuristically to a union of all viable lower-rank reductions for the currently considered parameter sample and thereby exceed the rank $r$ in practice. Each union reduction is then checked against all parameter samples for viability (see the example of how $\mathcal{M}_{\{4\}}$ can be reached in Fig. 1b). TopoFilter may also proceed recursively, where the set of root models for the recursive steps (e.g., $\mathcal{M}_{\{4\}}$, if it is viable for some parameter sample) depends on a user-defined enumeration level to trade off speed and exhaustiveness. TopoFilter supports three enumeration strategies that are selected by their corresponding enumeration levels (in brackets):
conservative (0): enumerate only maximal viable projections found in a single recursive search step,
balanced (1): enumerate all viable union projections found for each parameter sample (see Fig. 1b, blue box if found viable), and
aggressive (2): enumerate all viable projections found, including those found during the initial exhaustive search among low-rank reductions (Fig. 1b, green boxes).
Note that aggressive enumeration is particularly impor-
tant for model selection, where the trade-off between
model complexity and goodness-of-fit needs to be considered.
Finally, TopoFilter implements backtracking as an experimental option. If a reduction of rank $r > 1$ is found to be non-viable, either for a single parameter sample or for all samples in an iteration of topological filtering, backtracking will test if reductions of lower rank that were ‘skipped over’ during model search are viable. For the example in Fig. 1b, if $\mathcal{M}_{\{4\}}$ was non-viable, $\mathcal{M}_{\setminus\{1,2\}}$, $\mathcal{M}_{\setminus\{1,3\}}$, and $\mathcal{M}_{\setminus\{2,3\}}$ would be tested during backtracking.
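The set of reductions revisited by backtracking in the Fig. 1b example can be generated as in the following Python sketch. The general rule used here is an assumption made for illustration: after a failed union, all subsets of the union that are larger than the exhaustively tested rank and smaller than the union itself are treated as skipped candidates. The function name is hypothetical.

```python
from itertools import combinations


def backtracking_candidates(union_indices, tested_rank):
    """List skipped reductions between the tested rank and a failed union.

    union_indices: parameters removed by the non-viable union reduction
    tested_rank:   highest rank already covered by the exhaustive search
    Candidates are ordered from the largest to the smallest reduction.
    """
    union = sorted(union_indices)
    candidates = []
    for size in range(len(union) - 1, tested_rank, -1):
        candidates.extend(set(subset) for subset in combinations(union, size))
    return candidates


# Fig. 1b example: the union of the 1P reductions 1, 2 and 3 was not viable.
skipped = backtracking_candidates({1, 2, 3}, tested_rank=1)
```

For the example, `skipped` contains the three 2P reductions named in the text. For larger unions the number of candidates grows combinatorially, which is consistent with backtracking being offered as an optional feature.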
Parallelization
TopoFilter provides options to automatically support parallelization at different levels of the method, allowing adjustments both to the considered case study and to the available hardware. The currently available parallelization levels (in brackets) are:
(0): Runs are performed sequentially, without parallelization.
(1): Viability checks and, if required, maximal projections found during viable point preparation are parallelized. This is the least wasteful option compared to sequential runs in terms of computational time because it minimizes the number of redundant model and parameter space searches done in parallel. However, level 1 parallelization has the biggest communication overhead.
(2): Iterations over root models within a recursive step are carried out in parallel, which is an automation of the parallel strategy in the original method.
(3): Independent repeats of topological filtering are run in parallel. This option has no additional overhead in terms of computations and communication, and is useful if the number of available parallel cores is small and the method’s results in a particular application have high variance in terms of the model reductions identified.
Software structure
The main inputs, internal dependencies, and outputs of the TopoFilter implementation are summarized in Fig. 3. Mandatory user-defined inputs include the mathematical model, the experimental data (which are usually in the form of time-course data for ODE models), and a specification of the experimental design (e.g., to define time-dependent inputs). Optional inputs allow for the customization of many aspects of TopoFilter’s internal (default) functions, such as the definition of custom scoring functions (see above). Together with parameters for runtime operation (e.g., enumeration level and initial viable parameter vector), these data files and functions are passed to a single main function as the TopoFilter entry point. The main function performs all required computations for topological filtering, returning a single data structure containing the essential outputs, such as each viable projection discovered together with a single witness parameter sample (the memory-consuming sets of all viable parameter samples are optionally written to files on-the-fly; see Fig. 3).
Fig. 3 Structure of the TopoFilter package. Call graph diagram of the main TopoFilter function files (middle) with corresponding inputs (left) and outputs (right), denoted with empty arrowheads. Dashed lines and gray boxes indicate optional inputs, call dependencies and outputs. The recommended (optional) own experiment function file can be created from the template experiment function file included in the package or from one of the existing examples
Implementation details
TopoFilter (v0.3.3) is implemented in MATLAB (MathWorks, Natick, MA) with parallelization support via MATLAB’s Parallel Computing Toolbox. Ordinary differential equation (ODE) models are numerically integrated with the SUNDIALS CVODES v2.5 C package [30] via the IQM Tools v1.2.x MATLAB package (IntiQuan GmbH, Basel), which supports SBML [31] and a multi-experiment setup. Sampling uses the HYPERSPACE v1.2.x MATLAB package, an improved version of the implementation described in [9], available at https://gitlab.com/csb.ethz/HYPERSPACE.
For our case studies, all computations were carried out on a homogeneous cluster of Intel Xeon 2.70 GHz 24-core CPUs with 30720 KB cache each, running MATLAB R2018b with the Parallel Computing Toolbox.
Results and discussion
Case study: target of rapamycin signaling
Biological background and study setup
We previously reported applications of topological filtering to models of cell signaling with up to 12 parameters for model selection and up to hundreds of alternative topologies [26]. To test scalability of the improved TopoFilter method for a larger, intracellular signaling pathway model, we focused on target of rapamycin (TOR) signaling. TOR signaling, a pathway responding to the availability of nitrogen sources, is complicated by its connections to other nutrient signaling pathways and because signal transduction involves the control of phosphatases that are hard to analyze experimentally [32]. Dynamic model-based analysis has therefore been instrumental in investigating the pathway’s topology in yeast [33] and mammalian [19] cells, and it suggested complex emergent behaviors in mammalian TOR signaling [34].
For our case study, we use the mass-action kinetics model of the budding yeast target of rapamycin (yTOR) signaling pathway from [33] that includes a core model and several extensions representing hypothetical control mechanisms shown in Fig. 4a. The model captures the core upstream signaling, from TOR complex 1 (TORC1), via the regulatory proteins Tip41 and Tap42, to the heterotrimeric protein phosphatase 2A 1/2 (PP2A1/2) complexes; it includes the drug rapamycin, which binds to TORC1 and inhibits its activity, as an input. As a root model for our analysis, we lumped core model extensions 1–4 to encapsulate a total of 31 state variables and 42 parameters, out of which 18 parameters can be reduced. We used an essential subset of the original experimental data, namely a total of 20 data points for 13 different observable variables, in 3 different experimental conditions (inputs of +0 μM, +109 μM, and +500 μM rapamycin; see example data and simulation results in Fig. 4b-c).
Finding the maximal reduction
In this setup with 18 out of 42 parameters being reducible, the core model with 24 parameters is viable, and is thus the maximally reduced model. A naïve search for the maximal reduction would have to test the whole space of $2^{18} = 262\,144$ submodels for viability. To characterize TopoFilter’s performance and accuracy, we therefore first analyzed how the heuristics affect the speed and probability of finding the maximal reduction.
The data compiled in Table 1shows that, with the num-
ber of model evaluations for the sampling in parameter
space varying between 102and 104, the maximal reduc-
tion is always found using the balanced recursive heuris-
tic (enumeration level 1) and the conservative heuristic
(enumeration level 0), except for the most greedy search
setup with enumeration level 0, rank 1, and the small-
est sample size of n=102. Time-wise, when searching
for the maximal reduction, the conservative enumera-
tion strategy outperforms the balanced enumeration strat-
egy in all cases on a single core (non-parallel), and the
performance difference increases with the number of
samples n. Together, these data indicate that TopoFilter
can traverse the model space efficiently and with high
reliability.
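The timing comparison can be checked directly from the Table 1 entries. The short Python sketch below transcribes the run times reported in Table 1 and computes how many additional minutes the balanced strategy needs compared with the conservative one; the dictionary layout and function name are assumptions made for this illustration.

```python
# Run times (minutes) transcribed from Table 1, keyed by (strategy, rank),
# for n = 10^2, 10^3, and 10^4 model evaluations per sampling.
TIMES = {
    ("conservative", 1): [4, 19, 154],
    ("conservative", 2): [11, 93, 733],
    ("balanced", 1): [27, 178, 8014],
    ("balanced", 2): [44, 215, 5096],
}


def extra_time(rank: int) -> list:
    """Extra minutes the balanced strategy needs over the conservative one."""
    balanced = TIMES[("balanced", rank)]
    conservative = TIMES[("conservative", rank)]
    return [b - c for b, c in zip(balanced, conservative)]


if __name__ == "__main__":
    for rank in (1, 2):
        print(f"rank {rank}: {extra_time(rank)}")
```

For both ranks the difference is positive at every sample size and grows with n, which is the pattern described in the text.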
Fig. 4 Dynamic model for TOR signaling in budding yeast. a Molecular interactions represented in the core model (solid lines) and in hypothetical
extensions (dashed lines), adapted from [33]. Nodes represent proteins or protein complexes (boxes; phosphorylation indicated by ’P’) as well as
small molecules (ellipses). Arrows indicate reversible complex formation, while filled (open) circles adjacent to transition reactions denote protein
phosphorylation (dephosphorylation). b, c Experimental data (symbols; mean and s.d.) and sample model trajectories (lines) for stimulation of TOR
signaling with 500 nM (b) and 109 nM (c) rapamycin at t = 20 min. In (b), the abundance of phosphorylated Tap42 protein (red) was measured; in
(c), complex formations of Tip41 with Tap42 (blue) and of Tap42 with Sit4 (green) were determined. All data are relative to steady-state
concentrations prior to rapamycin addition; for details on model structure and experimental data, see [33]. Simulations in (b) and (c) represent viable
parameter samples for a default negative log-likelihood scoring function with a 0.95 quantile as a threshold
Content courtesy of Springer Nature, terms of use apply. Rights reserved.
Rybiński et al. BMC Bioinformatics (2020) 21:34 Page 8 of 12
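Table 1 TopoFilter performance in finding the maximally reduced TOR signaling model

| Enumeration level | Rank r | Success rate (%), n=10^2 | Success rate (%), n=10^3 | Success rate (%), n=10^4 | Time per run (min), n=10^2 | Time per run (min), n=10^3 | Time per run (min), n=10^4 |
|---|---|---|---|---|---|---|---|
| Conservative (0) | 1 | 80 | 100 | 100 | 4 | 19 | 154 |
| Conservative (0) | 2 | 100 | 100 | 100 | 11 | 93 | 733 |
| Balanced (1) | 1 | 100 | 100 | 100 | 27 | 178 | 8'014 |
| Balanced (1) | 2 | 100 | 100 | 100 | 44 | 215 | 5'096 |

Data are averages from five repeated runs on a single worker each, except for the case with four repeats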
Covering the model space
Next, to assess how well TopoFilter covers the model
space, which is essential for Bayesian model selection, we
emphasized simulation studies with the balanced strat-
egy (enumeration level 1). Fig. 5a shows that the total
number of discovered projections (submodels) grows with
the number of model evaluations as a power function: a
higher number of model evaluations allows for a better
exploration of the parameter space, as would be expected.
In this application, TopoFilter re-samples on average every
ca. 3–9 viable projections found, and with each 10-fold
increase of the number of model evaluations per sampling,
the number of viable projections found per sampling
increases by ca. 1.5 fold (Fig. 5b). This implies that
re-sampling indeed helps to explore the viable subspace
more accurately, and that more often sufficiently many
lower-dimensional viable samples are left after a viable
projection than without re-sampling.
A comparison of the data for rank 1 and rank 2 exhaus-
tive search in Fig. 5a,b indicates that rank 2 exhaustive
Fig. 5 TopoFilter performance for the yeast TOR signaling model. a Number of identified viable projections (submodels of the root model) for rank 1
(blue) and 2 (red) exhaustive searches with balanced enumeration strategy, depending on the number of allowed model evaluations in each
(re-)sampling step, n. Symbols indicate number of workers in parallelized runs (inset). b Number of viable projections divided by the number of times
the parameter space sampling was carried out when the number of samples left after projecting and filtering was smaller than the user-defined
threshold (here: 1/20 of the number of evaluations per sample); symbols are as in (a). c Run time as a function of number of parallel processes for
$n = 10^2$ (blue), $n = 10^3$ (red), and $n = 10^4$ (orange); symbols indicate the enumeration level (inset). The dashed lines give references for a slope of
-1. Regression lines (and their standard prediction errors; shaded) were computed for groups per exhaustive search rank r (a, b), or per number of
function evaluations n (c)
search finds fewer projections and (re-)samples more fre-
quently. Although this seems counter-intuitive, the expla-
nation is that in the balanced enumeration strategy (level
1), the total number of all projected parameters in the
rank-bounded exhaustive search grows with its rank (pro-
jections found in the search are not subject to further
recursive steps, only their unions are per parameter sample,
see Fig. 2), and bigger projections leave fewer viable
samples after their re-validation. Hence, jumps over lev-
els (ranks) of the model space are bigger for the rank 2
search, but, relative to the number of discovered viable
submodels, they imply more frequent re-sampling than
for the rank 1 search (Fig. 5b). In addition, for the bal-
anced enumeration strategy, although significantly fewer
total projections are found with the rank 2 search (by a
factor of 5-10; Fig. 5b), and, hence, fewer iterations are
required, the higher (quadratic) cost of testing projections
in the rank-bounded exhaustive search steps makes the
rank 2 search only slightly faster overall than the rank 1
search (level 1; see Fig. 5b and Table 1). However, in principle,
as our theoretical example above demonstrates, the
rank 2 exhaustive search allows finding projections that
would not be found by eliminating only one parameter at
a time.
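The quadratic cost of the rank 2 search can be illustrated by counting candidate projections. The Python sketch below assumes, for illustration only, that a rank-bounded exhaustive search step tests every subset of up to r of the remaining reducible parameters for a given parameter sample; the function name is hypothetical and the actual bookkeeping in TopoFilter may differ.

```python
from math import comb


def candidate_projections(n_params: int, rank: int) -> int:
    """Count all subsets of 1..rank parameters out of n_params reducible parameters."""
    if rank < 1 or n_params < 0:
        raise ValueError("rank must be >= 1 and n_params must be >= 0")
    return sum(comb(n_params, size) for size in range(1, rank + 1))


if __name__ == "__main__":
    # 18 reducible parameters, as in the TOR root model.
    for rank in (1, 2):
        print(rank, candidate_projections(18, rank))
```

Under this assumption, a rank 1 step at the root model tests 18 candidates, whereas a rank 2 step tests 171, which is why fewer iterations do not translate into a proportional gain in run time.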
Finally, for the aggressive enumeration level 2, TopoFilter
finds approximately $7$–$9 \cdot 10^4$ viable projections with
only $n = 10^2$ average sample size, that is, ca. 25–35%
of all $2^{18} \approx 26 \cdot 10^4$ submodels. The strategy finds the
maximal projection quickly, and the vast majority of the
computations concerns parts of the model space close
to the root model, where previously found small viable
reductions from the rank-bounded exhaustive search are
systematically extended with single parameters in the fol-
lowing recursive steps (which is not done for enumeration
levels less than 2; see Fig. 2).
Parallelization and scaling
For the timing data per TopoFilter run in Fig. 5c, it is
worth noting that the time required on a single CPU
(worker) for $n = 10^4$ samples is approximately 80 h.
Hence, TopoFilter can make model selection feasible
for a complex practical example such as TOR signaling
in reasonable time. The total time per each TopoFilter
run decreases with the number of parallel workers lin-
early (Fig. 5c), showing a good scalability of the method
over a parallel computing infrastructure. The most fine-
grained parallelization (level 1, in which viability tests are
parallelized; see “Implementation” section) allows for sig-
nificant time improvements with respect to the number of
workers, but the average CPU time per worker increases
(see increasing gap with respect to the diagonal in Fig. 5c).
This is caused by parallelization bottlenecks such as the
initial (non-parallel) sampling, and the synchronization
after each recursive step, where the method waits for
the projections that require the most time-consuming
re-sampling.
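The effect of such serial bottlenecks can be illustrated with Amdahl's law, a standard scaling model that we introduce here only for illustration; it is not used in the analysis above. In the Python sketch below, the serial fraction of 0.1 is a hypothetical value, not a measurement from the TOR case study, and the function names are assumptions.

```python
def amdahl_speedup(serial_fraction: float, workers: int) -> float:
    """Ideal speedup under Amdahl's law for a given non-parallel fraction of the work."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must lie in [0, 1]")
    if workers < 1:
        raise ValueError("workers must be >= 1")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / workers)


def parallel_efficiency(serial_fraction: float, workers: int) -> float:
    """Speedup per worker; 1.0 corresponds to the ideal slope of -1 in run time."""
    return amdahl_speedup(serial_fraction, workers) / workers


if __name__ == "__main__":
    # Hypothetical serial fraction (e.g. initial sampling and synchronization).
    for workers in (1, 2, 4, 8, 16):
        print(workers, round(parallel_efficiency(0.1, workers), 3))
```

In this simplified model, the efficiency per worker drops as workers are added whenever the serial fraction is non-zero, which is qualitatively the increasing gap to the diagonal seen in Fig. 5c.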
Theoretical example
While the case study of TOR signaling indicated per-
formance characteristics of TopoFilter depending on the
algorithmic options, it is too complex to systematically
identify limitations of the topological filtering heuris-
tic for model selection. We therefore devised a simple,
theoretically tractable example network; its analysis moti-
vated in part the method modifications implemented in
TopoFilter. In particular, the theoretical example high-
lights caveats of the heuristic for rank 1 reductions in
the exhaustive search as well as the critical nature of the
choice of parameter bounds.
Our theoretical example network contains two species
with concentrations $x_1$ and $x_2$. The first species is only
added instantaneously at $t = 0$ and degraded with
rate $k_1 x_1(t)$. It acts as a ligand that enhances, with rate
$x_1(t)$, the production of the second (reporter) species. We
assume that the reporter is not present at $t = 0$, but that
it can be produced at constant rate $k_2$ and degraded with
rate $x_2(t)$. This leads to the ODE system:

$$\frac{dx_1(t)}{dt} = -k_1 x_1(t), \qquad \frac{dx_2(t)}{dt} = k_2 + x_1(t) - x_2(t),$$

with initial conditions $x_1(0) = x_1^0 > 0$ and $x_2(0) = x_2^0 = 0$.
With $k_1 > 0$, that is, with a degradable ligand, the
steady-state of the system is:

$$x_1^{*} = 0, \quad x_2^{*} = k_2.$$

However, when we assume a non-degradable ligand ($k_1 = 0$),
we find that

$$x_1^{*} = x_1^0 > 0, \quad x_2^{*} = x_1^0 + k_2.$$
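The two steady states can be checked numerically. The following Python sketch integrates the system with a classical fourth-order Runge-Kutta scheme, reading the reporter equation as constant production plus ligand-enhanced production minus degradation, that is, dx2/dt = k2 + x1 - x2. The function name, step size, end time, and the parameter values in the usage example are assumptions chosen for illustration.

```python
def simulate(k1: float, k2: float, x1_0: float, t_end: float = 60.0, dt: float = 0.01):
    """Integrate dx1/dt = -k1*x1, dx2/dt = k2 + x1 - x2 with classical RK4."""

    def rhs(x1, x2):
        return -k1 * x1, k2 + x1 - x2

    x1, x2 = x1_0, 0.0  # the reporter is absent at t = 0
    for _ in range(int(round(t_end / dt))):
        a1, a2 = rhs(x1, x2)
        b1, b2 = rhs(x1 + 0.5 * dt * a1, x2 + 0.5 * dt * a2)
        c1, c2 = rhs(x1 + 0.5 * dt * b1, x2 + 0.5 * dt * b2)
        d1, d2 = rhs(x1 + dt * c1, x2 + dt * c2)
        x1 += dt * (a1 + 2 * b1 + 2 * c1 + d1) / 6.0
        x2 += dt * (a2 + 2 * b2 + 2 * c2 + d2) / 6.0
    return x1, x2


if __name__ == "__main__":
    # Degradable ligand: the reporter should approach k2.
    print(simulate(k1=0.5, k2=2.0, x1_0=10.0))
    # Non-degradable ligand: the reporter should approach x1_0 + k2.
    print(simulate(k1=0.0, k2=2.0, x1_0=10.0))
```

With these illustrative values, the end states approach (0, k2) for the degradable ligand and (x1_0, x1_0 + k2) for the non-degradable ligand.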
Assume that the correct model is the maximal reduction
with $k_1 = k_2 = 0$. We experimentally observe only the
reporter, and only close to the steady-state, such that the
model output would be $y_1 \approx x_2^{*}$. For the maximal reduction,
$x_2^{*} = x_1^0$, and correspondingly the measurement data
for observable $y_1$ is:

$$y_1^0 = x_1^0 + \varepsilon, \quad \text{where } \varepsilon \sim \mathcal{N}\left(0, \sigma_1^2\right),$$

that is, the ligand’s initial concentration with a measurement
error $\varepsilon$ that is assumed to be normally distributed
with variance equal to $\sigma_1^2$. With TopoFilter’s default negative
log-likelihood score and its default 95% quantile
threshold $1.96\sigma_1$, we have the viability criterion:

$$\left(y_1 - y_1^0\right)^2 < 2\sigma_1^2 \cdot \left(1.96\sigma_1 - \ln\left(\sqrt{2\pi}\,\sigma_1\right)\right),$$
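and in terms of parameter samples when parameters are
set to 0:

$$y_1 - y_1^0 \approx \begin{cases} 0 + \varepsilon & \text{if } k_1 = k_2 = 0, \\ k_2 - \varepsilon & \text{if } k_1 = 0, \\ -x_1^0 + \varepsilon & \text{if } k_2 = 0, \end{cases}$$

where the approximation becomes more accurate the
closer the system is to steady-state at the time point of
measurement.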
Having fixed the ligand’s initial amount $x_1^0$ and the lower
bound $k_2^{\text{min}} > 0$, $\sigma_1$ can be so small that neither $k_1$ alone
nor $k_2$ alone can be reduced by projection to 0 as shown
in Fig. 6. Hence, when using only the rank 1 exhaustive
search, the viable rank 2 reduction of both $k_1$ and $k_2$ will
not be tested. There are several alternatives to solve or circumvent
such issues. One can straightforwardly increase
the exhaustive search rank, but this will increase the
runtime of a search step. Alternatively, we can choose
sufficiently wide bounds for the analyzed region of the
parameter space (possibly disregarding known physical
bounds), here by decreasing $k_2^{\text{min}}$, but this will decrease
sampling accuracy. Finally, if we choose small but positive
projection values that approximate the true projection
values of 0 (Fig. 6), the caveat is an increased numerical
integration time.
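The rank 1 caveat can be reproduced with a few lines of Python. The sketch below is a deliberate simplification: it assumes a noise-free measurement taken exactly at steady state and replaces the full score by the error bound of ca. 1.96 σ1 around the ligand's initial amount. The function names are hypothetical; the numerical values are those stated for Fig. 6 (x1^0 = 10, σ1 = 0.05 · x1^0, lower bounds 0.0117 and 1.3 for k1 and k2).

```python
def residual_at_steady_state(k1: float, k2: float, x1_0: float) -> float:
    """Noise-free difference y1 - y1^0, taking y1 = x2* and y1^0 = x1^0."""
    # The ligand contributes to x2* only if it is not degraded (k1 == 0).
    x2_star = k2 + (x1_0 if k1 == 0 else 0.0)
    return x2_star - x1_0


def is_viable(k1: float, k2: float, x1_0: float, sigma1: float) -> bool:
    """Simplified viability test: residual within ca. 1.96 standard deviations."""
    return abs(residual_at_steady_state(k1, k2, x1_0)) < 1.96 * sigma1


if __name__ == "__main__":
    x1_0 = 10.0
    sigma1 = 0.05 * x1_0
    k1_min, k2_min = 0.0117, 1.3
    print("k1 := 0 only:   ", is_viable(0.0, k2_min, x1_0, sigma1))
    print("k2 := 0 only:   ", is_viable(k1_min, 0.0, x1_0, sigma1))
    print("k1 := k2 := 0:  ", is_viable(0.0, 0.0, x1_0, sigma1))
```

Under these assumptions, neither single-parameter projection to 0 passes the test even at the most favorable sampled value of the other parameter, while the joint projection does, so a search limited to rank 1 never reaches the maximal reduction.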
For a different scenario, using the same model, the same
measurement data, and the same default score function
and threshold, consider a different projection value for $k_1$
at the other end of the range of parameter values, $k_1 \gg 0$
(Fig. 6). With a $k_1$ projection value such that the half-life
of the ligand is sufficiently small compared to the time
point of the measurement $t$, only a value of the parameter
of the constitutive production of the reporter matters
for the score. Here, the $k_2$ value has to be within error
bounds of ca. $1.96\sigma_1$ from the $x_1^0$ value. Given that the
upper bound $k_1^{\text{max}}$ is sufficiently high to allow TopoFilter
to find a sample with $k_2$ value within the error bounds, $k_1$
can be reduced by projecting to a $\gg 0$ value.
Thus, while TopoFilter’s standard setting of testing only
single-parameter reductions at a time may prevent find-
ing a maximally reduced model that is consistent with the
Fig. 6 Likelihood and viable space for the theoretical example of a ligand-fluorescent reporter network with multiplicative kinetic constants $k_1$ and
$k_2$. Bounding box for searched parameter values is given as $\left[k_1^{\text{min}}, k_1^{\text{max}}\right] \times \left[k_2^{\text{min}}, k_2^{\text{max}}\right]$ (dashed). The $\epsilon$-projection values for each of the parameters
(red dashed, bottom and left) can lead to discovery of the rank 2 reduction via rank 1 reductions (first $k_1 := \epsilon_1$, then, from viable point $\left(\epsilon_1, k_2^{\text{min}}\right)$,
$k_2 := \epsilon_2$), whereas projection values equal to 0 cannot. With the $k_1$ projection value greater than $k_1^{\text{max}}$ (red dashed, right), the viable space, enclosed
within a 0.95 quantile of the cost function, is determined only by $k_2 \in \left[x_1^0 - 1.96\sigma_1, x_1^0 + 1.96\sigma_1\right]$; $k_1$ can be projected alone to some (high enough)
value. The figure was plotted with $\left[k_1^{\text{min}}, k_1^{\text{max}}\right] \times \left[k_2^{\text{min}}, k_2^{\text{max}}\right] = [0.0117, 0.027] \times [1.3, 11.7]$, $x_1^0 = 10$, $\sigma_1 = 0.05 \cdot x_1^0$
experimental data, several options exist to prevent or mit-
igate this potential problem in an application. However,
note that these options may lead to increased computa-
tional cost or to decreased accuracy.
Conclusions
The TopoFilter package combines high flexibility in tack-
ling model selection problems in systems and synthetic
biology with state-of-the-art, scalable performance. In
particular, the user has control over the model space
search exhaustiveness and, correspondingly, over the total
run-time. TopoFilter’s parameters allow one to choose
between search goals: model reduction (maximal viable
reduction) and model selection (statistically representa-
tive enumeration of viable submodels, the method’s orig-
inal purpose) are the extremes. The characterization of
viable spaces during filtering also enables efficient post-
hoc uniform sampling for Bayesian computations, which
we exploited in prior applications in systems biology [26]
and synthetic biology [29] for automated model genera-
tion and selection.
We see three main limitations of TopoFilter that could
be addressed by future developments: First, paralleliza-
tion could be extended to the sampling in parameter
space, which requires costly model evaluations, in order
to improve the computational efficiency and scalabil-
ity to larger applications. Second, the heuristics for the
search in model space could be improved. For example,
TopoFilter deals with each parameter sample separately—
analyzing the ensemble of viable samples could, for exam-
ple, help identifying the most promising directions for
multi-parameter projections, and thus increase efficiency
and accuracy of model space exploration. Finally, TopoFil-
ter currently only provides interfaces for ordinary dif-
ferential equation (ODE) models, but it could be easily
extended to other classes of parametric models. For exam-
ple, extensions to stochastic network descriptions are
particularly straightforward when the dynamics is approx-
imated by so-called moment equations in the form of
ODEs [35]. In the future, it could also be interesting to aim
for hybrid methods [21] that, for the purpose of model
selection, use parameter-free approaches such as logical
modeling to constrain the search space for (more detailed)
topological filtering a priori as much as possible.
Availability and requirements
Project name: Topological filtering
Project home page: https://gitlab.com/csb.ethz/TopoFilter
Operating system(s): Platform independent
Programming language: MATLAB
Other requirements: MATLAB 2016a or higher, HYPERSPACE 1.2.1 or higher,
IQM-tools 1.2.2.2 or higher
License: GNU GPLv3
Any restrictions to use by non-academics: None
Abbreviations
ABC: Approximate Bayesian computation; EGF: Epidermal growth factor; MEBS:
Multiple ellipsoid based sampling; ODE: Ordinary differential equation;
OEAMC: Out-of-equilibrium adaptive Monte Carlo sampling; PP2A1/2: Protein
phosphatase 2A 1/2 complexes; TOR: Target of rapamycin; TORC1: TOR
complex 1; yTOR: Budding yeast target of rapamycin
Acknowledgements
We thank Thomas Liphardt for discussions and benchmarks, Moritz Lang for
suggesting the ligand-stimulated reporter example, and Eve Tasiudi for
comments on the manuscript.
Authors’ contributions
JS and MR supervised the project. MR, SM, and MS implemented the
software. MR, CL, and SM performed the case studies. All authors analyzed the
data, wrote the manuscript, and revised the manuscript.
Funding
We acknowledge funding by the Swiss Initiative for Systems Biology
SystemsX.ch (SignalX project) and the NCCR Molecular Systems Engineering,
evaluated by the Swiss National Science Foundation. The funding bodies had
no role in the design of the study and collection, analysis, and interpretation of
data and in writing the manuscript.
Availability of data and materials
The datasets generated and/or analyzed during the current study are available
in the gitlab repository, https://gitlab.com/csb.ethz/TopoFilter. In particular,
the experimental data for the yTOR case study is available in the repository in
the "examples/kuepfer-tor/expData.xls" file.
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Author details
1Department of Biosystems Science and Engineering and SIB Swiss Institute of
Bioinformatics, ETH Zurich, Mattenstr. 26, 4058 Basel, Switzerland. 2ID Scientific
IT Services, ETH Zurich, 8092 Zurich, Switzerland. 3Life Science Zurich Ph.D.
program “Systems Biology”, 8092 Zurich, Switzerland.
Received: 29 March 2019 Accepted: 8 January 2020
References
1. Kirk PDW, Babtie AC, Stumpf MPH. Systems biology (un)certainties.
Science. 2015;350:386–8. https://doi.org/10.1126/science.aac9505.
2. Villaverde AF, Banga JR. Reverse engineering and identification in
systems biology: strategies, perspectives and challenges. J R Soc Interface.
2014;11:20130505. https://doi.org/10.1098/rsif.2013.0505.
3. Heinemann T, Raue A. Model calibration and uncertainty analysis in
signaling networks. Curr Opin Biotechnol. 2016;39:143–9. https://doi.org/
10.1016/j.copbio.2016.04.004.
4. Chen WW, Niepel M, Sorger PK. Classic and contemporary approaches to
modeling biochemical reactions. Genes Dev. 2010;24(17):1861–75.
https://doi.org/10.1101/gad.1945410.
5. Gould R, Bassen DM, Chakrabarti A, Varner JD, Butcher J. Population
heterogeneity in the epithelial to mesenchymal transition is controlled by
NFAT and phosphorylated Sp1. PLoS Comput Biol. 2016;12:1005251.
https://doi.org/10.1371/journal.pcbi.1005251.
6. Tan Y, Rivera JGL, Contador CA, Asenjo JA, Liao JC. Reducing the
allowable kinetic space by constructing ensemble of dynamic models
with the same steady-state flux. Metab Eng. 2011;13:60–75. https://doi.
org/10.1016/j.ymben.2010.11.001.
7. Bassen DM, Vilkhovoy M, Minot M, Butcher JT, Varner JD. JuPOETs: a
constrained multiobjective optimization approach to estimate
biochemical model ensembles in the Julia programming language. BMC
Syst Biol. 2017;11:10. https://doi.org/10.1186/s12918-016-0380-2.
8. Song SO, Chakrabarti A, Varner JD. Ensembles of signal transduction
models using Pareto Optimal Ensemble Techniques (POETs). Biotechnol J.
2010;5:768–80. https://doi.org/10.1002/biot.201000059.
9. Zamora-Sillero E, Hafner M, Ibig A, Stelling J, Wagner A. Efficient
characterization of high-dimensional parameter spaces for systems
biology. BMC Syst Biol. 2011;5:142. https://doi.org/10.1186/1752-0509-5-142.
10. Toni T, Welch D, Strelkowa N, Ipsen A, Stumpf MPH. Approximate
Bayesian computation scheme for parameter inference and model
selection in dynamical systems. J R Soc Interface. 2009;6(31):187–202.
11. Sunnåker M, Busetto AG, Numminen E, Corander J, Foll M, Dessimoz C.
Approximate Bayesian computation. PLoS Comput Biol. 2013;9(1):
1002803. https://doi.org/10.1371/journal.pcbi.1002803.
12. Beaumont MA. Approximate Bayesian computation. Ann Rev Stat Appl.
2019;6:379–403. https://doi.org/10.1146/annurev-statistics-030718-105212.
13. Invergo BM, Beltrao P. Reconstructing phosphorylation signalling
networks from quantitative phosphoproteomic data. Essays Biochem.
2018. https://doi.org/10.1042/EBC20180019.
14. Vyshemirsky V, Girolami MA. Bayesian ranking of biochemical system
models. Bioinformatics. 2008;24(6):833–9. https://doi.org/10.1093/
bioinformatics/btm607.
15. Toni T, Stumpf MPH. Simulation-based model selection for dynamical
systems in systems and population biology. Bioinformatics (Oxford,
England). 2010;26:104–10. https://doi.org/10.1093/bioinformatics/
btp619.
16. Liepe J, Kirk P, Filippi S, Toni T, Barnes CP, Stumpf MPH. A framework for
parameter estimation and model selection from experimental data in
systems biology using approximate Bayesian computation. Nat Protoc.
2014;9:439–56. https://doi.org/10.1038/nprot.2014.025.
17. Hug S, Schmidl D, Li WB, Greiter MB, Theis FJ. Bayesian model selection
methods and their application to biological ODE systems. In: Uncertainty
in Biology. Cham: Springer; 2016. p. 243–68.
18. Xu T-R, Vyshemirsky V, Gormand A, von Kriegsheim A, Girolami M,
Baillie GS, Ketley D, Dunlop AJ, Milligan G, Houslay MD, Kolch W.
Inferring signaling pathway topologies from multiple perturbation
measurements of specific biochemical species. Sci Signal. 2010;3(113):20.
https://doi.org/10.1126/scisignal.2000517.
19. Dalle Pezze P, Sonntag AG, Thien A, Prentzell MT, Gödel M, Fischer S,
Neumann-Haefelin E, Huber TB, Baumeister R, Shanley DP, Thedieck K. A
dynamic network model of mTOR signaling reveals TSC-independent
mTORC2 regulation. Sci Signal. 2012;5:25. https://doi.org/10.1126/
scisignal.2002469.
20. Milias-Argeitis A, Oliveira AP, Gerosa L, Falter L, Sauer U, Lygeros J.
Elucidation of genetic interactions in the yeast GATA-factor network
using Bayesian model selection. PLoS Comput Biol. 2016;12:1004784.
https://doi.org/10.1371/journal.pcbi.1004784.
21. D’Alessandro LA, Samaga R, Maiwald T, Rho S-H, Bonefas S, Raue A,
Iwamoto N, Kienast A, Waldow K, Meyer R, Schilling M, Timmer J, Klamt
S, Klingmüller U. Disentangling the complexity of HGF signaling by
combining qualitative and quantitative modeling. PLoS Comput Biol.
2015;11:1004192. https://doi.org/10.1371/journal.pcbi.1004192.
22. Henriques D, Villaverde AF, Rocha M, Saez-Rodriguez J, Banga JR.
Data-driven reverse engineering of signaling pathways using ensembles
of dynamic models. PLoS Comput Biol. 2017;13:1005379. https://doi.org/
10.1371/journal.pcbi.1005379.
23. Otero-Muras I, Banga JR. Mixed integer multiobjective optimization
approaches for systems and synthetic biology. IFAC-PapersOnLine.
2018;51(19):58–61. https://doi.org/10.1016/j.ifacol.2018.09.042. 7th
Conference on Foundation of Systems Biology in Engineering FOSBE
2018.
24. Gabel M, Hohl T, Imle A, Fackler OT, Graw F. FAMoS: A flexible and
dynamic algorithm for model selection to analyse complex systems
dynamics. PLoS Comput Biol. 2019;15:1007230. https://doi.org/10.1371/
journal.pcbi.1007230.
25. Sunnåker M, Stelling J. Model extension and model selection. In:
Uncertainty in Biology. Cham: Springer; 2016. p. 213–41.
26. Sunnåker M, Zamora-Sillero E, Dechant R, Ludwig C, Busetto AG,
Wagner A, Stelling J. Automatic generation of predictive dynamic models
reveals nuclear phosphorylation as the key Msn2 control mechanism. Sci
Signal. 2013;6(277):41. https://doi.org/10.1126/scisignal.2003621.
27. Ederer M, Gilles ED. Thermodynamically feasible kinetic models of
reaction networks. Biophys J. 2007;92(6):1846–57. https://doi.org/10.
1529/biophysj.106.094094.
28. Nilmeier JP, Crooks GE, Minh DDL, Chodera JD. Nonequilibrium
candidate Monte Carlo is an efficient tool for equilibrium simulation. Proc
Natl Acad Sci U S A. 2011;108:1009–18. https://doi.org/10.1073/pnas.
1106094108.
29. Lormeau C, Rybiński M, Stelling J. Multi-objective design of synthetic
biological circuits. IFAC-PapersOnLine. 2017;50(1):9871–6. https://doi.org/
10.1016/j.ifacol.2017.08.1601. Accessed 15 Dec 2017.
30. Serban R, Hindmarsh AC. CVODES: The Sensitivity-Enabled ODE Solver in
SUNDIALS. In: ASME Proceedings, 5th International Conference on
Multibody Systems, Nonlinear Dynamics, and Control, vol. 6. Long Beach,
California, USA; 2005. p. 257–69. https://doi.org/10.1115/DETC2005-
85597.
31. Hucka M, Finney A, Sauro HM, Bolouri H, et al. The systems biology
markup language (SBML): a medium for representation and exchange of
biochemical network models. Bioinformatics. 2003;19(4):524–31.
32. González A, Hall MN. Nutrient sensing and TOR signaling in yeast and
mammals. EMBO J. 2017;36:397–408. https://doi.org/10.15252/embj.
201696010.
33. Kuepfer L, Peter M, Sauer U, Stelling J. Ensemble modeling for analysis of
cell signaling dynamics. Nat Biotechnol. 2007;25(9):1001–6. https://doi.
org/10.1038/nbt1330.
34. Varusai TM, Nguyen LK. Dynamic modelling of the mTOR signalling
network reveals complex emergent behaviours conferred by DEPTOR. Sci
Rep. 2018;8:643. https://doi.org/10.1038/s41598-017-18400-z.
35. Fröhlich F, Thomas P, Kazeroonian A, Theis FJ, Grima R, Hasenauer J.
Inference for stochastic chemical kinetics using moment equations and
system size expansion. PLoS Comput Biol. 2016;12:1005030. https://doi.
org/10.1371/journal.pcbi.1005030.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in
published maps and institutional affiliations.
Content courtesy of Springer Nature, terms of use apply. Rights reserved.
1.
2.
3.
4.
5.
6.
Terms and Conditions
Springer Nature journal content, brought to you courtesy of Springer Nature Customer Service Center GmbH (“Springer Nature”).
Springer Nature supports a reasonable amount of sharing of research papers by authors, subscribers and authorised users (“Users”), for small-
scale personal, non-commercial use provided that all copyright, trade and service marks and other proprietary notices are maintained. By
accessing, sharing, receiving or otherwise using the Springer Nature journal content you agree to these terms of use (“Terms”). For these
purposes, Springer Nature considers academic use (by researchers and students) to be non-commercial.
These Terms are supplementary and will apply in addition to any applicable website terms and conditions, a relevant site licence or a personal
subscription. These Terms will prevail over any conflict or ambiguity with regards to the relevant terms, a site licence or a personal subscription
(to the extent of the conflict or ambiguity only). For Creative Commons-licensed articles, the terms of the Creative Commons license used will
apply.
We collect and use personal data to provide access to the Springer Nature journal content. We may also use these personal data internally within
ResearchGate and Springer Nature and as agreed share it, in an anonymised way, for purposes of tracking, analysis and reporting. We will not
otherwise disclose your personal data outside the ResearchGate or the Springer Nature group of companies unless we have your permission as
detailed in the Privacy Policy.
While Users may use the Springer Nature journal content for small scale, personal non-commercial use, it is important to note that Users may
not:
use such content for the purpose of providing other users with access on a regular or large scale basis or as a means to circumvent access
control;
use such content where to do so would be considered a criminal or statutory offence in any jurisdiction, or gives rise to civil liability, or is
otherwise unlawful;
falsely or misleadingly imply or suggest endorsement, approval , sponsorship, or association unless explicitly agreed to by Springer Nature in
writing;
use bots or other automated methods to access the content or redirect messages
override any security feature or exclusionary protocol; or
share the content in order to create substitute for Springer Nature products or services or a systematic database of Springer Nature journal
content.
In line with the restriction against commercial use, Springer Nature does not permit the creation of a product or service that creates revenue,
royalties, rent or income from our content or its inclusion as part of a paid for service or for other commercial gain. Springer Nature journal
content cannot be used for inter-library loans and librarians may not upload Springer Nature journal content on a large scale into their, or any
other, institutional repository.
These terms of use are reviewed regularly and may be amended at any time. Springer Nature is not obligated to publish any information or
content on this website and may remove it or features or functionality at our sole discretion, at any time with or without notice. Springer Nature
may revoke this licence to you at any time and remove access to any copies of the Springer Nature journal content which have been saved.
To the fullest extent permitted by law, Springer Nature makes no warranties, representations or guarantees to Users, either express or implied
with respect to the Springer nature journal content and all parties disclaim and waive any implied warranties or warranties imposed by law,
including merchantability or fitness for any particular purpose.
Please note that these rights do not automatically extend to content, data or other material published by Springer Nature that may be licensed
from third parties.
If you would like to use or distribute our Springer Nature journal content to a wider audience or on a regular basis or in any other manner not
expressly permitted by these Terms, please contact Springer Nature at
onlineservice@springernature.com
... Recently TopoFilter has been made available as a MATLAB package for mechanistic model selection. 10 2.2. Control-Theoretic Approaches. ...
... The results reported are for enzyme networks, and the same for TRNs are yet to be studied. Recently TopoFilter has been made available as a MATLAB package for mechanistic model selection [33]. ...
Preprint
Full-text available
Genetic circuit design is a well-studied problem in synthetic biology. Ever since the first genetic circuits -- the repressilator and the toggle switch -- were designed and implemented, many advances have been made in this area of research. The current review systematically organizes a number of key works in this domain by employing the versatile framework of generalized morphological analysis. Literature in the area has been mapped based on (a) the design methodologies used, ranging from brute-force searches to control-theoretic approaches, (b) the modelling techniques employed, (c) various circuit functionalities implemented, (d) key design characteristics, and (e) the strategies used for the robust design of genetic circuits. We conclude our review with an outlook on multiple exciting areas for future research, based on the systematic assessment of key research gaps that have been readily unravelled by our analysis framework.
Article
Full-text available
Cells can encode information about their environment by modulating signaling dynamics and responding accordingly. Yet, the mechanisms cells use to decode these dynamics remain unknown when cells respond exclusively to transient signals. Here, we approach design principles underlying such decoding by rationally engineering a synthetic short-pulse decoder in budding yeast. A computational method for rapid prototyping, TopoDesign, allowed us to explore 4122 possible circuit architectures, design targeted experiments, and then rationally select a single circuit for implementation. This circuit demonstrates short-pulse decoding through incoherent feedforward and positive feedback. We predict incoherent feedforward to be essential for decoding transient signals, thereby complementing proposed design principles of temporal filtering, the ability to respond to sustained signals, but not to transient signals. More generally, we anticipate TopoDesign to help designing other synthetic circuits with non-intuitive dynamics, simply by assembling available biological components.
Article
Full-text available
Most biological systems are difficult to analyse due to a multitude of interacting components and the concomitant lack of information about the essential dynamics. Finding appropriate models that provide a systematic description of such biological systems and that help to identify their relevant factors and processes can be challenging given the sheer number of possibilities. Model selection algorithms that evaluate the performance of a multitude of different models against experimental data provide a useful tool to identify appropriate model structures. However, many algorithms addressing the analysis of complex dynamical systems, as they are often used in biology, compare a preselected number of models or rely on exhaustive searches of the total model space which might be unfeasible dependent on the number of possibilities. Therefore, we developed an algorithm that is able to perform model selection on complex systems and searches large model spaces in a dynamical way. Our algorithm includes local and newly developed non-local search methods that can prevent the algorithm from ending up in local minima of the model space by accounting for structurally similar processes. We tested and validated the algorithm based on simulated data and showed its flexibility for handling different model structures. We also used the algorithm to analyse experimental data on the cell proliferation dynamics of CD4+ and CD8+ T cells that were cultured under different conditions. Our analyses indicated dynamical changes within the proliferation potential of cells that was reduced within tissue-like 3D ex vivo cultures compared to suspension. Due to the flexibility in handling various model structures, the algorithm is applicable to a large variety of different biological problems and represents a useful tool for the data-oriented evaluation of complex model spaces.
Article
Cascades of phosphorylation between protein kinases comprise a core mechanism in the integration and propagation of intracellular signals. Although we have accumulated a wealth of knowledge around some such pathways, this is subject to study biases and much remains to be uncovered. Phosphoproteomics, the identification and quantification of phosphorylated proteins on a proteomic scale, provides a high-throughput means of interrogating the state of intracellular phosphorylation, both at the pathway level and at the whole-cell level. In this review, we discuss methods for using human quantitative phosphoproteomic data to reconstruct the underlying signalling networks that generated it. We address several challenges imposed by the data on such analyses and we consider promising advances towards reconstructing unbiased, kinome-scale signalling networks.
Article
The mechanistic Target of Rapamycin (mTOR) signalling network is an evolutionarily conserved network that controls key cellular processes, including cell growth and metabolism. Consisting of the major kinase complexes mTOR Complex 1 and 2 (mTORC1/2), the mTOR network harbours complex interactions and feedback loops. The DEP domain-containing mTOR-interacting protein (DEPTOR) was recently identified as an endogenous inhibitor of both mTORC1 and 2 through direct interactions, and is in turn degraded by mTORC1/2, adding an extra layer of complexity to the mTOR network. Yet, the dynamic properties of the DEPTOR-mTOR network and the roles of DEPTOR in coordinating mTORC1/2 activation dynamics have not been characterised. Using computational modelling, systems analysis and dynamic simulations we show that DEPTOR confers remarkably rich and complex dynamic behaviours to mTOR signalling, including abrupt, bistable switches, oscillations and co-existing bistable/oscillatory responses. Transitions between these distinct modes of behaviour are enabled by modulating DEPTOR expression alone. We characterise the governing conditions for the observed dynamics by elucidating the network in its vast multi-dimensional parameter space, and develop strategies to identify core network design motifs underlying these dynamics. Our findings provide new systems-level insights into the complexity of mTOR signalling contributed by DEPTOR.
Article
Despite significant efforts and remarkable progress, the inference of signaling networks from experimental data remains very challenging. The problem is particularly difficult when the objective is to obtain a dynamic model capable of predicting the effect of novel perturbations not considered during model training. The problem is ill-posed due to the nonlinear nature of these systems, the fact that only a fraction of the involved proteins and their post-translational modifications can be measured, and limitations on the technologies used for growing cells in vitro, perturbing them, and measuring their variations. As a consequence, there is a pervasive lack of identifiability. To overcome these issues, we present a methodology called SELDOM (enSEmbLe of Dynamic lOgic-based Models), which builds an ensemble of logic-based dynamic models, trains them to experimental data, and combines their individual simulations into an ensemble prediction. It also includes a model reduction step to prune spurious interactions and mitigate overfitting. SELDOM is a data-driven method, in the sense that it does not require any prior knowledge of the system: the interaction networks that act as scaffolds for the dynamic models are inferred from data using mutual information. We have tested SELDOM on a number of experimental and in silico signal transduction case-studies, including the recent HPN-DREAM breast cancer challenge. We found that its performance is highly competitive compared to state-of-the-art methods for the purpose of recovering network topology. More importantly, the utility of SELDOM goes beyond basic network inference (i.e. uncovering static interaction networks): it builds dynamic (based on ordinary differential equation) models, which can be used for mechanistic interpretations and reliable dynamic predictions in new experimental conditions (i.e. not used in the training). 
For this task, SELDOM’s ensemble prediction is not only consistently better than predictions from individual models, but also often outperforms the state of the art represented by the methods used in the HPN-DREAM challenge.
Article
Background: Ensemble modeling is a promising approach for obtaining robust predictions and coarse-grained population behavior in deterministic mathematical models. Ensemble approaches address model uncertainty by using parameter or model families instead of single best-fit parameters or fixed model structures. Parameter ensembles can be selected based upon simulation error, along with other criteria such as diversity or steady-state performance. Simulations using parameter ensembles can estimate confidence intervals on model variables, and robustly constrain model predictions, despite having many poorly constrained parameters. Results: In this software note, we present a multiobjective-based technique to estimate parameter or model ensembles, the Pareto Optimal Ensemble Technique in the Julia programming language (JuPOETs). JuPOETs integrates simulated annealing with Pareto optimality to estimate ensembles on or near the optimal tradeoff surface between competing training objectives. We demonstrate JuPOETs on a suite of multiobjective problems, including test functions with parameter bounds and system constraints as well as for the identification of a proof-of-concept biochemical model with four conflicting training objectives. JuPOETs identified optimal or near optimal solutions approximately six-fold faster than a corresponding implementation in Octave for the suite of test functions. For the proof-of-concept biochemical model, JuPOETs produced an ensemble of parameters that gave both the mean of the training data for conflicting data sets, while simultaneously estimating parameter sets that performed well on each of the individual objective functions. Conclusions: JuPOETs is a promising approach for the estimation of parameter and model ensembles using multiobjective optimization. 
JuPOETs can be adapted to solve many problem types, including mixed binary and continuous variable types, bilevel optimization problems and constrained problems without altering the base algorithm. JuPOETs is open source, available under an MIT license, and can be installed using the Julia package manager from the JuPOETs GitHub repository.
Article
Epithelial to mesenchymal transition (EMT) is an essential differentiation program during tissue morphogenesis and remodeling. EMT is induced by soluble transforming growth factor β (TGF-β) family members, and restricted by vascular endothelial growth factor family members. While many downstream molecular regulators of EMT have been identified, these have been largely evaluated individually without considering potential crosstalk. In this study, we created an ensemble of dynamic mathematical models describing TGF-β induced EMT to better understand the operational hierarchy of this complex molecular program. We used ordinary differential equations (ODEs) to describe the transcriptional and post-translational regulatory events driving EMT. Model parameters were estimated from multiple data sets using multiobjective optimization, in combination with cross-validation. TGF-β exposure drove the model population toward a mesenchymal phenotype, while an epithelial phenotype was enhanced following vascular endothelial growth factor A (VEGF-A) exposure. Simulations predicted that the transcription factors phosphorylated SP1 and NFAT were master regulators promoting or inhibiting EMT, respectively. Surprisingly, simulations also predicted that a cellular population could exhibit phenotypic heterogeneity (characterized by a significant fraction of the population with both high epithelial and mesenchymal marker expression) if treated simultaneously with TGF-β and VEGF-A. We tested this prediction experimentally in both MCF10A and DLD1 cells and found that upwards of 45% of the cellular population acquired this hybrid state in the presence of both TGF-β and VEGF-A. We experimentally validated the predicted NFAT/Sp1 signaling axis for each phenotype response. Lastly, we found that cells in the hybrid state had significantly different functional behavior when compared to VEGF-A or TGF-β treatment alone. 
Together, these results establish a predictive mechanistic model of EMT susceptibility, and potentially reveal a novel signaling axis which regulates carcinoma progression through an EMT versus tubulogenesis response.
Article
Many of the statistical models that could provide an accurate, interesting, and testable explanation for the structure of a data set turn out to have intractable likelihood functions. The method of approximate Bayesian computation (ABC) has become a popular approach for tackling such models. This review gives an overview of the method and the main issues and challenges that are the subject of current research.
Article
In this work we tackle a number of computational challenges in systems and synthetic biology by exploiting optimization-based approaches. Our framework combines three important capabilities: multiple optimization objectives (taking into account trade-offs between conflicting goals), simultaneous exploration of topology and parameter spaces (through a mixed integer modeling framework) and high computational efficiency. We illustrate the capacities of the mixed integer multiobjective framework in three different applications: i) automated design of synthetic bistable genetic switches, ii) exploring design principles underlying biochemical bistable switches in living cells, and iii) advanced identification of cellular process models from experimental data.
Article
Computational methods enable the design of synthetic biological circuits demonstrating a specific dynamic behavior. Current methods are based on the assembly of parts characterized in different contexts, which often fail to operate as predicted when combined. Here we introduce a circuit design method that compensates for parts uncertainty by identifying circuit topologies whose behavior is robust to variations in parameters. Our heuristic topological filtering approach efficiently yields robust circuit designs in a Bayesian framework, and enables reliable assessment of trade-offs between performance, robustness, and experimental feasibility, thus increasing the probability of success of circuit implementation.
Article
Coordinating cell growth with nutrient availability is critical for cell survival. The evolutionarily conserved TOR (target of rapamycin) controls cell growth in response to nutrients, in particular amino acids. As a central controller of cell growth, mTOR (mammalian TOR) is implicated in several disorders, including cancer, obesity, and diabetes. Here, we review how nutrient availability is sensed and transduced to TOR in budding yeast and mammals. A better understanding of how nutrient availability is transduced to TOR may allow novel strategies in the treatment of mTOR-related diseases.