Sloppy Models, Parameter Uncertainty, and the Role of Experimental Design

Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA.
Molecular BioSystems (Impact Factor: 3.21). 10/2010; 6(10):1890-900. DOI: 10.1039/b918098b
Source: PubMed


Computational models are increasingly used to understand and predict complex biological phenomena. These models contain many unknown parameters, at least some of which are difficult to measure directly and are instead estimated by fitting to time-course data. Previous work has suggested that, even with precise data sets, many parameters cannot be identified from trajectory measurements. We examined this question in the context of a pathway model of epidermal growth factor (EGF) and nerve growth factor (NGF) signaling. Computationally, we examined a palette of experimental perturbations that included different doses of EGF and NGF as well as single and multiple gene knockdowns and overexpressions. While no single experiment could accurately estimate all of the parameters, experimental design methodology identified a set of five complementary experiments that could. These results suggest optimism for the prospects of calibrating even large models, that the success of parameter estimation is intimately linked to the experimental perturbations used, and that experimental design methodology is important for parameter fitting of biological models, and likely for the accuracy that can be expected from them.
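The experimental-design idea described in the abstract can be sketched with a toy Fisher-information calculation: each candidate experiment contributes an information matrix derived from parameter sensitivities, and a greedy search assembles a complementary set of experiments that shrinks the worst-constrained parameter direction. Everything below (the random sensitivity matrices, the experiment counts, the helper names) is a hypothetical illustration of the general approach, not the paper's actual model or algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sensitivity matrices: rows = time points, columns = parameters.
# In practice S[i][t, j] would be d y(t) / d log p_j for candidate experiment i;
# here each experiment gets random, differently scaled parameter sensitivities.
n_params, n_experiments = 8, 12
sensitivities = [rng.normal(size=(5, n_params)) * rng.exponential(1.0, size=n_params)
                 for _ in range(n_experiments)]

def fim(S):
    """Fisher information matrix for one experiment (unit measurement noise)."""
    return S.T @ S

def worst_uncertainty(F, reg=1e-9):
    """Variance along the least-constrained direction: 1 / smallest eigenvalue."""
    return 1.0 / (np.linalg.eigvalsh(F)[0] + reg)  # eigvalsh sorts ascending

# Greedy design: repeatedly add the experiment that most shrinks the
# worst-constrained parameter direction of the combined information.
chosen, F_total = [], np.zeros((n_params, n_params))
for _ in range(5):
    best = min((i for i in range(n_experiments) if i not in chosen),
               key=lambda i: worst_uncertainty(F_total + fim(sensitivities[i])))
    chosen.append(best)
    F_total += fim(sensitivities[best])

print("selected experiments:", chosen)
```

No single experiment here constrains all eight parameter directions (each sensitivity matrix has only five rows), but the combined information from the selected set does, mirroring the abstract's finding that complementary experiments succeed where individual ones fail.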

Available from: Joshua Apgar, Nov 24, 2015
  • Source
    • "We point out that this is also the premise invoked in the context of sloppy models [35] [36] [37], whose behavior depends on only a few stiff combinations of parameters (accounted for here by W and y), with the many sloppy parameter directions largely unimportant for model predictions (accounted for here by η_z). We also note the fundamental difference from PCA decompositions, which attain the same form as Equation (9)."
    ABSTRACT: This paper is concerned with a lesser-studied problem in the context of model-based uncertainty quantification (UQ): optimization, design, and control under uncertainty. The solution of such problems is hindered not only by the usual difficulties encountered in UQ tasks (e.g. the high computational cost of each forward simulation, the large number of random variables) but also by the need to solve a nonlinear optimization problem involving large numbers of design variables and, potentially, constraints. We propose a framework that is suitable for a large class of such problems and is based on the idea of recasting them as probabilistic inference tasks. To that end, we propose a Variational Bayesian (VB) formulation and an iterative VB-Expectation-Maximization scheme that is also capable of identifying a low-dimensional set of directions in the design space along which the objective exhibits the largest sensitivity. We demonstrate the validity of the proposed approach on two numerical examples involving $\mathcal{O}(10^3)$ random and design variables. In all cases considered, the cost of the computations in terms of calls to the forward model was of the order $\mathcal{O}(10^2)$. The accuracy of the approximations provided is assessed by appropriate information-theoretic metrics.
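The sloppy-model structure invoked in the citing snippets (a few stiff parameter combinations, many sloppy directions) is typically read off the eigenvalue spectrum of the fit-cost Hessian. The toy sketch below constructs a hypothetical Hessian whose eigenvalues span nine decades and counts the stiff directions; the spectrum, the random eigenvectors, and the cutoff threshold are illustrative assumptions, not values from any of the papers above.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy "sloppy" spectrum: eigenvalues roughly evenly spaced in log, spanning
# many decades, as reported in sloppy-model analyses (hypothetical values).
n = 10
eigs = 10.0 ** np.linspace(0, -9, n)          # stiff -> sloppy, one decade apart
Q, _ = np.linalg.qr(rng.normal(size=(n, n)))  # random orthonormal directions
H = Q @ np.diag(eigs) @ Q.T                   # Hessian of the fit cost

# Large-eigenvalue eigenvectors are the few "stiff" parameter combinations
# that the data actually constrain; the rest are "sloppy" directions along
# which parameters can move with little effect on model predictions.
vals = np.linalg.eigvalsh(H)[::-1]            # sort descending
stiff = int(np.sum(vals > 1e-3 * vals[0]))    # illustrative stiffness cutoff
print("spectrum spans ~%.0f decades" % np.log10(vals[0] / vals[-1]))
print("stiff directions:", stiff)
```

With only a handful of stiff directions out of ten, trajectory fits pin down a few parameter combinations while leaving individual parameter values poorly determined, which is exactly the regime where complementary experimental design pays off.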
  • Source
    • "In the case of an integral approach, experimental design can give data coverage for many parameter directions and maximize predictive accuracy (Apgar et al. 2010), because parameter directions that are poorly constrained in one experiment can be well constrained in others (Apgar et al. 2010). The effect of a complementary multi-experiment fitting design is the constriction of parameters (Gutenkunst et al. 2007b) in sloppy multi-parameter models, which have few stiff parameters and many sloppy parameter directions (Daniels et al. 2008)."
    ABSTRACT: Modern researchers working in applied animal science face the problem of modelling huge quantities of data. Modelling approaches that used to serve biological systems well are struggling to adapt to the growing number of publications and research outputs. To develop new approaches capable of dealing with these fast-changing, complex conditions, it is worth reviewing modern modelling approaches that have been used successfully in other fields. This paper therefore reviews the potential of new integrated applied animal science approaches to discriminate parameters, interpret data and understand biological processes. The analysis shows that the principal challenge is handling ill-conditioned complex models, but that an integrated approach can obtain meaningful information from complementary data that cannot be obtained with present applied animal science approaches. Furthermore, it is shown that parameter sloppiness and data complementarity are key concepts for constraining system behaviour and discriminating parameters. Model evaluation and implementation of the potential integrated approach are also reviewed, and the objective of an integral approach is discussed. Our conclusion is that these approaches have the potential to deepen the understanding of applied animal systems, and that sufficiently developed resources and methodologies exist to deal with the huge quantities of data associated with this science.
    Animal Production Science 10/2014; 54(11-12). DOI:10.1071/AN14568 · 1.29 Impact Factor
  • Source
    • "New approaches to developing SEAMAPs will be needed to circumvent combinatorial explosion, particularly as larger genetic systems with many proteins are targeted for engineering. Notably, while system-level models describing protein interactions have several unknown parameters, model reduction and rule-based simulations can significantly reduce the number of equations, transitions, codependent variables, and insensitive constants (Conzelmann et al, 2008; Tran et al, 2008; Apgar et al, 2010; Sneddon et al, 2010). "
    ABSTRACT: Developing predictive models of multi-protein genetic systems to understand and optimize their behavior remains a combinatorial challenge, particularly when measurement throughput is limited. We developed a computational approach to build predictive models and identify optimal sequences and expression levels, while circumventing combinatorial explosion. Maximally informative genetic system variants were first designed by the RBS Library Calculator, an algorithm to design sequences for efficiently searching a multi-protein expression space across a > 10,000-fold range with tailored search parameters and well-predicted translation rates. We validated the algorithm's predictions by characterizing 646 genetic system variants, encoded in plasmids and genomes, expressed in six gram-positive and gram-negative bacterial hosts. We then combined the search algorithm with system-level kinetic modeling, requiring the construction and characterization of 73 variants to build a sequence-expression-activity map (SEAMAP) for a biosynthesis pathway. Using model predictions, we designed and characterized 47 additional pathway variants to navigate its activity space, find optimal expression regions with desired activity response curves, and relieve rate-limiting steps in metabolism. Creating sequence-expression-activity maps accelerates the optimization of many protein systems and allows previous measurements to quantitatively inform future designs.
    Molecular Systems Biology 06/2014; 10(6):731. DOI:10.15252/msb.20134955 · 10.87 Impact Factor