Article · PDF available

Does model-free forecasting really outperform the true model?

Abstract and Figures

Estimating population models from uncertain observations is an important problem in ecology. Perretti et al. observed that standard Bayesian state-space solutions to this problem may provide biased parameter estimates when the underlying dynamics are chaotic. Consequently, forecasts based on these estimates showed poor predictive accuracy compared to simple "model-free" methods, which led Perretti et al. to conclude that "Model-free forecasting outperforms the correct mechanistic model for simulated and experimental data". However, a simple modification of the statistical methods suffices to remove the bias and reverse their results.
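The bias at issue is easy to reproduce in miniature: fitting a chaotic map naively to noise-corrupted observations is an errors-in-variables problem, which attenuates the parameter estimate. The sketch below (a toy illustration with made-up values, not the letter's actual estimator) assumes a logistic map observed with Gaussian noise:

```python
import numpy as np

rng = np.random.default_rng(0)

def logistic_map(r, x0, n):
    """Iterate the chaotic logistic map x_{t+1} = r * x_t * (1 - x_t)."""
    x = np.empty(n)
    x[0] = x0
    for t in range(n - 1):
        x[t + 1] = r * x[t] * (1 - x[t])
    return x

# True chaotic dynamics, observed with noise -- the setting of the debate.
true_r = 3.8
states = logistic_map(true_r, 0.4, 200)
obs = states + rng.normal(0.0, 0.1, size=states.shape)

# Naive least-squares fit of r to the observed one-step pairs: regressing
# obs[t+1] on obs[t] * (1 - obs[t]) puts noise on both sides of the
# regression (errors-in-variables), so the estimate is biased toward zero --
# the kind of bias that careful state-space fitting is meant to remove.
g = obs[:-1] * (1 - obs[:-1])
r_naive = np.sum(g * obs[1:]) / np.sum(g * g)
print(f"true r = {true_r}, naive estimate = {r_naive:.2f}")
```

Modelling the latent states explicitly (or otherwise correcting the likelihood for observation error) is what removes this attenuation.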
... We view this work as complementary to recent publications on forecasting [41,42,44]. The authors of [41,42] advocate nonparametric methods over parametric methods in general, while a letter [44] addressing the work of [41] showed that a more sophisticated method for model fitting results in better parameter estimates and therefore model-based predictions which outperform model-free methods. Our results support the view that no one method is uniformly better than the other. ...
Article
Full-text available
Scientific analysis often relies on the ability to make accurate predictions of a system's dynamics. Mechanistic models, parameterized by a number of unknown parameters, are often used for this purpose. Accurate estimation of the model state and parameters prior to prediction is necessary, but may be complicated by issues such as noisy data and uncertainty in parameters and initial conditions. At the other end of the spectrum exist nonparametric methods, which rely solely on data to build their predictions. While these nonparametric methods do not require a model of the system, their performance is strongly influenced by the amount and noisiness of the data. In this article, we consider a hybrid approach to modeling and prediction which merges recent advancements in nonparametric analysis with standard parametric methods. The general idea is to replace a subset of a mechanistic model's equations with their corresponding nonparametric representations, resulting in a hybrid modeling and prediction scheme. Overall, we find that this hybrid approach allows for more robust parameter estimation and improved short-term prediction in situations where there is a large uncertainty in model parameters. We demonstrate these advantages in the classical Lorenz-63 chaotic system and in networks of Hindmarsh-Rose neurons before application to experimentally collected structured population data.
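The equation-replacement idea above can be sketched on a toy map. Below, the Hénon map stands in for a mechanistic model (an illustrative choice, not the system used in the article): the y-equation is kept mechanistic while the x-equation is replaced by a nearest-neighbour estimate learned from data.

```python
import numpy as np

# Henon map: x' = 1 - 1.4 x^2 + y, y' = 0.3 x  (toy stand-in for a model).
def henon(x, y):
    return 1.0 - 1.4 * x**2 + y, 0.3 * x

# Generate a training trajectory on the attractor.
traj = [(0.1, 0.1)]
for _ in range(2000):
    traj.append(henon(*traj[-1]))
traj = np.array(traj)

def hybrid_step(x, y, lib=traj, k=3):
    """Hybrid update: the x-equation is replaced by a k-nearest-neighbour
    estimate from data, while the y-equation stays mechanistic."""
    d = np.hypot(lib[:-1, 0] - x, lib[:-1, 1] - y)
    nn = np.argsort(d)[:k]           # nearest historical states
    x_next = lib[nn + 1, 0].mean()   # nonparametric half
    y_next = 0.3 * x                 # known mechanistic half
    return x_next, y_next

# Two-step hybrid forecast versus the true map.
state = true_state = tuple(traj[500])
for _ in range(2):
    state = hybrid_step(*state)
    true_state = henon(*true_state)
print("hybrid:", np.round(state, 3), "true:", np.round(true_state, 3))
```

With enough library data the hybrid tracks the true map closely over short horizons, which is the regime the article targets.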
... In this section, we show that Gibbs-LAIS can be useful in complex inference scenarios where sophisticated MCMC techniques seem to fail [43,44]. We consider the problem of estimating the parameters of a chaotic system, which is considered a very challenging framework in the literature [22,43,44]. This is due to the very tight and sharp posteriors induced by this model. ...
Article
Full-text available
Monte Carlo sampling methods are the standard procedure for approximating complicated integrals of multidimensional posterior distributions in Bayesian inference. In this work, we focus on the class of layered adaptive importance sampling algorithms, which is a family of adaptive importance samplers where Markov chain Monte Carlo algorithms are employed to drive an underlying multiple importance sampling scheme. The modular nature of the layered adaptive importance sampling scheme allows for different possible implementations, yielding a variety of different performances and computational costs. In this work, we propose different enhancements of the classical layered adaptive importance sampling setting in order to increase the efficiency and reduce the computational cost, of both upper and lower layers. The different variants address computational challenges arising in real-world applications, for instance with highly concentrated posterior distributions. Furthermore, we introduce different strategies for designing cheaper schemes, for instance, recycling samples generated in the upper layer and using them in the final estimators in the lower layer. Different numerical experiments show the benefits of the proposed schemes, comparing with benchmark methods presented in the literature, and in several challenging scenarios.
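The two-layer idea is easy to sketch: an upper MCMC layer proposes locations, and a lower importance-sampling layer draws from a mixture of proposals centred at those locations. The following is a minimal, generic LAIS-style scheme on a toy 1-D target; all tuning values are illustrative, not those of the paper.

```python
import numpy as np

rng = np.random.default_rng(5)

def log_target(x):
    """Toy unnormalised log-posterior: N(3, 0.5^2)."""
    return -0.5 * ((x - 3.0) / 0.5) ** 2

# Upper layer: a random-walk Metropolis chain whose states become the
# location parameters of the lower layer's proposals.
means, x = [], 0.0
for _ in range(200):
    prop = x + rng.normal(0.0, 1.0)
    if np.log(rng.uniform()) < log_target(prop) - log_target(x):
        x = prop
    means.append(x)
means = np.array(means)

# Lower layer: draw one sample per location and weight it against the
# full (deterministic) mixture of all proposals -- multiple importance
# sampling, which keeps the weights numerically stable.
sigma = 1.0
samples = rng.normal(means, sigma)
mix_pdf = np.array([np.mean(np.exp(-0.5 * ((s - means) / sigma) ** 2))
                    for s in samples]) / (sigma * np.sqrt(2.0 * np.pi))
log_w = log_target(samples) - np.log(mix_pdf)
w = np.exp(log_w - log_w.max())
est_mean = np.sum(w * samples) / np.sum(w)
print(f"self-normalised estimate of E[x] = {est_mean:.2f}")
```

The enhancements the paper proposes (recycling upper-layer samples, cheaper mixture evaluations) all operate on this basic two-layer skeleton.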
... Without prior information, observation error variances are often poorly identified, which then creates bias in other estimates (Knape, 2008; Lebreton & Gimenez, 2013; Auger-Méthé et al., 2016; Certain et al., 2018). Such biases are likely to be even stronger if the dynamics contain large fluctuations, especially chaotic ones, and more refined model-fitting techniques may have to be employed (Wood, 2010; Hartig & Dormann, 2013). ...
Article
Full-text available
Inferring interactions between populations of different species is a challenging statistical endeavour, which requires a large amount of data. There is therefore some incentive to combine all available sources of data into a single analysis to do so. In demography and single-population studies, Integrated Population Models combine population counts, capture-recapture and reproduction data to fit matrix population models. Here, we extend this approach to the community level in a stage-structured predator-prey context. We develop Integrated Community Models (ICMs), implemented in a Bayesian framework, to fit multispecies nonlinear matrix models to multiple data sources. We assessed the value of the different sources of data using simulations of ICMs under different scenarios contrasting data availability. We found that combining all data types (capture-recapture, counts, and reproduction) allows the estimation of both demographic and interaction parameters, unlike count-only data which typically generate high bias and low precision in interaction parameter estimates for short time series. Moreover, reproduction surveys informed the estimation of interactions particularly well when compared to capture-recapture programs, and have the advantage of being less costly. Overall, ICMs offer an accurate representation of stage structure in community dynamics, and foster the development of efficient observational study designs to monitor communities in the field.
... There are two reasons why this seems improbable. First, the complex nature of the microbial interactions we have described, even with such data, still presents difficult challenges for formulating all of the relationships involved mathematically (Hartig and Dormann 2013; Perretti et al. 2013a, b; De Angelis and Yurek 2015). Moreover, the time-dependency of species relationships also makes it difficult to describe the dynamics with a simple mathematical formulation, even with longer and more precise time series. ...
Article
1. Mapping the network of ecological interactions is key to understanding the composition, stability, function and dynamics of microbial communities. In recent years various approaches have been used to reveal microbial interaction networks from metagenomic sequencing data, such as time-series analysis, machine learning and statistical techniques. Despite these efforts it is still not possible to capture details of the ecological interactions behind complex microbial dynamics. 2. We developed the sparse S-map method (SSM), which generates a sparse interaction network from a multivariate ecological time-series without presuming any mathematical formulation for the underlying microbial processes. The advantage of the SSM over alternative methodologies is that it fully utilizes the observed data using a framework of empirical dynamic modelling. This makes the SSM robust to non-equilibrium dynamics and underlying complexity (nonlinearity) in microbial processes. 3. We showed that an increase in dataset size or a decrease in observational error improved the accuracy of the SSM, whereas the accuracy of a comparative equation-based method was almost unchanged in both cases and equivalent to the SSM at best. Hence, the SSM outperformed the comparative equation-based method when datasets were large and the magnitude of observational errors was small. The results were robust to the magnitude of process noise and the functional forms of inter-specific interactions that we tested. We applied the method to microbiome data from six mice and found that there were different microbial interaction regimes between young-to-middle-age (4-40 week-old) and middle-to-old-age (36-72 week-old) mice. 4. The complexity of microbial relationships impedes detailed equation-based modeling.
Our method provides a powerful alternative framework to infer ecological interaction networks of microbial communities in various environments, and will be improved by further developments in metagenomic sequencing technologies leading to increased dataset size and improved accuracy and precision.
... Strong autocorrelation can result in statistical models being relatively good predictors, even compared to models that contain covariates such as climate and other species (Bahn & McGill 2007). Furthermore, simple state-space reconstructions based on relatively little observed data can outperform more complex mechanistic models (though see Hartig & Dormann 2013; Perretti et al. 2013a,b) and can still distinguish causality from correlation (Sugihara et al. 2012). Similarly, the most accurate model of some wild animal population dynamics was the one that used the most recent observation as the forecast (Ward et al. 2014), and statistical models of species distributions have outperformed more mechanistic ones (Bahn & McGill 2007). ...
Article
Full-text available
Forecasts of ecological dynamics in changing environments are increasingly important, and are available for a plethora of variables, such as species abundance and distribution, community structure and ecosystem processes. There is, however, a general absence of knowledge about how far into the future, or other dimensions (space, temperature, phylogenetic distance), useful ecological forecasts can be made, and about how features of ecological systems relate to these distances. The ecological forecast horizon is the dimensional distance for which useful forecasts can be made. Five case studies illustrate the influence of various sources of uncertainty (e.g. parameter uncertainty, environmental variation, demographic stochasticity and evolution), level of ecological organisation (e.g. population or community), and organismal properties (e.g. body size or number of trophic links) on temporal, spatial and phylogenetic forecast horizons. Insights from these case studies demonstrate that the ecological forecast horizon is a flexible and powerful tool for researching and communicating ecological predictability. It also has potential for motivating and guiding agenda setting for ecological forecasting research and development. © 2015 The Authors Ecology Letters published by John Wiley & Sons Ltd and CNRS.
Article
Full-text available
Compositional multistability is widely observed in multispecies ecological communities. Since differences in community composition often lead to differences in community function, understanding compositional multistability is essential to comprehend the role of biodiversity in maintaining ecosystems. In community assembly studies, it has long been recognized that the order and timing of species migration and extinction influence structure and function of communities. The study of multistability in ecology has focused on the change in dynamical stability across environmental gradients, and was developed mainly for low‐dimensional systems. As a result, methodologies for studying the compositional stability of empirical multispecies communities are not well developed. Here, we show that models previously used in ecology can be analyzed from a new perspective, the energy landscape, to unveil compositional stability in observational data. To show that our method can be applicable to real‐world ecological communities, we simulated assembly dynamics driven by population‐level processes, and show that results were mostly robust to different simulation assumptions. Our method reliably captured the change in the overall compositional stability of multispecies communities over environmental change, and indicated a small fraction of community compositions that may be channels for transitions between stable states. When applied to murine gut microbiota, our method showed the presence of two alternative states whose relationship changes with age, and suggested mechanisms by which aging affects the compositional stability of the murine gut microbiota. Our method provides a practical tool to study the compositional stability of communities in a changing world, and will facilitate empirical studies that integrate the concept of multistability from different fields.
Article
In this paper, we present a new method for the prediction and uncertainty quantification of data-driven multivariate systems. Traditionally, either mechanistic or non-mechanistic modeling methodologies have been used for prediction; however, it is uncommon for the two to be incorporated together. We compare the forecast accuracy of mechanistic modeling using Bayesian inference, a non-mechanistic modeling approach based on state-space reconstruction, and a novel hybrid methodology composed of the two, for an age-structured population data set. The data come from cannibalistic flour beetles, in which the adults preying on the eggs and pupae results in non-equilibrium population dynamics. Uncertainty quantification methods for the hybrid models are outlined and illustrated for these data. We analyze the results from Bayesian inference for the mechanistic and hybrid models to suggest reasons why the hybrid modeling methodology may enable more accurate forecasts of multivariate systems than traditional approaches.
Chapter
Nonlinear dynamics is a huge field in mathematics and physics, and we will hardly be able to scratch the surface here. Nevertheless, this field is so tremendously important for our theoretical understanding of brain function and time series phenomena that I felt a book on statistical methods in neuroscience should not go without discussing at least some of its core concepts. Having some grasp of nonlinear dynamical systems may give important insights into how the observed time series were generated. In fact, nonlinear dynamics provides a kind of universal language for mathematically describing the deterministic part of the dynamical systems generating the observed time series—we will see later (Sect. 9.3) how to connect these ideas to stochastic processes and statistical inference. ARMA and state space models as discussed in Sects. 7.2 and 7.5 are examples of discrete-time, linear dynamical systems driven by noise. However, linear dynamical systems can only exhibit a limited repertoire of dynamical behaviors and typically do not capture a number of prominent and computationally important phenomena observed in physiological recordings. In the following, we will distinguish between models that are defined in discrete time (Sect. 9.1), as all the time series models discussed so far, and continuous-time models (Sect. 9.2).
Article
Full-text available
Bayesian inference often requires efficient numerical approximation algorithms, such as sequential Monte Carlo (SMC) and Markov chain Monte Carlo (MCMC) methods. The Gibbs sampler is a well-known MCMC technique, widely applied in many signal processing problems. Drawing samples from univariate full-conditional distributions efficiently is essential for the practical application of the Gibbs sampler. In this work, we present a simple, self-tuned and extremely efficient MCMC algorithm which produces virtually independent samples from these univariate target densities. The proposal density used is self-tuned and tailored to the specific target, but it is not adaptive. Instead, the proposal is adjusted during an initial optimization stage, following a simple and extremely effective procedure. Hence, we have named the newly proposed approach FUSS (Fast Universal Self-tuned Sampler), as it can be used to sample from any bounded univariate distribution, and also from any bounded multivariate distribution, either directly or by embedding it within a Gibbs sampler. Numerical experiments on several synthetic data sets (including a challenging parameter estimation problem in a chaotic system) and a high-dimensional financial signal processing problem show its good performance in terms of speed and estimation accuracy.
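To fix ideas, here is the textbook setting that FUSS-type samplers plug into: a Gibbs sampler alternating draws from univariate full conditionals. The example uses a correlated bivariate normal, whose conditionals happen to be available in closed form (FUSS itself targets the harder case where they are not; the values below are illustrative).

```python
import numpy as np

def gibbs_bivariate_normal(rho, n_iter=5000, burn=500, seed=4):
    """Gibbs sampler for a standard bivariate normal with correlation rho,
    drawing each coordinate from its univariate full conditional:
    x | y ~ N(rho * y, 1 - rho^2) and y | x ~ N(rho * x, 1 - rho^2)."""
    rng = np.random.default_rng(seed)
    s = np.sqrt(1.0 - rho ** 2)
    x = y = 0.0
    out = []
    for i in range(n_iter):
        x = rng.normal(rho * y, s)   # draw from p(x | y)
        y = rng.normal(rho * x, s)   # draw from p(y | x)
        if i >= burn:
            out.append((x, y))
    return np.array(out)

samples = gibbs_bivariate_normal(0.8)
print(f"sample correlation ~ {np.corrcoef(samples.T)[0, 1]:.2f}")
```

When the full conditionals lack closed forms, each `rng.normal` draw is replaced by a univariate sampler such as FUSS.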
Article
Full-text available
Accurate predictions of species abundance remain one of the most vexing challenges in ecology. This observation is perhaps unsurprising, because population dynamics are often strongly forced and highly nonlinear. Recently, however, numerous statistical techniques have been proposed for fitting highly parameterized mechanistic models to complex time series, potentially providing the machinery necessary for generating useful predictions. Alternatively, there is a wide variety of comparatively simple model-free forecasting methods that could be used to predict abundance. Here we pose a rather conservative challenge and ask whether a correctly specified mechanistic model, fit with commonly used statistical techniques, can provide better forecasts than simple model-free methods for ecological systems with noisy nonlinear dynamics. Using four different control models and seven experimental time series of flour beetles, we found that Markov chain Monte Carlo procedures for fitting mechanistic models often converged on best-fit parameterizations far different from the known parameters. As a result, the correctly specified models provided inaccurate forecasts and incorrect inferences. In contrast, a model-free method based on state-space reconstruction gave the most accurate short-term forecasts, even while using only a single time series from the multivariate system. Considering the recent push for ecosystem-based management and the increasing call for ecological predictions, our results suggest that a flexible model-free approach may be the most promising way forward.
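The "model-free" forecasts referred to here come from state-space reconstruction. A minimal nearest-neighbour version, a simplified cousin of simplex projection with illustrative settings rather than the authors' exact implementation, looks like this:

```python
import numpy as np

def knn_forecast(series, E=2, k=3):
    """One-step model-free forecast: delay-embed the series in E lags,
    find the k nearest historical analogues of the current state, and
    average their observed successors."""
    series = np.asarray(series, dtype=float)
    # Library of embedding vectors [x_{t-E+1}, ..., x_t] with known successors.
    lib = np.array([series[t - E + 1:t + 1]
                    for t in range(E - 1, len(series) - 1)])
    succ = series[E:]                        # successor of each library vector
    query = series[-E:]                      # current state
    d = np.linalg.norm(lib - query, axis=1)  # distance to each analogue
    nn = np.argsort(d)[:k]                   # k nearest analogues
    return succ[nn].mean()

# Usage: forecast a noisy chaotic (logistic-map) series.
rng = np.random.default_rng(1)
x = [0.3]
for _ in range(300):
    x.append(3.8 * x[-1] * (1 - x[-1]))
y = np.array(x) + rng.normal(0.0, 0.01, len(x))
forecast = knn_forecast(y)
actual = 3.8 * x[-1] * (1 - x[-1])
print(f"forecast = {forecast:.3f}, actual next value = {actual:.3f}")
```

The appeal for noisy ecological data is clear from the code: no parameters of the generating model appear anywhere, only observed analogues.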
Article
Ecology Letters (2011) 14: 816–827 Statistical models are the traditional choice to test scientific theories when observations, processes or boundary conditions are subject to stochasticity. Many important systems in ecology and biology, however, are difficult to capture with statistical models. Stochastic simulation models offer an alternative, but they were hitherto associated with a major disadvantage: their likelihood functions can usually not be calculated explicitly, and thus it is difficult to couple them to well-established statistical theory such as maximum likelihood and Bayesian statistics. A number of new methods, among them Approximate Bayesian Computing and Pattern-Oriented Modelling, bypass this limitation. These methods share three main principles: aggregation of simulated and observed data via summary statistics, likelihood approximation based on the summary statistics, and efficient sampling. We discuss principles as well as advantages and caveats of these methods, and demonstrate their potential for integrating stochastic simulation models into a unified framework for statistical modelling.
Article
Chaotic ecological dynamic systems defy conventional statistical analysis. Systems with near-chaotic dynamics are little better. Such systems are almost invariably driven by endogenous dynamic processes plus demographic and environmental process noise, and are only observable with error. Their sensitivity to history means that minute changes in the driving noise realization, or the system parameters, will cause drastic changes in the system trajectory. This sensitivity is inherited and amplified by the joint probability density of the observable data and the process noise, rendering it useless as the basis for obtaining measures of statistical fit. Because the joint density is the basis for the fit measures used by all conventional statistical methods, this is a major theoretical shortcoming. The inability to make well-founded statistical inferences about biological dynamic models in the chaotic and near-chaotic regimes, other than on an ad hoc basis, leaves dynamic theory without the methods of quantitative validation that are essential tools in the rest of biological science. Here I show that this impasse can be resolved in a simple and general manner, using a method that requires only the ability to simulate the observed data on a system from the dynamic model about which inferences are required. The raw data series are reduced to phase-insensitive summary statistics, quantifying local dynamic structure and the distribution of observations. Simulation is used to obtain the mean and the covariance matrix of the statistics, given model parameters, allowing the construction of a 'synthetic likelihood' that assesses model fit. This likelihood can be explored using a straightforward Markov chain Monte Carlo sampler, but one further post-processing step returns pure likelihood-based inference. I apply the method to establish the dynamic nature of the fluctuations in Nicholson's classic blowfly experiments.
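The synthetic-likelihood recipe sketched in this abstract (reduce data to summary statistics, simulate to get their mean and covariance, score the observed summaries under a Gaussian) can be written in a few lines. The toy model, summaries, and tuning values below are illustrative stand-ins, not those of the paper:

```python
import numpy as np

rng = np.random.default_rng(2)

def simulate(theta, n=100):
    """Toy stochastic Ricker-style map, standing in for a model whose
    likelihood is intractable (parameter values are illustrative)."""
    x = np.empty(n)
    x[0] = 1.0
    for t in range(n - 1):
        x[t + 1] = theta * x[t] * np.exp(-x[t]) * rng.lognormal(0.0, 0.1)
    return x

def summaries(x):
    """Phase-insensitive summary statistics of a series."""
    return np.array([x.mean(), x.std(), np.corrcoef(x[:-1], x[1:])[0, 1]])

def synthetic_loglik(theta, s_obs, n_sim=200):
    """Gaussian synthetic likelihood: simulate replicates at theta, fit a
    multivariate normal to their summaries, and score the observed ones."""
    S = np.array([summaries(simulate(theta)) for _ in range(n_sim)])
    mu, cov = S.mean(axis=0), np.cov(S.T) + 1e-9 * np.eye(S.shape[1])
    diff = s_obs - mu
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (diff @ np.linalg.solve(cov, diff) + logdet)

# The synthetic log-likelihood should peak near the data-generating theta.
s_obs = summaries(simulate(8.0))
grid = [2.0, 8.0, 32.0]
lls = [synthetic_loglik(th, s_obs) for th in grid]
for th, ll in zip(grid, lls):
    print(f"theta = {th:5.1f}  synthetic log-lik = {ll:8.2f}")
```

In the full method this synthetic likelihood would be explored with an MCMC sampler over theta rather than a coarse grid.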
Article
We discuss the possibility of applying some standard statistical methods (the least-squares method, the maximum likelihood method, and the method of statistical moments for estimation of parameters) to a deterministically chaotic low-dimensional dynamic system (the logistic map) containing an observational noise. A "segmentation fitting" maximum likelihood (ML) method is suggested to estimate the structural parameter of the logistic map along with the initial value x(1), considered as an additional unknown parameter. The segmentation fitting method, called "piece-wise" ML, is similar in spirit to, but simpler and with smaller bias than, the "multiple shooting" method previously proposed. Comparisons with different previously proposed techniques on simulated numerical examples give favorable results (at least for the investigated combinations of sample size N and noise level). Moreover, unlike some suggested techniques, our method does not require a priori knowledge of the noise variance. We also clarify the nature of the inherent difficulties in the statistical analysis of deterministically chaotic time series and the status of previously proposed Bayesian approaches. We note the trade-off between the need to use a large number of data points in the ML analysis to decrease the bias (to guarantee consistency of the estimation) and the unstable nature of dynamical trajectories, with exponentially fast loss of memory of the initial condition. The method of statistical moments for the estimation of the parameter of the logistic map is discussed. This method seems to be the only method whose consistency for deterministically chaotic time series has so far been proved theoretically (not only numerically).
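The segmentation idea is simple enough to sketch: fit the map's parameter over short segments, re-initialising each segment at the observed value so that chaotic divergence from the initial condition cannot accumulate across the whole series. The code below is a toy grid-search version for the logistic map; the segment length and grid are illustrative, and plain least squares stands in for the full ML treatment:

```python
import numpy as np

def segment_ls_fit(obs, seg_len=5, r_grid=np.linspace(3.5, 4.0, 501)):
    """Piece-wise least-squares fit of the logistic-map parameter r:
    within each short segment the trajectory is re-initialised from the
    observed value, limiting the exponential error growth of chaos."""
    obs = np.asarray(obs, dtype=float)
    n_seg = len(obs) // seg_len
    best_r, best_sse = None, np.inf
    for r in r_grid:
        sse = 0.0
        for s in range(n_seg):
            seg = obs[s * seg_len:(s + 1) * seg_len]
            x = seg[0]                    # re-initialise from the data
            for y in seg[1:]:
                x = r * x * (1 - x)       # one deterministic step
                sse += (y - x) ** 2
        if sse < best_sse:
            best_r, best_sse = r, sse
    return best_r

# Usage on a noisy chaotic series (true r = 3.8, noise sd = 0.01):
rng = np.random.default_rng(3)
x = [0.4]
for _ in range(199):
    x.append(3.8 * x[-1] * (1 - x[-1]))
obs = np.array(x) + rng.normal(0.0, 0.01, 200)
r_hat = segment_ls_fit(obs)
print(f"estimated r = {r_hat:.3f}")
```

Short segments keep each predicted trajectory close to the data, which is exactly why this scheme has a smaller bias than fitting one long trajectory.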
Article
The maximum likelihood method is a basic statistical technique for estimating parameters and variables, and is the starting point for many more sophisticated methods, such as Bayesian methods. This paper shows that maximum likelihood fails to identify the true trajectory of a chaotic dynamical system, because there are trajectories that appear to be far more (infinitely more) likely than the truth. This failure occurs for unbounded noise, and for bounded noise when it is sufficiently large, and will almost certainly have consequences for parameter estimation in such systems. The reason for the failure is rather simple: in chaotic dynamical systems there can be trajectories that are consistently closer to the observations than the true trajectory being observed, and hence their likelihood dominates that of the truth. The residuals of these truth-dominating trajectories are not consistent with the noise distribution; they would typically have too small a standard deviation and many outliers, and hence the situation may be remedied by using methods that examine the distribution of residuals and are not entirely maximum-likelihood based.