## Project log

- Sri Harsha Turlapati
- Lyudmila Grigoryeva
- Juan-Pablo Ortega
- Domenico Campolo

The empirical laws governing human curvilinear movements have been studied using various relationships, including minimum jerk, the 2/3 power law, and the piecewise power law. These laws quantify the speed-curvature relationships of human movements during curve tracing using critical speed and curvature as regressors. In this work, we provide a reservoir computing-based framework that can learn and reproduce human-like movements. Specifically, the geometric invariance of the observations, i.e., the lateral distance from the closest point on the curve, the instantaneous velocity, and the curvature, when viewed from the moving frame of reference, is exploited to train the reservoir system. The artificially produced movements are evaluated using the power law to assess whether they are indistinguishable from their human counterparts. The generalisation capabilities of the trained reservoir to curves that have not been used during training are also shown.
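
As a toy illustration of the power-law evaluation mentioned above, the sketch below generates an elliptical trace whose speed obeys the speed-curvature relation by construction and recovers the exponent by log-log regression; the curve, constants, and regression choices are ours, not the authors' pipeline.

```python
import numpy as np

# Toy check of the speed-curvature power law: generate an elliptical trace
# whose speed obeys v = gamma * kappa^(-1/3) by construction, then recover
# the exponent by log-log regression.
t = np.linspace(0, 2 * np.pi, 4000, endpoint=False)
x, y = 2.0 * np.cos(t), 1.0 * np.sin(t)                     # ellipse geometry

# Curvature kappa = |x'y'' - y'x''| / (x'^2 + y'^2)^(3/2)
dx, dy = np.gradient(x, t), np.gradient(y, t)
ddx, ddy = np.gradient(dx, t), np.gradient(dy, t)
kappa = np.abs(dx * ddy - dy * ddx) / (dx**2 + dy**2) ** 1.5

v = 1.3 * kappa ** (-1.0 / 3.0)                             # impose the power law

slope, _ = np.polyfit(np.log(kappa), np.log(v), 1)
print(f"estimated exponent: {slope:.3f} (two-thirds law predicts -1/3)")
```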

We present a convolutional framework which significantly reduces the complexity and thus the computational effort for distributed reinforcement learning control of dynamical systems governed by partial differential equations (PDEs). Exploiting translational invariances, the high-dimensional distributed control problem can be transformed into a multi-agent control problem with many identical, uncoupled agents. Furthermore, using the fact that information is transported with finite velocity in many cases, the dimension of the agents' environment can be drastically reduced using a convolution operation over the state space of the PDE. In this setting, the complexity can be flexibly adjusted via the kernel width or by using a stride greater than one. Moreover, scaling from smaller to larger systems -- or the transfer between different domains -- becomes a straightforward task requiring little effort. We demonstrate the performance of the proposed framework using several PDE examples with increasing complexity, where stabilization is achieved by training a low-dimensional deep deterministic policy gradient agent using minimal computing resources.
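
The windowing idea can be sketched in a few lines: each agent observes only a local patch of the discretized PDE state, with the patch size and spacing set by a kernel width and stride. The function below is a minimal illustration under assumed names and sizes, not the paper's implementation.

```python
import numpy as np

def local_observations(state, kernel_width=7, stride=2):
    """Return one low-dimensional observation per agent location.

    state: 1D array of the discretized PDE state (periodic domain assumed).
    """
    half = kernel_width // 2
    padded = np.concatenate([state[-half:], state, state[:half]])  # periodic pad
    centers = range(0, len(state), stride)
    return np.stack([padded[c:c + kernel_width] for c in centers])

state = np.sin(np.linspace(0, 2 * np.pi, 64, endpoint=False))  # toy PDE snapshot
obs = local_observations(state)
print(obs.shape)  # (num_agents, kernel_width): each row feeds the shared policy
```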

Regressing the vector field of a dynamical system from a finite number of observed states is a natural way to learn surrogate models for such systems. As shown in [27, 15, 36, 16, 39, 29, 48], a simple and interpretable way to learn a dynamical system from data is to interpolate its vector field with a data-adapted kernel, which can be learned using Kernel Flows [42]. Kernel Flows is a trainable machine-learning method that learns the optimal parameters of a kernel based on the premise that a kernel is good if there is no significant loss in accuracy when half of the data is used. The objective function could be short-term prediction accuracy or some other objective (cf. [27] and [37] for other variants of Kernel Flows). However, this method is limited by the choice of the base kernel. In this paper, we introduce the method of Sparse Kernel Flows in order to learn the "best" kernel by starting from a large dictionary of kernels. It is based on sparsifying a kernel that is a linear combination of elemental kernels. We apply this approach to a library of 132 chaotic systems.
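
A minimal sketch of the ingredients, assuming a toy dataset and a three-element kernel dictionary: the Kernel Flows loss rho = 1 - (y_s' K_s^{-1} y_s) / (y' K^{-1} y) is averaged over random half-samples, and an l1 penalty on the (here grid-searched) dictionary weights stands in for the paper's gradient-based sparsification.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(0)

# Toy data standing in for vector-field samples (x_i, f(x_i)); the data,
# dictionary, and penalty weight are all illustrative assumptions.
X = rng.uniform(-2, 2, size=(60, 1))
y = np.sin(2 * X[:, 0]) + 0.01 * rng.standard_normal(60)

def gauss(a, b, s):
    return np.exp(-((a - b.T) ** 2) / (2 * s**2))

kernels = [lambda a, b: gauss(a, b, 0.2),      # small-scale Gaussian
           lambda a, b: gauss(a, b, 1.0),      # large-scale Gaussian
           lambda a, b: (1 + a @ b.T) ** 2]    # quadratic polynomial

def kf_rho(beta, half):
    """Kernel Flows loss rho = 1 - (y_s' K_s^{-1} y_s) / (y' K^{-1} y)."""
    K = sum(b * k(X, X) for b, k in zip(beta, kernels)) + 1e-6 * np.eye(len(X))
    Ks = K[np.ix_(half, half)]
    return 1.0 - (y[half] @ np.linalg.solve(Ks, y[half])) / (y @ np.linalg.solve(K, y))

# Crude stand-in for gradient-based sparsification: grid search over the
# dictionary weights with an l1 penalty, averaging rho over random halves.
best = (np.inf, None)
for beta in product([0.0, 0.5, 1.0], repeat=3):
    beta = np.array(beta)
    if not beta.any():
        continue
    halves = [rng.choice(len(X), len(X) // 2, replace=False) for _ in range(8)]
    loss = np.mean([kf_rho(beta, h) for h in halves]) + 0.05 * np.abs(beta).sum()
    if loss < best[0]:
        best = (loss, beta)
print("selected kernel weights:", best[1])
```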

The Earth’s climate system displays substantial variability on spatial and temporal scales spanning many orders of magnitude. The complexity of the physics underlying the dynamics of the system poses a significant challenge for any attempt to quantitatively model the climate system: in any computationally tractable climate model there will be a broad range of scales that cannot be explicitly resolved but whose aggregate effects on the resolved scales must be accounted for. Much of the uncertainty in predictions of weather and climate variability is linked to parameterization issues, combined with our limited understanding of the interactions between the mesoscale, synoptic, and planetary scales, and with the simulation uncertainty imposed by available computing power.

Water vapor, as a greenhouse gas in the atmosphere, plays an important role in the climate system, but its dynamics are extremely complicated: it is controlled both by cloud microphysical processes and by dynamical processes, and neither of these controls is well understood or well represented in climate models. Climate is sensitive to poorly known microphysical and dynamical processes, and in general circulation models (GCMs) the modeling and representation of water vapor is one of the major uncertainties. A better parameterization scheme will improve the prediction of weather and climate. In this thesis, we present an observational data-driven, stochastic analysis-based method to represent this small-scale process forcing, providing an alternative to traditional deterministic moist convective parameterizations, which have known limitations and drawbacks.

The thesis has two parts. The first estimates moist convection by means of observational data and a theoretical model, and then examines its role in the distribution of water vapor. The second presents a stochastic convective parameterization scheme, develops an idealized stochastic moist model, and tests its validity.

We have devised an observational data-driven, stochastic analysis-based method for parameterizing convective moistening. Convective forcing fluctuates strongly in time and shows relatively large temporal correlation, so we represent it by a correlated (colored) noise process in terms of a fractional Brownian motion. Diagnostic analysis shows that convective forcing is closely related to specific humidity, so multiplicative noise is used to develop an empirical formula relating convective forcing and specific humidity. Optimal parameters are obtained from the convergence theorem for power variation and stochastic calculus, and an idealized theoretical stochastic model for water vapor evolution is developed.

To validate the stochastic parameterization scheme, we compare numerical predictions based on the stochastic advection-diffusion-condensation model with the ERA-40 observations. The results demonstrate that, even with this simplified treatment of the water vapor model, the stochastic model captures not only the first moment of water vapor in El Niño and La Niña years but also the second moment and the probability distribution. These results are quite promising: both mathematical theory and numerical experiments verify that this data-based stochastic parameterization scheme is reasonable.
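
As a hedged illustration of the estimation idea (fractional Brownian motion plus power variation), the sketch below simulates fBm by a Cholesky factorization of its covariance and recovers the Hurst exponent from the scaling of the quadratic variation; all numerical choices here are ours, not the thesis's.

```python
import numpy as np

rng = np.random.default_rng(1)

def fbm(n, H, T=1.0):
    """Fractional Brownian motion on n grid points via Cholesky of its covariance."""
    t = T * np.arange(1, n + 1) / n
    s, u = np.meshgrid(t, t)
    cov = 0.5 * (s**(2*H) + u**(2*H) - np.abs(s - u)**(2*H))
    return np.linalg.cholesky(cov + 1e-12 * np.eye(n)) @ rng.standard_normal(n)

H_true, n = 0.7, 1024
B = fbm(n, H_true)

# For fBm on [0,1], sum |B_{(i+1)/n} - B_{i/n}|^2 scales like n^(1-2H), so a
# log-log regression of the quadratic variation against n estimates H.
ns, qvs = [], []
for k in [1, 2, 4, 8, 16]:
    Bk = B[::k]
    ns.append(len(Bk))
    qvs.append(np.sum(np.diff(Bk) ** 2))
slope, _ = np.polyfit(np.log(ns), np.log(qvs), 1)
print(f"estimated H: {(1 - slope) / 2:.3f} (true {H_true})")
```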

We consider the data-driven approximation of the Koopman operator for stochastic differential equations on reproducing kernel Hilbert spaces (RKHS). Our focus is on the estimation error if the data are collected from long-term ergodic simulations. We derive both an exact expression for the variance of the kernel cross-covariance operator, measured in the Hilbert-Schmidt norm, and probabilistic bounds for the finite-data estimation error. Moreover, we derive a bound on the prediction error of observables in the RKHS using a finite Mercer series expansion. Further, assuming Koopman-invariance of the RKHS, we provide bounds on the full approximation error. Numerical experiments using the Ornstein-Uhlenbeck process illustrate our results.
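
A minimal sketch of the central object, assuming a Gaussian kernel and an Euler-Maruyama simulation of the Ornstein-Uhlenbeck example: the squared Hilbert-Schmidt norm of the empirical kernel cross-covariance operator reduces to a Gram-matrix computation.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hedged toy code: empirical kernel cross-covariance operator for time-lagged
# data from an ergodic Ornstein-Uhlenbeck simulation. Its squared
# Hilbert-Schmidt norm reduces to the Gram-matrix expression
# ||C_hat||_HS^2 = (1/m^2) * sum_{ij} k(X_i, X_j) k(Y_i, Y_j).
alpha, sigma, dt, n = 1.0, 0.5, 0.01, 1000
xi = rng.standard_normal(n)
x = np.zeros(n)
for i in range(n - 1):                     # Euler-Maruyama discretization
    x[i + 1] = x[i] - alpha * x[i] * dt + sigma * np.sqrt(dt) * xi[i]

lag = 5
X, Y = x[:-lag], x[lag:]                   # pairs (x_t, x_{t+lag})
m = len(X)

def gram(a, b, s=0.5):
    return np.exp(-((a[:, None] - b[None, :]) ** 2) / (2 * s**2))

hs_sq = np.sum(gram(X, X) * gram(Y, Y)) / m**2
print(f"||C_hat||_HS^2 estimated from {m} samples: {hs_sq:.4f}")
```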

The advances in data science and machine learning have resulted in significant improvements regarding the modeling and simulation of nonlinear dynamical systems. It is nowadays possible to make accurate predictions of complex systems such as the weather, disease models or the stock market. Predictive methods are often advertised to be useful for control, but the specifics are frequently left unanswered due to the higher system complexity, the requirement of larger data sets and an increased modeling effort. In other words, surrogate modeling for autonomous systems is much easier than for control systems. In this paper we present the framework QuaSiModO (Quantization-Simulation-Modeling-Optimization) to transform arbitrary predictive models into control systems and thus render the tremendous advances in data-driven surrogate modeling accessible for control. Our main contribution is that we trade control efficiency by autonomizing the dynamics – which yields mixed-integer control problems – to gain access to arbitrary, ready-to-use autonomous surrogate modeling techniques. We then recover the complexity of the original problem by leveraging recent results from mixed-integer optimization. The advantages of QuaSiModO are a linear increase in data requirements with respect to the control dimension, performance guarantees that rely exclusively on the accuracy of the predictive model in use, and little prior knowledge requirements in control theory to solve complex control problems.
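
The quantization step can be illustrated with a toy receding-horizon controller: a continuous input is replaced by a finite symbol set, each symbol freezes an autonomous system (for which a surrogate model could be trained), and the resulting mixed-integer problem is here solved by brute-force enumeration. The model and all names below are ours, not QuaSiModO's implementation.

```python
import numpy as np
from itertools import product

V = [-1.0, 0.0, 1.0]                      # quantized control set

def step(x, v, dt=0.1):
    """Toy stand-in for a surrogate of the autonomous system with frozen input v."""
    return x + dt * (-x**3 + v)

def plan(x0, horizon=4, target=0.5):
    """Enumerate switching sequences (the mixed-integer search, done naively)."""
    best_seq, best_cost = None, np.inf
    for seq in product(V, repeat=horizon):
        x, cost = x0, 0.0
        for v in seq:
            x = step(x, v)
            cost += (x - target) ** 2
        if cost < best_cost:
            best_seq, best_cost = seq, cost
    return best_seq

x = 2.0
for _ in range(20):                       # receding-horizon loop
    v = plan(x)[0]                        # apply the first planned input
    x = step(x, v)
print(f"final state: {x:.3f} (target 0.5)")
```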

- Lukas Gonon
- Lyudmila Grigoryeva
- Juan-Pablo Ortega

A universal kernel is constructed whose sections approximate any causal and time-invariant filter in the fading memory category with inputs and outputs in a finite-dimensional Euclidean space. This kernel is built using the reservoir functional associated with a state-space representation of the Volterra series expansion available for any analytic fading memory filter. It is hence called the Volterra reservoir kernel. Even though the state-space representation and the corresponding reservoir feature map are defined on an infinite-dimensional tensor algebra space, the kernel map is characterized by explicit recursions that are readily computable for specific data sets when employed in estimation problems using the representer theorem. We showcase the performance of the Volterra reservoir kernel in a popular data science application in relation to bitcoin price prediction.
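
As a hedged illustration only: the recursion below, k_t = 1 + lam^2 * <z_t, z'_t> * k_{t-1}, is a simplified stand-in for the paper's explicit kernel recursions (it is the exact kernel of the feature map x_t = (1, lam * z_t (x) x_{t-1}) on the tensor algebra, but not necessarily the paper's formula), used here in a representer-theorem ridge regression on toy sequences.

```python
import numpy as np

rng = np.random.default_rng(3)

def seq_kernel(z, w, lam=0.6):
    """Sequence kernel via the recursion k_t = 1 + lam^2 <z_t, w_t> k_{t-1}."""
    k = 1.0
    for zt, wt in zip(z, w):
        k = 1.0 + lam**2 * np.dot(zt, wt) * k
    return k

# Toy task: learn a nonlinear functional of each sequence by kernel ridge
# regression, with predictions given by the representer theorem.
seqs = [rng.standard_normal((20, 1)) for _ in range(40)]
targets = np.array([s[-1, 0] ** 2 for s in seqs])

G = np.array([[seq_kernel(a, b) for b in seqs] for a in seqs])
alpha = np.linalg.solve(G + 1e-6 * np.eye(len(seqs)), targets)

test = rng.standard_normal((20, 1))
pred = np.array([seq_kernel(test, s) for s in seqs]) @ alpha
print(f"prediction {pred:.3f} vs truth {test[-1, 0] ** 2:.3f}")
```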

Most existing results in the analysis of quantum reservoir computing (QRC) systems with classical inputs have been obtained using the density matrix formalism. This paper shows that alternative representations can provide better insights when dealing with design and assessment questions. More explicitly, system isomorphisms have been established that unify the density matrix approach to QRC with the representation in the space of observables using Bloch vectors associated with Gell-Mann bases. It has been shown that these vector representations yield state-affine systems (SAS) previously introduced in the classical reservoir computing literature and for which numerous theoretical results have been established. This connection has been used to show that various statements in relation to the fading memory (FMP) and the echo state (ESP) properties are independent of the representation, and also to shed some light on fundamental questions in QRC theory in finite dimensions. In particular, a necessary and sufficient condition for the ESP and FMP to hold has been formulated, and contractive quantum channels that have exclusively trivial semi-infinite solutions have been characterized in terms of the existence of input-independent fixed points.
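
A small sketch of the change of representation, using the standard generalized Gell-Mann construction (ours, not code from the paper): a qutrit density matrix is mapped to its Bloch vector b_i = tr(rho G_i) and reconstructed as rho = I/d + (1/2) sum_i b_i G_i.

```python
import numpy as np

rng = np.random.default_rng(4)

def gell_mann(d=3):
    """Generalized Gell-Mann matrices, traceless Hermitian, tr(G_i G_j) = 2 delta_ij."""
    mats = []
    for j in range(d):
        for k in range(j + 1, d):
            S = np.zeros((d, d), complex); S[j, k] = S[k, j] = 1       # symmetric
            A = np.zeros((d, d), complex); A[j, k] = -1j; A[k, j] = 1j # antisymmetric
            mats += [S, A]
    for l in range(1, d):                                              # diagonal
        D = np.zeros((d, d), complex)
        D[np.diag_indices(l)] = 1
        D[l, l] = -l
        mats.append(np.sqrt(2.0 / (l * (l + 1))) * D)
    return mats

d = 3
G = gell_mann(d)

# A random density matrix rho = A A^dagger / tr(A A^dagger)
A = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
rho = A @ A.conj().T
rho /= np.trace(rho)

b = np.array([np.trace(rho @ g).real for g in G])   # Bloch vector b_i = tr(rho G_i)
rho_back = np.eye(d) / d + 0.5 * sum(bi * g for bi, g in zip(b, G))
print("reconstruction error:", np.abs(rho - rho_back).max())
```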

To what extent can we forecast a time series without fitting to historical data? Can universal patterns of probability help in this task? Deep relations between pattern Kolmogorov complexity and pattern probability have recently been used to make a priori probability predictions in a variety of systems in physics, biology and engineering. Here we study simplicity bias (SB) — an exponential upper bound decay in pattern probability with increasing complexity — in discretised time series extracted from the World Bank Open Data collection. We predict upper bounds on the probability of discretised series patterns, without fitting to trends in the data. Thus we perform a kind of ‘forecasting without training data’, predicting time series shape patterns a priori, but not the actual numerical value of the series. Additionally we make predictions about which of two discretised series is more likely with accuracy of ∼80%, much higher than a 50% baseline rate, just by using the complexity of each series. These results point to a promising perspective on practical time series forecasting and integration with machine learning methods.
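
The comparison experiment can be sketched with a toy Lempel-Ziv-style complexity on up/down discretisations; the implementation and example series below are ours and only illustrate the principle that lower-complexity patterns are predicted to be more probable.

```python
import numpy as np

def lz76(s):
    """Number of distinct phrases in an LZ76-style parsing of string s."""
    i, c = 0, 0
    while i < len(s):
        l = 1
        # extend the current phrase while it already occurred in the prefix
        while i + l <= len(s) and s[i:i + l] in s[:i + l - 1]:
            l += 1
        c += 1
        i += l
    return c

def discretise(x):
    """Binary up/down pattern of a numerical series."""
    return "".join("1" if d > 0 else "0" for d in np.diff(x))

rng = np.random.default_rng(5)
trend = np.arange(32, dtype=float)             # simple monotone series
noisy = rng.standard_normal(32).cumsum()       # rougher random walk

for name, x in [("trend", trend), ("random walk", noisy)]:
    print(name, "pattern complexity:", lz76(discretise(x)))
# Simplicity bias bounds P(pattern) by roughly 2^(-a*K - b), so the
# lower-complexity 'trend' pattern is predicted to be the more likely one.
```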

The Koopman operator has become an essential tool for data-driven approximation of dynamical (control) systems, e.g., via extended dynamic mode decomposition. Despite its popularity, convergence results and, in particular, error bounds are still scarce. In this paper, we derive probabilistic bounds for the approximation error and the prediction error depending on the number of training data points, for both ordinary and stochastic differential equations while using either ergodic trajectories or i.i.d. samples. We illustrate these bounds by means of an example with the Ornstein–Uhlenbeck process. Moreover, we extend our analysis to (stochastic) nonlinear control-affine systems. We prove error estimates for a previously proposed approach that exploits the linearity of the Koopman generator to obtain a bilinear surrogate control system and, thus, circumvents the curse of dimensionality since the system is not autonomized by augmenting the state by the control inputs. To the best of our knowledge, this is the first finite-data error analysis in the stochastic and/or control setting. Finally, we demonstrate the effectiveness of the bilinear approach by comparing it with state-of-the-art techniques showing its superiority whenever state and control are coupled.
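
A minimal sketch of the setting on the Ornstein-Uhlenbeck example, assuming a monomial dictionary (which spans a Koopman-invariant subspace for OU): EDMD estimated from an ergodic trajectory should reproduce the eigenvalues exp(-n * alpha * tau) of the Koopman operator at lag tau.

```python
import numpy as np

rng = np.random.default_rng(6)

# Hedged toy example: EDMD on an ergodic Ornstein-Uhlenbeck trajectory with
# the dictionary {1, x, x^2, x^3}.
alpha, sigma, dt, n = 1.0, np.sqrt(2.0), 0.01, 100_000
xi = rng.standard_normal(n)
x = np.zeros(n)
for i in range(n - 1):                                   # Euler-Maruyama
    x[i + 1] = x[i] - alpha * x[i] * dt + sigma * np.sqrt(dt) * xi[i]

lag = 10                                                 # tau = lag * dt
psi = lambda v: np.vander(v, 4, increasing=True)
PX, PY = psi(x[:-lag]), psi(x[lag:])

K = np.linalg.lstsq(PX, PY, rcond=None)[0]               # psi(x_{t+lag}) ~ psi(x_t) K
eig = np.sort(np.abs(np.linalg.eigvals(K)))[::-1]
print("EDMD eigenvalues: ", np.round(eig, 3))
print("exp(-n*alpha*tau):", np.round(np.exp(-np.arange(4) * alpha * lag * dt), 3))
```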

Modeling geophysical systems as dynamical systems and regressing their vector field from data is a simple way to learn emulators for such systems. We show that when the kernel of these emulators is also learned from data (using kernel flows, a variant of cross-validation), then the resulting data-driven models are not only faster than equation-based models but are easier to train than neural networks such as the long short-term memory neural network. In addition, they are also more accurate and predictive than the latter. When trained on observational data for the global sea-surface temperature, considerable gains are observed by the proposed technique in comparison to classical partial differential equation-based models in terms of forecast computational cost and accuracy. When trained on publicly available re-analysis data for temperatures in the North-American continent, we see significant improvements over climatology and persistence based forecast techniques.

Modelling geophysical processes as low-dimensional dynamical systems and regressing their vector field from data is a promising approach for learning emulators of such systems. We show that when the kernel of these emulators is also learned from data (using kernel flows, a variant of cross-validation), then the resulting data-driven models are not only faster than equation-based models but are easier to train than neural networks such as the long short-term memory neural network. In addition, they are also more accurate and predictive than the latter. When trained on geophysical observational data, for example the weekly averaged global sea-surface temperature, considerable gains are also observed by the proposed technique in comparison with classical partial differential equation-based models in terms of forecast computational cost and accuracy. When trained on publicly available re-analysis data for the daily temperature of the North American continent, we see significant improvements over classical baselines such as climatology and persistence-based forecast techniques. Although our experiments concern specific examples, the proposed approach is general, and our results support the viability of kernel methods (with learned kernels) for interpretable and computationally efficient geophysical forecasting for a large diversity of processes.

For dynamical systems with a non-hyperbolic equilibrium, it is possible to significantly simplify the study of stability by means of center manifold theory. This theory makes it possible to isolate the complicated asymptotic behavior of the system close to the equilibrium point and to obtain meaningful predictions of its behavior by analyzing a reduced-order system on the so-called center manifold.
Since the center manifold is usually not known, good approximation methods are important, as the center manifold theorem states that the stability properties of the origin of the reduced-order system are the same as those of the origin of the full-order system.
In this work, we establish a data-based version of the center manifold theorem that works by considering an approximation in place of the exact manifold. The error between the approximate and the original reduced dynamics is also quantified.
We then use an apposite data-based kernel method to construct a suitable approximation of the manifold close to the equilibrium that is compatible with our general error theory. The data are collected by repeated numerical simulation of the full system by means of a high-accuracy solver, which generates sets of discrete trajectories that are then used as a training set. The method is tested on different examples that show promising performance and good accuracy.

A simple and interpretable way to learn a dynamical system from data is to interpolate its vector-field with a kernel. In particular, this strategy is highly efficient (both in terms of accuracy and complexity) when the kernel is data-adapted using Kernel Flows (KF) [OY19] (which uses gradient-based optimization to learn a kernel based on the premise that a kernel is good if there is no significant loss in accuracy if half of the data is used for interpolation). In this work, we extend previous work on learning dynamical systems using Kernel Flows [HO21, DHS + 21, LBHO21, DTL + 21, Owh21a] to the case of learning vector-valued dynamical systems from time-series observations that are partial/incomplete in the state space. The method combines Kernel Flows with Computational Graph Completion.

Data-driven models for nonlinear dynamical systems based on approximating the underlying Koopman operator or generator have proven to be successful tools for forecasting, feature learning, state estimation, and control. It has become well known that the Koopman generators for control-affine systems also have affine dependence on the input, leading to convenient finite-dimensional bilinear approximations of the dynamics. Yet there are still two main obstacles that limit the scope of current approaches for approximating the Koopman generators of systems with actuation. First, the performance of existing methods depends heavily on the choice of basis functions over which the Koopman generator is to be approximated; and there is currently no universal way to choose them for systems that are not measure preserving. Secondly, if we do not observe the full state, we may not gain access to a sufficiently rich collection of such functions to describe the dynamics. This is because the commonly used method of forming time-delayed observables fails when there is actuation. To remedy these issues, we write the dynamics of observables governed by the Koopman generator as a bilinear hidden Markov model, and determine the model parameters using the expectation-maximization (EM) algorithm. The E-step involves a standard Kalman filter and smoother, while the M-step resembles control-affine dynamic mode decomposition for the generator. We demonstrate the performance of this method on three examples, including recovery of a finite-dimensional Koopman-invariant subspace for an actuated system with a slow manifold; estimation of Koopman eigenfunctions for the unforced Duffing equation; and model-predictive control of a fluidic pinball system based only on noisy observations of lift and drag.
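
The E-step's filter can be sketched directly: a standard Kalman filter in which the state transition depends affinely on the known input, A(u) = A + u*B, matching the bilinear hidden Markov model. The matrices below are toy values chosen by us, not identified models from the paper.

```python
import numpy as np

rng = np.random.default_rng(7)

nz = 2
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0, 0.05], [0.05, 0.0]])
C = np.array([[1.0, 0.0]])                        # observe one scalar only
Q, R = 0.01 * np.eye(nz), np.array([[0.05]])

T = 200
u = np.sin(0.1 * np.arange(T))                    # known control sequence
z = np.zeros((T, nz)); z[0] = [1.0, -1.0]
y = np.zeros(T)
y[0] = z[0, 0] + np.sqrt(R[0, 0]) * rng.standard_normal()
for t in range(T - 1):                            # simulate the bilinear HMM
    z[t + 1] = (A + u[t] * B) @ z[t] + rng.multivariate_normal(np.zeros(nz), Q)
    y[t + 1] = z[t + 1, 0] + np.sqrt(R[0, 0]) * rng.standard_normal()

# Forward Kalman filter with input-dependent transition (the E-step's filter)
m, P = np.zeros(nz), np.eye(nz)
for t in range(T):
    if t > 0:
        Au = A + u[t - 1] * B                     # predict with A(u_{t-1})
        m, P = Au @ m, Au @ P @ Au.T + Q
    S = C @ P @ C.T + R                           # innovation covariance
    G = P @ C.T @ np.linalg.inv(S)                # Kalman gain
    m = m + G @ (np.array([y[t]]) - C @ m)
    P = (np.eye(nz) - G @ C) @ P
print("final filtered estimate:", np.round(m, 3), "true state:", np.round(z[-1], 3))
```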

Reservoir computing systems are constructed using a driven dynamical system in which external inputs can alter the evolving states of the system. These paradigms are used in information processing, machine learning, and computation. A fundamental question that needs to be addressed in this framework is the statistical relationship between the input and the system states. This paper provides conditions that guarantee the existence and uniqueness of asymptotically invariant measures for driven systems and shows that their dependence on the input process is continuous when the sets of input and output processes are endowed with the Wasserstein distance. The main tool in these developments is the characterization of those invariant measures as fixed points of naturally defined Foias operators that appear in this context and that are studied extensively in the paper. Those fixed points are obtained by imposing a newly introduced stochastic state contractivity on the driven system that is readily verifiable in examples. Stochastic state contractivity can be satisfied by systems that are not state-contractive, a condition typically invoked to guarantee the echo state property in reservoir computing. As a result, it may actually be satisfied even if the echo state property is not present.

In recent years, the artificial intelligence community has seen a continuous interest in research aimed at investigating dynamical aspects of both training procedures and machine learning models. Among recurrent neural networks, the Reservoir Computing (RC) paradigm is of particular interest for its conceptual simplicity and fast training scheme. Yet, the guiding principles under which RC operates are only partially understood. In this work, we analyze the role played by Generalized Synchronization (GS) when training a RC to solve a generic task. In particular, we show how GS allows the reservoir to correctly encode the system generating the input signal into its dynamics. We also discuss necessary and sufficient conditions for the learning to be feasible in this approach. Moreover, we explore the role that ergodicity plays in this process, showing how its presence allows the learning outcome to apply to multiple input trajectories. Finally, we show that satisfaction of the GS can be measured by means of the mutual false nearest neighbors index, which makes the theoretical derivations effective for practitioners.
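
A quick numerical illustration of the synchronization property discussed above (our toy test, not the paper's experiments): two copies of the same reservoir, started from different states and driven by the same input, converge to the same trajectory, so the reservoir state becomes a function of the input history.

```python
import numpy as np

rng = np.random.default_rng(8)

N = 100
W = rng.standard_normal((N, N))
W *= 0.8 / np.max(np.abs(np.linalg.eigvals(W)))   # spectral radius 0.8
w_in = rng.standard_normal(N)

def step(x, u):
    """One update of a standard tanh echo state reservoir."""
    return np.tanh(W @ x + w_in * u)

u = np.sin(0.3 * np.arange(300))                  # common input signal
x1, x2 = rng.standard_normal(N), rng.standard_normal(N)
gaps = []
for ut in u:
    x1, x2 = step(x1, ut), step(x2, ut)
    gaps.append(np.linalg.norm(x1 - x2))
print("state gap: start %.2f -> end %.2e" % (gaps[0], gaps[-1]))
```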

Training a residual neural network with L2 regularization on weights and biases is equivalent to minimizing a discrete least action principle and to controlling a discrete Hamiltonian system representing the propagation of input data across layers. The kernel/feature map analysis of this Hamiltonian system suggests a mean-field limit for trained weights and biases as the number of data points goes to infinity. The purpose of this paper is to investigate this mean-field limit and illustrate its existence through numerical experiments and analysis (for simple kernels).

Koopman operator theory has been successfully applied to problems from various research areas such as fluid dynamics, molecular dynamics, climate science, engineering, and biology. Applications include detecting metastable or coherent sets, coarse-graining, system identification, and control. There is an intricate connection between dynamical systems driven by stochastic differential equations and quantum mechanics. In this paper, we compare the ground-state transformation and Nelson's stochastic mechanics and demonstrate how data-driven methods developed for the approximation of the Koopman operator can be used to analyze quantum physics problems. Moreover, we exploit the relationship between Schrödinger operators and stochastic control problems to show that modern data-driven methods for stochastic control can be used to solve the stationary or imaginary-time Schrödinger equation. Our findings open up a new avenue toward solving Schrödinger's equation using recently developed tools from data science.

We propose a machine learning (ML) non-Markovian closure modelling framework for accurate predictions of statistical responses of turbulent dynamical systems subjected to external forcings. One of the difficulties in this statistical closure problem is the lack of training data, which is a configuration that is not desirable in supervised learning with neural network models. In this study with the 40-dimensional Lorenz-96 model, the shortage of data is due to the stationarity of the statistics beyond the decorrelation time. Thus, the only informative content in the training data is from the short-time transient statistics. We adopt a unified closure framework on various truncation regimes, including and excluding the detailed dynamical equations for the variances. The closure framework employs a Long-Short-Term-Memory architecture to represent the higher-order unresolved statistical feedbacks with a choice of ansatz that accounts for the intrinsic instability yet produces stable long-time predictions. We found that this unified agnostic ML approach performs well under various truncation scenarios. Numerically, it is shown that the ML closure model can accurately predict the long-time statistical responses subjected to various time-dependent external forces that have larger maximum forcing amplitudes and are not in the training dataset.
This article is part of the theme issue ‘Data-driven prediction in dynamical systems’.

This technical note presents an application of kernel mode decomposition (KMD) for detecting critical transitions in some fast-slow random dynamical systems. The approach rests upon using KMD to reconstruct an observable with a novel data-based time-frequency-phase kernel that makes it possible to approximate signals with critical transitions. In particular, we apply the developed method to approximating the solution and detecting critical transitions in some prototypical slow-fast SDEs with critical transitions. We also apply it to detecting seizures in a multi-scale mesoscale model of brain activity.
Keywords: Kernel Mode Decomposition (KMD), data-based kernels, micro-local kernel design, critical transitions, slow-fast stochastic differential equations, learning signal from data, learning noise from data.

To what extent can we forecast a time series without fitting to historical data? Can universal patterns of probability help in this task? Deep relations between pattern Kolmogorov complexity and pattern probability have recently been used to make a priori probability predictions in a variety of systems in physics, biology and engineering. Here we study simplicity bias (SB) — an exponential upper bound decay in pattern probability with increasing complexity — in discretised time series extracted from the World Bank Open Data collection. We predict upper bounds on the probability of discretised series patterns, without fitting to trends in the data. Thus we perform a kind of 'forecasting without training data'. Additionally we make predictions about which of two discretised series is more likely with accuracy of ∼80%, much higher than a 50% baseline rate, just by using the complexity of each series. These results point to a promising perspective on practical time series forecasting and integration with machine learning methods.

We present an approach for guaranteed constraint satisfaction by means of data-based optimal control, where the model is unknown and has to be obtained from measurement data. To this end, we utilize the Koopman framework and an eDMD-based bilinear surrogate modeling approach for control systems to show an error bound on predicted observables, i.e., functions of the state. This result is then applied to the constraints of the optimal control problem to show that satisfaction of tightened constraints in the purely data-based surrogate model implies constraint satisfaction for the original system.

Koopman operator theory has been successfully applied to problems from various research areas such as fluid dynamics, molecular dynamics, climate science, engineering, and biology. Applications include detecting metastable or coherent sets, coarse-graining, system identification, and control. There is an intricate connection between dynamical systems driven by stochastic differential equations and quantum mechanics. In this paper, we compare the ground-state transformation and Nelson's stochastic mechanics and demonstrate how data-driven methods developed for the approximation of the Koopman operator can be used to analyze quantum physics problems. Moreover, we exploit the relationship between Schrödinger operators and stochastic control problems to show that modern data-driven methods for stochastic control can be used to solve the stationary or imaginary-time Schrödinger equation. Our findings open up a new avenue towards solving Schrödinger's equation using recently developed tools from data science.

In previous work, we showed that learning dynamical systems [21] with kernel methods can achieve state-of-the-art performance, both in terms of accuracy and complexity, for predicting climate/weather time series [20], when the kernel is also learned from data. While the kernels considered in previous work were parametric, in this follow-up paper we test a non-parametric approach and tune warping kernels (with kernel flows, a variant of cross-validation) for learning prototypical dynamical systems.

A simple and interpretable way to learn a dynamical system from data is to interpolate its vector-field with a kernel. In particular, this strategy is highly efficient (both in terms of accuracy and complexity) when the kernel is data-adapted using Kernel Flows (KF) [34] (which uses gradient-based optimization to learn a kernel based on the premise that a kernel is good if there is no significant loss in accuracy if half of the data is used for interpolation). Despite its previous successes, this strategy (based on interpolating the vector field driving the dynamical system) breaks down when the observed time series is not regularly sampled in time. In this work, we propose to address this problem by directly approximating the vector field of the dynamical system by incorporating time differences between observations in the (KF) data-adapted kernels. We compare our approach with the classical one over different benchmark dynamical systems and show that it significantly improves the forecasting accuracy while remaining simple, fast, and robust.
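
A minimal sketch of the idea under assumed toy choices: the increment x_{t+dt} - x_t is regressed on the time-augmented input (x_t, dt) with a Gaussian kernel, so irregular sampling intervals enter the kernel directly.

```python
import numpy as np

rng = np.random.default_rng(9)

# Toy system, kernel, and scales are ours; the point is only that dt is part
# of the kernel's input.
f = lambda x: -x + np.sin(3 * x)                  # hidden 1D vector field

ts = np.sort(rng.uniform(0, 10, 120))             # irregular observation times
xs = np.empty_like(ts); xs[0] = 1.0
for i in range(len(ts) - 1):                      # reference flow via fine Euler
    x, h = xs[i], ts[i + 1] - ts[i]
    for _ in range(50):
        x += (h / 50) * f(x)
    xs[i + 1] = x

Z = np.column_stack([xs[:-1], np.diff(ts)])       # inputs (x_t, dt)
dY = np.diff(xs)                                  # targets x_{t+dt} - x_t

def k(a, b, s=(0.5, 0.2)):                        # Gaussian kernel, per-dim scales
    d = (a[:, None, :] - b[None, :, :]) / np.asarray(s)
    return np.exp(-0.5 * (d ** 2).sum(-1))

alpha = np.linalg.solve(k(Z, Z) + 1e-8 * np.eye(len(Z)), dY)

z_new = np.array([[0.8, 0.05]])                   # short step from x = 0.8
pred = (k(z_new, Z) @ alpha)[0]
print(f"predicted increment {pred:.4f}, first-order truth ~ {0.05 * f(0.8):.4f}")
```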

In this paper, we consider the density estimation problem associated with the stationary measure of ergodic Itô diffusions from a discrete time series that approximates the solutions of the stochastic differential equations. To take advantage of the characterization of the density function as the stationary solution of a parabolic-type Fokker-Planck PDE, we proceed as follows. First, we employ deep neural networks to approximate the drift and diffusion terms of the SDE by solving appropriate supervised learning tasks. Subsequently, we solve a steady-state Fokker-Planck equation associated with the estimated drift and diffusion coefficients with a neural-network-based least-squares method. We establish the convergence of the proposed scheme under appropriate mathematical assumptions, accounting for the generalization errors induced by regressing the drift and diffusion coefficients and by the PDE solvers. This theoretical study relies on a recent perturbation-theory result for Markov chains that shows a linear dependence of the density estimation error on the error in estimating the drift term, and on generalization error results for nonparametric regression and for PDE regression solutions obtained with neural-network models. The effectiveness of this method is reflected by numerical simulations of a two-dimensional Student's t-distribution and a 20-dimensional Langevin dynamics.

We propose a Machine Learning (ML) non-Markovian closure modeling framework for accurate predictions of statistical responses of turbulent dynamical systems subjected to external forcings. One of the difficulties in this statistical closure problem is the lack of training data, which is a configuration that is not desirable in supervised learning with neural network models. In this study with the 40-dimensional Lorenz-96 model, the shortage of data (in time) is due to the stationarity of the statistics beyond the decorrelation time; thus, the only informative content in the training data is in the short-time transient statistics. We adopted a unified closure framework on various truncation regimes, including and excluding the detailed dynamical equations for the variances. The closure frameworks employ a Long-Short-Term-Memory architecture to represent the higher-order unresolved statistical feedbacks, with careful consideration given to accounting for the intrinsic instability while producing stable long-time predictions. We found that this unified agnostic ML approach performs well under various truncation scenarios. Numerically, the ML closure model can accurately predict the long-time statistical responses subjected to various time-dependent external forces that are not in the training dataset and whose maximum forcing amplitudes are relatively larger than those in it.

The Koopman operator has become an essential tool for data-driven approximation of dynamical (control) systems in recent years, e.g., via extended dynamic mode decomposition. Despite its popularity, convergence results and, in particular, error bounds are still quite scarce. In this paper, we derive probabilistic bounds for the approximation error and the prediction error depending on the number of training data points; for both ordinary and stochastic differential equations. Moreover, we extend our analysis to nonlinear control-affine systems using either ergodic trajectories or i.i.d. samples. Here, we exploit the linearity of the Koopman generator to obtain a bilinear system and, thus, circumvent the curse of dimensionality since we do not autonomize the system by augmenting the state by the control inputs. To the best of our knowledge, this is the first finite-data error analysis in the stochastic and/or control setting. Finally, we demonstrate the effectiveness of the proposed approach by comparing it with state-of-the-art techniques showing its superiority whenever state and control are coupled.

- Lyudmila Grigoryeva
- Allen Hart
- Juan-Pablo Ortega

This paper shows that the celebrated Embedding Theorem of Takens is a particular case of a much more general statement according to which randomly generated linear state-space representations of generic observations of an invertible dynamical system carry in their wake an embedding of the phase space dynamics into the chosen Euclidean state space. This embedding coincides with a natural generalized synchronization that arises in this setup and that yields a topological conjugacy between the state-space dynamics driven by the generic observations of the dynamical system and the dynamical system itself. This result provides additional tools for the representation, learning, and analysis of chaotic attractors and sheds additional light on the reservoir computing phenomenon that appears in the context of recurrent neural networks.

This paper studies the theoretical underpinnings of machine learning of ergodic Itô diffusions. The objective is to understand the convergence properties of the invariant statistics when the underlying system of stochastic differential equations (SDEs) is empirically estimated with a supervised regression framework. Using the perturbation theory of ergodic Markov chains and the linear response theory, we deduce a linear dependence of the errors of one-point and two-point invariant statistics on the error in the learning of the drift and diffusion coefficients. More importantly, our study shows that the usual $L^2$-norm characterization of the learning generalization error is insufficient for achieving this linear dependence result. We find that sufficient conditions for such a linear dependence result are through learning algorithms that produce a uniformly Lipschitz and consistent estimator in the hypothesis space that retains certain characteristics of the drift coefficients, such as the usual linear growth condition that guarantees the existence of solutions of the underlying SDEs. We examine these conditions on two well-understood learning algorithms: the kernel-based spectral regression method and the shallow random neural networks with the ReLU activation function.

Koopman operator theory, a powerful framework for discovering the underlying dynamics of nonlinear dynamical systems, was recently shown to be intimately connected with neural network training. In this work, we take the first steps in making use of this connection. As Koopman operator theory is a linear theory, a successful implementation of it in evolving network weights and biases offers the promise of accelerated training, especially in the context of deep networks, where optimization is inherently a non-convex problem. We show that Koopman operator theoretic methods allow for accurate predictions of weights and biases of feedforward, fully connected deep networks over a non-trivial range of training time. During this window, we find that our approach is >10x faster than various gradient descent based methods (e.g. Adam, Adadelta, Adagrad), in line with our complexity analysis. We end by highlighting open questions in this exciting intersection between dynamical systems and neural network theory. We highlight additional methods by which our results could be expanded to broader classes of networks and larger training intervals, which shall be the focus of future work.

Many problems in science and engineering require the efficient numerical approximation of integrals, a particularly important application being the numerical solution of initial value problems for differential equations. For complex systems, an equidistant discretization is often inadvisable, as it either results in prohibitively large errors or computational effort. To this end, adaptive schemes have been developed that rely on error estimators based on Taylor series expansions. While these estimators a) rely on strong smoothness assumptions and b) may still result in erroneous steps for complex systems (and thus require step rejection mechanisms), we here propose a data-driven time stepping scheme based on machine learning, and more specifically on reinforcement learning (RL) and meta-learning. First, one or several (in the case of non-smooth or hybrid systems) base learners are trained using RL. Then, a meta-learner is trained which (depending on the system state) selects the base learner that appears to be optimal for the current situation. Several examples including both smooth and non-smooth problems demonstrate the superior performance of our approach over state-of-the-art numerical schemes. The code is available under https://github.com/lueckem/quadrature-ML.


As in almost every other branch of science, the major advances in data science and machine learning have also resulted in significant improvements regarding the modeling and simulation of nonlinear dynamical systems. It is nowadays possible to make accurate medium- to long-term predictions of highly complex systems such as the weather, the dynamics within a nuclear fusion reactor, disease models, or the stock market in a very efficient manner. In many cases, predictive methods are advertised to ultimately be useful for control, as the control of high-dimensional nonlinear systems is an engineering grand challenge with huge potential in areas such as clean and efficient energy production or the development of advanced medical devices. However, the question of how to use a predictive model for control is often left unanswered due to the associated challenges, namely a significantly higher system complexity, the requirement of much larger data sets, and an increased and often problem-specific modeling effort. To solve these issues, we present a universal framework (which we call QuaSiModO: Quantization-Simulation-Modeling-Optimization) to transform arbitrary predictive models into control systems and use them for feedback control. The advantages of our approach are a linear increase in data requirements with respect to the control dimension, performance guarantees that rely exclusively on the accuracy of the predictive model, and only little prior knowledge requirements in control theory to solve complex control problems. In particular, the latter point is of key importance for enabling a large number of researchers and practitioners to exploit the ever-increasing capabilities of predictive models for control in a straightforward and systematic fashion.

Regressing the vector field of a dynamical system from a finite number of observed states is a natural way to learn surrogate models for such systems. We present variants of cross-validation (Kernel Flows [31] and its variants based on Maximum Mean Discrepancy and Lyapunov exponents) as simple approaches for learning the kernel used in these emulators.

Regressing the vector field of a dynamical system from a finite number of observed states is a natural way to learn surrogate models for such systems. We present variants of cross-validation (Kernel Flows (Owhadi and Yoo, 2019) and its variants based on Maximum Mean Discrepancy and Lyapunov exponents) as simple approaches for learning the kernel used in these emulators.

A nonparametric method to predict non-Markovian time series of partially observed dynamics is developed. The prediction problem we consider is a supervised learning task of finding a regression function that takes a delay-embedded observable to the observable at a future time. When delay-embedding theory is applicable, the proposed regression function is a consistent estimator of the flow map induced by the delay-embedding. Furthermore, the corresponding Mori-Zwanzig equation governing the evolution of the observable simplifies to only a Markovian term, represented by the regression function. We realize this supervised learning task with a class of kernel-based linear estimators, the kernel analog forecast (KAF), which are consistent in the limit of large data. In a scenario with a high-dimensional covariate space, we employ a Markovian kernel smoothing method which is computationally cheaper than the Nyström projection method for realizing KAF. In addition to the guaranteed theoretical convergence, we numerically demonstrate the effectiveness of this approach on higher-dimensional problems where the relevant kernel features are difficult to capture with the Nyström method. Given noisy training data, we propose a nonparametric smoother as a de-noising method. Numerically, we show that the proposed smoother is more accurate than EnKF and 4Dvar in de-noising signals corrupted by independent (but not necessarily identically distributed) noise, even if the smoother is constructed using a data set corrupted by white noise. We show skillful prediction using the KAF constructed from the denoised data.

Many dimensionality and model reduction techniques rely on estimating dominant eigenfunctions of associated dynamical operators from data. Important examples include the Koopman operator and its generator, but also the Schrödinger operator. We propose a kernel-based method for the approximation of differential operators in reproducing kernel Hilbert spaces and show how eigenfunctions can be estimated by solving auxiliary matrix eigenvalue problems. The resulting algorithms are applied to molecular dynamics and quantum chemistry examples. Furthermore, we exploit that, under certain conditions, the Schrödinger operator can be transformed into a Kolmogorov backward operator corresponding to a drift-diffusion process and vice versa. This allows us to apply methods developed for the analysis of high-dimensional stochastic differential equations to quantum mechanical systems.

We study the problem of estimating linear response statistics under external perturbations using time series of unperturbed dynamics. Based on the fluctuation-dissipation theory, this problem is reformulated as an unsupervised learning task of estimating a density function. We consider a nonparametric density estimator formulated by the kernel embedding of distributions with "Mercer-type" kernels, constructed based on the classical orthogonal polynomials defined on non-compact domains. While the resulting representation is analogous to Polynomial Chaos Expansion (PCE), the connection to the reproducing kernel Hilbert space (RKHS) theory allows one to establish the uniform convergence of the estimator and to systematically address a practical question of identifying the PCE basis for a consistent estimation. We also provide practical conditions for the well-posedness of not only the estimator but also of the underlying response statistics. Finally, we provide a statistical error bound for the density estimation that accounts for the Monte-Carlo averaging over non-i.i.d. time series and the biases due to a finite basis truncation. This error bound provides a means to understand the feasibility as well as the limitations of the kernel embedding with Mercer-type kernels. Numerically, we verify the effectiveness of the estimator on two stochastic dynamics with known yet non-trivial equilibrium densities.

We present a novel algorithm that allows us to gain detailed insight into the effects of sparsity in linear and nonlinear optimization, which is of great importance in many scientific areas such as image and signal processing, medical imaging, compressed sensing, and machine learning (e.g., for the training of neural networks). Sparsity is an important feature to ensure robustness against noisy data, but also to find models that are interpretable and easy to analyze due to the small number of relevant terms. It is common practice to enforce sparsity by adding the $\ell_1$-norm as a weighted penalty term. In order to gain a better understanding and to allow for an informed model selection, we directly solve the corresponding multiobjective optimization problem (MOP) that arises when we minimize the main objective and the $\ell_1$-norm simultaneously. As this MOP is in general non-convex for nonlinear objectives, the weighting method will fail to provide all optimal compromises. To avoid this issue, we present a continuation method which is specifically tailored to MOPs with two objective functions one of which is the $\ell_1$-norm. Our method can be seen as a generalization of well-known homotopy methods for linear regression problems to the nonlinear case. Several numerical examples - including neural network training - demonstrate our theoretical findings and the additional insight that can be gained by this multiobjective approach.

We study the problem of predicting rare critical transition events for a class of slow-fast nonlinear dynamical systems. The state of the system of interest is described by a slow process, whereas a faster process drives its evolution and induces critical transitions. By taking advantage of recent advances in reservoir computing, we present a data-driven method to predict the future evolution of the state. We show that our method is capable of predicting a critical transition event at least several numerical time steps in advance. We demonstrate the success as well as the limitations of our method using numerical experiments on three examples of systems, ranging from low dimensional to high dimensional. We discuss the mathematical and broader implications of our results.

For dynamical systems with a non-hyperbolic equilibrium, it is possible to significantly simplify the study of stability by means of center manifold theory. This theory makes it possible to isolate the complicated asymptotic behavior of the system close to the equilibrium point and to obtain meaningful predictions of its behavior by analyzing a reduced-order system on the so-called center manifold. Since the center manifold is usually not known, good approximation methods are important, as the center manifold theorem states that the stability properties of the origin of the reduced-order system are the same as those of the origin of the full-order system. In this work, we establish a data-based version of the center manifold theorem that works by considering an approximation in place of the exact manifold. The error between the approximate and the original reduced dynamics is also quantified. We then use an apposite data-based kernel method to construct a suitable approximation of the manifold close to the equilibrium that is compatible with our general error theory. The data are collected by repeated numerical simulation of the full system by means of a high-accuracy solver, which generates sets of discrete trajectories that are then used as a training set. The method is tested on different examples that show promising performance and good accuracy.

This article presents a general framework for recovering missing dynamical systems using available data and machine learning techniques. The proposed framework reformulates the prediction problem as a supervised learning problem to approximate a map that takes the memories of the resolved and identifiable unresolved variables to the missing components in the resolved dynamics. We demonstrate the effectiveness of the proposed framework with a strong convergence error bound of the resolved variables up to finite time and numerical tests on prototypical models in various scientific domains. These include the 57-mode barotropic stress models with multiscale interactions that mimic the blocked and unblocked patterns observed in the atmosphere, the nonlinear Schrödinger equation, which has found many applications in physics such as optics and Bose-Einstein condensates, and the Kuramoto-Sivashinsky equation, whose spatiotemporally chaotic pattern formation models the trapped-ion mode in plasma and phase dynamics in reaction-diffusion systems. While many machine learning techniques can be used to validate the proposed framework, we found that recurrent neural networks outperform kernel regression methods in terms of recovering the trajectory of the resolved components and the equilibrium one-point and two-point statistics. This superb performance suggests that a recurrent neural network is an effective tool for recovering missing dynamics that involve the approximation of high-dimensional functions.

This paper investigates the formulation and implementation of Bayesian inverse problems to learn input parameters of partial differential equations (PDEs) defined on manifolds. Specifically, we study the inverse problem of determining the diffusion coefficient of a second-order elliptic PDE on a closed manifold from noisy measurements of the solution. Inspired by manifold learning techniques, we approximate the elliptic differential operator with a kernel-based integral operator that can be discretized via Monte Carlo without reference to the Riemannian metric. The resulting computational method is mesh-free and easy to implement, and can be applied without full knowledge of the underlying manifold, provided that a point cloud of manifold samples is available. We adopt a Bayesian perspective to the inverse problem, and establish an upper bound on the total variation distance between the true posterior and an approximate posterior defined with the kernel forward map. Supporting numerical results show the effectiveness of the proposed methodology.

Echo state networks (ESNs) have recently been proved to be universal approximants for input/output systems with respect to various $L^p$-type criteria. When $1\leq p< \infty$, only $p$-integrability hypotheses need to be imposed, while in the case $p=\infty$ a uniform boundedness hypothesis on the inputs is required. This note shows that, in the latter case, a universal family of ESNs can be constructed that contains exclusively elements that have the echo state and the fading memory properties. This conclusion could not be drawn with the results and methods available so far in the literature.

In recent years, the machine learning community has seen a continuously growing interest in research aimed at investigating dynamical aspects of both training procedures and perfected models. Of particular interest among recurrent neural networks is the Reservoir Computing (RC) paradigm for its conceptual simplicity and fast training scheme. Yet, the guiding principles under which RC operates are only partially understood. In this work, we study the properties behind learning dynamical systems with RC and propose a new guiding principle based on Generalized Synchronization (GS) granting its feasibility. We show that the well-known Echo State Property (ESP) implies and is implied by GS, so that theoretical results derived from the ESP still hold when GS does. However, by using GS one can profitably study the RC learning procedure by linking the reservoir dynamics with the readout training. Notably, this allows us to shed light on the interplay between the input encoding performed by the reservoir and the output produced by the readout optimized for the task at hand. In addition, we show that, as opposed to the ESP, satisfaction of the GS can be measured by means of the Mutual False Nearest Neighbors index, which makes the theoretical derivations effective for practitioners.

- Lyudmila Grigoryeva
- Allen Hart
- Juan-Pablo Ortega

This paper shows that a large class of fading memory state-space systems driven by discrete-time observations of dynamical systems defined on compact manifolds always yields continuously differentiable synchronizations. This general result provides a powerful tool for the representation, reconstruction, and forecasting of chaotic attractors. It also improves previous statements in the literature for differentiable generalized synchronizations, whose existence was so far guaranteed for a restricted family of systems and was detected using Hölder exponent-based criteria.

In recent years, the success of the Koopman operator in dynamical systems analysis has also fueled the development of Koopman operator-based control frameworks. In order to preserve the relatively low data requirements for an approximation via dynamic mode decomposition, a quantization approach was recently proposed in [S. Peitz and S. Klus, Automatica J. IFAC, 106 (2019), pp. 184--191]. This way, control of nonlinear dynamical systems can be realized by means of switched systems techniques, using only a finite set of autonomous Koopman operator-based reduced models. These individual systems can be approximated very efficiently from data. The main idea is to transform a control system into a set of autonomous systems for which the optimal switching sequence has to be computed. In this article, we extend these results to continuous control inputs using relaxation. This way, we combine the advantages of the data efficiency of approximating a finite set of autonomous systems with continuous controls, as the data requirements increase only linearly with the input dimension. We show that when using the Koopman generator, this relaxation---realized by linear interpolation between two operators---does not introduce any error for control affine systems. This allows us to control high-dimensional nonlinear systems using bilinear, low-dimensional surrogate models. The efficiency of the proposed approach is demonstrated using several examples with increasing complexity, from the Duffing oscillator to the chaotic fluidic pinball.

This short review describes mathematical techniques for statistical analysis and prediction in dynamical systems. Two problems are discussed, namely (i) the supervised learning problem of forecasting the time evolution of an observable under potentially incomplete observations at forecast initialization; and (ii) the unsupervised learning problem of identification of observables of the system with a coherent dynamical evolution. We discuss how ideas from operator-theoretic ergodic theory combined with statistical learning theory provide an effective route to address these problems, leading to methods well-adapted to handle nonlinear dynamics, with convergence guarantees as the amount of training data increases.

- Christa Cuchiero
- Lukas Gonon
- Lyudmila Grigoryeva
- [...]
- Josef Teichmann

A new explanation of geometric nature of the reservoir computing phenomenon is presented. Reservoir computing is understood in the literature as the possibility of approximating input/output systems with randomly chosen recurrent neural systems and a trained linear readout layer. Light is shed on this phenomenon by constructing what is called strongly universal reservoir systems as random projections of a family of state-space systems that generate Volterra series expansions. This procedure yields a state-affine reservoir system with randomly generated coefficients in a dimension that is logarithmically reduced with respect to the original system. This reservoir system is able to approximate any element in the fading memory filters class just by training a different linear readout for each different filter. Explicit expressions for the probability distributions needed in the generation of the projected reservoir system are stated and bounds for the committed approximation error are provided.