On Neural Learnability of Chaotic Dynamics
Ziwei Li (ziweili@mit.edu) and Sai Ravela
Earth Signals and Systems Group, Department of Earth, Atmospheric,
and Planetary Sciences, Massachusetts Institute of Technology, Cambridge, MA 91106, USA
(Dated: December 10, 2019)
In modeling nonlinear dynamics, neural networks are of interest for prediction and uncertainty
quantification. The “learnability” of chaotic dynamics by neural networks, however, remains poorly
understood. In this work, we show that a parsimonious network trained on few data points suffices
for accurate prediction of local divergence rates on the whole attractor. To understand neural
learnability, we decompose the mappings in the neural network into a series of geometric stretching
and compressing operations that indicate topological mixing and, therefore, chaos. This reveals
that neural networks and chaotic dynamical systems are structurally similar, which yields excellent
reproduction of local divergence rates. To build parsimonious networks, we employ an approach
that matches the spectral features of the dynamics of deep learning to those of polynomial regression.
Introduction – Chaotic systems are ubiquitous [24]. For these systems, there usually exists a set of continuous nonlinear governing equations, but finding exact solutions is often impossible, in part because of a common characteristic of chaos: two nearby trajectories diverge exponentially. In practice, modelers discretize the nonlinear equations to solve them, often from multiple initial conditions to quantify errors. Doing so, however, raises difficult challenges in the form of nonlinearity, high dimensionality, and non-Gaussian uncertainty [21]. As a result, the search for simple-yet-effective models of chaotic dynamics remains a crucial pursuit in the physical sciences.
Recently, there has been a surge of interest in using neural networks (NNs) to emulate chaotic dynamics [2, 3, 7, 9, 19, 29, 30, 32], showing neural networks to be promising models. We follow this line of investigation, first showing that neural models with only a few neurons, trained on a small number of data points, reconstruct the entire attractor object of the classic Lorenz-63 (L63) system [18]. NNs can “extrapolate” from partial knowledge of the attractor, rendering a uniform distribution of the training data unnecessary. The neural models also appear to be as chaotic as the L63 system they are trained on.
This success is much like other efforts seeking to emulate chaotic dynamics. By way of explanation, one typically resorts to the universal approximation theorem (UAP) [10, 16, 17, 23]. However, the UAP is an existence statement: it explains neither the emergence of chaos nor the efficacy with which the attractor is reconstructed. Inspired by the validity of a geometric interpretation of the L63 system [26], we show that a geometric perspective helps to explain the neural efficacy. Neural mappings alternately rotate, stretch, and compress, which are the defining characteristics of chaotic dynamics [6]. Whilst the question of optimizing neural computation for prediction remains open, our work suggests that possession of
the geometrical properties required by chaos theory enables neural networks to efficiently match the structure of the L63 attractor object and its predictability. To the best of our knowledge, this explanation of the neural learnability of chaos is new [20]. A key step in this process is to show that NNs are compact and do not over-fit in relation to the attractor object. We achieve this by noting that L63 is a polynomial, allowing us to impose bounds on the size of the neural network [22, 25].
Methods – The L63 model was originally used to describe 2-D Rayleigh–Bénard convection, in which the parameters of the streamfunction and temperature fields are written as a set of ordinary differential equations [18]:

\dot{X} = \sigma (Y - X),
\dot{Y} = \rho X - Y - XZ,
\dot{Z} = -\beta Z + XY,    (1)
where X and Y are the strengths of the streamfunction and temperature modes, and Z represents the deviation of the vertical temperature profile from linearity. Consistent with Lorenz's original paper, we set σ = 10, β = 8/3, and ρ = 28. The solutions of L63 are known to be dissipative (volume in phase space contracts rapidly) and chaotic (sensitive to initial perturbations).
We define L63 as a discrete mapping from the current state of the system x_n = (X, Y, Z)^T to the state at the next timestep, x_{n+1}:

\Phi_{L63}(x_n) = x_{n+1}.    (2)
We choose this discrete form both for L63 and NN maps
because it provides a straightforward connection between
the geometric L63 mapping and dynamics in the neu-
ral network. Since the exact form of Eq. (2) for L63 is
unknown, the discrete map is obtained by numerically
integrating Eq. (1) and sampling at increment dt.
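As an illustrative sketch (not the authors' Matlab/ode45 implementation), the discrete map of Eq. (2) can be realized by wrapping a high-order Runge–Kutta integrator; the function and parameter names below are our own.

# Sketch: one step of the discrete L63 map Phi_L63 of Eq. (2), obtained by
# integrating Eq. (1) over an increment dt (scipy's RK45 stands in for ode45).
import numpy as np
from scipy.integrate import solve_ivp

SIGMA, BETA, RHO = 10.0, 8.0 / 3.0, 28.0  # parameter values used in the text

def l63_rhs(t, s):
    """Right-hand side of Eq. (1)."""
    x, y, z = s
    return [SIGMA * (y - x), RHO * x - y - x * z, -BETA * z + x * y]

def phi_l63(xn, dt=0.01):
    """Advance the state x_n by one timestep dt."""
    sol = solve_ivp(l63_rhs, (0.0, dt), xn, method="RK45", rtol=1e-9, atol=1e-9)
    return sol.y[:, -1]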
The discrete map implemented by a single-hidden-layer feedforward NN is

\Phi_{NN}(x_n) = W_2 \, g(W_1 x_n + b_1) + b_2,    (3)

in which the 3×1 input vector x_n is first left-multiplied by an L×3 weight matrix W_1 and added to an L×1 bias term b_1, where L is the number of neurons. The resulting vector is then element-wise “compressed” by a sigmoid function g, which takes the form of tanh in our setup. Left-multiplication by a 3×L matrix W_2, followed by addition of another bias term b_2, finishes a mapping iteration of the NN.
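For concreteness, a minimal sketch of one NN iteration (Eq. (3)) follows; the random weights are placeholders standing in for trained parameters such as those in Table I.

# Sketch of the single-hidden-layer map of Eq. (3) with tanh activation.
import numpy as np

def phi_nn(xn, W1, b1, W2, b2):
    """One NN iteration: x_{n+1} = W2 tanh(W1 x_n + b1) + b2."""
    return W2 @ np.tanh(W1 @ xn + b1) + b2

# Placeholder weights for an L = 4 neuron network (trained values would be used in practice).
rng = np.random.default_rng(0)
L = 4
W1, b1 = rng.normal(size=(L, 3)), rng.normal(size=L)
W2, b2 = rng.normal(size=(3, L)), rng.normal(size=3)
x_next = phi_nn(np.array([1.0, 1.0, 20.0]), W1, b1, W2, b2)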
We use the Matlab function ode45 to numerically solve for the discrete mappings of L63 as training data. To obtain data on the attractor, we randomly initialize 1000 trajectories in the region [-20, 20] × [-20, 20] × [0, 50] with a uniform distribution. Each trajectory is integrated for 2500 timesteps with dt = 0.01. We discard the first 2000 timesteps to remove the transient parts, which are much shorter than 2000 steps. The remaining 500 timesteps of the 1000 trajectories are then aggregated as pairs (x, x') satisfying x' = \Phi_{L63}(x), forming the training data pool.
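The pool construction described above can be sketched as follows (a rough Python analogue, not the authors' code; it reuses the hypothetical phi_l63 step defined earlier).

# Sketch: build the (x, x') training pool from 1000 trajectories, discarding
# the first 2000 of 2500 steps as transients (dt = 0.01, as in the text).
import numpy as np

def build_training_pool(n_traj=1000, n_steps=2500, n_transient=2000, dt=0.01):
    rng = np.random.default_rng(1)
    lo, hi = np.array([-20.0, -20.0, 0.0]), np.array([20.0, 20.0, 50.0])
    pairs = []
    for _ in range(n_traj):
        x = rng.uniform(lo, hi)              # uniform random initial condition
        traj = [x]
        for _ in range(n_steps):
            x = phi_l63(x, dt)
            traj.append(x)
        traj = np.array(traj[n_transient:])  # keep only the post-transient steps
        pairs.append(np.stack([traj[:-1], traj[1:]], axis=1))  # (x, x') pairs
    return np.concatenate(pairs, axis=0)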
The prior locations of the training data pairs (x) can be seen as a representation of the L63 attractor (A_{L63}), and each consecutive location pair provides information about the discrete L63 flow. We then randomly sample a specific number of training pairs from the data pool to train NNs. Each NN is trained for 10^3 epochs with Bayesian regularization [8], where an epoch means a full sweep through the sampled training data.
We use the finite-time Lyapunov exponent (FTLE) [13]
to compare the local divergence rates of NN and L63 and
quantify their similarity on the basis of predictability.
The FTLE is computed from forward-propagating two nearby trajectories that originate from the vicinity of the A_{L63} attractor. Formally, it is defined as

\lambda_{\max} := \frac{1}{N_t} \ln \max_{\delta x_0} \frac{|\delta x_{N_t}|}{|\delta x_0|} = \frac{1}{N_t} \ln \sigma_{\max},    (4)
where λ_max denotes the maximum FTLE, δx_0 is the initial perturbation between two trajectories, and δx_{N_t} denotes their difference after N_t steps. The FTLE relates to the original Lyapunov exponent when N_t → ∞ and δx_0 → 0 [4]. To obtain λ_max, we select the direction of δx_0 such that |δx_{N_t}| is maximized; in practice, we calculate λ_max from σ_max, the largest singular value of the Jacobian J (i.e., the square root of the largest eigenvalue of J^T J), where J is evaluated using perturbations around x_0.
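A minimal sketch of this FTLE computation, assuming a generic one-step map (either the L63 step or a trained NN map) and a finite-difference Jacobian:

# Sketch of Eq. (4): propagate a finite-difference Jacobian over N_t steps of a
# one-step map `step` and take its largest singular value.
import numpy as np

def ftle(step, x0, n_t=50, eps=1e-6):
    dim = len(x0)
    base = np.array(x0, dtype=float)
    pert = base + eps * np.eye(dim)          # one perturbed point per direction
    for _ in range(n_t):
        base = step(base)
        pert = np.array([step(p) for p in pert])
    J = (pert - base).T / eps                # finite-difference Jacobian after N_t steps
    sigma_max = np.linalg.svd(J, compute_uv=False)[0]
    return np.log(sigma_max) / n_t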
Results – We first show that an NN can learn the chaotic dynamics of L63 efficiently with a small number of data points and neurons. The quadratic prediction error is reported in Ref. [31] and will not be the main focus of this paper. We instead compare the short-term and long-term behaviors of the two systems. Specifically, we show that the dynamics represented by the NN possess predictability similar to that of L63, as quantified by the FTLE, and that NNs are able to extrapolate to regions not covered by the training data.
We analyze a 4-neuron network trained on 40 data points randomly sampled from the training data pool (Table I shows the learnt parameters of this network).

FIG. 1. Two trajectories produced by L63 (blue) and the 4-neuron NN trained on 40 data points sampled from the whole attractor (red). Both are 2000 timesteps long and start from the same location on the Lorenz attractor.

Fig. 1 depicts two trajectories that follow the L63 flow and the flow of the trained NN, respectively. They interlace with each other, creating the well-known Lorenz attractor. Trajectories starting from other locations on the attractor roughly trace out the same structure (not shown). The close resemblance between the two attractors indicates that the dynamics of this parsimonious NN is similar to that of L63, i.e., NNs are able to learn chaotic dynamics efficiently.
FIG. 2. One-to-one scatter plot of FTLE with NN and L63. The NN used in this plot has 4 neurons and is trained on only 40 data points. The panels correspond to four integration steps, N_t, with time increment dt = 0.01.
To calculate the FTLE, we generate points following the NN flow using the same generation process as in Methods. The generated points in the phase space represent the NN attractor (A_NN). We then randomly initialize 2000 trajectories on A_L63. Every trajectory from A_L63 is paired with another trajectory that starts from the closest point on A_NN; in each pair, the former follows the L63 flow while the latter follows the NN flow.
FIG. 3. The root-mean-square error in FTLE of neural networks for each neuron and number-of-data configuration. The FTLE is calculated with N_t = 50 and averaged over 2000 trajectories randomly initialized on the attractor. The red dot represents the example configuration in Figs. 1 and 2. The red surface is located at z = 0.05.
The FTLEs of the trajectory pairs are compared for different numbers of timesteps: N_t = 5, 50, 100, 500 (Fig. 2). When N_t = 5 or 50, the NN accurately reproduces local divergence rates over the whole attractor, indicating that the short-term predictability of the two systems agrees. As N_t increases, the correspondence diverges (N_t = 100), and then converges again (N_t = 500) to the classical largest Lyapunov exponent of L63, which is roughly 0.91 [28]. The convergence of the FTLE at large timesteps implies that the long-term behavior of the two systems is also similar.
TABLE I. Parameters of the 4-neuron NN flow.

W_1 (4 × 3):
   0.0034   0.0030   0.0050
   0.0115   0.0072   0.0015
   0.0067   0.0009   0.0064
   0.0075   0.0005   0.0001

b_1^T (1 × 4):
   0.1131   0.6111   0.0266   0.1395

W_2 (3 × 4):
     6.0807    5.2861    7.9178  107.1371
   370.0114   26.1875  270.5582  366.2765
   169.8626   95.5298   40.6654   62.2099

b_2^T (1 × 3):
   11.5557   71.0935   40.1989

diag(S):
   4.3852   1.2087   0.7184   0.0000
The agreement in FTLE generally improves with increasing numbers of neurons and data points (Fig. 3). This trend is expected if we invoke the bias–variance trade-off [11]: increased complexity in learning models such as neural networks generally translates into better
such as neural networks generally translates into better
prediction accuracy (lower bias), provided that regular-
ization techniques prevent the learning algorithm from
entering the high-variance regime.
NNs can extrapolate from an incomplete training dataset.

FIG. 4. Similar to Fig. 1, but the red trajectory is produced by a 5-neuron NN trained on 100 data points sampled from the X > -5 part of the attractor. The region to the right of the grey partition is the training data range, and the region to the left is unknown to the NN.

Fig. 4 shows a comparison of two trajectories predicted by the NN and L63 that originate from the same location (red dot). The NN in this case has 5 neurons and is trained on 100 data points sampled from the X > -5 part of A_L63, which amounts to knowing 73% of the attractor structure. The two trajectories are close in the first 100 timesteps, and then bifurcate onto the two branches of the attractor. Despite starting from an unknown region, the NN still predicts a well-behaved attractor that closely resembles the original attractor in the extrapolated region of X ≤ -5. The one-to-one correspondence of FTLEs between L63 and the NN trained on the incomplete dataset is similar to Fig. 2 (not shown).
Geometric perspective of the NN flow – We showed in the previous section that the neural learnability of the L63 dynamics is very good. However, a theoretical approach to understanding this learnability has been lacking. Although the UAP states that the mapping Φ_L63 can be approximated by an NN arbitrarily well, it does not explain the NN's ability to reconstruct the strange attractor efficiently, nor its skill at extrapolation. Inspired by the exact mathematical correspondence between the geometric Lorenz flow and L63 [12, 26], we give our geometric understanding of the NN flow.
The dynamics of the NN (Eq. 3) can be seen as a mapping in a multi-dimensional Riemannian space (this interpretation is also used in classification problems [14]). In the discrete map of the simple 4-neuron network discussed above, the input vector x in the 3-D phase space is mapped into a 4-D neuron space, and then mapped back to the phase space. We write an N_t-step trajectory (N_t ≥ 2) as L_0^{N_t} = {x_0, x_1, ..., x_{N_t}}. In each mapping from step n → n + 1, n ∈ {0, 1, ..., N_t - 1}, there exists a 4-D intermediate vector y in the neuron space:

y_{n+1} = g(W_1 x_n + b_1), \quad n = 0, 1, \ldots, N_t - 1.    (5)
We refer to y as the neuron vector. The recurrence relation of the neuron vector is then

y_{n+1} = g(W y_n + b), \quad n = 1, 2, \ldots, N_t - 1,    (6)

where W = W_1 W_2 is a 4-by-4 matrix and b = W_1 b_2 + b_1 is a 4-by-1 vector. W can be decomposed as W = U S V^T using the singular-value decomposition, where U and V are both 4-by-4 orthonormal matrices and S is a diagonal matrix of rank 3. We then rewrite Eq. (6) more explicitly as

y_{n+1} = g(U S V^T y_n + b),    (7)
which we call the neuron map. Equation (7) encodes the entire dynamics learnt by the NN, because it is related to Eq. (3) by the change of variables in Eq. (5). Therefore, understanding the neuron map is equivalent to understanding the dynamics of the NN.
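A small sketch of this construction follows (placeholder weight names; in practice one would substitute trained parameters such as those in Table I).

# Sketch of Eqs. (6)-(7): collapse the trained weights into the neuron-map
# parameters W = W1 W2 and b = W1 b2 + b1, then iterate in the neuron space.
import numpy as np

def neuron_map_params(W1, b1, W2, b2):
    return W1 @ W2, W1 @ b2 + b1                   # L x L matrix W and L-vector b

def neuron_map(yn, W, b):
    U, s, Vt = np.linalg.svd(W)                    # W = U S V^T: rotate, stretch, rotate
    return np.tanh(U @ np.diag(s) @ Vt @ yn + b)   # then compress with tanh

# Singular values of W larger than 1 mark the stretched directions:
# n_stretch = np.sum(np.linalg.svd(W, compute_uv=False) > 1.0)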
The neuron map comprises four sub-steps: rotation, stretch, rotation, and compression. Rotation in this paper is taken in the generalized sense of an orthogonal transformation; these rotations are carried out by the matrices V^T and U in the neuron map. Since the sigmoid function only has a compressing effect, its gradient being smaller than or equal to 1, the diagonal matrix S must have at least one diagonal element larger than 1 in order to satisfy the stretching requirement of chaotic dynamics. For the 4-neuron NN in question, S imposes an expanding effect in two dimensions, since two of its diagonal elements are greater than 1 (see Table I).
The effects of compression and expansion exerted by the NN are seen more clearly through the error growth between timesteps. For a small perturbation δy between two initial points near y, its value at the next timestep is

\delta y' = g'(W y + b) \odot (W \delta y),    (8)

where we neglect second- and higher-order terms, and \odot denotes the element-wise product. Letting G_{jj} = g'(\sum_{i=1}^{L} W_{ji} y_i + b_j), we have g'(W y + b) \odot (W \delta y) = G W \delta y, where G = diag{G_{11}, G_{22}, ...}. The squared error is then

|\delta y'|^2 = (W \delta y)^T G^2 (W \delta y).    (9)
From Eq. (9), it is clear that singular values of W larger than 1 expand the perturbation, while G compresses the perturbation, since g'(x) ∈ (0, 1] for all x ∈ R. Given the information in y, G controls the degree of compression in each direction of the neuron space. U and V in the decomposition of W control the orientations of compression and expansion, so that they take place in different directions.
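As a sketch, the linearized growth of Eq. (9) can be evaluated directly from the collapsed neuron-map parameters (W, b from the previous sketch):

# Sketch of Eqs. (8)-(9): squared growth of a perturbation delta_y about y
# under one neuron-map step.
import numpy as np

def perturbation_growth(W, b, y, delta_y):
    g_prime = 1.0 - np.tanh(W @ y + b) ** 2   # tanh'(u) = 1 - tanh(u)^2, in (0, 1]
    G = np.diag(g_prime)
    dy_next = G @ (W @ delta_y)               # Eq. (8) in matrix form
    return float(dy_next @ dy_next)           # |delta_y'|^2 of Eq. (9)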
The stretch and compression sub-steps in the neuron map are frequently regarded as the standard way to create topological mixing, an indicator of chaos. The ability to learn these geometric operations through training makes NNs well suited to approximating discrete chaotic mappings. As another, perhaps more concrete, example, a 2-neuron NN map can be trained to faithfully recreate the Hénon map (not shown), a 2-D chaotic map defined such that trajectories are stretched in one direction and compressed in the other [15].
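For reference, the Hénon map with its standard parameter values (a = 1.4, b = 0.3 [15]) is reproduced below; the 2-neuron NN fit itself is not part of this sketch.

# The Henon map: each iteration stretches in one direction and folds/compresses
# in the other, the same geometric ingredients discussed above.
import numpy as np

def henon(state, a=1.4, b=0.3):
    x, y = state
    return np.array([1.0 - a * x**2 + y, b * x])

# Iterating from a point near the attractor traces out the Henon attractor.
pts = [np.array([0.1, 0.1])]
for _ in range(10000):
    pts.append(henon(pts[-1]))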
Generalization to multi-layer networks is straightforward in the above framework. Since “the dynamics of the neuron vector” would be ambiguous when there are multiple layers of neurons, we apply the same argument to perturbations in the phase space. For a perturbation δx around x, its squared length at the next timestep is

|\delta x'|^2 = |W_{N+1} G_N W_N \cdots G_1 W_1 \, \delta x|^2,    (10)

where G_i = diag{g'(W_i y_{i-1} + b_i)}, y_{i-1} is the neuron vector of the i-th layer for i > 1 (y_0 = x), and N is the number of hidden layers. The weight and gradient matrices thus consecutively parameterize multiple stretching and compressing operations within a single NN map.
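A sketch of Eq. (10) for a multi-layer network, with `weights` and `biases` as placeholder lists of per-layer trained parameters (the last entry being the output layer):

# Sketch of Eq. (10): accumulate the layer-wise Jacobian product
# W_{N+1} G_N W_N ... G_1 W_1 and apply it to a phase-space perturbation.
import numpy as np

def multilayer_growth(weights, biases, x, delta_x):
    y, J = np.array(x, dtype=float), np.eye(len(x))
    for W, b in zip(weights[:-1], biases[:-1]):    # hidden layers i = 1..N
        u = W @ y + b
        J = np.diag(1.0 - np.tanh(u) ** 2) @ W @ J # accumulate G_i W_i
        y = np.tanh(u)
    J = weights[-1] @ J                            # final linear layer W_{N+1}
    dx_next = J @ delta_x
    return float(dx_next @ dx_next)                # |delta_x'|^2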
Lower-bounding the number of neurons – Since the Euler-forward scheme of Eq. (1) is a 3-D (n = 3) polynomial of degree at most d = 2, we use previous theoretical results on learning polynomials with NNs [1, 5] to establish lower bounds on the necessary number of neurons. In effect, we assume that the dynamics are polynomial but the learning system does not know the exact governing equations. The number of neurons L for learning a polynomial to a root-mean-square error target ε is bounded by L = Ω(n^{6d}/ε^3) [1]. This is a rather coarse estimate, as more than 5 × 10^5 nodes are needed when ε ≲ 1. In stark contrast, exactly two neurons in a single hidden layer of a PolyNet reproduce the sparse L63 polynomial to numerical precision [25]. Matching equilibrium norms of neural and polynomial regression asymptotically, a full polynomial (n, d) needs L = \binom{n+d}{d} - (n+1) hidden nodes [22, 25] for an exact match. If a polynomial were instead represented by a network with direct input-output connections for the linear part, together with a single tanh-activated hidden layer for the residual nonlinear part, then matching the equilibrium norm yields the bound L ≥ \frac{n}{2n+1} [\binom{n+d}{d} - (n+1)] hidden-layer units [22]. Eliminating constants using random bounded-input, bounded-weight networks reveals that an n = 3, d = 2 polynomial matches networks of 3 to 8 nodes with 95% confidence. Note that the standard network with just a single hidden tanh layer and no input-output bypass is sub-optimal, asymptotically yielding the bound L ≥ \frac{n}{2n+1} [\binom{n+d}{d} - 1] hidden-layer units [22].
A Taylor expansion of the sigmoid function to third order, tanh(x) = x - x^3/3 + O(x^5), allows Eq. (3) to be modeled as a polynomial of degree 3 (the NN polynomial). We further require all coefficients of the NN polynomial to be equal to those in Eq. (1). Then, for an NN with L hidden nodes, biases, and n-dimensional input/output, a total of 2nL + n + L parameters must satisfy 3\binom{n+3}{3} constraining equations. For a good fit, the parameters should be at least as numerous as the constraints, hence at least L = \lceil (3\binom{n+3}{3} - n)/(2n+1) \rceil = 9 hidden nodes are needed. To obtain an error estimate, we assume that the prediction errors between the NN and its truncated polynomial dominate over the errors between the NN and L63; the truncation error of the NN polynomial therefore provides an upper bound on the prediction error of the NN. We estimate the truncation error by substituting Table I into the NN polynomial to obtain Φ_{NN-poly} and calculating the expected error over data sampled from the A_NN attractor: ε^2 = ⟨(Φ_{NN-poly} - Φ_{NN})^2⟩_{A_NN}. Using 5000 random samples on A_NN gives a normalized error of 0.12.
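The parameter-counting arithmetic above can be checked with a few lines (a sketch; the variable names are ours):

# For n = 3 and a degree-3 NN polynomial, count the coefficient-matching
# constraints and the smallest L whose 2nL + n + L parameters cover them.
from math import comb, ceil

n = 3
n_constraints = 3 * comb(n + 3, 3)               # 3 outputs x 20 monomials = 60
L_min = ceil((n_constraints - n) / (2 * n + 1))  # 2nL + n + L >= 60  =>  L >= 57/7
print(n_constraints, L_min)                      # -> 60 9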
Discussion – Our work suggests that NNs may be good candidates for learning from data and representing a broad range of chaotic dynamics with good generalization skill. With their flow-like dynamics and gradient-descent training, they may serve as non-parametric models for chaotic systems without explicit expressions. Conversely, neural networks can be seen as a much more general class of chaotic systems. Apart from the compression and expansion operations that are necessary for chaos, the higher-dimensional rotations are also vital in creating the flow-like dynamics. We may further posit that neural networks could be a unifying formulation for modeling chaotic dynamics, because they reproduce the Hénon map and the discrete Lorenz map within the same mathematical framework.
On the other hand, the compression operation represented by the sigmoid function makes NNs preferable for modeling simple dissipative systems; their ability to model conservative dynamics and systems of much higher dimension is yet to be tested. More work is also needed, possibly with the aid of Riemannian geometry, to fundamentally understand the geometric operations in the high-dimensional neuron space.
Acknowledgments – Ziwei Li was advised by Sai Rav-
ela. Support from ONR grant N00014-19-1-2273, the
MIT Environmental Solutions Initiative, the John S. and
Maryann Montrym Fund, and the MIT Lincoln Labora-
tory is gratefully acknowledged.
[1] Andoni, A., Panigrahy, R., Valiant, G., and Zhang, L.,
in Proceedings of the 31st International Conference on
International Conference on Machine Learning (2014).
[2] Bahi, J. M., Couchot, J. F., Guyeux, C., and Salomon,
M., Chaos 22, 013122 (2012).
[3] Bakker, R., Schouten, J. C., Lee Giles, C., Takens, F.,
and Van den Bleek, C. M., Neural Computation 12, 2355
(2000).
[4] Barreira, L. and Pesin, Y., in University Lecture Series,
Vol. 23 (American Mathematical Society, Providence, RI,
2002) p. 151.
[5] Barron, A. R., IEEE Transactions on Information Theory
39, 930 (1993).
[6] Bergé, P., Pomeau, Y., and Vidal, C., Order within chaos
(Wiley, 1987) p. 329.
[7] Chen, R. T., Rubanova, Y., Bettencourt, J., and Du-
venaud, D., in 32nd Conference on Neural Information
Processing Systems (2018).
[8] Dan Foresee, F. and Hagan, M. T., in Proceedings of
International Conference on Neural Networks (1997).
[9] Dudul, S. V., Applied Soft Computing 5, 333 (2005).
[10] Funahashi, K. and Nakamura, Y., Neural Networks 6,
801 (1993).
[11] Goodfellow, I., Bengio, Y., and Courville, A., Deep
Learning (MIT Press, Cambridge, MA, 2016).
[12] Guckenheimer, J. and Williams, R. F., Publ. Math. IHES
50, 307 (1979).
[13] Haller, G., Physica D: Nonlinear Phenomena 149, 248
(2001).
[14] Hauser, M. and Ray, A., in 31st Conference on Neural Information Processing Systems (2017) p. 10.
[15] Hénon, M., Communications in Mathematical Physics 50, 69 (1976).
[16] Hornik, K., Neural Networks 4, 251 (1991).
[17] Hornik, K., Stinchcombe, M., and White, H., Neural
Networks 2, 359 (1989).
[18] Lorenz, E. N., Journal of the Atmospheric Sciences 20,
130 (1963).
[19] Madondo, M. and Gibbons, T., in Proceedings of the Mid-
west Instruction and Computing Symposium (2018).
[20] The term learnability is used here in the sense of neu-
ral system’s fidelity to specified properties of a dynami-
cal system, e.g., the predictability of the dynamical sys-
tem quantified by finite-time Lyapunov exponent. This
is somewhat different from Valiant’s definition [27].
[21] Ravela, S., “Tractable non-gaussian representations in
dynamic data driven coherent fluid mapping,” in Hand-
book of Dynamic Data Driven Applications Systems,
edited by E. Blasch, S. Ravela, and A. Aved (Springer
International Publishing, Cham, 2018) pp. 29–46.
[22] Ravela, S., Li, Z., Trautner, M., and Reilly, S., Preprint
(2019).
[23] Seidl, D. R. and Lorenz, R. D., in Proceedings of the In-
ternational Joint Conference on Neural Networks 1991,
Vol. 2 (IEEE, 1991) pp. 709–714.
[24] Strogatz, S. H., Nonlinear Dynamics and Chaos: With
Applications to Physics, Biology, Chemistry, and Engi-
neering, 2nd ed. (CRC Press, Boca Raton, FL, 2015) p.
531.
[25] Trautner, M. and Ravela, S., “Neural integration of con-
tinuous dynamics,” (2019), arXiv:1911.10309 [cs.LG].
[26] Tucker, W., Foundations of Computational Mathematics
2, 53 (2002).
[27] Valiant, L. G., Commun. ACM 27, 1134 (1984).
[28] Viswanath, D., Lyapunov Exponents from Random Fi-
bonacci Sequences to the Lorenz Equations, Ph.D. thesis,
Cornell University, Ithaca, NY, USA (1998).
[29] Yu, R., Zheng, S., and Liu, Y., in Proceedings of
the ICML 17 Workshop on Deep Structured Prediction
(2017).
[30] Zerroug, A., Terrissa, L., and Faure, A., Annual Review
of Chaos Theory, Bifurcations and Dynamical Systems
4, 55 (2013).
[31] Zhang, L., in 2017 IEEE 30th Canadian Conference on
Electrical and Computer Engineering, 2 (2017) pp. 30–
33.
[32] Zhang, L., in 2017 IEEE Life Sciences Conference (2017)
pp. 39–42.
Article
In this paper, we prove that any finite time trajectory of a given n-dimensional dynamical system can be approximately realized by the internal state of the output units of a continuous time recurrent neural network with n output units, some hidden units, and an appropriate initial condition. The essential idea of the proof is to embed the n-dimensional dynamical system into a higher dimensional one which defines a recurrent neural network. As a corollary, we also show that any continuous curve can be approximated by the output of a recurrent neural network.