Reservoir Computing Universality With Stochastic
Inputs
Lukas Gonon and Juan-Pablo Ortega

L. Gonon and J.-P. Ortega are with the Faculty of Mathematics and Statistics, Universität Sankt Gallen, Sankt Gallen, Switzerland. L. Gonon is also affiliated with the Department of Mathematics, ETH Zürich, Switzerland. J.-P. Ortega is also affiliated with the Centre National de la Recherche Scientifique (CNRS), France.
Abstract—The universal approximation properties with respect
to $L^p$-type criteria of three important families of reservoir
computers with stochastic discrete-time semi-infinite inputs are
shown. First, it is proved that linear reservoir systems with either
polynomial or neural network readout maps are universal. More
importantly, it is proved that the same property holds for two
families with linear readouts, namely, trigonometric state-affine
systems and echo state networks, which are the most widely used
reservoir systems in applications. The linearity in the readouts
is a key feature in supervised machine learning applications. It
guarantees that these systems can be used in high-dimensional
situations and in the presence of large datasets. The $L^p$ criteria
used in this paper allow the formulation of universality results
that do not necessarily impose almost sure uniform boundedness
in the inputs or the fading memory property in the filter that
needs to be approximated.
Index Terms—Reservoir computing, echo state network, ESN,
machine learning, uniform system approximation, stochastic
input, universality.
I. INTRODUCTION
A universality statement in relation to a machine
learning paradigm refers to its versatility at the time of
reproducing a rich number of patterns obtained by modifying
only a limited number of hyperparameters. In the language
of learning theory, universality amounts to the possibility of
making approximation errors as small as one wants [1]–[3].
Well-known universality results are, for example, the uniform
approximation properties of feedforward neural networks es-
tablished in [4], [5] for deterministic inputs and, later on,
extended in [6] to accommodate random inputs.
This paper is a generalization of the universality statements
in [6] to a discrete-time dynamical context. More specifically,
we are interested in the learning not of functions but of
filters that transform semi-infinite random input sequences
parameterized by time into outputs that depend on those inputs
in a causal and time-invariant manner. The approximants
used are small subfamilies of reservoir computers (RC) [7],
[8] or reservoir systems. Reservoir computers (also referred
to in the literature as liquid state machines [9], [10]) are
filters generated by nonlinear state-space transformations that
constitute special types of recurrent neural networks. They are
determined by two maps, namely a reservoir map $F\colon \mathbb{R}^N \times \mathbb{R}^n \to \mathbb{R}^N$, $n, N \in \mathbb{N}$, and a readout map $h\colon \mathbb{R}^N \to \mathbb{R}$, that under certain hypotheses transform (or filter) an infinite discrete-time input $\mathbf{z} = (\ldots, \mathbf{z}_{-1}, \mathbf{z}_0, \mathbf{z}_1, \ldots) \in (\mathbb{R}^n)^{\mathbb{Z}}$ into an output signal $y \in \mathbb{R}^{\mathbb{Z}}$ of the same type using a state-space transformation given by:

$$\mathbf{x}_t = F(\mathbf{x}_{t-1}, \mathbf{z}_t), \qquad (1)$$
$$y_t = h(\mathbf{x}_t), \qquad (2)$$
where $t \in \mathbb{Z}$ and the dimension $N \in \mathbb{N}$ of the state vectors $\mathbf{x}_t \in \mathbb{R}^N$ is referred to as the number of virtual neurons of the system. In supervised machine learning applications the reservoir map is very often randomly generated and the memoryless readout is trained so that the output matches a given teaching signal. An important particular case of the RC systems in (1)-(2) are echo state networks (ESN) introduced, in different contexts, in [8], [11], [12], and that are built using the transformations
$$\begin{cases} \mathbf{x}_t = \sigma(A\mathbf{x}_{t-1} + C\mathbf{z}_t + \boldsymbol{\zeta}), \\ y_t = \mathbf{w}^{\top}\mathbf{x}_t, \end{cases} \qquad (3)$$
with $A \in M_N$, $C \in M_{N,n}$, $\boldsymbol{\zeta} \in \mathbb{R}^N$, and $\mathbf{w} \in \mathbb{R}^N$. The map $\sigma\colon \mathbb{R}^N \to \mathbb{R}^N$ is obtained via the componentwise application of a given activation function $\sigma\colon \mathbb{R} \to \mathbb{R}$ that is denoted with the same symbol. ESNs have as an important feature the linearity of the readout specified by the vector $\mathbf{w} \in \mathbb{R}^N$ that is estimated using linear regression methods based on a training dataset. This is done once the other parameters in the model ($A$, $C$, and $\boldsymbol{\zeta}$) have been randomly generated and their scale has been adapted to the problem in question by tuning a limited number of hyperparameters (like the sparsity or the spectral radius of the matrix $A$).
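As an illustration of how an ESN of the form (3) is used in practice, the following minimal sketch randomly generates the reservoir parameters, rescales $A$ to a prescribed spectral radius, and estimates the linear readout $\mathbf{w}$ by ridge-regularized linear regression on a training signal. The function name, dimensions, and regularization choice are illustrative assumptions and not part of the paper.

```python
import numpy as np

def esn_readout_fit(z, y, N=200, rho=0.9, sigma_in=0.5, ridge=1e-6, seed=0):
    """Minimal ESN sketch for system (3): random A, C, zeta; only w is trained.

    z: (T, n) input sequence, y: (T,) teaching signal. Hypothetical helper,
    given for illustration only."""
    rng = np.random.default_rng(seed)
    T, n = z.shape
    A = rng.normal(size=(N, N))
    A *= rho / max(abs(np.linalg.eigvals(A)))   # tune the spectral radius of A
    C = sigma_in * rng.normal(size=(N, n))
    zeta = sigma_in * rng.normal(size=N)
    X = np.zeros((T, N))
    x = np.zeros(N)
    for t in range(T):                           # state recursion x_t = sigma(A x_{t-1} + C z_t + zeta)
        x = np.tanh(A @ x + C @ z[t] + zeta)
        X[t] = x
    # linear readout w estimated by (ridge-regularized) linear regression
    w = np.linalg.solve(X.T @ X + ridge * np.eye(N), X.T @ y)
    return A, C, zeta, w, X @ w
```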
Families of reservoir systems of the type (1)-(2) have
already been proved to be universal in different contexts. In the
continuous-time setup, it was shown in [13] that linear reser-
voir systems with polynomial readouts or bilinear reservoirs
with linear readouts are able to uniformly approximate any
fading memory filter with uniformly bounded and equicon-
tinuous inputs. The fading memory property is a continuity
feature exhibited by many filters encountered in applications.
See also [9], [10], [14], [15] for other contributions to the RC
universality problem in the continuous-time setup.
In the discrete-time setup, several universality statements
were already part of classical systems theory statements for
inputs defined on a finite number of time points [16]–[18].
In the more general context of semi-infinite inputs, various
universality results have been formulated for systems with
approximate finite memory [11], [12], [19]–[22]. More re-
cently, it has been shown in [23], [24], that RCs generated
by contractive reservoir maps (similar to the ESNs introduced
above) exhibit universality properties in the approximate finite
memory category.
These universality results have been recently extended to
the causal and fading memory category in [25], [26]. In those
works the universality of two important families of reservoir
systems with linear readouts has been established, namely,
the so called state affine systems (SAS) and the echo state
networks (ESN) that we just introduced in (3). Moreover, the
universality of the SAS family was established in [25] both for
uniformly bounded deterministic inputs, as well as for almost
surely uniformly bounded stochastic ones. This last statement
was shown to be a corollary of a general transfer theorem
that proves that very important features of causal and time-
invariant filters like the fading memory property or universality
are naturally inherited by reservoir systems with almost surely
uniformly bounded stochastic inputs from their counterparts
with deterministic inputs.
Unfortunately, almost surely bounded random inputs are
not always appropriate for many applications. For example,
most parametric time series models use as driving innova-
tions random variables whose distributions are not compactly
supported (Gaussian, for example) in order to ensure ade-
quate levels of performance. The main goal of this work
is formulating universality results in the stochastic context
that do not impose almost sure uniform boundedness in the
inputs. This is achieved by using a density criterion (which is the mathematical characterization of universality) based not on $L^{\infty}$-type norms, as in [25], [26], but on $L^p$ norms, $p \in [1, \infty)$. This approach follows the pattern introduced in the static case in [6].
This strategy allows us to cover a more general class of input
signals and filters, but it also creates some differences in
the type of approximation results that are obtained. More
specifically, in the stochastic universality statements in [25],
for example, universal families are presented that uniformly
approximate any given filter for any input in a given class of
stochastic processes. In contrast with this statement and like
in [6], we fix here first a discrete-time stochastic process that
models the data generating process (DGP) behind the system
inputs that are being considered. Subsequently, families of reservoir filters are spelled out whose images of the DGP are dense in the $L^p$ sense. Equivalently, the image of the DGP by any measurable causal and time-invariant filter can be approximated by the image of one of the members of the universal family with respect to an $L^p$ norm defined using the law of the prefixed DGP.
It is important to point out that this approach allows us to
formulate universality results for filters that do not necessarily
have the fading memory property since only measurability is
imposed as a hypothesis.
The paper contains three main universality statements. The
first one shows that linear reservoir systems with either poly-
nomial or neural network readout maps are universal in the
$L^p$ sense. More importantly, two other families with linear
readouts are shown to also have this property, namely, trigono-
metric state-affine systems and echo state networks, which
are the most widely used reservoir systems in applications.
The linearity of the readout is a key feature of these systems
since in supervised machine learning applications it reduces
the training task to the solution of a linear regression problem,
which can be implemented efficiently also in high-dimensional
situations and in the presence of large datasets.
We emphasize that, from a learning theoretical perspective,
the results in this paper only establish the possibility of
making the approximation error arbitrarily small when using
the proposed RC families in a specific learning task. We provide bounds neither for the approximation error nor for the corresponding estimation errors based on finite random samples.
Even though some results in this direction already exist in the
literature [23], [24], we plan to address this important subject
in a forthcoming paper where the same degree of generality
as in the present paper will be adopted.
II. PRELIMINARIES
In this section we introduce some notation and collect
general facts about filters, reservoir systems, and stochastic
input signals.
A. Notation
We write $\mathbb{N} = \{0, 1, \ldots\}$ and $\mathbb{Z}_- = \{\ldots, -1, 0\}$. The elements of the Euclidean spaces $\mathbb{R}^n$ will be written as column vectors and will be denoted in bold. Given a vector $\mathbf{v} \in \mathbb{R}^n$, we denote its entries by $v_i$ or by $v^{(i)}$, with $i \in \{1, \ldots, n\}$. $(\mathbb{R}^n)^{\mathbb{Z}}$ and $(\mathbb{R}^n)^{\mathbb{Z}_-}$ denote the sets of infinite $\mathbb{R}^n$-valued sequences of the type $(\ldots, \mathbf{z}_{-1}, \mathbf{z}_0, \mathbf{z}_1, \ldots)$ and $(\ldots, \mathbf{z}_{-1}, \mathbf{z}_0)$ with $\mathbf{z}_i \in \mathbb{R}^n$ for $i \in \mathbb{Z}$ and $i \in \mathbb{Z}_-$, respectively. Additionally, we denote by $z^{(k)}_i$ the $k$-th component of $\mathbf{z}_i$. The elements in these sequence spaces will also be written in bold, for example, $\mathbf{z} := (\ldots, \mathbf{z}_{-1}, \mathbf{z}_0) \in (\mathbb{R}^n)^{\mathbb{Z}_-}$. We denote by $M_{n,m}$ the space of real $n \times m$ matrices with $m, n \in \mathbb{N}$. When $n = m$, we use the symbol $M_n$ to refer to the space of square matrices of order $n$. Random variables and stochastic processes will be denoted using upper case characters that will be bold when they are vector valued.
B. Filters and functionals
A filter is a map $U\colon (\mathbb{R}^n)^{\mathbb{Z}} \to \mathbb{R}^{\mathbb{Z}}$. It is called causal, if for any $\mathbf{z}, \mathbf{w} \in (\mathbb{R}^n)^{\mathbb{Z}}$ which satisfy $\mathbf{z}_\tau = \mathbf{w}_\tau$ for all $\tau \leq t$ for a given $t \in \mathbb{Z}$, one has that $U(\mathbf{z})_t = U(\mathbf{w})_t$. Denote by $T_\tau\colon (\mathbb{R}^n)^{\mathbb{Z}} \to (\mathbb{R}^n)^{\mathbb{Z}}$ the time delay operator defined by $T_\tau(\mathbf{z})_t := \mathbf{z}_{t+\tau}$, for any $\tau \in \mathbb{Z}$. A filter $U$ is called time-invariant, if $T_\tau \circ U = U \circ T_\tau$ for all $\tau \in \mathbb{Z}$.

Causal and time-invariant filters can be equivalently described using their naturally associated functionals. We refer to a map $H\colon (\mathbb{R}^n)^{\mathbb{Z}_-} \to \mathbb{R}$ as a functional. Given a causal and time-invariant filter $U$, one defines the functional $H_U$ associated to it by setting $H_U(\mathbf{z}) := U(\mathbf{z}^e)_0$. Here $\mathbf{z}^e$ is an arbitrary extension of $\mathbf{z} \in (\mathbb{R}^n)^{\mathbb{Z}_-}$ to $(\mathbb{R}^n)^{\mathbb{Z}}$. $H_U$ does not depend on the choice of this extension since $U$ is causal. Conversely, given a functional $H$ one may define a causal and time-invariant filter $U_H\colon (\mathbb{R}^n)^{\mathbb{Z}} \to \mathbb{R}^{\mathbb{Z}}$ by setting $U_H(\mathbf{z})_t := H(\pi_{\mathbb{Z}_-} \circ T_t(\mathbf{z}))$, where $\pi_{\mathbb{Z}_-}\colon (\mathbb{R}^n)^{\mathbb{Z}} \to (\mathbb{R}^n)^{\mathbb{Z}_-}$ is the natural projection. One may verify that any causal and time-invariant filter can be recovered from its associated functional and conversely. Equivalently, $U = U_{H_U}$ and $H = H_{U_H}$. We refer to [13] for further details.

If $U$ is causal and time-invariant, then for any $\mathbf{z} \in (\mathbb{R}^n)^{\mathbb{Z}}$ the sequence $U(\mathbf{z})$ restricted to $\mathbb{Z}_-$ only depends on $(\mathbf{z}_t)_{t \in \mathbb{Z}_-}$. Thus we may also consider $U$ as a map $U\colon (\mathbb{R}^n)^{\mathbb{Z}_-} \to \mathbb{R}^{\mathbb{Z}_-}$, but when we do so this will always be clear from the context.
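To make the correspondence $U_H(\mathbf{z})_t = H(\pi_{\mathbb{Z}_-} \circ T_t(\mathbf{z}))$ concrete, the following sketch applies a functional to each time-shifted, truncated input window. The finite window (in place of a semi-infinite sequence) and the example functional are illustrative assumptions added here, not constructions from the paper.

```python
import numpy as np

def filter_from_functional(H, z):
    """Apply the causal, time-invariant filter induced by a functional H.

    H acts on a finite window (z_0, ..., z_t), standing in for the
    semi-infinite argument (..., z_{t-1}, z_t) used in the paper.
    z: (T, n) array. Returns the output sequence (U_H(z)_t)_t."""
    T = z.shape[0]
    return np.array([H(z[: t + 1]) for t in range(T)])

# example functional: exponentially discounted sum of the first input component
H_example = lambda window: float(np.sum(0.5 ** np.arange(len(window))[::-1] * window[:, 0]))
```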
C. Reservoir computing systems
A specific class of filters can be obtained using the reservoir computing systems or reservoir computers (RC) introduced in (1)-(2) when they satisfy the so-called echo state property (ESP) given by the following statement (see [27]–[29]): for any $\mathbf{z} \in (\mathbb{R}^n)^{\mathbb{Z}}$ there exists a unique $\mathbf{x} \in (\mathbb{R}^N)^{\mathbb{Z}}$ such that (1) holds. In the presence of the ESP, the RC system gives rise to a well-defined filter $U^F_h$ that is constructed by associating to any $\mathbf{z} \in (\mathbb{R}^n)^{\mathbb{Z}}$ the unique $\mathbf{x} \in (\mathbb{R}^N)^{\mathbb{Z}}$ satisfying (1) and by mapping $\mathbf{x}$ subsequently to the output in (2), that is, $U^F_h(\mathbf{z})_t := y_t$. Furthermore, it can be shown (see [26, Proposition 2.1]) that $U^F_h$ is necessarily causal and time-invariant and hence we may associate to $U^F_h$ a reservoir functional $H^F_h\colon (\mathbb{R}^n)^{\mathbb{Z}_-} \to \mathbb{R}$ defined as $H^F_h(\mathbf{z}) := U^F_h(\mathbf{z})_0$.

As seen above, the causal and time-invariant filter $U^F_h$ is uniquely determined by the reservoir functional $H^F_h$. Since the latter is determined by the restriction of the RC system to $\mathbb{Z}_-$, we will sometimes consider the system (1)-(2) only for $t \in \mathbb{Z}_-$.
D. Deterministic filters with stochastic inputs
We are interested in feeding the filters and the systems
that we just introduced with stochastic processes as inputs.
More explicitly, given a causal and time-invariant filter $U$ that satisfies certain measurability hypotheses, any stochastic process $\mathbf{Z} = (\mathbf{Z}_t)_{t \in \mathbb{Z}_-}$ is mapped to a new stochastic process $(U(\mathbf{Z})_t)_{t \in \mathbb{Z}_-}$. The main contributions in this article address the question of approximating $U(\mathbf{Z})$ by reservoir filters in an $L^p$ sense. We now introduce the precise framework to achieve this goal.
1) Probabilistic framework: Consider a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ on which all random variables are defined. Recall that the sample space $\Omega$ is an arbitrary set representing possible outcomes, the $\sigma$-algebra $\mathcal{F}$ is a collection of subsets of $\Omega$ describing the set of events to be considered, and $\mathbb{P}\colon \mathcal{F} \to [0, 1]$ is a probability measure that assigns a probability of occurrence to each event. The input signal is modeled as a discrete-time stochastic process $\mathbf{Z} = (\mathbf{Z}_t)_{t \in \mathbb{Z}_-}$ with values in $\mathbb{R}^n$. For each outcome $\omega \in \Omega$ we denote by $\mathbf{Z}(\omega) = (\mathbf{Z}_t(\omega))_{t \in \mathbb{Z}_-}$ the realization or sample path of $\mathbf{Z}$. Thus $\mathbf{Z}$ may be viewed as a random sequence in $\mathbb{R}^n$ and when dealing with stochastic processes we will make no distinctions between the assignment $\mathbf{Z}\colon \mathbb{Z}_- \times \Omega \to \mathbb{R}^n$ and the corresponding map into path space $\mathbf{Z}\colon \Omega \to (\mathbb{R}^n)^{\mathbb{Z}_-}$. We recall that $\mathbf{Z}$ is a stochastic process when the corresponding map $\mathbf{Z}\colon \Omega \to (\mathbb{R}^n)^{\mathbb{Z}_-}$ is measurable. Here $(\mathbb{R}^n)^{\mathbb{Z}_-}$ is equipped with the product $\sigma$-algebra $\otimes_{t \in \mathbb{Z}_-}\mathcal{B}(\mathbb{R}^n)$ (which coincides with the Borel $\sigma$-algebra of $(\mathbb{R}^n)^{\mathbb{Z}_-}$ equipped with the product topology by [30, Lemma 1.2]), where $\mathcal{B}(\mathbb{R}^n)$ is the Borel $\sigma$-algebra on $\mathbb{R}^n$.

We denote by $\mathcal{F}_t := \sigma(\mathbf{Z}_0, \ldots, \mathbf{Z}_t)$, $t \in \mathbb{Z}_-$, the $\sigma$-algebra generated by $\{\mathbf{Z}_0, \ldots, \mathbf{Z}_t\}$ and write $\mathcal{F}_{-\infty} := \sigma(\mathbf{Z}_t : t \in \mathbb{Z}_-)$. Thus $\mathcal{F}_t$ models the information contained in the input stream at times $0, -1, \ldots, t$. For $p \in [1, \infty]$ we denote by $L^p(\Omega, \mathcal{F}, \mathbb{P})$ the Banach space formed by the real-valued random variables on $(\Omega, \mathcal{F}, \mathbb{P})$ that have a finite usual $L^p$ norm $\|\cdot\|_p$.

We say that the process $\mathbf{Z}$ is stationary when for any $\{t_1, \ldots, t_k\} \subset \mathbb{Z}_-$, $h \in \mathbb{Z}_-$, and $A_{t_1}, \ldots, A_{t_k} \in \mathcal{B}(\mathbb{R}^n)$, we have that
$$\mathbb{P}(\mathbf{Z}_{t_1} \in A_{t_1}, \ldots, \mathbf{Z}_{t_k} \in A_{t_k}) = \mathbb{P}(\mathbf{Z}_{t_1+h} \in A_{t_1}, \ldots, \mathbf{Z}_{t_k+h} \in A_{t_k}).$$
2) Measurable functionals and filters: We say that a functional $H$ is measurable when the map between measurable spaces $H\colon \big((\mathbb{R}^n)^{\mathbb{Z}_-}, \otimes_{t \in \mathbb{Z}_-}\mathcal{B}(\mathbb{R}^n)\big) \to (\mathbb{R}, \mathcal{B}(\mathbb{R}))$ is measurable. When $H$ is measurable then so is $H(\mathbf{Z})\colon (\Omega, \mathcal{F}) \to (\mathbb{R}, \mathcal{B}(\mathbb{R}))$ since $H(\mathbf{Z}) = H \circ \mathbf{Z}$ is the composition of measurable maps and hence $H(\mathbf{Z})$ is a random variable on $(\Omega, \mathcal{F}, \mathbb{P})$.

Analogously, we will say that a causal, time-invariant filter $U$ is measurable when the map between measurable spaces $U\colon \big((\mathbb{R}^n)^{\mathbb{Z}}, \otimes_{t \in \mathbb{Z}}\mathcal{B}(\mathbb{R}^n)\big) \to \big(\mathbb{R}^{\mathbb{Z}}, \otimes_{t \in \mathbb{Z}}\mathcal{B}(\mathbb{R})\big)$ is measurable. In that case, also the restriction of $U$ to $\mathbb{Z}_-$ (see above) is measurable and so $U(\mathbf{Z})$ is a real-valued stochastic process.

As discussed above, causal, time-invariant filters and functionals are in a one-to-one correspondence. This relation is compatible with the measurability condition, that is, a causal and time-invariant filter is measurable if and only if the associated functional is measurable. In order to prove this statement we show first that the operator $\pi_{\mathbb{Z}_-} \circ T_t\colon \big((\mathbb{R}^n)^{\mathbb{Z}}, \otimes_{t \in \mathbb{Z}}\mathcal{B}(\mathbb{R}^n)\big) \to \big((\mathbb{R}^n)^{\mathbb{Z}_-}, \otimes_{t \in \mathbb{Z}_-}\mathcal{B}(\mathbb{R}^n)\big)$ is a measurable map, for any $t \in \mathbb{Z}$. Indeed, notice first that the projections $p_i\colon \big((\mathbb{R}^n)^{\mathbb{Z}}, \otimes_{t \in \mathbb{Z}}\mathcal{B}(\mathbb{R}^n)\big) \to (\mathbb{R}^n, \mathcal{B}(\mathbb{R}^n))$, $i \in \mathbb{Z}$, given by $p_i(\mathbf{z}) = \mathbf{z}_i$ are measurable. Thus $\pi_{\mathbb{Z}_-} \circ T_t$ can be written as the Cartesian product of measurable maps, i.e. for each $k \in \mathbb{Z}_-$ one has that $(\pi_{\mathbb{Z}_-} \circ T_t)_k = p_{t+k}$ is measurable. This yields that $\pi_{\mathbb{Z}_-} \circ T_t$ is measurable [30, Lemma 1.8].

Now, if $H$ is a measurable functional, this implies that the associated filter $U_H$ is also measurable, since for each $t \in \mathbb{Z}$,
$$(U_H)_t = H \circ \pi_{\mathbb{Z}_-} \circ T_t, \qquad (4)$$
is a composition of measurable functions and hence also measurable. Conversely, if $U$ is causal, time-invariant, and measurable, then so is the associated functional $H_U = p_0 \circ U$.
3) $L^p$-norm for functionals: Fix $p \in [1, \infty)$ and let $H$ be a measurable functional such that $H(\mathbf{Z}) \in L^p(\Omega, \mathcal{F}, \mathbb{P})$. The functionals which satisfy that
$$\|H(\mathbf{Z})\|_p := \mathbb{E}[|H(\mathbf{Z})|^p]^{1/p} < \infty \qquad (5)$$
will be referred to as $p$-integrable with respect to the input process $\mathbf{Z}$.

Let us now consider the expression (5) from an alternative point of view. Denote by $\mu_{\mathbf{Z}} := \mathbb{P} \circ \mathbf{Z}^{-1}$ the law of $\mathbf{Z}$ when viewed as a $(\mathbb{R}^n)^{\mathbb{Z}_-}$-valued random variable as above. Thus $\mu_{\mathbf{Z}}$ is a probability measure on $(\mathbb{R}^n)^{\mathbb{Z}_-}$ such that for any measurable set $A \subset (\mathbb{R}^n)^{\mathbb{Z}_-}$ one has $\mu_{\mathbf{Z}}(A) = \mathbb{P}(\mathbf{Z} \in A)$. The requirement $H(\mathbf{Z}) \in L^p(\Omega, \mathcal{F}, \mathbb{P})$ then translates to $H \in L^p((\mathbb{R}^n)^{\mathbb{Z}_-}, \mu_{\mathbf{Z}})$ and (5) is equal [30, Lemma 1.22] to
$$\|H\|^{\mu_{\mathbf{Z}}}_p := \left[\int_{(\mathbb{R}^n)^{\mathbb{Z}_-}} |H(\mathbf{z})|^p \, \mu_{\mathbf{Z}}(d\mathbf{z})\right]^{1/p} = \|H(\mathbf{Z})\|_p.$$

Thus, the results formulated later on in the paper for functionals with random inputs can also be seen as statements for functionals with deterministic inputs in $(\mathbb{R}^n)^{\mathbb{Z}_-}$, where the closeness between them is measured using the norm in $L^p((\mathbb{R}^n)^{\mathbb{Z}_-}, \mu_{\mathbf{Z}})$. Following the terminology used by [6] we will refer to $\mu_{\mathbf{Z}}$ as the input environment measure.

We emphasize that these two points of view are equivalent. Given any probability measure $\mu_{\mathbf{Z}}$ on $(\mathbb{R}^n)^{\mathbb{Z}_-}$ one may set $\Omega = (\mathbb{R}^n)^{\mathbb{Z}_-}$, $\mathcal{F} = \otimes_{t \in \mathbb{Z}_-}\mathcal{B}(\mathbb{R}^n)$, $\mathbb{P} = \mu_{\mathbf{Z}}$ and define $\mathbf{Z}_t(\mathbf{z}) := \mathbf{z}_t$ for all $\mathbf{z} \in \Omega$. We will switch between these two viewpoints throughout the paper without much warning to the reader.
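As a sketch of how the norm in (5) can be estimated in practice, one may draw independent sample paths from the input environment measure $\mu_{\mathbf{Z}}$ (truncated to a finite window) and average $|H(\mathbf{z})|^p$ over them. The truncation length, sample size, Gaussian DGP, and example functional below are illustrative assumptions, not part of the paper.

```python
import numpy as np

def lp_norm_monte_carlo(H, sample_path, p=2, num_paths=10_000, seed=0):
    """Monte Carlo estimate of ||H(Z)||_p = E[|H(Z)|^p]^(1/p) as in (5).

    sample_path(rng) draws one (truncated) realization of the input process Z."""
    rng = np.random.default_rng(seed)
    vals = np.array([abs(H(sample_path(rng))) ** p for _ in range(num_paths)])
    return vals.mean() ** (1.0 / p)

# illustrative DGP: i.i.d. standard normal inputs on a window of length 50
draw = lambda rng: rng.normal(size=(50, 1))
H = lambda z: float(z[-1, 0] ** 2 + 0.5 * z[-2, 0])   # example functional of the last two inputs
print(lp_norm_monte_carlo(H, draw))
```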
4) $L^p$-norm for filters: Fix $p \in [1, \infty)$. A causal, time-invariant, measurable filter $U$ is said to be $p$-integrable, if
$$\|U(\mathbf{Z})\|_p := \sup_{t \in \mathbb{Z}_-}\left\{\mathbb{E}[|U(\mathbf{Z})_t|^p]^{1/p}\right\} < \infty. \qquad (6)$$
It is easy to see that if $U$ is $p$-integrable, then so is the corresponding functional $H_U$ due to the following inequality
$$\|H_U(\mathbf{Z})\|_p = \mathbb{E}[|H_U(\mathbf{Z})|^p]^{1/p} = \mathbb{E}[|U(\mathbf{Z})_0|^p]^{1/p} \leq \sup_{t \in \mathbb{Z}_-}\left\{\mathbb{E}[|U(\mathbf{Z})_t|^p]^{1/p}\right\} = \|U(\mathbf{Z})\|_p < \infty.$$
The converse implication holds true when the input process is stationary. In order to show this fact, notice first that if $\mu_t$ is the law of $\pi_{\mathbb{Z}_-} \circ T_t(\mathbf{Z})$, $t \in \mathbb{Z}_-$, and $\mathbf{Z}$ is by hypothesis stationary then, for any $\{t_1, \ldots, t_k\} \subset \mathbb{Z}_-$ and $A_{t_1}, \ldots, A_{t_k} \in \mathcal{B}(\mathbb{R}^n)$, we have that
$$\mathbb{P}\big((\pi_{\mathbb{Z}_-} \circ T_t(\mathbf{Z}))_{t_1} \in A_{t_1}, \ldots, (\pi_{\mathbb{Z}_-} \circ T_t(\mathbf{Z}))_{t_k} \in A_{t_k}\big) = \mathbb{P}(\mathbf{Z}_{t_1+t} \in A_{t_1}, \ldots, \mathbf{Z}_{t_k+t} \in A_{t_k}) = \mathbb{P}(\mathbf{Z}_{t_1} \in A_{t_1}, \ldots, \mathbf{Z}_{t_k} \in A_{t_k}),$$
which proves that
$$\mu_{\mathbf{Z}} = \mu_t, \quad \text{for all } t \in \mathbb{Z}_-. \qquad (7)$$
This identity, together with (4), implies that for any $p$-integrable functional $H$:
$$\|U_H(\mathbf{Z})\|_p = \sup_{t \in \mathbb{Z}_-}\left\{\mathbb{E}[|U_H(\mathbf{Z})_t|^p]^{1/p}\right\} = \sup_{t \in \mathbb{Z}_-}\left\{\mathbb{E}\big[|H(\pi_{\mathbb{Z}_-} \circ T_t(\mathbf{Z}))|^p\big]^{1/p}\right\} = \sup_{t \in \mathbb{Z}_-}\left[\int_{(\mathbb{R}^n)^{\mathbb{Z}_-}} |H(\mathbf{z})|^p \, \mu_t(d\mathbf{z})\right]^{1/p} = \left[\int_{(\mathbb{R}^n)^{\mathbb{Z}_-}} |H(\mathbf{z})|^p \, \mu_{\mathbf{Z}}(d\mathbf{z})\right]^{1/p} = \|H(\mathbf{Z})\|_p < \infty, \qquad (8)$$
which proves the $p$-integrability of the associated filter $U_H$.
III. $L^p$-UNIVERSALITY RESULTS
Fix $p \in [1, \infty)$, $\mathbf{Z}$ an input process, and a functional $H$ such that $H(\mathbf{Z}) \in L^p(\Omega, \mathcal{F}, \mathbb{P})$. The goal of this section is finding simple families of reservoir systems that are able to approximate $H(\mathbf{Z})$ as accurately as needed in the $L^p$ sense. The first part contains a result that shows that linear reservoir maps with polynomial readouts are able to carry this out. As we already pointed out in the introduction, a result for the same type of reservoir systems has been proved in [25] in the $L^{\infty}$ setting for both deterministic and almost surely uniformly bounded stochastic inputs. The second part presents a family that is able to achieve universality using only linear readouts, which is of major importance for applications since in that case the training effort reduces to solving a linear regression. Finally, we prove the universality of echo state networks, which are the most widely used family of reservoir systems with linear readouts.
A. Linear reservoirs with nonlinear readouts
Consider a reservoir system with linear reservoir map and a polynomial readout. More precisely, given $A \in M_N$, $c \in M_{N,n}$, and $h \in \mathrm{Pol}_N$ a real-valued polynomial in $N$ variables, consider the system
$$\begin{cases} \mathbf{x}_t = A\mathbf{x}_{t-1} + c\mathbf{z}_t, & t \in \mathbb{Z}_-, \\ y_t = h(\mathbf{x}_t), & t \in \mathbb{Z}_-, \end{cases} \qquad (9)$$
for any $\mathbf{z} \in (\mathbb{R}^n)^{\mathbb{Z}_-}$. If the matrix $A$ is chosen so that $\sigma_{\max}(A) < 1$, then this system has the echo state property and the corresponding reservoir filter $U^{A,c}_h$ is causal and time-invariant [25]. We denote by $H^{A,c}_h$ the associated functional. We are interested in the approximation capabilities that can be achieved by using processes of the type $H^{A,c}_h(\mathbf{Z})$, where $\mathbf{Z}$ is a fixed input process and $H^{A,c}_h(\mathbf{Z}) = Y_0$, with $Y_0$ obviously determined by the stochastic reservoir system
$$\begin{cases} \mathbf{X}_t = A\mathbf{X}_{t-1} + c\mathbf{Z}_t, & t \in \mathbb{Z}_-, \\ Y_t = h(\mathbf{X}_t), & t \in \mathbb{Z}_-. \end{cases} \qquad (10)$$
Proposition III.1. Fix $p \in [1, \infty)$, let $\mathbf{Z}$ be a fixed $\mathbb{R}^n$-valued input process, and let $H$ be a functional such that $H(\mathbf{Z}) \in L^p(\Omega, \mathcal{F}, \mathbb{P})$. Suppose that for any $K \in \mathbb{N}$ there exists $\alpha > 0$ such that
$$\mathbb{E}\left[\exp\left(\alpha \sum_{k=0}^{K}\sum_{i=1}^{n} |Z^{(i)}_{-k}|\right)\right] < \infty, \qquad (11)$$
where $Z^{(i)}_{-k}$ denotes the $i$-th component of $\mathbf{Z}_{-k}$. Then, for any $\varepsilon > 0$ there exist $N \in \mathbb{N}$, $A \in M_N$, $c \in M_{N,n}$, and $h \in \mathrm{Pol}_N$ such that (9) has the echo state property, the corresponding filter is causal and time-invariant, the associated functional satisfies $H^{A,c}_h(\mathbf{Z}) \in L^p(\Omega, \mathcal{F}, \mathbb{P})$, and
$$\|H(\mathbf{Z}) - H^{A,c}_h(\mathbf{Z})\|_p < \varepsilon. \qquad (12)$$
If the input process $\mathbf{Z}$ is stationary then
$$\|U_H(\mathbf{Z}) - U^{A,c}_h(\mathbf{Z})\|_p < \varepsilon. \qquad (13)$$
Proof. The proof consists of two steps: In the first one we use assumption (11) and classical results in the literature to establish that
$$\mathrm{Pol}_{n(K+1)} \text{ is dense in } L^p(\mathbb{R}^{n(K+1)}, \mu_K), \quad \text{for all } K \in \mathbb{N}, \qquad (14)$$
where $\mu_K$ is the law of $(Z^{(1)}_0, Z^{(2)}_0, \ldots, Z^{(n-1)}_{-K}, Z^{(n)}_{-K})$ on $\mathbb{R}^{n(K+1)}$ under $\mathbb{P}$. In the second step we then use (14) to construct a linear RC system of the type in (9) that yields the approximation statement (12).

Step 1: Denote by $\mu_K$ the law of $(Z^{(1)}_0, Z^{(2)}_0, \ldots, Z^{(n-1)}_{-K}, Z^{(n)}_{-K})$ on $\mathbb{R}^N$ under $\mathbb{P}$, where $N := n(K+1)$. By (11) there exists $\alpha > 0$ such that $\int_{\mathbb{R}^N} \exp(\alpha\|\mathbf{z}\|_1)\,\mu_K(d\mathbf{z}) < \infty$, where here and in the rest of this proof $\|\cdot\|_1$ denotes the Euclidean 1-norm. Denoting by $\mu^j_K$ the $j$-th marginal distribution of $\mu_K$, this implies for $j = 1, \ldots, N$ that
$$\int_{\mathbb{R}} \exp(\alpha|z^{(j)}|)\,\mu^j_K(dz^{(j)}) \leq \int_{\mathbb{R}^N} \exp(\alpha\|\mathbf{z}\|_1)\,\mu_K(d\mathbf{z}) < \infty.$$
Consequently, by [31, Theorem 6], $\mathrm{Pol}_1$ is dense in $L^p(\mathbb{R}, \mu^j_K)$ for any $p \in [1, \infty)$, $j = 1, \ldots, N$. By [32, Proposition page 364] this implies that $\mathrm{Pol}_N$ is dense in $L^p(\mathbb{R}^N, \mu_K)$, where we note that $\mu_K$ indeed satisfies the moment assumption in [32, Page 361]: since for any $m \in \mathbb{N}$ there exists a constant $c_{m,\alpha} > 0$ such that $x^{2m} \leq c_{m,\alpha}\exp(\alpha x)$ for any $x \geq 0$, one has
$$\int_{\mathbb{R}^N} \|\mathbf{z}\|^{2m}_2\,\mu_K(d\mathbf{z}) \leq c_{m,\alpha}\int_{\mathbb{R}^N} \exp(\alpha\|\mathbf{z}\|_2)\,\mu_K(d\mathbf{z}) \leq c_{m,\alpha}\int_{\mathbb{R}^N} \exp(\alpha\|\mathbf{z}\|_1)\,\mu_K(d\mathbf{z}) < \infty.$$
Step 2: Let $\varepsilon > 0$. By Lemma A.1 in the appendix there exists $K \in \mathbb{N}$ such that
$$\|H(\mathbf{Z}) - \mathbb{E}[H(\mathbf{Z})\,|\,\mathcal{F}_{-K}]\|_p < \frac{\varepsilon}{2}, \qquad (15)$$
where $\mathcal{F}_{-K} := \sigma(\mathbf{Z}_0, \ldots, \mathbf{Z}_{-K})$. In the following paragraphs we will establish the approximation statement (12) for $\mathbb{E}[H(\mathbf{Z})\,|\,\mathcal{F}_{-K}]$ instead of $H(\mathbf{Z})$. Combining this with (15) will then yield (12).

Let $N := n(K+1)$. By definition, $\mathbb{E}[H(\mathbf{Z})\,|\,\mathcal{F}_{-K}]$ is $\mathcal{F}_{-K}$-measurable and hence there exists [30, Lemma 1.13] a measurable function $g_K\colon \mathbb{R}^N \to \mathbb{R}$ such that $\mathbb{E}[H(\mathbf{Z})\,|\,\mathcal{F}_{-K}] = g_K(\mathbf{Z}_0, \ldots, \mathbf{Z}_{-K})$. Furthermore,
$$\int_{\mathbb{R}^N} |g_K(\mathbf{z})|^p\,\mu_K(d\mathbf{z}) = \mathbb{E}\big[|\mathbb{E}[H(\mathbf{Z})\,|\,\mathcal{F}_{-K}]|^p\big] \leq \mathbb{E}[|H(\mathbf{Z})|^p] < \infty,$$
by standard properties of conditional expectations (see, for instance, [33, Theorem 5.1.4]) and the assumption that $H(\mathbf{Z}) \in L^p(\Omega, \mathcal{F}, \mathbb{P})$. Thus, $g_K \in L^p(\mathbb{R}^N, \mu_K)$ and using the statement (14) established in Step 1, there exists $h \in \mathrm{Pol}_N$ such that
$$\|\mathbb{E}[H(\mathbf{Z})\,|\,\mathcal{F}_{-K}] - h(\mathbf{Z}_0^{\top}, \ldots, \mathbf{Z}_{-K}^{\top})\|_p = \|g_K - h\|_{L^p(\mathbb{R}^N, \mu_K)} < \frac{\varepsilon}{2}. \qquad (16)$$

Define now a reservoir system of the type (10) with inputs given by the random variables $\mathbf{Z}_t$, $t \in \mathbb{Z}_-$, and reservoir matrices $A \in M_N$ and $c \in M_{N,n}$ with all entries equal to $0$ except $A_{i,i-n} = 1$ for $i = n+1, \ldots, N$ and $c_{i,i} = 1$ for $i = 1, \ldots, n$, that is
$$A = \begin{bmatrix} 0_{n,nK} & 0_{n,n} \\ I_{nK} & 0_{nK,n} \end{bmatrix}, \quad \text{and} \quad c = \begin{bmatrix} I_n \\ 0_{nK,n} \end{bmatrix}.$$
This system has the echo state property (all the eigenvalues of $A$ equal zero) and has a unique causal and time-invariant solution associated to the reservoir states $\mathbf{X}_t := \big(\mathbf{Z}_t^{\top}, \mathbf{Z}_{t-1}^{\top}, \ldots, \mathbf{Z}_{t-K}^{\top}\big)^{\top}$, $t \in \mathbb{Z}_-$. It is easy to verify that the corresponding reservoir functional is given by
$$H^{A,c}_h(\mathbf{Z}) = h(\mathbf{Z}_0^{\top}, \ldots, \mathbf{Z}_{-K}^{\top}). \qquad (17)$$
Now the triangle inequality and (15), (16) and (17) allow us to conclude (12).

The statement in (13) in the presence of the stationarity hypothesis for $\mathbf{Z}$ is a straightforward consequence of (7) and the equality (8).
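The shift-register construction in Step 2 can be checked numerically: with the matrices $A$ and $c$ above, the reservoir state stacks the last $K+1$ inputs, so the reservoir functional reduces to $h(\mathbf{Z}_0^{\top}, \ldots, \mathbf{Z}_{-K}^{\top})$ as in (17). The following short sketch verifies this on a random input; the dimensions and the quadratic readout are illustrative choices.

```python
import numpy as np

n, K = 2, 3
N = n * (K + 1)
# shift-register reservoir: identity block I_{nK} below the diagonal of A, c stacks I_n on zeros
A = np.zeros((N, N))
A[n:, : n * K] = np.eye(n * K)
c = np.vstack([np.eye(n), np.zeros((n * K, n))])

rng = np.random.default_rng(0)
T = 20
z = rng.normal(size=(T, n))

x = np.zeros(N)
for t in range(T):                      # x_t = A x_{t-1} + c z_t
    x = A @ x + c @ z[t]

# after at least K+1 steps the state equals (z_t, z_{t-1}, ..., z_{t-K})
stacked = np.concatenate([z[T - 1 - k] for k in range(K + 1)])
assert np.allclose(x, stacked)

h = lambda v: float(np.sum(v ** 2))     # an example polynomial readout h in Pol_N
print(h(x))
```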
Remark III.2. It is important to point out that the reservoir systems used in the proof of Proposition III.1 all have finite memory. Thus, this proof shows that it is possible to obtain universality in the $L^p$ sense with that type of finite memory systems and that, in particular, they can be used to approximate infinite memory filters. A key ingredient in this statement is, apart from the hypothesis (11), Lemma A.1 in the Appendix. The other universal systems introduced later on in the paper (trigonometric state-affine systems and echo state networks) also share this feature. Similar statements have also been proved for linear reservoir systems with polynomial readouts and state-affine systems with linear readouts in the $L^{\infty}$ setup for both deterministic and almost surely uniformly bounded stochastic inputs (see, for instance, [25, Corollary 11, Theorem 19]). This phenomenon has also been observed in the context of the approximation of deterministic filters using Volterra series operators (see [13, Theorems 3 and 4]).
Remark III.3. A simple situation in which condition (11) is satisfied is when for any $t \in \mathbb{Z}_-$ the random variable $\mathbf{Z}_t$ is bounded, i.e. for any $t \in \mathbb{Z}_-$ there exists $C_t \geq 0$ such that $\|\mathbf{Z}_t\| \leq C_t$, $\mathbb{P}$-a.s. However, as the next remark shows, there are also practically relevant examples of input streams with unbounded support for which (11) is satisfied.
Remark III.4. A sufficient condition for (11) to hold is that the random variables $\{\mathbf{Z}_t : t \in \mathbb{Z}_-\}$ are independent and that for each $t$, there exists a constant $\alpha > 0$ such that $\mathbb{E}[\exp(\alpha\sum_{i=1}^{n}|Z^{(i)}_t|)] < \infty$. This last condition is satisfied, for instance, if $\mathbf{Z}_t$ is normally distributed. For input streams coming from more heavy-tailed distributions like Student's t-distribution, the condition is not satisfied and so one should use the reservoir systems considered below (see Corollary III.8, Theorem III.9, and Theorem III.10) instead if universality is needed.
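For instance, when the components $Z^{(i)}_t$ are i.i.d. standard Gaussian, the exponential moment required above can be checked directly (a short computation added here for illustration, not part of the original text): for any $\alpha > 0$ and $Z \sim N(0,1)$,
$$\mathbb{E}\big[e^{\alpha|Z|}\big] = 2\int_0^{\infty} e^{\alpha z}\,\frac{e^{-z^2/2}}{\sqrt{2\pi}}\,dz = 2e^{\alpha^2/2}\,\Phi(\alpha) \leq 2e^{\alpha^2/2} < \infty,$$
so that, by independence of the components, $\mathbb{E}[\exp(\alpha\sum_{i=1}^{n}|Z^{(i)}_t|)] = \prod_{i=1}^{n}\mathbb{E}[e^{\alpha|Z^{(i)}_t|}] < \infty$.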
Remark III.5. Assumption (11) can be replaced by alternative assumptions but it cannot be removed. Even if $n = 1$ and $\{Z_t : t \in \mathbb{Z}_-\}$ are independent and identically distributed with distribution $\nu$, a condition stronger than the existence of moments of all orders for $\nu$ is required. As a counterexample, one may take for $\nu$ a lognormal distribution. Then $\nu$ has moments of all orders, but (11) is not satisfied. Let us now argue that the approximation result proved under assumption (11) fails in this case. The following argument relies on results for the classical moment problem (see, for example, the collection of references in [34]).

Indeed, by [35] $\nu$ is not determinate (there exist other probability measures with identical moments) and thus (see e.g. [36, Theorem 4.3]) $\mathrm{Pol}_1$ is not dense in $L^p(\mathbb{R}, \nu)$ for $p \geq 2$. In particular, there exist $g \in L^p(\mathbb{R}, \nu)$ and $\varepsilon > 0$ such that $\|g - \tilde{h}\|_p > \varepsilon$ for all $\tilde{h} \in \mathrm{Pol}_1$. Suppose that we are in the case $n = 1$ and let $\{Z_t : t \in \mathbb{Z}_-\}$ be independent and identically distributed with distribution $\nu$ and $H(\mathbf{z}) := g(z_0)$ for $\mathbf{z} \in \mathbb{R}^{\mathbb{Z}_-}$. Then, for any choice of $N$, $A$, $c$ and $h$ one has $\mathbb{E}[H^{A,c}_h(\mathbf{Z})\,|\,\mathcal{F}_0] = \tilde{h}(Z_0)$, where $\tilde{h}(x) := \mathbb{E}[h(A\mathbf{X}_{-1} + cx)]$, $x \in \mathbb{R}$, is a polynomial. Thus one may use [33, Theorem 5.1.4] and the fact that by construction $H(\mathbf{Z})$ is $\mathcal{F}_0$-measurable to obtain
$$\|H(\mathbf{Z}) - H^{A,c}_h(\mathbf{Z})\|_p \geq \|\mathbb{E}[H(\mathbf{Z})\,|\,\mathcal{F}_0] - \mathbb{E}[H^{A,c}_h(\mathbf{Z})\,|\,\mathcal{F}_0]\|_p = \|g - \tilde{h}\|_p > \varepsilon.$$
Remark III.6. In previous reservoir computing universality results for both deterministic and stochastic inputs quoted in the introduction there was an important continuity hypothesis called the fading memory property that does not play a role here and that has been replaced by the integrability requirement $H \in L^p((\mathbb{R}^n)^{\mathbb{Z}_-}, \mu_{\mathbf{Z}})$. In particular, the universality results that we just proved and those that come in the next section (see Theorem III.9) yield approximations for filters which do not necessarily have the fading memory property. Whether or not the approximation results apply depends on the integrability condition with respect to the input environment measure $\mu_{\mathbf{Z}}$. Consider, for example, the functional associated to the peak-hold operator [13]. In the discrete-time setting, the associated functional is
$$H(\mathbf{z}) = \sup_{t \leq 0}\{z_t\}, \quad \text{with } \mathbf{z} \in \mathbb{R}^{\mathbb{Z}_-}.$$
We now show that the two possibilities $H \in L^p((\mathbb{R}^n)^{\mathbb{Z}_-}, \mu_{\mathbf{Z}})$ and $H \notin L^p((\mathbb{R}^n)^{\mathbb{Z}_-}, \mu_{\mathbf{Z}})$ are feasible, depending on the choice of $\mu_{\mathbf{Z}}$:

- Let $\mathbf{Z} = (Z_t)_{t \in \mathbb{Z}_-}$ be one-dimensional independent and identically distributed (i.i.d.) random variables with unbounded support and denote by $\mu_{\mathbf{Z}}$ the law of $\mathbf{Z}$ on $\mathbb{R}^{\mathbb{Z}_-}$. Denoting by $F$ the distribution function of $Z_{-1}$ and using the i.i.d. assumption one calculates, for any $a \in \mathbb{R}$,
$$\mathbb{P}(H(\mathbf{Z}) > a) = 1 - \mathbb{P}\big(\cap_{t \leq 0}\{Z_t \leq a\}\big) = 1 - \lim_{n \to \infty} F(a)^n = 1.$$
Hence, we can conclude that $H(\mathbf{Z}) = \infty$, $\mu_{\mathbf{Z}}$-almost everywhere and therefore $H \notin L^p((\mathbb{R}^n)^{\mathbb{Z}_-}, \mu_{\mathbf{Z}})$.

- Consider now the same setup, but assume this time that the random variables have bounded support, that is, for some $a_{\max} \in \mathbb{R}$ one has that $\mathbb{P}(Z_t \leq a_{\max}) = 1$ and $\mathbb{P}(Z_t > a_{\max}) = 0$. Then, the same argument shows that $H(\mathbf{Z}) = a_{\max}$, $\mu_{\mathbf{Z}}$-almost everywhere and therefore $H \in L^p((\mathbb{R}^n)^{\mathbb{Z}_-}, \mu_{\mathbf{Z}})$.
Remark III.7. From the proof of Proposition III.1 one sees that one could replace in its statement $\mathrm{Pol}_N$ by any other family $\{\mathcal{H}_N\}_{N \in \mathbb{N}}$ that satisfies the density statement (14). In particular, the following corollary shows that this result can be obtained with readouts made out of neural networks.

Denote by $\mathcal{H}_N$ the set of feedforward one-hidden-layer neural networks with inputs in $\mathbb{R}^N$ that are constructed with a fixed activation function $\sigma$. More specifically, $\mathcal{H}_N$ is made of functions $h\colon \mathbb{R}^N \to \mathbb{R}$ of the type
$$h(\mathbf{x}) = \sum_{j=1}^{k} \beta_j\,\sigma(\boldsymbol{\alpha}_j \cdot \mathbf{x} - \theta_j), \qquad (18)$$
for some $k \in \mathbb{N}$, $\beta_j, \theta_j \in \mathbb{R}$, and $\boldsymbol{\alpha}_j \in \mathbb{R}^N$, for $j = 1, \ldots, k$.
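A direct, vectorized transcription of (18) is sketched below for illustration; the weights would come from the density argument in Corollary III.8 rather than from training, and the use of tanh (bounded and non-constant) is only one admissible choice of $\sigma$.

```python
import numpy as np

def nn_readout(x, alpha, beta, theta, sigma=np.tanh):
    """One-hidden-layer readout h(x) = sum_j beta_j * sigma(alpha_j . x - theta_j), as in (18).

    alpha: (k, N) weight matrix, beta: (k,) output weights, theta: (k,) thresholds."""
    return float(beta @ sigma(alpha @ x - theta))
```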
Corollary III.8. In the setup of Proposition III.1, consider the family of neural networks $h \in \mathcal{H}_N$ constructed with a fixed activation function $\sigma$ that is bounded and non-constant. Then, for any $\varepsilon > 0$ there exist $N \in \mathbb{N}$, $A \in M_N$, $c \in M_{N,n}$, and a neural network $h \in \mathcal{H}_N$ such that the corresponding reservoir system (9) has the echo state property and has a unique causal and time-invariant filter associated. Moreover, the corresponding functional satisfies that $H^{A,c}_h(\mathbf{Z}) \in L^p(\Omega, \mathcal{F}, \mathbb{P})$ and
$$\|H(\mathbf{Z}) - H^{A,c}_h(\mathbf{Z})\|_p < \varepsilon. \qquad (19)$$

Proof. By [6, Theorem 1] the set $\mathcal{H}_N$ is dense in $L^p(\mathbb{R}^N, \mu)$ for any finite measure $\mu$ on $\mathbb{R}^N$. Thus, statement (14) holds with $\mathcal{H}_N$ replacing $\mathrm{Pol}_{n(K+1)}$. Mimicking line by line the proof of Step 2 in Proposition III.1 then proves the Corollary.
B. Trigonometric state-affine systems with linear readouts
Fix $M, N \in \mathbb{N}$ and consider $R\colon \mathbb{R}^n \to M_{N,M}$ defined by
$$R(\mathbf{z}) := \sum_{k=1}^{r} A_k\cos(\mathbf{u}_k \cdot \mathbf{z}) + B_k\sin(\mathbf{v}_k \cdot \mathbf{z}), \quad \mathbf{z} \in \mathbb{R}^n, \qquad (20)$$
for some $r \in \mathbb{N}$, $A_k, B_k \in M_{N,M}$, $\mathbf{u}_k, \mathbf{v}_k \in \mathbb{R}^n$, for $k = 1, \ldots, r$. The symbol $\mathrm{Trig}_{N,M}$ denotes the set of all functions of the type (20). We call the elements of $\mathrm{Trig}_{N,M}$ trigonometric polynomials.
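A matrix-valued trigonometric polynomial of the form (20) can be evaluated directly; the short sketch below (with illustrative random coefficients) is only meant to fix the indexing conventions used here.

```python
import numpy as np

def trig_poly(z, A, B, U, V):
    """Evaluate R(z) = sum_k A_k cos(u_k . z) + B_k sin(v_k . z) as in (20).

    A, B: (r, N, M) coefficient matrices; U, V: (r, n) frequency vectors; z: (n,)."""
    return np.einsum('kij,k->ij', A, np.cos(U @ z)) + np.einsum('kij,k->ij', B, np.sin(V @ z))

# illustrative coefficients for an element of Trig_{N,M} with r = 2 terms
rng = np.random.default_rng(0)
r, N, M, n = 2, 3, 3, 2
A, B = rng.normal(size=(r, N, M)), rng.normal(size=(r, N, M))
U, V = rng.normal(size=(r, n)), rng.normal(size=(r, n))
R0 = trig_poly(rng.normal(size=n), A, B, U, V)   # an N x M matrix
```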
We now introduce reservoir systems with linear readouts and reservoir maps constructed using trigonometric polynomials: let $N \in \mathbb{N}$, $\mathbf{w} \in \mathbb{R}^N$, $P \in \mathrm{Trig}_{N,N}$, $Q \in \mathrm{Trig}_{N,1}$ and define, for any $\mathbf{z} \in (\mathbb{R}^n)^{\mathbb{Z}_-}$, the system:
$$\begin{cases} \mathbf{x}_t = P(\mathbf{z}_t)\mathbf{x}_{t-1} + Q(\mathbf{z}_t), & t \in \mathbb{Z}_-, \\ y_t = \mathbf{w}^{\top}\mathbf{x}_t, & t \in \mathbb{Z}_-. \end{cases} \qquad (21)$$
We call the systems of this type trigonometric state-affine systems. When such a system has the echo state property and a unique causal and time-invariant solution for any input, we denote by $U^{P,Q}_{\mathbf{w}}$ the corresponding filter and by $H^{P,Q}_{\mathbf{w}}(\mathbf{z}) := y_0$ the associated functional. As in the previous section, we fix $p \in [1, \infty)$, $\mathbf{Z}$ an input process, and a functional $H$ such that $H(\mathbf{Z}) \in L^p(\Omega, \mathcal{F}, \mathbb{P})$ and we are interested in approximating $H(\mathbf{Z})$ by systems of the form $H^{P,Q}_{\mathbf{w}}(\mathbf{Z})$. Again, we will write $H^{P,Q}_{\mathbf{w}}(\mathbf{Z}) = Y_0$, where $Y_0$ is uniquely determined by the reservoir system with stochastic inputs
$$\begin{cases} \mathbf{X}_t = P(\mathbf{Z}_t)\mathbf{X}_{t-1} + Q(\mathbf{Z}_t), & t \in \mathbb{Z}_-, \\ Y_t = \mathbf{w}^{\top}\mathbf{X}_t, & t \in \mathbb{Z}_-. \end{cases} \qquad (22)$$
Define $\mathcal{A}$ as the set of four-tuples $(N, \mathbf{w}, P, Q) \in \mathbb{N} \times \mathbb{R}^N \times \mathrm{Trig}_{N,N} \times \mathrm{Trig}_{N,1}$ whose associated systems (21) have the echo state property and whose unique solutions are causal and time-invariant. In particular, for such $(N, \mathbf{w}, P, Q)$ a reservoir functional $H^{P,Q}_{\mathbf{w}}$ associated to (21) exists.
Theorem III.9. Let $p \in [1, \infty)$ and let $\mathbf{Z}$ be a fixed $\mathbb{R}^n$-valued input process. Denote by $\mathcal{L}_{\mathbf{Z}}$ the set of reservoir functionals of the type (21) which are $p$-integrable, that is,
$$\mathcal{L}_{\mathbf{Z}} := \{H^{P,Q}_{\mathbf{w}}(\mathbf{Z}) : (N, \mathbf{w}, P, Q) \in \mathcal{A}\} \cap L^p(\Omega, \mathcal{F}, \mathbb{P}).$$
Then $\mathcal{L}_{\mathbf{Z}}$ is dense in $L^p(\Omega, \mathcal{F}_{-\infty}, \mathbb{P})$.

In particular, for any functional $H$ such that $H(\mathbf{Z}) \in L^p(\Omega, \mathcal{F}, \mathbb{P})$ and any $\varepsilon > 0$, there exist $N \in \mathbb{N}$, $\mathbf{w} \in \mathbb{R}^N$, $P \in \mathrm{Trig}_{N,N}$ and $Q \in \mathrm{Trig}_{N,1}$ such that the system (21) has the echo state property and causal and time-invariant solutions. Moreover, $H^{P,Q}_{\mathbf{w}}(\mathbf{Z}) \in L^p(\Omega, \mathcal{F}, \mathbb{P})$ and
$$\|H(\mathbf{Z}) - H^{P,Q}_{\mathbf{w}}(\mathbf{Z})\|_p < \varepsilon. \qquad (23)$$
If the input process $\mathbf{Z}$ is stationary then
$$\|U_H(\mathbf{Z}) - U^{P,Q}_{\mathbf{w}}(\mathbf{Z})\|_p < \varepsilon. \qquad (24)$$
Proof. We first argue that $\mathcal{L}_{\mathbf{Z}}$ is a linear subspace of $L^p(\Omega, \mathcal{F}_{-\infty}, \mathbb{P})$. To do this we need to introduce some notation. Given $A \in M_{N_1,M_1}$, $B \in M_{N_2,M_2}$, we denote by $A \oplus B \in M_{N_1+N_2,M_1+M_2}$ the direct sum. Given $R$ as in (20) we define $R \oplus A \in \mathrm{Trig}_{N+N_1,M+M_1}$ by
$$(R \oplus A)(\mathbf{z}) := \sum_{k=1}^{r} (A_k \oplus A)\cos(\mathbf{u}_k \cdot \mathbf{z}) + (B_k \oplus A)\sin(\mathbf{v}_k \cdot \mathbf{z}),$$
and (with the analogous definition for $B \oplus R$) for $R_i \in \mathrm{Trig}_{N_i,M_i}$, $i = 1, 2$, we set
$$R_1 \oplus R_2 = R_1 \oplus 0_{N_2,M_2} + 0_{N_1,M_1} \oplus R_2.$$
One easily verifies that for $\lambda \in \mathbb{R}$ and $(N_i, \mathbf{w}_i, P_i, Q_i) \in \mathcal{A}$, $i = 1, 2$, one has that
$$(N_1 + N_2, \mathbf{w}_1 \oplus \lambda\mathbf{w}_2, P_1 \oplus P_2, Q_1 \oplus Q_2) \in \mathcal{A},$$
$$H^{P_1,Q_1}_{\mathbf{w}_1}(\mathbf{Z}) + \lambda H^{P_2,Q_2}_{\mathbf{w}_2}(\mathbf{Z}) = H^{P_1 \oplus P_2,\,Q_1 \oplus Q_2}_{\mathbf{w}_1 \oplus \lambda\mathbf{w}_2}(\mathbf{Z}).$$
This shows that $\mathcal{L}_{\mathbf{Z}}$ is indeed a linear subspace of $L^p(\Omega, \mathcal{F}_{-\infty}, \mathbb{P})$.
Secondly, in order to show that $\mathcal{L}_{\mathbf{Z}}$ is dense in $L^p(\Omega, \mathcal{F}_{-\infty}, \mathbb{P})$, it suffices to prove that if $F \in L^q(\Omega, \mathcal{F}_{-\infty}, \mathbb{P})$ satisfies $\mathbb{E}[FH] = 0$ for all $H \in \mathcal{L}_{\mathbf{Z}}$, then $F = 0$, $\mathbb{P}$-almost surely. Here $q \in (1, \infty]$ is the Hölder conjugate exponent of $p$. This can be shown by contraposition. Suppose that $\mathcal{L}_{\mathbf{Z}}$ is not dense in $L^p(\Omega, \mathcal{F}_{-\infty}, \mathbb{P})$. Since $\mathcal{L}_{\mathbf{Z}}$ is a linear subspace, by the Hahn-Banach theorem there exists a bounded linear functional $\Lambda$ on $L^p(\Omega, \mathcal{F}_{-\infty}, \mathbb{P})$ such that $\Lambda(H) = 0$ for all $H \in \mathcal{L}_{\mathbf{Z}}$, but $\Lambda \neq 0$, see e.g. [37, Theorem 5.19]. Then by [37, Theorem 6.16] there exists $F \in L^q(\Omega, \mathcal{F}_{-\infty}, \mathbb{P})$ such that $\Lambda(H) = \mathbb{E}[FH]$ for all $H \in L^p(\Omega, \mathcal{F}_{-\infty}, \mathbb{P})$ and $F \neq 0$, since $\Lambda \neq 0$. In particular, there exists $F \in L^q(\Omega, \mathcal{F}_{-\infty}, \mathbb{P}) \setminus \{0\}$ such that $\mathbb{E}[FH] = 0$ for all $H \in \mathcal{L}_{\mathbf{Z}}$.
Thirdly, suppose that $F \in L^q(\Omega, \mathcal{F}_{-\infty}, \mathbb{P})$ satisfies
$$\mathbb{E}[FH] = 0 \quad \text{for all } H \in \mathcal{L}_{\mathbf{Z}}. \qquad (25)$$
If we show that $F = 0$, $\mathbb{P}$-almost surely, then the statement in the theorem follows by the argument in the second step. In order to prove that $F = 0$, $\mathbb{P}$-almost surely, we first show that (25) implies the following statement: for any $K \in \mathbb{N}$, any subset $I \subset \mathcal{I}_K := \{0, \ldots, K\}$, and any $\mathbf{u}_0, \ldots, \mathbf{u}_K \in \mathbb{R}^n$ it holds that
$$\mathbb{E}\Big[F \prod_{j \in I}\sin(\mathbf{u}_j \cdot \mathbf{Z}_{-j}) \prod_{k \in \mathcal{I}_K \setminus I}\cos(\mathbf{u}_k \cdot \mathbf{Z}_{-k})\Big] = 0. \qquad (26)$$
We prove this claim by induction on $K \in \mathbb{N}$. For $K = 0$, one sets $Q_1(\mathbf{z}) := \cos(\mathbf{u}_0 \cdot \mathbf{z})$ and $Q_2(\mathbf{z}) := \sin(\mathbf{u}_0 \cdot \mathbf{z})$ and notices that $(1, 1, 0, Q_i) \in \mathcal{A}$. Moreover, since the sine and cosine functions are bounded, it is easy to see that $Q_i(\mathbf{Z}_0) = H^{0,Q_i}_1(\mathbf{Z}) \in \mathcal{L}_{\mathbf{Z}}$, for $i \in \{1, 2\}$. Thus (25) implies (26) and so the statement holds for $K = 0$. For the induction step, let $K \in \mathbb{N} \setminus \{0\}$ and assume the implication holds for $K - 1$. We now fix $I$ and $\mathbf{u}_0, \ldots, \mathbf{u}_K \in \mathbb{R}^n$ as above and prove (26). To simplify the notation we define for $k \in \{0, \ldots, K\}$ and $\mathbf{z} \in \mathbb{R}^n$ the function $g_k$ by
$$g_k(\mathbf{z}) := \begin{cases} \sin(\mathbf{u}_k \cdot \mathbf{z}), & \text{if } k \in I, \\ \cos(\mathbf{u}_k \cdot \mathbf{z}), & \text{if } k \in \mathcal{I}_K \setminus I. \end{cases}$$
To prove (26), we set $N := K + 1$ and, for $j \in \{1, \ldots, K\}$, define $A_j \in M_N$ with all entries equal to $0$ except $(A_j)_{j+1,j} = 1$, that is, $(A_j)_{k,l} = \delta_{k,j+1}\,\delta_{l,j}$, $k, l \in \{1, \ldots, N\}$. Define now for $\mathbf{z} \in \mathbb{R}^n$
$$P(\mathbf{z}) := \sum_{j=0}^{K-1} A_{K-j}\,g_j(\mathbf{z}), \qquad Q(\mathbf{z}) := e_1\,g_K(\mathbf{z}), \qquad \mathbf{w} := e_{K+1}, \qquad (27)$$
where $e_j$ is the $j$-th unit vector in $\mathbb{R}^N$, that is, the only non-zero entry of $e_j$ is a $1$ in the $j$-th coordinate. By Lemma A.2 in the appendix, one has $A_{j_L} \cdots A_{j_0} = 0$ for any $j_0, \ldots, j_L \in \{1, \ldots, K\}$ and $L \geq K$, since $j_L = j_0 + L$ cannot be satisfied. In other words, any product of more than $K$ factors of matrices $A_j$ is equal to $0$ and thus for any $L \in \mathbb{N}$ with $L \geq K$ and any $\mathbf{z}_0, \ldots, \mathbf{z}_L \in \mathbb{R}^n$ one has $P(\mathbf{z}_0) \cdots P(\mathbf{z}_L) = 0$. Using this fact and iterating (21), one obtains that the trigonometric state-affine system defined by the elements in (27) has a unique solution given by
$$\mathbf{x}_t = Q(\mathbf{z}_t) + \sum_{j=1}^{K} P(\mathbf{z}_t) \cdots P(\mathbf{z}_{t-j+1})\,Q(\mathbf{z}_{t-j}). \qquad (28)$$
In particular $(N, \mathbf{w}, P, Q) \in \mathcal{A}$ and
$$H^{P,Q}_{\mathbf{w}}(\mathbf{Z}) = \mathbf{w}^{\top}\mathbf{X}_0 = \mathbf{w}^{\top}\Big(Q(\mathbf{Z}_0) + \sum_{j=1}^{K} P(\mathbf{Z}_0) \cdots P(\mathbf{Z}_{-j+1})\,Q(\mathbf{Z}_{-j})\Big). \qquad (29)$$
The finiteness of the sum in (29) and the boundedness of the trigonometric polynomials implies that $H^{P,Q}_{\mathbf{w}}(\mathbf{Z}) \in \mathcal{L}_{\mathbf{Z}}$.

We conclude the proof of the induction step with the following chain of equalities that uses (25) in the first one, the representation (29) in the second one, and the choice of the vector $\mathbf{w}$ and the induction hypothesis in the last step:
$$0 = \mathbb{E}[F H^{P,Q}_{\mathbf{w}}(\mathbf{Z})] = \mathbb{E}[F \mathbf{w}^{\top}Q(\mathbf{Z}_0)] + \mathbb{E}\Big[F \mathbf{w}^{\top}\sum_{j=1}^{K} P(\mathbf{Z}_0) \cdots P(\mathbf{Z}_{-j+1})\,Q(\mathbf{Z}_{-j})\Big] = \mathbb{E}\big[F \mathbf{w}^{\top}P(\mathbf{Z}_0) \cdots P(\mathbf{Z}_{-K+1})\,Q(\mathbf{Z}_{-K})\big]. \qquad (30)$$
However, again by Lemma A.2 in the appendix, the only non-zero product of matrices $A_{j_{K-1}} \cdots A_{j_0}$ for $j_0, \ldots, j_{K-1} \in \{1, \ldots, K\}$ occurs when $j_k = k + 1$ for $k \in \{0, \ldots, K-1\}$. Therefore:
$$P(\mathbf{Z}_0) \cdots P(\mathbf{Z}_{-K+1}) = A_K\,g_0(\mathbf{Z}_0)\,A_{K-1}\,g_1(\mathbf{Z}_{-1}) \cdots A_1\,g_{K-1}(\mathbf{Z}_{-K+1}).$$
Combining this with (30) and using the identity (49) in Lemma A.2 in the appendix one obtains
$$0 = \mathbb{E}\Big[F\,e_{K+1}^{\top}A_K \cdots A_1 e_1 \prod_{k=0}^{K} g_k(\mathbf{Z}_{-k})\Big] = \mathbb{E}\Big[F \prod_{k=0}^{K} g_k(\mathbf{Z}_{-k})\Big],$$
which is the same as (26).
Fourthly, by standard trigonometric identities, the identity (26) established in the third step implies that for any $K \in \mathbb{N}$,
$$\mathbb{E}\Big[F\exp\Big(i\sum_{j=0}^{K}\mathbf{u}_j \cdot \mathbf{Z}_{-j}\Big)\Big] = 0 \quad \text{for all } \mathbf{u}_0, \ldots, \mathbf{u}_K \in \mathbb{R}^n. \qquad (31)$$
We claim that (31) implies $F = 0$, $\mathbb{P}$-almost surely and hence the statement in the theorem follows. This fact is a consequence of the uniqueness theorem for characteristic functions (which is ultimately a consequence of the Stone-Weierstrass approximation theorem). See for instance [30, Theorem 4.3] and the text below that result. To prove $F = 0$, $\mathbb{P}$-almost surely, we denote by $F^+$ and $F^-$ the positive and negative parts of $F$. Then by (31) one has $\mathbb{E}[F] = 0$, necessarily. Thus, if it does not hold that $F = 0$, $\mathbb{P}$-almost surely, then $c := \mathbb{E}[F^+] = \mathbb{E}[F^-] > 0$ and one may define probability measures $\mathbb{Q}^+$ and $\mathbb{Q}^-$ on $(\Omega, \mathcal{F})$ by setting $\mathbb{Q}^+(A) := c^{-1}\mathbb{E}[F^+\mathbf{1}_A]$ and $\mathbb{Q}^-(A) := c^{-1}\mathbb{E}[F^-\mathbf{1}_A]$ for $A \in \mathcal{F}$. Denote by $\mu^+_K$ and $\mu^-_K$ the law in $\mathbb{R}^{n(K+1)}$ of the random variable
$$\mathbf{Z}^K := (\mathbf{Z}_0^{\top}, \mathbf{Z}_{-1}^{\top}, \ldots, \mathbf{Z}_{-K}^{\top})^{\top}$$
under $\mathbb{Q}^+$ and $\mathbb{Q}^-$. Then, the statement (31) implies that for all $\mathbf{u} \in \mathbb{R}^{n(K+1)}$,
$$\int_{\mathbb{R}^{n(K+1)}} \exp(i\mathbf{u} \cdot \mathbf{z})\,\mu^+_K(d\mathbf{z}) = \int_{\mathbb{R}^{n(K+1)}} \exp(i\mathbf{u} \cdot \mathbf{z})\,\mu^-_K(d\mathbf{z}).$$
By the uniqueness theorem for characteristic functions (see e.g. [30, Theorem 4.3] and the text below) this implies that $\mu^+_K = \mu^-_K$. Translating this statement back to random variables, this means that for any bounded and measurable function $g\colon \mathbb{R}^{n(K+1)} \to \mathbb{R}$ one has
$$0 = c\,\mathbb{E}_{\mathbb{Q}^+}[g(\mathbf{Z}^K)] - c\,\mathbb{E}_{\mathbb{Q}^-}[g(\mathbf{Z}^K)] = \mathbb{E}[F\,g(\mathbf{Z}^K)],$$
which, by definition, means that $\mathbb{E}[F\,|\,\mathcal{F}_{-K}] = 0$, $\mathbb{P}$-almost surely. Since $K \in \mathbb{N}$ was arbitrary and $F \in L^1(\Omega, \mathcal{F}_{-\infty}, \mathbb{P})$, one may combine this with $\lim_{t \to -\infty}\mathbb{E}[F\,|\,\mathcal{F}_t] = F$, $\mathbb{P}$-almost surely (see Lemma A.1) to conclude $F = 0$, as desired.

The statement in (24) in the presence of the stationarity hypothesis for $\mathbf{Z}$ is a straightforward consequence of (7) and the equality (8).
We emphasize that the use in the proof of the theorem of nilpotent matrices of the type introduced in Lemma A.2 ensures that the echo state property is automatically satisfied (see (28)).
C. Echo state networks
We now turn to showing the universality in the $L^p$ sense of the most widely used reservoir systems with linear readouts, namely, echo state networks. An echo state network is a RC system determined by
$$\begin{cases} \mathbf{x}_t = \sigma(A\mathbf{x}_{t-1} + C\mathbf{z}_t + \boldsymbol{\zeta}), \\ y_t = \mathbf{w}^{\top}\mathbf{x}_t, \end{cases} \qquad (32)$$
for $A \in M_N$, $C \in M_{N,n}$, $\boldsymbol{\zeta} \in \mathbb{R}^N$, and $\mathbf{w} \in \mathbb{R}^N$. As is customary in the neural networks literature, the map $\sigma\colon \mathbb{R}^N \to \mathbb{R}^N$ is obtained via the componentwise application of a given activation function $\sigma\colon \mathbb{R} \to \mathbb{R}$ that is denoted with the same symbol.

If this system has the echo state property and the resulting filter is causal and time-invariant, we write $H^{A,C,\zeta}_{\mathbf{w}}(\mathbf{z}) := y_0$ for the associated functional.
Theorem III.10. Fix $p \in [1, \infty)$, let $\mathbf{Z}$ be a fixed $\mathbb{R}^n$-valued input process, and let $H$ be a functional such that $H(\mathbf{Z}) \in L^p(\Omega, \mathcal{F}, \mathbb{P})$. Suppose that the activation function $\sigma\colon \mathbb{R} \to \mathbb{R}$ is non-constant, continuous, and has a bounded image. Then for any $\varepsilon > 0$, there exist $N \in \mathbb{N}$, $C \in M_{N,n}$, $\boldsymbol{\zeta} \in \mathbb{R}^N$, $A \in M_N$, $\mathbf{w} \in \mathbb{R}^N$ such that (32) has the echo state property, the corresponding filter is causal and time-invariant, the associated functional satisfies $H^{A,C,\zeta}_{\mathbf{w}}(\mathbf{Z}) \in L^p(\Omega, \mathcal{F}, \mathbb{P})$ and
$$\|H(\mathbf{Z}) - H^{A,C,\zeta}_{\mathbf{w}}(\mathbf{Z})\|_p < \varepsilon. \qquad (33)$$
Proof. First, by Corollary III.8 and (17) there exist $K, \bar{N} \in \mathbb{N}$, $\bar{\mathbf{w}} \in \mathbb{R}^{\bar{N}}$, $\bar{A} \in M_{\bar{N}, n(K+1)}$, and $\bar{\boldsymbol{\zeta}} \in \mathbb{R}^{\bar{N}}$ such that the neural network
$$h(\mathbf{z}) = \bar{\mathbf{w}}^{\top}\sigma(\bar{A}\mathbf{z} + \bar{\boldsymbol{\zeta}})$$
satisfies
$$\|H(\mathbf{Z}) - h(\mathbf{Z}_0^{\top}, \ldots, \mathbf{Z}_{-K}^{\top})\|_p < \frac{\varepsilon}{2}. \qquad (34)$$
Notice that we may rewrite $\bar{A}$ as
$$\bar{A} = [A^{(0)}\;A^{(1)}\;\cdots\;A^{(K)}]$$
with $A^{(j)} \in M_{\bar{N},n}$ and
$$H^*(\mathbf{Z}) := h(\mathbf{Z}_0^{\top}, \ldots, \mathbf{Z}_{-K}^{\top}) = \bar{\mathbf{w}}^{\top}\sigma\Big(\sum_{j=0}^{K} A^{(j)}\mathbf{Z}_{-j} + \bar{\boldsymbol{\zeta}}\Big). \qquad (35)$$
Second, by the neural network approximation theorem for continuous functions [6, Theorem 2], for any $m \in \mathbb{N}$ there exists a neural network that uniformly approximates the identity mapping on the hypercube $B_m := \{\mathbf{x} \in \mathbb{R}^n : |x_i| \leq m \text{ for } i = 1, \ldots, n\}$. More specifically, [6, Theorem 2] is formulated for $\mathbb{R}$-valued mappings and we hence apply it componentwise: for any $m \in \mathbb{N}$ and $i = 1, \ldots, n$ there exist $N^{(m)}_i \in \mathbb{N}$, $\mathbf{w}^{(m)}_i \in \mathbb{R}^{N^{(m)}_i}$, $A^{(m)}_i \in M_{N^{(m)}_i,n}$, and $\boldsymbol{\zeta}^{(m)}_i \in \mathbb{R}^{N^{(m)}_i}$, such that for all $i = 1, \ldots, n$ the neural network
$$h^{(m)}_i(\mathbf{x}) = \big(\mathbf{w}^{(m)}_i\big)^{\top}\sigma\big(A^{(m)}_i\mathbf{x} + \boldsymbol{\zeta}^{(m)}_i\big)$$
satisfies
$$\sup_{\mathbf{x} \in B_m}\big\{|h^{(m)}_i(\mathbf{x}) - x_i|\big\} < \frac{1}{m}. \qquad (36)$$
Write $h^{(m)}(\mathbf{x}) = (h^{(m)}_1(\mathbf{x}), \ldots, h^{(m)}_n(\mathbf{x}))^{\top}$ and for $j = 1, \ldots, K$, denote by $[h^{(m)}]^j = h^{(m)} \circ \cdots \circ h^{(m)}$ the $j$-th composition of $h^{(m)}$. We now claim that for all $j = 1, \ldots, K$ and $\mathbf{x} \in \mathbb{R}^n$ it holds that
$$\lim_{m \to \infty}[h^{(m)}]^j(\mathbf{x}) = \mathbf{x}. \qquad (37)$$
Indeed, let us fix $\mathbf{x} \in \mathbb{R}^n$ and argue by induction on $j$. To prove (37) for $j = 1$, let $\varepsilon > 0$ be given and choose $m_0 \in \mathbb{N}$ satisfying $m_0 > \max\{|x_1|, \ldots, |x_n|, 1/\varepsilon\}$. Then, for any $m \geq m_0$ one has $\mathbf{x} \in B_m$ by definition and (36) implies that for $i = 1, \ldots, n$,
$$|h^{(m)}_i(\mathbf{x}) - x_i| < \frac{1}{m} < \varepsilon.$$
Hence (37) indeed holds for $j = 1$. Now let $j \geq 2$ and assume that (37) has been proved for $j - 1$. Define $\mathbf{x}^{(m)} := [h^{(m)}]^{j-1}(\mathbf{x})$. Then, by the induction hypothesis, for any given $\varepsilon > 0$ one finds $m_0 \in \mathbb{N}$ such that for all $m \geq m_0$ and $i = 1, \ldots, n$ it holds that
$$|x^{(m)}_i - x_i| < \frac{\varepsilon}{2}. \qquad (38)$$
Hence, choosing $m_0' \in \mathbb{N}$ with $m_0' > \max(m_0, |x_1| + \frac{\varepsilon}{2}, \ldots, |x_n| + \frac{\varepsilon}{2}, \frac{2}{\varepsilon})$ one obtains from the triangle inequality and (38) that $\mathbf{x}^{(m)} \in B_{m_0'}$ for all $m \geq m_0'$. In particular, for any $m \geq m_0'$ one may use the triangle inequality in the first step, $\mathbf{x}^{(m)} \in B_{m_0'} \subset B_m$ and (38) in the second step, and (36) in the last step to estimate
$$|[h^{(m)}]^j_i(\mathbf{x}) - x_i| \leq |h^{(m)}_i(\mathbf{x}^{(m)}) - x^{(m)}_i| + |x^{(m)}_i - x_i| \leq \sup_{\mathbf{y} \in B_m}\{|h^{(m)}_i(\mathbf{y}) - y_i|\} + \frac{\varepsilon}{2} < \frac{1}{m} + \frac{\varepsilon}{2} < \varepsilon.$$
This proves (37) for all $j = 1, \ldots, K$.
Thirdly, define
$$H_m(\mathbf{Z}) := \bar{\mathbf{w}}^{\top}\sigma\Big(\sum_{j=0}^{K} A^{(j)}[h^{(m)}]^j(\mathbf{Z}_{-j}) + \bar{\boldsymbol{\zeta}}\Big)$$
with the convention $[h^{(m)}]^0(\mathbf{x}) = \mathbf{x}$. Since $\sigma$ is continuous, (37) implies that $\lim_{m \to \infty} H_m(\mathbf{Z}) = H^*(\mathbf{Z})$, $\mathbb{P}$-almost surely, where $H^*$ was defined in (35). Furthermore, by assumption there exists $C > 0$ such that $|\sigma(x)| \leq C$ for all $x \in \mathbb{R}$. Hence one has $|H^*(\mathbf{Z}) - H_m(\mathbf{Z})|^p \leq (2C\sum_{i=1}^{\bar{N}}|\bar{w}_i|)^p$ for all $m \in \mathbb{N}$. Thus one may apply the dominated convergence theorem to obtain
$$\lim_{m \to \infty}\|H^*(\mathbf{Z}) - H_m(\mathbf{Z})\|_p = \lim_{m \to \infty}\mathbb{E}[|H^*(\mathbf{Z}) - H_m(\mathbf{Z})|^p]^{1/p} = 0.$$
In particular, for $m \in \mathbb{N}$ large enough one has $\|H^*(\mathbf{Z}) - H_m(\mathbf{Z})\|_p < \frac{\varepsilon}{2}$ and combining this with the triangle inequality and (34) one obtains
$$\|H(\mathbf{Z}) - H_m(\mathbf{Z})\|_p \leq \|H(\mathbf{Z}) - H^*(\mathbf{Z})\|_p + \|H^*(\mathbf{Z}) - H_m(\mathbf{Z})\|_p < \varepsilon. \qquad (39)$$
To conclude the proof we now fix $m \in \mathbb{N}$ large enough (so that (39) holds) and show that $H_m(\mathbf{Z}) = H^{A,C,\zeta}_{\mathbf{w}}(\mathbf{Z})$ for suitable choices of $A$, $C$, $\boldsymbol{\zeta}$ and $\mathbf{w}$. To do so, first define $N_J := N^{(m)}_1 + \cdots + N^{(m)}_n$ and the block matrices
$$W_J := \begin{bmatrix} (\mathbf{w}^{(m)}_1)^{\top} & & 0 \\ & \ddots & \\ 0 & & (\mathbf{w}^{(m)}_n)^{\top} \end{bmatrix} \in M_{n,N_J}, \quad \boldsymbol{\zeta}_J := \begin{bmatrix} \boldsymbol{\zeta}^{(m)}_1 \\ \vdots \\ \boldsymbol{\zeta}^{(m)}_n \end{bmatrix} \in \mathbb{R}^{N_J}, \quad \text{and} \quad A_J := \begin{bmatrix} A^{(m)}_1 \\ \vdots \\ A^{(m)}_n \end{bmatrix} \in M_{N_J,n}.$$
Furthermore, to emphasize that $m$ is fixed and $h^{(m)}$ approximates the identity, set $J(\mathbf{x}) := h^{(m)}(\mathbf{x})$ and note that
$$J(\mathbf{x}) = W_J\,\sigma(A_J\mathbf{x} + \boldsymbol{\zeta}_J). \qquad (40)$$
Now set $N := KN_J + \bar{N}$ and define the block matrix $A \in M_N$ by
$$A = \begin{bmatrix} 0_{N_J,N_J} & & & & 0 \\ A_J W_J & 0_{N_J,N_J} & & & \\ & A_J W_J & \ddots & & \\ 0 & & \ddots & 0_{N_J,N_J} & \\ A^{(1)}W_J & A^{(2)}W_J & \cdots & A^{(K)}W_J & 0_{\bar{N},\bar{N}} \end{bmatrix}$$
and $\boldsymbol{\zeta} \in \mathbb{R}^N$, $C \in M_{N,n}$ and $\mathbf{w} \in \mathbb{R}^N$ by
$$\boldsymbol{\zeta} := \begin{bmatrix} \boldsymbol{\zeta}_J \\ \vdots \\ \boldsymbol{\zeta}_J \\ \bar{\boldsymbol{\zeta}} \end{bmatrix}, \quad C := \begin{bmatrix} A_J \\ 0 \\ \vdots \\ 0 \\ A^{(0)} \end{bmatrix}, \quad \text{and} \quad \mathbf{w} := \begin{bmatrix} 0_{KN_J,1} \\ \bar{\mathbf{w}} \end{bmatrix}.$$
Furthermore, we partition the reservoir states $\mathbf{x}_t$ of the corresponding echo state system as
$$\mathbf{x}_t := \begin{bmatrix} \mathbf{x}^{(1)}_t \\ \vdots \\ \mathbf{x}^{(K+1)}_t \end{bmatrix},$$
with $\mathbf{x}^{(j)}_t \in \mathbb{R}^{N_J}$, for $j \leq K$, and $\mathbf{x}^{(K+1)}_t \in \mathbb{R}^{\bar{N}}$. With this notation for $\mathbf{x}_t$ and these choices of matrices, the recursions associated to the echo state reservoir map in (32) read as
$$\mathbf{x}^{(1)}_t = \sigma(A_J\mathbf{z}_t + \boldsymbol{\zeta}_J), \qquad (41)$$
$$\mathbf{x}^{(j)}_t = \sigma(A_J W_J\mathbf{x}^{(j-1)}_{t-1} + \boldsymbol{\zeta}_J), \quad \text{for } j = 2, \ldots, K, \qquad (42)$$
$$\mathbf{x}^{(K+1)}_t = \sigma\Big(\sum_{j=1}^{K} A^{(j)}W_J\mathbf{x}^{(j)}_{t-1} + A^{(0)}\mathbf{z}_t + \bar{\boldsymbol{\zeta}}\Big). \qquad (43)$$
By iteratively inserting (42) into itself and using (41) one obtains (recall the definition of $J$ in (40)) that the unique solution to (42) is given by
$$\mathbf{x}^{(j)}_t = \sigma\big(A_J[J]^{j-1}(\mathbf{z}_{t-j+1}) + \boldsymbol{\zeta}_J\big). \qquad (44)$$
More formally, one uses induction on $j$: For $j = 1$ the two expressions (44) and (41) coincide. For $j = 2, \ldots, K$ one inserts (44) for $j - 1$ (which holds by the induction hypothesis) into (42) to obtain
$$\mathbf{x}^{(j)}_t = \sigma\big(A_J W_J\,\sigma(A_J[J]^{j-2}(\mathbf{z}_{t-j+1}) + \boldsymbol{\zeta}_J) + \boldsymbol{\zeta}_J\big) = \sigma\big(A_J[J]^{j-1}(\mathbf{z}_{t-j+1}) + \boldsymbol{\zeta}_J\big),$$
which is indeed (44). Finally, combining (44) and (43) one obtains
$$y_t = \mathbf{w}^{\top}\mathbf{x}_t = \bar{\mathbf{w}}^{\top}\mathbf{x}^{(K+1)}_t = \bar{\mathbf{w}}^{\top}\sigma\Big(\sum_{j=1}^{K} A^{(j)}W_J\mathbf{x}^{(j)}_{t-1} + A^{(0)}\mathbf{z}_t + \bar{\boldsymbol{\zeta}}\Big) = \bar{\mathbf{w}}^{\top}\sigma\Big(\sum_{j=1}^{K} A^{(j)}[J]^j(\mathbf{z}_{t-j}) + A^{(0)}\mathbf{z}_t + \bar{\boldsymbol{\zeta}}\Big).$$
The statement (44) shows, in particular, that the echo state network associated to $A$, $C$, $\boldsymbol{\zeta}$ and $\mathbf{w}$ satisfies the echo state property. Moreover, inserting $t = 0$ in the previous equality and comparing with the definition of $H_m(\mathbf{Z})$ one sees that indeed $H_m(\mathbf{Z}) = H^{A,C,\zeta}_{\mathbf{w}}(\mathbf{Z})$. The approximation statement (33) therefore follows from (39).
Remark III.11. In this paper we measure closeness between filters and functionals in an $L^p$ sense. As we already pointed out in Remark III.6, this choice allows us to approximate with the systems used in this paper measurable filters that, unlike in the $L^{\infty}$ case, do not necessarily satisfy the fading memory property. Therefore, an interesting aspect of the universality results in Proposition III.1, Corollary III.8, Theorem III.9, and Theorem III.10 is that it is possible to approximately simulate any measurable filter that does not necessarily satisfy the fading memory property using the reservoir systems introduced in those results, which do satisfy the fading memory property.
Remark III.12. The results presented in this article address the approximation capabilities of echo state networks and other reservoir computing systems. When these systems are used in practice not all of their parameters are trained. For example, the recurrent connections of ESNs do not usually undergo a training process, that is, the architecture parameters $A$, $C$, $\boldsymbol{\zeta}$ are randomly drawn from a distribution and only the readout $\mathbf{w}$ is trained by linear regression so as to optimally fit the given teaching signal. Subsequently, an optimization over a few hyperparameters (for instance, the spectral radius of $A$) is carried out. In addition, in many situations the same reservoir matrix $A$ can be used for different input time series and different learning tasks and only the input-to-reservoir parameters $C$, $\boldsymbol{\zeta}$ and the readout $\mathbf{w}$ need to be modified (see, for instance, the approach taken in [38], [39] to define time series kernels). This feature is key in the implementation of the notion of multi-tasking in the RC context (see [10]). Thus, the empirically observed robustness of ESNs with respect to these parameter choices is not entirely explained by the universality results presented here. While in the static setting of feedforward neural networks such questions have already been tackled (see, for instance, [40]), for echo state networks a full explanation is not available yet and these questions are the subject of ongoing research.
D. An alternative viewpoint
So far all the universality results have been formulated for functionals and filters with random inputs. Equivalently, we may formulate them as $L^p$-approximation results on the sequence space $(\mathbb{R}^n)^{\mathbb{Z}_-}$ endowed with any measure $\mu$ that makes the filter that we want to approximate $p$-integrable.

Theorem III.13. Let $H\colon (\mathbb{R}^n)^{\mathbb{Z}_-} \to \mathbb{R}$ be a measurable functional. Then, for any probability measure $\mu$ on $(\mathbb{R}^n)^{\mathbb{Z}_-}$ with $H \in L^p((\mathbb{R}^n)^{\mathbb{Z}_-}, \mu)$ and any $\varepsilon > 0$ there exists a reservoir system that has the echo state property and such that the corresponding filter is causal and time-invariant, the associated functional $H_{RC}$ satisfies $H_{RC} \in L^p((\mathbb{R}^n)^{\mathbb{Z}_-}, \mu)$ and
$$\|H - H_{RC}\|_{L^p((\mathbb{R}^n)^{\mathbb{Z}_-},\,\mu)} < \varepsilon. \qquad (45)$$
The reservoir functional $H_{RC}$ may be chosen as coming from any of the following systems:

- Linear reservoir with polynomial readout, that is, (9) for some $N \in \mathbb{N}$, $A \in M_N$, $c \in M_{N,n}$, and a polynomial $h \in \mathrm{Pol}_N$, if the measure $\mu$ satisfies the following condition: for any $K \in \mathbb{N}$ there exists $\alpha > 0$ such that
$$\int_{(\mathbb{R}^n)^{\mathbb{Z}_-}} \exp\left(\alpha\sum_{k=0}^{K}\sum_{i=1}^{n}|z^{(i)}_{-k}|\right)\mu(d\mathbf{z}) < \infty.$$
- Linear reservoir with neural network readout, that is, (9) for some $N \in \mathbb{N}$, $A \in M_N$, $c \in M_{N,n}$, and a neural network $h \in \mathcal{H}_N$.
- Trigonometric state-affine system with linear readout, that is, (21) for some $N \in \mathbb{N}$, $\mathbf{w} \in \mathbb{R}^N$, $P \in \mathrm{Trig}_{N,N}$ and $Q \in \mathrm{Trig}_{N,1}$.
- Echo state network with linear readout, that is, (32) for some $N \in \mathbb{N}$, $C \in M_{N,n}$, $\boldsymbol{\zeta} \in \mathbb{R}^N$, $A \in M_N$, $\mathbf{w} \in \mathbb{R}^N$, where we assume that the activation function $\sigma\colon \mathbb{R} \to \mathbb{R}$ employed in (32) is bounded, continuous and non-constant.
Proof. Set $\Omega = (\mathbb{R}^n)^{\mathbb{Z}_-}$, $\mathcal{F} = \otimes_{t \in \mathbb{Z}_-}\mathcal{B}(\mathbb{R}^n)$, $\mathbb{P} = \mu$ and define $\mathbf{Z}_t(\mathbf{z}) := \mathbf{z}_t$ for all $\mathbf{z} \in \Omega$, $t \in \mathbb{Z}_-$. Then $\mathcal{F} = \sigma(\mathbf{Z}_t : t \in \mathbb{Z}_-) = \mathcal{F}_{-\infty}$ and $\mathbf{Z}$ is the identity mapping on $(\mathbb{R}^n)^{\mathbb{Z}_-}$. One may now apply Proposition III.1, Corollary III.8, Theorem III.9 and Theorem III.10 with this choice of probability space $(\Omega, \mathcal{F}, \mathbb{P})$ and input process $\mathbf{Z}$. The statement of Theorem III.13 then precisely coincides with the statement of Proposition III.1, Corollary III.8, Theorem III.9 and Theorem III.10, respectively.
E. Approximation of stationary strong time series models
Most parametric time series models commonly used in financial econometrics, macroeconometrics, and forecasting applications are specified by relations of the type
$$\mathbf{X}_t = G(\mathbf{X}_{t-1}, \mathbf{Z}_t, \theta), \qquad (46)$$
where $\theta \in \mathbb{R}^k$ are the parameters of the model and the vector $\mathbf{X}_t \in \mathbb{R}^N$ is built so that it contains in its components the time series of interest and, at the same time, allows for a Markovian representation of the model as in (46). The model is driven by the innovations process $\mathbf{Z} = (\mathbf{Z}_t)_{t \in \mathbb{Z}_-} \in (\mathbb{R}^n)^{\mathbb{Z}_-}$. When the innovations are made out of independent and identically distributed random variables we say that the model is strong [41]. It is customary in the time series literature to impose constraints on the parameter vector $\theta$ so that the relation (46) has a unique second-order stationary solution or, in the language of this paper, so that the system (46) satisfies the echo state property and the associated filter $U_G\colon (\mathbb{R}^n)^{\mathbb{Z}_-} \to (\mathbb{R}^N)^{\mathbb{Z}_-}$ satisfies that
$$\mathbb{E}[U_G(\mathbf{Z})_t] =: \boldsymbol{\mu} \quad \text{and} \quad \mathbb{E}\big[U_G(\mathbf{Z})_t\,U_G(\mathbf{Z})_{t+h}^{\top}\big] =: \Sigma_h, \quad t, h \in \mathbb{Z}_-, \qquad (47)$$
with $\boldsymbol{\mu} \in \mathbb{R}^N$ and $\Sigma_h \in M_N$ constants that do not depend on $t \in \mathbb{Z}_-$. The Wold decomposition theorem [42, Theorem 5.7.1] shows that any such filter can be uniquely written as the sum of a linear and a deterministic process.

It is obvious that for strong models the stationarity condition (7) holds and that, moreover, the condition (47) implies that
$$\|U_G(\mathbf{Z})\|_2 = \sup_{t \in \mathbb{Z}_-}\left\{\mathbb{E}\big[|U_G(\mathbf{Z})_t|^2\big]^{1/2}\right\} = \mathrm{trace}(\Sigma_0)^{1/2} < \infty. \qquad (48)$$
This integrability condition guarantees that the approximation results in Proposition III.1, Corollary III.8, and Theorems III.9 and III.10 hold for second-order stationary strong time series models with $p = 2$. More specifically, the processes determined by this kind of models can be approximated in the $L^2$ sense by linear reservoir systems with polynomial or neural network readouts (when the condition in Remark III.4 is satisfied), by trigonometric state-affine systems with linear readouts, or by echo state networks.

Important families of models to which this approximation statement can be applied are, among many others, (see the references for the meaning of the acronyms) GARCH [43], [44], VEC [45], BEKK [46], CCC [47], DCC [48], [49], GDC [50], and ARSV [51], [52].
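As a usage illustration of this $p = 2$ setting (not taken from the paper, and deliberately glossing over the estimation-error issues mentioned in the introduction), the self-contained sketch below simulates a strong GARCH(1,1) model driven by i.i.d. Gaussian innovations and fits only the linear readout of a randomly generated ESN of the form (32) so that its output tracks the causal, time-invariant filter mapping past innovations to the conditional variance. All parameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# simulate a strong GARCH(1,1): sigma2_t = omega + a*eps_{t-1}^2 + b*sigma2_{t-1}, eps_t = sqrt(sigma2_t)*Z_t
T, omega, a, b = 5000, 0.1, 0.1, 0.85                # illustrative parameter values
Z = rng.normal(size=T)                                # i.i.d. Gaussian innovations (the input process)
sigma2 = np.empty(T); eps = np.empty(T)
sigma2[0] = omega / (1 - a - b); eps[0] = np.sqrt(sigma2[0]) * Z[0]
for t in range(1, T):
    sigma2[t] = omega + a * eps[t - 1] ** 2 + b * sigma2[t - 1]
    eps[t] = np.sqrt(sigma2[t]) * Z[t]

# randomly generated ESN of the form (32); only the linear readout w is estimated
N, rho = 300, 0.8
A = rng.normal(size=(N, N)); A *= rho / max(abs(np.linalg.eigvals(A)))
C = 0.5 * rng.normal(size=(N, 1)); zeta = 0.5 * rng.normal(size=N)
X = np.zeros((T, N)); x = np.zeros(N)
for t in range(T):
    x = np.tanh(A @ x + C @ Z[t:t + 1] + zeta)
    X[t] = x

# least-squares fit of the readout so that w^T x_t tracks the volatility filter Z -> sigma2_t
w = np.linalg.lstsq(X, sigma2, rcond=None)[0]
print("empirical L2 error:", np.sqrt(np.mean((X @ w - sigma2) ** 2)))
```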
IV. CONCLUSION
We have shown the universality of three different families of reservoir computers with respect to the $L^p$ norm associated to any given discrete-time semi-infinite input process.

On the one hand, we proved that linear reservoir systems with either neural network or polynomial readout maps (in this case the input process needs to satisfy the exponential moments condition (11)) are universal.

On the other hand, we showed that the exponential moment condition (11), which was required in the case of polynomial readouts, can be dropped by considering two different reservoir families with linear readouts, namely, trigonometric state-affine systems and echo state networks. The latter are the most widely used reservoir systems in applications. The linearity in the readouts is a key feature in supervised machine learning applications of these systems. It guarantees that they can be used in high-dimensional situations and in the presence of large datasets, since the training in that case reduces to a linear regression.

We emphasize that, unlike existing results in the literature [25], [26] dealing with uniform universal approximation, the $L^p$ criteria used in this paper make it possible to formulate universality statements that do not necessarily impose almost sure uniform boundedness on the inputs or the fading memory property on the filter that needs to be approximated.
APPENDIX
A. Auxiliary Lemmas
Lemma A.1. Let $\mathbf{Z}\colon \mathbb{Z}_- \times \Omega \to \mathbb{R}^n$ be a stochastic process and let $\mathcal{F}_t := \sigma(\mathbf{Z}_0, \ldots, \mathbf{Z}_t)$, $t \in \mathbb{Z}_-$, and $\mathcal{F}_{-\infty} := \sigma(\mathbf{Z}_t : t \in \mathbb{Z}_-)$. Let $F \in L^p(\Omega, \mathcal{F}_{-\infty}, \mathbb{P})$. Then $\mathbb{E}[F\,|\,\mathcal{F}_t]$ converges to $F$ as $t \to -\infty$, both $\mathbb{P}$-almost surely and in the norm $\|\cdot\|_p$, for any $p \in [1, \infty)$.

Proof. Since $\mathcal{F}_t \subset \mathcal{F}_{t-1} \subset \mathcal{F}_{-\infty}$, for all $t \in \mathbb{Z}_-$, and $F \in L^p(\Omega, \mathcal{F}_{-\infty}, \mathbb{P}) \subset L^1(\Omega, \mathcal{F}_{-\infty}, \mathbb{P})$, one has by Lévy's Upward Theorem (see, for instance, [53, II.50.3] or [33, Theorem 5.5.7]) that $F_t := \mathbb{E}[F\,|\,\mathcal{F}_t]$ converges for $t \to -\infty$ to $F$ in $\|\cdot\|_1$ and $\mathbb{P}$-almost surely. If $p = 1$ this already implies the claim. For $p > 1$ one has by standard properties of conditional expectations (see, for instance, [33, Theorem 5.1.4]) that $\sup_{t \in \mathbb{Z}_-}\{\mathbb{E}[|F_t|^p]\} \leq \mathbb{E}[|F|^p]$. Hence [33, Theorem 5.4.5] implies that $F_t$ converges for $t \to -\infty$ to some $\tilde{F} \in L^p(\Omega, \mathcal{F}_{-\infty}, \mathbb{P})$ both in $\|\cdot\|_p$ and $\mathbb{P}$-almost surely. But this identifies $\tilde{F} = \lim_{t \to -\infty} F_t = F$, $\mathbb{P}$-almost surely and hence $F_t$ converges for $t \to -\infty$ to $F$ also in $\|\cdot\|_p$.
Lemma A.2. For $N \in \mathbb{N} \setminus \{0, 1\}$ and $j = 1, \ldots, N-1$ define $A_j \in M_N$ by $(A_j)_{k,l} = \delta_{k,j+1}\,\delta_{l,j}$ for $k, l \in \{1, \ldots, N\}$. Then for $L \in \mathbb{N}$, $j_0, \ldots, j_L \in \{1, \ldots, N-1\}$ it holds that
$$(A_{j_L} \cdots A_{j_0})_{k,l} = \delta_{k,j_L+1}\,\delta_{l,j_0}\prod_{i=1}^{L}\delta_{j_i,j_{i-1}+1}. \qquad (49)$$
In particular $A_{j_L} \cdots A_{j_0} \neq 0$ if and only if $j_i = j_0 + i$ for $i \in \{1, \ldots, L\}$.

Proof. The last statement directly follows from (49). To prove (49) we proceed by induction on $L$. Indeed, for $L = 0$ the formula (49) is just the definition of $A_{j_0}$. For the induction step, one assumes that (49) holds for $L - 1$ and calculates
$$(A_{j_L} \cdots A_{j_0})_{k,l} = \sum_{r=1}^{N}\delta_{k,j_L+1}\,\delta_{r,j_L}\,(A_{j_{L-1}} \cdots A_{j_0})_{r,l} = \sum_{r=1}^{N}\delta_{k,j_L+1}\,\delta_{r,j_L}\,\delta_{r,j_{L-1}+1}\,\delta_{l,j_0}\prod_{i=1}^{L-1}\delta_{j_i,j_{i-1}+1},$$
which is indeed (49).
ACKNOWLEDGMENT
The authors thank Lyudmila Grigoryeva and Josef Teichmann for helpful discussions and remarks and acknowledge partial financial support coming from the Research Commission of the Universität Sankt Gallen, the Swiss National Science Foundation (grants number 175801/1 and 179114), and the French ANR "BIPHOPROC" project (ANR-14-OHRI-0018-02).
REFERENCES
[1] F. Cucker and S. Smale, “On the mathematical foundations of learning,
Bulletin of the American Mathematical Society, vol. 39, no. 1, pp. 1–49,
2002.
[2] S. Smale and D.-X. Zhou, “Estimating the approximation error in
learning theory,Analysis and Applications, vol. 01, no. 01, pp. 17–41,
jan 2003.
[3] F. Cucker and D.-X. Zhou, Learning Theory : An Approximation Theory
Viewpoint. Cambridge University Press, 2007.
[4] G. Cybenko, “Approximation by superpositions of a sigmoidal function,” Mathematics of Control, Signals and Systems, vol. 2, no. 4, pp. 303–314, dec 1989.
[5] K. Hornik, M. Stinchcombe, and H. White, “Multilayer feedforward
networks are universal approximators,Neural Networks, vol. 2, no. 5,
pp. 359–366, 1989.
[6] K. Hornik, “Approximation capabilities of multilayer feedforward networks,” Neural Networks, vol. 4, pp. 251–257, 1991.
[7] W. Maass, T. Natschläger, and H. Markram, “Real-time computing
without stable states: a new framework for neural computation based
on perturbations,” Neural Computation, vol. 14, pp. 2531–2560, 2002.
[8] H. Jaeger and H. Haas, “Harnessing Nonlinearity: Predicting Chaotic
Systems and Saving Energy in Wireless Communication,Science, vol.
304, no. 5667, pp. 78–80, 2004.
[9] W. Maass and H. Markram, “On the computational power of circuits of
spiking neurons,” Journal of Computer and System Sciences, vol. 69,
no. 4, pp. 593–616, 2004.
[10] W. Maass, “Liquid state machines: motivation, theory, and applications,”
in Computability In Context: Computation and Logic in the Real World,
S. S. Barry Cooper and A. Sorbi, Eds., 2011, ch. 8, pp. 275–296.
[11] M. B. Matthews, “On the Uniform Approximation of Nonlinear
Discrete-Time Fading-Memory Systems Using Neural Network Mod-
els,” Ph.D. dissertation, ETH Zürich, 1992.
[12] ——, “Approximating nonlinear fading-memory operators using neural
network models,” Circuits, Systems, and Signal Processing, vol. 12,
no. 2, pp. 279–307, jun 1993.
[13] S. Boyd and L. Chua, “Fading memory and the problem of approxi-
mating nonlinear operators with Volterra series,” IEEE Transactions on
Circuits and Systems, vol. 32, no. 11, pp. 1150–1161, 1985.
[14] K.-i. Funahashi and Y. Nakamura, “Approximation of dynamical systems
by continuous time recurrent neural networks,” Neural Networks, vol. 6,
no. 6, pp. 801–806, jan 1993.
[15] W. Maass, P. Joshi, and E. D. Sontag, “Computational aspects of
feedback in neural circuits,” PLoS Computational Biology, vol. 3, no. 1,
p. e165, 2007.
[16] E. Sontag, “Realization theory of discrete-time nonlinear systems: Part I-
The bounded case,” IEEE Transactions on Circuits and Systems, vol. 26,
no. 5, pp. 342–356, may 1979.
[17] E. D. Sontag, “Polynomial Response Maps,” in Lecture Notes in Control
and Information Sciences, vol. 13. Springer-Verlag, 1979.
[18] M. Fliess and D. Normand-Cyrot, “Vers une approche algébrique des
systèmes non linéaires en temps discret,” in Analysis and Optimization
of Systems. Lecture Notes in Control and Information Sciences, vol. 28,
A. Bensoussan and J. Lions, Eds. Springer Berlin Heidelberg, 1980.
[19] I. W. Sandberg, “Approximation theorems for discrete-time systems,”
IEEE Transactions on Circuits and Systems, vol. 38, no. 5, pp. 564–
566, 1991.
[20] ——, “Structure theorems for nonlinear systems,” Multidimensional
Systems and Signal Processing, vol. 2, pp. 267–286, 1991.
[21] P. C. Perryman, “Approximation Theory for Deterministic and Stochastic
Nonlinear Systems,” Ph.D. dissertation, University of California, Irvine,
1996.
[22] A. Stubberud and P. Perryman, “Current state of system approximation
for deterministic and stochastic systems,” in Conference Record of The
Thirtieth Asilomar Conference on Signals, Systems and Computers,
vol. 1. IEEE Comput. Soc. Press, 1997, pp. 141–145.
[23] B. Hammer and P. Tino, “Recurrent neural networks with small weights
implement definite memory machines,” Neural Computation, vol. 15,
no. 8, pp. 1897–1929, aug 2003.
[24] P. Tino, B. Hammer, and M. Bodén, “Markovian bias of neural-based
architectures with feedback connections,” in Perspectives of Neural-
Symbolic Integration. Studies in Computational Intelligence, vol 77.,
Hammer B. and Hitzler P., Eds. Springer, Berlin, Heidelberg, 2007,
pp. 95–133.
[25] L. Grigoryeva and J.-P. Ortega, “Universal discrete-time reservoir com-
puters with stochastic inputs and linear readouts using non-homogeneous
state-affine systems,” Journal of Machine Learning Research, vol. 19,
no. 24, pp. 1–40, 2018.
[26] ——, “Echo state networks are universal,” Neural Networks, vol. 108,
pp. 495–508, 2018.
[27] H. Jaeger, “The ’echo state’ approach to analysing and training recurrent
neural networks with an erratum note,” German National Research
Center for Information Technology, 2010.
[28] I. B. Yildiz, H. Jaeger, and S. J. Kiebel, “Re-visiting the echo state
property,” Neural Networks, vol. 35, pp. 1–9, nov 2012.
[29] G. Manjunath and H. Jaeger, “Echo state property linked to an input:
exploring a fundamental characteristic of recurrent neural networks,”
Neural Computation, vol. 25, no. 3, pp. 671–696, 2013.
[30] O. Kallenberg, Foundations of Modern Probability, ser. Probability and
Its Applications. Springer New York, 2002.
[31] C. Berg and J. P. R. Christensen, “Density questions in the classical
theory of moments,” Annales de l’Institut Fourier, vol. 31, no. 3, pp.
99–114, 1981.
[32] L. C. Petersen, “On the relation between the multidimensional moment
problem and the one-dimensional moment problem,” Mathematica Scan-
dinavica, vol. 51, no. 2, pp. 361–366, 1983.
[33] R. Durrett, Probability: Theory and Examples, 4th ed., ser. Cambridge
Series in Statistical and Probabilistic Mathematics. Cambridge: Cam-
bridge University Press, 2010.
[34] O. G. Ernst, A. Mugler, H.-J. Starkloff, and E. Ullmann, “On the
convergence of generalized polynomial chaos expansions,” ESAIM: M2AN,
vol. 46, no. 2, pp. 317–339, 2012.
[35] C. C. Heyde, “On a property of the lognormal distribution,” The Journal
of the Royal Statistical Society Series B (Methodological), vol. 25, no. 2,
pp. 392–393, 1963.
[36] G. Freud, Orthogonal Polynomials. Pergamon Press, 1971.
[37] W. Rudin, Real and Complex Analysis, 3rd ed. McGraw-Hill, 1987.
[38] H. Chen, F. Tang, P. Tino, and X. Yao, “Model-based kernel for
efficient time series analysis,” in Proceedings of the 19th ACM SIGKDD
international conference on Knowledge discovery and data mining -
KDD ’13, 2013.
[39] H. Chen, P. Tino, A. Rodan, and X. Yao, “Learning in the model space
for cognitive fault diagnosis,” IEEE Transactions on Neural Networks
and Learning Systems, 2014.
[40] G.-B. Huang, Q.-Y. Zhu, and C.-K. Siew, “Extreme learning machine:
theory and applications,” Neurocomputing, vol. 70, no. 1–3, pp. 489–501, 2006.
[41] C. Francq and J.-M. Zakoian, GARCH Models: Structure, Statistical
Inference and Financial Applications. Wiley, 2010.
[42] P. J. Brockwell and R. A. Davis, Time Series: Theory and Methods.
Springer-Verlag, 2006.
[43] R. F. Engle, “Autoregressive conditional heteroscedasticity with esti-
mates of the variance of United Kingdom inflation,” Econometrica,
vol. 50, no. 4, pp. 987–1007, 1982.
[44] T. Bollerslev, “Generalized autoregressive conditional heteroskedastic-
ity,” Journal of Econometrics, vol. 31, no. 3, pp. 307–327, 1986.
[45] T. Bollerslev, R. F. Engle, and J. M. Wooldridge, “A capital asset pricing
model with time varying covariances,” Journal of Political Economy,
vol. 96, pp. 116–131, 1988.
[46] R. F. Engle and F. K. Kroner, “Multivariate simultaneous generalized
ARCH,” Econometric Theory, vol. 11, pp. 122–150, 1995.
[47] T. Bollerslev, “Modelling the coherence in short-run nominal exchange
rates: A multivariate generalized ARCH model,” Review of Economics
and Statistics, vol. 72, no. 3, pp. 498–505, 1990.
[48] Y. K. Tse and A. K. C. Tsui, “A multivariate GARCH with time-varying
correlations,” Journal of Business and Economic Statistics, vol. 20, pp.
351–362, 2002.
[49] R. F. Engle, “Dynamic conditional correlation: a simple class of multi-
variate GARCH models,” Journal of Business and Economic Statistics,
vol. 20, pp. 339–350, 2002.
[50] F. K. Kroner and V. K. Ng, “Modelling asymmetric comovements of
asset returns,” The Review of Financial Studies, vol. 11, pp. 817–844,
1998.
[51] S. J. Taylor, “Financial returns modelled by the product of two stochastic
processes, a study of daily sugar prices,” in Time series analysis: theory
and practice I, B. D. Anderson, Ed., 1982, pp. 1961–1979.
[52] A. C. Harvey, E. Ruiz, and N. Shephard, “Multivariate stochastic
variance models,” Review of Economic Studies, vol. 61, pp. 247–264,
1994.
[53] L. C. G. Rogers and D. Williams, Diffusions, Markov Processes, and
Martingales, 2nd ed. Cambridge University Press, 2000, vol. 1.