
Echo state networks are universal

Lyudmila Grigoryeva^1 and Juan-Pablo Ortega^2,3

Abstract

This paper shows that echo state networks are universal uniform approximants in the context of discrete-time fading memory filters with uniformly bounded inputs defined on negative infinite times. This result guarantees that any fading memory input/output system in discrete time can be realized as a simple finite-dimensional neural network-type state-space model with a static linear readout map. This approximation is valid for infinite time intervals. The proof of this statement is based on fundamental results, also presented in this work, about the topological nature of the fading memory property and about reservoir computing systems generated by continuous reservoir maps.

Key Words: reservoir computing, universality, echo state networks, ESN, state-affine systems, SAS, machine learning, fading memory property, echo state property, linear training, uniform system approximation.

1 Introduction

Many recently introduced machine learning techniques in the context of dynamical problems have much in common with the system identification procedures developed over the last decades for applications in signal treatment, circuit theory, and, more generally, systems theory. In these problems, system knowledge is available only in the form of input-output observations, and the task consists in finding or learning a model that approximates the system, mainly for forecasting or classification purposes. An important goal in that context is finding families of transformations that are both computationally feasible and versatile enough to reproduce a rich variety of patterns just by modifying a limited number of procedural parameters. The versatility or flexibility of a given machine learning paradigm is usually established by proving its universality. We say that a family of transformations is universal when its elements can approximate as accurately as one wants all the elements of a sufficiently rich class containing, for example, all continuous or even all measurable transformations. In the language of learning theory, this is equivalent to the possibility of making approximation errors arbitrarily small [Cuck 02, Smal 03, Cuck 07]. In more mathematical terms, the universality of a family amounts to its density in a rich class of the type mentioned above. Well-known universality results are, for example, the uniform approximation properties of feedforward neural networks established in [Cybe 89, Horn 89, Horn 91] in the context of static continuous and, more generally, measurable real functions.

A first solution to this problem in the dynamic context was pioneered in the works of Fréchet [Frec 10] and Volterra [Volt 30] one century ago, when they proved that finite Volterra series can be

^1 Department of Mathematics and Statistics. Universität Konstanz. Box 146. D-78457 Konstanz. Germany. Lyudmila.Grigoryeva@uni-konstanz.de

^2 Universität Sankt Gallen. Faculty of Mathematics and Statistics. Bodanstrasse 6. CH-9000 Sankt Gallen. Switzerland. Juan-Pablo.Ortega@unisg.ch

^3 Centre National de la Recherche Scientifique (CNRS). France.


used to uniformly approximate continuous functionals defined on compact sets of continuous functions. These results were further extended in the 1950s by the MIT school led by N. Wiener [Wien 58, Bril 58, Geor 59], but always under compactness assumptions on the input space and the time interval on which inputs are defined. A major breakthrough was the generalization to infinite time intervals carried out by Boyd and Chua in [Boyd 85], who formulated a uniform approximation theorem using Volterra series for operators endowed with the so-called fading memory property on continuous-time inputs. An input/output system is said to have fading memory when the outputs associated to inputs that are close in the recent past are close, even when those inputs may be very different in the distant past.

In this paper we address the universality, or uniform approximation, problem for transformations or filters of discrete-time signals of infinite length that have the fading memory property. The approximating set that we use is generated by nonlinear state-space transformations and is referred to as reservoir computers (RC) [Jaeg 10, Jaeg 04, Maas 02, Maas 11, Croo 07, Vers 07, Luko 09] or reservoir systems. These are special types of recurrent neural networks determined by two maps, namely a reservoir F: R^N × R^n → R^N (n, N ∈ N) and a readout map h: R^N → R^d that, under certain hypotheses, transform (or filter) an infinite discrete-time input z = (..., z_{-1}, z_0, z_1, ...) ∈ (R^n)^Z into an output signal y ∈ (R^d)^Z of the same type using the state-space transformation given by:

x_t = F(x_{t-1}, z_t),    (1.1)
y_t = h(x_t),    (1.2)

where t ∈ Z and the dimension N ∈ N of the state vectors x_t ∈ R^N is referred to as the number of virtual neurons of the system. When a RC system has a uniquely determined filter associated to it, we refer to it as the RC filter.
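As a toy numerical illustration of the state-space recursion (1.1)-(1.2), the sketch below iterates a reservoir map over a finite window of inputs and applies a readout at every step. The particular choices of F (a random contracting tanh map) and h (a linear functional) are illustrative assumptions of ours, not maps used in the paper:

```python
import numpy as np

def run_reservoir(F, h, z_window, x_init):
    """Iterate x_t = F(x_{t-1}, z_t) over a finite input window and
    apply the readout y_t = h(x_t) at every step."""
    x = x_init
    ys = []
    for z_t in z_window:
        x = F(x, z_t)          # state-space update (1.1)
        ys.append(h(x))        # static readout (1.2)
    return np.array(ys)

# Hypothetical reservoir and readout maps (illustrative only):
N, n = 5, 1
rng = np.random.default_rng(0)
A = 0.5 * rng.standard_normal((N, N)) / np.sqrt(N)
C = rng.standard_normal((N, n))
F = lambda x, z: np.tanh(A @ x + C @ z)   # a contracting nonlinear reservoir
h = lambda x: x.sum()                     # a linear functional as readout

z_window = rng.standard_normal((100, n))
y = run_reservoir(F, h, z_window, np.zeros(N))
print(y.shape)  # (100,): one output per input time step
```

The readout sees only the current state, so all the memory of past inputs must be carried by the recursion itself.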

An important advantage of the RC approach is that, under certain hypotheses, intrinsically infinite-dimensional problems regarding filters can be translated into analogous questions about the reservoir and readout maps that generate them, which are defined on much simpler finite-dimensional spaces. This strategy has already been used in the literature in relation to the universality question in, for instance, [Sand 91a, Sand 91b, Matt 92, Matt 93, Perr 96, Stub 97]. The universal approximation properties of feedforward neural networks [Kolm 56, Arno 57, Spre 65, Spre 96, Spre 97, Cybe 89, Horn 89, Horn 90, Horn 91, Horn 93, Rusc 98] were used in those works to find neural network-based families of filters that are dense in the set of approximately finite memory filters with inputs defined on the positive real half-line. Other works in connection with the universality problem in the dynamic context are [Maas 00, Maas 02, Maas 04, Maas 07], where RC is referred to as Liquid State Machines. In those references, and in the same vein as in [Boyd 85], universal families of RC systems with inputs defined on infinite continuous-time intervals were identified in the fading memory category as a corollary of the Stone-Weierstrass theorem. This approach required invoking the natural hypotheses associated to this result, like the pointwise separation property or the compactness of the input space, the latter obtained as a consequence of the fading memory property. Another strand of interesting literature that we will not explore in this work has to do with the Turing computability capabilities of the systems of the type that we just introduced; recent relevant works in this direction are [Kili 96, Sieg 97, Cabe 15, Cabe 16], and references therein.

The main contribution of this paper is showing that a particularly simple type of RC systems called echo state networks (ESNs) can be used as universal approximants in the context of discrete-time fading memory filters with uniformly bounded inputs defined on negative infinite times. ESNs are RC systems of the form (1.1)-(1.2) given by:

x_t = σ(A x_{t-1} + C z_t + ζ),    (1.3)
y_t = W x_t.    (1.4)

In these equations, C ∈ M_{N,n} is called the input mask, ζ ∈ R^N is the input shift, and A ∈ M_{N,N} is referred to as the reservoir matrix. The map σ in the state-space equation (1.3) is constructed by componentwise application of a sigmoid function (like the hyperbolic tangent or the logistic function) and is called the activation function. Finally, the readout map is linear in this case and implemented via the readout matrix W ∈ M_{d,N}. ESNs already appear in [Matt 92, Matt 93] under the name of recurrent networks, but it was only more recently, in the works of H. Jaeger [Jaeg 04], that their outstanding performance in machine learning applications was demonstrated.
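A minimal numerical sketch of the ESN equations (1.3)-(1.4): A, C, and ζ are drawn at random and only the linear readout W is trained, here by ordinary least squares on a one-step-recall task. The specific scalings and the training procedure are illustrative assumptions of ours, not prescriptions from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)
N, n, T = 50, 1, 500

# Randomly drawn ESN parameters (illustrative scalings):
A = rng.uniform(-1, 1, (N, N))
A *= 0.9 / np.max(np.abs(np.linalg.eigvals(A)))  # spectral radius < 1
C = rng.uniform(-1, 1, (N, n))                   # input mask
zeta = rng.uniform(-0.1, 0.1, N)                 # input shift

z = rng.uniform(-1, 1, (T, n))                   # uniformly bounded input
target = np.roll(z[:, 0], 1)                     # task: recall z_{t-1}

# Run the state equation (1.3):
X = np.zeros((T, N))
x = np.zeros(N)
for t in range(T):
    x = np.tanh(A @ x + C @ z[t] + zeta)
    X[t] = x

# Fit the linear readout (1.4) by least squares on a training stretch:
W = np.linalg.lstsq(X[100:400], target[100:400], rcond=None)[0]
pred = X[400:] @ W
err = np.sqrt(np.mean((pred - target[400:]) ** 2))
print(f"test RMSE: {err:.3f}")
```

The fact that only W is trained, by a linear regression, is what the keyword "linear training" refers to.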

The strategy that we follow to prove that statement is a combination of what the literature refers to as internal and external approximation. External approximation is the construction of a RC filter that approximates a given (not necessarily RC) filter. In the internal approximation problem, one is given a RC filter and builds another RC filter that approximates it by finding reservoir and readout maps that are close to those of the given one. In the external part of our proof we use a previous work [Grig 17] where we constructed a family of RC systems with linear readouts that we called non-homogeneous state-affine systems (SAS). We showed in that paper that the RC filters associated to SAS systems uniformly approximate any discrete-time fading memory filter with uniformly bounded inputs defined on negative infinite times. Regarding the internal approximation, we show that any RC filter, in particular SAS filters, can be approximated by ESN filters using the universal approximation property of neural networks. These two facts put together allow us to conclude that ESN filters are capable of uniformly approximating any discrete-time fading memory filter with uniformly bounded inputs. We emphasize that this result is shown exclusively for deterministic inputs using a uniform approximation criterion; an extension of this statement that accommodates stochastic inputs and L^p approximation criteria can be found in [Gono 18].

The paper is structured in three sections:

• Section 2 introduces the notation that we use all along the paper and, more importantly, specifies the topologies and Banach space structures that we need in order to talk about continuity in the context of discrete-time filters. It is worth mentioning that we characterize the fading memory property as a continuity condition of the filters that have it with respect to the product topology in the input space. In other words, the fading memory property is not a metric property, as it is usually presented in the literature, but a topological one. An important conceptual consequence of this fact is that the fading memory property does not contain any information about the rate at which the systems that have it "forget" inputs. Several corollaries that are very instrumental in the developments of the paper can be formulated as a consequence of this fact.

• Section 3 contains a collection of general results on the properties of the RC systems generated by continuous reservoir maps. In particular, we provide conditions that guarantee that a unique reservoir filter can be associated to them (the so-called echo state property) and we identify situations in which those filters are themselves continuous (they automatically satisfy the fading memory property). We also point out large classes of RC systems for which internal approximation is possible, that is, if the RC systems are close then so are the associated reservoir filters.

• Section 4 shows that echo state networks are universal uniform approximants in the category of discrete-time fading memory filters with uniformly bounded inputs.

2 Continuous and fading memory filters

This section introduces the notation of the paper as well as general facts about filters and functionals needed in the developments that follow. The new results are contained in Section 2.3, where we characterize the fading memory property as a continuity condition when the sequence spaces where inputs and outputs are defined are uniformly bounded and are endowed with the product topology. This feature makes this property independent of the weighting sequences that are usually introduced to define it.

2.1 Notation

Vectors and matrices. A column vector is denoted by a bold lower case symbol like r, and r^⊤ indicates its transpose. Given a vector v ∈ R^n, we denote its entries by v_i, with i ∈ {1, ..., n}; we also write v = (v_i)_{i ∈ {1,...,n}}. We denote by M_{n,m} the space of real n × m matrices with m, n ∈ N. When n = m, we use the symbol M_n to refer to the space of square matrices of order n. Given a matrix A ∈ M_{n,m}, we denote its components by A_{ij} and we write A = (A_{ij}), with i ∈ {1, ..., n}, j ∈ {1, ..., m}. Given a vector v ∈ R^n, the symbol ‖v‖ stands for any norm in R^n (they are all equivalent) and is not necessarily the Euclidean one, unless explicitly mentioned. The open ball with respect to a given norm ‖·‖, center v ∈ R^n, and radius r > 0 is denoted by B_{‖·‖}(v, r); its closure by B̄_{‖·‖}(v, r). For any A ∈ M_{n,m}, ‖A‖_2 denotes its matrix norm induced by the Euclidean norms in R^m and R^n, and it satisfies [Horn 13, Example 5.6.6] that ‖A‖_2 = σ_max(A), with σ_max(A) the largest singular value of A. ‖A‖_2 is sometimes referred to as the spectral norm of A. The symbol |||·||| is reserved for the norms of operators or functionals defined on infinite-dimensional spaces.

Sequence spaces. N denotes the set of natural numbers with the zero element included. Z (respectively, Z_+ and Z_-) denotes the integers (respectively, the positive and the negative integers). The symbol (R^n)^Z denotes the set of infinite real sequences of the form z = (..., z_{-1}, z_0, z_1, ...), z_i ∈ R^n, i ∈ Z; (R^n)^{Z_-} and (R^n)^{Z_+} are the subspaces consisting of, respectively, left and right infinite sequences: (R^n)^{Z_-} = {z = (..., z_{-2}, z_{-1}, z_0) | z_i ∈ R^n, i ∈ Z_-}, (R^n)^{Z_+} = {z = (z_0, z_1, z_2, ...) | z_i ∈ R^n, i ∈ Z_+}. Analogously, (D_n)^Z, (D_n)^{Z_-}, and (D_n)^{Z_+} stand for (semi-)infinite sequences with elements in the subset D_n ⊂ R^n. In most cases we endow these infinite product spaces with the Banach space structures associated to one of the following two norms:

• The supremum norm: define ‖z‖_∞ := sup_{t ∈ Z} {‖z_t‖}. The symbols ℓ^∞(R^n) and ℓ^∞_±(R^n) are used to denote the Banach spaces formed by the elements in the corresponding infinite product spaces that have a finite supremum norm.

• The weighted norm: let w: N → (0, 1] be a decreasing sequence with zero limit. We define the weighted norm ‖·‖_w on (R^n)^{Z_-} associated to the weighting sequence w as the map:

‖·‖_w : (R^n)^{Z_-} → R_+
z ↦ ‖z‖_w := sup_{t ∈ Z_-} {‖z_t‖ w_{-t}}.

Proposition 5.2 in Appendix 5.11 shows that the space

ℓ^w_-(R^n) := {z ∈ (R^n)^{Z_-} | ‖z‖_w < ∞},

endowed with the weighted norm ‖·‖_w, also forms a Banach space.

It is straightforward to show that ‖z‖_w ≤ ‖z‖_∞ for all z ∈ (R^n)^{Z_-}. This implies that ℓ^∞_-(R^n) ⊂ ℓ^w_-(R^n) and that the inclusion map (ℓ^∞_-(R^n), ‖·‖_∞) → (ℓ^w_-(R^n), ‖·‖_w) is continuous.
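The two norms can be compared numerically on truncated left-infinite sequences. The sketch below (an illustration of ours, under the assumption that a finite window indexed t = -T+1, ..., 0 stands in for a left-infinite sequence) evaluates ‖z‖_∞ and ‖z‖_w for the geometric weighting w_t = λ^t:

```python
import numpy as np

def sup_norm(z):
    """Supremum norm over a finite window of a sequence."""
    return np.max(np.abs(z))

def weighted_norm(z, lam):
    """Weighted norm sup_{t <= 0} |z_t| * w_{-t} with w_t = lam**t.
    z[-1] is the entry at time t = 0, z[-2] at t = -1, etc."""
    T = len(z)
    weights = lam ** np.arange(T - 1, -1, -1)   # w_{-t} = lam**(-t)
    return np.max(np.abs(z) * weights)

z = np.ones(200)                 # uniformly bounded sequence, sup norm = 1
print(sup_norm(z))               # 1.0
print(weighted_norm(z, 0.5))     # 1.0: the weight at t = 0 is w_0 = 1
# Entries in the distant past are discounted: a large perturbation there
# does not move the weighted norm at all.
z_far = z.copy()
z_far[0] = 100.0                 # perturbation at time t = -199
print(weighted_norm(z_far, 0.5) - weighted_norm(z, 0.5))  # 0.0
```

This also makes the inequality ‖z‖_w ≤ ‖z‖_∞ visible: every weight is at most 1, with equality attained at t = 0.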

2.2 Filters and systems

Filters. Let D_n ⊂ R^n and D_N ⊂ R^N. We refer to maps of the type U: (D_n)^Z → (D_N)^Z as filters or operators and to those like H: (D_n)^{Z_-} → D_N (or H: (D_n)^{Z_±} → D_N) as R^N-valued functionals. These definitions will sometimes be extended to accommodate situations where the domains and the targets of the filters are not necessarily product spaces but just arbitrary subsets of (R^n)^Z and (R^N)^Z like, for instance, ℓ^∞(R^n) and ℓ^∞(R^N).

A filter U: (D_n)^Z → (D_N)^Z is called causal when for any two elements z, w ∈ (D_n)^Z that satisfy z_τ = w_τ for any τ ≤ t, for a given t ∈ Z, we have that U(z)_t = U(w)_t. Let T_τ: (D_n)^Z → (D_n)^Z be the time delay operator defined by T_τ(z)_t := z_{t-τ}. The filter U is called time-invariant (TI) when it commutes with the time delay operator, that is, T_τ ∘ U = U ∘ T_τ, for any τ ∈ Z (in this expression, the two operators T_τ have to be understood as defined on the appropriate sequence spaces).

We recall (see for instance [Boyd 85]) that there is a bijection between causal time-invariant filters and functionals on (D_n)^{Z_-}. Indeed, consider the sets F_{(D_n)^{Z_-}} and H_{(D_n)^{Z_-}} defined by

F_{(D_n)^{Z_-}} := {U: (D_n)^Z → (R^N)^Z | U is causal and time-invariant},    (2.1)
H_{(D_n)^{Z_-}} := {H: (D_n)^{Z_-} → R^N}.    (2.2)

Then, given a causal time-invariant filter U: (D_n)^Z → (R^N)^Z, we can associate to it a functional H_U: (D_n)^{Z_-} → R^N via the assignment H_U(z) := U(z^e)_0, where z^e ∈ (D_n)^Z is an arbitrary extension of z ∈ (D_n)^{Z_-} to (D_n)^Z. Let Ψ: F_{(D_n)^{Z_-}} → H_{(D_n)^{Z_-}} be the map such that Ψ(U) := H_U. Conversely, for any functional H: (D_n)^{Z_-} → R^N, we can define a causal time-invariant filter U_H: (D_n)^Z → (R^N)^Z by U_H(z)_t := H((P_{Z_-} ∘ T_{-t})(z)), where T_{-t} is the (−t)-time delay operator and P_{Z_-}: (R^n)^Z → (R^n)^{Z_-} is the natural projection. Let Φ: H_{(D_n)^{Z_-}} → F_{(D_n)^{Z_-}} be the map such that Φ(H) := U_H. It is easy to verify that:

Ψ ∘ Φ = I_{H_{(D_n)^{Z_-}}} or, equivalently, H_{U_H} = H, for any functional H: (D_n)^{Z_-} → R^N,
Φ ∘ Ψ = I_{F_{(D_n)^{Z_-}}} or, equivalently, U_{H_U} = U, for any causal TI filter U: (D_n)^Z → (R^N)^Z,

that is, Ψ and Φ are inverses of each other and hence are both bijections. Additionally, we note that the sets F_{(D_n)^{Z_-}} and H_{(D_n)^{Z_-}} are vector spaces with naturally defined operations and that Ψ and Φ are linear maps between them, which allows us to conclude that F_{(D_n)^{Z_-}} and H_{(D_n)^{Z_-}} are linearly isomorphic.

When a filter is causal and time-invariant, we work in many situations just with the restriction U: (D_n)^{Z_-} → (D_N)^{Z_-} instead of the original filter U: (D_n)^Z → (D_N)^Z without making the distinction, since the former uniquely determines the latter. Indeed, by definition, for any z ∈ (D_n)^Z and t ∈ Z:

U(z)_t = (T_{-t}(U(z)))_0 = (U(T_{-t}(z)))_0,    (2.3)

where the second equality holds by the time-invariance of U, and the value on the right-hand side depends only on P_{Z_-}(T_{-t}(z)) ∈ (D_n)^{Z_-}, by causality.
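The correspondence Φ between functionals and causal time-invariant filters can be made concrete on finite windows. Below, a hypothetical functional H (a discounted sum of the recent past; purely an illustrative choice of ours) is turned into a filter by evaluating H on the shifted-and-truncated history of the input at every time t, mirroring U_H(z)_t := H((P_{Z_-} ∘ T_{-t})(z)):

```python
import numpy as np

def H(history):
    """A hypothetical functional: discounted sum of the past.
    history[-1] is the entry at time 0, history[-2] at time -1, etc."""
    lam = 0.5
    weights = lam ** np.arange(len(history) - 1, -1, -1)
    return float(np.sum(weights * history))

def U_H(z):
    """Filter induced by H: the output at step t is H applied to the
    history of z up to and including step t."""
    return np.array([H(z[: t + 1]) for t in range(len(z))])

z = np.ones(10)
y = U_H(z)
# Causality: y_t only depends on z_0, ..., z_t, so changing a future
# entry leaves all earlier outputs untouched.
z2 = z.copy(); z2[-1] = 42.0
assert np.allclose(U_H(z2)[:-1], y[:-1])
print(y[-1])  # geometric series 1 + 1/2 + ... + (1/2)^9 ≈ 1.998
```

Time-invariance holds exactly for the bi-infinite filter; on finite windows it is only approximate because the truncation discards the distant past.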

Reservoir systems and filters. Consider now the RC system determined by (1.1)–(1.2) with reservoir map defined on subsets D_N, D'_N ⊂ R^N and D_n ⊂ R^n, that is, F: D_N × D_n → D'_N and h: D'_N → R^d. There are two properties of reservoir systems that will be crucial in what follows:

• Existence of solutions property: this property holds when for each z ∈ (D_n)^Z there exists an element x ∈ (D_N)^Z that satisfies the relation (1.1) for each t ∈ Z.

• Uniqueness of solutions or echo state property (ESP): it holds when the system has the existence of solutions property and, additionally, these solutions are unique.
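As a hedged numerical illustration (the scalar example below is ours, not the paper's), the linear reservoir x_t = a x_{t-1} + z_t with |a| < 1 behaves as the ESP requires on bounded inputs: iterating from two different initial conditions produces state trajectories that converge to each other, so the initial state is "washed out" and the bounded solution is unique; with |a| > 1 the discrepancy between initial states grows, so distinct solutions never merge:

```python
import numpy as np

def iterate(a, z, x0):
    """Run the scalar linear reservoir x_t = a * x_{t-1} + z_t."""
    xs = []
    x = x0
    for z_t in z:
        x = a * x + z_t
        xs.append(x)
    return np.array(xs)

rng = np.random.default_rng(2)
z = rng.uniform(-1, 1, 200)

# Contractive case: two arbitrary initial states are washed out.
gap_stable = np.abs(iterate(0.8, z, 5.0) - iterate(0.8, z, -5.0))
print(gap_stable[-1])    # ≈ 10 * 0.8**200, numerically zero

# Expanding case: the gap between initial states blows up instead.
gap_unstable = np.abs(iterate(1.2, z[:50], 5.0) - iterate(1.2, z[:50], -5.0))
print(gap_unstable[-1])  # ≈ 10 * 1.2**50 ≈ 9.1e4
```

In the linear case the gap between the two trajectories evolves autonomously as a^t times the initial gap, which is why the contraction condition decides between the two behaviors.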

The echo state property has received much attention in the context of echo state networks [Jaeg 10, Jaeg 04, Bueh 06, Yild 12, Bai 12, Wain 16, Manj 13, Gall 17]. We emphasize that these two properties are genuine conditions that are not automatically satisfied by all RC systems. Later on in the paper, Theorem 3.1 specifies sufficient conditions for them to hold.

The combination of the existence of solutions with the axiom of choice allows us to associate a filter U^F: (D_n)^Z → (D_N)^Z to each RC system with that property via the reservoir map and (1.1), that is, U^F(z)_t := x_t ∈ R^N, for all t ∈ Z. We will denote by U^F_h: (D_n)^Z → (D_d)^Z the corresponding filter determined by the entire reservoir system, that is, U^F_h(z)_t = h(U^F(z)_t) := y_t ∈ R^d. U^F_h is said to be a reservoir filter or a response map associated to the RC system (1.1)–(1.2). The filters U^F and U^F_h are causal by construction. A unique reservoir filter can be associated to a reservoir system when the echo state property holds. We warn the reader that reservoir filters appear in the literature only in the presence of the ESP; that is why we sometimes make the distinction between those that come from reservoir systems that do and do not satisfy the ESP by referring to them as reservoir filters and generalized reservoir filters, respectively.

In the systems theory literature, the RC equations (1.1)–(1.2) are referred to as the state-variable or the internal representation point of view, and the associated filters as the external representation of the system.

The next proposition shows that in the presence of the ESP, reservoir filters are not only causal but also time-invariant. In that situation we can hence associate to U^F_h a reservoir functional H^F_h: (D_n)^{Z_-} → R^d determined by H^F_h := H_{U^F_h}.

Proposition 2.1 Let D_N ⊂ R^N, D_n ⊂ R^n, and let F: D_N × D_n → D_N be a reservoir map that satisfies the echo state property for all the elements in (D_n)^Z. Then, the corresponding filter U^F: (D_n)^Z → (D_N)^Z is causal and time-invariant.

We emphasize that, as can be seen in the proof in the appendix, it is the autonomous character of the reservoir map that guarantees time-invariance in the previous proposition. An explicit dependence on time in that map would spoil that conclusion.

Reservoir system morphisms. Let N_1, N_2, n, d ∈ N and let F_1: D_{N_1} × D_n → D_{N_1}, h_1: D_{N_1} → R^d and F_2: D_{N_2} × D_n → D_{N_2}, h_2: D_{N_2} → R^d be two reservoir systems. We say that a map f: D_{N_1} → D_{N_2} is a morphism between the two systems when it satisfies the following two properties:

(i) Reservoir equivariance: f(F_1(x_1, z)) = F_2(f(x_1), z), for all x_1 ∈ D_{N_1} and z ∈ D_n.

(ii) Readout invariance: h_1(x_1) = h_2(f(x_1)), for all x_1 ∈ D_{N_1}.

When the map f has an inverse that is also a morphism between the systems determined by the pairs (F_2, h_2) and (F_1, h_1), we say that f is a system isomorphism and that the systems (F_1, h_1) and (F_2, h_2) are isomorphic. Given a system F_1: D_{N_1} × D_n → D_{N_1}, h_1: D_{N_1} → R^d and a bijection f: D_{N_1} → D_{N_2}, the map f is a system isomorphism with respect to the system F_2: D_{N_2} × D_n → D_{N_2}, h_2: D_{N_2} → R^d defined by

F_2(x_2, z) := f(F_1(f^{-1}(x_2), z)), for all x_2 ∈ D_{N_2}, z ∈ D_n,    (2.4)
h_2(x_2) := h_1(f^{-1}(x_2)), for all x_2 ∈ D_{N_2}.    (2.5)

The proof of the following statement is a straightforward consequence of the definitions.

Proposition 2.2 Let F_1: D_{N_1} × D_n → D_{N_1}, h_1: D_{N_1} → R^d and F_2: D_{N_2} × D_n → D_{N_2}, h_2: D_{N_2} → R^d be two reservoir systems. Let f: D_{N_1} → D_{N_2} be a morphism between them. Then:

(i) If x^1 ∈ (D_{N_1})^Z is a solution for the reservoir map F_1 associated to the input z ∈ (D_n)^Z, then the sequence x^2 ∈ (D_{N_2})^Z defined by x^2_t := f(x^1_t), t ∈ Z, is a solution for the reservoir map F_2 associated to the same input.

(ii) If U^{F_1}_{h_1} is a generalized reservoir filter for the system determined by the pair (F_1, h_1), then it is also a reservoir filter for the system (F_2, h_2). Equivalently, given a generalized reservoir filter U^{F_1}_{h_1} determined by (F_1, h_1), there exists a generalized reservoir filter U^{F_2}_{h_2} determined by (F_2, h_2) such that U^{F_1}_{h_1} = U^{F_2}_{h_2}.

(iii) If f is a system isomorphism, then the implications in the previous two points are reversible.

2.3 Continuity and the fading memory property

In agreement with the notation introduced in the previous section, in the following paragraphs the symbol U: (D_n)^{Z_-} → (D_N)^{Z_-} stands for a causal and time-invariant filter or, strictly speaking, for the restriction of U: (D_n)^Z → (D_N)^Z to Z_-, see (2.3); H_U: (D_n)^{Z_-} → D_N is the associated functional, for some D_N ⊂ R^N and D_n ⊂ R^n. Analogously, U_H is the filter associated to a given functional H.

Definition 2.3 (Continuous filters and functionals) Let D_N ⊂ R^N and D_n ⊂ R^n be bounded subsets such that (D_n)^{Z_-} ⊂ ℓ^∞_-(R^n) and (D_N)^{Z_-} ⊂ ℓ^∞_-(R^N). A causal and time-invariant filter U: (D_n)^{Z_-} → (D_N)^{Z_-} is called continuous when it is a continuous map between the metric spaces ((D_n)^{Z_-}, ‖·‖_∞) and ((D_N)^{Z_-}, ‖·‖_∞). An analogous prescription can be used to define continuous functionals H: ((D_n)^{Z_-}, ‖·‖_∞) → (D_N, ‖·‖).

The following proposition shows that when filters are causal and time-invariant, their continuity can be read out of their corresponding functionals and vice versa.

Proposition 2.4 Let D_n ⊂ R^n and D_N ⊂ R^N be such that (D_n)^{Z_-} ⊂ ℓ^∞_-(R^n) and (D_N)^{Z_-} ⊂ ℓ^∞_-(R^N). Let U: (D_n)^{Z_-} → (D_N)^{Z_-} be a causal and time-invariant filter, H: (D_n)^{Z_-} → D_N a functional, and let Φ and Ψ be the maps defined in the previous section. Then, if the filter U is continuous, so is the associated functional Ψ(U) =: H_U. Conversely, if H is continuous, then so is Φ(H) =: U_H.

Define now the vector spaces

F^∞_{(D_n)^{Z_-}} := {U: (D_n)^{Z_-} → ℓ^∞_-(R^N) | U is causal, time-invariant, and continuous},    (2.6)
H^∞_{(D_n)^{Z_-}} := {H: (D_n)^{Z_-} → R^N | H is continuous}.    (2.7)

The previous statements guarantee that the maps Ψ and Φ restrict to maps (that we denote with the same symbols) Ψ: F^∞_{(D_n)^{Z_-}} → H^∞_{(D_n)^{Z_-}} and Φ: H^∞_{(D_n)^{Z_-}} → F^∞_{(D_n)^{Z_-}} that are linear isomorphisms and are inverses of each other.

Definition 2.5 (Fading memory filters and functionals) Let w: N → (0, 1] be a weighting sequence and let D_N ⊂ R^N and D_n ⊂ R^n be such that (D_n)^{Z_-} ⊂ ℓ^w_-(R^n) and (D_N)^{Z_-} ⊂ ℓ^w_-(R^N). We say that a causal and time-invariant filter U: (D_n)^{Z_-} → (D_N)^{Z_-} (respectively, a functional H: (D_n)^{Z_-} → D_N) satisfies the fading memory property (FMP) with respect to the sequence w when it is a continuous map between the metric spaces ((D_n)^{Z_-}, ‖·‖_w) and ((D_N)^{Z_-}, ‖·‖_w) (respectively, ((D_n)^{Z_-}, ‖·‖_w) and (D_N, ‖·‖)). If the weighting sequence w is such that w_t = λ^t, for some λ ∈ (0, 1) and all t ∈ N, then U is said to have the λ-exponential fading memory property. We define the sets

F^w_{(D_n)^{Z_-},(D_N)^{Z_-}} := {U: (D_n)^{Z_-} → (D_N)^{Z_-} | U causal, time-invariant, and FMP w.r.t. w},    (2.8)
H^w_{(D_n)^{Z_-},(D_N)^{Z_-}} := {H: (D_n)^{Z_-} → D_N | H is FMP with respect to w}.    (2.9)

These definitions can be extended by replacing the product set (D_N)^{Z_-} by any subset of ℓ^w_-(R^N) that is not necessarily a product space. In particular, we define the sets

F^w_{(D_n)^{Z_-}} := {U: (D_n)^{Z_-} → ℓ^w_-(R^N) | U is causal, time-invariant, and FMP w.r.t. w},    (2.10)
H^w_{(D_n)^{Z_-}} := {H: (D_n)^{Z_-} → R^N | H is FMP with respect to w}.    (2.11)

Definitions 2.3 and 2.5 can be easily reformulated in terms of more familiar ε-δ-type criteria, as they were introduced in [Boyd 85]. For example, the continuity of the functional H: (D_n)^{Z_-} → D_N is equivalent to stating that for any z ∈ (D_n)^{Z_-} and any ε > 0, there exists a δ(ε) > 0 such that for any s ∈ (D_n)^{Z_-} that satisfies

‖z − s‖_∞ = sup_{t ∈ Z_-} {‖z_t − s_t‖} < δ(ε), then ‖H(z) − H(s)‖ < ε.    (2.12)

Regarding the fading memory property, it suffices to replace the implication in (2.12) by

‖z − s‖_w = sup_{t ∈ Z_-} {‖z_t − s_t‖ w_{-t}} < δ(ε), then ‖H(z) − H(s)‖ < ε.    (2.13)
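To see condition (2.13) at work numerically, consider again a discounted-sum functional (a hypothetical example of ours, not taken from the paper). Two inputs that differ arbitrarily in the distant past but agree recently are close in ‖·‖_w for a geometric weighting, and the functional's outputs are correspondingly close:

```python
import numpy as np

LAM = 0.5   # both the functional's discount and the weighting w_t = LAM**t

def H(z):
    """Discounted sum of the past; z[-1] is the entry at time 0."""
    weights = LAM ** np.arange(len(z) - 1, -1, -1)
    return float(np.sum(weights * z))

def weighted_dist(z, s):
    """Weighted distance over a finite window, with w_{-t} = LAM**(-t)."""
    weights = LAM ** np.arange(len(z) - 1, -1, -1)
    return np.max(np.abs(z - s) * weights)

rng = np.random.default_rng(3)
z = rng.uniform(-1, 1, 300)
s = z.copy()
s[:100] = rng.uniform(-1, 1, 100)   # wildly different distant past

print(weighted_dist(z, s))          # tiny: all differences sit at t <= -200
print(abs(H(z) - H(s)))             # tiny as well: the memory has faded
```

Here both quantities are bounded by a multiple of LAM**200, so inputs that agree in the recent past are indistinguishable to H up to that factor.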

A very important part of the results that follow concerns uniformly bounded families of sequences, that is, subsets of (R^n)^{Z_-} of the form

K_M := {z ∈ (R^n)^{Z_-} | ‖z_t‖ ≤ M for all t ∈ Z_-}, for some M > 0.    (2.14)

It is straightforward to show that K_M ⊂ ℓ^∞_-(R^n) ⊂ ℓ^w_-(R^n), for all M > 0 and any weighting sequence w. A very useful fact is that the relative topology induced by (ℓ^w_-(R^n), ‖·‖_w) on K_M coincides with the one induced by the product topology on (R^n)^{Z_-}. This is a consequence of the following result, which is a slight generalization of [Munk 14, Theorem 20.5]. A proof is provided in Appendix 5.3 for the sake of completeness.

Theorem 2.6 Let ‖·‖: R^n → [0, ∞) be a norm in R^n, M > 0, and let w: N → (0, 1] be a weighting sequence. Let d_M(a, b) := min{‖a − b‖, M}, a, b ∈ R^n, be a bounded metric on R^n and define the w-weighted metric D^M_w on (R^n)^{Z_-} as

D^M_w(x, y) := sup_{t ∈ Z_-} {d_M(x_t, y_t) w_{-t}}, x, y ∈ (R^n)^{Z_-}.    (2.15)

Then D^M_w is a metric that induces the product topology on (R^n)^{Z_-}. The space (R^n)^{Z_-} is complete relative to this metric.

An important consequence that can be drawn from this theorem is that all the weighted norms induce the same topology on the subspaces formed by uniformly bounded sequences. An obvious consequence of this fact is that continuity with respect to this topology can be defined without the help of weighting sequences or, equivalently, filters or functionals with uniformly bounded inputs that have the fading memory property with respect to one weighting sequence have the same feature with respect to any other weighting sequence. We make this more specific in the following statements.

Corollary 2.7 Let M > 0 and let K_M := {z ∈ (R^n)^{Z_-} | ‖z_t‖ ≤ M for all t ∈ Z_-} be the subset of (R^n)^{Z_-} formed by uniformly bounded sequences. Let w: N → (0, 1] be an arbitrary weighting sequence. Then, the metric induced by the weighted norm ‖·‖_w on K_M coincides with D^{2M}_w. Moreover, since D^{2M}_w induces the product topology on K_M = B̄_{‖·‖}(0, M)^{Z_-}, we can conclude that all the weighted norms induce the same topology on K_M. We recall that B̄_{‖·‖}(0, M) is the closure of the ball with radius M centered at the origin, with respect to the norm ‖·‖ in R^n. The same conclusion holds when instead of K_M we consider the set (D_n)^{Z_-}, with D_n a compact subset of R^n.

Theorem 2.6 can also be used to give a quick alternative proof, in discrete time, of an important compactness result originally formulated by Boyd and Chua in [Boyd 85, Lemma 1] for continuous time and, later on, in [Grig 17] for discrete time. The next corollary contains an additional completeness statement.

Corollary 2.8 Let K_M be the set of uniformly bounded sequences defined as in (2.14), and let w: N → (0, 1] be a weighting sequence. Then, (K_M, ‖·‖_w) is a compact, complete, and convex subset of the Banach space (ℓ^w_-(R^n), ‖·‖_w). The compactness and completeness statements also hold when instead of K_M we consider the set (D_n)^{Z_-}, with D_n a compact subset of R^n; if D_n is additionally convex, then the convexity of (D_n)^{Z_-} is also guaranteed.

It is important to point out that the coincidence between the product topology and the topologies induced by weighted norms that we described in Corollary 2.7 only occurs for uniformly bounded sets of the type introduced in (2.14). As we state in the next result, the norm topology on ℓ^w_-(R^n) is strictly finer than the one induced by the product topology on (R^n)^{Z_-}.

Proposition 2.9 Let w: N → (0, 1] be a weighting sequence and let (ℓ^w_-(R^n), ‖·‖_w) be the Banach space constructed using the corresponding weighted norm on the space of left infinite sequences with elements in R^n. The norm topology on ℓ^w_-(R^n) is strictly finer than the subspace topology induced by the product topology of (R^n)^{Z_-} on ℓ^w_-(R^n) ⊂ (R^n)^{Z_-}.

The results that we just proved imply an elementary property of the sets that we defined in (2.8)-(2.9) and (2.10)-(2.11), which we state in the following lemma.

Lemma 2.10 Let M > 0 and let w be a weighting sequence. Let U: K_M → ℓ^w_-(R^N) (respectively, H: K_M → R^N) be an element of F^w_{K_M} (respectively, H^w_{K_M}). Then there exists L > 0 such that U(K_M) ⊂ K_L (respectively, H(K_M) ⊂ B̄_{‖·‖}(0, L)) and we can hence conclude that U ∈ F^w_{K_M,K_L} (respectively, H ∈ H^w_{K_M,K_L}). Conversely, the inclusion F^w_{K_M,K_L} ⊂ F^w_{K_M} (respectively, H^w_{K_M,K_L} ⊂ H^w_{K_M}) holds true for any M > 0. The sets F^w_{K_M} and H^w_{K_M} are vector spaces.

The next proposition spells out how the fading memory property is independent of the weighting sequence that is used to define it, which shows its intrinsically topological nature. A conceptual consequence of this fact is that the fading memory property does not contain any information about the rate at which the systems that have it "forget" inputs. A similar statement in the continuous time setup has been formulated in [Sand 03]. Additionally, there is a bijection between FMP filters and functionals.

Proposition 2.11 Let K_M ⊂ (R^n)^{Z_-} and K_L ⊂ (R^N)^{Z_-} be subsets of uniformly bounded sequences defined as in (2.14) and let w: N → (0, 1] be a weighting sequence. Let U: K_M → K_L be a causal and time-invariant filter and let H: K_M → B̄_{‖·‖}(0, L) be a functional. Then:

(i) If U (respectively H) has the fading memory property with respect to the weighting sequence w, then it has the same property with respect to any other weighting sequence. In particular, this implies that

F^w_{K_M,K_L} = F^{w'}_{K_M,K_L} and H^w_{K_M,K_L} = H^{w'}_{K_M,K_L}, for any weighting sequence w'.

In what follows we just say that U (respectively H) has the fading memory property and denote

F^FMP_{K_M,K_L} := F^w_{K_M,K_L} and H^FMP_{K_M,K_L} := H^w_{K_M,K_L}, for any weighting sequence w.

The same statement holds true for the vector spaces F^w_{K_M} and H^w_{K_M}, which will be denoted in the sequel by F^FMP_{K_M} and H^FMP_{K_M}, respectively.

(ii) Let Φ and Ψ be the maps defined in the previous section. Then, if the filter U has the fading memory property, so does the associated functional Ψ(U) =: H_U. Analogously, if H has the fading memory property, then so does Φ(H) =: U_H. This implies that the maps Ψ and Φ restrict to maps (that we denote with the same symbols) Ψ: F^FMP_{K_M,K_L} → H^FMP_{K_M,K_L} and Φ: H^FMP_{K_M,K_L} → F^FMP_{K_M,K_L} that are inverses of each other. The same applies to Ψ: F^FMP_{K_M} → H^FMP_{K_M} and Φ: H^FMP_{K_M} → F^FMP_{K_M} that, in this case, are linear isomorphisms.

The same statements can be formulated when instead of K_M and K_L we consider the sets (D_n)^{Z_-} and (D_N)^{Z_-}, with D_n and D_N compact subsets of R^n and R^N, respectively.

In the conditions of the previous proposition, the vector spaces F^FMP_{K_M} and H^FMP_{K_M} can be endowed with a norm. More specifically, let U: K_M → ℓ^w_-(R^N) be a filter and let H: K_M → R^N be a functional that have the FMP. Define:

|||U|||_∞ := sup_{z ∈ K_M} {‖U(z)‖_∞} = sup_{z ∈ K_M} {sup_{t ∈ Z_-} {‖U(z)_t‖}},    (2.16)
|||H|||_∞ := sup_{z ∈ K_M} {‖H(z)‖}.    (2.17)

The compactness of (K_M, ‖·‖_w) guaranteed by Corollary 2.8 and the fact that, by Lemma 2.10, U and H map into uniformly bounded sequences and a compact subspace of R^N, respectively, ensure that the values in (2.16) and (2.17) are finite, which makes (F^FMP_{K_M}, |||·|||_∞) and (H^FMP_{K_M}, |||·|||_∞) into normed spaces that, as we will see in the next result, are linearly homeomorphic. For any L > 0 these norms restrict to the spaces F^FMP_{K_M,K_L} and H^FMP_{K_M,K_L}, which are in general not linear spaces but become nevertheless metric spaces.

Proposition 2.12 The linear isomorphism $\Psi: \left(F^{FMP}_{K_M}, |||\cdot|||_\infty\right) \to \left(H^{FMP}_{K_M}, |||\cdot|||_\infty\right)$ and its inverse $\Phi$ satisfy
\[
|||\Psi(U)|||_\infty \le |||U|||_\infty, \quad \text{for any } U \in F^{FMP}_{K_M}, \tag{2.18}
\]
\[
|||\Phi(H)|||_\infty \le |||H|||_\infty, \quad \text{for any } H \in H^{FMP}_{K_M}. \tag{2.19}
\]
These inequalities imply that the two maps are continuous linear bijections and hence that the spaces $\left(F^{FMP}_{K_M}, |||\cdot|||_\infty\right)$ and $\left(H^{FMP}_{K_M}, |||\cdot|||_\infty\right)$ are linearly homeomorphic. Equivalently, the following diagram commutes and all the maps in it are linear and continuous:
\[
\begin{array}{ccc}
\left(F^{FMP}_{K_M}, |||\cdot|||_\infty\right) & \overset{\Psi}{\longrightarrow} & \left(H^{FMP}_{K_M}, |||\cdot|||_\infty\right)\\[1mm]
{\scriptstyle \mathrm{Id}_{F^{FMP}_{K_M}}}\Big\uparrow & & \Big\downarrow{\scriptstyle \mathrm{Id}_{H^{FMP}_{K_M}}}\\[1mm]
\left(F^{FMP}_{K_M}, |||\cdot|||_\infty\right) & \overset{\Phi}{\longleftarrow} & \left(H^{FMP}_{K_M}, |||\cdot|||_\infty\right).
\end{array}
\]
For any $L > 0$, the inclusions $\left(F^{FMP}_{K_M,K_L}, |||\cdot|||_\infty\right) \hookrightarrow \left(F^{FMP}_{K_M}, |||\cdot|||_\infty\right)$ and $\left(H^{FMP}_{K_M,K_L}, |||\cdot|||_\infty\right) \hookrightarrow \left(H^{FMP}_{K_M}, |||\cdot|||_\infty\right)$ (see Lemma 2.10) are continuous, and so are the restricted bijections (that we denote with the same symbols) $\Psi: \left(F^{FMP}_{K_M,K_L}, |||\cdot|||_\infty\right) \to \left(H^{FMP}_{K_M,K_L}, |||\cdot|||_\infty\right)$ and $\Phi: \left(H^{FMP}_{K_M,K_L}, |||\cdot|||_\infty\right) \to \left(F^{FMP}_{K_M,K_L}, |||\cdot|||_\infty\right)$, which are inverses of each other. The last statement is a consequence of the following inequalities:
\[
|||\Psi(U_1) - \Psi(U_2)|||_\infty \le |||U_1 - U_2|||_\infty, \quad \text{for any } U_1, U_2 \in F^{FMP}_{K_M,K_L}, \tag{2.20}
\]
\[
|||\Phi(H_1) - \Phi(H_2)|||_\infty \le |||H_1 - H_2|||_\infty, \quad \text{for any } H_1, H_2 \in H^{FMP}_{K_M,K_L}. \tag{2.21}
\]
The same statements can be formulated when, instead of $K_M$ and $K_L$, we consider the sets $(D_n)^{\mathbb{Z}_-}$ and $(D_N)^{\mathbb{Z}_-}$, with $D_n$ and $D_N$ compact subsets of $\mathbb{R}^n$ and $\mathbb{R}^N$, respectively.


3 Internal approximation of reservoir filters

This section characterizes situations under which reservoir filters can be uniformly approximated by finding uniform approximants for the corresponding reservoir systems. Such a statement is part of the next theorem, which also identifies criteria for the availability of the echo state and the fading memory properties (recall that we use the acronyms ESP and FMP, respectively). As already mentioned, a reservoir system has the ESP when it has a unique semi-infinite solution for each semi-infinite input. We also recall that in the presence of uniformly bounded inputs, as shown in Section 2.3, the FMP amounts to the continuity of a reservoir filter with respect to the product topologies on the input and output spaces. The completeness and compactness of those spaces established in Corollary 2.8 allow us to use various fixed point theorems to show that solutions of reservoir systems exist under very weak hypotheses and that, for contracting and continuous reservoir maps (defined below), these solutions are unique and depend continuously on the inputs. Said differently, contracting continuous reservoir maps induce reservoir filters that automatically have the echo state and the fading memory properties.

Theorem 3.1 Let $K_M \subset (\mathbb{R}^n)^{\mathbb{Z}_-}$ and $K_L \subset (\mathbb{R}^N)^{\mathbb{Z}_-}$ be subsets of uniformly bounded sequences defined as in (2.14) and let $F: \overline{B_{\|\cdot\|}}(0,L) \times \overline{B_{\|\cdot\|}}(0,M) \to \overline{B_{\|\cdot\|}}(0,L)$ be a continuous reservoir map.

(i) Existence of solutions: for each $z \in K_M$ there exists an $x \in K_L$ (not necessarily unique) that solves the reservoir equation associated to $F$, that is,
\[
x_t = F(x_{t-1}, z_t), \quad \text{for all } t \in \mathbb{Z}_-.
\]

(ii) Uniqueness and continuity of solutions (ESP and FMP): suppose that the reservoir map $F$ is a contraction, that is, there exists $0 < r < 1$ such that for all $u, v \in \overline{B_{\|\cdot\|}}(0,L)$ and $z \in \overline{B_{\|\cdot\|}}(0,M)$, one has
\[
\|F(u,z) - F(v,z)\| \le r\|u - v\|.
\]
Then the reservoir system associated to $F$ has the echo state property. Moreover, this system has a unique associated causal and time-invariant filter $U_F: K_M \to K_L$ that has the fading memory property, that is, $U_F \in F^{FMP}_{K_M,K_L}$. The set $U_F(K_M)$ of accessible states of the filter $U_F$ is compact.

(iii) Internal approximation property: let $F_1, F_2: \overline{B_{\|\cdot\|}}(0,L) \times \overline{B_{\|\cdot\|}}(0,M) \to \overline{B_{\|\cdot\|}}(0,L)$ be two continuous reservoir maps such that $F_1$ is a contraction with constant $0 < r < 1$ and $F_2$ has the existence of solutions property. Let $U_{F_1}, U_{F_2}: K_M \to K_L$ be the corresponding filters (if $F_2$ does not have the ESP, then $U_{F_2}$ is just a generalized filter). Then, for any $\epsilon > 0$,
\[
\|F_1 - F_2\|_\infty < \delta(\epsilon) := \epsilon(1-r) \quad \text{implies that} \quad |||U_{F_1} - U_{F_2}|||_\infty < \epsilon. \tag{3.1}
\]

Part (i) also holds true when, instead of $K_M$ and $K_L$, we consider the sets $(D_n)^{\mathbb{Z}_-}$ and $(D_N)^{\mathbb{Z}_-}$, with $D_n$ and $D_N$ compact and convex subsets of $\mathbb{R}^n$ and $\mathbb{R}^N$, respectively, that replace the closed balls $\overline{B_{\|\cdot\|}}(0,M)$ and $\overline{B_{\|\cdot\|}}(0,L)$. The same applies to parts (ii) and (iii) but, this time, the convexity hypothesis is not needed.

Define the set $\mathcal{K}_{K_M,K_L} := \left\{F: \overline{B_{\|\cdot\|}}(0,L) \times \overline{B_{\|\cdot\|}}(0,M) \to \overline{B_{\|\cdot\|}}(0,L) \mid F \text{ is a continuous contraction}\right\}$. Using the notation introduced in the previous section, the statement in (3.1) and part (ii) of the theorem automatically imply that the map
\[
\begin{array}{cccc}
\Xi: & \left(\mathcal{K}_{K_M,K_L}, \|\cdot\|_\infty\right) & \longrightarrow & \left(F^{FMP}_{K_M,K_L}, |||\cdot|||_\infty\right)\\
& F & \longmapsto & U_F
\end{array}
\]
is continuous and, by Proposition 2.12, so is the map that associates to each $F \in \mathcal{K}_{K_M,K_L}$ the corresponding functional $H_F$, that is,
\[
\begin{array}{cccc}
\Psi \circ \Xi: & \left(\mathcal{K}_{K_M,K_L}, \|\cdot\|_\infty\right) & \longrightarrow & \left(H^{FMP}_{K_M,K_L}, |||\cdot|||_\infty\right)\\
& F & \longmapsto & H_F.
\end{array}
\]
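Although not part of the formal development, the internal approximation bound (3.1) is easy to verify numerically. The following Python sketch (an illustration with arbitrary choices: one-dimensional states, $F_1(x,z) = r\tanh(x+z)$, and a small perturbation of it as $F_2$) approximates the semi-infinite filter values by iterating from a remote past, which the contraction justifies, and checks that the filter distance stays below $\delta/(1-r)$.

```python
import numpy as np

# Illustration of the bound (3.1): if ||F1 - F2||_inf < delta = (1 - r) * eps,
# then the induced filters satisfy |||U_F1 - U_F2|||_inf <= eps.
rng = np.random.default_rng(0)
r = 0.5                      # contraction constant of F1
delta = 0.01                 # uniform distance between the two reservoir maps

def F1(x, z):
    return r * np.tanh(x + z)            # contraction in x with constant <= r

def F2(x, z):
    return F1(x, z) + delta * np.cos(x)  # hence ||F1 - F2||_inf <= delta

# Approximate the semi-infinite solutions by iterating from a remote past;
# the contraction washes out the (arbitrary) initial condition.
z = rng.uniform(-1.0, 1.0, size=5000)
x1 = x2 = 0.0
gap = 0.0
for t in range(len(z)):
    x1, x2 = F1(x1, z[t]), F2(x2, z[t])
    if t > 1000:                         # discard the transient
        gap = max(gap, abs(x1 - x2))

eps = delta / (1 - r)                    # bound predicted by (3.1)
print(gap, "<=", eps)
```

The observed gap is strictly below $\delta/(1-r) = 0.02$, in agreement with the geometric-series argument used in the proof of part (iii) below.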

Proof of the theorem. (i) We start by defining, for each $z \in K_M$, the map
\[
\begin{array}{cccc}
F_z: & K_L & \longrightarrow & K_L\\
& x & \longmapsto & (F_z(x))_t := F(x_{t-1}, z_t).
\end{array}
\]
We first show that $F_z$ can be written as a product of continuous functions. Indeed,
\[
F_z = \prod_{t \in \mathbb{Z}_-} F(\cdot, z_t) \circ p_{t-1}, \tag{3.2}
\]
where the projections $p_t: K_L \to \overline{B_{\|\cdot\|}}(0,L)$ are given by $p_t(x) = x_t$. These projections are continuous when we consider on $K_L$ the product topology. Additionally, the continuity of the reservoir map $F$ implies that $F_z$ is a product of continuous functions, which ensures that $F_z$ is itself continuous [Munk 14, Theorem 19.6]. Moreover, by Corollaries 2.7 and 2.8, the space $K_L$ is a compact and convex subset of the Banach space $\left(\ell^w_-(\mathbb{R}^N), \|\cdot\|_w\right)$ (see Proposition 5.2), for any weighting sequence $w$. Schauder's Fixed Point Theorem (see [Shap 16, Theorem 7.1, page 75]) then guarantees that $F_z$ has at least one fixed point, that is, a point $x \in K_L$ that satisfies $F_z(x) = x$ or, equivalently,
\[
x_t = F(x_{t-1}, z_t), \quad \text{for all } t \in \mathbb{Z}_-,
\]
which implies that $x$ is a solution of $F$ for $z$, as required.

Proof of part (ii). The main tool in the proof of this part is a parameter-dependent version of the Contraction Fixed Point Theorem, which we include here for the sake of completeness and whose proof can be found in [Ster 10, Theorem 6.4.1, page 137].

Lemma Let $(X, d_X)$ be a complete metric space and let $Z$ be a metric space. Let $K: X \times Z \to X$ be a continuous map such that for each $z \in Z$, the map $K_z: X \to X$ given by $K_z(x) := K(x,z)$ is a contraction with constant $0 < r < 1$ (independent of $z$), that is, $d_X(K(x,z), K(y,z)) \le r\,d_X(x,y)$, for all $x, y \in X$ and all $z \in Z$. Then:

(i) For each $z \in Z$, the map $K_z$ has a unique fixed point in $X$.

(ii) The map $U_K: Z \to X$ that associates to each point $z \in Z$ the unique fixed point of $K_z$ is continuous.

Consider now the map
\[
\begin{array}{cccc}
\mathcal{F}: & K_L \times K_M & \longrightarrow & K_L\\
& (x, z) & \longmapsto & (\mathcal{F}(x,z))_t := F(x_{t-1}, z_t).
\end{array}
\]
First, as we did in (3.2), it is easy to show that $\mathcal{F}$ is continuous with respect to the product topologies on $K_M$ and $K_L$ by writing it as a product of compositions of continuous functions. Second, we show that the map $\mathcal{F}$ is a contraction. Indeed, since by Corollary 2.7 we can choose an arbitrary weighting sequence to generate the product topologies on $K_M$ and $K_L$, we select $w: \mathbb{N} \to (0,1]$ given by $w_t := \lambda^t$, with $t \in \mathbb{N}$ and $\lambda > 0$ satisfying $0 < r < \lambda < 1$. Then, for any $x, y \in K_L$ and any $z \in K_M$, we have
\[
\|\mathcal{F}(x,z) - \mathcal{F}(y,z)\|_w = \sup_{t \in \mathbb{Z}_-} \|F(x_{t-1}, z_t) - F(y_{t-1}, z_t)\|\lambda^{-t} \le \sup_{t \in \mathbb{Z}_-} \|x_{t-1} - y_{t-1}\| r \lambda^{-t},
\]
where we used that $F$ is a contraction. Now, since $0 < r < \lambda < 1$ and hence $r/\lambda < 1$, we have
\[
\sup_{t \in \mathbb{Z}_-} \|x_{t-1} - y_{t-1}\| r \lambda^{-t} = \sup_{t \in \mathbb{Z}_-} \left\{ \|x_{t-1} - y_{t-1}\| \lambda^{-(t-1)} \frac{r}{\lambda} \right\} \le \frac{r}{\lambda} \|x - y\|_w.
\]
This shows that $\mathcal{F}$ is a family of contractions with constant $r/\lambda < 1$ that is continuously parametrized by the elements in $K_M$. The lemma above implies the existence of a continuous map $U_F: (K_M, \|\cdot\|_w) \to (K_L, \|\cdot\|_w)$ that is uniquely determined by the identity
\[
\mathcal{F}(U_F(z), z) = U_F(z), \quad \text{for all } z \in K_M.
\]
Proposition 2.1 implies that $U_F$ is causal and time-invariant. The set $U_F(K_M)$ of accessible states of the filter $U_F$ is compact because it is the image of a compact set (see Corollary 2.8) under a continuous map (see [Munk 14, Theorem 26.5, page 166]).

Proof of part (iii). Let $z \in K_M$ and let $U_{F_1}(z)$ be the unique solution for $z$ of the reservoir system associated to $F_1$, available by part (ii) of the theorem that we just proved. Additionally, let $U_{F_2}(z)$ be the value of a generalized filter associated to $F_2$, which exists by hypothesis. Then, for any $t \in \mathbb{Z}_-$, we have:
\begin{align*}
\|U_{F_1}(z)_t - U_{F_2}(z)_t\| &= \|F_1(U_{F_1}(z)_{t-1}, z_t) - F_2(U_{F_2}(z)_{t-1}, z_t)\|\\
&= \|F_1(U_{F_1}(z)_{t-1}, z_t) - F_1(U_{F_2}(z)_{t-1}, z_t) + F_1(U_{F_2}(z)_{t-1}, z_t) - F_2(U_{F_2}(z)_{t-1}, z_t)\|\\
&\le \|F_1(U_{F_1}(z)_{t-1}, z_t) - F_1(U_{F_2}(z)_{t-1}, z_t)\| + \|F_1(U_{F_2}(z)_{t-1}, z_t) - F_2(U_{F_2}(z)_{t-1}, z_t)\|\\
&\le r\|U_{F_1}(z)_{t-1} - U_{F_2}(z)_{t-1}\| + \|F_1(U_{F_2}(z)_{t-1}, z_t) - F_2(U_{F_2}(z)_{t-1}, z_t)\|.
\end{align*}
If we now recursively apply the same procedure $n$ times to the first summand of this expression, we obtain
\begin{multline}
\|U_{F_1}(z)_t - U_{F_2}(z)_t\| \le r^n \|U_{F_1}(z)_{t-n} - U_{F_2}(z)_{t-n}\| + \|F_1(U_{F_2}(z)_{t-1}, z_t) - F_2(U_{F_2}(z)_{t-1}, z_t)\|\\
+ r\|F_1(U_{F_2}(z)_{t-2}, z_{t-1}) - F_2(U_{F_2}(z)_{t-2}, z_{t-1})\|\\
+ \cdots + r^{n-1}\left\|F_1(U_{F_2}(z)_{t-n}, z_{t-(n-1)}) - F_2(U_{F_2}(z)_{t-n}, z_{t-(n-1)})\right\|. \tag{3.3}
\end{multline}
If we combine inequality (3.3) with the hypothesis
\[
\|F_1 - F_2\|_\infty = \sup_{x \in \overline{B_{\|\cdot\|}}(0,L),\, z \in \overline{B_{\|\cdot\|}}(0,M)} \{\|F_1(x,z) - F_2(x,z)\|\} < \delta(\epsilon) := \epsilon(1-r),
\]
we obtain
\[
\|U_{F_1}(z) - U_{F_2}(z)\|_\infty = \sup_{t \in \mathbb{Z}_-} \{\|U_{F_1}(z)_t - U_{F_2}(z)_t\|\} \le 2Lr^n + (1 + \cdots + r^{n-1})\delta(\epsilon) = 2Lr^n + \frac{1 - r^n}{1 - r}\delta(\epsilon). \tag{3.4}
\]
Since this inequality is valid for any $n \in \mathbb{N}$, we can take the limit $n \to \infty$ and obtain
\[
\|U_{F_1}(z) - U_{F_2}(z)\|_\infty \le \frac{\delta(\epsilon)}{1 - r} = \epsilon.
\]
Additionally, as this relation is valid for any $z \in K_M$, we can conclude that
\[
|||U_{F_1} - U_{F_2}|||_\infty = \sup_{z \in K_M} \{\|U_{F_1}(z) - U_{F_2}(z)\|_\infty\} \le \epsilon,
\]
as required. $\blacksquare$

As a straightforward corollary of the first part of the previous theorem, it is easy to show that echo state networks always have (generalized) reservoir filters associated to them, as well as to formulate conditions that ensure simultaneously the echo state and the fading memory properties.

We recall that a map $\sigma: \mathbb{R} \to [-1,1]$ is a squashing function if it is non-decreasing, $\lim_{x \to -\infty} \sigma(x) = -1$, and $\lim_{x \to \infty} \sigma(x) = 1$.

Corollary 3.2 Consider the echo state network given by
\begin{align}
x_t &= \sigma(Ax_{t-1} + Cz_t + \zeta), \tag{3.5}\\
y_t &= Wx_t, \tag{3.6}
\end{align}
where $C \in \mathbb{M}_{N,n}$ for some $N \in \mathbb{N}$, $\zeta \in \mathbb{R}^N$, $A \in \mathbb{M}_{N,N}$, $W \in \mathbb{M}_{d,N}$, and the input signal $z \in (D_n)^{\mathbb{Z}}$, with $D_n \subset \mathbb{R}^n$ a compact and convex subset. The function $\sigma: \mathbb{R}^N \to [-1,1]^N$ in (3.5) is constructed by componentwise application of a squashing function that we also denote by $\sigma$. Then:

(i) If the squashing function $\sigma$ is continuous, then the reservoir equation (3.5) has the existence of solutions property and we can hence associate to the system (3.5)-(3.6) a generalized reservoir filter.

(ii) If the squashing function $\sigma$ is differentiable with Lipschitz constant $L_\sigma := \sup_{x \in \mathbb{R}} \{|\sigma'(x)|\} < \infty$ and the matrix $A$ is such that $\|A\|_2 L_\sigma = \sigma_{\max}(A) L_\sigma < 1$, then the reservoir system (3.5)-(3.6) has the echo state and the fading memory properties and we can hence associate to it a unique time-invariant reservoir filter.

The statement in part (i) remains valid when $[-1,1]^N$ is replaced by a compact and convex subset $D_N \subset [-1,1]^N$ that is left invariant by the reservoir equation (3.5), that is, $\sigma(Ax + Cz + \zeta) \in D_N$ for any $x \in D_N$ and any $z \in D_n$. The same applies to part (ii), but then only the compactness hypothesis is necessary.

Remark 3.3 The hypothesis $\|A\|_2 L_\sigma < 1$ appears in the literature as a sufficient condition that ensures the echo state property, which has been extensively studied in the ESN literature [Jaeg 10, Jaeg 04, Bueh 06, Bai 12, Yild 12, Wain 16, Manj 13]. Our result shows that this condition automatically implies the fading memory property as well. Nevertheless, that condition is far from sharp and has been significantly improved in [Bueh 06, Yild 12]. We point out that the enhanced sufficient conditions for the echo state property contained in those references also imply the fading memory property via part (ii) of Theorem 3.1.
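The sufficient condition $\|A\|_2 L_\sigma < 1$ of Corollary 3.2(ii) is straightforward to enforce in practice by rescaling a randomly drawn connectivity matrix. The following Python sketch (illustrative; the dimensions and distributions are arbitrary choices, not prescriptions of the paper) builds such an ESN with $\sigma = \tanh$ (so $L_\sigma = 1$) and observes the echo state property numerically: two different initial states are driven together geometrically fast by a common input.

```python
import numpy as np

rng = np.random.default_rng(1)
N, n = 50, 2                       # reservoir and input dimensions (arbitrary)

A = rng.standard_normal((N, N))
A *= 0.9 / np.linalg.norm(A, 2)    # enforce ||A||_2 * L_sigma = 0.9 < 1 (tanh has L_sigma = 1)
C = rng.standard_normal((N, n))
zeta = rng.standard_normal(N)

def step(x, z):
    # reservoir equation (3.5) with sigma = tanh applied componentwise
    return np.tanh(A @ x + C @ z + zeta)

# Echo state property in action: a common input washes out two
# different initial conditions with rate at most 0.9 per step.
x, y = np.ones(N), -np.ones(N)
for _ in range(300):
    z = rng.uniform(-1.0, 1.0, size=n)
    x, y = step(x, z), step(y, z)

print(np.linalg.norm(x - y))       # at most 0.9**300 * ||x0 - y0||
```

By Corollary 3.2(ii), the same rescaling also guarantees the fading memory property of the induced filter.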

4 Echo state networks as universal uniform approximants

The internal approximation property that we introduced in part (iii) of Theorem 3.1 tells us that we can approximate any reservoir filter by finding an approximant for the reservoir system that generates it. This reduces the problem of proving a density statement in a space of operators between infinite-dimensional spaces to proving one in a space of functions with finite-dimensional domains and targets, a topic that is the subject of many results in approximation theory, some of which we mentioned in the introduction. This strategy allows one to find simple approximating reservoir filters for any reservoir system that has the fading memory property. In the next result we use as approximating family the echo state networks that we presented in the introduction and that, as we will see later on, are the natural generalizations of neural networks in a dynamic learning setup, with the important added feature that they are constructed using linear readouts. The combination of this approach with a previously obtained result [Grig 17] on the density of reservoir filters in the fading memory category allows us to prove in the next theorem that echo state networks can approximate any fading memory filter. In other words, echo state networks are universal.

All along this section, we use the Euclidean norm on the finite-dimensional spaces, that is, for each $x \in \mathbb{R}^n$ we write $\|x\| := \left( \sum_{i=1}^n x_i^2 \right)^{1/2}$. For any $M > 0$, the symbol $B_{\|\cdot\|}(0,M)$ (respectively, $\overline{B_{\|\cdot\|}}(0,M)$) denotes here the open (respectively, closed) ball with respect to that norm. Additionally, we set $I_n := \overline{B_{\|\cdot\|}}(0,1)$.

Theorem 4.1 Let $U: I_n^{\mathbb{Z}_-} \to (\mathbb{R}^d)^{\mathbb{Z}_-}$ be a causal and time-invariant filter that has the fading memory property. Then, for any $\epsilon > 0$ and any weighting sequence $w$, there is an echo state network
\begin{align}
x_t &= \sigma(Ax_{t-1} + Cz_t + \zeta), \tag{4.1}\\
y_t &= Wx_t, \tag{4.2}
\end{align}
whose associated generalized filters $U_{ESN}: I_n^{\mathbb{Z}_-} \to (\mathbb{R}^d)^{\mathbb{Z}_-}$ satisfy
\[
|||U - U_{ESN}|||_\infty < \epsilon. \tag{4.3}
\]
In these expressions, $C \in \mathbb{M}_{N,n}$ for some $N \in \mathbb{N}$, $\zeta \in \mathbb{R}^N$, $A \in \mathbb{M}_{N,N}$, and $W \in \mathbb{M}_{d,N}$. The function $\sigma: \mathbb{R}^N \to [-1,1]^N$ in (4.1) is constructed by componentwise application of a continuous squashing function $\sigma: \mathbb{R} \to [-1,1]$ that we denote with the same symbol.

When the approximating echo state network (4.1)-(4.2) satisfies the echo state property, it has a unique associated filter $U_{ESN}$, which is necessarily time-invariant. The corresponding reservoir functional $H_{ESN}: I_n^{\mathbb{Z}_-} \to \mathbb{R}^d$ satisfies
\[
|||H_U - H_{ESN}|||_\infty < \epsilon. \tag{4.4}
\]

Remark 4.2 Echo state networks are generally used in practice in the following way: the architecture parameters $A$, $C$, and $\zeta$ are drawn at random from a given distribution and only the readout matrix $W$ is trained using a teaching signal by solving a linear regression problem. It is important to emphasize that the universality theorem that we just stated does not completely explain the empirically observed robustness of ESNs with respect to the choice of those parameters. In the context of standard feedforward neural networks this feature has been addressed using, for example, the so-called extreme learning machines [Huan 06]. In dynamical setups and for ESNs this question remains an open problem that will be addressed in future works.
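As a concrete illustration of the workflow described in this remark, the following Python sketch draws $A$, $C$, and $\zeta$ at random, collects the reservoir states over a teacher signal, and fits only $W$ by a ridge-regularized linear regression. All sizes, scalings, and the target task (a lag-one memory of the input) are illustrative choices of ours, not prescriptions of the paper.

```python
import numpy as np

rng = np.random.default_rng(2)
N, T = 100, 2000                       # reservoir size and sample length (arbitrary)

# Randomly drawn architecture parameters; only W is trained.
A = rng.standard_normal((N, N))
A *= 0.9 / np.linalg.norm(A, 2)        # contraction condition of Corollary 3.2 (ii)
c = rng.standard_normal(N)             # input weights (scalar input)
zeta = 0.1 * rng.standard_normal(N)

z = rng.uniform(-1.0, 1.0, size=T)     # teacher input
target = np.roll(z, 1)                 # teacher output: lag-one memory (illustrative)
target[0] = 0.0

X = np.zeros((T, N))                   # collected reservoir states
x = np.zeros(N)
for t in range(T):
    x = np.tanh(A @ x + c * z[t] + zeta)
    X[t] = x

# Linear readout: ridge regression for W in y_t = W x_t.
lam = 1e-8
W = np.linalg.solve(X.T @ X + lam * np.eye(N), X.T @ target)

mse = np.mean((X[200:] @ W - target[200:]) ** 2)   # skip the initial transient
print(mse)
```

Note that the linear readout makes training a convex problem, which is the practical appeal of the ESN architecture that Theorem 4.1 backs with a universality guarantee.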

Proof of the theorem. As we already explained, we proceed by first approximating the filter $U$ by one of the non-homogeneous state-affine system (SAS) reservoir filters introduced in [Grig 17], and we later show that we can approximate that reservoir filter by an echo state network of the form (4.1)-(4.2).

We start by recalling that a non-homogeneous state-affine system is a reservoir system determined by the state-space transformation:
\begin{align}
x_t &= p(z_t)x_{t-1} + q(z_t), \tag{4.5}\\
y_t &= W_1 x_t, \tag{4.6}
\end{align}
where the inputs $z_t \in I_n := \overline{B_{\|\cdot\|}}(0,1)$, the states $x_t \in \mathbb{R}^{N_1}$, for some $N_1 \in \mathbb{N}$, and $W_1 \in \mathbb{M}_{d,N_1}$. The symbols $p(z_t)$ and $q(z_t)$ stand for polynomials with matrix coefficients of degrees $r$ and $s$, respectively, of the form
\[
p(z) = \sum_{\substack{i_1,\ldots,i_n \in \{0,\ldots,r\}\\ i_1+\cdots+i_n \le r}} z_1^{i_1} \cdots z_n^{i_n} A_{i_1,\ldots,i_n}, \quad A_{i_1,\ldots,i_n} \in \mathbb{M}_{N_1}, \quad z \in I_n,
\]
\[
q(z) = \sum_{\substack{i_1,\ldots,i_n \in \{0,\ldots,s\}\\ i_1+\cdots+i_n \le s}} z_1^{i_1} \cdots z_n^{i_n} B_{i_1,\ldots,i_n}, \quad B_{i_1,\ldots,i_n} \in \mathbb{M}_{N_1,1}, \quad z \in I_n.
\]
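For intuition, the SAS state equation (4.5) can be sketched in a few lines of code. The example below is illustrative only: a univariate input ($n = 1$), arbitrary dimensions, and random coefficients scaled down so that the state stays bounded, a crude stand-in for the bound $K$ imposed below.

```python
import numpy as np

rng = np.random.default_rng(3)
N1, deg_r, deg_s = 4, 2, 2            # state dimension and polynomial degrees (arbitrary)

# Matrix coefficients A_i of p and column coefficients B_i of q, scaled so
# that sigma_max(p(z)) and ||q(z)|| stay well below 1 for |z| <= 1.
A_coeffs = [0.05 * rng.standard_normal((N1, N1)) for _ in range(deg_r + 1)]
B_coeffs = [0.05 * rng.standard_normal((N1, 1)) for _ in range(deg_s + 1)]

def p(z):   # p(z) = sum_i z^i A_i  (univariate input, n = 1)
    return sum(z**i * Ai for i, Ai in enumerate(A_coeffs))

def q(z):   # q(z) = sum_i z^i B_i
    return sum(z**i * Bi for i, Bi in enumerate(B_coeffs))

# Iterate the SAS state equation (4.5): x_t = p(z_t) x_{t-1} + q(z_t).
x = np.zeros((N1, 1))
for z in rng.uniform(-1.0, 1.0, size=100):
    x = p(z) @ x + q(z)

print(x.flatten())
```

The state depends on the input in an affine way at each step, but the composition over time produces the polynomial-in-the-inputs filters whose density is exploited next.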

Let $L > 0$ and choose a real number $K$ such that
\[
0 < K < \frac{L}{L+1} < 1. \tag{4.7}
\]
Consider now SAS filters that satisfy $\max_{z \in I_n} \sigma_{\max}(p(z)) < K$ and $\max_{z \in I_n} \sigma_{\max}(q(z)) < K$. It can be shown [Grig 17, Proposition 3.7] that under these hypotheses the reservoir system (4.5)-(4.6) has the echo state property and defines a unique causal, time-invariant, and fading memory filter $U^{p,q}_{W_1}: I_n^{\mathbb{Z}_-} \to (\mathbb{R}^d)^{\mathbb{Z}_-}$. Moreover, Theorem 3.12 in [Grig 17] shows that for any $\epsilon_1 > 0$ there exists a SAS filter $U^{p,q}_{W_1}$ satisfying the hypotheses that we just discussed for which
\[
\left|\left|\left|H_U - H^{p,q}_{W_1}\right|\right|\right|_\infty < \epsilon_1, \tag{4.8}
\]
where $H_U$ and $H^{p,q}_{W_1}$ are the reservoir functionals associated to $U$ and $U^{p,q}_{W_1}$, respectively. Proposition 2.12 together with this inequality implies that
\[
\left|\left|\left|U - U^{p,q}_{W_1}\right|\right|\right|_\infty < \epsilon_1. \tag{4.9}
\]
We now show that the SAS filter $U^{p,q}_{W_1}$ can be approximated by the filters generated by an echo state network. Define the map
\[
\begin{array}{cccc}
F_{SAS}: & \overline{B_{\|\cdot\|}}(0,L) \times I_n & \longrightarrow & \mathbb{R}^{N_1}\\
& (x, z) & \longmapsto & p(z)x + q(z),
\end{array} \tag{4.10}
\]
with $\overline{B_{\|\cdot\|}}(0,L) \subset \mathbb{R}^{N_1}$ and $p$ and $q$ the polynomials associated to the approximating SAS filter $U^{p,q}_{W_1}$ in (4.9).

The prescription (4.7) on the choice of the constant $K$ has two main consequences. First, the map $F_{SAS}$ is a contraction. Indeed, for any $(x,z), (y,z) \in \overline{B_{\|\cdot\|}}(0,L) \times I_n$:
\[
\|F_{SAS}(x,z) - F_{SAS}(y,z)\| \le \|p(z)x - p(z)y\| \le \|p(z)\|_2 \|x - y\| \le K\|x - y\|. \tag{4.11}
\]
The map $F_{SAS}$ is hence a contraction since $K < 1$ by hypothesis. Second, $\|F_{SAS}\|_\infty < L$ because, by (4.7),
\[
\|F_{SAS}\|_\infty = \sup_{(x,z) \in \overline{B_{\|\cdot\|}}(0,L) \times I_n} \{\|p(z)x + q(z)\|\} \le \sup_{(x,z) \in \overline{B_{\|\cdot\|}}(0,L) \times I_n} \{\|p(z)\|_2 \|x\| + \|q(z)\|\} \le KL + K < L.
\]
This implies, in particular, that $F_{SAS}$ maps into $\overline{B_{\|\cdot\|}}(0,L)$ and hence (4.10) can be rewritten as
\[
F_{SAS}: \overline{B_{\|\cdot\|}}(0,L) \times I_n \longrightarrow \overline{B_{\|\cdot\|}}(0,L).
\]
Additionally, we set
\[
L_1 := \|F_{SAS}\|_\infty < L. \tag{4.12}
\]


The uniform density on compacta of the family of feedforward neural networks with one hidden layer proved in [Cybe 89, Horn 89] guarantees that for any $\epsilon_2 > 0$ there exist $N \in \mathbb{N}$, $G \in \mathbb{M}_{N,N_1}$, $C \in \mathbb{M}_{N,n}$, $E \in \mathbb{M}_{N_1,N}$, and $\zeta \in \mathbb{R}^N$ such that the map defined by
\[
\begin{array}{cccc}
F_{NN}: & \overline{B_{\|\cdot\|}}(0,L) \times I_n & \longrightarrow & \mathbb{R}^{N_1}\\
& (x, z) & \longmapsto & E\sigma(Gx + Cz + \zeta),
\end{array} \tag{4.13}
\]
satisfies
\[
\|F_{NN} - F_{SAS}\|_\infty = \sup_{x \in \overline{B_{\|\cdot\|}}(0,L),\, z \in I_n} \{\|F_{NN}(x,z) - F_{SAS}(x,z)\|\} < \epsilon_2. \tag{4.14}
\]
The combination of (4.14) with the reverse triangle inequality implies that $\|F_{NN}\|_\infty - \|F_{SAS}\|_\infty < \epsilon_2$ or, equivalently,
\[
\|F_{NN}\|_\infty < \|F_{SAS}\|_\infty + \epsilon_2. \tag{4.15}
\]
Given that $\|F_{SAS}\|_\infty = L_1 < L$, if we choose $\epsilon_2 > 0$ small enough so that $L_1 + \epsilon_2 < L$ or, equivalently,
\[
\epsilon_2 < L - L_1, \tag{4.16}
\]
then (4.15) guarantees that $\|F_{NN}\|_\infty < L$, which shows that $F_{NN}$ maps into $\overline{B_{\|\cdot\|}}(0,L)$, that is, we can write
\[
F_{NN}: \overline{B_{\|\cdot\|}}(0,L) \times I_n \longrightarrow \overline{B_{\|\cdot\|}}(0,L). \tag{4.17}
\]
The continuity of the map $F_{NN}$ and the first part of Theorem 3.1 imply that the corresponding reservoir equation has the existence of solutions property and that we can hence associate to it a (generalized) filter $U_{F_{NN}}$. At the same time, as we proved in (4.11), the map $F_{SAS}$ is a contraction with constant $K < 1$. These facts, together with (4.14) and the internal approximation property in Theorem 3.1, allow us to conclude that the (unique) reservoir filter $U_{F_{SAS}}$ associated to the reservoir map $F_{SAS}$ is such that
\[
|||U_{F_{NN}} - U_{F_{SAS}}|||_\infty < \epsilon_2/(1 - K). \tag{4.18}
\]

Consider now the readout map $h_{W_1}: \mathbb{R}^{N_1} \to \mathbb{R}^d$ given by $h_{W_1}(x) := W_1 x$ and let $U^{h_{W_1}}_{F_{NN}}: (I_n)^{\mathbb{Z}_-} \to (\mathbb{R}^d)^{\mathbb{Z}_-}$ be the filter given by $U^{h_{W_1}}_{F_{NN}}(z)_t := W_1 U_{F_{NN}}(z)_t$, $t \in \mathbb{Z}_-$. Analogously, define $U^{h_{W_1}}_{F_{SAS}}: (I_n)^{\mathbb{Z}_-} \to (\mathbb{R}^d)^{\mathbb{Z}_-}$ and notice that $U^{h_{W_1}}_{F_{SAS}} = U^{p,q}_{W_1}$. Using these observations and (4.18), we have proved that for any $\epsilon_2 > 0$ we can find a filter of the type $U^{h_{W_1}}_{F_{NN}}$ that satisfies
\[
\left|\left|\left|U^{p,q}_{W_1} - U^{h_{W_1}}_{F_{NN}}\right|\right|\right|_\infty \le \|W_1\|_2 |||U_{F_{SAS}} - U_{F_{NN}}|||_\infty < \|W_1\|_2 \epsilon_2/(1 - K). \tag{4.19}
\]
Consequently, for any $\epsilon > 0$, if we first set $\epsilon_1 = \epsilon/2$ in (4.8) and then choose
\[
\epsilon_2 := \min\left\{ \frac{\epsilon(1-K)}{2\|W_1\|_2}, \frac{L - L_1}{2} \right\}, \tag{4.20}
\]
in view of (4.16) and (4.19) we can guarantee, using (4.9) and (4.19), that
\[
\left|\left|\left|U - U^{h_{W_1}}_{F_{NN}}\right|\right|\right|_\infty \le \left|\left|\left|U - U^{p,q}_{W_1}\right|\right|\right|_\infty + \left|\left|\left|U^{p,q}_{W_1} - U^{h_{W_1}}_{F_{NN}}\right|\right|\right|_\infty \le \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon. \tag{4.21}
\]

In order to conclude the proof, it suffices to show that the filter $U^{h_{W_1}}_{F_{NN}}$ can be realized as the reservoir filter associated to an echo state network of the type presented in the statement. We carry this out by using the elements that appeared in the construction of the reservoir map $F_{NN}$ in (4.13) to define a new reservoir map $F_{ESN}$ with the architecture of an echo state network. Let $A := GE \in \mathbb{M}_N$ and define
\[
\begin{array}{cccc}
F_{ESN}: & D_N \times I_n & \longrightarrow & \mathbb{R}^N\\
& (x, z) & \longmapsto & \sigma(Ax + Cz + \zeta).
\end{array} \tag{4.22}
\]
The set $D_N$ in the domain of $F_{ESN}$ is given by
\[
D_N := [-1,1]^N \cap E^{-1}\left(\overline{B_{\|\cdot\|}}(0,L)\right), \tag{4.23}
\]
where $E^{-1}(\overline{B_{\|\cdot\|}}(0,L))$ denotes the preimage of the set $\overline{B_{\|\cdot\|}}(0,L) \subset \mathbb{R}^{N_1}$ under the linear map $E: \mathbb{R}^N \to \mathbb{R}^{N_1}$ associated to the matrix $E \in \mathbb{M}_{N_1,N}$. This set is compact: $E^{-1}(\overline{B_{\|\cdot\|}}(0,L))$ is closed and $[-1,1]^N$ is compact, so $D_N$ is a closed subspace of a compact space, which is always compact [Munk 14, Theorem 26.2]. Additionally, $D_N$ is convex because $[-1,1]^N$ is convex and $E^{-1}(\overline{B_{\|\cdot\|}}(0,L))$ is convex as well, being the preimage of a convex set under a linear map.

We now note that the image of $F_{ESN}$ is contained in $D_N$. First, as the squashing function maps into the interval $[-1,1]$, it is clear that
\[
F_{ESN}(D_N, I_n) \subset [-1,1]^N. \tag{4.24}
\]
Second, for any $x \in D_N$ we have by construction that $x \in E^{-1}(\overline{B_{\|\cdot\|}}(0,L))$ and hence $Ex \in \overline{B_{\|\cdot\|}}(0,L)$. Since by (4.17) $F_{NN}$ maps into $\overline{B_{\|\cdot\|}}(0,L)$, we can ensure that for any $z \in I_n$, the image $F_{NN}(Ex, z) = E\sigma(GEx + Cz + \zeta) = E\sigma(Ax + Cz + \zeta) \in \overline{B_{\|\cdot\|}}(0,L)$ or, equivalently,
\[
F_{ESN}(x,z) = \sigma(Ax + Cz + \zeta) \in E^{-1}\left(\overline{B_{\|\cdot\|}}(0,L)\right). \tag{4.25}
\]
The relations (4.24) and (4.25) imply that
\[
F_{ESN}(D_N, I_n) \subset D_N, \tag{4.26}
\]
and hence we can rewrite (4.22) as
\[
F_{ESN}: D_N \times I_n \longrightarrow D_N.
\]
The continuity of the map $F_{ESN}$ and the compactness and convexity of the set $D_N \subset \mathbb{R}^N$ that we established above allow us to use the first part of Theorem 3.1 to conclude that the corresponding reservoir equation has the existence of solutions property and that we can hence associate to it a (generalized) filter $U_{F_{ESN}}$. Let $W := W_1 E \in \mathbb{M}_{d,N}$ and define the readout map $h_{ESN}: D_N \to \mathbb{R}^d$ by $h_{ESN}(x) := Wx = W_1 Ex$. Denote by $U_{ESN}$ any generalized reservoir filter associated to the echo state network system $(F_{ESN}, h_{ESN})$ that, by construction, satisfies $U_{ESN}(z)_t := h_{ESN}(U_{F_{ESN}}(z)_t) = W U_{F_{ESN}}(z)_t$, for any $z \in I_n^{\mathbb{Z}_-}$ and $t \in \mathbb{Z}_-$.

We next show that the map $f: D_N = [-1,1]^N \cap E^{-1}(\overline{B_{\|\cdot\|}}(0,L)) \to \overline{B_{\|\cdot\|}}(0,L)$ given by $f(x) := Ex$ is a morphism between the echo state network system $(F_{ESN}, h_{ESN})$ and the reservoir system $(F_{NN}, h_{W_1})$. Indeed, the reservoir equivariance property holds because, for any $(x,z) \in D_N \times I_n$, the definitions (4.13) and (4.22) ensure that
\[
f(F_{ESN}(x,z)) = E\sigma(Ax + Cz + \zeta) = E\sigma(GEx + Cz + \zeta) = F_{NN}(Ex, z) = F_{NN}(f(x), z).
\]
The readout invariance is obvious. This fact and the second part of Proposition 2.2 imply that all the generalized filters $U_{ESN}$ associated to the echo state network are actually filters generated by the system $(F_{NN}, h_{W_1})$. This means that for each generalized filter $U_{ESN}$ there exists a generalized filter of the type $U^{h_{W_1}}_{F_{NN}}$ such that $U_{ESN} = U^{h_{W_1}}_{F_{NN}}$. The inequality (4.21) then proves (4.3) in the statement of the theorem.

The last claim in the theorem is a straightforward consequence of Propositions 2.1 and 2.12. $\blacksquare$


5 Appendices

5.1 Proof of Proposition 2.1

Let $\tau \in \mathbb{N}$ and let $T^n_\tau: (D_n)^{\mathbb{Z}} \to (D_n)^{\mathbb{Z}}$ and $T^N_\tau: (D_N)^{\mathbb{Z}} \to (D_N)^{\mathbb{Z}}$ be the corresponding time delay operators. For any $z \in (D_n)^{\mathbb{Z}}$, let $x \in (D_N)^{\mathbb{Z}}$ be the unique solution of the reservoir system determined by $F$, that is,
\[
x := U_F(z). \tag{5.1}
\]
Then, for any $t \in \mathbb{Z}$,
\[
\left(T^N_\tau \circ U_F(z)\right)_t = x_{t-\tau}. \tag{5.2}
\]
Analogously, let $\widetilde{x} \in (D_N)^{\mathbb{Z}}$ be the unique solution of $F$ associated to the input $T^n_\tau(z)$, that is,
\[
\widetilde{x}_t = \left(U_F \circ T^n_\tau(z)\right)_t, \quad \text{for any } t \in \mathbb{Z}. \tag{5.3}
\]
By construction, the sequence $\widetilde{x}$ satisfies
\[
\widetilde{x}_t = F(\widetilde{x}_{t-1}, T^n_\tau(z)_t) = F(\widetilde{x}_{t-1}, z_{t-\tau}), \quad \text{for any } t \in \mathbb{Z}.
\]
If we set $s := t - \tau$, this expression can be rewritten as
\[
\widetilde{x}_{s+\tau} = F(\widetilde{x}_{s+\tau-1}, z_s), \quad \text{for any } s \in \mathbb{Z}, \tag{5.4}
\]
and if we define $\widehat{x}_s := \widetilde{x}_{s+\tau}$, the equality (5.4) becomes
\[
\widehat{x}_s = F(\widehat{x}_{s-1}, z_s), \quad \text{for any } s \in \mathbb{Z},
\]
which shows that $\widehat{x} \in (D_N)^{\mathbb{Z}}$ is a solution of $F$ determined by the input $z \in (D_n)^{\mathbb{Z}}$. Since the sequence $x \in (D_N)^{\mathbb{Z}}$ in (5.1) is also a solution of $F$ for the same input, the echo state property hypothesis on the systems determined by $F$ implies that necessarily $x = \widehat{x}$. This implies that $x_{t-\tau} = \widehat{x}_{t-\tau}$ for all $t \in \mathbb{Z}$, which is equivalent to $\widetilde{x}_t = x_{t-\tau}$. This equality guarantees that (5.2) and (5.3) are equal and, since $z \in (D_n)^{\mathbb{Z}}$ is arbitrary, we have that
\[
T^N_\tau \circ U_F = U_F \circ T^n_\tau,
\]
as required.

5.2 Proof of Proposition 2.4

Suppose first that $U$ is continuous. This implies the existence of a positive function $\delta_U(\epsilon)$ such that if $u, v \in (D_n)^{\mathbb{Z}_-}$ satisfy $\|u - v\|_\infty < \delta_U(\epsilon)$, then $\|U(u) - U(v)\|_\infty < \epsilon$. Under that hypothesis, it is clear that
\[
\|H_U(u) - H_U(v)\| = \|U(u)_0 - U(v)_0\| \le \sup_{t \in \mathbb{Z}_-} \{\|U(u)_t - U(v)_t\|\} = \|U(u) - U(v)\|_\infty < \epsilon,
\]
which shows the continuity of $H_U: \left((D_n)^{\mathbb{Z}_-}, \|\cdot\|_\infty\right) \to (D_N, \|\cdot\|)$.

Conversely, suppose that $H: \left((D_n)^{\mathbb{Z}_-}, \|\cdot\|_\infty\right) \to (D_N, \|\cdot\|)$ is continuous and let $\delta_H(\epsilon) > 0$ be such that if $\|u - v\|_\infty < \delta_H(\epsilon)$ then $\|H(u) - H(v)\| < \epsilon$. Then, for any $t \in \mathbb{Z}_-$,
\[
\|U_H(u)_t - U_H(v)_t\| = \left\|H\left((P_{\mathbb{Z}_-} \circ T_{-t})(u)\right) - H\left((P_{\mathbb{Z}_-} \circ T_{-t})(v)\right)\right\| < \epsilon, \tag{5.5}
\]
which proves the continuity of $U_H$. The inequality follows from the fact that for any $u \in (D_n)^{\mathbb{Z}_-}$, the components of the sequence $(P_{\mathbb{Z}_-} \circ T_{-t})(u)$ are included in those of $u$ and hence $\sup_{s \in \mathbb{Z}_-} \left\|\left((P_{\mathbb{Z}_-} \circ T_{-t})(u)\right)_s\right\| \le \sup_{s \in \mathbb{Z}_-} \{\|u_s\|\}$ or, equivalently, $\left\|(P_{\mathbb{Z}_-} \circ T_{-t})(u)\right\|_\infty \le \|u\|_\infty$. This implies that if $\|u - v\|_\infty < \delta_H(\epsilon)$ then $\left\|(P_{\mathbb{Z}_-} \circ T_{-t})(u) - (P_{\mathbb{Z}_-} \circ T_{-t})(v)\right\|_\infty < \delta_H(\epsilon)$ and hence (5.5) holds.


5.3 Proof of Theorem 2.6

We first show that the map $D^M_w: (\mathbb{R}^n)^{\mathbb{Z}_-} \times (\mathbb{R}^n)^{\mathbb{Z}_-} \to [0,\infty)$ defined in (2.15) is indeed a metric. It is clear that $D^M_w(x,y) \ge 0$ and that $D^M_w(x,x) = 0$, for any $x, y \in (\mathbb{R}^n)^{\mathbb{Z}_-}$. Conversely, if $D^M_w(x,y) = 0$ then $d_M(x_t, y_t)w_{-t} \le \sup_{t \in \mathbb{Z}_-} d_M(x_t, y_t)w_{-t} = D^M_w(x,y) = 0$, which ensures that $d_M(x_t, y_t) = 0$ for any $t \in \mathbb{Z}_-$, and hence necessarily $x = y$ since the map $d_M$ is a metric on $\mathbb{R}^n$ [Munk 14, Chapter 2, §20]. It is also obvious that $D^M_w(x,y) = D^M_w(y,x)$. Regarding the triangle inequality, notice that for any $x, y, z \in (\mathbb{R}^n)^{\mathbb{Z}_-}$ and $t \in \mathbb{Z}_-$:
\[
d_M(x_t, z_t)w_{-t} \le d_M(x_t, y_t)w_{-t} + d_M(y_t, z_t)w_{-t} \le D^M_w(x,y) + D^M_w(y,z),
\]
which implies that
\[
D^M_w(x,z) = \sup_{t \in \mathbb{Z}_-} d_M(x_t, z_t)w_{-t} \le D^M_w(x,y) + D^M_w(y,z).
\]
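Since $d_M \le M$ and the weighting sequence decays to zero, the supremum defining $D^M_w$ can be approximated in practice by truncating the left-infinite sequences; the tail contributes at most $M \sup_{t \ge T} w_t$. The following Python sketch (illustrative; $w_t = \lambda^t$ and the sizes are our choices) implements the truncated metric and checks the properties verified above.

```python
import numpy as np

lam, M = 0.5, 1.0          # weighting sequence w_t = lam**t and bound M (illustrative)

def d_M(a, b):
    # the bounded ("clipped") metric on R^n used in the text
    return min(np.linalg.norm(a - b), M)

def D_w(x, y):
    # x, y: arrays of shape (T, n); row k holds time t = k - (T - 1), so the
    # last row is t = 0. Truncation of D^M_w(x, y) = sup_{t<=0} d_M(x_t, y_t) w_{-t}.
    T = len(x)
    return max(d_M(x[k], y[k]) * lam ** (T - 1 - k) for k in range(T))

rng = np.random.default_rng(4)
x = rng.uniform(-1, 1, size=(50, 2))
y = rng.uniform(-1, 1, size=(50, 2))
z = rng.uniform(-1, 1, size=(50, 2))

print(D_w(x, y))
# sanity checks: symmetry and the triangle inequality shown above
assert np.isclose(D_w(x, y), D_w(y, x))
assert D_w(x, z) <= D_w(x, y) + D_w(y, z) + 1e-12
```

The geometric decay of the weights is also what makes remote-past discrepancies negligible, which is the intuition behind the equivalence with the product topology established next.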

We now show that the metric topology on $(\mathbb{R}^n)^{\mathbb{Z}_-}$ associated to $D^M_w$ coincides with the product topology. Let $x \in (\mathbb{R}^n)^{\mathbb{Z}_-}$ and let $B_{D^M_w}(x, \epsilon)$ be an $\epsilon$-ball around it with respect to the metric $D^M_w$. Let now $N \in \mathbb{N}$ be large enough so that $w_N < \epsilon/M$. We then show that the basis element $V$ for the product topology on $(\mathbb{R}^n)^{\mathbb{Z}_-}$ given by
\[
V := \cdots \times \mathbb{R}^n \times \mathbb{R}^n \times B_{d_M}(x_{-N}, \epsilon) \times \cdots \times B_{d_M}(x_{-1}, \epsilon) \times B_{d_M}(x_0, \epsilon),
\]
which obviously contains the element $x \in (\mathbb{R}^n)^{\mathbb{Z}_-}$, is such that $V \subset B_{D^M_w}(x, \epsilon)$. Indeed, since for any $y \in (\mathbb{R}^n)^{\mathbb{Z}_-}$ and any $t \in \mathbb{Z}_-$ we have that $d_M(x_t, y_t) \le M$, we can conclude that
\[
d_M(x_t, y_t)w_{-t} \le Mw_N, \quad \text{for all } t \le -N.
\]
Therefore, $D^M_w(x,y) \le \max\left\{Mw_N, d_M(x_{-N}, y_{-N})w_N, \ldots, d_M(x_{-1}, y_{-1})w_1, d_M(x_0, y_0)w_0\right\}$ and hence, if $y \in V$, this expression is smaller than $\epsilon$, which allows us to conclude the desired inclusion $V \subset B_{D^M_w}(x, \epsilon)$.

Conversely, consider a basis element of the product topology given by $U = \prod_{t \in \mathbb{Z}_-} U_t$, where $U_t = B_{d_M}(x_t, \epsilon_t)$ for a finite set of indices $t \in \{\alpha_1, \ldots, \alpha_r\}$, with $\epsilon_t \le 1$, and $U_t = \mathbb{R}^n$ for the rest. Let $\epsilon := \min_{t \in \{\alpha_1,\ldots,\alpha_r\}}\{\epsilon_t w_{-t}\}$. We now show that $B_{D^M_w}(x, \epsilon) \subset U$. Indeed, if $y \in B_{D^M_w}(x, \epsilon)$ then $d_M(x_t, y_t)w_{-t} \le D^M_w(x,y) < \epsilon$ for all $t \in \mathbb{Z}_-$. If $t \in \{\alpha_1, \ldots, \alpha_r\}$ then $\epsilon \le \epsilon_t w_{-t}$ and hence $d_M(x_t, y_t)w_{-t} < \epsilon_t w_{-t}$, which ensures that $d_M(x_t, y_t) < \epsilon_t$ and hence $y \in U$, as desired.

We conclude by showing that $\left((\mathbb{R}^n)^{\mathbb{Z}_-}, D^M_w\right)$ is a complete metric space. First, notice that since for any $x, y \in (\mathbb{R}^n)^{\mathbb{Z}_-}$ and any given $t \in \mathbb{Z}_-$ we have
\[
d_M(x_t, y_t) \le \frac{D^M_w(x,y)}{w_{-t}},
\]
we can conclude that if $\{x(i)\}_{i \in \mathbb{N}}$ is a Cauchy sequence in $(\mathbb{R}^n)^{\mathbb{Z}_-}$, then so are the sequences $\{x_t(i)\}_{i \in \mathbb{N}}$ in $\mathbb{R}^n$, for any $t \in \mathbb{Z}_-$, with respect to the bounded metric $d_M$. Since completeness with respect to the bounded metric $d_M$ and with respect to the Euclidean metric are equivalent [Munk 14, Chapter 7, §43], we can ensure that $\{x_t(i)\}_{i \in \mathbb{N}}$ converges to an element $a_t \in \mathbb{R}^n$ with respect to the Euclidean metric, for any $t \in \mathbb{Z}_-$. We now show that $\{x(i)\}_{i \in \mathbb{N}}$ converges to $a := (a_t)_{t \in \mathbb{Z}_-} \in (\mathbb{R}^n)^{\mathbb{Z}_-}$ with respect to the metric $D^M_w$, which proves the completeness statement.

Indeed, since the metric $D^M_w$ generates the product topology, let $U = \prod_{t \in \mathbb{Z}_-} U_t$ be a basis element such that $a \in U$ and, as before, $U_t = B_{d_M}(a_t, \epsilon_t)$ for a finite set of indices $t \in \{\alpha_1, \ldots, \alpha_r\}$, with $\epsilon_t \le 1$, and $U_t = \mathbb{R}^n$ for the rest. Let $\epsilon := \min\{\epsilon_{\alpha_1}, \ldots, \epsilon_{\alpha_r}\}$. Since for each $t \in \mathbb{Z}_-$ the sequence $x_t(i) \xrightarrow{i \to \infty} a_t$, there exists $N_t \in \mathbb{N}$ such that for any $k > N_t$ we have $\|x_t(k) - a_t\| < \epsilon$. If we take $N = \max\{N_{\alpha_1}, \ldots, N_{\alpha_r}\}$, then it is clear that $x(i) \in U$ for all $i > N$, as required.


5.4 Proof of Corollary 2.7

Notice first that for any $x, y \in K_M$ we have $\|x_t - y_t\| < 2M$, $t \in \mathbb{Z}_-$, and hence
\[
D^{2M}_w(x,y) := \sup_{t \in \mathbb{Z}_-} d_{2M}(x_t, y_t)w_{-t} = \sup_{t \in \mathbb{Z}_-} \{\|x_t - y_t\|w_{-t}\} = \|x - y\|_w.
\]
Hence, the topology induced by the weighted norm $\|\cdot\|_w$ on $K_M$ coincides with the metric topology induced by the restricted metric $D^{2M}_w|_{K_M \times K_M}$ which, by Theorem 2.6, is the subspace topology induced on $K_M$ by the product topology on $(\mathbb{R}^n)^{\mathbb{Z}_-}$ (see [Munk 14, Exercise 1, page 133]), as well as the product topology on the product $K_M = \overline{B_{\|\cdot\|}}(0,M)^{\mathbb{Z}_-}$ (see [Munk 14, Theorem 19.3, page 116]).

5.5 Proof of Corollary 2.8

First, since $K_M = \overline{B_{\|\cdot\|}}(0,M)^{\mathbb{Z}_-}$, it is clearly a product of compact spaces. By Tychonoff's Theorem ([Munk 14, Chapter 5]), $K_M$ is compact when endowed with the product topology which, by Corollary 2.7, coincides with the topology associated to the restriction of the norm $\|\cdot\|_w$ to $K_M$, as well as with the metric topology given by $D^{2M}_w|_{K_M \times K_M}$.

Second, since $(K_M, \|\cdot\|_w)$ is metrizable, it is a Hausdorff space. This implies (see [Munk 14, Theorem 26.3]) that, as $K_M$ is a compact subspace of the Banach space $\left(\ell^w_-(\mathbb{R}^n), \|\cdot\|_w\right)$ (see Proposition 5.2), it is necessarily closed. This in turn implies ([Simm 63, Theorem B, page 72]) that $(K_M, \|\cdot\|_w)$ is complete.

Finally, the convexity statement follows from the fact that a product of convex sets is always convex.

5.6 Proof of Proposition 2.9

Let $d_w$ be the metric on $\ell^w_-(\mathbb{R}^n)$ induced by the weighted norm $\|\cdot\|_w$ and let $D_w := D^1_w$ be the $w$-weighted metric on $(\mathbb{R}^n)^{\mathbb{Z}_-}$ with constant $M = 1$ introduced in Theorem 2.6, defined using the same underlying norm on $\mathbb{R}^n$ as the one associated to $\|\cdot\|_w$. As we saw in that theorem, the metric $D_w$ induces the product topology on $(\mathbb{R}^n)^{\mathbb{Z}_-}$.

Let now $u \in \ell^w_-(\mathbb{R}^n)$ and let $\epsilon > 0$. Let $v \in \ell^w_-(\mathbb{R}^n)$ be such that $d_w(u,v) < \epsilon$. By definition, we have that
\[
D_w(u,v) = \sup_{t \in \mathbb{Z}_-} d_1(u_t, v_t)w_{-t} = \sup_{t \in \mathbb{Z}_-} \{(\min\{\|u_t - v_t\|, 1\})w_{-t}\} \le \sup_{t \in \mathbb{Z}_-} \{\|u_t - v_t\|w_{-t}\} = d_w(u,v) < \epsilon,
\]
which shows that $B_{d_w}(u, \epsilon) \subset B_{D_w}(u, \epsilon)$ and allows us to conclude that the norm topology on $\ell^w_-(\mathbb{R}^n)$ is finer than the subspace topology induced by the product topology on $(\mathbb{R}^n)^{\mathbb{Z}_-}$.

We now show that this inclusion is strict. Since the weighting sequence $w$ converges to zero, there exists an element $t_0 \in \mathbb{Z}_-$ such that $w_{-t_0} < \epsilon/2$. Let $\lambda > 0$ be arbitrary and define the element $v^\lambda \in (\mathbb{R}^n)^{\mathbb{Z}_-}$ by setting $v^\lambda_{t_0} := \lambda u_{t_0}$ and $v^\lambda_t := u_t$ when $t \ne t_0$. We now show that $v^\lambda \in B_{D_w}(u, \epsilon)$ for any $\lambda > 0$. Indeed,
\[
D_w(u, v^\lambda) = \min\{|\lambda - 1|\|u_{t_0}\|, 1\}w_{-t_0} \le 1 \cdot w_{-t_0} < \epsilon/2 < \epsilon.
\]
At the same time, by definition,
\[
d_w(u, v^\lambda) = |\lambda - 1|\|u_{t_0}\|w_{-t_0} < \infty,
\]
which shows that $v^\lambda \in \ell^w_-(\mathbb{R}^n)$. However, since $|\lambda - 1|\|u_{t_0}\|w_{-t_0}$ can be made as large as desired by choosing $\lambda$ big enough, we have proved that for any ball $B_{d_w}(u, \epsilon_0)$, with $\epsilon_0 > 0$ arbitrary, the ball $B_{D_w}(u, \epsilon)$ always contains an element of $\ell^w_-(\mathbb{R}^n)$ that is not included in $B_{d_w}(u, \epsilon_0)$. This argument allows us to conclude that the norm topology on $\ell^w_-(\mathbb{R}^n)$ is strictly finer than the subspace topology induced by the product topology.

5.7 Proof of Lemma 2.10

The proof requires the following preparatory lemma that will also be used later on in the proof of

Proposition 2.11.

Lemma 5.1 Let M > 0and let wbe a weighting sequence. Then:

(i) The operator PZ−◦T−t: (KM,k·kw)−→ (KM,k·kw)is a continuous map, for any t∈Z−.

(ii) The projections pi: (w

−(Rn),k·kw)−→ (Rn,k·k),i∈Z−, given by pi(z) = zi, are continuous.

Proof of the lemma. (i) We show that this statement is true by characterizing $P_{\mathbb{Z}_-} \circ T_{-t}$ as a Cartesian product of continuous maps between two product spaces endowed with the product topologies and by using Corollary 2.7. Indeed, notice first that the projections $p_i : (K_M, \|\cdot\|_w) \longrightarrow B_{\|\cdot\|}(0, M)$ are continuous since by Corollary 2.7 the topology induced on $K_M$ by the weighted norm $\|\cdot\|_w$ is the product topology. Since $P_{\mathbb{Z}_-} \circ T_{-t}$ can be written as the infinite Cartesian product of continuous maps $P_{\mathbb{Z}_-} \circ T_{-t} = \prod_{i=t}^{-\infty} p_i = (\ldots, p_{t-2}, p_{t-1}, p_t)$, it is hence continuous when using the product topology induced by $\|\cdot\|_w$ (see [Munk 14, Theorem 19.6]).

(ii) Notice first that the projections $p_i : (\ell^w_-(\mathbb{R}^n), \|\cdot\|_w) \longrightarrow (\mathbb{R}^n, \|\cdot\|)$ are obviously continuous when we consider in $\ell^w_-(\mathbb{R}^n)$ the subspace topology induced by the product topology in $(\mathbb{R}^n)^{\mathbb{Z}_-}$. The continuity of $p_i : (\ell^w_-(\mathbb{R}^n), \|\cdot\|_w) \longrightarrow (\mathbb{R}^n, \|\cdot\|)$ then follows directly from Proposition 2.9. $\blacksquare$
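As a supplementary remark (this step is not spelled out in the original argument), the continuity of each $p_i$ in part (ii) can also be obtained from an explicit Lipschitz estimate: for any $u, v \in \ell^w_-(\mathbb{R}^n)$ and any $i \in \mathbb{Z}_-$,
$$\|p_i(u) - p_i(v)\| = \|u_i - v_i\| = \frac{1}{w_{-i}} \|u_i - v_i\| w_{-i} \leq \frac{1}{w_{-i}} \sup_{t \in \mathbb{Z}_-} \{\|u_t - v_t\| w_{-t}\} = \frac{\|u - v\|_w}{w_{-i}},$$
so each $p_i$ is Lipschitz with constant $1/w_{-i}$ and hence continuous.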

We now proceed with the proof of Lemma 2.10. Let first $H \in \mathcal{H}^w_{K_M}$. The FMP hypothesis implies that the map $H : (K_M, \|\cdot\|_w) \longrightarrow (\mathbb{R}^N, \|\cdot\|)$ is continuous. Given that $K_M$ is compact by Corollary 2.8, then so is $H(K_M) \subset \mathbb{R}^N$. This in turn implies that $H(K_M)$ is closed and bounded [Munk 14, Theorem 27.3], which guarantees the existence of $L > 0$ such that $H(K_M) \subset B_{\|\cdot\|}(0, L)$. The map obtained out of $H$ by restriction of its target to $B_{\|\cdot\|}(0, L)$ (that we denote with the same symbol) is also continuous and hence $H \in \mathcal{H}^w_{K_M, K_L}$.

Let now $U : K_M \longrightarrow \ell^w_-(\mathbb{R}^N)$ in $\mathcal{F}^w_{K_M}$ and consider the composition $p_0 \circ U : K_M \longrightarrow \mathbb{R}^N$. The FMP hypothesis on $U$ and the continuity of $p_0 : (\ell^w_-(\mathbb{R}^N), \|\cdot\|_w) \longrightarrow (\mathbb{R}^N, \|\cdot\|)$ that we established in the second part of Lemma 5.1 imply that $p_0 \circ U$ is continuous. This implies, together with the compactness of $K_M$ that we proved in Corollary 2.8, the existence of $L > 0$ such that $p_0 \circ U(K_M) \subset B_{\|\cdot\|}(0, L)$. Equivalently, for any $z \in K_M$, we have that $U(z)_0 \in B_{\|\cdot\|}(0, L)$. Now, since $U$ is by hypothesis time-invariant, we have by (2.3) that
$$U(z)_t = (T_{-t}(U(z)))_0 = U(T_{-t}(z))_0 \in B_{\|\cdot\|}(0, L), \quad t \in \mathbb{Z}_-, \text{ since } T_{-t}(z) \in K_M,$$
which proves that $U(K_M) \subset K_L$. The map obtained out of $U$ by restriction of its target to $K_L$ (that we denote with the same symbol) is also continuous since $(K_L, \|\cdot\|_w)$ is a topological subspace of $(\ell^w_-(\mathbb{R}^N), \|\cdot\|_w)$, and hence $U \in \mathcal{F}^w_{K_M, K_L}$, as required.
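The time-invariance identity $U(z)_t = (T_{-t}(U(z)))_0 = U(T_{-t}(z))_0$ used in the display above can be checked numerically. The following sketch is hypothetical: a simple two-tap causal filter stands in for $U$, truncated arrays stand in for left-infinite sequences (the last entry is time $0$), and $T_{-t}$ acts by dropping the last $|t|$ entries:

```python
import numpy as np

def U(z):
    # A causal, time-invariant filter: U(z)_t = z_t + 0.5 * z_{t-1}.
    # The output is one entry shorter than the input because the
    # earliest time has no predecessor in the truncated array.
    return z[1:] + 0.5 * z[:-1]

T = 50
rng = np.random.default_rng(0)
z = rng.standard_normal(T + 1)        # z[-1] is the coordinate at time 0

for k in range(1, 10):                # compare at times t = -k
    lhs = U(z)[-1 - k]                # U(z)_t, read off the full output
    rhs = U(z[: len(z) - k])[-1]      # shift the input by T_{-t}, read time 0
    assert np.isclose(lhs, rhs)
```

The assertions pass for every shift tested, which is exactly the property that lets the proof transport the time-0 bound $U(z)_0 \in B_{\|\cdot\|}(0, L)$ to all negative times.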

The inclusion $\mathcal{F}^w_{K_M, K_L} \subset \mathcal{F}^w_{K_M}$ (respectively, $\mathcal{H}^w_{K_M, K_L} \subset \mathcal{H}^w_{K_M}$) is a consequence of the continuity of the inclusion map $(K_L, \|\cdot\|_w) \hookrightarrow (\ell^w_-(\mathbb{R}^N), \|\cdot\|_w)$ (respectively, $(B_{\|\cdot\|}(0, L), \|\cdot\|) \hookrightarrow (\mathbb{R}^N, \|\cdot\|)$).


5.8 Proof of Proposition 2.11

Proof of part (i). The FMP of $U$ with respect to the sequence $w$ is, by definition, equivalent to the continuity of the map $U : (K_M, \|\cdot\|_w) \longrightarrow (K_L, \|\cdot\|_w)$ (respectively, $H : (K_M, \|\cdot\|_w) \longrightarrow (B_{\|\cdot\|}(0, L), \|\cdot\|)$). By Corollary 2.8, this is equivalent to the continuity of these maps when $K_M$ and $K_L$ are endowed with the product topology, which is, by the same result, generated by any arbitrary weighting sequence.
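The fact that any weighting sequence generates the same topology on a uniformly bounded set can be illustrated numerically. In this hypothetical sketch, a bounded sequence is perturbed at a single far-past coordinate (staying inside $K_M$), and the perturbation becomes small in two weighted norms with very different decay rates:

```python
import numpy as np

M, T = 1.0, 200
u = np.zeros(T + 1)                       # u[i] is the coordinate at time -i
w1 = 0.5 ** np.arange(T + 1)              # a geometric weighting sequence
w2 = 1.0 / (1.0 + np.arange(T + 1))**2    # a polynomial one

def dist(a, b, w):
    # ||a - b||_w = sup_i ||a_i - b_i|| w_i on the truncated sequences
    return np.max(np.abs(a - b) * w)

for k in [10, 50, 200]:
    v = u.copy()
    v[k] = M                              # perturb the time -k slot, inside K_M
    # both weighted distances shrink as the perturbation recedes into the past
    print(f"k={k}: ||u-v||_w1={dist(u, v, w1):.2e}, ||u-v||_w2={dist(u, v, w2):.2e}")
```

Because $w \to 0$ along any weighting sequence, a bounded coordinatewise perturbation placed further and further in the past vanishes in every weighted norm, which is the mechanism behind the equivalence stated above.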

Consider now $U : (K_M, \|\cdot\|_w) \longrightarrow (\ell^w_-(\mathbb{R}^N), \|\cdot\|_w)$ in $\mathcal{F}^w_{K_M}$ (respectively, $H : (K_M, \|\cdot\|_w) \longrightarrow (\mathbb{R}^N, \|\cdot\|)$ in $\mathcal{H}^w_{K_M}$). By Lemma 2.10 there exists an $L > 0$ such that $U$ (respectively, $H$) can be considered an element of $\mathcal{F}^w_{K_M, K_L}$ (respectively, $\mathcal{H}^w_{K_M, K_L}$) by restriction of the target. Using the statement that we just proved about the space $\mathcal{F}^w_{K_M, K_L}$ (respectively, $\mathcal{H}^w_{K_M, K_L}$), we can conclude that $U$ (respectively, $H$) has the FMP with respect to any weighting sequence. Since, again by Lemma 2.10, the inclusion $\mathcal{F}^w_{K_M, K_L} \subset \mathcal{F}^w_{K_M}$ (respectively, $\mathcal{H}^w_{K_M, K_L} \subset \mathcal{H}^w_{K_M}$) holds true for any $M > 0$ and any weighting sequence $w$, we can conclude that $U$ (respectively,