
PHYSICAL REVIEW RESEARCH 5, 033213 (2023)

Catch-22s of reservoir computing

Yuanzhao Zhang¹ and Sean P. Cornelius²

¹Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, New Mexico 87501, USA

²Department of Physics, Toronto Metropolitan University, Toronto, Ontario M5B 2K3, Canada

(Received 19 March 2023; accepted 2 August 2023; published 25 September 2023)

Reservoir computing (RC) is a simple and efﬁcient model-free framework for forecasting the behavior of

nonlinear dynamical systems from data. Here, we show that there exist commonly studied systems for which

leading RC frameworks struggle to learn the dynamics unless key information about the underlying system

is already known. We focus on the important problem of basin prediction—determining which attractor a

system will converge to from its initial conditions. First, we show that the predictions of standard RC models

(echo state networks) depend critically on warm-up time, requiring a warm-up trajectory containing almost

the entire transient in order to identify the correct attractor. Accordingly, we turn to next-generation reservoir

computing (NGRC), an attractive variant of RC that requires negligible warm-up time. By incorporating the

exact nonlinearities in the original equations, we show that NGRC can accurately reconstruct intricate and

high-dimensional basins of attraction, even with sparse training data (e.g., a single transient trajectory). Yet,

a tiny uncertainty in the exact nonlinearity can render prediction accuracy no better than chance. Our results

highlight the challenges faced by data-driven methods in learning the dynamics of multistable systems and

suggest potential avenues to make these approaches more robust.

DOI: 10.1103/PhysRevResearch.5.033213

I. INTRODUCTION

Reservoir computing (RC) [1–12] is a machine learning

framework for time-series predictions based on recurrent neu-

ral networks. Because only the output layer needs to be

modiﬁed, RC is extremely efﬁcient to train. Despite its sim-

plicity, recent studies have shown that RC can be extremely

powerful when it comes to learning unknown dynamical

systems from data [13]. Speciﬁcally, RC has been used to

reconstruct attractors [14,15], calculate Lyapunov exponents

[16], infer bifurcation diagrams [17], and even predict the

basins of unseen attractors [18,19]. These advances open the

possibilities of using RC to improve climate modeling [20],

create digital twins [21], anticipate synchronization [22,23],

predict tipping points [24,25], and infer network connections

[26].

Since the landmark paper demonstrating RC’s ability to

predict spatiotemporally chaotic systems from data [13],

there has been a ﬂurry of efforts to understand the suc-

cess as well as identify limitations of RC [27–36]. As a

result, more sophisticated architectures have been developed

to extend the capability of the original framework, such as

hybrid [37], parallel [38,39], and symmetry-aware [40] RC

schemes.

Published by the American Physical Society under the terms of the Creative Commons Attribution 4.0 International license. Further distribution of this work must maintain attribution to the author(s) and the published article's title, journal citation, and DOI.

One particularly promising variant of RC was proposed in 2021 and named next-generation reservoir computing (NGRC) [41]. There, instead of having a nonlinear reservoir and a linear output layer, one has a linear reservoir and a nonlinear output layer [42]. These differences, although

subtle, confer several advantages: First, NGRC requires no random matrices and thus has far fewer hyperparameters to optimize. Moreover, each NGRC prediction needs exceedingly few data points to initiate (as opposed to thousands of data points in standard RC), which is especially useful when predicting the basins of attraction in multistable dynamical systems [43].

Understanding the basin structure is of fundamental im-

portance for dynamical systems with multiple attractors. Such

systems include neural networks [44,45], gene regulatory net-

works [46,47], differentiating cells [48,49], and power grids

[50,51]. Basins of attraction provide a mapping from initial

conditions to attractors and, in the face of noise or pertur-

bations, tell us the robustness of each stable state. Despite

their importance, basins have not been well studied from a

machine learning perspective, with most methods for data-

driven modeling of dynamical systems currently focusing on

systems with a single attractor.

In this article, we show that the success of standard RC

in predicting the dynamics of multistable systems can depend

critically on having access to long initialization trajectories,

while the performance of NGRC can be extremely sensitive

to the choice of readout nonlinearity. It has been observed

that for each new initial condition, a standard RC model

needs to be “warmed up” with thousands of data points be-

fore it can start making predictions [43]. In practice, such

data will not exist for most initial conditions. Even when

they do exist, we demonstrate that the warm-up time series

would often have already approached the attractor, rendering



predictions unnecessary [52]. In contrast, NGRC can easily

reproduce highly intermingled and high-dimensional basins

with minimal warm-up, provided the exact nonlinearity in the

underlying equations is known. However, a 1% uncertainty on

that nonlinearity can already make the NGRC basin predic-

tions barely outperform random guesses. Given this extreme

sensitivity, even if one had partial (but imprecise) knowledge

of the underlying system, a hybrid scheme combining NGRC

and such knowledge would still struggle in making reliable

predictions.

The rest of the paper is organized as follows. In Sec. II, we

introduce the ﬁrst model system under study—the magnetic

pendulum, which is representative of the difﬁculties of basin

prediction in real nonlinear systems. In Secs. III–IV, we apply

standard RC to this system, showing that accurate predictions

rely heavily on the length of the warm-up trajectory. We thus

turn to next-generation reservoir computing, giving a brief

overview of its implementation in Sec. V. We present our main

results in Sec. VI, where we characterize the effect of readout

nonlinearity on NGRC’s ability to predict the basins of the

magnetic pendulum. We further support our ﬁndings using

coupled Kuramoto oscillators in Sec. VII, which can have a

large number of coexisting high-dimensional basins. Finally,

we discuss the implications of our results and suggest avenues

for future research in Sec. VIII.

II. THE MAGNETIC PENDULUM

For concreteness, we focus on the magnetic pendulum [53]

as a representative model. It is mechanistically simple—being

low-dimensional and generated by simple physical laws—and

yet captures all characteristics of the basin prediction problem

in general: The system is multistable and predicting which

attractor a given initial condition will go to is nontrivial.

The system consists of an iron bob suspended by a massless

rod above three identical magnets, located at the vertices of an

equilateral triangle in the (x,y) plane (Fig. 1). The bob moves

under the inﬂuence of gravity, drag due to air friction, and

the attractive forces of the magnets. For simplicity, we treat

the magnets as magnetic point charges and assume that the

length of the pendulum rod is much greater than the distance

between the magnets, allowing us to describe the dynamics

using a small-angle approximation.

The resulting dimensionless equations of motion for the

pendulum bob are

$$\ddot{x} = -\omega_0^2\, x - a\dot{x} + \sum_{i=1}^{3} \frac{\tilde{x}_i - x}{D(\tilde{x}_i, \tilde{y}_i)^3}, \tag{1}$$

$$\ddot{y} = -\omega_0^2\, y - a\dot{y} + \sum_{i=1}^{3} \frac{\tilde{y}_i - y}{D(\tilde{x}_i, \tilde{y}_i)^3}, \tag{2}$$

where (x̃_i, ỹ_i) are the coordinates of the ith magnet, ω₀ is the pendulum's natural frequency, and a is the damping coefficient. Here, D(x̃, ỹ) denotes the distance between the bob and a given point (x̃, ỹ) in the magnets' plane,

$$D(\tilde{x}, \tilde{y}) = \sqrt{(\tilde{x} - x)^2 + (\tilde{y} - y)^2 + h^2}, \tag{3}$$

where h is the bob's height above the plane. The system's four-dimensional state is thus x = (x, y, ẋ, ẏ)ᵀ.

FIG. 1. Magnetic pendulum with three ﬁxed-point attractors and

the corresponding basins of attraction. (Left) Illustration of the mag-

netic pendulum system. Three magnets are placed on a ﬂat surface,

each drawn in the color we use to denote the corresponding basin of

attraction. The hollow circle indicates the (x, y) coordinates of the pendulum bob, which together with the velocity (ẋ, ẏ) fully specify the system's state. (Right) Basins of attraction for the region of initial conditions under study, namely states of zero initial velocity with −1.5 ≤ x₀, y₀ ≤ 1.5.

We take the (x, y) coordinates of the magnets to be (1/√3, 0), (−1/(2√3), −1/2), and (−1/(2√3), 1/2). Unless stated otherwise, we set ω₀ = 0.5, a = 0.2, and h = 0.2 in our

simulations. These values are representative of all cases for

which the magnetic pendulum has exactly three stable ﬁxed

points, corresponding to the bob being at rest and pointed

toward one of the three magnets.
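To make the setup concrete, the dynamics of Eqs. (1)–(3) can be simulated directly. The following is a minimal Python sketch (illustrative, not the authors' code), using a fixed-step RK4 integrator with the parameter values quoted above; the attractor reached is identified as the magnet nearest the bob's final resting position.

```python
import numpy as np

# Magnet coordinates and parameters from the text (omega0 = 0.5, a = 0.2, h = 0.2).
MAGNETS = np.array([
    [1 / np.sqrt(3), 0.0],
    [-1 / (2 * np.sqrt(3)), -0.5],
    [-1 / (2 * np.sqrt(3)), 0.5],
])
OMEGA0, A, H = 0.5, 0.2, 0.2

def pendulum_rhs(state):
    """Right-hand side of Eqs. (1)-(2); state = (x, y, xdot, ydot)."""
    x, y, xd, yd = state
    ax = -OMEGA0**2 * x - A * xd
    ay = -OMEGA0**2 * y - A * yd
    for mx, my in MAGNETS:
        d3 = ((mx - x)**2 + (my - y)**2 + H**2) ** 1.5  # D^3, with D from Eq. (3)
        ax += (mx - x) / d3
        ay += (my - y) / d3
    return np.array([xd, yd, ax, ay])

def simulate(state0, dt=0.01, T=100.0):
    """Classical fixed-step RK4 integration; returns the trajectory array."""
    n = int(T / dt)
    traj = np.empty((n + 1, 4))
    s = np.asarray(state0, dtype=float)
    traj[0] = s
    for i in range(n):
        k1 = pendulum_rhs(s)
        k2 = pendulum_rhs(s + 0.5 * dt * k1)
        k3 = pendulum_rhs(s + 0.5 * dt * k2)
        k4 = pendulum_rhs(s + dt * k3)
        s = s + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        traj[i + 1] = s
    return traj

def basin_label(state0):
    """Index (0-2) of the magnet the trajectory settles nearest to."""
    final = simulate(state0)[-1]
    return int(np.argmin(np.linalg.norm(MAGNETS - final[:2], axis=1)))
```

With the damping above, any trajectory launched from rest comes to rest near one of the three magnets well before T = 100.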

Previous studies have largely focused on chaotic dynamics

as a stress test of RC’s capabilities [2,6,8,9,13,16,17,25,41].

Here we take a different approach. With nonzero damping, the

magnetic pendulum dynamics is autonomous and dissipative,

meaning all trajectories must eventually converge to a ﬁxed

point. Except on a set of initial conditions of measure zero,

this will be one of the three stable ﬁxed points identiﬁed

earlier. Yet predicting which attractor a given initial condition

will go to can be far from straightforward, with the pendulum

wandering in an erratic transient before eventually settling to

one of the three magnets [53]. This manifests as complicated

basins of attraction with a “pseudo” (fat) fractal structure

(Fig. 1). We can control the “fractalness” of the basins by, for

example, varying the height of the pendulum h. This generates

basins with tunable complexity to test the performance of

(NG)RC.

III. IMPLEMENTATION OF STANDARD RC

Consider a dynamical system whose n-dimensional state x obeys a set of n autonomous differential equations of the form

$$\dot{x} = f(x). \tag{4}$$



In general, the goal of reservoir computing is to approximate the flow of Eq. (4) in discrete time by a map of the form

$$x_{t+1} = F(x_t). \tag{5}$$

Here, the index t runs over a set of discrete times separated by Δt time units of the real system, where Δt is a timescale hyperparameter generally chosen to be smaller than the characteristic timescale(s) of Eq. (4).

In standard RC, one views the state of the real system as a linear readout from an auxiliary reservoir system, whose state is an N_r-dimensional vector r_t. Specifically,

$$x_t = W \cdot r_t, \tag{6}$$

where W is an n × N_r matrix of trainable output weights. The reservoir system is generally much higher dimensional (N_r ≫ n), and its dynamics obey

$$r_{t+1} = (1 - \alpha)\, r_t + \alpha f(W_r \cdot r_t + W_{\mathrm{in}} \cdot u_t + b). \tag{7}$$

Here W_r is the N_r × N_r reservoir matrix, W_in is the N_r × n input matrix, and b is an N_r-dimensional bias vector. The input u_t is an n-dimensional vector that represents either a state of the real system (u_t = x_t) during training or the model's own output (u_t = W · r_t) during prediction. The nonlinear activation function f is applied elementwise, where we adopt the standard choice f(·) = tanh(·). Finally, 0 < α ≤ 1 is the so-called leaky coefficient, which controls the inertia of the reservoir dynamics.

In general, only the output matrix W is trained, with W_r, W_in, and b generated randomly from appropriate ensembles. We follow best practices [54] and previous studies in generating these latter components, specifically:

(i) W_r is the weighted adjacency matrix of a directed Erdős–Rényi graph on N_r nodes. The link probability is 0 < q ≤ 1, and we allow for self-loops. We first draw the link weights uniformly and independently from [−1, 1], and then normalize them so that W_r has a specified spectral radius ρ > 0. Here, q and ρ are hyperparameters.

(ii) W_in is a dense matrix, whose entries are initially drawn uniformly and independently from [−1, 1]. In the magnetic pendulum, the state x_t [and hence the input term u_t in Eq. (7)] is of the form (x, y, ẋ, ẏ)ᵀ. To allow for different characteristic scales of the position vs. velocity dynamics, we scale the first two columns of W_in by s_x and the last two columns by s_v, where s_x, s_v > 0 are scale hyperparameters.

(iii) b has its entries drawn uniformly and independently from [−s_b, s_b], where s_b > 0 is a scale hyperparameter.
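Steps (i)–(iii) and the reservoir update of Eq. (7) can be sketched in a few lines of Python (illustrative, not the authors' implementation; the specific values of q, ρ, s_x, s_v, s_b, and α below are placeholders, not the optimized values of Table III).

```python
import numpy as np

rng = np.random.default_rng(42)
Nr, n = 300, 4              # reservoir and system dimensions
q, rho = 0.03, 0.8          # link probability and spectral radius (illustrative)
sx, sv, sb = 1.0, 0.5, 0.1  # input/bias scales (illustrative)

# (i) Sparse Erdos-Renyi reservoir matrix, rescaled to spectral radius rho.
Wr = rng.uniform(-1, 1, (Nr, Nr)) * (rng.random((Nr, Nr)) < q)
Wr *= rho / max(abs(np.linalg.eigvals(Wr)))

# (ii) Dense input matrix: position columns scaled by sx, velocity columns by sv.
Win = rng.uniform(-1, 1, (Nr, n))
Win[:, :2] *= sx
Win[:, 2:] *= sv

# (iii) Bias vector.
b = rng.uniform(-sb, sb, Nr)

def reservoir_step(r, u, alpha=0.5):
    """One leaky-tanh reservoir update, Eq. (7)."""
    return (1 - alpha) * r + alpha * np.tanh(Wr @ r + Win @ u + b)
```

Because tanh is bounded, the reservoir state stays in [−1, 1] componentwise under repeated iteration.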

Training. To train an RC model from a given initial condition x₀, we first integrate the real dynamics (4) to obtain N_train additional states {x_t}, t = 1, ..., N_train. We then iterate the reservoir dynamics (7) for N_train times from r₀ = 0, using the training data as inputs (u_t = x_t). This produces a corresponding sequence of reservoir states, {r_t}, t = 1, ..., N_train. Finally, we solve for the output weights W that render Eq. (6) the best fit to the training data using ridge regression with Tikhonov regularization,

$$W = X R^{T} (R R^{T} + \lambda I)^{-1}. \tag{8}$$

Here, X (R) is a matrix whose columns are the x_t (r_t) for t = 1, ..., N_train, I is the identity matrix, and λ > 0 is a regularization coefficient that prevents ill conditioning of the weights, which can be symptomatic of overfitting the data.
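Equation (8) translates directly into code. A sketch follows (for clarity the matrix inverse is formed explicitly; in practice one would solve the regularized linear system instead):

```python
import numpy as np

def train_readout(X, R, lam=1e-8):
    """Ridge-regression readout, Eq. (8): W = X R^T (R R^T + lam*I)^(-1).

    X is n x Ntrain (training states as columns); R is Nr x Ntrain
    (reservoir states as columns). Returns the n x Nr weight matrix W.
    """
    return X @ R.T @ np.linalg.inv(R @ R.T + lam * np.eye(R.shape[0]))
```

As a sanity check, if the data are exactly a linear readout of the reservoir states, Eq. (8) recovers that readout as λ → 0.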

Prediction. To simulate a trained RC model from a given initial condition x₀, we first integrate the true dynamics (4) forward in time to obtain a total of N_warm-up ≥ 0 states {x_t}, t = 1, ..., N_warm-up. During the first N_warm-up iterations of the discrete dynamics (7), the input term comes from the real trajectory, i.e., u_t = x_t. Thereafter, we replace the input with the model's own output at the previous iteration (u_t = W · r_t). This creates a closed-loop system from Eq. (7), which we iterate without further input from the real system.
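The warm-up/closed-loop protocol can be sketched as follows, with the single-step reservoir map of Eq. (7) passed in as a function (hypothetical helper names, not from the paper):

```python
import numpy as np

def rc_predict(step, W, warmup_inputs, n_steps):
    """Warm-up, then closed-loop prediction with a trained RC model.

    step(r, u): one iteration of Eq. (7); W: trained readout, shape (n, Nr);
    warmup_inputs: (N_warmup, n) states taken from the real trajectory.
    Returns the model's autonomous predictions, shape (n_steps, n).
    """
    r = np.zeros(W.shape[1])
    for u in warmup_inputs:       # teacher forcing: u_t = x_t
        r = step(r, u)
    preds = np.empty((n_steps, W.shape[0]))
    for i in range(n_steps):      # closed loop: u_t = W . r_t
        u = W @ r
        preds[i] = u
        r = step(r, u)
    return preds
```

After the warm-up loop ends, the model receives no further data from the real system.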

IV. CRITICAL DEPENDENCE OF STANDARD RC ON

WARM-UP TIME

Although standard RC is extremely powerful, it is known to

demand large warm-up periods (N_warm-up) in certain problems

in order to be stable [18]. In principle, this could create a

dilemma for the problem of basin prediction, as long warm-up

trajectories from the real system will generally be unavailable

for initial conditions unseen during training. And even if such

data were available, the problem could be rendered moot

if the required warm-up exceeds the transient period of the

given initial condition [43]. Here, we systematically test RC’s

sensitivity to the warm-up time using the magnetic pendulum

system.

Our aim is to test standard RC under the most favorable

conditions. Accordingly, we will train each RC model on a

single initial condition x₀ = (x₀, y₀, 0, 0)ᵀ of the magnetic

pendulum, and ask it to reproduce only the trajectory from that

initial condition. Likewise, before training, we systematically

optimize the RC hyperparameters for that initial condition

via Bayesian optimization, seeking to minimize an objective

function that combines both training and validation error. For

details of this process, we refer the reader to Appendix F.

In initial tests of our optimization procedure, we found

it largely insensitive to the reservoir connectivity (q), with equally good training/validation performance achievable across a range of q from 0.01 to 1. We likewise found little impact of the regularization coefficient over several orders of magnitude, with the optimizer frequently pinning λ at the supplied lower bound of 10⁻⁸. Thus, in the interest of more fully exploring the most important hyperparameters, we fix q = 0.03 and λ = 10⁻⁸. We then optimize the remaining five continuous hyperparameters (ρ, s_x, s_v, s_b, α) over the ranges specified in Table II.

Throughout this section, we set Δt = 0.02, which is smaller than the characteristic timescales of the magnetic pendulum. We train each RC model on N_train = 4000 data points of the real system starting from the given initial condition, which when paired with the chosen Δt encompass both the transient dynamics and convergence to one of the attractors. We fix the reservoir size at N_r = 300; as we show in Appendix G, larger reservoirs do not alter our results.

Figure 2 shows the performance of an ensemble of RC realizations with optimized hyperparameters for the initial condition (x₀, y₀) = (−1.2, 0.75). Specifically, we show the normalized root-mean-square error (NRMSE, see Appendix C) between the real and RC-predicted trajectory as a function of warm-up time (t_warm-up = N_warm-up · Δt). In



FIG. 2. Forecastability transition of standard RC. The initial condition used for both training and prediction was (x₀, y₀) = (−1.3, 0.75). The optimized RC hyperparameters for this initial condition are listed in Table III. We train an ensemble of 100 different RC models with these hyperparameters, then simulate each from the training initial condition using varying numbers of warm-up points. In (a), the red line/bands denote the median/interquartile range of the resulting prediction NRMSE as a function of warm-up time. The blue curves show the total mechanical energy E of the training trajectory at the same times. We see a sharp drop in model error at t_warm-up ≈ 6, only shortly before E falls below the height of the potential barriers (U*, dashed line) separating the three wells of the magnetic pendulum. In (b), we overlay the x and y dynamics of the real system for comparison. This confirms that the "critical" warm-up time in (a) aligns closely with the end of the transient. The arrows in (a) denote the warm-up times used for the example simulations in Fig. 3.

Fig. 2(a) we observe a sharp transition around t_warm-up = 6. Before this point, we consistently have NRMSE = O(1), meaning the RC error is comparable to the scale of the real trajectory. But after the transition, the error is always quite small (NRMSE ≪ 1).

We can gain physical insight about this "forecastability transition" by analyzing the total mechanical energy of the training trajectory,

$$E = \tfrac{1}{2}\left(\dot{x}^2 + \dot{y}^2\right) + U(x, y). \tag{9}$$

Here U(x, y) is the potential corresponding to Eqs. (1) and (2), where we set U = 0 at the minima corresponding to the three attractors. Strikingly, the critical warm-up time occurs only shortly before the energy drops below a critical value U*, defined as the height of the potential barriers between the three wells [Fig. 2(a)]. By this time, the system is unambiguously "trapped" near a specific magnet, making only damped oscillations thereafter [Fig. 2(b)]. This suggests that even highly optimized RC models will fail to reproduce convergence to the correct attractor unless they have already been guided there by data from the real system.

FIG. 3. Sensitivity of standard RC performance to warm-up time. We show example simulations from one RC realization in Fig. 2 with two different warm-up times (dashed lines). The initial condition and optimized RC hyperparameters are the same as in Fig. 2.
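The energy diagnostic of Eq. (9) is easy to evaluate along any trajectory. In the sketch below (not the authors' code), the potential U combines the harmonic term of Eqs. (1)–(2) with one inverse-distance well per magnet, whose gradient reproduces the magnetic force terms; the additive constant that sets U = 0 at the minima is omitted, which does not affect energy comparisons.

```python
import numpy as np

OMEGA0, H = 0.5, 0.2
MAGNETS = np.array([
    [1 / np.sqrt(3), 0.0],
    [-1 / (2 * np.sqrt(3)), -0.5],
    [-1 / (2 * np.sqrt(3)), 0.5],
])

def mechanical_energy(state):
    """E = (1/2)(xd^2 + yd^2) + U(x, y), Eq. (9), up to an additive constant.

    Each magnet contributes -1/D to U, since d(-1/D)/dx = (x - x_i)/D^3
    matches the force terms in Eqs. (1)-(2) with the opposite sign.
    """
    x, y, xd, yd = state
    U = 0.5 * OMEGA0**2 * (x**2 + y**2)
    for mx, my in MAGNETS:
        U -= 1.0 / np.sqrt((mx - x)**2 + (my - y)**2 + H**2)
    return 0.5 * (xd**2 + yd**2) + U
```

Along any damped trajectory, E computed this way decreases monotonically, and comparing it to the barrier height U* indicates when the bob is trapped in a single well.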

We illustrate this further in Fig. 3, showing example pre-

dictions from one RC realization considered above under two

different warm-up times: one above the critical value in Fig. 2,

and one below. Indeed, with sufﬁcient warm-up (left), the RC

trajectory is a near-perfect match to the real one, both before

and after the warm-up period. But if the warm-up time is even

slightly less than the critical value (right), the model quickly

diverges once the autonomous prediction begins. In this case,

the model fails to reproduce convergence to any ﬁxed-point

attractor, let alone the correct one, instead oscillating wildly.

This pattern holds when we repeat our experiment for

other initial conditions, re-optimizing hyperparameters and

retraining an ensemble of RC models for each (Figs. 11–14 in Appendix G). In all cases, we see the same sharp

drop in RC prediction error at a particular warm-up time

(Figs. 11 and 13). Without at least this much warm-up time,

the models fail to capture the real dynamics even qualita-

tively, often converging to an unphysical state with nonzero

final velocity (Figs. 12 and 14). Although there exist initial conditions that require shorter warm-ups, such as (x₀, y₀) = (1.0, −0.5), this is only because those initial conditions have shorter transients. Indeed, there are other initial conditions, such as (x₀, y₀) = (1.75, 1.6), that have longer transients and demand commensurately larger warm-up times (Figs. 13 and

14). In no case have we observed the RC dynamics staying

faithful to the real system unless the warm-up is comparable

to the transient period.

Note that the breakdown of RC with insufﬁcient warm-up

time cannot be attributed to an insufﬁciently complex model

vis-à-vis the only hyperparameter we have not optimized: the reservoir size (N_r). Indeed, we have repeated our experiment with reservoirs twice as large (N_r = 600). Even with optimized values of the other hyperparameters, we still see a sharp



transition in the NRMSE at a warm-up time comparable to the

transient time (Fig. 15 in Appendix G).

In sum, we have shown that standard RC is unsuitable

for basin prediction in this representative multistable system.

Speciﬁcally, RC models can only reliably reproduce conver-

gence to the correct attractor when they have been guided to

its vicinity. This is true even with the beneﬁt of highly tuned

hyperparameters (Appendix E), and validation on only the

initial condition seen during training.

For the remainder of the paper, we instead turn to

next-generation reservoir computing (NGRC). Although it

is known that every NGRC model implicitly deﬁnes the

connectivity matrix and other parameters of a standard RC

model [41,42], there is no guarantee that the two architectures

would perform similarly in practice. In particular, NGRC is

known to demand substantially less warm-up time [41], po-

tentially avoiding the “catch-22” identiﬁed here for standard

RC. Can this cutting-edge framework succeed in learning

the magnetic pendulum and other paradigmatic multistable

systems?

V. IMPLEMENTATION OF NGRC

We implement the NGRC framework following Refs. [41,43]. In NGRC, the update rule for the discrete dynamics is taken as

$$x_{t+1} = x_t + W \cdot g_t, \tag{10}$$

where g_t is an m-dimensional feature vector, calculated from the current state and k − 1 past states, namely,

$$g_t = g(x_t, x_{t-1}, \ldots, x_{t-k+1}). \tag{11}$$

Here, k ≥ 1 is a hyperparameter that governs the amount of memory in the NGRC model, and W is an n × m matrix of trainable weights.

We elaborate on the functional form of the feature embedding g below. But in general, the features can be divided into three groups: (i) one constant (bias) feature; (ii) m_lin = nk linear features, corresponding to the components of {x_t, x_{t−1}, ..., x_{t−k+1}}; and finally (iii) m_nonlin nonlinear features, each a nonlinear transformation of the linear features. The total number of features is thus m = 1 + m_lin + m_nonlin.
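As an illustration, a feature vector of this form with polynomial nonlinearities (the Model I style introduced in Sec. VI) can be built as follows (a sketch, not the authors' code):

```python
import numpy as np
from itertools import combinations_with_replacement

def ngrc_features(states, dmax=2):
    """Build g_t = g(x_t, ..., x_{t-k+1}) with polynomial nonlinearities.

    states: the k most recent state vectors (most recent last).
    Returns the concatenation of one bias feature, the n*k linear
    features, and all degree-2..dmax monomials of the linear features.
    """
    lin = np.concatenate(states)
    nonlin = [np.prod(combo)
              for d in range(2, dmax + 1)
              for combo in combinations_with_replacement(lin, d)]
    return np.concatenate(([1.0], lin, nonlin))
```

For n = 2, k = 2, and d_max = 2 this yields 1 bias, 4 linear, and 10 quadratic features, i.e., m = 15.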

Training. Per Eq. (10), training an NGRC model amounts to finding values for the weights W that give the best fit for the discrete update rule

$$y_t = W \cdot g_t, \tag{12}$$

where y_t = x_{t+1} − x_t. Accordingly, we calculate pairs of inputs (g_t) and next-step targets (y_t) over N_traj ≥ 1 training trajectories from the real system (4), each of length N_train + k. We then solve for the values of W that best fit Eq. (12) in the least-squares sense via regularized ridge regression, namely,

$$W = Y G^{T} (G G^{T} + \lambda I)^{-1}. \tag{13}$$

Here Y (G) is a matrix whose columns are the y_t (g_t). The regularization coefficient λ plays the same role as in standard RC [cf. Eq. (8)].
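Training per Eqs. (12) and (13) then amounts to assembling G and Y from delayed states and solving a single ridge regression. A sketch for one training trajectory, with a hypothetical `features` callback supplying the embedding g:

```python
import numpy as np

def train_ngrc(traj, features, k, lam=1e-6):
    """Fit W in y_t = W g_t (Eqs. 12-13) from one trajectory of shape (T, n).

    features: maps the k most recent states to a feature vector g_t.
    Returns W of shape (n, m).
    """
    G = np.array([features(traj[t - k + 1: t + 1])
                  for t in range(k - 1, len(traj) - 1)]).T
    Y = np.array([traj[t + 1] - traj[t]            # next-step target y_t
                  for t in range(k - 1, len(traj) - 1)]).T
    return Y @ G.T @ np.linalg.inv(G @ G.T + lam * np.eye(G.shape[0]))
```

On a toy linear system x_{t+1} = 0.9 x_t, the fitted weights recover the exact one-step increment y_t = −0.1 x_t.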

Prediction. To simulate a trained NGRC model from a

given initial condition x0, we ﬁrst integrate the true dynamics

(4) forward in time to obtain the additional k−1 states needed

to perform the ﬁrst discrete update according to Eqs. (10)

and (11). This is the warm-up period for the NGRC model.

Thereafter, we iterate Eqs. (10) and (11) as an autonomous

dynamical system, with each output becoming part of the

model’s input at the next time step. Thus in contrast to train-

ing, the model receives no data from the real system during

prediction except the k−1 “warm-up” states.

There is a clear parallel between NGRC [41,42] and

statistical forecasting methods [55] such as nonlinear vector-

autoregression (NVAR). However, as noted in Ref. [56], the

feature vectors of a typical NGRC model usually have far

more terms than NVAR methods, as the latter was designed

with interpretability in mind. It is the use of a library of

many candidate features—in addition to other details like the

typical training methods employed—that sets NGRC apart

from classic statistical forecasting approaches. In this way,

NGRC also resembles the sparse identiﬁcation of nonlinear

dynamics (SINDy) framework [57]. The differences here are

the intended tasks (ﬁnding parsimonious models vs ﬁtting the

dynamics), the optimization schemes (LASSO vs Ridge re-

gression), and NGRC’s inclusion of delayed states (generally

no delayed states for SINDy).

VI. SENSITIVE DEPENDENCE OF NGRC PERFORMANCE

ON READOUT NONLINEARITY

The importance of careful feature selection is well appre-

ciated for many machine learning frameworks [57,58]. Yet

one major appeal of NGRC is that the choice of nonlinear-

ity is considered to be of secondary importance; in many

systems studied to date, one can often bypass the feature

selection process by adopting some generic nonlinearities

(e.g., low-order polynomials). Indeed, applications of NGRC

to chaotic benchmark systems have shown good results even

when the features do not include all nonlinearities in the

underlying ODEs [41,59]. But can we expect this to be true

in general? Here, we test NGRC's sensitivity to the choice of feature embedding g (i.e., readout nonlinearity) in the basin prediction problem. Specifically, we compare the performance of three candidate NGRC models, in which the

nonlinearities are:

(I) Polynomials, specifically all unique monomials formed by the 4k components of {x_t, x_{t−1}, ..., x_{t−k+1}}, with degree between 2 and d_max.

(II) A set of N_RBF radial basis functions (RBFs) applied to the position coordinates r = (x, y) of each of the k states. The RBFs have randomly chosen centers and a kernel function with shape and scale similar to the magnetic force term.

(III) The exact nonlinearities in the magnetic pendulum system, namely the x and y components of the magnetic force for each magnet, evaluated at each of the k states.

The details of each model are summarized in Table I. Recall that in addition to their unique nonlinear features, all models contain one constant feature (set to 1 without loss of generality) and 4k linear features.

Models I–III represent a hierarchy of increasing knowledge about the real system. In Model I, we assume complete ignorance, hoping that the real dynamics are well approximated

by a truncated Taylor series. In Model II, we acknowledge

that this is a Newtonian central force problem and even the shape/scale of that force, but plead ignorance about the locations of the point sources. Finally, in Model III, we assume perfect knowledge of the system that generated the time series. Between the linear and nonlinear features, Model III includes all terms in Eqs. (1) and (2).

TABLE I. Summary of NGRC models constructed for the magnetic pendulum system. For each model described in Sec. VI, we provide examples of the nonlinear features, their total number (m_nonlin), and any additional hyperparameters. (I) Here, ((a b)) denotes the number of ways to choose b items (with replacement) from a set of size a. (II) Here, r_t = (x_t, y_t) are the position coordinates at time t, and c_i is the ith RBF center in 2D, whose x and y coordinates are drawn independently from a uniform distribution over [−1.5, 1.5]. (III) Here, (x̃_i, ỹ_i) are the coordinates of the ith magnet in the real system (i = 1, 2, 3), and D(x̃, ỹ) is as in Eq. (3).

Model | Nonlinear features | Example term(s) | Addl. hyperparameters | m_nonlin
I | Polynomials | x_t² ẏ_t, y_{t−2} ẏ_{t−1} ẋ_t | max. degree d_max | Σ_{d=2}^{d_max} ((4k d))
II | Radial basis functions | 1/(‖r_t − c_i‖² + h²)^{3/2} | centers {c_i}, i = 1, ..., N_RBF | N_RBF · k
III | Pendulum forces | (ỹ_i − y_t)/D(x̃_i, ỹ_i)³ | — | 6k
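The feature counts m_nonlin in Table I can be checked directly. For Model I, the total is m = 1 + 4k + Σ_{d=2}^{d_max} ((4k d)); the sketch below reproduces the m = 6188 figure quoted in Sec. VI for k = 3, d_max = 5.

```python
from math import comb

def multiset(a, b):
    """((a b)): number of ways to choose b items from a set of size a,
    with replacement."""
    return comb(a + b - 1, b)

def total_features_model1(k, dmax):
    """m = 1 (bias) + 4k (linear) + m_nonlin for Model I (n = 4)."""
    m_nonlin = sum(multiset(4 * k, d) for d in range(2, dmax + 1))
    return 1 + 4 * k + m_nonlin

def m_nonlin_model2(k, n_rbf):
    return n_rbf * k   # one kernel per RBF center per delayed state

def m_nonlin_model3(k):
    return 6 * k       # x and y force components for 3 magnets, per state
```
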

Our principal question is: how well can each NGRC model reproduce the basins of attraction of the magnetic pendulum and, in turn, predict its long-term behavior? We focus on the 2D region of initial conditions depicted in Fig. 1, in which the pendulum bob is released from rest at position (x₀, y₀), with −1.5 ≤ x₀, y₀ ≤ 1.5. We train each model on N_traj trajectories generated by Eqs. (1) and (2) from initial conditions sampled uniformly and independently from the same region. We then compare the basins predicted by each trained NGRC model with those of the real system (Appendix D). We define the error rate (p) as the fraction of initial conditions for which the basin predictions disagree.
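The error rate p is then a simple disagreement fraction over the grid of test initial conditions (sketch):

```python
import numpy as np

def basin_error_rate(true_labels, predicted_labels):
    """Error rate p: fraction of initial conditions whose predicted
    attractor disagrees with the ground-truth basin label."""
    t = np.asarray(true_labels)
    pr = np.asarray(predicted_labels)
    return float(np.mean(t != pr))
```

For the three-basin pendulum, random guessing gives p ≈ 2/3, so p near that value indicates predictions no better than chance.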

Model I (polynomial features). For NGRC models equipped with polynomial features, excellent training fits can be achieved (Figs. 8, 16, and 17). Despite this, the models struggle to reproduce the qualitative dynamics of the magnetic pendulum, let alone the basins of attraction.

Figure 4(a) shows representative NGRC basin predictions made by Model I using k = 5, d_max = 3. For the vast majority of initial conditions, the NGRC trajectory does not converge to any of the three attractors, instead diverging to (numerical) infinity in finite time (black points in the middle panels of Fig. 4). Modest improvements can be obtained by including polynomials up to degree d_max = 5 (with k = 3), as shown in Fig. 4(b). But even here, the model succeeds only at learning the part of each basin in the immediate vicinity of each attractor.

Unfortunately, eking out further improvements by increasing the complexity of the NGRC model becomes computationally prohibitive. When k = 3 and d_max = 5, for example, the model already has m = 6188 features. Likewise, the feature matrix G used in training has hundreds of millions of entries. With higher values of k and/or d_max, the model becomes too expensive to train and simulate on a standard computer.

To ensure the instability of the polynomial NGRC models is not caused by a poor choice of hyperparameters, we have repeated our experiments for a wide range of time resolutions Δt, training trajectory lengths N_train, numbers of training trajectories N_traj (Fig. 18 in Appendix G), and values of the regularization coefficient λ spanning ten orders of magnitude (Fig. 19 in Appendix G). The performance of Model I was not significantly improved in any case.

Model II (radial basis features). For NGRC models using

radial basis functions as the readout nonlinearity, the solutions

no longer blow up as they did in Model I above. This is

encouraging though perhaps unsurprising, as the RBFs are

much closer to the nonlinearity in the original equations describing the magnetic pendulum system. Unfortunately, the

FIG. 4. NGRC models with polynomials as their nonlinearity fail

to capture the basins of the magnetic pendulum system. We tested

the basin predictions made by NGRC models with the number of

time-delayed states up to k=5 and the maximum degree of the

polynomial up to dmax =5. Two representative predictions are shown

for (a) k=5, dmax =3and(b)k=3, dmax =5. The left panels

show the ground-truth basins of the magnetic pendulum system;

The middle panels show the basins identiﬁed by the NGRC models,

where black points denote initial conditions from which the NGRC

trajectories diverge to inﬁnity. The right panels show the correctly

identiﬁed basins in colors and the misidentiﬁed basins in black. The

hyperparameters used in this case are t=0.01, λ=1, Ntraj =100,

and Ntrain =5000.


FIG. 5. NGRC models with radial basis functions as their readout nonlinearity struggle to capture the basins of the magnetic pendulum

system. We tested the basin predictions made by NGRC models whose nonlinear features include NRBF radial basis functions. Panel (a) shows

the ground truth, and the rest of the panels show representative NGRC predictions for (b) NRBF =10, (c) NRBF =50, (d) NRBF =100, (e)

NRBF =500, and (f) NRBF =1000. The error rates of the predictions are indicated in the lower left corners. The solutions no longer blow up

as they did for the polynomial nonlinearities in Model I, but the NGRC models still struggle to capture the basins even qualitatively. Even at

NRBF =1000, only the most prominent features of the basins around the origin are correctly identiﬁed. The other hyperparameters used are

Δt = 0.01, λ = 1, k = 2, Ntraj = 100, and Ntrain = 5000.

accuracy of the NGRC models in predicting basins remains

poor.

Figure 5 shows representative NGRC basin predictions as

the number of radial basis functions is increased from NRBF =

10 to NRBF =1,000. In all cases, ﬁts to the training data are

impeccable, with the root-mean-square error (RMSE) ranging

from 0.003 (NRBF =10) to 0.0005 (NRBF =1,000). As more

and more RBFs are included, the predictions can be visibly

improved, but this improvement is very slow. For example, at

NRBF =1,000 [Fig. 5(f)], the trained model predicts the cor-

rect basin for only 53.4% of the initial conditions under study

(p=0.466). Moreover, most of this accuracy is attributable

to the large central portions of the basins near the attractors,

in which the dynamics are closest to linear. Outside of these

regions, the NGRC basin map may appear fractal, but the

basin predictions themselves are scarcely better than random

guesses. This deprives us of accurate forecasts in precisely

the regions of the phase space where the outcome is most in

question.

As with the polynomial case above, we have repeated our

experiments for a wide range of hyperparameters to rule out

overﬁtting or poor model calibration (Figs. 18 and 19 in Ap-

pendix G). The accuracy of Model II cannot be meaningfully

improved with any of these changes.
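For concreteness, a generic radial-basis feature map of the kind Model II uses can be sketched as follows (our illustrative guess: Gaussian RBFs with randomly placed centers; the paper's exact centers and widths may differ):

```python
import math
import random

def rbf_features(state, centers, gamma=1.0):
    # Gaussian radial basis features phi_c(s) = exp(-gamma * ||s - c||^2),
    # one feature per center; gamma and the centers are illustrative.
    return [math.exp(-gamma * sum((s - c) ** 2 for s, c in zip(state, ctr)))
            for ctr in centers]

rng = random.Random(0)
n_rbf, dim = 10, 4  # N_RBF centers in the 4D pendulum state space
centers = [[rng.uniform(-1.5, 1.5) for _ in range(dim)] for _ in range(n_rbf)]
phi = rbf_features([0.2, -0.3, 0.0, 0.1], centers)
print(len(phi))  # one feature per center
```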

Model III (exact nonlinearities). We next test NGRC mod-

els equipped with the exact form of the nonlinearity in the

magnetic pendulum system, namely the force terms in Eqs. (1)

and (2). This time, the NGRC models can perform exception-

ally well. Figure 20 (in Appendix G) shows the error rate of

NGRC basin predictions as a function of the time resolution

Δt. Without any fine-tuning of the other hyperparameters,

NGRC models already achieve a near-perfect accuracy of

98.6%, provided Δt is sufficiently small.

Astonishingly, Model III’s predictions remain highly accu-

rate even when it is trained on a single trajectory (Ntraj =1)

from a randomly-selected initial condition. Here, NGRC can

produce a map of all three basins that is very close to the

ground truth (85.0% accuracy, Fig. 6), despite seeing data

from only one basin during training. This echoes previous

results reported for the Li-Sprott system [43], in which NGRC

accurately reconstructed the basins of all three attractors (two

chaotic, one quasiperiodic) from a single training trajectory.

But how can we account for this night-and-day difference with

the more system-agnostic models (I and II), which showed

poor performance despite 100-fold more training data?

The answer lies in the construction of the NGRC dynamics.

In possession of the exact terms in the underlying differential

equations, Eq. (10) can—by a suitable choice of the weights

W—emulate the action of a numerical integration method

from the linear-multistep family [60], whose order depends

on k. When k=1, for example, Eq. (10) can mimic an Euler

step. Thus, with a sufficiently small step size (Δt), it is not

surprising that an NGRC model equipped with exact nonlin-

earities can accurately reproduce the dynamics of almost any

differential equations.

This observation might explain the stellar performance

of NGRC in forecasting speciﬁc chaotic dynamics like the

Lorenz [41] and Li-Sprott systems [43]. The nonlinearities in

these systems are quadratic, meaning that so long as dmax ≥ 2,

Model I can exactly learn the underlying vector ﬁeld. The

only information to be learned is the coefﬁcient (W) that

appears before each (non)linear term (g) in the ODEs. This

in turn could explain why a single training trajectory sufﬁces

to convey information about the phase space as a whole.
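This argument can be made concrete with a toy experiment (our sketch, not the authors' code): generate Lorenz data by explicit Euler steps at a small Δt, fit a ridge-regularized readout on constant, linear, and quadratic features, and check that the learned weights recover the Euler-step coefficients, e.g., the weight of y in the x-increment is Δt·σ.

```python
import numpy as np

sigma, rho, beta, dt = 10.0, 28.0, 8.0 / 3.0, 1e-3

def lorenz(v):
    x, y, z = v
    return np.array([sigma * (y - x), x * (rho - y) - z, x * y - beta * z])

def features(v):
    # Constant + linear + quadratic monomials (d_max = 2, k = 1)
    x, y, z = v
    return np.array([1, x, y, z, x * x, x * y, x * z, y * y, y * z, z * z])

# Training data generated by exact Euler steps at resolution dt
v = np.array([1.0, 1.0, 1.0])
X, Y = [], []
for _ in range(4000):
    v_next = v + dt * lorenz(v)
    X.append(features(v))
    Y.append(v_next - v)  # the readout learns the per-step increment
    v = v_next
X, Y = np.array(X), np.array(Y)

# Ridge regression via the augmented least-squares system
lam = 1e-10
A = np.vstack([X, np.sqrt(lam) * np.eye(X.shape[1])])
B = np.vstack([Y, np.zeros((X.shape[1], 3))])
W = np.linalg.lstsq(A, B, rcond=None)[0]

# Dividing the learned weights by dt recovers the Lorenz coefficients,
# e.g., the weight of y in the x-increment is dt * sigma:
print(W[2, 0] / dt, W[1, 0] / dt)  # close to 10 and -10
```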

Model III with uncertainty. Considering the wide gulf in

performance between NGRC models equipped with exact

nonlinearity and those equipped with polynomial/radial non-

linearity, it is natural to wonder whether there are some other

[Fig. 6 panels: basin maps over (x0, y0) ∈ [−1, 1] × [−1, 1], titled "Real," "NGRC," and "Errors."]

FIG. 6. NGRC models trained on a single trajectory can accu-

rately capture all three basins when the exact nonlinearity from the

magnetic pendulum system is adopted. The hyperparameters used

are t=0.01, λ=10−4,k=4, Ntraj =1, and Ntrain =1000, which

achieves an error rate pof 15%. No systematic optimization was

performed to ﬁnd these parameters. For example, by lowering t

to 0.0001 and increasing Ntrain to 100 000, we can further improve

the accuracy to over 98%.


FIG. 7. NGRC basin prediction accuracy when using the exact

nonlinearity from the pendulum equations but with small uncer-

tainties. Here, the NGRC models adopt the exact nonlinearity in

the magnetic pendulum system, except that the coordinates of the

magnets are perturbed by amounts uniformly drawn from [−δ, δ].

Each data point is obtained by averaging the error rate p over 10

independent realizations. We see that even a small uncertainty on

the order of δ = 10⁻⁵ can have a noticeable impact on the accuracy

of basin predictions. For δ > 10⁻², the NGRC predictions become

unreliable, approaching the 66.6% failure rate of random guesses.

Three representative NGRC-predicted basins are shown for δ =

10⁻³, δ = 10⁻², and δ = 10⁻¹, respectively (all with Δt = 0.01).

We consider predictions with p < 0.45 as useful since these in

general produce basins that are visually similar to the ground truth.

The other hyperparameters used are λ=1, k=2, Ntraj =100, and

Ntrain =5000.

smart choices of nonlinear features that perform well enough

without knowing the exact nonlinearity.

To explore this possibility, we consider a variant of Model

III in which we introduce small uncertainties in the non-

linear features, perturbing the assumed coordinates of each

magnet by small amounts drawn uniformly and independently

between [−δ, δ]. Here δ is a hyperparameter much

smaller than the characteristic spatial scale in this system

(δ ≪ 1). We train the model on Ntraj = 100 trajectories from

the (unperturbed) real system, then measure how NGRC mod-

els perform in the presence of uncertainty about the exact

nonlinearity.
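The perturbation itself is simple to reproduce. A minimal sketch (the triangular magnet layout below is illustrative, not taken from the paper):

```python
import math
import random

def perturbed_magnets(magnets, delta, seed=0):
    # Shift each assumed magnet coordinate by an independent draw
    # from the uniform distribution on [-delta, delta].
    rng = random.Random(seed)
    return [(x + rng.uniform(-delta, delta), y + rng.uniform(-delta, delta))
            for x, y in magnets]

# Hypothetical layout: three magnets at the vertices of a triangle
magnets = [(math.cos(2 * math.pi * i / 3), math.sin(2 * math.pi * i / 3))
           for i in range(3)]
assumed = perturbed_magnets(magnets, delta=1e-2)
print(max(abs(a - b) for m, am in zip(magnets, assumed)
          for a, b in zip(m, am)))  # at most 1e-2
```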

In Fig. 7, we see that even a ∼1% mismatch (δ=0.01) in

the coordinates of the magnets (x̃i, ỹi) is enough to make the

accuracy of NGRC predictions plunge from almost 100% to

below 50% (recall that even random guesses have an accuracy

of 33.3%). This extreme sensitivity of NGRC performance

to perturbations in the readout nonlinearity suggests that any

function other than the exact nonlinearity is unlikely to enable

reliable basin predictions in the NGRC model.

Training vs prediction divergence. In all models consid-

ered, we have seen that excellent ﬁts to the training data do

not guarantee accurate basin predictions for the rest of the

phase space. But surprisingly, NGRC models can predict the

wrong basin even for the precise initial conditions on which

they were trained.

For each of Models I–III, Fig. 8 shows one example train-

ing trajectory for which the model attains a near-perfect fit

to the ground truth, but the NGRC trajectory from the same

initial condition nonetheless goes to a different attractor. We

can rationalize this discrepancy by considering the difference

between the training and prediction phases as described in

Sec. V. During training, NGRC is asked to calculate the next

state given the kmost recent states from the ground truth

data. In contrast, during prediction, the model must make

this forecast based on its own (autonomous) trajectory. This

permits even tiny errors to compound over time, potentially

driving the dynamics to the wrong attractor. Though Fig. 8

shows only one example for each model, these cases are quite

common, regardless of the exact hyperparameters used [61].

Moreover, in Fig. 17 in Appendix G, we show that even

when the NGRC model predicts the correct attractor for a

given training initial condition, the intervening transient dy-

namics can deviate signiﬁcantly from the ground truth. This is

especially common and pronounced for NGRC models with

polynomial or radial nonlinearities [Figs. 17(a) and 17(b)].

In particular, the transient time—how long it takes to come

close to the given attractor—can be much larger or smaller

than in the real system. As such, reaching the correct attractor

does not necessarily imply that an NGRC model has learned

the true dynamics from a given training initial condition. To

say nothing of the (uncountably many) other initial conditions

unseen during training.

Inﬂuence of basin complexity. As motivated earlier, the

magnetic pendulum is a hard-to-predict system because of

its complicated basins of attraction, regardless of the exact

parameter values used. And indeed, we see the same sensi-

tivity of NGRC performance to readout nonlinearity for other

parameter values, such as h=0.3 and h=0.4 (Fig. 21 in

Appendix G).

As the height of the pendulum h is increased, the basins do

tend to become less fractal-like. In Fig. 22 in Appendix G, we

vary the value of h and show that NGRC models trained with

polynomials fail even for the most regular basins (h=0.4).

On the other hand, NGRC models trained with radial basis

functions see their performance improve signiﬁcantly as the

basins become simpler. As expected, NGRC models equipped

with exact nonlinearity successfully capture the basins for all

values of h studied.

VII. PREDICTING HIGH-DIMENSIONAL BASINS WITH NGRC

How general are the results presented in Sec. VI? Could the

magnetic pendulum be pathological in some unexpected way,

with low-order polynomials or other generic features sufﬁcing

as the readout nonlinearity for most dynamical systems of

interest? To address this possibility, we investigate another

paradigmatic multistable system—identical Kuramoto oscil-

lators with nearest-neighbor coupling [62–64],

θ̇i = sin(θi+1 − θi) + sin(θi−1 − θi),  i = 1, . . . , n,  (14)

where we assume a periodic boundary condition, so θn+1 = θ1

and θ0 = θn. Here n is the number of oscillators and hence the

dimension of the phase space, and θi(t) ∈ [0, 2π) is the phase

of oscillator i at time t.
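As a quick sanity check on Eq. (14), the right-hand side can be coded in a few lines, and a twisted state can be verified to be a fixed point (an illustrative sketch; the ground-truth trajectories in the paper are produced by a high-order adaptive integrator, per Appendix B):

```python
import math

def kuramoto_rhs(theta):
    # Eq. (14): identical Kuramoto oscillators on a ring with
    # nearest-neighbor coupling and periodic boundary conditions.
    n = len(theta)
    return [math.sin(theta[(i + 1) % n] - theta[i])
            + math.sin(theta[(i - 1) % n] - theta[i]) for i in range(n)]

# A q = 1 twisted state theta_i = 2*pi*i/n makes the RHS vanish
n = 9
twisted = [2 * math.pi * i / n for i in range(n)]
print(max(abs(v) for v in kuramoto_rhs(twisted)))  # ~0 (fixed point)
```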


FIG. 8. NGRC models frequently mis-forecast even the initial conditions they were trained on. The panels correspond to (a) Model I, with

dmax =3; (b) Model II, with NRBF =500; and (c) Model III, with no uncertainty. Each model was trained on trajectories from Ntraj =100

initial conditions. For each model, we show one such initial condition for which NGRC (green) predicts the wrong basin, despite an excellent

ﬁt to the corresponding ground-truth training trajectory (pink). The left column of each panel shows the training ﬁt. The right column shows

the autonomous NGRC simulation from the same initial condition. The three magnets can be distinguished by their y coordinates (cf. Fig. 1),

allowing us to indicate which one a given trajectory approaches via the three colored lines beside the second row in each panel. In each case,

the training ﬁt is impeccable, with the two curves overlapping to within visual resolution (left). Yet when the NGRC model is run autonomously

from the same initial condition, it quickly diverges from the ground truth, eventually going to an incorrect attractor (right). For all models, we

set the other hyperparameters as Δt = 0.01, λ = 1, k = 2, and Ntrain = 5000.

Aside from being well studied as a model system, the

Kuramoto system has two nice features. First, its sine non-

linearities are more “tame” than the algebraic fractions in the

magnetic pendulum, helping to untangle whether the sensitive

dependence observed in Sec. VI afﬂicts only speciﬁc non-

linearities. Second, we can easily change the dimension of

Eq. (14) by varying n, allowing us to test NGRC on high-

dimensional basins.

For n>4, Eq. (14) has multiple attractors in the form

of twisted states—phase-locked configurations in which the

oscillators' phases make q full twists around the unit circle,

satisfying θi = 2πiq/n + C. Here q is the winding number

of the state [62]. Twisted states are fixed points of Eq. (14)

for all q, but only those with |q|<n/4 are stable [63]. The

corresponding basins of attraction can be highly complicated

[64], though not fractal-like as in the magnetic pendulum

system.
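In practice, a trajectory's final configuration can be assigned to a twisted state by computing its winding number, summing neighbor phase differences wrapped to (−π, π] (a sketch of ours, consistent with the definition above):

```python
import math

def winding_number(theta):
    # Sum the wrapped phase differences around the ring; the total is
    # 2*pi*q for a configuration making q full twists.
    n = len(theta)
    total = 0.0
    for i in range(n):
        d = theta[(i + 1) % n] - theta[i]
        total += math.atan2(math.sin(d), math.cos(d))  # wrap to (-pi, pi]
    return round(total / (2 * math.pi))

n = 9
for q in (0, 1, 2):  # stable twisted states for n = 9 satisfy |q| < n/4
    state = [(2 * math.pi * i * q / n) % (2 * math.pi) for i in range(n)]
    print(winding_number(state))  # recovers q: 0, 1, 2
```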

Similar to Sec. VI, we consider three classes of readout

nonlinearities assuming increasing knowledge of the underly-

ing system:

(1) Monomials spanned by the nk oscillator states

in Θ = {θt, θt−1, . . . , θt−k+1}, with degree between 2

and dmax.

(2) Trigonometric functions of all scalars in Θ, consisting

of sin(ℓθi) and cos(ℓθi) for all i and for integers 1 ≤ ℓ ≤ ℓmax.

(3) The exact nonlinearity in Eq. (14), namely sin(θi − θj)

for all pairs of connected nodes i and j.

To test the performance of different NGRC models on the

Kuramoto system, we ﬁrst set n=9 and use them to predict

basins in a two-dimensional (2D) slice of the phase space.

Specifically, we look at slices spanned by θ0 + α1P1 + α2P2,

αi ∈ (−π, π]. Here, P1 and P2 are n-dimensional binary ori-

entation vectors, while θ0 is the base point at the center of the

slice.
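Generating the initial conditions on such a slice is straightforward; here is a coarse sketch (the grid resolution and the wrap into [0, 2π) are our choices):

```python
import math

def slice_grid(theta0, P1, P2, m=5):
    # Initial conditions theta0 + a1*P1 + a2*P2 with (a1, a2) on an
    # m-by-m grid over (-pi, pi]; m is an illustrative resolution.
    alphas = [-math.pi + 2 * math.pi * (j + 1) / m for j in range(m)]
    return [[(t + a1 * p1 + a2 * p2) % (2 * math.pi)
             for t, p1, p2 in zip(theta0, P1, P2)]
            for a1 in alphas for a2 in alphas]

n = 9
theta0 = [(2 * math.pi * i * 2 / n) % (2 * math.pi) for i in range(n)]  # two-twist state
P1 = [1, 0, 1, 0, 1, 0, 1, 0, 1]
P2 = [0, 1, 0, 1, 0, 1, 0, 1, 0]
ics = slice_grid(theta0, P1, P2)
print(len(ics), len(ics[0]))  # 25 initial conditions, each 9-dimensional
```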

Figure 9 shows results for orientation vectors given by

P1 = [1, 0, 1, 0, 1, 0, 1, 0, 1], P2 = [0, 1, 0, 1, 0, 1, 0, 1, 0],

with θ0 representing the two-twist state. We can see that

NGRC models with polynomial nonlinearity and trigono-

FIG. 9. Predicting basins of a Kuramoto oscillator network with

NGRC. We show representative NGRC predictions for basins of n=

9 locally-coupled Kuramoto oscillators. Here, we select a 2D slice of

the phase space centered at the twisted state with q=2. Basins are

color-coded by the absolute winding number |q| of the corresponding

attractor (blue: |q|=0; orange: |q|=1; green: |q|=2). Despite the

simple geometry of the basins and extensive optimization of hyper-

parameters, NGRC models with polynomial nonlinearity (dmax = 2)

or trigonometric nonlinearity (ℓmax = 5) have accuracies that are

comparable to random guesses. In contrast, with exact nonlinearity,

NGRC predictions are consistently over 99% correct. The other

hyperparameters are Δt = 0.01, λ = 10⁻⁵, k = 2, Ntraj = 1000, and

Ntrain =3000.


FIG. 10. NGRC with exact nonlinearity can accurately predict

high-dimensional basins. Here we train an NGRC model (equipped

with exact nonlinearity) to predict the high-dimensional basins of

n=83 locally-coupled Kuramoto oscillators. To test the NGRC per-

formance, we randomly select a 2D slice of the 83-dimensional phase

space and compare the predicted basins with the ground truth. Basins

are color-coded by the absolute winding number |q| of the cor-

responding attractor. Despite the fragmented and high-dimensional

nature of the basins, NGRC captures the intricate basin geometry

with ease. Without deliberate optimization of the hyperparameters,

NGRC can already achieve over 97% accuracy. The hyperparame-

ters used are Δt = 0.01, λ = 10⁻⁵, k = 2, Ntraj = 1000, and Ntrain =

3000.

metric nonlinearity fail utterly at capturing the simple

ground-truth basins. This is despite an extensive search over

the hyperparameters Δt, λ, dmax, and ℓmax. On the other hand,

the NGRC model with exact nonlinearity gives almost perfect

predictions for a wide range of hyperparameters. The hyper-

parameters in Fig. 9are chosen so that trajectories predicted

by the polynomial-NGRC model do not blow up.

Next, we show that the NGRC model with exact non-

linearity can predict basins in much higher dimensions and

with more complicated geometries. In Fig. 10,wesetn=83

and choose θ0to be a random point in the phase space.

The n-dimensional binary orientation vectors P1and P2are

constructed by randomly selecting n/2components to be

1 and the rest of the components are 0. (The results are not

sensitive to the particular realizations of P1and P2.) Using the

same hyperparameters as in Fig. 9, the NGRC model achieves

an accuracy of 97.5%. Visually, one would be hard pressed

to ﬁnd any difference between the predicted basins and the

ground truth.

VIII. DISCUSSION

When can we claim that a machine learning model like

RC has “learned” a dynamical system? One basic requirement

is a good training ﬁt, but this is far from sufﬁcient. Many

(NG)RC models have extremely low training error, but fail

completely during the prediction phase (Fig. 8). A stronger

criterion germane to chaotic systems is that the predicted

trajectory (beyond the training data) should reproduce the

“climate” of the strange attractor, for example replicating the

Lyapunov exponents [16]. Here, we propose that the ability

to accurately predict basins of attraction is another important

test a model must pass before it can be trusted as a proxy

of the underlying system. This applies as much to single-

attractor systems as it does to multistable ones, as a model

might produce spurious attractors not present in the original

dynamics [35].

Here, we have shown that there exist commonly-studied

systems for which basin prediction presents a steep challenge

to leading RC frameworks. In standard RC, the model must

be warmed up by an overwhelming majority of the transient

dynamics, essentially reaching the attractor before prediction

can begin. In contrast, NGRC requires minimal warm-up data

but is critically sensitive to the choice of readout nonlinearity,

with its ability to make basin predictions contingent on having

the exact features in the underlying dynamics. Though these

frameworks face very different challenges, each presents a

“catch-22”: The dynamics cannot be learned unless key in-

formation about the system is already known.

The basin prediction problem poses distinct chal-

lenges from the problem of forecasting chaotic systems,

a test (NG)RC has largely passed with ﬂying colors

[2,6,8,9,13,16,17,25,41]. In the latter case, the “climate”

of a strange attractor can still be accurately reproduced

even after the short-term prediction has failed [16]. It is

for this reason that—in the most commonly-used bench-

mark systems (Lorenz-63, Lorenz-96, Kuramoto-Sivashinsky,

etc.)—the transients are often deemed uninteresting and dis-

carded during training. But for multistable systems, to predict

which attractor an initial condition will converge to, the

transient dynamics are the whole story. Therefore, basin pre-

diction can be even more challenging than forecasting chaos.

This is true even in the idealized setting considered here,

wherein the attractors are ﬁxed points, and the state of the

system is fully observed without noise. As such, we suggest

that the magnetic pendulum and Kuramoto systems are ideal

benchmarks for data-driven methods aiming to learn multi-

stable nonlinear systems.

It has been established that both standard RC and NGRC

are universal approximators, which in appropriate limits can

achieve arbitrarily good ﬁts to any system’s dynamics [29,33].

But in practice, this is a rather weak guarantee. Unlike many

other machine learning tasks, achieving a good ﬁt to the ﬂow

of the real system [Eq. (5)] is only the ﬁrst step; we must

ultimately evolve the trained model as a dynamical system in

its own right. This can invite a problem of stability, similar

to the one faced by numerical integrators. Even when the ﬁt

to a system’s ﬂow is excellent, the autonomous dynamics of

an (NG)RC model can be unstable, causing the prediction to

diverge catastrophically from the true solution. How to ensure

the stability of a trained (NG)RC model in the general case is

a major open problem [54].

There are several exciting directions for future research

that follow naturally from our results. First, RC’s ability to

extract global information about a nonlinear system from

local transient trajectories is one of its most powerful as-

sets. Currently, we lack a theory that characterizes conditions

under which such extrapolations can be achieved by an RC

model. Second, several factors could contribute to the difﬁ-

culty of basin prediction for RC, including the nonlinearity

in the underlying equations, the geometric complexity of

the basins, and the nature of the attractors themselves. Can

we untangle the effects of these factors? Finally, although

standard RC requires relatively long initialization data, it

tends to show more robustness towards the choice of non-

linearity (i.e., the reservoir activation function) compared to

NGRC. Can we develop a new framework that combines stan-

dard RC’s robustness with NGRC’s efﬁciency and low data

requirement?


RC is elegant, efﬁcient, and powerful; but to usher in a

new era of model-free learning of complex dynamical systems

[57,65–74], it needs to solve the catch-22 created by its fragile

dependence on readout nonlinearity (NGRC) or its reliance on

long initialization data for every new initial condition (stan-

dard RC).

ACKNOWLEDGMENTS

We thank D. Gauthier, M. Girvan, M. Levine, and A. Haji

for insightful discussions. Y.Z. acknowledges support from

the Schmidt Science Fellowship and Omidyar Fellowship.

S.P.C. acknowledges the support of the Natural Sciences and

Engineering Research Council of Canada (NSERC), grant

RGPIN-2020-05015.

APPENDIX A: SOFTWARE IMPLEMENTATION

All simulations in this study were performed in Ju-

lia. For standard RC (Secs. III and IV), we employ

the ReservoirComputing package in concert with the

BayesianOptimization package for hyperparameter op-

timization. For NGRC (Secs. V–VII), we use a custom

implementation as described in Sec. V. Our source code is

freely available online [75].

APPENDIX B: NUMERICAL INTEGRATION

For the purpose of obtaining trajectories of the

real system for training and validation, we use Julia’s

DifferentialEquations package to integrate all

continuous equations of motion (4) using a ninth-order

integration scheme (Vern9), with absolute and relative error

tolerances both set to 10⁻¹⁰. We stress that the hyperparameter

Δt has no relation to the numerical integration step size,

which is determined adaptively to achieve the desired error

tolerances. Instead, Δt simply represents the timescale at

which we seek to model the real dynamics via (NG)RC,

and hence the resolution at which we sample the continuous

trajectories to generate training and validation data.

APPENDIX C: (NORMALIZED) ROOT-MEAN-SQUARE ERROR

Given an (NG)RC predicted trajectory x̃t and a corresponding

trajectory of the real system xt—each of length N—we

calculate the root-mean-square error (RMSE) as

RMSE = √( (1/N) Σt ‖xt − x̃t‖² ),  (C1)

where ‖·‖ denotes the Euclidean norm. To obtain a normalized

version of this (NRMSE)—which we use as part of the

objective function to optimize standard RC hyperparameters

(Appendix F)—we first rescale each component of xt and x̃t

by their range in the real system, e.g.,

xi,t → xi,t / (xi,max − xi,min),  (C2)

where the maximum (xi,max) and minimum (xi,min) for dimension

i = 1, . . . , n of the state space are calculated over the

corresponding training data.
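The two error measures translate directly into code; a minimal version (ours) of Eqs. (C1) and (C2), with the component ranges taken from the reference trajectory:

```python
import math

def rmse(x, x_pred):
    # Eq. (C1): sqrt of the mean squared Euclidean distance per step.
    return math.sqrt(sum(sum((a - b) ** 2 for a, b in zip(xt, pt))
                         for xt, pt in zip(x, x_pred)) / len(x))

def nrmse(x, x_pred):
    # Eq. (C2): rescale each component by its range in the real
    # trajectory, then compute the RMSE of the rescaled trajectories.
    dims = range(len(x[0]))
    rng = [max(xt[i] for xt in x) - min(xt[i] for xt in x) for i in dims]

    def scale(traj):
        return [[pt[i] / rng[i] for i in dims] for pt in traj]

    return rmse(scale(x), scale(x_pred))

x = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0]]
shifted = [[a + 1.0, b + 1.0] for a, b in x]
print(rmse(x, x), rmse(x, shifted))  # 0.0 and sqrt(2)
```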

APPENDIX D: BASIN PREDICTION

We associate a given initial condition x0 with a basin of attraction

by simulating the real (NGRC) dynamics for a total of T

time units (T/Δt iterations). We then identify the closest

stable ﬁxed point at the end of the trajectory. In the magnetic

pendulum, this is taken as the closest magnet. In the Kuramoto

system, we calculate the winding number |q|and use it to

identify the corresponding twisted state. We use T=100

for both systems, which is sufﬁcient for all initial conditions

under study to approach one of the stable ﬁxed points.
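For the magnetic pendulum, the basin label is simply the index of the nearest magnet at the end of the trajectory; a sketch (the triangular magnet layout is illustrative, not taken from the paper):

```python
import math

def closest_magnet(xy, magnets):
    # Basin label = index of the magnet nearest to the final (x, y).
    x, y = xy
    return min(range(len(magnets)),
               key=lambda i: (magnets[i][0] - x) ** 2
                             + (magnets[i][1] - y) ** 2)

# Hypothetical layout: three magnets at the vertices of a triangle
magnets = [(math.cos(2 * math.pi * i / 3), math.sin(2 * math.pi * i / 3))
           for i in range(3)]
print(closest_magnet((0.9, 0.1), magnets))  # 0 (nearest to magnet at (1, 0))
```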

APPENDIX E: SUPPLEMENTAL TABLES

TABLE II. Optimizable hyperparameters in standard RC. Each hyperparameter is optimized in logarithmic scale between the given bounds.

Hyperparameter | Meaning | Lower bound | Upper bound
ρ | Spectral radius of reservoir matrix (W) | 10⁻³ | 1
sx | Input scaling (position) | 10⁻³ | 10
sv | Input scaling (velocity) | 10⁻³ | 10
sb | Bias scaling | 10⁻³ | 1
α | Leaky coefficient | 10⁻³ | 1

TABLE III. Values of optimized hyperparameters for standard RC. We list the value of each hyperparameter after optimization to five significant figures.

Initial condition (x0, y0) | Figures | ρ | sx | sv | sb | α
(−1.3, 0.75) | 2 and 3 | 0.44077 | 5.5064 | 0.027882 | 1.0000 | 1.0000
(1.0, −0.5) | 11 and 12 | 0.40633 | 5.0712 | 0.44366 | 1.0000 | 1.0000
(1.75, 1.6) | 13 and 14 | 0.39391 | 2.9633 | 0.26557 | 1.0000 | 1.0000


APPENDIX F: HYPERPARAMETER OPTIMIZATION

Given an initial condition x0 = (x0, y0, ẋ0, ẏ0)ᵀ of the magnetic

pendulum system, we identify an optimal set of RC

hyperparameters using Bayesian optimization. The goal here

is to find the minimizer p∗ of a (noisy) function F(p), i.e.,

p∗ = arg minp F(p).  (F1)

In our setting, p = (ρ, sx, sv, sb, α)ᵀ is a vector of our

optimizable hyperparameters, and F is a scalar objective function

measuring the error between the real system and a trained RC

model generated with those hyperparameters. Typically, this

objective function incorporates the NRMSE (Appendix C)

between the real and RC-predicted trajectories [30]. But what

is the best choice?

We found that the NRMSE during training is a poor opti-

mization objective. In the magnetic pendulum, the resulting

RC dynamics tend to blow up during the subsequent au-

tonomous prediction, rather than staying near the ﬁxed point

of the real system. Accordingly, we use an objective func-

tion that incorporates both training and validation NRMSE.

Speciﬁcally, for a given set of hyperparameters p, we generate

one random RC model and train it to the ﬁrst Ntrain =4000

steps of the real trajectory starting from x0. This yields a

training NRMSE εtrain. We then simulate the trained RC model

for an additional Nvalidation time steps, picking up where the

training left off. This yields a validation NRMSE, εvalidation.

We then calculate F(p) as

F(p) = log(εtrain) + log(εvalidation).  (F2)

We ﬁnd that this approach yields optimal RC models that have

excellent training ﬁts, but remain “well behaved” (i.e., nearly

stationary) beyond the training phase.
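In code, the objective of Eq. (F2) is a one-liner; the point of the log-sum form is that a model must do well in both phases to score well (a sketch with made-up error values):

```python
import math

def objective(eps_train, eps_validation):
    # Eq. (F2): sum of log training and log validation NRMSE.
    return math.log(eps_train) + math.log(eps_validation)

# A model that fits training superbly but blows up in validation is
# ranked worse (higher F) than a balanced model:
print(objective(1e-6, 1e2) > objective(1e-3, 1e-3))  # True
```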

All hyperparameter optimization for standard RC was

performed using the BayesianOptimization package in

Julia. We model the landscape of F via Gaussian process

regression to observed values of (p,F(p)). We employ the

default squared-exponential (Gaussian) kernel, with tunable

parameters corresponding to the standard deviation plus the

length scale of each dimension of the hyperparameter space.

We ﬁrst bootstrap the kernel (ﬁt its parameters) using 200

random sets of hyperparameters p generated log-uniformly

between the bounds in Table II via Latin hypercube sam-

pling. At every step of the process thereafter, we acquire a

new candidate value of p via the commonly-used expected

improvement strategy. We repeat this process for a total of 500

iterations, returning the observed minimizer of F(p). Every

50 iterations, we reﬁt the kernel parameters via maximum a

posteriori (MAP) estimation. To account for the stochasticity

in F due to W, Win, and b, we generate 10 realizations

of the RC model for each candidate set of hyperparameters

p. Thus, over the course of the optimization, we evaluate

F a total of 12 000 times—2000 for the initial bootstrap-

ping period, and an additional 10 000 during the subsequent

optimization.

APPENDIX G: SUPPLEMENTAL FIGURES

(a)

(b)

FIG. 11. Forecastability transition of standard RC. Counterpart to Fig. 2with the initial condition (x0,y0)=(1.0,−0.5). The optimized

RC hyperparameters for this initial condition are listed in Table III.


FIG. 12. Sensitivity of standard RC performance to warm-up time. Counterpart to Fig. 3with the initial condition (x0,y0)=(1.0,−0.5).

The respective warm-up times indicated by the dashed lines are the same as in Fig. 11.

(a)

(b)

FIG. 13. Forecastability transition of standard RC. Counterpart to Fig. 2with the initial condition (x0,y0)=(1.75,1.6). The optimized

RC hyperparameters for this initial condition are listed in Table III.


FIG. 14. Sensitivity of standard RC performance to warm-up time. Counterpart to Fig. 3with the initial condition (x0,y0)=(1.75,1.6).

The respective warm-up times indicated by the dashed lines are the same as in Fig. 13.

(a) (b) (c)

FIG. 15. Forecastability transition of standard RC. Counterparts to Figs. 2,11,13 using a larger reservoir size of Nr=600.


FIG. 16. NGRC models have excellent training ﬁt for all readout nonlinearities tested. Each panel shows an NGRC model with a different

readout nonlinearity: (a) polynomials with dmax =3; (b) radial basis functions with NRBF =500; (c) exact nonlinearity. The trajectories are

color-coded in time—they begin with dark purple points and end with bright green points. The three ﬁxed points are represented as crosses.

The root-mean-square error (RMSE) for each training trajectory is shown at the bottom of the panel. The other hyperparameters used are

Δt = 0.01, λ = 1, k = 2, Ntraj = 100, and Ntrain = 5000.

FIG. 17. NGRC models can fail to reproduce the correct transient dynamics even when the attractor is correctly predicted. Counterpart

to Fig. 8, showing examples of training initial conditions for which the NGRC predicted trajectory (green, right columns) goes to the correct

attractor, but the transient dynamics differs markedly from the ground truth (pink). All hyperparameters are the same as in Fig. 8.


FIG. 18. Error rate p as a function of the number of training trajectories Ntraj for NGRC models trained with polynomial, radial, and

exact nonlinearity. Each data point is obtained by averaging the error rate p over 10 independent realizations. NGRC cannot produce useful

predictions with polynomial nonlinearity (dmax = 3) or radial nonlinearity (NRBF = 100) no matter how many training trajectories are used.

With the exact nonlinearity from the magnetic pendulum equations, NGRC can make accurate predictions once trained on about 10 trajectories.

Beyond this, more training trajectories yield only marginal increases in accuracy. The other hyperparameters used here are Δt = 0.01, k = 2,

λ=1, and Ntrain =5000.
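The error rate p reported in this figure is simply the fraction of test initial conditions assigned to the wrong attractor, averaged over independent realizations. A minimal sketch (the helper names are hypothetical):

```python
import numpy as np

def error_rate(predicted_labels, true_labels):
    """Fraction of test initial conditions sent to the wrong
    attractor. With three basins of comparable size, random
    guessing gives p close to 2/3."""
    predicted = np.asarray(predicted_labels)
    truth = np.asarray(true_labels)
    return float(np.mean(predicted != truth))

def mean_error_rate(realizations):
    """Average p over independent realizations, one (predicted,
    truth) pair per realization, as done for each data point."""
    return float(np.mean([error_rate(p, t) for p, t in realizations]))
```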



FIG. 19. Error rate p as a function of the regularization coefficient λ for NGRC models trained with polynomial, radial, and exact nonlinearity. Each data point is obtained by averaging the error rate p over 10 independent realizations. No choice of λ can make NGRC produce useful predictions with polynomial nonlinearity (dmax = 3) or radial nonlinearity (NRBF = 100). In contrast, exact nonlinearity can produce useful predictions for a wide range of λ (between 10^-2 and 10^2). The other hyperparameters used are Δt = 0.01, k = 2, Ntraj = 100, and Ntrain = 5000.
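The coefficient λ scanned here enters through the ridge (Tikhonov) regression used to fit the readout weights. A minimal closed-form sketch, assuming a feature matrix G (features × samples) and targets Y (outputs × samples); the function name and matrix conventions are our own:

```python
import numpy as np

def train_readout(G, Y, lam=1.0):
    """Ridge regression for readout weights W: minimize
    ||W G - Y||^2 + lam ||W||^2, solved via the normal equations
    W = Y G^T (G G^T + lam I)^{-1}."""
    n_feat = G.shape[0]
    return Y @ G.T @ np.linalg.inv(G @ G.T + lam * np.eye(n_feat))
```

In practice a linear solve (e.g., `np.linalg.solve`) is preferable to an explicit inverse; the explicit form is kept here only for clarity.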

FIG. 20. Dependence of NGRC basin-prediction accuracy on the time resolution Δt. Here, the NGRC models adopt the exact nonlinearity in the magnetic pendulum system. Each data point is obtained by averaging the error rate p over 10 independent realizations (error bars are smaller than the size of the symbol). The accuracy of the basin predictions can be significantly improved by taking smaller steps before it plateaus for Δt below a certain threshold. For Δt = 0.0003125 (the leftmost points) and k = 3, the NGRC model consistently achieves an accuracy around 98.6%. Even for Δt = 0.04 at the other end of the plot (right before NGRC becomes unstable and the solutions blow up), the features of the true basins are qualitatively preserved. Representative NGRC-predicted basins are shown for the two Δt values discussed above. The other hyperparameters used are λ = 1, Ntraj = 100, and Ntrain = 20 000.
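The step size Δt and the number of delays k matter at prediction time, when the trained readout is iterated autonomously from a short warm-up. One common convention is sketched below; W and the feature map are assumed to come from training, and the Euler-style increment scaled by dt is an illustrative choice rather than the paper's exact update rule.

```python
import numpy as np

def ngrc_predict(W, features, warmup, n_steps, dt):
    """Iterate a trained NGRC model forward in time. `warmup` is a
    list of k consecutive states (most recent last); `features`
    maps that list of k delayed states to a feature vector; the
    readout W predicts the state increment over one step dt."""
    history = [np.asarray(x, dtype=float) for x in warmup]
    k = len(history)
    traj = list(history)
    for _ in range(n_steps):
        g = features(history)
        x_next = history[-1] + dt * (W @ g)  # Euler-like update
        traj.append(x_next)
        history = traj[-k:]                  # slide the k-step window
    return np.array(traj)
```

Shrinking dt refines the update at the cost of more iterations per unit time, which is the trade-off scanned in this figure.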

FIG. 21. NGRC basin-prediction accuracy when using the exact nonlinearity but with small uncertainties. Same as Fig. 7, but with the height of the pendulum set to (a) h = 0.3 and (b) h = 0.4.


FIG. 22. Basin predictions generally become easier when the basins are less fractal. We show representative NGRC predictions for basins of the magnetic pendulum system with h = 0.2, 0.3, and 0.4. As h is increased, the basins become less fractal. For the NGRC predictions, the error rate is marked in white and the wrong predictions are highlighted in black. With polynomial nonlinearity (dmax = 3), NGRC predictions are worse than random guesses for all h tested. With radial nonlinearity (NRBF = 500), NGRC predictions become increasingly better as h is increased. With exact nonlinearity, NGRC predictions are consistently good, and the best accuracy is achieved at h = 0.3 in this particular case. The other hyperparameters used are Δt = 0.01, λ = 1, k = 2, Ntraj = 100, and Ntrain = 5000.

[1] W. Maass, T. Natschläger, and H. Markram, Real-time com-

puting without stable states: A new framework for neural

computation based on perturbations, Neural Comput. 14, 2531

(2002).

[2] H. Jaeger and H. Haas, Harnessing nonlinearity: Predicting

chaotic systems and saving energy in wireless communication,

Science 304, 78 (2004).

[3] M. Lukoševičius and H. Jaeger, Reservoir computing approaches to recurrent neural network training, Comput. Sci. Rev. 3, 127 (2009).

[4] L. Appeltant, M. C. Soriano, G. Van der Sande, J. Danckaert,

S. Massar, J. Dambre, B. Schrauwen, C. R. Mirasso, and I.

Fischer, Information processing using a single dynamical node

as complex system, Nat. Commun. 2, 468 (2011).

[5] D. Canaday, A. Grifﬁth, and D. J. Gauthier, Rapid time series

prediction with a hardware-based reservoir computer, Chaos 28,

123119 (2018).

[6] T. L. Carroll, Using reservoir computers to distinguish chaotic

signals, Phys. Rev. E 98, 052209 (2018).

[7] P. R. Vlachas, J. Pathak, B. R. Hunt, T. P. Sapsis, M. Girvan,

E. Ott, and P. Koumoutsakos, Backpropagation algorithms and

reservoir computing in recurrent neural networks for the fore-

casting of complex spatiotemporal dynamics, Neural Networks

126, 191 (2020).

[8] M. Rafayelyan, J. Dong, Y. Tan, F. Krzakala, and S. Gigan,

Large-Scale Optical Reservoir Computing for Spatiotempo-

ral Chaotic Systems Prediction, Phys. Rev. X 10, 041037

(2020).


[9] H. Fan, J. Jiang, C. Zhang, X. Wang, and Y.-C. Lai, Long-term

prediction of chaotic systems with machine learning, Phys. Rev.

Res. 2, 012080(R) (2020).

[10] G. A. Gottwald and S. Reich, Combining machine learning

and data assimilation to forecast dynamical systems from noisy

partial observations, Chaos 31, 101103 (2021).

[11] Y. Zhong, J. Tang, X. Li, B. Gao, H. Qian, and H. Wu, Dy-

namic memristor-based reservoir computing for high-efﬁciency

temporal signal processing, Nat. Commun. 12, 408 (2021).

[12] K. Nakajima and I. Fischer, Reservoir Computing (Springer, New York, 2021).

[13] J. Pathak, B. Hunt, M. Girvan, Z. Lu, and E. Ott, Model-Free

Prediction of Large Spatiotemporally Chaotic Systems from

Data: A Reservoir Computing Approach, Phys. Rev. Lett. 120,

024102 (2018).

[14] Z. Lu, B. R. Hunt, and E. Ott, Attractor reconstruction by

machine learning, Chaos 28, 061104 (2018).

[15] L. Grigoryeva, A. Hart, and J.-P. Ortega, Learning strange at-

tractors with reservoir systems, Nonlinearity 36, 4674 (2023).

[16] J. Pathak, Z. Lu, B. R. Hunt, M. Girvan, and E. Ott, Using

machine learning to replicate chaotic attractors and calculate

Lyapunov exponents from data, Chaos 27, 121102 (2017).

[17] J. Z. Kim, Z. Lu, E. Nozari, G. J. Pappas, and D. S. Bassett,

Teaching recurrent neural networks to infer global temporal

structure from local examples, Nat. Mach. Intell. 3, 316 (2021).

[18] A. Röhm, D. J. Gauthier, and I. Fischer, Model-free inference

of unseen attractors: Reconstructing phase space features from

a single noisy trajectory using reservoir computing, Chaos 31,

103127 (2021).

[19] M. Roy, S. Mandal, C. Hens, A. Prasad, N. Kuznetsov, and

M. D. Shrimali, Model-free prediction of multistability using

echo state network, Chaos 32, 101104 (2022).

[20] T. Arcomano, I. Szunyogh, A. Wikner, J. Pathak, B. R. Hunt,

and E. Ott, A hybrid approach to atmospheric modeling that

combines machine learning with a physics-based numerical

model, J. Adv. Model. Earth Syst. 14, e2021MS002712 (2022).

[21] P. Antonik, M. Gulina, J. Pauwels, and S. Massar, Using a

reservoir computer to learn chaotic attractors, with applications

to chaos synchronization and cryptography, Phys. Rev. E 98,

012215 (2018).

[22] T. Weng, H. Yang, C. Gu, J. Zhang, and M. Small, Synchro-

nization of chaotic systems and their machine-learning models,

Phys. Rev. E 99, 042203 (2019).

[23] H. Fan, L.-W. Kong, Y.-C. Lai, and X. Wang, Anticipating syn-

chronization with machine learning, Phys. Rev. Res. 3, 023237

(2021).

[24] L.-W. Kong, H.-W. Fan, C. Grebogi, and Y.-C. Lai, Machine

learning prediction of critical transition and system collapse,

Phys. Rev. Res. 3, 013090 (2021).

[25] D. Patel and E. Ott, Using machine learning to anticipate

tipping points and extrapolate to post-tipping dynamics of non-

stationary dynamical systems, Chaos 33, 023143 (2023).

[26] A. Banerjee, J. D. Hart, R. Roy, and E. Ott, Machine Learning

Link Inference of Noisy Delay-Coupled Networks with Opto-

electronic Experimental Tests, Phys. Rev. X 11, 031014 (2021).

[27] T. L. Carroll and L. M. Pecora, Network structure effects in

reservoir computers, Chaos 29, 083130 (2019).

[28] J. Jiang and Y.-C. Lai, Model-free prediction of spatiotemporal

dynamical systems with recurrent neural networks: Role of

network spectral radius, Phys. Rev. Res. 1, 033056 (2019).

[29] L. Gonon and J.-P. Ortega, Reservoir computing universality

with stochastic inputs, IEEE Trans. Neural Netw. Learning Syst.

31, 100 (2019).

[30] A. Grifﬁth, A. Pomerance, and D. J. Gauthier, Forecasting

chaotic systems with very low connectivity reservoir computers,

Chaos 29, 123108 (2019).

[31] T. L. Carroll, Do reservoir computers work best at the edge of

chaos?, Chaos 30, 121109 (2020).

[32] R. Pyle, N. Jovanovic, D. Subramanian, K. V. Palem, and A. B.

Patel, Domain-driven models yield better predictions at lower

cost than reservoir computers in Lorenz systems, Philos. Trans.

R. Soc. A 379, 20200246 (2021).

[33] A. G. Hart, J. L. Hook, and J. H. Dawes, Echo state networks

trained by Tikhonov least squares are L2(μ) approximators of

ergodic dynamical systems, Physica D 421, 132882 (2021).

[34] J. A. Platt, A. Wong, R. Clark, S. G. Penny, and H. D.

Abarbanel, Robust forecasting using predictive generalized syn-

chronization in reservoir computing, Chaos 31, 123118 (2021).

[35] A. Flynn, V. A. Tsachouridis, and A. Amann, Multifunctionality

in a reservoir computer, Chaos 31, 013125 (2021).

[36] T. L. Carroll, Optimizing memory in reservoir computers,

Chaos 32, 023123 (2022).

[37] J. Pathak, A. Wikner, R. Fussell, S. Chandra, B. R. Hunt, M.

Girvan, and E. Ott, Hybrid forecasting of chaotic processes:

Using machine learning in conjunction with a knowledge-based

model, Chaos 28, 041101 (2018).

[38] A. Wikner, J. Pathak, B. Hunt, M. Girvan, T. Arcomano, I.

Szunyogh, A. Pomerance, and E. Ott, Combining machine

learning with knowledge-based modeling for scalable forecast-

ing and subgrid-scale closure of large, complex, spatiotemporal

systems, Chaos 30, 053111 (2020).

[39] K. Srinivasan, N. Coble, J. Hamlin, T. Antonsen, E. Ott, and M.

Girvan, Parallel Machine Learning for Forecasting the Dynam-

ics of Complex Networks, Phys. Rev. Lett. 128, 164101 (2022).

[40] W. A. S. Barbosa, A. Grifﬁth, G. E. Rowlands, L. C. G.

Govia, G. J. Ribeill, M.-H. Nguyen, T. A. Ohki, and D. J.

Gauthier, Symmetry-aware reservoir computing, Phys. Rev. E

104, 045307 (2021).

[41] D. J. Gauthier, E. Bollt, A. Grifﬁth, and W. A. Barbosa,

Next generation reservoir computing, Nat. Commun. 12, 5564

(2021).

[42] E. Bollt, On explaining the surprising success of reservoir

computing forecaster of chaos? The universal machine learning

dynamical system with contrast to VAR and DMD, Chaos 31,

013108 (2021).

[43] D. J. Gauthier, I. Fischer, and A. Röhm, Learning unseen coex-

isting attractors, Chaos 32, 113107 (2022).

[44] J. J. Hopﬁeld, Neural networks and physical systems with emer-

gent collective computational abilities, Proc. Natl. Acad. Sci.

USA 79, 2554 (1982).

[45] H. Li, Z. Xu, G. Taylor, C. Studer, and T. Goldstein, Visualiz-

ing the loss landscape of neural nets, in Advances in Neural

Information Processing Systems (NeurIPS), Vol. 31 (Curran

Associates, Inc., 2018).

[46] A. E. Teschendorff and A. P. Feinberg, Statistical mechan-

ics meets single-cell biology, Nat. Rev. Genet. 22, 459

(2021).

[47] D. A. Rand, A. Raju, M. Sáez, F. Corson, and E. D. Siggia,

Geometry of gene regulatory dynamics, Proc. Natl. Acad. Sci.

USA 118, e2109729118 (2021).


[48] G. Schiebinger, J. Shu, M. Tabaka, B. Cleary, V. Subramanian,

A. Solomon, J. Gould, S. Liu, S. Lin, P. Berube et al.,

Optimal-transport analysis of single-cell gene expression iden-

tiﬁes developmental trajectories in reprogramming, Cell 176,

928 (2019).

[49] M. Sáez, R. Blassberg, E. Camacho-Aguilar, E. D. Siggia, D. A.

Rand, and J. Briscoe, Statistically derived geometrical land-

scapes capture principles of decision-making dynamics during

cell fate transitions, Cell Syst. 13, 12 (2022).

[50] P. J. Menck, J. Heitzig, N. Marwan, and J. Kurths, How basin

stability complements the linear-stability paradigm, Nat. Phys.

9, 89 (2013).

[51] P. J. Menck, J. Heitzig, J. Kurths, and H. Joachim Schellnhuber,

How dead ends undermine power grid stability, Nat. Commun.

5, 3969 (2014).

[52] Note that the warm-up time series is different from the training

data and is only used after training has been completed.

[53] A. E. Motter, M. Gruiz, G. Károlyi, and T. Tél, Doubly Tran-

sient Chaos: Generic Form of Chaos in Autonomous Dissipative

Systems, Phys. Rev. Lett. 111, 194101 (2013).

[54] M. Lukoševičius, A practical guide to applying echo state networks, in Neural Networks: Tricks of the Trade (Springer, Berlin, 2012), pp. 659–686.

[55] S. A. Billings, Nonlinear System Identiﬁcation: NARMAX Meth-

ods in the Time, Frequency, and Spatio-Temporal Domains (John

Wiley & Sons, Hoboken, NJ, 2013).

[56] L. Jaurigue and K. Lüdge, Connecting reservoir computing with

statistical forecasting and deep neural networks, Nat. Commun.

13, 227 (2022).

[57] S. L. Brunton, J. L. Proctor, and J. N. Kutz, Discovering gov-

erning equations from data by sparse identiﬁcation of nonlinear

dynamical systems, Proc. Natl. Acad. Sci. USA 113, 3932

(2016).

[58] A. Rahimi and B. Recht, Random features for large-scale

kernel machines, in Advances in Neural Information Pro-

cessing Systems (NeurIPS), Vol. 20 (Curran Associates, Inc.,

2007).

[59] S. Shahi, F. H. Fenton, and E. M. Cherry, Prediction of chaotic

time series using recurrent neural networks and reservoir com-

puting techniques: A comparative study, Mach. Learn. Appl. 8,

100300 (2022).

[60] J. C. Butcher, Numerical Methods for Ordinary Differential

Equations (John Wiley & Sons, Hoboken, NJ, 2016).

[61] We did not observe any attractors other than the three ground-truth fixed points and infinity for all NGRC models considered. The absence of more complicated attractors (compared

to RC) is likely due to the simpler architecture of NGRC and

the dissipativity of