Understanding latent timescales in neural ordinary differential equation
models of advection-dominated dynamical systems
Ashish S. Nair a,b, Shivam Barwey a,∗, Pinaki Pal a, Jonathan F. MacArt b, Troy Arcomano a, Romit Maulik a,c
a Argonne National Laboratory, 9700 South Cass Avenue, Lemont, 60439, IL, USA
b University of Notre Dame, Holy Cross Dr, Notre Dame, 46556, IN, USA
c Pennsylvania State University, E327 Westgate Building, University Park, 16802, PA, USA
∗ Corresponding author. E-mail address: sbarwey@anl.gov (S. Barwey).
A R T I C L E I N F O
Communicated by Victor M. Perez-Garcia
Keywords:
Data-driven modeling
Neural ODEs
Time-scales
Advection-dominated dynamical systems
Detonations
Atmospheric flow
A B S T R A C T
The neural ordinary differential equation (ODE) framework has shown considerable promise in recent years
in developing highly accelerated surrogate models for complex physical systems characterized by partial
differential equations (PDEs). For PDE-based systems, state-of-the-art neural ODE strategies leverage a two-step
procedure to achieve this acceleration: a nonlinear dimensionality reduction step provided by an autoencoder,
and a time integration step provided by a neural-network based model for the resultant latent space dynamics
(the neural ODE). This work explores the applicability of such autoencoder-based neural ODE strategies for
PDEs in which advection terms play a critical role. More specifically, alongside predictive demonstrations,
physical insight into the sources of model acceleration (i.e., how the neural ODE achieves its acceleration) is
the scope of the current study. Such investigations are performed by quantifying the effects of both autoencoder
and neural ODE components on latent system time-scales using eigenvalue analysis of dynamical system
Jacobians. To this end, the sensitivity of various critical training parameters (de-coupled versus end-to-end training, latent space dimensionality, and the role of training trajectory length, for example) to both model
accuracy and the discovered latent system timescales is quantified. This work specifically uncovers the key role
played by the training trajectory length (the number of rollout steps in the loss function during training) on
the latent system timescales: larger trajectory lengths correlate with an increase in limiting neural ODE time-
scales, and optimal neural ODEs are found to recover the largest time-scales of the full-order (ground-truth)
system. Demonstrations are performed across fundamentally different unsteady fluid dynamics configurations
influenced by advection: (1) the Kuramoto–Sivashinsky equations, (2) Hydrogen-Air channel detonations (the
compressible reacting Navier–Stokes equations with detailed chemistry), and (3) 2D Atmospheric flow.
1. Introduction
Numerical solutions of partial differential equations (PDEs), if avail-
able, can be used by domain scientists to not only probe complex
physical behavior at unprecedented levels of accuracy and detail, but
also to accelerate design optimization workflows for engineering de-
vices. Simulating fluid dynamics, for example, requires numerically
solving the Navier–Stokes PDEs [1]. At industrial operating conditions of interest (those that characterize flows over aircraft wings, in scramjets, and in gas turbine combustion chambers, for instance), these equations admit multi-scale and multi-physics behavior stem-
ming from interactions between turbulence, shock waves, and potential
chemical reactions. As a result, to generate reliable simulations of
such devices, PDE solution procedures need to resolve all length-scales
and time-scales that characterize these physical phenomena. These
spatiotemporal resolution requirements render (a) long-time direct nu-
merical simulation (DNS) of the aforementioned systems infeasible, and
(b) real-time simulation-based actuation or control strategies impracti-
cal, despite recent advances in supercomputing technology and physics
simulation hardware [2,3].
Reduced-order models (ROMs) are a class of modeling approaches
that seek to eliminate intrinsic costs associated with physics-based
simulations [4], with the goal of enabling long-time simulation ca-
pability. The general ROM objective is to achieve drastic reduction
in the PDE-derived dynamical system, which is the high-dimensional
nonlinear set of ordinary differential equations (ODEs) produced from
PDE discretization. In the context of fluid flows, this dynamical system
describes the evolution of a so-called state vector composed of turbulent
fluid density, momentum, and species concentration fields on a grid,
and can readily reach on the order of hundreds of millions (and even
billions) of degrees-of-freedom. As such, to achieve model-based reduc-
tion, ROMs typically leverage a two-step approach: an offline projection
step which generates the reduced representation of the state vector,
and an online forecasting step which generates the time-evolution of
the reduced state. Within this scope, both physics-based and data-based
ROMs can be constructed.
Physics-based ROMs for fluid flows typically leverage linear projec-
tion operations to resolve only the large scales, which are assumed to
contain a majority of the system energy; the effect of the unresolved
(small) scales on the resolved dynamics is then modeled. A classic ex-
ample is large-eddy simulation (LES) [5,6], where the projection oper-
ation (the mechanism for dimensionality reduction) is a non-invertible
spatial filter [7], and the forecasting step requires solving a filtered
version of the Navier–Stokes equations. The LES closure model which
captures the effects of the unresolved (small) scales on the resolved
dynamics can come from either phenomenological algebraic rela-
tions [8] or statistical closures [7,9]. Other physics-based ROMs achieve
dimensionality reduction in different ways, such as through inertial
manifold assumptions [10,11], Koopman operator theory [12], and
more intrusive alterations to the governing PDEs (e.g., flamelet models
used in combustion modeling [13] and two-dimensional turbulence
used in climate modeling [14]).
Data-based ROMs, on the other hand, rely on samples of the full-
dimensional state vector (i.e., fluid flow snapshots) to produce pro-
jection operators in an optimization (or training) step. Although this
incurs a large offline computational cost not present in physics-based
counterparts, these methods have been shown in recent years to pro-
duce significantly larger levels of dimensionality reduction, such that
this training cost can be offset by immense levels of speedup achieved
during the forecasting stage [2,15,16]. In the context of data-based
ROMs, the reduced space is termed the latent space. The initial projec-
tion step produces the (reduced) latent variables, and the forecasting
step requires modeling the dynamics of these latent variables.
Data-based ROMs have a history spanning several decades, and their
ability to capture complex physics contained in training datasets has led
to their increased adoption. Methods rooted in modal decomposition (including proper orthogonal decomposition (POD) [17], dynamic mode decomposition (DMD) [18], resolvent analysis [19], and cluster-based methods [20], among others) derive basis functions directly from
data, which translates to latent spaces generated by linear projection
operations. The properties of the resultant latent variables naturally
vary based on the method used to produce the basis. In POD, a
basis is produced from eigenvectors of the covariance matrix of the
data, resulting in latent variables that are disentangled and optimally
preserve the system energy [17]. In DMD and resolvent analysis, the
linear projection generates latent variables described by character-
istic frequencies, similar to traditional Fourier-based methods [18].
Cluster-based methods use data partitioning strategies to produce latent
variables that are symbolic encodings of the system state [21,22]. The
evolution of latent variables in all of these methods can be modeled
in a physics-derived manner through Galerkin projection of the basis
onto the underlying PDEs [23], or in a data-based manner through the
utilization of machine learning methods [24].
All of the methods discussed above leverage linear projection op-
erations to achieve reduction. These methods have been successfully
used to produce ROMs for diffusion-dominated problems, such as tur-
bulent flows in canonical configurations and simpler PDEs without
advection terms, but face difficulties for advection-dominated prob-
lems (e.g., high Reynolds number turbulent flows and shock-containing
flows) characterized by a slowly decaying Kolmogorov n-width [25,26]. The utilization of data-based ROMs relying on nonlinear projection operations has been shown to overcome these limitations, extending both the compression and forecasting capabilities of linear approaches [27–29]. The backbone of these methods is the autoencoder, a compression
approach that leverages the expressive power of neural networks to
generate robust latent space representations [30]. Autoencoders rely on
two components: the encoder, which serves as the nonlinear projection
that moves the high-dimensional state into its latent representation, and
the decoder, which seeks to undo the action of the encoder by recov-
ering the full state from the latent variables. Due to the generalizable
nature of neural networks, autoencoders can take many forms tailored
to the application at hand: for example, architectures can leverage
multi-layer perceptrons [31], convolutional neural networks [27,28],
graph neural networks [32,33], and transformers [34].
The success of neural network based autoencoders has driven efforts
to create purely data-based predictive models in latent space that offer
unprecedented levels of speedup over physics-based counterparts [35].
A core advantage is that such models can leverage real-world obser-
vations (e.g., from experimental diagnostics and operational sensor
streams), which is critical for applications that are too expensive to
simulate directly or do not have a solidified set of governing equations
[27,36–40]. The goal of these forecasting models is to operate in
concert with the autoencoder by using data to learn the dynamics of
the latent variables. The latent dynamical systems can be modeled
using nonlinear [41,42] or linear [21,31] surrogate forecasting mod-
els; such models have been used to accelerate advection dominated
fluid flow [27], chaotic dynamical systems [31], and stiff chemical
kinetic simulations [43]. These methods, however (e.g., those based on strategies like recurrent neural networks (RNNs) [27], residual networks (ResNets) [44], and latent Koopman representations [45]), typically learn the latent dynamics in the context of discrete and explicit temporal discretization, which can limit predictive capability.
The goal of this study is to develop surrogate models for advection-
dominated systems in the neural ODE framework [46], which offers
unique advantages over the above mentioned approaches. More specif-
ically, the objective here is twofold: (1) to develop a neural ODE
based latent dynamics model for advection-dominated problems in a
purely data-based setting, and (2) to conduct a detailed analysis of the
discovered dynamics in latent space using timescale characterizations.
Before outlining the specific contributions of this work, the neural ODE
strategy as it relates to these objectives is first summarized.
Since its introduction in Ref. [46], the neural ODE strategy has
been cemented as a powerful scientific machine learning tool to model
the evolution of dynamical systems using neural networks. Instead of
directly enforcing discrete temporal representations (e.g., as used in
residual networks [44] or recurrent neural networks [42]), neural ODEs
learn a continuous representation of nonlinear system dynamics. In
other words, the instantaneous right-hand-side is modeled as a nonlinear
function via a neural network, allowing the framework to leverage
existing, vetted time-integration schemes to execute the forecasting
step (the time integration scheme is separated from the dynamics
model) [4751]. A prior version of neural ODEs, integrating feedfor-
ward neural networks with numerical methods for dynamical system
identification, used a Runge–Kutta neural network framework [52].
With autoencoder-provided latent spaces, the neural ODE conveniently
outputs a functional form for the instantaneous rate-of-change of the la-
tent variables, which is useful for reduced-order modeling applications.
This autoencoder-based neural ODE strategy has been used to develop
accelerated surrogate models in a variety of physical applications,
including chaotic fluid flows [53], advection-dominated PDEs [54,55],
and stiff chemical kinetics [56,57]. Moreover, neural ODE surrogates
can be effectively combined with neural controllers to tackle complex
control problems, as demonstrated in recent works on medical digital
twins and optimal control of dynamical systems [58–60].
Ultimately, previous work has shown how the combination of au-
toencoders with neural ODEs can be used to generate highly accelerated
reduced-order models of physical systems. Despite this, the source of
acceleration in the overall modeling strategy remains unclear: the goal
of achieving model accuracy from both forecasting and autoencoding
perspectives often overshadows the need to identify the contribution
of each of these components to the empirically observed model ac-
celeration. Identifying the sources of acceleration provided by the neural ODE based ROM can lead to valuable physical and model-oriented insights, and is the scope of the current study. Recent work
has observed the effect of smoothed latent trajectories for advection-
dominated systems produced by neural ODE simulations, pointing to a
relationship between model acceleration and intrinsic timescale elim-
ination [55]. Similar trends have been shown for neural ODE based
surrogate models for stiff chemical kinetics [56]. As such, a more
rigorous and quantitative analysis of timescale elimination produced
by both autoencoder and neural ODE components is warranted. Addi-
tionally, the effect of critical neural ODE training parameters (such as the overall integration time used to evaluate model errors) on the accuracy and degree of timescale elimination produced in the latent
space has been largely unexplored in the literature. Lastly, although
previous work has demonstrated application of neural ODE based sur-
rogate models for simplified advection-dominated PDEs (e.g., Burgers'
equation), extension of this strategy to more complex shock-containing
flows remains sparsely explored. To this end, the main contributions of
this work are as follows:
• Use eigenvalue analysis to quantify the fastest and slowest time-scales in the latent space.
• Evaluate the effects of training methodologies, network architecture hyperparameters, and training trajectory length ($n_t$) on accuracy and time-scale reduction in the latent space, highlighting that $n_t$ is the critical parameter controlling the extent of time-scale reduction in the latent space.
• Extend the proposed framework to a challenging and highly advection-dominated test case, specifically 1D detonation wave propagation (considering stiff chemical kinetics), and to a real-world 2D atmospheric dataset, to validate and affirm the observed trends.
It is emphasized that while the approach of combining autoen-
coders for spatial compression with neural ODEs for learning latent
dynamics has been previously explored, this work leverages the general
autoencoder-based neural ODE strategy to facilitate a latent time-scale
analysis through utilization of dynamical system Jacobians. Specifi-
cally, we investigate how parameters that are often selected heuristically (the autoencoder architecture, the latent space dimensionality, and, most critically, the training trajectory length $n_t$) impact the accuracy and
computational speedups achievable in the latent space. Through three
distinct test cases, it is demonstrated how 𝑛𝑡 has a notable influence on
the smoothness of latent space trajectories, enabling the use of larger
time-steps in the latent space.
The remainder of the paper is organized as follows. Section 2
provides a description of the general autoencoder+neural ODE frame-
work applied to a surrogate modeling task, including a distinction
between different training methodologies and the proposed frame-
work for time-scale analysis. The application of this framework to the
Kuramoto–Sivashinsky (KS) equations is demonstrated in Section 3.1,
with an analysis of the effect of network hyperparameters presented in
Section 3.1.1. Section 3.2 showcases the application of the framework
to a highly advection-dominated test case of 1D detonation wave
propagation.
2. Methodology
2.1. Surrogate modeling task
The application scope of this work is tied to accelerating simulation of physical systems influenced by advection (particularly those governed by fluid dynamics) using data-based surrogate models.
Such systems can be mathematically described by partial differential
equations (PDEs), where a general-form PDE is given by
$$\frac{\partial \mathcal{U}}{\partial t} = \mathcal{N}(\mathcal{U}) + \mathcal{L}(\mathcal{U}) + \mathcal{S}(\mathcal{U}). \tag{1}$$
The above equation describes the evolution of a vector of state variables, denoted $\mathcal{U} = [u_1, u_2, \ldots, u_{N_e}]$, where $N_e$ is the number of space- and time-dependent transport variables. The functions $\mathcal{N}$, $\mathcal{L}$, and $\mathcal{S}$ represent non-linear, linear, and volumetric source term operators, respectively. Demonstrations in this work leverage data produced by one-dimensional numerical solutions of two model PDEs for fluid dynamics in which advection, through the nonlinear operator $\mathcal{N}$, plays a critical role. The first is the Kuramoto–Sivashinsky (KS) equation (Section 3.1), where the state vector is interpreted as a velocity magnitude ($N_e = 1$). Here, the operator $\mathcal{N}$ represents an advection term, $\mathcal{L}$ a diffusive term, and the volumetric source term is omitted. The second is the compressible reacting Navier–Stokes (NS) equations (Section 3.2), where the state vector is higher dimensional, consisting of fluid density, velocity, chemical energy, and species mass fractions. The operators $\mathcal{N}$ and $\mathcal{L}$ are physically comparable with those used in the KS equation (i.e., they capture the effects of advection and diffusion, respectively), and the volumetric source term is retained, as it represents the effect of chemical reactions on the flow field.
Regardless of the configuration, numerical solutions of Eq. (1) are
obtained through a method-of-lines approach, which relies on spatial
discretization onto a finite-dimensional grid. Here, the time-evolution of $\mathcal{U}$ sampled at all spatial discretization points (denoted as $\mathbf{u} \in \mathbb{R}^{N_u}$, where $N_u$ is the full-system dimensionality, computed as the number of grid points multiplied by the number of transport variables $N_e$) is provided by the solution to a deterministic and high-dimensional ordinary differential equation (ODE)
$$\frac{d\mathbf{u}(t)}{dt} = \mathbf{F}(\mathbf{u}(t)), \quad \mathbf{u}(t=0) = \mathbf{u}_0. \tag{2}$$
In Eq. (2), $\mathbf{F}(\mathbf{u})$ captures the instantaneous discretized system dynamics (a discrete representation of the operators in Eq. (1)) and, for the PDEs described above, represents complex interactions between advection, diffusion, and (if present) reaction contributions.
condition 𝐮0, Eq. (2) can be solved to some final integration time as a
system of ODEs using a proper time-integration scheme. Solution of the
ODE for a given initial condition generates a trajectory of time-ordered
snapshots of the high-dimensional system state variable, which serve
as the training data for the data-driven modeling strategies described
in the sections below. Note that this data can be produced either from
explicit solutions of Eq. (2) if the analytic PDE form is known (using
time-integration schemes), or through real-world observations of the
system in question if the PDEs are unknown or intractable to solve
accurately (e.g., using high-speed or laser-based imaging tools).
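To make the data-generation step concrete, the following minimal sketch produces one training trajectory by integrating Eq. (2) with an explicit Euler scheme; the right-hand side F and initial condition u0 are problem-specific placeholders, and the fixed-step Euler choice is illustrative rather than prescriptive.

```python
import numpy as np

def generate_trajectory(F, u0, dt, n_steps):
    """Integrate du/dt = F(u) (Eq. (2)) with explicit Euler and return
    time-ordered snapshots for training. F and u0 are placeholders for
    a problem-specific right-hand side and initial condition."""
    snapshots = [u0.copy()]
    u = u0.copy()
    for _ in range(n_steps):
        u = u + dt * F(u)           # explicit Euler update
        snapshots.append(u.copy())
    return np.stack(snapshots)      # shape: (n_steps + 1, N_u)
```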
The ODE in Eq. (2) is interpreted here as a ‘‘ground-truth’’ rep-
resentation of the continuous PDE counterpart, meaning the grid is
assumed to be resolved enough to properly capture the contribution
of all spatiotemporal scales in the physical operators, resulting in a
high-dimensional state vector 𝐮 that corresponds to a physical space
representation of the state. The motivation for surrogate modeling is
that the ground-truth ODE in Eq. (2) is computationally expensive to
solve for realistic applications that require fast (near real-time) flow
field predictions, such as full-scale design optimization [6163] or
model-predictive control [64,65].
As such, the modeling goal is to identify an alternative surrogate
ODE representation that expedites the solution to the ground-truth ODE
without sacrificing predictive accuracy. The surrogate ODE is denoted
$$\frac{d\mathbf{w}(t)}{dt} = \mathbf{G}(\mathbf{w}(t)), \quad \mathbf{w}(t_0) = \phi(\mathbf{u}(t_0)). \tag{3}$$
Instead of the full state variable $\mathbf{u}$, the surrogate ODE in Eq. (3) models the dynamics of a so-called latent state vector $\mathbf{w} \in \mathbb{R}^{N_w}$, where $N_w \ll N_u$. The formulation of Eq. (3) highlights two key ingredients
required to construct a data-based surrogate model, as outlined in
Fig. 1. Combined autoencoder and neural ODE framework. Latent dynamics are modeled by a neural ODE, with movement between latent and physical spaces facilitated by an
autoencoder. Neural ODE details are provided in Section 2.2, and autoencoder details are provided in Section 2.3.
Section 1: (1) the function 𝜙, which is an instantaneous mapping
function that transforms the original (physical space) state variable
at a given time instant to its corresponding latent representation in a
reduced space, and (2) the function 𝐆, which is the latent dynamical
system (it provides the dynamics in latent space). In this study, neural
networks are used to provide functional forms for both components,
the parameters of which are recovered in a training stage using an
ensemble of trajectories from pre-computed solutions of the ground-
truth ODE. More specifically, a neural ODE strategy is used to model
the latent dynamics (described in Section 2.2), while a convolutional
autoencoder strategy is used to generate mappings to and from the
reduced latent space (described in Section 2.3). The overall approach
is shown in Fig. 1.
It should be noted that accelerated evaluations over the full sys-
tem using a combined autoencoder and neural ODE strategy can be
achieved through both a reduction in system dimensionality through
the latent variables (the surrogate ODE operates on a lower-
dimensional representation, which facilitates faster evaluations of in-
stantaneous rates), and also an increase in minimum system timescales
in the latent space inherent to the functional form of 𝐆 [56] (larger
limiting timescales in 𝐆 imply elimination of prohibitive timescales
in 𝐅, which in turn allows for larger time-steps to be utilized in the
simulation procedure). More specifically, previous work has empir-
ically observed how employing a convolutional autoencoder yields
smooth trajectories in the latent space for chaotic PDE-based sys-
tems [55]. To this end, within the scope of data-based surrogate
modeling, the primary emphasis of this work is to (a) provide a rigorous
quantitative analysis on how timescale elimination is achieved in the
latent space, (b) identify the role played by key neural ODE training
hyper-parameters on this timescale elimination, and (c) understand
the manner in which coupling between autoencoder and neural ODE
components contributes to forecasting accuracy and latent timescale
sensitivity.
Methodology for the components that facilitate this study is pro-
vided below. This includes details on neural ODE based modeling
(Section 2.2), integrating autoencoders into the neural ODE framework
(Section 2.3), description of training strategies (Section 2.4), model
evaluation metrics (Section 2.5), and the extraction procedure for
dynamical system timescales (Section 2.6).
2.2. Neural ODEs for latent dynamics
It is assumed that ‘‘ground-truth’’ data represented by an ensemble
of latent space trajectories generated by application of an autoencoder
to a corresponding set of full-order system trajectories (to be described
in Section 2.4) are available. These trajectories are assumed to be
sampled at a fixed discrete time interval 𝛥𝑡. As a result, one such
ground-truth latent trajectory is given by the time-ordered set
$$\mathcal{T}_{w,i} = [\mathbf{w}(t_0), \mathbf{w}(t_0 + \Delta t), \mathbf{w}(t_0 + 2\Delta t), \ldots, \mathbf{w}(t_0 + n_t \Delta t)], \quad i = 1, \ldots, N_T, \tag{4}$$
where 𝑛𝑡 corresponds to the so-called training trajectory length, and 𝑁𝑇
represents the total number of ground-truth latent trajectories obtained
at a respective 𝑛𝑡. In this work, the ground-truth latent trajectories in
Eq. (4) are obtained from the action of a convolutional neural net-
work (CNN) based encoder on the corresponding full-order trajectory
(i.e., $\mathcal{T}_{w,i} = \mathrm{Encode}(\mathcal{T}_{u,i})$, where $\mathcal{T}_{u,i}$ contains the full-order snapshots).
Neural ODEs leverage the above trajectory data to learn a
continuous-time model for the unknown latent dynamical system that
governs the evolution of 𝐰 [46]. The starting point is to cast the
functional form for the latent dynamics (the right-hand-side of Eq. (3)) as a neural network, resulting in the neural ODE
$$\frac{d\widetilde{\mathbf{w}}(t)}{dt} = \mathcal{G}(\widetilde{\mathbf{w}}(t); \theta), \quad \widetilde{\mathbf{w}}(t_0) = \mathbf{w}(t_0). \tag{5}$$
In Eq. (5), $\mathcal{G}$ is a neural network characterized by parameter set $\theta$. Fig. 2 (left) provides a description of the architecture of $\mathcal{G}$ used in this work. Given an initial condition $\mathbf{w}_0$, a black-box time integrator can be used to find the latent state $\widetilde{\mathbf{w}}(t)$ at any time $t$. In this work, an explicit Euler time-integration scheme with a constant time-step ($\Delta t$) is used. As such, given the above neural ODE, a predicted latent trajectory analogous to the ground-truth trajectory of Eq. (4), starting at the same initial condition, can be represented as
$$\widetilde{\mathcal{T}}_{w,i} = [\widetilde{\mathbf{w}}(t_0), \widetilde{\mathbf{w}}(t_0 + \Delta t), \widetilde{\mathbf{w}}(t_0 + 2\Delta t), \ldots, \widetilde{\mathbf{w}}(t_0 + n_t \Delta t)], \quad i = 1, \ldots, N_T. \tag{6}$$
To optimize the parameters $\theta$ of the neural network $\mathcal{G}$, an objective function representing a mean-squared error between the ground-truth latent trajectories $\mathcal{T}_{w,i}$ and the predicted latent trajectories $\widetilde{\mathcal{T}}_{w,i}$ is minimized in a training stage. This objective function is given by
$$\mathcal{L}_{\mathrm{NODE}} = \frac{1}{n_t} \sum_{j=1}^{n_t} \left\langle \left\| \mathbf{w}(t_0 + j\Delta t) - \widetilde{\mathbf{w}}(t_0 + j\Delta t) \right\|_2^2 \right\rangle, \tag{7}$$
where the angled brackets $\langle \cdot \rangle$ represent an average over a batch of
training set trajectories. The formulation in Eq. (7) reveals the primary
advantage of the neural ODE formulation, in that the training approach
is dynamics-informed: the latent dynamical system is trained to mini-
mize accumulated error in the latent trajectory over all 𝑛𝑡 time-steps. As
such, it must be emphasized that $n_t$, the training trajectory length, is a critical training parameter that captures the amount of dynamical information used to construct the latent dynamical system, and is typically chosen and fixed a-priori. The implication is that the training procedure
can be executed with different 𝑛𝑡 values from the same time-ordered
data, resulting in neural ODEs with different predictive capabilities and
stability properties. To illustrate this concept, the schematic in Fig. 3
shows how a single large ground-truth latent trajectory can be split
into 𝑁𝑇 training trajectories, each of length 𝑛𝑡. A primary goal of this
work is to rigorously study the effect of 𝑛𝑡 on both predictive accuracy
and latent time-scales, in addition to proposing a pathway to choose an
optimal value of 𝑛𝑡 based on the underlying physics of the problem.
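As an illustration of the windowing shown in Fig. 3, the following sketch splits one long latent trajectory into $N_T$ training trajectories of length $n_t$; the function name and array layout are illustrative, not taken from the paper's code.

```python
import numpy as np

def split_trajectory(w, n_t):
    """Split one long latent trajectory w of shape (T, N_w) into
    N_T = (T - 1) // n_t windows of n_t + 1 snapshots each (an initial
    condition plus n_t rollout targets), following Fig. 3."""
    n_traj = (w.shape[0] - 1) // n_t
    return np.stack([w[i * n_t : i * n_t + n_t + 1] for i in range(n_traj)])
```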
It should be noted that in the training stage, in order to calculate the derivatives necessary for optimization ($\partial \mathcal{L}_{\mathrm{NODE}} / \partial \theta$), two approaches can be used: one involves storing the entire rollout graph of the integrated ODE trajectory and propagating gradients backward through it, while the other entails solving a system of ODEs known as the adjoint equations backward in time. In this study, the latter method is used to ensure memory efficiency when dealing with long training trajectory lengths. This work leverages the torchdiffeq library [67] for neural ODE training routines.
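A minimal sketch of the rollout loss of Eq. (7) using torchdiffeq is given below; it assumes a latent dynamics module g with the f(t, w) signature the library expects, and the batch layout and function name are illustrative.

```python
import torch
from torchdiffeq import odeint_adjoint as odeint  # adjoint-based gradients [67]

def node_loss(g, w0, w_true, dt):
    """Rollout loss of Eq. (7). g is an nn.Module latent dynamics model;
    w0 has shape (batch, N_w) and w_true has shape (n_t + 1, batch, N_w)
    of encoded ground-truth snapshots."""
    n_t = w_true.shape[0] - 1
    t = torch.arange(n_t + 1, dtype=torch.float32) * dt
    w_pred = odeint(g, w0, t, method="euler")   # explicit Euler rollout
    # mean squared error accumulated over the n_t rolled-out steps (Eq. (7))
    return ((w_pred[1:] - w_true[1:]) ** 2).sum(-1).mean()
```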
Fig. 2. (a) Schematic of neural ODE operation. The architecture of $\mathcal{G}$ is a feed-forward neural network with four hidden layers and ELU activation functions; each hidden layer contains 120 neurons. (b) Schematic of encoder operation. The encoder contains a sequence of 1D convolutional layers with batch normalization and ELU activation functions, in which the physical space input $\mathbf{u}(t)$ is progressively down-sampled in space. Each stage down-samples the spatial component by a factor of two while doubling the number of channels, until a flattening operation and linear layer produce the latent space projection $\mathbf{w}(t)$. Although not shown, the decoder $\psi$ is a mirrored version of the encoder, with transpose convolution layers replacing convolution layers. Both architectures are implemented in PyTorch [66].
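The following PyTorch sketch mirrors the architectures described in Fig. 2; the exact kernel sizes, strides, and class names are not reported in the caption and are chosen here purely for illustration.

```python
import torch.nn as nn

class LatentDynamics(nn.Module):
    """Neural ODE right-hand side: four hidden layers of 120 ELU units (Fig. 2a)."""
    def __init__(self, n_w, width=120):
        super().__init__()
        layers, d = [], n_w
        for _ in range(4):
            layers += [nn.Linear(d, width), nn.ELU()]
            d = width
        layers.append(nn.Linear(d, n_w))
        self.net = nn.Sequential(*layers)

    def forward(self, t, w):            # f(t, w) signature used by torchdiffeq
        return self.net(w)

def encoder_block(c_in):
    """One encoder stage (Fig. 2b): halve the spatial size, double the channels."""
    return nn.Sequential(
        nn.Conv1d(c_in, 2 * c_in, kernel_size=3, stride=2, padding=1),
        nn.BatchNorm1d(2 * c_in),
        nn.ELU(),
    )
```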
Fig. 3. Schematic illustrating the interpretation of training trajectory length 𝑛𝑡, which
controls how long a NODE prediction is rolled-out during training. Higher 𝑛𝑡 values
during training allow for more dynamical information to be included in the training
objective.
2.3. Dimensionality reduction with autoencoders
Autoencoders are a class of neural networks used for unsupervised
learning, primarily in the domain of data compression and feature
extraction. An autoencoder consists of an encoder and a decoder, working in concert to achieve dimensionality reduction of the high-dimensional input data while preserving its salient information.
As mentioned in Section 2.2, to generate the ground-truth latent trajectories in Eq. (4), and to facilitate neural ODE inference in latent space, a neural network based encoder $\phi$ is used to project instantaneous full-order state samples $\mathbf{u}(t) \in \mathbb{R}^{N_u}$ to corresponding lower-dimensional latent representations $\mathbf{w}(t) \in \mathbb{R}^{N_w}$ such that $N_w \ll N_u$. The latent trajectories described in Eq. (4) are then recovered by applying the trained encoder $\phi$ to the corresponding high-fidelity ground-truth trajectories. The objective of the decoder $\psi$ is to then reconstruct the original state from this latent representation with minimal loss of information, i.e., $\mathbf{u}(t) \approx \widetilde{\mathbf{u}}(t) = \psi(\phi(\mathbf{u}(t)))$. Inspired by recent work in data-based
reduced order modeling [27,28,55], this work leverages convolutional
neural networks as the backbone for encoder and decoder architec-
tures. Fig. 2 (right) illustrates the encoder architecture; the decoder is a mirrored version of the encoder, with transpose convolution layers replacing convolution layers. Alongside the architectural config-
uration, the critical parameter in any autoencoder is the latent space
dimensionality 𝑁𝑤, which effectively controls the trade-off between
dimensionality reduction and reconstruction accuracy.
The encoder and decoder are trained by minimizing the loss function
$$\mathcal{L}_{\mathrm{AE}} = \left\langle \left\| \mathbf{u}(t) - \psi(\phi(\mathbf{u}(t))) \right\|_2^2 \right\rangle \tag{8}$$
that characterizes the mismatch between the reconstructed and original
states. The angled brackets in the above equation denote an average
over all 𝐮(𝑡) instantaneous target snapshots in the training set. Note that
the time-ordered quality of the data is crucial when training the neural
ODE in latent space, but is unimportant when training the autoencoder,
since both encoder and decoder are instantaneous mapping functions.
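A minimal training loop for the reconstruction objective of Eq. (8) might look as follows; the loader, learning rate, and epoch count are illustrative assumptions, and snapshots are assumed flattened to shape (batch, N_u).

```python
import torch

def train_autoencoder(encoder, decoder, loader, epochs=100, lr=1e-3):
    """Minimize the snapshot reconstruction loss of Eq. (8). Snapshots
    need not be time-ordered; loader yields batches u of shape (batch, N_u)."""
    params = list(encoder.parameters()) + list(decoder.parameters())
    opt = torch.optim.Adam(params, lr=lr)
    for _ in range(epochs):
        for u in loader:
            u_hat = decoder(encoder(u))                # psi(phi(u))
            loss = ((u_hat - u) ** 2).sum(-1).mean()   # Eq. (8)
            opt.zero_grad()
            loss.backward()
            opt.step()
```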
2.4. Coupled versus decoupled training
There are two distinct ways of training the combined autoencoder
and neural ODE based model in Fig. 1. The first approach, termed
decoupled training, entails training the autoencoder and neural ODE
separately, treating them as two distinct optimization problems. The
second approach, termed coupled training, involves simultaneous train-
ing of the autoencoder and the neural ODE. The differences between the two approaches are explained below. The effect of these training
methodologies on both predictive accuracy and latent time-scales is
explored in Section 3.
Decoupled Training: The initial step involves training the autoencoder with snapshots of the state vector at various parameter instances, optimizing the autoencoder loss $\mathcal{L}_{\mathrm{AE}}$ using full-order trajectory data $\mathcal{T}_{u,i}$. Subsequently, the trained encoder is employed to produce the ground-truth latent trajectories $\mathcal{T}_{w,i}$. Following this, the neural ODE is trained by minimizing the neural ODE loss $\mathcal{L}_{\mathrm{NODE}}$ within the latent space, as described in Eq. (7).
Coupled Training: In this approach, the autoencoder and neural
ODE are trained concurrently using a single optimization problem in
the full state-space. The loss that is minimized is given by
$$\mathcal{L}_{\mathrm{coupled}} = \mathcal{L}_1 + \mathcal{L}_2, \tag{9}$$
where $\mathcal{L}_1$ and $\mathcal{L}_2$ are defined as
$$\mathcal{L}_1 = \frac{1}{n_t} \sum_{j=1}^{n_t} \left\langle \left\| \psi(\widetilde{\mathbf{w}}(t + j\Delta t)) - \mathbf{u}(t + j\Delta t) \right\|_2^2 \right\rangle \quad \text{and} \quad \mathcal{L}_2 = \mathcal{L}_{\mathrm{AE}}, \tag{10}$$
respectively. The quantity $\mathcal{L}_1$ represents the error of the neural ODE-predicted trajectory in physical space, and $\mathcal{L}_2$ is added to improve the instantaneous projection capabilities of the autoencoder. The implication of the coupled approach is that the autoencoder parameters are also informed of the intrinsic system dynamics in the training stage, which is not the case in the de-coupled approach.
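A sketch of the coupled objective of Eqs. (9) and (10) for a single training trajectory is given below; it assumes MLP-style encoder and decoder mappings on flattened snapshots (convolutional architectures would additionally require channel dimensions), and all names are illustrative.

```python
import torch
from torchdiffeq import odeint_adjoint as odeint

def coupled_loss(encoder, decoder, g, u_true, dt):
    """Coupled objective of Eqs. (9)-(10) for one trajectory u_true of
    shape (n_t + 1, N_u); encoder/decoder act on (batch, N_u) snapshots."""
    n_t = u_true.shape[0] - 1
    t = torch.arange(n_t + 1, dtype=torch.float32) * dt
    w0 = encoder(u_true[:1])                     # encode the initial snapshot
    w_pred = odeint(g, w0, t, method="euler")    # (n_t + 1, 1, N_w)
    u_pred = decoder(w_pred.squeeze(1))          # decode each rolled-out state
    loss_1 = ((u_pred[1:] - u_true[1:]) ** 2).sum(-1).mean()            # L1, Eq. (10)
    loss_2 = ((decoder(encoder(u_true)) - u_true) ** 2).sum(-1).mean()  # L2 = L_AE
    return loss_1 + loss_2                       # Eq. (9)
```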
2.5. Evaluation metrics
To compare the predictive accuracy of different combined autoen-
coder neural ODE models, two testing metrics are used. The first metric,
called the single-step prediction error ($\mathcal{E}_{\mathrm{SS}}$), is defined as
$$\mathcal{E}_{\mathrm{SS}} = \left\langle \left\| \psi(\widetilde{\mathbf{w}}(t + \Delta t)) - \mathbf{u}(t + \Delta t) \right\|_2^2 \right\rangle, \tag{11}$$
where the angled brackets represent an average over all (𝑡, 𝑡 +𝛥𝑡) testing
snapshot pairs. The single-step prediction error $\mathcal{E}_{\mathrm{SS}}$ can therefore be used to provide an a-priori measure of neural ODE predictive accuracy
along a trajectory, in that the effect of error accumulation through time
is discarded.
The second metric is the rollout error, and is given by
$$\mathcal{E}_{\mathrm{RO}} = \frac{1}{n_{\mathrm{RO}}} \sum_{j=1}^{n_{\mathrm{RO}}} \left\langle \left\| \psi(\widetilde{\mathbf{w}}(t + j\Delta t)) - \mathbf{u}(t + j\Delta t) \right\|_2^2 \right\rangle. \tag{12}$$
The rollout error metric in Eq. (12) represents an a-posteriori evaluation of the neural ODE in physical space, and is nearly identical to $\mathcal{L}_1$ in Eq. (10). The difference is that the trajectory error in Eq. (12) above is computed for an evaluation trajectory of size $n_{\mathrm{RO}}$ instead of a training trajectory of size $n_t$. In other words, a neural ODE trained using trajectory lengths $n_t$ can be evaluated in inference to produce trajectory lengths of different sizes. In the formulation above, $n_{\mathrm{RO}} = 1$ recovers the single-step prediction error in Eq. (11). The corresponding relative absolute error (RAE) version of $\mathcal{E}_{\mathrm{SS}}$ is given by
$$\mathcal{R}_{\mathrm{SS}} = \left\langle \frac{\left\| \psi(\widetilde{\mathbf{w}}(t + \Delta t)) - \mathbf{u}(t + \Delta t) \right\|}{\left\| \mathbf{u}(t + \Delta t) \right\|} \right\rangle, \tag{13}$$
and the same for $\mathcal{E}_{\mathrm{RO}}$ is given by
$$\mathcal{R}_{\mathrm{RO}} = \sum_{j=1}^{n_{\mathrm{RO}}} \left\langle \frac{\left\| \psi(\widetilde{\mathbf{w}}(t + j\Delta t)) - \mathbf{u}(t + j\Delta t) \right\|}{\left\| \mathbf{u}(t + j\Delta t) \right\|} \right\rangle. \tag{14}$$
As such, in the demonstrations in Section 3, one of the objectives is
to isolate the effect of various neural ODEs produced with different
training trajectory lengths 𝑛𝑡 on the above evaluation metrics.
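The evaluation metrics can be computed from a decoded rollout as in the following sketch; the tensor shapes and function name are illustrative, and $n_{\mathrm{RO}} = 1$ recovers the single-step metrics.

```python
import torch

def rollout_errors(decoder, w_pred, u_true, n_ro):
    """Rollout MSE (Eq. (12)) and relative absolute error (Eq. (14)) over
    n_ro evaluation steps. Shapes: w_pred is (n_ro + 1, N_w) from the
    neural ODE rollout, u_true is (n_ro + 1, N_u)."""
    u_pred = decoder(w_pred[1 : n_ro + 1])       # decode the predicted steps
    err = u_pred - u_true[1 : n_ro + 1]
    mse = (err ** 2).sum(-1).mean()              # Eq. (12)
    rel = torch.linalg.norm(err, dim=-1) / torch.linalg.norm(
        u_true[1 : n_ro + 1], dim=-1)
    return mse, rel.mean()                       # Eq. (14), up to normalization
```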
2.6. Dynamical system timescales
Eigenvalue analysis has been used to analyze the time-scales of
dynamical systems [68–70]. Training neural ODEs on long trajectories is a challenging problem that requires extensive fine-tuning. The choice
of the training trajectory length 𝑛𝑡 is often based on heuristics and does
not have any theoretical justification. The complexity of the training
loss function grows with the increase in 𝑛𝑡, and a neural ODE trained
on a small 𝑛𝑡 may not capture the relevant dynamics of the system.
Thus, choosing an optimal 𝑛𝑡 is an open research problem. In this
work, we employ an eigenvalue based time-scale analysis framework to
rigorously study the effect of 𝑛𝑡 on accuracy and time-scale reduction
in the latent space. We employ the same framework to study the effect
of certain network architecture based hyperparameters on time-scale
reduction in the latent space. For a dynamical system that is governed
by an ODE given by Eq. (2), the inverse of the largest eigenvalue of the right-hand-side Jacobian,
$$t_{\mathrm{lim}}(\mathbf{F}, t) = \frac{1}{\max\left(\mathrm{eig}\left(\frac{\partial \mathbf{F}(\mathbf{u}, t)}{\partial \mathbf{u}}\right)\right)}, \tag{15}$$
gives a measure of the fastest evolving time-scales and can give an indication of the largest time-step that can be used to evolve the system using an explicit time-integration scheme. A higher value of $t_{\mathrm{lim}}$ in the latent space indicates smoother latent space trajectories and, in turn, larger possible time-steps. The inverse of the smallest eigenvalue, on the other hand,
$$t_{\mathrm{max}}(\mathbf{F}, t) = \frac{1}{\min\left(\mathrm{eig}\left(\frac{\partial \mathbf{F}(\mathbf{u}, t)}{\partial \mathbf{u}}\right)\right)}, \tag{16}$$
gives a measure of the slowest evolving time-scales of the system.
In this work, we use this metric to find a correlation between 𝑛𝑡
and the predictive accuracy of the combined autoencoder neural ODE
framework.
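A sketch of this timescale extraction, applied to the latent system of Eq. (17) below, is given here; taking magnitudes of the (possibly complex) Jacobian eigenvalues is an assumed convention, and the function name is illustrative.

```python
import torch

def latent_timescales(g, w, t=0.0):
    """Fastest and slowest latent timescales (cf. Eqs. (15)-(17)): inverses
    of the largest and smallest eigenvalue magnitudes of the Jacobian of
    the neural ODE right-hand side at latent state w (shape (N_w,))."""
    J = torch.autograd.functional.jacobian(lambda x: g(t, x), w)
    mags = torch.linalg.eigvals(J).abs()        # eigenvalue magnitudes
    return 1.0 / mags.max(), 1.0 / mags.min()   # (t_lim, t_max)
```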
3. Numerical experiments
The combined autoencoder neural ODE framework is tested in two
numerical experiments: the Kuramoto–Sivashinsky (KS) equation and
a one-dimensional detonation problem. The KS equation serves as a test
bed to discern the impact of various training hyperparameters on time-
scale reduction in the latent space. Eigenvalue analysis, as explained
in the previous section, quantifies the fastest and slowest time scales in
the latent space. For a latent dynamics model approximated by a neural
ODE, described by Eq. (5), the fastest and slowest evolving time-scales
in the latent space are given by
$$t_{\mathrm{lim}}(t) = \frac{1}{\max\left(\mathrm{eig}\left(\frac{\partial \mathcal{G}(\widetilde{\mathbf{w}}(t); \theta)}{\partial \widetilde{\mathbf{w}}}\right)\right)} \quad \text{and} \quad t_{\mathrm{max}}(t) = \frac{1}{\min\left(\mathrm{eig}\left(\frac{\partial \mathcal{G}(\widetilde{\mathbf{w}}(t); \theta)}{\partial \widetilde{\mathbf{w}}}\right)\right)}. \tag{17}$$
To assess the impact of the number of convolutional layers, latent
dimensions in the autoencoder, and the training methodology, net-
works with varied configurations for each parameter (while keeping
the other two fixed) are trained. The resulting latent time-scale profiles (Eq. (17)) are then compared to the full-system time-scales provided by Eqs. (15) and (16).
Furthermore, the influence of the training trajectory length 𝑛𝑡 on both
time-scale reduction and predictive accuracy is considered for both
numerical experiments.
3.1. KS equation
The KS equation is solved using jax-cfd [71], which allows differentiation through the full-order (physical space) dynamical system $d\mathbf{u}/dt = \mathbf{F}(\mathbf{u})$; this facilitates computation of the Jacobian $\partial \mathbf{F} / \partial \mathbf{u}$ for extracting the true physical space time-scales, enabling comparison with the time-scales in the latent space. The KS equation is defined by the partial differential equation
$$\frac{\partial u}{\partial t} + u \frac{\partial u}{\partial x} + \nu \frac{\partial^2 u}{\partial x^2} + \nu \frac{\partial^4 u}{\partial x^4} = 0 \quad \text{in } [0, L] \times \mathbb{R}^{+}, \tag{18}$$
with periodic boundary conditions, where 𝐿= 64. The initial conditions
are specified as
$$u_0(x) = \sum_{k=1}^{3} \sum_{j=1}^{n_c} a_j \sin(\omega_j x + \phi_j), \tag{19}$$
where $\omega_j$ is randomly chosen from $\{\pi/L, 2\pi/L, 3\pi/L\}$, $a_j$ is sampled from a uniform distribution in $[-0.5, 0.5]$, and the phase $\phi_j$ follows a uniform distribution in $[0, 2\pi]$. The parameter $n_c = 30$ governs the number of modes in the initial conditions. The viscosity is set to $\nu = 0.01$, and the other parameters are chosen to match previous work [55,72,73].
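A sketch of sampling the initial conditions of Eq. (19) is shown below; the reading of the double sum (the $n_c$-mode inner sum repeated three times) and the frequency set follow the reconstruction above, and the seed handling is illustrative.

```python
import numpy as np

def ks_initial_condition(x, L=64.0, n_c=30, seed=0):
    """Random initial condition per Eq. (19): a superposition of sine modes
    with random frequency, amplitude, and phase."""
    rng = np.random.default_rng(seed)
    u0 = np.zeros_like(x)
    for _ in range(3):                                    # outer sum over k
        for _ in range(n_c):                              # inner sum over j
            omega = rng.choice([np.pi / L, 2 * np.pi / L, 3 * np.pi / L])
            a = rng.uniform(-0.5, 0.5)
            phi = rng.uniform(0.0, 2.0 * np.pi)
            u0 += a * np.sin(omega * x + phi)
    return u0
```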
The KS equation is solved on a uniform grid of 1024 points using a pseudospectral discretization. The pseudospectral solver employs a Fourier transform to represent the spatial derivatives in the KS equation, converting the PDE into a system of ordinary differential equations (ODEs) in the spectral domain; for the periodic boundary conditions used here, the uniformly spaced collocation points provide accurate derivative approximations with minimal numerical dispersion. Time integration is performed using an explicit Euler method, and the integrated trajectories are converted back to physical space using an inverse Fourier transform. This pseudospectral approach allows for efficient and accurate simulation of the spatiotemporal dynamics governed by the KS equation. The pseudospectral solver implemented in jax-cfd is used to generate the data. The training samples are generated by assembling 25 trajectories from different initial conditions, each comprising 1500 time-steps ($\Delta t = 1.95 \times 10^{-3}$), and the trained autoencoder neural ODE framework is tested on unseen initial conditions.
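As a simplified stand-in for the jax-cfd solver described above, one explicit-Euler pseudospectral step of Eq. (18) can be written as follows; the NumPy implementation is a sketch for illustration, not the paper's actual solver.

```python
import numpy as np

def ks_euler_step(u, dt, L=64.0, nu=0.01):
    """One explicit-Euler step of Eq. (18) with Fourier (pseudospectral)
    spatial derivatives on a uniform periodic grid."""
    n = u.size
    k = 2.0 * np.pi * np.fft.fftfreq(n, d=L / n)     # angular wavenumbers
    u_hat = np.fft.fft(u)
    u_x = np.real(np.fft.ifft(1j * k * u_hat))       # first derivative
    u_xx = np.real(np.fft.ifft(-(k ** 2) * u_hat))   # second derivative
    u_xxxx = np.real(np.fft.ifft((k ** 4) * u_hat))  # fourth derivative
    rhs = -u * u_x - nu * u_xx - nu * u_xxxx         # from Eq. (18)
    return u + dt * rhs
```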
Fig. 4. (Left) Comparison of the ground truth and rollout predicted fields (latent
dimension of 25 and 4 convolutional layers and 𝑛𝑡= 500) for an unseen initial condition.
(Right) Corresponding latent space trajectories for 𝑛𝑡= 500 (dashed) and 𝑛𝑡= 4000
(dotted), compared to ground truth (solid). A subset of the 25 latent trajectories is shown here for visual clarity.
Fig. 4 compares the ground truth field for 𝑢 with the rollout field
(𝑛𝑅𝑂 = 500) predicted by the autoencoder neural ODE framework
trained in a decoupled manner using 𝑛𝑡= 500, for an unseen initial
condition. The predictions closely match the ground truth data. The
figure also illustrates the corresponding latent space trajectories for 𝑛𝑡=
500 and 𝑛𝑡= 4000, both of which appear smoother compared to the state
space fields. Although the presence of smooth latent space trajectories
has been empirically observed [55], it has not been quantitatively
studied. It can be seen that the latent space trajectories predicted by
the neural ODE with $n_t = 500$ more closely match the ground truth trajectories than those of the neural ODE with $n_t = 4000$. These
noticeable differences in accuracy suggest that the relationship between
increasing 𝑛𝑡 and predictive strength is not straightforward, as discussed
later. Additionally, latent space trajectories are smoother than state
space fields, with 𝑛𝑡= 4000 trajectories being smoother than 𝑛𝑡= 500.
3.1.1. Effect of network hyperparameters
In most applications of the combined autoencoder and neural ODE
approach for dynamical systems, hyperparameters related to the net-
work architecture are often chosen heuristically. In this section, the
effect of two hyperparameters on the time-scale reduction is quantified.
Number of convolutional layers: The convolution operation is con-
ceptualized as a localized filtering operation, suggesting that incor-
porating more convolutional layers between the state space and the
latent space should ideally yield smoother latent space trajectories and,
consequently, greater time-scale reduction in the latent space.
However, Fig. 5 illustrates the limiting time-scale in the latent space, $t_{\mathrm{lim}}$ (Eq. (17)), as a function of time for autoencoders with varying numbers of convolutional layers. Contrary to expectations, the number of convolutional layers does not exert a significant influence on the time-scales of the latent variables. In Fig. 5, the line corresponding to the 'full-system' is generated by plotting the inverse of the largest eigenvalue of the full KS-equation right-hand-side Jacobian as a
function of time. It can be seen that all the autoencoders in general
significantly reduce the limiting time-scales of the system, by roughly
six orders of magnitude.
Latent space dimensionality: Fig. 5 also depicts the limiting time-
scale in the latent space, $t_{\mathrm{lim}}$, as a function of
time for autoencoders with different numbers of latent dimensions. The
observation from the figure is that the size of the latent space does not
have a substantial impact on the limiting time-scales in the latent space.
3.1.2. Effect of training methodology
Literature on the application of the combined autoencoder neural
ODE for surrogate modeling lacks a rigorous comparison between the
coupled and decoupled training approaches described in Section 2.4.
This section studies the effect of the training methodology on predictive
accuracy, along with the effect of adding the $\mathcal{L}_2$ loss term on the accuracy of the autoencoder.
Loss Terms: The effect of adding the $\mathcal{L}_2$ loss term to the coupled training approach described in Section 2.4 is studied by comparing the autoencoder reconstruction loss $\mathcal{L}_{\mathrm{AE}}$ of coupled autoencoders trained with and without the $\mathcal{L}_2$ loss term during training. Fig. 6 illustrates $\mathcal{L}_{\mathrm{AE}}$ for coupled autoencoders, comparing those trained with and without the $\mathcal{L}_2$ loss term, across different training trajectory lengths ($n_t$).
In both scenarios, it is noted that $\mathcal{L}_{\mathrm{AE}}$ increases with higher values of $n_t$. This relationship is attributed to the fact that increasing $n_t$ lengthens the trajectory predicted by the neural ODE, consequently increasing the complexity of the $\mathcal{L}_1$ loss term in the coupled training loss (Eq. (10)), which could account for the observed trend in Fig. 6. Notably, throughout the range of $n_t$ values, coupled autoencoders trained with the $\mathcal{L}_2$ loss term exhibit lower instantaneous projection errors. Consequently, for subsequent comparative studies, a coupled autoencoder trained with the $\mathcal{L}_2$ loss term is utilized.
Projection error: Fig. 7 presents a comparison of the autoencoder
reconstruction error $\mathcal{L}_{\mathrm{AE}}$ between autoencoders trained using decoupled
and coupled approaches. This analysis considers various sample sizes
(rollout-length) over a range of 𝑛𝑡 values. In the decoupled approach, a
single autoencoder is trained, while different neural ODEs are trained
with varying values of 𝑛𝑡. Consequently, the autoencoder projection
errors remain independent of 𝑛𝑡 in this scenario. Conversely, in the
coupled training approach discussed in the previous section, the in-
stantaneous projection error increases with 𝑛𝑡. When examining the
magnitudes of the projection error for autoencoders trained using
decoupled and coupled approaches, it becomes apparent that the de-
coupled approach yields lower projection errors. This outcome aligns
with expectations, considering that the coupled approach optimizes
parameters for both the autoencoder and neural ODE simultaneously,
posing a more challenging optimization problem in general.
3.1.3. Training trajectory length
The significance of another crucial hyperparameter, namely the training trajectory length $n_t$, is analyzed here. This parameter is chosen a priori and has a substantial impact on the predictive accuracy of the system. Given that the focus of this work is on time-scale analysis in the latent space, the influence on the latent time-scales of both $n_t$ and certain hyperparameters related to the autoencoder network architecture is isolated.
Fig. 8 displays the limiting time-scale (𝑡𝑙𝑖𝑚) as a function of time for
neural ODEs trained with different training trajectory lengths (𝑛𝑡) in
both the decoupled and coupled training approaches. The plot reveals
that increasing 𝑛𝑡 results in an augmented limiting time-scale in the
latent space, indicating smoother latent space trajectories that allow for
larger time-steps. Specifically, transitioning from an 𝑛𝑡 value of 100 to
4000 leads to an increase in the limiting time-scale in the latent space
by approximately two orders of magnitude.
When comparing the corresponding limiting time-scales for the
coupled and decoupled training approaches, it becomes apparent that
𝑡𝑙𝑖𝑚 exhibits similar magnitudes for a given 𝑛𝑡 value in both approaches.
Consequently, the training methodology does not significantly impact
the time-scale reduction in the latent space.
Summarizing the findings from previous sections, it is evident that
among all the studied hyperparameters affecting time-scale reduction,
Fig. 5. Limiting time-scale in the latent space $t_{\mathrm{lim}}$ (Eq. (17)) as a function of time for neural ODEs trained with (left) different numbers of autoencoder convolutional layers (using $n_t = 500$ and a latent dimension of 25) and (right) different latent dimensionalities $N_w$ (using 4 convolution layers and $n_t = 500$).
Fig. 6. $\mathcal{L}_{\mathrm{AE}}$ (Eq. (8)) as a function of $n_t$ using the coupled training approach with (left) and without (right) the $\mathcal{L}_2$ term in the combined loss (Eq. (9)).
Fig. 7. $\mathcal{L}_{\mathrm{AE}}$ (Eq. (8)) as a function of $n_t$ using the (left) de-coupled and (right) coupled training approach.
the training trajectory length (𝑛𝑡) is the sole parameter with a sub-
stantial impact on 𝑡𝑙𝑖𝑚. Increasing 𝑛𝑡 results in smoother latent space
trajectories, allowing for larger time-steps in the latent space. However,
for predictions at unseen parameter instances, the relationship between
increasing 𝑛𝑡 and accuracy is not straightforward, as discussed later.
Consequently, there exists an optimal value for 𝑛𝑡 that strikes the best
trade-off between predictive accuracy and time-scale reduction.
Single step rollouts: To assess the predictive accuracy of the combined autoencoder neural ODE framework while excluding the impact of error accumulation from extended rollouts, we calculate the mean squared error $\mathcal{E}_{\mathrm{SS}}$ (given by Eq. (11)) and the relative absolute error $\mathcal{R}_{\mathrm{SS}}$ (Eq. (13)), which provides a measure of the relative error in the predicted solution, following a single-step prediction in time for an unseen initial condition. Fig. 9 illustrates $\mathcal{E}_{\mathrm{SS}}$ and $\mathcal{R}_{\mathrm{SS}}$, comparing the decoupled and coupled training approaches across various $n_t$ values.
Across the entire range of 𝑛𝑡 values, the decoupled training approach
exhibits lower MSE and RAE values. In the decoupled approach, both
losses increase with 𝑛𝑡 as expected since the single-step prediction loss
is optimized for 𝑛𝑡= 1. For the coupled approach, the losses initially
rise with 𝑛𝑡 and then decrease for 𝑛𝑡 values exceeding 200.
Rollouts: To compare the predictive accuracy of different models over
longer rollout trajectory lengths, the mean squared error $\mathcal{E}_{\mathrm{RO}}$ (given by Eq. (12)) and relative absolute error $\mathcal{R}_{\mathrm{RO}}$ (Eq. (14)) are used.
Fig. 10 depicts a comparison of $\mathcal{E}_{\mathrm{RO}}$ and $\mathcal{R}_{\mathrm{RO}}$ for a rollout trajec-
tory length 𝑛RO = 500 timesteps in both the coupled and decoupled
approaches across a range of 𝑛𝑡 values. Notably, for the extended rollout
trajectory, the decoupled approach demonstrates significantly lower
error levels, nearly two orders of magnitude less, in comparison to the
coupled approach.
In the case of the coupled approach, the rollout errors show no strong systematic dependence on $n_t$, beyond a general tendency for lower $n_t$ values to produce lower rollout errors. This observation can be explained by noting that the predictive accuracy of the autoencoder and neural ODEs trained using the coupled approach appears to diminish as more samples are utilized for training (higher values of $n_t$).
Conversely, for the decoupled approach, the rollout losses initially
increase with 𝑛𝑡 and then reach their minimum at an optimal 𝑛𝑡 of 500.
This implies that the combined decoupled neural ODE is most adept
at representing the underlying physics of the system at this specific
Fig. 8. Limiting time-scale in the latent space (𝑡𝑙𝑖𝑚) as a function of time for neural ODEs with different training trajectory lengths (𝑛𝑡) for the de-coupled (left) and coupled (right)
training approaches. All models utilize latent dimensionality of 25 and four convolution layers.
Fig. 9. $\mathcal{E}_{\mathrm{SS}}$ (Eq. (11)) and $\mathcal{R}_{\mathrm{SS}}$ (Eq. (13)) as a function of $n_t$ using de-coupled (left) and coupled (right) training approaches. All models utilize a latent dimensionality of 25 and four convolution layers.
Fig. 10. $\mathcal{E}_{\mathrm{RO}}$ (Eq. (12)) and $\mathcal{R}_{\mathrm{RO}}$ (Eq. (14)) for a rollout trajectory length $n_{\mathrm{RO}} = 500$ as a function of $n_t$ using de-coupled (left) and coupled (right) training approaches. Models utilize a latent dimensionality of 25 and four convolution layers.
value of 𝑛𝑡. Although this may seem expected, the identification of this
‘optimal’ 𝑛𝑡 holds substantial significance for achieving the best pre-
dictive accuracy. Subsequent sections explore a method to determine
this optimal 𝑛𝑡 a priori, providing insights into achieving the highest
predictive accuracy.
3.1.4. Largest time-scales
In addition to investigating the limiting time-scales in the latent
space, we also explore the largest time scales, represented by the
inverse of the smallest eigenvalues of the right-hand side Jacobian and
denoted as 𝑡𝑚𝑎𝑥 (defined in Section 2.6). The largest time-scales of a
dynamical system typically contain the majority of the system’s energy,
and ideally, a neural ODE that captures these large time-scales should
yield better predictive accuracy.
Fig. 11 (left) shows $\mathcal{E}_{\mathrm{RO}}$ as a function of $n_t$ for different rollout trajectory lengths $n_{\mathrm{RO}}$. It can be seen that for $n_{\mathrm{RO}} = 500$ and lower, the lowest $\mathcal{E}_{\mathrm{RO}}$ is seen near $n_t = 500$. As $n_{\mathrm{RO}}$ is increased beyond 500,
the minimum shifts to 𝑛𝑡= 1000 and this trend remains consistent for
all higher 𝑛RO values, suggesting that neural ODEs with 𝑛𝑡= 500 and
𝑛𝑡= 1000 have the highest predictive accuracy for rollout predictions.
To understand why this is the case, Fig. 11 (right) illustrates 𝑡𝑚𝑎𝑥
as a function of time for neural ODEs with different training trajectory
lengths. The plot reveals that the neural ODEs trained with 𝑛𝑡= 500 and
𝑛𝑡= 1000 have 𝑡𝑚𝑎𝑥 values closely aligning with those of the full system,
in contrast to higher (4000) and lower (8) 𝑛𝑡 values. As observed in
Figs. 10 and 11, since 𝑛𝑡= 500 and 𝑛𝑡= 1000 yield the lowest rollout
errors for unseen parameter instances, this observation suggests that
neural ODEs with the best predictive accuracy have 𝑡𝑚𝑎𝑥 values closely
aligned with those of the full system, and therefore effectively capture
the slowest evolving physics.
While the precise determination of an optimal value for 𝑛𝑡 remains
unclear, the results shown here suggest that examining neural ODE
behavior on the basis of 𝑡𝑚𝑎𝑥 can offer insights into the range of 𝑛𝑡
values that would likely result in the best predictive accuracy.
Fig. 11. (Left) $\mathcal{E}_{\mathrm{RO}}$ (Eq. (12)) as a function of $n_t$ for different rollout trajectory lengths $n_{\mathrm{RO}}$ using the de-coupled training strategy. (Right) Largest time-scale in the latent space ($t_{\mathrm{max}}$) (Eq. (17)) as a function of time for neural ODEs with different training trajectory lengths ($n_t$). Models utilize a latent dimensionality of 25 and four convolution layers.
3.2. Extension to channel detonations
In this section, to further demonstrate the generality of the timescale
relationships described in the context of the KS equations in the pre-
vious sections, additional studies of the neural ODE strategy are per-
formed on the compressible reacting Navier–Stokes equations. More
specifically, surrogate models are constructed for unsteady gaseous
detonations in a one-dimensional channel configuration, constituting
a more realistic and highly advection-dominated demonstration case
that incorporates multiple transported variables. In broad terms, an
unsteady detonation can be interpreted as a propagating shockwave
coupled with a chemical reaction zone [74]. Accelerated simulations of
detonation-containing flows are of significant interest to the high-speed
propulsion community, where emerging concepts reliant on detonation-
based combustion offer pathways for higher efficiency and robust de-
signs [74,75]. In this context, similar to the KS demonstration studies in
the previous sections, numerical solutions of the compressible NS equa-
tions with detailed chemical kinetics are used to generate ground-truth
trajectory data describing the propagation of self-sustained detonation
waves, from which the combined autoencoder-neural ODE models are
trained. The objective is to not only showcase the capability of neural-
ODE based approaches in modeling detonation dynamics, but also
to illustrate consistency in neural ODE timescale trends (namely, the
relationship between 𝑛𝑡 and the latent timescales) across fundamentally
different PDEs.
The detonation configuration and initiation strategy are shown in
Fig. 12. More specifically, unsteady Hydrogen-Air detonations are ini-
tialized in the manner of Ref. [76], where a driver gas at elevated
pressure and temperature near the left wall is used to establish a self-
sustained Chapman–Jouguet detonation wave that propagates through
the channel. To generate ground-truth detonation data to train the
data-based models, the compressible reacting NS equations are solved
using a flow solver developed at the University of Michigan based
on the AMReX framework [77,78]. The solver is a block-structured
adaptive mesh refinement (AMR) extension of the extensively verified
UMReactingFlow [79] (note that although an AMR-based solver is
employed here, grid refinement is not used in this study). A globally
second-order finite-volume strategy is utilized, where advection
terms are treated with a slope-limited Harten–Lax–van Leer–Contact
approximate Riemann solver [80] and diffusion terms are treated using
standard central schemes. Detailed chemical kinetics routines (species
production rate and transport coefficient evaluations) are handled by
Cantera [81]. In this work, hydrogen-air chemistry is modeled us-
ing the 9 species, 21 reaction detailed mechanism of Mueller et al.
(1999) [82]. Global time integration is handled using a Strang splitting
strategy; chemical time integration is performed using an adaptive
explicit method, and the advection-diffusion temporal advance uses a
stability-preserving second-order Runge–Kutta method. The reader is
directed to Ref. [79] for additional detail on the solver numerics and
discretization approach.

Table 1
Detonation dataset description.

Trajectory   𝑃amb      𝑇amb    𝑃𝑑∕𝑃amb   𝑇𝑑∕𝑇amb   Training snapshots   Testing snapshots
1            0.5 atm   300 K   40        10        7500                 7500
2            1 atm     300 K   20        10        7500                 7500
Detonation dynamics are parameterized by both ambient gas and
driver gas properties. In particular, the ambient gas is known to control
detonation wave speeds, peak pressures, and chemical timescales in
the wave structure (i.e., higher ambient gas pressures result in smaller
chemical timescales and more chemically stiff wave structures), and
the driver gas has additional, albeit more minor, effects on detonation
coupling and observed wave speeds. As a result, the trajectories used
to train the models are parameterized by two distinct driver pressure
ratios and ambient pressures, as provided in Table 1. For
both the decoupled autoencoder and neural ODE, only half of the total
time snapshots from each trajectory are utilized for training purposes.
In each snapshot, each spatial discretization point stores fluid density,
pressure, temperature, velocity, and all species mass fractions, resulting
in an input snapshot channel depth of 13 (in contrast to the unity depth
used in the KS equation).
For all ground-truth simulation trajectories, the detonation channel
length was set to 0.3 m and the grid resolution was fixed to 50 μm.
This resolution was sufficient for proper formation of a self-sustained
detonation, allowing all complex reacting flow dynamics to be captured.
The simulation time-step was fixed to 5 ns, and snapshots were written
at intervals of 𝛥𝑡 = 10⁻⁸ s for neural ODE training and
inference. Examples of detonation wave evolution, along with the cor-
responding smooth latent space trajectories are shown in Fig. 13. Fig.
14 shows the peak pressure traces for both the considered trajectories.
3.2.1. Extrapolation in time
The predictive accuracy of the autoencoder neural ODE framework,
trained using the decoupled approach with 𝑛𝑡 = 250, 𝑁𝑤 = 10, and four
convolutional layers in the autoencoder (see Fig. 2), is evaluated by
extrapolating in time beyond the training dataset, extending until the
shock and flame exit the domain. In Figs. 15 and 16, the predicted and
ground truth normalized pressure, temperature, and 𝐻2𝑂2 and 𝐻2𝑂 mass
fraction profiles are presented at time instances
within and outside the training set. The results indicate that the neural
ODE autoencoder framework closely aligns with the ground truth data
within the training dataset and effectively extrapolates in time for both
driver pressure ratios and ambient pressures.
Fig. 12. Schematic of channel detonation configuration and initial condition, where a high-energy driver gas is used to initialize a self-sustained detonation wave that propagates
left-to-right through the channel. Ambient and driver pressures for the trajectories considered are provided in Table 1. Driver gas compositions (not shown in schematic) come
from Chapman–Jouguet conditions obtained from the Shock and Detonation toolbox [83]. Initial fluid velocity is zero throughout the entire domain.
Fig. 13. (Left) Pressure, temperature, and species mass fraction fields for Trajectory 1 at different time instances showing the initial condition, initiation phase, and self-sustained
propagation phase. (Right) Latent space trajectories obtained during the self-sustained propagation phase (with 𝑛𝑡 = 250, 𝑁𝑤 = 10 and four convolutional layers in the autoencoder).
Fig. 14. Peak pressure traces for Trajectory 1 and 2 (left and right, respectively). See Table 1 for trajectory descriptions.
3.2.2. Effect of network hyperparameters
Fig. 17 displays the Mean Squared Error (MSE) of predicted trajecto-
ries beyond the training set, calculated over 1250 samples, as a function
of 𝑛𝑡. The graph reveals that the error is minimized at an optimal value
of 𝑛𝑡, similar to what is observed in Fig. 10 for the KS equations.
Fig. 18 illustrates the limiting time-scale in the latent space, denoted
as 𝑡𝑙𝑖𝑚, over time for neural ODEs trained with different values of 𝑛𝑡.
The graph indicates that increasing 𝑛𝑡 leads to higher values for 𝑡𝑙𝑖𝑚,
resulting in smoother latent space trajectories. This trend aligns with
the observations for the KS equations shown in Fig. 8. Comparing the
latent space's limiting (𝑡𝑙𝑖𝑚(𝑡)) and largest (𝑡𝑚𝑎𝑥(𝑡)) time-scales to
their physical-space counterparts, as done for the KS equation, poses a
challenge here because the solver used to generate the data does not
permit differentiation through the full-order dynamical system, which
is essential for calculating the Jacobians (𝜕𝐹∕𝜕𝑢) required to
determine the full-system time-scales.
3.3. Extension to 2D atmospheric flow
In this section, additional demonstrations on an atmospheric flow
dataset are performed to augment the results from the KS equation
and detonation test cases, and to further verify trends with respect to
the training trajectory length on the neural ODE strategy. Similar to
Section 3.2, the details of the dataset are first provided, followed by
an assessment of model performance and 𝑛𝑡 effects. Additionally, this
section extends the preceding analysis by assessing the effect of
time-step variations during inference on neural ODE performance.
The 2D atmospheric fields are produced using a simplified atmo-
spheric general circulation model (AGCM) called SPEEDY. SPEEDY
is a spectral transform AGCM that was developed to produce rapid
Fig. 15. Comparison of the ground truth and predicted fields for Trajectory 1 within the training set (left) and outside the training set (right). Results shown for 𝑛𝑡= 250, 𝑁𝑤= 10,
and four convolutional layers, with extrapolation time of 37.5 μs (7500 time-steps).
Fig. 16. Comparison of the ground truth and predicted fields for Trajectory 2 within the training set (left) and outside the training set (right). Results shown for 𝑛𝑡= 125, 𝑁𝑤= 10,
and four convolutional layers, with extrapolation time of 37.5 μs (7500 time-steps).
Fig. 17. Rollout error (Eq. (12)) for a rollout trajectory length 𝑛RO = 1250, as a function of 𝑛𝑡
(𝑁𝑤 = 10 and four convolutional layers).
climate simulations, using simplified, but modern, physical parameterization
schemes [84,85]. The grid of SPEEDY consists of a horizontal
spatial resolution of 3.75° × 3.75° with variables defined at eight
vertical levels. The 3D varying state variables of the model are the two
components of the horizontal wind vector, temperature, and specific
humidity (moisture), while the single two-dimensionally varying state
Fig. 18. Limiting time-scale in the latent space (𝑡𝑙𝑖𝑚) (Eq. (17)) as a function of
time for neural ODEs with different training trajectory lengths 𝑛𝑡 (𝑁𝑤= 10 and four
convolutional layers).
variable is the natural logarithm of surface pressure. For this dataset,
we subsample the 3D variables to obtain four key variables of the model:
near-surface temperature, specific humidity, and the two upper-level
horizontal wind components.
This atmospheric dataset constitutes a highly complex and nonlinear
advection-dominant benchmark that is fundamentally different from
Fig. 19. Comparison of the ground truth and predicted fields for the atmospheric dataset within the training set (left) and outside the training set (right). Results shown for
𝑛𝑡 = 20, 𝑁𝑤 = 200, and four convolutional layers, with an extrapolation time of 2500 time-steps.
both KSE and detonation cases described above. It therefore serves
as an additional dynamical system in this work to more comprehen-
sively assess neural ODE timescale trends. For additional details on the
physical phenomena contained in the dataset, the reader is directed to
Refs. [84,85].
3.3.1. Extrapolation in time
The predictive performance of the trained autoencoder neural ODE
framework, developed using the decoupled approach and configured
with 𝑛𝑡 = 20, 𝑁𝑤 = 200, and four convolutional layers in the autoencoder
(see Fig. 2), is assessed by extrapolating beyond the training dataset.
Fig. 19 presents the predicted and ground truth fields for temperature,
specific humidity (SH), and the 𝑢 and 𝑣 velocity components at vari-
ous time instances, both within and beyond the training period. The
results suggest that the neural ODE autoencoder framework captures
the system dynamics within the training domain and shows an ability
to predict system behavior at unseen time instances. It should be
noted that while many of the large-scale features are preserved at
the shown extrapolation time, there are indeed unphysical fine-scale
artifacts in the predicted solution, pointing to intrinsic instabilities for
this problem. Stability challenges for this dataset are generally expected
using CNN-based encodings, due to the high level of nonlinearity and
the fact that no additional stability enhancements are employed here.
Despite this, it is emphasized that the purpose of this additional demon-
stration is to augment the KS equation and detonation analysis through
inspection/understanding of the role of 𝑛𝑡, which is summarized below.
3.3.2. Effect of network hyperparameters
Fig. 20 illustrates the Mean Squared Error (MSE) of predicted tra-
jectories beyond the training set, evaluated over 1000 samples, as a
function of 𝑛𝑡. The plot highlights that the error reaches its minimum
at an optimal 𝑛𝑡, consistent with trends observed in Fig. 10 for the KS
equations and Fig. 17 for the 1D channel detonation case.
Fig. 21 shows the evolution of the limiting time-scale in the latent
space, 𝑡𝑙𝑖𝑚, for neural ODEs trained with varying 𝑛𝑡 values. The results
show that larger 𝑛𝑡 values correspond to increased 𝑡𝑙𝑖𝑚, which promotes
smoother trajectories in the latent space. This behavior is consistent
with the trends observed for the KS equations in Fig. 8 and the 1D
channel detonation case in Fig. 18.
Fig. 20. Rollout error (Eq. (12)) for a rollout trajectory length 𝑛RO = 1000, as a function of 𝑛𝑡
(𝑁𝑤 = 100 and four convolutional layers).
Fig. 21. Limiting time-scale in the latent space (𝑡𝑙𝑖𝑚) (Eq. (17)) as a function of time
for neural ODEs with different training trajectory lengths 𝑛𝑡 (𝑁𝑤= 100 and four
convolutional layers).
3.3.3. Effect of time-step
The effect of the time-step size 𝛥𝑡, used in the latent space during
inference, on the rollout error (Eq. (12)) is analyzed in this section.
For this analysis, the best-performing model with 𝑛𝑡 = 20, trained using
𝛥𝑡 = 10, was selected.
Fig. 22 illustrates the variation of the rollout error as a function
of the time-step 𝛥𝑡 used in the latent space for a rollout trajectory
Fig. 22. Rollout error (Eq. (12)) for a rollout trajectory length 𝑛RO = 2500, as a function of 𝛥𝑡
(𝑁𝑤 = 100 and four convolutional layers).
length of 𝑛RO = 1000. The results indicate that the error generally
decreases as the time-step increases and eventually reaches a plateau.
This behavior suggests that neural ODEs trained on discrete data points
may primarily approximate discrete dynamics rather than the underlying
continuous dynamics; as a result, the model's error can increase
significantly when the inference time-step deviates from the training
time-step. Evaluating the model at time-step sizes either larger or
smaller than the training step can thus lead to substantial errors,
indicating potential overfitting to the training discretization and
limited generalization across time-step scales. Similar trends have
been observed in previous work [86–88], highlighting the challenges of
generalizing across different time-step scales.
4. Conclusions
This work explores the applicability of autoencoder-based neural
ODE surrogate models for accelerated simulation of PDE-based systems
in which advection terms play a critical role. Overall, the strategy was
found to produce effective models for challenging unsteady advection-
dominated dynamics in the context of both the Kuramoto–Sivashinsky
and compressible reacting Navier–Stokes equations (a detonation con-
figuration described by detailed chemical kinetics).
Alongside predictive demonstrations, however, the key thrust of
this work was to uncover physical insight into the sources of model
acceleration (i.e., how the neural ODE achieves its acceleration). Since
acceleration in such surrogate models is intrinsically tied to the elimina-
tion of prohibitive time-scales, the conceptual framework to facilitate
this portion of the study came from the quantification, and in-depth
analysis, of neural ODE time-scales.
More specifically, through eigenvalue analysis of dynamical system
Jacobians, the effects of both autoencoder and neural ODE components
on latent system time-scales in the surrogate model were quantified and
analyzed in this work. To this end, the sensitivity of various critical
training parameters (de-coupled versus end-to-end training, latent
space dimensionality, and the role of training trajectory length, for
example) to both model accuracy and the discovered latent system
timescales was investigated in detail.
Conducting eigenvalue-based timescale analysis on the KS equations
proved instrumental in isolating the impact of individual training hy-
perparameters on timescale elimination in the latent space. Notably,
the number of convolutional layers in the autoencoder and the la-
tent dimensionality exhibited little discernible effect on latent space
timescales. Interestingly, the training methodology in terms of cou-
pled versus de-coupled optimization of the autoencoder and neural
ODE components also did not have any effect on the time-scale
reduction, although, as reported in recent work [55], the decoupled
training approach had lower single-step and rollout errors for both PDE
systems tested here.
On the other hand, the training trajectory length, denoted as 𝑛𝑡,
emerged as a crucial parameter governing latent timescales. More
specifically, it was found that increasing 𝑛𝑡 generally increases the
limiting (smallest) timescales (𝑡𝑙𝑖𝑚) in the latent space, resulting in
smoother latent trajectories, which in turn translates to greater levels
of achievable acceleration. Exceeding a certain 𝑛𝑡 threshold, however,
diminished predictive accuracy. Consequently, an optimal 𝑛𝑡 value,
striking a balance between timescale elimination and predictive
accuracy, was identified for both the KS equation and the detonation
test case.
Efforts to establish a framework for a priori identification of this
optimal 𝑛𝑡 value based on the problem’s physics involved analyzing the
largest timescales (𝑡𝑚𝑎𝑥) in the latent space for various 𝑛𝑡 values. For
the KS results observed here, the largest time-scales of the neural ODE
exhibiting the best predictive accuracy were found to align closely with
those of the full-order system, pointing to the role of capturing slow-
moving dynamics in the surrogate latent space for enhanced accuracy.
For the unsteady detonation and atmospheric flow datasets, a similar
general trend was seen in that the rollout error was minimized at a
particular value of 𝑛𝑡. Although promising, further studies are needed
to investigate both the generality of the time-scale relationships
observed here and the feasibility of neural ODEs in modeling the
compressible reacting Navier–Stokes equations and atmospheric flows.
Additionally, benchmarking different surrogate modeling architectures
(such as DeepONets [89], Fourier neural operators [90], and other
approaches), and assessing the transferability of the latent time scale
trends uncovered in this work to these different architectures, is a
useful direction. Such aspects will be explored in future work.
CRediT authorship contribution statement
Ashish S. Nair: Writing – review & editing, Writing – original
draft, Visualization, Validation, Software, Methodology, Investigation,
Formal analysis, Data curation, Conceptualization. Shivam Barwey:
Writing – review & editing, Writing – original draft, Supervision,
Resources, Project administration, Methodology, Investigation, Funding
acquisition, Formal analysis, Conceptualization. Pinaki Pal: Writing –
review & editing, Supervision, Project administration, Investigation,
Formal analysis. Jonathan F. MacArt: Writing – review & editing,
Supervision, Project administration, Investigation, Formal analysis.
Troy Arcomano: Data curation, Formal analysis, Investigation, Writing –
review & editing. Romit Maulik: Writing – review & editing, Supervision,
Resources, Project administration, Methodology, Funding acquisition,
Formal analysis, Conceptualization.
Declaration of competing interest
The authors declare that they have no known competing finan-
cial interests or personal relationships that could have appeared to
influence the work reported in this paper.
Acknowledgments
The manuscript has been created by UChicago Argonne, LLC, Oper-
ator of Argonne National Laboratory (Argonne). The U.S. Government
retains for itself, and others acting on its behalf, a paid-up nonex-
clusive, irrevocable world-wide license in said article to reproduce,
prepare derivative works, distribute copies to the public, and perform
publicly and display publicly, by or on behalf of the Government.
This work was supported by the U.S. Department of Energy (DOE),
Office of Science under contract DE-AC02-06CH11357. This research
used resources of the Argonne Leadership Computing Facility (ALCF),
a DOE Office of Science user facility at Argonne. The authors would
also like to acknowledge the ALCF summer internship program. SB
and PP acknowledge laboratory-directed research and development
(LDRD) funding support from Argonne’s Advanced Energy Technologies
(AET) directorate through the Advanced Energy Technology and Secu-
rity (AETS) Fellowship. RM acknowledges funding support from DOE
Advanced Scientific Computing Research (ASCR) for DOE-FOA-2493
‘‘Data-intensive scientific machine learning’’.
Data availability
Data will be made available on request.
References
[1] P. Moin, K. Mahesh, Direct numerical simulation: a tool in turbulence research,
Annu. Rev. Fluid Mech. 30 (1) (1998) 539–578.
[2] V. Raman, M. Hassanaly, Emerging trends in numerical simulations of
combustion systems, Proc. Combust. Inst. 37 (2) (2019) 2073–2089.
[3] F. Alexander, A. Almgren, J. Bell, A. Bhattacharjee, J. Chen, P. Colella, D. Daniel,
J. DeSlippe, L. Diachin, E. Draeger, et al., Exascale applications: skin in the game,
Phil. Trans. R. Soc. A 378 (2166) (2020) 20190056.
[4] D.J. Lucia, P.S. Beran, W.A. Silva, Reduced-order modeling: new approaches for
computational physics, Prog. Aerosp. Sci. 40 (1–2) (2004) 51–117.
[5] R.O. Fox, Large-eddy-simulation tools for multiphase flows, Annu. Rev. Fluid
Mech. 44 (2012) 47–76.
[6] C. Fureby, Towards the use of large eddy simulation in engineering, Prog. Aerosp.
Sci. 44 (6) (2008) 381–396.
[7] J.A. Langford, R.D. Moser, Optimal LES formulations for isotropic turbulence, J.
Fluid Mech. 398 (1999) 321–346.
[8] M. Germano, U. Piomelli, P. Moin, W.H. Cabot, A dynamic subgrid-scale eddy
viscosity model, Phys. Fluids A: Fluid Dyn. 3 (7) (1991) 1760–1765.
[9] R.J. Adrian, P. Moin, Stochastic estimation of organized turbulent structure:
homogeneous shear flow, J. Fluid Mech. 190 (1988) 531–559.
[10] C. Foias, G.R. Sell, R. Temam, Inertial manifolds for nonlinear evolutionary
equations, J. Differential Equations 73 (2) (1988) 309–353.
[11] M. Akram, M. Hassanaly, V. Raman, An approximate inertial manifold (AIM)
based closure for turbulent flows, AIP Adv. 12 (7) (2022).
[12] E.J. Parish, K. Duraisamy, Non-Markovian closure models for large eddy sim-
ulations using the Mori-Zwanzig formalism, Phys. Rev. Fluids 2 (1) (2017)
014604.
[13] H. Pitsch, Large-eddy simulation of turbulent combustion, Annu. Rev. Fluid
Mech. 38 (2006) 453–482.
[14] G. Boffetta, R.E. Ecke, Two-dimensional turbulence, Annu. Rev. Fluid Mech. 44
(2012) 427–451.
[15] J.N. Kutz, Deep learning in fluid dynamics, J. Fluid Mech. 814 (2017) 1–4.
[16] K. Duraisamy, G. Iaccarino, H. Xiao, Turbulence modeling in the age of data,
Annu. Rev. Fluid Mech. 51 (2019) 357–377.
[17] G. Berkooz, P. Holmes, J.L. Lumley, The proper orthogonal decomposition in the
analysis of turbulent flows, Annu. Rev. Fluid Mech. 25 (1) (1993) 539–575.
[18] P.J. Schmid, Dynamic mode decomposition and its variants, Annu. Rev. Fluid
Mech. 54 (2022) 225–254.
[19] A. Towne, O.T. Schmidt, T. Colonius, Spectral proper orthogonal decomposition
and its relationship to dynamic mode decomposition and resolvent analysis, J.
Fluid Mech. 847 (2018) 821–867.
[20] J. Burkardt, M. Gunzburger, H.-C. Lee, Centroidal Voronoi tessellation-based
reduced-order modeling of complex systems, SIAM J. Sci. Comput. 28 (2) (2006)
459–484.
[21] E. Kaiser, B.R. Noack, L. Cordier, A. Spohn, M. Segond, M. Abel, G. Daviller,
J. Östh, S. Krajnović, R.K. Niven, Cluster-based reduced-order modelling of a
mixing layer, J. Fluid Mech. 754 (2014) 365–414.
[22] S. Barwey, V. Raman, Data-driven reduction and decomposition with time-axis
clustering, Proc. R. Soc. A 479 (2274) (2023) 20220776.
[23] S. Hijazi, G. Stabile, A. Mola, G. Rozza, Data-driven POD-Galerkin reduced order
model for turbulent flows, J. Comput. Phys. 416 (2020) 109513.
[24] A. Lario, R. Maulik, O.T. Schmidt, G. Rozza, G. Mengaldo, Neural-network
learning of SPOD latent dynamics, J. Comput. Phys. 468 (2022) 111475.
[25] M. Ohlberger, S. Rave, Reduced basis methods: Success, limitations and future
challenges, 2015, arXiv preprint arXiv:1511.02021.
[26] K. Lee, K.T. Carlberg, Model reduction of dynamical systems on nonlin-
ear manifolds using deep convolutional autoencoders, J. Comput. Phys. 404
(2020) 108973, http://dx.doi.org/10.1016/j.jcp.2019.108973, URL https://www.
sciencedirect.com/science/article/pii/S0021999119306783.
[27] R. Maulik, B. Lusch, P. Balaprakash, Reduced-order modeling of advection-
dominated systems with recurrent neural networks and convolutional autoen-
coders, Phys. Fluids 33 (3) (2021).
[28] K. Lee, K.T. Carlberg, Model reduction of dynamical systems on nonlinear
manifolds using deep convolutional autoencoders, J. Comput. Phys. 404 (2020)
108973.
[29] A. Glaws, R. King, M. Sprague, Deep learning for in situ data compression of
large turbulent flow simulations, Phys. Rev. Fluids 5 (11) (2020) 114602.
[30] M. Tschannen, O. Bachem, M. Lucic, Recent advances in autoencoder-based
representation learning, 2018, arXiv preprint arXiv:1812.05069.
[31] B. Lusch, J.N. Kutz, S.L. Brunton, Deep learning for universal linear embeddings
of nonlinear dynamics, Nat. Commun. 9 (1) (2018) 4950.
[32] A. Gruber, M. Gunzburger, L. Ju, Z. Wang, A comparison of neural network
architectures for data-driven reduced-order modeling, Comput. Methods Appl.
Mech. Engrg. 393 (2022) 114764.
[33] S. Barwey, V. Shankar, V. Viswanathan, R. Maulik, Multiscale graph neural
network autoencoders for interpretable scientific machine learning, J. Comput.
Phys. (2023) 112537.
[34] D. Liu, G. Liu, A transformer-based variational autoencoder for sentence genera-
tion, in: 2019 International Joint Conference on Neural Networks, IJCNN, IEEE,
2019, pp. 1–7.
[35] K. Lee, E.J. Parish, Parameterized neural ordinary differential equations: Appli-
cations to computational physics problems, Proc. R. Soc. A 477 (2253) (2021)
20210162.
[36] J. Xu, K. Duraisamy, Multi-level convolutional autoencoder networks for para-
metric prediction of spatio-temporal dynamics, Comput. Methods Appl. Mech.
Engrg. 372 (2020) 113379.
[37] J. Yu, C. Yan, M. Guo, Non-intrusive reduced-order modeling for fluid problems:
A brief review, Proc. Inst. Mech. Eng. Part G: J. Aerosp. Eng. 233 (16) (2019)
5896–5912.
[38] S. Fresca, L. Dede’, A. Manzoni, A comprehensive deep learning-based approach
to reduced order modeling of nonlinear time-dependent parametrized PDEs, J.
Sci. Comput. 87 (2021) 1–36.
[39] K. Duraisamy, Z.J. Zhang, A.P. Singh, New approaches in turbulence and
transition modeling using data-driven techniques, in: 53rd AIAA Aerospace
Sciences Meeting, 2015, p. 1284.
[40] J.S. Hesthaven, S. Ubbiali, Non-intrusive reduced order modeling of nonlinear
problems using neural networks, J. Comput. Phys. 363 (2018) 55–78.
[41] R. Maulik, A. Mohan, B. Lusch, S. Madireddy, P. Balaprakash, D. Livescu, Time-
series learning of latent-space dynamics for reduced-order model closure, Phys.
D: Nonlinear Phenom. 405 (2020) 132368.
[42] S.B. Reddy, A.R. Magee, R.K. Jaiman, J. Liu, W. Xu, A. Choudhary, A. Hussain,
Reduced order model for unsteady fluid flows via recurrent neural networks,
in: International Conference on Offshore Mechanics and Arctic Engineering, Vol.
58776, American Society of Mechanical Engineers, 2019, V002T08A007.
[43] W. Ji, W. Qiu, Z. Shi, S. Pan, S. Deng, Stiff-pinn: Physics-informed neural network
for stiff chemical kinetics, J. Phys. Chem. A 125 (36) (2021) 8098–8106.
[44] Z. Jiang, P. Tahmasebi, Z. Mao, Deep residual U-net convolution neural networks
with autoregressive strategy for fluid flow predictions in large-scale geosystems,
Adv. Water Resour. 150 (2021) 103878.
[45] E. Yeung, S. Kundu, N. Hodas, Learning deep neural network representations for
Koopman operators of nonlinear dynamical systems, in: 2019 American Control
Conference, ACC, IEEE, 2019, pp. 4832–4839.
[46] R.T. Chen, Y. Rubanova, J. Bettencourt, D.K. Duvenaud, Neural ordinary
differential equations, Adv. Neural Inf. Process. Syst. 31 (2018).
[47] O. Owoyele, P. Pal, ChemNODE: A neural ordinary differential equations
framework for efficient chemical kinetic solvers, Energy AI 7 (2021) 100118.
[48] C. Finlay, J.-H. Jacobsen, L. Nurbekyan, A. Oberman, How to train your
neural ODE: the world of Jacobian and kinetic regularization, in: International
Conference on Machine Learning, PMLR, 2020, pp. 3154–3164.
[49] T. Kumar, A. Kumar, P. Pal, A physics-constrained neuralODE approach for robust
learning of stiff chemical kinetics, in: NeurIPS Machine Learning and the Physical
Sciences Workshop, 2023.
[50] T. Kumar, A. Kumar, P. Pal, A physics-constrained neural ordinary differential
equations approach for robust learning of stiff chemical kinetics, Combust.
Theory Model. (2025) 1–16.
[51] L. Böttcher, L.L. Fonseca, R.C. Laubenbacher, Control of medical digital twins
with artificial neural networks, 2024, arXiv:2403.13851, URL https://arxiv.org/
abs/2403.13851.
[52] Y.-J. Wang, C.-T. Lin, Runge-Kutta neural network for identification of dynamical
systems in high accuracy, IEEE Trans. Neural Netw. 9 (2) (1998) 294–307, URL
https://api.semanticscholar.org/CorpusID:19332252.
[53] S. Dutta, P. Rivera-Casillas, M.W. Farthing, Neural ordinary differential equations
for data-driven reduced order modeling of environmental hydrodynamics, 2021,
arXiv preprint arXiv:2104.13962.
[54] A.J. Linot, J.W. Burby, Q. Tang, P. Balaprakash, M.D. Graham, R. Maulik,
Stabilized neural ordinary differential equations for long-time forecasting of
dynamical systems, J. Comput. Phys. 474 (2023) http://dx.doi.org/10.1016/j.
jcp.2022.111838.
[55] Z.Y. Wan, L. Zepeda-Núñez, A. Boral, F. Sha, Evolve smoothly, fit consistently:
Learning smooth latent dynamics for advection-dominated systems, 2023, arXiv
preprint arXiv:2301.10391.
[56] V. Vijayarangan, H. Uranakara, S. Barwey, R. Malpica Galassi, R. Malik, M.
Valorani, V. Raman, H. Im, A data-driven reduced-order model for stiff chemical
kinetics using dynamics-informed training, Energy AI 15 (2023) 100325, http:
//dx.doi.org/10.1016/j.egyai.2023.100325.
[57] T. Kumar, A. Kumar, P. Pal, A physics-informed autoencoder-neuralode frame-
work (Phy-ChemNODE) for learning complex fuel combustion kinetics, in:
NeurIPS Machine Learning and the Physical Sciences Workshop, 2024.
[58] S. Mowlavi, S. Nabi, Optimal control of PDEs using physics-informed neural
networks, J. Comput. Phys. 473 (2021) 111731, URL https://api.semanticscholar.
org/CorpusID:244346593.
[59] L. Böttcher, N. Antulov-Fantulin, T. Asikis, AI pontryagin or how artificial neural
networks learn to control dynamical systems, Nat. Commun. 13 (2021) URL
https://api.semanticscholar.org/CorpusID:246022611.
Physica D: Nonlinear Phenomena 476 (2025) 134650
15
A.S. Nair et al.
[60] L. Böttcher, N. Antulov-Fantulin, T. Asikis, AI pontryagin or how artificial neural
networks learn to control dynamical systems, Nat. Commun. 13 (333) (2022)
http://dx.doi.org/10.1038/s41467-021-27590-0, URL https://www.nature.com/
articles/s41467-022-27920-6.
[61] D. Amsallem, J. Cortial, K. Carlberg, C. Farhat, A method for interpolating
on manifolds structural dynamics reduced-order models, Internat. J. Numer.
Methods Engrg. 80 (2009) 1241–1258, http://dx.doi.org/10.1002/nme.2681.
[62] T. Lieu, C. Farhat, Adaptation of aeroelastic reduced-order models and
application to an F-16 configuration, AIAA J. 45 (6) (2007) 1244–1257.
[63] T. Lieu, C. Farhat, M. Lesoinne, Reduced-order fluid/structure modeling of
a complete aircraft configuration, Comput. Methods Appl. Mech. Engrg. 195
(41–43) (2006) 5730–5742.
[64] L. Mathelin, O. Le Maître, Robust control of uncertain cylinder wake flows based
on robust reduced order models, Comput. & Fluids 38 (6) (2009) 1168–1182.
[65] S. Hovland, J.T. Gravdahl, K.E. Willcox, Explicit model predictive control for
large-scale systems via model reduction, J. Guid. Control Dyn. 31 (4) (2008)
918–926.
[66] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin,
N. Gimelshein, L. Antiga, et al., Pytorch: An imperative style, high-performance
deep learning library, Adv. Neural Inf. Process. Syst. 32 (2019).
[67] R.T.Q. Chen, Torchdiffeq, 2018, URL https://github.com/rtqichen/torchdiffeq.
[68] U. Maas, S.B. Pope, Implementation of simplified chemical kinetics based on
intrinsic low-dimensional manifolds, in: Symposium (International) on Combustion,
Vol. 24, 1992, pp. 103–112.
[69] K.D. Mease, S. Bharadwaj, S. Iravanchy, Timescale analysis for nonlinear dy-
namical systems, J. Guid. Control Dyn. 26 (2003) 318–330, http://dx.doi.org/
10.2514/2.5049.
[70] M. Valorani, D.A. Goussis, Explicit time-scale splitting algorithm for stiff prob-
lems: Auto-ignition of gaseous mixtures behind a steady shock, J. Comput. Phys.
169 (2001) 44–79, http://dx.doi.org/10.1006/jcph.2001.6709.
[71] D. Kochkov, J.A. Smith, A. Alieva, Q. Wang, M.P. Brenner, S. Hoyer, Machine
learning–accelerated computational fluid dynamics, Proc. Natl. Acad. Sci. 118
(21) (2021) http://dx.doi.org/10.1073/pnas.2101784118.
[72] G. Dresdner, D. Kochkov, P. Norgaard, L. Zepeda-Núñez, J.A. Smith, M.P.
Brenner, S. Hoyer, Learning to correct spectral methods for simulating turbulent
flows, 2023, arXiv:2207.00556.
[73] Y. Bar-Sinai, S. Hoyer, J. Hickey, M.P. Brenner, Learning data-driven discretiza-
tions for partial differential equations, Proc. Natl. Acad. Sci. 116 (31) (2019)
15344–15349, http://dx.doi.org/10.1073/pnas.1814058116.
[74] J.E. Shepherd, Detonation in gases, Proc. Combust. Inst. 32 (1) (2009) 83–98.
[75] V. Raman, S. Prakash, M. Gamba, Nonidealities in rotating detonation engines,
Annu. Rev. Fluid Mech. 55 (2023) 639–674.
[76] S. Yungster, K. Radhakrishnan, Pulsating one-dimensional detonations in
hydrogen–air mixtures, Combust. Theory Model. 8 (4) (2004) 745.
[77] S. Sharma, R. Bielawski, O. Gibson, S. Zhang, V. Sharma, A.H. Rauch, J. Singh, S.
Abisleiman, M. Ullman, S. Barwey, et al., An AMReX-based compressible reacting
flow solver for high-speed reacting flows relevant to hypersonic propulsion, 2024,
arXiv preprint arXiv:2412.00900.
[78] W. Zhang, A. Almgren, V. Beckner, J. Bell, J. Blaschke, C. Chan, M. Day, B.
Friesen, K. Gott, D. Graves, et al., AMReX: a framework for block-structured
adaptive mesh refinement, J. Open Source Softw. 4 (37) (2019) 1370.
[79] R. Bielawski, S. Barwey, S. Prakash, V. Raman, Highly-scalable GPU-accelerated
compressible reacting flow solver for modeling high-speed flows, Comput. &
Fluids (2023) 105972.
[80] P. Batten, N. Clarke, C. Lambert, D.M. Causon, On the choice of wavespeeds for
the HLLC Riemann solver, SIAM J. Sci. Comput. 18 (6) (1997) 1553–1570.
[81] D.G. Goodwin, H.K. Moffat, R.L. Speth, Cantera: An object-oriented software
toolkit for chemical kinetics, thermodynamics, and transport processes, 2018.
[82] M. Mueller, T. Kim, R. Yetter, F. Dryer, Flow reactor studies and kinetic modeling
of the H2/O2 reaction, Int. J. Chem. Kinet. 31 (2) (1999) 113–125.
[83] S. Browne, J. Ziegler, N. Bitter, B. Schmidt, J. Lawson, J. Shepherd, SDToolbox:
Numerical tools for shock and detonation wave modeling, 2018.
[84] F. Molteni, Atmospheric simulations using a GCM with simplified physi-
cal parametrizations. I: model climatology and variability in multi-decadal
experiments, Clim. Dyn. 20 (2) (2003) 175–191.
[85] F. Kucharski, F. Molteni, A. Bracco, Decadal interactions between the western
tropical Pacific and the North Atlantic Oscillation, Clim. Dyn. 26 (1) (2006)
79–91.
[86] A.S. Krishnapriyan, A.F. Queiruga, N.B. Erichson, M.W. Mahoney, Learning
continuous models for continuous physics, Commun. Phys. 6 (1) (2023) 319,
http://dx.doi.org/10.1038/s42005-023-01433-4.
[87] A. Mohan, A. Chattopadhyay, J. Miller, What you see is not what you get:
Neural partial differential equations and the illusion of learning, 2024, arXiv:
2411.15101, URL https://arxiv.org/abs/2411.15101.
[88] D. Chakraborty, S. Barwey, H. Zhang, R. Maulik, A note on the error analysis
of data-driven closure models for large eddy simulations of turbulence, 2024,
arXiv:2405.17612, URL https://arxiv.org/abs/2405.17612.
[89] L. Lu, P. Jin, G. Pang, Z. Zhang, G.E. Karniadakis, Learning nonlinear operators
via DeepONet based on the universal approximation theorem of operators, Nat.
Mach. Intell. 3 (3) (2021) 218–229.
[90] Z. Li, N. Kovachki, K. Azizzadenesheli, B. Liu, K. Bhattacharya, A. Stuart,
A. Anandkumar, Fourier neural operator for parametric partial differential
equations, 2020, arXiv preprint arXiv:2010.08895.
In data-driven modeling of spatiotemporal phenomena careful consideration is needed in capturing the dynamics of the high wavenumbers. This problem becomes especially challenging when the system of interest exhibits shocks or chaotic dynamics. We present a data-driven modeling method that accurately captures shocks and chaotic dynamics by proposing a new architecture, stabilized neural ordinary differential equation (ODE). In our proposed architecture, we learn the right-hand-side (RHS) of an ODE by adding the outputs of two NN together where one learns a linear term and the other a nonlinear term. Specifically, we implement this by training a sparse linear convolutional NN to learn the linear term and a dense fully-connected nonlinear NN to learn the nonlinear term. This contrasts with the standard neural ODE which involves training a single NN for the RHS. We apply this setup to the viscous Burgers equation, which exhibits shocked behavior, and show stabilized neural ODEs provide better short-time tracking, prediction of the energy spectrum, and robustness to noisy initial conditions than standard neural ODEs. We also apply this method to chaotic trajectories of the Kuramoto-Sivashinsky equation. In this case, stabilized neural ODEs keep long-time trajectories on the attractor, and are highly robust to noisy initial conditions, while standard neural ODEs fail at achieving either of these results. We conclude by demonstrating how stabilizing neural ODEs provide a natural extension for use in reduced-order modeling by projecting the dynamics onto the eigenvectors of the learned linear term.