John Harlim

Pennsylvania State University | Penn State · Department of Mathematics

PhD

About

84
Publications
9,249
Reads
1,775
Citations
Additional affiliations
July 2018 - present
Pennsylvania State University
Position
  • Professor (Full)
July 2013 - June 2018
Pennsylvania State University
Position
  • Professor (Associate)
July 2009 - June 2013
North Carolina State University
Position
  • Professor (Assistant)

Publications (84)
Article
Full-text available
This letter presents a non-parametric modeling approach for forecasting stochastic dynamical systems on low-dimensional manifolds. The key idea is to represent the discrete shift maps on a smooth basis which can be obtained by the diffusion maps algorithm. In the limit of large data, this approach converges to a Galerkin projection of the semigroup...
Preprint
Full-text available
This article presents a general framework for recovering missing dynamical systems using available data and machine learning techniques. The proposed framework reformulates the prediction problem as a supervised learning problem to approximate a map that takes the memories of the resolved and identifiable unresolved variables to the missing compone...
Preprint
Full-text available
In this paper, we extend the class of kernel methods, the so-called diffusion maps (DM), and its local kernel variants, to approximate second-order differential operators defined on smooth manifolds with boundaries that naturally arise in elliptic PDE models. To achieve this goal, we introduce the Ghost Point Diffusion Maps (GPDM) estimator on an e...
Preprint
Full-text available
This paper studies the theoretical underpinnings of machine learning of ergodic Itô diffusions. The objective is to understand the convergence properties of the invariant statistics when the underlying system of stochastic differential equations (SDEs) is empirically estimated with a supervised regression framework. Using the perturbation theory...
Article
We propose a machine learning (ML) non-Markovian closure modelling framework for accurate predictions of statistical responses of turbulent dynamical systems subjected to external forcings. One of the difficulties in this statistical closure problem is the lack of training data, which is a configuration that is not desirable in supervised learning...
Article
In this paper, we extend the class of kernel methods, the so-called diffusion maps (DM) and its local kernel variants to approximate second-order differential operators defined on smooth manifolds with boundaries that naturally arise in elliptic PDE models. To achieve this goal, we introduce the ghost point diffusion maps (GPDM) estimator on an ext...
Article
This paper develops manifold learning techniques for the numerical solution of PDE-constrained Bayesian inverse problems on manifolds with boundaries. We introduce graphical Matérn-type Gaussian field priors that enable flexible modeling near the boundaries, representing boundary values by superposition of harmonic functions with appropriate Dirich...
Preprint
Full-text available
We study the spectral convergence of a symmetrized Graph Laplacian matrix induced by a Gaussian kernel evaluated on pairs of embedded data, sampled from a manifold with boundary, a sub-manifold of $\mathbb{R}^m$. Specifically, we deduce the convergence rates for eigenpairs of the discrete Graph-Laplacian matrix to the eigensolutions of the Laplace-...
Article
Full-text available
Plain Language Summary: One major challenge in applying machine learning algorithms for predicting the El Niño‐Southern Oscillation is the shortage of observational training data. In this article, a simple and efficient Bayesian machine learning (BML) training algorithm is developed, which exploits only a 20‐year observational time series for traini...
Preprint
Full-text available
In this paper, we consider the density estimation problem associated with the stationary measure of ergodic Itô diffusions from a discrete-time series that approximates the solutions of the stochastic differential equations. To take advantage of the characterization of density function through the stationary solution of a parabolic-type Fokker-Pl...
Preprint
We propose a Machine Learning (ML) non-Markovian closure modeling framework for accurate predictions of statistical responses of turbulent dynamical systems subjected to external forcings. One of the difficulties in this statistical closure problem is the lack of training data, which is a configuration that is not desirable in supervised learning w...
Article
This paper studies the theoretical underpinnings of machine learning of ergodic Itô diffusions. The objective is to understand the convergence properties of the invariant statistics when the underlying system of stochastic differential equations (SDEs) is empirically estimated with a supervised regression framework. Using the perturbation theory of...
Preprint
Full-text available
This paper proposes a mesh-free computational framework and machine learning theory for solving elliptic PDEs on unknown manifolds, identified with point clouds, based on diffusion maps (DM) and deep learning. The PDE solver is formulated as a supervised learning task to solve a least-squares regression problem that imposes an algebraic equation ap...
Preprint
Full-text available
This paper develops manifold learning techniques for the numerical solution of PDE-constrained Bayesian inverse problems on manifolds with boundaries. We introduce graphical Matérn-type Gaussian field priors that enable flexible modeling near the boundaries, representing boundary values by superposition of harmonic functions with appropriate Diri...
Preprint
Full-text available
In this paper, we extend the class of kernel methods, the so-called diffusion maps (DM) and ghost point diffusion maps (GPDM), to solve the time-dependent advection-diffusion PDE on unknown smooth manifolds without and with boundaries. The core idea is to directly approximate the spatial components of the differential operator on the manifold with...
Preprint
Full-text available
A simple and efficient Bayesian machine learning (BML) training and forecasting algorithm, which exploits only a 20-year short observational time series and an approximate prior model, is developed to predict the Niño 3 sea surface temperature (SST) index. The BML forecast significantly outperforms model-based ensemble predictions and standard ma...
Article
Recently, we proposed a method to estimate parameters of stochastic dynamics based on the linear response statistics. The method rests upon a nonlinear least-squares problem that takes into account the response properties that stem from the Fluctuation-Dissipation Theory. In this article, we address an important issue that arises in the presence of...
Article
In this paper, we extend the diffusion maps algorithm on a family of heat kernels that are either local (having exponential decay) or nonlocal (having polynomial decay), arising in various applications. For example, these kernels have been used as a regularizer in various supervised learning tasks for denoising images. Importantly, these heat kerne...
Article
A nonparametric method to predict non-Markovian time series of partially observed dynamics is developed. The prediction problem we consider is a supervised learning task of finding a regression function that takes a delay-embedded observable to the observable at a future time. When delay-embedding theory is applicable, the proposed regression funct...
Preprint
Full-text available
We study the problem of estimating linear response statistics under external perturbations using time series of unperturbed dynamics. Based on the fluctuation-dissipation theory, this problem is reformulated as an unsupervised learning task of estimating a density function. We consider a nonparametric density estimator formulated by the kernel embe...
Article
We study the problem of estimating linear response statistics under external perturbations using time series of unperturbed dynamics. Based on the fluctuation-dissipation theory, this problem is reformulated as an unsupervised learning task of estimating a density function. We consider a nonparametric density estimator formulated by the kernel embe...
Article
This paper investigates the formulation and implementation of Bayesian inverse problems to learn input parameters of partial differential equations (PDEs) defined on manifolds. Specifically, we study the inverse problem of determining the diffusion coefficient of a second-order elliptic PDE on a closed manifold from noisy measurements of the soluti...
Preprint
Full-text available
This short review describes mathematical techniques for statistical analysis and prediction in dynamical systems. Two problems are discussed, namely (i) the supervised learning problem of forecasting the time evolution of an observable under potentially incomplete observations at forecast initialization; and (ii) the unsupervised learning problem o...
Article
This short review describes mathematical techniques for statistical analysis and prediction in dynamical systems. Two problems are discussed, namely (i) the supervised learning problem of forecasting the time evolution of an observable under potentially incomplete observations at forecast initialization; and (ii) the unsupervised learning problem o...
Article
Full-text available
This article presents a general framework for recovering missing dynamical systems using available data and machine learning techniques. The proposed framework reformulates the prediction problem as a supervised learning problem to approximate a map that takes the memories of the resolved and identifiable unresolved variables to the missing compone...
Conference Paper
Full-text available
We examine using discrete backward diffusion to produce digital halftones. The noise introduced by the discrete approximation to backwards diffusion forces the intensity away from uniform values, so that rounding each pixel to black or white can produce a pleasing halftone. We formulate our method by considering the Human Visual System norm and app...
Article
Full-text available
In this paper, we consider modeling missing dynamics with a nonparametric non-Markovian model, constructed using the theory of kernel embedding of conditional distributions on appropriate reproducing kernel Hilbert spaces (RKHS), equipped with orthonormal basis functions. Depending on the choice of the basis functions, the resulting closure model f...
Preprint
Full-text available
A nonparametric method to predict non-Markovian time series of partially observed dynamics is developed. The prediction problem we consider is a supervised learning task of finding a regression function that takes a delay embedded observable to the observable at a future time. When delay embedding theory is applicable, the proposed regression funct...
Preprint
Full-text available
In this paper, we consider modeling missing dynamics with a nonparametric non-Markovian model, constructed using the theory of kernel embedding of conditional distributions on appropriate Reproducing Kernel Hilbert Spaces (RKHS), equipped with orthonormal basis functions. Depending on the choice of the basis functions, the resulting closure model f...
Preprint
Full-text available
Recently, we proposed a method to estimate parameters in stochastic dynamics models based on the linear response statistics. The technique rests upon a nonlinear least-squares problem that takes into account the response properties that stem from the Fluctuation-Dissipation Theory. In this article, we address an important issue that arises in the p...
Preprint
Full-text available
This paper investigates the formulation and implementation of Bayesian inverse problems to learn input parameters of partial differential equations (PDEs) defined on manifolds. Specifically, we study the inverse problem of determining the diffusion coefficient of a second-order elliptic PDE on a closed manifold from noisy measurements of the soluti...
Preprint
A mesh-free numerical method for solving linear elliptic PDEs using the local kernel theory that was developed for manifold learning is proposed. In particular, this novel approach exploits the local kernel theory which allows one to approximate the Kolmogorov operator associated with Itô processes on compact Riemannian manifolds without boundary...
Article
Full-text available
A mesh-free numerical method for solving linear elliptic PDEs using the local kernel theory that was developed for manifold learning is proposed. In particular, this novel approach exploits the local kernel theory which allows one to approximate the Kolmogorov operator associated with Itô processes on compact Riemannian manifolds without boundary...
Article
Full-text available
In this paper, we consider a surrogate modeling approach using a data-driven nonparametric likelihood function constructed on a manifold on which the data lie (or to which they are close). The proposed method represents the likelihood function using a spectral expansion formulation known as the kernel embedding of the conditional distribution. To r...
Preprint
Full-text available
In locally compact, separable metric measure spaces, heat kernels can be classified as either local (having exponential decay) or nonlocal (having polynomial decay). This dichotomy of heat kernels gives rise to operators that include (but are not restricted to) the generators of the classical Laplacian associated to Brownian processes as well as th...
Article
Full-text available
Recently, the theory of diffusion maps was extended to a large class of local kernels with exponential decay which were shown to represent various Riemannian geometries on a data set sampled from a manifold embedded in Euclidean space. Moreover, local kernels were used to represent a diffeomorphism, H, between a data set and a feature of interest u...
Article
Full-text available
An equation-by-equation (EBE) method is proposed to solve a system of nonlinear equations arising from the moment constrained maximum entropy problem of multidimensional variables. The design of the EBE scheme combines ideas from homotopy continuation on Newton's iterative method. Theoretically, we establish the local convergence under appropriate...
Article
Full-text available
Diffusion forecasting is a nonparametric approach that provably solves the Fokker-Planck PDE corresponding to Itô diffusion without knowing the underlying equation. The key idea of this method is to approximate the solution of the Fokker-Planck equation with a discrete representation of the shift (Koopman) operator on a set of basis functions g...
Book
Full-text available
Modern scientific computational methods are undergoing a transformative change; big data and statistical learning methods now have the potential to outperform the classical first-principles modeling paradigm. This book bridges this transition, connecting the theory of probability, stochastic processes, functional analysis, numerical analysis, and d...
Article
Full-text available
This paper demonstrates the efficacy of data-driven localization mappings for assimilating satellite-like observations in a dynamical system of intermediate complexity. In particular, a sparse network of synthetic brightness temperature measurements is simulated using an idealized radiative transfer model and assimilated to the monsoon-Hadley multi...
Article
Full-text available
This paper presents a numerical method to implement the parameter estimation method using response statistics that was recently formulated by the authors. The proposed approach formulates the parameter estimation problem of Itô drift diffusions as a nonlinear least-squares problem. To avoid solving the model repeatedly when using an iterative sche...
Article
Full-text available
This paper presents a new parameter estimation method for Itô diffusions such that the resulting model predicts the equilibrium statistics as well as the sensitivities of the underlying system to external disturbances. Our formulation does not require the knowledge of the underlying system; however, we assume that the linear response statistics...
Article
Full-text available
While the formulation of most data assimilation schemes assumes an unbiased observation model error, in real applications, model error with nontrivial biases is unavoidable. A practical example is errors in the radiative transfer model (which is used to assimilate satellite measurements) in the presence of clouds. Together with the dynamical model...
Chapter
Full-text available
This chapter provides various perspectives on an important challenge in data assimilation: model error. While the overall goal is to understand the implication of model error of any type in data assimilation, we emphasize the effect of model error from unresolved scales. In particular, connection to related subjects under different names in appli...
Article
Full-text available
A data-driven method for improving the correlation estimation in serial ensemble Kalman filters is introduced. The method finds a linear map that transforms, at each assimilation cycle, the poorly estimated sample correlation into an improved correlation. This map is obtained from an offline training procedure without any tuning as the solution of...
Article
Full-text available
In this paper, we apply a recently developed nonparametric modeling approach, the “diffusion forecast”, to predict the time-evolution of Fourier modes of turbulent dynamical systems. While the diffusion forecasting method assumes the availability of a noise-free training data set observing the full state space of the dynamics, in real applications...
Article
In this Reply we provide additional results which allow a better comparison of the diffusion forecast and the "past-noise" forecasting (PNF) approach for the El Niño index. We remark on some qualitative differences between the diffusion forecast and PNF, and we suggest an alternative use of the diffusion forecast for the purposes of forecasting the...
Article
Full-text available
In this paper, a semiparametric modeling approach is introduced as a paradigm for addressing model error arising from unresolved physical phenomena. Our approach compensates for model error by learning an auxiliary dynamical model for the unknown parameters. Practically, the proposed approach consists of the following steps. Given a physics-based m...
Article
Full-text available
This paper presents a computationally fast algorithm for estimating both the system and observation noise covariances of nonlinear dynamics that can be used in an ensemble Kalman filtering framework. The new method is a modification of Belanger's recursive method, to avoid an expensive computational cost in inverting error covariance matrices of...
Article
Full-text available
This paper presents a nonparametric statistical modeling method for quantifying uncertainty in stochastic gradient systems with isotropic diffusion. The central idea is to apply the diffusion maps algorithm on a training data set to produce a stochastic matrix whose generator is a discrete approximation to the backward Kolmogorov operator of the un...
Article
Full-text available
Reduced models for the (defocusing) nonlinear Schrödinger equation are developed. In particular, we develop reduced models that only involve the low-frequency modes given noisy observations of these modes. The ansatz of the reduced parametric models are obtained by employing a rational approximation and a colored noise approximation, respectively,...
Article
In this paper, we introduce an algebraic method to construct stable and consistent univariate autoregressive (AR) models of low order for filtering and predicting nonlinear turbulent signals with memory depth. By stable, we refer to the classical stability condition for the AR model. By consistent, we refer to the classical consistency constraints...
Article
Practical applications of kernel methods often use variable bandwidth kernels, also known as self-tuning kernels, however much of the current theory of kernel based techniques is only applicable to fixed bandwidth kernels. In this paper, we derive the asymptotic expansion of these variable bandwidth kernels for arbitrary bandwidth functions; genera...
Article
Full-text available
In this paper, we study filtering of multiscale dynamical systems with model error arising from limitations in resolving the smaller scale processes. In particular, the analysis assumes the availability of continuous-time noisy observations of all components of the slow variables. Mathematically, this paper presents new results on higher order asym...
Article
Full-text available
A central issue in contemporary science is the development of nonlinear data driven statistical-dynamical models for time series of noisy partial observations from nature or a complex model. It has been established recently that ad-hoc quadratic multi-level regression models can have finite-time blow-up of statistical solutions and/or pathological...
Article
Full-text available
Calculating the statistical linear response of turbulent dynamical systems to the change in external forcing is a problem of wide contemporary interest. Here the authors apply linear regression models with memory, AR(p) models, to approximate this statistical linear response by directly fitting the autocorrelations of the underlying turbulent dynam...
Article
Full-text available
Covariance inflation is an ad hoc treatment that is widely used in practical real-time data assimilation algorithms to mitigate covariance underestimation owing to model errors, nonlinearity, and/or, in the context of ensemble filters, insufficient ensemble size. In this paper, we systematically derive an effective 'statistical' inflation for filte...
Article
In this article, we develop a linear theory for optimal filtering of complex turbulent signals with model errors through linear autoregressive models. We will show that when the autoregressive model parameters are chosen such that they satisfy absolute stability and consistency conditions of at least order-2 of the classical multistep method for so...
Article
In this paper, we consider a practical filtering approach for assimilating irregularly spaced, sparsely observed turbulent signals through a hierarchical Bayesian reduced stochastic filtering framework. The proposed hierarchical Bayesian approach consists of two steps, blending a data-driven interpolation scheme and the Mean Stochastic Model (MSM)...
Article
The filtering/data assimilation and prediction of moisture‐coupled tropical waves is a contemporary topic with significant implications for extended‐range forecasting. The development of efficient algorithms to capture such waves is limited by the unstable multiscale features of tropical convection which can organize large‐scale circulations and th...
Article
Full-text available
Superparameterization is a fast numerical algorithm to mitigate implicit scale separation of dynamical systems with large-scale, slowly varying “mean” and smaller-scale, rapidly fluctuating “eddy” term. The main idea of superparameterization is to embed parallel highly resolved simulations of small-scale eddies on each grid cell of coarsely resolve...
Article
Full-text available
A central issue in contemporary science is the development of data driven statistical nonlinear dynamical models for time series of partial observations of nature or a complex physical model. It has been established recently that ad hoc quadratic multi-level regression (MLR) models can have finite-time blow up of statistical solutions and/or pathol...
Article
Fundamental barriers in practical filtering of nonlinear spatio-temporal chaotic systems are model errors attributed to the stiffness in resolving multiscale features. Recently, reduced stochastic filters based on linear stochastic models have been introduced to overcome such stiffness; one of them is the Mean Stochastic Model (MSM) based on a diag...
Article
Full-text available
This paper presents a fast reduced filtering strategy for assimilating multiscale systems in the presence of observations of only the macroscopic (or large scale) variables. This reduced filtering strategy introduces model errors in estimating the prior forecast statistics through the (heterogeneous multiscale methods) HMM-based reduced climate mod...
Book
Many natural phenomena ranging from climate through to biology are described by complex dynamical systems. Getting information about these phenomena involves filtering noisy data and prediction based on incomplete information (complicated by the sheer number of parameters involved), and often we need to do this in real time, for example for weather...
Article
In this paper, we present a fast numerical strategy for filtering stochastic differential equations with multiscale features. This method is designed such that it does not violate the practical linear observability condition and, more importantly, it does not require the computationally expensive cross correlation statistics between multiscale vari...
Article
Full-text available
We present a numerically fast reduced filtering strategy, the Fourier domain Kalman filter with appropriate interpolations to account for irregularly spaced observations of complex turbulent signals. The design of such a reduced filter involves: (i) interpolating irregularly spaced observations to the model regularly spaced grid points, (ii) unders...
Article
Full-text available
The modus operandi of modern applied mathematics in developing very recent mathematical strategies for filtering turbulent dynamical systems is emphasized here. The approach involves the synergy of rigorous mathematical guidelines, exactly solvable nonlinear models with physical insight, and novel cheap algorithms with judicious model errors to fil...
Article
Full-text available
Filtering sparsely turbulent signals from nature is a central problem of contemporary data assimilation. Here, sparsely observed turbulent signals from nature are generated by solutions of two-layer quasigeostrophic models with turbulent cascades from baroclinic instability in two separate regimes with varying Rossby radius mimicking the atmospher...
Article
Full-text available
Two types of filtering failure are the well known filter divergence where errors may exceed the size of the corresponding true chaotic attractor and the much more severe catastrophic filter divergence where solutions diverge to machine infinity in finite time. In this paper, we demonstrate that these failures occur in filtering the L-96 model, a no...
Article
The filtering skill for turbulent signals from nature is often limited by model errors created by utilizing an imperfect model for filtering. Updating the parameters in the imperfect model through stochastic parameter estimation is one way to increase filtering skill and model performance. Here a suite of stringent test models for filtering with st...
Article
The filtering and predictive skill for turbulent signals is often limited by the lack of information about the true dynamics of the system and by our inability to resolve the assumed dynamics with sufficiently high resolution using the current computing power. The standard approach is to use a simple yet rich family of constant parameters to accoun...
Article
Full-text available
An important emerging scientific issue is the real time filtering through observations of noisy signals for nonlinear dynamical systems as well as the statistical accuracy of spatio-temporal discretizations for filtering such systems. From the practical standpoint, the demand for operationally practical filtering methods escalates as the model reso...
Article
Real time filtering of noisy turbulent signals through sparse observations on a regularly spaced mesh is a notoriously difficult and important prototype filtering problem. Simpler off-line test criteria are proposed here as guidelines for filter performance for these stiff multi-scale filtering problems in the context of linear stochastic partial d...