## About

335 Publications

73,387 Reads

20,884 Citations

## Publications

One of the oldest and most studied subjects in scientific computing is algorithms for solving partial differential equations (PDEs). A long list of numerical methods has been proposed and successfully used for various applications. In recent years, deep learning methods have shown their superiority for high-dimensional PDEs where traditional method...

We report an ab initio multi-scale study of lead titanate using the Deep Potential (DP) models, a family of machine learning-based atomistic models, trained on first-principles density functional theory data, to represent potential and polarization surfaces. Our approach includes anharmonic effects beyond the limitations of reduced models and of th...

Collisions are common in many dynamical systems with real applications. They can be formulated as hybrid dynamical systems with discontinuities automatically triggered when states traverse certain manifolds. We present an algorithm for the optimal control problem of such hybrid dynamical systems, based on solving the equations derived from the hy...

Solving complex optimal control problems has posed computational challenges for a long time. Recent advances in machine learning have provided us with new opportunities to address these challenges. This paper takes the model predictive control, a popular optimal control method, as the primary example to survey recent progress that leverages mac...

To fill the gap between accurate (and expensive) ab initio calculations and efficient atomistic simulations based on empirical interatomic potentials, a new class of descriptions of atomic interactions has emerged and been widely applied; i.e., machine learning potentials (MLPs). One recently developed type of MLP is the Deep Potential (DP) method....

We study the natural function space for infinitely wide two-layer neural networks with ReLU activation (Barron space) and establish different representation formulae. In two cases, we describe the space explicitly up to isomorphism. Using a convenient representation, we study the pointwise properties of two-layer networks and show that functions wh...

Machine learning models for the potential energy of multi-atomic systems, such as the deep potential (DP) model, make molecular simulations with the accuracy of quantum mechanical density functional theory possible at a cost only moderately higher than that of empirical force fields. However, the majority of these models lack explicit long-range in...

We propose a machine learning enhanced algorithm for solving the optimal landing problem. Using Pontryagin's minimum principle, we derive a two-point boundary value problem for the landing problem. The proposed algorithm uses deep learning to predict the optimal landing time and a space-marching technique to provide good initial guesses for the bou...

The generative adversarial network (GAN) is a well-known model for learning high-dimensional distributions, but the mechanism for its generalization ability is not understood. In particular, GAN is vulnerable to the memorization phenomenon, the eventual convergence to the empirical distribution. We consider a simplified GAN model with the generator...

One of the key issues in the analysis of machine learning models is to identify the appropriate function space and norm for the model. This is the set of functions endowed with a quantity which can control the approximation and estimation errors by a particular machine learning model. In this paper, we address this issue for two representative neur...

In recent years, tremendous progress has been made on numerical algorithms for solving partial differential equations (PDEs) in a very high dimension, using ideas from either nonlinear (multilevel) Monte Carlo or deep learning. They are potentially free of the curse of dimensionality for many different applications and have been proven to be so in...

Enhanced sampling methods such as metadynamics and umbrella sampling have become essential tools for exploring the configuration space of molecules and materials. At the same time, they have long faced a number of issues such as the inefficiency when dealing with a large number of collective variables (CVs) or systems with high free energy barriers...

A long-standing problem in the modeling of non-Newtonian hydrodynamics is the availability of reliable and interpretable hydrodynamic models that faithfully encode the underlying micro-scale polymer dynamics. The main complication arises from the long polymer relaxation time, the complex molecular structure, and heterogeneous interaction. DeePN$^2$...

We propose an efficient, reliable, and interpretable global solution method, the Deep learning-based algorithm for Heterogeneous Agent Models (DeepHAM), for solving high-dimensional heterogeneous agent models with aggregate shocks. The state distribution is approximately represented by a set of optimal generalized moments. Deep neural network...

Machine learning models for the potential energy of multi-atomic systems, such as the deep potential (DP) model, make possible molecular simulations with the accuracy of quantum mechanical density functional theory, at a cost only moderately higher than that of empirical force fields. However, the majority of these models lack explicit long-range i...

We introduce a new family of numerical algorithms for approximating solutions of general high-dimensional semilinear parabolic partial differential equations at single space-time points. The algorithm is obtained through a delicate combination of the Feynman–Kac and the Bismut–Elworthy–Li formulas, and an approximate decomposition of the Picard fix...

We propose a systematic method for learning stable and physically interpretable dynamical models using sampled trajectory data from physical processes based on a generalized Onsager principle. The learned dynamics are autonomous ordinary differential equations parametrized by neural networks that retain clear physical structure information, such as...

Using the Deep Potential methodology, we construct a model that reproduces accurately the potential energy surface of the SCAN approximation of density functional theory for water, from low temperature and pressure to about 2400 K and 50 GPa, excluding the vapor stability region. The computational efficiency of the model makes it possible to predic...

Reinforcement learning (RL) algorithms based on high-dimensional function approximation have achieved tremendous empirical success in large-scale problems with an enormous number of states. However, most analysis of such algorithms gives rise to error bounds that involve either the number of states or the number of features. This paper considers th...

Enhanced sampling methods such as metadynamics and umbrella sampling have become essential tools for exploring the configuration space of molecules and materials. At the same time, they have long faced the following dilemma: Since they are only effective with a small number of collective variables (CVs), choosing a proper set of CVs becomes critica...

We propose a unified framework that extends the inference methods for classical hidden Markov models to continuous settings, where both the hidden states and observations occur in continuous time. Two different settings are analyzed: (1) hidden jump process with a finite state space; (2) hidden diffusion process with a continuous state space. For e...

Solid-state electrolyte materials with superior lithium ionic conductivities are vital to the next-generation Li-ion batteries. Molecular dynamics could provide atomic scale information to understand the diffusion process of Li-ion in these superionic conductor materials. Here, we implement the deep potential generator to set up an efficient protoc...

We establish a scale separation of Kolmogorov width type between subspaces of a given Banach space under the condition that a sequence of linear maps converges much faster on one of the subspaces. The general technique is then applied to show that reproducing kernel Hilbert spaces are poor $L^2$...

We present the GPU version of DeePMD-kit, which, upon training a deep neural network model using ab initio data, can drive extremely large-scale molecular dynamics (MD) simulation with ab initio accuracy. Our tests show that for a water system of 12,582,912 atoms, the GPU version can be 7 times faster than the CPU version under the same power consu...

We propose the coarse-grained spectral projection method (CGSP), a deep learning assisted approach for tackling quantum unitary dynamic problems with an emphasis on quench dynamics. We show that CGSP can extract spectral components of many-body quantum states systematically with a sophisticated neural network quantum ansatz. CGSP fully exploits the...

We introduce DeePKS-kit, an open-source software package for developing machine learning based energy and density functional models. DeePKS-kit is interfaced with PyTorch, an open-source machine learning library, and PySCF, an ab initio computational chemistry program that provides simple and customized tools for developing quantum chemistry codes....

A recent numerical study observed that neural network classifiers enjoy a large degree of symmetry in the penultimate layer. Namely, if $h(x) = Af(x) +b$ where $A$ is a linear map and $f$ is the output of the penultimate layer of the network (after activation), then all data points $x_{i, 1}, \dots, x_{i, N_i}$ in a class $C_i$ are mapped to a sing...

We propose a general machine learning-based framework for building an accurate and widely applicable energy functional within the framework of generalized Kohn-Sham density functional theory. To this end, we develop a way of training self-consistent models that are capable of taking large datasets from different systems and different kinds of label...

We use explicit representation formulas to show that solutions to certain partial differential equations can be represented efficiently using artificial neural networks, even in high dimension. Conversely, we present examples in which the solution fails to lie in the function space associated to a neural network under consideration.

Models for learning probability distributions such as generative models and density estimators behave quite differently from models for learning functions. One example is found in the memorization phenomenon, namely the ultimate convergence to the empirical distribution, that occurs in generative adversarial networks (GANs). For this reason, the is...

We introduce a machine-learning-based framework for constructing a continuum non-Newtonian fluid dynamics model directly from a microscale description. Dumbbell polymer solutions are used as examples to demonstrate the essential ideas. To faithfully retain molecular fidelity, we establish a micro-macro correspondence via a set of encoders for the m...

It is not yet clear why Adam-like adaptive gradient algorithms suffer from worse generalization performance than SGD despite their faster training speed. This work aims to explain this generalization gap by analyzing their local convergence behaviors. Specifically, we observe the heavy tails of gradient noise in these algorithms....

We prove that the gradient descent training of a two-layer neural network on empirical or population risk may not decrease population risk at an order faster than $t^{-4/(d-2)}$...

We consider binary and multi-class classification problems using hypothesis classes of neural networks. For a given hypothesis class, we use Rademacher complexity estimates and direct approximation theorems to obtain a priori error estimates for regularized loss functionals.

Neural network-based machine learning is capable of approximating functions in very high dimension with unprecedented efficiency and accuracy. This has opened up many exciting new possibilities, not just in traditional areas of artificial intelligence, but also in scientific computing and computational science. At the same time, machine learning ha...

The purpose of this article is to review the achievements made in the last few years towards the understanding of the reasons behind the success and subtleties of neural network-based machine learning. In the tradition of good old applied mathematics, we will not only give attention to rigorous mathematical results, but also the insight we have gai...

We study the approximation properties and optimization dynamics of recurrent neural networks (RNNs) when applied to learn input-output relationships in temporal data. We consider the simple but representative setting of using continuous-time linear RNNs to learn from data generated by linear relationships. Mathematically, the latter can be understo...

The dynamic behavior of RMSprop and Adam algorithms is studied through a combination of careful numerical experiments and theoretical explanations. Three types of qualitative features are observed in the training loss curve: fast initial convergence, oscillations and large spikes. The sign gradient descent (signGD) algorithm, which is the limit of...

We characterize the meaning of words with language-independent numerical fingerprints, through a mathematical analysis of recurring patterns in texts. Approximating texts by Markov processes on a long-range time scale, we are able to extract topics, discover synonyms, and sketch semantic fields from a particular document of moderate length, without...

We propose a systematic method for learning stable and interpretable dynamical models using sampled trajectory data from physical processes based on a generalized Onsager principle. The learned dynamics are autonomous ordinary differential equations parameterized by neural networks that retain clear physical structure information, such as free ener...

We present a continuous formulation of machine learning, as a problem in the calculus of variations and differential-integral equations, in the spirit of classical numerical analysis. We demonstrate that conventional machine learning models and algorithms, such as the random feature model, the two-layer neural network model and the residual neural...

The random feature model exhibits a kind of resonance behavior when the number of parameters is close to the training sample size. This behavior is characterized by the appearance of a large generalization gap, and is due to the occurrence of very small eigenvalues for the associated Gram matrix. In this paper, we examine the dynamic behavior of the...

We introduce the Deep Post–Hartree–Fock (DeePHF) method, a machine learning based scheme for constructing accurate and transferable models for the ground-state energy of electronic structure problems. DeePHF predicts the energy difference between results of highly accurate models such as the coupled cluster method and low accuracy models such as the...

We propose a general machine learning-based framework for building an accurate and widely-applicable energy functional within the framework of generalized Kohn-Sham density functional theory. To this end, we develop a way of training self-consistent models that are capable of taking large datasets from different systems and different kinds of label...

We develop Banach spaces for ReLU neural networks of finite depth $L$ and infinite width. The spaces contain all finite fully connected $L$-layer networks and their $L^2$-limiting objects under bounds on the natural path-norm. Under this norm, the unit ball in the space for $L$-layer networks has low Rademacher complexity and thus favorable general...

We introduce a deep neural network to model in a symmetry preserving way the environmental dependence of the centers of the electronic charge. The model learns from ab initio density functional theory, wherein the electronic centers are uniquely assigned by the maximally localized Wannier functions. When combined with the deep potential model of th...

We propose the coarse-grained spectral projection method (CGSP), a deep learning approach for tackling quantum unitary dynamic problems with an emphasis on quench dynamics. We show that CGSP can extract spectral components of many-body quantum states systematically with a highly entangled neural network quantum ansatz. CGSP exploits fully the linear unita...

A numerical and phenomenological study of the gradient descent (GD) algorithm for training two-layer neural network models is carried out for different parameter regimes when the target function can be accurately approximated by a relatively small number of neurons. It is found that for Xavier-like initialization, there are two distinctive phases i...

We study the natural function space for infinitely wide two-layer neural networks and establish different representation formulae. In two cases, we describe the space explicitly up to isomorphism. Using a convenient representation, we study the pointwise properties of two-layer networks and show that functions whose singular set is fractal or curve...

It has been a challenge to accurately simulate Li-ion diffusion processes in battery materials at room temperature using ab initio molecular dynamics (AIMD) due to its high computational cost. This situation has changed drastically in recent years due to the advances in machine learning-based interatomic potentials. Here we implement the Deep...

Machine learning is poised as a very powerful tool that can drastically improve our ability to carry out scientific research. However, many issues need to be addressed before this becomes a reality. This article focuses on one particular issue of broad interest: How can we integrate machine learning with physics-based modeling to develop new interp...

We prove that the gradient descent training of a two-layer neural network on empirical or population risk may not decrease population risk at an order faster than $t^{-4/(d-2)}$ under mean field scaling. Thus gradient descent training for fitting reasonably smooth, but truly high-dimensional data may be subject to the curse of dimensionality. We pr...

We establish a scale separation of Kolmogorov width type between subspaces of a given Banach space under the condition that a sequence of linear maps converges much faster on one of the subspaces. The general technique is then applied to show that reproducing kernel Hilbert spaces are poor $L^2$-approximators for the class of two-layer neural netwo...

Spatial artificial neural network (ANN) models are developed for subgrid-scale (SGS) forces in the large eddy simulation (LES) of turbulence. The input features are based on the first-order derivatives of the filtered velocity field at different spatial locations. The correlation coefficients of SGS forces predicted by the spatial artificial neural...

For 35 years, ab initio molecular dynamics (AIMD) has been the method of choice for understanding complex materials and molecules at the atomic scale from first principles. However, most applications of AIMD are limited to systems with thousands of atoms due to the high computational complexity. We report that a machine learning-based molecul...

We introduce the Deep Post-Hartree-Fock (DeePHF) method, a machine learning based scheme for constructing accurate and transferable models for the ground-state energy of electronic structure problems. DeePHF predicts the energy difference between results of highly accurate models such as the coupled cluster method and low accuracy models such as th...

We present the GPU version of DeePMD-kit, which, upon training a deep neural network model using ab initio data, can drive extremely large-scale molecular dynamics (MD) simulation with ab initio accuracy. Our tests show that the GPU version is 7 times faster than the CPU version with the same power consumption. The code can scale up to the entire S...

We introduce a machine-learning-based framework for constructing a continuum non-Newtonian fluid dynamics model directly from a micro-scale description. A polymer solution is used as an example to demonstrate the essential ideas. To faithfully retain molecular fidelity, we establish a micro-macro correspondence via a set of encoders for the micro-scale...

In recent years, promising deep learning based interatomic potential energy surface (PES) models have been proposed that can potentially allow us to perform molecular dynamics simulations for large scale systems with quantum accuracy. However, making these models truly reliable and practically useful is still a very non-trivial task. A key componen...

A fairly comprehensive analysis is presented for the gradient descent dynamics for training two-layer neural network models in the situation when the parameters in both layers are updated. General initialization schemes as well as general regimes for the network width and training data size are considered. In the over-parametrized regime, it is sho...

We present a continuous formulation of machine learning, as a problem in the calculus of variations and differential-integral equations, very much in the spirit of classical numerical analysis and statistical physics. We demonstrate that conventional machine learning models and algorithms, such as the random feature model, the shallow neural networ...

We study the generalization properties of minimum-norm solutions for three over-parametrized machine learning models: the random feature model, the two-layer neural network model and the residual network model. We prove that for all three models, the generalization error for the minimum-norm solution is comparable to the Monte Carlo rate,...

A framework is introduced for constructing interpretable and truly reliable reduced models for multiscale problems in situations without scale separation. Hydrodynamic approximation to the kinetic equation is used as an example to illustrate the main steps and issues involved. To this end, a set of generalized moments are constructed first to optim...

A comprehensive microscopic understanding of ambient liquid water is a major challenge for ab initio simulations as it simultaneously requires an accurate quantum mechanical description of the underlying potential energy surface (PES) as well as extensive sampling of configuration space. Due to the presence of light atoms (e.g. H or D), nuclear qua...

We introduce a new family of trial wave-functions based on deep neural networks to solve the many-electron Schrödinger equation. The Pauli exclusion principle is dealt with explicitly to ensure that the trial wave-functions are physical. The optimal trial wave-function is obtained through variational Monte Carlo and the computational cost scales qu...

High-dimensional partial differential equations (PDE) appear in a number of models from the financial industry, such as in derivative pricing models, credit valuation adjustment (CVA) models, or portfolio optimization models. The PDEs in such applications are high-dimensional as the dimension corresponds to the number of financial assets in a portf...

Inspired by chemical kinetics and neurobiology, we propose a mathematical theory for pattern recurrence in text documents, applicable to a wide variety of languages. We present a Markov model at the discourse level for Steven Pinker's "mentalese", or chains of mental states that transcend the spoken/written forms. Such (potentially) universal tem...

A new framework is introduced for c