# Arnulf Jentzen

The Chinese University of Hong Kong, Shenzhen & University of Münster

PhD

## About

Publications: 193

Reads: 54,557

Citations: 10,309

Additional affiliations

September 2012 - August 2016

May 2011 - August 2012

December 2009 - October 2010

## Publications

Deep learning methods - consisting of a class of deep neural networks (DNNs) trained by a stochastic gradient descent (SGD) optimization method - are nowadays key tools to solve data driven supervised learning problems. Despite the great success of SGD methods in the training of DNNs, it remains a fundamental open problem of research to explain the...

Recently, so-called full-history recursive multilevel Picard (MLP) approximation schemes have been introduced and shown to overcome the curse of dimensionality in the numerical approximation of semilinear parabolic partial differential equations (PDEs) with Lipschitz nonlinearities. The key contribution of this article is to introduce and analyze a...

In this work we establish weak convergence rates for temporal discretisations of stochastic wave equations with multiplicative noise, in particular, for the hyperbolic Anderson model. For this class of stochastic partial differential equations the weak convergence rates we obtain are indeed twice the known strong rates. To the best of our knowledge...

Dynamical systems theory has recently been applied in optimization to prove that gradient descent algorithms bypass so-called strict saddle points of the loss function. However, in many modern machine learning applications, the required regularity conditions are not satisfied. In this paper, we prove a variant of the relevant dynamical systems resu...

The approximation of solutions of partial differential equations (PDEs) with numerical algorithms is a central topic in applied mathematics. For many decades, various types of methods for this purpose have been developed and extensively studied. One class of methods which has received a lot of attention in recent years are machine learning-based me...

Stochastic gradient descent (SGD) optimization methods are nowadays the method of choice for the training of deep neural networks (DNNs) in artificial intelligence systems. In practically relevant training problems, usually not the plain vanilla standard SGD method is the employed optimization scheme but instead suitably accelerated and adaptive SG...

Deep learning algorithms - typically consisting of a class of deep neural networks trained by a stochastic gradient descent (SGD) optimization method - are nowadays the key ingredients in many artificial intelligence (AI) systems and have revolutionized our ways of working and living in modern societies. For example, SGD methods are used to train p...

It is known that the standard stochastic gradient descent (SGD) optimization method, as well as accelerated and adaptive SGD optimization methods such as the Adam optimizer fail to converge if the learning rates do not converge to zero (as, for example, in the situation of constant learning rates). Numerical simulations often use human-tuned determ...

It is a challenging topic in applied mathematics to solve high-dimensional nonlinear partial differential equations (PDEs). Standard approximation methods for nonlinear PDEs suffer under the curse of dimensionality (COD) in the sense that the number of computational operations of the approximation method grows at least exponentially in the PDE dime...

Stochastic gradient descent (SGD) optimization methods such as the plain vanilla SGD method and the popular Adam optimizer are nowadays the method of choice in the training of artificial neural networks (ANNs). Despite the remarkable success of SGD methods in the ANN training in numerical simulations, it remains in essentially all practical relevan...

Nonlinear partial differential equations (PDEs) are used to model dynamical processes in a large number of scientific fields, ranging from finance to biology. In many applications standard local models are not sufficient to accurately account for certain non-local phenomena such as, e.g., interactions at a distance. Non-local nonlinear PDE models c...

This book aims to provide an introduction to the topic of deep learning algorithms. We review essential components of deep learning algorithms in full mathematical detail including different artificial neural network (ANN) architectures (such as fully-connected feedforward ANNs, convolutional ANNs, recurrent ANNs, residual ANNs, and ANNs with batch...

In financial engineering, prices of financial products are computed approximately many times each trading day with (slightly) different parameters in each calculation. In many financial models such prices can be approximated by means of Monte Carlo (MC) simulations. To obtain a good approximation the MC sample size usually needs to be considerably...

Artificial neural networks (ANNs) have very successfully been used in numerical simulations for a series of computational problems ranging from image classification/image recognition, speech recognition, time series analysis, game intelligence, and computational advertising to numerical approximations of partial differential equations (PDEs). Such...

Discrete time stochastic optimal control problems and Markov decision processes (MDPs), respectively, serve as fundamental models for problems that involve sequential decision making under uncertainty and as such constitute the theoretical foundation of reinforcement learning. In this article we study the numerical approximation of MDPs with infini...

Although deep learning-based approximation algorithms have been applied very successfully to numerous problems, at the moment the reasons for their performance are not entirely understood from a mathematical point of view. Recently, estimates for the convergence of the overall error have been obtained in the situation of deep supervised learning, b...

The training of artificial neural networks (ANNs) with rectified linear unit (ReLU) activation via gradient descent (GD) type optimization schemes is nowadays a common industrially relevant procedure. GD type optimization schemes can be regarded as temporal discretization methods for the gradient flow (GF) differential equations associated to the c...

Many mathematical convergence results for gradient descent (GD) based algorithms employ the assumption that the GD process is (almost surely) bounded and, also in concrete numerical simulations, divergence of the GD process may slow down, or even completely rule out, convergence of the error function. In practically relevant learning problems, it thu...

In this article we propose a new deep learning approach to solve parametric partial differential equations (PDEs) approximately. In particular, we introduce a new strategy to design specific artificial neural network (ANN) architectures in conjunction with specific ANN initialization schemes which are tailor-made for the particular scientific compu...

In this article we study high-dimensional approximation capacities of shallow and deep artificial neural networks (ANNs) with the rectified linear unit (ReLU) activation. In particular, it is a key contribution of this work to reveal that for all $a,b\in\mathbb{R}$ with $b-a\geq 7$ we have that the functions $[a,b]^d\ni x=(x_1,\dots,x_d)\mapsto\pro...

Over the last few years deep artificial neural networks (ANNs) have very successfully been used in numerical simulations for a wide variety of computational problems including computer vision, image classification, speech recognition, natural language processing, as well as computational advertisement. In addition, it has recently been proposed to...

Gradient descent (GD) methods for the training of artificial neural networks (ANNs) belong nowadays to the most heavily employed computational schemes in the digital world. Despite the compelling success of such methods, it remains an open problem to provide a rigorous theoretical justification for the success of GD methods in the training of ANNs....

In this article we investigate blow up phenomena for gradient descent optimization methods in the training of artificial neural networks (ANNs). Our theoretical analysis is focused on shallow ANNs with one neuron on the input layer, one neuron on the output layer, and one hidden layer. For ANNs with ReLU activation and at least two neurons on the h...

In this paper we develop a numerical method for efficiently approximating solutions of certain Zakai equations in high dimensions. The key idea is to transform a given Zakai SPDE into a PDE with random coefficients. We show that under suitable regularity assumptions on the coefficients of the Zakai equation the corresponding random PDE admits a sol...

In this article we study the stochastic gradient descent (SGD) optimization method in the training of fully connected feedforward artificial neural networks with ReLU activation. The main result of this work proves that the risk of the SGD process converges to zero if the target function under consideration is constant. In the established convergen...

Dynamical systems theory has recently been applied in optimization to prove that gradient descent algorithms avoid so-called strict saddle points of the loss function. However, in many modern machine learning applications, the required regularity conditions are not satisfied. In particular, this is the case for rectified linear unit (ReLU) networks...

In the past few years deep artificial neural networks (DNNs) have been successfully employed in a large number of computational problems including, e.g., language processing, image recognition, fraud detection, and computational advertisement. Recently, it has also been proposed in the scientific literature to reformulate high-dimensional partial d...

Gradient descent (GD) type optimization schemes are the standard methods to train artificial neural networks (ANNs) with rectified linear unit (ReLU) activation. Such schemes can be considered as discretizations of the corresponding gradient flows (GFs). In this work we analyze GF processes in the training of ANNs with ReLU activation and three lay...

Backward stochastic differential equations (BSDEs) belong nowadays to the most frequently studied equations in stochastic analysis and computational stochastics. BSDEs in applications are often nonlinear and high-dimensional. In nearly all cases such nonlinear high-dimensional BSDEs cannot be solved explicitly and it has been and still is a very ac...

The training of artificial neural networks (ANNs) is nowadays a highly relevant algorithmic procedure with many applications in science and industry. Roughly speaking, ANNs can be regarded as iterated compositions between affine linear functions and certain fixed nonlinear functions, which are usually multidimensional versions of a one-dimensional...

In this paper, we analyze the landscape of the true loss of neural networks with one hidden layer and ReLU, leaky ReLU, or quadratic activation. In all three cases, we provide a complete classification of the critical points in the case where the target function is affine and one-dimensional. In particular, we show that there exist no local maxima...

It is an elementary fact in the scientific literature that the Lipschitz norm of the realization function of a feedforward fully-connected rectified linear unit (ReLU) artificial neural network (ANN) can, up to a multiplicative constant, be bounded from above by sums of powers of the norm of the ANN parameter vector. Roughly speaking, in this work...

Nonlinear partial differential equations (PDEs) are used to model dynamical processes in a large number of scientific fields, ranging from finance to biology. In many applications standard local models are not sufficient to accurately account for certain non-local phenomena such as, e.g., interactions at a distance. In order to properly capture the...

The scientific literature contains a number of numerical approximation results for stochastic partial differential equations (SPDEs) with superlinearly growing nonlinearities but, to the best of our knowledge, none of them prove strong or weak convergence rates for full-discrete numerical approximations of space-time white noise driven SPDEs with s...

Deep learning algorithms have been applied very successfully in recent years to a range of problems out of reach for classical solution paradigms. Nevertheless, there is no completely rigorous mathematical error and convergence analysis which explains the success of deep learning algorithms. The error of a deep learning algorithm can in many situat...

Gradient descent (GD) type optimization schemes are the standard instruments to train fully connected feedforward artificial neural networks (ANNs) with rectified linear unit (ReLU) activation and can be considered as temporal discretizations of solutions of gradient flow (GF) differential equations. It has recently been proved that the risk of eve...

In financial engineering, prices of financial products are computed approximately many times each trading day with (slightly) different parameters in each calculation. In many financial models such prices can be approximated by means of Monte Carlo (MC) simulations. To obtain a good approximation the MC sample size usually needs to be considerably...

We analyze approximation rates by deep ReLU networks of a class of multivariate solutions of Kolmogorov equations which arise in option pricing. Key technical devices are deep ReLU architectures capable of efficiently approximating tensor products. Combining this with results concerning the approximation of well-behaved (i.e., fulfilling some smoot...

Gradient descent optimization algorithms are the standard ingredients that are used to train artificial neural networks (ANNs). Even though a huge number of numerical simulations indicate that gradient descent optimization methods do indeed converge in the training of ANNs, until today there is no rigorous theoretical analysis which proves (or disp...

It is one of the most challenging problems in applied mathematics to approximatively solve high-dimensional partial differential equations (PDEs). Recently, several deep learning-based approximation algorithms for attacking this problem have been proposed and tested numerically on a number of examples of high-dimensional PDEs. This has given rise t...

The purpose of this article is to develop machinery to study the capacity of deep neural networks (DNNs) to approximate high-dimensional functions. In particular, we show that DNNs have the expressive power to overcome the curse of dimensionality in the approximation of a large class of functions. More precisely, we prove that these functions can b...

In this article we study fully-connected feedforward deep ReLU ANNs with an arbitrarily large number of hidden layers and we prove convergence of the risk of the GD optimization method with random initializations in the training of such ANNs under the assumption that the unnormalized probability density function of the probability distribution of t...

In many numerical simulations stochastic gradient descent (SGD) type optimization methods perform very effectively in the training of deep neural networks (DNNs) but till this day it remains an open problem of research to provide a mathematical convergence analysis which rigorously explains the success of SGD type optimization methods in the traini...

In recent years, tremendous progress has been made on numerical algorithms for solving partial differential equations (PDEs) in a very high dimension, using ideas from either nonlinear (multilevel) Monte Carlo or deep learning. They are potentially free of the curse of dimensionality for many different applications and have been proven to be so in...

Stochastic wave equations appear in several models for evolutionary processes subject to random forces, such as the motion of a strand of DNA in a liquid or heat flow around a ring. Semilinear stochastic wave equations can typically not be solved explicitly, but the literature contains a number of results which show that numerical approximation pro...

We introduce a new family of numerical algorithms for approximating solutions of general high-dimensional semilinear parabolic partial differential equations at single space-time points. The algorithm is obtained through a delicate combination of the Feynman–Kac and the Bismut–Elworthy–Li formulas, and an approximate decomposition of the Picard fix...

Full-history recursive multilevel Picard (MLP) approximation schemes have been shown to overcome the curse of dimensionality in the numerical approximation of high-dimensional semilinear partial differential equations (PDEs) with general time horizons and Lipschitz continuous nonlinearities. However, each of the error analyses for MLP approximation...

In this article we investigate the spatial Sobolev regularity of mild solutions to stochastic Burgers equations with additive trace class noise. Our findings are based on a combination of suitable bootstrap-type arguments and a detailed analysis of the nonlinearity in the equation.

Stochastic differential equations (SDEs) and the Kolmogorov partial differential equations (PDEs) associated to them have been widely used in models from engineering, finance, and the natural sciences. In particular, SDEs and Kolmogorov PDEs, respectively, are highly employed in models for the approximative pricing of financial derivatives. Kolmogo...

Backward stochastic differential equations (BSDEs) belong nowadays to the most frequently studied equations in stochastic analysis and computational stochastics. BSDEs in applications are often nonlinear and high-dimensional. In nearly all cases such nonlinear high-dimensional BSDEs cannot be solved explicitly and it has been and still is a very ac...

The training of artificial neural networks (ANNs) with rectified linear unit (ReLU) activation via gradient descent (GD) type optimization schemes is nowadays a common industrially relevant procedure. Till this day in the scientific literature there is in general no mathematical convergence analysis which explains the numerical success of GD type o...

Gradient descent (GD) type optimization methods are the standard instrument to train artificial neural networks (ANNs) with rectified linear unit (ReLU) activation. Despite the great success of GD type optimization methods in numerical simulations for the training of ANNs with ReLU activation, it remains - even in the simplest situation of the plai...

Recently, artificial neural networks (ANNs) in conjunction with stochastic gradient descent optimization methods have been employed to approximately compute solutions of possibly rather high-dimensional partial differential equations (PDEs). Very recently, there have also been a number of rigorous mathematical results in the scientific literature,...

Gradient descent (GD) type optimization schemes are the standard methods to train artificial neural networks (ANNs) with rectified linear unit (ReLU) activation. Such schemes can be considered as discretizations of gradient flows (GFs) associated to the training of ANNs with ReLU activation and most of the key difficulties in the mathematical conve...

The classical Feynman–Kac identity builds a bridge between stochastic analysis and partial differential equations (PDEs) by providing stochastic representations for classical solutions of linear Kolmogorov PDEs. This opens the door for the derivation of sampling based Monte Carlo approximation methods, which can be meshfree and thereby stand a chan...

Partial differential equations (PDEs) are a fundamental tool in the modeling of many real-world phenomena. In a number of such real-world phenomena the PDEs under consideration contain gradient-dependent nonlinearities and are high-dimensional. Such high-dimensional nonlinear PDEs can in nearly all cases not be solved explicitly, and it is one of t...

Nowadays many financial derivatives, such as American or Bermudan options, are of early exercise type. Often the pricing of early exercise options gives rise to high-dimensional optimal stopping problems, since the dimension corresponds to the number of underlying assets. High-dimensional optimal stopping problems are, however, notoriously difficul...

In this article we study the stochastic gradient descent (SGD) optimization method in the training of fully-connected feedforward artificial neural networks with ReLU activation. The main result of this work proves that the risk of the SGD process converges to zero if the target function under consideration is constant. In the established convergen...

Recently in [M. Hairer, M. Hutzenthaler, and A. Jentzen, Ann. Probab. 43, 2 (2015), 468–527] and [A. Jentzen, T. Müller-Gronbach, and L. Yaroslavtseva, Commun. Math. Sci. 14, 6 (2016), 1477–1500] stochastic differential equations (SDEs) with smooth coefficient functions have been constructed which have an arbitrarily slowly converging modulus of co...

In this paper, we analyze the landscape of the true loss of a ReLU neural network with one hidden layer. We provide a complete classification of the critical points in the case where the target function is affine. In particular, we prove that local minima and saddle points have to be of a special form and show that there are no local maxima. Our ap...

Artificial neural networks (ANNs) have become a very powerful tool in the approximation of high-dimensional functions. Especially, deep ANNs, consisting of a large number of hidden layers, have been very successfully used in a series of practically relevant computational problems involving high-dimensional input data ranging from classification tasks...

We consider ordinary differential equations (ODEs) which involve expectations of a random variable. These ODEs are special cases of McKean-Vlasov stochastic differential equations (SDEs). A plain vanilla Monte Carlo approximation method for such ODEs requires a computational cost of order $\varepsilon^{-3}$ to achieve a root-mean-square error of si...

In recent years, artificial neural networks have developed into a powerful tool for dealing with a multitude of problems for which classical solution approaches reach their limits. However, it is still unclear why randomly initialized gradient descent optimization algorithms, such as the well-known batch gradient descent, are able to achieve zero t...

Gradient descent optimization algorithms are the standard ingredients that are used to train artificial neural networks (ANNs). Even though a huge number of numerical simulations indicate that gradient descent optimization methods do indeed converge in the training of ANNs, until today there is no rigorous theoretical analysis which proves (or d...

In this article, we develop a framework for showing that neural networks can overcome the curse of dimensionality in different high-dimensional approximation problems. Our approach is based on the notion of a catalog network, which is a generalization of a standard neural network in which the nonlinear activation functions can vary from layer to la...

In the recent article [A. Jentzen, B. Kuckuck, T. Müller-Gronbach, and L. Yaroslavtseva, J. Math. Anal. Appl. 502, 2 (2021)] it has been proved that the solutions to every additive noise driven stochastic differential equation (SDE) which has a drift coefficient function with at most polynomially growing first order partial derivatives and which ad...

In this paper, we introduce a numerical method for nonlinear parabolic partial differential equations (PDEs) that combines operator splitting with deep learning. It divides the PDE approximation problem into a sequence of separate learning problems. Since the computational graph for each of the subproblems is comparatively small, the approach can h...

It is one of the most challenging problems in applied mathematics to approximatively solve high-dimensional partial differential equations (PDEs). Recently, several deep learning-based approximation algorithms for attacking this problem have been proposed and tested numerically on a number of examples of high-dimensional PDEs. This has given rise t...

It has long been well known that high-dimensional linear parabolic partial differential equations (PDEs) can be approximated by Monte Carlo methods with a computational effort which grows polynomially both in the dimension and in the reciprocal of the prescribed accuracy. In other words, linear PDEs do not suffer from the curse of dimensional...

Although deep learning based approximation algorithms have been applied very successfully to numerous problems, at the moment the reasons for their performance are not entirely understood from a mathematical point of view. Recently, estimates for the convergence of the overall error have been obtained in the situation of deep supervised learning, b...

One of the most challenging problems in applied mathematics is the approximate solution of nonlinear partial differential equations (PDEs) in high dimensions. Standard deterministic approximation methods like finite differences or finite elements suffer from the curse of dimensionality in the sense that the computational effort grows exponentially...

In this paper we develop a new machinery to study the capacity of artificial neural networks (ANNs) to approximate high-dimensional functions without suffering from the curse of dimensionality. Specifically, we introduce a concept which we refer to as approximation spaces of artificial neural networks and we present several tools to handle those sp...

In this article we introduce and study a deep learning based approximation algorithm for solutions of stochastic partial differential equations (SPDEs). In the proposed approximation algorithm we employ a deep neural network for every realization of the driving noise process of the SPDE to approximate the solution process of the SPDE under consider...

In this article we establish exponential moment bounds, moment bounds in fractional order smoothness spaces, a uniform Hölder continuity in time, and strong convergence rates for a class of fully discrete exponential Euler-type numerical approximations of infinite dimensional stochastic convolution processes. The considered approximations involve s...

Deep neural networks have successfully been trained in various application areas with stochastic gradient descent. However, there exists no rigorous mathematical explanation why this works so well. The training of neural networks with stochastic gradient descent has four different discretization parameters: (i) the network architecture; (ii) the am...

The approximative calculation of iterated nested expectations is a recurring challenging problem in applications. Nested expectations appear, for example, in the numerical approximation of solutions of backward stochastic differential equations (BSDEs), in the numerical approximation of solutions of semilinear parabolic partial differential equatio...

The recently introduced full-history recursive multilevel Picard (MLP) approximation methods have turned out to be quite successful in the numerical approximation of solutions of high-dimensional nonlinear PDEs. In particular, there are mathematical convergence results in the literature which prove that MLP approximation methods do overcome the cur...

In recent years, tremendous progress has been made on numerical algorithms for solving partial differential equations (PDEs) in a very high dimension, using ideas from either nonlinear (multilevel) Monte Carlo or deep learning. They are potentially free of the curse of dimensionality for many different applications and have been proven to be so in...

In this paper we introduce a deep learning method for pricing and hedging American-style options. It first computes a candidate optimal stopping policy. From there it derives a lower bound for the price. Then it calculates an upper bound, a point estimate and confidence intervals. Finally, it constructs an approximate dynamic hedging strategy. We t...

Stochastic gradient descent (SGD) type optimization schemes are fundamental ingredients in a large number of machine learning based algorithms. In particular, SGD type optimization schemes are frequently employed in applications involving natural language processing, object and face recognition, fraud detection, computational advertisement, and num...

Deep neural networks have successfully been trained in various application areas with stochastic gradient descent. However, there exists no rigorous mathematical explanation why this works so well. The training of neural networks with stochastic gradient descent has four different discretization parameters: (i) the network architecture; (ii) the si...

This article establishes optimal upper and lower error estimates for strong full-discrete numerical approximations of the stochastic heat equation driven by space-time white noise. Thereby, this work proves the optimality of the strong convergence rates for certain full-discrete approximations of stochastic Allen–Cahn equations with space-time whit...

It is one of the most challenging issues in applied mathematics to approximately solve high-dimensional partial differential equations (PDEs) and most of the numerical approximation methods for PDEs in the scientific literature suffer from the so-called curse of dimensionality in the sense that the number of computational operations employed in the...

One of the most challenging issues in applied mathematics is to develop and analyze algorithms which are able to approximately compute solutions of high-dimensional nonlinear partial differential equations (PDEs). In particular, it is very hard to develop approximation algorithms which do not suffer under the curse of dimensionality in the sense th...

Deep neural networks and other deep learning methods have very successfully been applied to the numerical approximation of high-dimensional nonlinear parabolic partial differential equations (PDEs), which are widely used in finance, engineering, and natural sciences. In particular, simulations indicate that algorithms based on deep learning overcom...

The classical Feynman-Kac identity builds a bridge between stochastic analysis and partial differential equations (PDEs) by providing stochastic representations for classical solutions of linear Kolmogorov PDEs. This opens the door for the derivation of sampling based Monte Carlo approximation methods, which can be meshfree and thereby stand a chan...

In spite of the accomplishments of deep learning based algorithms in numerous applications and very broad corresponding research interest, at the moment there is still no rigorous understanding of the reasons why such algorithms produce useful results in certain situations. A thorough mathematical analysis of deep learning based algorithms seems to...