Lorenz Richter’s research while affiliated with Zuse-Institut Berlin and other places


Publications (27)


Dynamical Measure Transport and Neural PDE Solvers for Sampling
  • Preprint
  • File available

July 2024 · 112 Reads
Jingtong Sun · [...] · Lorenz Richter · [...] · Anima Anandkumar

The task of sampling from a probability density can be approached as transporting a tractable density function to the target, known as dynamical measure transport. In this work, we tackle it through a principled unified framework using deterministic or stochastic evolutions described by partial differential equations (PDEs). This framework incorporates prior trajectory-based sampling methods, such as diffusion models or Schrödinger bridges, without relying on the concept of time-reversals. Moreover, it allows us to propose novel numerical methods for solving the transport task and thus sampling from complicated targets without the need for the normalization constant or data samples. We employ physics-informed neural networks (PINNs) to approximate the respective PDE solutions, offering both conceptual and computational advantages. In particular, PINNs allow for simulation- and discretization-free optimization and can be trained very efficiently, leading to significantly better mode coverage in the sampling task compared to alternative methods. Moreover, they can readily be fine-tuned with Gauss-Newton methods to achieve high accuracy in sampling.
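The abstract's reliance on physics-informed neural networks can be illustrated generically. The sketch below (PyTorch) trains a network to minimize the squared residual of a simple heat-type equation at random collocation points; the network architecture, the stand-in PDE, and all hyperparameters are illustrative assumptions, and the paper's actual transport PDEs and loss terms (including terminal conditions) differ.

```python
# Minimal PINN-style residual training sketch (illustrative only; not the
# paper's exact transport PDE or architecture). Terminal/boundary conditions
# would contribute additional loss terms and are omitted here.
import torch
import torch.nn as nn

class UNet(nn.Module):
    """Small MLP taking (t, x) and returning a scalar u(t, x)."""
    def __init__(self, dim_x: int, width: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim_x + 1, width), nn.Tanh(),
            nn.Linear(width, width), nn.Tanh(),
            nn.Linear(width, 1),
        )

    def forward(self, t, x):
        return self.net(torch.cat([t, x], dim=-1))

def pde_residual(u_net, t, x):
    """Residual of a generic heat-type equation d_t u = 0.5 * Laplace(u),
    used here only as a stand-in for the transport PDEs in the paper."""
    t = t.requires_grad_(True)
    x = x.requires_grad_(True)
    u = u_net(t, x)
    du_dt = torch.autograd.grad(u.sum(), t, create_graph=True)[0]
    du_dx = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
    lap = 0.0
    for i in range(x.shape[-1]):  # accumulate second derivatives per dimension
        d2 = torch.autograd.grad(du_dx[:, i].sum(), x, create_graph=True)[0][:, i:i + 1]
        lap = lap + d2
    return du_dt - 0.5 * lap

dim_x = 2
u_net = UNet(dim_x)
opt = torch.optim.Adam(u_net.parameters(), lr=1e-3)
for step in range(1000):
    t = torch.rand(256, 1)       # collocation times in [0, 1]
    x = torch.randn(256, dim_x)  # collocation points in space
    loss = pde_residual(u_net, t, x).pow(2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```

The key property alluded to in the abstract is visible here: training only requires evaluating the PDE residual at sampled points, with no trajectory simulation or time discretization.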



An optimal control perspective on diffusion-based generative modeling

February 2024 · 1,418 Reads · 36 Citations

We establish a connection between stochastic optimal control and generative models based on stochastic differential equations (SDEs), such as recently developed diffusion probabilistic models. In particular, we derive a Hamilton-Jacobi-Bellman equation that governs the evolution of the log-densities of the underlying SDE marginals. This perspective allows us to transfer methods from optimal control theory to generative modeling. First, we show that the evidence lower bound is a direct consequence of the well-known verification theorem from control theory. Further, we can formulate diffusion-based generative modeling as a minimization of the Kullback-Leibler divergence between suitable measures in path space. Finally, we develop a novel diffusion-based method for sampling from unnormalized densities -- a problem frequently occurring in statistics and computational sciences. We demonstrate that our time-reversed diffusion sampler (DIS) can outperform other diffusion-based sampling approaches on multiple numerical examples.
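For orientation, a back-of-the-envelope version of the relation the abstract alludes to can be written as follows; this is only a formal sketch under simplifying assumptions (scalar diffusion coefficient, smooth positive marginal densities), and the paper's precise statement and conventions may differ.

```latex
% Formal sketch: the log-density of an SDE's marginals obeys an HJB-type PDE.
% Assumptions: scalar diffusion coefficient sigma(t), smooth positive p_t.
\begin{align*}
  \mathrm{d}X_t &= \mu(X_t, t)\,\mathrm{d}t + \sigma(t)\,\mathrm{d}W_t,\\
  \partial_t p_t &= -\nabla \cdot (\mu\, p_t) + \tfrac{\sigma^2(t)}{2}\,\Delta p_t
    \qquad \text{(Fokker--Planck)},\\
  \partial_t V_t &= -\nabla \cdot \mu - \mu \cdot \nabla V_t
    + \tfrac{\sigma^2(t)}{2}\big(\Delta V_t + \lVert \nabla V_t \rVert^2\big),
    \qquad V_t := \log p_t.
\end{align*}
```

The quadratic term in the gradient of V_t is what gives the log-density equation its Hamilton-Jacobi-Bellman character and opens the door to control-theoretic tools such as the verification theorem.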


Figure 3: Traditional risk curve: schematic sketch of the generalization error of a generic deep neural network for a fixed amount of training data as a function of the training time t; see (Yang/E 2022) for details.
Figure 4: Risk curve with benign overfitting: highly overparametrized ANNs often exhibit the double descent phenomenon when the number of parameters exceeds the number of data points. The leftmost vertical dashed line shows the optimal model complexity (for given observation data), beyond which the model is considered overparametrized. The rightmost vertical dashed line marks the interpolation threshold at which the model can exactly fit all data points.
Figure 5: We consider a fully connected neural network (blue) that has been trained on N = 100 noisy data points (orange), once by gradient descent and once by stochastic gradient descent, and compare it to the ground truth function (grey).
Figure 6: The original image of Thomas Bayes in the left panel gets reasonably classified ("cloak"), whereas the right picture is the result of an adversarial attack and therefore gets misclassified (as "mosque").
Transgressing the Boundaries: Towards a Rigorous Understanding of Deep Learning and Its (Non-)Robustness

December 2023 · 40 Reads · 1 Citation


Figure 3: Traditional risk curve: schematic sketch of the generalization error of a generic deep neural network for a fixed amount of training data as a function of the training time t; see (Yang/E 2022) for details.
Figure 4: Risk curve with benign overfitting: highly overparametrized ANNs often exhibit the double descent phenomenon when the number of parameters exceeds the number of data points. The leftmost vertical dashed line shows the optimal model complexity (for given observation data), beyond which the model is considered overparametrized. The rightmost vertical dashed line marks the interpolation threshold at which the model can exactly fit all data points.
Figure 5: We consider a fully connected neural network (blue) that has been trained on N = 100 noisy data points (orange), once by gradient descent and once by stochastic gradient descent, and compare it to the ground truth function (grey).
Figure 6: The original image of Thomas Bayes in the left panel gets reasonably classified ("cloak"), whereas the right picture is the result of an adversarial attack and therefore gets misclassified (as "mosque").
Transgressing the Boundaries: Towards a Rigorous Understanding of Deep Learning and Its (Non-)Robustness

August 2023 · 28 Reads · 1 Citation

The emergence of artificial intelligence has triggered enthusiasm and promise of boundless opportunities as much as uncertainty about its limits. The contributions to this volume explore the limits of AI, describe the necessary conditions for its functionality, reveal its attendant technical and social problems, and present some existing and potential solutions. At the same time, the contributors highlight the societal and attending economic hopes and fears, utopias and dystopias that are associated with the current and future development of artificial intelligence.


Figure 7: We compare different loss functions for a 100-dimensional HJB example, either relying on tensor trains or on neural networks. For computational details we refer to Appendix B.
From continuous-time formulations to discretization schemes: tensor trains and robust regression for BSDEs and parabolic PDEs

July 2023 · 62 Reads

The numerical approximation of partial differential equations (PDEs) poses formidable challenges in high dimensions since classical grid-based methods suffer from the so-called curse of dimensionality. Recent attempts rely on a combination of Monte Carlo methods and variational formulations, using neural networks for function approximation. Extending previous work (Richter et al., 2021), we argue that tensor trains provide an appealing framework for parabolic PDEs: The combination of reformulations in terms of backward stochastic differential equations and regression-type methods holds the promise of leveraging latent low-rank structures, enabling both compression and efficient computation. Emphasizing a continuous-time viewpoint, we develop iterative schemes, which differ in terms of computational efficiency and robustness. We demonstrate both theoretically and numerically that our methods can achieve a favorable trade-off between accuracy and computational efficiency. While previous methods have been either accurate or fast, we have identified a novel numerical strategy that can often combine both of these aspects.
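As background, the standard backward-SDE reformulation that such regression methods build on can be sketched as follows; this is the textbook nonlinear Feynman–Kac correspondence under sufficient regularity assumptions, not the paper's specific iterative schemes, and the notation is illustrative.

```latex
% Sketch of the parabolic-PDE / BSDE correspondence (nonlinear Feynman--Kac),
% assuming sufficient regularity; notation is illustrative.
\begin{align*}
  &\partial_t V + b \cdot \nabla V
    + \tfrac{1}{2}\operatorname{Tr}\!\big(\sigma\sigma^\top \nabla^2 V\big)
    + h\big(t, x, V, \sigma^\top \nabla V\big) = 0,
    \qquad V(T, \cdot) = g,\\
  &\mathrm{d}X_s = b(X_s, s)\,\mathrm{d}s + \sigma(X_s, s)\,\mathrm{d}W_s,\\
  &Y_s := V(s, X_s), \quad Z_s := \sigma^\top(X_s, s)\,\nabla V(s, X_s)
    \;\Longrightarrow\;
    \mathrm{d}Y_s = -h(s, X_s, Y_s, Z_s)\,\mathrm{d}s + Z_s \cdot \mathrm{d}W_s,
    \quad Y_T = g(X_T).
\end{align*}
```

Regression-type methods (with neural networks or, as in this work, tensor trains) then approximate the pair (Y, Z) along simulated trajectories of X.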


Figure 1: We plot a given function f(x) = sin(2πx) (in gray) along with data points (in orange) given either by a deterministic or stochastic mapping in the first two panels. The right panel shows an approximation of the measure P for the stochastic case.
Figure 5: We consider a fully connected neural network (blue) that has been trained on N = 100 noisy data points (orange), once by gradient descent and once by stochastic gradient descent, and compare it to the ground truth function (grey).
Figure 7: We display the evaluation of a BNN by showing its mean prediction function (in dark blue) and a set of two standard deviations from it (in light blue), compared to the ground truth (in gray). Our BNN is either untrained (left panel) or has seen N = 5 (central panel) or N = 100 data points (right panel) during training.
Transgressing the boundaries: towards a rigorous understanding of deep learning and its (non-)robustness

July 2023 · 40 Reads

The recent advances in machine learning in various fields of applications can be largely attributed to the rise of deep learning (DL) methods and architectures. Despite being a key technology behind autonomous cars, image processing, speech recognition, etc., a notorious problem remains the lack of theoretical understanding of DL and related interpretability and (adversarial) robustness issues. Understanding the specifics of DL, as compared to, say, other forms of nonlinear regression methods or statistical learning, is interesting from a mathematical perspective, but at the same time it is of crucial importance in practice: treating neural networks as mere black boxes might be sufficient in certain cases, but many applications require waterproof performance guarantees and a deeper understanding of what could go wrong and why it could go wrong. It is probably fair to say that, despite being mathematically well founded as a method to approximate complicated functions, DL is still more akin to modern alchemy, firmly in the hands of engineers and computer scientists. Nevertheless, it is evident that certain specifics of DL that could explain its success in applications demand systematic mathematical approaches. In this work, we review robustness issues of DL and particularly bridge concerns and attempts from approximation theory to statistical learning theory. Further, we review Bayesian deep learning as a means for uncertainty quantification and rigorous explainability.
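Since the review touches on adversarial (non-)robustness (cf. Figure 6), a minimal sketch of the classic fast gradient sign attack may make the phenomenon concrete; this is the standard textbook construction, not a method from this work, and `model`, `x`, `label`, and `eps` are placeholders.

```python
# Minimal fast-gradient-sign (FGSM) attack sketch (standard construction,
# not specific to this paper). `model` is any differentiable classifier,
# `x` an input batch with pixel values in [0, 1], `label` class indices.
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, label, eps=0.03):
    """Perturb x by eps in the direction that increases the classification loss."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), label)
    loss.backward()
    x_adv = x + eps * x.grad.sign()        # one signed gradient step
    return x_adv.clamp(0.0, 1.0).detach()  # keep pixels in the valid range
```

Even a perturbation this small and simple can flip the predicted class, which is exactly the misclassification behavior illustrated in Figure 6.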


Improved sampling via learned diffusions

July 2023 · 28 Reads

Recently, a series of papers proposed deep learning-based approaches to sample from unnormalized target densities using controlled diffusion processes. In this work, we identify these approaches as special cases of the Schrödinger bridge problem, seeking the most likely stochastic evolution between a given prior distribution and the specified target. We further generalize this framework by introducing a variational formulation based on divergences between path space measures of time-reversed diffusion processes. This abstract perspective leads to practical losses that can be optimized by gradient-based algorithms and includes previous objectives as special cases. At the same time, it allows us to consider divergences other than the reverse Kullback-Leibler divergence that is known to suffer from mode collapse. In particular, we propose the so-called log-variance loss, which exhibits favorable numerical properties and leads to significantly improved performance across all considered approaches.


Improved sampling via learned diffusions

July 2023 · 41 Reads · 11 Citations

Recently, a series of papers proposed deep learning-based approaches to sample from unnormalized target densities using controlled diffusion processes. In this work, we identify these approaches as special cases of the Schrödinger bridge problem, seeking the most likely stochastic evolution between a given prior distribution and the specified target. We further generalize this framework by introducing a variational formulation based on divergences between path space measures of time-reversed diffusion processes. This abstract perspective leads to practical losses that can be optimized by gradient-based algorithms and includes previous objectives as special cases. At the same time, it allows us to consider divergences other than the reverse Kullback-Leibler divergence that is known to suffer from mode collapse. In particular, we propose the so-called log-variance loss, which exhibits favorable numerical properties and leads to significantly improved performance across all considered approaches.
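The log-variance loss mentioned here is, in essence, the empirical variance of the log Radon–Nikodym derivative log dP*/dP^u evaluated along trajectories drawn from a reference path measure (cf. the formula Var_{P^v}(log dP^{u*}/dP^u) quoted in the citations below). The snippet is a generic illustration of that reduction, not the authors' implementation; how the per-trajectory log-RN terms are computed is model-specific and assumed given.

```python
# Illustrative log-variance loss over a batch of sampled trajectories.
# `log_rnd` contains log dP*/dP^u per trajectory, computed under some
# reference path measure; assembling these terms is model-specific.
import torch

def log_variance_loss(log_rnd: torch.Tensor) -> torch.Tensor:
    """Empirical variance of the log Radon-Nikodym derivative.
    The loss vanishes exactly when log dP*/dP^u is almost surely constant,
    i.e. the two path measures coincide up to normalization."""
    return torch.var(log_rnd)

log_rnd = torch.randn(128)  # placeholder per-trajectory log dP*/dP^u values
loss = log_variance_loss(log_rnd)
```

Because only the variance matters, the unknown normalization constant of the target drops out, which is what makes this loss attractive for sampling from unnormalized densities.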


Poster - An optimal control perspective on diffusion-based generative modeling

December 2022 · 23 Reads

We establish a connection between stochastic optimal control and generative models based on stochastic differential equations (SDEs), such as recently developed diffusion probabilistic models. In particular, we derive a Hamilton-Jacobi-Bellman equation that governs the evolution of the log-densities of the underlying SDE marginals. This perspective allows us to transfer methods from optimal control theory to generative modeling. First, we show that the evidence lower bound is a direct consequence of the well-known verification theorem from control theory. Further, we develop a novel diffusion-based method for sampling from unnormalized densities -- a problem frequently occurring in statistics and computational sciences.


Citations (11)


... Other choices of the loss functions include the relative entropy loss D_{KL}(P^u ∥ P^{u*}) [30,44,53], the variance loss Var_{P^v}(dP^{u*}/dP^u) (or the log-variance loss Var_{P^v}(log dP^{u*}/dP^u)) for some suitable basis path measure P^v [54], etc. Some theoretical bounds for the KL-type losses above were established in [29]. Besides the PG-based algorithms, other related importance sampling methods include the well-known forward-backward stochastic differential equation (FBSDE) approaches [36,20,60], where one approximates the target value Z via the solution of some SDE with given terminal-time state and a forward filtration. ...

Reference:

Guidance for twisted particle filter: a continuous-time perspective
Nonasymptotic Bounds for Suboptimal Importance Sampling
  • Citing Article
  • April 2024

SIAM/ASA Journal on Uncertainty Quantification

... A popular view regards the Bayesian approach as the most appropriate framework, in principle (Wang & Yeung, 2016). According to this view, the main drawback of a full Bayesian estimate is its prohibitive cost, which leads to a very active search for approximations that offer the best trade-off between accuracy and computational efficiency (Blundell et al., 2015;Gal & Ghahramani, 2016;Hartmann & Richter, 2023;Jospin et al., 2020;MacKay, 1992;Sensoy et al., 2018;Titterington, 2004). Besides the Bayesian framework, the other main approaches rely either on ensemble methods (Lakshminarayanan et al., 2017;Michelucci & Venturini, 2021;Tavazza et al., 2021;Wen et al., 2020), or data augmentation methods (Shorten & Khoshgoftaar, 2019;Wen et al., 2021). ...

Transgressing the Boundaries: Towards a Rigorous Understanding of Deep Learning and Its (Non-)Robustness

... On the other hand, Mittal et al. [40] introduces a neural network-based approach for approximate Bayesian inference that amortizes over exchangeable data sets to handle posterior inference in novel data. Also, Richter and Berner [48], Richter et al. [49] develop a low-variance gradient estimator for carrying out variational inference derived from the log-variance loss, which could be employed as an alternative to our KL-based objective for streaming updates of GFlowNets. On a broader scale, Cranmer et al. [9] reviews approximate Bayesian methodology under the lens of simulation-based inference. ...

Improved sampling via learned diffusions

... Traditional numerical methods for solving PDEs such as finite differences [34] and finite elements [7] suffer from the curse of dimensionality, which implies that the computational cost grows exponentially as the dimension increases. Recently, many deep learning-based algorithms have been proposed for different classes of PDEs, see [3, 11, 18, 28–30, 33]. ...

Robust SDE-Based Variational Formulations for Solving Linear PDEs via Deep Learning

... Having an initial state x_0 ≠ 0 in a stochastic setting causes an enormous increase in complexity, as millions of system evaluations might be required for each single x_0 (e.g. in a Monte-Carlo simulation). For that reason, first techniques for SDEs with non-zero initial data have been investigated [4,19]. However, these approaches come with error bounds that either depend on the terminal time T (exploding as T → ∞) or on singular values of the so-called error system, which makes them practically less useful. ...

Error bounds for model reduction of feedback-controlled linear stochastic dynamics on Hilbert spaces
  • Citing Article
  • March 2022

Stochastic Processes and their Applications

... First of all, once the value functional V is known, it is easy to change the initial training data set, and new optimal feedback controls are easily computed from (6.7). Moreover, we remark that the numerical solution of the high-dimensional HJB equation is the subject of intense scrutiny, and various fast algorithms have been established [32,33]. ...

Solving high-dimensional Hamilton–Jacobi–Bellman PDEs using neural networks: perspectives from the theory of controlled diffusions and measures on path space

SN Partial Differential Equations and Applications

... During forward prediction runs, θ is obtained through sampling from a standard normal distribution, ε, instead of sampling directly from the variational distribution q_λ(θ), so as to facilitate the implementation of the aforementioned reparametrization formulation. There are also other possible solutions for computing the gradient when random variables are included in the neural network, e.g., score function estimator [39], VarGrad [40], straight-through estimator [41], among others. The reparametrization approach described before is widely adopted in practice owing to its capability for generating unbiased gradient estimates. ...

VarGrad: A Low-Variance Gradient Estimator for Variational Inference
  • Citing Preprint
  • October 2020
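The excerpt above describes the standard reparametrization trick for a Gaussian variational distribution; a generic sketch of that idea (not tied to the citing paper's model) looks as follows, with `mu` and `log_sigma` playing the role of the variational parameters λ.

```python
# Minimal reparametrization-trick sketch for a Gaussian variational
# distribution q_lambda(theta) = N(mu, sigma^2); generic illustration only.
import torch

mu = torch.zeros(10, requires_grad=True)         # variational mean
log_sigma = torch.zeros(10, requires_grad=True)  # variational log-std

eps = torch.randn(10)                  # sample from a standard normal
theta = mu + log_sigma.exp() * eps     # differentiable sample from q_lambda
# Gradients of any loss(theta) now flow back to mu and log_sigma through
# theta, yielding unbiased pathwise gradient estimates.
```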

... Various approaches have been proposed to numerically solve (23) and obtain an approximate control. (Hartmann et al. 2019) solved the d-dimensional HJB PDE (23) using least-squares regression, whereas (Hartmann et al. 2016) solved it using model-reduction techniques for higher dimensions. Neural networks have also been employed to solve the HJB PDE in higher dimensions with stochastic gradient (Hartmann et al. 2017) and cross-entropy (Zhang et al. 2014) learning methods for the stochastic optimal control formulation (25). ...

Variational approach to rare event simulation using least-squares regression
  • Citing Article
  • June 2019