
Philipp Christian Petersen, Ph.D.
Professor (Associate) at University of Vienna
About
Publications: 72
Reads: 43,881
Citations: 2,233
Introduction
Philipp Christian Petersen currently works at the Department of Mathematics, University of Vienna. He does research in approximation theory, neural networks, and signal and image processing.
Current institution: University of Vienna
Publications (72)
We study the necessary and sufficient complexity of ReLU neural networks---in terms of depth and number of weights---which is required for approximating classifier functions in an $L^2$-sense. As a model class, we consider the set $\mathcal{E}^\beta (\mathbb R^d)$ of possibly discontinuous piecewise $C^\beta$ functions $f : [-1/2, 1/2]^d \to \mathb...
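As a rough illustration of the approximation problem above (not the construction or the rates from the paper), the sketch below builds a two-neuron ReLU network that approximates a discontinuous classifier in the $L^2$-sense and checks the error by Monte Carlo sampling; all concrete choices are invented for the example.

```python
import numpy as np

# Toy illustration: a two-neuron ReLU network approximating the discontinuous
# classifier f(x) = 1_{x >= 0} on [-1/2, 1/2] in the L^2 sense.  The ramp
#     phi_delta(x) = (relu(x) - relu(x - delta)) / delta
# agrees with f outside [0, delta], so the L^2 error is sqrt(delta / 3).

def relu(x):
    return np.maximum(x, 0.0)

def phi(x, delta):
    # one hidden layer with two ReLU neurons
    return (relu(x) - relu(x - delta)) / delta

def f(x):
    return (x >= 0).astype(float)

rng = np.random.default_rng(0)
x = rng.uniform(-0.5, 0.5, size=1_000_000)   # Monte Carlo sample of [-1/2, 1/2]

for delta in [0.1, 0.01, 0.001]:
    l2_err = np.sqrt(np.mean((f(x) - phi(x, delta)) ** 2))
    print(f"delta = {delta}: L2 error ~ {l2_err:.4f}, predicted {np.sqrt(delta / 3):.4f}")
```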
We analyze the topological properties of the set of functions that can be implemented by neural networks of a fixed size. Surprisingly, this set has many undesirable properties: It is highly non-convex, except possibly for a few exotic activation functions. Moreover, the set is not closed with respect to $L^p$-norms, $0<p<\infty$, for all practical...
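A toy instance of the non-convexity phenomenon (an illustration of the statement, not an argument from the paper): for the fixed architecture $x \mapsto a\,\mathrm{relu}(wx+b)+c$ with a single hidden neuron, every realizable function has at most one kink, while the midpoint of two realizable functions can have two.

```python
import numpy as np

# Non-convexity of the set of functions realized by a fixed architecture:
# x -> a * relu(w * x + b) + c has at most one kink, but the average of two
# such functions can have two kinks, so it lies outside the set.

def neuron(x, a, w, b, c):
    return a * np.maximum(w * x + b, 0.0) + c

x = np.linspace(-2.0, 2.0, 4001)
f1 = neuron(x, 1.0, 1.0, 0.0, 0.0)    # kink at x = 0
f2 = neuron(x, 1.0, 1.0, -1.0, 0.0)   # kink at x = 1
midpoint = 0.5 * (f1 + f2)

# count kinks of the piecewise-linear midpoint via jumps of its slope
slopes = np.diff(midpoint) / np.diff(x)
kinks = np.count_nonzero(np.abs(np.diff(slopes)) > 1e-8)
print("kinks of the midpoint:", kinks)   # 2, hence not realizable by one neuron
```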
Microlocal analysis provides deep insight into singularity structures and is often crucial for solving inverse problems, predominately, in imaging sciences. Of particular importance is the analysis of wavefront sets and the correct extraction of those. In this paper, we introduce the first algorithmic approach to extract the wavefront set of images...
We prove exponential expressivity with stable ReLU Neural Networks (ReLU NNs) in $H^1(\Omega)$ for weighted analytic function classes in certain polytopal domains $\Omega$, in space dimension $d=2,3$. Functions in these classes are locally analytic on open subdomains $D\subset \Omega$, but may exhibit isolated point singularities in the interior of...
We perform a comprehensive numerical study of the effect of approximation-theoretical results for neural networks on practical learning problems in the context of numerical analysis. As the underlying model, we study the machine-learning-based solution of parametric partial differential equations. Here, approximation theory predicts that the perfor...
Deep learning's success comes with growing energy demands, raising concerns about the long-term sustainability of the field. Spiking neural networks, inspired by biological neurons, offer a promising alternative with potential computational and energy-efficiency gains. This article examines the computational properties of spiking networks through t...
We prove that a classifier with a Barron-regular decision boundary can be approximated with a rate of high polynomial degree by ReLU neural networks with three hidden layers when a margin condition is assumed. In particular, for strong margin conditions, high-dimensional discontinuous classifiers can be approximated with a rate that is typically on...
In recent work it has been shown that determining a feedforward ReLU neural network to within high uniform accuracy from point samples suffers from the curse of dimensionality in terms of the number of samples needed. As a consequence, feedforward ReLU neural networks are of limited use for applications where guaranteed high uniform accuracy is req...
We introduce a conceptual framework for numerically solving linear elliptic, parabolic, and hyperbolic PDEs on bounded, polytopal domains in euclidean spaces by deep neural networks. The PDEs are recast as minimization of a least-squares (LSQ for short) residual of an equivalent, well-posed first-order system, over parametric families of deep neura...
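As a schematic of the least-squares idea only (a generic first-order-system residual, not the specific well-posed formulation analyzed in the paper), the following PyTorch snippet recasts the 1D Poisson problem $-u'' = f$ with zero boundary values as the system $u' = q$, $q' = -f$ and minimizes the squared residual plus a boundary penalty over a small network; the architecture and hyperparameters are arbitrary choices for the demo.

```python
import torch

# Schematic deep least-squares (LSQ) solver, illustrative only: recast the 1D
# Poisson problem -u'' = f on (0, 1), u(0) = u(1) = 0, as the first-order
# system u' = q, q' = -f, and minimize the squared residual over a small
# network that outputs (u, q).
torch.manual_seed(0)

f = lambda x: (torch.pi ** 2) * torch.sin(torch.pi * x)   # exact solution: sin(pi x)

net = torch.nn.Sequential(
    torch.nn.Linear(1, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 2),                                # outputs (u, q)
)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for step in range(3000):
    x = torch.rand(256, 1, requires_grad=True)             # collocation points
    u, q = net(x).split(1, dim=1)
    du = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
    dq = torch.autograd.grad(q.sum(), x, create_graph=True)[0]
    residual = ((du - q) ** 2).mean() + ((dq + f(x)) ** 2).mean()
    xb = torch.tensor([[0.0], [1.0]])
    boundary = (net(xb)[:, 0] ** 2).mean()                  # u(0)^2, u(1)^2
    loss = residual + 10.0 * boundary
    opt.zero_grad(); loss.backward(); opt.step()

xt = torch.linspace(0, 1, 101).unsqueeze(1)
err = (net(xt)[:, :1] - torch.sin(torch.pi * xt)).abs().max()
print("max error vs sin(pi x):", float(err))
```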
We study the problem of approximating and estimating classification functions that have their decision boundary in the $RBV^2$ space. Functions of $RBV^2$ type arise naturally as solutions of regularized neural network learning problems and neural networks can approximate these functions without the curse of dimensionality. We modify existing resul...
Introduction Polarized endurance training is an important and frequently discussed training intensity distribution (TID). The polarized TID is described as the largest fraction of training time or sessions spent with low-intensity exercise in intensity zone (z)1, followed by a considerable fraction of high intensity exercise (z3), and a relatively...
This book provides an introduction to the mathematical analysis of deep learning. It covers fundamental results in approximation theory, optimization theory, and statistical learning theory, which are the three main pillars of deep neural network theory. Serving as a guide for students and researchers in mathematics and related fields, the book aim...
The generalized Gauss-Newton (GGN) optimization method incorporates curvature estimates into its solution steps, and provides a good approximation to the Newton method for large-scale optimization problems. GGN has been found particularly interesting for practical training of deep neural networks, not only for its impressive convergence speed, but...
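For orientation, the following numpy sketch performs textbook generalized Gauss-Newton steps for a nonlinear least-squares objective $\tfrac12\|r(\theta)\|^2$, replacing the Hessian by the curvature proxy $J^\top J$ plus damping; the toy model and all constants are invented for the example and are not taken from the article.

```python
import numpy as np

# Minimal, textbook generalized Gauss-Newton (GGN) step for the objective
# 0.5 * ||r(theta)||^2: the Hessian is replaced by J^T J (plus damping),
# where J is the Jacobian of the residual r.  Purely illustrative.

def residual(theta, x, y):
    # toy model: y is approximately theta[0] * exp(theta[1] * x)
    return theta[0] * np.exp(theta[1] * x) - y

def jacobian(theta, x):
    return np.stack([np.exp(theta[1] * x),
                     theta[0] * x * np.exp(theta[1] * x)], axis=1)

def ggn_step(theta, x, y, damping=1e-3):
    r = residual(theta, x, y)
    J = jacobian(theta, x)
    H = J.T @ J + damping * np.eye(len(theta))   # GGN curvature proxy
    return theta - np.linalg.solve(H, J.T @ r)

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
y = 2.0 * np.exp(-1.5 * x) + 0.01 * rng.standard_normal(50)

theta = np.array([1.0, 0.0])
for _ in range(20):
    theta = ggn_step(theta, x, y)
print("estimated parameters:", theta)   # should end up close to (2.0, -1.5)
```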
We study the learning problem associated with spiking neural networks. Specifically, we consider hypothesis sets of spiking neural networks with affine temporal encoders and decoders and simple spiking neurons having only positive synaptic weights. We demonstrate that the positivity of the weights continues to enable a wide range of expressivity r...
We study the training of deep neural networks by gradient descent where floating-point arithmetic is used to compute the gradients. In this framework and under realistic assumptions, we demonstrate that it is highly unlikely to find ReLU neural networks that maintain, in the course of training with gradient descent, superlinearly many affine pieces...
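To make the quantity "number of affine pieces" concrete (a brute-force diagnostic of my own, not part of the paper's argument), one can evaluate a scalar ReLU network on a fine grid and count the slope changes:

```python
import numpy as np

# Brute-force count of the affine pieces of a scalar one-hidden-layer ReLU
# network on an interval, by detecting slope changes on a fine grid.  Pieces
# narrower than the grid spacing are missed, so this is a lower bound.

rng = np.random.default_rng(1)
width = 50
W, b, a = (rng.standard_normal(width) for _ in range(3))

def relu_net(x):
    # x: (n,) -> a . relu(x * W + b), a network with `width` hidden neurons
    return np.maximum(np.outer(x, W) + b, 0.0) @ a

x = np.linspace(-3.0, 3.0, 100_001)
y = relu_net(x)
slopes = np.diff(y) / np.diff(x)
jumps = np.abs(np.diff(slopes)) > 1e-6

# a kink strictly inside a grid cell triggers two consecutive jump flags,
# so count maximal runs of consecutive flags as a single kink each
starts = jumps & ~np.concatenate(([False], jumps[:-1]))
print("affine pieces detected:", 1 + int(starts.sum()))   # at most width + 1 = 51
```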
We investigate the mathematical capabilities of two iterations of ChatGPT (released 9-January-2023 and 30-January-2023) and of GPT-4 by testing them on publicly available datasets, as well as hand-crafted ones, using a novel methodology. In contrast to formal mathematics, where large databases of formal proofs are available (e.g., the Lean Mathemat...
In recent years the development of new classification and regression algorithms based on deep learning has led to a revolution in the fields of artificial intelligence, machine learning, and data analysis. The development of a theoretical foundation to guarantee the success of these algorithms constitutes one of the most active and exciting researc...
We study the generalization capacity of group convolutional neural networks. We identify precise estimates for the VC dimensions of simple sets of group convolutional neural networks. In particular, we find that for infinite groups and appropriately chosen convolutional kernels, already two-parameter families of convolutional neural networks have a...
We study the problem of reconstructing solutions of inverse problems when only noisy measurements are available. We assume that the problem can be modeled with an infinite-dimensional forward operator that is not continuously invertible. Then, we restrict this forward operator to finite-dimensional spaces so that the inverse is Lipschitz continuous...
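One standard way to picture the finite-dimensional restriction (a generic truncated-SVD illustration, not the construction from the paper): keeping only singular values above a threshold restricts the forward operator to a subspace on which its inverse is Lipschitz, with constant one over the smallest retained singular value.

```python
import numpy as np

# Generic illustration of restricting an ill-conditioned forward operator to a
# finite-dimensional subspace on which the inverse is Lipschitz (via truncated
# SVD).  This is the standard regularization picture, not the paper's method.

rng = np.random.default_rng(0)
n = 200
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
sigma = np.exp(-np.linspace(0, 20, n))           # rapidly decaying spectrum
A = U @ np.diag(sigma) @ V.T                     # discretized forward operator

x_true = rng.standard_normal(n)
y = A @ x_true + 1e-6 * rng.standard_normal(n)   # noisy measurements

k = np.count_nonzero(sigma > 1e-3)               # keep a well-conditioned subspace
x_rec = V[:, :k] @ ((U[:, :k].T @ y) / sigma[:k])

# on the retained subspace, the inverse has Lipschitz constant 1 / sigma[k-1]
print("retained dimensions:", k)
print("Lipschitz constant of restricted inverse:", 1.0 / sigma[k - 1])
print("error on the retained subspace:",
      np.linalg.norm(V[:, :k].T @ (x_rec - x_true)))
```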
We study the training of deep neural networks by gradient descent where floating-point arithmetic is used to compute the gradients. In this framework and under realistic assumptions, we demonstrate that it is highly unlikely to find ReLU neural networks that maintain, in the course of training with gradient descent, superlinearly many affine pieces...
We study the problem of reconstructing solutions of inverse problems with neural networks when only noisy data is available. We assume the problem can be modeled with an infinite-dimensional forward operator that is not continuously invertible. Then, we restrict this forward operator to finite-dimensional spaces so that the inverse is Lipschitz con...
In certain polytopal domains Ω, in space dimension d=2,3, we prove exponential expressivity with stable ReLU Neural Networks (ReLU NNs) in H1(Ω) for weighted analytic function classes. These classes comprise in particular solution sets of source and eigenvalue problems for elliptic PDEs with analytic data. Functions in these classes are locally ana...
We derive upper bounds on the complexity of ReLU neural networks approximating the solution maps of parametric partial differential equations. In particular, without any knowledge of its concrete shape, we use the inherent low dimensionality of the solution manifold to obtain approximation rates which are significantly superior to those provided by...
We present a deep-learning-based algorithm to jointly solve a reconstruction problem and a wavefront set extraction problem in tomographic imaging. The algorithm is based on a recently developed digital wavefront set extractor as well as the well-known microlocal canonical relation for the Radon transform. We use the wavefront set information about...
We study the problem of learning classification functions from noiseless training samples, under the assumption that the decision boundary is of a certain regularity. We establish universal lower bounds for this estimation problem, for general classes of continuous decision boundaries. For the class of locally Barron-regular decision boundaries, we...
We describe the new field of mathematical analysis of deep learning. This field developed, driven by research questions that could not be sufficiently answered by classical learning theory. These questions concern, among other things: the exceptionally accurate predictions of overparametrized neural...
We present a deep learning-based algorithm to jointly solve a reconstruction problem and a wavefront set extraction problem in tomographic imaging. The algorithm is based on a recently developed digital wavefront set extractor as well as the well-known microlocal canonical relation for the Radon transform. We use the wavefront set information about...
We perform a comprehensive numerical study of the effect of approximation-theoretical results for neural networks on practical learning problems in the context of numerical analysis. As the underlying model, we study the machine-learning-based solution of parametric partial differential equations. Here, approximation theory for fully-connected neur...
We describe the new field of mathematical analysis of deep learning. This field emerged around a list of research questions that were not answered within the classical framework of learning theory. These questions concern: the outstanding generalization power of overparametrized neural networks, the role of depth in deep architectures, the apparent...
We demonstrate that deep neural networks with the ReLU activation function can efficiently approximate the solutions of various types of parametric linear transport equations. For non-smooth initial conditions, the solutions of these PDEs are high-dimensional and non-smooth. Therefore, approximation of these functions suffers from a curse of dimens...
We prove bounds for the approximation and estimation of certain classification functions using ReLU neural networks. Our estimation bounds provide a priori performance guarantees for empirical risk minimization using networks of a suitable size, depending on the number of training samples available. The obtained approximation and estimation rates a...
We introduce two shearlet-based Ginzburg–Landau energies, based on the continuous and the discrete shearlet transform. The energies result from replacing the elastic energy term of a classical Ginzburg–Landau energy by the weighted L2-norm of a shearlet transform. The asymptotic behaviour of sequences of these energies is analysed within the framew...
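Schematically, and with the exact weight, scaling, and potential left unspecified (the double well $W(u) = (1-u^2)^2$ below is only a standard choice, not taken from the paper), the modification replaces the elastic term of the classical Ginzburg–Landau energy by a weighted $L^2$-norm of the shearlet transform:

```latex
% classical Ginzburg--Landau energy (Modica-Mortola form) with a double well W
E_\varepsilon(u) = \varepsilon \int_\Omega |\nabla u|^2 \, dx
                 + \frac{1}{\varepsilon} \int_\Omega W(u) \, dx,
\qquad W(u) = (1 - u^2)^2 .

% shearlet-based variant (schematic): the elastic term is replaced by a
% weighted L^2-norm of the shearlet transform \mathcal{SH}(u)
E_\varepsilon^{\mathcal{SH}}(u) = \bigl\| \, w \cdot \mathcal{SH}(u) \, \bigr\|_{L^2}^2
                 + \frac{1}{\varepsilon} \int_\Omega W(u) \, dx .
```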
We analyze the topological properties of the set of functions that can be implemented by neural networks of a fixed size. Surprisingly, this set has many undesirable properties. It is highly non-convex, except possibly for a few exotic activation functions. Moreover, the set is not closed with respect to \(L^p\)-norms, \(0< p < \infty \), for all p...
In this paper we provide a construction of multiscale systems on a bounded domain $\Omega \subset \mathbb{R}^2$ coined boundary shearlet systems, which satisfy several properties advantageous for applications to imaging science and numerical analysis of partial differential equations. More precisely, we construct boundary shearlet systems that form...
We demonstrate that deep neural networks with the ReLU activation function can efficiently approximate the solutions of various types of parametric linear transport equations. For non-smooth initial conditions, the solutions of these PDEs are high-dimensional and non-smooth. Therefore, approximation of these functions suffers from a curse of dimens...
Approximation rate bounds for emulations of real-valued functions on intervals by deep neural networks (DNNs) are established. The approximation results are given for DNNs based on ReLU activation functions. The approximation error is measured with respect to Sobolev norms. It is shown that ReLU DNNs allow for essentially the same approximation rat...
We analyze to what extent deep Rectified Linear Unit (ReLU) neural networks can efficiently approximate Sobolev regular functions if the approximation error is measured with respect to weaker Sobolev norms. In this context, we first establish upper approximation bounds by ReLU neural networks for Sobolev regular functions by explicitly constructing...
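A toy experiment illustrating why the norm in which the error is measured matters (not a result from the paper): the piecewise-linear interpolant of a smooth function on $n$ uniform nodes is exactly realizable as a one-hidden-layer ReLU network, and its uniform error decays roughly like $n^{-2}$, while the uniform error of its derivative decays only like $n^{-1}$.

```python
import numpy as np

# The piecewise-linear interpolant I_h f of a smooth function on n uniform
# nodes is exactly a one-hidden-layer ReLU network.  Its sup-norm error decays
# roughly like n^{-2}, but the sup-norm error of its derivative only like
# n^{-1}: the achievable rate depends on the norm used to measure the error.

f = np.sin
df = np.cos

xx = np.linspace(0.0, np.pi, 100_001)         # fine evaluation grid

for n in [10, 20, 40, 80]:
    nodes = np.linspace(0.0, np.pi, n + 1)
    interp = np.interp(xx, nodes, f(nodes))   # piecewise-linear interpolant
    err_inf = np.max(np.abs(f(xx) - interp))

    # derivative of the interpolant is piecewise constant (the cell slopes)
    slopes = np.diff(f(nodes)) / np.diff(nodes)
    cell = np.clip(np.searchsorted(nodes, xx, side="right") - 1, 0, n - 1)
    err_deriv = np.max(np.abs(df(xx) - slopes[cell]))

    print(f"n = {n:3d}:  |f - I_h f|_inf = {err_inf:.2e},"
          f"  |(f - I_h f)'|_inf = {err_deriv:.2e}")
```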
In this paper, we study a newly developed shearlet system on bounded domains which yields frames for $H^s(\Omega)$ for some $s\in \mathbb{N}$, $\Omega \subset \mathbb{R}^2$. We will derive approximation rates with respect to $H^s(\Omega)$ norms for functions whose derivatives admit smooth jumps along curves and demonstrate superior rates to those p...
We discuss the expressive power of neural networks which use the non-smooth ReLU activation function $\varrho(x) = \max\{0,x\}$ by analyzing the approximation theoretic properties of such networks. The existing results mainly fall into two categories: approximation using ReLU networks with a fixed depth, or using ReLU networks whose depth increases...
We analyze to what extent deep ReLU neural networks can efficiently approximate Sobolev regular functions if the approximation error is measured with respect to weaker Sobolev norms. In this context, we first establish upper approximation bounds by ReLU neural networks for Sobolev regular functions by explicitly constructing the approximating ReLU...
We derive upper bounds on the complexity of ReLU neural networks approximating the solution maps of parametric partial differential equations. In particular, without any knowledge of its concrete shape, we use the inherent low-dimensionality of the solution manifold to obtain approximation rates which are significantly superior to those provided by...
We analyze approximation rates of deep ReLU neural networks for Sobolev-regular functions with respect to weaker Sobolev norms. First, we construct, based on a calculus of ReLU networks, artificial neural networks with ReLU activation functions that achieve certain approximation rates. Second, we establish lower bounds for the approximation by ReLU...
We present a novel technique based on deep learning and set theory which yields exceptional classification and prediction results. Having access to a sufficiently large amount of labelled training data, our methodology is capable of predicting the labels of the test data almost always even if the training data is entirely unrelated to the test data...
Approximation rate bounds for expressions of real-valued functions on intervals by deep neural networks (DNNs for short) are established. The approximation results are given for DNNs based on ReLU activation functions, and the approximation error is measured with respect to Sobolev norms. It is shown that ReLU DNNs allow for essentially the same ap...
Microlocal analysis provides deep insight into singularity structures and is often crucial for solving inverse problems, predominately, in imaging sciences. Of particular importance is the analysis of wavefront sets and the correct extraction of those. In this paper, we introduce the first algorithmic approach to extract the wavefront set of images...
The chapters in this volume highlight the state-of-the-art of compressed sensing and are based on talks given at the third international MATHEON conference on the same topic, held from December 4-8, 2017 at the Technical University in Berlin. In addition to methods in compressed sensing, chapters provide insights into cutting edge applications of d...
We introduce two shearlet-based Ginzburg--Landau energies, based on the continuous and the discrete shearlet transform. The energies result from replacing the elastic energy term of a classical Ginzburg--Landau energy by the weighted $L^2$-norm of a shearlet transform. The asymptotic behaviour of sequences of these energies is analysed within the f...
Convolutional neural networks are the most widely used type of neural networks in applications. In mathematical analysis, however, mostly fully-connected networks are studied. In this paper, we establish a connection between both network architectures. Using this connection, we show that all upper and lower bounds concerning approximation rates of...
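An elementary instance of that connection (not the specific translation constructed in the paper) is that a convolutional layer is a fully-connected layer with a Toeplitz-structured weight matrix:

```python
import numpy as np

# A 1D convolutional layer is a fully-connected layer whose weight matrix is
# Toeplitz-structured, so every convolutional network is in particular a
# (sparsely constrained) fully-connected network.  Illustrative only.

rng = np.random.default_rng(0)
signal = rng.standard_normal(8)
kernel = rng.standard_normal(3)

# "valid" convolution (correlation) as usually implemented in CNN layers
conv_out = np.array([signal[i:i + 3] @ kernel for i in range(6)])

# the same map written as a dense weight matrix acting on the full input
W = np.zeros((6, 8))
for i in range(6):
    W[i, i:i + 3] = kernel
dense_out = W @ signal

print(np.allclose(conv_out, dense_out))   # True
```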
We analyze the topological properties of the set of functions that can be implemented by neural networks of a fixed size. Surprisingly, this set has many undesirable properties: It is highly non-convex, except possibly for a few exotic activation functions. Moreover, the set is not closed with respect to $L^p$-norms, $0<p<\infty$, for all practical...
In this paper, we study a newly developed shearlet system on bounded domains which yields frames for $H^s(\Omega)$ for some $s\in \mathbb{N}$, $\Omega \subset \mathbb{R}^2$. We will derive approximation rates with respect to $H^s(\Omega)$ norms for functions whose derivatives admit smooth jumps along curves and demonstrate superior rates to those p...
We study the necessary and sufficient complexity of ReLU neural networks---in terms of depth and number of weights---which is required for approximating classifier functions in $L^2$. As a model class, we consider the set $\mathcal{E}^\beta (\mathbb R^d)$ of possibly discontinuous piecewise $C^\beta$ functions $f : [-1/2, 1/2]^d \to \mathbb R$, whe...
We summarize the main results of a recent theory—developed by the authors—establishing fundamental lower bounds on the connectivity and memory requirements of deep neural networks as a function of the complexity of the function class to be approximated by the network. These bounds are shown to be achievable. Specifically, all function classes that...
Regularization techniques for the numerical solution of inverse scattering problems in two space dimensions are discussed. Assuming that the boundary of a scatterer is its most prominent feature, we exploit as model the class of cartoon-like functions. Since functions in this class are asymptotically optimally sparsely approximated by shearlet fram...
We introduce bendlets, a shearlet-like system that is based on anisotropic scaling, translation, shearing, and bending of a compactly supported generator. With shearing being linear and bending quadratic in spatial coordinates, bendlets provide what we term a second-order shearlet system. As we show in this article, the decay rates of the associate...
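Schematically, and only as a paraphrase of the abstract above (normalizations, parameter ranges, and the exact composition order are not taken from the paper), the operations behind a bendlet can be written as:

```latex
% schematic form of the operations behind bendlets; normalizations, parameter
% ranges, and the exact composition order are omitted here
A_a x = (a\, x_1,\ a^{1/2} x_2)   \quad \text{(anisotropic scaling; parabolic choice shown)}
S_s x = (x_1 + s\, x_2,\ x_2)     \quad \text{(shearing, linear in the spatial variable)}
B_b x = (x_1 + b\, x_2^{2},\ x_2) \quad \text{(bending, quadratic in the spatial variable)}
\psi_{a,s,b,t}(x) \;\sim\; \psi\bigl(A_a^{-1} B_b^{-1} S_s^{-1} (x - t)\bigr)
```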
We derive fundamental lower bounds on the connectivity and the memory requirements of deep neural networks guaranteeing uniform approximation rates for arbitrary function classes in $L^2(\R^d)$. In other words, we establish a connection between the complexity of a function class and the complexity of deep neural networks approximating functions fro...
We derive fundamental lower bounds on the connectivity and the memory requirements of deep neural networks guaranteeing uniform approximation rates for arbitrary function classes in $L^2(\mathbb R^d)$. In other words, we establish a connection between the complexity of a function class and the complexity of deep neural networks approximating functi...
We analyze the detection and classification of singularities of functions $f = \chi_B$, where $B \subset \mathbb{R}^d$ and $d = 2,3$. It will be shown how the set $\partial B$ can be extracted by a continuous shearlet transform associated with compactly supported shearlets. Furthermore, if $\partial B$ is a $d-1$ dimensional piecewise smooth manifo...
We introduce bendlets, a shearlet-like system that is based on anisotropic scaling, translation, shearing, and bending of a compactly supported generator. With shearing being linear and bending quadratic in spatial coordinates, bendlets provide what we term a second-order shearlet system. As we show in this article, the decay rates of the associate...
In this thesis we discuss and extend the theory of shearlet systems. These systems were introduced by Guo, Kutyniok, Labate, Lim and Weiss, and have found a multitude of applications in signal- and image processing and related fields since then. The results of this thesis are split into two different but connected parts. In the first part we presen...
We demonstrate that shearlet systems yield superior $N$-term approximation rates compared with wavelet systems of functions whose first or higher order derivatives are piecewise smooth away from smooth discontinuity curves. We will also provide an improved estimate for the decay of shearlet coefficients that intersect a discontinuity curve non-tang...
Linear independence of finite subsets of a frame is closely related to the recently proven Kadison-Singer conjecture. However, for Gabor frames it is still an open question whether every finite subset is linearly independent. In this paper we consider shearlet systems and show that separable compactly supported shearlet systems indeed exhibit the l...