Felix Voigtländer
Katholische Universität Eichstätt-Ingolstadt (KU) | KU · Department of Mathematics

PhD

About

74
Publications
7,826
Reads
791
Citations
Citations since 2017
62 Research Items
757 Citations
[Bar chart: research items and citations per year, 2017–2023]
Introduction
I am a professor of Mathematics of Machine Learning at KU Eichstätt-Ingolstadt. My research focuses on mathematically understanding the potential and limitations of methods from machine learning, in particular deep learning. This includes:
  • Approximation properties of deep neural networks
  • Complexity of sets of neural networks (VC dimension, entropy numbers, sampling numbers)
  • Robustness of deep neural networks
I am also interested in harmonic analysis.
Additional affiliations
June 2021 - October 2021
Technische Universität München
Position
  • Junior Research Group Leader
June 2020 - May 2021
University of Vienna
Position
  • Senior Scientist
February 2018 - present
Katholische Universität Eichstätt-Ingolstadt (KU)
Position
  • PostDoc Position

Publications

Publications (74)
Article
In this paper we show that the Fourier transform induces an isomorphism between the coorbit spaces defined by Feichtinger and Gröchenig of the mixed, weighted Lebesgue spaces $L_{v}^{p,q}$ with respect to the quasi-regular representation of a semi-direct product $\mathbb{R}^{d}\rtimes H$ with suitably chosen dilation group $H$, and certain decomp...
Article
Many smoothness spaces in harmonic analysis are decomposition spaces. In this paper we ask: Given two decomposition spaces, is there an embedding between the two? A decomposition space $\mathcal{D}(\mathcal{Q}, L^p, Y)$ can be described using: a covering $\mathcal{Q}=(Q_{i})_{i\in I}$ of the frequency domain, an exponent $p$ and a sequence space...
Article
We present a framework for the construction of structured, possibly compactly supported Banach frames and atomic decompositions for decomposition spaces. Such a space $\mathcal{D}(\mathcal{Q},L^p,\ell_w^q)$ is defined using a frequency covering $\mathcal{Q}=(Q_i)_{i\in I}$: If $(\varphi_i)_{i}$ is a suitable...
Preprint
We generalize the classical universal approximation theorem for neural networks to the case of complex-valued neural networks. Precisely, we consider feedforward networks with a complex activation function $\sigma : \mathbb{C} \to \mathbb{C}$ in which each neuron performs the operation $\mathbb{C}^N \to \mathbb{C}, z \mapsto \sigma(b + w^T z)$ with...
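The neuron operation stated in the abstract is simple to write out concretely. A minimal numpy sketch, where the exponential activation and all parameter values are our illustrative choices, not taken from the paper (which characterizes exactly which activations yield universal approximation):

```python
import numpy as np

def neuron(z, w, b, sigma):
    """One complex neuron: maps z in C^N to sigma(b + w^T z)."""
    return sigma(b + np.dot(w, z))

# Illustrative activation choice only; the paper determines the precise
# conditions on sigma : C -> C needed for universal approximation.
sigma = np.exp

rng = np.random.default_rng(0)
z = rng.standard_normal(3) + 1j * rng.standard_normal(3)  # input in C^3
w = rng.standard_normal(3) + 1j * rng.standard_normal(3)  # weights in C^3
out = neuron(z, w, b=0.5 + 0j, sigma=sigma)  # a single complex output
```

A full feedforward network composes such neurons layer by layer, exactly as in the real-valued case.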
Preprint
In this paper, we consider Barron functions $f : [0,1]^d \to \mathbb{R}$ of smoothness $\sigma > 0$, which are functions that can be written as \[ f(x) = \int_{\mathbb{R}^d} F(\xi) \, e^{2 \pi i \langle x, \xi \rangle} \, d \xi \quad \text{with} \quad \int_{\mathbb{R}^d} |F(\xi)| \cdot (1 + |\xi|)^{\sigma} \, d \xi < \infty. \] For $\sigma = 1$, th...
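The finiteness condition defining smoothness-$\sigma$ Barron functions can be checked numerically for concrete examples. A sketch for the 1-D Gaussian, whose Fourier transform is known in closed form; the grid and the truncation of the integral to $[-20, 20]$ are our choices:

```python
import numpy as np

# f(x) = exp(-pi x^2) has Fourier transform F(xi) = exp(-pi xi^2).
# The Barron condition requires int |F(xi)| (1 + |xi|)^sigma d xi < infinity;
# for the Gaussian this holds for every sigma > 0.
xi = np.linspace(-20.0, 20.0, 400001)
dxi = xi[1] - xi[0]
F = np.exp(-np.pi * xi**2)

def barron_weight_integral(sigma):
    # Riemann-sum approximation of the weighted Fourier integral
    return float(np.sum(F * (1.0 + np.abs(xi))**sigma) * dxi)

# For sigma = 1 the integral can be computed exactly: 1 + 1/pi
print(barron_weight_integral(1.0))  # ≈ 1.3183
```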
Article
Full-text available
Motivated by results of Dyatlov on Fourier uncertainty principles for Cantor sets and of Knutsen, for joint time-frequency representations (STFT with Gaussian, equivalent to Fock spaces), we suggest a general setting relating localization and uncertainty and prove, within this context, an uncertainty principle for Cantor sets in Bergman spaces of t...
Article
Full-text available
This paper provides maximal function characterizations of anisotropic Triebel–Lizorkin spaces associated to general expansive matrices for the full range of parameters $p \in (0,\infty)$, $q \in (0,\infty]$ and $\alpha \in \mathbb{R}$. The equivalent norm is defined in terms of the decay of wavelet coefficients, quantified by a Peetre-type space ov...
Article
Full-text available
Continuing previous work, this paper provides maximal characterizations of anisotropic Triebel-Lizorkin spaces $\dot{F}^{\alpha}_{p,q}$ for the endpoint case of $p = \infty$ and the full scale of parameters $\alpha \in \mathbb{R}$ and $q \in (0,\infty]$. In particular, a Peetre-type characterization of the anisotropic Bes...
Article
Full-text available
Gabor systems are used in fields ranging from audio processing to digital communication. Such a Gabor system (g,Λ) consists of all time-frequency shifts π(λ)g of a window function g∈L2(R) along a lattice Λ⊂R2. We focus on Gabor systems that are also Riesz sequences, meaning that one can stably reconstruct the coefficients c=(cλ)λ∈Λ from the functio...
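A minimal numpy sketch of the time-frequency shifts $\pi(\lambda)g$ that make up a Gabor system, evaluated on a sample grid; the Gaussian window and the discretization are our illustrative choices:

```python
import numpy as np

def tf_shift(g, t, x, omega):
    """(pi(x, omega) g)(t) = exp(2 pi i omega t) * g(t - x):
    translate by x in time, modulate by omega in frequency."""
    return np.exp(2j * np.pi * omega * t) * g(t - x)

g = lambda t: np.exp(-np.pi * t**2)  # Gaussian window
t = np.linspace(-5.0, 5.0, 1001)

atom = tf_shift(g, t, x=1.0, omega=2.0)
# the modulation has modulus one, so |pi(z) g| is just the shifted window
assert np.allclose(np.abs(atom), g(t - 1.0))
```

A Gabor system collects these atoms over all $\lambda = (x, \omega)$ in a lattice $\Lambda$; the Riesz-sequence property studied in the paper concerns stable recovery of the coefficients from the resulting function.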
Conference Paper
Full-text available
Statistical learning theory provides bounds on the necessary number of training samples needed to reach a prescribed accuracy in a learning problem formulated over a given target class. This accuracy is typically measured in terms of a generalization error, that is, an expected value of a given loss function. However, for several applications --- f...
Article
Full-text available
We consider Gabor frames generated by a general lattice and a window function that belongs to one of the following spaces: the Sobolev space $V_1 = H^1(\mathbb{R}^d)$, the weighted $L^2$-space $V_2 = L_{1 + |x|}^2(\mathbb{R}^d)$, and the space $V_3 = \mathbb{H}^1(\mathbb{R}^d) = V_1 \cap V_2$ consisting of all functions with finite uncertainty product;...
Preprint
Using techniques developed recently in the field of compressed sensing we prove new upper bounds for general (non-linear) sampling numbers of (quasi-)Banach smoothness spaces in $L^2$. In relevant cases such as mixed and isotropic weighted Wiener classes or Sobolev spaces with mixed smoothness, sampling numbers in $L^2$ can be upper bounded by best...
Preprint
This paper provides a classification theorem for expansive matrices $A \in \mathrm{GL}(d, \mathbb{R})$ generating the same anisotropic homogeneous Triebel-Lizorkin space $\dot{\mathbf{F}}^{\alpha}_{p, q}(A)$ for $\alpha \in \mathbb{R}$ and $p,q \in (0,\infty]$. It is shown that $\dot{\mathbf{F}}^{\alpha}_{p, q}(A) = \dot{\mathbf{F}}^{\alpha}_{p, q}...
Preprint
Full-text available
Warped time-frequency systems have recently been introduced as a class of structured continuous frames for functions on the real line. Herein, we generalize this framework to the setting of functions of arbitrary dimensionality. After showing that the basic properties of warped time-frequency representations carry over to higher dimensions, we dete...
Preprint
Full-text available
Statistical learning theory provides bounds on the necessary number of training samples needed to reach a prescribed accuracy in a learning problem formulated over a given target class. This accuracy is typically measured in terms of a generalization error, that is, an expected value of a given loss function. However, for several applications --- f...
Article
Full-text available
We derive an extension of the Walnut–Daubechies criterion for the invertibility of frame operators. The criterion concerns general reproducing systems and Besov-type spaces. As an application, we conclude that $L^2$ frame expansions associated with smooth and fast-decaying reproducing systems on sufficiently fine lattices extend to Besov-type...
Preprint
This paper is a continuation of [arXiv:2104.14361]. It concerns maximal characterizations of anisotropic Triebel-Lizorkin spaces $\dot{\mathbf{F}}^{\alpha}_{p,q}$ for the endpoint case of $p = \infty$ and the full scale of parameters $\alpha \in \mathbb{R}$ and $q \in (0,\infty]$. In particular, a Peetre-type characterization of the anisotropic Bes...
Preprint
This paper provides a self-contained exposition of coorbit spaces associated with integrable group representations and quasi-Banach function spaces. It extends the theory in [Studia Math., 180(3):237-253, 2007] to locally compact groups that do not necessarily possess a compact, conjugation-invariant unit neighborhood. Furthermore, the present pape...
Preprint
Full-text available
Motivated by results of Dyatlov on Fourier uncertainty principles for Cantor sets and of Knutsen, for joint time-frequency representations (STFT with Gaussian, equivalent to Fock spaces), we suggest a general setting relating localization and uncertainty and prove, within this context, an uncertainty principle for Cantor sets in Bergman spaces of t...
Preprint
Full-text available
We study the problem of learning classification functions from noiseless training samples, under the assumption that the decision boundary is of a certain regularity. We establish universal lower bounds for this estimation problem, for general classes of continuous decision boundaries. For the class of locally Barron-regular decision boundaries, we...
Article
Full-text available
Rate distortion theory is concerned with optimally encoding signals from a given signal class S using a budget of R bits, as R→∞. We say that S can be compressed at rate s if we can achieve an error of at most O(R^{-s}) for encoding the given signal class; the supremal compression rate is denoted by s*(S). Given a fixed coding scheme, there usually are...
Preprint
We consider neural network approximation spaces that classify functions according to the rate at which they can be approximated (with error measured in $L^p$) by ReLU neural networks with an increasing number of coefficients, subject to bounds on the magnitude of the coefficients and the number of hidden layers. We prove embedding theorems between...
Article
Full-text available
We consider neural network approximation spaces that classify functions according to the rate at which they can be approximated (with error measured in $L^p$) by ReLU neural networks with an increasing number of coefficients, subject to bounds on the coefficients and the number of hidden layers. We prove embedding theorems between these spaces for...
Article
Full-text available
Assume that $X_{\Sigma} \in \mathbb{R}^{n}$ is a centered random vector following a multivariate normal distribution with positive definite covariance matrix $\Sigma$. Let $g : \mathbb{R}^{n} \rightarrow \mathbb{C}$ be measurable and of moderate growth, say $|g(x)| \lesssim (1 + |x|)^{N}$. We show that the map $\Sigma \mapsto \mathbb{E}[g(X_{\Sigma})]...
Article
Schur's test for integral operators states that if a kernel K:X×Y→C satisfies ∫Y|K(x,y)|dν(y)≤C and ∫X|K(x,y)|dμ(x)≤C, then the associated integral operator is bounded from Lp(ν) into Lp(μ), simultaneously for all p∈[1,∞]. We derive a variant of this result which ensures that the integral operator acts boundedly on the (weighted) mixed-norm Lebesgu...
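The classical version of Schur's test that the abstract starts from has a direct discrete analogue for matrices, easy to check numerically. A sketch with a random nonnegative test matrix of our choosing; the paper's actual contribution, the mixed-norm variant, is not reproduced here:

```python
import numpy as np

rng = np.random.default_rng(1)
K = rng.random((50, 40))  # discrete "kernel": a nonnegative matrix

# Discrete Schur bound: C = max of largest row sum and largest column sum,
# mirroring the two integral conditions on K(x, y).
C = max(np.abs(K).sum(axis=1).max(), np.abs(K).sum(axis=0).max())

# Schur's test gives ||K||_{p -> p} <= C simultaneously for all p in [1, inf];
# for p = 2 the operator norm is the largest singular value.
assert np.linalg.norm(K, 2) <= C
```

The $p = 1$ and $p = \infty$ cases are the row/column-sum bounds themselves; the intermediate $p$ follow by interpolation.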
Preprint
Full-text available
We consider Gabor frames generated by a general lattice and a window function that belongs to one of the following spaces: the Sobolev space $V_1 = H^1(\mathbb R^d)$, the weighted $L^2$-space $V_2 = L_{1 + |x|}^2(\mathbb R^d)$, and the space $V_3 = \mathbb H^1(\mathbb R^d) = V_1 \cap V_2$ consisting of all functions with finite uncertainty product;...
Article
Let G⊂L2(R) be the subspace spanned by a Gabor Riesz sequence (g,Λ) with g∈L2(R) and a lattice Λ⊂R2 of rational density. It was shown recently that if g is well-localized both in time and frequency, then G cannot contain any time-frequency shift π(z)g of g with z∈R2∖Λ. In this paper, we improve the result to the quantitative statement that the L2-d...
Article
Full-text available
We study the expressivity of deep neural networks. Measuring a network’s complexity by its number of connections or by its number of neurons, we consider the class of functions for which the error of best approximation with networks of a given complexity decays at a certain rate when increasing the complexity budget. Using results from classical ap...
Preprint
This paper provides maximal function characterizations of anisotropic Triebel-Lizorkin spaces associated to general expansive matrices for the full range of parameters $p \in (0,\infty)$, $q \in (0,\infty]$ and $\alpha \in \mathbb{R}$. The equivalent norm is defined in terms of the decay of wavelet coefficients, quantified by a Peetre-type space ov...
Preprint
Full-text available
We study the computational complexity of (deterministic or randomized) algorithms based on point samples for approximating or integrating functions that can be well approximated by neural networks. Such algorithms (most prominently stochastic gradient descent and its variants) are used extensively in the field of deep learning. One of the most impo...
Preprint
We show that complex-valued neural networks with the modReLU activation function $\sigma(z) = \mathrm{ReLU}(|z| - 1) \cdot z / |z|$ can uniformly approximate complex-valued functions of regularity $C^n$ on compact subsets of $\mathbb{C}^d$, giving explicit bounds on the approximation rate.
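The modReLU activation from the abstract is a one-liner to implement. A numpy sketch; the handling of $z = 0$, where the formula is extended by $0$, follows the usual convention:

```python
import numpy as np

def modrelu(z, b=1.0):
    """modReLU: sigma(z) = ReLU(|z| - b) * z / |z|, with sigma(0) = 0.
    Shrinks the modulus by b and keeps the phase unchanged."""
    r = np.abs(z)
    # avoid division by zero at the origin; the limiting value there is 0
    phase = np.divide(z, r, out=np.zeros_like(z), where=r > 0)
    return np.maximum(r - b, 0.0) * phase

z = np.array([2 + 0j, 0 + 3j, 0.5 + 0.5j, 0j])
out = modrelu(z)
# moduli at or below the threshold b are sent to 0; phases are preserved
assert np.allclose(out, [1 + 0j, 0 + 2j, 0j, 0j])
```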
Article
We prove a negative result for the approximation of functions defined on compact subsets of Rd (where d≥2) using feedforward neural networks with one hidden layer and arbitrary continuous activation function. In a nutshell, this result claims the existence of target functions that are as difficult to approximate using these neural networks as one m...
Article
We show that sampling or interpolation formulas in reproducing kernel Hilbert spaces can be obtained by reproducing kernels whose dual systems form molecules, ensuring that the size profile of a function is fully reflected by the size profile of its sampled values. The main tool is a local holomorphic calculus for convolution-dominated operators, v...
Preprint
Full-text available
We prove bounds for the approximation and estimation of certain classification functions using ReLU neural networks. Our estimation bounds provide a priori performance guarantees for empirical risk minimization using networks of a suitable size, depending on the number of training samples available. The obtained approximation and estimation rates a...
Preprint
Full-text available
Rate distortion theory is concerned with optimally encoding a given signal class $\mathcal{S}$ using a budget of $R$ bits, as $R\to\infty$. We say that $\mathcal{S}$ can be compressed at rate $s$ if we can achieve an error of $\mathcal{O}(R^{-s})$ for encoding $\mathcal{S}$; the supremal compression rate is denoted $s^\ast(\mathcal{S})$. Given a fi...
Preprint
Full-text available
Schur's test states that if $K:X\times Y\to\mathbb{C}$ satisfies $\int_Y |K(x,y)|d\nu(y)\leq C$ and $\int_X |K(x,y)|d\mu(x)\leq C$, then the associated integral operator acts boundedly on $L^p$ for all $p\in [1,\infty]$. We derive a variant of this result ensuring boundedness on the (weighted) mixed-norm Lebesgue spaces $L_w^{p,q}$ for all $p,q\in...
Article
Full-text available
We analyze the topological properties of the set of functions that can be implemented by neural networks of a fixed size. Surprisingly, this set has many undesirable properties. It is highly non-convex, except possibly for a few exotic activation functions. Moreover, the set is not closed with respect to \(L^p\)-norms, \(0< p < \infty \), for all p...
Preprint
We show that sampling or interpolation formulas in reproducing kernel Hilbert spaces can be obtained by reproducing kernels whose dual systems form molecules, ensuring that the size profile of a function is fully reflected by the size profile of its sampled values. The main tool is a local holomorphic calculus for convolution-dominated operators, v...
Preprint
We derive an extension of the Walnut-Daubechies criterion for the invertibility of frame operators. The criterion concerns general reproducing systems and Besov-type spaces. As an application, we conclude that $L^2$ frame expansions associated with smooth and fast-decaying reproducing systems on sufficiently fine lattices extend to Besov-type space...
Preprint
Full-text available
We study the expressivity of deep neural networks. Measuring a network's complexity by its number of connections or by its number of neurons, we consider the class of functions for which the error of best approximation with networks of a given complexity decays at a certain rate when increasing the complexity budget. Using results from classical ap...
Article
We introduce a family of quasi-Banach spaces — which we call wave packet smoothness spaces — that includes those function spaces which can be characterised by the sparsity of their expansions in Gabor frames, wave atoms, and many other frame constructions. We construct Banach frames for and atomic decompositions of the wave packet smoothness spaces...
Preprint
Full-text available
We consider non-complete Gabor frame sequences generated by an $S_0$-function and a lattice $\Lambda$ and prove that there is $m \in \mathbb{N}$ such that all time-frequency shifts leaving the corresponding Gabor space invariant have their parameters in $\tfrac{1}{m}\Lambda$. We also investigate time-frequency shift invariance under duality aspects...
Preprint
Let $\mathcal G \subset L^2(\mathbb R)$ be the subspace spanned by a Gabor Riesz sequence $(g,\Lambda)$ with $g \in L^2(\mathbb R)$ and a lattice $\Lambda \subset \mathbb R^2$ of rational density. It was shown recently that if $g$ is well-localized both in time and frequency, then $\mathcal G$ cannot contain any time-frequency shift $\pi(z) g$ of $...
Preprint
We introduce a family of quasi-Banach spaces - which we call wave packet smoothness spaces - that includes those function spaces which can be characterised by the sparsity of their expansions in Gabor frames, wave atoms, and many other frame constructions. We construct Banach frames for and atomic decompositions of the wave packet smoothness spaces...
Preprint
We discuss the expressive power of neural networks which use the non-smooth ReLU activation function $\varrho(x) = \max\{0,x\}$ by analyzing the approximation theoretic properties of such networks. The existing results mainly fall into two categories: approximation using ReLU networks with a fixed depth, or using ReLU networks whose depth increases...
Chapter
This chapter is concerned with recent progress in the context of coorbit space theory. Based on a square-integrable group representation, the coorbit theory provides new families of associated smoothness spaces, where the smoothness of a function is measured by the decay of the associated voice transform. Moreover, by discretizing the representatio...
Article
Full-text available
Many representation systems on the sphere have been proposed in the past, such as spherical harmonics, wavelets, or curvelets. Each of these data representations is designed to extract a specific set of features, and choosing the best fixed representation system for a given scientific application is challenging. In this paper, we show that we can l...
Preprint
Full-text available
Convolutional neural networks are the most widely used type of neural networks in applications. In mathematical analysis, however, mostly fully-connected networks are studied. In this paper, we establish a connection between both network architectures. Using this connection, we show that all upper and lower bounds concerning approximation rates of...
Preprint
Full-text available
Many representation systems on the sphere have been proposed in the past, such as spherical harmonics, wavelets, or curvelets. Each of these data representations is designed to extract a specific set of features, and choosing the best fixed representation system for a given scientific application is challenging. In this paper, we show that we can l...
Preprint
Full-text available
This paper is concerned with recent progress in the context of coorbit space theory. Based on a square-integrable group representation, the coorbit theory provides new families of associated smoothness spaces, where the smoothness of a function is measured by the decay of the associated voice transform. Moreover, by discretizing the representation...
Article
Full-text available
We analyze the topological properties of the set of functions that can be implemented by neural networks of a fixed size. Surprisingly, this set has many undesirable properties: It is highly non-convex, except possibly for a few exotic activation functions. Moreover, the set is not closed with respect to $L^p$-norms, $0<p<\infty$, for all practical...
Preprint
Full-text available
We analyze the topological properties of the set of functions that can be implemented by neural networks of a fixed size. Surprisingly, this set has many undesirable properties: It is highly non-convex, except possibly for a few exotic activation functions. Moreover, the set is not closed with respect to $L^p$-norms, $0<p<\infty$, for all practical...
Article
Assume that $X_{\Sigma}\in\mathbb{R}^{n}$ is a random vector following a multivariate normal distribution with zero mean and positive definite covariance matrix $\Sigma$. Let $g:\mathbb{R}^{n}\to\mathbb{C}$ be measurable and of moderate growth, e.g., $|g(x)| \lesssim (1+|x|)^{N}$. We show that the map $\Sigma\mapsto\mathbb{E}\left[g(X_{\Sigma})\rig...
Article
Full-text available
We study the necessary and sufficient complexity of ReLU neural networks---in terms of depth and number of weights---which is required for approximating classifier functions in an $L^2$-sense. As a model class, we consider the set $\mathcal{E}^\beta (\mathbb R^d)$ of possibly discontinuous piecewise $C^\beta$ functions $f : [-1/2, 1/2]^d \to \mathb...
Article
There are two notions of sparsity associated to a frame $\Psi=(\psi_i)_{i\in I}$: Analysis sparsity of $f$ means that the analysis coefficients $(\langle f,\psi_i\rangle)_i$ are sparse, while synthesis sparsity means that $f=\sum_i c_i\psi_i$ with sparse coefficients $(c_i)_i$. Here, sparsity of $c=(c_i)_i$ means $c\in\ell^p(I)$ for a given $p<2$....
Chapter
Full-text available
This article describes how the ideas promoted by the fundamental papers published by M. Frazier and B. Jawerth in the eighties have influenced subsequent developments related to the theory of atomic decompositions and Banach frames for function spaces such as the modulation spaces and Besov-Triebel-Lizorkin spaces. Both of these classes of spaces a...
Article
Full-text available
We consider the problem of characterizing the wavefront set of a tempered distribution $u \in \mathcal{S}'(\mathbb{R}^{d})$ in terms of its continuous wavelet transform, where the latter is defined with respect to a suitably chosen dilation group $H \subset \mathrm{GL}(\mathbb{R}^{d})$...
Article
Driven by growing application requirements and accelerated by current trends in microprocessor design, the number of processor cores on modern supercomputers is increasing from generation to generation. However, load or communication imbalance prevents many codes from taking advantage of the available parallelism, as delays of single processes may...
Article
Full-text available
This article describes how the ideas promoted by the fundamental papers published by M. Frazier and B. Jawerth in the eighties have influenced subsequent developments related to the theory of atomic decompositions and Banach frames for function spaces such as the modulation spaces and Besov-Triebel-Lizorkin spaces. Both of these classes of spaces a...
Article
In the present paper, we investigate whether an embedding of a decomposition space $\mathcal{D}\left(\mathcal{Q},L^{p},Y\right)$ into a given Sobolev space $W^{k,q}(\mathbb{R}^{d})$ exists. As special cases, this includes embeddings into Sobolev spaces of (homogeneous and inhomogeneous) Besov spaces, ($\alpha$)-modulation spaces, shearlet smoothnes...
Article
Full-text available
We consider the problem of characterizing the wavefront set of a tempered distribution $u\in\mathcal{S}'(\mathbb{R}^{d})$ in terms of its continuous wavelet transform, where the latter is defined with respect to a suitably chosen dilation group $H\subset{\rm GL}(\mathbb{R}^{d})$. In this paper we develop a comprehensive and unified approach that al...
Conference Paper
Full-text available
Performance analysis is very important to understand the applications' behavior and to identify bottlenecks. Performance analysis tools should facilitate the exploration of the data collected and help to identify where the analyst has to look. While this functionality can promote the tools' usage on small and medium-size environments, it becomes ma...
Article
Programming and optimising large parallel applications for multi-core systems is an ambitious and time consuming challenge. Therefore, a number of software tools have been developed in the past to assist the programmer in optimising their codes. Scalasca and Vampir are two of these performance-analysis tools that are already well established and re...
Article
Full-text available
The performance of parallel applications is often affected by wait states occurring when processes fail to reach synchronization points simultaneously. In the KOJAK project, we have shown that these wait states and other performance properties can be diagnosed by searching event traces for characteristic patterns and quantifying their severity, i.e.,...