Rémi Gribonval's research while affiliated with French National Centre for Scientific Research and other places

Publications (358)

Preprint
We explore the use of Optical Processing Units (OPU) to compute random Fourier features for sketching, and adapt the overall compressive clustering pipeline to this setting. We also propose some tools to help tune a critical hyper-parameter of compressive clustering.
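The sketching step described above can be illustrated in a few lines of numpy; the toy data, the Gaussian frequency draw, and the sketch size are demo assumptions, and neither the OPU hardware path nor the centroid-recovery stage is shown:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 1000, 2, 64          # samples, dimension, sketch size

# Toy dataset: two Gaussian clusters.
X = np.vstack([rng.normal(-2, 0.3, (n // 2, d)),
               rng.normal(+2, 0.3, (n // 2, d))])

# Random Fourier features: the sketch is the empirical mean of
# exp(i * Omega^T x) over the dataset (one pass, constant memory).
Omega = rng.normal(0, 1.0, (d, m))             # random frequencies
sketch = np.exp(1j * X @ Omega).mean(axis=0)   # shape (m,)

print(sketch.shape)  # (64,)
```

The whole dataset is thus compressed into m complex numbers, independently of n; clustering is then performed from `sketch` alone.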
Preprint
We consider general approximation families encompassing ReLU neural networks. On the one hand, we introduce a new property, that we call $\infty$-encodability, which lays a framework that we use (i) to guarantee that ReLU networks can be uniformly quantized and still have approximation speeds comparable to unquantized ones, and (ii) to prove that R...
Preprint
We address the differentially private estimation of multiple quantiles (MQ) of a dataset, a key building block in modern data analysis. We apply the recent non-smoothed Inverse Sensitivity (IS) mechanism to this specific problem and establish that the resulting method is closely related to the current state-of-the-art, the JointExp algorithm, shari...
Article
Full-text available
We study the expressivity of deep neural networks. Measuring a network’s complexity by its number of connections or by its number of neurons, we consider the class of functions for which the error of best approximation with networks of a given complexity decays at a certain rate when increasing the complexity budget. Using results from classical ap...
Preprint
We consider the problem of recovering elements of a low-dimensional model from under-determined linear measurements. To perform recovery, we consider the minimization of a convex regularizer subject to a data fit constraint. Given a model, we ask ourselves what is the "best" convex regularizer to perform its recovery. To answer this question, we de...
Preprint
Comparing probability distributions is at the crux of many machine learning algorithms. Maximum Mean Discrepancies (MMD) and Optimal Transport distances (OT) are two classes of distances between probability measures that have attracted abundant attention in past years. This paper establishes some conditions under which the Wasserstein distance can...
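As background for the comparison above, a minimal unbiased estimator of the squared MMD with a Gaussian kernel (the bandwidth `gamma` and the toy samples are demo assumptions):

```python
import numpy as np

def mmd2_unbiased(X, Y, gamma=1.0):
    """Unbiased estimate of squared MMD with the Gaussian kernel
    k(a, b) = exp(-gamma * ||a - b||^2)."""
    def k(A, B):
        sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * sq)
    Kxx, Kyy, Kxy = k(X, X), k(Y, Y), k(X, Y)
    m, n = len(X), len(Y)
    return ((Kxx.sum() - np.trace(Kxx)) / (m * (m - 1))
            + (Kyy.sum() - np.trace(Kyy)) / (n * (n - 1))
            - 2 * Kxy.mean())

rng = np.random.default_rng(0)
X = rng.normal(0, 1, (200, 2))
Y = rng.normal(3, 1, (200, 2))   # shifted distribution
# Identical samples give an estimate near zero; distinct ones a larger value.
print(mmd2_unbiased(X, Y) > mmd2_unbiased(X, X[::-1]))  # True
```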
Preprint
The problem of approximating a dense matrix by a product of sparse factors is a fundamental problem for many signal processing and machine learning tasks. It can be decomposed into two subproblems: finding the position of the non-zero coefficients in the sparse factors, and determining their values. While the first step is usually seen as the most...
Preprint
Many well-known matrices Z are associated to fast transforms corresponding to factorizations of the form Z = X^(L) ... X^(1), where each factor X^(l) is sparse. Based on a general result for the case with two factors, established in a companion paper, we investigate essential uniqueness of such factorizations. We show some identifiability results fo...
Preprint
Full-text available
Sparse matrix factorization is the problem of approximating a matrix Z by a product of L sparse factors X^(L) X^(L-1) ... X^(1). This paper focuses on identifiability issues that appear in this problem, with a view to better understanding under which sparsity constraints the problem is well-posed. We give conditions under which the problem of factoriz...
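A classical instance of such a factorization, useful as a sanity check, is the Walsh-Hadamard transform (this is the textbook construction, not the identifiability analysis of the abstract):

```python
import numpy as np
from functools import reduce

H2 = np.array([[1, 1], [1, -1]])
L = 4
N = 2 ** L                       # 16

# Dense target: the Walsh-Hadamard matrix H_N = H2 ⊗ ... ⊗ H2 (L times).
Z = reduce(np.kron, [H2] * L)

# Sparse factors: X^(l) = I_{2^(l-1)} ⊗ H2 ⊗ I_{N/2^l},
# each with only 2 nonzeros per row (2N nonzeros per factor vs N^2 dense).
factors = [np.kron(np.kron(np.eye(2 ** (l - 1)), H2), np.eye(N // 2 ** l))
           for l in range(1, L + 1)]

prod = reduce(np.matmul, factors)
print(np.array_equal(prod, Z))                   # True
print([int((F != 0).sum()) for F in factors])    # [32, 32, 32, 32]
```

Applying the L sparse factors costs O(N log N) instead of O(N^2) for the dense matrix, which is exactly the "fast transform" structure at stake.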
Preprint
Full-text available
Daily pandemic surveillance, often achieved through the estimation of the reproduction number, constitutes a critical challenge for national health authorities to design countermeasures. In an earlier work, we proposed to formulate the estimation of the reproduction number as an optimization problem, combining data-model fidelity and space-time reg...
Article
Big data can be a blessing: with very large training data sets it becomes possible to perform complex learning tasks with unprecedented accuracy. Yet, this improved performance comes at the price of enormous computational challenges. Thus, one may wonder: Is it possible to leverage the information content of huge data sets while keeping computation...
Article
This preprint results from a split, profound restructuring, and improvement of https://hal.inria.fr/hal-01544609v2. It is a companion paper to https://hal.inria.fr/hal-01544609v3.
Preprint
Full-text available
Neural networks with the Rectified Linear Unit (ReLU) nonlinearity are described by a vector of parameters $\theta$, and realized as a piecewise linear continuous function $R_{\theta}: x \in \mathbb R^{d} \mapsto R_{\theta}(x) \in \mathbb R^{k}$. Natural scalings and permutations operations on the parameters $\theta$ leave the realization unchanged...
Article
This work addresses the problem of learning from large collections of data with privacy guarantees. The compressive learning framework proposes to deal with the large scale of datasets by compressing them into a single vector of generalized random moments, called a sketch vector, from which the learning task is then performed. We provide sharp boun...
Preprint
Diffusing a graph signal at multiple scales requires computing the action of the exponential of several multiples of the Laplacian matrix. We tighten a bound on the approximation error of truncated Chebyshev polynomial approximations of the exponential, hence significantly improving a priori estimates of the polynomial order for a prescribed error....
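The computation in question can be sketched as follows; the path-graph Laplacian, the impulse signal, and the truncation order are illustrative choices, and `lam_max = 4` is a valid spectral upper bound for this particular graph:

```python
import numpy as np

# Path graph Laplacian (symmetric PSD) and a toy signal.
n, t = 50, 1.0
L = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
x = np.zeros(n); x[n // 2] = 1.0

lam_max = 4.0   # upper bound on spec(L) for a path graph
K = 12          # truncation order of the Chebyshev expansion

# Chebyshev coefficients of s -> exp(-t * lam_max * (s + 1) / 2) on [-1, 1].
c = np.polynomial.chebyshev.chebinterpolate(
    lambda s: np.exp(-t * lam_max * (s + 1) / 2), K)

# Evaluate sum_k c_k T_k(M) x with M = (2/lam_max) L - I via the
# three-term Chebyshev recurrence (matrix-vector products only).
M = (2 / lam_max) * L - np.eye(n)
Tkm1, Tk = x, M @ x
y = c[0] * Tkm1 + c[1] * Tk
for k in range(2, K + 1):
    Tkm1, Tk = Tk, 2 * M @ Tk - Tkm1
    y += c[k] * Tk

# Compare to the exact action of the matrix exponential.
w, V = np.linalg.eigh(L)
y_exact = V @ (np.exp(-t * w) * (V.T @ x))
print(np.max(np.abs(y - y_exact)))   # small, well below 1e-6
```

The point of tightening the error bound is precisely to choose K as small as possible for a prescribed accuracy, since each order costs one matrix-vector product.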
Chapter
Optimal Transport (OT) for structured data has received much attention in the machine learning community, especially for addressing graph classification or graph transfer learning tasks. In this paper, we present the Diffusion Wasserstein (\(\mathtt {DW}\)) distance, as a generalization of the standard Wasserstein distance to undirected and connect...
Article
Recent advances in audio declipping have substantially improved the state of the art. Yet, practitioners need guidelines to choose a method, and while existing benchmarks have been instrumental in advancing the field, larger-scale experiments are needed to guide such choices. First, we show that the clipping levels in existing small-scale benchmark...
Preprint
Full-text available
Optimal transport distances have become a classic tool to compare probability distributions and have found many applications in machine learning. Yet, despite recent algorithmic developments, their complexity prevents their direct use on large scale datasets. To overcome this challenge, a common workaround is to compute these distances on minibatch...
Article
This paper presents new theoretical results on sparse recovery guarantees for a greedy algorithm, Orthogonal Matching Pursuit (OMP), in the context of continuous parametric dictionaries. Here, the continuous setting means that the dictionary is made up of an infinite uncountable number of atoms. In this work, we rely on the Hilbert structure of the...
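For reference, the discrete (finite-dictionary) version of OMP that the continuous setting generalizes is compact; the random dictionary and the sparsity level are demo assumptions:

```python
import numpy as np

def omp(D, y, k):
    """Orthogonal Matching Pursuit: greedily select k atoms of D
    (columns, assumed unit-norm) and refit the coefficients by least squares."""
    residual, support = y.copy(), []
    for _ in range(k):
        j = int(np.argmax(np.abs(D.T @ residual)))   # best-correlated atom
        support.append(j)
        coefs, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coefs
    return support, coefs

rng = np.random.default_rng(0)
D = rng.normal(size=(64, 256))
D /= np.linalg.norm(D, axis=0)                       # unit-norm atoms
x_true = np.zeros(256); x_true[[10, 50, 200]] = [1.0, -2.0, 1.5]
y = D @ x_true

support, coefs = omp(D, y, 3)
print(sorted(support))   # likely [10, 50, 200] for this incoherent dictionary
```

In the continuous setting of the paper, the `argmax` over 256 columns becomes an optimization over an uncountable family of parametrized atoms.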
Preprint
In this short paper we bridge two seemingly unrelated sparse approximation topics: continuous sparse coding and low-rank approximations. We show that for a specific choice of continuous dictionary, linear systems with nuclear-norm regularization have the same solutions as a BLasso problem. Although this fact was already partially understood in the...
Article
Full-text available
Among the different indicators that quantify the spread of an epidemic such as the on-going COVID-19, stands first the reproduction number which measures how many people can be contaminated by an infected person. In order to permit the monitoring of the evolution of this number, a new estimation procedure is proposed here, assuming a well-accepted...
Preprint
This article considers "sketched learning," or "compressive learning," an approach to large-scale machine learning where datasets are massively compressed before learning (e.g., clustering, classification, or regression) is performed. In particular, a "sketch" is first constructed by computing carefully chosen nonlinear random features (e.g., rando...
Article
Full-text available
We characterize proximity operators, that is to say functions that map a vector to a solution of a penalized least-squares optimization problem. Proximity operators of convex penalties have been widely studied and fully characterized by Moreau. They are also widely used in practice with nonconvex penalties such as the \(\ell^0\) pseudo-norm, yet t...
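Two standard examples, one convex and one nonconvex, matching the penalties mentioned above; each `prox` maps z to a minimizer of (1/2)||x - z||^2 + penalty(x), computed coordinate-wise:

```python
import numpy as np

def prox_l1(z, lam):
    """Proximity operator of lam * ||.||_1: soft-thresholding."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def prox_l0(z, lam):
    """A proximity operator of lam * ||.||_0: hard-thresholding
    (set-valued exactly at the threshold; we pick the zero branch)."""
    return np.where(np.abs(z) > np.sqrt(2 * lam), z, 0.0)

z = np.array([-3.0, -0.5, 0.2, 1.4])
print(prox_l1(z, 1.0))   # [-2.   0.   0.   0.4]
print(prox_l0(z, 1.0))   # [-3.   0.   0.   0. ]
```

The contrast is visible above: soft-thresholding shrinks the surviving entries, hard-thresholding keeps them untouched, which is one reason nonconvex penalties are attractive in practice.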
Preprint
Full-text available
Among the different indicators that quantify the spread of an epidemic, such as the on-going COVID-19, stands first the reproduction number which measures how many people can be contaminated by an infected person. In order to permit the monitoring of the evolution of this number, a new estimation procedure is proposed here, assuming a well-accept...
Preprint
Full-text available
Recent advances in audio declipping have substantially improved the state of the art in certain saturation regimes. Yet, practitioners need guidelines to choose a method, and while existing benchmarks have been instrumental in advancing the field, larger-scale experiments are needed to guide such choices. First, we show that the saturation levels i...
Preprint
We provide statistical learning guarantees for two unsupervised learning tasks in the context of compressive statistical learning, a general framework for resource-efficient large-scale learning that we introduced in a companion paper. The principle of compressive statistical learning is to compress a training collection, in one pass, into a low-di...
Preprint
We tackle the problem of producing compact models, maximizing their accuracy for a given model size. A standard solution is to train networks with Quantization Aware Training, where the weights are quantized during training and the gradients approximated with the Straight-Through Estimator. In this paper, we extend this approach to work with extrem...
Article
Full-text available
Measures of brain activity through functional magnetic resonance imaging (fMRI) or electroencephalography (EEG), two complementary modalities, are ground solutions in the context of neurofeedback (NF) mechanisms for brain rehabilitation protocols. While NF-EEG (in which real-time neurofeedback scores are computed from EEG signals) has been explored...
Article
We investigate the problem of recovering jointly $r$-rank and $s$-bisparse matrices from as few linear measurements as possible, considering arbitrary measurements as well as rank-one measurements. In both cases, we show that $m \asymp r s \ln(en/s)$ measurements make the recovery possible in theory, meaning via a nonprac...
Preprint
We propose a numerical interferometry method for identification of optical multiply-scattering systems when only intensity can be measured. Our method simplifies the calibration of the optical transmission matrices from a quadratic to a linear inverse problem by first recovering the phase of the measurements. We show that by carefully designing the...
Preprint
Full-text available
Optimal transport distances are powerful tools to compare probability distributions and have found many applications in machine learning. Yet their algorithmic complexity prevents their direct use on large scale datasets. To overcome this challenge, practitioners compute these distances on minibatches, i.e., they average the outcome of several...
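The minibatch workaround is easy to state in code for the 1-D case, where `scipy.stats.wasserstein_distance` gives the exact per-batch distance; the batch sizes and the toy data are demo assumptions, and as the abstract discusses, the averaged estimator is biased:

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 100_000)
y = rng.normal(0.5, 1.0, 100_000)   # same shape, shifted by 0.5

def minibatch_w1(x, y, batch=256, n_batches=50):
    """Average the exact W1 distance over random minibatch pairs,
    instead of solving one large optimal transport problem."""
    vals = [wasserstein_distance(rng.choice(x, batch), rng.choice(y, batch))
            for _ in range(n_batches)]
    return np.mean(vals)

print(minibatch_w1(x, y))   # close to the true W1 = 0.5, slightly biased upward
```

The upward bias follows from Jensen's inequality: W1 is convex in its arguments and the empirical minibatch measures average to the true ones.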
Preprint
Full-text available
In this paper, we address the problem of reducing the memory footprint of ResNet-like convolutional network architectures. We introduce a vector quantization method that aims at preserving the quality of the reconstruction of the network outputs and not its weights. The advantage of our approach is that it minimizes the loss reconstruction error fo...
Preprint
In this paper we tackle the problem of recovering the phase of complex linear measurements when only magnitude information is available and we control the input. We are motivated by the recent development of dedicated optics-based hardware for rapid random projections which leverages the propagation of light in random media. A signal of interest $\...
Conference Paper
Full-text available
In this article we investigate how the local Gaussian model (LGM) can be applied to separate sound sources in the higher-order ambisonics (HOA) domain. First, we show that in the HOA domain, the mathematical formalism of the local Gaussian model remains the same as in the microphone domain. Second, using an off-the-shelf source separation toolbox (...
Article
There are two major routes to address inverse problems in signal and image processing, such as denoising or deblurring. A first route relies on Bayesian modeling, where prior probabilities embody models of both the distribution of the unknown variables and their statistical dependence with respect to the observed data. Estimation typically relies o...
Article
In sketched clustering, a dataset of T samples is first sketched down to a vector of modest size, from which the centroids are subsequently extracted. Advantages include i) reduced storage complexity and ii) centroid extraction complexity independent of T. For the sketching methodology recently proposed by Keriven et al., which can be interpreted...
Article
In this paper, we propose a way to combine two acceleration techniques for the $\ell_1$-regularized least squares problem: safe screening tests, which make it possible to eliminate useless dictionary atoms; and the use of fast structured approximations of the dictionary matrix. To do so, we introduce a new family of screening tests, termed stable screeni...
Preprint
Full-text available
We study the expressivity of deep neural networks. Measuring a network's complexity by its number of connections or by its number of neurons, we consider the class of functions for which the error of best approximation with networks of a given complexity decays at a certain rate when increasing the complexity budget. Using results from classical ap...
Preprint
Full-text available
This paper presents new theoretical results on sparse recovery guarantees for a greedy algorithm, Orthogonal Matching Pursuit (OMP), in the context of continuous parametric dictionaries. Here, the continuous setting means that the dictionary is made up of an infinite (potentially uncountable) number of atoms. In this work, we rely on the Hilbert st...
Preprint
Full-text available
Measures of brain activity through functional magnetic resonance imaging (fMRI) or electroencephalography (EEG), two complementary modalities, are ground solutions in the context of neuro-feedback (NF) mechanisms for brain-rehabilitation protocols. Though NF-EEG (real-time neurofeedback scores computed from EEG) has been explored for a very long t...
Preprint
Modern neural networks are over-parametrized. In particular, each rectified linear hidden unit can be modified by a multiplicative factor by adjusting input and output weights, without changing the rest of the network. Inspired by the Sinkhorn-Knopp algorithm, we introduce a fast iterative method for minimizing the L2 norm of the weights, equivalen...
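For a single hidden layer, the rescaling symmetry and the closed-form per-unit balancing (to which the Sinkhorn-like iteration of the abstract reduces in this case) can be checked numerically; the network sizes are arbitrary, and including the bias in the balanced norm is a choice made for this demo:

```python
import numpy as np

rng = np.random.default_rng(0)
relu = lambda z: np.maximum(z, 0)

W1, b1 = rng.normal(size=(10, 3)), rng.normal(size=10)   # hidden layer
W2 = rng.normal(size=(2, 10))                            # output layer
f = lambda x: W2 @ relu(W1 @ x + b1)

# Rescaling symmetry: scaling unit i's incoming weights (and bias) by s_i > 0
# and its outgoing weights by 1/s_i leaves the realization unchanged, since
# relu(s*z) = s*relu(z) for s > 0. Choosing s_i = sqrt(||out_i|| / ||in_i||)
# minimizes the squared L2 norm of the weights over this symmetry group.
nin = np.sqrt(np.sum(W1**2, axis=1) + b1**2)
nout = np.linalg.norm(W2, axis=0)
s = np.sqrt(nout / nin)
W1b, b1b, W2b = s[:, None] * W1, s * b1, W2 / s
g = lambda x: W2b @ relu(W1b @ x + b1b)

x = rng.normal(size=3)
print(np.allclose(f(x), g(x)))                           # True: same function
before = np.sum(W1**2) + np.sum(b1**2) + np.sum(W2**2)
after = np.sum(W1b**2) + np.sum(b1b**2) + np.sum(W2b**2)
print(after <= before)                                   # True: norm decreased
```

The per-unit optimum follows from AM-GM: s^2 a^2 + b^2 / s^2 is minimized at s = sqrt(b/a), with value 2ab <= a^2 + b^2.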
Preprint
Full-text available
This preprint is not a finished product. It is presently intended to gather community feedback. We investigate the problem of recovering jointly $r$-rank and $s$-bisparse matrices from as few linear measurements as possible, considering arbitrary measurements as well as rank-one measurements. In both cases, we show that $m \asymp r s \ln(en/s)$ mea...
Article
In many applications it is useful to replace the Moore-Penrose pseudoinverse (MPP) by a different generalized inverse with more favorable properties. We may want, for example, to have many zero entries, but without giving up too much of the stability of the MPP. One way to quantify stability is by how much the Frobenius norm of a generalized invers...
Chapter
The spectral graph wavelet transform (SGWT) defines wavelet transforms appropriate for data defined on the vertices of a weighted graph. Weighted graphs provide an extremely flexible way to model the data domain for a large number of important applications (such as data defined on vertices of social networks, transportation networks, brain connecti...
Preprint
In this paper, we propose a way to combine two acceleration techniques for the $\ell_1$-regularized least squares problem: safe screening tests, which make it possible to eliminate useless dictionary atoms, and the use of fast structured approximations of the dictionary matrix. To do so, we introduce a new family of screening tests, termed stable screening, w...
Preprint
Full-text available
This paper addresses the general problem of blind echo retrieval, i.e., given M sensors measuring in the discrete-time domain M mixtures of K delayed and attenuated copies of an unknown source signal, can the echo locations and weights be recovered? This problem has broad applications in fields such as sonar, seismology, ultrasound or room acous...
Preprint
In many applications it is useful to replace the Moore-Penrose pseudoinverse (MPP) by another generalized inverse with more favorable properties. We may want, for example, to have many zero entries, but without giving up too much of the stability of the MPP. One way to quantify stability is to look at how much the Frobenius norm of a generalized in...
Preprint
We characterize proximity operators, that is to say functions that map a vector to a solution of a penalized least squares optimization problem. Proximity operators of convex penalties have been widely studied and fully characterized by Moreau. They are also widely used in practice with nonconvex penalties such as the $\ell^0$ pseudo-norm, yet the...
Preprint
There are two major routes to address the ubiquitous family of inverse problems appearing in signal and image processing, such as denoising or deblurring. A first route relies on Bayesian modeling, where prior probabilities are used to embody models of both the distribution of the unknown variables and their statistical dependence with the observed...
Preprint
The 1-norm is a good convex regularization for the recovery of sparse vectors from under-determined linear measurements. No other convex regularization seems to surpass its sparse recovery performance. How can this be explained? To answer this question, we define several notions of "best" (convex) regularization in the context of general low-dimen...
Chapter
A new dictionary learning model is introduced where the dictionary matrix is constrained as a sum of R Kronecker products of K terms. It offers a more compact representation and requires fewer training data than the general dictionary learning model, while generalizing Tucker dictionary learning. The proposed Higher Order Sum of Kroneckers model ca...
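The parameter savings of such a sum-of-Kroneckers dictionary are easy to see by construction (R, K and the factor shapes below are arbitrary demo choices, not values from the paper):

```python
import numpy as np
from functools import reduce

rng = np.random.default_rng(0)
R, shapes = 3, [(4, 5), (4, 5), (4, 5)]    # R summands, K = 3 Kronecker terms

# D = sum_{r=1}^{R} A_r^(1) ⊗ A_r^(2) ⊗ A_r^(3): a 64 x 125 dictionary
# parameterized by only R * K * 4 * 5 = 180 entries instead of 8000.
terms = [[rng.normal(size=s) for s in shapes] for _ in range(R)]
D = sum(reduce(np.kron, t) for t in terms)

print(D.shape)                              # (64, 125)
n_params = sum(a.size for t in terms for a in t)
print(n_params, D.size)                     # 180 8000
```

R = 1 recovers a pure Tucker-style Kronecker dictionary; increasing R trades compactness for expressiveness.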
Article
Full-text available
In this paper, we study the preservation of information in ill-posed non-linear inverse problems, where the measured data is assumed to live near a low-dimensional model set. We provide necessary and sufficient conditions for the existence of a so-called instance optimal decoder, i.e. that is robust to noise and modelling error. Inspired by existin...
Article
Full-text available
In this paper, we propose a new adaptive modulation based on the wavelet packet transform, which targets a good resistance against frequency selective channels while avoiding a large PAPR. Classical multi-carrier modulation schemes divide the channel bandwidth into narrowband sub-channels to improve its robustness against frequency selective fading...
Article
Full-text available
To work at scale, a complete image indexing system comprises two components: An inverted file index to restrict the actual search to only a subset that should contain most of the items relevant to the query; An approximate distance computation mechanism to rapidly scan these lists. While supervised deep learning has recently enabled improvements to...
Article
In sketched clustering, the dataset is first sketched down to a vector of modest size, from which the cluster centers are subsequently extracted. The goal is to perform clustering more efficiently than with methods that operate on the full training data, such as k-means++. For the sketching methodology recently proposed by Keriven, Gribonval, et al...
Article
We propose a unified modeling and algorithmic framework for audio restoration problems. It encompasses analysis sparse priors as well as more classical synthesis sparse priors, and regular sparsity as well as various forms of structured sparsity embodied by shrinkage operators (such as social shrinkage). The versatility of the framework is illustrat...
Article
Full-text available
The graph Fourier transform (GFT) is in general dense and requires O(n^2) time to compute and O(n^2) memory space to store. In this paper, we continue our previous work on the approximate fast graph Fourier transform (FGFT). The FGFT is computed via a truncated Jacobi algorithm, and is defined as the product of J Givens rotations (very sparse orthogo...
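A toy version of the greedy truncated-Jacobi idea: J Givens rotations, each zeroing the currently largest off-diagonal entry of a symmetric matrix. This dense implementation only illustrates the principle; the point of the FGFT is that applying the J sparse rotations costs O(J) rather than O(n^2):

```python
import numpy as np

def givens(n, p, q, c, s):
    """Givens rotation acting on coordinates (p, q)."""
    G = np.eye(n)
    G[p, p] = G[q, q] = c
    G[p, q], G[q, p] = s, -s
    return G

def truncated_jacobi(S, J):
    """Approximately diagonalize symmetric S with a budget of J rotations."""
    n = S.shape[0]
    U = np.eye(n)
    for _ in range(J):
        A = np.abs(S - np.diag(np.diag(S)))
        p, q = np.unravel_index(np.argmax(A), A.shape)
        tau = (S[q, q] - S[p, p]) / (2 * S[p, q])
        t = np.sign(tau) / (np.abs(tau) + np.hypot(1, tau))
        c = 1 / np.hypot(1, t)
        G = givens(n, p, q, c, t * c)
        S = G.T @ S @ G              # zeroes S[p, q]
        U = U @ G                    # accumulated approximate eigenvectors
    return U, S

# Toy symmetric matrix (a graph Laplacian would do just as well).
rng = np.random.default_rng(0)
M = rng.normal(size=(8, 8)); M = M + M.T
U, S = truncated_jacobi(M.copy(), J=30)

off = lambda A: np.linalg.norm(A - np.diag(np.diag(A)))
print(off(S) < off(M))   # True: each rotation strictly shrinks off-diagonal mass
```

Each rotation reduces the squared off-diagonal Frobenius norm by exactly 2 S[p, q]^2, so stopping after J rotations trades diagonalization accuracy for a fast, sparse approximate GFT given by the product of the rotations.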
Conference Paper
In this paper a multi-modal approach is presented and validated on real data to estimate the brain neuronal sources based on EEG and fMRI. Combining these two modalities can lead to source estimations with high spatio-temporal resolution. The joint method is based on the idea of a linear model already presented in the literature where each of t...
Article
Full-text available
Over the past decades, a multitude of different brain source imaging algorithms have been developed to identify the neural generators underlying the surface electroencephalography measurements. While most of these techniques focus on determining the source positions, only a small number of recently developed algorithms provides an indication of the...
Article
For large-scale visual search, highly compressed yet meaningful representations of images are essential. Structured vector quantizers based on product quantization and its variants are usually employed to achieve such compression while minimizing the loss of accuracy. Yet, unlike binary hashing schemes, these unsupervised methods have not yet benef...
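The product-quantization baseline that such methods build on can be sketched as follows; the codebooks here are random for illustration, whereas in practice each is learned by k-means on the corresponding subvector split:

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, K = 16, 4, 256          # dimension, subspaces, codewords per subspace

# Toy codebooks: one per subspace, each with K codewords of length d/m.
codebooks = rng.normal(size=(m, K, d // m))

def pq_encode(x):
    """Assign each of the m subvectors to its nearest codeword:
    d floats are compressed into m small integer codes."""
    subs = x.reshape(m, d // m)
    return np.array([int(np.argmin(((C - s) ** 2).sum(1)))
                     for C, s in zip(codebooks, subs)])

def pq_decode(codes):
    """Reconstruct the vector from its codes (lossy)."""
    return np.concatenate([codebooks[i, c] for i, c in enumerate(codes)])

x = rng.normal(size=d)
codes = pq_encode(x)
print(codes.shape)            # (4,): 4 byte-sized codes instead of 16 floats
err = np.linalg.norm(x - pq_decode(codes))
```

Approximate distances to a query are then computed from per-subspace lookup tables over the codes, which is what makes scanning billions of compressed vectors feasible.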
Article
This is the second part of a two-paper series on generalized inverses that minimize matrix norms. In Part II we focus on generalized inverses that are minimizers of entrywise $p$-norms, whose main representative is the sparse pseudoinverse for $p = 1$. We are motivated by the idea to replace the Moore-Penrose pseudoinverse by a sparser generalized inv...
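For the p = 1 case, each column of a sparse right inverse of a fat full-row-rank matrix can be computed by a small linear program (a minimal sketch using `scipy.optimize.linprog`; the matrix sizes are demo assumptions):

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
m, n = 5, 12
A = rng.normal(size=(m, n))       # fat, full row rank (almost surely)

# Sparse generalized inverse: each column g_j of G solves
#   min ||g||_1  subject to  A g = e_j,
# rewritten as an LP in (u, v) >= 0 with g = u - v.
G = np.zeros((n, m))
for j in range(m):
    e = np.zeros(m); e[j] = 1.0
    res = linprog(c=np.ones(2 * n),
                  A_eq=np.hstack([A, -A]), b_eq=e,
                  bounds=(0, None), method="highs")
    G[:, j] = res.x[:n] - res.x[n:]

P = np.linalg.pinv(A)
print(np.allclose(A @ G, np.eye(m)))                       # True: right inverse
print((np.abs(G) > 1e-9).sum(), (np.abs(P) > 1e-9).sum())  # G is much sparser
```

Since A G = I, the matrix G satisfies A G A = A and is thus a bona fide generalized inverse; generically each column has only m nonzeros (a basic LP solution), versus a fully dense Moore-Penrose pseudoinverse.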