Rémi Gribonval’s research while affiliated with Claude Bernard University Lyon 1 and other places


Publications (385)


Identifying a Piecewise Affine Signal From Its Nonlinear Observation—Application to DNA Replication Analysis
  • Article

January 2025 · 12 Reads · IEEE Transactions on Signal Processing

Clara Lage · [...] · Rémi Gribonval

We consider a nonlinear inverse problem where the unknown is assumed to be piecewise affine, motivated by an application in DNA replication analysis. Since traditional algorithmic and theoretical tools from linear inverse problems do not apply, we propose a novel formalism and computational approach to address it. In the noiseless case, we establish sufficient identifiability conditions and prove that the solution is the unique minimizer of a nonconvex optimization problem. The latter is especially challenging because of its multiple local minima. We propose an optimization algorithm that provably finds the global solution in the noiseless case and is shown to be numerically effective for noisy signals. When instantiated in a DNA replication analysis scenario, where the unknown is a so-called timing profile, the approach is shown to be more computationally efficient than state-of-the-art optimization methods by at least 30 orders of magnitude. Moreover, it automatically recovers the full configuration of the DNA replication dynamics, which is crucial for DNA replication analysis and was not possible with previous methods.
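To make the setting concrete, here is a minimal synthetic sketch of such an observation model; the nonlinearity `phi` and the signal parameters are hypothetical illustrations, not the paper's actual DNA replication model:

```python
import numpy as np

def piecewise_affine(t, breakpoints, slopes, intercept=0.0):
    """Evaluate a continuous piecewise affine signal on the grid t."""
    s = np.full_like(t, intercept, dtype=float)
    prev = t[0]
    for b, a in zip(list(breakpoints) + [t[-1]], slopes):
        seg = np.clip(t, prev, b)   # each segment contributes slope a on [prev, b]
        s += a * (seg - prev)
        prev = b
    return s

t = np.linspace(0.0, 1.0, 500)
x = piecewise_affine(t, breakpoints=[0.3, 0.7], slopes=[2.0, -1.0, 0.5])
phi = np.tanh                        # hypothetical known pointwise nonlinearity
y = phi(x) + 0.01 * np.random.default_rng(0).normal(size=t.size)
# The inverse problem: recover the breakpoints and slopes of x given only y.
```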


PASCO (PArallel Structured COarsening): an overlay to speed up graph clustering algorithms

December 2024 · 5 Reads

Clustering the nodes of a graph is a cornerstone of graph analysis and has been extensively studied. However, some popular methods are not suitable for very large graphs: e.g., spectral clustering requires the computation of the spectral decomposition of the Laplacian matrix, which is not tractable for large graphs with a large number of communities. This work introduces PASCO, an overlay that accelerates clustering algorithms. Our method consists of three steps: (1) we compute several independent small graphs representing the input graph by applying an efficient and structure-preserving coarsening algorithm; (2) a clustering algorithm is run in parallel on each small graph, providing several partitions of the initial graph; (3) these partitions are aligned and combined with an optimal transport method to output the final partition (see the sketch below). The PASCO framework is based on two key contributions: a novel global algorithm structure designed to enable parallelization, and a fast, empirically validated graph coarsening algorithm that preserves structural properties. We demonstrate the strong performance of PASCO in terms of computational efficiency, structural preservation, and output partition quality, evaluated on both synthetic and real-world graph datasets.
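As a rough illustration of step (3), the sketch below aligns and fuses several partitions of the same node set; it substitutes a Hungarian maximum-overlap matching and a majority vote for the optimal transport alignment actually used in PASCO, and it omits the coarsening and parallel clustering steps:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def align_partition(ref, other, k):
    """Relabel `other` (labels in 0..k-1) to best match `ref`, via
    maximum-overlap bipartite matching (a stand-in for the
    optimal-transport alignment used in PASCO)."""
    C = np.zeros((k, k), dtype=int)
    for a, b in zip(ref, other):
        C[a, b] += 1                          # label co-occurrence counts
    rows, cols = linear_sum_assignment(-C)    # maximise total overlap
    relabel = np.empty(k, dtype=int)
    relabel[cols] = rows
    return relabel[other]

def combine_partitions(partitions, k):
    """Fuse aligned partitions by per-node majority vote."""
    ref = partitions[0]
    aligned = np.stack([ref] + [align_partition(ref, p, k)
                                for p in partitions[1:]])
    return np.array([np.bincount(aligned[:, i], minlength=k).argmax()
                     for i in range(aligned.shape[1])])
```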


Butterfly factorization with error guarantees

November 2024 · 14 Reads

In this paper, we investigate the butterfly factorization problem, i.e., the problem of approximating a matrix by a product of sparse and structured factors. We propose a new formal mathematical description of such factors that encompasses many variations of butterfly factorization with different choices of the prescribed sparsity patterns. Among these supports, we identify those that ensure the factorization problem admits an optimum, thanks to a new property called "chainability". For those supports we propose a new butterfly algorithm that yields an approximate solution to the butterfly factorization problem and is supported by stronger theoretical guarantees than existing factorization methods. Specifically, we show that the ratio between the approximation error and its minimum value is bounded by a constant, independent of the target matrix.
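For intuition about what such sparse, structured factors look like, the following check (an illustration, not the paper's algorithm) verifies the classical exact butterfly factorization of the Hadamard matrix H_{2^L} into L factors with exactly two nonzeros per row and per column:

```python
import numpy as np
from functools import reduce

L = 3
H2 = np.array([[1.0, 1.0], [1.0, -1.0]])

# Factor l acts on one "tensor slot": I_{2^l} ⊗ H2 ⊗ I_{2^{L-1-l}}.
butterflies = [np.kron(np.kron(np.eye(2 ** l), H2), np.eye(2 ** (L - 1 - l)))
               for l in range(L)]

H = reduce(np.matmul, butterflies)   # product of the L sparse factors
H_ref = reduce(np.kron, [H2] * L)    # dense Hadamard matrix for comparison

assert np.allclose(H, H_ref)
print(max(np.count_nonzero(B, axis=1).max() for B in butterflies))  # -> 2
```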


Path-metrics, pruning, and generalization

May 2024 · 7 Reads

Analyzing the behavior of ReLU neural networks often hinges on understanding the relationships between their parameters and the functions they implement. This paper proves a new bound on function distances in terms of the so-called path-metrics of the parameters. Since this bound is intrinsically invariant with respect to the rescaling symmetries of the networks, it sharpens previously known bounds. It is also, to the best of our knowledge, the first bound of its kind that is broadly applicable to modern networks such as ResNets, VGGs, U-nets, and many more. In contexts such as network pruning and quantization, the proposed path-metrics can be efficiently computed using only two forward passes. Besides its intrinsic theoretical interest, the bound yields not only novel theoretical generalization bounds, but also a promising proof of concept for rescaling-invariant pruning.
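As a hint at why such quantities are cheap to compute, the ℓ1 path-norm of a bias-free ReLU network can be obtained with a single forward pass through the entrywise absolute values of the weights (a standard trick, shown here for one parameter vector; the paper's path-metrics compare two parameter vectors with two such passes):

```python
import numpy as np

def l1_path_norm(weights):
    """Sum over all input-output paths of the product of absolute weights,
    computed by one forward pass on the all-ones input."""
    x = np.ones(weights[0].shape[1])
    for W in weights:
        x = np.abs(W) @ x
    return float(x.sum())

rng = np.random.default_rng(0)
Ws = [rng.normal(size=(16, 8)), rng.normal(size=(4, 16))]  # toy 8-16-4 network
print(l1_path_norm(Ws))
```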


Keep the Momentum: Conservation Laws beyond Euclidean Gradient Flows
  • Preprint
  • File available

May 2024 · 41 Reads

Conservation laws are well-established in the context of Euclidean gradient flow dynamics, notably for linear or ReLU neural network training. Yet, their existence and principles for non-Euclidean geometries and momentum-based dynamics remain largely unknown. In this paper, we characterize "all" conservation laws in this general setting. In stark contrast to the case of gradient flows, we prove that the conservation laws for momentum-based dynamics exhibit temporal dependence. Additionally, we often observe a "conservation loss" when transitioning from gradient flow to momentum dynamics. Specifically, for linear networks, our framework allows us to identify all momentum conservation laws, which are less numerous than in the gradient flow case except in sufficiently over-parameterized regimes. With ReLU networks, no conservation law remains. This phenomenon also manifests in non-Euclidean metrics, used e.g. for Nonnegative Matrix Factorization (NMF): all conservation laws can be determined in the gradient flow context, yet none persists in the momentum case.
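A scalar toy experiment illustrates the phenomenon (an illustration only, not the paper's general framework): for the loss (uv - 1)^2, the balancedness quantity u^2 - v^2 is conserved by (discretized) gradient flow but drifts once heavy-ball momentum is added:

```python
def run(momentum, steps=20_000, lr=1e-3):
    u, v, du, dv = 1.5, 0.5, 0.0, 0.0
    for _ in range(steps):
        r = u * v - 1.0                   # residual of the loss (uv - 1)^2
        gu, gv = 2 * r * v, 2 * r * u     # gradients w.r.t. u and v
        du = momentum * du - lr * gu      # heavy-ball update direction
        dv = momentum * dv - lr * gv
        u, v = u + du, v + dv
    return u * u - v * v                  # conserved by gradient flow

print(run(momentum=0.0))   # stays ≈ 2.0 = 1.5² − 0.5²
print(run(momentum=0.9))   # drifts away from 2.0
```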


A theory of optimal convex regularization for low-dimensional recovery

May 2024 · 7 Reads · 6 Citations · Information and Inference: A Journal of the IMA

We consider the problem of recovering elements of a low-dimensional model from under-determined linear measurements. To perform recovery, we consider the minimization of a convex regularizer subject to a data-fit constraint. Given a model, we ask what is the ‘best’ convex regularizer to perform its recovery. To answer this question, we define an optimal regularizer as a function that maximizes a compliance measure with respect to the model. We introduce and study several notions of compliance. We give analytical expressions for compliance measures based on the best-known recovery guarantees with the restricted isometry property. These expressions make it possible to show the optimality of the $\ell^1$-norm for sparse recovery and of the nuclear norm for low-rank matrix recovery for these compliance measures. We also investigate the construction of an optimal convex regularizer using the examples of sparsity in levels and of sparse-plus-low-rank models.
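A minimal sketch of the recovery program in question, using the $\ell^1$-norm (shown optimal for sparse models) as the convex regularizer and cvxpy as an off-the-shelf solver:

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
n, m, s = 60, 25, 3                        # ambient dim, measurements, sparsity
A = rng.normal(size=(m, n)) / np.sqrt(m)   # under-determined Gaussian operator
x0 = np.zeros(n)
x0[rng.choice(n, size=s, replace=False)] = rng.normal(size=s)
y = A @ x0                                 # noiseless measurements

x = cp.Variable(n)
cp.Problem(cp.Minimize(cp.norm1(x)), [A @ x == y]).solve()
print(np.linalg.norm(x.value - x0))        # small in the exact-recovery regime
```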


[Figures: the matrix algorithm computing S; a temporal network defined as a series of events ordered in time ($t_i < t_j$ for $i < j$), with the out-component of node 3 and the in-component of node 2 highlighted, next to the corresponding component matrix, whose rows depict in-components and whose columns depict out-components (a non-zero element in column u at coordinate v means node v belongs to the out-component of node u); the $n \times n$ component matrix ($n = 5$) computed by scanning the series of events, replacing the rows of the two interacting nodes by their OR at each event, $\mathbf{S}_4$ being the component matrix after $m = 4$ events; computational time and memory usage of the exact and HLL-approximated component matrix methods relative to the EG+HLL method, on logarithmic scales; several 3-node hashed versions of a 5-node temporal network, whose small component matrices are fused to approximate the component matrix of the initial network; +3 more figures]

Temporal network compression via network hashing

January 2024 · 24 Reads · Applied Network Science

Pairwise temporal interactions between entities can be represented as temporal networks, which encode the propagation of processes, such as epidemic spreading or information cascades, evolving on top of them. The largest outcome of these processes is directly linked to the structure of the underlying network. Indeed, a node of a network at a given time cannot affect more nodes in the future than it can reach via time-respecting paths. This set of nodes reachable from a source defines an out-component, whose identification is costly. In this paper, we propose an efficient matrix algorithm to tackle this issue and show that it outperforms other state-of-the-art methods. Secondly, we propose a hashing framework to coarsen large temporal networks into smaller proxies on which out-components are more easily estimated and then recombined to obtain the initial components. Our graph hashing solution has implications for the privacy-respecting representation of temporal networks.
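The streaming idea described in the figure captions above (replace the rows of the two interacting nodes by their OR at each event) admits a compact sketch; the encoding below, where row v marks the nodes that can reach v, is one plausible convention:

```python
import numpy as np

def component_matrix(n, events):
    """Component matrix of a temporal network with n nodes.
    `events` is a time-ordered list of pairwise interactions (u, v).
    After processing, S[v, u] is True iff u reaches v along a
    time-respecting path, so column u is the out-component of u
    and row v is the in-component of v."""
    S = np.eye(n, dtype=bool)        # initially every node reaches itself
    for u, v in events:              # scan the events in temporal order
        merged = S[u] | S[v]         # OR the two rows ...
        S[u] = merged                # ... and replace both with the result
        S[v] = merged
    return S

S = component_matrix(5, [(0, 1), (1, 2), (3, 4), (2, 3)])
print(np.flatnonzero(S[:, 0]))       # out-component of node 0: [0 1 2 3]
```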


Private quantiles estimation in the presence of atoms

August 2023 · 8 Reads · 5 Citations · Information and Inference: A Journal of the IMA

We consider the differentially private estimation of multiple quantiles (MQ) of a distribution from a dataset, a key building block in modern data analysis. We apply the recent non-smoothed Inverse Sensitivity (IS) mechanism to this specific problem. We establish that the resulting method is closely related to the recently published ad hoc algorithm JointExp. In particular, they share the same computational complexity and a similar efficiency. We prove the statistical consistency of these two algorithms for continuous distributions. Furthermore, we demonstrate both theoretically and empirically that this method suffers from an important lack of performance in the case of peaked distributions, which can escalate to a potentially catastrophic failure in the presence of atoms. Its smoothed version (i.e., applying a max kernel to its output density) would solve this problem, but its implementation remains an open challenge. As a proxy, we propose a simple and numerically efficient method called Heuristically Smoothed JointExp (HSJointExp), which is endowed with performance guarantees for a broad class of distributions and achieves results that are orders of magnitude better on problematic datasets.
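For the single-quantile case, the underlying inverse-sensitivity/exponential mechanism can be sketched as follows (a simplified illustration in the spirit of JointExp, not the paper's multi-quantile algorithm); note how atoms create zero-length intervals whose sampling weight vanishes, the failure mode analyzed in the paper:

```python
import numpy as np

def private_quantile(data, q, eps, lo, hi, rng=None):
    """eps-DP estimate of the q-quantile of `data`, assumed to lie in
    [lo, hi], via the exponential mechanism over inter-point intervals."""
    rng = rng or np.random.default_rng()
    x = np.concatenate(([lo], np.sort(np.clip(data, lo, hi)), [hi]))
    n = len(data)
    k = int(np.floor(q * n))          # target rank
    i = np.arange(n + 1)              # interval (x[i], x[i+1]) has i points below
    utility = -np.abs(i - k)          # rank distance; sensitivity 1
    gaps = np.diff(x)                 # atoms give zero-length intervals ...
    with np.errstate(divide="ignore"):
        logw = np.log(gaps) + eps * utility / 2.0   # ... hence log-weight -inf
    logw -= logw.max()                # stabilise before exponentiating
    p = np.exp(logw)
    p /= p.sum()
    j = rng.choice(n + 1, p=p)
    return rng.uniform(x[j], x[j + 1])  # uniform draw inside the chosen interval
```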


Temporal network compression via network hashing

July 2023 · 13 Reads

Pairwise temporal interactions between entities can be represented as temporal networks, which encode the propagation of processes, such as epidemic spreading or information cascades, evolving on top of them. The largest outcome of these processes is directly linked to the structure of the underlying network. Indeed, a node of a network at a given time cannot affect more nodes in the future than it can reach via time-respecting paths. This set of nodes reachable from a source defines an out-component, whose identification is costly. In this paper, we propose an efficient matrix algorithm to tackle this issue and show that it outperforms other state-of-the-art methods. Secondly, we propose a hashing framework to coarsen large temporal networks into smaller proxies on which out-components are more easily estimated and then recombined to obtain the initial components. Our graph hashing solution has implications for the privacy-respecting representation of temporal networks.


Butterfly factorization by algorithmic identification of rank-one blocks

July 2023 · 3 Reads

Many matrices associated with fast transforms possess a certain low-rank property, characterized by the existence of several block partitionings of the matrix where each block is of low rank. Provided that these partitionings are known, there exist algorithms, called butterfly factorization algorithms, that approximate the matrix by a product of sparse factors, thus enabling a rapid evaluation of the associated linear operator. This paper proposes a new method to algebraically identify these block partitionings for a matrix admitting a butterfly factorization, without any analytical assumption on its entries.
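One plausible primitive behind such an identification is a numerical rank-one test on candidate blocks; a minimal sketch (the paper's actual method is algebraic and more sophisticated):

```python
import numpy as np

def is_rank_one_block(A, rows, cols, tol=1e-10):
    """True if the sub-block A[rows][:, cols] is numerically rank one,
    judged by the ratio of its second singular value to its first."""
    s = np.linalg.svd(A[np.ix_(rows, cols)], compute_uv=False)
    return len(s) < 2 or s[1] <= tol * max(s[0], 1.0)
```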


Citations (53)


... Another possibility is to use a convex proxy of minimization (2) (i.e. using a convex R) that guarantees the recovery of elements of Σ. A general method for the best possible choice of convex regularization R has been presented in [37]. However, in terms of practical recovery guarantees, one must combine guarantees of success of (2) with convergence guarantees of the chosen algorithm. ...

Reference:

Towards optimal algorithms for the recovery of low-dimensional models with linear rates
A theory of optimal convex regularization for low-dimensional recovery
  • Citing Article
  • May 2024

Information and Inference: A Journal of the IMA

... Duchi et al. (2014), Barber & Duchi (2014), Acharya et al. (2021e) and Lalanne et al. (2023a) present general frameworks for deriving minimax lower-bounds under privacy constraints. Many parametric problems have already been studied, notably in Acharya et al. (2018; 2021e); Karwa & Vadhan (2018); Kamath et al. (2019); Biswas et al. (2020); Lalanne et al. (2022; 2023b); Kamath et al. (2022); Singhal (2023). Recently, some important contributions were made. ...

Private quantiles estimation in the presence of atoms
  • Citing Article
  • August 2023

Information and Inference: A Journal of the IMA

... A series of works including [6,12,15,24,26] aim to compute, approximate or bound the Lipschitz constant of neural-network architectures. We also refer to [4,11,21] on the interplay between Lipschitz estimates and generalization bounds. ...

Approximation Speed of Quantized vs. Unquantized ReLU Neural Networks and Beyond
  • Citing Article
  • June 2023

IEEE Transactions on Information Theory

... In this paper, we only focus on the lossless case. By employing an achievable scheme using novel concepts and algorithms introduced in [42], [43], Theorems 1 and 2 establish the single-shot system capacity for both cases P ℓ > Λ ℓ and P ℓ = Λ ℓ , where these general cases are interesting because the tessellation patterns we design must accommodate tiles of various sizes and shapes. ...

Spurious Valleys, NP-hardness, and Tractability of Sparse Matrix Factorization With Fixed Support
  • Citing Article
  • January 2022

SIAM Journal on Matrix Analysis and Applications

... An alternative definition of the butterfly factorization refers to a sparse matrix factorization with specific constraints on the sparse factors. According to [5,6,20,44,4,27], a matrix A admits a certain butterfly factorization if, up to some row and column permutations, it can be factorized into a certain number of factors X 1 , . . . , X L for a prescribed number L ≥ 2, such that each factor X ℓ for ℓ ∈ L satisfies a so-called fixed-support constraint, i.e., the support of X ℓ , denoted supp(X ℓ ), is included in the support of a prescribed binary matrix S ℓ . ...

Efficient Identification of Butterfly Sparse Matrix Factorizations
  • Citing Article
  • January 2022

SIAM Journal on Mathematics of Data Science

... Context. Inverse problems are ubiquitous in signal and image processing [27,33,43,65], with a wealth of domains of application as diverse as nonlinear physics [14,50], astronomy [17], hyperspectral imaging [6], tomography [58], cardiology [3] and epidemiology [49]. A general inverse problem consists in estimating underlying quantities of interest from direct or indirect observations. ...

Nonsmooth Convex Optimization to Estimate the Covid-19 Reproduction Number Space-Time Evolution With Robustness Against Low Quality Data
  • Citing Article
  • January 2022

IEEE Transactions on Signal Processing

... For ReLU networks, Rolnick and Kording [26] introduced a reverse-engineering technique to construct finite samples for distinguishing ReLU networks; however, they did not provide an explicit number of samples required for exact recovery. Furthermore, Stock and Gribonval [28] investigated the parameter identification problem for shallow ReLU networks within a bounded set. In [28, Theorem 6], they demonstrated that for a given network f N that meets specific structural criteria, there exists a bounded set X ⊆ R d , constructed as a union of small balls, such that if f N (x) = f N ′ (x) for all x ∈ X , then N ′ is equivalent to N up to permutation and scaling ambiguity. ...

An Embedding of ReLU Networks and an Analysis of Their Identifiability

Constructive Approximation

... In the case where graphs would be too large for eigendecompositions, strategies to approximate the heat kernel could be developed. Previous works used Taylor expansion, Chebychev approximation, or low frequency approximation (Tsitsulin et al. 2018;Marcotte et al. 2022). ...

Fast Multiscale Diffusion On Graphs
  • Citing Conference Paper
  • May 2022

... An alternative definition of the butterfly factorization refers to a sparse matrix factorization with specific constraints on the sparse factors. According to [5,6,20,44,4,27], a matrix A admits a certain butterfly factorization if, up to some row and column permutations, it can be factorized into a certain number of factors X 1 , . . . , X L for a prescribed number L ≥ 2, such that each factor X ℓ for ℓ ∈ L satisfies a so-called fixed-support constraint, i.e., the support of X ℓ , denoted supp(X ℓ ), is included in the support of a prescribed binary matrix S ℓ . ...

Fast Learning of Fast Transforms, with Guarantees
  • Citing Conference Paper
  • May 2022

... More precisely, we detail a Chebychev approximation of the diffusion process, and a procedure to select the diffusion time. This chapter corresponds to two papers: one written with the help of Sibylle Marcotte during her internship, currently at the submission stage [19], and one accepted at the ICTAI'21 conference [20]. ...

Optimization of the Diffusion Time in Graph Diffused-Wasserstein Distances: Application to Domain Adaptation
  • Citing Conference Paper
  • November 2021