
Alain Durmus - Institut Mines-Télécom
About
98 Publications · 8,895 Reads · 2,934 Citations
Publications (98)
Classifier-Free Guidance (CFG) is a widely used technique for improving conditional diffusion models by linearly combining the outputs of conditional and unconditional denoisers. While CFG enhances visual quality and improves alignment with prompts, it often reduces sample diversity, leading to a challenging trade-off between quality and diversity....
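As a point of reference for the entry above, the standard CFG combination it builds on can be written in a few lines; the function name, array shapes and guidance scale below are illustrative, and the paper's proposed fix for the quality-diversity trade-off is not reproduced here.

```python
import numpy as np

def cfg_denoiser(eps_cond, eps_uncond, guidance_scale):
    """Standard classifier-free guidance: linear combination of the
    conditional and unconditional denoiser outputs.
    guidance_scale = 1 recovers the purely conditional model; larger
    values sharpen prompt alignment at the cost of sample diversity."""
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

# Toy usage with random arrays standing in for the two network outputs.
eps_c = np.random.randn(4, 3, 8, 8)   # conditional noise prediction
eps_u = np.random.randn(4, 3, 8, 8)   # unconditional noise prediction
eps_guided = cfg_denoiser(eps_c, eps_u, guidance_scale=3.0)
```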
This paper proposes a novel analysis for the Scaffold algorithm, a popular method for dealing with data heterogeneity in federated learning. While its convergence in deterministic settings--where local control variates mitigate client drift--is well established, the impact of stochastic gradient updates on its performance is less understood. To add...
Denoising diffusion models have driven significant progress in the field of Bayesian inverse problems. Recent approaches use pre-trained diffusion models as priors to solve a wide range of such problems, only leveraging inference-time compute and thereby eliminating the need to retrain task-specific models on the same dataset. To approximate the po...
Recent advancements in solving Bayesian inverse problems have spotlighted denoising diffusion models (DDMs) as effective priors. Although these have great potential, DDM priors yield complex posterior distributions that are challenging to sample. Existing approaches to posterior sampling in this context address this problem either by retraining mod...
In this paper, we present a novel analysis of FedAvg with constant step size, relying on the Markov property of the underlying process. We demonstrate that the global iterates of the algorithm converge to a stationary distribution and analyze its resulting bias and variance relative to the problem's solution. We provide a first-order expansion of t...
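A minimal sketch of FedAvg with a constant step size, in the spirit of the entry above, is given below; the client objectives, the number of local steps and all names are illustrative rather than the paper's setup. With a constant step size the global iterates form a homogeneous Markov chain, which is the property that kind of analysis exploits.

```python
import numpy as np

def fedavg_round(theta, client_grads, gamma=0.1, local_steps=5):
    """One FedAvg round with constant step size: each client runs a few
    local SGD steps from the global model, the server averages the results.
    `client_grads[i](theta)` returns a stochastic gradient for client i."""
    local_models = []
    for grad in client_grads:
        theta_i = theta.copy()
        for _ in range(local_steps):
            theta_i -= gamma * grad(theta_i)
        local_models.append(theta_i)
    return np.mean(local_models, axis=0)   # server aggregation

# Toy heterogeneous clients: client i minimises 0.5 * ||theta - c_i||^2.
rng = np.random.default_rng(0)
centers = [rng.normal(size=2) for _ in range(4)]
grads = [lambda th, c=c: (th - c) + 0.01 * rng.normal(size=2) for c in centers]
theta = np.zeros(2)
for _ in range(200):
    theta = fedavg_round(theta, grads)
```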
Diffusion models have recently shown considerable potential in solving Bayesian inverse problems when used as priors. However, sampling from the resulting denoising posterior distributions remains a challenge as it involves intractable terms. To tackle this issue, state-of-the-art approaches formulate the problem as that of sampling from a surrogat...
We address the problem of solving strongly convex and smooth minimization problems using stochastic gradient descent (SGD) algorithm with a constant step size. Previous works suggested to combine the Polyak-Ruppert averaging procedure with the Richardson-Romberg extrapolation technique to reduce the asymptotic bias of SGD at the expense of a mild i...
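The combination mentioned above, Polyak-Ruppert averaging with Richardson-Romberg extrapolation, can be illustrated on a toy strongly convex problem; the sketch below runs averaged SGD with step sizes gamma and 2*gamma and combines them so that the leading O(gamma) bias term cancels. All names and constants are assumptions, not the paper's experiments.

```python
import numpy as np

def averaged_sgd(grad, theta0, gamma, n_iter, rng):
    """Constant step-size SGD with Polyak-Ruppert averaging of the iterates."""
    theta, avg = theta0.copy(), np.zeros_like(theta0)
    for k in range(n_iter):
        theta -= gamma * grad(theta, rng)
        avg += (theta - avg) / (k + 1)   # running average
    return avg

def richardson_romberg(grad, theta0, gamma, n_iter, rng):
    """Combine the averages obtained with step sizes gamma and 2*gamma:
    2 * avg(gamma) - avg(2*gamma) removes the first-order bias in gamma."""
    avg_g = averaged_sgd(grad, theta0, gamma, n_iter, rng)
    avg_2g = averaged_sgd(grad, theta0, 2 * gamma, n_iter, rng)
    return 2 * avg_g - avg_2g

# Strongly convex toy problem: minimise 0.5 * ||theta - b||^2 with noisy gradients.
rng = np.random.default_rng(1)
b = np.array([1.0, -2.0])
grad = lambda th, rng: (th - b) + 0.1 * rng.normal(size=2)
print(richardson_romberg(grad, np.zeros(2), gamma=0.05, n_iter=10_000, rng=rng))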
Vehicle-to-everything (V2X) communication technology is revolutionizing transportation by enabling interactions between vehicles, devices, and infrastructures. This connectivity enhances road safety, transportation efficiency, and driver assistance systems. V2X benefits from Machine Learning, enabling real-time data analysis, better decision-making...
Variational inference (VI) is a popular approach in Bayesian inference, that looks for the best approximation of the posterior distribution within a parametric family, minimizing a loss that is typically the (reverse) Kullback-Leibler (KL) divergence. Despite its empirical success, the theoretical properties of VI have only received attention recen...
This paper provides a finite-time analysis of linear stochastic approximation (LSA) algorithms with fixed step size, a core method in statistics and machine learning. LSA is used to compute approximate solutions of a $d$-dimensional linear system $\bar{\mathbf{A}} \theta = \bar{\mathbf{b}}$ for which $(\bar{\mathbf{A}}, \bar{\mathbf{b}})$ can only be estimated by (asymptotically) unbiased obse...
In this paper, we establish moment and Bernstein-type inequalities for additive functionals of geometrically ergodic Markov chains. These inequalities extend the corresponding inequalities for independent random variables. Our conditions cover Markov chains converging geometrically to the stationary distribution either in weighted total variation n...
We develop a new approach to study the long time behaviour of solutions to nonlinear stochastic differential equations in the sense of McKean, as well as propagation of chaos for the corresponding mean-field particle system approximations. Our approach is based on a sticky coupling between two solutions to the equation. We show that the distance pr...
There is substantial empirical evidence about the success of dynamic implementations of Hamiltonian Monte Carlo (HMC), such as the No U-Turn Sampler (NUTS), in many challenging inference problems but theoretical results about their behavior are scarce. The aim of this paper is to fill this gap. More precisely, we consider a general class of MCMC al...
Multi-marginal Optimal Transport (mOT), a generalization of OT, aims at minimizing the integral of a cost function with respect to a distribution with some prescribed marginals. In this paper, we consider an entropic version of mOT with a tree-structured quadratic cost, i.e., a function that can be written as a sum of pairwise cost functions betwee...
Transport maps can ease the sampling of distributions with non-trivial geometries by transforming them into distributions that are easier to handle. The potential of this approach has risen with the development of Normalizing Flows (NF) which are maps parameterized with deep neural networks trained to push a reference distribution towards a target....
Bayesian methods to solve imaging inverse problems usually combine an explicit data likelihood function with a prior distribution that explicitly models expected properties of the solution. Many kinds of priors have been explored in the literature, from simple ones expressing local properties to more involved ones exploiting image redundancy at a n...
This paper focuses on Bayesian inference in a federated learning context (FL). While several distributed MCMC algorithms have been proposed, few consider the specific limitations of FL such as communication bottlenecks and statistical heterogeneity. Recently, Federated Averaging Langevin Dynamics (FALD) was introduced, which extends the Federated A...
This paper provides a finite-time analysis of linear stochastic approximation (LSA) algorithms with fixed step size, a core method in statistics and machine learning. LSA is used to compute approximate solutions of a $d$-dimensional linear system $\bar{\mathbf{A}} \theta = \bar{\mathbf{b}}$, for which $(\bar{\mathbf{A}}, \bar{\mathbf{b}})$ can only...
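The fixed step-size LSA recursion analysed above admits a very short implementation; the sketch below pairs it with a Polyak-Ruppert average on a toy two-dimensional system, and all names, noise levels and constants are chosen purely for illustration.

```python
import numpy as np

def lsa_fixed_step(sample_Ab, theta0, gamma, n_iter, rng):
    """Linear stochastic approximation with fixed step size:
    theta_{k+1} = theta_k - gamma * (A_k @ theta_k - b_k),
    where (A_k, b_k) are unbiased estimates of (A_bar, b_bar).
    Returns the Polyak-Ruppert average of the iterates."""
    theta = theta0.copy()
    iterates = []
    for _ in range(n_iter):
        A_k, b_k = sample_Ab(rng)
        theta = theta - gamma * (A_k @ theta - b_k)
        iterates.append(theta.copy())
    return np.mean(iterates, axis=0)

# Toy system: A_bar = I, b_bar = [1, 2]; the solution is theta* = [1, 2].
rng = np.random.default_rng(2)
A_bar, b_bar = np.eye(2), np.array([1.0, 2.0])
sample_Ab = lambda rng: (A_bar + 0.1 * rng.normal(size=(2, 2)),
                         b_bar + 0.1 * rng.normal(size=2))
print(lsa_fixed_step(sample_Ab, np.zeros(2), gamma=0.05, n_iter=20_000, rng=rng))
```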
This paper studies the Variational Inference (VI) used for training Bayesian Neural Networks (BNN) in the overparameterized regime, i.e., when the number of neurons tends to infinity. More specifically, we consider overparameterized two-layer BNN and point out a critical issue in the mean-field VI training. This problem arises from the decompositio...
Personalised federated learning (FL) aims at collaboratively learning a machine learning model tailored for each client. Although promising advances have been made in this direction, most existing approaches do not allow for uncertainty quantification, which is crucial in many applications. In addition, personalisation in the cross-device set...
We develop a new approach to study the long time behaviour of solutions to nonlinear stochastic differential equations in the sense of McKean, as well as propagation of chaos for the corresponding mean-field particle system approximations. Our approach is based on a sticky coupling between two solutions to the equation. We show that the distance pr...
Bayesian methods to solve imaging inverse problems usually combine an explicit data likelihood function with a prior distribution that explicitly models expected properties of the solution. Many kinds of priors have been explored in the literature, from simple ones expressing local properties to more involved ones exploiting image redundancy at a n...
The present paper focuses on the problem of sampling from a given target distribution $\pi$ defined on some general state space. To this end, we introduce a novel class of non-reversible Markov chains, each chain being defined on an extended state space and having an invariant probability measure admitting $\pi$ as a marginal distribution. The prop...
While the Metropolis Adjusted Langevin Algorithm (MALA) is a popular and widely used Markov chain Monte Carlo method, very few papers derive conditions that ensure its convergence. In particular, to the authors' knowledge, assumptions that are both easy to verify and guarantee geometric convergence, are still missing. In this work, we establish $V$...
Performing reliable Bayesian inference on a big data scale is becoming a keystone in the modern era of machine learning. A workhorse class of methods to achieve this task are Markov chain Monte Carlo (MCMC) algorithms and their design to handle distributed datasets has been the subject of many works. However, existing methods are not completely eit...
In this paper, we establish moment and Bernstein-type inequalities for additive functionals of geometrically ergodic Markov chains. These inequalities extend the corresponding inequalities for independent random variables. Our conditions cover Markov chains converging geometrically to the stationary distribution either in $V$-norms or in weighted W...
This paper establishes non-asymptotic bounds on Wasserstein distances between the invariant probability measures of inexact MCMC methods and their target distribution. In particular, the results apply to the unadjusted Langevin algorithm and to unadjusted Hamiltonian Monte Carlo, but also to methods relying on other discretization schemes. Our focu...
We study the convergence in total variation and $V$-norm of discretization schemes of the underdamped Langevin dynamics. Such algorithms are very popular and commonly used in molecular dynamics and computational statistics to approximatively sample from a target distribution of interest. We show first that, for a very large class of schemes, a mino...
Variational auto-encoders (VAE) are popular deep latent variable models which are trained by maximizing an Evidence Lower Bound (ELBO). To obtain tighter ELBO and hence better variational approximations, it has been proposed to use importance sampling to get a lower variance estimate of the evidence. However, importance sampling is known to perform...
Variational auto-encoders (VAE) are popular deep latent variable models which are trained by maximizing an Evidence Lower Bound (ELBO). To obtain tighter ELBO and hence better variational approximations, it has been proposed to use importance sampling to get a lower variance estimate of the evidence. However, importance sampling is known to perform...
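The importance-weighted bound mentioned in the two VAE entries above can be sketched in log-space; the function name `iw_elbo`, the toy Gaussian densities and the number of samples K are illustrative, and the entries' proposed remedy for the poor high-dimensional behaviour of importance sampling is not shown here.

```python
import numpy as np

def iw_elbo(log_joint, log_q, z_samples):
    """Importance-weighted evidence lower bound:
    log (1/K) * sum_k p(x, z_k) / q(z_k | x), with z_k ~ q(. | x),
    computed stably in log-space. Larger K gives a tighter bound."""
    log_w = log_joint(z_samples) - log_q(z_samples)   # log importance weights
    m = np.max(log_w)
    return m + np.log(np.mean(np.exp(log_w - m)))     # log-mean-exp

# Degenerate toy example where the "joint" in z is simply N(0, 1),
# so the bound estimates log of a normalising constant equal to 0.
rng = np.random.default_rng(7)
z = rng.normal(loc=0.5, scale=1.2, size=64)           # K = 64 samples from q
log_q = lambda z: -0.5 * ((z - 0.5) / 1.2) ** 2 - np.log(1.2) - 0.5 * np.log(2 * np.pi)
log_joint = lambda z: -0.5 * z ** 2 - 0.5 * np.log(2 * np.pi)
print(iw_elbo(log_joint, log_q, z))
```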
The Sliced-Wasserstein distance (SW) is being increasingly used in machine learning applications as an alternative to the Wasserstein distance and offers significant computational and statistical benefits. Since it is defined as an expectation over random projections, SW is commonly approximated by Monte Carlo. We adopt a new perspective to approxi...
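The plain Monte Carlo approximation of the Sliced-Wasserstein distance that the entry above takes as its starting point looks as follows; the function name and sample sizes are illustrative, and the paper's alternative approximation is not reproduced.

```python
import numpy as np

def sliced_wasserstein(X, Y, n_projections=100, p=2, rng=None):
    """Monte Carlo estimate of SW_p between two empirical distributions with
    the same number of samples: average the closed-form 1-D W_p^p distances
    over random projection directions, then take the 1/p power."""
    rng = rng or np.random.default_rng()
    d = X.shape[1]
    total = 0.0
    for _ in range(n_projections):
        theta = rng.normal(size=d)
        theta /= np.linalg.norm(theta)          # uniform direction on the sphere
        x_proj = np.sort(X @ theta)
        y_proj = np.sort(Y @ theta)
        total += np.mean(np.abs(x_proj - y_proj) ** p)   # 1-D W_p^p via sorting
    return (total / n_projections) ** (1.0 / p)

rng = np.random.default_rng(3)
X = rng.normal(size=(500, 10))
Y = rng.normal(loc=0.5, size=(500, 10))
print(sliced_wasserstein(X, Y, rng=rng))
```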
Performing reliable Bayesian inference on a big data scale is becoming a keystone in the modern era of machine learning. A workhorse class of methods to achieve this task are Markov chain Monte Carlo (MCMC) algorithms and their design to handle distributed datasets has been the subject of many works. However, existing methods are not completely eit...
This paper provides a non-asymptotic analysis of linear stochastic approximation (LSA) algorithms with fixed stepsize. This family of methods arises in many machine learning tasks and is used to obtain approximate solutions of a linear system $\bar{A}\theta = \bar{b}$ for which $\bar{A}$ and $\bar{b}$ can only be accessed through random estimates $...
Federated learning aims at conducting inference when data are decentralised and locally stored on several clients, under two main constraints: data ownership and communication overhead. In this paper, we address these issues under the Bayesian paradigm. To this end, we propose a novel Markov chain Monte Carlo algorithm coined \texttt{QLSD} built up...
Stochastic approximation methods play a central role in maximum likelihood estimation problems involving intractable likelihood functions, such as marginal likelihoods arising in problems with missing or incomplete data, and in parametric empirical Bayesian estimation. Combined with Markov chain Monte Carlo algorithms, these stochastic optimisation...
In this paper, we provide bounds in Wasserstein and total variation distances between the distributions of the successive iterates of two functional autoregressive processes with isotropic Gaussian noise of the form $Y_{k+1} = \mathrm{T}_\gamma(Y_k) + \sqrt{\gamma\sigma^2} Z_{k+1}$ and $\tilde{Y}_{k+1} = \tilde{\mathrm{T}}_\gamma(\tilde{Y}_k) + \sq...
Simultaneously sampling from a complex distribution with intractable normalizing constant and approximating expectations under this distribution is a notoriously challenging problem. We introduce a novel scheme, Invertible Flow Non Equilibrium Sampling (InFine), which departs from classical Sequential Monte Carlo (SMC) and Markov chain Monte Carlo...
Since the seminal work of Venkatakrishnan et al. (2013), Plug & Play (PnP) methods have become ubiquitous in Bayesian imaging. These methods derive Minimum Mean Square Error (MMSE) or Maximum A Posteriori (MAP) estimators for inverse problems in imaging by combining an explicit likelihood function with a prior that is implicitly defined by an image...
This paper studies fixed step-size stochastic approximation (SA) schemes, including stochastic gradient schemes, in a Riemannian framework. It is motivated by several applications, where geodesics can be computed explicitly, and their use accelerates crude Euclidean methods. A fixed step-size scheme defines a family of time-homogeneous Markov chain...
This paper studies the exponential stability of random matrix products driven by a general (possibly unbounded) state space Markov chain. It is a cornerstone in the analysis of stochastic algorithms in machine learning (e.g. for parameter tracking in online learning or reinforcement learning). The existing results impose strong conditions such as u...
Markov Chain Monte Carlo (MCMC) is a class of algorithms to sample complex and high-dimensional probability distributions. The Metropolis-Hastings (MH) algorithm, the workhorse of MCMC, provides a simple recipe to construct reversible Markov kernels. Reversibility is a tractable property which implies a less tractable but essential property here, i...
This paper presents a detailed theoretical analysis of the three stochastic approximation proximal gradient algorithms proposed in our companion paper [49] to set regularization parameters by marginal maximum likelihood estimation. We prove the convergence of a more general stochastic approximation scheme that includes the three algorithms of [49]...
Many imaging problems require solving an inverse problem that is ill-conditioned or ill-posed. Imaging methods typically address this difficulty by regularising the estimation problem to make it well-posed. This often requires setting the value of the so-called regularisation parameters that control the amount of regularisation enforced. These para...
In this paper, we investigate the limiting behavior of a continuous-time counterpart of the Stochastic Gradient Descent (SGD) algorithm applied to two-layer overparameterized neural networks, as the number of neurons (i.e., the size of the hidden layer) $N \to +\infty$. Following a probabilistic approach, we show 'propagation of chaos' for the partic...
This paper analyzes the convergence for a large class of Riemannian stochastic approximation (SA) schemes, which aim at tackling stochastic optimization problems. In particular, the recursions we study use either the exponential map of the considered manifold (geodesic schemes) or more general retraction functions (retraction schemes) used as a pro...
This paper proposes a thorough theoretical analysis of Stochastic Gradient Descent (SGD) with decreasing step sizes. First, we show that the recursion defining SGD can be provably approximated by solutions of a time inhomogeneous Stochastic Differential Equation (SDE) in a weak and strong sense. Then, motivated by recent analyses of deterministic a...
The idea of slicing divergences has been proven to be successful when comparing two probability measures in various machine learning applications including generative modeling, and consists in computing the expected value of a `base divergence' between one-dimensional random projections of the two measures. However, the computational and statistica...
In this contribution, we propose a new computationally efficient method to combine Variational Inference (VI) with Markov Chain Monte Carlo (MCMC). This approach can be used with generic MCMC kernels, but is especially well suited to \textit{MetFlow}, a novel family of MCMC algorithms we introduce, in which proposals are obtained using Normalizing...
Recent years have seen the rise of convolutional neural network techniques in exemplar-based image synthesis. These methods often rely on the minimization of some variational formulation on the image space for which the minimizers are assumed to be the solutions of the synthesis problem. In this paper we investigate, both theoretically and experime...
We consider in this paper the problem of sampling a high-dimensional probability distribution $\pi$ having a density w.r.t. the Lebesgue measure on $\mathbb{R}^d$, known up to a normalization constant, $x \mapsto \pi(x) = \mathrm{e}^{-U(x)}/\int_{\mathbb{R}^d} \mathrm{e}^{-U(y)} \mathrm{d} y$. Such a problem naturally occurs for example in Bayesian inference and machine learning. Under the assumption that $U$ is continuously d...
Approximate Bayesian Computation (ABC) is a popular method for approximate inference in generative models with intractable but easy-to-sample likelihood. It constructs an approximate posterior distribution by finding parameters for which the simulated data are close to the observations in terms of summary statistics. These statistics are defined be...
Stochastic approximation methods play a central role in maximum likelihood estimation problems involving intractable likelihood functions, such as marginal likelihoods arising in problems with missing or incomplete data, and in parametric empirical Bayesian estimation. Combined with Markov chain Monte Carlo algorithms, these stochastic optimisation...
Minimum expected distance estimation (MEDE) algorithms have been widely used for probabilistic models with intractable likelihood functions and they have become increasingly popular due to their use in implicit generative modeling (e.g. Wasserstein generative adversarial networks, Wasserstein autoencoders). Emerging from computational optimal trans...
Stochastic Gradient Langevin Dynamics (SGLD) has emerged as a key MCMC algorithm for Bayesian learning from large scale datasets. While SGLD with decreasing step sizes converges weakly to the posterior distribution, the algorithm is often used with a constant step size in practice and has demonstrated successes in machine learning tasks. The curren...
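A minimal constant step-size SGLD loop, matching the recursion discussed above, is sketched below on a toy Gaussian posterior; the function names, minibatch size and step size are assumptions made for illustration.

```python
import numpy as np

def sgld(grad_log_post_minibatch, theta0, data, gamma, n_iter, batch_size, rng):
    """Stochastic Gradient Langevin Dynamics with constant step size:
    theta_{k+1} = theta_k + gamma * g_k + sqrt(2 * gamma) * xi_k,
    where g_k is a minibatch estimate of the gradient of the log-posterior."""
    theta = theta0.copy()
    samples = []
    n = len(data)
    for _ in range(n_iter):
        idx = rng.choice(n, size=batch_size, replace=False)
        g = grad_log_post_minibatch(theta, data[idx], n)
        theta = theta + gamma * g + np.sqrt(2 * gamma) * rng.normal(size=theta.shape)
        samples.append(theta.copy())
    return np.array(samples)

# Toy Gaussian model: y_i ~ N(theta, 1) with a flat prior on theta.
rng = np.random.default_rng(4)
data = rng.normal(loc=1.0, size=1000)
grad_fn = lambda th, batch, n: n * np.mean(batch - th)   # rescaled minibatch gradient
samples = sgld(grad_fn, np.zeros(1), data, gamma=1e-4, n_iter=5000, batch_size=50, rng=rng)
```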
In this work, we establish $\mathrm{L}^2$-exponential convergence for a broad class of Piecewise Deterministic Markov Processes recently proposed in the context of Markov Process Monte Carlo methods and covering in particular the Randomized Hamiltonian Monte Carlo, the Zig-Zag process and the Bouncy Particle Sampler. The kernel of the symmetric par...
A new methodology is presented for the construction of control variates to reduce the variance of additive functionals of Markov Chain Monte Carlo (MCMC) samplers. Our control variates are defined as linear combinations of functions whose coefficients are obtained by minimizing a proxy for the asymptotic variance. The construction is theoretically...
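As a rough illustration of the idea above, one can fit the coefficients of a linear combination of zero-mean control functions by least squares, a simple proxy for the asymptotic variance; this sketch is not the paper's construction, and all names and the toy example are illustrative.

```python
import numpy as np

def fit_control_variates(f_vals, psi_vals):
    """Fit coefficients beta so that f - beta^T psi has small empirical
    variance, where the psi_j have (approximately) zero mean under the
    target; a crude least-squares proxy for the asymptotic variance."""
    psi_c = psi_vals - psi_vals.mean(axis=0)
    f_c = f_vals - f_vals.mean()
    beta, *_ = np.linalg.lstsq(psi_c, f_c, rcond=None)
    return beta

# Toy example under N(0, 1): estimate E[X + 0.1 X^2] = 0.1 with psi(x) = x.
rng = np.random.default_rng(8)
x = rng.normal(size=5000)                # stand-in for MCMC samples
f_vals = x + 0.1 * x ** 2                # function of interest
psi_vals = x[:, None]                    # control functions, shape (n, 1)
beta = fit_control_variates(f_vals, psi_vals)
print(np.mean(f_vals - psi_vals @ beta))   # variance-reduced estimate
```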
Piecewise Deterministic Markov Processes (PDMPs) are studied in a general framework. First, different constructions are proven to be equivalent. Second, we introduce a coupling between two PDMPs following the same differential flow which implies quantitative bounds on the total variation between the marginal distributions of the two processes. Fina...
The Bouncy Particle Sampler (BPS) is a Monte Carlo Markov Chain algorithm to sample from a target density known up to a multiplicative constant. This method is based on a kinetic piecewise deterministic Markov process for which the target measure is invariant. This paper deals with theoretical properties of BPS. First, we establish geometric ergodi...
By building up on the recent theory that established the connection between implicit generative modeling and optimal transport, in this study, we propose a novel parameter-free algorithm for learning the underlying distributions of complicated datasets and sampling from them. The proposed algorithm is based on a functional optimization problem, whi...
Based on a coupling approach, we prove uniform in time propagation of chaos for weakly interacting mean-field particle systems with possibly non-convex confinement and interaction potentials. The approach is based on a combination of reflection and synchronous couplings applied to the individual particles. It provides explicit quantitative bounds t...
In this paper, we provide new insights on the Unadjusted Langevin Algorithm. We show that this method can be formulated as a first order optimization algorithm of an objective functional defined on the Wasserstein space of order $2$. Using this interpretation and techniques borrowed from convex optimization, we give a non-asymptotic analysis of thi...
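For reference, the Unadjusted Langevin Algorithm discussed above is the Euler discretisation of the Langevin SDE; a minimal sketch on a Gaussian target follows, with the step size and names chosen for illustration only.

```python
import numpy as np

def ula(grad_U, x0, gamma, n_iter, rng):
    """Unadjusted Langevin Algorithm:
    x_{k+1} = x_k - gamma * grad_U(x_k) + sqrt(2 * gamma) * Z_{k+1},
    which approximately targets the density proportional to exp(-U)."""
    x = x0.copy()
    samples = []
    for _ in range(n_iter):
        x = x - gamma * grad_U(x) + np.sqrt(2 * gamma) * rng.normal(size=x.shape)
        samples.append(x.copy())
    return np.array(samples)

# Standard Gaussian target: U(x) = ||x||^2 / 2, so grad_U(x) = x.
rng = np.random.default_rng(5)
chain = ula(lambda x: x, np.zeros(2), gamma=0.05, n_iter=10_000, rng=rng)
```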
In this article, we consider the problem of sampling from a probability measure $\pi$ having a density on $\mathbb{R}^d$ known up to a normalizing constant, $x\mapsto \mathrm{e}^{-U(x)} / \int_{\mathbb{R}^d} \mathrm{e}^{-U(y)} \mathrm{d} y$. The Euler discretization of the Langevin stochastic differential equation (SDE) is known to be unstable in a...
In this article, we consider the problem of sampling from a probability measure $\pi$ having a density on $\mathbb{R}^d$ known up to a normalizing constant, $x\mapsto \mathrm{e}^{-U(x)} / \int_{\mathbb{R}^d} \mathrm{e}^{-U(y)} \mathrm{d} y$. The Euler discretization of the Langevin stochastic differential equation (SDE) is known to be unstable in a...
We consider the minimization of an objective function given access to unbiased estimates of its gradient through stochastic gradient descent (SGD) with constant step-size. While the detailed analysis was only performed for quadratic functions, we provide an explicit asymptotic expansion of the moments of the averaged SGD iterates that outlines the...
We consider the minimization of an objective function given access to unbiased estimates of its gradient through stochastic gradient descent (SGD) with constant step-size. While the detailed analysis was only performed for quadratic functions, we provide an explicit asymptotic expansion of the moments of the averaged SGD iterates that outlines the...
We derive explicit bounds for the computation of normalizing constants $Z$ for log-concave densities $\pi = \exp(-U)/Z$ with respect to the Lebesgue measure on $\mathbb{R}^d$. Our approach relies on a Gaussian annealing combined with recent and precise bounds on the Unadjusted Langevin Algorithm (High-dimensional Bayesian inference via the Unadjust...
We derive explicit bounds for the computation of normalizing constants $Z$ for log-concave densities $\pi = \exp(-U)/Z$ with respect to the Lebesgue measure on $\mathbb{R}^d$. Our approach relies on a Gaussian annealing combined with recent and precise bounds on the Unadjusted Langevin Algorithm (High-dimensional Bayesian inference via the Unadjust...
This paper presents a detailed theoretical analysis of the Langevin Monte Carlo sampling algorithm recently introduced in Durmus et al. (Efficient Bayesian computation by proximal Markov chain Monte Carlo: when Langevin meets Moreau, 2016) when applied to log-concave probability distributions that are restricted to a convex body $\mathsf{K}$. This...
This paper discusses the stability properties of the Hamiltonian Monte Carlo (HMC) algorithm used to sample from a positive target density $\pi$ on $\mathbb{R}^d$, with either a fixed or a random number of integration steps. Under mild conditions on the potential $U$ associated with $\pi$, we show that the Markov kernel associated to the HMC algori...
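The HMC kernel with a fixed number of leapfrog integration steps, whose stability the entry above studies, can be sketched as follows on a Gaussian target; the step size, number of leapfrog steps and all names are illustrative.

```python
import numpy as np

def hmc_step(x, grad_U, U, step_size, n_leapfrog, rng):
    """One HMC transition targeting exp(-U): draw a Gaussian momentum,
    run n_leapfrog leapfrog steps, then Metropolis accept/reject."""
    p = rng.normal(size=x.shape)
    x_new, p_new = x.copy(), p.copy()
    p_new -= 0.5 * step_size * grad_U(x_new)      # initial half step on momentum
    for _ in range(n_leapfrog - 1):
        x_new += step_size * p_new                # full step on position
        p_new -= step_size * grad_U(x_new)        # full step on momentum
    x_new += step_size * p_new
    p_new -= 0.5 * step_size * grad_U(x_new)      # final half step on momentum
    log_accept = U(x) - U(x_new) + 0.5 * (p @ p - p_new @ p_new)
    return x_new if np.log(rng.uniform()) < log_accept else x

# Standard Gaussian target.
rng = np.random.default_rng(6)
U = lambda x: 0.5 * float(x @ x)
grad_U = lambda x: x
x, samples = np.zeros(2), []
for _ in range(2000):
    x = hmc_step(x, grad_U, U, step_size=0.2, n_leapfrog=10, rng=rng)
    samples.append(x)
```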
Stochastic Gradient Markov Chain Monte Carlo (SG-MCMC) methods have become popular in modern data analysis problems due to their computational efficiency. Even though they have proved useful for many statistical models, the application of SG-MCMC to non- negative matrix factorization (NMF) models has not yet been extensively explored. In this study...
Modern imaging methods rely strongly on Bayesian inference techniques to solve challenging imaging problems. Currently, the predominant Bayesian computation approach is convex optimisation, which scales very efficiently to high dimensional image models and delivers accurate point estimation results. However, in order to perform more complex analyse...
Modern imaging methods rely strongly on Bayesian inference techniques to solve challenging imaging problems. Currently, the predominant Bayesian computation approach is convex optimisation, which scales very efficiently to high dimensional image models and delivers accurate point estimation results. However, in order to perform more complex analyse...
Stochastic Gradient Markov Chain Monte Carlo (SG-MCMC) algorithms have become increasingly popular for Bayesian inference in large-scale applications. Even though these methods have proved useful in several scenarios, their performance is often limited by their bias. In this study, we propose a novel sampling algorithm that aims to reduce the bias...
The subject of this thesis is the analysis of Markov Chain Monte Carlo (MCMC) methods and the development of new methodologies to sample from a high dimensional distribution. Our work is divided into three main topics. The first problem addressed in this manuscript is the convergence of Markov chains in Wasserstein distance. Geometric and sub-geome...
We consider in this paper the problem of sampling a probability distribution $\pi$ having a density with respect to the Lebesgue measure on $\mathbb{R}^d$, known up to a normalisation factor $x \mapsto \mathrm{e}^{-U(x)}/\int_{\mathbb{R}^d} \mathrm{e}^{-U(y)} \mathrm{d} y$. Under the assumption that $U$ is continuously differentiable, $\nabla U$ is...
This paper considers the optimal scaling problem for high-dimensional random walk Metropolis algorithms for densities which are differentiable in Lp mean but which may be irregular at some points (like the Laplace density for example) and/or are supported on an interval. Our main result is the weak convergence of the Markov chain (appropriately res...
This paper considers the optimal scaling problem for high-dimensional random walk Metropolis algorithms for densities which are differentiable in Lp mean but which may be irregular at some points (like the Laplace density for example) and/or are supported on an interval. Our main result is the weak convergence of the Markov chain (appropriately res...
Sampling distributions over high-dimensional state spaces is a problem which has recently attracted a lot of research effort; applications include Bayesian non-parametrics, Bayesian inverse problems and aggregation of estimators. All these problems boil down to sampling a target distribution $\pi$ having a density w.r.t. the Lebesgue measure on $\mathbb...
We introduce new Gaussian proposals to improve the efficiency of the standard Hastings-Metropolis algorithm in Markov chain Monte Carlo (MCMC) methods, used for the sampling from a target distribution in large dimension $d$. The improved complexity is $\mathcal{O}(d^{1/5})$ compared to the complexity $\mathcal{O}(d^{1/3})$ of the standard approach....
In this paper, we establish explicit convergence rates for Markov chains in Wasserstein distance. Compared to the more classical total variation bounds, the proposed rate of convergence leads to useful insights for the analysis of MCMC algorithms, and suggests ways to construct sampler with good mixing rate even if the dimension of the underlying s...
In this paper, we provide sufficient conditions for the existence of the invariant distribution and subgeometric rates of convergence in the Wasserstein distance for general state-space Markov chains which are not phi-irreducible. Our approach is based on a coupling construction adapted to the Wasserstein distance. Our results are applied to establ...
Our main result is a construction of a lattice-based digital signature scheme that represents an improvement, both in theory and in practice, over today’s most efficient lattice schemes. The novel scheme is obtained as a result of a modification of the rejection sampling algorithm that is at the heart of Lyubashevsky’s signature scheme (Eurocrypt,...
The Ring-LWE problem, introduced by Lyubashevsky, Peikert, and Regev (Eurocrypt 2010), has been steadily finding many uses in numerous cryptographic applications. Still, the Ring-LWE problem defined in [LPR10] involves the fractional ideal R^∨, the dual of the ring R, which is the source of many theoretical and implementation technicalities. Until...