
Arnaud Doucet
University of Oxford · Department of Statistics
PhD, University Paris XI (Orsay), 1997
About
394 Publications · 70,782 Reads
46,773 Citations
Introduction
Monte Carlo methods and their applications to Bayesian statistics and machine learning.
Additional affiliations
August 2011 - present
December 2008 - November 2009
June 2005 - August 2011
Publications (394)
Markov chain Monte Carlo methods have become standard tools in statistics to sample from complex probability measures. Many available techniques rely on discrete-time reversible Markov chains whose transition kernels build up over the Metropolis-Hastings algorithm. We explore and propose several original extensions of an alternative approach introd...
Let π0 and π1 be two probability measures on R^d, equipped with the Borel σ-algebra B(R^d). Any measurable function T : R^d → R^d such that Y = T(X) ∼ π1 if X ∼ π0 is called a transport map from π0 to π1. If, for any π0 and π1, one could obtain an analytical expression for a transport map from π0 to π1, then this could be straightforwardly applied...
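The transport-map definition above admits a closed form in the Gaussian case, which gives a minimal runnable sketch; the particular choice π0 = N(0, 1), π1 = N(2, 0.25) is illustrative, not taken from the paper.

```python
import numpy as np

# Minimal sketch (illustrative distributions, not from the paper):
# for pi_0 = N(0, 1) and pi_1 = N(mu, sigma^2), the affine map
# T(x) = mu + sigma * x is a transport map from pi_0 to pi_1.
def transport_map(x, mu=2.0, sigma=0.5):
    return mu + sigma * x

rng = np.random.default_rng(0)
x = rng.standard_normal(100_000)   # X ~ pi_0
y = transport_map(x)               # Y = T(X) ~ pi_1 = N(2, 0.25)
```

Pushing samples of π0 through T yields samples of π1, which is exactly the property the abstract exploits.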
The application of Bayesian methods to large scale phylogenetics problems is increasingly limited by computational issues, motivating the development of methods that can complement existing Markov Chain Monte Carlo (MCMC) schemes. Sequential Monte Carlo (SMC) methods are approximate inference algorithms that have become very popular for time series...
Nonlinear non-Gaussian state-space models are ubiquitous in statistics, econometrics, information engineering and signal processing. Particle methods, also known as Sequential Monte Carlo (SMC) methods, provide reliable numerical approximations to the associated state inference problems. However, in most applications, the state-space model of inter...
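As a sketch of the particle methods mentioned above, here is a bootstrap particle filter on a scalar nonlinear non-Gaussian benchmark model; the model constants below are illustrative choices, not taken from the paper.

```python
import numpy as np

# Bootstrap particle filter sketch for the scalar model (illustrative):
#   x_t = 0.5 x_{t-1} + 25 x_{t-1} / (1 + x_{t-1}^2) + v_t,  v_t ~ N(0, 10)
#   y_t = x_t^2 / 20 + w_t,                                  w_t ~ N(0, 1)
def bootstrap_pf(ys, n_particles=2000, seed=1):
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n_particles)          # initial particle cloud
    means = []
    for y in ys:
        # propagate particles through the transition (the "bootstrap" proposal)
        x = 0.5 * x + 25 * x / (1 + x**2) + rng.normal(0.0, np.sqrt(10.0), n_particles)
        # weight by the Gaussian observation likelihood
        logw = -0.5 * (y - x**2 / 20.0) ** 2
        w = np.exp(logw - logw.max())
        w /= w.sum()
        means.append(float(np.sum(w * x)))        # filtering mean estimate
        x = rng.choice(x, size=n_particles, p=w)  # multinomial resampling
    return np.array(means)

# simulate a short synthetic observation sequence from the same model
rng = np.random.default_rng(0)
xt, ys = 0.0, []
for _ in range(50):
    xt = 0.5 * xt + 25 * xt / (1 + xt**2) + rng.normal(0.0, np.sqrt(10.0))
    ys.append(xt**2 / 20.0 + rng.normal())
est = bootstrap_pf(ys)
```

The propagate/weight/resample loop is the generic SMC recursion; only the two model equations would change for another state-space model.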
When an unbiased estimator of the likelihood is used within a Metropolis–Hastings chain, it is necessary to trade off the number of Monte Carlo samples used to construct this estimator against the asymptotic variances of the averages computed under this chain. Using many Monte Carlo samples will typically result in Metropolis–Hastings averages with...
U-Nets are a go-to, state-of-the-art neural architecture across numerous tasks for continuous signals on a square such as images and Partial Differential Equations (PDE), however their design and architecture is understudied. In this paper, we provide a framework for designing and analysing general U-Net architectures. We present theoretical result...
Solving Fredholm equations of the first kind is crucial in many areas of the applied sciences. In this work we adopt a probabilistic and variational point of view by considering a minimization problem in the space of probability measures with an entropic regularization. Contrary to classical approaches which discretize the domain of the solutions,...
We establish a disintegrated PAC-Bayesian bound, for classifiers that are trained via continuous-time (non-stochastic) gradient descent. Contrarily to what is standard in the PAC-Bayesian setting, our result applies to a training algorithm that is deterministic, conditioned on a random initialisation, without requiring any $\textit{de-randomisation...
We study a ranking problem in the contextual multi-armed bandit setting. A learning agent selects an ordered list of items at each time step and observes stochastic outcomes for each position. In online recommendation systems, showing an ordered list of the most attractive items would not be the best choice since both position and item dependencies...
This work discusses how to derive upper bounds for the expected generalisation error of supervised learning algorithms by means of the chaining technique. By developing a general theoretical framework, we establish a duality between generalisation bounds based on the regularity of the loss function, and their chained counterparts, which can be obta...
Parallel tempering (PT) methods are a popular class of Markov chain Monte Carlo schemes used to sample complex high-dimensional probability distributions. They rely on a collection of N interacting auxiliary chains targeting tempered versions of the target distribution to improve the exploration of the state space. We provide here a new perspective...
Recent studies have empirically investigated different methods to train a stochastic classifier by optimising a PAC-Bayesian bound via stochastic gradient descent. Most of these procedures need to replace the misclassification error with a surrogate loss, leading to a mismatch between the optimisation objective and the actual generalisation bound....
We consider lithological tomography in which the posterior distribution of (hydro)geological parameters of interest is inferred from geophysical data by treating the intermediate geophysical properties as latent variables. In such a latent variable model, one needs to estimate the intractable likelihood of the (hydro)geological parameters given the...
Fredholm integral equations of the first kind are the prototypical example of ill-posed linear inverse problems. They model, among other things, reconstruction of distorted noisy observations and indirect density estimation and also appear in instrumental variable regression. However, their numerical solution remains a challenging problem. Many tec...
The limit of infinite width allows for substantial simplifications in the analytical study of overparameterized neural networks. With a suitable random initialization, an extremely large network is well approximated by a Gaussian process, both before and during training. In the present work, we establish a similar result for a simple stochastic arc...
Markov chain Monte Carlo (MCMC) methods to sample from a probability distribution $\pi$ defined on a space $(\Theta,\mathcal{T})$ consist of the simulation of realisations of Markov chains $\{\theta_{n},n\geq1\}$ of invariant distribution $\pi$ and such that the distribution of $\theta_{i}$ converges to $\pi$ as $i\rightarrow\infty$. In practice on...
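A minimal instance of such a chain is random-walk Metropolis; the standard-normal target below is an illustrative choice, not from the paper.

```python
import numpy as np

# Random-walk Metropolis sketch: a Markov chain {theta_n} whose marginal
# distribution converges to the target pi (here pi = N(0, 1), illustrative).
def rw_metropolis(log_pi, x0, n_steps, step=1.0, seed=0):
    rng = np.random.default_rng(seed)
    x, lp, chain = x0, log_pi(x0), []
    for _ in range(n_steps):
        prop = x + step * rng.standard_normal()   # symmetric proposal
        lp_prop = log_pi(prop)
        # Metropolis accept/reject preserves pi as invariant distribution
        if np.log(rng.random()) < lp_prop - lp:
            x, lp = prop, lp_prop
        chain.append(x)
    return np.array(chain)

chain = rw_metropolis(lambda x: -0.5 * x**2, x0=0.0, n_steps=50_000)
```

Ergodic averages over the chain then approximate expectations under π, which is the use case the abstract describes.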
Deep ResNet architectures have achieved state-of-the-art performance on many tasks. While they solve the problem of gradient vanishing, they might suffer from gradient exploding as the depth becomes large (Yang et al. 2017). Moreover, recent results have shown that ResNet might lose expressivity as the depth goes to infinity (Yang et al. 2017, Hayo...
Both sequential Monte Carlo (SMC) methods (a.k.a. ‘particle filters’) and sequential Markov chain Monte Carlo (sequential MCMC) methods constitute classes of algorithms which can be used to approximate expectations with respect to (a sequence of) probability distributions and their normalising constants. While SMC methods sample particles condition...
In many scenarios, a state-space model depends on a parameter which needs to be inferred from data. Using stochastic gradient search and the optimal filter first-order derivatives, the parameter can be estimated online. To analyze the asymptotic behavior of such methods, it is necessary to establish results on the existence and stability of the opt...
The analyticity of the entropy and relative entropy rates of continuous-state hidden Markov models is studied here. Using the analytic continuation principle and the stability properties of the optimal filter, the analyticity of these rates is established for analytically parameterized models. The obtained results hold under relatively mild conditi...
Stochastic Gradient Descent (SGD) is widely used to train deep neural networks. However, few theoretical results on the training dynamics of SGD are available. Recent work by Jacot et al. (2018) has shown that training a neural network of any kind with full-batch gradient descent in parameter space is equivalent to kernel gradient descent in fun...
Parallel tempering (PT) methods are a popular class of Markov chain Monte Carlo schemes used to explore complex high-dimensional probability distributions. These algorithms can be highly effective but their performance is contingent on the selection of a suitable annealing schedule. In this work, we provide a new perspective on PT algorithms and th...
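A toy sketch of a PT scheme with a hand-picked temperature ladder (the bimodal target, ladder and step size below are illustrative assumptions, not the schedule-selection method of the paper):

```python
import numpy as np

# Unnormalised log-density of 0.5*N(-3,1) + 0.5*N(3,1) (illustrative target).
def log_pi(x):
    return np.logaddexp(-0.5 * (x + 3) ** 2, -0.5 * (x - 3) ** 2)

def parallel_tempering(betas, n_steps, step=1.0, seed=0):
    rng = np.random.default_rng(seed)
    xs = np.zeros(len(betas))           # one state per inverse temperature
    cold = []
    for _ in range(n_steps):
        # within-chain random-walk Metropolis at each temperature
        for k, beta in enumerate(betas):
            prop = xs[k] + step * rng.standard_normal()
            if np.log(rng.random()) < beta * (log_pi(prop) - log_pi(xs[k])):
                xs[k] = prop
        # swap move between a random pair of neighbouring temperatures
        k = rng.integers(len(betas) - 1)
        dlog = (betas[k] - betas[k + 1]) * (log_pi(xs[k + 1]) - log_pi(xs[k]))
        if np.log(rng.random()) < dlog:
            xs[k], xs[k + 1] = xs[k + 1], xs[k]
        cold.append(xs[-1])             # betas[-1] = 1 is the target chain
    return np.array(cold)

samples = parallel_tempering(betas=[0.1, 0.3, 1.0], n_steps=20_000)
```

The hot chains cross the energy barrier easily, and swaps let the cold chain inherit those mode jumps, which is the exploration mechanism the abstract refers to.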
When the weights in a particle filter are not available analytically, standard resampling methods cannot be employed. To circumvent this problem state-of-the-art algorithms replace the true weights with non-negative unbiased estimates. This algorithm is still valid but at the cost of higher variance of the resulting filtering estimates in compariso...
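The resampling step referred to above, applied to (possibly estimated) non-negative weights, can be sketched as plain multinomial resampling; the particle values and weights below are made up for illustration.

```python
import numpy as np

# Multinomial resampling sketch: given particles and non-negative
# unnormalised weights (exact or unbiased estimates), draw an
# equally-weighted particle set of the same size.
def multinomial_resample(particles, weights, rng):
    w = np.asarray(weights, dtype=float)
    w /= w.sum()                                   # normalise the weights
    idx = rng.choice(len(particles), size=len(particles), p=w)
    return np.asarray(particles)[idx]

rng = np.random.default_rng(0)
particles = np.array([0.0, 1.0, 2.0, 3.0])
weights = np.array([0.0, 0.0, 1.0, 3.0])           # illustrative weights
res = multinomial_resample(particles, weights, rng)
```

Zero-weight particles are never selected, so the resampled set concentrates on the high-weight particles, at the cost of extra Monte Carlo variance when the weights are themselves estimated.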
The weight initialization and the activation function of deep neural networks have a crucial impact on the performance of the training procedure. An inappropriate selection can lead to the loss of information of the input during forward propagation and the exponential vanishing/exploding of gradients during back-propagation. Understanding the theor...
The conditions of relative convexity and smoothness were recently introduced by Bauschke, Bolte, and Teboulle and Lu, Freund, and Nesterov for the analysis of first order methods optimizing a convex function. Those papers considered conditions over the primal space. We introduce a descent scheme with relative smoothness in the dual space between th...
We consider the approximation of expectations with respect to the distribution of a latent Markov process given noisy measurements. This is known as the smoothing problem and is often approached with particle and Markov chain Monte Carlo (MCMC) methods. These methods provide consistent but biased estimators when run for a finite time. We propose a...
Bayesian inference via standard Markov Chain Monte Carlo (MCMC) methods such as Metropolis-Hastings is too computationally intensive to handle large datasets, since the cost per step usually scales like $O(n)$ in the number of data points $n$. We propose the Scalable Metropolis-Hastings (SMH) kernel that exploits Gaussian concentration of the poste...
We propose a family of optimization methods that achieve linear convergence using first-order gradient information and constant step sizes on a class of convex functions much larger than the smooth and strongly convex ones. This larger class includes functions whose second derivatives may be singular or unbounded at their minima. Our methods are di...
We present an original simulation-based method to estimate likelihood ratios efficiently for general state-space models. Our method relies on a novel use of the conditional Sequential Monte Carlo (cSMC) algorithm introduced in \citet{Andrieu_et_al_2010} and presents several practical advantages over standard approaches. The ratio is estimated using...
The Bouncy Particle Sampler is a Markov chain Monte Carlo method based on a nonreversible piecewise deterministic Markov process. In this scheme, a particle explores the state space of interest by evolving according to a linear dynamics which is altered by bouncing on the hyperplane tangent to the gradient of the negative log-target density at the...
Performing numerical integration when the integrand itself cannot be evaluated point-wise is a challenging task that arises in statistical analysis, notably in Bayesian inference for models with intractable likelihood functions. Markov chain Monte Carlo (MCMC) algorithms have been proposed for this setting, such as the pseudo-marginal method for la...
Sequential Monte Carlo (SMC) methods, also known as particle filters, constitute a class of algorithms used to approximate expectations with respect to a sequence of probability distributions as well as the normalising constants of those distributions. Sequential MCMC methods are an alternative class of techniques addressing similar problems in whi...
The pseudo-marginal algorithm is a Metropolis–Hastings-type scheme which samples asymptotically from a target probability density when we can only estimate unbiasedly an unnormalized version of it. In a Bayesian context, it is a state-of-the-art posterior simulation technique when the likelihood function is intractable but can be estimated unbiasedl...
The pseudo-marginal algorithm is a variant of the Metropolis-Hastings algorithm which samples asymptotically from a probability distribution when it is only possible to estimate unbiasedly an unnormalized version of its density. Practically, one has to trade-off the computational resources used to obtain this estimator against the asymptotic varian...
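A minimal pseudo-marginal sketch on a toy latent-variable model (the model, flat prior and importance-sampling estimator below are illustrative assumptions): the likelihood is replaced by an unbiased Monte Carlo estimate, yet the chain still targets the exact posterior, here N(y, 2).

```python
import numpy as np

# Toy model (illustrative): latent z ~ N(theta, 1), observation y ~ N(z, 1),
# so the true likelihood of y given theta is N(y; theta, 2) and, under a flat
# prior, the posterior is theta | y ~ N(y, 2).
def loglik_hat(theta, y, n_mc, rng):
    z = theta + rng.standard_normal(n_mc)                   # z ~ N(theta, 1)
    w = np.exp(-0.5 * (y - z) ** 2) / np.sqrt(2 * np.pi)    # p(y | z)
    return np.log(w.mean())        # log of an unbiased estimate of p(y|theta)

def pseudo_marginal_mh(y, n_steps=20_000, n_mc=50, seed=0):
    rng = np.random.default_rng(seed)
    theta, ll = 0.0, loglik_hat(0.0, y, n_mc, rng)
    chain = []
    for _ in range(n_steps):
        prop = theta + rng.standard_normal()
        ll_prop = loglik_hat(prop, y, n_mc, rng)
        # accept/reject with the *estimated* likelihood ratio (flat prior)
        if np.log(rng.random()) < ll_prop - ll:
            theta, ll = prop, ll_prop
        chain.append(theta)
    return np.array(chain)

chain = pseudo_marginal_mh(y=1.0)
```

The computational trade-off the abstract mentions is visible in `n_mc`: more samples per step lowers the estimator's variance (better mixing) but raises the cost of each iteration.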
Using stochastic gradient search and the optimal filter derivative, it is possible to perform recursive (i.e., online) maximum likelihood estimation in a non-linear state-space model. As the optimal filter and its derivative are analytically intractable for such a model, they need to be approximated numerically. In [Poyiadjis, Doucet and Singh, Bio...
In many applications, a state-space model depends on a parameter which needs to be inferred from a data set. Quite often, it is necessary to perform the parameter inference online. In the maximum likelihood approach, this can be done using stochastic gradient search and the optimal filter derivative. However, the optimal filter and its derivative a...
In many scenarios, a state-space model depends on a parameter which needs to be inferred from data. Using stochastic gradient search and the optimal filter (first-order) derivative, the parameter can be estimated online. To analyze the asymptotic behavior of online methods for parameter estimation in non-linear state-space models, it is necessary t...
The analyticity of the entropy and relative entropy rates of continuous-state hidden Markov models is studied here. Using the analytic continuation principle and the stability properties of the optimal filter, the analyticity of these rates is shown for analytically parameterized models. The obtained results hold under relatively mild conditions an...
Variational Auto-Encoders (VAEs) have become very popular techniques to perform inference and learning in latent variable models as they allow us to leverage the rich representational power of neural networks to obtain flexible approximations of the posterior of latent variables as well as tight evidence lower bounds (ELBOs). Combined with stochast...
The weight initialization and the activation function of deep neural networks have a crucial impact on the performance of the learning procedure. An inappropriate selection can lead to the loss of information of the input during forward propagation and the exponential vanishing/exploding of gradients during back-propagation. Understanding the theor...
The Metropolis-Hastings algorithm allows one to sample asymptotically from any probability distribution $\pi$ whose density can be evaluated pointwise up to a normalizing constant. There has been recently much work devoted to the development of variants of the MH update which can handle scenarios where such an evaluation is impossible, and yet are guaranteed to sample from $\pi$ asymptotically. The most popular appro...
The asymptotic behavior of the stochastic gradient algorithm using biased gradient estimates is analyzed. Relying on arguments based on dynamic system theory (chain-recurrence) and differential geometry (Yomdin theorem and Lojasiewicz inequalities), upper bounds on the asymptotic bias of this algorithm are derived. The results hold under mild condi...
Recent advances in sensor technologies, field methodologies, numerical modeling, and inversion approaches have contributed to unprecedented imaging of hydrogeological properties and detailed predictions at multiple temporal and spatial scales. Nevertheless, imaging results and predictions will always remain imprecise, which calls for appropriate un...
Sequential Monte Carlo (SMC) methods are a set of simulation-based techniques used to approximate high- dimensional probability distributions and their normalizing constants. They have found numerous applications in statistics as they can be applied to perform state estimation for state-space models and inference for complex static models. Like man...
Sequential Monte Carlo methods, also known as particle methods, are a popular set of techniques for approximating high-dimensional probability distributions and their normalizing constants. These methods have found numerous applications in statistics and related fields; e.g. for inference in non-linear non-Gaussian state space models, and in comple...
A novel class of non-reversible Markov chain Monte Carlo schemes relying on continuous-time piecewise deterministic Markov Processes has recently emerged. In these algorithms, the state of the Markov process evolves according to a deterministic dynamics which is modified using a Markov transition kernel at random event times. These methods enjoy re...
The evidence lower bound (ELBO) appears in many algorithms for maximum likelihood estimation (MLE) with latent variables because it is a sharp lower bound of the marginal log-likelihood. For neural latent variable models, optimizing the ELBO jointly in the variational posterior and model parameters produces state-of-the-art results. Inspired by the...
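The ELBO and its defining inequality can be checked numerically on a conjugate Gaussian toy model (an illustrative choice) where the marginal log-likelihood is available in closed form:

```python
import numpy as np

# Toy model (illustrative): z ~ N(0, 1), y | z ~ N(z, 1), variational family
# q = N(m, s^2). Then ELBO(q) = E_q[log p(y, z) - log q(z)] <= log p(y),
# with equality at the exact posterior q* = N(y/2, 1/2).
def elbo(y, m, s, n_mc=200_000, seed=0):
    rng = np.random.default_rng(seed)
    z = m + s * rng.standard_normal(n_mc)            # z ~ q
    log_joint = (-0.5 * z**2 - 0.5 * np.log(2 * np.pi)          # log p(z)
                 - 0.5 * (y - z) ** 2 - 0.5 * np.log(2 * np.pi))  # log p(y|z)
    log_q = -0.5 * ((z - m) / s) ** 2 - np.log(s) - 0.5 * np.log(2 * np.pi)
    return float(np.mean(log_joint - log_q))

y = 1.0
log_evidence = -0.5 * y**2 / 2 - 0.5 * np.log(2 * np.pi * 2)  # log N(y; 0, 2)
tight = elbo(y, m=y / 2, s=np.sqrt(0.5))   # optimal q: ELBO equals log p(y)
loose = elbo(y, m=0.0, s=1.0)              # mismatched q: strictly smaller
```

The gap `tight - loose` is the KL divergence from the mismatched q to the exact posterior (about 0.40 here), which is what joint optimisation of the ELBO shrinks.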
Non-reversible Markov chain Monte Carlo schemes based on piecewise deterministic Markov processes have been recently introduced in applied probability, automatic control, physics and statistics. Although these algorithms demonstrate experimentally good performance and are accordingly increasingly used in a wide range of applications, quantitative c...
Non-reversible Markov chain Monte Carlo schemes based on piecewise deterministic Markov processes have been recently introduced in applied probability, automatic control, physics and statistics. Although these algorithms demonstrate experimentally good performance and are accordingly increasingly used in a wide range of applications, geometric ergo...
The policy gradients of the expected return objective can react slowly to rare rewards. Yet, in some cases agents may wish to emphasize the low or high returns regardless of their probability. Borrowing from the economics and control literature, we review the risk-sensitive value function that arises from an exponential utility and illustrate its e...
Piecewise deterministic Monte Carlo methods (PDMC) consist of a class of continuous-time Markov chain Monte Carlo methods (MCMC) which have recently been shown to hold considerable promise. Being non-reversible, the mixing properties of PDMC methods often significantly outperform classical reversible MCMC competitors. Moreover, in a Bayesian contex...
Piecewise Deterministic Monte Carlo algorithms enable simulation from a posterior distribution, whilst only needing to access a sub-sample of data at each iteration. We show how they can be implemented in settings where the parameters live on a restricted domain.
The embedded hidden Markov model (EHMM) sampling method is a Markov chain Monte Carlo (MCMC) technique for state inference in non-linear non-Gaussian state-space models which was proposed in Neal (2003); Neal et al. (2004) and extended in Shestopaloff and Neal (2016). An extension to Bayesian parameter inference was presented in Shestopaloff and Ne...
Bayesian inference in the presence of an intractable likelihood function is computationally challenging. When following a Markov chain Monte Carlo (MCMC) approach to approximate the posterior distribution in this context, one typically either uses MCMC schemes which target the joint posterior of the parameters and some auxiliary latent variables or...
We introduce interacting particle Markov chain Monte Carlo (iPMCMC), a PMCMC method that introduces a coupling between multiple standard and conditional sequential Monte Carlo samplers. Like related methods, iPMCMC is a Markov chain Monte Carlo sampler on an extended space. We present empirical results that show significant improvements in mixing r...
Unsupervised image segmentation aims at clustering the set of pixels of an image into spatially homogeneous regions. We introduce here a class of Bayesian nonparametric models to address this problem. These models are based on a combination of a Potts-like spatial smoothness component and a prior on partitions which is used to control both the numb...
The application of Bayesian methods to large-scale phylogenetics problems is increasingly limited by computational issues, motivating the development of methods that can complement existing Markov chain Monte Carlo (MCMC) schemes. Sequential Monte Carlo (SMC) methods are approximate inference algorithms that have become very popular for time series...
The use of unbiased estimators within the Metropolis–Hastings algorithm has found numerous applications in Bayesian statistics. The resulting so-called pseudo-marginal algorithm allows us to substitute an unbiased Monte Carlo estimator of the likelihood for the true likelihood, which might be intractable or too expensive to compute. Under regularity conditio...
This paper presents a new Markov chain Monte Carlo method to sample from the posterior distribution of conjugate mixture models. This algorithm relies on a flexible split-merge procedure built using the particle Gibbs sampler. Contrary to available split-merge procedures, the resulting so-called Particle Gibbs Split-Merge sampler does not require t...
We propose an original particle-based implementation of the Loopy Belief Propagation (LBP) algorithm for pairwise Markov Random Fields (MRF) on a continuous state space. The algorithm adaptively constructs efficient proposal distributions approximating the local beliefs at each node of the MRF. This is achieved by considering proposal distributions...
Markov chain Monte Carlo methods are often deemed too computationally intensive to be of any practical use for big data applications, and in particular for inference on datasets containing a large number $n$ of individual data points, also known as tall datasets. In scenarios where data are assumed independent, various approaches to scale up the Me...
We introduce a new sequential Monte Carlo algorithm we call the particle cascade. The particle cascade is an asynchronous, anytime alternative to traditional particle filtering algorithms. It uses no barrier synchronizations, which leads to improved particle throughput and memory efficiency. It is an anytime algorithm in the sense that it can be run...
Consider an irreducible, Harris recurrent Markov chain of transition kernel Π and invariant probability measure π. If Π satisfies a minorization condition, then the split chain allows the identification of regeneration times which may be exploited to obtain perfect samples from π. Unfortunately, many transition kernels associated wi...
Discrete-time stochastic volatility (SV) models have generated a considerable literature in financial econometrics. However, carrying out inference for these models is a difficult task and often relies on carefully customized Markov chain Monte Carlo techniques. Our contribution here is twofold. First, we propose a new SV model, namely SV-GARCH, wh...
Distributed consensus in the Wasserstein metric space of probability measures is introduced for the first time in this work. It is shown that convergence of the individual agents' measures to a common measure value is guaranteed so long as a weak network connectivity condition is satisfied asymptotically. The common measure achieved asymptotically...
Interacting particle methods are increasingly used to sample from complex high-dimensional distributions. They have found a wide range of applications in applied probability, Bayesian statistics and information engineering. Understanding rigorously these new Monte Carlo simulation tools leads to fascinating mathematics related to Feynman-Kac path i...
Markov chain Monte Carlo (MCMC) methods are often deemed far too computationally intensive to be of any practical use for large datasets. This paper describes a methodology that aims to scale up the Metropolis-Hastings (MH) algorithm in this context. We propose an approximate implementation of the accept/reject step of MH that only requires ev...
Wasserstein barycenters (Agueh and Carlier, 2011) define a new family of barycenters between N probability measures that builds upon optimal transport theory. We argue using a simple example that Wasserstein barycenters have interesting properties that differentiate them from other barycenters proposed recently, which all build either or both on ke...
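In one dimension the W2 barycenter has a simple characterisation, averaging quantile functions, which gives a quick sketch of the kind of property alluded to above: the barycenter of two well-separated Gaussians stays unimodal, unlike their mixture. The Gaussian inputs below are illustrative.

```python
import numpy as np

# 1-D sketch: the W2 barycenter of one-dimensional measures is obtained by
# averaging their quantile functions. For N(-2, 1) and N(+2, 1) this gives
# (approximately) N(0, 1), not the spread-out bimodal mixture.
def barycenter_1d(samples_list, n_quantiles=1000):
    qs = np.linspace(0.005, 0.995, n_quantiles)
    quantiles = [np.quantile(s, qs) for s in samples_list]
    return np.mean(quantiles, axis=0)   # averaged quantile function

rng = np.random.default_rng(0)
a = rng.normal(-2.0, 1.0, 50_000)
b = rng.normal(+2.0, 1.0, 50_000)
bary = barycenter_1d([a, b])            # values of the barycenter's quantiles
```

The mixture of the two inputs has standard deviation around 2.2, while the barycenter's is close to 1, which is the differentiating behaviour the abstract hints at.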
This paper deals with the numerical approximation of normalizing constants produced by particle methods, in the general framework of Feynman-Kac sequences of measures. It is well-known that the corresponding estimates satisfy a central limit theorem for a fixed time horizon $n$ as the number of particles $N$ goes to infinity. Here, we study the sit...
Ionides, King et al. (see e.g. Inference for nonlinear dynamical systems, PNAS 103) have recently introduced an original approach to perform maximum likelihood parameter estimation in state-space models which only requires being able to simulate the latent Markov model according to its prior distribution. Their methodology relies on an approximation o...
Mixture models are ubiquitous in applied science. In many real-world applications, the number of mixture components needs to be estimated from the data. A popular approach consists of using information criteria to perform model selection. Another approach which has become very popular over the past few years consists of using Dirichlet processes mi...
We propose a novel reversible jump Markov chain Monte Carlo (MCMC) simulated annealing algorithm to optimize radial basis function (RBF) networks. This algorithm enables us to maximize the joint posterior distribution of the network parameters and the number of basis functions. It performs a global search in the joint space of the parameters and nu...
Particle filters (PFs) are powerful sampling-based inference/learning algorithms for dynamic Bayesian networks (DBNs). They allow us to treat, in a principled way, any type of probability distribution, nonlinearity and non-stationarity. They have appeared in several fields under such names as "condensation", "sequential Monte Carlo" and "survival o...
Online variants of the Expectation Maximization (EM) algorithm have recently been proposed to perform parameter inference with large data sets or data streams, in independent latent models and in hidden Markov models. Nevertheless, the convergence properties ...
Sequential Monte Carlo (SMC) is a methodology for sampling approximately from a sequence of probability distributions of increasing dimension and estimating their normalizing constants. We propose here an alternative methodology named Sequentially Interacting Markov Chain Monte Carlo (SIMCMC). SIMCMC methods work by generating interacting non-Marko...
This paper describes an algorithm of interest. This is a preliminary version; we intend to write a better description of it and to obtain bounds for its complexity.
We show that the sensor self-localization problem can be cast as a static parameter estimation problem for Hidden Markov Models and we implement fully decentralized versions of the Recursive Maximum Likelihood and on-line Expectation-Maximization algorithms to localize the sensor network simultaneously with target tracking. For linear Gaussian mode...