Arnaud Doucet

University of Oxford, Department of Statistics

PhD, University Paris XI (Orsay), 1997

About

394 Publications
70,782 Reads
46,773 Citations
Additional affiliations
August 2011 - present
University of Oxford
Position
  • Professor (Full)
December 2008 - November 2009
June 2005 - August 2011
University of British Columbia
Position
  • Professor (Associate)

Publications (394)
Article
Markov chain Monte Carlo methods have become standard tools in statistics to sample from complex probability measures. Many available techniques rely on discrete-time reversible Markov chains whose transition kernels are built upon the Metropolis-Hastings algorithm. We explore and propose several original extensions of an alternative approach introd...
Technical Report
Let $\pi_0$ and $\pi_1$ be two probability measures on $\mathbb{R}^d$, equipped with the Borel $\sigma$-algebra $\mathcal{B}(\mathbb{R}^d)$. Any measurable function $T:\mathbb{R}^d\to\mathbb{R}^d$ such that $Y=T(X)\sim\pi_1$ if $X\sim\pi_0$ is called a transport map from $\pi_0$ to $\pi_1$. If for any $\pi_0$ and $\pi_1$, one could obtain an analytical expression for a transport map from $\pi_0$ to $\pi_1$, then this could be straightforwardly applied...
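As a minimal illustration of the definition (not taken from the report), for one-dimensional Gaussians $\pi_0=\mathcal{N}(\mu_0,\sigma_0^2)$ and $\pi_1=\mathcal{N}(\mu_1,\sigma_1^2)$ the affine map $T(x)=\mu_1+(\sigma_1/\sigma_0)(x-\mu_0)$ is a transport map from $\pi_0$ to $\pi_1$. The Python sketch below checks this empirically; the function name `gaussian_transport_map` and the parameter values are hypothetical.

```python
import numpy as np

def gaussian_transport_map(x, mu0, sigma0, mu1, sigma1):
    """Affine transport map from N(mu0, sigma0^2) to N(mu1, sigma1^2) in 1-D."""
    return mu1 + (sigma1 / sigma0) * (x - mu0)

rng = np.random.default_rng(0)
mu0, sigma0, mu1, sigma1 = 0.0, 1.0, 3.0, 2.0
x = rng.normal(mu0, sigma0, size=200_000)                # X ~ pi_0
y = gaussian_transport_map(x, mu0, sigma0, mu1, sigma1)  # Y = T(X) should follow pi_1
print(y.mean(), y.std())  # empirical mean and std, expected near mu1 and sigma1
```

Closed-form maps like this exist only in special cases, which is why general-purpose constructions are of interest.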
Article
The application of Bayesian methods to large-scale phylogenetics problems is increasingly limited by computational issues, motivating the development of methods that can complement existing Markov chain Monte Carlo (MCMC) schemes. Sequential Monte Carlo (SMC) methods are approximate inference algorithms that have become very popular for time series...
Article
Nonlinear non-Gaussian state-space models are ubiquitous in statistics, econometrics, information engineering and signal processing. Particle methods, also known as Sequential Monte Carlo (SMC) methods, provide reliable numerical approximations to the associated state inference problems. However, in most applications, the state-space model of inter...
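Particle methods approximate the filtering distributions, and the likelihood, by repeatedly propagating, weighting and resampling a cloud of samples. As a minimal sketch of the generic bootstrap particle filter (not necessarily the scheme studied in this article), the code below runs it on a hypothetical one-dimensional linear Gaussian model, chosen because the Kalman filter then gives the exact log-likelihood for comparison. The model, parameter values and function names are illustrative assumptions.

```python
import numpy as np

def log_normal_pdf(y, mean, var):
    return -0.5 * (np.log(2.0 * np.pi * var) + (y - mean) ** 2 / var)

def simulate(T, a, q, r, rng):
    # Hypothetical model: x_1 ~ N(0, 1), x_t = a x_{t-1} + N(0, q), y_t = x_t + N(0, r)
    x = np.empty(T)
    x[0] = rng.normal()
    for t in range(1, T):
        x[t] = a * x[t - 1] + np.sqrt(q) * rng.normal()
    return x + np.sqrt(r) * rng.normal(size=T)

def kalman_loglik(y, a, q, r):
    m, P, ll = 0.0, 1.0, 0.0              # predictive mean/variance of x_1
    for t, yt in enumerate(y):
        if t > 0:
            m, P = a * m, a * a * P + q   # predict x_t given y_1..y_{t-1}
        ll += log_normal_pdf(yt, m, P + r)
        K = P / (P + r)                   # Kalman gain, then update
        m, P = m + K * (yt - m), (1.0 - K) * P
    return ll

def bootstrap_pf_loglik(y, a, q, r, n_particles, rng):
    x = rng.normal(size=n_particles)      # particles drawn from the prior of x_1
    ll = 0.0
    for t, yt in enumerate(y):
        if t > 0:                         # propagate through the transition density
            x = a * x + np.sqrt(q) * rng.normal(size=n_particles)
        logw = log_normal_pdf(yt, x, r)   # weights = observation density
        c = logw.max()
        w = np.exp(logw - c)
        ll += c + np.log(w.mean())        # log of the likelihood-increment estimate
        idx = rng.choice(n_particles, size=n_particles, p=w / w.sum())  # multinomial resampling
        x = x[idx]
    return ll

rng = np.random.default_rng(1)
a, q, r = 0.9, 0.5, 1.0
y = simulate(50, a, q, r, rng)
print(kalman_loglik(y, a, q, r), bootstrap_pf_loglik(y, a, q, r, 2_000, rng))
```

The product of the per-step averages is an unbiased estimate of the likelihood; its logarithm, which is what the function returns, is slightly biased downwards.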
Article
When an unbiased estimator of the likelihood is used within a Metropolis–Hastings chain, it is necessary to trade off the number of Monte Carlo samples used to construct this estimator against the asymptotic variances of the averages computed under this chain. Using many Monte Carlo samples will typically result in Metropolis–Hastings averages with...
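The scheme this abstract refers to, a Metropolis–Hastings chain in which the likelihood is replaced by an unbiased estimator, can be sketched as follows. The toy latent-variable model $y\mid\theta,x\sim\mathcal{N}(\theta+x,1)$, $x\sim\mathcal{N}(0,1)$, the flat prior, the step size and all names are hypothetical. Here `n_samples` is the number of Monte Carlo samples whose cost is traded off against the variance of the chain's averages.

```python
import numpy as np

def loglik_hat(theta, y, n_samples, rng):
    # Unbiased estimate of p(y | theta) = E_x[N(y; theta + x, 1)], x ~ N(0, 1), returned on log scale
    x = rng.normal(size=n_samples)
    dens = np.exp(-0.5 * (y - theta - x) ** 2) / np.sqrt(2.0 * np.pi)
    return np.log(dens.mean())

def pseudo_marginal_mh(y, n_iter, n_samples, step, rng):
    theta = 0.0
    ll = loglik_hat(theta, y, n_samples, rng)
    chain = np.empty(n_iter)
    accepted = 0
    for i in range(n_iter):
        prop = theta + step * rng.normal()            # symmetric random-walk proposal
        ll_prop = loglik_hat(prop, y, n_samples, rng)
        # Flat prior: accept with probability min(1, L_hat(prop) / L_hat(theta))
        if np.log(rng.uniform()) < ll_prop - ll:
            theta, ll = prop, ll_prop                  # the estimate is stored with the state
            accepted += 1
        chain[i] = theta
    return chain, accepted / n_iter

rng = np.random.default_rng(2)
chain, acc_rate = pseudo_marginal_mh(y=1.0, n_iter=20_000, n_samples=30, step=2.0, rng=rng)
print(chain[2_000:].mean(), chain[2_000:].var(), acc_rate)  # exact posterior here is N(1, 2)
```

The key design point is that the likelihood estimate at the current state is reused rather than recomputed; re-estimating it at every iteration would no longer leave the exact posterior invariant.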
Preprint
U-Nets are a go-to, state-of-the-art neural architecture across numerous tasks for continuous signals on a square, such as images and Partial Differential Equations (PDE); however, their design and architecture are understudied. In this paper, we provide a framework for designing and analysing general U-Net architectures. We present theoretical result...
Preprint
Solving Fredholm equations of the first kind is crucial in many areas of the applied sciences. In this work we adopt a probabilistic and variational point of view by considering a minimization problem in the space of probability measures with an entropic regularization. Contrary to classical approaches which discretize the domain of the solutions,...
Preprint
We establish a disintegrated PAC-Bayesian bound for classifiers that are trained via continuous-time (non-stochastic) gradient descent. Contrary to what is standard in the PAC-Bayesian setting, our result applies to a training algorithm that is deterministic, conditioned on a random initialisation, without requiring any de-randomisation...
Preprint
We study a ranking problem in the contextual multi-armed bandit setting. A learning agent selects an ordered list of items at each time step and observes stochastic outcomes for each position. In online recommendation systems, showing an ordered list of the most attractive items would not be the best choice since both position and item dependencies...
Preprint
This work discusses how to derive upper bounds for the expected generalisation error of supervised learning algorithms by means of the chaining technique. By developing a general theoretical framework, we establish a duality between generalisation bounds based on the regularity of the loss function, and their chained counterparts, which can be obta...
Article
Parallel tempering (PT) methods are a popular class of Markov chain Monte Carlo schemes used to sample complex high‐dimensional probability distributions. They rely on a collection of N interacting auxiliary chains targeting tempered versions of the target distribution to improve the exploration of the state space. We provide here a new perspective...
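The core mechanism, $N$ chains targeting tempered versions $\pi^{\beta_k}$ of the target with occasional swaps of states between neighbouring temperatures, can be sketched in Python as below. This is a generic PT sampler, not the new perspective developed in the article; the bimodal target, the geometric schedule $\beta_k = 2^{-k}$ and all names are hypothetical.

```python
import numpy as np

def log_target(x):
    # Hypothetical bimodal target: equal mixture of N(-4, 1) and N(4, 1), unnormalised
    return np.logaddexp(-0.5 * (x + 4.0) ** 2, -0.5 * (x - 4.0) ** 2)

def parallel_tempering(n_iter, betas, step, rng):
    n_chains = len(betas)
    x = np.zeros(n_chains)
    samples = np.empty(n_iter)
    n_swaps = 0
    for i in range(n_iter):
        # 1) Random-walk Metropolis move in each chain targeting pi^beta
        #    (proposal scale grows as beta shrinks; still symmetric per chain)
        prop = x + (step / np.sqrt(betas)) * rng.normal(size=n_chains)
        log_acc = betas * (log_target(prop) - log_target(x))
        accept = np.log(rng.uniform(size=n_chains)) < log_acc
        x = np.where(accept, prop, x)
        # 2) Propose swapping the states of a random adjacent pair (k, k+1)
        k = rng.integers(n_chains - 1)
        log_swap = (betas[k] - betas[k + 1]) * (log_target(x[k + 1]) - log_target(x[k]))
        if np.log(rng.uniform()) < log_swap:
            x[k], x[k + 1] = x[k + 1], x[k]
            n_swaps += 1
        samples[i] = x[0]  # the chain with beta = 1 targets pi itself
    return samples, n_swaps

rng = np.random.default_rng(3)
betas = 0.5 ** np.arange(6)  # hypothetical schedule 1, 1/2, ..., 1/32
samples, n_swaps = parallel_tempering(60_000, betas, 1.0, rng)
print((samples > 0).mean(), n_swaps)  # fraction of time in the right-hand mode
```

The swap acceptance ratio only involves the two states and their inverse temperatures, so the joint distribution $\prod_k \pi^{\beta_k}$ is preserved; how well the cold chain mixes depends on the choice of schedule.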
Preprint
Recent studies have empirically investigated different methods to train a stochastic classifier by optimising a PAC-Bayesian bound via stochastic gradient descent. Most of these procedures need to replace the misclassification error with a surrogate loss, leading to a mismatch between the optimisation objective and the actual generalisation bound....
Preprint
We consider lithological tomography in which the posterior distribution of (hydro)geological parameters of interest is inferred from geophysical data by treating the intermediate geophysical properties as latent variables. In such a latent variable model, one needs to estimate the intractable likelihood of the (hydro)geological parameters given the...
Article
Fredholm integral equations of the first kind are the prototypical example of ill-posed linear inverse problems. They model, among other things, reconstruction of distorted noisy observations and indirect density estimation and also appear in instrumental variable regression. However, their numerical solution remains a challenging problem. Many tec...
Preprint
The limit of infinite width allows for substantial simplifications in the analytical study of overparameterized neural networks. With a suitable random initialization, an extremely large network is well approximated by a Gaussian process, both before and during training. In the present work, we establish a similar result for a simple stochastic arc...
Preprint
Markov chain Monte Carlo (MCMC) methods to sample from a probability distribution $\pi$ defined on a space $(\Theta,\mathcal{T})$ consist of the simulation of realisations of Markov chains $\{\theta_{n},n\geq1\}$ of invariant distribution $\pi$ and such that the distribution of $\theta_{i}$ converges to $\pi$ as $i\rightarrow\infty$. In practice on...
Preprint
Deep ResNet architectures have achieved state-of-the-art performance on many tasks. While they solve the problem of vanishing gradients, they might suffer from exploding gradients as the depth becomes large (Yang et al. 2017). Moreover, recent results have shown that ResNet might lose expressivity as the depth goes to infinity (Yang et al. 2017, Hayo...
Preprint
Fredholm integral equations of the first kind are the prototypical example of ill-posed linear inverse problems. They model, among other things, reconstruction of distorted noisy observations and indirect density estimation and also appear in instrumental variable regression. However, their numerical solution remains a challenging problem. Many tec...
Article
Both sequential Monte Carlo (SMC) methods (a.k.a. ‘particle filters’) and sequential Markov chain Monte Carlo (sequential MCMC) methods constitute classes of algorithms which can be used to approximate expectations with respect to (a sequence of) probability distributions and their normalising constants. While SMC methods sample particles condition...
Article
In many scenarios, a state-space model depends on a parameter which needs to be inferred from data. Using stochastic gradient search and the optimal filter first-order derivatives, the parameter can be estimated online. To analyze the asymptotic behavior of such methods, it is necessary to establish results on the existence and stability of the opt...
Article
The analyticity of the entropy and relative entropy rates of continuous-state hidden Markov models is studied here. Using the analytic continuation principle and the stability properties of the optimal filter, the analyticity of these rates is established for analytically parameterized models. The obtained results hold under relatively mild conditi...
Preprint
Stochastic Gradient Descent (SGD) is widely used to train deep neural networks. However, few theoretical results on the training dynamics of SGD are available. Recent work by Jacot et al. (2018) has shown that training a neural network of any kind with full-batch gradient descent in parameter space is equivalent to kernel gradient descent in fun...
Preprint
Parallel tempering (PT) methods are a popular class of Markov chain Monte Carlo schemes used to explore complex high-dimensional probability distributions. These algorithms can be highly effective but their performance is contingent on the selection of a suitable annealing schedule. In this work, we provide a new perspective on PT algorithms and th...
Preprint
When the weights in a particle filter are not available analytically, standard resampling methods cannot be employed. To circumvent this problem, state-of-the-art algorithms replace the true weights with non-negative unbiased estimates. This approach remains valid, but at the cost of higher variance of the resulting filtering estimates in compariso...
Conference Paper
When the weights in a particle filter are not available analytically, standard resampling methods cannot be employed. To circumvent this problem, state-of-the-art algorithms replace the true weights with non-negative unbiased estimates. This approach remains valid, but at the cost of higher variance of the resulting filtering estimates in compariso...
Preprint
The weight initialization and the activation function of deep neural networks have a crucial impact on the performance of the training procedure. An inappropriate selection can lead to the loss of information of the input during forward propagation and the exponential vanishing/exploding of gradients during back-propagation. Understanding the theor...
Preprint
The conditions of relative convexity and smoothness were recently introduced by Bauschke, Bolte, and Teboulle and by Lu, Freund, and Nesterov for the analysis of first-order methods optimizing a convex function. Those papers considered conditions over the primal space. We introduce a descent scheme with relative smoothness in the dual space between th...
Preprint
We consider the approximation of expectations with respect to the distribution of a latent Markov process given noisy measurements. This is known as the smoothing problem and is often approached with particle and Markov chain Monte Carlo (MCMC) methods. These methods provide consistent but biased estimators when run for a finite time. We propose a...
Preprint
Bayesian inference via standard Markov Chain Monte Carlo (MCMC) methods such as Metropolis-Hastings is too computationally intensive to handle large datasets, since the cost per step usually scales like $O(n)$ in the number of data points $n$. We propose the Scalable Metropolis-Hastings (SMH) kernel that exploits Gaussian concentration of the poste...
Preprint
We propose a family of optimization methods that achieve linear convergence using first-order gradient information and constant step sizes on a class of convex functions much larger than the smooth and strongly convex ones. This larger class includes functions whose second derivatives may be singular or unbounded at their minima. Our methods are di...
Preprint
We present an original simulation-based method to estimate likelihood ratios efficiently for general state-space models. Our method relies on a novel use of the conditional Sequential Monte Carlo (cSMC) algorithm introduced in Andrieu et al. (2010) and presents several practical advantages over standard approaches. The ratio is estimated using...
Preprint
The Bouncy Particle Sampler is a Markov chain Monte Carlo method based on a nonreversible piecewise deterministic Markov process. In this scheme, a particle explores the state space of interest by evolving according to linear dynamics which are altered by bouncing on the hyperplane tangent to the gradient of the negative log-target density at the...
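For a standard Gaussian target, $U(x)=\|x\|^2/2$, the event times of the Bouncy Particle Sampler can be simulated exactly, which gives a compact sketch of the dynamics: straight-line motion, bounces $v \leftarrow v - 2\frac{\langle v,\nabla U(x)\rangle}{\|\nabla U(x)\|^2}\nabla U(x)$ occurring at rate $\max(0,\langle v,\nabla U(x)\rangle)$, and velocity refreshments at a constant rate. The Gaussian target, the refreshment rate and the function name are assumptions; a general target would need thinning or another scheme to simulate the event times.

```python
import numpy as np

def bps_standard_gaussian(total_time, dim, refresh_rate, rng):
    """Bouncy Particle Sampler for U(x) = |x|^2 / 2 (standard Gaussian target).

    Returns the time averages of x and x**2 along the piecewise-linear path.
    """
    x = rng.normal(size=dim)
    v = rng.normal(size=dim)
    t = 0.0
    int_x = np.zeros(dim)   # integral of x(s) ds over [0, total_time]
    int_x2 = np.zeros(dim)  # integral of x(s)**2 ds over [0, total_time]
    while True:
        a, b = v @ x, v @ v
        # Bounce rate along x + v s is max(0, a + b s); invert its integral against E ~ Exp(1)
        e = rng.exponential()
        tau_bounce = (-a + np.sqrt(max(a, 0.0) ** 2 + 2.0 * b * e)) / b
        tau_refresh = rng.exponential(1.0 / refresh_rate)
        tau = min(tau_bounce, tau_refresh)
        last = t + tau >= total_time
        if last:
            tau = total_time - t  # truncate the final segment at the time horizon
        # Exact integrals of the linear segment x + v s for s in [0, tau]
        int_x += x * tau + v * tau ** 2 / 2.0
        int_x2 += x ** 2 * tau + x * v * tau ** 2 + v ** 2 * tau ** 3 / 3.0
        x = x + v * tau
        if last:
            break
        t += tau
        if tau_refresh < tau_bounce:
            v = rng.normal(size=dim)             # refreshment: redraw v ~ N(0, I)
        else:
            g = x                                # gradient of U at the bounce point
            v = v - 2.0 * (v @ g) / (g @ g) * g  # reflect v off the level set of U
    return int_x / total_time, int_x2 / total_time

rng = np.random.default_rng(4)
mean_hat, second_moment_hat = bps_standard_gaussian(10_000.0, dim=2, refresh_rate=1.0, rng=rng)
print(mean_hat, second_moment_hat)  # targets: 0 and 1 in each coordinate
```

Expectations are estimated as time averages over the continuous trajectory rather than over discrete samples; the refreshment step is what keeps the process from being confined to a subset of the state space.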
Preprint
Performing numerical integration when the integrand itself cannot be evaluated point-wise is a challenging task that arises in statistical analysis, notably in Bayesian inference for models with intractable likelihood functions. Markov chain Monte Carlo (MCMC) algorithms have been proposed for this setting, such as the pseudo-marginal method for la...
Preprint
Sequential Monte Carlo (SMC) methods, also known as particle filters, constitute a class of algorithms used to approximate expectations with respect to a sequence of probability distributions as well as the normalising constants of those distributions. Sequential MCMC methods are an alternative class of techniques addressing similar problems in whi...
Article
The pseudomarginal algorithm is a Metropolis–Hastings-type scheme which samples asymptotically from a target probability density when we can only unbiasedly estimate an unnormalized version of it. In a Bayesian context, it is a state-of-the-art posterior simulation technique when the likelihood function is intractable but can be estimated unbiasedl...
Preprint
The pseudo-marginal algorithm is a variant of the Metropolis-Hastings algorithm which samples asymptotically from a probability distribution when it is only possible to unbiasedly estimate an unnormalized version of its density. In practice, one has to trade off the computational resources used to obtain this estimator against the asymptotic varian...
Preprint
Using stochastic gradient search and the optimal filter derivative, it is possible to perform recursive (i.e., online) maximum likelihood estimation in a non-linear state-space model. As the optimal filter and its derivative are analytically intractable for such a model, they need to be approximated numerically. In [Poyiadjis, Doucet and Singh, Bio...
Preprint
In many applications, a state-space model depends on a parameter which needs to be inferred from a data set. Quite often, it is necessary to perform the parameter inference online. In the maximum likelihood approach, this can be done using stochastic gradient search and the optimal filter derivative. However, the optimal filter and its derivative a...
Preprint
In many scenarios, a state-space model depends on a parameter which needs to be inferred from data. Using stochastic gradient search and the optimal filter (first-order) derivative, the parameter can be estimated online. To analyze the asymptotic behavior of online methods for parameter estimation in non-linear state-space models, it is necessary t...
Preprint
The analyticity of the entropy and relative entropy rates of continuous-state hidden Markov models is studied here. Using the analytic continuation principle and the stability properties of the optimal filter, the analyticity of these rates is shown for analytically parameterized models. The obtained results hold under relatively mild conditions an...
Preprint
Variational Auto-Encoders (VAEs) have become very popular techniques to perform inference and learning in latent variable models as they allow us to leverage the rich representational power of neural networks to obtain flexible approximations of the posterior of latent variables as well as tight evidence lower bounds (ELBOs). Combined with stochast...
Preprint
The weight initialization and the activation function of deep neural networks have a crucial impact on the performance of the learning procedure. An inappropriate selection can lead to the loss of information of the input during forward propagation and the exponential vanishing/exploding of gradients during back-propagation. Understanding the theor...
Article
The Metropolis-Hastings algorithm allows one to sample asymptotically from any probability distribution $\pi$. There has recently been much work devoted to the development of variants of the MH update which can handle scenarios where such an evaluation is impossible, and yet are guaranteed to sample from $\pi$ asymptotically. The most popular appro...
Article
The asymptotic behavior of the stochastic gradient algorithm using biased gradient estimates is analyzed. Relying on arguments based on dynamic system theory (chain-recurrence) and differential geometry (Yomdin theorem and Lojasiewicz inequalities), upper bounds on the asymptotic bias of this algorithm are derived. The results hold under mild condi...
Article
Recent advances in sensor technologies, field methodologies, numerical modeling, and inversion approaches have contributed to unprecedented imaging of hydrogeological properties and detailed predictions at multiple temporal and spatial scales. Nevertheless, imaging results and predictions will always remain imprecise, which calls for appropriate un...
Article
Sequential Monte Carlo (SMC) methods are a set of simulation-based techniques used to approximate high-dimensional probability distributions and their normalizing constants. They have found numerous applications in statistics as they can be applied to perform state estimation for state-space models and inference for complex static models. Like man...
Preprint
Sequential Monte Carlo methods, also known as particle methods, are a popular set of techniques for approximating high-dimensional probability distributions and their normalizing constants. These methods have found numerous applications in statistics and related fields; e.g. for inference in non-linear non-Gaussian state space models, and in comple...
Article
A novel class of non-reversible Markov chain Monte Carlo schemes relying on continuous-time piecewise deterministic Markov Processes has recently emerged. In these algorithms, the state of the Markov process evolves according to deterministic dynamics which are modified using a Markov transition kernel at random event times. These methods enjoy re...
Article
The evidence lower bound (ELBO) appears in many algorithms for maximum likelihood estimation (MLE) with latent variables because it is a sharp lower bound of the marginal log-likelihood. For neural latent variable models, optimizing the ELBO jointly in the variational posterior and model parameters produces state-of-the-art results. Inspired by the...
Article
Non-reversible Markov chain Monte Carlo schemes based on piecewise deterministic Markov processes have been recently introduced in applied probability, automatic control, physics and statistics. Although these algorithms demonstrate experimentally good performance and are accordingly increasingly used in a wide range of applications, quantitative c...
Preprint
Non-reversible Markov chain Monte Carlo schemes based on piecewise deterministic Markov processes have been recently introduced in applied probability, automatic control, physics and statistics. Although these algorithms demonstrate experimentally good performance and are accordingly increasingly used in a wide range of applications, geometric ergo...
Article
The policy gradients of the expected return objective can react slowly to rare rewards. Yet, in some cases agents may wish to emphasize the low or high returns regardless of their probability. Borrowing from the economics and control literature, we review the risk-sensitive value function that arises from an exponential utility and illustrate its e...
Article
Piecewise deterministic Monte Carlo methods (PDMC) constitute a class of continuous-time Markov chain Monte Carlo methods (MCMC) which have recently been shown to hold considerable promise. Being non-reversible, the mixing properties of PDMC methods often significantly outperform classical reversible MCMC competitors. Moreover, in a Bayesian contex...
Preprint
Piecewise Deterministic Monte Carlo algorithms enable simulation from a posterior distribution, whilst only needing to access a sub-sample of data at each iteration. We show how they can be implemented in settings where the parameters live on a restricted domain.
Article
The embedded hidden Markov model (EHMM) sampling method is a Markov chain Monte Carlo (MCMC) technique for state inference in non-linear non-Gaussian state-space models which was proposed in Neal (2003); Neal et al. (2004) and extended in Shestopaloff and Neal (2016). An extension to Bayesian parameter inference was presented in Shestopaloff and Ne...
Article
Bayesian inference in the presence of an intractable likelihood function is computationally challenging. When following a Markov chain Monte Carlo (MCMC) approach to approximate the posterior distribution in this context, one typically either uses MCMC schemes which target the joint posterior of the parameters and some auxiliary latent variables or...
Article
We introduce interacting particle Markov chain Monte Carlo (iPMCMC), a PMCMC method that introduces a coupling between multiple standard and conditional sequential Monte Carlo samplers. Like related methods, iPMCMC is a Markov chain Monte Carlo sampler on an extended space. We present empirical results that show significant improvements in mixing r...
Article
Unsupervised image segmentation aims at clustering the set of pixels of an image into spatially homogeneous regions. We introduce here a class of Bayesian nonparametric models to address this problem. These models are based on a combination of a Potts-like spatial smoothness component and a prior on partitions which is used to control both the numb...
Article
The application of Bayesian methods to large-scale phylogenetics problems is increasingly limited by computational issues, motivating the development of methods that can complement existing Markov chain Monte Carlo (MCMC) schemes. Sequential Monte Carlo (SMC) methods are approximate inference algorithms that have become very popular for time series...
Article
The use of unbiased estimators within the Metropolis–Hastings algorithm has found numerous applications in Bayesian statistics. The resulting so-called pseudo-marginal algorithm allows us to substitute an unbiased Monte Carlo estimator of the likelihood for the true likelihood, which might be intractable or too expensive to compute. Under regularity conditio...
Article
This paper presents a new Markov chain Monte Carlo method to sample from the posterior distribution of conjugate mixture models. This algorithm relies on a flexible split-merge procedure built using the particle Gibbs sampler. Contrary to available split-merge procedures, the resulting so-called Particle Gibbs Split-Merge sampler does not require t...
Article
The application of Bayesian methods to large-scale phylogenetics problems is increasingly limited by computational issues, motivating the development of methods that can complement existing Markov chain Monte Carlo (MCMC) schemes. Sequential Monte Carlo (SMC) methods are approximate inference algorithms that have become very popular for time series...
Article
We propose an original particle-based implementation of the Loopy Belief Propagation (LBP) algorithm for pairwise Markov Random Fields (MRF) on a continuous state space. The algorithm adaptively constructs efficient proposal distributions approximating the local beliefs at each node of the MRF. This is achieved by considering proposal distributions...
Article
Markov chain Monte Carlo methods are often deemed too computationally intensive to be of any practical use for big data applications, and in particular for inference on datasets containing a large number $n$ of individual data points, also known as tall datasets. In scenarios where data are assumed independent, various approaches to scale up the Me...
Article
We introduce a new sequential Monte Carlo algorithm we call the particle cascade. The particle cascade is an asynchronous, anytime alternative to traditional particle filtering algorithms. It uses no barrier synchronizations, which leads to improved particle throughput and memory efficiency. It is an anytime algorithm in the sense that it can be run...
Article
Consider an irreducible, Harris recurrent Markov chain with transition kernel $\Pi$ and invariant probability measure $\pi$. If $\Pi$ satisfies a minorization condition, then the split chain allows the identification of regeneration times which may be exploited to obtain perfect samples from $\pi$. Unfortunately, many transition kernels associated wi...
Article
Discrete-time stochastic volatility (SV) models have generated a considerable literature in financial econometrics. However, carrying out inference for these models is a difficult task and often relies on carefully customized Markov chain Monte Carlo techniques. Our contribution here is twofold. First, we propose a new SV model, namely SV-GARCH, wh...
Article
Distributed consensus in the Wasserstein metric space of probability measures is introduced for the first time in this work. It is shown that convergence of the individual agents' measures to a common measure value is guaranteed so long as a weak network connectivity condition is satisfied asymptotically. The common measure achieved asymptotically...
Article
Interacting particle methods are increasingly used to sample from complex high-dimensional distributions. They have found a wide range of applications in applied probability, Bayesian statistics and information engineering. Understanding rigorously these new Monte Carlo simulation tools leads to fascinating mathematics related to Feynman-Kac path i...
Article (2014)
Markov chain Monte Carlo (MCMC) methods are often deemed far too computationally intensive to be of any practical use for large datasets. This paper describes a methodology that aims to scale up the Metropolis-Hastings (MH) algorithm in this context. We propose an approximate implementation of the accept/reject step of MH that only requires ev...
Article
Wasserstein barycenters (Agueh and Carlier, 2011) define a new family of barycenters between N probability measures that builds upon optimal transport theory. We argue, using a simple example, that Wasserstein barycenters have interesting properties that differentiate them from other barycenters proposed recently, which all build either or both on ke...
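One property makes the definition concrete in one dimension: on the real line, the 2-Wasserstein barycenter of measures with weights $w_i$ has quantile function $\sum_i w_i F_i^{-1}$. For empirical measures with the same number of equally weighted atoms, this amounts to averaging sorted samples. The sketch below covers only this 1-D special case, not the general algorithms of the paper; the function name is hypothetical.

```python
import numpy as np

def wasserstein_barycenter_1d(sample_sets, weights=None):
    """W2 barycenter of 1-D empirical measures that all have n equally weighted atoms."""
    sorted_sets = np.sort(np.asarray(sample_sets, dtype=float), axis=1)  # empirical quantiles
    if weights is None:
        weights = np.full(sorted_sets.shape[0], 1.0 / sorted_sets.shape[0])
    return np.asarray(weights) @ sorted_sets  # weighted average of quantile functions

rng = np.random.default_rng(5)
a = rng.normal(-2.0, 1.0, size=100_000)
b = rng.normal(2.0, 3.0, size=100_000)
bary = wasserstein_barycenter_1d([a, b])
print(bary.mean(), bary.std())  # barycenter of N(-2, 1) and N(2, 9) is N(0, 4)
```

Unlike the equally weighted mixture of the two inputs, which is bimodal, the barycenter here is again a single Gaussian whose mean and standard deviation are the averages of the inputs' means and standard deviations.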
Article
This paper deals with the numerical approximation of normalizing constants produced by particle methods, in the general framework of Feynman-Kac sequences of measures. It is well-known that the corresponding estimates satisfy a central limit theorem for a fixed time horizon $n$ as the number of particles $N$ goes to infinity. Here, we study the sit...
Article
Ionides, King et al. (see e.g. Inference for nonlinear dynamical systems, PNAS 103) have recently introduced an original approach to perform maximum likelihood parameter estimation in state-space models which only requires being able to simulate the latent Markov model according to its prior distribution. Their methodology relies on an approximation o...
Article
Mixture models are ubiquitous in applied science. In many real-world applications, the number of mixture components needs to be estimated from the data. A popular approach consists of using information criteria to perform model selection. Another approach which has become very popular over the past few years consists of using Dirichlet processes mi...
Article
We propose a novel reversible jump Markov chain Monte Carlo (MCMC) simulated annealing algorithm to optimize radial basis function (RBF) networks. This algorithm enables us to maximize the joint posterior distribution of the network parameters and the number of basis functions. It performs a global search in the joint space of the parameters and nu...
Article
Particle filters (PFs) are powerful sampling-based inference/learning algorithms for dynamic Bayesian networks (DBNs). They allow us to treat, in a principled way, any type of probability distribution, nonlinearity and non-stationarity. They have appeared in several fields under such names as "condensation", "sequential Monte Carlo" and "survival o...
Article
Online variants of the Expectation Maximization (EM) algorithm have recently been proposed to perform parameter inference with large data sets or data streams, in independent latent models and in hidden Markov models. Nevertheless, the convergence properties ...
Preprint
Sequential Monte Carlo (SMC) is a methodology for sampling approximately from a sequence of probability distributions of increasing dimension and estimating their normalizing constants. We propose here an alternative methodology named Sequentially Interacting Markov Chain Monte Carlo (SIMCMC). SIMCMC methods work by generating interacting non-Marko...
Article
This paper describes an algorithm of interest. This is a preliminary version, and we intend to write a better description of it and to obtain bounds for its complexity.
Article
We show that the sensor self-localization problem can be cast as a static parameter estimation problem for Hidden Markov Models and we implement fully decentralized versions of the Recursive Maximum Likelihood and on-line Expectation-Maximization algorithms to localize the sensor network simultaneously with target tracking. For linear Gaussian mode...