Article

Efficient stochastic optimisation by unadjusted Langevin Monte Carlo. Application to maximum marginal likelihood and empirical Bayesian estimation

... Since the latter integral, i.e., the resulting marginal likelihood of the multinomial logistic regression model, is computationally intractable, we will apply the Stochastic Optimisation via Unadjusted Langevin (SOUL) method [8], which is specifically designed for this type of problem. We will explain this method in detail below. ...
... The implementation guidelines and details about the SOUL algorithm can be found in [8] and in [44,Section 3.3]. For completeness, we will provide some details below. ...
... Setting δ^(i)_PGA and m: it is suggested in [44] to set δ^(i)_PGA = C₀ i^{−p}, where p lies in the range [0.6, 0.9] (in our experiments we set p = 0.8) and C₀ ∈ ℝ is a constant that can initially be set to (λ^(0)_W × (K + 1) × C)^{−1} and adjusted as needed. For m, we followed the recommendation in [8,44] of using a single sample per iteration (that is, m = 1), as we did not observe significant differences with larger values of m. ...
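The step-size recommendation above lends itself to a compact sketch. The following Python shows an illustrative SOUL-style loop (one ULA move per iteration, m = 1, decreasing steps δ_i = C₀ i^{−p}) on a toy Gaussian latent-variable model; the function names, the toy model, and all constants are our own assumptions, not the authors' code:

```python
import numpy as np

rng = np.random.default_rng(0)

def soul_sketch(grad_x_logp, grad_theta_logp, theta0, x0,
                n_iter=5000, gamma=0.1, C0=0.5, p=0.8):
    """Minimal SOUL-style loop with m = 1 sample per iteration:
    one unadjusted Langevin (ULA) move in the latent variable x,
    then a gradient step in theta with step size delta_i = C0 * i**(-p),
    p in [0.6, 0.9] as in the snippet above. Illustrative only."""
    theta, x = theta0, x0
    for i in range(1, n_iter + 1):
        # ULA kernel targeting p(x | y, theta)
        x = x + gamma * grad_x_logp(x, theta) + np.sqrt(2 * gamma) * rng.standard_normal()
        # stochastic gradient step on the marginal log-likelihood
        theta = theta + C0 * i ** (-p) * grad_theta_logp(x, theta)
    return theta

# Toy model: latent x ~ N(theta, 1), observation y | x ~ N(x, 1), y = 2.0,
# so the maximum marginal likelihood estimate is theta* = y = 2.
y = 2.0
theta_hat = soul_sketch(lambda x, th: (th - x) + (y - x),
                        lambda x, th: x - th, theta0=0.0, x0=0.0)
```

With the decreasing schedule, the latent chain tracks the posterior while the parameter iterates average out the Monte Carlo noise.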
Preprint
Full-text available
Biclustering has gained interest in gene expression data analysis due to its ability to identify groups of samples that exhibit similar behaviour in specific subsets of genes (or vice versa), in contrast to traditional clustering methods that classify samples based on all genes. Despite advances, biclustering remains a challenging problem, even with cutting-edge methodologies. This paper introduces an extension of the recently proposed Spike-and-Slab Lasso Biclustering (SSLB) algorithm, termed Outcome-Guided SSLB (OG-SSLB), aimed at enhancing the identification of biclusters in gene expression analysis. Our proposed approach integrates disease outcomes into the biclustering framework through Bayesian profile regression. By leveraging additional clinical information, OG-SSLB improves the interpretability and relevance of the resulting biclusters. Comprehensive simulations and numerical experiments demonstrate that OG-SSLB achieves superior performance, with improved accuracy in estimating the number of clusters and higher consensus scores compared to the original SSLB method. Furthermore, OG-SSLB effectively identifies meaningful patterns and associations between gene expression profiles and disease states. These promising results demonstrate the effectiveness of OG-SSLB in advancing biclustering techniques, providing a powerful tool for uncovering biologically relevant insights. The OGSSLB software can be found as an R/C++ package at https://github.com/luisvargasmieles/OGSSLB .
... For big data problems the dimension of the augmented posterior is large, which makes the approximation error substantial and the method highly inaccurate (Durmus and Moulines, 2017). Metropolis-adjusted Langevin algorithms (MALA) have been proposed to correct for this bias (Roberts and Tweedie, 1996), but these suffer from large computational costs and poor estimation results in high dimensions (De Bortoli et al., 2021). ...
... First, the total variation norm between the exact posterior and the ULA approximation increases in m (Dalalyan, 2017; Durmus and Moulines, 2017). Although the approximation error can be corrected with a Metropolis-Hastings step (Roberts and Tweedie, 1996), this step is time consuming, deteriorates convergence properties, and may lead to poor estimation results (De Bortoli et al., 2021). Second, evaluation of the gradient ∇_z log p(θ, z|y) can be extremely costly for problems in which the number of latent variables is large; for instance, problems with millions of observations. ...
Preprint
The exact estimation of latent variable models with big data is known to be challenging. The latents have to be integrated out numerically, and the dimension of the latent variables increases with the sample size. This paper develops a novel approximate Bayesian method based on the Langevin diffusion process. The method employs the Fisher identity to integrate out the latent variables, which makes it accurate and computationally feasible when applied to big data. In contrast to other approximate estimation methods, it does not require the choice of a parametric distribution for the unknowns, which often leads to inaccuracies. In an empirical discrete choice example with a million observations, the proposed method accurately estimates the posterior choice probabilities using only 2% of the computation time of exact MCMC.
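The Fisher-identity trick mentioned in the abstract above (writing the score of the marginal likelihood as a posterior expectation of the complete-data score) can be sketched in a few lines. The model, variable names and the toy check below are our own assumptions for exposition, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(1)

def fisher_identity_score(grad_theta_log_joint, latent_samples, theta):
    """Monte Carlo score estimate via the Fisher identity:
    grad_theta log p(y | theta) = E[ grad_theta log p(y, z | theta) | y, theta ],
    averaging the complete-data score over (approximate) posterior draws of z."""
    return np.mean([grad_theta_log_joint(z, theta) for z in latent_samples], axis=0)

# Toy check: z ~ N(theta, 1), y | z ~ N(z, 1), theta = 0, y = 2.
# Complete-data score: grad_theta log p(y, z | theta) = z - theta.
# Posterior: z | y, theta ~ N((theta + y)/2, 1/2), so the exact score is (y - theta)/2 = 1.
theta, yobs = 0.0, 2.0
draws = rng.normal((theta + yobs) / 2, np.sqrt(0.5), size=20000)
score = fisher_identity_score(lambda z, th: z - th, draws, theta)
```

In practice the posterior draws would come from an approximate sampler such as a Langevin chain rather than the exact posterior used in this toy check.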
... Such a methodology is particularly appealing in statistics and machine learning where the field b can be very expensive to evaluate or cannot even be accessed [38,136]. In these contexts, the Langevin dynamics have been primarily considered for either performing Bayesian inference [1,35,91] or optimizing an objective function [11,15,109,114]. In the first case, b = ∇ log π where π : ℝ^d → ℝ₊ is the a posteriori distribution of a statistical model, which can generally be written as − log π = ∑_{k=1}^{N} U_k, with N the number of observations and U_k : ℝ^d → ℝ. It has also been proposed to use Langevin dynamics to find an element of arg min_{ℝ^d} f for some function f : ℝ^d → ℝ by setting b = −∇f and taking σ small. ...
... The long-time convergence of this algorithm and the numerical bias on the invariant measure due to the time discretization are well understood, see e.g. [15,30,32,43,44] and references therein. However, in the present case, it is not possible to sample this Markov chain, as the exact computation of ...
Thesis
This thesis focuses on the analysis and design of Markov chain Monte Carlo (MCMC) methods used in high-dimensional sampling. It consists of three parts. The first part introduces a new class of Markov chains and MCMC methods. These methods improve on existing MCMC methods by using samples that target a restriction of the original target distribution to a domain chosen by the user. This procedure gives rise to a new chain that takes advantage of the convergence properties of the two underlying processes. In addition to showing that this chain always targets the original target measure, we also establish ergodicity properties under weak assumptions on the Markov kernels involved. The second part of this thesis focuses on discretizations of the underdamped Langevin diffusion. As this diffusion cannot be computed explicitly in general, it is classical to consider discretizations. This thesis establishes, for a large class of discretizations, a minorization condition uniform in the time step. With additional assumptions on the potential, it shows that these discretizations converge geometrically to their unique V-invariant probability measure. The last part studies the unadjusted Langevin algorithm in the case where the gradient of the potential is known only up to a uniformly bounded error. This part provides bounds in V-norm and in Wasserstein distance between the iterations of the algorithm with the exact gradient and the one with the approximate gradient. To do this, an auxiliary Markov chain that bounds the difference is introduced. It is established that this auxiliary chain converges in distribution to a sticky process already studied in the literature for the continuous version of this problem.
... The time discretization biases ULA asymptotically, which could be corrected via metropolization. However, the poor nonasymptotic performance of metropolized ULA [45], [46] makes this less useful in practice, and we do not use metropolization. ...
Article
Full-text available
Data-driven approaches recently achieved remarkable success in magnetic resonance imaging (MRI) reconstruction, but integration into clinical routine remains challenging due to a lack of generalizability and interpretability. In this paper, we address these challenges in a unified framework based on generative image priors. We propose a novel deep neural network based regularizer which is trained in a generative setting on reference magnitude images only. After training, the regularizer encodes higher-level domain statistics which we demonstrate by synthesizing images without data. Embedding the trained model in a classical variational approach yields high-quality reconstructions irrespective of the sub-sampling pattern. In addition, the model shows stable behavior when confronted with out-of-distribution data in the form of contrast variation. Furthermore, a probabilistic interpretation provides a distribution of reconstructions and hence allows uncertainty quantification. To reconstruct parallel MRI, we propose a fast algorithm to jointly estimate the image and the sensitivity maps. The results demonstrate competitive performance, on par with state-of-the-art end-to-end deep learning methods, while preserving the flexibility with respect to sub-sampling patterns and allowing for uncertainty quantification.
... Such a methodology is particularly appealing in statistics and machine learning where the field b can be very expensive to evaluate or cannot even be accessed [22,57]. In these contexts, the Langevin dynamics have been primarily considered for either performing Bayesian inference [20,1,42] or optimizing an objective function [47,50,7,19]. In the first case, b = ∇ log π where π : ℝ^d → ℝ₊ is the a posteriori distribution of a statistical model, which can generally be written as − log π = N^{−1} ∑_{k=1}^{N} U_k, with N the number of observations and U_k : ℝ^d → ℝ. It has also been proposed to use Langevin dynamics to find an element of arg min_{ℝ^d} f for some function f : ℝ^d → ℝ by setting b = −∇f and taking σ small. ...
Preprint
We study the convergence in total variation and V-norm of discretization schemes of the underdamped Langevin dynamics. Such algorithms are very popular and commonly used in molecular dynamics and computational statistics to approximately sample from a target distribution of interest. We show first that, for a very large class of schemes, a minorization condition uniform in the stepsize holds. This class encompasses popular methods such as the Euler-Maruyama scheme and the ones based on splitting strategies. Second, we provide mild conditions ensuring that the class of schemes that we consider satisfies a geometric Foster-Lyapunov drift condition, again uniform in the stepsize. This allows us to derive geometric convergence bounds, with a convergence rate scaling linearly with the stepsize. This kind of result is of prime interest for obtaining estimates on norms of solutions to Poisson equations associated with a given scheme.
Conference Paper
Full-text available
A key challenge in contrastive learning is to generate negative samples from a large sample set to contrast with positive samples, for learning better encoding of the data. These negative samples often follow a softmax distribution which is dynamically updated during the training process. However, sampling from this distribution is non-trivial due to the high computational costs in computing the partition function. In this paper, we propose an Efficient Markov Chain Monte Carlo negative sampling method for Contrastive learning (EMC²). We follow the global contrastive learning loss as introduced in Yuan et al. (2022), and propose EMC², which utilizes an adaptive Metropolis-Hastings subroutine to generate hardness-aware negative samples in an online fashion during the optimization. We prove that EMC² finds an O(1/√T)-stationary point of the global contrastive loss in T iterations. Compared to prior works, EMC² is the first algorithm that exhibits global convergence (to stationarity) regardless of the choice of batch size while exhibiting low computation and memory cost. Numerical experiments validate that EMC² is effective with small batch training and achieves comparable or better performance than baseline algorithms. We report the results for pre-training image encoders on STL-10 and Imagenet-100.
Article
We propose a new methodology for leveraging deep generative priors for Bayesian inference in imaging inverse problems. Modern Bayesian imaging often relies on score-based diffusion generative priors, which deliver remarkable point estimates but significantly underestimate uncertainty. Push-forward models such as variational auto-encoders and generative adversarial networks provide a robust alternative, leading to Bayesian models that are provably well-posed and which produce accurate uncertainty quantification results for small problems. However, push-forward models scale poorly to large problems because of issues related to bias, mode collapse and multimodality. We propose to address this difficulty by embedding a conditional deep generative prior within an empirical Bayesian framework. We consider generative priors with a super-resolution architecture, and perform inference by using a Bayesian computation strategy that simultaneously computes the maximum marginal likelihood estimate (MMLE) of the low-resolution image of interest, and draws Monte Carlo samples from the posterior distribution of the high-resolution image, conditionally to the observed data and the MMLE. The methodology is demonstrated with an image deblurring experiment and comparisons with the state-of-the-art.
Preprint
Full-text available
Uncertainty quantification in image restoration is a prominent challenge, mainly due to the high dimensionality of the encountered problems. Recently, a Bayesian uncertainty quantification by optimization (BUQO) approach has been proposed to formulate hypothesis testing as a minimization problem. The objective is to determine whether a structure appearing in a maximum a posteriori estimate is true or is a reconstruction artifact due to the ill-posedness or ill-conditioning of the problem. In this context, the mathematical definition of having a "fake structure" is crucial, and highly depends on the type of structure of interest. This definition can be interpreted as an inpainting of a neighborhood of the structure, but only simple techniques have been proposed in the literature so far, due to the complexity of the problem. In this work, we propose a data-driven method using a simple convolutional neural network to perform the inpainting task, leading to a novel plug-and-play BUQO algorithm. Compared to previous works, the proposed approach has the advantage that it can be used for a wide class of structures, without needing to adapt the inpainting operator to the area of interest. In addition, we show through simulations on magnetic resonance imaging that, compared to the original BUQO's hand-crafted inpainting procedure, the proposed approach produces qualitatively better output images. Python code will be made available for reproducibility upon acceptance of the article.
Article
The current research is focused on the synthesis of a non-magnetic shell layer on a magnetic CoFe2O4 core, CoFe2O4-X (X = BaTiO3 or Bi4Ti3O12), namely CoFe2O4-BaTiO3 and CoFe2O4-Bi4Ti3O12, which were synthesized at different ratios by two-step wet chemical methods and characterized by X-ray diffraction (XRD), Scanning Electron Microscopy (SEM), Energy-Dispersive X-ray Spectroscopy (EDS), and Transmission Electron Microscopy (TEM). A Vibrating Sample Magnetometer (VSM) was used to examine how the magnetic properties of the samples are influenced by particle size and morphology. All crystal phases were quantitatively determined by XRD and refined by the Rietveld method. The analysis showed that the nature of the shell adhered to the core plays a crucial role in controlling core-shell formation, which in turn impacts the magnetic properties. The results show that, in the majority of the samples, BaTiO3 is present in the tetragonal, Bi4Ti3O12 in the orthorhombic, and CoFe2O4 in the cubic structure. The lattice distortions observed in the samples are associated with the shell dimensions; the 50:50 ratio was found to be suitable for the formation of a superparamagnetic core-shell structure (shell size of ~20 nm).
Article
Full-text available
Stochastic approximation methods play a central role in maximum likelihood estimation problems involving intractable likelihood functions, such as marginal likelihoods arising in problems with missing or incomplete data, and in parametric empirical Bayesian estimation. Combined with Markov chain Monte Carlo algorithms, these stochastic optimisation methods have been successfully applied to a wide range of problems in science and industry. However, this strategy scales poorly to large problems because of methodological and theoretical difficulties related to using high-dimensional Markov chain Monte Carlo algorithms within a stochastic approximation scheme. This paper proposes to address these difficulties by using unadjusted Langevin algorithms to construct the stochastic approximation. This leads to a highly efficient stochastic optimisation methodology with favourable convergence properties that can be quantified explicitly and easily checked. The proposed methodology is demonstrated with three experiments, including a challenging application to statistical audio analysis and a sparse Bayesian logistic regression with random effects problem.
Article
We introduce a novel and efficient algorithm called the stochastic approximate gradient descent (SAGD), as an alternative to the stochastic gradient descent for cases where unbiased stochastic gradients cannot be trivially obtained. Traditional methods for such problems rely on general-purpose sampling techniques such as Markov chain Monte Carlo, which typically requires manual intervention for tuning parameters and does not work efficiently in practice. Instead, SAGD makes use of the Langevin algorithm to construct stochastic gradients that are biased in finite steps but accurate asymptotically, enabling us to theoretically establish the convergence guarantee for SAGD. Inspired by our theoretical analysis, we also provide useful guidelines for its practical implementation. Finally, we show that SAGD performs well experimentally in popular statistical and machine learning problems such as the expectation-maximization algorithm and the variational autoencoders.
Article
Full-text available
We consider contractivity for diffusion semigroups w.r.t. Kantorovich (L¹ Wasserstein) distances based on appropriately chosen concave functions. These distances are in between total variation and usual Wasserstein distances. It is shown that, by appropriate explicit choices of the underlying distance, contractivity with rates of close to optimal order can be obtained in several fundamental classes of examples where contractivity w.r.t. standard Wasserstein distances fails. Applications include overdamped Langevin diffusions with locally non-convex potentials, products of these processes, and systems of weakly interacting diffusions, both of mean-field and nearest-neighbour type.
Article
Full-text available
Sampling distributions over high-dimensional state spaces is a problem which has recently attracted a lot of research effort; applications include Bayesian non-parametrics, Bayesian inverse problems and aggregation of estimators. All these problems boil down to sampling a target distribution π having a density w.r.t. the Lebesgue measure on ℝ^d, known up to a normalisation factor, x ↦ e^{−U(x)} / ∫_{ℝ^d} e^{−U(y)} dy, where U is continuously differentiable and smooth. In this paper, we study a sampling technique based on the Euler discretization of the Langevin stochastic differential equation. Contrary to the Metropolis Adjusted Langevin Algorithm (MALA), we do not apply a Metropolis-Hastings correction. We obtain, for both constant and decreasing step sizes in the Euler discretization, non-asymptotic bounds for the convergence to stationarity in both total variation and Wasserstein distances. Particular attention is paid to the dependence on the dimension of the state space, to demonstrate the applicability of this method in the high-dimensional setting, at least when U is convex. These bounds are based on recently obtained estimates of the convergence of the Langevin diffusion to stationarity using Poincaré and log-Sobolev inequalities. These bounds improve and extend the results of (Dalalyan, 2014). We also investigate the convergence of an appropriately weighted empirical measure, and we report sharp bounds for the mean square error and an exponential deviation inequality for Lipschitz functions. A limited Monte Carlo experiment is carried out to support our findings.
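The Euler discretization studied in this abstract (ULA) reduces to a one-line update, X_{k+1} = X_k + γ ∇log π(X_k) + √(2γ) Z_{k+1}. A minimal 1-d sketch, with a standard-normal target and all constants chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)

def ula(grad_log_pi, x0, gamma, n_iter):
    """Unadjusted Langevin algorithm: the Euler-Maruyama discretization
    X_{k+1} = X_k + gamma * grad log pi(X_k) + sqrt(2 * gamma) * Z_{k+1},
    with no Metropolis-Hastings correction (1-d sketch)."""
    x = float(x0)
    chain = np.empty(n_iter)
    for k in range(n_iter):
        x = x + gamma * grad_log_pi(x) + np.sqrt(2 * gamma) * rng.standard_normal()
        chain[k] = x
    return chain

# Target pi = N(0, 1), so grad log pi(x) = -x; the chain's stationary law
# is close to pi, with a bias that grows with the step size gamma.
chain = ula(lambda x: -x, x0=3.0, gamma=0.05, n_iter=20000)
```

For this Gaussian target the discretization bias is explicit: the stationary variance of the chain is 1/(1 − γ/2) rather than 1, consistent with the step-size-dependent bounds in the abstract.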
Article
Full-text available
In this paper we connect various topological and probabilistic forms of stability for discrete-time Markov chains. These include tightness on the one hand and Harris recurrence and ergodicity on the other. We show that these concepts of stability are largely equivalent for a major class of chains (chains with continuous components), or if the state space has a sufficiently rich class of appropriate sets (‘petite sets'). We use a discrete formulation of Dynkin's formula to establish unified criteria for these stability concepts, through bounding of moments of first entrance times to petite sets. This gives a generalization of Lyapunov–Foster criteria for the various stability conditions to hold. Under these criteria, ergodic theorems are shown to be valid even in the non-irreducible case. These results allow a more general test function approach for determining rates of convergence of the underlying distributions of a Markov chain, and provide strong mixing results and new versions of the central limit theorem and the law of the iterated logarithm.
Article
Full-text available
The expectation-maximization (EM) algorithm is a powerful computational technique for locating maxima of functions. It is widely used in statistics for maximum likelihood or maximum a posteriori estimation in incomplete data models. In certain situations, however, this method is not applicable because the expectation step cannot be performed in closed form. To deal with these problems, a novel method is introduced, the stochastic approximation EM (SAEM), which replaces the expectation step of the EM algorithm by one iteration of a stochastic approximation procedure. The convergence of the SAEM algorithm is established under conditions that are applicable to many practical situations. Moreover, it is proved that, under mild additional conditions, the attractive stationary points of the SAEM algorithm correspond to the local maxima of the function.
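The SAEM recipe above (replace the E-step by one stochastic approximation iteration on the sufficient statistics) can be sketched on a toy Gaussian model. Everything below, including the model and the 1/k step schedule, is an illustrative assumption rather than the paper's algorithm:

```python
import numpy as np

rng = np.random.default_rng(3)

def saem_gaussian(y, n_iter=2000, theta0=0.0):
    """SAEM sketch for the toy model z ~ N(theta, 1), y | z ~ N(z, 1):
    simulate z from its posterior (S-step), update the running sufficient
    statistic with step gamma_k = 1/k (SA-step), then maximize (M-step)."""
    theta, s = theta0, theta0
    for k in range(1, n_iter + 1):
        # simulation step: z | y, theta ~ N((theta + y)/2, 1/2)
        z = rng.normal((theta + y) / 2, np.sqrt(0.5))
        # stochastic approximation of the E-step statistic E[z | y, theta]
        s += (z - s) / k
        # M-step: theta maximizing the complete-data likelihood given s
        theta = s
    return theta

# The marginal is y ~ N(theta, 2), so the maximum likelihood estimate is theta* = y.
theta_hat = saem_gaussian(y=1.5)
```

In this toy case the E-step is available in closed form, so the sketch only illustrates the mechanics; SAEM is needed precisely when it is not.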
Article
Full-text available
In Part I we developed stability concepts for discrete chains, together with Foster–Lyapunov criteria for them to hold. Part II was devoted to developing related stability concepts for continuous-time processes. In this paper we develop criteria for these forms of stability for continuous-parameter Markovian processes on general state spaces, based on Foster-Lyapunov inequalities for the extended generator. Such test function criteria are found for non-explosivity, non-evanescence, Harris recurrence, and positive Harris recurrence. These results are proved by systematic application of Dynkin's formula. We also strengthen known ergodic theorems, and especially exponential ergodic results, for continuous-time processes. In particular we are able to show that the test function approach provides a criterion for f -norm convergence, and bounding constants for such convergence in the exponential ergodic case. We apply the criteria to several specific processes, including linear stochastic systems under non-linear feedback, work-modulated queues, general release storage processes and risk processes.
Article
Full-text available
In this paper we consider optimization problems where the objective function is given in a form of the expectation. A basic difficulty of solving such stochastic optimization problems is that the involved multidimensional integrals (expectations) cannot be computed with high accuracy. The aim of this paper is to compare two computational approaches based on Monte Carlo sampling techniques, namely, the stochastic approximation (SA) and the sample average approximation (SAA) methods. Both approaches, the SA and SAA methods, have a long history. Current opinion is that the SAA method can efficiently use a specific (say, linear) structure of the considered problem, while the SA approach is a crude subgradient method, which often performs poorly in practice. We intend to demonstrate that a properly modified SA approach can be competitive and even significantly outperform the SAA method for a certain class of convex stochastic problems. We extend the analysis to the case of convex-concave stochastic saddle point problems and present (in our opinion highly encouraging) results of numerical experiments.
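The SA/SAA contrast described above can be made concrete on a one-dimensional quadratic stochastic program. The sample size, step sizes and model below are illustrative choices, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(4)

# Minimize f(x) = 0.5 * E[(x - W)^2] with W ~ N(1, 1); the minimizer is x* = E[W] = 1.
W = rng.normal(1.0, 1.0, size=5000)

# SAA: replace the expectation by a fixed sample average, then solve exactly.
x_saa = W.mean()

# SA: stochastic gradient steps x_{k+1} = x_k - a_k * (x_k - W_k) with a_k = 1/k.
x_sa = 0.0
for k, w in enumerate(W, start=1):
    x_sa -= (x_sa - w) / k
```

With the classical a_k = 1/k schedule this SA recursion reproduces the running sample mean, so the two estimators coincide here; with constant or robust step sizes (the modification the paper advocates) they differ.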
Article
Full-text available
Adaptive and interacting Markov chain Monte Carlo algorithms (MCMC) have been recently introduced in the literature. These novel simulation algorithms are designed to increase the simulation efficiency to sample complex distributions. Motivated by some recently introduced algorithms (such as the adaptive Metropolis algorithm and the interacting tempering algorithm), we develop a general methodological and theoretical framework to establish both the convergence of the marginal distribution and a strong law of large numbers. This framework weakens the conditions introduced in the pioneering paper by Roberts and Rosenthal [J. Appl. Probab. 44 (2007) 458--475]. It also covers the case when the target distribution π\pi is sampled by using Markov transition kernels with a stationary distribution that differs from π\pi.
Article
Full-text available
We present a new family of subgradient methods that dynamically incorporate knowledge of the geometry of the data observed in earlier iterations to perform more informative gradient-based learning. The adaptation, in essence, allows us to find needles in haystacks in the form of very predictive yet rarely observed features. Our paradigm stems from recent advances in online learning which employ proximal functions to control the gradient steps of the algorithm. We describe and analyze an apparatus for adaptively modifying the proximal function, which significantly simplifies the task of setting a learning rate and results in regret guarantees that are provably as good as the best proximal function that can be chosen in hindsight. We corroborate our theoretical results with experiments on a text classification task, showing substantial improvements for classification with sparse datasets.
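The diagonal variant of this adaptive scheme (commonly known as AdaGrad) amounts to a per-coordinate rescaling of the step by the inverse root of the accumulated squared gradients, so rarely-active coordinates get larger steps. A minimal sketch on a toy quadratic; the learning rate and test problem are our own choices:

```python
import numpy as np

def adagrad_step(x, grad, accum, lr=0.5, eps=1e-8):
    """One diagonal-AdaGrad update: accumulate squared gradients,
    then scale each coordinate's step by 1 / sqrt(accumulated)."""
    accum = accum + grad ** 2
    x = x - lr * grad / (np.sqrt(accum) + eps)
    return x, accum

# Minimize f(x) = 0.5 * ||x||^2 (gradient = x) from x0 = [2, -3].
x = np.array([2.0, -3.0])
accum = np.zeros_like(x)
for _ in range(500):
    x, accum = adagrad_step(x, x, accum)
```

On sparse problems the same mechanism gives infrequent features effectively larger learning rates, which is the "needles in haystacks" effect the abstract describes.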
Article
Full-text available
The Monte Carlo expectation maximization (MCEM) algorithm is a versatile tool for inference in incomplete data models, especially when used in combination with Markov chain Monte Carlo simulation methods. In this contribution, the almost-sure convergence of the MCEM algorithm is established. It is shown, using uniform versions of ergodic theorems for Markov chains, that MCEM converges under weak conditions on the simulation kernel. Practical illustrations are presented, using a hybrid random walk Metropolis-Hastings sampler and an independence sampler. The rate of convergence is studied, showing the impact of the simulation schedule on the fluctuation of the parameter estimate at convergence. A novel averaging procedure is then proposed to reduce the simulation variance and increase the rate of convergence.
Article
Full-text available
In this paper we study the ergodicity properties of some adaptive Markov chain Monte Carlo algorithms (MCMC) that have been recently proposed in the literature. We prove that under a set of verifiable conditions, ergodic averages calculated from the output of a so-called adaptive MCMC sampler converge to the required value and can even, under more stringent assumptions, satisfy a central limit theorem. We prove that the conditions required are satisfied for the independent Metropolis--Hastings algorithm and the random walk Metropolis algorithm with symmetric increments. Finally, we propose an application of these results to the case where the proposal distribution of the Metropolis--Hastings update is a mixture of distributions from a curved exponential family.
Book
This book covers recent advances in image processing and imaging sciences from an optimization viewpoint, especially convex optimization with the goal of designing tractable algorithms. Throughout the handbook, the authors introduce topics on the most key aspects of image acquisition and processing that are based on the formulation and solution of novel optimization problems. The first part includes a review of the mathematical methods and foundations required, and covers topics in image quality optimization and assessment. The second part of the book discusses concepts in image formation and capture from color imaging to radar and multispectral imaging. The third part focuses on sparsity constrained optimization in image processing and vision and includes inverse problems such as image restoration and de-noising, image classification and recognition and learning-based problems pertinent to image understanding. Throughout, convex optimization techniques are shown to be a critically important mathematical tool for imaging science problems and applied extensively. Convex Optimization Methods in Imaging Science is the first book of its kind and will appeal to undergraduate and graduate students, industrial researchers and engineers and those generally interested in computational aspects of modern, real-world imaging and image processing problems. Discusses recent developments in imaging science and provides tools for solving image processing and computer vision problems using convex optimization methods. The reader is provided with the state of the art advancements in each imaging science problem that is covered and is directed to cutting edge theory and methods that should particularly help graduate students and young researchers in shaping their research. Each chapter of the book covers a real-world imaging science problem while balancing both the theoretical and experimental aspects. 
The theoretical foundation of each problem is discussed thoroughly, and then, from a practical point of view, extensive validation and experiments are provided to enable the transition from theory to practice.
• Topics of high current relevance are covered, including color and spectral imaging, dictionary learning for image classification and recovery, optimization and evaluation of image quality, and sparsity-constrained estimation for image processing and computer vision.
• Provides insight on handling real-world imaging science problems that involve hard and non-convex objective functions through tractable convex optimization methods, with the goal of providing a favorable performance-complexity trade-off.
Book
Meyn & Tweedie is back! The bible on Markov chains in general state spaces has been brought up to date to reflect developments in the field since 1996 - many of them sparked by publication of the first edition. The pursuit of more efficient simulation algorithms for complex Markovian models, or algorithms for computation of optimal policies for controlled Markov models, has opened new directions for research on Markov chains. As a result, new applications have emerged across a wide range of topics including optimisation, statistics, and economics. New commentary and an epilogue by Sean Meyn summarise recent developments and references have been fully updated. This second edition reflects the same discipline and style that marked out the original and helped it to become a classic: proofs are rigorous and concise, the range of applications is broad and knowledgeable, and key ideas are accessible to practitioners with limited mathematical background.
Article
We study a version of the proximal gradient algorithm for which the gradient is intractable and is approximated by Monte Carlo methods (and in particular Markov Chain Monte Carlo). We derive conditions on the step size and the Monte Carlo batch size under which convergence is guaranteed: both increasing batch size and constant batch size are considered. We also derive non-asymptotic bounds for an averaged version. Our results cover both the cases of biased and unbiased Monte Carlo approximation. To support our findings, we discuss the inference of a sparse generalized linear model with random effect and the problem of learning the edge structure and parameters of sparse undirected graphical models.
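For intuition, the scheme analysed above can be sketched on an illustrative sparse least-squares problem, where the intractable gradient is mimicked by a Monte Carlo (row-subsampling) estimate with a constant batch size, and the non-smooth $\ell_1$ term enters through its proximal operator (soft-thresholding). All problem sizes and constants below are invented for the example, not taken from the paper:

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t*||.||_1 (soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def mc_proximal_gradient(A, b, lam, step, n_iter, batch, rng):
    """Proximal gradient for 0.5*||A theta - b||^2 / n + lam*||theta||_1,
    with the exact gradient replaced by an unbiased Monte Carlo estimate
    built from a random mini-batch of rows (constant batch size)."""
    n, d = A.shape
    theta = np.zeros(d)
    for _ in range(n_iter):
        idx = rng.integers(0, n, size=batch)              # Monte Carlo batch
        g = A[idx].T @ (A[idx] @ theta - b[idx]) / batch  # gradient estimate
        theta = soft_threshold(theta - step * g, step * lam)  # prox step
    return theta

# Illustrative sparse regression problem (all numbers invented).
rng = np.random.default_rng(2)
A = rng.standard_normal((500, 20))
truth = np.zeros(20); truth[:3] = [3.0, -2.0, 1.5]
b = A @ truth + 0.1 * rng.standard_normal(500)
est = mc_proximal_gradient(A, b, lam=0.1, step=0.01, n_iter=5000, batch=50, rng=rng)
print(np.round(est[:4], 2))  # first three near the truth, rest near 0
```

With a constant step and constant batch, the iterates settle into a small noise ball around the regularised solution, matching the biased-approximation regime the paper analyses.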
Article
Modern imaging methods rely strongly on Bayesian inference techniques to solve challenging imaging problems. Currently, the predominant Bayesian computation approach is convex optimisation, which scales very efficiently to high dimensional image models and delivers accurate point estimation results. However, in order to perform more complex analyses, for example image uncertainty quantification or model selection, it is necessary to use more computationally intensive Bayesian computation techniques such as Markov chain Monte Carlo methods. This paper presents a new and highly efficient Markov chain Monte Carlo methodology to perform Bayesian computation for high dimensional models that are log-concave and non-smooth, a class of models that is central in imaging sciences. The methodology is based on a regularised unadjusted Langevin algorithm that exploits tools from convex analysis, namely Moreau–Yosida envelopes and proximal operators, to construct Markov chains with favourable convergence properties. In addition to scaling efficiently to high dimensions, the method is straightforward to apply to models that are currently solved by using proximal optimisation algorithms. We provide a detailed theoretical analysis of the proposed methodology, including asymptotic and non-asymptotic convergence results with easily verifiable conditions, and explicit bounds on the convergence rates. The proposed methodology is demonstrated with four experiments related to image deconvolution and tomographic reconstruction with total-variation and $\ell_1$ priors, where we conduct a range of challenging Bayesian analyses related to uncertainty quantification, hypothesis testing, and model selection in the absence of ground truth.
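A minimal one-dimensional sketch of the Moreau–Yosida regularised Langevin idea described above, using the standard identity $\nabla g^{\lambda}(x) = (x - \operatorname{prox}_{\lambda g}(x))/\lambda$ for the gradient of the Moreau envelope; the Laplace target and all tuning constants are illustrative choices, not from the paper:

```python
import numpy as np

def myula(grad_f, prox_g, x0, step, lam, n_iter, rng):
    """Moreau-Yosida regularised ULA: the non-smooth term g is replaced by
    its Moreau envelope, whose gradient is (x - prox_{lam*g}(x)) / lam."""
    x = float(x0)
    out = np.empty(n_iter)
    for k in range(n_iter):
        grad = grad_f(x) + (x - prox_g(x, lam)) / lam   # smooth + envelope grad
        x = x - step * grad + np.sqrt(2.0 * step) * rng.standard_normal()
        out[k] = x
    return out

# Illustrative non-smooth target: pi(x) proportional to exp(-|x|), so the
# smooth part f is 0, g(x) = |x|, and prox_{lam*g} is soft-thresholding.
prox_abs = lambda x, lam: np.sign(x) * max(abs(x) - lam, 0.0)
rng = np.random.default_rng(3)
chain = myula(lambda x: 0.0, prox_abs, x0=0.0, step=0.05, lam=0.1, n_iter=50000, rng=rng)
print(chain[10000:].mean(), chain[10000:].var())  # roughly 0 and 2 (Laplace variance)
```

The chain targets a smoothed approximation of the Laplace density; shrinking the smoothing parameter and the step size together controls the bias, which is the trade-off the paper quantifies.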
Article
A large number of imaging problems reduce to the optimization of a cost function, with typical structural properties. The aim of this paper is to describe the state of the art in continuous optimization methods for such problems, and present the most successful approaches and their interconnections. We place particular emphasis on optimal first-order schemes that can deal with typical non-smooth and large-scale objective functions used in imaging problems. We illustrate and compare the different algorithms using classical non-smooth problems in imaging, such as denoising and deblurring. Moreover, we present applications of the algorithms to more advanced problems, such as magnetic resonance imaging, multilabel image segmentation, optical flow estimation, stereo matching, and classification.
Article
Sampling from various kind of distributions is an issue of paramount importance in statistics since it is often the key ingredient for constructing estimators, testing procedures or confidence intervals. In many situations, the exact sampling from a given distribution is impossible or computationally expensive and, therefore, one needs to resort to approximate sampling strategies. However, to the best of our knowledge, there is no well-developed theory providing meaningful nonasymptotic guarantees for the approximate sampling procedures, especially in the high-dimensional problems. This paper aims at doing the first steps in this direction by considering the problem of sampling from a distribution having a smooth and log-concave density defined on $\mathbb{R}^p$, for some integer $p>0$. We establish nonasymptotic bounds for the error of approximating the true distribution by the one obtained from the Langevin Monte Carlo method.
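The Langevin Monte Carlo method studied above admits a very short implementation: the Euler discretisation $x_{k+1} = x_k - \gamma \nabla U(x_k) + \sqrt{2\gamma}\,\xi_k$ of the Langevin diffusion for a target $\propto \exp(-U)$. The sketch below uses an illustrative standard-Gaussian target (so the sampler's accuracy can be eyeballed), not an example from the paper:

```python
import numpy as np

def ula(grad_U, x0, step, n_iter, rng):
    """Unadjusted Langevin algorithm:
    x_{k+1} = x_k - step * grad_U(x_k) + sqrt(2*step) * standard normal."""
    x = np.array(x0, dtype=float)
    samples = np.empty((n_iter, x.size))
    for k in range(n_iter):
        x = x - step * grad_U(x) + np.sqrt(2.0 * step) * rng.standard_normal(x.size)
        samples[k] = x
    return samples

# Illustrative target: standard Gaussian, U(x) = ||x||^2 / 2, so grad_U(x) = x.
rng = np.random.default_rng(0)
chain = ula(lambda x: x, x0=np.zeros(2), step=0.05, n_iter=20000, rng=rng)
print(chain[5000:].mean(axis=0))  # close to [0, 0]
print(chain[5000:].var(axis=0))   # close to 1, up to O(step) discretisation bias
```

Because no accept/reject correction is applied, the chain's stationary law differs from the target by a bias controlled by the step size, which is exactly what the nonasymptotic bounds of the paper quantify.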
Book
Introduction.- Part I Markov semigroups, basics and examples: 1.Markov semigroups.- 2.Model examples.- 3.General setting.- Part II Three model functional inequalities: 4.Poincare inequalities.- 5.Logarithmic Sobolev inequalities.- 6.Sobolev inequalities.- Part III Related functional, isoperimetric and transportation inequalities: 7.Generalized functional inequalities.- 8.Capacity and isoperimetry-type inequalities.- 9.Optimal transportation and functional inequalities.- Part IV Appendices: A.Semigroups of bounded operators on a Banach space.- B.Elements of stochastic calculus.- C.Some basic notions in differential and Riemannian geometry.- Notations and list of symbols.- Bibliography.- Index.
Article
Let $M(x)$ denote the expected value at level $x$ of the response to a certain experiment. $M(x)$ is assumed to be a monotone function of $x$ but is unknown to the experimenter, and it is desired to find the solution $x = \theta$ of the equation $M(x) = \alpha$, where $\alpha$ is a given constant. We give a method for making successive experiments at levels $x_1, x_2, \ldots$ in such a way that $x_n$ will tend to $\theta$ in probability.
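The procedure described above is the Robbins–Monro stochastic approximation iteration $x_{n+1} = x_n - a_n (y_n - \alpha)$, where $y_n$ is the noisy response observed at level $x_n$ and $a_n$ is a decreasing step sequence. A hedged sketch, with an invented linear response function for illustration:

```python
import numpy as np

def robbins_monro(noisy_M, alpha, x0, n_iter, c, rng):
    """Robbins-Monro iteration x_{n+1} = x_n - (c/n) * (y_n - alpha),
    where y_n is a noisy observation of M(x_n) at the current level."""
    x = x0
    for n in range(1, n_iter + 1):
        y = noisy_M(x, rng)
        x = x - (c / n) * (y - alpha)
    return x

# Illustrative example: M(x) = 2x + 1 observed with Gaussian noise;
# solving M(x) = 5 gives theta = 2.
rng = np.random.default_rng(1)
root = robbins_monro(lambda x, r: 2 * x + 1 + r.standard_normal(),
                     alpha=5.0, x0=0.0, n_iter=50000, c=1.0, rng=rng)
print(root)  # close to 2
```

The step sequence $a_n = c/n$ satisfies the classical conditions $\sum a_n = \infty$ and $\sum a_n^2 < \infty$ that guarantee convergence in probability.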
Book
The use of adaptive algorithms is now very widespread across such varied applications as system identification, adaptive control, transmission systems, adaptive filtering for signal processing, and several aspects of pattern recognition. Numerous, very different examples of applications are given in the text. The success of adaptive algorithms has inspired an abundance of literature, and more recently a number of significant works such as the books of Ljung and Söderström (1983) and of Goodwin and Sin (1984).
Article
  The paper proposes Metropolis adjusted Langevin and Hamiltonian Monte Carlo sampling methods defined on the Riemann manifold to resolve the shortcomings of existing Monte Carlo algorithms when sampling from target densities that may be high dimensional and exhibit strong correlations. The methods provide fully automated adaptation mechanisms that circumvent the costly pilot runs that are required to tune proposal densities for Metropolis–Hastings or indeed Hamiltonian Monte Carlo and Metropolis adjusted Langevin algorithms. This allows for highly efficient sampling even in very high dimensions where different scalings may be required for the transient and stationary phases of the Markov chain. The methodology proposed exploits the Riemann geometry of the parameter space of statistical models and thus automatically adapts to the local structure when simulating paths across this manifold, providing highly efficient convergence and exploration of the target density. The performance of these Riemann manifold Monte Carlo methods is rigorously assessed by performing inference on logistic regression models, log-Gaussian Cox point processes, stochastic volatility models and Bayesian estimation of dynamic systems described by non-linear differential equations. Substantial improvements in the time-normalized effective sample size are reported when compared with alternative sampling approaches. MATLAB code that is available from http://www.ucl.ac.uk/statistics/research/rmhmc allows replication of all the results reported.
Article
Summary: A broadly applicable algorithm for computing maximum likelihood estimates from incomplete data is presented at various levels of generality. Theory showing the monotone behaviour of the likelihood and convergence of the algorithm is derived. Many examples are sketched, including missing value situations, applications to grouped, censored or truncated data, finite mixture models, variance component estimation, hyperparameter estimation, iteratively reweighted least squares and factor analysis.
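As a concrete instance of the EM algorithm presented above, here is a sketch for a two-component, unit-variance Gaussian mixture (one of the finite-mixture examples the paper covers); the data-generating numbers are invented for illustration:

```python
import numpy as np

def em_gmm2(x, n_iter=200):
    """EM for a two-component Gaussian mixture with unit variances:
    estimates the mixture weight and the two means from incomplete data
    (the component labels are the missing data)."""
    w, mu = 0.5, np.array([x.min(), x.max()])  # crude initialisation
    for _ in range(n_iter):
        # E-step: posterior responsibility of component 0 for each point
        # (the common normalising constant of the unit-variance Gaussians cancels)
        p0 = w * np.exp(-0.5 * (x - mu[0]) ** 2)
        p1 = (1 - w) * np.exp(-0.5 * (x - mu[1]) ** 2)
        r = p0 / (p0 + p1)
        # M-step: update weight and means; each sweep increases the likelihood
        w = r.mean()
        mu = np.array([(r * x).sum() / r.sum(),
                       ((1 - r) * x).sum() / (1 - r).sum()])
    return w, mu

rng = np.random.default_rng(4)
x = np.concatenate([rng.normal(-2, 1, 300), rng.normal(3, 1, 700)])
w, mu = em_gmm2(x)
print(round(w, 2), np.round(mu, 1))  # weight near 0.3, means near -2 and 3
```

The monotone-likelihood property proved in the paper is what makes this alternation safe: no E/M sweep can decrease the observed-data likelihood.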
Article
Includes bibliography and index.
Article
In this paper we consider a continuous-time method of approximating a given distribution $\pi$ using the Langevin diffusion $dL_t = dW_t + \frac{1}{2}\nabla\log\pi(L_t)\,dt$. We find conditions under which this diffusion converges exponentially quickly to $\pi$ or does not: in one dimension, these are essentially that for distributions with exponential tails of the form $\pi(x) \propto \exp(-\gamma|x|^{\beta})$, $0 < \beta < \infty$, exponential convergence occurs if and only if $\beta \geq 1$. We then consider conditions under which the discrete approximations to the diffusion converge. We first show that even when the diffusion itself converges, naive discretizations need not do so. We then consider a 'Metropolis-adjusted' version of the algorithm, and find conditions under which this also converges at an exponential rate: perhaps surprisingly, even the Metropolized version need not converge exponentially fast even if the diffusion does. We briefly discuss a truncated form of the algorithm which, in practice, should avoid the difficulties of the other forms.
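The 'Metropolis-adjusted' version of the algorithm discussed above can be sketched as follows; the standard-Gaussian target and step size are illustrative choices, not from the paper. Unlike the unadjusted discretisation, the accept/reject step makes the target exactly invariant:

```python
import numpy as np

def mala(log_pi, grad_log_pi, x0, step, n_iter, rng):
    """Metropolis-adjusted Langevin algorithm: a Langevin proposal followed
    by an accept/reject step that makes pi exactly invariant."""
    x = np.asarray(x0, dtype=float)
    samples = np.empty((n_iter, x.size))

    def log_q(y, x):  # log density (up to a constant) of proposing y from x
        return -np.sum((y - x - step * grad_log_pi(x)) ** 2) / (4 * step)

    for k in range(n_iter):
        prop = x + step * grad_log_pi(x) + np.sqrt(2 * step) * rng.standard_normal(x.size)
        log_a = log_pi(prop) - log_pi(x) + log_q(x, prop) - log_q(prop, x)
        if np.log(rng.random()) < log_a:   # Metropolis-Hastings correction
            x = prop
        samples[k] = x
    return samples

# Illustrative target: standard Gaussian, log pi(x) = -||x||^2 / 2.
rng = np.random.default_rng(5)
s = mala(lambda x: -0.5 * x @ x, lambda x: -x, np.zeros(2), step=0.2, n_iter=20000, rng=rng)
print(s[5000:].mean(axis=0), s[5000:].var(axis=0))  # near 0 and 1, no discretisation bias
```

The paper's warning still applies: the correction fixes the invariant law, but exponential convergence of the Metropolized chain is a separate matter from that of the diffusion.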
Article
Probability textbook. Contents: Introduction; Martingales; Markov processes; Stochastic integration; Representation of martingales; Local times; Generators and time reversal; Girsanov's theorem and applications; Stochastic differential equations; Additive functionals of Brownian motion; Bessel processes and the Ray–Knight theorems; Excursions; Limit theorems in distribution.
Article
The wide applicability of Gibbs sampling has increased the use of more complex and multi-level hierarchical models. To use these models entails dealing with hyperparameters in the deeper levels of a hierarchy. There are three typical methods for dealing with these hyperparameters: specify them, estimate them, or use a 'flat' prior. Each of these strategies has its own associated problems. In this paper, using an empirical Bayes approach, we show how the hyperparameters can be estimated in a way that is both computationally feasible and statistically valid.
Article
Many problems in signal processing and statistical inference involve finding sparse solutions to under-determined, or ill-conditioned, linear systems of equations. A standard approach consists in minimizing an objective function which includes a quadratic (squared $\ell_2$) error term combined with a sparseness-inducing regularization term. Basis pursuit, the least absolute shrinkage and selection operator (LASSO), wavelet-based deconvolution, and compressed sensing are a few well-known examples of this approach. This paper proposes gradient projection (GP) algorithms for the bound-constrained quadratic programming (BCQP) formulation of these problems. We test variants of this approach that select the line search parameters in different ways, including techniques based on the Barzilai-Borwein method. Computational experiments show that these GP approaches perform well in a wide range of applications, often being significantly faster (in terms of computation time) than competing methods. Although the performance of GP methods tends to degrade as the regularization term is de-emphasized, we show how they can be embedded in a continuation scheme to recover their efficient practical performance.
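The bound-constrained quadratic programming formulation above can be sketched with a plain fixed-step gradient projection on the standard split $x = u - v$ with $u, v \geq 0$, under which $\|x\|_1 = \mathbf{1}^\top(u+v)$ (GPSR itself chooses Barzilai-Borwein line-search steps, which this toy version omits); all dimensions and constants are invented:

```python
import numpy as np

def gp_bcqp(A, y, tau, step, n_iter):
    """Fixed-step gradient projection for the BCQP split x = u - v, u, v >= 0:
    min 0.5*||y - A(u - v)||^2 + tau * 1'(u + v).
    Projection onto the non-negative orthant is simply clipping at zero."""
    d = A.shape[1]
    u = np.zeros(d)
    v = np.zeros(d)
    for _ in range(n_iter):
        r = A @ (u - v) - y          # residual
        gu = A.T @ r + tau           # gradient w.r.t. u
        gv = -A.T @ r + tau          # gradient w.r.t. v
        u = np.maximum(u - step * gu, 0.0)
        v = np.maximum(v - step * gv, 0.0)
    return u - v

rng = np.random.default_rng(6)
A = rng.standard_normal((100, 30)) / 10
truth = np.zeros(30); truth[[2, 7]] = [5.0, -4.0]
y = A @ truth + 0.01 * rng.standard_normal(100)
x = gp_bcqp(A, y, tau=0.01, step=0.3, n_iter=3000)
print(np.round(x[[2, 7]], 1))  # near 5 and -4
```

The split keeps every iterate feasible at the cost of doubling the dimension, which is what turns the $\ell_1$ problem into a smooth bound-constrained quadratic program.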
Article
Recently, a lot of attention has been paid to regularization-based methods for sparse signal reconstruction (e.g., basis pursuit denoising and compressed sensing) and feature selection (e.g., the Lasso algorithm) in signal processing, statistics, and related fields. These problems can be cast as $\ell_1$-regularized least-squares programs (LSPs), which can be reformulated as convex quadratic programs, and then solved by several standard methods such as interior-point methods, at least for small and medium size problems. In this paper, we describe a specialized interior-point method for solving large-scale $\ell_1$-regularized LSPs that uses the preconditioned conjugate gradients algorithm to compute the search direction. The interior-point method can solve large sparse problems, with a million variables and observations, in a few tens of minutes on a PC. It can efficiently solve large dense problems, that arise in sparse signal recovery with orthogonal transforms, by exploiting fast algorithms for these transforms. The method is illustrated on a magnetic resonance imaging data set.
Article
In 1978, Osserman [124] wrote a rather comprehensive survey on the isoperimetric inequality. The Brunn-Minkowski inequality can be proved in a page, yet quickly yields the classical isoperimetric inequality for important classes of subsets of $\mathbb{R}^n$, and deserves to be better known. We present a guide that explains the relationship between the Brunn-Minkowski inequality and other inequalities in geometry and analysis, and some of its recent applications.
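For reference, the Brunn-Minkowski inequality discussed above states that for non-empty compact sets $A, B \subset \mathbb{R}^n$, with $A + B$ the Minkowski sum,

```latex
\operatorname{vol}(A + B)^{1/n} \;\ge\; \operatorname{vol}(A)^{1/n} + \operatorname{vol}(B)^{1/n},
```

and taking $B$ to be a small Euclidean ball recovers the classical isoperimetric inequality.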
D. Bakry, I. Gentil, and M. Ledoux. Analysis and geometry of Markov diffusion operators, volume 348 of Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences]. Springer, 2014.
Laura Balzano, Robert Nowak, and J. Ellenberg. Compressed sensing audio demonstration. Website http://web.eecs.umich.edu/~girasole/csaudio, 2010.
Emmanuel J. Candès and Michael B. Wakin. An introduction to compressive sampling. IEEE Signal Processing Magazine, 25(2):21-30, 2008.
Emmanuel J. Candès. Compressive sampling. In Proceedings of the International Congress of Mathematicians, volume 3, pages 1433-1452. Madrid, Spain, 2006.
Bradley P. Carlin and Thomas A. Louis. Empirical Bayes: past, present and future. J. Amer. Statist. Assoc., 95(452):1286-1289, 2000. ISSN 0162-1459. doi: 10.2307/2669771. URL https://doi.org/10.2307/2669771.
George Casella. An introduction to empirical Bayes data analysis. Amer. Statist., 39(2):83-87, 1985. ISSN 0003-1305. doi: 10.2307/2682801. URL https://doi.org/10.2307/2682801.
V. De Bortoli and A. Durmus. Convergence of diffusions and their discretizations: from continuous to discrete processes and back. arXiv preprint arXiv:1904.09808, 2019.
R. Douc, É. Moulines, P. Priouret, and P. Soulier. Markov Chains. Springer, 2018.
A. Durmus, E. Moulines, and E. Saksman. On the convergence of Hamiltonian Monte Carlo. arXiv preprint arXiv:1705.00166, 2017.
A. Eberle, A. Guillin, and R. Zimmer. Couplings and quantitative contraction rates for Langevin dynamics. arXiv preprint arXiv:1703.01617, 2017.