Figure - available from: Statistics and Computing
Statistical audio compression: evolution of the iterate θ_n and θ̂_n with σ = 0.015 in log scale (left). Reconstruction mean squared error (MSE) in dB as a function of θ (right).


Source publication
Article
Full-text available
Stochastic approximation methods play a central role in maximum likelihood estimation problems involving intractable likelihood functions, such as marginal likelihoods arising in problems with missing or incomplete data, and in parametric empirical Bayesian estimation. Combined with Markov chain Monte Carlo algorithms, these stochastic optimisation...

Similar publications

Preprint
Full-text available
The main goal of this paper is to study the possibility of using a stochastic non-homogeneous (without exogenous factors) diffusion process to model the evolution of CO2 emissions in Morocco and, concretely, to use a new process in which the trend function is proportional to the modified Lundqvist-Korf growth curve. First, the main characteristics o...
Article
Full-text available
The main objective of this paper is to study the possibility of using a stochastic non-homogeneous (without exogenous factors) diffusion process to model the evolution of CO2 in Morocco. Concretely, we use a new process in which the trend function is proportional to the modified Lundqvist-Korf growth curve. First, we study the main characteristics...

Citations

... In this landscape, a new method was introduced by [3], such that samples of the latent space variable were generated via an Unadjusted Langevin Algorithm (ULA) chain. An alternative avenue was pioneered in [10], in which the study of the limiting behaviour of various gradient flows associated with appropriate free energy functionals led to an interacting particle system (IPS) that provides efficient estimates for maximum likelihood estimation. ...
... However, tIPLAc compares favorably, attaining the same complexity as its competitors by allowing superlinear growth while imposing element-wise dissipativity. We highlight that SOUL [3] is omitted from Table 1 due to the different methodologies employed: it operates sequentially with a single particle for latent posterior sampling, whereas our approach uses N particles and is parallelizable; additionally, SOUL [3] uses a decreasing step-size approach to characterize almost sure convergence, while in this paper a constant step size is used to obtain L^2 convergence estimates. Furthermore, we do not include classical MCMC methods in our numerical comparison, as our focus is on gradient-based approaches. ...
Article
Full-text available
    Recent advances in stochastic optimization have yielded the interacting particle Langevin algorithm (IPLA), which leverages the notion of interacting particle systems (IPS) to efficiently sample from approximate posterior densities. This becomes particularly crucial in relation to the framework of Expectation-Maximization (EM), where the E-step is computationally challenging or even intractable. Although prior research has focused on scenarios involving convex cases with gradients of log densities that grow at most linearly, our work extends this framework to include polynomial growth. Taming techniques are employed to produce an explicit discretization scheme that yields a new class of algorithms, stable under such non-linearities, called tamed interacting particle Langevin algorithms (tIPLA). We obtain non-asymptotic convergence error estimates in Wasserstein-2 distance for the new class under the best known rate.
    ... Naturally, MCMC methods based on unadjusted Markov kernels like ULA have been also used in the context of EM. In particular, [34] studied the SOUL algorithm which uses ULA (or, more generally, inexact Markov kernels) in order to draw (approximate) samples from the posterior distribution of the latent variables and approximate the E-step. This algorithm proceeds in a coordinate-wise manner, first running a sequence of ULA steps to obtain samples, then approximating the E-step with these samples, and finally moving to the M-step. ...
    ... But since sampling from the unnormalised measure e^{−U(θ,x)} for fixed θ in this setting is typically intractable, the samples drawn using numerical schemes would be approximate, which would incur bias on the gradient estimates. This is the precise problem investigated in prior works, see, e.g., [19] or [34], for using MCMC chains to sample from e^{−U(θ,x)} with fixed θ and using these samples to estimate the gradient of k(θ). This approach requires non-trivial assumptions on the step sizes of the optimisation schemes and is sometimes not even computationally tractable, as the sample sizes used to estimate gradients may need to increase over iterations. ...
    ... In this section, we provide numerical experiments to demonstrate the empirical behaviour of IPLA in relation to similar competitors, such as particle gradient descent (PGD) scheme [35] and the SOUL algorithm [34]. ...
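The SOUL-style loop described in these excerpts (an inner ULA chain approximating the E-step, then a gradient M-step via Fisher's identity) can be illustrated on a toy problem. The following is a hedged sketch only: the model x ~ N(θ, 1), y | x ~ N(x, 1), the step sizes, and the chain length are illustrative assumptions, not the setup of any cited paper.

```python
import numpy as np

rng = np.random.default_rng(0)
y = 1.5  # single observation; for this conjugate model the MMLE is theta* = y

def grad_x(x, theta):
    # gradient in x of log p(x, y | theta) for the toy model
    return -(x - theta) - (x - y)

def grad_theta(x, theta):
    # gradient in theta of log p(x, y | theta); by Fisher's identity,
    # averaging this over posterior samples estimates the marginal gradient
    return x - theta

theta, x = 0.0, 0.0
gamma, delta, K = 0.05, 0.05, 20  # ULA step, parameter step, chain length (illustrative)
for n in range(2000):
    # E-step: a short ULA chain targeting p(x | y, theta)
    samples = np.empty(K)
    for k in range(K):
        x += gamma * grad_x(x, theta) + np.sqrt(2 * gamma) * rng.normal()
        samples[k] = x
    # M-step: stochastic gradient step using the chain's samples
    theta += delta * np.mean(grad_theta(samples, theta))
```

Here the marginal likelihood of y is N(θ, 2), so the iterates drift toward θ* = y; the cited works use decreasing step sizes to obtain almost sure convergence guarantees, which this constant-step sketch omits.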
    Article
    Full-text available
    We develop a class of interacting particle systems for implementing a maximum marginal likelihood estimation (MMLE) procedure to estimate the parameters of a latent variable model. We achieve this by formulating a continuous-time interacting particle system which can be seen as a Langevin diffusion over an extended state space of parameters and latent variables. In particular, we prove that the parameter marginal of the stationary measure of this diffusion has the form of a Gibbs measure, where the number of particles acts as the inverse temperature parameter in classical settings for global optimisation. Using a particular rescaling, we prove geometric ergodicity of the system and bound the discretisation error in a manner that is uniform in time and non-increasing with the number of particles. We further prove nonasymptotic bounds for the optimisation error of our estimator in terms of key parameters, in the case of both deterministic and stochastic gradients. We provide numerical experiments to illustrate the empirical behaviour of our algorithm in the context of logistic regression with verifiable assumptions. Our setting provides a straightforward way to implement a diffusion-based optimisation routine compared to classical approaches such as the Expectation Maximisation (EM) algorithm, and allows for especially explicit nonasymptotic bounds.
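The extended-space Langevin diffusion described in this abstract admits a simple Euler-Maruyama discretisation. The sketch below is an illustration under stated assumptions, not the paper's implementation: the toy model x ~ N(θ, 1), y | x ~ N(x, 1), and all constants are hypothetical. N latent particles and the parameter θ evolve jointly, with the noise on θ scaled by 1/sqrt(N), so N plays the role of an inverse temperature.

```python
import numpy as np

rng = np.random.default_rng(1)
y, N, gamma = 1.5, 50, 0.05  # observation, particle count, step size (illustrative)
theta = 0.0
x = np.zeros(N)  # N latent particles
for n in range(3000):
    # gradients of log p(x, y | theta), evaluated before either update
    g_theta = np.mean(x - theta)   # averaged over particles
    g_x = -(x - theta) - (x - y)   # per particle
    # joint Langevin step; theta noise is tempered by 1/sqrt(N)
    theta = theta + gamma * g_theta + np.sqrt(2 * gamma / N) * rng.normal()
    x = x + gamma * g_x + np.sqrt(2 * gamma) * rng.normal(size=N)
```

As N grows, the parameter marginal of the stationary measure concentrates around the maximiser (θ* = y in this toy model), consistent with the Gibbs-measure interpretation described in the abstract.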
    ... The wide use of the EM algorithm is due to the fact that it can be implemented using approximations for both steps: analytic maximisation can be replaced by numerical optimisation (Meng and Rubin, 1993; Liu and Rubin, 1994) and the expectation step can be approximated via Monte Carlo sampling from p_θ(·|y) (Wei and Tanner, 1990; Celeux, 1985). When exact sampling from the posterior is unfeasible, approximate samples can be drawn via Markov chain Monte Carlo (MCMC; De Bortoli et al., 2021; Delyon et al., 1999), leading to stochastic approximation EM (SAEM). ...
    Preprint
    Full-text available
    We introduce an approach based on mirror descent and sequential Monte Carlo (SMC) to perform joint parameter inference and posterior estimation in latent variable models. This approach is based on minimisation of a functional over the parameter space and the space of probability distributions and, contrary to other popular approaches, can be implemented when the latent variable takes values in discrete spaces. We provide a detailed theoretical analysis of both the mirror descent algorithm and its approximation via SMC. We experimentally show that the proposed algorithm outperforms standard expectation maximisation algorithms and is competitive with other popular methods for real-valued latent variables.
    ... Recent extensions have considered unadjusted MCMC, whereby the Metropolis adjustment is skipped for computational simplicity. Perhaps the most notable of these is the stochastic optimization via unadjusted Langevin (SOUL) algorithm [12], which makes use of the unadjusted Langevin algorithm (ULA) [10,17] to solve MMLE problems. Through Fisher's identity, the SOUL approach leverages a stochastic approximation, iteratively running the E-step with a ULA chain, then using these samples to compute the M-step. ...
    ... The main bottleneck of this method is that it requires a unique Markov chain to be run to compute the E-step, for each M-step. A similar approach is taken in [29], which replaces the coordinate-wise procedure of [12] with an interacting particle system. This method constructs N particles for the latent variables (instead of running a chain in time) and builds a procedure in the joint space of N latent variables and parameters of interest. ...
    Preprint
    In this paper, we develop a class of interacting particle Langevin algorithms to solve inverse problems for partial differential equations (PDEs). In particular, we leverage the statistical finite elements (statFEM) formulation to obtain a finite-dimensional latent variable statistical model, where the parameter is that of the (discretised) forward map and the latent variable is the statFEM solution of the PDE, which is assumed to be partially observed. We then adapt a recently proposed expectation-maximisation-like scheme, the interacting particle Langevin algorithm (IPLA), to this problem and obtain a joint estimation procedure for the parameters and the latent variables. We consider three main examples: (i) estimating the forcing for a linear Poisson PDE, (ii) estimating the forcing for a nonlinear Poisson PDE, and (iii) estimating the diffusivity for a linear Poisson PDE. We provide computational complexity estimates for forcing estimation in the linear case. We also provide comprehensive numerical experiments and preconditioning strategies that significantly improve performance, showing that the proposed class of methods can be a method of choice for parameter inference in PDE models.
    ... This algorithm requires running a Markov chain for each E-step, resulting in a double-loop algorithm. The bias incurred by unadjusted chains complicates the theoretical analysis and requires a delicate balance between the step sizes of the Langevin algorithm and the gradient step to guarantee convergence [24]. An alternative approach was developed in [45], where instead of running Markov chains to perform the E-step, the authors proposed to use an interacting particle system consisting of N particles. ...
    ... which is easy to prove (see, e.g., [27, Proposition D.4] and [2, Remark 1]). The expression in (6) is the motivation behind approximate schemes for expectation-maximisation, where one can first sample from p_θ(x|y) for fixed θ with an MCMC chain and then compute the gradient in (6) with these samples [24,6]. However, these approaches require non-trivial assumptions on the step size, as well as a sample size that may need to grow to ensure that the gradient approximation does not incur asymptotic bias. ...
    ... With this nonasymptotic analysis, [2] identify the parameters needed to implement the algorithm and introduce an order-of-convergence guarantee in W_2. The algorithm is empirically shown to be competitive with state-of-the-art models, such as SOUL [24] and the PGD proposed in [45]. Further, the IPLA algorithm lays the foundation for many avenues of further exploration, such as considering a kinetic Langevin algorithm and considering other numerical discretisations of the proposed SDEs, as we will do in this paper. ...
    Preprint
    This paper introduces and analyses interacting underdamped Langevin algorithms, termed Kinetic Interacting Particle Langevin Monte Carlo (KIPLMC) methods, for statistical inference in latent variable models. We propose a diffusion process that evolves jointly in the space of parameters and latent variables and exploit the fact that the stationary distribution of this diffusion concentrates around the maximum marginal likelihood estimate of the parameters. We then provide two explicit discretisations of this diffusion as practical algorithms to estimate parameters of statistical models. For each algorithm, we obtain nonasymptotic rates of convergence for the case where the joint log-likelihood is strongly concave with respect to latent variables and parameters. In particular, we provide convergence analysis for the diffusion together with the discretisation error, providing convergence rate estimates for the algorithms in Wasserstein-2 distance. To demonstrate the utility of the introduced methodology, we provide numerical experiments illustrating the effectiveness of the proposed diffusion for statistical inference and the stability of the numerical integrators utilised for discretisation. Our setting covers a broad range of applications, including unsupervised learning, statistical inference, and inverse problems.
    ... The third direction involves stochastic optimisation methods that combine stochastic approximation (Robbins & Monro, 1951) and sampling techniques. Developments in this direction include Gu and Kong (1998), Cai (2010b), Cai (2010a), Atchade et al. (2017), De Bortoli et al. (2021), and Zhang and Chen (2022). These methods are closely related to the StEM algorithm. ...
    ... As shown via a simulation study in Zhang and Chen (2022), the stochastic optimisation approach is computationally more efficient than StEM, as performing a stochastic gradient update is substantially faster than solving an optimisation problem. Among the developments in this direction, we draw attention to the Stochastic Optimisation by Unadjusted Langevin (SOUL) method proposed in De Bortoli et al. (2021), which adopts an Unadjusted Langevin Sampler (ULS; Durmus & Moulines, 2019; Roberts & Tweedie, 1996) in the sampling step. The sampler is an inexact MCMC sampler. ...
    ... In what follows, we provide the convergence guarantee for the proposed algorithm. It extends Theorem 5 in De Bortoli et al. (2021) to the setting where the SG is constructed using minibatches. ...
    Preprint
    Latent variable models are widely used in social and behavioural sciences, such as education, psychology, and political science. In recent years, high-dimensional latent variable models have become increasingly common for analysing large and complex data. Estimating high-dimensional latent variable models using marginal maximum likelihood is computationally demanding due to the complexity of integrals involved. To address this challenge, stochastic optimisation, which combines stochastic approximation and sampling techniques, has been shown to be effective. This method iterates between two steps -- (1) sampling the latent variables from their posterior distribution based on the current parameter estimate, and (2) updating the fixed parameters using an approximate stochastic gradient constructed from the latent variable samples. In this paper, we propose a computationally more efficient stochastic optimisation algorithm. This improvement is achieved through the use of a minibatch of observations when sampling latent variables and constructing stochastic gradients, and an unadjusted Langevin sampler that utilises the gradient of the negative complete-data log-likelihood to sample latent variables. Theoretical results are established for the proposed algorithm, showing that the iterative parameter update converges to the marginal maximum likelihood estimate as the number of iterations goes to infinity. Furthermore, the proposed algorithm is shown to scale well to high-dimensional settings through simulation studies and a personality test application with 30,000 respondents, 300 items, and 30 latent dimensions.
    ... In recent years, motivated by diffusion (or SDE) based MCMC samplers, such as the unadjusted Langevin algorithm (ULA) [48,26,18,19,20], a number of methods have been proposed for the MMLE problem that approximate the E-step of the EM with an unadjusted Langevin chain. In this context, [21] studied an algorithm, abbreviated as SOUL, which consists of an E-step based on an approximation provided by a ULA chain and an M-step based on gradient descent. The work [21] showed the convergence of the algorithm (in discrete time) under some strict conditions, and furthermore provided empirical evidence of the performance of this method. ...
    ... In this context, [21] studied an algorithm, abbreviated as SOUL, which consists of an E-step based on an approximation provided by a ULA chain and an M-step based on gradient descent. The work [21] showed the convergence of the algorithm (in discrete time) under some strict conditions, and furthermore provided empirical evidence of the performance of this method. Inspired by SDE-based approaches such as SOUL, algorithms have also been developed based on interacting particle systems, which, instead of a single ULA chain for the E-step, use a system of N interacting particles run in parallel with parameter updates, see, e.g., [36,1]. ...
    ... As a byproduct of our approach, we unify a number of known algorithms from computational statistics and multiscale methods, such as SOUL [21], IPLA [1], PGD [36], HMM [54], EFC [35], and TAMD [52], within a common framework, observing that they could all be thought of as different discretizations of an appropriate multiscale dynamics. This observation unlocks a number of research directions in putting forward a coherent theoretical framework for these methods. ...
    Preprint
    In this paper, we provide a multiscale perspective on the problem of maximum marginal likelihood estimation. We consider and analyse a diffusion-based maximum marginal likelihood estimation scheme using ideas from multiscale dynamics. Our perspective is based on stochastic averaging; we make an explicit connection between ideas in applied probability and parameter inference in computational statistics. In particular, we consider a general class of coupled Langevin diffusions for joint inference of latent variables and parameters in statistical models, where the latent variables are sampled from a fast Langevin process (which acts as a sampler), and the parameters are updated using a slow Langevin process (which acts as an optimiser). We show that the resulting system of stochastic differential equations (SDEs) can be viewed as a two-time scale system. To demonstrate the utility of such a perspective, we show that the averaged parameter dynamics obtained in the limit of scale separation can be used to estimate the optimal parameter, within the strongly convex setting. We do this by using recent uniform-in-time non-asymptotic averaging bounds. Finally, we conclude by showing that the slow-fast algorithm we consider here, termed Slow-Fast Langevin Algorithm, performs on par with state-of-the-art methods on a variety of examples. We believe that the stochastic averaging approach we provide in this paper enables us to look at these algorithms from a fresh angle, as well as unlocking the path to develop and analyse new methods using well-established averaging principles.
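A minimal sketch of the slow-fast idea in this abstract, assuming the same kind of toy Gaussian model (x ~ N(θ, 1), y | x ~ N(x, 1)): the latent variable is driven by a fast Langevin step (acting as a sampler) and the parameter by a slow, tempered Langevin step (acting as an optimiser). The scale separation ε and inverse temperature β here are hypothetical choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(2)
y = 1.5  # observation; the MMLE for this toy model is theta* = y
theta, x = 0.0, 0.0
h, eps, beta = 0.01, 0.05, 100.0  # base step, scale separation, inverse temperature
for n in range(20000):
    # fast process: Langevin step (effective step h/eps) targeting p(x | y, theta)
    x += (h / eps) * (-(x - theta) - (x - y)) + np.sqrt(2 * h / eps) * rng.normal()
    # slow process: tempered Langevin step on theta, driven by the fast samples
    theta += h * (x - theta) + np.sqrt(2 * h / beta) * rng.normal()
```

In the limit of scale separation, the fast chain equilibrates to the posterior for the current θ, so the slow dynamics effectively ascend the marginal likelihood, which is the averaging argument the abstract describes.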
    ... Estimating the gradient of the marginal likelihood with respect to the hyper-parameter using the current particle set would in principle allow the adaptive specification of a sequence of hyper-parameter values (and hence posterior distributions) which converges towards that which maximises the marginal likelihood. Such an approach is in the spirit of the SOUL (De Bortoli et al. 2021) and PGD-type (Kuntz et al. 2023) algorithms but would employ sequential Monte Carlo in order to provide sample approximations rather than Langevin-type dynamics. There are two ways one could view such an algorithm: as a Monte Carlo approximation of a gradient-based optimizer for the hyperparameter; or as an adaptive SMC sampler in which the sequence of distributions is specified by following an approximate gradient direction in the space of parameters. ...
    Article
    Full-text available
    In Bayesian inverse problems, one aims at characterizing the posterior distribution of a set of unknowns, given indirect measurements. For non-linear/non-Gaussian problems, analytic solutions are seldom available: Sequential Monte Carlo samplers offer a powerful tool for approximating complex posteriors, by constructing an auxiliary sequence of densities that smoothly reaches the posterior. Often the posterior depends on a scalar hyper-parameter, for which limited prior information is available. In this work, we show that properly designed Sequential Monte Carlo (SMC) samplers naturally provide an approximation of the marginal likelihood associated with this hyper-parameter for free, i.e. at a negligible additional computational cost. The proposed method proceeds by constructing the auxiliary sequence of distributions in such a way that each of them can be interpreted as a posterior distribution corresponding to a different value of the hyper-parameter. This can be exploited to perform selection of the hyper-parameter in Empirical Bayes (EB) approaches, as well as averaging across values of the hyper-parameter according to some hyper-prior distribution in Fully Bayesian (FB) approaches. For FB approaches, the proposed method has the further benefit of allowing prior sensitivity analysis at a negligible computational cost. In addition, the proposed method exploits particles at all the (relevant) iterations, thus alleviating one of the known limitations of SMC samplers, i.e. the fact that all samples at intermediate iterations are typically discarded. We show numerical results for two distinct cases where the hyper-parameter affects only the likelihood: a toy example, where an SMC sampler is used to approximate the full posterior distribution; and a brain imaging example, where a Rao-Blackwellized SMC sampler is used to approximate the posterior distribution of a subset of parameters in a conditionally linear Gaussian model.
    ... In this case, the marginal likelihood θ ↦ p_θ(x) has a unique maximum given by θ*. In Fig. 1, we evaluate the performance of SVGD EM and Coin EM on this model, setting d_z = 100 and θ = 1. We also include results for PGD [38] and the stochastic optimization via unadjusted Langevin (SOUL) algorithm [20]. In this case, both of our methods generate parameters θ_t which converge rapidly to θ*, and particles (z_t^i)_{i=1}^N whose mean converges to the corresponding posterior mean. ...
    ... We next consider a standard Bayesian logistic regression with Gaussian priors, fit using the Wisconsin dataset [72]; see also [20, Sec. 4.1]. ...
    ... In Fig. 3, we compare the performance of our algorithms with PGD [38], PMGD [38], and SOUL [20]. We first plot an illustrative sequence of parameter estimates (Fig. 3(a)) for each method, initialized at zero, using N = 100 particles and T = 800 iterations. ...
    Preprint
    We introduce two new particle-based algorithms for learning latent variable models via marginal maximum likelihood estimation, including one which is entirely tuning-free. Our methods are based on the perspective of marginal maximum likelihood estimation as an optimization problem: namely, as the minimization of a free energy functional. One way to solve this problem is to consider the discretization of a gradient flow associated with the free energy. We study one such approach, which resembles an extension of the popular Stein variational gradient descent algorithm. In particular, we establish a descent lemma for this algorithm, which guarantees that the free energy decreases at each iteration. This method, and any other obtained as the discretization of the gradient flow, will necessarily depend on a learning rate which must be carefully tuned by the practitioner in order to ensure convergence at a suitable rate. With this in mind, we also propose another algorithm for optimizing the free energy which is entirely learning rate free, based on coin betting techniques from convex optimization. We validate the performance of our algorithms across a broad range of numerical experiments, including several high-dimensional settings. Our results are competitive with existing particle-based methods, without the need for any hyperparameter tuning.
    ... But since sampling from the unnormalised measure e^{−U(θ,x)} for fixed θ in this setting is typically intractable, the samples drawn using numerical schemes would be approximate, which would incur bias on the gradient estimates. This is the precise problem investigated in prior works, see, e.g., Atchadé et al. (2017) or De Bortoli et al. (2021), for using MCMC chains to sample from e^{−U(θ,x)} with fixed θ and using these samples to estimate the gradient of k(θ). This approach requires non-trivial assumptions on the step sizes of the optimisation schemes and is sometimes not even computationally tractable, as the sample sizes used to estimate gradients may need to increase over iterations. ...
    ... De Bortoli et al. (2021) propose to replace the exact sampling from p_{θ^(n)}(·|y) used in SAEM, usually obtained via expensive MCMC kernels, with inexact Markov kernels which draw approximate samples from p_{θ^(n)}(·|y) but have better scalability properties and are easier to analyse. Using ULA (Roberts and Tweedie, 1996) to obtain approximate samples from p_{θ^(n)}(·|y) leads to the stochastic optimisation via Unadjusted Langevin algorithm (SOUL). ...
    ... We next adopt a realistic example using the same setup in . In particular, in this section, we compare IPLA to PGD as well as more standard methods like mean-field variational inference (MFVI) and SOUL (De Bortoli et al., 2021). ...
    Preprint
    Full-text available
    We study a class of interacting particle systems for implementing a marginal maximum likelihood estimation (MMLE) procedure to optimize over the parameters of a latent variable model. To do so, we propose a continuous-time interacting particle system which can be seen as a Langevin diffusion over an extended state space, where the number of particles acts as the inverse temperature parameter in classical settings for optimisation. Using Langevin diffusions, we prove nonasymptotic concentration bounds for the optimisation error of the maximum marginal likelihood estimator in terms of the number of particles in the particle system, the number of iterations of the algorithm, and the step-size parameter for the time discretisation analysis.