Preprint

Entropy minimizing distributions are worst-case optimal importance proposals

Affiliation: INRIA Center of Rennes

Abstract

Importance sampling of target probability distributions belonging to a given convex class is considered. Motivated by previous results, the cost of importance sampling is quantified using the relative entropy of the target with respect to the proposal distribution. Taking a fixed reference measure as the baseline for this cost, we prove under general conditions that the worst-case optimal proposal is precisely the distribution minimizing entropy with respect to the reference within the considered convex class. These conditions are satisfied in particular when the convex class is defined through a push-forward map whose conditional measures are atomless. Applications in which the optimal proposal is Gibbsian and can be sampled in practice using Monte Carlo methods are discussed.
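The abstract's central claim can be summarized as a minimax identity. The notation below ($\mathcal{C}$, $\lambda$, $q^\star$) is our own illustrative choice, not taken from the paper:

```latex
% Illustrative notation (ours): \mathcal{C} = convex class of targets,
% \lambda = reference measure, D(\cdot\|\cdot) = relative entropy.
q^\star \in \operatorname*{arg\,min}_{q}\;\sup_{\pi\in\mathcal{C}} D(\pi\,\|\,q)
\qquad\text{and the paper's claim reads}\qquad
q^\star \in \operatorname*{arg\,min}_{\pi\in\mathcal{C}} D(\pi\,\|\,\lambda).
```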


References
Article
Full-text available
The effective sample size (ESS) is widely used in sample-based simulation methods for assessing the quality of a Monte Carlo approximation of a given distribution and of related integrals. In this paper, we revisit the approximation of the ESS in the specific context of importance sampling. The derivation of this approximation, which we denote $\widehat{\mathrm{ESS}}$, is partially available in a 1992 foundational technical report of Augustine Kong. This approximation has been widely used over the last 25 years, thanks to its simplicity, as a practical rule of thumb in a wide variety of importance sampling methods. However, we show that the multiple assumptions and approximations in the derivation of $\widehat{\mathrm{ESS}}$ make it difficult to regard it even as a reasonable approximation of the ESS. We extend the discussion of $\widehat{\mathrm{ESS}}$ to the multiple importance sampling setting, display numerical examples, and discuss several avenues for developing alternative metrics. This paper does not cover the use of ESS for Markov chain Monte Carlo algorithms.
Article
Full-text available
The Effective Sample Size (ESS) is an important measure of efficiency of Monte Carlo methods such as Markov chain Monte Carlo (MCMC) and Importance Sampling (IS) techniques. In the IS context, an approximation of the theoretical ESS definition is widely applied, involving the inverse of the sum of the squares of the normalized importance weights. This formula, $\widehat{\mathrm{ESS}} = 1/\sum_{n=1}^{N} \bar{w}_n^2$, has become an essential piece within Sequential Monte Carlo (SMC) methods for assessing the convenience of a resampling step. From another perspective, this expression is related to the Euclidean distance between the probability mass described by the normalized weights and the discrete uniform probability mass function (pmf). In this work, we derive other possible ESS functions based on different discrepancy measures between these two pmfs. Several examples are provided involving, for instance, the geometric mean of the weights, the discrete entropy (including the perplexity measure, already proposed in the literature) and the Gini coefficient, among others. We list five theoretical requirements which a generic ESS function should satisfy, allowing us to classify different ESS measures. We also compare the most promising ones by means of numerical simulations.
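The weight-based approximation discussed above is a one-liner in practice; the sketch below (our own code, not from either paper) computes it with NumPy on synthetic weights:

```python
import numpy as np

def ess_hat(weights):
    """Kong's ESS approximation: 1 / sum of squared normalized weights."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()              # normalize the importance weights
    return 1.0 / np.sum(w ** 2)

# Uniform weights give the maximum ESS (= sample size) ...
print(ess_hat([1.0, 1.0, 1.0, 1.0]))  # → 4.0
# ... while one dominant weight drives the ESS toward 1.
print(ess_hat([100.0, 0.01, 0.01, 0.01]))
```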
Article
Full-text available
The goal of this paper is to complete results available about I-projections, reverse I-projections, and their generalized versions, with focus on linear and exponential families. Pythagorean-like identities and inequalities are revisited and generalized, and generalized maximum-likelihood (ML) estimates for exponential families are introduced. The main tool is a new concept of extension of exponential families, based on our earlier results on convex cores of measures.
Article
Full-text available
In the design of efficient simulation algorithms, one is often beset with a poor choice of proposal distributions. Although the performance of a given simulation kernel can clarify a posteriori how adequate this kernel is for the problem at hand, a permanent on-line modification of kernels causes concerns about the validity of the resulting algorithm. While the issue is most often intractable for MCMC algorithms, the equivalent version for importance sampling algorithms can be validated quite precisely. We derive sufficient convergence conditions for adaptive mixtures of population Monte Carlo algorithms and show that Rao-Blackwellized versions asymptotically achieve an optimum in terms of a Kullback divergence criterion, while more rudimentary versions do not benefit from repeated updating. Published at http://dx.doi.org/10.1214/009053606000001154 in the Annals of Statistics by the Institute of Mathematical Statistics.
Article
Sequential Monte Carlo (SMC) algorithms were originally designed for estimating intractable conditional expectations within state-space models, but are now routinely used to generate approximate samples in the context of general-purpose Bayesian inference. In particular, SMC algorithms are often used as subroutines within larger Monte Carlo schemes, and in this context, the demands placed on SMC are different: control of mean-squared error is insufficient—one needs to control the divergence from the target distribution directly. Towards this goal, we introduce the conditional adaptive resampling particle filter, building on the work of Gordon, Salmond, and Smith (1993), Andrieu, Doucet, and Holenstein (2010), and Whiteley, Lee, and Heine (2016). By controlling a novel notion of effective sample size, the ∞-ESS, we establish the efficiency of the resulting SMC sampling algorithm, providing an adaptive resampling extension of the work of Andrieu, Lee, and Vihola (2018). We apply our results to arrive at new divergence bounds for SMC samplers with adaptive resampling as well as an adaptive resampling version of the Particle Gibbs algorithm with the same geometric-ergodicity guarantees as its nonadaptive counterpart.
Article
The goal of importance sampling is to estimate the expected value of a given function with respect to a probability measure $\nu$ using a random sample of size $n$ drawn from a different probability measure $\mu$. If the two measures $\mu$ and $\nu$ are nearly singular with respect to each other, which is often the case in practice, the sample size required for accurate estimation is large. In this article it is shown that in a fairly general setting, a sample of size approximately $\exp(D(\nu\|\mu))$ is necessary and sufficient for accurate estimation by importance sampling, where $D(\nu\|\mu)$ is the Kullback--Leibler divergence of $\mu$ from $\nu$. In particular, the required sample size exhibits a kind of cut-off on the logarithmic scale. The theory is applied to obtain a fairly general formula for the sample size required in importance sampling for exponential families (Gibbs measures). We also show that the standard variance-based diagnostic for convergence of importance sampling is fundamentally problematic. An alternative diagnostic that provably works in certain situations is suggested.
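The $\exp(D(\nu\|\mu))$ sample-size rule can be illustrated with univariate Gaussians, for which the KL divergence has a closed form; the parameters below are our own toy choices:

```python
import math

def kl_gauss(m1, s1, m2, s2):
    """KL divergence D(N(m1, s1^2) || N(m2, s2^2)) for univariate Gaussians."""
    return math.log(s2 / s1) + (s1**2 + (m1 - m2)**2) / (2 * s2**2) - 0.5

# Target nu = N(3, 1), proposal mu = N(0, 1):
d = kl_gauss(3.0, 1.0, 0.0, 1.0)   # = 4.5
n_required = math.exp(d)           # ≈ 90 samples by the exp(D) rule
print(d, n_required)
```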
Article
Rényi divergence is related to Rényi entropy much like Kullback-Leibler divergence is related to Shannon's entropy, and comes up in many settings. It was introduced by Rényi as a measure of information that satisfies almost the same axioms as Kullback-Leibler divergence, and depends on a parameter that is called its order. In particular, the Rényi divergence of order 1 equals the Kullback-Leibler divergence. We review and extend the most important properties of Rényi divergence and Kullback-Leibler divergence, including convexity, continuity, limits of $\sigma$-algebras, and the relation of the special order 0 to the Gaussian dichotomy and contiguity. We also show how to generalize the Pythagorean inequality to orders different from 1, and we extend the known equivalence between channel capacity and minimax redundancy to continuous channel inputs (for all orders) and present several other minimax results.
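The role of the order parameter, including the order-1 limit recovering Kullback-Leibler divergence, can be checked numerically for discrete distributions; the pmfs below are our own illustrative choices:

```python
import numpy as np

def renyi_divergence(p, q, alpha):
    """Rényi divergence of order alpha (alpha > 0, alpha != 1) for discrete pmfs."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return np.log(np.sum(p ** alpha * q ** (1 - alpha))) / (alpha - 1)

def kl_divergence(p, q):
    """Kullback-Leibler divergence, the alpha -> 1 limit of the above."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return np.sum(p * np.log(p / q))

p, q = [0.7, 0.3], [0.4, 0.6]
# Order close to 1 recovers the Kullback-Leibler divergence:
print(renyi_divergence(p, q, 0.999), kl_divergence(p, q))
```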
Article
We provide a short overview of importance sampling—a popular sampling tool used for Monte Carlo computing. We discuss its mathematical foundation and properties that determine its accuracy in Monte Carlo approximations. We review the fundamental developments in designing efficient importance sampling (IS) for practical use. This includes parametric approximation with optimization-based adaptation, sequential sampling with dynamic adaptation through resampling and population-based approaches that make use of Markov chain sampling. Copyright © 2009 John Wiley & Sons, Inc. For further resources related to this article, please visit the WIREs website.
Article
The cross-entropy (CE) method is a new generic approach to combinatorial and multi-extremal optimization and rare event simulation. The purpose of this tutorial is to give a gentle introduction to the CE method. We present the CE methodology, the basic algorithm and its modifications, and discuss applications in combinatorial optimization and machine learning.
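A minimal sketch of the CE method for continuous maximization with a Gaussian parametric family; the objective, sample sizes, and elite fraction below are illustrative choices of ours, not taken from the tutorial:

```python
import numpy as np

def cross_entropy_maximize(f, mu=0.0, sigma=5.0, n=200, elite_frac=0.1, iters=30):
    """Cross-entropy method sketch: sample from a Gaussian proposal, keep the
    elite fraction with the highest objective values, refit the Gaussian."""
    rng = np.random.default_rng(0)
    n_elite = max(2, int(n * elite_frac))
    for _ in range(iters):
        x = rng.normal(mu, sigma, size=n)        # sample from current proposal
        elite = x[np.argsort(f(x))[-n_elite:]]   # best-scoring samples
        mu, sigma = elite.mean(), elite.std() + 1e-12  # refit the proposal
    return mu

# Maximize a smooth objective with its optimum at x = 2:
best = cross_entropy_maximize(lambda x: -(x - 2.0) ** 2)
print(best)  # close to 2.0
```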
Article
This paper draws attention to a fundamental problem that occurs in applying importance sampling to ‘high-dimensional’ reliability problems, i.e., those with a large number of uncertain parameters. This question of applicability carries an important bearing on the potential use of importance sampling for solving dynamic first-excursion problems and static reliability problems for structures with a large number of uncertain structural model parameters. The conditions under which importance sampling is applicable in high dimensions are investigated, where the focus is put on the common case of standard Gaussian uncertain parameters. It is found that importance sampling densities using design points are applicable if the covariance matrix associated with each design point does not deviate significantly from the identity matrix. The study also suggests that importance sampling densities using random pre-samples are generally not applicable in high dimensions.
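The degeneracy described above is easy to reproduce outside the reliability setting: below, a slightly wider Gaussian proposal targets a standard Gaussian, and the weight-based ESS collapses as the dimension grows (all settings are our own illustrative choices):

```python
import numpy as np

def weight_degeneracy(dim, n=5000, sigma_q=1.2, seed=0):
    """ESS of importance sampling from N(0, sigma_q^2 I) to target N(0, I)."""
    rng = np.random.default_rng(seed)
    x = sigma_q * rng.standard_normal((n, dim))
    s = (x ** 2).sum(axis=1)
    # log target - log proposal (normalizing constants included)
    logw = -0.5 * s + 0.5 * s / sigma_q**2 + dim * np.log(sigma_q)
    w = np.exp(logw - logw.max())
    w /= w.sum()
    return 1.0 / np.sum(w ** 2)

# ESS shrinks dramatically as the dimension increases:
for d in (1, 10, 50, 100):
    print(d, weight_degeneracy(d))
```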
Article
We propose a methodology to sample sequentially from a sequence of probability distributions that are defined on a common space, each distribution being known up to a normalizing constant. These probability distributions are approximated by a cloud of weighted random samples which are propagated over time by using sequential Monte Carlo methods. This methodology allows us to derive simple algorithms to make parallel Markov chain Monte Carlo algorithms interact to perform global optimization and sequential Bayesian estimation and to compute ratios of normalizing constants. We illustrate these algorithms for various integration tasks arising in the context of Bayesian inference. Copyright 2006 Royal Statistical Society.
Article
Simulated annealing --- moving from a tractable distribution to a distribution of interest via a sequence of intermediate distributions --- has traditionally been used as an inexact method of handling isolated modes in Markov chain samplers. Here, it is shown how one can use the Markov chain transitions for such an annealing sequence to define an importance sampler. The Markov chain aspect allows this method to perform acceptably even for high-dimensional problems, where finding good importance sampling distributions would otherwise be very difficult, while the use of importance weights ensures that the estimates found converge to the correct values as the number of annealing runs increases. This annealed importance sampling procedure resembles the second half of the previously-studied tempered transitions, and can be seen as a generalization of a recently-proposed variant of sequential importance sampling. It is also related to thermodynamic integration methods for estimating ratios...
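The annealing-plus-importance-weights scheme described above can be sketched in a few lines; the geometric bridge, step size, and toy Gaussian targets below are our own illustrative choices, not Neal's exact construction:

```python
import numpy as np

def annealed_importance_sampling(log_p0, log_p1, x0, n_temps=100, step=0.7, seed=0):
    """Minimal AIS: move samples from p0 toward p1 through geometric bridges
    log p_b = (1 - b) * log_p0 + b * log_p1, accumulating importance weights,
    with one random-walk Metropolis step per temperature."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    logw = np.zeros_like(x)
    betas = np.linspace(0.0, 1.0, n_temps)
    for b_prev, b in zip(betas[:-1], betas[1:]):
        logw += (b - b_prev) * (log_p1(x) - log_p0(x))   # weight update
        prop = x + step * rng.standard_normal(x.shape)   # Metropolis proposal
        log_ratio = ((1 - b) * (log_p0(prop) - log_p0(x))
                     + b * (log_p1(prop) - log_p1(x)))
        x = np.where(np.log(rng.random(x.shape)) < log_ratio, prop, x)
    return x, logw

# Anneal from N(0, 1) to an unnormalized Gaussian centered at 3 (toy example):
rng = np.random.default_rng(1)
x, logw = annealed_importance_sampling(
    lambda x: -0.5 * x**2,             # log p0 up to a constant
    lambda x: -0.5 * (x - 3.0)**2,     # log p1 up to a constant
    rng.standard_normal(500))
w = np.exp(logw - logw.max())
print(np.sum(w * x) / np.sum(w))       # weighted estimate of the target mean (3)
```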
A. Dembo and O. Zeitouni. Large Deviations Techniques and Applications. Stochastic Modelling and Applied Probability. Springer Berlin Heidelberg, 2009.