Mathias Rousset’s research while affiliated with French National Centre for Scientific Research and other places


Publications (21)


Adaptive reduced tempering for Bayesian inverse problems and rare event simulation
  • Preprint

October 2024 · Frederic Cerou · Mathias Rousset

This work proposes an adaptive sequential Monte Carlo (SMC) sampling algorithm for solving Bayesian inverse problems in a context where a (costly) likelihood evaluation can be approximated by a surrogate constructed from previous evaluations of the true likelihood. A rough error estimate of the obtained surrogates is required. The method is based on an adaptive SMC simulation that jointly adapts the likelihood approximations and a standard tempering scheme of the target posterior distribution. The algorithm is well-suited to cases where the posterior is concentrated in a rare and unknown region of the prior, and is also suitable for solving low-temperature and rare-event simulation problems. The main contribution is an entropy criterion that associates with the accuracy of the current surrogate a maximum inverse temperature for the likelihood approximation. The latter is used to sample a so-called snapshot, perform an exact likelihood evaluation, and update the surrogate and its error quantification. Consistency results are presented for an idealized version of the proposed algorithm. Our numerical experiments use in particular a reduced-basis approach to construct approximate parametric solutions of a partially observed elliptic partial differential equation. They demonstrate the convergence of the algorithm and show a significant cost reduction (close to a factor of 10) for comparable accuracy.
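The tempering backbone of such a method can be illustrated with a generic adaptive SMC sampler that picks each inverse-temperature increment by an effective-sample-size (ESS) criterion. This is a minimal sketch of standard adaptive tempering, not the paper's surrogate-based algorithm; the 50% ESS threshold and the Gaussian jitter move are illustrative stand-ins for a proper MCMC rejuvenation step.

```python
import numpy as np

def ess(log_w):
    """Effective sample size of a vector of unnormalized log-weights."""
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    return 1.0 / np.sum(w ** 2)

def next_beta(log_lik, beta, target):
    """Largest inverse temperature in (beta, 1] keeping ESS >= target (bisection)."""
    if ess((1.0 - beta) * log_lik) >= target:
        return 1.0
    lo, hi = beta, 1.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if ess((mid - beta) * log_lik) >= target else (lo, mid)
    return lo

def adaptive_tempering(log_lik, prior_sample, n=2000, ess_frac=0.5, seed=0):
    """SMC with adaptive tempering from the prior (beta = 0) to the posterior (beta = 1)."""
    rng = np.random.default_rng(seed)
    x = prior_sample(rng, n)
    beta = 0.0
    while beta < 1.0:
        ll = log_lik(x)
        new = next_beta(ll, beta, ess_frac * n)
        w = np.exp((new - beta) * (ll - ll.max()))
        x = x[rng.choice(n, n, p=w / w.sum())]       # multinomial resampling
        x = x + 0.1 * rng.standard_normal(x.shape)   # crude jitter (stand-in for an MCMC move)
        beta = new
    return x

# Toy posterior: wide Gaussian prior, narrow likelihood centred at 2.
post = adaptive_tempering(
    log_lik=lambda x: -0.5 * ((x - 2.0) / 0.2) ** 2,
    prior_sample=lambda rng, n: 3.0 * rng.standard_normal(n),
)
```

Because the likelihood is concentrated far in the tail of the prior, the sampler needs many small temperature increments early on, exactly the regime the abstract targets.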


Experiment #1. Left: true AMVs superimposed on the pair of incomplete image observations (black pixels correspond to missing data). Right: comparison to the MAP estimate.
Experiment #1. Comparison of the various endpoint error criteria (2.7)–(2.9) with respect to the sample size (N × L) for a chilled HMC simulation.
Experiment #1. Comparison of the different methods in terms of the endpoint error criterion (2.7) with (2.8) for p = 2 (left) or with (2.9) (right), for sample size N × L = 1×10³ (H = 0.5, ζ = 1×10⁻⁶).
Experiment #1. Influence of the temperature ζ (left) and of the preconditioning parameter H (right) on the evolution of the endpoint error criterion (2.7) with (2.8) for p = 2.
Experiment #1. True versus expected errors obtained by the Laplace method or with chilled HMC (N = 1×10², L = 10, H = 0.5, ζ = 1×10⁻⁶). The gray-level values range in [0, l_max], l_max being the empirical mean plus standard deviation over Ω_m of the true or expected errors (the values of ∥d⋆(s) − d̂(s)∥₂ are thresholded to l_max ≈ 1.3).

Chilled sampling for uncertainty quantification: a motivation from a meteorological inverse problem
  • Article

December 2023 · 2 Citations

Atmospheric motion vectors (AMVs) extracted from satellite imagery are the only wind observations with good global coverage. They are important features for feeding numerical weather prediction (NWP) models. Several Bayesian models have been proposed to estimate AMVs. Although critical for correct assimilation into NWP models, very few methods provide a thorough characterization of the estimation errors. The difficulty of estimating errors stems from the specificity of the posterior distribution, which is both very high dimensional, and highly ill-conditioned due to a singular likelihood, which becomes critical in particular in the case of missing data (unobserved pixels). Motivated by this difficult inverse problem, this work studies the evaluation of the (expected) estimation errors using gradient-based Markov chain Monte Carlo (MCMC) algorithms. The main contribution is to propose a general strategy, called here ‘chilling’, which amounts to sampling a local approximation of the posterior distribution in the neighborhood of a point estimate. From a theoretical point of view, we show that under regularity assumptions, the family of chilled posterior distributions converges in distribution as temperature decreases to an optimal Gaussian approximation at a point estimate given by the maximum a posteriori, also known as the Laplace approximation. Chilled sampling therefore provides access to this approximation generally out of reach in such high-dimensional nonlinear contexts. From an empirical perspective, we evaluate the proposed approach based on some quantitative Bayesian criteria. Our numerical simulations are performed on synthetic and real meteorological data. They reveal that not only the proposed chilling exhibits a significant gain in terms of accuracy of the AMV point estimates and of their associated expected error estimates, but also a substantial acceleration in the convergence speed of the MCMC algorithms.
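The convergence to the Laplace approximation can be checked numerically on a one-dimensional toy problem: for a chilled density proportional to exp(−U(x)/τ) with MAP at x⋆, the rescaled variance Var(x)/τ should approach 1/U″(x⋆) as the temperature τ decreases. The quartic potential below is a hypothetical example, unrelated to the paper's meteorological setting.

```python
import numpy as np

def chilled_moments(U, tau, grid):
    """Mean and variance of the chilled density ∝ exp(-U(x)/tau), by quadrature."""
    logp = -U(grid) / tau
    p = np.exp(logp - logp.max())
    dx = grid[1] - grid[0]
    p /= p.sum() * dx                      # normalize on the uniform grid
    m = (grid * p).sum() * dx
    v = (((grid - m) ** 2) * p).sum() * dx
    return m, v

U = lambda x: 0.5 * x**2 + 0.25 * x**4     # non-Gaussian potential, MAP at 0, U''(0) = 1
grid = np.linspace(-3.0, 3.0, 20001)
ratios = [chilled_moments(U, tau, grid)[1] / tau for tau in (1.0, 0.1, 0.01)]
# ratios approach 1/U''(0) = 1 as tau decreases
```

At τ = 1 the quartic term makes the density visibly non-Gaussian; by τ = 0.01 the rescaled variance is within a few percent of the Laplace prediction.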


Gradient-Informed Neural Network Statistical Robustness Estimation

April 2023 · 3 Citations

Deep neural networks are robust against random corruptions of their inputs to some extent. This global sense of safety is not sufficient in critical applications, where probabilities of failure must be assessed accurately. Some previous works applied known statistical methods from the field of rare-event analysis to classification, yet they use classifiers as black-box models without taking into account gradient information, readily available for deep learning models via auto-differentiation. We propose a new and highly efficient estimator of probabilities of failure dedicated to neural networks, as it leverages the fast computation of gradients of the model through back-propagation.


Fluctuations of Rare Event Simulation with Monte Carlo Splitting in the Small Noise Asymptotics

December 2022

Diffusion processes with small noise conditioned to reach a target set are considered. The adaptive multilevel splitting (AMS) algorithm is a Monte Carlo method used to sample such rare events by iteratively simulating clones of the process and selecting the trajectories that have reached the highest value of a so-called importance function. In this paper, the large-sample-size relative variance of the AMS small-probability estimator is considered. The main result is a rigorously derived large-deviations logarithmic equivalent of this variance in the small-noise asymptotics. It is given as a maximisation problem explicit in terms of the quasi-potential cost function associated with the underlying small-noise large deviations. Necessary and sufficient geometric conditions ensuring that the obtained quantity vanishes ('weak' asymptotic efficiency) are provided. Interpretations and practical consequences are discussed.


Entropy minimizing distributions are worst-case optimal importance proposals

December 2022

Importance sampling of target probability distributions belonging to a given convex class is considered. Motivated by previous results, the cost of importance sampling is quantified by the relative entropy of the target with respect to the proposal distribution. Taking a reference measure as the baseline for this cost, we prove under general conditions that the worst-case optimal proposal is precisely the distribution minimizing entropy with respect to the reference within the considered convex class of distributions. These conditions are satisfied in particular when the convex class is defined using a push-forward map with atomless conditional measures. Applications in which the optimal proposal is Gibbsian and can be practically sampled using Monte Carlo methods are discussed.
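On a finite state space, the entropy-minimizing member of a moment-constrained class is an exponential tilt of the reference, which illustrates the Gibbsian form of the optimal proposal. The example below is a small numerical illustration with a hypothetical uniform reference and a mean constraint, not taken from the paper.

```python
import numpy as np

def min_entropy_member(ref, f, target_mean, tol=1e-12):
    """Distribution minimizing KL(p || ref) over {p : E_p[f] = target_mean}
    on a finite state space: the Gibbs tilt p_beta ∝ ref * exp(beta * f),
    with beta found by bisection (E_{p_beta}[f] is increasing in beta)."""
    def tilt(beta):
        lw = beta * f + np.log(ref)
        w = np.exp(lw - lw.max())
        return w / w.sum()
    lo, hi = -50.0, 50.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if tilt(mid) @ f < target_mean:
            lo = mid
        else:
            hi = mid
    return tilt(0.5 * (lo + hi))

def kl(p, q):
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

ref = np.full(10, 0.1)        # uniform reference on {0, ..., 9}
f = np.arange(10.0)           # constraint function: the identity
p = min_entropy_member(ref, f, target_mean=7.0)
```

Any other distribution meeting the same mean constraint (for instance, mass 1/3 at state 5 and 2/3 at state 8) has strictly larger relative entropy with respect to the reference, consistent with the Gibbs tilt being the entropy minimizer within the class.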


Uncertainty of Atmospheric Motion Vectors by Sampling Tempered Posterior Distributions

July 2022

Atmospheric motion vectors (AMVs) extracted from satellite imagery are the only wind observations with good global coverage. They are important features for feeding numerical weather prediction (NWP) models. Several Bayesian models have been proposed to estimate AMVs. Although critical for correct assimilation into NWP models, very few methods provide a thorough characterization of the estimation errors. The difficulty of estimating errors stems from the specificity of the posterior distribution, which is both very high dimensional, and highly ill-conditioned due to a singular likelihood, which becomes critical in particular in the case of missing data (unobserved pixels). This work studies the evaluation of the expected error of AMVs using gradient-based Markov chain Monte Carlo (MCMC) algorithms. Our main contribution is to propose a tempering strategy, which amounts to sampling a local approximation of the joint posterior distribution of AMVs and image variables in the neighborhood of a point estimate. In addition, we provide efficient preconditioning with the covariance related to the prior family itself (fractional Brownian motion), with possibly different hyper-parameters. From a theoretical point of view, we show that under regularity assumptions, the family of tempered posterior distributions converges in distribution as temperature decreases to an optimal Gaussian approximation at a point estimate given by the maximum a posteriori (MAP). From an empirical perspective, we evaluate the proposed approach based on some quantitative Bayesian evaluation criteria. Our numerical simulations, performed on synthetic and real meteorological data, reveal not only a significant gain in the accuracy of the AMV point estimates and of their associated expected error estimates, but also a substantial acceleration in the convergence speed of the MCMC algorithms.
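The effect of covariance preconditioning can be sketched with an unadjusted Langevin iteration on a toy correlated Gaussian target: preconditioning by the target covariance equalizes step sizes across directions. This is a generic sketch; the paper's fractional-Brownian-motion preconditioner and MAP-centred tempering are not reproduced, and the unadjusted scheme carries a small step-size bias.

```python
import numpy as np

def preconditioned_ula(grad_logpi, C, x0, h=0.1, n_steps=30000, seed=0):
    """Unadjusted Langevin algorithm x' = x + (h/2) C ∇log π(x) + sqrt(h) C^{1/2} ξ."""
    rng = np.random.default_rng(seed)
    L = np.linalg.cholesky(C)            # C^{1/2} factor for the noise
    x = np.array(x0, dtype=float)
    out = np.empty((n_steps, x.size))
    for k in range(n_steps):
        x = x + 0.5 * h * C @ grad_logpi(x) + np.sqrt(h) * L @ rng.standard_normal(x.size)
        out[k] = x
    return out

# Toy target: centred Gaussian with strong correlation, preconditioned by its own covariance.
Sigma = np.array([[1.0, 0.8], [0.8, 1.0]])
Sigma_inv = np.linalg.inv(Sigma)
chain = preconditioned_ula(lambda x: -Sigma_inv @ x, C=Sigma, x0=[0.0, 0.0])
emp_cov = np.cov(chain.T)
```

With C equal to the target covariance, the drift contracts every direction at the same rate, so the chain mixes equally well along and across the correlated axis; with C = I the narrow direction would force a much smaller step size.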


Comparison of proposal distributions. Left: the value of C_m, the expected number of samples before acceptance, as a function of m for the five proposal distributions discussed above. Right: the empirical time (in nanoseconds) used by our implementation of the various methods. Note that the Gaussian proposal is, in our implementation, a little slower than its competitors. From both points of view, the minimum of the curves stays uniformly bounded.
Examples of trajectories with the same (approximate) time length. The velocity-jump process is compared to the Hamiltonian limit computed with a Verlet scheme, for various values of ε.
Same as the previous figure, but for a longer trajectory.
Examples of long trajectories for the (non-irreducible) unit Gaussian for, from left to right, ε ∈ {0.01, 1, 100}.
Box plots of samples obtained with a fixed number of force evaluations n = 10⁵. Comparison between ε ∈ {10⁻², 10⁻¹, 1, 10, 10²} on the horizontal axis, and symmetric versus asymmetric potentials, with eigenvalue ratios 1 (top chart), 1.05 (left chart) and 5 (right chart). Observe: (i) the bias due to lack of ergodicity in the symmetric case, (ii) a decrease of variance in the very asymmetric λ = 5 case, and (iii) an efficiency that seems optimal for various non-extremal values of ε.
Exact targeting of Gibbs distributions using velocity-jump processes

March 2022 · 6 Citations

Stochastics and Partial Differential Equations: Analysis and Computations

This work introduces and studies a new family of velocity-jump Markov processes directly amenable to exact simulation, with the following two properties: (i) trajectories converge in law, as a time-step parameter vanishes, towards a given Langevin or Hamiltonian dynamics; (ii) the stationary distribution of the process is always exactly given by the product of a Gaussian (for the velocities) with the prescribed target density. The simulation itself, in addition to the computability of the gradient of the log-density, requires knowledge of appropriate explicit upper bounds on lower-order derivatives of this log-density. The process does not exhibit any velocity reflections (the maximum size of jumps can be controlled) and is suitable for the 'factorization method'. We provide rigorous mathematical proofs of the convergence towards Hamiltonian/Langevin dynamics when the time step vanishes, and of the exponentially fast convergence towards the target distribution when a suitable noise on the velocities is present. Numerical implementation is detailed and illustrated.
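A minimal member of the velocity-jump family is the one-dimensional zig-zag process, whose switching times for a standard Gaussian target can be inverted in closed form, so the simulation is exact. This is an illustrative relative of the construction, not the process of the paper, which has Gaussian velocities and controlled jump sizes.

```python
import numpy as np

def zigzag_gaussian(n_events=50000, seed=0):
    """Exact 1D zig-zag sampler for N(0,1): velocity v ∈ {-1, +1}, switching
    rate λ(x, v) = max(0, v x); returns time-averaged first and second moments."""
    rng = np.random.default_rng(seed)
    x, v = 0.0, 1.0
    t_tot = m1 = m2 = 0.0
    for _ in range(n_events):
        e = rng.exponential()
        s = v * x                                        # rate along the segment: max(0, s + t)
        T = -s + np.sqrt(max(s, 0.0) ** 2 + 2.0 * e)     # solves ∫₀ᵀ max(0, s + t) dt = e
        # accumulate exact segment integrals of x(t) = x + v t and x(t)²
        t_tot += T
        m1 += x * T + 0.5 * v * T * T
        m2 += x * x * T + x * v * T * T + T ** 3 / 3.0
        x += v * T                                       # move to the switching event
        v = -v                                           # flip the velocity
    return m1 / t_tot, m2 / t_tot

zz_mean, zz_second = zigzag_gaussian()   # time averages for N(0,1)
```

Because the inter-event integral of the rate is available in closed form, no discretization or thinning is needed here; the general velocity-jump construction instead relies on explicit derivative bounds to simulate the event times exactly.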


Efficient Statistical Assessment of Neural Network Corruption Robustness

December 2021 · 16 Citations

We quantify the robustness of a trained network to input uncertainties with a stochastic simulation inspired by the field of statistical reliability engineering. The robustness assessment is cast as a statistical hypothesis test: the network is deemed locally robust if the estimated probability of failure is lower than a critical level. The procedure is based on an importance-splitting simulation generating samples of rare events. We derive theoretical guarantees that are non-asymptotic with respect to the sample size. Experiments on large-scale networks demonstrate the efficiency of our method, which makes a low number of calls to the network function.




Citations (11)


... Concurrently, an alternative class of techniques has emerged centered on importance sampling [36,46,50]. While these approaches offer theoretical validity, they encounter significant implementation challenges in practice. ...

Reference:

Towards Robust LLMs: an Adversarial Robustness Measurement Framework
Efficient Statistical Assessment of Neural Network Corruption Robustness

... Synergistic use of both methods is adopted by NOAA's operational algorithm for AMV production. It is worth noting that sampling approaches (Héas et al. 2023a) for estimating AMVs together with their errors are important for quantitative applications such as improving forecasts through assimilating these winds into NWP models, and such approaches should be further explored and applied to wind estimation. ...

Chilled sampling for uncertainty quantification: a motivation from a meteorological inverse problem

... This hybrid model still samples exactly from μ but can be simulated using a numerical splitting scheme that requires fewer gradient computations per time step, with a precision of the same order (in the time step) as the classical splitting schemes of the Langevin diffusion such as BAOAB. Moreover, it can be tuned to be arbitrarily close to the Langevin dynamics in terms of stochastic trajectories (see [50, Theorem 3.6]), which makes it suitable for estimating the dynamical properties of the process (with, of course, a trade-off between the accuracy of these dynamical properties and the numerical cost of the simulation, as with any numerical approximation of the Langevin equation). Finally, this versatile framework is parallelizable (allowing GPU implementations) and can be combined with multi-time-step methods, pushing the computational speedup further while avoiding some of the resonance issues of the latter. ...

Exact targeting of Gibbs distributions using velocity-jump processes

Stochastics and Partial Differential Equations: Analysis and Computations

... In [7], a similar formula is given for the large sample size variance of all estimators, see Corollary 2.8 and Theorem 2.13. The extension to the case k > 1 under the same assumptions, where k is fixed and N → +∞ can be obtained using the results of [10]. ...

On synchronized Fleming–Viot particle systems
  • Citing Article
  • March 2021

Theory of Probability and Mathematical Statistics

... The behavior of the Langevin dynamics (1.1) depends on the value of the friction parameter γ. The overdamped limit γ → ∞ is well understood; in this limit, the rescaled position process (q_{γt})_{t≥0} converges, weakly in the space of continuous functions [51] and almost surely uniformly over compact subintervals of [0, ∞) [34, Theorem 10.1], to the solution of the overdamped Langevin equation ...

A weak overdamped limit theorem for Langevin processes
  • Citing Article
  • January 2020

Latin American Journal of Probability and Mathematical Statistics

... Estimating the probabilities of rare but impactful events, an important problem throughout science and engineering, is usually done through (generally expensive) Monte Carlo methods such as importance sampling [1] or importance splitting methods [2,3]. A simple alternative approach, which is principled and sampling-free but only asymptotically exact under certain assumptions, consists of using a Laplace approximation, see e.g. ...

Adaptive Multilevel Splitting: Historical Perspective and Recent Results
  • Citing Article
  • April 2019

... = τ_{ξ ≥ l}(x), the estimator of the rare-event probability associated with level l, where I_N^l is the random number of iterations required so that all clones have reached the target set {ξ ≥ l}. The estimator p̂_{l,ams}^N (as well as other non-normalized estimators) is unbiased, E[p̂_{l,ams}^N] = p_l^ε (see [7,2]). The empirical distribution of clones at iteration I = Law(X^ε | τ_l(X^ε) < τ_A(X^ε)). ...

On the Asymptotic Normality of Adaptive Multilevel Splitting
  • Citing Article
  • April 2018

SIAM/ASA Journal on Uncertainty Quantification

... The convergence result q^ε → q^0 is often called a Smoluchowski–Kramers diffusion approximation result in the literature. If σ is constant and equal to the identity, and if f = −∇V for some potential energy function V : R^d → R, the SDE system (1) describes the Langevin dynamics, whereas the SDE (2) describes the overdamped Langevin dynamics; see for instance [14, Sections 2.2.3 and 2.2.4], and also the recent article [20] and references therein. In order to define numerical schemes which perform better than crude methods when ε varies and may vanish, it is relevant to resort to the notion of asymptotic-preserving schemes as studied in the recent article [4]: if Δt = T/N denotes the time-step size with given T ∈ (0, ∞) and N ∈ ℕ, one has a commutative diagram property ...

A Weak Overdamped Limit Theorem for Langevin Processes
  • Citing Article
  • September 2017

Latin American Journal of Probability and Mathematical Statistics

... Moreover, we only consider soft killing at some continuous rate, and no hard killing, which would correspond to the case where T is the escape time from some sub-domain (see e.g. [4,21]). Finally, as will be seen below, as far as the long-time behaviour of the process is concerned we will work in a perturbative regime: namely, we will assume that the variations of λ are small with respect to the mixing time of the diffusion (1.1) (while λ_∞ itself is not required to be small). ...

A Central Limit Theorem for Fleming-Viot Particle Systems with Hard Killing
  • Citing Article
  • September 2017