Article

Laplace's Method in Bayesian Analysis

... This algorithm combines approximation techniques inspired by the Kalman filter and the Laplace method. The Laplace method is an integral approximation method with many applications in Bayesian estimation problems [4]. In the KLF, at each time iteration, the prediction step is performed as in the EKF and the update step is performed via the Laplace method. ...
... This algorithm includes features from the EKF and from the Laplace method. The Laplace method is an integral approximation method that has been widely used for Bayesian estimation; see [4] for example. We first describe it in a general framework. ...
Conference Paper
Full-text available
We propose a new recursive algorithm for nonlinear Bayesian filtering, where the prediction step is performed as in the extended Kalman filter and the update step is done via the Laplace method for integral approximation. This algorithm is called the Kalman Laplace filter (KLF). The KLF provides a closed-form non-Gaussian approximation of the posterior density. The hidden state is estimated by the maximum a posteriori, using a dimension reduction method to alleviate the computational cost of the maximization. The KLF is tested on three simulated nonlinear filtering problems: target tracking with angle measurements, population dynamics monitoring, and motion reconstruction by neural decoding. It exhibits good performance, especially when the observation noise is small.
... Thus it can remain computationally tractable even when the stimulus space is very high-dimensional. The MAP is a good estimator when the Laplace (also known as saddle-point) approximation is justified, i.e., the posterior distribution is well-approximated by a Gaussian distribution centered at x̂_MAP (Tierney and Kadane, 1986; Kass et al., 1991). As the mode and the mean of a Gaussian distribution are identical, in this case the MAP is approximately equal to the posterior mean as well. ...
... This approximation (also known as the saddle-point approximation) is a general asymptotic method for approximating integrals whose integrand peaks sharply at its global maximum and is exponentially suppressed away from it. In the Bayesian setting this corresponds to posterior integrals of interest (e.g., posterior averages and so-called Bayes factors) receiving their dominant contribution from the vicinity of the main mode of p(x|r, θ), i.e., x_MAP (for a comprehensive review of Laplace's method in Bayesian applications see Kass et al. (1991)). In that case, we can Taylor expand the log-posterior to the first non-vanishing order around x_MAP (i.e., the second order, since the derivative vanishes at the maximum), obtaining the Gaussian approximation ...
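Since several of the excerpts above describe the same two-step recipe (locate the MAP point, then expand the log-posterior to second order), a minimal numerical sketch may help fix ideas. The toy posterior, the starting point, and the finite-difference Hessian below are illustrative assumptions, not taken from any of the cited papers.

```python
# Minimal sketch of the Laplace (saddle-point) approximation: fit a
# Gaussian N(x_MAP, H^{-1}) to a posterior, where H is the Hessian of the
# negative log-posterior at the mode. The target below is a hypothetical
# 2-D example, not a model from any cited paper.
import numpy as np
from scipy.optimize import minimize

def neg_log_posterior(x):
    # Unnormalized negative log-posterior of a mildly non-Gaussian density.
    return 0.5 * x[0]**2 + 0.5 * (x[1] - x[0]**2)**2

def hessian_fd(f, x, eps=1e-4):
    # Central-difference Hessian of a scalar function f at the point x.
    d = len(x)
    I, H = np.eye(d), np.zeros((d, d))
    for i in range(d):
        for j in range(d):
            H[i, j] = (f(x + eps*(I[i] + I[j])) - f(x + eps*(I[i] - I[j]))
                       - f(x - eps*(I[i] - I[j])) + f(x - eps*(I[i] + I[j]))
                      ) / (4.0 * eps**2)
    return H

res = minimize(neg_log_posterior, x0=np.zeros(2))   # x_MAP: posterior mode
H = hessian_fd(neg_log_posterior, res.x)            # curvature at the mode
cov = np.linalg.inv(H)                              # Gaussian covariance
print("x_MAP:", res.x)
print("Laplace covariance:\n", cov)
```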
Article
Full-text available
Stimulus reconstruction or decoding methods provide an important tool for understanding how sensory and motor information is represented in neural activity. We discuss Bayesian decoding methods based on an encoding generalized linear model (GLM) that accurately describes how stimuli are transformed into the spike trains of a group of neurons. The form of the GLM likelihood ensures that the posterior distribution over the stimuli that caused an observed set of spike trains is log concave so long as the prior is. This allows the maximum a posteriori (MAP) stimulus estimate to be obtained using efficient optimization algorithms. Unfortunately, the MAP estimate can have a relatively large average error when the posterior is highly nongaussian. Here we compare several Markov chain Monte Carlo (MCMC) algorithms that allow for the calculation of general Bayesian estimators involving posterior expectations (conditional on model parameters). An efficient version of the hybrid Monte Carlo (HMC) algorithm was significantly superior to other MCMC methods for gaussian priors. When the prior distribution has sharp edges and corners, on the other hand, the “hit-and-run” algorithm performed better than other MCMC methods. Using these algorithms, we show that for this latter class of priors, the posterior mean estimate can have a considerably lower average error than MAP, whereas for gaussian priors, the two estimators have roughly equal efficiency. We also address the application of MCMC methods for extracting nonmarginal properties of the posterior distribution. For example, by using MCMC to calculate the mutual information between the stimulus and response, we verify the validity of a computationally efficient Laplace approximation to this quantity for gaussian priors in a wide range of model parameters; this makes direct model-based computation of the mutual information tractable even in the case of large observed neural populations, where methods based on binning the spike train fail. Finally, we consider the effect of uncertainty in the GLM parameters on the posterior estimators.
... Our first method of calculating normalizing constants is Laplace's method [22]. Briefly, a second-order Taylor expansion of the log-posterior yields a multivariate normal approximation with mode equal to the posterior mode, and covariance equal to the inverse of the negative Hessian evaluated at the mode. ...
... Calculation of the normalizing constant can be a difficult task in higher dimensions. Use of Laplace's method is often effective when the posterior is approximately multivariate normal [22]. In this case, because of the non-normality of the posterior distributions, normalizing constants are based on the bridge sampling method. ...
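To make the quoted recipe concrete: if f is the negative log of an unnormalized density, Laplace's method estimates Z = ∫ exp(−f(x)) dx ≈ (2π)^{d/2} |H|^{−1/2} exp(−f(x*)), with x* the mode and H the Hessian of f there. A small sketch follows, checked against a 1-D standard Gaussian where the result happens to be exact; the function names are ours, not from the cited paper.

```python
# Sketch: Laplace estimate of a (log) normalizing constant from the mode
# and the Hessian of the negative log-density at the mode.
import numpy as np

def laplace_log_normalizer(f, x_mode, H):
    # f: negative log of the unnormalized density; H: Hessian of f at x_mode.
    d = len(x_mode)
    sign, logdet = np.linalg.slogdet(H)
    assert sign > 0, "Hessian must be positive definite at the mode"
    return 0.5 * d * np.log(2.0 * np.pi) - 0.5 * logdet - f(x_mode)

# Sanity check on a standard 1-D Gaussian, where Z = sqrt(2*pi) exactly:
f = lambda x: 0.5 * x[0]**2
print(np.exp(laplace_log_normalizer(f, np.array([0.0]), np.array([[1.0]]))))
# -> 2.5066... = sqrt(2*pi); exact here because the target is Gaussian.
```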
Article
We describe a methodology for model comparison in a Bayesian framework as applied to survival with a surviving fraction. This is illustrated using a case study of a randomized and controlled clinical trial investigating time until recurrence of depression. Posterior distributions are simulated using Metropolis-within-Gibbs Markov chain methods. Models reflecting the effects of covariates on the log odds of being in the surviving fraction, the log of the hazard rate, as well as both and neither are compared. Bayes factors for comparing the models are obtained by using the bridge sampling method of calculating normalizing constants.
... These computational difficulties can be circumvented, at least in some situations, by using Laplace's method to approximate an integral. Both [7] and [8] used the Laplace approximation to obtain computationally fast and accurate values for posterior means and other quantities. The Laplace approximation works under the condition that the log-likelihood function can be approximated by a quadratic function. ...
Article
Full-text available
Due to the high dimensional integration over latent variables, computing marginal likelihood and posterior distributions for the parameters of a general hierarchical model is a difficult task. The Markov Chain Monte Carlo (MCMC) algorithms are commonly used to approximate the posterior distributions. These algorithms, though effective, are computationally intensive and can be slow for large, complex models. As an alternative to the MCMC approach, the Laplace approximation (LA) has been successfully used to obtain fast and accurate approximations to the posterior mean and other derived quantities related to the posterior distribution. In the last couple of decades, LA has also been used to approximate the marginal likelihood function and the posterior distribution. In this paper, we show that the bias in the Laplace approximation to the marginal likelihood has substantial practical consequences.
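The abstract above concerns the finite-sample bias of the Laplace approximation to the marginal likelihood. A self-contained illustration is possible with a Beta-Bernoulli model, where the exact marginal likelihood is a Beta function; the counts below are hypothetical, and the flat prior is an assumption made for brevity.

```python
# Sketch: Laplace approximation to a marginal likelihood vs. the exact
# value, on a Bernoulli model with a flat prior on theta. The gap between
# the two numbers is the kind of O(1/n) bias discussed above.
import numpy as np
from scipy.special import betaln
from scipy.optimize import minimize_scalar

k, n = 7, 20  # hypothetical data: 7 successes in 20 trials

def neg_log_joint(theta):
    # -log[ p(data | theta) * p(theta) ] with a flat prior p(theta) = 1.
    return -(k * np.log(theta) + (n - k) * np.log(1.0 - theta))

opt = minimize_scalar(neg_log_joint, bounds=(1e-6, 1 - 1e-6), method="bounded")
t = opt.x                                     # posterior mode (= k/n here)
h = k / t**2 + (n - k) / (1.0 - t)**2         # second derivative at the mode
log_Z_laplace = -neg_log_joint(t) + 0.5 * np.log(2.0 * np.pi / h)
log_Z_exact = betaln(k + 1, n - k + 1)        # exact: Beta(k+1, n-k+1)
print(log_Z_laplace, log_Z_exact)             # ~ -14.27 vs ~ -14.30
```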
... In our study, we also consider a Gaussian approximation (GA) method to address this issue. The GA method, also known as the Laplace approximation, is discussed in [67,68]. First, using the density in Equation (6), we can derive the mode estimate for the vector U_m as follows: ...
Article
Full-text available
This paper proposes a parametric hierarchical model for functional data with an elliptical shape, using a Gaussian process prior to capturing the data dependencies that reflect systematic errors while modeling the underlying curved shape through a von Mises–Fisher distribution. The model definition, Bayesian inference, and MCMC algorithm are discussed. The effectiveness of the model is demonstrated through the reconstruction of curved trajectories using both simulated and real-world examples. The discussion in this paper focuses on two-dimensional problems, but the framework can be extended to higher-dimensional spaces, making it adaptable to a wide range of applications.
... Under the conjecture that η_β^y approaches a Gaussian measure in the limit of large data and small noise, we propose to replace Step 6 of Algorithm 1 with a Gaussian approximation step at the mode; this is sometimes referred to as the Laplace approximation to η_β^y (Kass et al. 1991). More precisely, letting z_β^y be a mode of η_β^y obtained by solving (15), we define the Gaussian measure ...
Article
Full-text available
The article presents a systematic study of the problem of conditioning a Gaussian random variable $\xi$ on nonlinear observations of the form $F \circ \boldsymbol{\phi}(\xi)$, where $\boldsymbol{\phi}: \mathcal{X} \rightarrow \mathbb{R}^N$ is a bounded linear operator and $F$ is nonlinear. Such problems arise in the context of Bayesian inference and recent machine learning-inspired PDE solvers. We give a representer theorem for the conditioned random variable $\xi \mid F \circ \boldsymbol{\phi}(\xi)$, stating that it decomposes as the sum of an infinite-dimensional Gaussian (which is identified analytically) as well as a finite-dimensional non-Gaussian measure. We also introduce a novel notion of the mode of a conditional measure by taking the limit of the natural relaxation of the problem, to which we can apply the existing notion of maximum a posteriori estimators of posterior measures. Finally, we introduce a variant of the Laplace approximation for the efficient simulation of the aforementioned conditioned Gaussian random variables towards uncertainty quantification.
... On the other hand, local approximation relies on the parametric space, particularly employing Gaussian parameters. This includes linearizing nonlinear functions through Taylor expansion, as seen in the Extended Kalman Filter (EKF) [9], employing statistical approximations based on the unscented transform and moment matching [10], and using the Laplace approximation [11]. ...
Article
Full-text available
Probabilistic state estimation is essential for robots navigating uncertain environments. Accurately and efficiently managing uncertainty in estimated states is key to robust robotic operation. However, nonlinearities in robotic platforms pose significant challenges that require advanced estimation techniques. Gaussian variational inference (GVI) offers an optimization perspective on the estimation problem, providing analytically tractable solutions and efficiencies derived from the geometry of Gaussian space. We propose a Sequential Gaussian Variational Inference (S-GVI) method to address nonlinearity and provide efficient sequential inference processes. Our approach integrates sequential Bayesian principles into the GVI framework, which are addressed using statistical approximations and gradient updates on the information geometry. Validations through simulations and real-world experiments demonstrate significant improvements in state estimation over the Maximum A Posteriori (MAP) estimation method.
... and use Laplace's approximation [27] to construct the initial mean and covariance in (28), as shown in the subsequent equation: ...
Preprint
Practical Bayes filters often assume the state distribution at each time step to be Gaussian for computational tractability, resulting in the so-called Gaussian filters. When facing nonlinear systems, Gaussian filters such as the extended Kalman filter (EKF) or unscented Kalman filter (UKF) typically rely on certain linearization techniques, which can introduce large estimation errors. To address this issue, this paper reconstructs the prediction and update steps of Gaussian filtering as solutions to two distinct optimization problems, whose optimality conditions are found to have analytical forms from Stein's lemma. It is observed that the stationary point for the prediction step requires calculating the first two moments of the prior distribution, which is equivalent to that step in existing moment-matching filters. In the update step, instead of linearizing the model to approximate the stationary points, we propose an iterative approach that directly minimizes the update step's objective to avoid linearization errors. To perform steepest descent on the Gaussian manifold, we derive its natural gradient, which leverages the Fisher information matrix to adjust the gradient direction, accounting for the curvature of the parameter space. Combining this update step with moment matching in the prediction step, we introduce a new iterative filter for nonlinear systems called the Natural Gradient Gaussian Approximation filter, or NANO filter for short. We prove that the NANO filter locally converges to the optimal Gaussian approximation at each time step. The estimation error is proven to be exponentially bounded for nearly linear measurement equations and low noise levels by constructing a supermartingale-like inequality across consecutive time steps.
... Approach #1: Laplace approximation. The Laplace approximation [54] provides Gaussian approximations of the individual posteriors. The Laplace approximation is obtained by taking the second-order Taylor expansion around the maximum a posteriori (MAP) estimate. ...
Article
Full-text available
Symbolic regression with polynomial neural networks and polynomial neural ordinary differential equations (ODEs) are two recent and powerful approaches for equation recovery of many science and engineering problems. However, these methods provide point estimates for the model parameters and are currently unable to accommodate noisy data. We address this challenge by developing and validating the following Bayesian inference methods: the Laplace approximation, Markov Chain Monte Carlo (MCMC) sampling methods, and variational inference. We have found the Laplace approximation to be the best method for this class of problems. Our work can be easily extended to the broader class of symbolic neural networks to which the polynomial neural network belongs.
... A derivation can be found in Appendix A.4. We propose to model the probability of the adapted parameters given each support point as a Gaussian distribution, N(θ_i, Σ_i), by means of Laplace's approximation (Kass et al., 1991). In particular, we define the maximum a posteriori (MAP) estimate to be the original GBML formulation, i.e., θ̂_i = θ_0 − α∇_{θ_0} L(θ_0, D_i^S). ...
Preprint
Full-text available
Given a finite set of sample points, meta-learning algorithms aim to learn an optimal adaptation strategy for new, unseen tasks. Often, this data can be ambiguous as it might belong to different tasks concurrently. This is particularly the case in meta-regression tasks. In such cases, the estimated adaptation strategy is subject to high variance due to the limited amount of support data for each task, which often leads to sub-optimal generalization performance. In this work, we address the problem of variance reduction in gradient-based meta-learning and formalize the class of problems prone to this, a condition we refer to as \emph{task overlap}. Specifically, we propose a novel approach that reduces the variance of the gradient estimate by weighing each support point individually by the variance of its posterior over the parameters. To estimate the posterior, we utilize the Laplace approximation, which allows us to express the variance in terms of the curvature of the loss landscape of our meta-learner. Experimental results demonstrate the effectiveness of the proposed method and highlight the importance of variance reduction in meta-learning.
... On the other hand, local approximation relies on the parametric space, particularly employing Gaussian parameters. This includes linearizing nonlinear functions through Taylor expansion, as seen in the Extended Kalman Filter (EKF) [11], employing statistical approximations based on the unscented transform and moment matching [12], and using the Laplace approximation [13]. ...
Preprint
Probabilistic state estimation is essential for robots navigating uncertain environments. Accurately and efficiently managing uncertainty in estimated states is key to robust robotic operation. However, nonlinearities in robotic platforms pose significant challenges that require advanced estimation techniques. Gaussian variational inference (GVI) offers an optimization perspective on the estimation problem, providing analytically tractable solutions and efficiencies derived from the geometry of Gaussian space. We propose a Sequential Gaussian Variational Inference (S-GVI) method to address nonlinearity and provide efficient sequential inference processes. Our approach integrates sequential Bayesian principles into the GVI framework, which are addressed using statistical approximations and gradient updates on the information geometry. Validations through simulations and real-world experiments demonstrate significant improvements in state estimation over the Maximum A Posteriori (MAP) estimation method.
... We may then be motivated to consider a Taylor expansion of L about this point, dropping the prior Π_1, to estimate the distribution near the maximum. Going up to second order is equivalent to approximating L as a multivariate normal, i.e., Laplace's method [33, 34]:
Preprint
The design of astronomical hardware operating at the diffraction limit requires optimization of physical optical simulations of the instrument with respect to desired figures of merit, such as throughput or astrometric accuracy. These systems can be high dimensional, with highly nonlinear relationships between outputs and the adjustable parameters of the hardware. In this series of papers we present and apply dLux, an open-source end-to-end differentiable optical modelling framework. Automatic differentiation enables not just efficient high-dimensional optimization of astronomical hardware designs, but also Bayesian experimental design directly targeting the precision of experimental outcomes. Automatic second derivatives enable the exact and numerically stable calculation of parameter covariance forecasts, and higher derivatives of these enable direct optimization of these forecasts. We validate this method against analytic theory and illustrate its utility in evaluating the astrometric precision of a parametrized telescope model, and the design of a diffractive pupil to achieve optimal astrometric performance for exoplanet searches. The source code and tutorial software are open source and publicly available, targeting researchers who may wish to harness dLux for their own optical simulation problems.
... In short, applications of Laplace's method can be found throughout statistics, probability, and machine learning (not to mention its uses in mathematics and physics). Examples include Bayesian computation and inference (e.g., Kass et al. (1991)), higher-order asymptotics (e.g., Shun and McCullagh (1995)), mean-field theory (where Laplace's method is often called the saddle point approximation, e.g., Mézard and Montanari (2009), Chapter 2), Gaussian processes (e.g., Rasmussen and Williams (2006), Chapter 3), and Bayesian deep learning (e.g., Daxberger et al. (2021)). Now we present a formal statement of the validity of Laplace's approximation (11). ...
Preprint
Full-text available
We study approximations to the Moreau envelope -- and infimal convolutions more broadly -- based on Laplace's method, a classical tool in analysis which ties certain integrals to suprema of their integrands. We believe the connection between Laplace's method and infimal convolutions is generally deserving of more attention in the study of optimization and partial differential equations, since it bears numerous potentially important applications, from proximal-type algorithms to solving Hamilton-Jacobi equations.
... Under the conjecture that η_β^y approaches a Gaussian measure in the limit of large data and small noise, we propose to replace Step 6 of Algorithm 1 with a Gaussian approximation step at the mode; this is sometimes referred to as the Laplace approximation to η_β^y [91]. More precisely, letting z_β^y be a mode of η_β^y obtained by solving (13), we define the Gaussian measure ...
Preprint
The article presents a systematic study of the problem of conditioning a Gaussian random variable $\xi$ on nonlinear observations of the form $F \circ \phi(\xi)$, where $\phi: \mathcal{X} \to \mathbb{R}^N$ is a bounded linear operator and $F$ is nonlinear. Such problems arise in the context of Bayesian inference and recent machine learning-inspired PDE solvers. We give a representer theorem for the conditioned random variable $\xi \mid F \circ \phi(\xi)$, stating that it decomposes as the sum of an infinite-dimensional Gaussian (which is identified analytically) as well as a finite-dimensional non-Gaussian measure. We also introduce a novel notion of the mode of a conditional measure by taking the limit of the natural relaxation of the problem, to which we can apply the existing notion of maximum a posteriori estimators of posterior measures. Finally, we introduce a variant of the Laplace approximation for the efficient simulation of the aforementioned conditioned Gaussian random variables towards uncertainty quantification.
... Using Laplace's method, the MLE covariance matrix Cov[x̂ − x] is approximately the inverse of the NLS Hessian [26]. The covariance matrix related to only the anchors can be simply cropped out from the complete covariance matrix of all device positions by selecting the corresponding rows and columns. ...
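The covariance recipe in this excerpt (invert the NLS Hessian at the optimum) is easy to sketch. The toy anchor layout, the noise level, and the Gauss-Newton surrogate J^T J for the Hessian below are our own illustrative assumptions, not the cited system's actual geometry.

```python
# Sketch: after a nonlinear least-squares fit, approximate the MLE
# covariance Cov[x_hat - x] by the inverse of the (Gauss-Newton) Hessian.
# Toy 2-D positioning problem with three hypothetical anchors.
import numpy as np
from scipy.optimize import least_squares

anchors = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
x_true, sigma = np.array([3.0, 4.0]), 0.05   # true tag position, noise std
rng = np.random.default_rng(0)
ranges = np.linalg.norm(anchors - x_true, axis=1) + sigma * rng.normal(size=3)

def residuals(x):
    # Whitened range residuals, so J^T J approximates the Fisher information.
    return (np.linalg.norm(anchors - x, axis=1) - ranges) / sigma

sol = least_squares(residuals, x0=np.array([1.0, 1.0]))
H = sol.jac.T @ sol.jac                      # Gauss-Newton Hessian at optimum
cov = np.linalg.inv(H)                       # approximate covariance
print("estimate:", sol.x)
print("1-sigma uncertainties:", np.sqrt(np.diag(cov)))
```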
Article
Full-text available
We show how the ease of use of an Ultra-Wide-Band (UWB) system is improved by removing systematic errors (bias) at the device level to improve accuracy, and by applying a simple procedure that automates the calibration process at the system level to reduce manual effort. At the device level, we discern the different sources of bias and establish a method that determines their values, for specific hardware and for individual devices. Our comprehensive approach includes simple, easy-to-implement methodologies for compensating these biases, resulting in a significant improvement in ranging accuracy. The mean ranging error has been reduced from 0.15 m to 0.007 m, and the 3-sigma error margin has decreased from 0.277 m to approximately 0.103 m. To demonstrate this, a dedicated test setup was built. At the system level, we developed a method that avoids measuring all anchor positions one by one by exploiting increased redundancy from anchor-to-anchor and anchor-to-tag ranges, and automatically calculating the anchor topology (relative positions between anchors). Nonlinear least squares (NLS) provides the maximum likelihood estimate (MLE) of the anchor positions and their uncertainty. This approach not only refines the accuracy of tag localization but also offers a predictive measure of its uncertainty, giving users a clearer understanding of the system's capabilities in real-world scenarios. This system-level enhancement is further complemented by the integration of a ranging protocol called Automatic UWB Ranging Any-to-any (AURA), which offers additional layers of flexibility, reliability and ease of deployment to the UWB localization process.
... Note that in Bayesian design, the utility is always a function of the posterior distribution, which generally means that inference is performed many thousands of times, since the posterior p(θ | y, d) must be evaluated for each future data set {θ^(m), y^(m)} that is drawn from the joint distribution p(θ, y | d). As a computationally efficient approximation to the posterior distribution, we employ the Laplace approximation (Kass et al., 1991), which has the following form: ...
Article
Full-text available
Optimal design facilitates intelligent data collection. In this paper, we introduce a fully Bayesian design approach for spatial processes with complex covariance structures, like those typically exhibited in natural ecosystems. Coordinate exchange algorithms are commonly used to find optimal design points. However, collecting data at specific points is often infeasible in practice. Currently, there is no provision to allow for flexibility in the choice of design. Accordingly, we also propose an approach to find Bayesian sampling windows, rather than points, via Gaussian process emulation to identify regions of high design efficiency across a multi-dimensional space. These developments are motivated by two ecological case studies: monitoring water temperature in a river network system in the northwestern United States and monitoring submerged coral reefs off the north-west coast of Australia.
... Using Laplace's method, the MLE covariance matrix Cov[x̂ − x] is approximately the inverse of the NLS Hessian [13]. The covariance matrix related to only the anchors can be simply cropped out from the complete covariance matrix of all device positions by selecting the relevant rows and columns. ...
Conference Paper
Full-text available
We show how the accuracy of Ultra-Wide-Band (UWB) localization is improved by removing systematic errors (bias) at both the device level and the system level. At the device level, we discern the different sources of bias and establish a method that determines their values, for specific hardware and for individual devices. To this end, a dedicated test setup was built. At the system level, we developed a method that avoids measuring all anchor positions one by one by exploiting increased redundancy from anchor-to-anchor and anchor-to-tag ranges, and automatically calculating the anchor topology (relative positions between anchors). Nonlinear least squares (NLS) provides the maximum likelihood estimate (MLE) of the anchor positions and their uncertainty. This also allows an accuracy estimate to be provided for the resulting tag localization.
... The Laplace approximation [52] provides Gaussian approximations of the individual posteriors. The Laplace approximation is obtained by taking the second-order Taylor expansion around the maximum a posteriori (MAP) estimate found by maximum likelihood estimation (MLE). ...
Preprint
Symbolic regression with polynomial neural networks and polynomial neural ordinary differential equations (ODEs) are two recent and powerful approaches for equation recovery of many science and engineering problems. However, these methods provide point estimates for the model parameters and are currently unable to accommodate noisy data. We address this challenge by developing and validating the following Bayesian inference methods: the Laplace approximation, Markov Chain Monte Carlo (MCMC) sampling methods, and variational inference. We have found the Laplace approximation to be the best method for this class of problems. Our work can be easily extended to the broader class of symbolic neural networks to which the polynomial neural network belongs.
... In particular, as n diverges, I = Ĩ(1 + O(n^{-1})); see e.g. (Tierney et al., 1989; Kass et al., 1991). Eq. (5) can be applied to any regular statistical model and stands as a viable general approach for evaluating the marginal likelihoods involved in the definitions of Bayes factors, with an approximation error of order O(n^{-1}). ...
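The BIC shortcut alluded to here can be stated in two lines: since log p(y | M) ≈ −BIC/2 up to terms the Laplace analysis controls, a Bayes factor can be approximated by exponentiating half the BIC difference. A sketch with hypothetical maximized log-likelihoods follows; the coarser BIC route drops the O(1) prior-dependent terms that the full Laplace expansion retains.

```python
# Sketch: BIC-based approximation to a Bayes factor between two models.
# log p(y|M) ~ -BIC/2, hence BF_10 ~ exp((BIC_0 - BIC_1) / 2).
# The log-likelihoods and parameter counts below are hypothetical.
import numpy as np

def bic(log_lik_hat, k, n):
    # k: number of free parameters, n: number of observations.
    return -2.0 * log_lik_hat + k * np.log(n)

n = 500                                   # sample size
bic0 = bic(log_lik_hat=-712.4, k=2, n=n)  # e.g. unit-root null model
bic1 = bic(log_lik_hat=-708.1, k=3, n=n)  # e.g. stationary alternative
bf_10 = np.exp((bic0 - bic1) / 2.0)       # evidence in favor of model 1
print(f"BIC_0={bic0:.1f}, BIC_1={bic1:.1f}, approximate BF_10={bf_10:.2f}")
```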
Preprint
Full-text available
This paper introduces a feasible and practical Bayesian method for unit root testing in financial time series. We propose a convenient approximation of the Bayes factor in terms of the Bayesian Information Criterion as a straightforward and effective strategy for testing the unit root hypothesis. Our approximate approach relies on few assumptions, is of general applicability, and preserves a satisfactory error rate. Among its advantages, it does not require the prior distribution on the model's parameters to be specified. Our simulation study and empirical application to real exchange rates show close agreement between the suggested simple approach and both Bayesian and non-Bayesian alternatives.
... The computation of the expected information gain, however, may become extremely expensive when the outcomes of the experiment are modeled as functions of the solution of Partial Differential Equations (PDEs). Using the Laplace approximation [4][5][6][7], we proposed in [1] a fast approach for the estimation of the expected information gain and analyzed the rates of different dominant error terms with respect to the amount of data in each experimental scenario, provided that the parameters can be determined completely by the experiments in the sense that a single dominant maximum a posteriori probability (MAP) estimate exists. When both the determinant of the posterior covariance matrix and the prior probability density functions (pdf) have enough regularities with respect to the random parameters, we demonstrated, with several nonlinear examples involving the solutions of PDEs, that sparse quadratures can be employed to carry out the resulting integrations with high efficiency. ...
Article
In [1], a new method based on the Laplace approximation was developed to accelerate the estimation of the post-experimental expected information gains (Kullback-Leibler divergence) in model parameters and predictive quantities of interest in the Bayesian framework. A closed-form asymptotic approximation of the inner integral and the order of the corresponding dominant error term were obtained in the cases where the parameters are determined by the experiment. In this work, we extend that method to the general case where the model parameters cannot be determined completely by the data from the proposed experiments. We carry out the Laplace approximations in the directions orthogonal to the null space of the Jacobian matrix of the data model with respect to the parameters, so that the information gain can be reduced to an integration against the marginal density of the transformed parameters that are not determined by the experiments. Furthermore, the expected information gain can be approximated by an integration over the prior, where the integrand is a function of the posterior covariance matrix projected over the aforementioned orthogonal directions. To deal with the issue of dimensionality in a complex problem, we use either Monte Carlo sampling or sparse quadratures for the integration over the prior probability density function, depending on the regularity of the integrand function. We demonstrate the accuracy, efficiency and robustness of the proposed method via several nonlinear under-determined test cases. They include the designs of the scalar parameter in a one dimensional cubic polynomial function with two unidentifiable parameters forming a linear manifold, and the boundary source locations for impedance tomography in a square domain, where the unknown parameter is the conductivity, which is represented as a random field.
... Under some regularity conditions, Tierney and Kadane obtained in [18] Laplace-based second-order approximations to the posterior expectation of real positive functions on multi-dimensional parameter spaces, together with their accuracy, which they called fully exponential forms. These results were then generalized and reviewed in their joint work with Kass [19][20][21], prompting a fruitful line of research that is still active today. We refer to the papers of Tierney, Kadane and Kass for the statements of the conditions under which the approximations employed in this work are valid. ...
... The computation of the expected information gain, however, may become extremely expensive when the outcomes of the experiment are modeled as functions of the solution of Partial Differential Equations (PDEs). Using the Laplace approximation [3][4][5][6], we proposed in [7] a fast approach for the estimation of the expected information gain and analyzed the rates of different dominant error terms with respect to the amount of data in each experimental scenario, provided that the parameters can be determined completely by the experiments in the sense that a single dominant maximum a posteriori probability (MAP) estimate exists. When both the determinant of the posterior covariance matrix and the prior probability density functions (pdf) have enough regularities with respect to the random parameters, we demonstrated, with several nonlinear examples involving the solutions of PDEs, that sparse quadratures can be employed to carry out the resulting integrations with high efficiency. ...
Article
Shannon-type expected information gain can be used to evaluate the relevance of a proposed experiment subjected to uncertainty. The estimation of such gain, however, relies on a double-loop integration, whose numerical evaluation in multi-dimensional cases, e.g., using Monte Carlo sampling methods, is computationally too expensive for realistic physical models, especially for those involving the solution of partial differential equations. In this work, we present a new methodology, based on the Laplace approximation for the integration of the posterior probability density function (pdf), to accelerate the estimation of the expected information gains in the model parameters and predictive quantities of interest. We obtain a closed-form approximation of the inner integral and the corresponding dominant error term in the cases where parameters are determined by the experiment, such that only a single-loop integration is needed to carry out the estimation of the expected information gain. To deal with the issue of dimensionality in a complex problem, we use a sparse quadrature for the integration over the prior pdf. We demonstrate the accuracy, efficiency and robustness of the proposed method via several nonlinear numerical examples, including the designs of the scalar parameter in a one-dimensional cubic polynomial function, the design of the same scalar in a modified function with two indistinguishable parameters, the resolution width and measurement time for a blurred single peak spectrum, and the boundary source locations for impedance tomography in a square domain.
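The single-loop structure this abstract describes can be sketched compactly: under a Laplace (Gaussian) posterior, the inner KL integral collapses to a log-determinant ratio, leaving one Monte Carlo loop over the prior. The toy one-parameter observation model, the design variable xi, and the evaluation of the linearization at the sampled parameter (rather than at a data-dependent MAP point) are all simplifying assumptions of this sketch.

```python
# Sketch: Laplace-based expected information gain (EIG) for a design xi.
# With a Gaussian prior and a linearized model, EIG ~ E_theta[ 0.5 * log
# ( det(prior_cov) / det(post_cov(theta)) ) ]: a single loop over the prior.
import numpy as np

rng = np.random.default_rng(1)
sigma0, sigma_noise = 1.0, 0.1            # prior std, observation noise std
prior_cov = sigma0**2 * np.eye(1)

def jacobian(theta, xi):
    # d/dtheta of a hypothetical observation model y = (xi * theta)^3 + noise.
    return np.array([[3.0 * xi * (xi * theta)**2]])

def eig_laplace(xi, n_mc=2000):
    total = 0.0
    for _ in range(n_mc):
        theta = sigma0 * rng.normal()     # draw a parameter from the prior
        J = jacobian(theta, xi)
        post_cov = np.linalg.inv(J.T @ J / sigma_noise**2
                                 + np.linalg.inv(prior_cov))
        total += 0.5 * (np.linalg.slogdet(prior_cov)[1]
                        - np.linalg.slogdet(post_cov)[1])
    return total / n_mc

print(eig_laplace(xi=0.5), eig_laplace(xi=1.0))  # larger xi: more informative
```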
... The path integral method presented here can be regarded as a functional version of the Laplace approximation used in the field of statistics and machine learning (Kass, Tierney, & Kadane, 1991;Rasmussen & Williams, 2006). ...
Article
Full-text available
In many cortical areas, neural spike trains do not follow a Poisson process. In this study, we investigate a possible benefit of non-Poisson spiking for information transmission by studying the minimal rate fluctuation that can be detected by a Bayesian estimator. The idea is that an inhomogeneous Poisson process may make it difficult for downstream decoders to resolve subtle changes in rate fluctuation, but by using a more regular non-Poisson process, the nervous system can make rate fluctuations easier to detect. We evaluate the degree to which regular firing reduces the rate fluctuation detection threshold. We find that the threshold for detection is reduced in proportion to the coefficient of variation of interspike intervals.
... The path integral method presented here can be regarded as a functional version of the Laplace approximation used in the field of machine learning [22, 23, 24] ...
Article
Full-text available
We investigate a Bayesian method for systematically capturing the underlying firing rate of a neuron. Any method of rate estimation requires a prior assumption about the flatness of the underlying rate, which can be represented by a Gaussian process prior. This Bayesian framework enables us to adjust that very assumption by taking into account the distribution of the raw data: a hyperparameter of the Gaussian process prior is selected so that the marginal likelihood is maximized. It turns out that this hyperparameter diverges for spike sequences derived from a moderately fluctuating rate. By utilizing the path integral method, we demonstrate two cases that exhibit the divergence continuously and discontinuously.
... However, if the voltage noise parameter, σ, is small enough, the integrand in Eq. (12) (considered as a function of V ) will be sharply peaked around its maximum, allowing us to approximate the integral using the Laplace method (see, e.g., Kass et al. (1991) and Berger (1993)). To a first approximation, this will result in ...
Article
Full-text available
Recent advances in experimental stimulation methods have raised the following important computational question: how can we choose a stimulus that will drive a neuron to output a target spike train with optimal precision, given physiological constraints? Here we adopt an approach based on models that describe how a stimulating agent (such as an injected electrical current or a laser light interacting with caged neurotransmitters or photosensitive ion channels) affects the spiking activity of neurons. Based on these models, we solve the reverse problem of finding the best time-dependent modulation of the input, subject to hardware limitations as well as physiologically inspired safety measures, that causes the neuron to emit a spike train that with highest probability will be close to a target spike train. We adopt fast convex constrained optimization methods to solve this problem. Our methods can potentially be implemented in real time and may also be generalized to the case of many cells, suitable for neural prosthesis applications. With the use of biologically sensible parameters and constraints, our method finds stimulation patterns that generate very precise spike trains in simulated experiments. We also tested the intracellular current injection method on pyramidal cells in mouse cortical slices, quantifying the dependence of spiking reliability and timing precision on constraints imposed on the applied currents.
... But the integrals required for a fully Bayesian analysis are typically not available in closed form and therefore a numerical or analytical approximation is required. However, analytical approximation approaches often fail to give entirely satisfactory results, largely due to the high dimensionality of the parameter space involved (Kass et al., 1991). In this dissertation, we will demonstrate that a highly effective Bayesian computation strategy for clustering curve data is available, based on various Markov chain Monte Carlo (MCMC) ...
Article
Full-text available
Thesis (Ph.D.), University of Washington, 2003. In this dissertation, we propose a general Bayesian hierarchical mixture model for clustering curve data. Instead of clustering based on the high-dimensional observed curve data, we construct the hierarchy in such a way that lower-dimensional random effects, which characterize the curves, form the basis for clustering. This model provides a flexible framework that can be tuned to the specific context, and allows information regarding curve forms, measurement errors and other prior knowledge to be incorporated. Under this model, the order of observations within a curve is explicitly taken into account, and the number of clusters can be treated as unknown and inferred from the data. Computation is carried out via an implementation of the birth-death MCMC algorithm. A preliminary filtering algorithm is devised in order to reduce the computational burden. We also propose novel quantitative measures of the strength of the resultant clusters in terms of sensitivity and specificity, which are not easily evaluated with traditional approaches. Substantive application of this model to a set of gene expression experiments demonstrates that substantial insight into yeast transcription programs can be gained through such model-based analysis.
... Tierney and Kadane showed that the resulting approximation has good accuracy, since the leading terms of the approximation errors in the numerator and denominator cancel out. Tierney, Kass and Kadane (1987, 1989a, 1989b), Kass, Tierney and Kadane (1988, 1989, 1990, 1991), and Wong and Li (1992) worked on extensions of this methodology. More recently, the Laplace approximation was used to approximate Bayes factors for nested models by Kass and Vaidyanathan (1992), for generalized linear models by Raftery (1996), and for variance component models by Pauler, Wakefield, and Kass (1999). ...
Article
This research consists of two parts. The first part examines the posterior probability integrals for a family of linear models which arises from the work of Hart, Koen and Lombard (2003). Applying Laplace's method to these integrals is not entirely straightforward. One of the requirements is to analyze the asymptotic behavior of the information matrices as the sample size tends to infinity. This requires a number of analytic tricks, including viewing our covariance matrices as tending to differential operators. The use of differential operators and their Green's functions can provide a convenient and systematic method to asymptotically invert the covariance matrices. Once we have found the asymptotic behavior of the information matrices, we will see that in most cases BIC provides a reasonable approximation to the log of the posterior probability and Laplace's method gives more terms in the expansion and hence provides a slightly better approximation. In other cases, a number of pathologies will arise. We will see that in one case, BIC does not provide an asymptotically consistent estimate of the posterior probability; however, the more general Laplace's method will provide such an estimate. In another case, we will see that a naive application of Laplace's method will give a misleading answer and Laplace's method must be adapted to give the correct answer. The second part uses numerical methods to compute the "exact" posterior probabilities and compare them to the approximations arising from BIC and Laplace's method.
... It is therefore of interest to determine whether a conjugate prior can be found that represents vague prior knowledge about θ, where θ̂ is the mode of p(θ). Such an approximation corresponds to a naive standard form of the Laplace approximation to p(θ); to some extent, the accuracy of p̂(m) depends on the form of p(θ) (see, for example, Kass, Tierney and Kadane, 1991). Loosely speaking, the more "normal" p(θ) is, the more accurate p̂(m) can be expected to be. ...
Article
Full-text available
In this article we consider the Bayesian statistical analysis of a simple Galton-Watson process. Problems of interest include estimation of the offspring distribution, classification of the process, and prediction. We propose two simple analytic approximations to the posterior marginal distribution of the reproduction mean. This posterior distribution suffices to classify the process. In order to assess the accuracy of these approximations, a comparison is provided with a computationally more expensive approximation obtained via standard Monte Carlo techniques. Similarly, a fully analytic approximation to the predictive distribution of the future size of the population is discussed. Sampling-based and hybrid approximations to this distribution are also considered. Finally, we present some illustrative examples.
Chapter
Bayesian modelling has come a long way from the first appearance of the Bayes theorem. Now it is being applied in almost every scientific field. Scientists and practitioners are choosing Bayesian methodologies over the classical frequentist framework because of the rigorous mathematical framework and the ability to incorporate prior information by defining a prior distribution on the possible values of the unknown parameter. In this chapter we briefly discuss various aspects of Bayesian modelling. Starting from a short introduction to conditional probability, the Bayes theorem, different types of prior distributions, hierarchical and empirical Bayes, and point and interval estimation, we describe Bayesian regression modelling in more detail. We then survey an array of Bayesian computational techniques, viz. Laplace approximations, the E-M algorithm, Monte Carlo sampling, importance sampling, Markov chain Monte Carlo algorithms, the Gibbs sampler and the Metropolis-Hastings algorithm. We also discuss model selection tools (e.g. DIC, WAIC, cross-validation, Bayes factor, etc.) and convergence diagnostics for MCMC algorithms (e.g. Geweke diagnostics, effective sample size, Gelman-Rubin diagnostic, etc.). We end the chapter with some applications of Bayesian modelling and discuss some of the drawbacks of using Bayesian modelling in practice. Keywords: Bayesian modelling; Prior distribution; Bayesian regression; Bayesian computation; Markov chain Monte Carlo; Gibbs sampler; Metropolis-Hastings algorithm; Bayesian model selection; WAIC; Cross-validation; Bayes factor
Chapter
This article has no abstract.
Chapter
We give a personal view of what Information Geometry is, and what it is becoming, by exploring a number of key topics: dual affine families, boundaries, divergences, tensorial structures, and dimensionality. For each, we start with a graphical illustrative example (Sect. 1.1), give an overview of the relevant theory and key references (Sect. 1.2), and finish with a number of applications of the theory (Sect. 1.3). We treat ‘Information Geometry’ as an evolutionary term, deliberately not attempting a comprehensive definition. Rather, we illustrate how both the geometries used and application areas are rapidly developing.
Chapter
This article has no abstract.
Article
Full-text available
A major effort in systems biology is the development of mathematical models that describe complex biological systems at multiple scales and levels of abstraction. Determining the topology (the set of interactions) of a biological system from observations of the system's behavior is an important and difficult problem. Here we present and demonstrate new methodology for efficiently computing the probability distribution over a set of topologies based on consistency with existing measurements. Key features of the new approach include derivation in a Bayesian framework, incorporation of prior probability distributions of topologies and parameters, and use of an analytically integrable linearization based on the Fisher information matrix that is responsible for large gains in efficiency. The new method was demonstrated on a collection of four biological topologies representing a kinase and phosphatase that operate in opposition to each other with either processive or distributive kinetics, giving 8-12 parameters for each topology. The linearization produced an approximate result very rapidly (CPU minutes) that was highly accurate on its own, as compared to a Monte Carlo method guaranteed to converge to the correct answer but at greater cost (CPU weeks). The Monte Carlo method developed and applied here used the linearization method as a starting point and importance sampling to approach the Bayesian answer in acceptable time. Other inexpensive methods to estimate probabilities produced poor approximations for this system, with likelihood estimation showing its well-known bias toward topologies with more parameters and the Akaike and Schwarz Information Criteria showing a strong bias toward topologies with fewer parameters. These results suggest that this linear approximation may be an effective compromise, providing an answer whose accuracy is near the true Bayesian answer, but at a cost near the common heuristics.
Article
We develop a Bayesian approach for inferring the joint distribution of several demographic variables when in possession of only the marginal distribution of each variable and prior information about the correlations among the variables. The approach is applied to four marketing problems, two involving direct mail advertising and two involving the location of a retail site, using public domain U.S. Census Bureau data for Sioux Falls and the state of South Dakota. The Bayesian approach has several advantages, which we discuss. We compute posterior quantities using importance sampling and compare this method to Laplace’s approximation and the usual normal approximation. The Bayesian approach does a good job of recovering the joint distribution of the demographic variables and provides a measure of uncertainty about the resulting estimates. Hypothesis testing, highest posterior density regions, and decision problems are demonstrated.
Article
Full-text available
We present a computationally tractable approach to dynamically measure statistical dependencies in multivariate non-Gaussian signals. The approach makes use of extensions of independent component analysis to calculate information coupling, as a proxy measure for mutual information, between multiple signals and can be used to estimate uncertainty associated with the information coupling measure in a straightforward way. We empirically validate relative accuracy of the information coupling measure using a set of synthetic data examples and showcase practical utility of using the measure when analysing multivariate financial time series.
Article
Full-text available
The Dirac delta function has been used successfully in mathematical physics for many years. The purpose of this article is to bring attention to several useful applications of this function in mathematical statistics. Some of these applications include a unified representation of the distribution of a function (or functions) of one or several random variables, which may be discrete or continuous, a proof of a well-known inequality, and a representation of a density function in terms of its noncentral moments.
Article
Efficient algorithms were developed for estimating model parameters from measured data, even in the presence of gross errors. In addition to point estimates of parameters, however, assessments of uncertainty are needed. Linear approximations provide standard errors, but they can be misleading when applied to models that are substantially nonlinear. To overcome this difficulty, profiling methods were developed for the case in which the regressor variables are error free. These methods provide accurate nonlinear confidence regions, but become expensive for a large number of parameters. Here, these profiling methods are adapted to errors-in-variables measurement models with many incidental parameters. Laplace's method is used to integrate out the incidental parameters associated with the measurement errors, and then profiling methods are applied to obtain approximate confidence contours for the parameters. This approach is computationally efficient, requires few function evaluations, and can be applied to large-scale problems. It is useful when certain measurement errors (such as input variables) are relatively small, but not so small that they can be ignored.
Article
Full-text available
We observe a training set Q composed of l labeled samples {(X_1, θ_1), ..., (X_l, θ_l)} and u unlabeled samples {X'_1, ..., X'_u}. The labels θ_i are independent random variables satisfying Pr{θ_i = 1} = η, Pr{θ_i = 2} = 1 − η. The labeled observations X_i are independently distributed with conditional density f_{θ_i}(·) given θ_i. Let (X_0, θ_0) be a new sample, independently distributed as the samples in the training set. We observe X_0 and we wish to infer the classification θ_0. In this paper we first assume that the distributions f_1(·) and f_2(·) are given and that the mixing parameter is unknown. We show that the relative value of labeled and unlabeled samples in reducing the risk of optimal classifiers is the ratio of the Fisher informations they carry about the parameter η. We then assume that two densities g_1(·) and g_2(·) are given, but we do not know whether g_1(·) = f_1(·) and g_2(·) = f_2(·) or if the opposite holds, nor do we know η. Thus the learning problem consists of both estimating the optimum partition of the observation space and assigning the classifications to the decision regions. Here, we show that labeled samples are necessary to construct a classification rule and that they are exponentially more valuable than unlabeled samples.
Article
Full-text available
This paper is a survey of the major techniques and approaches available for the numerical approximation of integrals in statistics. We classify these into five broad categories; namely, asymptotic methods, importance sampling, adaptive importance sampling, multiple quadrature and Markov chain methods. Each method is discussed, giving an outline of the basic supporting theory and particular features of the technique. Conclusions are drawn concerning the relative merits of the methods based on the discussion and their application to three examples. The following broad recommendations are made. Asymptotic methods should only be considered in contexts where the integrand has a dominant peak with approximate ellipsoidal symmetry. Importance sampling, and preferably adaptive importance sampling, based on a multivariate Student t should be used instead of asymptotic methods in such a context. Multiple quadrature, and in particular subregion adaptive integration, are the algorithms of choice for...
Article
Full-text available
The key quantity needed for Bayesian hypothesis testing and model selection is the marginal likelihood for a model, also known as the integrated likelihood, or the marginal probability of the data. In this paper we describe a way to use posterior simulation output to estimate marginal likelihoods. We describe the basic Laplace-Metropolis estimator for models without random effects. For models with random effects the compound Laplace-Metropolis estimator is introduced. This estimator is applied to data from the World Fertility Survey and shown to give accurate results. Batching of simulation output is used to assess the uncertainty involved in using the compound Laplace-Metropolis estimator. The method allows us to test for the effects of independent variables in a random effects model, and also to test for the presence of the random effects. KEY WORDS: Laplace-Metropolis estimator; Random effects models; Marginal likelihoods; Posterior simulation; World Fertility Survey.
Article
Full-text available
This paper investigates the asymptotic performance of Bayesian target recognition algorithms using deformable-template representations. Rigid CAD models represent the underlying targets; low-dimensional matrix Lie groups extend them to particular instances. Remote sensors observing the targets are modeled as projective transformations, converting three-dimensional scenes into random images. Bayesian target recognition corresponds to hypothesis selection in the presence of nuisance parameters; its performance is quantified as the Bayes error. Analytical expressions for this error probability in small noise situations are derived, yielding asymptotic error rates for exponential error probability decay. Keywords: Bayesian ATR, Laplace's asymptotics, nuisance integration, deformable templates.