Thesis

Probabilistic Inference Using Markov Chain Monte Carlo Methods

Authors:
To read the full-text of this research, you can request a copy directly from the author.

No full-text available

Request Full-text Paper PDF

To read the full-text of this research,
you can request a copy directly from the author.

... The posterior probability would then be a multiplication of these functions. Gibbs sampling uses a Markov chain process to draw samples from the conditional distributions of the θ i variables given all the other variables (full conditionals), which are linked to the posterior probability [64,65,66,67,68]. We report details about this method in the Appendix. ...
... 3 ,ȳ), and so on. A full conditional is usually easier to sample than the complete posterior density because it likely has a well-defined analytical form; otherwise, approximation techniques can be used for sample drawing [65,66,67]. ...
... After T iterations, the sampler will have generated T samples for each variable. It can be demonstrated that the samples produced in the last iterations likely converge to the samples of the posterior probability density [65]. The Gibbs sampler produces a 1st-order Markov chain of samples (with the full conditionals being the transition functions) because, at each step, it estimates new values using the values of the previous iteration. ...
Preprint
Full-text available
Ecological and ecosystem modellers frequently need to interpolate spatiotemporal observations of geophysical and environmental parameters over an analysed area. However, particularly in marine science, modellers with low expertise in oceanography and hydrodynamics can hardly use interpolation methods optimally. This paper introduces an Open Science oriented, opensource, scalable and efficient workflow for 2D marine environmental parameters. It combines a fast, efficient interpolation method with a Bayesian hierarchical model embedding the stationary advection-diffusion equation as a constraint. Our workflow fills the usability gap between interpolation software providers and the users’ communities. It can run entirely automatically without requiring expert parametrization. It is also available on a cloud computing platform, with a Web Processing Service compliant interface, supporting collaboration, repeatability, reproducibility, and provenance tracking. We demonstrate that our workflow produces comparable results to a state-of-the-art model (frequently used in oceanography) in interpolating four environmental parameters at the global scale.
... In relation to other work, we recognise using a reference for thermodynamic integration is a topic that has been raised, especially in early theoretically-oriented literature [11][12][13]. Our additional contribution is to bridge the gap from theory and simple examples to application, which includes choosing the reference using MCMC samples or gradients, examination of reference support, comparisons of convergence, and illustration of the approach for a non-trivial real-world problem. ...
... While the Eq 2 can be directly applied to conduct a pairwise model comparison between two hypotheses, by introducing a reference we can naturally marginalise the density of a single model [11,12]. This is useful when comparing multiple models as n > n 2 À � for n > 3. Another motivation to reference the TI is the MCMC computational efficiency of converging the TI expectation. ...
... In this case log qðθÞ q ref ðθÞ tends to have a small expectation and variance and converges quickly. This idea of using an exactly solvable reference, to aid in the solution of an otherwise intractable problem, has been a recurrent theme in the computational and mathematical sciences in general [14][15][16][17], and variations on this approach have been used to compute normalising constants in various guises in the statistical literature [8,9,11,[18][19][20][21][22][23][24]. For example, in the generalised stepping stone method a reference is introduced to speed up convergence of the importance sampling at each temperature rung [23,24]. ...
Article
Full-text available
Evaluating normalising constants is important across a range of topics in statistical learning, notably Bayesian model selection. However, in many realistic problems this involves the integration of analytically intractable, high-dimensional distributions, and therefore requires the use of stochastic methods such as thermodynamic integration (TI). In this paper we apply a simple but under-appreciated variation of the TI method, here referred to as referenced TI, which computes a single model's normalising constant in an efficient way by using a judiciously chosen reference density. The advantages of the approach and theoretical considerations are set out, along with pedagogical 1 and 2D examples. The approach is shown to be useful in practice when applied to a real problem -to perform model selection for a semi-mechanistic hierarchical Bayesian model of COVID-19 transmission in South Korea involving the integration of a 200D density.
... In this paper, we are particularly interested in (Generalized) Hamiltonian Monte Carlo-(G)HMC-algorithms [8,18,35,25] to perform MCMC sampling. These algorithms aim at performing an unbiased sampling of the Boltzmann-Gibbs probability measure defined by µ(dq dp) = Z −1 µ e −H(q,p) dq dp, ...
... Notice that in this case the Störmer-Verlet integrator is explicit. Algorithm 1 using ϕ SV ∆t is shown to be unbiased (see for instance [35,46]) using two fundamental properties of the numerical flow ϕ SV ∆t : ...
... First note that, if f : X → X grows at most polynomially, with derivatives growing at most polynomially, then (35) implies that there exist α ∈ N and K ∈ R + such that ...
Preprint
Full-text available
Hamiltonian Monte Carlo (HMC) is a Markov chain Monte Carlo method that allows to sample high dimensional probability measures. It relies on the integration of the Hamiltonian dynamics to propose a move which is then accepted or rejected thanks to a Metropolis procedure. Unbiased sampling is guaranteed by the preservation by the numerical integrators of two key properties of the Hamiltonian dynamics: volume-preservation and reversibility up to momentum reversal. For separable Hamiltonian functions, some standard explicit numerical schemes, such as the St\"ormer--Verlet integrator, satisfy these properties. However, for numerical or physical reasons, one may consider a Hamiltonian function which is nonseparable, in which case the standard numerical schemes which preserve the volume and satisfy reversibility up to momentum reversal are implicit. Actually, when implemented in practice, such implicit schemes may admit many solutions or none, especially when the timestep is too large. We show here how to enforce the numerical reversibility, and thus unbiasedness, of HMC schemes in this context. Numerical results illustrate the relevance of this correction on simple problems.
... 3 ,ȳ), and so on. A full conditional is usually easier to sample than the complete posterior density because it likely has a well-defined analytical form; otherwise, approximation techniques can be used for sample drawing [66,67,68]. ...
... After T iterations, the sampler will have generated T samples for each variable. It can be demonstrated that the samples produced in the last iterations likely converge to the samples of the posterior probability density [66]. The Gibbs sampler produces a 1st-order Markov chain of samples (with the full conditionals being the transition functions) because, at each step, it estimates new values using the values of the previous iteration. ...
... shown that verapamil inhibits Mtb rifampin efflux through its Pgp inhibitory activity and that 96 both R-verapamil and norverapamil have similar efficacy as racemic verapamil in inhibiting Mtb 97 rifampin efflux, macrophage-induced drug tolerance and intramacrophage growth (4,6). 98 ...
... Specifically, we proceed using the Metropolis-Hastings algorithm [4]. We introduce as auxiliary variables a positive definite diagonal 2 × 2 matrix A and a 2 × 1 vector y, and put an arbitrary everywheresupported probability distribution on the product space of such objects. ...
Preprint
Full-text available
Induction of mycobacterial efflux pumps is a cause of Mycobacterium tuberculosis (Mtb) drug tolerance, a barrier to shortening antitubercular treatment. Verapamil inhibits Mtb efflux pumps that mediate tolerance to rifampin, a cornerstone of tuberculosis treatment. Mycobacterial efflux pump inhibition by verapamil also limits Mtb growth in macrophages in the absence of antibiotic treatment. These findings suggest that verapamil could be used as an adjunctive therapy for TB treatment shortening. However, verapamil is rapidly and substantially metabolized when co-administered with rifampin. We determined in a dose-escalation clinical trial that rifampin-induced clearance of verapamil can be countered without toxicity by the administration of larger than usual doses of verapamil. An oral dosage of 360 mg sustained-release (SR) verapamil given every 12 hours concomitantly with rifampin achieved median verapamil exposures of 903.1 ng.h/ml (AUC 0-12h), similar to those in persons receiving daily doses of 240 mg verapamil SR but not rifampin. Norverapamil:verapamil, R:S verapamil and R:S norverapamil AUC ratios were all significantly greater than those of historical controls receiving SR verapamil in the absence of rifampin, suggesting that rifampin administration favors the less-cardioactive verapamil metabolites and enantiomers. Finally, rifampin exposures were significantly greater after verapamil administration. Our findings suggest that a higher dosage of verapamil can be safely used as adjunctive treatment in rifampin-containing treatment regimens.
... Besides energy-based models, this also includes posterior distributions of the form p(x | D) in Bayesian inference, that is the joint distribution p(D | x)p(x) (which can often be evaluated analytically), normalized by the evidence p(D), which is typically intractable as well. In those cases, Markov chain Monte Carlo methods (MCMC;Neal, 1993) have proven to be a versatile tool to sample from such distributions defined up to normalization. Despite their generality though, MCMC methods may suffer from slow mixing when the target distribution p(x) is highly multi-modal, affecting the convergence of these methods and therefore yielding poor samples which are not representative of p(x). ...
... The Markov chain of a GFlowNet, on the other hand, is constructed on an augmented state space S, broader than the sample space X ⊆ S, allowing the chain to use these intermediate steps to move between modes more easily. It is worth noting that there exists some MCMC methods, such as Hamiltionian Monte Carlo methods (HMC; Neal, 1993;MacKay, 2003), where the target distribution is the marginal of the invariant distribution over an augmented space. ...
Preprint
Full-text available
While Markov chain Monte Carlo methods (MCMC) provide a general framework to sample from a probability distribution defined up to normalization, they often suffer from slow convergence to the target distribution when the latter is highly multi-modal. Recently, Generative Flow Networks (GFlowNets) have been proposed as an alternative framework to mitigate this issue when samples have a clear compositional structure, by treating sampling as a sequential decision making problem. Although they were initially introduced from the perspective of flow networks, the recent advances of GFlowNets draw more and more inspiration from the Markov chain literature, bypassing completely the need for flows. In this paper, we formalize this connection and offer a new perspective for GFlowNets using Markov chains, showing a unifying view for GFlowNets regardless of the nature of the state space as recurrent Markov chains. Positioning GFlowNets under the same theoretical framework as MCMC methods also allows us to identify the similarities between both frameworks, and most importantly to highlight their
... It is critical to account for uncertainty in situations where data limitations exist, which can lead to imprecise inference about preferences, sensitivities, and other aspects of behavior. To overcome this limitation, the Markov Chain Monte Carlo algorithm is used (Neal R.M., 1993). ...
... Gibbs sampling is a Markov Chain Monte Carlo (MCMC) method that is often used to approximate the joint distribution of a set of random variables. It works by iteratively sampling from the conditional distributions of each variable, given the current values of the other variables (Neal 1993). This process converges to the joint distribution after a sufficient number of iterations. ...
Article
Full-text available
When dealing with maintenance in ships engine room, the space available around machinery and systems (clearance) plays an important role and may significantly affect the cost of the maintenance intervention. In a first part of a current research study Gualeni et al. (Ship Technol Res, 10.1080/09377255.2021.2020949, 2022), a quantitative relation between the maintenance costs increment due to the clearance reduction is determined, using a Bayesian approach to General Linear Model (GLM), with reference to a single item/component of a larger system Sánchez-Herguedas et al. (Reliability Eng Syst Saf 207: 107394, 2021). This paper represents the second part of the activity and it enforces a systemic view over the whole machinery or system Sanders and Klein (Proc Comput Sci 8:413–419, 2012). The aim is to identify not only the relation between maintenance costs and clearance reduction, but also how the clearance reductions of the single components/items interact and affect the whole system/machinery accessibility and maintainability, meant as relevant emerging properties. The system emerging properties are investigated through the design and application of a Hidden Markov Model Salvatier et al. (Peer J Comput Sci 2: e55, 2016); i.e., the system is modeled by a Markov process with unobservable states. The sequence of states is the maintainability of the system (which incorporates each one of the single components) while the evidence is the increase in cost of maintenance related to the space reduction. By predicting a sequence of states, it is therefore possible to predict the interactions between the system components clearances and determine how the emerging maintainability property is affected by the engine room design.
... In the context of our binned model, the quantities being simultaneously inferred ( n, µ, σ, λ) span a D = 2N b + 2 dimensional space. Algorithms such as random walk Metropolis or Gibbs sampling, which scale poorly with the dimensionality of the space being sampled (Neal 1993;Homan & Gelman 2014), can potentially render high-resolution population inference computationally prohibitive. For this reason, we sample the posterior in Eq. (7) using Hamiltonian Monte Carlo (HMC) (Neal 2011;Homan & Gelman 2014), which invokes a compu-tational complexity of O(D 5/4 ) per independent sample and that is significantly more tractable than the O(D 2 ) complexity of random walk Metropolis (Creutz 1988) or Gibbs sampling (Homan & Gelman 2014). ...
... Algorithms such as random walk Metropolis or Gibbs sampling, which scale poorly with the dimensionality of the space being sampled (Neal 1993;Homan & Gelman 2014), can potentially render high-resolution population inference computationally prohibitive. For this reason, we sample the posterior in Eq. (7) using Hamiltonian Monte Carlo (HMC) (Neal 2011;Homan & Gelman 2014), which invokes a compu-tational complexity of O(D 5/4 ) per independent sample and that is significantly more tractable than the O(D 2 ) complexity of random walk Metropolis (Creutz 1988) or Gibbs sampling (Homan & Gelman 2014). ...
Preprint
Full-text available
The observation of gravitational waves from multiple compact binary coalescences by the LIGO-Virgo-KAGRA detector networks has enabled us to infer the underlying distribution of compact binaries across a wide range of masses, spins, and redshifts. In light of the new features found in the mass spectrum of binary black holes and the uncertainty regarding binary formation models, non-parametric population inference has become increasingly popular. In this work, we develop a data-driven clustering framework that can identify features in the component mass distribution of compact binaries simultaneously with those in the corresponding redshift distribution, from gravitational wave data in the presence of significant measurement uncertainties, while making very few assumptions on the functional form of these distributions. Our generalized model is capable of inferring correlations among various population properties such as the redshift evolution of the shape of the mass distribution itself, in contrast to most existing non-parametric inference schemes. We test our model on simulated data and demonstrate the accuracy with which it can re-construct the underlying distributions of component masses and redshifts. We also re-analyze public LIGO-Virgo-KAGRA data from events in GWTC-3 using our model and compare our results with those from some alternative parametric and non-parametric population inference approaches. Finally, we investigate the potential presence of correlations between mass and redshift in the population of binary black holes in GWTC-3 (those observed by the LIGO-Virgo-KAGRA detector network in their first 3 observing runs), without making any assumptions about the specific nature of these correlations.
... Nevertheless, samplers based on dynamics still have some deficiencies. The traditional dynamics samplers [8] and their developments [7,[9][10][11][12] have excellent performance in unimodal distributions. However, when facing multi-modal distributions, these algorithms may reveal some problems, especially when the modes are far away from each other. ...
... HMC instead discretizes these equations by using non-zero time steps, which inevitably introduces some error. It is, nevertheless, necessary to use a discretization for which Liouville's theorem holds exactly [8]. The common discretization method of HMC is leapfrog, which takes the following form: ...
Article
Full-text available
The Hamiltonian Monte Carlo (HMC) sampling algorithm exploits Hamiltonian dynamics to construct efficient Markov Chain Monte Carlo (MCMC), which has become increasingly popular in machine learning and statistics. Since HMC uses the gradient information of the target distribution, it can explore the state space much more efficiently than random-walk proposals, but may suffer from high autocorrelation. In this paper, we propose Langevin Hamiltonian Monte Carlo (LHMC) to reduce the autocorrelation of the samples. Probabilistic inference involving multi-modal distributions is very difficult for dynamics-based MCMC samplers, which is easily trapped in the mode far away from other modes. To tackle this issue, we further propose a variational hybrid Monte Carlo (VHMC) which uses a variational distribution to explore the phase space and find new modes, and it is capable of sampling from multi-modal distributions effectively. A formal proof is provided that shows that the proposed method can converge to target distributions. Both synthetic and real datasets are used to evaluate its properties and performance. The experimental results verify the theory and show superior performance in multi-modal sampling.
... Effective methods for sampling from the posterior distribution of Bayesian models have been an active area in computational statistics for decades. Hamiltonian (or hybrid) Monte Carlo (HMC) proposed that nearly 30 years ago (Neal 1993) has recently achieved much greater popularity with the availability of convenient and powerful implementations. While Stan (Stan Development Team 2020) is probably the most popular such tool, our model was written in Template Model Builder (Kristensen et al. 2015), which is primarily designed for maximum likelihood estimation. ...
Article
Full-text available
The management of forest pests relies on an accurate understanding of the species’ phenology. Thermal performance curves (TPCs) have traditionally been used to model insect phenology. Many such models have been proposed and fitted to data from both wild and laboratory-reared populations. Using Hamiltonian Monte Carlo for estimation, we implement and fit an individual-level, Bayesian hierarchical model of insect development to the observed larval stage durations of a population reared in a laboratory at constant temperatures. This hierarchical model handles interval censoring and temperature transfers between two constant temperatures during rearing. It also incorporates individual variation, quadratic variation in development rates across insects’ larval stages, and “flexibility” parameters that allow for deviations from a parametric TPC. Using a Bayesian method ensures a proper propagation of parameter uncertainty into predictions and provides insights into the model at hand. The model is applied to a population of eastern spruce budworm ( Choristoneura fumiferana ) reared at 7 constant temperatures. Resulting posterior distributions can be incorporated into a workflow that provides prediction intervals for the timing of life stages under different temperature regimes. We provide a basic example for the spruce budworm using a year of hourly temperature data from Timmins, Ontario, Canada. Supplementary materials accompanying this paper appear on-line.
... In the context of our binned model, the quantities being simultaneously inferred ( ) n, , ,   m s l span a D = 2N b + two-dimensional space. Algorithms such as random walk Metropolis or Gibbs sampling, which scale poorly with the dimensionality of the space being sampled (Neal 1993;Homan & Gelman 2014), can potentially render high-resolution population inference computationally prohibitive. For this reason, we sample the posterior in Equation (7) We perform HMC sampling by means of the No-U-Turn Sampler (Homan & Gelman 2014) that improves upon standard HMC by efficiently autotuning the step size, as implemented in the PyMC software library (Salvatier et al. 2016). ...
Article
Full-text available
The observation of gravitational waves from multiple compact binary coalescences by the LIGO–Virgo–KAGRA detector networks has enabled us to infer the underlying distribution of compact binaries across a wide range of masses, spins, and redshifts. In light of the new features found in the mass spectrum of binary black holes and the uncertainty regarding binary formation models, nonparametric population inference has become increasingly popular. In this work, we develop a data-driven clustering framework that can identify features in the component mass distribution of compact binaries simultaneously with those in the corresponding redshift distribution, from gravitational-wave data in the presence of significant measurement uncertainties, while making very few assumptions about the functional form of these distributions. Our generalized model is capable of inferring correlations among various population properties, such as the redshift evolution of the shape of the mass distribution itself, in contrast to most existing nonparametric inference schemes. We test our model on simulated data and demonstrate the accuracy with which it can reconstruct the underlying distributions of component masses and redshifts. We also reanalyze public LIGO–Virgo–KAGRA data from events in GWTC-3 using our model and compare our results with those from some alternative parametric and nonparametric population inference approaches. Finally, we investigate the potential presence of correlations between mass and redshift in the population of binary black holes in GWTC-3 (those observed by the LIGO–Virgo–KAGRA detector network in their first three observing runs), without making any assumptions about the specific nature of these correlations.
... This finding indicates that NUTS has superior efficiency. Even though we are quite certain that our tuning of Metropolis may be optimized, it is possible that even with expert tuning, it will have a lower efficiency than NUTS; it is widely acknowledged that, at least for some problems, dynamical approaches such as NUTS may be faster [17]. This is primarily due to the fact that Metropolis explores the parameter space via inefficient random walks, while dynamical approaches avoid this random walk behaviour, as stated in Section IV-B. ...
Article
Full-text available
Markov Chain Monte Carlo (MCMC) approaches are widely used for tuning model parameters to fit process measurements. While modern probabilistic programming languages (PPLs) such as Stan, PyMC, and Turing have made it easier to implement efficient MCMC samplers, configuring them for high dimensional and multi-modal parameter distributions remains a challenging task. In [1], the No-U-Turn Sampler (NUTS) was employed via Turing to estimate parameters of an air-cooled synchronous generator model using real-world experimental data, but the produced posterior distributions were excessively narrow. The present study extends the findings in [1] by producing more realistic parameter estimates using the same data. To accomplish this, the study first reviews the basics of MCMC; it offers some general advice for choosing appropriate settings for MCMC to ensure successful estimation, as well a discussion of the impact of measurement data on the computation of posteriors. The study then implements the simple classical MCMC technique, Metropolis, from scratch to estimate the generator model parameters, providing more insight into MCMC — its fundamental process and terminology. Finally, the knowledge gained is applied to select appropriate settings for NUTS — implemented via Turing — that yield more accurate parameter estimates.
... In this work, we use the BORG (Jasche & Wandelt 2013 ) algorithm to perform field-level inference of galaxy cluster masses conditioned on the 2M ++ galaxy catalogue (Lavaux & Hudson 2011 ). BORG uses a Hamiltonian Markov Chain Monte Carlo (MCMC) algorithm (Duane et al. 1987 ;Neal 1993 ) to sample the posterior distribution of possible initial density fields, δ IC i , assuming a CDM Gaussian prior and conditioned on the observed galaxy counts, N i , in a set of voxels (labelled by i ). This posterior is given schematically by ...
Article
Full-text available
We investigate the accuracy requirements for field-level inference of cluster and void masses using data from galaxy surveys. We introduce a two-step framework that takes advantage of the fact that cluster masses are determined by flows on larger scales than the clusters themselves. First, we determine the integration accuracy required to perform field-level inference of cosmic initial conditions on these large scales, by fitting to late-time galaxy counts using the Bayesian Origin Reconstruction from Galaxies (BORG) algorithm. A 20-step COLA integrator is able to accurately describe the density field surrounding the most massive clusters in the Local Super-Volume (<135 h−1 Mpc), but does not by itself lead to converged virial mass estimates. Therefore we carry out ‘posterior resimulations’, using full N-body dynamics while sampling from the inferred initial conditions, and thereby obtain estimates of masses for nearby massive clusters. We show that these are in broad agreement with existing estimates, and find that mass functions in the Local Super-Volume are compatible with ΛCDM.
... Most existing Cox process works focus on the log Gaussian Cox process (LGCP) (Møller et al., 1998) where a GP function is passed through an exponential link function to model the positive intensity rate. Due to the nonconjugacy between point process likelihood and GP prior, practitioners need to apply Markov chain Monte Carlo (MCMC) (Neal, 1993) or variational inference (Blei et al., 2017) methods to infer the posterior distribution of model parameters. For MCMC, the specialised MCMC algorithms, such as Metropolis-adjusted Langevin algorithm (MALA) (Møller et al., 1998;Besag, 1994), as well as the probabilistic programming languages based on MCMC (Wood et al., 2014) where one does not need to write a sampler by hand, can be used for sampling from the posterior of intensity function. ...
Article
Full-text available
This paper presents a novel extension of multi-task Gaussian Cox processes for modeling multiple heterogeneous correlated tasks jointly, e.g., classification and regression, via multi-output Gaussian processes (MOGP). A MOGP prior over the parameters of the dedicated likelihoods for classification, regression and point process tasks can facilitate sharing of information between heterogeneous tasks, while allowing for nonparametric parameter estimation. To circumvent the non-conjugate Bayesian inference in the MOGP modulated heterogeneous multi-task framework, we employ the data augmentation technique and derive a mean-field approximation to realize closed-form iterative updates for estimating model parameters. We demonstrate the performance and inference on both 1D synthetic data as well as 2D urban data of Vancouver.
... Most existing Cox process works focus on the log Gaussian Cox process (LGCP) (Møller et al, 1998) where a GP function is passed through an exponential link function to model the positive intensity rate. Due to the nonconjugacy between point process likelihood and GP prior, practitioners need to apply Markov chain Monte Carlo (MCMC) (Neal, 1993) or variational inference (Blei et al, 2017) methods to infer the posterior distribution of model parameters. For MCMC, the specialised MCMC algorithms, such as Metropolis-adjusted Langevin algorithm (MALA) (Møller et al, 1998;Besag, 1994), as well as the probabilistic programming languages based on MCMC (Wood et al, 2014) where one does not need to write a sampler by hand, can be used for sampling from the posterior of intensity function. ...
Preprint
This paper presents a novel extension of multi-task Gaussian Cox processes for modeling multiple heterogeneous correlated tasks jointly, e.g., classification and regression, via multi-output Gaussian processes (MOGP). A MOGP prior over the parameters of the dedicated likelihoods for classification, regression and point process tasks can facilitate sharing of information between heterogeneous tasks, while allowing for nonparametric parameter estimation. To circumvent the non-conjugate Bayesian inference in the MOGP modulated heterogeneous multi-task framework, we employ the data augmentation technique and derive a mean-field approximation to realize closed-form iterative updates for estimating model parameters. We demonstrate the performance and inference on both 1D synthetic data as well as 2D urban data of Vancouver.
... The momenta are then discarded, projecting the pair (position, momentum) back onto the parameter space, and the "position" estimate that is reached is the sample of said iteration. As we will invoke the HMC algorithm only via the interface tmbstan, we do not need to go in more detail, but a deeper dive into Hamilton's equations and their use in MCMC algorithms can be found in Betancourt (2017) and Neal (1993). ...
Preprint
Full-text available
Multimodality of the likelihood in Gaussian mixtures is a well-known problem. The choice of the initial parameter vector for the numerical optimizer may affect whether the optimizer finds the global maximum, or gets trapped in a local maximum of the likelihood. We propose to use Hamiltonian Monte Carlo (HMC) to explore the part of the parameter space which has a high likelihood. Each sampled parameter vector is used as the initial value for quasi-Newton optimizer, and the resulting sample of (maximum) likelihood values is used to determine if the likelihood is multimodal. We use a single simulated data set from a three component bivariate mixture to develop and test the method. We use state-of-the-art HCM software, but experience difficulties when trying to directly apply HMC to the full model with 15 parameters. To improve the mixing of the Markov Chain we explore various tricks, and conclude that for the dataset at hand we have found the global maximum likelihood estimate.
... We conclude with a discussion of Hamiltonian Monte Carlo (HMC), a technique whose combination with the developments in this paper offers exceptional performance and almost complete achievement of the objective of replacing uncertain MCMC convergence tests by perfect simulation, for continuous processes with differentiable likelihoods. HMC was developed by Duane et al. (1987) and is discussed in detail by Neal (1993Neal ( , 1995Neal ( , 2011. Originally known as Hybrid Monte Carlo, HMC alternates random and deterministic phases, thereby exploring the sample space much more effectively than older MCMC methods that are based on random walks (see Neal, 2011, sec. ...
Preprint
Full-text available
We show that any application of the technique of unbiased simulation becomes perfect simulation when coalescence of the two coupled Markov chains can be practically assured in advance. This happens when a fixed number of iterations is high enough that the probability of needing any more to achieve coalescence is negligible; we suggest a value of $10^{-20}$. This finding enormously increases the range of problems for which perfect simulation, which exactly follows the target distribution, can be implemented. We design a new algorithm to make practical use of the high number of iterations by producing extra perfect sample points with little extra computational effort, at a cost of a small, controllable amount of serial correlation within sample sets of about 20 points. Different sample sets remain completely independent. The algorithm includes maximal coupling for continuous processes, to bring together chains that are already close. We illustrate the methodology on a simple, two-state Markov chain and on standard normal distributions up to 20 dimensions. Our technical formulation involves a nonzero probability, which can be made arbitrarily small, that a single perfect sample point may have its place taken by a "string" of many points which are assigned weights, each equal to $\pm 1$, that sum to~$1$. A point with a weight of $-1$ is a "hole", which is an object that can be cancelled by an equivalent point that has the same value but opposite weight $+1$.
... Sampling techniques built on Monte Carlo integration [15] provide a powerful alternative. Traditional Markov Chain Monte Carlo (MCMC) techniques [27] can construct chains of proposed samples y ′ with strong theoretical guarantees on their convergence to posterior distribution [13]. In MH [16], this is ensured by evaluation of acceptance probabilities. ...
Preprint
Full-text available
Predictive variability due to data ambiguities has typically been addressed via construction of dedicated models with built-in probabilistic capabilities that are trained to predict uncertainty estimates as variables of interest. These approaches require distinct architectural components and training mechanisms, may include restrictive assumptions and exhibit overconfidence, i.e., high confidence in imprecise predictions. In this work, we propose a post-hoc sampling strategy for estimating predictive uncertainty accounting for data ambiguity. The method can generate different plausible outputs for a given input and does not assume parametric forms of predictive distributions. It is architecture agnostic and can be applied to any feed-forward deterministic network without changes to the architecture or training procedure. Experiments on regression tasks on imaging and non-imaging input data show the method's ability to generate diverse and multi-modal predictive distributions, and a desirable correlation of the estimated uncertainty with the prediction error.
... The posterior distribution, p(ψ M |D) in Eq. (8), is not analytically tractable, and thus we use Markov chain Monte Carlo (MCMC) sampling for Bayesian inference. MCMC is a sequential sampling algorithm that is often used in Bayesian statistics to draw samples from a certain target posterior (Neal, 1993). Specifically, we employ the No-U-Turn Sampler (Hoffman and Gelman, 2014) as implemented in the software library NumPyro (Phan et al., 2019). ...
Article
Full-text available
Ground-motion correlation models play a crucial role in regional seismic risk modeling of spatially distributed built infrastructure. Such models predict the correlation between ground-motion amplitudes at pairs of sites, typically as a function of their spatial proximity. Data from physics-based simulators and event-to-event variability in empirically derived model parameters suggest that spatial correlation is additionally affected by path and site effects. Yet, identifying these effects has been difficult due to scarce data and a lack of modeling and assessment approaches to consider more complex correlation predictions. To address this gap, we propose a novel correlation model that accounts for path and site effects via a modified functional form. To quantify the estimation uncertainty, we perform Bayesian inference for model parameter estimation. The derived model outperforms traditional isotropic models in terms of the predictive accuracy for training and testing data sets. We show that the previously found event-to-event variability in model parameters may be explained by the lack of accounting for path and site effects. Finally, we examine implications of the newly proposed model for regional seismic risk simulations.
... Sampling precision refers to the how much our simulated distribution changes as we (24,25). In practice, effective sample size must be estimated via a small sum of estimated autocorrelations, and effective sample size will generally be lower for quantities other than the posterior mean (see (26) for details). ...
Preprint
The Metropolis algorithm is a Markov chain Monte Carlo (MCMC) algorithm used to simulate from parameter distributions of interest, such as generalized linear model parameters. The "Metropolis step" is a keystone concept that underlies classical and modern MCMC methods and facilitates simple analysis of complex statistical models. Beyond Bayesian analysis, MCMC is useful for generating uncertainty intervals, even under the common scenario in causal inference in which the target parameter is not directly estimated by a single, fitted statistical model. We demonstrate, with a worked example, pseudo-code, and R code, the basic mechanics of the Metropolis algorithm. We use the Metropolis algorithm to estimate the odds ratio and risk difference contrasting the risk of childhood leukemia among those exposed to high versus low level magnetic fields. This approach can be used for inference from Bayesian and frequentist paradigms and, in small samples, offers advantages over large-sample methods like the bootstrap.
... In this case, the chain is ergodic and will sample from a unique 911 distribution. The proof is reported in Neal (1993) and not repeated here for 912 conciseness. ...
Article
This study elucidates the behavior of Markov-Chains Monte Carlo ensemble samplers for vadose zone inverse modeling by performing an in-depth comparison of four algorithms that use Affine-Invariant (AI) moves or Differential Evolution (DE) strategies to approximate the target density. Two Rosenbrock toy distributions, and one synthetic and one actual case study focusing on the inverse estimation of soil hydraulic parameters using HYDRUS-1D, are used to compare samplers in different dimensions d. The analysis reveals that an ensemble with N=d+1 chains evolved using DE-based strategies converges to the wrong stationary posterior, while AI doesn’t suffer from this issue but exhibits delayed convergence. DE-based samplers regain their ergodic properties when using N>2d chains. Increasing the number of chains above this threshold has only minor effects on the samplers’ performance, while initializing the ensemble in a high-likelihood region facilitates its convergence. AI strategies exhibit shorter autocorrelation times in the 7d synthetic vadose zone scenario, while DE-based samplers outperform them when the number of soil parameters increases to 16 in the actual scenario. All evaluation metrics degrade as d increases, thus suggesting that sampling strategies based only on interpolation between chains tend to become inefficient when the bulk of the posterior lays in increasingly small portions of the parameters’ space.
... In Fig. 4, we show that circuits with one-to-one or naïvely distributed codes do not give accurate estimates of the concentration mean within 200 ms, while tuning the geometry to match the receptor affinities enables fast convergence. All three circuits overestimate the posterior variance at short times, consistent with what one would expect for unadjusted Langevin samplers (Neal, 1993), but the geometry-aware model's estimate decays most rapidly towards the target. Therefore, when the synaptic weights are tuned, our circuit model can enable fast, robust estimation of concentration statistics, even in the presence of distractors. ...
Preprint
Full-text available
Within a single sniff, the mammalian olfactory system can decode the identity and concentration of odorants wafted on turbulent plumes of air. Yet, it must do so given access only to the noisy, dimensionally-reduced representation of the odor world provided by olfactory receptor neurons. As a result, the olfactory system must solve a compressed sensing problem, relying on the fact that only a handful of the millions of possible odorants are present in a given scene. Inspired by this principle, past works have proposed normative compressed sensing models for olfactory decoding. However, these models have not captured the unique anatomy and physiology of the olfactory bulb, nor have they shown that sensing can be achieved within the 100-millisecond timescale of a single sniff. Here, we propose a rate-based Poisson compressed sensing circuit model for the olfactory bulb. This model maps onto the neuron classes of the olfactory bulb, and recapitulates salient features of their connectivity and physiology. For circuit sizes comparable to the human olfactory bulb, we show that this model can accurately detect tens of odors within the timescale of a single sniff. We also show that this model can perform Bayesian posterior sampling for accurate uncertainty estimation. Fast inference is possible only if the geometry of the neural code is chosen to match receptor properties, yielding a distributed neural code that is not axis-aligned to individual odor identities. Our results illustrate how normative modeling can help us map function onto specific neural circuits to generate new hypotheses.
... Evaluation of the Bayes factor requires marginalization of the likelihood over the entire space of the parameters, which tends to be a very complex and costly integration. Many methods have been developed to perform or approximate this integration, such as MCMC, 268 nested sampling, 269 thermodynamic integration, 270 and the Laplace approximation, 35 among others. ...
Article
Bayesian analysis enables flexible and rigorous definition of statistical model assumptions with well-characterized propagation of uncertainties and resulting inferences for single-shot, repeated, or even cross-platform data. This approach has a strong history of application to a variety of problems in physical sciences ranging from inference of particle mass from multi-source high-energy particle data to analysis of black-hole characteristics from gravitational wave observations. The recent adoption of Bayesian statistics for analysis and design of high-energy density physics (HEDP) and inertial confinement fusion (ICF) experiments has provided invaluable gains in expert understanding and experiment performance. In this Review, we discuss the basic theory and practical application of the Bayesian statistics framework. We highlight a variety of studies from the HEDP and ICF literature, demonstrating the power of this technique. Due to the computational complexity of multi-physics models needed to analyze HEDP and ICF experiments, Bayesian inference is often not computationally tractable. Two sections are devoted to a review of statistical approximations, efficient inference algorithms, and data-driven methods, such as deep-learning and dimensionality reduction, which play a significant role in enabling use of the Bayesian framework. We provide additional discussion of various applications of Bayesian and machine learning methods that appear to be sparse in the HEDP and ICF literature constituting possible next steps for the community. We conclude by highlighting community needs, the resolution of which will improve trust in data-driven methods that have proven critical for accelerating the design and discovery cycle in many application areas.
... Let us consider a family of (generally unnormalized) densities such that (x|0) = g(x) is the prior and (x|1) = (x) is the unnormalized posterior distribution. An example is the so-called geometric path (Neal 1993). The corresponding normalized densities in the family are denoted as Then, the main TI identity is (Llorente et al. 2020a) ...
Article
Full-text available
In Bayesian inference, we are usually interested in the numerical approximation of integrals that are posterior expectations or marginal likelihoods (a.k.a., Bayesian evidence). In this paper, we focus on the computation of the posterior expectation of a function f(x)\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$f(\textbf{x})$$\end{document}. We consider a target-aware scenario where f(x)\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$f(\textbf{x})$$\end{document} is known in advance and can be exploited in order to improve the estimation of the posterior expectation. In this scenario, this task can be reduced to perform several independent marginal likelihood estimation tasks. The idea of using a path of tempered posterior distributions has been widely applied in the literature for the computation of marginal likelihoods. Thermodynamic integration, path sampling and annealing importance sampling are well-known examples of algorithms belonging to this family of methods. In this work, we introduce a generalized thermodynamic integration (GTI) scheme which is able to perform a target-aware Bayesian inference, i.e., GTI can approximate the posterior expectation of a given function. Several scenarios of application of GTI are discussed and different numerical simulations are provided.
... In this work, we use the BORG (Jasche & Wandelt 2013) algorithm to perform field-level inference of galaxy cluster masses conditioned on the 2M++ galaxy catalogue (Lavaux & Hudson 2011). BORG uses a Hamiltonian Markov Chain Monte Carlo (MCMC) algorithm (Duane et al. 1987;Neal 1993) to sample the posterior distribution of possible initial density fields, δ IC i , assuming a ΛCDM Gaussian prior and conditioned on the observed galaxy counts, Ni, in a set of voxels (labelled by i). This posterior is given schematically by ...
Preprint
We investigate the accuracy requirements for field-level inference of cluster masses and void sizes using data from galaxy surveys. We introduce a two-step framework that takes advantage of the fact that cluster masses are determined by flows on larger scales than the clusters themselves. First, we determine the integration accuracy required to perform field-level inference of cosmic initial conditions on these large scales, by fitting to late-time galaxy counts using the Bayesian Origin Reconstruction from Galaxies (BORG) algorithm. A 20-step COLA integrator is able to accurately describe the density field surrounding the most massive clusters in the Local Super-Volume ($<135\,h^{-1}\mathrm{Mpc}$), but does not by itself lead to converged virial mass estimates. Therefore we carry out `posterior resimulations', using full $N$-body dynamics while sampling from the inferred initial conditions, and thereby obtain estimates of masses for nearby massive clusters. We show that these are in broad agreement with existing estimates, and find that mass functions in the Local Super-Volume are compatible with $\Lambda$CDM.
... Partition Functions & Probabilistic Models. MCMC methods are widely used for posterior inference (Neal, 1993(Neal, , 2000Chang and Fisher III, 2013;Zaheer et al., 2016Zaheer et al., , 2017. These methods are concerned with finding a high quality estimate of the distribution as the end task. ...
Preprint
Full-text available
Dual encoder models are ubiquitous in modern classification and retrieval. Crucial for training such dual encoders is an accurate estimation of gradients from the partition function of the softmax over the large output space; this requires finding negative targets that contribute most significantly ("hard negatives"). Since dual encoder model parameters change during training, the use of traditional static nearest neighbor indexes can be sub-optimal. These static indexes (1) periodically require expensive re-building of the index, which in turn requires (2) expensive re-encoding of all targets using updated model parameters. This paper addresses both of these challenges. First, we introduce an algorithm that uses a tree structure to approximate the softmax with provable bounds and that dynamically maintains the tree. Second, we approximate the effect of a gradient update on target encodings with an efficient Nystrom low-rank approximation. In our empirical study on datasets with over twenty million targets, our approach cuts error by half in relation to oracle brute-force negative mining. Furthermore, our method surpasses prior state-of-the-art while using 150x less accelerator memory.
... Among the state-of-the-art training algorithms for BNN, the Hamiltonian Monte Carlo (HMC) method (Neal, 1996), a variant of Markov Chain Monte Carlo (MCMC) (Gilks et al., 1996;Neal, 1993), is gaining importance and seems to be the gold standard nowadays (Benker et al., 2020). As other Bayesian training algorithms, HMC aims to find an approximation of the posterior distribution p (θ|D, M) by sampling from a Markov Chain, where θ = {w, b} ∈ Θ ⊆ R d represents the weights (w) and biases (b) of the BNN, D the data, and M the model class, which in this case is related to the BNN architecture. ...
Thesis
Full-text available
Nowadays, one of the most widely used structural building systems consists of masonry infilled frames, in which the walls are intended to protect the interior of the building from environmental conditions. The construction of these walls is well known by practitioners and have been attractive for their low cost and ability to isolate different environments of a building. In most cases, the aforementioned walls are considered as non-structural elements; however, the observation made on the behaviour of this type of structures, especially after the occurrence of seismic events, has shown that these elements have an effective collaboration with the rest of the structure. The structural system of masonry infilled frames has a large number of variables which condition its behaviour. Among the most influential variables we can mention: (1) the masonry units: the variety of materials used, the different forms of manufacture, quality and geometry, (2) the bonding mortar between masonry units: the different materials and dosages used and the bonding quality achieved between the masonry units, (3) the bonding quality between the wall and the portal frame: the existence of stress transfer elements or the direct interaction of the mortar with the frame material, (4) the construction process and the expertise of the workmanship, (5) the interaction between the behaviour of the wall in and out of its plane, (6) the existence of openings (doors and/or windows) in the wall. The aforementioned variables constitute sources of uncertainty which imply difficulty when characterizing the seismic-resistant behaviour of this system. The main objective of this thesis is to provide effective and cost-efficient tools to evaluate existing masonry infilled frame structures. In this sense, the use of probabilistic tools has been explored to propose techniques to predict the behaviour of infilled frame buildings with quantified uncertainty. First, the use of approximate Bayesian computational algorithms is studied to infer non-linear numerical modelling parameters of masonry infilled frames, taking as a reference the results of laboratory testing. An improvement to the original ABC-SubSim algorithm is proposed, for ease of use by autonomously estimating a series of meta-parameters that influence the speed of calculation and quality of the result. This new algorithm has been named A2BC-SubSim, and has been proven to achieve a balance between computational speed and result quality, after the solution of some numerical examples. The proposed algorithm has been applied to a Bayesian inference of multiple uncertain parameters from a non-linear multi-story infilled frame model of a building, with high efficiency. On the other hand, the application of neural networks to predict the constitutive behaviour of the structural system was also explored. A database of existing tests has been collected in order to train such networks; however, due to the limited number of tests available in the literature, it was chosen to work with Bayesian neural networks, which have the advantage of also providing information on the quality of the prediction made by quantification of the uncertainty. Additionally, up-to-date training methods of the Bayesian neural networks have been tested. The prediction capabilities of the trained network has been checked against measurements of laboratory tests that were not part of the training group, achieving acceptable results in terms of interpolation of unobserved data, and extrapolation to unknown data. Finally, dedicated laboratory tests have been performed to study the relationship between the out-ofplane fundamental frequency of the wall versus the stiffness of the system in the plane of the wall. With the results of these tests and a parametric study using complex numerical models, a semi-empirical nondestructive methodology has been proposed to estimate the stiffness of existing masonry infilled frames. The methodology has been checked with measurements of specimens tested on a seismic table, demonstrating the feasibility of application of the proposed methodology for existing infilled frame structures whose actual seismic behaviour want to be updated.
... Neal 1993; MacKay 2005 ch. 29. ...
Preprint
Full-text available
In fields such as medicine and drug discovery, the ultimate goal of a classification is not to guess a class, but to choose the optimal course of action among a set of possible ones, usually not in one-one correspondence with the set of classes. This decision-theoretic problem requires sensible probabilities for the classes. Probabilities conditional on the features are computationally almost impossible to find in many important cases. The main idea of the present work is to calculate probabilities conditional not on the features, but on the trained classifier's output. This calculation is cheap, needs to be made only once, and provides an output-to-probability "transducer" that can be applied to all future outputs of the classifier. In conjunction with problem-dependent utilities, the probabilities of the transducer allow us to find the optimal choice among the classes or among a set of more general decisions, by means of expected-utility maximization. This idea is demonstrated in a simplified drug-discovery problem with a highly imbalanced dataset. The transducer and utility maximization together always lead to improved results, sometimes close to theoretical maximum, for all sets of problem-dependent utilities. The one-time-only calculation of the transducer also provides, automatically: (i) a quantification of the uncertainty about the transducer itself; (ii) the expected utility of the augmented algorithm (including its uncertainty), which can be used for algorithm selection; (iii) the possibility of using the algorithm in a "generative mode", useful if the training dataset is biased.
... where M is the chain length, M eff is the effective number of samples, t sampling is the sampling wall-clock time, and ρ τ is the autocorrelation for the lag time τ (Neal, 1993). The speed-up factor is defined as the ESS/s achieved by Timewarp divided by the ESS/s achieved by MD. ...
Preprint
Molecular dynamics (MD) simulation is a widely used technique to simulate molecular systems, most commonly at the all-atom resolution where the equations of motion are integrated with timesteps on the order of femtoseconds ($1\textrm{fs}=10^{-15}\textrm{s}$). MD is often used to compute equilibrium properties, which requires sampling from an equilibrium distribution such as the Boltzmann distribution. However, many important processes, such as binding and folding, occur over timescales of milliseconds or beyond, and cannot be efficiently sampled with conventional MD. Furthermore, new MD simulations need to be performed from scratch for each molecular system studied. We present Timewarp, an enhanced sampling method which uses a normalising flow as a proposal distribution in a Markov chain Monte Carlo method targeting the Boltzmann distribution. The flow is trained offline on MD trajectories and learns to make large steps in time, simulating the molecular dynamics of $10^{5} - 10^{6}\:\textrm{fs}$. Crucially, Timewarp is transferable between molecular systems: once trained, we show that it generalises to unseen small peptides (2-4 amino acids), exploring their metastable states and providing wall-clock acceleration when sampling compared to standard MD. Our method constitutes an important step towards developing general, transferable algorithms for accelerating MD.
... An approximation to equation (11) must be developed since exact posterior inference is not feasible. Simulation techniques such as Markov Chain Monte Carlo (MCMC) [27] methods allow drawing a sequence of correlated samples that can be used to estimate the intractable integrals. However, considering the large number of parameters and the complex nature of the posterior distribution induced by the presence of the classifier function, traditional MCMC methods such as Metropolis and Gibbs sampling will struggle to converge to the target distribution. ...
Preprint
Full-text available
Counterfactual explanations utilize feature perturbations to analyze the outcome of an original decision and recommend an actionable recourse. We argue that it is beneficial to provide several alternative explanations rather than a single point solution and propose a probabilistic paradigm to estimate a diverse set of counterfactuals. Specifically, we treat the perturbations as random variables endowed with prior distribution functions. This allows sampling multiple counterfactuals from the posterior density, with the added benefit of incorporating inductive biases, preserving domain specific constraints and quantifying uncertainty in estimates. More importantly, we leverage Bayesian hierarchical modeling to share information across different subgroups of a population, which can both improve robustness and measure fairness. A gradient based sampler with superior convergence characteristics efficiently computes the posterior samples. Experiments across several datasets demonstrate that the counterfactuals estimated using our approach are valid, sparse, diverse and feasible.
... Dans ce cas, les méthodes par échantillonnage de type MCMC (Monte Carlo Markov Chain) sont plus adaptées. Leur Chapitre 1. Contexte théorique utilisation sort du cadre de cette thèse, mais le lecteur curieux pourra en trouver une description dans ce livre [47] qui leur est dédié. ...
Thesis
L'intelligence artificielle (IA) révolutionne déjà nos habitudes à travers des technologies permettant l'analyse, le filtrage et la classification d’une grande quantité de données qui ne peuvent être traitées par des algorithmes classiques de par leur complexité et leur incomplétude. Ainsi, il existe aujourd'hui une forte demande pour le matériel dédié à l'IA alors que les objets capables de capter, traiter et distribuer l'information sont de plus en plus contraints en énergie et que la puissance de calcul requise ne fait qu'augmenter. Dans ce contexte, l'approche Bayésienne semble une piste particulièrement intéressante pour réaliser des tâches de fusion d'information de manière explicable, avec peu de données et à faible coût énergétique. De plus, en réduisant les multiplications à une simple porte ET logique, l'arithmétique stochastique permet de réduire significativement la taille des opérateurs (et donc les coûts de fabrication) ainsi que leur consommation ce qui la rend particulièrement adaptée aux calcul d'inférences Bayésiennes. De premiers résultats montrent l'efficacité énergétique de cette approche stochastique lorsque la taille du problème à traiter est réduite et lorsque la précision requise est faible.Dans cette thèse nous proposons des solutions permettant d’adresser les principaux goulots d’étranglement lorsque la taille du problème augmente : l'espace mémoire nécessaire pour le stockage des distributions de probabilité, le coût en surface, et en puissance consommée pour générer les nombres aléatoires et le temps de calcul élevé dû à la logique stochastique, réduisant les performances énergétiques du système.Par rapport à l’état de l’art, ces contributions permettent de diviser jusqu’à 2 fois la surface et jusqu’à 30 fois la consommation énergétique, tout en divisant jusqu’à 8 fois le temps de calcul.
... Run infinitely long, it is guaranteed that the posterior mean of a parameter calculated from MCMC samples will be infinitesimally close to the true posterior mean. Given finite run lengths we must account for the resultant error to understand how precise our estimates are (Neal, 1993;Kass et al., 1998). The Markov chain central limit theorem establishes that the sampling distribution of a mean converges asymptotically to a normal distribution (Jones, 2004). ...
... While this model has only one random variable, more complex models are possible; however, the inference may not be analytically solvable, requiring approximations such as Monte Carlo Markov Chain (MCMC) sampling (Neal 1993). Such a generative model can make predictions that are PDFs (i.e., posterior distributions) themselves. ...
Article
Full-text available
Understanding the influence of configuration options on the performance of a software system is key for finding optimal system configurations, system understanding, and performance debugging. In the literature, a number of performance-influence modeling approaches have been proposed, which model a configuration option’s influence and a configuration’s performance as a scalar value. However, these point estimates falsely imply a certainty regarding an option’s influence that neglects several sources of uncertainty within the assessment process, such as (1) measurement bias, choices of model representation and learning process, and incomplete data. This leads to the situation that different approaches and even different learning runs assign different scalar performance values to options and interactions among them. The true influence is uncertain, though. There is no way to quantify this uncertainty with state-of-the-art performance modeling approaches. We propose a novel approach, P4, which is based on probabilistic programming, that explicitly models uncertainty for option influences and consequently provides a confidence interval for each prediction alongside a scalar. This way, we can explain, for the first time, why predictions may be erroneous and which option’s influence may be unreliable. An evaluation on 13 real-world subject systems shows that P4’s accuracy is in line with the state of the art while providing reliable confidence intervals, in addition to scalar predictions. We qualitatively explain how uncertain influences of individual options and interactions cause inaccurate predictions.
... The Markov Chain Monte Carlo process (Brooks, 1998;Brooks et al., 2011;Chung, 1960;Geyer, 1992;Gilks et al., 1995;Hastings, 1970;Kendall, 1953;Larget & Simon, 1999;Neal, 1993;Orey, 1991;Shumway, 1987;Winkler, 2012) is carried out for species concurrence and species occurrence in a carbonate facies using field data from systematic stratigraphic logs. The Markov Chain Monte The calculation requires data on the change in the number of species between beds, herein referred to as transitions between states in the Markov Chain Monte Carlo process. ...
Article
Full-text available
The evolutionary process of anagenesis is thought to control distributions of members of the Zaphrentites delanouei species group, the Tournaisian‐Viséan solitary rugose corals, within the Carboniferous Limestone. We present alternative interpretations specific to the Friars Point Limestone Formation, which point to composite evolutionary processes closer to anacladogenesis. We use a Markov Chain Monte Carlo process to organize fossils into matrices representing states of concurrence, where concurrence is the number of species coexisting in each bed. We analyse their distributions according to transitions between states. Data testing by matrix multiplication shows whether stochastic equilibrium or convergence is reached, to determine probabilities of species coexistence. Taking the probabilities from the Markov Chain Monte Carlo process to represent the first generation of a branching process, we proceed to calculate the second to fourth generations. Finally, we model these values in a Galton–Watson process to determine the likelihood of ultimate extinction, and whether the species belong to the same population without immigration or emigration. Results show that the species distribution is both anagenetic (0.725) and cladogenetic (0.275). Therefore, we define the evolutionary process as anacladogenetic with the potential for up to eight species in addition to the six defined in the literature. This represents some evidence for a population unaffected by immigration or emigration, with a high likelihood of ultimate extinction for most localities. We deduce that second or third‐generation concurrences are a requisite for survival, even with anacladogenesis. As an environmental corollary, the amplification of extinction rates was exacerbated within a regressive marine system, and our techniques will allow further exploration of evolutionary mechanisms and energy within coral ecosystems.
Article
Three-dimensional (3D) mineral prospectivity mapping (MPM) uses mathematical models to integrate different types of 3D data related to mineralization to obtain mineral prospectivity information in 3D space. Existing geological data contain known deposits, non-deposits and unknown ore-bearing data, corresponding to positive samples, negative samples and unlabeled samples respectively in MPM. Different sample combination types require different mathematical models. In this paper, support vector machine class (SVMC) machine learning method is selected to compare the influence of different sample combination types on prediction results. The SVMC is a one-class SVM (OCSVM) model based on positive-only samples, the SVM is based on both positive and negative samples, and the bagging-based positive-unlabeled learning algorithm with SVM base learner (BPUL-SVM) is based on both positive and unlabeled samples. The study area is in the Sanshandao-Cangshang offshore and onshore Au district, where there are Sanshandao, Cangshang and Xinli large- and super-large-scale Au deposits. Moreover, the discovery of large-scale Sea Au deposits in the sea area indicates the great potential for mineralization in the district. According to the metallogenic geological characteristics, the Au deposits in the Sanshandao-Cangshang district are controlled by the NE-striking fault and are closely related to the Linglong intrusions and Guojialing intrusions. The ore-bearing intrusion shows low density and low-moderate magnetic susceptibility. Because the Au orebodies hosted in the Sanshandao fault and its secondary faults, the NE-striking faults are key to delineating the targets. In this paper, weights of evidence (WofE), OCSVM, SVM and BPUL-SVM are used to MPM, and the prediction-area (P-A) plot method is used to delineate the targets. According to the ROC curve, F1 score and P-A plot evaluation methods, the model performance from high to low is BPUL-SVM13, SVM12, WofE and OCSVM. The BPUL-SVM model performance with samples combination types of positive samples and unlabeled samples was optimum in SVMC prediction models. The Markov chain Monte Carlo (MCMC) simulation and return-risk evaluation model are used to evaluate the return and risk of the targets and finally determine the I-level targets with high return and low risk. The delineated targets are mainly distributed along the F2 and F3 faults (Sanshandao-Cangshang fault). Combined with the mineralization regularity, the deep and periphery space of the known deposits are important to explore Au orebodies. The delineated targets are important to explore offshore and onshore Au orebodies in the Sanshandao-Cangshang district.
Article
Full-text available
Recent applications of machine-learned normalizing flows to sampling in lattice field theory suggest that such methods may be able to mitigate critical slowing down and topological freezing. However, these demonstrations have been at the scale of toy models, and it remains to be determined whether they can be applied to state-of-the-art lattice quantum chromodynamics calculations. Assessing the viability of sampling algorithms for lattice field theory at scale has traditionally been accomplished using simple cost scaling laws, but as we discuss in this work, their utility is limited for flow-based approaches. We conclude that flow-based approaches to sampling are better thought of as a broad family of algorithms with different scaling properties, and that scalability must be assessed experimentally.
Chapter
New applications are motivating and informing the design of sensor/actuator (S/A) networks, and, more broadly, distributed intelligent systems. Key to enhancing S/A networks is the Dynamic Data Driven Applications Systems (DDDAS) paradigm, characterized by the ability of an executing model of the system to ingest data and in turn steer the collection of new data. Knowledge of many physical systems is uncertain, so that sensing and actuation must be mediated by inference of the structure and parameters of physical-system models. One application domain of rapidly growing interest is ecological research and agricultural systems, motivated by the need to understand plant survival and growth as a function of genetics, environment, and climate. For these applications, cyber-eco systems must be developed that infer dynamic data-driven predictive models of plant growth dynamics in response to weather and climate drivers that allow incorporation of uncertainty. This chapter describes the cyber-eco systems algorithms and system architecture, including S/A node design, site-level networking, data assimilation, inference, and distributed control. Among the innovations are: a modular, parallel-processing node hardware design allowing real-time processing, energy-aware hardware/software design, and a networking protocol that builds in trade-offs between energy conservation and latency. The implementations presented in this chapter include experimental networks in an Eastern USA forest environment and an operational distributed system, the Southwest Experimental Garden Array (SEGA), consisting of geographically distributed outdoor gardens on an elevational gradient of over 1500 m in Arizona, USA. Finally, results demonstrate fine-scale inference of soil moisture for irrigation control.
Article
Full-text available
Numerical simulations have shown that finescale structures such as fronts are often suitable places for the generation of vertical velocities, transporting subsurface nutrients to the euphotic zone and thus modulating phytoplankton abundance and community structure. In these structures, direct in situ estimations of the phytoplankton growth rates are rare; although difficult to obtain, they provide precious information on the ecosystem functioning. Here, we consider the case of a front separating two water masses characterized by several phytoplankton groups with different abundances in the southwestern Mediterranean Sea. In order to estimate possible differences in growth rates, we measured the phytoplankton diurnal cycle in these two water masses as identified by an adaptive and Lagrangian sampling strategy. A size-structured population model was then applied to these data to estimate the growth and loss rates for each phytoplankton group identified by flow cytometry, showing that these two population parameters are significantly different on the two sides of the front and consistent with the relative abundances. Our results introduce a general method for estimating growth rates at frontal systems, paving the way for in situ exploration of finescale biophysical interactions.
Preprint
Astronomical cycles recorded in stratigraphic sequences offer a powerful data source to estimate Earth’s axial precession frequency k, as well as the frequency of rotation of the planetary perihelia (gi) and of the ascending nodes of their orbital planes (si). Together, these frequencies control the insolation cycles (eccentricity, obliquity and climatic precession) that affect climate and sedimentation, providing a geologic record of ancient Solar system behavior spanning billions of years. Here we introduce two Bayesian methods that harness stratigraphic data to quantitatively estimate ancient astronomical frequencies and their uncertainties. The first method (TimeOptB) calculates the posterior probability density function (PDF) of the axial precession frequency k and of the sedimentation rate u for a given cyclostratigraphic data set, while setting the Solar system frequencies gi and si to fixed values. The second method (TimeOptBMCMC) applies an adaptive Markov chain Monte Carlo algorithm to efficiently sample the posterior PDF of all the parameters that affect astronomical cycles recorded in stratigraphy: five gi, five si, k, and u. We also include an approach to assess the significance of detecting astronomical cycles in cyclostratigraphic records. The methods provide an extension of current approaches that is computationally efficient and well suited to recover the history of astronomical cycles, Earth-Moon history, and the evolution of the Solar system from geological records. As case studies, data from the Xiamaling Formation (N. China, 1.4 Ga) and ODP Site 1262 (S. Atlantic, 55 Ma) are evaluated, providing updated estimates of astronomical frequencies, Earth-Moon history, and secular resonance terms.
Chapter
The restricted Boltzmann machine (RBM) is a fundamentally different model from the feed-forward network. Conventional neural networks are input-output mapping networks where a set of inputs is mapped to a set of outputs. On the other hand, RBMs are networks in which the probabilistic states of a network are learned for a set of inputs, which is useful for unsupervised modeling.
Preprint
Recently, there has been a growing interest in the development of gradient-based sampling algorithms for text generation, especially in the context of controlled generation. However, there exists a lack of theoretically grounded and principled approaches for this task. In this paper, we take an important step toward building a principled approach for sampling from language models with gradient-based methods. We use discrete distributions given by language models to define densities and develop an algorithm based on Hamiltonian Monte Carlo to sample from them. We name our gradient-based technique Structured Voronoi Sampling (SVS). In an experimental setup where the reference distribution is known, we show that the empirical distribution of SVS samples is closer to the reference distribution compared to alternative sampling schemes. Furthermore, in a controlled generation task, SVS is able to generate fluent and diverse samples while following the control targets significantly better than other methods.
Chapter
The Cambridge Handbook of Computational Cognitive Sciences is a comprehensive reference for this rapidly developing and highly interdisciplinary field. Written with both newcomers and experts in mind, it provides an accessible introduction of paradigms, methodologies, approaches, and models, with ample detail and illustrated by examples. It should appeal to researchers and students working within the computational cognitive sciences, as well as those working in adjacent fields including philosophy, psychology, linguistics, anthropology, education, neuroscience, artificial intelligence, computer science, and more.
Chapter
Imputing missing data is still a challenge for mixed datasets containing variables of different nature such as continuous, count, ordinal, categorical, and binary variables. The recently introduced Mixed Deep Gaussian Mixture Models (MDGMM) explicitly handle such different variable types. MDGMMs learn continuous and low dimensional representations of mixed datasets that capture the inter-variable dependence structure. We propose a model inversion that uses the learned latent representation and maps it with the observed parts of the signal. Latent areas of interest are identified for each missing value using an optimization method and synthetic imputation values are drawn. This new procedure is called MI2AMI (Missing data Imputation using MIxed deep GAussian MIxture models). The approach is tested against state-of-the-art mixed data imputation algorithms based on chained equations, Random Forests, k-Nearest Neighbours, and Generative Adversarial Networks. Two missing values designs were tested, namely the Missing Completly at Random (MCAR) and Missing at Random (MAR) designs, with missing value rates ranging from 10% to 30%.KeywordsMissing dataMixed dataData augmentation
Chapter
We consider the approximation of unknown or intractable integrals using quadrature when the evaluation of the integrand is considered very costly. This is a central problem both within and without machine learning, including model averaging, (hyper-)parameter marginalization, and computing posterior predictive distributions. Recently, Batch Bayesian Quadrature has successfully combined the probabilistic integration techniques of Bayesian Quadrature with the parallelization techniques of Batch Bayesian Optimization, resulting in improved performance when compared to state-of-the-art Markov Chain Monte Carlo techniques, especially when parallelization is increased. While the selection of batches in Batch Bayesian Quadrature mitigates costs associated with individual point selection, every point within every batch is nevertheless chosen serially, which impedes the realization of the full potential of batch selection. We resolve this shortcoming. We have developed a novel Batch Bayesian Quadrature method that allows for the updating of points within a batch without incurring the costs traditionally associated with non-serial point selection. We show that our method efficiently reduces uncertainty, leads to lower error estimates of the integrand, and therefore results in more numerically robust estimates of the integral. We demonstrate our method and support our findings using a synthetic test function from the Batch Bayesian Quadrature literature.KeywordsBatch bayesian quadratureBatch updatingFuture uncertainty samplingIntractable integralsMachine learning
Article
The practical use of nonparametric Bayesian methods requires the availability of efficient algorithms for posterior inference. The inherently serial nature of traditional Markov chain Monte Carlo (MCMC) methods imposes limitations on their efficiency and scalability. In recent years, there has been a surge of research activity devoted to developing alternative implementation methods that target parallel computing environments. Sequential Monte Carlo (SMC), also known as a particle filter, has been gaining popularity due to its desirable properties. SMC uses a genetic mutation-selection sampling approach with a set of particles representing the posterior distribution of a stochastic process. We propose to enhance the performance of SMC by utilizing Hamiltonian transition dynamics in the particle transition phase, in place of random walk used in the previous literature. We call the resulting procedure Hamiltonian Sequential Monte Carlo (HSMC). Hamiltonian transition dynamics have been shown to yield superior mixing and convergence properties relative to random walk transition dynamics in the context of MCMC procedures. The rationale behind HSMC is to translate such gains to the SMC environment. HSMC will facilitate practical estimation of models with complicated latent structures, such as nonparametric individual unobserved heterogeneity, that are otherwise difficult to implement. We demonstrate the behavior of HSMC in a challenging simulation study and contrast its favorable performance with SMC and other alternative approaches. We then apply HSMC to a panel discrete choice model with nonparametric consumer heterogeneity, allowing for multiple modes, asymmetries, and data-driven clustering, providing insights for consumer segmentation, individual level marketing, and price micromanagement.
Article
API recommendation differs from conventional recommendation systems in that it integrates the characteristics of development requirements, mashup, and API, where there are two primary challenges, i.e., how to effectively mine the personalized development requirements of developers and the sparseness of interaction data. This research proposes a novel model framework, called Bayesian Probabilistic Matrix Factorization Model with Text Similarity and Adversarial Training (SAMF), to address these two issues. Utilizing natural language processing technology to extract the full-text semantics of requirements documents, mashup descriptions, and API descriptions, fully mining personalized development requirements, and contextual information of mashups and APIs, and calculating text similarity, is our fundamental idea. Simultaneously, the collaborative filtering method is used to mine user needs, mashup, and API information from historical data, and the obtained text similarity is added to the training as auxiliary information. Furthermore, adversarial training is further incorporated to supplement data in order to minimize data sparsity and enhance model robustness and generalization. To assess the performance of SAMF, we run thorough experiments that illustrate the efficacy of each module of the model and explain the influence of hyperparameter settings. Specifically, compared to the baseline, the experimental findings demonstrate that SAMF can basically achieve better performance.
Preprint
Full-text available
This paper aims to reevaluate the Taylor Rule, through a linear and a nonlinear method, such that its estimated federal funds rates match those actually previously implemented by the Federal Reserve Bank. In the linear method, this paper uses an OLS regression model to find more accurate coefficients within the same Taylor Rule equation in which the dependent variable is the federal funds rate, and the independent variables are the inflation rate, the inflation gap, and the output gap. The intercept in the OLS regression model would capture the constant equilibrium target real interest rate set at 2. The linear OLS method suggests that the Taylor Rule overestimates the output gap and standalone inflation rate's coefficients for the Taylor Rule. The coefficients this paper suggests are shown in equation (2). In the nonlinear method, this paper uses a machine learning system in which the two inputs are the inflation rate and the output gap and the output is the federal funds rate. This system utilizes gradient descent error minimization to create a model that minimizes the error between the estimated federal funds rate and the actual previously implemented federal funds rate. Since the machine learning system allows the model to capture the more realistic nonlinear relationship between the variables, it significantly increases the estimation accuracy as a result. The actual and estimated federal funds rates are almost identical besides three recessions caused by bubble bursts, which the paper addresses in the concluding remarks. Overall, the first method provides theoretical insight while the second suggests a model with improved applicability.
Article
Objectives To access the performances of different algorithms for quantification of Intravoxel incoherent motion (IVIM) parameters D, f, \(D^*\) in Vertebral Bone Marrow (VBM).Materials and methodsFive algorithms were studied: four deterministic algorithms (the One-Step and three segmented methods: Two-Step, Three-Step, and Fixed-\(D^*\) algorithm) based on the least-squares (LSQ) method and a Bayesian probabilistic algorithm. Numerical simulations and quantification of IVIM parameters D, f, \(D^*\) in vivo in vertebral bone marrow, were done on six healthy volunteers. The One-way repeated-measures analysis of variance (ANOVA) followed by Bonferroni’s multiple comparison test (p value = 0.05) was applied.ResultsIn numerical simulations, the Bayesian algorithm provided the best estimation of D, f, \(D^*\) compared to the deterministic algorithms. In vivo VBM–IVIM, the values of D and f estimated by the Bayesian algorithm were close to those of the One-Step method, in contrast to the three segmented methods.DiscussionThe comparison of the five algorithms indicates that the Bayesian algorithm provides the best estimation of VBM–IVIM parameters, in both numerical simulations and in vivo data.
ResearchGate has not been able to resolve any references for this publication.