Thesis

# Probabilistic Inference Using Markov Chain Monte Carlo Methods


... Quantifying prediction uncertainty associated with, for example, noisy and limited data as well as NN overparametrization, is paramount for deep learning to be reliably used in critical applications involving physical and biological systems. The most successful family of UQ methods so far in deep learning has been based on the Bayesian framework [26][27][28][29][30][31][32][33][34][35][36][37][38]. Alternative methods are based, indicatively, on ensembles of NN optimization iterates or independently trained NNs [39][40][41][42][43][44][45][46][47][48][49][50], as well as on the evidential framework [51][52][53][54][55][56][57][58][59]. ...
... Nevertheless, in this paper we use the 500 unseen samples of f and λ for producing the corresponding predictions of u and λ using the trained U-NNPC and U-NNPC+. Table 10 summarizes the performance of the considered UQ methods for predicting the mean and the standard deviation (std) of the partially unknown stochastic processes u(x; ξ) and λ(x; ξ) of Eq. (26). In addition, Fig. 27 presents the errors of the mean and standard deviation predictions of the UQ methods, as obtained by training with noisy stochastic realizations. ...
... Here we evaluate the accuracy (RL2E) of U-PI-GAN, U-NNPC, and U-NNPC+ in terms of mean and standard deviation (std) predictions corresponding to the partially unknown stochastic processes u(x; ξ) and λ(x; ξ) of Eq. (26). The training data consist of 1,000 clean or noisy realizations of f, u, and λ, while the test (reference) data we use to calculate RL2E are clean in all cases. ...
Preprint
Full-text available
Neural networks (NNs) are currently changing the computational paradigm on how to combine data with mathematical laws in physics and engineering in a profound way, tackling challenging inverse and ill-posed problems not solvable with traditional methods. However, quantifying errors and uncertainties in NN-based inference is more complicated than in traditional methods. This is because, in addition to aleatoric uncertainty associated with noisy data, there is also uncertainty due to limited data, NN hyperparameters, overparametrization, optimization and sampling errors, and model misspecification. Although there are some recent works on uncertainty quantification (UQ) in NNs, there is no systematic investigation of suitable methods towards quantifying the total uncertainty effectively and efficiently even for function approximation, and there is even less work on solving partial differential equations and learning operator mappings between infinite-dimensional function spaces using NNs. In this work, we present a comprehensive framework that includes uncertainty modeling, new and existing solution methods, as well as evaluation metrics and post-hoc improvement approaches. To demonstrate the applicability and reliability of our framework, we present an extensive comparative study in which various methods are tested on prototype problems, including problems with mixed input-output data, and stochastic problems in high dimensions. In the Appendix, we include a comprehensive description of all the UQ methods employed, which we will make available as an open-source library of all codes included in this framework.
... For instances where the target posterior is differentiable on the Euclidean manifold, Hamiltonian Monte Carlo (HMC) provides a powerful mechanism to sample from differentiable target posterior distributions [20,21]. HMC extends the parameter space into the phase space via the introduction of an auxiliary momentum variable which ensures that different energy levels are explored. ...
... HMC exploits the first-order gradient information of the target density to guide its exploration of the phase space. The use of gradient information reduces the random-walk behaviour typically associated with the Metropolis-Hastings algorithm [3,10,21]. HMC introduces the mass matrix for the momentum variable, trajectory length and step size parameters that need to be tuned for optimal results. Extensions of HMC include Riemannian Manifold Hamiltonian Monte Carlo (RMHMC) [1], the No-U-Turn Sampler (NUTS) [22] and Magnetic Hamiltonian Monte Carlo (MHMC) [6]. ...
... One of the parameters that needs to be set in HMC is the mass matrix of the auxiliary momentum variable. This mass matrix is typically set to equal the identity matrix [1,6,9,21]. Although this produces good results, it is not necessarily the optimal choice across all target distributions. ...
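The HMC ingredients named in these excerpts (an auxiliary momentum variable, gradient-guided trajectories, and an identity mass matrix) can be illustrated with a minimal one-dimensional sketch. This is not the cited authors' implementation: the function name `hmc_sample`, the standard-normal target, and the step size and trajectory length are all assumptions chosen for illustration.

```python
import math
import random


def hmc_sample(neg_log_p, grad_neg_log_p, x0, n_samples,
               step_size=0.2, n_leapfrog=20, seed=0):
    """Sample a 1-D target with HMC, assuming an identity (unit) mass matrix."""
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n_samples):
        p = rng.gauss(0.0, 1.0)  # fresh momentum drawn with unit mass
        x_new, p_new = x, p
        # Leapfrog integration: half momentum step, alternating full steps,
        # then a final half momentum step.
        p_new -= 0.5 * step_size * grad_neg_log_p(x_new)
        for step in range(n_leapfrog):
            x_new += step_size * p_new
            if step < n_leapfrog - 1:
                p_new -= step_size * grad_neg_log_p(x_new)
        p_new -= 0.5 * step_size * grad_neg_log_p(x_new)
        # Metropolis correction on the total energy H(x, p) = U(x) + p^2 / 2.
        h_old = neg_log_p(x) + 0.5 * p * p
        h_new = neg_log_p(x_new) + 0.5 * p_new * p_new
        if rng.random() < math.exp(min(0.0, h_old - h_new)):
            x = x_new
        samples.append(x)
    return samples


# Hypothetical target: standard normal, with potential U(x) = x^2 / 2.
hmc_draws = hmc_sample(lambda x: 0.5 * x * x, lambda x: x,
                       x0=0.0, n_samples=5000)
hmc_mean = sum(hmc_draws) / len(hmc_draws)
hmc_var = sum((s - hmc_mean) ** 2 for s in hmc_draws) / len(hmc_draws)
```

With these settings the sample mean and variance should land near 0 and 1. Replacing the unit mass with another value would rescale both the momentum draw and the kinetic energy term, which is the tuning question the excerpt raises.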
Article
Full-text available
Hamiltonian Monte Carlo (HMC) is a Markov Chain Monte Carlo algorithm that is able to generate distant proposals via the use of Hamiltonian dynamics, which are able to incorporate first-order gradient information about the target posterior. This has driven its rise in popularity in the machine learning community in recent times. It has been shown that making use of the energy-time uncertainty relation from quantum mechanics, one can devise an extension to HMC by allowing the mass matrix to be random with a probability distribution instead of a fixed mass. Furthermore, Magnetic Hamiltonian Monte Carlo (MHMC) has been recently proposed as an extension to HMC and adds a magnetic field to HMC which results in non-canonical dynamics associated with the movement of a particle under a magnetic field. In this work, we utilise the non-canonical dynamics of MHMC while allowing the mass matrix to be random to create the Quantum-Inspired Magnetic Hamiltonian Monte Carlo (QIMHMC) algorithm, which is shown to converge to the correct steady state distribution. Empirical results on a broad class of target posterior distributions show that the proposed method produces better sampling performance than HMC, MHMC and HMC with a random mass matrix.
... Erosheva, 2003; Pritchard et al., 2000). Variational inference transforms the posterior approximation into an optimization problem over simpler distributions with independent parameters (Jordan et al., 1999; Wainwright & Jordan, 2008; Blei et al., 2017), while Markov Chain Monte Carlo enables users to sample from the desired posterior distribution (Neal, 1993; Neal et al., 2011; Robert & Casella, 2013). However, these likelihood-based methods require numerous iterations without any guarantee beyond local improvement at each step (Kulesza et al., 2014). ...
... Instead of using Variational inference (Jordan et al., 1999; Wainwright & Jordan, 2008; Blei et al., 2017) or Markov Chain Monte Carlo (Neal, 1993; Neal et al., 2011; Robert & Casella, 2013), our new algorithms build upon the Joint-Stochastic Matrix Factorization (JSMF) (Lee et al., 2015). Let $H \in \mathbb{R}^{N \times M}$ be the word-document matrix whose $m$-th column vector $h_m$ counts the occurrences of each of the $N$ words in the vocabulary in document $m$. ...
Preprint
Full-text available
Across many data domains, co-occurrence statistics about the joint appearance of objects are powerfully informative. By transforming unsupervised learning problems into decompositions of co-occurrence statistics, spectral algorithms provide transparent and efficient algorithms for posterior inference such as latent topic analysis and community detection. As object vocabularies grow, however, it becomes rapidly more expensive to store and run inference algorithms on co-occurrence statistics. Rectifying co-occurrence, the key process to uphold model assumptions, becomes increasingly more vital in the presence of rare terms, but current techniques cannot scale to large vocabularies. We propose novel methods that simultaneously compress and rectify co-occurrence statistics, scaling gracefully with the size of vocabulary and the dimension of latent space. We also present new algorithms learning latent variables from the compressed statistics, and verify that our methods perform comparably to previous approaches on both textual and non-textual data.
... There are many successful sampling algorithms [31,2,11,32]. One classical class of sampling approaches is the celebrated Markov chain Monte Carlo (MCMC) [32,35,21,12,20]. This is a class of methods that sets the target distribution as the invariant measure of the Markov transition kernel, so after many rounds of iteration, the sample can be viewed as drawn from the invariant measure. ...
Article
The classical Langevin Monte Carlo method looks for samples from a target distribution by descending the samples along the gradient of the target distribution. The method enjoys a fast convergence rate. However, the numerical cost is sometimes high because each iteration requires the computation of a gradient. One approach to eliminate the gradient computation is to employ the concept of "ensemble." A large number of particles are evolved together so the neighboring particles provide gradient information to each other. In this article, we discuss two algorithms that integrate the ensemble feature into LMC, and the associated properties. In particular, we find that if one directly surrogates the gradient using the ensemble approximation, the algorithm, termed Ensemble Langevin Monte Carlo, is unstable due to a high variance term. If the gradients are replaced by the ensemble approximations only in a constrained manner, to protect from the unstable points, the algorithm, termed Constrained Ensemble Langevin Monte Carlo, resembles the classical LMC up to an ensemble error but removes most of the gradient computation.
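For orientation, the classical (unadjusted) Langevin Monte Carlo update that this abstract takes as its baseline can be sketched as below. The sketch covers only the gradient-based baseline, not the ensemble variants proposed in the article. It assumes the gradient is that of the negative log-density U, and the name `langevin_monte_carlo`, the standard-normal target and the step size are illustrative choices.

```python
import math
import random


def langevin_monte_carlo(grad_u, x0, n_steps, step_size=0.05, seed=0):
    """Unadjusted Langevin iteration for a 1-D target proportional to exp(-U)."""
    rng = random.Random(seed)
    x = x0
    noise_scale = math.sqrt(2.0 * step_size)
    samples = []
    for _ in range(n_steps):
        # Gradient descent step on U plus injected Gaussian noise.
        x = x - step_size * grad_u(x) + noise_scale * rng.gauss(0.0, 1.0)
        samples.append(x)
    return samples


# Hypothetical target: standard normal, U(x) = x^2 / 2, so grad U(x) = x.
lmc_draws = langevin_monte_carlo(lambda x: x, x0=0.0, n_steps=50000)
lmc_mean = sum(lmc_draws) / len(lmc_draws)
lmc_var = sum((s - lmc_mean) ** 2 for s in lmc_draws) / len(lmc_draws)
```

Each iteration needs one gradient evaluation, which is the cost the ensemble approach aims to reduce. Without a Metropolis correction the chain carries a small discretisation bias: for this target the stationary variance is 1 / (1 - step_size / 2) rather than exactly 1.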
... Chain Monte Carlo (Neal, 1993), Sequential Monte Carlo (Doucet, Freitas, and Gordon, 2001)), that are sampling methods (ii) Variational Inference methods (e.g. Variational Bayes (Jordan et al., 1999), Expectation Propagation (Minka, 2001)), that rely on optimisation techniques. ...
... These mainly fall into two broad categories: (i) Monte Carlo methods (e.g. adaptive importance sampling methods (Oh and Berger, 1992), Markov chain Monte Carlo methods (Neal, 1993), sequential Monte Carlo methods (Doucet, Freitas, and Gordon, 2001)), which are sampling methods, and (ii) Variational Inference methods (e.g. the Variational Bayes algorithm (Jordan et al., 1999) and the Expectation Propagation algorithm (Minka, 2001)), which rely on optimisation techniques. As things stand, Variational Inference methods are often favoured because of their numerical advantages. ...
Thesis
This thesis lies in the field of Statistical Inference and more precisely in Bayesian Inference, where the goal is to model a phenomenon given some data while taking into account prior knowledge on the model parameters. The availability of large datasets sparked the interest in using complex models for Bayesian Inference tasks that are able to capture potentially complicated structures inside the data. Such a context requires the development and study of adaptive algorithms that can efficiently process large volumes of data when the dimension of the model parameters is high. Two main classes of methods attempt to fulfil this role: sampling-based Monte Carlo methods and optimisation-based Variational Inference methods. By relying on the optimisation literature and more recently on Monte Carlo methods, the latter have made it possible to construct fast algorithms that overcome some of the computational hurdles encountered in Bayesian Inference. Yet, the theoretical results and empirical performances of Variational Inference methods are often impacted by two factors: one, an inappropriate choice of the objective function appearing in the optimisation problem and two, a search space that is too restrictive to match the target at the end of the optimisation procedure. This thesis explores how we can remedy the two issues mentioned above in order to build improved adaptive algorithms for complex models at the intersection of Monte Carlo and Variational Inference methods. In our work, we suggest selecting the $\alpha$-divergence as a more general class of objective functions and we propose several ways to enlarge the search space beyond the traditional framework used in Variational Inference. The specificity of our approach in this thesis is then that it derives numerically advantageous adaptive algorithms with strong theoretical foundations, in the sense that they provably ensure a systematic decrease in the $\alpha$-divergence at each step.
In addition, we unravel important connections between the sampling-based and the optimisation-based methodologies.
... Markov chain Monte Carlo (MCMC) is one of the most powerful approaches to sample from complex target distributions in statistical pattern recognition and Bayesian machine learning. It has been widely employed in probabilistic modeling and inference [1,2]. MCMC methods approximate target distributions by generating samples from a proposal distribution depending on the last sample, and ensure that the samples converge to the target distribution by satisfying the detailed balance [3]. ...
... MCMC methods [2] aim to construct an ergodic Markov chain converging to $p(x)$ under a target density $p(x) = \tilde{p}(x)/Z_p$, where $\tilde{p}(x)$ can be readily evaluated and $Z_p$ is an unknown constant. At each step of the algorithm, the new sample $x'$ is obtained from the transition kernel (or proposal distribution) $K_\theta(x' \mid x)$ which depends on the current state $x$. ...
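The setting in this excerpt, a target known only up to its normalising constant and a proposal that depends on the current state, is the one handled by random-walk Metropolis-Hastings. The sketch below is a generic textbook instance rather than the neural transition kernel of the cited article. The name `metropolis_hastings`, the Gaussian proposal and the shifted-normal target are assumptions made for illustration.

```python
import math
import random


def metropolis_hastings(log_p_tilde, x0, n_samples, proposal_std=1.0, seed=0):
    """Random-walk Metropolis-Hastings using only an unnormalised log-density."""
    rng = random.Random(seed)
    x = x0
    log_px = log_p_tilde(x)
    samples = []
    for _ in range(n_samples):
        # Symmetric Gaussian proposal centred on the current state.
        x_prop = x + rng.gauss(0.0, proposal_std)
        log_pprop = log_p_tilde(x_prop)
        # The unknown normalising constant cancels in the acceptance ratio.
        if rng.random() < math.exp(min(0.0, log_pprop - log_px)):
            x, log_px = x_prop, log_pprop
        samples.append(x)
    return samples


# Hypothetical target: normal with mean 2 and unit variance, given only
# through its unnormalised log-density.
mh_draws = metropolis_hastings(lambda x: -0.5 * (x - 2.0) ** 2,
                               x0=0.0, n_samples=20000, proposal_std=2.0)
mh_kept = mh_draws[1000:]  # drop an initial burn-in segment
mh_mean = sum(mh_kept) / len(mh_kept)
mh_var = sum((s - mh_mean) ** 2 for s in mh_kept) / len(mh_kept)
```

Because the proposal is symmetric, the acceptance probability reduces to the ratio of unnormalised densities, and that acceptance rule is what makes the chain satisfy detailed balance with respect to the target.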
Article
Full-text available
Recently, flow models parameterized by neural networks have been used to design efficient Markov chain Monte Carlo (MCMC) transition kernels. However, inefficient utilization of gradient information of the target distribution or the use of volume-preserving flows limits their performance in sampling from multi-modal target distributions. In this paper, we treat the training procedure of the parameterized transition kernels in a different manner and exploit a novel scheme to train MCMC transition kernels. We divide the training process of transition kernels into the exploration stage and training stage, which can make full use of the gradient information of the target distribution and the expressive power of deep neural networks. The transition kernels are constructed with non-volume-preserving flows and trained in an adversarial form. The proposed method achieves significant improvement in effective sample size and mixes quickly to the target distribution. Empirical results validate that the proposed method is able to achieve low autocorrelation of samples and fast convergence rates, and outperforms other state-of-the-art parameterized transition kernels in varieties of challenging analytically described distributions and real world datasets.
... The above Gaussian mixture model with EM algorithm for parameter optimisation is widely used in unsupervised classification. The maximum likelihood estimation can also be achieved by the Markov chain Monte Carlo method (Neal 1993; Richardson and Green 1997) in a fully Bayesian flavour, to find a global solution instead of a local minimum as in the EM algorithm. But MCMC is computationally expensive and thus is less used in unsupervised classification. ...
... Many of these are reviewed in Ruanaidh and Fitzgerald (1996) and Neal (1993). Barker and Rayner (2000) proposed a reversible jump Markov chain Monte Carlo method for image segmentation, enabling the sampling to include the cluster number. ...
Thesis
This thesis addresses the automatic extraction of a reference tissue region, devoid of receptor sites, which can then be used as an input for a reference tissue model, allowing for the quantification of receptor sites. It is shown that this segmentation can be determined from the time-activity curves associated with each voxel within the 3D volume, using modern machine learning methods. Previously, supervised learning techniques have not been considered in PET reference region extraction. In this thesis, two new methods are proposed to incorporate expert knowledge and the image models with the data: a hierarchical method and a semi-supervised image segmentation framework. Markov random field (MRF) models are used as a stochastic image model to specify the spatial interactions. The first method uses a Bayesian neural network with a hierarchical Markov random field model. The second method advances the first method by employing a semi-supervised image segmentation framework to combine the fidelity of supervised data with the quantity of unsupervised data. This is realised by a three-level image model structure with probability distributions specifying the interconnections. This benefits generalisation performance and hence reduces bias in PET reference region extraction. An Expectation Maximisation based algorithm is proposed to solve this combined learning problem. The performance of unsupervised, supervised and semi-supervised classification in temporal models and spatio-temporal models is compared, using both simulated and [¹¹C](R)-PK11195 PET data. In conclusion, the inclusion of expert knowledge greatly reduces the uncertainty in the segmentation, with the new semi-supervised framework achieving substantial performance gains over the other methods.
... The theory of Markov chains provides us with a general formalism in order to evaluate sampling algorithms, in terms of consistency of the samples w.r.t. the desired sampled distribution (Neal, 1993). Starting from an initial probability distribution denoted by $\pi^{(0)}(x)$ the Markov chain simulator constructs a probability distribution at time $t$: $\pi^{(t)}(x)$. ...
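The evolution from an initial distribution to the distribution at time t can be made concrete on a finite state space, where one step is a vector-matrix product. The sketch below is a minimal illustration. The two-state transition matrix and the function names `step_distribution` and `evolve` are hypothetical and do not come from the cited thesis.

```python
def step_distribution(pi, transition):
    """One step of the chain: pi_next[j] = sum_i pi[i] * transition[i][j]."""
    n = len(transition)
    return [sum(pi[i] * transition[i][j] for i in range(n)) for j in range(n)]


def evolve(pi0, transition, t):
    """Distribution at time t, starting from the initial distribution pi0."""
    pi = list(pi0)
    for _ in range(t):
        pi = step_distribution(pi, transition)
    return pi


# Hypothetical two-state chain; each row is a conditional distribution.
transition = [[0.9, 0.1],
              [0.5, 0.5]]
pi_t = evolve([0.0, 1.0], transition, 50)
```

For this matrix the invariant distribution is (5/6, 1/6), and `pi_t` approaches it from any starting distribution because the chain is irreducible and aperiodic. A sampler is consistent in the sense of the excerpt when this limit is the desired target.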
... A complete review of Markov Chain Monte Carlo methods is beyond the scope of the thesis, however for more details on MCMC methods we highly recommend the book by MacKay, 2003 and the seminal papers by Hastings (1970) and Neal (1993 ...
Thesis
For any Internet service provider or network operator, it is crucial to quickly and efficiently diagnose the problems that occur on the network. The benefits of a good fault diagnosis system are mainly to minimize the costs of network and service operations and to enhance the customer's quality of experience. One major challenge for any diagnosis system concerns the discovery of new faults that are unknown to the current version of the diagnosis system. The exploratory process for finding new faults can prove to be expensive and time consuming for internet service providers. In this thesis, we explore an alternative approach based on learning methods, in order to build learning-based diagnosis systems. Our study explores Probabilistic Graphical Models that are capable of clustering patterns of faults in an unsupervised and a semi-supervised manner. We demonstrate the efficiency of our models on real use-cases of large scale data, extracted from Fiber-to-the-Home (FTTH) services based on Gigabit-capable Passive Optical Networks.
... The behaviour of the system at this moment in time $t_{n+1}$ is described by the transition probability matrix. Knowing the final probability matrix, the "limiting" behaviour of the system under consideration can be predicted [3,13,14]. ...
... Equation (14) characterises a possible strategy of the system. To assess the adequacy (equality) of applying the chosen behaviour strategy, a cost (income) matrix of the following form is constructed: ...
Conference Paper
Full-text available
Machines are exposed to loads and environmental influences, which over time lead to deformations and damage. Predicting system failure is important for choosing the appropriate way to maintain the system and to plan major interventions. Predictions of system maintenance can be realized by analyzing Markov processes. This paper deals with the analysis of the prediction of the monitoring of the state of machine systems at the level of maintenance of bearing assemblies, with the help of the optimal strategy of transition graphs. The paper describes methods for determining the best maintenance strategy and the selected optimization, and gives recommendations for their use.
... We adopt the MCMC algorithm as a straightforward and plausible approach for the effective inference of the posterior distributions on the parameters constructed; see [55,56]. MCMC works by drawing a sequence of dependent samples from the posterior distribution in circumstances where a direct draw of independent samples is impractical. ...
Article
Full-text available
The COVID-19 pandemic has highlighted the necessity of advanced modeling inference using the limited data of daily cases. Tracking a long-term epidemic trajectory requires explanatory modeling with more complexities than the one with short-time forecasts, especially for the highly vaccinated scenario in the latest phase. With this work, we propose a novel modeling framework that combines an epidemiological model with Bayesian inference to perform an explanatory analysis on the spreading of COVID-19 in Israel. The Bayesian inference is implemented on a modified SEIR compartmental model supplemented by real-time vaccination data and piecewise transmission and infectious rates determined by change points. We illustrate the fitted multi-wave trajectory in Israel with the checkpoints of major changes in publicly announced interventions or critical social events. The result of our modeling framework partly reflects the impact of different stages of mitigation strategies as well as the vaccination effectiveness, and provides forecasts of near future scenarios.
... Monte Carlo methods utilize sampling to approximate the solutions to problems that are intractable to solve analytically or with numerical methods that scale poorly for high-dimensional problems. Classical Monte Carlo methods have been used for inference [146], integration [22], and optimization [147]. Monte Carlo integration (MCI), the focus of this section, is critical to finance for risk and pricing predictions [27]. ...
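As a small illustration of the Monte Carlo integration mentioned in this excerpt, the sketch below estimates a one-dimensional integral by averaging the integrand at uniformly drawn points. It is a classical estimator, not the quantum algorithm surveyed in the cited paper. The function name `mc_integrate` and the integrand are assumptions chosen because the exact answer is known.

```python
import random


def mc_integrate(f, a, b, n_samples, seed=0):
    """Estimate the integral of f over [a, b] as (b - a) times the sample mean."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        total += f(rng.uniform(a, b))
    return (b - a) * total / n_samples


# Hypothetical integrand: x^2 on [0, 1], whose exact integral is 1/3.
mc_estimate = mc_integrate(lambda x: x * x, 0.0, 1.0, 100000)
```

The statistical error of this estimator shrinks like one over the square root of the number of samples, independently of the dimension of the integration domain, which is why sampling is attractive for high-dimensional problems.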
Preprint
Full-text available
Quantum computers are expected to surpass the computational capabilities of classical computers during this decade and have transformative impact on numerous industry sectors, particularly finance. In fact, finance is estimated to be the first industry sector to benefit from quantum computing, not only in the medium and long terms, but even in the short term. This survey paper presents a comprehensive summary of the state of the art of quantum computing for financial applications, with particular emphasis on Monte Carlo integration, optimization, and machine learning, showing how these solutions, adapted to work on a quantum computer, can help solve more efficiently and accurately problems such as derivative pricing, risk analysis, portfolio optimization, natural language processing, and fraud detection. We also discuss the feasibility of these algorithms on near-term quantum computers with various hardware implementations and demonstrate how they relate to a wide range of use cases in finance. We hope this article will not only serve as a reference for academic researchers and industry practitioners but also inspire new ideas for future research.
... The challenge is to infer the posterior of the latent parameters given the data that were actually observed: p(P u , S u | A u ). For all but a handful of conjugate models, the posterior is intractable to derive analytically [5,11,25]. Rather, to infer the underlying parameters in the latent space, we use amortised variational inference [17,18,26]. Amortised inference uses a neural network to encode a data point into the latent parameters that are associated with its approximate posterior distribution. ...
Article
Full-text available
Virtual rewards, such as badges, are commonly used in online platforms as incentives for promoting contributions from a userbase. It is widely accepted that such rewards “steer” people’s behaviour towards increasing their rate of contributions before obtaining the reward. This paper provides a new probabilistic model of user behaviour in the presence of threshold rewards, such as badges. We find, surprisingly, that while steering does affect a minority of the population, the majority of users do not change their behaviour around the achievement of these virtual rewards. In particular, we find that only approximately 5–30% of Stack Overflow users who achieve the rewards appear to respond to the incentives. This result is based on the analysis of thousands of users’ activity patterns before and after they achieve the reward. Our conclusion is that the phenomenon of steering is less common than has previously been claimed. We identify a statistical phenomenon, termed “Phantom Steering”, that can account for the interaction data of the users who do not respond to the reward. The presence of phantom steering may have contributed to some previous conclusions about the ubiquity of steering. We conduct a qualitative survey of the users on Stack Overflow which supports our results, suggesting that the motivating factors behind user behaviour are complex, and that some of the online incentives used in Stack Overflow may not be solely responsible for changes in users’ contribution rates.
... However, a full Bayesian estimation over all network parameters is computationally expensive and finding true posterior probability is intractable. These limitations are normally addressed by employing various tricks like Markov Chain Monte Carlo (MCMC) sampling [188] and Variational Inference (VI) [133], or a combination of the two [209], to approximate the true posterior with a manageable distribution. A CNN trained using Bayesian estimates for network parameters is shown to lag its counterpart, trained using point-estimates, in terms of classification accuracy [218,219]. ...
... Nevertheless, the problem of computing the partition function of a physical system generally belongs to the #P-hard complexity class [16,17]. For example, the Markov Chain Monte Carlo (MCMC) method [12][13][14][15][16] provides an approach to sampling from high-dimensional probability distributions. This method can be used to approximate partition functions with $O(\Delta^{-1})$ sampling complexity, where $\Delta$ represents the spectral gap ...
Preprint
The partition function is an essential quantity in statistical mechanics, and its accurate computation is a key component of any statistical analysis of quantum system and phenomenon. However, for interacting many-body quantum systems, its calculation generally involves summing over an exponential number of terms and can thus quickly grow to be intractable. Accurately and efficiently estimating the partition function of its corresponding system Hamiltonian then becomes the key in solving quantum many-body problems. In this paper we develop a hybrid quantum-classical algorithm to estimate the partition function, utilising a novel Clifford sampling technique. Note that previous works on quantum estimation of partition functions require $\mathcal{O}(1/\epsilon\sqrt{\Delta})$-depth quantum circuits~\cite{Arunachalam2020Gibbs, Ashley2015Gibbs}, where $\Delta$ is the minimum spectral gap of stochastic matrices and $\epsilon$ is the multiplicative error. Our algorithm requires only a shallow $\mathcal{O}(1)$-depth quantum circuit, repeated $\mathcal{O}(1/\epsilon^2)$ times, to provide a comparable $\epsilon$ approximation. Shallow-depth quantum circuits are considered vitally important for currently available NISQ (Noisy Intermediate-Scale Quantum) devices.
... The most well-known topic model is the Latent Dirichlet Allocation (LDA) (Blei et al., 2003b), a generative probabilistic model that relies on co-occurrence patterns between observed words to compute latent topics. The inference step of LDA is commonly based on approximation methods such as variational inference or collapsed Gibbs sampling, due to the intractability of exact inference at scale (Neal, 1993). ...
... Unfortunately, this algorithm is not applicable for complex graphical models which have many complicated tasks. Therefore, some approximation algorithms have been developed to resolve this problem: Expectation Maximization (Dempster, Laird, & Rubin, 1977), Laplace Approximations, Expectation propagation (Minka, 2001), Monte Carlo Markov Chain (Neal, 1993), and Variational Inference (Blei, Kucukelbir, & McAuliffe, 2017). Furthermore, the designed probabilistic dependencies between variables may also not be fixed but learned from data (so-called structure learning). ...
Preprint
Full-text available
Transfer learning, in which transferable knowledge is extracted from the source domain(s) and reused in the target domain, has become a research area of great interest in the field of artificial intelligence. Probabilistic graphical models (PGMs) have been recognized as a powerful tool for modeling complex systems with many advantages, e.g., the ability to handle uncertainty and possessing good interpretability. Considering the success of these two aforementioned research areas, it seems natural to apply PGMs to transfer learning. However, although there are already some excellent PGMs specific to transfer learning in the literature, the potential of PGMs for this problem is still grossly underestimated. This paper aims to boost the development of PGMs for transfer learning by 1) examining the pilot studies on PGMs specific to transfer learning, i.e., analyzing and summarizing the existing mechanisms particularly designed for knowledge transfer; 2) discussing examples of real-world transfer problems where existing PGMs have been successfully applied; and 3) exploring several potential research directions on transfer learning using PGMs.
... The proposed HIMP imputes missing data with MNAR patterns and stores the results, and then it decomposes the results into two datasets $D_{MCAR}$ and $D_{MAR}$ including missing data with MCAR and MAR patterns, respectively. Next, $D_{MCAR}$ is imputed using the single imputation methods K-nearest neighbor (KNN) [65] and hot-deck [66], while $D_{MAR}$ is imputed using three multiple imputation methods: Markov chain Monte Carlo (MCMC) [67][68][69], multivariate imputation by chained equations (MICE) [70,71] and expectation maximization (EM) [72]. In this step, the imputed values estimated by each method are assessed using different classifiers to determine the winning imputation methods and their $D_{MCAR}$ and $D_{MAR}$ datasets. ...
Article
Full-text available
Real medical datasets usually consist of missing data with different patterns which decrease the performance of classifiers used in intelligent healthcare and disease diagnosis systems. Many methods have been proposed to impute missing data, however, they do not fulfill the need for data quality especially in real datasets with different missing data patterns. In this paper, a four-layer model is introduced, and then a hybrid imputation (HIMP) method using this model is proposed to impute multi-pattern missing data including non-random, random, and completely random patterns. In HIMP, first, non-random missing data patterns are imputed, and then the obtained dataset is decomposed into two datasets containing random and completely random missing data patterns. Then, concerning the missing data patterns in each dataset, different single or multiple imputation methods are used. Finally, the best-imputed datasets gained from random and completely random patterns are merged to form the final dataset. The experimental evaluation was conducted by a real dataset named IRDia including all three missing data patterns. The proposed method and comparative methods were compared using different classifiers in terms of accuracy, precision, recall, and F1-score. The classifiers’ performances show that the HIMP can impute multi-pattern missing values more effectively than other comparative methods.
... In recent years, research has widely focused on the computational aspects of the Bayesian inverse method [26]. The brute force approach is represented by the Markov Chain Monte Carlo (MCMC) method [27][28][29][30]. This approach has the advantage of being model-independent, but it requires a huge number of model simulations. ...
Article
Full-text available
In civil and mechanical engineering, Bayesian inverse methods may serve to calibrate the uncertain input parameters of a structural model given the measurements of the outputs. Through such a Bayesian framework, a probabilistic description of parameters to be calibrated can be obtained; this approach is more informative than a deterministic local minimum point derived from a classical optimization problem. In addition, building a response surface surrogate model could allow one to overcome computational difficulties. Here, the general polynomial chaos expansion (gPCE) theory is adopted with this objective in mind. Owing to the fact that the ability of these methods to identify uncertain inputs depends on several factors linked to the model under investigation, as well as the experiment carried out, the understanding of results is not univocal, often leading to doubtful conclusions. In this paper, the performances and the limitations of three gPCE-based stochastic inverse methods are compared: the Markov Chain Monte Carlo (MCMC), the polynomial chaos expansion-based Kalman Filter (PCE-KF) and a method based on the minimum mean square error (MMSE). Each method is tested on a benchmark comprised of seven models: four analytical abstract models, a one-dimensional static model, a one-dimensional dynamic model and a finite element (FE) model. The benchmark allows the exploration of relevant aspects of problems usually encountered in civil, bridge and infrastructure engineering, highlighting how the degree of non-linearity of the model, the magnitude of the prior uncertainties, the number of random variables characterizing the model, the information content of measurements and the measurement error affect the performance of Bayesian updating.
The intention of this paper is to highlight the capabilities and limitations of each method, as well as to promote their critical application to complex case studies in the wider field of smarter and more informed infrastructure systems.
... When fitting parameters of a probabilistic program, such techniques require the program to be explicitly specified as a probabilistic program. The parameters are then optimized using inference techniques like Monte Carlo inference [63] and variational inference [11]. In contrast, when optimizing parameters with surrogate optimization the original program can be specified in any form, while the parameters are optimized with stochastic gradient descent. ...
Preprint
Full-text available
Surrogates, models that mimic the behavior of programs, form the basis of a variety of development workflows. We study three surrogate-based design patterns, evaluating each in case studies on a large-scale CPU simulator. With surrogate compilation, programmers develop a surrogate that mimics the behavior of a program to deploy to end-users in place of the original program. Surrogate compilation accelerates the CPU simulator under study by $1.6\times$. With surrogate adaptation, programmers develop a surrogate of a program then retrain that surrogate on a different task. Surrogate adaptation decreases the simulator's error by up to $50\%$. With surrogate optimization, programmers develop a surrogate of a program, optimize input parameters of the surrogate, then plug the optimized input parameters back into the original program. Surrogate optimization finds simulation parameters that decrease the simulator's error by $5\%$ compared to the error induced by expert-set parameters. In this paper we formalize this taxonomy of surrogate-based design patterns. We further describe the programming methodology common to all three design patterns. Our work builds a foundation for the emerging class of workflows based on programming with surrogates of programs.
... Although several probabilistic topic models have been proposed to extract the hierarchical topic structure of a corpus [3,12], the Markov chain Monte Carlo (MCMC) method [25] they employed for inference is quite time-consuming and is impractical to train for a large-scale dataset. Recently, TSNTM [11] was developed to model the topic hierarchy based on the neural variational inference (NVI) framework with good scalability, but the topic hierarchy extracted by TSNTM is not reasonable enough because the DRNN it applied is unsuitable to discover hierarchical semantics. ...
Article
Full-text available
Topic models have been widely used for learning the latent explainable representation of documents, but most of the existing approaches discover topics in a flat structure. In this study, we propose an effective hierarchical neural topic model with strong interpretability. Unlike the previous neural topic models, we explicitly model the dependency between layers of a network, and then combine latent variables of different layers to reconstruct documents. Utilizing this network structure, our model can extract a tree-shaped topic hierarchy with low redundancy and good explainability by exploiting dependency matrices. Furthermore, we introduce manifold regularization into the proposed method to improve the robustness of topic modeling. Experiments on real-world datasets validate that our model outperforms other topic models in several widely used metrics with much fewer computation costs.
... From an uncertainty point of view, the transition PDFs (from prior to posterior) are often associated with monotonically decreasing uncertainty. Therefore, TMCMC can be classified as an annealing algorithm [45,46] that starts with an initial "hot" state (with greater uncertainty) and ends at a final "cold" solution (lower uncertainty). The samples are transitioned from one state to the next using sampling importance resampling (SIR) [40], whereby the resampling is achieved via multiple MCMC chains (one per unique sample) that aim to infuse diversity in the samples (ensuring that the ensemble is distributed according to the intermediate PDF associated with that particular stage). ...
Preprint
Full-text available
In the context of Bayesian inversion for scientific and engineering modeling, Markov chain Monte Carlo sampling strategies are the benchmark due to their flexibility and robustness in dealing with arbitrary posterior probability density functions (PDFs). However, these algorithms have been shown to be inefficient when sampling from posterior distributions that are high-dimensional or exhibit multi-modality and/or strong parameter correlations. In such contexts, the sequential Monte Carlo technique of transitional Markov chain Monte Carlo (TMCMC) provides a more efficient alternative. Despite the recent applicability for Bayesian updating and model selection across a variety of disciplines, TMCMC may require a prohibitive number of tempering stages when the prior PDF is significantly different from the target posterior. Furthermore, the need to start with an initial set of samples from the prior distribution may present a challenge when dealing with implicit priors, e.g. based on feasible regions. Finally, TMCMC cannot be used for inverse problems with improper prior PDFs that represent lack of prior knowledge on all or a subset of parameters. In this investigation, a generalization of TMCMC that alleviates such challenges and limitations is proposed, resulting in a tempering sampling strategy of enhanced robustness and computational efficiency. Convergence analysis of the proposed sequential Monte Carlo algorithm is presented, proving that the distance between the intermediate distributions and the target posterior distribution monotonically decreases as the algorithm proceeds. The enhanced efficiency associated with the proposed generalization is highlighted through a series of test inverse problems and an engineering application in the oil and gas industry.
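The transition between tempering stages described in the excerpt above can be illustrated with a minimal sketch of one sampling importance resampling step. It assumes the common tempered form in which stage j targets the likelihood raised to an exponent beta_j times the prior; the function name `sir_stage`, the toy Gaussian model and all parameter values are hypothetical, and the MCMC chains that follow the resampling are left out.

```python
import numpy as np

def sir_stage(samples, log_like, beta_old, beta_new, rng):
    """Reweight and resample when moving from exponent beta_old to beta_new.

    samples  : array of shape (n, d), draws from the current intermediate PDF
    log_like : array of shape (n,), log-likelihood of each sample
    """
    # Importance weights: ratio of consecutive tempered likelihoods.
    log_w = (beta_new - beta_old) * log_like
    w = np.exp(log_w - log_w.max())  # subtract the max for numerical stability
    w /= w.sum()
    # Resample with replacement in proportion to the weights.
    idx = rng.choice(len(samples), size=len(samples), p=w)
    return samples[idx], w

# Toy check (hypothetical model): prior N(0, 3^2), one datum y = 2 with unit noise.
rng = np.random.default_rng(0)
prior_draws = rng.normal(0.0, 3.0, size=(20000, 1))
log_like = -0.5 * (2.0 - prior_draws[:, 0]) ** 2
resampled, weights = sir_stage(prior_draws, log_like, 0.0, 1.0, rng)
```

Jumping from exponent 0 to 1 in a single stage is done here only to keep the example short; in practice the exponent is raised gradually, and the resampled set contains duplicates that the subsequent MCMC moves are meant to diversify.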
... These methods are particularly well suited for studying systems at critical and cold temperatures, however they only apply to Ising/Potts models defined on graphs. For Spin Glass models, a well-studied method is Heat Bath, also termed sequential Gibbs [e.g. Neal, 1993], often augmented with a tempering scheme when studying cold systems [e.g. Swendsen and Wang, 1986, Hukushima et al., 1998, Katzgraber et al., 2001, Yucesoy, 2013]. ...
Preprint
Full-text available
Ising and Potts models are an important class of discrete probability distributions which originated from Statistical Physics and since then have found applications in several disciplines. Simulation from these models is a well known challenging problem. In this paper, we propose a class of MCMC algorithms to simulate from both Ising and Potts models, by using auxiliary Gaussian random variables. Our algorithms apply to coupling matrices with both positive and negative entries, thus including Spin Glass models such as the SK and Hopfield model. In contrast to existing methods of a similar flavor, our algorithm can take advantage of the low-rank structure of the coupling matrix, and scales linearly with the number of states in a Potts model. We compare our proposed algorithm to existing state of the art algorithms, such as the Swendsen-Wang and Wolff algorithms for Ising and Potts models on graphs, and the Heat Bath for Spin Glass models. Our comparison takes into account a wide range of coupling matrices and temperature regimes, focusing in particular on behavior at or below the critical temperature. For cold systems, augmenting our algorithm with a tempering scheme yields significant improvements.
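As a point of reference for the Heat Bath (sequential Gibbs) baseline mentioned above, the following is a minimal sketch for an Ising model with a general coupling matrix. It assumes spins in {-1, +1}, a symmetric coupling matrix `J` with zero diagonal, and the energy convention H(s) = -(1/2) s^T J s; the function name and the toy couplings are hypothetical and are not taken from the cited work.

```python
import numpy as np

def heat_bath_sweep(spins, J, beta, rng):
    """One sequential heat-bath (Gibbs) sweep over all spins, in place."""
    n = spins.shape[0]
    for i in range(n):
        # Local field on spin i from all other spins (J has zero diagonal).
        h = J[i] @ spins
        # Conditional probability that spin i is +1 given the rest.
        p_up = 1.0 / (1.0 + np.exp(-2.0 * beta * h))
        spins[i] = 1 if rng.random() < p_up else -1
    return spins

rng = np.random.default_rng(1)
n = 10
J = np.ones((n, n)) - np.eye(n)  # ferromagnetic toy couplings, zero diagonal
spins = rng.choice(np.array([-1, 1]), size=n)
for _ in range(100):
    heat_bath_sweep(spins, J, beta=0.5, rng=rng)
```

Each update draws a spin from its exact conditional distribution, so no proposal is ever rejected; the cost is that spins are updated one at a time, which is what makes the method slow to mix at low temperature.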
... This unnormalized posterior distribution can be calculated analytically if the probability distributions are chosen in a specific way, which might be limiting in practice. With the advancement of computer technology and the introduction of Markov Chain Monte Carlo (MCMC) methods [41], such as the Metropolis algorithm [42], it is possible to estimate complex probability distributions numerically. The algorithm used in this work to sample from the posterior density distribution is described in the following section. ...
Article
Full-text available
On the Moon, in the near infrared wavelength range, spectral diagnostic features such as the 1-μm and 2-μm absorption bands can be used to estimate abundances of the constituent minerals. However, there are several factors that can darken the overall spectrum and dampen the absorption bands. Namely, (1) space weathering, (2) grain size, (3) porosity, and (4) mineral darkening agents such as ilmenite have similar effects on the measured spectrum. This makes spectral unmixing on the Moon a particularly challenging task. Here, we try to model the influence of space weathering and mineral darkening agents and infer the uncertainties introduced by these factors using a Markov Chain Monte Carlo method. Laboratory and synthetic mixtures can successfully be characterized by this approach. We find that the abundance of ilmenite, plagioclase, clino-pyroxenes and olivine cannot be inferred accurately without additional knowledge for very mature spectra. The Bayesian approach to spectral unmixing enables us to include prior knowledge in the problem without imposing hard constraints. Other data sources, such as gamma-ray spectroscopy, can contribute valuable information about the elemental abundances. We here find that setting a prior on TiO2 and Al2O3 can mitigate many of the uncertainties, but large uncertainties still remain for dark mature lunar spectra. This illustrates that spectral unmixing on the Moon is an ill posed problem and that probabilistic methods are important tools that provide information about the uncertainties, that, in turn, help to interpret the results and their reliability.
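To make concrete how the Metropolis algorithm referred to above samples from an unnormalized posterior, here is a minimal random-walk sketch. It is not the sampler used in the cited work; the function name `metropolis`, the Gaussian proposal, the step size and the standard normal toy target are all assumptions chosen for illustration.

```python
import numpy as np

def metropolis(log_unnorm_post, x0, n_steps, step_size, rng):
    """Random-walk Metropolis; needs the target only up to a constant."""
    x = np.atleast_1d(np.asarray(x0, dtype=float))
    logp = log_unnorm_post(x)
    samples = np.empty((n_steps, x.size))
    accepted = 0
    for t in range(n_steps):
        # Symmetric Gaussian proposal centred on the current state.
        proposal = x + step_size * rng.standard_normal(x.size)
        logp_prop = log_unnorm_post(proposal)
        # Accept with probability min(1, p(proposal) / p(x)).
        if np.log(rng.random()) < logp_prop - logp:
            x, logp = proposal, logp_prop
            accepted += 1
        samples[t] = x  # a rejected move repeats the current state
    return samples, accepted / n_steps

# Toy target: standard normal, specified without its normalizing constant.
rng = np.random.default_rng(3)
samples, acc_rate = metropolis(lambda x: -0.5 * float(x @ x), 0.0, 20000, 1.0, rng)
```

Only differences of log densities enter the acceptance test, which is why the normalizing constant never has to be computed.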
... In addition, independently of the value of the constraint g, as the bias a increases the critical temperature $T_c$ decreases, squeezing more and more the retrieval region towards smaller temperatures. This dependence is evidenced both from the second panel of Fig. 2 [33][34][35], in terms of the Mattis magnetisation. It can be noticed how, as the parameter a increases for a fixed g, analytical and numerical solutions become closer and closer. ...
Article
The formal equivalence between the Hopfield network (HN) and the Boltzmann Machine (BM) has been well established in the context of random, unstructured and unbiased patterns to be retrieved and recognised. Here we extend this equivalence to the case of “biased” patterns, that is patterns which display an unbalanced count of positive neurons/pixels: starting from previous results of the bias paradigm for the HN, we construct the BM’s equivalent Hamiltonian introducing a constraint parameter for the bias correction. We show analytically and numerically that the parameters suggested by equivalence are fixed points under contrastive divergence evolution when exposed to a dataset of blurred examples of each pattern, also enjoying large basins of attraction when the model suffers from a noisy initialisation. These results are also shown to be robust against increasing storage of the models, and increasing bias in the reference patterns. This picture, together with analytical derivation of HN’s phase diagram via self-consistency equations, allows us to enhance our mathematical control on BM’s performance when approaching more realistic datasets.
... For the Bayesian state filters, when filters have non-Gaussian or non-linear models, Kalman filters (KF) are replaced by extended KF (EKF) [40], unscented KF (UKF) [41], or numerical methods such as Monte Carlo methods or particle filters [42]. These algorithms usually have high computational cost and slow convergence, and are more intractable as the dimensions of random variables increase. ...
Article
Full-text available
Multi-target tracking (MTT) is an important component of situation-awareness based on the Internet of Things (IoT). Existing algorithms mainly focus on tracking based on conventional measurements, e.g., bearings or ranges. However, measurement parameter estimations are considered in isolation, limiting the accuracy and resolution of MTT, and the related data association is an NP-hard multi-dimensional assignment problem. In this paper, we develop a new one-step MTT algorithm based on a novel dynamic Bayesian network (DBN), i.e, DBNMTT. The new MTT algorithm directly infers target states from the raw measurement data by fusing the array signal model, the signal propagation model, and the motion model. In this new DBNMTT framework, we treat target states and conventional measurements such as bearings and target energies as hidden random variables. The posterior joint probability optimization problem is translated into the problem of graphical model learning. In this way, we can improve the accuracy and resolution of MTT and convert the NP-hard data association problem to a hidden variable learning problem. For non-conjugate models in the DBNMTT, we develop a novel reparameterized approximation variational inference (ReAVI) approach to solve the learning problem. The ReAVI converts non-conjugate models to conjugate models with new parameters and reuses the mean-field algorithm. The performance of our proposed new MTT method, namely DBNMTT-ReAVI, is analyzed on extensive simulations in challenging scenarios. The simulation results show that the DBNMTT-ReAVI algorithm is superior to conventional measurement based MTT algorithms in several aspects including the success probability, the convergence, the resolution, and the accuracy.
... The thinning removes that dependency. Hence, we can declare these important MCMC parameters in the model tuning process as hyperparameters that always need to be set on a case-by-case basis (Metropolis and Ulam, 1949; Neal, 1993, 2012; Hoffman and Gelman, 2014; VanDerwerken and Schmidler, 2017; Betancourt, 2017). ...
Article
Some growth data in aquaculture have peculiar characteristics that generate consequences in the analysis and modeling. They are usually incomplete or limited, as classified in this article. This means data are restricted to a few observations and often are limited to observations below the curve’s inflection point due to economic interests in farm settings, or due to limitation of physical space in controlled research laboratories, for example. This possibly causes under- and/or overestimation in the inference of nonlinear models. Through shrimp growth simulations from the Michaelis–Menten curve, the limited data were synthesized with threshold observation up to the first 7, 13, 18, 36, and 82 weeks. Seven sigmoid growth functions (Logistic, Gompertz, von Bertalanffy, Richards, Weibull, Morgan–Mercer–Flodin, and Michaelis–Menten itself) were fitted to the respective limited data, in order to assess the research hypothesis. Taking the scenarios with incompleteness in the first 7, 13 and 18 weeks, the parameters of all growth curves modeled under a frequentist approach were underestimated. Thus, we propose a correction for this possible problem through a hierarchical Bayesian approach. Real data from shrimp farming in northeastern Brazil were used to compare it with the traditional frequentist approach employed. The sensitivity in detecting outstanding treatment (pond or batch level hierarchy) can make the new method a powerful management tool in animal production, and also in trials designed for scientific research.
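The MCMC tuning parameters discussed in the excerpt above (burn-in and thinning) can be illustrated with a short sketch. The chain here is a synthetic AR(1) sequence standing in for correlated sampler output; the helper names, the burn-in length and the thinning interval are hypothetical values, and in practice thinning reduces the dependence between retained draws rather than eliminating it exactly.

```python
import numpy as np

def burn_and_thin(chain, burn_in, thin):
    """Drop the first burn_in draws, then keep every thin-th draw."""
    return chain[burn_in::thin]

def lag1_autocorr(x):
    """Sample lag-1 autocorrelation of a one-dimensional sequence."""
    x = x - x.mean()
    return float(x[:-1] @ x[1:] / (x @ x))

# Synthetic correlated chain (AR(1) with coefficient 0.9), started far from
# its stationary mean to mimic a poor initial state.
rng = np.random.default_rng(2)
chain = np.empty(20000)
chain[0] = 10.0
for t in range(1, chain.size):
    chain[t] = 0.9 * chain[t - 1] + rng.standard_normal()

kept = burn_and_thin(chain, burn_in=1000, thin=10)
```

Comparing `lag1_autocorr(chain)` with `lag1_autocorr(kept)` is one simple way to judge whether a chosen thinning interval is adequate for a given sampler.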
... From the computational perspective, limitations are inherited from the poor scalability of GPs (Rasmussen and Williams, 2006), for which inference becomes impractical when the number of runs of the code and the number of real observations are collectively beyond a few thousands. In addition, the use of Markov chain Monte Carlo (MCMC) (Neal, 1993) techniques to carry out inference for GP models can be painfully slow without careful tuning and clever parameterizations (Filippone et al., 2013; Filippone and Girolami, 2014). ...
... This is motivated by the fact that exact Bayesian inference is often not possible, and sampling approaches represent one major class of tractable approximation algorithms. While there are a number of different sampling approaches (many being variants of Markov Chain Monte Carlo [MCMC]; (Neal, 1993)), the basic idea is that, instead of inferring a full posterior probability distribution, one can simply start by picking a specific hypothesis and calculating its probability. Then one can stochastically move to another hypothesis and evaluate its probability relative to the first. ...
Preprint
Full-text available
There is a growing body of evidence suggesting that the neural processes underlying perception, learning, and decision-making approximate Bayesian inference. Yet, humans perform poorly when asked to solve explicit probabilistic reasoning problems. In response, some have argued that certain brain processes are Bayesian while others are not; others have argued that reasoning errors can be explained by either inaccurate generative models or limitations of approximation algorithms. In this paper, we offer a complementary perspective by considering how a Bayesian brain would implement conscious reasoning processes more generally. These considerations require making two distinctions, each of which highlights a fundamental reason why Bayesian brains should not be expected to perform well at explicit inference. The first distinction is between inferring probability distributions over hidden states and representing probabilities as hidden states. The former assumes that the brain’s dynamics instantiate a form of approximate Bayesian inference, premised on a model of how observations are generated by hidden states of the world. In contrast, the latter assumes the brain represents probabilities themselves as hidden states – namely, hypotheses about the correct answers to explicit reasoning problems. In this latter case, correctly inferring the most likely probability to report would implausibly require the brain to possess a generative model encoding Bayes’ theorem itself. The second distinction is between inference and mental action. In addition to state inference, consciously solving Bayes’ theorem requires the selection of a particular sequence of goal-directed cognitive actions (e.g., mental multiplication and addition, followed by division). While Bayesian brains infer probability distributions over action sequences, the possible sequences themselves often need to be learned. 
These considerations show that, regardless of the specific generative model in question or approximation algorithm employed, and even if all brain processes were Bayesian, an innate proficiency at solving explicit probabilistic reasoning problems should not be expected.
... The whole idea lies in the "annealing" of solely the data likelihood and not both priors and likelihood (unnormalized posterior) (Neal [102]). The reason is to prevent the optimization algorithm from getting stuck due to any skewed priors (Mandt et al. [84]). ...
Thesis
Epigenetics is the field of biology that studies the changes in organisms due to alteration of gene expression rather than modification of the DNA sequence itself. DNA methylation is a well-studied type of epigenetic change, which results in gene silencing and can be dangerous when it occurs at tumour suppressor gene loci. Many techniques have been developed to map the methylation pattern of individuals at several genetic loci, such as the HumanMethylation450 BeadChip, the EPIC BeadChip and the whole-genome bisulfite sequencing. Each of these DNA profiling platforms quantifies methylation occurrence in different ways, either continuously (rates of methylation intensity) or discretely (counts of methylated reads). Identifying subgroups of individuals with similar methylation patterns, as well as those genetic loci that discriminate the subgroups, is a crucial procedure that helps linking diseases to specific methylation patterns. Clustering analysis and posterior feature selection of the most important genetic loci that discriminate each subgroup of individuals are the two tools we suggest for achieving this venture. Clustering DNA methylation data though is not a trivial procedure since they are platform-specific and not normally distributed. In this thesis, we propose clustering DNA methylation data based on the data type (continuous or discrete) by fast model-based clustering methods, while we select the most important/discriminatory genetic loci by an a posteriori feature selection measure. Specifically, we apply variational non-Gaussian Dirichlet Process mixture models because they have an infinite number of components that allow model-determination and are flexible to model any discrete or continuous data type. We also employ Variational Inference with the “annealing” extension that accounts for poor initialisation of the algorithm, due to its high speed in estimating the model parameters and its scalability to high-dimensional data.
Our real applications on neonatal DNA methylation data measured in three different ways show that the discrete data types - number of aberrantly methylated genetic loci (counts) and whether a genetic locus is abnormally methylated or not (binary) - can be more informative than its continuous version (intensity of methylation per genetic locus) for revealing the association of artificial conception with the predisposition of developmental disorders.
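The distinction drawn in the excerpt above, between annealing only the likelihood and annealing the whole unnormalized posterior, can be written out explicitly. The temperature symbol $T$, the parameter symbol $\theta$ and the data symbol $x$ are notation assumed here for illustration and are not taken from the cited thesis.

$$
p_T(\theta \mid x) \;\propto\; p(x \mid \theta)^{1/T}\, p(\theta)
\qquad \text{versus} \qquad
\tilde{p}_T(\theta \mid x) \;\propto\; \bigl[\, p(x \mid \theta)\, p(\theta) \,\bigr]^{1/T},
\qquad T \ge 1 .
$$

In the first form the prior keeps its full weight at every temperature while the influence of the data is introduced gradually; both forms reduce to the ordinary unnormalized posterior at $T = 1$.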
... The evaluation of Eq. (20) requires the calculation of multi-dimensional integrals which is not possible in practical applications. Markov Chain Monte Carlo (MCMC) methods [50] have been widely used for their ability to estimate the posterior PDF, as this approach makes it possible to obtain samples directly from the posterior distribution, bypassing the computation of the evidence. Out of the vast amount of MCMC algorithms available in the literature, the Metropolis-Hastings (M-H) algorithm [51,52] is used here as a stochastic simulation method given its versatility and its ease of implementation [47]. ...
Article
Full-text available
The use of guided waves to identify damage has become a popular method due to its robustness and fast execution, as well as the advantage of being able to inspect large areas and detect minor structural defects. When a travelling wave on a plate interacts with a defect, it generates a scattered field that will depend on the defect's geometry. By analysing the scattered field, one can thus characterise the type and size of the plate damage. A Bayesian framework based on a guided waves interaction model for damage identification of an infinite plate is presented here for the first time. A semi-analytical approach based on the lowest order plate theories is adopted to obtain the scattering features for damage geometries with circular symmetry, resulting in an efficient inversion procedure. Subsequently, ultrasound experiments are performed on a large aluminium plate with a circular indentation to generate wave reflection and transmission coefficients. With the aid of signal processing techniques, the effectiveness and efficiency of the proposed approach are verified. A full finite element model is used to test the damage identification scheme. Finally, the scattering coefficients are reconstructed, reliably matching the experimental results. The framework supports digital twin technology of structural health monitoring.
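The remark in the excerpt above that Metropolis-Hastings bypasses the computation of the evidence follows directly from the form of the acceptance probability. The symbols below ($\theta$ for the current state, $\theta'$ for the proposed state, $D$ for the data, $q$ for the proposal density) are notation assumed here for illustration.

$$
\alpha(\theta, \theta')
= \min\!\left\{ 1,\;
\frac{p(\theta' \mid D)\, q(\theta \mid \theta')}{p(\theta \mid D)\, q(\theta' \mid \theta)} \right\}
= \min\!\left\{ 1,\;
\frac{p(D \mid \theta')\, p(\theta')\, q(\theta \mid \theta')}{p(D \mid \theta)\, p(\theta)\, q(\theta' \mid \theta)} \right\}
$$

The second equality uses Bayes' theorem, $p(\theta \mid D) = p(D \mid \theta)\, p(\theta) / p(D)$: the evidence $p(D)$ appears in both numerator and denominator and cancels, so only the likelihood and the prior need to be evaluated.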
... Assigning prior distributions over the hyperparameters however, requires the evaluation of numerous integrals (see section 3.5 of Bishop 2006), which is no longer analytically tractable. In this situation approximation techniques such as Markov Chain Monte Carlo sampling are required (e.g., Neal 1993), which in many cases can be computationally prohibitive. ...
Conference Paper
This thesis explores the application of two novel machine learning approaches to the study of polar climate, with particular focus on Arctic sea ice. The first technique, complex networks, is based on an unsupervised learning approach which is able to exploit spatio-temporal patterns of variability within geospatial time series data sets. The second, Gaussian Process Regression (GPR), is a supervised learning Bayesian inference approach which establishes a principled framework for learning functional relationships between pairs of observation points, through updating prior uncertainty in the presence of new information. These methods are applied to a variety of problems facing the polar climate community at present, although each problem can be considered as an individual component of the wider problem relating to Arctic sea ice predictability. In the first instance, the complex networks methodology is combined with GPR in order to produce skilful seasonal forecasts of pan-Arctic and regional September sea ice extents, with up to 3 months lead time. De-trended forecast skills of 0.53, 0.62, and 0.81 are achieved at 3-, 2- and 1-month lead time respectively, as well as generally highest regional predictive skill ($> 0.30$) in the Pacific sectors of the Arctic, although the ability to skilfully predict many of these regions may be changing over time. Subsequently, the GPR approach is used to combine observations from CryoSat-2, Sentinel-3A and Sentinel-3B satellite radar altimeters, in order to produce daily pan-Arctic estimates of radar freeboard, as well as uncertainty, across the 2018--2019 winter season. The empirical Bayes numerical optimisation technique is also used to derive auxiliary properties relating to the radar freeboard, including its spatial and temporal (de-)correlation length scales, allowing daily pan-Arctic maps of these fields to be generated as well. 
The estimated daily freeboards are consistent with CryoSat-2 and Sentinel-3 to within $< 1$ mm (standard deviations $< 6$ cm) across the 2018--2019 season, and furthermore, cross-validation experiments show that prediction errors are generally $\leq 4$ mm across the same period. Finally, the complex networks approach is used to evaluate the presence of the winter Arctic Oscillation (AO) to summer sea ice teleconnection within 31 coupled climate models participating in phase 6 of the World Climate Research Programme Coupled Model Intercomparison Project (CMIP6). Two global metrics are used to compare patterns of variability between observations and models: the Adjusted Rand Index and a network distance metric. CMIP6 models generally over-estimate the magnitude of sea-level pressure variability over the north-western Pacific Ocean, and under-estimate the variability over north Africa and southern Europe, while they also under-estimate the importance of regions such as the Beaufort, East Siberian and Laptev seas in explaining pan-Arctic summer sea ice area variability. They also under-estimate the degree of covariance between the winter AO and summer sea ice in key regions such as the East Siberian Sea and Canada basin, which may hinder their ability to make skilful seasonal to inter-annual predictions of summer sea ice.
... This p(θ) prior is dependent on the size of the data set, and its parametric form is very complicated, favoring weights with large partition functions. To address this intractable problem, MCMC (Markov Chain Monte Carlo) (Neal, 1993) methods, such as the Metropolis or Langevin sampler, are used to generate correlated samples from a probability distribution with unknown normalization. These methods make it possible to generalize the inference to any Bayesian learning in a general undirected model of the form 4.16. ...
Thesis
The main objective of this thesis is to improve the automatic capture of semantic information with the goal of modeling and understanding human communication. We have advanced the state of the art in discourse parsing, in particular in the retrieval of discourse structure from chat, in order to implement, at the industrial level, tools to help explore conversations. These include the production of automatic summaries, recommendations, dialogue acts detection, identification of decisions, planning and semantic relations between dialogue acts in order to understand dialogues. In multi-party conversations it is important to not only understand the meaning of a participant's utterance and to whom it is addressed, but also the semantic relations that tie it to other utterances in the conversation and give rise to different conversation threads. An answer must be recognized as an answer to a particular question; an argument, as an argument for or against a proposal under discussion; a disagreement, as the expression of a point of view contrasted with another idea already expressed. Unfortunately, capturing such information using traditional supervised machine learning methods from quality hand-annotated discourse data is costly and time-consuming, and we do not have nearly enough data to train these machine learning models, much less deep learning models. Another problem is that arguably, no amount of data will be sufficient for machine learning models to learn the semantic characteristics of discourse relations without some expert guidance; the data are simply too sparse. Long distance relations, in which an utterance is semantically connected not to the immediately preceding utterance, but to another utterance from further back in the conversation, are particularly difficult and rare, though often central to comprehension. 
It is therefore necessary to find a more efficient way to retrieve discourse structures from large corpora of multi-party conversations, such as meeting transcripts or chats. This is one goal this thesis achieves. In addition, we not only wanted to design a model that predicts discourse structure for multi-party conversation without requiring large amounts of hand-annotated data, but also to develop an approach that is transparent and explainable so that it can be modified and improved by experts. The method detailed in this thesis achieves this goal as well.
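The Langevin sampler named in the excerpt above can be sketched in a few lines. This is the unadjusted variant, which uses only the gradient of the log density and therefore never needs the normalizing constant; the function name, the step size and the standard normal toy target are assumptions for illustration, and the discretization introduces a small bias that a Metropolis correction would remove.

```python
import numpy as np

def langevin_sample(grad_log_p, x0, n_steps, step, rng):
    """Unadjusted Langevin dynamics: gradient drift plus Gaussian noise."""
    x = np.atleast_1d(np.asarray(x0, dtype=float)).copy()
    out = np.empty((n_steps, x.size))
    for t in range(n_steps):
        noise = rng.standard_normal(x.size)
        # Half a step along the gradient of log p, plus noise of variance step.
        x = x + 0.5 * step * grad_log_p(x) + np.sqrt(step) * noise
        out[t] = x
    return out

# Toy target: standard normal, whose log-density gradient is -x.
rng = np.random.default_rng(4)
draws = langevin_sample(lambda x: -x, x0=3.0, n_steps=50000, step=0.1, rng=rng)
```

Successive draws are correlated, as the excerpt notes, so summaries computed from `draws` are based on fewer effectively independent samples than the nominal chain length.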
... Figure 1 provides a detailed schematic of our Bayesian GAT model. (10), we can use variational inference [14], [24], [25] or MCMC [26], [27] to approximate the posterior p(W | Y, X, G_r). According to [28], averaging the weights of the network is an approximate way of Monte Carlo dropout. ...
... (3) The generative process induces an intractable posterior that requires approximation algorithms like Monte Carlo simulation [50] or variational inference [1]. Unfortunately, there is always a tradeoff between accuracy and efficiency with these approximations since they can only be asymptotically exact [57]. ...
Preprint
Topic models have been the prominent tools for automatic topic discovery from text corpora. Despite their effectiveness, topic models suffer from several limitations, including the inability to model word ordering information in documents, the difficulty of incorporating external linguistic knowledge, and the lack of inference methods that are both accurate and efficient for approximating the intractable posterior. Recently, pretrained language models (PLMs) have brought astonishing performance improvements to a wide variety of tasks due to their superior representations of text. Interestingly, there have not been standard approaches to deploy PLMs for topic discovery as better alternatives to topic models. In this paper, we begin by analyzing the challenges of using PLM representations for topic discovery, and then propose a joint latent space learning and clustering framework built upon PLM embeddings. In the latent space, topic-word and document-topic distributions are jointly modeled so that the discovered topics can be interpreted by coherent and distinctive terms and meanwhile serve as meaningful summaries of the documents. Our model effectively leverages the strong representation power and superb linguistic features brought by PLMs for topic discovery, and is conceptually simpler than topic models. On two benchmark datasets in different domains, our model generates significantly more coherent and diverse topics than strong topic models, and offers better topic-wise document representations, based on both automatic and human evaluations.
... The use of Monte Carlo sampling methods to improve Gaussian-based approximate inference (Salimans et al., 2015) has recently been studied for deep generative models. For instance, in (Hoffman, 2017), the authors propose to initialize Markov Chain Monte Carlo (Neal, 1993) with proposals given by the variational approximation provided by the Gaussian encoder, achieving higher likelihoods and producing samples with better quality. More specifically, due to its efficiency and superior performance in exploring regions of interest in the density, Hamiltonian Monte Carlo (HMC) (Duane et al., 1987; Betancourt and Girolami, 2015) stands out among Monte Carlo algorithms in machine learning. ...
Preprint
Variational Autoencoders (VAEs) have recently been highly successful at imputing and acquiring heterogeneous missing data and identifying outliers. However, within this specific application domain, existing VAE methods are restricted by using only one layer of latent variables and strictly Gaussian posterior approximations. To address these limitations, we present HH-VAEM, a Hierarchical VAE model for mixed-type incomplete data that uses Hamiltonian Monte Carlo with automatic hyper-parameter tuning for improved approximate inference. Our experiments show that HH-VAEM outperforms existing baselines in the tasks of missing data imputation, supervised learning and outlier identification with missing features. Finally, we also present a sampling-based approach for efficiently computing the information gain when missing features are to be acquired with HH-VAEM. Our experiments show that this sampling-based approach is superior to alternatives based on Gaussian approximations.
... Generally speaking, BNNs are stochastic artificial neural networks trained with Bayesian inference in the parameter space. Hence, the bulk of the work on BNNs has focused on designing scalable approximate inference methods (Neal, 1993; Mackay, 1992; Ritter et al., 2018; Hoffman et al., 2013; Khan et al., 2018; Osawa et al., 2019; Gal & Ghahramani, 2016; Wilson et al., 2016; Al-Shedivat et al., 2017; Gal, 2016; Minka, 2013; Soudry et al., 2014; Hernández-Lobato & Adams, 2015). There has also been a growing interest in obtaining BNNs from their deterministic counterparts, such as stochastic weight averaging (Izmailov et al., 2018; Maddox et al., 2019; Wilson & Izmailov, 2020). ...
Preprint
This work theoretically studies stochastic neural networks, a main type of neural network in use. Specifically, we prove that as the width of an optimized stochastic neural network tends to infinity, its predictive variance on the training set decreases to zero. Two common examples that our theory applies to are neural networks with dropout and variational autoencoders. Our result helps better understand how stochasticity affects the learning of neural networks and thus design better architectures for practical problems.
... For example, if on trial #1 the participant chose to take the hint and then chose the left machine, this would be: mdp(1).u = (Neal, 1993). These methods involve sequentially sampling from a distribution according to specific sets of rules that try to find locations under that distribution with high probability. ...
Article
The active inference framework, and in particular its recent formulation as a partially observable Markov decision process (POMDP), has gained increasing popularity in recent years as a useful approach for modeling neurocognitive processes. This framework is highly general and flexible in its ability to be customized to model any cognitive process, as well as simulate predicted neuronal responses based on its accompanying neural process theory. It also affords both simulation experiments for proof of principle and behavioral modeling for empirical studies. However, there are limited resources that explain how to build and run these models in practice, which limits their widespread use. Most introductions assume a technical background in programming, mathematics, and machine learning. In this paper we offer a step-by-step tutorial on how to build POMDPs, run simulations using standard MATLAB routines, and fit these models to empirical data. We assume a minimal background in programming and mathematics, thoroughly explain all equations, and provide exemplar scripts that can be customized for both theoretical and empirical studies. Our goal is to provide the reader with the requisite background knowledge and practical tools to apply active inference to their own research. We also provide optional technical sections and multiple appendices, which offer the interested reader additional technical details. This tutorial should provide the reader with all the tools necessary to use these models and to follow emerging advances in active inference research.
... Markov Chain Monte Carlo (MCMC) [7, 42, 43, 44] is a method to obtain access to unbiased samples from potentially high-dimensional probability distributions. These samples can then be used to compute quantities of interest such as parameter means and variances. ...
Article
We present a case study for Bayesian analysis and proper representation of distributions and dependence among parameters when calibrating process-oriented environmental models. A simple water quality model for the Elbe River (Germany) is referred to as an example, but the approach is applicable to a wide range of environmental models with time-series output. Model parameters are estimated by Bayesian inference via Markov Chain Monte Carlo (MCMC) sampling. While the best-fit solution matches usual least-squares model calibration (with a penalty term for excessive parameter values), the Bayesian approach has the advantage of yielding a joint probability distribution for parameters. This posterior distribution encompasses all possible parameter combinations that produce a simulation output that fits observed data within measurement and modeling uncertainty. Bayesian inference further permits the introduction of prior knowledge, e.g., positivity of certain parameters. The estimated distribution shows to which extent model parameters are controlled by observations through the process of inference, highlighting issues that cannot be settled unless more information becomes available. An interactive interface enables tracking for how ranges of parameter values that are consistent with observations change during the process of a step-by-step assignment of fixed parameter values. Based on an initial analysis of the posterior via an undirected Gaussian graphical model, a directed Bayesian network (BN) is constructed. The BN transparently conveys information on the interdependence of parameters after calibration. Finally, a strategy to reduce the number of expensive model runs in MCMC sampling for the presented purpose is introduced based on a newly developed variant of delayed acceptance sampling with a Gaussian process surrogate and linear dimensionality reduction to support function-valued outputs.
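As a rough illustration of how MCMC samples yield parameter means and variances, the following is a minimal random-walk Metropolis sketch. It is not the method of the study above (which uses delayed acceptance with a Gaussian process surrogate); the function names, the one-dimensional Gaussian target, the step size, and the burn-in length are all assumptions chosen for illustration.

```python
import math
import random


def metropolis(log_density, x0, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis for a 1-D unnormalized log density (illustrative)."""
    rng = random.Random(seed)
    x, logp = x0, log_density(x0)
    samples = []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)  # symmetric Gaussian proposal
        logp_new = log_density(proposal)
        # Accept with probability min(1, p(proposal) / p(x)).
        # 1 - random() lies in (0, 1], so the logarithm is always defined.
        if math.log(1.0 - rng.random()) < logp_new - logp:
            x, logp = proposal, logp_new
        samples.append(x)  # a rejected move repeats the current state
    return samples


def log_target(x):
    # Hypothetical posterior: Gaussian with mean 2.0 and std 0.5 (unnormalized).
    return -0.5 * ((x - 2.0) / 0.5) ** 2


samples = metropolis(log_target, x0=0.0, n_samples=20000, step=1.0)
kept = samples[2000:]  # discard an assumed burn-in period
mean = sum(kept) / len(kept)
var = sum((s - mean) ** 2 for s in kept) / (len(kept) - 1)
print(f"posterior mean ~ {mean:.3f}, variance ~ {var:.3f}")
```

Only the ratio of densities enters the acceptance test, which is why the normalizing constant of the posterior never has to be computed.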
Preprint
Mössbauer spectroscopy, which provides knowledge related to electronic states in materials, has been applied to various fields such as condensed matter physics and material sciences. In conventional spectral analyses based on least-squares fitting, hyperfine interactions in materials have been determined from the shape of observed spectra. In conventional spectral analyses, it is difficult to discuss the validity of the hyperfine interactions and the estimated values. We propose a spectral analysis method based on Bayesian inference for the selection of hyperfine interactions and the estimation of Mössbauer parameters. An appropriate Hamiltonian has been selected by comparing Bayesian free energy among possible Hamiltonians. We have estimated the Mössbauer parameters and evaluated their estimated values by calculating the posterior distribution of each Mössbauer parameter with confidence intervals. We have also discussed the accuracy of the spectral analyses to elucidate the noise intensity dependence of numerical experiments.
Thesis
In the modern world, machine learning, including deep learning, has become an indispensable part of many intelligent systems, helping people automate the decision-making process. For certain applications (e.g. health care services), reliable predictions and trustworthy decisions are crucial factors to be considered when deploying a machine learning system. In other words, machine learning models should be able to reason under uncertainty. Bayesian inference, powered by the probabilistic framework, is believed to be a principled way to incorporate uncertainty into the decision making process. The difficulty of applying Bayesian inference in practice is rooted in the intractability of computing posterior probabilities. Approximate inference provides an alternative workaround by providing a tractable estimate of posterior probabilities. The performance of Bayesian inference, especially in Bayesian deep learning, crucially depends on the quality of the chosen approximate inference algorithm in terms of its accuracy and scalability. Particularly, variational inference (VI) and Markov Chain Monte Carlo (MCMC) are two major techniques with their own merits and limitations. In the first part of the thesis, we aim to design efficient approximate inference algorithms by combining VI and MCMC (particularly, stochastic gradient MCMC (SG-MCMC)). The first proposed algorithm is called partial amortised inference, which leverages SG-MCMC to improve VI’s accuracy. The uncertainty quantification provided by this inference allows us to solve a practical problem: How to train a VAE-like generative model with insufficient training data. The second algorithm, named Meta-SGMCMC, aims at improving the efficiency of SG-MCMC by automating its dynamics design through meta learning and VI. 
In the second part of the thesis, we shift our focus to a promising future: Stein discrepancy, which greatly expands the choice of approximating distributions compared to the Kullback-Leibler (KL) divergence. We aim to improve on it by addressing the well-known curse-of-dimensionality problem of its scalable variant: kernelized Stein discrepancy. Inspired by the ’slicing’ idea, we propose a new discrepancy family called sliced kernelized Stein discrepancy that is robust to increasing dimensions, along with two theoretically verified downstream applications.
Preprint
Neural networks are increasingly being used to solve partial differential equations (PDEs), replacing slower numerical solvers. However, a critical issue is that neural PDE solvers require high-quality ground truth data, which usually must come from the very solvers they are designed to replace. Thus, we are presented with a proverbial chicken-and-egg problem. In this paper, we present a method, which can partially alleviate this problem, by improving neural PDE solver sample complexity -- Lie point symmetry data augmentation (LPSDA). In the context of PDEs, it turns out that we are able to quantitatively derive an exhaustive list of data transformations, based on the Lie point symmetry group of the PDEs in question, something not possible in other application areas. We present this framework and demonstrate how it can easily be deployed to improve neural PDE solver sample complexity by an order of magnitude.
Article
Perception emerges from unconscious probabilistic inference, which guides behaviour in our ubiquitously uncertain environment. Bayesian decision theory is a prominent computational model that describes how people make rational decisions using noisy and ambiguous sensory observations. However, critical questions have been raised about the validity of the Bayesian framework in explaining the mental process of inference. Firstly, some natural behaviours deviate from the Bayesian optimum. Secondly, the neural mechanisms that support Bayesian computations in the brain are yet to be understood. Taking Marr’s cross-level approach, we review the recent progress made in addressing these challenges. We first review studies that combined behavioural paradigms and modelling approaches to explain both optimal and suboptimal behaviours. Next, we evaluate the theoretical advances and the current evidence for ecologically feasible algorithms and neural implementations in the brain, which may enable probabilistic inference. We argue that this cross-level approach is necessary for the worthwhile pursuit to uncover mechanistic accounts of human behaviour.
Chapter
In Bayesian learning, the posterior probability density of a model parameter is estimated from the likelihood function and the prior probability of the parameter. The posterior probability density estimate is refined as more evidence becomes available. However, any non-trivial Bayesian model requires the computation of an intractable integral to obtain the probability density function (PDF) of the evidence. Markov Chain Monte Carlo (MCMC) is a well-known algorithm that solves this problem by directly generating the samples of the posterior distribution without computing this intractable integral. We present a novel perspective of the MCMC algorithm which views the samples of a probability distribution as a dynamical system of Information Theoretic particles in an Information Theoretic field. As our algorithm probes this field with a test particle, it is subjected to Information Forces from other Information Theoretic particles in this field. We use Information Theoretic Learning (ITL) techniques based on Rényi’s α-Entropy function to derive an equation for the gradient of the Information Potential energy of the dynamical system of Information Theoretic particles. Using this equation, we compute the Hamiltonian of the dynamical system from the Information Potential energy and the kinetic energy. The Hamiltonian is used to generate the Markovian state trajectories of the system.
Preprint
Energy-Based Models (EBMs) have proven to be a highly effective approach for modelling densities on finite-dimensional spaces. Their ability to incorporate domain-specific choices and constraints into the structure of the model through composition makes EBMs an appealing candidate for applications in physics, biology, computer vision, and various other fields. In this work, we present a novel class of EBM which is able to learn distributions of functions (such as curves or surfaces) from functional samples evaluated at finitely many points. Two unique challenges arise in the functional context. Firstly, training data is often not evaluated along a fixed set of points. Secondly, steps must be taken to control the behaviour of the model between evaluation points, to mitigate overfitting. The proposed infinite-dimensional EBM employs a latent Gaussian process, which is weighted spectrally by an energy function parameterised with a neural network. The resulting EBM has the ability to utilize irregularly sampled training data and can output predictions at any resolution, providing an effective approach to up-scaling functional data. We demonstrate the efficacy of our proposed approach for modelling a range of datasets, including data collected from Standard and Poor's 500 (S&P) and UK National Grid.
Preprint
Sampling from complicated probability distributions is a hard computational problem arising in many fields, including statistical physics, optimization, and machine learning. Quantum computers have recently been used to sample from complicated distributions that are hard to sample from classically, but which seldom arise in applications. Here we introduce a quantum algorithm to sample from distributions that pose a bottleneck in several applications, which we implement on a superconducting quantum processor. The algorithm performs Markov chain Monte Carlo (MCMC), a popular iterative sampling technique, to sample from the Boltzmann distribution of classical Ising models. In each step, the quantum processor explores the model in superposition to propose a random move, which is then accepted or rejected by a classical computer and returned to the quantum processor, ensuring convergence to the desired Boltzmann distribution. We find that this quantum algorithm converges in fewer iterations than common classical MCMC alternatives on relevant problem instances, both in simulations and experiments. It therefore opens a new path for quantum computers to solve useful--not merely difficult--problems in the near term.
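For orientation, the accept/reject step described above can be sketched with a purely classical baseline: single-spin-flip Metropolis sampling from the Boltzmann distribution of a small Ising model. This is not the quantum algorithm of the abstract; the classical proposal stands in for the quantum one, and the couplings, fields, temperature, and function names are assumptions made for illustration.

```python
import math
import random


def ising_energy(spins, J, h):
    """E(s) = -sum_{i<j} J[i][j] s_i s_j - sum_i h[i] s_i."""
    n = len(spins)
    energy = -sum(h[i] * spins[i] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            energy -= J[i][j] * spins[i] * spins[j]
    return energy


def metropolis_ising(J, h, temperature, n_steps, seed=0):
    """Single-spin-flip Metropolis targeting exp(-E(s) / temperature)."""
    rng = random.Random(seed)
    n = len(h)
    spins = [rng.choice((-1, 1)) for _ in range(n)]
    energy = ising_energy(spins, J, h)
    trace = []
    for _ in range(n_steps):
        i = rng.randrange(n)
        spins[i] = -spins[i]  # propose flipping one randomly chosen spin
        new_energy = ising_energy(spins, J, h)
        delta = new_energy - energy
        # Metropolis rule: always accept downhill moves, and accept
        # uphill moves with probability exp(-delta / temperature).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            energy = new_energy
        else:
            spins[i] = -spins[i]  # reject: undo the flip
        trace.append(energy)
    return spins, trace


# Hypothetical instance: open ferromagnetic chain of 6 spins, no external field.
n = 6
J = [[0.0] * n for _ in range(n)]
for i in range(n - 1):
    J[i][i + 1] = 1.0
h = [0.0] * n

spins, trace = metropolis_ising(J, h, temperature=0.5, n_steps=5000)
print("final spins:", spins, "lowest energy seen:", min(trace))
```

Recomputing the full energy at every step keeps the sketch short; a practical implementation would compute the local energy change of the flipped spin instead.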
Book
In applied mathematics, the name Monte Carlo is given to the method of solving problems by means of experiments with random numbers. This name, after the casino at Monaco, was first applied around 1944 to the method of solving deterministic problems by reformulating them in terms of a problem with random elements which could then be solved by large-scale sampling. But, by extension, the term has come to mean any simulation that uses random numbers. In the twentieth century and up to the present, Monte Carlo methods have become one of the fundamental simulation techniques of modern science. This followed a long history of efforts by prominent mathematicians and scientists. This book illustrates the use of Monte Carlo methods to solve specific problems in mathematics, engineering, physics, statistics, and science in general.