Article

# Gaussian Processes for Machine Learning

Authors: Carl Edward Rasmussen, Christopher K. I. Williams


... A real-valued covariance function must be symmetric in its inputs, i.e. $k(x, x') = k(x', x)$. Covariance functions are also referred to as kernel functions, which will be expanded on in Chapters 3 and 4. From here we can define a Gaussian process from a function-space perspective, which allows us to build regression models with them [47,29]. ...
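As a small numerical illustration of the symmetry requirement (and of the positive semi-definiteness that a valid covariance function also needs), the sketch below builds a kernel matrix and checks both properties. It assumes NumPy and a squared-exponential kernel chosen only for illustration; the helper names are hypothetical.

```python
import numpy as np

def squared_exponential(x1, x2, lengthscale=1.0, variance=1.0):
    """Squared-exponential (RBF) covariance between 1-D input arrays x1 and x2."""
    diff = x1[:, None] - x2[None, :]
    return variance * np.exp(-0.5 * (diff / lengthscale) ** 2)

def is_valid_covariance(K, tol=1e-10):
    """Check symmetry K = K^T and positive semi-definiteness (no eigenvalue below -tol)."""
    symmetric = np.allclose(K, K.T)
    # eigvalsh assumes a symmetric matrix, so it is only called once symmetry holds
    psd = symmetric and np.linalg.eigvalsh(K).min() >= -tol
    return bool(symmetric and psd)

x = np.linspace(-3.0, 3.0, 25)
K = squared_exponential(x, x)
print(is_valid_covariance(K))      # True

# Perturbing only the upper triangle breaks the symmetry requirement
K_bad = K + 0.1 * np.triu(np.ones_like(K), k=1)
print(is_valid_covariance(K_bad))  # False
```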
... x − m s 2 . [47]. ...
... For a full treatment of this topic we highly recommend the text by Berlinet and Thomas-Agnan [8]. For machine learning with Gaussian processes, Williams and Rasmussen [47] give an in-depth overview of the topic. ...
Preprint
The neural tangent kernel is a kernel function defined over the parameter distribution of an infinite-width neural network. Despite the impracticality of this limit, the neural tangent kernel has allowed a more direct study of neural networks and a glimpse through the veil of their black box. More recently, it has been shown theoretically that the Laplace kernel and the neural tangent kernel share the same reproducing kernel Hilbert space on the hypersphere $\mathbb{S}^{d-1}$, suggesting their equivalence. In this work, we analyze the practical equivalence of the two kernels. We first do so by matching the kernels exactly and then by matching the posteriors of a Gaussian process. Moreover, we analyze the kernels in $\mathbb{R}^d$ and experiment with them in the task of regression.
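On the unit sphere both kernels depend only on the cosine similarity $u = x^\top x'$, which makes a side-by-side comparison easy. The sketch below assumes a specific NTK setting that the abstract does not state: an infinite-width two-layer ReLU network without biases, both layers trained, standard Gaussian initialization. The Laplace scale `c` is a hypothetical parameter. Sharing an RKHS does not make the kernel values equal, which is why the kernels still have to be matched explicitly.

```python
import math

def laplace_kernel_on_sphere(u, c=1.0):
    """Laplace kernel exp(-c * ||x - x'||) for unit vectors with cosine similarity u.
    On the unit sphere ||x - x'|| = sqrt(2 - 2u)."""
    u = max(-1.0, min(1.0, u))
    return math.exp(-c * math.sqrt(2.0 - 2.0 * u))

def relu_ntk_on_sphere(u):
    """NTK of an infinite-width two-layer ReLU net (no biases, both layers trained,
    standard Gaussian init; an assumed convention) for unit inputs:
    Theta(u) = [2u(pi - arccos u) + sqrt(1 - u^2)] / (2 pi)."""
    u = max(-1.0, min(1.0, u))
    theta = math.acos(u)
    return (2.0 * u * (math.pi - theta) + math.sqrt(1.0 - u * u)) / (2.0 * math.pi)

for u in (1.0, 0.5, 0.0, -0.5, -1.0):
    print(f"u={u:+.1f}  laplace={laplace_kernel_on_sphere(u):.4f}  "
          f"ntk={relu_ntk_on_sphere(u):.4f}")
```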
... In this work, rather than building a cumbersome metamodel of the high-dimensional LES output fields, we train metamodels to predict the reduced coefficients $\mathbf{k} = [k_1, \dots, k_L]$ from the input parameters $\boldsymbol{\mu}$ through GPR [Williams and Rasmussen, 2006]. It is worth mentioning that preliminary tests (not presented here) suggested that GPR models perform better than other metamodels, for instance those based on polynomial chaos expansion and decision trees. ...
... The covariance formulation depends on the prior variance $r_l(U_*, U_*)$ over the test dataset, refined by information from the training dataset. In this formulation, the matrix $[r_l(U, U) + s_l^2 I]$ is inverted using a computationally efficient Cholesky decomposition [Williams and Rasmussen, 2006]. Therefore, the posterior distribution for the $l$-th reduced coefficient can be directly estimated using Eq. 21. ...
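The Cholesky-based computation mentioned above can be sketched as follows. This is a generic zero-mean GP illustration, assuming 1-D inputs, a squared-exponential kernel in place of the excerpt's $r_l$, and a fixed noise variance in place of $s_l^2$; all names are hypothetical. The matrix is factorized once and never inverted explicitly.

```python
import numpy as np

def rbf(a, b, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel between 1-D arrays a and b (illustrative choice)."""
    return variance * np.exp(-0.5 * ((a[:, None] - b[None, :]) / lengthscale) ** 2)

def gp_posterior(x_train, y_train, x_test, noise_var=1e-2):
    """Posterior mean and covariance of a zero-mean GP via a Cholesky factor of K + s^2 I."""
    K = rbf(x_train, x_train) + noise_var * np.eye(len(x_train))
    L = np.linalg.cholesky(K)                         # K = L L^T
    # alpha = K^{-1} y obtained from two triangular solves
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    K_s = rbf(x_train, x_test)                        # train/test covariances
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)                       # L^{-1} K_s
    cov = rbf(x_test, x_test) - v.T @ v               # prior covariance minus data reduction
    return mean, cov

x_train = np.array([-2.0, -1.0, 0.0, 1.5])
y_train = np.sin(x_train)
x_test = np.array([-1.0, 0.5, 3.0])
mean, cov = gp_posterior(x_train, y_train, x_test)
print(mean)
print(np.diag(cov))
```

`np.linalg.solve` does not exploit the triangular structure of `L`; `scipy.linalg.solve_triangular` would, at the cost of an extra dependency.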
... Strong smoothness assumptions are of little use when dealing with experimental data: it can be hard to distinguish high smoothness values $\nu \geq 7/2$ (existence of high-order derivatives) from noisy data [Williams and Rasmussen, 2006]. In this work, since the training dataset is assumed to be noisy, we consider $\nu = 5/2$. ...
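The Matérn family has closed forms for half-integer $\nu$, which makes this smoothness trade-off easy to inspect. A Matérn process is $k$ times mean-square differentiable exactly when $\nu > k$, so $\nu = 5/2$ gives a twice mean-square differentiable process. The sketch below (hypothetical helper, unit variance and length scale by default) evaluates $\nu = 1/2$ (the exponential kernel), $3/2$, $5/2$, and the $\nu \to \infty$ limit, which is the RBF kernel.

```python
import math

def matern(r, nu, lengthscale=1.0, variance=1.0):
    """Matérn covariance at distance r >= 0 for nu in {1/2, 3/2, 5/2};
    nu = inf returns the squared-exponential (RBF) limit."""
    s = r / lengthscale
    if nu == 0.5:
        return variance * math.exp(-s)
    if nu == 1.5:
        a = math.sqrt(3.0) * s
        return variance * (1.0 + a) * math.exp(-a)
    if nu == 2.5:
        a = math.sqrt(5.0) * s
        return variance * (1.0 + a + a * a / 3.0) * math.exp(-a)
    if math.isinf(nu):
        return variance * math.exp(-0.5 * s * s)
    raise ValueError("only nu in {0.5, 1.5, 2.5, inf} are implemented here")

for nu in (0.5, 1.5, 2.5, math.inf):
    print(nu, [round(matern(r, nu), 4) for r in (0.0, 0.5, 1.0, 2.0)])
```

At a fixed distance, larger $\nu$ gives higher correlation, reflecting the smoother functions the kernel favours.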
Preprint
Mapping near-field pollutant concentration is essential to track accidental toxic plume dispersion in urban areas. By resolving a large part of the turbulence spectrum, large-eddy simulations (LES) have the potential to accurately represent the spatial variability of pollutant concentration. Finding a way to synthesize this large amount of information to improve the accuracy of lower-fidelity operational models (e.g. providing better turbulence closure terms) is particularly appealing. This is a challenge in multi-query contexts, where LES become prohibitively costly to deploy to understand how plume flow and tracer dispersion change with various atmospheric and source parameters. To overcome this issue, we propose a non-intrusive reduced-order model combining proper orthogonal decomposition (POD) and Gaussian process regression (GPR) to predict LES field statistics of interest associated with tracer concentrations. GPR hyperparameters are optimized component-by-component through a maximum a posteriori (MAP) procedure informed by POD. We provide a detailed analysis of the reduced-order model performance on a two-dimensional case study corresponding to a turbulent atmospheric boundary-layer flow over a surface-mounted obstacle. We show that near-source concentration heterogeneities upstream of the obstacle require a large number of POD modes to be well captured. We also show that the component-by-component optimization makes it possible to capture the range of spatial scales in the POD modes, especially the shorter concentration patterns in the high-order modes. The reduced-order model predictions remain acceptable if the learning database contains at least fifty to one hundred LES snapshots, providing a first estimate of the required budget to move towards more realistic atmospheric dispersion applications.
... To marginalize the two hyperparameters in the distribution $p(\hat{o}_i \mid D, r_i, \ldots)$ and obtain $p(\hat{o}_i \mid D, r_i)$ as the inference of $\hat{o}_i$, we adopt an empirical Bayes method (maximum log marginal likelihood [54]) to estimate a set of optimal values of these hyperparameters. Mathematically, the maximization problem is written as ...
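In the Gaussian setting, the empirical Bayes step described above amounts to maximizing a closed-form log marginal likelihood. The sketch below is a generic GP illustration, not the citing paper's operator-inference model: it assumes a zero-mean GP with a squared-exponential kernel, treats the length scale and noise variance as the hyperparameters, and maximizes over a small hypothetical grid (gradient-based optimizers are the usual choice in practice).

```python
import numpy as np

def rbf(a, b, lengthscale):
    """Squared-exponential kernel between 1-D arrays a and b."""
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / lengthscale) ** 2)

def log_marginal_likelihood(x, y, lengthscale, noise_var):
    """log p(y | x, theta) = -1/2 y^T K^{-1} y - 1/2 log|K| - n/2 log(2 pi),
    with K = k(x, x) + noise_var * I, evaluated through a Cholesky factor."""
    n = len(x)
    K = rbf(x, x, lengthscale) + noise_var * np.eye(n)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    # log|K| = 2 * sum(log(diag(L))), so 1/2 log|K| = sum(log(diag(L)))
    return (-0.5 * y @ alpha
            - np.sum(np.log(np.diag(L)))
            - 0.5 * n * np.log(2.0 * np.pi))

rng = np.random.default_rng(0)
x = np.linspace(0.0, 5.0, 30)
y = np.sin(x) + 0.1 * rng.standard_normal(x.size)

# Empirical Bayes by brute force: keep the grid point with the largest evidence
grid = [(ell, s2) for ell in (0.3, 1.0, 3.0) for s2 in (1e-3, 1e-2, 1e-1)]
best = max(grid, key=lambda p: log_marginal_likelihood(x, y, *p))
print("selected (lengthscale, noise variance):", best)
```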
... m). The same result can be obtained using Gaussian process regression [54] with the following noise-corrupted prior: ...
... in which $\operatorname{tr}(\Sigma_i)$ can alternatively be written as follows [54]: ...
Article
This work proposes a Bayesian inference method for the reduced-order modeling of time-dependent systems. Informed by the structure of the governing equations, the task of learning a reduced-order model from data is posed as a Bayesian inverse problem with Gaussian prior and likelihood. The resulting posterior distribution characterizes the operators defining the reduced-order model, hence the predictions subsequently issued by the reduced-order model are endowed with uncertainty. The statistical moments of these predictions are estimated via a Monte Carlo sampling of the posterior distribution. Since the reduced models are fast to solve, this sampling is computationally efficient. Furthermore, the proposed Bayesian framework provides a statistical interpretation of the regularization term that is present in the deterministic operator inference problem, and the empirical Bayes approach of maximum marginal likelihood suggests a selection algorithm for the regularization hyperparameters. The proposed method is demonstrated on two examples: the compressible Euler equations with noise-corrupted observations, and a single-injector combustion process.
... A GP model can predict non-linear complex relationships with statistical confidence by assuming that the relationship between input and output is drawn from a Gaussian distribution over functions, specified by a mean function and a covariance function (see Equation 4 below) (Rasmussen & Williams, 2006). ...
... where $m(x)$ is the mean function, which is normally assumed to be zero (Rasmussen & Williams, 2006), and $k(x, x')$ is the covariance function (popularly referred to as a "kernel") that is used to generate the covariance matrix. The kernel encodes the assumed covariance structure (e.g. smoothness and length scale) and thereby shapes both the predictive mean and the predictive variance; numerous kernel functions have been developed (Rasmussen & Williams, 2006). Different kernel functions may lead to different results, and therefore initial tests have been carried out using the most commonly used kernel functions, including the Radial Basis Function, Matérn 3/2, Matérn 5/2, and Exponential kernels. ...
Article
Accurate flood inundation modeling using a complex high‐resolution hydrodynamic (high‐fidelity) model can be very computationally demanding. To address this issue, efficient approximation methods (surrogate models) have been developed. Despite recent developments, there remain significant challenges in using surrogate methods for modeling the dynamical behavior of flood inundation in an efficient manner. Most methods focus on estimating the maximum flood extent due to the high spatial‐temporal dimensionality of the data. This study presents a hybrid surrogate model, consisting of a low‐resolution hydrodynamic (low‐fidelity) and a Sparse Gaussian Process (Sparse GP) model, to capture the dynamic evolution of the flood extent. The low‐fidelity model is computationally efficient but has reduced accuracy compared to a high‐fidelity model. To account for the reduced accuracy, a Sparse GP model is used to correct the low‐fidelity modeling results. To address the challenges posed by the high dimensionality of the data from the low‐ and high‐fidelity models, Empirical Orthogonal Functions analysis is applied to reduce the spatial‐temporal data into a few key features. This enables training of the Sparse GP model to predict high‐fidelity flood data from low‐fidelity flood data, so that the hybrid surrogate model can accurately simulate the dynamic flood extent without using a high‐fidelity model. The hybrid surrogate model is validated on the flat and complex Chowilla floodplain in Australia. The hybrid model was found to improve the results significantly compared to just using the low‐fidelity model and incurred only 39% of the computational cost of a high‐fidelity model.
... Therefore, in this paper, we present a new data-efficient methodology for robust optimization of microwave circuits and systems based on Bayesian inference, which allows designers to take manufacturing imperfections during the optimization process into account, while minimizing the related computational cost. More precisely, our methodology adopts Gaussian Processes (GPs) [28] and propagates the input uncertainty on the design parameters by moment matching the predictive distribution of the underlying GP. Furthermore, we propose a modified version of the Expected Improvement (EI) criterion that accounts for the input uncertainty, called stochastic EI (sEI), which is used to guide the optimization process [29]. ...
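The stochastic EI (sEI) criterion is specific to the citing work and is not reproduced here. For orientation, the standard expected improvement for minimization under a Gaussian predictive distribution $\mathcal{N}(\mu, \sigma^2)$ has the closed form sketched below; the optional margin `xi` is a common practical addition, included here as an assumption.

```python
import math

def expected_improvement(mu, sigma, f_best, xi=0.0):
    """Closed-form EI for minimization under a Gaussian predictive N(mu, sigma^2):
    EI = (f_best - mu - xi) * Phi(z) + sigma * phi(z), z = (f_best - mu - xi) / sigma."""
    if sigma <= 0.0:
        # Deterministic prediction: improvement is either certain or impossible
        return max(f_best - mu - xi, 0.0)
    z = (f_best - mu - xi) / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))        # standard normal CDF
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal PDF
    return (f_best - mu - xi) * cdf + sigma * pdf

# Two candidates with the same predicted mean: the more uncertain one scores higher
print(expected_improvement(mu=1.0, sigma=0.1, f_best=1.0))
print(expected_improvement(mu=1.0, sigma=0.5, f_best=1.0))
```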
... In this work, we adopt a BO framework that uses a GP as the surrogate model. Different surrogate models can be used in BO, such as Bayesian neural networks [45] and GPs [28,46]. GPs are a common choice in a BO context and are also used in this paper. ...
... , N, denote the set of observations of the design under study. The predictive distribution of GPs for a new input $\mathbf{x}_*$ based on $D_n$ (also called the posterior distribution) is denoted $p(f_* \mid \mathbf{x}_*, D_n)$ and can be calculated analytically, resulting in a Gaussian distribution with the following moments [28]: ...
Article
In modern electronics, there are many inevitable uncertainties and variations of design parameters that have a profound effect on the performance of a device. These are, among others, induced by manufacturing tolerances, assembling inaccuracies, material diversities, machining errors, etc. This has prompted wide interest in enhanced optimization algorithms that take the effect of these uncertainty sources into account and that are able to find robust designs, i.e., designs that are insensitive to the uncertainties, early in the design cycle. In this work, a novel machine learning-based optimization framework that accounts for uncertainty of the design parameters is presented. This is achieved by using a modified version of the expected improvement criterion. Moreover, a data-efficient Bayesian Optimization framework is leveraged to limit the number of simulations required to find a robust design solution. Two application examples show that the robustness is significantly improved compared to standard design methods.
... To analyze the effect of $N_T$, $N_L$ and SNR, we take into account that, since the approximated PDF $p_m(t \mid \Theta)$ is Gaussian, the minimum of the KLD (31) is obtained when the first two moments of the true and the approximated PDFs are matched [29]. When the SNR is high, the first two moments of the approximated PDF approach those of the true PDF. ...
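The moment-matching property can be checked directly: for a Gaussian $q$, $\mathrm{KL}(p \,\|\, q)$ depends on $q$ only through $-\mathbb{E}_p[\log q]$, which involves only the mean and variance of $p$. The sketch assumes the KLD is taken as $\mathrm{KL}(p \,\|\, q)$ with $p$ the true PDF (the direction used in the excerpt's Eq. (31) is not shown) and uses a hypothetical two-component Gaussian mixture as $p$.

```python
import math

def moment_match_mixture(weights, means, variances):
    """Mean and variance of a 1-D Gaussian mixture, i.e. the Gaussian that
    minimizes KL(p || q) over all Gaussians q."""
    m = sum(w * mu for w, mu in zip(weights, means))
    # Law of total variance: E[Var] + Var[E]
    v = sum(w * (s2 + (mu - m) ** 2) for w, mu, s2 in zip(weights, means, variances))
    return m, v

def gaussian_cross_entropy(mp, vp, mq, vq):
    """-E_p[log q] for q = N(mq, vq); only p's mean mp and variance vp enter.
    KL(p || q) differs from this by a constant that does not depend on q."""
    return 0.5 * math.log(2.0 * math.pi * vq) + (vp + (mp - mq) ** 2) / (2.0 * vq)

m, v = moment_match_mixture([0.3, 0.7], [-1.0, 2.0], [0.5, 0.2])
print(m, v)
# Perturbing either moment of q increases the cross-entropy (and hence the KLD)
print(gaussian_cross_entropy(m, v, m, v), gaussian_cross_entropy(m, v, m + 0.1, v))
```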
... Similar to the calibration procedure, we can obtain a coarse estimate by means of the direct linear transformation method together with (29) to initialize the Gauss-Newton method for pose estimation. ...
... As a baseline method, we consider the calibration algorithm for cameras [33]. Because the algorithm of [33] takes the image of an object as input, which in this case corresponds to the observation of the light spot's position, we use (29) prior to this algorithm to convert the observation $t$ to the estimated center of the light spot. The number of LEDs in the LED plane is set to $N_L = 25$, and the receiver observes the plane from $N_T = 4$ randomly generated poses with $\theta_\beta = \pi/9$ rad. ...
Article
... Subsequently, initial experiments are performed, after which a ML model, also referred to as a surrogate model, is trained on the acquired data. A popular surrogate model for this is a Gaussian process (GP), as the uncertainty of the predictions is readily available with this technique [9]. An acquisition function then iteratively determines the best experiment to perform based on the trained machine learning model. ...
... A surrogate model, which represents the studied process, is required for the uncertainty measurement. For active learning, a convenient surrogate model is a GP which gives both a predicted outcome and an uncertainty on this prediction for an experiment [9]. The GP is trained iteratively after every experiment to get a more accurate description of the modeled reaction, which allows for an improved experimental design. ...
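A minimal sketch of uncertainty-driven selection, assuming a zero-mean GP with a fixed squared-exponential kernel and a 1-D grid of hypothetical candidate conditions: the next experiment is the candidate with the largest predictive variance. This is plain uncertainty sampling, not the GandALF framework of the citing work, which also uses clustering. With fixed hyperparameters the GP variance does not depend on the measured outcomes, so none are needed here.

```python
import numpy as np

def rbf(a, b, lengthscale=0.5):
    """Squared-exponential kernel between 1-D arrays a and b."""
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / lengthscale) ** 2)

def posterior_variance(x_done, x_cand, noise_var=1e-4):
    """Predictive variance of a zero-mean, unit-variance GP at candidate points,
    given the inputs of the experiments already performed."""
    K = rbf(x_done, x_done) + noise_var * np.eye(len(x_done))
    K_s = rbf(x_done, x_cand)
    return 1.0 - np.sum(K_s * np.linalg.solve(K, K_s), axis=0)

candidates = np.linspace(0.0, 1.0, 101)   # hypothetical experimental conditions
done = np.array([0.0, 1.0])                # experiments already performed
for _ in range(3):
    var = posterior_variance(done, candidates)
    nxt = candidates[np.argmax(var)]       # most uncertain condition
    done = np.append(done, nxt)
print(done)
```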
... This is in contrast to, for example, artificial neural networks (ANN) [53,54]. A GP is a kernel-based ML technique which gives a probability distribution over all functions that can fit the given data [9]. The kernel is a function that defines the entries of the covariance matrix, making it possible to calculate the expected value and uncertainty of the model predictions. ...
Article
Research in chemical engineering requires experiments, which are often expensive, time-consuming, and laborious. Design of experiments (DoE) aims to extract maximal information from a minimum number of experiments. The combination of DoE with machine learning leads to the field of active learning, which results in a more flexible, multi-dimensional selection of experiments. Active learning has not yet been applied in reaction modeling, as most active learning techniques still require an excessive amount of data. In this work, a novel active learning framework called GandALF that combines Gaussian processes and clustering is proposed and validated for yield prediction. The performance of GandALF is compared to other active learning strategies in a virtual case study for hydrocracking. Compared to these active learning methods, the novel framework outperforms the state of the art and achieves a 33% reduction in experiments. The proposed active learning approach is the first to also perform well for data-scarce applications, which is demonstrated by selecting experiments to investigate the ex-situ catalytic pyrolysis of plastic waste. Both a common DoE technique and our methodology selected 18 experiments to study the effect of temperature, space time, and catalyst on the olefin yield for the catalytic pyrolysis of LDPE. The experiments selected with active learning were significantly more informative than those selected with the regular DoE technique, proving the applicability of GandALF for reaction modeling and experimental campaigns.
... where $\upsilon \in \mathbb{R}^{n_\upsilon}$, $n_\upsilon = n_x + n_u$, are input measurements and $y \in \mathbb{R}^{n_x}$ are output measurements of the system, which are gathered subject to some noisy process $\omega \in W \subseteq \mathbb{R}^{n_x}$ [45]. Here, $W$ is assumed to be an infinite set representative of possible realisations of system noise, such that: ...
... SPs define a probability model over an infinite collection of random variables, any finite subset of which has a joint distribution [46]. This definition leads to the interpretation of SPs as probability distributions over functions [45], such that one realisation of an SP can be thought of as obtaining a sample from a function space. When the distribution over the function space is assumed Gaussian, the resultant model is termed a GP. ...
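The "distribution over functions" view can be made concrete: restricted to any finite grid of inputs, a zero-mean GP is just a multivariate normal $\mathcal{N}(0, K)$, so prior functions can be sampled with a Cholesky factor of $K$. The sketch assumes NumPy, a squared-exponential kernel, and a small diagonal jitter for numerical stability.

```python
import numpy as np

def rbf(a, b, lengthscale=1.0):
    """Squared-exponential kernel between 1-D arrays a and b."""
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / lengthscale) ** 2)

# On any finite set of inputs the GP prior says f(x) ~ N(0, K) jointly
x = np.linspace(-5.0, 5.0, 200)
K = rbf(x, x) + 1e-6 * np.eye(x.size)   # small jitter keeps the Cholesky stable
L = np.linalg.cholesky(K)

rng = np.random.default_rng(42)
samples = L @ rng.standard_normal((x.size, 3))   # three functions drawn from the prior
print(samples.shape)  # (200, 3)
```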
... As such, the decision constitutes a problem not dissimilar to architecture search in neural networks, where information about the dataset and system concerned often aids model construction. Popular choices include the Matérn 5/2 and radial basis function (RBF) covariance functions [45]. The RBF is detailed as follows: ...
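The excerpt elides the formula. For reference, one common parameterization of the two kernels named above is given here, with signal variance $\sigma_f^2$ and length scale $\ell$ as conventional symbols that are not necessarily those of the citing work:

$$
k_{\mathrm{RBF}}(\mathbf{x}, \mathbf{x}') = \sigma_f^2 \exp\!\left(-\frac{\lVert \mathbf{x} - \mathbf{x}' \rVert^2}{2\ell^2}\right),
\qquad
k_{5/2}(r) = \sigma_f^2 \left(1 + \frac{\sqrt{5}\,r}{\ell} + \frac{5r^2}{3\ell^2}\right) \exp\!\left(-\frac{\sqrt{5}\,r}{\ell}\right),
\quad r = \lVert \mathbf{x} - \mathbf{x}' \rVert .
$$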
Chapter
Reinforcement Learning (RL) has generated excitement within the process industries in the context of decision making under uncertainty. The primary benefit of RL is that it provides a flexible and general approach to handling systems subject to both exogenous and endogenous uncertainties. Despite this, there has been little reported uptake of RL in the process industries. This is partly due to the inability to provide optimality guarantees under the model used for learning, but more importantly due to safety concerns. This has led to the development of RL algorithms in the context of ‘Safe RL’. In this work, we present an algorithm that leverages the variance prediction of Gaussian process state space models to a) handle operational constraints and b) account for mismatch between the offline process model and the real online process. The algorithm is then benchmarked on an uncertain Lutein photo-production process against nonlinear model predictive control (NMPC) and several state-of-the-art Safe RL algorithms. Through the definition of key performance indicators, we quantitatively demonstrate the efficacy of the method with respect to objective performance and probabilistic constraint satisfaction.
... One method to find these parameters is Bayesian optimization [2], whereby a Gaussian process model [3] of the objective function over the parameter space is progressively learned. At each iteration of this method, the model suggests the most promising set of parameters to be assessed by the simulation, and the simulation result is in turn used to update and refine the model. ...
... The MTBO algorithm [9,10] builds a Gaussian process model [3] of the simulation output (a scalar that quantifies beam quality in our case) as a function of both the vector of design parameters $\mathbf{x}$ (e.g., plasma density, beam profile parameters) and the fidelity $t$. (We let $t = 1$ denote the low fidelity and $t = 2$ the high fidelity, and we denote the respective simulation output at a given fidelity by $f_t(\mathbf{x})$.) Accordingly, the correlation kernel used inside the Gaussian process model depends on both the parameters and the fidelity and is assumed to be of the form $k((t, \mathbf{x}), (t', \mathbf{x}')) = B_{tt'}\,\kappa(\mathbf{x} - \mathbf{x}')$, where $\kappa$ is typically a Matérn kernel [3] and $B$ is a $2 \times 2$ symmetric matrix. In practice, the coefficients of $B$ (as well as the parameters of $\kappa$) are hyperparameters that are automatically determined by maximizing the marginal likelihood of the previously observed data. ...
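The multitask kernel described in this excerpt is a product of a $2 \times 2$ task (fidelity) covariance matrix and a kernel over the design parameters, i.e. $k((t, \mathbf{x}), (t', \mathbf{x}')) = B_{tt'}\,\kappa(\mathbf{x} - \mathbf{x}')$. The sketch below builds such a kernel matrix, assuming a Matérn 5/2 $\kappa$ with unit length scale, a hypothetical positive definite `B`, and 0-based fidelity indices (0 = low, 1 = high) instead of the excerpt's 1 and 2. In the actual algorithm `B` and the length scale would be fitted by marginal likelihood.

```python
import numpy as np

def matern52(r, lengthscale=1.0):
    """Matérn 5/2 correlation as a function of distance r."""
    a = np.sqrt(5.0) * r / lengthscale
    return (1.0 + a + a ** 2 / 3.0) * np.exp(-a)

def multitask_kernel(t1, x1, t2, x2, B, lengthscale=1.0):
    """K[i, j] = B[t1[i], t2[j]] * matern52(||x1[i] - x2[j]||)."""
    r = np.linalg.norm(x1[:, None, :] - x2[None, :, :], axis=-1)
    return B[np.ix_(t1, t2)] * matern52(r, lengthscale)

# Hypothetical task covariance: symmetric positive definite 2x2 matrix
B = np.array([[1.0, 0.8],
              [0.8, 1.2]])
rng = np.random.default_rng(1)
X = rng.uniform(size=(6, 2))          # six design points in 2-D
t = np.array([0, 0, 0, 0, 1, 1])      # four low-fidelity runs, two high-fidelity runs
K = multitask_kernel(t, X, t, X, B)
print(np.allclose(K, K.T), np.linalg.eigvalsh(K).min())
```

Because `B` is positive semi-definite and the Matérn matrix is positive definite, their elementwise product is a valid covariance matrix, so correlated low- and high-fidelity data can share one GP.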
Conference Paper
When designing a laser-plasma acceleration experiment, one commonly explores the parameter space (plasma density, laser intensity, focal position, etc.) with simulations in order to find an optimal configuration that, for example, minimizes the energy spread or emittance of the accelerated beam. However, laser-plasma acceleration is typically modeled with full particle-in-cell (PIC) codes, which can be computationally expensive. Various reduced models can approximate beam behavior at a much lower computational cost. Although such models do not capture the full physics, they could still suggest promising sets of parameters to be simulated with a full PIC code and thereby speed up the overall design optimization. In this work we automate such a workflow with a Bayesian multitask algorithm, where each task has a different fidelity. This algorithm learns from past simulation results from both full PIC codes and reduced PIC codes and dynamically chooses the next parameters to be simulated. We illustrate this workflow with a proof-of-concept optimization using the Wake-T and FBPIC codes. The libEnsemble library is used to orchestrate this workflow on a modern GPU-accelerated high-performance computing system.
... The usual choice for the probabilistic model is to use Gaussian Process Regression (GPR), also known as Kriging (Stein 1999; Rasmussen & Williams 2006, and references therein). Kriging surrogate-based optimization was previously used for parameter optimization of plasma transport codes by Rodriguez-Fernandez et al. (2018). ...
... GPR is a Bayesian regression technique and is very powerful for interpolating small sets of data as well as retaining information about the uncertainty of the regression (Stein 1999; Rasmussen & Williams 2006, and references therein). A Gaussian process (GP) is a stochastic process for which any finite collection of random variables has a multivariate normal distribution. ...
... where $\mathbf{k}_*$ represents the vector of covariances between the test point, $\mathbf{x}_*$, and the $n$ observations, $\mathbf{f}_n$ is a vector of the $n$ observations, and $K$ is the covariance matrix (Gutmann & Corander 2016; Rasmussen & Williams 2006). Since the function evaluations in this work are deterministic, the $\sigma_n$ term is constrained to a low value that does not impact the predictions. ...
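The equation this sentence explains is elided in the excerpt. For reference, the textbook zero-mean GP predictive mean and variance, written with the same symbols and with $\sigma_n^2$ as the noise variance, are given below; this is the standard form, not necessarily the exact expression used in the citing work.

$$
\bar{f}(\mathbf{x}_*) = \mathbf{k}_*^{\top}\left(K + \sigma_n^2 I\right)^{-1}\mathbf{f}_n,
\qquad
\operatorname{Var}\left[f(\mathbf{x}_*)\right] = k(\mathbf{x}_*, \mathbf{x}_*) - \mathbf{k}_*^{\top}\left(K + \sigma_n^2 I\right)^{-1}\mathbf{k}_* .
$$

With $\sigma_n$ constrained to a small value, the mean nearly interpolates the deterministic observations and the variance shrinks toward zero at the training inputs.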
Preprint
Plasma-terminating disruptions in future fusion reactors may result in conversion of the initial current to a relativistic runaway electron beam. Validated predictive tools are required to optimize the scenarios and mitigation actuators to avoid the excessive damage that can be caused by such events. Many of the simulation tools applied in fusion energy research require the user to specify several input parameters that are not constrained by the available experimental information. Hence, a typical validation exercise requires multiparameter optimization to calibrate the uncertain input parameters for the best possible representation of the investigated physical system. The conventional approach, where an expert modeler conducts the parameter calibration based on domain knowledge, is prone to lead to an intractable validation challenge. For a typical simulation, conducting exhaustive multiparameter investigations manually to ensure a globally optimal solution and to rigorously quantify the uncertainties is an unattainable task, typically covered only partially and unsystematically. Bayesian inference algorithms offer a promising alternative approach that naturally includes uncertainty quantification and is less susceptible to user bias in choosing the input parameters. The main challenge in using these methods is the computational cost of simulating enough samples to construct the posterior distributions for the uncertain input parameters. This challenge can be overcome by combining probabilistic surrogate modelling, such as Gaussian Process regression, with Bayesian optimization, which can reduce the number of required simulations by several orders of magnitude. Here, we implement this type of Bayesian optimization framework for a model for analysis of disruption runaway electrons, and explore its use for simulations of the current quench in a JET plasma discharge with an argon-induced disruption.
... After this experiment, we obtain the measurement $\tilde{V}(\theta_k)$ of $V(\theta_k)$ and we determine a non-parametric probabilistic model¹ (usually a Gaussian process model [11]) of the cost function $V(\theta)$ based on the measurements collected up to iteration $k$. This probabilistic model (together with its uncertainty) is then used to suggest the next controller configuration $\theta_{k+1}$ to be tested in the next experiment, aiming at minimization of the cost $V(\theta)$. ...

¹ A closed-form expression of the cost $V$ as a function of the design parameter vector $\theta$ is indeed not available, because the system $S$ is unknown/uncertain.
... The most common probabilistic model used in BO is the Gaussian Process (GP) [11]. Let us analyze how this framework can be used for our BO algorithm and let us suppose that we are after iteration k in this BO algorithm. ...
... The parameters describing this prior information (the so-called hyperparameters) are re-estimated at each iteration to make the observed data the most likely, by maximizing the marginal data likelihood, see [11]. ...
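Putting the pieces of this loop together, a minimal Bayesian optimization sketch is shown below. It is a simplified illustration rather than the citing paper's algorithm: the "experiment" is a hypothetical quadratic cost, the controller parameter is a scalar searched over a grid, expected improvement is assumed as the acquisition function, outputs are standardized before fitting, and the kernel hyperparameters are kept fixed instead of being re-estimated by marginal likelihood at each iteration.

```python
import numpy as np
from math import erf, sqrt, pi

def rbf(a, b, lengthscale=0.3):
    """Squared-exponential kernel between 1-D arrays (fixed, illustrative length scale)."""
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / lengthscale) ** 2)

def gp_predict(x, y, xs, noise_var=1e-6):
    """GP posterior mean and standard deviation at xs, fitted to standardized y."""
    shift = y.mean()
    scale = y.std() if y.std() > 0 else 1.0
    z = (y - shift) / scale
    K = rbf(x, x) + noise_var * np.eye(len(x))
    Ks = rbf(x, xs)
    mu = shift + scale * (Ks.T @ np.linalg.solve(K, z))
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks), axis=0)
    return mu, scale * np.sqrt(np.maximum(var, 1e-12))

def expected_improvement(mu, sd, best):
    """Closed-form expected improvement for minimization."""
    z = (best - mu) / sd
    cdf = 0.5 * (1.0 + np.vectorize(erf)(z / sqrt(2.0)))
    pdf = np.exp(-0.5 * z ** 2) / sqrt(2.0 * pi)
    return (best - mu) * cdf + sd * pdf

def measured_cost(theta):
    """Hypothetical stand-in for the cost V(theta) measured in one experiment."""
    return (theta - 0.62) ** 2

candidates = np.linspace(0.0, 1.0, 201)   # admissible controller parameters
theta = np.array([0.1, 0.9])              # initial experiments
v = measured_cost(theta)
for _ in range(12):
    mu, sd = gp_predict(theta, v, candidates)   # model refitted to all data so far
    theta_next = candidates[np.argmax(expected_improvement(mu, sd, v.min()))]
    theta = np.append(theta, theta_next)
    v = np.append(v, measured_cost(theta_next))

print("best parameter found:", theta[np.argmin(v)])
```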
Conference Paper
Bayesian Optimization is a powerful machine-learning tool enabling automated design of fixed-structure controllers. A sequence of closed-loop calibration experiments is performed, and the next configuration to be tested is selected by the optimization algorithm in order to minimize an objective function measured directly on the real system. While the approach has been shown to be effective, its applicability is limited in certain domains by safety considerations, as the algorithm may suggest controller configurations which lead to dangerous behaviours in some of the calibration experiments. In this paper, we modify the standard Bayesian Optimization algorithm by introducing explicit constraints for safe exploration of the controller configuration space. The constraints are derived based on a preliminary model of the process dynamics, which is assumed to be available. Aspects for efficient implementation of the proposed methodology are discussed. Simulation examples highlight the advantage of the proposed methodology for controller calibration over the plain Bayesian Optimization algorithm.
... Let (x) denote the prediction at point x of a model learned using (X , y ) [19,43]. A classical measure for assessing the predictive ability of , in order to evaluate its validity, is the predictivity coefficient. ...
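The excerpt does not show the formula. A commonly used definition of the predictivity coefficient $Q^2$ on an independent test set is sketched below; the citing work's exact normalization may differ (for example, in which mean is subtracted), and the function name is hypothetical.

```python
def predictivity_coefficient(y_true, y_pred):
    """Q^2 = 1 - sum (y - y_hat)^2 / sum (y - mean(y))^2 on test data not used for fitting.
    Q^2 = 1 for perfect predictions and 0 for a model no better than the test mean."""
    n = len(y_true)
    mean = sum(y_true) / n
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot

y_test = [1.0, 2.0, 3.0, 4.0]
print(predictivity_coefficient(y_test, [1.1, 1.9, 3.2, 3.8]))
```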
... For each test case, a GP regression model is fitted to the observations using ordinary Kriging [43] (a GP model with constant mean), with an anisotropic Matérn kernel with regularity parameter 5/2: we substitute (15), with $D$ a diagonal matrix whose diagonal elements are the inverse squared correlation lengths, and the correlation lengths are estimated by maximum likelihood via a truncated Newton algorithm. All calculations were done using the Python package OpenTURNS for uncertainty quantification [1]. ...
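Equation (15) of the citing work is not reproduced in the excerpt. The sketch below assumes the standard anisotropic (ARD) Matérn 5/2 form, in which the distance is weighted as $r^2 = (\mathbf{x} - \mathbf{x}')^\top D (\mathbf{x} - \mathbf{x}')$ with $D = \operatorname{diag}(1/\ell_i^2)$, and implements it in plain NumPy rather than through OpenTURNS. The function name and length scales are hypothetical.

```python
import numpy as np

def matern52_ard(X1, X2, lengthscales, variance=1.0):
    """Anisotropic Matérn 5/2: r^2 = (x - x')^T D (x - x'), D = diag(1 / lengthscales^2),
    k = variance * (1 + sqrt(5) r + 5 r^2 / 3) * exp(-sqrt(5) r)."""
    diff = (X1[:, None, :] - X2[None, :, :]) / np.asarray(lengthscales)
    r = np.sqrt(np.sum(diff ** 2, axis=-1))
    a = np.sqrt(5.0) * r
    return variance * (1.0 + a + a ** 2 / 3.0) * np.exp(-a)

X = np.array([[0.0, 0.0],
              [1.0, 0.0],
              [0.0, 1.0]])
# Short correlation length along the first input, long along the second
K = matern52_ard(X, X, lengthscales=[0.5, 2.0])
print(K.round(4))
```

A unit step along the first input (correlation length 0.5) decorrelates the outputs far more than the same step along the second input (correlation length 2.0), which is what lets the estimated correlation lengths reflect input relevance.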
... We think this unproductively biases thinking towards d = 2-dimensional point-referenced (latitude and longitude) data, whereas these methods can be applied much more widely than that. Machine learning (e.g., Rasmussen and Williams, 2006) and computer surrogate modeling (e.g., Gramacy, 2020) applications are typically in higher input dimension, and one of our goals in this paper is to introduce this way of thinking into the mining literature. ...
... ARD/separable Matérn kernels are also common, but their expressions are less tidy, so we do not include them here. For more discussion, consult Rasmussen and Williams (2006) or Gramacy (2020). ...
Preprint
The canonical technique for nonlinear modeling of spatial and other point-referenced data is known as kriging in the geostatistics literature, and by Gaussian Process (GP) regression in surrogate modeling and machine learning communities. There are many similarities shared between kriging and GPs, but also some important differences. One is that GPs impose a process on the data-generating mechanism that can be used to automate kernel/variogram inference, thus removing the human from the loop in a conventional semivariogram analysis. The GP framework also suggests a probabilistically valid means of scaling to handle a large corpus of training data, i.e., an alternative to so-called ordinary kriging. Finally, recent GP implementations are tailored to make the most of modern computing architectures such as multi-core workstations and multi-node supercomputers. Ultimately, we use this discussion as a springboard for an empirics-based advocacy of state-of-the-art GP technology in the geospatial modeling of a large corpus of borehole data involved in mining for gold and other minerals. Our out-of-sample validation exercise quantifies how GP methods (as implemented by open source libraries) can be both more economical (fewer human and compute resources), more accurate and offer better uncertainty quantification than kriging-based alternatives. Once in the GP framework, several possible extensions benefit from a fully generative modeling apparatus. In particular, we showcase a simple imputation scheme that copes with left-censoring of small measurements, which is a common feature in borehole assays.
... Moreover, in most cases, it is not feasible to separate the data with a linear hyperplane in the original input space. Thus, a Gaussian kernel function is used, implicitly mapping the data into a higher-dimensional space in which a separating linear hyperplane can be found (Rasmussen & Williams, 2006). ...
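The Gaussian kernel corresponds to an infinite-dimensional feature space, which SVMs never construct explicitly (the kernel trick). For 1-D inputs the expansion $e^{-(x-y)^2/2} = e^{-x^2/2} e^{-y^2/2} \sum_{n} (xy)^n / n!$ gives explicit features, and the sketch below truncates them to show that the kernel is an inner product in that feature space. The unit bandwidth and the function names are illustrative assumptions.

```python
import math

def gaussian_kernel(x, y):
    """1-D Gaussian kernel exp(-(x - y)^2 / 2)."""
    return math.exp(-0.5 * (x - y) ** 2)

def truncated_features(x, n_terms):
    """First n_terms coordinates of an explicit Gaussian-kernel feature map:
    phi_n(x) = exp(-x^2 / 2) * x^n / sqrt(n!), from expanding exp(x * y)."""
    return [math.exp(-0.5 * x * x) * x ** n / math.sqrt(math.factorial(n))
            for n in range(n_terms)]

x, y = 0.5, 1.2
approx = sum(a * b for a, b in zip(truncated_features(x, 20), truncated_features(y, 20)))
print(gaussian_kernel(x, y), approx)
```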
... GPR has several benefits: it works well on small datasets and can provide uncertainty measurements on the predictions. Like SVMs, GPs are kernel models; they are capable of predicting highly calibrated class membership probabilities, although the choice and configuration of the kernel used at the heart of the method can be challenging (Rasmussen & Williams, 2006). ...
Conference Paper
The concept of resilience is attracting increasing attention, particularly since the recent Covid-19 pandemic. In the context of seismic threat, infrastructure seismic resilience is essential to maintain the functionality of critical infrastructure and emergency departments during a disastrous earthquake and to recover them quickly afterward. To achieve these targets, extensive advance preparation is necessary: a large number of damage and recovery scenarios must be simulated, compared, and analyzed to offer optimal resilient infrastructure designs and retrofitting. To perform these actions, several studies have used techniques such as Artificial Intelligence (AI) and machine learning (ML), whose use has increased rapidly in recent years. This paper aims to review the available concepts of AI and ML techniques as used in the seismic infrastructure resilience context, specifically in its four major analysis components, i.e., hazards, damage, losses, and recovery. Limitations are discussed and recommendations are finally offered. This analysis can inspire future researchers by exploring the overall characteristics of the published literature.
... GPR defines a distribution over functions such that, for any finite set of chosen input points, the outputs at these points follow a joint multivariate Gaussian distribution [58]. A more detailed explanation of the model can be found in Appendix 2. One of the main advantages of the ML-GPR method is that, in addition to predicting a mean value, it also provides a variance for the predicted distribution. ...
... See Table 4. Gaussian process regression [45,58] (GPR, also known as Kriging, Gaussian spatial modeling, Gaussian stochastic process) is a method that can be used to model a complex relationship between an output and several inputs that cannot reasonably be approached by a simple linear model. GPR works by defining a distribution over functions, and inference takes place directly in the space of functions. ...
Article
The surface tension (ST) of metallic alloys is a key property in many processing techniques. Notably, the ST of liquid metals is crucial in additive manufacturing, as it directly affects the stability of the melt pool. Although several theoretical models have been proposed to describe the ST, mainly of binary systems, both experimental studies and existing theoretical models focus on simple systems. This study presents a machine learning model based on Gaussian process regression to predict the surface tension of multi-component metallic systems. The model is built and tested on experimental data available in the literature. It is shown that the model predicts the ST of binary and ternary systems with high accuracy, and that it can identify certain trends in ST as a function of alloy composition. The ability of the model to extrapolate to higher-order systems, especially novel concentrated alloys (high entropy alloys, HEA), is discussed.
... In this study, we employed the following algorithms: k-nearest neighbors (kNN; Tikhonov, 1943; Fix & Hodges, 1951; Cover & Hart, 1967); Elastic-Net (for regression; Santosa & William, 1986; Tibshirani, 1996; Witten & Frank, 2005; Zou & Hastie, 2005); Gaussian Process (Rasmussen & Williams, 2006; Kotsiantis, 2007); support vector machines (Vapnik, 1998; Hsu & Lin, 2002; Karatzoglou et al., 2006); tree-based algorithms such as random forest and adaptive boosting, or AdaBoost (Ho, 1995; Breiman, 1996a, 1996b; Freund & Schapire, 1997; Breiman, 2001a; Kotsiantis, 2014; Sagi & Rokach, 2018); logistic regression (for classification; Cramer, 2002); naïve Bayes (for classification; Rennie et al., 2003; Hastie et al., 2009); and artificial neural networks (ANN; Curry, 1944; Rosenblatt, 1961; Rumelhart et al., 1986; Hastie et al., 2009; Lemaréchal, 2012). For details of these algorithms other than the Gaussian Process, including their functionality and parameters, as well as two application examples similar to those of this study, see Zhang et al. (2021, 2022). ...
... For Gaussian Process, details can be found in Rasmussen and Williams (2006). ...
Article
The Assen Fe ore deposit is a banded iron formation (BIF)-hosted orebody, occurring in the Penge Formation of the Transvaal Supergroup, located 50 km northwest of Pretoria in South Africa. Most BIF-hosted Fe ore deposits have experienced post-depositional alteration including supergene enrichment of Fe and low-grade regional metamorphism. Unlike most of the known BIF-hosted Fe ore deposits, high-grade hematite (> 60% Fe) in the Assen Fe ore deposit is located along the lithological contacts with dolerite intrusions. Due to the variability in alteration levels, identifying the lithologies present within the various parts of the Assen Fe ore deposit, specifically within the weathering zone, is often challenging. To address this challenge, machine learning was applied to enable the automatic classification of rock types identified within the Assen Fe ore mine and to predict the in-situ Fe grade. This classification is based on geochemical analyses, as well as petrography and geological mapping. A total of 21 diamond drill cores were sampled at 1 m intervals, covering all the lithofacies present at Assen mine. These were analyzed for major elements and oxides by means of X-ray fluorescence spectrometry. Numerous machine learning algorithms were trained, tested and cross-validated for automated lithofacies classification and prediction of in-situ Fe grade, namely (a) k-nearest neighbors, (b) elastic-net, (c) support vector machines (SVMs), (d) adaptive boosting, (e) random forest, (f) logistic regression, (g) Naïve Bayes, (h) artificial neural network (ANN) and (i) Gaussian process algorithms. Random forest, SVM and ANN classifiers yielded high classification accuracy scores during model training, testing and cross-validation. For in-situ Fe grade prediction, the same algorithms also consistently yielded the best results.
The predictability of in-situ Fe grade on a per-lithology basis, combined with the fact that CaO and SiO2 were the strongest predictors of Fe concentration, supports the hypothesis that Fe enrichment in the Assen Fe ore deposit was dominated by supergene processes. Moreover, we show that predictive modeling can demonstrate that, in this case, the main factor differentiating the predictability of Fe concentration across lithofacies is the strength of the multivariate elemental associations between Fe and other oxides. Localized high-grade Fe ore along lithological contacts with dolerite intrusions is indicative of intra-basinal fluid circulation from already Fe-enriched hematite. These findings have wider implications for lithofacies classification in weathered rocks and for the mobility of economically valuable elements such as Fe.
... For fluid flow problems and many others, Gaussian processes (GPs) offer a useful approach for capturing the physics of dynamical systems. A GP is a Bayesian nonparametric machine learning technique that provides a flexible prior distribution over functions, enjoys analytical tractability, defines kernels for encoding domain structure, and has a fully probabilistic workflow for principled uncertainty reasoning [26,27]. For these reasons GPs are used widely in scientific modeling, and several recent methods encode physics more directly into GP models. Numerical GPs have covariance functions resulting from the temporal discretization of time-dependent partial differential equations (PDEs) that describe the physics [28,29]. Modified Matérn GPs can be defined to represent the solution to stochastic partial differential equations [30] and can be extended to Riemannian manifolds to fit more complex geometries [31]. Finally, the physics-informed basis-function GP derives a GP kernel directly from the physical model [32]; we elucidate this last method with an experiment-optimization example in the surrogate modeling motif. ...
... For surrogate modeling in the sciences, non-linear, nonparametric Gaussian processes (GPs) [26] are typically used because of their flexibility, interpretability, and accurate uncertainty estimates [70]. GPs have traditionally been limited to smaller datasets because training has $O(N^3)$ computational cost, where $N$ is the number of training data points. However, much work on reliable GP sparsification and approximation methods makes them viable for real-world use [71,72,73,74]. ...
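To make the cost argument concrete, the sketch below contrasts two ways of computing the GP predictive mean. The exact mean solves an $N \times N$ system, which costs $O(N^3)$. The subset-of-regressors (SoR) approximation, described in Rasmussen and Williams' treatment of sparse approximations, only solves an $M \times M$ system built from $M$ inducing inputs, which costs $O(NM^2)$. SoR is just one of many sparse approximations and is not necessarily among those cited as [71-74]. The RBF kernel, noise variance, inducing-point locations and helper names are illustrative assumptions.

```python
# Illustrative sketch; function names are our own, not from the cited work.
import numpy as np

def rbf(xa, xb, ell=1.0):
    """Squared-exponential (RBF) kernel with unit signal variance."""
    return np.exp(-0.5 * (xa[:, None] - xb[None, :]) ** 2 / ell**2)

def exact_gp_mean(x, y, xs, noise_var, ell=1.0):
    """Exact predictive mean: solves an N x N system, O(N^3)."""
    K = rbf(x, x, ell) + noise_var * np.eye(len(x))
    return rbf(xs, x, ell) @ np.linalg.solve(K, y)

def sor_gp_mean(x, y, xs, z, noise_var, ell=1.0):
    """Subset-of-regressors mean with inducing inputs z: O(N M^2) for M = len(z)."""
    Kmn = rbf(z, x, ell)                              # M x N cross-covariance
    A = noise_var * rbf(z, z, ell) + Kmn @ Kmn.T      # only an M x M system is solved
    return rbf(xs, z, ell) @ np.linalg.solve(A, Kmn @ y)

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 10.0, 2000)
y = np.sin(x) + 0.1 * rng.standard_normal(x.size)
xs = np.linspace(0.0, 10.0, 5)
z = np.linspace(0.0, 10.0, 15)                        # 15 inducing inputs
print(np.max(np.abs(exact_gp_mean(x, y, xs, 0.01) - sor_gp_mean(x, y, xs, z, 0.01))))
```

When the inducing inputs coincide with the training inputs, the SoR mean reduces algebraically to the exact one. This gives a quick way to check such code.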
Article
The original "Seven Motifs" set forth a roadmap of essential methods for the field of scientific computing, where a motif is an algorithmic method that captures a pattern of computation and data movement. We present the "Nine Motifs of Simulation Intelligence", a roadmap for the development and integration of the essential algorithms necessary for a merger of scientific computing, scientific simulation, and artificial intelligence. We call this merger simulation intelligence (SI), for short. We argue that the motifs of simulation intelligence are interconnected and interdependent, much like the components within the layers of an operating system. Using this metaphor, we explore the nature of each layer of the simulation intelligence operating system stack (SI-stack) and the motifs therein: (1) Multi-physics and multi-scale modeling; (2) Surrogate modeling and emulation; (3) Simulation-based inference; (4) Causal modeling and inference; (5) Agent-based modeling; (6) Probabilistic programming; (7) Differentiable programming; (8) Open-ended optimization; (9) Machine programming. We believe coordinated efforts between motifs offer immense opportunity to accelerate scientific discovery, from solving inverse problems in synthetic biology and climate science, to directing nuclear energy experiments and predicting emergent behavior in socioeconomic settings. We elaborate on each layer of the SI-stack, detailing the state-of-the-art methods, presenting examples to highlight challenges and opportunities, and advocating for specific ways to advance the motifs and the synergies from their combinations. Advancing and integrating these technologies can enable a robust and efficient hypothesis-simulation-analysis type of scientific method, which we introduce with several use-cases for human-machine teaming and automated science.
... $(\Sigma_{ts})_{jk} = C(t_j, s_k)$ for $j = 0, \dots, m-1$ and $k = 0, \dots, n-1$. This approach uses unconstrained samples $u$. It is preferable to standard Gaussian process regression [47] when an efficient unconstrained sampling algorithm, such as circulant embedding, is readily available. ...
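The conditioning step turns an unconstrained sample into one that honors the measurements. We assume the standard update $u_c(t) = u(t) + \Sigma_{ts}\Sigma_{ss}^{-1}(y - u(s))$, sometimes called conditioning by kriging or Matheron's rule. We also assume a squared-exponential $C(t, s)$ with length scale 0.5 and noise-free measurements. For brevity, the unconstrained draw here uses a Cholesky factor rather than circulant embedding. The helper names and data are hypothetical.

```python
# Illustrative sketch; function names and data are our own, not from the cited work.
import numpy as np

def cov(a, b, ell=0.5):
    """Assumed squared-exponential covariance C(t, s)."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell**2)

def constrained_sample(t, s, y, seed=0, jitter=1e-6):
    """Condition one unconstrained draw u on the noise-free measurements u(s) = y.

    u is drawn jointly on the evaluation grid t and the measurement points s,
    then corrected by Sigma_ts Sigma_ss^{-1} (y - u(s)).
    """
    pts = np.concatenate([t, s])
    L = np.linalg.cholesky(cov(pts, pts) + jitter * np.eye(len(pts)))
    u = L @ np.random.default_rng(seed).standard_normal(len(pts))
    u_t, u_s = u[:len(t)], u[len(t):]
    sigma_ts = cov(t, s)       # (Sigma_ts)_jk = C(t_j, s_k)
    sigma_ss = cov(s, s)
    return u_t + sigma_ts @ np.linalg.solve(sigma_ss, y - u_s)

t = np.linspace(0.0, 4.0, 200)
s = np.array([0.5, 2.0, 3.5])      # sparse measurement times
y = np.array([1.0, -0.5, 0.3])     # measured values
path = constrained_sample(t, s, y)
```

Only $\Sigma_{ss}$, an $n \times n$ matrix over the measurement points, is inverted. The expensive part, drawing $u$, can be delegated to any fast unconstrained sampler.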
... Structure functions of order $p = 2, 4, 6$ of the unconstrained Fourier- and multiwavelet-based simulations. The scaling laws $S_p(\tau) = C_p \tau^{\zeta_p}$ are given for reference as black dashed lines, with log-normal scaling exponents $\zeta_p = \frac{p}{3} - \frac{\mu}{18}(p^2 - 3p)$ and constant factor $C_p = (p-1)!!\, T^{\frac{\mu}{18}(p^2 - 3p)}$, according to equation (47). ...
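The reference scaling law in the caption can be evaluated directly. The sketch assumes $\mu$ is the log-normal intermittency parameter and $T$ is the time scale entering equation (47); neither definition is reproduced in this excerpt. The value $\mu = 0.25$ in the usage lines is purely illustrative. A useful sanity check is that $\zeta_3 = 1$ for every $\mu$, because $p^2 - 3p$ vanishes at $p = 3$.

```python
# Illustrative sketch; function names are our own, not from the cited work.
from math import prod

def double_factorial(n):
    """n!! for n >= -1, with (-1)!! = 0!! = 1."""
    return prod(range(n, 0, -2)) if n > 0 else 1

def lognormal_zeta(p, mu):
    """Log-normal scaling exponent zeta_p = p/3 - mu/18 * (p^2 - 3p)."""
    return p / 3 - mu / 18 * (p**2 - 3 * p)

def lognormal_prefactor(p, mu, T):
    """C_p = (p-1)!! * T**(mu/18 * (p^2 - 3p)); T is the time scale of eq. (47)."""
    return double_factorial(p - 1) * T ** (mu / 18 * (p**2 - 3 * p))

def structure_function(tau, p, mu, T):
    """Reference scaling law S_p(tau) = C_p * tau**zeta_p."""
    return lognormal_prefactor(p, mu, T) * tau ** lognormal_zeta(p, mu)

for p in (2, 4, 6):
    print(p, lognormal_zeta(p, mu=0.25), lognormal_prefactor(p, mu=0.25, T=1.0))
```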
Preprint
We present a novel method for stochastic interpolation of sparsely sampled time signals based on a superstatistical random process generated from a multivariate Gaussian scale mixture. In comparison to other stochastic interpolation methods such as Gaussian process regression, our method possesses strong multifractal properties and is thus applicable to a broad range of real-world time series, e.g. from solar wind or atmospheric turbulence. Furthermore, we provide a sampling algorithm in terms of a mixing procedure that consists of generating a $(1+1)$-dimensional field $u(t, \xi)$. Each Gaussian component $u_\xi(t)$ of this field is synthesized with identical underlying noise but a different covariance function $C_\xi(t,s)$, parameterized by a log-normally distributed parameter $\xi$. Due to the Gaussianity of each component $u_\xi(t)$, we can exploit standard sampling algorithms such as Fourier or wavelet methods and, most importantly, methods to constrain the process on the sparse measurement points. The scale mixture $u(t)$ is then initialized by assigning each point in time $t$ a $\xi(t)$ and therefore a specific value from $u(t, \xi)$. The time-dependent parameter $\xi(t)$ follows a log-normal process with a large correlation time scale compared to the correlation time of $u(t, \xi)$. We juxtapose Fourier and wavelet methods and show that a multiwavelet-based hierarchical approximation of the interpolating paths, which produces a sparse covariance structure, provides an adequate method to locally interpolate large and sparse datasets.
... In this paper, we adopt the Gaussian Process (GP) [40] as the surrogate model, which is a commonly used nonparametric model. ...
... To maintain generality, the mean function is assumed to be $m(f) = 0$. For the kernel function, we adopt the Matérn kernel [40], which offers high flexibility; specifically, we set its smoothness parameter to 5/2. ...
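For reference, the Matérn covariance with smoothness $\nu = 5/2$ has the closed form $k(r) = \sigma^2 \left(1 + \frac{\sqrt{5} r}{\ell} + \frac{5 r^2}{3 \ell^2}\right) \exp\left(-\frac{\sqrt{5} r}{\ell}\right)$. The sketch below evaluates it. The excerpt does not say how the study sets the length scale $\ell$ and variance $\sigma^2$, so those values are placeholders, and the function name is our own.

```python
# Illustrative sketch; function name and default values are our own.
import math

def matern52(r, length_scale=1.0, variance=1.0):
    """Matérn covariance with smoothness nu = 5/2 at distance r >= 0."""
    a = math.sqrt(5.0) * r / length_scale
    # variance * (1 + sqrt(5) r / l + 5 r^2 / (3 l^2)) * exp(-sqrt(5) r / l)
    return variance * (1.0 + a + a * a / 3.0) * math.exp(-a)

for r in (0.0, 0.5, 1.0, 2.0):
    print(r, matern52(r))
```

Sample paths of a GP with this kernel are twice mean-square differentiable. That makes them smoother than the Matérn 3/2 case but rougher than the squared-exponential limit ($\nu \to \infty$), which is a common argument for $\nu = 5/2$ as a flexible default.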
Article
Data augmentation has been an essential technique for increasing the amount and diversity of datasets, thus improving deep learning models. To further improve performance, several automated data augmentation approaches have recently been proposed to find data augmentation policies automatically. However, some key issues still deserve further exploration. These are a precise definition of the policy search space, an instructive policy evaluation method, and a low computational cost for the policy search. In this paper, we propose a novel method named BO-Aug that attempts to solve these issues. Empirical verification on three widely used image classification datasets shows that the proposed method can achieve state-of-the-art or comparable performance compared with advanced automated data augmentation methods, at a relatively low cost. Our code is available at https://github.com/Zhangcx19/BO-Aug.
... Kriging (Krige 1951; Matheron 1970; see also Cressie 1993; Stein 2012; Santner et al. 2013) consists of inferring the values of a Gaussian random field given observations at a finite set of observation points. It has become a popular method for a wide range of applications, such as geostatistics (Matheron 1970), numerical code approximation (Sacks et al. 1989; Santner et al. 2013; Bachoc et al. 2016), global optimization (Jones et al. 1998), and machine learning (Rasmussen and Williams 2006). ...
... Numerical illustration of the consistency results. Propositions 1 and 2 are now illustrated on simple examples where the test functions are given by random samples of a centered Gaussian process $Y$ with Matérn 3/2 covariance (see Rasmussen and Williams 2006). The observation points $x_1, \ldots$ ...
Article
Kriging is a widely employed technique, in particular for computer experiments, machine learning, and geostatistics. An important challenge for Kriging is the computational burden when the data set is large. This article focuses on a class of methods that aim to decrease this computational cost by aggregating Kriging predictors based on smaller data subsets. It proves that aggregation methods that ignore the covariance between sub-models can yield an inconsistent final Kriging prediction. In contrast, a theoretical study of the nested Kriging method shows that it has additional attractive properties. First, this predictor is consistent. Second, it can be interpreted as an exact conditional distribution for a modified process. Third, the conditional covariances given the observations can be computed efficiently. This article also includes a theoretical and numerical analysis of how the assignment of the observation points to the sub-models can affect the prediction ability of the aggregated model. Finally, the nested Kriging method is extended to measurement errors and to universal Kriging.
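The aggregation idea can be sketched for the simple-Kriging case. The sketch follows our reading of nested Kriging. Each sub-model $i$ predicts $M_i(x) = k_i(x)^\top K_i^{-1} y_i$ from its own subset. A second Kriging step then combines the vector $M(x)$ using $\operatorname{Cov}(M_i, M_j)$ and $\operatorname{Cov}(M_i, Y(x))$, the covariances that naive aggregation ignores. The zero-mean, noise-free setting, the squared-exponential covariance, the length scale, the data and the helper names are assumptions made for illustration.

```python
# Illustrative sketch; function names and data are our own, not from the cited work.
import numpy as np

def k(a, b, ell=0.1):
    """Assumed squared-exponential covariance with unit variance."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell**2)

def nested_kriging(x_groups, y_groups, xs, jitter=1e-10):
    """Nested Kriging mean and variance at the test inputs xs (noise-free data)."""
    p = len(x_groups)
    # alpha_i = K_i^{-1} k_i(xs): sub-model weights, one column per test input
    alphas = [np.linalg.solve(k(xi, xi) + jitter * np.eye(len(xi)), k(xi, xs))
              for xi in x_groups]
    M = np.array([a.T @ yi for a, yi in zip(alphas, y_groups)])   # p x n_test
    means, variances = [], []
    for t in range(len(xs)):
        a_t = [a[:, t] for a in alphas]
        # Cov(M_i(x), Y(x)) = alpha_i^T k_i(x)
        k_M = np.array([a_t[i] @ k(x_groups[i], xs[t:t + 1])[:, 0] for i in range(p)])
        # Cov(M_i(x), M_j(x)) = alpha_i^T K_ij alpha_j
        K_M = np.array([[a_t[i] @ k(x_groups[i], x_groups[j]) @ a_t[j]
                         for j in range(p)] for i in range(p)])
        w = np.linalg.solve(K_M + jitter * np.eye(p), k_M)
        means.append(w @ M[:, t])
        variances.append(1.0 - w @ k_M)   # prior variance k(x, x) = 1
    return np.array(means), np.array(variances)

x = np.linspace(0.0, 1.0, 21)
y = np.sin(6.0 * x)
groups = [x[0::2], x[1::2]]                  # two interleaved sub-models
mean, var = nested_kriging(groups, [y[0::2], y[1::2]], np.array([0.33, 0.5, 0.71]))
```

Only small per-group systems and a $p \times p$ system per test point are solved. This is where the computational saving over full Kriging comes from.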
... This is what happens in the posterior: combining the prior with the data leads to a posterior distribution over functions. The problem of learning in Gaussian processes is exactly the problem of finding suitable properties for the covariance function [37]. ...
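In practice, finding suitable properties for the covariance function usually means choosing kernel hyperparameters. One standard approach, discussed by Rasmussen and Williams, is to maximize the log marginal likelihood $\log p(\mathbf{y} \mid X) = -\tfrac12 \mathbf{y}^\top K_y^{-1}\mathbf{y} - \tfrac12 \log|K_y| - \tfrac n2 \log 2\pi$, with $K_y = K + \sigma_n^2 I$. The sketch below assumes an RBF kernel, a fixed noise variance, synthetic data and a coarse grid over the length scale. Gradient-based optimization would be the more usual choice, and the helper names are hypothetical.

```python
# Illustrative sketch; function names and data are our own, not from the cited work.
import numpy as np

def rbf(xa, xb, ell):
    """Squared-exponential kernel with unit signal variance and length scale ell."""
    return np.exp(-0.5 * (xa[:, None] - xb[None, :]) ** 2 / ell**2)

def log_marginal_likelihood(x, y, ell, noise_var):
    """log p(y | X) for a zero-mean GP, computed via a Cholesky factor of K_y."""
    n = len(x)
    L = np.linalg.cholesky(rbf(x, x, ell) + noise_var * np.eye(n))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))    # K_y^{-1} y
    # log|K_y| = 2 * sum(log(diag(L))), so -1/2 log|K_y| = -sum(log(diag(L)))
    return -0.5 * y @ alpha - np.sum(np.log(np.diag(L))) - 0.5 * n * np.log(2 * np.pi)

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0.0, 5.0, 30))
y = np.sin(2.0 * x) + 0.1 * rng.standard_normal(30)
grid = [0.05, 0.1, 0.2, 0.5, 1.0, 2.0]
best_ell = max(grid, key=lambda ell: log_marginal_likelihood(x, y, ell, 0.01))
print(best_ell)
```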
... Rasmussen and Williams [37] explain how to compute predictive distributions for the outputs $y_*$ corresponding to a novel test input $\mathbf{x}_*$. The predictive distribution is Gaussian, with mean and variance given by $\bar{f}_* = \mathbf{k}_*^\top (K + \sigma_n^2 I)^{-1} \mathbf{y}$, ...
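The mean formula in the excerpt has a matching variance, $\mathbb{V}[f_*] = k(\mathbf{x}_*, \mathbf{x}_*) - \mathbf{k}_*^\top (K + \sigma_n^2 I)^{-1}\mathbf{k}_*$, given in the same book. Both are usually evaluated through a Cholesky factor rather than an explicit inverse. A minimal sketch, assuming a zero prior mean, an RBF kernel with unit signal variance, illustrative values for $\sigma_n^2$ and the length scale, and a hypothetical helper name:

```python
# Illustrative sketch; function names are our own, not from the cited work.
import numpy as np

def rbf(xa, xb, ell=1.0):
    """Squared-exponential kernel with unit signal variance, so k(x, x) = 1."""
    return np.exp(-0.5 * (xa[:, None] - xb[None, :]) ** 2 / ell**2)

def gp_predict(x, y, xs, noise_var=1e-2, ell=1.0):
    """Predictive mean and variance of f_* at the test inputs xs."""
    L = np.linalg.cholesky(rbf(x, x, ell) + noise_var * np.eye(len(x)))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))   # (K + s_n^2 I)^{-1} y
    Ks = rbf(x, xs, ell)                                   # column j is k_* for xs[j]
    mean = Ks.T @ alpha                                    # k_*^T (K + s_n^2 I)^{-1} y
    v = np.linalg.solve(L, Ks)
    var = 1.0 - np.sum(v**2, axis=0)                       # k(x_*, x_*) - k_*^T (K + s_n^2 I)^{-1} k_*
    return mean, var

x = np.linspace(0.0, 5.0, 10)
mean, var = gp_predict(x, np.sin(x), np.array([1.0, 2.5, 100.0]))
print(mean, var)
```

Far from the data, at the test input 100.0, the mean reverts to the prior mean of zero and the variance to the prior variance of one.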
Thesis
This thesis develops an optimization algorithm that selects the optimal section for a long-span bridge in terms of minimum section weight, by minimizing the buffeting response while constraining the flutter limit. Parametric variations in the depth and width of the fairing shape are introduced as variables in the optimization process. Several prediction methods, namely Polynomial, Extreme Learning Machine, and Gaussian Process, are used to build the response surface model. The optimized sections for each method are compared with Finite Element Model (FEM) and computational fluid dynamics (CFD) simulations. The optimization process is summarized in maps of the results, in the hope of assisting decision-making. The aerodynamic behavior of long-span bridges is closely related to the shape of the cross-section. Modifying that shape to improve its aeroelastic characteristics is a common way to reduce the effect of wind phenomena. Structures with high flutter limits and limited buffeting response are desirable in designing long-span bridges. Meanwhile, deck dead load is also a major factor: a slender structure is required to obtain a longer bridge span, but a slender structure is more susceptible to wind phenomena. Thus, one aspect presented in this work is the trade-off between objectives under a cross-section weight constraint. In most cases, the design process is iterative: many sample sections and structural systems are calculated under specific constraints to obtain the best results. From 25 predefined sample sections, the response surface model predicts the structural response for all sections within the boundary, thus reducing the iterations needed in the design. Because response surface prediction is so important, the quality of the response surface method is also a main topic of discussion in this thesis.
In addition, engineers must ensure that the design is reliable enough to remain on the safe side given the uncertainty in design variables. Reliability-based design optimization is needed to overcome this problem. By prescribing a probability of failure, the safe region can be delineated clearly, so that the optimized section is selected from the sections in this safe region. In the end, the aim is to obtain not only an improved design but also a higher level of confidence in it. Keywords: Reliability-based design optimization, Flutter Limit, Buffeting, Aerodynamic Shape, Long-span Bridges
... Optimizing draft tubes is an example of a search requiring an expensive multi-modal, black-box, performance function because the efficiency and pressure recovery of each candidate draft tube must be computed with CFD. Bayesian Optimization (BO) is an efficient search method for this type of optimization because it requires far fewer evaluations of the performance function to find an optimum than most other optimization algorithms [35,36]. BO techniques have been successful in the design of architected meta-materials [37,38,39,40,41,42], hyper-parameter tuning for machine learning algorithms [43,44,45], drug design [46,47], and controller sensor placement [48]. ...
Preprint
Finding the optimal design of a hydrodynamic or aerodynamic surface is often impossible due to the expense of evaluating the cost functions (say, with computational fluid dynamics) needed to determine the performances of the flows that the surface controls. In addition, the design space itself has inherent limitations due to imposed geometric constraints, conventional parameterization methods, and user bias. These limitations can restrict *all* of the designs within a chosen design space, regardless of whether traditional optimization methods or newer, data-driven design algorithms with machine learning are used to search it. We present a 2-pronged attack to address these difficulties: we propose (1) a methodology to create the design space using morphing that we call *Design-by-Morphing* (DbM); and (2) an optimization algorithm to search that space that uses a novel Bayesian Optimization (BO) strategy that we call *Mixed variable, Multi-Objective Bayesian Optimization* (MixMOBO). We apply this shape optimization strategy to maximize the power output of a hydrokinetic turbine. Applying these two strategies in tandem, we demonstrate that we can create a novel, geometrically-unconstrained, design space of a draft tube and hub shape and then optimize them simultaneously with a *minimum* number of cost function calls. Our framework is versatile and can be applied to the shape optimization of a variety of fluid problems.
... In [182], Perrone et al. proposed a general constrained Bayesian optimization framework that can cater to different ML models and to one or multiple fairness constraints. Following the Bayesian optimization framework [183], this study iteratively tunes the hyperparameters $\mathbf{x}$ based on the best query $\mathbf{x}^*$, which is obtained by maximizing an acquisition function. The acquisition function is built on the Gaussian process posteriors of the designed objective $f(\mathbf{x})$ (i.e., accuracy) and the fairness constraint $c(\mathbf{x})$. ...
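The exact acquisition function of [182] is not given in this excerpt. One common choice for constrained Bayesian optimization, assumed here, is constrained expected improvement. It multiplies the expected improvement under the objective GP by the constraint GP's probability that $c(\mathbf{x})$ stays below a threshold. The sketch treats the objective as a quantity to minimize (e.g. validation error, i.e. one minus accuracy). The threshold, candidate values and helper names are hypothetical.

```python
# Illustrative sketch; function names, threshold and candidates are hypothetical.
import math

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def expected_improvement(mu, sigma, best):
    """EI for minimization, given the GP posterior N(mu, sigma^2) of f(x)."""
    if sigma <= 0.0:
        return max(best - mu, 0.0)
    z = (best - mu) / sigma
    return (best - mu) * norm_cdf(z) + sigma * norm_pdf(z)

def constrained_ei(mu_f, sigma_f, best_feasible, mu_c, sigma_c, threshold):
    """EI weighted by the GP-estimated probability that c(x) <= threshold."""
    p_feasible = norm_cdf((threshold - mu_c) / sigma_c)
    return expected_improvement(mu_f, sigma_f, best_feasible) * p_feasible

# Hypothetical candidates: (mu_f, sigma_f, mu_c, sigma_c) from the two GP posteriors.
candidates = [(0.20, 0.05, 0.02, 0.01), (0.15, 0.05, 0.08, 0.02), (0.18, 0.10, 0.04, 0.02)]
scores = [constrained_ei(mf, sf, 0.22, mc, sc, threshold=0.05) for mf, sf, mc, sc in candidates]
best = max(range(len(candidates)), key=scores.__getitem__)
print(best, scores)
```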
Preprint
Recent years have seen the rapid development of fairness-aware machine learning for mitigating unfairness or discrimination in decision-making across a wide range of applications. However, much less attention has been paid to fairness-aware multi-objective optimization, which is in fact common in real life, for example in fair resource allocation problems and data-driven multi-objective optimization problems. This paper aims to illuminate and broaden our understanding of multi-objective optimization from the perspective of fairness. To this end, we start with a discussion of user preferences in multi-objective optimization and then explore its relationship to fairness in machine learning and multi-objective optimization. Following the above discussions, representative cases of fairness-aware multi-objective optimization are presented, further elaborating the importance of fairness in traditional multi-objective optimization, data-driven optimization and federated optimization. Finally, challenges and opportunities in fairness-aware multi-objective optimization are addressed. We hope that this article makes a small step forward towards understanding fairness in the context of optimization and promotes research interest in fairness-aware multi-objective optimization.
... A mathematical model of temperature-induced strains is developed based on the GPM for extracting the features. GPM is a powerful tool for handling high-dimensional regression problems [44]. Consider a data set $D = \{X, Y\}$ consisting of $n$ pairs of input vectors $X = \{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n\}$ and outputs $Y = \{y_1, y_2, \ldots, y_n\}$. ...
Article
Concrete structures in cold regions are highly susceptible to internal cracks due to periodic temperature variation and the long-term alternation of drying and wetting. Conventional damage detection techniques for concrete are often destructive and require tedious manual operation. Concrete strain monitoring is a non-destructive measurement technique that provides continuously streaming strain data and has the potential to detect damage automatically using a data-driven approach. Temperature-induced strains are highly responsive to damage in concrete structures under freeze-thaw (F-T) cycles. In this regard, we propose a novel damage detection approach based on temperature-induced strains. The proposed method separates the temperature-induced component from the measured data using independent component analysis, which improves the modeling accuracy of the Gaussian process modeling (GPM) approach for detecting damage. The principle of damage detection is to extract damage-related features from the measured strain data. The model residuals of temperature-induced strains are extracted as features that are sensitive to concrete damage. A novel damage index is established to determine the presence of structural damage. It is based on the Kolmogorov–Smirnov (KS) test applied to the probability distribution of the residuals. To increase the reliability of damage detection and decrease the false alarm rate, generalized extreme value (GEV) theory is used to determine a reliable threshold limit. A moving window strategy is introduced to perform damage detection and identify the damage occurrence time effectively. Monitoring data from an F-T cycle experiment are used to validate the proposed method. The results for three different damage cases demonstrate the effectiveness of the proposed data-driven method in detecting damage and identifying the damage occurrence time.
The damage detection results under F-T cycles also reveal how the temperature characteristics of the concrete vary.
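The KS-based index compares the distribution of model residuals in a monitoring window with that of a healthy reference period. The sketch below computes the two-sample KS statistic $D = \sup_x |F_a(x) - F_b(x)|$ for each sliding window. The window length, the residual values, the use of the raw statistic as the damage index and the helper names are assumptions. The study's GEV-based threshold is not reproduced.

```python
# Illustrative sketch; function names and data are hypothetical.
import bisect

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic D = sup_x |F_a(x) - F_b(x)|."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    # Both empirical CDFs are right-continuous step functions, so the supremum
    # is reached at one of the pooled sample points.
    for x in a + b:
        fa = bisect.bisect_right(a, x) / len(a)
        fb = bisect.bisect_right(b, x) / len(b)
        d = max(d, abs(fa - fb))
    return d

def moving_window_index(baseline, stream, window):
    """KS statistic between baseline residuals and each sliding window of new residuals."""
    return [ks_statistic(baseline, stream[i:i + window])
            for i in range(len(stream) - window + 1)]

baseline = [0.1, -0.2, 0.05, 0.0, -0.1, 0.15, -0.05, 0.2]
stream = baseline + [0.6, 0.8, 0.7, 0.9, 0.75, 0.65, 0.85, 0.7]   # hypothetical shift
print(moving_window_index(baseline, stream, window=8))
```

The index rises from 0 toward 1 as the window slides into the shifted residuals. The first window whose index exceeds a chosen threshold marks the estimated damage occurrence time.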
... Snoek et al. [39] introduced an automatic hyperparameter search method for machine learning models based on the Bayesian optimization (BO) process. BO algorithms seek to minimize a given objective function $f(h)$ for the hyperparameter vector $h$ in a bounded domain $H$. They do this by fitting a Gaussian process (GP) regression model [42] to the evaluations of $f(h)$, i.e., constructing a prior probability distribution over the objective function itself. The GP prior is exploited to decide where in $H$ to evaluate $f(\cdot)$ next. After the result of the experiment with the new set of hyperparameters has been observed, the model is updated to improve its fit to previous observations. ...
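The loop described above can be sketched in a few lines for a single hyperparameter. None of the following details come from the cited study; all are assumptions:

- a zero-mean GP with an RBF kernel (length scale 0.2);
- expected improvement as the rule for choosing the next point, since the excerpt does not name the acquisition function;
- a grid of 201 candidates in $H = [lo, hi]$;
- a toy objective standing in for a validation error.

The helper names are hypothetical.

```python
# Illustrative sketch; function names and the toy objective are hypothetical.
import math
import numpy as np

def rbf(a, b, ell=0.2):
    """Squared-exponential kernel with unit signal variance."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell**2)

def gp_posterior(h, fh, cand, jitter=1e-6):
    """Posterior mean and standard deviation of a zero-mean GP at candidate points."""
    K = rbf(h, h) + jitter * np.eye(len(h))
    Ks = rbf(h, cand)
    mean = Ks.T @ np.linalg.solve(K, fh)
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks), axis=0)
    return mean, np.sqrt(np.maximum(var, 1e-12))

def expected_improvement(mean, std, best):
    """EI for minimization, evaluated element-wise."""
    z = (best - mean) / std
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)
    return (best - mean) * cdf + std * pdf

def bayes_opt(f, lo, hi, n_init=3, n_iter=12):
    """Minimize f over [lo, hi] by repeatedly querying the EI maximizer."""
    cand = np.linspace(lo, hi, 201)              # candidate hyperparameter values
    h = np.linspace(lo, hi, n_init)              # initial design
    fh = np.array([f(v) for v in h])
    for _ in range(n_iter):
        mean, std = gp_posterior(h, fh, cand)    # refit the GP to all observations
        nxt = cand[np.argmax(expected_improvement(mean, std, fh.min()))]
        h, fh = np.append(h, nxt), np.append(fh, f(nxt))
    i = int(np.argmin(fh))
    return h[i], fh[i]

best_h, best_f = bayes_opt(lambda v: (v - 0.3) ** 2, 0.0, 1.0)
print(best_h, best_f)
```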
Article
In this study, a novel procedure is presented for efficiently developing predictive models of the mechanical characteristics and volumetric properties of road pavement asphalt concretes, using shallow artificial neural networks. Two problems are addressed: properly assessing the actual generalization capability of a model, and avoiding the effects induced by a fixed training-test data split. Since machine learning models require a careful definition of the network hyperparameters, a Bayesian approach is presented to set the optimal model configuration. The case study covered a set of 92 asphalt concrete specimens for thin wearing layers.
... Kriging refers to a surrogate model based on Gaussian process modelling; it is sometimes called Gaussian process regression (Rasmussen and Williams 2006). The method originated in geostatistics in a paper by Krige (1951) and became popular after its use for the analysis of various computer experiments (Sacks et al. 1989). ...
Article
The utilization of surrogate models to approximate complex systems has recently gained popularity. Because of their capability to deal with black-box problems and their lower computational requirements, surrogates have been successfully utilized by researchers in various engineering and scientific fields. An efficient use of surrogates can bring considerable savings in computational resources and time. Since the literature on surrogate modelling encompasses a large variety of approaches, the appropriate choice of a surrogate remains a challenging task. This review discusses significant publications where surrogate modelling for finite element method-based computations was utilized. We familiarize the reader with the subject, explain the function of surrogate modelling, sampling and model validation procedures, and describe the different surrogate types. We then discuss the main categories where surrogate models are used: prediction, sensitivity analysis, uncertainty quantification, and surrogate-assisted optimization. For each, we give a detailed account of recent advances and applications. We review the most widely used and recently developed software tools that make it easy to apply the discussed techniques. Based on a literature review of 180 papers related to surrogate modelling, we discuss major research trends, gaps, and practical recommendations. As the utilization of surrogate models grows in popularity, this review can function as a guide that makes surrogate modelling more accessible.
... Recall that in Section 3.2, the inference is conducted by building and using surrogate models rather than the LAMMPS code directly. Note that in this study, we use Gaussian process surrogates [69,64,49]. We discuss these results first, highlighting the use of surrogate-based predictive distributions as diagnostics for the calibration. ...
Preprint
Developing reliable interatomic potential models with quantified predictive accuracy is crucial for atomistic simulations. Commonly used potentials, such as those constructed through the embedded atom method (EAM), are derived from semi-empirical considerations and contain unknown parameters that must be fitted based on training data. In the present work, we investigate Bayesian calibration as a means of fitting EAM potentials for binary alloys. The Bayesian setting naturally assimilates probabilistic assertions about uncertain quantities. In this way, uncertainties about model parameters and model errors can be updated by conditioning on the training data and then carried through to prediction. We apply these techniques to investigate an EAM potential for a family of gold-copper systems in which the training data correspond to density-functional theory values for lattice parameters, mixing enthalpies, and various elastic constants. Through the use of predictive distributions, we demonstrate the limitations of the potential and highlight the importance of statistical formulations for model error.
... Gaussian process regression (GPR) is a probabilistic, non-parametric supervised learning method for generalizing the nonlinear and complicated function mappings hidden in data sets. Following Rasmussen and Williams [32], the GPR model rests on the assumption that adjacent observations should convey information about each other; it is a way of placing a prior directly over function space. The mean and covariance of a Gaussian distribution are a vector and a matrix, respectively. A Gaussian process, by contrast, is a distribution over functions, specified by a mean function and a covariance function. ...
Article
In the design stage of construction projects, determining the soil permeability coefficient is one of the most important steps in assessing groundwater, infiltration, runoff, and drainage. In this study, various kernel-function-based Gaussian process regression models were developed to estimate the soil permeability coefficient. They use six input parameters: liquid limit, plastic limit, clay content, void ratio, natural water content, and specific density. The models were developed and validated on data from a total of 84 soil samples. These data were reported in the literature from the detailed design-stage investigations of the Da Nang–Quang Ngai national road project in Vietnam. The models’ performance was evaluated and compared using statistical error indicators such as root mean square error and mean absolute error, as well as the coefficient of determination and the correlation coefficient. The performance measures show that the Gaussian process regression model based on the Pearson universal kernel achieved comparatively better and more reliable results and should therefore be encouraged in further research.
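The Pearson VII universal kernel (PUK) mentioned above is commonly written as $k(\mathbf{x}_i, \mathbf{x}_j) = \left[1 + \left(2\lVert\mathbf{x}_i - \mathbf{x}_j\rVert\sqrt{2^{1/\omega} - 1}/\sigma\right)^2\right]^{-\omega}$, where $\sigma$ sets the width of the peak and $\omega$ its tailing behavior. The sketch below assumes this standard form. The abstract does not give the study's parameter values, so the defaults and the input vectors are placeholders, and the function name is our own. One convenient property is that $k = 1/2$ exactly when the distance equals $\sigma/2$, for any $\omega$.

```python
# Illustrative sketch; function name, parameters and inputs are hypothetical.
import math

def puk_kernel(xi, xj, sigma=1.0, omega=1.0):
    """Pearson VII universal kernel (PUK) between two feature vectors."""
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(xi, xj)))
    scaled = 2.0 * dist * math.sqrt(2.0 ** (1.0 / omega) - 1.0) / sigma
    return 1.0 / (1.0 + scaled ** 2) ** omega

# Hypothetical standardized inputs: (liquid limit, plastic limit, clay content,
# void ratio, natural water content, specific density)
a = [0.2, -0.1, 0.5, 0.0, 0.3, -0.4]
b = [0.1, 0.0, 0.4, 0.2, 0.1, -0.3]
print(puk_kernel(a, b, sigma=1.0, omega=2.0))
```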
... In addition, a Gaussian process classifier (GPC) was used to classify all 4 BCI task conditions. The GPC is a probabilistic classification method relying on random field theory (Rasmussen and Williams, 2006), and has been successfully tested for the decoding of fMRI data (Marquand et al., 2010). For classification accuracy, class accuracy and balanced accuracy (BA, the average of sensitivity and specificity) were reported (Schrouff et al., 2013). ...
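For two classes, balanced accuracy is the mean of sensitivity and specificity, i.e. the mean of the per-class recalls. For the four-way GPC problem, we assume the usual multi-class generalization, which averages the recall of each of the four classes; chance level is then 0.25. A minimal sketch with hypothetical labels and a hypothetical helper name:

```python
# Illustrative sketch; function name and labels are hypothetical.
from collections import Counter

def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall; for two classes this equals (sensitivity + specificity) / 2."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return sum(hits[c] / n for c, n in totals.items()) / len(totals)

# Hypothetical four-way predictions for the HANDS, FEET, WORD and SUB conditions
y_true = ["HANDS", "HANDS", "FEET", "FEET", "WORD", "WORD", "SUB", "SUB"]
y_pred = ["HANDS", "FEET", "FEET", "FEET", "WORD", "SUB", "SUB", "WORD"]
print(balanced_accuracy(y_true, y_pred))  # 0.625
```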
Preprint
Accurate mapping of cortical engagement during mental imagery or cognitive tasks remains a challenging brain-imaging problem with immediate relevance to the development of brain–computer interfaces (BCI). We analyzed data from fourteen individuals who performed cued motor imagery, mental arithmetic, or silent word generation tasks during MEG recordings. During the motor imagery task, participants imagined the movement of either both hands (HANDS) or both feet (FEET) after the appearance of a static visual cue. During the cognitive task, participants either mentally subtracted two numbers that were displayed on the screen (SUB) or generated words starting with a letter cue that was presented (WORD). The MEG recordings were denoised using a combination of spatiotemporal filtering, the elimination of noisy epochs, and ICA decomposition. Cortical source power in the beta band (17–25 Hz) was estimated from the selected temporal windows using a frequency-resolved beamforming method applied to the sensor-level MEG signals. Task-related cortical engagement was inferred from beta power decrements relative to a baseline 400 ms temporal window before cue onset. The decrements were computed within non-overlapping 400 ms temporal windows between 400 and 2800 ms after cue presentation. We estimated the significance of these power changes within each parcel of the Desikan–Killiany atlas using a non-parametric permutation test at the group level. During the HANDS and FEET movement-imagery conditions, beta power decreased in premotor and motor areas, consistent with a robust engagement of these cortical regions during motor imagery. During the WORD and SUB tasks, beta-power decrements signaling cortical engagement were lateralized to left-hemisphere brain areas that are expected to engage in language and arithmetic processing. These areas lie within the temporal (superior temporal gyrus), parietal (supramarginal gyrus), and (inferior) frontal regions.
A leave-one-subject-out cross-validation using a support vector machine (SVM) applied to beta-power decrements across brain parcels yielded accuracy rates of 74% and 68%, respectively, for classifying motor-imagery (HANDS-vs-FEET) and cognitive (WORD-vs-SUB) tasks. For the motor-versus-nonmotor contrasts, accuracy rates of 85% and 80%, respectively, were observed for HANDS-vs-WORD and HANDS-vs-SUB. A multivariate Gaussian process classifier (GPC) achieved an accuracy rate of ≈60% on the four-way (HANDS-FEET-WORD-SUB) classification problem. The regions identified by both the SVM and GPC classification weight maps were largely consistent with the source modeling findings. Within-subject correlations of beta decrements across the two task sessions provided insights into the level of engagement of individual subjects and were moderately high for most subjects. Our results show that it is possible to map the dynamics of cortical engagement during mental processes in the absence of dynamic sensory stimuli or overt behavioral outputs using task-related beta-power decrements. The ability to do so with the high spatiotemporal resolution afforded by MEG could help better characterize the physiological basis of motor or cognitive impairments in neurological disorders and guide strategies for neurorehabilitation.
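To make the evaluation protocol in this entry concrete, here is a minimal standard-library Python sketch of a leave-one-subject-out split and of balanced accuracy. The function names are hypothetical, and handling the multiclass case as the mean of per-class recalls is an assumption; for two classes it reduces to the average of sensitivity and specificity mentioned in the snippet.

```python
from collections import defaultdict


def leave_one_subject_out(subject_ids):
    """Yield (train_idx, test_idx) pairs, holding out one subject per fold."""
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield train, test


def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall; for two classes this is (sensitivity + specificity) / 2."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Toy usage: 3 hypothetical subjects with 2 trials each.
subjects = ["s1", "s1", "s2", "s2", "s3", "s3"]
folds = list(leave_one_subject_out(subjects))
print(len(folds))  # 3
print(folds[0])    # ([2, 3, 4, 5], [0, 1])

# A classifier that always predicts the majority class.
y_true = ["HANDS", "HANDS", "HANDS", "FEET"]
y_pred = ["HANDS", "HANDS", "HANDS", "HANDS"]
print(balanced_accuracy(y_true, y_pred))  # 0.5
```

In the toy example, the majority-class predictor has a plain accuracy of 0.75 but a balanced accuracy of only 0.5, which is why balanced accuracy is preferred when class sizes differ. Holding out whole subjects, rather than individual trials, keeps trials from the test subject out of training.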
... The concept of distance-aware uncertainty is native to Gaussian processes (GPs) [42], where a kernel captures a measure of distance between pairs of inputs. Modern approaches combine radial basis function (RBF) kernels with deep feature extractors, i.e., deep neural networks that transform the input space to obtain a better fit to the data [4]. ...
Preprint
Full-text available
As we move away from the data, the predictive uncertainty should increase, since a great variety of explanations are consistent with the little available information. We introduce Distance-Aware Prior (DAP) calibration, a method to correct overconfidence of Bayesian deep learning models outside of the training domain. We define DAPs as prior distributions over the model parameters that depend on the inputs through a measure of their distance from the training set. DAP calibration is agnostic to the posterior inference method, and it can be performed as a post-processing step. We demonstrate its effectiveness against several baselines in a variety of classification and regression problems, including benchmarks designed to test the quality of predictive distributions away from the data.
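The claim that a kernel makes GP uncertainty distance-aware can be checked numerically. The NumPy sketch below computes the GP posterior variance at test points that lie increasingly far from three training inputs. It assumes an RBF kernel with unit hyperparameters and an arbitrarily chosen small noise term, and it illustrates the GP behavior the snippet refers to, not the DAP calibration method itself.

```python
import numpy as np


def rbf(a, b, lengthscale=1.0, variance=1.0):
    """RBF (squared-exponential) kernel matrix between 1-D input arrays a and b."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / lengthscale**2)


def posterior_variance(x_train, x_test, noise=1e-2, **kw):
    """GP posterior variance at x_test given noisy observations at x_train."""
    K = rbf(x_train, x_train, **kw) + noise * np.eye(len(x_train))
    K_s = rbf(x_train, x_test, **kw)             # shape (n_train, n_test)
    prior_var = np.diag(rbf(x_test, x_test, **kw))
    # Solve K v = K_s instead of forming K^{-1} explicitly.
    v = np.linalg.solve(K, K_s)
    return prior_var - np.sum(K_s * v, axis=0)


x_train = np.array([-1.0, 0.0, 1.0])
x_test = np.array([0.0, 2.0, 5.0, 20.0])        # increasingly far from the data
var = posterior_variance(x_train, x_test)
print(np.round(var, 3))
```

The variance is close to zero at a training input and rises toward the prior variance (1 here) as the test point moves away, which is the behavior that overconfident deep models lack outside the training domain.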
... The activations of the last hidden layer were extracted as the features used to train the GP. The GP uses a Matérn kernel [41] with ν = 3/2 plus a white-noise term to account for noisy data. Increasing levels of white noise were then introduced into the model to distort it. ...
Article
Full-text available
Microwave sensors are principally sensitive to effective permittivity, and hence not selective to a specific material under test (MUT). In this work, a highly compact microwave planar sensor based on zeroth-order resonance is designed to operate at three distinct frequencies of 3.5, 4.3, and 5 GHz, with a size of only λg−min/8 per resonator. This resonator is deployed to characterize liquid mixtures with one desired MUT (here water) combined with an interfering material (e.g., methanol, ethanol, or acetone) at various concentrations (0% to 100% in 10% steps). To make the sensor selective to water, a convolutional neural network (CNN) is used to recognize different concentrations of water regardless of the host medium. To obtain high accuracy in this classification, Style-GAN is utilized to generate reliable sensor responses for concentrations between water and the host medium (methanol, ethanol, and acetone). A high accuracy of 90.7% is achieved using the CNN for selectively discriminating water concentrations.
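The GP setup described in this entry's snippet (Matérn ν = 3/2 plus white noise) has a simple closed form. The Matérn-3/2 kernel is k(r) = σ²(1 + √3 r/ℓ) exp(−√3 r/ℓ) for distance r, lengthscale ℓ, and variance σ². A white-noise kernel contributes only when the two inputs are the same point, so on the training set it simply inflates the diagonal. The NumPy sketch below builds such a covariance matrix over feature vectors standing in for last-hidden-layer activations. The feature dimensions, hyperparameter values, and function names are hypothetical.

```python
import numpy as np


def matern32(X1, X2, lengthscale=1.0, variance=1.0):
    """Matern nu=3/2 kernel between rows of X1 (n, d) and X2 (m, d)."""
    # Euclidean distance between every pair of rows.
    r = np.sqrt(np.sum((X1[:, None, :] - X2[None, :, :]) ** 2, axis=-1))
    scaled = np.sqrt(3.0) * r / lengthscale
    return variance * (1.0 + scaled) * np.exp(-scaled)


def train_covariance(X, lengthscale=1.0, variance=1.0, noise_level=0.1):
    """Matern-3/2 plus white noise: the noise term only adds to the diagonal."""
    return matern32(X, X, lengthscale, variance) + noise_level * np.eye(len(X))


# Hypothetical 4-dimensional "last hidden layer" features for 5 samples.
rng = np.random.default_rng(0)
features = rng.normal(size=(5, 4))
K = train_covariance(features, noise_level=0.1)
print(K.shape)      # (5, 5)
print(np.diag(K))   # each entry is variance + noise_level = 1.1
```

Raising `noise_level` is one way to mimic the increasing white-noise levels described in the snippet. The source does not state how the noise was actually injected, so treat this as an assumption.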
... Latent Gaussian fields are explicitly or implicitly used in many spatial analysis methods, including INLA (Rue et al., 2009), kriging (Cressie, 1990), Gaussian process (GP) regression (Rasmussen & Williams, 2006), and spatial GAMs with smoothing splines (Wood, 2020). In all these implementations a key challenge is appropriately specifying the smoothness properties of the spatial field in a manner that accurately captures the autocorrelation of the data without overfitting to the noise. ...
Article
Statistical models use observations of animals to make inferences about the abundance and distribution of species. However, the spatial distribution of animals is a complex function of many factors, including landscape and environmental features, and intra‐ and interspecific interactions. Modelling approaches often have to make significant simplifying assumptions about these factors, which can result in poor model performance and inaccurate predictions. Here, we explore the implications of complex spatial structure for modelling the abundance of the Serengeti wildebeest, a gregarious migratory species. The social behaviour of wildebeest leads to a highly aggregated distribution, and we examine the consequences of omitting this spatial complexity when modelling species abundance. To account for this distribution, we introduce a multi‐latent framework that uses two random fields to capture the clustered distribution of wildebeest. Our results show that simplifying assumptions that are often made in spatial models can dramatically impair performance. However, by allowing for mixtures of spatial models accurate predictions can be made. Furthermore, there can be a non‐monotonic relationship between model complexity and model performance; complex, flexible models that rely on unfounded assumptions can potentially make highly inaccurate predictions, whereas simpler more traditional approaches involve fewer assumptions and are less sensitive to these issues. We demonstrate how to develop flexible spatial models that can accommodate the complex processes driving animal distributions. Our findings highlight the importance of robust model checking protocols, and we illustrate how realistic assumptions can be incorporated into models using random fields.
... It should be noted that the stationarity assumption may be required to hold even for interpolation problems. For example, both MVAR models and Gaussian processes with a Gaussian kernel require the assumption of a stationary covariance function to be satisfied (see, e.g., the discussion in Rasmussen [90] and Cheng et al. [91]). Third, when dealing with real-world time series, such as financial time series, the number of available snapshots (even at the intraday frequency of trading) is limited in contrast to the size of temporal data that one can produce by model simulations. ...
Article
We propose a three-tier numerical framework based on nonlinear manifold learning for the forecasting of high-dimensional time series, relaxing the “curse of dimensionality” related to the training phase of surrogate/machine learning models. At the first step, we embed the high-dimensional time series into a reduced low-dimensional space using nonlinear manifold learning (local linear embedding and parsimonious diffusion maps). Then, we construct reduced-order surrogate models on the manifold (here, for our illustrations, we used multivariate autoregressive and Gaussian process regression models) to forecast the embedded dynamics. Finally, we solve the pre-image problem, thus lifting the embedded time series back to the original high-dimensional space using radial basis function interpolation and geometric harmonics. The proposed numerical data-driven scheme can also be applied as a reduced-order model procedure for the numerical solution/propagation of the (transient) dynamics of partial differential equations (PDEs). We assess the performance of the proposed scheme via three different families of problems: (a) the forecasting of synthetic time series generated by three simplistic linear and weakly nonlinear stochastic models resembling electroencephalography signals, (b) the prediction/propagation of the solution profiles of a linear parabolic PDE and the Brusselator model (a set of two nonlinear parabolic PDEs), and (c) the forecasting of a real-world data set containing daily time series of ten key foreign exchange rates spanning the time period 3 September 2001–29 October 2020.
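Stationarity means that the covariance depends on the inputs only through their difference, k(x, x') = k(x − x'). The standard-library Python sketch below runs a quick numerical check that contrasts the Gaussian kernel with a linear (dot-product) kernel, which is not stationary. The helper names are hypothetical, and a check on finitely many sample pairs can refute shift invariance but cannot prove it.

```python
import math


def gaussian_kernel(x, y, lengthscale=1.0):
    """Stationary: depends on x and y only through x - y."""
    return math.exp(-0.5 * ((x - y) / lengthscale) ** 2)


def linear_kernel(x, y):
    """Non-stationary: depends on the absolute positions of x and y."""
    return x * y


def is_shift_invariant(kernel, pairs, shift, tol=1e-12):
    """Check k(x + h, y + h) == k(x, y) on a few sample pairs."""
    return all(abs(kernel(x + shift, y + shift) - kernel(x, y)) < tol for x, y in pairs)


pairs = [(0.0, 1.0), (2.0, 2.5), (-3.0, 1.0)]
print(is_shift_invariant(gaussian_kernel, pairs, shift=10.0))  # True
print(is_shift_invariant(linear_kernel, pairs, shift=10.0))    # False
```

A Gaussian-kernel GP therefore assumes that correlations look the same everywhere along the time axis. A series whose correlation structure drifts over time violates that assumption, even when the task is only interpolation.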
... 1. the objective function (the expectation in eq. (1)) is modelled with a Gaussian process (GP) (Williams & Rasmussen, 2006); 2. each new evaluation of the objective function is incorporated via a Bayesian update procedure; and 3. an acquisition function is used to determine the next high-utility point at which to evaluate the objective. ...
Preprint
Full-text available
In this paper we explore cyber security defence through the unification of a novel cyber security simulator with models for (causal) decision-making through optimisation. Particular attention is paid to a recently published approach: dynamic causal Bayesian optimisation (DCBO). We propose that DCBO can act as a blue agent when provided with a view of a simulated network and a causal model of how a red agent spreads within that network. We investigate how DCBO can perform optimal interventions on host nodes in order to reduce the cost of intrusions caused by the red agent. Through this we demonstrate a complete cyber-simulation system, which we use to generate observational data for DCBO, and provide quantitative numerical results that lay the foundations for future work in this space.
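The three steps listed in this entry's snippet can be illustrated with a deliberately small NumPy sketch of Bayesian optimisation on a 1-D grid. It uses an RBF-kernel GP surrogate and an upper-confidence-bound (UCB) acquisition function. The UCB choice, the grid search over the acquisition, the toy objective, and all hyperparameter values are assumptions for illustration. This is plain Bayesian optimisation, not the DCBO method discussed in the paper.

```python
import numpy as np


def rbf(a, b, lengthscale=0.2):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / lengthscale**2)


def gp_posterior(x_obs, y_obs, x_grid, noise=1e-6):
    """Step 2: condition the GP prior on all evaluations made so far."""
    K = rbf(x_obs, x_obs) + noise * np.eye(len(x_obs))
    K_s = rbf(x_obs, x_grid)
    mean = K_s.T @ np.linalg.solve(K, y_obs)
    var = 1.0 - np.sum(K_s * np.linalg.solve(K, K_s), axis=0)
    return mean, np.sqrt(np.clip(var, 0.0, None))


def bayes_opt(objective, n_iter=15, kappa=2.0, seed=0):
    """Maximise a black-box 1-D objective on [0, 1]."""
    rng = np.random.default_rng(seed)
    x_grid = np.linspace(0.0, 1.0, 201)
    x_obs = rng.uniform(0.0, 1.0, size=2)              # initial design
    y_obs = objective(x_obs)
    for _ in range(n_iter):
        mean, std = gp_posterior(x_obs, y_obs, x_grid)  # Steps 1-2: GP surrogate
        ucb = mean + kappa * std                        # Step 3: acquisition function
        x_next = x_grid[np.argmax(ucb)]
        x_obs = np.append(x_obs, x_next)
        y_obs = np.append(y_obs, objective(np.array([x_next])))
    best = np.argmax(y_obs)
    return x_obs[best], y_obs[best]


def toy_objective(x):
    # Hypothetical black-box function with its maximum at x = 0.3.
    return -((x - 0.3) ** 2)


x_best, y_best = bayes_opt(toy_objective)
print(x_best, y_best)
```

The `kappa` parameter trades exploration (large posterior standard deviation) against exploitation (high posterior mean). Other acquisition functions, such as expected improvement, slot into the same loop.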
... Therefore, a GP should be able to represent functions of arbitrary margin that behave arbitrarily badly away from the training data. Since GP classification is effective in practice (Rasmussen & Williams, 2005), such poorly behaved functions must not be selected for by GP inference. To test whether this is essentially the same behavior as that observed in § 4 and § 5, one needs to build a model of GP classification that explicitly involves a normalized margin parameter. ...
Conference Paper
Weight norm ‖w‖ and margin γ participate in learning theory via the normalized margin γ/‖w‖. Since standard neural net optimizers do not control normalized margin, it is hard to test whether this quantity causally relates to generalization. This paper designs a series of experimental studies that explicitly control normalized margin and thereby tackle two central questions. First: does normalized margin always have a causal effect on generalization? The paper finds that it does not: networks can be produced where normalized margin has seemingly no relationship with generalization, counter to the theory of Bartlett et al. (2017). Second: does normalized margin ever have a causal effect on generalization? The paper finds that it does: in a standard training setup, test performance closely tracks normalized margin. The paper suggests a Gaussian process model as a promising explanation for this behavior.
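This plain-Python sketch shows why γ/‖w‖, rather than γ alone, is the quantity of interest. It computes the normalized margin of a linear classifier. Scaling w inflates the raw margin γ but leaves γ/‖w‖ unchanged. The bias-free linear setting and the function name are simplifying assumptions. For deep networks, the normalization in bounds such as those of Bartlett et al. (2017) involves norms of every layer rather than a single weight vector.

```python
import math


def normalized_margin(w, X, y):
    """min_i y_i <w, x_i> / ||w|| for a linear classifier without bias.

    Labels y_i are in {-1, +1}; a negative value means some point is misclassified.
    """
    norm = math.sqrt(sum(wj * wj for wj in w))
    margins = [yi * sum(wj * xj for wj, xj in zip(w, xi)) for xi, yi in zip(X, y)]
    return min(margins) / norm


X = [(1.0, 2.0), (2.0, 0.5), (-1.0, -1.0)]
y = [1, 1, -1]
w = (1.0, 1.0)
print(normalized_margin(w, X, y))                     # 2 / sqrt(2), about 1.414
print(normalized_margin([10 * c for c in w], X, y))  # same value (up to rounding)
```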
... Indeed, given a GP $Z = (Z_s)_{s \in D}$ on some domain $D$ and a set of points $s_1, \dots, s_n \in D$, the distribution of $Z$ conditionally on $Z_{s_1}, \dots, Z_{s_n}$ is again Gaussian, with mean and covariance functions that can be computed in closed form; see, e.g., [35]. ...
Preprint
Full-text available
Gaussian processes appear as building blocks in various stochastic models and have been found instrumental in accounting for imprecisely known, latent functions. It is often the case that such functions may be directly or indirectly evaluated, be it in static or in sequential settings. Here we focus on situations where, rather than pointwise evaluations, evaluations of prescribed linear operators at the function of interest are (sequentially) assimilated. Although operator data are increasingly encountered in the practice of Gaussian process modelling, the mathematical details of conditioning and model updating in such settings are typically bypassed. Here we address these questions by highlighting conditions under which Gaussian process modelling coincides with endowing separable Banach spaces of functions with Gaussian measures, and by leveraging existing results on the disintegration of such measures with respect to operator data. Using recent results on path properties of GPs and their connection to RKHS, we extend the Gaussian process–Gaussian measure correspondence beyond the standard setting of Gaussian random elements in the Banach space of continuous functions. Turning then to the sequential settings, we revisit update formulae in the Gaussian measure framework and establish equalities between final and intermediate posterior mean functions and covariance operators. The latter equalities appear as infinite-dimensional and discretization-independent analogues of Gaussian vector update formulae.
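In the pointwise case that this entry's snippet mentions, the closed-form conditional mean and covariance follow from the standard Gaussian conditioning identities. Let $m$ and $k$ be the prior mean and covariance functions of $Z$. Let $\mathbf{Z}_n = (Z_{s_1}, \dots, Z_{s_n})^\top$ and $\mathbf{m}_n = (m(s_1), \dots, m(s_n))^\top$. Let $K_n = [k(s_i, s_j)]_{i,j=1}^{n}$, assumed invertible, and $\mathbf{k}_n(s) = (k(s, s_1), \dots, k(s, s_n))^\top$. Then:

$$
\begin{aligned}
m_n(s) &= m(s) + \mathbf{k}_n(s)^\top K_n^{-1} \left(\mathbf{Z}_n - \mathbf{m}_n\right), \\
k_n(s, s') &= k(s, s') - \mathbf{k}_n(s)^\top K_n^{-1} \mathbf{k}_n(s').
\end{aligned}
$$

Conditionally on $\mathbf{Z}_n$, $Z$ is therefore a GP with mean $m_n$ and covariance $k_n$. The paper's operator-data setting replaces the pointwise evaluations $Z \mapsto Z_{s_i}$ with evaluations of prescribed linear operators.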
... Gaussian processes, also known as Kriging, are infinite-dimensional generalizations of multivariate Gaussian distributions [18]. Their predictions follow Gaussian distributions and thus include an estimate and a variance. Formally, a Gaussian process is defined by its mean function $m(x) := \mathbb{E}[f(x)]$ and covariance function $k(x, x') := \mathbb{E}[(y(x) - m(x))(y(x') - m(x'))^T]$, where $y$ is an observation from $f(x)$. ...
Preprint
Trained ML models are commonly embedded in optimization problems. In many cases, this leads to large-scale NLPs that are difficult to solve to global optimality. While ML models frequently lead to large problems, they also exhibit homogeneous structures and repeating patterns (e.g., layers in ANNs). Thus, specialized solution strategies can be used for large problem classes. Recently, there have been some promising works proposing specialized reformulations using mixed-integer programming or reduced space formulations. However, further work is needed to develop more efficient solution approaches and keep up with the rapid development of new ML model architectures.
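Restricted to finitely many inputs, a GP is just a multivariate Gaussian with mean vector $m(x_i)$ and covariance matrix $k(x_i, x_j)$. The mean/covariance definition in this entry's snippet therefore translates directly into a sampling routine. The NumPy sketch below draws prior sample paths via a Cholesky factorization. The linear mean function, squared-exponential covariance, lengthscale, and jitter value are assumptions chosen for illustration.

```python
import numpy as np


def mean_fn(x):
    # Hypothetical prior mean function m(x).
    return 0.5 * x


def cov_fn(a, b, lengthscale=0.5):
    # Squared-exponential covariance function k(x, x').
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / lengthscale**2)


def sample_gp_prior(x, n_samples=3, jitter=1e-6, seed=0):
    """At finitely many inputs, the GP is N(m(x), K(x, x))."""
    m = mean_fn(x)
    K = cov_fn(x, x) + jitter * np.eye(len(x))   # jitter keeps the Cholesky step stable
    L = np.linalg.cholesky(K)                     # K = L L^T
    z = np.random.default_rng(seed).standard_normal((len(x), n_samples))
    return m[:, None] + L @ z                     # each column is one sample path


x = np.linspace(0.0, 2.0, 50)
samples = sample_gp_prior(x)
print(samples.shape)  # (50, 3)
```

Because L z has covariance L Lᵀ = K when z is standard normal, each column has exactly the prescribed mean and covariance. Averaging many sample paths recovers m(x).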
... Let f t+1 = f ( t+1 ) . Then where, and The resulting probabilistic GP surrogate for the complete parameter space is given by Rasmussen and Williams (2006) as: ...
Article
Full-text available
Constitutive modeling of the meniscus is critical in areas like knee surgery and tissue engineering. At low strain rates, the meniscus can be described using a hyperelastic model. Calibration of hyperelastic material models of the meniscus is challenging on many fronts due to material variability and friction. In this study, we present a framework to determine the hyperelastic material parameters of porcine meniscus (and similar soft tissues) using no-slip uniaxial compression experiments. Because of the nonhomogeneous deformation in the specimens, a finite element solution is required at each step of the iterative calibration process. We employ a Bayesian calibration approach to account for the inherent material variability and a Bayesian optimization approach to minimize the resulting cost function in the material parameter space. Cylindrical specimens of porcine meniscus from the anterior, middle and posterior regions are tested up to 30% compressive strain and the Yeoh form of hyperelastic strain energy density function is used to describe the material response. The results show that the Yeoh form is able to accurately describe the compressive response of porcine meniscus and that the Bayesian calibration and optimization approaches are able to calibrate the model in a computationally efficient manner while taking into account the inherent material variability. The results also show that the shear modulus or the initial stiffness is roughly uniform across the different areas of the meniscus, but there is significant spatial heterogeneity in the response at high strains. In particular, the middle region is considerably stiffer at high strains. This heterogeneity is important to consider in modeling the response of the meniscus for clinical applications.
... These terms are referred to as the 'hyperparameters' of the GP so as to differentiate this process from fitting a parametric function to the data. For further information on the quasi-periodic kernel GP and on GPs in general, we refer the reader to Rasmussen & Williams (2006) and Roberts et al. (2013). The QP GP used in this work is implemented using the G python package (Ambikasaran et al. 2015) by combining the E -S 2K and E S K built-in kernel functions. ...
Preprint
Full-text available
In recent years, Gaussian Process (GP) regression has become widely used to analyse stellar and exoplanet time-series data sets. For spotted stars, the most popular GP covariance function is the quasi-periodic (QP) kernel, whose hyperparameters have a plausible interpretation in terms of physical properties of the star and spots. In this paper, we test the reliability of this interpretation by modelling data simulated with a spot model using a QP GP and the recently proposed quasi-periodic plus cosine (QPC) GP, and comparing the posterior distributions of the GP hyperparameters to the input parameters of the spot model. We find excellent agreement between the input stellar rotation period and the QP and QPC GP period, and very good agreement between the spot decay timescale and the length scale of the squared exponential term. We also compare the hyperparameters derived from light and radial velocity (RV) curves for a given star, finding that the period and evolution timescales are in good agreement. However, the harmonic complexity of the GP, while displaying no clear correlation with the spot properties in our simulations, is systematically higher for the RV than for the light curve data. Finally, for the QP kernel, we investigate the impact of noise and time sampling on the hyperparameters in the case of RVs. Our results indicate that good coverage of the rotation period and spot evolution timescales is more important than the total number of points, and that noise characteristics govern the harmonic complexity.
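The QP kernel discussed in this entry is commonly written as $k(\tau) = h^2 \exp\left(-\frac{\tau^2}{2\lambda^2} - \Gamma \sin^2\frac{\pi \tau}{P}\right)$. This is the product of a squared-exponential term and a periodic term, which matches the snippet's description of combining two built-in kernels. Parameterizations differ between papers and packages. The NumPy sketch below, with arbitrary hyperparameter values, is therefore an illustration rather than the authors' implementation.

```python
import numpy as np


def quasi_periodic(t1, t2, amp=1.0, evol=20.0, gamma=2.0, period=5.0):
    """Quasi-periodic kernel: squared-exponential decay times a periodic term.

    amp:    amplitude h
    evol:   evolution timescale lambda (length scale of the squared-exponential term)
    gamma:  harmonic complexity Gamma (larger -> more structure within each period)
    period: rotation period P
    """
    tau = t1[:, None] - t2[None, :]
    decay = np.exp(-tau**2 / (2.0 * evol**2))
    periodic = np.exp(-gamma * np.sin(np.pi * tau / period) ** 2)
    return amp**2 * decay * periodic


t = np.array([0.0, 5.0, 2.5, 50.0])   # 0, one period, half a period, ten periods
K = quasi_periodic(t, t)
print(np.round(K[0], 3))
```

With these values, points one period apart stay strongly correlated (about 0.97), whereas points half a period apart are only weakly correlated. Points ten periods apart lose most of their correlation because the squared-exponential term decays over the evolution timescale. This is why the period and evolution timescale map naturally onto rotation and spot lifetime.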
Thesis
Models with Gaussian process priors and Gaussian likelihoods are one of only a handful of Bayesian models where inference can be performed without the need for approximation. However, a frequent criticism of these models from practitioners of Bayesian machine learning is that they are challenging to scale to large datasets due to the need to compute a large kernel matrix and perform standard linear-algebraic operations with this matrix. This limitation has driven decades of research in both statistics and machine learning seeking to scale Gaussian process regression models to ever-larger datasets. This thesis builds on this line of research. We focus on the problem of approximate inference and model selection with approximate maximum marginal likelihood as applied to Gaussian process regression. Our discussion is guided by three questions: Does an approximation work on a range of models and datasets? Can you verify that an approximation has worked on a given dataset? Is an approximation easy for a practitioner to use? While we are far from the first to ask these questions, we offer new insights into each question in the context of Gaussian process regression. In the first part of this thesis, we focus on sparse variational Gaussian process regression (Titsias, 2009). We provide new diagnostics for inference with this method that can be used as practical guides for practitioners trying to balance computation and accuracy with this approximation. We then provide an asymptotic analysis that highlights properties of the model and dataset that are sufficient for this approximation to perform reliable inference with a small computational cost. This analysis builds on an approach laid out in Burt (2018), as well as on similar guarantees in the kernel ridge regression literature. In the second part of this thesis, we consider iterative methods, especially the method of conjugate gradients, as applied to Gaussian process regression (Gibbs and MacKay, 1997). 
We primarily focus on improving the reliability of approximate maximum marginal likelihood when using these approximations. We investigate how the method of conjugate gradients and related approaches can be used to derive bounds on quantities related to the log marginal likelihood. This idea can be used to improve the speed and stability of model selection with these approaches, making them easier to use in practice.
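The core computation behind the iterative approach mentioned in this thesis abstract is solving $(K + \sigma^2 I)\alpha = y$, whose solution gives the posterior mean $K_*^\top \alpha$, using only matrix-vector products. The sketch below is a textbook, unpreconditioned conjugate-gradient solver in NumPy, checked against a direct solve on synthetic data. The data, kernel, lengthscale, and tolerances are assumptions. The sketch omits the marginal-likelihood bounds and diagnostics that the thesis itself studies.

```python
import numpy as np


def conjugate_gradients(matvec, b, tol=1e-10, max_iter=None):
    """Solve A x = b for symmetric positive-definite A, given only x -> A x."""
    if max_iter is None:
        max_iter = 10 * len(b)
    x = np.zeros_like(b)
    b_norm = np.linalg.norm(b)
    if b_norm == 0.0:
        return x
    r = b - matvec(x)          # residual
    p = r.copy()               # search direction
    rs_old = r @ r
    for _ in range(max_iter):
        Ap = matvec(p)
        alpha = rs_old / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol * b_norm:
            break
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x


rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0.0, 5.0, 200))
y = np.sin(X) + 0.1 * rng.standard_normal(200)
K = np.exp(-0.5 * (X[:, None] - X[None, :]) ** 2)   # RBF kernel, lengthscale 1
noise_var = 0.1**2
A = K + noise_var * np.eye(len(X))

alpha_cg = conjugate_gradients(lambda v: A @ v, y)
alpha_direct = np.linalg.solve(A, y)
print(np.max(np.abs(alpha_cg - alpha_direct)))

# Posterior mean at two test inputs, using the CG solution.
x_test = np.array([1.0, 2.5])
K_star = np.exp(-0.5 * (x_test[:, None] - X[None, :]) ** 2)
print(K_star @ alpha_cg)
```

Only `matvec` touches the kernel matrix, so the same routine works when $K$ is never stored explicitly, for example when products are computed in blocks. This is what makes CG attractive for large $n$.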
Article
Using Bayesian inference, we determine probabilistic constraints on the parameters describing the fluctuating structure of protons at high energy. We employ the color glass condensate framework supplemented with a model for the spatial structure of the proton, along with experimental data from the ZEUS and H1 Collaborations on coherent and incoherent diffractive J/ψ production in e+p collisions at HERA. This data is found to constrain most model parameters well. This work sets the stage for future global analyses, including experimental data from e+p, p+p, and p+A collisions, to constrain the fluctuating structure of nucleons along with properties of the final state.
Article
Full-text available
Background and aims – Climatic fluctuations during the Pleistocene altered the distribution of many species and even entire biomes, allowing some species to increase their range while others underwent reductions. Recent and ongoing anthropogenic climate change is altering climatic patterns very rapidly and is likely to impact species’ distributions over shorter timescales than previous natural fluctuations. Therefore, we aimed to understand how Pleistocene and Holocene climatic fluctuations might have shaped the current distribution of Holoregmia and explore its expected distribution under future climate scenarios. Material and methods – We modelled the potential distribution of Holoregmia viscida (Martyniaceae), a monospecific plant genus endemic to the semi-arid Caatinga Domain in Brazil. We used an ensemble approach to model suitable areas for Holoregmia under present conditions, paleoclimatic scenarios, and global warming scenarios in 2050 and 2090. Key results – Holocene climates in most of the Caatinga were too humid for Holoregmia, which restricted its suitable areas to the southern Caatinga, similar to its current distribution. However, under global warming scenarios, the Caatinga is expected to become too dry for this lineage, resulting in a steady decline in the area suitable for Holoregmia and even its possible extinction under the most pessimistic scenario modelled. Conclusion – The predicted extinction of the ancient and highly specialized Holoregmia viscida highlights the possible consequences of climate change for some species of endemic Caatinga flora. Invaluable phylogenetic diversity may be lost in the coming decades, representing millions of years of unique evolutionary history and consequent loss of evolutionary potential to adapt to future environmental changes in semi-arid environments.
Preprint
Full-text available
In this paper, we present a data-driven framework to optimize the out-of-plane stiffness of soft grippers to achieve hard-to-twist and easy-to-bend mechanical properties. The effectiveness of this method is demonstrated in the design of a soft pneumatic bending actuator (SPBA). First, a new objective function is defined to quantitatively evaluate the out-of-plane stiffness as well as the bending performance. Then, sensitivity analysis is conducted on the parametric model of an SPBA design to determine the optimized design parameters with the help of Finite Element Analysis (FEA). To enable numerical optimization, a data-driven approach is employed to learn a cost function that directly represents the out-of-plane stiffness as a differentiable function of the design variables. A gradient-based method is used to maximize the out-of-plane stiffness of the SPBA while ensuring specific bending performance. The effectiveness of our method has been demonstrated in physical experiments conducted on 3D-printed grippers.
Article
Machine learning (ML) has emerged as a formidable force for identifying hidden but pertinent patterns within a given data set with the objective of subsequently generating automated predictive behavior. In recent years, it is safe to conclude that ML and its close cousin, deep learning (DL), have ushered in unprecedented developments in all areas of physical sciences, especially chemistry. Not only classical variants of ML but also those trainable on near-term quantum hardware have been developed with promising outcomes. Such algorithms have revolutionized materials design and the performance of photovoltaics, electronic structure calculations of ground and excited states of correlated matter, computation of force fields and potential energy surfaces informing chemical reaction dynamics, reactivity-inspired rational strategies for drug design, and even classification of phases of matter with accurate identification of emergent criticality. In this review we shall explicate a subset of such topics and delineate the contributions made by both classical and quantum-computing-enhanced machine learning algorithms over the past few years. We shall not only present a brief overview of the well-known techniques but also highlight their learning strategies using statistical physical insight. The objective of the review is not only to foster exposition of the aforesaid techniques but also to empower and promote cross-pollination among future research in all areas of chemistry that can benefit from ML and in turn can potentially accelerate the growth of such algorithms.