Zoubin Ghahramani

University of Cambridge, Cambridge, England, United Kingdom

Publications (311) · 374.75 Total Impact Points

  • ABSTRACT: We present an information-theoretic framework for solving global black-box optimization problems that also have black-box constraints. Of particular interest to us is to efficiently solve problems with decoupled constraints, in which subsets of the objective and constraint functions may be evaluated independently; for example, the objective may be evaluated on a CPU while the constraints are evaluated independently on a GPU. These problems require an acquisition function that can be separated into the contributions of the individual function evaluations. We develop one such acquisition function and call it Predictive Entropy Search with Constraints (PESC). PESC is an approximation to the expected information gain criterion, and it compares favorably to alternative approaches based on improvement in several synthetic and real-world problems. In addition, we consider problems with a mix of functions that are fast and slow to evaluate. These problems require balancing the amount of time spent in the meta-computation of PESC and in the actual evaluation of the target objective. We take a bounded-rationality approach and develop a partial update for PESC which trades off accuracy against speed. We then propose a method for adaptively switching between the partial and full updates for PESC. This allows us to interpolate between versions of PESC that are efficient in terms of function evaluations and those that are efficient in terms of wall-clock time. Overall, we demonstrate that PESC is an effective algorithm that provides a promising direction towards a unified solution for constrained Bayesian optimization.
    Article · Nov 2015
  • Amar Shah · Zoubin Ghahramani
    ABSTRACT: We develop parallel predictive entropy search (PPES), a novel algorithm for Bayesian optimization of expensive black-box objective functions. At each iteration, PPES aims to select a batch of points which will maximize the information gain about the global maximizer of the objective. Well known strategies exist for suggesting a single evaluation point based on previous observations, while far fewer are known for selecting batches of points to evaluate in parallel. The few batch selection schemes that have been studied all resort to greedy methods to compute an optimal batch. To the best of our knowledge, PPES is the first non-greedy batch Bayesian optimization strategy. We demonstrate the benefit of this approach in optimization performance on both synthetic and real world applications, including problems in machine learning, rocket science and robotics.
    Article · Nov 2015
  • Roger B. Grosse · Zoubin Ghahramani · Ryan P. Adams
    ABSTRACT: Computing the marginal likelihood (ML) of a model requires marginalizing out all of the parameters and latent variables, a difficult high-dimensional summation or integration problem. To make matters worse, it is often hard to measure the accuracy of one's ML estimates. We present bidirectional Monte Carlo, a technique for obtaining accurate log-ML estimates on data simulated from a model. This method obtains stochastic lower bounds on the log-ML using annealed importance sampling or sequential Monte Carlo, and obtains stochastic upper bounds by running these same algorithms in reverse starting from an exact posterior sample. The true value can be sandwiched between these two stochastic bounds with high probability. Using the ground truth log-ML estimates obtained from our method, we quantitatively evaluate a wide variety of existing ML estimators on several latent variable models: clustering, a low rank approximation, and a binary attributes model. These experiments yield insights into how to accurately estimate marginal likelihoods.
    Article · Nov 2015
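The stochastic-lower-bound half of this construction can be illustrated on a toy model where the true log marginal likelihood is known in closed form. The sketch below runs annealed importance sampling on a one-dimensional Gaussian prior/likelihood pair; the model, temperature schedule, and Metropolis step sizes are illustrative choices, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model (illustrative assumption, not from the paper):
#   prior      p0(x) = N(x; 0, 1)
#   likelihood L(x)  = N(0; x, 1)   (a single observation y = 0)
#   marginal likelihood Z = N(0; 0, 2), so log Z = -0.5 * log(4*pi)
log_Z_true = -0.5 * np.log(4 * np.pi)

def log_prior(x): return -0.5 * x**2 - 0.5 * np.log(2 * np.pi)
def log_lik(x):   return -0.5 * x**2 - 0.5 * np.log(2 * np.pi)

def ais_log_weight(n_temps=200, n_mh=2, step=0.5):
    """One AIS chain: anneal from the prior to the posterior, accumulating
    the log importance weight sum_t (beta_t - beta_{t-1}) * log L(x)."""
    betas = np.linspace(0.0, 1.0, n_temps)
    x = rng.normal()                  # exact sample from p0
    log_w = 0.0
    for t in range(1, n_temps):
        log_w += (betas[t] - betas[t - 1]) * log_lik(x)
        for _ in range(n_mh):         # Metropolis moves targeting p0 * L^beta_t
            prop = x + step * rng.normal()
            log_a = (log_prior(prop) + betas[t] * log_lik(prop)
                     - log_prior(x) - betas[t] * log_lik(x))
            if np.log(rng.uniform()) < log_a:
                x = prop
    return log_w

logs = np.array([ais_log_weight() for _ in range(100)])
lower = logs.mean()  # E[log w] <= log Z: a stochastic lower bound on log Z
est = np.logaddexp.reduce(logs) - np.log(len(logs))  # log of the mean weight
```

Running the same kind of chain in reverse from an exact posterior sample yields the matching stochastic upper bound, which is the part this toy sketch omits.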
  • Hong Ge · Yarin Gal · Zoubin Ghahramani
    ABSTRACT: Tree structures are ubiquitous in data across many domains, and many datasets are naturally modelled by unobserved tree structures. In this paper, we first review the theory of random fragmentation processes [Bertoin, 2006] and a number of existing methods for modelling trees, including the popular nested Chinese restaurant process (nCRP). We then define a general class of probability distributions over trees, the Dirichlet fragmentation process (DFP), through a novel combination of the theory of Dirichlet processes and random fragmentation processes. The DFP admits a stick-breaking construction and relates to the nCRP in the same way the Dirichlet process relates to the Chinese restaurant process. Furthermore, we develop a novel hierarchical mixture model with the DFP, and empirically compare the new model to similar models in machine learning. Experiments show the DFP mixture model to be convincingly better than existing state-of-the-art approaches for hierarchical clustering and density modelling.
    Article · Sep 2015
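For reference, the standard stick-breaking construction of the Dirichlet process, which the DFP generalises to trees, can be sketched in a few lines; the DFP itself is not implemented here:

```python
import numpy as np

rng = np.random.default_rng(0)

def stick_breaking(alpha, n_atoms):
    """Sample (truncated) mixture weights from a DP(alpha) prior via
    stick-breaking: v_k ~ Beta(1, alpha), w_k = v_k * prod_{j<k} (1 - v_j)."""
    v = rng.beta(1.0, alpha, size=n_atoms)
    # Length of stick remaining before each break: 1, (1-v_1), (1-v_1)(1-v_2), ...
    remaining = np.concatenate([[1.0], np.cumprod(1.0 - v)[:-1]])
    return v * remaining

w = stick_breaking(alpha=2.0, n_atoms=1000)
```

With a deep enough truncation, the weights are non-negative and sum to (essentially) one; smaller `alpha` concentrates mass on fewer atoms.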
  • ABSTRACT: Hidden conditional random fields (HCRFs) are discriminative latent variable models which have been shown to successfully learn the hidden structure of a given classification problem. An infinite hidden conditional random field is a hidden conditional random field with a countably infinite number of hidden states, which rids us not only of the need to specify a fixed number of available hidden states a priori, but also of the problem of overfitting. Markov chain Monte Carlo (MCMC) sampling algorithms are often employed for inference in such models. However, convergence of such algorithms is rather difficult to verify, and as the complexity of the task at hand increases, the computational cost of such algorithms often becomes prohibitive. These limitations can be overcome by variational techniques. In this paper, we present a generalized framework for infinite HCRF models and a novel variational inference approach for a model based on coupled Dirichlet process mixtures, the HCRF-DPM. We show that the variational HCRF-DPM is able to converge to a correct number of represented hidden states, and performs as well as the best parametric HCRFs (chosen via cross-validation) for the difficult tasks of recognizing instances of agreement, disagreement, and pain in audiovisual sequences.
    Article · Sep 2015 · IEEE Transactions on Pattern Analysis and Machine Intelligence
  • Sara Wade · Zoubin Ghahramani

    Dataset · Jul 2015
  • Yutian Chen · Zoubin Ghahramani
    ABSTRACT: Drawing a sample from a discrete distribution is one of the basic building blocks of Monte Carlo methods. Like other sampling algorithms, discrete sampling suffers from a high computational burden in large-scale inference problems. We study the problem of sampling a discrete random variable with a high degree of dependency that is typical in large-scale Bayesian inference and graphical models, and propose an efficient approximate solution based on subsampling. We make a novel connection between discrete sampling and multi-armed bandit problems with a finite reward population, and provide three algorithms with theoretical guarantees. Empirical evaluations show the robustness and efficiency of the approximate algorithms on both synthetic and real-world large-scale problems.
    Article · Jun 2015
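One standard way to recast discrete sampling as a maximization problem, and hence as something a bandit-style algorithm can attack with subsampled estimates, is the Gumbel-max trick, sketched below on a small categorical distribution. This is an illustration of the reduction only; the paper's bandit and subsampling machinery is not implemented here:

```python
import numpy as np

rng = np.random.default_rng(0)

# Gumbel-max trick: sampling k ~ Categorical(p) is equivalent to
# argmax_k (log p_k + g_k) with i.i.d. standard Gumbel noise g_k.
# When each log p_k is a sum over many data terms, the argmax can be
# approached with subsampled estimates of those sums.
logits = np.log(np.array([0.1, 0.2, 0.3, 0.4]))

def gumbel_max_sample(logits):
    g = rng.gumbel(size=logits.shape)  # perturb each log-probability
    return int(np.argmax(logits + g))  # the argmax is a categorical draw

draws = np.array([gumbel_max_sample(logits) for _ in range(20000)])
freqs = np.bincount(draws, minlength=4) / len(draws)
```

The empirical frequencies match the target probabilities, confirming that the perturb-and-argmax view is an exact sampler, not an approximation.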
  • Amar Shah · David A. Knowles · Zoubin Ghahramani
    ABSTRACT: Stochastic variational inference (SVI) is emerging as the most promising candidate for scaling inference in Bayesian probabilistic models to large datasets. However, the performance of these methods has been assessed primarily in the context of Bayesian topic models, particularly latent Dirichlet allocation (LDA). Deriving several new algorithms, and using synthetic, image and genomic datasets, we investigate whether the understanding gleaned from LDA applies in the setting of sparse latent factor models, specifically beta process factor analysis (BPFA). We demonstrate that the big picture is consistent: using Gibbs sampling within SVI to maintain certain posterior dependencies is extremely effective. However, we find that different posterior dependencies are important in BPFA relative to LDA. In particular, approximations able to model intra-local variable dependence perform best.
    Article · Jun 2015
  • ABSTRACT: Gaussian process (GP) models form a core part of probabilistic machine learning. Considerable research effort has been devoted to three issues with GP models: how to compute efficiently when the number of data points is large; how to approximate the posterior when the likelihood is not Gaussian; and how to estimate covariance function parameter posteriors. This paper addresses all three simultaneously, using a variational approximation to the posterior which is sparse in the support of the function but otherwise free-form. The result is a Hybrid Monte Carlo sampling scheme which allows for a non-Gaussian approximation over the function values and covariance parameters simultaneously, with efficient computations based on inducing-point sparse GPs. Code to replicate each experiment in this paper will be available shortly.
    Article · Jun 2015
  • Shixiang Gu · Zoubin Ghahramani · Richard E. Turner
    ABSTRACT: Sequential Monte Carlo (SMC), or particle filtering, is a popular class of methods for sampling from an intractable target distribution using a sequence of simpler intermediate distributions. Like other importance sampling-based methods, performance is critically dependent on the proposal distribution: a bad proposal can lead to arbitrarily inaccurate estimates of the target distribution. This paper presents a new method for automatically adapting the proposal using an approximation of the Kullback-Leibler divergence between the true posterior and the proposal distribution. The method is very flexible, applicable to any parameterised proposal distribution, and it supports online and batch variants. We use the new framework to adapt powerful proposal distributions with rich parameterisations based upon neural networks, leading to Neural Adaptive Sequential Monte Carlo (NASMC). Experiments indicate that NASMC significantly improves inference in a non-linear state space model, outperforming adaptive proposal methods including the Extended Kalman and Unscented Particle Filters. Experiments also indicate that improved inference translates into improved parameter learning when NASMC is used as a subroutine of Particle Marginal Metropolis-Hastings. Finally, we show that NASMC is able to train a neural network-based deep recurrent generative model, achieving results that compete with the state-of-the-art for polyphonic music modelling. NASMC can be seen as bridging the gap between adaptive SMC methods and recent work in scalable, black-box variational inference.
    Article · Jun 2015
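To make concrete where the proposal distribution enters SMC, here is a minimal bootstrap particle filter on a toy linear-Gaussian state space model (an illustrative model, not one of the paper's benchmarks). The bootstrap filter proposes from the transition prior; NASMC's contribution is to replace that fixed proposal with a learned neural one that also conditions on the current observation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model:  x_t = 0.9 x_{t-1} + N(0, 1),   y_t = x_t + N(0, 0.5^2)
T, N = 50, 500
x_true = np.zeros(T); y = np.zeros(T)
for t in range(1, T):
    x_true[t] = 0.9 * x_true[t - 1] + rng.normal()
    y[t] = x_true[t] + 0.5 * rng.normal()

def bootstrap_pf(y, n_particles):
    """SMC with the transition prior as the proposal; a learned proposal
    q(x_t | x_{t-1}, y_t) would reduce the variance of the weights."""
    particles = np.zeros(n_particles)
    means = []
    for t in range(1, len(y)):
        particles = 0.9 * particles + rng.normal(size=n_particles)  # propose
        logw = -0.5 * ((y[t] - particles) / 0.5) ** 2               # weight
        w = np.exp(logw - logw.max()); w /= w.sum()
        means.append(np.sum(w * particles))                         # filter mean
        idx = rng.choice(n_particles, size=n_particles, p=w)        # resample
        particles = particles[idx]
    return np.array(means)

est = bootstrap_pf(y, N)
rmse = np.sqrt(np.mean((est - x_true[1:]) ** 2))
```

Even with the naive proposal, the filtered means track the latent state to well within the observation noise; a better proposal mainly pays off when the transition prior and the posterior disagree more sharply than in this toy.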
  • Yarin Gal · Zoubin Ghahramani
    ABSTRACT: We show that a multilayer perceptron (MLP) with arbitrary depth and nonlinearities, with dropout applied after every weight layer, is mathematically equivalent to an approximation to a well-known Bayesian model. This interpretation offers an explanation of some of dropout's key properties, such as its robustness to over-fitting. Our interpretation allows us to reason about uncertainty in deep learning, and allows the introduction of the Bayesian machinery into existing deep learning frameworks in a principled way. This document is an appendix for the main paper "Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning" by Gal and Ghahramani, 2015.
    Article · Jun 2015
  • Yarin Gal · Zoubin Ghahramani
    ABSTRACT: We present an efficient Bayesian convolutional neural network (convnet). The model offers better robustness to over-fitting on small data than traditional approaches, achieved by placing a probability distribution over the convnet's kernels (also known as filters). We approximate the model's intractable posterior with Bernoulli variational distributions, which requires no additional model parameters. Our model can be implemented using existing tools in the field, by extending the recent interpretation of dropout as approximate inference in the Gaussian process to the case of Bayesian neural networks. The model achieves a considerable improvement in classification accuracy compared to previous approaches. We finish with state-of-the-art results on CIFAR-10 following our new interpretation.
    Article · Jun 2015
  • Yarin Gal · Zoubin Ghahramani
    ABSTRACT: Deep learning tools have recently gained much attention in applied machine learning. However, such tools for regression and classification do not allow us to capture model uncertainty. Bayesian models offer the ability to reason about model uncertainty, but usually come with a prohibitive computational cost. We show that dropout in multilayer perceptron models (MLPs) can be interpreted as a Bayesian approximation. We obtain results for modelling uncertainty in dropout MLPs, extracting information that has until now been thrown away by existing models. This mitigates the problem of representing uncertainty in deep learning without sacrificing computational performance or test accuracy. We perform an exploratory study of the dropout uncertainty properties. Various network architectures and non-linearities are assessed on tasks of extrapolation, interpolation, and classification. We show that model uncertainty is important for classification tasks using MNIST as an example, and use the model's uncertainty in a Bayesian pipeline, with deep reinforcement learning as a concrete example.
    Article · Jun 2015
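The practical recipe this line of work implies (keep dropout active at test time and average many stochastic forward passes) can be sketched in a few lines. The tiny MLP below uses untrained random weights purely to show the mechanics; the spread of the sampled outputs serves as the uncertainty estimate:

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny one-hidden-layer MLP with illustrative (untrained) weights.
W1 = rng.normal(size=(1, 50)); b1 = np.zeros(50)
W2 = rng.normal(size=(50, 1)); b2 = np.zeros(1)

def stochastic_forward(x, p_drop=0.5):
    """One forward pass with dropout kept ON, as in MC dropout."""
    h = np.maximum(0.0, x @ W1 + b1)           # ReLU hidden layer
    mask = rng.uniform(size=h.shape) > p_drop  # dropout stays active at test time
    h = h * mask / (1.0 - p_drop)              # inverted-dropout rescaling
    return h @ W2 + b2

x = np.array([[0.5]])
samples = np.stack([stochastic_forward(x) for _ in range(200)])
mean, std = samples.mean(), samples.std()      # predictive mean and uncertainty
```

Averaging the passes recovers the usual dropout prediction, while the sample standard deviation is the extra information that a single deterministic pass throws away.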
  • Novi Quadrianto · Zoubin Ghahramani
    ABSTRACT: Random forests work by averaging several predictions of de-correlated trees. We present a conceptually radical approach to generating a random forest: randomly sampling many trees from a prior distribution, and subsequently performing a weighted ensemble of predictive probabilities. Our approach uses priors that allow sampling of decision trees even before looking at the data, and a power likelihood that explores the space spanned by combinations of decision trees. While each tree performs Bayesian inference to compute its predictions, our aggregation procedure uses the power likelihood rather than the likelihood and is therefore, strictly speaking, not Bayesian. Nonetheless, we refer to it as a Bayesian random forest with a built-in safety: it retains good predictive performance even if the underlying probabilistic model is wrong. We demonstrate empirically that our Safe-Bayesian random forest outperforms MCMC- and SMC-based Bayesian decision trees in terms of speed and accuracy, and achieves performance competitive with entropy- or Gini-optimised random forests, yet is very simple to construct.
    Article · Jun 2015 · IEEE Transactions on Pattern Analysis and Machine Intelligence
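The aggregation step (a weighted ensemble of per-tree predictive probabilities) can be sketched as follows. The per-tree class probabilities and per-tree log power-likelihood values below are stand-in random values, not real decision trees sampled from the paper's prior:

```python
import numpy as np

rng = np.random.default_rng(0)

n_trees, n_classes = 10, 3
# Stand-in for each tree's predictive distribution p_t(y | x).
tree_probs = rng.dirichlet(np.ones(n_classes), size=n_trees)
# Stand-in for each tree's log power likelihood on the training data.
log_power_lik = rng.normal(size=n_trees)

# Normalise the power likelihoods into ensemble weights (log-sum-exp safe).
w = np.exp(log_power_lik - log_power_lik.max())
w /= w.sum()

# Weighted ensemble of predictive probabilities: still a valid distribution.
forest_prob = w @ tree_probs
```

The point of the weighting is that trees which explain the data better (higher power likelihood) dominate the ensemble, while a plain random forest would weight all trees equally.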
  • Zoubin Ghahramani
    ABSTRACT: How can a machine learn from experience? Probabilistic modelling provides a framework for understanding what learning is, and has therefore emerged as one of the principal theoretical and practical approaches for designing machines that learn from data acquired through experience. The probabilistic framework, which describes how to represent and manipulate uncertainty about models and predictions, has a central role in scientific data analysis, machine learning, robotics, cognitive science and artificial intelligence. This Review provides an introduction to this framework, and discusses some of the state-of-the-art advances in the field, namely, probabilistic programming, Bayesian optimization, data compression and automatic model discovery.
    Article · May 2015 · Nature
  • John P. Cunningham · Zoubin Ghahramani
    ABSTRACT: This is the author accepted manuscript. It is currently under indefinite embargo pending publication by MIT Press.
    Article · May 2015
  • Gintare Karolina Dziugaite · Daniel M. Roy · Zoubin Ghahramani
    ABSTRACT: We consider training a deep neural network to generate samples from an unknown distribution given i.i.d. data. We frame learning as an optimization minimizing a two-sample test statistic: informally speaking, a good generator network produces samples that cause a two-sample test to fail to reject the null hypothesis. As our two-sample test statistic, we use an unbiased estimate of the maximum mean discrepancy (MMD), which is the centerpiece of the nonparametric kernel two-sample test proposed by Gretton et al. (2012). We compare to the adversarial nets framework introduced by Goodfellow et al. (2014), in which learning is a two-player game between a generator network and an adversarial discriminator network, both trained to outwit the other. From this perspective, the MMD statistic plays the role of the discriminator. In addition to empirical comparisons, we prove bounds on the generalization error incurred by optimizing the empirical MMD.
    Article · May 2015
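The unbiased squared-MMD estimator from Gretton et al. (2012) that serves as the test statistic can be written down directly; the RBF kernel and bandwidth below are illustrative choices, and no generator network is involved in this sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

def mmd2_unbiased(X, Y, bandwidth=1.0):
    """Unbiased estimate of squared MMD with an RBF kernel: the mean of
    k(x, x') and k(y, y') over distinct pairs, minus 2 * mean of k(x, y)."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * bandwidth ** 2))
    m, n = len(X), len(Y)
    Kxx, Kyy, Kxy = k(X, X), k(Y, Y), k(X, Y)
    term_x = (Kxx.sum() - np.trace(Kxx)) / (m * (m - 1))  # drop diagonal terms
    term_y = (Kyy.sum() - np.trace(Kyy)) / (n * (n - 1))
    return term_x + term_y - 2 * Kxy.mean()

# Same distribution: estimate hovers near zero (it can go slightly negative).
same = mmd2_unbiased(rng.normal(size=(500, 2)), rng.normal(size=(500, 2)))
# Shifted distribution: estimate is clearly positive.
diff = mmd2_unbiased(rng.normal(size=(500, 2)), 3 + rng.normal(size=(500, 2)))
```

Training a generator against this statistic amounts to backpropagating through `mmd2_unbiased` with `Y` replaced by generator samples, pushing the estimate toward zero.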
  • Sara Wade · Zoubin Ghahramani
    ABSTRACT: Clustering is widely studied in statistics and machine learning, with applications in a variety of fields. As opposed to classical algorithms which return a single clustering solution, Bayesian nonparametric models provide a posterior over the entire space of partitions, allowing one to assess statistical properties, such as uncertainty on the number of clusters. However, an important problem is how to summarize the posterior; the huge dimension of partition space and difficulties in visualizing it add to this problem. In a Bayesian analysis, the posterior of a real-valued parameter of interest is often summarized by reporting a point estimate such as the posterior mean along with 95% credible intervals to characterize uncertainty. In this paper, we extend these ideas to develop appropriate point estimates and credible sets to summarize the posterior of clustering structure based on decision and information theoretic techniques.
    Article · May 2015
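One concrete instance of such a point estimate is to choose, among the sampled partitions, the one minimising the posterior expected Binder loss, computed from the posterior co-clustering matrix. The sketch below uses hand-made toy posterior draws rather than output from a real sampler, and shows only the Binder-loss case, not the full decision-theoretic machinery of the paper:

```python
import numpy as np

# Toy posterior over partitions of 6 items (illustrative draws, not from
# a real MCMC run); each row assigns a cluster label to each item.
draws = np.array([
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1, 1],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 2],
])

def coclustering(draws):
    """Posterior probability that items i and j share a cluster."""
    P = np.array([(d[:, None] == d[None, :]).astype(float) for d in draws])
    return P.mean(axis=0)

def binder_point_estimate(draws):
    """Among the sampled partitions, return the one minimising the posterior
    expected Binder loss: co-clustered pairs cost (1 - pi_ij), separated
    pairs cost pi_ij, which is |1{same} - pi_ij| summed over pairs."""
    pi = coclustering(draws)
    losses = []
    for d in draws:
        same = (d[:, None] == d[None, :]).astype(float)
        losses.append(np.abs(same - pi).sum())
    return draws[int(np.argmin(losses))]

best = binder_point_estimate(draws)
```

Here the estimate recovers the modal two-cluster partition; the paper develops this idea further, including credible sets around such point estimates.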
  • Nilesh Tripuraneni · Shane Gu · Hong Ge · Zoubin Ghahramani
    ABSTRACT: Infinite hidden Markov models (iHMMs) are an attractive, nonparametric generalization of the classical hidden Markov model which automatically infers the number of hidden states in the model. This avoids the awkward problem of model selection and provides a parameter-free solution for a wide range of applications. Using the stick-breaking construction for the hierarchical Dirichlet process (HDP), we present a scalable, truncation-free particle Gibbs sampler, leveraging ancestor sampling, to efficiently sample state trajectories for the infinite HMM. Our algorithm demonstrates state-of-the-art empirical performance and improved mixing while maintaining linear-time complexity in the number of particles in the sampler.
    Article · May 2015

Publication Stats

18k Citations
374.75 Total Impact Points

Institutions

  • 2006-2015
    • University of Cambridge
      • Department of Engineering
      Cambridge, England, United Kingdom
  • 1998-2012
    • University College London
      • Gatsby Computational Neuroscience Unit
      London, England, United Kingdom
  • 2010
    • Northeastern University
      • Department of Electrical and Computer Engineering
      Boston, MA, United States
  • 2003-2009
    • Carnegie Mellon University
      • Computer Science Department
      Pittsburgh, Pennsylvania, United States
  • 2005-2006
    • UCL Eastman Dental Institute
      London, England, United Kingdom
  • 1999-2005
    • Oxford Centre for Computational Neuroscience
      Oxford, England, United Kingdom
  • 2002
    • McMaster University
      Hamilton, Ontario, Canada
  • 1995-2002
    • University of Toronto
      • Department of Computer Science
      Toronto, Ontario, Canada
  • 1996-2000
    • Massachusetts Institute of Technology
      • Department of Brain and Cognitive Sciences
      • Center for Biological and Computational Learning
      Cambridge, Massachusetts, United States