About
24 Publications
2,784 Reads
151 Citations (since 2017)
Introduction
I'm a postdoctoral researcher at Google Brain Berlin (powered by Adecco). I'm interested in Bayesian deep learning, approximate inference and probabilistic models.
Publications (24)
Machine learning models based on the aggregated outputs of submodels, either at the activation or prediction levels, lead to strong performance. We study the interplay of two popular classes of such models: ensembles of neural networks and sparse mixture of experts (sparse MoEs). First, we show that these two approaches have complementary features...
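The two building blocks in this entry lend themselves to a small illustration: a sparse mixture-of-experts layer that routes each input to its top-k experts, and an ensemble that averages the predictions of several independently initialized models. The sketch below is a minimal numpy toy, not the paper's implementation; the function name `top_k_moe`, the layer sizes, and the use of plain linear experts are assumptions made for the example.

```python
import numpy as np

def top_k_moe(x, gate_w, expert_ws, k=2):
    """Sparse mixture-of-experts layer: route each input to its top-k experts.

    x:          (batch, d_in) inputs
    gate_w:     (d_in, n_experts) gating weights
    expert_ws:  list of (d_in, d_out) expert weight matrices
    """
    logits = x @ gate_w                                    # (batch, n_experts)
    top = np.argsort(logits, axis=1)[:, -k:]               # indices of the k largest gates
    out = np.zeros((x.shape[0], expert_ws[0].shape[1]))
    for i in range(x.shape[0]):
        sel = top[i]
        weights = np.exp(logits[i, sel])
        weights /= weights.sum()                           # softmax over the selected experts only
        for w, e in zip(weights, sel):
            out[i] += w * (x[i] @ expert_ws[e])
    return out

rng = np.random.default_rng(0)
d_in, d_out, n_experts, n_models = 8, 4, 6, 3
x = rng.normal(size=(5, d_in))

# An "ensemble of sparse MoEs": independently initialized models whose outputs are averaged.
ensemble_outputs = []
for _ in range(n_models):
    gate_w = rng.normal(size=(d_in, n_experts))
    expert_ws = [rng.normal(size=(d_in, d_out)) for _ in range(n_experts)]
    ensemble_outputs.append(top_k_moe(x, gate_w, expert_ws))
avg_prediction = np.mean(ensemble_outputs, axis=0)         # prediction-level aggregation
```

Prediction-level aggregation here simply means averaging the model outputs; activation-level aggregation, also mentioned in the abstract, would instead combine expert outputs inside each model before the final layer.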
Uncertainty estimation in deep learning has recently emerged as a crucial area of interest to advance reliability and robustness in safety-critical applications. While there have been many proposed methods that either focus on distance-aware model uncertainties for out-of-distribution detection or on input-dependent label uncertainties for in-distr...
High-quality estimates of uncertainty and robustness are crucial for numerous real-world applications, especially for deep learning which underlies many deployed ML systems. The ability to compare techniques for improving these estimates is therefore very important for research and practice alike. Yet, competitive comparisons of methods are often l...
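Comparisons of uncertainty methods of the kind described above typically report calibration metrics alongside accuracy. As one concrete illustration, the snippet below computes the expected calibration error, a metric widely used for such comparisons; it is a generic sketch, not the benchmark code accompanying the paper.

```python
import numpy as np

def expected_calibration_error(probs, labels, n_bins=10):
    """Expected calibration error from predicted class probabilities.

    probs:  (n, n_classes) predicted probabilities
    labels: (n,) integer class labels
    """
    conf = probs.max(axis=1)                    # confidence of the predicted class
    pred = probs.argmax(axis=1)
    correct = (pred == labels).astype(float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (conf > lo) & (conf <= hi)
        if mask.any():
            ece += mask.mean() * abs(correct[mask].mean() - conf[mask].mean())
    return ece

probs = np.array([[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]])
labels = np.array([0, 1, 1])
print(expected_calibration_error(probs, labels))
```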
Ensembles over neural network weights trained from different random initialization, known as deep ensembles, achieve state-of-the-art accuracy and calibration. The recently introduced batch ensembles provide a drop-in replacement that is more parameter efficient. In this paper, we design ensembles not only over weights, but over hyperparameters to...
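The ensemble-over-hyperparameters idea is straightforward to sketch: train members that differ both in random seed and in a hyperparameter, then average their predictive probabilities. The snippet below is a hypothetical scikit-learn illustration rather than the paper's setup; the choice of MLP, the L2-penalty grid, and the dataset are placeholder assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=500, random_state=0)

# A plain deep ensemble varies only the random seed; here we also vary a
# hyperparameter (the L2 penalty), so members differ in both weights and hyperparameters.
members = []
for alpha in [1e-4, 1e-3, 1e-2]:           # hyperparameter axis (illustrative values)
    for seed in range(2):                   # random-initialization axis
        m = MLPClassifier(hidden_layer_sizes=(32,), alpha=alpha,
                          max_iter=500, random_state=seed).fit(X, y)
        members.append(m)

# Prediction-level aggregation: average the predictive probabilities.
probs = np.mean([m.predict_proba(X) for m in members], axis=0)
```

Varying both axes, rather than the seed alone, is the point of the construction described in the abstract above.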
We propose automated augmented conjugate inference, a new inference method for non-conjugate Gaussian processes (GP) models. Our method automatically constructs an auxiliary variable augmentation that renders the GP model conditionally conjugate. Building on the conjugate structure of the augmented model, we develop two inference methods. First, a...
During the past five years the Bayesian deep learning community has developed increasingly accurate and efficient approximate inference procedures that allow for Bayesian inference in deep neural networks. However, despite this algorithmic progress and the promise of improved uncertainty quantification and sample efficiency there are---as of early...
We propose a new scalable multi-class Gaussian process classification approach building on a novel modified softmax likelihood function. The new likelihood has two benefits: it leads to well-calibrated uncertainty estimates and allows for an efficient latent variable augmentation. The augmented model has the advantage that it is conditionally conju...
We propose a new scalable multi-class Gaussian process classification approach building on a novel modified softmax likelihood function. The new likelihood has two benefits: it leads to well-calibrated uncertainty estimates and allows for an efficient latent variable augmentation. The augmented model has the advantage that it is conditionally conju...
Normalizing flows provide a general approach to construct flexible variational posteriors. The parameters are learned by stochastic optimization of the variational bound, but inference can be slow due to high variance of the gradient estimator. We propose Quasi-Monte Carlo (QMC) flows which reduce the variance of the gradient estimator by one order...
We present AugmentedGaussianProcesses.jl, a software package for augmented stochastic variational inference (ASVI) for Gaussian process models with non-conjugate likelihood functions. The idea of ASVI is to find an augmentation of the original GP model which renders the model conditionally conjugate and perform inference in the augmented model. We...
Many machine learning problems involve Monte Carlo gradient estimators. As a prominent example, we focus on Monte Carlo variational inference (MCVI) in this paper. The performance of MCVI crucially depends on the variance of its stochastic gradients. We propose variance reduction by means of Quasi-Monte Carlo (QMC) sampling. QMC replaces N i.i.d. s...
Many machine learning problems involve Monte Carlo gradient estimators. As a prominent example, we focus on Monte Carlo variational inference (MCVI) in this paper. The performance of MCVI crucially depends on the variance of its stochastic gradients. We propose variance reduction by means of Quasi-Monte Carlo (QMC) sampling. QMC replaces N i.i.d....
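The variance-reduction idea in the two entries above can be demonstrated in a few lines: draw the base samples of a reparameterized Gaussian expectation from a randomized low-discrepancy sequence instead of i.i.d. uniforms and compare estimator variances. The sketch below uses SciPy's QMC module and a toy integrand; it illustrates the general mechanism, not the paper's estimator.

```python
import numpy as np
from scipy.stats import norm, qmc

# Estimate E[f(z)] for z ~ N(mu, sigma^2) via the reparameterization z = mu + sigma * Phi^{-1}(u).
mu, sigma = 0.5, 1.2
f = lambda z: z ** 2            # toy stand-in for a per-sample ELBO term

def estimate(u):
    z = mu + sigma * norm.ppf(u)
    return f(z).mean()

n, reps = 64, 200
rng = np.random.default_rng(0)

mc_estimates = [estimate(rng.random(n)) for _ in range(reps)]
qmc_estimates = [estimate(qmc.Sobol(d=1, scramble=True, seed=r).random(n).ravel())
                 for r in range(reps)]

print("MC variance :", np.var(mc_estimates))
print("QMC variance:", np.var(qmc_estimates))   # typically much smaller than the MC variance
```

In a variational-inference setting the integrand would be the per-sample ELBO gradient, but the substitution of the uniform base samples works the same way.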
Dynamic topic models (DTMs) model the evolution of prevalent themes in literature, online media, and other forms of text over time. DTMs assume that word co-occurrence statistics change continuously and therefore impose continuous stochastic process priors on their model parameters. These dynamical priors make inference much harder than in regular...
Dynamic topic models (DTMs) model the evolution of prevalent themes in literature, online media, and other forms of text over time. DTMs assume that word co-occurrence statistics change continuously and therefore impose continuous stochastic process priors on their model parameters. These dynamical priors make inference much harder than in regular...
We propose an efficient stochastic variational approach to GP classification building on Pólya-Gamma data augmentation and inducing points, which is based on closed-form updates of natural gradients. We evaluate the algorithm on real-world datasets containing up to 11 million data points and demonstrate that it is up to three orders of magnitude f...
This paper proposes a new scalable multi-class Gaussian process classification approach building on a novel modified softmax likelihood function. This form of likelihood allows for a latent variable augmentation that leads to a conditionally conjugate model and enables efficient variational inference via block coordinate ascent updates. Our experim...
We propose an efficient stochastic variational approach to Gaussian Process (GP) classification building on Pólya-Gamma data augmentation and inducing points, which is based on closed-form updates of natural gradients. We evaluate the algorithm on real-world datasets containing up to 11 million data points and demonstrate that it is up to two order...
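The conditional conjugacy exploited by the Pólya-Gamma-augmented classifiers above can be seen in a stripped-down setting. The sketch below runs closed-form variational updates for plain Bayesian logistic regression using the standard expectation E[ω | c] = tanh(c/2)/(2c); it omits the GP prior, inducing points, and natural-gradient steps described in the abstracts, so treat it as a simplified stand-in rather than the papers' algorithm.

```python
import numpy as np

def pg_logistic_regression(X, y, prior_var=10.0, n_iter=50):
    """Variational Bayes for logistic regression via Polya-Gamma augmentation.

    y must be in {0, 1}. Returns the Gaussian posterior (mean, covariance) over the weights.
    """
    n, d = X.shape
    kappa = y - 0.5                           # y_i - 1/2 from the PG identity
    Sigma = np.eye(d) * prior_var             # initial posterior covariance
    mu = np.zeros(d)
    for _ in range(n_iter):
        # c_i^2 = E[(x_i^T w)^2] under the current Gaussian posterior over w.
        c = np.sqrt(np.einsum('ij,jk,ik->i', X, Sigma, X) + (X @ mu) ** 2)
        omega = np.tanh(c / 2.0) / (2.0 * c)  # E[omega_i | c_i], closed form
        # Conditionally conjugate Gaussian update of the weight posterior.
        Sigma = np.linalg.inv(X.T @ (omega[:, None] * X) + np.eye(d) / prior_var)
        mu = Sigma @ (X.T @ kappa)
    return mu, Sigma

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
w_true = np.array([1.5, -2.0, 0.5])
y = (rng.random(200) < 1.0 / (1.0 + np.exp(-X @ w_true))).astype(float)
mu, Sigma = pg_logistic_regression(X, y)
print("posterior mean:", mu)
```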
Dynamic topic models (DTMs) model the evolution of prevalent themes in literature, online media, and other forms of text over time. DTMs assume that topics change continuously over time and therefore impose continuous stochastic process priors on their model parameters. In this paper, we extend the class of tractable priors from Wiener processes to...
Linear mixed models (LMMs) are important tools in statistical genetics. When used for feature selection, they make it possible to find a sparse set of genetic traits that best predict a continuous phenotype of interest, while simultaneously correcting for various confounding factors such as age, ethnicity and population structure. Formulated as models for lin...
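The confounder correction in such linear mixed models can be sketched directly: rotate the data by the eigenvectors of the kinship (population-structure) matrix so that the structured noise becomes diagonal, whiten, and run an ordinary sparse regression. The snippet below is a minimal numpy/scikit-learn illustration; the random kinship matrix, the fixed noise ratio `delta`, and the Lasso penalty are assumptions for the example, not the paper's estimation procedure.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, d = 150, 30

# Simulated genotypes, a sparse true effect, and structured (confounded) noise.
X = rng.normal(size=(n, d))
A = rng.normal(size=(n, n))
K = A @ A.T / n                              # stand-in for a kinship / population-structure matrix
beta_true = np.zeros(d)
beta_true[:3] = [2.0, -1.5, 1.0]
y = X @ beta_true + rng.multivariate_normal(np.zeros(n), K) + 0.1 * rng.normal(size=n)

# LMM trick: for y = X beta + g + eps with g ~ N(0, sg^2 K), rotating by the eigenvectors of K
# diagonalizes the noise covariance, so the data can be whitened and a plain Lasso applied.
S, U = np.linalg.eigh(K)
delta = 0.1                                  # assumed ratio sigma_e^2 / sigma_g^2 (normally estimated)
scale = 1.0 / np.sqrt(S + delta)
X_rot = (U.T @ X) * scale[:, None]
y_rot = (U.T @ y) * scale

beta_hat = Lasso(alpha=0.05).fit(X_rot, y_rot).coef_
print("selected features:", np.nonzero(beta_hat)[0])
```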
We propose a fast inference method for Bayesian nonlinear support vector machines that leverages stochastic variational inference and inducing points. Our experiments show that the proposed method is faster than competing Bayesian approaches and scales easily to millions of data points. It provides additional features over frequentist competitors s...
We propose a fast inference method for Bayesian nonlinear support vector machines that leverages stochastic variational inference and inducing points. Our experiments show that the proposed method is faster than competing Bayesian approaches and scales easily to millions of data points. It provides additional features over frequentist competitors s...
We develop a variational inference (VI) scheme for the recently proposed Bayesian kernel support vector machine (SVM) and a stochastic version (SVI) for the linear Bayesian SVM. We compute the SVM's posterior, paving the way to apply attractive Bayesian techniques, as we exemplify in our experiments by means of automated model selection.
Previous work on inference for dynamic mixture models has so far been directed to models that follow a simple Brownian motion diffusion over time and pursued a batch inference approach. We generalize the underlying dynamics model to follow a Gaussian process, introducing a novel class of dynamic priors for mixture models. Further, we propose a stoc...
A large class of problems in statistical genetics amounts to finding a sparse linear effect in a binary classification setup, such as finding a small set of genes that most strongly predict a disease. Very often, these signals are spurious and obfuscated by confounders such as age, ethnicity or population structure. In the probit regression model,...
Projects (5)
Develop low-variance gradient estimators to speed up stochastic optimization. In particular, we focus on making Monte Carlo-based variational inference faster.