# Aki VehtariAalto University · Department of Computer Science

Aki Vehtari

Associate professor, D.Sc. (Tech.)

## About

257

Publications

49,169

Reads

**How we measure 'reads'**

A 'read' is counted each time someone views a publication summary (such as the title, abstract, and list of authors), clicks on a figure, or views or downloads the full-text. Learn more

16,166

Citations

Introduction

See my publication list at https://users.aalto.fi/~ave/publications.html. Almost all my papers are available for download from there.

Additional affiliations

January 2010 - present

April 1997 - December 2009

**Helsinki University of Technology**

## Publications

Publications (257)

Auto-regressive moving-average (ARMA) models are ubiquitous forecasting tools. Parsimony in such models is highly valued for their interpretability and generalisation, and as such the identification of model orders remains a fundamental task. We propose a novel method of ARMA order identification through projection predictive inference. Our procedu...

Statistical models can involve implicitly defined quantities, such as solutions to nonlinear ordinary differential equations (ODEs), that unavoidably need to be numerically approximated in order to evaluate the model. The approximation error inherently biases statistical inference results, but the amount of this bias is generally unknown and often...

Variable selection, or more generally, model reduction is an important aspect of the statistical workflow aiming to provide insights from data. In this paper, we discuss and demonstrate the benefits of using a reference model in variable selection. A reference model acts as a noise-filter on the target variable by modeling its data generating mecha...

Assessing goodness of fit to a given distribution plays an important role in computational statistics. The probability integral transformation (PIT) can be used to convert the question of whether a given sample originates from a reference distribution into a problem of testing for uniformity. We present new simulation- and optimization-based method...

We describe a numerical scheme for evaluating the posterior moments of Bayesian linear regression models with partial pooling of the coefficients. The principal analytical tool of the evaluation is a change of basis from coefficient space to the space of singular vectors of the matrix of predictors. After this change of basis and an analytical inte...

Black-box variational inference (BBVI) now sees widespread use in machine learning and statistics as a fast yet flexible alternative to Markov chain Monte Carlo methods for approximate Bayesian inference. However, stochastic optimization methods for BBVI remain unreliable and require substantial expertise and hand-tuning to apply effectively. In th...

BACKGROUND
The dopamine system contributes to a multitude of functions ranging from reward and motivation to learning and movement control, making it a key component in goal-directed behavior. Altered dopaminergic function is observed in neurological and psychiatric conditions. Numerous factors have been proposed to influence dopamine function, but...

Specification of the prior distribution for a Bayesian model is a central part of the Bayesian workflow for data analysis, but it is often difficult even for statistical experts. Prior elicitation transforms domain knowledge of various kinds into well-defined prior distributions, and offers a solution to the prior specification problem, in principl...

We describe a class of algorithms for evaluating posterior moments of certain Bayesian linear regression models with a normal likelihood and a normal prior on the regression coefficients. The proposed methods can be used for hierarchical mixed effects models with partial pooling over one group of predictors, as well as random effects models with pa...

This paper considers reparameterization invariant Bayesian point estimates and credible regions of model parameters for scientific inference and communication. The effect of intrinsic loss function choice in Bayesian intrinsic estimates and regions is studied with the following findings. A particular intrinsic loss function, using Kullback-Leibler...

Given a reference model that includes all the available variables, projection predictive inference replaces its posterior with a constrained projection including only a subset of all variables. We extend projection predictive inference to enable computationally efficient variable and structure selection in models outside the exponential family. By...

BACKGROUND
The dopamine system contributes to a multitude of functions ranging from reward and motivation to learning and movement control, making it a key component in goal-directed behavior. Altered dopaminergic function is observed in neurological and psychiatric conditions. Numerous factors have been proposed to influence dopamine function, but...

We introduce Pathfinder, a variational method for approximately sampling from differentiable log densities. Starting from a random initialization, Pathfinder locates normal approximations to the target density along a quasi-Newton optimization path, with local covariance estimated using the inverse Hessian estimates produced by the optimizer. Pathf...

Determining the sensitivity of the posterior to perturbations of the prior and likelihood is an important part of the Bayesian workflow. We introduce a practical and computationally efficient sensitivity analysis approach that is applicable to a wide range of models, based on power-scaling perturbations. We suggest a diagnostic based on this that c...

We review the most important statistical ideas of the past half century, which we categorize as: counterfactual causal inference, bootstrapping and simulation-based inference, overparameterized models and regularization, Bayesian multilevel models, generic computation algorithms, adaptive decision analysis, robust inference, and exploratory data an...

Cross-validation can be used to measure a model’s predictive accuracy for the purpose of model comparison, averaging, or selection. Standard leave-one-out cross-validation (LOO-CV) requires that the observation model can be factorized into simple terms, but a lot of important models in temporal and spatial statistics do not have this property or ar...

In some scientific fields, it is common to have certain variables of interest that are of particular importance and for which there are many studies indicating a relationship with a different explanatory variable. In such cases, particularly those where no relationships are known among explanatory variables, it is worth asking under what conditions...

The increasing availability of structured but high dimensional data has opened new opportunities for optimization. One emerging and promising avenue is the exploration of unsupervised methods for projecting structured high dimensional data into low dimensional continuous representations, simplifying the optimization problem and enabling the applica...

Sea urchin (Loxechinus albus) is one of the most important benthic resource in Chile. Due to their large-scale spatial metapopulation structure, sea urchin subpopulations are interconnected by larval dispersion, so the recovery of local abundance depends on the distance and hydrodynamic characteristics of their spatial domain. Currently, this resou...

Assessing goodness of fit to a given distribution plays an important role in computational statistics. The Probability integral transformation (PIT) can be used to convert the question of whether a given sample originates from a reference distribution into a problem of testing for uniformity. We present new simulation and optimization based methods...

Adaptive importance sampling is a class of techniques for finding good proposal distributions for importance sampling. Often the proposal distributions are standard probability distributions whose parameters are adapted based on the mismatch between the current proposal and a target distribution. In this work, we present an implicit adaptive import...

We explore the limitations of and best practices for using black-box variational inference to estimate posterior summaries of the model parameters. By taking an importance sampling perspective, we are able to explain and empirically demonstrate: 1) why the intuitions about the behavior of approximate families and divergences for low-dimensional pos...

Drug development commonly studies an adult population first and then the pediatric population. The knowledge from the adult population is taken advantage of for the design of the pediatric trials. Adjusted drug doses for these are often derived from adult pharmacokinetic (PK) models which are extrapolated to patients with smaller body size. This ex...

Stacking is a widely used model averaging technique that yields asymptotically optimal prediction among all linear averages. We show that stacking is most effective when the model predictive performance is heterogeneous in inputs, so that we can further improve the stacked mixture with a hierarchical model. With the input-varying yet partially-pool...

Motivation: Longitudinal study designs are indispensable for studying disease progression. Inferring covariate effects from longitudinal data, however, requires interpretable methods that can model complicated covariance structures and detect non-linear effects of both categorical and continuous covariates, as well as their interactions. Detecting...

Relative transcript abundance has proven to be a valuable tool for understanding the function of genes in biological systems. For the differential analysis of transcript abundance using RNA sequencing data, the negative binomial model is by far the most frequently adopted. However, common methods that are based on a negative binomial model are not...

The increasing availability of structured but high dimensional data has opened new opportunities for optimization. One emerging and promising avenue is the exploration of unsupervised methods for projecting structured high dimensional data into low dimensional continuous representations, simplifying the optimization problem and enabling the applica...

We argue that the most important statistical ideas of the past half century are: counterfactual causal inference, bootstrapping and simulation-based inference, overparameterized models and regularization, multilevel models, generic computation algorithms, adaptive decision analysis, robust inference, and exploratory data analysis. We discuss common...

We describe a numerical scheme for evaluating the posterior moments of Bayesian linear regression models with partial pooling of the coefficients. The principal analytical tool of the evaluation is a change of basis from coefficient space to the space of singular vectors of the matrix of predictors. After this change of basis and an analytical inte...

The Bayesian approach to data analysis provides a powerful way to handle uncertainty in all observations, model parameters, and model structure using probability theory. Probabilistic programming languages make it easier to specify and fit Bayesian models, but this still leaves us with many options regarding constructing, evaluating, and using thes...

Gaussian latent variable models are a key class of Bayesian hierarchical models with applications in many fields. Performing Bayesian inference on such models can be challenging as Markov chain Monte Carlo algorithms struggle with the geometry of the resulting posterior distribution and can be prohibitively slow. An alternative is to use a Laplace...

Projection predictive inference is a decision theoretic Bayesian approach that decouples model estimation from decision making. Given a reference model previously built including all variables present in the data, projection predictive inference projects its posterior onto a constrained space of a subset of variables. Variable selection is then per...

Published now in Bioinformatics: https://academic.oup.com/bioinformatics/article/37/13/1860/6104850

The normalizing constant plays an important role in Bayesian computation, and there is a large literature on methods for computing or approximating normalizing constants that cannot be evaluated in closed form. When the normalizing constant varies by orders of magnitude, methods based on importance sampling can require many rounds of tuning. We pre...

We consider the problem of fitting variational posterior approximations usingstochastic optimization methods. The performance of these approximations de-pends on (1) how well the variational family matches the true posterior distribution,(2) the choice of divergence, and (3) the optimization of the variational objective.We show that even in the bes...

A salient approach to interpretable machine learning is to restrict modeling to simple models. In the Bayesian framework, this can be pursued by restricting the model structure and prior to favor interpretable models. Fundamentally, however, interpretability is about users’ preferences, not the data generation mechanism; it is more natural to formu...

When evaluating and comparing models using leave-one-out cross-validation (LOO-CV), the uncertainty of the estimate is typically assessed using the variance of the sampling distribution. It is known, however, that no unbiased estimator for the variance can be constructed in a general case. While it has not been discussed before, it could be possibl...

Leave-one-out cross-validation (LOO-CV) is a popular method for comparing Bayesian models based on their estimated predictive performance on new, unseen, data. Estimating the uncertainty of the resulting LOO-CV estimate is a complex task and it is known that the commonly used standard error estimate is often too small. We analyse the frequency prop...

Latent Gaussian models are a common class of Bayesian hierarchical models, characterized by a normally distributed local parameter. The posterior distribution of such models often induces a geometry that frustrates sampling algorithms, such as Stan’s Hamiltonian Monte Carlo (HMC), resulting in an incomplete or slow exploration of the parameter spac...

One of the common goals of time series analysis is to use the observed series to inform predictions for future observations. In the absence of any actual new data to predict, cross-validation can be used to estimate a model's future predictive accuracy, for instance, for the purpose of model comparison or selection. Exact cross-validation for Bayes...

When working with multimodal Bayesian posterior distributions, Markov chain Monte Carlo (MCMC) algorithms can have difficulty moving between modes, and default variational or mode-based approximate inferences will understate posterior uncertainty. And, even if the most important modes can be found, it is difficult to evaluate their relative weights...

Many data sets contain an inherent multilevel structure, for example, because of repeated measurements of the same observational units. Taking this structure into account is critical for the accuracy and calibration of any statistical analysis performed on such data. However, the large number of possible model configurations hinders the use of mult...

Alterations in the brain’s μ-opioid receptor (MOR) system have been associated with several neuropsychiatric diseases. Also healthy individuals vary considerably in MOR availability. Multiple epidemiological factors have been proposed to influence MOR system, but due to small sample sizes the magnitude of their influence remains inconclusive. We co...

Variable selection, or more generally, model reduction is an important aspect of the statistical workflow aiming to provide insights from data. In this paper, we discuss and demonstrate the benefits of using a reference model in variable selection. A reference model acts as a noise-filter on the target variable by modeling its data generating mecha...

Gaussian processes are powerful non-parametric probabilistic models for stochastic functions. However they entail a complexity that is computationally intractable when the number of observations is large, especially when estimated with fully Bayesian methods such as Markov chain Monte Carlo. In this paper, we focus on a novel approach for low-rank...

Most research in Bayesian optimization (BO) has focused on direct feedback scenarios, where one has access to exact, or perturbed, values of some expensive-to-evaluate objective. This direction has been mainly driven by the use of BO in machine learning hyper-parameter configuration problems. However, in domains such as modelling human preferences,...

Bayesian model comparison is often based on the posterior distribution over the set of compared models. This distribution is often observed to concentrate on a single model even when other measures of model fit or forecasting ability indicate no strong preference. Furthermore, a moderate change in the data sample can easily shift the posterior mode...

Recently, new methods for model assessment, based on subsampling and posterior approximations, have been proposed for scaling leave-one-out cross-validation (LOO) to large datasets. Although these methods work well for estimating predictive performance for individual models, they are less powerful in model comparison. We propose an efficient method...

The minimum mode following method can be used to find saddle points on an energy surface by following a direction guided by the lowest curvature mode. Such calculations are often started close to a minimum on the energy surface to find out which transitions can occur from an initial state of the system, but it is also common to start from the vicin...

Microfading Spectrometry (MFS) is a method for assessing light sensitivity color (spectral) variations of cultural heritage objects. The MFS technique provides measurements of the surface under study, where each point of the surface gives rise to a time-series that represents potential spectral (color) changes due to sunlight exposition over time....

The brain's mu-opioid receptors (MORs) are involved in analgesia, reward and mood regulation. Several neuropsychiatric diseases have been associated with dysfunctional MOR system, and there is also considerable variation in receptor density among healthy individuals. Sex, age, body mass and smoking have been proposed to influence the MOR system, bu...

Calculations of minimum energy paths for atomic rearrangements using the nudged elastic band method can be accelerated with Gaussian process regression to reduce the number of energy and atomic force evaluations needed for convergence. Problems can arise, however, when configurations with large forces due to short distance between atoms are include...

The minimum mode following method can be used to find saddle points on an energy surface by following a direction guided by the lowest curvature mode. Such calculations are often started close to a minimum on the energy surface to find out which transitions can occur from an initial state of the system, but it is also common to start from the vicin...

A salient approach to interpretable machine learning is to restrict modeling to simple and hence understandable models. In the Bayesian framework, this can be pursued by restricting the model structure and prior to favor interpretable models. Fundamentally, however, interpretability is about users' preferences, not the data generation mechanism: it...

For complex nonlinear supervised learning models, assessing the relevance of input variables or their interactions is not straightforward due to the lack of a direct measure of relevance, such as the regression coefficients in generalized linear models. One can assess the relevance of input variables locally by using the mean prediction or its deri...

Surrogate models such as Gaussian processes (GP) have been proposed to accelerate approximate Bayesian computation (ABC) when the statistical model of interest is expensive-to-simulate. In one such promising framework the discrepancy between simulated and observed data is modelled with a GP. So far principled strategies have been proposed only for...

Calculations of minimum energy paths for atomic rearrangements using the nudged elastic band method can be accelerated with Gaussian process regression to reduce the number of energy and atomic force evaluations needed for convergence. Problems can arise, however, when configurations with large forces due to short distance between atoms are include...

The accuracy of an integral approximation via Monte Carlo sampling depends on the distribution of the integrand and the existence of its moments. In importance sampling, the choice of the proposal distribution markedly affects the existence of these moments and thus the accuracy of the obtained integral approximation. In this work, we present a met...

Approximate Bayesian computation (ABC) is a method for Bayesian inference when the likelihood is unavailable but simulating from the model is possible. However, many ABC algorithms require a large number of simulations, which can be costly. To reduce the computational cost, surrogate models and Bayesian optimisation (BO) have been proposed. Bayesia...

We present a selection criterion for the Euclidean metric adapted during warmup in a Hamiltonian Monte Carlo sampler that makes it possible for a sampler to automatically pick the metric based on the model and the availability of warmup draws. Additionally, we present a new adaptation inspired by the selection criterion that requires significantly...