Gaussian Variational Approximate Inference for Generalized
Linear Mixed Models
BY J.T. ORMEROD AND M.P. WAND
Centre for Statistical and Survey Methodology,
School of Mathematics and Applied Statistics,
University of Wollongong, Wollongong 2522, Australia.
7th July, 2009
ABSTRACT

Variational approximation methods have become a mainstay of contemporary Machine Learning methodology, but currently have little presence in Statistics. We devise an effective variational approximation strategy for fitting generalized linear mixed models (GLMM) appropriate for grouped data. It involves Gaussian approximation to the distributions of random effects vectors, conditional on the responses. We show that Gaussian variational approximation is a relatively simple and natural alternative to Laplace approximation for fast, non-Monte Carlo, GLMM analysis. Numerical studies show Gaussian variational approximation to be very accurate in grouped data GLMM contexts. Finally, we point to some recent theory on consistency of Gaussian variational approximation in this context.
Key Words: Best prediction; Longitudinal data analysis; Likelihood-based inference; Machine
learning; Variance components.
Statistical and probabilistic models continue to grow in complexity in response to the demands of modern applications. Fitting and inference for such models is an ongoing issue and new sectors of research have emerged to meet this challenge. In Statistics, the most prominent of these is Markov chain Monte Carlo (MCMC), which is continually being tailored to handle difficult inferential questions arising in, for example, Bayesian models (e.g. Gelman, Carlin, Stern & Rubin, 2004; Marin & Robert, 2007; Carlin & Louis, 2008), mixed and latent variable models (e.g. Skrondal & Rabe-Hesketh, 2004; McCulloch, Searle & Neuhaus, 2008) and missing data models (e.g. Little & Rubin, 2002). The main difficulty addressed by MCMC is the presence of intractable multivariate integrals in likelihood and posterior density expressions.
In parallel to these developments in Statistics, the Machine Learning community has been developing its own body of methodology for handling intractable integrals, in which variational approximations play a central role. These variational approximations sacrifice some of the accuracy of MCMC by solving perturbed versions of the original problem, in return for fast deterministic computation. Motivating settings include probabilistic graphical models, hidden Markov models and phylogenetic trees. Summaries of recent variational approximation research may be found in Jordan et al. (1999), Jordan (2004) and Bishop (2006). An introduction to variational approximation from a statistical perspective is provided by Ormerod & Wand (2009).
In this article we help bring variational approximation into mainstream Statistics by tailoring it to the most popular current setting for which integration difficulties arise: generalized linear mixed models (GLMM). In the interest of conciseness, we focus on the commonest type of GLMM – that arising in the analysis of grouped data with Gaussian random effects. General design GLMMs, as described in Zhao, Coull, Staudenmayer & Wand (2006), are not treated here. We identify a particular type of variational approximation that is well-suited to grouped data GLMMs. It involves approximation of the distributions of random effects vectors, given the responses, by Gaussian distributions. The resulting Gaussian variational approximation (GVA) approach emerges as a new alternative to Laplace approximation for fast, deterministic fitting of grouped data GLMMs. Conceptually, the approach is very simple: its derivation requires application of Jensen's inequality to the log-likelihood to obtain a variational lower bound, which is then maximized over both the model parameters and the variational parameters. GVA involves a little more algebra and calculus to implement compared with some of the simpler versions of Laplace approximation, such as penalized quasi-likelihood (PQL) (Breslow & Clayton, 1993). However, with the aid of the formulae presented in Appendix A, effective computation can be achieved in order m operations, where m is the number of groups. For some GLMMs, such as Poisson GLMMs, the GVA completely eradicates the need for integration. In others, such as logistic GLMMs, only univariate numerical integration is required, regardless of the dimension of the random effects vectors.
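To make the preceding description concrete, the following display sketches the lower bound for a single group; the notation, with $(\mu_i, \Lambda_i)$ denoting the mean and covariance of the approximating Gaussian, is ours rather than taken verbatim from later sections. With $u_i \sim N(0, \Sigma)$ the $i$th random effects vector and $q(u_i) = \phi(u_i; \mu_i, \Lambda_i)$ an arbitrary Gaussian density, Jensen's inequality gives
\begin{align*}
\log p(y_i;\beta,\Sigma) &= \log \int p(y_i\,|\,u_i;\beta)\,\phi(u_i;0,\Sigma)\,du_i \\
&= \log E_q\!\left\{\frac{p(y_i\,|\,u_i;\beta)\,\phi(u_i;0,\Sigma)}{q(u_i)}\right\} \\
&\ge E_q\{\log p(y_i\,|\,u_i;\beta)\} - \mathrm{KL}\{q \,\|\, \phi(\cdot\,;0,\Sigma)\}.
\end{align*}
For Poisson responses with log link the first term is available in closed form, since $E_q\{\exp(a^T u_i)\} = \exp(a^T\mu_i + \tfrac{1}{2}a^T\Lambda_i a)$ for any fixed vector $a$; this is why no numerical integration at all is needed in that case.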
Standard errors for fixed effect and covariance parameter estimates are a by-product of the fitting algorithm. Approximate best predictions for the random effects, along with prediction variances, also arise quite simply from the Gaussian approximation. Moreover, numerical studies show GVA to be very accurate; often almost as good as MCMC and a significant improvement on PQL. Other varieties of Laplace approximation (e.g. Raudenbush, Yang & Yosef, 2000; Lee & Nelder, 1996; Rue, Martino & Chopin, 2009) also offer accuracy improvements over PQL but, like GVA, have their own costs in terms of implementability.
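As an illustration of these points, the following is a minimal sketch of Gaussian variational approximate fitting for a Poisson random-intercept GLMM. The model, the simulated data and all variable names are our own illustrative assumptions, not part of the paper; the quantity being maximized is the Jensen-type lower bound described above, and the fitted variational means and variances play the role of approximate best predictions and prediction variances.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln

# Simulate grouped Poisson data: m groups, n observations per group,
# random intercepts u_i ~ N(0, sigma^2).  All settings are illustrative.
rng = np.random.default_rng(1)
m, n = 50, 10
x = rng.uniform(-1.0, 1.0, size=(m, n))
u_true = rng.normal(0.0, 0.7, size=m)
y = rng.poisson(np.exp(-0.5 + 1.2 * x + u_true[:, None]))

def neg_lower_bound(theta):
    """Negative Gaussian variational lower bound on the log-likelihood."""
    b0, b1, log_s2 = theta[0], theta[1], theta[2]
    mu = theta[3:3 + m]              # variational means, one per group
    lam = np.exp(theta[3 + m:])      # variational variances (log scale)
    s2 = np.exp(log_s2)
    eta = b0 + b1 * x + mu[:, None]
    # E_q{log p(y | u)}: the Poisson exp term is closed form under Gaussian q,
    # since E_q exp(u_i) = exp(mu_i + lam_i / 2).
    bound = np.sum(y * eta - np.exp(eta + lam[:, None] / 2.0) - gammaln(y + 1))
    # -KL{q(u_i) || N(0, s2)}, summed over the m groups.
    bound += 0.5 * np.sum(1.0 + np.log(lam / s2) - (mu**2 + lam) / s2)
    return -bound

fit = minimize(neg_lower_bound, np.zeros(3 + 2 * m), method="L-BFGS-B")
b0_hat, b1_hat, sigma_hat = fit.x[0], fit.x[1], np.exp(fit.x[2] / 2.0)
mu_hat = fit.x[3:3 + m]              # approximate best predictions of u_i
lam_hat = np.exp(fit.x[3 + m:])      # corresponding prediction variances
print(b0_hat, b1_hat, sigma_hat)     # should be near (-0.5, 1.2, 0.7)
```

Note that the optimization involves 3 + 2m free parameters, growing linearly with the number of groups, consistent with the order m cost noted above; a production implementation would supply analytic gradients rather than rely on numerical differentiation.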
Recently, Hall, Ormerod & Wand (2009) investigated the theoretical properties of GVA for a simple special case of the grouped data GLMMs considered here. They established root-m consistency of Gaussian variational approximate maximum likelihood estimators under relatively mild regularity conditions.
A significant portion of variational approximation methodology is based on the notion of factorized density approximations to key conditional densities with respect to Kullback-Leibler distance (e.g. Bishop, 2006, Section 10.1). This general strategy is sometimes called mean field approximation, and has its roots in 1980s Statistical Physics research (Parisi, 1988). However, mean field approximation is not well-suited to GLMMs since they lack the conjugacy that normally gives rise to explicit updating formulae. In addition, mean field approximation has a tendency to markedly underestimate the variability of parameter estimates (Wang & Titterington, 2005; Rue et al., 2009).
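For completeness, the factorized approximation underlying mean field methods takes the generic form (our notation; cf. Bishop, 2006, Section 10.1)
$$
q(\theta) = \prod_{i=1}^{M} q_i(\theta_i), \qquad q \ \text{chosen to minimize} \ \mathrm{KL}\{q(\theta)\,\|\,p(\theta\,|\,y)\},
$$
for some partition $(\theta_1,\ldots,\theta_M)$ of the unobserved quantities, with the optimal components satisfying $q_i^*(\theta_i) \propto \exp[E_{-\theta_i}\{\log p(y,\theta)\}]$. The explicitness of these coordinate updates hinges on conjugacy, which is precisely what fails for GLMMs.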
& Jordan (2000). It may be applied to logistic GLMMs (Ormerod, 2008) but does not ex-
tend to other situations such as Poisson response models. We have also encountered variance
under-estimation problems with the Jaakkola and Jordan variational approximation (Ormerod
& Wand, 2008).
The use of Gaussian densities in variational approximation has a small and recent literature in Machine Learning. Gaussian variational approximations have been considered in the context of neural networks (Barber & Bishop, 1998; Honkela & Valpola, 2004), Support Vector Machines (Seeger, 2000), stochastic differential equations (Archambeau, Cornford, Opper &