
Computational Statistics and Data Analysis 52 (2008) 3408–3423

www.elsevier.com/locate/csda

Adaptive rejection Metropolis sampling using Lagrange interpolation polynomials of degree 2

Renate Meyer^a,∗, Bo Cai^b, François Perron^c

^a Department of Statistics, University of Auckland, Private Bag 92019, Auckland, New Zealand

^b Department of Epidemiology and Biostatistics, University of South Carolina, Columbia, SC 29208, United States

^c Department of Mathematics and Statistics, University of Montreal, Montreal, Quebec, Canada H3C 3J7

Received 16 April 2007; received in revised form 8 January 2008; accepted 9 January 2008

Available online 26 January 2008

Abstract

A crucial problem in Bayesian posterior computation is efficient sampling from a univariate distribution, e.g. a full conditional

distribution in applications of the Gibbs sampler. This full conditional distribution is usually non-conjugate, algebraically complex

and computationally expensive to evaluate. We propose an alternative algorithm, called ARMS2, to the widely used adaptive

rejection sampling technique ARS [Gilks, W.R., Wild, P., 1992. Adaptive rejection sampling for Gibbs sampling. Applied Statistics

41 (2), 337–348; Gilks, W.R., 1992. Derivative-free adaptive rejection sampling for Gibbs sampling. In: Bernardo, J.M., Berger,

J.O., Dawid, A.P., Smith, A.F.M. (Eds.), Bayesian Statistics, Vol. 4. Clarendon, Oxford, pp. 641–649] for generating a sample from

univariate log-concave densities. Whereas ARS is based on sampling from piecewise exponentials, the new algorithm uses truncated

normal distributions and makes use of a clever auxiliary variable technique [Damien, P., Walker, S.G., 2001. Sampling truncated

normal, beta, and gamma densities. Journal of Computational and Graphical Statistics 10 (2), 206–215]. Furthermore, we extend

this algorithm to deal with non-log-concave densities to provide an enhanced alternative to adaptive rejection Metropolis sampling,

ARMS [Gilks, W.R., Best, N.G., Tan, K.K.C., 1995. Adaptive rejection Metropolis sampling within Gibbs sampling. Applied

Statistics 44, 455–472]. The performance of ARMS and ARMS2 is compared in simulations of standard univariate distributions as

well as in Gibbs sampling of a Bayesian hierarchical state-space model used for fisheries stock assessment.

© 2008 Elsevier B.V. All rights reserved.

1. Introduction

The Gibbs sampler (Geman and Geman, 1984; Casella and George, 1992) for the computation of high-dimensional

posterior distributions requires iterative sampling from the univariate full conditional posterior distribution of each

parameter, i.e. its conditional distribution given the data and the current values of all other parameters. As these full

conditionals change from one iteration to the next, are usually non-conjugate and have a complicated algebraic form,

efficient omnibus techniques are needed to generate draws from univariate probability density functions. Apart from

auxiliary methods (Chen and Schmeiser, 1998; Damien et al., 1999; Mira and Tierney, 2002; Neal, 2003), adaptive

rejection sampling algorithms (Gilks and Wild, 1992) are frequently adopted. These are our focus in this article.

∗Corresponding author. Tel.: +64 9 3737599x85755; fax: +64 9 3737018.

E-mail addresses: meyer@stat.auckland.ac.nz (R. Meyer), bocai@gwm.sc.edu (B. Cai), perronf@dms.umontreal.ca (F. Perron).

0167-9473/$ - see front matter © 2008 Elsevier B.V. All rights reserved.

doi:10.1016/j.csda.2008.01.005


Gilks and Wild (1992) developed adaptive rejection sampling (ARS), a black-box technique for the rich class of

log-concave density functions. Examples of log-concave densities are listed in the table of Gilks and Wild (1992)

or Devroye (1986), p. 287. In the seminal paper by Dellaportas and Smith (1993) it was shown that under a log-

concave prior the posterior densities for the whole class of generalized linear models with canonical link functions

are log-concave. The same holds for proportional hazards models which are widely used in survival analysis. Various

fast and efficient methods for sampling from log-concave distributions have been proposed in the literature (Devroye,

1986). However, these require the location of the mode of the density and therefore necessitate a time-intensive and

computer-expensive maximization step. This is also the case with the generalization of the ratio-of-uniform methods

proposed by Wakefield et al. (1991, 1994) in the context of sampling from full conditionals in non-linear population

models. This algorithm is even more general in that it does not require log-concave densities, however, the trade-off

for this universality is still the global minimization and maximization in order to find the bounding rectangle.

Using the fact that any concave function can be bounded from above and below by its tangents and chords, ARS was

able to dispense with the awkward and time-consuming optimization. It is based on the usual Monte Carlo rejection

sampling using squeezing functions (Ripley, 1987). A further advantage is its adaptivity which reduces the number

of function evaluations of the target density. ARS adapts the envelope and squeezing function after each rejection by

making use of the fact that the target function has already been evaluated at the rejected point. The adaptive envelope

gets closer to the target density with every rejection, thus reducing the rejection probability in the subsequent rejection

sampling step and thereby the probability that the target density needs to be evaluated. To calculate the tangents, the

first derivatives of the density are required in the original algorithm. A derivative-free version (Gilks, 1992) uses

secants instead of tangents, as shown in Fig. 1, and thus avoids the need for the specification of derivatives. This

is implemented in the widely used program BUGS (Spiegelhalter et al., 1996). For an efficient rejection sampling

algorithm it is also essential that the envelope density is easy to sample from. This is the case in ARS, the envelope

being a piecewise exponential density.

To sample from non-log-concave distributions, Gilks et al. (1995) developed a general algorithm, adaptive rejection

Metropolis sampling (ARMS). In this algorithm, ARS is supplemented with a Metropolis–Hastings step to deal with

non-log-concave parts.

Although ARS and ARMS are efficient and fast sampling algorithms, we see the potential for improvement. Rather

than construct an envelope for the logarithm of the target density from piecewise linear functions, piecewise quadratic

functions constructed using the Lagrange interpolation polynomial of degree 2 will give a better approximation to

a log-concave density, especially for steep target densities. We will show that due to log-concavity this construction

using quadratic Lagrange interpolation polynomials yields a piecewise Gaussian blanketing density. To sample fast

and efficiently from a truncated normal distribution, we employ a recently proposed auxiliary variable technique

(Damien and Walker, 2001). The piecewise normal rejection function, however, is no longer a strict envelope.

Therefore, we append a Metropolis–Hastings step in analogy to ARMS and furthermore extend ARMS2 to non-

log-concave densities. Although this algorithm, ARMS2, will no longer produce independent samples from the target

density, we will demonstrate that the efficiency increases due to a reduction in the number of function evaluations.

This is of utmost importance in Gibbs sampling where draws from many different full conditionals of complicated

algebraic form are required. It is in these situations that we expect the greatest advantage of ARMS2 over ARMS, as

illustrated in Section 5.

The remainder of the article is organized as follows. To facilitate comparison, set notation and make this paper

self-contained, Section 2 gives a brief description of adaptive rejection sampling and adaptive rejection Metropolis

sampling followed by the specification of the Lagrange interpolation adaptive rejection sampling algorithm in

Section 3. Section 4 compares the performance of ARMS2 and ARMS in some simulation studies. Section 5 applies

both ARMS and ARMS2 in Gibbs samplers for posterior inference in a non-linear non-Gaussian state-space model.

We conclude the paper with a discussion.

2. Adaptive rejection sampling

We consider the general problem of sampling from a given probability density function¹ Kπ(x), defined on a convex set D ⊆ R, which is known only up to a normalizing constant K and for which π(x) is a strictly log-concave function, i.e. which satisfies

logπ(λx + (1 − λ)y) > λ logπ(x) + (1 − λ) logπ(y)  for 0 < λ < 1, x ≠ y ∈ D.
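As a quick numerical illustration of this definition (our own sketch, not part of the paper; the choice of density and the grid of test points are arbitrary), the strict inequality can be spot-checked for the unnormalized standard normal log-density:

```python
def log_pi(x):
    # Log of an (unnormalized) standard normal density: strictly concave.
    return -0.5 * x * x

def strictly_log_concave_at(x, y, lam, log_pi):
    # Checks the defining inequality
    #   log pi(lam*x + (1-lam)*y) > lam*log pi(x) + (1-lam)*log pi(y)
    # at a single triple (x, y, lam) with x != y and 0 < lam < 1.
    lhs = log_pi(lam * x + (1 - lam) * y)
    rhs = lam * log_pi(x) + (1 - lam) * log_pi(y)
    return lhs > rhs

# Spot-check the inequality on a small grid of points and mixing weights.
ok = all(
    strictly_log_concave_at(x, y, lam, log_pi)
    for x in (-2.0, 0.5, 3.0)
    for y in (-1.0, 1.5)
    for lam in (0.25, 0.5, 0.75)
    if x != y
)
print(ok)  # True: the standard normal log-density is strictly concave
```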


Fig. 1. Construction of the envelope function for a log-concave density in the ARS algorithm.

Let hu(x) denote a blanketing density, i.e. hu(x) ≥ cπ(x), and hl(x) a squeezing function, i.e. hl(x) ≤ cπ(x) for

all x ∈ D and some constant c, then the following step is performed in rejection sampling (RS) until one sample has

been accepted:

Sample y from hu(x), and v independently from Uniform (0,1).

Squeezing test: If v ≤ hl(y)/hu(y) accept y, otherwise

Rejection test: If v ≤ π(y)/hu(y) accept y, otherwise reject y and repeat.

Using a squeezing function, which is usually simple and easy to evaluate, has the advantage that it potentially

bypasses the costly evaluation of the target density π(x) and thus reduces the number of target function evaluations.

Furthermore, rejection sampling is efficient if the blanketing density is easy to sample from as well as close to the

target density (Devroye, 1986).
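The RS step above can be sketched in a few lines of code. This is our own illustration, not code from the paper: the truncated-normal target and the constant blanket and squeeze (with c = 1) are ad hoc choices that happen to satisfy hl ≤ π ≤ hu on [−1, 1].

```python
import math
import random

def rejection_sample(log_target, sample_hu, log_hu, log_hl=None, rng=random):
    """Rejection sampling with an optional squeezing test.

    sample_hu draws from the blanketing density h_u; log_hu and log_hl
    evaluate log h_u and the log squeezing function h_l.  We assume
    h_l(x) <= target(x) <= h_u(x) on the support (i.e. c = 1).
    Returns (accepted sample, number of target evaluations used).
    """
    n_evals = 0
    while True:
        y = sample_hu()
        v = rng.random()
        # Squeezing test: accept without evaluating the target density.
        if log_hl is not None and v <= math.exp(log_hl(y) - log_hu(y)):
            return y, n_evals
        # Rejection test: requires one (possibly costly) target evaluation.
        n_evals += 1
        if v <= math.exp(log_target(y) - log_hu(y)):
            return y, n_evals

# Toy example: sample a standard normal truncated to [-1, 1] using the
# constant blanket h_u(x) = phi(0) and the constant squeeze h_l(x) = phi(1),
# which bound the density phi on that interval.
phi = lambda t: math.exp(-0.5 * t * t) / math.sqrt(2 * math.pi)
x, cost = rejection_sample(
    log_target=lambda t: math.log(phi(t)),
    sample_hu=lambda: random.uniform(-1.0, 1.0),
    log_hu=lambda t: math.log(phi(0.0)),
    log_hl=lambda t: math.log(phi(1.0)),
)
print(-1.0 <= x <= 1.0)  # True
```

Note that `cost` counts only the rejection tests: whenever the squeezing test already accepts, the (expensive) target density is never touched, which is precisely the saving the text describes.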

ARS is an adaptive version of RS in the sense that rejected points are put to use in updating the blanketing density

and lower squeezing function, yielding tighter bounds and thereby reducing the rejection probability in the subsequent

rejection sampling step. Note that only rejected points are included in the set of abscissae as the ARS algorithm is

usually used to draw a single sample from a certain distribution, e.g. a full conditional distribution in Gibbs sampling.

Also, only rejected samples have previously necessitated a target function evaluation; accepted points may have been

accepted through the evaluation of the squeezing function. Moreover, rejected draws indicate a substantial disparity

between target and blanketing density and therefore an opportunity to substantially improve the blanketing density

and thus to markedly decrease the probability of having to evaluate the target density in future steps.

Let Sn = {x0 < x1 < ··· < xn < xn+1}, n ≥ 3, denote a current set of abscissae where x0 and xn+1 are the possibly infinite lower and upper bounds of D. For 0 ≤ i < j ≤ n + 1 let Li,j(x, Sn) denote the straight line through the points (xi, logπ(xi)) and (xj, logπ(xj)). If x0 = −∞ then define L0,1(x, Sn) = L1,2(x, Sn) and if xn+1 = ∞ then define Ln,n+1(x, Sn) = Ln−1,n(x, Sn).

As illustrated in Fig. 1, the piecewise linear function gn(x) is defined by

gn(x) = { L0,1(x, Sn)                           for x ∈ (x0, x1),
        { min{Li−1,i(x, Sn), Li+1,i+2(x, Sn)}   for x ∈ [xi, xi+1), i = 1, ..., n − 1, and    (1)
        { Ln,n+1(x, Sn)                         for x ∈ [xn, xn+1).

If D is unbounded on the left, starting abscissae should be chosen so that the gradient of L1,2(x, Sn) is positive. Similarly, if D is unbounded on the right then the gradient of Ln−1,n(x, Sn) should be negative. Usually n = 3 is chosen to initialize Sn. For recommendations as to the choice of the starting abscissae, see Gilks et al. (1995).

Due to the concavity of logπ(x), every gn(x) defines an envelope for logπ(x) and therefore

hn(x) = (1/Mn) exp{gn(x)},  where Mn = ∫_D exp{gn(x)} dx,

is a blanketing density for π(x), i.e. π(x) ≤ Mn hn(x). Thus, the piecewise exponential density hn(x) can be used as a proposal density in rejection sampling yielding the following ARS algorithm:


Fig. 2. Construction of the envelope function for a non-log-concave density in the ARMS algorithm.

Step 0: initialize n and Sn;
Step n + 1: generate a pair x ∼ hn(x), u ∼ U(0, 1);
    if u ≤ π(x)/exp{gn(x)}, accept x;
    else set Sn+1 = Sn ∪ {x} and relabel the points in Sn+1 in ascending order.

Note that a lower linear squeezing function can be defined and utilized as well but is omitted in the description

here.
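Sampling from the piecewise exponential proposal hn(x) = exp{gn(x)}/Mn is straightforward because each linear piece of gn integrates in closed form. The following sketch is our own, not the paper's: it assumes a segment representation (a, b, m, c) for a linear log-density piece m·x + c on a bounded interval [a, b], chooses a segment with probability proportional to its mass, and inverts that segment's CDF.

```python
import math
import random

def sample_piecewise_exp(segments, rng=random):
    """Draw one sample from the density proportional to exp(m*x + c) on
    each segment (a, b, m, c) with a < b finite -- the form of the ARS
    proposal h_n(x) = exp{g_n(x)}/M_n with g_n piecewise linear."""
    # Unnormalized mass of a segment: integral of exp(m*x + c) over [a, b].
    def mass(a, b, m, c):
        if abs(m) < 1e-12:
            return math.exp(c) * (b - a)
        return math.exp(c) * (math.exp(m * b) - math.exp(m * a)) / m

    weights = [mass(*seg) for seg in segments]
    total = sum(weights)
    # Choose a segment with probability proportional to its mass ...
    u = rng.random() * total
    for (a, b, m, c), w in zip(segments, weights):
        if u <= w:
            break
        u -= w
    # ... then invert the chosen segment's CDF at a fresh uniform draw:
    # CDF(x) = (e^{m x} - e^{m a}) / (e^{m b} - e^{m a}).
    v = rng.random()
    if abs(m) < 1e-12:
        return a + v * (b - a)
    return math.log((1 - v) * math.exp(m * a) + v * math.exp(m * b)) / m

# Toy check: two symmetric segments forming the "tent" log-density
# g(x) = -|x| on [-1, 1], i.e. a truncated Laplace proposal.
segs = [(-1.0, 0.0, 1.0, 0.0), (0.0, 1.0, -1.0, 0.0)]
xs = [sample_piecewise_exp(segs) for _ in range(2000)]
print(all(-1.0 <= x <= 1.0 for x in xs))  # True
```

In a full ARS implementation the segment list would be rebuilt from Sn after every rejection; the closed-form masses also give Mn for free.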

If the target density is not log-concave, the idea of ARMS is to use the linear function Li,i+1(x, Sn) connecting the two points xi and xi+1 in the non-log-concave parts as a pseudo-envelope as illustrated in Fig. 2 and to append a Metropolis–Hastings step. Unlike ARS, ARMS will not produce independent samples from π(x). The pseudo-envelope is defined as hn(x) ∝ exp{gn(x)} where

gn(x) = max[Li,i+1(x, Sn), min{Li−1,i(x, Sn), Li+1,i+2(x, Sn)}]  for xi ≤ x ≤ xi+1.    (2)

Let xc denote the current value of x. The ARMS algorithm to sample a new value x∗ from π(x) is specified in the following:

Step 0: Initialize n and Sn independently of the current value xc;
Step 1: Generate x ∼ hn(·) and u ∼ U(0, 1);
Step 2: Obtain the next rejection point xa:
    if u ≤ π(x)/exp{gn(x)}, set xa = x (ARS acceptance step);
    else
        set Sn+1 = Sn ∪ {x} (ARS rejection step);
        relabel the points in Sn+1 in ascending order;
        increment n and go back to Step 1;
Step 3: Return xa;
Step 4: Generate v ∼ U(0, 1);
Step 5: Obtain the next state x∗:
    if v ≤ αn(xc, xa), set x∗ = xa (Metropolis–Hastings acceptance step);
    else set x∗ = xc (Metropolis–Hastings rejection step);
Step 6: Return x∗.


where

αn(xc, xa) = min{1, [π(xa) min{π(xc), exp(gn(xc))}] / [π(xc) min{π(xa), exp(gn(xa))}]}.

If π(x) is log-concave, gn(x) in Eq. (2) reduces to expression (1). Then, hn(x) is a proper envelope and xa in the

Metropolis–Hastings step 5 is always accepted.
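On the log scale this acceptance probability is cheap to compute and numerically stable. A minimal sketch (our own; `log_pi` and `g_n` are assumed callables for logπ and the pseudo-envelope gn):

```python
import math

def log_alpha_n(x_c, x_a, log_pi, g_n):
    """Log of the ARMS Metropolis-Hastings acceptance probability
        alpha_n(x_c, x_a) = min{1, pi(x_a) min(pi(x_c), exp g_n(x_c))
                              / [pi(x_c) min(pi(x_a), exp g_n(x_a))]},
    computed entirely on the log scale."""
    num = log_pi(x_a) + min(log_pi(x_c), g_n(x_c))
    den = log_pi(x_c) + min(log_pi(x_a), g_n(x_a))
    return min(0.0, num - den)

# If g_n truly blankets log pi (the log-concave case), then
# min(log pi, g_n) = log pi everywhere, the ratio cancels and
# alpha_n = 1: the candidate x_a is always accepted.
log_pi = lambda x: -0.5 * x * x
g_n = lambda x: log_pi(x) + 0.5          # a strict upper bound on log pi
assert math.isclose(math.exp(log_alpha_n(0.3, 1.7, log_pi, g_n)), 1.0)
```

When gn dips below logπ somewhere (the non-log-concave case), the min terms no longer cancel and αn < 1 becomes possible, which is exactly why the Metropolis–Hastings correction is needed.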

3. Adaptive rejection sampling using Lagrange interpolation polynomials

In the case of a log-concave target density, the key idea of ARMS2 is to achieve a better approximation to the

log-density by using a piecewise quadratic rather than a piecewise linear function. This piecewise quadratic function

is constructed from Lagrange interpolation polynomials of degree 2.

Definition 1. The Lagrange interpolation polynomial P(x) of degree (n − 1) that passes through the n points (x1, y1), ..., (xn, yn) is given by

P(x) = Σ_{j=1}^{n} Pj(x),  where  Pj(x) = yj Π_{k=1, k≠j}^{n} (x − xk)/(xj − xk).

The formula was first published by Waring (1779), rediscovered by Euler in 1783, and published by Lagrange in 1795 (Jeffreys and Jeffreys, 1988).

Suppose we start by evaluating the logarithm of the target density at three points. Then there exists a unique quadratic Lagrange interpolation polynomial going through these three points. Due to concavity, the exponential of this Lagrange polynomial is proportional to the density of a normal random variable. This is shown in Lemma 1.
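Definition 1 translates directly into code. The sketch below (our own illustration) evaluates P(x) at arbitrary points and confirms the interpolation property on three points taken from the quadratic y = x²:

```python
def lagrange(points):
    """Lagrange interpolation polynomial P of degree n-1 through the n
    points (x_1, y_1), ..., (x_n, y_n) of Definition 1:
        P(x) = sum_j y_j * prod_{k != j} (x - x_k) / (x_j - x_k)."""
    def P(x):
        total = 0.0
        for j, (xj, yj) in enumerate(points):
            term = yj
            for k, (xk, _) in enumerate(points):
                if k != j:
                    term *= (x - xk) / (xj - xk)
            total += term
        return total
    return P

# P reproduces each interpolation point exactly; since the unique
# quadratic through three points of y = x**2 is x**2 itself, P also
# agrees with x**2 away from the nodes.
pts = [(0.0, 0.0), (1.0, 1.0), (3.0, 9.0)]
P = lagrange(pts)
print([P(x) for x in (0.0, 1.0, 3.0)])  # [0.0, 1.0, 9.0]
print(abs(P(2.0) - 4.0) < 1e-9)         # True
```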

Lemma 1. Let l(x) be a strictly concave function, xi ∈ R, i = 1, 2, 3, be in strictly ascending order, yi = l(xi), i = 1, 2, 3, and A = {x1, y1, x2, y2, x3, y3}. Then there exists a unique quadratic function Q(x) = Q(x; A) such that l(xi) = Q(xi) for i = 1, 2, 3 and the quadratic function Q is given by

Q(x) = −(x − µ)²/(2σ²) + c,    (3)

where

µ = x2 + [(x3 − x2)k1 + (x2 − x1)k2]/[2(k1 − k2)],
σ² = (x3 − x1)/[2(k1 − k2)] > 0,
c = y2 + [(x3 − x2)k1 + (x2 − x1)k2]²/[4(x3 − x1)(k1 − k2)],

with

k1 = (y2 − y1)/(x2 − x1),  k2 = (y3 − y2)/(x3 − x2).

Proof. A direct calculation using Definition 1 with n = 3 shows that l(xi) = Q(xi) for i = 1, 2, 3. The condition σ² > 0 is a direct consequence of the strict concavity of l. The uniqueness comes from the fact that two quadratic functions cannot intersect at more than two points unless they are identical (polynomial uniqueness theorem, see e.g. Estep (2002)). □
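The closed-form expressions of Lemma 1 are easy to implement and to check. In the sketch below (ours), three points on l(x) = −x² recover µ = 0, σ² = 1/2 and c = 0, i.e. Q(x) = −x² exactly:

```python
def gaussian_from_three_points(x1, y1, x2, y2, x3, y3):
    """Parameters (mu, sigma2, c) of Lemma 1 so that
        Q(x) = -(x - mu)**2 / (2 * sigma2) + c
    interpolates three points (x_i, y_i) of a strictly concave function
    with x1 < x2 < x3; exp(Q) is then an unnormalized N(mu, sigma2) density."""
    # Slopes of the two chords; strict concavity gives k1 > k2.
    k1 = (y2 - y1) / (x2 - x1)
    k2 = (y3 - y2) / (x3 - x2)
    mu = x2 + ((x3 - x2) * k1 + (x2 - x1) * k2) / (2.0 * (k1 - k2))
    sigma2 = (x3 - x1) / (2.0 * (k1 - k2))      # > 0 by strict concavity
    c = y2 + ((x3 - x2) * k1 + (x2 - x1) * k2) ** 2 / (
        4.0 * (x3 - x1) * (k1 - k2))
    return mu, sigma2, c

# Three points on l(x) = -x**2: the lemma must return the exact vertex
# form of that parabola, -(x - 0)**2 / (2 * 0.5) + 0.
mu, sigma2, c = gaussian_from_three_points(0.0, 0.0, 1.0, -1.0, 2.0, -4.0)
print(mu, sigma2, c)  # 0.0 0.5 0.0
```

In ARMS2 this is the basic building block: each triple of adjacent abscissae yields one truncated-normal piece of the piecewise Gaussian blanketing density.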