
Computational Statistics and Data Analysis 52 (2008) 3408–3423

www.elsevier.com/locate/csda

Adaptive rejection Metropolis sampling using Lagrange

interpolation polynomials of degree 2

Renate Meyer a,∗, Bo Cai b, François Perron c

aDepartment of Statistics, University of Auckland, Private Bag 92019, Auckland, New Zealand

bDepartment of Epidemiology and Biostatistics, University of South Carolina, Columbia, SC 29208, United States

cDepartment of Mathematics and Statistics, University of Montreal, Montreal, Quebec, Canada H3C 3J7

Received 16 April 2007; received in revised form 8 January 2008; accepted 9 January 2008

Available online 26 January 2008

Abstract

A crucial problem in Bayesian posterior computation is efficient sampling from a univariate distribution, e.g. a full conditional

distribution in applications of the Gibbs sampler. This full conditional distribution is usually non-conjugate, algebraically complex

and computationally expensive to evaluate. We propose an alternative algorithm, called ARMS2, to the widely used adaptive

rejection sampling technique ARS [Gilks, W.R., Wild, P., 1992. Adaptive rejection sampling for Gibbs sampling. Applied Statistics

41 (2), 337–348; Gilks, W.R., 1992. Derivative-free adaptive rejection sampling for Gibbs sampling. In: Bernardo, J.M., Berger,

J.O., Dawid, A.P., Smith, A.F.M. (Eds.), Bayesian Statistics, Vol. 4. Clarendon, Oxford, pp. 641–649] for generating a sample from

univariate log-concave densities. Whereas ARS is based on sampling from piecewise exponentials, the new algorithm uses truncated

normal distributions and makes use of a clever auxiliary variable technique [Damien, P., Walker, S.G., 2001. Sampling truncated

normal, beta, and gamma densities. Journal of Computational and Graphical Statistics 10 (2) 206–215]. Furthermore, we extend

this algorithm to deal with non-log-concave densities to provide an enhanced alternative to adaptive rejection Metropolis sampling,

ARMS [Gilks, W.R., Best, N.G., Tan, K.K.C., 1995. Adaptive rejection Metropolis sampling within Gibbs sampling. Applied

Statistics 44, 455–472]. The performance of ARMS and ARMS2 is compared in simulations of standard univariate distributions as

well as in Gibbs sampling of a Bayesian hierarchical state-space model used for fisheries stock assessment.

© 2008 Elsevier B.V. All rights reserved.

1. Introduction

The Gibbs sampler (Geman and Geman, 1984; Casella and George, 1992) for the computation of high-dimensional

posterior distributions requires iterative sampling from the univariate full conditional posterior distribution of each

parameter, i.e. its conditional distribution given the data and the current values of all other parameters. As these full

conditionals change from one iteration to the next, are usually non-conjugate and have a complicated algebraic form,

efficient omnibus techniques are needed to generate draws from univariate probability density functions. Apart from

auxiliary methods (Chen and Schmeiser, 1998; Damien et al., 1999; Mira and Tierney, 2002; Neal, 2003), adaptive

rejection sampling algorithms (Gilks and Wild, 1992) are frequently adopted. These are our focus in this article.

∗Corresponding author. Tel.: +64 9 3737599x85755; fax: +64 9 3737018.

E-mail addresses: meyer@stat.auckland.ac.nz (R. Meyer), bocai@gwm.sc.edu (B. Cai), perronf@dms.umontreal.ca (F. Perron).

0167-9473/$ - see front matter © 2008 Elsevier B.V. All rights reserved.

doi:10.1016/j.csda.2008.01.005


Gilks and Wild (1992) developed adaptive rejection sampling (ARS), a black-box technique for the rich class of

log-concave density functions. Examples of log-concave densities are listed in the table of Gilks and Wild (1992)

or Devroye (1986), p. 287. In the seminal paper by Dellaportas and Smith (1993) it was shown that under a log-concave prior the posterior densities for the whole class of generalized linear models with canonical link functions

are log-concave. The same holds for proportional hazards models which are widely used in survival analysis. Various

fast and efficient methods for sampling from log-concave distributions have been proposed in the literature (Devroye,

1986). However, these require the location of the mode of the density and therefore necessitate a time-intensive and computationally expensive maximization step. This is also the case with the generalization of the ratio-of-uniforms methods

proposed by Wakefield et al. (1991, 1994) in the context of sampling from full conditionals in non-linear population

models. This algorithm is even more general in that it does not require log-concave densities, however, the trade-off

for this universality is still the global minimization and maximization in order to find the bounding rectangle.

Using the fact that any concave function can be bounded from above and below by its tangents and chords, ARS was

able to dispense with the awkward and time-consuming optimization. It is based on the usual Monte Carlo rejection

sampling using squeezing functions (Ripley, 1987). A further advantage is its adaptivity which reduces the number

of function evaluations of the target density. ARS adapts the envelope and squeezing function after each rejection by

making use of the fact that the target function has already been evaluated at the rejected point. The adaptive envelope

gets closer to the target density with every rejection, thus reducing the rejection probability in the subsequent rejection

sampling step and thereby the probability that the target density needs to be evaluated. To calculate the tangents, the

first derivatives of the density are required in the original algorithm. A derivative-free version (Gilks, 1992) uses

secants instead of tangents, as shown in Figure 1, and thus avoids the need for the specification of derivatives. This

is implemented in the widely used program BUGS (Spiegelhalter et al., 1996). For an efficient rejection sampling

algorithm it is also essential that the envelope density is easy to sample from. This is the case in ARS, the envelope

being a piecewise exponential density.

To sample from non-log-concave distributions, Gilks et al. (1995) developed a general algorithm, adaptive rejection

Metropolis sampling (ARMS). In this algorithm, ARS is supplemented with a Metropolis–Hastings step to deal with

non-log-concave parts.

Although ARS and ARMS are efficient and fast sampling algorithms, we see the potential for improvement. Rather

than construct an envelope for the logarithm of the target density from piecewise linear functions, piecewise quadratic

functions constructed using the Lagrange interpolation polynomial of degree 2 will give a better approximation to

a log-concave density, especially for steep target densities. We will show that due to log-concavity this construction

using quadratic Lagrange interpolation polynomials yields a piecewise Gaussian blanketing density. To sample fast

and efficiently from a truncated normal distribution, we employ a recently proposed auxiliary variable technique

(Damien and Walker, 2001). The piecewise normal rejection function, however, is no longer a strict envelope.

Therefore, we append a Metropolis–Hastings step in analogy to ARMS and furthermore extend ARMS2 to non-

log-concave densities. Although this algorithm, ARMS2, will no longer produce independent samples from the target

density, we will demonstrate that the efficiency increases due to a reduction in the number of function evaluations.

This is of utmost importance in Gibbs sampling where draws from many different full conditionals of complicated

algebraic form are required. It is in these situations that we expect the greatest advantage of ARMS2 over ARMS, as

illustrated in Section 5.

The remainder of the article is organized as follows. To facilitate comparison, set notation and make this paper

self-contained, Section 2 gives a brief description of adaptive rejection sampling and adaptive rejection Metropolis

sampling followed by the specification of the Lagrange interpolation adaptive rejection sampling algorithm in

Section 3. Section 4 compares the performance of ARMS2 and ARMS in some simulation studies. Section 5 applies

both ARMS and ARMS2 in Gibbs samplers for posterior inference in a non-linear non-Gaussian state-space model.

We conclude the paper with a discussion.

2. Adaptive rejection sampling

We consider the general problem of sampling from a given probability density function Kπ(x), defined on a convex set D ⊆ R, which is known only up to a normalizing constant K and for which π(x) is a strictly log-concave function, i.e. which satisfies

log π(λx + (1 − λ)y) > λ log π(x) + (1 − λ) log π(y)   for 0 < λ < 1, x ≠ y ∈ D.


Fig. 1. Construction of the envelope function for a log-concave density in the ARS algorithm.

Let hu(x) denote a blanketing density, i.e. hu(x) ≥ cπ(x), and hl(x) a squeezing function, i.e. hl(x) ≤ cπ(x), for

all x ∈ D and some constant c, then the following step is performed in rejection sampling (RS) until one sample has

been accepted:

Sample y from hu(x), and v independently from Uniform (0,1).

Squeezing test: If v ≤ hl(y)/hu(y) accept y, otherwise

Rejection test: If v ≤ π(y)/hu(y) accept y, otherwise reject y and repeat.

Using a squeezing function, which is usually simple and easy to evaluate, has the advantage that it potentially

bypasses the costly evaluation of the target density π(x) and thus reduces the number of target function evaluations.

Furthermore, rejection sampling is efficient if the blanketing density is easy to sample from as well as close to the

target density (Devroye, 1986).
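As a concrete illustration (ours, not code from the paper), the following Python sketch applies this scheme to the unnormalized target π(x) = exp(−x²/2), with blanket hu(x) = exp(1/2 − |x|) ≥ π(x) (valid since (|x| − 1)² ≥ 0) and squeeze hl(x) = max(0, 1 − x²/2) ≤ π(x) (since e^−t ≥ 1 − t); a counter records how often the squeezing test spares an evaluation of the target.

```python
import math
import random

def log_pi(x):
    """Unnormalized log target: a standard normal, log pi(x) = -x^2/2."""
    return -0.5 * x * x

def hu(x):
    """Blanket: exp(1/2 - |x|) >= exp(-x^2/2) because (|x| - 1)^2 >= 0."""
    return math.exp(0.5 - abs(x))

def hl(x):
    """Squeeze: max(0, 1 - x^2/2) <= exp(-x^2/2) because exp(-t) >= 1 - t."""
    return max(0.0, 1.0 - 0.5 * x * x)

def rs_draw(rng, stats):
    """Rejection sampling with a squeezing test; stats counts evaluations
    of the (possibly expensive) target."""
    while True:
        # y from the density proportional to hu, i.e. a Laplace(0, 1) draw
        y = -math.log(1.0 - rng.random())
        if rng.random() < 0.5:
            y = -y
        v = rng.random()
        if v <= hl(y) / hu(y):                 # squeezing test: no target call
            return y
        stats["target_evals"] += 1             # target must now be evaluated
        if v <= math.exp(log_pi(y)) / hu(y):   # rejection test
            return y
```

In a typical run the squeezing test accepts roughly half of the proposals without touching the target, which is exactly the saving the text describes.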

ARS is an adaptive version of RS in the sense that rejected points are put to use in updating the blanketing density

and lower squeezing function, yielding tighter bounds and thereby reducing the rejection probability in the subsequent

rejection sampling step. Note that only rejected points are included in the set of abscissae as the ARS algorithm is

usually used to draw a single sample from a certain distribution, e.g. a full conditional distribution in Gibbs sampling.

Also, only rejected samples have previously necessitated a target function evaluation; accepted points may have been

accepted through the evaluation of the squeezing function. Moreover, rejected draws indicate a substantial disparity

between target and blanketing density and therefore an opportunity to substantially improve the blanketing density

and thus to markedly decrease the probability of having to evaluate the target density in future steps.

Let Sn = {x0 < x1 < ··· < xn < xn+1}, n ≥ 3, denote a current set of abscissae where x0 and xn+1 are the possibly infinite lower and upper bounds of D. For 0 ≤ i < j ≤ n + 1 let Li,j(x, Sn) denote the straight line through the points (xi, log π(xi)) and (xj, log π(xj)). If x0 = −∞ then define L0,1(x, Sn) = L1,2(x, Sn) and if xn+1 = ∞ then define Ln,n+1(x, Sn) = Ln−1,n(x, Sn).

As illustrated in Fig. 1, the piecewise linear function gn(x) is defined by

gn(x) = L0,1(x, Sn)                              for x ∈ (x0, x1),
        min{Li−1,i(x, Sn), Li+1,i+2(x, Sn)}      for x ∈ [xi, xi+1), i = 1,...,n − 1,
        Ln,n+1(x, Sn)                            for x ∈ [xn, xn+1).   (1)

If D is unbounded on the left, starting abscissae should be chosen so that the gradient of L1,2(x, Sn) is positive. Similarly, if D is unbounded on the right then the gradient of Ln−1,n(x, Sn) should be negative. Usually n = 3 is chosen to initialize Sn. For recommendations as to the choice of the starting abscissae, see Gilks et al. (1995).

Due to the concavity of log π(x), every gn(x) defines an envelope for log π(x) and therefore

hn(x) = (1/Mn) exp{gn(x)},   where Mn = ∫_D exp{gn(x)} dx,

is a blanketing density for π(x), i.e. π(x) ≤ Mn hn(x). Thus, the piecewise exponential density hn(x) can be used as a proposal density in rejection sampling yielding the following ARS algorithm:


Fig. 2. Construction of the envelope function for a non-log-concave density in the ARMS algorithm.

Step 0: initialize n and Sn.
Step n + 1: generate a pair x ∼ hn(x), u ∼ U(0,1);
   if u ≤ π(x)/exp{gn(x)}, accept x;
   else set Sn+1 = Sn ∪ {x} and relabel points in Sn+1 in ascending order.

Note that a lower linear squeezing function can be defined and utilized as well but is omitted in the description here.
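To make the construction concrete, the following Python sketch performs one derivative-free ARS draw on a bounded domain. It is our illustration, not the authors' C implementation: the log-target is also evaluated at the two finite bounds, the end intervals reuse the single available secant (a simplification of the x0 = −∞ conventions above), and all identifiers are ours.

```python
import bisect
import math
import random

def ars_draw(logpi, init_xs, lo, hi, rng):
    """One draw by derivative-free ARS on the bounded domain (lo, hi).
    The envelope on each inner interval is the minimum of the two outer
    secants, as in Eq. (1); end intervals use the one available secant."""
    pts = [lo] + sorted(init_xs) + [hi]
    ys = [logpi(x) for x in pts]
    while True:
        n = len(pts)
        def secant(i, j):                       # line through points i and j
            m = (ys[j] - ys[i]) / (pts[j] - pts[i])
            return m, ys[i] - m * pts[i]
        pieces = []                             # (a, b, slope, intercept)
        for i in range(n - 1):
            a, b = pts[i], pts[i + 1]
            left = secant(i - 1, i) if i >= 1 else None
            right = secant(i + 1, i + 2) if i <= n - 3 else None
            if left and right:
                m1, q1 = left
                m2, q2 = right
                la, ra = m1 * a + q1, m2 * a + q2
                lb, rb = m1 * b + q1, m2 * b + q2
                if (la <= ra) == (lb <= rb):    # one line lower throughout
                    pieces.append((a, b) + (left if la <= ra else right))
                else:                           # split at the crossing point
                    z = (q2 - q1) / (m1 - m2)
                    pieces.append((a, z) + (left if la <= ra else right))
                    pieces.append((z, b) + (right if la <= ra else left))
            else:
                pieces.append((a, b) + (left or right))
        # masses of the piecewise exponential exp(m*x + q)
        masses = []
        for a, b, m, q in pieces:
            if abs(m) < 1e-12:
                masses.append(math.exp(q) * (b - a))
            else:
                masses.append((math.exp(m * b + q) - math.exp(m * a + q)) / m)
        r = rng.random() * sum(masses)
        k = 0
        while k < len(masses) - 1 and r > masses[k]:
            r -= masses[k]
            k += 1
        a, b, m, q = pieces[k]
        w = rng.random()
        if abs(m) < 1e-12:                      # inverse CDF within the piece
            x = a + w * (b - a)
        else:
            x = a + math.log(1.0 + w * math.expm1(m * (b - a))) / m
        lx = logpi(x)
        if rng.random() <= math.exp(lx - (m * x + q)):
            return x                            # rejection test passed
        idx = bisect.bisect(pts, x)             # adapt: add the rejected point
        pts.insert(idx, x)
        ys.insert(idx, lx)
```

Because each adjacent secant lies above a concave log-density outside its own chord interval, every piece dominates the target, so the rejection test's ratio never exceeds one.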

If the target density is not log-concave, the idea of ARMS is to use the linear function Li,i+1(x, Sn) connecting two points xi and xi+1 in the non-log-concave parts as a pseudo-envelope, as illustrated in Fig. 2, and to append a Metropolis–Hastings step. Unlike ARS, ARMS will not produce independent samples from π(x). The pseudo-envelope is defined as hn(x) ∝ exp gn(x) where

gn(x) = max[Li,i+1(x, Sn), min{Li−1,i(x, Sn), Li+1,i+2(x, Sn)}],   for xi ≤ x ≤ xi+1.   (2)

Let xc denote the current value of x. The ARMS algorithm to sample a new value x∗ from π(x) is specified in the following:

Step 0: Initialize n and Sn independently of the current value xc;
Step 1: Generate x ∼ hn(·) and u ∼ U(0,1);
Step 2: Obtain the next point xa:
   if u ≤ π(x)/exp(gn(x)), set xa = x (ARS acceptance step);
   else
      set Sn+1 = Sn ∪ {x} (ARS rejection step);
      relabel points in Sn+1 in ascending order;
      increment n and go back to Step 1;
Step 3: Return xa;
Step 4: Generate v ∼ U(0,1);
Step 5: Obtain the next state x∗:
   if v ≤ αn(xc, xa), set x∗ = xa (Metropolis–Hastings acceptance step);
   else set x∗ = xc (Metropolis–Hastings rejection step);
Step 6: Return x∗,


where

αn(xc, xa) = min{1, [π(xa) min{π(xc), exp(gn(xc))}] / [π(xc) min{π(xa), exp(gn(xa))}]}.

If π(x) is log-concave, gn(x) in Eq. (2) reduces to expression (1). Then, hn(x) is a proper envelope and xa in the Metropolis–Hastings step 5 is always accepted.
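The acceptance probability is a one-liner in code. In this sketch (our notation: pi is the unnormalized target and g the log pseudo-envelope, both plain Python callables) the always-accept property above is easy to verify: when exp(g) dominates pi everywhere, both min terms reduce to pi and the ratio is exactly one.

```python
import math

def arms_accept_prob(pi, g, xc, xa):
    """ARMS Metropolis-Hastings acceptance probability alpha_n(xc, xa);
    pi is the (unnormalized) target density, g the log pseudo-envelope."""
    num = pi(xa) * min(pi(xc), math.exp(g(xc)))
    den = pi(xc) * min(pi(xa), math.exp(g(xa)))
    return min(1.0, num / den)
```

The ratio drops below one exactly when the pseudo-envelope is deficient at the current point xc but not at the proposal xa, which is where the Metropolis–Hastings step corrects for the imperfect envelope.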

3. Adaptive rejection sampling using Lagrange interpolation polynomials

In the case of a log-concave target density, the key idea of ARMS2 is to achieve a better approximation to the

log-density by using a piecewise quadratic rather than a piecewise linear function. This piecewise quadratic function

is constructed from Lagrange interpolation polynomials of degree 2.

Definition 1. The Lagrange interpolation polynomial P(x) of degree (n − 1) that passes through the n points (x1, y1),...,(xn, yn) is given by

P(x) = Σ_{j=1}^{n} Pj(x),   where   Pj(x) = yj ∏_{k=1, k≠j}^{n} (x − xk)/(xj − xk).

The formula was first published by Waring (1779), rediscovered by Euler in 1783, and published by Lagrange in 1795 (Jeffreys and Jeffreys, 1988).

Suppose we start by evaluating the logarithm of the target density at three points. Then there exists a unique quadratic Lagrange interpolation polynomial going through these three points. Due to concavity, the exponential of this Lagrange polynomial is proportional to the density of a normal random variable. This is shown in Lemma 1.
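Definition 1 transcribes directly into code; the following evaluator (an illustrative sketch with our identifiers) sums the basis terms Pj(x):

```python
def lagrange_eval(xs, ys, x):
    """Evaluate the Lagrange interpolation polynomial P of degree n - 1
    through the points (xs[j], ys[j]) at the location x (Definition 1)."""
    total = 0.0
    for j, (xj, yj) in enumerate(zip(xs, ys)):
        term = yj
        for k, xk in enumerate(xs):
            if k != j:
                # basis factor (x - x_k) / (x_j - x_k)
                term *= (x - xk) / (xj - xk)
        total += term
    return total
```

By construction each basis term is 1 at its own node and 0 at the others, so the polynomial reproduces all n points exactly.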

Lemma 1. Let l(x) be a strictly concave function, xi ∈ R, i = 1, 2, 3, be in strictly ascending order, yi = l(xi), i = 1, 2, 3, and A = {x1, y1, x2, y2, x3, y3}. Then there exists a unique quadratic function Q(x) = Q(x; A) such that l(xi) = Q(xi) for i = 1, 2, 3, and the quadratic function Q is given by

Q(x) = −(x − µ)²/(2σ²) + c   (3)

where

µ = x2 + [(x3 − x2)k1 + (x2 − x1)k2] / [2(k1 − k2)],
σ² = (x3 − x1) / [2(k1 − k2)] > 0,
c = y2 + [(x3 − x2)k1 + (x2 − x1)k2]² / [4(x3 − x1)(k1 − k2)],

with

k1 = (y2 − y1)/(x2 − x1),   k2 = (y3 − y2)/(x3 − x2).

Proof. A direct calculation using Definition 1 with n = 3 shows that l(xi) = Q(xi) for i = 1, 2, 3. The condition σ² > 0 is a direct consequence of the strict concavity of l. The uniqueness comes from the fact that two quadratic functions cannot intersect at more than two points unless they are identical (polynomial uniqueness theorem, see e.g. Estep (2002)). □
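The constants of Lemma 1 translate directly into code. The sketch below (our identifiers) is checked against three points taken from the exact log-density l(x) = −(x − 3)²/8 + 0.25 of a Normal(3, 4) variable, which it should recover with µ = 3, σ² = 4 and c = 0.25:

```python
def normal_from_three_points(x1, y1, x2, y2, x3, y3):
    """Compute the constants mu, sigma^2, c of Lemma 1, so that
    Q(x) = -(x - mu)^2 / (2 sigma^2) + c interpolates the three points
    of a strictly concave log-density (hence sigma^2 > 0)."""
    k1 = (y2 - y1) / (x2 - x1)                 # secant slope on (x1, x2)
    k2 = (y3 - y2) / (x3 - x2)                 # secant slope on (x2, x3)
    mu = x2 + ((x3 - x2) * k1 + (x2 - x1) * k2) / (2.0 * (k1 - k2))
    sigma2 = (x3 - x1) / (2.0 * (k1 - k2))
    c = y2 + ((x3 - x2) * k1 + (x2 - x1) * k2) ** 2 / (4.0 * (x3 - x1) * (k1 - k2))
    return mu, sigma2, c
```

Since a quadratic through three points of an exactly Gaussian log-density is that log-density itself, the recovered (µ, σ², c) are exact in this check; for a general log-concave target they define the local normal approximation used below.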


Obviously, the exponential of Q(x) is proportional to a normal probability density function:

exp(Q(x)) = exp(−(x − µ)²/(2σ²) + c) ∝ [1/√(2πσ²)] exp(−(x − µ)²/(2σ²)).   (4)

Therefore, as the exponential of the quadratic Lagrange interpolation polynomial for a log-concave target π(x) evaluated at three points is proportional to a normal density, Lemma 1 and (4) provide the rationale for the construction of a proposal density consisting of a sequence of truncated normal densities. Such a proposal density provides closer bounds to the target density than the piecewise exponential envelopes in ARS, but is generally not a blanketing density.

Given a set of abscissae Sn = {x0 < x1 < ··· < xn < xn+1} and yi = log π(xi), i = 1,...,n, there are two possible quadratic interpolation polynomials for each inner interval (xi, xi+1), namely Qi−1,i,i+1(x) = Q(x; Ai−1) with Ai−1 = {xi−1, yi−1, xi, yi, xi+1, yi+1} based on the three abscissae xi−1, xi, xi+1 and Qi,i+1,i+2(x) = Q(x; Ai) with Ai = {xi, yi, xi+1, yi+1, xi+2, yi+2} based on the three abscissae xi, xi+1, xi+2. Rather than choosing arbitrarily between these two quadratic functions, we decided to use Qi−1,i,i+1(x) for roughly the first half of the interval, more precisely for x ∈ [xi, zi), and Qi,i+1,i+2(x) for x ∈ [zi, xi+1), where zi is the abscissa of the intersection of Li−1,i(x, Sn) and Li+1,i+2(x, Sn). This choice has the potential of further improving the approximation. If ui denotes the slope of Li,i+1(x, Sn), then zi, i = 1,...,n − 1, can be expressed as

zi = [ui(xi+1 − xi) − ui+1 xi+1 + ui−1 xi] / (ui−1 − ui+1)   (5)

where

uj = [log π(xj+1) − log π(xj)] / (xj+1 − xj).   (6)

Setting z0 = x0 and zn = xn+1, the adaptive pseudo-envelope gn(x) for log-concave target densities is defined as follows:

gn(x) = L0,1(x, Sn)            for x ∈ (z0, z1),
        Qi,i+1,i+2(x, Sn)      for x ∈ [zi, zi+1), i = 1,...,n − 2,
        Ln,n+1(x, Sn)          for x ∈ [zn−1, zn).   (7)

This construction of gn(x) is illustrated in Fig. 3 for a log-concave target function evaluated at n = 4 points. Note that since the leftmost and rightmost abscissae could be infinite, we use linear functions in the two end-tails, just as in ARS.

Fig. 3. Construction of the pseudo-envelope function for a log-concave density in the ARMS2 algorithm; thin solid curve: logarithm of the target density, bold solid curve: logarithm of the proposal density, dashed line: envelope function, and dotted line: squeezing function.

We define the proposal density hn(x) of ARMS2 as follows:

hn(x) = (1/Mn) exp(gn(x))
      = (1/Mn) exp(L0,1(x, Sn))          for x ∈ (z0, z1),
        (1/Mn) exp(Qi,i+1,i+2(x, Sn))    for x ∈ [zi, zi+1), i = 1,...,n − 2,
        (1/Mn) exp(Ln,n+1(x, Sn))        for x ∈ [zn−1, zn),   (8)

where

Mn = ∫_D exp(gn(t)) dt
   = ∫_{z0}^{z1} exp(L0,1(t)) dt + Σ_{i=1}^{n−2} ∫_{zi}^{zi+1} exp(−(t − µi)²/(2σi²) + ci) dt + ∫_{zn−1}^{zn} exp(Ln,n+1(t)) dt
   = (1/u0)[exp(L0,1(z1)) − exp(L0,1(z0))] + Σ_{i=1}^{n−2} exp(ci)√(2πσi²) ∫_{zi}^{zi+1} ψi(t) dt
     + (1/un−1)[exp(Ln,n+1(zn)) − exp(Ln,n+1(zn−1))],

and µi, σi² and ci are the constants defined in Lemma 1 with A = Ai, i = 1,...,n − 2. Note that for the inner intervals, exp(Qi,i+1,i+2(x, Sn)) is proportional to a normal density ψi(x) with mean µi and variance σi². Thus the proposal density hn(x) is a sequence of two piecewise exponential and n − 2 truncated normal densities and we face the problem of sampling fast and efficiently from truncated normal densities. Also, as hn(x) does not yield a blanketing density, we need to supplement rejection sampling with a Metropolis–Hastings step, similarly to ARMS.

Thus, the ARMS2 algorithm is specified by the ARMS algorithm given in Section 2 with the appropriate envelope function gn(x) as defined in Eq. (7). ARMS2, unlike ARS, will not produce independent samples from π(x). It is an adaptive generalization of the rejection sampling chain proposed by Tierney (1994). In analogy to the generalization of ARS to ARMS, we extended the ARMS2 algorithm to non-log-concave target densities by using piecewise linear interpolations for the non-log-concave parts, as shown in Fig. 4.

In the remainder of this section, we focus on operational details of the ARMS2 implementation, namely an efficient auxiliary variable algorithm to generate truncated normals.

To generate a sample from a truncated normal distribution in the implementation of ARMS2 any technique could be used, for instance the inverse CDF method proposed and developed by Marsaglia (1963) and Norman and Cannon (1972). Although this method is conceptually simple, numerical problems can occur when sampling from highly skewed or extremely concentrated densities. Furthermore, sampling from truncated normal densities via the inverse CDF method is computationally expensive. Thus, in order to sample fast and efficiently from truncated normals, we use an auxiliary variable technique, adaptive uniform rejection sampling (AURS), proposed by Damien and Walker (2001). The idea is to introduce a latent variable and, after obtaining a sample from the marginal density of the latent variable, to generate a sample from the conditional density of the target given the latent variable. We give a brief description of the algorithm:

To sample from a truncated normal density f(x) ∝ exp(−x²/2) I(a < x < b) on a finite interval (a, b), a latent variable u is introduced with joint density f(x, u) ∝ I(u < exp(−x²/2)) I(a < x < b). This yields a monotone decreasing marginal density

f(u) ∝ max{0, min(b, √(−2 log u)) − max(a, −√(−2 log u))}

for 0 ≤ u ≤ 1 and a conditional density f(x|u) that is uniform on the interval (a, b) ∩ (−√(−2 log u), +√(−2 log u)).

Thus, a sample from f(x) can be generated by sampling u from f(u), and then sampling x from f(x|u). To sample


Fig. 4. Construction of the pseudo-envelope function for a non-log-concave density function in the ARMS2 algorithm; thin solid curve: logarithm

of the target density, bold solid curve: logarithm of the proposal density, dashed line: envelope function, and dotted line: squeezing function.

efficiently from f(u), the fact that f(u) is monotone decreasing is utilized to construct an adaptive rejection sampling algorithm. At the ith iteration of AURS, suppose f(uj) has been evaluated at uj for j = 1,...,i + 1, where 0 < u1 < u2 < ··· < ui < ui+1 = 1. Then an envelope gi(u) is defined by

gi(u) ∝ Σ_{j=1}^{i} f(uj) I(uj < u < uj+1).

A value u∗ is generated from gi(·) and w from the uniform distribution on (0, 1). If

w ≤ f(u∗)/f(uj),   where uj ≤ u∗ < uj+1,

then u∗ is accepted as a random variate from f(u), else one proceeds with the (i + 1)th iteration.

Thus, sampling from a truncated normal distribution is reduced to sampling from uniform distributions. We implemented AURS within ARMS2 to sample from truncated normals with an initial i = 4 points and needed a mean of 2.86 AURS iterations per sample in the examples discussed in the next section.

4. Simulation study

In this section we demonstrate the efficiency of the new ARMS2 algorithm in comparison to ARMS when sampling

from four univariate target densities. To give a fair platform for comparisons, we implemented ARMS2 in the C programming language used for ARMS on a 400 MHz Linux workstation. We chose three log-concave densities, namely

the Gumbel(0, a), Logistic(0, b) and Normal(10, σ²) distributions, and a further non-log-concave density, the bimodal mixture distribution 0.3∗Normal(5, 0.1²) + 0.7∗Normal(6, σ²). We set the left and right bounds as −100 and 100,

respectively, and following general recommendations in Gilks et al. (1995) for log-concave distributions chose four

starting abscissae such that the gradient of L1,2(x, S4) is positive and the gradient of L3,4(x, S4) is negative. We placed

these more or less symmetrically around the mode. The number of target function evaluations could be sensitive to the

initial choice of Sn but, as observed by Gilks and Wild (1992), widely separated starting abscissae are only modestly detrimental and asymmetry in the starting abscissae has only little impact on the number of function evaluations. We

chose x[4] = {−10,−3,7,10} as initial abscissae for the Gumbel and logistic distributions, x[4] = {0,3,17,20} for

the normal distribution and x[4] = {0,3,7,10} for the mixture distribution in both ARMS and ARMS2.

Considering the construction of the pseudo-envelopes, it is for relatively steep target densities that one would

expect the greatest improvement of ARMS2 over ARMS in approximating the target density. The graphs in the top

row of Fig. 5 compare the construction of the envelope function in ARMS for a flat Normal(10, 10²) and a steep


Fig. 5. Graphs in the top row compare ARMS envelopes for generating a sample from a Normal(10, 10²) and a Normal(10, 0.2²) distribution. The following rows compare the number of additional abscissae in the construction of ARMS and ARMS2 pseudo-envelopes for generating a sample from a Normal(10, 10²) and a Normal(10, 0.2²) distribution. ⊗ denotes the values of the target log-density at the four initial abscissae. ? denotes additional function evaluations required to accept a proposal. The envelopes are shown as bold solid lines.

Normal(10, 0.2²) log-density. Please note the different scales of the y-axes. For the steep Normal density, the linear

envelope of ARMS gives a very poor approximation. Another illustration is given in the next two rows of Fig. 5

showing the additional abscissae needed to accept a sample from the adaptive proposal density. When the standard

deviation is large, i.e. σ = 10, both ARMS and ARMS2 require only one additional point. But when the standard

deviation is small, i.e. σ = 0.2, ARMS needs 8 additional abscissae whereas ARMS2 only needs one additional point.

To verify empirically that ARMS2 is more efficient than ARMS particularly for steep target densities, we systematically varied the values of the scale parameters of each of the four distributions from high to low. For the Gumbel(0, a), the Logistic(0, b), and the Normal(10, σ²) distribution, we decreased the scale parameters a, b and the standard deviation σ from 10 to 0.1, i.e. the values were set to 10, 8, 6, 4, 2, 1, 0.8, 0.6, 0.4, 0.2, 0.1. For the mixture of normal distributions, we altered the standard deviation of the second component. With decreasing scale parameters, the precision of

each of the four distributions increases and the shape of the concave parts of the log-density changes from flat to steep.

Fig. 6 compares the mean CPU time in seconds (based on 10,000 iterations) to generate one sample using ARMS

and ARMS2 from the four distributions with varying scale parameters. As can be seen from these graphs, ARMS is

slightly faster than ARMS2 only for large scale parameters, i.e. relatively flat log-density functions. However, ARMS2

is much faster than ARMS when the log-density is steep, i.e. for low values of the scale parameters. This is due to

the fact that ARMS2 gives a closer pseudo-envelope even for a steep target, resulting in fewer rejected points in the


Fig. 6. Comparison of the speed of ARMS and ARMS2 (based on 10,000 iterations) for generating one sample from the target distributions with

varying scale parameters.

rejection step and ultimately fewer function evaluations of the target density which immediately translates into higher

computational speed.

These results suggest that it would always be beneficial to rescale the target density first, generate from the

target with increased scale and then transform the draws back to the original scale. After an appropriate rescaling

to a more dispersed density, the use of ARMS would be more efficient than ARMS2. Indeed, in certain situations,

some knowledge about the steepness of the full conditionals might be available from previous experimentations or

subject knowledge and suitable constants for rescaling can be specified. Especially in all applications where the full conditional density p(θi|θj, j ≠ i) is proportional to a location and scale form f((θi − µ)/σ), where µ and σ are functions of θj, j ≠ i, repeated sampling using either ARMS or ARMS2 from the same optimally scaled full conditional followed by an appropriate scale and location change for each of the full conditional densities will result in increased efficiency. Such cases occur for likelihood functions of the form L(x; Σ zi θi) for given data x, covariates zi and parameters θi with flat priors for the θi's, e.g. in the proportional hazards model, as discussed in Dellaportas (1995).

However, in general Gibbs sampling applications, this strategy would require an automatic derivation of suitable rescaling factors, which might prove to be an impossible undertaking. Thus, we recommend the use of ARMS2 in general Gibbs sampling applications where there is limited knowledge about the shape of the full conditionals, since ARMS2 is only marginally slower for dispersed densities but considerably faster for concentrated densities than ARMS.

Fig. 7 shows kernel density estimates based on the samples from the ARMS and ARMS2 algorithms for the four target densities Gumbel(0, 0.4), Logistic(0, 0.4), Normal(10, 0.4²) and 0.3∗Normal(5, 0.1²) + 0.7∗Normal(6, 0.4²).


Fig. 7. True density functions of the Gumbel, logistic, normal, and bimodal mixture distribution overlaid by empirical estimates (kernel density

estimates) using ARMS and ARMS2 based on 10,000 samples.

Table 1
Comparison of the CPU time in seconds and the number of evaluations of π(x) required to generate one sample from the target density using ARMS and ARMS2, based on 10,000 iterations

Target function                            Max. evaluations of π(x)   Mean evaluations of π(x)   Speed
                                           ARMS     ARMS2             ARMS     ARMS2             ARMS     ARMS2
Gumbel(0, 0.4)                             30       10                21.80    6.29              10.05    4.25
Logistic(0, 0.4)                           11       8                 7.24     6.12              3.18     2.53
Normal(10, 0.4²)                           16       7                 11.54    6.08              3.90     2.40
0.3∗N(5, 0.1²) + 0.7∗N(6, 0.4²)            18       10                12.17    6.16              3.60     2.85

These show no noticeable divergence from the true densities and thus demonstrate good sampling properties for both techniques. Q–Q plots (not shown here) produced from the samples of ARMS and ARMS2 were essentially straight lines, confirming that the sample sets from ARMS and ARMS2 stem from the same distribution.

Table 1 compares the computational speed as well as the mean and maximum number (based on 10,000 iterations) of function evaluations of the four target densities Gumbel(0, 0.4), Logistic(0, 0.4), Normal(10, 0.4²) and 0.3∗N(5, 0.1²) + 0.7∗N(6, 0.4²) needed for generating one sample by using ARMS and ARMS2. As expected,

ARMS2 requires a significantly smaller maximum and mean number of function evaluations for all four densities.

Therefore, the time required for generating 10,000 samples is considerably less when using ARMS2 than ARMS.

5. State-space model example

To illustrate the comparative advantages of ARMS2 over ARMS, we chose a case study with the framework of

Bayesian non-linear non-normal state-space models because here the Gibbs sampler requires sampling from a large


number (greater than the length of the time series) of complex full conditional distributions. The state-space approach

is one of the most powerful tools for dynamic modeling and forecasting of time series and longitudinal data. Excellent

overviews are given in Fahrmeir and Tutz (1994), West and Harrison (1997) and Kuensch (2001). The observation

equations of a state-space model specify the conditional distributions of the observations y_t at time t as a function
of unknown states θ_t. But unlike a static model, the state of nature, θ_t, changes over time according to a relationship

prescribed by engineering or scientific principles. This dynamic Markovian transition of the latent states from time

t to t + 1 is given by the state equations. The ability to include knowledge of the system behaviour in the statistical

model is largely what makes state-space modeling so attractive for biologists, economists, engineers, and physicists.

The common Kalman filter (Kalman, 1960) is not applicable for maximum likelihood estimation because it depends

crucially on the linearity of state-space equations and normal error distributions, assumptions that are limiting and not

realistic in most applications. For functionally non-linear state-space models, only approximate filters, including the

extended Kalman filter (Harvey, 1989), are available. For non-linear non-normal state-space models, ML estimation

is complicated due to the intractable form of the likelihood function. Similarly, Bayesian posterior computation will

generally require multidimensional integration to find normalization constants as well as marginal summaries. Carlin

et al. (1992) showed how these computational difficulties can be overcome by the Gibbs sampler. For many non-linear

non-normal state-space models, however, the full conditional posterior distributions tend to be complex functions so

that a simple rejection method as proposed by Carlin et al. (1992) is no longer feasible. To this end, the Gibbs sampler

in conjunction with ARMS has been successfully applied for fitting non-linear non-normal state-space models, e.g. in

the context of stochastic volatility models in econometrics (e.g. Meyer and Yu (2000)), chaotic non-linear dynamics

in physics (Meyer and Christensen, 2000) and the delay difference model for fisheries stock assessment (e.g. Meyer

and Millar (1999)). We will use the latter example for illustration. In the following, we give a brief description of the

delay difference model. A more detailed account is given in Meyer and Millar (1999).

Population dynamics models (e.g., for a review, see Hilborn and Walters (1992)) in general relate exploitable biomass in year t + 1 to biomass, growth, recruitment, natural mortality, and catch in the previous year t. A state-space model relates the observed relative abundance indices {I_t}, e.g., catch per unit effort (CPUE) from commercial fisheries, to unobserved states, here the biomasses {B_t}, by a stochastic observation model for I_t given B_t. The delay

difference model is used in Meyer and Millar (1999) for the stock assessment of yellowfin tuna in the eastern tropical

Pacific Ocean. The historical data, consisting of catch in millions of pounds and CPUE in pounds per boat-day for

the years 1934–1967, are taken from Pella and Tomlinson (1969) and reproduced in Table 2. Here we reanalyze this

dataset using ARMS2 and compare results to those obtained using ARMS in Meyer and Millar (1999).

Catches and relative abundance indices in years t = 1, ..., N are denoted by C_t and I_t, respectively. The unknown parameters of interest are the carrying capacity K, recruitment R and catchability q, whereas the growth parameters ρ and ω are assumed to be known. Assuming that the observed relative abundance indices I_t are proportional to the total biomass B_t, and defining

    P_t = B_t / K,   k = 1/K,   r = R/K,   and   Q = qK,

the observation equations of the non-linear state-space model are given by

    I_t = Q P_t + v_t   for t = 1, ..., N.                                         (9)

The state transitions are governed by the delay difference model that relates the annual biomass to the biomasses in the two previous years:

    P_1 = 1 + u_1,
    P_2 = e^{−M}(1 + ρ − ρe^{−M})(P_1 − kC_1) + r[1 − ρω e^{−M}(P_1 − kC_1)/P_1] + u_2,
    P_{t+1} = (1 + ρ)e^{−M}(P_t − kC_t) − ρ e^{−2M}(P_t − kC_t)(P_{t−1} − kC_{t−1})/P_t
              + r[1 − ρω e^{−M}(P_t − kC_t)/P_t] + u_{t+1}                         (10)
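As a concreteness check, the state equations above can be simulated forward in Python. The parameter values used below for ρ, ω, M and σ are purely illustrative and are not the estimates of the paper:

```python
import math
import random

def simulate_biomass(C, k, r, rho=0.75, omega=0.8, M=0.6, sigma=0.1, seed=1):
    """Forward-simulate the relative biomasses P_1, ..., P_{N+1} from the
    delay difference state equations, with u_t ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    N = len(C)
    s = math.exp(-M)                            # annual survival e^{-M}
    P = [1.0 + rng.gauss(0.0, sigma)]           # P_1 = 1 + u_1
    surv1 = P[0] - k * C[0]                     # escapement P_1 - k C_1
    P.append(s * (1.0 + rho - rho * s) * surv1
             + r * (1.0 - rho * omega * s * surv1 / P[0])
             + rng.gauss(0.0, sigma))           # P_2
    for t in range(2, N + 1):                   # P_{t+1} for t = 2, ..., N
        st = P[t - 1] - k * C[t - 1]            # P_t  - k C_t
        st1 = P[t - 2] - k * C[t - 2]           # P_{t-1} - k C_{t-1}
        P.append((1.0 + rho) * s * st
                 - rho * s * s * st * st1 / P[t - 1]
                 + r * (1.0 - rho * omega * s * st / P[t - 1])
                 + rng.gauss(0.0, sigma))
    return P                                    # length N + 1
```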


Table 2
Catch (millions of pounds) and CPUE (pounds per boat-day) data from Pella and Tomlinson (1969)

Year    Catch    CPUE
1934    60.9     10361
1935    72.3     11484
1936    78.4     11571
1937    91.5     11116
1938    78.3     11463
1939    110.4    10528
1940    114.6    10609
1941    76.8     8018
1942    42.0     7040
1943    50.1     8441
1944    64.9     10019
1945    89.2     9512
1946    129.7    9292
1947    160.2    7857
1948    207.0    8353
1949    200.1    8363
1950    224.8    7057
1951    186.0    10108
1952    195.3    5606
1953    140.0    3852
1954    140.0    5339
1955    140.9    8191
1956    177.0    6507
1957    163.0    6090
1958    148.5    4768
1959    140.5    4982
1960    244.3    6817
1961    230.9    5544
1962    174.1    4120
1963    145.5    4368
1964    203.9    4844
1965    180.1    4166
1966    182.3    4513
1967    178.9    5292

for t = 2, ..., N. We assume independent normal errors for {u_t} and {v_t}. Specifically, u_t ∼ N(0, σ²), and the CPUEs are given an approximately constant coefficient of variation by assuming the v_t to be N(0, w_t τ²), with weights w_t proportional to the squared fitted values obtained from a non-linear robust smoothing of the CPUE time series by means of running medians (using the SPLUS function “smooth”). The weights are standardized so that w_N = 1 and hence v_N ∼ N(0, τ²).
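The weight construction can be sketched as follows. A plain running median stands in for the SPLUS “smooth” function here, so this is an approximation of the paper's procedure, not a reimplementation:

```python
import statistics

def running_median(x, window=5):
    """Simple running-median smoother (a stand-in for SPLUS 'smooth')."""
    n = len(x)
    half = window // 2
    return [statistics.median(x[max(0, i - half):min(n, i + half + 1)])
            for i in range(n)]

def cpue_weights(I):
    """Weights w_t proportional to the squared smoothed CPUE, standardized
    so that w_N = 1 (hence v_N ~ N(0, tau^2))."""
    fitted = running_median(I)
    w = [f * f for f in fitted]
    return [wt / w[-1] for wt in w]
```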

The 39 unobservables in this delay difference model are the five unknown population parameters and the N = 34 unknown relative biomasses: (K, Q, r, σ², τ², P_1, ..., P_N). We used the same prior distributions as in Meyer and

Millar (1999). The full conditional posterior densities are given in the Appendix. We performed 250,000 cycles of

the Gibbs sampler and thinned the chain by taking every 25th observation. For the remaining 10,000 samples, we

used a burn-in period of 1000, which yielded a final chain of length 9000. The results are summarized in Table 3. We

achieved a considerable reduction in both the maximum and mean number of target function evaluations when using

ARMS2 as compared to ARMS. Furthermore, the computation time was almost halved. Whereas the implementation

with ARMS took 124.86 min for 250,000 iterations, that using ARMS2 required only 69.35 min.
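The sampling scheme of this section (250,000 cycles, keeping every 25th draw, then dropping a burn-in of 1000 retained draws) has the following generic skeleton. The `full_conditional_samplers` interface is hypothetical, chosen only to make the control flow explicit:

```python
def gibbs(full_conditional_samplers, init, n_iter=250_000, thin=25, burn_in=1000):
    """Generic Gibbs sampler skeleton: `full_conditional_samplers` maps each
    parameter name to a function drawing from its full conditional given the
    current state; every `thin`-th state is retained, and the first `burn_in`
    retained draws are discarded."""
    state = dict(init)
    kept = []
    for it in range(n_iter):
        for name, sampler in full_conditional_samplers.items():
            state[name] = sampler(state)        # one full-conditional update
        if (it + 1) % thin == 0:
            kept.append(dict(state))            # store a thinned draw
    return kept[burn_in:]
```

With ARMS or ARMS2 supplying the full-conditional draws, 250,000 iterations, thin = 25 and burn_in = 1000 yield the final chain of length 9000 used above.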

6. Discussion

In this article, we presented an alternative adaptive rejection Metropolis sampling method, ARMS2, to the black-

box algorithm ARMS to sample from arbitrarily complex univariate distributions. ARMS2 can be used to generate

from log-concave and non-log-concave distributions. ARMS2 uses Lagrange interpolation polynomials of degree


Table 3
Comparison of the parameter estimates (with standard deviations (SD) in parentheses) for the delay difference model, and the maximum and mean number of evaluations of the joint posterior density π(x) using ARMS and ARMS2^a

Model   Mean (SD)                            Maximum number of        Mean number of
        ARMS              ARMS2              evaluations of π(x)      evaluations of π(x)
                                             ARMS     ARMS2           ARMS     ARMS2
P_1     1.021 (0.115)     1.023 (0.114)      19       9               10.53    6.83
P_34    0.488 (0.085)     0.490 (0.085)      17       10              10.78    6.78
P_35    0.511 (0.142)     0.510 (0.143)      16       9               10.75    6.91
K       1801 (1683)       1804 (1683)        19       10              10.14    6.29
Q       10798 (1493)      10801 (1487)       27       16              13.62    10.33
r       0.203 (0.052)     0.203 (0.051)      28       19              17.62    12.68
σ²      0.011 (0.008)     0.011 (0.008)      14       9               9.03     6.58
τ²      260010 (160032)   260016 (160029)    15       9               9.34     6.65

^a Iterations = 9000.

2 rather than linear functions to construct a pseudo-envelope for the target density. It thus achieves a better approximation of the target by the blanketing density and ultimately requires fewer target function evaluations, which can be computationally expensive in many applications. A considerable reduction in target function evaluations, and thereby a pronounced decrease in computation time, is seen for steep log-densities. This holds irrespective of whether the target

density is log-concave or non-log-concave. For arbitrary target densities, if ARMS2 loses against ARMS it does not lose by much, while if it wins the gains can be large; this implies that if the shape of the density is not very well understood (which is often the case in Gibbs sampling applications), ARMS2 is the safer strategy. Gibbs sampling in complex hierarchical Bayesian models, such as the non-linear non-normal state-space model discussed in Section 5 or the population pharmacokinetic models in Gilks et al. (1995), usually encounters a large number of algebraically complex and mostly non-log-concave full conditionals that are costly to evaluate. Here, the marked reduction in target function evaluations yields a substantial speed-up of the implementation.
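The core geometric idea can be sketched in a few lines of Python: fit the degree-2 Lagrange interpolation polynomial through three support points of the log-density; wherever its leading coefficient is negative, exponentiating the quadratic gives a (truncated) normal piece of the pseudo-envelope. This is a minimal sketch of the principle, not the authors' C implementation:

```python
def lagrange2(x0, x1, x2, h0, h1, h2):
    """Coefficients (a, b, c) of the quadratic a*x^2 + b*x + c passing
    through (x0,h0), (x1,h1), (x2,h2), via Newton divided differences."""
    d01 = (h1 - h0) / (x1 - x0)
    d12 = (h2 - h1) / (x2 - x1)
    a = (d12 - d01) / (x2 - x0)
    b = d01 - a * (x0 + x1)
    c = h0 - a * x0 ** 2 - b * x0
    return a, b, c

def normal_from_quadratic(a, b):
    """If a < 0, exp(a*x^2 + b*x + c) is proportional to a normal density
    with variance s2 = -1/(2a) and mean mu = b * s2."""
    assert a < 0, "quadratic must be concave to give a normal piece"
    s2 = -1.0 / (2.0 * a)
    mu = b * s2
    return mu, s2

# Example: interpolating the log of N(2, 1) at three points recovers the
# quadratic exactly, hence mu = 2 and s2 = 1.
h = lambda x: -0.5 * (x - 2.0) ** 2
a, b, c = lagrange2(0.0, 1.0, 3.0, h(0.0), h(1.0), h(3.0))
mu, s2 = normal_from_quadratic(a, b)
```

On intervals where the interpolant is convex (a ≥ 0), a normal piece is not available and the construction falls back on other bounding pieces; this sketch covers only the concave case.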

The C subroutine of the ARMS2 implementation is similar in structure to that of ARMS (Gilks et al., 1995) and is available upon request from the authors.

Acknowledgements

The authors gratefully acknowledge the support of this research by the Marsden Fund Council from Government

funding, administered by the Royal Society of New Zealand and by NSERC.

Appendix

Full conditional posterior densities for latent states and parameters in the delay difference model.

In the following, let

    g(P_t) = 1                                                                              for t = 1,
    g(P_t) = e^{−M}(1 + ρ − ρe^{−M})(P_1 − kC_1) + r[1 − ρω e^{−M}(P_1 − kC_1)/P_1]         for t = 2,
    g(P_t) = (1 + ρ)e^{−M}(P_{t−1} − kC_{t−1}) − ρ e^{−2M}(P_{t−1} − kC_{t−1})(P_{t−2} − kC_{t−2})/P_{t−1}
             + r[1 − ρω e^{−M}(P_{t−1} − kC_{t−1})/P_{t−1}]                                 for t = 3, ..., N + 1.

Full conditional posterior density of P_t, t = 3, ..., N − 2:

    p(P_t | P_1, ..., P_{t−1}, P_{t+1}, ..., P_N, k, r, Q, σ², τ²)
        ∝ p(P_t | P_{t−1}, P_{t−2}, k, r, σ²) × p(P_{t+1} | P_t, P_{t−1}, k, r, σ²)
          × p(P_{t+2} | P_{t+1}, P_t, k, r, σ²) × p(I_t | P_t, Q, τ²)
        ∝ exp( −[(P_t − g(P_t))² + (P_{t+1} − g(P_{t+1}))² + (P_{t+2} − g(P_{t+2}))²] / (2σ²) )
          × exp( −(I_t − QP_t)² / (2w_t τ²) ).

Similar expressions are obtained for P_1, P_2, P_{N−1}, P_N, and P_{N+1} by dropping the respective terms.

Full conditional posterior density of k:

    p(k | P_1, ..., P_N, r, Q, σ², τ²) ∝ p(k) ∏_{t=2}^{N} p(P_t | P_{t−1}, P_{t−2}, k, r, σ²)
        ∝ (1/k) exp( −Σ_{t=2}^{N} (P_t − g(P_t))² / (2σ²) )   for k > 0,   and 0 otherwise.

Full conditional posterior density of r:

    p(r | P_1, ..., P_N, k, Q, σ², τ²) ∝ p(r) ∏_{t=2}^{N} p(P_t | P_{t−1}, P_{t−2}, k, r, σ²)
        ∝ (1/r) exp( −(log r − µ_r)² / (2σ_r²) − Σ_{t=2}^{N} (P_t − g(P_t))² / (2σ²) )   for r > 0,   and 0 otherwise.

Full conditional posterior density of Q:

    p(Q | P_1, ..., P_N, k, r, σ², τ²) ∝ p(Q) ∏_{t=1}^{N} p(I_t | P_t, Q, τ²)
        ∝ (1/Q) exp( −Σ_{t=1}^{N} (I_t − QP_t)² / (2τ² w_t) )   for Q > 0,   and 0 otherwise.

Full conditional posterior density of σ²:
The full conditional distribution for σ² is IG(α, β), where

    α = N/2,   β = (1/2) Σ_{t=1}^{N} (P_t − g(P_t))².

Full conditional posterior density of τ²:
The IG(3, 500,000) prior distribution is conjugate, so that the full conditional distribution for τ² is again IG(α, β), where

    α = 3 + N/2,   β = 500,000 + (1/2) Σ_{t=1}^{N} (I_t − QP_t)² / w_t.
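Since σ² and τ² have inverse-gamma full conditionals, they can be updated by direct draws rather than by ARMS/ARMS2. A minimal Python sketch, using the fact that X ∼ IG(α, β) exactly when 1/X ∼ Gamma(α, scale 1/β):

```python
import random

def sample_sigma2(P, g_values, seed=None):
    """Draw sigma^2 from its IG(N/2, (1/2) sum (P_t - g(P_t))^2) full
    conditional; an IG(a, b) draw is b divided by a Gamma(a, scale 1) draw."""
    rng = random.Random(seed)
    a = len(P) / 2.0
    b = 0.5 * sum((p - gv) ** 2 for p, gv in zip(P, g_values))
    return b / rng.gammavariate(a, 1.0)

def sample_tau2(I, P, Q, w, seed=None):
    """Draw tau^2 from its IG(3 + N/2, 500000 + (1/2) sum (I_t - Q P_t)^2 / w_t)
    full conditional under the IG(3, 500000) prior."""
    rng = random.Random(seed)
    a = 3.0 + len(I) / 2.0
    b = 500_000.0 + 0.5 * sum((i - Q * p) ** 2 / wt for i, p, wt in zip(I, P, w))
    return b / rng.gammavariate(a, 1.0)
```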

References

Carlin, B.P., Polson, N.G., Stoffer, D.S., 1992. A Monte Carlo approach to nonnormal and nonlinear state-space modeling. Journal of the American
Statistical Association 87, 493–500.

Casella, G., George, E.I., 1992. Explaining the Gibbs sampler. American Statistician 46, 167–174.

Chen, M.-H., Schmeiser, B.W., 1998. Toward black-box sampling: A random-direction interior-point Markov chain approach. Journal of

Computational and Graphical Statistics 7, 1–22.


Damien, P., Wakefield, J.C., Walker, S.G., 1999. Gibbs sampling for Bayesian nonconjugate and hierarchical models by using auxiliary variables.

Journal of the Royal Statistical Society Series B 61, 331–344.

Damien, P., Walker, S.G., 2001. Sampling truncated normal, beta, and gamma densities. Journal of Computational and Graphical Statistics 10 (2),

206–215.

Dellaportas, P., 1995. Random variate transformations in the Gibbs sampler: Issues of efficiency and convergence. Statistics and Computing 5,

133–140.

Dellaportas, P., Smith, A.F.M., 1993. Bayesian inference for generalized linear and proportional hazards models via Gibbs sampling. Applied

Statistics 42, 443–459.

Devroye, L., 1986. Non-uniform Random Variate Generation. Springer-Verlag, New York.

Estep, D., 2002. Practical Analysis in One Variable. Springer, New York.

Fahrmeir, L., Tutz, G., 1994. Multivariate Statistical Modelling Based on Generalized Linear Models. Springer, New York.

Geman, S., Geman, D., 1984. Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images. IEEE Transactions on Pattern

Analysis and Machine Intelligence 6, 721–741.

Gilks, W.R., 1992. Derivative-free adaptive rejection sampling for Gibbs sampling. In: Bernardo, J.M., Berger, J.O., Dawid, A.P., Smith, A.F.M.

(Eds.), Bayesian Statistics, vol. 4. Clarendon, Oxford, pp. 641–649.

Gilks, W.R., Wild, P., 1992. Adaptive rejection sampling for Gibbs sampling. Applied Statistics 41 (2), 337–348.

Gilks, W.R., Best, N.G., Tan, K.K.C., 1995. Adaptive rejection Metropolis sampling within Gibbs sampling. Applied Statistics 44, 455–472.

Harvey, A., 1989. Forecasting, Structural Time Series Models and the Kalman Filter. Cambridge University Press, New York.

Hilborn, R., Walters, C.J., 1992. Quantitative Fisheries Stock Assessment: Choice, Dynamics and Uncertainty. Chapman & Hall, New York.

Jeffreys, H., Jeffreys, B.S., 1988. Lagrange’s Interpolation Formula. 9.011 in Methods of Mathematical Physics, third ed. Cambridge University

Press, Cambridge, p. 260.

Kalman, R.E., 1960. A new approach to linear filtering and prediction problems. Journal of Basic Engineering 82, 34–45.

Kuensch, H.R., 2001. State space and hidden Markov models. In: Barndorff-Nielsen, et al. (Eds.), Complex Stochastic Systems. Chapman & Hall,

London, pp. 109–174.

Marsaglia, G., 1963. Generating discrete random variables in a computer. Communications of the ACM 6, 101–102.

Meyer, R., Millar, R.B., 1999. Bayesian stock assessment using a state-space implementation of the delay difference model. Canadian Journal of
Fisheries and Aquatic Sciences 56, 37–52.

Meyer, R., Yu, J., 2000. BUGS for a Bayesian analysis of stochastic volatility models. The Econometrics Journal 3 (2), 198–215.

Meyer, R., Christensen, N.L., 2000. Bayesian reconstruction of chaotic dynamical systems. Physical Review E 62, 3535–3542.

Mira, A., Tierney, L., 2002. Efficiency and convergence properties of slice samplers. Scandinavian Journal of Statistics 29, 1–12.

Neal, R.M., 2003. Slice sampling. The Annals of Statistics 31, 705–767.

Norman, J.E., Cannon, L.E., 1972. A computer program for the generation of random variables from any discrete distribution. Journal of Statistical

Computation and Simulation 1, 331–348.

Pella, J.J., Tomlinson, P.K., 1969. A generalized stock production model. Inter American Tropical Tuna Commission Bulletin 13, 421–496.

Ripley, B.D., 1987. Stochastic Simulation. Wiley, New York.

Spiegelhalter, D.J., Thomas, A., Best, N.G., Gilks, W.R., 1996. BUGS 0.5, Bayesian inference using Gibbs sampling. Manual (Version II). MRC

Biostatistics Unit, Cambridge, UK.

Tierney, L., 1994. Markov chains for exploring posterior distributions. Annals of Statistics 22, 1701–1762.

Waring, E., 1779. Philosophical Transactions 69, 59–67.

Wakefield, J.C., Gelfand, A.E., Smith, A.F.M., 1991. Efficient generation of random variates via the ratio-of-uniform method. Statistics and

Computing 1, 129–133.

Wakefield, J.C., Smith, A.F.M., Racine-Poon, A., Gelfand, A.E., 1994. Bayesian analysis of linear and nonlinear population models using the Gibbs

sampler. Journal of Applied Statistics 43, 201–221.

West, M., Harrison, P.J., 1997. Bayesian Forecasting and Dynamic Models, second ed. Springer, New York.