Computational Statistics and Data Analysis 52 (2008) 3408–3423
www.elsevier.com/locate/csda
Adaptive rejection Metropolis sampling using Lagrange
interpolation polynomials of degree 2
Renate Meyer a,∗, Bo Cai b, François Perron c
aDepartment of Statistics, University of Auckland, Private Bag 92019, Auckland, New Zealand
bDepartment of Epidemiology and Biostatistics, University of South Carolina, Columbia, SC 29208, United States
cDepartment of Mathematics and Statistics, University of Montreal, Montreal, Quebec, Canada H3C 3J7
Received 16 April 2007; received in revised form 8 January 2008; accepted 9 January 2008
Available online 26 January 2008
Abstract
A crucial problem in Bayesian posterior computation is efficient sampling from a univariate distribution, e.g. a full conditional
distribution in applications of the Gibbs sampler. This full conditional distribution is usually non-conjugate, algebraically complex
and computationally expensive to evaluate. We propose an alternative algorithm, called ARMS2, to the widely used adaptive
rejection sampling technique ARS [Gilks, W.R., Wild, P., 1992. Adaptive rejection sampling for Gibbs sampling. Applied Statistics
41 (2), 337–348; Gilks, W.R., 1992. Derivative-free adaptive rejection sampling for Gibbs sampling. In: Bernardo, J.M., Berger,
J.O., Dawid, A.P., Smith, A.F.M. (Eds.), Bayesian Statistics, Vol. 4. Clarendon, Oxford, pp. 641–649] for generating a sample from
univariate log-concave densities. Whereas ARS is based on sampling from piecewise exponentials, the new algorithm uses truncated
normal distributions and makes use of a clever auxiliary variable technique [Damien, P., Walker, S.G., 2001. Sampling truncated
normal, beta, and gamma densities. Journal of Computational and Graphical Statistics 10 (2) 206–215]. Furthermore, we extend
this algorithm to deal with non-log-concave densities to provide an enhanced alternative to adaptive rejection Metropolis sampling,
ARMS [Gilks, W.R., Best, N.G., Tan, K.K.C., 1995. Adaptive rejection Metropolis sampling within Gibbs sampling. Applied
Statistics 44, 455–472]. The performance of ARMS and ARMS2 is compared in simulations of standard univariate distributions as
well as in Gibbs sampling of a Bayesian hierarchical state-space model used for fisheries stock assessment.
© 2008 Elsevier B.V. All rights reserved.
1. Introduction
The Gibbs sampler (Geman and Geman, 1984; Casella and George, 1992) for the computation of high-dimensional
posterior distributions requires iterative sampling from the univariate full conditional posterior distribution of each
parameter, i.e. its conditional distribution given the data and the current values of all other parameters. As these full
conditionals change from one iteration to the next, are usually non-conjugate and have a complicated algebraic form,
efficient omnibus techniques are needed to generate draws from univariate probability density functions. Apart from
auxiliary methods (Chen and Schmeiser, 1998; Damien et al., 1999; Mira and Tierney, 2002; Neal, 2003), adaptive
rejection sampling algorithms (Gilks and Wild, 1992) are frequently adopted. These are our focus in this article.
∗Corresponding author. Tel.: +64 9 3737599x85755; fax: +64 9 3737018.
E-mail addresses: meyer@stat.auckland.ac.nz (R. Meyer), bocai@gwm.sc.edu (B. Cai), perronf@dms.umontreal.ca (F. Perron).
0167-9473/$ - see front matter © 2008 Elsevier B.V. All rights reserved.
doi:10.1016/j.csda.2008.01.005
Gilks and Wild (1992) developed adaptive rejection sampling (ARS), a black-box technique for the rich class of
log-concave density functions. Examples of log-concave densities are listed in the table of Gilks and Wild (1992)
or Devroye (1986), p. 287. In the seminal paper by Dellaportas and Smith (1993) it was shown that under a log-
concave prior the posterior densities for the whole class of generalized linear models with canonical link functions
are log-concave. The same holds for proportional hazards models which are widely used in survival analysis. Various
fast and efficient methods for sampling from log-concave distributions have been proposed in the literature (Devroye,
1986). However, these require the location of the mode of the density and therefore necessitate a time-intensive and
computer-expensive maximization step. This is also the case with the generalization of the ratio-of-uniforms method
proposed by Wakefield et al. (1991, 1994) in the context of sampling from full conditionals in non-linear population
models. This algorithm is even more general in that it does not require log-concave densities, however, the trade-off
for this universality is still the global minimization and maximization in order to find the bounding rectangle.
Using the fact that any concave function can be bounded from above and below by its tangents and chords, ARS was
able to dispense with the awkward and time-consuming optimization. It is based on the usual Monte Carlo rejection
sampling using squeezing functions (Ripley, 1987). A further advantage is its adaptivity which reduces the number
of function evaluations of the target density. ARS adapts the envelope and squeezing function after each rejection by
making use of the fact that the target function has already been evaluated at the rejected point. The adaptive envelope
gets closer to the target density with every rejection, thus reducing the rejection probability in the subsequent rejection
sampling step and thereby the probability that the target density needs to be evaluated. To calculate the tangents, the
first derivatives of the density are required in the original algorithm. A derivative-free version (Gilks, 1992) uses
secants instead of tangents, as shown in Figure 1, and thus avoids the need for the specification of derivatives. This
is implemented in the widely used program BUGS (Spiegelhalter et al., 1996). For an efficient rejection sampling
algorithm it is also essential that the envelope density is easy to sample from. This is the case in ARS, the envelope
being a piecewise exponential density.
To sample from non-log-concave distributions, Gilks et al. (1995) developed a general algorithm, adaptive rejection
Metropolis sampling (ARMS). In this algorithm, ARS is supplemented with a Metropolis–Hastings step to deal with
non-log-concave parts.
Although ARS and ARMS are efficient and fast sampling algorithms, we see the potential for improvement. Rather
than construct an envelope for the logarithm of the target density from piecewise linear functions, piecewise quadratic
functions constructed using the Lagrange interpolation polynomial of degree 2 will give a better approximation to
a log-concave density, especially for steep target densities. We will show that due to log-concavity this construction
using quadratic Lagrange interpolation polynomials yields a piecewise Gaussian blanketing density. To sample fast
and efficiently from a truncated normal distribution, we employ a recently proposed auxiliary variable technique
(Damien and Walker, 2001). The piecewise normal rejection function, however, is no longer a strict envelope.
Therefore, we append a Metropolis–Hastings step in analogy to ARMS and furthermore extend ARMS2 to non-
log-concave densities. Although this algorithm, ARMS2, will no longer produce independent samples from the target
density, we will demonstrate that the efficiency increases due to a reduction in the number of function evaluations.
This is of utmost importance in Gibbs sampling where draws from many different full conditionals of complicated
algebraic form are required. It is in these situations that we expect the greatest advantage of ARMS2 over ARMS, as
illustrated in Section 5.
The remainder of the article is organized as follows. To facilitate comparison, set notation and make this paper
self-contained, Section 2 gives a brief description of adaptive rejection sampling and adaptive rejection Metropolis
sampling followed by the specification of the Lagrange interpolation adaptive rejection sampling algorithm in
Section 3. Section 4 compares the performance of ARMS2 and ARMS in some simulation studies. Section 5 applies
both ARMS and ARMS2 in Gibbs samplers for posterior inference in a non-linear non-Gaussian state-space model.
We conclude the paper with a discussion.
2. Adaptive rejection sampling
We consider the general problem of sampling from a given probability density function Kπ(x), defined on a convex set D ⊆ R, which is known only up to a normalizing constant K and for which π(x) is a strictly log-concave function, i.e. which satisfies

log π(λx + (1 − λ)y) > λ log π(x) + (1 − λ) log π(y) for 0 < λ < 1, x ≠ y ∈ D.
Fig. 1. Construction of the envelope function for a log-concave density in the ARS algorithm.
Let hu(x) denote a blanketing density, i.e. hu(x) ≥ cπ(x), and hl(x) a squeezing function, i.e. hl(x) ≤ cπ(x) for
all x ∈ D and some constant c, then the following step is performed in rejection sampling (RS) until one sample has
been accepted:
Sample y from hu(x), and v independently from Uniform (0,1).
Squeezing test: If v ≤ hl(y)/hu(y) accept y, otherwise
Rejection test: If v ≤ π(y)/hu(y) accept y, otherwise reject y and repeat.
Using a squeezing function, which is usually simple and easy to evaluate, has the advantage that it potentially
bypasses the costly evaluation of the target density π(x) and thus reduces the number of target function evaluations.
Furthermore, rejection sampling is efficient if the blanketing density is easy to sample from as well as close to the
target density (Devroye, 1986).
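As a concrete illustration of the RS step with a squeezing test (our sketch, not code from the paper), the following samples a standard normal target π(x) ∝ exp(−x²/2) under the Laplace-type blanket hu(x) = exp(1/2 − |x|), which dominates π since (|x| − 1)² ≥ 0; the trivial squeeze hl ≡ 0 stands in for a genuine lower bound, so here the squeeze never fires. All function choices are illustrative assumptions.

```python
import math
import random

def rejection_sample(log_pi, sample_hu, hu, hl):
    """One rejection-sampling draw with a squeezing test:
    hu(x) >= pi(x) is the (unnormalized) blanketing density and
    hl(x) <= pi(x) the squeezing function; the squeeze can accept a
    draw without evaluating the target pi at all."""
    while True:
        y = sample_hu()                       # y ~ blanketing density
        v = random.random()                   # v ~ Uniform(0, 1)
        if v <= hl(y) / hu(y):                # squeezing test (cheap accept)
            return y
        if v <= math.exp(log_pi(y)) / hu(y):  # rejection test (evaluates pi)
            return y                          # else reject y and repeat

# Illustrative choices (not from the paper): pi(x) = exp(-x^2/2),
# blanket hu(x) = exp(1/2 - |x|) >= pi(x), trivial squeeze hl(x) = 0.
log_pi = lambda x: -0.5 * x * x
hu = lambda x: math.exp(0.5 - abs(x))
hl = lambda x: 0.0

def sample_hu():
    # Laplace(0, 1) via inverse CDF; proportional to hu
    u = random.random() - 0.5
    sign = 1.0 if u >= 0.0 else -1.0
    return -sign * math.log(1.0 - 2.0 * abs(u))

random.seed(1)
draws = [rejection_sample(log_pi, sample_hu, hu, hl) for _ in range(20000)]
```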
ARS is an adaptive version of RS in the sense that rejected points are put to use in updating the blanketing density
and lower squeezing function, yielding tighter bounds and thereby reducing the rejection probability in the subsequent
rejection sampling step. Note that only rejected points are included in the set of abscissae as the ARS algorithm is
usually used to draw a single sample from a certain distribution, e.g. a full conditional distribution in Gibbs sampling.
Also, only rejected samples have previously necessitated a target function evaluation; accepted points may have been accepted through the evaluation of the squeezing function alone. Moreover, rejected draws indicate a substantial disparity
between target and blanketing density and therefore an opportunity to substantially improve the blanketing density
and thus to markedly decrease the probability of having to evaluate the target density in future steps.
Let Sn = {x0 < x1 < ··· < xn < xn+1}, n ≥ 3, denote a current set of abscissae where x0 and xn+1 are the possibly infinite lower and upper bounds of D. For 0 ≤ i < j ≤ n + 1 let Li,j(x, Sn) denote the straight line through the points (xi, log π(xi)) and (xj, log π(xj)). If x0 = −∞ then define L0,1(x, Sn) = L1,2(x, Sn) and if xn+1 = ∞ then define Ln,n+1(x, Sn) = Ln−1,n(x, Sn).
As illustrated in Fig. 1, the piecewise linear function gn(x) is defined by

gn(x) = L0,1(x, Sn) for x ∈ (x0, x1),
gn(x) = min{Li−1,i(x, Sn), Li+1,i+2(x, Sn)} for x ∈ [xi, xi+1), i = 1, ..., n − 1, and
gn(x) = Ln,n+1(x, Sn) for x ∈ [xn, xn+1).   (1)

If D is unbounded on the left, starting abscissae should be chosen so that the gradient of L1,2(x, Sn) is positive. Similarly, if D is unbounded on the right then the gradient of Ln−1,n(x, Sn) should be negative. Usually n = 3 is chosen to initialize Sn. For recommendations as to the choice of the starting abscissae, see Gilks et al. (1995).
Due to the concavity of log π(x), every gn(x) defines an envelope for log π(x) and therefore

hn(x) = (1/Mn) exp{gn(x)}, where Mn = ∫_D exp{gn(x)} dx,

is a blanketing density for π(x), i.e. π(x) ≤ Mn hn(x). Thus, the piecewise exponential density hn(x) can be used as a proposal density in rejection sampling, yielding the following ARS algorithm:
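Since hn(x) is piecewise exponential, sampling from it reduces to choosing a piece in proportion to its mass and inverting the within-piece CDF in closed form. The sketch below is our own illustration (not the paper's C implementation); the two-piece envelope at the end is an arbitrary assumption chosen only so the result is easy to check.

```python
import math
import random

def sample_piecewise_exponential(pieces):
    """One draw from h(x) ∝ exp(a_k + s_k x) on [l_k, r_k], the piecewise
    exponential form of the ARS proposal; pieces is a list of (l, r, s, a)."""
    masses = []                       # unnormalized mass of each piece
    for l, r, s, a in pieces:
        if s == 0.0:
            masses.append(math.exp(a) * (r - l))
        else:
            masses.append((math.exp(a + s * r) - math.exp(a + s * l)) / s)
    u = random.random() * sum(masses)
    for (l, r, s, a), m in zip(pieces, masses):
        if u > m:
            u -= m                    # move on to the next piece
            continue
        if s == 0.0:                  # flat piece: uniform on [l, r]
            return l + (u / m) * (r - l)
        # invert the within-piece CDF: exp(a)(exp(s x) - exp(s l))/s = u
        return math.log(math.exp(s * l) + u * s * math.exp(-a)) / s
    return pieces[-1][1]              # guard against rounding at the boundary

random.seed(0)
# Illustrative two-piece example: h ∝ exp(x) on [-1, 0] and exp(-x) on [0, 1]
pieces = [(-1.0, 0.0, 1.0, 0.0), (0.0, 1.0, -1.0, 0.0)]
draws = [sample_piecewise_exponential(pieces) for _ in range(20000)]
```

The same piece-selection-plus-inversion pattern extends directly to any number of linear pieces produced by the secant construction.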
Fig. 2. Construction of the envelope function for a non-log-concave density in the ARMS algorithm.
Step 0: initialize n and Sn;
Step n + 1: generate a pair x ∼ hn(x), u ∼ U(0, 1);
if u ≤ π(x)/exp{gn(x)}, accept x;
else set Sn+1 = Sn ∪ {x} and relabel the points in Sn+1 in ascending order.
Note that a lower linear squeezing function can be defined and utilized as well but is omitted in the description
here.
If the target density is not log-concave, the idea of ARMS is to use the linear function Li,i+1(x, Sn) connecting the two points xi and xi+1 in the non-log-concave parts as a pseudo-envelope, as illustrated in Fig. 2, and to append a Metropolis–Hastings step. Unlike ARS, ARMS will not produce independent samples from π(x). The pseudo-envelope is defined as hn(x) ∝ exp gn(x) where

gn(x) = max[Li,i+1(x, Sn), min{Li−1,i(x, Sn), Li+1,i+2(x, Sn)}], for xi ≤ x ≤ xi+1.   (2)

Let xc denote the current value of x. The ARMS algorithm to sample a new value x∗ from π(x) is specified in the following:
Step 0: Initialize n and Sn independently of the current value xc;
Step 1: Generate x ∼ hn(·) and u ∼ U(0, 1);
Step 2: Obtain the next rejection point xa:
if u ≤ π(x)/exp(gn(x)), set xa = x (ARS acceptance step);
else
set Sn+1 = Sn ∪ {x} (ARS rejection step);
relabel the points in Sn+1 in ascending order;
increment n and go back to Step 1;
Step 3: Return xa;
Step 4: Generate v ∼ U(0, 1);
Step 5: Obtain the next state x∗:
if v ≤ αn(xc, xa), set x∗ = xa (Metropolis–Hastings acceptance step);
else set x∗ = xc (Metropolis–Hastings rejection step);
Step 6: Return x∗,
where

αn(xc, xa) = min{ 1, [π(xa) min{π(xc), exp(gn(xc))}] / [π(xc) min{π(xa), exp(gn(xa))}] }.
If π(x) is log-concave, gn(x) in Eq. (2) reduces to expression (1). Then hn(x) is a proper envelope and xa in the Metropolis–Hastings step 5 is always accepted.
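The acceptance probability αn translates directly into code; the sketch below (our illustration, with an arbitrary target and envelope) works on the log scale for numerical stability and confirms the point just made: whenever gn dominates log π, αn = 1 and xa is always accepted.

```python
import math

def log_alpha(log_pi, log_g, x_c, x_a):
    """log of the ARMS acceptance probability
    alpha_n(x_c, x_a) = min{1, pi(x_a) min(pi(x_c), exp g_n(x_c))
                             / [pi(x_c) min(pi(x_a), exp g_n(x_a))]},
    computed entirely on the log scale."""
    num = log_pi(x_a) + min(log_pi(x_c), log_g(x_c))
    den = log_pi(x_c) + min(log_pi(x_a), log_g(x_a))
    return min(0.0, num - den)

# Illustrative check: when g_n is a true envelope (g_n >= log pi everywhere),
# both min-terms reduce to log pi, so alpha_n = 1, as in ARS.
log_pi = lambda x: -0.5 * x * x
log_g = lambda x: -0.5 * x * x + 0.1   # dominates the target everywhere
assert log_alpha(log_pi, log_g, x_c=1.3, x_a=-0.4) == 0.0
```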
3. Adaptive rejection sampling using Lagrange interpolation polynomials
In the case of a log-concave target density, the key idea of ARMS2 is to achieve a better approximation to the
log-density by using a piecewise quadratic rather than a piecewise linear function. This piecewise quadratic function
is constructed from Lagrange interpolation polynomials of degree 2.
Definition 1. The Lagrange interpolation polynomial P(x) of degree (n − 1) that passes through the n points (x1, y1), ..., (xn, yn) is given by

P(x) = Σ_{j=1}^{n} Pj(x), where Pj(x) = yj Π_{k=1, k≠j}^{n} (x − xk)/(xj − xk).

The formula was first published by Waring (1779), rediscovered by Euler in 1783, and published by Lagrange in 1795 (Jeffreys and Jeffreys, 1988).
Suppose we start by evaluating the logarithm of the target density at three points. Then there exists a unique quadratic Lagrange interpolation polynomial going through these three points. Due to concavity, the exponent of this Lagrange polynomial is proportional to the density of a normal random variable. This is shown in Lemma 1.
Lemma 1. Let l(x) be a strictly concave function, xi ∈ R, i = 1, 2, 3, be in strictly ascending order, yi = l(xi), i = 1, 2, 3, and A = {x1, y1, x2, y2, x3, y3}. Then there exists a unique quadratic function Q(x) = Q(x; A) such that l(xi) = Q(xi) for i = 1, 2, 3, and the quadratic function Q is given by

Q(x) = −(x − µ)²/(2σ²) + c,   (3)

where

µ = x2 + [(x3 − x2)k1 + (x2 − x1)k2] / [2(k1 − k2)],
σ² = (x3 − x1) / [2(k1 − k2)] > 0,
c = y2 + [(x3 − x2)k1 + (x2 − x1)k2]² / [4(x3 − x1)(k1 − k2)],

with

k1 = (y2 − y1)/(x2 − x1) and k2 = (y3 − y2)/(x3 − x2).

Proof. A direct calculation using Definition 1 with n = 3 shows that l(xi) = Q(xi) for i = 1, 2, 3. The condition σ² > 0 is a direct consequence of the strict concavity of l. The uniqueness comes from the fact that two quadratic functions cannot intersect at more than two points unless they are identical (polynomial uniqueness theorem, see e.g. Estep (2002)). □
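The constants of Lemma 1 translate line by line into code. The check below is our own addition: when the three points lie exactly on a quadratic log-density (here an arbitrary choice with µ = 1, σ² = 4, c = 0), the formulas recover its parameters exactly.

```python
def normal_from_three_points(x1, y1, x2, y2, x3, y3):
    """mu, sigma^2 and c of Lemma 1: the unique quadratic
    Q(x) = -(x - mu)^2 / (2 sigma^2) + c through three points on a
    strictly concave function (so k1 > k2 and sigma^2 > 0)."""
    k1 = (y2 - y1) / (x2 - x1)
    k2 = (y3 - y2) / (x3 - x2)
    mu = x2 + ((x3 - x2) * k1 + (x2 - x1) * k2) / (2.0 * (k1 - k2))
    sigma2 = (x3 - x1) / (2.0 * (k1 - k2))
    c = y2 + ((x3 - x2) * k1 + (x2 - x1) * k2) ** 2 / (4.0 * (x3 - x1) * (k1 - k2))
    return mu, sigma2, c

# Sanity check on points taken from log pi(x) = -(x - 1)^2 / 8,
# i.e. mu = 1, sigma^2 = 4, c = 0 (an illustrative quadratic).
logpi = lambda x: -((x - 1.0) ** 2) / 8.0
mu, sigma2, c = normal_from_three_points(
    -1.0, logpi(-1.0), 0.5, logpi(0.5), 3.0, logpi(3.0))
assert abs(mu - 1.0) < 1e-12
assert abs(sigma2 - 4.0) < 1e-12
assert abs(c) < 1e-12
```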
Obviously, the exponent of Q(x) is proportional to a normal probability density function:

exp(Q(x)) = exp(−(x − µ)²/(2σ²) + c) ∝ (1/√(2πσ²)) exp(−(x − µ)²/(2σ²)).   (4)
Therefore, as the exponent of the quadratic Lagrange interpolation polynomial for a log-concave target π(x) evaluated
at three points is proportional to a normal density, Lemma 1 and (4) provide the rationale for the construction of a
proposal density consisting of a sequence of truncated normal densities. Such a proposal density provides closer
bounds to the target density than piecewise exponential envelopes in ARS, but is generally not a blanketing density.
Given a set of abscissae Sn = {x0 < x1 < ··· < xn < xn+1} and yi = log π(xi), i = 1, ..., n, there are two possible quadratic interpolation polynomials for each inner interval (xi, xi+1), namely Qi−1,i,i+1(x) = Q(x; Ai−1) with Ai−1 = {xi−1, yi−1, xi, yi, xi+1, yi+1} based on the three abscissae xi−1, xi, xi+1, and Qi,i+1,i+2(x) = Q(x; Ai) with Ai = {xi, yi, xi+1, yi+1, xi+2, yi+2} based on the three abscissae xi, xi+1, xi+2. Rather than choosing arbitrarily between these two quadratic functions, we decided to use Qi−1,i,i+1(x) for roughly the first half of the interval, more precisely for x ∈ [xi, zi), and Qi,i+1,i+2(x) for x ∈ [zi, xi+1), where zi is the abscissa of the intersection of Li−1,i(x, Sn) and Li+1,i+2(x, Sn). This choice has the potential of further improving the approximation. If ui denotes the slope of Li,i+1(x, Sn), then zi, i = 1, ..., n − 1, can be expressed as

zi = [ui(xi+1 − xi) − ui+1 xi+1 + ui−1 xi] / (ui−1 − ui+1),   (5)

where

uj = [log π(xj+1) − log π(xj)] / (xj+1 − xj).   (6)

Setting z0 = x0 and zn = xn+1, the adaptive pseudo-envelope gn(x) for log-concave target densities is defined as follows:

gn(x) = L0,1(x, Sn) for x ∈ (z0, z1),
gn(x) = Qi,i+1,i+2(x, Sn) for x ∈ [zi, zi+1), i = 1, ..., n − 2,
gn(x) = Ln,n+1(x, Sn) for x ∈ [zn−1, zn).   (7)

This construction of gn(x) is illustrated in Fig. 3 for a log-concave target function evaluated at n = 4 points. Note that since the leftmost and rightmost abscissae could be infinite, we use linear functions in the two end-tails, just as in ARS.
We define the proposal density hn(x) of ARMS2 as follows:

hn(x) = (1/Mn) exp(gn(x))
      = (1/Mn) exp(L0,1(x, Sn)) for x ∈ (z0, z1),
      = (1/Mn) exp(Qi,i+1,i+2(x, Sn)) for x ∈ [zi, zi+1), i = 1, ..., n − 2,
      = (1/Mn) exp(Ln,n+1(x, Sn)) for x ∈ [zn−1, zn),   (8)

where

Mn = ∫_D exp(gn(t)) dt
   = ∫_{z0}^{z1} exp(L0,1(t)) dt + Σ_{i=1}^{n−2} ∫_{zi}^{zi+1} exp(−(t − µi)²/(2σi²) + ci) dt + ∫_{zn−1}^{zn} exp(Ln,n+1(t)) dt
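Eqs. (5) and (6) can be coded directly. The sketch below is our illustration (the function names are ours), restricted to inner intervals where both neighbouring secants exist; it checks that for a strictly concave log-density the crossing point zi indeed falls inside (xi, xi+1).

```python
def secant_slopes(xs, ys):
    """u_j of Eq. (6): slope of the secant through (x_j, y_j) and (x_{j+1}, y_{j+1})."""
    return [(ys[j + 1] - ys[j]) / (xs[j + 1] - xs[j]) for j in range(len(xs) - 1)]

def intersection_abscissa(xs, ys, i):
    """z_i of Eq. (5): abscissa where the extended secants L_{i-1,i} and
    L_{i+1,i+2} cross, for a 0-based interval index i with both neighbouring
    secants available (1 <= i <= len(xs) - 3)."""
    u = secant_slopes(xs, ys)
    return (u[i] * (xs[i + 1] - xs[i]) - u[i + 1] * xs[i + 1]
            + u[i - 1] * xs[i]) / (u[i - 1] - u[i + 1])

# For a strictly concave log-density the crossing lies inside the interval.
logpi = lambda x: -0.5 * x * x          # illustrative log-concave target
xs = [-2.0, -1.0, 0.5, 1.5, 2.5]
ys = [logpi(x) for x in xs]
for i in (1, 2):
    z = intersection_abscissa(xs, ys, i)
    assert xs[i] < z < xs[i + 1]
```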
Fig. 3. Construction of the pseudo-envelope function for a log-concave density in the ARMS2 algorithm; thin solid curve: logarithm of the target
density, bold solid curve: logarithm of the proposal density, dashed line: envelope function, and dotted line: squeezing function.
= (1/u0)[exp(L0,1(z1)) − exp(L0,1(z0))] + Σ_{i=1}^{n−2} exp(ci)√(2πσi²) ∫_{zi}^{zi+1} ψi(t) dt + (1/un−1)[exp(Ln,n+1(zn)) − exp(Ln,n+1(zn−1))],

and µi, σi² and ci are the constants defined in Lemma 1 with A = Ai, i = 1, ..., n − 2. Note that for the inner intervals, exp(Qi,i+1,i+2(x, Sn)) is proportional to a normal density ψi(x) with mean µi and variance σi². Thus the proposal density hn(x) is a sequence of two piecewise exponential and n − 2 truncated normal densities and we face the problem of sampling fast and efficiently from truncated normal densities. Also, as hn(x) does not yield a blanketing density, we need to supplement rejection sampling with a Metropolis–Hastings step, similarly to ARMS.
Thus, the ARMS2 algorithm is specified by the ARMS algorithm given in Section 2 with the appropriate envelope function gn(x) as defined in Eq. (7). ARMS2, unlike ARS, will not produce independent samples from π(x). It is an adaptive generalization of the rejection sampling chain proposed by Tierney (1994). In analogy to the generalization of ARS to ARMS, we extended the ARMS2 algorithm to non-log-concave target densities by using piecewise linear interpolations for the non-log-concave parts, as shown in Fig. 4.
In the remainder of this section, we focus on operational details of the ARMS2 implementation, namely an efficient auxiliary variable algorithm to generate truncated normals.
To generate a sample from a truncated normal distribution in the implementation of ARMS2 any technique could be used, for instance the inverse CDF method proposed and developed by Marsaglia (1963) and Norman and Cannon (1972). Although this method is conceptually simple, numerical problems can occur when sampling from highly skewed or extremely concentrated densities. Furthermore, sampling from truncated normal densities via the inverse CDF method is computationally expensive. Thus, in order to sample fast and efficiently from truncated normals, we use an auxiliary variable technique, adaptive uniform rejection sampling (AURS), proposed by Damien and Walker (2001). The idea is to introduce a latent variable and, after obtaining a sample from the marginal density of the latent variable, to generate a sample from the conditional density of the target given the latent variable. We give a brief description of the algorithm:
To sample from a truncated normal density f(x) ∝ exp(−x²/2) I(a < x < b) on a finite interval (a, b), a latent variable u is introduced with joint density f(x, u) ∝ I(u < exp(−x²/2)) I(a < x < b). This yields a monotone decreasing marginal density

f(u) ∝ max{0, min(b, √(−2 log u)) − max(a, −√(−2 log u))}

for 0 ≤ u ≤ 1 and a conditional density f(x|u) that is uniform on the interval (a, b) ∩ (−√(−2 log u), +√(−2 log u)).
Thus, a sample from f(x) can be generated by sampling u from f(u), and then sampling x from f(x|u). To sample
Fig. 4. Construction of the pseudo-envelope function for a non-log-concave density function in the ARMS2 algorithm; thin solid curve: logarithm
of the target density, bold solid curve: logarithm of the proposal density, dashed line: envelope function, and dotted line: squeezing function.
efficiently from f(u), the fact that f(u) is monotone decreasing is utilized to construct an adaptive rejection sampling algorithm. At the ith iteration of AURS, suppose f(uj) has been evaluated at uj for j = 1, ..., i + 1, where 0 < u1 < u2 < ··· < ui < ui+1 = 1. Then an envelope gi(u) is defined by

gi(u) ∝ Σ_{j=1}^{i} f(uj) I(uj < u < uj+1).

A value u∗ is generated from gi(·) and w from the uniform distribution on (0, 1). If

w ≤ f(u∗)/f(uj), where uj ≤ u∗ < uj+1,

then u∗ is accepted as a random variate from f(u); else one proceeds with the (i + 1)th iteration.
Thus, sampling from a truncated normal distribution is reduced to sampling from uniform distributions. We implemented AURS within ARMS2 to sample from truncated normals with an initial i = 4 points and needed a mean of 2.86 AURS iterations per sample in the examples discussed in the next section.
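A minimal sketch of the latent-variable construction (ours, not the authors' implementation): instead of the adaptive refinement of AURS, it dominates the monotone decreasing marginal f(u) with a fixed staircase envelope on a uniform grid, which keeps the code short while still producing exact draws; the grid size and the truncation bounds in the example are illustrative assumptions, and a and b must be finite.

```python
import math
import random

def trunc_std_normal(a, b, n_grid=8):
    """One draw from f(x) ∝ exp(-x^2/2) I(a < x < b) via the Damien-Walker
    latent variable u, joint density f(x, u) ∝ I(u < exp(-x^2/2)) I(a < x < b).
    Simplified, non-adaptive variant of AURS: the decreasing marginal f(u)
    is dominated by a fixed staircase rather than an adaptively refined one."""
    def f_u(u):
        # marginal f(u) ∝ max{0, min(b, r) - max(a, -r)}, r = sqrt(-2 log u)
        if u <= 0.0:
            return b - a                      # limit as u -> 0+
        r = math.sqrt(-2.0 * math.log(u)) if u < 1.0 else 0.0
        return max(0.0, min(b, r) - max(a, -r))

    width = 1.0 / n_grid
    heights = [f_u(j * width) for j in range(n_grid)]  # f decreasing => envelope
    total = sum(heights) * width
    while True:
        t = random.random() * total           # propose u* under the staircase
        for j, h in enumerate(heights):
            if t < h * width:
                u_star = j * width + t / h    # uniform within cell j
                break
            t -= h * width
        else:
            continue
        if random.random() * heights[j] > f_u(u_star):
            continue                          # reject u*, propose again
        r = math.sqrt(-2.0 * math.log(u_star)) if u_star > 0.0 else float("inf")
        lo, hi = max(a, -r), min(b, r)        # x | u* ~ Uniform on (a,b) ∩ (-r,r)
        return lo + random.random() * (hi - lo)

random.seed(3)
draws = [trunc_std_normal(-1.0, 2.0) for _ in range(5000)]
```

Replacing the fixed staircase by an envelope refined at each rejected u∗ recovers the adaptive scheme described above.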
4. Simulation study
In this section we demonstrate the efficiency of the new ARMS2 algorithm in comparison to ARMS when sampling
from four univariate target densities. To give a fair platform for comparisons, we implemented ARMS2 in the C programming language used for ARMS on a 400 MHz Linux workstation. We chose three log-concave densities, namely the Gumbel(0, a), Logistic(0, b) and Normal(10, σ²) distributions, and a further non-log-concave density, the bimodal mixture distribution 0.3 ∗ Normal(5, 0.1²) + 0.7 ∗ Normal(6, σ²). We set the left and right bounds as −100 and 100,
respectively, and following general recommendations in Gilks et al. (1995) for log-concave distributions chose four starting abscissae such that the gradient of L1,2(x, S4) is positive and the gradient of L3,4(x, S4) is negative. We placed these more or less symmetrically around the mode. The number of target function evaluations could be sensitive to the initial choice of Sn, but as observed by Gilks and Wild (1992), widely separated starting abscissae are only modestly detrimental and asymmetry in the starting abscissae has only little impact on the number of function evaluations. We chose x[4] = {−10, −3, 7, 10} as initial abscissae for the Gumbel and logistic distributions, x[4] = {0, 3, 17, 20} for the normal distribution and x[4] = {0, 3, 7, 10} for the mixture distribution in both ARMS and ARMS2.
Considering the construction of the pseudo-envelopes, it is for relatively steep target densities that one would
expect the greatest improvement of ARMS2 over ARMS in approximating the target density. The graphs in the top
row of Fig. 5 compare the construction of the envelope function in ARMS for a flat Normal(10, 10²) and a steep
Fig. 5. Graphs in the top row compare ARMS envelopes for generating a sample from a Normal(10, 10²) and a Normal(10, 0.2²) distribution. The following rows compare the number of additional abscissae in the construction of ARMS and ARMS2 pseudo-envelopes for generating a sample from a Normal(10, 10²) and a Normal(10, 0.2²) distribution. ⊗ denotes the values of the target log-density at the four initial abscissae. ? denotes additional function evaluations required to accept a proposal. The envelopes are shown as bold solid lines.
Normal(10, 0.2²) log-density. Please note the different scales of the y-axes. For the steep normal density, the linear envelope of ARMS gives a very poor approximation. Another illustration is given in the next two rows of Fig. 5, showing the additional abscissae needed to accept a sample from the adaptive proposal density. When the standard deviation is large, i.e. σ = 10, both ARMS and ARMS2 require only one additional point. But when the standard deviation is small, i.e. σ = 0.2, ARMS needs 8 additional abscissae whereas ARMS2 only needs one additional point.
To verify empirically that ARMS2 is more efficient than ARMS particularly for steep target densities, we systematically varied the values of the scale parameters of each of the four distributions from high to low. For the Gumbel(0, a), the Logistic(0, b), and the Normal(10, σ²) distribution, we decreased the scale parameters a, b and the standard deviation σ from 10 to 0.1, i.e. the values were set to 10, 8, 6, 4, 2, 1, 0.8, 0.6, 0.4, 0.2, 0.1. For the mixture of normal distributions, we altered the standard deviation of the second component. With decreasing scale parameters, the precision of each of the four distributions increases and the shape of the concave parts of the log-density changes from flat to steep.
Fig. 6 compares the mean CPU time in seconds (based on 10,000 iterations) to generate one sample using ARMS
and ARMS2 from the four distributions with varying scale parameters. As can be seen from these graphs, ARMS is
slightly faster than ARMS2 only for large scale parameters, i.e. relatively flat log-density functions. However, ARMS2
is much faster than ARMS when the log-density is steep, i.e. for low values of the scale parameters. This is due to
the fact that ARMS2 gives a closer pseudo-envelope even for a steep target, resulting in fewer rejected points in the
Fig. 6. Comparison of the speed of ARMS and ARMS2 (based on 10,000 iterations) for generating one sample from the target distributions with
varying scale parameters.
rejection step and ultimately fewer function evaluations of the target density which immediately translates into higher
computational speed.
These results suggest that it would always be beneficial to rescale the target density first, generate from the
target with increased scale and then transform the draws back to the original scale. After an appropriate rescaling
to a more dispersed density, the use of ARMS would be more efficient than ARMS2. Indeed, in certain situations,
some knowledge about the steepness of the full conditionals might be available from previous experimentation or
subject knowledge and suitable constants for rescaling can be specified. Especially in all applications where the full
conditional density p(θi|θj, j ≠ i) is proportional to a location and scale form f((θi − µ)/σ), where µ and σ are functions of θj, j ≠ i, a repeated sampling using either ARMS or ARMS2 from the same optimally scaled full conditional followed by an appropriate scale and location change for each of the full conditional densities will result in increased efficiency. Such cases occur for likelihood functions of the form L(x; Σ ziθi) for given data x, covariates zi and parameters θi with flat priors for the θi's, e.g. in the proportional hazards model, as discussed in Dellaportas (1995).
However, in general Gibbs sampling applications, this strategy would require an automatic derivation of suitable rescaling factors, which might prove to be an impossible undertaking. Thus, we recommend the use of ARMS2 in general Gibbs sampling applications where there is limited knowledge about the shape of the full conditionals, since ARMS2 is only marginally slower for dispersed but considerably faster for concentrated densities than ARMS.
Fig. 7 shows kernel density estimates based on the samples from the ARMS and ARMS2 algorithms for the four target densities Gumbel(0, 0.4), Logistic(0, 0.4), Normal(10, 0.4²) and 0.3 ∗ Normal(5, 0.1²) + 0.7 ∗ Normal(6, 0.4²).
Fig. 7. True density functions of the Gumbel, logistic, normal, and bimodal mixture distribution overlaid by empirical estimates (kernel density
estimates) using ARMS and ARMS2 based on 10,000 samples.
Table 1
Comparison of the CPU time in seconds and the number of evaluations of π(x) required to generate one sample from the target density using ARMS and ARMS2, based on 10,000 iterations

Target function                          Max. evaluations of π(x)   Mean evaluations of π(x)   Speed (s)
                                         ARMS    ARMS2              ARMS     ARMS2             ARMS    ARMS2
Gumbel(0, 0.4)                           30      10                 21.80    6.29              10.05   4.25
Logistic(0, 0.4)                         11      8                  7.24     6.12              3.18    2.53
Normal(10, 0.4²)                         16      7                  11.54    6.08              3.90    2.40
0.3 ∗ N(5, 0.1²) + 0.7 ∗ N(6, 0.4²)      18      10                 12.17    6.16              3.60    2.85
These are in close agreement with the true densities and thus demonstrate good sampling properties for both techniques. Q–Q plots (not shown here) produced from the samples of ARMS and ARMS2 showed essentially straight lines, confirming that the sample sets from ARMS and ARMS2 stem from the same distribution.
Table 1 compares the computational speed as well as the mean and maximum number (based on 10,000 iterations) of function evaluations of the four target densities Gumbel(0, 0.4), Logistic(0, 0.4), Normal(10, 0.4²) and 0.3 ∗ N(5, 0.1²) + 0.7 ∗ N(6, 0.4²) needed for generating one sample by using ARMS and ARMS2. As expected,
ARMS2 requires a significantly smaller maximum and mean number of function evaluations for all four densities.
Therefore, the time required for generating 10,000 samples is considerably less when using ARMS2 than ARMS.
5. State-space model example
To illustrate the comparative advantages of ARMS2 over ARMS, we chose a case study with the framework of
Bayesian non-linear non-normal state-space models because here the Gibbs sampler requires sampling from a large
number (greater than the length of the time series) of complex full conditional distributions. The state-space approach
is one of the most powerful tools for dynamic modeling and forecasting of time series and longitudinal data. Excellent
overviews are given in Fahrmeir and Tutz (1994), West and Harrison (1997) and Kuensch (2001). The observation
equations of a state-space model specify the conditional distributions of the observations yt at time t as a function of unknown states θt. But unlike a static model, the state of nature, θt, changes over time according to a relationship
prescribed by engineering or scientific principles. This dynamic Markovian transition of the latent states from time
t to t + 1 is given by the state equations. The ability to include knowledge of the system behaviour in the statistical
model is largely what makes state-space modeling so attractive for biologists, economists, engineers, and physicists.
The common Kalman filter (Kalman, 1960) is not applicable for maximum likelihood estimation because it depends
crucially on the linearity of the state-space equations and normal error distributions, assumptions that are limiting and
unrealistic in most applications. For functionally non-linear state-space models, only approximate filters, such as the
extended Kalman filter (Harvey, 1989), are available. For non-linear non-normal state-space models, ML estimation
is complicated by the intractable form of the likelihood function. Similarly, Bayesian posterior computation will
generally require multidimensional integration to find normalization constants as well as marginal summaries. Carlin
et al. (1992) showed how these computational difficulties can be overcome by the Gibbs sampler. For many non-linear
non-normal state-space models, however, the full conditional posterior distributions tend to be complex functions so
that a simple rejection method as proposed by Carlin et al. (1992) is no longer feasible. To this end, the Gibbs sampler
in conjunction with ARMS has been successfully applied for fitting non-linear non-normal state-space models, e.g. in
the context of stochastic volatility models in econometrics (e.g. Meyer and Yu (2000)), chaotic non-linear dynamics
in physics (Meyer and Christensen, 2000) and the delay difference model for fisheries stock assessment (e.g. Meyer
and Millar (1999)). We will use the latter example for illustration. In the following, we give a brief description of the
delay difference model. A more detailed account is given in Meyer and Millar (1999).
Population dynamics models (for a review, see Hilborn and Walters (1992)) in general relate exploitable
biomass in year t + 1 to biomass, growth, recruitment, natural mortality, and catch in the previous year t. A state-
space model relates the observed relative abundance indices {I_t}, e.g., catch per unit effort (CPUE) from commercial
fisheries, to unobserved states, here the biomasses {B_t}, by a stochastic observation model for I_t given B_t. The delay
difference model is used in Meyer and Millar (1999) for the stock assessment of yellowfin tuna in the eastern tropical
Pacific Ocean. The historical data, consisting of catch in millions of pounds and CPUE in pounds per boat-day for
the years 1934–1967, are taken from Pella and Tomlinson (1969) and reproduced in Table 2. Here we reanalyze this
dataset using ARMS2 and compare the results to those obtained using ARMS in Meyer and Millar (1999).
Catches and relative abundance indices in years t = 1,..., N are denoted by C_t and I_t, respectively. The unknown
parameters of interest are the carrying capacity K, recruitment R and catchability q, whereas the growth parameters
ρ and ω are assumed to be known. Assuming that the observed relative abundance indices I_t are proportional to the
total biomass B_t, and defining

$$P_t = \frac{B_t}{K}, \qquad k = \frac{1}{K}, \qquad r = \frac{R}{K}, \qquad Q = qK,$$

the observation equations of the non-linear state-space model are given by
$$I_t = Q P_t + v_t \quad \text{for } t = 1,\ldots,N. \tag{9}$$

The state transitions are governed by the delay difference model that relates the annual biomass to the biomasses in
the two previous years:

$$P_1 = 1 + u_1,$$
$$P_2 = e^{-M}(1 + \rho - \rho e^{-M})(P_1 - kC_1) + r\left[1 - \rho\,\omega\, e^{-M}\,\frac{P_1 - kC_1}{P_1}\right] + u_2,$$
$$P_{t+1} = (1 + \rho)\,e^{-M}(P_t - kC_t) - \rho\, e^{-2M}(P_t - kC_t)\,\frac{P_{t-1} - kC_{t-1}}{P_t} + r\left[1 - \rho\,\omega\, e^{-M}\,\frac{P_t - kC_t}{P_t}\right] + u_{t+1} \tag{10}$$
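The deterministic skeleton of these state transitions (process errors u_t set to zero) can be iterated directly. A minimal sketch; the parameter values below (k, r, ρ, ω, M and the catch series) are illustrative only, not the estimates of Table 3:

```python
import math

def simulate_states(catches, k, r, rho, omega, M):
    """Deterministic skeleton of the delay difference state equations:
    P_1 = 1, then the transitions for P_2 and P_{t+1} with u_t = 0."""
    em = math.exp(-M)
    P = [1.0]                                   # P_1 = 1
    s1 = P[0] - k * catches[0]
    P.append(em * (1.0 + rho - rho * em) * s1
             + r * (1.0 - rho * omega * em * s1 / P[0]))   # P_2
    for t in range(1, len(catches)):            # P_{t+1} for t = 2..N
        st = P[t] - k * catches[t]
        st1 = P[t - 1] - k * catches[t - 1]
        P.append((1.0 + rho) * em * st
                 - rho * em ** 2 * st * st1 / P[t]
                 + r * (1.0 - rho * omega * em * st / P[t]))
    return P                                    # P_1, ..., P_{N+1}

# illustrative values, chosen so that k*C_t is small relative to P_t
catches = [50.0] * 10
P = simulate_states(catches, k=0.001, r=0.2, rho=0.9, omega=0.8, M=0.2)
```

With N catches the recursion produces the N + 1 scaled biomasses P_1, ..., P_{N+1} that appear in the full conditionals of the Appendix.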
Table 2
Catch (millions of pounds) and CPUE (pounds per boat-day) data from Pella and Tomlinson (1969)

Year   Catch   CPUE
1934    60.9   10361
1935    72.3   11484
1936    78.4   11571
1937    91.5   11116
1938    78.3   11463
1939   110.4   10528
1940   114.6   10609
1941    76.8    8018
1942    42.0    7040
1943    50.1    8441
1944    64.9   10019
1945    89.2    9512
1946   129.7    9292
1947   160.2    7857
1948   207.0    8353
1949   200.1    8363
1950   224.8    7057
1951   186.0   10108
1952   195.3    5606
1953   140.0    3852
1954   140.0    5339
1955   140.9    8191
1956   177.0    6507
1957   163.0    6090
1958   148.5    4768
1959   140.5    4982
1960   244.3    6817
1961   230.9    5544
1962   174.1    4120
1963   145.5    4368
1964   203.9    4844
1965   180.1    4166
1966   182.3    4513
1967   178.9    5292
for t = 2,..., N. We assume independent normal errors for {u_t} and {v_t}. Specifically, u_t ∼ N(0, σ²), and the
CPUEs are given an approximately constant coefficient of variation by assuming the v_t to be N(0, w_t τ²) with weights
w_t proportional to the squared fitted values obtained from a non-linear robust smoothing of the CPUE time series by
means of running medians (using the SPLUS function “smooth”). The weights are standardized so that w_N = 1 and
hence v_N ∼ N(0, τ²).
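This weighting scheme can be sketched as follows; a plain running median stands in for the SPLUS function “smooth”, which uses a more elaborate compound smoother, and the short CPUE fragment is only for illustration:

```python
import statistics

def running_median(x, window=3):
    """Simple running median; endpoints use truncated windows."""
    half = window // 2
    return [statistics.median(x[max(0, i - half):i + half + 1])
            for i in range(len(x))]

def cv_weights(cpue):
    """Weights w_t proportional to the squared fitted values,
    standardized so that w_N = 1 (constant coefficient of variation)."""
    fitted = running_median(cpue)
    w = [f ** 2 for f in fitted]
    return [wt / w[-1] for wt in w]

# first few CPUE values from Table 2, purely for illustration
cpue = [10361, 11484, 11571, 11116, 11463, 10528, 10609, 8018]
w = cv_weights(cpue)
```

Because the weights scale with the squared fitted CPUE, the standard deviation of v_t is proportional to the fitted value, which is what "approximately constant coefficient of variation" means here.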
The 39 unobservables in this delay difference model are the five unknown population parameters and the N = 34
unknown relative biomasses (K, Q, r, σ², τ², P_1,..., P_N). We used the same prior distributions as in Meyer and
Millar (1999). The full conditional posterior densities are given in the Appendix. We performed 250,000 cycles of
the Gibbs sampler and thinned the chain by taking every 25th observation. For the remaining 10,000 samples, we
used a burn-in period of 1000, which yielded a final chain of length 9000. The results are summarized in Table 3. We
achieved a considerable reduction in both the maximum and mean number of target function evaluations when using
ARMS2 as compared to ARMS. Furthermore, the computation time was almost halved: whereas the implementation
with ARMS took 124.86 min for 250,000 iterations, that using ARMS2 required only 69.35 min.
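The chain bookkeeping above reduces to simple index arithmetic:

```python
n_iter, thin, burn_in = 250_000, 25, 1_000

# keep every 25th Gibbs cycle, then discard the first 1000 kept draws
kept = range(0, n_iter, thin)
final = list(kept)[burn_in:]

n_kept, n_final = len(kept), len(final)   # 10,000 thinned, 9,000 retained
```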
6. Discussion
In this article, we presented ARMS2, an alternative to the black-box adaptive rejection Metropolis sampling
algorithm ARMS, for sampling from arbitrarily complex univariate distributions. ARMS2 can be used to generate
samples from both log-concave and non-log-concave distributions. ARMS2 uses Lagrange interpolation polynomials of degree
Table 3
Comparison of the parameter estimates (with standard deviations (SD) in parentheses) for the delay difference model, and the maximum and mean
number of evaluations of the joint posterior density π(x) using ARMS and ARMS2a

Model   Mean (SD)          Mean (SD)          Maximum number of      Mean number of
        ARMS               ARMS2              evaluations of π(x)    evaluations of π(x)
                                              ARMS     ARMS2         ARMS     ARMS2
P1      1.021 (0.115)      1.023 (0.114)      19       9             10.53    6.83
P34     0.488 (0.085)      0.490 (0.085)      17       10            10.78    6.78
P35     0.511 (0.142)      0.510 (0.143)      16       9             10.75    6.91
K       1801 (1683)        1804 (1683)        19       10            10.14    6.29
Q       10798 (1493)       10801 (1487)       27       16            13.62    10.33
r       0.203 (0.052)      0.203 (0.051)      28       19            17.62    12.68
σ²      0.011 (0.008)      0.011 (0.008)      14       9             9.03     6.58
τ²      260010 (160032)    260016 (160029)    15       9             9.34     6.65

a Iterations = 9000.
2 rather than linear functions to construct a pseudo-envelope for the target density. It thus achieves a better
approximation of the target by the blanketing density and ultimately requires fewer target function evaluations, which
can be computationally expensive in many applications. A considerable reduction in target function evaluations, and
thereby a pronounced decrease in computation time, is seen for steep log-densities, irrespective of whether the target
density is log-concave or not. For arbitrary target densities, if ARMS2 loses against ARMS it does not lose by much,
while if it wins the gains can be large; this implies that if the shape of the density is not very well understood
(which is often the case in Gibbs sampling applications), ARMS2 is a safer strategy than ARMS. Gibbs
sampling in complex hierarchical Bayesian models, such as the non-linear non-normal state-space model discussed
in Section 5 or the population pharmacokinetic models in Gilks et al. (1995), usually encounters a large number of
algebraically complex and mostly non-log-concave full conditionals that are costly to evaluate. Here, the marked
reduction in target function evaluations yields a substantial speed-up of the implementation.
The C subroutine of the ARMS2 implementation is similar in structure to that of ARMS (Gilks et al., 1995) and is
available upon request from the authors.
Acknowledgements
The authors gratefully acknowledge the support of this research by the Marsden Fund Council from Government
funding, administered by the Royal Society of New Zealand and by NSERC.
Appendix

Full conditional posterior densities for the latent states and parameters in the delay difference model.
In the following, let

$$g(P_t) = \begin{cases}
1, & t = 1,\\[4pt]
e^{-M}(1 + \rho - \rho e^{-M})(P_1 - kC_1) + r\left[1 - \rho\,\omega\, e^{-M}\,\dfrac{P_1 - kC_1}{P_1}\right], & t = 2,\\[4pt]
(1 + \rho)\,e^{-M}(P_{t-1} - kC_{t-1}) - \rho\, e^{-2M}(P_{t-1} - kC_{t-1})\,\dfrac{P_{t-2} - kC_{t-2}}{P_{t-1}} + r\left[1 - \rho\,\omega\, e^{-M}\,\dfrac{P_{t-1} - kC_{t-1}}{P_{t-1}}\right], & t = 3,\ldots,N+1.
\end{cases}$$

Full conditional posterior density of P_t, t = 3,..., N − 2:

$$p(P_t \mid P_1,\ldots,P_{t-1},P_{t+1},\ldots,P_N, k, r, Q, \sigma^2, \tau^2)$$
$$\propto p(P_t \mid P_{t-1}, P_{t-2}, k, r, \sigma^2)\times p(P_{t+1} \mid P_t, P_{t-1}, k, r, \sigma^2)\times p(P_{t+2} \mid P_{t+1}, P_t, k, r, \sigma^2)\times p(I_t \mid P_t, Q, \tau^2)$$
$$\propto \exp\left[-\frac{1}{2\sigma^2}\left\{(P_t - g(P_t))^2 + (P_{t+1} - g(P_{t+1}))^2 + (P_{t+2} - g(P_{t+2}))^2\right\}\right] \times \exp\left[-\frac{1}{2 w_t \tau^2}(I_t - Q P_t)^2\right].$$

Similar expressions are obtained for P_1, P_2, P_{N−1}, P_N, and P_{N+1} by dropping the respective terms.
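For sampling with ARMS or ARMS2, only the log of this full conditional is needed, up to an additive constant. A sketch, with the transition mean g passed in as a callable taking the two lagged states (catches and parameters are assumed folded into g); note that the means for P_{t+1} and P_{t+2} both depend on P_t. The quadratic g below is a toy stand-in, not the delay difference mean:

```python
def log_fc_Pt(Pt, Pm2, Pm1, Pp1, Pp2, g, It, Q, sigma2, tau2, wt):
    """Log full conditional of P_t up to an additive constant:
    three state-transition terms plus one observation term."""
    states = -((Pt - g(Pm2, Pm1)) ** 2
               + (Pp1 - g(Pm1, Pt)) ** 2
               + (Pp2 - g(Pt, Pp1)) ** 2) / (2.0 * sigma2)
    obs = -(It - Q * Pt) ** 2 / (2.0 * wt * tau2)
    return states + obs

# toy stand-in for g(P_{t-2}, P_{t-1}), NOT the delay difference mean
g = lambda p_lag2, p_lag1: 0.5 * p_lag2 + 0.5 * p_lag1

# at a self-consistent point every term vanishes
val = log_fc_Pt(1.0, 1.0, 1.0, 1.0, 1.0, g,
                It=10.0, Q=10.0, sigma2=0.01, tau2=1.0, wt=1.0)
```

Any perturbation of P_t away from such a self-consistent point lowers the value, as expected for a log-density kernel.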
Full conditional posterior density of k:

$$p(k \mid P_1,\ldots,P_N, r, Q, \sigma^2, \tau^2) \propto p(k) \prod_{t=2}^{N} p(P_t \mid P_{t-1}, P_{t-2}, k, r, \sigma^2)$$
$$\propto \begin{cases} \dfrac{1}{k}\exp\left[-\dfrac{1}{2\sigma^2}\sum_{t=2}^{N}(P_t - g(P_t))^2\right], & k > 0,\\[4pt] 0, & \text{otherwise.} \end{cases}$$
Full conditional posterior density of r:

$$p(r \mid P_1,\ldots,P_N, k, Q, \sigma^2, \tau^2) \propto p(r) \prod_{t=2}^{N} p(P_t \mid P_{t-1}, P_{t-2}, k, r, \sigma^2)$$
$$\propto \begin{cases} \dfrac{1}{r}\exp\left[-\dfrac{(\log r - \mu_r)^2}{2\sigma_r^2} - \dfrac{1}{2\sigma^2}\sum_{t=2}^{N}(P_t - g(P_t))^2\right], & r > 0,\\[4pt] 0, & \text{otherwise.} \end{cases}$$
Full conditional posterior density of Q:

$$p(Q \mid P_1,\ldots,P_N, k, r, \sigma^2, \tau^2) \propto p(Q) \prod_{t=1}^{N} p(I_t \mid P_t, Q, \tau^2)$$
$$\propto \begin{cases} \dfrac{1}{Q}\exp\left[-\dfrac{1}{2\tau^2}\sum_{t=1}^{N}\dfrac{(I_t - Q P_t)^2}{w_t}\right], & Q > 0,\\[4pt] 0, & \text{otherwise.} \end{cases}$$
Full conditional posterior density of σ²:
The full conditional distribution for σ² is IG(α, β) with

$$\alpha = \frac{N}{2}, \qquad \beta = \frac{1}{2}\sum_{t=1}^{N}(P_t - g(P_t))^2.$$
Full conditional posterior density of τ²:
The IG(3, 500,000) prior distribution is conjugate, so that the full conditional distribution for τ² is again
IG(α, β) with

$$\alpha = 3 + \frac{N}{2}, \qquad \beta = 500{,}000 + \frac{1}{2}\sum_{t=1}^{N}\frac{(I_t - Q P_t)^2}{w_t}.$$
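Both variance parameters can therefore be updated by direct draws rather than by ARMS2 steps. A sketch using the standard fact that X ∼ IG(α, β) exactly when 1/X ∼ Gamma(α, rate β); Python's `random.gammavariate` takes a scale parameter, hence the 1/β. The residual values below are placeholders, not posterior quantities:

```python
import random

def sample_inverse_gamma(alpha, beta, rng=random):
    """Draw X ~ IG(alpha, beta) via 1/X ~ Gamma(alpha, rate=beta);
    gammavariate's second argument is the scale, i.e. 1/rate."""
    return 1.0 / rng.gammavariate(alpha, 1.0 / beta)

random.seed(42)
N = 34
resid_states = [0.01] * N        # placeholder (P_t - g(P_t))^2 values
resid_obs = [250_000.0] * N      # placeholder (I_t - Q P_t)^2 / w_t values

# full conditional draws with the alpha, beta given above
sigma2 = sample_inverse_gamma(N / 2, 0.5 * sum(resid_states))
tau2 = sample_inverse_gamma(3 + N / 2, 500_000 + 0.5 * sum(resid_obs))
```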
References
Carlin, B.P., Polson, N.G., Stoffer, D.S., 1992. A Monte Carlo approach to nonnormal and nonlinear state-space modeling. Journal of the American
Statistical Association 87, 493–500.
Casella, G., George, E.I., 1992. Explaining the Gibbs sampler. American Statistician 46, 167–174.
Chen, M.-H., Schmeiser, B.W., 1998. Toward black-box sampling: A random-direction interior-point Markov chain approach. Journal of
Computational and Graphical Statistics 7, 1–22.
Damien, P., Wakefield, J.C., Walker, S.G., 1999. Gibbs sampling for Bayesian nonconjugate and hierarchical models by using auxiliary variables.
Journal of the Royal Statistical Society Series B 61, 331–344.
Damien, P., Walker, S.G., 2001. Sampling truncated normal, beta, and gamma densities. Journal of Computational and Graphical Statistics 10 (2),
206–215.
Dellaportas, P., 1995. Random variate transformations in the Gibbs sampler: Issues of efficiency and convergence. Statistics and Computing 5,
133–140.
Dellaportas, P., Smith, A.F.M., 1993. Bayesian inference for generalized linear and proportional hazards models via Gibbs sampling. Applied
Statistics 42, 443–459.
Devroye, L., 1986. Non-uniform Random Variate Generation. Springer-Verlag, New York.
Estep, D., 2002. Practical Analysis in One Variable. Springer, New York.
Fahrmeir, L., Tutz, G., 1994. Multivariate Statistical Modelling Based on Generalized Linear Models. Springer, New York.
Geman, S., Geman, D., 1984. Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images. IEEE Transactions on Pattern
Analysis and Machine Intelligence 6, 721–741.
Gilks, W.R., 1992. Derivative-free adaptive rejection sampling for Gibbs sampling. In: Bernardo, J.M., Berger, J.O., Dawid, A.P., Smith, A.F.M.
(Eds.), Bayesian Statistics, vol. 4. Clarendon, Oxford, pp. 641–649.
Gilks, W.R., Wild, P., 1992. Adaptive rejection sampling for Gibbs sampling. Applied Statistics 41 (2), 337–348.
Gilks, W.R., Best, N.G., Tan, K.K.C., 1995. Adaptive rejection Metropolis sampling within Gibbs sampling. Applied Statistics 44, 455–472.
Harvey, A., 1989. Forecasting, Structural Time Series Models and the Kalman Filter. Cambridge University Press, New York.
Hilborn, R., Walters, C.J., 1992. Quantitative Fisheries Stock Assessment: Choice, Dynamics and Uncertainty. Chapman & Hall, New York.
Jeffreys, H., Jeffreys, B.S., 1988. Lagrange’s interpolation formula. §9.011 in: Methods of Mathematical Physics, third ed. Cambridge University
Press, Cambridge, p. 260.
Kalman, R.E., 1960. A new approach to linear filtering and prediction problems. Journal of Basic Engineering 82, 34–45.
Kuensch, H.R., 2001. State space and hidden Markov models. In: Barndorff-Nielsen, et al. (Eds.), Complex Stochastic Systems. Chapman & Hall,
London, pp. 109–174.
Marsaglia, G., 1963. Generating discrete random variables in a computer. Communications of the ACM 6, 101–102.
Meyer, R., Millar, R.B., 1999. Bayesian stock assessment using a state-space implementation of the delay difference model. Canadian Journal of
Fisheries and Aquatic Sciences 56, 37–52.
Meyer, R., Yu, J., 2000. BUGS for a Bayesian analysis of stochastic volatility models. The Econometrics Journal 3 (2), 198–215.
Meyer, R., Christensen, N.L., 2000. Bayesian reconstruction of chaotic dynamical systems. Physical Review E 62, 3535–3542.
Mira, A., Tierney, L., 2002. Efficiency and convergence properties of slice samplers. Scandinavian Journal of Statistics 29, 1–12.
Neal, R.M., 2003. Slice sampling. The Annals of Statistics 31, 705–767.
Norman, J.E., Cannon, L.E., 1972. A computer program for the generation of random variables from any discrete distribution. Journal of Statistical
Computation and Simulation 1, 331–348.
Pella, J.J., Tomlinson, P.K., 1969. A generalized stock production model. Inter American Tropical Tuna Commission Bulletin 13, 421–496.
Ripley, B.D., 1987. Stochastic Simulation. Wiley, New York.
Spiegelhalter, D.J., Thomas, A., Best, N.G., Gilks, W.R., 1996. BUGS 0.5, Bayesian inference using Gibbs sampling. Manual (Version II). MRC
Biostatistics Unit, Cambridge, UK.
Tierney, L., 1994. Markov chains for exploring posterior distributions. Annals of Statistics 22, 1701–1762.
Waring, E., 1779. Philosophical Transactions 69, 59–67.
Wakefield, J.C., Gelfand, A.E., Smith, A.F.M., 1991. Efficient generation of random variates via the ratio-of-uniform method. Statistics and
Computing 1, 129–133.
Wakefield, J.C., Smith, A.F.M., Racine-Poon, A., Gelfand, A.E., 1994. Bayesian analysis of linear and nonlinear population models using the Gibbs
sampler. Applied Statistics 43, 201–221.
West, M., Harrison, P.J., 1997. Bayesian Forecasting and Dynamic Models, second ed. Springer, New York.