Stat Comput (2015) 25:543–559
DOI 10.1007/s11222-013-9448-7
Shape constrained additive models
Natalya Pya · Simon N. Wood
Received: 6 March 2013 / Accepted: 27 December 2013 / Published online: 25 February 2014
© The Author(s) 2014. This article is published with open access at Springerlink.com
Abstract A framework is presented for generalized additive modelling under shape constraints on the component functions of the linear predictor of the GAM. We represent shape constrained model components by mildly non-linear extensions of P-splines. Models can contain multiple shape constrained and unconstrained terms as well as shape constrained multi-dimensional smooths. The constraints considered are on the sign of the first or/and the second derivatives of the smooth terms. A key advantage of the approach is that it facilitates efficient estimation of smoothing parameters as an integral part of model estimation, via GCV or AIC, and numerically robust algorithms for this are presented. We also derive simulation free approximate Bayesian confidence intervals for the smooth components, which are shown to achieve close to nominal coverage probabilities. Applications are presented using real data examples including the risk of disease in relation to proximity to municipal incinerators and the association between air pollution and health.
Keywords Monotonic smoothing · Convex smoothing · Generalized additive model · P-splines
Electronic supplementary material The online version of this
article (doi:10.1007/s11222-013-9448-7) contains supplementary
material, which is available to authorized users.
N. Pya (B)·S. N. Wood
Mathematical Sciences, University of Bath,
Bath BA2 7AY, UK
e-mail: n.y.pya@bath.ac.uk
S. N. Wood
e-mail: s.wood@bath.ac.uk
1 Introduction
This paper is about estimation and inference with the model

g(μ_i) = A_i θ + Σ_j f_j(z_{ji}) + Σ_k m_k(x_{ki}),   Y_i ~ EF(μ_i, φ),   (1)

where Y_i is a univariate response variable with mean μ_i arising from an exponential family distribution with scale parameter φ (or at least with a mean-variance relationship known to within a scale parameter), g is a known smooth monotonic link function, A is a model matrix, θ is a vector of unknown parameters, f_j is an unknown smooth function of the predictor variable z_j and m_k is an unknown shape constrained smooth function of the predictor variable x_k. The predictors x_k and z_j may be vector valued.
It is the shape constraints on the m_k that differentiate this model from a standard generalized additive model (GAM). In many studies it is natural to assume that the relationship between a response variable and one or more predictors obeys certain shape restrictions. For example, the growth of children over time and dose-response curves in medicine are known to be monotonic. The relationships between daily mortality and air pollution concentration, and between body mass index and incidence of heart disease, are further examples requiring shape restrictions. Unconstrained models might be too flexible and give implausible or un-interpretable results.
Here we develop a general framework for shape con-
strained generalized additive models (SCAM), covering esti-
mation, smoothness selection, interval estimation and also
allowing for model comparison. The aim is to make SCAMs
as routine to use as conventional unconstrained GAMs. To do
this we build on the established framework for generalized
additive modelling covered, for example, in Wood (2006a).
Model smooth terms are represented using spline type penal-
ized basis function expansions; given smoothing parameter
values, model coefficients are estimated by maximum penal-
ized likelihood, achieved by an inner iteratively reweighted
least squares type algorithm; smoothing parameters are esti-
mated by the outer optimization of a GCV or AIC crite-
rion. Interval estimation is achieved by taking a Bayesian
view of the smoothing process, and model comparison can
be achieved using AIC, for example.
This paper supplies the novel components required to
make this basic strategy work, namely
1. We propose shape constrained P-splines (SCOP-splines),
based on a novel mildly non linear extension of the P-
splines of Eilers and Marx (1996), with novel discrete
penalties. These allow a variety of shape constraints for
one and multidimensional smooths. From a computa-
tional viewpoint, they ensure that the penalized likelihood
and the GCV/AIC scores are smooth with respect to the
model coefficients and smoothing parameters, allowing
the development of efficient and stable model estimation
methods.
2. We develop stable computational schemes for estimating
the model coefficients and smoothing parameters, able to
deal with the ill-conditioning that can affect even uncon-
strained GAM fits (Wood 2004,2008), while retaining
computational efficiency. The extra non-linearity induced
by the use of SCOP-splines does not allow the uncon-
strained GAM methods to be re-used or simply modified.
Substantially new algorithms are required instead.
3. We provide simulation free approximate Bayesian confi-
dence intervals for the SCOP-spline model components
in this setting.
The bulk of this paper concentrates on these new develop-
ments, covering standard results on unconstrained GAMs
only tersely. We refer the reader to Wood (2006a) for a more
complete coverage of this background. Technical details and
extensive comparative testing are provided in online supple-
mentary material.
To understand the motivation for our approach, note that
it is not difficult to construct shape constrained spline like
smoothers, by subjecting the spline coefficients to linear
inequality constraints (Ramsay 1988; Wood 1994; Zhang 2004; Kelly and Rice 1990; Meyer 2012). However, this
approach leads to methodological problems in estimating the
smoothing parameters of the spline. The use of linear inequal-
ity constraints makes it difficult to optimize standard smooth-
ness selection criteria, such as AIC and GCV with respect to
multiple smoothing parameters. The difficulty arises because
the derivatives of these criteria change discontinuously as
constraints enter or leave the set of active constraints. This
leads to failure of the derivative based optimization schemes
which are essential for efficient computation when there are
many smoothing parameters to optimize. SCOP-splines cir-
cumvent this problem.
Other procedures based on B-splines were proposed by He
and Shi (1998), Bollaerts et al. (2006), Rousson (2008), Wang
and Meyer (2011). Meyer (2012) presented a cone projection
method for estimating penalized B-splines with monotonic-
ity or convexity constraints and proposed a GCV based test
for checking the shape constrained assumptions. Monotonic
regression within the Bayesian framework has been consid-
ered by Lang and Brezger (2004), Holmes and Heard (2003),
Dunson and Neelon (2003), and Dunson (2005). In spite of
their diversity these existing approaches also lack the ability
to efficiently compute the smoothing parameter in a multiple
smooth context. In addition, to our knowledge except for the
bivariate constrained P-spline introduced by Bollaerts et al.
(2006), multi-dimensional smooths under shape constraints
on either all or a selection of the covariates have not yet been
presented in the literature.
The remainder of the paper is structured as follows. The
next section introduces SCOP-splines. Section 3.1 shows
how SCAMs can be represented for estimation. A penal-
ized likelihood maximization method for SCAM coefficient
estimation is discussed in Sect. 3.2. Section 3.3 investigates
the selection of multiple smoothing parameters. Interval esti-
mation of the component smooth functions of the model is
considered in Sect. 3.4. A simulation study is presented in
Sect. 4, while Sect. 5 demonstrates applications of SCAM to
two epidemiological examples.
2 SCOP-splines
2.1 B-spline background
In the smoothing literature B-splines are a common choice
for the basis functions because of their smooth interpolation
property, flexibility, and local support. The B-splines proper-
ties are thoroughly discussed in De Boor (1978). Eilers and
Marx (1996) combined B-spline basis functions with dis-
crete penalties in the basis coefficients to produce the popular
‘P-spline’ smoothers. Li and Ruppert (2008) established the
corresponding asymptotic theory: the rate of convergence of the penalized spline to a smooth function depends on the order of the difference penalty but not on the degree of the B-spline basis or the number of knots, provided that the number of knots grows with the number of data and the function is twice continuously differentiable. Ruppert
(2002) and Li and Ruppert (2008) showed that the choice
of the basis dimension is not critical but should be above
some minimal level which depends on the spline degree.
Asymptotic properties of P-splines were also studied in
Kauermann et al. (2009) and Claeskens et al. (2009). Here
we propose to build on the P-spline idea to produce SCOP-
splines.
2.2 One-dimensional case
The basic idea is most easily introduced by considering the construction of a monotonically increasing smooth, m, using a B-spline basis. Specifically let

m(x) = Σ_{j=1}^{q} γ_j B_j(x),

where q is the number of basis functions, the B_j are B-spline basis functions of at least second order for representing smooth functions over an interval [a, b], based on equally spaced knots, and the γ_j are the spline coefficients.
It is well known that a sufficient condition for m′(x) ≥ 0 over [a, b] is that γ_j ≥ γ_{j−1} ∀ j (see Supplementary material, S.1, for details). In the case of quadratic splines this condition is necessary. It is easy to see that this condition could be imposed by re-parameterizing, so that

γ = Σ β̃,

where β = (β_1, β_2, ..., β_q)^T and β̃ = (β_1, exp(β_2), ..., exp(β_q))^T, while Σ_ij = 0 if i < j and Σ_ij = 1 if i ≥ j. So if m = [m(x_1), m(x_2), ..., m(x_n)]^T is the vector of m values at the observed points x_i, and X is the matrix such that X_ij = B_j(x_i), then we have

m = X Σ β̃.
2.2.1 Smoothing
In a smoothing context we would also like to have a penalty on m(x) which can be used to control its 'wiggliness'. Eilers and Marx (1996) introduced the notion of directly penalizing differences in the basis coefficients of a B-spline basis, which is used with a relatively large q to avoid underfitting. We can adapt this idea here. For j ≥ 2 the β_j are log differences of the γ_j. We therefore propose penalizing the squared differences between adjacent β_j, starting from β_2, using the penalty ‖Dβ‖², where D is the (q−2)×q matrix that is all zero except that D_{i,i+1} = −D_{i,i+2} = 1 for i = 1, ..., q−2. The penalty is zeroed when all the β_j after β_1 are equal, so that the γ_j form a uniformly increasing sequence and m(x) is an increasing straight line (see Fig. 1). As a result our penalty shares with a second order P-spline penalty the basic feature of 'smoothing towards a straight line', but in a manner that is computationally convenient for constrained smoothing.
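A minimal sketch of the difference matrix D and the resulting penalty (illustrative q):

    ## (q-2) x q first-order difference matrix acting on beta_2,...,beta_q,
    ## giving the SCOP-spline penalty ||D beta||^2.
    q <- 10
    D <- matrix(0, q - 2, q)
    for (i in 1:(q - 2)) {
      D[i, i + 1] <- 1        # D_{i,i+1} =  1
      D[i, i + 2] <- -1       # D_{i,i+2} = -1
    }
    penalty <- function(beta, lambda) lambda * sum((D %*% beta)^2)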
It might be asked whether penalization is necessary at
all, given the restrictions imposed by the shape constraints?
Fig. 1 Illustration of the SCOP-splines for five values of the smoothing parameter: λ_1 = 10^{−4} (long dashed curve), λ_2 = 0.005 (short dashed curve), λ_3 = 0.01 (dotted curve), λ_4 = 0.1 (dot-dashed curve), and λ_5 = 100 (two dashed curve). The true curve is represented as a solid line and dots show the simulated data. Twenty five B-spline basis functions of the third order were used

Fig. 2 Illustration of the SCOP-splines: un-penalized (long dashed curve, λ = 0), penalized (dotted curve, λ = 10^4), and the true curve (solid line). Despite a monotonicity constraint, the un-penalized curve shows spurious detail that the penalty can remove
Figure 2 provides an illustration of what the penalty achieves.
Even with shape constraint, the unpenalized estimated curve
shows a good deal of spurious variation that the penalty
removes.
2.2.2 Identifiability, basis dimension
If we were interested solely in smoothing one-dimensional Gaussian data then β would be chosen to minimize

‖y − XΣβ̃‖² + λ‖Dβ‖²,

where λ is a smoothing parameter controlling the trade-off between smoothness and fidelity to the response data y. Here, we are interested in the basis and penalty in order to be able to embed the shape constrained smooth m(x) in a larger model. This requires an additional constraint on m(x) in order to achieve identifiability, to avoid confounding with the intercept of the model in which it is embedded. A convenient way to do this is to use centering constraints on the model matrix columns, i.e. the sum of the values of the smooth is set to zero, Σ_{i=1}^{n} m(x_i) = 0, or equivalently 1^T XΣβ̃ = 0.
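As a minimal illustration of this penalized least squares problem (a sketch only, using a generic optimiser for a fixed λ rather than the Newton scheme of Sect. 3.2; the test function, basis dimension and λ are arbitrary choices):

    ## Penalized LS fit of a single monotone increasing SCOP-spline
    ## to simulated Gaussian data, for a fixed smoothing parameter.
    library(splines)
    set.seed(1)

    n <- 200; q <- 15
    x <- sort(runif(n)); y <- 3 * pnorm(6 * (x - 0.5)) + rnorm(n, sd = 0.3)

    dk <- 1 / (q - 3)
    knots <- seq(-3 * dk, 1 + 3 * dk, by = dk)
    X <- splineDesign(knots, x, ord = 4)

    Sigma <- matrix(0, q, q); Sigma[lower.tri(Sigma, diag = TRUE)] <- 1
    D <- matrix(0, q - 2, q)
    for (i in 1:(q - 2)) { D[i, i + 1] <- 1; D[i, i + 2] <- -1 }

    obj <- function(beta, lambda) {                   # penalized objective
      beta_tilde <- c(beta[1], exp(beta[-1]))
      sum((y - X %*% Sigma %*% beta_tilde)^2) + lambda * sum((D %*% beta)^2)
    }
    fit <- optim(rep(0, q), obj, lambda = 0.1, method = "BFGS")
    m_hat <- X %*% Sigma %*% c(fit$par[1], exp(fit$par[-1]))   # fitted monotone curve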
As with any penalized regression spline approach, the choice of the basis dimension, q, is not crucial but should be generous enough to avoid oversmoothing/underfitting (Ruppert 2002; Li and Ruppert 2008). Ruppert (2002) suggested algorithms for basis dimension selection by minimizing GCV over a set of specified values of q, while Kauermann and Opsomer (2011) proposed an equivalent likelihood based scheme.
This simple monotonically increasing smooth can be
extended to a variety of monotonic functions, including
decreasing, convex/concave, increasing/decreasing and con-
cave, increasing/ decreasing and convex, the difference
between alternative shape constraints being the form of the
matrices Σ and D. Table 1 details eight possibilities, while
Supplementary material, S.2, provides the corresponding
derivations.
2.3 Multi-dimensional SCOP-splines
Using the concept of tensor product spline bases it is pos-
sible to build up smooths of multiple covariates under
the monotonicity constraint, where monotonicity may be
assumed on either all or a selection of the covariates. In
this section the construction of a multivariable smooth,
m(x1,x2,...,xp), with multiple monotonically increasing
constraints along all covariates is first considered, followed
by a discussion of single monotonicity along a single direc-
tion.
2.3.1 Tensor product basis
Consider p B-spline bases of dimensions q_1, q_2, ..., and q_p for representing marginal smooth functions, each of a single covariate,

f_1(x_1) = Σ_{k_1=1}^{q_1} α^1_{k_1} B_{k_1}(x_1),   f_2(x_2) = Σ_{k_2=1}^{q_2} α^2_{k_2} B_{k_2}(x_2),   ...,   f_p(x_p) = Σ_{k_p=1}^{q_p} α^p_{k_p} B_{k_p}(x_p),
Table 1 Univariate shape constrained smooths

Monotone increasing:
  Σ_ij = 0 if i < j; 1 if i ≥ j.
  D: D_{i,i+1} = −D_{i,i+2} = 1, i = 1, ..., q−2; D_ij = 0 otherwise.

Monotone decreasing:
  Σ_ij = 0 if i < j; 1 if j = 1, i ≥ 1; −1 if j ≥ 2, i ≥ j.
  D: as for monotone increasing.

Convex:
  Σ_ij = 0 if i < j; 1 if j = 1, i ≥ 1; −(i−1) if j = 2, i ≥ j; i−j+1 if j ≥ 3, i ≥ j.
  D: D_{i,i+2} = −D_{i,i+3} = 1, i = 1, ..., q−3; D_ij = 0 otherwise.

Concave:
  Σ_ij = 0 if i < j; 1 if j = 1, i ≥ 1; i−1 if j = 2, i ≥ j; −(i−j+1) if j ≥ 3, i ≥ j.
  D: as for convex.

Increasing and convex:
  Σ_ij = 0 if i < j; 1 if j = 1, i ≥ 1; i−j+1 if j ≥ 2, i ≥ j.
  D: as for convex.

Increasing and concave:
  Σ_ij = 0 if i = 1, j ≥ 2; 1 if j = 1, i ≥ 1; i−1 if i ≥ 2, j = 2, ..., q−i+2; q−j+1 if i ≥ 2, j = q−i+3, ..., q.
  D: as for convex.

Decreasing and convex:
  Σ_ij = 0 if i = 1, j ≥ 2; 1 if j = 1, i ≥ 1; −(i−1) if i ≥ 2, j = 2, ..., q−i+2; −(q−j+1) if i ≥ 2, j = q−i+3, ..., q.
  D: as for convex.

Decreasing and concave:
  Σ_ij = 0 if i < j; 1 if j = 1, i ≥ 1; −(i−j+1) if j ≥ 2, i ≥ j.
  D: as for convex.
where B_{k_j}(x_j), j = 1, ..., p, are B-spline basis functions, and α^j_{k_j} are spline coefficients. Then, following Wood (2006a), the multivariate smooth can be represented by expressing the spline coefficients of each marginal smooth as the B-spline of the following covariate, starting from the first marginal smooth. Denoting B_{k_1...k_p}(x_1, ..., x_p) = B_{k_1}(x_1) · ... · B_{k_p}(x_p), the smooth of p covariates may be written as

m(x_1, ..., x_p) = Σ_{k_1=1}^{q_1} ... Σ_{k_p=1}^{q_p} B_{k_1...k_p}(x_1, ..., x_p) γ_{k_1...k_p},

where the γ_{k_1...k_p} are unknown coefficients. So if X is the matrix such that its ith row is X_i = X_{1i} ⊗ X_{2i} ⊗ ··· ⊗ X_{pi}, where ⊗ denotes a Kronecker product, and γ = (γ_{11...1}, ..., γ_{k_1 k_2...k_p}, ..., γ_{q_1 q_2...q_p})^T, then

m = Xγ.
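A minimal sketch of the row-wise Kronecker construction of X (an illustration with two covariates; the marginal model matrices are assumed to have been built as in Sect. 2.2):

    ## Row i of the tensor product model matrix is the Kronecker product
    ## of row i of the marginal B-spline model matrices X1 (n x q1) and X2 (n x q2).
    row_kronecker <- function(X1, X2) {
      n <- nrow(X1)
      t(sapply(seq_len(n), function(i) kronecker(X1[i, ], X2[i, ])))
    }
    ## e.g. with small illustrative matrices:
    X1 <- matrix(runif(6), 3, 2); X2 <- matrix(runif(9), 3, 3)
    Xt <- row_kronecker(X1, X2)    # 3 x 6 tensor product model matrix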
2.3.2 Constraints
By extending the univariate case one can see that a sufficient condition for ∂f(x_1, ..., x_p)/∂x_j ≥ 0 is γ_{k_1...k_j...k_p} ≥ γ_{k_1...(k_j−1)...k_p}. To impose these conditions the re-parametrization γ = Σβ̃ is proposed, where

β̃ = (β_{11...1}, exp(β_{11...2}), ..., exp(β_{k_1...k_p}), ..., exp(β_{q_1...q_p}))^T,

and Σ = Σ_1 ⊗ Σ_2 ⊗ ··· ⊗ Σ_p. The elements of the Σ_j are the same as for the univariate monotonically increasing smooth (see Table 1). For the multiple monotonically decreasing multivariate function Σ = [1 : Σ*_{(,−1)}], where Σ* = −Σ_1 ⊗ Σ_2 ⊗ ··· ⊗ Σ_p; that is, Σ is the matrix Σ* with its first column replaced by a column of ones.

To satisfy the conditions for a monotonically increasing or decreasing smooth with respect to only one covariate the following re-parameterizations are suggested:

1. For the single monotonically increasing constraint along the x_j direction: let Σ_j be defined as previously while I_s is an identity matrix of size q_s, s ≠ j; then

Σ = I_1 ⊗ ··· ⊗ Σ_j ⊗ ··· ⊗ I_p,

and γ = Σβ̃, where β̃ is a vector containing a mixture of un-exponentiated and exponentiated coefficients, with β̃_{k_1...k_j...k_p} = exp(β_{k_1...k_j...k_p}) when k_j ≠ 1.

2. For the single monotonically decreasing constraint along the x_j direction: the re-parametrization is the same as above, except that the matrix Σ_j is as for a univariate smooth with monotonically decreasing constraint (see Table 1).

By analogy it is not difficult to construct tensor products with monotonicity constraints along any number of covariates.
2.3.3 Penalties
For controlling the level of smoothing, the penalty introduced in Sect. 2 can be extended. For multiple monotonicity the penalties may be written as

P = λ_1 β^T S_1 β + λ_2 β^T S_2 β + ··· + λ_p β^T S_p β,

where S_j = D_j^T D_j and D_j = I_1 ⊗ I_2 ⊗ ··· ⊗ D_{mj} ⊗ ··· ⊗ I_p. D_{mj} is as D in Table 1 for a monotone smooth. Penalties for single monotonicity along x_j are

P = λ_1 β^T S̃_1 β + ··· + λ_j β^T S_j β + ··· + λ_p β^T S̃_p β,

where S_j is defined as above. The penalty matrices S̃_i, i ≠ j, in the unconstrained directions can be constructed using the marginal penalty approach described in Wood (2006a). The degree of smoothness in the unconstrained directions can be controlled by second-order difference penalties applied to the non-exponentiated coefficients, and by first-order difference penalties for the exponentiated coefficients. As in the univariate case, these penalties keep the parameter estimates close to each other, resulting in similar increments in the coefficients of the marginal smooths. When λ_j → ∞ such penalization results in straight lines for the marginal curves.
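A minimal sketch of these penalty matrices for a bivariate smooth with double monotonicity (an illustration; the marginal basis dimensions are arbitrary):

    ## S_j = D_j' D_j with D_1 = D_m x I_{q2} and D_2 = I_{q1} x D_m,
    ## where "x" denotes the Kronecker product and D_m is the monotone-smooth
    ## difference matrix of Table 1.
    make_Dm <- function(q) {
      D <- matrix(0, q - 2, q)
      for (i in 1:(q - 2)) { D[i, i + 1] <- 1; D[i, i + 2] <- -1 }
      D
    }
    q1 <- 6; q2 <- 6
    D1 <- kronecker(make_Dm(q1), diag(q2))
    D2 <- kronecker(diag(q1), make_Dm(q2))
    S1 <- crossprod(D1); S2 <- crossprod(D2)   # penalty: lambda1 b'S1 b + lambda2 b'S2 b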
3 SCAM
3.1 SCAM representation
To represent (1) for computation we now choose basis expansions, penalties and identifiability constraints for all the unconstrained f_j, as described in detail in Wood (2006a), for example. This allows Σ_j f_j(z_{ji}) to be replaced by F_i γ, where F is a model matrix determined by the basis functions and the constraints, and γ is a vector of coefficients to be estimated. The penalties on the f_j are quadratic in γ.

Each shape constrained term m_k is represented by a model matrix of the form XΣ and a corresponding coefficient vector. Identifiability constraints are absorbed by the column centering constraints. The model matrices for all the m_k are then combined so that we can write

Σ_k m_k(x_{ki}) = M_i β̃,

where M is a model matrix and β̃ is a vector containing a mixture of model coefficients (β_i) and exponentiated model coefficients (exp(β_i)). The penalties in this case are quadratic in the coefficients β (not in the β̃).

So (1) becomes

g(μ_i) = A_i θ + F_i γ + M_i β̃,   Y_i ~ EF(μ_i, φ).

For fitting purposes we may as well combine the model matrices column-wise into one model matrix X, and write the model as

g(μ_i) = X_i β̃,   (2)

where β̃ has been enlarged to now contain θ, γ and the original β̃. Similarly there is a corresponding expanded model coefficient vector β containing θ, γ and the original β. The penalties on the terms have the general form β^T S_λ β where S_λ = Σ_k λ_k S_k, and the S_k are the original penalty matrices expanded with zeros everywhere except for the elements which correspond to the coefficients of the kth smooth.
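A minimal sketch of how S_λ can be assembled from per-term penalties padded with zeros (the term sizes and positions in the final comment are hypothetical):

    ## Embed a per-term penalty matrix S into a zero matrix of the full
    ## coefficient dimension, at the positions of that term's coefficients.
    embed_penalty <- function(S, first, p_total) {
      Sk <- matrix(0, p_total, p_total)
      idx <- first:(first + nrow(S) - 1)
      Sk[idx, idx] <- S
      Sk
    }
    ## e.g. two smooths of 10 coefficients each after 3 parametric terms:
    ## S_lambda <- lambda1 * embed_penalty(S1, 4, 23) + lambda2 * embed_penalty(S2, 14, 23)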
3.2 SCAM coefficient estimation
Now consider the estimation of β given values for the smoothing parameters λ. The exponential family chosen determines the form of the log likelihood l(β) of the model, and to control the degree of model smoothness we seek to maximize its penalized version

l_p(β) = l(β) − β^T S_λ β/2.

However, the non-linear dependence of Xβ̃ on β makes this more difficult than in the case of unconstrained GAMs. In particular we found that optimization via Fisher scoring caused convergence problems for some models, and we therefore use a full Newton approach. The special structure of the model means that it is possible to work entirely in terms of a matrix square root of the Hessian of l when applying Newton's method, thereby improving the numerical stability of computations, so we also adopt this refinement.
since SCAM is very much within GAM theory, the same
convergence issues might arise as in the case of GAM/GLM
fitting (Wood 2006a). In particular, the likelihood might not
be uni-modal and the process may converge to different esti-
mates depending on the starting values of the fitting process.
However, if the initial values are reasonably selected then it
is unlikely that there will be major convergence issues. The
following algorithm suggests such initial values.
Let V(μ) be the variance function for the model's exponential family distribution, and define

α(μ_i) = 1 + (y_i − μ_i){V′(μ_i)/V(μ_i) + g″(μ_i)/g′(μ_i)}.
Penalized likelihood maximization is then achieved as follows:

1. To obtain an initial estimate of β, minimize ‖g(y) − Xβ̃‖² + β̃^T S_λ β̃ w.r.t. β̃, subject to linear inequality constraints ensuring that β̃_j > 0 whenever β̃_j = exp(β_j). This is a standard quadratic programming (QP) problem. (If necessary y is adjusted slightly to avoid infinite g(y).)
2. Set k = 0 and repeat steps 3–11 to convergence.
3. Evaluate z_i = (y_i − μ_i) g′(μ_i)/α(μ_i) and w_i = ω_i α(μ_i)/{V(μ_i) g′²(μ_i)}, using the current estimate of μ_i.
4. Evaluate the vectors w̃ = |w| and z̃, where z̃_i = sign(w_i) z_i.
5. Evaluate the diagonal matrix C such that C_jj = 1 if β̃_j = β_j, and C_jj = exp(β_j) otherwise.
6. Evaluate the diagonal matrix E such that E_jj = 0 if β̃_j = β_j, and E_jj = Σ_{i=1}^{n} w_i g′(μ_i)[XC]_{ij}(y_i − μ_i)/α(μ_i) otherwise.
7. Let I⁻ be the diagonal matrix such that I⁻_ii = 1 if w_i < 0 and I⁻_ii = 0 otherwise.
8. Letting W̃ denote diag(w̃), form the QR decomposition

   [√W̃ X C; B] = QR,

   where B is any matrix square root such that B^T B = S_λ.
9. Letting Q_1 denote the first n rows of Q, form the symmetric eigen-decomposition

   2 Q_1^T I⁻ Q_1 + R^{−T} E R^{−1} = U Λ U^T.

10. Hence define P = R^{−1} U(I − Λ)^{−1/2} and K = Q_1 U(I − Λ)^{−1/2}.
11. Update the estimate of β as β^[k+1] = β^[k] + P K^T √W̃ z̃ − P P^T S_λ β^[k], and increment k.
The algorithm is derived in Appendix 1 which shows sev-
eral similarities to a standard penalized IRLS scheme for
penalized GLM estimation. However, the more complicated
structure results from the need to use full Newton, rather than
Fisher scoring, while at the same time avoiding computation
of the full Hessian, which would approximately square the
condition number of the update computations.
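A schematic R sketch of a single update (steps 8–11) is given below; it is an illustration rather than the scam package implementation, assumes the step 3–7 quantities have already been formed, follows the Appendix 1 derivation (hence the factor of 2 and the square-root weights), and adds a tiny ridge to S_λ only so that a matrix square root can be formed in this sketch.

    ## One schematic update: X is the n x q model matrix, Cdiag the diagonal
    ## of C, w the (possibly negative) Newton weights, zt = sign(w) * z,
    ## Ediag the diagonal of E, Slam the total penalty matrix.
    scop_newton_step <- function(X, Cdiag, w, zt, Ediag, Slam, beta) {
      n  <- nrow(X); q <- ncol(X)
      XC <- X * rep(Cdiag, each = n)            # X %*% diag(Cdiag)
      wt <- abs(w)                              # w~ = |w|
      B  <- chol(Slam + 1e-8 * diag(q))         # a square root with B'B ~ S_lambda
      qrd <- qr(rbind(sqrt(wt) * XC, B))        # step 8: QR of the augmented matrix
      R  <- qr.R(qrd); Q1 <- qr.Q(qrd)[1:n, , drop = FALSE]
      Ri <- solve(R)                            # pseudo-inverse would be used if rank deficient
      M  <- 2 * crossprod(Q1, (w < 0) * Q1) + t(Ri) %*% (Ediag * Ri)
      eg <- eigen((M + t(M)) / 2, symmetric = TRUE)          # step 9
      ## if any eigenvalue >= 1 a Fisher step would be substituted (see text)
      half <- eg$vectors %*% diag(1 / sqrt(1 - eg$values))   # (I - Lambda)^{-1/2}
      P  <- Ri %*% half; K <- Q1 %*% half                    # step 10
      drop(beta + P %*% crossprod(K, sqrt(wt) * zt) -
             P %*% crossprod(P, Slam %*% beta))              # step 11
    }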
Two refinements of the basic iteration may be required.
1. If the Hessian of the log likelihood is indefinite then
step 10 will fail, because some Λii will exceed 1. In this
case a Fisher update step must be substituted, by setting
α(μi)=1.
2. There is considerable scope for identifiability issues to hamper computation. In common with unconstrained GAMs, flexible SCAMs with highly correlated covariates can display co-linearity problems between model coefficients, which require careful handling in order to ensure numerical stability of the estimation algorithms. An additional issue is that the non-linear constraints mean that parameters can be poorly identified on flat sections of a fitted curve, where β_j is simply 'very negative', but the data contain no information on how negative. So steps must be taken to deal with unidentifiable parameters. One approach is to work directly with the QR decomposition to calculate which coefficients are unidentifiable at each iteration and to drop these, but a simpler strategy substitutes a singular value decomposition for the R factor at step 8 if it is rank deficient, so that

R = U D V^T.

Then we set Q = QU, R = DV^T, and Q_1 is the first n rows of Q, and everything proceeds as before, except for the inversion of R. We now substitute the pseudoinverse R⁻ = V D⁻, where the diagonal matrix D⁻ is such that D⁻_jj = 0 if the singular value D_jj is 'too small', but otherwise D⁻_jj = 1/D_jj. 'Too small' is judged relative to the largest singular value D_11 multiplied by some power (in the range 0.5 to 1) of the machine precision. If all parameters are numerically identifiable then the pseudoinverse is just the inverse.
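A minimal sketch of this truncated pseudoinverse (an illustration only; the tolerance power is an assumption within the stated 0.5–1 range):

    ## Truncated pseudoinverse of a rank-deficient R factor via SVD.
    pseudo_inv_R <- function(R, tol_power = 0.7) {
      sv    <- svd(R)
      tol   <- sv$d[1] * .Machine$double.eps^tol_power
      d_inv <- ifelse(sv$d > tol, 1 / sv$d, 0)
      list(Q_update = sv$u,                              # Q is post-multiplied by U
           R_new    = diag(sv$d) %*% t(sv$v),            # new R = D V'
           R_pinv   = sv$v %*% diag(d_inv))              # pseudoinverse R^- = V D^-
    }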
3.3 SCAM smoothing parameter estimation
We propose to estimate the smoothing parameter vector λ by optimizing a prediction error criterion such as AIC (Akaike 1973) or GCV (Craven and Wahba 1979). The model deviance is defined in the standard way as

D(β̂) = 2{l_max − l(β̂)}φ,

where l_max is the saturated log likelihood. When the scale parameter is known we find the λ which minimizes V_u = D(β̂) + 2φγτ, where τ is the effective degrees of freedom (edf) of the model and γ is a parameter that in most cases has the value 1, but is sometimes increased above 1 to obtain smoother models [see Kim and Gu (2004)]. When the scale parameter is unknown we find the λ minimizing the GCV score,

V_g = n D(β̂)/(n − γτ)².

For both criteria the dependence on λ is via the dependence of τ and β̂ on λ [see Hastie and Tibshirani (1990) and Wood (2008) for further details].
The edf can be found, following Meyer and Woodroofe (2000), as

τ = Σ_{i=1}^{n} ∂μ̂_i/∂y_i = tr(K K^T L⁺),   (3)

where L⁺ is the diagonal matrix such that

L⁺_ii = α(μ_i)^{−1} if w_i ≥ 0, and L⁺_ii = −α(μ_i)^{−1} otherwise.

Details are provided in Appendix 2.
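Since the trace in (3) only involves the diagonal of L⁺ and the squared row norms of K, it can be evaluated cheaply; a one-line sketch (assuming K and the diagonal of L⁺ are available from a fit):

    ## tr(K K' L+) = sum_i L+_{ii} * ||row_i(K)||^2
    edf <- function(K, lplus) sum(lplus * rowSums(K^2))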
Optimization of V w.r.t. ρ = log(λ) can be achieved by a quasi-Newton method. Each trial ρ vector will require a Sect. 3.2 iteration to find the corresponding β̂ so that the criterion can be evaluated. In addition the first derivative vector of V w.r.t. ρ will be required, which in turn requires ∂β̂/∂ρ and ∂τ/∂ρ.

As demonstrated in Supplementary material, S.3, implicit differentiation can be used to obtain

∂β̂/∂ρ_k = −λ_k P P^T S_k β̂.

The derivatives of D and τ then follow, as S.4 (Supplementary material) shows in tedious detail.
3.4 Interval estimation
Having obtained estimates β̂ and λ̂, we have point estimates for the component smooth functions of the model, but it is usually desirable to obtain interval estimates for these functions as well. To facilitate the computation of such intervals we seek distributional results for the β̃, i.e. for the coefficients on which the estimated functions depend linearly.

Here we adopt the Bayesian approach to interval estimation pioneered in Wahba (1983), but following Silverman's (1985) formulation. Such intervals are appealing following Nychka's (1988) analysis showing that they have good frequentist properties, by virtue of accounting for both sampling variability and smoothing bias. Specifically, we view the smoothness penalty as equivalent to an improper prior distribution on the model coefficients,

β ~ N(0, S_λ⁻/(2φ)),

where S_λ⁻ is the Moore–Penrose pseudoinverse of S_λ = Σ_k λ_k S_k. In conjunction with the model likelihood, Bayes theorem then leads to the approximate result

β̃ | y ~ N(β̃̂, V_β̃),   (4)

where V_β̃ = C(C^T X^T W X C + S_λ)^{−1} C φ, and W is the diagonal matrix of the w_i calculated with α(μ_i) = 1. Supplementary material, S.5, derives this result. The deviance or Pearson statistic divided by the effective residual degrees of freedom provides an estimate of φ, if required. To use the result we condition on the smoothing parameter estimates: the intervals display surprisingly good coverage properties despite this (Marra and Wood 2012 provide a theoretical analysis which partly explains this).
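A minimal sketch of how (4) yields pointwise intervals for a single SCOP-spline term (the inputs XSig = XΣ, the estimated coefficients and their posterior covariance are assumed to come from a fitted model; this is an illustration, not the scam package code):

    ## Pointwise Bayesian intervals for fitted values m = XSig %*% bt_hat,
    ## with bt_hat the posterior mean of beta~ and Vb its covariance.
    ci_for_smooth <- function(XSig, bt_hat, Vb, level = 0.95) {
      fit <- drop(XSig %*% bt_hat)
      se  <- sqrt(rowSums((XSig %*% Vb) * XSig))   # diag(XSig Vb XSig')
      z   <- qnorm(1 - (1 - level) / 2)
      cbind(lower = fit - z * se, fit = fit, upper = fit + z * se)
    }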
Fig. 3 Shape of the functions (m1 and f2) used for the simulation study
4 Simulated examples
4.1 Simulations: comparison with alternative methods
In this section a performance comparison with unconstrained GAM and with the QP approach to shape preserving smoothing (Wood 1994) is illustrated on a simulated example of an additive model with a mixture of monotone and unconstrained smooth terms. All simulation studies and data applications were performed using the R packages scam, which implements the proposed SCAM approach, and mgcv for the GAM and QP implementations. A more extensive simulation study is given in Supplementary material, S.6. In particular, the first subsection of S.6 gives a comparative study with the constrained P-spline regression of Bollaerts et al. (2006), the monotone piecewise quadratic splines of Meyer (2008), and the shape-restricted penalized B-splines of Meyer (2012), on a simulated example of univariate single smooth term models. Since there was no mean square error advantage of these approaches over SCAM for the univariate model, since the direct grid search for multiple optimal smoothing parameters is computationally expensive, and since, to the authors' knowledge, R routines implementing these methods are not freely available, the comparisons for the multivariate and additive examples were performed only with the unconstrained GAM and the QP approach.
The following additive model is considered:

g(μ_i) = m_1(x_{1i}) + f_2(x_{2i}),   E(Y_i) = μ_i,   (5)

where Y_i follows a N(μ_i, σ²) or Poi(μ_i) distribution. Figure 3 illustrates the graphs of the true functions used for this study. Their analytical expressions are given in Supplementary material, S.4.

The covariate values, x_{1i} and x_{2i}, were simulated from the uniform distribution on [−1, 3] and [−3, 3] respectively. For the Gaussian data the values of σ were 0.05, 0.1, 0.2, which gave signal to noise ratios of about 0.97, 0.88, and 0.65. For the Poisson model the noise level was controlled by multiplying g(μ_i) by d, taking values 0.5, 0.7, 1.2, which resulted in signal to noise ratios of about 0.58, 0.84, and 0.99. For the SCAM implementation a cubic SCOP-spline of dimension 30 was used to represent the first monotonic smooth term and a cubic P-spline with q = 15 for the second unconstrained term. For the unconstrained GAM, P-splines with the same basis dimensions were used for both model components. The models were fitted by penalized likelihood maximization with the smoothing parameters selected using V_g in the Gaussian case and V_u in the Poisson case.
For implementing the QP approach to monotonicity preserving constraint, we approximated the necessary and sufficient condition f′(x) ≥ 0 via the standard technique (Villalobos and Wahba 1987) of using a fine grid of linear constraints (f′(x*_i) ≥ 0, i = 1, ..., n), where the x*_i are spread evenly through the range of x (strictly such constraints are necessary, but only sufficient as n → ∞, although in practice we observed no violations of monotonicity). Cubic regression spline bases were used here together with the integrated squared second order derivative of the smooth as the penalty. The model fit is obtained by setting up the QP problem within a penalized IRLS loop, given λ chosen via GCV/UBRE from the unconstrained model fit. Cubic regression splines tend to have slightly better MSE performance than P-splines (Wood 2006a) and, moreover, the conditions built on finite differences are not only sufficient but also necessary for monotonicity. So this is a challenging test for SCAM. Three hundred replicates were produced for the Gaussian and Poisson distributions at each of three levels of noise and for two sample sizes, 100 and 200, for the three alternative approaches.
The simulation results for the Gaussian data are illustrated
in Fig. 4. The results show that SCAM works better than the
other two alternative methods in the sense of MSE perfor-
mance. Note that the performance of GAM was better than
the performance of the QP approach in this case, but the
difference in MSE between SCAM and GAM is much less
than that in the one-dimensional simulation studies shown in
Supplementary material, S.6. Also it is noticeable that GAM
reconstructed the truth better than the QP method. The explanation may be due to there being only one monotonic term, and both GAM and SCAM gave similar fits for the unconstrained term, f2. At lower noise levels GAM might also be able to reconstruct the monotone shape of m1 for some replicates. The results also suggest that SCAM works better than GAM at greater levels of noise, which seems natural since at lower noise levels the shapes of constrained terms can be captured by the unconstrained GAM.
The reduction in performance of the QP approach compared to GAM was due to the smoothing parameter estimation from the unconstrained fit, which sometimes resulted in less smooth tails of the smooth term than those of the unconstrained GAM. For the Poisson data with sample size n = 100 all three methods worked similarly, but with an increase in sample size SCAM outperformed the other two approaches (plots are not shown). As in the Gaussian case the unconstrained GAM worked better than QP.

Fig. 4 MSE comparisons between SCAM (mg), GAM (g), and quadratic programming (qp) approaches for the Gaussian distribution for each of three noise levels. The upper panel illustrates the results for n = 200, the lower for n = 100. Boxplots show the distributions of differences in relative MSE between each alternative method and SCAM; 300 replicates were used. Relative MSE was calculated by dividing the MSE value by the average MSE of SCAM for the given case
The simulation studies show that SCAM may have practical advantages over the alternative methods considered. It is computationally slower than the GAM and QP approaches; however, GAM obviously cannot impose monotonicity, and the selection of the smoothing parameter for SCAM is well founded, in contrast to the ad hoc method used with QP of choosing λ from an unconstrained fit and then refitting subject to constraint. Finally, the practical MSE performance of SCAM seems to be better than that of the alternatives considered here.
4.2 Coverage probabilities
The proposed Bayesian approach to confidence interval construction makes a number of key assumptions: (i) it uses a linear approximation for the exponentiated parameters, and in the case of non-Gaussian models adopts large sample inference; (ii) the smoothing parameters are treated as fixed. The simulation example of the previous subsection is used in order to examine how these restrictions affect the performance of the confidence intervals. The realized coverage probabilities are taken as a measure of their performance. Supplementary material, S.7, presents two further examples for a more thorough assessment of confidence interval performance.
The simulation study of confidence interval performance is conducted in a manner analogous to Wood (2006b). Samples of sizes n = 200 and 500 were generated from (5) for the Gaussian and Poisson distributions. 500 replicates were produced for both distributions at each of three levels of noise and for the two sample sizes. For each replicate the realized coverage proportions were calculated as the proportions of the values of the true functions (at each of the covariate values) falling within the constructed confidence interval. Three confidence levels were considered: 90, 95, and 99 %. An overall mean coverage probability and its standard error were obtained from the 500 'across-the-function' coverage proportions. The results of the study are presented in Fig. 5 for the Gaussian and Poisson models. The realized coverage probabilities are near the corresponding nominal values, and the larger sample size reduces the standard errors as expected.
The results for the Poisson models are quite good, with an exception for the first monotone smooth, m_1(x_1), at the low signal strength, which may be explained by the fact that the optimal fit inclines toward a straight line model (Marra and Wood 2012).

Fig. 5 Realized coverage probabilities for confidence intervals from the SCAM simulation study of the first example, for normal and Poisson data for n = 200 (top panel) and n = 500 (bottom panel). Three noise levels are used for each smooth term and for the overall model ('all'). The nominal coverage probabilities of 0.90, 0.95, and 0.99 are shown as horizontal dashed lines. Dots indicate the average realized coverage probabilities over 500 replicate data sets. Vertical lines show twice standard error intervals of the mean coverage probabilities
5 Examples
This section presents the application of SCAM to two dif-
ferent data sets. The purpose of the first application is to
investigate whether proximity to municipal incinerators in
Great Britain is associated with increased risk of stomach
cancer (Elliott et al. 1996;Shaddick et al. 2007). It is hypoth-
esized that the risk of cancer is a decreasing function of dis-
tance from an incinerator. The second application uses data
from the National Morbidity, Mortality, and Air Pollution
Study (Peng and Welty 2004). The relationship between daily
counts of mortality and short-term changes in air pollution
concentrations is investigated. It is assumed that increases in
concentrations of ozone, sulphur dioxide, and particulate matter
will be associated with adverse health effects.
Incinerator data: Elliott et al. (1996) presented a large-
scale study to investigate whether proximity to incinerators
is associated with an increased risk of cancer. They analyzed
data from 72 municipal solid waste incinerators in Great
Britain and investigated the possibility of a decline in risk
with distance from sources of pollution for a number of can-
cers. There was significant evidence for such a decline for
stomach cancer, among several others. Data from a single
incinerator from those 72 sources, located in the northeast
of England, are analyzed using the SCAM approach in this
section. This incinerator had a significant result indicating a
monotone decreasing risk with distance (Elliott et al. 1996).
The data are from 44 enumeration districts (census-
defined administrative areas), ED, whose geographical cen-
troids lay within 7.5 km of the incinerator. The response
variables, Y_i, are the observed numbers of cases of stomach cancer for each enumeration district. Associated estimates of the expected numbers of cases, E_i, used for risk determination, risk_i = Y_i/E_i, were calculated for each ED using national rates for the whole of Great Britain, standardized for age and sex. The two covariates are the distance (km), dist_i, from the incinerator and a deprivation score, the Carstairs score, cs_i.
Under the model it is assumed that the Y_i are independent Poisson variables, Y_i ~ Poi(μ_i), where μ_i = λ_i E_i is the rate of the Poisson distribution, with E_i the expected number of cases (in area i) and λ_i the relative risk. Shaddick et al. (2007) proposed a model under which the effect of a covariate, e.g. distance, on cancer risk was linear through an exponential function, i.e. λ_i = exp(β_0 + β_1 dist_i). Since the risk of cancer might be expected to decrease with distance from the incinerator, in this paper a smooth monotonically decreasing function, m(dist_i), is suggested for modelling its relationship with distance: λ_i = exp{m(dist_i)}. Hence, the model can be represented as follows:

log(λ_i) = m(dist_i),  i.e.  log(μ_i/E_i) = m(dist_i),  i.e.  log(μ_i) = log(E_i) + m(dist_i),

which is a single smooth generalized Poisson regression model under monotonicity constraint, where log(E_i) is treated as an offset (a variable with a coefficient equal to 1). Therefore, the SCAM approach can be applied to fit such a model. The Carstairs score is known to be a good predictor of cancer rates (Elliott et al. 1996; Shaddick et al. 2007), so its effect may also be included in the model. The following four models are considered for this application.

Fig. 6 The estimated smooth and cancer risk function for monotone and unconstrained versions of model 1 (incinerator data): a the estimated smooth of SCAM with 95 % confidence interval; b the SCAM estimated risk as a function of distance; c the GAM estimated smooth with 95 % confidence interval; d the GAM estimated risk as a function of distance. Points show the observed data. As noted in the text, AIC suggests that the shape constrained model (a, b) is better than the unconstrained version (c, d)
Model 1: log{E(Y_i)} = log(E_i) + m_1(dist_i), with m′_1(dist_i) < 0. Model 2 is the same as model 1 but with m_2(cs_i) as its smooth term instead, with m′_2(cs_i) > 0. Model 3 combines both smooths, while model 4 uses a bivariate function m_3(dist_i, cs_i) subject to double monotone increasing constraint. The univariate smooth terms were represented by third order SCOP-splines with q = 15, while q_1 = q_2 = 6 were used for the bivariate SCOP-spline.
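For illustration, model 1 could be specified along the following lines with the scam package; the basis code "mpd" for a monotone decreasing P-spline, the use of an offset term in the formula, and the data frame and column names are assumptions here, not details taken from the paper.

    ## Hypothetical call: Poisson SCAM with a monotone decreasing smooth of
    ## distance and log expected counts as offset.
    library(scam)
    fit1 <- scam(Y ~ s(dist, k = 15, bs = "mpd") + offset(log(E)),
                 family = poisson(link = "log"), data = incinerator)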
Plots for assessing the suitability of model 1 are given in Supplementary material, S.8. For comparison, the first model has also been fitted without constraint. The estimated smooths and risk functions for both methods are illustrated in Fig. 6. The estimate of the cancer risk function was obtained as risk̂_i = μ̂_i/E_i = exp{m̂_1(dist_i)}. Note that the unconstrained GAM resulted in a non-monotone smooth, which supports the SCAM approach. The AIC score allows us to compare models with and without shape constraints. The AIC values were 152.35 for GAM and 150.57 for SCAM, which favoured the shape constrained model.
In model 2 the number of cases of stomach cancer is represented by a smooth function of deprivation score. This function is assumed to be monotonically increasing since it was shown (Elliott et al. 1996) that, in general, people living closer to incinerators tend to be less affluent (low Carstairs score). The AIC value for this model was 155.59, whereas the unconstrained version gave AIC = 156.4, both of which were higher than for the previous model. The other three measures of model performance, V_u, the adjusted r², and the deviance explained, also gave slightly worse results than those seen for model 1.
Model 3 incorporates both covariates, dist and cs, assuming an additive effect on the log scale. The estimated edf of m_2(cs) was about zero and this smooth term was insignificant in this model, with all its coefficients near zero. This can be explained by the high correlation between the two covariates. Considering a linear effect of Carstairs score in place of the smooth function m_2, as proposed in Shaddick et al. (2007), log{E(Y_i)} = log(E_i) + m_1(dist_i) + β cs_i, also resulted in an insignificant value for β.
The bivariate function, m_3(dist_i, cs_i), is considered in the last model. The perspective plot of the estimated smooth is shown in Fig. 7. This plot also supports the previous result, that the Carstairs score does not provide any additional information for modelling cancer risk when distance is included in the model: the estimated smooth has almost no increasing trend with respect to the second covariate. The measures of model performance, such as V_u, the adjusted r², and the percentage of deviance explained, were not as good as for the simple model 1. The equivalent model without shape constraints resulted in AIC = 157.35, whereas the AIC score for SCAM was 155.4. Hence, the model selected as best by AIC is the simple shape constrained model which only includes distance.

Fig. 7 Perspective plot of the estimated bivariate smooth of model 4 (incinerator data)
Air pollution data: The second application investigates
the relationship between non-accidental daily mortality and
air pollution. The data were from the National Morbidity,
Mortality, and Air Pollution Study (Peng and Welty 2004)
which contains 5,114 daily measurements on different vari-
ables for 108 cities within the United States. As an example a
single city (Chicago) study was examined in Wood (2006a).
The response variable was the daily number of deaths in
Chicago (death) for the years 1987–1994. Four explana-
tory variables were considered: average daily temperature
(tempd), levels of ozone (o3median), levels of particu-
late matter (pm10median), and time. Since it might be
expected that increased mortality will be associated with
increased concentrations of air pollution, modelling with
SCAM may prove useful.
The preliminary modelling and examination of the data
showed that the mortality rate at a given day could be better
predicted if the aggregated air pollution levels and aggre-
gated mean temperature were incorporated into the model,
rather than levels of pollution and temperature on the day in
question (Wood 2006a). It was proposed that the aggrega-
tion should be the sum of each covariate (except time), over
the current day and three preceding days. Hence, the three
aggregated predictors are as follows
tmp_i = Σ_{j=i−3}^{i} tempd_j,   o3_i = Σ_{j=i−3}^{i} o3median_j,   pm10_i = Σ_{j=i−3}^{i} pm10median_j.
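A minimal sketch of this aggregation (the data frame and column names are assumed for illustration):

    ## Sum of each covariate over the current day and the three preceding days.
    agg4 <- function(x) as.numeric(stats::filter(x, rep(1, 4), sides = 1))
    chicago$tmp  <- agg4(chicago$tempd)
    chicago$o3   <- agg4(chicago$o3median)
    chicago$pm10 <- agg4(chicago$pm10median)   # first three entries are NA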
Assuming that the observed numbers of daily deaths are independent Poisson random variables, the following additive model structure can be considered:

Model 1: log{E(death_i)} = f_1(time_i) + m_2(pm10_i) + m_3(o3_i) + f_4(tmp_i),

where monotonically increasing constraints are assumed on m_2 and m_3, since increased air pollution levels are expected to be associated with increases in mortality. The plots for assessing the suitability of this model, together with the plots of the smooth estimates, are given in Supplementary material, S.8. This model indicates that, though the effect of the ozone level has only one degree of freedom, it is positive and increasing. The rapid increase in the smooth of aggregated mean temperature can be explained by the four highest daily death rates occurring on four consecutive days of very high temperature, which also experienced high levels of ozone (Wood 2006a).
Since the combination of high temperatures together with high levels of ozone might be expected to result in higher mortality, we consider a bivariate smooth of these predictors. The following model is now considered:

Model 2: log{E(death_i)} = f_1(time_i) + m_2(pm10_i) + m_3(o3_i, tmp_i),

where m_2(pm10_i) is a monotone increasing function and m_3(o3_i, tmp_i) is subject to single monotonicity along the first covariate. The diagnostic plots of this model showed a slight improvement in comparison to the first model (Supplementary material, S.8). The estimates of the univariate smooths and the perspective plot of the estimated bivariate smooth of model 2 are illustrated in Fig. 8. The second model also has a lower V_u score, which implies that model 2 is preferable.
The current approach has been applied to the air pollution data for Chicago for demonstration purposes only. It would be of interest to apply the same model to other cities, to see whether the relationship between non-accidental mortality and air pollution can be described by the proposed SCAM in other locations.
Fig. 8 The estimates of the smooth terms of model 2 (air pollution data). A cubic regression spline was used for f_1 with q = 200, a SCOP-spline of the third order with q = 10 for m_2, and a bivariate SCOP-spline with marginal basis dimensions q_1 = q_2 = 10 for m_3

6 Discussion

In this paper a framework for generalized additive modelling with a mixture of unconstrained and shape restricted smooth terms, SCAM, has been presented and evaluated on a range of simulated and real data sets.
The motivation of this framework is an attempt to develop general methods for estimating SCAMs similar to those for a standard unconstrained GAM. SCAM models allow the inclusion of multiple unconstrained and shape constrained smooths of both univariate and multi-dimensional type, which are represented by the proposed SCOP-splines. It should be mentioned that for cubic and higher order splines the shape constraints are assured by a sufficient but not necessary condition. However, for cubic splines this condition is equivalent to that of Fritsch and Carlson (1980), who showed that the sufficient parameter space constitutes a substantial part of the necessary parameter space (see their Fig. 2, p. 242). Also, the sensitivity analysis of Brezger and Steiner (2008) on an empirical application supports the point that the sufficient condition is not highly restrictive.
Since a major challenge of any flexible regression method
is its implementation in a computationally efficient and sta-
ble manner, numerically robust algorithms for model esti-
mation have been presented. The main benefit of the pro-
cedure is that smoothing parameter selection is incorpo-
rated into the SCAM parameter estimation scheme, which
also produces interval estimates at no additional cost. The
approach has the O(nq2)computational cost of standard
penalized regression spline based GAM estimation, but typ-
ically involves 2–4 times as many O(nq2)steps because
of the additional non-linearities required for the monotonic
terms, and the need to use Quasi-Newton in place of full
Newton optimization. However, in contrast to the ad hoc
methods of choosing the smoothing parameter used in other
approaches, smoothing parameter selection for SCAMs is
well founded. It should also be mentioned that, although the simulation free intervals proposed in this paper show good coverage probabilities, it would be of interest to see whether Bayesian confidence intervals derived from a posterior distribution simulated via MCMC would give better results.
Acknowledgments The incinerator data were provided by the Small
Area Health Statistics Unit, a unit jointly funded by the UK Department
of Health, the Department of the Environment, Food and Rural Affairs,
Environment Agency, Health and Safety Executive, Scottish Executive,
National Assembly for Wales, and Northern Ireland Assembly. The
authors are grateful to Jianxin Pan and Gavin Shaddick for useful dis-
cussions on several aspects of the work. The authors are also grateful
for the valuable comments and suggestions of two referees and an asso-
ciated editor. NP was funded by EPSRC/NERC grant EP/1000917/1.
Open Access This article is distributed under the terms of the Creative
Commons Attribution License which permits any use, distribution, and
reproduction in any medium, provided the original author(s) and the
source are credited.
Appendix
Appendix 1: Newton method for penalized likelihood
estimation of SCAM
This appendix describes a full Newton (Newton–Raphson)
method for maximizing the penalized likelihood of a SCAM.
The penalized log likelihood function to be maximized w.r.t.
βis
lp(β)=l(β)βTSλβ/2,
where the log likelihood of βcan be written as
l(β)=
n
i=1
[{yiθibii)}/ai(φ) +ci, yi)],(6)
where ai,bi, and ciare arbitrary functions, φan arbitrary
‘scale’ parameter, and θia ‘canonical parameter’ of the dis-
tribution related to the linear predictor via the relationship
E(Yi)=bi)(Wood 2006a). While the functions ai,bi,
and cimayvarywithi,the scale parameter φis assumed to
be constant for all observations.
The distribution parameters θ_i depend on the model coefficients β_j via the link between the mean of Y_i and θ_i, E(Y_i) = b′_i(θ_i). Recall that the smoothing parameter vector λ is considered to be fixed while estimating β. Consider only cases where a_i(φ) = φ/ω_i, and ω_i is a known constant, which usually equals 1. Almost all probability distributions of interest from the exponential family are covered by such a limitation. Then

l(β) = Σ_{i=1}^{n} [ω_i{y_i θ_i − b_i(θ_i)}/φ + c_i(φ, y_i)]

and the first order derivative of l_p(β) w.r.t. β_j is

∂l_p/∂β_j = (1/φ) Σ_{i=1}^{n} ω_i {y_i ∂θ_i/∂β_j − b′_i(θ_i) ∂θ_i/∂β_j} − S_{λj}β,

where (for this appendix only) S_{λj} = Σ_k λ_k S_{kj}, while S_{kj} is the jth row of the matrix S_k, and

∂θ_i/∂β_j = (∂θ_i/∂μ_i)(∂μ_i/∂β_j).

Taking first order derivatives of both sides of the linking equation E(Y_i) = b′_i(θ_i) gives

∂μ_i/∂θ_i = b″_i(θ_i),   ∂θ_i/∂μ_i = 1/b″_i(θ_i),

so that

∂l_p/∂β_j = (1/φ) Σ_{i=1}^{n} {y_i − b′_i(θ_i)}/{b″_i(θ_i)/ω_i} ∂μ_i/∂β_j − S_{λj}β.   (7)
Since g(μ_i) = X_i β̃, then

g′(μ_i) ∂μ_i/∂β_j = [X]_{ij} if β̃_j = β_j,   g′(μ_i) ∂μ_i/∂β_j = [X]_{ij} exp(β_j) otherwise.

Hence

∂μ_i/∂β_j = [X]_{ij}/g′(μ_i) if β̃_j = β_j,   ∂μ_i/∂β_j = [X]_{ij} exp(β_j)/g′(μ_i) otherwise.
Another key point of the exponential family concerns the variance,

var(Y_i) = b″_i(θ_i) a_i(φ) = b″_i(θ_i) φ/ω_i,

which is represented in the theory of GLMs in terms of μ_i as var(Y_i) = V(μ_i)φ, where V(μ_i) = b″_i(θ_i)/ω_i.

Let G and W_1 be n × n diagonal matrices with diagonal elements G_ii = g′(μ_i) and

w_{1i} = ω_i/{V(μ_i) g′²(μ_i)},

and let C be a q × q diagonal matrix such that C_jj = 1 if β̃_j = β_j, and C_jj = exp(β_j) otherwise. Then the penalized score vector may be written as

u_p(β) = ∂l_p/∂β = (1/φ) C^T X^T W_1 G(y − μ) − S_λ β.   (8)
To find the model parameter estimates, β̂, one needs to solve u_p(β) = 0. These equations are non-linear and have no analytical solution, so numerical methods must be applied. In the case of an unconstrained GAM the penalized iteratively reweighted least squares (P-IRLS) scheme based on Fisher scoring is used to solve these equations.

To proceed, the Hessian of the penalized log likelihood function is derived from (8),

H(β) = ∂²l_p/∂β_j∂β_k = −(1/φ) C^T X^T W X C + (1/φ) E − S_λ,   (9)
where W is a diagonal matrix with

w_i = ω_i α_i/{V(μ_i) g′²(μ_i)},   α_i = 1 + (y_i − μ_i){V′(μ_i)/V(μ_i) + g″(μ_i)/g′(μ_i)},   (10)

and E is a q × q diagonal matrix with E_jj = 0 if β̃_j = β_j, and E_jj = Σ_{i=1}^{n} w_i g′(μ_i)[XC]_{ij}(y_i − μ_i)/α(μ_i) otherwise.

Note that for a model with a canonical link function the second term of α_i is equal to zero, since in this case V′(μ_i)/V(μ_i) + g″(μ_i)/g′(μ_i) = 0. Therefore α_i = 1 and the matrices W_1 and W are identical.
So, using the Newton method, if β^[k] is the current estimate of β, then the next estimate is

β^[k+1] = β^[k] + (C^[k]T X^T W^[k] X C^[k] − E^[k] + S_λ)^{−1} {C^[k]T X^T W_1^[k] G^[k](y − μ^[k]) − S_λ β^[k]},   (11)

where the scale parameter φ is absorbed into the smoothing parameter λ.
To use (11) directly for β estimation is not efficient, since explicit formation of the Hessian would square the condition number of the working model matrix, √W X C (Golub and van Loan 1996). It should be noted that the Hessian matrix also appears in the expression for the edf of the fitted model (Appendix 2). In the case of the unconstrained model (Wood 2006a) a stable solution for β̂ is based on a QR decomposition of √W X augmented with B, where B^T B = S_λ. The same approach can be applied here for the shape constrained model, i.e. use a QR decomposition of the augmented √W X C. However, the values of W can be negative when a non-canonical link function is assumed, so the issue of these negative weights has to be handled first.
The approach applied here is similar to that given in Sect. 3.3 of Wood (2011). Let W̃ denote a diagonal matrix with the elements |w_i|, and W⁻ a diagonal matrix with

w⁻_i = 0 if w_i ≥ 0,   w⁻_i = −w_i otherwise.

Then

C^T X^T W X C = C^T X^T W̃ X C − 2 C^T X^T W⁻ X C.

Now a QR decomposition may be used for the augmented matrix,

[√W̃ X C; B] = QR,   (12)

where Q is a rectangular matrix with orthogonal columns and R is upper triangular. Now let Q_1 be the first n rows of Q, so that √W̃ X C = Q_1 R. Therefore

C^T X^T W X C + S_λ − E = R^T R − 2 C^T X^T W⁻ X C − E
  = R^T (I − 2 R^{−T} C^T X^T W⁻ X C R^{−1} − R^{−T} E R^{−1}) R
  = R^T (I − 2 Q_1^T I⁻ Q_1 − R^{−T} E R^{−1}) R,

where I⁻ is an n × n diagonal matrix with I⁻_ii = 0 if w_i ≥ 0, and I⁻_ii = 1 otherwise.
Note that several near non-identifiability issues can arise
here. In order to deal with unidentifiable parameters it is pro-
posed to use a singular value decomposition for the R factor
of the QR decomposition if it is rank deficient. This step is
described in Sect. 3.2.
The next step is to form the eigen-decomposition
$$2Q_1^TI^-Q_1+R^{-T}ER^{-1}=U\Lambda U^T,$$
which gives
$$C^TX^TWXC+S_\lambda-E=R^T\left(I-U\Lambda U^T\right)R=R^TU(I-\Lambda)U^TR.$$
Defining a vector $z$ with $z_i=(y_i-\mu_i)g'(\mu_i)/\alpha_i$ and $\tilde z$ with $\tilde z_i=\mathrm{sign}(w_i)z_i$, the update becomes
$$\beta^{[k+1]}=\beta^{[k]}+R^{-1}U(I-\Lambda)^{-1}U^TQ_1^T\sqrt{\tilde W}\,\tilde z-R^{-1}U(I-\Lambda)^{-1}U^TR^{-T}S_\lambda\beta^{[k]}. \qquad(13)$$
By denoting
$$P=R^{-1}U(I-\Lambda)^{-1/2}\quad\text{and}\quad K=Q_1U(I-\Lambda)^{-1/2}, \qquad(14)$$
(13) may be written as
$$\beta^{[k+1]}=\beta^{[k]}+PK^T\sqrt{\tilde W}\,\tilde z-PP^TS_\lambda\beta^{[k]}. \qquad(15)$$
The last expression has roughly the square root of the condition number of (11) for the unpenalized likelihood maximization problem, since the condition number of $R^{-1}$ equals that of $\sqrt{\tilde W}XC$.
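A condensed R sketch of one stable update (12)-(15) is given below. It is illustrative only and is not the scam package implementation: it assumes prior weights $\omega_i=1$, a full-rank $R$ (so the pivoting and SVD steps are skipped), a positive definite Hessian (all $\Lambda_{ii}<1$), and user-supplied link and variance functions; all argument names are ours.

```r
## Sketch of one stable Newton update, following (12)-(15).
stable.step <- function(beta, y, X, S.lambda, expo,
                        linkinv, gprime, g2, V.fun, Vprime) {
  n <- nrow(X); q <- ncol(X)
  beta.tilde <- ifelse(expo, exp(beta), beta)
  C  <- diag(ifelse(expo, exp(beta), 1))
  XC <- X %*% C
  mu <- linkinv(as.numeric(X %*% beta.tilde))
  gp <- gprime(mu); V <- V.fun(mu)
  alpha <- 1 + (y - mu) * (Vprime(mu)/V + g2(mu)/gp)          # (10)
  w  <- alpha/(V * gp^2)                                      # Newton weights, omega_i = 1
  Ediag <- ifelse(expo, colSums((w * gp * (y - mu)/alpha) * XC), 0)
  wt <- abs(w); Imin <- as.numeric(w < 0)                     # |w_i| and diag of I^-
  es <- eigen(S.lambda, symmetric = TRUE)                     # B with t(B) %*% B = S.lambda
  B  <- sqrt(pmax(es$values, 0)) * t(es$vectors)
  qra <- qr(rbind(sqrt(wt) * XC, B))                          # augmented QR (12)
  R  <- qr.R(qra); Q1 <- qr.Q(qra)[1:n, , drop = FALSE]
  Ri <- backsolve(R, diag(q))                                 # R^{-1}, full rank assumed
  A  <- 2 * crossprod(Q1, Imin * Q1) + crossprod(Ri, Ediag * Ri)
  ei <- eigen(A, symmetric = TRUE); U <- ei$vectors; Lam <- ei$values
  P  <- Ri %*% U %*% diag(1/sqrt(1 - Lam))                    # (14); assumes Lam < 1
  K  <- Q1 %*% U %*% diag(1/sqrt(1 - Lam))
  zt <- sign(w) * (y - mu) * gp/alpha                         # z~ of (13)
  drop(beta + P %*% crossprod(K, sqrt(wt) * zt) -
         P %*% crossprod(P, S.lambda %*% beta))               # (15)
}
```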
A further refinement may be required at this last step. If the Hessian of the log-likelihood is indefinite then the step based on (14) fails, because some $\Lambda_{ii}$ exceed 1. To avoid this indefiniteness a Fisher scoring update is substituted for the Newton step, obtained by setting $\alpha_i=1$ so that $w_i\ge 0$ for all $i$; the QR decomposition is then used as before,
$$\begin{bmatrix}\sqrt{W}XC\\ B\end{bmatrix}=QR,$$
and $\beta^{[k+1]}=R^{-1}Q_1^T\sqrt{W}z$, where $z=G(y-\mu)+XC\beta^{[k]}$. If there is an identifiability issue then the singular value decomposition is applied to the $R$ factor, $R=UDV^T$, resulting in
$$\beta^{[k+1]}=VD^{-1}Q_1^T\sqrt{W}z,$$
where $Q_1$ is now the first $n$ rows of $QU$.
Note that in the case of a canonical link function $\alpha_i=1$ for all $i$, and therefore $\tilde W=W$.
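The Fisher fallback can be sketched along the same lines as the Newton sketch above (again illustrative only; rank deficiency and pivoting are ignored, and `XC`, `B`, `mu`, `gp`, `V` are assumed to be the objects constructed there):

```r
## Sketch of the Fisher-scoring fallback used when the Hessian is indefinite:
## alpha_i is set to 1, so the weights are non-negative and no
## eigen-correction is needed.
fisher.step <- function(beta, y, XC, B, mu, gp, V) {
  n  <- nrow(XC)
  w1 <- 1/(V * gp^2)                                  # weights with alpha_i = 1, omega_i = 1
  z  <- gp * (y - mu) + as.numeric(XC %*% beta)       # working response G(y - mu) + XC beta
  qra <- qr(rbind(sqrt(w1) * XC, B))
  R  <- qr.R(qra); Q1 <- qr.Q(qra)[1:n, , drop = FALSE]
  drop(backsolve(R, crossprod(Q1, sqrt(w1) * z)))     # beta = R^{-1} Q1' sqrt(W) z
}
```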
Appendix 2: SCAM degrees of freedom
An unpenalized model would have as many degrees of freedom as the number of unconstrained model parameters. The use of penalties reduces the degrees of freedom, so that a model with $\lambda\to\infty$ has degrees of freedom near 1. Using the concept of the divergence of the maximum likelihood estimator, the edf of the penalized fit can be found as (Meyer and Woodroofe 2000; Wood 2001)
$$\tau=\mathrm{div}(\hat\mu)=\sum_{i=1}^{n}\frac{\partial\hat\mu_i(y)}{\partial y_i}.$$
Substituting (11) (Appendix 1) into the model $g(\mu_i)=X_i\tilde\beta$ and taking first-order derivatives with respect to $y_i$, we get
$$\frac{\partial\hat\mu_i}{\partial y_i}=\left[XC\left(C^TX^TWXC-E+S_\lambda\right)^{-1}C^TX^TW_1\right]_{ii},$$
where the right-hand side is the $i$th diagonal element of the matrix in square brackets. Therefore
$$\tau=\mathrm{tr}(F), \qquad(16)$$
where
$$F=\left(C^TX^TWXC-E+S_\lambda\right)^{-1}C^TX^TW_1XC$$
and the matrices $W$, $W_1$, $C$ and $E$ are evaluated at convergence. Note that $F$ is the expected Hessian of $l(\beta)$, pre-multiplied by the inverse of the Hessian of $l_p(\beta)$.
Using the approach and notation of Appendix 1, $\tau$ can also be obtained in a stable manner. Introducing an $n\times n$ diagonal matrix $L^+$ with
$$L^+_{ii}=\begin{cases}\alpha_i^{-1},&\text{if }w_i\ge 0\\ -\alpha_i^{-1},&\text{otherwise,}\end{cases}$$
the expression (16) for the edf becomes
$$\mathrm{tr}(F)=\mathrm{tr}\!\left(\{C^TX^TWXC-E+S_\lambda\}^{-1}C^TX^T\sqrt{\tilde W}L^+\sqrt{\tilde W}XC\right)
=\mathrm{tr}\!\left(PK^TL^+\sqrt{\tilde W}XC\right)=\mathrm{tr}\!\left(KK^TL^+\right). \qquad(17)$$
References
Akaike, H.: Information theory and an extension of the maximum likelihood principle. In: Petrov, B.N., Csaki, B.F. (eds.) Second International Symposium on Information Theory. Academiai Kiado, Budapest (1973)
Bollaerts, K., Eilers, P., van Mechelen, I.: Simple and multiple P-splines regression with shape constraints. Br. J. Math. Stat. Psychol. 59, 451–469 (2006)
Brezger, A., Steiner, W.: Monotonic regression based on Bayesian P-splines: an application to estimating price response functions from store-level scanner data. J. Bus. Econ. Stat. 26(1), 90–104 (2008)
Claeskens, G., Krivobokova, T., Opsomer, J.: Asymptotic properties of penalized spline estimators. Biometrika 96(3), 529–544 (2009)
Craven, P., Wahba, G.: Smoothing noisy data with spline functions. Numer. Math. 31, 377–403 (1979)
De Boor, C.: A Practical Guide to Splines. Cambridge University Press, Cambridge (1978)
Dunson, D.: Bayesian semiparametric isotonic regression for count data. J. Am. Stat. Assoc. 100(470), 618–627 (2005)
Dunson, D., Neelon, B.: Bayesian inference on order-constrained parameters in generalized linear models. Biometrics 59, 286–295 (2003)
Eilers, P., Marx, B.: Flexible smoothing with B-splines and penalties. Stat. Sci. 11, 89–121 (1996)
Elliott, P., Shaddick, G., Kleinschmidt, I., Jolley, D., Walls, P., Beresford, J., Grundy, C.: Cancer incidence near municipal solid waste incinerators in Great Britain. Br. J. Cancer 73, 702–710 (1996)
Fritsch, F., Carlson, R.: Monotone piecewise cubic interpolation. SIAM J. Numer. Anal. 17(2), 238–246 (1980)
Golub, G., van Loan, C.: Matrix Computations, 3rd edn. Johns Hopkins University Press, Baltimore (1996)
Hastie, T., Tibshirani, R.: Generalized Additive Models. Chapman & Hall, New York (1990)
He, X., Shi, P.: Monotone B-spline smoothing. J. Am. Stat. Assoc. 93(442), 643–650 (1998)
Holmes, C., Heard, N.: Generalized monotonic regression using random change points. Stat. Med. 22, 623–638 (2003)
Kauermann, G., Krivobokova, T., Fahrmeir, L.: Some asymptotic results on generalized penalized spline smoothing. J. R. Stat. Soc. B 71(2), 487–503 (2009)
Kauermann, G., Opsomer, J.: Data-driven selection of the spline dimension in penalized spline regression. Biometrika 98(1), 225–230 (2011)
Kelly, C., Rice, J.: Monotone smoothing with application to dose-response curves and the assessment of synergism. Biometrics 46, 1071–1085 (1990)
Kim, Y.-J., Gu, C.: Smoothing spline Gaussian regression: more scalable computation via efficient approximation. J. R. Stat. Soc.: Ser. B 66(2), 337–356 (2004)
Lang, S., Brezger, A.: Bayesian P-splines. J. Comput. Graph. Stat. 13(1), 183–212 (2004)
Li, Y., Ruppert, D.: On the asymptotics of penalized splines. Biometrika 95(2), 415–436 (2008)
Marra, G., Wood, S.N.: Coverage properties of confidence intervals for generalized additive model components. Scand. J. Stat. 39(1), 53–74 (2012)
Meyer, M.: Inference using shape-restricted regression splines. Ann. Appl. Stat. 2(3), 1013–1033 (2008)
Meyer, M.: Constrained penalized splines. Can. J. Stat. 40(1), 190–206 (2012)
Meyer, M., Woodroofe, M.: On the degrees of freedom in shape-restricted regression. Ann. Stat. 28(4), 1083–1104 (2000)
Nychka, D.: Bayesian confidence intervals for smoothing splines. J. Am. Stat. Assoc. 83, 1134–1143 (1988)
Peng, R., Welty, L.: The NMMAPSdata package. R News 4(2), 10–14 (2004)
Ramsay, J.: Monotone regression splines in action (with discussion). Stat. Sci. 3(4), 425–461 (1988)
Rousson, V.: Monotone fitting for developmental variables. J. Appl. Stat. 35(6), 659–670 (2008)
Ruppert, D.: Selecting the number of knots for penalized splines. J. Comput. Graph. Stat. 11(4), 735–757 (2002)
Silverman, B.: Some aspects of the spline smoothing approach to non-parametric regression curve fitting. J. R. Stat. Soc.: Ser. B 47, 1–52 (1985)
Shaddick, G., Choo, L., Walker, S.: Modelling correlated count data with covariates. J. Stat. Comput. Simul. 77(11), 945–954 (2007)
Villalobos, M., Wahba, G.: Inequality-constrained multivariate smoothing splines with application to the estimation of posterior probabilities. J. Am. Stat. Assoc. 82(397), 239–248 (1987)
Wahba, G.: Bayesian confidence intervals for the cross validated smoothing spline. J. R. Stat. Soc.: Ser. B 45, 133–150 (1983)
Wang, J., Meyer, M.: Testing the monotonicity or convexity of a function using regression splines. Can. J. Stat. 39(1), 89–107 (2011)
Wood, S.: Monotonic smoothing splines fitted by cross validation. SIAM J. Sci. Comput. 15(5), 1126–1133 (1994)
Wood, S.: Partially specified ecological models. Ecol. Monogr. 71(1), 1–25 (2001)
Wood, S.: Stable and efficient multiple smoothing parameter estimation for generalized additive models. J. Am. Stat. Assoc. 99, 673–686 (2004)
Wood, S.: Generalized Additive Models: An Introduction with R. Chapman & Hall, Boca Raton (2006a)
Wood, S.: On confidence intervals for generalized additive models based on penalized regression splines. Aust. N. Z. J. Stat. 48(4), 445–464 (2006b)
Wood, S.: Fast stable direct fitting and smoothness selection for generalized additive models. J. R. Stat. Soc. B 70(3), 495–518 (2008)
Wood, S.: Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models. J. R. Stat. Soc. B 73(1), 1–34 (2011)
Zhang, J.: A simple and efficient monotone smoother using smoothing splines. J. Nonparametr. Stat. 16(5), 779–796 (2004)