On semiparametric regression with O’Sullivan penalised splines
BY M. P. WAND
School of Mathematics and Applied Statistics, University of Wollongong, Wollongong 2522, Australia
AND J.T. ORMEROD
School of Mathematics and Statistics, University of New South Wales, Sydney 2052, Australia
25th June, 2007
This is an exposé on the use of O’Sullivan penalised splines in contemporary semiparametric regression, including mixed model and Bayesian formulations. O’Sullivan penalised splines are similar to P-splines, but have the advantage of being a direct generalisation of smoothing splines. Exact expressions for the O’Sullivan penalty matrix are obtained. Comparisons between the two types of splines reveal that O’Sullivan penalised splines more closely mimic the natural boundary behaviour of smoothing splines. Implementation in modern computing environments such as Matlab, R and BUGS is discussed.
Keywords: Additive models; Markov chain Monte Carlo; Mixed models; P-splines; Smoothing splines.

1 Introduction
Splines continue to play a central role in nonparametric and semiparametric regression
modelling. Recent synopses include Eubank (1999), Gu (2002), Ruppert, Wand & Carroll
(2003) and Denison, Holmes, Mallick & Smith (2002). In all but the last reference, smooth
functional relationships are fitted using a large basis of spline functions subject to penali-
sation. Up until the mid-1990s most literature on spline-based nonparametric regression
was concerned with smoothing splines, and their multivariate extension thin plate splines,
where the penalty takes a particular form and the number of basis functions roughly
equals the sample size (e.g. Wahba, 1990; Green & Silverman, 1994). However, in recent
years, there has been a great deal of research on more general spline/penalty strategies,
most of which use considerably fewer basis functions. Driving forces include:
• more complicated models, often with several smooth functions;
• larger data sets, where smoothing and thin plate splines become computationally expensive;
• mixed model and Bayesian representations of smoothers that lend themselves to the use of established software, such as BUGS, lme() in R and PROC MIXED in SAS, provided the number of basis functions is relatively low.
Ruppert, Wand & Carroll (2003) summarise and provide access to many of these devel-
opments. The term penalised splines has emerged as a descriptor for general spline fitting
subject to penalties.
O’Sullivan (1986, Section 3) introduced a class of penalised splines based on B-spline
basis functions. O’Sullivan penalised splines are a direct generalisation of smoothing
splines in that the latter arises when the maximal number of B-spline basis functions
are included. Like smoothing splines, O’Sullivan penalised splines possess the attractive
feature of natural boundary conditions (e.g. Green & Silverman, 1994, p.12). They have
also become the most widely used class of penalised splines in statistical analyses due
to their implementation in the popular R and S-PLUS function smooth.spline() and
associated generalised additive model software (e.g. the gam library in R; Hastie, 2006).
Despite the omnipresence of O’Sullivan penalised splines, their use in semiparamet-
ric regression contexts, particularly those involving mixed model and Bayesian represen-
tations, is not very common. Recently, Welham, Cullis, Kenward & Thompson (2007)
showed how most of the commonly used penalised splines can be treated within a single
mixed model framework, although they did not work explicitly with the form given in O’Sullivan (1986).
Our contributions in this paper are (a) to provide an exact matrix expression for the penalty of O’Sullivan splines that allows implementation in a few lines of a matrix-based computing language, (b) to compare them with their closest penalised spline relative, P-splines (Eilers & Marx, 1996), which reveals some noticeable differences near the boundaries, (c) to demonstrate explicitly, including with R code, how O’Sullivan splines can simply be added to the mixed model-based regression armoury, and (d) to investigate their efficacy in Bayesian semiparametric regression using MCMC software such as BUGS and its variants. We conclude that the several attractive features of O’Sullivan penalised splines (smoothness, numerical stability, natural boundary properties, and direct generalisation of smoothing splines) make them a very good choice of basis in semiparametric regression.
Section 2 provides a brief description of O’Sullivan penalised splines. Comparison
with P-splines is made in Section 3. Section 4 describes mixed model representation
of O’Sullivan penalised splines and how they can be used in models that benefit from
this representation. Issues concerning Bayesian penalised spline smoothing and Markov
chain Monte Carlo are described in Section 5. O’Sullivan splines of general degree are
described in Section 6. Closing discussion is given in Section 7. An appendix contains
relevant R code.
2 O’Sullivan Penalised Splines
O’Sullivan penalised splines have already been described several times in the literature.
A recent reference is the Chapter 5 Appendix of Hastie, Tibshirani & Friedman (2001). A
brief sketch is given here for convenience.
Consider the simplest nonparametric regression setting

yi = f(xi) + εi,   1 ≤ i ≤ n,

where (xi, yi) ∈ R × R. Suppose that an estimate of f is required over [a, b], an interval containing the xi's. For an integer K ≤ n let κ1, ..., κK+8 be a knot sequence such that

a = κ1 = κ2 = κ3 = κ4 < κ5 < ··· < κK+4 < κK+5 = κK+6 = κK+7 = κK+8 = b

and let B1, ..., BK+4 be the cubic B-spline basis functions defined by these knots (see e.g. pp. 160–161 of Hastie et al., 2001). Set up the n × (K + 4) design matrix B with (i,k) entry Bik = Bk(xi) and the (K + 4) × (K + 4) penalty matrix Ω with (k,k′) entry

Ωkk′ = ∫_a^b B″k(x) B″k′(x) dx.   (1)

Then an estimate of f at location x ∈ R can be obtained as

f̂_O(x;λ) ≡ Bx ν̂_O,   ν̂_O ≡ (B^T B + λΩ)^{−1} B^T y,   (2)

where Bx ≡ [B1(x), ..., BK+4(x)] and λ > 0 is a smoothing parameter.
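As a concrete illustration of (1) and (2), the estimator can be computed in a few lines of any matrix-based environment. The paper itself works with R, Matlab and S-PLUS; the following is an analogous Python/scipy sketch. The names osullivan_fit and predict are ours, and the penalty (1) is evaluated here by brute-force numerical quadrature rather than the exact expression derived in Section 6.

```python
import numpy as np
from scipy.integrate import quad
from scipy.interpolate import BSpline

def osullivan_fit(x, y, a, b, K, lam):
    """Cubic O'Sullivan penalised spline fit via (1)-(2), equally-spaced interior knots."""
    interior = np.linspace(a, b, K + 2)[1:-1]
    t = np.concatenate(([a] * 4, interior, [b] * 4))   # kappa_1, ..., kappa_{K+8}
    nbasis = K + 4
    # One-hot coefficient vectors pick out the individual basis functions B_1..B_{K+4}.
    basis = [BSpline(t, np.eye(nbasis)[j], 3) for j in range(nbasis)]
    Bmat = np.column_stack([bj(x) for bj in basis])    # (i, k) entry B_k(x_i)
    # Penalty (1): Omega_{kk'} = int_a^b B_k''(u) B_{k'}''(u) du, by quadrature.
    d2 = [bj.derivative(2) for bj in basis]
    Omega = np.zeros((nbasis, nbasis))
    for j in range(nbasis):
        for k in range(j, nbasis):
            val = quad(lambda u: float(d2[j](u)) * float(d2[k](u)), a, b, limit=200)[0]
            Omega[j, k] = Omega[k, j] = val
    # Penalised least squares coefficients in (2).
    nu = np.linalg.solve(Bmat.T @ Bmat + lam * Omega, Bmat.T @ y)
    return basis, nu

def predict(basis, nu, xnew):
    # f_O(x; lambda) = B_x nu, with B_x = [B_1(x), ..., B_{K+4}(x)].
    return np.column_stack([bj(xnew) for bj in basis]) @ nu
```

With λ very large the returned fit is essentially the least squares line, and with λ small it is close to an unpenalised regression spline.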
Note that the cubic smoothing spline arises in the special case K = n and κk+4 = xk, 1 ≤ k ≤ n, provided the xi's are distinct (e.g. Green & Silverman, 1994, Section 3.6). Apart from giving a smooth (twice continuously differentiable) scatterplot smooth, f̂_O(·;λ) benefits from the numerical stability of the B-spline basis (e.g. de Boor, 1978) and so avoids conditioning problems. Moreover, B^T B is 4-banded, which leads to O(n) algorithms when K is close to n (e.g. Hastie et al., 2001). In addition, f̂_O(·;λ) satisfies so-called natural boundary conditions, meaning that

f̂″_O(a;λ) = f̂‴_O(a;λ) = f̂″_O(b;λ) = f̂‴_O(b;λ) = 0,

implying that f̂_O(·;λ) is linear over [a, κ5] and [κK+4, b]. Figure 1 illustrates these natural boundary properties of f̂_O(·;λ) for data on ratios of strontium isotopes found in fossil shells and their age; see Chaudhuri & Marron (1999) for details. Also, f̂_O(·;λ) approximates the least squares line as λ → ∞. The implication for mixed model smoothing is that the induced fixed effects component corresponds to straight line basis functions. Details are given in Section 4.
Figure 1: Illustration of natural boundary properties of a 20-interior-knot O’Sullivan penalised spline fit to the fossil data over the interval [85, 130] millions of years. The interior knots are shown as solid diamonds (♦). Inset: the 24 B-spline basis functions.
Computation of the design matrix B is usually quite easy. For example, B-splines are
readily available in the Matlab, R and S-PLUS computing environments. Otherwise re-
currence formulae (e.g. de Boor, 1978; Eilers & Marx, 1996) can be called upon. However,
computation of Ω Ω Ω requires some additional effort. In Section 6, while treating general de-
gree O’Sullivan penalised splines, we derive an exact matrix algebraic expression for the
corresponding penalty matrices. In the cubic case our theorem reduces to the expression:
Ω = (B̃″)^T diag(w) B̃″,   (3)

where B̃″ is the 3(K + 7) × (K + 4) matrix with (i,j) entry B″j(x̃i), x̃i is the ith entry of

x̃ ≡ (κ1, (κ1 + κ2)/2, κ2, κ2, (κ2 + κ3)/2, κ3, ..., κK+7, (κK+7 + κK+8)/2, κK+8)^T

and w is the 3(K + 7) × 1 vector given by

w ≡ (1/6) ((∆κ)1, 4(∆κ)1, (∆κ)1, (∆κ)2, 4(∆κ)2, (∆κ)2, ..., (∆κ)K+7, 4(∆κ)K+7, (∆κ)K+7)^T,

where (∆κ)k ≡ κk+1 − κk, 1 ≤ k ≤ K + 7. Result (3) is none other than Simpson's rule applied over each of the inter-knot differences. This is because each B″j B″k is piecewise quadratic. For commonly used values of K, (3) allows straightforward computation of Ω in matrix-based languages such as Matlab, R and S-PLUS. In the Appendix we demonstrate computation of Ω in 4 lines of R code.

Lastly, we mention knot choice. The R and S-PLUS function smooth.spline() uses quantile-based interior knots, with κk+4 placed at the ((k + 1)/(K + 2))th sample quantile of the xi's, 1 ≤ k ≤ K, where K = n for n < 50, K = 100 for n = 200, K = 140 for n = 800 and K = 200 + (n − 3200)^{1/5} for n > 3200. Other values of n between 50 and 3200 are handled via logarithmic interpolation. For many functional relationships, fewer knots are sufficient. Figure 1 is one example, where only K = 20 interior knots are used without compromising the quality of the fit. A common default in the penalised spline literature is K = min(nU/4, 35), where nU is the number of unique xi's (e.g. Ruppert et al., 2003). Ruppert (2002) discusses ‘hi-tech’ choice of K. The distribution of the knots, for a given K, may have some effect on the results. As mentioned above, smooth.spline() uses quantile-based knots while e.g. Eilers & Marx (1996) recommend equally-spaced knots. In most situations this effect will be minor. However, for either strategy, it is possible to construct regression functions and predictor variable distributions for which problems arise. More sophisticated knot placement strategies may help. For example, Luo & Wahba (1997) propose basis function reduction methods that could be adapted to the current context.
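Result (3) translates directly into a few lines of any matrix-based language; the Appendix gives the R code, and the following Python/scipy sketch is an analogue. The name osullivan_penalty is ours, and the argument t is assumed to be the full knot sequence κ1, ..., κK+8, repeated boundary knots included.

```python
import numpy as np
from scipy.interpolate import BSpline

def osullivan_penalty(t):
    """Omega for the cubic B-spline basis on knot sequence t (length K + 8), via (3)."""
    k = 3
    nbasis = len(t) - k - 1                      # K + 4 basis functions
    # Quadrature points: endpoints and midpoint of each of the K + 7 inter-knot
    # gaps; repeated boundary knots give zero-width gaps with zero weight.
    left, right = t[:-1], t[1:]
    xtilde = np.ravel(np.column_stack((left, (left + right) / 2, right)))
    dk = right - left                            # (Delta kappa)_k
    w = np.ravel(np.column_stack((dk, 4 * dk, dk))) / 6   # Simpson weights
    # 3(K + 7) x (K + 4) matrix of second derivatives B_j''(xtilde_i).
    Btilde2 = np.column_stack(
        [BSpline(t, np.eye(nbasis)[j], k).derivative(2)(xtilde)
         for j in range(nbasis)])
    return Btilde2.T @ (w[:, None] * Btilde2)    # (Btilde'')^T diag(w) Btilde''
```

The zero-width gaps created by the repeated boundary knots simply receive zero Simpson weight, so they can be left in t.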
3 Comparison with P-splines
The closest relatives of O’Sullivan penalised splines are the P-splines of Eilers & Marx (1996). If the interior knots κ5, ..., κK+4 are taken to be equally-spaced then the family of cubic P-splines is given by (2) with Ω replaced by D_k^T D_k, where D_k is the kth-order differencing matrix. This differencing penalty corresponds to a discrete approximation to the integrated square of the kth derivative of the B-spline smoother. The choice k = 2 leads to the cubic P-spline estimate

f̂_P(x;λ) = Bx ν̂_P,   ν̂_P ≡ (B^T B + λ D_2^T D_2)^{−1} B^T y,   (4)

having the property that f̂_P(·;λ) approaches the least squares line as λ → ∞. In this sense, (4) is the closest relative of f̂_O(·;λ). If the interior knots are equally-spaced then the bands in the interior rows are, up to multiplicative factors, as follows:
O’Sullivan penalised splines (2):   1, 0, −9, 16, −9, 0, 1
Cubic P-splines; 2nd order diff. (4):   1, −4, 6, −4, 1
Figure 2 facilitates visual comparison of the two. It is seen that the differences are relatively small, although not negligible.

Figure 2: Comparison of near-diagonal entries of the penalty matrices for O’Sullivan penalised splines and cubic P-splines with k = 2 and equally-spaced interior knots.

Hereafter we refer to them as O’Sullivan penalised splines, or O-splines for short. Theoretical comparison between P-splines and O-splines in terms of estimation performance, perhaps in the spirit of Hall & Opsomer (2005), would be ideal, although beyond the scope of the current paper.
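The differencing penalty used by cubic P-splines is particularly easy to form. A small Python sketch (diff_penalty is an illustrative name, not from the paper) shows the D_k^T D_k construction for k = 2:

```python
import numpy as np

def diff_penalty(nbasis, order=2):
    # D_k is the kth-order differencing matrix; np.diff of the identity produces it.
    D = np.diff(np.eye(nbasis), n=order, axis=0)
    return D.T @ D

P = diff_penalty(7)
print(P[3])   # an interior row, carrying the 1, -4, 6, -4, 1 band pattern
```

The same function with order=1 gives the first-order difference penalty sometimes used for shrinkage toward a constant.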
Eilers and Marx (1996) partially justify use of P-splines rather than O’Sullivan splines
based on simplicity of the P-spline penalty matrix. However, as seen from (3), the penalty
matrix needed for O-splines can be obtained straightforwardly. Furthermore the discrete
approximation of P-splines requires equally-spaced knots which, depending on f, may
not be desirable.
A possible advantage of P-splines is the option of higher-order penalties, although
the resulting smoothers can have erratic extrapolation behaviour. A possible advantage
of O-splines is their direct relationship with time-honoured smoothing splines, and their
attractive theoretical properties (e.g. Nussbaum, 1985). From the results described in
Section 2 it is clear that O-splines approach smoothing splines as K → n. But how close are
O-splines to smoothing splines for common (smaller) choices of K, and are they closer
than P-splines with the same value of K and interior knots? To address these questions
we conducted an empirical study based on the eighteen homoscedastic nonparametric
regression settings in Wand (2000). For O-splines we used K = 100 equally spaced inte-
rior knots with 4 repeated knots at each boundary as described in Section 2. However, for
P-splines we used the knot sequence described in the Appendix of Eilers & Marx (1996)
which involves extending the knots beyond the boundary rather than repeating them.
For each setting 200 samples were generated and smoothing spline estimates f̂_S, with smoothing parameter chosen via generalised cross-validation, were obtained. We then computed f̂_O and f̂_P to have the same effective degrees of freedom as f̂_S and recorded the closeness measures d(f̂_O, f̂_S; A) and d(f̂_P, f̂_S; A), where

d(f, g; A) ≡ ∫_A (f − g)².
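The degrees-of-freedom matching step can be sketched generically: for a penalised smoother of the form (2) or (4), the effective degrees of freedom tr{B(B^T B + λP)^{−1} B^T} decrease monotonically in λ, so λ can be chosen to hit a target value by bisection. The Python sketch below is our illustration with placeholder inputs Bmat and P, not the authors' code.

```python
import numpy as np

def edf(Bmat, P, lam):
    # Effective degrees of freedom: trace of the hat matrix
    # B (B^T B + lam P)^{-1} B^T = trace of (B^T B + lam P)^{-1} B^T B.
    M = np.linalg.solve(Bmat.T @ Bmat + lam * P, Bmat.T @ Bmat)
    return np.trace(M)

def match_edf(Bmat, P, target, lo=1e-10, hi=1e10, tol=1e-6):
    # Bisection on log(lambda); edf is monotone decreasing in lambda.
    for _ in range(200):
        mid = np.sqrt(lo * hi)
        if edf(Bmat, P, mid) > target:
            lo = mid      # lambda too small: fit too flexible
        else:
            hi = mid      # lambda too large: fit too smooth
        if hi / lo < 1 + tol:
            break
    return np.sqrt(lo * hi)
```

Applying this with P = Ω gives the matched-flexibility f̂_O, and with P = D_2^T D_2 the matched f̂_P.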
We took A corresponding to the intervals (a, κ5) (left boundary), (κ5, κK+4) (interior), (κK+4, b) (right boundary) and (a, b) (total region), where the κk denote the knots used for the O-spline fits. The Wand (2000) settings all involve predictor data within the unit