On O'Sullivan penalised splines and semiparametric regression

Australian & New Zealand Journal of Statistics (Impact Factor: 0.42). 05/2008; 50(2):179 - 198. DOI: 10.1111/j.1467-842X.2008.00507.x
Source: arXiv


An exposition on the use of O'Sullivan penalized splines in contemporary semiparametric regression, including mixed model and Bayesian formulations, is presented. O'Sullivan penalized splines are similar to P-splines, but have the advantage of being a direct generalization of smoothing splines. Exact expressions for the O'Sullivan penalty matrix are obtained. Comparisons between the two types of splines reveal that O'Sullivan penalized splines more closely mimic the natural boundary behaviour of smoothing splines. Implementation in modern computing environments such as Matlab, R and BUGS is discussed.
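The abstract states that exact expressions for the O'Sullivan penalty matrix are obtained. As a hedged illustration (not the authors' code), the sketch below computes Ω with entries Ω_kk' = ∫_a^b B_k''(x) B_k'''(x) dx for a clamped cubic B-spline basis B_1, …, B_{K+4} on [a, b] with K distinct interior knots strictly inside (a, b). Each B_k'' is linear between consecutive knots, so every product is quadratic there and Simpson's rule on each inter-knot interval is exact. The use of SciPy's `BSpline` and the helper name `osullivan_penalty` are assumptions made for illustration.

```python
import numpy as np
from scipy.interpolate import BSpline


def osullivan_penalty(a, b, interior_knots):
    """Omega[k, l] = integral over [a, b] of B_k''(x) * B_l''(x) dx (cubic B-splines).

    Hypothetical helper: assumes distinct interior knots strictly inside (a, b).
    """
    kappa = np.sort(np.asarray(interior_knots, dtype=float))
    # Clamped knot vector: each boundary knot repeated 4 times for cubic splines.
    t = np.concatenate([[a] * 4, kappa, [b] * 4])
    n_basis = len(t) - 4  # K + 4 basis functions
    # Identity coefficients evaluate every basis function at once (one column each).
    second = BSpline(t, np.eye(n_basis), 3).derivative(2)
    # Simpson's rule per inter-knot interval: nodes lo, mid, hi; weights w/6, 4w/6, w/6.
    breaks = np.concatenate([[a], kappa, [b]])
    lo, hi = breaks[:-1], breaks[1:]
    width = hi - lo
    nodes = np.concatenate([lo, 0.5 * (lo + hi), hi])
    weights = np.concatenate([width / 6, 4 * width / 6, width / 6])
    D = second(nodes)  # shape (3 * number_of_intervals, n_basis)
    return D.T @ (weights[:, None] * D)
```

For example, `osullivan_penalty(0.0, 1.0, np.linspace(0, 1, 22)[1:-1])` returns a 24 × 24 matrix whose null space is spanned by the coefficient vectors of constant and linear functions. As with smoothing splines, a penalized fit B(BᵀB + λΩ)⁻¹Bᵀy therefore shrinks towards a straight line rather than towards zero.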



Available from: John T. Ormerod, Oct 13, 2014
  • Source
    • "2q − 1 defines the smoothing spline estimator. In this article only low-rank spline smoothers with k = o(n) and λ > 0 are considered and referred to as penalized spline estimators, see Wand and Ormerod (2008). "
    ABSTRACT: There are two popular smoothing parameter selection methods for spline smoothing. First, criteria that approximate the average mean squared error of the estimator (e.g. generalized cross validation) are widely used. Alternatively, the maximum likelihood paradigm can be employed under the assumption that the underlying function to be estimated is a realization of some stochastic process. In this article the asymptotic properties of both smoothing parameter estimators are studied and compared in the frequentist and stochastic framework for penalized spline smoothing. Consistency and asymptotic normality of the estimators are proved and small sample properties are discussed. A simulation study and a real data example illustrate the theoretical findings.
    Article · Sep 2013 · Journal of the Royal Statistical Society Series B (Statistical Methodology)
  • Source
    • "The $z_1(\cdot), \ldots, z_K(\cdot)$ represent spline basis functions and in this paper O'Sullivan splines, providing a close approximation to smoothing splines, are used for this purpose (Wand and Ormerod, 2008). Note that (1) can be interpreted as a linear mixed model and leads to the following Bayesian Gaussian response model "
    ABSTRACT: This paper proposes a method for semiparametric regression analysis of large-scale data which are distributed over multiple hosts. This enables modeling of nonlinear relationships and both the batch approach, where analysis starts after all data have been collected, and the real-time setting are addressed. The methodology is extended to operate in evolving environments, where it can no longer be assumed that model parameters remain constant over time. Two areas of application for the methodology are presented: regression modeling when there are multiple data owners and regression modeling within the MapReduce framework. A website,, illustrates the use of the proposed method on United States domestic airline data in real-time.
    Article · Jun 2013 · IEEE Transactions on Knowledge and Data Engineering
  • Source
    • "…, $\kappa_K$ is a dense set of knots placed over the range of the $x_i$s and $r_+ = \max(0, r)$ for any real number $r$. However, we recommend a smoother and more numerically stable choice for $z_k$, such as those described in Welham et al. (2007) and Wand and Ormerod (2008). The number of basis functions $K$ has a minor effect on the efficacy of (3) and, for most signals arising in practice, $K = 25$ is sufficient."

    Full-text · Article · Feb 2013
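The citing excerpts above describe low-rank penalized spline estimators with λ > 0, the truncated line basis z_k(x) = (x − κ_k)_+ with K = 25 knots, and generalized cross validation (GCV) as one smoothing parameter selection criterion. The sketch below combines these under stated assumptions. It fits y_i = β_0 + β_1 x_i + Σ_k u_k (x_i − κ_k)_+ + ε_i by minimizing ‖y − Cθ‖² + λ Σ_k u_k², and picks λ from a grid by minimizing GCV(λ) = n·RSS(λ) / (n − tr S_λ)². Knots at quantiles of the unique x values, the log-spaced λ grid, and the helper names `truncated_line_design` and `fit_gcv` are hypothetical choices. The last excerpt itself recommends a smoother basis in practice; truncated lines are used here only because they are the simplest to write down.

```python
import numpy as np


def truncated_line_design(x, num_knots=25):
    """Design matrix [1, x, (x - kappa_1)_+, ..., (x - kappa_K)_+].

    Assumption: knots at equally spaced quantiles of the unique x values.
    """
    probs = np.linspace(0.0, 1.0, num_knots + 2)[1:-1]
    knots = np.quantile(np.unique(x), probs)
    Z = np.maximum(x[:, None] - knots[None, :], 0.0)  # r_+ = max(0, r)
    return np.column_stack([np.ones_like(x), x, Z])


def fit_gcv(x, y, num_knots=25, lambdas=None):
    """Penalized least squares with penalty lam * sum(u_k**2); lam chosen by GCV."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if lambdas is None:
        lambdas = np.logspace(-8, 4, 121)  # hypothetical search grid
    C = truncated_line_design(x, num_knots)
    n = len(y)
    # Only the spline coefficients u_k are penalized, not the intercept or slope.
    D = np.diag(np.r_[0.0, 0.0, np.ones(num_knots)])
    CtC, Cty = C.T @ C, C.T @ y
    best = None
    for lam in lambdas:
        A = CtC + lam * D
        coef = np.linalg.solve(A, Cty)
        fitted = C @ coef
        # Effective degrees of freedom: tr(S) with S = C A^{-1} C', i.e. tr(A^{-1} C'C).
        df = np.trace(np.linalg.solve(A, CtC))
        rss = np.sum((y - fitted) ** 2)
        gcv = n * rss / (n - df) ** 2
        if best is None or gcv < best["gcv"]:
            best = {"lam": lam, "gcv": gcv, "df": df, "coef": coef, "fitted": fitted}
    return best
```

Treating the u_k as random effects with variance σ_u² recovers the linear mixed model reading mentioned in the second excerpt, with λ = σ_ε²/σ_u². Estimating the variance components by (restricted) maximum likelihood is then the alternative to GCV discussed in the first excerpt.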