All content in this area was uploaded by Vito Muggeo on Mar 06, 2018
Using the R package quantregGrowth:
some examples
Vito M.R. Muggeo∗
Università di Palermo, Italy
Abstract
We provide some examples and comments on fitting nonparametric quantile
regression via the R package quantregGrowth. This is a short note meant to
be a quick reference for the practitioner. Neither theory nor references are
reported.
1 Background
Additive quantile regression (QR) aims to model covariate effects on the response
quantiles without imposing rigid parametric functions. For the response
variable $Y$, covariates $x$ and $z$, and an assigned probability value $\tau \in (0,1)$, the QR
equation can be written as
$$Q_Y(\tau \mid x, z) = \sum_{j}^{J} s_j(x_j) + z^T\beta \qquad (1)$$
where $z^T\beta$ represents a linear predictor, and the term $s_j(\cdot)$ accounts for the nonlinear
effect of covariate $x_j$. We assume that the smooth yet unspecified function
$s_j(\cdot)$ is expressed via B-splines, i.e. $s_j(\cdot) = \sum_{k}^{K_j} b_{jk} B_{jk}(\cdot)$, where $K_j$ is the number
of basis functions, i.e. $B_j = [B_{j1}, \ldots, B_{jK_j}]$ with coefficients $b_j = (b_{j1}, \ldots, b_{jK_j})^T$.
The objective function for (1) to be minimized is $L(b, \beta) = \sum_i \rho_\tau(y_i - B_i^T b - z_i^T\beta)$,
where $\rho_\tau(\cdot)$ is the usual check function. However, to avoid overfitting coming from
an excessive number of basis functions, proper penalties are applied to the spline
coefficients. The penalized objective can be written as
$$L_\lambda(b, \beta) = \sum_i \rho_\tau(y_i - B_i^T b - z_i^T\beta) + \sum_j \lambda_j \|D_j^d b_j\|_1. \qquad (2)$$
The penalty $\|D_j^d b_j\|_1$ is the sum of absolute values of the coefficient differences and
$\lambda_j$ is the smoothing parameter. For instance, to penalize the first-order differences
($d = 1$) for the $j$th smooth term, $\|D_j^1 b_j\|_1 = \sum_{k}^{K_j-1} |b_{jk} - b_{j,k-1}|$.
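As a small base R illustration of the two ingredients of (2), the sketch below codes the check function, written in its usual form ρτ(u) = u(τ − I(u < 0)), and the L1 norm of the coefficient differences. The names rho_tau and diff_penalty are hypothetical helpers introduced here for illustration; they are not functions of quantregGrowth, and the toy numbers are arbitrary.

```r
# Check function: rho_tau(u) = u * (tau - I(u < 0))
rho_tau <- function(u, tau) u * (tau - (u < 0))

# L1 norm of the d-th order differences of a coefficient vector b
diff_penalty <- function(b, d = 1) sum(abs(diff(b, differences = d)))

# Toy data: among the observed values, the constant that minimises the
# summed check loss at tau = 0.5 is the sample median
y <- c(1, 2, 3, 4, 100)
loss <- sapply(y, function(q) sum(rho_tau(y - q, tau = 0.5)))
best <- y[which.min(loss)]

# Toy coefficients: penalty values for difference orders d = 1, 2, 3
b <- c(1, 3, 6, 10)
pen <- sapply(1:3, function(d) diff_penalty(b, d))
```

Here best equals median(y), and pen shows how the same coefficients are penalised differently depending on the difference order.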
Estimation of the QR model (1) via minimization of (2) is straightforward using
the quantregGrowth package. As a quick example, for a hypothetical response
y, continuous covariates x1 and x2 assumed to have nonlinear effects, and an additional
linear term z (possibly categorical), the code line would be
gcrq(y~ps(x1, lambda=l1, d=d1)+ ps(x2, lambda=l2, d=d2)+z, tau=.5)
where l1 and l2 are the known smoothing parameter values and d1 and d2 the difference
orders.
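To make the representation of a smooth term as a linear combination of B-spline basis functions concrete, here is a minimal sketch using bs() from the splines package, which ships with the standard R distribution. It is only a stand-in: the basis built internally by ps() may differ (for instance in how knots are placed), and the coefficient values below are arbitrary numbers chosen for illustration.

```r
library(splines)  # ships with the standard R distribution

set.seed(1)
x1 <- sort(runif(50))

# Cubic B-spline basis with K = 8 functions (intercept = TRUE keeps all of them)
B <- bs(x1, df = 8, degree = 3, intercept = TRUE)

# Arbitrary coefficients b_1, ..., b_K, chosen only for illustration
b <- c(0, 1, 3, 2, 2, 4, 5, 5)

# Smooth term evaluated at the data: s(x1) = sum_k b_k B_k(x1)
s <- drop(B %*% b)
```

Each row of B sums to one, so every value of s is a weighted average of the coefficients; this is why penalising differences between adjacent coefficients controls the wiggliness of the fitted curve.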
∗Email: vito.muggeo@unipa.it
2 Fitting the model
We simulate data in R using the code reported in ?rqss:
> set.seed(12345)
> n <- 200
> x <- sort(rchisq(n,4))
> y <- log(x)+ .1*(log(x))^2 + log(x)*rnorm(n,0,2)/4
To begin with, we fit simple QR models assuming λ= 4 and difference order
values d= 1,2,3
> library(quantregGrowth)
>
> m1<- gcrq(y~ps(x, lambda=4, d=1), tau=.5)
> m2<- gcrq(y~ps(x, lambda=4, d=2), tau=.5)
> m3<- gcrq(y~ps(x, lambda=4, d=3), tau=.5)
> par(mfrow=c(1,3))
> plot(x,y,col=grey(.8)); plot(m1, add=TRUE)
> plot(x,y,col=grey(.8)); plot(m2, add=TRUE)
> plot(x,y,col=grey(.8)); plot(m3, add=TRUE)
[Plot omitted: three scatterplot panels of y against x, with x ranging over 0 to 20 and y over −2 to 5.]
Figure 1: Scatterplots with the fitted quantile regression lines superimposed. The
continuous lines come from the fits having λ= 4, the dashed lines refer to fits
obtained at λ= 1000.
Apart from the possible undersmoothing (i.e. λ too small), to be discussed later,
the figures emphasize how the fits depend on the d value employed. Indeed, the
fitted curve is a piecewise polynomial of degree d−1 at intermediate values of λ.
Thus, to avoid sharp changes in the fitted curves, d ≥ 3 is suggested, and d=3 is set
as the default value in ps().
In each panel, the dashed line portrays the resulting curve coming from objects
fitted with large values of λ. For instance in the left panel, the dashed line refers
to the model
> gcrq(y~ps(x, lambda=1000, d=1), tau=.5)
It is clearly seen, as is well known from theory, that the fitted curve approaches
a polynomial of degree d−1 as λ → ∞.
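A quick way to see why a heavy penalty pushes the fit towards a polynomial of degree d−1 is to note that the dth-order differences of coefficients lying on such a polynomial (in the coefficient index) are all zero, so these coefficient sequences are not penalised at all. The base R sketch below checks this for a quadratic sequence; it assumes equally spaced knots, as in standard P-splines, and the coefficient values are arbitrary.

```r
k <- 1:10

# Coefficients lying on a quadratic (degree 2) in the index k
b <- 2 + 3 * k - 0.5 * k^2

# Third-order differences vanish, so a d = 3 penalty leaves b unpenalised
pen3 <- sum(abs(diff(b, differences = 3)))

# Second-order differences are constant (here -1), so a d = 2 penalty
# is strictly positive for the same coefficients
d2 <- diff(b, differences = 2)
pen2 <- sum(abs(d2))
```

With d = 3 the penalty is zero whatever the value of λ, whereas with d = 2 the same coefficients would be shrunk towards a straight line as λ grows.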
We continue the illustration of the package by considering responses that also depend on
another covariate:
> z <- x + rnorm(n) # new (linear) covariate
> y <- y + z # new responses
Fitting the regression quantile model via gcrq() is straightforward:
we simply add z to the formula. In practice the smoothing parameter value is also unknown,
and thus we ‘select’ the optimal λ via cross validation by providing a
set of candidate values in ps(). We also set n.boot>0 to run (case resampling)
bootstrap replicates in order to quantify uncertainty in the estimates.
> m5<- gcrq(y~ps(x, lambda=seq(0.1,20,l=30))+z, tau=.5, n.boot=200)
>
> par(mfrow=c(1,2))
> plot(m5, cv=TRUE)
> plot(m5, res=TRUE, conf.level=.95, col=2, lwd=2, lty=1)
[Plot omitted: left panel, cross validation score against lambda values; right panel, y against x.]
Figure 2: Cross validation scores (left panel) and fitted quantile curve with 95%
pointwise confidence intervals for the fitted model m5.
If cross validation has been run, plot.gcrq(..,cv=TRUE) shows the CV scores
against the candidate lambda values; otherwise (cv=FALSE, the default) the fitted quantile
regression line is plotted. Rather than plotting the data and adding the fitted lines
(as in Figure 1), the argument res in plot.gcrq() is used to portray the ‘partial’
residuals (defined as in the GLM framework) along with pointwise confidence intervals
(provided that conf.level>0).
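The resampling devices used for m5 can be sketched in base R. The code below only illustrates the ideas (a grid of candidate values, random fold labels, and case resampling); it does not reproduce the internals of gcrq(), and the number of folds K = 10 is an assumption made here for illustration.

```r
set.seed(12345)
n <- 200

# Candidate smoothing parameters: l=30 in the call for m5 abbreviates length.out=30
lambdas <- seq(0.1, 20, length.out = 30)

# Random fold labels for K-fold cross validation (K = 10 is an assumption)
K <- 10
foldid <- sample(rep(seq_len(K), length.out = n))

# Case resampling: each bootstrap replicate redraws the n row indices with
# replacement; a model would then be refitted on the rows boot_rows[, r]
n.boot <- 200
boot_rows <- replicate(n.boot, sample(n, replace = TRUE))
```

Each column of boot_rows holds the row indices of one bootstrap sample, and foldid assigns every observation to exactly one of the K folds.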
Depending on the application, the fall of the fitted curve at large covariate values
(see the right panel in Figure 2) could turn out to be biologically unsound, for instance if
the response is a growth variable supposed to be a nondecreasing function of the
covariate age. Therefore we fit the same model with monotonicity constraints on
the fitted curve. Also, to run cross validation using the same folds employed in
model m5, we extract this information and pass it to the new call, and then we plot
the results accordingly.
> id.fold<-attr(m5$cv,"foldid")
> m6<-gcrq(y ~ps(x, lambda=seq(0.1,20,l=30), monotone=1) + z,
+ foldid=id.fold, tau=.5, n.boot=200)
>
> par(mfrow=c(1,2))
> plot(m6, cv=TRUE)
> plot(m6, res=TRUE, conf.level=.95, shade=TRUE, col=3, lwd=2,
+ pch.p=2, cex.p=.6)
where pch.p and cex.p (as well as col.p not used above) refer to the residuals in
the plot, if res=TRUE.
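One standard way to obtain a nondecreasing B-spline curve is to force the spline coefficients themselves to be nondecreasing. Whether monotone=1 is implemented in exactly this way is an assumption here; the sketch below only illustrates the principle with bs() from the splines package, using arbitrary coefficient values and a hypothetical helper is_nondecreasing.

```r
library(splines)  # ships with the standard R distribution

x <- seq(0, 20, length.out = 201)
B <- bs(x, df = 10, degree = 3, intercept = TRUE)

# Nondecreasing coefficients: cumulative sums of non-negative increments
b_mono <- cumsum(c(0, 1, 0.5, 2, 0, 1, 1.5, 0.5, 1, 0.2))
s_mono <- drop(B %*% b_mono)

# Same coefficients, but the last one drops below its neighbour, so the
# curve falls at large x
b_drop <- c(b_mono[-10], b_mono[9] - 2)
s_drop <- drop(B %*% b_drop)

# Hypothetical helper: TRUE if a curve on sorted x never decreases
# (a small tolerance absorbs rounding error)
is_nondecreasing <- function(s) all(diff(s) >= -1e-10)
```

Here is_nondecreasing(s_mono) is TRUE while is_nondecreasing(s_drop) is FALSE, mirroring the difference between the constrained fit m6 and the unconstrained fit m5.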
[Plot omitted: left panel, cross validation score against lambda values; right panel, y against x.]
Figure 3: Cross validation scores (left panel) and fitted quantile curve with 95%
pointwise confidence intervals for the fitted model m6.
gcrq() stands for growth charts regression quantiles, and in fact gcrq() can
be used to estimate multiple regression quantiles (i.e. at different probability levels)
with noncrossing constraints. In some fields, such as epidemiology or medicine, multiple
quantile curves are sometimes called ‘growth charts’. To this end we just supply a
vector of values tau in the gcrq() call. We use the growthData dataset shipped with the
package,
> taus<-c(.1,.25,.5,.75,.95)
> data(growthData)
> m7<-gcrq(y ~ps(x, lambda=seq(0.1,20,l=30)) + z, data=growthData,
+ tau=taus, n.boot=200)
We have used taus<-c(.1,.25,.5,.75,.95), which is the default value and
therefore could be omitted in the gcrq() call.
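The noncrossing requirement means that, at every covariate value, the fitted curves must be ordered like their probability levels. The base R sketch below shows a simple check of this property on a matrix of curve values; is_noncrossing is a hypothetical helper and the toy curves are made up for illustration, so nothing here relies on the structure of a fitted gcrq object.

```r
taus <- c(.1, .25, .5, .75, .95)

# Hypothetical helper: rows are covariate values, columns are quantile curves
# ordered by increasing tau; TRUE if no curve ever falls below a lower one
is_noncrossing <- function(fit) {
  all(apply(fit, 1, function(r) all(diff(r) >= 0)))
}

# Toy curves on a grid: parallel lines shifted by normal quantiles never cross
x <- seq(0, 1, length.out = 11)
ok <- sapply(qnorm(taus), function(q) 1 + 2 * x + q)

# Two lines with different slopes cross inside the range of x
bad <- cbind(1 + 2 * x, 1.5 + 0.5 * x)
```

Here is_noncrossing(ok) returns TRUE and is_noncrossing(bad) returns FALSE; quantile curves fitted separately at each probability level can behave like the second case, which is what the noncrossing constraints are meant to prevent.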
We conclude this short note by reporting some code to display the fitted curves
with different options.
> par(mfrow=c(2,2))
> plot(m7, res=TRUE, lwd=2, conf.level=.95, pch.p=20, col.p=grey(.7),
+ legend=TRUE, overlap=TRUE, shade=TRUE, lty=1)
> plot(m7, res=TRUE, lwd=2, conf.level=.95, pch.p=20, col.p=grey(.7),
+ legend=TRUE, overlap=FALSE, shade=TRUE, lty=1, col=2:6)
> plot(m7, res=TRUE, lwd=2, conf.level=.95, pch.p=20, col.p=2,
+ legend=TRUE, overlap=FALSE, shade=TRUE, lty=1, col=1)
> plot(m7, res=TRUE, lwd=2, conf.level=.95, pch.p=20, legend=TRUE,
+ overlap=FALSE, lty=1, col=2:4, select.tau=c(1,3,5), cex=1.1,
+ ylab="my response", xlab="covariate", xlim=c(0.1,0.4),ylim=c(2,12))
Among the different (rather intuitive) arguments used in plot.gcrq(), it is
worth noting select.tau, which allows displaying only some of the fitted curves,
cex, which refers to the legend font size, and xlim and ylim, used to zoom in on some
area of the plot.
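Judging from the last plot() call and the legend of the corresponding panel in Figure 4 (0.10, 0.50, 0.95), the values passed to select.tau appear to be read as positions within the tau vector rather than as probability levels. Under that assumption, the selection amounts to plain indexing:

```r
taus <- c(.1, .25, .5, .75, .95)
select.tau <- c(1, 3, 5)

# Positions 1, 3 and 5 pick the curves at these probability levels
shown <- taus[select.tau]
```

The vector shown then contains 0.10, 0.50 and 0.95, the three curves displayed in that panel.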
[Plot omitted: four panels. The first three show y against x with legend entries 0.10, 0.25, 0.50, 0.75, 0.95; the fourth shows "my response" against "covariate" with legend entries 0.10, 0.50, 0.95.]
Figure 4: Plots of ‘growth charts’ obtained via plot.gcrq() with different graphical
options.
3 Conclusions
As reported in the Abstract, this note is simply a sort of (extended) help file. The
R package quantregGrowth is still under development. New releases will include
facilities for additional spline bases, varying coefficients and computation of standard
errors with better statistical properties. If you use quantregGrowth in your
work/paper, please consider citing the reference paper
Muggeo V.M.R., Sciandra M., Tomasello A., Calvo S. (2013) Estimating growth
charts via nonparametric quantile regression: a practical framework with application
in Ecology. Environmental and Ecological Statistics, 20, 519-531.