The R package islasso:
estimation and hypothesis testing
in lasso regression
Gianluca Sottile, Giovanna Cilluffo, Vito M.R. Muggeo
Abstract
In this short note we present and briefly discuss the R package islasso, which
deals with regression models having a large number of covariates. Estimation is
carried out by penalizing the coefficients via a quasi-lasso penalty, wherein the
nonsmooth lasso penalty is replaced by its smooth counterpart, determined
iteratively from the data according to the induced smoothing idea. The package
includes functions to estimate the model and to test linear hypotheses on linear
combinations of the relevant coefficients. We illustrate the R code through a
worked example, intentionally avoiding details and an extended bibliography.
1 Introduction
Let y = Xβ + ε be the linear model of interest, with the usual zero-mean and
homoscedastic errors. As usual, y = (y_1, ..., y_n)^T is the response vector and
X is the n × p design matrix (with p quite large), with regression coefficients β.
When interest lies in selecting the non-noise covariates and estimating the
relevant effects, one assumes the lasso penalized objective function (Tibshirani, 1996),
(1/2) ||y − Xβ||₂² + λ ||β||₁        (1)
to be minimized at fixed λ > 0. As is well known, the lasso penalty ||β||₁
rules out the noise covariates by returning exactly zero estimates in the model
output.
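To see where these exact zeros come from, recall that under an orthonormal design the lasso solution is the soft-thresholding of the least-squares estimate, coefficient by coefficient. A minimal base-R sketch (the function name soft_threshold and the numbers are ours, purely for illustration):

```r
# Soft-thresholding operator: under an orthonormal design, the lasso
# estimate of a single coefficient is S(b_ols, lambda), which is
# exactly zero whenever |b_ols| <= lambda.
soft_threshold <- function(b, lambda) {
  sign(b) * pmax(abs(b) - lambda, 0)
}

b_ols <- c(-3.2, -0.4, 0.1, 0.9, 2.5)  # hypothetical OLS estimates
soft_threshold(b_ols, lambda = 1)
# small coefficients are set exactly to zero,
# larger ones are shrunk towards zero by lambda
```

This is why the lasso performs variable selection, and also why its objective is nonsmooth at zero, the source of the inferential difficulties discussed next.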
Model estimation via the aforementioned penalized objective does not come
without a price. The non-null estimates are shrunk towards zero and, probably
more importantly, inference on the model is complicated and not straightforward.
In other words, no confidence intervals or p-values on (linear combinations of) β
are easily obtained. The islasso package aims at filling this gap, at least
partially. More specifically, at the time of writing, the package returns point
estimates, reliable standard errors and corresponding p-values for the regression
coefficients and any linear combination of them. We do not provide details of the
methodology, which can be found in Cilluffo et al. (2019). Rather, we describe
the R functions in the package by providing a worked example.
Dip. di Scienze Economiche, Aziendali e Statistiche, Università di Palermo, Italy.
Email: gianluca.sottile@unipa.it, vito.muggeo@unipa.it
Istituto per la Ricerca e l'Innovazione Biomedica (IRIB), Consiglio Nazionale delle
Ricerche (CNR), Palermo, Italy. Email: giovanna.cilluffo@ibim.cnr.it
2 The R functions
The main function of the package is islasso() where the user supplies the model
formula as in the usual lm or glm functions, i.e.
islasso(formula, family, lambda, alpha, data, weights, subset,
offset, unpenalized, contrasts, control = is.control())
The family argument accepts the specification of family and link function as in
Table 1, lambda is the tuning parameter, and unpenalized allows one to indicate
covariates with unpenalized coefficients.
Table 1: Families and link functions allowed in islasso

  Family     Link functions
  gaussian   identity
  binomial   logit, probit
  poisson    log
  gamma      identity, log, inverse
The fitter function is islasso.fit(), which reads as
islasso.fit(X, y, family, lambda, alpha = 1, intercept = FALSE,
weights = NULL, offset = NULL, unpenalized = NULL,
control)
which actually implements the estimating algorithm as described in the paper.
The lambda argument of islasso.fit and islasso specifies the positive tuning
parameter in the penalized objective. Any non-negative value can be provided but,
if missing, it is computed via K-fold cross validation by the function cv.glmnet()
from the package glmnet (Friedman et al., 2010). The number of folds can be
specified via the argument nfolds of the auxiliary function is.control().
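The K-fold cross-validation logic behind this default can be sketched in a few lines of base R. Since base R has no lasso fitter, the sketch below uses ridge regression (which has a closed form) as a stand-in for the penalized fit; only the fold-splitting and error-averaging mechanics mirror what cv.glmnet() does, and all names (ridge_fit, cv_error) are ours:

```r
set.seed(1)
n <- 100; p <- 5
X <- matrix(rnorm(n * p), n, p)
y <- drop(X %*% c(2, -1, 0, 0, 0) + rnorm(n))

# Closed-form ridge fit, standing in for the penalized estimator
ridge_fit <- function(X, y, lambda) {
  solve(crossprod(X) + lambda * diag(ncol(X)), crossprod(X, y))
}

# K-fold cross validation: average held-out squared error at a given lambda
cv_error <- function(X, y, lambda, K = 10) {
  folds <- sample(rep(seq_len(K), length.out = nrow(X)))
  errs <- sapply(seq_len(K), function(k) {
    test <- folds == k
    beta <- ridge_fit(X[!test, , drop = FALSE], y[!test], lambda)
    mean((y[test] - X[test, , drop = FALSE] %*% beta)^2)
  })
  mean(errs)
}

lambdas <- c(0.01, 0.1, 1, 10, 100)
cv <- sapply(lambdas, function(l) cv_error(X, y, l))
lambdas[which.min(cv)]  # the analogue of cv.glmnet()'s lambda.min
</imports>
```

The chosen λ is simply the grid value minimizing the averaged held-out error, exactly the role lambda.min plays in cv.glmnet().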
3 A worked example: the Diabetes data set
We use the well-known diabetes dataset available in the lars package. The data
refer to n = 442 patients enrolled to investigate a measure of disease progression
one year after baseline. There are ten covariates: age, sex, bmi (body mass
index), map (average blood pressure) and several blood serum measurements (tc,
ldl, hdl, tch, ltg, glu). The matrix x2 in the dataframe also includes
second-order terms, namely first-order interactions between covariates and
quadratic terms for the continuous variables.
To select the important terms in the regression equation we apply the lasso:
> library(glmnet)
> library(lars)
> data(diabetes)
> a1 <- with(diabetes, cv.glmnet(x2, y))
> n <- nrow(diabetes)
> a1$lambda.min*n #the lambda value of (1)
[1] 1344.186
> b <- drop(coef(a1, "lambda.min")) #coeffs at the optimum lambda
> length(b[b != 0])
[1] 15

Ten-fold cross validation 'selects' λ = 1344.2, corresponding to 15 non-null
coefficients, the last of which are, just to illustrate,

> tail(b[b != 0])
     glu^2    age:sex    age:map    age:ltg    age:glu    bmi:map
 69.599081 107.479925  29.970061   8.506032  11.675332  85.530937

A reasonable question is whether all the 'selected' coefficients are significant
in the model. Unfortunately, lasso regression does not return standard errors due
to the nonsmoothness of the objective, and some alternative approaches have been
proposed. One of them is the 'covariance test' (Lockhart et al., 2014), as
implemented in the package covTest:

> library(covTest)
> o <- with(diabetes, lars(x2, y))
> with(diabetes, covTest(o, x2, y))$results
Predictor_Number Drop_in_covariance P-value
3 20.1981 0.0000
9 52.5964 0.0000
4 5.7714 0.0034
7 4.0840 0.0176
37 1.3310 0.2655
20 0.3244 0.7232
.....................
The covTest approach suggests that only the terms corresponding to columns 3,
9, 4, and 7 of the matrix x2 are significant, namely
> colnames(diabetes$x2)[c(3, 9, 4, 7)]
[1] "bmi" "ltg" "map" "hdl"

However, covTest returns p-values across the λ path. This means that such
p-values are not matched to the corresponding point estimates obtained at the
optimal λ value (λ = 1344.2 in this example). As a consequence, some
discrepancies between the results of covTest and glmnet/lars are likely to
occur. For instance, out of the 15 selected non-null coefficients, just 4 are
assessed as significant.

The R package islasso provides an alternative to covTest by implementing the
recent 'quasi-lasso' approach based on the induced smoothing idea (Brown and
Wang, 2005), as discussed in Cilluffo et al. (2019). Point estimates and
p-values are returned within the same framework. While the optimal lambda could
be selected automatically (by not supplying any value to lambda), we use the
same value as above to facilitate comparisons

> library(islasso)
> out <- islasso(y ~ x2, data=diabetes, lambda=1344.186)

The summary method quickly returns the main output of the fitted model,
including point estimates, standard errors and p-values

> summary(out)

Call:
islasso(formula = y ~ x2, lambda = 1344.186, data = diabetes)

Residuals:
    Min      1Q  Median      3Q     Max
-138.74  -40.18   -4.53   34.45  143.43

              Estimate Std. Error    Df z value Pr(>|z|)
(Intercept)  1.521e+02  2.554e+00 1.000  59.570  < 2e-16 ***
x2age        1.873e-01  2.408e+01 0.005   0.008  0.99379
x2sex       -1.149e+02  5.377e+01 0.891  -2.137  0.03258 *
x2bmi        4.952e+02  7.058e+01 1.000   7.016 2.29e-12 ***
x2map        2.514e+02  6.447e+01 0.999   3.899 9.64e-05 ***
x2tc        -4.514e-01  2.837e+01 0.012  -0.016  0.98730
......................
x2tch:glu    2.848e-01  2.546e+01 0.006   0.011  0.99107
x2ltg:glu    2.712e-01  3.611e+01 0.005   0.008  0.99401
......................

Visualizing estimates for all covariates could be somewhat inconvenient,
especially when the number of covariates is large; thus one could opt to print
estimates only if their p-value is less than a specified value. We use 0.10 as
a threshold.
> summary(out, pval = .1)

Call:
islasso(formula = y ~ x2, lambda = 1344.186, data = diabetes)

Residuals:
    Min      1Q  Median      3Q     Max
-138.74  -40.18   -4.53   34.45  143.43

            Estimate Std. Error    Df z value Pr(>|z|)
(Intercept)  152.133      2.554 1.000  59.570  < 2e-16 ***
x2sex       -114.923     53.773 0.891  -2.137  0.03258 *
x2bmi        495.168     70.581 1.000   7.016 2.29e-12 ***
x2map        251.409     64.473 0.999   3.899 9.64e-05 ***
x2hdl       -189.213     67.826 0.978  -2.790  0.00528 **
x2ltg        466.026     70.701 1.000   6.592 4.35e-11 ***
x2age:sex    109.177     50.732 0.904   2.152  0.03139 *
x2bmi:map     86.476     47.404 0.812   1.824  0.06812 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for gaussian family taken to be 2855.198)

    Null deviance: 2621009 on 441.0 degrees of freedom
Residual deviance: 1222391 on 428.1 degrees of freedom
AIC: 4786.9
Lambda: 1344.2
Number of Newton-Raphson iterations: 40

In addition to the usual information printed by the summary method, the output
also includes the column Df, representing the degrees of freedom of each
coefficient. Negligible coefficients (i.e. those with an approximately null
estimate) exhibit almost zero degrees of freedom; see the previous output of
summary(out). The sum of all the degrees of freedom is used to quantify the
model complexity

> sum(out$internal$hi)
[1] 13.87174
and the corresponding residual degrees of freedom (428.1) are printed next to
the residual deviance, as reported above. The Wald test (column z value) and the
p-values can be used to assess important or significant covariates. In addition
to the terms singled out by covTest ("bmi", "ltg", "map", "hdl"), islasso()
also returns 'small' p-values for the terms "sex", "age:sex", and "bmi:map".
Simulation studies in Cilluffo et al. (2019) have shown good performance of
islasso with respect to some alternative approaches.
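The idea behind the 'quasi-lasso' penalty can be illustrated numerically. If Z ~ N(0, 1) and h > 0 is a smoothing parameter, the nonsmooth |β| is replaced by its smoothed expectation E|β + hZ| = β(2Φ(β/h) − 1) + 2hφ(β/h), which is everywhere differentiable and tends to |β| as h → 0. The base-R sketch below is our own illustration of this folded-normal identity, not code from the package, where the smoothing quantity is chosen iteratively from the data:

```r
# Smooth counterpart of |beta| under the induced smoothing idea:
# E|beta + h*Z|, with Z standard normal, is differentiable in beta
# and converges to |beta| as h -> 0.
smooth_abs <- function(beta, h) {
  beta * (2 * pnorm(beta / h) - 1) + 2 * h * dnorm(beta / h)
}

beta <- seq(-2, 2, by = 0.5)
smooth_abs(beta, h = 1)      # visibly above |beta| near zero (smooth corner)
smooth_abs(beta, h = 0.001)  # essentially |beta|
```

Because the smoothed penalty has derivatives everywhere, standard likelihood-type machinery (standard errors, Wald statistics) becomes available, which is what summary() exploits above.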
As an alternative to cross validation, it is also possible to select the tuning
parameter λ by means of the Bayesian or Akaike Information Criterion. The
function aic.islasso() requires an islasso fit object and the specification of
the criterion to be used (AIC or BIC). Hence
> lmb.bic <- aic.islasso(out, "bic")
> out1 <- update(out, lambda = lmb.bic) #fit with a BIC-based lambda
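The mechanics of criterion-based selection amount to minimizing, over a grid of λ values, something like BIC(λ) = n log(RSS(λ)/n) + log(n)·df(λ), where df(λ) measures model complexity (in islasso, the sum of the coefficient-wise degrees of freedom seen earlier). A self-contained base-R sketch, again using ridge regression as a stand-in penalized fit and with bic_ridge as our own illustrative name:

```r
set.seed(2)
n <- 100; p <- 5
X <- matrix(rnorm(n * p), n, p)
y <- drop(X %*% c(2, -1, 0, 0, 0) + rnorm(n))

# BIC for a ridge fit at a given lambda:
# df(lambda) is the trace of the hat matrix H(lambda)
bic_ridge <- function(X, y, lambda) {
  H <- X %*% solve(crossprod(X) + lambda * diag(ncol(X))) %*% t(X)
  rss <- sum((y - H %*% y)^2)
  df <- sum(diag(H))
  nrow(X) * log(rss / nrow(X)) + log(nrow(X)) * df
}

lambdas <- c(0.01, 0.1, 1, 10, 100)
bics <- sapply(lambdas, function(l) bic_ridge(X, y, l))
lambdas[which.min(bics)]  # the BIC-optimal lambda on the grid
```

Larger λ values lower df(λ) but raise the RSS; the criterion trades the two off without any data splitting, which is why it is often faster than cross validation.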
Comparisons between methods to select the tuning parameter and further dis-
cussions are beyond our goals.
We conclude this short note by emphasizing that islasso also accepts the
so-called elastic-net penalty, leading to the objective

(1/2) ||y − Xβ||₂² + λ { α ||β||₁ + (1/2)(1 − α) ||β||₂² }

where 0 ≤ α ≤ 1 is the mixing parameter, to be specified in islasso() and
islasso.fit() via the argument alpha.
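The role of the mixing parameter can be checked directly: with α = 1 the penalty in the display reduces to the lasso term, with α = 0 to the ridge term. A one-function base-R illustration (enet_penalty is our own name, not a package function):

```r
# Elastic-net penalty as in the display above:
# lambda * ( alpha * ||beta||_1 + 0.5 * (1 - alpha) * ||beta||_2^2 )
enet_penalty <- function(beta, lambda, alpha) {
  lambda * (alpha * sum(abs(beta)) + 0.5 * (1 - alpha) * sum(beta^2))
}

beta <- c(-2, 0, 1.5)
enet_penalty(beta, lambda = 1, alpha = 1)    # pure lasso: sum(abs(beta)) = 3.5
enet_penalty(beta, lambda = 1, alpha = 0)    # pure ridge: 0.5*sum(beta^2) = 3.125
enet_penalty(beta, lambda = 1, alpha = 0.5)  # an even mixture of the two
```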
4 Concluding remarks
The package islasso provides an alternative to 'plain' lasso regression. Its
main disadvantage with respect to the lasso lies in the point estimates:
islasso does not perform variable selection, in that the point estimates will
never be exactly zero; however, differences in terms of findings will typically
be negligible. Unlike the plain lasso, though, islasso returns reliable
standard errors and p-values which can be used to assess the significance of
the coefficients.
References
Brown B and Wang Y. Standard errors and covariance matrices for smoothed rank
estimators. Biometrika 2005; 92: 149–158.

Cilluffo G, Sottile G, La Grutta S and Muggeo VMR. The induced smoothed lasso:
A practical framework for hypothesis testing in high dimensional regression.
Statistical Methods in Medical Research 2019; online first,
doi: 10.1177/0962280219842890.

Friedman J, Hastie T and Tibshirani R. Regularization paths for generalized
linear models via coordinate descent. Journal of Statistical Software 2010;
33(1): 1–22.

Lockhart R, Taylor J, Tibshirani RJ, et al. A significance test for the lasso.
Annals of Statistics 2014; 42: 413–468.

Tibshirani R. Regression shrinkage and selection via the lasso. Journal of the
Royal Statistical Society: Series B 1996; 58: 267–288.