Scalable Gradient Boosting using Randomized Neural Networks
T. Moudiki
Techtonique, LLC
Abstract

This paper presents a gradient boosting machine inspired by the LS Boost model introduced in [Friedman, 2001]. Instead of using linear least squares models as base learners, the proposed algorithm is model-agnostic and employs randomized neural networks. The combination of gradient boosting and randomized neural network base learners provides a powerful and scalable solution for various machine learning tasks, as demonstrated in the examples.
1 Introduction
Gradient boosting is a powerful machine learning technique that has been successfully applied to a wide range of supervised regression and classification problems. The idea behind gradient boosting is to combine a set of weak learners to create a strong learner that can make accurate predictions.

Decision trees are, by far, the most popular choice for weak learners in gradient boosting (see for example [Chen and Guestrin, 2016], [Ke et al., 2017], [Prokhorenkova et al., 2018]). The key to the success of gradient boosting lies in the use of an additive model, where each weak learner is trained to minimize the residual error of the previous ensemble of learners. This iterative process allows the model to learn complex patterns efficiently.
In this paper, I propose a new gradient boosting model, GenericGradientBooster, inspired by the LS Boost model introduced in [Friedman, 2001]. Instead of using linear least squares models as base learners, the model proposed in this paper is model-agnostic and employs randomized neural networks, which enable highly nonlinear representations and scalability for batch learning, including GPU acceleration. The combination of gradient boosting and randomized neural network base learners hence provides a powerful and scalable solution for machine learning tasks. An implementation of the GenericGradientBooster is available online in the Python package mlsauce.
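As a usage illustration (a hypothetical sketch: only the scikit-learn-style fit/predict convention is taken for granted; the exact constructor arguments of GenericBoostingClassifier in mlsauce may differ from what is assumed here):

```python
# Hypothetical usage sketch: the way the base learner is passed to
# GenericBoostingClassifier is an assumption, not a documented signature.
import mlsauce as ms
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import ExtraTreeRegressor

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Any scikit-learn regressor can serve as the base learner.
clf = ms.GenericBoostingClassifier(ExtraTreeRegressor())
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))
```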
2 Algorithm description
Without loss of generality, I describe GenericGradientBooster in the situation where the base learner is a linear regression model. The description extends directly to situations where the base learners are anything else.
Let $y \in \mathbb{R}^n$ be a centered response variable (target), and $x_i \in \mathbb{R}^p$ standardized explanatory variables for the $i$th observation. I am interested in computing $E[y_i \mid x_i]$. That's what LS Boost from [Friedman, 2001] does:

$$F_0(x) = \bar{y}$$
For $m = 1$ to $M$ do:
$$\tilde{y}_i = y_i - F_{m-1}(x_i), \quad i = 1, \ldots, N$$
$$(\rho_m, a_m) = \arg\min_{a, \rho} \sum_{i=1}^{N} \left[ \tilde{y}_i - \rho\, h(x_i; a) \right]^2$$
$$F_m(x) = F_{m-1}(x) + \rho_m h(x; a_m)$$
endFor
end Algorithm
The pseudo-response at line 3 of the LS Boost algorithm is $\tilde{y}_i = y_i - F_{m-1}(x_i)$. Thus, line 4 simply fits the current residuals, and the line search (line 5) produces the result $\rho_m = \beta_m$, where $\beta_m$ is the optimal least squares coefficient. Therefore, gradient boosting on squared-error loss produces the usual stagewise approach of iteratively fitting the current residuals.
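As a sanity check on the notation, here is a minimal NumPy sketch of this stagewise residual-fitting loop with an ordinary least squares base learner (illustrative only; the scale factor $\rho_m$ is absorbed into the fitted coefficients, as noted above):

```python
# Minimal LS Boost sketch: at each iteration the base learner (ordinary
# least squares here) is fitted to the current residuals, and the ensemble
# is the running sum of the resulting corrections.
import numpy as np

def ls_boost_fit(X, y, M=100):
    f0 = y.mean()                        # F_0(x) = mean of y
    F = np.full(len(y), f0)
    coefs = []
    for _ in range(M):
        residuals = y - F                # pseudo-response at iteration m
        beta, *_ = np.linalg.lstsq(X, residuals, rcond=None)
        F = F + X @ beta                 # F_m = F_{m-1} + h(x; beta_m)
        coefs.append(beta)
    return f0, coefs

def ls_boost_predict(X, f0, coefs):
    return f0 + sum(X @ beta for beta in coefs)
```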
The general principles of LS Boost do apply to GenericGradientBooster with linear base learners. However, in the latter, at each boosting iteration $m$, line 4 of the LS Boost algorithm is replaced by the following randomized neural network model (with $y$ replaced by $\tilde{y}$, the current model residuals):
$\tilde{y} \in \mathbb{R}^n$, to be explained by $X^{(j)}$, $j \in \{1, \ldots, p\}$:

$$\tilde{y} = \beta_0 + \sum_{j=1}^{p} \beta_j X^{(j)} + \sum_{l=1}^{L} \gamma_l \, g\!\left( \sum_{j=1}^{p} W^{(j,l)} X^{(j)} \right) + \epsilon$$

That is:

$$\beta_m = \arg\min_{\beta \in \mathbb{R}^{p+L}} \sum_{i=1}^{N} \left[ \tilde{y}_{i,m} - \beta\, h(x_i; w_m) \right]^2 \qquad (1)$$
$w$ is drawn from a sequence of pseudo-random $\mathcal{U}([0, 1])$ numbers. In addition:

- $h$ applies an activation function $g$, an elementwise ReLU $x \mapsto \max(0, x)$, to $XW$;
- rows and columns of $X$ (a matrix containing the $x_i$'s in its rows) can be subsampled to increase the final ensemble's diversity;
- at line 5 of the LS Boost algorithm, and as suggested by [Friedman, 2001], a learning rate $0 < \nu \leq 1$ can be included.
In the more general case where the base learners are not least squares models, the closed-form fit in Eq. (1) no longer applies. Instead, a nonlinear regression of $\tilde{y}_{i,m}$ on $h(x_i)$ is performed for $m = 1$ to $M$.
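A minimal sketch of one possible implementation of this idea, assuming ridge regression for the output-layer fit and a single frozen hidden layer (mlsauce's actual implementation may differ, e.g. in its subsampling and GPU options):

```python
# Boosting with a randomized neural network base learner: the hidden weights
# W are drawn once from U([0, 1]) and never trained; at each iteration only
# the output-layer coefficients are fitted (ridge regression here) to the
# current residuals. Illustrative sketch only.
import numpy as np
from sklearn.linear_model import Ridge

def random_features(X, W):
    """h(x; W): raw inputs concatenated with ReLU(X @ W)."""
    return np.hstack([X, np.maximum(X @ W, 0.0)])

rng = np.random.default_rng(123)
n, p, L = 200, 5, 25                    # observations, features, hidden nodes
X = rng.standard_normal((n, p))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(n)

W = rng.uniform(size=(p, L))            # frozen random hidden-layer weights
H = random_features(X, W)
F = np.full(n, y.mean())                # F_0
nu, M = 0.1, 200                        # learning rate and boosting iterations
learners = []
for _ in range(M):
    residuals = y - F                   # pseudo-response
    base = Ridge(alpha=1.0).fit(H, residuals)
    F = F + nu * base.predict(H)
    learners.append(base)
```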
It's worth mentioning that, for supervised classification tasks, the model is regression-based. How? Chapter 4 of The Elements of Statistical Learning (ESL, see [Hastie et al., 2009]) describes classification using linear regression. Here, I'll describe multi-class classification using any regressor, a regressor being understood in this context as any machine learning model trying to explain a continuous variable as a function of different characteristics, a.k.a. explanatory variables.
Let:

- $n \in \mathbb{N}$ be the number of examples or observations of a variable to be explained.
- $K \in \mathbb{N}$ be the number of classes. For example, when assessing a housing credit in banking, depending on the case and its characteristics, there could be two classes: good credit and bad credit.
- $y \in \mathbb{N}^n$, with values in $\{1, \ldots, K\}$, be the variable to be explained. For example, when assessing a housing credit in banking, $y$ would contain all the past decisions on $n$ observed cases: "it's a good credit or a bad credit".

An indicator response matrix $Y \in \mathbb{N}^{n \times K}$, containing only 0's and 1's, can be obtained from $y$. Each row of $Y$ shall contain a single 1 in the column corresponding to the class where the example belongs, and 0's elsewhere. The process of transforming $y$ into $Y$ is commonly defined as one-hot encoding.
Now, let $X \in \mathbb{R}^{n \times p}$ be the set of $p$ explanatory variables (for example, when assessing a housing credit in banking, these could be age and salary, with $p = 2$) for $y$ and $Y$, with examples in rows and characteristics in columns. ESL applies $K$ least squares (linear) regression models to $X$, one for each column of $Y$. The regression's predicted values can be interpreted as raw estimates of probabilities, because the least squares solution is a conditional expectation. For $G$, a random variable describing the class $k$, for $k \in \{1, \ldots, K\}$, I have:

$$E[\mathbb{1}_{G=k} \mid X = x] = P[G = k \mid X = x]$$
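A minimal sketch of this regression-based classification scheme, with ridge regression standing in for an arbitrary plug-in regressor:

```python
# One-hot encode the target, fit one regressor per column of the indicator
# matrix Y, and predict the class whose raw score is largest (ESL, Ch. 4).
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import Ridge

X, y = load_iris(return_X_y=True)
K = len(np.unique(y))
Y = np.eye(K)[y]                                   # indicator response matrix

regressors = [Ridge().fit(X, Y[:, k]) for k in range(K)]
raw_scores = np.column_stack([reg.predict(X) for reg in regressors])
y_pred = raw_scores.argmax(axis=1)                 # largest raw estimate wins
print((y_pred == y).mean())                        # training accuracy
```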
The next section demonstrates the empirical quality of the GenericGradientBooster on multiple data sets.
3 Numerical examples
Benchmarking machine learning models is difficult because of its inherent subjectivity, and because, in general, different machine learning models do not have a common set of hyperparameters.

In the following examples, no hyperparameter tuning is performed. The aim is to demonstrate the capabilities of the proposed model, GenericGradientBooster, compared to other popular, battle-tested machine learning models (Random Forest [Breiman, 2001] and XGBoost [Chen and Guestrin, 2016]), when they are all assigned their default hyperparameters. Examples include supervised regression and classification.
3.1 Classification experiments
The data sets used for these experiments are adapted from the OpenML-CC18 suite [Bischl et al., 2017]. In the 25 modified data sets, response variables are all subsampled in a stratified way (in order to maintain the class distribution in the subsample) and contain approximately 1000 samples. Among the features of each data set, 10 columns are selected, based on a Random Forest's feature importances.
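A sketch of this preparation step (the exact sample sizes, seeds, and importance measure are assumptions for illustration):

```python
# Stratified subsampling to roughly 1000 rows, then selection of the 10
# features ranked highest by a Random Forest's impurity-based importances.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

def prepare_dataset(X, y, n_samples=1000, n_features=10, seed=42):
    if X.shape[0] > n_samples:
        # stratified subsample preserving the class distribution
        X, _, y, _ = train_test_split(
            X, y, train_size=n_samples, stratify=y, random_state=seed
        )
    rf = RandomForestClassifier(random_state=seed).fit(X, y)
    top = np.argsort(rf.feature_importances_)[::-1][:n_features]
    return X[:, top], y
```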
Here’s the list of the 25 datasets from
[Bischl et al., 2017] employed for our supervised
classification experiments:
mfeat-fourier
mfeat-karhunen
mfeat-zernike
credit-g
pendigits
spambase
electricity
satimage
sick
vowel
mnist_784
kc2
kc1
adult
Bioresponse
qsar-biodeg
ilpd
ozone-level-8hr
banknote-authentication
blood-transfusion-service-center
cylinder-bands
bank-marketing
MiceProtein
car
Internet-Advertisements
Approximately 30 scikit-learn ([Pedregosa et al., 2011]) models are then used as base learners for the GenericGradientBooster, with their default hyperparameters. These 30 ensembles are compared to Random Forest [Breiman, 2001] and XGBoost [Chen and Guestrin, 2016] classifiers on hold-out cross-validation, using the F1-score as the evaluation metric. The summarized results (model ranks based on test set F1-score) are presented in Figure 1 and Table 1, and demonstrate the robustness of the GenericGradientBooster on this task.
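The rank aggregation behind these summaries can be reproduced along the following lines (a sketch; column names and the toy scores are illustrative assumptions):

```python
# For each data set, models are ranked by test-set F1-score (rank 1 = best),
# and the ranks are then summarized across the 25 data sets.
import pandas as pd

# results: one row per (dataset, model) pair with its test-set F1-score
results = pd.DataFrame({
    "dataset": ["credit-g", "credit-g", "spambase", "spambase"],
    "model": ["GB(ExtraTreeRegressor)", "XGBClassifier"] * 2,
    "f1": [0.74, 0.71, 0.93, 0.94],
})

results["rank"] = results.groupby("dataset")["f1"].rank(ascending=False)
print(results.groupby("model")["rank"].describe())  # count, mean, std, ...
```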
Figure 1: Distribution of model ranks based on test set F1-score (25 datasets)
3.2 Regression experiments
For the supervised regression examples, I compare:

- Prediction intervals computed for the GenericGradientBooster (directly available in the Python package mlsauce) using split conformal prediction (SCP, [Vovk et al., 2005]).
- Prediction intervals obtained for XGBoost and LightGBM [Ke et al., 2017] by optimizing a quantile regression loss (see the sketch after this list).
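For the quantile-loss baselines, a minimal sketch (LightGBM shown; recent XGBoost versions expose an analogous quantile objective):

```python
# One LightGBM model per interval bound: the 2.5% and 97.5% conditional
# quantiles give a nominal 95% prediction interval. Default hyperparameters
# otherwise, as in the experiments.
import lightgbm as lgb

def lgbm_interval(X_train, y_train, X_test, level=0.95):
    alpha = 1.0 - level
    lower = lgb.LGBMRegressor(objective="quantile", alpha=alpha / 2)
    upper = lgb.LGBMRegressor(objective="quantile", alpha=1 - alpha / 2)
    lower.fit(X_train, y_train)
    upper.fit(X_train, y_train)
    return lower.predict(X_test), upper.predict(X_test)
```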
In the SCP context, after training the model on a proper training set and obtaining an inference of the residuals on a calibration set, predictive simulations are obtained through the use of parametric and semi-parametric simulation methods. SCP's recipe for tabular data is relatively straightforward. I let

$$(X, y); \quad X \in \mathbb{R}^{n \times p},\ y \in \mathbb{R}^n \qquad (2)$$

be, respectively, a set of explanatory variables and a response variable with $n$ observations and $p$ features. Under the assumption that the observations are exchangeable, SCP begins by splitting the training data into two disjoint subsets: a proper training set

$$\{(x_i, y_i) : i \in I_1\} \qquad (3)$$

and a calibration set

$$\{(x_i, y_i) : i \in I_2\} \qquad (4)$$
Table 1: Distribution of model ranks based on test set F1-score (25 datasets)
GB(ExtraTreeRegressor) GB(DecisionTreeRegressor) GB(LinearRegression) XGBClassifier
count 25 25 25 25
mean 2.640000 5.200000 5.600000 4.520000
std 2.580052 4.092676 3.523729 2.518597
min 1 1 1 1
25% 1 2 4 3
50% 2 5 5 4
75% 3 8 6 5
max 11 19 15 12
With $I_1 \cup I_2 = \{1, \ldots, n\}$ and $I_1 \cap I_2 = \emptyset$. Now, let $A$ be a machine learning model. $A$ is trained on $I_1$, and absolute calibrated residuals are computed on $I_2$, as follows:

$$R_i = |y_i - \hat{\mu}_A(x_i)|, \quad i \in I_2 \qquad (5)$$

$\hat{\mu}_A(x_i)$ gives the value predicted by $A$ on $x_i$, $i \in I_2$. For a given level of risk $\alpha$ (the risk of being wrong by asserting that the prediction belongs to a certain prediction interval), a quantile of the empirical distribution of the absolute residuals (Eq. 5) is then computed as:

$$Q_{1-\alpha}(R, I_2) := (1 - \alpha)\left(1 + 1/|I_2|\right)\text{-th element of } \{R_i : i \in I_2\} \qquad (6)$$

Finally, a prediction interval at a new point $x_{n+1}$ is given by

$$C_{A, 1-\alpha}(x_{n+1}) := \hat{\mu}_A(x_{n+1}) \pm Q_{1-\alpha}(R, I_2) \qquad (7)$$

This type of prediction interval, Eq. (7), is described in [Vovk et al., 2005] as satisfying coverage guarantees, provided that some assumptions hold, such as the exchangeability of the observations. The simplest case of data exchangeability is: $\{x_1, \ldots, x_n\}$ are independent and identically distributed. Under these assumptions and a given, expected level of confidence $1 - \alpha$, I have:

$$P\left(y_{n+1} \in C_{A, 1-\alpha}(x_{n+1})\right) \geq 1 - \alpha \qquad (8)$$

where $y_{n+1}$ is the true value of the response variable $y$ for the unseen observation $x_{n+1}$.
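A minimal sketch of this symmetric, absolute-residual SCP recipe (the simulation-based variants mentioned earlier are not shown):

```python
# Split conformal prediction: train on the proper training set I1, compute
# absolute residuals on the calibration set I2, and widen point predictions
# by the (1 - alpha)(1 + 1/|I2|) empirical quantile of those residuals.
import numpy as np
from sklearn.model_selection import train_test_split

def split_conformal_interval(model, X, y, X_new, alpha=0.05, seed=0):
    X1, X2, y1, y2 = train_test_split(X, y, test_size=0.5, random_state=seed)
    model.fit(X1, y1)                               # fit on I1
    R = np.abs(y2 - model.predict(X2))              # calibrated residuals on I2
    q_level = min(1.0, (1 - alpha) * (1 + 1.0 / len(R)))
    q = np.quantile(R, q_level)                     # Eq. (6)
    pred = model.predict(X_new)
    return pred - q, pred + q                       # Eq. (7)
```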
The California Housing data set (20640 rows and 8 features) available in scikit-learn is used for the illustration of predictive uncertainty quantification. Two error metrics are computed on the test set, at a confidence level of 95%: coverage rate and Winkler score.

The coverage rate is a metric used to assess the accuracy of prediction intervals. It measures the proportion of times that the true values fall within the predicted intervals. The coverage rate is defined as:
$$\text{Coverage Rate} = \frac{1}{N} \sum_{t=1}^{N} I\left(\ell_{\alpha,t} \leq y_t \leq u_{\alpha,t}\right),$$

where $N$ is the total number of predictions, $\ell_{\alpha,t}$ and $u_{\alpha,t}$ are the lower and upper bounds of the prediction interval at time $t$, respectively, and $I(\cdot)$ is the indicator function that equals 1 if the condition is true and 0 otherwise.
The Winkler score is another measure used to evaluate the quality of prediction intervals, taking into account both the width of the interval and the penalty for predictions that fall outside it. It is defined as follows:

$$W_{\alpha,t} = \begin{cases} (u_{\alpha,t} - \ell_{\alpha,t}) + \frac{2}{\alpha}(\ell_{\alpha,t} - y_t) & \text{if } y_t < \ell_{\alpha,t}, \\ (u_{\alpha,t} - \ell_{\alpha,t}) & \text{if } \ell_{\alpha,t} \leq y_t \leq u_{\alpha,t}, \\ (u_{\alpha,t} - \ell_{\alpha,t}) + \frac{2}{\alpha}(y_t - u_{\alpha,t}) & \text{if } y_t > u_{\alpha,t}, \end{cases}$$

where $\ell_{\alpha,t}$ and $u_{\alpha,t}$ are the lower and upper bounds of the prediction interval, respectively, and $y_t$ is the true value. The Winkler score penalizes intervals that do not contain the true value more heavily, encouraging both accuracy and precision in predictions.
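Both metrics translate directly into code, for instance:

```python
# Direct implementations of the two interval metrics defined above.
import numpy as np

def coverage_rate(y_true, lower, upper):
    """Proportion of true values falling inside [lower, upper]."""
    return np.mean((y_true >= lower) & (y_true <= upper))

def winkler_score(y_true, lower, upper, alpha=0.05):
    """Average interval width plus a 2/alpha penalty for misses."""
    width = upper - lower
    below = (2.0 / alpha) * (lower - y_true) * (y_true < lower)
    above = (2.0 / alpha) * (y_true - upper) * (y_true > upper)
    return np.mean(width + below + above)
```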
Results of the experiments are presented in Table 2, which summarizes the uncertainty quantification metrics for four different machine learning models. A few observations:

The models evaluated (assigned their default hyperparameters) are:
GenericBoostingRegressor(ExtraTreeRegressor)
GradientBoostingRegressor from
scikit-learn
XGBoost
LightGBM
The key metrics used for comparison in this table
are:
Table 2: Uncertainty quantification statistics

Model                          Coverage    Winkler Score
GBooster(ExtraTreeRegressor)   0.947190     9527.231309
GradientBoostingRegressor      0.948401    10537.027143
XGBoost                        0.971415    16444.573094
LightGBM                       0.954700    15525.061013
1. Coverage: Higher coverage values (but close to the desired 95% coverage rate) are desirable, as they indicate that the model's prediction intervals are accurately capturing the true test set values.

2. Winkler Score: A lower Winkler Score indicates better performance, as it reflects narrower intervals with fewer instances of incorrect predictions.

Specific observations from the table:

- GBooster(ExtraTreeRegressor): Achieved a coverage of 0.947190 and a relatively low Winkler Score of 9527.231309, suggesting a good balance between interval accuracy and width.

- GradientBoostingRegressor: Has a coverage (0.948401) similar to GBooster's, but with a slightly higher Winkler Score (10537.027143), indicating wider intervals or more penalty for missed predictions.

- XGBoost: Shows the highest coverage (0.971415), indicating that it captures the true values more often. However, its Winkler Score (16444.573094) is the highest among the models, suggesting that it has wider intervals or a higher penalty for incorrect predictions.

- LightGBM: Has a good coverage rate (0.954700) and a lower Winkler Score (15525.061013) compared to XGBoost, indicating a better trade-off between interval width and accuracy.
3.3 Other results
These hold-out cross-validation results are based on 6 data sets available in scikit-learn ([Pedregosa et al., 2011]):

- Classification: Breast cancer data set, Wine data set, Iris data set, Digits data set
- Regression: Diabetes data set, California Housing data set

For supervised regression tasks, out-of-sample Adjusted R-Squared, R-Squared, Root Mean Squared Error (RMSE), and Time Taken are measured. For classification, I compute Accuracy, Balanced Accuracy, ROC AUC (for binary classification), F1 Score, and Time Taken.
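For reference, a minimal sketch of the regression metrics (the adjusted R-squared uses the usual degrees-of-freedom correction, which is an assumption about the exact formula applied here):

```python
# Out-of-sample regression metrics reported in Tables 7 and 8: R-squared,
# adjusted R-squared (standard n/p correction, assumed), and RMSE.
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

def regression_metrics(y_true, y_pred, n_features):
    n = len(y_true)
    r2 = r2_score(y_true, y_pred)
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - n_features - 1)
    rmse = np.sqrt(mean_squared_error(y_true, y_pred))
    return {"adj_r2": adj_r2, "r2": r2, "rmse": rmse}
```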
Table 3: Breast cancer statistics
Accuracy Balanced Accuracy ROC AUC F1 Score Time Taken
Model
GenericBooster(MultiTask(TweedieRegressor)) 0.991228 0.986111 0.986111 0.991194 1.606356
GenericBooster(LinearRegression) 0.991228 0.986111 0.986111 0.991194 0.344393
GenericBooster(TransformedTargetRegressor) 0.991228 0.986111 0.986111 0.991194 0.420567
GenericBooster(RidgeCV) 0.991228 0.986111 0.986111 0.991194 1.193292
GenericBooster(Ridge) 0.991228 0.986111 0.986111 0.991194 0.281048
XGBClassifier 0.956140 0.960470 0.960470 0.956588 0.123644
GenericBooster(MultiTask(BayesianRidge)) 0.938596 0.932692 0.932692 0.938819 7.541651
GenericBooster(ExtraTreeRegressor) 0.938596 0.940171 0.940171 0.939223 0.235430
RandomForestClassifier 0.921053 0.927350 0.927350 0.922309 0.235471
GenericBooster(KNeighborsRegressor) 0.868421 0.888889 0.888889 0.872149 0.421673
GenericBooster(DecisionTreeRegressor) 0.868421 0.881410 0.881410 0.871703 0.984263
GenericBooster(MultiTaskElasticNet) 0.868421 0.791667 0.791667 0.856879 0.103075
GenericBooster(MultiTask(PassiveAggressiveRegressor)) 0.859649 0.785256 0.785256 0.848489 1.178128
GenericBooster(MultiTaskLasso) 0.850877 0.763889 0.763889 0.835158 0.058971
GenericBooster(ElasticNet) 0.850877 0.763889 0.763889 0.835158 0.091385
GenericBooster(MultiTask(QuantileRegressor)) 0.824561 0.722222 0.722222 0.800791 11.295548
GenericBooster(LassoLars) 0.815789 0.708333 0.708333 0.788792 0.081118
GenericBooster(Lasso) 0.815789 0.708333 0.708333 0.788792 0.085661
GenericBooster(MultiTask(LinearSVR)) 0.807018 0.694444 0.694444 0.776487 14.017829
GenericBooster(DummyRegressor) 0.684211 0.500000 0.500000 0.555921 0.009317
GenericBooster(MultiTask(SGDRegressor)) 0.500000 0.462607 0.462607 0.514167 2.173141
Table 4: Iris statistics
Accuracy Balanced Accuracy ROC AUC F1 Score Time Taken
Model
RandomForestClassifier 1.000000 1.000000 NaN 1.000000 0.265511
GenericBooster(ExtraTreeRegressor) 1.000000 1.000000 NaN 1.000000 0.146940
GenericBooster(TransformedTargetRegressor) 1.000000 1.000000 NaN 1.000000 0.239919
GenericBooster(RidgeCV) 1.000000 1.000000 NaN 1.000000 0.203825
GenericBooster(Ridge) 1.000000 1.000000 NaN 1.000000 0.156059
GenericBooster(LinearRegression) 1.000000 1.000000 NaN 1.000000 0.156545
XGBClassifier 0.972222 0.962963 NaN 0.971853 0.063264
GenericBooster(MultiTask(SGDRegressor)) 0.972222 0.977778 NaN 0.972474 1.271692
GenericBooster(MultiTask(PassiveAggressiveRegressor)) 0.972222 0.977778 NaN 0.972299 1.185861
GenericBooster(MultiTask(LinearSVR)) 0.972222 0.977778 NaN 0.972474 2.615789
GenericBooster(MultiTask(BayesianRidge)) 0.972222 0.977778 NaN 0.972474 1.765075
GenericBooster(MultiTask(TweedieRegressor)) 0.972222 0.977778 NaN 0.972474 1.364793
GenericBooster(Lars) 0.944444 0.940741 NaN 0.945285 0.901583
GenericBooster(KNeighborsRegressor) 0.916667 0.933333 NaN 0.916667 0.202454
GenericBooster(DecisionTreeRegressor) 0.916667 0.918519 NaN 0.916550 0.208835
GenericBooster(MultiTaskElasticNet) 0.694444 0.611111 NaN 0.607908 0.030114
GenericBooster(ElasticNet) 0.611111 0.527778 NaN 0.529705 0.040390
GenericBooster(MultiTaskLasso) 0.416667 0.333333 NaN 0.245098 0.006721
GenericBooster(LassoLars) 0.416667 0.333333 NaN 0.245098 0.006808
GenericBooster(Lasso) 0.416667 0.333333 NaN 0.245098 0.012097
GenericBooster(DummyRegressor) 0.416667 0.333333 NaN 0.245098 0.009705
GenericBooster(MultiTask(QuantileRegressor)) 0.250000 0.333333 NaN 0.100000 3.300806
Table 5: Wine statistics
Accuracy Balanced Accuracy ROC AUC F1 Score Time Taken
Model
GenericBooster(RidgeCV) 1.000000 1.000000 NaN 1.000000 0.175558
GenericBooster(Ridge) 1.000000 1.000000 NaN 1.000000 0.155528
RandomForestClassifier 0.966667 0.974359 NaN 0.966980 0.159266
GenericBooster(LinearRegression) 0.966667 0.974359 NaN 0.966980 0.138224
GenericBooster(DecisionTreeRegressor) 0.966667 0.974359 NaN 0.966980 0.157965
GenericBooster(TransformedTargetRegressor) 0.966667 0.974359 NaN 0.966980 0.233222
GenericBooster(ExtraTreeRegressor) 0.966667 0.974359 NaN 0.966980 0.158870
XGBClassifier 0.966667 0.974359 NaN 0.966980 0.224243
GenericBooster(KNeighborsRegressor) 0.933333 0.948718 NaN 0.934259 0.278677
GenericBooster(MultiTask(SGDRegressor)) 0.900000 0.923077 NaN 0.901373 0.745216
GenericBooster(MultiTask(TweedieRegressor)) 0.900000 0.923077 NaN 0.901373 1.258438
GenericBooster(MultiTask(LinearSVR)) 0.800000 0.846154 NaN 0.797273 1.457314
GenericBooster(MultiTaskElasticNet) 0.800000 0.846154 NaN 0.797273 0.067585
GenericBooster(MultiTask(BayesianRidge)) 0.633333 0.717949 NaN 0.573580 1.748017
GenericBooster(MultiTask(PassiveAggressiveRegressor)) 0.566667 0.666667 NaN 0.447126 1.057053
GenericBooster(Lars) 0.500000 0.455484 NaN 0.478851 0.571014
GenericBooster(MultiTask(QuantileRegressor)) 0.433333 0.333333 NaN 0.262016 3.456711
GenericBooster(LassoLars) 0.266667 0.333333 NaN 0.112281 0.010081
GenericBooster(MultiTaskLasso) 0.266667 0.333333 NaN 0.112281 0.006777
GenericBooster(Lasso) 0.266667 0.333333 NaN 0.112281 0.013820
GenericBooster(ElasticNet) 0.266667 0.333333 NaN 0.112281 0.008298
GenericBooster(DummyRegressor) 0.266667 0.333333 NaN 0.112281 0.007588
Table 6: Digits statistics
Accuracy Balanced Accuracy ROC AUC F1 Score Time Taken
Model
RandomForestClassifier 0.972222 0.971118 NaN 0.972152 0.557913
XGBClassifier 0.969444 0.968237 NaN 0.969636 0.475942
GenericBooster(ExtraTreeRegressor) 0.958333 0.957797 NaN 0.958254 2.004470
GenericBooster(KNeighborsRegressor) 0.952778 0.951295 NaN 0.952138 6.032941
GenericBooster(LinearRegression) 0.938889 0.937112 NaN 0.938683 2.083279
GenericBooster(MultiTask(BayesianRidge)) 0.938889 0.937202 NaN 0.938869 51.394490
GenericBooster(TransformedTargetRegressor) 0.938889 0.937112 NaN 0.938683 2.805541
GenericBooster(RidgeCV) 0.938889 0.937112 NaN 0.938683 6.592610
GenericBooster(Ridge) 0.938889 0.937112 NaN 0.938683 0.654987
GenericBooster(MultiTask(TweedieRegressor)) 0.930556 0.927979 NaN 0.929712 14.350415
GenericBooster(DecisionTreeRegressor) 0.883333 0.881847 NaN 0.884949 4.265207
GenericBooster(MultiTask(PassiveAggressiveRegressor)) 0.791667 0.792782 NaN 0.800618 11.834521
GenericBooster(MultiTask(LinearSVR)) 0.372222 0.386690 NaN 0.263898 304.779165
GenericBooster(Lars) 0.197222 0.197369 NaN 0.211660 19.009126
GenericBooster(MultiTask(QuantileRegressor)) 0.125000 0.100000 NaN 0.027778 140.477283
GenericBooster(MultiTask(SGDRegressor)) 0.102778 0.100859 NaN 0.061031 9.973222
GenericBooster(LassoLars) 0.072222 0.100000 NaN 0.009729 0.019943
GenericBooster(Lasso) 0.072222 0.100000 NaN 0.009729 0.036511
GenericBooster(MultiTaskLasso) 0.072222 0.100000 NaN 0.009729 0.037265
GenericBooster(ElasticNet) 0.072222 0.100000 NaN 0.009729 0.037229
GenericBooster(DummyRegressor) 0.072222 0.100000 NaN 0.009729 0.016346
GenericBooster(MultiTaskElasticNet) 0.072222 0.100000 NaN 0.009729 0.041954
Table 7: Diabetes statistics
Adjusted R-Squared R-Squared RMSE Time Taken
Model
GenericBooster(HuberRegressor) 0.550044 0.601175 50.126454 3.991869
GenericBooster(SGDRegressor) 0.545136 0.596825 50.399113 0.364206
GenericBooster(RidgeCV) 0.542730 0.594692 50.532220 0.412034
GenericBooster(LinearSVR) 0.542646 0.594618 50.536876 0.173818
GenericBooster(PassiveAggressiveRegressor) 0.541009 0.593167 50.627208 0.311943
GenericBooster(Ridge) 0.539660 0.591971 50.701581 0.304846
GenericBooster(LinearRegression) 0.538827 0.591233 50.747423 0.288610
GenericBooster(TransformedTargetRegressor) 0.538827 0.591233 50.747423 0.483403
GenericBooster(TweedieRegressor) 0.532319 0.585464 51.104249 0.643203
GenericBooster(LassoLars) 0.531062 0.584350 51.172885 0.310882
GenericBooster(Lasso) 0.531060 0.584349 51.172960 0.206111
GenericBooster(ElasticNet) 0.529781 0.583215 51.242705 0.321837
GenericBooster(SVR) 0.516281 0.571249 51.973097 3.476585
GenericBooster(BayesianRidge) 0.498343 0.555350 52.927976 0.714809
GenericBooster(LassoLarsIC) 0.493130 0.550729 53.202286 0.269661
GenericBooster(ElasticNetCV) 0.488796 0.546888 53.429245 3.769253
GenericBooster(LassoLarsCV) 0.488677 0.546782 53.435491 0.781457
GenericBooster(LassoCV) 0.488368 0.546508 53.451635 3.522330
GenericBooster(LarsCV) 0.486622 0.544961 53.542726 1.329292
GenericBooster(ExtraTreeRegressor) 0.407619 0.474935 57.515200 0.579720
XGBRegressor 0.31 0.39 61.96 0.13
GenericBooster(DecisionTreeRegressor) 0.276246 0.358491 63.573680 1.103728
GenericBooster(Lars) 0.185778 0.278303 67.430036 1.146609
GenericBooster(DummyRegressor) -0.128588 -0.000339 79.387047 0.015633
GenericBooster(QuantileRegressor) -0.145991 -0.015765 79.996798 4.535711
GenericBooster(KNeighborsRegressor) -7.859314 -6.852574 222.424213 1.736072
Table 8: California Housing statistics
Adjusted R-Squared R-Squared RMSE Time Taken
Model
GenericBooster(ExtraTreeRegressor) 0.820121 0.827352 0.343269 1.310487
GenericBooster(SVR) 0.784341 0.793010 0.375862 15.781998
XGBRegressor 0.78 0.79 0.38 1.48
GenericBooster(HuberRegressor) 0.768820 0.778114 0.389152 6.593755
GenericBooster(LinearSVR) 0.765265 0.774702 0.392133 7.471201
GenericBooster(TransformedTargetRegressor) 0.750575 0.760602 0.404217 2.428576
GenericBooster(LinearRegression) 0.750575 0.760602 0.404217 2.170331
GenericBooster(Ridge) 0.749891 0.759946 0.404771 0.460996
GenericBooster(RidgeCV) 0.745055 0.755304 0.408665 2.424808
GenericBooster(PassiveAggressiveRegressor) 0.737846 0.748385 0.414403 0.566062
GenericBooster(SGDRegressor) 0.734213 0.744898 0.417264 0.652428
GenericBooster(DecisionTreeRegressor) 0.727793 0.738736 0.422274 3.242057
GenericBooster(LassoLarsIC) 0.710070 0.721725 0.435804 1.223871
GenericBooster(BayesianRidge) 0.705895 0.717719 0.438930 3.260712
GenericBooster(LassoLarsCV) 0.700271 0.712321 0.443107 1.837407
GenericBooster(LassoCV) 0.700099 0.712156 0.443234 10.310291
GenericBooster(ElasticNetCV) 0.699678 0.711751 0.443546 8.910940
GenericBooster(TweedieRegressor) 0.693864 0.706171 0.447818 2.001986
GenericBooster(LarsCV) 0.689654 0.702131 0.450887 1.600699
GenericBooster(Lars) 0.419367 0.442709 0.616730 1.028323
GenericBooster(ElasticNet) 0.251419 0.281512 0.700267 0.264049
GenericBooster(QuantileRegressor) -0.041958 -0.000070 0.826170 10.428148
GenericBooster(DummyRegressor) -0.078596 -0.035236 0.840570 0.015235
GenericBooster(Lasso) -0.078596 -0.035236 0.840570 0.020283
GenericBooster(LassoLars) -0.078596 -0.035236 0.840570 0.024084
GenericBooster(KNeighborsRegressor) -0.459139 -0.400480 0.977671 3.530411
References

[Bischl et al., 2017] Bischl, B., Casalicchio, G., Feurer, M., Gijsbers, P., Hutter, F., Lang, M., Mantovani, R. G., van Rijn, J. N., and Vanschoren, J. (2017). OpenML benchmarking suites. arXiv preprint arXiv:1708.03731.

[Breiman, 2001] Breiman, L. (2001). Random forests. Machine Learning, 45:5–32.

[Chen and Guestrin, 2016] Chen, T. and Guestrin, C. (2016). XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 785–794. ACM.

[Friedman, 2001] Friedman, J. H. (2001). Greedy function approximation: a gradient boosting machine. Annals of Statistics, pages 1189–1232.

[Hastie et al., 2009] Hastie, T., Tibshirani, R., and Friedman, J. H. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction, volume 2. Springer.

[Ke et al., 2017] Ke, G., Meng, Q., Finley, T., Wang, T., Chen, W., Ma, W., Ye, Q., and Liu, T.-Y. (2017). LightGBM: A highly efficient gradient boosting decision tree. In Proceedings of the 31st International Conference on Neural Information Processing Systems, pages 3149–3157. Curran Associates Inc.

[Pedregosa et al., 2011] Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., et al. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830.

[Prokhorenkova et al., 2018] Prokhorenkova, L., Gusev, G., Vorobev, A., Dorogush, A. V., and Gulin, A. (2018). CatBoost: unbiased boosting with categorical features. Advances in Neural Information Processing Systems, 31:6638–6648.