Multinomial logistic regression using
quasi-randomized networks
Thierry Moudiki
22nd February 2020
Contents
1 Introduction
2 Describing the model
3 Numerical example
Abstract
This paper contributes to the development of quasi-randomized networks: neural networks with quasi-randomized hidden layers. It deals in particular with multinomial logistic regression, a supervised learning method for classifying statistical/machine learning model observations into multiple categories. The model presented here notably takes advantage of clustering and dropout to improve its learning capabilities.
1 Introduction
Following some ideas from Moudiki et al. (2018), the model introduced here is a hybrid penalized regression/neural network model, derived from the class of randomized neural networks. Randomized neural networks were introduced by Schmidt et al. (1992), and Random Vector Functional Link (RVFL) neural networks by Pao et al. (1994).
In RVFL networks, in addition to a single-layer neural network that explains the non-linear effects of the covariates on the response, there is an optional direct link (linear link) between the explanatory variables and the output variable, explaining the linear effects. RVFL networks have been successfully applied to various classification and regression problems; see for example Dehuri and Cho (2010).
Here, the focus is placed on multinomial logistic regression (see Friedman et al. (2001), Chapter 4), a supervised learning method for classifying model observations into multiple categories. In order to obtain the hidden layer nodes of our RVFL network, we use a deterministic Sobol sequence (see Niederreiter (1992)). Sobol sequences have been successfully used in the past for this type of model by Moudiki et al. (2018), on multivariate time series. Some data preprocessing methods, such as clustering and dropout (Srivastava et al. (2014)), are also considered in the construction of this model.
In section 2, we describe our penalized multinomial logistic regression model, and section 3 presents a numerical example of this model applied to a dataset.
2 Describing the model
Our model is based on ideas from Moudiki et al. (2018). It is a hybrid penalized regression/neural network model, with separate constraints on the linear link and the hidden layer. As in Zhu and Hastie (2004), Bishop (2006), and Friedman et al. (2010), the model probabilities for each class are calculated, for $k_0 \in \{1, \ldots, K\}$ and $x \in \mathbb{R}^p$, as:
$$P(G = k_0 \,|\, X = x) = \frac{e^{x^T \beta_{k_0}}}{\sum_{k=1}^{K} e^{x^T \beta_k}} \qquad (1)$$
$x$ is a vector containing the characteristics of an observation: the initial model covariates, plus non-linear transformations of these covariates, as in Moudiki et al. (2018). The non-linear transformations of the covariates are obtained through an activation function (typically a rectified linear unit (ReLU), a hyperbolic tangent, etc.).
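For illustration, (Eq. 1) is a softmax over the $K$ linear scores $x^T \beta_k$. A minimal NumPy sketch of this computation follows; the names X and beta are illustrative, not nnetsauce internals:

import numpy as np

def class_probabilities(X, beta):
    # rows of X are the vectors x; column k of beta is beta_k
    scores = X @ beta                            # (n, K) linear scores x^T beta_k
    scores -= scores.max(axis=1, keepdims=True)  # shift for numerical stability
    exp_scores = np.exp(scores)
    return exp_scores / exp_scores.sum(axis=1, keepdims=True)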
More precisely, all x’s are stored as rows in a matrix X:
$$X := \left[ Z \;\; \Phi(Z) \right] \qquad (2)$$
$X$ is the concatenation by columns of the matrices $Z$ and $\Phi(Z)$. The first columns of $X$ contain the matrix $Z$: the model's standardized input data, potentially enriched with clustering information. Clustering can determine, a priori and before model learning, homogeneous groups of model observations.
Typically, this clustering information on the input data, if requested, consists of one-hot encoded covariates, one for each k-means or Gaussian mixture cluster.
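As a hedged sketch of this preprocessing step, using scikit-learn's KMeans (nnetsauce also supports Gaussian mixtures, and its exact encoding may differ), cluster memberships can be appended to the standardized inputs as one-hot columns:

import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

def augment_with_clusters(X_raw, n_clusters=2, seed=123):
    # standardize the inputs, then append one one-hot column per cluster
    Z = StandardScaler().fit_transform(X_raw)
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=seed).fit_predict(Z)
    return np.hstack([Z, np.eye(n_clusters)[labels]])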
For each row $i \in \{1, \ldots, n\}$ of the matrix $\Phi(Z)$, we have the following terms, with $z_i$ being the $i$-th row of the matrix $Z$, $W$ the terms of a Sobol sequence, and $g$ an activation function (as mentioned before):
$$\Phi(Z)_i = g\left( z_i^T W \right) \qquad (3)$$
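Below is a minimal sketch of (Eq. 3), drawing the weights $W$ from SciPy's Sobol generator and taking $g$ to be a ReLU; the rescaling of the Sobol points to [-1, 1) is an illustrative assumption, not necessarily nnetsauce's exact construction:

import numpy as np
from scipy.stats import qmc

def hidden_layer(Z, n_hidden=8):
    # deterministic quasi-random weights: n_hidden Sobol points in [0, 1)^p,
    # rescaled to [-1, 1) and used as the columns of W
    W = 2.0 * qmc.Sobol(d=Z.shape[1], scramble=False).random(n_hidden).T - 1.0
    return np.maximum(Z @ W, 0.0)  # Phi(Z) = g(Z W), with g = ReLU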
In the construction of $\Phi(Z)$, dropout can also be used. The idea of dropout (Srivastava et al. (2014)) is to randomly remove some nodes in the hidden layer, in order to prevent the model from being too flexible and overfitting the input data.
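A minimal sketch of this idea, as simple column-wise dropout with rescaling (the scheme actually used in nnetsauce may differ):

import numpy as np

def dropout(Phi, rate=0.3, seed=123):
    # zero out each hidden node (column of Phi) with probability `rate`,
    # rescaling the survivors so that expected activations are unchanged
    rng = np.random.default_rng(seed)
    keep = rng.random(Phi.shape[1]) >= rate
    return Phi * keep / (1.0 - rate)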
Now, going back to (Eq. 1), $K$ is the total number of classes, and each $\beta_k \in \mathbb{R}^p$, $k \in \{1, \ldots, K\}$, is a vector of unknown model coefficients:
$$\left( \beta_k^{(d)} \;\; \beta_k^{(h)} \right)^T$$
$\beta_k^{(d)}$ are the coefficients on the linear link (on $Z$), and $\beta_k^{(h)}$ the coefficients on the hidden layer (on $\Phi(Z)$). These coefficients are determined by optimizing the model's penalized log-likelihood $l$. Using (Eq. 1), an expression of our penalized log-likelihood $l$ for $n$ observations is:
$$l(X, \beta) = -\frac{1}{n} \sum_{i=1}^{n} \left[ \sum_{k=1}^{K} (Y \circ X\beta)_{i,k} - \log\left( \sum_{k=1}^{K} e^{(X\beta)_{i,k}} \right) \right] \qquad (4)$$
$$\qquad\qquad + \lambda_1 \left\| \beta^{(d)} \right\|_F^2 + \lambda_2 \left\| \beta^{(h)} \right\|_F^2 \qquad (5)$$
$Y$ is a one-hot encoded version of the model response $y$. $\beta \in \mathbb{R}^{p \times K}$ is a matrix of coefficients, whose column $k_0$ contains the coefficients for class $k_0$ (that is, $\beta_{k_0}$). $\circ$ denotes elementwise (Hadamard) matrix multiplication, and $\|\cdot\|_F$ is the Frobenius norm of a matrix. $\lambda_1$ and $\lambda_2$ are regularization parameters constraining the norm of the model coefficients $\beta$ and preventing overfitting, as in ridge regression (Hoerl and Kennard (1970)).
The method presented here is available in the Python package nnetsauce (Moudiki (2019–2020); as of this writing, in the development version on GitHub), and the optimization methods currently available for the penalized log-likelihood (Eq. 4) are Newton conjugate gradient and L-BFGS-B.
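As an illustrative sketch, not nnetsauce's actual implementation, the loss in (Eq. 4) can be written with NumPy/SciPy and handed to scipy.optimize.minimize; here p_d, the number of direct-link columns of $X$ used to split $\beta$ into $\beta^{(d)}$ and $\beta^{(h)}$, is an assumed helper parameter:

import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp

def penalized_loss(beta_flat, X, Y, lambda1, lambda2, p_d):
    # beta is stored flat for scipy.optimize; reshape it to (p, K)
    n, p = X.shape
    K = Y.shape[1]
    beta = beta_flat.reshape(p, K)
    scores = X @ beta  # (n, K); row i holds the (X beta)_{i,k}
    loglik = (Y * scores).sum() - logsumexp(scores, axis=1).sum()
    penalty = (lambda1 * (beta[:p_d] ** 2).sum()     # ||beta^(d)||_F^2
               + lambda2 * (beta[p_d:] ** 2).sum())  # ||beta^(h)||_F^2
    return -loglik / n + penalty

# usage (X, Y, lambda1, lambda2, p_d assumed defined):
# res = minimize(penalized_loss, x0=np.zeros(X.shape[1] * Y.shape[1]),
#                args=(X, Y, lambda1, lambda2, p_d), method="L-BFGS-B")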
3 Numerical example
This example is based on a dataset from scikit-learn: breast_cancer, the Breast Cancer Wisconsin dataset. This dataset contains 569 observations, 30 covariates, and 2 classes. Other examples based on other datasets can be found on GitHub.
We start by importing the data:
import nnetsauce as ns
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
breast_cancer = load_breast_cancer()
X = breast_cancer.data
y = breast_cancer.target
# split data into training set and test set
np.random.seed(123)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
Next, we fit the model, and obtain the accuracy and the area under the curve (AUC) on the test set:
# create the model with nnetsauce
fit_obj = ns.Ridge2Classifier(lambda1=6.90185578e+04,
                              lambda2=3.17392781e+02,
                              n_hidden_features=95,
                              n_clusters=2,
                              row_sample=4.63427734e-01,
                              dropout=3.62817383e-01,
                              type_clust="gmm")
# fit the model on the training set
fit_obj.fit(X_train, y_train)
# get the accuracy on the test set
print(fit_obj.score(X_test, y_test))
# get the area under the curve (AUC) on the test set
print(fit_obj.score(X_test, y_test, scoring="roc_auc"))
In this example, we obtain an accuracy of 98.24% and an AUC of 0.98 on the test set. See the GitHub repository for other examples, on other datasets.
References

Bishop CM (2006). Pattern Recognition and Machine Learning. Springer.

Dehuri S, Cho SB (2010). “A comprehensive survey on functional link neural networks and an adaptive PSO–BP learning for CFLNN.” Neural Computing and Applications, 19(2), 187–205.

Friedman J, Hastie T, Tibshirani R (2001). The Elements of Statistical Learning, volume 1. Springer Series in Statistics, New York.

Friedman J, Hastie T, Tibshirani R (2010). “Regularization paths for generalized linear models via coordinate descent.” Journal of Statistical Software, 33(1), 1.

Hoerl AE, Kennard RW (1970). “Ridge regression: Biased estimation for nonorthogonal problems.” Technometrics, 12(1), 55–67.

Moudiki T (2019–2020). “nnetsauce, A general-purpose tool for Statistical/Machine Learning.” https://github.com/thierrymoudiki/nnetsauce. BSD 3-Clause Clear License. Version 0.3.3.

Moudiki T, Planchet F, Cousin A (2018). “Multiple Time Series Forecasting Using Quasi-Randomized Functional Link Neural Networks.” Risks, 6(1), 22.

Niederreiter H (1992). Random Number Generation and Quasi-Monte Carlo Methods. SIAM.

Pao YH, Park GH, Sobajic DJ (1994). “Learning and generalization characteristics of the random vector functional-link net.” Neurocomputing, 6(2), 163–180.

Schmidt WF, Kraaijveld MA, Duin RP (1992). “Feedforward neural networks with random weights.” In Proceedings of the 11th IAPR International Conference on Pattern Recognition, Vol. II, Conference B: Pattern Recognition Methodology and Systems, pp. 1–4. IEEE.

Srivastava N, Hinton G, Krizhevsky A, Sutskever I, Salakhutdinov R (2014). “Dropout: A simple way to prevent neural networks from overfitting.” The Journal of Machine Learning Research, 15(1), 1929–1958.

Zhu J, Hastie T (2004). “Classification of gene microarrays by penalized logistic regression.” Biostatistics, 5(3), 427–443.