Two-dimensional-reduction Random Forest
Shuquan Ye
School of Computer Science and Engineering
South China University of Technology
Guangzhou 510006, China
e-mail: 201536401655@mail.scut.edu.cn
Zhiwen Yu
School of Computer Science and Engineering
South China University of Technology
Guangzhou 510006, China
e-mail: zhwyu@scut.edu.cn
Jiaying Lin
School of Computer Science and Engineering
South China University of Technology
Guangzhou 510006, China
e-mail: 201530501498@mail.scut.edu.cn
Kaixiang Yang
School of Computer Science and Engineering
South China University of Technology
Guangzhou 510006, China
e-mail: xgkaixiang@163.com
Dan Dai
School of Computer Science and Engineering
South China University of Technology
Guangzhou 510006, China
e-mail: daidanjune@hotmail.com
Zhi-Hui Zhan
School of Computer Science and Engineering
South China University of Technology
Guangzhou 510006, China
Wei-Neng Chen
School of Computer Science and Engineering
South China University of Technology
Guangzhou 510006, China
Jun Zhang
School of Computer Science and Engineering
South China University of Technology
Guangzhou 510006, China
Abstract—Random forest (RF) is a competitive machine learning algorithm, but one of its major challenges is imbalanced real-world data. This paper presents a Two-dimensional-reduction RF (2DRRF), which extends the traditional RF with three innovations. First, a two-dimensional-reduction approach is introduced to improve the performance of RF on imbalanced data. Second, a modified T-link is proposed that focuses on detecting and removing safe samples. Third, a biased sampling manner is employed to build optimized training subsets. Experiments on 13 imbalanced datasets from the KEEL repository, with imbalance ratios ranging from 6.38 to 129.44, indicate that 2DRRF steadily outperforms two other relevant implementations of RF in terms of accuracy, recall, precision and F-value.
Keywords—Imbalanced dataset; classification; random forest;
two-dimensional-reduction; biased sampling
I. INTRODUCTION
An imbalanced dataset is one in which the samples of a particular class (the positive class), which is of main interest, are heavily outnumbered by the remaining samples. With the continuous emergence of data in fields such as medicine, genomics, financial business, education [1] and network intrusion [2], vast amounts of data are generated with high imbalance ratios. Because standard data mining algorithms are sensitive to class imbalance, extracting vital information from imbalanced datasets has become a challenging task.
A variety of algorithms have been proposed to overcome the imbalance problem. Generally, they are classified into data-processing methods and algorithmic methods. The first category applies a bias to the data distribution by resampling or by adjusting the weight of each class to balance the dataset (e.g. under-sampling the majority class, over-sampling the minority class, or weight matrices [3]). The second category is internal: it adjusts a traditional algorithm by taking the imbalance into consideration (e.g. the biased minimax probability machine [4] and the weighted extreme learning machine [5]).
The challenge for resampling methods is to distinguish informative from redundant data, while the challenge for algorithmic-level approaches is to select and improve the accuracy of the original algorithms. This study provides an approach that combines the characteristics of both categories with a bagging random forest model.
The random forest (RF) is a competitive ensemble learning algorithm that combines the results of tree predictors, each trained on its own bootstrap sample. However, because it relies on bootstrap sampling, the standard RF is not well adapted to learning from imbalanced datasets.
Recently, a parallel random forest algorithm for big data [6] was proposed to address high-dimensional and noisy data. It can be summarized in two steps: a bootstrap sampling step followed by a reduction in the number of features. Its results demonstrate the promise of RF and show that there is plenty of room to adapt the random forest model to real-world data.
To adapt RF to learning from imbalanced datasets, a two-dimensional-reduction RF (2DRRF) is put forward in this study. A dataset has two dimensions: the volume of data and the number of features, which correspond to the rows and columns of the data matrix respectively. For the reduction in the horizontal dimension, we propose a modified Tomek-link (T-link) for removing safe samples, combined with Hart's Condensed Nearest Neighbor Rule (CNN) [7] as an under-sampling method; a biased sampling manner is then applied. For the reduction in the vertical dimension, the Gini index [8] and the Chi-square test [9] are employed for feature reduction. With these two indices focusing on informative and correlated features respectively, the framework provides a new benchmark for feature measurement. With all of the above, a training set can be built up for each tree.
Besides improving learning from imbalanced data, the two-dimensional reduction (2DR) may also provide some guidance for dealing with the challenge of growing real-world data.
The rest of the paper is organized as follows. Section 2 reviews the background work. Section 3 describes 2DRRF in detail. Section 4 presents the experimental results, and conclusions are given in Section 5 together with future work.
II. BACKGROUND
A. Related Work
Leo Breiman published his work on the first RF in 2001 [10]. Since then, this relatively new learner has shown its power and robustness and has been applied to a wide variety of fields. Because little experimentation on constructing random forest classifiers on imbalanced data had been reported, T. Khoshgoftaar, M. Golawala and J. Hulse first discussed the performance of RF on imbalanced data using Weka in 2007 [11]. In 2011, M. Khalilia, S. Chakraborty and M. Popescu provided extensive evidence of the reliable results of RF for disease prediction on the highly imbalanced Healthcare Cost and Utilization Project (HCUP) dataset [12].
A study of several methods for imbalanced training data, with a broad experimental evaluation, was carried out by G. Batista in 2004 [13]; it enlightened this study, in which one-sided selection (OSS) [14] is the under-sampling method we improve on. Recently, there have also been many studies combining or improving prior methods, such as a hybrid scheme of over-sampling and under-sampling in 2013 [15], the Class Ratio Preserved RF in 2015 [16], a K-L feature compression method in 2017 [17], and class-weights voting (CWsRF) in 2018 [18].
In 2017, a parallel RF algorithm for big data on the Apache Spark platform [6] was presented by J. Chen, K. Li and Z. Tang to deal with high-dimensional data in two steps: the traditional bootstrap sampling of RF followed by a reduction in the number of features. Inspired by their study, we take a step further by combining the modified T-link and CNN in sampling, followed by an improved feature-reduction method.
The scikit-learn [19] ensemble random forest classifier, from one of the most popular machine learning packages in Python, is a simple and efficient tool for this research; its algorithms are perturb-and-combine techniques [20]. Another strong and robust implementation of ensembles and tree learning algorithms, called treelearn [21], is employed as a second control group.
The experimental datasets come from the online KEEL (Knowledge Extraction based on Evolutionary Learning) imbalanced-dataset repository [22], a set of benchmarks that allows a complete analysis of the behavior of learning methods in comparison with existing ones.
B. Performance measure
In real-world tasks, a learner can exploit the imbalanced distribution of a dataset. For example, on a dataset in which 99% of the samples are negative, a learner that always predicts negative already achieves 99% average accuracy. In such cases, average accuracy is not an adequate criterion, so several alternative criteria are used in our experiments.
A confusion matrix, as shown in Table 1, records the four possible outcomes when examples are classified and is the most straightforward way to evaluate a machine learning model.

Table 1: Confusion Matrix
                     Predicted positive   Predicted negative
Actual positive      TP                   FN
Actual negative      FP                   TN

From the confusion matrix, accuracy, precision, recall and F-value are defined as follows:

$\mathrm{Accuracy} = \dfrac{TP + TN}{TP + TN + FP + FN}$

$\mathrm{Precision} = \dfrac{TP}{TP + FP}$

$\mathrm{Recall} = \dfrac{TP}{TP + FN}$

$F\text{-value} = \dfrac{(1 + \beta^{2}) \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\beta^{2} \cdot \mathrm{Precision} + \mathrm{Recall}}$

where β is usually set to 1. The main goal when classifying imbalanced datasets is to raise the recall without hurting the precision, and the F-value represents the trade-off between the two.
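For concreteness, the short sketch below computes these four criteria directly from the confusion-matrix counts; the function and argument names are our own illustration rather than part of the original implementation.

def imbalance_metrics(tp, fn, fp, tn, beta=1.0):
    """Accuracy, precision, recall and F-value from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    if precision + recall == 0:
        f_value = 0.0
    else:
        f_value = ((1 + beta ** 2) * precision * recall
                   / (beta ** 2 * precision + recall))
    return accuracy, precision, recall, f_value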
III. TWO DIMENSIONAL REDUCTION RF
This section describes our optimization of the random forest algorithm to accommodate imbalanced data, the Two-dimensional-reduction RF (2DRRF). The major goal is to improve overall performance and to overcome the difficulties that RF has with imbalanced datasets, as analyzed below, by applying reductions in both the vertical and horizontal dimensions so that the training subsets are sampled reasonably and the selected features are close to optimal.
The steps for constructing 2DRRF are shown in Fig. 1. For the horizontal dimension reduction, the modified T-link is applied first. Instead of the convention of using the traditional T-link [23] to remove boundary samples, we use it to reduce the interior of the majority-class distribution, so as to lower the proportion of majority data while preserving the original border between the majority and minority classes. Next, CNN is used to find a consistent subset of the dataset produced by the previous step. In short, in the horizontal dimension we remove majority-class examples that are distant from the decision border, since such examples can be considered redundant or less relevant for learning. After the horizontal dimension reduction, a biased sampling manner is applied. For the vertical dimension reduction, the Gini index is used first to compute impurity, with which the feature variables can be divided into low and high importance; a Chi-square test then further divides them into low and high correlation. According to the resulting four groups, features are extracted with bias so as to tease out informative and correlated features for building the training sets. More detailed descriptions are given in Subsections B, C and D.
A. The Random Forest Model
The traditional random forest model is an ensemble of tree predictors $\{h(\mathbf{x}, \Theta_k),\ k = 1, 2, \dots\}$, where each tree is trained on an independently drawn sample with the same distribution for all trees in the forest, obtained by random sampling with replacement (bootstrap), together with a random selection of features. During prediction, each sample in the testing data is classified by all trees, and the final result is returned according to their votes. The main steps of constructing a random forest are illustrated in Fig. 2.

Fig 1: Process for 2DR construction
Fig 2: Process for RF construction

However, these fundamental ideas and rules, designed for the general case, are misled when the imbalance ratio exceeds a certain threshold. Theoretically, the upper bound for the generalization error of a forest, given by (1), is directly proportional to the mean correlation $\bar{\rho}$ between the individual trees and decreases with their strength $s$:

$PE^{*} \le \bar{\rho}\,(1 - s^{2})/s^{2}$   (1)

where the strength of the set of classifiers is $s = E_{X,Y}[mr(X, Y)]$, with $mr(X, Y)$ the margin function of the forest, and the variance of $mr$ is

$\operatorname{var}(mr) = \bar{\rho}\,\big(E_{\Theta}\,sd(\Theta)\big)^{2}$   (2)

Given sparse positive examples, the positive regions become extremely small, with the decision border fitted closely to the positive samples, which causes overfitting and eventually reduces the strength of the trees. Moreover, the variance of $mr$, which is proportional to the mean correlation as shown in (2), will be high when the vast majority of samples in each training set come from the negative class.
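As a concrete point of reference, a minimal baseline corresponding to this standard model can be built with the scikit-learn classifier that is used later as a control group; the parameter values shown here are illustrative assumptions only, not the values tuned in Section IV.

from sklearn.ensemble import RandomForestClassifier

# Minimal baseline sketch of the standard RF: bootstrap sampling plus random
# feature selection. Parameter values are illustrative, not the tuned ones.
baseline_rf = RandomForestClassifier(
    n_estimators=100,      # number of trees (selected by grid search in the experiments)
    bootstrap=True,        # each tree sees a bootstrap sample of the training data
    max_features="sqrt",   # random subset of features considered at each split
    random_state=0,
)
# baseline_rf.fit(X_train, y_train) grows the trees; predictions aggregate the
# outputs of all trees on each test sample.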
B. Horizontal dimension reduction
For a multi-class problem with $M$ different class labels $C_1, C_2, \dots, C_M$, suppose the given training dataset has the form $D = \{(\mathbf{x}_i, y_i)\}_{i=1}^{N}$, where $\mathbf{x}_i$ is the feature vector and $y_i \in \{C_1, \dots, C_M\}$ is the corresponding label, and there are $N$ samples in total. The relevant elements and notation used in the following steps are listed in Table 2.

Table 2: Table of elements
|D|            number of samples in D
|y_i = C_1|    number of samples labeled with C_1
V(C_1)         the set of all samples labeled with C_1
C_max          the majority class, i.e. the class C_m with the largest |y_i = C_m|
C_min          the other classes
dim(x_i)       the dimension of x_i
a_k            the k-th feature, together with all the data in its column
Modified T-link: Reduction Inside the Majority Class
In this step, to improve the performance of the RF algorithm on imbalanced data, we present a modified T-link for under-sampling.
The traditional T-link, shown in deep green in Fig. 3, is defined as follows. Given two examples $(\mathbf{x}_i, y_i)$ and $(\mathbf{x}_j, y_j)$ coming from different classes, let $d(\mathbf{x}_i, \mathbf{x}_j)$ denote the distance between the feature vectors $\mathbf{x}_i$ and $\mathbf{x}_j$. A T-link is said to be formed when there is no sample $\mathbf{x}_k$ such that $d(\mathbf{x}_i, \mathbf{x}_k) < d(\mathbf{x}_i, \mathbf{x}_j)$ or $d(\mathbf{x}_j, \mathbf{x}_k) < d(\mathbf{x}_i, \mathbf{x}_j)$.

Fig 3: Tomek-links

The characteristic of a T-link is that the two examples forming it should be removed, because either both are boundary examples or one of them is noise. In other words, the samples fall into two categories:
1) Borderline examples distributed around the class boundaries, where the positive and negative classes overlap.
2) Safe samples located in relatively homogeneous areas without overlapping class labels.
However, it is the deletion of the borderline samples that causes dispute, because the samples along the boundary carry important information for forming the decision border. In consideration of this, an adjustment is made to the T-link, shown in bright orange in Fig. 3: when a link is found between two nearest-neighbor samples that both come from the majority class, one of the two is randomly selected and moved from the dataset into a separate set $D_a$, in order to counteract the bias caused by majority data far away from the borderline, without hurting the original border between the majority and minority classes.
After this step, the raw dataset is divided into $D_a$ and all the rest, $D_0$. The modified Tomek-link procedure is given in Algorithm 1.
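A possible Python rendering of this procedure is sketched below; it assumes a plain nearest-neighbour search over all samples via scikit-learn's NearestNeighbors, and the helper name modified_tlink is ours rather than part of the original implementation.

from random import Random

from sklearn.neighbors import NearestNeighbors

def modified_tlink(x, majority_idx, seed=0):
    """Return (all_link, all_rest) index lists following Algorithm 1."""
    rng = Random(seed)
    majority = set(int(i) for i in majority_idx)
    # index of each sample's nearest neighbour, excluding the sample itself
    # (assumes the feature vectors are pairwise distinct)
    nn = NearestNeighbors(n_neighbors=2).fit(x)
    neighbor = nn.kneighbors(x, return_distance=False)[:, 1]
    all_link = set()
    for i in majority_idx:
        j = int(neighbor[i])
        if j not in majority:   # the neighbour must also belong to the majority class
            continue
        if j < i:               # keep each (i, j) pair unique
            continue
        all_link.add(rng.choice((int(i), j)))   # randomly move one endpoint into all_link
    all_rest = [k for k in range(len(x)) if k not in all_link]
    return sorted(all_link), all_rest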
CNN: Finding a Consistent Subset
In this step, Hart's Condensed Nearest Neighbor Rule is applied to find a consistent subset of the examples in $D_0$ for under-sampling.
CNN attempts to remove the examples that lie far from the decision border and thus selectively reduces the original population. A subset $\hat{D} \subseteq D_0$ is consistent with $D_0$ when a 1-nearest-neighbor classifier (1-NN) built on $\hat{D}$ correctly classifies all the instances in $D_0$.
The algorithm goes as follows. First, one sample from the majority class and all samples belonging to the minority class are selected to form $\hat{D}$; note that this majority class is not necessarily the majority of the raw dataset, because of the previous T-link step. Second, a 1-NN classifier built on $\hat{D}$ is used to classify the examples in $D_0 \setminus \hat{D}$. Then all misclassified instances are added to $\hat{D}$ so as to ensure consistency. These steps are repeated until no example in $D_0$ is misclassified. The remaining samples are collected into $D_r = D_0 \setminus \hat{D}$. Note that this algorithm does not guarantee the smallest consistent subset.
The horizontal dimension reduction thus ends with three subsets: $D_a$, $\hat{D}$ and $D_r$.
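The sketch below illustrates this step with a 1-NN classifier from scikit-learn; it follows the textual description above rather than the authors' code, and the function name condensed_subset is ours.

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def condensed_subset(x, y, majority_label, seed=0):
    """Return (consistent_subset, remainder) index lists as described above."""
    rng = np.random.default_rng(seed)
    x, y = np.asarray(x), np.asarray(y)
    majority_idx = np.flatnonzero(y == majority_label)
    # start with all minority samples plus one randomly chosen majority sample
    subset = set(np.flatnonzero(y != majority_label).tolist())
    subset.add(int(rng.choice(majority_idx)))
    while True:
        s = sorted(subset)
        clf = KNeighborsClassifier(n_neighbors=1).fit(x[s], y[s])
        rest = [i for i in range(len(y)) if i not in subset]
        if not rest:
            break
        preds = clf.predict(x[rest])
        misclassified = [i for i, p in zip(rest, preds) if p != y[i]]
        if not misclassified:
            break
        subset.update(misclassified)   # absorb every misclassified sample
    consistent = sorted(subset)
    remainder = [i for i in range(len(y)) if i not in subset]
    return consistent, remainder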
C. Sampling with bias
In this step, a biased sampling manner is applied to draw training data from the dataset that was reduced, or compressed, in the previous step. A biased sample is one collected in such a way that some members of the intended population are more or less likely to be included than others. Here, the aim is to select a training subset for each tree from $D_a$, $\hat{D}$ and $D_r$, with a fixed number of samples per tree, so the selection probabilities of the three sets are set differently.
We use a simple implementation of dynamic probability setting for the biased sampling. The principle is to apply a square operation to the corresponding sampling ratio: on the range [0, 1] the square function is monotonically increasing and so is its derivative, so when one set is too small and another is too large, the probability of extracting samples from the appropriate set is automatically increased, which helps to ensure the diversity of the training subsets.
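Since the exact probability formula is not reproduced here, the following sketch only illustrates the mechanism: each of the three subsets receives its own selection weight (which, in the method, would be derived from the squared ratio just described) and samples are then drawn with replacement. The helper name biased_sample and its interface are our own illustration.

import numpy as np

def biased_sample(subsets, weights, n_samples, seed=0):
    """subsets: list of index arrays; weights: one selection weight per subset."""
    rng = np.random.default_rng(seed)
    weights = np.asarray(weights, dtype=float)
    probs = weights / weights.sum()
    # spread each subset's probability uniformly over its own members
    per_sample = np.concatenate(
        [np.full(len(s), p / len(s)) for s, p in zip(subsets, probs)])
    pool = np.concatenate([np.asarray(s) for s in subsets])
    return rng.choice(pool, size=n_samples, replace=True, p=per_sample)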
D. Second dimension reduction
In this part, each training subset is further reduced, or compressed, in the dimension of the feature space.
Both the Gini index and the Chi-square test are employed to evaluate features comprehensively. The former takes the perspective of tree growth and information gain, emphasizing learning depth, while the latter takes the perspective of the statistical relationship between a feature and the final result, emphasizing correlation.
Gini index: Evaluating importance
For a sample subset $D'$, there are $\dim(\mathbf{x}_i)$ features in total. If the samples are discrete and $D'$ is divided into two subsets $D_1$ and $D_2$ by one feature $a_k$, the Gini index is computed as

$\mathrm{Gini}(D', a_k) = \dfrac{|D_1|}{|D'|}\mathrm{Gini}(D_1) + \dfrac{|D_2|}{|D'|}\mathrm{Gini}(D_2)$

If the samples are divided into $V$ subsets $D_1, \dots, D_V$ with $V > 2$, then

$\mathrm{Gini}(D', a_k) = \sum_{v=1}^{V} \dfrac{|D_v|}{|D'|}\mathrm{Gini}(D_v)$

where, for any dataset $D'$,

$\mathrm{Gini}(D') = 1 - \sum_{m=1}^{M} p_m^{2}$

with $p_m$ the proportion of samples in $D'$ belonging to class $C_m$. Here the Gini index is used to evaluate feature importance; it was chosen because it is much easier to compute than the information gain ratio (entropy), since it does not require the logarithm operation.
Algorithm 1: Modified T-link
Input: x = {x_i}_{i=1}^N, the set of feature vectors; id, the indexes of the samples in x that belong to the majority class.
Output: D_a = all_link, D_0 = all_rest.
1:  Initialize all_link = ∅; all_rest = ∅; x_nearest_neighbor = ∅
2:  x_nearest_neighbor = NearestNeighbor(x), storing the index of each sample's nearest neighbor
3:  for i in id do
4:    if x_nearest_neighbor(i) ∉ id then, i.e. the nearest neighbor of sample i does not belong to the majority class,
5:      continue
6:    end if
7:    if x_nearest_neighbor(i) < i then, to ensure that each pair in all_link is counted only once,
8:      continue
9:    end if
10:   randomly choose one of i and x_nearest_neighbor(i) and put it into all_link
11:  end for
12:  all_rest = x − all_link
Table 2: Four groups
high correlation and high importance
high correlation and low importance
low correlation and high importance
low correlation and low importance
After sorting the features according to the Gini index and dividing them at a certain threshold, an important feature set and an unimportant feature set are generated: the top-ranked features are moved to the important set, and the remaining, lower-ranked features form the unimportant set.
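A minimal sketch of this computation is shown below; the functions follow the standard Gini formulas given above, and the split of a single feature at a threshold is our illustrative choice.

import numpy as np

def gini(labels):
    """Gini impurity 1 - sum(p_m^2) of a set of class labels."""
    if len(labels) == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - float(np.sum(p ** 2))

def gini_index_of_split(x_col, y, threshold):
    """Weighted Gini of the two subsets obtained by splitting on one feature."""
    x_col, y = np.asarray(x_col), np.asarray(y)
    left, right = y[x_col <= threshold], y[x_col > threshold]
    n = len(y)
    return len(left) / n * gini(left) + len(right) / n * gini(right)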
Chi-square: Evaluating correlation
Pearson's Chi-square test is used on nominal variables to determine whether there is a significant correlation between a feature $a_k$ and the class label $y$ in this situation. The steps of the Chi-square test are as follows:
1) Null hypothesis and alternative hypothesis: the null hypothesis assumes that there is no association between $a_k$ and $y$, while the alternative hypothesis assumes that there is an association.
2) Transform the data into a contingency table: each row represents a value of $a_k$ and each column represents a value of $y$. Let $r$ be the number of rows and $c$ the number of columns.
3) Expected value for row $i$ and column $j$, where $O_{ij}$ is the observed count in cell $(i, j)$ and $N$ is the total number of samples:

$E_{ij} = \dfrac{\big(\sum_{j'=1}^{c} O_{ij'}\big)\big(\sum_{i'=1}^{r} O_{i'j}\big)}{N}$

4) Chi-square statistic of the test of independence:

$\chi^{2} = \sum_{i=1}^{r}\sum_{j=1}^{c} \dfrac{(O_{ij} - E_{ij})^{2}}{E_{ij}}$

5) Degrees of freedom: $df = (r - 1)(c - 1)$.
6) Hypothesis testing: the critical value for the chi-square statistic is determined by the level of significance (typically 0.05) and the degrees of freedom. If the observed chi-square statistic is greater than the critical value, the null hypothesis is rejected and there is a significant correlation between $a_k$ and $y$.
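For illustration, the sketch below carries out steps 2) to 6) for one (discretized) feature against the class label, assuming SciPy is available; the function name and the default significance level are our own choices.

import numpy as np
from scipy.stats import chi2, chi2_contingency

def chi_square_correlated(feature_values, labels, alpha=0.05):
    """Return True if the feature and the class label are significantly associated."""
    feature_values, labels = np.asarray(feature_values), np.asarray(labels)
    # step 2: contingency table (rows = feature values, columns = class labels)
    rows, cols = np.unique(feature_values), np.unique(labels)
    observed = np.array([[np.sum((feature_values == r) & (labels == c))
                          for c in cols] for r in rows])
    # steps 3-5: expected counts, chi-square statistic and degrees of freedom
    stat, p_value, dof, expected = chi2_contingency(observed, correction=False)
    # step 6: compare against the critical value at significance level alpha
    critical = chi2.ppf(1.0 - alpha, dof)
    return stat > critical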
After the sorting with the Gini index and the Chi-square test, the feature variables are finally grouped as shown in Table 2. The following step is to select features with bias, in the same way as the biased sampling above, so that the more informative and more strongly correlated groups are favored. This procedure ends with training subsets whose features have been selected with this bias.
All operations on the data are summarized in Fig. 4.
IV. EXPERIMENTAL EVALUATION
This section reports experiments evaluating the proposed RF improvement, which is implemented on the basis of the scikit-learn ensemble random forest classifier; another strong and robust implementation of ensembles and tree learning algorithms, treelearn, is employed as a second control group. In detail, the accuracy, recall, precision and F-value of our 2DRRF implementation are evaluated by comparison with two RFs: the scikit-learn RF classifier and the treelearn classifier.
Fig 4: Data operation process
The original data is split into random training and testing sets, with 25% of each dataset held out for testing. Each reported performance value is obtained by averaging the measurements over 30 to 100 runs, depending on the time cost.
In these experiments, several parameters and attributes turn out to be key to classification performance. Using grid search, the optimal number of trees is selected automatically. There are also other factors in 2DRRF that are critical for learning: the ratio of the sample size for each tree, the ratio of the sampling bias, and the three ratios of the feature bias. To keep the comparison rigorous and controlled, some parameters of the scikit-learn RF are set manually instead of using their default values. In total, 13 real-world datasets from the KEEL repository [22] are chosen, with sample sizes ranging from 168 to 4174, numbers of features from 6 to 41, and imbalance ratios from 6.38 to 129.44, as presented in Table 3.
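A sketch of this evaluation protocol for the scikit-learn baseline might look as follows; the grid of tree counts and the F1 scoring choice are illustrative assumptions rather than the exact settings used in the reported experiments.

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

def evaluate_baseline(X, y, seed=0):
    """75/25 split plus a grid search over the number of trees (binary labels assumed)."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.25, random_state=seed, stratify=y)
    search = GridSearchCV(
        RandomForestClassifier(random_state=seed),
        param_grid={"n_estimators": [50, 100, 200, 400]},   # illustrative grid
        scoring="f1", cv=5)
    search.fit(X_tr, y_tr)
    # accuracy of the tuned model on the held-out test split
    return search.best_estimator_.score(X_te, y_te)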
The comparative experiments are performed on two machines, each with a 3.20 GHz Intel x64-based processor and 8.00 GB of RAM.
As described in Table 4, the performance of the scikit-learn RF, the treelearn RF and 2DRRF is compared in terms of accuracy, recall, precision and F-value.
In general, our implementation of 2DRRF outperforms the other two, especially in terms of the F-value, providing very good results in practice. The 2DR process increases the strength of the individual trees, thus reducing the generalization error. When analyzing the results on each dataset, three conclusions can be drawn:
On datasets 9, 10 and 13, whose imbalance ratios are above 50 (highly imbalanced), 2DRRF steadily holds the highest recall, precision and F-value. These three measures are the more convincing ones for highly imbalanced data, since they are designed to measure how well the positive samples are classified, which makes them more suitable indicators of performance on imbalanced data. Moreover, the recall remains at 1.0.
On datasets 4-7, which all have the same data volume, 2DRRF still maintains its advantage. However, on these four datasets, especially yeast3, our approach shows less stable recall. Also, with the data volume and the number of features fixed and the imbalance ratio increasing, our approach does not show a clear trend in accuracy or precision.
On small or medium-sized datasets with neither a high imbalance ratio nor a large number of features, our algorithm performs slightly worse than in the other cases.
Looking at these three cases, the explanation may lie in the design and structure of the algorithm itself. For the first case, the strong performance on large, highly imbalanced data, and the third case, the weaker performance on small datasets with few features, the reduction procedure is the key point: since 2DR reduces the size of the original data and the number of features used for tree construction, it gives RF a better ability to learn large datasets, while its ability to learn from small datasets with few features is weakened. As for the second case, the results show that 2DR is relatively insensitive to changes in the imbalance ratio; in other words, it has a certain stability and robustness.
V. CONCLUSION AND LIMITATIONS
This paper presents an improvement of the random forest for imbalanced data, using a strategy that reduces both the data volume and the feature space, and substantiates the improvement both theoretically and experimentally. The proposed 2DR scheme under-samples the input training set with a combination of the modified T-link and CNN, derives a training subset for each tree with a biased sampling manner, and selects features with a mixed feature measurement based on the Gini index and the Chi-square test. From the experimental study comparing the improved RF with traditional RF, the main conclusions are as follows:
Benefitting from 2DR, our algorithm steadily outperforms the other two methods in the comparison and is more robust to changes in the imbalance ratio in terms of accuracy, recall and precision, indicating a notable advantage over the others.
Taking advantage of the reduction method, it gains a better ability to deal with large datasets with many features and high imbalance ratios.
The difficulty in dealing with small datasets with few features indicates that pure dimensionality reduction may not be the optimal solution for all cases of the imbalance problem.
For future work, we will focus on improving 2DRRF in three directions:
To further improve the performance on small datasets, we may combine 2DR with an over-sampling method so that the algorithm adjusts itself to the sample size.
Compared with the original RF algorithm, the 2DR process ensures that the training subsets and the selected features are close to optimal, but we may lose some diversity among the trees, which may increase the generalization error (overfitting), a problem not present in the original algorithm. To address this, we can simplify the individual decision trees by pruning.
In our implementation, training time and efficiency are not taken into account, resulting in significantly increased time consumption. We therefore plan to simplify the algorithm and optimize its procedures on the basis of a complexity analysis.
ACKNOWLEDGMENT
The work described in this paper was partially funded by grants from the NSFC (No. 61722205, No. 61751205, No. 61572199, No. 61572540 and No. U1611461), grants from the Guangdong Natural Science Funds (No. S2013050014677 and No. 2017A030312008), grants from the Science and Technology Planning Project of Guangdong Province, China (No. 2015A050502011, No. 2016B090918042, No. 2016A050503015 and No. 2016B010127003), and a grant from the Guangzhou Science and Technology Planning Project (No. 201704030051).
REFERENCES
[1] Chau V, Phung N, Proceedings - 2013 RIVF International Conference
on Computing and Communication Technologies: Research,
Innovation, and Vision for Future, RIVF 2013 (2013) pp. 135-140.
[2] J. Zhang and M. Zulkernine. Network intrusion detection using random forests. Proc. of the Third Annual Conference on Privacy, Security and Trust (PST), pp. 53–61, October 2005.
[3] Chawla, N. V., Bowyer, K. W., Hall, L. O., and Kegelmeyer, W. P.
(2002). SMOTE: Synthetic minority over-sampling technique. Journal
of Artificial Intelligence Research, 16:321–357.
[4] Huang, K., Yang, H., King, I., and Lyu, M. R. (2006). Imbalanced
learning with a biased minimax probability machine. IEEE
Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics,
36(4):913–923.
[5] Zong, W., Huang, G.B., and Chen, Y. (2013). Weighted extreme
learning machine for imbalance learning. Neurocomputing 101, 229–
242.
[6] Chen, J., Li, K., Tang, Z., Bilal, K., Yu, S., Weng, C., and Li, K. (2017).
A Parallel Random Forest Algorithm for Big Data in a Spark Cloud
Computing Environment. IEEE Transactions on Parallel and
Distributed Systems 28, 919–933.
[7] Hart, P. E. (1968). The condensed nearest neighbor rule. IEEE
Transactions on Information Theory, 14, 515–516.
[8] Breiman, L., Friedman, J., Olshen, R., Stone., C.: Classification and
Regression Trees, Wadsworth, Belmont, MA (1984).
[9] Michalski, Stepp, & Diday, 1981; Diday, 1974 .
[10] Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001).
[11] Khoshgoftaar T, Golawala M, Hulse J. 19th IEEE International
Conference on Tools with Artificial Intelligence(ICTAI 2007), vol. 2
(2007) pp. 310-317.
[12] Khalilia M, Chakraborty S, Popescu M. BMC Medical Informatics and
Decision Making, vol. 11, issue 1 (2011).
[13] G.Batista, R.Prati, M.Monard. ACM SIGKDD Explorations
Newsletter, vol. 6, 2004.
[14] Kubat, M., and Matwin, S. Addressing the Course of Imbalanced
Training Sets: One-sided Selection. In ICML (1997), pp. 179–186.
[15] Chau, V.T.N., and Phung, N.H. (2013). Imbalanced educational data
classification: An effective approach with resampling and random
forest. In Proceedings - 2013 RIVF International Conference on
Computing and Communication Technologies: Research, Innovation,
and Vision for Future, RIVF 2013, pp. 135–140.
Khoshgoftaar T, Fazelpour A, Dittman D, Napolitano A, Proceedings - 2015 IEEE 16th International Conference on Information Reuse and Integration, IRI 2015 (2015) pp. 342-348. Published by the Institute of Electrical and Electronics Engineers Inc.
[17] Zhu M, Su B, Ning G, 2017 International Conference on Smart Grid
and Electrical Automation (ICSGEA) (2017) pp. 273-277 Published by
IEEE.
[18] Zhu M, Xia J, Jin X, Yan M, Cai G, Yan J, Ning G, IEEE Access (2018)
pp. 1-1.
[19] Scikit-learn: Machine Learning in Python, Pedregosa et al., JMLR 12,
pp. 2825-2830, 2011.
[20] L.Breiman, “Arcing Classifiers”, Annals of Statistics 1998.
iskandr, Ensembles and Tree Learning Algorithms for Python, https://github.com/capitalk/treelearn, 2013.
[22] J. Alcalá-Fdez, A. Fernandez, J. Luengo, J. Derrac, S. García, L.
Sánchez, F. Herrera. KEEL Data-Mining Software Tool: Data Set
Repository, Integration of Algorithms and Experimental Analysis
Framework. Journal of Multiple-Valued Logic and Soft Computing
17:2-3 (2011) 255-287.
Tomek, I. Two Modifications of CNN. IEEE Transactions on Systems, Man, and Cybernetics, SMC-6 (1976), 769–772.