
Two-dimensional-reduction Random Forest

Shuquan Ye

School of Computer Science and Engineering

South China University of Technology

Guangzhou 510006, China

e-mail: 201536401655@mail.scut.edu.cn

Zhiwen Yu

School of Computer Science and Engineering

South China University of Technology

Guangzhou 510006, China

e-mail: zhwyu@scut.edu.cn

Jiaying Lin

School of Computer Science and Engineering

South China University of Technology

Guangzhou 510006, China

e-mail: 201530501498@mail.scut.edu.cn

Kaixiang Yang

School of Computer Science and Engineering

South China University of Technology

Guangzhou 510006, China

e-mail: xgkaixiang@163.com

Dan Dai

School of Computer Science and Engineering

South China University of Technology

Guangzhou 510006, China

e-mail: daidanjune@hotmail.com

Zhi-Hui Zhan

School of Computer Science and Engineering

South China University of Technology

Guangzhou 510006, China

Wei-Neng Chen

School of Computer Science and Engineering

South China University of Technology

Guangzhou 510006, China

Jun Zhang

School of Computer Science and Engineering

South China University of Technology

Guangzhou 510006, China

Abstract—Random forest (RF) is a competitive machine learning algorithm, but imbalanced real-world data remain one of its major challenges. This paper presents a Two-dimensional-reduction RF (2DRRF), which extends the traditional RF with three contributions. First, a two-dimensional-reduction approach is introduced to improve RF performance on imbalanced data. Second, a modified Tomek-link (T-link) is proposed that focuses on detecting and removing safe samples. Third, a biased sampling scheme is employed to build better training subsets. Experiments on 13 imbalanced datasets from the KEEL repository, with imbalance ratios ranging from 6.38 to 129.44, indicate that 2DRRF consistently outperforms two other relevant implementations of RF in terms of accuracy, recall, precision and F-value.

Keywords—Imbalanced dataset; classification; random forest;

two-dimensional-reduction; biased sampling

I. INTRODUCTION

An imbalanced dataset is one in which the samples of a particular class, the positive class of main interest, are far outnumbered by the remaining classes. With the continuous growth of data in fields such as medicine, genomics, finance, education [1] and network intrusion detection [2], vast amounts of data are generated with high imbalance ratios. Because standard data mining algorithms are sensitive to class imbalance, extracting vital information from imbalanced datasets has become a challenging task.

A variety of methods have been proposed to address the imbalance problem. Generally, they fall into data-level methods and algorithm-level methods. The first category biases the data distribution by resampling or by adjusting class weights to balance the dataset (e.g., under-sampling the majority class, over-sampling the minority class, or using a weight matrix [3]). The second category is internal and adapts traditional algorithms to take imbalance into account (e.g., the biased minimax probability machine [4] and the weighted extreme learning machine [5]).

The challenge for resampling methods is to distinguish informative from redundant data, while the challenge for algorithm-level approaches is to select and improve upon the original algorithms. This study provides an approach that combines characteristics of both categories within a bagging random forest model.

The random forest (RF) is a competitive ensemble learning algorithm that combines the results of tree predictors, each trained on its own bootstrap sample. However, because it relies on plain bootstrap sampling, the standard RF is not well suited to learning from imbalanced datasets.

Recently, a parallel random forest algorithm for big data [6] was proposed to handle high-dimensional and noisy data; it can be summarized in two steps: bootstrap sampling followed by a reduction in the number of features. Its results demonstrate the promise of RF and show that there is plenty of room to adapt the random forest model to real-world data.

To adapt RF to imbalanced datasets, a two-dimensional-reduction RF (2DRRF) is put forward in this study. A dataset has two dimensions: the number of samples and the number of features, corresponding to the rows and columns of the data matrix respectively. For the reduction in the horizontal dimension, we propose a modified Tomek-link (T-link) that removes safe samples, combined with Hart's Condensed Nearest Neighbor Rule (CNN) [7] for under-sampling; a biased sampling scheme is then applied. For the reduction in the vertical dimension, the Gini index [8] and the Chi-square test [9] are employed for feature reduction. With the two measures focusing on informativeness and correlation respectively, this framework provides a new benchmark for feature measurement. On this basis, a training set can be built for each tree.

Besides improving learning on imbalanced data, the two-dimensional reduction (2DR) may also offer some guidance for dealing with the growth of real-world data.

The rest of the paper is organized as follows. Section II reviews background work. Section III describes 2DRRF in detail. Section IV presents the experimental results, and Section V concludes and discusses future work.

II. BACKGROUND

A. Related Work

Leo Breiman published the first work on RF in 2001 [10]. Since then, this learner has shown its power and robustness and has been applied to a wide variety of fields. With little experimentation on random forest classifiers for imbalanced data reported at the time, T. Khoshgoftaar, M. Golawala and J. Hulse first discussed the performance of RF on imbalanced data in Weka in 2007 [11]. In 2011, M. Khalilia, S. Chakraborty and M. Popescu provided extensive evidence of reliable RF results for disease prediction on the highly imbalanced Healthcare Cost and Utilization Project (HCUP) dataset [12].

A broad experimental evaluation of several methods for imbalanced training data was carried out by G. Batista in 2004 [13]; it informed this study, in which one-sided selection (OSS) [14] is the under-sampling method we improve upon. More recently, many studies have combined or improved upon prior methods, such as a hybrid over-sampling and under-sampling scheme in 2013 [15], Class Ratio Preserved RF in 2015 [16], a K-L feature compression method in 2017 [17], and class weights voting (CWsRF) in 2018 [18].

In 2017, J. Chen, K. Li and Z. Tang presented a parallel RF algorithm for big data on the Apache Spark platform [6], which handles high-dimensional data in two steps: the traditional bootstrap sampling of RF followed by a reduction in the number of features. Inspired by their study, we take a step further by combining the modified T-link and CNN for sampling, followed by an improved feature-reduction method.

The ensemble random forest classifier of scikit-learn [19], one of the most popular machine learning packages in Python, is a simple and efficient tool for this research; its algorithms are perturb-and-combine techniques [20]. Another strong and robust implementation of ensemble and tree learning algorithms, treelearn [21], was employed as a second control group.

The experimental datasets come from the online KEEL (Knowledge Extraction based on Evolutionary Learning) imbalanced-dataset repository [22], a set of benchmarks that allows a complete analysis of the behavior of learning methods in comparison with existing ones.

B. Performance measure

In real-world tasks, a learner can exploit the imbalanced distribution of a dataset. For example, on a dataset with 99% negative samples, a learner that always predicts negative achieves 99% average accuracy. In such cases, average accuracy is not an adequate criterion, so several alternative criteria are used in our experiments:

The confusion matrix, shown in Table 1, records the four possible outcomes when examples are classified (TP, FP, FN, TN) and is the most straightforward basis for evaluating a machine learning model. From the confusion matrix, accuracy, precision, recall and F-value are defined as follows:

Accuracy = (TP + TN) / (TP + TN + FP + FN)
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
F-value = (1 + β²) · Recall · Precision / (β² · Recall + Precision)

in which β is usually set to 1. The main goal in classifying imbalanced datasets is to raise the recall without hurting the precision; the F-value represents the trade-off between the two.
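As a concrete illustration, the four criteria can be computed directly from the confusion-matrix counts; the short Python sketch below is ours, with illustrative function and variable names rather than anything taken from the paper:

# Minimal sketch: accuracy, precision, recall and F-value from confusion-matrix
# counts. TP, FP, FN, TN follow the usual positive-class convention; beta = 1
# reproduces the F-value used in the experiments.
def imbalance_metrics(tp, fp, fn, tn, beta=1.0):
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = beta ** 2 * precision + recall
    f_value = (1 + beta ** 2) * precision * recall / denom if denom else 0.0
    return accuracy, precision, recall, f_value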


III. TWO DIMENSIONAL REDUCTION RF

This section describes our optimization of the random forest algorithm for imbalanced data, the Two-dimensional-reduction RF (2DRRF). The major goal is to improve overall performance and overcome the difficulties RF faces on imbalanced datasets, as analyzed below, by applying both vertical and horizontal reductions so that the training subsets are sampled reasonably and the selected features are close to optimal.

The steps for constructing 2DRRF are shown in Fig. 1. For the horizontal dimension reduction, the modified T-link is used first. Unlike the conventional use of the traditional T-link [23] to remove boundary samples, in our method it removes samples from the interior of the majority-class distribution, reducing the proportion of majority data while preserving the original border between the majority and minority classes. Next, CNN finds a consistent subset of the dataset produced by the previous step. In summary, for the horizontal dimension we remove majority-class examples that are distant from the decision border, since such examples may be considered redundant or less relevant for learning. After the horizontal dimension reduction, a biased sampling scheme is applied. For the vertical dimension reduction, the Gini index is computed first, dividing the feature variables into low and high importance; a Chi-square test then further divides them into low and high correlation. According to the resulting four groups, features are selected with bias so as to tease out informative and correlated features for building the training sets. More detailed descriptions are given in Subsections B, C and D.

A. The Random Forest Model

The traditional random forest model is an ensemble of tree predictors {h(x, Θ_k), k = 1, 2, ...} such that each tree is trained on an independently drawn sample, with the same distribution for all trees in the forest, using random sampling with replacement (bootstrap) along with random selection of features. During prediction, each test sample is classified by all trees, and the final result is decided by majority vote.

However, these fundamental ideas, designed for the general case, can be misled when the imbalance ratio exceeds a certain threshold. Theoretically, the upper bound on the generalization error of a forest, given by (1), is directly proportional to the correlation between the individual trees and decreases with their strength:

PE* ≤ ρ̄ (1 − s²) / s²    (1)

where ρ̄ is the mean correlation between trees and s is the strength of the set of classifiers, defined as the expected margin s = E_{X,Y}[mr(X, Y)]. Given sparse positive examples, the positive regions become extremely small, with the decision border fitted tightly around the positive samples, which causes overfitting and eventually reduces the strength of the trees. Also, the variance of the margin function mr, which is proportional to the mean correlation as shown in (2), will be high when the vast majority of samples in each training set come from the negative class:

var(mr) = ρ̄ (E_Θ[sd(Θ)])²    (2)

The main process of constructing a random forest is illustrated in Fig. 2.

Table 1: Confusion Matrix

Fig 1: Process for 2DR construction

Fig 2: Process for RF construction
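For reference, the traditional RF baseline described here corresponds roughly to the scikit-learn classifier used as a control group in Section IV. The sketch below shows the bootstrap-plus-random-feature-selection setup; the parameter values are illustrative, and X_train, y_train, X_test are assumed to be already loaded:

# Minimal sketch of the traditional RF baseline (scikit-learn control group).
from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(
    n_estimators=100,        # number of tree predictors (illustrative value)
    bootstrap=True,          # random sampling with replacement for each tree
    max_features="sqrt",     # random selection of features at each split
)
rf.fit(X_train, y_train)     # X_train, y_train assumed to be defined
y_pred = rf.predict(X_test)  # final result decided by majority vote across trees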

B. Horizontal dimension reduction

For the multi-class problem with class labels C_1, C_2, ..., C_m, suppose the given training dataset has the form D = {(x_i, y_i)}, i = 1, ..., |D|, where x_i is the feature vector and y_i is the corresponding label, so there are |D| samples in total. The relevant elements and notation that appear in the following steps are explained in Table 2.

Modified T-link: Reduction Inside Majority Class

In this step, to improve the performance of the RF algorithm on imbalanced data, we present a modified T-link for under-sampling.

The traditional T-link, shown in deep green in Fig. 3, can be defined as follows. Given two examples x_i and x_j in the dataset coming from different classes, say y_i = C_1 and y_j = C_2, let d(x_i, x_j) denote the distance between their feature vectors. A T-link is said to be formed when there is no sample x_k such that d(x_i, x_k) < d(x_i, x_j) or d(x_j, x_k) < d(x_i, x_j). The characteristic of a T-link is that the two examples forming it should be removed, because both are boundary examples or one of them is noise. In other words, the samples are divided into two categories:

1) Borderline examples, distributed around the class boundaries where the positive and negative classes overlap.

2) Safe samples, located in relatively homogeneous areas without overlapping class labels.

However, it is precisely the deletion of borderline samples that is disputed, because samples along the boundary carry important information for forming the decision border. In view of this, an adjustment is made to the T-link, shown in bright orange in Fig. 3. When a T-link is found between two samples x_i and x_j that both belong to the majority class, a reduction within the majority class is carried out: one of x_i and x_j is selected at random and moved to another set D_a, in order to counteract the excess of majority data far from the borderline without hurting the original border between the majority and minority classes. After this step, the raw dataset is divided into D_a and all the rest, D_0. The modified Tomek-link procedure is shown in Algorithm 1.
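A minimal Python sketch of this step is given below; it is our own rendering of Algorithm 1, with scikit-learn's NearestNeighbors standing in for the NearestNeighbor routine and with illustrative variable names:

# Sketch of the modified T-link: for each majority-class sample whose nearest
# neighbour is also from the majority class, one of the pair is moved into D_a;
# every other sample stays in D_0.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def modified_tlink(X, majority_idx, rng=None):
    rng = rng or np.random.default_rng()
    majority_set = set(majority_idx)
    # ask for 2 neighbours because index 0 of each query is the point itself
    nn = NearestNeighbors(n_neighbors=2).fit(X)
    neighbor = nn.kneighbors(X, return_distance=False)[:, 1]
    all_link = set()
    for i in majority_idx:
        j = neighbor[i]
        if j not in majority_set:   # the pair crosses classes: keep both samples
            continue
        if j < i:                   # keep each pair unique
            continue
        all_link.add(int(rng.choice([i, j])))   # move one of the two into D_a
    all_rest = [k for k in range(len(X)) if k not in all_link]
    return sorted(all_link), all_rest           # indexes of D_a and D_0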

CNN: Find consistent subset

In this step, Hart's Condensed Nearest Neighbor Rule is applied to find a consistent subset of the examples in D_0 for under-sampling. CNN attempts to remove examples far from the decision border and thus selectively reduces the original population. A subset Ĉ is consistent with D_0 when a 1-nearest-neighbor (1-NN) classifier trained on Ĉ correctly classifies all the instances in D_0.

Table 2: Table of elements

Element        Description
|D|            number of samples in D
|y_i = C_1|    number of samples labeled with C_1
V(C_1)         the set of all samples labeled with C_1
C_max          the majority class, C_max = argmax_{C_m} |y_i = C_m|
C_min          the other classes
dim(x_i)       the dimension of x_i
a_k            the feature a_k, i.e. the column holding its values for all samples

Fig 3: Tomek-links


The algorithm proceeds as follows. First, one sample from the majority class and all samples belonging to the minority class are selected to form Ĉ. Note that this majority class is not necessarily the majority of the raw dataset, because of the previous T-link step. Second, a 1-NN classifier trained on Ĉ is applied to classify the examples in D_0. Afterwards, all misclassified instances are added to Ĉ to ensure consistency. These steps are repeated until no example in D_0 is misclassified. The remaining samples are collected into D_rest, where D_rest = D_0 − Ĉ. Note that this algorithm does not guarantee the smallest consistent set.

The horizontal dimension reduction ends with three subsets: D_a, Ĉ and D_rest.
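A minimal sketch of the CNN step in Python, assuming X and y hold the samples and labels remaining in D_0 after the modified T-link (the function name and loop structure are ours):

# Sketch of Hart's CNN: grow a consistent subset with a 1-NN classifier and
# collect everything it makes redundant into D_rest.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def condensed_nn(X, y, majority_label):
    minority = np.where(y != majority_label)[0]
    first_major = np.where(y == majority_label)[0][:1]
    keep = set(minority) | set(first_major)      # seed: all minority + one majority sample
    changed = True
    while changed:
        changed = False
        idx = sorted(keep)
        knn = KNeighborsClassifier(n_neighbors=1).fit(X[idx], y[idx])
        pred = knn.predict(X)
        for i in range(len(X)):
            if i not in keep and pred[i] != y[i]:   # misclassified: add to the consistent subset
                keep.add(i)
                changed = True
    rest = [i for i in range(len(X)) if i not in keep]
    return sorted(keep), rest                    # consistent subset and D_rest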

C. Sampling with bias

In this step, a biased sampling scheme is applied to sample from the dataset reduced, or compressed, by the first step. A biased sample is one collected so that some members of the intended population are more or less likely to be included than others. Here, the aim is to select training subsets for the trees from D_a, Ĉ and D_rest, each containing a fixed number of samples, with the sampling probability set differently for each subset. This is a simple implementation of dynamic probability setting for biased sampling: the principle is to apply a square operation to the size ratio of each subset, because on the range [0, 1] the square function is monotonically increasing and so is its derivative. As a result, when one set is too small and another too large, the probability of drawing from the appropriate set is automatically increased, which helps ensure the diversity of the training subsets.
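Because the exact probability formula is not reproduced in the text above, the following sketch only illustrates the squared-ratio idea under our own assumption about how the three pools (D_a, the consistent subset, and D_rest) are weighted; it should not be read as the paper's exact rule:

# Illustrative biased sampling over three index pools. The weighting rule
# (square of each pool's size ratio, spread evenly over its members) is an
# assumption made for this sketch.
import numpy as np

def biased_sample(pools, n_samples, rng=None):
    """pools: list of index arrays, e.g. [D_a, consistent_subset, D_rest]."""
    rng = rng or np.random.default_rng()
    sizes = np.array([len(p) for p in pools], dtype=float)
    weights = (sizes / sizes.sum()) ** 2                 # square operation on the size ratio
    probs = np.concatenate(
        [np.full(len(p), w / len(p)) for p, w in zip(pools, weights)]
    )
    probs /= probs.sum()                                 # renormalise over all candidates
    all_idx = np.concatenate(pools)
    return rng.choice(all_idx, size=n_samples, p=probs)  # one biased training subset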

D. Second dimension reduction

In this part, each training subset is further reduced, or compressed, in the feature dimension. Both the Gini index and the Chi-square test are employed to evaluate features comprehensively. The former takes the perspective of tree growth and information gain, emphasizing learning depth, while the latter takes the perspective of the statistical relationship between a feature and the final result, emphasizing correlation.

Gini index: Evaluating importance

For a sample subspace D' of a training subset, there are K features in total. If feature a_k is discrete and splits D' into two subsets D_1 and D_2, its Gini index is computed as

Gini(D', a_k) = (|D_1| / |D'|) Gini(D_1) + (|D_2| / |D'|) Gini(D_2)

and if the feature splits D' into V subsets D_1, ..., D_V,

Gini(D', a_k) = Σ_{v=1..V} (|D_v| / |D'|) Gini(D_v)

where, for any dataset D,

Gini(D) = 1 − Σ_m p_m²

with p_m the proportion of samples in D belonging to class C_m. Here the Gini index is used to evaluate feature importance; it was chosen because it is much easier to compute than the information gain ratio (entropy), since it does not employ the logarithm operation.

Algorithm 1: Modified T-link

Input:  x = {x_i}, i = 1, ..., N: the set of feature vectors
        id: the indexes of the samples in x from the majority class
Output: D_a = all_link, D_0 = all_rest

 1: Initialize all_link = ∅; all_rest = ∅; x_nearest_neighbor = ∅
 2: x_nearest_neighbor = NearestNeighbor(x)   // stores the index of each sample's nearest neighbour
 3: for i in id do
 4:    if x_nearest_neighbor(i) ∉ id then     // the neighbour does not belong to the majority class
 5:       continue
 6:    end if
 7:    if x_nearest_neighbor(i) < i then      // ensures each pair in all_link is unique
 8:       continue
 9:    end if
10:    randomly choose one of x_nearest_neighbor(i) and i and put it into all_link
11: end for
12: all_rest = x − all_link

Table 2: Four groups

High correlation and high importance     High correlation and low importance
Low correlation and high importance      Low correlation and low importance

After sorting the features by Gini index and dividing them at a certain threshold, an important feature set and an unimportant feature set are generated: the top-ranked features are moved into the important set, and the remaining, lower-ranked features form the unimportant set.
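A short sketch of the Gini computation for one discrete feature is given below; the function names are ours and the handling of continuous features is omitted:

# Gini impurity of a label set, and the weighted Gini score of the partition
# induced by one discrete feature (a lower score means a purer split).
import numpy as np

def gini(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def gini_for_feature(feature_column, labels):
    total = len(labels)
    score = 0.0
    for value in np.unique(feature_column):
        mask = feature_column == value
        score += mask.sum() / total * gini(labels[mask])   # |D_v|/|D| * Gini(D_v)
    return score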

Chi-square: Evaluating correlation

Pearson's Chi-square test is used on nominal variables to determine whether there is a significant correlation between them; here it is applied between a feature a_k and the class label y. The steps of the Chi-square test are as follows:

1) Null hypothesis and alternative hypothesis: the null hypothesis assumes that there is no association between a_k and y, while the alternative hypothesis assumes that there is an association.

2) Transform the data into a contingency table: each row represents a value of a_k and each column represents a value of y; the table has r rows and c columns, and O_ij denotes the observed count in cell (i, j).

3) Expected values: under the null hypothesis, the expected count for cell (i, j) is

E_ij = (row_i total × column_j total) / N

where N is the total number of observations.

4) Chi-square test of independence:

χ² = Σ_i Σ_j (O_ij − E_ij)² / E_ij

5) Degrees of freedom: df = (r − 1)(c − 1).

6) Hypothesis testing: the critical value of the chi-square statistic is determined by the significance level (typically 0.05) and the degrees of freedom. If the observed chi-square statistic exceeds the critical value, the null hypothesis is rejected and there is a significant correlation between a_k and y.
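Steps 1) to 6) can also be carried out with an off-the-shelf contingency-table test; the sketch below uses scipy.stats.chi2_contingency in place of the manual computation, with the 0.05 significance level mentioned above (the helper name is ours):

# Chi-square correlation test between one nominal feature and the class label.
import pandas as pd
from scipy.stats import chi2_contingency

def is_correlated(feature_column, labels, alpha=0.05):
    table = pd.crosstab(feature_column, labels)        # observed counts O_ij
    chi2, p_value, dof, expected = chi2_contingency(table)
    return p_value < alpha                             # True: reject H0, feature and label are associated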

After the Gini-index ranking and the Chi-square test, the feature variables are grouped as shown in Table 2. The following step is to select features with bias, in the same way as the biased sampling of samples. This procedure ends with training subsets that contain the biased-selected features.

All operations on the data are summarized in Fig. 4.

IV. EXPERIMENTAL EVALUATION

This section presents experiments evaluating the proposed RF improvement, which is implemented on top of the scikit-learn ensemble random forest classifier; treelearn, another strong and robust implementation of ensemble and tree learning algorithms, is employed as a second control group. Our 2DRRF implementation is evaluated in terms of accuracy, recall, precision and F-value against these two RFs: the scikit-learn RF classifier and the treelearn classifier.

Fig 4: Data operation process

The original data is split into random training and testing sets, with 0.25 of the dataset included in the test split. Each performance result is obtained by averaging the measurements of 30 to 100 runs, depending on the time cost.

In these experiments, several parameters and attributes turn out to be key to classification performance. Using grid search, the optimal number of trees can be selected automatically. Other factors in 2DRRF are also critical for learning: the ratio of the sample size for each tree, the sample-bias ratio, and the three feature-bias ratios. To make the comparison more rigorous and carry out a controlled experiment, some parameters of the scikit-learn RF are set manually rather than left at their default values.

default value. Totally 13 real-world-datasets from KEEL-

dataset[22] are chosen with different volume of samples from

168 to 4174, number of features from 6 to 41, and imbalance

ratios from 6.38 to 129.44 as presented in table 3.

The comparison experiments are performed on two machines, each with an Intel 3.20 GHz x64-based processor and 8.00 GB of RAM.

As shown in Table 4, the performance of scikit-learn RF, treelearn RF and 2DRRF is compared in terms of accuracy, recall, precision and F-value.

In general, our implementation of 2DRRF outperforms the other two, especially in terms of the F-value, giving very good results in practice. The 2DR process provides the trees with better training subsets and increases their strength, thus reducing the generalization error. Analyzing the results per dataset, three conclusions can be drawn:

• On datasets 9, 10 and 13, whose imbalance ratios are over 50 (highly imbalanced), 2DRRF steadily holds the highest recall, precision and F-value. These three criteria are the more convincing measures on highly imbalanced datasets, since they are designed to assess how well the model classifies positive samples, making them better indicators of performance on imbalanced data. Moreover, the recall remains at 1.0.

• On datasets 4-7, which all have the same data volume, 2DRRF still maintains its advantages. However, on these four datasets, especially yeast3, our approach shows less stable recall. Also, with the data volume and feature count fixed but the imbalance ratio increasing, our approach shows neither clearly better nor clearly worse accuracy and precision.

• On small or medium-sized datasets without a high imbalance ratio or a large number of features, our algorithm performs slightly worse than in the other cases.

Looking at the above three cases, the explanation may lie in the design and structure of the algorithm itself. For the first case, strong performance on large, highly imbalanced data, and the third case, weaker performance on small datasets with few features, the reduction procedure is the key factor. Since 2DR reduces the size of the original data and the number of features used for tree construction, it gives RF a better ability to learn from large datasets; in contrast, its ability to learn from small datasets with few features is weakened. As for the second case, the results show that 2DR is relatively insensitive to changes in the imbalance ratio; in other words, it has a degree of stability and robustness.

V. CONCLUSION AND LIMITATIONS

This paper presents an improvement of the random forest for imbalanced data using a strategy that reduces both the data volume and the feature space, and substantiates the improvement both theoretically and experimentally. The proposed 2DR under-samples the input training set with a combination of the modified T-link and CNN, derives a training subset for each tree with a biased sampling scheme, and selects features with a mixed measurement of Gini index and Chi-square test. From the experimental study of the improved RF against traditional RF, the main conclusions are as follows:

• Benefiting from 2DR, our algorithm steadily outperforms the other two in the comparison and is more robust to changes in the imbalance ratio in terms of accuracy, recall and precision, indicating notable superiority and strength over the others.

• Taking advantage of the reduction method, it gains a better ability to deal with large data with many features and a high imbalance ratio.

• The difficulty in dealing with small datasets with few features indicates that pure dimensionality reduction may not be the optimal solution for all imbalanced problems.

For future work, we will focus on improving 2DRRF in three directions:

• To further improve performance on small datasets, we may employ an algorithm combining 2DR with an over-sampling method so that it can adapt to the sample size.

• Compared with the original RF algorithm, the 2DR process ensures that the training subsets and selected features are close to optimal, but we may lose some diversity among the trees, which can increase the generalization error, i.e., cause overfitting, a problem not present in the original algorithm. To address this, we can simplify individual decision trees by pruning.

• In our implementation, training time and efficiency are not taken into account, resulting in significantly increased time consumption. Therefore, we plan to simplify the algorithm and optimize its complexity on the basis of a careful complexity analysis.

ACKNOWLEDGMENT

The work described in this paper was partially funded by grants from the NSFC (No. 61722205, No. 61751205, No. 61572199, No. 61572540, and No. U1611461), the Guangdong Natural Science Funds (No. S2013050014677 and No. 2017A030312008), the Science and Technology Planning Project of Guangdong Province, China (No. 2015A050502011, No. 2016B090918042, No. 2016A050503015, No. 2016B010127003), and the Guangzhou Science and Technology Planning Project (No. 201704030051).

REFERENCES

[1] Chau V, Phung N, Proceedings - 2013 RIVF International Conference

on Computing and Communication Technologies: Research,

Innovation, and Vision for Future, RIVF 2013 (2013) pp. 135-140.

[2] J. Zhang and M. Zulkernine. Network intrusion detection using random

forests. Proc. Of the Third Annual Conference on Privacy, Security and

Trust (PST), pages 53–61, October 2005.

[3] Chawla, N. V., Bowyer, K. W., Hall, L. O., and Kegelmeyer, W. P.

(2002). SMOTE: Synthetic minority over-sampling technique. Journal

of Artificial Intelligence Research, 16:321–357.

[4] Huang, K., Yang, H., King, I., and Lyu, M. R. (2006). Imbalanced

learning with a biased minimax probability machine. IEEE

Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics,

36(4):913–923.

[5] Zong, W., Huang, G.B., and Chen, Y. (2013). Weighted extreme

learning machine for imbalance learning. Neurocomputing 101, 229–

242.

[6] Chen, J., Li, K., Tang, Z., Bilal, K., Yu, S., Weng, C., and Li, K. (2017).

A Parallel Random Forest Algorithm for Big Data in a Spark Cloud

Computing Environment. IEEE Transactions on Parallel and

Distributed Systems 28, 919–933.

[7] Hart, P. E. (1968). The condensed nearest neighbor rule. IEEE

Transactions on Information Theory, 14, 515–516.

[8] Breiman, L., Friedman, J., Olshen, R., Stone., C.: Classification and

Regression Trees, Wadsworth, Belmont, MA (1984).

[9] Michalski, Stepp, & Diday, 1981; Diday, 1974 .

[10] Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001).

[11] Khoshgoftaar T, Golawala M, Hulse J. 19th IEEE International

Conference on Tools with Artificial Intelligence(ICTAI 2007), vol. 2

(2007) pp. 310-317.

[12] Khalilia M, Chakraborty S, Popescu M. BMC Medical Informatics and

Decision Making, vol. 11, issue 1 (2011).

[13] G.Batista, R.Prati, M.Monard. ACM SIGKDD Explorations

Newsletter, vol. 6, 2004.

[14] Kubat, M., and Matwin, S. Addressing the Course of Imbalanced

Training Sets: One-sided Selection. In ICML (1997), pp. 179–186.

[15] Chau, V.T.N., and Phung, N.H. (2013). Imbalanced educational data

classification: An effective approach with resampling and random

forest. In Proceedings - 2013 RIVF International Conference on

Computing and Communication Technologies: Research, Innovation,

and Vision for Future, RIVF 2013, pp. 135–140.

[16] Khoshgoftaar T, Fazelpour A, DIttman D, Napolitano A, Proceedings

- 2015 IEEE 16th International Conference on Information Reuse and

Integration, IRI 2015 (2015) pp. 342-348 Published by Institute of

Electrical and Electronics Engineers Inc.

[17] Zhu M, Su B, Ning G, 2017 International Conference on Smart Grid

and Electrical Automation (ICSGEA) (2017) pp. 273-277 Published by

IEEE.

[18] Zhu M, Xia J, Jin X, Yan M, Cai G, Yan J, Ning G, IEEE Access (2018)

pp. 1-1.

[19] Scikit-learn: Machine Learning in Python, Pedregosa et al., JMLR 12,

pp. 2825-2830, 2011.

[20] L.Breiman, “Arcing Classifiers”, Annals of Statistics 1998.

[21] Ensembles and Tree Learning Algorithms for Python, 1iskandr

https://github.com/capitalk/treelearn, 2013.

[22] J. Alcalá-Fdez, A. Fernandez, J. Luengo, J. Derrac, S. García, L.

Sánchez, F. Herrera. KEEL Data-Mining Software Tool: Data Set

Repository, Integration of Algorithms and Experimental Analysis

Framework. Journal of Multiple-Valued Logic and Soft Computing

17:2-3 (2011) 255-287.

[23] Tomek, I. Two Modifications of CNN. IEEE Transactions on Systems, Man, and Cybernetics, SMC-6 (1976), 769–772.