Business Analytics in Telemarketing: Cost-Sensitive
Analysis of Bank Campaigns Using Artificial Neural
Networks
Nazeeh Ghatasheh 1,*, Hossam Faris 2, Ismail AlTaharwa 3, Yousra Harb 4 and Ayman Harb 5
1 Department of Information Technology, The University of Jordan, 77110 Aqaba, Jordan
2 Department of Information Technology, The University of Jordan, 11942 Amman, Jordan; hossam.faris@ju.edu.jo
3 Department of Computer Information Systems, The University of Jordan, 77110 Aqaba, Jordan; i_taharwa@ju.edu.jo
4 Department of Management Information Systems, Yarmouk University, 21163 Irbid, Jordan; yousra.harb@yu.edu.jo
5 Department of Hotel Management, The University of Jordan, 77110 Aqaba, Jordan; a.harb@ju.edu.jo
* Correspondence: n.ghatasheh@ju.edu.jo
Received: 26 February 2020; Accepted: 2 April 2020; Published: 9 April 2020


Featured Application: This study attempts to mitigate the effects of highly imbalanced data by realizing an enhanced cost-sensitive prediction model. The model intends to give telemarketing decision makers in the banking industry more insight into their marketing efforts, such that potential clients gain more focus based on quantifiable cost estimates.
Abstract: The banking industry has been seeking novel ways to leverage database marketing efficiency. However, the nature of bank marketing data has hindered researchers in the process of finding a reliable analytical scheme. Various studies have attempted to improve the performance of Artificial Neural Networks in predicting clients' intentions but did not resolve the issue of imbalanced data. This research aims at improving the performance of predicting the willingness of bank clients to apply for a term deposit in highly imbalanced datasets. It proposes enhanced (i.e., cost-sensitive) Artificial Neural Network models to mitigate the dramatic effects of highly imbalanced data, without distorting the original data samples. The generated models are evaluated, validated, and consequently compared to different machine-learning models. A real-world telemarketing dataset from a Portuguese bank is used in all the experiments. The best prediction model achieved a geometric mean of 79%, and misclassification errors were minimized to 0.192 and 0.229 for Type I and Type II errors, respectively. In summary, the Meta-Cost method improved the performance of the prediction model without imposing significant processing overhead or altering the original data samples.
Keywords:
applied computational intelligence; business analytics; cost-sensitive analysis;
electronic direct marketing; MLP-ANN
1. Introduction
Data-driven decision making [1–7] has been playing an essential part in the critical responses to the stringent business environment [4,8–13]. Identifying profitable or costly customers is crucial for businesses to maximize returns, preserve a long-term relationship with customers, and sustain a competitive advantage [14]. In the banking industry, there are several opportunities yet to
be considered in sustaining a competitive advantage. Improving database marketing efficiency is still one of the main issues that requires intensive investigation. The nature of bank marketing data presents a challenge facing researchers in business analytics [3,7,10]. The low volume of potential target/important customer data (i.e., an imbalanced data distribution) is a major challenge in extracting the latent knowledge in banks' marketing data [1,3,10]. There is still a pressing need to handle imbalanced dataset distributions reliably [15–17]; commonly used approaches [1,15,16,18–21] impose processing overhead or lead to loss of information.
Artificial Neural Network (ANN) models have been used broadly in marketing to predict the behavior of customers [22,23]. ANNs are able to discover non-linear relationships in complex domain data, e.g., in bank telemarketing data [14,24]. Interestingly, ANNs are able to generalize the "trained" model to predict the relationships of unseen inputs. In some business cases ANNs are highly competitive, as they can handle any type of input data distribution [25–28]. Modifying an ANN to make it cost-sensitive is a promising approach to enhance its performance and mitigate the effects of an imbalanced data distribution. Cost-sensitive methods preserve the quality of the original datasets, in contrast to other methods (e.g., pre-processing the original dataset by re-sampling techniques to adjust the skewed distribution of classes), which may degrade the quality of the data [29]. In practice, bank telemarketing data are highly imbalanced (e.g., only 11% of the contacted clients in a marketing campaign may be interested in accepting an offer). Therefore, a cost-sensitive ANN model is a strong candidate [1,30] to predict the willingness of bank clients to take a term deposit in a telemarketing database.
Researchers and practitioners in the field of electronic commerce have been striving to streamline and promote various business processes. For a few decades, different research efforts have attempted to improve the understanding of customer behavior using ANNs [1,14,22–24,26,27,31]. However, cost-sensitive algorithms have received only marginal interest from researchers in bank marketing, while pre-processing the input dataset by re-sampling techniques to solve imbalanced class distributions has gained significant interest [1,15]. This research seeks to unleash the potential of cost-sensitive analysis in providing enhanced bank marketing models and reliable handling of imbalanced data distributions.
Predicting the potential bank clients who are willing to apply for a term deposit would reduce marketing costs by saving campaigns' wasted effort and resources. On the other hand, contacting uninterested clients is an incurred cost with marginal returns. Wrong prejudgments of client intentions (e.g., willing or not willing to accept an offer) have unequal consequent costs [29,30]. A marketing manager would expect a relatively higher cost of not contacting a potential client who is willing to invest than of contacting an uninterested client. Therefore, it is highly valuable to find a reliable prediction model that takes misclassification costs into account.
A cost-sensitive approach has been proposed before to overcome the imbalanced nature of bank telemarketing data. Cost-sensitive learning by re-weighting instances [32], leveraging only nine features of the whole feature set, was proposed in [30]. Compared to a previous study that considered re-sampling [30,33], the proposed approach did not perform well. Therefore, the authors of [30] did not recommend the use of cost-sensitive approaches to mitigate the imbalance issue in bank telemarketing data. This study argues that cost-sensitive approaches are capable of handling the issue of imbalanced data. Moreover, they can improve prediction results while maintaining better reliability compared to instance re-sampling approaches. Both over-sampling and under-sampling scenarios may reduce the imbalance ratio, but each scenario has its inherent shortcoming: the former would result in an over-fitted model, while the latter discards potentially valuable instances [30]. In order to justify this claim, a wider range of cost-sensitive approaches [1,29,30,34,35] are applied and compared from different points of view.
This research considered a highly imbalanced dataset [3] that is publicly released, concerning a Portuguese bank telemarketing campaign. The data alongside an arbitrary cost matrix were used to generate different cost-sensitive ANN models. The generated models were evaluated using several evaluation measures. The best cost-sensitive models were compared against conventional machine-learning approaches. The main contributions of this research are:
- Proposing a relatively reliable cost-sensitive ANN model to predict the intentions of bank clients in applying for a long-term deposit.
- Mitigating the effects of imbalanced bank marketing data without distorting the distribution of real-world data.
- Translating the best outcomes into a decision support formula, which could be used to quantify the estimated costs of running a telemarketing campaign.
The rest of this paper is organized as follows: Section 2 provides the theoretical background supporting this research. The used data and proposed methodology are explained in Section 3. Experiments and corresponding results are illustrated in Section 4. A discussion of the obtained results is presented in Section 5. Conclusions, research limitations, and future work are stated in Section 6.
2. Theoretical Background
Telemarketing is one form of direct marketing. In telemarketing, a firm dedicates a call center to directly contact potentially interested customers/clients with offers through the telephone. Telemarketing has two merits: it raises response rates, and it reduces costs and allocated resources. Telemarketing campaigns have become the preferred means in the banking sector to promote the products or services being offered [10,27]. Owing to their prevalence, ANNs have become the preferred classification algorithm to handle complex finance and marketing issues [14,22,23,28,31,36].
Imbalanced data remain a key challenge for classification models [15,18]. The majority of the literature considered re-sampling approaches, i.e., both over-sampling and under-sampling, to alleviate the degradation due to the issue of imbalanced data [1,17,19,33,37]. Recent research contributions warn of the limitations and shortcomings that accompany re-sampling approaches [16,38,39], in particular the questionable reliability of the produced models: while under-sampling may discard important instances, over-sampling may result in over-fitted models [30]. Among the many approaches proposed to replace conventional re-sampling, cost-sensitive classification seems to be overlooked or underestimated [1,30,34,35]. Therefore, this study aims to fill these gaps by devising a cost-sensitive ANN classifier that is reliably capable of dealing with highly imbalanced datasets.
2.1. Base Classifier: Multilayer Perceptron
An Artificial Neural Network (ANN) consists of many neurons that are organized in a number of layers. The neurons are considered processing units that are updated with values according to a specific function. Such neuron values are the results of a formula that processes several weights from the input layer and propagates the results to subsequent layers. Therefore, it simulates the process of physiological neurons that transfer information from one node to another according to interconnection weights. The Multilayer Perceptron (MLP) is a widely used implementation of ANNs that has been applied to many business domains [2,28]. A general architecture of an MLP is illustrated in Figure 1, which in this case has 16 input variables at the input layer. In turn, those input variables are fully connected to a hidden layer. Similarly, the hidden layer is connected to an output layer with two neurons. After the construction of an MLP-ANN, it passes through a training phase, in which the weights are updated based on specific input variables that are mapped to identified output variables. The updated weights enable the MLP network to predict outputs for instances with unseen inputs. A testing phase examines the performance of the MLP by providing input features only to the trained network; the output layer then specifies the prediction result of the network (i.e., to which class the variables belong).
Figure 1. Multilayer Perceptron Artificial Neural Network (MLP). The 16 input variables (age, job, marital, education, default, balance, housing, loan, contact, day, month, duration, campaign, pdays, previous, poutcome) feed a hidden layer, and two output neurons encode the target y (0: will NOT subscribe, 1: will subscribe).
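As an illustration of the base classifier, the following is a minimal sketch (in Python with scikit-learn, not the Weka setup used in this paper) of an MLP with a single hidden layer trained on one-hot encoded bank-marketing features; the hidden-layer size and the training parameters are assumptions, since the exact topology is not reported here.

```python
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def build_mlp(hidden_units=20, seed=1):
    # One hidden layer as in Figure 1; the number of hidden units is an assumption.
    return make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(hidden_units,),
                      max_iter=500, random_state=seed),
    )

# Usage (X: one-hot encoded 16 input variables, y: 0/1 target):
# from sklearn.model_selection import cross_val_score
# scores = cross_val_score(build_mlp(), X, y, cv=10)
```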
2.2. Cost-Sensitive Classification
The theory behind cost-sensitive classification is to either (a) re-weight the training inputs in line with a pre-defined class cost, or (b) predict the class with the lowest misclassification cost [1,40]. This scheme transforms the classifier into a cost-sensitive meta-classifier. Improving the probability estimates of the base classifier is usually done by using a bagged classifier. In essence, the predictor splits the outputs into the minority or majority class according to an adjusted probability threshold. A general cost matrix C is illustrated in Equation (1), where λ and µ denote the cost of misclassifying each class.

C = [ 0  λ ]
    [ µ  0 ]        (1)

Matrix C shown in Equation (1) is a general illustration of cost re-weighting, in which λ and µ are pre-defined cost factors that are determined by domain experts. The classification algorithm therefore uses matrix C to predict the minority and majority classes according to a specific probability threshold over the output of the classifier.
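To make the thresholding idea concrete, the following is a small illustrative sketch (not the authors' Weka implementation) that picks, for each instance, the class with the lowest expected cost given the class-probability estimates of a base classifier and the cost matrix of Equation (1); the class ordering (0 = "no", 1 = "yes") and the default cost values are assumptions.

```python
import numpy as np

def min_expected_cost_predict(proba, fn_cost=200.0, fp_cost=1.0):
    """proba: array of shape (n_samples, 2) with P(no|x), P(yes|x) per instance.
    Returns, per instance, the class whose expected misclassification cost is lowest."""
    # Cost matrix as in Equation (1): rows = actual class, columns = predicted class.
    C = np.array([[0.0, fp_cost],    # actual "no" predicted as "yes" costs mu (FP cost)
                  [fn_cost, 0.0]])   # actual "yes" predicted as "no" costs lambda (FN cost)
    # Expected cost of predicting class k for x: sum_j P(j|x) * C[j, k]
    expected_cost = proba @ C
    return expected_cost.argmin(axis=1)
```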
2.3. Cost-Sensitive Learning
In cost-sensitive learning, the internal re-weighting of the original dataset instances creates an input for a new classifier to build the prediction model [36,41]. Similar to the way the Naive Bayes classifier is made cost-sensitive, an update to the behavior of the MLP makes it cost-sensitive; this is achieved by selecting labels according to their cost (i.e., the lowest cost) rather than those with the highest probability. In cost-sensitive learning, adding cost sensitivity to the base algorithm is done either by re-sampling the input data [30] or by re-weighting misclassification errors [32]. Due to rising concerns regarding the reliability of the re-sampling method, new trends of cost-sensitive learning shift towards the re-weighting method [30].
3. Data and Methodology
This research aims to enhance the prediction accuracy on imbalanced telemarketing data by leveraging cost-sensitive approaches. Such enhancement is attained by making the base classifiers sensitive to the different costs of the target classes; hence, the costs of misclassification are assumed to be unequal [42]. Cost sensitivity is added to re-weight marketing campaign results during model building, which leaves an opportunity to improve prediction results toward the class of interest. The dataset is described in Section 3.1, the proposed approach is illustrated in Section 3.2, and Section 3.3 describes the performance measures and metrics in detail.
3.1. Data Description
The public dataset is provided by [3] and includes real-world data from a Portuguese banking institution covering the period from May 2008 to June 2013. This research considers the additional dataset file for model building and validation, which contains 4521 instances. There are 16 input variables divided into seven numeric and nine nominal attributes. The target variable named "y" is a binary class indicating whether a client applied for a term deposit or not. All 17 variables are described in Tables 1 and 2. The dataset is highly imbalanced, as only 11.5% of the instances indicate a positive label (i.e., a client has subscribed to a term deposit), which is the class of interest in this case.
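For readers who want to reproduce this setting, the following is a brief sketch of loading the smaller 4521-instance file of the publicly released bank-marketing data; the file name ("bank.csv") and the semicolon separator are assumptions based on the public release rather than details stated in this paper.

```python
import pandas as pd

df = pd.read_csv("bank.csv", sep=";")       # 4521 instances, 16 inputs + target "y"
X = pd.get_dummies(df.drop(columns=["y"]))  # one-hot encode the nominal attributes
y = (df["y"] == "yes").astype(int)          # 1 = subscribed to a term deposit
print(round(y.mean(), 3))                   # ~0.115, i.e., 11.5% positive class
```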
Table 1. Nominal Variables (Portuguese Bank dataset [3]).
Attribute Description Values Count Percent
job Type of job
management 969 21.4%
blue-collar 946 20.9%
technician 768 17.0%
admin 478 10.6%
services 417 9.2%
retired 230 5.1%
self-employed 183 4.0%
entrepreneur 168 3.7%
unemployed 128 2.8%
housemaid 112 2.5%
student 84 1.9%
unknown 38 0.8%
marital Marital status single 1196 26.5%
married 2797 61.9%
divorced 528 11.7%
education Level of education
primary 678 15.0%
secondary 2306 51.0%
tertiary 1350 29.9%
unknown 187 4.1%
default Has credit in default account yes 76 1.7%
no 4445 98.3%
housing If there is a housing loan yes 2559 56.6%
no 1962 43.4%
loan Has a personal loan yes 691 15.3%
no 3830 84.7%
contact Type of communication cellular 2896 64.1%
telephone 301 6.7%
unknown 1324 29.3%
month Last contact month of the year
jan 148 3.3%
feb 222 4.9%
mar 49 1.1%
apr 293 6.5%
may 1398 30.9%
jun 531 11.7%
jul 706 15.6%
aug 633 14.0%
sep 52 1.2%
oct 80 1.8%
nov 389 8.6%
dec 20 0.4%
poutcome Previous outcome results
failure 490 10.8%
success 129 2.9%
other 197 4.4%
unknown 3705 82.0%
y: Target Class Has the client subscribed to a term deposit? yes 521 11.5%
no 4000 88.5%
Table 2. Numeric Variables (Portuguese Bank dataset [3]).
Variable Description Mean SD Percentile 25 Percentile 50 Percentile 75
age Integer indicating client's age 41.17 10.576 33.00 39.00 49.00
balance Average yearly balance 1422.66 3009.638 69.00 444.00 1480.00
day Last contact day of the month 15.92 8.248 9.00 16.00 21.00
duration Duration of last contact in seconds 263.96 259.857 104.00 185.00 329.00
campaign Number of contacts performed during this campaign and for this client 2.79 3.110 1.00 2.00 3.00
pdays Number of days passed since last campaign contact 39.77 100.121 1.00 1.00 1.00
previous Number of contacts performed for this client from previous campaigns 0.54 1.694 0.00 0.00 0.00
3.2. The Proposed Methodology
Basically, the phases of the proposed research scheme are: (a) data preparation, (b) proposing an intuitive cost matrix, (c) adding cost sensitivity in different schemes to the classifier, and (d) assessing the resulting direct marketing prediction models. Figure 2 summarizes the main steps of the experiments.
Figure 2. The Proposed Methodology. Bank campaign data are pre-processed and combined with an expert-defined cost matrix (cost factors λ and µ); cost-sensitive models are developed for different (FN × λ) and (FP × µ) settings, then validated and assessed, and the best model is selected.
Several classification models were constructed and evaluated in order to capture the value added by introducing cost-sensitivity analysis to the conventional classification algorithm. In particular, two distinct cost-sensitive methods were used, namely CostSensitiveClassifier and Meta-Cost. The characteristics of these two methods are summarized in Table 3, and both methods are elaborated in Sections 3.2.1 and 3.2.2, respectively. One base classifier, MLP (presented in Section 2.1), is leveraged to evaluate each of these methods. A 10-fold cross-validation scheme was maintained in all experiments. The proposed approaches do not resample instances, either for training or for testing.
Table 3. Employed cost-sensitive analysis algorithms.
Algorithm Theoretical Base Weka Implementation Basis (Re-Sampling vs. Re-Weighting)
Cost-Sensitive Classifier Meta-learning: Thresholding [43] CostSensitiveClassifier [40] Re-weighting
Meta-Cost Meta-learning: Thresholding [43] Meta-Cost [44] Re-weighting
Cost-sensitive algorithms deal with two types of cost factor, namely the FN-Cost Factor and the FP-Cost Factor. The former represents an estimated loss value if a client who is potentially willing to take a deposit is incorrectly classified as not willing to subscribe to a term deposit, while the latter represents an estimated loss value if an unwilling client is incorrectly classified as willing to subscribe to a term deposit. Consistent with the stated objectives and in accordance with the imbalanced nature of the dataset, FN-Cost Factor tuning is necessary. Usually, the values of cost factors are set by domain experts. In this research, the FP-Cost Factor was fixed at one, while nine distinct values of the FN-Cost Factor were examined, such that each classification model was evaluated against each FN-Cost Factor value in order to achieve a trade-off between Type I and Type II errors. The ordinary base classifier, MLP, was used in all experiments without any particular parameter tuning. The experimental setup configurations are shown in Table 4, and a sketch of the resulting evaluation grid follows the table.
Table 4. Experiment Setup.
Configuration Description
FN-Cost Factor (λ) 1, 5, 10, 25, 50, 75, 100, 150, 200
FP-Cost Factor (µ) 1 always
Base Algorithm MLP
Adding Cost Sensitivity CostSensitiveClassifier, Meta-Cost
Validation 10-Fold Cross Validation
Target Class Yes (the client will subscribe to a term deposit)
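The sketch below illustrates the grid of Table 4 (an assumption-based outline, not the exact Weka experiment configuration): the FP cost is fixed at 1, the FN-Cost Factor is swept over the nine listed values, and each cost-sensitive model is scored by the geometric mean under 10-fold cross-validation. `evaluate_cost_sensitive` is a hypothetical callback standing in for either cost-sensitive method.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

FN_COST_FACTORS = [1, 5, 10, 25, 50, 75, 100, 150, 200]
FP_COST_FACTOR = 1

def run_grid(X, y, evaluate_cost_sensitive):
    """X, y: numpy arrays. Returns {fn_cost: geometric mean} averaged over 10 folds."""
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=1)
    results = {}
    for fn_cost in FN_COST_FACTORS:
        tprs, tnrs = [], []
        for train_idx, test_idx in cv.split(X, y):
            # Hypothetical helper: trains a cost-sensitive MLP on the training fold
            # and returns (TPR, TNR) on the test fold.
            tpr, tnr = evaluate_cost_sensitive(X[train_idx], y[train_idx],
                                               X[test_idx], y[test_idx],
                                               fn_cost=fn_cost, fp_cost=FP_COST_FACTOR)
            tprs.append(tpr)
            tnrs.append(tnr)
        results[fn_cost] = np.sqrt(np.mean(tprs) * np.mean(tnrs))
    return results
```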
3.2.1. CostSensitiveClassifier Method
This research applied the generic "CostSensitiveClassifier" method in order to minimize the expected misclassification cost of the MLP-ANN algorithm. The method re-weights the training instances according to the total cost assigned to every class, where the assigned cost is defined by a cost matrix C.
3.2.2. Meta-Cost Method
The Meta-Cost [44] method adds cost sensitivity to the base classifier: it generates a relabeled dataset for training by assigning to each instance of the original dataset the class with the minimal estimated cost. Afterwards, an error-based learner uses this relabeled dataset in the model-building process. If the base classifier can produce understandable outcomes, then the adapted cost-sensitive classifier produces explanatory results. Combining bagging with a cost-sensitive classifier is a common approach to handle imbalanced-data issues; however, the Meta-Cost algorithm has been shown to achieve better results [40,44].
Meta-Cost relies on partitioning the sample space X into regions, where the class j in region x has the least cost. Cost-sensitive classification aims at finding the frontiers between the resulting regions according to the cost matrix C. The main idea behind Meta-Cost is to estimate the class probabilities P(j|x) and then to relabel the training instances with their least-cost classes (i.e., finding the optimal classes). The pseudocode of the Meta-Cost procedure is summarized in Algorithm 1. For the sake of this particular research, the Multilayer Perceptron (i.e., a back-propagation artificial neural network) is used as the base classifier [40].
Algorithm 1: Meta-Cost Algorithm (Adapted from [44])
Input:
  S - (bank) training set
  L - (MLP) classification learning algorithm
  C - cost matrix
  m - number of resamples to generate
  n - number of examples in each resample
  p - True iff L produces class probabilities
  q - True iff all resamples are to be used for each example
Procedure Meta-Cost(S, L, C, m, n, p, q)
  for i = 1 to m do
    Let S_i be a resample of S with n examples
    Let M_i = model produced by applying L to S_i
  for each example x in S do
    for each class j do
      Let P(j|x) = (1 / Σ_i 1) Σ_i P(j|x, M_i)
      where
        if p then P(j|x, M_i) is produced by M_i
        else P(j|x, M_i) = 1 for the class predicted by M_i for x, and 0 for all others
        if q then i ranges over all M_i
        else i ranges over all M_i such that x ∉ S_i
    Let x's class = argmin_i Σ_j P(j|x) C(i, j)
  Let M = model produced by applying L to S (now relabeled)
  Return M
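As a rough illustration of Algorithm 1, the Python sketch below implements the simplified case p = q = True (class probabilities are always available and every resampled model votes on every example); it is an assumption-laden approximation, not the Weka MetaCost implementation used in the experiments.

```python
import numpy as np
from sklearn.base import clone

def metacost(X, y, base_learner, C, m=10, n=None, random_state=1):
    """C: cost matrix with C[actual_class, predicted_class].
    Returns a model of the same type as base_learner trained on the relabeled set."""
    rng = np.random.default_rng(random_state)
    n = n or len(X)
    # 1) build m models on bootstrap resamples of the training set
    models = []
    for _ in range(m):
        idx = rng.integers(0, len(X), size=n)
        models.append(clone(base_learner).fit(X[idx], y[idx]))
    # 2) average the class-probability estimates P(j|x) over the m models
    proba = np.mean([mdl.predict_proba(X) for mdl in models], axis=0)
    # 3) relabel each example with its minimum expected-cost class
    y_relabeled = (proba @ C).argmin(axis=1)
    # 4) train the final model on the relabeled training set
    return clone(base_learner).fit(X, y_relabeled)
```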
3.3. Evaluation Measures
A useful visual representation of classification results is the Confusion (alternatively, Contingency) Matrix [45]. The totals of correct predictions are presented under the "True" area, while the totals of incorrect predictions are presented under the "False" area. The outcomes of the prediction process are divided into Positive (P) and Negative (N) classes, since this research tackles a binary classification problem. The class of interest is the positive class (i.e., a client has subscribed to a term deposit, y = yes) and the other is the negative class (i.e., a client not interested in a term deposit, y = no). To further clarify the predictions of the classification process, a True Positive (TP) and a True Negative (TN) are considered correct predictions that match actual facts in the testing dataset, while a False Positive (FP) and a False Negative (FN) are incorrect classifications, meaning an actual negative classified as positive and an actual positive classified as negative, respectively. Table 5 illustrates the organization of the classification results in the Confusion Matrix.
Table 5. Confusion Matrix.
                                Predicted: Willing to    Predicted: Not interested
Actual: Willing to                      TP                         FN
Actual: Not interested                  FP                         TN
Furthermore, several performance metrics are derived from the confusion matrix to further illustrate the performance of the prediction model. The metrics adopted by this research are listed in Table 6.
Table 6. Evaluation metrics that are used in this research.
Total Accuracy (all correctly classified instances): (TP + TN) / (TP + TN + FP + FN)
True Positive Rate (TPR), alternatively Recall: TP / (TP + FN)
True Negative Rate (TNR), alternatively Specificity: TN / (TN + FP)
Type I Error [45]: 1 − Recall_Yes = FN / (TP + FN), the false negative rate (α)
Type II Error [45]: 1 − Recall_No = FP / (TN + FP), the false positive rate (β)
Geometric Mean: √(TPR × TNR)
Lift [46]: Lift = TPR / Sample_Size
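A small sketch of these metrics computed from raw confusion-matrix counts follows; the example counts in the usage comment are back-calculated from the TPR and TNR of the best Meta-Cost-MLP model reported later (521 positive and 4000 negative instances), so they are approximate.

```python
import numpy as np

def evaluation_metrics(tp, tn, fp, fn):
    tpr = tp / (tp + fn)                    # recall of the positive ("yes") class
    tnr = tn / (tn + fp)                    # recall of the negative ("no") class
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "TPR": tpr,
        "TNR": tnr,
        "type_I_error": fn / (tp + fn),     # 1 - TPR, false negative rate (alpha)
        "type_II_error": fp / (tn + fp),    # 1 - TNR, false positive rate (beta)
        "geometric_mean": np.sqrt(tpr * tnr),
    }

# Approximate counts for the best Meta-Cost-MLP model (TPR ~ 0.808, TNR ~ 0.771):
# evaluation_metrics(tp=421, tn=3082, fp=918, fn=100)
```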
4. Results
This section presents the results of the experiments supporting the research methodology described in Section 3.2. The intensive processing of the dataset confirms how challenging it is to find an accurate prediction model. Adding cost sensitivity to the classification algorithm dramatically increased the prediction accuracy of the positive class, i.e., the target class. At the same time, cost sensitivity led to a relatively slight drop in the accuracy of the negative class. The effects of different cost factors on the classification performance are illustrated by the Geometric Mean in Figure 3, and by the Type I and Type II Errors in Figure 4.
Figure 3. The effect of Adjusting FN-Cost Factor on Geometric Mean.
Considering the target class, i.e., bank clients who subscribed to a term deposit, it is apparent that there is a positive correlation between the cost factor and the classification accuracy for FN-Cost Factor values in the range 1 to 200; nonetheless, increasing the FN-Cost Factor beyond 200 degrades the performance of the prediction model. Interestingly, the classifiers do not attain any significant improvement in terms of the geometric mean metric beyond an FN-Cost Factor of 10. However, it is clear that the Meta-Cost algorithm outperforms the Cost-Sensitive Classifier at higher values of the FN-Cost Factor.
Figure 4. The effect of Adjusting FN-Cost Factor on Type I and Type II Errors.
The error rates (i.e., Type I and Type II errors) reveal an interesting behavior of the cost-sensitive schemes. The Meta-Cost scheme caused a dramatic decrease in the Type I error, while the Cost-Sensitive Classifier resulted in a relatively slight increase of the Type II error. Since the target class, i.e., the positive class, has more impact on the main research problem, the Meta-Cost scheme of cost-sensitive analysis proves to be a candidate solution for overcoming the issue of highly imbalanced bank marketing data. Figure 5 plots the lift curve of Meta-Cost using MLP for classification with an FN-Cost Factor of 200. The lift chart indicates the ratio of positive responses obtained (bank clients who accept to subscribe to a term deposit) to all positive responses. According to this lift chart, by reaching only 10% of the total clients selected by the developed prediction model, we hit 4.3 times more positive responses than if no model were applied (clients reached randomly); a sketch of the computation follows.
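The lift value can be reproduced from model scores as in the following illustrative sketch (the scoring model and variable names are assumptions): rank clients by the predicted probability of subscribing, take the top fraction, and compare its hit rate with the overall positive rate.

```python
import numpy as np

def lift_at(y_true, y_score, fraction=0.10):
    """y_true: 0/1 labels; y_score: predicted probability of the positive class."""
    order = np.argsort(-y_score)                        # highest-scoring clients first
    top = order[: max(1, int(len(y_true) * fraction))]  # e.g., the top 10% of clients
    hit_rate_top = y_true[top].mean()                   # positive rate among contacted clients
    return hit_rate_top / y_true.mean()                 # lift vs. contacting clients at random

# lift_at(y, model.predict_proba(X)[:, 1], 0.10) -> ~4.3 for the best Meta-Cost model
```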
Figure 5. Lift Curve, Meta-Cost using MLP with FN-Cost Factor of 200.
To further illustrate the achieved enhancement in classification performance, Table 7 summarizes the performance of conventional machine-learning classifiers in the Weka environment against the best cost-sensitive prediction models. The algorithms are: Lib-LINEAR (LL), i.e., an implementation of Support Vector Machines, Decision Table (DT), Very Fast Decision Rules (VFDR), Random Forest Trees (RF), Multilayer Perceptron (MLP), J48, and Deep Learning for MLP Classifier (DL-MLP). The experiments were repeated 10 times, and 10-fold cross-validation was used to evaluate each run. Results are reported in terms of the TPR, TNR, Geometric Mean, Type I Error, Type II Error, and Total Accuracy metrics. The FPR for the target class is reported in terms of the average and standard deviation (SD) of the 10 runs, while the average of the 10 runs is used to report the remaining metrics. The third part of Table 7 (C) shows the results of the cost-sensitive analysis in [1] on the full version of a bank marketing dataset [3].
Table 7. Performance Comparison of Different Classification Algorithms.
Algorithm TPR TNR Geometric Mean Type I Error Type II Error Accuracy
(A) Our approach
Meta-Cost-MLP 0.808 0.771 78.93% 0.192 0.229 77.48
CostSensitiveClassifier-MLP 0.614 0.872 73.17% 0.386 0.128 84.18
(B) Conventional machine-learning classifiers
MLP (Baseline) 0.39 0.95 60.87% 0.61 0.05 88.98
DL-MLP 0.36 0.93 57.86% 0.64 0.07 86.24
J48 0.36 0.96 58.79% 0.64 0.04 88.9
LL 0.47 0.83 62.46% 0.53 0.17 78.61
DT 0.33 0.97 56.58% 0.67 0.03 89.49
VFDR 0.46 0.76 59.13% 0.54 0.24 72.61
RF 0.27 0.98 51.44% 0.73 0.02 89.82
(C) Other cost-sensitive results from related works
CSDE * [1] 0.705 0.622 66.2% n.a. n.a. n.a.
CSDNN ** [1] 0.615 0.542 57.9% n.a. n.a. n.a.
AdaCost [1] 0.89 0.22 44.2% n.a. n.a. n.a.
Meta-Cost [1] 0.35 0.868 55.1% n.a. n.a. n.a.
* CSDE: Cost-Sensitive Deep Neural Network Ensemble, ** CSDNN: Cost-Sensitive Deep Neural Network.
5. Discussion and Implications
All classification algorithms studied in Table 7 are sensitive to imbalanced datasets [47]. Results are expected to be biased towards the majority class, producing more false negatives. In the case of this study, term deposit subscription, more interested bank clients are expected to be misclassified as uninterested (i.e., a higher risk of losing potential opportunities).
The results of the experiments in this research support the propositions in [1,40,44], which expect better results of Meta-Cost over the Cost-Sensitive Classification method in some cases of imbalanced datasets. The nature of the Meta-Cost method enables it to outperform other related methods in certain cases; this is apparent in the algorithm's willingness to make more relevant heuristic decisions. The overall results of using cost-sensitive approaches show a classification performance improvement in comparison with classical classification algorithms. Such improvement is illustrated in Figure 6, which plots the Type I and Type II error baselines of Meta-Cost-MLP at a cost factor of 200 as comparison points with the other classical classification algorithms. The baselines indicate that the Type I and Type II errors are minimized to 0.192 and 0.229, respectively.
Figure 6. Meta-Cost-MLP vs Conventional machine-learning classifiers.
The comparison of performance asserts that Meta-Cost-based classification maintained an acceptable trade-off between Type I and Type II errors. Consequently, it would benefit the decision-making process in understanding and judging the probability of clients' term deposit subscription. Decision makers in the banking industry would have a data-driven assessment of the risks associated with their marketing efforts. Instead of qualitative or possibly subjective evaluations of marketing decisions, an objective, quantifiable, and justifiable perspective would support the decisions. Furthermore, market and client segmentation would be based on actual historical data that are derived from the market itself. Calibration of the cost Equation (2) is one of the possible data-driven risk assessment tools, in which (FN × λ) represents the cost of not contacting the clients who are willing to apply for a term deposit, and (FP × µ) represents the cost of contacting uninterested clients.

TotalCost = (FN × λ) + (FP × µ)        (2)

The cost of missing a potential client is much higher than that of contacting an uninterested client. The quantification of the cost for decision makers, according to the best prediction model, is TotalCost = (FN × 200) + (FP × 1) = (100 × 200) + (918 × 1) = 20,918.00. If a domain expert provides an equivalence of the cost factor in the local currency, the exact cost is obtained by multiplying TotalCost by that equivalence (i.e., 20,918.00 multiplied by the currency equivalence for the population of this study).
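The decision-support formula can be evaluated directly from the confusion-matrix counts, as in this short sketch using the counts reported above.

```python
def total_cost(fn, fp, fn_cost=200, fp_cost=1):
    # Equation (2): cost of missed potential clients plus cost of wasted contacts
    return fn * fn_cost + fp * fp_cost

print(total_cost(fn=100, fp=918))   # 20918, as quantified for the best model
```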
6. Limitations and Future Research
This research proposed an enhanced telemarketing prediction model using cost-sensitive analysis. Nonetheless, the research outcomes should be interpreted with caution, as there are some possible limitations. The issue of highly imbalanced datasets is challenging, such that it was almost impossible to completely evade its effects on the classification algorithms. The used cost matrix is an arbitrary selection of possible cost factors; domain experts should be consulted to assign realistic cost values. Cost-sensitive analysis improved the prediction accuracy of the target group; on the other hand, there is a slight drop in the prediction accuracy of the second group of customers (i.e., customers not willing to apply for a term deposit). It is unavoidable to make a trade-off between Type I and Type II errors; this is a fundamental characteristic of cost-sensitive methods. The ANN model itself is not self-explanatory; a representative practical model has been developed to support the streamlining of the decision-making process in the business domain. Several cost-sensitive classifiers were applied; the Meta-Cost classifier exhibited a high degree of granularity while maintaining the best performance in most cases. Interestingly, the obtained results are optimistic. Future research would attempt to overcome the research limitations and cover the areas that were difficult to tackle in this research, such as: estimating cost factor values from the perspective of domain experts; applying real cost values in the prediction model building process; applying the same approach to other real data; and developing self-explanatory decision process systems or algorithms.
Author Contributions:
Conceptualization, N.G. and H.F.; methodology, N.G., H.F. and I.A.; software, N.G.
and I.A.; validation, N.G. and I.A.; formal analysis, N.G., H.F. and I.A.; investigation, Y.H.; resources, N.G.,
H.F. and I.A.; data curation, N.G.; writing–original draft preparation, N.G., H.F., I.A., and Y.H.; writing–review
and editing, N.G., H.F., I.A., Y.H. and A.H.; visualization, N.G. and I.A.; supervision, N.G. and H.F. All authors
have read and approved the final version of the manuscript.
Funding: This research received no external funding.
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Wong, M.L.; Seng, K.; Wong, P.K. Cost-sensitive ensemble of stacked denoising autoencoders for class imbalance problems in business domain. Expert Syst. Appl. 2020, 141, 112918, doi:10.1016/j.eswa.2019.112918.
2. Bigus, J.P. Data Mining with Neural Networks: Solving Business Problems from Application Development to Decision Support; McGraw-Hill, Inc.: New York, NY, USA, 1996.
3. Moro, S.; Cortez, P.; Rita, P. A data-driven approach to predict the success of bank telemarketing. Decis. Support Syst. 2014, 62, 22–31.
4. Waller, M.A.; Fawcett, S.E. Data science, predictive analytics, and big data: A revolution that will transform supply chain design and management. J. Bus. Logist. 2013, 34, 77–84.
5. Ghatasheh, N. Business Analytics using Random Forest Trees for Credit Risk Prediction: A Comparison Study. Int. J. Adv. Sci. Technol. 2014, 72, 19–30.
6. Faris, H.; Al-Shboul, B.; Ghatasheh, N. A genetic programming based framework for churn prediction in telecommunication industry. Lect. Notes Comput. Sci. 2014, 8733, 353–362.
7. Ajah, I.A.; Nweke, H.F. Big Data and Business Analytics: Trends, Platforms, Success Factors and Applications. Big Data Cogn. Comput. 2019, 3, 32, doi:10.3390/bdcc3020032.
8. Chen, Y.; Guo, J.; Li, C.; Ren, W. FaDe: A Blockchain-Based Fair Data Exchange Scheme for Big Data Sharing. Future Internet 2019, 11, 225, doi:10.3390/fi11110225.
9. Liu, H.; Huang, Y.; Wang, Z.; Liu, K.; Hu, X.; Wang, W. Personality or Value: A Comparative Study of Psychographic Segmentation Based on an Online Review Enhanced Recommender System. Appl. Sci. 2019, 9, 1992, doi:10.3390/app9101992.
10. Moro, S.; Cortez, P.; Rita, P. A divide-and-conquer strategy using feature relevance and expert knowledge for enhancing a data mining approach to bank telemarketing. Expert Syst. 2018, 35, e12253.
11. Gerrikagoitia, J.K.; Castander, I.; Rebón, F.; Alzua-Sorzabal, A. New trends of Intelligent E-Marketing based on Web Mining for e-shops. Procedia-Soc. Behav. Sci. 2015, 175, 75–83.
12. Burez, J.; Van den Poel, D. CRM at a pay-TV company: Using analytical models to reduce customer attrition by targeted marketing for subscription services. Expert Syst. Appl. 2007, 32, 277–288.
13. Corte, V.D.; Iavazzi, A.; D'Andrea, C. Customer involvement through social media: the cases of some telecommunication firms. J. Open Innov. Technol. Mark. Complex. 2015, 1, doi:10.1186/s40852-015-0011-y.
14. Ayoubi, M. Customer Segmentation Based on CLV Model and Neural Network. Int. J. Comput. Sci. Issues 2016, 13, 31–37, doi:10.20943/01201602.3137.
15. Rendón, E.; Alejo, R.; Castorena, C.; Isidro-Ortega, F.J.; Granda-Gutiérrez, E.E. Data Sampling Methods to Deal With the Big Data Multi-Class Imbalance Problem. Appl. Sci. 2020, 10, 1276, doi:10.3390/app10041276.
16. Kaur, H.; Pannu, H.S.; Malhi, A.K. A Systematic Review on Imbalanced Data Challenges in Machine Learning: Applications and Solutions. ACM Comput. Surv. 2019, 52, doi:10.1145/3343440.
17. Haixiang, G.; Yijing, L.; Shang, J.; Mingyun, G.; Yuanyue, H.; Bing, G. Learning from class-imbalanced data: Review of methods and applications. Expert Syst. Appl. 2017, 73, 220–239.
18. Lin, H.I.; Nguyen, M.C. Boosting Minority Class Prediction on Imbalanced Point Cloud Data. Appl. Sci. 2020, 10, 973, doi:10.3390/app10030973.
19. Gonzalez-Cuautle, D.; Hernandez-Suarez, A.; Sanchez-Perez, G.; Toscano-Medina, L.K.; Portillo-Portillo, J.; Olivares-Mercado, J.; Perez-Meana, H.M.; Sandoval-Orozco, A.L. Synthetic Minority Oversampling Technique for Optimizing Classification Tasks in Botnet and Intrusion-Detection-System Datasets. Appl. Sci. 2020, 10, 794, doi:10.3390/app10030794.
20. Suh, S.; Lee, H.; Jo, J.; Lukowicz, P.; Lee, Y. Generative Oversampling Method for Imbalanced Data on Bearing Fault Detection and Diagnosis. Appl. Sci. 2019, 9, 746, doi:10.3390/app9040746.
21. Alejo, R.; Monroy-De-Jesús, J.; Pacheco-Sánchez, J.; López-González, E.; Antonio-Velázquez, J. A Selective Dynamic Sampling Back-Propagation Approach for Handling the Two-Class Imbalance Problem. Appl. Sci. 2016, 6, 200, doi:10.3390/app6070200.
22. Ghochani, M.; Afzalian, M.; Gheitasi, S.; Gheitasi, S. Simulation of customer behavior using artificial neural network techniques. Int. J. Inf. Bus. Manag. 2013, 5, 59–68.
23. Kim, Y.; Street, W.N.; Russell, G.J.; Menczer, F. Customer Targeting: A Neural Network Approach Guided by Genetic Algorithms. Manag. Sci. 2005, 51, 264–276, doi:10.1287/mnsc.1040.0296.
24. Elsalamony, H.A.; Elsayad, A.M. Bank Direct Marketing Based on Neural Network and C5.0 Models. Int. J. Eng. Adv. Technol. 2013, 2, 392–400.
25. Guresen, E.; Kayakutlu, G.; Daim, T.U. Using artificial neural network models in stock market index prediction. Expert Syst. Appl. 2011, 38, 10389–10397.
26. Zakaryazad, A.; Duman, E. A profit-driven Artificial Neural Network (ANN) with applications to fraud detection and direct marketing. Neurocomputing 2016, 175, 121–131.
27. Koç, A.; Yeniay, Ö. A Comparative Study of Artificial Neural Networks and Logistic Regression for Classification of Marketing Campaign Results. Math. Comput. Appl. 2013, 18, 392–398, doi:10.3390/mca18030392.
28. Adwan, O.; Faris, H.; Jaradat, K.; Harfoushi, O.; Ghatasheh, N. Predicting customer churn in telecom industry using multilayer preceptron neural networks: Modeling and analysis. Life Sci. J. 2014, 11, 75–81.
29. Mitik, M.; Korkmaz, O.; Karagoz, P.; Toroslu, I.H.; Yucel, F. Data Mining Approach for Direct Marketing of Banking Products with Profit/Cost Analysis. Rev. Socionetw. Strateg. 2017, 11, 17–31.
30. Khor, K.C.; Ng, K.H. Evaluation of Cost Sensitive Learning for Imbalanced Bank Direct Marketing Data. Indian J. Sci. Technol. 2016, 9, doi:10.17485/ijst/2016/v9i42/100812.
31. Naseri, M.B.; Elliott, G. A Comparative Analysis of Artificial Neural Networks and Logistic Regression. J. Decis. Syst. 2010, 19, 291–312, doi:10.3166/jds.19.291-312.
32. Witten, I.H.; Frank, E.; Hall, M.A.; Pal, C.J. Data Mining, Fourth Edition: Practical Machine Learning Tools and Techniques, 4th ed.; Morgan Kaufmann Publishers Inc.: San Francisco, CA, USA, 2016.
33. Kalid, S.N.; Khor, K.C.; Ng, K.H. Effective Classification for Unbalanced Bank Direct Marketing Data with Over-sampling. In Proceedings of the Knowledge Management International Conference (KMICe), Langkawi, Kedah, 12–15 August 2014; pp. 16–21.
34. Jiang, X.; Pan, S.; Long, G.; Chang, J.; Jiang, J.; Zhang, C. Cost-sensitive hybrid neural networks for heterogeneous and imbalanced data. In Proceedings of the IEEE International Joint Conference on Neural Networks (IJCNN), Rio de Janeiro, Brazil, 8–13 July 2018; pp. 1–8.
35. Ghazikhani, A.; Monsefi, R.; Yazdi, H.S. Online cost-sensitive neural network classifiers for non-stationary and imbalanced data streams. Neural Comput. Appl. 2013, 23, 1283–1295.
36. Elkan, C. The Foundations of Cost-sensitive Learning. In Proceedings of the 17th International Joint Conference on Artificial Intelligence, Seattle, WA, USA, 4–10 August 2001; Morgan Kaufmann Publishers Inc.: San Francisco, CA, USA, 2001; Volume 2, pp. 973–978.
37. Chandrasekara, V.; Tilakaratne, C.; Mammadov, M. An Improved Probabilistic Neural Network Model for Directional Prediction of a Stock Market Index. Appl. Sci. 2019, 9, 5334, doi:10.3390/app9245334.
38. Feng, W.; Huang, W.; Ren, J. Class Imbalance Ensemble Learning Based on the Margin Theory. Appl. Sci. 2018, 8, 815, doi:10.3390/app8050815.
39. Collell, G.; Prelec, D.; Patil, K.R. A simple plug-in bagging ensemble based on threshold-moving for classifying binary and multiclass imbalanced data. Neurocomputing 2018, 275, 330–340.
40. Hall, M.; Frank, E.; Holmes, G.; Pfahringer, B.; Reutemann, P.; Witten, I.H. The WEKA Data Mining Software: An Update. SIGKDD Explor. Newsl. 2009, 11, 10–18, doi:10.1145/1656274.1656278.
41. Wang, J. Encyclopedia of Data Warehousing and Mining, Second Edition, 2nd ed.; IGI Publishing: Hershey, PA, USA, 2008.
42. Han, X.; Cui, R.; Lan, Y.; Kang, Y.; Deng, J.; Jia, N. A Gaussian mixture model based combined resampling algorithm for classification of imbalanced credit data sets. Int. J. Mach. Learn. Cybern. 2019, doi:10.1007/s13042-019-00953-2.
43. Ling, C.X.; Sheng, V.S. Cost-Sensitive Learning and the Class Imbalance Problem. In Encyclopedia of Machine Learning; Springer: Berlin, Germany, 2008.
44. Domingos, P. MetaCost: A General Method for Making Classifiers Cost-sensitive. In Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '99, San Diego, CA, USA, 15–18 August 1999; ACM: New York, NY, USA, 1999; pp. 155–164, doi:10.1145/312129.312220.
45. Powers, D.M.W. Evaluation: From Precision, Recall and F-Factor to ROC, Informedness, Markedness & Correlation; Technical Report SIE-07-001; School of Informatics and Engineering, Flinders University: Adelaide, Australia, 2007.
46. Berry, M.J.; Linoff, G. Data Mining Techniques: For Marketing, Sales, and Customer Support; John Wiley & Sons, Inc.: New York, NY, USA, 1997.
47. Palade, V. Class imbalance learning methods for support vector machines. In Imbalanced Learning: Foundations, Algorithms, and Applications; Wiley: Hoboken, NJ, USA, 2013; p. 83.
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
... To increase the success rate and improve effectiveness, scholars recommend that banks elaborate analysis of consumer information before telemarketing. In recent years, increasing studies have applied machine learning (ML) to identify potential consumers who are willing to subscribe to deposits (Yan, Li, & Liu, 2020;Moro, Cortez, & Rita, 2014;Ghatasheh et al., 2020;Feng, Yin, Wang, & Dhamotharan, 2022). In this way, banks would save effort and resources on campaigns by improving the effectiveness of bank telemarketing. ...
... Yan, Li, and Liu (2020) adopt the improved whale optimization algorithm to the Kohonen network with supervised learning, which enhances the classification accuracy of bank telemarketing. Ghatasheh et al. (2020) modify the artificial neural network model by cost-sensitive analysis for bank telemarketing prediction, which shows good prediction performance and mitigates the dramatic effects of highly imbalanced data without distorting the original data samples. Tékouabou et al. (2022) apply the class membership-based approach to bank telemarketing, and accurately predict potential customers before launching a campaign. ...
... FNR and FPR indicate that banks miss potential depositors and contact customers who are not interested in time deposits, respectively. Marketing managers believe that the cost of missing potential customers is relatively higher than that of contacting with uninterested customers (Ghatasheh et al., 2020). In other words, it is more critical to reduce FNR than FPR in predicting the success of bank telemarketing. ...
Article
Because of the low cost and user-friendliness, telemarketing has become a common way for banks to obtain deposits for a long time. Meanwhile, researchers have been attempting to analyze consumer information in-depth to improve the effectiveness of bank telemarketing and attract deposits through telephone communication. In this paper, we construct bank telemarketing prediction models by three machine learning (ML) methods, i.e., Random Subspace (RS), Multi-Boosting (MB) and Random Subspace-Multi-Boosting (RS-MB), and find the best performing model. Also, we make the interpretability analysis to provide banks with decision information to develop and implement an effective marketing plan. We rank the importance of the original independent variables by the ML method and select those variables whose influence on the prediction results was significant. And we reconstruct the bank telemarketing prediction models based on the selected independent variables. Furthermore, we illustrate the marginal effects of the selected independent variables on the consumers’ subscription of deposits by the Partial Dependence Plots (PDP) to analyze how these selected independent variables affect the success of bank telemarketing campaigns. The empirical results indicate that the RS-MB using selected independent variables achieves the best performance for prediction. It is worth noting that banks would rather contact uninterested customers than miss potential deposit customers. Therefore, when predicting the success of telemarketing campaigns, it is more critical to reduce false negative rate than false positive rate. Moreover, banks using telemarketing should pay more attention to type of job that the customer does, the month that the customer was connected, and contact day of week.
... Assuming a marketer adopts a strategy to contact all the customers from the purchase database to minimize a Type I error (defined as not targeting customers who otherwise are profitable) and simultaneously Type II error increases (defined as targeting customer who is not eager to buy) (Ghatasheh et al., 2020). Optimizing both errors above also becomes costly due to the negative influence of both (Venkatesan & Kumar, 2004). ...
Article
Predicting profitable customers is a strategic knowledge portfolio of retailer managers because some customers are better profitable than others in a business. The present work is an effort to demonstrate a better model of predicting profitable customers. We apply the k-means algorithm to identify customer patterns based on Recency, Frequency, and Monetary (RFM) attributes computed from a real-life dataset of UK-based and registered non-store online retail. Six data mining models have been applied to each identified pattern and overall data to predict whether each customer would purchase in the next six months or not. A comparative analysis of identified pattern characteristics and predictable performances and Type I and Type II errors have been performed to identify the target customer group in terms of better predictability and profitability. The identified patterns help to generate novel marketing strategies. Thus, the retailers may successfully target the most consistently profitable customer groups to apply diverse knowledge on marketing strategies for the specific pattern.
... There are many concerns in using the total accuracy as a performance metric, more particularly in imbalanced datasets [72]- [74]. Usually the negative class is dominant and more frequent in real life. ...
Article
Full-text available
Recently, spam on online social networks has attracted attention in the research and business world. Twitter has become the preferred medium to spread spam content. Many research efforts attempted to encounter social networks spam. Twitter brought extra challenges represented by the feature space size, and imbalanced data distributions. Usually, the related research works focus on part of these main challenges or produce black-box models. In this paper, we propose a modified genetic algorithm for simultaneous dimensionality reduction and hyper parameter optimization over imbalanced datasets. The algorithm initialized an eXtreme Gradient Boosting classifier and reduced the features space of tweets dataset; to generate a spam prediction model. The model is validated using a 50 times repeated 10-fold stratified cross-validation, and analyzed using nonparametric statistical tests. The resulted prediction model attains on average 82.32% and 92.67% in terms of geometric mean and accuracy respectively, utilizing less than 10% of the total feature space. The empirical results show that the modified genetic algorithm outperforms $Chi^{2}$ and $PCA$ feature selection methods. In addition, eXtreme Gradient Boosting outperforms many machine learning algorithms, including BERT-based deep learning model, in spam prediction. Furthermore, the proposed approach is applied to SMS spam modeling and compared to related works.
Chapter
Data mining plays a vital role in the success of direct marketing campaigns by predicting which leads subscribe to a term deposit. This study is accomplished to illustrate with practical mining methods that the data are related to a Portuguese banking institution’s direct marketing campaign (phone calls). The algorithms are used: K-nearest neighbor, logistic regression, linear supported vector machines, and extreme gradient boosting to classify potential customers for long-term deposits finance products. Response coding is used to vectorize categorical data while solving a machine learning classification problem. Accuracy and AUC scores are key metrics to evaluate performance. We inherited selecting important features from previous research. This paper employed a better method by combining response coding techniques with practical algorithms in an unbalanced dataset. The best prediction model achieved 91.07% and 0.9324 of accuracy and AUC score, significantly higher than the prior of 79% and 0.8, respectively.
Article
Imbalanced binary classification plays an important role in many applications. Some popular classifiers, such as logistic regression (LR), usually underestimate the probability of the minority class. Therefore, in this paper, we introduce two novel methods under distribution uncertainty, the idea of which is to modify the predicted probability with an additional uncertainty estimation. We develop the mean-uncertain method and the volatility-uncertain method, respectively, by assuming that the disturbance term follows the maximal and the G-normal distributions, which are the most important distributions within a sublinear expectation framework. Experiments on the simulated dataset and 10real-life datasets are conducted to compare the newly proposed approaches to several existing ones, including two resampling methods and two regression-based methods. The results of experiments show that our methods outperform most of the others in common evaluation metrics, especially the accuracy of the minority class.
Chapter
In this paper, we present a cutting-edge approach to predict the success of telemarketing calls by banks to sell their long-lasting deposits. The presented paper can substantially aid in predicting the future behavior of the ever-green field of telemarketing. In recent times, various new methods and strategies have been adopted which creates an illusion that classic telemarketing has vanished, but in reality, it has just evolved. A dataset from the UCI repository consisting of information about the retail bank processed data on its customers, products, and socioeconomic attributes, including the impact of the financial crisis. An initial set comprising of 150 features was explored and 21 of the eminent features were selected, including labels for the proposed method. This research presents a new modeling methodology performed with deep learning models with different dense layers combined. For validation, 6 distinct approaches having different parameters and dense layers were utilized and further analysis of the respective output was performed. The highest value of 90.34% testing accuracy and 91.04% training accuracy are obtained, respectively, by the proposed model. The results of deep learning models are compared with the traditional machine learning approach like the support vector machine, random forest, and k-nearest neighbor.
Chapter
Generation the Distance Matrix (DMx) is an important aspect that influences the correct solution of the routing problem in the dynamic variant. In the case of a frequent changing of points number and location, a continuous and effective update of the data is required, e.g., from more and more popular services such as Mapping APIs. The time-consuming nature of this process, which may extend the planning process, was emphasized. The article discusses the possibility of estimating the distance matrix based on the correction of the “haversine” distance. Method for the generation and updating of the DMx was proposed. The influence of update progress on some optimization algorithms was investigated. The research was carried out on the example of the real VRP problem. It was found that even a partial DMx update can significantly reduce the discrepancy between the VRP optimization results.
Chapter
Nowadays, younger generation is much more exposed to technology than previous generations used to. The recent advances in artificial intelligence (AI) and particularly natural language processing (NLP) and understanding (NLU) make it possible to reinforce and widespread the adoption of AI chatbots in education not only to help students in their administrative affairs or in academic advising but also in assisting them and monitoring their performance during their learning experience. This paper presents a review of the different methods and tools devoted to the design of chatbots with an emphasis on their use and challenges in the education field. Additionally, this paper focuses on language-related challenges and obstacles that hinder the implementation of English, Arabic, and other languages of chatbots. To show how AI chatbots benefit education, a use case is described where Hubert.ai chatbot has been used to assess students’ feedback regarding a machine learning course evaluation.
Article
Full-text available
Outbound telemarketing is an efficient direct marketing method wherein telemarketers solicit potential customers by phone to purchase or subscribe to products or services. However, those who are not interested in the information or offers provided by outbound telemarketing generally experience such interactions negatively because they perceive telemarketing as spam. In this study, therefore, we investigate the use of deep learning models to predict the success of outbound telemarketing for insurance policy loans. We propose an explainable multiple-filter convolutional neural network model called XmCNN that can alleviate overfitting and extract various high-level features using hundreds of input variables. To enable the practical application of the proposed method, we also examine ensemble models to further improve its performance. We experimentally demonstrate that the proposed XmCNN significantly outperformed conventional deep neural network models and machine learning models. Furthermore, a deep learning ensemble model constructed using the XmCNN architecture achieved the lowest false positive rate (4.92%) and the highest F1-score (87.47%). We identified important variables influencing insurance policy loan prediction through the proposed model, suggesting that these factors should be considered in practice. The proposed method may increase the efficiency of outbound telemarketing and reduce the spam problems caused by calling non-potential customers.
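One plausible, speculative reading of the multiple-filter idea is several parallel 1D convolutional branches with different filter widths over the same input variables, concatenated before the output layer; the sketch below assumes Keras and hypothetical layer sizes and is not the authors' XmCNN.

```python
# A speculative sketch of a multiple-filter 1D CNN over tabular inputs,
# loosely inspired by the multi-filter idea; it is not the authors' XmCNN.
from tensorflow import keras

def build_multi_filter_cnn(n_features: int = 200) -> keras.Model:
    # Expects inputs reshaped to (n_samples, n_features, 1)
    inputs = keras.layers.Input(shape=(n_features, 1))
    branches = []
    for kernel_size in (3, 5, 7):                  # several filter widths in parallel
        x = keras.layers.Conv1D(32, kernel_size, activation="relu")(inputs)
        x = keras.layers.GlobalMaxPooling1D()(x)
        branches.append(x)
    merged = keras.layers.Concatenate()(branches)
    merged = keras.layers.Dropout(0.5)(merged)     # alleviate overfitting
    outputs = keras.layers.Dense(1, activation="sigmoid")(merged)
    model = keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=[keras.metrics.AUC()])
    return model
```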
Article
Full-text available
The class imbalance problem has been a hot topic in the machine learning community in recent years. Nowadays, in the era of big data and deep learning, this problem remains in force. Much work has been done to deal with the class imbalance problem, with random sampling methods (over- and under-sampling) being the most widely employed approaches. Moreover, sophisticated sampling methods have been developed, including the Synthetic Minority Over-sampling Technique (SMOTE), and they have also been combined with cleaning techniques such as Edited Nearest Neighbor or Tomek's Links (SMOTE+ENN and SMOTE+TL, respectively). In the big data context, the class imbalance problem has mostly been addressed by adapting traditional techniques, while intelligent approaches have been relatively ignored. Thus, this work analyzes the capabilities and possibilities of heuristic sampling methods for deep learning neural networks in the big data domain, with particular attention to the cleaning strategies. The study is developed on big, multi-class imbalanced datasets obtained from hyperspectral remote sensing images. A hybrid approach is evaluated on these datasets, in which the dataset is first oversampled with SMOTE and used to train an Artificial Neural Network (ANN); the network's output is then cleaned with ENN to eliminate output noise, and the ANN is retrained with the resulting dataset. The results suggest that the best classification outcome is achieved when the cleaning strategies are applied to the ANN output rather than to the input feature space only. Consequently, the classifier's nature clearly needs to be considered when classical class imbalance approaches are adapted to deep learning and big data scenarios.
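The described hybrid can be sketched with imbalanced-learn and scikit-learn as below; the network size and other hyperparameters are illustrative assumptions.

```python
# A minimal sketch of the hybrid idea: oversample with SMOTE, train an ANN,
# clean the ANN's output labels with ENN, then retrain.
from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import EditedNearestNeighbours
from sklearn.neural_network import MLPClassifier

def smote_ann_enn(X_train, y_train):
    # 1) Balance the training data with SMOTE
    X_bal, y_bal = SMOTE(random_state=0).fit_resample(X_train, y_train)

    # 2) Train an ANN on the balanced data
    ann = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=300, random_state=0)
    ann.fit(X_bal, y_bal)

    # 3) Clean the ANN output: apply ENN to the network's predicted labels
    y_pred = ann.predict(X_bal)
    X_clean, y_clean = EditedNearestNeighbours().fit_resample(X_bal, y_pred)

    # 4) Retrain the ANN on the cleaned data
    ann.fit(X_clean, y_clean)
    return ann
```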
Article
Full-text available
Data imbalance during the training of deep networks can cause the network to neglect minority classes. This paper presents a novel framework for training segmentation networks on imbalanced point cloud data. PointNet, an early deep network used for the segmentation of point cloud data, proved effective in the point-wise classification of balanced data; however, performance degraded when imbalanced data were used. The proposed approach involves removing between-class imbalances in the data points and guiding the network to pay more attention to the minority classes. Data imbalance is alleviated using a hybrid sampling method involving oversampling and undersampling, which increase the amount of data in minority classes and decrease the amount of data in majority classes, respectively. A balanced focal loss function is also used to emphasize the minority classes through the automated assignment of costs to the various classes based on their density in the point cloud. Experiments demonstrate the effectiveness of the proposed training framework on a point cloud dataset pertaining to six objects. The mean intersection over union (mIoU) test accuracy results obtained with standard PointNet training were 91% on XYZRGB data and 86% on XYZ data, whereas the proposed scheme achieved 98% on XYZRGB data and 93% on XYZ data.
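The cost-assignment idea behind a class-balanced focal loss can be sketched in plain NumPy; the inverse-frequency weighting and the focusing parameter gamma are illustrative assumptions, not the exact loss used in the cited work.

```python
# A minimal NumPy sketch of a class-balanced focal loss: per-class weights are
# set inversely proportional to class frequency, so sparse (minority) classes
# are emphasized; gamma controls how strongly easy examples are down-weighted.
import numpy as np

def balanced_focal_loss(probs, labels, class_counts, gamma=2.0):
    """probs: (N, C) predicted class probabilities; labels: (N,) int class ids."""
    counts = np.asarray(class_counts, dtype=float)
    alpha = counts.sum() / (len(counts) * counts)      # inverse-frequency weights
    p_t = probs[np.arange(len(labels)), labels]        # probability of the true class
    loss = -alpha[labels] * (1.0 - p_t) ** gamma * np.log(p_t + 1e-12)
    return loss.mean()

# Example with 3 classes where class 2 is rare
# probs = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4]])
# print(balanced_focal_loss(probs, np.array([0, 1, 2]), class_counts=[500, 450, 50]))
```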
Article
Full-text available
Presently, security is a hot research topic due to its impact on daily information infrastructure. Machine-learning solutions have been improving classical detection practices, but detection tasks involve irregular amounts of data, since the number of instances representing one or several malicious samples can vary significantly. In highly unbalanced data, classification models regularly achieve high precision on the majority class, while minority classes are treated as noise due to the lack of information they provide. Well-known datasets used for malware-based analyses, such as botnet attacks and Intrusion Detection Systems (IDS), mainly comprise logs, records, or network-traffic captures that do not provide an ideal source of evidence, as they consist of raw data. For example, the numbers of abnormal and constant connections generated by either botnets or intruders within a network are considerably smaller than those from benign applications. In most cases, inadequate dataset design may degrade a learning algorithm, resulting in overfitting and poor classification rates. To address these problems, we propose a resampling method, the Synthetic Minority Oversampling Technique (SMOTE), combined with a grid-search algorithm optimization procedure. This work demonstrates classification-result improvements for botnet and IDS datasets by merging synthetically generated balanced data and tuning different supervised-learning algorithms.
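A minimal sketch of combining SMOTE with a grid search, using imbalanced-learn and scikit-learn, is shown below; the classifier and parameter grid are illustrative assumptions rather than those tuned in the cited study.

```python
# A minimal sketch: SMOTE inside a pipeline, with a grid search over both the
# resampler and the classifier hyperparameters.
from imblearn.pipeline import Pipeline
from imblearn.over_sampling import SMOTE
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

pipeline = Pipeline([
    ("smote", SMOTE(random_state=0)),
    ("clf", RandomForestClassifier(random_state=0)),
])

param_grid = {
    "smote__k_neighbors": [3, 5],
    "clf__n_estimators": [100, 300],
    "clf__max_depth": [None, 10],
}

search = GridSearchCV(pipeline, param_grid, scoring="f1", cv=5)
# search.fit(X_train, y_train)   # SMOTE is applied only inside each training fold
# print(search.best_params_)
```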
Article
Full-text available
Financial market prediction attracts immense interest among researchers nowadays due to the rapid increase in financial market investments over the last few decades. The stock market is one of the leading financial markets owing to its importance to, and the interest of, many stakeholders. With the development of machine learning techniques, the financial industry has thrived on enhanced forecasting ability. The probabilistic neural network (PNN) is a promising machine learning technique that can be used to forecast financial markets with higher accuracy. A major limitation of the PNN is the assumption of a Gaussian distribution for the input variables, which is violated by financial data. The main objective of this study is to improve the standard PNN by incorporating a proper multivariate distribution as the joint distribution of the input variables and by addressing the multi-class imbalance problem persisting in the directional prediction of the stock market. The model-building process is illustrated and tested with the daily close prices of three stock market indices (AORD, GSPC, and ASPI) and related financial market indices. The results showed that the scaled t distribution with location, scale, and shape parameters is a more suitable distribution for financial return series. Global optimization methods are more appropriate for estimating the parameters of multivariate distributions, and the global optimization technique used in this study is capable of estimating parameters of considerably high-dimensional multivariate distributions. The proposed PNN model, which considers a multivariate scaled t distribution as the joint distribution of the input variables, exhibits better performance than the standard PNN model. The ensemble technique of multi-class undersampling-based bagging (MCUB), introduced to handle the class imbalance problem in PNNs, is capable of resolving the multi-class imbalance problem persisting in both the standard and proposed PNNs. The final model proposed in the study, combining the proposed PNN with the MCUB technique, is competent in forecasting the direction of a given stock market index with higher accuracy, helping stock market stakeholders make accurate decisions.
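As a rough, off-the-shelf analogue of undersampling-based bagging (not the authors' MCUB procedure, and with a decision tree rather than a PNN as the base learner), imbalanced-learn's BalancedBaggingClassifier can be used as sketched below.

```python
# A minimal sketch of undersampling-based bagging: each ensemble member is
# trained on a bootstrap sample in which the majority classes are undersampled.
from imblearn.ensemble import BalancedBaggingClassifier

ensemble = BalancedBaggingClassifier(
    n_estimators=25,              # number of rebalanced bootstrap learners
    sampling_strategy="auto",     # undersample majority classes per bag
    random_state=0,
)
# ensemble.fit(X_train, y_train)
# y_pred = ensemble.predict(X_test)
```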
Article
Full-text available
In the big data era, data are envisioned as critical resources with various values, e.g., business intelligence, management efficiency, and financial evaluation. Data sharing is essential for value exchange and profit promotion. Currently, certain big data markets have been created to facilitate data dissemination and coordinate data transactions, but such centralized management of data sharing must be assumed to be trustworthy regarding data privacy and sharing fairness, which very likely imposes limitations such as admission requirements, sharing efficiency, and extra costly commissions. To avoid these weaknesses, in this paper we propose a blockchain-based fair data exchange scheme, called FaDe. FaDe enables decentralized data sharing in an autonomous manner, guaranteeing in particular trade fairness, sharing efficiency, data privacy, and exchange automation. A fairness protocol based on bit commitment is proposed. An algorithm based on a blockchain script architecture for a smart contract, e.g., executed by a bitcoin virtual machine, is also proposed and implemented. Extensive analysis justifies that the proposed scheme can guarantee fair, efficient, and automatic data exchange without a trusted third party.
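The bit-commitment primitive underlying such fairness protocols can be illustrated with a generic hash-based commit/reveal sketch; this is not the FaDe protocol or its blockchain script implementation.

```python
# A minimal sketch of a hash-based bit commitment: commit now, reveal later.
import hashlib
import os

def commit(bit: int, nonce: bytes = None):
    """Commit to a bit without revealing it; return (commitment, nonce)."""
    nonce = nonce if nonce is not None else os.urandom(16)
    digest = hashlib.sha256(bytes([bit]) + nonce).hexdigest()
    return digest, nonce

def verify(commitment: str, bit: int, nonce: bytes) -> bool:
    """Check that the revealed bit and nonce match the earlier commitment."""
    return hashlib.sha256(bytes([bit]) + nonce).hexdigest() == commitment

# c, n = commit(1)        # publish c now ...
# assert verify(c, 1, n)  # ... reveal (1, n) later; any other bit fails
```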
Article
Full-text available
Big data and business analytics are trends that are positively impacting the business world. Past research shows that the data generated in the modern world are huge and growing exponentially. These include structured and unstructured data that flood organizations daily. Unstructured data constitute the majority of the world's digital data and include text files, web and social media posts, emails, images, audio, movies, etc. Unstructured data cannot be managed in traditional relational database management systems (RDBMS). Data proliferation therefore requires a rethinking of techniques for capturing, storing, and processing data, and this is the role big data has come to play. This paper is therefore aimed at drawing the attention of organizations and researchers to the various applications and benefits of big data technology. Based on the available literature, the paper reviews and discusses the recent trends, opportunities, and pitfalls of big data and how it has enabled organizations to create successful business strategies and remain competitive. Furthermore, the review presents the various applications of big data and business analytics, the data sources generated in these applications, and their key characteristics. Finally, the review not only outlines the challenges to the successful implementation of big data projects but also highlights the current open research directions of big data analytics that require further consideration. The reviewed areas of big data suggest that good management and manipulation of large data sets using big data techniques and tools can deliver actionable insights that create business value.
Article
Full-text available
Big consumer data promises to be a game changer in applied and empirical marketing research. However, investigations of how big data helps inform consumers' psychological aspects have thus far received only scant attention. Psychographics has been shown to be a valuable market segmentation path for understanding consumer preferences. Although personality, as a component of psychographic segmentation, has been proven effective for predicting e-commerce user preferences, it remains unclear whether psychographic segmentation is practically influential in understanding user preferences across different product categories. To the best of our knowledge, we provide the first quantitative demonstration of the promising effect and relative importance of psychographic segmentation in predicting users' online purchasing preferences across different product categories in e-commerce using a data-driven approach. We first construct two online psychographic lexicons that include the Big Five Factor (BFF) personality traits and the Schwartz Value Survey (SVS), using natural language processing (NLP) methods based on behavioral measurements of users' word use. We then incorporate the lexicons into a deep neural network (DNN)-based recommender system to predict users' online purchasing preferences, taking into account recent progress in segmentation-based user preference prediction methods. Overall, segmenting consumers into heterogeneous groups surprisingly does not demonstrate a significant improvement in understanding consumer preferences. Psychographic variables (both BFF and SVS) significantly improve the explanatory power of e-consumer preferences, whereas the improvement in predictive power is not significant. The SVS tends to outperform BFF segmentation, except for some product categories. Additionally, the DNN significantly outperforms previous methods. An e-commerce-oriented SVS measurement and segmentation approach that integrates both the BFF and the SVS is recommended. The strong empirical evidence provides both practical guidance for e-commerce product development, marketing, and recommendations, and a methodological reference for big data-driven marketing research.
Article
Full-text available
Credit scoring represents a two-class classification problem. Moreover, data imbalance in credit data sets, where one class contains a small number of samples and the other a large number, is a frequent problem. Therefore, if only a traditional classifier is used to classify the data, the final classification performance will be affected. To improve the classification of credit data sets, a Gaussian mixture model-based combined resampling algorithm is proposed. This resampling approach first determines the number of samples of the majority class and the minority class using a sampling factor. Gaussian mixture clustering is then used to undersample the majority class, and the synthetic minority oversampling technique is applied to the remaining (minority) samples, so that the imbalance problem is eliminated. Several resampling methods commonly used in the analysis of imbalanced credit data sets are compared. The obtained experimental results demonstrate that the proposed method consistently improves classification performance measures such as F-measure, AUC, and G-mean. In addition, the method shows strong robustness on credit data sets.
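The combined resampling idea can be sketched as follows: Gaussian mixture clustering undersamples the majority class (here by keeping the points closest to each component mean) and SMOTE oversamples the minority class; the component count and selection rule are illustrative assumptions, not the authors' exact algorithm.

```python
# A minimal sketch of GMM-based undersampling of the majority class combined
# with SMOTE oversampling of the minority class.
import numpy as np
from sklearn.mixture import GaussianMixture
from imblearn.over_sampling import SMOTE

def gmm_undersample(X_major, n_keep, n_components=10, random_state=0):
    gmm = GaussianMixture(n_components=n_components, random_state=random_state)
    labels = gmm.fit_predict(X_major)
    keep_idx = []
    per_cluster = max(1, n_keep // n_components)
    for k in range(n_components):
        idx = np.where(labels == k)[0]
        # keep the points closest to the component mean
        dist = np.linalg.norm(X_major[idx] - gmm.means_[k], axis=1)
        keep_idx.extend(idx[np.argsort(dist)[:per_cluster]])
    return X_major[keep_idx]

def gmm_smote_resample(X_major, X_minor, n_keep):
    X_major_red = gmm_undersample(X_major, n_keep)
    X = np.vstack([X_major_red, X_minor])
    y = np.hstack([np.zeros(len(X_major_red)), np.ones(len(X_minor))])
    return SMOTE(random_state=0).fit_resample(X, y)
```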
Article
In machine learning, data imbalance imposes challenges on data analytics in almost all areas of real-world research. Raw primary data often suffer from a skewed distribution of one class over the other, as in computer vision, information security, marketing, and medical science. The goal of this article is to present a comparative analysis of contemporary imbalanced data analysis techniques from the perspectives of data pre-processing, algorithmic, and hybrid paradigms, and to compare them across different data distributions and application areas.
Article
Standard classification algorithms assume the class distribution of the data to be roughly balanced. However, the class imbalance problem commonly occurs in real-life applications such as direct marketing, fraud detection, and churn prediction. The class imbalance problem refers to the situation in which the number of examples belonging to one class is significantly higher than the number belonging to the others. When a standard classifier is trained on class-imbalanced data, it is usually biased toward the majority class. In this work, we propose two novel cost-sensitive methods to address the class imbalance problem, namely the Cost-Sensitive Deep Neural Network (CSDNN) and the Cost-Sensitive Deep Neural Network Ensemble (CSDE). CSDNN is a cost-sensitive version of Stacked Denoising Autoencoders, and CSDE is an ensemble learning version of CSDNN. Random undersampling and layer-wise feature extraction from the hidden layers of the deep neural network are applied in CSDE to improve the generalization performance over CSDNN. In the literature, various methods for handling the class imbalance problem have been proposed; however, the experiments in those studies were usually conducted on relatively small data sets and on artificial data, so the performance of those methods on modern, more complicated real-life data sets is unclear. In our experiments, we examine the performance of the proposed methods and the other methods using six large real-life data sets in different business domains, ranging from direct marketing, churn prediction, and default payment to firm fraud detection. The results show that the proposed methods obtain promising results in handling the class imbalance problem and also outperform all the other compared methods.
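The cost-sensitive weighting idea (though not the CSDNN stacked denoising autoencoder itself) can be sketched with Keras class weights, which scale the loss so that minority-class errors cost more; the weighting scheme below is an illustrative assumption.

```python
# A minimal sketch of cost-sensitive training: weight the loss so that errors
# on the minority class cost more. Uses Keras class weights with any compiled
# binary classifier; it is not the CSDNN/CSDE architecture described above.
from tensorflow import keras
import numpy as np

def fit_cost_sensitive(model: keras.Model, X_train, y_train, epochs=20):
    # Misclassification cost per class, e.g. inverse class frequency
    n_neg, n_pos = np.bincount(y_train.astype(int))
    class_weight = {0: 1.0, 1: n_neg / max(n_pos, 1)}
    model.fit(X_train, y_train,
              class_weight=class_weight,   # penalize minority-class errors more
              epochs=epochs, batch_size=64, validation_split=0.2)
    return model
```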