Fraud Detection for E-commerce Transactions by
Employing a Prudential Multiple Consensus Model
Salvatore Carta^a, Gianni Fenu^a, Diego Reforgiato Recupero^a, Roberto Saia^a
^a Department of Mathematics and Computer Science, University of Cagliari
Palazzo delle Scienze, Via Ospedale 72, 09124 Cagliari
Abstract
More and more financial transactions are carried out through E-commerce platforms in the big data era, bringing plenty of opportunities but also challenges and the risk that information is stolen and used for fraud. This is due to the massive use of tools such as credit cards for electronic payments, which attackers target to steal sensitive information and perform fraudulent operations. Although intelligent fraud detection systems have been developed to face the problem, they still suffer from well-known problems caused by the imbalance of the data they use. This paper therefore proposes a novel data intelligence technique based on a Prudential Multiple Consensus model, which combines the effectiveness of several state-of-the-art classification algorithms by adopting a twofold criterion, probabilistic and majority based. The goal is to maximize the effectiveness of the model in detecting fraudulent transactions regardless of any data imbalance. Our model has been validated with a set of experiments on a large real-world dataset characterized by a high degree of data imbalance, and the results show that the proposed model outperforms several state-of-the-art solutions, both in terms of ensemble models and classification approaches.
Keywords: Information Security, Credit Card, Fraud Detection, Machine Learning
1. Introduction
Nowadays, the use of credit cards for financial transactions represents the backbone of E-commerce dynamics and business, since it allows goods and services to be purchased in real time all over the world from any device (smartphone, tablet, PC) connected to the Internet. This operation carries risks that might lead to the theft of the sensitive information associated with the customers' credit cards. A recent report from the European Payments Council¹ shows that a certain percentage of Internet electronic payments is related to frauds.
Corresponding author
Email address: diego.reforgiato@unica.it (Diego Reforgiato Recupero)
Preprint submitted to Journal of Information Security and Applications, February 22, 2019
Another analysis, performed by Euromonitor International² on the Europe, Middle East and Africa (EMEA) area, shows that the number of frauds and the associated amount in euros kept growing in that area from 2006 to 2016, the year the study was published. The corresponding values are reported in Figure 1. Although these data refer only to the EMEA area, they clearly underline the seriousness of the problem. In the US, the Association of Certified Fraud Examiners³ found that 15% of all frauds are connected to credit card transactions, and that these account for 80% of the whole financial value.
According to the FBI's Internet Crime Complaint Center (IC3)⁴, the term credit card fraud is defined as:
    "a wide-ranging term for fraud committed using a credit card or any similar payment mechanism as a fraudulent source of funds in a transaction."
It can be carried out in two different ways, off-line or on-line [1]. In the off-line case, the credit card has been physically stolen and is then used to perform fraudulent payments by assuming the identity of the legitimate owner. Here the thief has a limited amount of time, which runs from the theft of the card until the owner reports it to the bank and the bank disables the card. In the on-line case, which is the most common, the stolen information is digital and can be obtained in several ways (e.g., skimming, shimming, cloning, or phishing). Once the fraudster obtains this information, he/she can make purchases through the Internet until either the legitimate owner notices the problem and blocks the card or the budget on the card runs out. The latter case (on-line fraud) is the one we take into account in this paper.
Several research institutions and industries have made huge investments with the aim of designing effective methods capable of tackling the problem by employing machine learning, deep learning, big data, and computational intelligence technologies. These efforts have led to a large number of solutions that are able to automatically distinguish legitimate credit card transactions from fraudulent ones.
However, regardless of the approach used, some common problems reduce performance. The most common is the unbalanced distribution of the training data that characterize past transactions, which causes over-fitting and lowers the performance of the adopted classifiers. In other words, the problem arises because the number of available fraudulent samples is usually much lower than the number of legitimate ones, and this high degree of imbalance does not allow the definition of a reliable evaluation model [2, 3, 4].
This happens because the fraudulent transactions collected in the past by fraud detection systems are much less frequent than the legitimate ones. Moreover, (i) the heterogeneous nature of the data and (ii) the presence of overlaps among the data [5] are two elements that worsen the problem.
These two elements strongly affect the effectiveness of any fraud detection system, generating a large number of misclassifications.

Figure 1: Fraud Amount in EMEA Area (x-axis: Years, 2006 to 2016; y-axis: Millions of euros, ticks at 1,400, 1,600, and 1,800)

¹ https://bit.ly/2yQC7G1
² http://www.euromonitor.com/
³ http://www.acfe.com
⁴ https://www.ic3.gov/

The approach we propose in this paper aims at improving the classification performance of several state-of-the-art classification algorithms by adopting an ensemble approach, where the final classification is given by combining the different elements of the ensemble through a novel model regulated by a twofold policy. The policy is defined through probabilistic and prudential criteria in order to maximize the effectiveness of the single approaches.
In more detail, the main scientific contributions of our proposed approach are the following:
(i) we introduce a formalization of the Prudential Multiple Consensus (PMC) model, aimed at combining the classifications made by each single approach by adopting both a probabilistic and a prudential criterion;
(ii) we define the algorithm used to classify new transactions as legitimate or fraudulent on the basis of the PMC model previously formalized, according to the performed analysis of the ensemble criteria.
The remainder of this paper is organized as follows. Section 2 introduces the background and related work of the scenario taken into account. Section 3 presents the proposed approach, the adopted formal notation, and the formal definition of the problem we face. In Section 4 we describe the implementation of the proposed approach, whereas in Section 5 we perform a preliminary study aimed at selecting and ensembling a set of classification algorithms. Section 6 describes the experimental environment, together with the adopted datasets, strategy, and metrics, whereas Section 7 shows the obtained results along with a related discussion. Concluding remarks and directions for future work are given in Section 8, which ends the paper.
2. Background and Related Work
After an introduction to the most common approaches and methods used to tackle the fraud detection problem, this section underlines the current open problems and introduces the ensemble classification methods and the most suitable performance metrics for evaluation.
2.1. Fraud Detection Approaches
The approaches in the literature that tackle the fraud detection problem exploit the following techniques:
Data Mining: an example is provided by the work in [6], where ad-hoc patterns are generated to recognize frauds. In another example [7], the authors investigate several combinations of manual and automatic classification approaches.
Artificial Intelligence: as in [8], which uses a technique that reduces the number of false alarms during the evaluation process.
Machine Learning: the work presented in [9] makes use of several types of classification (single and ensemble). Another work, [10], takes into account the combination of unsupervised and supervised strategies.
Genetic Programming: an example is represented by [11], where an evolutionary computation technique has been implemented to improve the fraud detection process, taking into account the dynamics of the credit card transactions.
Reinforcement Learning: as in [12], which formalizes the interactions between fraudsters and card issuers as a Markov Decision Process.
Transformed-domain-based: as in [13, 14, 15, 16], where the evaluation process has been performed in a non-canonical domain (e.g., time, frequency, or frequency-time).
Combined Criteria: as in [17, 18], where a multidimensional technique is exploited to improve the classification performance. One more example is given in [17], where the authors introduce several fraud indicators in the classification process.
2.2. Operating Modalities
From a different point of view, all the state-of-the-art fraud detection approaches can be divided into two categories, which depend on how artificial intelligence technology is involved [19]: supervised approaches and unsupervised approaches. A brief description of each class follows.
Supervised approaches define their evaluation model by exploiting the past fraudulent and non-fraudulent transactions collected by the fraud detection system. These methods do not work well without a considerable number of examples (training data) annotated in both classes (i.e., legitimate and fraudulent). Since these methods must learn from the training data how to predict future data, they discover only known patterns; it is therefore important to provide them with consistent and complete training data.
Unsupervised approaches operate by searching for anomalies in the features that compose the transaction under evaluation. The problem in this case is that a fraudulent transaction might not present any anomaly in its values; consequently, the design of effective unsupervised fraud detection approaches remains a hard research challenge [20].
In addition, the evaluation model can be defined using three different modes: static, updating, or forgetting:
In the static mode, the data under analysis are divided into several blocks of equal size, and the model is trained on a defined number of contiguous blocks [21]. A drawback of this mode is the absence of a dynamic evaluation model able to follow changes in users' behaviour.
The updating mode does not use a single evaluation model: it updates the model whenever a new block arrives, involving a defined number of the most recent contiguous blocks [22]. The problem in this case is the impossibility of operating with small classes of data.
The forgetting mode also updates the evaluation model at each new block, but it does so by involving all the past fraudulent transactions together with the legitimate ones present in the last two blocks [23]. However, this mode has a high computational cost.
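The three modes differ mainly in which past blocks feed the training step. The following is a minimal sketch of that selection, not the implementation used in [21, 22, 23]: the function name `training_data`, the dictionary-based transaction records with a `"fraud"` flag, the parameter `k`, and the choice of the first `k` blocks for the static mode are all assumptions made for illustration.

```python
from typing import Dict, List

Transaction = Dict[str, object]   # hypothetical record; "fraud" holds the known label
Block = List[Transaction]


def training_data(blocks: List[Block], mode: str, k: int = 3) -> List[Transaction]:
    """Return the transactions used to train the model under the given mode."""
    if mode == "static":
        # Assumption: the model is trained once on the first k contiguous blocks.
        return [t for block in blocks[:k] for t in block]
    if mode == "updating":
        # Retrain on the k most recent contiguous blocks.
        return [t for block in blocks[-k:] for t in block]
    if mode == "forgetting":
        # All past frauds, plus the legitimate transactions of the last two blocks.
        frauds = [t for block in blocks for t in block if t["fraud"]]
        recent_legit = [t for block in blocks[-2:] for t in block if not t["fraud"]]
        return frauds + recent_legit
    raise ValueError(f"unknown mode: {mode}")
```

In the updating and forgetting modes the function would be called again each time a new block is appended to `blocks`, which is where the extra computational cost of the forgetting mode comes from: its fraud list grows with the whole history.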
2.3. Current Open Problems
In addition to the intrinsic issues mentioned in Section 2.2, we report below the most important open problems that affect the fraud detection domain, regardless of the approach used.
Data Scarcity: for different reasons (commercial operators' policies, privacy, legal constraints, etc.), few real-world datasets are available for developing and verifying novel fraud detection approaches [11]. This scenario is quite understandable, given the intrinsically private nature of the data involved, and it represents a big problem for researchers, who in many cases are forced to use synthetic data [24].
Data Heterogeneity: this problem is related to the difficulty of modelling the relationships among transaction features that are represented differently in various sets of data [25]. In other words, it arises because each card issuer processes a high number of transactions every day, and each transaction is composed of a number of features whose values might change periodically within a single user's account or across all the accounts.
Model Staticity: classification approaches define their evaluation model on the basis of the available data (i.e., past transactions). Given the high level of heterogeneity of the information involved, a problem arises when the pattern that characterizes a new transaction is not present among those used to define the evaluation model [26].
Cold Start: in fraud detection, as in other scenarios that require training an evaluation model, this problem occurs when the available data are not sufficient [27]. In the context considered in this paper, we have a cold-start situation when a fraud detection system has not collected a sufficient number of fraudulent and legitimate transactions to perform the model training.
Data Imbalance: this issue stems from the small number of fraudulent cases usually collected by a fraud detection system with respect to the legitimate cases. Since past transactions are used to train the evaluation models, this imbalance reduces the effectiveness of the fraud detection approaches [2, 3, 4]. The literature presents many approaches able to face this problem, such as those proposed in [28, 29].
2.4. Ensemble Methods
Ensemble methods are widely adopted to perform data classification [30], since they can improve the performance achieved by a single classification method. Such methods have been extensively investigated in the past, for example in [31], which evaluates their advantages in terms of computational, statistical, and representational aspects.
It should be observed that combining several classification algorithms does not always lead to better results, because this operation can also reduce the overall classification performance. This means that the effectiveness of the resulting approach is strictly related to the strategy adopted to aggregate the single results, and thus it highly depends on the definition of the global assessment model [30].
In scenarios such as the one considered in this paper, ensemble methods are mainly aimed at improving the correct evaluation of a minority class label (e.g., the one related to the fraudulent cases), which is an important result when the available data are strongly unbalanced [32].
The literature indicates ensemble methods as one of the most effective approaches for facing class imbalance problems. This scenario has been well outlined in a survey [33], where 218 of the 527 papers taken into consideration referred to ensemble models.
The strategy used to combine several classification algorithms usually operates in the following two steps:
(i) in the first step, a series of different algorithms is selected on the basis of the complementarity of their results (i.e., they misclassify samples in different places of the test set);
(ii) in the second step, their results are combined by adopting a consensus criterion, such as complete agreement, majority, absolute, correction, multi-stage, weighted, confidence, and ranked voting [34].
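The two steps above can be sketched as follows. This is an illustrative sketch only: measuring complementarity through the Jaccard overlap of the error positions is an assumption of ours (the text does not prescribe a measure), the function names are hypothetical, and labels are encoded as 1 = fraudulent, 0 = legitimate.

```python
from typing import List, Sequence


def error_overlap(preds_a: List[int], preds_b: List[int], truth: List[int]) -> float:
    """Jaccard overlap of the test-set positions where two classifiers err.

    A value close to 0 suggests complementary classifiers (step i).
    """
    errors_a = {i for i, (p, y) in enumerate(zip(preds_a, truth)) if p != y}
    errors_b = {i for i, (p, y) in enumerate(zip(preds_b, truth)) if p != y}
    union = errors_a | errors_b
    return len(errors_a & errors_b) / len(union) if union else 0.0


def majority_vote(labels: Sequence[int]) -> int:
    """Step ii with majority voting: 1 only if more than half of the votes are 1."""
    return 1 if sum(labels) * 2 > len(labels) else 0


def complete_agreement(labels: Sequence[int]) -> int:
    """Step ii with complete agreement: 1 only if every classifier votes 1."""
    return 1 if all(labels) else 0
```

For instance, three classifiers that each err on different samples have pairwise overlap 0, and their majority vote can recover the correct label on every sample even though none of them is individually error-free.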
Approaches similar to ours include: [35], where the authors analyze the performance of three state-of-the-art data mining techniques in the context of a bagging ensemble classifier based on decision tree algorithms; [36], where the authors propose a strategy that periodically drops a certain number of classifiers and uses only a part of them for the evaluation; and [37], where the authors combine the bagging and boosting techniques. However, regardless of the adopted ensemble strategy, the state-of-the-art solutions do not implement any prudential criterion such as the one in our PMC approach, which is based on the observation that, in the context taken into account (credit card fraud detection), wrongly classifying a transaction as fraudulent is preferable to wrongly classifying it as legitimate.
Note that some classification algorithms are ensembles by nature, such as AdaBoost [38] and Gradient Boosting [39]. Such algorithms exploit a prediction model based on an ensemble of weak prediction models [33].
2.5. Performance Assessment
Within the fraud detection scenario, especially where credit card transactions are involved, the performance assessment must follow certain criteria, due to the particular configuration of the data involved. This is necessary because some canonical metrics (e.g., accuracy) usually adopted to evaluate the performance of classification algorithms might lead to unreliable results.
This happens especially when the experiments involve unbalanced data [3, 33, 40]. For example, given a dataset in which the fraudulent cases represent 0.01% of the entire dataset (legitimate and fraudulent), an algorithm that classifies all the samples as legitimate achieves an accuracy of 99.99%.
It should also be underlined that such a configuration is not rare, and it is in accordance with real-world data.
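As a worked illustration of this point, consider a classifier that labels every sample as legitimate. The dataset size of 1,000,000 transactions is an assumption chosen only to make the arithmetic concrete; TP, TN, and FN denote true positives, true negatives, and false negatives, with fraudulent as the positive class.

```latex
\begin{align*}
\text{fraudulent samples} &= 0.0001 \times 1{,}000{,}000 = 100, \qquad
\text{legitimate samples} = 999{,}900 \\
\text{Accuracy} &= \frac{TP + TN}{\text{all samples}}
                 = \frac{0 + 999{,}900}{1{,}000{,}000} = 0.9999 \\
\text{Sensitivity} &= \frac{TP}{TP + FN} = \frac{0}{0 + 100} = 0
\end{align*}
```

The accuracy is almost perfect although not a single fraud is detected, which is why accuracy alone is not a reliable metric in this setting.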
Table 1: Confusion Matrix
                Algorithm classification
Real class      fra               leg               total
fra'            True Positive     False Negative    |fra'|
leg'            False Positive    True Negative     |leg'|
total           |fra|             |leg|
Given these considerations, suitable assessment metrics should evaluate the performance of the algorithms by taking into account the unbalanced configuration of the data.
For this reason it is preferable to use the metrics based on the confusion matrix shown in Table 1 (avoiding their use in aggregate form), where fra stands for fraudulent and leg stands for legitimate.
Simple metrics such as the Sensitivity (true positive rate) and the Fallout (false positive rate) give us information about the transactions that an algorithm classifies as fraudulent, while metrics such as the Specificity (true negative rate) and the Miss Rate (false negative rate) provide specular information about the transactions it classifies as legitimate.
In addition to the aforementioned metrics, it is also preferable to add another one, the AUC (Area Under the ROC Curve), since it measures the ability of the adopted evaluation model to discriminate between the possible destination classes (i.e., legitimate and fraudulent in our case) [41, 42].
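The metrics above can be computed directly from the labels, as in the following sketch. The function names are hypothetical, labels are encoded as 1 = fraudulent (positive) and 0 = legitimate (negative), and both classes are assumed to be present in the data. The AUC is computed here through its rank interpretation (the probability that a randomly chosen fraudulent sample receives a higher score than a randomly chosen legitimate one), which is one standard way to obtain it and not necessarily the procedure used in the experiments.

```python
from typing import Dict, List


def confusion_rates(truth: List[int], predicted: List[int]) -> Dict[str, float]:
    """Rates derived from the confusion matrix (1 = fraudulent, 0 = legitimate)."""
    pairs = list(zip(truth, predicted))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),   # true positive rate
        "miss_rate": fn / (tp + fn),     # false negative rate
        "fallout": fp / (fp + tn),       # false positive rate
        "specificity": tn / (fp + tn),   # true negative rate
    }


def auc(truth: List[int], scores: List[float]) -> float:
    """AUC as the fraction of (fraudulent, legitimate) pairs ranked correctly.

    Ties between scores count as half a correct ranking.
    """
    pos = [s for y, s in zip(truth, scores) if y == 1]
    neg = [s for y, s in zip(truth, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The pairwise loop in `auc` is quadratic in the number of samples, so it is suitable for small illustrations only; on a large dataset a sort-based computation would be preferable.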
3. The Proposed Approach, Notation and Problem Formulation
This section introduces the proposed approach and the formal notation, together with the formulation of the problem we address in this paper.
3.1. Proposed Approach
This paper proposes a combined approach aimed at maximizing the effectiveness of several single approaches. We employ an ensemble strategy regulated by a Prudential Multiple Consensus model, which is based on a twofold criterion, probabilistic and majority based.
This idea relies on the observation that the results given by different classification algorithms are not the same in terms of correct classifications and misclassifications. This means that in many cases the algorithms do not agree on the identification of certain legitimate or fraudulent transactions.
In more detail, in a preliminary study we observed that the classifications made by different algorithms are frequently in conflict, even when the algorithms achieve good individual performance in terms of Specificity (i.e., legitimate cases correctly classified) and Sensitivity (i.e., fraudulent cases correctly classified). This can be exploited to increase the classification reliability by adopting strategies that take into account the classifications made by multiple algorithms.
On the basis of these considerations, the proposed approach exploits an aggregation strategy able to conveniently combine the correct evaluations made by the single algorithms, maximizing their effectiveness in the detection of fraudulent transactions.
Instead of adopting a canonical aggregation criterion (e.g., complete agreement, majority voting, or weighted voting) to determine the destination class of a new transaction on the basis of the results of the single algorithms, our approach adopts a novel prudential criterion, which works as follows:
(i) each single algorithm classifies a new transaction as legitimate only if its classification is legitimate and the classification probability is above the average of the probabilities of the classifications made by all the algorithms for that transaction. When this does not happen (i.e., the algorithm classification is fraudulent, or the classification probability is not above that average value), the transaction is classified as fraudulent;
(ii) a canonical consensus criterion based on majority voting is then applied, so that the final classification of the transaction under analysis depends on the results of all the algorithms.
3.2. Formal Notation
Given a set of transactions $T = \{t_1, t_2, \ldots, t_N\}$ collected in the past and already classified, and the subsets $T_{+} = \{t_1, t_2, \ldots, t_K\}$ and $T_{-} = \{t_1, t_2, \ldots, t_J\}$, respectively related to the legitimate and fraudulent transactions in $T$ (i.e., $T_{+} \subseteq T$ and $T_{-} \subseteq T$), we denote as $F = \{f_1, f_2, \ldots, f_M\}$ the set of features that compose each transaction $t \in T$.
In addition, we denote as $\hat{T} = \{\hat{t}_1, \hat{t}_2, \ldots, \hat{t}_U\}$ a set of new transactions to classify and as $C = \{legitimate, fraudulent\}$ the possible destination classes of each transaction, meaning that a transaction can belong to only one class $c \in C$.
Finally, we denote as $A = \{a_1, a_2, \ldots, a_Z\}$ a set of classification algorithms.
Let $\Psi$ be the classification process performed by our PMC model. We evaluate the correctness of each classification performed by PMC through the function $Evaluation(\hat{t}, \Psi)$, which returns a Boolean value $\beta$: 1 in case of correct classification, 0 otherwise. In this way we can formalize our classification problem as the maximization of the sum of the values returned by this function, as indicated in Equation 1.

$$\max_{0 \le \beta \le |\hat{T}|} \beta = \sum_{u=1}^{|\hat{T}|} Evaluation(\hat{t}_u, \Psi) \qquad (1)$$

4. Implementation
The architecture of our approach is shown in Figure 2.
Figure 2: PMC High-level Architecture (blocks: Data Preprocessing, Algorithms Training, Algorithms Predictions, PMC Model; inputs: past transactions $T$ and new transaction $\hat{t}$; output: classification of $\hat{t}$)
In this architecture, the activity performed in the Data Preprocessing block depends on the input data and can involve, for instance, a class binarization (a mapping of a multi-class learning problem to several two-class learning problems) or a minority class oversampling (in order to adjust the class distribution of the dataset).
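The two preprocessing operations mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, binarization is shown in its one-vs-rest form, and the oversampling is plain random duplication of minority samples, which is only one of several possible techniques and is not stated to be the one used in the experiments.

```python
import random
from typing import List, Sequence, Tuple


def binarize(labels: Sequence[str], positive: str) -> List[int]:
    """One-vs-rest mapping: 1 for the chosen class, 0 for every other class."""
    return [1 if label == positive else 0 for label in labels]


def oversample_minority(X: List[list], y: List[int], seed: int = 0) -> Tuple[List[list], List[int]]:
    """Randomly duplicate minority-class samples until both classes have equal size."""
    rng = random.Random(seed)
    minority_label = 1 if sum(y) * 2 < len(y) else 0
    minority = [i for i, label in enumerate(y) if label == minority_label]
    if not minority:
        # Only one class is present: nothing can be duplicated.
        return list(X), list(y)
    majority_count = len(y) - len(minority)
    extra = [rng.choice(minority) for _ in range(majority_count - len(minority))]
    indices = list(range(len(y))) + extra
    return [X[i] for i in indices], [y[i] for i in indices]
```

Oversampling of this kind should be applied to the training data only; duplicating samples before splitting the data would leak copies of training samples into the test set.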
4.1. Step 1: Model Definition
The proposed Prudential Multiple Consensus (PMC) model operates by combining the results of five classification algorithms (described in Section 6.3) on the basis of two criteria, one based on the classification probability and one based on majority voting.
To obtain the classification probability we use the Logistic Function, since it is able to measure the probability of a binary response based on one or more independent predictors. More formally, the probability that a new transaction $\hat{t} \in \hat{T}$ belongs to a class $c \in C$ is calculated by mapping the algorithm predictions to probabilities through the sigmoid function $\sigma$⁵. This method is formalized in Equation 2, where $\sigma(a_z(p))$ is the probability estimate for the prediction $p$ made by the algorithm $a_z$, whose result lies in the range $[0,1]$, and $e$ denotes the base of the natural logarithm.

$$\sigma(a_z(p)) = \frac{1}{1 + e^{-p}} \qquad (2)$$

⁵ A mathematical function characterized by a sigmoid curve, which maps any real value into the interval $[0,1]$.

Subsequently, in each classification performed by a single algorithm, a transaction is considered legitimate only when its probability is above a certain value (the average $\mu$ of the probabilities given by all the algorithms); otherwise, prudentially, the transaction is classified as fraudulent. The final classification is given according to the results of all the algorithms (by using the majority voting criterion), as shown in Equation 3, where $|A|$ is the number of classification algorithms and $c$ is the transaction classification.

$$
c = \begin{cases} legitimate, & \text{if } w_1 > w_2 \\ fraudulent, & \text{otherwise} \end{cases}
\quad \text{with} \quad
\begin{aligned}
\mu &= \frac{1}{|A|} \cdot \sum_{z=1}^{|A|} \sigma(a_z(p)) \\
w_1 &= \sum_{z=1}^{|A|} 1 \ \text{ if } \sigma(a_z(p)) > \mu \wedge a_z(p) = legitimate \\
w_2 &= \sum_{z=1}^{|A|} 1 \ \text{ if } \sigma(a_z(p)) \le \mu \vee a_z(p) = fraudulent
\end{aligned}
\qquad (3)
$$

It should be observed that the Logistic Function represents only one of the possible approaches for estimating the probability of a binary response given by a predictor. Other approaches able to perform the same operation can therefore also be used in our model.
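To make Equation 3 concrete, the following worked example uses five hypothetical algorithms. The labels and probability values are invented for illustration and do not come from the experiments.

```latex
\begin{align*}
&\text{predictions: } (legitimate,\ 0.95),\ (legitimate,\ 0.90),\ (legitimate,\ 0.60),\
  (fraudulent,\ 0.70),\ (legitimate,\ 0.55) \\
\mu &= \tfrac{1}{5}\,(0.95 + 0.90 + 0.60 + 0.70 + 0.55) = 0.74 \\
w_1 &= 2 \quad \text{(only the first two algorithms are legitimate with probability above } \mu\text{)} \\
w_2 &= 3 \quad \text{(two legitimate votes not above } \mu \text{, plus one fraudulent vote)} \\
c &= fraudulent \quad \text{since } w_1 > w_2 \text{ does not hold}
\end{align*}
```

A plain majority vote over the same labels would return legitimate (four votes out of five), which shows how the prudential criterion shifts borderline cases towards the fraudulent class.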
4.2. Step 2: Data Classification
According to the model formalized in Section 4.1, each new transaction $\hat{t} \in \hat{T}$ is classified by using Algorithm 1.
The input of Algorithm 1 consists of the classification algorithms in the set $A$, the previous transactions $T$ already classified, and a new transaction $\hat{t} \in \hat{T}$ to evaluate. The output is the classification of the transaction $\hat{t}$ as legitimate or fraudulent. At step 4 the evaluation models related to the set $A$ of classification algorithms are trained, while their classifications for the transaction $\hat{t}$ are calculated at step 5. The average probability of all the performed classifications is calculated at step 6 and saved in $\mu$. Steps 7 to 13 check whether the classification probability of each algorithm is above the average value in $\mu$. In particular, we increase $w_1$ by one when $p = legitimate$ and the prediction probability is above $\mu$; otherwise we increment $w_2$. Once all predictions have been processed, the transaction $\hat{t}$ is classified as legitimate if $w_1 > w_2$, and as fraudulent otherwise. This classification is returned, and the algorithm ends, at step 19.
It should be noted that the functions getProbabilityAverage() and getProbability() are both based on the Logistic Function model formalized in Equation 2.
Algorithm 1 Transaction classification
Input: A = Set of algorithms, T = Past classified transactions, $\hat{t}$ = Unevaluated transaction
Output: result = Classification of transaction $\hat{t}$
 1: procedure CLASSIFICATION(A, T, $\hat{t}$)
 2:     w1 ← 0
 3:     w2 ← 0
 4:     models = trainingModels(A, T)
 5:     predictions = getPredictions(A, models, $\hat{t}$)
 6:     µ ← getProbabilityAverage(predictions)
 7:     for each p in predictions do
 8:         if getProbability(p) > µ ∧ p == legitimate then
 9:             w1 ← w1 + 1
10:         else
11:             w2 ← w2 + 1
12:         end if
13:     end for
14:     if w1 > w2 then
15:         result ← legitimate
16:     else
17:         result ← fraudulent
18:     end if
19:     return result
20: end procedure
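Algorithm 1 can be rendered in Python roughly as follows. This is a sketch under stated assumptions, not the authors' code: each algorithm is assumed to expose a hypothetical interface with `fit(X, y)` and `predict(x)`, where `predict` returns a pair (label, probability of that label); `FixedModel` is a stub introduced only to make the example runnable.

```python
from statistics import mean

LEGITIMATE, FRAUDULENT = "legitimate", "fraudulent"


def pmc_classify(algorithms, past_X, past_y, new_x):
    """Classify new_x by following steps 2 to 19 of Algorithm 1."""
    w1 = w2 = 0                                             # steps 2-3
    for algorithm in algorithms:                            # step 4: train every model
        algorithm.fit(past_X, past_y)
    predictions = [a.predict(new_x) for a in algorithms]    # step 5
    mu = mean(prob for _, prob in predictions)              # step 6
    for label, prob in predictions:                         # steps 7-13
        if prob > mu and label == LEGITIMATE:
            w1 += 1
        else:
            w2 += 1                                         # prudential default
    return LEGITIMATE if w1 > w2 else FRAUDULENT            # steps 14-19


class FixedModel:
    """Hypothetical stub: ignores training and always returns the same answer."""

    def __init__(self, label, probability):
        self.label, self.probability = label, probability

    def fit(self, X, y):
        return self

    def predict(self, x):
        return self.label, self.probability
```

With real classifiers, `predict` would wrap whatever call returns class probabilities in the chosen library. Training inside the function mirrors step 4 of the pseudocode; in practice the models would be trained once and reused for every new transaction.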
5. Classification Algorithms
This section first explains the criteria used to select the algorithms adopted in our approach, then it describes the adopted ensemble criteria.
5.1. Selection Criteria
In order to implement the proposed evaluation model, we need the classification algorithms (those in the set $A$ of Section 3.2) not only to predict the class label, but also to provide the probability related to each class label. It should be observed that not all classification algorithms provide this type of information, which represents a kind of confidence level for the prediction. For this reason, when composing the set of algorithms $A$, we left out the algorithms that do not provide this information and those that estimate the class probabilities poorly (i.e., those that, instead of a continuous probability value in $[0,1]$, return only the values 0 or 1).
The five algorithms chosen for the experiments are Multilayer Perceptron (MLP), Gaussian Naive Bayes (GNB), Adaptive Boosting (ADA), Gradient Boosting (GBC), and Random Forests (RFC); their settings are shown in Table 2.
We limited the number of algorithms to five on the basis of several analyses and works in the literature, such as [43], which fixes at five the maximum number of algorithms to be used within an ensemble approach to obtain the best classification performance.
5.2. Ensemble Criteria
We performed a set of experiments in order to try different combinations for the proposed Prudential Multiple Consensus model.
As a first step, we trivially used our model with single algorithms, applying the prudential voting defined in Section 4.1, as shown in Table 3. In order to underline the differences with respect to the native performance of each single algorithm, the table reports this information (Native columns) beside the performance obtained by using the proposed approach (Model columns).
Afterwards, we tested our model by combining the algorithms in pairs, triples, and quadruples, and finally by using all of the algorithms. The experimental results are reported in Tables 4, 5, and 6.
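The set of configurations tested in this preliminary study can be enumerated mechanically. The sketch below is illustrative (the function name `ensemble_configurations` is hypothetical) and only lists the candidate subsets; evaluating each of them would require the trained models and the metrics of Section 2.5.

```python
from itertools import combinations

ALGORITHMS = ["MLP", "GNB", "ADA", "GBC", "RFC"]


def ensemble_configurations(algorithms):
    """Yield every non-empty subset, from single algorithms up to the full set."""
    for size in range(1, len(algorithms) + 1):
        for subset in combinations(algorithms, size):
            yield subset
```

With five algorithms this gives 5 singles, 10 pairs, 10 triples, 5 quadruples, and 1 full set, i.e. 31 configurations in total, which is small enough to test exhaustively.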
They show that our model applied to a single algorithm reaches the best result with Random Forests: compared to the native performance, it improves the share of fraudulent transactions correctly detected (Sensitivity of 0.807 instead of 0.653), slightly increasing the Fallout (0.016 instead of 0.000) but improving the AUC (0.896 instead of 0.827).
Also when combining the algorithms in pairs, triples, and quadruples, we obtain the best results when Random Forests is involved. This is in line with other studies in the literature, which indicate this algorithm [44] as one of the best approaches for this kind of task within the considered domain [45, 4, 46].
Moreover, the results in Table 5 indicate the configuration based on four algorithms as the most promising, since the combination of MLP, GNB, GBC, and RFC gives the best performance in terms of all the considered metrics.
Note that the above results (even in the case of the single algorithms) have been calculated using our PMC as the decision strategy.
Table 2: Algorithms Configuration
Algorithm   Parameters
MLP         activation='relu', alpha=0.0001, batch_size='auto', beta_1=0.9,
            beta_2=0.999, early_stopping=False, epsilon=1e-08,
            hidden_layer_sizes=(100,), learning_rate='constant',
            learning_rate_init=0.001, max_iter=200, momentum=0.9,
            nesterovs_momentum=True, power_t=0.5, random_state=None,
            shuffle=True, solver='adam', tol=0.0001, validation_fraction=0.1,
            verbose=False, warm_start=False
GNB         priors=None
ADA         algorithm='SAMME.R', base_estimator=None, learning_rate=1.0,
            n_estimators=50, random_state=None
GBC         criterion='friedman_mse', init=None, learning_rate=0.1,
            loss='deviance', max_depth=3, max_features=None,
            max_leaf_nodes=None, min_impurity_decrease=0.0,
            min_impurity_split=None, min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=100, presort='auto',
            random_state=None, subsample=1.0, verbose=0, warm_start=False
RFC         bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=None, verbose=0, warm_start=False
Table 3: Single Algorithms Performance
                              Native       Native   Native  Model        Model    Model
Evaluation algorithm          Sensitivity  Fallout  AUC     Sensitivity  Fallout  AUC
Multilayer Perceptron (MLP)   0.146        0.000    0.573   0.781        0.629    0.576
Gaussian Naive Bayes (GNB)    0.709        0.012    0.848   0.739        0.016    0.862
Adaptive Boosting (ADA)       0.614        0.000    0.807   0.970        0.667    0.651
Gradient Boosting (GBC)       0.506        0.000    0.753   0.699        0.082    0.808
Random Forests (RFC)          0.653        0.000    0.827   0.807        0.016    0.896
Table 4: Ensemble Algorithms Performance by Pairs and Triples
Algorithms by pairs     Sensitivity  Fallout  AUC
MLP, GNB                0.753        0.016    0.869
MLP, ADA                1.000        0.992    0.504
MLP, GBC                0.869        0.727    0.571
MLP, RFC                0.807        0.016    0.896
GNB, ADA                1.000        0.992    0.504
ADA, GBC                1.000        0.992    0.504
GBC, RFC                0.828        0.041    0.894
GNB, GBC                0.824        0.074    0.875
GNB, RFC                0.817        0.022    0.898

Algorithms by triples   Sensitivity  Fallout  AUC
MLP, GNB, ADA           0.730        0.012    0.859
GNB, ADA, GBC           0.747        0.012    0.867
ADA, GBC, RFC           0.782        0.001    0.890
MLP, ADA, RFC           0.782        0.001    0.890
MLP, GNB, RFC           0.744        0.002    0.871
MLP, GBC, RFC           0.699        0.002    0.848
GNB, ADA, RFC           0.794        0.013    0.890
MLP, ADA, GBC           0.635        0.000    0.817
MLP, GNB, GBC           0.631        0.007    0.812
Table 5: Ensemble Algorithms Performance by Quadruples
Algorithms by quadruples   Sensitivity  Fallout  AUC
GNB, ADA, GBC, RFC         0.807        0.016    0.895
MLP, ADA, GBC, RFC         0.797        0.005    0.896
MLP, GNB, GBC, RFC         0.800        0.005    0.897
MLP, GNB, ADA, RFC         0.807        0.016    0.895
MLP, GNB, ADA, GBC         0.768        0.013    0.878

Table 6: Ensemble All Algorithms Performance
Algorithms                 Sensitivity  Fallout  AUC
MLP, GNB, ADA, GBC, RFC    0.769        0.002    0.884
6. Experimental Environment
This section provides details on the experimental environment, on the dataset and the adopted strategy, and on the metrics used to evaluate the classification performance.
6.1. Technological Environment
The development environment used to implement the approach presented in this paper is based on the Python language: the scikit-learn⁶ libraries have been used to implement the state-of-the-art algorithms. In order to ensure the reproducibility of the experiments we have carried out, the seed of the pseudo-random number generator used by the scikit-learn classification algorithms has been set to 1.
6.2. Dataset
The real-world dataset⁷ used for the experiments contains a series of transactions related to European cardholders and executed in two days of 2013. As shown in Table 7, this dataset presents a high degree of data imbalance [47], since only 492 out of 284,807 transactions are classified as fraudulent (i.e., about 0.17%). All the information in the dataset has been anonymized, except the fields related to the time and the amount, which contain, respectively, the seconds elapsed since the first transaction in the dataset and the amount of the underlying transaction.
Table 7: Dataset Details
Transactions  Legitimate  Fraudulent  Features  Classes
|T|           |T+|        |T−|        |F|       |C|
284,807       284,315     492         30        2
⁶ http://scikit-learn.org
⁷ https://www.kaggle.com/mlg-ulb/creditcardfraud
6.3. Strategy
In order to respect the transaction chronology, instead of a canonical k-fold cross-validation criterion we used the TimeSeriesSplit scikit-learn function to perform a time series cross-validation. This function allows us to split our dataset into a series of training and test sets while respecting the chronology of the transactions. For the experiments we used the TimeSeriesSplit function with n_splits=10.
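The idea behind a time series cross-validation can be sketched in plain Python as follows. This is only an illustrative stand-in, not the scikit-learn implementation used in the experiments: the helper name `expanding_window_splits` and the toy sample count are assumptions introduced here. Each test fold comes strictly after its training data, and the training window grows from one split to the next.

```python
def expanding_window_splits(n_samples, n_splits):
    """Yield (train_indices, test_indices) pairs in chronological order.

    Hypothetical helper: every test fold follows its training data in time,
    and the training window grows from one split to the next.
    """
    if n_splits < 1 or n_samples < n_splits + 1:
        raise ValueError("need at least n_splits + 1 samples")
    fold_size = n_samples // (n_splits + 1)
    first_test_start = n_samples - n_splits * fold_size
    for test_start in range(first_test_start, n_samples, fold_size):
        train = list(range(0, test_start))
        test = list(range(test_start, test_start + fold_size))
        yield train, test


# Toy example: 22 chronologically ordered transactions, 10 splits.
splits = list(expanding_window_splits(n_samples=22, n_splits=10))
for train, test in splits[:3]:
    print(len(train), test)
```

Unlike a shuffled k-fold split, no training index is ever later than a test index, so a model is never evaluated on transactions that precede the ones it was trained on.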
The data imbalance problem, previously described in Section 2.3, has deliberately not been addressed during the experiments. As suggested in [48], we have preferred to evaluate the effectiveness of our approach without any kind of data preprocessing (e.g., under-sampling or over-sampling balancing process), because in some cases under-sampling can remove important samples, whereas over-sampling can lead to overfitting and increase the computational load when the dataset is already fairly large.
The statistical significance of the differences between the obtained results has been verified by using independent-samples two-tailed Student's t-tests (p < 0.05).
6.4. Metrics
According to the considerations made in Section 2.5, the performance of the involved algorithms has been evaluated by using three metrics: the Sensitivity, the Fallout, and the AUC (i.e., Area Under the ROC Curve). As mentioned before, these metrics have been chosen because they provide information about the performance in terms of fraudulent transactions correctly classified (Sensitivity and Fallout), a crucial indicator in the context taken into account, and in terms of effectiveness of the adopted evaluation model (AUC).
In order to evaluate the algorithm performance also in terms of correct and incorrect classification of the legitimate transactions, we took into account two more metrics, which provide complementary information with respect to the Fallout and the Sensitivity: the Specificity (equal to 1 − Fallout) and the Miss Rate (equal to 1 − Sensitivity).
The formulation of all the aforementioned metrics is presented below:
6.4.1. Sensitivity
The Sensitivity is calculated as reported in Equation 4, where $\hat{T}$ is the set of new transactions to classify, $TP$ is the number of transactions correctly classified as fraudulent, and $FN$ is the number of fraudulent transactions erroneously classified as legitimate.
$$\mathrm{Sensitivity}(\hat{T}) = \frac{TP}{TP + FN} \tag{4}$$
6.4.2. Fallout
The Fallout is calculated as reported in Equation 5, where $\hat{T}$ is the set of new transactions to classify, $FP$ is the number of legitimate transactions erroneously classified as fraudulent, and $TN$ is the number of transactions correctly classified as legitimate.
$$\mathrm{Fallout}(\hat{T}) = \frac{FP}{FP + TN} \tag{5}$$
6.4.3. AUC
The AUC is calculated as reported in Equation 6, where, given the subset $I_+$ of the past legitimate transactions (i.e., $T_+$) and the subset $I_-$ of the past fraudulent transactions (i.e., $T_-$), $\Psi$ indicates all the possible comparisons between the elements of these two subsets. Its result is given by averaging all the comparisons and lies within the interval $[0, 1]$, where 1 denotes the best performance.
$$\Psi(i_+, i_-) = \begin{cases} 1, & \text{if } i_+ > i_- \\ 0.5, & \text{if } i_+ = i_- \\ 0, & \text{if } i_+ < i_- \end{cases} \qquad \mathrm{AUC} = \frac{1}{|I_+| \cdot |I_-|} \sum_{1}^{|I_+|} \sum_{1}^{|I_-|} \Psi(i_+, i_-) \tag{6}$$
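Equation 6 can be read as an average over all pairwise comparisons, as the following minimal sketch shows. It assumes that the compared values are numeric scores assigned by the evaluation model to the elements of the two subsets; the function names and the toy scores are hypothetical and are not taken from the experiments.

```python
def psi(i_plus, i_minus):
    """Pairwise comparison of Equation 6: 1, 0.5, or 0."""
    if i_plus > i_minus:
        return 1.0
    if i_plus == i_minus:
        return 0.5
    return 0.0


def pairwise_auc(plus_scores, minus_scores):
    """Average psi over every (i+, i-) pair, as in Equation 6."""
    if not plus_scores or not minus_scores:
        raise ValueError("both subsets must be non-empty")
    total = sum(psi(p, m) for p in plus_scores for m in minus_scores)
    return total / (len(plus_scores) * len(minus_scores))


# Toy scores (assumed values, for illustration only).
plus_scores = [0.9, 0.8, 0.4]
minus_scores = [0.7, 0.4]
auc = pairwise_auc(plus_scores, minus_scores)
print(auc)  # 4.5 favourable comparisons out of 6 -> 0.75
```

The double loop costs |I+| · |I−| comparisons, which is fine for a sketch but would be slow on a dataset of this size; rank-based formulations give the same value more efficiently.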
6.4.4. Specificity
The Specificity is calculated as reported in Equation 7, where $\hat{T}$ is the set of new transactions to classify, $TN$ is the number of transactions correctly classified as legitimate, and $FP$ is the number of legitimate transactions erroneously classified as fraudulent.
$$\mathrm{Specificity}(\hat{T}) = \frac{TN}{TN + FP} \tag{7}$$
6.4.5. Miss Rate
The Miss Rate is calculated as reported in Equation 8, where $\hat{T}$ is the set of new transactions to classify, $FN$ is the number of fraudulent transactions erroneously classified as legitimate, and $TP$ is the number of transactions correctly classified as fraudulent.
$$\mathrm{Miss\ Rate}(\hat{T}) = \frac{FN}{FN + TP} \tag{8}$$
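As a quick illustration of Equations 4, 5, 7, and 8, the four rate metrics can be computed from the counts of a confusion matrix, taking the fraudulent class as the positive one. The function name and the counts below are made up for this sketch and do not come from the experiments.

```python
def confusion_metrics(tp, fn, fp, tn):
    """Return the rate metrics of Equations 4, 5, 7, and 8.

    tp: fraudulent transactions correctly classified as fraudulent
    fn: fraudulent transactions erroneously classified as legitimate
    fp: legitimate transactions erroneously classified as fraudulent
    tn: legitimate transactions correctly classified as legitimate
    Assumes both classes are present (tp + fn > 0 and fp + tn > 0).
    """
    return {
        "sensitivity": tp / (tp + fn),
        "fallout": fp / (fp + tn),
        "specificity": tn / (tn + fp),
        "miss_rate": fn / (fn + tp),
    }


# Made-up counts, for illustration only.
m = confusion_metrics(tp=80, fn=20, fp=5, tn=995)
print(m)
```

By construction, Sensitivity and Miss Rate sum to 1, and so do Fallout and Specificity, which is why the last two metrics add a complementary view rather than independent information.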
7. Results
This section reports the results of the performed experiments by comparing our
solution to single and multiple algorithms approaches. Discussions on the results are
also highlighted.
7.1. Single Algorithm
Figure 3 summarizes the comparison between our solution based on the PMC model and each single algorithm. Results indicate Sensitivity, Fallout, Specificity, Miss Rate, and AUC values. Please note that the Sensitivity, Fallout, and AUC values for the PMC model are the same shown in Table 5 for the best combination found (MLP, GNB, GBC, RFC). Note also that this combination does not include the Adaptive Boosting (ADA) algorithm, as indicated by the preliminary study discussed in Section 5.
As reported in Section 6.3, all the experiments have been performed according to a time series cross-validation criterion. After a thorough analysis of their results, shown in Figure 3, we can make the following observations:
Figure 3: Specificity, Miss Rate, AUC, Sensitivity, and Fallout (one bar chart per metric, one bar per algorithm; the plotted values are listed below)
Algorithm  Sensitivity  Fallout  AUC   Specificity  Miss Rate
MLP        0.15         0        0.57  0.99         0.85
GNB        0.71         0.012    0.85  0.98         0.29
ADA        0.61         0        0.81  0.99         0.38
GBC        0.51         0        0.75  0.99         0.49
RFC        0.65         0        0.83  0.99         0.34
PMC        0.80         0.005    0.90  0.99         0.19
• as the fraudulent transactions in the dataset were 492, our approach correctly detected 394 of them (Sensitivity = 0.800), with only 2 misclassifications (Fallout = 0.005);
• compared to the best single approach (i.e., GNB), which correctly detected 349 fraudulent transactions, with 6 misclassifications, our approach had a gain of 9.1% (i.e., 45 more of the 492 fraudulent transactions);
• our solution was able to outperform the other algorithms in terms of the AUC metric;
• the improvement of our approach is further confirmed in terms of Specificity and Miss Rate, as they show that the obtained gain in terms of Sensitivity does not depend on a mere increase of the fraudulent classifications made by our evaluation model.
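The counts quoted above follow from the Sensitivity values and the number of fraudulent transactions in the dataset. The short check below reproduces that arithmetic; the helper name is hypothetical, and the Sensitivity values are those reported in Table 3 and Table 5.

```python
N_FRAUD = 492  # fraudulent transactions in the dataset (Table 7)


def detected(sensitivity, n_fraud=N_FRAUD):
    """Number of frauds detected, rounded to the nearest transaction."""
    return round(sensitivity * n_fraud)


pmc_sensitivity = 0.800  # PMC with MLP, GNB, GBC, RFC (Table 5)
gnb_sensitivity = 0.709  # native GNB (Table 3)

pmc_detected = detected(pmc_sensitivity)
gnb_detected = detected(gnb_sensitivity)
# Gain expressed in percentage points of Sensitivity.
gain_points = round((pmc_sensitivity - gnb_sensitivity) * 100, 1)

print(pmc_detected, gnb_detected, gain_points)
```

The gain is therefore a difference between two detection rates measured on the same 492 frauds, not a relative improvement over the GNB count.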
7.2. Multiple Algorithms
Here, we report the results of the last sets of experiments, which were aimed at comparing the performance of the proposed approach, based on the PMC model, with that of the canonical state-of-the-art models used to manage the ensemble strategy of classification, and with that of a state-of-the-art solution that adopts a Bagging method based on the Decision Tree algorithm [35].
7.2.1. Model Comparison
This set of experiments is aimed at comparing our strategy based on the PMC model with other strategies, namely the complete agreement between the algorithms, the majority voting (i.e., classification based on the majority of the classifications made by the algorithms), and the weighted voting (i.e., classification based on the weight of the classifications made by the algorithms, in terms of class probability). The experiments have been conducted by using the four chosen algorithms presented in Section 5 (i.e., the same algorithms used in our approach) and all the algorithms. The results reported in Table 8 and Table 9 indicate that our PMC approach outperforms the other ensemble approaches, which use different decision strategies. Please note that the Sensitivity, Fallout, AUC, Specificity, and Miss Rate values for the PMC model have been obtained by using the best combination found in Table 5 (MLP, GNB, GBC, RFC).
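The three baseline strategies can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in the experiments: complete agreement is taken to flag a fraud only when every algorithm does, ties in majority voting are resolved as legitimate, and weighted voting averages the predicted fraud probabilities with a 0.5 threshold. All names and toy values are hypothetical. The PMC rule itself is defined in Section 4.1 and is not reproduced here.

```python
FRAUD, LEGIT = 1, 0


def complete_agreement(labels):
    """Fraud only if every algorithm classifies the sample as fraud (assumption)."""
    return FRAUD if all(label == FRAUD for label in labels) else LEGIT


def majority_voting(labels):
    """Fraud if more than half of the algorithms vote fraud (ties -> legitimate)."""
    fraud_votes = sum(1 for label in labels if label == FRAUD)
    return FRAUD if fraud_votes > len(labels) / 2 else LEGIT


def weighted_voting(fraud_probs):
    """Fraud if the mean predicted fraud probability exceeds 0.5 (assumption)."""
    return FRAUD if sum(fraud_probs) / len(fraud_probs) > 0.5 else LEGIT


# Toy fraud probabilities from four algorithms for one transaction.
fraud_probs = [0.9, 0.6, 0.4, 0.2]
labels = [FRAUD if p > 0.5 else LEGIT for p in fraud_probs]

print(complete_agreement(labels))    # 0: the algorithms do not all agree
print(majority_voting(labels))       # 0: two votes out of four is a tie
print(weighted_voting(fraud_probs))  # 1: the mean probability is 0.525
```

The toy transaction shows how the strategies can disagree: the two label-based rules discard the confidence of each vote, while weighted voting lets two confident fraud predictions outweigh two legitimate ones.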
Table 8: Ensemble Strategies Comparison (Four Algorithms)
Strategy             Sensitivity  Fallout  AUC   Specificity  Miss Rate
Complete agreement   0.08         0.000    0.54  0.99         0.91
Majority voting      0.68         0.000    0.84  0.99         0.31
Weighted voting      0.55         0.000    0.77  0.99         0.44
PMC                  0.80         0.005    0.89  0.99         0.19
Table 9: Ensemble Strategies Comparison (All Algorithms)
Strategy             Sensitivity  Fallout  AUC   Specificity  Miss Rate
Complete agreement   0.06         0.000    0.53  1.00         0.93
Majority voting      0.63         0.000    0.81  0.99         0.36
Weighted voting      0.63         0.000    0.81  0.99         0.36
PMC                  0.80         0.005    0.89  0.99         0.19
7.2.2. Algorithm Comparison
The last set of experiments has been performed in order to compare our approach with a state-of-the-art approach employing a Bagging method based on the Decision Tree algorithm [35] (denoted as BDT). According to the experimental criteria formalized in [35], our dataset has been divided into four parts by following the same percentage criterion (i.e., P1 = 21.27%, P2 = 27.19%, P3 = 39.02%, P4 = 12.52%), as shown in Table 10. The results of the experiments are reported in Table 11, which contains both the performance in terms of Sensitivity, Fallout, and AUC calculated on each part of the dataset (Table 10), and the average values calculated over all the dataset parts.
Table 10: Dataset Composition
Dataset part  Legitimate  Fraudulent  Total    Fraud rate (fraction)
P1            60,415      163         60,578   0.0026
P2            77,338      101         77,439   0.0013
P3            110,942     189         111,131  0.0017
P4            35,620      39          35,659   0.0010
Total         284,315     492         284,807
Table 11: Ensemble Algorithm Comparison (Bagging and Decision Tree)
Approach  Dataset  Sensitivity  Fallout  AUC
BDT       P1       0.75         0.010    0.81
BDT       P2       0.70         0.010    0.80
BDT       P3       0.65         0.019    0.79
BDT       P4       0.71         0.019    0.78
BDT       Average  0.70         0.014    0.79
PMC       P1       0.85         0.005    0.88
PMC       P2       0.80         0.005    0.89
PMC       P3       0.83         0.004    0.87
PMC       P4       0.77         0.006    0.80
PMC       Average  0.81         0.005    0.86
7.3. Discussion
The results highlighted in Section 7 indicate that the proposed approach based on our PMC model is able to improve the performance of a fraud detection system in terms of number of fraudulent transactions correctly classified.
This achievement is related to the values of Sensitivity (0.800) and Fallout (0.005): the former indicates the capability of our approach to correctly classify 9.1% more of the fraudulent transactions than the best competitor algorithm (GNB, which has a Sensitivity value of 0.71), whereas the latter shows that this gain comes with a low rate of legitimate transactions classified as fraudulent.
The results in terms of the AUC metric underline the effectiveness of our evaluation model, showing its ability to classify new transactions as legitimate or fraudulent.
The evaluation in terms of Specificity and Miss Rate confirms the above results, showing that the increase of correctly classified fraudulent transactions implies a more robust and precise model.
The experiments aimed at comparing our method with other combined approaches show the effectiveness of our evaluation model (compared to the state-of-the-art criteria based on complete agreement, majority voting, and weighted voting) in two different configurations, i.e., by using the four algorithms selected in Section 5 and by using all the algorithms.
The last set of experiments, where the performance of our approach has been compared to that of a well-performing state-of-the-art approach that uses a Bagging method based on the Decision Tree algorithm, also shows that our PMC model outperforms its competitor, since it obtains the best average performance in terms of Sensitivity, Fallout, and AUC.
Summarizing, we have proved that in real-world scenarios, characterized by a high
degree of data imbalance, the proposed PMC model can significantly improve a fraud
detection system, reducing the losses related to the misclassification of the fraudulent
events.
Note that the rationale of the PMC method, and the reason why it works well in the proposed domain, is that legitimate transactions are far more numerous and usually share a similar pattern that is easy to recognize. During the classification, several algorithms are thus able to assess with higher precision whether a transaction is legitimate. On the other hand, when a sample is fraudulent, most of the algorithms return a lower probability (confidence value) for their classification (either legitimate or fraudulent), and such a sample is likely to be fraudulent. We have modeled this behaviour in our proposed PMC algorithm, and this is why we obtain such high performance.
8. Conclusions and Future Work
In our era of big data, data intelligence and data security are very important research topics, and present constant challenges for academia and industry. The rapid evolution of the E-commerce platforms is an example of the increasing number of financial transactions made by electronic instruments of payment such as credit cards. Malicious people try to steal sensitive information from these transactions, creating huge risks for the entire ecosystem. This is why Fraud Detection Systems, especially those oriented to discover credit card frauds, are becoming more and more important.
The approach proposed in this paper, based on a novel Prudential Multiple Consensus model, addresses this problem and its associated risks with the aim of identifying fraudulent transactions with higher precision than several state-of-the-art classification approaches. Our ensemble approach is able to reduce some well-known problems that affect this kind of classification task, first of all the issue related to data imbalance, improving the classification performance in terms of number of frauds correctly detected.
All the experiments have been conducted on a real-world dataset characterized by a high degree of data imbalance, and the performance of our approach has been compared to that of several state-of-the-art solutions, both single and combined strategies, proving its effectiveness in terms of Sensitivity and AUC.
Future work will evaluate the proposed approach in other scenarios also characterized by a high degree of data imbalance, and experiment with new aggregation strategies based on Artificial Neural Networks.
Acknowledgments
This research is partially funded by: Regione Sardegna under project Next generation Open Mobile Apps Development (NOMAD), Pacchetti Integrati di Agevolazione (PIA) - Industria Artigianato e Servizi - Annualità 2013; Italian Ministry of Education, University and Research - Program Smart Cities and Communities and Social Innovation project ILEARNTV (D.D. n.1937 del 05.06.2014, CUP F74G14000200008 F19G14000910008); Sardinia Regional Government (Convenzione triennale tra la Fondazione di Sardegna e gli Atenei Sardi Regione Sardegna L.R. 7/2007 annualità 2016 DGR 28/21 del 17.05.2016, CUP: F72F16003030002).
References
[1] A. Abdallah, M. A. Maarof, A. Zainal, Fraud detection system: A survey, J. Network and Computer Applications 68 (2016) 90–113.
[2] N. Japkowicz, S. Stephen, The class imbalance problem: A systematic study, Intell. Data Anal. 6 (5) (2002) 429–449.
[3] H. He, E. A. Garcia, Learning from imbalanced data, IEEE Trans. Knowl. Data Eng. 21 (9) (2009) 1263–1284. doi:10.1109/TKDE.2008.239.
[4] I. Brown, C. Mues, An experimental comparison of classification algorithms for imbalanced credit scoring data sets, Expert Syst. Appl. 39 (3) (2012) 3446–3453. doi:10.1016/j.eswa.2011.09.033.
[5] R. C. Holte, L. Acker, B. W. Porter, Concept learning and the problem of small disjuncts, in: N. S. Sridharan (Ed.), Proceedings of the 11th International Joint Conference on Artificial Intelligence. Detroit, MI, USA, August 1989, Morgan Kaufmann, 1989, pp. 813–818.
[6] M. Lek, B. Anandarajah, N. Cerpa, R. Jamieson, Data mining prototype for detecting e-commerce fraud, in: S. Smithson, J. Gricar, M. Podlogar, S. Avgerinou (Eds.), Proceedings of the 9th European Conference on Information Systems, Global Co-operation in the New Millennium, ECIS 2001, Bled, Slovenia, June 27-29, 2001, 2001, pp. 160–165.
[7] N. Carneiro, G. Figueira, M. Costa, A data mining based system for credit-card fraud detection in e-tail, Decision Support Systems 95 (2017) 91–101.
[8] A. J. Hoffman, R. E. Tessendorf, Artificial intelligence based fraud agent to identify supply chain irregularities, in: M. H. Hamza (Ed.), IASTED International Conference on Artificial Intelligence and Applications, part of the 23rd Multi-Conference on Applied Informatics, Innsbruck, Austria, February 14-16, 2005, IASTED/ACTA Press, 2005, pp. 743–750.
[9] D. G. Whiting, J. V. Hansen, J. B. McDonald, C. C. Albrecht, W. S. Albrecht, Machine learning methods for detecting patterns of management fraud, Computational Intelligence 28 (4) (2012) 505–527.
[10] I. Nolan, Transaction fraud detection using random forest classifier and logistic regression, Neural Networks & Machine Learning 1 (1) (2017) 2–2.
[11] C. Assis, A. M. Pereira, M. de Arruda Pereira, E. G. Carrano, Using genetic programming to detect fraud in electronic transactions, in: C. V. S. Prazeres, P. N. M. Sampaio, A. Santanchè, C. A. S. Santos, R. Goularte (Eds.), A Comprehensive Survey of Data Mining-based Fraud Detection Research, Vol. abs/1009.6119, 2010, pp. 337–340.
[12] A. Mead, T. Lewris, S. Prasanth, S. Adams, P. Alonzi, P. Beling, Detecting fraud in adversarial environments: A reinforcement learning approach, in: Systems and Information Engineering Design Symposium (SIEDS), 2018, IEEE, 2018, pp. 118–122.
[13] W. Wang, D. Lu, X. Zhou, B. Zhang, J. Mu, Statistical wavelet-based anomaly detection in big data with compressive sensing, EURASIP J. Wireless Comm. and Networking 2013 (2013) 269.
[14] R. Saia, A discrete wavelet transform approach to fraud detection, in: NSS, Vol. 10394 of Lecture Notes in Computer Science, Springer, 2017, pp. 464–474.
[15] R. Saia, S. Carta, Evaluating credit card transactions in the frequency domain for a proactive fraud detection approach, in: SECRYPT, SciTePress, 2017, pp. 335–342.
[16] R. Saia, S. Carta, A frequency-domain-based pattern mining for credit card fraud detection, in: IoTBDS, SciTePress, 2017, pp. 386–391.
[17] S. Y. Huang, C. Lin, A. Chiu, D. C. Yen, Fraud detection using fraud triangle risk factors, Information Systems Frontiers 19 (6) (2017) 1343–1356.
[18] R. Saia, Unbalanced data classification in fraud detection by introducing a multidimensional space analysis, in: IoTBDS, SciTePress, 2018, pp. 29–40.
[19] R. J. Bolton, D. J. Hand, Statistical fraud detection: A review, Statistical Science (2002) 235–249.
[20] C. Phua, V. C. S. Lee, K. Smith-Miles, R. W. Gayler, A comprehensive survey of data mining-based fraud detection research, CoRR abs/1009.6119.
[21] A. D. Pozzolo, O. Caelen, Y. L. Borgne, S. Waterschoot, G. Bontempi, Learned lessons in credit card fraud detection from a practitioner perspective, Expert Syst. Appl. 41 (10) (2014) 4915–4928. doi:10.1016/j.eswa.2014.02.026.
[22] H. Wang, W. Fan, P. S. Yu, J. Han, Mining concept-drifting data streams using ensemble classifiers, in: L. Getoor, T. E. Senator, P. M. Domingos, C. Faloutsos (Eds.), Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, August 24 - 27, 2003, ACM, 2003, pp. 226–235. doi:10.1145/956750.956778.
[23] J. Gao, W. Fan, J. Han, P. S. Yu, A general framework for mining concept-drifting data streams with skewed distributions, in: Proceedings of the Seventh SIAM International Conference on Data Mining, April 26-28, 2007, Minneapolis, Minnesota, USA, SIAM, 2007, pp. 3–14. doi:10.1137/1.9781611972771.1.
[24] E. L. Barse, H. Kvarnström, E. Jonsson, Synthesizing test data for fraud detection systems, in: ACSAC, IEEE Computer Society, 2003, pp. 384–394.
[25] A. Chatterjee, A. Segev, Data manipulation in heterogeneous databases, ACM SIGMOD Record 20 (4) (1991) 64–68.
[26] S. Sorournejad, Z. Zojaji, R. E. Atani, A. H. Monadjemi, A survey of credit card fraud detection techniques: Data and technique oriented perspective, CoRR abs/1611.06439.
[27] J. Attenberg, F. J. Provost, Inactive learning?: difficulties employing active learning in practice, SIGKDD Explorations 12 (2) (2010) 36–41. doi:10.1145/1964897.1964906.
URL http://doi.acm.org/10.1145/1964897.1964906
[28] M. Zareapoor, J. Yang, A novel strategy for mining highly imbalanced data in credit card transactions, Intelligent Automation & Soft Computing (2017) 1–7.
[29] V. Vinciotti, D. J. Hand, Scorecard construction with unbalanced class sizes, Journal of Iranian Statistical Society 2 (2) (2003) 189–205.
[30] H. M. Gomes, J. P. Barddal, F. Enembreck, A. Bifet, A survey on ensemble learning for data stream classification, ACM Comput. Surv. 50 (2) (2017) 23:1–23:36.
[31] T. G. Dietterich, Ensemble methods in machine learning, in: Multiple Classifier Systems, Vol. 1857 of Lecture Notes in Computer Science, Springer, 2000, pp. 1–15.
[32] S. Akila, U. S. Reddy, Risk based bagged ensemble (rbe) for credit card fraud detection, in: Inventive Computing and Informatics (ICICI), International Conference on, IEEE, 2017, pp. 670–674.
[33] H. Guo, Y. Li, J. Shang, G. Mingyun, H. Yuanyue, G. Bing, Learning from class-imbalanced data: Review of methods and applications, Expert Syst. Appl. 73 (2017) 220–239.
[34] M. Faghani, M. J. Nordin, S. Shojaeipour, Optimization of the performance face recognition using adaboost-based, in: R. Chen (Ed.), Intelligent Computing and Information Science, Springer Berlin Heidelberg, Berlin, Heidelberg, 2011, pp. 359–365.
[35] M. Zareapoor, P. Shamsolmoali, Application of credit card fraud detection: Based on bagging ensemble classifier, Procedia Computer Science 48 (2015) 679–685.
[36] D. Wu, Y. Liu, G. Gao, Z. Mao, W. Ma, T. He, An adaptive ensemble classifier for concept drifting stream, in: Computational Intelligence and Data Mining, 2009. CIDM'09. IEEE Symposium on, IEEE, 2009, pp. 69–75.
[37] Y. Bian, M. Cheng, C. Yang, Y. Yuan, Q. Li, J. L. Zhao, L. Liang, Financial fraud detection: a new ensemble learning approach for imbalanced data, in: PACIS, 2016, p. 315.
[38] Y. Freund, R. E. Schapire, A decision-theoretic generalization of on-line learning and an application to boosting, J. Comput. Syst. Sci. 55 (1) (1997) 119–139.
[39] A. Natekin, A. Knoll, Gradient boosting machines, a tutorial, Front. Neurorobot. 2013.
[40] A. O. Adewumi, A. A. Akinyelu, A survey of machine-learning and nature-inspired based credit card fraud detection techniques, International Journal of System Assurance Engineering and Management 8 (2) (2017) 937–953.
[41] D. M. Powers, Evaluation: from precision, recall and f-measure to roc, informedness, markedness and correlation.
[42] D. Faraggi, B. Reiser, Estimation of the area under the roc curve, Statistics in medicine 21 (20) (2002) 3093–3106.
[43] A. T. Sergio, T. P. F. de Lima, T. B. Ludermir, Dynamic selection of forecast combiners, Neurocomputing 218 (2016) 37–50.
[44] L. Breiman, Random forests, Machine Learning 45 (1) (2001) 5–32. doi:10.1023/A:1010933404324.
[45] S. Lessmann, B. Baesens, H. Seow, L. C. Thomas, Benchmarking state-of-the-art classification algorithms for credit scoring: An update of research, European Journal of Operational Research 247 (1) (2015) 124–136. doi:10.1016/j.ejor.2015.05.030.
[46] S. Bhattacharyya, S. Jha, K. K. Tharakunnel, J. C. Westland, Data mining for credit card fraud: A comparative study, Decision Support Systems 50 (3) (2011) 602–613. doi:10.1016/j.dss.2010.08.008.
URL http://dx.doi.org/10.1016/j.dss.2010.08.008
[47] A. D. Pozzolo, O. Caelen, R. A. Johnson, G. Bontempi, Calibrating probability with undersampling for unbalanced classification, in: IEEE Symposium Series on Computational Intelligence, SSCI 2015, Cape Town, South Africa, December 7-10, 2015, IEEE, 2015, pp. 159–166. doi:10.1109/SSCI.2015.33.
URL http://dx.doi.org/10.1109/SSCI.2015.33
[48] N. V. Chawla, N. Japkowicz, A. Kotcz, Special issue on learning from imbalanced data sets, ACM Sigkdd Explorations Newsletter 6 (1) (2004) 1–6.
... In [55], a detailed survey is performed on the supplementation of fraud detection systems in many other industries. In [56], fraud detection model using prudential multiple consensus model is developed, for detecting fraudulent transactions in E-commerce industry. The model is validated on real world dataset which contains high degree of imbalance data and results of validation shows that the ensemble model outperforms the other state of the art models. ...
... Web spam classification also requires resampling methods for data preprocessing [10]. Fraud detection related to bank transactions, credit cards, insurance companies, etc. [11], software fault prediction [12] and profile injection attacks in recommender systems [13] are other significant application domains of machine learning methods where data are imbalanced. ...
Preprint
Full-text available
In many application domains such as medicine, information retrieval, cybersecurity, social media, etc., datasets used for inducing classification models often have an unequal distribution of the instances of each class. This situation, known as imbalanced data classification, causes low predictive performance for the minority class examples. Thus, the prediction model is unreliable although the overall model accuracy can be acceptable. Oversampling and undersampling techniques are well-known strategies to deal with this problem by balancing the number of examples of each class. However, their effectiveness depends on several factors mainly related to data intrinsic characteristics, such as imbalance ratio, dataset size and dimensionality, overlapping between classes or borderline examples. In this work, the impact of these factors is analyzed through a comprehensive comparative study involving 40 datasets from different application areas. The objective is to obtain models for automatic selection of the best resampling strategy for any dataset based on its characteristics. These models allow us to check several factors simultaneously considering a wide range of values since they are induced from very varied datasets that cover a broad spectrum of conditions. This differs from most studies that focus on the individual analysis of the characteristics or cover a small range of values. In addition, the study encompasses both basic and advanced resampling strategies that are evaluated by means of eight different performance metrics, including new measures specifically designed for imbalanced data classification. The general nature of the proposal allows the choice of the most appropriate method regardless of the domain, avoiding the search for special purpose techniques that could be valid for the target data.
... The prediction result can have highly skew performance, insufficient and less accurate outcomes due to imbalanced data. [37] Imbalanced data classes can have different two approaches as shown in figure 2, this includes [38]: (1) Binary-Classes where the dataset has only two classes, one positive and one negative [39]. (2) Multi-Classes that has more than two classes [39]. ...
... This approach increases the prediction performance of the ML algorithm used for classification [31]. To overcome data imbalance in big data problem, in [39] the authors have proposed a data intelligence based multiple consensus model, where probabilistic and majority voting of classification algorithms is combined for in large real dataset. ...
Article
Full-text available
Due to fast-evolving technology, the world is moving to the use of credit cards rather than money in their daily lives, giving rise to many new opportunities for fraudsters to use credit cards maliciously. Based on the Nilson report, losses related to global cards were estimated to be over $35 billion by 2020. In order to maintain the security of users of these cards, the credit card company must develop a service to ensure that users are protected from any risks they may be exposed to. For this reason, we introduce a fraud detection model, denoted ST-BPNN, which is based on machine and deep learning approaches to identify fraudulent transactions. ST-BPNN was applied on real fraud detection data provided by the European bank. Comparing the obtained results from ST-BPNN with a recent state-of-the-art approach shows that our proposed model demonstrates high predictive performance for detecting fraudulent transactions.
Article
This study aims to present an effective algorithm for identifying fraudulent banking card transactions. The proposed method uses missing-value replacement, scale-free graphs, and an ensemble of a large number of graphs that vote to predict whether a transaction is authorized or suspicious. The purpose of this method is to develop a graph-based system for detecting fraudulent activities in the banking industry. This research improves the evaluation criteria and eliminates the weaknesses of individual methods and other group methods. This is achieved by combining group capability and group decision-making with the free scaling of scale-free graphs. This was confirmed by conducting different experiments on two standard datasets and comparing the results with different studies and individual methods. The unique features of the proposed model include estimating the generalization error during implementation, requiring no assessment methods or test stage, and estimating the importance of each variable in the problem during the algorithm runtime. To evaluate this algorithm, various parameters that are used in many data mining methods have been employed. The proposed algorithm is assessed against the decision tree algorithm, support vector machine, neural network, and ensemble neural network. On both datasets, the evaluation criteria, including accuracy, sensitivity, and F-measure, increased by approximately 20% with the proposed method compared to the four above-mentioned methods.
Article
Phishing has become a chronic problem because it exploits the end user, seen as the weakest link. Through social engineering, the attacker relies on human carelessness to intercept sensitive data. At the same time, richly detailed phishing pages are harder for most anti-phishing mechanisms to mitigate, since these mechanisms are built to classify malicious pages that lack visual and textual detail. This study presents a rule-based approach, called piracema.io, for phishing prediction. Compared with other solutions proposed in the literature, the authors argue that their model, which is driven by page reputation, predicts more efficiently as phishing pages become richer in detail. Based on the results obtained with logistic regression, the study identified static and dynamic features, considering their relevance, relationship, and similarity. As a proof of concept, the study uses a statistical approach to evaluate the prediction model across the gradual depth and acting strategies adopted in the proposal. Finally, the study discusses the quantitative and qualitative data obtained, presenting contributions, threats, and limitations, as well as perspectives for future work to continue and improve the model in its current state.
Chapter
Chadli, Hajar; Bikrat, Youssef; Chadli, Sara; Saber, Mohammed; Fakir, Amine; Tahani, Abdelwahed
Green energy is currently experiencing massive growth worldwide, driven by newer energy sources such as wind, hydro, tidal, geothermal, and biomass energy and, of course, solar energy, which is considered the second biggest source of electricity worldwide, including in Morocco. Producing electricity in these plants requires optimization at the different conversion levels. To obtain electricity that meets the standards of the electrical grid (a sine wave with a frequency of 50 Hz), the inverter remains the first element to design and build. Structures based on multi-level inverters have brought an undeniable advantage to DC-AC conversion, especially in high-power applications. In this article, a new 7-level inverter architecture with only six switches is presented and compared with the other seven-level inverter topologies. To improve the performance of the proposed multilevel inverter, we used a digital sinusoidal Pulse Width Modulation (SPWM) strategy implemented on an Arduino, which leads to a further reduction of THD. The inverter was tested using Proteus software and the Matlab Simulink simulator for harmonic analysis. A real-time implementation of the inverter was then tested with a resistive load.
Conference Paper
The problem of fraud is becoming increasingly important in this E-commerce age, where an enormous number of financial transactions are carried out using electronic payment instruments such as credit cards. Since human-driven solutions are impossible given the huge number of operations involved, the only way to face this kind of problem is to adopt automatic approaches able to distinguish legitimate transactions from fraudulent ones. For this reason, the development of techniques capable of carrying out this task efficiently is today a very active research field that involves a large number of researchers around the world. Unfortunately, this is not an easy task, since the definition of effective fraud detection approaches is made difficult by a series of well-known problems, the most important being the imbalanced class distribution of the data, which significantly reduces the performance of machine learning approaches. This limitation is addressed by the approach proposed in this paper, which exploits three different similarity metrics in order to define a three-dimensional evaluation space. Its main objective is a better characterization of financial transactions in terms of the two possible target classes (legitimate or fraudulent), addressing the information asymmetry that gives rise to the problem described above. A series of experiments conducted on real-world data of different sizes and imbalance levels demonstrates the effectiveness of the proposed approach compared with state-of-the-art solutions.
Presentation
The massive increase in financial transactions made in the e-commerce field has led to an equally massive increase in the risks related to fraudulent activities. It is a problem directly correlated with the use of credit cards, considering that almost all the operators that offer goods or services in the e-commerce space allow their customers to use them for making payments. The main disadvantage of these powerful methods of payment is that they can be used not only by legitimate users (cardholders) but also by fraudsters. The literature reports a considerable number of techniques designed to face this problem, although their effectiveness is jeopardized by a series of common problems, such as the imbalanced distribution and the heterogeneity of the data involved. The approach presented in this paper takes advantage of a novel evaluation criterion based on the analysis, in the frequency domain, of the spectral pattern of the data. This strategy yields a more stable model for representing information than the canonical ones, reducing the problems of both imbalance and heterogeneity of data. Experiments show that the performance of the proposed approach is comparable to that of its state-of-the-art competitor, even though the model definition does not use any previous fraudulent case, adopting a proactive strategy able to counter the well-known cold-start issue.
Presentation
Nowadays, the prevention of credit card fraud is a crucial task, since almost all operators in the E-commerce environment accept payments made through credit cards, aware that some of them could be fraudulent. Developing approaches able to face this problem effectively is a hard challenge for several reasons. The most important among them are the heterogeneity and the imbalanced class distribution of the data, which reduce the effectiveness of the most widely used techniques and make it difficult to define effective models for evaluating new transactions. This paper proposes a new strategy to face these problems, based on a model defined by using the Discrete Fourier Transform in order to exploit frequency patterns, instead of the canonical ones, in the evaluation process. This approach has some advantages: it addresses the imbalanced class distribution and the cold-start issue by involving only past legitimate transactions, and it reduces the data heterogeneity problem thanks to the frequency-domain data representation, which is less influenced by data variation. A practical implementation of the proposed approach is given by presenting an algorithm able to classify a new transaction as reliable or unreliable on the basis of this strategy.
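A minimal sketch of the frequency-domain idea described in this abstract is given below. It is not the authors' algorithm: representing each transaction as a short list of numeric features, comparing magnitude spectra with cosine similarity, averaging the spectra of past legitimate transactions, and the 0.9 threshold are all assumptions made for illustration, as are the function names.

```python
import cmath
import math


def dft_magnitudes(values):
    """Magnitude spectrum of a sequence via the direct O(n^2) DFT definition."""
    n = len(values)
    return [
        abs(sum(v * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, v in enumerate(values)))
        for k in range(n)
    ]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def is_reliable(new_tx, legit_history, threshold=0.9):
    """Compare a new transaction's spectrum with the mean legitimate spectrum.

    Only past legitimate transactions are used, so no fraud examples are needed.
    The threshold value is an illustrative assumption.
    """
    spectra = [dft_magnitudes(tx) for tx in legit_history]
    mean_spectrum = [sum(col) / len(spectra) for col in zip(*spectra)]
    return cosine_similarity(dft_magnitudes(new_tx), mean_spectrum) >= threshold
```

Because the model is built only from legitimate history, it does not depend on the ratio of fraudulent to legitimate examples, which is the property the abstract highlights.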
Article
Ensemble-based methods are among the most widely used techniques for data stream classification. Their popularity is attributable to their good performance in comparison to strong single learners while being relatively easy to deploy in real-world applications. Ensemble algorithms are especially useful for data stream learning as they can be integrated with drift detection algorithms and incorporate dynamic updates, such as selective removal or addition of classifiers. This work proposes a taxonomy for data stream ensemble learning as derived from reviewing over 60 algorithms. Important aspects such as combination, diversity, and dynamic updates, are thoroughly discussed. Additional contributions include a listing of popular open-source tools and a discussion about current data stream research challenges and how they relate to ensemble learning (big data streams, concept evolution, feature drifts, temporal dependencies, and others).
Article
Credit-card fraud leads to billions of dollars in losses for online merchants. With the development of machine learning algorithms, researchers have been finding increasingly sophisticated ways to detect fraud, but practical implementations are rarely reported. We describe the development and deployment of a fraud detection system in a large e-tail merchant. The paper explores the combination of manual and automatic classification, gives insights into the complete development process and compares different machine learning methods. The paper can thus help researchers and practitioners to design and implement data mining based systems for fraud detection or similar problems. This project has contributed not only with an automatic system, but also with insights to the fraud analysts for improving their manual revision process, which resulted in an overall superior performance.
Article
The design of an efficient credit card fraud detection technique is particularly challenging because of the two most striking characteristics of the data: class imbalance and a non-stationary environment. These issues in credit card datasets limit the ability of machine learning algorithms to detect fraud well. Research in the area of credit card fraud detection has focused on detecting fraudulent transactions by analyzing the concepts of normality and abnormality. The balancing strategy designed in this paper can facilitate classification and retrieval problems in this domain. We consider the classification problem in a supervised learning scenario by creating a contrast vector for each customer based on the customer's historical behavior. The performance of the proposed model is evaluated on a real credit card dataset provided by FICO, and the proposed model is found to perform significantly better than other state-of-the-art classifiers.
Article
Credit cards are one of the most popular modes of payment for electronic transactions in many developed and developing countries. The invention of credit cards has made online transactions seamless, easier, more comfortable, and more convenient. However, it has also provided new fraud opportunities for criminals and, in turn, increased the fraud rate. The global impact of credit card fraud is alarming: millions of US dollars have been lost by many companies and individuals. Furthermore, cybercriminals devise sophisticated new techniques on a regular basis, so there is an urgent need to develop improved and dynamic techniques capable of adapting to rapidly evolving fraudulent patterns. Achieving this is very challenging, primarily because of the dynamic nature of fraud and the lack of datasets available to researchers. This paper presents a review of improved credit card fraud detection techniques. Specifically, it focuses on recent machine-learning-based and nature-inspired credit card fraud detection techniques proposed in the literature. The paper provides a picture of recent trends in credit card fraud detection. Moreover, the review outlines some limitations and contributions of existing credit card fraud detection techniques and provides the necessary background information for researchers in this domain. Additionally, the review serves as a guide and stepping stone for financial institutions and individuals seeking new and effective credit card fraud detection techniques.