A Discrete Wavelet Transform Approach
to Fraud Detection
Roberto Saia
Department of Mathematics and Computer Science
University of Cagliari, Via Ospedale 72 - 09124 Cagliari, Italy
roberto.saia@unica.it
Abstract. The exponential growth in the number of operations carried out in the e-commerce environment is directly related to the growth in the number of operations performed through credit cards, since practically all commercial operators allow their customers to pay with them. This scenario leads to a high level of risk related to the potential fraudulent activities that fraudsters can perform by exploiting this powerful instrument of payment illegitimately. A large number of state-of-the-art approaches have been designed to address this problem, but they must face some common issues, the most important of which are the imbalanced distribution and the heterogeneity of data. This paper presents a novel fraud detection approach based on the Discrete Wavelet Transform, which is exploited to define an evaluation model able to address the aforementioned problems. This objective is achieved by using only legitimate transactions in the model definition process, an operation made possible by the more stable data representation offered by the new domain. The performed experiments show that the performance of our approach is comparable to that of one of the best state-of-the-art approaches, random forests, demonstrating how such a proactive strategy is also able to face the cold-start problem.
Keywords: Business intelligence · Fraud detection · Pattern mining · Wavelet
1 Introduction
A study performed by the Association of Certified Fraud Examiners (ACFE)¹ shows that credit card frauds (i.e., purchases made without authorization, or counterfeits of credit cards) account for 10-15% of all fraud cases, but for a financial value close to 75-80% of the total. In the USA alone, such frauds lead to an estimated average loss of 2 million dollars per fraud case, and for this reason recent years have seen an increase in the researchers' efforts to define effective fraud detection techniques. The literature presents several state-of-the-art techniques for this task, but all of them have to face some common problems, e.g., the imbalanced distribution of data and the heterogeneity of the information that composes a transaction. This scenario is worsened by the scarcity of information that usually characterizes a transaction, a problem that leads to an overlapping of the classes of expense.

¹ http://www.acfe.com
The core idea of the proposed approach is the adoption of a new evaluation model based on the data obtained by processing the transactions through a Discrete Wavelet Transform (DWT) [1]. Considering that this process involves only the previous legitimate transactions, it operates proactively by facing the cold-start issue (i.e., scarcity or absence of fraudulent examples during the model definition), also reducing the problems related to data heterogeneity, since the new model is less influenced by data variations.
The scientific contributions of this paper are as follows:
(i) definition of the time series to use as input to the DWT process, in terms of the sequence of values assumed by the features of a credit card transaction;
(ii) formalization of the process aimed at comparing the DWT output of a new transaction with those of the previous legitimate ones;
(iii) classification of the new transactions as legitimate or fraudulent through an algorithm based on the previous comparison process.
The paper is organized as follows: Section 2 introduces the background and related work of the fraud detection scenario; Section 3 reports the formal notation adopted in this paper and defines the problem we face; Section 4 gives all the details about our approach; Section 5 describes the experimental environment, the used datasets and metrics, the adopted strategy, and the competitor approach, ending with the presentation of the experimental results; finally, Section 6 provides some concluding remarks and future work.
2 Background and Related Work
Fraud Detection Techniques: the strategy adopted by fraud detection systems can be of two types: supervised or unsupervised [2]. By following the supervised strategy, a system uses the previous fraudulent and non-fraudulent transactions to define its evaluation model. This strategy needs a set of examples related to both classes of transactions, and its effectiveness is usually restricted to the recognition of patterns present in the training set. By following the unsupervised strategy, the system analyzes the new transactions with the aim of detecting anomalous values in their features, where by anomaly we mean a value outside the range of values assumed by the feature in the set of previous legitimate cases.
The static approach [3] represents the most common way to operate in order to detect fraudulent credit card transactions. Following this approach, the data stream is divided into blocks of equal size and the model is trained by using only a limited number of initial and contiguous blocks. Differently from the static approach, the updating approach [4] updates its model at each new block, performing this activity by using a certain number of the latest and contiguous blocks. A forgetting approach [5] can also be followed; in this case the model is updated when a new block appears, performing this operation by using all the previous fraudulent transactions, but only the legitimate transactions present in the last two blocks. The models defined on the basis of these approaches can be used individually, or they can be aggregated in order to define a larger evaluation model. Some of the problems related to the aforementioned approaches are the inability to model the users' behavior (static approach), the inability to manage small classes of data (updating approach), and the computational complexity (forgetting approach), in addition to the common issues described in the following.
Open Problems: a series of problems, reported below, makes the work of researchers operating in this field harder.
(i) Lack of public real-world datasets: this happens for several reasons, the first of them being the restrictive policies adopted by commercial operators, aimed at not revealing information about their business for privacy, competition, or legal reasons [6].
(ii) Non-adaptability: caused by the inability of the evaluation models to correctly classify new transactions whose patterns differ from those used during the model training [7].
(iii) Data heterogeneity: this problem is related to the incompatibility between similar features, resulting in the same data being represented differently in different datasets [8].
(iv) Unbalanced distribution of data: this is certainly the most important issue [9]; it happens because the information available to train the evaluation models is usually composed of a large number of legitimate cases and a small number of fraudulent ones, resulting in a data configuration that reduces the effectiveness of the classification approaches.
(v) Cold-start: another problem is related to those scenarios where the data used for the evaluation model training does not contain enough information on the domain taken into account, leading to the definition of unreliable models [10]. Basically, this happens when the data available for the model training does not contain representative examples of all classes of information.
Proposed Approach: the core idea of this work is to move the evaluation process from the canonical domain to a new one by exploiting the Discrete Wavelet Transform (DWT) [11]. In more detail, we use the DWT process in a time series data mining context, where a time series usually refers to a sequence of values acquired by measuring the variation over time of a specific data type (e.g., temperature, amplitude, etc.).
The DWT process transforms a time series by exploiting a set of functions named wavelets [12], and in the literature it is usually performed in order to reduce the data size or the data noise (e.g., in image compression and filtering tasks). The time-scale multiresolution offered by the DWT allows us to observe the original time series from different points of view, each of them containing interesting information on the original data. The capability, in the new domain, of observing the data at multiple scales (multiple resolution levels) allows our approach to define a more stable and representative model of the transactions, compared to the canonical state-of-the-art approaches.
In our approach, we define a time series as the sequence of values assumed by the features of a credit card transaction; frequency represents the number of occurrences of a value in a time series over a unit of time; and by scale we refer to the time interval that characterizes a time series.
Formally, a Continuous Wavelet Transform (CWT) is defined as shown in Equation 1, where ψ(t) represents a continuous function in both the time and frequency domains (called the mother wavelet) and ∗ denotes the complex conjugate.

X_w(a, b) = \frac{1}{|a|^{1/2}} \int_{-\infty}^{+\infty} x(t)\, \psi^{*}\!\left(\frac{t - b}{a}\right) dt    (1)
Given the impossibility of analyzing the data by using all the wavelet coefficients, it is usually acceptable to consider a discrete subset of the upper half-plane that still allows the data to be reconstructed from the corresponding wavelet coefficients. The considered discrete subset of the half-plane is the set of all points (a^m, n a^m b), with m, n ∈ Z, which allows us to define the so-called child wavelets as shown in Equation 2.

\psi_{m,n}(t) = \frac{1}{\sqrt{a^m}}\, \psi\!\left(\frac{t - n b\, a^m}{a^m}\right)    (2)
The use of small scales (corresponding to high frequencies, since the scale is given by 1/frequency) compresses the data, giving us an overview of the involved information, while large scales (i.e., low frequencies) expand the data, offering a detailed analysis of the information. Given the characteristics of the wavelet transform, although it is possible to use many basis functions as the mother wavelet (e.g., Daubechies, Meyer, Symlets, Coiflets, etc.), for the scope of our approach we decided to use one of the simplest and oldest wavelet formalizations, the Haar wavelet [13]. It is shown in Equation 3, and it allows us to measure the contrast directly from the responses of the low- and high-frequency sub-bands.

\psi(t) = \begin{cases} 1, & 0 \le t < \frac{1}{2} \\ -1, & \frac{1}{2} \le t < 1 \\ 0, & \text{otherwise} \end{cases}    (3)
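To make the role of Equation 3 concrete, a single Haar decomposition level maps a series of even length to pairwise scaled averages (the low-frequency sub-band) and pairwise scaled differences (the high-frequency sub-band). The following minimal Java sketch illustrates this step; it is only illustrative, since our implementation relies on the JWave library (see Section 5.1).

// One Haar decomposition level: each pair of samples yields a scaled
// average (approximation, low-frequency sub-band) and a scaled
// difference (detail, high-frequency sub-band). Even-length input assumed.
static double[][] haarStep(double[] x) {
    int half = x.length / 2;
    double[] approx = new double[half];
    double[] detail = new double[half];
    double s = Math.sqrt(2.0); // orthonormal scaling factor
    for (int i = 0; i < half; i++) {
        approx[i] = (x[2 * i] + x[2 * i + 1]) / s;
        detail[i] = (x[2 * i] - x[2 * i + 1]) / s;
    }
    return new double[][] { approx, detail };
}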
Competitor Approach: considering that the most effective fraud detection approaches in the literature need both fraudulent and legitimate examples to train their model, we have chosen not to compare our approach to many of them, limiting the comparison to one of the most used and effective ones, Random Forests [14]. Our intention is to demonstrate the capability of the proposed approach to define an effective evaluation model by using a single class of transactions, overcoming some well-known issues.
Random Forests represents one of the most effective state-of-the-art approaches, since in most of the cases reported in the literature it outperforms the other ones in this particular field [15, 16]. It is an ensemble learning method for classification and regression based on the construction of a number of randomized decision trees during the training phase, with the classification inferred by averaging the obtained results.
3 Notation and Problem Definition
Given a set of classified transactions T = {t_1, t_2, ..., t_N}, and a set of features V = {v_1, v_2, ..., v_M} that compose each t ∈ T, we denote as T+ = {t_1, t_2, ..., t_K} the subset of legitimate transactions (hence T+ ⊆ T), and as T− = {t_1, t_2, ..., t_J} the subset of fraudulent ones (hence T− ⊆ T). We also denote as T̂ = {t̂_1, t̂_2, ..., t̂_U} a set of unevaluated transactions. It should be observed that a transaction can belong to only one class c ∈ C, where C = {legitimate, fraudulent}. Finally, we denote as F = {f_1, f_2, ..., f_X} the output of the DWT process.
Denoting as Ξ the process of comparison between the DWT output of the time series in the set T+ (i.e., the sequences of feature values in the previous legitimate transactions) and the DWT output of the time series related to the unevaluated transactions in the set T̂ (processed one at a time), the objective of our approach is the classification of each transaction t̂ ∈ T̂ as legitimate or fraudulent. Defining a function Evaluation(t̂, Ξ) that performs this operation based on our approach, returning a boolean value β (0 = misclassification, 1 = correct classification) for each classification, we can formalize our objective function (Equation 4) in terms of maximization of the sum of the results.

\max \beta, \quad \text{with} \quad \beta = \sum_{u=1}^{|\hat{T}|} Evaluation(\hat{t}_u, \Xi), \quad 0 \le \beta \le |\hat{T}|    (4)
4 Proposed Approach
Step 1 of 3 - Data Definition: a time series is a series of events acquired during a certain period of time, where each of these events is characterized by a value. The set composed of all the acquisitions refers to a single variable, since it contains data of the same type. In our approach we consider as time series (ts) the sequences of values assumed by the features v ∈ V in the sets T+ (previous legitimate transactions) and T̂ (unevaluated transactions), as shown in Equation 5.
T^+ = \begin{pmatrix} v_{1,1} & v_{1,2} & \cdots & v_{1,M} \\ v_{2,1} & v_{2,2} & \cdots & v_{2,M} \\ \vdots & \vdots & \ddots & \vdots \\ v_{K,1} & v_{K,2} & \cdots & v_{K,M} \end{pmatrix} \qquad \hat{T} = \begin{pmatrix} v_{1,1} & v_{1,2} & \cdots & v_{1,M} \\ v_{2,1} & v_{2,2} & \cdots & v_{2,M} \\ \vdots & \vdots & \ddots & \vdots \\ v_{U,1} & v_{U,2} & \cdots & v_{U,M} \end{pmatrix}

ts(T^+) = (v_{1,1}, v_{1,2}, \ldots, v_{1,M}), (v_{2,1}, v_{2,2}, \ldots, v_{2,M}), \cdots, (v_{K,1}, v_{K,2}, \ldots, v_{K,M})
ts(\hat{T}) = (v_{1,1}, v_{1,2}, \ldots, v_{1,M}), (v_{2,1}, v_{2,2}, \ldots, v_{2,M}), \cdots, (v_{U,1}, v_{U,2}, \ldots, v_{U,M})    (5)
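In code, Equation 5 simply reads each transaction as the ordered vector of its M feature values; since the Haar decomposition repeatedly halves the series, a common convention (not detailed in the paper, and assumed here) is to zero-pad the series to the next power of two. A minimal Java sketch:

// The time series of a transaction is the ordered sequence of its M
// feature values (Equation 5), zero-padded to a power of two so that
// the Haar decomposition can halve it at each level (our assumption;
// categorical features are supposed to be numerically encoded already).
static double[] getTimeseries(double[] features) {
    int n = 1;
    while (n < features.length) n <<= 1; // next power of two
    double[] ts = new double[n];
    System.arraycopy(features, 0, ts, 0, features.length);
    return ts;
}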
Step 2 of 3 - Data Processing: the time series previously defined are here used as input to the DWT process. Without going deeply into the formal properties of the wavelet transform, we want to exploit the following two:
(i) Dimensionality reduction: the DWT process can reduce the time series data, since its orthonormal transformation reduces their dimensionality, providing a compact representation that preserves the original information in its coefficients. By exploiting this property, a fraud detection system can reduce the computational complexity of the involved processes;
(ii) Multiresolution analysis: the DWT process allows us to define separate time series on the basis of the original one, distributing the information among them in terms of wavelet coefficients. The orthonormal transformation carried out by the DWT preserves the original information, allowing us to return to the original data representation. A fraud detection system can exploit this property in order to detect rapid changes in the data under analysis, observing the data series from two different points of view, one approximated and one detailed. The first provides an overview of the data, while the second provides useful information for evaluating data changes.
Our approach exploits both the aforementioned properties, transforming the time series through the Haar wavelet process. The approximation coefficients at level N/2 were preferred to more detailed ones in order to define a more stable evaluation model, less influenced by data heterogeneity.
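A sketch of this processing step, reusing the haarStep function shown earlier: the single-level decomposition is iterated, keeping only the approximation branch, until the desired depth is reached. The exact depth corresponding to level N/2 depends on the series length, so the levels parameter below is our assumption.

// Repeated Haar decomposition: keep only the low-frequency
// (approximation) branch at each level, yielding the compact and
// stable representation used by the evaluation model.
static double[] getDWT(double[] ts, int levels) {
    double[] current = ts;
    for (int l = 0; l < levels && current.length >= 2; l++) {
        current = haarStep(current)[0]; // keep the approximation sub-band
    }
    return current;
}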
Step 3 of 3 - Data Classification: a new transaction t̂ ∈ T̂ is evaluated by comparing the output of the DWT process applied to each time series extracted from the set T+ (previous legitimate transactions) with the output of the same process applied to the time series of the transaction t̂ to evaluate.
The comparison is performed in terms of cosine similarity between the output vectors (i.e., values in the set F), as shown in Equation 6, where ∆ is the similarity, α is an experimentally defined threshold, and c is the resulting classification. We repeat this process for each transaction t ∈ T+, evaluating the classification of the transaction t̂ on the basis of the average of all the comparisons.
\Delta = Cosim(F(t), F(\hat{t})), \quad \text{with} \quad c = \begin{cases} legitimate, & \Delta \ge \alpha \\ fraudulent, & \Delta < \alpha \end{cases}    (6)
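The similarity ∆ of Equation 6 can be computed as in the following minimal sketch, assuming the two DWT outputs have the same length and are non-zero (as required by the cosine similarity, see Section 5.3):

// Cosine similarity between two DWT output vectors (Equations 6 and 7).
// Non-zero vectors of equal length are assumed.
static double getCosineSimilarity(double[] f1, double[] f2) {
    double dot = 0, n1 = 0, n2 = 0;
    for (int i = 0; i < f1.length; i++) {
        dot += f1[i] * f2[i];
        n1 += f1[i] * f1[i];
        n2 += f2[i] * f2[i];
    }
    return dot / (Math.sqrt(n1) * Math.sqrt(n2));
}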
Algorithm 1 takes as input the past legitimate transactions in T+, the transaction t̂ to evaluate, and the threshold α, and returns as output a boolean value that indicates the classification of t̂ (i.e., true = legitimate or false = fraudulent).
Algorithm 1 Transaction evaluation
Require: T+ = previous legitimate transactions, t̂ = unevaluated transaction, α = threshold
Ensure: β = classification of the transaction t̂
1: procedure transactionEvaluation(T+, t̂, α)
2:   ts1 ← getTimeseries(t̂)
3:   sp1 ← getDWT(ts1)
4:   for each t in T+ do
5:     ts2 ← getTimeseries(t)
6:     sp2 ← getDWT(ts2)
7:     cos ← cos + getCosineSimilarity(sp1, sp2)
8:   end for
9:   avg ← cos / |T+|
10:  if avg > α then β ← true else β ← false
11:  return β
12: end procedure
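Combining the previous sketches, Algorithm 1 can be rendered in Java roughly as follows; the decomposition depth levels is a hypothetical parameter, and transactions are represented as plain feature arrays rather than by the actual data structures of our implementation.

// Sketch of Algorithm 1: average the cosine similarity between the DWT
// output of the unevaluated transaction and those of all past legitimate
// transactions; classify as legitimate (true) when the average exceeds alpha.
static boolean transactionEvaluation(double[][] tPlus, double[] tHat,
                                     double alpha, int levels) {
    double[] sp1 = getDWT(getTimeseries(tHat), levels);
    double cos = 0;
    for (double[] t : tPlus) {
        double[] sp2 = getDWT(getTimeseries(t), levels);
        cos += getCosineSimilarity(sp1, sp2);
    }
    double avg = cos / tPlus.length;
    return avg > alpha; // true = legitimate, false = fraudulent
}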
5 Experiments
5.1 Environment
The proposed approach was developed in Java, using the JWave² library for the Discrete Wavelet Transform. The competitor approach (i.e., Random Forests) and the metrics used for its evaluation have been implemented in R³, using the randomForest, DMwR, and ROCR packages. For reproducibility reasons, the R function set.seed() has been used, and the Random Forests parameters were tuned by finding those that maximize the performance. Statistical differences between the results were assessed by independent-samples two-tailed Student's t-tests (p < 0.05).

² https://github.com/cscheiblich/JWave/
³ https://www.r-project.org/
5.2 Dataset
The public real-world dataset used for the evaluation of the proposed approach is related to a series of credit card transactions made by European cardholders⁴ over two days of September 2013, for a total of 492 frauds out of 284,807 transactions. It is a highly unbalanced dataset [17], since the fraudulent cases are only 0.172% of all the transactions.
For confidentiality reasons, all dataset fields have been made public in anonymized form, except the time, amount, and classification ones.

⁴ https://www.kaggle.com/dalpozz/creditcardfraud
5.3 Metrics
Cosine Similarity: it measures the similarity (Cosim) between two non-zero vectors v1 and v2 in terms of the cosine of the angle between them, as shown in Equation 7. It allows us to evaluate the similarity between the vectors of values returned by the DWT processes.

Cosim(\mathbf{v}_1, \mathbf{v}_2) = \cos(\mathbf{v}_1, \mathbf{v}_2) = \frac{\mathbf{v}_1 \cdot \mathbf{v}_2}{\|\mathbf{v}_1\| \cdot \|\mathbf{v}_2\|}    (7)
F-score: it represents the harmonic mean of the Precision and Recall metrics, a largely used metric in the statistical analysis of binary classification that returns a value in the range [0, 1], where 0 is the worst value and 1 the best one. More formally, given two sets T(P) and T(R), where T(P) denotes the set of performed classifications of transactions and T(R) the set containing their actual classifications, it is defined as shown in Equation 8.

F\text{-}score(T^{(P)}, T^{(R)}) = 2 \cdot \frac{Precision \cdot Recall}{Precision + Recall}, \quad \text{with} \quad Precision(T^{(P)}, T^{(R)}) = \frac{|T^{(R)} \cap T^{(P)}|}{|T^{(P)}|}, \quad Recall(T^{(P)}, T^{(R)}) = \frac{|T^{(R)} \cap T^{(P)}|}{|T^{(R)}|}    (8)
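For the positive class, Equation 8 reduces to the usual counts of true positives, false positives, and false negatives; a minimal sketch, assuming boolean labels with true marking the fraudulent class:

// F-score from predicted and actual boolean labels (Equation 8);
// degenerate cases (no predicted or no actual positives) are not handled.
static double fScore(boolean[] predicted, boolean[] actual) {
    int tp = 0, fp = 0, fn = 0;
    for (int i = 0; i < predicted.length; i++) {
        if (predicted[i] && actual[i]) tp++;
        else if (predicted[i] && !actual[i]) fp++;
        else if (!predicted[i] && actual[i]) fn++;
    }
    double precision = tp / (double) (tp + fp);
    double recall = tp / (double) (tp + fn);
    return 2 * precision * recall / (precision + recall);
}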
AUC: the Area Under the Receiver Operating Characteristic curve (AUC) is a performance measure used to evaluate the predictive power of a classification model. Its result lies in the range [0, 1], where 1 indicates the best performance. More formally, given the subsets of previous legitimate transactions T+ and previous fraudulent ones T−, its formalization is reported in Equation 9, where Θ indicates all possible comparisons between the transactions of the two subsets T+ and T−. The result is obtained by averaging over these comparisons.

\Theta(t^+, t^-) = \begin{cases} 1, & \text{if } t^+ > t^- \\ 0.5, & \text{if } t^+ = t^- \\ 0, & \text{if } t^+ < t^- \end{cases}, \qquad AUC = \frac{1}{|T^+| \cdot |T^-|} \sum_{1}^{|T^+|} \sum_{1}^{|T^-|} \Theta(t^+, t^-)    (9)
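Equation 9 is the Mann-Whitney formulation of the AUC; the sketch below applies it to the scores produced by a model (e.g., the average cosine similarity of our approach) for the two subsets of transactions:

// AUC over all legitimate/fraudulent pairs (Equation 9): a pair scores 1
// when the legitimate transaction receives the higher score, 0.5 on ties.
static double auc(double[] legitScores, double[] fraudScores) {
    double sum = 0;
    for (double p : legitScores)
        for (double n : fraudScores)
            sum += (p > n) ? 1.0 : (p == n ? 0.5 : 0.0);
    return sum / (legitScores.length * (double) fraudScores.length);
}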
5.4 Strategy
Cross-validation: in order to improve the reliability of the obtained results and reduce the impact of data dependency, the experiments followed a k-fold cross-validation criterion, with k = 10: the dataset is divided into k subsets, each subset is used in turn as the test set while the remaining k−1 subsets are used as the training set, and the final result is given by the average of all k results.
Threshold Tuning: according to Algorithm 1, we need to define the optimal value of the α parameter, since the classification process depends on it (Equation 6). It is the average value of the cosine similarity calculated between all the pairs of legitimate transactions in the set T+ (α = 0.91 in our case).
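A sketch of this tuning step, where dwtLegit holds the DWT outputs of the transactions in T+ and α is their mean pairwise cosine similarity (0.91 on this dataset):

// Alpha is the average cosine similarity over all distinct pairs of
// past legitimate transactions, computed on their DWT outputs.
static double tuneAlpha(double[][] dwtLegit) {
    double sum = 0;
    long pairs = 0;
    for (int i = 0; i < dwtLegit.length; i++) {
        for (int j = i + 1; j < dwtLegit.length; j++) {
            sum += getCosineSimilarity(dwtLegit[i], dwtLegit[j]);
            pairs++;
        }
    }
    return sum / pairs;
}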
5.5 Competitor
The state-of-the-art approach chosen as our competitor is Random Forests. It was implemented in the R language by using the randomForest and DMwR packages. The DMwR package was used to face the class imbalance problem through the Synthetic Minority Over-sampling Technique (SMOTE) [18], a popular sampling method that creates new synthetic data by randomly interpolating pairs of nearest neighbors.
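The interpolation at the heart of SMOTE can be sketched as follows; the experiments rely on the R DMwR implementation, so this Java fragment is only illustrative, with the nearest-neighbor search omitted:

// SMOTE interpolation step: a synthetic minority sample is placed at a
// random point on the segment between a minority sample and one of its
// k nearest minority-class neighbors (neighbor search not shown).
static double[] smoteSample(double[] x, double[] neighbor,
                            java.util.Random rnd) {
    double[] synthetic = new double[x.length];
    double gap = rnd.nextDouble(); // random position along the segment
    for (int i = 0; i < x.length; i++)
        synthetic[i] = x[i] + gap * (neighbor[i] - x[i]);
    return synthetic;
}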
5.6 Results
Analyzing the experimental results, we can make the following observations:
(i) the first set of experiments, whose results are shown in Figure 1.a, was focused on the evaluation of our approach (denoted as WT) in terms of F-score. We can observe how it achieves performance close to that of its competitor Random Forests, despite the adoption of a proactive strategy (i.e., not using previous fraudulent transactions during the model training), demonstrating its ability to define an effective model by exploiting only one class of transactions (i.e., the legitimate one);
(ii) the second set of experiments, whose results are shown in Figure 1.b, was instead aimed at evaluating the performance of our approach in terms of AUC. This metric measures the predictive power of a classification model, and the results indicate that our approach, also in this case, offers performance levels close to those of its competitor RF, while not using previous fraudulent cases to define its model;

[Fig. 1. F-score and AUC performance. (a) F-score: RF = 0.95, WT = 0.92; (b) AUC: RF = 0.98, WT = 0.78.]
(iii) summarizing all the results, the first consideration that arises is related to the capability of our approach to face the data imbalance and cold-start problems by adopting a proactive strategy that only needs one transaction class for the model definition. The last but not least important consideration is that such proactivity allows a fraud detection system to operate without needing previous examples of fraudulent cases, with all the advantages that derive from this.
6 Conclusions and Future Work
Nowadays, credit cards represent an irreplaceable instrument of payment, and this scenario obviously leads to an increase in the related fraud cases, making it necessary to design effective fraud detection techniques.
Instead of aiming to outperform the existing state-of-the-art approaches, with this paper we want to demonstrate that, through a new data representation, it is possible to design a fraud detection system that operates without the need for previous fraudulent examples. The goal was to prove that our evaluation model, defined by using a single class of transactions, is able to offer a level of performance similar to that of one of the best state-of-the-art approaches based on a model defined by using all classes of transactions (i.e., Random Forests), overcoming some important issues such as data imbalance and cold-start.
We can consider the obtained results to be very interesting, given that our competitor, in addition to using both classes of transactions to train its model, adopts a data balancing mechanism (i.e., SMOTE).
In light of the aforementioned considerations, future work will focus on the definition of a hybrid fraud detection approach able to combine the advantages of the non-proactive state-of-the-art approaches with those of our proactive alternative.
Acknowledgments. This research is partially funded by Regione Sardegna under the project Next generation Open Mobile Apps Development (NOMAD), Pacchetti Integrati di Agevolazione (PIA) - Industria Artigianato e Servizi (2013).
References
1. Chaovalit, P., Gangopadhyay, A., Karabatis, G., Chen, Z.: Discrete wavelet
transform-based time series analysis and mining. ACM Comput. Surv. 43(2) (2011)
6:1–6:37
2. Bolton, R.J., Hand, D.J.: Statistical fraud detection: A review. Statistical Science
(2002) 235–249
3. Dal Pozzolo, A., Caelen, O., Le Borgne, Y., Waterschoot, S., Bontempi, G.: Learned lessons in credit card fraud detection from a practitioner perspective. Expert Syst. Appl. 41(10) (2014) 4915–4928
4. Wang, H., Fan, W., Yu, P.S., Han, J.: Mining concept-drifting data streams using
ensemble classifiers. In Getoor, L., Senator, T.E., Domingos, P.M., Faloutsos,
C., eds.: Proceedings of the Ninth ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining, Washington, DC, USA, August 24 - 27,
2003, ACM (2003) 226–235
5. Gao, J., Fan, W., Han, J., Yu, P.S.: A general framework for mining concept-
drifting data streams with skewed distributions. In: Proceedings of the Seventh
SIAM International Conference on Data Mining, April 26-28, 2007, Minneapolis,
Minnesota, USA, SIAM (2007) 3–14
6. Phua, C., Lee, V., Smith, K., Gayler, R.: A comprehensive survey of data mining-
based fraud detection research. (2010)
7. Sorournejad, S., Zojaji, Z., Atani, R.E., Monadjemi, A.H.: A survey of credit
card fraud detection techniques: Data and technique oriented perspective. CoRR
abs/1611.06439 (2016)
8. Chatterjee, A., Segev, A.: Data manipulation in heterogeneous databases. ACM
SIGMOD Record 20(4) (1991) 64–68
9. Japkowicz, N., Stephen, S.: The class imbalance problem: A systematic study.
Intell. Data Anal. 6(5) (2002) 429–449
10. Donmez, P., Carbonell, J.G., Bennett, P.N.: Dual strategy active learning. In:
ECML. Volume 4701 of Lecture Notes in Computer Science., Springer (2007) 116–
127
11. Chernick, M.R.: Wavelet methods for time series analysis. Technometrics 43(4)
(2001) 491
12. Percival, D.B., Walden, A.T.: Wavelet methods for time series analysis. Volume 4.
Cambridge university press (2006)
13. Mallat, S.: A theory for multiresolution signal decomposition: The wavelet representation. IEEE Trans. Pattern Anal. Mach. Intell. 11(7) (1989) 674–693
14. Breiman, L.: Random forests. Machine Learning 45(1) (2001) 5–32
15. Lessmann, S., Baesens, B., Seow, H., Thomas, L.C.: Benchmarking state-of-the-
art classification algorithms for credit scoring: An update of research. European
Journal of Operational Research 247(1) (2015) 124–136
16. Brown, I., Mues, C.: An experimental comparison of classification algorithms for
imbalanced credit scoring data sets. Expert Syst. Appl. 39(3) (2012) 3446–3453
17. Dal Pozzolo, A., Caelen, O., Johnson, R.A., Bontempi, G.: Calibrating probability
with undersampling for unbalanced classification. In: Computational Intelligence,
2015 IEEE Symposium Series on, IEEE (2015) 159–166
18. Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. Journal of Artificial Intelligence Research 16 (2002) 321–357