A Fourier Spectral Pattern Analysis to Design Credit Scoring Models
Roberto Saia and Salvatore Carta
Dipartimento di Matematica e Informatica - Università di Cagliari
Via Ospedale 72, 09124 Cagliari - Italy
The growth of consumer credit has made it necessary to research
increasingly effective credit scoring models. Such models are usually
trained on past loan applications and evaluate new ones on the basis of
certain criteria. Although the state of the art offers several approaches
for their definition, this process remains a hard challenge for several
reasons. The most important are the data imbalance between default and
non-default cases, which reduces the effectiveness of almost all
techniques, and the data heterogeneity, which makes it difficult to
define a model able to effectively evaluate all new loan applications.
The approach proposed in this paper addresses these problems by moving
the evaluation process from the canonical time domain to the frequency
domain, using a model based only on the past non-default loan
applications. This allows us to overcome the data imbalance problem by
exploiting a single class of data, and to define a model that is less
influenced by the data heterogeneity. The experiments show interesting
results: the proposed approach achieves performance close to or better
than that of random forests, one of the best state-of-the-art approaches
to credit scoring, although it operates proactively, exploiting only the
past non-default cases.
• Information systems → Data stream mining; Clustering and classification; Business intelligence; • Theory of computation → Pattern matching; • General and reference → Metrics;
Business intelligence, credit scoring, imbalanced datasets, classification, metrics
and on its result depends the acceptance or rejection of it.
It is clear that the effectiveness of such models is strongly related
to the gains and losses of the financial operators involved [24],
especially in recent years, characterized by an increasing use of
consumer credit, which is obviously correlated with an increase in
defaulted cases (i.e., loans that have been fully or partially not repaid).
An ideal credit scoring approach should be able to correctly classify
new instances into two classes (accepted or rejected) on
the basis of the information given by the past instances.
In other words, the credit scoring techniques can be considered
a set of statistical tools able to calculate the probability that an in-
stance leads toward a default case [26, 35], allowing the financial
operators to evaluate the credit risk [20] and to monitor the credit
activities [8].
The definition of effective credit scoring models represents a
hard challenge due to several problems, the most important of which
is the imbalance in the data used during the model training process [4].
These data sources are composed of a small number of default cases and
a large number of non-default ones, an unbalanced configuration that
reduces the effectiveness of almost all machine learning approaches [28].
The idea behind this paper is to move the process of evaluation of
the new instances from the canonical time domain to the frequency
one, performing the spectral analysis through the Fourier transfor-
mation [19]. It is performed by using the Fast Fourier Transform
(FFT) algorithm, which allows us to move a time series (i.e. in our
case, a sequence of discrete-time data represented by the feature
values of an instance) from its original time domain to a frequency
one, where we can study the data from a different point of view.
Considering that the model used to evaluate the new instances is
defined by using only one class of data (the non-default cases), such
an approach offers a threefold advantage: it allows us to operate
proactively, it addresses the cold-start issue (i.e., the scarcity or
absence of default cases), and it reduces the issues related to the data
heterogeneity, because the new representation of the data in the
frequency domain is less influenced by data variation. We compare our
approach to Random Forests, since in most cases reported in the
literature [5, 9, 32] it outperforms the other credit scoring approaches.
The main contributions of this paper are listed below:
(i) definition of the time series to be used in the Fast Fourier
Transform (FFT) process, made on the basis of the data that
compose each instance in the considered datasets;
(ii) formalization of the Fourier spectral analysis comparison pro-
cess, performed in terms of frequency magnitude difference
measured between the time series in the set of the past non-
default instances and that in an unevaluated one;
IML ’17, October 17-18, 2017, Liverpool, United Kingdom Roberto Saia and Salvatore Carta
(iii) definition of the Dynamic Feature Selection (DFS) process,
aimed at assigning a different weight to each frequency component
of the previously extracted instance spectrum, on the basis
of its relevance (in terms of entropy) in the evaluation process;
(iv) formalization of the Fourier Spectrum Pattern (FSP) algorithm,
able to classify the new instances as accepted or rejected
by exploiting the previous spectral analysis, the DFS
process, and the tolerance range ρ.
The remainder of the paper is organized as follows: Section 2
discusses the background and related work; Section 3 provides a
formal notation, makes some premises, and defines the faced prob-
lem; Section 4 describes the implementation of the proposed ap-
proach; Section 5 provides details on the experimental environment,
on the used datasets and metrics, as well as on the adopted strategy
and the chosen competitor, reporting the experimental results at the
end; some concluding remarks and future work are given in the last
Section 6.
Recent literature proposes numerous classification techniques able
to operate in the credit scoring context [18], as well as a large number
of studies aimed at evaluating their performance [32], also by
taking into account the optimal configuration of the involved
parameters [2] and the metrics to be used to evaluate the
performance [22].
The two most important advantages derived from the adoption
of credit scoring techniques [40] are the capability to infer when it
is reasonable (in terms of potential risks) to grant a loan to some-
one, and the capability to define models able to infer the customer
behavior, information that can be used to propose targeted financial
services. In the context of this paper we take into account only the
first one.
2.1 Credit Scoring Models
In order to perform a credit scoring process it is possible to exploit
many state-of-the-art techniques commonly used in the statistics and
data mining fields [1, 10]. Some significant examples are the linear
discriminant models [38], the logistic regression models [26],
the neural network models [6, 16], the genetic programming
models [11, 36], the k-nearest neighbor models [25], and the decision
tree models [14, 46].
It should be observed that in many cases these techniques can
be combined in order to define hybrid approaches of credit scoring.
Some examples are the techniques that exploit the neural networks
and the clustering methods, presented in [27], and the two-stage
hybrid modeling procedure with artificial neural networks and mul-
tivariate adaptive regression splines, proposed in [31, 45].
2.2 Imbalanced Class Distribution
One of the most important problems that makes it difficult to define
effective credit scoring models is the imbalanced
class distribution of data [23, 28]. This issue arises because
the data used to train the models are characterized by
a small number of default cases and a large number of non-default
ones, a distribution that limits the performance of the classification
techniques [9, 28].
This problem leads to misclassification costs, as reported
in [44], which proposes to preprocess the training data through an
over-sampling or under-sampling of the classes as a possible solution.
The effect of such preprocessing on the performance has
been studied in [12, 34].
2.3 Cold Start
The cold start issue [17, 47] arises when there is not enough infor-
mation to train a reliable model about a domain. In the context of
the credit scoring, such scenario appears when the data used to train
the model are not representative of all classes of data [3, 43] (i.e.,
default and non-default cases).
This kind of issue affects many areas, e.g., those related to the
recommender systems [21, 33, 42], since they are usually based on
models defined on the basis of the previous choices of the users
(user profiles), similarly to the credit scoring context, where the
past loan applications are taken into account.
In the approach proposed in this paper the cold start issue can be
reduced or overcome by using only one class of data during the model
definition process (i.e., only the non-default cases in the training set).
2.4 Random Forests
Since its formalization [7], Random Forests represents one of the
most common techniques for data analysis, thanks to its better per-
formance compared to the other state-of-the-art techniques.
This technique represents an ensemble learning method for clas-
sification and regression that is based on the construction of a num-
ber of randomized decision trees during the training phase and it
infers conclusions by averaging the results.
It is able to face a wide range of prediction problems, without
performing any complex configuration, since it only requires the
tuning of two parameters: the number of trees and the number of
attributes used to grow each tree.
2.5 Fourier Transform and Spectral Analysis
The basic idea behind the approach proposed in this paper is to
move the process of evaluation of the new instances (time series)
from their canonical time domain to the frequency one, in order to
obtain a representative pattern composed by their frequency compo-
nents, as shown in Figure 1.
This operation is performed by resorting to the Discrete Fourier
Transform (DFT), whose formalization is shown in Equation 1,
where i is the imaginary unit.
The result of Equation 1 is a set of sinusoidal functions, each
corresponding to a particular frequency component (i.e., the spectrum¹).
¹ The spectrum of the frequency components is the frequency domain representation of
a signal.
Figure 1: Time and Frequency Domains
If it is necessary, we can use the inverse Fourier transform shown
in Equation 2 to return to the original time domain.
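For reference, the standard textbook forms of the DFT and of its inverse are sketched below. The symbols $f_n$ (the $n$-th value of a time series of length $N$) and $F_k$ (the $k$-th frequency component) are assumptions of this sketch; Equations 1 and 2 of the paper may use a different notation or normalization.

```latex
F_k = \sum_{n=0}^{N-1} f_n \, e^{-2\pi i \, k n / N}, \qquad k = 0, 1, \dots, N-1

f_n = \frac{1}{N} \sum_{k=0}^{N-1} F_k \, e^{2\pi i \, k n / N}, \qquad n = 0, 1, \dots, N-1
```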
The Fast Fourier Transform (FFT) algorithm, used in the context
of this paper to perform the Fourier transformations, rapidly
computes the DFT, or its inverse (IDFT),
by factorizing the DFT matrix into a product of sparse (mostly zero)
factors. It is widely used because it reduces the computational
complexity of the process from O(n^2) to O(n log n), where n denotes the
data size.
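To make the quadratic cost of the direct computation concrete, the following minimal Python sketch evaluates the DFT and its inverse straight from their definitions, with two nested summations over the n samples. The function names `dft` and `idft` and the sample series are illustrative assumptions, not part of the paper's implementation.

```python
import cmath

def dft(samples):
    """Direct O(n^2) DFT: one full sum over the samples per frequency component."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def idft(spectrum):
    """Direct O(n^2) inverse DFT, normalized by 1/n."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]

# Hypothetical four-value time series.
series = [1.0, 2.0, 0.0, -1.0]
spectrum = dft(series)
recovered = idft(spectrum)  # equals the original series up to rounding
```

An FFT produces the same spectrum while reusing partial sums, which is where the O(n log n) bound comes from.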
Formal notation, premises, and problem statement related to this
paper are stated in the following:
3.1 Notation
Given a set of classified instances $I = \{i_1, i_2, \dots, i_N\}$, and a set of
features $V = \{v_1, v_2, \dots, v_M\}$ that compose each $i \in I$, we denote as
$I_+ \subseteq I$ the subset of non-default instances, and as $I_- \subseteq I$ the subset
of default ones.
We also denote as $\hat{I} = \{\hat{i}_1, \hat{i}_2, \dots, \hat{i}_U\}$ a set of unclassified
instances and as $O = \{o_1, o_2, \dots, o_U\}$ these instances after the
classification process, thus $|\hat{I}| = |O|$.
It should be observed that an instance can belong only to one
class $c \in C$, where $C = \{accepted, rejected\}$.
Finally, we denote as $F = \{f_1, f_2, \dots, f_X\}$ the frequency components
of each instance (spectrum), obtained at the end of the DFT process.
3.2 Premises
Periodic waves are characterized by a frequency $f$ and a wavelength
$\lambda$ (i.e., the distance in the medium between the beginning and
end of a cycle, $\lambda = \frac{w}{f}$, where $w$ stands for the wave
velocity), which are defined by the repeating pattern. The non-periodic
waves taken into account in the Discrete Fourier Transform process,
instead, do not have a frequency or a wavelength. Their fundamental
period $T$ is the period over which the wave values were taken, and $sr$
denotes the number of values acquired over this time (i.e., the
acquisition frequency).
Assuming that the time interval between the acquisitions is constant,
and applying the previous definitions to the context of this paper, the
considered non-periodic wave is given by the sequence of values
$v_1, v_2, \dots, v_M$, with $v \in V$, which compose each instance
$i \in I_+$ (i.e., the past non-default instances) and
$\hat{i} \in \hat{I}$ (i.e., the unevaluated instances), and which
represents the time series taken into account.
Its fundamental period $T$ starts with $v_1$ and ends with $v_M$,
thus we have that $sr = |V|$; the sample interval $si$ is instead given by
the fundamental period $T$ divided by the number of acquisitions, i.e.,
$si = \frac{T}{|V|}$.
Through the FFT algorithm we compute the Discrete Fourier
Transform of each time series $i \in I_+$ and $\hat{i} \in \hat{I}$, converting its
representation from the time domain to the frequency one. The obtained
frequency-domain representation provides information about
the signal's magnitude and phase at each frequency. For this reason,
the output (denoted as $x$) of the FFT computation is a series of complex
numbers composed of a real part $x_r$ and an imaginary part $x_i$,
thus $x = (x_r + i x_i)$.
We can obtain the magnitude of $x$ by using $|x| = \sqrt{x_r^2 + x_i^2}$ and
the phase of $x$ by using $\varphi(x) = \arctan\frac{x_i}{x_r}$, although in the context
of this paper we will take into account only the magnitude at each
frequency.
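As a small worked example of the two formulas above, the following Python sketch computes the magnitude and the phase of a single complex FFT output value. The function names are illustrative assumptions. The phase uses `math.atan2` instead of a plain arctangent of the ratio, a deliberate deviation that avoids a division by zero when the real part is zero and resolves the quadrant.

```python
import cmath
import math

def magnitude(x):
    """|x| = sqrt(x_r^2 + x_i^2) for a complex value x."""
    return math.hypot(x.real, x.imag)

def phase(x):
    """Phase of x in radians; atan2 handles x_r = 0 and picks the right quadrant."""
    return math.atan2(x.imag, x.real)

# Hypothetical FFT output value with x_r = 3 and x_i = 4.
x = complex(3.0, 4.0)
mag = magnitude(x)  # 5.0
ang = phase(x)      # same value as cmath.phase(x)
```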
3.3 Problem Statement
On the basis of the comparison of the spectral analysis $\lambda$ performed
by the FFT algorithm on the time series in $i \in I_+$ and $\hat{i} \in \hat{I}$, our FSP
approach classifies each instance $\hat{i} \in \hat{I}$ as accepted or rejected.
Given a function $eval(\hat{i}, \lambda)$ created to evaluate the correctness of
the classification of $\hat{i}$, which returns a boolean value $\sigma$ (0 = misclassification,
1 = correct classification), we formalize our objective as the
maximization of the sum of the results, as shown in Equation 3.
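One plausible way to write the objective described above is sketched below. It is an assumption built only from the textual definition (a sum of the boolean outcomes of $eval$ over the $|\hat{I}|$ unevaluated instances); the exact form of Equation 3 in the paper may differ.

```latex
\max_{0 \le \sigma \le |\hat{I}|} \; \sigma = \sum_{u=1}^{|\hat{I}|} eval(\hat{i}_u, \lambda)
```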
The implementation of our approach is carried out through the fol-
lowing steps:
(1) Time Series Definition (TSD): definition of the time series
to use in the FFT algorithm, in terms of the sequence of
instance feature values;
(2) Time Series Analysis (TSA): comparison of the Fourier
spectral patterns of two instances, performed by processing
their time series, defined in the previous step, through
the FFT algorithm;
(3) Time Series Dynamic Feature Selection (DFS): determination
of the weight of each frequency component in
the instance spectrum, on the basis of the Shannon entropy;
(4) Time Series Classification (TSC): formalization of the
FSP algorithm able to classify a new instance as accepted
or rejected, on the basis of the TSA comparison, the DFS
process, and the tolerance range ρ.
In the following, we first introduce the high-level architecture of
the proposed FSP approach and then provide a detailed description of
each of these steps.
Figure 2: FSP Architecture
The high-level description shown in Figure 2 briefly introduces
the processes involved in our approach, which are
explained in detail in the following.
According to the notation given in Section 3.1, $I_+$ and $\hat{I}$ denote,
respectively, the set of non-default instances and the set of instances
to evaluate, while the set $O$ denotes the instances in $\hat{I}$ after their
classification. We indicate with $T(I_+)$ and $T(\hat{I})$ the time series related,
respectively, to the instances in $I_+$ and $\hat{I}$, and with $F(I_+)$ and $F(\hat{I})$
the sets of frequency components obtained by processing these time
series through the FFT algorithm.
At the beginning, the time series related to the sets of the unevalua-
ted instances and the previous non-default instances are extracted.
They are used as input in the Fourier Transform process, obtaining
as result the spectral pattern of each instance. The classification
of the instances to evaluate is based on a comparison process per-
formed between their spectral patterns and those of the previous
non-default ones, taking into account the importance of each fre-
quency component (in terms of entropy) and a tolerance range ρ
experimentally defined.
4.1 Time Series Definition
In the first step of our approach we define the time series to use in
the Discrete Fourier Transform process.
Formally, a time series represents a series of data points stored
by following the time order and usually it is a sequence captured at
successive equally spaced points in time, thus it can be considered
a sequence of discrete-time data.
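In the context of the proposed approach, the time series taken
into account are defined by using the set of features $V$ that compose
each instance in the $I_+$ and $\hat{I}$ sets, as shown in Equation 4, by
following the criterion reported in Equation 5.
$$T(I_+) = \begin{pmatrix} v_{1,1} & v_{1,2} & \dots & v_{1,M} \\ v_{2,1} & v_{2,2} & \dots & v_{2,M} \\ \vdots & \vdots & \ddots & \vdots \\ v_{N,1} & v_{N,2} & \dots & v_{N,M} \end{pmatrix}, \qquad T(\hat{I}) = \begin{pmatrix} v_{1,1} & v_{1,2} & \dots & v_{1,M} \\ v_{2,1} & v_{2,2} & \dots & v_{2,M} \\ \vdots & \vdots & \ddots & \vdots \\ v_{U,1} & v_{U,2} & \dots & v_{U,M} \end{pmatrix} \qquad (4)$$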
The time series related to an item $\hat{i} \in \hat{I}$ will be compared to the
time series related to all the items $i \in I_+$, by following the criteria
explained in the next steps.
4.2 Time Series Analysis
Before describing the analysis based on the Fourier
transformation, it is useful to observe the spectral pattern of an
instance randomly taken from a dataset, with |V| = 20 (Figure 3),
beside its canonical representation in the time domain.
The frequency domain representation allows us to analyze the data
(represented by the sequence of values assumed by the instance
features, as described in Section 4.1) in terms of the peaks
(magnitudes) of the spectral frequencies that compose it. This allows us
to detect some patterns in the features that are not discoverable
in the time domain.
Comparing the two different domains, we can observe some in-
teresting properties for the context taken into account in this paper.
The most significant are the following:
• The phase invariance property shown in Figure 4 shows
that, also in case of translation² between instances, a
specific pattern still exists in the frequency spectrum. More
formally, it is one of the phase properties of the Fourier
transform [41], i.e., a shift of a time series in the time
domain leaves the magnitude unchanged in the frequency
domain. This property allows us to detect a particular pattern
in the user behavior, regardless of the involved instances
that compose it. A concrete example is represented by the
values in the features from 6 to 11, from 12 to 17, and
from 18 to 23 of the DC dataset (described in Table 2 of
Section 5.2). They report sequences of values that belong
to three different types of information related to the loan
applicant (i.e., past repayments, bill statement, and amount
paid), and by exploiting the spectrum pattern analysis we
can detect a specific pattern (behavior) also when it shifts
along the features that compose one of these subsets of values.
• Another interesting aspect of the frequency domain is given
by the amplitude correlation property shown in Figure 5.
It shows the existence of a direct correlation between the
values assumed by the features in the time domain
and the magnitudes of the spectral components in the
frequency domain. More formally, it is the homogeneity
property of the Fourier transform [41], i.e., when the amplitude
is altered in one domain, it is altered by the same amount
in the other domain³. This property assures us of the
capability of the frequency representation to differentiate the
instances on the basis of the size of the values in their features.
² In terms of signal, it represents a change of phase, considering that a translation in
the time domain corresponds to a change in phase in the frequency domain.
Figure 3: Time and Frequency Domains (axes: Time (Series); Frequency (Hz) vs. Magnitude)
Figure 4: Phase Invariance Property (axes: Time (Series); Frequency (Hz) vs. Magnitude)
Figure 5: Amplitude Correlation Property (axes: Time (Series); Frequency (Hz) vs. Magnitude)
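The two properties listed above can be checked numerically with a minimal Python sketch. The helper names and the ten-value pattern are illustrative assumptions. Note that the magnitude invariance holds exactly for a circular shift of the series; a pattern that slides inside a window of zeros without wrapping around, as here, is a special case of it.

```python
import cmath

def magnitudes(series):
    """Magnitude of each DFT component, computed directly from the definition."""
    n = len(series)
    return [
        abs(sum(v * cmath.exp(-2j * cmath.pi * k * t / n) for t, v in enumerate(series)))
        for k in range(n)
    ]

def circular_shift(series, steps):
    """Rotate the series to the right by the given number of positions."""
    steps %= len(series)
    return series[-steps:] + series[:-steps]

# Hypothetical pattern occupying three consecutive features.
pattern = [0, 0, 1, 3, 1, 0, 0, 0, 0, 0]
shifted = circular_shift(pattern, 3)  # same pattern, moved along the features
scaled = [2 * v for v in pattern]     # same pattern, doubled amplitude

base = magnitudes(pattern)
# Phase invariance: the shift changes only the phase, not the magnitudes.
shift_ok = all(abs(a - b) < 1e-9 for a, b in zip(base, magnitudes(shifted)))
# Homogeneity: doubling the values doubles every magnitude.
scale_ok = all(abs(2 * a - b) < 1e-9 for a, b in zip(base, magnitudes(scaled)))
```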
In practice, the analysis is performed by moving the
time series of the instances to compare from the time domain to
the frequency one, by resorting to the FFT approach introduced in
Section 2.5.
In this context, although there are many algorithms able to
calculate the FFT, the most used are those based on the Cooley-Tukey
recursive algorithm. It performs a decimation in time on the basis
of the following consideration: when the number N of input data
is even, it is possible to express it as N = 2·M, allowing us to split
the N-element summation of the DFT formula into two M-element
ones, one over n = 2·m and another over n = 2·m + 1, as shown in
Equation 6.
³ Scaling in one domain corresponds to scaling in the other domain.
Figure 6: Delta Difference
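The even/odd split described above can be sketched as a recursive radix-2 FFT in Python. This is a minimal illustration, not the JTransforms implementation used in the paper: the function name `fft` is an assumption, and this version only accepts lengths that are powers of two, whereas the instances considered in the paper have 20 or 23 features and would need a general-length FFT.

```python
import cmath

def fft(samples):
    """Recursive radix-2 decimation-in-time FFT (length must be a power of two)."""
    n = len(samples)
    if n == 0 or (n & (n - 1)) != 0:
        raise ValueError("length must be a non-zero power of two")
    if n == 1:
        return [complex(samples[0])]
    even = fft(samples[0::2])  # sub-transform over indices n = 2m
    odd = fft(samples[1::2])   # sub-transform over indices n = 2m + 1
    half = n // 2
    out = [0j] * n
    for k in range(half):
        # The twiddle factor recombines the two half-size transforms.
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + half] = even[k] - twiddle
    return out
```

Each level of recursion does O(N) work and there are log2(N) levels, which gives the O(N log N) cost mentioned in Section 2.5.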
We implement the FFT approach by using the JTransforms Java
library, as reported in Section 5.1.
The comparison between an instance $\hat{i} \in \hat{I}$ to evaluate
and a past non-default instance $i \in I_+$ is performed by measuring the
difference $\Delta$ between the magnitude $|f|$ of each component $f \in F$
in the frequency spectrum of the involved instances.
It is shown in Equation 7, where $f^1_x$ and $f^2_x$ denote, respectively,
the same frequency component of an item $i \in I_+$ and of an item
$\hat{i} \in \hat{I}$. Such a comparison between the same frequency
component of two instances is also shown graphically in Figure 6.
$$\Delta = |f^1_x| - |f^2_x|, \quad \text{with } |f^1_x| \geq |f^2_x| \qquad (7)$$
It should be noted that, as described in Section 4.4, for each
instance $\hat{i} \in \hat{I}$ to evaluate, the aforementioned process is repeated by
comparing it to each instance $i \in I_+$. This allows us to evaluate the
variation $\Delta$ in the context of all the non-default past cases.
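The per-component comparison can be sketched in Python as follows. The function name is hypothetical, and the sketch reads the condition of Equation 7 as "subtract the smaller magnitude from the larger one", so that each difference is non-negative; this reading is an assumption, since the paper does not spell out what happens when the condition is not met.

```python
def magnitude_deltas(spectrum_a, spectrum_b):
    """Difference between the magnitudes of matching frequency components."""
    if len(spectrum_a) != len(spectrum_b):
        raise ValueError("spectra must have the same number of components")
    deltas = []
    for fa, fb in zip(spectrum_a, spectrum_b):
        ma, mb = abs(fa), abs(fb)  # abs() of a complex value is its magnitude
        deltas.append(max(ma, mb) - min(ma, mb))
    return deltas

# Hypothetical two-component spectra of a past instance and a new one.
past = [3 + 4j, 1 + 0j]
new = [0 + 1j, 2 + 0j]
deltas = magnitude_deltas(past, new)
```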
4.3 Time Series Dynamic Feature Selection
In the context of machine learning and statistics, the feature
selection process is aimed at detecting a subset of relevant features to
use during the model definition. It represents an important
preprocessing step, since it reduces the complexity of the final model,
decreasing the training time and increasing the generalization of the
model. It also reduces overfitting, a problem that occurs when a
statistical model describes random error or noise instead of the
underlying relationship. This frequently happens during the definition
of excessively complex models, since many parameters, with respect to
the number of training data, are involved.
In the context of our approach, we perform the feature selection
task in a dynamic way, by measuring the Shannon entropy (a
metric described in Section 5.3.1) of each feature of the training
datasets (Figure 7).
The entropy gives us a measure of the uncertainty of a random
variable: the larger it is, the less a-priori information one has
on the value of the variable. The entropy therefore increases as the
values become equally probable and decreases when their probabilities
are unbalanced.
Figure 7: Instance Features Entropy (panels: GC dataset, DC dataset; axes: Features vs. Entropy)
The adoption of different weights during the evaluation process,
performed by the Dynamic Feature Selection (DFS) approach, is
aimed to differentiate the instance features on the basis of their pre-
dictive power.
For the needs of Algorithm 1, we formalize the DFS in terms
of inverse Shannon entropy (in order to have a high value when the
feature is important, and a low value otherwise). Such formalization
is shown in Equation 8, where $P(f)$ indicates the probability that the
frequency component $f$ is present in the set $F$.
The obtained result represents the weight of the frequency component
$f$ (in terms of entropy) to use in the evaluation process
described in Section 4.4.
$$DFS(f) = 1 - \left( -\sum_{f \in F} P(f) \log_2 P(f) \right) \qquad (8)$$
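A minimal Python sketch of such a weight is given below. How the probability of a component value is estimated is not detailed in this section, so the sketch assumes, purely for illustration, that it is the relative frequency of the (rounded) values observed for one component across the training spectra. The function names and the `decimals` parameter are hypothetical. With the entropy measured in bits, the weight becomes negative whenever the entropy exceeds 1, so a real implementation may need some normalization.

```python
import math
from collections import Counter

def shannon_entropy(values):
    """Entropy in bits of the empirical distribution of the given values."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def dfs_weight(component_values, decimals=3):
    """Inverse-entropy weight: 1 minus the entropy of the observed values.

    Assumption: values are rounded so that near-identical magnitudes
    are counted as the same symbol.
    """
    rounded = [round(v, decimals) for v in component_values]
    return 1.0 - shannon_entropy(rounded)

# Hypothetical magnitudes of one frequency component over four training instances.
stable = dfs_weight([0.5, 0.5, 0.5, 0.5])    # no uncertainty, weight 1.0
variable = dfs_weight([0.1, 0.9, 0.1, 0.9])  # one bit of entropy, weight 0.0
```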
4.4 Time Series Classification
This section formalizes the FSP algorithm used to perform the clas-
sification of new instances, together with the analysis of its asymp-
totic time complexity.
4.4.1 Algorithm. The proposed FSP approach is based on
Algorithm 1. It takes as input the set $I_+$ of non-default instances that
occurred in the past, the set $\hat{I}$ of unevaluated instances, and the
tolerance range $\rho$ (determined as described in Section 5.4.2). It returns
as output a set $O$ that contains all the instances in $\hat{I}$, classified as
accepted or rejected.
From step 2 to step 23 we process all the unevaluated instances
$\hat{i} \in \hat{I}$, starting with the extraction of the time series of each instance
(step 3), which is processed at step 4 in order to obtain its
frequency spectrum. From step 5 to step 15 we instead process
each non-default instance $i \in I_+$, by extracting its
time series (step 6) and by obtaining its frequency
spectrum (step 7).
Algorithm 1 FSP Instances classification
Input: I+ = Non-default instances, Î = Unevaluated instances, ρ = Tolerance range
Output: O = Set of classified instances
 2: for each î in Î do
 3:     ts1 = getTimeseries(î)
 4:     F1 = getFFT(ts1)
 5:     for each i in I+ do
 6:         ts2 = getTimeseries(i)
 7:         F2 = getFFT(ts2)
 8:         for each f in F do
 9:             if (|F2(f)| − |F1(f)| ∈ ρ) then
10:                 reliable += DFS(f)
11:             else
12:                 unreliable += DFS(f)
13:             end if
14:         end for
15:     end for
16:     if reliable > unreliable then
17:         O ← (î, accepted)
18:     else
19:         O ← (î, rejected)
20:     end if
21:     reliable = 0
22:     unreliable = 0
23: end for
24: return O
25: end procedure
The steps from 8 to 14 verify whether the difference between the
magnitude of each frequency component $f \in F$ of the non-default
instance and the corresponding component of the current instance is
within the $\rho$ range.
On the basis of the result of this operation, in the steps from
9 to 13, the weight (in terms of entropy) of the current frequency
component is used to increase the reliable value (when the
difference is within the $\rho$ range, step 10) or the unreliable one
(otherwise, step 12).
On the basis of these two values, the instance under evaluation
is classified as accepted or rejected in the steps from 16 to 20, and
the result of the classification process is returned by the algorithm
at step 24, when all the instances $\hat{i} \in \hat{I}$ have been processed.
4.4.2 Asymptotic Time Complexity. Although the time needed to
classify a single instance is usually not a concern, the possible
implementation of the proposed FSP approach in a real-time scoring
system [37], where the response time represents a crucial factor,
suggests that we analyze the theoretical complexity of the
classification Algorithm 1.
Denoting as N the dimension of the training set $I_+$ (i.e., $N = |I_+|$),
we define the asymptotic time complexity of the evaluation of a
single instance (according to the Big O notation) by observing that:
(i) Algorithm 1 presents three nested loops, given by the outer
loop that starts at step 2, which executes N times the other two
inner loops (the first starting at step 5 and the second
starting at step 8), plus other operations (getTimeseries, getFFT,
comparisons, and assignments), respectively with complexity
O(n), O(n log n), O(1), and O(1);
(ii) the first inner loop executes the same aforementioned
operations once, plus its inner loop, which executes operations with
complexity O(1) (comparisons and assignments) for a number
of times less than N (i.e., |F| times).
On the basis of the previous considerations, we can conclude that
the asymptotic time complexity of the algorithm is O(N^2).
It should also be noted that the computational time can be ade-
quately reduced by distributing the process over different machines,
by employing large scale distributed computing models like MapRe-
duce [15].
This section reports information about the experimental environ-
ment, the used datasets and metrics, the adopted strategy, the chosen
competitor, as well as the results of the performed experiments.
5.1 Environment
The proposed FSP approach was developed in Java, where we use
the JTransforms library to perform the Fourier transformations.
The state-of-the-art approach used to evaluate its performance
was implemented in R, using the randomForest and ROCR packages, as
detailed in Section 5.5.
The experiments have been conducted by using two real-world
datasets, both characterized by a strongly unbalanced distribution of
data.
It should be further added that we verified the existence of a
statistical difference between the results by using
independent-samples two-tailed Student's t-tests (p < 0.05).
5.2 Datasets
The two real-world datasets used in the experiments (i.e., the German
Credit and Default of Credit Card Clients datasets, both available
at the UCI Repository of Machine Learning Databases) represent
two benchmarks in this research field. In the following we provide
a brief description of their characteristics:
5.2.1 German Credit (GC). It contains 1,000 instances: 700 of
them are non-default instances (70.00%) and 300 are default
instances (30.00%). Each instance is composed of 20 features (whose
type is described in Table 1) and a binary class variable (accepted
or rejected).
Table 1: Dataset GC Fields
Feature  Description                 Feature  Description
01       Status of checking account  11       Present residence since
02       Duration                    12       Property
03       Credit history              13       Age
04       Purpose                     14       Other installment plans
05       Credit amount               15       Housing
06       Savings account/bonds       16       Existing credits
07       Present employment since    17       Job
08       Installment rate            18       Maintained people
09       Personal status and sex     19       Telephone
10       Other debtors/guarantors    20       Foreign worker
5.2.2 Default of Credit Card Clients (DC). It contains 30,000
instances: 23,364 of them are non-default instances (77.88%) and
6,636 are default instances (22.12%). Each instance is composed
of 23 features (whose type is described in Table 2) and a binary
class variable (accepted or rejected).
Table 2: Dataset DC Fields
Feature  Description                  Feature  Description
01       Credit amount                13       Bill statement in Aug-2005
02       Gender                       14       Bill statement in Jul-2005
03       Education                    15       Bill statement in Jun-2005
04       Marital status               16       Bill statement in May-2005
05       Age                          17       Bill statement in Apr-2005
06       Past repayments in Sep-2005  18       Amount paid in Sep-2005
07       Past repayments in Aug-2005  19       Amount paid in Aug-2005
08       Past repayments in Jul-2005  20       Amount paid in Jul-2005
09       Past repayments in Jun-2005  21       Amount paid in Jun-2005
10       Past repayments in May-2005  22       Amount paid in May-2005
11       Past repayments in Apr-2005  23       Amount paid in Apr-2005
12       Bill statement in Sep-2005
5.3 Metrics
This section introduces the metrics used in the context of this paper.
5.3.1 Shannon Entropy. The Shannon entropy, formalized by
Claude E. Shannon in [39], is one of the most important metrics
used in information theory. It reports the uncertainty associated
with a random variable, allowing us to evaluate the average mini-
mum number of bits needed to encode a string of symbols, based
on their frequency.
More formally, given a set of values $v \in V$, the entropy $H(V)$ is
defined as shown in Equation 9, where $P(v)$ is the probability
that the element $v$ is present in the set $V$.
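For reference, the standard definition of the Shannon entropy, written with the symbols introduced above, is sketched below. The base-2 logarithm is an assumption motivated by the reference to bits in the previous paragraph; Equation 9 of the paper may be typeset differently.

```latex
H(V) = -\sum_{v \in V} P(v) \log_2 P(v)
```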
In the context of classification methods, entropy-based metrics
are frequently used during the feature selection process [13, 29, 30],
which aims to identify a subset of relevant features (variables,
predictors) to use in the definition of the classification model.
We use the entropy for this task, dynamically, as described in
Section 4.3.
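As a minimal sketch of the entropy computation (not the implementation used in our experiments), the following Python function estimates P(v) from the relative frequency of each value in a feature column and returns the entropy in bits. The function name and the toy feature columns are assumptions introduced only for illustration; the selection criterion of Section 4.3 is not reproduced here.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Return the Shannon entropy (in bits) of a sequence of symbols.

    P(v) is estimated as the relative frequency of v in `values`.
    """
    total = len(values)
    if total == 0:
        return 0.0
    counts = Counter(values)
    # -P(v) * log2(P(v)) is written as P(v) * log2(1 / P(v)).
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# Hypothetical feature columns: a constant feature carries no information,
# while a feature split evenly between two values needs one bit per symbol.
constant_feature = ["A"] * 8
balanced_feature = ["A", "B"] * 4

print(shannon_entropy(constant_feature))
print(shannon_entropy(balanced_feature))
```

A feature whose entropy is close to zero takes almost always the same value, so it offers little help in discriminating between instances.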
5.3.2 Accuracy. The Accuracy metric reports the number of
instances correctly classified, compared to the total number of them.
More formally, given a set of instances X to be classified, it is
calculated as shown in Equation 10, where $|X|$ stands for the total
number of instances, and $|X^{(+)}|$ for the number of those correctly
classified.
$$Accuracy(X) = \frac{|X^{(+)}|}{|X|} \qquad (10)$$
5.3.3 Sensitivity. Unlike the Accuracy metric described above,
which takes into account all kinds of classifications, the Sensitivity
provides information only about the instances correctly classified
as reliable. This is important information, since it evaluates the
predictive power of our FSP approach in terms of its capability to
detect the reliable loan applications, offering crucial decision
support in real-world contexts.
More formally, given a set of instances X to be classified, the
Sensitivity is calculated as shown in Equation 11, where $|X^{(TP)}|$
stands for the number of instances correctly classified as reliable
and $|X^{(FN)}|$ for the number of reliable instances wrongly
classified as unreliable.
$$Sensitivity(X) = \frac{|X^{(TP)}|}{|X^{(TP)}| + |X^{(FN)}|} \qquad (11)$$
IML ’17, October 17-18, 2017, Liverpool, United Kingdom Roberto Saia and Salvatore Carta
5.3.4 F-measure. The F-measure is the harmonic mean of the
precision and recall metrics. It is widely used in the statistical
analysis of binary classification and returns a value in the range
[0,1], where 0 is the worst value and 1 the best one.
More formally, given two sets X and Y, where X denotes the set
of performed classifications of instances, and Y the set that contains
their actual classifications, this metric is defined as shown in
Equation 12.
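$$F\text{-}measure(X,Y) = \frac{2 \cdot \mathrm{precision}(X,Y) \cdot \mathrm{recall}(X,Y)}{\mathrm{precision}(X,Y) + \mathrm{recall}(X,Y)}$$
$$\text{with} \quad \mathrm{precision}(X,Y) = \frac{|Y \cap X|}{|X|}, \quad \mathrm{recall}(X,Y) = \frac{|Y \cap X|}{|Y|} \qquad (12)$$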
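The three classification metrics defined above can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: the label strings, the function names, and the toy predictions are hypothetical, and precision and recall are expressed through the usual counts of true positives, false positives, and false negatives, with "reliable" taken as the positive class.

```python
def confusion_counts(predicted, actual, positive="reliable"):
    """Count true positives, false positives, false negatives, true negatives."""
    tp = fp = fn = tn = 0
    for p, a in zip(predicted, actual):
        if p == positive and a == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy(predicted, actual):
    """Correctly classified instances over all instances."""
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)


def sensitivity(predicted, actual, positive="reliable"):
    """TP / (TP + FN): share of reliable instances that were recognized."""
    tp, _, fn, _ = confusion_counts(predicted, actual, positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


def f_measure(predicted, actual, positive="reliable"):
    """Harmonic mean of precision and recall for the positive class."""
    tp, fp, fn, _ = confusion_counts(predicted, actual, positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy data (synthetic, for illustration only).
actual = ["reliable", "reliable", "reliable", "unreliable", "unreliable"]
predicted = ["reliable", "reliable", "unreliable", "reliable", "unreliable"]

print(accuracy(predicted, actual))
print(sensitivity(predicted, actual))
print(f_measure(predicted, actual))
```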
5.4 Strategy
This section reports information about the strategy adopted during
the execution of the experiments.
5.4.1 Cross-validation. In order to reduce the impact of data
dependency and improve the reliability of the obtained results, all
the experiments have been performed by using the k-fold
cross-validation criterion, with k=10.
Each dataset is randomly shuffled and then divided into k subsets;
each subset is used in turn as test set, while the other k-1 subsets
are used as training set. The final result is given by the average of
all the k results.
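The k-fold procedure can be sketched in Python as shown below. It is a minimal illustration and not the code used in the experiments: the function names, the fixed seed, and the `evaluate` callback (which would train a model on the training indices and return a score on the test indices) are assumptions, and the sketch presumes at least k instances.

```python
import random


def k_fold_splits(n_instances, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)        # random shuffle of the dataset
    folds = [indices[i::k] for i in range(k)]   # k disjoint subsets
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def cross_validate(n_instances, evaluate, k=10, seed=0):
    """Average the score returned by `evaluate(train, test)` over the k folds."""
    scores = [evaluate(train, test)
              for train, test in k_fold_splits(n_instances, k, seed)]
    return sum(scores) / len(scores)


# Hypothetical usage: the callback here only reports the test-fold size.
print(cross_validate(1000, lambda train, test: len(test), k=10))
```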
5.4.2 Tolerance Range. Since the evaluation process performed by
Algorithm 1 (Section 4.4.1) relies on a tolerance range ρ, we need
to define its upper and lower bounds in the context of each dataset.
This range is used in the spectrum comparison process to determine
whether a ∆ value, i.e., the difference between the magnitudes of
the same frequency component in two instances (one that belongs to
the non-default cases and one that represents the instance to
evaluate), as shown in Figure 6, must be considered acceptable
(the classification of an instance as accepted or rejected depends
on the results of these evaluations).
For each frequency component $f \in F$, measured in the set of
past non-default instances $I_+$, we calculate the difference in terms
of magnitude between each possible pair $(f, \hat{f})$, where $f$ and
$\hat{f}$ are the magnitudes assumed by that component in two
instances of $I_+$.
Denoting as $|f - \hat{f}|_{I_+}$ the aforementioned calculation
of the differences between the magnitudes assumed by the same
frequency component $f$ in the dataset $I_+$, we define the tolerance
range ρ of each $f \in F$ as shown in Equation 13.
$$\rho = [\rho_{min}, \rho_{max}], \quad \text{with} \quad \rho_{min} = \min(|f - \hat{f}|_{I_+}), \quad \rho_{max} = \max(|f - \hat{f}|_{I_+}) \qquad (13)$$
Unlike our competitor approach (i.e., Random Forests), whose
parameter values can be tuned in advance to obtain its best
performance, our approach adopts a dynamic method to determine the
optimal range (minimum and maximum value) for each frequency
component, instead of using a single range for all of them. For this
reason, we cannot determine these ranges a priori, since they are
strictly related to the dataset taken into account, according to
Equation 13.
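A possible reading of Equation 13 is sketched below in Python. Several points are assumptions made only for illustration: each instance is treated as a numeric sequence of feature values, its spectrum is obtained with a naive discrete Fourier transform over all components, and the helper `is_acceptable` interprets "acceptable" as a ∆ value that falls inside the range of its frequency component. The function names are hypothetical and the sketch needs at least two non-default instances.

```python
import cmath
from itertools import combinations


def magnitude_spectrum(instance):
    """Magnitudes of the discrete Fourier transform of a numeric sequence."""
    n = len(instance)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(instance)))
        for k in range(n)
    ]


def tolerance_ranges(non_default_instances):
    """Return one (rho_min, rho_max) pair per frequency component.

    For each component, the range spans the smallest and largest absolute
    difference between the magnitudes of any two non-default instances.
    """
    spectra = [magnitude_spectrum(inst) for inst in non_default_instances]
    ranges = []
    for component in zip(*spectra):   # one frequency across all instances
        diffs = [abs(a - b) for a, b in combinations(component, 2)]
        ranges.append((min(diffs), max(diffs)))
    return ranges


def is_acceptable(delta, rho):
    """Assumed acceptance test: delta lies within [rho_min, rho_max]."""
    rho_min, rho_max = rho
    return rho_min <= delta <= rho_max


# Synthetic non-default instances (illustration only).
past_non_default = [
    [1.0, 0.0, 0.0, 0.0],
    [2.0, 0.0, 0.0, 0.0],
    [4.0, 0.0, 0.0, 0.0],
]
ranges = tolerance_ranges(past_non_default)
print(ranges[0])
```

Because the ranges are derived from the pairwise differences observed in the dataset, they change whenever the set of past non-default instances changes, which is why they cannot be fixed a priori.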
Figure 8: Random Forests Tuning (panel GC; x-axis: mtry parameter, from 2 to 30)
5.5 Competitor
Here, we describe the state-of-the-art approach chosen as competitor
to evaluate the performance of our approach, together with the
parameter tuning process aimed at optimizing its performance.
5.5.1 Description. As mentioned previously, the state-of-the-art
approach to which we compare our approach was implemented in R, by
using the randomForest and ROCR packages. For reproducibility, we
fix the seed of the random number generator by calling the R
function set.seed().
5.5.2 Parameters Tuning. In order to get the best performance
from the RF approach, we need to perform a tuning process aimed
at detecting the optimal value of its configuration parameters.
The caret package in R provides functionality to perform this type
of operation. Since caret supports only those algorithm parameters
that have a crucial role in the tuning process, such as mtry in RF
(the number of variables randomly sampled as candidates at each
split), we use caret to tune this parameter. The operation was
performed by following the grid search approach, where each axis of
the grid is an algorithm parameter and the points in the grid are
specific parameter combinations.
The tests were stopped as soon as the measured accuracy did not
improve further. Although the differences are minimal beyond
certain values, as can be seen in Figure 8, the experiments indicate
as optimal value for mtry 27 for the GC dataset and 8 for the DC
dataset, since these values lead to the maximum value of Accuracy
(i.e., 75.36% and 81.26%, respectively).
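The search over a single parameter with a stopping rule can be sketched generically in Python. This is not the caret procedure used in the experiments: the function name, the `patience` argument, the candidate grid, and the accuracy values are all hypothetical, and the `evaluate` callback stands for any routine that trains the model with the given parameter value and returns its cross-validated accuracy.

```python
def tune_parameter(candidates, evaluate, patience=1):
    """Scan candidate values in order, stopping when accuracy stops improving.

    Returns the best value, its score, and the list of evaluated points.
    `patience` is the number of consecutive non-improving candidates
    tolerated before the search stops.
    """
    best_value, best_score = None, float("-inf")
    stale = 0
    history = []
    for value in candidates:
        score = evaluate(value)
        history.append((value, score))
        if score > best_score:
            best_value, best_score = value, score
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                break
    return best_value, best_score, history


# Synthetic accuracy values, for illustration only (not experimental results).
toy_accuracy = {2: 0.70, 4: 0.72, 6: 0.75, 8: 0.74, 10: 0.76}

best, score, tried = tune_parameter(sorted(toy_accuracy), toy_accuracy.get)
print(best, score, len(tried))
```

A small patience value keeps the search short but can stop before a later improvement, which is consistent with the observation that the differences become minimal beyond certain values.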
5.6 Results
This section presents and discusses the results of the performed
experiments.
5.6.1 Overview. A first analysis of the experimental results
(reported in Figure 9, Figure 10, and Figure 11) shows that:
(i) the performance of our FSP approach is very similar to that of
RF, in terms of Accuracy, with both the GC and DC datasets;
(ii) the FSP approach performs better than RF, in terms of
F-measure, on the DC dataset, and very close to it on the GC
dataset;
(iii) the FSP approach outperforms RF, in terms of Sensitivity,
with both the GC and DC datasets.
These aspects are discussed in more detail in Section 5.6.2 and
Section 5.6.3.
Figure 9: Accuracy Performance
Figure 10: F-measure Performance
Figure 11: Sensitivity Performance
5.6.2 Discussion. The first observation that arises from a more
detailed examination of the experimental results is that our FSP
approach achieves performance very close to (or better than) that
of RF, although it does not exploit the past default cases during
the training process.
Another observation concerns the F-measure results, which show
that the effectiveness of the FSP approach increases with the
number of past non-default instances involved in the training
process (DC dataset). This does not happen with the RF approach,
although its training process involves both the default and the
non-default past cases.
It should be noted that, in the light of the obtained results, the
proactivity that characterizes our approach can reduce/overcome
the cold-start problem described in Section 2.3, allowing a real-
world system to operate even in the absence of previous cases of
default instances, with all the advantages that derive from it.
The last, but not least important, observation concerns the results
in terms of Sensitivity, which show significant improvements over
the state-of-the-art RF approach taken into account. It means that
the number of true positive classifications is higher than that
obtained by the RF approach, which provides a clear benefit in a
real-world context.
5.6.3 Benefits and Limitation. The experimental results presented
and discussed above show that our approach performs similarly to
Random Forests, one of the best performing state-of-the-art
approaches, although it operates in a proactive manner.
In addition, it is able to outperform Random Forests when the
training process involves a large number of previous non-default
cases, proving to be more effective than its competitor in the identi-
fication of the reliable instances.
On the basis of these results, a benefit related to the adoption of
our credit scoring approach is its ability to face the data unbalance
problem that reduces the effectiveness of the canonical approaches,
since it exploits only one class of data in the model definition
process (i.e., the previous non-default instances). Such a proactive
strategy also reduces or overcomes the well-known cold-start problem.
Another benefit is instead related to the fact that the model used
during the evaluation process, based on the spectral pattern of the
instances, is more stable than the canonical one, because the fre-
quency components are less influenced by the data heterogeneity.
For the aforementioned reasons, our approach can be used in
order to create hybrid approaches able to operate in all contexts, by
combining its capability to operate proactively with the advantages
offered by the non-proactive state-of-the-art approaches.
The main limitation of our approach is that it offers few benefits
in those cases where the data distribution is balanced, with enough
default and non-default cases to use during the model training.
However, it should be underlined that this represents an uncommon
real-world scenario.
Credit scoring techniques play a crucial role in many financial
contexts (e.g., bank loans, mortgage lending, insurance policies).
They are adopted by financial operators in order to assess the
potential risks related to the customer applications, allowing
them to reduce the losses due to default.
In this paper we proposed a novel credit scoring approach able to
classify the new instances as accepted or rejected by evaluating
them in terms of their frequency spectral pattern. This operation
is performed by moving the evaluation process from the canonical
domain to a frequency one, where the evaluation model is defined
by using only the past non-default loan applications.
Such a strategy presents two main advantages. The first is its
ability to face the data unbalance issue and, at the same time, the
cold-start problem. The second is its capability to define a model
by exploiting only the previous non-default instances, allowing a
system to operate proactively.
Future work will explore the effect, in terms of performance, of
including the past default instances in the model definition
process, evaluating the advantages and disadvantages of adopting
such a non-proactive strategy.
Another interesting study would be to exploit other characteristics
of the instances represented in the frequency domain, with the
objective of improving the effectiveness of the classification
algorithm.
A secondary but also interesting future work would be the evalu-
ation of our approach in the context of heterogeneous environments,
where numerous types of financial data are involved (e.g., the elec-
tronic commerce environment).
This research is partially funded by Regione Sardegna under project
“Next generation Open Mobile Apps Development” (NOMAD),
“Pacchetti Integrati di Agevolazione” (PIA) - Industria Artigianato e
Servizi - Annualità 2013.
