A Two-Step Feature Space Transforming Method to Improve Credit Scoring Performance


Salvatore Carta, Gianni Fenu, Anselmo Ferreira, Diego R. Recupero, and Roberto Saia
Department of Mathematics and Computer Science, University of Cagliari, Italy
Abstract. The increasing amount of credit offered by financial institutions has required intelligent and efficient methodologies of credit scoring. Therefore, the use of different machine learning solutions for that task has been growing in recent years. Such procedures have been used to identify customers who are reliable or unreliable, with the intention of counterbalancing financial losses due to loans offered to the wrong customer profiles. Nevertheless, such an application of machine learning suffers from several limitations when put into practice, such as unbalanced datasets and, especially, the absence of sufficient information in the features that can be useful to discriminate reliable and unreliable loans. To overcome such drawbacks, we propose in this work a Two-Step Feature Space Transforming approach, which operates by evolving feature information in a twofold operation: (i) data enhancement; and (ii) data discretization. In the first step, additional meta-features are used in order to improve data discrimination. In the second step, the goal is to reduce the diversity of features. Experimental results on real-world datasets with different levels of unbalancing show that such an approach can consistently improve the performance of the best machine learning algorithm for such a task. With such results we aim to open new perspectives for novel efficient credit scoring systems.
Keywords: Business Intelligence · Credit Scoring · Machine Learning · Algorithms · Transforming
1 Introduction
A report from Trading Economics [1,2], based on data provided by the European Central Bank, has shown that consumer credit has been increasing steadily over the last years. This trend in the Euro zone, which can be seen in Fig. 1, is also observed in other markets such as Russia and the USA. It has forced Credit Rating Agencies (CRAs), also known as ratings services, to define and establish intelligent strategies to offer credit to the right customers, minimizing financial losses due to bad debts.
Nowadays, CRAs couple credit scoring systems with machine learning solutions in order to perform credit scoring. Such approaches take into account the big data nature of credit datasets, which enables machine learning models that can understand credit information from clients and, consequently, discriminate them into reliable or non-reliable users. Credit scoring systems have been vital in many financial areas [3], as they avoid human interference and eliminate biased analyses of the information of people who request financial services, such as a loan. Basically, most of these approaches can be considered probabilistic approaches [4]: they perform credit scoring by calculating in real time the probability of the loan being repaid, partially repaid, or not repaid at all, based on the information given in credit scoring datasets (e.g., age, job, salary, previous loan status, marital status, among others), helping the financial operator to decide whether or not to grant a financial service [5].
Fig. 1. Euro zone consumer credit in billions of euros (April 2018 to April 2019; axis range 660 to 690).
Nevertheless, credit scoring systems are still limited as a solution for defining loans, for three main reasons. The first one lies in the nature of the data itself. Similar to other problems, such as fraud or intrusion detection [6–8], the source of data typically contains unequal class distributions [9, 10], which, in the specific case of credit scoring, favor the reliable instances over the unreliable ones [11]. Such a behavior can seriously affect the performance of classification algorithms, since they are often biased towards the most frequent class [11, 12]. The second problem comes from the fact that some datasets face the cold start issue, in which unreliable cases do not even exist. Such an issue has motivated several proactive methods in the literature [13–15]. The last problem, which motivates the solution presented in this work, arises from the heterogeneity of the data. Such a limitation highlights the fact that the data, in the way they are arranged in datasets, are not enough to describe the different instances. Such information is characterized by features that are very different from each other, even though they belong to the same class of information. Therefore, further feature transformations are still needed to provide insightful credit scoring.
Based on our previous experience in dealing with credit scoring [13–18], we present in this work a solution for data heterogeneity in credit scoring datasets. We do that by assessing the performance of the Two-Step Feature Space Transforming (TSFST) method we previously proposed in [19] to improve credit scoring systems. Our approach to improving feature information is a twofold process, composed of (i) enrichment; and (ii) discretization phases. The enrichment step adds several meta-features to the data, in order to better spread the different instances into separated clusters in the D-dimensional space, whereas the discretization process is done to reduce the number of feature patterns. To avoid the risk of overfitting [20] associated with our method and to highlight its real advantages, we adopted an experimental setup that aims at assessing the real performance of financial systems [21]. Such a methodology evaluates our TSFST method on data never seen before, which we name out-of-sample, after training it on known and different data, which we name in-sample. Experiments with different classifiers using this feature improvement method highlight the effectiveness of the approach, which can mitigate even the unbalanced nature of such datasets.
In summary, the main contributions provided through this work are:
1. The establishment of the Two-Step Feature Space Transforming approach, which enriches and discretizes the original features from credit scoring datasets in order to boost the performance of machine learning classifiers that use these features.
2. The assessment of the best classifier to be used with the proposed method, done after a series of experiments considering the in-sample part of the datasets.
3. The analysis of the method performance considering the out-of-sample part of each dataset, adding a comparison with the same canonical approach without our proposed method.
The work presented in this paper is based on our previously published one [19]. It has been extended with the following new discussions and contributions:
1. Extension of the Background and Related Work section by discussing more relevant and recent state-of-the-art approaches, with the aim of providing readers with a fairly comprehensive overview of the credit scoring scenario.
2. A change in the order of operations reported in our previous work [19], as we found that the new order achieves better results.
3. Inclusion of a new real-world dataset, which allows us to evaluate the performance on a dataset characterized by a low number of instances (690, which is lower than the 1000 and 30000 of the other datasets) and features (14, which is also low compared to the 21 and 23 of the other datasets).
4. A more detailed discussion of the composition of the in-sample and out-of-sample datasets in terms of number of involved instances and classes. This provides details about the data imbalance that is present during both the definition of the model (done with the in-sample dataset) and the evaluation of its performance (done with the out-of-sample dataset).
5. An analysis of the asymptotic time complexity of the proposed algorithm, which adds valuable information in the context of real-time credit scoring systems.
The rest of this paper is structured as follows: Section 2 provides information about the background and the related work of the credit scoring domain. Section 3 introduces the formal notation used and provides the formalization of our proposed method. Section 4 describes the experimental environment considered. Section 5 reports and discusses the experimental results in the credit scoring environment and, finally, Section 6 makes some concluding remarks and points out some directions for future work.
2 Related Work
In the past few years, there has been increasing investment in and research on credit scoring applications, with the aim of performing efficient credit scoring. The literature [22] describes several kinds of credit risk models with respect to the unreliable cases, which are commonly known as default cases. Such models are divided into: (i) Probability of Default (PD) models, which investigate the probability of a default in a period; (ii) Exposure at Default (EAD) models, which analyse the value the financial operator is exposed to if a default happens; and (iii) Loss Given Default (LGD) models, which evaluate the amount of money the operator loses after a default happens. In this section, we discuss only the related work on the first kind of models (PD), as they are related to our proposed method. Further details of EAD and LGD models can be found in several surveys in the literature [23–25].
The related work on PD models can be divided into six main branches. The first branch of research is based on statistical methods. For instance, the work in [26] applies Kolmogorov-Smirnov statistics to credit scoring features to discriminate default and non-default users. Other methods, such as Logistic Regression (LR) [27] and Linear Discriminant Analysis [28], are also explored in the literature to predict the probability of a default. In [29], the authors propose to use Self-Organizing Maps and fuzzy k-Nearest Neighbors for credit scoring.
The second branch of research aims to explore data features transformed into other feature domains. The work in [18] processes data in the wavelet domain with three metrics used to rate customers. Similarly, the approach in [17] uses differences of magnitudes in the frequency domain. Finally, the approach in [13] compares non-square matrix determinants to allow or deny loans.
The third branch of approaches, which is among the most popular ones in credit scoring management, is based on machine learning models. In this topic, the work in [30] considers a Random Forest on preprocessed data. A three-way decision methodology with probability sets is considered in [31]. In [32], a deep learning Convolutional Neural Network approach is applied to pre-selected features that are converted to images. A Support Vector Machine with a kernel-free fuzzy quadratic surface is proposed in [33]. The work in [34] reports the beneficial use of bagging, boosting and Random Forest techniques to plan and evaluate a housing finance program. An extensive study of machine learning is done in [25], where forty-one methods are compared when applied to eight credit scoring datasets.
In the fourth branch of research, approaches based on general artificial intelligence, such as neural networks, have been explored. For example, the authors in [35] present the application of artificial intelligence in the credit scoring area. In [36], the authors use a novel kind of artificial neural network called extreme learning machine. The work in [37] reports credit score prediction using the Takagi-Sugeno neuro-fuzzy network. Finally, the work in [38] performs a benchmark of different neural networks for credit scoring.
The fifth branch of research considers hybrid approaches, where more than one model is used to reach a final credit scoring decision. The work in [39] used a Gabriel Neighbourhood Graph and Multivariate Adaptive Regression Splines together with a new consensus approach. The authors in [40] used seven different base classifiers on data whose dimensionality was reduced with Neighborhood Rough Set, and propose a novel ranking technique to select the top-5 classifiers that become part of a layered ensemble. The work in [41] uses several classifiers to validate a feature selection approach called group penalty function. In [42], a similar procedure is followed, but including normalization and dimensionality reduction preprocessing steps and an ensemble of five classifiers optimized by a Bayesian algorithm. The same number of classifiers is used in [43], but with a genetic algorithm and fuzzy assignment. In [44], ensembles are built according to classifier soft probabilities and, in [45], an ensemble based on feature clustering is combined through weighted voting.
The last set of models considers specific features of the problem, such as user profiling in social networks [46–49], news from media [50], data entropy [16], and linear dependence [13, 15], among others. One interesting line of research in this topic considers proactive methods [13–15], which assume in advance that the credit scoring datasets are biased and alleviate such a problem before it occurs.
Although several approaches have been proposed in the literature, there are still many challenges in credit scoring research. All these issues significantly reduce the performance of credit scoring systems, especially when applied to real-world credit risk management. Such challenges can be enumerated as follows:
1. Lack of Datasets, caused mainly by privacy, competition, or legal issues [51].
2. Non-adaptability, commonly known as overfitting, where credit scoring models are unable to correctly classify new instances.
3. Cold-start, when the datasets used to train a model do not contain enough information about default and non-default cases [52–56].
4. Data Unbalance, where an imbalanced class distribution of data [57, 58] is found, typically favoring the non-default class.
5. Data Heterogeneity, where the same information is represented differently in different data samples [59].
Our approach differs from the previous ones in the literature as it deals with the Data Heterogeneity problem in a two-step process. To do that, we perform a series of transforming steps in order to make the original heterogeneous data more discernible and separable, which can boost the performance of any classifier. More details of our approach are discussed in the next section.
3 The Two-step Feature Space Transforming Approach
Before discussing our approach in detail, let us define the formal notation used from this section to the rest of this work. Given a set $S = \{s_1, s_2, \ldots, s_X\}$ of samples (or instances) already classified in another set $C = \{reliable, unreliable\}$, we split $S$ into a subset $S_+ \subseteq S$ of reliable (non-default) cases and another subset $S_- \subseteq S$ of unreliable cases, where $S_+ \cap S_- = \emptyset$. Let us also consider another set $P = \{p_1, p_2, \ldots, p_X\}$ as the labels (or predictions) given by a credit scoring system for each sample, which split $S$ as discussed before, and $Y = \{y_1, y_2, \ldots, y_X\}$ as their true labels, where $P \subseteq C$, $Y \subseteq C$ and $|S| = |P| = |Y|$. By considering that each sample has a set of features $F = \{f_1, f_2, \ldots, f_N\}$ and that each sample belongs to only one class in the set $C$, we can formalize our objective as shown in Equation 1:
$$\max \; \alpha = \sum_{z=1}^{|S|} \beta(p_z == y_z), \qquad (1)$$
where $\beta(b)$ is a logical function that converts any proposition $b$ into 1 if the proposition is true, and 0 otherwise. In other words, our goal is to maximize the total number of correct predictions, i.e., the cases where $\beta(p_z == y_z) = 1$. To increase the value $\alpha$ of this objective function, several approaches can be chosen, as discussed previously in the related work in Section 2. These can be: (i) select and/or transform features [13, 17, 18]; (ii) select the best classifier [30, 32, 33]; or (iii) select the best ensemble of classifiers [39, 40, 42].
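As a small illustration of Equation 1, the sketch below counts the correct predictions for two toy label lists. The function name `count_correct` and the example labels are hypothetical and only serve to show how $\beta$ and the sum behave.

```python
def count_correct(predictions, true_labels):
    """Return alpha: the number of positions where the prediction equals the true label."""
    if len(predictions) != len(true_labels):
        raise ValueError("predictions and true_labels must have the same length")
    # beta(b) is 1 when proposition b holds and 0 otherwise; int(bool) does exactly that.
    return sum(int(p == y) for p, y in zip(predictions, true_labels))


# Toy labels (illustrative only, not taken from any dataset in the paper).
P = ["reliable", "unreliable", "reliable", "reliable"]
Y = ["reliable", "reliable", "reliable", "unreliable"]
alpha = count_correct(P, Y)
print(alpha)  # 2
```

Here the first and third predictions match the true labels, so $\alpha = 2$ out of a maximum of $|S| = 4$.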
In this work, we choose a solution that includes the first and second approaches simultaneously, proposing a twofold transforming technique that boosts the features $f \in F$ and applies them to the best classifier. This boosting is done in such a way as to better distribute the features to the classes of interest in the $N$-dimensional space. With such a procedure, we expect to maximize $\alpha$ when applying such boosted features to the best classifier for this task.
Fig. 2. Full pipeline of credit scoring systems including our proposed Two-Step Feature Space
Transforming approach.
As can be seen in the proposed model pipeline in Figure 2, our approach is composed of four main steps, described as follows:
1. Data Enrichment: a series of additional features $\hat{F}$ are added to the original ones in $F$, in order to include useful information for better credit scoring.
2. Data Discretization: once enriched, the features are discretized to lie in a given range, which is defined in the context of experiments done in the in-sample part of the dataset.
3. Model Selection: chooses the model to use in the context of credit scoring machine learning applications.
4. Classification: implements the classification algorithm to classify new instances as reliable or unreliable.
We discuss each of the above-mentioned steps of our proposed method in the fol-
lowing subsections.
3.1 Data Enrichment
As discussed previously, several works in the literature have pointed out that transforming features can improve the data domain, thus benefiting any machine learning technique able to discriminate them into disjoint classes. One specific kind of feature transformation is adding meta-features [60]. Such a transformation is commonly used in a machine learning research branch called meta-learning [61]. These additional features are built by summarizing or reusing the existing ones, calculating values such as the minimum, maximum, and mean, among others. Such values can be calculated for each feature vector, or considering all vectors in a matrix of features.
In our proposed method, we use these meta-features in order to balance the loss of information caused by the data heterogeneity issue present in credit scoring datasets, adding further data created to boost the characterization of the features $F$ into the reliable or unreliable classes of interest. Formally, given the set of features $F = \{f_1, f_2, \ldots, f_N\}$, we add $MF = \{mf_{N+1}, mf_{N+2}, \ldots, mf_{N+Z}\}$ new meta-features, obtaining the new set of features shown in Equation 2.
$$F = \{f_1, f_2, \ldots, f_N, mf_{N+1}, mf_{N+2}, \ldots, mf_{N+Z}\}. \qquad (2)$$
In our proposed method we chose $Z = |MF| = 4$ or, in other words, we add four meta-features to the original features. These meta-features are calculated feature vector-wise and are the following: Minimum value (min), Maximum value (max), Mean (mean), and Standard Deviation (std), so that $MF = \{min, max, mean, std\}$. By adding more insightful data to the original feature set, this new process minimizes the pattern reduction effects that are normally present in the heterogeneous nature of credit scoring data. Such additional features are formalized as a parameter $u$ in Equation 3.
$$u = \begin{cases} min = \min(f_1, f_2, \ldots, f_N) \\ max = \max(f_1, f_2, \ldots, f_N) \\ mean = \frac{1}{N} \sum_{n=1}^{N} f_n \\ std = \sqrt{\frac{1}{N-1} \sum_{n=1}^{N} (f_n - mean)^2} \end{cases} \qquad (3)$$
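The enrichment step can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function name `enrich` is hypothetical, all features are assumed to be numeric already (categorical fields would need to be encoded first), and the standard deviation is assumed to be the sample version with an $N-1$ denominator, as computed by `statistics.stdev`.

```python
from statistics import mean, stdev


def enrich(sample):
    """Append the four meta-features (min, max, mean, std) to one feature vector."""
    values = [float(v) for v in sample]
    # Assumption: sample standard deviation (N - 1 denominator); needs at least 2 features.
    meta = [min(values), max(values), mean(values), stdev(values)]
    return values + meta


# Toy feature vector with N = 4 features (illustrative values only).
row = [2.0, 4.0, 4.0, 6.0]
enriched = enrich(row)
print(len(enriched))   # 8, i.e. N + Z with Z = 4
print(enriched[4:7])   # [2.0, 6.0, 4.0] -> min, max, mean
```

Each sample keeps its original $N$ features and gains $Z = 4$ extra columns, so the enriched vector has $N + 4$ entries.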
3.2 Data Discretization
The data discretization process is commonly used in machine learning algorithms as a form of data transformation [62]. It transforms the features by mapping each of them to a discrete number that falls in one of a set of independent intervals. This means that numerical features, whether discrete or continuous, are mapped to lie in one of these intervals, standardizing the whole set of original features. Such a procedure has been shown to boost the performance of many machine learning models [63, 64].
On the one hand, the process of discretization comes with the drawback of filtering out some of the additional information gathered from the meta-features in the previous step of our method. On the other hand, it comes with the advantage of understandability, which results from the conversion of the continuous space to a more limited (discrete) space [62] and leads to faster and more precise learning [63]. Figure 3 shows one example of discretizing six feature values in the continuous range $\{0, \ldots, 150\}$ into discrete values in the discrete range $\{0, 1, \ldots, 15\}$.
Fig. 3. Discretization process example of continuous features (continuous values 66, 70, 74, 78, 86 and 88 mapped onto the discretization range 1 to 15).
In our approach, each of the features $f \in F$ in the enriched $\hat{S}$ from the previous step is transformed through discretization. This is done to move the original continuous range to a defined discrete range $\{0, 1, \ldots, r\} \subset \mathbb{Z}$, where $r$ is found experimentally, as will be described later in this work. By defining the discretization procedure $f \xrightarrow{r} d$, we transform each $f \in F$ into one of the values in the discrete range of integers $d = \{1, 2, \ldots, r\}$. Such a process significantly reduces the number of possible different patterns in each $f \in F$, as shown in Equation 4.
$$\{f_1, f_2, \ldots, f_N\} \xrightarrow{r} \{d_1, d_2, \ldots, d_N\}, \quad \forall \hat{s} \in \hat{S}. \qquad (4)$$
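A minimal sketch of one possible discretization follows. The text does not spell out the exact mapping, so this example assumes a linear rescaling of a known continuous range $[lo, hi]$ onto the integers $0, \ldots, r$ followed by rounding; the function name `discretize` and the parameters `lo` and `hi` are hypothetical. The input values are the six continuous values listed for Figure 3.

```python
def discretize(values, lo, hi, r):
    """Map continuous values from [lo, hi] onto the integers 0..r (assumed linear rescaling)."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    out = []
    for v in values:
        clipped = min(max(v, lo), hi)  # keep out-of-range values inside [lo, hi]
        out.append(round((clipped - lo) / (hi - lo) * r))
    return out


print(discretize([66, 70, 74, 78, 86, 88], lo=0, hi=150, r=15))
# [7, 7, 7, 8, 9, 9]
```

Six distinct continuous values collapse into three discrete ones, which is the reduction in the number of feature patterns that this step aims for.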
3.3 Model Selection
The following step of the credit scoring pipeline chooses the model to be applied to the features preprocessed by our TSFST approach. According to the $u$ and $r$ parameters from the enrichment and discretization phases of our approach respectively, a new set of features $TSFST(S)$ is formalized as shown in Equation 5.
$$TSFST(S) = \begin{pmatrix} d_{1,1} & d_{1,2} & \ldots & d_{1,N} & mf_{u(1,1)} & mf_{u(1,2)} & mf_{u(1,3)} & mf_{u(1,4)} \\ d_{2,1} & d_{2,2} & \ldots & d_{2,N} & mf_{u(2,1)} & mf_{u(2,2)} & mf_{u(2,3)} & mf_{u(2,4)} \\ \vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \vdots & \vdots \\ d_{X,1} & d_{X,2} & \ldots & d_{X,N} & mf_{u(X,1)} & mf_{u(X,2)} & mf_{u(X,3)} & mf_{u(X,4)} \end{pmatrix} \qquad (5)$$
Such features are used both in the training and evaluation steps of the model. The model chosen in our TSFST approach is the Gradient Boosting (GB) algorithm [65]. We chose this algorithm mainly because it follows the idea of boosting. In other words, the classifier is initially composed of weak learning models, which are kept for the observations they classify successfully. Then, a new learner is created and trained on the data that was poorly classified before. Decision trees are usually used as the weak learners in GB. Performance experiments done on the in-sample part of each dataset support the choice of such a model, as will be discussed in further detail later in this work.
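The boosting idea can be illustrated with a deliberately small sketch. It is not the Gradient Boosting implementation used in the paper: it assumes squared loss on 0/1 labels and one-split decision stumps as weak learners, and the function names and toy data are hypothetical. Each new stump is fitted to the residuals left by the previous ones, which is the "train on what was poorly predicted before" step described above.

```python
def fit_stump(X, residuals):
    """Find the single-feature threshold split that best fits the residuals (least squares)."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or err < best[0]:
                best = (err, j, t, lm, rm)
    return best[1:] if best else None


def fit_gradient_boosting(X, y, n_trees=20, learning_rate=0.5):
    """Additive model of stumps; each stump is trained on the current residuals."""
    base = sum(y) / len(y)
    scores = [base] * len(y)
    stumps = []
    for _ in range(n_trees):
        # For squared loss, the negative gradient is simply the residual y - score.
        residuals = [yi - si for yi, si in zip(y, scores)]
        stump = fit_stump(X, residuals)
        if stump is None:
            break
        j, t, lm, rm = stump
        stumps.append(stump)
        scores = [s + learning_rate * (lm if row[j] <= t else rm)
                  for row, s in zip(X, scores)]
    return base, learning_rate, stumps


def predict(model, row):
    """Return 1 (e.g. unreliable) when the boosted score reaches 0.5, else 0."""
    base, lr, stumps = model
    score = base + sum(lr * (lm if row[j] <= t else rm) for j, t, lm, rm in stumps)
    return 1 if score >= 0.5 else 0


# Toy, already discretized data (illustrative only).
X = [[1, 5], [2, 4], [3, 7], [6, 1], [7, 3], [8, 2]]
y = [0, 0, 0, 1, 1, 1]
model = fit_gradient_boosting(X, y)
print([predict(model, row) for row in X])
```

In practice one would rely on an established library implementation with full decision trees and a classification loss; the sketch only makes the residual-fitting loop visible.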
3.4 Data Classification
The last step of our approach applies the new TSFST-based classifier to the evaluation (unknown) data, in order to maximize the $\alpha$ metric discussed in Equation 1. For that, we consider again $S$ as the classified samples, which will be the training (or known) samples, but now we also define $\bar{S}$, which is a new set of unclassified (or unknown) samples to be evaluated by the TSFST classifier. Such a procedure is done as shown in Algorithm 1.
Algorithm 1 TSFST classification pipeline
Input: cla = classifier, $S$ = classified (training) instances, $\bar{S}$ = unclassified instances, $u$ = meta-features to calculate, $r$ = upper bound of the discretization process.
Output: out = classification of instances in $\bar{S}$
1: procedure TSFSTCLASSIFICATION(cla, $S$, $\bar{S}$, $u$, $r$)
2:   $MF \leftarrow getMetaFeatures(S, u)$    ▷ Step #1 (enrichment) of TSFST model in the training data
3:   $S \leftarrow concat(S, MF)$    ▷ Concat original data with meta-features found
4:   $\hat{S} \leftarrow getDiscretizedFeatures(S, r)$    ▷ Step #2 (discretization) of TSFST model in the training data
5:   $model \leftarrow ClassifierTraining(cla, \hat{S})$    ▷ Classifier training using the TSFST transformed training data
6:   $MF' \leftarrow getMetaFeatures(\bar{S}, u)$    ▷ Repeat TSFST procedure in the testing samples
7:   $\bar{S} \leftarrow concat(\bar{S}, MF')$
8:   $\hat{\bar{S}} \leftarrow getDiscretizedFeatures(\bar{S}, r)$
9:   for each $\hat{\bar{s}} \in \hat{\bar{S}}$ do    ▷ Classifier evaluation in each TSFST transformed testing sample
10:    $c \leftarrow classify(model, \hat{\bar{s}})$
11:    out.add(c)
12:  end for
13:  return out
14: end procedure
In step 1 of this algorithm, the following parameters are used as input: (i) the classification algorithm cla to be trained and tested using the TSFST feature set; (ii) the training classified data $S$ in its original format; (iii) the new instances to be classified $\bar{S}$, also in their original format; and (iv) the TSFST parameters, namely the meta-features $u$ to be used in the enrichment step and the upper bound $r$ to be used in the discretization step. The data transformation related to our TSFST approach is performed for the training data $S$ and the testing data $\bar{S}$ at steps 2-4 and 6-8 respectively, and the transformed training data $\hat{S}$ trains the model cla at step 5. The classification process is performed at steps 9-12 for each instance $\hat{\bar{s}}$ in the transformed testing set $\hat{\bar{S}}$, with the final classifications stored in the out vector. At the end of the process, the classification labels generated by our proposed boosted classifier are returned by the algorithm at step 13.
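The control flow of Algorithm 1 can be sketched as follows. Several choices here are assumptions made for the sake of a runnable example: the classifier is a tiny 1-nearest-neighbour stand-in rather than Gradient Boosting, the discretization is a linear rescaling to $0, \ldots, r$, and the per-column bounds learned on the training data are reused for the test data (the algorithm itself only passes $r$). All function and class names are hypothetical.

```python
from statistics import mean, stdev


def get_meta_features(samples):
    """Enrichment: per-sample min, max, mean and (sample) standard deviation."""
    return [[min(s), max(s), mean(s), stdev(s)] for s in samples]


def concat(samples, meta):
    return [list(s) + m for s, m in zip(samples, meta)]


def column_bounds(samples):
    """Per-column (min, max), taken from the training data only (assumption)."""
    return [(min(col), max(col)) for col in zip(*samples)]


def get_discretized_features(samples, bounds, r):
    """Discretization: rescale each column to 0..r and round to the nearest integer."""
    out = []
    for s in samples:
        row = []
        for v, (lo, hi) in zip(s, bounds):
            if hi == lo:
                row.append(0)
            else:
                clipped = min(max(v, lo), hi)
                row.append(round((clipped - lo) / (hi - lo) * r))
        out.append(row)
    return out


class NearestNeighbour:
    """Stand-in classifier (1-NN); the paper uses Gradient Boosting instead."""

    def fit(self, X, y):
        self.X, self.y = X, y
        return self

    def predict_one(self, row):
        dists = [sum((a - b) ** 2 for a, b in zip(row, x)) for x in self.X]
        return self.y[dists.index(min(dists))]


def tsfst_classify(cla, S, labels, S_bar, r):
    S_enriched = concat(S, get_meta_features(S))                     # steps 2-3
    bounds = column_bounds(S_enriched)
    S_hat = get_discretized_features(S_enriched, bounds, r)          # step 4
    model = cla.fit(S_hat, labels)                                   # step 5
    S_bar_enriched = concat(S_bar, get_meta_features(S_bar))         # steps 6-7
    S_bar_hat = get_discretized_features(S_bar_enriched, bounds, r)  # step 8
    return [model.predict_one(s) for s in S_bar_hat]                 # steps 9-13


# Toy data (illustrative only).
S = [[1.0, 2.0, 1.5], [2.0, 1.0, 2.5], [8.0, 9.0, 7.5], [9.0, 8.0, 9.5]]
labels = ["reliable", "reliable", "unreliable", "unreliable"]
S_bar = [[1.5, 1.5, 2.0], [8.5, 9.5, 8.0]]
out = tsfst_classify(NearestNeighbour(), S, labels, S_bar, r=10)
print(out)
```

The important point is that the same two transformations are applied, in the same order, to both the training and the unseen data before the classifier sees them.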
In order to evaluate the impact of the proposed approach on the response time of a real-time scoring system, we evaluated the asymptotic time complexity of Algorithm 1 in big-O notation. According to the formal notation provided in Section 3, we can make the following observations:
(i) the complexity of steps 2-4 and 6-8 is $O(N)$, since our TSFST data transformation performs a discretization of the original feature values $F$ $N \cdot |F|$ times, after several meta-features are added to them;
(ii) the complexity of step 5 depends on the adopted algorithm, which in our case is Gradient Boosting, an algorithm characterized by a training complexity of $O(N \cdot |F| \cdot \pi)$, where $\pi$ denotes the number of trees used;
(iii) the complexity of the cycle in steps 9-12 is $O(N^2)$, since it involves the prediction complexity of Gradient Boosting (i.e., $O(N \cdot \pi)$) for each instance in the set $\bar{S}$.
On the basis of these observations, we can express the asymptotic time complexity of the algorithm as $O(N^2)$, which can be reduced by distributing the process over different machines, by employing large-scale distributed computing models (e.g., MapReduce [66, 67]).
4 Experimental Setup
In this section, we discuss the experimental environment we considered to perform the credit scoring experiments. In the following subsections we discuss: (i) the datasets considered; (ii) the metrics used to assess performance; (iii) the methodology used to evaluate the methods; and (iv) the models considered and the implementation aspects of the proposed TSFST method.
4.1 Datasets
We consider three datasets to evaluate our approach: (i) the Australian Credit Approval (AC); (ii) the German Credit (GC); and (iii) the Default of Credit Card Clients (DC). These are three real-world datasets, characterized by a different number of instances and features, and also by a different level of data unbalance. Such datasets are publicly available, and previous works in the literature have used them to benchmark their approaches. Their composition is described in Table 1.
Table 1. Datasets information
Dataset   Total       Reliable    Unreliable   Number of   Number of   Reliable/unreliable
name      instances   instances   instances    features    classes     instances (%)
          |S|         |S+|        |S-|         |F|         |C|
AC        690         307         383          14          2           44.50 / 55.50
GC        1,000       700         300          21          2           70.00 / 30.00
DC        30,000      23,364      6,636        23          2           77.88 / 22.12
The AC dataset is composed of 690 instances, of which 307 are classified as reliable (44.50%) and 383 as unreliable (55.50%), and each instance is composed of 14 features, as detailed in Table 2. For data confidentiality reasons, feature names and values have been changed to meaningless symbols.
Table 2. Features of AC Dataset
Field   Type                Field   Type
01      Categorical field   08      Categorical field
02      Continuous field    09      Categorical field
03      Continuous field    10      Continuous field
04      Categorical field   11      Categorical field
05      Categorical field   12      Categorical field
06      Categorical field   13      Continuous field
07      Continuous field    14      Continuous field
The GC dataset is composed of 1,000 instances, of which 700 are classified as reliable (70.00%) and 300 as unreliable (30.00%), and each instance is composed of 20 features, as detailed in Table 3 below.
Table 3. Features of GC Dataset [19]
Field   Feature                      Field   Feature
01      Status of checking account   11      Present residence since
02      Duration                     12      Property
03      Credit history               13      Age
04      Purpose                      14      Other installment plans
05      Credit amount                15      Housing
06      Savings account/bonds        16      Existing credits
07      Present employment since     17      Job
08      Installment rate             18      Maintained people
09      Personal status and sex      19      Telephone
10      Other debtors/guarantors     20      Foreign worker
Finally, the DC dataset is composed of 30,000 instances, of which 23,364 are classified as reliable (77.88%) and 6,636 as unreliable (22.12%), and each instance is composed of 23 features, as detailed in Table 4.
Table 4. Features of DC Dataset [19]
Field   Feature                            Field   Feature
01      Credit amount                      13      Bill statement in August 2005
02      Gender                             14      Bill statement in July 2005
03      Education                          15      Bill statement in June 2005
04      Marital status                     16      Bill statement in May 2005
05      Age                                17      Bill statement in April 2005
06      Repayments in September 2005       18      Amount paid in September 2005
07      Repayments in August 2005          19      Amount paid in August 2005
08      Repayments in July 2005            20      Amount paid in July 2005
09      Repayments in June 2005            21      Amount paid in June 2005
10      Repayments in May 2005             22      Amount paid in May 2005
11      Repayments in April 2005           23      Amount paid in April 2005
12      Bill statement in September 2005
4.2 Metrics
The literature in machine learning has investigated several different metrics over the last decades, in order to find criteria suitable for a correct performance evaluation of credit scoring models [68]. In [69], several metrics based on the confusion matrix were considered, such as Accuracy, True Positive Rate (TPR), Specificity, or the Matthews Correlation Coefficient (MCC). The authors in [70] choose metrics based on error analysis, such as the Mean Square Error (MSE), the Root Mean Square Error (RMSE) or the Mean Absolute Error (MAE). Finally, some works, such as [71], evaluate metrics based on the Receiver Operating Characteristic (ROC) curve, such as the Area Under the ROC Curve (AUC). Considering that some of these metrics do not work well with unbalanced datasets, such as the metrics based on the confusion matrix, many works in the literature address the problem of unbalanced datasets by adopting more than one metric to correctly evaluate their results [72].
In our work, we follow that direction and adopt a hybrid strategy to measure the
performance of the tested approaches. The chosen metrics are based on confusion matrix
results and on the ROC curve, and are described in the following:
True Positive Rate Given TP, the number of instances correctly classified as unreliable,
and FN, the number of unreliable instances wrongly classified as reliable, the True
Positive Rate (TPR) measures the rate of correct classification of unreliable users by a
credit scoring model m on any test set S, as shown in Equation 6:
$$TPR_m(S) = \frac{TP}{TP + FN}. \tag{6}$$
Such a metric, also known as Sensitivity, indicates the proportion of instances from
the positive class that are correctly classified by an evaluation model, according to the
different classes of a given problem [36].
Matthews Correlation Coefficient The Matthews Correlation Coefficient (MCC) is
suitable for unbalanced problems [73, 74], as it provides a balanced evaluation of
performance. Its formalization, shown in Equation 7, results in a value in the range
[−1, +1], with +1 when all the classifications are correct and −1 when all of them are
wrong, whereas 0 indicates the performance of a random predictor. Given TN, the number of
instances correctly classified as reliable, and FP, the number of reliable instances
wrongly classified as unreliable, the MCC of a model m that classifies any new set S is
calculated as:
$$MCC_m(S) = \frac{(TP \cdot TN) - (FP \cdot FN)}{\sqrt{(TP + FP) \cdot (TP + FN) \cdot (TN + FP) \cdot (TN + FN)}}. \tag{7}$$
It should be observed that MCC can be seen as a discretization of the Pearson
correlation [75] for binary variables.
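As an illustration of Equations 6 and 7, the following minimal sketch computes TPR and MCC directly from the four confusion-matrix counts. The function names and the choice of returning 0 when a denominator is zero are assumptions made for this example; they are not taken from the authors' code.

```python
import math


def tpr(tp: int, fn: int) -> float:
    """True Positive Rate (Equation 6): TP / (TP + FN)."""
    total = tp + fn
    # Assumption: return 0.0 when there are no positive (unreliable) instances.
    return tp / total if total else 0.0


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews Correlation Coefficient (Equation 7)."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumption: a zero denominator is mapped to 0.0 (a common convention).
    return ((tp * tn) - (fp * fn)) / denominator if denominator else 0.0


if __name__ == "__main__":
    print(tpr(tp=8, fn=2))                # 8 of 10 unreliable instances found
    print(mcc(tp=5, tn=5, fp=0, fn=0))    # every classification correct
    print(mcc(tp=0, tn=0, fp=5, fn=5))    # every classification wrong
    print(mcc(tp=25, tn=25, fp=25, fn=25))  # no better than chance
```

The last three calls cover the two extremes and the midpoint of the MCC range.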
AUC The Area Under the Receiver Operating Characteristic curve (AUC) is a reliable
metric for evaluating the performance of a credit scoring model [76, 69]. To calculate
it, the Receiver Operating Characteristic (ROC) curve is first built by plotting the
Sensitivity against the False Positive Rate at different classification thresholds; the
area under that curve is then calculated.
The AUC metric returns a value in the range [0,1], where 1 denotes the best performance.
AUC is able to assess the predictive capability of an evaluation model even in the
presence of unbalanced data [76].
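The two-stage construction described above (build the ROC curve, then integrate it) can be sketched in plain Python as follows. This is only an illustrative implementation under stated assumptions: labels use 1 for the positive (unreliable) class and 0 for the negative one, both classes are present, and the area is obtained with the trapezoidal rule. The function names are hypothetical.

```python
def roc_points(labels, scores):
    """Return the (FPR, TPR) points obtained by lowering the threshold over the scores.

    labels: 1 for the positive (unreliable) class, 0 for the negative class.
    scores: higher values mean "more likely positive".
    Assumes that both classes occur at least once.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every instance tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(labels, scores):
    """Area under the ROC curve, computed with the trapezoidal rule."""
    pts = roc_points(labels, scores)
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


if __name__ == "__main__":
    print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```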
Performance This metric is used in our work to compare the performance of the different
classifiers. It summarizes all the metrics presented above, over all the datasets, in a
single final value. Considering ζ the number of datasets in the experiments, the final
performance P of a method m on any set S is calculated as:
$$P_m(S) = \frac{\sum_{z=1}^{\zeta} \frac{TPR(S)_z + AUC(S)_z + MCC(S)_z}{3}}{\zeta}.$$
We also calculate the performance of a model m on a single dataset z as follows:
$$P_{m,z}(S) = \frac{TPR(S)_z + AUC(S)_z + MCC(S)_z}{3}.$$
Since TPR and AUC lie in the range [0,1], such a metric also returns a value in the range
[0,1] whenever MCC is non-negative; in general, because MCC lies in [−1, +1], its lower
bound is −1/3.
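A minimal sketch of this aggregate metric is given below. It assumes that "summarizing" means taking the arithmetic mean of the three metrics on each dataset and then averaging over the ζ datasets; the function names and the metric values in the usage example are purely illustrative and are not results from the paper.

```python
def performance_single(tpr: float, auc: float, mcc: float) -> float:
    """Performance on one dataset: mean of TPR, AUC and MCC (assumed reading)."""
    return (tpr + auc + mcc) / 3


def performance_overall(per_dataset) -> float:
    """Performance over all datasets: mean of the single-dataset values.

    per_dataset: iterable of (tpr, auc, mcc) tuples, one tuple per dataset.
    """
    values = [performance_single(*metrics) for metrics in per_dataset]
    return sum(values) / len(values)


if __name__ == "__main__":
    # Illustrative values only, not taken from the experiments.
    toy_metrics = [(0.90, 0.80, 0.70), (0.60, 0.70, 0.20)]
    print(performance_single(*toy_metrics[0]))
    print(performance_overall(toy_metrics))
```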
4.3 Experimental Methodology
We choose an evaluation criterion that divides each dataset into two parts: (i) the
in-sample part, used to identify the best model to compare with and without our TSFST
method, as well as the best parameters of our method; and (ii) the out-of-sample part,
which we use for the final evaluation. Such a strategy allows a correct evaluation of the
results by preventing the algorithm from yielding results biased by over-fitting [20].
This evaluation procedure has also been followed by other works in the literature [77].
For this reason, each of the adopted datasets has been divided into an in-sample
part, containing 80% of the dataset, and an out-of-sample part, containing the remaining
20%. We opt for this data split following some works in the literature [78–80]. In
addition, with the aim of further reducing the impact of data dependency, we have adopted
a k-fold cross-validation criterion (k=5) inside each in-sample subset. Information about
these subsets is reported in Table 5.
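The splitting protocol can be sketched with the standard library alone, as below. The sketch assumes a plain shuffled 80/20 split followed by k=5 folds over the in-sample indices; whether the original experiments shuffled or stratified the data is not stated, so those choices, like the function names, are assumptions. The seed value 1 mirrors the seed reported for the experiments.

```python
import random


def in_out_split(n_instances: int, in_fraction: float = 0.8, seed: int = 1):
    """Shuffle the instance indices and split them into in-sample / out-of-sample."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    cut = int(round(n_instances * in_fraction))
    return indices[:cut], indices[cut:]


def k_folds(indices, k: int = 5):
    """Yield (train, validation) index lists for k-fold cross-validation."""
    folds = [indices[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


if __name__ == "__main__":
    in_sample, out_of_sample = in_out_split(1000)
    print(len(in_sample), len(out_of_sample))
    for train, validation in k_folds(in_sample, k=5):
        print(len(train), len(validation))
```

The out-of-sample indices are never passed to `k_folds`, which keeps the final evaluation data separate from model and parameter selection.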
Table 5. In-sample and out-of-sample datasets information
         In-sample                            Out-of-sample
Dataset  Reliable  %     Unreliable  %        Reliable  %     Unreliable  %
AC       124       45.0  152         55.0     125       45.5  150         54.5
GC       292       73.0  108         27.0     268       67.2  131         32.8
DC       9307      77.5  2693        22.5     9404      78.4  2595        21.6
4.4 Considered Models and Implementation Details
In order to evaluate the proposed transforming approach, we consider several models,
represented by machine learning classifiers, in order to select the best one to use in
our approach and to compare its performance before and after our TSFST approach is
applied. For this task, we have taken into account the following machine learning
algorithms, widely used in the credit scoring literature: (i) Gradient Boosting (GB) [81];
(ii) Adaptive Boosting (AD) [82]; (iii) Random Forests (RF) [83]; (iv) Multilayer
Perceptron (MLP) [84]; and (v) Decision Tree (DT) [85].
The code related to the experiments was written in Python using the scikit-learn
library. For the discretization process, we used the np.digitize() function, which
converts the features to a discrete space according to where each feature value is located
in an interval of bins. Such bins are defined as bins = {0, 1, ..., r−2, r−1}, where
r is calculated experimentally (we show how we find r in Section 5.1). In order
to keep the experiments reproducible, we have fixed the seed of the pseudo-random
number generator to 1. In our proposed method, we fixed |u| = 4, calculating the four
meta-features described in Section 3.
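The discretization step can be illustrated with a short NumPy sketch. It assumes that the bin edges are exactly the integers 0, 1, ..., r−1 and that np.digitize() is called with its default right=False; any rescaling of the features before this step is not described in the text and is therefore left out. The helper name `discretize` is hypothetical.

```python
import numpy as np


def discretize(features: np.ndarray, r: int) -> np.ndarray:
    """Map every feature value to the index of the bin it falls into.

    The bin edges are 0, 1, ..., r-1. With the default right=False, a value x
    gets index i when bins[i-1] <= x < bins[i]; values below 0 get index 0 and
    values greater than or equal to r-1 get index r.
    """
    bins = np.arange(r)
    return np.digitize(features, bins)


if __name__ == "__main__":
    # Toy feature matrix (two instances, two features), for illustration only.
    X = np.array([[0.2, 3.7],
                  [12.0, -1.5]])
    print(discretize(X, r=5))
```

With r=5, every value at or above 4 collapses into the same top bin, which is how a small r reduces the number of distinct feature patterns.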
5 Experimental Results
To validate our proposed approach, we performed an extensive series of experiments.
We classified the experiments as follows:
1. Experiments performed in the in-sample part of each dataset: used to assess the
benefits of our approach according to several configurations of parameters in credit
scoring. For that, we average results of a five-fold cross validation.
2. Experiments performed in the out-of-sample part of each dataset: used to compare
our approach with some baselines in real credit scoring. For this experiment, we
used the in-sample part to train and unknown out-of-sample data to test.
5.1 In-sample Experiments
In this set of experiments, which uses cross-validation in the in-sample part of each
dataset, we consider the following evaluation scenarios:
1. We evaluate the advantages, in terms of performance, of adopting some
canonical data preprocessing techniques as input to our data transform approach;
2. We report results in order to find the best parameter r of the discretization step of
our proposed approach.
We discuss the results of these experiments as follows.
Preprocessing Benchmarking The literature strongly suggests the use of several
preprocessing techniques [86, 87] to better organize the data distribution, so as to
improve training and boost the performance of machine learning algorithms. One
straightforward way of doing this is to put the feature values in the same range.
Therefore, we decided to verify the performance improvement related to the adoption of two
widely used preprocessing methods: normalization and standardization. In the normalization
process, each feature f ∈ F is scaled into the range [0,1], whereas standardization
(also known as Z-score normalization) re-scales the feature values so that they have
a mean equal to zero and a standard deviation equal to one, as in a standard Gaussian
distribution.
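The two preprocessing methods can be sketched for a single feature column as follows. This is a plain-Python illustration, not the authors' code: the function names are hypothetical, the population standard deviation is assumed for standardization, and constant columns are mapped to zeros to avoid a division by zero.

```python
from statistics import mean, pstdev


def normalize(values):
    """Min-max normalization: rescale the values into the range [0, 1]."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        # Assumption: a constant feature carries no information and maps to 0.
        return [0.0 for _ in values]
    return [(v - lowest) / (highest - lowest) for v in values]


def standardize(values):
    """Z-score standardization: zero mean and unit standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


if __name__ == "__main__":
    column = [10, 20, 30]  # toy feature values, for illustration only
    print(normalize(column))
    print(standardize(column))
```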
Performance results are shown in Table 6, which reports the mean performance of
the five-fold cross-validation (i.e., related to the Accuracy, MCC, and AUC metrics)
measured for all datasets and all algorithms after the application of the aforementioned
data preprocessing methods, along with that measured without any data preprocessing.
The best performance in each row is marked with an asterisk, and all the experiments
involve only the in-sample part of each dataset. On the basis of the obtained results, we
can make the following observations:
- data normalization and standardization do not lead to significant improvements,
since 7 times out of 15 (against 4 out of 15 for normalization and 4 out of 15 for
standardization) we obtain a better performance without using any canonical data
preprocessing;
- in the experiments performed without data preprocessing, Gradient Boosting (GB,
reported as GBC in Table 6) is the best algorithm among those taken into account, since
it achieves the best mean performance over all datasets (i.e., 0.6574 against 0.6431 of
ADA, 0.6388 of RFA, 0.5317 of MLP, and 0.6147 of DTC);
- for the aforementioned reasons, we decided not to apply any data preprocessing
method, using Gradient Boosting as the reference algorithm to evaluate our approach.
Table 6. Average performance with preprocessing
Algorithm  Dataset  Non-preprocessed  Normalized  Standardized
GBC        AC       0.8018*           0.8005      0.8000
ADA        AC       0.7495*           0.6735      0.7179
RFA        AC       0.8011            0.7505      0.8120*
MLP        AC       0.5225            0.8079*     0.8073
DTC        AC       0.7662            0.7093      0.7690*
GBC        GC       0.5614            0.5942      0.6007*
ADA        GC       0.5766            0.6246*     0.5861
RFA        GC       0.5540            0.5614*     0.5579
MLP        GC       0.6114*           0.5649      0.5589
DTC        GC       0.5796*           0.5456      0.5521
GBC        DC       0.6087*           0.5442      0.6076
ADA        DC       0.6031*           0.5361      0.5980
RFA        DC       0.5613*           0.4909      0.5586
MLP        DC       0.4613            0.6177*     0.5985
DTC        DC       0.4982            0.4572      0.5185*
Best cases          7                 4           4
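The two summary figures drawn from Table 6 (the number of best cases per column and the mean performance per algorithm without preprocessing) can be recomputed from the table entries with the short script below. It assumes that the mean performance over all datasets is the arithmetic mean of the three per-dataset values; the variable and function names are illustrative.

```python
# Table 6 entries: (algorithm, dataset) -> (non-preprocessed, normalized, standardized)
TABLE6 = {
    ("GBC", "AC"): (0.8018, 0.8005, 0.8000),
    ("ADA", "AC"): (0.7495, 0.6735, 0.7179),
    ("RFA", "AC"): (0.8011, 0.7505, 0.8120),
    ("MLP", "AC"): (0.5225, 0.8079, 0.8073),
    ("DTC", "AC"): (0.7662, 0.7093, 0.7690),
    ("GBC", "GC"): (0.5614, 0.5942, 0.6007),
    ("ADA", "GC"): (0.5766, 0.6246, 0.5861),
    ("RFA", "GC"): (0.5540, 0.5614, 0.5579),
    ("MLP", "GC"): (0.6114, 0.5649, 0.5589),
    ("DTC", "GC"): (0.5796, 0.5456, 0.5521),
    ("GBC", "DC"): (0.6087, 0.5442, 0.6076),
    ("ADA", "DC"): (0.6031, 0.5361, 0.5980),
    ("RFA", "DC"): (0.5613, 0.4909, 0.5586),
    ("MLP", "DC"): (0.4613, 0.6177, 0.5985),
    ("DTC", "DC"): (0.4982, 0.4572, 0.5185),
}
COLUMNS = ("Non-preprocessed", "Normalized", "Standardized")


def best_cases(table):
    """Count, for each column, the rows in which that column has the top score."""
    counts = [0, 0, 0]
    for scores in table.values():
        counts[scores.index(max(scores))] += 1
    return counts


def mean_without_preprocessing(table):
    """Average the non-preprocessed score of each algorithm over the datasets."""
    totals, sizes = {}, {}
    for (algorithm, _dataset), scores in table.items():
        totals[algorithm] = totals.get(algorithm, 0.0) + scores[0]
        sizes[algorithm] = sizes.get(algorithm, 0) + 1
    return {algorithm: totals[algorithm] / sizes[algorithm] for algorithm in totals}


if __name__ == "__main__":
    print(dict(zip(COLUMNS, best_cases(TABLE6))))
    for algorithm, value in mean_without_preprocessing(TABLE6).items():
        print(algorithm, round(value, 4))
```

From the four-decimal table entries, the GBC mean comes out as 0.6573, one unit in the last digit below the 0.6574 quoted in the text, which is compatible with the table entries having been rounded.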
Discretization Range Experiments The goal of this set of experiments is to define the
optimal discretization range r to use with the selected classification algorithm.
Figure 4 reports the obtained results in terms of the performance metric for each
dataset. These results indicate 106, 25, and 187 as the optimal r values for the AC, GC,
and DC datasets, respectively.
[Figure 4: three panels, one each for the AC, GC, and DC datasets; x-axis: r, from 0 to 200]
Fig. 4. In-sample r Value Definition
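The selection of r can be read as an exhaustive scan over candidate values that keeps the one with the highest cross-validated performance. The sketch below shows only that selection logic. The scoring function is a hypothetical placeholder supplied by the caller (in the experiments it would discretize the in-sample data with the given r and return the mean five-fold performance), and the candidate range is an assumption based on the axis of Figure 4.

```python
from typing import Callable, Iterable, Optional, Tuple


def select_r(candidates: Iterable[int],
             score_fn: Callable[[int], float]) -> Tuple[Optional[int], float]:
    """Return the candidate r with the highest score, together with that score.

    score_fn is a caller-supplied function (hypothetical here) that evaluates
    one value of r, for example by cross-validation on the in-sample data.
    Ties are resolved in favour of the first candidate encountered.
    """
    best_r, best_score = None, float("-inf")
    for r in candidates:
        score = score_fn(r)
        if score > best_score:
            best_r, best_score = r, score
    return best_r, best_score


if __name__ == "__main__":
    # Toy scoring function with a single peak, for illustration only.
    def toy_score(r: int) -> float:
        return 1.0 - abs(r - 25) / 200

    print(select_r(range(2, 201), toy_score))
```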
5.2 Out-of-sample Experiments
Now that we have found the discretization parameter of our proposed approach, we focus
our attention on the more realistic credit scoring scenario. To this end, we test on the
unseen out-of-sample part of each dataset, comparing the effectiveness of our approach
with that of its competitor. We apply the algorithm and the r value detected through the
previous experiments in order to evaluate the capability of the proposed TSFST model
with regard to a canonical data model (GB), based on the original feature space. The
analysis of the experimental results shown in Figure 5 leads us to the following
considerations:
1. as shown in Figure 5, the proposed TSFST model outperforms its competitor in
terms of TPR, MCC, and AUC in all the datasets, except for a single case (i.e., TPR
in the DC dataset);
2. although it does not outperform its competitor in terms of TPR in the DC dataset,
its better performance in terms of MCC and AUC indicates that the best competitor
value has come with a greater number of false positives and/or negatives;
3. for the same reason, the better performance of our approach in terms of TPR
cannot be considered a side effect of an increase in the false positive rate and/or
false negative rate, since we also outperform the competitor in terms of MCC and AUC;
4. considering that the AC, GC, and DC datasets differ in terms of data size,
level of balancing, and number of features, the obtained results show the effectiveness
of the proposed approach in heterogeneous credit scoring contexts;
5. the adopted validation method, based on the in-sample/out-of-sample strategy
combined with a k-fold cross-validation criterion, supports the real effectiveness of
the proposed approach, since the performance has been evaluated on data never used
before, avoiding over-fitting.
[Figure 5: three bar-chart panels, Metrics (AC Dataset), Metrics (GC Dataset), and Metrics (DC Dataset), each comparing the canonical model with TSFST; the only bar labels that survive are 0.7625 and 0.7719 in the AC panel]
Fig. 5. Out-of-sample classification results, comparing GB with and without our proposed TSFST
In summary, the experimental results show that the proposed approach improves the
performance of a machine learning algorithm in the credit scoring context, which allows
it to be exploited in several state-of-the-art approaches.
6 Conclusion
The growth of credit in today's economy requires scoring tools that allow reliable
loans in complex scenarios. Such a need has led to an increasing number of studies
proposing new methods and strategies. Notwithstanding, similarly to other applications
such as fraud detection or intrusion detection, a naturally imbalanced distribution of
data among the classes of interest is commonly found in credit scoring datasets. Such a
limitation raises issues in models, which can become biased toward always assigning
samples to the class that is most represented in their training data. In such a
scenario, even a slight performance improvement of a classification model can produce
considerable advantages, which in our case are related to the reduction of financial
losses.
In this work, we report new research inspired by our previous findings [19]. We
propose a method composed of a twofold transforming process for credit scoring data,
which transforms the features by adding meta-features and then discretizing the
resulting new feature space. From our experiments, we draw the following conclusions:
(i) our approach boosts classifiers that use original features; (ii) it is able to
improve the performance of the machine learning algorithms; (iii) our approach fits
better in boosting-based classifiers such as gradient boosting. Such findings open new
perspectives for the definition of more effective credit scoring solutions, considering
that many state-of-the-art approaches are based on machine learning algorithms.
As future work, we plan to validate the performance of the proposed data model
in the context of credit scoring solutions that implement more than a single machine
learning algorithm, such as, for example, homogeneous and heterogeneous ensemble
approaches. By achieving good results in this new modelling scenario, we believe we
can achieve a more realistic solution for credit scoring.
References
1. Economics, T.: Euro area consumer credit.
area/consumer-credit?continent=europe (2019)
2. Economics, T.: Euro area consumer spending.
area/consumer-spending?continent=europe (2019)
3. Siddiqi, N.: Intelligent credit scoring: Building and implementing better credit risk score-
cards. John Wiley & Sons (2017)
4. Mester, L.J., et al.: What's the point of credit scoring? Business review 3 (1997) 3–16
5. Hassan, M.K., Brodmann, J., Rayfield, B., Huda, M.: Modeling credit risk in credit unions
using survival analysis. International Journal of Bank Marketing 36(3) (2018) 482–495
6. Dal Pozzolo, A., Caelen, O., Le Borgne, Y.A., Waterschoot, S., Bontempi, G.: Learned
lessons in credit card fraud detection from a practitioner perspective. Expert systems with
applications 41(10) (2014) 4915–4928
7. Saia, R., Carta, S., et al.: A frequency-domain-based pattern mining for credit card fraud
detection. In: IoTBDS. (2017) 386–391
8. Saia, R.: A discrete wavelet transform approach to fraud detection. In: International Confer-
ence on Network and System Security, Springer (2017) 464–474
9. Rodda, S., Erothi, U.S.R.: Class imbalance problem in the network intrusion detection sys-
tems. In: 2016 International Conference on Electrical, Electronics, and Optimization Tech-
niques (ICEEOT), IEEE (2016) 2685–2688
10. Saia, R., Carta, S., Recupero, D.R.: A probabilistic-driven ensemble approach to perform
event classification in intrusion detection system. In: KDIR, SciTePress (2018) 139–146
11. Khemakhem, S., Ben Said, F., Boujelbene, Y.: Credit risk assessment for unbalanced datasets
based on data mining, artificial neural network and support vector machines. Journal of
Modelling in Management 13(4) (2018) 932–951
12. Haixiang, G., Yijing, L., Shang, J., Mingyun, G., Yuanyue, H., Bing, G.: Learning from class-
imbalanced data: Review of methods and applications. Expert Systems with Applications 73
(2017) 220–239
13. Saia, R., Carta, S.: A linear-dependence-based approach to design proactive credit scoring
models. In: KDIR. (2016) 111–120
14. Saia, R., Carta, S.: Evaluating credit card transactions in the frequency domain for a proactive
fraud detection approach. In: SECRYPT, SciTePress (2017) 335–342
15. Saia, R., Carta, S.: Introducing a vector space model to perform a proactive credit scoring.
In: International Joint Conference on Knowledge Discovery, Knowledge Engineering, and
Knowledge Management, Springer (2016) 125–148
16. Saia, R., Carta, S.: An entropy based algorithm for credit scoring. In: International Confer-
ence on Research and Practical Issues of Enterprise Information Systems, Springer (2016)
17. Saia, R., Carta, S.: A fourier spectral pattern analysis to design credit scoring models. In:
Proceedings of the 1st International Conference on Internet of Things and Machine Learning,
ACM (2017) 18
18. Saia, R., Carta, S., Fenu, G.: A wavelet-based data analysis to credit scoring. In: Proceedings
of the 2nd International Conference on Digital Signal Processing, ACM (2018) 176–180
19. Saia, R., Carta, S., Recupero, D.R., Fenu, G., Saia, M.: A discretized enriched technique
to enhance machine learning performance in credit scoring. In: KDIR, SciTePress (2019)
20. Hawkins, D.M.: The problem of overfitting. Journal of chemical information and computer
sciences 44(1) (2004) 1–12
21. Henrique, B.M., Sobreiro, V.A., Kimura, H.: Literature review: Machine learning techniques
applied to financial market prediction. Expert Systems with Applications (2019)
22. Crook, J.N., Edelman, D.B., Thomas, L.C.: Recent developments in consumer credit risk
assessment. European Journal of Operational Research 183(3) (2007) 1447–1465
23. Thomas, L.C.: A survey of credit and behavioural scoring: forecasting financial risk of
lending to consumers. International Journal of Forecasting 16(2) (2000) 149 – 172
24. Chen, B., Zeng, W., Lin, Y.: Applications of artificial intelligence technologies in credit
scoring: A survey of literature. In: International Conference on Natural Computation (ICNC).
(Aug 2014) 658–664
25. Lessmann, S., Baesens, B., Seow, H.V., Thomas, L.C.: Benchmarking state-of-the-art clas-
sification algorithms for credit scoring: An update of research. European Journal of Opera-
tional Research 247(1) (2015) 124–136
26. Fang, F., Chen, Y.: A new approach for credit scoring by directly maximizing the
Kolmogorov–Smirnov statistic. Computational Statistics & Data Analysis 133 (2019) 180–194
27. Sohn, S.Y., Kim, D.H., Yoon, J.H.: Technology credit scoring model with fuzzy logistic
regression. Applied Soft Computing 43 (2016) 150–158
28. Khemais, Z., Nesrine, D., Mohamed, M., et al.: Credit scoring and default risk prediction: A
comparative study between discriminant analysis & logistic regression. International Journal
of Economics and Finance 8(4) (2016) 39
29. Laha, A.: Developing credit scoring models with som and fuzzy rule based k-nn classifiers.
In: IEEE International Conference on Fuzzy Systems. (July 2006) 692–698
30. Zhang, X., Yang, Y., Zhou, Z.: A novel credit scoring model based on optimized random for-
est. In: IEEE Annual Computing and Communication Workshop and Conference (CCWC).
(Jan 2018) 60–65
31. Maldonado, S., Peters, G., Weber, R.: Credit scoring using three-way decisions with proba-
bilistic rough sets. Information Sciences (2018)
32. Zhu, B., Yang, W., Wang, H., Yuan, Y.: A hybrid deep learning model for consumer credit
scoring. In: International Conference on Artificial Intelligence and Big Data (ICAIBD).
(May 2018) 205–208
33. Tian, Y., Yong, Z., Luo, J.: A new approach for reject inference in credit scoring using
kernel-free fuzzy quadratic surface support vector machines. Applied Soft Computing 73
(2018) 96 – 105
34. de Castro Vieira, J.R., Barboza, F., Sobreiro, V.A., Kimura, H.: Machine learning models for
credit analysis improvements: Predicting low-income families default. Applied Soft Com-
puting 83 (2019) 105640
35. Liu, C., Huang, H., Lu, S.: Research on personal credit scoring model based on artificial
intelligence. In: International Conference on Application of Intelligent Systems in Multi-
modal Information Analytics, Springer (2019) 466–473
36. Bequé, A., Lessmann, S.: Extreme learning machines for credit scoring: An empirical
evaluation. Expert Systems with Applications 86 (2017) 42–53
37. Pasila, F.: Credit scoring modeling of indonesian micro, small and medium enterprises using
neuro-fuzzy algorithm. In: IEEE International Conference on Fuzzy Systems. (June 2019)
38. Neagoe, V., Ciotec, A., Cucu, G.: Deep convolutional neural networks versus multilayer per-
ceptron for financial prediction. In: International Conference on Communications (COMM).
(June 2018) 201–206
39. Ala’raj, M., Abbod, M.F.: A new hybrid ensemble credit scoring model based on classifiers
consensus system approach. Expert Systems with Applications 64 (2016) 36–55
40. Tripathi, D., Edla, D.R., Cheruku, R.: Hybrid credit scoring model using neighborhood
rough set and multi-layer ensemble classification. Journal of Intelligent & Fuzzy Systems
34(3) (2018) 1543–1549
41. López, J., Maldonado, S.: Profit-based credit scoring based on robust optimization and
feature selection. Information Sciences 500 (2019) 190–202
42. Guo, S., He, H., Huang, X.: A multi-stage self-adaptive classifier ensemble model with
application in credit scoring. IEEE Access 7 (2019) 78549–78559
43. Zhang, H., He, H., Zhang, W.: Classifier selection and clustering with fuzzy assignment in
ensemble model for credit scoring. Neurocomputing 316 (2018) 210 – 221
44. Feng, X., Xiao, Z., Zhong, B., Qiu, J., Dong, Y.: Dynamic ensemble classification for credit
scoring using soft probability. Applied Soft Computing 65 (2018) 139 – 151
45. Tripathi, D., Edla, D.R., Kuppili, V., Bablani, A., Dharavath, R.: Credit scoring model based
on weighted voting and cluster based feature selection. Procedia Computer Science 132
(2018) 22 – 31 International Conference on Computational Intelligence and Data Science.
46. Vedala, R., Kumar, B.R.: An application of naive bayes classification for credit scoring in e-
lending platform. In: International Conference on Data Science Engineering (ICDSE). (July
2012) 81–84
47. Sewwandi, D., Perera, K., Sandaruwan, S., Lakchani, O., Nugaliyadde, A., Thelijjagoda,
S.: Linguistic features based personality recognition using social media data. In: 2017 6th
National Conference on Technology and Management (NCTM). (Jan 2017) 63–68
48. Sun, X., Liu, B., Cao, J., Luo, J., Shen, X.: Who am i? personality detection based on deep
learning for texts. In: IEEE International Conference on Communications (ICC). (May 2018)
49. Boratto, L., Carta, S., Fenu, G., Saia, R.: Using neural word embeddings to model user
behavior and detect user segments. Knowledge-based systems 108 (2016) 5–14
50. Zhao, Y., Shen, Y., Huang, Y.: Dmdp: A dynamic multi-source default probability prediction
framework. Data Science and Engineering 4(1) (2019) 3–13
51. López, R.F., Ramon-Jeronimo, J.M.: Modelling credit risk with scarce default data: on the
suitability of cooperative bootstrapped strategies for small low-default portfolios. JORS
65(3) (2014) 416–434
52. Lika, B., Kolomvatsos, K., Hadjiefthymiades, S.: Facing the cold start problem in recom-
mender systems. Expert Syst. Appl. 41(4) (2014) 2065–2073
53. Son, L.H.: Dealing with the new user cold-start problem in recommender systems: A com-
parative review. Inf. Syst. 58 (2016) 87–104
54. Fernández-Tobías, I., Tomeo, P., Cantador, I., Noia, T.D., Sciascio, E.D.: Accuracy and
diversity in cross-domain recommendations for cold-start users with positive-only feedback.
In Sen, S., Geyer, W., Freyne, J., Castells, P., eds.: Proceedings of the 10th ACM Conference
on Recommender Systems, Boston, MA, USA, September 15-19, 2016, ACM (2016) 119–
55. Attenberg, J., Provost, F.J.: Inactive learning?: difficulties employing active learning in prac-
tice. SIGKDD Explorations 12(2) (2010) 36–41
56. Thanuja, V., Venkateswarlu, B., Anjaneyulu, G.: Applications of data mining in customer
relationship management. Journal of Computer and Mathematical Sciences Vol 2(3) (2011)
57. He, H., Garcia, E.A.: Learning from imbalanced data. IEEE Trans. Knowl. Data Eng. 21(9)
(2009) 1263–1284
58. Japkowicz, N., Stephen, S.: The class imbalance problem: A systematic study. Intelligent
Data Analysis 6(5) (2002) 429–449
59. Chatterjee, A., Segev, A.: Data manipulation in heterogeneous databases. ACM SIGMOD
Record 20(4) (1991) 64–68
60. Giraud-Carrier, C., Vilalta, R., Brazdil, P.: Introduction to the special issue on meta-learning.
Machine learning 54(3) (2004) 187–193
61. Vilalta, R., Drissi, Y.: A perspective view and survey of meta-learning. Artificial intelligence
review 18(2) (2002) 77–95
62. Liu, H., Hussain, F., Tan, C.L., Dash, M.: Discretization: An enabling technique. Data
mining and knowledge discovery 6(4) (2002) 393–423
63. García, S., Ramírez-Gallego, S., Luengo, J., Benítez, J.M., Herrera, F.: Big data
preprocessing: methods and prospects. Big Data Analytics 1(1) (2016) 9
64. Wu, X., Kumar, V.: The top ten algorithms in data mining. CRC press (2009)
65. Breiman, L.: Random forests. Machine Learning 45(1) (2001) 5–32
66. Hashem, I.A.T., Anuar, N.B., Gani, A., Yaqoob, I., Xia, F., Khan, S.U.: Mapreduce: Review
and open challenges. Scientometrics 109(1) (2016) 389–422
67. Dean, J., Ghemawat, S.: Mapreduce: simplified data processing on large clusters. Commun.
ACM 51(1) (2008) 107–113
68. Chen, N., Ribeiro, B., Chen, A.: Financial credit risk assessment: a recent review. Artificial
Intelligence Review 45(1) (2016) 1–23
69. Powers, D.: Evaluation: From precision, recall and f-factor to roc, informedness, markedness
& correlation. Mach. Learn. Technol. 2(01 2008)
70. Chai, T., Draxler, R.R.: Root mean square error (rmse) or mean absolute error (mae)?–
arguments against avoiding rmse in the literature. Geoscientific model development 7(3)
(2014) 1247–1250
71. Huang, J., Ling, C.X.: Using auc and accuracy in evaluating learning algorithms. IEEE
Transactions on knowledge and Data Engineering 17(3) (2005) 299–310
72. Jeni, L.A., Cohn, J.F., De La Torre, F.: Facing imbalanced data–recommendations for the use
of performance metrics. In: 2013 Humaine Association Conference on Affective Computing
and Intelligent Interaction, IEEE (2013) 245–251
73. Luque, A., Carrasco, A., Martín, A., de las Heras, A.: The impact of class imbalance in
classification performance metrics based on the binary confusion matrix. Pattern Recognition
91 (2019) 216–231
74. Boughorbel, S., Jarray, F., El-Anbari, M.: Optimal classifier for imbalanced data using
matthews correlation coefficient metric. PloS one 12(6) (2017) e0177678
75. Benesty, J., Chen, J., Huang, Y., Cohen, I.: Pearson correlation coefficient. In: Noise reduc-
tion in speech processing. Springer (2009) 1–4
76. Abellán, J., Castellano, J.G.: A comparative study on base classifiers in ensemble methods
for credit scoring. Expert Systems with Applications 73 (2017) 1–10
77. Rapach, D.E., Wohar, M.E.: In-sample vs. out-of-sample tests of stock return predictability
in the context of data mining. Journal of Empirical Finance 13(2) (2006) 231–247
78. Cleary, S., Hebb, G.: An efficient and functional model for predicting bank distress: In and
out of sample evidence. Journal of Banking & Finance 64 (2016) 101–111
79. Adhikari, R.: A neural network based linear ensemble framework for time series forecasting.
Neurocomputing 157 (2015) 231–242
80. Tamadonejad, A., Abdul-Majid, M., Abdul-Rahman, A., Jusoh, M., Tabandeh, R.: Early
warning systems for banking crises? political and economic stability. Jurnal Ekonomi
Malaysia 50(2) (2016) 31–38
81. Chopra, A., Bhilare, P.: Application of ensemble models in credit scoring models. Business
Perspectives and Research 6(2) (2018) 129–141
82. Xia, Y., Liu, C., Li, Y., Liu, N.: A boosted decision tree approach using bayesian hyper-
parameter optimization for credit scoring. Expert Systems with Applications 78 (2017) 225–
83. Malekipirbazari, M., Aksakalli, V.: Risk assessment in social lending via random forests.
Expert Systems with Applications 42(10) (2015) 4621–4631
84. Luo, C., Wu, D., Wu, D.: A deep learning approach for credit scoring using credit default
swaps. Engineering Applications of Artificial Intelligence 65 (2017) 465–470
85. Damrongsakmethee, T., Neagoe, V.E.: Principal component analysis and relieff cascaded
with decision tree for credit scoring. In: Computer Science On-line Conference, Springer
(2019) 85–95
86. Ghodselahi, A.: A hybrid support vector machine ensemble model for credit scoring. Inter-
national Journal of Computer Applications 17(5) (2011) 1–5
87. Wang, C.M., Huang, Y.F.: Evolutionary-based feature selection approaches with new criteria
for data mining: A case study of credit approval data. Expert Systems with Applications
36(3) (2009) 5900–5908
... Some representative examples are the work of Santana et al. [59], where the authors combine fuzzy logic, neural networks and a variable population optimization technique, to obtain fuzzy classification rules, and another work from the same authors [40], where they define a method able to reduce the number of classification rules involved in the definition of the credit scoring predictive model, reducing the system decision time. Other representative works are that of Carta et al. [12], where the authors adopt a two-step feature space transforming method, with the aim to improve the credit scoring performance, or the work of Zhang et al. [69], where the authors propose a novel sparse multi-criteria optimization classifier based on one-norm regularization, linear and nonlinear programming, for the credit risk evaluation. ...
Full-text available
The credit scoring models are aimed to assess the capability of refunding a loan by assessing user reliability in several financial contexts, representing a crucial instrument for a large number of financial operators such as banks. Literature solutions offer many approaches designed to evaluate users' reliability on the basis of information about them, but they share some well-known problems that reduce their performance, such as data imbalance and heterogeneity. In order to face these problems, this paper introduces an ensemble stochastic criterion that operates in a discretized feature space, extended with some meta-features in order to perform efficient credit scoring. Such an approach uses several classification algorithms in such a way that the final classification is obtained by a stochastic criterion applied to a new feature space, obtained by a two-fold preprocessing technique. We validated the proposed approach by using real-world datasets with different data imbalance configurations, and the obtained results show that it outperforms some state-of-the-art solutions.
Conference Paper
Full-text available
The automated credit scoring tools play a crucial role in many financial environments, since they are able to perform a real-time evaluation of a user (e.g., a loan applicant) on the basis of several solvency criteria, without the aid of human operators. Such an automation allows who work and offer services in the financial area to take quick decisions with regard to different services, first and foremost those concerning the consumer credit, whose requests have exponentially increased over the last years. In order to face some well-known problems related to the state-of-the-art credit scoring approaches, this paper formalizes a novel data model that we called Discretized Enriched Data (DED), which operates by transforming the original feature space in order to improve the performance of the credit scoring machine learning algorithms. The idea behind the proposed DED model revolves around two processes, the first one aimed to reduce the number of feature patterns through a data discretization process, and the second one aimed to enrich the discretized data by adding several meta-features. The data discretization faces the problem of heterogeneity, which characterizes such a domain, whereas the data enrichment works on the related loss of information by adding meta-features that improve the data characterization. Our model has been evaluated in the context of real-world datasets with different sizes and levels of data unbalance, which are considered a benchmark in credit scoring literature. The obtained results indicate that it is able to improve the performance of one of the most performing machine learning algorithm largely used in this field, opening up new perspectives for the definition of more effective credit scoring solutions.
In this paper, we propose a dynamic forecasting framework, named DMDP (dynamic multi-source default probability prediction), to predict the default probability of a company. The default probability is a very important factor in assessing the credit risk of companies listed on a stock market. To aid financial institutions in decision making, our DMDP framework not only analyzes financial data to capture the historical performance of a company, but also uses a long short-term memory model to dynamically incorporate daily news from social media, so that the perceptions of market participants and public opinion are taken into consideration. This study makes two key contributions. First, we use unstructured news crawled from social media to alleviate the impact of financial fraud on default probability prediction. Second, we propose a neural network method that integrates structured financial factors and unstructured social media data, with appropriate time alignment, for default probability prediction. Extensive experimental results demonstrate the effectiveness of DMDP in predicting default probability for listed companies in mainland China, compared with various baselines.
In recent years, classification ensembles or multiple classifier systems have been widely applied to credit scoring, and they achieve significantly better performance than individual classifiers do. Selective ensembles, an important part of this group of systems, are a promising field of research. However, none of them considers the relative costs of Type I error and Type II error for credit scoring when selecting classifiers, which bring higher risks for the financial institutions. Moreover, earlier dynamic selective ensembles usually select and combine classifiers for each test sample dynamically based on classifiers’ performance in the validation set, regardless of their behaviors in the testing set. To fill the gap and overcome the limitations, we propose a new dynamic ensemble classification method for credit scoring based on soft probability. In this method, the classifiers are first selected based on their classification ability and the relative costs of Type I error and Type II error in the validation set. With the selected classifiers, we combine different classifiers for the samples in the testing set based on their classification results to get an interval probability of default by using soft probability. The proposed method is compared with some well-known individual classifiers and ensemble classification methods, including five selective ensembles, for credit scoring by using ten real-world data sets and seven performance indicators. Through these analyses and statistical tests, the experimental results demonstrate the ability and efficiency of the proposed method to improve prediction performance against the benchmark models.
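The selection step described above, in which classifiers are chosen using the relative costs of the two error types on the validation set, can be sketched as follows. This is a simplified illustration and not the paper's method: it ranks classifiers by average misclassification cost only, and it omits the soft-probability combination entirely. The label convention (1 = default), the cost values, and the function names are assumptions, and which error the paper calls Type I or Type II is not fixed here.

```python
def misclassification_cost(y_true, y_pred, cost_fn=5.0, cost_fp=1.0):
    """Average cost per sample, with label 1 = default and 0 = non-default.

    cost_fn: assumed cost of predicting non-default for an actual default.
    cost_fp: assumed cost of predicting default for an actual non-default.
    """
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return (cost_fn * fn + cost_fp * fp) / len(y_true)


def select_classifiers(val_predictions, y_val, k=2, **costs):
    """Return the names of the k classifiers with the lowest validation cost.

    val_predictions maps a classifier name to its predicted validation labels.
    """
    ranked = sorted(
        val_predictions,
        key=lambda name: misclassification_cost(y_val, val_predictions[name], **costs),
    )
    return ranked[:k]
```

With asymmetric costs, a classifier that misses a single default can rank below one that rejects several good applicants, which is the behavior a cost-insensitive accuracy ranking would not capture.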
The main objective of this study is to investigate the behaviour of default prediction models based on credit scoring methods and computational techniques with machine learning algorithms. The predictive capabilities of the models were compared to identify default-prediction mechanisms in the “My Home, My Life” Program (Programa “Minha Casa, Minha Vida” — PMCMV). The PMCMV is one of the largest government initiatives in the world to finance home ownership in the low-income population. Implemented by the Brazilian government, the programme has provided financing in excess of USD 84 billion and by 2016 had already contracted for the construction of over 4.5 million housing units, with 3.3 million units already delivered. The models developed in this study involve different time intervals for default prediction as well as analysis without the use of traditional discriminatory variables (gender, age, and marital status). Three measurements were used to evaluate the quality of the prediction models: area under the ROC curve, the Kolmogorov–Smirnov index, and the Brier score. The results indicated that (1) the accuracy of the models improves as the number of days overdue used to define the default variable increases; (2) the best prediction results were obtained with traditional ensemble techniques — in this case Bagging (BG), Random Forest (RF), and Boosting; and (3) there was a negative impact on all criteria when a smaller number of observations was used, especially on the type II error. It was also found that the discriminatory power of the credit risk rating system is preserved when removing discriminatory variables from the models. Applying the BG algorithm, which is the best prediction method, a default rate of 11.80% could be reduced to 2.95%, which leads to a selection that would result in 197,905 fewer delinquent contracts in the PMCMV, thus representing a savings of approximately USD 3.0 billion in credit losses.
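The three quality measures named above (area under the ROC curve, the Kolmogorov-Smirnov index, and the Brier score) can be computed from true labels and predicted default probabilities as in the following sketch. It uses the standard textbook definitions, written in plain Python for clarity rather than speed; the function names and the label convention (1 = default) are assumptions, not taken from the study.

```python
def _split_scores(y_true, scores):
    """Separate the scores of defaulters (label 1) from those of non-defaulters (label 0)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    return pos, neg


def roc_auc(y_true, scores):
    """Probability that a random defaulter scores above a random non-defaulter (ties count 0.5)."""
    pos, neg = _split_scores(y_true, scores)
    wins = sum((a > b) + 0.5 * (a == b) for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def ks_statistic(y_true, scores):
    """Largest gap between the empirical score distributions of the two classes."""
    pos, neg = _split_scores(y_true, scores)
    gaps = []
    for t in sorted(set(scores)):
        cdf_pos = sum(s <= t for s in pos) / len(pos)
        cdf_neg = sum(s <= t for s in neg) / len(neg)
        gaps.append(abs(cdf_pos - cdf_neg))
    return max(gaps)


def brier_score(y_true, probs):
    """Mean squared difference between predicted probability and actual outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, probs)) / len(y_true)
```

Higher values are better for the AUC and the KS index, while a lower Brier score indicates better-calibrated probabilities.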
In recent years, credit scoring has received wide attention from financial institutions, with the rating accuracy influencing both risk control and profitability to a considerable extent. This paper presents a novel multi-stage self-adaptive classifier ensemble model based on statistical techniques and machine learning techniques, to improve the prediction performance. First, the multi-step data preprocessing is employed to process the original data into standardized data and generate more representative features. Second, base classifiers can be self-adaptively selected from the candidate classifier repository according to their performance in datasets, and their parameters are optimized by Bayesian optimization algorithm. Third, the ensemble model is integrated through these optimized base classifiers, and it can generate new features through multi-layer stacking and obtain the classifier weights in the ensemble model through the particle swarm optimization. The proposed model is applied to credit scoring to test its prediction performance. In the experimental study, three real-world credit datasets and four evaluation indicators are adopted for performance evaluation. The results show that compared to single classifier and other ensemble classification methods, the proposed model has better performance and better data adaptability. It proves the reliability and practicability of the proposed model, and provides effective decision support for relevant financial institutions.
A novel framework for profit-based credit scoring is proposed in this work. The approach is based on robust optimization, which is designed for dealing with uncertainty in the data, and therefore is effective at classifying new samples that follow a slightly different distribution in relation to the original dataset used to create the model. Instead of minimizing a loss function based on statistical measures, the proposed method maximizes the profit of the credit scoring model, balancing the benefits and losses of granting credit with the variable acquisition costs. The reduction of these is performed using feature selection techniques embedded in the learning process. The robust approach results in four second order cone programming formulations, which can be solved efficiently using interior point algorithms. Experiments on two credit scoring datasets demonstrate the virtues of our approach in terms of its predictive performance, and the managerial insights that can be gained from it.
The objective of this paper is to propose a credit scoring approval model that uses a feature selection technique based on Principal Component Analysis (PCA) and the ReliefF algorithm, followed by a decision tree classifier. As a reference classifier, we have chosen the Support Vector Machine (SVM). The performance of the proposed model has been tested on the German credit dataset. The proposed signal processing cascade for credit scoring reaches a best accuracy of 91.67%, while classifiers without feature selection reach a best accuracy of only 75.35%. By contrast, the same combination of feature selection methods (PCA and ReliefF) cascaded with an SVM classifier obtains an accuracy of only 85.15%. The experimental results confirm the accuracy of the proposed model, and at the same time they show the importance of feature selection and its optimization for credit scoring decision systems.
Artificial intelligence is considered to be the technological commanding height of the next era. At present, the development of China's artificial intelligence industry ranks after that of the United States, and its application in the financial field is in a new stage of rapid development that affects many aspects of the financial industry, so strengthening research on it is of great significance. As it continues to develop, artificial intelligence technology has been widely used in many aspects of financial services, which is of great significance for their modeling, standardization, and intelligent development. However, security risks remain hidden in its application, and these require attention. To identify effective measures for risk prevention, this paper analyzes the application of artificial intelligence in the financial sector to personal credit scoring.