Chapter

BAGGING AND BOOSTING CLASSIFIERS FOR CREDIT RISK EVALUATION


Abstract

With developing technology, face-to-face interactions with customers have ceded their place to digital communication during loan applications. This transition saves time and cost in credit allocation. However, it also increases exposure to risky situations, because it is harder to understand customer profiles without cultivating a personal relationship. Therefore, classifying loan applicants as good or bad credit before deciding to approve or reject a loan is more important than ever, and it directly affects the revenue and losses of a financial institution. This study aims to apply bagging and boosting (AdaBoost) ensemble learning techniques to the seven most widely used classifiers in credit risk evaluation, namely naïve Bayes, k-nearest neighbours, logistic regression, ridge regression, support vector machines, decision trees and random forest. Although some of these were previously used as weak learners within bagging and boosting, this study presents a more comprehensive comparison by including not only weak learners, such as naïve Bayes and decision trees, but also competent or strong learners, such as support vector machines and random forest. Hence, it is possible to examine the effect of the bagging and boosting algorithms on each of these classifiers. For this purpose, three real-world credit datasets are used, each differing in sample size, number of features and degree of imbalance.
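To make the setup concrete, the sketch below is an illustrative assumption rather than the chapter's actual code: synthetic data and default-ish hyperparameters stand in for the three credit datasets and the tuned configurations. It wraps each of the seven base classifiers in scikit-learn's BaggingClassifier and AdaBoostClassifier and scores them with cross-validated AUC.

```python
# Illustrative sketch only: synthetic data replaces the chapter's credit datasets.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import BaggingClassifier, AdaBoostClassifier, RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression, RidgeClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Imbalanced toy data standing in for a credit dataset (bad credit = minority class).
X, y = make_classification(n_samples=1000, n_features=20, weights=[0.8, 0.2], random_state=0)

base_learners = {
    "naive_bayes": GaussianNB(),
    "knn": KNeighborsClassifier(),
    "logistic_regression": LogisticRegression(max_iter=1000),
    "ridge": RidgeClassifier(),
    "svm": SVC(probability=True),
    "decision_tree": DecisionTreeClassifier(),
    "random_forest": RandomForestClassifier(n_estimators=100),
}

for name, clf in base_learners.items():
    # scikit-learn >= 1.2 uses `estimator=`; older releases use `base_estimator=`.
    bagged = BaggingClassifier(estimator=clf, n_estimators=10, random_state=0)
    auc = cross_val_score(bagged, X, y, cv=5, scoring="roc_auc").mean()
    print(f"bagging + {name}: AUC = {auc:.3f}")

for name, clf in base_learners.items():
    # AdaBoost needs base learners that accept sample weights (which rules out k-NN here)
    # and, for probability-based boosting, class probabilities (which rules out ridge).
    if name in ("knn", "ridge"):
        continue
    boosted = AdaBoostClassifier(estimator=clf, n_estimators=10, random_state=0)
    auc = cross_val_score(boosted, X, y, cv=5, scoring="roc_auc").mean()
    print(f"AdaBoost + {name}: AUC = {auc:.3f}")
```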
Article
Full-text available
Extensive research has been performed by organizations and academics on models for credit scoring, an important financial management activity. As novel machine learning models continue to be proposed, ensemble learning has been introduced into credit scoring, and several studies have addressed its supremacy. In this research, we provide a comparative performance evaluation of ensemble algorithms, i.e., random forest, AdaBoost, XGBoost, LightGBM and Stacking, in terms of accuracy (ACC), area under the curve (AUC), the Kolmogorov–Smirnov statistic (KS), the Brier score (BS), and model operating time for credit scoring. Moreover, five popular baseline classifiers, i.e., neural network (NN), decision tree (DT), logistic regression (LR), naïve Bayes (NB), and support vector machine (SVM), are considered as benchmarks. Experimental findings reveal that the performance of ensemble learning is better than that of individual learners, with the exception of AdaBoost. In addition, random forest has the best performance across the five metrics, with XGBoost and LightGBM as close challengers. Among the five baseline classifiers, logistic regression outperforms the others over most of the evaluation metrics. Finally, this study also analyzes reasons for the poor performance of some algorithms and gives some suggestions on the choice of credit scoring models for financial institutions.
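As a small hedged illustration (toy labels and scores of my own, not the study's data), the threshold and ranking metrics that study reports can be computed as below; the KS statistic is taken here as the maximum gap between the ROC true-positive and false-positive rates.

```python
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score, roc_curve, brier_score_loss

# Toy ground truth and predicted default probabilities (placeholders).
y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])
y_prob = np.array([0.10, 0.40, 0.80, 0.60, 0.30, 0.90, 0.20, 0.70])
y_pred = (y_prob >= 0.5).astype(int)

acc = accuracy_score(y_true, y_pred)        # ACC
auc = roc_auc_score(y_true, y_prob)         # AUC
fpr, tpr, _ = roc_curve(y_true, y_prob)
ks = np.max(tpr - fpr)                      # KS: max separation of the two rate curves
bs = brier_score_loss(y_true, y_prob)       # BS: mean squared error of the probabilities
print(f"ACC={acc:.3f}  AUC={auc:.3f}  KS={ks:.3f}  BS={bs:.3f}")
```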
Article
Full-text available
As the financial industry has improved remarkably, financial threats are increasingly manifesting as credit risk for commercial banks. In this way, one of the biggest threats faced by commercial banks is the risk prediction of credit clients. Recent studies mostly focus on enhancing classifier performance for credit card default prediction rather than on an interpretable model. In classification problems, handling an imbalanced dataset is also crucial to improving model performance, because most of the cases lie in one class and only a few examples are in the other categories. Traditional statistical approaches are not suitable for dealing with imbalanced data. In this study, a model is developed for credit default prediction by employing various credit-related datasets. There is often a significant difference between the minimum and maximum values of different features, so Min-Max normalization is used to scale the features within one range. Data-level resampling techniques are employed to overcome the problem of data imbalance; various undersampling and oversampling methods are used to resolve the issue of class imbalance. Different machine learning models are also employed to obtain efficient results. We developed the hypotheses of whether models built with different machine learning techniques are significantly the same or different and whether resampling techniques significantly improve the performance of the proposed models. One-way Analysis of Variance, a hypothesis-testing technique, is used to test the significance of the results. A hold-out split is utilized to validate the results, in which the data are split into training and test sets. The results on the imbalanced datasets show an accuracy of 66.9% on the Taiwan clients credit dataset, 70.7% on the South German clients credit dataset, and 65% on the Belgium clients credit dataset. Conversely, the results using our proposed methods significantly improve accuracy to 89% on the Taiwan clients credit dataset, 84.6% on the South German clients credit dataset, and 87.1% on the Belgium clients credit dataset. The results show that the performance of classifiers is better on the balanced datasets than on the imbalanced ones. It is also observed that the performance of data oversampling techniques is better than that of undersampling techniques. Overall, the Gradient Boosted Decision Tree method performs better than other traditional machine learning classifiers, and it gives the best results when used with the K-means SMOTE oversampling method. Using one-way ANOVA, the null hypothesis was rejected with a p-value < 0.001, confirming that the proposed model's performance improvement is statistically significant. The interpretable model is also deployed on the web to assist the different stakeholders. This model will help commercial banks, financial organizations, loan institutes, and other decision-makers to predict loan defaulters earlier.
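A rough sketch of that kind of pipeline follows, under stated assumptions: the imbalanced-learn package supplies the SMOTE oversampler, and the synthetic data and default hyperparameters are placeholders for the Taiwan, South German and Belgium datasets. It shows Min-Max scaling, oversampling of the training split only, and a gradient boosted decision tree classifier.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from imblearn.over_sampling import SMOTE   # KMeansSMOTE lives in the same module

# Imbalanced toy data standing in for a credit-default dataset.
X, y = make_classification(n_samples=2000, n_features=23, weights=[0.8, 0.2], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

scaler = MinMaxScaler()                     # scale every feature into one [0, 1] range
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)           # the test set reuses the training-set fit

# Oversample only the training split so the test set stays untouched.
X_res, y_res = SMOTE(random_state=0).fit_resample(X_train, y_train)

gbdt = GradientBoostingClassifier(random_state=0).fit(X_res, y_res)
print("test accuracy:", accuracy_score(y_test, gbdt.predict(X_test)))
```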
Article
Full-text available
Background: In recent years, one can notice that more and more international companies implement their strategies based on the concept of risk management. When evaluating and qualifying their suppliers, these companies use the requirements of quality management standards, quality assurance standards (in particular sectors), safety and security management standards for the supply chain, as well as business continuity management standards. This article aims to determine the importance of the risk factors affecting relations with suppliers. Methods: The research was carried out between October and November 2017 using the Computer Assisted Telephone Interview (CATI) technique. The study covered 300 producers from the automotive, metal and chemical sectors operating in the Polish B2B market. Results: The surveyed enterprises indicated the following as the most critical sources of threats in relations with suppliers: the possibility of untimely deliveries, quality defects of products, the financial situation of suppliers, communication problems (related to the supplier's understanding of the requirements), a low level of supply flexibility, product assortment errors, and limited production capacity. Conclusions: Recapitulating the theoretical considerations and the results of the empirical study, it can be stated that the role of the risk management concept in relations with suppliers is significant. Risk management is still essential to ensure the safety of purchased products, to ensure the continuity of processes, and to avoid disruptions in supply chains.
Article
Full-text available
The purpose of this study is to propose ways of improving the current Logistics Performance Index published by the World Bank. The Logistics Performance Index is based on a global survey of logistics experts, which can be biased towards subjective views of different countries' logistics systems and can lead to a potentially skewed rating. The authors propose a modified index that qualitatively and quantitatively represents an objective view of 159 countries' logistics systems and subsystems, based on international statistical data, which can be used as a benchmarking tool for governments.
Article
Full-text available
Background: To evaluate binary classifications and their confusion matrices, scientific researchers can employ several statistical rates, according to the goal of the experiment they are investigating. Despite being a crucial issue in machine learning, no widespread consensus has yet been reached on a single elective measure. Accuracy and the F1 score computed on confusion matrices have been (and still are) among the most popular metrics adopted in binary classification tasks. However, these statistical measures can dangerously show overoptimistic, inflated results, especially on imbalanced datasets. Results: The Matthews correlation coefficient (MCC), instead, is a more reliable statistical rate that produces a high score only if the prediction obtained good results in all four confusion matrix categories (true positives, false negatives, true negatives, and false positives), proportionally both to the size of the positive elements and the size of the negative elements in the dataset. Conclusions: In this article, we show how MCC produces a more informative and truthful score in evaluating binary classifications than accuracy and the F1 score, by first explaining its mathematical properties, and then the merits of MCC in six synthetic use cases and in a real genomics scenario. We believe that the Matthews correlation coefficient should be preferred to accuracy and the F1 score in evaluating binary classification tasks by all scientific communities.
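The point is easy to reproduce with a toy imbalanced example (my own illustration, not taken from the article): a classifier that always predicts the majority class earns high accuracy and F1 scores but an MCC of zero.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, matthews_corrcoef

y_true = np.array([1] * 95 + [0] * 5)    # 95 positives, 5 negatives
y_pred = np.ones(100, dtype=int)         # always predict the positive class

print("accuracy:", accuracy_score(y_true, y_pred))     # 0.95
print("F1 score:", f1_score(y_true, y_pred))           # ~0.974
print("MCC:     ", matthews_corrcoef(y_true, y_pred))  # 0.0 -- no real predictive skill
```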
Article
Developing accurate analytical credit scoring models has become a major focus for financial institutions. For this purpose, numerous classification algorithms have been proposed for credit scoring. However, the application of deep learning algorithms for classification has been largely ignored in the credit scoring literature. The main motivation for this research is to consider the appropriateness of deep learning algorithms for credit scoring. To this end two deep learning architectures are constructed, namely a multilayer perceptron network and a deep belief network, and their performance compared to that of two conventional methods and two ensemble methods for credit scoring. The models are then evaluated using a range of credit scoring data sets and performance measures. Furthermore, Bayesian statistical testing procedures are introduced in the context of credit scoring and compared to frequentist non-parametric testing procedures which have traditionally been considered best practice in credit scoring. This comparison will highlight the benefits of Bayesian statistical procedures and secure empirical findings. Two main conclusions emerge from comparing the different classification algorithms for credit scoring. Firstly, the ensemble method, XGBoost, is the best performing method for credit scoring of all the methods considered here. Secondly, deep neural networks do not outperform their shallower counterparts and are considerably more computationally expensive to construct. Therefore, deep learning algorithms do not seem to be appropriate models for credit scoring based on this comparison and XGBoost should be preferred over the other credit scoring methods considered here when classification performance is the main objective of credit scoring activities.
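For orientation, a hedged sketch of that kind of comparison is given below; it uses synthetic data and untuned models, so it does not reproduce the study's results, and it assumes the xgboost package is installed. It pits an XGBoost ensemble against a shallow multilayer perceptron on the same split, scored by AUC.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
from sklearn.neural_network import MLPClassifier
from xgboost import XGBClassifier

# Imbalanced toy data standing in for a credit scoring dataset.
X, y = make_classification(n_samples=3000, n_features=30, weights=[0.75, 0.25], random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=1)

models = {
    "XGBoost": XGBClassifier(n_estimators=300, max_depth=4, learning_rate=0.1,
                             eval_metric="logloss"),
    "MLP": MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=1),
}

for name, model in models.items():
    model.fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    print(f"{name}: test AUC = {auc:.3f}")
```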
Article
In recent years, research has found that in many credit risk evaluation domains deep learning is superior to traditional machine learning methods and that classifier ensembles perform significantly better than single classifiers. However, credit evaluation models based on deep learning ensemble algorithms have rarely been studied. Moreover, credit data imbalance still challenges the performance of credit scoring models. Therefore, to go some way towards filling this research gap, this study developed a new deep learning ensemble credit risk evaluation model to deal with imbalanced credit data. First, an improved synthetic minority oversampling technique (SMOTE) method was developed to overcome known SMOTE shortcomings, after which a new deep learning ensemble classification method combining the long short-term memory (LSTM) network and the adaptive boosting (AdaBoost) algorithm was developed to train and learn the processed credit data. Then, the area under the curve (AUC), the Kolmogorov–Smirnov (KS) statistic and the non-parametric Wilcoxon test were employed to compare the performance of the proposed model and other widely used credit scoring models on two imbalanced credit datasets. The experimental test results indicated that the proposed deep learning ensemble model was generally more competitive in addressing imbalanced credit risk evaluation problems than the other models.
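The statistical comparison step can be sketched as follows, under an assumed, simplified setup: two off-the-shelf classifiers stand in for the proposed LSTM-AdaBoost ensemble and a baseline, and the data are synthetic. Fold-wise AUC scores from the same cross-validation splits are compared with the Wilcoxon signed-rank test.

```python
from scipy.stats import wilcoxon
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Imbalanced toy data standing in for the two credit datasets.
X, y = make_classification(n_samples=2000, n_features=25, weights=[0.85, 0.15], random_state=0)

# Fold-by-fold AUC for two competing models on identical (default, unshuffled) CV splits.
auc_a = cross_val_score(AdaBoostClassifier(random_state=0), X, y, cv=10, scoring="roc_auc")
auc_b = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=10, scoring="roc_auc")

stat, p_value = wilcoxon(auc_a, auc_b)   # paired non-parametric test over the folds
print(f"Wilcoxon statistic = {stat:.3f}, p-value = {p_value:.4f}")
```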
Article
The concept of minimum display quantity (MDQ) is unavoidable in the brick-and-mortar retailing format; owing to it, retailers need to ensure a minimum level of inventory displayed at each store irrespective of the revenue or inventory turns generated by a particular store. It is observed that the majority of brick-and-mortar retailers in India assume that: a) the existing inventory management system is ideal for their store, b) software solutions record inventory movement accurately, c) involving the store management team in inventory-related decision-making is risky/biased and, most importantly, d) loss of sales due to stock-outs is inevitable. Such assumptions and widely followed practices have created a predisposition and mindset in store managers: they believe that their store delivers revenue and profit to the best of its potential with the inventory made available to them through the existing inventory management system, and that instances of consumers being left unsatisfied by stock-out situations cannot be avoided. In this research, we have analysed the existing decision-making process and control systems related to the inventory management of a select retailer, attempted to design a new framework, and applied it through an experiment to evaluate the change in a) overall store profitability and b) inventory-related key performance indicators.
Article
Credit risk is the risk of financial loss when a borrower fails to meet a financial commitment. While there are many factors that constitute credit risk, due diligence while granting loans (credit scoring) and continuous monitoring of customer payments and other behaviour patterns could reduce the probability of accumulating non-performing assets (NPAs) and frauds. In the past few years, the quantum of NPAs and frauds has gone up significantly, and it has therefore become imperative that banks and financial institutions use robust mechanisms to predict the performance of loans. The past two decades have seen immense growth in the area of artificial intelligence, most notably machine learning (ML), with improved access to the internet, data, and compute. Whilst there are credit rating agencies and credit scoring companies that provide their analysis of a customer to banks for a fee, researchers continue to explore various ML techniques to improve the accuracy of credit risk evaluation. In this survey paper, we performed a systematic literature review of existing research methods and ML techniques for credit risk evaluation. We reviewed a total of 136 papers on credit risk evaluation published between 1993 and March 2019. We studied the implications of hyperparameters for the ML techniques being used to evaluate credit risk, and analyzed the limitations of the current studies and research trends. We observed that ensemble and hybrid models with neural networks and SVM are increasingly adopted for credit scoring, NPA prediction and fraud detection. We also realized that the lack of comprehensive public datasets continues to be an area of concern for researchers.