Bio-inspired credit risk analysis: Computational intelligence with support vector machines
Abstract
Credit risk analysis is one of the most important topics in the field of financial risk management. Due to recent financial crises and the regulatory concerns of Basel II, credit risk analysis has been a major focus of the financial and banking industry. Especially for credit-granting institutions such as commercial banks and credit companies, the ability to discriminate good customers from bad ones is crucial. The need for reliable quantitative models that predict defaults accurately is imperative, so that the interested parties can take either preventive or corrective action. Credit risk analysis is therefore vital for the sustainability and profit of enterprises. Against this background, this book integrates recently emerging support vector machines and other computational intelligence techniques that replicate the principles of bio-inspired information processing, in order to create innovative methodologies for credit risk analysis and to provide decision support information for interested parties. © 2008 Springer-Verlag Berlin Heidelberg. All rights reserved.
Chapters (11)
Credit risk assessment has become an increasingly important area for financial institutions, especially for banks and credit card companies. Some of the biggest failures in the history of financial institutions were related to credit risk, such as the 1974 failure of Herstatt Bank (Philippe, 2003). In recent years, many financial institutions have suffered great losses from a steady increase in defaults and bad loans from their counterparties. So, for credit-granting institutions, the ability to accurately discriminate good counterparties from bad ones has become crucial. In the credit industry, quantitative credit scoring models have been developed for this task over many years. Their main idea is to classify credit applicants as good or bad according to their characteristics (age, income, job status, etc.), using a model built on extensive information about previous applicants' characteristics and their subsequent performance. The rest of this chapter is organized as follows. Section 2.2 briefly introduces the SVM with the NPA algorithm. In Section 2.3, the parameter selection technique based on DOE is briefly discussed and the hybrid algorithm combining the NPA and this parameter selection is described. The results of testing the algorithm on a real-life dataset and comparisons with other methods are discussed in Section 2.4. Section 2.5 gives a short conclusion to the chapter.
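The basic credit scoring setup described above can be illustrated with a minimal, purely hypothetical sketch: a kernel SVM is fitted to a handful of made-up applicant records (age, income, years in current job) and then scores a new applicant. The feature names, data and parameter choices are illustrative assumptions, not the chapter's actual dataset or its NPA/DOE procedure.

```python
# Minimal credit-scoring sketch: an SVM trained on past applicants' characteristics
# and their observed good/bad outcomes, then used to score a new applicant.
# All features and records below are illustrative, not real data.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Columns: age, annual income (thousands), years in current job
X_train = np.array([
    [25, 18.0, 1], [42, 55.0, 10], [31, 30.0, 4], [51, 72.0, 20],
    [23, 12.0, 0], [38, 41.0, 7],  [29, 22.0, 2], [47, 63.0, 15],
])
y_train = np.array([0, 1, 1, 1, 0, 1, 0, 1])  # 1 = good (repaid), 0 = bad (defaulted)

# Scaling matters for SVMs; the RBF kernel and parameters are common defaults,
# not the book's exact setup.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
model.fit(X_train, y_train)

new_applicant = np.array([[33, 28.0, 3]])
print("predicted class:", model.predict(new_applicant)[0])  # 1 = grant, 0 = deny
```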
With the rapid growth of and increased competition in the credit industry, credit risk evaluation is becoming more important for credit-granting institutions. A good credit risk evaluation tool can help them grant credit to more creditworthy applicants, thus increasing profits; it can also deny credit to non-creditworthy applicants, thus decreasing losses. Credit-granting institutions are currently paying much more attention to developing efficient and sophisticated tools to evaluate and control credit risk, which can help them win more market share without taking on too much risk. In the last two decades, credit scoring has become one of the primary methods for developing a credit risk assessment tool.
The main purpose of this chapter is to propose LSSVM-based credit scoring models with a direct search (DS) method for parameter selection. The rest of this chapter is organized as follows. In Section 3.2, the LSSVM and the DS methodology are described briefly. Section 3.3 presents a computational experiment to demonstrate the effectiveness and efficiency of the model, and compares the performance of the DS method with the DOE, GA, and GS methods. Section 3.4 gives concluding remarks.
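As a rough illustration of direct-search parameter selection, the sketch below uses the Nelder-Mead method (one derivative-free direct search) from SciPy to tune C and gamma by cross-validation. A standard RBF SVC stands in for the LSSVM, which scikit-learn does not provide, so this is only an analogue of the chapter's DS procedure, under those stated assumptions.

```python
# Direct-search hyperparameter selection sketch: Nelder-Mead over (log C, log gamma),
# with 5-fold cross-validated error as the objective. SVC is a stand-in for LSSVM.
import numpy as np
from scipy.optimize import minimize
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

def cv_error(log_params):
    C, gamma = np.exp(log_params)                 # search in log-space to keep values positive
    clf = SVC(kernel="rbf", C=C, gamma=gamma)
    return 1.0 - cross_val_score(clf, X, y, cv=5).mean()   # misclassification rate

result = minimize(cv_error, x0=np.log([1.0, 0.1]), method="Nelder-Mead")
best_C, best_gamma = np.exp(result.x)
print(f"C={best_C:.3g}, gamma={best_gamma:.3g}, CV error={result.fun:.3f}")
```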
Different from previous studies, a novel hybrid intelligent mining system is designed by hybridizing rough sets and support vector machines from a new perspective. In the proposed system, the original information table is first reduced by rough sets along two dimensions, the attribute dimension and the object dimension (2D-Reduction), and then support vector machines are used to extract typical features and filter out noise, reducing the information table further. The goal of the first step (2D-Reduction) is to lighten the training burden and accelerate the learning process of the support vector machines. Finally, the mined knowledge, i.e., the classification rule sets, is generated from the reduced information table by rough sets, rather than from the trained support vector machines. The advantage of the proposed hybrid intelligent system is therefore that it overcomes the difficulty of extracting rules from a trained support vector machine while possessing the robustness that pure rough-set-based approaches lack. To illustrate the effectiveness of the proposed system, two public credit datasets covering both consumer and corporate credit are used.
The rest of the chapter is organized as follows. Section 4.2 describes some preliminaries about rough sets and support vector machines. In Section 4.3, the proposed hybrid intelligent mining system incorporating SVMs into rough sets is described and the algorithms for generating classification rules from an information table are proposed. In Section 4.4, we compare and analyze empirical results on two real-world credit datasets. Finally, some conclusions are drawn in Section 4.5.
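The following sketch only mirrors the shape of the 2D-Reduction pipeline with scikit-learn stand-ins: univariate feature selection approximates attribute reduction, duplicate removal approximates object reduction, an SVM filters samples it cannot classify consistently, and a shallow decision tree stands in for rough-set rule induction. None of these stand-ins is the algorithm actually used in the chapter.

```python
# Pipeline-shaped sketch of the hybrid rough-set/SVM idea, with stand-in components.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=400, n_features=12, n_informative=5, random_state=1)

# Step 1a: attribute-dimension reduction (keep the 5 most informative attributes).
selector = SelectKBest(f_classif, k=5).fit(X, y)
X_red = selector.transform(X)

# Step 1b: object-dimension reduction (remove duplicate objects).
X_red, idx = np.unique(X_red.round(3), axis=0, return_index=True)
y_red = y[idx]

# Step 2: the SVM filters samples it cannot classify consistently (treated as noise).
svm = SVC(kernel="rbf").fit(X_red, y_red)
keep = svm.predict(X_red) == y_red
X_clean, y_clean = X_red[keep], y_red[keep]

# Step 3: induce readable classification rules from the cleaned table
# (a shallow tree stands in for rough-set rule generation).
rules = DecisionTreeClassifier(max_depth=3).fit(X_clean, y_clean)
print(export_text(rules))
```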
In this chapter, we introduce a new credit risk classification technique, the least squares fuzzy SVM (LS-FSVM), to discriminate good creditors from bad ones. The fuzzy SVM (FSVM) was first proposed by Lin and Wang (2002) and is particularly well suited to credit risk assessment. The main reason is that in credit risk assessment we usually cannot label a customer as absolutely good (certain to repay on time) or absolutely bad (certain to default); the FSVM treats every sample as belonging to both the positive and the negative class, with fuzzy memberships. In this way the FSVM gains generalization ability while remaining insensitive to outliers. Although the FSVM has good generalization capability, the computational cost of the existing FSVM is rather high because the final solution is derived from a quadratic programming (QP) problem. To reduce this complexity, this chapter proposes a least squares solution to the FSVM. In the proposed model, we consider equality constraints instead of inequalities for the classification problem, with a formulation in the least squares sense. As a result, the solution follows directly from solving a set of linear equations, instead of the QP of the classical FSVM approach (Lin and Wang, 2002), thus reducing the computational complexity relative to the classical FSVM. The main motivation of this chapter is to formulate a least squares version of the FSVM for binary classification problems, to apply it to credit risk evaluation, and to compare its performance with several typical credit risk assessment techniques.
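A minimal numpy sketch of the least squares idea follows, assuming the usual weighted LS-SVM derivation: with equality constraints and membership-weighted squared errors, training reduces to one linear solve rather than a QP. The kernel choice, memberships and toy data are illustrative, not the chapter's exact formulation.

```python
# LS-FSVM sketch: with equality constraints and squared errors weighted by fuzzy
# memberships s_i, training reduces to one linear system
#   [[0, y^T], [y, Omega + diag(1/(C*s_i))]] [b; alpha] = [0; 1]
# where Omega_ij = y_i y_j K(x_i, x_j). This follows the standard (weighted) LS-SVM
# derivation and is an illustration, not the book's exact formulation.
import numpy as np

def rbf_kernel(A, B, gamma=0.5):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def train_lsfsvm(X, y, s, C=10.0, gamma=0.5):
    n = len(y)
    Omega = np.outer(y, y) * rbf_kernel(X, X, gamma)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:], A[1:, 0] = y, y
    A[1:, 1:] = Omega + np.diag(1.0 / (C * s))      # membership-weighted regularization
    rhs = np.concatenate(([0.0], np.ones(n)))
    sol = np.linalg.solve(A, rhs)                   # linear solve instead of QP
    return sol[0], sol[1:]                          # b, alpha

def predict(X_train, y, alpha, b, X_new, gamma=0.5):
    return np.sign(rbf_kernel(X_new, X_train, gamma) @ (alpha * y) + b)

# Toy data: memberships s_i < 1 down-weight less reliable (possibly outlying) samples.
X = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1], [0.5, 0.5]])
y = np.array([-1.0, -1.0, 1.0, 1.0, 1.0])
s = np.array([1.0, 1.0, 1.0, 1.0, 0.3])             # last sample treated as a likely outlier
b, alpha = train_lsfsvm(X, y, s)
print(predict(X, y, alpha, b, np.array([[0.1, 0.0], [1.0, 0.9]])))
```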
The rest of this chapter is organized as follows. Section 5.2 illustrates the methodological formulation of the LS-FSVM. In Section 5.3, we use a real-world credit dataset to test the classification potential of the LS-FSVM. Section 5.4 concludes the chapter.
Extant evidence shows that in the past two decades bankruptcies and defaults have occurred at higher rates than at any time. Due to recent financial crises and regulatory concerns, credit risk assessment is an area that has seen a resurgence of interest from both the academic world and the business community. Especially for credit-granting institutions, such as commercial banks and some credit card companies, the ability to discriminate faithful customers from bad ones is crucial. In order to enable the interested parties to take either preventive or corrective action, the need for efficient and reliable models that predict defaults accurately is imperative.
The general approach of credit risk analysis is to apply a classification technique on similar data of previous customers – both faithful and delinquent customers – in order to find a relation between the characteristics and potential failure. Accurate classifiers should be found in order to categorize new applicants or existing customers as good or bad.
In the seminal paper, Fisher (1936) attempted to find a linear classifier that best differentiates between the two groups of satisfactory and unsatisfactory customers based on statistical discriminant analysis. Nonlinear regression models, logistic regression (Wiginton, 1980) and probit regression (Grablowsky and Talley, 1981), also have been applied in credit risk analysis.
A credit risk decision problem often faced by banks and other financial institutions is whether to grant credit to an individual for his or her personal financial needs. Credit risk analysis, through the use of credit scoring models, is becoming more automated with the use of computers and the utilization of the Internet to obtain and compile financial data. In recent years, an increasing number of credit scoring models have been developed as a scientific aid to the traditionally heuristic process of credit risk evaluation. Typically, linear discriminant analysis (Fisher, 1936), logit analysis (Wiginton, 1980), probit analysis (Grablowsky and Talley, 1981), linear programming (Glover, 1990), integer programming (Mangasarian, 1965), k-nearest neighbor (KNN) (Henley and Hand, 1996), classification tree (Makowski, 1985), artificial neural networks (ANN) (Malhotra and Malhotra, 2003; Smalz and Conrad, 1994), genetic algorithm (GA) (Chen and Huang, 2003; Varetto, 1998) and support vector machine (SVM) (Van Gestel et al., 2003; Huang et al., 2004) models, as well as some hybrid models such as neuro-fuzzy systems (Piramuthu, 1999; Malhotra and Malhotra, 2002), have been widely applied to credit risk analysis tasks. Two recent surveys on credit scoring and credit modeling can be found in Thomas (2002) and Thomas et al. (2005).
The main motivation of this chapter is to propose a new evolving LSSVM learning paradigm, integrating LSSVM with GA, for evaluating credit risk, and to test the predictive power of the proposed learning paradigm by comparing it with statistical models and neural network models. The rest of the chapter is organized as follows. The next section gives a brief introduction to SVM and LSSVM. The new evolving LSSVM learning paradigm is described in detail in Section 7.3. In Section 7.4 the research data and the comparable classification models are presented. The experimental results are reported in Section 7.5. Section 7.6 concludes the chapter.
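A compact sketch of GA-driven parameter evolution is given below: a small genetic algorithm evolves (C, gamma) for a kernel classifier by maximizing cross-validated accuracy, with an RBF SVC standing in for the LSSVM. Population size, selection, crossover and mutation settings are illustrative assumptions, not the chapter's design.

```python
# Sketch of the "evolving" idea: a small GA searches (log C, log gamma) for a kernel
# classifier, using cross-validated accuracy as the fitness function.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

def fitness(genome):
    C, gamma = np.exp(genome)                        # genome stores log(C), log(gamma)
    return cross_val_score(SVC(kernel="rbf", C=C, gamma=gamma), X, y, cv=3).mean()

pop = rng.uniform(low=[-3, -5], high=[5, 2], size=(16, 2))     # initial population
for generation in range(12):
    scores = np.array([fitness(g) for g in pop])
    parents = pop[np.argsort(scores)[-8:]]           # truncation selection: keep best half
    children = []
    for _ in range(8):
        a, b = parents[rng.integers(8, size=2)]
        child = np.where(rng.random(2) < 0.5, a, b)      # uniform crossover
        child = child + rng.normal(scale=0.3, size=2)    # Gaussian mutation
        children.append(child)
    pop = np.vstack([parents, children])

best = pop[np.argmax([fitness(g) for g in pop])]
print("best C, gamma:", np.exp(best))
```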
At the beginning of 2005, total outstanding consumer credit in the U.S. reached $2.127 trillion, of which $0.801 trillion was revolving and $1.326 trillion non-revolving. According to data released by the Administrative Office of the U.S. Courts, in the year that ended on December 31, 2003, U.S. bankruptcy filings set a record level, totaling 1,660,245, comprising 1,625,208 non-business and 35,037 business bankruptcy filings. A further 2,062,000 people filed for bankruptcy in the year that ended on December 31, 2004. Similarly, the lack of a robust credit rating model has been an important issue slowing the development of complex products, such as credit derivatives, in some Asian countries, making it difficult for investors and firms to find suitable instruments to transfer the credit risks they face.
The motivation of this chapter is to formulate a multistage reliability-based SVM ensemble learning paradigm for credit risk evaluation and to compare its performance with other existing credit risk assessment techniques. The rest of the chapter is organized as follows. The next section presents a literature review of credit risk evaluation models and techniques. In Section 8.3, the overall formulation process of the multistage SVM ensemble learning model is provided in detail. To verify the effectiveness of the proposed method, two real-world examples are examined and the corresponding experimental results are reported in Section 8.4. Section 8.5 concludes the chapter.
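The sketch below illustrates one plausible reading of a reliability-based ensemble: several SVM members are trained on bootstrap samples, each member's validation accuracy serves as its reliability, and predictions are combined by a reliability-weighted vote. The weighting scheme and member generation are illustrative, not the chapter's multistage procedure.

```python
# Reliability-weighted SVM ensemble sketch: bootstrap members, validation accuracy
# as the reliability weight, weighted voting for the final decision.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=500, n_features=12, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

members, reliabilities = [], []
for k in range(7):                                    # 7 ensemble members
    idx = rng.integers(len(X_tr), size=len(X_tr))     # bootstrap resample
    clf = SVC(kernel="rbf", C=10.0 ** rng.uniform(-1, 2)).fit(X_tr[idx], y_tr[idx])
    members.append(clf)
    reliabilities.append(clf.score(X_val, y_val))     # validation accuracy as reliability

def ensemble_predict(X_new):
    votes = np.zeros((len(X_new), 2))
    for clf, w in zip(members, reliabilities):
        pred = clf.predict(X_new)
        votes[np.arange(len(X_new)), pred] += w       # weight each vote by reliability
    return votes.argmax(axis=1)

print("ensemble accuracy on validation split:", (ensemble_predict(X_val) == y_val).mean())
```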
Credit risk assessment is a crucial decision for financial institutions, because inappropriate credit decisions carry high risks that may lead to huge losses. It is an even more important task today, as financial institutions have experienced serious challenges and competition during the past decades. When considering an application for a large loan, such as a mortgage or a construction loan, the lender tends to rely on direct, individual scrutiny by a loan officer or even a committee. However, if hundreds of thousands, or even millions, of credit card or consumer loan applications need to be evaluated, financial institutions usually adopt models to assign scores to applicants rather than examining each application in detail. Hence, various credit scoring models need to be developed for efficient credit approval decisions (Lee and Chen, 2005).
The main motivation of this chapter is to take full advantage of the good generalization capability of SVMs and the inherent parallelism of metalearning to design a powerful credit risk evaluation system. The rest of this chapter is organized as follows. Section 9.2 describes the building process of the proposed SVM-based metamodeling technique in detail. For illustration and verification purposes, one public credit dataset is used and the empirical results are reported in Section 9.3. In Section 9.4, some conclusions are drawn.
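As a stand-in for the metamodeling procedure, the sketch below uses scikit-learn's StackingClassifier: base SVMs with different kernels are trained (conceptually in parallel) and a meta-level SVM learns from their cross-validated outputs. Treat it as an analogue of the idea, not the chapter's exact metalearning design.

```python
# SVM-based metamodeling sketch via stacking: base SVMs feed a meta-level SVM.
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, n_features=10, random_state=0)

base_learners = [
    ("svm_rbf", SVC(kernel="rbf", probability=True)),
    ("svm_poly", SVC(kernel="poly", degree=3, probability=True)),
    ("svm_lin", SVC(kernel="linear", probability=True)),
]
meta_svm = SVC(kernel="rbf")                      # metamodel trained on base-model outputs
stack = StackingClassifier(estimators=base_learners, final_estimator=meta_svm, cv=5)

print("CV accuracy:", cross_val_score(stack, X, y, cv=5).mean())
```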
Business credit risk management is a scientific field in which many academics and practitioners have been working for at least the last three decades. Almost all financial organizations, such as banks, credit institutions and clients, need this kind of information about firms in which they have an interest of any kind. However, business credit risk management is not easy: it is a very complex and challenging task from the viewpoint of systems engineering. It comprises many processes, such as risk identification and prediction, modeling, and control. Within this complex system, risk identification is without doubt a crucial step (Lai et al., 2006a), as it directly influences the later processes of business credit risk management. This chapter focuses only on business credit risk identification and analysis.
The main motivation of this chapter is to design a high-performance business credit risk identification system using a knowledge ensemble strategy and, at the same time, to compare its performance with existing single approaches. The rest of the chapter is organized as follows. Section 10.2 introduces the formulation process of the proposed EP-based knowledge ensemble methodology. Section 10.3 gives the research data and experiment design. The experimental results are reported in Section 10.4. Section 10.5 concludes the chapter.
The main contribution of this chapter is that a novel intelligent-agent-based multicriteria fuzzy GDM model is proposed, for the first time, for solving a financial MCDM problem by introducing intelligent agents as decision-makers. Compared with traditional GDM methods, the proposed multicriteria fuzzy GDM model has five distinct features. First of all, intelligent agents, instead of human experts, are used as decision-makers (DMs), thus reducing the recognition bias of human experts in GDM. Second, the judgment is made over a set of criteria through advanced intelligent techniques, based on the data themselves. Third, like human experts, these intelligent agents can generate different possible opinions on a specified decision problem through suitable sampling and parameter settings. All possible opinions then become the basis for formulating fuzzy opinions for further decision-making actions. In this way, the specified decision problems are extended into a fuzzy GDM framework. Fourth, unlike previous subjective methods and traditional time-consuming iterative procedures, this chapter proposes a fast optimization technique to integrate the fuzzy opinions and make their aggregation simple. Finally, the main advantage of the fuzzy aggregation process in the proposed methodology is that it can not only speed up the computational process via information fuzzification but also retain as much useful information as possible by means of the specified fuzzification schemes.
... SVM classifiers built with the SVM algorithm have been applied to credit risk analysis [3], medical diagnostics [4], handwritten character recognition [5], text categorization [6], information extraction [7], pedestrian detection [8], face detection [9], etc. ...
... Training of the SVM classifier assumes solving a quadratic optimization problem [1]- [3]. Using a standard quadratic problem solver for training the SVM classifier would involve solving a big quadratic programming problem even for a moderate sized data set. ...
... In the simplest case, a solution to this problem can be obtained by searching over the kernel function types, the values of the kernel function parameters and the value of the regularization parameter, which demands significant computational expense. In addition, for an assessment of classification quality, indicators such as classification accuracy, classification completeness, etc. can be used [3]. ...
The problem of developing an SVM classifier based on a modified particle swarm optimization algorithm has been considered. This algorithm carries out a simultaneous search over the kernel function type, the values of the kernel function parameters and the value of the regularization parameter for the SVM classifier. Such an SVM classifier provides high-quality data classification. The idea of particle «regeneration» forms the basis of the modified particle swarm optimization algorithm: in implementing this idea, some particles change their kernel function type to the one corresponding to the particle with the best classification accuracy. The proposed particle swarm optimization algorithm reduces the time required to develop the SVM classifier. The results of experimental studies confirm the efficiency of this algorithm.
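The sketch below gives one compact reading of the modified PSO described above: each particle encodes a kernel type plus continuous parameters (log C, log gamma), velocities update the continuous part as in standard PSO, and a "regeneration" step lets lagging particles adopt the kernel type of the current best particle. All constants and update rules are illustrative assumptions rather than the cited algorithm.

```python
# Modified-PSO-style sketch: simultaneous search over kernel type, kernel parameter
# and regularization parameter, with kernel-type "regeneration" toward the best particle.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X, y = make_classification(n_samples=300, n_features=8, random_state=1)
KERNELS = ["linear", "rbf", "poly", "sigmoid"]

def accuracy(kernel, pos):
    C, gamma = np.exp(pos)
    return cross_val_score(SVC(kernel=kernel, C=C, gamma=gamma), X, y, cv=3).mean()

n_particles, n_iter = 12, 10
kernels = rng.choice(len(KERNELS), n_particles)          # each particle's kernel type
pos = rng.uniform(-3, 3, size=(n_particles, 2))          # (log C, log gamma)
vel = np.zeros_like(pos)
pbest_pos = pos.copy()
pbest_val = np.array([accuracy(KERNELS[k], p) for k, p in zip(kernels, pos)])
g = pbest_val.argmax()                                   # index of the global best particle

for _ in range(n_iter):
    r1, r2 = rng.random((2, n_particles, 1))
    vel = 0.7 * vel + 1.5 * r1 * (pbest_pos - pos) + 1.5 * r2 * (pbest_pos[g] - pos)
    pos = pos + vel
    for i in range(n_particles):
        # "regeneration": weak particles switch to the best particle's kernel type
        if pbest_val[i] < pbest_val[g] - 0.05 and rng.random() < 0.3:
            kernels[i] = kernels[g]
        val = accuracy(KERNELS[kernels[i]], pos[i])
        if val > pbest_val[i]:
            pbest_val[i], pbest_pos[i] = val, pos[i]
    g = pbest_val.argmax()

print("best kernel:", KERNELS[kernels[g]], "best C, gamma:", np.exp(pbest_pos[g]))
```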
... The credit scoring literature is substantial, and there have been various comparative studies relating to the state of the art. In Yu et al. (2008), Yu compared credit scoring methods using the following metrics: accuracy, interpretability, simplicity, and flexibility. He found that SVM has high accuracy and flexibility, together with a better interpretability than neural networks. ...
... Subsequently, Batuwita and Palade (2010) proposed fuzzy SVM-class imbalance (FSVM-CIL) as a way of addressing both imbalanced data and sensitivity to noise/outliers. In addition, a new version of fuzzy SVM, bilateral-weighted fuzzy SVM, was proposed in Yu et al. (2008). This method constructs two instances from the original instance, one for the positive class and one for the negative class, assigning members to them with different membership weights. ...
... Other advanced SVM methods, including least squares fuzzy SVM (LS-FSVM) and least squares bilateral fuzzy SVM (LS-BFSVM) are also described in Yu et al. (2008). ...
Crediting represents one of the biggest risks faced by the banking sector, and especially by commercial banks. In the literature, there have been a number of studies concerning credit risk management, often involving credit scoring systems that make use of machine learning (ML) techniques. However, the specificity of individual banks' datasets means that choosing the techniques best suited to the needs of a given bank is far from straightforward. This study was motivated by the need of Credins Bank in Tirana for a reliable customer credit scoring tool suitable for use with that bank's specific dataset. The dataset in question presents two substantial difficulties: first, a high degree of imbalance, and second, a high level of bias together with a low level of confidence in the recorded data. These shortcomings are largely due to the relatively young age of the private banking system in Albania, which did not exist as such until the early 2000s. They are shortcomings not encountered in the more conventional datasets that feature in the literature. The present study therefore has a real contribution to make to the existing corpus of research on credit scoring. The first important question to be addressed is the level of imbalance. In practice, the proportion of good customers may be many times that of bad customers, making the impact of unbalanced data on classification models an important element to be considered. The second question relates to bias or incompleteness in customer information in emerging and developing countries, where economies tend to function with a large amount of informality. Our objective in this study was to identify the most appropriate ML methods for handling Credins Bank's specific dataset, and the various tests that we performed for this purpose yielded abundant numerical results. Our overall finding on the strength of these results was that this kind of dataset can best be dealt with using balanced random forest methods.
... Choosing the optimal parameter values for the SVM classifier is a significant problem at the moment. It is necessary to find the kernel function type, the values of the kernel function parameters and the value of the regularization parameter [1,2]. High-accuracy data classification with the SVM classifier cannot be achieved without an adequate solution to this problem. ...
... Here, we suggest using the modified PSO algorithm to find the kernel function type, the values of the kernel function parameters and the value of the regularization parameter C of the SVM classifier simultaneously; the parameter C allows a compromise between maximizing the gap separating the classes and minimizing the total error [1-4]. Typically, one of the following functions is used as the kernel function K(z_i, z): a linear function, a polynomial function, a radial basis function or a sigmoid function [5,6]. ...
... As a result of the training, the classification function is determined in the following form [1][2][3] ...
The problem of identifying objects on the basis of their hyperspectral features has been considered. We propose using SVM classifiers based on the modified PSO algorithm, adapted to the specifics of this identification problem. The results of identifying objects from their hyperspectral features using the SVM classifiers are presented.
... The SVM classifiers based on the SVM algorithm have been applied for credit risk analysis [3], medical diagnostics [4], handwritten character recognition [5], text categorization [6], information extraction [7], pedestrian detection [8], face detection [9], Earth remote sensing [10], etc. ...
... (Figure: linear separation of two classes by the SVM classifier in 2D space.) SVM algorithms are well known for their excellent performance in statistical classification. Still, the high computational cost due to cubic runtime complexity is problematic for big data sets: training the SVM classifier requires solving a quadratic optimization problem [1,3]. Using a standard quadratic problem solver for SVM classifier training would involve solving a large quadratic programming problem even for a moderately sized data set. ...
... In the simplest case, a solution to this problem can be found by searching over the kernel function types, the values of the kernel function parameters and the value of the regularization parameter, which demands significant computational expense. For an assessment of classification quality, indicators such as classification accuracy, classification completeness, etc. can be used [3]. ...
The problem of developing support vector machine (SVM) classifiers, and ensembles of them, using a modified particle swarm optimization (PSO) algorithm has been considered. Solving this problem allows high-precision data classification, especially Big Data classification, with acceptable time expenditure. The modified PSO algorithm conducts a simultaneous search over the type of kernel function, the parameters of the kernel function and the value of the regularization parameter for the SVM classifier. The idea of particle «regeneration» served as the basis for the modified PSO algorithm: in the implementation of this algorithm, some particles change their kernel function type to the one corresponding to the particle with the best classification accuracy. The proposed PSO algorithm reduces the time required to develop the SVM classifiers, which is very important for Big Data classification problems. In most cases such an SVM classifier provides high-quality data classification. In exceptional cases, SVM ensembles based on the decorrelation maximization algorithm, with different strategies for making classification decisions and the majority vote rule, can be used. A two-level SVM classifier has also been offered; it works as a group of SVM classifiers at the first level and as an SVM classifier based on the modified PSO algorithm at the second level. The results of experimental studies confirm the efficiency of the offered approaches for Big Data classification.
... Fisher's 1936 publication is known as the first to introduce a credit scoring system (Lu et al. 2013); a more recent treatment is Yu et al. (2008). Credit scoring is interesting to study because of the complexity of its processes and of data behavior that changes dynamically. ...
... More interestingly, most scholars discuss credit scoring only as a feasibility analysis. Moreover, we can see that existing definitions tend to equate credit scoring with credit feasibility analysis, as in Jentzch (2007), Yu et al. (2008), and Abdou and Pointon (2011). ...
Conventional credit scoring models can lead to serious fairness problems because in certain cases they disadvantage one party in the financing. An Islamic financing scoring model complies with Sharia rules and ensures fairness among the parties. Currently, there are no definite rules for Islamic financing scoring models, which leads to subjective judgments, and in subjective judgments words can mean different things to different people. Thus, this paper proposed and deployed models for scoring the default risk level using an Interval Type-2 Fuzzy Set model to support the subjective judgments while maintaining Sharia rules. The installment amount and the total delay period are used as variables for this scoring, and the beginning delay period is also used as a weight on the risk scoring results. In addition, this paper proposed a method for computing the real loss value, which is used as a basis for computing fines according to the default risk level, bad debt expense, and installment weighted average.
... Credit scoring classifiers can be classified as individual classifiers, homogeneous ensemble classifiers or heterogeneous ensembles [5]. According to Yu, Wang and Lai [9], the algorithms can be classified as statistical (such as Linear Discriminant Analysis, Logistic Regression, Probit Regression, K-Nearest Neighbors (KNN), Decision Trees), mathematical programming (such as Linear Programming, Quadratic Programming, Integer Programming), artificial intelligence (such as Artificial Neural Networks, Support Vector Machines, Genetic Algorithm, Genetic Programming, Rough Set), hybrid approaches (such as ANN and Fuzzy Systems, Rough Set and ANN, Fuzzy System and support vector machines) or ensemble approaches (such as ANN Ensemble, support vector machines Ensemble, Hybrid Ensemble). ...
... Algorithms that are commonly used in credit scoring include Bayesian networks [28,29], linear discriminant analysis [30,31], logistic regression [31], artificial neural networks [32,33], k-nearest neighbor [34,35], deep learning [36,37], decision trees [38] and support vector machines [9,39]. All of these are classification algorithms that end up classifying an applicant into one of two categories: grant loan or deny loan. ...
In consumer loans, where lenders deal with the masses, the use of algorithms to classify borrowers is fast catching on. Classification based on predictive models tends to adversely affect borrowers. In this paper, we study the extent to which various algorithms disenfranchise borrowers lying on the boundaries of decision making. In the study, the data used for loan appraisal and the decisions made by the lenders are subjected to a set of selected algorithms. The bias suffered by borrowers in each case is determined using the mean absolute error (MAE) and the relative absolute error (RAE). The results show that FURIA has the least bias, with MAE of 0.2662 and 0.1501 and RAE of 64.19% and 30.31% for the German and Australian data sets respectively. Consequently, FURIA is modified to remove the hard boundaries, which results in even lower MAE of 0.2535 and 0.1264 and RAE of 64.14% and 27.73% for the German and Australian data sets respectively.
... The DOE algorithm is an alternative to the grid search algorithm [4]. In this algorithm, the search for globally optimal solutions is not performed over all grid nodes but only over some nodes selected in a certain way (Fig. 1b, a special case of a 2D search space). ...
... The classification decision for such objects can be refined using an approach based on the combined application of the SVM classifier and the k nearest neighbours (kNN) algorithm [4,5,14,15]. ...
In this paper, hybrid and modified versions of the PSO algorithm, intended to improve the search characteristics of the classical PSO algorithm in the SVM classifier development problem, have been offered and investigated. The two hybrid versions of the PSO algorithm use the classical "Grid Search" (GS) algorithm and the "Design of Experiment" (DOE) algorithm, respectively, while the modified version of the PSO algorithm realizes a simultaneous search over the kernel function type, the parameter values of the kernel function, and the regularization parameter value. In addition, the applicability of the k nearest neighbors (kNN) algorithm to the SVM classifier development problem has been considered.
... Based on the principle of structural risk minimization, the support vector machine (SVM) can avoid the overfitting problem [8] and the dilemma of ANN models falling into local minima [9]. Sakizadeh et al. [10] collected 229 soil samples and analyzed 12 heavy metals (Ag, Co, Pb, Tl, Be, Ni, Cd, Ba, Cu, V, Zn and Cr) to predict the soil pollution index (SPI) with SVM and ANN algorithms. ...
With the rapid development of network technology and the digital economy, the wave of artificial intelligence has swept the world. Facing the era of big data and artificial intelligence, data-oriented technologies undoubtedly represent the practical research trend. The precise analysis enabled by big data and artificial intelligence can therefore provide effective and accurate knowledge and decision-making references for all sectors. In order to effectively and appropriately evaluate the potential risk to soil and groundwater from the gas station industry, this study focuses on the potential risk factors affecting soil and groundwater pollution. In the past, our team evaluated the risk factors affecting the remediation cost of soil and groundwater pollution for potential pollution sources such as gas stations. This study proceeds with the existing industrial database for in-depth discussion, uses machine learning to evaluate the key factors of soil and groundwater pollution risk, and compares the differences, applicability and relative parameter importance of three machine learning techniques (neural networks, random forests and support vector machines). The performance indicators reveal that the random forest algorithm performs better than the support vector machine and the artificial neural network. The relative importance of parameters differs across the machine learning models; for the random forest model, the five dominant parameters are location, number of gas monitoring wells, age of the gas station, number of gasoline nozzles, and number of fuel dispensers.
... Credit scoring models and data are the fundamental cornerstones of the systems designed for this purpose (Alma Çallı, 2019). Approaches in decision support systems for credit scoring may be classified into four groups, according to (Yu et al., 2008): statistical techniques, operational research techniques, artificial intelligence techniques, and hybrid / combined and ensemble methods. However, the usual lending process is not carefully followed in the context of online lending, which puts the loan at a higher risk of default. ...
... This was conducted by defining a synthetic risk index using a participatory process, in order to support an operation to restructure debts. Another example of using MCDM in finance is presented in [24]. The proposed model involves a methodology that combines Group Decision Making (GDM), fuzzification, and techniques from Artificial Intelligence (AI). ...
In previous research, the Extended Order Scale (EOS) dedicated to risk assessment was analysed. It was characterised by a Numerical Order Scale (NOS) evaluated by trapezoidal oriented fuzzy numbers (TrOFNs). However, the research showed that EOS with two-stage orientation phases, was too complicated. Therefore, the main aim of our paper is to simplify a Complete Order Scale (COS) to a zero- or one-stage order scale and a hybrid approach. For this purpose, a way to calculate the scoring function is presented. The results show that changes in the COS structure influence the values of a scoring function. Replacing just one linguistic indicator gives different results. Another finding of the research is the method’s flexibility that allows an expert to individually choose the most suitable COS. The research proves that the boundary between various linguistic labels cannot be precisely defined. However, knowledge of a formal COS structure allows it to be transformed into a less complex one.
... In recent years, many classification tasks have been successfully solved with the SVM algorithm, which is based on learning from examples (supervised training). The SVM algorithm constructs a binary SVM classifier by mapping the feature vectors of the objects to be classified into a higher-dimensional space using a special kernel function and then searching in that space for a maximum-margin hyperplane that separates objects of different classes [40,49-51]. Although the SVM algorithm has better generalization capacity than other algorithms and classification methods, it is difficult to apply because the choice of kernel function type, kernel function parameter values and regularization parameter affects the data classification quality. ...
The task of classifying the path of an aeroballistic vehicle in its atmospheric phase is considered. It is suggested that the task be solved by means of a pattern recognition technique involving supervised training on a set of classification samples, i.e., flying vehicle paths in the atmospheric phase targeted at different ground objects. A pattern recognition technique has been developed for this task that uses the minimum distance to a class standard and an artificial neural network (a multilayer perceptron). The novelty of the developed technique includes the generation of a gliding aircraft path through spline approximation of a sectional polyline defined by a set of fixed points, which include the origin and end of the path as well as disturbing points. The quality of the path recognition system is assessed through probability measures. The assessment of path recognition quality for the various recognition techniques involved calculating probability measures for different numbers of classes, path types (with varying number and steepness of maneuvers), and time intervals allowed for decision-making. A set of programs has been developed that provides mathematical modeling of aircraft paths in the atmospheric phase, recognition procedures based on the minimum distance to a class standard and a multilayer perceptron neural network, and a recognition quality check program based on statistical estimation of correct recognition probabilities. The developed set of programs serves as a basis for studying various aircraft path recognition techniques with different initial data. Computer modeling results and calculated probabilities of aircraft path classification errors using these techniques are given for various numbers of path classes and limits on decision-making time.
... Credit scoring models are useful for predetermining the payment capacity of borrowers who request loans from financial institutions, and each method has advantages and disadvantages (Yu, 2008). Traditional linear regression models, for example, can be used by anyone without the need to fully understand the problem and are easy to interpret (Cárdenas-Pérez & Benavides, 2021), while logarithmic models require their variables to meet certain assumptions predetermined by the researcher (Paredes, 2018). ...
The popular and solidarity economy in Ecuador is made up of cooperative financial institutions, which put their capital at risk, and of credit subjects, most of whom lack the financial culture needed to maintain adequate payment capacity. This creates a need for the financial institutions to know the credit scores of their clients so as to minimize the risk of non-payment and avoid increases in indicators such as overdue loan portfolio, delinquency, insolvency and non-performing assets. The research follows a quantitative approach and is bibliographic and documentary, descriptive and correlational. One of the main findings is that applying the logit model affects credit score management, making it a very important financial tool for credit approval decisions in the financial institutions of the popular and solidarity economy.
... If the value of the assessment for characteristic P1 is 2 or 3, the value of the assessment for P3 is 2 or 3, and the value of the assessment for characteristic P4 is 2 or 3, then "Accept the CP for implementation", with an approximation index of 0.831. 4. If the value of the assessment for characteristic P1 is 2 or 3, the value of the assessment for P3 is 2 or 3, the value of the assessment for P4 is 2 or 3, and the value of the assessment for P2 ...
The problem of developing generalizing decision rules for object classification, which arises under conditions of inaccurate knowledge about the values of objects' attributes and about the significance of the attributes themselves, has been considered. An approach to the binary classification of objects has been proposed that represents inaccurate knowledge with linguistic variables and allows various strategies for forming generalizing decision rules for classification to be considered using the tools of multiset theory. An example of forming generalizing decision rules of binary classification for a set of competitive projects evaluated by a group of experts, confirming the effectiveness of the proposed approach, has been considered. In addition, the objects, whose feature values are the frequencies with which each score on an a priori given rating scale was assigned by all experts, have been visualized in a two-dimensional space using the non-linear dimensionality reduction algorithm UMAP. Based on the visualization and cluster analysis of the initial set of competitive projects, a "noise" project that negatively affected the formation of the generalizing decision rules of binary classification was identified and removed from further analysis.
... When solving a classification problem, the quality of the classifier is influenced by the level of class balance. While most probabilistic models depend only weakly on class balance, for non-probabilistic models, in particular SVM (Support Vector Machine) models [1-10], the class imbalance problem [5-7] is highly relevant. ...
An approach to two-stage classification, in which a 1-SVM classifier is used as the main classifier and an RF classifier as the auxiliary classifier, has been considered. The proposed approach improves classification quality on imbalanced datasets. The results of a comparative analysis of the proposed approach and an alternative two-stage approach, in which a binary SVM classifier is used as the main classifier and the RF classifier as the auxiliary classifier, are presented.
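One plausible rendering of the two-stage scheme is sketched below: a one-class SVM trained only on the majority class acts as the main classifier, and a random forest is consulted as the auxiliary classifier for cases the 1-SVM flags as unlike the majority class. The hand-off rule and thresholds are illustrative assumptions, not the cited design.

```python
# Two-stage classification sketch for imbalanced data: 1-SVM (main) + random forest (auxiliary).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import OneClassSVM

X, y = make_classification(n_samples=1000, weights=[0.95, 0.05], random_state=0)

# Main classifier: 1-SVM fitted on majority-class (label 0) samples only.
ocsvm = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale").fit(X[y == 0])

# Auxiliary classifier: RF fitted on the full (imbalanced) training set.
rf = RandomForestClassifier(n_estimators=200, class_weight="balanced", random_state=0).fit(X, y)

def two_stage_predict(X_new):
    pred = np.zeros(len(X_new), dtype=int)
    outliers = ocsvm.predict(X_new) == -1            # -1 means "does not look like class 0"
    if outliers.any():
        pred[outliers] = rf.predict(X_new[outliers]) # let the auxiliary classifier decide
    return pred

print("minority recall:", (two_stage_predict(X)[y == 1] == 1).mean())
```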
... Another parameter that should be taken into account is ε ("epsilon"), which defines the margin of tolerance within which no penalty is given to errors. Normally, these parameters are selected by trial and error, which is a painstaking and overwhelming process (Tzu-Liang et al., 2015; Jin et al., 2014; Lai et al., 2008). In this study, all the above-mentioned parameters needed for constructing a robust SVM model are adjusted using cross-validation over the offered dataset. ...
The digital revolution we are witnessing nowadays goes hand in hand with a revolution in cybercrime. This irrefutable fact has been a major reason for making digital forensics (DF) a pressing and timely topic to investigate. The file system is a rich source of digital evidence that may prove or disprove a digital crime. Yet, although there are many tools that can be used to extract potentially conclusive evidence from the file system, there is still a need to develop effective techniques for evaluating the extracted evidence and linking it directly to a digital crime. Machine learning can be posed as a possible solution looming on the horizon. This article proposes an Enhanced Multiclass Support Vector Machine (EMSVM) model that aims to improve classification performance. The EMSVM suggests a new technique for selecting the most effective set of parameters when building an SVM model. In addition, since DF is a multiclass classification problem, because a file system might be accessed by more than one application, the EMSVM enhances the class assignment mechanism by supporting multi-class classification. The article then investigates the applicability of the proposed model to analysing incriminating digital evidence by inspecting the historical activities of file systems to determine whether a malicious program manipulated them. The results obtained from the proposed model were promising when compared to several machine-learning algorithms. © 2019 Production and hosting by Elsevier B.V. on behalf of King Saud University. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
... Many analytical techniques have been proposed to distinguish good loan applications from bad ones, for instance logistic regression [25], linear discriminant analysis [26], k-nearest neighbor (KNN) classifiers [27], classification trees [28,29], Markov chains [30,31], survival analysis [32], linear and nonlinear programming [33], neural networks [34,35], Support Vector Machines (SVMs) [36-38], genetic methods [39-41] and so on. Hybrid approaches include fuzzy systems and neural networks [42], fuzzy systems and support vector machines, and neural networks and multivariate adaptive regression splines [43]. ...
Tremendous efforts have been made to exploit information from the borrower's credit for loan evaluation in P2P lending, but few studies have explored information from both investors and borrowers. To this end, we propose an integrated loan evaluation model that exploits and fuses multi-source information from both the borrower and the investor to improve investment decisions in P2P lending. First, based on the borrower's credit, we build a kernel-based credit risk model to quantitatively evaluate each loan. Second, we build an investor composition model that exploits information from investors' investment behavior for loan evaluation. Then, based on these two quantitative models and their correlation, we define a multi-kernel weight and develop an integrated loan assessment model that can evaluate a loan in terms of both return and risk. Furthermore, based on the integrated loan evaluation model, we formalize investment decisions in P2P lending as a portfolio optimization problem with boundary constraints to help investors make better investment decisions. To validate the proposed model, we perform extensive experiments on real-world data from the world's largest P2P lending marketplace. Experimental results reveal that the integrated loan evaluation model can significantly enhance investment performance beyond the existing model in the P2P market and other baseline models.
... Over the last 20 years, unprecedented improvements in the cost-effectiveness ratio of computers, together with improved computational techniques, have made machine learning widely applicable in every aspect of our lives, such as education, healthcare, games, finance, transportation, energy, business, science and engineering [1,2]. Among the many machine learning methods developed, the Support Vector Machine (SVM) is a well-established one that has become an overwhelmingly popular choice for data analysis [3-7]. In the SVM method, a non-linear dataset is transformed via a feature map into another dataset and is separated by a hyperplane in the feature space, which can be performed effectively using the kernel trick. ...
A method for analyzing the feature space used in a quantum classifier on the basis of Pauli decomposition is developed. In particular, for 2-dimensional input datasets, the method boils down to a general formula that easily computes a lower bound on the exact training accuracy, which eventually helps us see whether the feature space constructed with the selected feature map is suitable for linearly separating the dataset. The effectiveness of this formula is demonstrated with five feature maps of a special type and four 2-dimensional non-linear datasets.
... The SVM approach has been adopted in many areas, such as text categorization (Joachims, 1998), medical diagnostics (Raikwal and Saxena, 2012), credit risk analysis (Yu et al., 2008), information extraction (Li et al., 2005), face recognition (Abdullah et al., 2017), etc. Many attempts have adopted various forms of the PSO algorithm to select the optimal parameters of SVM. One of the recent studies that applied the standard PSO with SVM was introduced by Wei et al. (2011); they used the PCA algorithm for feature extraction, and PSO was utilized to find the best parameters of SVM. Nonetheless, the selection of ...
Support vector machines can find globally optimal solutions to many complicated problems and have been widely used for human face classification in recent years. Nevertheless, one of the main limitations of SVM is optimizing its training parameters, especially when SVM is used for face recognition. Various methodologies have been used to deal with this issue, such as PSO, OPSO, AAPSO and AOPSO; nevertheless, there is room for improvement in this kind of optimization process. Lately, an improved version of PSO, called modified PSO, has been developed. In this study, a new technique based on modified PSO, called Modified PSO-SVM, is proposed to optimize SVM parameters. The proposed scheme utilizes modified PSO to seek the best parameters of SVM; the SCface, CASIAV5 and CMU Multi-PIE face datasets are used in the experiments. A comparison with PSO-SVM, OPSO-SVM and AOPSO-SVM shows promising results in terms of accuracy.
... This algorithm is one of the boundary classification algorithms [1,2]. Nowadays, SVM algorithm is applied to solve different classification problems in various applications [3][4][5]. ...
In this paper we suggest a self-tuning multiobjective genetic algorithm (STMGA) based on NSGA-II. The new algorithm aims to improve SVM classification quality. Classification quality indicators such as overall accuracy, specificity, sensitivity and the number of support vectors serve as the objective functions in the STMGA. Ways of realizing the self-tuning of STMGA parameters such as the crossover probability, the crossover distribution index and the mutation distribution index have been proposed and investigated. The STMGA is more flexible in selecting its parameter values and avoids the use of manually set parameter values. When the radial basis kernel function is used for SVM classifier development, the STMGA finds the Pareto front of parameter values, namely the regularization parameter value and the Gaussian kernel parameter value, that give the best values of the chosen set of classification quality indicators. The experimental results obtained on model and real datasets of loan scoring, medical and technical diagnostics, etc. confirm the efficiency of the proposed STMGA.
... Nowadays, various data mining methods and algorithms are applied to the development of data classifiers. The most popular data mining tools, such as artificial neural networks, decision trees, the SVM algorithm (Support Vector Machine algorithm) [1-7], the kNN algorithm (k Nearest Neighbors algorithm) [2], the Parzen window algorithm, etc., can be used in solving data classification problems. ...
The aim of this work is to improve the results of SVM (Support Vector Machine) classification by hybridizing the SVM classifier with a random forest (RF) classifier used as an auxiliary classifier. The classification decisions obtained from the SVM classifier are refined for objects located in experimentally determined subareas near the hyperplane separating the classes, which contain both correctly and erroneously classified objects. If the quality of classification of the objects from the initial dataset improves, the proposed hybrid approach can be recommended for the classification of new objects. When developing the SVM classifier, fixed default parameter values are used. A comparative analysis of the classification results obtained in computational experiments when hybridizing the SVM classifier with two auxiliary classifiers, the random forest (RF) classifier and the k nearest neighbor (kNN) classifier, whose parameter values are determined randomly, confirms the expediency of using these classifiers to increase SVM classification quality. It was found that in most cases the random forest classifier improves SVM classification quality more than the kNN classifier.
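The sketch below renders the hyperplane-band idea in scikit-learn terms: objects whose absolute decision_function value falls below a threshold, i.e. that lie in a band around the separating hyperplane, are re-classified by an auxiliary random forest. The band width and data are illustrative; the cited work determines the subareas experimentally.

```python
# SVM + auxiliary RF sketch: refine SVM decisions for objects near the hyperplane.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=600, n_features=10, flip_y=0.05, random_state=2)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=2)

svm = SVC(kernel="rbf").fit(X_tr, y_tr)              # main classifier, default parameters
rf = RandomForestClassifier(n_estimators=200, random_state=2).fit(X_tr, y_tr)  # auxiliary

def hybrid_predict(X_new, band=0.5):
    pred = svm.predict(X_new)
    near = np.abs(svm.decision_function(X_new)) < band   # objects close to the hyperplane
    if near.any():
        pred[near] = rf.predict(X_new[near])             # auxiliary classifier refines them
    return pred

print("SVM alone:", (svm.predict(X_te) == y_te).mean())
print("SVM + RF :", (hybrid_predict(X_te) == y_te).mean())
```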
... Credit is a concept present in the daily lives of people and companies, and over time the need to use this concept has grown noticeably. For Schrickel [3], credit means "every act of will or disposition by someone to set aside or cede, temporarily, part of their assets to a third party, in the expectation that this portion will return to their possession in full after the stipulated time has elapsed". Silva [7] argues that the function of credit consists of evaluating ... changes in the credit-granting strategy [8], achieving more precise analyses through data mining approaches and techniques that can extract important information from a dataset. With the enormous amount of data growing daily, it has become necessary to answer one question [9]: what should be done with the stored data? Traditional data exploration techniques are no longer adequate for handling the vast majority of repositories. ...
Credit is an instrument used to increase and facilitate sales of goods and services. It is responsible for a large part of companies' results and for the development and growth of the country's economy. However, a rigorous assessment of where this credit should go is necessary, since, if it is granted to the wrong companies or people, the creditor can accumulate losses. This work therefore proposes an approach using data mining for credit analysis through the application of computational intelligence algorithms, providing more assertive decision making at the moment of credit granting.
... A company is legally insolvent if the value of its assets is less than the value of its liabilities, and a company is bankrupt if it is unable to pay its debts and files a bankruptcy petition [5]. In the model proposed by this paper, the bad companies (with a positive risk event) are bankruptcies or forced deletions of companies (informal bankruptcy), while the good companies (with a negative risk event) are non-failed, operating companies (an overview of the research done on credit risk models shows that this is an appropriate way of defining bad and good companies for credit risk [6]). To be precise, in Estonia certain requirements are established that must be met for a company to legally remain an operating company. ...
Recent hype in social analytics has modernized personal credit scoring to take advantage of rapidly changing non-financial data. At the same time, business credit scoring still relies on financial data and is based on traditional methods. Such approaches, however, have the following limitations. First, financial reports are typically compiled once a year, hence scoring is infrequent. Second, since there is a delay of up to two years in publishing financial reports, scoring is based on outdated data and is not applied to young businesses. Third, the quality of manually crafted models, although human-interpretable, is typically inferior to that of models constructed via machine learning. In this paper we describe an approach to applying extreme gradient boosting with Bayesian hyper-parameter optimization and ensemble learning to business credit scoring with frequently changing or updated data, such as debts and network metrics from board membership and ownership networks. We report accuracy of the learned model as high as 99.5%. Additionally, we discuss lessons learned and limitations of the approach.
Keywords: Business credit scoring, machine learning, boosted decision tree, hyper-parameter tuning.
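As an illustration of the kind of pipeline described in the abstract above (the exact tooling is not stated there), the following hedged sketch tunes an XGBoost classifier with Bayesian hyper-parameter optimization via scikit-optimize; the synthetic features are hypothetical stand-ins for debt figures and board-membership/ownership network metrics.

```python
# Sketch: gradient boosting for business credit scoring with Bayesian
# hyper-parameter search. Assumes xgboost and scikit-optimize are installed;
# the search space and data are illustrative, not those of the cited study.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from skopt import BayesSearchCV
from skopt.space import Integer, Real
from xgboost import XGBClassifier

# Hypothetical stand-in for debt figures and ownership-network metrics.
X, y = make_classification(n_samples=2000, n_features=15, weights=[0.9, 0.1],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

search = BayesSearchCV(
    XGBClassifier(eval_metric="logloss"),
    {
        "n_estimators": Integer(100, 600),
        "max_depth": Integer(2, 8),
        "learning_rate": Real(0.01, 0.3, prior="log-uniform"),
        "subsample": Real(0.5, 1.0),
    },
    n_iter=25, cv=3, scoring="roc_auc", random_state=0,
)
search.fit(X_tr, y_tr)
print("best params:", search.best_params_)
print("hold-out AUC:", search.score(X_te, y_te))
```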
... Choosing the optimal parameter values for the SVM classifier is a relevant problem. In particular, it is necessary to find the kernel function type, the values of the kernel function parameters, and the value of the regularization parameter [5,6]. High-accuracy data classification with the SVM classifier cannot be achieved without an adequate solution to this problem. ...
The problem of data analysis in education is considered in the context of predicting whether secondary-school graduates will pass the final state attestation. Such data can be substantially imbalanced. To solve this problem, it is proposed to use SVM classifiers based on a modified PSO algorithm, which allows the kernel function type, the kernel function parameter values, and the regularization parameter value to be chosen simultaneously. Beforehand, different rebalancing strategies based on the basic SMOTE algorithm can be applied to rebalance the classes in the experimental datasets. The prediction results obtained with SVM classifiers based on the modified PSO algorithm and the different rebalancing strategies are presented and compared with results obtained with well-known software packages such as StatSoft Statistica and IBM SPSS Modeler.
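The overall idea (rebalance with SMOTE, then search kernel type, kernel parameter, and regularization parameter together) can be sketched as follows. A plain grid search is used here as a simple stand-in for the modified PSO algorithm of the cited work; the data are synthetic and the parameter ranges are illustrative.

```python
# Sketch: SMOTE rebalancing plus a joint search over kernel type, kernel
# parameter and regularization parameter for an SVM classifier.
# Grid search replaces the modified PSO of the cited work; data are synthetic.
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=1500, weights=[0.95, 0.05], random_state=0)

pipe = Pipeline([("smote", SMOTE(random_state=0)),   # applied only to training folds
                 ("svc", SVC())])

param_grid = {
    "svc__kernel": ["rbf", "poly", "sigmoid"],   # kernel function type
    "svc__gamma": [1e-3, 1e-2, 1e-1, 1.0],       # kernel parameter
    "svc__C": [0.1, 1.0, 10.0, 100.0],           # regularization parameter
}
grid = GridSearchCV(pipe, param_grid, cv=5, scoring="f1")
grid.fit(X, y)
print("best configuration:", grid.best_params_)
print("cross-validated F1:", grid.best_score_)
```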
... where v1 is the number of nearest neighbors, v2 is the number of the kernel function type, and v3 is the number of the distance metric type. ...
In this paper, a data classification technique based on the sequential application of the SVM and Parzen classifiers is suggested. The Parzen classifier is applied to data that may be either correctly or erroneously classified by the SVM classifier and that are located in experimentally defined subareas near the hyperplane separating the classes. The SVM classifier is used with default parameter values, while the optimal parameter values of the Parzen classifier are determined using a genetic algorithm. Experimental results confirming the effectiveness of the proposed hybrid intelligent data classification technology are presented.
... There are nine neurons in the input layer, one for each predictor variable. Experiments indicate that SVM is a powerful classification method, since it has outperformed most other methods in a wide variety of applications, such as text categorization and face or fingerprint identification (Yu, 2008). Wang and Lai (2005) proposed a fuzzy support vector machine to discriminate between customers and found that the new fuzzy support vector machine has better classification ability. ...
Assessing credit risk allows financial institutions to plan future loans freely, achieve targeted risk management, and gain maximum profitability. In this study, the risk assessment models are constructed on sample data consisting of financial ratios of enterprises listed on the Bourse Istanbul (BIST). 356 enterprises are classified into three levels, the investment, speculative, and below-investment groups, by ten parameters. The applied methods are discriminant analysis, k nearest neighbor (k-NN), support vector machines (SVM), decision trees (DT), and a new hybrid model, namely Artificial Neural Networks with Adaptive Neuro-Fuzzy Inference Systems (ANFIS). This study provides a comparison of models to build better mechanisms for preventing risk and minimizing the loss arising from defaults. The results indicated that the decision tree models achieve superior accuracy for the prediction of failure. The model we proposed as an innovation has an adequate performance among the applied models.
... A company is legally insolvent if the value of its assets is less than the value of its liabilities, and a company is bankrupt if it is unable to pay its debts and files a bankruptcy petition [5]. In the model proposed by this paper, the bad companies (positive risk event) are bankruptcies or forced deletions of companies, and the good companies are non-failed, operating companies (an overview of all research done on credit risk models shows that this is an appropriate way of defining bad and good companies for credit risk [6]). To be precise, in Estonia there are certain requirements established for companies that need to be met to legally remain an operating company. ...
... Regarding personal credit scoring, Abdou et al. studied personal financial credit scoring [16][17][18][19][20][21][22][23][24][25][26]. Regarding corporate credit rating, Fernandes et al. used the concept of spatial dependence [27][28][29][30][31][32][33]. Regarding corporate bankruptcy prediction, many prediction methods have been proposed [34][35][36][37][38]. ...
This paper investigates fraud transaction detection in the mail-order industry. Such detection has been studied intensively, but the outcomes of that research have not been shared across the mail-order industry. As the B2C market of Amazon-type businesses expands exponentially, fraudulent transactions increase in number; this phenomenon is not only continuing but becoming more sophisticated. One decisive factor is the payment method, namely deferred payment. The conventional primary indicators for fraud detection are order-time information such as the shipping address, the recipient name, and the payment method, and this information is commonly used for prediction. Conventional fraud detection has so far depended on human working experience: from such information, the mail-order company predicts potentially fraudulent customers using parameters drawn from that experience. As the number of order transactions becomes large, fraud detection becomes difficult, and the mail-order industry needs a more sophisticated detection method. Against this background, we examine transaction data together with customer attribute information gathered from a mail-order company in Japan and characterize the customers with a machine learning method. From the results of this intensive study, potentially fraudulent transactions are identified, and the study shows that deliberate and careless customers can be distinguished with machine learning.
... In this study, several accounting indicators were obtained to determine the financial development of companies. Bio-inspired computational techniques were applied together with SVMs in another study to model credit risk analysis (Yu et al., 2008b). In another study by Yu et al., the credit risk assessment was performed using a multistage neural network. ...
Credit risk is a common threat to the financial industry, since improper management of credit risk leads to heavy financial losses in the banking and non-banking sectors. Data mining approaches have been employed in the past to assess credit risk. This study utilises the German credit dataset sourced from the UCI machine learning repository to generate an artificial neural network-based ensemble learning model for credit risk assessment. Eleven data mining algorithms were applied in the open source tool Weka to perform credit ratings on the German credit dataset using a supervised learning approach. The performance of each algorithm was evaluated, and the algorithms that were highly accurate and had the most diverse false positive and false negative results were selected for generating an ensemble model. The predicted outcomes of the top five ranked algorithms were fed into a feed-forward artificial neural network by employing the 'nnet' package in R. The artificial neural network-based ensemble model attained an accuracy of 98.98%, performing better than the individual component algorithms. Based on this ANN-based ensemble model, an interactive graphical user interface was further developed in R. The user-friendly graphical user interface can be used by financial organisations as a decision-support system for assessing credit risk.
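The structure of such an ANN-based ensemble (base-model predictions fed into a small feed-forward network) can be sketched as follows. The cited study used Weka and R's 'nnet'; this scikit-learn version only mirrors the idea with a stacking classifier and an MLP meta-learner, and the synthetic data merely stand in for the German credit dataset.

```python
# Sketch of an ANN-based ensemble: predictions of several base classifiers are
# fed into a small feed-forward network (stacking). Illustrative only.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Stand-in for the German credit data (1000 applicants, 30% bad risks).
X, y = make_classification(n_samples=1000, n_features=24, weights=[0.7, 0.3],
                           random_state=0)

base_learners = [
    ("rf", RandomForestClassifier(random_state=0)),
    ("svm", SVC(probability=True, random_state=0)),
    ("nb", GaussianNB()),
    ("tree", DecisionTreeClassifier(random_state=0)),
    ("logit", LogisticRegression(max_iter=1000)),
]
ensemble = StackingClassifier(
    estimators=base_learners,
    final_estimator=MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000,
                                  random_state=0),
    stack_method="predict_proba",   # feed class probabilities to the network
)
print("ensemble accuracy:",
      cross_val_score(ensemble, X, y, cv=5, scoring="accuracy").mean())
```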
... These methods, although not offering the same level of understandability as conventional statistical techniques, have been applied in the credit-scoring space with notable success [15][16][17][18][19][20]. The works of Rosenberg and Gleit [21], Crook et al. [2], and Yu [22] provide a more comprehensive review of these and other contemporary classification methods. ...
In this paper some of the main causes of the recent financial crisis are briefly discussed. Specific attention is paid to the accuracy of credit-scoring models used to assess consumer credit risk. As a result, the optimal default definition selection (ODDS) algorithm is proposed to improve credit-scoring for credit risk assessment. This simple algorithm selects the best default definition for use when building credit scorecards. To assess ODDS, the algorithm was used to select the default definition for the random forest tree algorithm. The resulting classification models were compared to other models built using the unselected default definitions. The results suggest that the models developed using the default definition selected by the ODDS algorithm were statistically superior to the models developed using the unselected default indicators.
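The abstract above does not spell out the selection criterion, so the following is only a hedged reading of the general idea: build a scorecard under each candidate default definition and keep the definition whose model performs best. The label names (30/60/90 days past due), the long-run outcome used for comparison, and the AUC criterion are illustrative assumptions, not the published ODDS algorithm.

```python
# Hedged sketch of default-definition selection: fit a random forest under each
# candidate default definition and keep the definition whose scores rank a
# hypothetical long-run outcome best (AUC). All names and criteria below are
# illustrative assumptions.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 5000
X = rng.normal(size=(n, 10))                               # applicant characteristics
risk = X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=n)

# Hypothetical candidate default definitions and a hypothetical long-run outcome.
candidates = {"30dpd": risk > 0.5, "60dpd": risk > 1.0, "90dpd": risk > 1.5}
y_longrun = risk + rng.normal(scale=0.3, size=n) > 1.2

idx_tr, idx_te = train_test_split(np.arange(n), test_size=0.3, random_state=0)
results = {}
for name, y_def in candidates.items():
    model = RandomForestClassifier(random_state=0).fit(X[idx_tr], y_def[idx_tr])
    scores = model.predict_proba(X[idx_te])[:, 1]
    results[name] = roc_auc_score(y_longrun[idx_te], scores)

best = max(results, key=results.get)
print("AUC per definition:", results, "-> selected:", best)
```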
... Relevant standards such as NIST (2012) explicitly prescribe risk aggregation, but also leave the details mostly unspecified, thus calling for development of aggregation methods. The need to do so has a long history, substantiated for example by Blakley, McDermott & Geer (2001), Carroll (2013), but also in different fields of risk management, say the financial sector, where NNs and support vector machines are used to analyze financial risks (see Bol et al. (1998) or Yu et al. (2008)). The field of security metrics and how to work with them is very active, with a vast number of different approaches having been defined; see Savola (2007), Ming et al (2003), or Hayden (2010) and references therein, to mention only a few. ...
Managing risks in large information infrastructures is often tied to inevitable simplification of the system, to make a risk analysis feasible. One common way of “compacting” matters for efficient decision making is to aggregate vulnerabilities and risks identified for distinct components into an overall risk measure related to an entire subsystem and the system as a whole. Traditionally, this aggregation is done pessimistically by taking the overall risk as the maximum of all individual risks, following the heuristic understanding that the “security chain” is only as strong as its weakest link. As that method is quite wasteful of information, this work proposes a new approach, which uses neural networks to resemble human expert’s decision making in the same regard. To validate the concept, we conducted an empirical study on human expert’s risk assessments, and trained several candidate networks on the empirical data to identify the best approximation to the opinions in our expert group.
... A node omitted in the initial localization phase needs assisted localization, as follows [13,14]. ...
This paper proposes a novel localization algorithm for wireless sensor networks (WSNs). Accurate localization is very important for WSNs, and the WSN localization problem is sometimes regarded as an optimization problem. The plant growth simulation algorithm (PGSA) is a new kind of intelligent optimization algorithm that intelligently simulates the way plants grow in nature. In addition to the common characteristics of intelligent algorithms, PGSA shows robustness and provides a global optimal solution. In this paper, a further enhancement of the algorithm, obtained by adding an adaptive backlight function to the plant root to effectively improve the computing speed and localization precision, is reported. Comparing this algorithm with the simulated annealing algorithm (SAA), simulation results show that this algorithm has a higher and more consistent localization precision and a faster computational speed.
The number of leasing clients in Slovakia is constantly growing, and this sector is becoming an increasingly important part of the local economy. Leasing as such ensures its financial stability, and the leasing companies themselves have changed from medium-sized companies into strong institutional investors who accumulate temporarily free funds and place them on the financial markets. The management of potential risks that could jeopardize economic performance and stability must therefore be an essential part of their internal processes and must be given adequate attention. Under the pressure of competition and with the aim of profit, leasing companies also involve modern optimization methods in decision making, and these become an integral part of business analysis. This work focuses on the potential use of one of the most widely used computational techniques in examining the risk of payment failure of their clients. By discriminant analysis, we will verify the solvency of clients on the examined sample and then predict the probability of their future non-payment.
In November 2017, the management of the Federal Statistical Office (DESTATIS) decided to conduct a "Proof of Concept Machine Learning" as a lighthouse project of the new Digital Agenda. The deadline for completion of the project was set for June 2018. The topics of machine learning and, even more so, artificial intelligence are very broad and hard to survey. The in-house project team therefore agreed to concentrate exclusively on machine learning and to focus on its applicability across all subject-matter statistics. After completion of the proof of concept, an overview of the potential applications in the subject-matter statistics was to be available. The work on and the results of the "Proof of Concept Machine Learning" were recorded in this final report, which was submitted on 31 July 2018.
The paper considers a solution to the problem of developing two-stage hybrid SVM-kNN classifiers with the aim of increasing data classification quality by refining the classification decisions near the class boundary defined by the SVM classifier. In the first stage, the SVM classifier is developed with default parameter values; the training dataset is designed on the basis of the initial dataset, and either a binary SVM algorithm or a one-class SVM algorithm is used. Based on the results of training the SVM classifier, two variants of the training dataset are formed for the development of the kNN classifier: a variant that uses all objects from the original training dataset located inside the strip dividing the classes, and a variant that uses only those objects from the initial training dataset located inside the area containing all misclassified objects from the class-dividing strip. In the second stage, the kNN classifier is developed using this new training dataset, and its parameter values are determined during training to maximize the data classification quality. The data classification quality of the two-stage hybrid SVM-kNN classifier was assessed using various indicators on the test dataset. If the kNN classifier improves the quality of classification near the class boundary defined by the SVM classifier, the two-stage hybrid SVM-kNN classifier is recommended for further use. The experimental results obtained with various datasets confirm the feasibility of using two-stage hybrid SVM-kNN classifiers for the data classification problem.
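A minimal sketch of the first training-dataset variant follows: a default-parameter SVM defines the class boundary, a kNN classifier is trained on the objects inside the class-dividing strip, and test objects inside the strip are reclassified by the kNN. Using |f(x)| < 1 as the strip is an illustrative choice; the cited paper also considers a narrower region around the misclassified objects.

```python
# Two-stage hybrid SVM-kNN sketch: kNN (with tuned k) refines SVM decisions for
# objects inside the class-dividing strip. Illustrative choices throughout.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=1500, n_features=10, flip_y=0.05,
                           random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)

# Stage 1: binary SVM with default parameter values.
svm = SVC().fit(X_tr, y_tr)
in_strip_tr = np.abs(svm.decision_function(X_tr)) < 1.0   # class-dividing strip

# Stage 2: kNN trained on the strip objects, k tuned to maximize quality.
knn = GridSearchCV(KNeighborsClassifier(),
                   {"n_neighbors": [1, 3, 5, 7, 9]}, cv=3)
knn.fit(X_tr[in_strip_tr], y_tr[in_strip_tr])

pred = svm.predict(X_te)
in_strip_te = np.abs(svm.decision_function(X_te)) < 1.0
pred[in_strip_te] = knn.predict(X_te[in_strip_te])

print("SVM alone       :", svm.score(X_te, y_te))
print("hybrid SVM-kNN  :", (pred == y_te).mean())
```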
In the modern world of significant advances in technology, along with the need to handle massive amounts of data in the field of computational biology, microarray gene expression analysis has posed significant challenges to state-of-the-art data mining techniques. These types of data generally have a large number of dimensions and are known as high-dimensional data. Classification is one of the most widely used data mining techniques and is applied in a diverse range of applications such as credit card fraud detection, recognizing cancerous cells, and the retail sector. High dimensionality has rendered many existing classification techniques impractical. Extensive studies have found that the highest number of misclassified objects lie beside the hyperplane by which the classes are separated. We propose an efficient SVM classifier that hybridizes an outlier detection method with SVM and uses a random forest classifier as an auxiliary classifier. This approach significantly improves the accuracy of the existing SVM classifier.
Credit risk assessment has been one of the most appealing topics in banking and finance studies, attracting both scholars’ and practitioners’ attention for some time. Following the success of the Grameen Bank, works on credit risk, in particular for Small Medium Enterprises (SMEs), have become essential. The distinctive character of SMEs requires a method that takes into account quantitative and qualitative information for loan granting decision purposes. In this chapter, we first provide a survey of existing credit risk assessment methods, which shows a current gap in the existing research in regards to taking qualitative information into account during the data mining process. To address this shortcoming, we propose a framework that utilizes an XML-based template to capture both qualitative and quantitative information in this domain. By representing this information in a domain-oriented way, the potential knowledge that can be discovered for evidence-based decision support will be maximized. An XML document can be effectively represented as a rooted ordered labelled tree and a number of tree mining methods exist that enable the efficient discovery of associations among tree-structured data objects, taking both the content and structure into account. The guidelines for correct and effective application of such methods are provided in order to gain detailed insight into the information governing the decision making process. We have obtained a number of textual reports from the banks regarding the information collected from SMEs during the credit application/evaluation process. These are used as the basis for generating a synthetic XML database that partially reflects real-world scenarios. A tree mining method is applied to this data to demonstrate the potential of the proposed method for credit risk assessment.
This book contains a selection of papers from the 2019 International Conference on Software Process Improvement (CIMPS'19), held from the 23rd to the 25th of October in León, Guanajuato, México. CIMPS'19 is a global forum for researchers and practitioners who present and discuss the most recent innovations, trends, results, experiences, and concerns in several perspectives of Software Engineering, with a clear relationship to, but not limited to, software processes, security in information and communication technology, and the data analysis field.
The main topics covered are: Organizational Models, Standards and Methodologies, Software Process Improvement, Knowledge Management, Software Systems, Applications and Tools, Information and Communication Technologies and Processes in non-software domains (Mining, automotive, aerospace, business, health care, manufacturing, etc.) with a demonstrated relationship to Software Engineering Challenges.
Innovation is a key driver of competitiveness and productivity in today's market. In this scenario, knowledge is seen as a company's key asset and has become the primary competitive tool for many businesses. However, efficient knowledge management must face diverse challenges such as knowledge leakage and poor coordination of work teams. To address these issues, experts in knowledge management must support organizations in coming up with solutions and answers. However, in many cases, their concepts lack precision and are ambiguous. This article describes a method for the diagnosis and initial assessment of knowledge management. The proposed method uses machine-learning techniques to analyze different aspects and conditions associated with knowledge transfer. Initially, we present a literature review of the common problems in knowledge management. Later, the proposed method and its application are presented. The validation of this method was carried out using data from a group of software companies, and the analysis of the results was performed using Support Vector Machine (SVM).
Climate change and impact studies are gaining importance in the wake of a changing climate and its impacts. Climate models, namely General Circulation Models (GCMs), have been developed by different research groups to study the impact of climate at the global scale, and they are the primary dataset available for modelling global climate change in the future. However, owing to their coarse spatial resolution, GCM models are not appropriate for impact studies at the local scale, which require finer spatial resolution. Therefore, for impact studies, climate models available at the global scale are correlated with atmospheric and climate conditions such as temperature and precipitation at the local scale through a downscaling process. Different downscaling techniques, ranging from simple statistical to dynamic downscaling, have been developed by researchers to build mathematical models that correlate the GCM outputs with local observations.
Among these downscaling techniques, statistical downscaling techniques are the most widely used owing to the ease of their implementation through computer-based tools. SDSM is one of the most widely used software packages for statistical downscaling of GCM datasets. However, the available statistical downscaling software tools are not suited to automating the downscaling process for multiple grids of a given area of interest (AOI). Using the existing downscaling tools, manual intervention is required to downscale the GCM data at the local scale for large AOIs with a sizeable spatial extent.
In this research work, a novel generalized downscaling model, namely the Efficient Multi-site Statistical Downscaling Model (EMSDM), based on the multivariate regression technique, has been developed to automate the downscaling process for multiple grids. EMSDM can be applied to automate the downscaling of GCM data to multiple local grids of an AOI. The internal procedures of EMSDM are programmed in the platform-independent C programming language for efficiently handling large volumes of GCM and local observation data and carrying out complex mathematical computations such as the inversion of large matrices. To demonstrate the applicability of the model, the second-generation Canadian Earth System Model (CanESM2), developed by the Canadian Centre for Climate Modelling and Analysis (CCCma) of Environment and Climate Change Canada, and local daily precipitation and temperature datasets acquired from the Indian Meteorological Department (IMD) have been used for downscaling with the proposed model. India has been selected as the AOI.
On the basis of the analysis of the downscaling results generated by the model, it can be concluded that the proposed model can efficiently carry out statistical downscaling for an AOI comprising multiple grids, irrespective of its extent. The results generated by the proposed model can be utilized by investigators to carry out climate impact studies for AOIs with a large spatial extent.
Moreover, in order to facilitate the spatial geo-visualization of the downscaling results, a web-GIS-based framework has been developed to geo-visualize the time series data generated by EMSDM. In addition to downscaling, EMSDM is able to generate valuable spatial datasets pertaining to local observations and GCM outputs for a given area of interest. These spatial datasets can be utilized by decision makers to investigate the spatial distribution of climatological parameters such as temperature and precipitation.
The task of official business statistics is to provide information on the structure and development of the economy, obtained through surveys, the use of administrative data, the purchase of commercial data, and the linkage of microdata. Recently, the use of machine learning methods in official business statistics has also been tested experimentally, namely for assignment decisions and for generating new information. This article gives an overview of the approach. First, the methodology of machine learning is outlined, previous fields of application outside and within official statistics are described, and the methods used experimentally in business statistics are explained. Then, the practical application of Support Vector Machines and Random Forests to five concrete tasks in selected business statistics is presented. Finally, the experience gained so far is summarized and assessed, and potential further tasks as well as foreseeable further developments of machine learning methods are outlined.
The article discusses the widely known SWOT-analysis methodology and suggests approaches to increase confidence in the results of the analysis. Some typical methodological errors are illustrated by examples and discussed. Recommendations that increase the value of the method in strategic analysis and increase its effectiveness are given. The recommendations could be useful for company managers at different levels in strategy development and the advancement of strategic alternatives, for diploma thesis supervisors, as well as for students and graduates of MBA programs.
An approach to the classification problem for imbalanced datasets is considered. The aim of this research is to determine the effectiveness of the SMOTE algorithm when it is necessary to improve the classification quality of an SVM classifier applied to imbalanced datasets. Experimental results demonstrating the improvement of SVM classifier quality when the ideas of the SMOTE algorithm are applied to imbalanced datasets in the field of medical diagnostics are given.
Behavioral credit scoring models are a specific kind of credit scoring model in which time-evolving data about delinquency patterns, outstanding amounts, and account activity are used. These data have a dynamic nature, as they evolve over time in accordance with the economic environment. Scoring models, on the other hand, are usually static, implicitly assuming that the relationship between the performance characteristics and the subsequent performance of a customer will be the same under the current situation as it was when the information on which the scorecard was built was collected, no matter what economic changes have occurred in that period. In this study we investigate how this assumption affects the predictive power of behavioral scoring models, using a large data set from Greece, where consumer credit has been heavily affected by the economic crisis that has hit the country since 2009.
The fundamental subject matter of this publication is the analysis of the issue of bankruptcy in the context of the appearance of possible threat signals. The presented research aims at validating the described models in predicting possible bankruptcy signals and evaluating the financial condition of TSL sector (transport, spedition, logistics) entities from Poland and Slovakia. In order to predict the risk of company bankruptcy in the logistics sector, the following statistical models of bankruptcy classification were used: classic linear discriminant analysis and logistic regression. In addition, predictions based on so-called classification trees and the nearest-neighbours method were applied. The empirical verification of correct classification by the given groups of statistical bankruptcy analysis methods, from the perspective of their efficiency, showed that these methods can be characterized by a high quality of bankruptcy prediction. The presented concepts allow the threat of bankruptcy for a given group of entities to be evaluated quite easily. One vital advantage of the presented results is the division of the research sample into a so-called learning group, for which the parameters of the analysed models were estimated, and a test sample assessing the effectiveness of correct classifications, for which all predictions were set for a period of both one and two years before bankruptcy.
Receiver Operating Characteristics (ROC) graphs are useful for organizing classifiers and visualizing their performance. ROC graphs are commonly used in medical decision making, and in recent years have been used increasingly in machine learning and data mining research. Although ROC graphs are apparently simple, there are some common misconceptions and pitfalls when using them in practice. The purpose of this article is to serve as an introduction to ROC graphs and as a guide for using them in research.
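A minimal example of building an ROC graph and computing the area under the curve (AUC) for a scoring classifier, in the spirit of the introduction cited above:

```python
# ROC curve and AUC for a simple scoring classifier on synthetic data.
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

scores = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
fpr, tpr, thresholds = roc_curve(y_te, scores)   # one (FPR, TPR) point per threshold
print("AUC:", roc_auc_score(y_te, scores))

plt.plot(fpr, tpr, label="classifier")
plt.plot([0, 1], [0, 1], "--", label="random guessing")
plt.xlabel("false positive rate")
plt.ylabel("true positive rate")
plt.legend()
plt.show()
```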
A support vector machine (SVM) learns the decision surface from two distinct classes of the input points. In many applications, each input point may not be fully assigned to one of these two classes. In this paper, we apply a fuzzy membership to each input point and reformulate the SVMs such that different input points can make different contributions to the learning of the decision surface. We call the proposed method fuzzy SVMs (FSVMs).
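A hedged sketch of this idea: in scikit-learn, `sample_weight` scales the penalty C per sample, which corresponds closely to the fuzzy membership weighting of the slack term in FSVM. The membership heuristic below (distance to the class mean) is an illustrative assumption, not the formulation of the cited paper.

```python
# Fuzzy-membership-weighted SVM sketch: memberships in (0, 1] reduce the
# contribution of less reliable points via per-sample weights.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=600, n_features=8, flip_y=0.1, random_state=0)

# Heuristic memberships: points far from their class mean get lower membership.
memberships = np.empty(len(y))
for label in np.unique(y):
    mask = y == label
    d = np.linalg.norm(X[mask] - X[mask].mean(axis=0), axis=1)
    memberships[mask] = 1.0 - 0.9 * d / (d.max() + 1e-12)

fsvm = SVC(kernel="rbf", C=10.0)
fsvm.fit(X, y, sample_weight=memberships)   # weighted contribution per point
print("training accuracy:", fsvm.score(X, y))
```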
An algorithm is given for finding the point of a convex polyhedron in an n-dimensional Euclidean space which is closest to the origin. It is assumed that the convex polyhedron is defined as the convex hull of a given finite set of points. This problem arises when one wishes to determine the direction of steepest descent for certain minimax problems.
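The nearest-point problem can be written as a small quadratic program over convex-combination weights. The sketch below solves it with SciPy's generic SLSQP solver rather than the specialized algorithm of the cited paper; the three example points are arbitrary.

```python
# The nearest point of conv{p_1,...,p_m} to the origin is P^T lambda, where
# lambda minimizes ||P^T lambda||^2 subject to lambda >= 0 and sum(lambda) = 1.
# A generic SLSQP solver stands in for the specialized algorithm cited above.
import numpy as np
from scipy.optimize import minimize

P = np.array([[2.0, 1.0], [3.0, -1.0], [2.5, 2.0]])   # rows are the given points
G = P @ P.T                                            # Gram matrix of the points

def objective(lam):
    return lam @ G @ lam                               # squared norm of P^T lambda

def gradient(lam):
    return 2.0 * G @ lam

m = len(P)
res = minimize(objective, np.full(m, 1.0 / m), jac=gradient, method="SLSQP",
               bounds=[(0.0, None)] * m,
               constraints=[{"type": "eq", "fun": lambda lam: lam.sum() - 1.0}])

nearest = res.x @ P
print("weights:", np.round(res.x, 4))
print("nearest point to the origin:", np.round(nearest, 4),
      "distance:", np.linalg.norm(nearest))
```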
An important feature of radial basis function neural networks is the existence of a fast, linear learning algorithm in a network capable of representing complex nonlinear mappings. Satisfactory generalization in these networks requires that the network mapping be sufficiently smooth. We show that a modification to the error functional allows smoothing to be introduced explicitly without significantly affecting the speed of training. A simple example is used to demonstrate the resulting improvement in the generalization properties of the network.
Many static and dynamic models have been used to assist decision making in the area of consumer and commercial credit. The decisions of interest include whether to extend credit, how much credit to extend, when collections on delinquent accounts should be initiated, and what action should be taken. We survey the use of discriminant analysis, decision trees, and expert systems for static decisions, and dynamic programming, linear programming, and Markov chains for dynamic decision models. Since these models do not operate in a vacuum, we discuss some important aspects of credit management in practice, e.g., legal considerations, sources of data, and statistical validation of the methodology. We provide our perspective on the state-of-the-art in theory and in practice.
The support-vector network is a new learning machine for two-group classification problems. The machine conceptually implements the following idea: input vectors are non-linearly mapped to a very high-dimension feature space. In this feature space a linear decision surface is constructed. Special properties of the decision surface ensure high generalization ability of the learning machine. The idea behind the support-vector network was previously implemented for the restricted case where the training data can be separated without errors. We here extend this result to non-separable training data. High generalization ability of support-vector networks utilizing polynomial input transformations is demonstrated. We also compare the performance of the support-vector network to various classical learning algorithms that all took part in a benchmark study of Optical Character Recognition.
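The non-separable (soft-margin) case sketched above is commonly written as the following optimization problem, where C trades off margin width against training errors:

```latex
% Soft-margin support-vector network (C-SVM) primal problem
\min_{w,\,b,\,\xi}\;\; \frac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{n}\xi_i
\quad\text{subject to}\quad
y_i\bigl(w^{\top}\phi(x_i) + b\bigr) \ge 1 - \xi_i,\qquad \xi_i \ge 0,\quad i=1,\dots,n.
```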
The typical technique used to construct credit scoring models is discriminant analysis. This paper presents a descriptive example and empirical analysis to illustrate how linear programming might be used to solve discriminant type problems. Results of the analysis indicated that the linear programming procedure performs well in solving the example credit scoring problem. In addition, the structure of the linear programming model was such that changes could be readily made to reflect either conservative or liberal lending policies.
Contents: Support Vector Machines; Basic Methods of Least Squares Support Vector Machines; Bayesian Inference for LS-SVM Models; Robustness; Large Scale Problems; LS-SVM for Unsupervised Learning; LS-SVM for Recurrent Networks and Control.
In this paper, a new trust region algorithm for nonlinear equality constrained LC1 optimization problems is given. It obtains a search direction at each iteration not by solving a quadratic programming subproblem with a trust region bound, but by solving a system of linear equations. Since the computational complexity of a QP-Problem is in general much larger than that of a system of linear equations, this method proposed in this paper may reduce the computational complexity and hence improve computational efficiency. Furthermore, it is proved under appropriate assumptions that this algorithm is globally and super-linearly convergent to a solution of the original problem. Some numerical examples are reported, showing the proposed algorithm can be beneficial from a computational point of view.
In this paper, we show that training of the support vector machine (SVM) can be interpreted as performing the level 1 inference of MacKay's evidence framework. We further show that levels 2 and 3 of the evidence framework can also be applied to SVMs. This integration allows automatic adjustment of the regularization parameter and the kernel parameter to their near-optimal values. Moreover, it opens up a wealth of Bayesian tools for use with SVMs. Performance of this method is evaluated on both synthetic and real-world data sets.
High-dimensional discriminant analysis is of fundamental importance in multivariate statistics. Existing theoretical results sharply characterize different procedures, providing sharp convergence results for the classification risk, as well as the ℓ2 convergence results to the discriminative rule. However, sharp theoretical results for the problem of variable selection have not been established, even though model interpretation is of importance in many scientific domains. In this paper, we bridge this gap by providing sharp sufficient conditions for consistent variable selection using the ROAD estimator (Fan et al., 2010). Our results provide novel theoretical insights for the ROAD estimator. Sufficient conditions are complemented by the necessary information theoretic limits on variable selection in high-dimensional discriminant analysis. This complementary result also establishes optimality of the ROAD estimator for a certain family of problems.
This paper describes an application of the multi-factor model to the analysis and prediction of corporate failure. The multi-factor model differs from the more usual methods of failure prediction because failure is conditioned on the values of a series of exogenous risk factors rather than on a series of "internal" financial ratios. Moreover, the multi-factor model is not primarily aimed at classifying firms into categories, but at modelling the influences of exogenous risk factors, through sensitivities, on the firm's cash flow generating process. The paper presents the general multi-factor model and the conditional failure prediction model as well as the possibilities to apply the model.
Risk assessment of credit portfolios is of pivotal importance in the banking industry. The bank that has the most accurate view of its credit risk will be the most profitable. One of the main pillars in assessing credit risk is the estimated probability of default of each counterparty, i.e., the probability that a counterparty cannot meet its payment obligations within a horizon of one year. A credit rating system takes several characteristics of a counterparty as inputs and assigns this counterparty to a rating class. In essence, this system is a classifier whose classes lie on an ordinal scale. In this paper we apply linear regression, ordinal logistic regression, and support vector machine techniques to the credit rating problem. The latter technique is a relatively new machine learning technique that was originally designed for the two-class problem. We propose two new techniques that incorporate the ordinal character of the credit rating problem into support vector machines. The results of our newly introduced techniques are promising.
Driven by the need to allocate capital in a profitable way and by the recently suggested Basel II regulations, financial institutions are being more and more obliged to build credit scoring models assessing the risk of default of their clients. Many techniques have been suggested to tackle this problem. Support Vector Machines (SVMs) is a promising new technique that has recently emanated from different domains such as applied statistics, neural networks and machine learning. In this paper, we experiment with least squares support vector machines (LS-SVMs), a recently modified version of SVMs, and report significantly better results when contrasted with the classical techniques.
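A least squares SVM replaces the inequality constraints of the standard SVM with equality constraints, so training reduces to solving one linear system. The numpy sketch below implements the usual Suykens-type classifier formulation with an RBF kernel; the kernel and regularization values are illustrative, not those of the cited experiments.

```python
# Minimal LS-SVM classifier: solve the linear system
#   [ 0        y^T          ] [ b     ]   [ 0 ]
#   [ y   Omega + I/gamma   ] [ alpha ] = [ 1 ]
# with Omega_kl = y_k y_l K(x_k, x_l), then predict
#   sign( sum_k alpha_k y_k K(x, x_k) + b ).
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def lssvm_train(X, y, gamma=10.0, sigma=1.0):
    n = len(y)
    Omega = np.outer(y, y) * rbf_kernel(X, X, sigma)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = Omega + np.eye(n) / gamma
    rhs = np.concatenate([[0.0], np.ones(n)])
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]                      # bias b, dual variables alpha

def lssvm_predict(X_new, X, y, b, alpha, sigma=1.0):
    K = rbf_kernel(X_new, X, sigma)
    return np.sign(K @ (alpha * y) + b)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1.0, -1.0)   # labels in {-1, +1}

b, alpha = lssvm_train(X, y)
print("training accuracy:", (lssvm_predict(X, X, y, b, alpha) == y).mean())
```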
In this chapter we consider bounds on the rate of uniform convergence. We consider upper bounds (there exist lower bounds as well (Vapnik and Chervonenkis, 1974); however, they are not as important for controlling the learning processes as the upper bounds).
Unrepresentative data samples are likely to reduce the utility of data classifiers in practical application. This study presents a hybrid mining approach in the design of an effective credit scoring model, based on clustering and neural network techniques. We used clustering techniques to preprocess the input samples with the objective of isolating unrepresentative samples in isolated and inconsistent clusters, and used neural networks to construct the credit scoring model. The clustering stage involved a class-wise classification process. A self-organizing map clustering algorithm was used to automatically determine the number of clusters and the starting points of each cluster. Then, the K-means clustering algorithm was used to generate clusters of samples belonging to new classes and eliminate the unrepresentative samples from each class. In the neural network stage, samples with new class labels were used in the design of the credit scoring model. The proposed method demonstrates by two real world credit data sets that the hybrid mining approach can be used to build effective credit scoring models. (c) 2005 Elsevier Ltd. All rights reserved.
This paper addresses the problem of improving the accuracy of an hypothesis output by a learning algorithm in the distribution-free (PAC) learning model. A concept class is learnable (or strongly learnable) if, given access to a source of examples of the unknown concept, the learner with high probability is able to output an hypothesis that is correct on all but an arbitrarily small fraction of the instances. The concept class is weakly learnable if the learner can produce an hypothesis that performs only slightly better than random guessing. In this paper, it is shown that these two notions of learnability are equivalent.
A method is described for converting a weak learning algorithm into one that achieves arbitrarily high accuracy. This construction may have practical applications as a tool for efficiently converting a mediocre learning algorithm into one that performs extremely well. In addition, the construction has some interesting theoretical consequences, including a set of general upper bounds on the complexity of any strong learning algorithm as a function of the allowed error ε.
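AdaBoost is the best-known practical descendant of this boosting construction; the short sketch below contrasts a single depth-one decision stump (a much weaker learner) with a boosted committee of such stumps on the same synthetic data.

```python
# Boosting demo: a single decision stump versus an AdaBoost committee of stumps
# (scikit-learn's AdaBoostClassifier uses depth-1 stumps by default).
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, n_informative=10,
                           random_state=0)

stump = DecisionTreeClassifier(max_depth=1)                       # weak learner
boosted = AdaBoostClassifier(n_estimators=200, random_state=0)    # boosted stumps

print("single stump accuracy  :", cross_val_score(stump, X, y, cv=5).mean())
print("boosted stumps accuracy:", cross_val_score(boosted, X, y, cv=5).mean())
```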
Learning and evolution are two fundamental forms of adaptation. There has been a great interest in combining learning and evolution with artificial neural networks (ANNs) in recent years. This paper: 1) reviews different combinations between ANNs and evolutionary algorithms (EAs), including using EAs to evolve ANN connection weights, architectures, learning rules, and input features; 2) discusses different search operators which have been used in various EAs; and 3) points out possible future research directions. It is shown, through a considerably large literature review, that combinations between ANNs and EAs can lead to significantly better intelligent systems than relying on ANNs or EAs alone.
This book focuses on forecasting foreign exchange rates via artificial neural networks (ANNs), creating and applying the highly useful computational techniques of Artificial Neural Networks (ANNs) to foreign-exchange rate forecasting. The result is an up-to-date review of the most recent research developments in forecasting foreign exchange rates coupled with a highly useful methodological approach to predicting rate changes in foreign currency exchanges.
We describe a novel interpolation algorithm to find the optimal image intensity function generating an optimal gray-level estimation of interpolated pixels of digital images. The new approach is based on the proposed image block mapping method and least-square support vector machines (LSSVM) with Gaussian radial basis function (RBF) kernels. With the mapping technique, the interpolation procedure of the LSSVM is actually accomplished in the same input vector space. A number of different scale interpolation experiments are carried out. The experimental results demonstrate that the performance of the proposed algorithm is competitive with many other existing methods, such as cubic, spline, and linear methods. The peak signal-to-noise ratio of the image reconstructed by the proposed algorithm is higher than those obtained by the spline. And the estimated accuracy of the proposed algorithm is similar to that of the cubic algorithm, while the computational requirement is lower than the latter. (C) 2004 Society of Photo-Optical Instrumentation Engineers.
Most commonly used credit screening methods are based on Fisher's linear discriminant function for quantitative variates. The variables actually used in credit scoring are usually, however, qualitative, occurring as high/low or present/absent. We present a nonparametric approach to the credit screening problem. In our method we do not assume multivariate normality, nor do we apply any arbitrary scaling procedures to the qualitative variables. We classify an observation to that group with which it has most in common, this being done so as to minimize the expected loss from misclassification. It is a variant of the "closest neighbor" rule. The misclassification probabilities of our screening rule are obtained by a jack-knife method. An empirical method for the selection of variates for screening is also given. The possibilities for adapting the method to an on-line computer system are discussed with an illustrative example.
The paper considers a general approach for classifying objects using mathematical programming algorithms. The approach is based on optimizing a utility function, which is quadratic in indicator parameters and is linear in control parameters (which need to be identified). Qualitative characteristics of the utility function, such as monotonicity in some variables, are included using additional constraints. The methodology was tested with a 'credit cards scoring' problem. Credit scoring is a way of separating specific subgroups in a population of objects (such as applications for credit), which have significantly different credit risk characteristics. A new feature of our approach is incorporating expert judgments in the model. For instance, the following preference was included with an additional constraint: 'give more preference to customers with higher incomes.' Numerical experiments showed that including constraints based on expert judgments improves the performance of the algorithm. Copyright © 2003 John Wiley & Sons, Ltd.
Motivated by a dramatic growth in Bell System uncollectible revenues, a set of uniform, objective, and nondiscriminatory credit-granting practices have been developed which will apply to the 12 million new residential telephone customers each year and result in an annual reduction of $137 million in bad debts. These savings will be factored into the regulatory rate-setting process, and thus a substantial benefit will accrue to the telephone customer.
The cornerstone of the new credit procedures is a set of credit-scoring rules to determine which new telephone service applicants should provide preservice security deposits. By more accurately identifying high-credit-risk applicants and requesting deposits only from them, the reduction in bad debt can be achieved with fewer total deposit requests. These new credit-scoring rules were developed through what is perhaps the largest credit study ever done, involving the credit profiles and telephone payment histories of over 87,000 customers. As a consequence of the study, a new methodology for constructing simple but effective credit-scoring rules has been developed which could be of general use in a broad spectrum of applications, including credit problems of other industries as well as other “classification” or “screening” problems.
The last 30 years have seen the development of credit scoring techniques for assessing the creditworthiness of consumer loan applicants. Traditional credit scoring methodology has involved the use of techniques such as discriminant analysis, linear or logistic regression, linear programming and decision trees. In this paper we look at the application of the k-nearest-neighbour (k-NN) method, a standard technique in pattern recognition and nonparametric statistics, to the credit scoring problem. We propose an adjusted version of the Euclidean distance metric which attempts to incorporate knowledge of class separation contained in the data. Our k-NN methodology is applied to a real data set and we discuss the selection of optimal values of the parameters k and D included in the method. To assess the potential of the method we make comparisons with linear and logistic regression and decision trees and graphs. We end by discussing a practical implementation of the proposed k-NN classifier.
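The cited paper derives its adjusted Euclidean metric from class-separation information in the data; the sketch below only illustrates the mechanics of plugging an adjusted (per-feature weighted) distance into scikit-learn's k-NN, with weights from a crude, hypothetical univariate class-separation score rather than the paper's specific adjustment.

```python
# k-NN credit scoring with an adjusted Euclidean metric: each feature is
# weighted by a crude class-separation score (difference of class means over
# pooled standard deviation). Illustration only; not the metric D of the paper.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=1000, n_features=12, n_informative=5,
                           random_state=0)

# Per-feature class-separation weights (hypothetical choice, computed once
# on the full data for simplicity).
mu0, mu1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
w = np.abs(mu1 - mu0) / (X.std(axis=0) + 1e-12)

def adjusted_euclidean(a, b, weights=w):
    return np.sqrt(np.sum(weights * (a - b) ** 2))

plain = KNeighborsClassifier(n_neighbors=11)
adjusted = KNeighborsClassifier(n_neighbors=11, metric=adjusted_euclidean)

print("plain Euclidean :", cross_val_score(plain, X, y, cv=5).mean())
print("adjusted metric :", cross_val_score(adjusted, X, y, cv=5).mean())
```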
A general class of fuzzy numbers is introduced: triangular fuzzy numbers (TFNs) in extended representation. This representation allows the operations of fuzzy arithmetic as well as fuzzy logic to be defined within this class of extended TFN (at least approximately). This leads naturally to the definition of fuzzy functions, and it is shown that Sugeno-type fuzzy rules can be expressed by such functions. In this context the Sugeno controller is able to use fuzzy numbers as input variables, thus extending the original concept of using crisp values as input.
The importance of fuzzy valuations in diagnostics is mentioned. The problem of aggregating fuzzy opinions obtained from a group of experts in answer to the question "Has an object the property labeled A and no property labeled ¬A?" is formulated. The set of Fung and Fu's [10] aggregation axioms is briefly described. The idea of weighting experts' opinions is formalized. Then a new set of aggregation axioms is presented and an aggregating operator is introduced. 'A posteriori' weighting of opinions in aggregation is examined.