ESTIMATING INFORMATION VALUE
FOR CREDIT SCORING MODELS
ŘEZÁČ Martin, (CZ)
Abstract. Assessing the predictive power of credit scoring models is an important question for
financial institutions. Because it is impossible to use a scoring model effectively without
knowing how good it is, quality indexes like the Gini index, the Kolmogorov-Smirnov statistic and
the Information value are used to address this problem. The paper deals with the Information value,
which enjoys high popularity in the industry. Commonly it is computed by discretising the data
into intervals using deciles, with one constraint required to be met: the number of cases
has to be nonzero for all intervals. If this constraint is not fulfilled, some issues have to be
solved to preserve reasonable results. To avoid these computational issues, I propose an
alternative algorithm for estimating the Information value, named the empirical estimate with
supervised interval selection. This advanced estimate is based on the requirement of having at least k
observations of the scores of both good and bad clients in each considered interval, where k is a
positive integer. A simulation study with normally distributed scores shows a strong dependence
on the choice of the parameter k. If we choose too small a value, we obtain an overestimate of the
Information value, and vice versa. The quality of the estimate was assessed using the MSE.
According to this criterion, the adjusted square root of the number of bad clients seems to be a
reasonable compromise.
Keywords: credit scoring, quality indexes, Information value, empirical estimate, normally
distributed scores
Mathematics Subject Classification: Primary 62G05, 62P05; Secondary 65C60.
Aplimat – Journal of Applied Mathematics, volume 4 (2011), number 3

1 Introduction

Credit scoring is a set of statistical techniques used to determine whether to extend credit (and
if so, how much) to a borrower. When performing credit scoring, a creditor analyzes a relevant
data sample to see which factors have the most effect on creditworthiness. Once these factors and
their importance are known, a model is developed to calculate a credit score for new applicants.

The methodology of credit scoring models and some measures of their quality were discussed in
works like Hand and Henley (1997) or Crook et al. (2007) and in books like Anderson (2007), Siddiqi
(2006), Thomas et al. (2002) and Thomas (2009). Further remarks connected to credit scoring
issues can be found there as well.
Once a scoring model is available, it is natural to ask how good it is. To measure the partial
processes of a financial institution, especially components like scoring models or other
predictive models, it is possible to use quantitative indexes such as the Gini index, the
Kolmogorov-Smirnov (K-S) statistic, Lift, the Information value and so forth. They can be used to
compare several developed models at the moment of development, and for monitoring the quality of
models after deployment into real business as well. See Wilkie (2004) or Siddiqi (2006) for more details.
The paper deals primarily with the Information value. Commonly it is computed by
discretising the data into bins using deciles, with the requirement that the number of cases be
nonzero for all bins. As an alternative to the empirical estimates, one can use kernel smoothing
theory, which allows one to estimate the unknown densities and consequently, using some numerical
method for integration, to estimate the Information value. See Koláček and Řezáč (2010)
for more details.

The main objective of this paper is a description of the empirical estimate with supervised
interval selection. This advanced estimate is based on the requirement of having at least k
observations of the scores of both good and bad clients in each considered interval, where k is a
positive integer. A simulation study with normally distributed scores shows a strong dependence on
the choice of the parameter k. If we choose too small a value, we obtain an overestimate of the
Information value, and vice versa. The quality of the estimate is assessed using the MSE. According
to this criterion, I propose a rule for the choice of k which seems to be a reasonable compromise.
2 Basic notations
Consider that a realization $s \in \mathbb{R}$ of a random variable (the score) is available for each client. Let $D$ be
the indicator of a good or bad client,

$$D = \begin{cases} 1, & \text{client is good}, \\ 0, & \text{client is bad}, \end{cases} \qquad (1)$$

and let $F_B$, $F_G$ denote the cumulative distribution functions of the scores of bad and good clients, i.e.

$$F_B(a) = P(s \le a \mid D = 0), \qquad F_G(a) = P(s \le a \mid D = 1), \qquad a \in \mathbb{R}. \qquad (2)$$

Assume that $F_B$, $F_G$ and their corresponding densities $f_B$, $f_G$ are continuous on $\mathbb{R}$.

In practice, the empirical distribution functions are used:

$$\hat{F}_B(a) = \frac{1}{n_B}\sum_{i=1}^{n} I(s_i \le a \wedge D_i = 0), \qquad \hat{F}_G(a) = \frac{1}{n_G}\sum_{i=1}^{n} I(s_i \le a \wedge D_i = 1), \qquad a \in [L, H], \qquad (3)$$

where $s_i$ is the score of the $i$-th client, $n_G$, $n_B$ are the numbers of good and bad clients, respectively,
$n = n_G + n_B$, $L$ is the minimum value of the given scores and $H$ is the maximum value. Finally, we denote by
$p_B = n_B / n$ the proportion of bad clients.
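The empirical distribution functions of (3) can be sketched directly from data. The following is an illustrative sketch, not the author's code; it follows the indicator convention reconstructed above ($D = 1$ good, $D = 0$ bad), with hypothetical variable names:

```python
import numpy as np

def empirical_cdfs(scores, D, a):
    """Empirical CDFs of bad- and good-client scores evaluated at a,
    as in eq. (3): the share of bad (resp. good) clients with score <= a.
    D is the client indicator, here D = 1 for good and D = 0 for bad."""
    scores = np.asarray(scores, dtype=float)
    D = np.asarray(D, dtype=int)
    F_B = np.mean(scores[D == 0] <= a)   # \hat{F}_B(a)
    F_G = np.mean(scores[D == 1] <= a)   # \hat{F}_G(a)
    return F_B, F_G

scores = [0.2, 0.5, 0.7, 0.9, 0.4]
D = [0, 1, 1, 1, 0]                      # two bad clients, three good clients
F_B, F_G = empirical_cdfs(scores, D, 0.5)
p_B = np.mean(np.asarray(D) == 0)        # proportion of bad clients, n_B / n
```

Here `F_B(0.5)` is 1.0 (both bad scores are below 0.5) and `F_G(0.5)` is 1/3.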
3 The Information value
A very popular quality index, which is based on the densities of the scores of good and bad clients, is
the Information value (information statistic), defined as

$$I_{val} = \int_{-\infty}^{\infty} f_{IV}(x)\,dx, \qquad (4)$$

where

$$f_{IV}(x) = \left(f_G(x) - f_B(x)\right)\ln\frac{f_G(x)}{f_B(x)}. \qquad (5)$$

Note that the Information value is also called Divergence. See Wilkie (2004), Hand and Henley
(1997) or Thomas (2009) for more details. An example of $f_{IV}$ for 10% of bad clients with
$f_B: N(0,1)$ and 90% of good clients with $f_G: N(4,2)$ is illustrated in Figure 1.
Figure 1: Contribution to Information value.
However, in practice, the computation of the Information value can be a little
complicated. Firstly, we generally do not know the true form of the densities $f_G$, $f_B$, and
secondly, we mostly do not know how to compute the integral. I show some approaches to solving these
computational problems.
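When the densities are known, the integral (4) can be approximated numerically. The following is a minimal sketch, not from the paper; it assumes the Figure 1 example uses $f_B: N(0,1)$ and a good-client density with mean 4 and variance 2 (my reading of the extracted parameters):

```python
import numpy as np

def norm_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2)."""
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

def info_value_numeric(f_G, f_B, lo, hi, n=20001):
    """Approximate I_val = int (f_G - f_B) ln(f_G / f_B) dx on [lo, hi]
    with the trapezoidal rule; tails outside [lo, hi] are neglected."""
    x = np.linspace(lo, hi, n)
    g, b = f_G(x), f_B(x)
    y = (g - b) * np.log(g / b)          # contribution f_IV(x) from eq. (5)
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2)

# bad clients N(0, 1); good clients mean 4, variance 2 (assumed reading)
iv = info_value_numeric(lambda x: norm_pdf(x, 4.0, np.sqrt(2.0)),
                        lambda x: norm_pdf(x, 0.0, 1.0),
                        lo=-10.0, hi=15.0)
```

For these two normals the symmetrized Kullback-Leibler divergence is exactly 12.25, and the numerical value agrees to several decimal places.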
3.1 Estimates for normally distributed data

In the case of normally distributed data, we know everything that is needed. We just have to
distinguish between two cases. Firstly, we consider that the scores of good and bad clients have a
common variance. In this case we have

$$I_{val} = \left(\frac{\mu_G - \mu_B}{\sigma}\right)^2, \qquad (6)$$

where $\mu_G$ and $\mu_B$ are the expectations of the scores of good and bad clients and $\sigma$ is the common
standard deviation; see Wilkie (2004) for more details. When equality of variances is not
assumed, one can find in Řezáč (2009) a generalized form of $I_{val}$ given by

$$I_{val} = \frac{(\mu_G - \mu_B)^2}{\sigma_*^2} + \frac{1}{2}\left(\frac{\sigma_G^2}{\sigma_B^2} + \frac{\sigma_B^2}{\sigma_G^2}\right) - 1, \qquad (7)$$

where

$$\sigma_*^2 = \frac{2\,\sigma_G^2\,\sigma_B^2}{\sigma_G^2 + \sigma_B^2}$$

and $\sigma_G^2$, $\sigma_B^2$ are the variances of the scores of good and bad clients.
A similar formula can be found in Thomas (2009). For given data, $I_{val}$ is estimated by
replacing the theoretical means and variances in (6) or (7) by their appropriate empirical expressions.

To explore the behaviour of expression (7), it is possible to use tools offered by the Maple system;
see Hřebíček and Řezáč (2008) for more details. An example of the usage of the Exploration Assistant
is given in Figure 2. We can see a quadratic dependence on the difference of the means in part (a).
Furthermore, it is clear from (7) that $I_{val}$ takes quite high values when both variances are
approximately equal and smaller than or equal to 1, and that it grows to infinity if the ratio of the
variances tends to infinity or to zero. These properties of $I_{val}$ are illustrated in Figure 2, part (b).
(a) (b)
Figure 2: Maple Exploration Assistant 3D-plots of $I_{val}$: dependence of $I_{val}$ (a) on $\mu_G$ and $\mu_B$ for fixed $\sigma_G$ and $\sigma_B$, (b) on $\sigma_G$ and $\sigma_B$ for fixed $\mu_G$ and $\mu_B$.
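The closed forms (6) and (7) amount to a few lines of code. This is an illustrative sketch under my reading of the formulas, where $\sigma_*^2$ is taken as the harmonic mean of the two variances (a reconstruction of the garbled expression), not the author's implementation:

```python
import numpy as np

def iv_normal(mu_G, mu_B, sigma_G, sigma_B):
    """Information value of two normal score distributions, eq. (7);
    sigma_star^2 is taken as the harmonic mean of the two variances
    (a reconstruction). With equal standard deviations this reduces
    to eq. (6): ((mu_G - mu_B) / sigma) ** 2."""
    vG, vB = sigma_G ** 2, sigma_B ** 2
    sigma_star_sq = 2 * vG * vB / (vG + vB)
    return (mu_G - mu_B) ** 2 / sigma_star_sq + 0.5 * (vG / vB + vB / vG) - 1

iv_weak = iv_normal(0.5, 0.0, 1.0, 1.0)   # equal-variance case of eq. (6): 0.25
iv_high = iv_normal(1.0, 0.0, 1.0, 1.0)   # equal-variance case of eq. (6): 1.0
```

The two usage lines correspond to the weak and high-performance settings of the simulation study in Section 4.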
3.2 Empirical estimates

The main idea of this chapter is to replace the unknown densities by their empirical estimates.
Let us have score values $s_i^B$, $i = 1, \ldots, n_B$, for bad clients and score values $s_i^G$, $i = 1, \ldots, n_G$, for
good clients, and denote $L$ (resp. $H$) the minimum (resp. maximum) of all values. Let us divide the
interval $[L, H]$ into $r$ subintervals $[q_0, q_1], (q_1, q_2], \ldots, (q_{r-1}, q_r]$, where $q_0 = L$,
$q_r = H$, and $q_j$, $j = 1, \ldots, r-1$, are appropriate quantiles of the scores of all clients. Set

$$n_{Bj} = \#\{i: s_i^B \in (q_{j-1}, q_j]\}, \qquad n_{Gj} = \#\{i: s_i^G \in (q_{j-1}, q_j]\}, \qquad j = 1, \ldots, r, \qquad (8)$$

the observed counts of bad and good clients in each interval. Denote by $\hat{f}_{IV}(j)$ the contribution to the
Information value on the $j$-th interval, calculated as

$$\hat{f}_{IV}(j) = \left(\frac{n_{Gj}}{n_G} - \frac{n_{Bj}}{n_B}\right)\ln\frac{n_{Gj}\,n_B}{n_{Bj}\,n_G}, \qquad j = 1, \ldots, r. \qquad (9)$$

Then the empirical Information value is given by

$$\hat{I}_{val} = \sum_{j=1}^{r} \hat{f}_{IV}(j). \qquad (10)$$
In practice, however, computational problems may occur: the Information value index
becomes infinite when some of the observed counts of goods or bads in a bin are equal to 0. When
this arises, there are numerous practical procedures for preserving finite results. For example, one
can replace a zero count of goods or bads by a minimum constant of, say, 0.0001. The choice of the
number of bins is also very important. In the literature, and also in many applications in credit
scoring, the value 10 is preferred.
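The decile-based estimate (8)-(10), with the zero-count remedy just mentioned, can be sketched as follows. This is illustrative code, not the author's implementation; the 0.0001 floor follows the remedy quoted in the text:

```python
import numpy as np

def iv_empirical(scores_G, scores_B, r=10, floor=1e-4):
    """Empirical Information value, eqs (8)-(10): bin all scores into r
    intervals by quantiles of the pooled sample (deciles for r = 10) and
    sum the per-interval contributions. Zero counts are replaced by a
    small constant `floor` to keep the result finite."""
    scores_G = np.asarray(scores_G, float)
    scores_B = np.asarray(scores_B, float)
    n_G, n_B = len(scores_G), len(scores_B)
    pooled = np.concatenate([scores_G, scores_B])
    edges = np.quantile(pooled, np.linspace(0, 1, r + 1))
    edges[0], edges[-1] = -np.inf, np.inf        # intervals cover the real line
    iv = 0.0
    for j in range(r):
        in_G = (scores_G > edges[j]) & (scores_G <= edges[j + 1])
        in_B = (scores_B > edges[j]) & (scores_B <= edges[j + 1])
        nG_j = max(np.sum(in_G), floor)          # goods in interval j
        nB_j = max(np.sum(in_B), floor)          # bads in interval j
        iv += (nG_j / n_G - nB_j / n_B) * np.log((nG_j * n_B) / (nB_j * n_G))
    return iv

# one draw from the simulation setting of Section 4 (true I_val = 1)
rng = np.random.default_rng(0)
iv_hat = iv_empirical(rng.normal(1.0, 1.0, 9000), rng.normal(0.0, 1.0, 1000))
```

With 10 000 simulated clients and a unit mean difference, the decile estimate lands close to the true value of 1.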
3.3 Empirical estimates with supervised interval selection

This approach follows the ideas of the previous chapter. The estimate of the Information value is
again given by formulas (8) to (10); the main difference lies in the construction of the intervals.
Because we want to avoid zero values of the counts $n_{Gj}$ and $n_{Bj}$, I simply looked for a selection of
intervals that makes all these counts positive. This leads to a situation where all fractions and
logarithms in (9) are defined and finite.

More generally, I propose to require at least $k$ observations of the scores of both good and bad
clients in each interval, where $k$ is a positive integer, i.e. $n_{Gj} \ge k$ and $n_{Bj} \ge k$ for
$j = 1, \ldots, r$. Set

$$q_0 = L, \qquad q_j = \hat{F}_B^{-1}\!\left(\frac{j \cdot k}{n_B}\right), \; j = 1, \ldots, \left\lfloor \frac{n_B}{k} \right\rfloor, \qquad q_{\lfloor n_B/k \rfloor + 1} = H, \qquad (11)$$

where $\hat{F}_B^{-1}$ is the empirical quantile function appropriate to the empirical cumulative distribution
function of the scores of bad clients, and $\lfloor x \rfloor$ denotes the lower integer part of a number $x$. The use
of the quantile function of the scores of bad clients is motivated by the assumption that the number of bad
clients is less than the number of good clients, which is a quite natural assumption. If $n_B$ is not
divisible by $k$, it is necessary to adjust the intervals, because the number of scores of bad clients in
the last interval is then less than $k$. In this case, we have to merge the last two intervals. This leads
to a situation where $n_{Bj} \ge k$ holds for all computed intervals.

Furthermore, we need to ensure that the number of scores of good clients is as required in
each interval. To do so, we compute $n_{Gj}$ for all current intervals. If we obtain $n_{Gj} < k$ for the $j$-th
interval, we merge this interval with its neighbour on the right side. This is equivalent to removing
$q_j$ from the sequence of interval borders. This can be done for all intervals except the last one. If
$n_{Gj} < k$ holds for the last interval, then we have to merge it with its neighbour on the left side, i.e. we
merge the last two intervals. However, this situation is not very probable. If we have a reasonable
scoring model, we can assume that good clients have higher scores than bad clients. It means that we
can expect the number of scores of good clients in the last interval to be higher than the number of
scores of bad clients there, which, due to the construction of the intervals, is greater than $k$. Thus, it
is natural to expect that the number of scores of good clients in the last interval is also greater than
$k$. After all these steps, we obtain $n_{Gj} \ge k$ and $n_{Bj} \ge k$ for all created intervals.
Very important is the choice of $k$. If we choose too small a value, we obtain an overestimate of
the Information value, and vice versa. A reasonable compromise seems to be the adjusted square root
of the number of bad clients, given by

$$k = \left\lceil \sqrt{n_B} \right\rceil, \qquad (12)$$

where $\lceil x \rceil$ denotes the upper integer part of a number $x$.
Denote by $\hat{f}_{IV}^{ESIS}(j)$ the contribution to the Information value on the $j$-th interval, calculated as

$$\hat{f}_{IV}^{ESIS}(j) = \left(\frac{n_{Gj}}{n_G} - \frac{n_{Bj}}{n_B}\right)\ln\frac{n_{Gj}\,n_B}{n_{Bj}\,n_G}, \qquad j = 1, \ldots, r, \qquad (13)$$

where $n_{Gj}$ and $n_{Bj}$ correspond to the observed counts of good and bad clients in the intervals created
according to the procedure described in this chapter. The empirical Information value with
supervised interval selection is now given by

$$\hat{I}_{val}^{ESIS} = \sum_{j=1}^{r} \hat{f}_{IV}^{ESIS}(j). \qquad (14)$$
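The interval construction and merging procedure described above can be sketched as follows. This is an illustrative implementation under my reading of (11) and (12), with `np.quantile` standing in for the empirical quantile function; it is not the author's code:

```python
import numpy as np

def iv_esis(scores_G, scores_B, k=None):
    """Empirical IV with supervised interval selection, eqs (11)-(14):
    interval borders are (j*k/n_B)-quantiles of the bad-client scores;
    borders are then dropped (intervals merged) until every interval
    holds at least k good and k bad scores."""
    s_G = np.asarray(scores_G, float)
    s_B = np.sort(np.asarray(scores_B, float))
    n_G, n_B = len(s_G), len(s_B)
    if k is None:
        k = int(np.ceil(np.sqrt(n_B)))           # rule (12)
    # inner borders from the empirical quantile function of bad scores;
    # stopping before the last multiple of k builds in the final merge
    # of the last two intervals described in the text
    inner = [np.quantile(s_B, j * k / n_B) for j in range(1, n_B // k)]

    def counts(borders):
        # interval j is (q_j, q_{j+1}]; searchsorted assigns each score
        idx_G = np.searchsorted(borders, s_G, side="left")
        idx_B = np.searchsorted(borders, s_B, side="left")
        m = len(borders) + 1
        return np.bincount(idx_G, minlength=m), np.bincount(idx_B, minlength=m)

    # merging: drop a border whenever some interval has < k goods or bads
    while inner:
        nG_j, nB_j = counts(np.array(inner))
        short = np.flatnonzero((nG_j < k) | (nB_j < k))
        if len(short) == 0:
            break
        j = short[0]
        inner.pop(min(j, len(inner) - 1))        # merge right (left if last)

    nG_j, nB_j = counts(np.array(inner))
    return float(np.sum((nG_j / n_G - nB_j / n_B)
                        * np.log((nG_j * n_B) / (nB_j * n_G))))

# one draw from the simulation setting of Section 4 (true I_val = 1)
rng = np.random.default_rng(1)
iv_hat_esis = iv_esis(rng.normal(1.0, 1.0, 9000), rng.normal(0.0, 1.0, 1000))
```

Merging from the left is a design choice here; only the leftmost intervals are short of good scores when good clients score higher than bad ones, so the loop terminates after a few passes.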
4 Simulation results

It is clear, and easy to show, that $\hat{I}_{val}^{ESIS}$ outperforms $\hat{I}_{val}$. However, this chapter focuses
on the properties of $\hat{I}_{val}^{ESIS}$ depending on the choice of the parameter $k$, on the proportion of bad clients $p_B$,
and on the difference of the means of the scores of bad and good clients, $\mu_G - \mu_B$. Consider 10 000 clients,
$100 \cdot p_B\%$ of bad clients with $f_B: N(\mu_B, 1)$ and $100 \cdot (1 - p_B)\%$ of good clients with $f_G: N(\mu_G, 1)$.
Set $\mu_B = 0$ and consider $\mu_G - \mu_B = 0.5, 1$ and $1.5$, and $p_B = 0.02, 0.05, 0.1$ and $0.2$. The case $\mu_G - \mu_B
= 0.5$, i.e. $I_{val} = 0.25$ in our setting, represents weak, $\mu_G - \mu_B = 1$ high and $\mu_G - \mu_B = 1.5$
very high performance of the given scoring model. A 2% bad rate ($p_B = 0.02$) represents a low-risk
portfolio, e.g. mortgages (before the current crisis); a 20% bad rate represents a very high-risk portfolio, e.g.
subprime cash loans.
Appropriate data sets for the simulation were randomly generated 1000 times. The quality of $\hat{I}_{val}^{ESIS}$ was
assessed using the mean squared error, given by

$$MSE(k) = E\left(\hat{I}_{val}^{ESIS} - I_{val}\right)^2, \qquad (15)$$

estimated by the average of the squared deviations over the simulation runs. Given this measure, denote

$$k_{opt} = \arg\min_k MSE(k). \qquad (16)$$
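The criterion (15) and the selection (16) amount to a few lines. An illustrative sketch, with function and variable names that are mine rather than the paper's:

```python
import numpy as np

def mse(estimates, true_iv):
    """Monte Carlo MSE of eq. (15): mean squared deviation of the
    simulated estimates from the true Information value."""
    e = np.asarray(estimates, float)
    return float(np.mean((e - true_iv) ** 2))

def k_opt(mse_by_k):
    """k_opt of eq. (16): the candidate k with the smallest estimated MSE.
    `mse_by_k` maps each candidate k to its simulated MSE."""
    return min(mse_by_k, key=mse_by_k.get)

m = mse([1.1, 0.9, 1.05], 1.0)                   # (0.01 + 0.01 + 0.0025) / 3
best = k_opt({8: 0.040, 16: 0.012, 32: 0.019})   # -> 16
```

In the study itself, `estimates` would hold the 1000 simulated values of the ESIS estimator for each candidate `k`.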
Table 1 presents $k_{opt}$ for all considered values of $p_B$ and $\mu_G - \mu_B$. The proposed values of
$k$, $k = \lceil\sqrt{n_B}\rceil$, are presented in the last row of the table.

Table 1: $k_{opt}$ depending on $p_B$ and $\mu_G - \mu_B$.

$\mu_G - \mu_B$ \ $p_B$     | 0.02 | 0.05 | 0.1 | 0.2
0.5                         |  29  |  42  |  62 |  84
1                           |  12  |  18  |  23 |  32
1.5                         |   6  |   9  |   8 |   9
$\lceil\sqrt{n_B}\rceil$    |  15  |  23  |  32 |  45

We can see that $k_{opt}$ increases with $p_B$. This may be somewhat surprising, but it is
quite natural: increasing $p_B$ means an increasing number of bad clients, because the number of all
clients was fixed at 10 000. If we have enough bad clients, then too small a $k$ leads to too many bins
and consequently to overestimated results. What is surprising is the dependence on $\mu_G - \mu_B$:
while for weak models it is optimal to take a very high number of observations in each bin, the
contrary holds for highly performing models. Overall, $k = \lceil\sqrt{n_B}\rceil$ seems to be a reasonable
compromise.

For completeness, Table 2 presents the average numbers of bins for all considered values of $p_B$
and $\mu_G - \mu_B$. We can see that they took values from 8 to 127.

Table 2: Average number of bins depending on $p_B$ and $\mu_G - \mu_B$.

$\mu_G - \mu_B$ \ $p_B$     |  0.02 |  0.05 |  0.1  |  0.2
0.5                         |  8.00 | 13.00 | 18.00 |  24.90
1                           | 18.00 | 28.80 | 42.76 |  51.88
1.5                         | 33.62 | 50.20 | 95.96 | 127.67

The dependence of $\hat{I}_{val}^{ESIS}$ on $k$ is illustrated in Figures 3 to 5. The highlighted circles correspond
to the values of $k$ where the minimal value of the MSE is obtained; the diamonds correspond to the values of $k$
given by (12).
(a) (b)
Figure 3: Dependence of (a) $\hat{I}_{val}^{ESIS}$ and (b) MSE on $k$; 10 000 clients, $\mu_G - \mu_B = 0.5$.

We can see that $\hat{I}_{val}^{ESIS}$ decreases as $k$ increases. In the case of $\mu_G - \mu_B = 0.5$, the speed of this
decrease is very high for small values of $k$, while it is nearly negligible for values of $k$ higher than
some critical value. The same holds for the MSE.
(a) (b)
Figure 4: Dependence of (a) $\hat{I}_{val}^{ESIS}$ and (b) MSE on $k$; 10 000 clients, $\mu_G - \mu_B = 1$.

When $\mu_G - \mu_B = 1$, the speed of the decrease is lower compared to the previous case. Furthermore,
the MSE is not so flat, especially for $p_B = 2\%$. What is interesting and important here is that our choice
of $k$ is nearly optimal according to the MSE. Moreover, this holds for all considered values of $p_B$.
(a) (b)
Figure 5: Dependence of (a) $\hat{I}_{val}^{ESIS}$ and (b) MSE on $k$; 10 000 clients, $\mu_G - \mu_B = 1.5$.

The last considered difference of the means of the scores of good and bad clients was $\mu_G - \mu_B = 1.5$. In
this case, the speed of the decrease of $\hat{I}_{val}^{ESIS}$ is the lowest compared to the previous two cases. The
novelty, relative to the previous two cases, is the shape of the MSE: especially for the highest
considered proportion of bad clients, i.e. $p_B = 20\%$, we can see that the MSE has a really sharp
minimum.

Overall, Figures 3 and 4 show that the MSE curves are quite flat near their minima. This
means that a small deviation of $k$ from $k_{opt}$ causes only a small change in the MSE. On the other hand,
Figure 5 shows a strong dependence on the choice of $k$.
5 Conclusions

I focused on the Information value and described the difficulties of its estimation. The most
popular method is the empirical estimate using deciles of the given score, but it can lead to infinite
values of $\hat{I}_{val}$, so a remedy is necessary. To avoid these difficulties, I proposed an adjustment
of the empirical estimate, called the empirical estimate with supervised interval selection. It is
based on the requirement of having at least some positive number of observed scores in each
interval. This directly leads to a situation where all fractions and all logarithms are defined and finite.
Consequently, $\hat{I}_{val}^{ESIS}$ is defined and finite.

The simulation study was focused on the properties of $\hat{I}_{val}^{ESIS}$ depending on the choice of the parameter $k$
and on the proportion of bad clients and the difference of the means of the scores of bad and good
clients. The quality of $\hat{I}_{val}^{ESIS}$ was assessed using the mean squared error, which is easy to compute for
normally distributed scores. Moreover, the optimal value $k_{opt}$ was computed.

It was shown that $k_{opt}$ increases with $p_B$. This may be somewhat surprising,
but it is quite natural: increasing $p_B$ means an increasing number of bad clients, because the
number of all clients was fixed in our case. If we have enough bad clients, then too small a $k$ leads
to too many bins and consequently to overestimated results. What was surprising was the
dependence on $\mu_G - \mu_B$: while for weak models it is optimal to take a very high number of
observations in each bin, the contrary holds for highly performing models. Overall, $k = \lceil\sqrt{n_B}\rceil$ seems to
be a reasonable compromise.
On the other hand, the obtained results open additional possibilities for research. In particular, it
seems that including $\mu_G - \mu_B$, represented by appropriate estimates, in the rule for the choice of $k$
could lead to significantly better estimates of $I_{val}$ when using the proposed empirical estimate with
supervised interval selection.
References
[1] ANDERSON, R.: The Credit Scoring Toolkit: Theory and Practice for Retail Credit Risk
Management and Decision Automation. Oxford : Oxford University Press, 2007.
[2] CROOK, J.N., EDELMAN, D.B., THOMAS, L.C.: Recent developments in consumer
credit risk assessment. European Journal of Operational Research, 183 (3), 1447-1465,
2007.
[3] HAND, D.J. and HENLEY, W.E.: Statistical Classification Methods in Consumer Credit
Scoring: a review. Journal of the Royal Statistical Society, Series A. 160 (3), 523-541,
1997.
[4] HŘEBÍČEK, J., ŘEZÁČ, M.: Modelling with Maple and MapleSim. In: 22nd European
Conference on Modelling and Simulation ECMS 2008 Proceedings, Dudweiler, 60-66,
2008.
[5] KOLÁČEK, J., ŘEZÁČ, M.: Assessment of Scoring Models Using Information Value. In:
Compstat’ 2010 proceedings. Paris, 1191-1198, 2010.
[6] ŘEZÁČ, M.: Indexy kvality normálně rozložených skóre. Forum Statisticum Slovacum,
Bratislava, 2009.
[7] SIDDIQI, N.: Credit Risk Scorecards: developing and implementing intelligent credit
scoring. New Jersey: Wiley, 2006.
[8] TERRELL, G.R.: The Maximal Smoothing Principle in Density Estimation. Journal of
the American Statistical Association, 85, 470-477, 1990.
[9] THOMAS, L.C.: Consumer Credit Models: Pricing, Profit, and Portfolio. Oxford:
Oxford University Press, 2009.
[10] THOMAS, L.C., EDELMAN, D.B., CROOK, J.N.: Credit Scoring and Its Applications.
Philadelphia: SIAM Monographs on Mathematical Modeling and Computation, 2002.
[11] WAND, M.P. and JONES, M.C.: Kernel smoothing. London: Chapman and Hall, 1995.
[12] WILKIE, A.D.: Measures for comparing scoring systems. In: Thomas, L.C., Edelman,
D.B., Crook, J.N. (Eds.), Readings in Credit Scoring. Oxford: Oxford University Press, pp.
51-62, 2004.
Current address
Martin Řezáč, Mgr., Ph.D.,
Department of Mathematics and Statistics,
Faculty of Science, Masaryk University,
611 37 Brno, Czech Republic, tel. +420 549 493 919,
e-mail: mrezac@math.muni.cz